Sunday, May 8, 2011

Distorting a Binary Tree

We know that there are 3 different ways to traverse a Binary Tree i.e. InOrder, PreOrder, and PostOrder. Let us try to ignore other traversals like LevelOrder etc for now. Each traversal has its own significance and you can never think of one traversal technique as a replacement for the other. I am going to talk here about which traversal suits better when the tree is getting distorted. I will be taking 2 examples here and proceed.

The links to left and right subtrees are as important to a tree as the next pointer for a linked list. These are all the pointers/references that keep the structure intact. Once they are lost, there is no way to recover them once again. So when it comes to distorting a tree, I would always prefer PostOrder traversal where you don't touch the current node till you are done with the sub-trees.

Deleting a Binary tree
Consider for example the problem of how to deallocating an entire Binary tree? You cant start with the root node of the tree, as if you do so, you have no way to go to the left and right subtrees. So if you deallocate the root first, you cant proceed anymore. Similarly in case of InOrder traversal, you cant deallocate the right subtree. So the most suitable procedure involves deallocating the left and right subtrees first and coming to the root at last.

Binary tree to Doubly Linked List
Now consider the problem of converting a Binary Tree into a doubly linked list where the left pointer of the tree should be made to point the InOrder predecessor node and right pointer should be altered to point the InOrder successor node.
Though the problem talks about InOrder arrangement, you should not start tackling the problem with InOrder traversal. Because as we discussed in case of delete operation, any operation that distorts the tree is best done by the PostOrder traversal.
So, you need to write a recursive procedure to convert a tree to a Doubly Linked List. You can recursively apply the procedure to the left sub-tree and get a leftDll and separately to the right sub-tree to get a rightDll.
Now, you need to form a single DLL from the leftDll, the current root node, and the rightDll. So you need to attach the last node of the leftDll to the root, and then attach the root to the rightDll's head. However, finding the last node of leftDll is O(n) operation. In order to minimize that, instead of returning a Dll, always return a circular DLL so that once you have access to its head, it is just a O(1) operation to get its tail.
Now, we have discussed that circular DLL is the best data-structure to be returned. But what exact node would you return? Remember that you expect the left subtree to return a circular DLL which is arranged InOrder. Now to preserve the InOrder property, you would want to attach the current root to the end of the Circular DLL. Similarly you would link the circular DLL that you have obtained from the right subtree and form a single circular DLL. Now you should return the head node in other words, the node returned by the left subtree's iteration if present or else the current node to maintain the invariant.

Friday, May 6, 2011

Batch calls

Batch Processing :

Consider the two services mentioned below:
UserService - As mentioned in previous post, it is used to get a social network's user details.
FeedService - Used to get feeds information for the given user.It just queries "feeds" table.

For each feed, there should be a feed owner(the guy who posted that feed),
So in order to form a complete feed information, the FeedService has to call UserService to feed owner information(such as name,profile url,photo etc) of each feed.

Problem:
Since FeedService is pinging UserService for each and every feed, we ourself will be putting so much load on UserService.

Solution:
For each call to UserService, send a batch(say 100) of feeds and get the user information of those feeds at once. (like batch processing).

Pros:
1. Load on UserService is reduced.
2. Network latency is reduced by batch size times (here 100x times).

Cons:
1. We won't get 100 feeds always. Sometimes it might be just 10 feeds. So, there is a possibility that we might not be efficiently making call to UserService.
2. Extra care to be taken during service shutdown, and ensure that all remaining feeds in cache get processed before shutdown.

The above problem can be solved as follows -
Maintain a local cache of feeds in FeedService. Whenever the cache becomes full , then make a call to UserService and get the user information. Here cache size is nothing but batch size.

Sometimes, it is possible that few feeds can be stay in cache for long time, because FeedService is waiting for cache to get filled. To avoid such scenario, we can have some threshold time within which the feeds can reside in the cache. If cache has not filled even after threshold time, then make a call to UserService irrespective of no.of feeds available in the cache.

Though, this problem looks simple, in real world , care should be taken to identify and use batch processing  wherever possible.

Thursday, May 5, 2011

Push and Pull patterns

After exploiting the SOA environment in Amazon, I like to bring few patterns (you can say, principal's) observed in SOA , which everybody should be aware of, before designing the actual web services in your architecture.

Pull pattern
 Consider typical environment in Facebook,which has the below two web services.
 1.  UserService - Gets the Facebook user details like id, name, public profile url etc. The job of this service is to fetch user details from the underlying tables. Remember that the detail need not be present in a single table, but could be spread across multiple tables. This service aggregates all the details and provides a single API call to get all details of a particular user.
 2.  BotService - This service takes responsibility of updating the search engines indexes periodically whenever a user changes his profile information. So it contacts the UserService periodically to find the users who have updated their profile in the recent past, and based on their profile information the search indexes are updated.
Current Approach
BotService is a following Pull pattern in the above discussed model. It is the responsibility of the BotService to continuously get the user updates from the UserService. UserService takes no pains to send updates to BotService.

Problem with Pull Pattern
The general problem with Pull model is that there might be unnecessary fetches from the puller service, particularly when the the number updates the needs to come from the source is very minimum. In the above said service, assuming that a user will very rarely change his profile details, BotService will add unnecessary traffic on the UserService by continuously pulling details as at most of the times, UserService might have no updates to send at all.

Going with the Push Pattern - Don't call me, I will call you".  
Instead of UserService being called by BotService, let's make BotService being called by UserService. Since UserService knows whenever a user updated public profile URL, it will call the BotService informing that the particular user has updated his/her public profile URL. UserService can be made intelligent to batch the calls to the BotService to push the updates. In case latency is an issue, timeouts can be configured in the source by the end of which it is supposed to send updates.
The Rule of thumb is that the rate at which you get the data and the timeliness at which the data should be processed should guide you on which one to choose - the Push or the Pull.

Many more interesting SOA patterns to come.
Cheers,
Inba