Friday, May 6, 2011

Batch calls

Batch Processing:

Consider the two services below:
UserService - As mentioned in the previous post, it is used to get a social network user's details.
FeedService - Used to get feed information for a given user. It just queries the "feeds" table.

Every feed has a feed owner (the person who posted it).
So, in order to build complete feed information, FeedService has to call UserService to get the owner's details (such as name, profile URL, photo, etc.) for each feed.

Problem:
Since FeedService calls UserService once for every single feed, we end up putting a heavy load on UserService ourselves.

Solution:
In each call to UserService, send a batch of feeds (say 100) and get the user information for all of them at once, as in batch processing.
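
As a rough illustration of the idea, here is a minimal Python sketch. The function name `attach_owners`, the `owner_id` field, and the `get_users` callable (standing in for a batched UserService call that takes a list of user ids and returns a dict keyed by id) are all assumptions made for this example, not the real service API.

```python
from typing import Any, Callable, Dict, List

Feed = Dict[str, Any]
User = Dict[str, Any]


def attach_owners(
    feeds: List[Feed],
    get_users: Callable[[List[int]], Dict[int, User]],
    batch_size: int = 100,
) -> List[Feed]:
    """Attach owner details to feeds, calling get_users once per batch.

    get_users is a hypothetical stand-in for a batched UserService call.
    """
    enriched: List[Feed] = []
    for start in range(0, len(feeds), batch_size):
        batch = feeds[start:start + batch_size]
        # One remote call covers every feed in this batch.
        owner_ids = [feed["owner_id"] for feed in batch]
        users = get_users(owner_ids)
        for feed in batch:
            enriched.append({**feed, "owner": users[feed["owner_id"]]})
    return enriched
```

With 250 feeds and a batch size of 100, this makes 3 calls to `get_users` instead of 250.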

Pros:
1. The load on UserService is reduced.
2. The number of network round trips drops by a factor of the batch size (here, up to 100x), so the total time spent on network latency falls accordingly.

Cons:
1. We won't always have 100 feeds. Sometimes there might be just 10, so some calls to UserService will be less efficient than they could be.
2. Extra care must be taken during service shutdown to ensure that any feeds still waiting to be sent get processed before the service stops.

The first drawback can be handled as follows:
Maintain a local cache of feeds in FeedService. Whenever the cache becomes full, make a call to UserService and get the user information. Here the cache size is simply the batch size.
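
The fill-then-flush cache could look something like the sketch below. The class name `FeedBatcher`, the `owner_id` field, and the `fetch_users` callable (a placeholder for the batched UserService call) are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Feed = Dict[str, Any]


class FeedBatcher:
    """Collects feeds locally and calls UserService only when the cache is full."""

    def __init__(
        self,
        fetch_users: Callable[[List[int]], Dict[int, Dict[str, Any]]],
        batch_size: int = 100,
    ) -> None:
        self.fetch_users = fetch_users  # hypothetical batched UserService call
        self.batch_size = batch_size    # cache size == batch size
        self.pending: List[Feed] = []

    def add(self, feed: Feed) -> List[Feed]:
        """Cache a feed; return enriched feeds if this add filled the cache."""
        self.pending.append(feed)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self) -> List[Feed]:
        """Send everything in the cache to UserService in a single call."""
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        users = self.fetch_users([feed["owner_id"] for feed in batch])
        return [{**feed, "owner": users[feed["owner_id"]]} for feed in batch]
```

Each `add` returns an empty list until the cache fills, at which point the whole batch comes back with owner details attached.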

Sometimes a few feeds can stay in the cache for a long time because FeedService is waiting for the cache to fill. To avoid this, we can set a threshold time for how long feeds may reside in the cache. If the cache has not filled by the time the threshold expires, make the call to UserService anyway, regardless of the number of feeds in the cache.
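
Here is one way the time threshold might be added, again only as a sketch. The names `TimedFeedBatcher`, `max_wait_seconds`, `poll`, and `close` are assumptions for this example. It assumes something in the service (for instance a periodic timer) calls `poll()` regularly, and that `close()` is called on shutdown so that leftover feeds are not lost.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Feed = Dict[str, Any]


class TimedFeedBatcher:
    """Flushes when the cache is full OR the oldest feed has waited too long."""

    def __init__(
        self,
        fetch_users: Callable[[List[int]], Dict[int, Dict[str, Any]]],
        batch_size: int = 100,
        max_wait_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch_users = fetch_users  # hypothetical batched UserService call
        self.batch_size = batch_size
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock              # injectable so the timing can be tested
        self.pending: List[Feed] = []
        self.oldest_at: Optional[float] = None

    def add(self, feed: Feed) -> List[Feed]:
        if not self.pending:
            self.oldest_at = self.clock()  # start the wait for this batch
        self.pending.append(feed)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return self.poll()

    def poll(self) -> List[Feed]:
        """Flush a partial batch if the threshold time has passed."""
        if self.pending and self.clock() - self.oldest_at >= self.max_wait_seconds:
            return self.flush()
        return []

    def flush(self) -> List[Feed]:
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.oldest_at = None
        users = self.fetch_users([feed["owner_id"] for feed in batch])
        return [{**feed, "owner": users[feed["owner_id"]]} for feed in batch]

    def close(self) -> List[Feed]:
        """Call on shutdown so feeds still in the cache get processed."""
        return self.flush()
```

The threshold is a trade-off: a short wait keeps feeds fresh but produces smaller batches, while a long wait gives fuller batches at the cost of delay.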

Though this problem looks simple, in real-world systems it is worth taking care to identify where batch processing applies and to use it wherever possible.
