Monday, April 4, 2011

Flyweight pattern

Situation:
You have too many instances of a certain type and you are in desperate need of reducing the number of instances.

Flyweight pattern
You know that an object is a combination of both state and behaviour. As behaviours are attached to the class of the object rather than storing in the object itself (so that each object need not have a separate version of methods), the memory occupied by an object is proportional to the amount of state information stored by the object. Assume that in a text document, you represent each character as an object. In that case, if the document is lengthy, you will end up having too many objects. Flyweight pattern addresses this issue by decreasing the number of state information you store in the object. Instead of storing them as instance variables, pass the other state information as parameters to the methods.
E.g.
class MyCharacter {
  Font f;

  public MyCharacter(Font f) {
    this.f = f;
  }

  public void display(char c) {
    // Determine how to display the character in the specified font.
  }
}
Here, it is appropriate to think of character c itself having the details of its font. However, in that case, each character will have the same information in most of the cases and you might end up having so many objects in memory. Instead you have moved the font specific part alone to a separate class and now you can have a single instance for each type of Font and you need not store each character as a separate object thereby optimizing memory.

Advantage:
1. You can optimize memory.

Disadvantage:
1. It assumes that you are not special casing any of the objects, and all objects will behave identically.

Sunday, April 3, 2011

Builder Pattern

Situation
You are going to create a User class with data name, dateOfBirth, gender, height, and weight. Other than name parameter, all others are optional. In such a situation, the general amateur solution is to provide multiple constructors for the class and make the code clumsy. Builder pattern tries to address this exact problem.

Builder Pattern
The arguments that needs to be passed to a complex object is made another object in itself. As that object makes sense only for the class it is simplifying, it is made a static subclass.
public class User {
  private String name;
  private Date dateOfBirth;
  private String gender;
  private Integer height;
  private Integer weight;

  public static class Builder {
    // Required parameter
    private String name;

    // Optional parameter
    private Date dateOfBirth;

    public Builder(String name) {
      this.name = name;
    }

    public Builder dateOfBirth(Date dateOfBirth) {
      this.dateOfBirth = dateOfBirth;
      return this;
    }

    public User build() {
      return new User(this);
    }
  }

  private User(Builder b) {
    this.name = b.name;
    this.dateOfBirth = b.dateOfBirth;
  }
}

class TestUser {
  User u = new User.Builder("foo").dateofBirth(new Date()).build();
}

As you have seen above, you can declare a separate builder class within the specified class which sets all the parameters to default. Each option can be called separately and a separate build() method would create the required object. So the builder pattern simulates named optional parameters. Builder can also check whether all options are set right in the build() method.

Advantages:
1. Number of constructors become lesser.
2. If there is an additional optional parameter being added, builder needs very little change.

Disadvantage:
1. The darker side of Builder pattern is that the object creation becomes a bit complex.

Saturday, April 2, 2011

Distributed Hash tables

Hash tables
Hash tables are efficient data structures which leverage memory to obtain speed. All its operations are optimized for constant time. However, as everyone knows, the internal implementation of hash table decides how efficient the hash table can work. A poorly designed hash function will cause more collisions, causing a higher cost for both inserts and searches. Also, a wrong hash function might waste a lot of memory by not evenly distributing the keys into the slots. To get a clear picture of how to determine hash functions, you can use "Introduction to Algorithms" which gives a detailed study on this topic.

Problems with Hash table
I don't know a data structure which works for all problems in all situations. Any data structure has its own set of limitations. It is true for hash table also. Unlike other data structures like list, stack, queue, tree, etc the size of hash table is not determined by the amount of the data being stored. At the same time, if you want to increase the size of the Hash table, you have no option other than hashing all the keys into the newer table. Also, for a machine, the RAM size has some physical limits based on the machine's addressing capabilities. So in case the hash-table is maintained in-memory (as in most of the cases), the size that your hash table can occupy is has a fixed upper limit. What if you want to break that hard limit and go beyond that size?

DHT - Solution for a larger Hash table
There are so many reasons why you want a large hash table. As you might know, hash tables are very good data-structures for caching the results as it enables very fast look-ups. There is no way that you can go beyond the size limitations of a single machine. So if you want a larger hash table, the only option is to go for distributed Hash table, which means you should distribute your data across multiple hash tables in multiple machines. Many a times, scaling up is not a good solution. Scaling up means replacing the machine with a higher end machine which might have high power CPU, a large memory, etc. Scaling up is always costly and might be prone to single point failure. Scaling out is a better choice. It means, instead of having one machine, you maintain a bunch of machines which are of commodity type. They are cost efficient, at the same time, you should always accommodate fault tolerance in your architecture as you are using cheap commodity machines.

Architecture
Given that you go for a distributed hash table, how would you hash a key? The first step would be to identify where the key should go i.e. the machine where the key should be hashed to. Then as the second step in that machine, you should save the key using the normal hashing technique. So instead of applying a single hash, you should start applying 2 levels of hashes. The first to identify the machine and the second to save/look up the key in the specific machine you got in the first step. The architecture thus becomes natural. The duty of the first component thus becomes to identify which machine to use. The second component has a list of independent machines. Each are simple implementation of hash table and are ignorant of the fact that outside itself there are multiple cousins doing the same functionality.

Practical application
Memcached is a stunning implementation of the above said technique. It is a cache layer over the database and is expected to reduce the number of database look-ups. It is being used by a number of very active websites. Facebook is one such site which uses Memcached and has a hit-rate of around 99%.

Friday, April 1, 2011

Why this blog?

I am a guy who doesn't believe in Masters that you would get in big university. Because I have met people who din't have chance to study B.E/B.Tech due to monetary issues but still performing a lot (I mean, really a lot) better than most engineers I have met so far. So, does your degree make a difference? Yeah sure it does. When you enter into a new environment, you will widen your contacts knowingly or unknowingly. If you are lucky enough, you might also get a chance to see people from many cultures and you would socialize with them. Apart from that, all that makes the difference is what you do during those years than what degree that you finally get. Again, I would stress upon the fact that it is just my personal opinion. But, I am sure, corporates wont agree with me in this argument. I know so many good companies who prefer only people from prestigious institutions. Even though the candidates have a very good resume with 3 or 4 year experience, the institutions in which they did their Bachelor/Master degree has some influence in the selection process for sure.

Now, coming to the point... why this blog? This is just a diary than a blog. In the past, I have done the sin of not noting down anything that I have learnt. So, as the time flies and the memory fades, I have started feeling that I am turning into a wiped slate once again. Better late than never. So I thought I would record whatever I learn daily, restricting myself only to the softwares and related stuffs. My interest is to gain architectural skills, solve algorithms, effectively use data structures etc, and I would be discussing the same in this blog in the days to come.