Friday, 30 September 2016

Consistent Hashing - 2

Recall that in consistent hashing, each shard owns one big range of hash values to which objects are mapped. With this set-up, the data load may not be balanced across the different shards.

Instead of giving each shard one big range, what if we add markers to split the big range of hash values into smaller chunks, and assign those smaller ranges to different shards on different servers? That helps with load balancing, but how?

Say we have 20 objects to store and still have 4 shards available, each owning a hash range of equal length. It may happen that 14 of the 20 objects map to Shard 0 while the rest are distributed evenly across Shards 1, 2 and 3. Unbalanced, isn't it? This is exactly where smaller hash ranges help a lot with load balancing.

Consistent hashing can use multiple markers per shard, mapping several smaller ranges of hash values to each shard instead of one big range.

It will have two positive effects:

1. When a new shard is added, it will pull objects from all existing shards in the cluster, instead of just a few from its neighbouring shards. Similarly, when an existing shard is removed, the objects it was holding will be distributed evenly across the remaining shards.

2. The overall distribution of data will be fairly even, which helps with load balancing.
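Here is a minimal Python sketch of the multiple-marker idea (often called virtual nodes). The marker count, the shard names and the use of md5 are my own assumptions for illustration, not part of any particular system:

import bisect
import hashlib

def hash_value(key):
    # Map any string to an integer position on the hash ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, markers_per_shard=100):
        # Each shard is placed on the ring many times ("markers"),
        # so its total share is split into many small ranges.
        self.ring = sorted(
            (hash_value(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(markers_per_shard)
        )
        self.positions = [pos for pos, _ in self.ring]

    def shard_for(self, key):
        # Walk clockwise to the first marker at or after the key's position.
        idx = bisect.bisect(self.positions, hash_value(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["Shard 0", "Shard 1", "Shard 2", "Shard 3"])
print(ring.shard_for("Music 1"))

Because every shard owns many small slices spread around the ring, both load and the cost of adding or removing a shard get spread across all shards instead of hitting one neighbour.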

In my next entry, I will discuss another exciting design problem. Stay tuned! See you soon!!



Latency Comparison Numbers - Every web developer should know

Latency Comparison Numbers

L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms


Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Credit
------
By Jeff Dean              
Originally by Peter Norvig

Consistent Hashing - Swift Rings

Remember our exciting startup, the FADU bakery shop? We have been working with great desi brains :P. Recently our engineering team organised a DESIGNathon. It was amazing. Let's talk about what we designed.

"A shard scheme for KEY-VALUE storage system"

Let's list the requirements that our design should support.

1. Amount of data to be stored: 100 TB, growing at a rate of 1 TB/day.
2. Machine capacity: 72 GB RAM and 10 TB of secondary storage per machine. The bare minimum number of machines needed to store 100 TB is therefore 10.
3. We will begin with 20 machines, and more machines will be added over time.
4. All key-value pairs are independent, and queries will be by key.

Observations:

The data grows at 1 TB/day, which means we generate enough data to fill the storage of one machine (10 TB) every 10 days. We try to keep storage utilization at or below 80%, so under this assumption we would add a new machine roughly every 8 days.
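A quick back-of-the-envelope check of those numbers in Python (the 80% utilization target is the assumption stated above):

machine_storage_tb = 10        # secondary storage per machine
total_data_tb = 100            # data to store today
growth_tb_per_day = 1          # daily growth
max_utilization = 0.8          # keep every machine at most 80% full

min_machines = total_data_tb / machine_storage_tb               # 10 machines minimum
usable_tb = machine_storage_tb * max_utilization                # 8 TB usable per machine
days_between_new_machines = usable_tb / growth_tb_per_day       # roughly every 8 days
print(min_machines, usable_tb, days_between_new_machines)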

One way to shard is to require that the data within a shard fit on a single machine. In our case the data is growing at a fast pace, so if we fix the number of shards, the data within each shard will keep growing and eventually exceed the machine capacity, i.e. the 10 TB mark. So in our case the number of shards will have to increase over time.

Sharding strategy:

Let's begin with a fixed number of shards, S. For every key, compute a hash value H using some hashing mechanism and store the data on shard number H % S. But in our case the data is growing fast, so we will have to increase the number of shards as the data grows. When the shard count becomes S + 1, H % (S + 1) changes for almost every key, forcing us to relocate nearly every key in the data store. Duh! That is going to be extremely expensive and undesirable :(.
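A rough Python sketch of why H % S hurts when S changes; the key format and the shard counts below are made up purely for illustration:

import hashlib

def shard_of(key, num_shards):
    # Basic modulo sharding: hash the key, then take it modulo the shard count.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

keys = [f"key-{i}" for i in range(100000)]
moved = sum(1 for k in keys if shard_of(k, 20) != shard_of(k, 21))
print(f"{moved / len(keys):.0%} of keys move when going from 20 to 21 shards")
# Roughly 95% of the keys land on a different shard, so almost everything relocates.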

Do we have a better strategy to avoid this undesirable behaviour? And how do we handle the case when a shard goes down and we need to redistribute its data over the remaining shards?

Yes, consistent hashing will rescue us. So, what the heck is consistent hashing?

Consistent hashing keeps data mapped to the same shard as far as possible. When a new shard is added, it takes its share of objects from the other shards, and when a shard is removed, its objects are shared among the remaining shards.

Consistent Hashing based strategy:

It also hashes the object key, but instead of mapping each object's hash value to a shard directly, each shard is assigned a range of hash values and stores the objects whose hashes fall within that range. So now we map each shard to a range of hash values.

Range of hash values for each shard (below is just an example; real hash values are much bigger):

Shard      Range of Hash Values
Shard 0    0000… ~ 3fff…
Shard 1    3fff… ~ 7fff…
Shard 2    7fff… ~ bffd…
Shard 3    bffd… ~ ffff…

Mapping of objects to the different shards:

Object     Hash Value    Shard Mapped To
Music 1    4b4s…         Shard 1
Music 2    ecb2…         Shard 3
Music 3    5082…         Shard 1
Image 1    b5e7…         Shard 2
Image 2    9433…         Shard 2
Image 3    1213…         Shard 0
Movie 1    69db…         Shard 1
Movie 2    c4ab…         Shard 3
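A small Python sketch of the range lookup; the boundary values mirror the example table above, and md5 is just an illustrative choice of hash function:

import hashlib

# Upper bound of each shard's range, padded out to a full 32-hex-digit value.
RANGE_UPPER_BOUNDS = [
    ("3fff" + "f" * 28, "Shard 0"),
    ("7fff" + "f" * 28, "Shard 1"),
    ("bffd" + "f" * 28, "Shard 2"),
    ("f" * 32,          "Shard 3"),
]

def shard_for(key):
    # Hash the object key, then pick the first shard whose range covers the digest.
    # Fixed-length lowercase hex strings compare correctly as plain text.
    digest = hashlib.md5(key.encode()).hexdigest()
    for upper_bound, shard in RANGE_UPPER_BOUNDS:
        if digest <= upper_bound:
            return shard
    return RANGE_UPPER_BOUNDS[-1][1]

print(shard_for("Music 1"))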


Diagrammatically we can see this as follows:




Now if I add an additional shard, the only change is that each shard gets a new range of hash values. Any object whose hash value still falls within the range of its current shard remains on that shard. Objects whose hash values no longer fall within the range of their current shard get mapped to another shard. But with consistent hashing the number of such objects is very small compared to the basic hashing scheme.


Let's add another shard. The diagram below re-illustrates the point.



Notice how only Movie 2 and Music 2 are mapped to the new Shard 4, and Image 1 needs to be remapped to Shard 3. If we used basic hash mapping, we would most likely have to recalculate the mapping for all objects and remap them accordingly. Imagine how expensive that is for millions of objects.
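To make that concrete, here is a hedged Python sketch of a simple ring where each shard sits at the hashed position of its name (a slightly different placement from the equal-range picture above, but the same principle); it counts how many keys actually move when Shard 4 joins:

import bisect
import hashlib

def hash_int(key):
    # 128-bit position on the ring for any string.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(shards):
    # One marker per shard, placed at the hash of the shard's name.
    return sorted((hash_int(s), s) for s in shards)

def shard_for(ring, key):
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect(positions, hash_int(key)) % len(ring)
    return ring[idx][1]

keys = [f"object-{i}" for i in range(100000)]
old_ring = build_ring(["Shard 0", "Shard 1", "Shard 2", "Shard 3"])
new_ring = build_ring(["Shard 0", "Shard 1", "Shard 2", "Shard 3", "Shard 4"])

moved = sum(1 for k in keys if shard_for(old_ring, k) != shard_for(new_ring, k))
# Only the keys falling in the slice claimed by Shard 4 move; with the modulo
# scheme sketched earlier, nearly every key would have to relocate.
print(f"{moved / len(keys):.0%} of keys moved after adding Shard 4")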

In my next entry, I will talk about some improvements over this design. Stay tuned ... :)