Your Own Highly Available Database

This is a continuation of the distributed system design series. In this entry we will discuss the design of a highly available distributed key-value database.

Before going into the design details, it would be good to note the features our system should support.

List of Features:

1. Amount of data to store: 100 TB

2. CRUD (Create, Retrieve, Update, Delete) operations on the data. With updates, the value for a particular key may grow in size, so that a sequence of keys that previously co-existed on one server can grow to a point where they no longer fit on a single machine.

3. Upper limit for a single value: 1 GB in size.

4. Query processing speed: 100K queries per second.

5. A highly available, low-latency, partition-tolerant key-value database.

Observations:

Let's say we have machines with 10 TB of disk available. We would need a minimum of 100TB/10TB = 10 machines to store the data without replication. In real life we replicate the data across multiple servers for fault tolerance, and in that case we would need more than 10 machines.
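The capacity math above can be sketched in a few lines. The replication factor of 3 here is an assumption for illustration; the post only says "more than 10 machines":

```python
import math

# Assumed numbers from the observation above, plus an assumed
# replication factor of 3 (a common choice for fault tolerance).
TOTAL_DATA_TB = 100
DISK_PER_MACHINE_TB = 10
REPLICATION_FACTOR = 3

# Minimum machines to hold one copy of all the data.
machines_no_replication = math.ceil(TOTAL_DATA_TB / DISK_PER_MACHINE_TB)

# With replication, each byte is stored REPLICATION_FACTOR times.
machines_with_replication = machines_no_replication * REPLICATION_FACTOR

print(machines_no_replication)    # 10
print(machines_with_replication)  # 30
```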


Design Strategy:

To scale the system and handle the current load as well as future growth, one strategy is to shard the data: store it on different machines and distribute the load among those machines. Our system will store denormalized data (key --> value), and all data for a row/key will live on the same machine. This helps us reduce latency.
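One simple way to assign keys to shards is a stable hash of the key. This is a minimal sketch, assuming a fixed shard count of 16 (the post does not fix a number, and a production system would use something rebalance-friendly like consistent hashing):

```python
import hashlib

NUM_SHARDS = 16  # assumption for this sketch; not from the post

def shard_for_key(key: str) -> int:
    """Map a key to a shard deterministically, so every client
    routes the same key to the same machine."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Because the hash is deterministic, all reads and writes for one key always land on one shard, which keeps all data for a row/key on the same machine as described above.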

Going back to feature number 5 above, high availability and low latency are our primary design goals.

Let's assume we somehow distribute our data (key-value pairs) into various shards. Initially we will start with a master-slave architecture within each shard. Each shard will have one master node and multiple slave nodes. Slaves keep reading all new updates from the master and apply them to keep themselves up to date. All WRITE requests are served by the master node, while slaves handle READ requests. This can lead to inconsistent views of the latest data across the master and clients, but it ensures high availability for read requests.
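The read/write routing inside one shard can be sketched as below. The `Node` and `Shard` classes and the pull-style `replicate()` step are illustrative stand-ins, not a real implementation; the point is that reads from a slave can be stale until replication catches up:

```python
import random

class Node:
    """In-memory stand-in for one database server."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

class Shard:
    """One shard: a single master takes writes, slaves serve reads.
    Slaves replicate asynchronously, so reads may lag the master."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # All WRITE requests go through the master.
        self.master.put(key, value)

    def replicate(self):
        # Sketch of the slaves pulling the latest updates from the master.
        for slave in self.slaves:
            slave.store = dict(self.master.store)

    def read(self, key):
        # Spread READ requests over the slaves for availability.
        replica = random.choice(self.slaves) if self.slaves else self.master
        return replica.get(key)

shard = Shard(Node(), [Node(), Node()])
shard.write("k1", "v1")
stale = shard.read("k1")   # None until the slaves have replicated
shard.replicate()
fresh = shard.read("k1")   # "v1" once replication has caught up
```

The gap between `stale` and `fresh` is exactly the inconsistency window mentioned above: the master has the write, but a slave answering a read may not yet.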

Our architecture within a shard may look as follows:


However, what happens to data writes in case of a master failure? WRITEs start failing, as in our design the master alone was responsible for taking write requests. We can promote one of the slaves to become the new master. However, the shard remains unavailable for the duration of the fail-over, while a slave is promoted to the new master. And since slaves lag behind the master (in data updates), a hardware failure on the master can lose some data. This is bad!!
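The promotion step can be sketched as follows. Each slave is modeled as a plain dict of the key/value pairs it has replicated so far, and "most keys" as a proxy for "most caught up" is an assumption of this sketch (real systems compare replication log positions). Any entries the failed master never shipped out are simply gone, which is the data-loss window described above:

```python
def promote_slave(slaves):
    """Failover sketch: pick the most caught-up slave as the new master.

    `slaves` is a list of dicts, each holding the key/value pairs that
    slave has replicated. Writes the old master accepted but never
    replicated to any slave are lost in this scheme.
    """
    if not slaves:
        raise RuntimeError("no slave available to promote")
    # Assumption: the slave holding the most keys is the most up to date.
    new_master = max(slaves, key=len)
    slaves.remove(new_master)
    return new_master

# The master had {"a": 1, "b": 2, "c": 3}, but "c" never replicated:
slaves = [{"a": 1}, {"a": 1, "b": 2}]
new_master = promote_slave(slaves)
print(new_master)  # {'a': 1, 'b': 2} -- "c" is lost with the old master
```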

Can we do better? Is there any way to handle this data loss and keep the system available all the time?
In my next entry, I will extend this design and solve these problems. So take care and stay tuned :). Will see you soon!!






