
Thursday, 27 October 2016

Message sync protocol for a messaging service

Using conventional back-end systems for a messaging service like WhatsApp causes lag in performance and data usage, especially on networks with costly data plans and limited bandwidth. To fix this, we need to completely re-imagine how data is synchronized to the device and change the way data is processed in the back end.

In this entry we will discuss a new sync protocol for a messaging service that will decrease non-media data usage by 40%. By reducing congestion on the network, we will also see an approximately 20% decrease in the number of people who experience errors when trying to send messages.

Initially we started with a pull-based protocol for getting data down to the client. When the client receives a message, it first receives a lightweight push notification indicating that a new message is available. This triggers the app to send the server a complicated HTTPS request and receive a very large JSON response with the updated conversation view.

Instead of the above model, we can move to a push-based snapshot-and-delta model. In this model the client retrieves an initial snapshot of its messages using an HTTPS pull request and then subscribes to delta updates, which are immediately pushed to the client through MQTT (a low-power, low-bandwidth protocol) as messages are received. As a result, the client can quickly display an up-to-date view without ever making the HTTPS request. We can also replace the JSON-based encoding for messages and delta updates. JSON is great if we need a flexible, human-readable format for transferring data without a lot of developer overhead. We can replace JSON with Apache Thrift. Switching from JSON to Thrift allows us to reduce our payload size by roughly 50%.
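To make the snapshot-and-delta flow concrete, here is a minimal client-side sketch in Java. Everything here (class and method names, string payloads) is illustrative rather than any real messaging API; the MQTT transport and Thrift decoding are assumed to happen before applyDelta is called.

    import java.util.Map;
    import java.util.TreeMap;

    class ConversationView {
        // Messages keyed by sequence id, so the view stays totally ordered.
        private final TreeMap<Long, String> messages = new TreeMap<>();
        private long lastAppliedSeqId = -1;

        // One-time HTTPS pull: seed the view from the initial snapshot.
        void loadSnapshot(Map<Long, String> snapshot, long snapshotSeqId) {
            messages.putAll(snapshot);
            lastAppliedSeqId = snapshotSeqId;
        }

        // Called for each delta pushed over MQTT. Duplicates (already covered
        // by the snapshot or an earlier push) are ignored; a gap in sequence
        // ids would signal that a fresh HTTPS snapshot pull is needed.
        void applyDelta(long seqId, String message) {
            if (seqId <= lastAppliedSeqId) return;
            messages.put(seqId, message);
            lastAppliedSeqId = seqId;
        }
    }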

On the server side, messaging data has traditionally been stored on spinning disks. In the pull-based model, we would write to disk before sending a trigger to the client to read from it. This giant storage tier would serve real-time message data as well as the full conversation history. But one large storage tier does not scale well for synchronizing recent messages to the app in real time. To support synchronization at scale, we need a faster sync protocol that maintains consistency between the app client and long-term storage. To do this, we need to be able to stream the same sequence of updates in real time to the app client and to the storage tier in parallel, on a per-user basis.

Our new sync protocol will be a totally ordered queue of messaging updates (new message, state change for messages read, etc.) with separate pointers into the queue indicating the last update sent to your app client and to the traditional storage tier. When a message is successfully written to disk or delivered to your phone, the corresponding pointer is advanced. When your phone (client) is offline, or there is a disk outage, the pointer stays in place while new messages are still enqueued and the other pointers advance. As a result, long disk-write latencies do not hinder the client's real-time communication, and we can keep the client and the traditional storage tier in sync at independent rates.
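Below is a rough, in-memory sketch of such a queue with independent consumer pointers, one for the disk tier and one for the client. The names UpdateQueue, drain, and ack are invented for illustration; a production queue would be persistent and maintained per user.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class UpdateQueue {
        private final List<String> updates = new ArrayList<>();       // index doubles as sequence id
        private final Map<String, Integer> pointers = new HashMap<>(); // consumer -> last acked id

        synchronized int enqueue(String update) {
            updates.add(update);
            return updates.size() - 1; // sequence id assigned to this update
        }

        // Everything a consumer ("disk" or "client") has not yet acknowledged.
        synchronized List<String> drain(String consumer) {
            int last = pointers.getOrDefault(consumer, -1);
            return new ArrayList<>(updates.subList(last + 1, updates.size()));
        }

        // Advance only after a successful write or delivery. An offline client
        // or a disk outage simply leaves its pointer in place; other consumers
        // keep advancing independently.
        synchronized void ack(String consumer, int seqId) {
            pointers.merge(consumer, seqId, Math::max);
        }
    }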

Following is the common sequence of operations using our brand new message sync protocol.


1. Our ordered queue contains 5 updates and has 2 pointers:

  • The disk pointer signals that traditional disk storage is up to date and has received the update with sequence id 104.
  • The app/client pointer indicates our app is offline and that the last update it received had sequence id 101.


2. Shagun sends me a new message, which is enqueued at the head of my queue and assigned sequence id 105.


3. Then our message sync protocol sends the new message to traditional disk storage for long-term persistence, and the disk pointer is advanced.



4. Some time later my phone comes online; the client/app pings the queue to activate the app pointer.



5. The message sync protocol sends all missing updates to the client/app, and the app pointer is advanced to indicate our app is up to date.
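The same walkthrough can be traced against the UpdateQueue sketch from above (sequence ids here start at 0 rather than 100, but the mechanics are identical):

    import java.util.List;

    public class SyncWalkthrough {
        public static void main(String[] args) {
            UpdateQueue queue = new UpdateQueue();

            // Step 1: five updates queued; disk has acked all of them, while
            // the offline client has only acked the first two.
            for (int i = 0; i < 5; i++) queue.ack("disk", queue.enqueue("update " + i));
            queue.ack("client", 1);

            // Steps 2-3: Shagun's message is enqueued, persisted, and disk-acked.
            int seq = queue.enqueue("message from Shagun");
            queue.ack("disk", seq);

            // Steps 4-5: the phone comes online, drains the updates it missed,
            // and the client pointer catches up.
            List<String> missed = queue.drain("client"); // "update 2".."update 4" plus Shagun's message
            queue.ack("client", seq);
            System.out.println(missed);
        }
    }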



Effectively, this queue-based model allows:
  • The most recent messages to be sent immediately to online apps and to the disk storage tier from the protocol's memory.
  • A week's worth of messages to be served by the queue's backing store in case of a disk outage or the app being offline for a while.
  • Older conversation history and full inbox snapshot fetches to be served from the traditional disk storage tier.

Tuesday, 25 October 2016

Design a messaging service, like WhatsApp



This entry will discuss a high-level design for a messaging service like WhatsApp. Following is the list of features our messaging service is going to support.

List of features:

1. At WhatsApp scale, let's say we need to handle around 10 billion message sends per day and 300 million users. Assuming each message on average has 160 characters, that implies 10B * 160 bytes = 1.6 TB of data per day, not including message metadata. If the messaging service provisions for 10 years, we are looking at 10 * 365 * 1.6 TB ≈ 6 petabytes.

2. Messaging service will support only 1:1 plain text conversations and should be extendable to add group conversations.

3. Messaging service will only handle messages less than 64 KB in size.

4. Messaging service will have low latency, high consistency and availability. Consistency has higher priority than availability.

Design Strategy :

The messaging service will expose the following APIs to clients.

1. SendStatus sendMessage(senderId, recipientIds, message, clientMessageId): This API will be idempotent: if the client retries a message, the message will not be added twice. One way to handle this is by generating a random timestamp-based ID on the client, which can be used to avoid the same message being sent repeatedly. Ordering of messages will be maintained: if I send message A and then message B, A should always appear before B. We can use a server-side timestamp-based approach to handle this. The parameters of the sendMessage API are self-descriptive.

Returns: SendStatus -> status of the sendMessage request, as an enum.
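As a sketch of the client side of this contract (all names here are illustrative, not a real WhatsApp API): the client generates the idempotency key once per logical message and reuses it on every retry, so the server can recognize and drop duplicates.

    import java.util.concurrent.ThreadLocalRandom;

    class ClientMessageId {
        // A random timestamp-based id: unique enough to dedupe retries, and
        // generated once per logical message, never per network attempt.
        static String next() {
            long ts = System.currentTimeMillis();
            long rand = ThreadLocalRandom.current().nextLong();
            return ts + "-" + Long.toHexString(rand);
        }
    }

    // Usage: generate once, then reuse the same id across retries, e.g.
    //   String id = ClientMessageId.next();
    //   while (sendMessage(me, recipients, text, id) == SendStatus.FAILED) { /* retry */ }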

2. ConversationList fetchConversation(userId, offset, messageCount, lastUpdatedTimeStamp):
This API will be used to fetch and show conversations in a thread. Think of this as the view you see when you open WhatsApp. For a user, we only want to fetch a few conversations per API call (lazy loading). The offset and messageCount parameters handle this.

In most cases, our API calls are made by users who are active on the messaging service. As such, they have already viewed conversations up to a certain timestamp and are only looking for updates after that timestamp. The lastUpdatedTimeStamp parameter handles clients that are data sensitive.

Returns: ConversationList {
             List<Conversation> conversations;
             boolean isDBupdate;
         }

         Conversation {
             int conversationId;
             List<UUID> participantIds;
             String snippet;
             long lastUpdatedTimeStamp;
         }
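A minimal in-memory sketch of how offset, messageCount, and lastUpdatedTimeStamp could drive the query (the stream pipeline and the isDBupdate handling are assumptions for illustration, not a prescribed implementation):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;
    import java.util.stream.Collectors;

    // Plain-Java versions of the structs declared above.
    class Conversation { int conversationId; List<UUID> participantIds; String snippet; long lastUpdatedTimeStamp; }
    class ConversationList { List<Conversation> conversations; boolean isDBupdate; }

    class ConversationService {
        private final List<Conversation> recentFirst = new ArrayList<>(); // newest first

        ConversationList fetchConversation(String userId, int offset, int messageCount,
                                           long lastUpdatedTimeStamp) {
            ConversationList result = new ConversationList();
            // Only conversations updated after the client's last-seen timestamp,
            // one page at a time (lazy loading).
            result.conversations = recentFirst.stream()
                    .filter(c -> c.lastUpdatedTimeStamp > lastUpdatedTimeStamp)
                    .skip(offset)
                    .limit(messageCount)
                    .collect(Collectors.toList());
            result.isDBupdate = !result.conversations.isEmpty();
            return result;
        }
    }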

Following is the high-level component diagram for the messaging system.

             

The client (mobile app, browser, etc.) calls the sendMessage API to write a message. The application server interprets the API call and asks the database to do the following (a sketch follows below):
  •  Record the server timestamp, to handle ordering of messages.
  •  Figure out the conversation to which the message should be appended, based on the participants.
  •  Check whether a recent message already exists with the same clientMessageId.
  •  Store the message in the database.
Similarly, to read conversations, the client calls the fetchConversation API. The application server interprets the API call and queries the database for the most recent conversations.
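Here is a sketch of that write path on the application server; the Database facade and its methods are invented purely to show the order of operations.

    import java.util.List;

    interface Database { // hypothetical facade over the real store
        long findOrCreateConversation(String senderId, List<String> recipientIds);
        boolean recentMessageExists(long conversationId, String clientMessageId);
        void storeMessage(long conversationId, long serverTimestamp,
                          String senderId, String message);
    }

    class SendMessageHandler {
        private final Database db;
        SendMessageHandler(Database db) { this.db = db; }

        void handle(String senderId, List<String> recipientIds,
                    String message, String clientMessageId) {
            long serverTimestamp = System.currentTimeMillis();                  // ordering
            long convId = db.findOrCreateConversation(senderId, recipientIds);  // resolve thread
            if (db.recentMessageExists(convId, clientMessageId)) {
                return; // retried send; the message is already stored
            }
            db.storeMessage(convId, serverTimestamp, senderId, message);        // persist
        }
    }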

Messaging is going to be very write heavy. Unlike photos or videos, which are written once and consumed many times by many clients, messages are written once and consumed by the participants once. For a write-heavy system with a lot of data, an RDBMS usually does not perform well: every write is not just an append to a table but also an update to multiple indices, which might require locking and hence interfere with reads and other writes. However, there are NoSQL databases like HBase and Cassandra where writes are cheaper.

With NoSQL, we need to store data in denormalized form. Every user has his/her own copy of the message box. That means for a 1:1 conversation we store two copies of every message, one for each participant.
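For 1:1 conversations, the denormalized write amounts to a fan-out of two, as in this toy in-memory stand-in for an HBase/Cassandra-style per-user mailbox:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class MailboxStore {
        // One mailbox (row/partition) per user, so reads touch a single partition.
        private final Map<String, List<String>> mailboxByUser = new HashMap<>();

        void write(String senderId, String recipientId, String message) {
            // Two copies for a 1:1 message: one in each participant's mailbox.
            for (String user : List.of(senderId, recipientId)) {
                mailboxByUser.computeIfAbsent(user, u -> new ArrayList<>()).add(message);
            }
        }
    }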

To increase efficiency we can use caching. This is, however, not as easy as it seems, because one of our features is high consistency. Most distributed caching systems are good with availability and are eventually consistent, but they do not offer strict consistency.

Consider the situation where a user's messages and conversations are spread across machines: changes are then no longer atomic. Say messages for a user are on one machine and conversations are on another. When a message is added, an update request is sent both to the server holding the messages and to the server holding the conversations for this user. There could be a period when one server has processed the update but the other has not. If changes across servers are not atomic, the whole system is not atomic anymore.

One way to resolve this is to make sure that the cache for a user resides entirely on one server. The same server can also host other users, but each user is assigned to exactly one server for caching. To further ensure consistency, all writes for that user should be directed through this server, and this server should update its cache only after the write has succeeded on the database, before confirming success.
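A sketch of that routing-plus-write-through idea follows; the hashing scheme and the CacheServer interface are assumptions for illustration (a real deployment would likely use consistent hashing so resharding does not remap every user).

    import java.util.List;

    interface CacheServer {
        void writeToDatabase(String userId, String message);
        void updateCache(String userId, String message);
    }

    class PinnedCacheRouter {
        private final List<CacheServer> servers;
        PinnedCacheRouter(List<CacheServer> servers) { this.servers = servers; }

        // Every read and write for a given user lands on the same server.
        CacheServer serverFor(String userId) {
            return servers.get(Math.floorMod(userId.hashCode(), servers.size()));
        }

        void write(String userId, String message) {
            CacheServer s = serverFor(userId);
            s.writeToDatabase(userId, message); // commit to the database first...
            s.updateCache(userId, message);     // ...then refresh this server's cache
        }
    }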

For a messaging service, every wasted byte has a very real impact on the experience of the application. By sending less data and reducing HTTPS fetches, the messaging service receives updates with lower latency and higher reliability. In a subsequent entry we will extend the design to further improve performance and data usage. Meanwhile, take care and enjoy learning!!