WhatsApp stands out among chat applications for its global popularity. This article examines its system design, exploring the key components and principles behind its reliability and seamless user experience. We consider the following key features of WhatsApp:
Messaging: Users can send text messages, emojis, stickers, and multimedia files such as photos, videos, and voice messages to individuals or groups.
Voice/Video Calling: Users can make voice and video calls over the internet, either one-on-one or in groups.
End-to-End Encryption: All communication within WhatsApp is encrypted end-to-end, ensuring privacy and security for users' messages and calls.
Delivery Confirmation and Read Receipts: Users receive confirmation when their messages are delivered to the recipient's device, and read receipts indicate when a message has been read by the recipient.
Group Management: Users can create, manage, and administer groups, including adding or removing participants and changing group settings.
Last Seen: The "Last Seen" feature lets users see when their contacts were last active, i.e. when they were last online or last used WhatsApp. It is visible to contacts who have each other's phone numbers saved on their devices and have enabled the "Last Seen" option in their WhatsApp settings.
Beyond these features, the system should meet the following non-functional requirements:
Responsiveness: The system should respond promptly to user actions such as sending messages, making calls, and loading conversations.
Throughput: The system should handle a large number of concurrent users and messages without significant degradation in performance.
Latency: Messages and calls should be delivered with minimal delay to provide a real-time communication experience.
Availability: The system should be highly available, with minimal downtime and service interruptions.
Fault Tolerance: The system should be resilient to failures, handling errors gracefully and maintaining uninterrupted service.
Data Integrity: Messages and user data should be stored securely and reliably, with safeguards against data loss or corruption.
End-to-End Encryption: Messages and calls should be encrypted end-to-end to prevent unauthorised access or interception.
Authentication and Authorisation: Strong authentication mechanisms should verify the identity of users and ensure that only authorised users can access the system and their data.
Data Privacy: User data should be protected from unauthorised access, both in transit and at rest, in compliance with privacy regulations.
Peak Message Rate:
Consider 2.5 billion messages/hour, i.e. 2.5 billion / 3,600 ≈ 694,444 messages/second.
Server Infrastructure:
Let's assume each server can handle 10,000 messages per second (a conservative estimate).
Number of Servers = Peak Message Rate / Messages per Second per Server
= 694,444 messages/second / 10,000 messages/second/server
≈ 70 servers.
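The server-count arithmetic is easy to check in code. The step that matters is converting the hourly rate into a per-second rate before dividing by per-server capacity; the constants below are the same rough assumptions used above, and the result is rounded up because a fraction of a server cannot be deployed.

```python
import math

# Rough assumptions from the estimate above.
PEAK_MESSAGES_PER_HOUR = 2_500_000_000
MESSAGES_PER_SECOND_PER_SERVER = 10_000


def servers_needed(messages_per_hour: int, per_server_per_second: int) -> int:
    """Servers required to absorb the peak rate, rounded up to whole servers."""
    messages_per_second = messages_per_hour / 3600  # convert hours to seconds first
    return math.ceil(messages_per_second / per_server_per_second)


rate = PEAK_MESSAGES_PER_HOUR / 3600
print(f"peak rate: {rate:,.0f} messages/second")  # peak rate: 694,444 messages/second
print(f"servers: {servers_needed(PEAK_MESSAGES_PER_HOUR, MESSAGES_PER_SECOND_PER_SERVER)}")  # servers: 70
```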
Database Requirements:
WhatsApp needs databases for storing user data, messages, media files, etc.
Assuming each message is 1 KB in size and that media files are exchanged at a lower rate, let's allocate 100 TB of storage for messages, with each database server holding 1 TB (1,000 GB).
Database Servers = Total Storage Required / Storage per Database Server
= 100,000 GB / 1,000 GB/server
≈ 100 database servers.
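The same back-of-the-envelope check for storage is sketched below. The `hours_until_full` helper is a hypothetical addition: it shows that at the assumed 2.5 billion 1 KB messages per hour (about 2.5 TB/hour), the 100 TB allocation holds roughly 40 hours of peak traffic, so the number of database servers depends heavily on how long messages stay on the server.

```python
import math

# Assumptions from the estimate above (decimal units: 1 KB = 1,000 bytes, 1 TB = 1,000 GB).
MESSAGE_SIZE_BYTES = 1_000
ALLOCATED_STORAGE_GB = 100_000        # 100 TB for messages
STORAGE_PER_DB_SERVER_GB = 1_000      # 1 TB per database server
PEAK_MESSAGES_PER_HOUR = 2_500_000_000


def db_servers_needed(total_gb: float, per_server_gb: float) -> int:
    """Database servers needed to hold total_gb, ignoring replication."""
    return math.ceil(total_gb / per_server_gb)


def hours_until_full(total_gb: float, messages_per_hour: float, message_bytes: float) -> float:
    """How long the allocation lasts if every message is kept at the peak rate."""
    ingest_gb_per_hour = messages_per_hour * message_bytes / 1e9
    return total_gb / ingest_gb_per_hour


print(db_servers_needed(ALLOCATED_STORAGE_GB, STORAGE_PER_DB_SERVER_GB))  # 100
print(hours_until_full(ALLOCATED_STORAGE_GB, PEAK_MESSAGES_PER_HOUR, MESSAGE_SIZE_BYTES))  # 40.0
```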
This back-of-the-envelope calculation provides a rough estimate of the infrastructure required for the WhatsApp system design. Actual requirements may vary based on factors like user behaviour, feature set, geographic distribution, and scalability considerations. Further detailed analysis and modelling would be necessary for precise system sizing and capacity planning.
Here, we focus solely on text messaging within WhatsApp. Media messaging is discussed in later sections of this article.
A chat application typically entails clients communicating with each other over a network, often employing a client-server architecture. TCP (Transmission Control Protocol) is frequently utilised as the underlying protocol for such communication due to its reliability and connection-oriented nature.
While the specifics of WhatsApp's implementation are proprietary and not publicly documented, it may use WebSockets or a custom protocol built on TCP for certain functionality, particularly real-time bidirectional communication. WebSockets keep a persistent, full-duplex connection open between client and server, which suits instant messaging applications like WhatsApp, where messages must be delivered and received quickly.
This article assumes that WhatsApp uses WebSockets rather than a custom protocol layered on top of TCP.
WebSocket Handler
A ‘WebSocket Handler’ is a lightweight server that maintains open connections with all active users. It manages the opening, upkeep, and closing of ‘WebSocket’ connections between clients (e.g., smartphones running the WhatsApp app) and the server infrastructure. Assuming a single WebSocket handler can hold connections for up to 60,000 users, the number of handlers must scale with the number of concurrently connected users.
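A minimal sizing helper for the handler fleet, assuming the 60,000-connections-per-handler figure above. The concurrent-user count in the usage lines and the `headroom` parameter are illustrative assumptions, not WhatsApp numbers.

```python
import math

CONNECTIONS_PER_HANDLER = 60_000  # per-handler capacity assumed in the text


def handlers_needed(concurrent_users: int, headroom: float = 0.0) -> int:
    """Number of WebSocket handlers needed to keep every user connected.

    `headroom` is a hypothetical spare-capacity fraction (0.25 means 25% extra),
    leaving room for users of a failed handler to reconnect elsewhere.
    """
    if concurrent_users <= 0:
        return 0
    return math.ceil(concurrent_users * (1 + headroom) / CONNECTIONS_PER_HANDLER)


print(handlers_needed(100_000_000))        # 1667
print(handlers_needed(100_000_000, 0.25))  # 2084
```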
WebSocket Manager
The WebSocket Manager is a repository of information about which WebSocket handler each user is connected to. It sits on top of a distributed cache (Redis) that stores two types of information:
Which WebSocket handler each user is connected to
Which users are connected to each WebSocket handler
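The two mappings can be sketched with plain Python dictionaries standing in for Redis; in Redis itself, a hash (`HSET`/`HGET`) for user-to-handler and one set per handler (`SADD`/`SREM`/`SMEMBERS`) would be one natural fit. The class and method names are hypothetical, and `handler_failed` illustrates one assumed reason to keep the reverse mapping: finding every user affected when a handler goes down.

```python
from collections import defaultdict
from typing import Optional


class WebSocketManager:
    """In-memory stand-in for the two Redis-backed mappings described above."""

    def __init__(self) -> None:
        self.user_to_handler: dict[str, str] = {}                              # user -> handler
        self.handler_to_users: defaultdict[str, set[str]] = defaultdict(set)  # handler -> users

    def connect(self, user_id: str, handler_id: str) -> None:
        # A reconnecting user may land on a different handler, so clear the old entry first.
        self.disconnect(user_id)
        self.user_to_handler[user_id] = handler_id
        self.handler_to_users[handler_id].add(user_id)

    def disconnect(self, user_id: str) -> None:
        handler_id = self.user_to_handler.pop(user_id, None)
        if handler_id is not None:
            self.handler_to_users[handler_id].discard(user_id)

    def handler_for(self, user_id: str) -> Optional[str]:
        """Where a message for user_id should be routed; None means the user is offline."""
        return self.user_to_handler.get(user_id)

    def users_on(self, handler_id: str) -> set[str]:
        return set(self.handler_to_users.get(handler_id, set()))

    def handler_failed(self, handler_id: str) -> set[str]:
        """Drop every user of a failed handler; their clients must reconnect elsewhere."""
        users = self.handler_to_users.pop(handler_id, set())
        for user_id in users:
            self.user_to_handler.pop(user_id, None)
        return users


manager = WebSocketManager()
manager.connect("alice", "ws-1")
manager.connect("bob", "ws-1")
manager.connect("alice", "ws-2")         # alice reconnects through another handler
print(manager.handler_for("alice"))      # ws-2
print(sorted(manager.users_on("ws-1")))  # ['bob']
```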
Message Service
The Message Service is a repository of all messages in the system. It exposes APIs to fetch messages by various filters, such as user ID, message ID, and delivery status, and stores its data in a distributed NoSQL database (Cassandra). New users join the system every day, and all users, old and new, keep starting new conversations, so the Message Service must be built on a data store that can handle ever-increasing data. At the same time, the set of queries it has to support is finite, because it exposes a fixed set of APIs to the WebSocket handlers. A store that scales horizontally and is modelled around a known set of query patterns is exactly what a distributed NoSQL database such as Cassandra is best suited for.
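The query pattern suited to Cassandra can be illustrated with an in-memory model. This sketch assumes a hypothetical table partitioned by recipient user ID and ordered within each partition by `(sent_at, message_id)`, much like clustering columns, so that fetching a user's messages reads one already-sorted partition; the schema and API names are illustrative, not WhatsApp's.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Message:
    # Ordering uses only (sent_at, message_id), like clustering columns in a partition.
    sent_at: int                                  # epoch milliseconds
    message_id: str
    recipient_id: str = field(compare=False)      # partition key
    sender_id: str = field(compare=False)
    ciphertext: bytes = field(compare=False)      # the server only ever sees encrypted bytes
    status: str = field(default="sent", compare=False)


class MessageStore:
    """Hypothetical in-memory model of a partitioned message table."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[Message]] = {}  # recipient_id -> sorted messages
        self.by_id: dict[str, Message] = {}             # lookup table for get_by_id

    def put(self, msg: Message) -> None:
        bisect.insort(self.partitions.setdefault(msg.recipient_id, []), msg)
        self.by_id[msg.message_id] = msg

    def get_by_id(self, message_id: str) -> Optional[Message]:
        return self.by_id.get(message_id)

    def get_for_user(self, user_id: str, since: int = 0) -> list[Message]:
        # Reads touch a single partition, already sorted by time.
        return [m for m in self.partitions.get(user_id, []) if m.sent_at >= since]

    def get_by_status(self, user_id: str, status: str) -> list[Message]:
        return [m for m in self.partitions.get(user_id, []) if m.status == status]


store = MessageStore()
store.put(Message(2_000, "m2", "bob", "alice", b"\x01"))
store.put(Message(1_000, "m1", "bob", "alice", b"\x02"))
store.put(Message(3_000, "m3", "carol", "alice", b"\x03"))
print([m.message_id for m in store.get_for_user("bob")])  # ['m1', 'm2']
```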
Text messages are not the only content shared on WhatsApp; users also share media such as images, files, and videos. When such content is sent, it is compressed and encrypted on the sender's device, and only the encrypted content is sent to the receiver. The receiver likewise gets the content in encrypted form and decrypts it on their own device.
Suppose user A is sending an image to user B. This happens in two steps:
User A uploads the image to a server over HTTP and gets back an image ID (a download URL or hash).
User A then sends the image ID to user B, who uses it to locate and download the image from the server.
The image is compressed and encrypted on the device before being sent to an Asset Service over HTTP. The Asset Service stores the content in blob storage (S3). Depending on traffic patterns, the content may also be pushed from blob storage (S3) to a CDN. Once the image is stored in blob storage (S3), the Asset Service passes the image ID to the Message Service, which delivers it to the recipient as a message.
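The upload-then-send-the-ID flow can be sketched as follows. A dictionary stands in for blob storage (S3), a list stands in for the Message Service, and `zlib` is used as a generic lossless stand-in for the device-side compression step. The standard library has no block cipher, so encryption is a pluggable hook that defaults to a no-op placeholder; all names here are hypothetical.

```python
import hashlib
import zlib
from typing import Callable

Cipher = Callable[[bytes], bytes]


def identity(data: bytes) -> bytes:
    # Placeholder only: a real client would plug in an authenticated cipher here.
    return data


class AssetService:
    """Hypothetical Asset Service: a dict stands in for blob storage (S3)."""

    def __init__(self, message_outbox: list) -> None:
        self.blob_store: dict[str, bytes] = {}
        self.message_outbox = message_outbox  # stand-in for the Message Service

    def upload(self, sender: str, recipient: str, blob: bytes) -> str:
        image_id = hashlib.sha256(blob).hexdigest()  # content hash doubles as the image ID
        self.blob_store[image_id] = blob
        # Only the ID goes to the Message Service; the bytes stay in blob storage.
        self.message_outbox.append({"from": sender, "to": recipient, "image_id": image_id})
        return image_id

    def download(self, image_id: str) -> bytes:
        return self.blob_store[image_id]


def prepare_on_device(image: bytes, encrypt: Cipher = identity) -> bytes:
    return encrypt(zlib.compress(image, level=9))  # compress first, then encrypt


def open_on_device(blob: bytes, decrypt: Cipher = identity) -> bytes:
    return zlib.decompress(decrypt(blob))  # decrypt first, then decompress


outbox: list = []
assets = AssetService(outbox)
photo = b"\x89PNG fake image bytes " * 50
image_id = assets.upload("A", "B", prepare_on_device(photo))
received = open_on_device(assets.download(outbox[0]["image_id"]))
print(received == photo, len(image_id))  # True 64
```

Compressing before encrypting is deliberate: well-encrypted data looks random and no longer compresses.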
Thus far, we have covered messaging over WebSockets and other communication over HTTP. For voice and video calls, UDP is preferred because of its lower latency, lower overhead, and absence of automatic retransmission for lost packets. In a live call, a packet that arrives late is useless, so it is better to skip it than to stall the stream waiting for a retransmission. UDP's simpler communication model therefore suits real-time applications, where timely delivery matters more than guaranteed delivery, better than TCP.
Connector
UDP is a connectionless protocol: there is no handshake, so the only thing two parties need in order to exchange datagrams is each other's public IP address and port. So when we say “establishing a UDP connection”, we mean sharing the information needed to set up this exchange, i.e. each other's public IP address. This is where a connector comes in. A connector's job is to tell a user their public IP address. ‘User A’ gets its public IP from the connector, ‘user B’ gets its public IP from the connector, and they share this information via the ‘WebSocket Handler’. Once they have each other's public IP addresses, they can start the audio/video call.
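The connector's job resembles a simplified STUN binding request: the client sends a datagram, and the server replies with the source address it observed. The sketch below runs entirely on the loopback interface, so there is no NAT and the observed address equals the local one; the connector logic runs in the same process only to keep the example self-contained, and the function names are hypothetical.

```python
import socket


def make_udp_socket() -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # let the OS pick a free port
    sock.settimeout(2.0)         # avoid blocking forever if a datagram is lost
    return sock


def connector_serve_once(connector: socket.socket) -> None:
    """Reply to one request with the source address the connector observed."""
    _, (ip, port) = connector.recvfrom(1024)
    connector.sendto(f"{ip}:{port}".encode(), (ip, port))


def discover_public_address(client: socket.socket, connector: socket.socket) -> tuple[str, int]:
    client.sendto(b"whoami", connector.getsockname())
    connector_serve_once(connector)  # in reality this runs on a separate server
    reply, _ = client.recvfrom(1024)
    ip, port = reply.decode().rsplit(":", 1)
    return ip, int(port)


connector = make_udp_socket()
alice, bob = make_udp_socket(), make_udp_socket()
alice_addr = discover_public_address(alice, connector)
bob_addr = discover_public_address(bob, connector)
# The two addresses would now be exchanged over the WebSocket Handler (not shown).
alice.sendto(b"audio frame 1", bob_addr)  # no handshake: just send to the address
payload, sender = bob.recvfrom(1024)
print(payload, sender == alice_addr)      # b'audio frame 1' True
for sock in (connector, alice, bob):
    sock.close()
```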
Signaling Server
The ‘Signaling Server’ facilitates the initial connection between peers. Through the ‘WebSocket Handler’, it helps them exchange the information needed to set up a communication session, such as network addresses, session control messages, and media capabilities.
Before starting a media session, peers need to negotiate parameters such as codecs, resolutions, and network configurations. The ‘Signaling Server’ helps coordinate this negotiation, ensuring that both parties agree on compatible settings.
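One way to picture the negotiation step, loosely modelled on an offer/answer exchange: each peer advertises its capabilities, and the session uses the first codec in the caller's preference list that the callee also supports, at the smaller of the two maximum resolutions. The codec names, the tie-breaking rule, and the data layout are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    codecs: list[str]                # most preferred first
    max_resolution: tuple[int, int]  # (width, height)


def negotiate(offer: Capabilities, answer: Capabilities) -> dict:
    """Pick session parameters both peers support.

    The caller's preference order wins among common codecs, and the smaller of
    the two maximum resolutions (by pixel count) is used.
    """
    common = [codec for codec in offer.codecs if codec in answer.codecs]
    if not common:
        raise ValueError("no codec in common; the call cannot be set up")
    resolution = min(offer.max_resolution, answer.max_resolution,
                     key=lambda wh: wh[0] * wh[1])
    return {"codec": common[0], "resolution": resolution}


caller = Capabilities(codecs=["opus", "aac"], max_resolution=(1280, 720))
callee = Capabilities(codecs=["aac", "opus"], max_resolution=(640, 480))
print(negotiate(caller, callee))  # {'codec': 'opus', 'resolution': (640, 480)}
```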
Call Server
The ‘Call Server’ acts as an intermediary between users in audio and video calls when peer-to-peer data transfer is not possible because of network configurations such as firewalls, NAT (Network Address Translation), or restrictive network policies.
NAT devices, common in home and corporate networks, translate private IP addresses into public ones, which often prevents one device from reaching another directly. When a direct peer-to-peer connection fails because of NAT, the ‘Call Server’ relays data between the peers, allowing communication to bypass the NAT restrictions.
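The relaying decision at the core of a call server can be sketched independently of the network transport, in the spirit of TURN (RFC 8656). The hypothetical `CallRelay` class below only decides where each packet should go; a real relay would also authenticate participants and move the packets over UDP sockets.

```python
from typing import Optional

Address = tuple[str, int]


class CallRelay:
    """Hypothetical relay: forwards each packet from one call participant to the other."""

    def __init__(self) -> None:
        self.peer_of: dict[Address, Address] = {}

    def open_session(self, a: Address, b: Address) -> None:
        self.peer_of[a] = b
        self.peer_of[b] = a

    def close_session(self, addr: Address) -> None:
        other = self.peer_of.pop(addr, None)
        if other is not None:
            self.peer_of.pop(other, None)

    def forward(self, source: Address, packet: bytes) -> Optional[tuple[Address, bytes]]:
        """Return where to send the packet, or None to drop it."""
        dest = self.peer_of.get(source)
        if dest is None:
            return None  # only relay for participants of an open session
        return dest, packet


relay = CallRelay()
alice = ("198.51.100.7", 50000)  # public addresses as the relay sees them (documentation ranges)
bob = ("203.0.113.9", 61000)
relay.open_session(alice, bob)
print(relay.forward(alice, b"frame"))                # (('203.0.113.9', 61000), b'frame')
print(relay.forward(("192.0.2.1", 1234), b"spoof"))  # None
```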
Transcoding Service
Media files (such as images, videos, and audio) shared on WhatsApp need to be compatible with various devices and network conditions. The transcoding service ensures that media files are converted into formats that are supported by different devices and can be efficiently streamed over varying network speeds.
Transcoding allows WhatsApp to compress media files to reduce their size while maintaining acceptable quality. This is crucial for efficient storage and transmission, especially in regions with limited internet bandwidth or users with older devices.
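Transcoding typically yields several renditions of the same media at different resolutions and bitrates. As an illustration of the point about varying network speeds, the sketch below picks which rendition to send given the device's screen height and measured bandwidth; the ladder values and the 80% safety margin are hypothetical.

```python
# Hypothetical rendition ladder produced by transcoding: (label, height in pixels, bitrate in kbps).
RENDITIONS = [
    ("high", 720, 1500),
    ("medium", 480, 800),
    ("low", 360, 400),
    ("very-low", 240, 200),
]


def pick_rendition(bandwidth_kbps: float, max_height: int, safety: float = 0.8) -> str:
    """Choose the best rendition the device can display and the link can carry.

    Only `safety` of the measured bandwidth is budgeted, to absorb fluctuations.
    """
    budget = bandwidth_kbps * safety
    for label, height, kbps in RENDITIONS:  # ordered best first
        if height <= max_height and kbps <= budget:
            return label
    return RENDITIONS[-1][0]  # nothing fits: fall back to the smallest rendition


print(pick_rendition(5_000, 1080))  # high
print(pick_rendition(1_100, 1080))  # medium
print(pick_rendition(5_000, 360))   # low
print(pick_rendition(100, 1080))    # very-low
```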
WhatsApp's end-to-end encryption (E2EE) is a sophisticated system that ensures only the sender and recipient can read messages, and nobody in between, not even WhatsApp itself. The WhatsApp application running on client devices is responsible for encrypting outgoing messages and decrypting incoming messages.
WhatsApp's E2EE is based on the Signal Protocol, developed by Open Whisper Systems. This protocol ensures that messages remain encrypted from the sender's device until they reach the recipient's device. It uses a combination of symmetric and asymmetric encryption techniques, including key exchange using Diffie-Hellman key agreement protocol.
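The Diffie-Hellman idea behind the key exchange fits in a few lines: both sides derive the same secret from their own private key and the other's public key, without ever sending a private key. This is a toy finite-field version with a deliberately small prime; the Signal Protocol uses elliptic-curve Diffie-Hellman (Curve25519), so treat this only as an illustration of the algebra.

```python
import secrets

# Toy group: 2**127 - 1 is a Mersenne prime. Far too small for real security.
P = 2**127 - 1
G = 3


def generate_keypair() -> tuple[int, int]:
    private = secrets.randbelow(P - 2) + 1  # secret exponent in [1, P - 2]
    public = pow(G, private, P)             # safe to publish
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    return pow(their_public, my_private, P)


alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()
# Each side combines its own private key with the other's public key: (G^a)^b == (G^b)^a mod P.
print(shared_secret(alice_private, bob_public) == shared_secret(bob_private, alice_public))  # True
```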
How does it work?
Key Pair Generation:
Each user generates a pair of cryptographic keys: a public key and a private key. The public key is shared with others, while the private key is kept secret and stored securely on the user's device. Only the public key is stored on WhatsApp's signaling server, from which other users can fetch it; the private key never leaves the device.
Initiating a Chat:
When a user initiates a chat or conversation with another user, their device generates a unique session key for that specific conversation.
Encryption of Session Key:
The sender's device encrypts the session key with the recipient's public key. This ensures that only the recipient's device can decrypt the session key.
Sending Encrypted Session Key:
The sender's device sends the encrypted session key to the recipient's device.
Session Key Decryption:
Upon receiving the encrypted session key, the recipient's device decrypts it using its private key, revealing the session key.
Message Encryption and Decryption:
Once both parties have the session key, they can use it to encrypt and decrypt messages exchanged in the conversation. Each message is encrypted with the session key before being sent and decrypted using the same session key upon receipt.
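The steps above can be traced end to end in a toy sketch. Because the standard library has no public-key encryption, the "encrypt the session key with the recipient's public key" step is done here with an ephemeral Diffie-Hellman key wrap (a DHIES-style construction), and messages are encrypted with an HMAC-SHA256 counter-mode keystream plus an HMAC tag. The prime is far too small and none of this is production cryptography; it only mirrors the order of operations described, not WhatsApp's actual ciphers.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy Diffie-Hellman group (Mersenne prime); NOT secure
G = 3


def keypair() -> tuple[int, int]:
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def kek_from(shared: int) -> bytes:
    """Key-encryption key derived from a Diffie-Hellman shared value."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def wrap_session_key(session_key: bytes, recipient_public: int) -> tuple[int, bytes]:
    """Encrypt the session key so only the recipient's private key can recover it."""
    eph_private, eph_public = keypair()
    kek = kek_from(pow(recipient_public, eph_private, P))
    return eph_public, xor(session_key, kek)


def unwrap_session_key(eph_public: int, wrapped: bytes, recipient_private: int) -> bytes:
    kek = kek_from(pow(eph_public, recipient_private, P))
    return xor(wrapped, kek)


def _subkeys(session_key: bytes) -> tuple[bytes, bytes]:
    # Separate keys for encryption and authentication.
    enc = hmac.new(session_key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(session_key, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = [hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
              for counter in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def encrypt_message(session_key: bytes, plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    enc, mac = _subkeys(session_key)
    nonce = secrets.token_bytes(16)
    ciphertext = xor(plaintext, _keystream(enc, nonce, len(plaintext)))
    tag = hmac.new(mac, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def decrypt_message(session_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    enc, mac = _subkeys(session_key)
    expected = hmac.new(mac, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("message was tampered with")
    return xor(ciphertext, _keystream(enc, nonce, len(ciphertext)))


# Key pair generation: the recipient publishes bob_public.
bob_private, bob_public = keypair()
# Initiating a chat: the sender creates a session key, wraps it with Bob's public key, and sends it.
session_key = secrets.token_bytes(32)
eph_public, wrapped = wrap_session_key(session_key, bob_public)
# Session key decryption: Bob unwraps it with his private key.
bob_session_key = unwrap_session_key(eph_public, wrapped, bob_private)
# Message encryption and decryption with the shared session key.
nonce, ct, tag = encrypt_message(session_key, b"hello Bob")
print(decrypt_message(bob_session_key, nonce, ct, tag))  # b'hello Bob'
```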
Forward Secrecy:
WhatsApp implements forward secrecy: rather than reusing one key for the whole conversation, a fresh key is derived for every message, and used keys are discarded. Even if an attacker somehow obtained one of these keys, it would not compromise the security of past or future messages.
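The per-message key idea can be illustrated with a KDF chain, modelled on the symmetric-key ratchet in the Signal Protocol's Double Ratchet specification, which suggests HMAC-SHA256 keyed with the chain key and separate constant inputs (such as 0x01 and 0x02) for the message key and the next chain key. This sketch omits the Diffie-Hellman ratchet that the full protocol also uses, and the starting secret is a placeholder.

```python
import hashlib
import hmac


def chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """One symmetric-ratchet step: derive (message_key, next_chain_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


class Chain:
    """Sender and receiver each keep one, starting from the same shared chain key."""

    def __init__(self, chain_key: bytes) -> None:
        self._chain_key = chain_key

    def next_message_key(self) -> bytes:
        message_key, self._chain_key = chain_step(self._chain_key)
        # The previous chain key is overwritten, so earlier message keys cannot be
        # recomputed from the current state: HMAC cannot be run backwards.
        return message_key


shared = hashlib.sha256(b"stand-in for a DH-derived secret").digest()
alice, bob = Chain(shared), Chain(shared)
alice_keys = [alice.next_message_key() for _ in range(3)]
bob_keys = [bob.next_message_key() for _ in range(3)]
print(alice_keys == bob_keys, len(set(alice_keys)))  # True 3
```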
End-to-End Protection:
Throughout the conversation, messages are encrypted on the sender's device and decrypted on the recipient's device. This ensures that only the sender and recipient can read the messages, as the messages are encrypted and decrypted locally on their respective devices.
Explanation:
This sequence demonstrates the key exchange and message encryption process in an E2EE system, ensuring secure communication between sender and receiver.
The WhatsApp server implements delivery confirmations and read receipts through the ‘Message Service’ together with the ‘WebSocket Handler’.
Message Processing Flow:
When a user sends a message, it is encrypted on their device and sent to WhatsApp's servers.
The server receives the encrypted message and, without decrypting it (it does not hold the keys), stores it via the Message Service and routes it toward the recipient's WebSocket Handler.
The server forwards the still-encrypted message to the recipient's device.
Upon successful delivery, the server updates the message status and notifies the sender's device.
When the recipient opens the chat and views the message, their device sends a read receipt to WhatsApp's servers.
The server updates the message status to "read" and notifies the sender's device accordingly.
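The status bookkeeping behind these steps can be sketched as a small state machine. It assumes, hypothetically, that statuses only move forward (sent, then delivered, then read) so duplicate or late receipts are ignored, and that the callback stands in for a push through the sender's WebSocket Handler.

```python
from enum import IntEnum
from typing import Callable


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptTracker:
    """Hypothetical status tracker inside the Message Service."""

    def __init__(self, notify_sender: Callable[[str, str, Status], None]) -> None:
        self.messages: dict[str, tuple[str, Status]] = {}  # message_id -> (sender_id, status)
        self.notify_sender = notify_sender                 # e.g. a push via the sender's handler

    def on_sent(self, message_id: str, sender_id: str) -> None:
        self.messages[message_id] = (sender_id, Status.SENT)

    def update(self, message_id: str, new_status: Status) -> bool:
        sender_id, current = self.messages[message_id]
        if new_status <= current:
            return False  # duplicate or late receipt: status only moves forward
        self.messages[message_id] = (sender_id, new_status)
        self.notify_sender(sender_id, message_id, new_status)
        return True


events = []
tracker = ReceiptTracker(lambda sender, msg, status: events.append((sender, msg, status.name)))
tracker.on_sent("m1", "alice")
tracker.update("m1", Status.DELIVERED)
tracker.update("m1", Status.READ)
tracker.update("m1", Status.DELIVERED)  # arrives late, ignored
print(events)  # [('alice', 'm1', 'DELIVERED'), ('alice', 'm1', 'READ')]
```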
User Service
The User Service stores user-related information such as name, ID, profile picture, and preferences, usually in a SQL database (MySQL). This information is also cached in a distributed cache (Redis).
Group Service
The Group Service maintains all group-related information, such as which users belong to which group, user IDs, group IDs, when the group was created, when each user was added, status, and group icon. This data is also stored in a SQL database (MySQL) with multiple read replicas in different geographical locations to reduce latency, and it is cached in a distributed cache (Redis). Usually, the User Service and Group Service first look up data in the distributed cache (Redis) and query a MySQL read replica only when the data is not found in the cache.
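The lookup order described here is the cache-aside pattern. The sketch below uses a dictionary as the cache and a caller-supplied function as the MySQL replica read; the time-to-live and the injectable clock are illustrative assumptions, added so stale entries eventually expire and the behaviour can be tested.

```python
import time
from typing import Callable, Optional


class CacheAside:
    """Look in the cache first; on a miss, read a replica and populate the cache."""

    def __init__(self, load_from_replica: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_replica = load_from_replica
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.cache: dict[str, tuple[float, dict]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[dict]:
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                   # cache hit
        value = self.load_from_replica(key)   # cache miss or expired entry
        if value is not None:
            self.cache[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write to MySQL so the next read refetches fresh data."""
        self.cache.pop(key, None)


replica_reads = []


def read_group(group_id: str) -> dict:
    replica_reads.append(group_id)
    return {"group_id": group_id, "members": ["alice", "bob"]}


fake_now = [0.0]
groups = CacheAside(read_group, ttl_seconds=60, clock=lambda: fake_now[0])
groups.get("g1")
groups.get("g1")        # served from the cache
fake_now[0] = 61.0
groups.get("g1")        # entry expired, so the replica is read again
print(replica_reads)    # ['g1', 'g1']
```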
All user-triggered events are captured by analytics and passed to the Last Seen Service, which tracks the last time each user was active. This data may be stored in either a distributed cache (Redis) or a NoSQL database (Cassandra), keyed by user ID with the corresponding last-seen timestamp. Whenever a user opens the app, it queries the Last Seen Service for the last-seen information of the users in their contact list.
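A minimal in-memory sketch of the Last Seen Service, keyed by user ID as described. The visibility check encodes one reading of the rule stated at the start of the article: both users have saved each other as contacts and neither has turned off "Last Seen"; that interpretation and all names are assumptions.

```python
from typing import Optional


class LastSeenService:
    """Hypothetical in-memory Last Seen store keyed by user ID."""

    def __init__(self) -> None:
        self.last_seen: dict[str, int] = {}  # user_id -> epoch seconds
        self.contacts: dict[str, set[str]] = {}
        self.sharing_enabled: dict[str, bool] = {}

    def record_activity(self, user_id: str, timestamp: int) -> None:
        # Events may arrive out of order, so keep only the latest timestamp.
        if timestamp > self.last_seen.get(user_id, -1):
            self.last_seen[user_id] = timestamp

    def _can_see(self, viewer: str, target: str) -> bool:
        mutual = (target in self.contacts.get(viewer, set())
                  and viewer in self.contacts.get(target, set()))
        both_enabled = (self.sharing_enabled.get(viewer, True)
                        and self.sharing_enabled.get(target, True))
        return mutual and both_enabled

    def query(self, viewer: str) -> dict[str, Optional[int]]:
        """Last-seen times of the viewer's contacts, hiding those the viewer may not see."""
        return {c: self.last_seen.get(c)
                for c in sorted(self.contacts.get(viewer, set()))
                if self._can_see(viewer, c)}


svc = LastSeenService()
svc.contacts = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}
svc.record_activity("bob", 1_700)
svc.record_activity("bob", 1_650)  # older event arriving late, ignored
svc.record_activity("carol", 1_800)
print(svc.query("alice"))          # {'bob': 1700}
```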
A comprehensive system design for WhatsApp must prioritise scalability, reliability, and privacy. By employing a distributed architecture, WhatsApp can handle the immense volume of messages and users while maintaining a seamless user experience. Partitioning user data across multiple servers, employing load balancing techniques, and utilising efficient data replication mechanisms ensure high availability and fault tolerance. End-to-end encryption safeguards user privacy and data integrity. Additionally, optimising network protocols and leveraging caching mechanisms can enhance performance and reduce latency. A well-designed system for WhatsApp should continually evolve to adapt to changing user demands and technological advancements, ensuring a robust and secure messaging platform for billions of users worldwide.