System Design Interview: Real-Time Chat System

February 14, 2026 · By CTO · 15 min read

Complete system design interview question for building a real-time chat application like Slack or WhatsApp, covering WebSockets, message delivery guarantees, and presence systems.

Role: Senior Engineer / Staff Engineer
Level: Senior
Type: System Design

Design a real-time chat application similar to Slack or WhatsApp. This question tests understanding of real-time communication, message delivery guarantees, presence systems, and data modeling at scale.

Interview Format (45 minutes)

Time Allocation:

  • Requirements gathering: 5-8 minutes
  • High-level design: 10-15 minutes
  • Deep dive: 15-20 minutes
  • Scale and edge cases: 5-10 minutes

Step 1: Requirements Gathering (5-8 min)

A strong candidate will clarify the scope before designing anything.

Functional Requirements

Good questions to ask:

  • Is this 1:1 chat, group chat, or both? (both, groups up to 500 members)
  • Do we need message history/persistence? (yes, searchable)
  • What message types? (text, images, files)
  • Do we need read receipts? (yes)
  • Online/offline presence? (yes)
  • Push notifications? (yes, for offline users)
  • Message editing/deletion? (yes)

Agreed requirements:

  1. 1:1 and group messaging (up to 500 members)
  2. Real-time message delivery
  3. Persistent message history with search
  4. Read receipts and typing indicators
  5. Online/offline presence
  6. Push notifications for offline users
  7. Image and file sharing
  8. Message editing and deletion

Non-Functional Requirements

Good questions to ask:

  • Expected user base? (50M DAU)
  • Messages per day? (1B messages/day)
  • Message size limit? (64KB text, 100MB files)
  • Latency requirements? (<200ms delivery)
  • Geographic distribution? (global)
  • Message retention? (forever for paid, 90 days for free)

Agreed requirements:

  • Low latency (<200ms for message delivery)
  • High availability (99.99% uptime)
  • Message ordering guaranteed within a conversation
  • At-least-once delivery (with deduplication)
  • End-to-end encryption (stretch goal)

Calculations

Messages:

50M DAU, average 20 messages/day = 1B messages/day
Average message size: 200 bytes
1B x 200 bytes = 200GB/day = 73TB/year

Connections:

50M concurrent WebSocket connections (worst-case peak: every daily active user online at once)
Each connection: ~10KB memory
50M x 10KB = 500GB RAM for connections alone

QPS:

1B messages/day = ~12,000 messages/sec average
Peak (3x): ~36,000 messages/sec
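
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures agreed in the requirements; the constant names are illustrative, and decimal units (1KB = 1,000 bytes) are assumed. Note that the unrounded peak is closer to 35,000 messages/sec; the ~36,000 figure comes from tripling the rounded average.

```python
# Back-of-envelope capacity estimates from the agreed requirements.
DAU = 50_000_000
MESSAGES_PER_USER_PER_DAY = 20
AVG_MESSAGE_BYTES = 200
BYTES_PER_CONNECTION = 10_000  # ~10KB per WebSocket connection (decimal units)
PEAK_FACTOR = 3
SECONDS_PER_DAY = 86_400

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY
storage_gb_per_day = messages_per_day * AVG_MESSAGE_BYTES / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1e3
connection_ram_gb = DAU * BYTES_PER_CONNECTION / 1e9
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
peak_messages_per_sec = avg_messages_per_sec * PEAK_FACTOR

print(f"messages/day:      {messages_per_day:,}")
print(f"storage:           {storage_gb_per_day:.0f} GB/day, {storage_tb_per_year:.0f} TB/year")
print(f"connection memory: {connection_ram_gb:.0f} GB")
print(f"throughput:        {avg_messages_per_sec:,.0f}/s average, {peak_messages_per_sec:,.0f}/s peak")
```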

Red flags if candidate:

  • Designs only for HTTP polling
  • Doesn't consider message ordering
  • Ignores offline scenarios
  • Doesn't ask about group size limits

Step 2: High-Level Design (10-15 min)

API Design

WebSocket Connection:

wss://chat.example.com/ws?token=<auth_token>

// Client -> Server
{
  "type": "send_message",
  "conversationId": "conv_123",
  "content": "Hello!",
  "clientMessageId": "client_uuid_456"  // for deduplication
}

// Server -> Client
{
  "type": "new_message",
  "messageId": "msg_789",
  "conversationId": "conv_123",
  "senderId": "user_001",
  "content": "Hello!",
  "timestamp": "2026-02-14T10:30:00Z"
}

REST APIs (for non-real-time operations):

GET  /api/conversations                    # List conversations
GET  /api/conversations/:id/messages       # Message history (paginated)
POST /api/conversations                    # Create conversation/group
POST /api/conversations/:id/messages       # Send message (fallback)
PUT  /api/messages/:id                     # Edit message
DELETE /api/messages/:id                   # Delete message
POST /api/upload                           # Upload file/image

Good candidate discusses:

  • WebSocket vs SSE vs long polling trade-offs
  • REST fallback for reliability
  • Client-generated message IDs for deduplication
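
To make the client-to-server frame concrete, here is a minimal sketch of how a gateway might validate a `send_message` frame before handing it to the chat service. The `SendMessage` dataclass and `parse_send_message` function are hypothetical names introduced for illustration; the field names and the 64KB text limit come from the API and requirements above.

```python
import json
from dataclasses import dataclass

MAX_TEXT_BYTES = 64 * 1024  # 64KB text limit from the requirements


@dataclass(frozen=True)
class SendMessage:
    conversation_id: str
    content: str
    client_message_id: str  # client-generated, used for deduplication


def parse_send_message(raw: str) -> SendMessage:
    """Validate a client -> server frame and return a typed message."""
    frame = json.loads(raw)
    if not isinstance(frame, dict) or frame.get("type") != "send_message":
        raise ValueError("expected a send_message frame")
    for field in ("conversationId", "content", "clientMessageId"):
        value = frame.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or invalid field: {field}")
    if len(frame["content"].encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("content exceeds the 64KB text limit")
    return SendMessage(
        conversation_id=frame["conversationId"],
        content=frame["content"],
        client_message_id=frame["clientMessageId"],
    )
```

Rejecting malformed frames at the gateway keeps the chat service free of input-shape checks, and the same validation can back the REST fallback endpoint.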

Core Components

┌───────────────┐
│    Clients    │
└───────┬───────┘
        │ WSS
┌───────▼───────────────────────────────────┐
│          WebSocket Gateway                │
│  (Connection management, routing)         │
└───────┬───────────────┬───────────────────┘
        │               │
┌───────▼───────┐ ┌─────▼──────────────┐
│  Chat Service │ │  Presence Service  │
│  (Messages)   │ │  (Online status)   │
└───────┬───────┘ └─────┬──────────────┘
        │               │
┌───────▼───────┐ ┌─────▼──────────────┐
│  Message DB   │ │  Redis Cluster     │
│  (Cassandra)  │ │ (Presence, Pub/Sub)│
└───────────────┘ └────────────────────┘

Data Model

Messages (Cassandra / DynamoDB):

sql
-- Partition by conversation, sorted by time
CREATE TABLE messages (
    conversation_id UUID,
    message_id TIMEUUID,
    sender_id UUID,
    content TEXT,
    content_type TEXT,       -- 'text', 'image', 'file'
    media_url TEXT,
    created_at TIMESTAMP,
    edited_at TIMESTAMP,
    deleted BOOLEAN,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Conversations (PostgreSQL):

sql
CREATE TABLE conversations (
    id UUID PRIMARY KEY,
    type VARCHAR(10),         -- 'direct', 'group'
    name VARCHAR(255),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversation_members (
    conversation_id UUID REFERENCES conversations(id),
    user_id UUID,
    role VARCHAR(20) DEFAULT 'member',
    joined_at TIMESTAMP,
    last_read_message_id UUID,
    PRIMARY KEY (conversation_id, user_id)
);

CREATE INDEX idx_user_conversations
    ON conversation_members(user_id);

Step 3: Deep Dive (15-20 min)

Message Delivery Flow

Sender -> WebSocket Gateway -> Chat Service -> Message DB
                                    |
                                    v
                              Message Queue
                                    |
                    ┌───────────────┼───────────────┐
                    v               v               v
              WS Gateway      WS Gateway      Push Service
              (User A)        (User B)        (Offline Users)

Implementation:

python
class ChatService:
    def handle_message(self, sender_id, conversation_id, content, client_msg_id):
        # 1. Deduplication check, scoped to the sender so that two clients
        #    cannot collide on the same client-generated ID
        existing = self.message_store.find_by_client_id(sender_id, client_msg_id)
        if existing:
            # Retry of a message already stored: repeat the original ack
            # so the client stops retrying
            return {"status": "sent", "messageId": existing.id}

        # 2. Validate sender is member of conversation
        if not self.is_member(sender_id, conversation_id):
            raise PermissionError("Not a member")

        # 3. Store message
        message = self.message_store.create(
            conversation_id=conversation_id,
            sender_id=sender_id,
            content=content,
            client_message_id=client_msg_id
        )

        # 4. Get the other members of the conversation
        recipients = [m for m in self.get_members(conversation_id)
                      if m != sender_id]

        # 5. Fan out via pub/sub; gateways holding a live connection
        #    for a recipient deliver the message
        for member_id in recipients:
            self.pubsub.publish(
                channel=f"user:{member_id}",
                message=message.to_dict()
            )

        # 6. Send push notifications to offline members
        offline_members = [m for m in recipients
                           if not self.presence.is_online(m)]
        if offline_members:
            self.push_service.notify(offline_members, message)

        # 7. Acknowledge to sender: the message is stored and fanned out,
        #    not yet confirmed as delivered to any recipient
        return {"status": "sent", "messageId": message.id}
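
The interaction between at-least-once delivery and deduplication is easier to see in a runnable form. The following is a minimal in-memory sketch, not the production design: `InMemoryMessageStore` and `handle_send` are hypothetical stand-ins for the message store and chat service, and a plain list stands in for the message queue.

```python
import itertools


class InMemoryMessageStore:
    """Hypothetical stand-in for the message store, for illustration only."""

    def __init__(self):
        self._by_client_key = {}
        self._ids = itertools.count(1)

    def create_once(self, sender_id, client_msg_id, conversation_id, content):
        """Return (message, created); a retry returns the original message."""
        key = (sender_id, client_msg_id)  # scope client IDs to the sender
        existing = self._by_client_key.get(key)
        if existing is not None:
            return existing, False
        message = {
            "messageId": f"msg_{next(self._ids)}",
            "conversationId": conversation_id,
            "senderId": sender_id,
            "content": content,
        }
        self._by_client_key[key] = message
        return message, True


def handle_send(store, queue, sender_id, conversation_id, content, client_msg_id):
    message, created = store.create_once(
        sender_id, client_msg_id, conversation_id, content)
    if created:
        queue.append(message)  # fan out only the first time the message is seen
    # The ack is identical for the first attempt and for every retry
    return {"status": "sent", "messageId": message["messageId"]}
```

A client that loses the ack can resend with the same `clientMessageId` as often as it likes: it always receives the same `messageId`, and recipients see the message once.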

Presence System

Challenge: Tracking 50M online users in real time

python
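import time

class PresenceService:
    HEARTBEAT_INTERVAL = 30  # seconds between client heartbeats
    TIMEOUT = 90             # seconds without a heartbeat before a user is offline
    GRACE_PERIOD = 30        # seconds to wait for a reconnect after a disconnect

    def __init__(self, redis, pubsub, scheduler, server_id, get_contacts):
        # Assumes a redis-py client created with decode_responses=True,
        # so hash fields come back as str rather than bytes
        self.redis = redis
        self.pubsub = pubsub
        self.scheduler = scheduler
        self.server_id = server_id
        self.get_contacts = get_contacts

    def user_connected(self, user_id):
        key = f"presence:{user_id}"
        self.redis.hset(key, mapping={
            "status": "online",
            "last_seen": time.time(),
            "server_id": self.server_id
        })
        self.redis.expire(key, self.TIMEOUT)

        # Notify contacts
        self._broadcast_status(user_id, "online")

    def heartbeat(self, user_id):
        key = f"presence:{user_id}"
        # Also restores "online" if another device's disconnect set "away"
        self.redis.hset(key, mapping={"status": "online",
                                      "last_seen": time.time()})
        self.redis.expire(key, self.TIMEOUT)

    def user_disconnected(self, user_id):
        # Don't immediately mark offline (might reconnect)
        key = f"presence:{user_id}"
        self.redis.hset(key, "status", "away")
        # hset recreates an expired key without a TTL, so set it again
        self.redis.expire(key, self.TIMEOUT)

        # Schedule offline check after grace period
        self.scheduler.schedule(
            delay=self.GRACE_PERIOD,
            task=self._check_still_offline,
            args=(user_id,)
        )

    def _check_still_offline(self, user_id):
        # No reconnect or heartbeat arrived during the grace period
        key = f"presence:{user_id}"
        if self.redis.hget(key, "status") != "online":
            self.redis.delete(key)
            self._broadcast_status(user_id, "offline")

    def is_online(self, user_id):
        data = self.redis.hgetall(f"presence:{user_id}")
        # "away" users count as not online, so they still get push notifications
        if data.get("status") != "online" or "last_seen" not in data:
            return False
        return (time.time() - float(data["last_seen"])) < self.TIMEOUT

    def _broadcast_status(self, user_id, status):
        # Only broadcast to users who have this user in their contacts
        contacts = self.get_contacts(user_id)
        for contact_id in contacts:
            self.pubsub.publish(
                channel=f"user:{contact_id}",
                message={"type": "presence", "userId": user_id, "status": status}
            )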

Strong candidate discusses:

  • Heartbeat mechanism vs connection-based detection
  • Grace period before marking offline
  • Fan-out problem for popular users (hundreds of contacts)
  • Lazy presence (only check when user opens a conversation)
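
Heartbeat timeouts and lazy presence can be illustrated without Redis. The sketch below is an in-memory approximation under stated assumptions: `InMemoryPresence` is a hypothetical class, the 90-second timeout matches the service above, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time


class InMemoryPresence:
    """Heartbeat-based presence; status is computed on read, never broadcast."""

    def __init__(self, timeout=90.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_seen = {}  # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        last = self._last_seen.get(user_id)
        return last is not None and (self._clock() - last) < self.timeout

    def online_members(self, member_ids):
        """Lazy presence: evaluate status only for the members being viewed."""
        return [m for m in member_ids if self.is_online(m)]
```

With lazy presence, a client calls something like `online_members` when a conversation is opened, so a status change for a user with many contacts costs nothing until someone actually looks at it. The trade-off is that status can be stale by up to one timeout window.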

Read Receipts and Typing Indicators

python
# Read receipts: persistent (stored in DB)
def mark_read(user_id, conversation_id, message_id):
    db.update("conversation_members",
        set={"last_read_message_id": message_id},
        where={"conversation_id": conversation_id,
               "user_id": user_id})

    # Notify other members
    pubsub.publish(f"conv:{conversation_id}", {
        "type": "read_receipt",
        "userId": user_id,
        "lastReadMessageId": message_id
    })

# Typing indicators: ephemeral (never stored)
def typing_started(user_id, conversation_id):
    pubsub.publish(f"conv:{conversation_id}", {
        "type": "typing",
        "userId": user_id,
        "status": "started"
    })
    # Receivers should auto-expire the indicator after 5 seconds
    # (in case the stop event is lost)
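
Because typing events are never stored, the 5-second expiry has to be enforced by whoever renders the indicator. Here is a minimal sketch of a receiver-side tracker; `TypingTracker` is a hypothetical name, and the clock is injectable for testing.

```python
import time

TYPING_TTL_SECONDS = 5.0


class TypingTracker:
    """Receiver-side view of who is currently typing in each conversation."""

    def __init__(self, ttl=TYPING_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._expires_at = {}  # (conversation_id, user_id) -> expiry time

    def on_event(self, conversation_id, user_id, status):
        key = (conversation_id, user_id)
        if status == "started":
            # Each "started" event refreshes the expiry
            self._expires_at[key] = self._clock() + self._ttl
        else:
            self._expires_at.pop(key, None)

    def typing_users(self, conversation_id):
        now = self._clock()
        # Drop entries whose stop event never arrived
        self._expires_at = {k: t for k, t in self._expires_at.items() if t > now}
        return sorted(u for (c, u) in self._expires_at if c == conversation_id)
```

The sender would typically re-publish a `started` event every few seconds while the user keeps typing, so the indicator only disappears when events genuinely stop.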

Message Ordering

Challenge: Ensuring messages appear in correct order across devices

Approach: Server-assigned timestamps + sequence numbers

python
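import time

class MessageOrderer:
    def __init__(self, redis):
        self.redis = redis

    def assign_order(self, conversation_id, message):
        # Atomic increment per conversation (Redis INCR), so every message
        # in a conversation gets a unique, increasing sequence number
        seq = self.redis.incr(f"seq:{conversation_id}")
        message.sequence_number = seq
        message.server_timestamp = time.time_ns()
        return message

    def resolve_conflicts(self, messages):
        # Sort by sequence number (primary)
        # Then by server timestamp (secondary); the tie-break only matters
        # if sequence numbers ever repeat, e.g. after the counter is reset
        return sorted(messages,
                     key=lambda m: (m.sequence_number, m.server_timestamp))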

Strong candidate discusses:

  • Client-side vs server-side timestamps
  • Causal ordering vs total ordering
  • Handling out-of-order delivery on client
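
Handling out-of-order delivery on the client usually comes down to a small reorder buffer keyed by the server-assigned sequence number. The following is a minimal sketch, assuming sequence numbers start at 1 and increase by one per conversation; `ReorderBuffer` is a hypothetical name.

```python
class ReorderBuffer:
    """Releases messages in sequence order, holding back any that arrive early."""

    def __init__(self, next_seq=1):
        self.next_seq = next_seq
        self._pending = {}  # seq -> message, for messages that arrived early

    def receive(self, seq, message):
        """Return the messages that are now deliverable, in order."""
        if seq < self.next_seq or seq in self._pending:
            return []  # duplicate caused by an at-least-once retry
        self._pending[seq] = message
        ready = []
        while self.next_seq in self._pending:
            ready.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers the client should re-request from the server."""
        if not self._pending:
            return []
        return [s for s in range(self.next_seq, max(self._pending))
                if s not in self._pending]
```

A real client would also put a time limit on waiting for a gap to fill and then fetch the missing range through the message history API, rather than blocking the conversation indefinitely.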

Step 4: Scale and Edge Cases (5-10 min)

Scaling WebSocket Connections

Problem: A single server can handle ~500K connections at most, so 50M connections need at least 100 gateway servers

Solution: WebSocket Gateway Cluster

┌────────────────────────────────────────────────┐
│               Load Balancer (L4)               │
│  (Per-connection balancing, e.g. least-conn)   │
└──────┬──────────┬──────────┬──────────┬────────┘
       │          │          │          │
  ┌────▼────┐ ┌──▼─────┐ ┌─▼──────┐ ┌▼───────┐
  │  WS GW  │ │ WS GW  │ │ WS GW  │ │ WS GW  │
  │  500K   │ │ 500K   │ │ 500K   │ │ 500K   │
  └────┬────┘ └──┬─────┘ └─┬──────┘ └┬───────┘
       │         │         │         │
       └────────┬┴─────────┴─────────┘
                │
       ┌────────▼──────────┐
       │   Redis Pub/Sub   │
       │   (Message Bus)   │
       └───────────────────┘

Connection registry (which user is on which server):

python
class ConnectionRegistry:
    def __init__(self, redis, pubsub):
        self.redis = redis
        self.pubsub = pubsub

    def register(self, user_id, server_id):
        self.redis.sadd(f"connections:{user_id}", server_id)

    def unregister(self, user_id, server_id):
        self.redis.srem(f"connections:{user_id}", server_id)

    def get_servers(self, user_id):
        return self.redis.smembers(f"connections:{user_id}")

    def route_message(self, user_id, message):
        servers = self.get_servers(user_id)
        for server_id in servers:
            self.pubsub.publish(f"server:{server_id}", {
                "target_user": user_id,
                "message": message
            })
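
The registry covers the publishing side; each gateway also needs a handler for its own `server:{server_id}` channel. Below is a minimal in-memory sketch of that receiving side. `GatewayServer` and its methods are hypothetical, sockets are modelled as plain callables, and the envelope is assumed to be JSON-encoded with the `target_user` and `message` keys used above.

```python
import json


class GatewayServer:
    """Delivers bus messages to the WebSocket connections held by this server."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._sockets = {}  # user_id -> list of send callables, one per device

    def attach(self, user_id, send):
        self._sockets.setdefault(user_id, []).append(send)

    def detach(self, user_id, send):
        sockets = self._sockets.get(user_id, [])
        if send in sockets:
            sockets.remove(send)
        if not sockets:
            self._sockets.pop(user_id, None)

    def on_bus_message(self, payload):
        """Handle one envelope from this server's pub/sub channel.

        Returns the number of local connections the message was written to.
        """
        envelope = json.loads(payload)
        sockets = self._sockets.get(envelope["target_user"], [])
        frame = json.dumps(envelope["message"])
        for send in sockets:
            send(frame)
        return len(sockets)
```

A return value of 0 means the registry entry was stale (for example, the user disconnected after the publish), which is a natural point to fall back to a push notification.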

Group Message Fan-Out

Problem: A message to a 500-person group means 499 deliveries

python
def fan_out_group_message(conversation_id, message):
    # Never deliver a message back to its sender
    members = [m for m in get_members(conversation_id)
               if m != message.sender_id]

    if len(members) <= 50:
        # Small group: fan-out on write (push to each member)
        for member_id in members:
            deliver_to_user(member_id, message)
    else:
        # Large group: fan-out on read (members pull when online)
        store_in_conversation_feed(conversation_id, message)
        # Push immediately only to online and mentioned members;
        # mentions of non-members are ignored
        online = {m for m in members if is_online(m)}
        mentioned = set(extract_mentions(message.content)) & set(members)
        for user_id in online | mentioned:
            deliver_to_user(user_id, message)
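
The recipient selection can be isolated as a pure function, which makes the cost difference between the two strategies easy to check. This is a sketch under the same 50-member threshold used in the snippet above; `select_push_recipients` is a hypothetical helper.

```python
SMALL_GROUP_LIMIT = 50  # same threshold as the fan-out snippet


def select_push_recipients(members, sender_id, online, mentioned):
    """Return the set of users who should receive an immediate push."""
    recipients = set(members) - {sender_id}
    if len(members) <= SMALL_GROUP_LIMIT:
        return recipients  # fan-out on write: everyone except the sender
    # Fan-out on read: only online or mentioned members are pushed to;
    # everyone else pulls from the conversation feed when they come back
    return recipients & (set(online) | set(mentioned))


# A 500-member group with 3 users online and 1 mention
members = [f"u{i}" for i in range(500)]
pushed = select_push_recipients(
    members, "u0", online={"u0", "u1", "u2"}, mentioned={"u3", "outsider"})
print(len(pushed), "immediate deliveries instead of", len(members) - 1)
```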

Offline Message Sync

python
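def sync_messages(user_id, last_sync_timestamp):
    """Called when a user comes back online"""
    # Skip conversations with no activity since the last sync
    # (updated_since is an assumed filter on the conversation lookup)
    conversations = get_user_conversations(
        user_id, updated_since=last_sync_timestamp)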

    unread = {}
    for conv_id in conversations:
        last_read = get_last_read_message(user_id, conv_id)
        new_messages = get_messages_after(conv_id, last_read,
                                         limit=50)
        if new_messages:
            unread[conv_id] = {
                "messages": new_messages,
                "unread_count": count_unread(conv_id, last_read)
            }

    return unread

Edge Cases

Strong candidates identify:

  • Network partitions (messages sent but not acknowledged)
  • Device sync (user on phone and laptop simultaneously)
  • Large media files (separate upload flow with CDN)
  • Spam and abuse (rate limiting, content moderation)
  • Message deletion propagation across all devices
  • Clock skew between servers
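
For the spam and abuse case, per-user rate limiting is often implemented as a token bucket. The sketch below is one possible in-process version with an injectable clock; `TokenBucket` is a hypothetical class, and a deployment across many gateways would need shared state (for example in Redis) rather than a local object.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the elapsed time, capped at the bucket capacity
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A gateway would keep one bucket per sender and reject or delay a `send_message` frame when `allow()` returns False. The rate and capacity values are policy decisions that the requirements above do not specify.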

Evaluation Rubric

Strong Performance (Hire)

  • Chooses WebSockets with proper justification
  • Designs for message ordering and delivery guarantees
  • Handles presence efficiently at scale
  • Considers fan-out strategies for groups
  • Discusses offline sync and push notifications
  • Clear separation of real-time vs persistent data
  • Mentions security (encryption, auth)

Adequate Performance (Maybe)

  • Functional design with WebSockets
  • Basic message storage and retrieval
  • Some scaling considerations
  • Misses edge cases like offline sync or ordering
  • Can be guided toward better solutions

Weak Performance (No Hire)

  • Only considers HTTP polling
  • No thought given to delivery guarantees
  • Doesn't address group messaging challenges
  • Can't reason about connection management at scale
  • Poor data model choices

Follow-up Questions

For senior candidates:

  • How would you implement end-to-end encryption?
  • Design the notification system in detail
  • How would you handle message search across billions of messages?
  • How would you implement message reactions and threads?

For staff+ candidates:

  • Design the infrastructure for global deployment with <100ms latency
  • How would you handle compliance (message retention, legal holds)?
  • Design the system for 500M DAU
  • How would you implement real-time translation?

This question tests real-time systems design, pub/sub patterns, presence management, and data consistency under concurrent writes. A strong candidate will balance latency requirements with delivery guarantees while maintaining clear system boundaries.
