How to answer · Updated May 11, 2026

Design a real-time chat system like WhatsApp or Slack.

The complete answer guide: what this question really tests, two example strong answers from different angles, the common weak answer rewritten, and the trap most candidates fall into. This is a system design archetype question; see the broader pattern guide for the structural shape.

What this question is really testing

The interviewer isn't evaluating whether you can recite WebSocket protocols or draw boxes with arrows. They're testing whether you can navigate ambiguity and make explicit trade-offs under constraint. The real signal they're hunting for is your ability to ask clarifying questions that reveal product intuition, then translate those answers into architectural decisions with clear justification. They're worried you'll either dive into implementation details without understanding the problem space, or worse, design a theoretical system that would cost $2 million monthly to run for 10,000 users.

The binary read happening in their head is: "Does this person think like an engineer who ships products, or someone who memorized system design patterns?" Strong candidates demonstrate they understand that "real-time chat" for a 50-person startup looks radically different from WhatsApp's 2 billion users, and they explicitly call out which constraints drive which decisions. Weak candidates treat this as a chance to showcase every technology they know (message queues, microservices, CDNs) without connecting any of it to actual requirements. The interviewer is watching whether you can hold multiple concerns in your head simultaneously: latency, consistency, cost, operational complexity, and user experience.

Two strong answers, two angles

Angle A: Constraint-driven architecture

"Before I start, I need to understand scale and feature scope. Are we talking 1,000 concurrent users or 100 million? For this discussion, let me assume 10 million daily active users with 1 million concurrent connections at peak. I'd start with WebSocket connections for bidirectional communication, but the critical design decision is message delivery guarantees. For a Slack-like system where message history matters more than WhatsApp's ephemeral nature, I'd use a write-through cache pattern: messages hit a message service that writes to both PostgreSQL for durability and Redis for fast retrieval of recent conversations. The expensive part is connection management—at 1 million concurrent WebSockets, we need a dedicated connection layer that's separate from business logic, probably 200-300 servers just holding connections open, with each server handling 3,000-5,000 connections."

Angle B: User experience backwards

"The defining characteristic of real-time chat is that when I hit send, the other person sees it within 200ms, and I need immediate confirmation my message sent. That user experience requirement drives three architectural decisions. First, optimistic updates on the client—show the message immediately with a 'sending' indicator. Second, the backend needs to acknowledge receipt before doing expensive operations like fan-out to offline users or indexing for search. Third, presence and typing indicators need a separate, more ephemeral channel than message delivery because they can tolerate data loss. I'd implement this with a stateful WebSocket gateway for active connections, a message queue for async processing of notifications and offline delivery, and a separate Redis pub/sub for presence. The trickiest part is handling network partitions—if a user's connection drops for 30 seconds, we need to replay missed messages without duplicates, which requires sequence numbers per conversation."

The common weak answer

"I would use WebSockets for real-time communication, a load balancer to distribute traffic, a message queue like Kafka for handling messages, and a database to store chat history. We'd also need a notification service for push notifications and maybe use Redis for caching. The system would be horizontally scalable with microservices."

This answer fails because it's a shopping list of technologies without any connective tissue explaining why each component exists or what problems it solves. The interviewer reads this as pattern matching from other system design questions rather than actual reasoning. You've named seven building blocks but haven't addressed a single interesting problem: how do you route a message from User A's WebSocket connection on Server 1 to User B's connection on Server 3? What happens when the database is down but users are still sending messages? Reframe: "The core challenge is maintaining stateful WebSocket connections while keeping business logic stateless. I'd solve this by separating the connection layer from the message processing layer, using Redis pub/sub to bridge them. When Server 1 receives a message, it publishes to a channel that Server 3 subscribes to for delivery."
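The routing idea in the reframed answer can be sketched in a few lines. This is an illustration under stated assumptions, not a Redis client: `InMemoryBroker` is a hypothetical stand-in for Redis pub/sub, `GatewayServer` is a hypothetical connection-layer server, and the `user:<id>` channel naming is one possible convention.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a pub/sub service such as Redis."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, payload):
        """Deliver to every subscriber; return how many received the payload."""
        callbacks = self.subscribers[channel]
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


class GatewayServer:
    """Holds open connections; knows nothing about where other users live."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.delivered = defaultdict(list)  # user_id -> messages pushed to that socket

    def connect(self, user_id):
        # Subscribing on connect is what makes this server reachable for the user.
        self.broker.subscribe(
            f"user:{user_id}",
            lambda payload: self.delivered[user_id].append(payload),
        )

    def send(self, recipient_id, text):
        # The sender's server publishes without knowing which server holds the recipient.
        return self.broker.publish(f"user:{recipient_id}", text)


broker = InMemoryBroker()
server_1 = GatewayServer("server-1", broker)
server_3 = GatewayServer("server-3", broker)

server_3.connect("bob")                       # Bob's socket lives on Server 3
receivers = server_1.send("bob", "hi Bob")    # Alice's message arrives on Server 1
offline = server_1.send("carol", "hi Carol")  # nobody is subscribed for Carol
```

A return value of zero from `send` is the signal that the recipient has no live connection, which is the point where an offline-delivery or push-notification path would take over.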

The one trap most candidates fall into

The trap is designing for eventual consistency when real-time chat actually demands strong ordering guarantees within a conversation. Candidates often suggest distributed architectures with multiple message service instances and load balancing, then handwave away the consistency problem. But if Alice sends "What time?" followed by "Never mind, found it," and Bob receives them out of order, the system is broken. The counterintuitive insight is that chat systems need to be more centralized than candidates expect, at least at the conversation level.

You cannot simply shard by user ID and call it done. Each conversation needs a single authoritative source for message ordering: a single Kafka partition, a single database shard, or a consistent hash that routes all messages for conversation X to the same processor. The sophisticated answer acknowledges this: "While we can horizontally scale connection handling across thousands of servers, message ordering per conversation requires coordination. I'd partition conversations using consistent hashing, ensuring all messages for conversation ID 12345 always route to the same message processor instance, which maintains ordering." This shows you understand that real-time systems have different consistency requirements than eventually consistent systems like social media feeds.
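The consistent-hashing approach quoted above can be sketched as a hash ring with virtual nodes. This is one common way to build it, not the only one; `ConsistentHashRing`, the processor names, and the choice of SHA-256 and 100 virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Routes each conversation ID to exactly one processor."""

    def __init__(self, nodes, virtual_nodes=100):
        # Each processor gets many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, conversation_id):
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect_right(self._points, _hash(conversation_id))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["processor-a", "processor-b", "processor-c"])
owner = ring.node_for("12345")

# Adding a processor only moves conversations onto the new processor.
bigger_ring = ConsistentHashRing(
    ["processor-a", "processor-b", "processor-c", "processor-d"]
)
moved = [
    str(cid)
    for cid in range(1000)
    if ring.node_for(str(cid)) != bigger_ring.node_for(str(cid))
]
```

Every message for a given conversation ID lands on the same processor, so that processor can assign the ordering. When the fleet grows, only the conversations that move to the new processor need their ordering state handed over.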

Common questions

How long should my answer to "Design a real-time chat system like WhatsApp or Slack." be?

Aim for 60-120 seconds spoken (roughly 150-300 words at a normal speaking pace). Long enough to land the situation, action, and result; short enough that the interviewer has room to follow up. Anything past two minutes risks losing them.

Should I memorize my answer word-for-word?

No — that reads as canned and falls apart the moment the interviewer asks a follow-up. Memorize the structure (the bones of the story) and the specific numbers/names that anchor it. Let the words come naturally each time.

What if I have a really good story but it was years ago?

Recent is better, but a strong story from 3 years ago beats a vague story from last quarter. If the example is older than 5 years, frame it as the moment that crystallized the lesson, then briefly bridge to how you've applied it since.

Can I use the same story for multiple questions?

Often yes — strong stories tend to demonstrate multiple competencies. The trick is reframing the angle each time. Same situation, different opening sentence: lead with the conflict for conflict questions, lead with the leadership move for leadership questions.

How do I know if my answer is actually good?

Practice it out loud and have it scored. The fastest way is a mock interview where the AI flags exactly what's vague, where you used 'we' when the question asked about 'I,' and rewrites the weakest sentence. Reading example answers helps; getting yours scored is what moves performance.

How to answer: Design a real-time chat system like WhatsApp or Slack. (2026 guide) — InstantInterviewer