🚀 Realtime Chat System (10k Concurrent Users)
Scalable WebSocket architecture for 10,000 concurrent users
🧩 Overview
The goal was to design and implement a fault-tolerant, distributed real-time chat system capable of handling 10,000 concurrent WebSocket connections while maintaining accurate user presence across servers.
This project focused on horizontal scalability, efficient message delivery, and real-time synchronization of user status across multiple services.

⚙️ Architecture Overview
The system consists of two core backend services:
1. Chat Server
- Manages WebSocket connections using Socket.IO
- Handles message delivery, room management, and periodic heartbeats
- Publishes user activity to the Redis Pub/Sub user_presence channel

2. Presence Service
- Listens for user presence events from the user_presence channel
- Persists user data and status in MongoDB for reliability
- Updates the user's online/offline state and last seen timestamp
Both services connect to a 6-node Redis Cluster for distributed caching, messaging, and data replication.
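As a rough illustration of that wiring, both services could create their cluster connections with ioredis along the lines below; the node hostnames are placeholders, and the split into one command connection plus dedicated publish/subscribe connections mirrors the "dedicated Redis connections" highlight later in this document.

```ts
import Redis from "ioredis";

// Placeholder seed nodes; ioredis discovers the rest of the 6-node cluster from these.
const nodes = [
  { host: "redis-node-1", port: 6379 },
  { host: "redis-node-2", port: 6379 },
  { host: "redis-node-3", port: 6379 },
];

// Connection for regular commands (message lists, caching).
export const redis = new Redis.Cluster(nodes);

// A connection that has issued SUBSCRIBE can only run Pub/Sub commands,
// so publishing and subscribing each get their own dedicated connection.
export const redisPub = new Redis.Cluster(nodes);
export const redisSub = new Redis.Cluster(nodes);
```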
Frontend clients connect to the Nginx reverse proxy, which forwards requests to HAProxy. HAProxy then balances WebSocket traffic across multiple Chat Server containers using the leastconn strategy.
🏗️ System Architecture Diagram

Client → Nginx → HAProxy → Chat Server → Redis Cluster → Presence Service → MongoDB
💬 Chat Flow Summary
Connection Flow
- User connects via WebSocket (Nginx → HAProxy → Chat Server).
- Chat server publishes “user online” event to Redis (user_presence).
- Presence service listens and updates MongoDB (both sides of this flow are sketched below).
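A minimal sketch of both halves of the connection flow, shown in one file for brevity even though they run as separate services. It reuses the redisPub / redisSub connections from the earlier snippet; the ports, the MongoDB URI, the handshake query field, and the users collection name are assumptions made for illustration.

```ts
import { Server } from "socket.io";
import { MongoClient } from "mongodb";
import { redisPub, redisSub } from "./redis"; // the cluster connections sketched above

const io = new Server(3000);                              // assumed port
const mongo = new MongoClient("mongodb://mongo:27017");   // assumed URI
await mongo.connect();
const users = mongo.db("chat").collection("users");       // assumed collection

// Chat server: announce the user as online as soon as the socket connects.
io.on("connection", (socket) => {
  const username = socket.handshake.query.username as string; // assumed handshake field
  void redisPub.publish(
    "user_presence",
    JSON.stringify({ username, status: "online", lastSeen: Date.now() })
  );
});

// Presence service: persist every presence event in MongoDB.
void redisSub.subscribe("user_presence");
redisSub.on("message", async (_channel, raw) => {
  const { username, status, lastSeen } = JSON.parse(raw);
  await users.updateOne(
    { username },
    { $set: { status, lastSeen: new Date(lastSeen) } },
    { upsert: true }
  );
});
```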
Join Room
- Client emits join room with 'username', 'recipient', 'roomId'.
- Chat server joins the socket to both roomId and user:username rooms.
- Fetches room message history and user conversations from Redis.
- Emits messages and conversation list to connected clients (see the handler sketch below).
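A sketch of what this handler might look like inside the connection callback; the chat:&lt;roomId&gt; list and the user:&lt;username&gt; room come from the steps above, while the conversations key and the emitted event names (messages, conversations) are assumptions.

```ts
socket.on("join room", async ({ username, recipient, roomId }) => {
  // Join the shared room plus a per-user room, used for direct delivery
  // (e.g. to notify `recipient` of a new conversation).
  await socket.join([roomId, `user:${username}`]);

  // Room history is a Redis list with one JSON entry per message.
  const history = await redis.lrange(`chat:${roomId}`, 0, -1);

  // "conversations:<username>" is an assumed key for the user's conversation list.
  const conversations = await redis.lrange(`conversations:${username}`, 0, -1);

  socket.emit("messages", history.map((m) => JSON.parse(m)));
  socket.emit("conversations", conversations.map((c) => JSON.parse(c)));
});
```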
Chat Message
- Client emits chat message.
- Chat server fetches user metadata from Presence Service.
- Appends message to Redis list (chat:roomId) and publishes to roomId channel.
- Broadcasts updated messages to all clients in the room (see the handler sketch below).
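The corresponding server-side handler might look like this; the Presence Service endpoint and response shape are assumptions, while the Redis key and the room's Pub/Sub channel follow the steps above.

```ts
socket.on("chat message", async ({ roomId, sender, text }) => {
  // Enrich the message with sender metadata from the Presence Service
  // (the internal URL and response shape are assumed).
  const res = await fetch(`http://presence-service:4000/users/${sender}`);
  const senderMeta = await res.json();

  const entry = JSON.stringify({ roomId, sender: senderMeta, text, sentAt: Date.now() });

  // Persist to the room's list and fan out over the room's Pub/Sub channel so
  // chat-server instances hosting other members of the room can broadcast it too.
  await redis.rpush(`chat:${roomId}`, entry);
  await redisPub.publish(roomId, entry);

  // Broadcast to the room's sockets connected to this instance.
  io.to(roomId).emit("messages", [JSON.parse(entry)]);
});
```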
Heartbeat (Every 30 Seconds)
- Every 30 seconds, the client emits a heartbeat event to the chat server via Socket.IO.
- If the connection is healthy, the server acknowledges with success: true and publishes an “online” event to user_presence.
- If the acknowledgement isn't received or indicates failure, the client calls handleRetry(), which resends the heartbeat after RETRY_DELAY ms.
- Once MAX_RETRIES attempts have failed, the client assumes the connection is broken and emits an inactive event; the server then publishes an “offline” event to user_presence.
- Exhausted retries also hand control to Socket.IO's reconnection listener; once reconnected, the client resets its inactivity timer and resumes heartbeats (a client-side sketch follows).
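A client-side sketch of this loop, assuming a Socket.IO v4 client with acknowledgements; the URL and the constant values are illustrative rather than taken from the project.

```ts
import { io } from "socket.io-client";

const socket = io("https://chat.example.com"); // Nginx entry point (placeholder URL)

const HEARTBEAT_INTERVAL = 30_000; // every 30 seconds
const RETRY_DELAY = 5_000;         // assumed value
const MAX_RETRIES = 3;             // assumed value

let retries = 0;

function sendHeartbeat(): void {
  // Request an acknowledgement; a healthy server replies with { success: true }.
  socket.timeout(RETRY_DELAY).emit("heartbeat", (err: Error | null, res?: { success: boolean }) => {
    if (err || !res?.success) {
      handleRetry();
    } else {
      retries = 0;
    }
  });
}

function handleRetry(): void {
  if (retries >= MAX_RETRIES) {
    // Give up: report inactivity and let Socket.IO's reconnection logic take over.
    socket.emit("inactive");
    return;
  }
  retries += 1;
  setTimeout(sendHeartbeat, RETRY_DELAY);
}

setInterval(sendHeartbeat, HEARTBEAT_INTERVAL);

// After a successful reconnect, reset the retry counter and resume heartbeats.
socket.io.on("reconnect", () => {
  retries = 0;
  sendHeartbeat();
});
```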
📈 HAProxy Analysis: High-Concurrency WebSocket Load Balancing
The HAProxy configuration is optimized for horizontal scaling and session stickiness—the two non-negotiable requirements for managing a real-time chat application with long-lived WebSocket connections.
Architectural Success: Load Distribution & Failover
- The most critical scaling decisions are concentrated in the ws_backend backend block.
- Long-lived client connections are distributed evenly across the 8 available chat server containers; the key directives are summarized below, followed by a configuration sketch.
| Configuration | Decision Rationale | Impact on Scaling |
|---|---|---|
| maxconn 65536 (Global) | Sets the absolute maximum connection limit for the entire HAProxy instance to a very high number. | Guarantees the proxy itself is not the bottleneck, easily supporting the 10,000 VUser target. |
| balance leastconn (Backend) | Uses the least-connections algorithm. | It routes new connections to the server with the fewest active connections. This ensures the load is balanced by the number of active WebSockets, not just the number of requests. |
| maxconn 2048 (Server) | Limits each of the 8 backend servers to 2,048 connections. | Provides a combined capacity of 16,384 connections, comfortably above the 10,000 target, leaving roughly 60% headroom. |
| check inter 1000... (Server) | Defines a health check that pings the backend every 1,000 ms. | Ensures quick detection of failed application servers, allowing HAProxy to immediately remove unhealthy servers from the load-balancing pool and reroute traffic for high availability. |
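Pieced together from the directives above, the relevant part of the configuration would look roughly like the sketch below; the frontend, server names, ports, and timeout values are illustrative, and only the global maxconn, balance leastconn, the per-server maxconn, and the 1,000 ms health-check interval are taken from the table.

```
global
    maxconn 65536

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s
    timeout tunnel  1h        # keeps upgraded WebSocket tunnels open (value assumed)

frontend ws_frontend
    bind *:8080
    default_backend ws_backend

backend ws_backend
    balance leastconn
    server chat1 chat-server-1:3000 maxconn 2048 check inter 1000
    server chat2 chat-server-2:3000 maxconn 2048 check inter 1000
    # ... repeated for all 8 chat-server containers
```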
📈 Load Testing Setup
Goal: Simulate realistic chat flow for 10,000 concurrent users.

Tools Used:
- Artillery for traffic generation
- HAProxy for load balancing between Chat server containers
- Nginx as the WebSocket entry point
Flow:
- Each simulated user connects to the WebSocket server with a unique username.
- Joins a random room and sends messages at defined intervals.
- Heartbeat and inactivity events are periodically emitted to simulate real user behavior (an example scenario follows).
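For illustration, a scenario along the following lines reproduces that flow, assuming Artillery's legacy Socket.IO engine is available; the target URL, phase numbers, and payload fields are placeholders rather than the actual test profile.

```yaml
config:
  target: "http://localhost:8080"   # Nginx WebSocket entry point (placeholder)
  phases:
    - duration: 600                 # ramp up over 10 minutes
      arrivalRate: 20               # ~20 new virtual users per second

scenarios:
  - name: chat-flow
    engine: socketio
    flow:
      - emit:
          channel: "join room"
          data:
            username: "user-{{ $randomString(8) }}"
            recipient: "user-{{ $randomString(8) }}"
            roomId: "room-{{ $randomNumber(1, 50) }}"
      - loop:
          - emit:
              channel: "chat message"
              data:
                roomId: "room-{{ $randomNumber(1, 50) }}"
                text: "load-test message"
          - think: 5                # pause 5 seconds between messages
        count: 15
      - emit:
          channel: "heartbeat"
```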
🧮 Key Performance Indicators (KPIs)
| Metric | Result |
|---|---|
| Concurrent Users | 9,300 |
| Total Messages Sent | 139,500 |
| Failure rate | 0% |
| Median Latency (P50) | 0.2ms |
| 99th Percentile Latency (P99) | 6.8ms |
Metrics collected during the distributed Artillery load test targeting 10,000 concurrent users.
🧠 Key Technical Highlights
- Redis Cluster used for both Pub/Sub and data caching — provides partition tolerance and scalability.
- HAProxy + Nginx combination ensures even load distribution and persistent WebSocket connections.
- Dedicated Redis connections for Pub/Sub and regular commands — preventing blocked commands.
- In-memory caching + Redis list storage for high-speed message retrieval.
- Periodic heartbeats maintain real-time presence with minimal overhead.
🧩 Lessons Learned
- Redis Pub/Sub channels scale effectively with clustered nodes when messages are distributed evenly.
- WebSocket connection balancing through HAProxy requires sticky sessions or query-based routing for reliability.
- Simulating 10k concurrent WebSocket connections with Artillery requires tuning kernel socket limits and ulimits.
- Separating user presence from chat logic improves maintainability and horizontal scalability.
📘 Future Improvements
- Implement Kafka for higher reliability Pub/Sub messaging.
- Introduce WebRTC for voice/video integration.
- Add real-time analytics: message rates, latency, error tracking.
- Deploy in Kubernetes with horizontal pod autoscaling.
Stack:
Node.js, TypeScript, Socket.IO, Redis Cluster (6 nodes), MongoDB, Docker, Nginx, HAProxy, Artillery