Who this guide is for
- DevOps and SRE teams responsible for uptime and operations
- Platform, cloud, and backend engineers deploying or tuning the stack
- Infrastructure architects planning multi-region, failover, or compliance-heavy environments
What the platform does
- Real-time messaging for 1:1 and group chat with persistent history
- WebSocket event streaming for presence, typing indicators, and delivery/read receipts
- Distributed event pipeline (Kafka) for decoupled communication between microservices
- Notifications subsystem for asynchronous push fan-out
- Moderation services with rule-based filtering and optional AI adapters
- Webhooks engine for outbound callbacks with retries and signature validation
- Horizontally scalable REST APIs for chat, users, groups, and metadata
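The signature validation mentioned for the webhooks engine can be illustrated from the receiving side. The sketch below is a minimal example that assumes payloads are signed with HMAC-SHA256 over the raw request body using a shared secret. This guide does not specify the scheme, so the algorithm, the hex encoding, and the function names `sign_payload` and `verify_signature` are all hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the raw request body.

    Assumption: the platform's real signing scheme may differ.
    """
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Recompute the digest and compare it to the received one in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


if __name__ == "__main__":
    secret = b"example-shared-secret"  # hypothetical secret
    body = b'{"event": "message.sent"}'  # hypothetical payload
    signature = sign_payload(secret, body)
    print(verify_signature(secret, body, signature))         # True
    print(verify_signature(secret, body + b" ", signature))  # False
```

Verifying against the raw bytes, rather than a re-serialized JSON object, avoids false mismatches caused by key ordering or whitespace differences.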
Data & storage
- TiDB cluster (PD, TiKV, TiDB SQL) as the primary relational store for users, conversations, groups, and message metadata
- MongoDB for flexible metadata, moderation data, and unstructured fields
- Three Redis clusters for caching, pub/sub, session state, and other low-latency lookups
- Kafka as the event backbone for real-time messaging and inter-service pipelines
- Optional object storage (e.g., Amazon S3, MinIO, Ceph) for media, logs, documents, and other large binaries shared across services
Deployment models
- Local development (Docker Compose): single-machine environment for dependency bootstrapping, local development/QA, and CI pipelines. Not recommended for production workloads.
- Docker Swarm (recommended up to ~200k MAU / ~20k PCC): current reference architecture with lightweight cluster management, predictable service placement, secure overlay networks, and rolling updates.
- Kubernetes (enterprise, multi-region, or >200k MAU): best when you need advanced autoscaling, cross-region failover, service mesh/mTLS, cloud-native Kafka, or strict compliance requirements. Contact us for enterprise Kubernetes architecture guidance.
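The sizing guidance above can be read as a simple decision rule. The sketch below encodes it under stated assumptions: the thresholds come from the list above, but treating a PCC above ~20k as a reason to move to Kubernetes is an inference (the list only names the MAU threshold for Kubernetes), and the function name `recommend_deployment` and its parameters are hypothetical.

```python
MAU_LIMIT = 200_000  # approximate Docker Swarm ceiling from the guide
PCC_LIMIT = 20_000   # approximate Docker Swarm ceiling from the guide


def recommend_deployment(
    mau: int,
    pcc: int,
    production: bool = True,
    multi_region: bool = False,
    strict_compliance: bool = False,
) -> str:
    """Map workload characteristics to one of the three deployment models."""
    if not production:
        # Docker Compose is for local development, QA, and CI only.
        return "docker-compose"
    if multi_region or strict_compliance:
        return "kubernetes"
    # Assumption: exceeding either ceiling is treated as outgrowing Swarm.
    if mau > MAU_LIMIT or pcc > PCC_LIMIT:
        return "kubernetes"
    return "docker-swarm"


if __name__ == "__main__":
    print(recommend_deployment(150_000, 12_000))                     # docker-swarm
    print(recommend_deployment(350_000, 12_000))                     # kubernetes
    print(recommend_deployment(50_000, 5_000, multi_region=True))    # kubernetes
    print(recommend_deployment(1_000, 100, production=False))        # docker-compose
```

Because the published limits are approximate, a workload close to either ceiling is better treated as a prompt for capacity testing than as a hard cutoff.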
High-level architecture
- NGINX for TLS termination, routing, WebSocket upgrades, and load balancing
- WebSocket gateway for real-time connections, presence events, and device sessions
- Chat API for messaging logic across users, groups, conversations, and metadata
- Moderation engine for policy-based filtering and compliance checks
- Notifications service for asynchronous push notifications and event fan-out
- Webhooks service for outbound callbacks with retries
- Kafka as the central event backbone
- TiDB, MongoDB, and Redis as the stateful data stores
- Observability stack (Prometheus, Grafana, Loki/ELK) for metrics, dashboards, and logs
- Private overlay networks at the host and network layer for isolating backend traffic and reducing inter-service latency
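The webhooks service retries failed outbound callbacks, but this guide does not describe the retry policy. As an illustration only, the sketch below computes a capped exponential backoff schedule with optional full jitter, a common choice for this kind of retry loop. The function name `backoff_delays` and all default values are hypothetical and do not reflect the platform's actual configuration.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait (in seconds) before each retry attempt.

    The upper bound doubles on each attempt and is capped at `cap`.
    With jitter enabled, each wait is drawn uniformly from [0, bound]
    so that many failing callbacks do not retry in lockstep.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_attempts):
        bound = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, bound) if jitter else bound)
    return delays


if __name__ == "__main__":
    # Deterministic schedule without jitter.
    print(backoff_delays(5, jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The cap keeps long outages from producing unbounded waits, and the jitter spreads retries out when a downstream endpoint recovers.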