This guide is the enterprise deployment and operations blueprint for CometChat On-Prem, a high-performance, real-time messaging platform built for reliability, low latency, and horizontal scale. It covers deployments from roughly 10k to more than 250k monthly active users (MAU) and lays the groundwork for larger workloads.

Who this guide is for

  • DevOps and SRE teams responsible for uptime and operations
  • Platform, cloud, and backend engineers deploying or tuning the stack
  • Infrastructure architects planning multi-region, failover, or compliance-heavy environments

What the platform does

  • Real-time messaging for 1:1 and group chat with persistent history
  • WebSocket event streaming for presence, typing indicators, and delivery/read receipts
  • Distributed event pipeline (Kafka) for decoupled microservices communication
  • Notifications subsystem for asynchronous push fan-out
  • Moderation services with rule-based filtering and optional AI adapters
  • Webhooks engine for outbound callbacks with retries and signature validation
  • Horizontally scalable REST APIs for chat, users, groups, and metadata
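
As an illustration of the signature validation mentioned for the webhooks engine, here is a minimal sketch of how a receiving service might verify a callback. It assumes an HMAC-SHA256 hex signature computed over the raw request body with a shared secret; the actual scheme, secret handling, and header name used by the platform are not specified here, so treat `sign_payload` and `verify_signature` as hypothetical helpers.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


if __name__ == "__main__":
    secret = b"example-shared-secret"  # hypothetical value, for illustration only
    body = b'{"event": "message_sent"}'
    signature = sign_payload(secret, body)
    print(verify_signature(secret, body, signature))         # untouched body
    print(verify_signature(secret, body + b" ", signature))  # tampered body
```

Verifying against the raw bytes, before any JSON parsing, matters because re-serializing the payload can change whitespace or key order and invalidate an otherwise correct signature.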

Data & storage

  • TiDB cluster (PD, TiKV, TiDB SQL) as the primary relational store for users, conversations, groups, and message metadata
  • MongoDB for flexible metadata, moderation data, and unstructured fields
  • Three Redis clusters for caching, pub/sub, session state, and other fast-access needs
  • Kafka as the event backbone for real-time messaging and inter-service pipelines
  • Optional object storage (e.g., Amazon S3, MinIO, Ceph) for media, logs, documents, and other large binary objects, for applications that store unstructured data shared across services

Deployment models

  • Local development (Docker Compose): single-machine environment for dependency bootstrapping, local development/QA, and CI pipelines. Not recommended for production workloads.
  • Docker Swarm (recommended up to ~200k MAU / ~20k peak concurrent connections, PCC): current reference architecture with lightweight cluster management, predictable service placement, secure overlay networks, and rolling updates.
  • Kubernetes (enterprise, multi-region, or >200k MAU): best when you need advanced autoscaling, cross-region failover, service mesh/mTLS, cloud-native Kafka, or strict compliance requirements. Contact us for enterprise Kubernetes architecture guidance.
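
The guidance above can be read as a simple decision rule. The sketch below encodes it under stated assumptions: the thresholds are the approximate figures from this list, the function name `recommend_deployment` and its parameters are hypothetical, and treating a connection count above ~20k as a trigger for Kubernetes is an inference from the Swarm ceiling rather than a documented rule.

```python
SWARM_MAX_MAU = 200_000  # approximate ceiling quoted for Docker Swarm
SWARM_MAX_PCC = 20_000   # approximate ceiling quoted for Docker Swarm


def recommend_deployment(
    mau: int,
    pcc: int,
    production: bool = True,
    needs_enterprise_features: bool = False,
) -> str:
    """Map workload size and requirements to one of the three deployment models."""
    if not production:
        # Docker Compose is for local development, QA, and CI only.
        return "docker-compose"
    if needs_enterprise_features or mau > SWARM_MAX_MAU or pcc > SWARM_MAX_PCC:
        # Multi-region, service mesh/mTLS, strict compliance, or scale beyond Swarm.
        return "kubernetes"
    return "docker-swarm"


if __name__ == "__main__":
    print(recommend_deployment(mau=50_000, pcc=5_000))
    print(recommend_deployment(mau=250_000, pcc=25_000))
```

Because the published limits are approximate, a workload close to either threshold is better settled by load testing than by a hard cutoff like this one.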

High-level architecture

  • NGINX for TLS termination, routing, WebSocket upgrades, and load balancing
  • WebSocket gateway for real-time connections, presence events, and device sessions
  • Chat API for messaging logic across users, groups, conversations, and metadata
  • Moderation engine for policy-based filtering and compliance checks
  • Notifications service for asynchronous push notifications and event fan-out
  • Webhooks service for outbound callbacks with retries
  • Kafka as the central event backbone
  • TiDB, MongoDB, and Redis as the stateful data stores
  • Observability stack (Prometheus, Grafana, Loki/ELK) for metrics, dashboards, and logs
  • Host and network: private overlay networks isolating backend traffic and optimizing latency
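
The webhooks service is described as delivering outbound callbacks with retries, but the retry policy itself is not documented here. As one common approach, the sketch below uses capped exponential backoff with full jitter; the attempt count, delays, and the `send` callable are all assumptions chosen for illustration, not the platform's actual behavior.

```python
import random
import time
from typing import Callable, List


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Upper bounds (seconds) for the wait after each failed attempt except the last."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def deliver_with_retries(
    send: Callable[[], bool],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns True or the attempts are exhausted."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        if send():
            return True
        if attempt < len(delays):
            # Full jitter: wait a random time up to the exponential bound so
            # that many failing callbacks do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    return False


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # simulated endpoint: fails twice, then succeeds
    delivered = deliver_with_retries(lambda: next(outcomes), sleep=lambda s: None)
    print(delivered)
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in a production worker the retry state would more likely live in a queue than in a blocking loop.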