CometChat On-Prem Overview

CometChat On-Prem is an enterprise deployment and operations blueprint for a high-performance, real-time messaging platform built for reliability, low latency, and horizontal scale. It covers deployments from roughly 10k to 250k+ monthly active users (MAU) and lays the foundation for even higher workloads.

Who this guide is for

DevOps and SRE teams responsible for uptime and operations
Platform, cloud, and backend engineers deploying or tuning the stack
Infrastructure architects planning multi-region, failover, or compliance-heavy environments

What the platform does

Real-time messaging for 1:1 and group chat with persistent history
WebSocket event streaming for presence, typing indicators, and delivery/read receipts
Distributed event pipeline (Kafka) for decoupled communication between microservices
Notifications subsystem for asynchronous push fan-out
Moderation services with rule-based filtering and optional AI adapters
Webhooks engine for outbound callbacks with retries and signature validation
Horizontally scalable REST APIs for chat, users, groups, and metadata
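
An endpoint that receives webhook callbacks typically checks the signature before trusting the payload. This overview does not specify the signing scheme, so the sketch below assumes an HMAC-SHA256 over the raw request body, hex-encoded, with a secret shared between the webhooks engine and your endpoint. The function names, secret, and encoding are hypothetical; confirm the actual header and algorithm against your deployment's webhook configuration.

```python
import hashlib
import hmac

# ASSUMPTION: HMAC-SHA256 over the raw body, hex-encoded. The real scheme,
# header name, and encoding used by your deployment may differ.


def compute_signature(secret: bytes, raw_body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, raw_body: bytes, received: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = compute_signature(secret, raw_body)
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), received.encode())


if __name__ == "__main__":
    secret = b"example-shared-secret"  # hypothetical secret
    body = b'{"example": true}'        # placeholder payload
    sig = compute_signature(secret, body)
    print(verify_signature(secret, body, sig))         # True
    print(verify_signature(secret, body + b" ", sig))  # False
```

Under this assumption, verify against the raw bytes exactly as received, before any JSON parsing: re-serializing the payload can change whitespace or key order and break the signature.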

Data & storage

TiDB cluster (PD, TiKV, TiDB SQL) as the primary relational store for users, conversations, groups, and message metadata
MongoDB for flexible metadata, moderation data, and unstructured fields
Three Redis clusters for caching, pub/sub, session state, and other fast-access needs
Kafka as the event backbone for real-time messaging and inter-service pipelines
Optional object storage (e.g., Amazon S3, MinIO, Ceph) for media, logs, documents, and other large binaries; add it when your application handles unstructured data shared across services

Deployment models

Local development (Docker Compose): single-machine environment for dependency bootstrapping, local development/QA, and CI pipelines. Not recommended for production workloads.
Docker Swarm (recommended up to ~200k MAU / ~20k PCC, peak concurrent connections): the current reference architecture, offering lightweight cluster management, predictable service placement, secure overlay networks, and rolling updates.
Kubernetes (enterprise, multi-region, or >200k MAU): best when you need advanced autoscaling, cross-region failover, service mesh/mTLS, cloud-native Kafka, or strict compliance requirements. Contact us for enterprise Kubernetes architecture guidance.
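
The sizing guidance above can be condensed into a small decision helper. This is a hypothetical planning aid, not a CometChat tool: the thresholds come from the list above, while the function and field names are illustrative, and the figures should be treated as rough planning guidance rather than hard limits.

```python
from dataclasses import dataclass

# Thresholds from the deployment-model guidance; approximate, not hard limits.
SWARM_MAX_MAU = 200_000
SWARM_MAX_PCC = 20_000


@dataclass
class Requirements:
    mau: int                         # monthly active users
    pcc: int                         # peak concurrent connections
    production: bool = True          # False for local dev, QA, or CI
    multi_region: bool = False       # cross-region failover needed
    strict_compliance: bool = False  # e.g. mandated mTLS / service mesh


def recommend_deployment(req: Requirements) -> str:
    """Map workload requirements to one of the three deployment models."""
    if not req.production:
        return "docker-compose"  # single machine; not for production
    if req.multi_region or req.strict_compliance:
        return "kubernetes"
    if req.mau > SWARM_MAX_MAU or req.pcc > SWARM_MAX_PCC:
        return "kubernetes"
    return "docker-swarm"  # reference architecture up to ~200k MAU / ~20k PCC


if __name__ == "__main__":
    print(recommend_deployment(Requirements(mau=50_000, pcc=5_000)))    # docker-swarm
    print(recommend_deployment(Requirements(mau=300_000, pcc=30_000)))  # kubernetes
```

Either limit on its own triggers the move to Kubernetes, since a deployment with modest MAU but bursty concurrency can still exceed what the Swarm reference architecture is sized for.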

High-level architecture

NGINX for TLS termination, routing, WebSocket upgrades, and load balancing
WebSocket gateway for real-time connections, presence events, and device sessions
Chat API for messaging logic across users, groups, conversations, and metadata
Moderation engine for policy-based filtering and compliance checks
Notifications service for asynchronous push notifications and event fan-out
Webhooks service for outbound callbacks with retries
Kafka as the central event backbone
TiDB, MongoDB, and Redis as the stateful data stores
Observability stack (Prometheus, Grafana, Loki/ELK) for metrics, dashboards, and logs
Host and network: private overlay networks that isolate backend traffic and keep service-to-service latency low
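
Because NGINX sits in front of the WebSocket gateway, a common smoke test is to confirm that an upgrade request survives the proxy. The sketch below computes the Sec-WebSocket-Accept value defined by RFC 6455 and checks a captured handshake response against it. It opens no connections; how you capture the status and headers (for example with a test client) is up to you, and the function names are illustrative rather than part of CometChat.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, section 1.3; every compliant server uses it.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(client_key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(client_key + GUID)) per RFC 6455."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_upgrade(status: int, headers: dict, client_key: str) -> bool:
    """Check a handshake response as received by the client through the proxy."""
    h = {name.lower(): value for name, value in headers.items()}
    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    connection_tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    return (
        status == 101
        and h.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and h.get("sec-websocket-accept") == expected_accept(client_key)
    )


if __name__ == "__main__":
    # Sample key and accept value from RFC 6455, section 1.3.
    key = "dGhlIHNhbXBsZSBub25jZQ=="
    print(expected_accept(key))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A status other than 101 frequently means the proxy is not forwarding the hop-by-hop Upgrade and Connection headers to the gateway.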
