Horizontal scaling
- WebSocket Gateway: ~1 replica per 1,000-1,500 peak concurrent connections (PCC)
- Chat API: scale out when average CPU exceeds ~60%
- Kafka: increase partition counts to raise throughput and parallelism
- Redis: enable Redis Cluster mode when a deployment exceeds ~200k monthly active users (MAU)
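The gateway sizing rule above can be turned into a quick estimate. The following is a minimal sketch, not part of any shipped tooling: the function names `gateway_replicas` and `gateway_replica_range` are hypothetical, and the rule of always keeping at least one replica is an assumption.

```python
from typing import Tuple


def gateway_replicas(peak_connections: int, per_replica: int) -> int:
    """Replicas needed to hold `peak_connections` at `per_replica` connections each."""
    if peak_connections < 0 or per_replica <= 0:
        raise ValueError("peak_connections must be >= 0 and per_replica > 0")
    # Integer ceiling division; always keep at least one replica (assumption).
    return max(1, (peak_connections + per_replica - 1) // per_replica)


def gateway_replica_range(peak_connections: int) -> Tuple[int, int]:
    """(low, high) replica estimate using the 1,500 and 1,000 PCC-per-replica bounds."""
    low = gateway_replicas(peak_connections, 1500)
    high = gateway_replicas(peak_connections, 1000)
    return low, high


if __name__ == "__main__":
    # 30,000 peak concurrent connections -> between 20 and 30 replicas.
    print(gateway_replica_range(30_000))
```

Sizing toward the higher end of the range leaves headroom for reconnect storms after a replica restart.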
Vertical scaling
- Raise file descriptor limits
- Tune kernel network queues (somaxconn, netdev_max_backlog)
- Increase application worker processes and thread pools where supported
- Example file descriptor tuning:
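A minimal sketch of raising the open-file limit from inside a process, assuming a Unix-like host and Python's standard `resource` module. The function name `raise_nofile_limit` and the target of 65536 are illustrative assumptions, not values taken from this guide. A process can only raise its soft limit up to the hard limit; raising the hard limit itself is done outside the process (for example in the service manager or container runtime configuration).

```python
import resource
from typing import Tuple


def raise_nofile_limit(target: int = 65536) -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward `target`; return the resulting (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is already unlimited.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # The soft limit may not exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)

    # Only ever raise the limit, never lower it.
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some platforms reject large values; keep the current limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_limit())
```

Calling this early in startup, before the server begins accepting connections, ensures the higher limit is in place before sockets are opened.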
When to migrate to Kubernetes
- MAU exceeds ~200k
- Multi-region deployments or failover are required
- Sub-50 ms latency targets are critical
- Dynamic autoscaling and elasticity are operational priorities
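The criteria above can be expressed as a simple checklist. This is a sketch only: the `DeploymentProfile` type and `kubernetes_migration_reasons` function are hypothetical names, and treating any single matching criterion as a reason to consider migrating is an assumption, since the list does not say whether the criteria combine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentProfile:
    mau: int                          # monthly active users
    multi_region_or_failover: bool    # multi-region deployment or failover required
    latency_target_ms: float          # latency target the deployment must meet
    needs_autoscaling: bool           # dynamic autoscaling/elasticity is a priority


def kubernetes_migration_reasons(profile: DeploymentProfile) -> List[str]:
    """Return the migration criteria that this deployment meets."""
    reasons = []
    if profile.mau > 200_000:
        reasons.append("MAU exceeds ~200k")
    if profile.multi_region_or_failover:
        reasons.append("multi-region deployment or failover required")
    if profile.latency_target_ms < 50:
        reasons.append("sub-50 ms latency target")
    if profile.needs_autoscaling:
        reasons.append("dynamic autoscaling and elasticity are priorities")
    return reasons


if __name__ == "__main__":
    profile = DeploymentProfile(250_000, False, 100.0, False)
    print(kubernetes_migration_reasons(profile))
```

An empty result suggests the current deployment model is still adequate under these criteria.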