Vertical scaling
Increase system resource limits and tune configurations to handle more load on existing servers:
- Raise file descriptor limits
- Tune kernel network queues (somaxconn, netdev_max_backlog)
- Increase worker processes and thread pools where supported
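Before tuning the kernel network queues, it can help to record their current values. The following is a minimal sketch, assuming a Linux host that exposes these settings under /proc/sys/net/core; the function name read_network_queue_settings is hypothetical and not part of any product tooling.

```python
from pathlib import Path

# Standard procfs locations for the two kernel settings named above.
SYSCTL_PATHS = {
    "net.core.somaxconn": Path("/proc/sys/net/core/somaxconn"),
    "net.core.netdev_max_backlog": Path("/proc/sys/net/core/netdev_max_backlog"),
}


def read_network_queue_settings(paths=SYSCTL_PATHS):
    """Return {setting_name: int or None}; None means the value was unreadable."""
    values = {}
    for name, path in paths.items():
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Not a Linux host, or the key is hidden (for example in some containers).
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_network_queue_settings().items():
        shown = value if value is not None else "unavailable"
        print(f"{name} = {shown}")
```

Running the script before and after a change gives a simple record of what was actually applied on the host.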
Configure file descriptor limits
- Edit /etc/security/limits.conf and add:
- Configure systemd defaults:
- Reboot to apply changes:
- Verify:
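The steps above can be illustrated with a short sketch. It renders entries in the limits.conf syntax (domain, type, item, value) and checks the limit seen by the current process using Python's standard resource module. The value 65535 and the "*" domain are placeholder assumptions, not values taken from this guide, and the helper names are hypothetical.

```python
import resource


def limits_conf_lines(domain="*", nofile=65535):
    """Render soft and hard nofile entries in limits.conf syntax.

    The default domain and value are illustrative assumptions.
    """
    return [
        f"{domain} soft nofile {nofile}",
        f"{domain} hard nofile {nofile}",
    ]


def current_nofile_limits():
    """Return (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_at_least(target):
    """True if the current soft limit is unlimited or >= target."""
    soft, _hard = current_nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= target


if __name__ == "__main__":
    print("\n".join(limits_conf_lines()))
    soft, hard = current_nofile_limits()
    print(f"current soft={soft} hard={hard}")
    print("meets target:", soft_limit_at_least(65535))
```

The check reports the limit of the process that runs it, so it is most meaningful when run in the same context (user and service manager) as the server process itself.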
When to migrate to Kubernetes
Consider Kubernetes when:
- Monthly active users (MAU) exceed ~200k
- You need multi-region deployments or failover
- Sub-50 ms latency targets are critical
- Dynamic autoscaling and elasticity are operational priorities (HPA/VPA)
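The criteria above can be expressed as a small checklist. This is only a sketch: the Deployment fields and the kubernetes_signals helper are hypothetical names, and the thresholds are the approximate figures from the list, so treat the output as a prompt for discussion and not as a hard rule.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    mau: int                   # monthly active users
    needs_multi_region: bool   # multi-region deployment or failover required
    latency_target_ms: float   # latency target the service must meet
    needs_autoscaling: bool    # autoscaling/elasticity is an operational priority


def kubernetes_signals(d, mau_threshold=200_000, latency_threshold_ms=50.0):
    """Return the list of criteria from the guide that this deployment meets."""
    reasons = []
    if d.mau > mau_threshold:
        reasons.append(f"MAU above {mau_threshold}")
    if d.needs_multi_region:
        reasons.append("multi-region deployment or failover required")
    if d.latency_target_ms < latency_threshold_ms:
        reasons.append(f"latency target below {latency_threshold_ms} ms")
    if d.needs_autoscaling:
        reasons.append("autoscaling is an operational priority")
    return reasons


if __name__ == "__main__":
    example = Deployment(mau=250_000, needs_multi_region=False,
                         latency_target_ms=80.0, needs_autoscaling=True)
    for reason in kubernetes_signals(example):
        print("-", reason)
```

An empty result suggests vertical scaling on the existing servers may still be sufficient; any non-empty result is a reason to evaluate a migration.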
Horizontal scaling guidelines
- WebSocket Gateway: add ~1 replica per 1,000-1,500 peak concurrent connections (PCC)
- Chat API: scale out when average CPU utilization exceeds ~60%
- Kafka: increase partition count to improve throughput and parallelism
- Redis: enable Redis Cluster mode when deployments exceed ~200k MAU to distribute data and improve scalability
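As a rough illustration of these guidelines, the sketch below turns the approximate figures into simple calculations. The function names are hypothetical, and the thresholds are the "~" values from the list, so real capacity planning should be validated against load tests.

```python
import math


def gateway_replica_range(peak_connections, per_replica_low=1000, per_replica_high=1500):
    """Return (fewest, most) WebSocket Gateway replicas for a given PCC.

    fewest assumes each replica carries 1,500 connections; most assumes 1,000.
    """
    if peak_connections <= 0:
        return (0, 0)
    fewest = math.ceil(peak_connections / per_replica_high)
    most = math.ceil(peak_connections / per_replica_low)
    return (fewest, most)


def chat_api_should_scale_out(avg_cpu_percent, threshold=60.0):
    """True when average CPU utilization exceeds the ~60% guideline."""
    return avg_cpu_percent > threshold


def redis_cluster_recommended(mau, threshold=200_000):
    """True when MAU exceeds the ~200k guideline for enabling Redis Cluster."""
    return mau > threshold


if __name__ == "__main__":
    print(gateway_replica_range(6000))      # (4, 6)
    print(chat_api_should_scale_out(72.5))  # True
    print(redis_cluster_recommended(150_000))  # False
```

For example, 6,000 peak concurrent connections maps to between 4 and 6 gateway replicas under these assumptions.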