Node.js · NestJS · Scaling · Performance

How I Scaled a Node.js Platform from 80 to 70,000+ Users

A practical case study on scaling a NestJS application from a small startup to 70,000+ active users - the bottlenecks I hit, the solutions I implemented, and what I'd do differently.

Mohamed Amine Azaiez · 12 min read

When I joined DIETURE as a backend developer, the platform had around 80 active users. Two years later, it was serving over 70,000 users with 99.9% uptime. Here's exactly how we did it - the bottlenecks we hit, the solutions we implemented, and the mistakes we avoided.

The Starting Point

The application was a monolithic Node.js/Express app connected to MongoDB Atlas. It handled user authentication, real-time features via Socket.io, file uploads to AWS S3, and a REST API for a fitness and nutrition tracking platform. On paper, this setup works fine for small scale. In practice, it breaks in predictable ways as you grow.

Initial stack:

  • Node.js + Express (later migrated to NestJS)
  • MongoDB Atlas shared cluster (M10)
  • No caching layer
  • Single EC2 t3.medium instance
  • Socket.io for real-time notifications
  • AWS S3 for file storage

Phase 1: The First 500 Users - Database Bottlenecks

Around 500 concurrent users, API response times started climbing. P95 latency went from 120ms to over 800ms. The culprit: unindexed MongoDB queries performing full collection scans on every request.

The most common offender was a user lookup pattern like this:

// Before - full collection scan on every request
const user = await User.findOne({ email: req.body.email });

// Running explain() revealed the problem:
// totalDocsExamined: 52000
// nReturned: 1
// executionTimeMillis: 340

Running db.users.explain("executionStats").findOne({ email: "..." }) showed a COLLSCAN examining 52,000 documents to return 1 result. Adding a unique index on email dropped this to an IXSCAN examining 1 document.

Indexes we added in the first pass:

  • email unique index on users collection
  • (userId, createdAt) compound index on activity logs
  • (status, assignedTo) compound index on tasks
  • Text index on searchable content fields

This alone brought P95 latency back under 150ms.
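For reference, the first-pass indexes above map to mongosh commands like these. The collection names (`activityLogs`, `tasks`, `contents`) and the text-index fields are illustrative - substitute your own schema:

```javascript
// Unique index backing the login lookup (also rejects duplicate emails)
db.users.createIndex({ email: 1 }, { unique: true });

// Compound indexes matching the hot query shapes
db.activityLogs.createIndex({ userId: 1, createdAt: -1 });
db.tasks.createIndex({ status: 1, assignedTo: 1 });

// Text index for content search
db.contents.createIndex({ title: "text", body: "text" });
```

Rerunning explain() after each createIndex confirms the plan flipped from COLLSCAN to IXSCAN.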

Phase 2: 2,000 Users - Introducing Redis Caching

At 2,000 users, the database was being hit hard with repeated identical queries - user profiles, app configuration, leaderboards. We introduced Redis (ElastiCache) as a caching layer.

// NestJS cache service with get-or-set pattern
@Injectable()
export class CacheService {
  constructor(@InjectRedis() private redis: Redis) {}

  async getOrSet<T>(
    key: string,
    ttl: number,
    factory: () => Promise<T>,
  ): Promise<T> {
    const cached = await this.redis.get(key);
    if (cached !== null) return JSON.parse(cached);

    const value = await factory();
    await this.redis.setex(key, ttl, JSON.stringify(value));
    return value;
  }

  async invalidate(pattern: string) {
    // Note: KEYS blocks Redis while it walks the keyspace - acceptable at
    // this size, but prefer SCAN (or key tagging) once the keyspace grows.
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) await this.redis.del(...keys);
  }
}

What we cached and for how long:

  • User profiles: 5-minute TTL, invalidated on profile update
  • Leaderboard data: 60-second TTL (acceptable staleness)
  • App configuration and feature flags: 10-minute TTL
  • Aggregation results (stats, summaries): 2-minute TTL

Database read load dropped ~60% immediately after deploying the cache layer.
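The get-or-set flow is easiest to see in isolation. Here's a minimal sketch of the same pattern with an in-memory map standing in for Redis - no TTL eviction, just the hit/miss/invalidate logic (the names `getOrSet`, `invalidate`, and `loadProfile` mirror the service above but are standalone here):

```typescript
// In-memory stand-in for Redis: string keys, JSON-serialized values.
const store = new Map<string, string>();

async function getOrSet<T>(key: string, factory: () => Promise<T>): Promise<T> {
  const cached = store.get(key);
  if (cached !== undefined) return JSON.parse(cached); // cache hit

  const value = await factory(); // cache miss: compute...
  store.set(key, JSON.stringify(value)); // ...and remember for next time
  return value;
}

// Invalidate on write so readers never see stale data.
function invalidate(key: string): void {
  store.delete(key);
}

async function demo(): Promise<number> {
  let dbReads = 0;
  const loadProfile = async () => {
    dbReads++; // pretend this is a MongoDB query
    return { name: 'amine', plan: 'pro' };
  };

  await getOrSet('user:1', loadProfile); // miss - hits the "database"
  await getOrSet('user:1', loadProfile); // hit - served from cache
  invalidate('user:1'); // profile was updated
  await getOrSet('user:1', loadProfile); // miss again after invalidation

  return dbReads; // only 2 of the 3 reads reached the database
}
```

The same shape explains the ~60% read-load drop: every repeated read between writes becomes a cache hit.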

Phase 3: 10,000 Users - Migrating to NestJS + Microservices

The monolith was becoming a liability. Every deployment was risky (the whole app went down), a crash in one feature could take down everything, and we had to scale the entire application even when only one part was under load.

We decided to migrate from Express to NestJS first - NestJS's module system, dependency injection, and built-in microservice transport made this the right foundation for what was coming.

Then we extracted high-traffic or high-risk services into independent microservices:

// Main gateway routing to microservices via RabbitMQ
@Controller('notifications')
export class NotificationController {
  constructor(
    @Inject('NOTIFICATION_SERVICE') private client: ClientProxy,
  ) {}

  @Post('send')
  sendNotification(@Body() dto: SendNotificationDto) {
    // Fire and forget - no blocking the gateway
    return this.client.emit('notification.send', dto);
  }
}

// Notification microservice handler
@Controller()
export class NotificationHandler {
  constructor(private notificationService: NotificationService) {}

  @EventPattern('notification.send')
  async handleSend(@Payload() dto: SendNotificationDto) {
    await this.notificationService.process(dto);
  }
}

Services we extracted and why:

  • auth-service - isolated security-sensitive logic, independent deployments
  • notification-service - email/push notifications are slow; offloading them to a queue keeps API responses from blocking
  • file-service - image processing is CPU-intensive; running it on the same instance starved the API
  • analytics-service - aggregations are read-heavy; served from a MongoDB read replica
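The fire-and-forget handoff is the behavior that matters most here. A minimal sketch of the idea, with an in-memory array standing in for RabbitMQ (the real broker adds persistence, acks, and retries - this only shows the decoupling):

```typescript
type NotificationEvent = { userId: string; message: string };

const queue: NotificationEvent[] = [];
const delivered: string[] = [];

// Gateway side: enqueue and return immediately - the HTTP response
// never waits on email/push delivery.
function emitNotification(event: NotificationEvent): void {
  queue.push(event);
}

// Worker side: drains the queue independently of the gateway.
async function drainQueue(): Promise<void> {
  while (queue.length > 0) {
    const event = queue.shift()!;
    // Pretend this is the slow part (SMTP, push provider, etc.)
    await new Promise((resolve) => setTimeout(resolve, 1));
    delivered.push(event.userId);
  }
}

emitNotification({ userId: 'u1', message: 'Workout reminder' });
emitNotification({ userId: 'u2', message: 'Meal plan ready' });
// The "gateway" has already responded; nothing has been delivered yet.
```

A gateway crash between emit and delivery is exactly why the real setup uses a durable broker rather than an in-process array.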

Phase 4: 30,000 Users - Fixing WebSocket Scaling

When we scaled to multiple EC2 instances behind a load balancer, WebSocket connections started dropping randomly. The problem: a user might connect to instance A, but an event emitted from instance B would never reach them.

The fix was the Socket.io Redis adapter, which uses Redis pub/sub to broadcast events across all instances:

import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';

const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();

await Promise.all([pubClient.connect(), subClient.connect()]);
io.adapter(createAdapter(pubClient, subClient));

// Now emitting from any instance reaches all connected clients
io.to(userId).emit('notification', payload); // works across instances

With this in place, WebSocket reliability went to 100% regardless of which instance handled the connection.
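Why the adapter fixes it is worth spelling out: every instance subscribes to a shared channel, so an emit on one instance is republished to all of them, and whichever instance holds the user's socket delivers it. A minimal sketch with a shared in-memory bus standing in for Redis pub/sub (the `Instance` class is illustrative, not Socket.io API):

```typescript
type Handler = (userId: string, payload: unknown) => void;

// Shared bus standing in for Redis pub/sub.
const subscribers: Handler[] = [];

class Instance {
  // userId -> locally connected "socket" (just a payload sink here)
  private local = new Map<string, unknown[]>();

  constructor() {
    // Every instance subscribes to the bus, like the Redis adapter does.
    subscribers.push((userId, payload) => this.deliverLocal(userId, payload));
  }

  connect(userId: string): unknown[] {
    const inbox: unknown[] = [];
    this.local.set(userId, inbox);
    return inbox;
  }

  // Publish to the bus so every instance - including this one - can deliver.
  emitToUser(userId: string, payload: unknown): void {
    for (const handler of subscribers) handler(userId, payload);
  }

  private deliverLocal(userId: string, payload: unknown): void {
    this.local.get(userId)?.push(payload); // no-op if not connected here
  }
}

const a = new Instance();
const b = new Instance();
const inbox = a.connect('user-42');       // client connected to instance A
b.emitToUser('user-42', 'notification');  // event emitted from instance B
// inbox now holds the event even though A handled the connection
```

Without the shared bus, instance B's emit would only check its own (empty) connection map - which is exactly the dropped-event bug described above.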

Phase 5: 70,000+ Users - Infrastructure Maturity

At this scale, the architecture had stabilized into something we were proud of:

  • 4 EC2 c5.xlarge instances behind an Application Load Balancer with auto-scaling
  • MongoDB Atlas M50 dedicated cluster with a read replica for analytics
  • ElastiCache Redis cluster (3 nodes, automatic failover)
  • CloudWatch + PagerDuty for alerting
  • GitHub Actions CI/CD pipeline with zero-downtime deployments

Key metrics at 70,000+ active users:

  • P50 API latency: 45ms
  • P95 API latency: 180ms
  • P99 API latency: 320ms
  • Concurrent WebSocket connections: ~8,000
  • MongoDB read IOPS: ~12,000/s
  • Uptime: 99.9% (measured over 12 months)

Key Lessons

1. Index early, measure always

Don't wait for performance problems. Add indexes based on your query patterns from day one. MongoDB Atlas Performance Advisor helps here - it flags missing indexes based on the queries you actually run.

2. Cache aggressively, invalidate carefully

Cache everything that doesn't need to be real-time. But build a solid invalidation strategy from the start - stale cached data causes subtle bugs that are hard to debug in production.

3. Extract microservices when pain is felt, not before

We migrated to microservices at 10,000 users when the monolith was causing real operational pain. Doing it at 80 users would have been premature over-engineering. Start with a modular monolith and extract when you have a clear reason.

4. WebSocket scaling needs planning

If you're using Socket.io and plan to scale horizontally, the Redis adapter is the standard solution. Factor it into your architecture early - retrofitting it later is straightforward, but the outage window when you don't have it is painful.

5. Invest in observability before you need it

Structured logging, metrics, and distributed tracing saved us countless hours. You cannot optimize what you cannot measure, and you cannot debug a production incident without logs.