TheThunderclap

Mastering Backend Scalability

Deep dive into horizontal scaling, load balancing, and distributed data patterns that keep services fast under any load.


Anant Kumar

Staff Engineer

📅 15 February 2025
⏱ 12 min read

Introduction

Scalability is the ability of a system to handle a growing number of requests without degrading in performance. Most services start small, but as they grow, engineers face a classic choice: scale up (buy bigger servers) or scale out (add more servers). Modern cloud-native design almost always favours scaling out — also called horizontal scaling.

Horizontal vs Vertical Scaling

Vertical scaling means upgrading to a bigger machine — more CPU cores, more RAM. It works up to a point, but it's expensive, has a hard ceiling, and creates a single point of failure.

Horizontal scaling means running multiple identical instances behind a load balancer. This is cheaper, fault-tolerant, and theoretically unbounded.

docker-compose.yml
yaml
services:
  app:
    image: myapp:latest
    deploy:
      replicas: 4          # run 4 identical instances
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
    environment:
      - DATABASE_URL=${DATABASE_URL}

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app

Load Balancer Strategies

A load balancer distributes incoming traffic across your instances. The three most common strategies are:

Round-robin — requests cycle through instances in order. Simple, good default.
Least connections — send to the instance with the fewest active connections. Great for long-lived requests.
IP-hash — the same client always hits the same server. Useful for sticky sessions.

nginx.conf
nginx
upstream app_servers {
    least_conn;                    # Least connections strategy
    server app1:3000 weight=3;
    server app2:3000 weight=1;     # app1 gets 3× the traffic
    server app3:3000;
    keepalive 32;                  # Reuse connections
}

server {
    listen 80;
    location / {
        proxy_pass         http://app_servers;
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
        proxy_set_header   Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

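The nginx config above delegates strategy selection to the proxy, but the same logic is easy to sketch in application code. A minimal in-process sketch, assuming a hypothetical pool mirroring the three app servers above (the backend list and counters are illustrative, not part of any real deployment):

```typescript
interface Backend {
  url: string;
  activeConnections: number;
}

// Hypothetical backend pool for illustration.
const backends: Backend[] = [
  { url: 'http://app1:3000', activeConnections: 0 },
  { url: 'http://app2:3000', activeConnections: 0 },
  { url: 'http://app3:3000', activeConnections: 0 },
];

let rrIndex = 0;

// Round-robin: cycle through backends in order.
function pickRoundRobin(): Backend {
  const backend = backends[rrIndex % backends.length];
  rrIndex += 1;
  return backend;
}

// Least connections: pick the backend with the fewest in-flight requests.
function pickLeastConnections(): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best
  );
}
```

IP-hash works the same way a shard key does: hash the client address into an index, so a given client always maps to the same backend.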
Database Patterns for Scale

Your database is usually the first bottleneck. Three patterns help here:

1. Read replicas — route SELECT queries to replicas, writes to the primary.
2. Connection pooling — don't open a new DB connection per request; use PgBouncer or similar.
3. Sharding — split data across multiple databases by a shard key (user ID, region, etc.).

db.ts
typescript
                                            "hl-keyword">import { Pool } "hl-keyword">from 'pg';

"hl-keyword">class="hl-comment">// Primary "hl-keyword">for writes
"hl-keyword">const writePool = "hl-keyword">new Pool({
  connectionString: process.env.DATABASE_PRIMARY_URL,
  max: 20,               "hl-keyword">class="hl-comment">// max connections
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,
});

"hl-keyword">class="hl-comment">// Read replica "hl-keyword">for SELECT queries
"hl-keyword">const readPool = "hl-keyword">new Pool({
  connectionString: process.env.DATABASE_REPLICA_URL,
  max: 40,               "hl-keyword">class="hl-comment">// replicas can handle more reads
});

"hl-keyword">export "hl-keyword">async "hl-keyword">function getUser(id: "hl-type">string) {
  "hl-keyword">class="hl-comment">// Always read "hl-keyword">from replica
  "hl-keyword">const { rows } = "hl-keyword">await readPool.query(
    'SELECT * FROM users WHERE id = $1',
    [id]
  );
  "hl-keyword">return rows[0];
}

"hl-keyword">export "hl-keyword">async "hl-keyword">function createUser(data: NewUser) {
  "hl-keyword">class="hl-comment">// Always write to primary
  "hl-keyword">const { rows } = "hl-keyword">await writePool.query(
    'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
    [data.name, data.email]
  );
  "hl-keyword">return rows[0];
}
                                        

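The snippet above covers replicas and pooling; sharding is mostly a routing problem. A minimal sketch of deterministic shard routing, assuming four shards keyed by user ID (the shard count and key choice are illustrative):

```typescript
import { createHash } from 'crypto';

const SHARD_COUNT = 4;

// Hash the shard key so keys spread evenly across shards and the
// same user always lands on the same shard.
function shardIndex(userId: string, shardCount: number = SHARD_COUNT): number {
  const digest = createHash('sha256').update(userId).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// At query time you would route to the matching pool, e.g.:
// const pool = shardPools[shardIndex(id)];
```

Note that changing SHARD_COUNT remaps most keys; if you expect to reshard, consistent hashing mitigates that.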
Caching Layer

A cache sits in front of your database and serves frequently requested data from memory. Redis is the go-to solution. Implement the Cache-Aside pattern: check the cache first; on a miss, query the database and populate the cache before returning.

cache.ts
typescript
                                            "hl-keyword">import { createClient } "hl-keyword">from 'redis';

"hl-keyword">const redis = createClient({ url: process.env.REDIS_URL });
"hl-keyword">await redis.connect();

"hl-keyword">const TTL_SECONDS = 60 * 5; "hl-keyword">class="hl-comment">// 5 minutes

"hl-keyword">export "hl-keyword">async "hl-keyword">function getCachedUser(id: "hl-type">string) {
  "hl-keyword">const cacheKey = `user:${id}`;

  "hl-keyword">class="hl-comment">// 1. Check cache
  "hl-keyword">const cached = "hl-keyword">await redis.get(cacheKey);
  "hl-keyword">if (cached) {
    "hl-keyword">return JSON.parse(cached); "hl-keyword">class="hl-comment">// cache HIT
  }

  "hl-keyword">class="hl-comment">// 2. Cache MISS — query DB
  "hl-keyword">const user = "hl-keyword">await getUser(id);
  "hl-keyword">if (!user) "hl-keyword">return "hl-type">null;

  "hl-keyword">class="hl-comment">// 3. Populate cache
  "hl-keyword">await redis.setEx(cacheKey, TTL_SECONDS, JSON.stringify(user));
  "hl-keyword">return user;
}
                                        

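Cache-Aside as shown only covers the read path; writes need an invalidation step, or readers see stale data until the TTL expires. A minimal sketch of the write path, using in-memory Maps to stand in for Redis and Postgres so the pattern is self-contained (in the real service these would be the `redis` client and `writePool` calls):

```typescript
// In-memory stand-ins for the database and the cache.
const db = new Map<string, { id: string; name: string }>();
const cache = new Map<string, string>();

const cacheKey = (id: string) => `user:${id}`;

// Read path (cache-aside): check cache, fall back to DB, populate cache.
function readUser(id: string) {
  const cached = cache.get(cacheKey(id));
  if (cached) return JSON.parse(cached);
  const user = db.get(id);
  if (user) cache.set(cacheKey(id), JSON.stringify(user));
  return user ?? null;
}

// Write path: update the DB, then DELETE the cache entry rather than
// rewriting it — the next read repopulates it with fresh data.
function updateUser(id: string, name: string) {
  const existing = db.get(id) ?? { id, name };
  db.set(id, { ...existing, name });
  cache.delete(cacheKey(id));
}
```

Deleting the key instead of updating it in place sidesteps races between concurrent writers that could otherwise leave a stale value behind.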
Key Takeaways

Scalability is not about clever tricks — it's about removing bottlenecks systematically. Start with profiling (which layer is actually slow?), then apply horizontal scaling, smart load balancing, read replicas, and caching in that order. Don't optimise prematurely — instrument first, optimise second.
