Introduction
Scalability is the ability of a system to handle a growing number of requests without degrading in performance. Most services start small, but as they grow, engineers face a classic choice: scale up (buy bigger servers) or scale out (add more servers). Modern cloud-native design almost always favours scaling out — also called horizontal scaling.
Horizontal vs Vertical Scaling
Vertical scaling means upgrading to a bigger machine — more CPU cores, more RAM. It works up to a point, but it's expensive, has a hard ceiling, and creates a single point of failure.
Horizontal scaling means running multiple identical instances behind a load balancer. This is cheaper, fault-tolerant, and theoretically unbounded.
services:
  app:
    image: myapp:latest
    deploy:
      replicas: 4  # run 4 identical instances
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
    environment:
      - DATABASE_URL=${DATABASE_URL}

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app
Load Balancer Strategies
A load balancer distributes incoming traffic across your instances. The three most common strategies are:
• Round-robin — requests cycle through instances in order. Simple, good default.
• Least connections — send to the instance with the fewest active connections. Great for long-lived requests.
• IP-hash — the same client always hits the same server. Useful for sticky sessions.
upstream app_servers {
    least_conn;                 # Least connections strategy
    server app1:3000 weight=3;
    server app2:3000 weight=1;  # app1 gets 3× the traffic
    server app3:3000;
    keepalive 32;               # Reuse connections
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
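Round-robin itself is simple enough to sketch in a few lines of TypeScript. This is an illustrative stand-alone selector, not part of any load-balancer library; the class and server names are hypothetical:

```typescript
// Minimal round-robin selector: cycles through servers in order.
class RoundRobinBalancer {
  private index = 0;

  constructor(private readonly servers: string[]) {
    if (servers.length === 0) throw new Error('need at least one server');
  }

  next(): string {
    const server = this.servers[this.index];
    // Wrap around after the last server
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

const balancer = new RoundRobinBalancer(['app1:3000', 'app2:3000', 'app3:3000']);
```

Real load balancers track server health as well and skip instances that fail health checks, but the core rotation is exactly this modulo increment.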
Database Patterns for Scale
Your database is usually the first bottleneck. Three patterns help here:
1. Read replicas — route SELECT queries to replicas, writes to the primary.
2. Connection pooling — don't open a new DB connection per request; use PgBouncer or similar.
3. Sharding — split data across multiple databases by a shard key (user ID, region, etc.).
"hl-keyword">import { Pool } "hl-keyword">from 'pg';
"hl-keyword">class="hl-comment">// Primary "hl-keyword">for writes
"hl-keyword">const writePool = "hl-keyword">new Pool({
connectionString: process.env.DATABASE_PRIMARY_URL,
max: 20, "hl-keyword">class="hl-comment">// max connections
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 2_000,
});
"hl-keyword">class="hl-comment">// Read replica "hl-keyword">for SELECT queries
"hl-keyword">const readPool = "hl-keyword">new Pool({
connectionString: process.env.DATABASE_REPLICA_URL,
max: 40, "hl-keyword">class="hl-comment">// replicas can handle more reads
});
"hl-keyword">export "hl-keyword">async "hl-keyword">function getUser(id: "hl-type">string) {
"hl-keyword">class="hl-comment">// Always read "hl-keyword">from replica
"hl-keyword">const { rows } = "hl-keyword">await readPool.query(
'SELECT * FROM users WHERE id = $1',
[id]
);
"hl-keyword">return rows[0];
}
"hl-keyword">export "hl-keyword">async "hl-keyword">function createUser(data: NewUser) {
"hl-keyword">class="hl-comment">// Always write to primary
"hl-keyword">const { rows } = "hl-keyword">await writePool.query(
'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
[data.name, data.email]
);
"hl-keyword">return rows[0];
}
Caching Layer
A cache sits in front of your database and serves frequently-requested data from memory. Redis is the go-to solution. Implement the Cache-Aside pattern: check the cache first; on a miss, fall back to the database and populate the cache with the result.
"hl-keyword">import { createClient } "hl-keyword">from 'redis';
"hl-keyword">const redis = createClient({ url: process.env.REDIS_URL });
"hl-keyword">await redis.connect();
"hl-keyword">const TTL_SECONDS = 60 * 5; "hl-keyword">class="hl-comment">// 5 minutes
"hl-keyword">export "hl-keyword">async "hl-keyword">function getCachedUser(id: "hl-type">string) {
"hl-keyword">const cacheKey = `user:${id}`;
"hl-keyword">class="hl-comment">// 1. Check cache
"hl-keyword">const cached = "hl-keyword">await redis.get(cacheKey);
"hl-keyword">if (cached) {
"hl-keyword">return JSON.parse(cached); "hl-keyword">class="hl-comment">// cache HIT
}
"hl-keyword">class="hl-comment">// 2. Cache MISS — query DB
"hl-keyword">const user = "hl-keyword">await getUser(id);
"hl-keyword">if (!user) "hl-keyword">return "hl-type">null;
"hl-keyword">class="hl-comment">// 3. Populate cache
"hl-keyword">await redis.setEx(cacheKey, TTL_SECONDS, JSON.stringify(user));
"hl-keyword">return user;
}
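One gotcha with Cache-Aside: a write to the database can leave a stale copy in the cache until the TTL expires. The usual companion is to invalidate the key on every write, so the next read repopulates it with fresh data. A minimal sketch of that write path, using in-memory Maps as stand-ins for the database and Redis (the function and field names here are hypothetical, purely for illustration):

```typescript
// Stand-ins for the primary DB and Redis, so the pattern is visible
// without any external services.
const db = new Map<string, { id: string; name: string }>();
const cache = new Map<string, string>();

function readUser(id: string) {
  const key = `user:${id}`;
  const hit = cache.get(key);
  if (hit) return JSON.parse(hit); // cache HIT

  const user = db.get(id); // cache MISS: go to the DB
  if (!user) return null;
  cache.set(key, JSON.stringify(user)); // populate cache
  return user;
}

function updateUserName(id: string, name: string) {
  const user = db.get(id);
  if (!user) return null;
  user.name = name;
  db.set(id, user);
  cache.delete(`user:${id}`); // invalidate: next read repopulates
  return user;
}
```

With Redis this invalidation is a single `DEL user:<id>` after the UPDATE; deleting is generally safer than writing the new value into the cache, since it avoids racing with concurrent reads.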
Key Takeaways
Scalability is not about clever tricks — it's about removing bottlenecks systematically. Start with profiling (which layer is actually slow?), then apply horizontal scaling, smart load balancing, read replicas, and caching in that order. Don't optimise prematurely — instrument first, optimise second.