This document outlines performance optimization best practices for Node.js APIs, drawn from a real-world effort that increased throughput from ~100 req/s to ~50,000 req/s on the same infrastructure (same machine and database).
No changes were made to:
- Programming language
- System architecture
- Infrastructure scaling (no additional servers)
The improvements focus on eliminating inefficiencies and unnecessary work within the existing codebase.
Most performance issues in Node.js systems are not caused by Node.js itself, but by suboptimal usage patterns.
Performance is primarily about avoiding unnecessary work.
## 1. Database Connection Pooling
Problem:
- A new PostgreSQL connection was created per request.
- Under load, the database exhausted available connections and started rejecting requests.
Solution:
- Introduced a connection pool using `pg-pool`.
- Configured a fixed pool size (e.g., 20 reusable connections).
Impact:
- ~60% latency reduction
- Eliminated connection exhaustion under load
Best Practice:
- Always use connection pooling for relational databases.
- Tune pool size according to DB capacity and workload.
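The reuse pattern can be sketched without a real database. The following minimal pool is an illustration only (in production you would use `pg.Pool` from the `pg` package; the `createConnection` factory here is a hypothetical stand-in): it caps total connections at a fixed size, parks idle ones for reuse, and queues callers when the pool is exhausted instead of opening more.

```javascript
// Minimal fixed-size pool sketch (illustrative; use pg.Pool in production).
// `createConnection` is a hypothetical factory supplied by the caller.
class SimplePool {
  constructor(createConnection, size = 20) {
    this.createConnection = createConnection;
    this.size = size;
    this.idle = [];    // connections parked for reuse
    this.created = 0;  // total connections ever created (never exceeds size)
    this.waiters = []; // callers waiting for a released connection
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop(); // reuse an idle one
    if (this.created < this.size) {                   // grow up to the cap
      this.created++;
      return this.createConnection();
    }
    // Pool exhausted: wait until someone releases a connection.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn);  // hand the connection straight to a waiter
    else this.idle.push(conn); // otherwise park it for reuse
  }
}
```

The key property is that under load the pool hands back the same connections rather than creating new ones, which is what eliminates connection exhaustion.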
## 2. Parallelize Independent Async Operations
Problem:
- Independent I/O operations executed sequentially using `await`.

```js
const user = await getUser(id);
const orders = await getOrders(id);
const address = await getAddress(id);
```

Solution:
- Execute the calls in parallel using `Promise.all`.

```js
const [user, orders, address] = await Promise.all([
  getUser(id),
  getOrders(id),
  getAddress(id),
]);
```

Impact:
- Response time reduced from ~900ms to ~280ms (per route)
Best Practice:
- Identify independent async operations and parallelize them.
- Avoid unnecessary sequential awaits.
## 3. In-Memory Caching for Hot Data
Problem:
- Frequently accessed data (configurations, permissions, feature flags) fetched from DB on every request.
Solution:
- Introduced in-memory LRU cache with TTL (e.g., 60 seconds).
Impact:
- Reduced database load by ~99%
- Significant latency improvements for hot paths
Best Practice:
- Cache read-heavy, low-volatility data.
- Use TTL to ensure consistency.
- Consider distributed cache (e.g., Redis) if needed across instances.
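An LRU cache with TTL is small enough to sketch directly on a `Map`, which iterates in insertion order. This is an illustration of the mechanism, not production code (a library such as `lru-cache` handles the edge cases); the `max` and `ttlMs` defaults are arbitrary choices for the example.

```javascript
// Minimal LRU cache with TTL, built on Map's insertion-order iteration.
class TTLCache {
  constructor({ max = 500, ttlMs = 60_000 } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.map = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { // expired: drop and miss
      this.map.delete(key);
      return undefined;
    }
    // Refresh recency: re-insert so the key moves to the back of the Map.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.map.size > this.max) {
      // Evict the least recently used key: the Map's first entry.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

A route handler would then try `cache.get(key)` before hitting the database, and `cache.set(key, row)` after a miss; the TTL bounds how stale a cached value can get.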
## 4. Stream Large Result Sets
Problem:
- Large result sets (e.g., 50,000 records) loaded entirely into memory.
- Caused high memory usage (~2GB/request) and OOM crashes.
Solution:
- Switched to streaming results directly from PostgreSQL.
Impact:
- Memory usage reduced from ~2GB → ~50MB
- Eliminated out-of-memory crashes
Best Practice:
- Use streams for large datasets.
- Avoid buffering entire payloads in memory.
## 5. Utilize All CPU Cores (Cluster Mode)
Problem:
- Node.js runs on a single thread by default.
- Only one CPU core utilized.
Solution:
- Enabled cluster mode using a process manager (e.g., PM2).
- Spawned one process per CPU core.
Impact:
- Throughput scaled linearly with number of cores (e.g., 8x on 8 cores)
Best Practice:
- Always utilize all available CPU cores in production.
- Use clustering or container orchestration.
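With PM2, cluster mode is a configuration concern rather than application code. A minimal `ecosystem.config.js` sketch (the app name and script path are placeholders for your own):

```javascript
// ecosystem.config.js — PM2 cluster-mode configuration (sketch).
module.exports = {
  apps: [
    {
      name: 'api',            // placeholder app name
      script: './server.js',  // placeholder entry point
      exec_mode: 'cluster',   // run multiple processes behind one port
      instances: 'max',       // one worker per available CPU core
    },
  ],
};
```

Started with `pm2 start ecosystem.config.js`, PM2 forks one worker per core and load-balances incoming connections across them, so the application code itself stays single-threaded and unchanged.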
## 6. Faster JSON Serialization
Problem:
- `JSON.stringify` became a bottleneck for large responses.
Solution:
- Replaced it with a schema-based serializer (`fast-json-stringify`).
Impact:
- Up to 4x faster serialization
Best Practice:
- Use optimized serializers for large or frequent payloads.
- Prefer schema-driven approaches for predictable structures.
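The idea behind schema-based serialization is that when the output shape is known ahead of time, a specialized serializer can be compiled once and reused, instead of rediscovering the object's structure on every call. The toy version below illustrates that idea for flat objects of string/number fields; it is deliberately not `fast-json-stringify`'s actual API (the real library takes a JSON Schema and returns a stringify function).

```javascript
// Toy schema-driven serializer: given a fixed field list, build a
// specialized function once and reuse it for every response.
// Illustration only — not the fast-json-stringify API.
function compileSerializer(fields) {
  return (obj) => {
    const parts = [];
    for (const { name, type } of fields) {
      const v = obj[name];
      // Strings need escaping/quoting; numbers can be emitted directly.
      parts.push(
        `"${name}":` + (type === 'string' ? JSON.stringify(v) : String(v))
      );
    }
    return '{' + parts.join(',') + '}';
  };
}

// Compile once at startup, reuse per request.
const serializeUser = compileSerializer([
  { name: 'id', type: 'number' },
  { name: 'name', type: 'string' },
]);
```

The speedup comes from skipping the generic per-key type inspection that `JSON.stringify` must do; the schema pays that cost once, at compile time.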
## 7. Response Compression
Problem:
- Large response payloads increased transfer time and bandwidth usage.
Solution:
- Enabled gzip compression for responses.
Impact:
- ~70% reduction in payload size
- Faster client response times
- Lower bandwidth consumption
Best Practice:
- Enable compression (gzip or brotli) for API responses.
- Particularly effective for JSON payloads.
## Results Summary
| Metric | Before | After |
|---|---|---|
| Throughput | ~100 req/s | ~50,000 req/s |
| Infrastructure | Same | Same |
| Memory Stability | Unstable | Stable |
| DB Load | High | Optimized |
## Checklist Before Scaling Infrastructure
Before scaling infrastructure, ensure the following:
- Efficient database usage (pooling + reduced queries)
- Proper async concurrency (no unnecessary blocking)
- Strategic caching
- Stream-based data handling
- Full CPU utilization
- Optimized serialization
- Compressed network responses
In most cases, performance gains come from removing inefficiencies, not adding complexity.
## Coding Guidelines
When writing or reviewing Node.js code:
- Avoid per-request resource allocation (connections, large buffers)
- Prefer parallel execution for independent operations
- Minimize database roundtrips
- Cache aggressively where safe
- Never load large datasets fully into memory unless required
- Utilize all available hardware resources
- Optimize hot paths (serialization, I/O)
- Measure before and after every change