A load balancing algorithm is the policy a load balancer uses to pick the next backend. The right policy depends on your workload: a static web server, a stateful WebSocket service, a distributed cache cluster, and an AI inference gateway each call for a different strategy.
The goal of a load balancing algorithm is not to spread raw request counts evenly, but to distribute work across healthy capacity while preserving low latency and predictable failure behavior.
1. What an Algorithm Can (and Cannot) Know
A load-balancing algorithm sees only the signals available at the layer where it runs, and that scope limits the decisions it can make.
| Layer | Signals Available | Signals Usually Missing |
|---|---|---|
| DNS | Resolver location, manual weights, server health status | Active request counts, exact client path, actual per-request compute cost |
| Layer 4 (TCP) | Source/destination IP, port, active connection count | HTTP path, headers, tenant ID, HTTP method |
| Layer 7 (HTTP) | HTTP/gRPC metadata, cookies, paths, upstream latency | Actual backend resource cost (unless exposed explicitly) |
The Architectural Reality
No algorithm can compensate for missing health checks, poorly configured timeouts, retry storms, or a lack of raw hardware capacity.
2. Static Load Balancing Algorithms
Static algorithms distribute traffic using simple, deterministic rules.
They perform no real-time inspection of a server's live resource states (like current CPU or memory saturation).
A. Round Robin
Round robin sends incoming requests to backends in a strict, fixed order, then loops back to the beginning when it reaches the end of the pool.
When to Use It
A good choice when your backends have identical capacity, your requests cost roughly the same to process, and connections are short-lived.
The Flaw
It ignores current load.
If one backend falls behind due to background garbage collection, a cache miss pattern, or a single long-running query, round robin will keep sending it new traffic at the same rate as every other backend.
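The rotation described in this subsection can be sketched in a few lines. This is a minimal illustration, not a production balancer; the backend names are hypothetical.

```python
from itertools import cycle


def round_robin(backends):
    """Return an iterator that walks the pool in a fixed, repeating order."""
    pool = list(backends)
    if not pool:
        raise ValueError("backend pool is empty")
    return cycle(pool)


# Hypothetical backend names, used only for illustration.
picker = round_robin(["app-1", "app-2", "app-3"])
first_five = [next(picker) for _ in range(5)]
print(first_five)
```

Note that nothing in the picker looks at how busy a backend is, which is exactly the blind spot described above.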
B. Weighted Round Robin
This variation allows you to assign a manual capacity weight to each server, giving larger machines more turns in the rotation.
When to Use It
Ideal when the pool mixes server profiles (e.g., legacy 4-core servers alongside modern 16-core nodes) or when you want to send a small fraction of live requests to a new canary deployment.
The Flaw
Weights are entirely static until manually modified.
A backend with a heavy weight of 5 can still become overloaded if it happens to receive expensive, complex requests sequentially.
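One way to give heavier servers more turns is sketched below. It assumes a "smooth" variant of weighted round robin that interleaves picks rather than sending several requests in a row to the heaviest node; the class name, backend names, and weights are all hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Weighted rotation that spreads each backend's turns across the cycle."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty map of positive numbers")
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Every backend earns credit equal to its static weight...
        for name, weight in self.weights.items():
            self.current[name] += weight
        # ...the backend with the most credit wins this turn...
        chosen = max(self.current, key=self.current.get)
        # ...and pays back the total so the others can catch up.
        self.current[chosen] -= self.total
        return chosen


# Hypothetical pool: one large node and two small ones.
wrr = SmoothWeightedRoundRobin({"big": 5, "small-1": 1, "small-2": 1})
sequence = [wrr.pick() for _ in range(7)]
print(sequence)
```

Over any full cycle of seven picks, "big" receives five turns and each small node receives one. The weights never change on their own, so the flaw described above still applies.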
C. Source IP Hashing & Sticky Sessions
This algorithm hashes the client's source IP address (some implementations also mix in the destination IP) to produce a key, then maps that key consistently to one specific backend instance.
When to Use It
Needed for stateful applications that keep user session data in a server's local RAM rather than in a shared distributed cache (like Redis).
The Flaw
Undermines even distribution and horizontal elasticity.
If a hotspot user (like an enterprise client sending bulk traffic from behind a single corporate proxy IP) hits the app, their target server will choke while adjacent servers sit idle.
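A minimal sketch of source IP hashing follows, assuming a simple hash-then-modulo mapping. The function name, the pool, and the IP address are illustrative, not taken from any particular load balancer.

```python
import hashlib


def pick_by_source_ip(client_ip, backends):
    """Map a client IP to one backend; the same IP always gets the same backend."""
    if not backends:
        raise ValueError("backend pool is empty")
    # Use a stable hash. Python's built-in hash() is randomized per process
    # for strings, so it would not give consistent routing across restarts.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical pool and a documentation-range client address.
pool = ["app-1", "app-2", "app-3"]
first = pick_by_source_ip("203.0.113.7", pool)
second = pick_by_source_ip("203.0.113.7", pool)
print(first, second)
```

With plain modulo mapping, adding or removing a backend changes `len(backends)` and remaps many clients at once. Consistent hashing is a common way to reduce that churn.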
3. Dynamic Load Balancing Algorithms
Dynamic algorithms continuously inspect real-world metrics from backend targets to make adaptive routing decisions.
A. Least Connections
The load balancer tracks the number of active connections per backend and routes the next incoming request to whichever backend currently has the fewest.
When to Use It
Highly useful if request duration varies widely (e.g., some requests finish in 5ms while others spend 10 seconds generating a heavy PDF report) and connections stay open for a meaningful length of time.
The Flaw
Assumes all connections require equal processing power.
A server might have few connections because those connections are idle, or it could be handling few connections but using 100% of its CPU processing complex ML model evaluations.
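The bookkeeping behind least connections can be sketched as a counter per backend. The class below is a simplified, single-threaded illustration with hypothetical names; a real balancer would also need locking and health checks.

```python
class LeastConnections:
    """Route each new connection to the backend with the fewest open ones."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("backend pool is empty")
        self.active = {name: 0 for name in backends}

    def acquire(self):
        # Ties go to the backend listed first in the pool.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Called when a connection closes.
        if self.active[backend] > 0:
            self.active[backend] -= 1


lc = LeastConnections(["app-1", "app-2"])
a = lc.acquire()   # both idle, so the first backend wins the tie
b = lc.acquire()   # app-1 now has one connection, so app-2 is chosen
lc.release(a)      # the connection on app-1 closes
c = lc.acquire()   # app-1 is the least loaded again
print(a, b, c, lc.active)
```

The counter records only how many connections are open, not how expensive each one is, which is the limitation described above.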
B. Least Response Time (Weighted Response Time)
The load balancer continuously measures each backend's response time (for example, time to first byte), often alongside its open connection count, and steers new traffic toward the fastest-responding instances.
When to Use It
Well suited to latency-sensitive paths such as e-commerce checkout, where low latency directly protects business revenue.
The Flaw
Can trigger a "herding effect" or stampede.
If Node A becomes extremely fast, the load balancer will flood it with traffic, quickly overwhelming it and causing its latency to spike for the next batch of requests.
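One possible scoring scheme is sketched below. It assumes a smoothed (exponentially weighted) latency per backend, multiplied by the number of in-flight requests plus one, so that a fast node becomes less attractive as it fills up. The formula, the `alpha` value, and all names are illustrative assumptions, not a description of any specific product.

```python
class LeastResponseTime:
    """Pick the backend with the lowest latency-times-load score."""

    def __init__(self, backends, alpha=0.3):
        self.alpha = alpha  # weight given to the newest latency sample
        self.latency_ms = {name: 0.0 for name in backends}
        self.active = {name: 0 for name in backends}

    def score(self, backend):
        # In-flight requests inflate the score, which dampens herding.
        return self.latency_ms[backend] * (self.active[backend] + 1)

    def pick(self):
        backend = min(self.latency_ms, key=self.score)
        self.active[backend] += 1
        return backend

    def record(self, backend, elapsed_ms):
        # Blend the new sample into the running average and free the slot.
        old = self.latency_ms[backend]
        self.latency_ms[backend] = (1 - self.alpha) * old + self.alpha * elapsed_ms
        self.active[backend] = max(0, self.active[backend] - 1)


lrt = LeastResponseTime(["fast", "slow"])
# Seed with hypothetical measurements so the example is deterministic.
lrt.latency_ms.update({"fast": 10.0, "slow": 35.0})
picks = [lrt.pick() for _ in range(5)]
print(picks)
```

In this run the fast node takes the first three requests, then the slow node gets one because the fast node's score has grown past it. Without the in-flight multiplier, every request would go to the fast node.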
C. Resource-Based (Adaptive Tuning)
The load balancer relies on lightweight software agents running on each backend machine.
These agents continuously report real-time metrics (like CPU usage, memory usage, and disk I/O saturation) back to the load balancer.
When to Use It
Ideal for intensive, non-uniform application tiers like AI model inference gateways or heavy media rendering pipelines.
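A resource-based decision can be sketched as a weighted score over the metrics each agent reports. The report fields, the weighting factors, and the node names below are assumptions chosen for illustration; real agents and balancers define their own formats.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """Hypothetical snapshot sent by a backend agent; values range 0.0 to 1.0."""
    cpu: float
    memory: float
    disk_io: float


def load_score(report, cpu_w=0.5, mem_w=0.3, io_w=0.2):
    """Combine the metrics into one number; lower means more spare capacity."""
    return cpu_w * report.cpu + mem_w * report.memory + io_w * report.disk_io


def pick_least_loaded(reports):
    """Return the name of the backend with the lowest combined score."""
    if not reports:
        raise ValueError("no agent reports available")
    return min(reports, key=lambda name: load_score(reports[name]))


# Hypothetical reports from two inference nodes.
reports = {
    "gpu-1": AgentReport(cpu=0.9, memory=0.6, disk_io=0.2),
    "gpu-2": AgentReport(cpu=0.4, memory=0.7, disk_io=0.1),
}
chosen = pick_least_loaded(reports)
print(chosen)
```

Here "gpu-2" wins because its lower CPU usage outweighs its slightly higher memory usage under the assumed weights.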
4. Modern Microservices Architecture: Connection Draining
Whichever routing policy you choose, dynamic infrastructure changes (like an autoscaling event or rolling code update) should be paired with Connection Draining.
[Server 01] ──> Mark as Draining ──> Stop Sending New Traffic
│
└──> Allow In-Flight Requests to Finish Safely ──> Safe Shutdown
Instead of dropping an instance immediately and triggering 502 errors for active users, the load balancer stops routing new traffic to the instance but allows existing in-flight requests to complete safely before the node shuts down completely.
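The draining flow above can be sketched as a small state machine: a draining flag that excludes the instance from new routing, plus an in-flight counter that must reach zero before shutdown. The class and method names are hypothetical, and the sketch omits the drain timeout that real deployments usually add.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0


class DrainingPool:
    """Routes only to non-draining backends and tracks in-flight requests."""

    def __init__(self, names):
        self.backends = {name: Backend(name) for name in names}

    def route(self):
        candidates = [b for b in self.backends.values() if not b.draining]
        if not candidates:
            raise RuntimeError("no backend is accepting new traffic")
        chosen = min(candidates, key=lambda b: b.in_flight)
        chosen.in_flight += 1
        return chosen.name

    def finish(self, name):
        backend = self.backends[name]
        backend.in_flight = max(0, backend.in_flight - 1)

    def start_draining(self, name):
        # Stop sending new traffic; existing requests keep running.
        self.backends[name].draining = True

    def safe_to_remove(self, name):
        backend = self.backends[name]
        return backend.draining and backend.in_flight == 0


pool = DrainingPool(["server-01", "server-02"])
first = pool.route()                      # lands on server-01
pool.start_draining("server-01")
later = [pool.route() for _ in range(3)]  # new traffic avoids server-01
before = pool.safe_to_remove("server-01") # one request is still in flight
pool.finish("server-01")
after = pool.safe_to_remove("server-01")
print(first, later, before, after)
```

The instance is removed only after `safe_to_remove` returns true, which is what prevents active requests from being cut off mid-response.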
Summary
Static algorithms like Round Robin and Weighted Round Robin are highly efficient and predictable, but they are completely blind to sudden traffic complexities or server degradation events.
Dynamic strategies like Least Connections and Least Response Time adapt to uneven workloads, but they require the load balancer to track state and can cause herding if misconfigured.
For simple, uniform REST APIs, Round Robin or Least Connections are standard defaults.
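For long-lived connection types like WebSockets or gRPC, connection-aware dynamic policies such as Least Connections are usually the better fit, because request counts alone say little about how much load each backend is carrying.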