When a web platform grows from a few thousand users to millions, a single backend server will eventually run out of CPU, memory, or network bandwidth and either slow to a crawl or crash. To prevent this, architectures scale horizontally by deploying an array of identical application servers.

However, running a massive fleet of servers creates a new challenge: How do you distribute incoming user traffic evenly across them so that no single machine gets overloaded? The core architectural component that solves this problem is the Load Balancer (LB).

Key ideas:

  • Load Balancers act as single entry points to your infrastructure, distributing traffic across backend pools.

  • They operate primarily at either the Transport Layer (Layer 4) or the Application Layer (Layer 7) of the OSI model.

  • Routing decisions are governed by specific mathematical Algorithms (Static or Dynamic) backed by continuous server Health Checks.

1. Where do Load Balancers Sit?

Load balancers can be deployed between any two tiers of an infrastructure stack, so that no single server in a tier has to absorb all the traffic arriving from the tier in front of it:

  • Client-to-Web Server: Sits at the public network edge, distributing incoming internet user requests across your public-facing web servers or API gateways.

  • Web Server-to-Application Server: Sits inside your private network, routing heavy business logic execution tasks from internal web workers to backend microservice clusters.

  • Application Server-to-Database: Routes database read queries across a distributed array of SQL read-replicas or NoSQL data nodes.

2. Layer 4 vs. Layer 7 Load Balancing

The most critical architectural distinction you must make in a system design interview is choosing between Layer 4 and Layer 7 load balancing.

A. Layer 4 Load Balancing (Transport Layer)

Layer 4 load balancers route traffic based strictly on network protocol information found at the Transport Layer (TCP/UDP).

  • The Mechanism: The LB inspects only the network and transport headers of each packet: the source and destination IP addresses, the source and destination ports, and the protocol (TCP or UDP). It does not open or read the application message carried inside the packet.

  • Pros: Very fast. Because it never parses application payloads, it spends little CPU per connection and can sustain very high packet and connection rates.

  • Cons: Completely blind to context. It cannot execute smart routing logic based on URL paths, cookies, or user headers.
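
The Layer 4 mechanism above can be sketched as follows. This is a minimal illustration, not how any particular product works: hashing the flow's addresses and ports is one possible way to pick a backend, and the FlowKey type, the pick_backend_l4 function, and the backend addresses are all hypothetical names introduced here. The point is that the decision uses header fields only and never touches the payload.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Header fields a Layer 4 balancer can see (hypothetical type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


# Hypothetical backend addresses, used only for illustration.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def pick_backend_l4(flow: FlowKey, backends: Sequence[str]) -> str:
    """Pick a backend from header fields alone; the payload is never read."""
    key = (
        f"{flow.src_ip}:{flow.src_port}->"
        f"{flow.dst_ip}:{flow.dst_port}/{flow.protocol}"
    )
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Every packet of the same flow hashes to the same index, so a TCP
    # connection stays on one backend for its whole lifetime.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Because the function never sees a URL or cookie, it has no way to send /video and /payments traffic to different pools, which is exactly the limitation listed under Cons.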

B. Layer 7 Load Balancing (Application Layer)

Layer 7 load balancers route traffic based on rich application-level data found at the top tier of the OSI model (HTTP/HTTPS/gRPC).

  • The Mechanism: The LB terminates the incoming network connection, decrypts the TLS/SSL layer, and reads the actual HTTP request content. It inspects URL paths (e.g., /video vs /payments), cookies, authorization tokens, and user-agent strings to make intelligent routing decisions.

  • Pros: Highly intelligent and flexible. It enables smart features like path-based microservice routing, A/B testing splits, and sticky session routing.

  • Cons: High computational overhead. Decrypting data and reading headers consumes significantly more CPU and memory resources than Layer 4 routing.
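
A Layer 7 routing decision might look like the sketch below. It assumes the TLS connection has already been terminated and the HTTP request parsed. The pool names, the x-beta header, and the pick_pool_l7 function are hypothetical, chosen only to show path-based and header-based rules side by side.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table: URL path prefix -> backend pool name.
ROUTES: List[Tuple[str, str]] = [
    ("/video", "video-pool"),
    ("/payments", "payments-pool"),
]
DEFAULT_POOL = "web-pool"
BETA_POOL = "beta-pool"


def pick_pool_l7(path: str, headers: Dict[str, str]) -> str:
    """Choose a backend pool from the decoded HTTP request.

    Assumes header names have already been lower-cased by the parser.
    """
    # Header-based rule: opt-in users go to a separate pool (A/B style split).
    if headers.get("x-beta") == "1":
        return BETA_POOL
    # Path-based rule: match whole path segments, so "/videos" is not "/video".
    for prefix, pool in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL
```

Each rule requires reading request content, which is where the extra CPU and memory cost of Layer 7 balancing comes from.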

3. Load Balancing Routing Algorithms

To decide exactly which server receives the next incoming request, load balancers rely on a specific set of rules. These rules are divided into Static (blind to server resource states) and Dynamic (aware of server resource states).

Static Routing Algorithms

  • Round Robin: Passes requests down the list of servers sequentially. Server 1 gets request one, Server 2 gets request two, and so on, looping back to the top when it reaches the end. It assumes all backend servers possess identical hardware power.

  • Weighted Round Robin: Assigns a manual weight to each machine based on its capacity. If Server A has a weight of 3 and Server B has a weight of 1, Server A receives 3 of every 4 requests. Simple implementations send those 3 requests back to back; others interleave them to smooth out the load.

  • IP Hash: Computes a hash of the client's IP address and maps it to a server index. As long as the server pool does not change, requests from the same IP address land on the same backend server, which is useful for stateful applications that keep sessions in local memory. Two caveats apply: users who share one public IP (for example, behind a corporate NAT) all map to the same server, and with simple modulo hashing, adding or removing a server remaps most clients.
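
The three static algorithms can be sketched in a few lines each. This is a simplified illustration under stated assumptions: the function names and server names are hypothetical, the weighted variant uses the simplest possible approach (repeating each server by its weight), and the IP hash uses plain modulo arithmetic rather than consistent hashing.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in order forever, looping back to the first."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Repeat each server by its weight, then cycle over the expanded list."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server index; stable while the list is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])    # ['s1', 's2', 's3', 's1']

wrr = weighted_round_robin({"A": 3, "B": 1})
print([next(wrr) for _ in range(4)])   # ['A', 'A', 'A', 'B']
```

None of these functions looks at how busy a server is, which is what makes them static.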

Dynamic Routing Algorithms

  • Least Connections: Routes the next incoming request to the server currently handling the absolute lowest number of active open connections. This is highly effective when requests vary widely in processing time.

  • Weighted Least Connections: Combines connection counting with hardware capacity weights, ensuring that larger servers take on a larger proportion of concurrent connections.

  • Least Response Time: Measures how quickly each backend server responds to recent requests alongside its active connection count. The LB routes fresh traffic to the fastest, least-loaded node, optimizing for low latency.
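
The two connection-based algorithms can be sketched as below. The Backend class and its fields are hypothetical, and the weighted variant assumes the score is active connections divided by weight, which is one common way to express "larger servers take a larger share". Least Response Time is left out because its exact scoring formula varies between implementations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """Hypothetical view of a server as the balancer tracks it."""
    name: str
    active_connections: int = 0
    weight: int = 1  # relative capacity; higher means a bigger machine


def least_connections(backends: List[Backend]) -> Backend:
    """Pick the server with the fewest open connections right now."""
    return min(backends, key=lambda b: b.active_connections)


def weighted_least_connections(backends: List[Backend]) -> Backend:
    """Pick the server with the fewest connections per unit of capacity."""
    return min(backends, key=lambda b: b.active_connections / b.weight)


pool = [
    Backend("small-1", active_connections=10, weight=1),
    Backend("small-2", active_connections=4, weight=1),
    Backend("large-1", active_connections=12, weight=4),
]
print(least_connections(pool).name)           # small-2
print(weighted_least_connections(pool).name)  # large-1 (12 / 4 = 3 per unit)
```

In the example, large-1 holds the most connections in absolute terms but is the least loaded relative to its capacity, so the weighted variant prefers it.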

4. Resilience Mechanics: Health Checks and Failover

A load balancer is useless if it blindly forwards traffic to a server that has already crashed. To maintain high availability, load balancers utilize continuous Health Checks.

Load Balancer ──(Periodic Ping: HTTP GET /health)──> [Application Server]
  ├── If Status == 200 OK                      ──> Keep Server in Active Routing Pool
  └── If Timeout / 5xx on N consecutive checks ──> Evict Server and Reroute New Traffic

The load balancer probes each backend instance at a configured interval (e.g., sending an HTTP GET /health request or attempting a TCP connection every 5 seconds). If a server fails a configured number of consecutive checks, the load balancer marks it unhealthy, removes it from the active rotation pool, and sends new incoming traffic only to the remaining healthy servers. Requests that were already in flight on the failed server may still error, but users making new requests are not affected.
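
The consecutive-failure rule can be sketched as a small state tracker. This is an illustration under assumptions: the HealthChecker class, its unhealthy_threshold parameter, and the probe callback are hypothetical names, the probe stands in for a real HTTP or TCP check, and a server is returned to the pool after a single successful check, which is simpler than what many real load balancers do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthState:
    consecutive_failures: int = 0
    healthy: bool = True


class HealthChecker:
    """Tracks per-server health from periodic probe results (hypothetical)."""

    def __init__(
        self,
        servers: List[str],
        probe: Callable[[str], bool],
        unhealthy_threshold: int = 3,
    ) -> None:
        self.probe = probe  # returns True when the server answers correctly
        self.unhealthy_threshold = unhealthy_threshold
        self.states: Dict[str, HealthState] = {s: HealthState() for s in servers}

    def run_once(self) -> None:
        """Run one round of checks; call this once per check interval."""
        for server, state in self.states.items():
            if self.probe(server):
                state.consecutive_failures = 0
                state.healthy = True  # simplification: one success restores it
            else:
                state.consecutive_failures += 1
                if state.consecutive_failures >= self.unhealthy_threshold:
                    state.healthy = False

    def active_pool(self) -> List[str]:
        """Servers that routing algorithms are allowed to choose from."""
        return [s for s, state in self.states.items() if state.healthy]
```

Requiring several failures in a row keeps a single dropped probe from evicting a healthy server, at the cost of a detection delay of roughly the check interval multiplied by the threshold.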

Redundancy at the Load Balancer Layer (Clustered LBs)

Because all application traffic passes through the load balancer, the load balancer itself represents a Single Point of Failure (SPOF). If your load balancer crashes, your entire platform drops offline.

To eliminate this SPOF, architects deploy load balancers in a High-Availability Cluster using an Active-Passive topology:

  • The Active LB: Handles 100% of the live incoming traffic stream.

  • The Passive LB: Sits idle, constantly listening to the active node via a private network link called a Heartbeat connection.

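  • The Failover: If the Active LB's heartbeat stops arriving for longer than a configured timeout, the Passive LB declares it down, takes over the cluster's shared Virtual IP address (Floating IP), and begins handling the incoming traffic stream. Failover time is bounded by the heartbeat interval and timeout, so it is typically on the order of seconds rather than instantaneous, and connections that were open on the failed node are usually dropped unless the two nodes share connection state.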
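
The passive node's side of this arrangement can be sketched as a timeout on the last heartbeat seen. The PassiveNode class and its method names are hypothetical, time is passed in explicitly so the logic is easy to follow, and the actual takeover of the Virtual IP is reduced to setting a flag, since that step depends on the network and tooling in use.

```python
class PassiveNode:
    """Standby load balancer that watches the active node's heartbeat."""

    def __init__(self, heartbeat_timeout: float, now: float = 0.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout  # seconds of silence tolerated
        self.last_heartbeat = now
        self.holds_virtual_ip = False

    def on_heartbeat(self, now: float) -> None:
        """Record that a heartbeat arrived from the active node."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Check the timeout; return True once this node owns the Virtual IP."""
        silent_for = now - self.last_heartbeat
        if not self.holds_virtual_ip and silent_for > self.heartbeat_timeout:
            # Placeholder: a real node would claim the Virtual IP on the network.
            self.holds_virtual_ip = True
        return self.holds_virtual_ip


node = PassiveNode(heartbeat_timeout=3.0)
node.on_heartbeat(now=1.0)
print(node.tick(now=2.0))   # False: heartbeat seen 1 second ago
print(node.tick(now=4.5))   # True: silent for 3.5 seconds, takes over
```

The timeout is the main tuning knob: a short value speeds up failover but risks a false takeover when a heartbeat is merely delayed.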

Summary

  • Load Balancers eliminate single points of failure and scale systems out horizontally by distributing incoming request volumes across server arrays.

  • Layer 4 options route traffic blindly at packet-speed using IP and Port markers, while Layer 7 options inspect application-level content to enable intelligent routing pathways.

  • Algorithms balance traffic using static cycles (Round Robin) or resource-aware tracking rules (Least Connections).

  • Automated Health Checks protect availability by evicting malfunctioning nodes, while Active-Passive clusters protect the load balancing tier itself from becoming a single point of failure.