In a distributed DBMS, the same data can be stored in multiple places. This is called replication, and it is used to improve availability, performance, and reliability of the system.
Instead of having exactly one copy of a table or fragment, the system keeps several copies at different sites, so the data can still be accessed even if one site fails.
What Is Replication?
Replication means creating multiple copies of the same data (a table, fragment, or set of records) and storing them on different sites.
Each copy is called a replica.
The DBMS must keep these replicas consistent when updates occur.
For example, a CUSTOMER table has a replica at Mumbai and another at Delhi, so both sites can serve local customers quickly.
Types of Replication
1. Full Replication
Full replication means every fragment or table is copied to all sites.
Each site therefore holds a complete copy of the database.
Advantages:
Very high availability: if one site fails, others can still serve all queries.
Fast local access: every site can answer queries without going to another site.
Disadvantages:
High storage cost (same data repeated many times).
Expensive update propagation: changing data must be applied to all replicas, which increases network traffic and complexity.
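The trade-off above can be illustrated with a minimal Python sketch. It is a toy in-memory model, not how any particular DBMS implements replication: the class name FullyReplicatedStore, the messages_sent counter, and the third site "Chennai" are assumptions made up for this example.

```python
class FullyReplicatedStore:
    """Toy model of full replication: every site holds every record."""

    def __init__(self, sites):
        # One in-memory dict per site stands in for that site's copy of the database.
        self.replicas = {site: {} for site in sites}
        self.messages_sent = 0  # counts update messages sent to replicas

    def write(self, key, value):
        # An update must reach every replica, so its cost grows with the number of sites.
        for data in self.replicas.values():
            data[key] = value
            self.messages_sent += 1

    def read(self, site, key):
        # Any site can answer locally because it holds a full copy.
        return self.replicas[site].get(key)


# "Chennai" is a hypothetical third site added for illustration.
store = FullyReplicatedStore(["Mumbai", "Delhi", "Chennai"])
store.write("customer:1", {"name": "Asha"})
print(store.read("Delhi", "customer:1"))  # served from Delhi's own copy
print(store.messages_sent)                # one update message per site
```

A single write here costs three update messages, one per site, while every read is answered locally. That is the storage and propagation cost being exchanged for availability and fast local access.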
2. Partial Replication
Partial replication means only some fragments or tables are copied to some sites.
Not every site has every piece of data.
Only frequently accessed or critical data is replicated.
Advantages:
Balanced cost: less storage and fewer copies to update.
Can place strategic copies near heavy‑usage locations.
Disadvantages:
If the only site that holds a required fragment fails, that data is unavailable until the site recovers, unless another replica or a backup exists elsewhere.
Query routing becomes more complex (the system must know where each replica is).
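To make the routing problem concrete, here is a small Python sketch of a replica catalog, assuming the system keeps a map from each fragment to the sites that store it. The fragment names CUSTOMER_WEST and CUSTOMER_NORTH, the CATALOG dictionary, and the route_read function are hypothetical and exist only for this illustration.

```python
# Hypothetical replica catalog: fragment name -> sites that hold a copy.
CATALOG = {
    "CUSTOMER_WEST": ["Mumbai", "Delhi"],  # replicated at two sites
    "CUSTOMER_NORTH": ["Delhi"],           # stored at one site only
}


def route_read(fragment, local_site, failed_sites=frozenset()):
    """Return the site that should serve a read, or None if no replica is reachable."""
    live = [s for s in CATALOG.get(fragment, []) if s not in failed_sites]
    if not live:
        return None  # every site holding this fragment is down
    # Prefer the local copy to avoid network traffic; otherwise use any live replica.
    return local_site if local_site in live else live[0]


print(route_read("CUSTOMER_WEST", "Mumbai"))              # local copy exists
print(route_read("CUSTOMER_NORTH", "Mumbai"))             # must go to the remote site
print(route_read("CUSTOMER_NORTH", "Mumbai", {"Delhi"}))  # only copy is down
```

The last call returns None because the only site holding CUSTOMER_NORTH has failed. This is the availability risk of partial replication, and the lookup itself is the extra routing step that full replication does not need.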
Why Replication Matters in Distributed DBMS
High availability:
If one site fails, another site can still answer queries using its replica.
Better performance and locality:
Frequently used data can be replicated near the users, so queries execute faster and with less network traffic.
Fault tolerance and reliability:
Replication reduces the risk of total data loss and helps the system keep working even during partial failures.
Load balancing:
Read‑heavy workloads can be spread across replicas, so no single site is overloaded.
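The availability and load-balancing points can be sketched together in a few lines of Python. The sketch assumes a simple round-robin policy, which is only one of several possible ways to spread reads, and the ReadBalancer class is a hypothetical helper written for this example.

```python
from itertools import cycle


class ReadBalancer:
    """Spreads reads across replica sites in round-robin order, skipping failed sites."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.failed = set()
        self._order = cycle(self.sites)

    def mark_failed(self, site):
        self.failed.add(site)

    def next_site(self):
        # Try each site at most once per call, skipping any that are marked as failed.
        for _ in range(len(self.sites)):
            site = next(self._order)
            if site not in self.failed:
                return site
        raise RuntimeError("no replica is available")


balancer = ReadBalancer(["Mumbai", "Delhi"])
before_failure = [balancer.next_site() for _ in range(4)]  # reads alternate between sites
balancer.mark_failed("Mumbai")
after_failure = [balancer.next_site() for _ in range(3)]   # reads fall back to Delhi
print(before_failure)
print(after_failure)
```

While both sites are up, reads alternate between them. Once Mumbai is marked as failed, every read is served from Delhi, so queries keep working at the price of a higher load on the surviving site.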
However, replication also introduces challenges:
Ensuring consistency when updates happen (every copy must end up reflecting the same state, either immediately or after a short propagation delay).
Managing update conflicts and propagation delays.
Deciding which data to replicate and where to place replicas.
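The consistency and propagation-delay challenges can be seen in a small Python sketch. It assumes one particular arrangement that the text above does not prescribe: updates are accepted at a primary copy and forwarded to a replica later (asynchronous propagation). The class LazyReplicatedValue and its method names are hypothetical.

```python
from collections import deque


class LazyReplicatedValue:
    """One data item with a primary copy and a replica that is updated asynchronously."""

    def __init__(self, value):
        self.primary = value
        self.replica = value
        # Updates accepted by the primary but not yet applied at the replica.
        self.pending = deque()

    def write(self, value):
        # The primary is updated immediately; the replica only sees the change later.
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        # Apply queued updates to the replica in the order they were written.
        while self.pending:
            self.replica = self.pending.popleft()

    def is_consistent(self):
        return self.primary == self.replica


item = LazyReplicatedValue("old address")
item.write("new address")
stale_read = item.replica                 # a read at the replica still sees the old value
consistent_before = item.is_consistent()
item.propagate()
consistent_after = item.is_consistent()
print(stale_read, consistent_before, consistent_after)
```

Between write and propagate, a query sent to the replica returns outdated data. Propagating synchronously would close that window, but it would make every write wait for all copies, which is the cost described for full replication.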
For beginners, think of replication as making extra copies of important data and putting them in different places. The system then uses these copies to keep the database running smoothly and to answer queries faster, even if something goes wrong at one location.
Summary
Replication in a distributed DBMS is the practice of storing multiple copies of the same data at different sites. Full replication keeps a copy of everything at every site, while partial replication keeps copies of selected fragments at selected sites. Replication improves availability, performance, and reliability, but it requires careful design to keep copies consistent and to manage the extra storage and network overhead.