DBMS Advanced Interview Questions and Answers 2026

Last updated: Jan 16, 2026

Author :

ExamAdda Team

1. What is Two-Phase Locking (2PL)?

Two-Phase Locking (2PL) is a concurrency control protocol that ensures serializability by dividing transaction execution into two phases.

Phases

1. Growing Phase

Transaction acquires locks
Cannot release any lock

2. Shrinking Phase

Transaction releases locks
Cannot acquire new locks

Key Rule

Once a lock is released → no new locks can be acquired

Types of 2PL

Strict 2PL → Release locks only after commit
Rigorous 2PL → Hold all locks until commit

2. How Does 2PL Prevent Concurrency Issues?

Goal

Ensure serializable execution of transactions.

Prevents

Dirty reads
Non-repeatable reads
Lost updates

How It Works

Locks ensure exclusive access to data
Enforces order of operations

Limitation

Can lead to deadlocks

3. What is Multiversion Concurrency Control (MVCC)?

MVCC allows multiple versions of data so that readers do not block writers and writers do not block readers.

Key Idea

Maintain multiple versions of rows
Each transaction sees a consistent snapshot

Benefits

High concurrency
Reduced locking

Example

Transaction reads old version while another writes new version

4. Difference Between MVCC and Locking-Based Concurrency Control

Feature	MVCC	Locking-Based Control
Blocking	Minimal	High
Performance	Better for reads	Slower under contention
Complexity	Higher	Simpler
Data Versions	Multiple versions	Single version

Key Insight

MVCC → Optimized for read-heavy systems
Locking → Simpler but may block transactions

5. What is Write-Ahead Logging (WAL)?

WAL is a logging technique where changes are written to a log before being applied to the database.

Key Principle

Log first → then write to disk

Why Important

Ensures durability and recoverability

Components

Before image (old value)
After image (new value)

Key Rule

No data modification without corresponding log entry

6. What is ARIES Recovery Algorithm?

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is a recovery algorithm used in DBMS for crash recovery.

Phases

1. Analysis Phase

Identify active transactions

2. Redo Phase

Reapply committed changes

3. Undo Phase

Rollback uncommitted transactions

Key Features

Uses WAL
Supports fine-grained recovery

7. Difference Between B-Tree and B+ Tree

Feature	B-Tree	B+ Tree
Data Storage	Internal + leaf nodes	Only leaf nodes
Search Efficiency	Slightly slower	Faster
Range Queries	Less efficient	Highly efficient
Structure	Complex	Simpler

8. Why is B+ Tree Preferred in Databases?

Reasons

1. Efficient Range Queries

Leaf nodes are linked

2. Better Disk I/O

More keys per node → fewer disk accesses

3. Faster Search

Uniform access path

4. Sequential Access

Ideal for ordered data retrieval

Conclusion

Optimized for database storage and indexing

9. What is Hashing in DBMS?

Hashing is a technique used to map data to a fixed location using a hash function.

Key Idea

Key → Hash function → Bucket

Purpose

Fast data retrieval

Example

ID → hash(ID) → storage location

10. What are Static and Dynamic Hashing?

1. Static Hashing

Definition

Fixed number of buckets

Characteristics

Simple
Performance degrades when data grows

Problem

Overflow handling required

2. Dynamic Hashing

Definition

Number of buckets can grow or shrink dynamically

Types

Extendible hashing
Linear hashing

Advantages

Better scalability
Reduced overflow

Comparison

Feature	Static Hashing	Dynamic Hashing
Bucket Size	Fixed	Dynamic
Scalability	Poor	High
Performance	Degrades over time	Stable

11. What is Extendible Hashing?

Extendible hashing is a dynamic hashing technique that grows and shrinks dynamically using a directory structure.

Key Components

Directory (in memory)
Buckets (on disk)
Hash function

Concept

Uses prefix bits of hash value
Directory size doubles when needed

How It Works

Compute hash(key)
Use prefix bits to locate bucket
If bucket overflows → split bucket
If needed → double directory

Advantages

No overflow chains
Efficient lookup

12. What is Linear Hashing?

Linear hashing is a dynamic hashing technique that splits buckets gradually over time.

Key Idea

No directory required
Buckets split in linear order

How It Works

Start with fixed buckets
When overflow occurs → split next bucket
Hash function adjusts gradually

Advantages

Simple implementation
Smooth growth

Comparison with Extendible Hashing

Feature	Extendible Hashing	Linear Hashing
Directory	Yes	No
Growth	Sudden	Gradual

13. What is the CAP Theorem?

The CAP theorem states that a distributed system can provide only two out of three guarantees:

C – Consistency

All nodes see the same data

A – Availability

System always responds

P – Partition Tolerance

System continues despite network failures

Key Rule

Cannot achieve all three simultaneously

14. How Does CAP Theorem Affect Database Design?

Design Trade-offs

1. CP Systems

Consistency + Partition tolerance
Example: Banking systems

2. AP Systems

Availability + Partition tolerance
Example: Social media

3. CA Systems

Consistency + Availability (no partitions assumed)

Impact

Choose based on use case:
- Financial systems → Consistency
- Web apps → Availability

15. What is BASE vs ACID?

ACID (Traditional DBMS)

Atomicity
Consistency
Isolation
Durability

BASE (Distributed Systems)

Definition

BASE stands for:

Basically Available
Soft State
Eventually Consistent

Comparison

Feature	ACID	BASE
Consistency	Strong	Eventual
Availability	Lower	High
Use Case	Banking	Distributed systems

Key Insight

ACID → Strong correctness
BASE → High scalability

16. What are Distributed Databases?

A distributed database is a database where data is stored across multiple physical locations.

Key Features

Data distributed across nodes
Appears as a single system

Advantages

Scalability
Fault tolerance
Performance

17. Types of Distributed Database Architectures

1. Client-Server Architecture

Clients request data from servers

2. Peer-to-Peer (P2P)

All nodes are equal

3. Federated Database

Multiple independent databases

4. Shared-Nothing Architecture

Each node has its own resources

18. How Do Distributed Databases Handle Consistency?

Techniques

1. Replication

Keep copies synchronized

2. Consensus Algorithms

Paxos, Raft

3. Quorum-Based Systems

Majority agreement

4. Eventual Consistency

Data becomes consistent over time

Key Insight

Trade-off between consistency and availability

19. What is Two-Phase Commit (2PC)?

2PC is a protocol to ensure all nodes either commit or abort a transaction together.

Phases

1. Prepare Phase

Coordinator asks participants: “Can you commit?”

2. Commit Phase

If all agree → commit
Else → rollback

Advantages

Ensures atomicity

Disadvantages

Blocking protocol
Slow in distributed systems

20. What is Three-Phase Commit (3PC)?

Definition

3PC is an improved version of 2PC that reduces blocking.

Phases

1. CanCommit Phase

Ask if participants are ready

2. PreCommit Phase

Prepare to commit

3. Commit Phase

Final commit

Advantages

Non-blocking
Better fault tolerance

Disadvantages

More complex
Higher overhead

21. What is Database Mirroring?

Database mirroring is a technique where a real-time copy of a database is maintained on another server (mirror server).

Key Components

Principal Server → Main database
Mirror Server → Exact copy
(Optional) Witness Server → Enables automatic failover

Working

Changes on the principal are immediately applied to the mirror

Modes

High Safety (Synchronous) → No data loss
High Performance (Asynchronous) → Faster but possible data loss

Use Case

High availability systems requiring quick failover

22. Difference Between Mirroring and Replication

Feature	Mirroring	Replication
Purpose	High availability	Data distribution
Data Sync	Real-time	Can be delayed
Number of Copies	Usually one mirror	Multiple replicas
Read Access	Limited	Available on replicas

Key Insight

Mirroring → Backup system
Replication → Scaling and distribution

23. What is High Availability in DBMS?

Definition

High availability (HA) ensures that the database is accessible with minimal downtime.

Goal

Near 100% uptime

Techniques

Replication
Clustering
Failover systems
Load balancing

Example

If one server fails → another takes over instantly

24. What is Fault Tolerance?

Fault tolerance is the ability of a system to continue operating even when failures occur.

Key Idea

System does not stop working

Methods

Redundant hardware
Data replication
Backup systems

Difference from HA

HA → minimize downtime
Fault tolerance → no interruption

25. What is Disaster Recovery?

Disaster recovery (DR) is the process of restoring data and systems after catastrophic failure.

Examples of Disasters

Data center failure
Cyber attack
Natural disaster

Key Metrics

RPO (Recovery Point Objective) → Data loss tolerance
RTO (Recovery Time Objective) → Recovery time

Techniques

Backups
Replication
Offsite storage

26. Types of Database Backups

1. Full Backup

Entire database

2. Incremental Backup

Changes since last backup

3. Differential Backup

Changes since last full backup

4. Transaction Log Backup

Captures transaction logs

27. What is Full Backup?

A full backup is a complete copy of the entire database.

Advantages

Simple recovery
No dependency on other backups

Disadvantages

Time-consuming
Requires large storage

Use Case

Periodic backup (e.g., weekly)

28. What is Incremental Backup?

An incremental backup stores only the changes made since the last backup (full or incremental).

Advantages

Faster
Less storage

Disadvantages

Recovery is complex (needs all increments)

Example

Day 1 → Full backup
Day 2 → Incremental (changes from Day 1)
Day 3 → Incremental (changes from Day 2)

29. What is Differential Backup?

A differential backup stores all changes made since the last full backup.

Advantages

Faster recovery than incremental
Simpler restoration

Disadvantages

Larger than incremental backups

Example

Day 1 → Full backup
Day 2 → Changes since Day 1
Day 3 → All changes since Day 1

30. What is Online vs Offline Backup?

Online Backup (Hot Backup)

Definition

Backup taken while the database is running and accessible.

Key Features

No downtime
Supports continuous operations

Advantages

High availability
Suitable for production systems

Offline Backup (Cold Backup)

Definition

Backup taken when the database is shut down.

Key Features

Database unavailable during backup

Advantages

Simpler
Consistent backup

Comparison

Feature	Online Backup	Offline Backup
Availability	Available	Not available
Complexity	High	Low
Use Case	Production systems	Maintenance windows

31. What is a Query Optimizer?

A query optimizer is a component of the DBMS that determines the most efficient way to execute a query.

Goal

Minimize execution time
Reduce resource usage

Types

Rule-based optimizer
Cost-based optimizer

32. How Does a Cost-Based Optimizer Work?

Definition

A cost-based optimizer (CBO) evaluates multiple query execution plans and selects the one with the lowest estimated cost.

Steps

1. Generate Plans

Different join orders, access paths

2. Estimate Cost

Based on:
- Disk I/O
- CPU usage
- Memory

3. Choose Best Plan

Minimum cost execution plan selected

Example

Index scan vs full table scan → choose cheaper option

33. What are Join Algorithms?

Join algorithms are methods used to combine rows from multiple tables efficiently.

Types

Nested Loop Join
Hash Join
Merge Join

34. What is Nested Loop Join?

A nested loop join compares each row of one table with every row of another.

Working

Advantages

Simple
Works for all join conditions

Disadvantages

Slow for large datasets

35. What is Hash Join?

A hash join uses a hash table to match rows between two tables.

Working

Build hash table on smaller table
Probe with larger table

Advantages

Efficient for large datasets
Works well for equality joins

Disadvantages

Requires memory
Not suitable for non-equality joins

36. What is Merge Join?

A merge join combines sorted tables by scanning them simultaneously.

Working

Both tables must be sorted
Merge step compares rows sequentially

Advantages

Very efficient for sorted data

Disadvantages

Requires sorting overhead

37. Difference Between Hash Join and Merge Join

Feature	Hash Join	Merge Join
Requirement	No sorting needed	Requires sorted data
Best For	Equality joins	Large sorted datasets
Memory	High	Moderate
Speed	Fast	Very fast if sorted

38. What is Full-Text Indexing?

Full-text indexing is a technique for searching text data efficiently.

Key Features

Supports keyword search
Handles large text fields

Capabilities

Phrase search
Ranking results
Language processing

Example

Search:

39. What is Database Tuning?

Database tuning is the process of optimizing database performance.

Goals

Improve query speed
Reduce resource usage

Techniques

1. Index Optimization

Add/remove indexes

2. Query Optimization

Rewrite inefficient queries

3. Schema Design

Normalize/denormalize

4. Hardware Tuning

Memory, CPU, storage

5. Configuration Tuning

Buffer size, cache

Key Insight

Continuous process based on workload

40. What is Connection Pooling?

Connection pooling is a technique where a pool of reusable database connections is maintained instead of creating a new connection for every request.

How It Works

Application requests a connection
Pool provides an existing connection
After use → connection is returned to pool

Key Idea

Reuse connections instead of creating/destroying repeatedly

41. Why is Connection Pooling Important?

1. Performance Improvement

Avoids overhead of creating connections

2. Resource Optimization

Limits number of active connections

3. Faster Response Time

Immediate connection availability

4. Scalability

Handles large number of users efficiently

Example

Web apps handling thousands of requests per second

42. What is Optimistic Locking?

Optimistic locking assumes conflicts are rare and checks for conflicts only at commit time.

How It Works

Read data
Modify data
Before commit → check if data changed

Implementation

Version number / timestamp

Advantages

No locking overhead
High concurrency

Disadvantages

Retry required on conflict

43. What is Pessimistic Locking?

Pessimistic locking assumes conflicts are likely, so it locks data before access.

How It Works

Lock row before reading/writing
Prevent others from accessing

Advantages

Prevents conflicts completely

Disadvantages

Reduced concurrency
Possible deadlocks

44. Difference Between Optimistic and Pessimistic Locking

Feature	Optimistic Locking	Pessimistic Locking
Assumption	Conflicts rare	Conflicts common
Locking	No upfront lock	Locks before access
Performance	High	Lower
Use Case	Read-heavy systems	Write-heavy systems

45. What is Polyglot Persistence?

Polyglot persistence is the practice of using multiple database technologies within a single application.

Why Use It?

Different databases for different needs

Example

SQL → transactions
NoSQL → caching/logging

Benefits

Flexibility
Better performance

46. What are NoSQL Databases?

NoSQL databases are non-relational databases designed for scalability and flexibility.

Key Features

Schema-less
Distributed
High performance

Use Cases

Big data
Real-time applications

47. Types of NoSQL Databases

1. Key-Value Stores

Simple key-value pairs
Example: Redis

2. Document Databases

JSON-like documents
Example: MongoDB

3. Column-Family Databases

Column-based storage
Example: Cassandra

4. Graph Databases

Focus on relationships
Example: Neo4j

48. Difference Between SQL and NoSQL Databases

Feature	SQL	NoSQL
Structure	Relational (tables)	Non-relational
Schema	Fixed	Flexible
Scalability	Vertical	Horizontal
Consistency	Strong (ACID)	Eventual (BASE)

49. What is NewSQL?

NewSQL databases combine ACID guarantees of SQL with scalability of NoSQL.

Goal

High performance + strong consistency

Key Features

Distributed architecture
SQL support
High throughput

Examples

Google Spanner
CockroachDB

50. What are In-Memory Databases?

In-memory databases (IMDBs) store data primarily in RAM instead of disk, enabling extremely fast access.

Key Characteristics

Data resides in memory
Minimal disk I/O
Ultra-low latency

Examples

Redis, SAP HANA

Advantages

Very high performance
Real-time processing

Disadvantages

Limited by RAM size
Higher cost

51. How Do Databases Handle Very Large Datasets?

Techniques

1. Partitioning

Split data into smaller chunks

2. Sharding

Distribute across multiple servers

3. Indexing

Speed up data retrieval

4. Compression

Reduce storage usage

5. Distributed Systems

Scale horizontally

6. Data Archiving

Move old data to cheaper storage

52. What is Horizontal Scaling?

Horizontal scaling (scale-out) means adding more machines to handle increased load.

Example

Add more database servers

Advantages

High scalability
Fault tolerance

53. What is Vertical Scaling?

Vertical scaling (scale-up) means increasing resources of a single machine.

Example

Add more CPU, RAM

Advantages

Simpler implementation

Limitations

Hardware limits
Expensive

54. What are the Challenges of Big Data with RDBMS?

1. Scalability Issues

Difficult to scale horizontally

2. Rigid Schema

Not suitable for unstructured data

3. Performance Bottlenecks

Joins on large datasets are slow

4. Cost

High infrastructure cost

5. Limited Flexibility

Hard to adapt to changing data

55. How is Database Security Implemented?

Key Techniques

1. Authentication

Verify user identity

2. Authorization

Control access permissions

3. Encryption

Protect data

4. Auditing

Track user actions

5. Firewalls

Prevent unauthorized access

56. What is Data Encryption at Rest and in Transit?

Encryption at Rest

Definition

Data is encrypted when stored on disk

Purpose

Protect against data theft

Encryption in Transit

Definition

Data is encrypted during transfer

Example

HTTPS, SSL/TLS

Key Insight

At rest → storage protection
In transit → communication protection

57. What is Data Masking?

Definition

Data masking hides sensitive data by replacing it with fake or anonymized values.

Types

Static masking
Dynamic masking

Example

Credit card: 1234-XXXX-XXXX-5678

Use Case

Testing environments
Privacy protection

58. How Does DBMS Support Multi-Tenancy?

Multi-tenancy allows multiple users (tenants) to share the same database system.

Models

1. Shared Database, Shared Schema

All tenants share tables

2. Shared Database, Separate Schema

Each tenant has separate schema

3. Separate Database

Each tenant has its own database

Benefits

Cost efficiency
Scalability

59. What is GDPR and Its Impact on Database Design?

GDPR (General Data Protection Regulation) is a data privacy law in the European Union.

Key Principles

Data minimization
Consent
Right to be forgotten
Data portability

Impact on DB Design

1. Data Encryption

Mandatory for sensitive data

2. Data Access Control

Strict permissions

3. Audit Logs

Track data usage

4. Data Deletion

Support user data removal

60. What are Emerging Trends in Database Technologies?

1. Cloud Databases

Managed services (AWS RDS, Azure SQL)

2. Serverless Databases

Auto-scaling, no infrastructure management

3. NewSQL

SQL + scalability

4. AI-Driven Databases

Self-optimizing systems

5. Multi-Model Databases

Support multiple data models

6. Edge Databases

Data processing near source

7. Blockchain Databases

Immutable, decentralized storage