1. What is Two-Phase Locking (2PL)?

Two-Phase Locking (2PL) is a concurrency control protocol that ensures serializability by dividing transaction execution into two phases.

Phases

1. Growing Phase
  • Transaction acquires locks
  • Cannot release any lock

2. Shrinking Phase

  • Transaction releases locks
  • Cannot acquire new locks

Key Rule

  • Once a lock is released → no new locks can be acquired

Types of 2PL

  • Strict 2PL → Release locks only after commit
  • Rigorous 2PL → Hold all locks until commit

2. How Does 2PL Prevent Concurrency Issues?

Goal

Ensure serializable execution of transactions.

Prevents

  • Dirty reads
  • Non-repeatable reads
  • Lost updates

How It Works

  • Locks ensure exclusive access to data
  • Enforces order of operations

Limitation

  • Can lead to deadlocks

3. What is Multiversion Concurrency Control (MVCC)?

MVCC allows multiple versions of data so that readers do not block writers and writers do not block readers.

Key Idea

  • Maintain multiple versions of rows
  • Each transaction sees a consistent snapshot

Benefits

  • High concurrency
  • Reduced locking

Example

  • Transaction reads old version while another writes new version

4. Difference Between MVCC and Locking-Based Concurrency Control

FeatureMVCCLocking-Based Control
BlockingMinimalHigh
PerformanceBetter for readsSlower under contention
ComplexityHigherSimpler
Data VersionsMultiple versionsSingle version

Key Insight

  • MVCC → Optimized for read-heavy systems
  • Locking → Simpler but may block transactions

5. What is Write-Ahead Logging (WAL)?

WAL is a logging technique where changes are written to a log before being applied to the database.

Key Principle

  • Log first → then write to disk

Why Important

  • Ensures durability and recoverability

Components

  • Before image (old value)
  • After image (new value)

Key Rule

  • No data modification without corresponding log entry

6. What is ARIES Recovery Algorithm?

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is a recovery algorithm used in DBMS for crash recovery.

Phases

1. Analysis Phase

  • Identify active transactions

2. Redo Phase

  • Reapply committed changes

3. Undo Phase

  • Rollback uncommitted transactions

Key Features

  • Uses WAL
  • Supports fine-grained recovery

7. Difference Between B-Tree and B+ Tree

FeatureB-TreeB+ Tree
Data StorageInternal + leaf nodesOnly leaf nodes
Search EfficiencySlightly slowerFaster
Range QueriesLess efficientHighly efficient
StructureComplexSimpler

8. Why is B+ Tree Preferred in Databases?

Reasons

1. Efficient Range Queries

  • Leaf nodes are linked

2. Better Disk I/O

  • More keys per node → fewer disk accesses

3. Faster Search

  • Uniform access path

4. Sequential Access

  • Ideal for ordered data retrieval

Conclusion

  • Optimized for database storage and indexing

9. What is Hashing in DBMS?

Hashing is a technique used to map data to a fixed location using a hash function.

Key Idea

  • Key → Hash function → Bucket

Purpose

  • Fast data retrieval

Example

  • ID → hash(ID) → storage location

10. What are Static and Dynamic Hashing?

1. Static Hashing

Definition

  • Fixed number of buckets

Characteristics

  • Simple
  • Performance degrades when data grows

Problem

  • Overflow handling required

2. Dynamic Hashing

Definition

  • Number of buckets can grow or shrink dynamically

Types

  • Extendible hashing
  • Linear hashing

Advantages

  • Better scalability
  • Reduced overflow

Comparison

FeatureStatic HashingDynamic Hashing
Bucket SizeFixedDynamic
ScalabilityPoorHigh
PerformanceDegrades over timeStable

11. What is Extendible Hashing?

Extendible hashing is a dynamic hashing technique that grows and shrinks dynamically using a directory structure.

Key Components

  • Directory (in memory)
  • Buckets (on disk)
  • Hash function

Concept

  • Uses prefix bits of hash value
  • Directory size doubles when needed

How It Works

  1. Compute hash(key)
  2. Use prefix bits to locate bucket
  3. If bucket overflows → split bucket
  4. If needed → double directory

Advantages

  • No overflow chains
  • Efficient lookup

12. What is Linear Hashing?

Linear hashing is a dynamic hashing technique that splits buckets gradually over time.

Key Idea

  • No directory required
  • Buckets split in linear order

How It Works

  1. Start with fixed buckets
  2. When overflow occurs → split next bucket
  3. Hash function adjusts gradually

Advantages

  • Simple implementation
  • Smooth growth

Comparison with Extendible Hashing

FeatureExtendible HashingLinear Hashing
DirectoryYesNo
GrowthSuddenGradual

13. What is the CAP Theorem?

The CAP theorem states that a distributed system can provide only two out of three guarantees:

C – Consistency

  • All nodes see the same data

A – Availability

  • System always responds

P – Partition Tolerance

  • System continues despite network failures

Key Rule

  • Cannot achieve all three simultaneously

14. How Does CAP Theorem Affect Database Design?

Design Trade-offs

1. CP Systems

  • Consistency + Partition tolerance
  • Example: Banking systems

2. AP Systems

  • Availability + Partition tolerance
  • Example: Social media

3. CA Systems

  • Consistency + Availability (no partitions assumed)

Impact

  • Choose based on use case:
    • Financial systems → Consistency
    • Web apps → Availability

15. What is BASE vs ACID?

ACID (Traditional DBMS)

  • Atomicity
  • Consistency
  • Isolation
  • Durability

BASE (Distributed Systems)

Definition

BASE stands for:

  • Basically Available
  • Soft State
  • Eventually Consistent

Comparison

FeatureACIDBASE
ConsistencyStrongEventual
AvailabilityLowerHigh
Use CaseBankingDistributed systems

Key Insight

  • ACID → Strong correctness
  • BASE → High scalability

16. What are Distributed Databases?

A distributed database is a database where data is stored across multiple physical locations.

Key Features

  • Data distributed across nodes
  • Appears as a single system

Advantages

  • Scalability
  • Fault tolerance
  • Performance

17. Types of Distributed Database Architectures

1. Client-Server Architecture

  • Clients request data from servers

2. Peer-to-Peer (P2P)

  • All nodes are equal

3. Federated Database

  • Multiple independent databases

4. Shared-Nothing Architecture

  • Each node has its own resources

18. How Do Distributed Databases Handle Consistency?

Techniques

1. Replication

  • Keep copies synchronized

2. Consensus Algorithms

  • Paxos, Raft

3. Quorum-Based Systems

  • Majority agreement

4. Eventual Consistency

  • Data becomes consistent over time

Key Insight

  • Trade-off between consistency and availability

19. What is Two-Phase Commit (2PC)?

2PC is a protocol to ensure all nodes either commit or abort a transaction together.

Phases

1. Prepare Phase

  • Coordinator asks participants: “Can you commit?”

2. Commit Phase

  • If all agree → commit
  • Else → rollback

Advantages

  • Ensures atomicity

Disadvantages

  • Blocking protocol
  • Slow in distributed systems

20. What is Three-Phase Commit (3PC)?

Definition

3PC is an improved version of 2PC that reduces blocking.

Phases

1. CanCommit Phase

  • Ask if participants are ready

2. PreCommit Phase

  • Prepare to commit

3. Commit Phase

  • Final commit

Advantages

  • Non-blocking
  • Better fault tolerance

Disadvantages

  • More complex
  • Higher overhead 

21. What is Database Mirroring?

Database mirroring is a technique where a real-time copy of a database is maintained on another server (mirror server).

Key Components

  • Principal Server → Main database
  • Mirror Server → Exact copy
  • (Optional) Witness Server → Enables automatic failover

Working

  • Changes on the principal are immediately applied to the mirror

Modes

  • High Safety (Synchronous) → No data loss
  • High Performance (Asynchronous) → Faster but possible data loss

Use Case

  • High availability systems requiring quick failover

22. Difference Between Mirroring and Replication

FeatureMirroringReplication
PurposeHigh availabilityData distribution
Data SyncReal-timeCan be delayed
Number of CopiesUsually one mirrorMultiple replicas
Read AccessLimitedAvailable on replicas

Key Insight

  • Mirroring → Backup system
  • Replication → Scaling and distribution

23. What is High Availability in DBMS?

Definition

High availability (HA) ensures that the database is accessible with minimal downtime.

Goal

  • Near 100% uptime

Techniques

  • Replication
  • Clustering
  • Failover systems
  • Load balancing

Example

  • If one server fails → another takes over instantly

24. What is Fault Tolerance?

Fault tolerance is the ability of a system to continue operating even when failures occur.

Key Idea

  • System does not stop working

Methods

  • Redundant hardware
  • Data replication
  • Backup systems

Difference from HA

  • HA → minimize downtime
  • Fault tolerance → no interruption

25. What is Disaster Recovery?

Disaster recovery (DR) is the process of restoring data and systems after catastrophic failure.

Examples of Disasters

  • Data center failure
  • Cyber attack
  • Natural disaster

Key Metrics

  • RPO (Recovery Point Objective) → Data loss tolerance
  • RTO (Recovery Time Objective) → Recovery time

Techniques

  • Backups
  • Replication
  • Offsite storage

26. Types of Database Backups

1. Full Backup

  • Entire database

2. Incremental Backup

  • Changes since last backup

3. Differential Backup

  • Changes since last full backup

4. Transaction Log Backup

  • Captures transaction logs

27. What is Full Backup?

A full backup is a complete copy of the entire database.

Advantages

  • Simple recovery
  • No dependency on other backups

Disadvantages

  • Time-consuming
  • Requires large storage

Use Case

  • Periodic backup (e.g., weekly)

28. What is Incremental Backup?

An incremental backup stores only the changes made since the last backup (full or incremental).

Advantages

  • Faster
  • Less storage

Disadvantages

  • Recovery is complex (needs all increments)

Example

  • Day 1 → Full backup
  • Day 2 → Incremental (changes from Day 1)
  • Day 3 → Incremental (changes from Day 2)

29. What is Differential Backup?

A differential backup stores all changes made since the last full backup.

Advantages

  • Faster recovery than incremental
  • Simpler restoration

Disadvantages

  • Larger than incremental backups

Example

  • Day 1 → Full backup
  • Day 2 → Changes since Day 1
  • Day 3 → All changes since Day 1 

30. What is Online vs Offline Backup?

Online Backup (Hot Backup)

Definition

Backup taken while the database is running and accessible.

Key Features

  • No downtime
  • Supports continuous operations

Advantages

  • High availability
  • Suitable for production systems

Offline Backup (Cold Backup)

Definition

Backup taken when the database is shut down.

Key Features

  • Database unavailable during backup

Advantages

  • Simpler
  • Consistent backup

Comparison

FeatureOnline BackupOffline Backup
AvailabilityAvailableNot available
ComplexityHighLow
Use CaseProduction systemsMaintenance windows

31. What is a Query Optimizer?

A query optimizer is a component of the DBMS that determines the most efficient way to execute a query.

Goal

  • Minimize execution time
  • Reduce resource usage

Types

  • Rule-based optimizer
  • Cost-based optimizer

32. How Does a Cost-Based Optimizer Work?

Definition

A cost-based optimizer (CBO) evaluates multiple query execution plans and selects the one with the lowest estimated cost.

Steps

1. Generate Plans

  • Different join orders, access paths

2. Estimate Cost

  • Based on:
    • Disk I/O
    • CPU usage
    • Memory

3. Choose Best Plan

  • Minimum cost execution plan selected

Example

  • Index scan vs full table scan → choose cheaper option

33. What are Join Algorithms?

Join algorithms are methods used to combine rows from multiple tables efficiently.

Types

  • Nested Loop Join
  • Hash Join
  • Merge Join

34. What is Nested Loop Join?

A nested loop join compares each row of one table with every row of another.

Working

Advantages

  • Simple
  • Works for all join conditions

Disadvantages

  • Slow for large datasets

35. What is Hash Join?

A hash join uses a hash table to match rows between two tables.

Working

  1. Build hash table on smaller table
  2. Probe with larger table

Advantages

  • Efficient for large datasets
  • Works well for equality joins

Disadvantages

  • Requires memory
  • Not suitable for non-equality joins

36. What is Merge Join?

A merge join combines sorted tables by scanning them simultaneously.

Working

  • Both tables must be sorted
  • Merge step compares rows sequentially

Advantages

  • Very efficient for sorted data

Disadvantages

  • Requires sorting overhead

37. Difference Between Hash Join and Merge Join

FeatureHash JoinMerge Join
RequirementNo sorting neededRequires sorted data
Best ForEquality joinsLarge sorted datasets
MemoryHighModerate
SpeedFastVery fast if sorted

38. What is Full-Text Indexing?

Full-text indexing is a technique for searching text data efficiently.

Key Features

  • Supports keyword search
  • Handles large text fields

Capabilities

  • Phrase search
  • Ranking results
  • Language processing

Example

Search:

39. What is Database Tuning?

Database tuning is the process of optimizing database performance.

Goals

  • Improve query speed
  • Reduce resource usage

Techniques

1. Index Optimization

  • Add/remove indexes

2. Query Optimization

  • Rewrite inefficient queries

3. Schema Design

  • Normalize/denormalize

4. Hardware Tuning

  • Memory, CPU, storage

5. Configuration Tuning

  • Buffer size, cache

Key Insight

  • Continuous process based on workload

40. What is Connection Pooling?

Connection pooling is a technique where a pool of reusable database connections is maintained instead of creating a new connection for every request.

How It Works

  1. Application requests a connection
  2. Pool provides an existing connection
  3. After use → connection is returned to pool

Key Idea

  • Reuse connections instead of creating/destroying repeatedly

41. Why is Connection Pooling Important?

1. Performance Improvement

  • Avoids overhead of creating connections

2. Resource Optimization

  • Limits number of active connections

3. Faster Response Time

  • Immediate connection availability

4. Scalability

  • Handles large number of users efficiently

Example

Web apps handling thousands of requests per second

42. What is Optimistic Locking?

Optimistic locking assumes conflicts are rare and checks for conflicts only at commit time.

How It Works

  • Read data
  • Modify data
  • Before commit → check if data changed

Implementation

  • Version number / timestamp

Advantages

  • No locking overhead
  • High concurrency

Disadvantages

  • Retry required on conflict

43. What is Pessimistic Locking?

Pessimistic locking assumes conflicts are likely, so it locks data before access.

How It Works

  • Lock row before reading/writing
  • Prevent others from accessing

Advantages

  • Prevents conflicts completely

Disadvantages

  • Reduced concurrency
  • Possible deadlocks

44. Difference Between Optimistic and Pessimistic Locking

FeatureOptimistic LockingPessimistic Locking
AssumptionConflicts rareConflicts common
LockingNo upfront lockLocks before access
PerformanceHighLower
Use CaseRead-heavy systemsWrite-heavy systems

45. What is Polyglot Persistence?

Polyglot persistence is the practice of using multiple database technologies within a single application.

Why Use It?

  • Different databases for different needs

Example

  • SQL → transactions
  • NoSQL → caching/logging

Benefits

  • Flexibility
  • Better performance

46. What are NoSQL Databases?

NoSQL databases are non-relational databases designed for scalability and flexibility.

Key Features

  • Schema-less
  • Distributed
  • High performance

Use Cases

  • Big data
  • Real-time applications

47. Types of NoSQL Databases

1. Key-Value Stores

  • Simple key-value pairs
  • Example: Redis

2. Document Databases

  • JSON-like documents
  • Example: MongoDB

3. Column-Family Databases

  • Column-based storage
  • Example: Cassandra

4. Graph Databases

  • Focus on relationships
  • Example: Neo4j

48. Difference Between SQL and NoSQL Databases

FeatureSQLNoSQL
StructureRelational (tables)Non-relational
SchemaFixedFlexible
ScalabilityVerticalHorizontal
ConsistencyStrong (ACID)Eventual (BASE)

49. What is NewSQL?

NewSQL databases combine ACID guarantees of SQL with scalability of NoSQL.

Goal

  • High performance + strong consistency

Key Features

  • Distributed architecture
  • SQL support
  • High throughput

Examples

  • Google Spanner
  • CockroachDB

50. What are In-Memory Databases?

In-memory databases (IMDBs) store data primarily in RAM instead of disk, enabling extremely fast access.

Key Characteristics

  • Data resides in memory
  • Minimal disk I/O
  • Ultra-low latency

Examples

  • Redis, SAP HANA

Advantages

  • Very high performance
  • Real-time processing

Disadvantages

  • Limited by RAM size
  • Higher cost

51. How Do Databases Handle Very Large Datasets?

Techniques

1. Partitioning

  • Split data into smaller chunks

2. Sharding

  • Distribute across multiple servers

3. Indexing

  • Speed up data retrieval

4. Compression

  • Reduce storage usage

5. Distributed Systems

  • Scale horizontally

6. Data Archiving

  • Move old data to cheaper storage

52. What is Horizontal Scaling?

Horizontal scaling (scale-out) means adding more machines to handle increased load.

Example

  • Add more database servers

Advantages

  • High scalability
  • Fault tolerance

53. What is Vertical Scaling?

Vertical scaling (scale-up) means increasing resources of a single machine.

Example

  • Add more CPU, RAM

Advantages

  • Simpler implementation

Limitations

  • Hardware limits
  • Expensive

54. What are the Challenges of Big Data with RDBMS?

1. Scalability Issues

  • Difficult to scale horizontally

2. Rigid Schema

  • Not suitable for unstructured data

3. Performance Bottlenecks

  • Joins on large datasets are slow

4. Cost

  • High infrastructure cost

5. Limited Flexibility

  • Hard to adapt to changing data

55. How is Database Security Implemented?

Key Techniques

1. Authentication

  • Verify user identity

2. Authorization

  • Control access permissions

3. Encryption

  • Protect data

4. Auditing

  • Track user actions

5. Firewalls

  • Prevent unauthorized access

56. What is Data Encryption at Rest and in Transit?

Encryption at Rest

Definition

  • Data is encrypted when stored on disk

Purpose

  • Protect against data theft

Encryption in Transit

Definition

  • Data is encrypted during transfer

Example

  • HTTPS, SSL/TLS

Key Insight

  • At rest → storage protection
  • In transit → communication protection

57. What is Data Masking?

Definition

Data masking hides sensitive data by replacing it with fake or anonymized values.

Types

  • Static masking
  • Dynamic masking

Example

  • Credit card: 1234-XXXX-XXXX-5678

Use Case

  • Testing environments
  • Privacy protection

58. How Does DBMS Support Multi-Tenancy?

Multi-tenancy allows multiple users (tenants) to share the same database system.

Models

1. Shared Database, Shared Schema

  • All tenants share tables

2. Shared Database, Separate Schema

  • Each tenant has separate schema

3. Separate Database

  • Each tenant has its own database

Benefits

  • Cost efficiency
  • Scalability

59. What is GDPR and Its Impact on Database Design?

GDPR (General Data Protection Regulation) is a data privacy law in the European Union.

Key Principles

  • Data minimization
  • Consent
  • Right to be forgotten
  • Data portability

Impact on DB Design

1. Data Encryption

  • Mandatory for sensitive data

2. Data Access Control

  • Strict permissions

3. Audit Logs

  • Track data usage

4. Data Deletion

  • Support user data removal

60. What are Emerging Trends in Database Technologies?

1. Cloud Databases

  • Managed services (AWS RDS, Azure SQL)

2. Serverless Databases

  • Auto-scaling, no infrastructure management

3. NewSQL

  • SQL + scalability

4. AI-Driven Databases

  • Self-optimizing systems

5. Multi-Model Databases

  • Support multiple data models

6. Edge Databases

  • Data processing near source

7. Blockchain Databases

  • Immutable, decentralized storage