1. What is Two-Phase Locking (2PL)?
Two-Phase Locking (2PL) is a concurrency control protocol that ensures serializability by dividing transaction execution into two phases.
Phases
- Transaction acquires locks
- Cannot release any lock
2. Shrinking Phase
- Transaction releases locks
- Cannot acquire new locks
Key Rule
- Once a lock is released → no new locks can be acquired
Types of 2PL
- Strict 2PL → Release locks only after commit
- Rigorous 2PL → Hold all locks until commit
2. How Does 2PL Prevent Concurrency Issues?
Goal
Ensure serializable execution of transactions.
Prevents
- Dirty reads
- Non-repeatable reads
- Lost updates
How It Works
- Locks ensure exclusive access to data
- Enforces order of operations
Limitation
- Can lead to deadlocks
3. What is Multiversion Concurrency Control (MVCC)?
MVCC allows multiple versions of data so that readers do not block writers and writers do not block readers.
Key Idea
- Maintain multiple versions of rows
- Each transaction sees a consistent snapshot
Benefits
- High concurrency
- Reduced locking
Example
- Transaction reads old version while another writes new version
4. Difference Between MVCC and Locking-Based Concurrency Control
| Feature | MVCC | Locking-Based Control |
|---|---|---|
| Blocking | Minimal | High |
| Performance | Better for reads | Slower under contention |
| Complexity | Higher | Simpler |
| Data Versions | Multiple versions | Single version |
Key Insight
- MVCC → Optimized for read-heavy systems
- Locking → Simpler but may block transactions
5. What is Write-Ahead Logging (WAL)?
WAL is a logging technique where changes are written to a log before being applied to the database.
Key Principle
- Log first → then write to disk
Why Important
- Ensures durability and recoverability
Components
- Before image (old value)
- After image (new value)
Key Rule
- No data modification without corresponding log entry
6. What is ARIES Recovery Algorithm?
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is a recovery algorithm used in DBMS for crash recovery.
Phases
1. Analysis Phase
- Identify active transactions
2. Redo Phase
- Reapply committed changes
3. Undo Phase
- Rollback uncommitted transactions
Key Features
- Uses WAL
- Supports fine-grained recovery
7. Difference Between B-Tree and B+ Tree
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Storage | Internal + leaf nodes | Only leaf nodes |
| Search Efficiency | Slightly slower | Faster |
| Range Queries | Less efficient | Highly efficient |
| Structure | Complex | Simpler |
8. Why is B+ Tree Preferred in Databases?
Reasons
1. Efficient Range Queries
- Leaf nodes are linked
2. Better Disk I/O
- More keys per node → fewer disk accesses
3. Faster Search
- Uniform access path
4. Sequential Access
- Ideal for ordered data retrieval
Conclusion
- Optimized for database storage and indexing
9. What is Hashing in DBMS?
Hashing is a technique used to map data to a fixed location using a hash function.
Key Idea
- Key → Hash function → Bucket
Purpose
- Fast data retrieval
Example
- ID → hash(ID) → storage location
10. What are Static and Dynamic Hashing?
1. Static Hashing
Definition
- Fixed number of buckets
Characteristics
- Simple
- Performance degrades when data grows
Problem
- Overflow handling required
2. Dynamic Hashing
Definition
- Number of buckets can grow or shrink dynamically
Types
- Extendible hashing
- Linear hashing
Advantages
- Better scalability
- Reduced overflow
Comparison
| Feature | Static Hashing | Dynamic Hashing |
|---|---|---|
| Bucket Size | Fixed | Dynamic |
| Scalability | Poor | High |
| Performance | Degrades over time | Stable |
11. What is Extendible Hashing?
Extendible hashing is a dynamic hashing technique that grows and shrinks dynamically using a directory structure.
Key Components
- Directory (in memory)
- Buckets (on disk)
- Hash function
Concept
- Uses prefix bits of hash value
- Directory size doubles when needed
How It Works
- Compute hash(key)
- Use prefix bits to locate bucket
- If bucket overflows → split bucket
- If needed → double directory
Advantages
- No overflow chains
- Efficient lookup
12. What is Linear Hashing?
Linear hashing is a dynamic hashing technique that splits buckets gradually over time.
Key Idea
- No directory required
- Buckets split in linear order
How It Works
- Start with fixed buckets
- When overflow occurs → split next bucket
- Hash function adjusts gradually
Advantages
- Simple implementation
- Smooth growth
Comparison with Extendible Hashing
| Feature | Extendible Hashing | Linear Hashing |
|---|---|---|
| Directory | Yes | No |
| Growth | Sudden | Gradual |
13. What is the CAP Theorem?
The CAP theorem states that a distributed system can provide only two out of three guarantees:
C – Consistency
- All nodes see the same data
A – Availability
- System always responds
P – Partition Tolerance
- System continues despite network failures
Key Rule
- Cannot achieve all three simultaneously
14. How Does CAP Theorem Affect Database Design?
Design Trade-offs
1. CP Systems
- Consistency + Partition tolerance
- Example: Banking systems
2. AP Systems
- Availability + Partition tolerance
- Example: Social media
3. CA Systems
- Consistency + Availability (no partitions assumed)
Impact
- Choose based on use case:
- Financial systems → Consistency
- Web apps → Availability
15. What is BASE vs ACID?
ACID (Traditional DBMS)
- Atomicity
- Consistency
- Isolation
- Durability
BASE (Distributed Systems)
Definition
BASE stands for:
- Basically Available
- Soft State
- Eventually Consistent
Comparison
| Feature | ACID | BASE |
|---|---|---|
| Consistency | Strong | Eventual |
| Availability | Lower | High |
| Use Case | Banking | Distributed systems |
Key Insight
- ACID → Strong correctness
- BASE → High scalability
16. What are Distributed Databases?
A distributed database is a database where data is stored across multiple physical locations.
Key Features
- Data distributed across nodes
- Appears as a single system
Advantages
- Scalability
- Fault tolerance
- Performance
17. Types of Distributed Database Architectures
1. Client-Server Architecture
- Clients request data from servers
2. Peer-to-Peer (P2P)
- All nodes are equal
3. Federated Database
- Multiple independent databases
4. Shared-Nothing Architecture
- Each node has its own resources
18. How Do Distributed Databases Handle Consistency?
Techniques
1. Replication
- Keep copies synchronized
2. Consensus Algorithms
- Paxos, Raft
3. Quorum-Based Systems
- Majority agreement
4. Eventual Consistency
- Data becomes consistent over time
Key Insight
- Trade-off between consistency and availability
19. What is Two-Phase Commit (2PC)?
2PC is a protocol to ensure all nodes either commit or abort a transaction together.
Phases
1. Prepare Phase
- Coordinator asks participants: “Can you commit?”
2. Commit Phase
- If all agree → commit
- Else → rollback
Advantages
- Ensures atomicity
Disadvantages
- Blocking protocol
- Slow in distributed systems
20. What is Three-Phase Commit (3PC)?
Definition
3PC is an improved version of 2PC that reduces blocking.
Phases
1. CanCommit Phase
- Ask if participants are ready
2. PreCommit Phase
- Prepare to commit
3. Commit Phase
- Final commit
Advantages
- Non-blocking
- Better fault tolerance
Disadvantages
- More complex
- Higher overhead
21. What is Database Mirroring?
Database mirroring is a technique where a real-time copy of a database is maintained on another server (mirror server).
Key Components
- Principal Server → Main database
- Mirror Server → Exact copy
- (Optional) Witness Server → Enables automatic failover
Working
- Changes on the principal are immediately applied to the mirror
Modes
- High Safety (Synchronous) → No data loss
- High Performance (Asynchronous) → Faster but possible data loss
Use Case
- High availability systems requiring quick failover
22. Difference Between Mirroring and Replication
| Feature | Mirroring | Replication |
|---|---|---|
| Purpose | High availability | Data distribution |
| Data Sync | Real-time | Can be delayed |
| Number of Copies | Usually one mirror | Multiple replicas |
| Read Access | Limited | Available on replicas |
Key Insight
- Mirroring → Backup system
- Replication → Scaling and distribution
23. What is High Availability in DBMS?
Definition
High availability (HA) ensures that the database is accessible with minimal downtime.
Goal
- Near 100% uptime
Techniques
- Replication
- Clustering
- Failover systems
- Load balancing
Example
- If one server fails → another takes over instantly
24. What is Fault Tolerance?
Fault tolerance is the ability of a system to continue operating even when failures occur.
Key Idea
- System does not stop working
Methods
- Redundant hardware
- Data replication
- Backup systems
Difference from HA
- HA → minimize downtime
- Fault tolerance → no interruption
25. What is Disaster Recovery?
Disaster recovery (DR) is the process of restoring data and systems after catastrophic failure.
Examples of Disasters
- Data center failure
- Cyber attack
- Natural disaster
Key Metrics
- RPO (Recovery Point Objective) → Data loss tolerance
- RTO (Recovery Time Objective) → Recovery time
Techniques
- Backups
- Replication
- Offsite storage
26. Types of Database Backups
1. Full Backup
- Entire database
2. Incremental Backup
- Changes since last backup
3. Differential Backup
- Changes since last full backup
4. Transaction Log Backup
- Captures transaction logs
27. What is Full Backup?
A full backup is a complete copy of the entire database.
Advantages
- Simple recovery
- No dependency on other backups
Disadvantages
- Time-consuming
- Requires large storage
Use Case
- Periodic backup (e.g., weekly)
28. What is Incremental Backup?
An incremental backup stores only the changes made since the last backup (full or incremental).
Advantages
- Faster
- Less storage
Disadvantages
- Recovery is complex (needs all increments)
Example
- Day 1 → Full backup
- Day 2 → Incremental (changes from Day 1)
- Day 3 → Incremental (changes from Day 2)
29. What is Differential Backup?
A differential backup stores all changes made since the last full backup.
Advantages
- Faster recovery than incremental
- Simpler restoration
Disadvantages
- Larger than incremental backups
Example
- Day 1 → Full backup
- Day 2 → Changes since Day 1
- Day 3 → All changes since Day 1
30. What is Online vs Offline Backup?
Online Backup (Hot Backup)
Definition
Backup taken while the database is running and accessible.
Key Features
- No downtime
- Supports continuous operations
Advantages
- High availability
- Suitable for production systems
Offline Backup (Cold Backup)
Definition
Backup taken when the database is shut down.
Key Features
- Database unavailable during backup
Advantages
- Simpler
- Consistent backup
Comparison
| Feature | Online Backup | Offline Backup |
|---|---|---|
| Availability | Available | Not available |
| Complexity | High | Low |
| Use Case | Production systems | Maintenance windows |
31. What is a Query Optimizer?
A query optimizer is a component of the DBMS that determines the most efficient way to execute a query.
Goal
- Minimize execution time
- Reduce resource usage
Types
- Rule-based optimizer
- Cost-based optimizer
32. How Does a Cost-Based Optimizer Work?
Definition
A cost-based optimizer (CBO) evaluates multiple query execution plans and selects the one with the lowest estimated cost.
Steps
1. Generate Plans
- Different join orders, access paths
2. Estimate Cost
- Based on:
- Disk I/O
- CPU usage
- Memory
3. Choose Best Plan
- Minimum cost execution plan selected
Example
- Index scan vs full table scan → choose cheaper option
33. What are Join Algorithms?
Join algorithms are methods used to combine rows from multiple tables efficiently.
Types
- Nested Loop Join
- Hash Join
- Merge Join
34. What is Nested Loop Join?
A nested loop join compares each row of one table with every row of another.
Working
Advantages
- Simple
- Works for all join conditions
Disadvantages
- Slow for large datasets
35. What is Hash Join?
A hash join uses a hash table to match rows between two tables.
Working
- Build hash table on smaller table
- Probe with larger table
Advantages
- Efficient for large datasets
- Works well for equality joins
Disadvantages
- Requires memory
- Not suitable for non-equality joins
36. What is Merge Join?
A merge join combines sorted tables by scanning them simultaneously.
Working
- Both tables must be sorted
- Merge step compares rows sequentially
Advantages
- Very efficient for sorted data
Disadvantages
- Requires sorting overhead
37. Difference Between Hash Join and Merge Join
| Feature | Hash Join | Merge Join |
|---|---|---|
| Requirement | No sorting needed | Requires sorted data |
| Best For | Equality joins | Large sorted datasets |
| Memory | High | Moderate |
| Speed | Fast | Very fast if sorted |
38. What is Full-Text Indexing?
Full-text indexing is a technique for searching text data efficiently.
Key Features
- Supports keyword search
- Handles large text fields
Capabilities
- Phrase search
- Ranking results
- Language processing
Example
Search:
39. What is Database Tuning?
Database tuning is the process of optimizing database performance.
Goals
- Improve query speed
- Reduce resource usage
Techniques
1. Index Optimization
- Add/remove indexes
2. Query Optimization
- Rewrite inefficient queries
3. Schema Design
- Normalize/denormalize
4. Hardware Tuning
- Memory, CPU, storage
5. Configuration Tuning
- Buffer size, cache
Key Insight
- Continuous process based on workload
40. What is Connection Pooling?
Connection pooling is a technique where a pool of reusable database connections is maintained instead of creating a new connection for every request.
How It Works
- Application requests a connection
- Pool provides an existing connection
- After use → connection is returned to pool
Key Idea
- Reuse connections instead of creating/destroying repeatedly
41. Why is Connection Pooling Important?
1. Performance Improvement
- Avoids overhead of creating connections
2. Resource Optimization
- Limits number of active connections
3. Faster Response Time
- Immediate connection availability
4. Scalability
- Handles large number of users efficiently
Example
Web apps handling thousands of requests per second
42. What is Optimistic Locking?
Optimistic locking assumes conflicts are rare and checks for conflicts only at commit time.
How It Works
- Read data
- Modify data
- Before commit → check if data changed
Implementation
- Version number / timestamp
Advantages
- No locking overhead
- High concurrency
Disadvantages
- Retry required on conflict
43. What is Pessimistic Locking?
Pessimistic locking assumes conflicts are likely, so it locks data before access.
How It Works
- Lock row before reading/writing
- Prevent others from accessing
Advantages
- Prevents conflicts completely
Disadvantages
- Reduced concurrency
- Possible deadlocks
44. Difference Between Optimistic and Pessimistic Locking
| Feature | Optimistic Locking | Pessimistic Locking |
|---|---|---|
| Assumption | Conflicts rare | Conflicts common |
| Locking | No upfront lock | Locks before access |
| Performance | High | Lower |
| Use Case | Read-heavy systems | Write-heavy systems |
45. What is Polyglot Persistence?
Polyglot persistence is the practice of using multiple database technologies within a single application.
Why Use It?
- Different databases for different needs
Example
- SQL → transactions
- NoSQL → caching/logging
Benefits
- Flexibility
- Better performance
46. What are NoSQL Databases?
NoSQL databases are non-relational databases designed for scalability and flexibility.
Key Features
- Schema-less
- Distributed
- High performance
Use Cases
- Big data
- Real-time applications
47. Types of NoSQL Databases
1. Key-Value Stores
- Simple key-value pairs
- Example: Redis
2. Document Databases
- JSON-like documents
- Example: MongoDB
3. Column-Family Databases
- Column-based storage
- Example: Cassandra
4. Graph Databases
- Focus on relationships
- Example: Neo4j
48. Difference Between SQL and NoSQL Databases
| Feature | SQL | NoSQL |
|---|---|---|
| Structure | Relational (tables) | Non-relational |
| Schema | Fixed | Flexible |
| Scalability | Vertical | Horizontal |
| Consistency | Strong (ACID) | Eventual (BASE) |
49. What is NewSQL?
NewSQL databases combine ACID guarantees of SQL with scalability of NoSQL.
Goal
- High performance + strong consistency
Key Features
- Distributed architecture
- SQL support
- High throughput
Examples
- Google Spanner
- CockroachDB
50. What are In-Memory Databases?
In-memory databases (IMDBs) store data primarily in RAM instead of disk, enabling extremely fast access.
Key Characteristics
- Data resides in memory
- Minimal disk I/O
- Ultra-low latency
Examples
- Redis, SAP HANA
Advantages
- Very high performance
- Real-time processing
Disadvantages
- Limited by RAM size
- Higher cost
51. How Do Databases Handle Very Large Datasets?
Techniques
1. Partitioning
- Split data into smaller chunks
2. Sharding
- Distribute across multiple servers
3. Indexing
- Speed up data retrieval
4. Compression
- Reduce storage usage
5. Distributed Systems
- Scale horizontally
6. Data Archiving
- Move old data to cheaper storage
52. What is Horizontal Scaling?
Horizontal scaling (scale-out) means adding more machines to handle increased load.
Example
- Add more database servers
Advantages
- High scalability
- Fault tolerance
53. What is Vertical Scaling?
Vertical scaling (scale-up) means increasing resources of a single machine.
Example
- Add more CPU, RAM
Advantages
- Simpler implementation
Limitations
- Hardware limits
- Expensive
54. What are the Challenges of Big Data with RDBMS?
1. Scalability Issues
- Difficult to scale horizontally
2. Rigid Schema
- Not suitable for unstructured data
3. Performance Bottlenecks
- Joins on large datasets are slow
4. Cost
- High infrastructure cost
5. Limited Flexibility
- Hard to adapt to changing data
55. How is Database Security Implemented?
Key Techniques
1. Authentication
- Verify user identity
2. Authorization
- Control access permissions
3. Encryption
- Protect data
4. Auditing
- Track user actions
5. Firewalls
- Prevent unauthorized access
56. What is Data Encryption at Rest and in Transit?
Encryption at Rest
Definition
- Data is encrypted when stored on disk
Purpose
- Protect against data theft
Encryption in Transit
Definition
- Data is encrypted during transfer
Example
- HTTPS, SSL/TLS
Key Insight
- At rest → storage protection
- In transit → communication protection
57. What is Data Masking?
Definition
Data masking hides sensitive data by replacing it with fake or anonymized values.
Types
- Static masking
- Dynamic masking
Example
- Credit card: 1234-XXXX-XXXX-5678
Use Case
- Testing environments
- Privacy protection
58. How Does DBMS Support Multi-Tenancy?
Multi-tenancy allows multiple users (tenants) to share the same database system.
Models
1. Shared Database, Shared Schema
- All tenants share tables
2. Shared Database, Separate Schema
- Each tenant has separate schema
3. Separate Database
- Each tenant has its own database
Benefits
- Cost efficiency
- Scalability
59. What is GDPR and Its Impact on Database Design?
GDPR (General Data Protection Regulation) is a data privacy law in the European Union.
Key Principles
- Data minimization
- Consent
- Right to be forgotten
- Data portability
Impact on DB Design
1. Data Encryption
- Mandatory for sensitive data
2. Data Access Control
- Strict permissions
3. Audit Logs
- Track data usage
4. Data Deletion
- Support user data removal
60. What are Emerging Trends in Database Technologies?
1. Cloud Databases
- Managed services (AWS RDS, Azure SQL)
2. Serverless Databases
- Auto-scaling, no infrastructure management
3. NewSQL
- SQL + scalability
4. AI-Driven Databases
- Self-optimizing systems
5. Multi-Model Databases
- Support multiple data models
6. Edge Databases
- Data processing near source
7. Blockchain Databases
- Immutable, decentralized storage