1. What is a Transaction in DBMS?
A transaction is a sequence of one or more database operations executed as a single logical unit of work.
Key Idea
- Either all operations succeed or none are applied
Characteristics
- Begins with
BEGIN/START TRANSACTION - Ends with
COMMITorROLLBACK
Example (Bank Transfer)
Key Insight
- Ensures data consistency and reliability
2. What are the ACID Properties of a Transaction?
ACID ensures reliable transaction processing.
A – Atomicity
- All operations execute completely or not at all
- Failure → rollback
C – Consistency
- Database moves from one valid state to another
- Constraints must be maintained
I – Isolation
- Transactions run independently
- No interference from other transactions
D – Durability
- Once committed → changes are permanent
- Survive system crash
Example
Bank transfer must either:
- Deduct + add → success
- OR → rollback fully
3. What is Transaction Management?
Transaction management is the process of handling transactions to ensure ACID properties.
Key Responsibilities
- Start and end transactions
- Maintain consistency
- Handle failures and recovery
Components
- Transaction Manager
- Log Manager
- Recovery Manager
Goal
- Ensure safe execution of concurrent transactions
4. What is Concurrency Control?
Concurrency control ensures that multiple transactions execute simultaneously without conflict.
Purpose
- Maintain data consistency
- Avoid anomalies
Example
Two users updating the same bank balance simultaneously
5. Why is Concurrency Control Required?
Problems Without It
1. Lost Update
- One transaction overwrites another
2. Dirty Read
- Reading uncommitted data
3. Non-Repeatable Read
- Same query returns different results
4. Phantom Read
- New rows appear during execution
Conclusion
- Ensures correct and predictable results
6. What are the Different Types of Locks in DBMS?
Locks are used to control access to data during transactions.
Main Types
1. Shared Lock (S Lock)
- Read-only access
2. Exclusive Lock (X Lock)
- Read + write access
Other Lock Types
- Binary Lock (Locked/Unlocked)
- Intent Locks (for hierarchy)
- Update Lock (intermediate lock)
7. What is a Shared Lock?
Definition
A shared lock (S lock) allows multiple transactions to read a resource simultaneously.
Key Properties
- No modification allowed
- Multiple readers allowed
Example
- Many users viewing the same data
8. What is an Exclusive Lock?
An exclusive lock (X lock) allows a transaction to read and modify data.
Key Properties
- Only one transaction allowed
- Blocks both reads and writes from others
Example
- Updating account balance
9. What is a Deadlock in DBMS?
A deadlock occurs when two or more transactions are waiting indefinitely for each other’s resources.
Condition
- Circular waiting
Example
- T1 locks A, waits for B
- T2 locks B, waits for A
→ Deadlock occurs
10. How Can Deadlocks be Prevented?
1. Lock Ordering
- Acquire locks in a fixed order
2. Timeout
- Abort transaction if it waits too long
3. Deadlock Prevention Algorithms
Wait-Die
- Older transaction waits, younger aborts
Wound-Wait
- Older transaction aborts younger
4. Resource Allocation Strategy
- Avoid circular wait
5. Detection & Recovery
- Detect deadlock → rollback one transaction
11. What is Deadlock Detection?
Deadlock detection is a technique used by DBMS to identify deadlocks after they occur.
How It Works
- The system maintains a Wait-For Graph (WFG)
- Nodes → Transactions
- Edges → Waiting relationships
👉 If a cycle exists → deadlock detected
Steps
- Build wait-for graph
- Check for cycles
- If found → deadlock exists
Resolution
- Abort one or more transactions
- Release locks
Key Insight
- Detection = Allow deadlock, then fix it
12. What is Deadlock Prevention?
Deadlock prevention ensures that deadlocks never occur by restricting system behavior.
Approach
Break at least one of the four necessary conditions of deadlock:
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
Techniques
1. Resource Ordering
- Request resources in fixed order
2. Preemption
- Force release of resources
3. No Hold and Wait
- Request all resources at once
Key Insight
- Prevention = Avoid possibility of deadlock
14. What is Deadlock Avoidance?
Deadlock avoidance ensures the system never enters an unsafe state.
Concept
- Before granting a request → check if it's safe
Safe State
- System can complete all transactions without deadlock
Algorithm
- Banker’s Algorithm
Key Idea
- Grant request only if:
→ System remains in safe state
Difference from Prevention
| Prevention | Avoidance |
|---|---|
| Restrictive | Dynamic decision |
| No deadlocks possible | Avoid unsafe states |
15. What is a Transaction Log?
A transaction log is a file that records all changes made to the database.
Contents
- Transaction start
- Data changes (before/after values)
- Commit / rollback status
Purpose
- Recovery after crash
- Maintain durability
16. What is Recovery Management?
Definition
Recovery management ensures the database is restored to a consistent state after failure.
Types of Failures
- System crash
- Power failure
- Disk failure
Techniques
1. Log-Based Recovery
- Uses transaction logs
2. Shadow Paging
- Maintain old + new copies
Goal
- Ensure Atomicity + Durability
17. What is a Checkpoint in DBMS?
Definition
A checkpoint is a mechanism to save the current state of the database to reduce recovery time.
How It Works
- Flush all changes to disk
- Mark a checkpoint in log
Benefits
- Faster recovery
- Reduce log scanning
Example
Instead of replaying entire log → start from checkpoint
18. What is Durability in DBMS?
Definition
Durability ensures that once a transaction is committed, its changes are permanently stored.
Key Points
- Survives system crash
- Stored in disk/log
Example
- Money transferred → must not disappear
Ensured By
- Logs
- Checkpoints
- Stable storage
19. What is Isolation in DBMS?
Isolation ensures that transactions execute independently without interference.
Goal
- Prevent data inconsistency
Example
Two users updating same account:
- Should not see intermediate results
Key Insight
- Makes concurrent execution appear serial
20. What are Isolation Levels?
Isolation levels define the degree of visibility between transactions.
1. Read Uncommitted
- Can read uncommitted data
- Allows dirty reads
2. Read Committed
- Only committed data visible
- Prevents dirty reads
3. Repeatable Read
- Same query gives same result
- Prevents non-repeatable reads
4. Serializable (Highest Level)
- Full isolation
- Equivalent to serial execution
Comparison Table
| Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | ❌ Yes | ❌ Yes | ❌ Yes |
| Read Committed | ✅ No | ❌ Yes | ❌ Yes |
| Repeatable Read | ✅ No | ✅ No | ❌ Yes |
| Serializable | ✅ No | ✅ No | ✅ No |
21. What are Dirty Reads?
A dirty read occurs when a transaction reads data that has been modified but not yet committed by another transaction.
Problem
- If the first transaction rolls back → the read data becomes invalid
Example
- T1 updates balance → not committed
- T2 reads updated balance
- T1 rolls back → T2 used incorrect data
Prevention
- Use Read Committed isolation level or higher
22. What are Non-Repeatable Reads?
A non-repeatable read occurs when the same query returns different results within the same transaction.
Cause
- Another transaction modifies data between reads
Example
- T1 reads salary = 5000
- T2 updates salary to 7000 and commits
- T1 reads again → 7000
Prevention
- Use Repeatable Read isolation level
23. What are Phantom Reads?
A phantom read occurs when new rows appear or disappear during a transaction.
Cause
- Another transaction inserts or deletes rows
Example
- T1: SELECT all employees with salary > 5000
- T2: Inserts new employee with salary 6000
- T1 runs again → extra row appears
Prevention
- Use Serializable isolation level
24. What is Normalization?
Normalization is the process of organizing data to reduce redundancy and improve data integrity.
Goals
- Eliminate duplicate data
- Ensure logical data storage
- Prevent anomalies
Key Idea
- Divide large tables into smaller, related tables
25. What are Normal Forms?
Normal forms are a set of rules used to evaluate and improve database design.
Main Types
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
26. Explain 1NF (First Normal Form)
Rule
- All attributes must contain atomic (indivisible) values
Requirements
- No repeating groups
- Each field contains single value
Violation Example
| ID | Phones |
|---|---|
| 1 | 123, 456 |
1NF Solution
| ID | Phone |
|---|---|
| 1 | 123 |
| 1 | 456 |
27. Explain 2NF (Second Normal Form)
Rule
- Must be in 1NF
- No partial dependency
Partial Dependency
- When non-key attribute depends on part of a composite key
Example
| Student_ID | Course_ID | Student_Name |
|---|
- Student_Name depends only on Student_ID
Solution
Split into:
- Student(Student_ID, Student_Name)
- Enrollment(Student_ID, Course_ID)
28. Explain 3NF (Third Normal Form)
Rule
- Must be in 2NF
- No transitive dependency
Transitive Dependency
- Non-key attribute depends on another non-key attribute
Example
| Student_ID | Dept_ID | Dept_Name |
- Dept_Name depends on Dept_ID
Solution
- Student(Student_ID, Dept_ID)
- Department(Dept_ID, Dept_Name)
29. Explain BCNF (Boyce-Codd Normal Form)
BCNF is a stronger version of 3NF.
Rule
- Every determinant must be a candidate key
Why Needed?
- Handles anomalies not covered by 3NF
Example
If:
- A → B
- B → A
Both must be candidate keys
30. What is Data Redundancy and How is it Reduced?
Data redundancy is the unnecessary duplication of data in a database.
Problems Caused
- Increased storage
- Data inconsistency
- Update anomalies
How to Reduce Redundancy
1. Normalization
- Split tables logically
2. Proper Key Design
- Use primary and foreign keys
3. Avoid Duplicate Storage
- Store data in one place
4. Use Relationships
- Link tables instead of repeating data
Example
Instead of:
| Student | Course | Course_Name |
Use:
- Student table
- Course table
- Relationship table
31. What is a Functional Dependency (FD)?
A functional dependency describes a relationship where one attribute uniquely determines another.
Notation
-
→ Attribute A determines B
Key Idea
- If two rows have the same A → they must have the same B
Example
- Student_ID → Name
→ Each ID corresponds to one Name
Types
- Trivial FD: A → A
- Non-Trivial FD: A → B (B not part of A)
32. What is a Multivalued Dependency (MVD)?
A multivalued dependency occurs when one attribute determines multiple independent values of another attribute.
Notation
Key Idea
- B values are independent of other attributes
Example
Student:
- Courses and Hobbies independent
| Student | Course | Hobby |
|---|---|---|
| A | DBMS | Music |
| A | OS | Music |
Problem
- Redundant combinations
Solution
- Decompose into separate tables
33. What is Join Dependency (JD)?
A join dependency exists when a table can be reconstructed by joining multiple smaller tables.
Key Idea
- Table = Join of projections
Example
Table R(A, B, C):
- Can be split into (A, B) and (B, C)
- Joined back without loss
Use
- Leads to 5NF (Fifth Normal Form)
34. What is Indexing in DBMS?
Definition
Indexing is a technique to improve the speed of data retrieval.
Concept
- Similar to index in a book
- Avoids full table scan
Benefits
- Faster queries
- Efficient searching
Trade-offs
- Extra storage
- Slower inserts/updates
35. What are the Different Types of Indexes?
1. Primary Index
- Based on primary key
2. Secondary Index
- On non-key columns
3. Clustered Index
- Data stored physically in sorted order
4. Non-Clustered Index
- Separate structure
5. Composite Index
- Based on multiple columns
6. Unique Index
- Enforces uniqueness
36. What is a Clustered Index?
A clustered index determines the physical order of data in a table.
Key Characteristics
- Data is stored sorted
- Only one clustered index per table
Example
- Table sorted by Student_ID
Advantage
- Faster range queries
37. What is a Non-Clustered Index?
A non-clustered index stores index data separately from actual table data.
Structure
- Contains pointers to actual rows
Key Characteristics
- Multiple allowed per table
- Does not affect physical storage
Example
- Index on Name column
38. Difference: Clustered vs Non-Clustered Index
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Storage | Physical order | Separate structure |
| Count per table | One | Multiple |
| Speed | Faster for range | Faster for lookup |
| Data location | Same as table | Pointer-based |
Key Insight
- Clustered → actual data sorted
- Non-clustered → reference to data
39. What is Query Optimization?
Query optimization is the process of selecting the most efficient execution plan for a query.
Goal
- Minimize execution time
- Reduce resource usage
Techniques
- Index usage
- Join reordering
- Predicate pushdown
Example
- Choosing index scan vs full table scan
40. What is Query Processing?
Query processing is the complete process of executing a SQL query.
Steps Involved
1. Parsing
- Check syntax and validity
2. Translation
- Convert into internal representation
3. Optimization
- Choose best execution plan
4. Execution
- Run the query
Flow
SQL Query → Parser → Optimizer → Execution Engine → Result
41. What is a Stored Procedure?
A stored procedure is a precompiled collection of SQL statements stored in the database and executed as a single unit.
Key Characteristics
- Stored inside the DBMS
- Can accept parameters (IN, OUT, INOUT)
- Supports control logic (loops, conditions)
Example
Key Insight
- Reusable, efficient, and centralized business logic
42. What are the Advantages of Stored Procedures?
1. Performance Improvement
- Precompiled → faster execution
2. Code Reusability
- Write once, use multiple times
3. Security
- Restrict direct table access
4. Reduced Network Traffic
- Execute logic on server side
5. Maintainability
- Centralized logic → easy updates
44. What is a Trigger?
Definition
A trigger is a special type of stored procedure that automatically executes in response to specific events.
Trigger Events
-
INSERT -
UPDATE -
DELETE
Types
- BEFORE Trigger
- AFTER Trigger
Example
Key Insight
- Event-driven automation
45. Difference Between Trigger and Stored Procedure
| Feature | Stored Procedure | Trigger |
|---|---|---|
| Execution | Manual call | Automatic |
| Invocation | Explicit | Event-based |
| Control | User-controlled | System-controlled |
| Use Case | Business logic | Auditing, validation |
46. What is a Materialized View?
A materialized view is a view that stores data physically instead of just the query.
Key Characteristics
- Stores actual data
- Requires periodic refresh
- Improves performance
Example
- Precomputed sales summary
Use Case
- Reporting systems
- Data warehousing
47. Difference Between View and Materialized View
| Feature | View | Materialized View |
|---|---|---|
| Storage | Virtual | Physical |
| Data freshness | Always updated | Needs refresh |
| Performance | Slower (computed each time) | Faster (precomputed) |
Key Insight
- View = dynamic
- Materialized view = cached
48. What is Database Partitioning?
Partitioning divides a large table into smaller, manageable pieces.
Purpose
- Improve performance
- Easier maintenance
- Better scalability
Types
- Horizontal
- Vertical
49. Difference Between Horizontal and Vertical Partitioning
Horizontal Partitioning
Definition
- Rows are divided into partitions
Example
- Students split by region
Vertical Partitioning
Definition
- Columns are divided into partitions
Example
- Separate personal info and academic info
Comparison
| Feature | Horizontal Partitioning | Vertical Partitioning |
|---|---|---|
| Division | Rows | Columns |
| Use Case | Large datasets | Column optimization |
| Example | Region-wise data | Sensitive columns |
50. What is Database Replication?
Database replication is the process of copying and maintaining data across multiple databases.
Purpose
- Improve availability
- Enhance performance
- Provide fault tolerance
Types
1. Master-Slave Replication
- One master, multiple read replicas
2. Master-Master Replication
- Multiple writable nodes
Benefits
- Load balancing
- Backup & disaster recovery
- High availability
Challenges
- Data consistency
- Replication lag
51. Types of Database Replication
Database replication involves copying data across multiple servers to ensure availability, performance, and fault tolerance.
1. Master-Slave (Primary–Replica) Replication
- One master handles writes
- Replicas handle read operations
Advantages
- Simple setup
- Good for read-heavy systems
Limitation
- Single point of write failure
2. Master-Master Replication
- Multiple nodes can read and write
Advantages
- High availability
- No single point of failure
Challenges
- Conflict resolution
3. Synchronous Replication
- Data written to all replicas at the same time
Pros
- Strong consistency
Cons
- Slower performance
4. Asynchronous Replication
- Data replicated after commit
Pros
- Faster writes
Cons
- Possible data lag
5. Snapshot Replication
- Copies entire dataset at intervals
6. Transactional Replication
- Replicates changes in real-time
52. What is Database Sharding?
Sharding is a technique of splitting a large database into smaller, distributed pieces (shards) across multiple servers.
Key Idea
- Each shard contains part of the data
Example
Users split by:
- Region
- User_ID range
Benefits
- Horizontal scalability
- Improved performance
Challenge
- Complex queries across shards
53. Difference Between Sharding and Partitioning
| Feature | Sharding | Partitioning |
|---|---|---|
| Scope | Across multiple servers | Within a single database |
| Type | Horizontal scaling | Logical division |
| Complexity | High | Lower |
| Use Case | Large distributed systems | Performance optimization |
Key Insight
- Partitioning = inside DB
- Sharding = across servers
54. What is OLTP (Online Transaction Processing)?
OLTP systems are designed to handle real-time transactional operations.
Characteristics
- Large number of short transactions
- High concurrency
- Fast response time
Examples
- Banking systems
- E-commerce orders
55. What is OLAP (Online Analytical Processing)?
OLAP systems are designed for complex queries and data analysis.
Characteristics
- Large datasets
- Complex aggregations
- Read-heavy workload
Examples
- Business intelligence
- Data analytics dashboards
56. Difference Between OLTP and OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Transactions | Analysis |
| Queries | Simple, fast | Complex, long-running |
| Data | Current | Historical |
| Users | End-users | Analysts |
58. What is Data Warehousing?
A data warehouse is a centralized repository for storing historical and analytical data.
Key Characteristics
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile
Purpose
- Support decision-making
- Enable OLAP operations
Architecture Components
- Data sources
- ETL (Extract, Transform, Load)
- Warehouse storage
- BI tools
59. Difference Between ER Diagram and Relational Schema
| Feature | ER Diagram | Relational Schema |
|---|---|---|
| Representation | Graphical | Tabular |
| Level | Conceptual design | Logical design |
| Components | Entities, attributes, relations | Tables, columns, keys |
| Purpose | Design understanding | Implementation |
Key Insight
- ER Diagram → Design phase
- Schema → Implementation phase
60. What is UNION?
UNION is an SQL operator used to combine results of two or more SELECT queries.
Key Rules
- Same number of columns
- Compatible data types
- Removes duplicates
Types
1. UNION
- Removes duplicates
2. UNION ALL
- Includes duplicates
Example
Key Insight
- UNION → unique results
- UNION ALL → faster, includes duplicates
61. What is UNION ALL?
Definition
UNION ALL is an SQL operator used to combine results of two or more SELECT queries including duplicates.
Key Characteristics
- Does not remove duplicates
- Faster than
UNION(no duplicate check) - Requires same number of columns and compatible data types
Example
Key Insight
- Use when duplicate values are acceptable and performance matters
62. Difference Between UNION and UNION ALL
| Feature | UNION | UNION ALL |
|---|---|---|
| Duplicate Rows | Removed | Allowed |
| Performance | Slower (deduplication) | Faster |
| Use Case | Unique results needed | All data needed |
Key Insight
-
UNION= DISTINCT -
UNION ALL= RAW merge
63. What is CHAR vs VARCHAR?
Both are used to store string data, but differ in storage behavior.
CHAR
- Fixed-length
- Pads extra spaces
VARCHAR
- Variable-length
- Stores only actual data
Comparison
| Feature | CHAR | VARCHAR |
|---|---|---|
| Length | Fixed | Variable |
| Storage | Always full length | Actual data only |
| Performance | Faster (fixed size) | Slightly slower |
| Use Case | Fixed values (e.g., Gender) | Dynamic text (e.g., Name) |
Example
64. What is a Cursor and When is it Used?
A cursor is a database object used to process query results row by row.
Why Needed?
- SQL is set-based
- Cursor allows procedural logic
Types
- Implicit Cursor
- Explicit Cursor
When to Use
- Complex row-by-row operations
- Conditional logic per record
- Data transformations
Drawback
- Slower than set-based operations
65. What is Database Schema Evolution?
Schema evolution is the process of modifying database structure over time without breaking existing systems.
Examples
- Adding new columns
- Changing data types
- Renaming tables
Challenges
- Backward compatibility
- Data migration
- Application impact
Best Practices
- Use versioning
- Apply migrations carefully
- Maintain compatibility
66. What is ORM (Object Relational Mapping)?
ORM is a technique that maps database tables to programming language objects.
Purpose
- Simplify database interaction
- Avoid writing raw SQL
How It Works
- Table → Class
- Row → Object
- Column → Attribute
Examples
- Hibernate (Java)
- Sequelize (Node.js)
- Django ORM (Python)
Advantages
- Faster development
- Cleaner code
Disadvantages
- Performance overhead
- Less control over queries
67. What is Referential Action (CASCADE, SET NULL)?
Referential actions define what happens to child table records when parent data changes.
1. CASCADE
- Automatically update/delete child rows
Example
- Delete parent → delete all related child rows
2. SET NULL
- Set foreign key to NULL
Example
- Parent deleted → child reference becomes NULL
Other Actions
- RESTRICT → Prevent action
- NO ACTION → Similar to restrict
68. What is a Many-to-Many Relationship and How is it Implemented?
A many-to-many (M:N) relationship occurs when:
- One entity relates to multiple records of another, and vice versa
Example
- Students ↔ Courses
- One student → many courses
- One course → many students
Implementation
Step 1: Create Two Tables
- Student
- Course
Step 2: Create Junction Table
- Enrollment
Structure
Key Features
- Junction table contains foreign keys
- Often uses composite primary key
Key Insight
- M:N relationships are always broken into two 1:N relationships