1Q. What are multi-document ACID transactions in MongoDB?
Multi-document ACID transactions in MongoDB allow multiple operations across documents and collections to be executed atomically, ensuring ACID properties.
ACID Properties:
- Atomicity → All operations succeed or fail together
- Consistency → Database remains valid
- Isolation → Transactions are isolated from others
- Durability → Data persists after commit
Key Points:
- Introduced in MongoDB 4.0 (replica sets), 4.2 (sharded clusters)
- Supports multiple collections and documents
- Uses sessions to manage transactions
Example:
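// mongosh: requires a replica set or sharded cluster
const session = db.getMongo().startSession();
const sdb = session.getDatabase(db.getName()); // collections must be reached through the session
session.startTransaction();
try {
  sdb.users.insertOne({ name: "Jitendra" });
  sdb.orders.insertOne({ item: "Book" });
  session.commitTransaction(); // both inserts become visible together
} catch (e) {
  session.abortTransaction(); // roll back both inserts on any error
  throw e;
} finally { session.endSession(); }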
Use Case:
- Banking systems
- Order processing systems
2Q. How do transactions work in replica sets?
In a replica set, transactions ensure consistent and atomic operations across multiple nodes.
Key Points:
- All writes go to the primary node
- Transaction operations are recorded in oplog
- Secondaries replicate committed transactions
- No two-phase commit is needed here; that protocol applies only to cross-shard transactions
Working Flow:
- Client starts a session
- Transaction begins on primary
- Operations executed
- Commit request sent
- Data written and replicated
- Transaction marked committed
Important:
- Commit with w: "majority" write concern so the transaction survives failover
- Ensures consistency across nodes
3Q. How do transactions work in sharded clusters?
Transactions in sharded clusters allow atomic operations across multiple shards (distributed data).
Key Points:
- Introduced in MongoDB 4.2
- Supports cross-shard transactions
- Uses two-phase commit protocol
Working Flow:
- Transaction starts via mongos
- Operations sent to relevant shards
- Each shard executes locally
- Coordinator shard manages commit
- All shards commit or abort
Challenges:
- Higher latency than replica sets
- More complex coordination
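The commit step can be sketched as a toy two-phase commit in plain JavaScript. This is only an illustration of the protocol, not MongoDB's internal code; `makeShard` and `twoPhaseCommit` are hypothetical names.

```javascript
// Toy two-phase commit with hypothetical participants (not MongoDB internals).
function twoPhaseCommit(participants) {
  // Phase 1: ask shards to prepare; `every` stops at the first "no" vote.
  const allPrepared = participants.every((p) => p.prepare());
  // Phase 2: send the single decision to every shard.
  for (const p of participants) {
    if (allPrepared) p.commit();
    else p.abort();
  }
  return allPrepared ? "committed" : "aborted";
}

// Hypothetical shard stand-in that records its final state.
function makeShard(name, canPrepare) {
  return {
    name,
    state: "active",
    prepare() {
      if (canPrepare) this.state = "prepared";
      return canPrepare;
    },
    commit() { this.state = "committed"; },
    abort() { this.state = "aborted"; },
  };
}

const healthy = [makeShard("shardA", true), makeShard("shardB", true)];
const broken = [makeShard("shardA", true), makeShard("shardB", false)];
const okResult = twoPhaseCommit(healthy);
const failResult = twoPhaseCommit(broken);
console.log(okResult, failResult); // committed aborted
```

One failed prepare is enough to abort on every shard, which is why cross-shard transactions pay extra round trips.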
4Q. What are limitations of MongoDB transactions?
MongoDB transactions have certain constraints that affect performance and scalability.
Key Limitations:
- Higher overhead compared to single operations
- Lifetime limit of 60 seconds by default (transactionLifetimeLimitSeconds); longer transactions are aborted
- Increased memory usage
- Not ideal for large batch operations
- Slower in sharded clusters
Other Constraints:
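- Practical limit on transaction size (MongoDB 4.0 capped a transaction at one 16MB oplog entry; 4.2+ lifts that cap, but very large transactions are still discouraged)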
- Locks resources during execution
- Performance impact under heavy load
Best Practice:
- Keep transactions short and small
5Q. Explain MongoDB replication architecture in detail
MongoDB replication architecture is based on replica sets, where multiple nodes maintain copies of the same data.
Components:
- Primary Node → handles all writes
- Secondary Nodes → replicate data
- Arbiter (optional) → votes in elections but stores no data
Architecture Flow:
Primary → Oplog → Secondary Nodes
Key Points:
- Asynchronous replication
- Ensures redundancy and fault tolerance
- Supports automatic failover
Data Flow:
- Write operation goes to primary
- Logged in oplog
- Secondary nodes read oplog
- Apply changes
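The data flow above can be modeled in a few lines of plain JavaScript. This is a toy sketch, assuming an in-memory oplog array; the `Primary` and `Secondary` classes are hypothetical and far simpler than real replication.

```javascript
// Toy model of oplog-based replication (hypothetical structures, not MongoDB internals).
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = [];
  }
  write(key, value) {
    this.data.set(key, value);       // apply the write locally
    this.oplog.push({ key, value }); // record it in the oplog
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of oplog entries already replayed
  }
  // Pull entries not applied yet and replay them in order.
  syncFrom(primary) {
    for (const entry of primary.oplog.slice(this.applied)) {
      this.data.set(entry.key, entry.value);
      this.applied += 1;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("user:1", "Jitendra");
primary.write("order:1", "Book");
const lagBeforeSync = primary.oplog.length - secondary.applied; // entries not yet replicated
secondary.syncFrom(primary);
const lagAfterSync = primary.oplog.length - secondary.applied;
console.log({ lagBeforeSync, lagAfterSync }); // { lagBeforeSync: 2, lagAfterSync: 0 }
```

The gap between the two counters is the replication lag that asynchronous replication allows.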
6Q. What happens during replica set elections?
Replica set election is the process of selecting a new primary when the current primary fails.
Key Points:
- Triggered when primary becomes unavailable
- Secondary nodes vote to elect new primary
- Based on priority and freshness of data
Election Process:
- Primary fails
- Secondaries detect failure
- Election initiated
- Nodes vote
- Node with majority votes becomes primary
Important Factors:
- Node priority
- Replication lag
- Network latency
Result:
- New primary takes over
- System continues operation
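A quick way to reason about "majority votes" is to compute it. The sketch below assumes every member counted is a voting member; the function names are illustrative.

```javascript
// Votes needed to win an election, and how many voting members may be down.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} voting members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 3 voting members: majority 2, tolerates 1 failure(s)
// 4 voting members: majority 3, tolerates 1 failure(s)
// 5 voting members: majority 3, tolerates 2 failure(s)
```

Going from 3 to 4 members adds no fault tolerance, which is why odd member counts (or an arbiter) are the usual choice.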
7Q. What is majority write concern?
Majority write concern ensures that a write operation is acknowledged only after it is committed to a majority of replica set members.
Key Points:
- Ensures strong data durability
- Prevents acknowledged writes from being rolled back during failover
- Recommended (not required) when committing transactions
Example:
db.users.insertOne(
{ name: "Jitendra" },
{ writeConcern: { w: "majority" } }
)
How it Works:
- Write sent to primary
- Replicated to secondary nodes
- Acknowledged after majority confirms
Trade-off:
- w: "majority" → stronger durability, higher write latency
- w: 1 → lower latency, but an acknowledged write can be rolled back after failover
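The acknowledgement rule can be sketched as a small simulation. It assumes acknowledgements arrive in a known order and that all members are voting, data-bearing members; `requiredAcks` and `ackedAfter` are hypothetical helpers.

```javascript
// How many member acknowledgements a write needs for a given write concern.
// `w` is a number or the string "majority".
function requiredAcks(w, members) {
  return w === "majority" ? Math.floor(members / 2) + 1 : w;
}

// Given acknowledgements in arrival order, return the member whose ack
// completes the write concern, or null if it is never satisfied.
function ackedAfter(w, members, ackOrder) {
  const needed = requiredAcks(w, members);
  return ackOrder.length >= needed ? ackOrder[needed - 1] : null;
}

const ackOrder = ["primary", "secondary1", "secondary2"];
const w1 = ackedAfter(1, 3, ackOrder);
const wMajority = ackedAfter("majority", 3, ackOrder);
const isolatedPrimary = ackedAfter("majority", 3, ["primary"]);
console.log(w1, wMajority, isolatedPrimary); // primary secondary1 null
```

With w: 1 the client hears back as soon as the primary has the write; with "majority" it waits for one secondary in a three-member set, and an isolated primary cannot satisfy it at all.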
8Q. How does MongoDB ensure durability?
Durability ensures that once a write operation is acknowledged, the data will not be lost even in case of crashes or failures.
Key Mechanisms:
- Journaling
- Write Concern (majority)
- Replication
How it Works:
- Write operation occurs
- Data written to journal file (WAL)
- Then written to data files at the next checkpoint
- Replicated to secondary nodes
- Acknowledged after durability guarantees
Key Points:
- Journaling enables crash recovery
- Majority write concern ensures replication
- Data persists even after system failure
9Q. WiredTiger vs MMAPv1 – deep comparison
WiredTiger and MMAPv1 are MongoDB storage engines that manage how data is stored and accessed.
Comparison Table:
| Feature | WiredTiger | MMAPv1 |
|---|---|---|
| Default Engine | Yes (since 3.2) | No (deprecated in 4.0, removed in 4.2) |
| Compression | Supported | Not supported |
| Locking | Document-level | Collection-level |
| Performance | High | Moderate |
| Concurrency | High | Limited |
| Memory Usage | Efficient | Less efficient |
| Journaling | Yes | Yes |
WiredTiger Key Points:
- Uses document-level locking
- Supports compression (Snappy, Zlib, and Zstd since 4.2)
- Better concurrency and performance
- Uses cache for memory optimization
MMAPv1 Key Points:
- Uses collection-level locking
- No compression
- Poor concurrency
- Deprecated in MongoDB 4.0 and removed in 4.2
Conclusion:
WiredTiger is faster, more efficient, and preferred for production systems.
10Q. How does MongoDB handle concurrency?
Concurrency in MongoDB refers to how multiple operations are handled simultaneously without conflicts.
Key Mechanisms:
- Locking system
- WiredTiger storage engine
- Multi-version concurrency control (MVCC)
Key Points:
- Supports multiple reads and writes concurrently
- Uses fine-grained locking
- Avoids blocking operations
How it Works:
- Each operation gets a snapshot of data
- Writes do not block reads
- Conflicts are minimized
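The snapshot idea behind MVCC can be shown with a toy versioned store. This is a simplified sketch, not WiredTiger's implementation; `VersionedStore` and its logical clock are assumptions made for illustration.

```javascript
// Toy multi-version store: each write appends a new version instead of overwriting.
class VersionedStore {
  constructor() {
    this.clock = 0;            // logical timestamp, bumped on every write
    this.versions = new Map(); // key -> [{ ts, value }, ...]
  }
  write(key, value) {
    this.clock += 1;
    const history = this.versions.get(key) ?? [];
    history.push({ ts: this.clock, value });
    this.versions.set(key, history);
  }
  // A reader remembers the clock value at the moment it starts.
  snapshot() {
    return this.clock;
  }
  // Return the newest version that already existed at the snapshot.
  read(key, snapshotTs) {
    const history = this.versions.get(key) ?? [];
    const visible = history.filter((v) => v.ts <= snapshotTs);
    return visible.length ? visible[visible.length - 1].value : undefined;
  }
}

const store = new VersionedStore();
store.write("stock", 10);
const readerSnapshot = store.snapshot(); // reader starts here
store.write("stock", 9);                 // concurrent writer does not block the reader
const seenByReader = store.read("stock", readerSnapshot);
const seenByNewReader = store.read("stock", store.snapshot());
console.log(seenByReader, seenByNewReader); // 10 9
```

The earlier reader keeps seeing 10 even after the write, which is how reads and writes avoid blocking each other.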
11Q. What is document-level locking?
Document-level locking allows MongoDB to lock only the specific document being modified, instead of locking the entire collection.
Key Points:
- Enabled by WiredTiger
- Improves concurrency
- Multiple operations can run in parallel
Example:
- Two users update different documents → both succeed simultaneously
Benefit:
- Faster performance
- Reduced contention
12Q. How does WiredTiger cache work?
WiredTiger cache is an in-memory cache used to store frequently accessed data for faster read/write operations.
Key Points:
- Default cache size = the larger of 50% of (RAM - 1 GB) or 256 MB
- Stores frequently accessed documents and indexes
- Uses eviction policy to remove old data
How it Works:
- Data loaded into cache
- Reads served from memory (fast)
- Writes buffered in cache
- Periodically flushed to disk
Important Concept:
- Dirty Data → modified data in cache not yet written to disk
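For sizing, the default WiredTiger internal cache is the larger of 50% of (RAM - 1 GB) or 256 MB. A small worked calculation, treating GB as GiB for simplicity (an assumption of this sketch):

```javascript
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

// Default WiredTiger internal cache: the larger of 50% of (RAM - 1 GB) or 256 MB.
function defaultCacheBytes(ramBytes) {
  return Math.max(0.5 * (ramBytes - GiB), 256 * MiB);
}

for (const ramGiB of [1, 4, 16]) {
  const cacheGiB = defaultCacheBytes(ramGiB * GiB) / GiB;
  console.log(`${ramGiB} GiB RAM -> ${cacheGiB} GiB cache`);
}
// 1 GiB RAM -> 0.25 GiB cache
// 4 GiB RAM -> 1.5 GiB cache
// 16 GiB RAM -> 7.5 GiB cache
```

The rest of RAM is left for the filesystem cache, connections, and in-memory operations such as sorts and aggregations.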
13Q. How does MongoDB manage memory?
MongoDB manages memory using the WiredTiger cache and OS-level memory management.
Key Points:
- Uses WiredTiger cache for active data
- Relies on OS for file system caching
- Automatically adjusts memory usage
Memory Usage Components:
- WiredTiger cache
- Indexes
- Connections
- Aggregation operations
Best Practice:
- Keep working set in RAM for optimal performance
14Q. What is MapReduce and when should it be avoided?
MapReduce is a data processing model used to process large datasets using map and reduce functions.
Key Points:
- Uses JavaScript functions
- Processes data in two steps:
- Map → transform data
- Reduce → aggregate data
Example:
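db.collection.mapReduce(
  function() { emit(this.category, this.amount); },     // map: emit (key, value) pairs
  function(key, values) { return Array.sum(values); },  // reduce: sum the values per key
  { out: { inline: 1 } }                                 // required: where to put the results
)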
When to Avoid:
- Slower than aggregation framework
- Not optimized for performance
- Deprecated since MongoDB 5.0 in favor of the aggregation pipeline
Use Case:
- Complex custom computations (rare cases)
15Q. MapReduce vs Aggregation Framework
Both are used for data processing, but aggregation framework is the modern and efficient approach.
Comparison Table:
| Feature | MapReduce | Aggregation Framework |
|---|---|---|
| Performance | Slow | Fast |
| Language | JavaScript | Native operators |
| Complexity | High | Low |
| Optimization | Limited | Highly optimized |
| Use Case | Complex logic | Most data processing |
Key Points:
- Aggregation is preferred in modern MongoDB
- MapReduce is rarely used today
- Aggregation is faster and easier
Conclusion:
Use Aggregation Framework instead of MapReduce in most cases.
16Q. How does full-text search work in MongoDB?
Full-text search in MongoDB allows searching text content within documents using text indexes and text search queries.
Key Points:
- Uses text indexes on string fields
- Supports keyword-based search
- Performs tokenization and stemming
- Returns results based on relevance score
How it Works:
- Text index created on fields
- MongoDB tokenizes words (breaks into terms)
- Removes stop words (e.g., “the”, “is”)
- Applies stemming (running → run)
- Matches search query against indexed terms
Example:
db.articles.find({ $text: { $search: "mongodb database" } })
Important:
- Supports ranking using score
- Case-insensitive search
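The tokenize, stop-word, and stemming steps can be imitated in plain JavaScript. This is a deliberately naive sketch: the stop-word list and the two suffix rules are assumptions for illustration, while MongoDB applies language-specific rules.

```javascript
// Simplified text pipeline; the stop words and suffix rules are illustrative only.
const STOP_WORDS = new Set(["the", "is", "a", "an", "of", "and"]);

function naiveStem(word) {
  // Strip a trailing "ing" and undo a doubled consonant: "running" -> "run".
  if (word.length > 5 && word.endsWith("ing")) {
    const base = word.slice(0, -3);
    const last = base[base.length - 1];
    return base[base.length - 2] === last ? base.slice(0, -1) : base;
  }
  // Strip a plural "s": "databases" -> "database".
  if (word.length > 3 && word.endsWith("s")) return word.slice(0, -1);
  return word;
}

function toTerms(text) {
  return text
    .toLowerCase()                          // case-insensitive matching
    .split(/[^a-z0-9]+/)                    // tokenize on non-alphanumerics
    .filter((w) => w && !STOP_WORDS.has(w)) // drop empty tokens and stop words
    .map(naiveStem);
}

const indexedTerms = toTerms("The database is running");
const queryTerms = toTerms("RUNNING databases");
const matchedTerms = queryTerms.filter((term) => indexedTerms.includes(term));
console.log(indexedTerms, queryTerms, matchedTerms);
// [ 'database', 'run' ] [ 'run', 'database' ] [ 'run', 'database' ]
```

Because both sides are reduced to the same terms, a query for "RUNNING databases" matches a document that says "database is running".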
17Q. What is a text index?
A text index is a special index type used to support text search queries in MongoDB.
Key Points:
- Created on string fields
- Enables $text queries
- Stores tokenized words instead of raw text
- Only one text index per collection (can include multiple fields)
Example:
db.articles.createIndex({ title: "text", content: "text" })
Features:
- Supports language-specific rules
- Provides relevance scoring
18Q. What is Atlas Search?
MongoDB Atlas Search is an advanced full-text search feature in MongoDB Atlas powered by Apache Lucene.
Key Points:
- More powerful than basic text search
- Supports fuzzy search, autocomplete, synonyms
- Built into MongoDB Atlas
- No separate search engine required
Features:
- Relevance ranking
- Highlighting
- Complex queries (phrase, wildcard)
Use Case:
- E-commerce search
- Advanced filtering systems
- Search-as-you-type
19Q. How does MongoDB handle large file storage?
MongoDB handles large files using GridFS, which splits files into smaller chunks and stores them across collections.
Key Points:
- Used for files larger than 16MB
- Stores files in chunks
- Efficient retrieval and storage
- Avoids BSON size limitation
Storage Structure:
- fs.files → metadata
- fs.chunks → actual file data
Use Cases:
- Video storage
- Image storage
- File systems
20Q. GridFS internal working
GridFS internally stores large files by dividing them into smaller chunks and managing them across collections.
Key Points:
- Default chunk size: 255KB
- Each chunk stored as separate document
- Files reconstructed during retrieval
Internal Flow:
- File uploaded
- Split into chunks
- Stored in fs.chunks
- Metadata stored in fs.files
- File retrieved by combining chunks
Example Structure:
fs.files → file metadata
fs.chunks → binary data chunks
Benefit:
- Efficient handling of large files
- Supports streaming
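The split-and-reassemble flow can be sketched with Node.js buffers. This is a minimal illustration, not the driver's GridFS implementation; the chunk documents mimic the `files_id`, `n`, and `data` fields of fs.chunks, and `"file-1"` is a placeholder id.

```javascript
const CHUNK_SIZE = 255 * 1024; // default GridFS chunk size in bytes

// Split a file into chunk documents shaped like fs.chunks entries.
function toChunks(filesId, fileBuffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < fileBuffer.length; offset += chunkSize, n += 1) {
    chunks.push({ files_id: filesId, n, data: fileBuffer.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Rebuild the file by concatenating the chunks in order of n.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const file = Buffer.alloc(1024 * 1024, 7); // 1 MiB of sample bytes
const fileChunks = toChunks("file-1", file);
const restored = fromChunks(fileChunks);
console.log(fileChunks.length, fileChunks[fileChunks.length - 1].data.length, restored.equals(file));
// 5 4096 true
```

A 1 MiB file needs five chunks: four full 255KB chunks plus a final 4096-byte chunk.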
21Q. What is mongos?
mongos is a query router in MongoDB that directs client requests to the appropriate shard in a sharded cluster.
Key Points:
- Acts as entry point for clients
- Does not store data
- Routes queries based on shard key
- Works with config servers
Role:
- Receives query
- Determines target shard
- Sends request
- Merges results
22Q. Role of config servers
Config servers store metadata and configuration information about the sharded cluster.
Key Points:
- Store shard mapping information
- Maintain chunk distribution
- Essential for cluster operation
- Usually deployed as replica set
Data Stored:
- Shard details
- Chunk locations
- Database configuration
Importance:
- mongos depends on config servers
- Without config servers, cluster cannot function
23Q. How does query routing work in sharded clusters?
Query routing is the process by which MongoDB directs queries to the correct shard(s) using mongos.
Key Points:
- Uses shard key to identify target shard
- Minimizes unnecessary data access
- Improves performance
Working Flow:
- Client sends query to mongos
- mongos checks config server metadata
- Determines which shard(s) contain data
- Sends query to relevant shard(s)
- Collects and merges results
- Returns final response
Types of Queries:
🔹 Targeted Query:
- Query includes shard key
- Sent to specific shard
🔹 Scatter-Gather Query:
- Query without shard key
- Sent to all shards
Optimization Tip:
- Always include shard key in queries for better performance
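The difference between targeted and scatter-gather queries can be sketched with a toy router. The chunk map, shard names, and the `customerId` shard key below are all hypothetical; real routing uses metadata cached from the config servers.

```javascript
// Hypothetical chunk map: each entry owns shard-key values in [min, max).
const chunkMap = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Decide which shards a query must visit. `shardKey` is an assumed field name.
function route(query, shardKey = "customerId") {
  const value = query[shardKey];
  if (value === undefined) {
    // No shard key in the filter: scatter-gather to every shard.
    return [...new Set(chunkMap.map((c) => c.shard))];
  }
  // Shard key present: target only the shard that owns the value.
  const owner = chunkMap.find((c) => value >= c.min && value < c.max);
  return [owner.shard];
}

const targeted = route({ customerId: 1500 });
const scattered = route({ status: "shipped" });
console.log(targeted, scattered); // [ 'shardB' ] [ 'shardA', 'shardB', 'shardC' ]
```

The filter on `status` alone touches every shard, while the filter on the shard key touches exactly one.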
24Q. How do you reshard a collection in MongoDB?
Resharding is the process of changing the shard key of an existing collection to improve data distribution and performance.
Key Points:
- Introduced in MongoDB 5.0
- Allows changing shard key without downtime
- Data is redistributed automatically
- Uses background process
How it Works:
- New shard key is defined
- MongoDB creates temporary collection
- Data copied and redistributed
- Writes synchronized between old & new
- Switch happens seamlessly
Command:
sh.reshardCollection("db.collection", { newShardKey: 1 })
Use Case:
- Fix poor shard key
- Improve performance and scalability
25Q. What are common shard key design mistakes?
Shard key mistakes lead to uneven data distribution, hotspots, and poor performance.
Common Mistakes:
🔹 Low Cardinality
- Few unique values → uneven distribution
🔹 Monotonically Increasing Keys
- Example: timestamps, auto-increment IDs
- Causes hot shard problem
🔹 Not Including in Queries
- Queries without shard key → scatter-gather
🔹 Large Chunk Sizes
- Leads to inefficient balancing
Best Practices:
- Choose high cardinality field
- Ensure even distribution
- Choose a field that is frequently used in queries
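The hot-shard effect of monotonically increasing keys can be shown with a small simulation. The split points are hypothetical, and the multiplicative hash is only a stand-in for MongoDB's real hashed-index function.

```javascript
const SHARDS = 3;

// Ranged sharding with hypothetical split points: every key >= 2000 lives on the last shard.
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Stand-in for hashed sharding: a multiplicative hash, not MongoDB's real hash function.
function hashedShard(key) {
  return ((key * 2654435761) >>> 0) % SHARDS;
}

function distribute(keys, pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[pickShard(key)] += 1;
  return counts;
}

// 1000 new documents with increasing IDs, all above the last split point.
const newIds = Array.from({ length: 1000 }, (_, i) => 5000 + i);
const rangedCounts = distribute(newIds, rangedShard);
const hashedCounts = distribute(newIds, hashedShard);
console.log("ranged:", rangedCounts); // ranged: [ 0, 0, 1000 ]
console.log("hashed:", hashedCounts);
```

With the ranged key every insert lands on the last shard; hashing the same IDs spreads them out, at the cost of losing efficient range queries on that key.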
26Q. How do you migrate data from SQL to MongoDB?
Data migration is the process of converting relational data into document-based structure.
Key Steps:
- Analyze Schema: identify tables and relationships
- Design MongoDB Schema: use embedding or referencing
- Transform Data: convert rows → documents
- Migrate Data: use tools or scripts
Tools:
- mongoimport
- ETL tools (like Talend, Apache NiFi)
- Custom scripts
Example Transformation:
SQL:
Users + Orders (JOIN)
MongoDB:
{
name: "Jitendra",
orders: [...]
}
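The transformation step can be sketched as a small script that embeds orders into their user. The sample rows and column names (`user_id`, `item`) are hypothetical; a real migration would read them from the SQL database.

```javascript
// Hypothetical rows as they might come out of two SQL tables.
const userRows = [
  { id: 1, name: "Jitendra" },
  { id: 2, name: "Asha" },
];
const orderRows = [
  { id: 10, user_id: 1, item: "Book" },
  { id: 11, user_id: 1, item: "Pen" },
  { id: 12, user_id: 2, item: "Lamp" },
];

// Embed each user's orders, replacing the JOIN with a nested array.
function toUserDocuments(users, orders) {
  const ordersByUser = new Map();
  for (const { user_id, ...order } of orders) {
    if (!ordersByUser.has(user_id)) ordersByUser.set(user_id, []);
    ordersByUser.get(user_id).push(order);
  }
  return users.map((u) => ({ _id: u.id, name: u.name, orders: ordersByUser.get(u.id) ?? [] }));
}

const userDocs = toUserDocuments(userRows, orderRows);
console.log(JSON.stringify(userDocs, null, 2));
```

The resulting array could be written to a JSON file and loaded with mongoimport, or inserted from a script. Embedding suits orders that are always read with their user; referencing is the better fit when the embedded array would grow without bound.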
27Q. How do you perform rolling upgrades?
Rolling upgrade is a process of upgrading MongoDB nodes one at a time without downtime.
Key Points:
- Upgrade secondary nodes first
- Primary upgraded last
- Ensures continuous availability
Steps:
- Upgrade secondary nodes
- Restart each secondary
- Step down primary
- Upgrade former primary
- Verify cluster health
Benefit:
- Near-zero service interruption (only a brief election while the primary steps down)
- Maintains availability
28Q. Zero-downtime MongoDB upgrade strategy
A strategy to upgrade MongoDB without affecting application availability.
Key Approach:
- Use replica sets or sharded clusters
- Upgrade nodes sequentially
Steps:
- Ensure replication is healthy
- Upgrade secondaries
- Perform primary step-down
- Upgrade primary
- Monitor system
Key Points:
- No downtime for users
- Requires proper monitoring
- Backup before upgrade
29Q. How do you secure MongoDB in production?
Securing MongoDB involves protecting data from unauthorized access and ensuring safe communication.
Key Security Measures:
- Enable authentication
- Use role-based access control (RBAC)
- Enable TLS/SSL encryption
- Restrict network access (IP whitelist)
- Disable unnecessary ports
- Enable auditing
Best Practices:
- Never expose database publicly
- Use strong passwords
- Regularly update MongoDB
30Q. Authentication mechanisms in MongoDB
Authentication verifies the identity of users accessing MongoDB.
Types:
SCRAM (Default)
- Username/password-based
X.509
- Certificate-based authentication
LDAP
- Enterprise authentication
Kerberos
- Network authentication protocol
Key Points:
- Ensures only authorized users access data
- Integrated with RBAC
31Q. Role-based access control (RBAC)
RBAC restricts access based on user roles and permissions.
Key Points:
- Users assigned roles
- Roles define permissions
- Fine-grained access control
Example Roles:
- read
- readWrite
- dbAdmin
- clusterAdmin
Example:
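db.createUser({
  user: "appUser",          // placeholder name; "admin" is misleading for a readWrite-only account
  pwd: passwordPrompt(),    // prompt for the password instead of hardcoding it
  roles: ["readWrite"]      // role applies to the current database
})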
32Q. How does TLS/SSL work in MongoDB?
TLS/SSL encrypts data transmitted between MongoDB clients and servers to ensure secure communication.
Key Points:
- Prevents data interception
- Uses certificates to verify identity; traffic is encrypted with keys negotiated during the handshake
- Supports mutual authentication
How it Works:
- Client connects to server
- TLS handshake occurs
- Certificates verified
- Secure encrypted connection established
Benefit:
- Data security in transit
- Protection against attacks
33Q. How does MongoDB handle disaster recovery?
Disaster recovery ensures data can be restored after failures like crashes, data loss, or outages.
Key Strategies:
- Replication (replica sets)
- Regular backups (mongodump)
- Point-in-time recovery (oplog)
- Multi-region deployment
Recovery Process:
- Detect failure
- Failover to secondary
- Restore from backup if needed
- Sync data
Best Practices:
- Maintain backups
- Test recovery process
- Use geographically distributed clusters
34Q. How do you troubleshoot slow queries in MongoDB?
Troubleshooting slow queries involves identifying and resolving performance bottlenecks in query execution.
Key Steps:
Use explain()
- Analyze query execution plan
- Check for COLLSCAN vs IXSCAN
db.users.find({ age: 25 }).explain("executionStats")
Check Index Usage
- Ensure proper indexes exist
- Use compound indexes if needed
Analyze Query Patterns
- Avoid unnecessary fields
- Use projection
Monitor Metrics
- Query execution time
- CPU, memory, disk usage
Enable Profiling
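db.setProfilingLevel(1, { slowms: 100 })  // log operations slower than 100 ms; level 2 logs every operation and adds overhead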
Common Fixes:
- Add indexes
- Optimize query structure
- Reduce data scanned
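Checking a plan for COLLSCAN can be automated by walking the explain output. The sketch below assumes the classic `queryPlanner.winningPlan` shape with `stage`, `inputStage`, and `inputStages`; newer server versions may nest the plan differently, and the sample outputs are hand-written, trimmed stand-ins.

```javascript
// Collect every stage name in a winning plan by following inputStage / inputStages.
function collectStages(plan, found = []) {
  if (!plan) return found;
  found.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, found);
  for (const child of plan.inputStages ?? []) collectStages(child, found);
  return found;
}

function usesCollectionScan(explainOutput) {
  return collectStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Hand-written sample outputs (trimmed, hypothetical values) for illustration.
const withoutIndex = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", filter: { age: { $eq: 25 } } } },
};
const withIndex = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "age_1" } },
  },
};
console.log(usesCollectionScan(withoutIndex), usesCollectionScan(withIndex)); // true false
```

A COLLSCAN on a large collection is usually the first sign that an index on the filtered field is missing.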
35Q. How does MongoDB handle high write throughput?
MongoDB handles high write throughput using scaling, batching, and efficient storage mechanisms.
Key Techniques:
- Horizontal scaling (sharding)
- WiredTiger storage engine
- Bulk writes
- Asynchronous replication
Key Points:
- Writes go to primary node
- Buffered in memory (cache)
- Flushed to disk efficiently
Optimization Tips:
- Use unordered bulk writes
- Choose good shard key
- Reduce write concern if acceptable
36Q. MongoDB in microservices architecture
MongoDB is widely used in microservices as a flexible, scalable database per service.
Key Points:
- Each service can have its own database
- Supports independent scaling
- Flexible schema fits evolving services
Benefits:
- Loose coupling
- Faster development
- Independent deployments
Example:
- User Service → MongoDB (users)
- Order Service → MongoDB (orders)
37Q. One database per service – pros & cons
Each microservice owns its own database, ensuring data isolation.
Pros:
- Independent scaling
- Better fault isolation
- No cross-service dependencies
Cons:
- Data duplication
- Complex joins across services
- Distributed transactions required
Key Point:
- Encourages event-driven architecture
38Q. MongoDB with containers & Kubernetes
MongoDB can be deployed using containers (Docker) and orchestrated with Kubernetes for scalability and automation.
Key Points:
- Use StatefulSets in Kubernetes
- Persistent volumes for storage
- Replica sets for high availability
Benefits:
- Easy deployment
- Auto-scaling
- Self-healing systems
Tools:
- Kubernetes Operator for MongoDB
- Helm charts
39Q. How does MongoDB handle schema evolution?
Schema evolution is the ability to modify data structure over time without downtime.
Key Points:
- Schema-less design allows flexible changes
- Documents can have different structures
- No migration required for small changes
Strategies:
- Versioning fields
- Backward compatibility
- Gradual migration
Example:
// Old document
{ name: "Jitendra" }
// New document
{ name: "Jitendra", age: 22 }
Benefit:
- Faster development cycles
- Easy feature updates
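The versioning and gradual-migration strategies can be combined into a lazy upgrade applied when a document is read. This is one possible sketch; the `schemaVersion` field and the default `age` of null are assumptions, not a MongoDB convention.

```javascript
// Lazy migration: upgrade a document to the latest shape when it is read.
const LATEST_VERSION = 2;

const upgrades = {
  // v1 -> v2: add the optional `age` field.
  1: (doc) => ({ ...doc, age: doc.age ?? null, schemaVersion: 2 }),
};

function upgradeDocument(doc) {
  let current = { schemaVersion: 1, ...doc }; // documents without a version count as v1
  while (current.schemaVersion < LATEST_VERSION) {
    current = upgrades[current.schemaVersion](current);
  }
  return current;
}

const oldDoc = { name: "Jitendra" };
const newDoc = { name: "Jitendra", age: 22, schemaVersion: 2 };
const upgradedOld = upgradeDocument(oldDoc);
const upgradedNew = upgradeDocument(newDoc);
console.log(upgradedOld, upgradedNew);
```

Old documents keep working unchanged in the database and are only rewritten when the application next saves them, so no bulk migration or downtime is needed.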
40Q. Common production deployment best practices
Best practices ensure MongoDB runs efficiently, securely, and reliably in production.
Key Practices:
Performance
- Use proper indexes
- Optimize queries
- Monitor regularly
Scalability
- Use sharding for large datasets
- Choose proper shard key
High Availability
- Use replica sets
- Enable automatic failover
Security
- Enable authentication
- Use TLS/SSL
- Apply RBAC
Backup & Recovery
- Regular backups
- Test restore process
Monitoring
- Use tools like mongostat, mongotop
- Track performance metrics
Golden Rule:
Design based on query patterns, not just data structure