1Q. How does querying work in MongoDB?

Querying in MongoDB is the process of retrieving documents from a collection based on specified conditions using MongoDB Query Language (MQL).

Key Points:

  • Uses find() method to fetch documents
  • Queries are written in JSON-like format
  • Supports filtering, sorting, and projection
  • Uses indexes to improve performance
  • Returns a cursor (iterator over results)

Example:

db.users.find({ age: { $gt: 25 } })

Internal Working:

  • MongoDB parses the query
  • Query planner evaluates candidate plans and selects the most efficient one
  • Indexes are used if available
  • Documents are fetched and returned

2Q. What query operators are available in MongoDB?

Query operators in MongoDB are special keywords used to perform filtering, comparison, and logical operations on data.

Key Types of Operators:

Comparison Operators:

  • $eq, $ne
  • $gt, $gte
  • $lt, $lte
  • $in, $nin

Logical Operators:

  • $and → all conditions true
  • $or → any condition true
  • $not → negates condition
  • $nor → none true

Element Operators:

  • $exists → checks field existence
  • $type → checks data type

Array Operators:

  • $all, $elemMatch, $size

Evaluation Operators:

  • $regex → pattern matching
  • $expr → use expressions

Example:

db.users.find({ age: { $gte: 18, $lte: 30 } })
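
The comparison operators above can be read as simple predicates. The following is a minimal in-memory sketch in plain JavaScript, not MongoDB's implementation; `comparators`, `matches`, and the sample `users` array are hypothetical names introduced only for illustration. It shows that several operators on one field are combined with an implicit AND.

```javascript
// Simplified, in-memory sketch of comparison-operator matching (not MongoDB's code).
const comparators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

// Returns true when the document satisfies every condition in the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const isOperatorDoc =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorDoc) {
      return doc[field] === condition; // shorthand for { field: { $eq: condition } }
    }
    // Multiple operators on one field are combined with an implicit AND.
    return Object.entries(condition).every(([op, arg]) => comparators[op](doc[field], arg));
  });
}

// Hypothetical sample data.
const users = [
  { name: "A", age: 17 },
  { name: "B", age: 18 },
  { name: "C", age: 30 },
  { name: "D", age: 31 },
];
const adults = users.filter((u) => matches(u, { age: { $gte: 18, $lte: 30 } }));
console.log(adults.map((u) => u.name)); // [ 'B', 'C' ]
```

Only scalar comparisons are modelled here; arrays, nested fields, and logical operators such as $or are left out to keep the sketch short.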

3Q. What is projection in MongoDB?

Projection in MongoDB is used to select specific fields to include or exclude from query results.

Key Points:

  • Controls output fields
  • Improves performance by reducing data transfer
  • Uses 1 to include and 0 to exclude; the two cannot be mixed in one projection, except for _id
  • _id is included by default unless it is explicitly set to 0

Example:

db.users.find({}, { name: 1, age: 1, _id: 0 })
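
To make the _id default concrete, here is a small in-memory sketch of an inclusion projection in plain JavaScript. The `project` function and the sample `user` document are hypothetical and only approximate what the server does; exclusion-style projections are not handled.

```javascript
// In-memory sketch of an inclusion projection (illustrative only).
function project(doc, projection) {
  const result = {};
  // _id is kept unless the projection sets it to 0.
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Hypothetical sample document.
const user = { _id: 1, name: "Asha", age: 28, email: "asha@example.com" };
console.log(project(user, { name: 1, age: 1, _id: 0 })); // { name: 'Asha', age: 28 }
console.log(project(user, { name: 1 })); // { _id: 1, name: 'Asha' }
```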

4Q. What is aggregation in MongoDB?

Aggregation in MongoDB is the process of transforming and analyzing data to produce summarized or computed results.

Key Points:

  • Similar to SQL operations like GROUP BY
  • Used for analytics and reporting
  • Performs calculations like sum, average, count
  • Works using aggregation pipeline

Example:

db.orders.aggregate([
{
$group: {
_id: "$status",
total: { $sum: "$amount" }
}
}
])

5Q. What is the aggregation framework in MongoDB?

The aggregation framework is a powerful feature in MongoDB used to process and transform data through a series of stages.

Key Points:

  • Pipeline-based processing
  • Each stage transforms data
  • Efficient for large datasets
  • Supports complex operations

Common Stages:

  • $match → filter data
  • $group → group data
  • $project → reshape fields
  • $sort → sort results
  • $limit → limit output
  • $lookup → join collections
  • $unwind → flatten arrays

Example:

db.orders.aggregate([
{ $match: { status: "completed" } },
{
$group: {
_id: "$userId",
total: { $sum: "$amount" }
}
}
])

6Q. What is an aggregation pipeline in MongoDB?

An aggregation pipeline is a sequence of stages where each stage processes data and passes the result to the next stage.

Key Points:

  • Data flows through multiple stages
  • Each stage performs a specific operation
  • Output of one stage becomes input of next
  • Enables complex data transformations

Flow:

Input → Stage → Stage → Stage → Output

Example:

db.sales.aggregate([
{ $match: { region: "India" } },
{
$group: {
_id: "$product",
totalSales: { $sum: "$amount" }
}
},
{ $sort: { totalSales: -1 } }
])
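
The "output of one stage becomes input of next" idea can be sketched in plain JavaScript, where each stage is a function from an array of documents to a new array. This is an in-memory illustration under that assumption, not how the server executes pipelines; `match`, `groupSum`, `sortDesc`, `runPipeline`, and the sample `sales` data are hypothetical names.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);

const groupSum = (keyField, sumField, outField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, [outField]: total }));
};

const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);

// Run the stages in order, feeding each output into the next stage.
const runPipeline = (docs, stages) => stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample data.
const sales = [
  { region: "India", product: "pen", amount: 10 },
  { region: "India", product: "book", amount: 100 },
  { region: "India", product: "pen", amount: 15 },
  { region: "USA", product: "book", amount: 500 },
];

const result = runPipeline(sales, [
  match((doc) => doc.region === "India"),
  groupSum("product", "amount", "totalSales"),
  sortDesc("totalSales"),
]);
console.log(result);
// [ { _id: 'book', totalSales: 100 }, { _id: 'pen', totalSales: 25 } ]
```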

7Q. Explain $match, $group, $sort, $project in MongoDB

These are aggregation pipeline stages in MongoDB used to filter, transform, group, and sort data.

Key Stages:

$match

  • Filters documents (like WHERE in SQL)
  • Should be placed early for performance
{ $match: { status: "active" } }

$group

  • Groups documents based on a field
  • Performs aggregation (sum, avg, count)
{
$group: {
_id: "$category",
total: { $sum: "$amount" }
}
}

$sort

  • Sorts documents (ascending = 1, descending = -1)
{ $sort: { total: -1 } }

$project

  • Reshapes output documents
  • Selects or modifies fields
{ $project: { name: 1, total: 1, _id: 0 } }

Key Points:

  • Used inside aggregation pipeline
  • Each stage processes data step-by-step
  • Improves data analysis and transformation

8Q. What is $lookup and how does it work?

$lookup is an aggregation stage used to join documents from another collection, similar to a JOIN in SQL.

Key Points:

  • Performs left outer join
  • Combines data from multiple collections
  • Adds a new array field with matched documents

Syntax:

{
$lookup: {
from: "orders",
localField: "userId",
foreignField: "userId",
as: "userOrders"
}
}

Example:

db.users.aggregate([
{
$lookup: {
from: "orders",
localField: "_id",
foreignField: "userId",
as: "orders"
}
}
])

How it Works:

  • Matches localField with foreignField
  • Fetches matching documents
  • Adds them as an array
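
The matching steps above can be sketched as an in-memory left outer join in plain JavaScript. This is only an approximation of the equality form of $lookup; the `lookup` function and the sample `users` and `orders` arrays are hypothetical.

```javascript
// In-memory sketch of $lookup's left outer join with localField/foreignField.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((local) => ({
    ...local,
    // Every local document is kept; unmatched ones get an empty array.
    [as]: foreignDocs.filter((foreign) => foreign[foreignField] === local[localField]),
  }));
}

// Hypothetical sample data.
const users = [
  { _id: 1, name: "Asha" },
  { _id: 2, name: "Ravi" },
];
const orders = [
  { _id: 101, userId: 1, amount: 250 },
  { _id: 102, userId: 1, amount: 90 },
];

const joined = lookup(users, orders, { localField: "_id", foreignField: "userId", as: "orders" });
console.log(joined[0].orders.length); // 2
console.log(joined[1].orders); // []
```

The user with no orders still appears in the output with an empty array, which is what makes the join "left outer".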

9Q. What is a compound index in MongoDB?

A compound index is an index on multiple fields in a document.

Key Points:

  • Improves performance for multi-field queries
  • Order of fields matters
  • Supports sorting and filtering

Example:

db.users.createIndex({ name: 1, age: -1 })

Important Concept:

  • Prefix Rule → Index works for:
    • { name }
    • { name, age }
    • ❌ Not for { age } alone
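
A rough way to reason about the prefix rule is to count how many leading index fields a query filters on. The sketch below is a simplification in plain JavaScript (the real planner also weighs sorts, ranges, and other candidate indexes); `usablePrefixLength` is a hypothetical helper.

```javascript
// Counts how many leading index fields appear among the query's filter fields.
function usablePrefixLength(indexFields, queryFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break; // the prefix ends at the first gap
    length += 1;
  }
  return length;
}

const indexFields = ["name", "age"]; // field order from createIndex({ name: 1, age: -1 })
console.log(usablePrefixLength(indexFields, ["name"])); // 1
console.log(usablePrefixLength(indexFields, ["name", "age"])); // 2
console.log(usablePrefixLength(indexFields, ["age"])); // 0
```

A result of 0 means the query does not start with the index's first field, so this index cannot narrow the search for it.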

10Q. What are TTL indexes?

TTL (Time-To-Live) indexes automatically delete documents after a specified time.

Key Points:

  • Used for expiring data (logs, sessions)
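  • Works on fields that hold a date (or an array of dates)
  • A background task removes expired documents periodically (about every 60 seconds), so deletion is not instantaneous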

Example:

db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

Use Cases:

  • Session management
  • Cache expiration
  • Temporary data cleanup
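
The expiry rule can be expressed as a small date calculation. The following plain JavaScript sketch assumes the same `createdAt` field and 3600-second window as the example; `isExpired` and the sample `session` are hypothetical, and the real server only checks periodically rather than at the exact instant.

```javascript
// Returns true once the document's date field is older than the TTL window.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // documents without a date value never expire
  return now.getTime() >= value.getTime() + expireAfterSeconds * 1000;
}

// Hypothetical sample document.
const session = { _id: "s1", createdAt: new Date("2024-01-01T00:00:00Z") };
console.log(isExpired(session, "createdAt", 3600, new Date("2024-01-01T00:59:59Z"))); // false
console.log(isExpired(session, "createdAt", 3600, new Date("2024-01-01T01:00:01Z"))); // true
```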

11Q. What are capped collections?

Capped collections are fixed-size collections that automatically overwrite the oldest documents once the size limit is reached.

Key Points:

  • Maintain insertion order
  • Very fast for writes
  • Manual deletes were not allowed before MongoDB 5.0; newer versions allow explicit deletes

Example:

db.createCollection("logs", { capped: true, size: 100000 }) // size is in bytes

Use Cases:

  • Logging systems
  • Real-time data streams

12Q. What are geospatial indexes?

Geospatial indexes are used to store and query location-based data.

Key Points:

  • Supports queries like distance, location, area
  • Uses coordinates, listed as longitude first and latitude second
  • Enables location-based searches

Types:

  • 2d index → flat geometry
  • 2dsphere → spherical (earth-like)

Example:

db.places.createIndex({ location: "2dsphere" })

Use Cases:

  • Maps & navigation apps
  • Nearby location search
  • Delivery services

13Q. How does MongoDB ensure high availability?

MongoDB ensures high availability through replication, allowing data to be available even if a server fails.

Key Points:

  • Uses replica sets
  • Automatic failover
  • Data redundancy across multiple nodes
  • Downtime is limited to the short election window after a failure

How it Works:

  • One primary node handles writes
  • Secondary nodes replicate data
  • If primary fails → secondary becomes primary

14Q. What is replication in MongoDB?

Replication is the process of copying data across multiple servers to ensure availability and reliability.

Key Points:

  • Maintains multiple copies of data
  • Protects against data loss
  • Can improve read throughput when the read preference allows reads from secondaries (which may lag behind the primary)

Working:

  • Primary node records operations
  • Secondary nodes replicate using oplog (operation log)

15Q. What is a replica set in MongoDB?

A replica set is a group of MongoDB servers that maintain the same data, providing redundancy and high availability.

Key Components:

  • Primary Node → handles writes
  • Secondary Nodes → replicate data
  • Arbiter (optional) → votes in elections but holds no data

Key Points:

  • Automatic failover
  • Ensures data consistency
  • Supports scaling reads

Example Structure:

Primary → Secondary → Secondary
↑ replication ↑

16Q. What is automatic failover in MongoDB?

Automatic failover is the process where MongoDB automatically switches to a secondary node as the primary when the current primary node fails.

Key Points:

  • Ensures high availability
  • Happens in a replica set
  • No manual intervention required
  • New primary is elected automatically

How it Works:

  • Primary node fails or becomes unreachable
  • Secondary nodes hold an election
  • One secondary becomes the new primary
  • Drivers discover the new primary and reconnect; writes pause briefly while the election runs
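
An election succeeds only when a strict majority of voting members can still reach each other. The arithmetic can be sketched in plain JavaScript; `votesNeeded` and `canElectPrimary` are hypothetical helpers, and real elections also consider priorities and how up to date each member is.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A new primary can be elected only if enough voting members are still reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= votesNeeded(votingMembers);
}

console.log(votesNeeded(3)); // 2
console.log(canElectPrimary(3, 2)); // true: one node down, two can still elect
console.log(canElectPrimary(3, 1)); // false: a lone survivor cannot become primary
console.log(canElectPrimary(4, 2)); // false: an even split has no majority
```

This is why replica sets are usually deployed with an odd number of voting members.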

Benefit:

  • Minimizes downtime
  • Restores write availability without operator action

17Q. What is sharding in MongoDB?

Sharding is a method of distributing data across multiple servers to handle large datasets and high traffic.

Key Points:

  • Enables horizontal scaling
  • Distributes data across shards
  • Each shard holds a portion of data
  • Improves performance and scalability

Components:

  • Shard → stores data
  • Config Server → stores metadata
  • Query Router (mongos) → routes queries

Example:

Large user data is split across multiple servers instead of being stored on a single one.

18Q. How does sharding work internally?

Sharding internally distributes data based on a shard key and routes queries to the correct shard.

Key Points:

  • Data is split into chunks
  • Each chunk belongs to a shard
  • Balancer distributes chunks evenly
  • Queries are routed using mongos

Internal Flow:

  1. Data inserted → shard key evaluated
  2. Chunk created and assigned to shard
  3. Query comes → mongos checks metadata
  4. Request routed to correct shard(s)
  5. Results merged and returned
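
The routing step in the flow above can be sketched with a small chunk table in plain JavaScript. The chunk boundaries, shard names, and the `routeByKey` and `routeByRange` helpers are all hypothetical; the real metadata lives on the config servers and is cached by mongos.

```javascript
// Chunk ranges per shard: [min, max) on a numeric shard key. Hypothetical metadata.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Targeted routing: find the single chunk whose range contains the key.
function routeByKey(userId) {
  return chunks.find((chunk) => userId >= chunk.min && userId < chunk.max).shard;
}

// Range routing: every shard owning a chunk that overlaps [low, high].
function routeByRange(low, high) {
  const shards = chunks.filter((c) => c.min <= high && c.max > low).map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(routeByKey(1500)); // shardB
console.log(routeByRange(900, 1100)); // [ 'shardA', 'shardB' ]
console.log(routeByRange(-Infinity, Infinity).length); // 3 (every shard is contacted)
```

A query that includes the shard key touches one shard, while a query without it has to be sent to all shards and merged.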

Important Concept:

  • Balancer automatically redistributes data
  • Prevents uneven data distribution

19Q. What is a shard key?

A shard key is a field (or set of fields) used to distribute data across shards.

Key Points:

  • Determines how data is partitioned
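  • Expected in every document (since MongoDB 4.4, a missing shard key field is treated as null)
  • Hard to change later: older versions fix it at sharding time, while 4.4+ can refine it and 5.0+ can reshard the collection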
  • Impacts performance significantly

Example:

{ userId: 1 }

Good Shard Key Characteristics:

  • High cardinality (many unique values)
  • Even data distribution
  • Frequently used in queries

20Q. What are hashed shard keys?

Hashed shard keys use a hash of the shard key value to distribute data evenly across shards.

Key Points:

  • Prevents data skew
  • Ensures uniform distribution
  • Good for write-heavy workloads

Example:

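db.users.createIndex({ userId: "hashed" })
sh.shardCollection("test.users", { userId: "hashed" }) // shard on the hashed key; the "test" database name is assumed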

When to Use:

  • When the natural key increases monotonically (timestamps, ObjectIds) and would send all inserts to one shard
  • When avoiding hotspots is important

Limitation:

  • Not ideal for range queries

21Q. Difference between horizontal and vertical scaling?

Scaling refers to increasing system capacity to handle more load.

Key Differences:

Feature         | Horizontal Scaling | Vertical Scaling
Meaning         | Add more servers   | Upgrade single server
Example         | Sharding           | Increase RAM/CPU
Scalability     | High               | Limited
Cost            | Distributed cost   | Expensive hardware
Fault Tolerance | High               | Low

Key Points:

  • MongoDB mainly supports horizontal scaling via sharding
  • Vertical scaling has hardware limits

22Q. What is write concern in MongoDB?

Write concern defines the level of acknowledgment requested from MongoDB for write operations.

Key Points:

  • Controls data durability
  • Specifies how many nodes must confirm write
  • Impacts performance vs safety

Levels:

  • w: 1 → acknowledged by primary
  • w: "majority" → acknowledged by majority of nodes
  • w: 0 → no acknowledgment

Example:

db.users.insertOne(
{ name: "Jitendra" },
{ writeConcern: { w: "majority" } }
)

Trade-off:

  • Higher write concern → safer but slower
  • Lower write concern → faster but less reliable

23Q. What is read concern in MongoDB?

Read concern defines the consistency and isolation level of data returned in read operations.

Key Points:

  • Controls how up-to-date the data is
  • Ensures consistency across replica sets
  • Works with replication

Levels:

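  • local → returns the node's most recent data, which may later be rolled back (default)
  • majority → returns only data acknowledged by a majority of members
  • linearizable → strongest guarantee; reflects all majority-acknowledged writes that completed before the read began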

Example:

db.users.find().readConcern("majority")

Trade-off:

  • Higher read concern → stronger consistency, slower
  • Lower read concern → faster, may return data that is later rolled back

24Q. How does MongoDB handle data consistency?

MongoDB ensures data consistency using replication, write concern, read concern, and atomic operations.

Key Points:

  • Single document operations are atomic
  • Uses replica sets for consistency
  • Reads from the primary are strongly consistent by default; reads from secondaries are eventually consistent
  • Strong consistency can be achieved using read/write concern

How it Works:

  • Writes go to primary node
  • Data replicated to secondary nodes
  • Read concern ensures correct data visibility
  • Write concern ensures durability

Important Concept:

  • Eventual Consistency → secondaries catch up with the primary over time
  • Strong Consistency → read from the primary, or use majority read/write concern

25Q. What is journaling?

Journaling is a write-ahead logging mechanism: MongoDB records write operations in an on-disk journal before applying them to the data files, so acknowledged writes survive a crash.

Key Points:

  • Prevents data loss during crashes
  • Writes are first recorded in a journal file
  • Ensures recovery after system failure

How it Works:

  1. Write operation occurs
  2. Data written to journal
  3. Then written to main data files
  4. On crash → recovery using journal

Benefit:

  • Guarantees durability
  • Improves reliability

26Q. What is the oplog?

The oplog (operation log) is a special capped collection that stores all write operations in a replica set.

Key Points:

  • Located in local.oplog.rs
  • Used for replication
  • Stores operations in chronological order
  • Enables secondaries to sync with primary

How it Works:

  • Primary logs operations in oplog
  • Secondary nodes read oplog
  • Apply operations to stay updated

Important:

  • Acts like a replication history log

27Q. What is GridFS and when is it used?

GridFS is a specification used to store and retrieve files larger than the 16 MB BSON document size limit in MongoDB.

Key Points:

  • Splits large files into smaller chunks
  • Stores chunks in separate collections
  • Efficient for large media storage

How it Works:

  • File divided into chunks (default 255KB)
  • Stored in:
    • fs.files → metadata
    • fs.chunks → actual data
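
The chunking arithmetic is easy to check by hand. The sketch below, in plain JavaScript, assumes the default 255 KB chunk size mentioned above (255 × 1024 bytes); `chunkLayout` is a hypothetical helper, not part of any driver.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261,120 bytes

// Describes how a file of the given byte length would be split into chunks.
function chunkLayout(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (fileSizeBytes === 0) return { chunkCount: 0, lastChunkBytes: 0 };
  const chunkCount = Math.ceil(fileSizeBytes / chunkSize);
  const lastChunkBytes = fileSizeBytes - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkBytes };
}

console.log(chunkLayout(16 * 1024 * 1024)); // { chunkCount: 65, lastChunkBytes: 65536 }
console.log(chunkLayout(1000)); // { chunkCount: 1, lastChunkBytes: 1000 }
```

Only the last chunk is smaller than the chunk size; all earlier chunks are full.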

Use Cases:

  • Storing images/videos
  • File storage systems
  • Large binary data

28Q. How does MongoDB handle schema design?

MongoDB uses a schema-less (flexible schema) approach, allowing documents to have different structures.

Key Points:

  • No fixed schema required
  • Documents in same collection can vary
  • Schema can evolve over time
  • Validation rules can be applied if needed

Design Approach:

  • Design schema based on application needs
  • Focus on query patterns
  • Optimize for read performance

29Q. Embedding vs Referencing – when to use which?

Embedding and referencing are two approaches to model relationships between data in MongoDB.

Embedding (Denormalization)

Key Points:

  • Stores related data in same document
  • Faster reads
  • No joins required

Example:

{
name: "Jitendra",
orders: [
{ item: "Book", price: 100 }
]
}

Referencing (Normalization)

Key Points:

  • Stores related data in separate collections
  • Uses references (IDs)
  • Requires $lookup for joins

Example:

{
name: "Jitendra",
orderIds: [101, 102]
}

When to Use:

Embedding                | Referencing
Small, related data      | Large or frequently changing data
Read-heavy workloads     | Write-heavy or complex relationships
One-to-few relationships | One-to-many or many-to-many

30Q. How do you optimize MongoDB queries?

Query optimization involves improving query performance by reducing execution time and resource usage.

Key Techniques:

  • Use indexes
  • Avoid full collection scans
  • Use projection to limit fields
  • Use $match early in aggregation
  • Limit results using $limit
  • Use proper schema design

Advanced Tips:

  • Use compound indexes
  • Avoid large documents
  • Optimize queries based on access patterns

31Q. What is explain()?

explain() is a method used to analyze how MongoDB executes a query.

Key Points:

  • Shows query execution plan
  • Helps identify performance issues
  • Displays index usage

Example:

db.users.find({ age: 25 }).explain("executionStats")

Output Includes:

  • COLLSCAN or IXSCAN
  • Execution time
  • Number of documents examined

32Q. What is a covered query?

A covered query is a query where all required fields are retrieved from the index only, without accessing the actual documents.

Key Points:

  • Documents are never fetched; only index entries are read
  • Very fast performance
  • Requires proper indexing

Example:

db.users.createIndex({ name: 1, age: 1 })

db.users.find(
{ name: "Jitendra" },
{ name: 1, age: 1, _id: 0 }
)

Condition:

  • Every field in the filter and in the projection must be part of the index (so _id must be excluded unless it is indexed)
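
The coverage condition can be checked mechanically. The plain JavaScript sketch below is a simplified model (it ignores multikey indexes, nested fields, and other edge cases); `isCovered` is a hypothetical helper.

```javascript
// A query is covered (in this simplified model) when every filter field and every
// returned field is part of the index, and _id is excluded unless it is indexed too.
function isCovered(indexFields, filterFields, projection) {
  const inIndex = (field) => indexFields.includes(field);
  const returned = Object.keys(projection).filter((field) => projection[field] === 1);
  // _id is returned by default unless the projection sets it to 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return filterFields.every(inIndex) && returned.length > 0 && returned.every(inIndex);
}

const index = ["name", "age"]; // from createIndex({ name: 1, age: 1 })
console.log(isCovered(index, ["name"], { name: 1, age: 1, _id: 0 })); // true
console.log(isCovered(index, ["name"], { name: 1, age: 1 })); // false: _id is not in the index
console.log(isCovered(index, ["name"], { name: 1, email: 1, _id: 0 })); // false: email is not indexed
```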

33Q. How do you import and export data in MongoDB?

MongoDB provides tools to import and export data between MongoDB and external formats.

Tools:

Import:

  • mongoimport → import JSON/CSV
mongoimport --db test --collection users --file users.json

Export:

  • mongoexport → export data
mongoexport --db test --collection users --out users.json

Key Points:

  • Supports JSON and CSV (for CSV, pass --type=csv and name the fields with --fields, or use --headerline on import)
  • Useful for data migration

34Q. How do you backup and restore MongoDB?

Backup and restore are processes used to protect and recover data in MongoDB.

Tools:

Backup:

  • mongodump → creates backup
mongodump --db test --out /backup

Restore:

  • mongorestore → restores backup
mongorestore /backup

Key Points:

  • Supports full and partial backups
  • Essential for disaster recovery
  • Can be automated 

35Q. What is MongoDB Atlas?

MongoDB Atlas is a fully managed cloud database service provided by MongoDB that handles deployment, scaling, and maintenance automatically.

Key Points:

  • Fully managed database (DBaaS)
  • Runs on cloud providers (AWS, GCP, Azure)
  • Automatic backups, scaling, and updates
  • Built-in security (encryption, access control)
  • Global cluster support

Features:

  • Auto-scaling
  • Monitoring dashboards
  • Backup & restore
  • High availability

Use Case:

  • Best for production apps without managing infrastructure

36Q. Atlas vs Self-Hosted MongoDB

Comparison between managed cloud MongoDB (Atlas) and manually managed MongoDB servers.

Key Differences:

Feature     | MongoDB Atlas | Self-Hosted MongoDB
Setup       | Fully managed | Manual setup
Maintenance | Automatic     | Manual
Scaling     | Auto scaling  | Manual scaling
Backup      | Built-in      | Manual
Cost        | Pay-as-you-go | Infrastructure cost
Control     | Limited       | Full control

Key Points:

  • Atlas → Easy, fast, scalable
  • Self-hosted → More control, complex setup

37Q. What is change stream?

Change streams allow applications to listen to real-time changes (insert, update, delete) in MongoDB collections.

Key Points:

  • Works on replica sets and sharded clusters
  • Provides real-time data updates
  • Uses oplog internally
  • Useful for event-driven systems

Example:

db.collection.watch()

Use Cases:

  • Real-time notifications
  • Live dashboards
  • Event-driven architectures

38Q. How do you paginate results?

Pagination is the process of retrieving data in small chunks (pages) instead of loading all data at once.

Methods:

Using skip() and limit()

db.users.find().sort({ _id: 1 }).skip(10).limit(5)

Using Range-Based Pagination (Better)

db.users.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(5)

Key Points:

  • skip() gets slower as the offset grows, because the server still walks past every skipped document
  • Range-based pagination is more efficient
  • Improves performance and UX
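
Range-based pagination boils down to remembering the last _id of each page and asking for documents after it. The loop below is an in-memory sketch in plain JavaScript; the `docs` array stands in for a collection, and `nextPage` is a hypothetical helper that mimics a filter on _id followed by an ascending sort and a limit.

```javascript
// Stand-in for a collection; numeric _id values are used here for simplicity.
const docs = Array.from({ length: 12 }, (_, i) => ({ _id: i + 1 }));

// Mimics: filter on _id > lastId, sort ascending by _id, then take pageSize documents.
function nextPage(lastId, pageSize) {
  return docs
    .filter((doc) => lastId === null || doc._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

let lastId = null;
const pages = [];
while (true) {
  const page = nextPage(lastId, 5);
  if (page.length === 0) break;
  pages.push(page.map((doc) => doc._id));
  lastId = page[page.length - 1]._id; // the cursor for the next request
}
console.log(pages); // three pages: 1-5, 6-10, 11-12
```

Each request starts from an indexed position instead of counting past earlier rows, so the cost per page stays roughly constant.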

39Q. How do you monitor MongoDB performance?

Monitoring MongoDB performance involves tracking database metrics to ensure efficient operation.

Key Tools:

  • MongoDB Atlas monitoring dashboard
  • mongostat → real-time stats
  • mongotop → read/write activity
  • Logs and profiling

Key Metrics:

  • Query execution time
  • Index usage
  • CPU and memory usage
  • Disk I/O
  • Connections

Key Points:

  • Helps detect bottlenecks
  • Improves performance tuning
  • Essential for production systems

40Q. What are common MongoDB performance issues?

Performance issues in MongoDB occur when queries or operations are inefficient, leading to slow response times.

Common Issues:

Missing Indexes

  • Causes full collection scan (COLLSCAN)

Poor Query Design

  • Inefficient filters
  • Not using projection

Large Documents

  • Slower read/write operations

Unoptimized Aggregation

  • Heavy pipelines without $match early

High Memory Usage

  • Large working set not fitting in RAM

Improper Shard Key

  • Uneven data distribution (hotspots)

Solutions:

  • Create proper indexes
  • Use explain() to analyze queries
  • Optimize schema design
  • Use projection and limit
  • Monitor performance regularly