Partitioning is the technique of splitting a large table or index into smaller, more manageable pieces called partitions, each stored separately but still treated as a single logical object by the DBMS. It is often applied to large fact tables in data-warehouse-style environments, where a single table can grow to millions or billions of rows.

Partitioning can be done by different criteria:

  • Range partitioning: partitions are based on ranges of a key, such as a date, a month, or bands of a numeric column like sales_amount.

  • List partitioning: each partition holds rows for a predefined list of values (for example, different regions or categories).

  • Hash partitioning: a hash function spreads rows across partitions to balance partition size and access load.
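
The three schemes differ only in how a row's key is mapped to a partition. A minimal sketch in Python, not tied to any particular DBMS (partition names, boundaries, and the region mapping are illustrative assumptions):

```python
import hashlib

def range_partition(order_date: str) -> str:
    # Range partitioning: route by the year of an ISO date, e.g. "2023-07-14".
    return f"sales_{order_date[:4]}"

def list_partition(region: str) -> str:
    # List partitioning: each partition owns a predefined list of values.
    mapping = {"EU": "p_europe", "UK": "p_europe",
               "US": "p_america", "CA": "p_america"}
    return mapping.get(region, "p_other")

def hash_partition(customer_id: int, num_partitions: int = 4) -> str:
    # Hash partitioning: a hash function spreads keys across a fixed
    # number of partitions, balancing size without any semantic grouping.
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return f"p_{int(digest, 16) % num_partitions}"

print(range_partition("2023-07-14"))   # sales_2023
print(list_partition("UK"))            # p_europe
print(hash_partition(12345))
```

A real DBMS performs this routing automatically on INSERT once the partitioning scheme is declared in the table definition.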

How Partitioning Helps Performance

  • Faster queries (partition pruning): when a query filters on the partitioning key in its WHERE clause, the DBMS can skip entire partitions and read only those that may contain matching rows.

  • Easier maintenance: old partitions (such as data older than a retention date) can be dropped or archived quickly, without scanning the whole table.

  • Improved parallelism: different partitions can be scanned or updated in parallel, speeding up bulk operations such as backups, exports, or index builds.
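
Partition pruning can be illustrated with a toy in-memory model: data is split by year, and a query that filters on the partitioning key only touches the matching partition. The table layout and rows below are invented for the example:

```python
# Partitions keyed by year (the partitioning key); rows are plain dicts.
partitions = {
    2022: [{"id": 1, "year": 2022, "amount": 100}],
    2023: [{"id": 2, "year": 2023, "amount": 250},
           {"id": 3, "year": 2023, "amount": 75}],
    2024: [{"id": 4, "year": 2024, "amount": 300}],
}

def query_with_pruning(target_year: int) -> list:
    # The predicate names the partitioning key, so only one
    # partition is ever read.
    scanned = partitions.get(target_year, [])
    return [row for row in scanned if row["year"] == target_year]

def query_full_scan(target_year: int) -> list:
    # Without pruning, every partition must be scanned.
    return [row for part in partitions.values() for row in part
            if row["year"] == target_year]

# Both return the same rows, but the pruned version reads one
# partition instead of three.
print(query_with_pruning(2023))
```

The saving grows with the number of partitions: a query against one year of a table partitioned by month can skip the other eleven months entirely.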
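
The parallelism benefit can be sketched the same way: each partition is aggregated by a separate worker and the partial results are combined at the end. The data and worker count are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# One list of amounts per partition (illustrative data).
partition_rows = {
    "p_2022": [100, 40, 60],
    "p_2023": [250, 75],
    "p_2024": [300],
}

def scan_partition(rows: list) -> int:
    # Each worker scans only its own partition.
    return sum(rows)

# Scan all partitions concurrently, then combine the partial sums.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(scan_partition, partition_rows.values()))

total = sum(partial_sums)
print(total)  # 825
```

Real systems apply the same divide-and-combine pattern to parallel index builds, backups, and exports, where each worker owns one partition's files.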

When to Use Partitioning

  • Large tables with millions of rows.

  • Workloads that naturally group data by time, region, or category.

  • Systems where some data is frequently accessed and other data is rarely used but must be retained.

For beginners, partitioning is like cutting a huge book into smaller volumes by chapter or year: you still have one logical book, but queries can open only the relevant volume instead of scanning the entire book every time.

Summary

Partitioning in a DBMS divides large tables or indexes into smaller, physically separate partitions while keeping them logically unified. It improves query performance through partition pruning, simplifies maintenance of large datasets, and supports efficient archiving and parallel operations, making it a powerful tool for scaling databases with very large tables.