Indexing in DBMS

Hash Indexing and Dynamic Hashing

In the previous article, we studied B-Tree and B+ Tree indexing.

Tree-based indexing is very efficient for range queries and ordered data retrieval.

Now we study another important indexing technique:

Hash Indexing

Hash indexing is mainly used for fast equality-based searches.


1. What is Hash Indexing?

Hash indexing uses a hash function to compute the location (bucket) where a record should be stored.

Instead of searching through a tree, we directly compute the address using:

Address = h(Key)

Where:

  • h is the hash function
  • The key is the search key.

This allows direct access to data.


2. How Hash Indexing Works

Step 1: Applythe  hash function to the key
Step 2: Compute bucket number
Step 3: Access the bucket directly

Example:

If h(Student_ID) = Student_ID mod 10

Student_ID = 123
h(123) = 3

The record is stored in bucket 3.


3. Static Hashing

In static hashing:

  • The number of buckets is fixed.
  • Hash function remains constant.

3.1 Advantages

  • Very fast equality search
  • Simple to implement

3.2 Disadvantages

  • Hash collisions may occur
  • If the file grows, the buckets may overflow
  • Performance degrades due to overflow chains

3.3 Hash Collision

A collision occurs when two keys map to the same bucket.

Example:

h(101) = 1
h(111) = 1

Both are stored in bucket 1.

Collision resolution methods:

  • Chaining
  • Open addressing

4. Dynamic Hashing

Static hashing is not suitable when the database size changes frequently.

Dynamic hashing adjusts the number of buckets automatically.

There are two main types:

  • Extendible Hashing
  • Linear Hashing

We will focus on Extendible Hashing.


5. Extendible Hashing

Extendible hashing uses:

  • A directory
  • Buckets
  • Global depth
  • Local depth

5.1 Key Concepts

Global Depth:

  • Number of bits used to index the directory.

Local Depth:

  • Number of bits used by a bucket.

5.2 How It Works

  1. Apply the hash function.
  2. Use the first few bits to access the directory.
  3. The directory points to the bucket.
  4. If the bucket overflows:
    • Split the bucket.
    • Increase local depth.
    • If needed, double the directory size.

This allows flexible growth without full reorganization.


5.3 Advantages

  • Efficient space utilization
  • No long overflow chains
  • Better performance than static hashing

6. Hash Indexing vs B+ Tree

Feature

Hash Indexing

B+ Tree

Equality Search

Very Fast

Fast

Range Query

Not Supported

Very Efficient

Ordered Data

No

Yes

Dynamic Growth

With Dynamic Hashing

Yes

Disk Access

Very Low

Low


7. When to Use Hash Indexing

Use hash indexing when:

  • Queries are mostly equality-based
    Example: WHERE ID = 101
  • Range queries are not required

Do not use hash indexing when:

  • Range queries are frequent
  • Ordered traversal is required

Summary

In this article, we studied:

  • Hash Indexing
  • Static Hashing
  • Collision handling
  • Dynamic Hashing
  • Extendible Hashing
  • Comparison with B+ Tree