Indexing in DBMS
Hash Indexing and Dynamic Hashing
In the previous article, we studied B-Tree and B+ Tree indexing.
Tree-based indexing is very efficient for range queries and ordered data retrieval.
Now we study another important indexing technique:
Hash Indexing
Hash indexing is mainly used for fast equality-based searches.
1. What is Hash Indexing?
Hash indexing uses a hash function to compute the location (bucket) where a record should be stored.
Instead of searching through a tree, we directly compute the address using:
Address = h(Key)
Where:
- h is the hash function
- The key is the search key.
This allows direct access to data.
2. How Hash Indexing Works
Step 1: Applythe hash function to the key
Step 2: Compute bucket number
Step 3: Access the bucket directly
Example:
If h(Student_ID) = Student_ID mod 10
Student_ID = 123
h(123) = 3
The record is stored in bucket 3.
3. Static Hashing
In static hashing:
- The number of buckets is fixed.
- Hash function remains constant.
3.1 Advantages
- Very fast equality search
- Simple to implement
3.2 Disadvantages
- Hash collisions may occur
- If the file grows, the buckets may overflow
- Performance degrades due to overflow chains
3.3 Hash Collision
A collision occurs when two keys map to the same bucket.
Example:
h(101) = 1
h(111) = 1
Both are stored in bucket 1.
Collision resolution methods:
- Chaining
- Open addressing
4. Dynamic Hashing
Static hashing is not suitable when the database size changes frequently.
Dynamic hashing adjusts the number of buckets automatically.
There are two main types:
- Extendible Hashing
- Linear Hashing
We will focus on Extendible Hashing.
5. Extendible Hashing
Extendible hashing uses:
- A directory
- Buckets
- Global depth
- Local depth
5.1 Key Concepts
Global Depth:
- Number of bits used to index the directory.
Local Depth:
- Number of bits used by a bucket.
5.2 How It Works
- Apply the hash function.
- Use the first few bits to access the directory.
- The directory points to the bucket.
- If the bucket overflows:
- Split the bucket.
- Increase local depth.
- If needed, double the directory size.
This allows flexible growth without full reorganization.
5.3 Advantages
- Efficient space utilization
- No long overflow chains
- Better performance than static hashing
6. Hash Indexing vs B+ Tree
| Feature | Hash Indexing | B+ Tree |
| Equality Search | Very Fast | Fast |
| Range Query | Not Supported | Very Efficient |
| Ordered Data | No | Yes |
| Dynamic Growth | With Dynamic Hashing | Yes |
| Disk Access | Very Low | Low |
7. When to Use Hash Indexing
Use hash indexing when:
- Queries are mostly equality-based
Example: WHERE ID = 101 - Range queries are not required
Do not use hash indexing when:
- Range queries are frequent
- Ordered traversal is required
Summary
In this article, we studied:
- Hash Indexing
- Static Hashing
- Collision handling
- Dynamic Hashing
- Extendible Hashing
- Comparison with B+ Tree