Hash Index

Last updated Nov 8, 2022 Edit Source

# Hash Index

Idea: use a hash table

Take a search key and hash it into an integer in the range of 0 to B-1 where B is the number of buckets
A bucket array holds the headers of B linked lists, one for each bucket
If a record has search key K, store the record by linking it to bucket list number h(K) Implementations
We can directly hash a key which points to the record

Add a level of indirection: use an array of pointers to blocks to represent the buckets rather than an array holding data itself

# Static Hash

# Insertion and Deletion

Bucket overflow can be handled naively by adding additional pointer to a separate block.

# Extensible Hash Index

If the number of buckets is fixed from the start, we will end up with many additional pointers to additional overflow buckets. This incurs more I/O as more records are added.

Idea: use only the first i bits output from the hash function to point to a directory.

# Insertion and Deletion

Increase i when overflow: Update the directory structure: Delete implementations:

Merge blocks when removing record
Do not merge blocks at all

# Duplicate Keys

Unable to allocate different directory for duplicate keys:

Pasted image 20220910143851 400x400 i. 2 I/O ii. All blocks needs to be searched: 11 I/O 400x400 a. Hash index on ID b. B+ tree index on ID to support better range query c. Multi key index on ID and name

Brendan Ang

Hash Index

# Hash Index

# Static Hash

# Insertion and Deletion

# Extensible Hash Index

# Insertion and Deletion

# Duplicate Keys

# Practice Problems

Backlinks

Interactive Graph