Hash Index
# Hash Index
Idea: use a hash table
- Take a search key and hash it into an integer in the range of 0 to B-1 where B is the number of buckets
- A bucket array holds the headers of B linked lists, one for each bucket
- If a record has search key K, store the record by linking it to bucket list number h(K) Implementations
- We can directly hash a key which points to the record
- Add a level of indirection: use an array of pointers to blocks to represent the buckets rather than an array holding data itself
# Static Hash
# Insertion and Deletion
Bucket overflow can be handled naively by adding additional pointer to a separate block.
# Extensible Hash Index
If the number of buckets is fixed from the start, we will end up with many additional pointers to additional overflow buckets. This incurs more I/O as more records are added.
Idea: use only the first i bits output from the hash function to point to a directory.
# Insertion and Deletion
Increase i when overflow: Update the directory structure: Delete implementations:
- Merge blocks when removing record
- Do not merge blocks at all
# Duplicate Keys
Unable to allocate different directory for duplicate keys:
# Practice Problems
i. 2 I/O ii. All blocks needs to be searched: 11 I/O a. Hash index on ID b. B+ tree index on ID to support better range query c. Multi key index on ID and name