B+ Tree Index
# B+ Tree Index
Idea: build a multi-layer index in the structure of a B-tree
- Each tree node is stored within a block
- Each node stores at most n+1 pointers and n keys
- Each level is an index - sorted within each node - sorted across nodes at the same level
[!Leaf Node Properties] Consider a leaf node storing k+1 pointers (k <= n) and k keys
- First k pointers are to records and last one is to the next leaf node
- Each key is equal to the key that its corresponding pointer is pointing to in the record
[!Internal Node Properties] Consider an internal node storing k+1 pointers (k <= n) and k keys
- The ith key is the lower bound of the range of the i+1 pointer. The 2nd pointer points to a subtree that has the first key as the first element:
# Validity
# Searching
- If search key is greater than ith key, follow i+1 pointer, else follow ith pointer
# Insertion
- Find which node to insert record
- If node is not full, insert the record (maintain sorted order)
- Else
- Split the node and distribute the keys
- If no parent, create root node
- Else, insert into parent recursively until no splits needed
# Deletion
Case 1: key can be deleted while maintaining constraints Case 2: key can be borrowed from sibling nodes, e.g. take 16 from left: Case 3: cannot borrow from siblings
- Merge 2 nodes and delete 1 of them
- Delete key from parent (one child is removed, may need to remove key from parent)
- Recursively apply delete on this parent if it is not full enough
# Construction
One way to do so is through a series of insert operations No sequential storage of leaf nodes but is more suitable for dynamic data.
# Bulk Loading
Leaves will be stored sequentially, will work for static data (all data is known before hand)
- Sort all data entries based on search key
- Start by creating all leaf nodes by packing the keys
- Insert internal nodes bottom up
Height of the B+tree will be $log_{150}1000000=2.75$ Need at least 3 levels in the btree At most will take 3 I/O + 1 I/O to access data block
- Each internal node can index 151 children
- Last level indexes 150 records a. i. Data level: 1,000,000 records -> 100,000 blocks Leaf level will have 1,000,000 pointers = $1000000/70\approx14286$ blocks 2nd level will have $14286/70\approx205$ blocks 3rd level will have $205/70\approx3$ blocks Root will have 1 block Total blocks: $100000+14286+205+3=114494$ Height of the B-tree will at least be: $log_{70}1000000=3.25\approx4$ Number of I/O = 4+1 = 5 b. Same as a. Since dense index, order of the data record does not matter c. What if a) but sparse index? Leaf level will have 100,000 pointers to each data record block: $100000/70=1429$ blocks 2nd level: $1429/70\approx21$ Root: 1 Total blocks: $100,000+1429+21+1=101451$