Indexes & Optimization
Overview
Section titled “Overview”HaruDB uses B-tree indexing for efficient data retrieval and query optimization. B-trees provide excellent performance for range queries, equality searches, and ordered data access while maintaining balanced tree structure.
What is a B-tree?
Section titled “What is a B-tree?”A B-tree (Balanced Tree) is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
Key Characteristics of B-trees:
Section titled “Key Characteristics of B-trees:”- Balanced: All leaf nodes are at the same level
- Ordered: Data is stored in sorted order
- Multi-way: Each node can have multiple children (not just 2)
- Efficient: O(log n) time complexity for all operations
- Range queries: Excellent for range-based searches
B-tree Structure
Section titled “B-tree Structure”Node Structure
Section titled “Node Structure”Each B-tree node contains:
- Keys: Sorted array of values
- Pointers: References to child nodes or data records
- Leaf flag: Indicates if node is a leaf or internal node
Internal Node:[Key1] [Key2] [Key3] | | | |Child1 Child2 Child3 Child4
Leaf Node:[Key1] [Key2] [Key3] -> [Data1] [Data2] [Data3]
B-tree Properties
Section titled “B-tree Properties”For a B-tree of order m
:
- Root node: Can have 1 to
m
children - Internal nodes: Can have
⌈m/2⌉
tom
children - Leaf nodes: Can contain
⌈m/2⌉-1
tom-1
keys - All leaves: At the same depth
Step-by-Step B-tree Operations
Section titled “Step-by-Step B-tree Operations”1. B-tree Search Algorithm
Section titled “1. B-tree Search Algorithm”Goal: Find a key in the B-tree
Steps:
- Start at the root node
- Compare the search key with keys in current node
- If key is found, return the data
- If key is not found, determine which child to traverse
- Recursively search in the appropriate child
- Continue until key is found or reach a leaf
Example:
Search for key "M" in B-tree:
Root: [G, P] | | [A,C,E] [M,N,R]
1. Compare "M" with root keys [G, P]2. "M" > "G" and "M" < "P", so go to middle child3. Compare "M" with [M, N, R]4. "M" == "M", found!
2. B-tree Insertion Algorithm
Section titled “2. B-tree Insertion Algorithm”Goal: Insert a new key while maintaining B-tree properties
Steps:
- Search for the appropriate leaf node
- Insert the key in sorted order
- If leaf is not full, insertion complete
- If leaf is full, split the node:
- Create new node
- Move half keys to new node
- Promote middle key to parent
- Update parent pointers
- If parent becomes full, repeat splitting process
- If root splits, create new root
Example:
Insert "Q" into B-tree:
Before: Root [G, P] | | [A,C,E] [M,N,R]
1. Find leaf: [M,N,R]2. Insert "Q": [M,N,Q,R] (now full)3. Split leaf: [M,N] and [Q,R]4. Promote "Q" to parent5. Update parent: [G,P,Q]6. Update pointers
After: Root [G, P, Q] | | | [A,C,E] [M,N] [Q,R]
3. B-tree Deletion Algorithm
Section titled “3. B-tree Deletion Algorithm”Goal: Remove a key while maintaining B-tree properties
Steps:
- Search for the key to delete
- If key is in leaf:
- Simply remove it
- If leaf becomes under-full, borrow from sibling or merge
- If key is in internal node:
- Replace with predecessor/successor from leaf
- Delete the predecessor/successor
- Handle under-full nodes by borrowing or merging
- Propagate changes up the tree
Example:
Delete "M" from B-tree:
Before: Root [G, P, Q] | | | [A,C,E] [M,N] [Q,R]
1. "M" is in leaf [M,N]2. Remove "M": [N] (now under-full)3. Borrow from sibling [Q,R]: [N,Q] and [R]4. Update parent: [G,P] (no longer needs Q)5. Merge with left sibling: [G] and [P]
After: Root [G, P] | | [A,C,E] [N,Q,R]
HaruDB B-tree Implementation
Section titled “HaruDB B-tree Implementation”Creating B-tree Indexes
Section titled “Creating B-tree Indexes”-- Create B-tree index on single columnCREATE INDEX ON users (email);
-- Create B-tree index on multiple columns (composite)CREATE INDEX ON orders (customer_id, order_date);
-- Create B-tree index on numeric columnCREATE INDEX ON products (price);
B-tree Index Properties in HaruDB
Section titled “B-tree Index Properties in HaruDB”- Order: Default order of 3 (can be configured)
- Persistence: Index metadata saved to disk
- Rebuilding: Indexes rebuilt on server startup
- Memory: Indexes stored in memory for fast access
- Balancing: Automatic rebalancing on insertions/deletions
Query Optimization with B-trees
Section titled “Query Optimization with B-trees”Equality Queries
Section titled “Equality Queries”B-trees excel at equality searches:
-- Uses B-tree index efficientlySELECT * FROM users WHERE email = 'alice@example.com';
-- Composite index usageSELECT * FROM orders WHERE customer_id = '123' AND order_date = '2024-01-15';
Performance: O(log n) time complexity
Range Queries
Section titled “Range Queries”B-trees are excellent for range queries:
-- Range queries use B-tree efficientlySELECT * FROM products WHERE price BETWEEN 100 AND 500;
-- Greater than queriesSELECT * FROM users WHERE created_at > '2024-01-01';
-- Less than queriesSELECT * FROM orders WHERE total < 1000;
Performance: O(log n + k) where k is result set size
Ordered Queries
Section titled “Ordered Queries”B-trees maintain sorted order:
-- Ordered results (no additional sorting needed)SELECT * FROM products ORDER BY price;
-- Reverse orderSELECT * FROM users ORDER BY created_at DESC;
Performance: O(log n + k) for ordered traversal
B-tree vs Other Index Types
Section titled “B-tree vs Other Index Types”B-tree vs Hash Index
Section titled “B-tree vs Hash Index”Feature | B-tree | Hash Index |
---|---|---|
Equality queries | O(log n) | O(1) |
Range queries | O(log n + k) | Not supported |
Ordered queries | O(log n + k) | Not supported |
Memory usage | Moderate | Low |
Maintenance | Automatic | Simple |
B-tree vs Binary Search Tree
Section titled “B-tree vs Binary Search Tree”Feature | B-tree | Binary Tree |
---|---|---|
Height | O(log n) | O(log n) worst case |
Balancing | Automatic | Manual (AVL/Red-Black) |
Node size | Multiple keys | Single key |
Cache efficiency | Better | Poor |
Range queries | Excellent | Good |
Advanced B-tree Concepts
Section titled “Advanced B-tree Concepts”B-tree Order and Performance
Section titled “B-tree Order and Performance”Order 3 B-tree (2-3 tree):
- Each node has 1-2 keys
- Each node has 2-3 children
- Good for small datasets
Order 5 B-tree:
- Each node has 2-4 keys
- Each node has 3-5 children
- Better for larger datasets
Order 7 B-tree:
- Each node has 3-6 keys
- Each node has 4-7 children
- Optimal for very large datasets
B-tree Splitting Strategy
Section titled “B-tree Splitting Strategy”-- Example of B-tree splitting during insertion-- Order 3 B-tree (2-3 tree)
Before insertion:Root: [50] | [20, 30] [70, 80]
Insert 25:1. Find leaf: [20, 30]2. Insert 25: [20, 25, 30] (now full)3. Split: [20] and [25, 30]4. Promote 25 to parent5. New root: [25, 50]6. Update pointers
After insertion:Root: [25, 50] | | [20] [30] [70, 80]
B-tree Merging Strategy
Section titled “B-tree Merging Strategy”-- Example of B-tree merging during deletion-- Order 3 B-tree (2-3 tree)
Before deletion:Root: [25, 50] | | [20] [30] [70, 80]
Delete 20:1. Remove 20 from leaf [20]2. Leaf becomes empty3. Merge with right sibling [30]4. New leaf: [30]5. Remove 25 from parent6. Parent becomes [50]
After deletion:Root: [50] | [30] [70, 80]
Performance Optimization
Section titled “Performance Optimization”Index Selection Strategy
Section titled “Index Selection Strategy”- Primary keys: Always indexed automatically
- Foreign keys: Index for join performance
- Frequently queried columns: Create indexes
- Range query columns: B-tree indexes are ideal
- Composite indexes: For multi-column queries
Index Maintenance
Section titled “Index Maintenance”-- Check index usage (if supported)EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Rebuild indexes (if needed)-- HaruDB automatically rebuilds indexes on startup
-- Monitor index performance-- Use query timing to measure improvement
Best Practices
Section titled “Best Practices”- Create indexes on frequently queried columns
- Use composite indexes for multi-column queries
- Avoid over-indexing (too many indexes slow down writes)
- Monitor query performance after index creation
- Consider index order based on query patterns
Complete B-tree Example
Section titled “Complete B-tree Example”Database Schema
Section titled “Database Schema”-- Create tablesCREATE TABLE customers (id, name, email, created_at);CREATE TABLE orders (id, customer_id, product_id, quantity, total, order_date);CREATE TABLE products (id, name, price, category);
-- Insert sample dataINSERT INTO customers VALUES (1, 'Alice', 'alice@example.com', '2024-01-01');INSERT INTO customers VALUES (2, 'Bob', 'bob@example.com', '2024-01-02');INSERT INTO customers VALUES (3, 'Charlie', 'charlie@example.com', '2024-01-03');
INSERT INTO products VALUES (1, 'Laptop', '999.99', 'Electronics');INSERT INTO products VALUES (2, 'Mouse', '29.99', 'Electronics');INSERT INTO products VALUES (3, 'Keyboard', '79.99', 'Electronics');
INSERT INTO orders VALUES (1, 1, 1, 1, '999.99', '2024-01-15');INSERT INTO orders VALUES (2, 1, 2, 2, '59.98', '2024-01-15');INSERT INTO orders VALUES (3, 2, 3, 1, '79.99', '2024-01-16');
B-tree Index Creation
Section titled “B-tree Index Creation”-- Create B-tree indexesCREATE INDEX ON customers (email);CREATE INDEX ON customers (created_at);CREATE INDEX ON orders (customer_id);CREATE INDEX ON orders (order_date);CREATE INDEX ON products (price);CREATE INDEX ON products (category);
-- Composite B-tree indexCREATE INDEX ON orders (customer_id, order_date);
Query Examples with B-tree Optimization
Section titled “Query Examples with B-tree Optimization”-- Equality query (uses email index)SELECT * FROM customers WHERE email = 'alice@example.com';
-- Range query (uses created_at index)SELECT * FROM customers WHERE created_at BETWEEN '2024-01-01' AND '2024-01-02';
-- Range query (uses price index)SELECT * FROM products WHERE price > 50 AND price < 200;
-- Composite query (uses composite index)SELECT * FROM orders WHERE customer_id = '1' AND order_date = '2024-01-15';
-- Ordered query (uses price index for sorting)SELECT * FROM products ORDER BY price;
-- Complex query with multiple indexesSELECT c.name, o.total, p.nameFROM customers c, orders o, products pWHERE c.id = o.customer_id AND o.product_id = p.id AND c.email = 'alice@example.com' AND o.order_date > '2024-01-01';
B-tree Visualization
Section titled “B-tree Visualization”Visual Representation
Section titled “Visual Representation”B-tree of order 3 (2-3 tree):
[50] / \ [20, 30] [70, 80] / | \ / | \ [10] [25] [35] [60] [75] [90]
Key properties:- All leaves at same level- Each internal node has 1-2 keys- Each internal node has 2-3 children- Keys are sorted within each node
Insertion Visualization
Section titled “Insertion Visualization”Step 1: Insert 15 [50] / \ [20, 30] [70, 80] / | \ / | \ [10,15] [25] [35] [60] [75] [90]
Step 2: Insert 5 (causes split) [50] / \ [15, 30] [70, 80] / | \ / | \ [5,10] [20,25] [35] [60] [75] [90]
Troubleshooting B-tree Indexes
Section titled “Troubleshooting B-tree Indexes”Common Issues
Section titled “Common Issues”-
Index not being used:
- Check query syntax
- Verify index exists
- Ensure proper WHERE clause
-
Slow index performance:
- Check B-tree order
- Monitor memory usage
- Consider index rebuilding
-
Index maintenance overhead:
- Balance read vs write performance
- Monitor index size
- Consider index consolidation
Performance Monitoring
Section titled “Performance Monitoring”-- Monitor query performance-- Before index creationSELECT * FROM users WHERE email = 'alice@example.com';-- Note: execution time
-- After index creationCREATE INDEX ON users (email);SELECT * FROM users WHERE email = 'alice@example.com';-- Note: improved execution time
For more information about HaruDB’s storage engine and WAL implementation, see the Storage Engine documentation and WAL documentation.