DSA Blog Series - Part 2: Intermediate Data Structures
🌳 Intermediate Data Structures
11. Hash Tables: Hash Functions & Collision Handling
Hash tables are one of the most powerful data structures, offering O(1) average-case lookup, insertion, and deletion. They work by transforming keys into array indices using a hash function.
How it works: A hash function takes your key (like "apple") and converts it to a number (say, 42). That number becomes the index where the value is stored. Need the value later? Hash the key again, get 42, and retrieve it instantly.
The collision problem: What if "apple" and "grape" both hash to 42? That's a collision. Two main solutions exist:
Chaining: Each array slot holds a linked list. Multiple keys hashing to the same index go into the same list. Lookup becomes O(1 + k) where k is the list length—still fast with a good hash function.
Open addressing: Find another empty slot using probing (linear, quadratic, or double hashing). If slot 42 is taken, try 43, then 44, until you find an empty spot.
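The chaining strategy above can be sketched in a few lines of Python. This is a minimal illustration, not a production hash table; the class and method names are my own:

```python
# Minimal hash table with separate chaining: each bucket is a list of
# (key, value) pairs, so colliding keys simply share a bucket.
class ChainedHashTable:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # new key (possibly a collision)

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("apple", 1)
t.put("grape", 2)   # may land in the same bucket as "apple" — still works
```

A real implementation would also resize (rehash) once the load factor grows, to keep the average chain length constant.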
Real-world applications: Database indexing, caching systems, symbol tables in compilers, detecting duplicates, and every dictionary/map structure in modern programming languages.
The magic of hash tables is turning complex search problems into simple array lookups. Master them, and you'll solve problems that would otherwise take O(n) time in O(1).
12. Trees: Terminology, Types, and Traversals
Trees are hierarchical data structures with a root node and child nodes forming parent-child relationships. Unlike linked lists (linear), trees branch out, enabling efficient organisation and search.
Key terminology:
- Root: The topmost node
- Leaf: Node with no children
- Height: Longest path from root to leaf
- Depth: Distance from root to a specific node
- Subtree: A node and all its descendants
Types of trees:
- Binary tree: Each node has at most 2 children
- Binary Search Tree (BST): left subtree < parent < right subtree (the property holds for entire subtrees, not just immediate children)
- Complete tree: All levels filled except possibly the last, which fills left to right
- Full tree: Every node has 0 or 2 children
- Perfect tree: All levels completely filled
Traversals:
- Inorder (Left-Root-Right): Gives sorted order in BST
- Preorder (Root-Left-Right): Useful for copying trees
- Postorder (Left-Right-Root): Useful for deleting trees
- Level-order (BFS): Process level by level
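The four traversals above fit in a few lines each. A minimal sketch (the `Node` class is illustrative):

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder(n):     # Left-Root-Right: sorted order in a BST
    return inorder(n.left) + [n.val] + inorder(n.right) if n else []

def preorder(n):    # Root-Left-Right: useful for copying trees
    return [n.val] + preorder(n.left) + preorder(n.right) if n else []

def postorder(n):   # Left-Right-Root: useful for deleting trees
    return postorder(n.left) + postorder(n.right) + [n.val] if n else []

def level_order(root):  # BFS: process level by level using a queue
    out, q = [], deque([root] if root else [])
    while q:
        n = q.popleft()
        out.append(n.val)
        if n.left:
            q.append(n.left)
        if n.right:
            q.append(n.right)
    return out

# Small BST:   4
#             / \
#            2   6
root = Node(4, Node(2), Node(6))
```

Running `inorder(root)` on this BST returns `[2, 4, 6]` — the sorted order, as promised.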
Trees model hierarchical relationships naturally—file systems, organisational charts, HTML DOM, decision trees in AI. Understanding tree traversals is crucial for solving countless algorithmic problems.
13. Binary Search Trees (BST): Operations & Pitfalls
A Binary Search Tree is a binary tree with a special property: for every node, all values in the left subtree are smaller, and all values in the right subtree are larger. This property enables efficient searching.
Core operations:
- Search: Compare target with current node. Go left if smaller, right if larger. O(h) time where h is height.
- Insert: Find the correct position using search logic, then add the node. O(h) time.
- Delete: Three cases—node with no children (just remove), one child (bypass the node), two children (replace with inorder successor/predecessor). O(h) time.
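Search and insert can be sketched directly from the comparison logic above (delete is longer, so it's omitted here; names are illustrative):

```python
class BSTNode:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def insert(root, val):
    """Insert val, returning the (possibly new) subtree root. O(h)."""
    if root is None:
        return BSTNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root                      # duplicates are ignored

def search(root, target):
    """Walk down: left if smaller, right if larger. O(h)."""
    while root and root.val != target:
        root = root.left if target < root.val else root.right
    return root is not None

root = None
for v in [5, 3, 7, 1, 9]:            # a reasonably balanced insertion order
    root = insert(root, v)
```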
The critical pitfall: BST operations are O(h), but height varies based on insertion order. A balanced tree has h = log n, giving O(log n) operations. But insert sorted data (1, 2, 3, 4, 5...), and your BST degenerates into a linked list with h = n, making operations O(n).
Example: Insert [5, 3, 7, 1, 9] creates a nice balanced tree. Insert [1, 2, 3, 4, 5] creates a right-skewed chain—terrible performance.
Solution: Self-balancing trees like AVL or Red-Black trees maintain O(log n) height regardless of insertion order.
BSTs are fundamental, but in production code, always use self-balancing variants. Understanding BSTs prepares you for more advanced tree structures.
14. Heaps & Priority Queues
A heap is a complete binary tree satisfying the heap property: in a max-heap, every parent is greater than its children; in a min-heap, every parent is smaller. This structure enables efficient priority-based operations.
Implementation: Heaps are typically stored in arrays. For node at index i:
- Left child: 2i + 1
- Right child: 2i + 2
- Parent: (i - 1) / 2, using integer (floor) division
This array representation is space-efficient and cache-friendly.
Core operations:
- Insert: Add to end, then "bubble up" to restore heap property—O(log n)
- Extract max/min: Remove root, move last element to root, then "bubble down"—O(log n)
- Peek: View root without removing—O(1)
- Heapify: Build heap from array—O(n) surprisingly, not O(n log n)
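Python's `heapq` module provides a ready-made min-heap, but the bubble-up/bubble-down mechanics are worth seeing once. A minimal min-heap sketch using the index formulas above (class name is my own):

```python
class MinHeap:
    def __init__(self):
        self.a = []

    def push(self, x):
        """Append, then bubble up toward the root. O(log n)."""
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.a[parent] <= self.a[i]:
                break
            self.a[parent], self.a[i] = self.a[i], self.a[parent]
            i = parent

    def pop(self):
        """Swap root with last element, remove it, bubble down. O(log n)."""
        a = self.a
        a[0], a[-1] = a[-1], a[0]
        top = a.pop()
        i, n = 0, len(a)
        while True:
            l, r = 2 * i + 1, 2 * i + 2      # children via the index formulas
            smallest = i
            if l < n and a[l] < a[smallest]:
                smallest = l
            if r < n and a[r] < a[smallest]:
                smallest = r
            if smallest == i:
                break
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
        return top

h = MinHeap()
for x in [5, 1, 4, 2]:
    h.push(x)
```

Popping repeatedly yields 1, 2, 4, 5 — always the minimum first, which is exactly what a priority queue needs.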
Priority queues: Abstract data type implemented using heaps. Elements have priorities; highest priority element is served first. Perfect for task scheduling, Dijkstra's algorithm, Huffman coding.
Real-world applications:
- Operating system task scheduling
- Event-driven simulation
- Finding k largest/smallest elements
- Median maintenance in streaming data
Heaps are the go-to structure when you need repeated access to the maximum or minimum element. They're efficient, elegant, and surprisingly versatile.
15. Tries (Prefix Trees) and Their Real-World Uses
A trie (pronounced "try") is a tree structure specialised for storing strings, where each path from the root to a word-marked node spells out a word, and words with common prefixes share nodes.
Structure: Each node contains:
- An array/map of child nodes (one per possible character)
- A boolean marking if it's the end of a word
The word "cat" shares the path C-A with "car", saving space and enabling efficient prefix operations.
Core operations:
- Insert: Follow/create path for each character—O(m) where m is word length
- Search: Follow path; check if word-ending flag is set—O(m)
- Prefix search: Check if any word starts with given prefix—O(m)
- Delete: Mark word-ending flag as false; optionally prune unused nodes
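The structure and operations above translate almost directly into code. A minimal sketch (class names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # char -> TrieNode
        self.is_word = False     # word-ending flag

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Follow/create the path for each character. O(m)."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """Exact-word lookup: path exists AND flag is set. O(m)."""
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        """Prefix query: does any stored word start with prefix? O(m)."""
        return self._walk(prefix) is not None

t = Trie()
for w in ["cat", "car"]:
    t.insert(w)   # "cat" and "car" share the C-A path
```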
Space tradeoff: Tries use more memory than hash tables but excel at prefix operations. A hash table can't efficiently find all words starting with "app".
Real-world applications:
- Autocomplete: Type "goog" → suggests "google", "good", "goodbye"
- Spell checkers: Find words with similar prefixes
- IP routing: Match longest prefix in routing tables
- Text prediction: T9 keyboards, search suggestions
- Dictionary implementations: Fast word lookup and prefix queries
Tries are perfect when you need to work with prefixes, suffixes, or pattern matching in strings. They transform certain O(n) problems into O(m) solutions, where m is typically much smaller.
🌲 Advanced Trees
16. AVL Trees and Self-Balancing Trees
AVL trees are self-balancing Binary Search Trees that maintain O(log n) height regardless of insertion order. They're named after their inventors, Adelson-Velsky and Landis.
The balance factor: For every node, the height difference between left and right subtrees must be at most 1. Balance factor = height(left) - height(right) ∈ {-1, 0, 1}.
Rotations: When insertion/deletion violates balance, AVL trees restore it using rotations:
- Right rotation: Used when left subtree is too heavy
- Left rotation: Used when right subtree is too heavy
- Left-Right rotation: left-rotate the left child, then right-rotate the node
- Right-Left rotation: right-rotate the right child, then left-rotate the node
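The two basic rotations are just a few pointer swaps plus height updates. A sketch of the building blocks (a full AVL insert would call these after walking back up; the node layout, with a cached height, is my own):

```python
class AVLNode:
    def __init__(self, val):
        self.val, self.left, self.right, self.height = val, None, None, 1

def h(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(h(n.left), h(n.right))

def balance(n):
    return h(n.left) - h(n.right)   # must stay in {-1, 0, 1}

def rotate_right(y):
    """Fix a left-heavy subtree: y's left child x becomes the new root."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def rotate_left(x):
    """Fix a right-heavy subtree: x's right child y becomes the new root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

# Left-skewed chain 3 -> 2 -> 1: balance factor of the root is +2.
a, b, c = AVLNode(3), AVLNode(2), AVLNode(1)
a.left, b.left = b, c
update(b)
update(a)
root = rotate_right(a)   # one right rotation rebalances it
```

After the rotation, 2 is the root with children 1 and 3 — a perfectly balanced subtree.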
Operations:
- Search: Same as BST—O(log n) guaranteed
- Insert: Insert like BST, then traverse back up fixing imbalances—O(log n)
- Delete: Delete like BST, then fix imbalances—O(log n)
Cost: AVL trees are more rigidly balanced than Red-Black trees, meaning faster lookups but slower insertions/deletions due to more frequent rotations.
When to use: Choose AVL trees for lookup-heavy applications (databases, file systems) where search speed matters more than insertion speed. For write-heavy workloads, Red-Black trees are better.
AVL trees guarantee the worst-case O(log n) that basic BSTs promise but can't deliver.
17. Red-Black Trees Explained Simply
Red-Black trees are self-balancing BSTs with relaxed balance constraints compared to AVL trees. Each node is colored red or black, and these colors enforce balance rules.
Five rules:
- Every node is red or black
- Root is always black
- All leaves (NULL nodes) are black
- Red nodes cannot have red children (no two reds in a row)
- Every path from root to leaf contains the same number of black nodes
These rules ensure the tree's height is at most 2 log₂(n + 1), guaranteeing O(log n) operations.
Why better than AVL for some cases? Red-Black trees require fewer rotations during insertion/deletion. They're slightly less balanced (height can be up to 2x optimal vs AVL's 1.44x), but modification operations are faster.
Real-world usage:
- Java's TreeMap and TreeSet
- C++ STL map and set
- Linux kernel's Completely Fair Scheduler
- Many database implementations
The tradeoff: AVL trees: faster searches, slower modifications. Red-Black trees: slightly slower searches, faster modifications. For most applications, the difference is negligible, but Red-Black trees' lower maintenance cost makes them the industry favourite.
You don't need to memorize every recoloring rule, but understanding why these trees stay balanced helps you choose the right structure.
18. Segment Trees for Range Queries
Segment trees solve range query problems efficiently. Given an array, answer queries like "What's the sum/min/max of elements from index L to R?" and handle updates.
The naive approach: For each query, loop from L to R—O(n) per query. With many queries, this becomes too slow.
Segment tree approach: Build a binary tree where:
- Leaves represent individual array elements
- Internal nodes represent merged data from a range
- Root represents the entire array
Building: O(n) time to construct. Each node stores aggregate info (sum, min, max) for its range.
Operations:
- Query: Find relevant nodes covering [L, R] and combine their results—O(log n)
- Update: Update a single element and propagate changes up—O(log n)
- Range update: With lazy propagation, update ranges efficiently—O(log n)
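A compact iterative range-sum segment tree illustrating build, point update, and query (lazy propagation is omitted for brevity; the class is my own sketch):

```python
class SegmentTree:
    """Range-sum segment tree over a fixed-size array, stored in
    an array of size 2n: leaves at [n, 2n), internal nodes below."""

    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = data                    # leaves
        for i in range(self.n - 1, 0, -1):           # build parents: O(n)
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Point update: set element i, propagate up. O(log n)."""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, l, r):
        """Sum over the half-open range [l, r). O(log n)."""
        s, l, r = 0, l + self.n, r + self.n
        while l < r:
            if l & 1:                 # l is a right child: take it, move right
                s += self.tree[l]
                l += 1
            if r & 1:                 # r is a right boundary: step left, take it
                r -= 1
                s += self.tree[r]
            l //= 2
            r //= 2
        return s
```

Usage: `SegmentTree([1, 2, 3, 4, 5]).query(1, 4)` sums elements at indices 1..3, i.e. 2 + 3 + 4 = 9.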
Lazy propagation: Defer updates to children until necessary. Mark nodes as "pending update" and propagate only when querying that subtree. This optimises range updates from O(n) to O(log n).
Real-world applications:
- Database range queries
- Graphics rendering (finding objects in a viewport)
- Computational geometry
- Competitive programming staple
Segment trees turn O(n) per query into O(log n), making them essential for range query problems with frequent updates.
19. Fenwick Tree (Binary Indexed Tree)
Fenwick Trees, or Binary Indexed Trees (BIT), solve prefix sum queries and updates efficiently—similar to segment trees but simpler and more space-efficient.
The problem: Given an array, support two operations:
- Update an element
- Find sum of elements from index 0 to k (prefix sum)
Naive approach: Update is O(1), but prefix sum is O(n). Or maintain prefix sums with O(n) updates and O(1) queries.
Fenwick tree approach: Both operations in O(log n) using a clever binary indexing trick.
How it works: Each index in the BIT stores the sum of a range of elements. The range length is determined by the position of the rightmost set bit in the binary representation of the index.
Operations:
- Update: Add value to the element and propagate to relevant ranges—O(log n)
- Query: Sum prefix ranges by traversing parent indices—O(log n)
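The "20 lines" claim is no exaggeration — here is a complete sketch. The lowest set bit `i & -i` determines how far each step jumps (class name is my own):

```python
class FenwickTree:
    def __init__(self, n):
        self.tree = [0] * (n + 1)       # 1-indexed internally

    def update(self, i, delta):
        """Add delta at 0-based index i. O(log n)."""
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i                 # jump by the lowest set bit

    def query(self, i):
        """Prefix sum of elements [0, i]. O(log n)."""
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i                 # strip the lowest set bit
        return s

    def range_sum(self, l, r):
        """Sum of [l, r] via two prefix sums, as described below."""
        return self.query(r) - (self.query(l - 1) if l > 0 else 0)

ft = FenwickTree(5)
for i, v in enumerate([1, 2, 3, 4, 5]):
    ft.update(i, v)
```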
Why use over segment trees?
- Simpler implementation (20 lines vs 100+)
- Less memory (same size as input array)
- Faster constant factors
Limitation: Fenwick trees excel at prefix operations but can't handle arbitrary range queries as elegantly as segment trees. For range [L, R], compute prefix(R) - prefix(L-1).
Use cases: Competitive programming, cumulative frequency tables, inversion counting, order statistics.
Fenwick trees are the lightweight alternative to segment trees—perfect when you only need prefix operations.
20. B-Trees and Database Indexing
B-Trees are self-balancing search trees designed for disk-based storage systems. Unlike binary trees, each node can have many children (hundreds or thousands), minimising disk reads.
Why B-Trees for databases? Disk I/O is orders of magnitude slower than RAM access. A binary search tree on disk would require one disk read per level—O(log₂ n) reads. B-Trees reduce this dramatically by increasing the branching factor.
Properties:
- Each node contains multiple keys (sorted) and children pointers
- All leaves are at the same level
- Minimum degree t: each node has between t-1 and 2t-1 keys (except root)
- A node with k keys has k+1 children
Example: A B-Tree with t=100 stores 99-199 keys per node. A tree with 1 billion entries has height ≤ 4, meaning at most 4 disk reads for any search.
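The arithmetic behind that height bound is a simple logarithm. A quick check, assuming the best case where every node is full (2t = 200 children each):

```python
import math

def max_disk_reads(entries, branching):
    """Levels needed to reach any of `entries` keys when every
    node fans out into `branching` children: ceil(log_b(entries))."""
    return math.ceil(math.log(entries, branching))

print(max_disk_reads(1_000_000_000, 200))   # → 4
```

With sparser nodes (closer to the t-1 minimum) the height grows slightly, but it stays in single digits even for enormous datasets.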
Operations:
- Search: Navigate down the tree comparing keys—O(log n) but with very small log base
- Insert: Add to leaf; if full, split and promote median key up—O(log n)
- Delete: Remove key; rebalance by borrowing or merging nodes—O(log n)
B+ Trees (variant): Leaf nodes form a linked list, enabling efficient range scans. Used in most relational databases (MySQL, PostgreSQL, SQLite).
Real-world usage:
- Database indexing (primary keys, secondary indexes)
- File systems (NTFS, HFS+, ext4)
- Key-value stores
B-Trees optimise for the realities of hardware, making them indispensable for systems dealing with large datasets on disk.
What's Next?
You've now covered intermediate and advanced tree structures, plus hash tables—the workhorses of modern computing. These structures power databases, file systems, and countless applications where efficiency matters.
Practice Tips:
- Implement a hash table with chaining from scratch
- Code all three tree traversals (inorder, preorder, postorder)
- Build a min-heap and implement heap sort
- Create a simple trie for autocomplete functionality
- Solve segment tree problems on platforms like Codeforces