Today I learnt about the Unordered Maps under the hood, it's a beautiful blend of Mathematics and coding and i think i dove as deep as i could get into it.
In C++, std::unordered_map is our ultimate, trusty tool for O(1) lookups, insertions, and deletions. But when we treat it as a black box, we miss out on a fascinating combination of memory layout engineering and modular arithmetic. Allow me to peel back the layers and see how it actually runs.
What you've probably learnt in school
We learnt that the time complexity of insertion, deletion and access is O(1). But this is only true when there are no collisions. The worst case scenario is O(N) when all the keys are hashed to the same bucket, but that's extremely unlikely due to how hash maps are built (more on this later).
We also learn about amortisation, the idea of paying for the cost earlier on, but "dividing" the upfront cost over a large table size so as to bring the average cost down. But how exactly does it make it O(1)? Well this is where we introduce separate chaining, load factors and the beautiful use of prime numbers in the hash function.
The Foundation: Hash Tables and Separate Chaining
At its core, C++ Hash Maps use a technique called separate chaining to resolve collisions.
When you insert a key-value pair, the map takes the key, runs it through a hash function (usually a prime number for modulo), and maps it to a bucket index within a bucket array of size N:
index = hash_function(key) % N
Conceptually, we are taught that a hash map is made up of a hash table (an array) where each index entry (bucket) within the hash table holds its own independent linked list. If multiple keys hash to the same bucket, they are simply appended to that bucket's linked list.
However, when I took a look at the actual C++ Standard Template Library (STL) implementation (specifically in GNU's libstdc++), I realised there was a difference.
Conceptual vs. STL Implementation: Why the Difference?
Implementation-wise under the hood, the STL doesn't actually store a bunch of separate, independent linked lists. Instead, it uses:
- A primitive, dynamically allocated array of pointers (the buckets).
- Exactly ONE single, global singly linked list that holds all the elements in the map.
Why is there a difference?
It all boils down to iteration performance.
In C++, you can iterate through the entire map using iterators from map.begin() to map.end().
If we used the conceptual model (where each bucket has its own independent linked list), moving an iterator to the next element would require:
- Checking if the current linked list has another node.
- If it doesn't, scanning the bucket array to find the next non-empty bucket.
If your map has 1,000,000 buckets but contains only 5 elements, iterating through all elements would force the iterator to scan 1,000,000 pointers! Iteration complexity would be O(N + M) (where N is bucket size and M is element count), which is incredibly slow and cache-unfriendly.
By storing all nodes in one single singly linked list, iterating through the map is a direct, cache-friendly traversal of that single list, guaranteeing a strict O(M) time complexity!
How do the buckets work then? Instead of pointing to the first element of their respective buckets, the pointers in the bucket array point to the node immediately before the first element of that bucket in the global singly linked list.
Why? Because in a singly linked list, if you want to insert a node at the head of a bucket, or erase a node from a bucket, you need a reference to the previous node to update its next pointer in O(1)! This layout is a masterclass in pointer manipulation.
Why Prime Numbers? The Power of Coprimes
When the map sizes its bucket array N, it highly prefers N to be a prime number (e.g. 11, 97, 199, ...).
Why Prime numbers? Let's look at what happens when N is a composite number, for example N = 100.
If our keys are not perfectly random—which they rarely are in real applications—they might have regular mathematical strides. For example, if we are hashing memory addresses (which are always multiples of 8 or 16) or IDs that increment by 10 (10, 20, 30, 40, ...), and N = 100:
- Our key strides (10) and our bucket size (100) share a greatest common divisor:
gcd(10, 100) = 10 > 1. - Because they are not coprime, the keys will only hash to buckets that are multiples of 10 (buckets 0, 10, 20, ..., 90).
- This means 90% of our buckets remain completely empty, while 10% of our buckets take all the collisions! Separate chaining collapses, and lookups slow down.
If we instead choose a prime number like N = 97, then for almost any key stride S < 97, the stride and N are guaranteed to be coprime (gcd(S, 97) = 1).
Because they are coprime, the keys are mathematically guaranteed to disperse uniformly across all 97 buckets before wrapping around to hit the same bucket. Even if your hash function is slightly imperfect or your keys have patterns, a prime bucket size acts as a mathematical failsafe to distribute keys evenly and prevent clustering.
How is it O(1) Amortized? Resizing and the Load Factor
Since we use separate chaining, as we insert more elements, the linked lists inside the buckets will grow. If they grow too long, lookups degrade from O(1) to O(M) linear searches.
To maintain constant time complexity, we use the Load Factor (often represented as λ), which is defined as:
Load Factor = number_of_elements / number_of_buckets
The Load Factor represents the average length of the linked lists in each bucket. In C++, the default maximum allowed load factor is 1.0.
Once the load factor exceeds this threshold (i.e. we have more elements than buckets), the map triggers a Resize and Rehash:
- Allocate a new, larger bucket array: The STL picks the next prime number that is roughly double the current bucket count.
- Rehash: It iterates through the global singly linked list, recalculates the bucket index for each element using the new prime size (
hash % new_N), and reorganizes the bucket pointers.
Why is it O(1) amortised?
Rehashing is an O(M) operation because we have to touch and re-link every single element.
However, because we roughly double the bucket size during a rehash, this expensive O(M) operation happens very infrequently—specifically, only after we have performed another O(M) insertions.
When you spread (amortise) the O(M) cost of that occasional rehash across the O(M) fast insertions that triggered it, the average cost per insertion remains a constant O(1) amortised time complexity!
Final Thoughts
Unordered maps in C++ are a beautiful example of computer science and mathematics conjoining. Taking simple concepts (linked lists and primitive dynamically allocated arrays containing pointers) and wrapping them in elegant math (coprime distribution and amortised analysis) to solve real-world performance problems. Next time you write std::unordered_map, I hope you can appreciate the underlying beauty :)