- Frugal Cafe
- Posts
- FC59: Thousands of small/efficient dictionaries needed
FC59: Thousands of small/efficient dictionaries needed
Sadness of software engineers: limited data structure choices
For non-trivial memory dump analysis, you need to map object addresses to sequential integer indices. PerfView’s implementation uses a Dictionary with ulong as key type and integer as value type. The cost of each entry is 24 + 4 = 28 bytes (4 bytes unused), plus overhead of resizing and having limited number of prime numbers. An alternative solution is using segmented dictionary which is having the similar issue. To handle huge dumps efficiently, we can’t afford to spend 28 bytes on this. We need much more space efficient dictionary.
One simple approach is dividing the whole memory span into smaller segments, making sure segment offsets can be represented using 32-bit integers. Then we can reduce key type to 32-bit integer, reducing per entry cost to 16 + 4 = 20, 28.57% reduction in memory usage. That is quite good, but still not enough.
If you want more reduction, let’s reduce segment size to be addressable with 16-bit integer. Given each segment is so small, the maximum number of elements in such segment would also be quite small, representable using 16-bit integers too.
Here is another thing to remove: stored hash code. This works fine as long as GetHashCode and Equals implementation are fast.
Now we can have our own small dictionary implementation:
SmallDictionary is a generic class. For the dump analysis scenario we’re interested in, key type would be short, value type would be integer, memory cost per node would be only 10 bytes (2 for key, 4 for value, 2 for next, 2 for bucket), another 50% reduction. Combined reduction from using 64-bit integers as key type is 64.29% (from 28 bytes to 10 bytes).
To reduce over capacity, we can add all prime numbers in a small range:
Normally, searching the for next prime number is a O(N) linear search in prime number table. When so many of them are added, we can’t afford such searching, so we’re providing O(1) lookup:
Now we can test it out:
Test result:
Comparing with .Net implementation, SmallDictionary reduces allocation by 55.6%.
Part of the test here is using copy constructor to trim the dictionary to minimal size. For 1,000 elements, .Net dictionary implementation uses prime number 1,103, SmallDictionary uses prime number 1,009, 8.5% smaller.