FC48: Designer Prime Numbers

More selected prime numbers for better memory usage

In .Net prime numbers are used as hash container sizes to reduce hash collisions. They’re used by Hashtable, HashSet, Dictionary, and now ConcurrentDictionary. But strangely, mscorlib only uses 72 prime numbers:

Here are the issues with this limited set of prime numbers:

  1. They’re 20% apart. On average, dictionary is wasting 10% space when you specify the exact known capacity. For example, if you convert a collection to a dictionary, 10% extra space is not used.

  2. Hash containers go into large object heap earlier than needed. For example, if you grow a string dictionary from empty state, the largest it can get is 1,931 elements before going into LOH. For 1,931 elements, the entry array is only 46,368 bytes, 54% of the 85,000 LOH threshold.

How can we fix these issues? We can add more carefully selected prime numbers. For example, on 64-bit platform, dictionary entry struct is normally 24 bytes (8 byte for key, 8-byte for value, 4-byte for hash, 4-byte for next). Let’s find the largest prime number for that to be below the LOH threshold, and its resizing sequence:

We get these prime numbers:

For 24-byte entry struct, the largest prime number is 3,539, 83% larger than 1,931. This will reduce the change of dictionary getting into LOH significantly.

Now let’s merge with mscorlib’s prime numbers:

We’re adding two prime number arrays: one for dictionary which normally has 24-byte entry struct, the other for hash set which normally has 16-byte entry struct.

We can even add more prime numbers to reduce the 10% gap between prime numbers.

Such custom primer number arrays can be useful when you have your own hash container classes. You can even use it to replace mscorlib’s prime number table using reflection magic.