FC10 Silver Bullet: ConcurrentDictionary
Effective but Expensive
In the last episode, we discussed the horror of modifying HashSet/Dictionary from multiple threads, which causes all sorts of issues, including endless loops that consume all cores on a machine.
A silver bullet to fix such issues is ConcurrentDictionary, whose API is similar to that of Dictionary. A concurrent version of HashSet can be implemented on top of ConcurrentDictionary.
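Here is a minimal sketch of such a wrapper, using a throwaway byte value so that only the keys matter (ConcurrentHashSet&lt;T&gt; is a hypothetical name, not a BCL type):

```csharp
using System.Collections.Concurrent;

// Minimal concurrent "set" built on ConcurrentDictionary.
public class ConcurrentHashSet<T>
{
    // The byte value is a dummy payload; only the keys are meaningful.
    private readonly ConcurrentDictionary<T, byte> _items = new ConcurrentDictionary<T, byte>();

    public bool Add(T item) => _items.TryAdd(item, 0);

    public bool Contains(T item) => _items.ContainsKey(item);

    public bool Remove(T item) => _items.TryRemove(item, out _);
}
```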
But, as silver bullets go, ConcurrentDictionary is much more expensive than Dictionary or HashSet. The increased cost has several aspects:
ConcurrentDictionary has a much higher fixed memory cost. Its default constructor allocates buckets for 31 elements plus N locks, where N is the CPU core count. By comparison, Dictionary’s default constructor allocates only a single 80-byte object; the internal arrays are not allocated until the first element is added, and the initial capacity then only needs to hold 3 elements.
ConcurrentDictionary has a higher memory cost per entry. Each entry requires a node object, normally around 48 bytes, plus a pointer to that node, for about 56 bytes per entry. The memory cost per entry in Dictionary is normally around 28 bytes.
ConcurrentDictionary is protected by an array of locks, each guarding a section of the entries. The initial lock count equals the core count. As more elements are added, the lock array grows as well; growth stops once the count reaches 1,024, so it never exceeds 2,048 locks. A normal operation only needs to take a single lock, but quite a few APIs need to take all of the locks, which makes them very expensive. Here is the list of slow APIs, followed by a short illustration:
IsEmpty property getter
Count property getter
Keys property getter
Values property getter
Clear method
CopyTo method
ToArray method
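To make the distinction concrete, here is a small sketch (SlowApiDemo is just an illustrative name) contrasting the cheap per-bucket operations with the all-lock APIs listed above:

```csharp
using System;
using System.Collections.Concurrent;

class SlowApiDemo
{
    static void Main()
    {
        var map = new ConcurrentDictionary<int, string>();

        // Cheap: a write takes only the lock guarding its bucket,
        // and a plain read takes no lock at all.
        map.TryAdd(1, "one");
        map.TryGetValue(1, out string value);

        // Expensive: each of the APIs listed above acquires every lock in the lock array.
        int count = map.Count;
        bool empty = map.IsEmpty;
        var snapshot = map.ToArray();
        map.Clear();

        Console.WriteLine($"{value} {count} {empty} {snapshot.Length}");
    }
}
```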
ConcurrentDictionary should also be more expensive for garbage collection, because it uses more memory and its data structure contains many more pointer references: 2 more per node, to be exact.
Here is the ConcurrentDictionary default constructor, simplified from the public reference source (constants and parameter names vary across .NET versions):
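```csharp
public ConcurrentDictionary()
    : this(DefaultConcurrencyLevel, DefaultCapacity, growLockArray: true, comparer: null)
{
}

// DefaultCapacity is 31, so 31 buckets are allocated up front.
// DefaultConcurrencyLevel is the processor count, so there is one lock per core.
// The main constructor then allocates three arrays:
//   object[] locks         - one lock object per concurrency level
//   int[]    countPerLock  - one element counter per lock
//   Node[]   buckets       - one slot per bucket (31 by default)
```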
Here is the internal Node class, again simplified from the reference source (field names vary across versions):
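```csharp
private sealed class Node
{
    internal readonly TKey _key;
    internal TValue _value;

    // Next node in the bucket's singly linked list.
    internal volatile Node _next;
    internal readonly int _hashcode;

    internal Node(TKey key, TValue value, int hashcode, Node next)
    {
        _key = key;
        _value = value;
        _next = next;
        _hashcode = hashcode;
    }
}
```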
Normally, a node is 48 bytes in a 64-bit process when both the key and value are reference types (40 bytes when the value is a 4-byte value type such as int). Notice that 4 bytes are normally wasted as padding.
Here is the Count property implementation, simplified from the reference source:
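```csharp
public int Count
{
    get
    {
        int acquiredLocks = 0;
        try
        {
            // Take every lock in the lock array before counting.
            AcquireAllLocks(ref acquiredLocks);

            // Sum the per-lock element counters.
            int count = 0;
            int[] countPerLock = _tables._countPerLock;
            for (int i = 0; i < countPerLock.Length; i++)
            {
                count += countPerLock[i];
            }
            return count;
        }
        finally
        {
            // Release the locks that were acquired.
            ReleaseLocks(0, acquiredLocks);
        }
    }
}
```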
AcquireAllLocks goes through the lock array and takes every lock, one by one. It therefore has to wait for all in-flight modifications to finish, and it blocks incoming modification operations until the locks are released.
Overall, it’s perfectly fine to use ConcurrentDictionary for singleton instances, or for types with a limited number of instances. But if your data structure will have thousands of instances or more, you need to be careful with memory usage. A common trick I recommend is using a custom constructor, for example new ConcurrentDictionary&lt;TKey, TValue&gt;(4, 5), where 4 is the concurrency level (the number of locks) and 5 is the initial capacity.
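A minimal sketch of that trick (SmallInstanceExample and _lookup are illustrative names):

```csharp
using System.Collections.Concurrent;

class SmallInstanceExample
{
    // 4 locks instead of one per core, and 5 buckets instead of 31.
    // Useful when many small ConcurrentDictionary instances are expected.
    private readonly ConcurrentDictionary<string, int> _lookup =
        new ConcurrentDictionary<string, int>(concurrencyLevel: 4, capacity: 5);
}
```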
If you use ConcurrentDictionary, please be very careful with the APIs that take all the locks. A common problem is reading the Count property too often. One possible solution is adding a wrapper class that keeps a rough count using Interlocked.Increment/Interlocked.Decrement.
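A minimal sketch of such a wrapper, assuming all adds and removes go through it (CountedDictionary&lt;TKey, TValue&gt; is a hypothetical name):

```csharp
using System.Collections.Concurrent;
using System.Threading;

public class CountedDictionary<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> _dict =
        new ConcurrentDictionary<TKey, TValue>();
    private int _count;

    // Rough count: cheap to read, may briefly disagree with the exact count,
    // but never touches the dictionary's lock array.
    public int ApproximateCount => Volatile.Read(ref _count);

    public bool TryAdd(TKey key, TValue value)
    {
        if (_dict.TryAdd(key, value))
        {
            Interlocked.Increment(ref _count);
            return true;
        }
        return false;
    }

    public bool TryRemove(TKey key, out TValue value)
    {
        if (_dict.TryRemove(key, out value))
        {
            Interlocked.Decrement(ref _count);
            return true;
        }
        return false;
    }

    public bool TryGetValue(TKey key, out TValue value) => _dict.TryGetValue(key, out value);
}
```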