FC53: Who would need 6 million locks?

Locks can be shared among friends

After getting a 23 GB memory dump, I ran PerfView to analyze it, and then analyzed the performance of PerfView itself. PerfView uses a helper process, heapdump.exe, to convert the raw dump into a .gcDump file. I captured a dump of the heapdump.exe process and found this:

There are 9.9 million objects in memory, and 6.1 million of them are ReaderWriterLockSlim instances at 96 bytes each. The total space taken by these locks is 559 MB, and that does not count the large lock arrays holding them.

Six million locks is too many for me. Let’s find the offending source code:

The CacheEntryBase constructor allocates a lock array and N locks for every segment in the dump. Each lock is meant to protect one 4 KB page of the dump, which adds up to 6.1 million locks for the entire 23 GB dump (25,045,573,289 bytes). The lock count is proportional to the dump size: a 100 GB dump would need about 25 million locks.
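The arithmetic can be checked with a few lines of C#. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and it treats the dump as one contiguous range, whereas the real code rounds up per segment, so the actual count can be slightly higher.

```csharp
using System;

// Hypothetical helper for estimating lock counts; not part of PerfView.
static class LockCountEstimate
{
    private const long PageSize = 4096;      // one lock per 4 KB page
    private const long BytesPerLock = 96;    // ReaderWriterLockSlim size observed in the dump

    // Number of 4 KB pages (and therefore locks) needed to cover dumpBytes, rounded up.
    public static long PagesFor(long dumpBytes) => (dumpBytes + PageSize - 1) / PageSize;

    // Memory taken by the lock objects alone, excluding the arrays that hold them.
    public static long LockBytesFor(long dumpBytes) => PagesFor(dumpBytes) * BytesPerLock;
}
```

For the dump above, LockCountEstimate.PagesFor(25_045_573_289) comes to 6,114,642 pages, which matches the 6.1 million locks seen in the heap.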

There is no need for so many locks, because locks can be shared. ConcurrentDictionary uses the same trick: its lock array is limited to no more than 2,048 locks (normally 1,024), even when it has millions of hash buckets.

So let’s have a static lock array of just 256 locks and share them:

The GetLock method is added to return a shared lock based on page indices. Now we can remove the 6.1 million locks and the lock arrays that held them. This removes most of the objects in the heap, so Gen2 GCs will be more efficient.
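For readers who want to see the shape of the idea, here is a minimal sketch of a shared (striped) lock array. It is not the actual PerfView change: only the GetLock name and the 256-lock count come from the text above, while the SharedLocks class, the parameter names, and the hash mixing are assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of lock sharing; not the real PerfView code.
static class SharedLocks
{
    // A power of two, so the index can be computed with a mask.
    private const int LockCount = 256;

    private static readonly ReaderWriterLockSlim[] s_locks = CreateLocks();

    private static ReaderWriterLockSlim[] CreateLocks()
    {
        var locks = new ReaderWriterLockSlim[LockCount];
        for (int i = 0; i < locks.Length; i++)
            locks[i] = new ReaderWriterLockSlim();
        return locks;
    }

    // Maps a (segment, page) pair to one of the 256 shared locks.
    // Many pages share each lock, but a given page always gets the same one.
    public static ReaderWriterLockSlim GetLock(int segmentIndex, int pageIndex)
    {
        // Assumed mixing function: keeps neighboring pages on different locks.
        int hash = unchecked(segmentIndex * 31 + pageIndex);
        return s_locks[hash & (LockCount - 1)];
    }
}
```

A caller would take SharedLocks.GetLock(segmentIndex, pageIndex), call EnterReadLock or EnterWriteLock, and release it in a finally block, exactly as with a per-page lock. One thing to watch for with any scheme like this: two different pages can map to the same lock, so code that holds one page lock while acquiring another may try to re-enter the same ReaderWriterLockSlim, which throws LockRecursionException under the default no-recursion policy.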