• Frugal Cafe
  • Posts
  • FC49: Let's clone Linq.Enumerable.Set<T>

FC49: Let's clone Linq.Enumerable.Set<T>

Upcycle the orphan set class

For .Net Framework, Linq.Enumerable extension methods are implemented in a single huge cs file Enumerable.cs. I read this file often enough to know that there is a simple Set class inside, for methods like Distinct. In .Net Core implementation, its usage has been replaced by HashSet. This is a class no one wants anymore. Let’s take it and upcycle it for Frugal Cafe usage.

Here is the data structure, quite simple:

Here is the resizing implementation we’re changing to use our own prime number sequence:

The original implementation uses odd number sequence as hash container sizes, which could lead to more hash collisions.

Now let’s check if our designer prime numbers can really reduce large object heap usage:

Output:

For strings, SimpleHashSet can have 5,309 elements without using LOH. It’s 4049 in .Net implementation, 23.73% less.

Having your own hash containers could have lots of benefits besides improving memory usage. For example, you could use it to build segmented hash containers, allow multiple key types, or operate on data held inside faster.