• Frugal Cafe
  • Posts
  • FC24 Why 64-bit string.GetHashcode is slower than 32-bit, and how to fix it

FC24 Why 64-bit string.GetHashcode is slower than 32-bit, and how to fix it

When a typo becomes a feature, you can't change it

It’s quite common that people overuse strings as hash container key types, which makes the performance of string.GetHashCode is quite important. The performance of string.Equals(string, string) is equally important, but it already has quite good implementation.

The sad thing is that in 64-bit .Net Framework, string.GetHashCode is quite slow. Let’s measure it:

Here is what’s found in ETW trace:

There is a static variable access inside GetHashCode. Let’s check per-line cost:

We found something rather strange. There are two versions of the implementation, one is used when macro Win32 is defined; the other when it’s not defined.

When Win32 is defined, there is loop unrolling and data is accessed as 32-bit integers. When it’s not, there is no loop unrolling, and data is accessed as 16-bit characters. So the Win32 version should be much faster.

But what about when compiling 64 bit? The Win32 macro is not designed, so the slower 16-bit code path is taken.

So somehow the .Net team forgot to check this piece of code when moving to 64-bit builds. Once such build is released, you can’t even change it back because it could break people’s application. For example, someone could put string.GetHashCode return values into database.

So let’s write a string comparer using the 32-bit logic:

Notice there is something strange here, the data is always accessed as 32-bit integers, never as 16-bit char. When if the string has odd length? Is the code reading one extra character? This is not an issue because strings are always zero terminated in .Net, so reading extra zero is not an issue.

Now let’s compare the two:

Test result:

StringCompare32 is 45% faster than string.GetHashCode for the string we tested, on .Net 4.8, 64-bit.