- Frugal Cafe
- Posts
- FC70: A thread-safe lock-free string interner
FC70: A thread-safe lock-free string interner
Much better alternative to string.Intern
We talked about string.Intern can’t be used by application side due to its locking and inability to cleanup. What would be recommended solution for string interning? Introducing a thread-safe lock-free string interner: StringTable.
Here is the simple data structure:
There is a static array for single character Latin1 strings and a shared instance with around 62 K strings.
Here is the main interning implementation:
The code just calculates the hash string, map to a slot, find cached string, validate if the string is really matching, then either return the original string, or allocate a new string and cache it for reuse. There is only a single string for each hash bucket, so when mismatching is found, new string replaces original string. This makes implementation lock-free, rather simple, and maintenance free, while paying the price for losing a few strings.
Such simple implementation is proven in production to be able to remove 70% to 80% string duplications, unless string cardinality in your scenario is very high. In such case, you may want to have a much larger StringTable instance or just avoid using it.
Here is the simple hash code algorithm:
This hash code algorithm is designed to be very simple, so that we would have other matching implementation for other data sources. For example, we would have one for StringBuilder, another one for byte encoded strings.
Here is the string equality check:
Notice in this implementation, we can’t assume the string is zero terminated, because we want to intern substrings.
Now SubString.ToString implementation can be changed to:
This code is inspired by class with the same name in Roslyn code base (StringTable.cs (sourceroslyn.io)), used by Visual Studio. Source code: FrugalCafe/Common/StringTable.cs at master · ProfFrugal/FrugalCafe (github.com)