FC22 Substring vs StringSegment
A few issues with Microsoft.Extensions.Primitives.StringSegment
In the last episode (FC21: Replacement one for string.Split (beehiiv.com)), we discussed a replacement for string.Split(char) built on the OpenList class and the Substring struct. For those following .NET Core development, the Substring struct looks very much like Microsoft.Extensions.Primitives.StringSegment, which has a much richer API. It even has a Split method that returns an enumeration of StringSegment values. So why not just use StringSegment?
Let’s compare the performance of the two:
Here we’re putting the results of StringSegment.Split and StringSplitter into hash sets, thus finding unique words in a sentence. We’re measuring splitting and IEquatable implementation performance.
Here is the result:
The StringSplitter/OpenList/Substring-based solution reduces allocations per call from 336 bytes to zero and reduces CPU usage by 53% (tested with .NET Framework 4.8).
Here are the differences:
StringSegment’s GetHashCode implementation on .NET Framework is based on StringSegment.ToString, which causes a heap allocation every time GetHashCode is called. This is completely unnecessary: the hash code can be computed directly from the characters in the segment, without first materializing them as a new string.
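To make the point concrete, here is a minimal sketch of hashing a range of a string in place. TextRange is a hypothetical struct made up for this illustration (it is not the Substring struct from the post), and the FNV-1a-style hash over UTF-16 code units is just one reasonable choice, not what either implementation actually uses.

```csharp
using System;

// Hypothetical type for illustration only; not the post's Substring struct.
public struct TextRange
{
    public readonly string Text;
    public readonly int Start;
    public readonly int Length;

    public TextRange(string text, int start, int length)
    {
        Text = text;
        Start = start;
        Length = length;
    }

    // FNV-1a-style hash over the characters in the range.
    // It reads the characters in place, so no string is allocated.
    public override int GetHashCode()
    {
        unchecked
        {
            uint hash = 2166136261;
            for (int i = 0; i < Length; i++)
            {
                hash = (hash ^ Text[Start + i]) * 16777619;
            }
            return (int)hash;
        }
    }
}
```

Two ranges with the same characters produce the same hash code no matter which string or offset they come from, which is what a hash set needs.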
StringSegment’s Equals implementation is not optimal either:
It adds an extra StringComparison.Ordinal argument to the call.
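For comparison, an ordinal equality check on two ranges can go straight to string.CompareOrdinal without passing a comparison type around. The sketch below assumes a hypothetical helper named RangeComparer.RangeEquals; it is not the actual Substring code.

```csharp
using System;

public static class RangeComparer
{
    // Hypothetical helper: ordinal equality of two string ranges.
    // No substrings are created and no StringComparison value is passed.
    public static bool RangeEquals(
        string a, int startA, int lengthA,
        string b, int startB, int lengthB)
    {
        // Ranges of different lengths can never be equal.
        if (lengthA != lengthB)
        {
            return false;
        }

        // Compares lengthA characters from each string by code unit value.
        return string.CompareOrdinal(a, startA, b, startB, lengthA) == 0;
    }
}
```

The length check comes first so that most mismatches are rejected without touching any characters.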
Here is the StringSegment.Split implementation:
The implementation only accepts a char array, even when you need just a single separator, and IndexOfAny(char[]) is not cheap.
A single-character search, on the other hand, can be easily optimized using unsafe code, so there is much more room to improve the performance of our Substring struct.
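As a rough illustration of the single-separator path, the sketch below splits on one char using string.IndexOf(char, int) rather than IndexOfAny(char[]). It deliberately uses safe code instead of the unsafe optimization mentioned above, and SingleCharSplit.SplitInto is a hypothetical name, not the StringSplitter API. Ranges are written into a caller-supplied list so the list can be reused across calls.

```csharp
using System;
using System.Collections.Generic;

public static class SingleCharSplit
{
    // Hypothetical helper: records each token as a (start, length) pair.
    // Key = start index, Value = length. Empty tokens are kept,
    // mirroring the default behavior of string.Split(char).
    public static void SplitInto(
        string text, char separator, List<KeyValuePair<int, int>> ranges)
    {
        ranges.Clear();
        int start = 0;
        while (true)
        {
            // Single-character search; no separator array is needed.
            int index = text.IndexOf(separator, start);
            if (index < 0)
            {
                // Last token runs to the end of the string.
                ranges.Add(new KeyValuePair<int, int>(start, text.Length - start));
                return;
            }

            ranges.Add(new KeyValuePair<int, int>(start, index - start));
            start = index + 1;
        }
    }
}
```

Once the list has grown to its working capacity, repeated calls add no further allocations, since the pairs are value types and no token strings are created.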
Full source code can be found here: FrugalCafe/Posts/TestSubstring.cs at master · ProfFrugal/FrugalCafe (github.com)