FC63: ServiceStack.Text CSV generation, so many downloads and yet so slow
Anyone can claim to be fast, until you measure it
Isn’t it sad that popular OSS NuGet packages still have so many basic performance issues? We just looked at FluentValidation, which needs 1.2 microseconds to validate a single record with no errors. Here is another one: ServiceStack.Text’s CSV generation.
Here is a simple test case:
We ask ServiceStack.Text to write out a single string to a CSV file. The text contains a single comma, so escaping is needed. Test result:
157 nanoseconds and 176 bytes allocated. That is an inefficient implementation. Here are the allocations:
There are three allocations:
- A delegate for the LINQ expression
- An enumerator
- The string produced by String.Concat
CPU samples:
There are quite a few performance hot spots. Even reading configuration is expensive.
Source code:
CsvConfig.ItemDelimiterString is a property backed by a thread-static variable. Accessing it is not cheap, and it is accessed three times here. string.Replace is expensive, and so is string.Concat.
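To make the cost concrete, here is a rough Java analog of the quoting step described above. It is not ServiceStack.Text’s code. A `ThreadLocal` stands in for the .NET thread-static field, and the names `QuoteSketch`, `quoteNaive` and `quoteCached` are hypothetical. The first method reads the delimiter setting three times. The second reads it once into a local.

```java
/**
 * Hypothetical Java analog of thread-static CSV settings; a ThreadLocal
 * stands in for the .NET [ThreadStatic] backing field.
 */
final class QuoteSketch {
    // Assumed defaults: the double quote and its escaped (doubled) form.
    static final ThreadLocal<String> ITEM_DELIMITER =
            ThreadLocal.withInitial(() -> "\"");
    static final ThreadLocal<String> ESCAPED_ITEM_DELIMITER =
            ThreadLocal.withInitial(() -> "\"\"");

    // Three lookups of ITEM_DELIMITER, plus a replace and a concatenation
    // that can each produce a new string.
    static String quoteNaive(String value) {
        return ITEM_DELIMITER.get()
                + value.replace(ITEM_DELIMITER.get(), ESCAPED_ITEM_DELIMITER.get())
                + ITEM_DELIMITER.get();
    }

    // Same output with the setting read once into a local; the replace and
    // concatenation still allocate.
    static String quoteCached(String value) {
        String delimiter = ITEM_DELIMITER.get();
        return delimiter
                + value.replace(delimiter, ESCAPED_ITEM_DELIMITER.get())
                + delimiter;
    }
}
```

Caching the lookup only removes the repeated thread-local reads. The temporary strings from replace and concatenation remain.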
But the most expensive thing is the escape check here:
EscapeStrings here is a string array with five strings, so the check makes five string.Contains calls.
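The shape of such a check can be sketched in Java. This is a hypothetical analog, not the library’s source. The five strings in `ESCAPE_STRINGS` are an assumption about what such an array might hold. The stream-based method mirrors a LINQ-style `Any` call. The loop avoids the lambda and stream objects, but it still runs one full `contains` scan per configured string. A value that needs no escaping is therefore scanned five times.

```java
import java.util.Arrays;

/** Hypothetical analog of a configurable, string-based escape check. */
final class EscapeCheckSketch {
    // Assumption: five configured strings; the real defaults may differ.
    static final String[] ESCAPE_STRINGS = {"\"", ",", "\r\n", "\r", "\n"};

    // LINQ-style version: the capturing method reference and the stream
    // pipeline are typically allocated on each call (comparable to the
    // delegate and enumerator allocations), unless the JIT removes them.
    static boolean needsEscapeStream(String value, String[] escapeStrings) {
        return Arrays.stream(escapeStrings).anyMatch(value::contains);
    }

    // Loop version: no lambda or stream objects, but still one
    // String.contains scan per configured string.
    static boolean needsEscapeLoop(String value, String[] escapeStrings) {
        for (String s : escapeStrings) {
            if (value.contains(s)) {
                return true;
            }
        }
        return false;
    }
}
```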
The main issue in the implementation is the configuration, which can change on every call and is expressed entirely as strings. This is completely unnecessary. A much more efficient implementation would let higher-level code determine once that the configuration has not changed, so all the special characters become constant characters:
For the escape check, the double quote is special:
When no double quote character is found, no replacement is needed, and no temporary allocation is needed either.
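Here is a minimal Java sketch of that idea. It assumes higher-level code has already confirmed the default settings (comma separator, double-quote delimiter, `\r`/`\n` line breaks), so these can be `char` constants. `CsvFieldWriter` and `writeField` are hypothetical names.

The writer makes one scan over the value. Only a value that contains a double quote goes through the character loop that doubles quotes. Every other value is appended to the output buffer directly, with no intermediate strings.

```java
/** Hypothetical field writer that only supports the default CSV settings. */
final class CsvFieldWriter {
    // Assumption: configuration is known to be the default, so these are constants.
    private static final char SEPARATOR = ',';
    private static final char QUOTE = '"';

    /** Appends one field to out, quoting and escaping only when required. */
    static void writeField(StringBuilder out, String value) {
        int firstQuote = -1;
        boolean needsQuoting = false;
        // Single pass: detect special characters and locate the first quote.
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == QUOTE) {
                firstQuote = i;
                needsQuoting = true;
                break; // a quote forces the escaping path; stop scanning here
            }
            if (c == SEPARATOR || c == '\r' || c == '\n') {
                needsQuoting = true;
            }
        }
        if (!needsQuoting) {
            out.append(value);                 // common case: copy as-is
            return;
        }
        out.append(QUOTE);
        if (firstQuote < 0) {
            out.append(value);                 // no embedded quotes: nothing to replace
        } else {
            out.append(value, 0, firstQuote);  // copy the prefix before the first quote
            for (int i = firstQuote; i < value.length(); i++) {
                char c = value.charAt(i);
                if (c == QUOTE) {
                    out.append(QUOTE);         // double every embedded quote
                }
                out.append(c);
            }
        }
        out.append(QUOTE);
    }
}
```

The same structure works with any appendable output, such as a `Writer`. In the common case, the cost is one scan and one copy.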
Here is the comparison:
No allocation, 76% reduction in CPU usage.
To me, even this is not fast enough. Here is the caller code:
Data is first converted from object form to a string, then written out as a string. For numerical values, which are very common in CSV generation, the string conversion could be avoided, and no escape check is needed at all.
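Here is a sketch of the numeric fast path, again in Java with hypothetical names (`CsvValueWriter`, `writeValue`). It assumes the default comma separator and numbers formatted without grouping separators. Under those assumptions, an integer’s text contains only digits and an optional minus sign, so the escape check can be skipped. `StringBuilder.append(long)` appends the digits without the caller first building a `String`. Other values fall back to string conversion plus the usual check.

```java
/** Hypothetical value writer with a fast path for integral numbers. */
final class CsvValueWriter {
    static void writeValue(StringBuilder out, Object value) {
        if (value == null) {
            return;                                    // null becomes an empty field
        }
        if (value instanceof Integer || value instanceof Long
                || value instanceof Short || value instanceof Byte) {
            // Integral numbers print only digits and an optional '-':
            // no separator, quote or line break can appear, so skip the check.
            out.append(((Number) value).longValue());
            return;
        }
        // Non-numeric values: convert, check, and quote (simplified).
        String text = value.toString();
        boolean special = text.indexOf(',') >= 0 || text.indexOf('"') >= 0
                || text.indexOf('\r') >= 0 || text.indexOf('\n') >= 0;
        if (!special) {
            out.append(text);
            return;
        }
        out.append('"').append(text.replace("\"", "\"\"")).append('"');
    }
}
```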
This is not a mere case study; this is a real, big performance issue found in production.