
FC41: Optimal replacement for string.Format for large strings

Zero or one string allocation, with single-pass data copying.

In FC39: The best ways to generate strings (beehiiv.com), I wrote that the best way to generate a string is with 0 or 1 allocations: 0 when an existing string is reused, 1 for a single allocation with single-pass data copying. This is the bar we use to evaluate string generation methods.

We’ve discussed string.Format implementations in the past. In .NET Framework, it is implemented with StringBuilder; in .NET Core, with ValueStringBuilder. Neither implementation can reach 0 allocations. A single allocation is possible, but the data is copied more than once. With StringBuilder, data is first copied into the builder’s internal buffers and then copied out into the final string, so two copies. ValueStringBuilder can incur two extra copies when its default 256-character stack allocation is not big enough.

So can we achieve optimal 0-or-1-allocation string generation with a string.Format-style implementation? Let’s try with the StringFormatter class (FrugalCafe/Common/StringFormatter.cs at master · ProfFrugal/FrugalCafe (github.com)).

StringFormatter is an implementation of the ISimpleStringBuilder interface we introduced in FC38: Liberate StringBuilder.AppendFormatHelper (beehiiv.com). Its internal data structure is quite simple, basically a list of Substring structs.

Now we can parse the format string and send the data to the StringFormatter class. The key is in the final ToString implementation:

If there is only a single substring, we just convert it to a string and return it. In the best case, it covers a full string, so that string is reused. This meets the 0-allocation goal whenever reasonably possible.

Otherwise, we calculate the exact length, make a single string allocation, and then run a single-pass data copying loop. We have achieved the targets we set for ourselves.
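The ToString logic described above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked repository: `Segment` and `SegmentFormatter` are hypothetical stand-ins for the Substring struct and StringFormatter, and the sketch assumes .NET Core 2.1 or later so that `string.Create` is available for the single allocation. The .NET Framework version would need a different way to fill the string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Substring struct: a view over part of a string.
public readonly struct Segment
{
    public readonly string Source;
    public readonly int Start;
    public readonly int Length;

    public Segment(string source, int start, int length)
    {
        Source = source;
        Start = start;
        Length = length;
    }

    // True when the segment covers its whole source string.
    public bool IsWholeString => Start == 0 && Length == Source.Length;
}

// Hypothetical, simplified formatter: collects segments, builds the result once.
public sealed class SegmentFormatter
{
    private readonly List<Segment> _segments = new List<Segment>();

    public void Append(string value) => Append(value, 0, value?.Length ?? 0);

    public void Append(string value, int start, int length)
    {
        // Empty pieces contribute nothing, so they are not recorded.
        if (length > 0) _segments.Add(new Segment(value, start, length));
    }

    public void Clear() => _segments.Clear();

    public override string ToString()
    {
        if (_segments.Count == 0) return string.Empty;

        if (_segments.Count == 1)
        {
            // 0 allocations when the only segment is a full string.
            Segment only = _segments[0];
            return only.IsWholeString
                ? only.Source
                : only.Source.Substring(only.Start, only.Length);
        }

        // Exact length first, so the result is allocated exactly once.
        int total = 0;
        foreach (Segment s in _segments) total += s.Length;

        // The list is passed as state, so the lambda captures nothing.
        return string.Create(total, _segments, (span, segments) =>
        {
            int position = 0;
            foreach (Segment s in segments)
            {
                // Single pass: each character is copied once, into its final place.
                s.Source.AsSpan(s.Start, s.Length).CopyTo(span.Slice(position));
                position += s.Length;
            }
        });
    }
}
```

With this sketch, appending `"Hello, "` and then `("wide world", 5, 5)` yields `"Hello, world"` from one allocation, while appending a single whole string returns that same string instance.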

Extension methods built on StringFormatter replace string.Format calls.

StringFormatter objects are reused through thread-static variables.
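The thread-static reuse pattern can be sketched as below. To keep the example self-contained, `StringBuilder` stands in for StringFormatter, and `FormatCached` and `CachedFormatExtensions` are hypothetical names. The sketch illustrates only how one instance per thread is cached and reused. It does not reproduce the single-pass copying of the real class.

```csharp
using System;
using System.Text;

public static class CachedFormatExtensions
{
    // One cached builder per thread; no locking is needed because
    // each thread sees its own copy of a [ThreadStatic] field.
    [ThreadStatic]
    private static StringBuilder t_cached;

    // Hypothetical extension method; StringBuilder stands in for StringFormatter.
    public static string FormatCached(this string format, params object[] args)
    {
        // Take the cached instance (if any) and empty the slot, so a re-entrant
        // call on the same thread (for example from an argument's ToString)
        // gets its own builder instead of sharing this one.
        StringBuilder builder = t_cached ?? new StringBuilder(256);
        t_cached = null;

        try
        {
            builder.AppendFormat(format, args);
            return builder.ToString();
        }
        finally
        {
            // Reset and hand the instance back for the next call on this thread.
            builder.Clear();
            t_cached = builder;
        }
    }
}
```

A call site then changes from `string.Format("{0}-{1}", a, b)` to `"{0}-{1}".FormatCached(a, b)`. Emptying the slot while the instance is in use is a defensive choice in this sketch. Whether the linked implementation does the same is not stated in the post.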

Here are validations for the two 0-allocation cases:

If the format string is “{0}”, the first argument string is returned without allocating a new string. Such cases do occur in production. One possible source is localization.

The second case is stranger: the format string contains no argument placeholders, so the format string itself is returned.
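One way to check both cases is `ReferenceEquals`, which confirms that the returned string is the very same instance and not a copy. The helper below is a simplified stand-in and not the StringFormatter approach: it handles the two cases as explicit fast paths and defers everything else to `string.Format`. `FormatWithReuse` and `ZeroAllocationCases` are hypothetical names.

```csharp
using System;

public static class ZeroAllocationCases
{
    // Hypothetical helper: handles the two reuse cases up front,
    // then falls back to string.Format for everything else.
    public static string FormatWithReuse(string format, params object[] args)
    {
        // Case 2: no braces at all, so there is nothing to substitute or unescape.
        if (format.IndexOf('{') < 0 && format.IndexOf('}') < 0)
            return format;

        // Case 1: the format is exactly "{0}" and the argument is already a string.
        if (format == "{0}" && args.Length > 0 && args[0] is string text)
            return text;

        return string.Format(format, args);
    }

    // Returns true when both 0-allocation cases hand back the original instance.
    public static bool Validate()
    {
        string name = "Frugal";
        string plain = "no placeholders here";

        bool firstCase = ReferenceEquals(name, FormatWithReuse("{0}", name));
        bool secondCase = ReferenceEquals(plain, FormatWithReuse(plain));
        return firstCase && secondCase;
    }
}
```

In the segment-list design described earlier, both cases fall out naturally, because each one produces exactly one segment that covers a whole string.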

Performance testing:

For a 395-character string, OptimalFormat is 19.57% faster than string.Format (.NET Framework 4.8.1), and allocations are reduced by 68%.

In .NET Core, string.Format has a new implementation using ValueStringBuilder. Its stack allocation holds 256 characters. Beyond that, the array pool is used, with more memory usage and data copying. .NET Core also introduces a new interface, ISpanFormattable, which is implemented by scalar types to avoid temporary string allocations. We shall add support for it in the .NET Core implementation.
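As a rough illustration of what ISpanFormattable support could look like, the sketch below formats a scalar directly into a character buffer, with no temporary string for the number. It assumes .NET 6 or later, where `ISpanFormattable` is public. `SpanFormattingSketch`, `TryAppend`, and `Demo` are hypothetical names and do not come from the linked repository.

```csharp
using System;
using System.Globalization;

public static class SpanFormattingSketch
{
    // The generic constraint lets the call go through the interface
    // without boxing a value type such as int.
    public static bool TryAppend<T>(T value, Span<char> destination, ref int position)
        where T : ISpanFormattable
    {
        bool ok = value.TryFormat(
            destination.Slice(position),
            out int written,
            default,                       // empty format string: default formatting
            CultureInfo.InvariantCulture);

        if (!ok) return false;             // not enough room; the caller must grow the buffer

        position += written;
        return true;
    }

    public static string Demo()
    {
        Span<char> buffer = stackalloc char[32];
        int position = 0;

        // Copy the literal part, then format the number in place after it.
        "id=".AsSpan().CopyTo(buffer);
        position += 3;
        TryAppend(12345, buffer, ref position);

        // The only string allocation is the final result.
        return new string(buffer.Slice(0, position));
    }
}
```

Here `Demo()` produces `"id=12345"`. In a formatter, the destination would be the buffer of the final string, so `TryFormat` needs the required length to be known or bounded ahead of time.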