FC35: .Net Core ValueStringBuilder, not designed for large data

LOH memory issue and multiple rounds of data copying

In recent posts we discussed StringBuilder's internal implementation, the performance issues that come with it, and how to avoid them. In .Net Core, StringBuilder is used less often because of a new internal string builder called ValueStringBuilder. For example, string.Format and string.Join are implemented with it.

Here is the string.Format implementation:

ValueStringBuilder has a simple single-buffer design, much like List<T>. It can start with an initial buffer allocated on the stack. In this caller, the initial buffer holds 256 characters, which should be enough for forming small strings. So the new string.Format implementation in .Net Core is well optimized for formatting small strings.

But what if the 256-character buffer is not big enough? Then ValueStringBuilder has to resize its buffer, similar to List<T>. There is one difference: the new buffers are rented from the shared array pool (ArrayPool<char>.Shared):
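The growth step can be modeled in a few lines. This is a Python sketch under stated assumptions and is not the .NET source. `SimplePool`, `rent`, `give_back`, `cached_sizes`, and `SingleBufferBuilder` are hypothetical names. The pool rounds requests up to a power of two. The growth rule is meant to approximate ValueStringBuilder: rent at least twice the current capacity, copy the used part, and return the old pooled buffer.

```python
# Hypothetical model, not the real ValueStringBuilder or ArrayPool code.

class SimplePool:
    """Stand-in for a shared array pool: one free list per power-of-two size."""

    def __init__(self):
        self.free = {}                            # size -> list of returned arrays

    def rent(self, min_length):
        size = 16
        while size < min_length:                  # round up to a power-of-two bucket
            size *= 2
        bucket = self.free.get(size)
        return bucket.pop() if bucket else [None] * size

    def give_back(self, array):
        self.free.setdefault(len(array), []).append(array)

    def cached_sizes(self):
        return sorted(size for size, arrays in self.free.items() if arrays)


class SingleBufferBuilder:
    """One contiguous buffer; when full, rent a bigger one and copy (List<T>-style)."""

    def __init__(self, pool, initial_capacity=256):
        self.pool = pool
        self.chars = [None] * initial_capacity    # plays the role of the stackalloc'd buffer
        self.pooled = False                       # the initial buffer did not come from the pool
        self.pos = 0

    def append(self, text):
        if self.pos + len(text) > len(self.chars):
            self._grow(len(text))
        self.chars[self.pos:self.pos + len(text)] = list(text)
        self.pos += len(text)

    def _grow(self, extra):
        new = self.pool.rent(max(self.pos + extra, len(self.chars) * 2))
        new[:self.pos] = self.chars[:self.pos]    # copy only the used part
        if self.pooled:
            self.pool.give_back(self.chars)       # return the outgrown pooled buffer
        self.chars, self.pooled = new, True

    def to_string(self):
        return "".join(self.chars[:self.pos])


pool = SimplePool()
sb = SingleBufferBuilder(pool)
for i in range(1000):
    sb.append("key%d=value%d;" % (i, i))
text = sb.to_string()
print(len(text), len(sb.chars))   # 15780 16384
print(pool.cached_sizes())        # [512, 1024, 2048, 4096, 8192]
```

Every outgrown buffer goes back to the pool. After one large build, the pool therefore holds a whole ladder of buffer sizes, not just the final one.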

Here is the shared pool's implementation, TlsOverPerCoreLockedStackArrayPool:

As the name suggests, it caches arrays in thread-static (thread-local) variables, backed by per-core locked stacks.
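The following model illustrates why a thread-static pool tends to keep one buffer of every size on each thread. It is a hypothetical Python sketch, not the real pool. `ThreadLocalPool` and `build` are made-up names. The sketch assumes each thread has exactly one slot per size, and that returning an array parks it in the calling thread's slot when that slot is empty. The real TlsOverPerCoreLockedStackArrayPool also has per-core stacks and trimming, which are left out here, and the character copying is omitted.

```python
import threading

class ThreadLocalPool:
    """Hypothetical model: at most one cached array per size, per thread."""

    def __init__(self):
        self._tls = threading.local()

    def _slots(self):
        if not hasattr(self._tls, "slots"):
            self._tls.slots = {}                  # size -> array, private to this thread
        return self._tls.slots

    def rent(self, size):
        cached = self._slots().pop(size, None)
        return cached if cached is not None else [None] * size

    def give_back(self, array):
        self._slots().setdefault(len(array), array)   # keep it only if the slot is free

    def cached_sizes(self):
        return sorted(self._slots())


def build(pool, final_size, report):
    """Grow a buffer from 512 chars by doubling, returning each outgrown one."""
    size, current = 512, None
    while size <= final_size:
        new = pool.rent(size)
        if current is not None:
            pool.give_back(current)
        current, size = new, size * 2
    pool.give_back(current)                       # the last buffer is returned when done
    report[threading.current_thread().name] = pool.cached_sizes()


pool, report = ThreadLocalPool(), {}
workers = [threading.Thread(target=build, args=(pool, 64 * 1024, report), name=f"worker-{i}")
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
for name in sorted(report):
    print(name, report[name])    # every worker holds 512, 1024, ..., 65536
print(pool.cached_sizes())       # the main thread never used the pool, so nothing is cached
```

Each worker ends up holding its own copy of every size it grew through, so memory held by the pool scales with the number of threads.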

Let’s test it out. Here is the test data:

Here we populate a dictionary with 60,000 string pairs, then serialize the dictionary with string.Join. People actually write such code in production. The final string is over 1024 × 1024 characters long.

In .Net Core, string.Join is implemented using ValueStringBuilder too:

Here is the performance test:

We compare the original StringBuilder-based implementation of string.Join with the new ValueStringBuilder-based one, running 100 tasks in parallel:

The .Net Framework implementation of string.Join is similar, except that it rents and returns its StringBuilder through StringBuilderCache. That cache does not help here, because it only keeps builders whose capacity is at most 360 characters.

Here is the result:

Here we only look at memory usage (managed heap size). The StringBuilder-based (.Net Framework style) implementation needs just 17 MB of memory, while the .Net Core implementation uses 217 MB. That is 200 MB more, for 25 threads (on my new machine with 16 physical cores and 24 logical cores).

Let’s check the LOH (large object heap):

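There are 150 large char buffers, 6 per thread. Their lengths are 64K, 128K, 256K, 512K, 1024K, and 2048K characters, and the largest are 4 MB each (a char takes 2 bytes). Gen2 holds smaller char buffers as well: 512, 1024, 2048, 4096, 8192, 16384, and 32768 characters. The split follows the 85,000-byte LOH threshold. A 32768-character buffer is 64 KB and stays on the small object heap, while a 64K-character buffer is 128 KB and goes to the LOH.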

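So each thread ends up holding a full set of buffers, from 512 characters up to 2 × 1024 × 1024 characters. Every time the builder outgrows a buffer, the old one is returned to the pool and cached for the current thread. The array pool therefore holds about 8 MB per thread, or about 200 MB for 25 threads.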
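The per-thread numbers can be checked with a few lines of arithmetic. This Python snippet assumes 2-byte chars, one cached buffer per power-of-two size from 512 to 2 × 1024 × 1024 characters on each thread, and the 85,000-byte LOH threshold.

```python
CHAR_BYTES = 2                       # a .NET char is a UTF-16 code unit, 2 bytes
LOH_THRESHOLD = 85_000               # objects of at least 85,000 bytes go to the LOH
THREADS = 25

sizes = [512 << i for i in range(13)]          # 512, 1024, ..., 2 * 1024 * 1024 chars
loh = [s for s in sizes if s * CHAR_BYTES >= LOH_THRESHOLD]
gen2 = [s for s in sizes if s * CHAR_BYTES < LOH_THRESHOLD]
per_thread = sum(s * CHAR_BYTES for s in sizes)

print(len(loh), len(loh) * THREADS)            # 6 150
print(gen2)                                    # 512 through 32768
print(per_thread / 2**20)                      # just under 8 MB per thread
print(per_thread * THREADS / 2**20)            # just under 200 MB for 25 threads
```

The buffer sizes form a geometric series, so the whole ladder costs about twice the largest buffer: 2 × 4 MB = 8 MB per thread.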

Note that this is just string.Join producing one string over 1024 × 1024 characters long, on 25 threads. What if the strings are much longer, or there are many more threads?

So ValueStringBuilder can use far too much memory when forming large strings, and it does so behind common APIs like string.Format and string.Join.

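Notice also that the data is copied between all those buffers, which does not happen with StringBuilder. In the worst case, every character is copied twice between buffers: when the final length is just past a buffer size, the buffers it grew through add up to nearly twice that length. A StringBuilder-based implementation needs only two copies: first into the buffer, then from the buffer into the string. ValueStringBuilder needs four copies in the worst case (into the buffer, twice between buffers, then into the string), double that of StringBuilder.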
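The copy counts can be checked with a small simulation. This is a simplified Python model, not .NET code. It assumes one-character appends, a 256-character starting buffer that doubles when full (the single-buffer case), and a chunked builder whose filled chunks are never moved (the StringBuilder case). `single_buffer_copies` and `chunked_copies` are hypothetical helper names.

```python
def single_buffer_copies(n_chars, initial_capacity=256):
    """Total character copies for a doubling single-buffer builder."""
    capacity, length, copies = initial_capacity, 0, 0
    for _ in range(n_chars):
        if length == capacity:
            copies += length          # grow: move existing chars into a 2x buffer
            capacity *= 2
        copies += 1                   # append: write the char into the buffer
        length += 1
    copies += length                  # ToString: copy the buffer into the string
    return copies


def chunked_copies(n_chars):
    """Total character copies for a chunked builder: full chunks are never moved."""
    return n_chars + n_chars          # append into a chunk + ToString


n = 1024 * 1024 + 1                   # just past a power of two: the worst case
print(single_buffer_copies(n) / n)    # close to 4
print(chunked_copies(n) / n)          # exactly 2
```

If n is just below a power of two instead, the ratio for the single buffer drops toward 3. The worst case depends on where the final length falls relative to the buffer sizes.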

The array pool has some trimming logic, but trimming could mean more garbage collections, followed by more LOH allocations when the buffers are needed again.

In general, I recommend avoiding ValueStringBuilder-based APIs such as string.Join and string.Format for forming large strings. Instead, reuse a large StringBuilder and replace string.Join/string.Format with code that uses it. Another approach is the StringList class (FC25: ICSharpCode.ILSpy.CSharpLanguage (beehiiv.com)).