FC66: If you're generating text data in high volume, replace StreamWriter

General-purpose classes have their limitations

If you’re generating text data, you’re most likely using StreamWriter. But if you use it to generate data in high volume, you need to replace it with your own implementation. A general-purpose class like that always has its limitations.

Here are a few issues with the StreamWriter implementation:

  1. First, you need to choose a buffer size. If the buffer is too small, there will be lots of interactions with the OS layer. But if the buffer is too large, you will run into allocation and garbage collection issues.

  2. TextWriter.WriteLine(string) (TextWriter is StreamWriter’s base class) has a strange implementation in .NET Framework. It allocates a temporary char array to combine the input string with the CR/LF line terminator, causing an extra allocation and an extra copy of the data.

  3. Write(string, object arg0, object arg1, …) is implemented with string.Format, so every call builds a temporary formatted string, and value-type arguments are boxed.

  4. Even Write(char) is problematic if you call it a lot, for two reasons. First, it’s a virtual method declared on the abstract base class, so the calls are not inlined. Second, there is an async task check inside.

For efficient text data generation, you need to use large buffers but reuse them, you need to control your own formatting, and you need to make sure Write(char) is as efficient as possible.

Here is one such implementation, added to the Frugal Cafe library:

We’re using 32K-character buffers rented from the array pool. Here is the single-character handling:
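As a rough illustration of the idea, here is a minimal sketch of a writer that rents a 32K-character buffer from the array pool and keeps Write(char) small enough to inline. The class name `PooledCharWriter` and all of its members are hypothetical, not the Frugal Cafe source. It assumes UTF-8 output and that `System.Buffers` is available (on .NET Framework that means the NuGet package).

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical sketch, not the Frugal Cafe implementation.
public sealed class PooledCharWriter : IDisposable
{
    private const int MinBufferChars = 32 * 1024;   // 32K chars, as in the text

    private readonly Stream _stream;
    private readonly Encoder _encoder = new UTF8Encoding(false).GetEncoder();
    private readonly char[] _chars;
    private readonly byte[] _bytes;
    private int _count;
    private bool _disposed;

    public PooledCharWriter(Stream stream)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
        // Rent may return a larger array, so size the byte buffer from the real length.
        _chars = ArrayPool<char>.Shared.Rent(MinBufferChars);
        _bytes = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(_chars.Length));
    }

    // Hot path: one bounds check and one store; the flush is the rare case.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Write(char ch)
    {
        if (_count == _chars.Length)
        {
            Flush();
        }
        _chars[_count++] = ch;
    }

    public void Flush()
    {
        FlushCore(false);
    }

    private void FlushCore(bool final)
    {
        // The stateful encoder carries a surrogate pair split across two flushes.
        int byteCount = _encoder.GetBytes(_chars, 0, _count, _bytes, 0, final);
        _stream.Write(_bytes, 0, byteCount);
        _count = 0;
    }

    // Flushes pending text and returns both buffers to the pool.
    // The underlying stream is left open; the caller owns it.
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        FlushCore(true);
        ArrayPool<char>.Shared.Return(_chars);
        ArrayPool<byte>.Shared.Return(_bytes);
    }
}
```

The class is sealed and the method is non-virtual, so the JIT is free to inline the call. That is the property the abstract TextWriter.Write(char) lacks.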

It’s marked for aggressive inlining, and it’s super simple. With this as a basic building block, you can add useful methods:

These two methods are for CSV file generation:
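To make the layering concrete, here is a hedged sketch of helper methods built on a single-character Write. All names (`CsvCharBuffer`, `WriteCsvField`) are hypothetical and the original methods may look different. The flush callback stands in for the encode-and-write step of a real stream writer, and the quoting rule is the common RFC 4180 convention: quote a field only when it contains a comma, a quote, or a line break, and double any embedded quotes.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical sketch: a fixed char buffer plus formatting helpers.
public sealed class CsvCharBuffer
{
    private static readonly char[] s_special = { ',', '"', '\r', '\n' };

    private readonly char[] _chars;
    private readonly char[] _digits = new char[10];   // enough for uint.MaxValue
    private readonly Action<char[], int> _flush;
    private int _count;

    public CsvCharBuffer(int capacity, Action<char[], int> flush)
    {
        _chars = new char[capacity];
        _flush = flush;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Write(char ch)
    {
        if (_count == _chars.Length) Flush();
        _chars[_count++] = ch;
    }

    public void Flush()
    {
        _flush(_chars, _count);
        _count = 0;
    }

    public void Write(string text)
    {
        if (text == null) return;
        for (int i = 0; i < text.Length; i++) Write(text[i]);
    }

    // Formats an integer digit by digit, with no intermediate string.
    public void Write(int value)
    {
        uint magnitude;
        if (value < 0)
        {
            Write('-');
            magnitude = (uint)(-(long)value);   // safe for int.MinValue
        }
        else
        {
            magnitude = (uint)value;
        }

        int pos = _digits.Length;
        do
        {
            _digits[--pos] = (char)('0' + magnitude % 10);
            magnitude /= 10;
        } while (magnitude != 0);

        while (pos < _digits.Length) Write(_digits[pos++]);
    }

    // Writes one CSV field, quoting only when required.
    public void WriteCsvField(string field)
    {
        if (string.IsNullOrEmpty(field)) return;
        if (field.IndexOfAny(s_special) < 0)
        {
            Write(field);
            return;
        }
        Write('"');
        for (int i = 0; i < field.Length; i++)
        {
            if (field[i] == '"') Write('"');   // double embedded quotes
            Write(field[i]);
        }
        Write('"');
    }
}
```

The caller writes the separators and the line terminator itself with Write(','), and Write('\n'), so no row-level arrays or format strings are allocated.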

The PerfView code base has lots of text file generation, for XML, CSV, and JSON. Here is one example:

Now we can replace it with a much more efficient custom CSV generator:

Notice that because we’re using large buffers in SimpleStreamWriter, we can use just a small buffer in FileStream. This buffer may not even be needed.
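The buffering point can be sketched as follows. The code fills a pooled 32K-character block, encodes it, and hands the whole block to a FileStream opened with a token `bufferSize` of 1. The class and method names are hypothetical, and the sample row is made up for illustration. The assumption here is that FileStream generally passes writes at least as large as its own buffer straight through, so a second large buffer adds little.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

// Hypothetical sketch: large pooled blocks on top of a minimally buffered FileStream.
public static class SmallFileBufferDemo
{
    private const string Line = "row,value\n";   // made-up sample row, 10 chars

    public static void WriteLines(string path, int lineCount)
    {
        char[] chars = ArrayPool<char>.Shared.Rent(32 * 1024);
        byte[] bytes = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(chars.Length));
        try
        {
            // bufferSize: 1 keeps FileStream's own buffer to a token size.
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                             FileShare.Read, bufferSize: 1))
            {
                int count = 0;
                for (int i = 0; i < lineCount; i++)
                {
                    // Flush only on whole-line boundaries, so no character is
                    // ever split across two encoded blocks.
                    if (count + Line.Length > chars.Length)
                    {
                        count = FlushBlock(file, chars, count, bytes);
                    }
                    Line.CopyTo(0, chars, count, Line.Length);
                    count += Line.Length;
                }
                FlushBlock(file, chars, count, bytes);
            }
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chars);
            ArrayPool<byte>.Shared.Return(bytes);
        }
    }

    // Encodes the pending chars and writes them as one large block; returns the new count (0).
    private static int FlushBlock(Stream stream, char[] chars, int count, byte[] bytes)
    {
        int byteCount = Encoding.UTF8.GetBytes(chars, 0, count, bytes, 0);
        stream.Write(bytes, 0, byteCount);
        return 0;
    }
}
```

With 32K characters per block, each call to FileStream.Write carries tens of kilobytes, which is the reason the text gives for keeping the FileStream buffer small.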