FC14 Reuse regular expressions

Especially the compiled version

In the last episode we discussed the importance of reusing the JSON contract resolver object, because it contains expensive reusable data. Regular expressions are similar, especially compiled regular expressions.

Regular expressions are created from a text pattern and a few options. Just parsing the pattern and generating the internal representation can be expensive. If the regular expression is compiled, code generation and JIT compilation of the generated IL make it much more expensive. The implementation also keeps a small cache of 15 regular expressions, but even generating the key for the cache lookup and taking the lock for cache access can be costly.

Here is an anti-pattern found in Lenovo code:

This extension method calls the static Regex.Replace method, which allocates a new Regex object and does a cache lookup on every call, using the pattern passed as the second string argument.
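The original code is not reproduced here, so the following is only a sketch of the shape of the anti-pattern. The class name, method name, and pattern are hypothetical; the only detail taken from the text is that the pattern involves `\w`.

```csharp
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Hypothetical extension method: the name and the pattern are assumptions.
    public static string RemoveNonWordChars(this string input)
    {
        // The static Regex.Replace takes the pattern as a string, so every call
        // constructs a Regex object and goes through the regex cache lookup.
        return Regex.Replace(input, @"[^\w]", string.Empty);
    }
}
```

A call such as `"a-b c".RemoveNonWordChars()` pays the construction and lookup cost each time, no matter how often the same pattern is used.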

Here is the corrected code:

Here a compiled regular expression is allocated once in a static constructor, and then reused over and over again.
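A minimal sketch of this approach, assuming a hypothetical extension method and pattern (neither is from the original code):

```csharp
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Created once per process and shared by all callers.
    private static readonly Regex NonWordChars;

    static StringExtensions()
    {
        // RegexOptions.Compiled pays the code generation cost up front,
        // which only makes sense because the instance is reused.
        NonWordChars = new Regex(@"[^\w]", RegexOptions.Compiled);
    }

    // Hypothetical extension method: the name and the pattern are assumptions.
    public static string RemoveNonWordChars(this string input)
    {
        // Instance Replace reuses the existing Regex: no construction, no cache lookup.
        return NonWordChars.Replace(input, string.Empty);
    }
}
```

Regex instances are safe to share across threads for matching operations, so a single static instance is enough.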

Perf test results:

Allocation goes from 336 bytes per call to zero. Here is the allocation stack before the change:

There are at least 5 allocations each time:

  1. The regular expression itself

  2. Converting option flag to string

  3. string array for calling string.Concat

  4. Another string array allocation inside string.Concat implementation

  5. string for cache lookup.

Here is the key generation and cache lookup inside the Regex constructor:
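The framework source is not reproduced here. As a rough illustration only, key construction of the following shape would account for allocations 2 through 5 in the list above. The class and method names are hypothetical, and the exact key format is an assumption, not the actual implementation.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class CacheKeySketch
{
    // Illustrative approximation only; not the framework's code.
    public static string BuildKey(string pattern, RegexOptions options)
    {
        // Converting the option flags to text can allocate a string.
        string optionsText = ((int)options).ToString(NumberFormatInfo.InvariantInfo);
        string cultureText = CultureInfo.CurrentCulture.ToString();

        // With five parts, string.Concat typically binds to the params string[]
        // overload, so the call site allocates an array, and the final key is
        // one more string allocation.
        return string.Concat(optionsText, ":", cultureText, ":", pattern);
    }
}
```

None of this work is needed when the caller already holds a Regex instance.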

CPU reduction is 25%, lower than normal. The reason is that ‘\w’ handling is the most expensive part in both cases. Here is the CPU stack:

After the change, CharInClassRecursive accounts for 91% of CPU.