Sep 0.8.0 - SepWriter Replace StringBuilder with ArrayPool Array
Sep 0.8.0 was released January 19th, 2025 - earlier this year - with two notable changes:
- 🎯 Remove net7.0 target
- ✨ SepWriter.Col: Replace StringBuilder with ArrayPool array and DefaultInterpolatedStringHandler
See v0.8.0 release for all changes and Sep README on GitHub for full details. Below is a quick (belated) blog post to explain the changes a bit.
SepWriter vs TextWriter
SepWriter hasn’t gotten as much attention as SepReader here, which is partly intentional, as SepWriter is not so much about performance and speed but more about convenience, ease of use and change. And not much has changed about that since Sep was introduced. If you want the best speed for writing you would be better off simply using TextWriter directly (if done correctly).
Let’s do a quick code comparison of SepWriter and TextWriter. Given:
const string ColNameA = "A";
const string ColNameB = "B";
ReadOnlySpan<int> values = [1, 2, 3];
we want to write a multiple of each value to CSV for each column. With Sep this can be done as shown below. The main takeaway here is that with Sep you do not have to separate the writing of the header (column names) from the writing of the values.
using var sepWriter = Sep.Default.Writer().ToText();
foreach (var v in values)
{
    using var row = sepWriter.NewRow();
    row[ColNameA].Format(v * 10);
    row[ColNameB].Format(v * 100);
}
Console.WriteLine(sepWriter.ToString());
One way to do this with StringWriter (aka TextWriter) is shown below. While this clearly is longer, the other issue is that you have two separate parts: first writing the header and then writing the rows. Not a big issue here, but when you have many columns, keeping the two in sync can be a challenge and is a known source of dev churn.
const char Separator = ';';
using var textWriter = new StringWriter();
// Header
textWriter.Write(ColNameA);
textWriter.Write(Separator);
textWriter.Write(ColNameB);
textWriter.WriteLine();
// Rows
foreach (var v in values)
{
    textWriter.Write(v * 10);
    textWriter.Write(Separator);
    textWriter.Write(v * 100);
    textWriter.WriteLine();
}
Console.WriteLine(textWriter.ToString());
SepWriter.Col: StringBuilder Issue
The above TextWriter code is basically what Sep does under the hood. However, for each column (e.g. var col = row[ColNameA];) Sep would store the column value in a StringBuilder until the row was completed by calling Dispose() on it, at which point the contents of each StringBuilder were written to the underlying TextWriter that SepWriter works over. In this way Sep could utilize all the StringBuilder functionality to support Format (e.g. ISpanFormattable) and similar. Additionally, Sep would use a pool of StringBuilders to reduce repeated allocations.
StringBuilder does have an underlying issue, though, in that it is basically implemented as a linked list of StringBuilders, which means in order to write all the contents of it to TextWriter, without creating a string, one would have to enumerate the chunks of it like:
foreach (var chunk in sb.GetChunks())
{
    _writer.Write(chunk.Span);
}
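Since long columns are rare, it is similarly rare for there to be more than one chunk. Hence, the chunk enumeration is overhead paid for every column even though it almost never yields more than a single chunk, which causes a bit of a performance hit.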
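To make the chunk behaviour concrete, here is a small standalone sketch (not Sep code; CountChunks and WriteChunks are hypothetical helper names made up for illustration). It assumes a .NET 8 console project with top-level statements and shows that a short column value lives in a single chunk, while a long value that outgrows its initial capacity is spread over several.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: count the chunks GetChunks() yields for a StringBuilder.
static int CountChunks(StringBuilder sb)
{
    var count = 0;
    foreach (var chunk in sb.GetChunks()) { count++; }
    return count;
}

// Hypothetical helper: write every chunk to a TextWriter without
// creating an intermediate string.
static void WriteChunks(StringBuilder sb, TextWriter writer)
{
    foreach (var chunk in sb.GetChunks())
    {
        writer.Write(chunk.Span);
    }
}

// A typical short column value fits in the first (and only) chunk.
var shortColumn = new StringBuilder();
shortColumn.Append(12345);

// A long value that outgrows a small initial capacity forces
// StringBuilder to link additional chunks.
var longColumn = new StringBuilder(capacity: 16);
for (var i = 0; i < 100; i++) { longColumn.Append("0123456789"); }

Console.WriteLine($"short column chunks: {CountChunks(shortColumn)}");
Console.WriteLine($"long column chunks:  {CountChunks(longColumn)}");

using var output = new StringWriter();
WriteChunks(longColumn, output);
Console.WriteLine($"chars written: {output.ToString().Length}");
```

The short value reports one chunk and the long one reports more than one, yet both go through the same enumerator, which is the cost described above.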
SepWriter.Col: Replace StringBuilder with ArrayPool array and DefaultInterpolatedStringHandler
Performance is a feature, and while not the top priority for SepWriter, 0.8.0 addresses this issue by swapping out the internal StringBuilder with a char[] from ArrayPool. To implement formatting Sep then relies on DefaultInterpolatedStringHandler. However, this doesn’t have public APIs allowing for using and managing arrays from the ArrayPool. I was then faced with a choice of either copying the entire implementation of DefaultInterpolatedStringHandler or finding another way. That other way was to use the UnsafeAccessor attribute to access the internal state of DefaultInterpolatedStringHandler, as shown below, and reuse the array from ArrayPool. This is a bit of a hack, but it works and is fast.
// Avoid recreating DefaultInterpolatedStringHandler while being
// able to reuse array from ArrayPool by using UnsafeAccessor to
// access internal state of this. This works fine for net8.0 and
// net9.0 but there are no guarantees if this could change in the
// future, if so consider using #if NET10_0_OR_GREATER or similar to
// address any changes or consider then copying the entire
// DefaultInterpolatedStringHandler source code and adopt for needs.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_arrayToReturnToPool")]
static extern ref char[]? ArrayToReturnToPool(ref DefaultInterpolatedStringHandler handler);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_pos")]
static extern ref int Position(ref DefaultInterpolatedStringHandler handler);
The downside is that UnsafeAccessor is only supported on net8.0 and above. Given net7.0 is no longer supported I decided it was time to drop support for it.
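For readers who want a feel for the pooled-buffer idea without the UnsafeAccessor part, below is a minimal sketch. It is not how Sep implements it: instead of DefaultInterpolatedStringHandler it calls ISpanFormattable.TryFormat directly on a char[] rented from ArrayPool, and WriteFormatted is a hypothetical helper name. It assumes a .NET 8 console project with top-level statements.

```csharp
using System;
using System.Buffers;
using System.Globalization;
using System.IO;

// Hypothetical helper (not part of Sep): format a value into a pooled
// char[] and write the formatted chars to the TextWriter without
// allocating a string.
static void WriteFormatted<T>(TextWriter writer, T value, ref char[] buffer)
    where T : ISpanFormattable
{
    int charsWritten;
    // TryFormat returns false when the destination is too small, so
    // swap the buffer for a larger pooled one and retry.
    while (!value.TryFormat(buffer, out charsWritten, default, CultureInfo.InvariantCulture))
    {
        var larger = ArrayPool<char>.Shared.Rent(Math.Max(buffer.Length * 2, 16));
        ArrayPool<char>.Shared.Return(buffer);
        buffer = larger;
    }
    writer.Write(buffer.AsSpan(0, charsWritten));
}

using var writer = new StringWriter();
// One rented buffer is reused for every column of every row.
var buffer = ArrayPool<char>.Shared.Rent(32);
try
{
    foreach (var v in new[] { 1, 2, 3 })
    {
        WriteFormatted(writer, v * 10, ref buffer);
        writer.Write(';');
        WriteFormatted(writer, v * 100, ref buffer);
        writer.WriteLine();
    }
}
finally
{
    ArrayPool<char>.Shared.Return(buffer);
}
Console.Write(writer.ToString());
```

Because the value ends up in one contiguous array, writing it out is a single Write call on a span, with no chunk enumeration involved.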
I don’t have detailed benchmarks here, but the end result for SepWriter is that for a given simple case of writing multiple short columns SepWriter is 10-15% faster while still having zero allocations after warmup/first rows. Additionally, code is simpler, even with the UnsafeAccessor code.
For more details take a look at the pull request SepWriter.Col: Replace StringBuilder with ArrayPool array and DefaultInterpolatedStringHandler.
That’s all!