<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://nietras.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://nietras.com/" rel="alternate" type="text/html" /><updated>2025-11-11T12:21:16+00:00</updated><id>https://nietras.com/feed.xml</id><title type="html">nietras</title><subtitle>Programming, mechanical sympathy, machine learning and .NET ❤.</subtitle><entry><title type="html">.NET and C# Versions - .NET Framework 1.0 to .NET 10 &amp;amp; C# 1.0 to 14 (10/14 Update)</title><link href="https://nietras.com/2025/11/11/dotnet-and-csharp-versions/" rel="alternate" type="text/html" title=".NET and C# Versions - .NET Framework 1.0 to .NET 10 &amp;amp; C# 1.0 to 14 (10/14 Update)" /><published>2025-11-11T00:00:00+00:00</published><updated>2025-11-11T00:00:00+00:00</updated><id>https://nietras.com/2025/11/11/dotnet-and-csharp-versions</id><content type="html" xml:base="https://nietras.com/2025/11/11/dotnet-and-csharp-versions/"><![CDATA[<p><img src="/images/2025-11-dotnet-versions/dotnet-versions-export.png" alt=".NET and C# versions - .NET Framework 1.0 to .NET 10 &amp; C# 1.0 to C# 14" /></p>

<p>The <a href="/images/2025-11-dotnet-versions/dotnet-versions-export.png">png</a> above is very high
resolution, but you can also download a <a href="/images/2025-11-dotnet-versions/dotnet-versions.pdf">pdf</a>.</p>

<h3 id="sources">Sources</h3>
<ul>
  <li><a href="/2022/02/13/dotnet-and-csharp-versions">.NET and C# Versions - 20th Anniversary ♥</a></li>
  <li><a href="/2022/11/26/dotnet-and-csharp-versions">.NET and C# Versions - 7/11 Update</a></li>
  <li><a href="/2023/11/14/dotnet-and-csharp-versions">.NET and C# Versions - 8/12 Update</a></li>
  <li><a href="/2024/11/12/dotnet-and-csharp-versions">.NET and C# Versions - 9/13 Update</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/overview">What’s new in .NET 10</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/runtime">What’s new in the .NET 10 runtime</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/libraries">What’s new in .NET libraries for .NET 10</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/sdk">What’s new in the SDK and tooling for .NET 10</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-14">What’s new in C# 14</a></li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/core/compatibility/10.0?toc=%2Fdotnet%2Ffundamentals%2Ftoc.json&amp;bc=%2Fdotnet%2Fbreadcrumb%2Ftoc.json">Breaking Changes in .NET 10</a></li>
</ul>

<h3 id="changes">Changes</h3>
<ul>
  <li>Add .NET 10.0 and C# 14.0</li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Sep 0.11.0 - 9.5 GB/s CSV Parsing Using ARM NEON SIMD on Apple M1 🚀</title><link href="https://nietras.com/2025/06/17/sep-0-11-0/" rel="alternate" type="text/html" title="Sep 0.11.0 - 9.5 GB/s CSV Parsing Using ARM NEON SIMD on Apple M1 🚀" /><published>2025-06-17T00:00:00+00:00</published><updated>2025-06-17T00:00:00+00:00</updated><id>https://nietras.com/2025/06/17/sep-0.11.0</id><content type="html" xml:base="https://nietras.com/2025/06/17/sep-0-11-0/"><![CDATA[<p><a href="https://github.com/nietras/Sep/releases/tag/v0.11.0">Sep 0.11.0 was released June 12th,
2025</a>, with a new parser
optimized specifically for ARM64 CPUs with <a href="https://developer.arm.com/Architectures/Neon">ARM NEON
SIMD</a> support (called <code class="language-plaintext highlighter-rouge">AdvSimd</code> in .NET),
like the <a href="https://en.wikipedia.org/wiki/Apple_M1">Apple M1</a>
or the new Microsoft cloud <a href="https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/cobalt-overview">Cobalt
100</a>,
which is based on the <a href="https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-n2">ARM Neoverse
N2</a>.</p>

<p>Both of these are available as <a href="https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories">free runners on
GitHub</a>,
and those virtual instances are what I use for benchmarking Sep on ARM64, since
I have no ARM64 hardware myself. The new availability of Cobalt 100 based GitHub
runners is what prompted me to add a new ARM NEON SIMD parser to Sep, as early
benchmarks on it looked a bit “slow” compared to other CPUs.</p>

<p>Previously, Sep on ARM used a cross-platform SIMD parser based on
<code class="language-plaintext highlighter-rouge">Vector128</code>, as it has since early versions, e.g. <a href="/2023/08/07/sep-0-2-0/">Sep 0.2.0</a>. Now Sep hits 9.5 GB/s on Apple M1,
up from ~7 GB/s with the cross-platform SIMD parser, and ~6 GB/s on Cobalt 100,
up from ~4 GB/s.</p>

<p>Compared to the insane performance of a large Zen 5 desktop core, as seen last in
<a href="/2025/05/09/sep-0-10-0/">Sep 0.10.0 - 21 GB/s CSV Parsing Using SIMD on AMD 9950X 🚀</a>, this is not too bad. I wonder what the performance
would be on an Apple M4, but I do not have access to one. Still, it is pretty
nice to be able to run such benchmarks as GitHub Actions.</p>

<p>See the <a href="https://github.com/nietras/Sep/releases/tag/v0.11.0">v0.11.0 release</a> for
all changes, and the <a href="https://github.com/nietras/Sep">Sep README on
GitHub</a> for full details, including detailed
benchmarks.</p>

<p>Below I will dive into the new NEON SIMD parser, showing interesting code and
assembly along the way. First, I’ll show how the low-level parsing performance of
Sep on Apple M1 has improved as a result, and how it compares to
<a href="https://github.com/JoshClose/CsvHelper">CsvHelper</a> and
<a href="https://github.com/MarkPflug/Sylvan">Sylvan.Data.Csv</a>.</p>

<h2 id="sep-performance-on-apple-m1">Sep Performance on Apple M1</h2>

<p><img src="/images/2025-06-sep-0.11.0/sep-perf-0.11.0-apple-m1.png" alt="Sep Performance on Apple M1" /></p>

<p>The above graph shows how Sep 0.11.0 improves the already stellar performance on
Apple M1 compared to Sylvan and CsvHelper. Sep is now ~14x faster than CsvHelper
(9.5 GB/s vs 0.7 GB/s) and 6.4x faster than Sylvan.</p>

<p>As before, these numbers are for the package assets CSV data and the low-level
parse <code class="language-plaintext highlighter-rouge">Rows</code>-only scope. See the <a href="https://github.com/nietras/Sep">Sep README on
GitHub</a> or the code on GitHub for details,
but to recap for new readers:</p>

<ul>
  <li>1 char = 2 bytes</li>
  <li>Based on <code class="language-plaintext highlighter-rouge">StringReader</code> and <code class="language-plaintext highlighter-rouge">string</code> in memory to rule out any IO variation</li>
  <li>Data intentionally small, typically fitting in L3 cache, so the impact of
even small code changes can be measured</li>
  <li>Single-threaded</li>
  <li>No quotes, no trimming, no unescaping, but still real-world CSV</li>
  <li>CSV parsing only, rows only</li>
  <li>Warm runs only as per usual with <code class="language-plaintext highlighter-rouge">BenchmarkDotNet</code></li>
</ul>

<p>In addition, since these benchmarks run as GitHub Actions on virtual machine
runners, there is higher variance. Locally I’d say there is usually about 5%
variance, but for GitHub runners it is about 10%, sometimes even much larger.</p>

<p>Hence, before Sep 0.11.0, with the cross-platform SIMD based parser on Apple M1,
I observed performance ranging from ~7000-7600 MB/s. With 0.11.0 and the
NEON SIMD based parser this increases to ~8500-9700 MB/s, so the new parser is
about 1.3-1.4x faster on Apple M1.</p>

<h2 id="sep-performance-on-cobalt-100">Sep Performance on Cobalt 100</h2>

<p><img src="/images/2025-06-sep-0.11.0/sep-perf-0.11.0-cobalt-100.png" alt="Sep Performance on Cobalt 100" /></p>

<p>Before Sep 0.11.0, performance on Cobalt 100 was only ~4 GB/s, which seemed
rather slow and is what prompted me to look into improving ARM performance. With
Sep 0.11.0 this has improved to ~6 GB/s, a nice 1.5x improvement. That is still
fairly slow compared to “performance” cores, but these ARM cores are designed
for high efficiency and density rather than maximum single-threaded
performance.</p>

<p>Compared to Sylvan and CsvHelper, Sep on ARM Neoverse N2 is now ~11x faster
than CsvHelper (6.1 GB/s vs 0.6 GB/s) and 4.2x faster than Sylvan.</p>

<h2 id="disassembling-via-dotnet-publish-and-dumpbin-on-windows">Disassembling via <code class="language-plaintext highlighter-rouge">dotnet publish</code> and <code class="language-plaintext highlighter-rouge">dumpbin</code> on Windows</h2>

<p>Not having an ARM64 PC myself also meant that adding a specific parser for it
presented new challenges, as I could neither test/debug the code directly nor
inspect its assembly using
<a href="https://github.com/EgorBo/Disasmo">Disasmo</a> as usual (maybe that is possible,
but I do not know how). Debugging wasn’t much of an issue, since I only had to
make minor changes to SIMD code in 0.11.0 and few issues arose. Inspecting the
assembly was more important, to be sure the generated machine code matched the
expected instructions.</p>

<p>I could have tried using a GitHub Action on e.g. an Apple M1 runner to get at
the assembly, but that would have been a terrible dev loop, and I am too
impatient for that.</p>

<p>Instead, I figured out I could use NativeAOT via <code class="language-plaintext highlighter-rouge">dotnet publish</code> of a small
test project to generate an executable for <code class="language-plaintext highlighter-rouge">win-arm64</code>, which is
possible <a href="https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/?tabs=windows%2Cnet8">on Windows if you have the right development tools
installed</a>,
e.g. “Visual Studio 2022, including the Desktop development with C++ workload
with all default components.” I could then use <code class="language-plaintext highlighter-rouge">dumpbin</code> to disassemble the
resulting executable, as shown in the script below.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre><span class="kr">param</span><span class="p">(</span><span class="w">
    </span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="nv">$runtime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win-arm64"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">dotnet</span><span class="w"> </span><span class="nx">publish</span><span class="w"> </span><span class="nx">src/Sep.Tester/Sep.Tester.csproj</span><span class="w"> </span><span class="se">`
</span><span class="w">  </span><span class="nt">-c</span><span class="w"> </span><span class="nx">Release</span><span class="w"> </span><span class="nt">-r</span><span class="w"> </span><span class="s2">"</span><span class="nv">$runtime</span><span class="s2">"</span><span class="w"> </span><span class="nt">-f</span><span class="w"> </span><span class="nx">net9.0</span><span class="w"> </span><span class="nt">--self-contained</span><span class="w"> </span><span class="nx">true</span><span class="w"> </span><span class="se">`
</span><span class="w">  </span><span class="nx">/p:PublishAot</span><span class="o">=</span><span class="n">true</span><span class="w"> </span><span class="nx">/p:DebugSymbols</span><span class="o">=</span><span class="n">true</span><span class="w">
</span><span class="nx">dumpbin</span><span class="w"> </span><span class="nx">/DISASM</span><span class="w"> </span><span class="nx">/SYMBOLS</span><span class="w"> </span><span class="se">`
</span><span class="w">  </span><span class="s2">"artifacts\publish\Sep.Tester\release_net9.0_</span><span class="nv">$runtime</span><span class="s2">\Sep.Tester.exe"</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="se">`
</span><span class="w">  </span><span class="s2">"artifacts\publish\Sep.Tester\release_net9.0_</span><span class="nv">$runtime</span><span class="s2">\disassembly.asm"</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></div></div>

<p>The resulting <code class="language-plaintext highlighter-rouge">disassembly.asm</code> is a text file that I can keep open in VS or
WinMerge, and in which I manually search for relevant class and method names
(available thanks to the debug symbols) to find the assembly I am
interested in. This also allows me to compare x64 vs ARM64 if need be.</p>

<p>Note that the assembly generated with NativeAOT may in general differ from that
of the JIT, but for the specific places in Sep I am interested in, there haven’t
been any differences.</p>

<h2 id="arm-simd---cross-platform-vs-advsimd">ARM SIMD - Cross-platform vs AdvSimd</h2>

<p>Let’s take a look at the cross-platform SIMD C# code previously used on ARM (and
any other non-x86 platform with <code class="language-plaintext highlighter-rouge">Vector128</code> hardware acceleration). The basic
approach has been discussed before, e.g. in the <a href="/2025/05/09/sep-0-10-0/">Sep 0.10.0 blog post</a>, and I am only showing part of the inner loop
code. For the full code, see <a href="https://github.com/nietras/Sep">nietras/Sep on
GitHub</a>.</p>

<p>Since cross-platform <code class="language-plaintext highlighter-rouge">Vector128</code> SIMD does not (currently) support
saturating conversion/narrowing of 16-bit unsigned integers (e.g. <code class="language-plaintext highlighter-rouge">char</code>) to
8-bit unsigned integers (this is <a href="https://github.com/dotnet/runtime/issues/75724">coming in .NET 10, I believe, with
<code class="language-plaintext highlighter-rouge">NarrowWithSaturation</code>
methods</a>), this is done by applying
<code class="language-plaintext highlighter-rouge">Min</code> before <code class="language-plaintext highlighter-rouge">Narrow</code>, as can be seen in the C# code below. Additionally,
<code class="language-plaintext highlighter-rouge">MoveMask</code> is done via <code class="language-plaintext highlighter-rouge">ExtractMostSignificantBits</code>.</p>

<p><code class="language-plaintext highlighter-rouge">SepParserVector128NrwCmpExtMsbTzcnt</code></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="rouge-code"><pre><span class="k">ref</span> <span class="kt">var</span> <span class="n">charsRef</span> <span class="p">=</span> <span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsOriginRef</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint</span><span class="p">)</span><span class="n">charsIndex</span><span class="p">);</span>
<span class="k">ref</span> <span class="kt">var</span> <span class="n">byteRef</span> <span class="p">=</span> <span class="k">ref</span> <span class="n">As</span><span class="p">&lt;</span><span class="kt">char</span><span class="p">,</span> <span class="kt">byte</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">v0</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecUI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">v1</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecUI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">limit0</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Min</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">max</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">limit1</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Min</span><span class="p">(</span><span class="n">v1</span><span class="p">,</span> <span class="n">max</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Narrow</span><span class="p">(</span><span class="n">limit0</span><span class="p">,</span> <span class="n">limit1</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">nlsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">lineEndings</span> <span class="p">=</span> <span class="n">nlsEq</span> <span class="p">|</span> <span class="n">crsEq</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators</span> <span class="p">=</span> <span class="n">spsEq</span> <span class="p">|</span> <span class="n">lineEndings</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">specialChars</span> <span class="p">=</span> <span class="n">lineEndingsSeparators</span> <span class="p">|</span> <span class="n">qtsEq</span><span class="p">;</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">specialChars</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The assembly for this, generated for ARM64 with NEON SIMD support using NativeAOT
and dumpbin as discussed above, is shown below. Interesting here are both the
instructions needed for narrowing and the many instructions needed for the
<code class="language-plaintext highlighter-rouge">MoveMask</code> operation, as ARM NEON has no built-in equivalent of it. A clear
oversight by ARM…</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="rouge-code"><pre><span class="nf">AD405935</span>  <span class="nv">ldp</span>         <span class="nv">q21</span><span class="p">,</span><span class="nv">q22</span><span class="p">,[</span><span class="nv">x9</span><span class="p">]</span>
<span class="err">6</span><span class="nf">E706EB5</span>  <span class="nv">umin</span>        <span class="nv">v21.8h</span><span class="p">,</span><span class="nv">v21.8h</span><span class="p">,</span><span class="nv">v16.8h</span>
<span class="err">0</span><span class="nf">E212AB5</span>  <span class="nv">xtn</span>         <span class="nv">v21.8b</span><span class="p">,</span><span class="nv">v21.8h</span>
<span class="err">6</span><span class="nf">E706ED6</span>  <span class="nv">umin</span>        <span class="nv">v22.8h</span><span class="p">,</span><span class="nv">v22.8h</span><span class="p">,</span><span class="nv">v16.8h</span>
<span class="err">4</span><span class="nf">E212AD5</span>  <span class="nv">xtn2</span>        <span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v22.8h</span>
<span class="err">6</span><span class="nf">E318EB6</span>  <span class="nv">cmeq</span>        <span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v17.16b</span>
<span class="err">6</span><span class="nf">E328EB7</span>  <span class="nv">cmeq</span>        <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v18.16b</span>
<span class="err">6</span><span class="nf">E338EB8</span>  <span class="nv">cmeq</span>        <span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v19.16b</span>
<span class="err">6</span><span class="nf">E348EB5</span>  <span class="nv">cmeq</span>        <span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v20.16b</span>
<span class="err">4</span><span class="nf">EB71ED6</span>  <span class="nv">orr</span>         <span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v23.16b</span>
<span class="err">4</span><span class="nf">EB61EB6</span>  <span class="nv">orr</span>         <span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v22.16b</span>
<span class="err">4</span><span class="nf">EB81ED7</span>  <span class="nv">orr</span>         <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v24.16b</span>
<span class="err">4</span><span class="nf">F04E418</span>  <span class="nv">movi</span>        <span class="nv">v24.16b</span><span class="p">,</span><span class="err">#</span><span class="mh">0x80</span>
<span class="err">4</span><span class="nf">E381EF7</span>  <span class="nv">and</span>         <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v24.16b</span>
<span class="err">9</span><span class="nf">C0019F9</span>  <span class="nv">ldr</span>         <span class="nv">q25</span><span class="p">,</span><span class="mi">000000014007</span><span class="nv">B140</span>
<span class="err">6</span><span class="nf">E3946F7</span>  <span class="nv">ushl</span>        <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v25.16b</span>
<span class="err">4</span><span class="nf">EB71EFA</span>  <span class="nv">mov</span>         <span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v23.16b</span>
<span class="err">0</span><span class="nf">E31BB5A</span>  <span class="nv">addv</span>        <span class="nv">b26</span><span class="p">,</span><span class="nv">v26.8b</span>
<span class="err">0</span><span class="nf">E013F4D</span>  <span class="nv">umov</span>        <span class="nv">w13</span><span class="p">,</span><span class="nv">v26.b</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="err">6</span><span class="nf">E1742F7</span>  <span class="nv">ext8</span>        <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="err">#</span><span class="mi">8</span>
<span class="err">0</span><span class="nf">E31BAF7</span>  <span class="nv">addv</span>        <span class="nv">b23</span><span class="p">,</span><span class="nv">v23.8b</span>
<span class="err">0</span><span class="nf">E013EEE</span>  <span class="nv">umov</span>        <span class="nv">w14</span><span class="p">,</span><span class="nv">v23.b</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="err">2</span><span class="nf">A0E21AD</span>  <span class="nv">orr</span>         <span class="nv">w13</span><span class="p">,</span><span class="nv">w13</span><span class="p">,</span><span class="nv">w14</span><span class="p">,</span><span class="nv">lsl</span> <span class="err">#</span><span class="mi">8</span>
<span class="err">2</span><span class="nf">A0D03ED</span>  <span class="nv">mov</span>         <span class="nv">w13</span><span class="p">,</span><span class="nv">w13</span>
<span class="nf">B4FFFC4D</span>  <span class="nv">cbz</span>         <span class="nv">x13</span><span class="p">,</span><span class="mi">000000014007</span><span class="nv">ADB4</span>
</pre></td></tr></tbody></table></code></pre></div></div>
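<p>To make the <code class="language-plaintext highlighter-rouge">MoveMask</code> sequence above more concrete, below is a scalar C#
sketch that mirrors it step by step and checks the result against
<code class="language-plaintext highlighter-rouge">Vector128.ExtractMostSignificantBits</code> (assuming .NET 8 or later). The
per-lane shift amounts of -7 to 0 are an assumption about the constant loaded by
<code class="language-plaintext highlighter-rouge">ldr q25</code>, inferred from the surrounding instructions; <code class="language-plaintext highlighter-rouge">ushl</code>
with a negative shift is a logical right shift.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.Intrinsics;

// Scalar mirror of the NEON MoveMask sequence (not code from Sep or .NET).
static uint EmulatedMoveMask(ReadOnlySpan&lt;byte&gt; lanes)
{
    uint low = 0, high = 0;
    for (int i = 0; i &lt; 16; i++)
    {
        int msb = lanes[i] &amp; 0x80;        // and with movi #0x80
        int bit = msb &gt;&gt; (7 - (i % 8));   // ushl by (i % 8) - 7: 0x80 becomes 1 &lt;&lt; (i % 8)
        if (i &lt; 8) low += (uint)bit;      // addv over the lower 8 bytes
        else high += (uint)bit;           // ext #8, then addv over the upper 8 bytes
    }
    return low | (high &lt;&lt; 8);             // orr w13, w13, w14, lsl #8
}

var bytes = new byte[16];
bytes[0] = 0xFF; bytes[3] = 0xFF; bytes[9] = 0xFF; bytes[15] = 0xFF;
uint expected = Vector128.ExtractMostSignificantBits(Vector128.Create(bytes));

// prints "8209 8209"
Console.WriteLine($"{EmulatedMoveMask(bytes):X4} {expected:X4}");
</code></pre></div></div>
<p>In the assembly above, this costs roughly a dozen instructions, from
<code class="language-plaintext highlighter-rouge">movi</code> to the final <code class="language-plaintext highlighter-rouge">mov</code>, for a single 16-byte mask.</p>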

<p>Many people have noticed this, and in <a href="#links">Links</a> below I have listed a few
resources that discuss it. One of them is <a href="https://branchfree.org/2019/04/01/fitting-my-head-through-the-arm-holes-or-two-sequences-to-substitute-for-the-missing-pmovmskb-instruction-on-arm-neon/">Fitting My Head Through The ARM
Holes or: Two Sequences to Substitute for the Missing PMOVMSKB Instruction on
ARM
NEON</a>
by Geoff Langdale, a notable figure in the SIMD community and one of the authors
of the sadly incomplete <a href="https://github.com/geofflangdale/simdcsv">simdcsv</a>. The
blog post discusses a way to do “move mask” efficiently in bulk for four 128-bit
registers in one go. Please read that blog post for more, since I won’t cover
the trickery here 😊.</p>
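<p>For a rough idea only, below is a scalar C# sketch of one formulation of such a
bulk move mask, based on repeated pairwise additions (the NEON
<code class="language-plaintext highlighter-rouge">ADDP</code> instruction, <code class="language-plaintext highlighter-rouge">vpaddq_u8</code> in C). It assumes every
input byte is either <code class="language-plaintext highlighter-rouge">0x00</code> or <code class="language-plaintext highlighter-rouge">0xFF</code>, as produced by compare
instructions. The helper names are hypothetical, and this is not necessarily the
exact sequence used in Sep.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Buffers.Binary;

// Scalar emulation of NEON ADDP on two 16-byte vectors: result bytes 0-7 are
// pairwise sums from a, bytes 8-15 pairwise sums from b (wrapping like a byte add).
static byte[] AddPairwise(byte[] a, byte[] b)
{
    var r = new byte[16];
    for (int i = 0; i &lt; 8; i++)
    {
        r[i] = (byte)(a[2 * i] + a[2 * i + 1]);
        r[i + 8] = (byte)(b[2 * i] + b[2 * i + 1]);
    }
    return r;
}

// AND with the bit weights 1, 2, 4, ..., 128, 1, 2, ..., 128 so each byte
// keeps a distinct bit for its position within its 8-byte half.
static byte[] AndBitWeights(byte[] cmp)
{
    var r = new byte[16];
    for (int i = 0; i &lt; 16; i++) r[i] = (byte)(cmp[i] &amp; (1 &lt;&lt; (i % 8)));
    return r;
}

// Hypothetical bulk move mask: 4 x 16 compare-result bytes (0x00 or 0xFF) to one
// 64-bit mask, where bit k is set when byte k (p0 first, then p1, p2, p3) is set.
static ulong BulkMoveMask(byte[] p0, byte[] p1, byte[] p2, byte[] p3)
{
    var sum0 = AddPairwise(AndBitWeights(p0), AndBitWeights(p1));
    var sum1 = AddPairwise(AndBitWeights(p2), AndBitWeights(p3));
    sum0 = AddPairwise(sum0, sum1);
    sum0 = AddPairwise(sum0, sum0);
    // Bytes 0-7 now hold the mask; the bits never collide, so no sum overflows.
    return BinaryPrimitives.ReadUInt64LittleEndian(sum0);
}

var masks = new byte[4][];
for (int k = 0; k &lt; 4; k++) masks[k] = new byte[16];
masks[0][0] = 0xFF; masks[0][15] = 0xFF; masks[1][1] = 0xFF; masks[3][15] = 0xFF;

// prints "8000000000028001"
Console.WriteLine($"{BulkMoveMask(masks[0], masks[1], masks[2], masks[3]):X16}");
</code></pre></div></div>
<p>As far as this sketch goes, the appeal is that four ANDs, four pairwise
additions and a single move to a general purpose register produce one 64-bit mask
for all four registers.</p>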

<p>This is the approach I have added in
<code class="language-plaintext highlighter-rouge">SepParserAdvSimdNrwCmpOrBulkMoveMaskTzcnt</code>, with key code shown below, in
addition to using the efficient saturating conversion from 16-bit to 8-bit that
ARM NEON supports. However, to do that here we have to first load a total of 8 x
<code class="language-plaintext highlighter-rouge">Vector128</code>s in each loop iteration, and then narrow/convert these to 4 x <code class="language-plaintext highlighter-rouge">Vector128</code>,
before doing comparisons, or’ing and finally extracting bits to get a mask. This
means this parser handles 1024 bits, or 64 x <code class="language-plaintext highlighter-rouge">char</code>s, at a time, which is the
same as the AVX-512 based parser. The logic relies on comparisons setting all
bits to <code class="language-plaintext highlighter-rouge">1</code> for each byte where the comparison is equal.</p>

<p><code class="language-plaintext highlighter-rouge">SepParserAdvSimdNrwCmpOrBulkMoveMaskTzcnt</code></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
</pre></td><td class="rouge-code"><pre><span class="k">ref</span> <span class="kt">var</span> <span class="n">charsRef</span> <span class="p">=</span> <span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsOriginRef</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint</span><span class="p">)</span><span class="n">charsIndex</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes0</span> <span class="p">=</span> <span class="nf">ReadNarrow</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes1</span> <span class="p">=</span> <span class="nf">ReadNarrow</span><span class="p">(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span> <span class="p">*</span> <span class="m">1</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">bytes2</span> <span class="p">=</span> <span class="nf">ReadNarrow</span><span class="p">(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span> <span class="p">*</span> <span class="m">2</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">bytes3</span> <span class="p">=</span> <span class="nf">ReadNarrow</span><span class="p">(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span> <span class="p">*</span> <span class="m">3</span><span class="p">));</span>

<span class="kt">var</span> <span class="n">nlsEq0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes0</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes0</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes0</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes0</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndings0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">nlsEq0</span><span class="p">,</span> <span class="n">crsEq0</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">spsEq0</span><span class="p">,</span> <span class="n">lineEndings0</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">specialChars0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">lineEndingsSeparators0</span><span class="p">,</span> <span class="n">qtsEq0</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">nlsEq1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes1</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes1</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes1</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes1</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndings1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">nlsEq1</span><span class="p">,</span> <span class="n">crsEq1</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">spsEq1</span><span class="p">,</span> <span class="n">lineEndings1</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">specialChars1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">lineEndingsSeparators1</span><span class="p">,</span> <span class="n">qtsEq1</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">nlsEq2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes2</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes2</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes2</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes2</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndings2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">nlsEq2</span><span class="p">,</span> <span class="n">crsEq2</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">spsEq2</span><span class="p">,</span> <span class="n">lineEndings2</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">specialChars2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">lineEndingsSeparators2</span><span class="p">,</span> <span class="n">qtsEq2</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">nlsEq3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes3</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes3</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes3</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">CompareEqual</span><span class="p">(</span><span class="n">bytes3</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndings3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">nlsEq3</span><span class="p">,</span> <span class="n">crsEq3</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">spsEq3</span><span class="p">,</span> <span class="n">lineEndings3</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">specialChars3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">Or</span><span class="p">(</span><span class="n">lineEndingsSeparators3</span><span class="p">,</span> <span class="n">qtsEq3</span><span class="p">);</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">specialChars0</span><span class="p">,</span> <span class="n">specialChars1</span><span class="p">,</span>
                               <span class="n">specialChars2</span><span class="p">,</span> <span class="n">specialChars3</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">AggressiveInlining</span><span class="p">)]</span>
<span class="k">static</span> <span class="n">VecUI8</span> <span class="nf">ReadNarrow</span><span class="p">(</span><span class="k">ref</span> <span class="kt">char</span> <span class="n">charsRef</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">ref</span> <span class="kt">var</span> <span class="n">byteRef</span> <span class="p">=</span> <span class="k">ref</span> <span class="n">As</span><span class="p">&lt;</span><span class="kt">char</span><span class="p">,</span> <span class="kt">byte</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">charsRef</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">v0</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecUI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">v1</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecUI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span><span class="p">));</span>

    <span class="kt">var</span> <span class="n">r0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">ExtractNarrowingSaturateLower</span><span class="p">(</span><span class="n">v0</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">r1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">ExtractNarrowingSaturateUpper</span><span class="p">(</span><span class="n">r0</span><span class="p">,</span> <span class="n">v1</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">r1</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="rouge-code"><pre><span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">AggressiveInlining</span><span class="p">)]</span>
<span class="k">internal</span> <span class="k">static</span> <span class="n">nuint</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">VecUI8</span> <span class="n">p0</span><span class="p">,</span> <span class="n">VecUI8</span> <span class="n">p1</span><span class="p">,</span> <span class="n">VecUI8</span> <span class="n">p2</span><span class="p">,</span> <span class="n">VecUI8</span> <span class="n">p3</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">var</span> <span class="n">bitmask</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span>
        <span class="m">0x01</span><span class="p">,</span> <span class="m">0x02</span><span class="p">,</span> <span class="m">0x04</span><span class="p">,</span> <span class="m">0x08</span><span class="p">,</span> <span class="m">0x10</span><span class="p">,</span> <span class="m">0x20</span><span class="p">,</span> <span class="m">0x40</span><span class="p">,</span> <span class="m">0x80</span><span class="p">,</span>
        <span class="m">0x01</span><span class="p">,</span> <span class="m">0x02</span><span class="p">,</span> <span class="m">0x04</span><span class="p">,</span> <span class="m">0x08</span><span class="p">,</span> <span class="m">0x10</span><span class="p">,</span> <span class="m">0x20</span><span class="p">,</span> <span class="m">0x40</span><span class="p">,</span> <span class="m">0x80</span>
    <span class="p">);</span>

    <span class="kt">var</span> <span class="n">t0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">And</span><span class="p">(</span><span class="n">p0</span><span class="p">,</span> <span class="n">bitmask</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">t1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">And</span><span class="p">(</span><span class="n">p1</span><span class="p">,</span> <span class="n">bitmask</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">t2</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">And</span><span class="p">(</span><span class="n">p2</span><span class="p">,</span> <span class="n">bitmask</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">t3</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="nf">And</span><span class="p">(</span><span class="n">p3</span><span class="p">,</span> <span class="n">bitmask</span><span class="p">);</span>

    <span class="kt">var</span> <span class="n">sum0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="n">Arm64</span><span class="p">.</span><span class="nf">AddPairwise</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="n">t1</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">sum1</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="n">Arm64</span><span class="p">.</span><span class="nf">AddPairwise</span><span class="p">(</span><span class="n">t2</span><span class="p">,</span> <span class="n">t3</span><span class="p">);</span>
    <span class="n">sum0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="n">Arm64</span><span class="p">.</span><span class="nf">AddPairwise</span><span class="p">(</span><span class="n">sum0</span><span class="p">,</span> <span class="n">sum1</span><span class="p">);</span>
    <span class="n">sum0</span> <span class="p">=</span> <span class="n">AdvSimd</span><span class="p">.</span><span class="n">Arm64</span><span class="p">.</span><span class="nf">AddPairwise</span><span class="p">(</span><span class="n">sum0</span><span class="p">,</span> <span class="n">sum0</span><span class="p">);</span>

    <span class="k">return</span> <span class="p">(</span><span class="n">nuint</span><span class="p">)</span><span class="n">sum0</span><span class="p">.</span><span class="nf">AsUInt64</span><span class="p">().</span><span class="nf">GetElement</span><span class="p">(</span><span class="m">0</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The assembly for this can be seen below. Since each iteration now processes 64
chars in one go, there are many more instructions in total. However, the number of
instructions per bit of the generated mask drops, because more is handled at a
time. The performance benefits are clear from the numbers shown above. All
thanks to Geoff Langdale 🙏</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
</pre></td><td class="rouge-code"><pre><span class="nf">AD405534</span>  <span class="nv">ldp</span>         <span class="nv">q20</span><span class="p">,</span><span class="nv">q21</span><span class="p">,[</span><span class="nv">x9</span><span class="p">]</span>
<span class="err">2</span><span class="nf">E214A94</span>  <span class="nv">uqxtn</span>       <span class="nv">v20.8b</span><span class="p">,</span><span class="nv">v20.8h</span>
<span class="err">6</span><span class="nf">E214AB4</span>  <span class="nv">uqxtn2</span>      <span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v21.8h</span>
<span class="err">9100812</span><span class="nf">D</span>  <span class="nv">add</span>         <span class="nv">x13</span><span class="p">,</span><span class="nv">x9</span><span class="p">,</span><span class="err">#</span><span class="mh">0x20</span>
<span class="nf">AD4059B5</span>  <span class="nv">ldp</span>         <span class="nv">q21</span><span class="p">,</span><span class="nv">q22</span><span class="p">,[</span><span class="nv">x13</span><span class="p">]</span>
<span class="err">2</span><span class="nf">E214AB5</span>  <span class="nv">uqxtn</span>       <span class="nv">v21.8b</span><span class="p">,</span><span class="nv">v21.8h</span>
<span class="err">6</span><span class="nf">E214AD5</span>  <span class="nv">uqxtn2</span>      <span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v22.8h</span>
<span class="err">9101012</span><span class="nf">D</span>  <span class="nv">add</span>         <span class="nv">x13</span><span class="p">,</span><span class="nv">x9</span><span class="p">,</span><span class="err">#</span><span class="mh">0x40</span>
<span class="nf">AD405DB6</span>  <span class="nv">ldp</span>         <span class="nv">q22</span><span class="p">,</span><span class="nv">q23</span><span class="p">,[</span><span class="nv">x13</span><span class="p">]</span>
<span class="err">2</span><span class="nf">E214AD6</span>  <span class="nv">uqxtn</span>       <span class="nv">v22.8b</span><span class="p">,</span><span class="nv">v22.8h</span>
<span class="err">6</span><span class="nf">E214AF6</span>  <span class="nv">uqxtn2</span>      <span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v23.8h</span>
<span class="err">9101812</span><span class="nf">D</span>  <span class="nv">add</span>         <span class="nv">x13</span><span class="p">,</span><span class="nv">x9</span><span class="p">,</span><span class="err">#</span><span class="mh">0x60</span>
<span class="nf">AD4061B7</span>  <span class="nv">ldp</span>         <span class="nv">q23</span><span class="p">,</span><span class="nv">q24</span><span class="p">,[</span><span class="nv">x13</span><span class="p">]</span>
<span class="err">2</span><span class="nf">E214AF7</span>  <span class="nv">uqxtn</span>       <span class="nv">v23.8b</span><span class="p">,</span><span class="nv">v23.8h</span>
<span class="err">6</span><span class="nf">E214B17</span>  <span class="nv">uqxtn2</span>      <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v24.8h</span>
<span class="err">6</span><span class="nf">E308E98</span>  <span class="nv">cmeq</span>        <span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v16.16b</span>
<span class="err">6</span><span class="nf">E318E99</span>  <span class="nv">cmeq</span>        <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v17.16b</span>
<span class="err">6</span><span class="nf">E328E9A</span>  <span class="nv">cmeq</span>        <span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v18.16b</span>
<span class="err">6</span><span class="nf">E338E94</span>  <span class="nv">cmeq</span>        <span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v19.16b</span>
<span class="err">4</span><span class="nf">EB91F18</span>  <span class="nv">orr</span>         <span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v25.16b</span>
<span class="err">4</span><span class="nf">EB81E98</span>  <span class="nv">orr</span>         <span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v20.16b</span><span class="p">,</span><span class="nv">v24.16b</span>
<span class="err">4</span><span class="nf">EBA1F19</span>  <span class="nv">orr</span>         <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v24.16b</span><span class="p">,</span><span class="nv">v26.16b</span>
<span class="err">6</span><span class="nf">E308EBA</span>  <span class="nv">cmeq</span>        <span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v16.16b</span>
<span class="err">6</span><span class="nf">E318EBB</span>  <span class="nv">cmeq</span>        <span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v17.16b</span>
<span class="err">6</span><span class="nf">E328EBC</span>  <span class="nv">cmeq</span>        <span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v18.16b</span>
<span class="err">6</span><span class="nf">E338EB5</span>  <span class="nv">cmeq</span>        <span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v19.16b</span>
<span class="err">4</span><span class="nf">EBB1F5A</span>  <span class="nv">orr</span>         <span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v27.16b</span>
<span class="err">4</span><span class="nf">EBA1EBA</span>  <span class="nv">orr</span>         <span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v21.16b</span><span class="p">,</span><span class="nv">v26.16b</span>
<span class="err">4</span><span class="nf">EBC1F5B</span>  <span class="nv">orr</span>         <span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v26.16b</span><span class="p">,</span><span class="nv">v28.16b</span>
<span class="err">6</span><span class="nf">E308EDC</span>  <span class="nv">cmeq</span>        <span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v16.16b</span>
<span class="err">6</span><span class="nf">E318EDD</span>  <span class="nv">cmeq</span>        <span class="nv">v29.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v17.16b</span>
<span class="err">6</span><span class="nf">E328EDE</span>  <span class="nv">cmeq</span>        <span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v18.16b</span>
<span class="err">6</span><span class="nf">E338ED6</span>  <span class="nv">cmeq</span>        <span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v19.16b</span>
<span class="err">4</span><span class="nf">EBD1F9C</span>  <span class="nv">orr</span>         <span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v29.16b</span>
<span class="err">4</span><span class="nf">EBC1EDC</span>  <span class="nv">orr</span>         <span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v22.16b</span><span class="p">,</span><span class="nv">v28.16b</span>
<span class="err">4</span><span class="nf">EBE1F9D</span>  <span class="nv">orr</span>         <span class="nv">v29.16b</span><span class="p">,</span><span class="nv">v28.16b</span><span class="p">,</span><span class="nv">v30.16b</span>
<span class="err">6</span><span class="nf">E308EFE</span>  <span class="nv">cmeq</span>        <span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v16.16b</span>
<span class="err">6</span><span class="nf">E318EFF</span>  <span class="nv">cmeq</span>        <span class="nv">v31.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v17.16b</span>
<span class="err">6</span><span class="nf">E328EE7</span>  <span class="nv">cmeq</span>        <span class="nv">v7.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v18.16b</span>
<span class="err">6</span><span class="nf">E338EF7</span>  <span class="nv">cmeq</span>        <span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v19.16b</span>
<span class="err">4</span><span class="nf">EBF1FDE</span>  <span class="nv">orr</span>         <span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v31.16b</span>
<span class="err">4</span><span class="nf">EBE1EFE</span>  <span class="nv">orr</span>         <span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v23.16b</span><span class="p">,</span><span class="nv">v30.16b</span>
<span class="err">4</span><span class="nf">EA71FDF</span>  <span class="nv">orr</span>         <span class="nv">v31.16b</span><span class="p">,</span><span class="nv">v30.16b</span><span class="p">,</span><span class="nv">v7.16b</span>
<span class="err">9</span><span class="nf">C0019E7</span>  <span class="nv">ldr</span>         <span class="nv">q7</span><span class="p">,</span><span class="mi">000000014007</span><span class="nv">AD30</span>
<span class="err">4</span><span class="nf">E271F39</span>  <span class="nv">and</span>         <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v7.16b</span>
<span class="err">4</span><span class="nf">E271F7B</span>  <span class="nv">and</span>         <span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v7.16b</span>
<span class="err">4</span><span class="nf">E3BBF39</span>  <span class="nv">addp</span>        <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v27.16b</span>
<span class="err">4</span><span class="nf">E271FBB</span>  <span class="nv">and</span>         <span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v29.16b</span><span class="p">,</span><span class="nv">v7.16b</span>
<span class="err">4</span><span class="nf">E271FFD</span>  <span class="nv">and</span>         <span class="nv">v29.16b</span><span class="p">,</span><span class="nv">v31.16b</span><span class="p">,</span><span class="nv">v7.16b</span>
<span class="err">4</span><span class="nf">E3DBF7B</span>  <span class="nv">addp</span>        <span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v27.16b</span><span class="p">,</span><span class="nv">v29.16b</span>
<span class="err">4</span><span class="nf">E3BBF39</span>  <span class="nv">addp</span>        <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v27.16b</span>
<span class="err">4</span><span class="nf">E39BF39</span>  <span class="nv">addp</span>        <span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v25.16b</span><span class="p">,</span><span class="nv">v25.16b</span>
<span class="err">4</span><span class="nf">E083F2D</span>  <span class="nv">mov</span>         <span class="nv">x13</span><span class="p">,</span><span class="nv">v25.d</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="nf">B4FFF8AD</span>  <span class="nv">cbz</span>         <span class="nv">x13</span><span class="p">,</span><span class="mi">000000014007</span><span class="nv">A930</span>
</pre></td></tr></tbody></table></code></pre></div></div>
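<p>To make the <code class="language-plaintext highlighter-rouge">MoveMask</code> reduction above easier to follow, below is a scalar C# simulation of it. It is an illustrative sketch only, not code from Sep. The helpers <code class="language-plaintext highlighter-rouge">And</code>, <code class="language-plaintext highlighter-rouge">AddPairwise</code> and <code class="language-plaintext highlighter-rouge">MoveMask64</code> are hypothetical stand-ins for the NEON intrinsics, operating on plain 16-byte arrays. Each comparison byte (assumed to be 0x00 or 0xFF, as <code class="language-plaintext highlighter-rouge">CompareEqual</code> produces) is masked with the repeating 0x01 to 0x80 pattern. Three levels of pairwise additions then fold the 64 bytes into 8, so bit i of the result corresponds to char i of the 64 chars.</p>

```csharp
// Scalar sketch (hypothetical helpers, not Sep's code) simulating the NEON
// bulk move mask: AND with a bit pattern, then fold via pairwise additions.
using System;
using System.Buffers.Binary;

// Usage: a match (0xFF) at byte 0 of the first vector only.
var none = new byte[16];
var first = new byte[16];
first[0] = 0xFF;
Console.WriteLine(MoveMask64(first, none, none, none)); // prints 1

// Mimics AdvSimd.And(p, bitmask) with the repeating 0x01, 0x02, ..., 0x80 pattern.
static byte[] And(byte[] v)
{
    var r = new byte[16];
    for (int i = 0; i < 16; i++) { r[i] = (byte)(v[i] & (1 << (i % 8))); }
    return r;
}

// Mimics AdvSimd.Arm64.AddPairwise(a, b) on 16-byte vectors: the lower 8
// result bytes are sums of adjacent pairs in a, the upper 8 the same for b.
// Sums never overflow since each byte holds distinct single bits.
static byte[] AddPairwise(byte[] a, byte[] b)
{
    var r = new byte[16];
    for (int i = 0; i < 8; i++)
    {
        r[i] = (byte)(a[2 * i] + a[2 * i + 1]);
        r[i + 8] = (byte)(b[2 * i] + b[2 * i + 1]);
    }
    return r;
}

// Folds four 16-byte comparison results (64 chars) into a 64-bit mask where
// bit i is set if byte i, counting across p0..p3, is 0xFF.
static ulong MoveMask64(byte[] p0, byte[] p1, byte[] p2, byte[] p3)
{
    var sum0 = AddPairwise(And(p0), And(p1)); // groups of 2 bytes
    var sum1 = AddPairwise(And(p2), And(p3));
    sum0 = AddPairwise(sum0, sum1);           // groups of 4 bytes
    sum0 = AddPairwise(sum0, sum0);           // groups of 8 bytes = one mask byte each
    // Lower 8 bytes read as a little-endian ulong, like AsUInt64().GetElement(0).
    return BinaryPrimitives.ReadUInt64LittleEndian(sum0);
}
```

<p>On ARM64 the real version does the same work with the handful of <code class="language-plaintext highlighter-rouge">and</code> and <code class="language-plaintext highlighter-rouge">addp</code> instructions seen in the assembly above, instead of loops.</p>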

<h2 id="links">Links</h2>

<p>Interesting links related to ARM SIMD. The first one explains the approach
adopted by Sep for an efficient bulk move mask on ARM NEON. The others can serve
as inspiration for possible future improvements to Sep.</p>

<ul>
  <li><a href="https://branchfree.org/2019/04/01/fitting-my-head-through-the-arm-holes-or-two-sequences-to-substitute-for-the-missing-pmovmskb-instruction-on-arm-neon/">Fitting My Head Through The ARM Holes or: Two Sequences to Substitute for the
 Missing PMOVMSKB Instruction on ARM
 NEON</a>
 by Geoff Langdale.</li>
  <li><a href="https://lemire.me/blog/2017/07/10/pruning-spaces-faster-on-arm-processors-with-vector-table-lookups/">Pruning Spaces Faster on ARM Processors with Vector Table
 Lookups</a>
 by Daniel Lemire.</li>
  <li><a href="https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon">Bit twiddling with Arm Neon: beating SSE movemasks, counting bits and more</a>
 by Danila Kutenin.</li>
  <li><a href="https://developer.arm.com/architectures/instruction-sets/intrinsics">ARM Intrinsics</a></li>
</ul>

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sep 0.11.0 was released June 12th, 2025 with a new parser optimized specifically for ARM NEON SIMD (called AdvSimd in .NET) capable ARM64 CPUs like the Apple M1 or the new Microsoft cloud Cobalt 100, which is based on the ARM Neoverse N2.]]></summary></entry><entry><title type="html">How to Get Windows 8.3 Short File Names Using FindFirstFileW (UNC) and GetShortPathName (local) in C#</title><link href="https://nietras.com/2025/05/26/windows-short-path-names/" rel="alternate" type="text/html" title="How to Get Windows 8.3 Short File Names Using FindFirstFileW (UNC) and GetShortPathName (local) in C#" /><published>2025-05-26T00:00:00+00:00</published><updated>2025-05-26T00:00:00+00:00</updated><id>https://nietras.com/2025/05/26/windows-short-path-names</id><content type="html" xml:base="https://nietras.com/2025/05/26/windows-short-path-names/"><![CDATA[<p>Working with files and directories on Windows, especially on network shares,
often leads to issues with long paths. This post explains what Windows 8.3 short
paths are, how they work in NTFS, the difference between local and UNC paths,
why browsers and some tools fail with long network paths, and how to
programmatically obtain short file names for both local and UNC files.</p>

<p>First, a bit of color on why we need this for our machine learning workflow.</p>

<h2 id="use-case-and-problem">Use Case and Problem</h2>

<p>At work we train all our machine learning image models in-house on custom-built
servers featuring GPUs like the NVIDIA RTX 4090. We have a lot of data, collected
from production sites all over the world, and this data is usually split into
two parts.</p>

<ol>
  <li><strong>Ground truth annotations</strong> are typically defined in <code class="language-plaintext highlighter-rouge">csv</code>-files that are stored
in git repositories and published as versioned NuGet packages. These packages
are then consumed by different pipelines that run as Azure Pipelines on
the servers. We use <a href="/2024/07/09/renovate-azure-devops/">Renovate to automatically bump versions</a> of these packages.</li>
  <li><strong>Images</strong> are stored on a file server on a network share, and typically also
cached at specific resolutions locally on the servers to speed up training.</li>
</ol>

<p>At the same time, the directory path and file name of each image follow a
schema, so the path alone contains information relevant to a given image. This
means paths can get quite long, easily exceeding 260 characters on the file
server.</p>

<p>Example path schema for directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>\\fileserver\Pipelines\&lt;PROJECTNAME&gt;\&lt;SETDATETIME&gt;_&lt;SETNAME&gt;_&lt;SITE&gt;\&lt;STATION&gt;\
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Example file name schema:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>&lt;DATETIME&gt;_Camera=&lt;CAMERANAME&gt;_Id=&lt;ID&gt;_&lt;DETAILS&gt;.png
</pre></td></tr></tbody></table></code></pre></div></div>
<p>For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png
</pre></td></tr></tbody></table></code></pre></div></div>
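<p>As a hedged illustration of how such paths are composed, the sketch below builds the directory and file name from schema parts (e.g. CSV columns). The class name <code>ImagePathSketch</code>, its method names and any placeholder values are hypothetical; only the schema itself comes from above. Note that <code>MAX_PATH</code> (260) includes the terminating null character, so a path of 260 or more characters does not fit.</p>

```csharp
using System;

// Hypothetical helper composing image paths from schema parts (e.g. CSV columns).
public static class ImagePathSketch
{
    // MAX_PATH is 260 characters including the terminating null character.
    public const int MaxPath = 260;

    // \\<server>\Pipelines\<PROJECTNAME>\<SETDATETIME>_<SETNAME>_<SITE>\<STATION>\
    public static string DirectoryPath(string server, string project,
        string setDateTime, string setName, string site, string station) =>
        $@"\\{server}\Pipelines\{project}\{setDateTime}_{setName}_{site}\{station}\";

    // <DATETIME>_Camera=<CAMERANAME>_Id=<ID>_<DETAILS>.png
    public static string FileName(string dateTime, string camera, long id,
        string details) =>
        $"{dateTime}_Camera={camera}_Id={id}_{details}.png";

    // True if the path plus its terminating null does not fit in MAX_PATH.
    public static bool ExceedsMaxPath(string path) => path.Length >= MaxPath;
}
```

<p>For example, <code>FileName("20250102.123456.789", "Primary", 9999999, "x=0123_y=0234_w=1234_h=2345")</code> reproduces the example file name above, which on its own is already 77 characters.</p>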

<p>This is a simple, pragmatic setup that we have iterated on over many years,
and it runs very smoothly. Each pipeline is a separate git repo that is easy to
get started with: simply clone it and hit F5 in Visual Studio or similar to run
it locally.</p>

<p>As part of the output of the pipelines, we generate a lot of data, either as
simple <code class="language-plaintext highlighter-rouge">csv</code>-files, plots as <code class="language-plaintext highlighter-rouge">png</code>-files or interactive reports in the form of
<code class="language-plaintext highlighter-rouge">html</code>-files with inline JavaScript and data.</p>

<p>Below is an example showing box plots, including all samples as dots, for a
regression model with different “levels”. The box plots are interactive, and when
pointing at a single dot (e.g. a “minimum” outlier) the image for that sample
is usually shown on the right. Not here, though, since the path is too long
(&gt; 260 chars) and the browser won’t show it.</p>

<p><img src="/images/2025-05-windows-short-path-names/example-boxplot-outlier-image.png" alt="Box plot and outlier selected but image not shown" /></p>

<p>This is where issues arise, since browsers do not support showing files with
long paths, e.g. see <a href="https://issues.chromium.org/issues/40134281">chromium: Long file path handling not working on
Windows</a>. Nor do browsers
support the extended-length prefix <code class="language-plaintext highlighter-rouge">\\?\</code> that one can otherwise use to escape the
<code class="language-plaintext highlighter-rouge">MAX_PATH</code> 260 char limit in Win32 APIs.</p>

<p>Now, if you hold SHIFT while right-clicking a file in Windows Explorer and click <strong>Copy as
path</strong>, you will get the 8.3 short file name path if the file path is longer than
<code class="language-plaintext highlighter-rouge">MAX_PATH</code>, as shown below for both a local and a UNC file path.</p>

<p>Copy as path for a local file resulting in
<code class="language-plaintext highlighter-rouge">C:\Temp\LONGFI~1\VERYLO~1\202501~1.PNG</code>:</p>

<p><img src="/images/2025-05-windows-short-path-names/verylongpath-local-copy-as-path.png" alt="" /></p>

<p>Copy as path for a network share file resulting in
<code class="language-plaintext highlighter-rouge">\\files\Pipelines\LXF59T~G\V1WO8N~7\290O13~C.PNG</code>: 
<img src="/images/2025-05-windows-short-path-names/verylongpath-unc-copy-as-path.png" alt="" /></p>

<p>Similarly, if you right-click, select <strong>Open with…</strong> and pick a browser
like Chrome, the short path name is used for long file paths.</p>

<p>Hence, to fix the interactive box plots above, we need to get the short path name
in C#, as this is what we use to generate the <code class="language-plaintext highlighter-rouge">html</code>-files. This has the added
benefit of reducing the amount of data we have to store in the <code class="language-plaintext highlighter-rouge">html</code>-file, since
the path name is shorter (we already do some custom path compression on this
using a simple common prefix algorithm). Our data sets can have hundreds of
thousands of images, so every byte counts.</p>
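<p>The common prefix algorithm mentioned above is not shown in the post. As a minimal sketch of one well-known variant, front coding, each path in a sorted list can be stored as the number of leading characters it shares with the previous path plus the remaining suffix. This is an assumption for illustration, not the algorithm used internally, and <code>PrefixCompression</code>, <code>Encode</code> and <code>Decode</code> are hypothetical names.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical front-coding sketch: each path is stored as the number of
// leading characters shared with the previous path plus the remaining suffix.
public static class PrefixCompression
{
    public static List<(int Shared, string Suffix)> Encode(IReadOnlyList<string> paths)
    {
        var encoded = new List<(int Shared, string Suffix)>(paths.Count);
        var previous = "";
        foreach (var path in paths)
        {
            int max = Math.Min(previous.Length, path.Length);
            int shared = 0;
            while (shared < max && previous[shared] == path[shared]) { shared++; }
            encoded.Add((shared, path[shared..]));
            previous = path;
        }
        return encoded;
    }

    public static List<string> Decode(IReadOnlyList<(int Shared, string Suffix)> encoded)
    {
        var paths = new List<string>(encoded.Count);
        var previous = "";
        foreach (var (shared, suffix) in encoded)
        {
            // Reuse the shared prefix of the previously decoded path
            previous = previous[..shared] + suffix;
            paths.Add(previous);
        }
        return paths;
    }
}
```

<p>Sorting the paths first tends to maximize the shared prefixes, and decoding must process the entries in the same order they were encoded in.</p>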

<p>Incidentally, it would have been nice if browsers supported loading e.g.
gzip’ed html-files directly, but as far as I know they do not, and spinning
up a web server just for this seemed overkill.</p>

<p>So let’s take a quick look at Windows 8.3 short paths and some example code on
how to get them in C# for both local and UNC paths.</p>

<h2 id="what-are-windows-83-short-paths">What Are Windows 8.3 Short Paths?</h2>

<p>Windows 8.3 short paths (also called “short file names” or SFN) are a legacy
feature from MS-DOS and early Windows versions. They provide a way to represent
long file and directory names (introduced with Windows 95 and NTFS) in a format
compatible with older software that only supports names of up to 8 characters
plus an extension of up to 3 characters (e.g., <code class="language-plaintext highlighter-rouge">MYDOCU~1.TXT</code>). See <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#short-vs-long-names">Naming Files, Paths, and
Namespaces - Short vs. Long
Names</a>
or <a href="https://en.wikipedia.org/wiki/8.3_filename">wikipedia 8.3 filename</a> for
more. <strong>Example:</strong></p>

<ul>
  <li>Long name: <code class="language-plaintext highlighter-rouge">C:\Program Files\My Application\readme.txt</code></li>
  <li>Short name: <code class="language-plaintext highlighter-rouge">C:\PROGRA~1\MYAPPL~1\README.TXT</code></li>
</ul>
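<p>To make the 8.3 format concrete, here is a simplified sketch that checks whether a single name already fits it: a base name of 1 to 8 characters, at most one dot and an extension of up to 3 characters. The allowed character set (ASCII letters and digits plus <code>! # $ % &amp; ' ( ) - @ ^ _ ` { } ~</code>) is a simplifying assumption that ignores case and non-ASCII characters, and <code>ShortNameCheck</code> is a hypothetical name.</p>

```csharp
using System;
using System.Linq;

// Hypothetical, simplified 8.3 name check (see assumptions in the text).
public static class ShortNameCheck
{
    // Simplified set of punctuation allowed in 8.3 names (assumption).
    const string AllowedPunctuation = "!#$%&'()-@^_`{}~";

    static bool IsAllowed(char c) =>
        c is >= 'A' and <= 'Z' or >= 'a' and <= 'z' or >= '0' and <= '9'
        || AllowedPunctuation.Contains(c);

    // True if name is <base>[.<ext>] with a 1-8 char base and 0-3 char extension.
    public static bool IsValid83(string name)
    {
        int dot = name.IndexOf('.');
        if (dot != name.LastIndexOf('.')) { return false; } // more than one dot
        var baseName = dot < 0 ? name : name[..dot];
        var extension = dot < 0 ? "" : name[(dot + 1)..];
        if (baseName.Length is < 1 or > 8 || extension.Length > 3) { return false; }
        return baseName.All(IsAllowed) && extension.All(IsAllowed);
    }
}
```

<p>For the example above, <code>README.TXT</code> passes as is, while <code>Program Files</code> and <code>My Application</code> fail (too long and containing a space), which is why they need generated short names.</p>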

<h2 id="how-are-83-short-paths-stored-in-ntfs">How Are 8.3 Short Paths Stored in NTFS?</h2>

<p><a href="https://en.wikipedia.org/wiki/NTFS#Metadata">NTFS</a> stores both the long and
short (8.3) names for each file and directory, if 8.3 name generation is
enabled. The short name is generated when a file is created, following specific
rules to ensure uniqueness.</p>

<ul>
  <li><strong>Short names are stored as metadata</strong> in the NTFS file system.</li>
  <li>You can disable 8.3 name creation for performance reasons, but this may break
compatibility with legacy applications, and files created afterwards will then
have no short name.</li>
</ul>
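<p>The exact NTFS generation rules are more involved than can be shown here, but the examples above (<code>PROGRA~1</code>, <code>MYAPPL~1</code>, <code>202501~1.PNG</code>) can be reproduced with a deliberately simplified sketch: uppercase the name, drop spaces and dots, keep the first six characters of the base name, append <code>~N</code> and keep up to three characters of the last extension. This is an approximation under stated assumptions, not the actual NTFS algorithm, which for example switches to a hash-based name after a few collisions (see the Wikipedia article linked above). <code>ShortNameSketch</code> and its <code>collisionIndex</code> parameter are hypothetical.</p>

```csharp
using System;
using System.Text;

// Hypothetical, simplified approximation of 8.3 short name generation.
public static class ShortNameSketch
{
    // collisionIndex is the N in ~N, chosen by the caller to keep names unique.
    public static string Generate(string longName, int collisionIndex = 1)
    {
        int lastDot = longName.LastIndexOf('.');
        var baseName = lastDot < 0 ? longName : longName[..lastDot];
        var extension = lastDot < 0 ? "" : longName[(lastDot + 1)..];

        var cleanBase = Clean(baseName);
        var cleanExtension = Clean(extension);
        var prefix = cleanBase.Length > 6 ? cleanBase[..6] : cleanBase;
        var shortName = $"{prefix}~{collisionIndex}";
        if (cleanExtension.Length > 0)
        {
            shortName += "." + (cleanExtension.Length > 3 ? cleanExtension[..3] : cleanExtension);
        }
        return shortName;
    }

    // Uppercases, drops spaces and dots, maps "+,;=[]" to '_' and drops
    // anything else that is not an ASCII letter, digit or allowed symbol.
    static string Clean(string part)
    {
        var sb = new StringBuilder(part.Length);
        foreach (var c in part.ToUpperInvariant())
        {
            if (c == ' ' || c == '.') { continue; }
            if ("+,;=[]".IndexOf(c) >= 0) { sb.Append('_'); }
            else if (c is >= 'A' and <= 'Z' or >= '0' and <= '9'
                     || "!#$%&'()-@^_`{}~".IndexOf(c) >= 0) { sb.Append(c); }
        }
        return sb.ToString();
    }
}
```

<p>The sketch does not check whether a name already fits 8.3, in which case typically no separate short name is generated at all.</p>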

<h2 id="local-paths-vs-unc-paths">Local Paths vs. UNC Paths</h2>

<ul>
  <li><strong>Local paths</strong> refer to files on a local drive, e.g., <code class="language-plaintext highlighter-rouge">C:\folder\file.txt</code>.</li>
  <li><strong>UNC (Universal Naming Convention) paths</strong> refer to files on a network share, e.g., <code class="language-plaintext highlighter-rouge">\\server\share\folder\file.txt</code>.</li>
</ul>

<p><strong>Key differences:</strong></p>
<ul>
  <li>Local paths are handled directly by the local file system.</li>
  <li><a href="https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#unc-paths">UNC Paths</a> 
are resolved over the network, and some Windows APIs behave differently or 
have limitations with UNC paths.</li>
</ul>
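<p>Since the distinction decides which API to call, a simple string-based classification can help. The sketch below is an illustration under assumptions (the enum and method names are hypothetical), and it also shows a pitfall: the extended-length prefix <code>\\?\</code> starts with <code>\\</code> too, so it has to be checked before treating a path as UNC.</p>

```csharp
using System;

// Hypothetical path kinds for illustration.
public enum PathKind { Other, Local, Unc, ExtendedLocal, ExtendedUnc, Device }

public static class PathClassifier
{
    // Order matters: the more specific prefixes all start with \\ as well.
    public static PathKind Classify(string path)
    {
        if (path.StartsWith(@"\\?\UNC\", StringComparison.Ordinal)) { return PathKind.ExtendedUnc; }
        if (path.StartsWith(@"\\?\", StringComparison.Ordinal)) { return PathKind.ExtendedLocal; }
        if (path.StartsWith(@"\\.\", StringComparison.Ordinal)) { return PathKind.Device; }
        if (path.StartsWith(@"\\", StringComparison.Ordinal)) { return PathKind.Unc; }
        // Drive-rooted local path such as C:\...
        bool hasDriveRoot = path.Length >= 3 && path[1] == ':' && path[2] == '\\'
            && char.ToUpperInvariant(path[0]) is >= 'A' and <= 'Z';
        return hasDriveRoot ? PathKind.Local : PathKind.Other;
    }
}
```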

<h2 id="why-browsers-and-some-tools-fail-with-long-network-paths">Why Browsers and Some Tools Fail with Long Network Paths</h2>

<p>Windows has a <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation">traditional <code class="language-plaintext highlighter-rouge">MAX_PATH</code> limit of 260
characters</a>
for file paths. While modern Windows versions and .NET can support longer paths
(with configuration), many tools - including browsers - do not support long UNC
paths due to:</p>

<ul>
  <li>Lack of support for the <code class="language-plaintext highlighter-rouge">\\?\</code> extended-length path prefix.</li>
  <li>Network shares (UNC) are not always handled with the same APIs as local paths.</li>
  <li>Browsers use standard Windows APIs that may not support long paths or UNC paths.</li>
</ul>
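<p>For APIs that do accept it, the extended-length prefix comes in two forms: <code>\\?\C:\...</code> for local paths and <code>\\?\UNC\server\share\...</code> for UNC paths. A minimal sketch of adding and removing it is shown below, assuming the input is already a full drive or UNC path with backslashes, since the prefix turns off the usual path normalization. The class and method names are hypothetical.</p>

```csharp
using System;

// Hypothetical helpers for the \\?\ extended-length prefix.
public static class ExtendedLengthPath
{
    const string Prefix = @"\\?\";
    const string UncPrefix = @"\\?\UNC\";

    // C:\dir\file -> \\?\C:\dir\file, \\server\share\file -> \\?\UNC\server\share\file
    public static string Add(string fullPath)
    {
        if (fullPath.StartsWith(Prefix, StringComparison.Ordinal)) { return fullPath; }
        if (fullPath.StartsWith(@"\\", StringComparison.Ordinal))
        {
            return UncPrefix + fullPath[2..]; // drop the leading \\ of the UNC path
        }
        return Prefix + fullPath;
    }

    // Inverse of Add, e.g. for tools that do not accept the prefix.
    public static string Remove(string path)
    {
        if (path.StartsWith(UncPrefix, StringComparison.Ordinal)) { return @"\\" + path[UncPrefix.Length..]; }
        if (path.StartsWith(Prefix, StringComparison.Ordinal)) { return path[Prefix.Length..]; }
        return path;
    }
}
```

<p>Browsers accept neither form, which is why the short path is the more practical way around the limit there.</p>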

<h2 id="getting-the-83-short-path-programmatically">Getting the 8.3 Short Path Programmatically</h2>

<h3 id="for-local-paths">For Local Paths</h3>

<p>Use the Windows API function <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getshortpathnamew">GetShortPathName</a>:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="p">[</span><span class="nf">DllImport</span><span class="p">(</span><span class="s">"kernel32.dll"</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">SetLastError</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
<span class="k">static</span> <span class="k">extern</span> <span class="kt">uint</span> <span class="nf">GetShortPathName</span><span class="p">(</span><span class="kt">string</span> <span class="n">lpszLongPath</span><span class="p">,</span> <span class="n">StringBuilder</span> <span class="n">lpszShortPath</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">cchBuffer</span><span class="p">);</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<ul>
  <li>Works for local file system paths (e.g., <code class="language-plaintext highlighter-rouge">C:\...</code>).</li>
  <li>Does <strong>not</strong> work for UNC paths.</li>
</ul>
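<p>A minimal usage sketch of the declaration above follows. It relies on the documented return value convention of <code>GetShortPathName</code>: on success it returns the length of the short path excluding the terminating null, and if the buffer is too small it returns the required size including the null, so the call can be retried with a larger buffer. The class name and the non-Windows fallback are assumptions for illustration only. If a path has no short names on disk, for example because 8.3 name creation is disabled, the long path may simply be returned.</p>

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper around GetShortPathName with buffer growth.
public static class ShortPathSketch
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern uint GetShortPathName(string lpszLongPath,
        StringBuilder lpszShortPath, uint cchBuffer);

    public static string Get(string longPath)
    {
        // Assumption: off Windows there are no 8.3 names, so return the input as-is.
        if (!OperatingSystem.IsWindows()) { return longPath; }

        var sb = new StringBuilder(260);
        while (true)
        {
            uint size = GetShortPathName(longPath, sb, (uint)sb.Capacity);
            if (size == 0) { throw new Win32Exception(Marshal.GetLastWin32Error()); }
            // Success: size is the length of the short path, excluding the null.
            if (size < sb.Capacity) { return sb.ToString(); }
            // Buffer too small: size is the required length including the null.
            sb.Capacity = (int)size;
        }
    }
}
```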

<h3 id="for-unc-paths-network-shares">For UNC Paths (Network Shares)</h3>

<p>Use
<a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew">FindFirstFileW</a>
to retrieve the 8.3 name for each segment of the path:</p>

<ul>
  <li>Iterate over each directory/file segment in the UNC path.</li>
  <li>For each, call <code class="language-plaintext highlighter-rouge">FindFirstFileW</code> and use the <code class="language-plaintext highlighter-rouge">cAlternateFileName</code> 
field from the returned <code class="language-plaintext highlighter-rouge">WIN32_FIND_DATA</code> structure.</li>
  <li>Rebuild the path using the 8.3 names.</li>
</ul>
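<p>The steps above can be sketched independently of the Win32 calls by passing in the per-segment lookup as a delegate. In real code the delegate would call <code>FindFirstFileW</code> and return <code>cAlternateFileName</code>; here it is a hypothetical stand-in so the rebuilding logic can be shown and tested on its own. Because each query is built from the already shortened prefix plus a single long segment, the queried paths stay comparatively short.</p>

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical sketch of rebuilding a UNC path segment by segment.
public static class UncShortPathSketch
{
    // Rebuilds \\server\share\seg1\...\segN, replacing each segment after the
    // share with the alternate (8.3) name from lookup, if it returns one.
    public static string Rebuild(string uncPath, Func<string, string?> lookup)
    {
        var parts = uncPath.TrimStart('\\').Split('\\');
        if (parts.Length < 2)
        {
            throw new ArgumentException($"Invalid UNC path '{uncPath}'", nameof(uncPath));
        }
        // \\server\share is kept as-is
        var sb = new StringBuilder(@"\\").Append(parts[0]).Append('\\').Append(parts[1]);
        for (int i = 2; i < parts.Length; i++)
        {
            // Query = already shortened prefix + current long segment
            var alternate = lookup($@"{sb}\{parts[i]}");
            sb.Append('\\').Append(string.IsNullOrEmpty(alternate) ? parts[i] : alternate);
        }
        return sb.ToString();
    }
}
```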

<p>Code for this including <code class="language-plaintext highlighter-rouge">struct</code> definitions is shown below.</p>

<h3 id="summary-table">Summary Table</h3>

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>API to Use</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Local</td>
      <td><code class="language-plaintext highlighter-rouge">GetShortPathName</code></td>
      <td>Direct, simple</td>
    </tr>
    <tr>
      <td>Network</td>
      <td><code class="language-plaintext highlighter-rouge">FindFirstFileW</code></td>
      <td>Iterate segments</td>
    </tr>
  </tbody>
</table>

<h2 id="win32shortpath-class">Win32ShortPath Class</h2>

<p>The <code class="language-plaintext highlighter-rouge">Win32ShortPath</code> class shown below is factored into the following main
methods.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">GetShortPath</code> - Entry point. Normalizes the path, checks if it exists, 
and dispatches to the correct method:
    <ul>
      <li>For local paths, calls <code class="language-plaintext highlighter-rouge">GetByGetShortPathName</code>.</li>
      <li>For UNC paths, calls <code class="language-plaintext highlighter-rouge">GetByFindFirstFile</code>.</li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">GetByGetShortPathName</code> - Uses the <code class="language-plaintext highlighter-rouge">GetShortPathName</code> Win32 API to get 
the short path for local files and directories. <code class="language-plaintext highlighter-rouge">GetShortPathName</code> requires
long paths to be “escaped” with the extended length prefix <code class="language-plaintext highlighter-rouge">\\?\</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">GetByFindFirstFile</code> - For UNC paths, splits the path into segments and uses 
<code class="language-plaintext highlighter-rouge">FindFirstFileW</code> to retrieve the 8.3 short name for each segment, rebuilding 
the full short path.
    <ul>
      <li><code class="language-plaintext highlighter-rouge">AppendPartsByFindFirstFile</code> - Iterates through each path segment, using 
<code class="language-plaintext highlighter-rouge">FindFirstFileW</code> to get the short name (cAlternateFileName) if available.</li>
    </ul>
  </li>
</ul>

<p>Note that it does not handle all edge cases including using <code class="language-plaintext highlighter-rouge">/</code> as directory
separator or similar. It is also not the version we use internally as here we
already have the path defined as parts (typically) matching CSV-columns.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
</pre></td><td class="rouge-code"><pre><span class="k">using</span> <span class="nn">System.ComponentModel</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Runtime.InteropServices</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Text</span><span class="p">;</span>

<span class="k">public</span> <span class="k">static</span> <span class="k">partial</span> <span class="k">class</span> <span class="nc">Win32ShortPath</span>
<span class="p">{</span>
    <span class="k">const</span> <span class="kt">int</span> <span class="n">MaxPathLength</span> <span class="p">=</span> <span class="m">260</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">int</span> <span class="n">UncPrefixPartCount</span> <span class="p">=</span> <span class="m">2</span><span class="p">;</span>
    <span class="k">static</span> <span class="k">readonly</span> <span class="kt">char</span> <span class="n">DirectorySeparator</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="n">DirectorySeparatorChar</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">string</span> <span class="n">UncPrefix</span> <span class="p">=</span> <span class="s">@"\\"</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">string</span> <span class="n">ExtendedLengthPrefix</span> <span class="p">=</span> <span class="s">@"\\?\"</span><span class="p">;</span>

    <span class="c1">/// &lt;summary&gt;</span>
    <span class="c1">/// Returns the short (8.3) path for any file or directory if </span>
    <span class="c1">/// available, covering both local and UNC paths.</span>
    <span class="c1">/// &lt;/summary&gt;</span>
    <span class="k">public</span> <span class="k">static</span> <span class="kt">string</span> <span class="nf">GetShortPath</span><span class="p">(</span><span class="kt">string</span> <span class="n">longPath</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="kt">string</span><span class="p">.</span><span class="nf">IsNullOrWhiteSpace</span><span class="p">(</span><span class="n">longPath</span><span class="p">))</span>
        <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">ArgumentNullException</span><span class="p">(</span><span class="k">nameof</span><span class="p">(</span><span class="n">longPath</span><span class="p">));</span> <span class="p">}</span>

        <span class="c1">// Normalize</span>
        <span class="n">longPath</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="nf">GetFullPath</span><span class="p">(</span><span class="n">longPath</span><span class="p">);</span>

        <span class="k">if</span> <span class="p">(!</span><span class="n">File</span><span class="p">.</span><span class="nf">Exists</span><span class="p">(</span><span class="n">longPath</span><span class="p">)</span> <span class="p">&amp;&amp;</span> <span class="p">!</span><span class="n">Directory</span><span class="p">.</span><span class="nf">Exists</span><span class="p">(</span><span class="n">longPath</span><span class="p">))</span>
        <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">FileNotFoundException</span><span class="p">(</span><span class="s">"Path not found"</span><span class="p">,</span> <span class="n">longPath</span><span class="p">);</span> <span class="p">}</span>

        <span class="kt">var</span> <span class="n">isUnc</span> <span class="p">=</span> <span class="n">longPath</span><span class="p">.</span><span class="nf">StartsWith</span><span class="p">(</span><span class="n">UncPrefix</span><span class="p">,</span> <span class="n">StringComparison</span><span class="p">.</span><span class="n">Ordinal</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">isUnc</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">GetByFindFirstFile</span><span class="p">(</span><span class="n">longPath</span><span class="p">);</span> <span class="p">}</span>

        <span class="kt">var</span> <span class="n">isLong</span> <span class="p">=</span> <span class="n">longPath</span><span class="p">.</span><span class="n">Length</span> <span class="p">&gt;</span> <span class="n">MaxPathLength</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">isLong</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="c1">// Ensure starts with extended length prefix</span>
            <span class="kt">var</span> <span class="n">extendedPrefix</span> <span class="p">=</span> <span class="n">longPath</span><span class="p">.</span><span class="nf">StartsWith</span><span class="p">(</span><span class="n">ExtendedLengthPrefix</span><span class="p">,</span>
                <span class="n">StringComparison</span><span class="p">.</span><span class="n">Ordinal</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(!</span><span class="n">extendedPrefix</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="n">longPath</span> <span class="p">=</span> <span class="n">ExtendedLengthPrefix</span> <span class="p">+</span> <span class="n">longPath</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="kt">var</span> <span class="n">shortPathName</span> <span class="p">=</span> <span class="nf">GetByGetShortPathName</span><span class="p">(</span><span class="n">longPath</span><span class="p">);</span>
            <span class="c1">// Remove the extended length prefix if added</span>
            <span class="k">if</span> <span class="p">(!</span><span class="n">extendedPrefix</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="n">shortPathName</span> <span class="p">=</span> <span class="n">shortPathName</span><span class="p">[</span><span class="n">ExtendedLengthPrefix</span><span class="p">.</span><span class="n">Length</span><span class="p">..];</span>
            <span class="p">}</span>
            <span class="k">return</span> <span class="n">shortPathName</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">else</span>
        <span class="p">{</span>
            <span class="k">return</span> <span class="nf">GetByGetShortPathName</span><span class="p">(</span><span class="n">longPath</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Local drive (C:\…), use GetShortPathName directly, for long prefix with \\?\</span>
    <span class="k">static</span> <span class="kt">string</span> <span class="nf">GetByGetShortPathName</span><span class="p">(</span><span class="kt">string</span> <span class="n">longPath</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="kt">var</span> <span class="n">sb</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="p">(</span><span class="n">MaxPathLength</span><span class="p">);</span>
        <span class="kt">uint</span> <span class="n">size</span> <span class="p">=</span> <span class="nf">GetShortPathName</span><span class="p">(</span><span class="n">longPath</span><span class="p">,</span> <span class="n">sb</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint</span><span class="p">)</span><span class="n">sb</span><span class="p">.</span><span class="n">Capacity</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">size</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">Win32Exception</span><span class="p">(</span><span class="n">Marshal</span><span class="p">.</span><span class="nf">GetLastWin32Error</span><span class="p">());</span> <span class="p">}</span>
        <span class="k">return</span> <span class="n">sb</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="c1">// UNC path: \\server\share\…\file.ext, use FindFirstFile iteratively</span>
    <span class="k">static</span> <span class="kt">string</span> <span class="nf">GetByFindFirstFile</span><span class="p">(</span><span class="kt">string</span> <span class="n">longPath</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// Rebuild segment by segment, use FindFirstFile for the </span>
        <span class="c1">// 8.3 name of each.</span>
        <span class="kt">var</span> <span class="n">parts</span> <span class="p">=</span> <span class="n">longPath</span><span class="p">.</span><span class="nf">TrimStart</span><span class="p">(</span><span class="n">DirectorySeparator</span><span class="p">)</span>
                            <span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="n">DirectorySeparator</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">parts</span><span class="p">.</span><span class="n">Length</span> <span class="p">&lt;</span> <span class="n">UncPrefixPartCount</span><span class="p">)</span>
        <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">ArgumentException</span><span class="p">(</span><span class="s">$"Invalid UNC path '</span><span class="p">{</span><span class="n">longPath</span><span class="p">}</span><span class="s">'"</span><span class="p">,</span> <span class="k">nameof</span><span class="p">(</span><span class="n">longPath</span><span class="p">));</span> <span class="p">}</span>

        <span class="kt">var</span> <span class="n">sb</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="p">(</span><span class="n">MaxPathLength</span><span class="p">);</span>

        <span class="c1">// Re‑prefix with \\server\share</span>
        <span class="n">sb</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">UncPrefix</span><span class="p">)</span>
          <span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="m">0</span><span class="p">])</span>
          <span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">DirectorySeparator</span><span class="p">)</span>
          <span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="m">1</span><span class="p">]);</span>

        <span class="nf">AppendPartsByFindFirstFile</span><span class="p">(</span><span class="n">parts</span><span class="p">,</span> <span class="n">UncPrefixPartCount</span><span class="p">,</span> <span class="n">sb</span><span class="p">);</span>

        <span class="k">return</span> <span class="n">sb</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="k">static</span> <span class="k">void</span> <span class="nf">AppendPartsByFindFirstFile</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">parts</span><span class="p">,</span> <span class="kt">int</span> <span class="n">partStart</span><span class="p">,</span>
                                           <span class="n">StringBuilder</span> <span class="n">sb</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="n">partStart</span><span class="p">;</span> <span class="n">i</span> <span class="p">&lt;</span> <span class="n">parts</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
        <span class="p">{</span>
            <span class="kt">var</span> <span class="n">currentPart</span> <span class="p">=</span> <span class="n">parts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="kt">var</span> <span class="n">pathToQuery</span> <span class="p">=</span> <span class="s">$"</span><span class="p">{</span><span class="n">sb</span><span class="p">}{</span><span class="n">DirectorySeparator</span><span class="p">}{</span><span class="n">currentPart</span><span class="p">}</span><span class="s">"</span><span class="p">;</span>
            <span class="kt">var</span> <span class="n">findHandle</span> <span class="p">=</span> <span class="nf">FindFirstFileW</span><span class="p">(</span><span class="n">pathToQuery</span><span class="p">,</span> <span class="k">out</span> <span class="kt">var</span> <span class="n">findData</span><span class="p">);</span>
            <span class="n">sb</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">DirectorySeparator</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">findHandle</span> <span class="p">!=</span> <span class="n">IntPtr</span><span class="p">.</span><span class="n">Zero</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="nf">FindClose</span><span class="p">(</span><span class="n">findHandle</span><span class="p">);</span>
                <span class="c1">// If there's an alternate (8.3) name, use it</span>
                <span class="kt">var</span> <span class="n">alternatePart</span> <span class="p">=</span> <span class="n">findData</span><span class="p">.</span><span class="n">cAlternateFileName</span><span class="p">;</span>
                <span class="n">currentPart</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="nf">IsNullOrEmpty</span><span class="p">(</span><span class="n">alternatePart</span><span class="p">)</span>
                    <span class="p">?</span> <span class="n">currentPart</span> <span class="p">:</span> <span class="n">alternatePart</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="n">sb</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">currentPart</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="p">[</span><span class="nf">DllImport</span><span class="p">(</span><span class="s">"kernel32.dll"</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">SetLastError</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
    <span class="k">static</span> <span class="k">extern</span> <span class="kt">uint</span> <span class="nf">GetShortPathName</span><span class="p">(</span><span class="kt">string</span> <span class="n">lpszLongPath</span><span class="p">,</span>
        <span class="n">StringBuilder</span> <span class="n">lpszShortPath</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">cchBuffer</span><span class="p">);</span>

    <span class="p">[</span><span class="nf">StructLayout</span><span class="p">(</span><span class="n">LayoutKind</span><span class="p">.</span><span class="n">Sequential</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">Pack</span> <span class="p">=</span> <span class="m">1</span><span class="p">)]</span>
    <span class="k">struct</span> <span class="nc">WIN32_FIND_DATA</span>
    <span class="p">{</span>
        <span class="k">const</span> <span class="kt">int</span> <span class="n">AlternateLength</span> <span class="p">=</span> <span class="m">14</span><span class="p">;</span>

        <span class="k">public</span> <span class="n">FileAttributes</span> <span class="n">dwFileAttributes</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">long</span> <span class="n">ftCreationTime</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">long</span> <span class="n">ftLastAccessTime</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">long</span> <span class="n">ftLastWriteTime</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">uint</span> <span class="n">nFileSizeHigh</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">uint</span> <span class="n">nFileSizeLow</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">uint</span> <span class="n">dwReserved0</span><span class="p">;</span>
        <span class="k">public</span> <span class="kt">uint</span> <span class="n">dwReserved1</span><span class="p">;</span>
        <span class="p">[</span><span class="nf">MarshalAs</span><span class="p">(</span><span class="n">UnmanagedType</span><span class="p">.</span><span class="n">ByValTStr</span><span class="p">,</span> <span class="n">SizeConst</span> <span class="p">=</span> <span class="n">MaxPathLength</span><span class="p">)]</span>
        <span class="k">public</span> <span class="kt">string</span> <span class="n">cFileName</span><span class="p">;</span>
        <span class="p">[</span><span class="nf">MarshalAs</span><span class="p">(</span><span class="n">UnmanagedType</span><span class="p">.</span><span class="n">ByValTStr</span><span class="p">,</span> <span class="n">SizeConst</span> <span class="p">=</span> <span class="n">AlternateLength</span><span class="p">)]</span>
        <span class="k">public</span> <span class="kt">string</span> <span class="n">cAlternateFileName</span><span class="p">;</span> <span class="c1">// The 8.3 name</span>
    <span class="p">}</span>

    <span class="p">[</span><span class="nf">DllImport</span><span class="p">(</span><span class="s">"kernel32.dll"</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">SetLastError</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
    <span class="k">static</span> <span class="k">extern</span> <span class="n">IntPtr</span> <span class="nf">FindFirstFileW</span><span class="p">(</span><span class="kt">string</span> <span class="n">lpFileName</span><span class="p">,</span>
        <span class="k">out</span> <span class="n">WIN32_FIND_DATA</span> <span class="n">lpFindFileData</span><span class="p">);</span>

    <span class="p">[</span><span class="nf">DllImport</span><span class="p">(</span><span class="s">"kernel32.dll"</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">SetLastError</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
    <span class="k">static</span> <span class="k">extern</span> <span class="n">IntPtr</span> <span class="nf">FindFirstFileW</span><span class="p">(</span><span class="n">StringBuilder</span> <span class="n">lpFileName</span><span class="p">,</span>
        <span class="k">out</span> <span class="n">WIN32_FIND_DATA</span> <span class="n">lpFindFileData</span><span class="p">);</span>

    <span class="p">[</span><span class="nf">DllImport</span><span class="p">(</span><span class="s">"kernel32.dll"</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">SetLastError</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
    <span class="k">static</span> <span class="k">extern</span> <span class="kt">bool</span> <span class="nf">FindClose</span><span class="p">(</span><span class="n">IntPtr</span> <span class="n">hFindFile</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="example-program">Example Program</h2>

<p>Below is a simple example program using <code class="language-plaintext highlighter-rouge">Win32ShortPath.GetShortPath</code> with some
(truncated) example paths that I created for testing:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="k">using</span> <span class="nn">System.Diagnostics</span><span class="p">;</span>
<span class="n">Action</span><span class="p">&lt;</span><span class="kt">string</span><span class="p">&gt;</span> <span class="n">log</span> <span class="p">=</span> <span class="n">t</span> <span class="p">=&gt;</span> <span class="p">{</span> <span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">t</span><span class="p">);</span> <span class="n">Trace</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">t</span><span class="p">);</span> <span class="p">};</span>

<span class="n">ReadOnlySpan</span><span class="p">&lt;</span><span class="n">Test</span><span class="p">&gt;</span> <span class="n">tests</span> <span class="p">=</span>
<span class="p">[</span>
    <span class="c1">// Unc path long</span>
    <span class="k">new</span><span class="p">(</span><span class="s">@"\\files\Pipelines\LongFilePathTest\VeryLong...\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png"</span><span class="p">,</span>
        <span class="s">@"\\files\Pipelines\LXF59T~G\V1WO8N~7\290O13~C.PNG"</span><span class="p">),</span>
    <span class="c1">// Local path long</span>
    <span class="k">new</span><span class="p">(</span><span class="s">@"C:\Temp\LongFilePathTest\VeryLong...\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png"</span><span class="p">,</span>
        <span class="s">@"C:\Temp\LONGFI~1\VERYLO~1\202501~1.PNG"</span><span class="p">),</span>
    <span class="c1">// Local path long with extended prefix</span>
    <span class="k">new</span><span class="p">(</span><span class="s">@"\\?\C:\Temp\LongFilePathTest\VeryLong...\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png"</span><span class="p">,</span>
        <span class="s">@"\\?\C:\Temp\LONGFI~1\VERYLO~1\202501~1.PNG"</span><span class="p">),</span>
    <span class="c1">// Local path not long</span>
    <span class="k">new</span><span class="p">(</span><span class="s">@"C:\Temp\LongFilePathTest\NotLong\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png"</span><span class="p">,</span>
        <span class="s">@"C:\Temp\LONGFI~1\NotLong\202501~1.PNG"</span><span class="p">),</span>
<span class="p">];</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">t</span> <span class="k">in</span> <span class="n">tests</span><span class="p">)</span>
<span class="p">{</span>
    <span class="nf">log</span><span class="p">(</span><span class="s">$"Long Path: '</span><span class="p">{</span><span class="n">t</span><span class="p">.</span><span class="n">Path</span><span class="p">}</span><span class="s">'"</span><span class="p">);</span>
    <span class="nf">log</span><span class="p">(</span><span class="s">$"File Exists: </span><span class="p">{</span><span class="n">File</span><span class="p">.</span><span class="nf">Exists</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">Path</span><span class="p">)}</span><span class="s">"</span><span class="p">);</span>
    <span class="kt">var</span> <span class="n">shortPath</span> <span class="p">=</span> <span class="n">Win32ShortPath</span><span class="p">.</span><span class="nf">GetShortPath</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">Path</span><span class="p">);</span>
    <span class="nf">log</span><span class="p">(</span><span class="s">$"Short Path: '</span><span class="p">{</span><span class="n">shortPath</span><span class="p">}</span><span class="s">'"</span><span class="p">);</span>
    <span class="nf">log</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">ShortPathExpected</span> <span class="p">==</span> <span class="n">shortPath</span> <span class="p">?</span> <span class="s">"PASSED"</span> <span class="p">:</span> <span class="s">"FAILED"</span><span class="p">);</span>
    <span class="nf">log</span><span class="p">(</span><span class="s">""</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">public</span> <span class="n">record</span> <span class="nf">Test</span><span class="p">(</span><span class="kt">string</span> <span class="n">Path</span><span class="p">,</span> <span class="kt">string</span> <span class="n">ShortPathExpected</span><span class="p">);</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>When run, this will output the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre>Long Path: '\\files\Pipelines\LongFilePathTest\VeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVery\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png'
File Exists: True
Short Path: '\\files\Pipelines\LXF59T~G\V1WO8N~7\290O13~C.PNG'
PASSED

Long Path: 'C:\Temp\LongFilePathTest\VeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVery\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png'
File Exists: True
Short Path: 'C:\Temp\LONGFI~1\VERYLO~1\202501~1.PNG'
PASSED

Long Path: '\\?\C:\Temp\LongFilePathTest\VeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVeryLongVery\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png'
File Exists: True
Short Path: '\\?\C:\Temp\LONGFI~1\VERYLO~1\202501~1.PNG'
PASSED

Long Path: 'C:\Temp\LongFilePathTest\NotLong\20250102.123456.789_Camera=Primary_Id=9999999_x=0123_y=0234_w=1234_h=2345.png'
File Exists: True
Short Path: 'C:\Temp\LONGFI~1\NotLong\202501~1.PNG'
PASSED
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="considerations">Considerations</h2>

<p>8.3 short path names on Windows are not guaranteed to be stable across repeated
file deletions and recreations, even if the long file name remains the same.</p>

<ul>
  <li>8.3 short names are dynamically assigned by the Windows file system (usually
NTFS) and are not deterministic. If you delete a file and create a new one
with the same long name, the 8.3 name might:
    <ul>
      <li>Remain the same (if no name conflict occurs),</li>
      <li>Change (if another file has taken that short name in the meantime),</li>
      <li>Or not be created at all (if 8.3 name generation is disabled or restricted).</li>
    </ul>
  </li>
  <li>Name conflicts are resolved by appending a numeric suffix (e.g., <code class="language-plaintext highlighter-rouge">LONGFI~1.TXT</code>,
<code class="language-plaintext highlighter-rouge">LONGFI~2.TXT</code>), and the exact suffix depends on what already exists in the
directory at the time of file creation.</li>
</ul>
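<p>To illustrate why the suffix depends on what already exists in the directory,
below is a hypothetical and heavily simplified C# sketch of the <code class="language-plaintext highlighter-rouge">~N</code>
numeric-tail idea. It is <em>not</em> the actual NTFS algorithm, which has more
rules, and the hash-like names such as <code class="language-plaintext highlighter-rouge">LXF59T~G</code> in the UNC example
above show that other schemes are in use too. The function name
<code class="language-plaintext highlighter-rouge">MakeShortName</code> is made up for this example and it should never be used to
predict real short names.</p>

```csharp
using System;
using System.IO;
using System.Linq;

// HYPOTHETICAL, simplified sketch of the "~N" numeric tail only. Not the real
// NTFS algorithm; do not use it to predict actual 8.3 names.
static string MakeShortName(string longName, string[] existingShortNames)
{
    // Simplification: keep only letters and digits, upper-cased.
    string Clean(string s) =>
        new string(s.ToUpperInvariant().Where(char.IsLetterOrDigit).ToArray());

    string rawStem = Path.GetFileNameWithoutExtension(longName);
    string rawExt = Path.GetExtension(longName).TrimStart('.');
    // Names that already fit 8.3 get no separate short name (compare
    // "NotLong" in the output above).
    if (rawStem.Length <= 8 && rawExt.Length <= 3 &&
        Clean(rawStem).Length == rawStem.Length &&
        Clean(rawExt).Length == rawExt.Length)
    {
        return longName;
    }

    string stem = Clean(rawStem);
    string ext = Clean(rawExt);
    string prefix = stem.Length > 6 ? stem.Substring(0, 6) : stem;
    string suffix = ext.Length > 3 ? ext.Substring(0, 3) : ext;
    for (int n = 1; n <= 9; n++)
    {
        // The chosen tail depends on which short names are already taken.
        string candidate = prefix + "~" + n + (suffix.Length > 0 ? "." + suffix : "");
        if (Array.IndexOf(existingShortNames, candidate) < 0) { return candidate; }
    }
    throw new InvalidOperationException("This sketch only handles ~1 to ~9.");
}

// Same long name, different directory contents, different short name.
Console.WriteLine(MakeShortName("LongFileName.txt", new string[0]));               // LONGFI~1.TXT
Console.WriteLine(MakeShortName("LongFileName.txt", new[] { "LONGFI~1.TXT" }));    // LONGFI~2.TXT
```

<p>The same long name yields <code class="language-plaintext highlighter-rouge">LONGFI~1.TXT</code> in an empty directory but
<code class="language-plaintext highlighter-rouge">LONGFI~2.TXT</code> when <code class="language-plaintext highlighter-rouge">LONGFI~1.TXT</code> is already taken, which is exactly
why a short path obtained earlier should not be assumed to still be valid later.</p>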

<h2 id="links-and-further-reading">Links and Further Reading</h2>

<ul>
  <li><a href="https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file">Naming Files, Paths, and Namespaces (Microsoft Docs)</a></li>
  <li><a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation">Maximum Path Length Limitation (Microsoft Docs)</a></li>
  <li><a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getshortpathnamew">GetShortPathName function (Microsoft Docs)</a></li>
  <li><a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew">FindFirstFileW function (Microsoft Docs)</a></li>
  <li><a href="https://web.archive.org/web/20181025124257/home.teleport.com/~brainy/lfn.htm">Long Filename Specification</a></li>
  <li><a href="https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html">The Definitive Guide on Win32 to NT Path Conversion</a></li>
  <li><a href="https://helgeklein.com/blog/why-disabling-the-creation-of-83-dos-file-names-will-not-improve-performance-or-will-it/">Why Disabling the Creation of 8.3 DOS File Names Will Not Improve Performance. Or Will It?</a></li>
</ul>

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Working with files and directories on Windows, especially on network shares, often leads to issues with long paths. This post explains what Windows 8.3 short paths are, how they work in NTFS, the difference between local and UNC paths, why browsers and some tools fail with long network paths, and how to programmatically obtain short file names for both local and UNC files.]]></summary></entry><entry><title type="html">Backup (NuGet) Packages for All Azure DevOps Organization Feeds with PowerShell</title><link href="https://nietras.com/2025/05/15/azure-devops-backup-all-packages/" rel="alternate" type="text/html" title="Backup (NuGet) Packages for All Azure DevOps Organization Feeds with PowerShell" /><published>2025-05-15T00:00:00+00:00</published><updated>2025-05-15T00:00:00+00:00</updated><id>https://nietras.com/2025/05/15/azure-devops-backup-all-packages</id><content type="html" xml:base="https://nietras.com/2025/05/15/azure-devops-backup-all-packages/"><![CDATA[<p>This quick blog post demonstrates how to automate the backup of all packages
from all organization feeds on Azure DevOps using a scheduled pipeline and a
PowerShell script. The script leverages the <a href="https://learn.microsoft.com/en-us/rest/api/azure/devops/artifacts/artifact-details?view=azure-devops-rest-7.1">Azure DevOps REST
API</a>
to enumerate feeds and download every package version to a network share.</p>

<p>It skips packages that have already been downloaded and, in our case, can
check several feeds with ~10,000 packages and versions in less than one minute
when no new packages are found. No further details are provided here, but the
script is available below in case someone finds it useful as a reference; the
code is pretty straightforward.</p>

<p>Note that this only works for <em>organization</em> feeds, not project feeds.</p>

<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="na">trigger</span><span class="pi">:</span> <span class="s">none</span>

<span class="na">schedules</span><span class="pi">:</span>
  <span class="c1"># 01:00 UTC every Sunday</span>
  <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s2">"</span><span class="s">0</span><span class="nv"> </span><span class="s">1</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">0"</span>
    <span class="na">displayName</span><span class="pi">:</span> <span class="s">Weekly NuGet Backup</span>
    <span class="na">branches</span><span class="pi">:</span>
      <span class="na">include</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">main</span>
    <span class="na">always</span><span class="pi">:</span> <span class="no">true</span>

<span class="na">pool</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Default'</span>

<span class="na">variables</span><span class="pi">:</span>
  <span class="na">ORGANIZATION_NAME</span><span class="pi">:</span> <span class="s1">'</span><span class="s">YOURORGNAME'</span>
  <span class="na">NETWORK_PATH</span><span class="pi">:</span> <span class="s1">'</span><span class="s">\\YOURLOCALPATH'</span> <span class="c1"># UNC path to your network share</span>

<span class="na">steps</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">task</span><span class="pi">:</span> <span class="s">PowerShell@2</span>
    <span class="na">displayName</span><span class="pi">:</span> <span class="s">Download NuGet packages from all feeds</span>
    <span class="na">env</span><span class="pi">:</span>
      <span class="na">SYSTEM_ACCESSTOKEN</span><span class="pi">:</span> <span class="s">$(System.AccessToken)</span>
    <span class="na">inputs</span><span class="pi">:</span>
      <span class="na">targetType</span><span class="pi">:</span> <span class="s">inline</span>
      <span class="na">script</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">$ErrorActionPreference = 'Stop'</span>

        <span class="s">$org = "$(ORGANIZATION_NAME)"</span>
        <span class="s">$networkBase = "$(NETWORK_PATH)"</span>
        <span class="s">$token = $env:SYSTEM_ACCESSTOKEN</span>
        <span class="s">$headers = @{ Authorization = "Bearer $token" }</span>

        <span class="s"># Get all feeds</span>
        <span class="s">$feedsUrl = "https://feeds.dev.azure.com/$org/_apis/packaging/feeds?api-version=7.1-preview.1"</span>
        <span class="s">Write-Host "Fetching all feeds from $feedsUrl"</span>
        <span class="s">$feedsResp = Invoke-RestMethod -Uri $feedsUrl -Headers $headers</span>
        <span class="s">$feeds = $feedsResp.value</span>

        <span class="s">Write-Host "Feeds count $($feeds.Count)"</span>
        <span class="s">if ($feeds.Count -eq 0) {</span>
          <span class="s">Write-Host "No feeds found."</span>
          <span class="s">exit 0</span>
        <span class="s">}</span>

        <span class="s">foreach ($feed in $feeds) {</span>
          <span class="s">$feedName = $feed.name</span>
          <span class="s">$feedId = $feed.id</span>
          <span class="s">$feedPath = Join-Path $networkBase $feedName</span>

          <span class="s">Write-Host "=================================================================="</span>
          <span class="s">Write-Host "====== '$($feed.name)'"</span>
          <span class="s">Write-Host "=================================================================="</span>

          <span class="s">$fetchBaseUrl = "https://feeds.dev.azure.com/$org/_apis/packaging/feeds/$feedId/packages?protocolType=NuGet&amp;api-version=7.1"</span>
          <span class="s">$downloadBaseUrl = "https://pkgs.dev.azure.com/$org/_apis/packaging/feeds/$feedId/nuget/packages/"</span>
          <span class="s">$pageSize = 50000</span>
          <span class="s">$skip = 0</span>
          <span class="s">$allPkgs = @()</span>
          <span class="s">$page = 1</span>

          <span class="s"># Fetch package names and versions from the feed</span>
          <span class="s">Write-Host "Fetching packages from feed '$feedName'..."</span>

          <span class="s">while ($true) {</span>
            <span class="s">$url = "$fetchBaseUrl&amp;includeAllVersions=true&amp;`$top=$pageSize&amp;`$skip=$skip"</span>
            <span class="s">Write-Host "Requesting page $($page) from $($url)"</span>
            <span class="s">$resp = Invoke-RestMethod -Uri $url -Headers $headers</span>

            <span class="s">if ($resp.value.Count -eq 0) {</span>
              <span class="s">break</span>
            <span class="s">}</span>

            <span class="s">$allPkgs += $resp.value</span>
            <span class="s">Write-Host "Fetched $($resp.value.Count) packages in page $($page)."</span>
            <span class="s">$skip += $resp.value.Count</span>
            <span class="s">$page++</span>
          <span class="s">}</span>

          <span class="s">Write-Host "Package names found $($allPkgs.Count) in feed '$feedName'"</span>

          <span class="s">if ($allPkgs.Count -eq 0) {</span>
            <span class="s">Write-Host "WARNING No packages found in feed '$feedName'."</span>
            <span class="s">continue</span>
          <span class="s">}</span>

          <span class="s"># Ensure the target directory exists</span>
          <span class="s">if (-not (Test-Path $feedPath)) {</span>
            <span class="s">Write-Host "Creating directory '$feedPath'..."</span>
            <span class="s">New-Item -ItemType Directory -Path $feedPath -Force | Out-Null</span>
          <span class="s">} else {</span>
            <span class="s">Write-Host "Directory '$feedPath' already exists."</span>
          <span class="s">}</span>

          <span class="s"># Download packages (skip already exists)</span>
          <span class="s">$totalPackagesCount = 0</span>
          <span class="s">foreach ($pkg in $allPkgs) {</span>
            <span class="s">Write-Host "====== '$($pkg.name)' versions found $($pkg.versions.Count) ======"</span>
            <span class="s">foreach ($ver in $pkg.versions) {</span>
              <span class="s">$nupkgFileName = "$($pkg.name).$($ver.version).nupkg"</span>
              <span class="s">$nupkgPath = Join-Path $feedPath $nupkgFileName</span>

              <span class="s">if (Test-Path $nupkgPath) {</span>
                <span class="s">Write-Host "Skipping '$($nupkgPath)' (already exists)"</span>
                <span class="s">continue</span>
              <span class="s">}</span>

              <span class="s">$downloadUrl = "$($downloadBaseUrl)$($pkg.name)/versions/$($ver.version)/content?api-version=7.1-preview"</span>
              <span class="s">Write-Host "Downloading '$($nupkgFileName)' to '$feedPath' from '$($downloadUrl)'"</span>
              <span class="s">Invoke-RestMethod -Uri $downloadUrl -Headers $headers -OutFile $nupkgPath</span>
            <span class="s">}</span>
            <span class="s">$totalPackagesCount += $pkg.versions.Count</span>
          <span class="s">}</span>
          <span class="s">Write-Host "Total packages found (already downloaded or downloaded) $($totalPackagesCount) in feed '$feedName'"</span>
        <span class="s">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
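<p>The paging loop in the script keeps requesting pages with <code class="language-plaintext highlighter-rouge">$top</code> and
<code class="language-plaintext highlighter-rouge">$skip</code> until the API returns an empty page. Below is a minimal C# sketch of just
that control flow, where the in-memory <code class="language-plaintext highlighter-rouge">feed</code> array and the <code class="language-plaintext highlighter-rouge">fetchPage</code>
lambda are hypothetical stand-ins for the REST call; no real Azure DevOps API is
used.</p>

```csharp
using System;
using System.Linq;

// HYPOTHETICAL feed contents standing in for the REST endpoint.
string[] feed = Enumerable.Range(1, 7).Select(i => "Package" + i).ToArray();

// HYPOTHETICAL stand-in for one request with $skip = offset and $top = count.
var fetchPage = (int offset, int count) => feed.Skip(offset).Take(count).ToArray();

int pageSize = 3; // the script uses 50000
int skip = 0;
int pages = 0;
string[] allPkgs = new string[0];
while (true)
{
    string[] page = fetchPage(skip, pageSize);
    // An empty page means everything has been fetched.
    if (page.Length == 0) { break; }
    allPkgs = allPkgs.Concat(page).ToArray(); // like $allPkgs += $resp.value
    skip += page.Length; // advance by what was actually returned
    pages++;
}
Console.WriteLine($"Fetched {allPkgs.Length} packages in {pages} pages"); // Fetched 7 packages in 3 pages
```

<p>Advancing <code class="language-plaintext highlighter-rouge">$skip</code> by the number of items actually returned, rather than by
the requested page size, keeps the loop correct even if the server caps the page
size below what was asked for.</p>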

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This quick blog post demonstrates how to automate the backup of all packages from all organization feeds on Azure DevOps using a scheduled pipeline and a PowerShell script. The script leverages the Azure DevOps REST API to enumerate feeds and download every package version to a network share.]]></summary></entry><entry><title type="html">Sep 0.10.0 - 21 GB/s CSV Parsing Using SIMD on AMD 9950X 🚀</title><link href="https://nietras.com/2025/05/09/sep-0-10-0/" rel="alternate" type="text/html" title="Sep 0.10.0 - 21 GB/s CSV Parsing Using SIMD on AMD 9950X 🚀" /><published>2025-05-09T00:00:00+00:00</published><updated>2025-05-09T00:00:00+00:00</updated><id>https://nietras.com/2025/05/09/sep-0.10.0</id><content type="html" xml:base="https://nietras.com/2025/05/09/sep-0-10-0/"><![CDATA[<p><a href="https://github.com/nietras/Sep/releases/tag/v0.10.0">Sep 0.10.0 was released April 22nd,
2025</a> with optimizations
for <a href="https://en.wikipedia.org/wiki/AVX-512">AVX-512</a> capable CPUs like the AMD
9950X (<a href="https://en.wikipedia.org/wiki/Zen_5">Zen 5</a>) and updated benchmarks
including the 9950X. Sep now achieves a staggering <strong>21 GB/s on the 9950X</strong> for
low-level CSV parsing. 🚀 Before 0.10.0, Sep achieved ~18 GB/s on the 9950X.</p>

<p>See <a href="https://github.com/nietras/Sep/releases/tag/v0.10.0">v0.10.0 release</a> for
all changes for the release, and <a href="https://github.com/nietras/Sep">Sep README on
GitHub</a> for full details.</p>

<p>In this blog post, I will dive into how the machine code .NET 9.0 generates for
AVX-512 is sub-optimal and what changes were made to Sep to circumvent this,
showing interesting code and assembly along the way. So get ready for SIMD C#
code, x64 SIMD assembly and tons of benchmark numbers.</p>

<p>However, first let’s take a look at the progression of Sep’s performance from
early 0.1.0 to 0.10.0, from .NET 7.0 to .NET 9.0 and from AMD Ryzen 9 5950X (Zen
3) to 9950X (Zen 5), as I have also recently upgraded my work PC.</p>

<h2 id="sep-performance-progression">Sep Performance Progression</h2>

<p><img src="/images/2025-05-sep-0.10.0/sep-perf-progression-0.1.0-to-0.10.0.png" alt="Sep Perf Progression 0.1.0 To 0.10.0" /></p>

<p>The benchmark numbers above are for the package assets CSV data and the
low-level parse <code class="language-plaintext highlighter-rouge">Rows</code> only scope; see <a href="https://github.com/nietras/Sep">Sep README on
GitHub</a> or the code on GitHub for details.
All numbers here are single-threaded and are also shown in the table
below. Note that the numbers can vary by a few percentage points, so a given
release may show a minor regression.</p>

<p>The main takeaway is that Sep has seen incremental performance improvements
driven by both major (e.g. the almost complete rewrite of <a href="/2023/08/07/sep-0-2-0/">internals in 0.2.0</a>) and minor code changes, while also
benefiting from improved performance in new .NET versions. Finally, the numbers
show the improvement from going from the AMD 5950X (<a href="https://en.wikipedia.org/wiki/Zen_3">Zen
3</a>) to the AMD 9950X (<a href="https://en.wikipedia.org/wiki/Zen_5">Zen
5</a>). This showcases how software and hardware improvements together can boost
performance to the next level.</p>

<p>We can see Sep progressing:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">~ 7 GB/s</code> (0.1.0, 5950X and .NET 7.0)</li>
  <li><code class="language-plaintext highlighter-rouge">~12 GB/s</code> (0.3.0, 5950X and .NET 8.0)</li>
  <li><code class="language-plaintext highlighter-rouge">~13 GB/s</code> (0.6.0, 5950X and .NET 9.0)</li>
  <li><code class="language-plaintext highlighter-rouge">~18 GB/s</code> (0.9.0, 9950X and .NET 9.0)</li>
  <li><code class="language-plaintext highlighter-rouge">~21 GB/s</code> (0.10.0, 9950X and .NET 9.0)</li>
</ul>

<p>This is a staggering <strong>~3x</strong> improvement in just under 2 years since <a href="/2023/06/05/introducing-sep">Sep was
introduced in June 2023</a>.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: right">Sep</th>
      <th style="text-align: right">.NET</th>
      <th style="text-align: right">CPU</th>
      <th style="text-align: right">Rows</th>
      <th style="text-align: right">Mean [ms]</th>
      <th style="text-align: right">MB</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">Ratio</th>
      <th style="text-align: right">ns/row</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">0.1.0</td>
      <td style="text-align: right">7.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">1000000</td>
      <td style="text-align: right">79.590</td>
      <td style="text-align: right">583</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">7335.3</code></td>
      <td style="text-align: right">1.00</td>
      <td style="text-align: right">79.6</td>
    </tr>
    <tr>
      <td style="text-align: right">0.2.0</td>
      <td style="text-align: right">7.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">1000000</td>
      <td style="text-align: right">57.280</td>
      <td style="text-align: right">583</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">10191.6</code></td>
      <td style="text-align: right">1.39</td>
      <td style="text-align: right">57.3</td>
    </tr>
    <tr>
      <td style="text-align: right">0.2.1</td>
      <td style="text-align: right">7.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.624</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">11120.2</code></td>
      <td style="text-align: right">1.52</td>
      <td style="text-align: right">52.5</td>
    </tr>
    <tr>
      <td style="text-align: right">0.3.0</td>
      <td style="text-align: right">7.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.537</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">11503.7</code></td>
      <td style="text-align: right">1.57</td>
      <td style="text-align: right">50.7</td>
    </tr>
    <tr>
      <td style="text-align: right">0.3.0</td>
      <td style="text-align: right">8.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.409</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">12111.6</code></td>
      <td style="text-align: right">1.65</td>
      <td style="text-align: right">48.2</td>
    </tr>
    <tr>
      <td style="text-align: right">0.4.0</td>
      <td style="text-align: right">8.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.319</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">12581.3</code></td>
      <td style="text-align: right">1.72</td>
      <td style="text-align: right">46.4</td>
    </tr>
    <tr>
      <td style="text-align: right">0.4.1</td>
      <td style="text-align: right">8.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.278</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">12811.2</code></td>
      <td style="text-align: right">1.75</td>
      <td style="text-align: right">45.6</td>
    </tr>
    <tr>
      <td style="text-align: right">0.5.0</td>
      <td style="text-align: right">8.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.326</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">12544.5</code></td>
      <td style="text-align: right">1.71</td>
      <td style="text-align: right">46.5</td>
    </tr>
    <tr>
      <td style="text-align: right">0.6.0</td>
      <td style="text-align: right">9.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.188</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">13339.9</code></td>
      <td style="text-align: right">1.82</td>
      <td style="text-align: right">43.8</td>
    </tr>
    <tr>
      <td style="text-align: right">0.9.0</td>
      <td style="text-align: right">9.0</td>
      <td style="text-align: right">5950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">2.230</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">13088.4</code></td>
      <td style="text-align: right">1.78</td>
      <td style="text-align: right">44.6</td>
    </tr>
    <tr>
      <td style="text-align: right">0.9.0</td>
      <td style="text-align: right">9.0</td>
      <td style="text-align: right">9950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">1.603</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">18202.7</code></td>
      <td style="text-align: right">2.48</td>
      <td style="text-align: right">32.1</td>
    </tr>
    <tr>
      <td style="text-align: right">0.10.0</td>
      <td style="text-align: right">9.0</td>
      <td style="text-align: right">9950X</td>
      <td style="text-align: right">50000</td>
      <td style="text-align: right">1.365</td>
      <td style="text-align: right">29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">21384.9</code></td>
      <td style="text-align: right">2.92</td>
      <td style="text-align: right">27.3</td>
    </tr>
  </tbody>
</table>

<p>The improvement from <code class="language-plaintext highlighter-rouge">5950X w. Sep 0.9.0</code> to <code class="language-plaintext highlighter-rouge">9950X w. Sep 0.10.0</code> is <strong>~1.6x</strong>,
a pretty good improvement from Zen 3 to Zen 5. Note that the 9950X has a 5.7
GHz boost frequency vs 4.9 GHz for the 5950X, so this alone probably explains
~1.16x (5.7 / 4.9) of the speedup.</p>
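
<p>The ratios can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it takes the throughput column (MB/s) and the boost clocks from this post, and assumes, as a simplification, that throughput scales linearly with clock frequency, which real workloads rarely do exactly.</p>

```csharp
using System;

// Throughput (MB/s) from the table above, both on .NET 9.0:
// Sep 0.9.0 on the 5950X and Sep 0.10.0 on the 9950X.
double zen3MBps = 13088.4;
double zen5MBps = 21384.9;

// Boost clocks (GHz) as quoted in the text.
double zen3BoostGHz = 4.9;
double zen5BoostGHz = 5.7;

double totalSpeedup = zen5MBps / zen3MBps;         // ~1.63x
double clockSpeedup = zen5BoostGHz / zen3BoostGHz; // ~1.16x
// Assumption: throughput scales linearly with clock. Whatever remains is
// attributed to per-clock gains (microarchitecture and code changes).
double perClockSpeedup = totalSpeedup / clockSpeedup; // ~1.40x

Console.WriteLine($"total {totalSpeedup:F2}x, clock {clockSpeedup:F2}x, per-clock {perClockSpeedup:F2}x");
```

<p>Under that assumption, roughly 1.4x of the gain is left for per-clock improvements, which here mixes the Zen 5 microarchitecture with the changes between Sep 0.9.0 and 0.10.0.</p>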

<h2 id="avx-512-code-generation-and-mask-register-issues">AVX-512 Code Generation and Mask Register Issues</h2>

<p>Sep has had support for AVX-512 since <a href="/2023/09/05/sep-0-2-3/">0.2.3</a> and back then I noted that:</p>

<blockquote>
  <p>different here is the use of the <a href="https://en.wikipedia.org/wiki/AVX-512">mask registers (<code class="language-plaintext highlighter-rouge">k1-k8</code>) introduced with
AVX-512</a>. However, .NET 8 does not have
explicit support for these and the code generation is a bit suboptimal, given
mask register are moved to normal registers each time. And then back.</p>
</blockquote>

<p>I did not have direct access to an AVX-512 capable CPU then, so I could not test
the performance of the AVX-512 parser in detail. I did, however, verify it on a
Xeon Silver 4316, where some quick tests showed the AVX-512 parser to be the
fastest on that CPU despite the issues with the mask registers.</p>

<h2 id="9950x-upgrade-and-avx-512-vs-avx2-performance">9950X Upgrade and AVX-512 vs AVX2 Performance</h2>

<p>Recently, I upgraded from an AMD 5950X (Zen 3) CPU to an AMD 9950X (Zen 5)
CPU. Zen 3 does not support AVX-512, but Zen 5 does. One of the first things I
did on the new CPU was, of course, to run the Sep benchmarks, which showed that
Sep hit ~18 GB/s on the 9950X for low-level parsing of CSV files. This was
great and ~1.4x faster than on the 5950X, a pretty good improvement from Zen 3
to Zen 5.</p>

<p>However, I still wanted to compare AVX-512 to
<a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions">AVX2</a>. Sep has
unofficial support for overriding the default selected parser via an
environment variable. This is also used to fully test all possible parsers,
no matter which parser is selected as best. Somewhat surprisingly, the AVX2 parser
on the 9950X hit ~20 GB/s! That is, it was ~10% faster than the AVX-512 based
parser, which is pretty significant for Sep. Hence, it would seem the mask
register issue was still present.</p>
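
<p>A minimal sketch of how such an override could work: select a parser from the instruction sets the CPU reports, unless an environment variable forces a specific one. The variable name <code class="language-plaintext highlighter-rouge">SEP_PARSER_OVERRIDE</code> and the parser names are hypothetical, not Sep's actual ones; only <code class="language-plaintext highlighter-rouge">Avx512BW.IsSupported</code> (.NET 8 or later) and <code class="language-plaintext highlighter-rouge">Avx2.IsSupported</code> are real .NET APIs.</p>

```csharp
using System;
using System.Runtime.Intrinsics.X86;

Console.WriteLine($"Selected parser: {SelectParserName()}");

// Hypothetical selection logic: an environment variable (invented name,
// not Sep's) overrides the default, otherwise pick the widest supported ISA.
static string SelectParserName()
{
    var forced = Environment.GetEnvironmentVariable("SEP_PARSER_OVERRIDE");
    if (!string.IsNullOrEmpty(forced))
    {
        return forced; // e.g. force "Avx2" on an AVX-512 capable CPU
    }
    if (Avx512BW.IsSupported) { return "Avx512"; }
    if (Avx2.IsSupported) { return "Avx2"; }
    return "Fallback"; // hypothetical portable parser
}
```

<p>Setting the variable to <code class="language-plaintext highlighter-rouge">Avx2</code> before the selection runs is then enough to benchmark the AVX2 parser on an AVX-512 capable CPU like the 9950X.</p>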

<h2 id="parser-codeassembly-comparison-and-new-avx-512-to-256-parser">Parser Code/Assembly Comparison and New AVX-512-to-256 Parser</h2>

<p>Let’s examine the code and assembly (via
<a href="https://github.com/EgorBo/Disasmo">Disasmo</a>) for the AVX-512-based parser
(0.9.0), a tweaked version (0.10.0), compare it to the AVX2-based parser, and
finally review a new AVX-512-to-256-based parser that circumvents the mask
register issue and is even faster than the AVX2-based parser, achieving ~21 GB/s
as shown above.</p>

<h3 id="parse-methods">Parse Methods</h3>

<p>All parsers in Sep follow the same basic layout, shown below, and have a single
generic <code class="language-plaintext highlighter-rouge">Parse</code> method that supports both parsing when handling quotes
(<code class="language-plaintext highlighter-rouge">ParseColInfos</code>) and when not (<code class="language-plaintext highlighter-rouge">ParseColEnds</code>). The former requires keeping
track of more state and is slightly slower.</p>

<p>In Sep the <code class="language-plaintext highlighter-rouge">Parse</code> method is marked with <code class="language-plaintext highlighter-rouge">AggressiveInlining</code> to ensure it is
inlined, which means one can in principle go to <code class="language-plaintext highlighter-rouge">ParseColEnds</code> in Visual Studio
with Disasmo installed and hit <code class="language-plaintext highlighter-rouge">ALT + SHIFT + D</code>. Unfortunately, for some reason
this currently does not work unless you change the parser from a <code class="language-plaintext highlighter-rouge">class</code> to a
<code class="language-plaintext highlighter-rouge">struct</code>. I mention this so readers are aware if they want to follow along. See GitHub issue
<a href="https://github.com/EgorBo/Disasmo/issues/68">Empty disassembly for method with inlined generic method (used to
work)</a> for more.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="k">public</span> <span class="k">void</span> <span class="nf">ParseColEnds</span><span class="p">(</span><span class="n">SepReaderState</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Parse</span><span class="p">&lt;</span><span class="kt">int</span><span class="p">,</span> <span class="n">SepColEndMethods</span><span class="p">&gt;(</span><span class="n">s</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">void</span> <span class="nf">ParseColInfos</span><span class="p">(</span><span class="n">SepReaderState</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Parse</span><span class="p">&lt;</span><span class="n">SepColInfo</span><span class="p">,</span> <span class="n">SepColInfoMethods</span><span class="p">&gt;(</span><span class="n">s</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">void</span> <span class="n">Parse</span><span class="p">&lt;</span><span class="n">TColInfo</span><span class="p">,</span> <span class="n">TColInfoMethods</span><span class="p">&gt;(</span><span class="n">SepReaderState</span> <span class="n">s</span><span class="p">)</span>
    <span class="k">where</span> <span class="n">TColInfo</span> <span class="p">:</span> <span class="n">unmanaged</span>
    <span class="k">where</span> <span class="n">TColInfoMethods</span> <span class="p">:</span> <span class="n">ISepColInfoMethods</span><span class="p">&lt;</span><span class="n">TColInfo</span><span class="p">&gt;</span>
<span class="p">{</span>
    <span class="c1">// Implementation</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>To recap, parsing in Sep is done on a span of <code class="language-plaintext highlighter-rouge">char</code>s (from an array), e.g. 16K long,
and outputs a set of column end indices and row column counts for that span.
This ensures parsing is done on a chunk of data that is significant, yet small
enough to fit in the CPU cache, and it also facilitates efficient multi-threading afterwards.</p>

<p>Parsing of the span is then basically just a loop in which one or two SIMD
registers (e.g. <code class="language-plaintext highlighter-rouge">Vector256</code>) are loaded (as unsigned 16-bit integers, e.g.
<code class="language-plaintext highlighter-rouge">ushort</code>), converted to a single <code class="language-plaintext highlighter-rouge">byte</code> SIMD register, and then compared to the special
characters (e.g. <code class="language-plaintext highlighter-rouge">\n</code>, <code class="language-plaintext highlighter-rouge">\r</code>, <code class="language-plaintext highlighter-rouge">"</code>, <code class="language-plaintext highlighter-rouge">;</code>) using SIMD compare instructions. The
compare results are converted to a bit mask, and each set bit in that mask is
then parsed sequentially.</p>
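
<p>The loop just described can be sketched with the portable <code class="language-plaintext highlighter-rouge">Vector256</code> APIs instead of the ISA-specific intrinsics Sep uses. This is an illustration under a few assumptions, not Sep's code: it only collects the indices of special characters, takes a separator below 256, ignores a tail shorter than one block, and emulates the unsigned saturation of <code class="language-plaintext highlighter-rouge">PackUnsignedSaturate</code> with <code class="language-plaintext highlighter-rouge">Min</code> followed by <code class="language-plaintext highlighter-rouge">Narrow</code> (which truncates on its own).</p>

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

var sample = "ab;cd\n" + new string('x', 25) + "\"";
Console.WriteLine(string.Join(", ", FindSpecialCharIndices(sample, ';'))); // 2, 5, 31

// Portable sketch (not Sep's code): returns the index of every special char
// in whole 32-char blocks; a shorter tail is ignored for brevity.
static List<int> FindSpecialCharIndices(ReadOnlySpan<char> chars, char separator)
{
    var indices = new List<int>();
    ref ushort start = ref MemoryMarshal.GetReference(MemoryMarshal.Cast<char, ushort>(chars));
    var byteMax = Vector256.Create((ushort)byte.MaxValue);
    var nls = Vector256.Create((byte)'\n');
    var crs = Vector256.Create((byte)'\r');
    var qts = Vector256.Create((byte)'"');
    var sps = Vector256.Create((byte)separator); // assumes separator < 256

    int charsPerBlock = Vector256<byte>.Count;    // 32
    int charsPerVector = Vector256<ushort>.Count; // 16
    for (int i = 0; i + charsPerBlock <= chars.Length; i += charsPerBlock)
    {
        // Load 2 x 16 chars as ushort. Clamping to 255 before narrowing
        // emulates unsigned saturation, so e.g. U+010A cannot turn into '\n'.
        var v0 = Vector256.Min(Vector256.LoadUnsafe(ref start, (nuint)i), byteMax);
        var v1 = Vector256.Min(Vector256.LoadUnsafe(ref start, (nuint)(i + charsPerVector)), byteMax);
        var bytes = Vector256.Narrow(v0, v1); // 32 bytes, in order

        var specialChars = Vector256.Equals(bytes, nls) | Vector256.Equals(bytes, crs)
                         | Vector256.Equals(bytes, qts) | Vector256.Equals(bytes, sps);

        // Portable MoveMask: one bit per byte, bit k set if byte k matched.
        uint mask = specialChars.ExtractMostSignificantBits();
        while (mask != 0)
        {
            indices.Add(i + BitOperations.TrailingZeroCount(mask)); // tzcnt
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    return indices;
}
```

<p><code class="language-plaintext highlighter-rouge">Vector256.Narrow</code> is defined to keep elements in order, so unlike the x86 pack instructions discussed next, this sketch needs no explicit permute.</p>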

<p>The interesting part here is the SIMD code, how it is JIT’ed to machine code
on .NET, and how efficient that is. Below, this specific code and the resulting
assembly are shown for the parsers mentioned before.</p>

<h3 id="sepparseravx512packcmpormovemasktzcntcs-090"><code class="language-plaintext highlighter-rouge">SepParserAvx512PackCmpOrMoveMaskTzcnt.cs</code> (0.9.0)</h3>

<p>A breakdown of the code snippet below:</p>

<ol>
  <li><strong>Data Loading and Packing</strong>:
    <ul>
      <li>Two 16-bit integer vectors (<code class="language-plaintext highlighter-rouge">v0</code> and <code class="language-plaintext highlighter-rouge">v1</code>) are read from memory using
unaligned reads.</li>
      <li>These vectors are packed into a single byte vector using
<code class="language-plaintext highlighter-rouge">PackUnsignedSaturate</code>, which clamps (saturates) each 16-bit value to the unsigned byte range 0-255.</li>
      <li>For AVX-512 this means loading two 512-bit SIMD registers, each with
32 <code class="language-plaintext highlighter-rouge">char</code>s, and then packing them into a single 512-bit SIMD register with 64
bytes. Hence, 64 <code class="language-plaintext highlighter-rouge">char</code>s are handled in each loop iteration.</li>
    </ul>
  </li>
  <li><strong>Reordering Packed Data</strong>:
    <ul>
      <li>The pack instruction operates within each 128-bit lane, so the packed data is
interleaved, and a permutation operation
(<code class="language-plaintext highlighter-rouge">PermuteVar8x64</code>) is applied to reorder the bytes, in 64-bit blocks, into the correct
sequence.</li>
    </ul>
  </li>
  <li><strong>Character Comparisons</strong>:
    <ul>
      <li>The byte vector is compared against specific characters (e.g. <code class="language-plaintext highlighter-rouge">\n</code>, <code class="language-plaintext highlighter-rouge">\r</code>,
<code class="language-plaintext highlighter-rouge">"</code>, <code class="language-plaintext highlighter-rouge">;</code>) using SIMD equality operations. These comparisons identify
special characters relevant to CSV parsing.</li>
    </ul>
  </li>
  <li><strong>Combine Comparison Results</strong>:
    <ul>
      <li>The results of the comparisons are combined using bitwise OR operations.</li>
    </ul>
  </li>
  <li><strong>Bitmask Generation and Check</strong>:
    <ul>
      <li>A <code class="language-plaintext highlighter-rouge">MoveMask</code> operation extracts a bitmask from the SIMD register, allowing
for a quick check to skip further processing if no special characters are
found.</li>
    </ul>
  </li>
</ol>
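
<p>Why a permute is needed can be modeled with plain arrays. x86 pack instructions such as <code class="language-plaintext highlighter-rouge">vpackuswb</code> operate independently within each 128-bit lane, so the 64-bit blocks of the two inputs end up interleaved. The scalar sketch below (function names made up for illustration) assumes values already fit in a byte, so saturation is a no-op, and checks that the index order <code class="language-plaintext highlighter-rouge">0, 2, 4, 6, 1, 3, 5, 7</code> used with <code class="language-plaintext highlighter-rouge">PermuteVar8x64</code>, and the immediate <code class="language-plaintext highlighter-rouge">0b_11_01_10_00</code> used with <code class="language-plaintext highlighter-rouge">Permute4x64</code> in the AVX2 parser, restore sequential order.</p>

```csharp
using System;
using System.Linq;

// AVX2: decode the Permute4x64 immediate into 4 qword indices (2 bits each).
int control = 0b_11_01_10_00;
int[] avx2Indices = Enumerable.Range(0, 4).Select(q => (control >> (2 * q)) & 3).ToArray();
int[] avx512Indices = { 0, 2, 4, 6, 1, 3, 5, 7 };

Console.WriteLine(string.Join(",", avx2Indices));                      // 0,2,1,3
Console.WriteLine(RestoresOrder(2, avx2Indices));                      // True
Console.WriteLine(RestoresOrder(4, avx512Indices));                    // True
Console.WriteLine(RestoresOrder(4, new[] { 0, 1, 2, 3, 4, 5, 6, 7 })); // False

// Model of a per-128-bit-lane pack: lane l of the result holds 8 bytes from
// lane l of a followed by 8 bytes from lane l of b.
static byte[] PackPerLane(byte[] a, byte[] b, int lanes)
{
    var result = new byte[16 * lanes];
    for (int l = 0; l < lanes; l++)
    {
        Array.Copy(a, 8 * l, result, 16 * l, 8);     // qword 2l   <- a, lane l
        Array.Copy(b, 8 * l, result, 16 * l + 8, 8); // qword 2l+1 <- b, lane l
    }
    return result;
}

// Model of a qword permute (like vpermq): result qword k = source qword indices[k].
static byte[] PermuteQwords(byte[] source, int[] indices)
{
    var result = new byte[source.Length];
    for (int k = 0; k < indices.Length; k++)
    {
        Array.Copy(source, 8 * indices[k], result, 8 * k, 8);
    }
    return result;
}

// Packs chars 0..16*lanes-1 (first half in a, second half in b), permutes,
// and checks the bytes come out in sequential order.
static bool RestoresOrder(int lanes, int[] indices)
{
    byte[] chars = Enumerable.Range(0, 16 * lanes).Select(c => (byte)c).ToArray();
    byte[] a = chars[..(8 * lanes)];
    byte[] b = chars[(8 * lanes)..];
    return PermuteQwords(PackPerLane(a, b, lanes), indices).SequenceEqual(chars);
}
```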

<p>All parsers follow the same basic approach, so the breakdown above will be omitted
going forward. Note how <code class="language-plaintext highlighter-rouge">ISA</code> and <code class="language-plaintext highlighter-rouge">Vec</code> are aliases used to make the different
parsers more similar, which makes them easier to compare and maintain.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">v0</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">v1</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">packed</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">PackUnsignedSaturate</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">v1</span><span class="p">);</span>
<span class="c1">// Pack interleaves the two vectors need to permute them back</span>
<span class="kt">var</span> <span class="n">permuteIndices</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> <span class="m">4L</span><span class="p">,</span> <span class="m">6L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">5L</span><span class="p">,</span> <span class="m">7L</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">PermuteVar8x64</span><span class="p">(</span><span class="n">packed</span><span class="p">.</span><span class="nf">AsInt64</span><span class="p">(),</span> <span class="n">permuteIndices</span><span class="p">).</span><span class="nf">AsByte</span><span class="p">();</span>

<span class="kt">var</span> <span class="n">nlsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">lineEndings</span> <span class="p">=</span> <span class="n">nlsEq</span> <span class="p">|</span> <span class="n">crsEq</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators</span> <span class="p">=</span> <span class="n">spsEq</span> <span class="p">|</span> <span class="n">lineEndings</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">specialChars</span> <span class="p">=</span> <span class="n">lineEndingsSeparators</span> <span class="p">|</span> <span class="n">qtsEq</span><span class="p">;</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">specialChars</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
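
<p>The aliases could be declared with C# <code class="language-plaintext highlighter-rouge">using</code> alias directives, as sketched below. The exact declarations in Sep are assumed here, not copied. The sketch also shows why <code class="language-plaintext highlighter-rouge">Add(ref byteRef, VecUI8.Count)</code> points at <code class="language-plaintext highlighter-rouge">v1</code>, assuming <code class="language-plaintext highlighter-rouge">byteRef</code> is a byte reference as its name suggests: advancing by 64 bytes skips exactly the 32 <code class="language-plaintext highlighter-rouge">char</code>s of <code class="language-plaintext highlighter-rouge">v0</code>, consistent with the <code class="language-plaintext highlighter-rouge">0x40</code> offset in the assembly.</p>

```csharp
// Assumed alias declarations for an AVX-512 parser; an AVX2 variant would
// alias Avx2, Vector256, Vector256<short> and Vector256<byte> instead.
using System;
using ISA = System.Runtime.Intrinsics.X86.Avx512BW;
using Vec = System.Runtime.Intrinsics.Vector512;
using VecI16 = System.Runtime.Intrinsics.Vector512<short>;
using VecUI8 = System.Runtime.Intrinsics.Vector512<byte>;

// With byteRef as a ref byte, v1 starts VecUI8.Count bytes after v0:
// 64 bytes, i.e. exactly the 32 chars held by one VecI16.
int byteOffsetOfV1 = VecUI8.Count;
int charsPerVec = VecI16.Count;
int charsPerIteration = 2 * charsPerVec;

Console.WriteLine($"AVX-512BW supported: {ISA.IsSupported}, Vector512 accelerated: {Vec.IsHardwareAccelerated}");
Console.WriteLine($"v1 offset: {byteOffsetOfV1} bytes, chars per iteration: {charsPerIteration}");
```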

<p>Assembly is shown below for a 64-bit CPU with AVX-512 support, e.g. the 9950X.
What is most interesting here is that each compare <code class="language-plaintext highlighter-rouge">Vec.Equals</code> ends up being
two instructions: <code class="language-plaintext highlighter-rouge">vpcmpeqb</code> (<code class="language-plaintext highlighter-rouge">c</code>o<code class="language-plaintext highlighter-rouge">mp</code>are <code class="language-plaintext highlighter-rouge">eq</code>ual <code class="language-plaintext highlighter-rouge">b</code>ytes) and <code class="language-plaintext highlighter-rouge">vpmovm2b</code>
(<code class="language-plaintext highlighter-rouge">mov</code>e <code class="language-plaintext highlighter-rouge">m</code>ask to <code class="language-plaintext highlighter-rouge">b</code>yte). That is, there is a lot of moving from the mask
register, e.g. <code class="language-plaintext highlighter-rouge">k1</code>, to a normal 512-bit register, e.g. <code class="language-plaintext highlighter-rouge">zmm5</code>, and back again (<code class="language-plaintext highlighter-rouge">vpmovb2m</code>).</p>

<p>Note that the C# code does not deal with vector mask registers directly. This is
not supported in .NET, and hence it is the JIT that is responsible for code
generation around this. Unfortunately, here it does not do a good job, and the
AVX-512 parser is not as fast as it could be.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="rouge-code"><pre><span class="nf">mov</span>      <span class="nb">edi</span><span class="p">,</span> <span class="nb">r9d</span>
<span class="nf">lea</span>      <span class="nb">rdi</span><span class="p">,</span> <span class="nv">bword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">r10</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vmovups</span>  <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vpackuswb</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x40</span><span class="p">]</span>
<span class="nf">vmovups</span>  <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">reloc</span> <span class="err">@</span><span class="nv">RWD00</span><span class="p">]</span>
<span class="nf">vpermq</span>   <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmm4</span>
<span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm0</span>
<span class="nf">vpmovm2b</span> <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm1</span>
<span class="nf">vpmovm2b</span> <span class="nv">zmm16</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm2</span>
<span class="nf">vpmovm2b</span> <span class="nv">zmm17</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm3</span>
<span class="nf">vpmovm2b</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">vpternlogd</span> <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm16</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span>
<span class="nf">vpord</span>    <span class="nv">zmm16</span><span class="p">,</span> <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmm17</span>
<span class="nf">vpmovb2m</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm16</span>
<span class="nf">kmovq</span>    <span class="nv">r15</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">test</span>     <span class="nv">r15</span><span class="p">,</span> <span class="nv">r15</span>
<span class="nf">je</span>       <span class="nv">G_M000_IG03</span>
<span class="nf">vpmovb2m</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span>
<span class="nf">kmovq</span>    <span class="nv">r13</span><span class="p">,</span> <span class="nv">k1</span>
<span class="nf">lea</span>      <span class="nv">r12</span><span class="p">,</span> <span class="p">[</span><span class="nv">r15</span><span class="o">+</span><span class="nv">r8</span><span class="p">]</span>
<span class="nf">cmp</span>      <span class="nv">r13</span><span class="p">,</span> <span class="nv">r12</span>
<span class="nf">je</span>       <span class="nv">G_M000_IG43</span>
</pre></td></tr></tbody></table></code></pre></div></div>
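
<p>To see why the <code class="language-plaintext highlighter-rouge">vpmovm2b</code>/<code class="language-plaintext highlighter-rouge">vpmovb2m</code> pairs are overhead here, the scalar model below mimics the two conversions on a 64-bit mask (function names made up for illustration). <code class="language-plaintext highlighter-rouge">vpmovm2b</code> expands each mask bit into a byte of <code class="language-plaintext highlighter-rouge">0xFF</code> or <code class="language-plaintext highlighter-rouge">0x00</code>, and <code class="language-plaintext highlighter-rouge">vpmovb2m</code> collapses the most significant bit of each byte back into a mask bit, so the round trip returns the original mask: no information is gained, only instructions spent.</p>

```csharp
using System;

ulong compareMask = 0x8000_0000_0000_0021; // hypothetical compare result: bytes 0, 5, 63 matched
byte[] expanded = MaskToBytes(compareMask);
Console.WriteLine($"byte 5 = 0x{expanded[5]:X2}, byte 6 = 0x{expanded[6]:X2}"); // byte 5 = 0xFF, byte 6 = 0x00
Console.WriteLine(BytesToMask(expanded) == compareMask); // True

// Model of vpmovm2b: mask bit k set -> byte k = 0xFF, otherwise 0x00.
static byte[] MaskToBytes(ulong mask)
{
    var bytes = new byte[64];
    for (int k = 0; k < 64; k++)
    {
        bytes[k] = ((mask >> k) & 1UL) != 0 ? (byte)0xFF : (byte)0x00;
    }
    return bytes;
}

// Model of vpmovb2m: mask bit k = most significant bit of byte k.
static ulong BytesToMask(byte[] bytes)
{
    ulong mask = 0;
    for (int k = 0; k < 64; k++)
    {
        if ((bytes[k] & 0x80) != 0) { mask |= 1UL << k; }
    }
    return mask;
}
```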

<h3 id="sepparseravx512packcmpormovemasktzcntcs-0100"><code class="language-plaintext highlighter-rouge">SepParserAvx512PackCmpOrMoveMaskTzcnt.cs</code> (0.10.0)</h3>

<p>To address the above code generation issues, in Sep 0.10.0 I changed the AVX-512
based parser by moving the <code class="language-plaintext highlighter-rouge">MoveMask</code> calls earlier, directly after each compare, to avoid the
mask register back and forth, as shown below. For the other parsers, <code class="language-plaintext highlighter-rouge">MoveMask</code> is only
called when necessary to reduce instructions in the “happy”/skip path.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">v0</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">v1</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">packed</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">PackUnsignedSaturate</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">v1</span><span class="p">);</span>
<span class="c1">// Pack interleaves the two vectors need to permute them back</span>
<span class="kt">var</span> <span class="n">permuteIndices</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="m">0L</span><span class="p">,</span> <span class="m">2L</span><span class="p">,</span> <span class="m">4L</span><span class="p">,</span> <span class="m">6L</span><span class="p">,</span> <span class="m">1L</span><span class="p">,</span> <span class="m">3L</span><span class="p">,</span> <span class="m">5L</span><span class="p">,</span> <span class="m">7L</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">PermuteVar8x64</span><span class="p">(</span><span class="n">packed</span><span class="p">.</span><span class="nf">AsInt64</span><span class="p">(),</span> <span class="n">permuteIndices</span><span class="p">).</span><span class="nf">AsByte</span><span class="p">();</span>

<span class="kt">var</span> <span class="n">nlsEq</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">nls</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">crsEq</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">crs</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">qtsEq</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">qts</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">spsEq</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">sps</span><span class="p">));</span>

<span class="kt">var</span> <span class="n">lineEndings</span> <span class="p">=</span> <span class="n">nlsEq</span> <span class="p">|</span> <span class="n">crsEq</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators</span> <span class="p">=</span> <span class="n">spsEq</span> <span class="p">|</span> <span class="n">lineEndings</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">specialChars</span> <span class="p">=</span> <span class="n">lineEndingsSeparators</span> <span class="p">|</span> <span class="n">qtsEq</span><span class="p">;</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="n">specialChars</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
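
<p>The reordering is safe because extracting the mask commutes with bitwise OR: the most significant bit of <code class="language-plaintext highlighter-rouge">a | b</code> is the OR of the most significant bits of <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>. Below is a small check with the portable <code class="language-plaintext highlighter-rouge">Vector512</code> APIs (.NET 8 or later), where <code class="language-plaintext highlighter-rouge">ExtractMostSignificantBits</code> stands in for Sep's <code class="language-plaintext highlighter-rouge">MoveMask</code> helper; that it behaves the same is an assumption of this sketch.</p>

```csharp
using System;
using System.Runtime.Intrinsics;

var sample = new byte[64];
Array.Fill(sample, (byte)'x');
sample[3] = (byte)';';
var (vectorOr, maskOr) = SpecialCharMasks(Vector512.Create<byte>(sample), (byte)';');
Console.WriteLine($"{vectorOr:X16} {maskOr:X16}"); // 0000000000000008 0000000000000008

// Returns the special char mask of a 64-byte block computed both ways.
static (ulong OrThenMask, ulong MaskThenOr) SpecialCharMasks(Vector512<byte> bytes, byte separator)
{
    var nls = Vector512.Create((byte)'\n');
    var crs = Vector512.Create((byte)'\r');
    var qts = Vector512.Create((byte)'"');
    var sps = Vector512.Create(separator);

    // 0.9.0 shape: OR the compare vectors, extract the mask once.
    ulong orThenMask = (Vector512.Equals(bytes, nls) | Vector512.Equals(bytes, crs)
        | Vector512.Equals(bytes, qts) | Vector512.Equals(bytes, sps))
        .ExtractMostSignificantBits();

    // 0.10.0 shape: extract a mask per compare, OR the 64-bit integers.
    ulong maskThenOr = Vector512.Equals(bytes, nls).ExtractMostSignificantBits()
        | Vector512.Equals(bytes, crs).ExtractMostSignificantBits()
        | Vector512.Equals(bytes, qts).ExtractMostSignificantBits()
        | Vector512.Equals(bytes, sps).ExtractMostSignificantBits();

    return (orThenMask, maskThenOr);
}
```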

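<p>This improves the assembly for the parser quite a bit, as can be seen below.
Basically, there are fewer instructions. Each compare result still moves from the mask
register to a general-purpose register (<code class="language-plaintext highlighter-rouge">kmovq</code>), but only once, and the ORs are then
done on 64-bit integers instead of going back and forth between mask and vector registers.</p>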

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre></td><td class="rouge-code"><pre> <span class="nf">mov</span>      <span class="nb">edi</span><span class="p">,</span> <span class="nb">r9d</span>
 <span class="nf">lea</span>      <span class="nb">rdi</span><span class="p">,</span> <span class="nv">bword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">r10</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="nb">rdi</span><span class="p">]</span>
 <span class="nf">vmovups</span>  <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>
 <span class="nf">vpackuswb</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x40</span><span class="p">]</span>
 <span class="nf">vmovups</span>  <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">reloc</span> <span class="err">@</span><span class="nv">RWD00</span><span class="p">]</span>
 <span class="nf">vpermq</span>   <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm5</span><span class="p">,</span> <span class="nv">zmm4</span>
 <span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm0</span>
 <span class="nf">kmovq</span>    <span class="nv">r15</span><span class="p">,</span> <span class="nv">k1</span>
 <span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm1</span>
 <span class="nf">kmovq</span>    <span class="nv">r13</span><span class="p">,</span> <span class="nv">k1</span>
 <span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm2</span>
 <span class="nf">kmovq</span>    <span class="nv">r12</span><span class="p">,</span> <span class="nv">k1</span>
 <span class="nf">vpcmpeqb</span> <span class="nv">k1</span><span class="p">,</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm3</span>
 <span class="nf">kmovq</span>    <span class="nb">rcx</span><span class="p">,</span> <span class="nv">k1</span>
 <span class="nf">or</span>       <span class="nv">r15</span><span class="p">,</span> <span class="nb">rcx</span>
 <span class="nf">or</span>       <span class="nv">r15</span><span class="p">,</span> <span class="nv">r13</span>
 <span class="nf">or</span>       <span class="nv">r12</span><span class="p">,</span> <span class="nv">r15</span>
 <span class="nf">je</span>       <span class="nv">SHORT</span> <span class="nv">G_M000_IG03</span>
 <span class="nf">mov</span>      <span class="nv">r13</span><span class="p">,</span> <span class="nb">rcx</span>
 <span class="nf">lea</span>      <span class="nb">rcx</span><span class="p">,</span> <span class="p">[</span><span class="nv">r12</span><span class="o">+</span><span class="nv">r8</span><span class="p">]</span>
 <span class="nf">cmp</span>      <span class="nv">r13</span><span class="p">,</span> <span class="nb">rcx</span>
 <span class="nf">je</span>       <span class="nv">G_M000_IG43</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h3 id="sepparseravx2packcmpormovemasktzcntcs-0100"><code class="language-plaintext highlighter-rouge">SepParserAvx2PackCmpOrMoveMaskTzcnt.cs</code> (0.10.0)</h3>

<p>Let’s compare the AVX-512 parser to the AVX2-based parser. The C# code is shown below.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">v0</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">v1</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="nf">Add</span><span class="p">(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">,</span> <span class="n">VecUI8</span><span class="p">.</span><span class="n">Count</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">packed</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">PackUnsignedSaturate</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">v1</span><span class="p">);</span>
<span class="c1">// Pack interleaves the two vectors need to permute them back</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">Permute4x64</span><span class="p">(</span><span class="n">packed</span><span class="p">.</span><span class="nf">AsInt64</span><span class="p">(),</span> <span class="m">0</span><span class="n">b_11_01_10_00</span><span class="p">).</span><span class="nf">AsByte</span><span class="p">();</span>

<span class="kt">var</span> <span class="n">nlsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">lineEndings</span> <span class="p">=</span> <span class="n">nlsEq</span> <span class="p">|</span> <span class="n">crsEq</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators</span> <span class="p">=</span> <span class="n">spsEq</span> <span class="p">|</span> <span class="n">lineEndings</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">specialChars</span> <span class="p">=</span> <span class="n">lineEndingsSeparators</span> <span class="p">|</span> <span class="n">qtsEq</span><span class="p">;</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">specialChars</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
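<p>The snippet ends at the check for a non-zero mask. The “MoveMask” and “Tzcnt” parts of the parser names refer to what follows. Each set bit in <code class="language-plaintext highlighter-rouge">specialCharMask</code> marks the position of a special character, and those positions can be visited lowest first using a trailing-zero count. Below is a minimal scalar sketch of that bit-iteration pattern, using a hypothetical <code class="language-plaintext highlighter-rouge">MaskPositions</code> helper. It is not Sep’s actual loop, which also has to distinguish quotes, separators and line endings.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Collections.Generic;
using System.Numerics;

public static class MaskPositions
{
    // Hypothetical helper: returns the indices of the set bits in a 32-bit
    // movemask, lowest first, i.e. the char offsets of special characters.
    public static List&lt;int&gt; Get(uint mask)
    {
        var positions = new List&lt;int&gt;();
        while (mask != 0)
        {
            // tzcnt: index of the lowest set bit
            positions.Add(BitOperations.TrailingZeroCount(mask));
            // Clear the lowest set bit (the JIT may emit blsr for this)
            mask &amp;= mask - 1;
        }
        return positions;
    }
}
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">MaskPositions.Get(0b1001_0100u)</code> returns the positions 2, 4 and 7.</p>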

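<p>The assembly below is, however, clearly more straightforward, as no mask
registers are involved. The JIT does fuse two of the ORs into an AVX-512VL
<code class="language-plaintext highlighter-rouge">vpternlogd</code>, but that operates on regular <code class="language-plaintext highlighter-rouge">ymm</code> registers. This explains why the
AVX2-based parser is faster than the old (0.9.0) AVX-512-based parser.</p>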

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="rouge-code"><pre><span class="nf">mov</span>      <span class="nb">edi</span><span class="p">,</span> <span class="nb">r9d</span>
<span class="nf">lea</span>      <span class="nb">rdi</span><span class="p">,</span> <span class="nv">bword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">r10</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vmovups</span>  <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vpackuswb</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x20</span><span class="p">]</span>
<span class="nf">vpermq</span>   <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="o">-</span><span class="mi">40</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm0</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm6</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm1</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm7</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm2</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm3</span>
<span class="nf">vpternlogd</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm6</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span>
<span class="nf">vpor</span>     <span class="nv">ymm6</span><span class="p">,</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm7</span>
<span class="nf">vpmovmskb</span> <span class="nb">r15d</span><span class="p">,</span> <span class="nv">ymm6</span>
<span class="nf">mov</span>      <span class="nb">r15d</span><span class="p">,</span> <span class="nb">r15d</span>
<span class="nf">test</span>     <span class="nv">r15</span><span class="p">,</span> <span class="nv">r15</span>
<span class="nf">je</span>       <span class="nv">SHORT</span> <span class="nv">G_M000_IG03</span>
<span class="nf">vpmovmskb</span> <span class="nb">r13d</span><span class="p">,</span> <span class="nv">ymm4</span>
<span class="nf">mov</span>      <span class="nb">r13d</span><span class="p">,</span> <span class="nb">r13d</span>
<span class="nf">lea</span>      <span class="nv">r12</span><span class="p">,</span> <span class="p">[</span><span class="nv">r15</span><span class="o">+</span><span class="nv">r8</span><span class="p">]</span>
<span class="nf">cmp</span>      <span class="nv">r13</span><span class="p">,</span> <span class="nv">r12</span>
<span class="nf">je</span>       <span class="nv">G_M000_IG43</span>
</pre></td></tr></tbody></table></code></pre></div></div>
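<p>Why is <code class="language-plaintext highlighter-rouge">Permute4x64</code> (<code class="language-plaintext highlighter-rouge">vpermq</code>) needed here? On 256-bit registers, <code class="language-plaintext highlighter-rouge">vpackuswb</code> packs each 128-bit lane independently. The result therefore holds the low half of <code class="language-plaintext highlighter-rouge">v0</code>, the low half of <code class="language-plaintext highlighter-rouge">v1</code>, the high half of <code class="language-plaintext highlighter-rouge">v0</code> and the high half of <code class="language-plaintext highlighter-rouge">v1</code>, in that order. The control <code class="language-plaintext highlighter-rouge">0b_11_01_10_00</code> reorders the four 64-bit quarters as 0, 2, 1, 3. Since 0b11011000 = 216, the control shows up as the signed byte <code class="language-plaintext highlighter-rouge">-40</code> in the listing.</p>

<p>Similarly, the <code class="language-plaintext highlighter-rouge">-2</code> immediate of <code class="language-plaintext highlighter-rouge">vpternlogd</code> is 0xFE, the truth table of a three-way OR. In that table, bit index <code class="language-plaintext highlighter-rouge">(a &lt;&lt; 2) | (b &lt;&lt; 1) | c</code> holds the result, with <code class="language-plaintext highlighter-rouge">a</code> taken from the destination operand.</p>

<p>The sketch below models these steps in scalar code. The helper names are hypothetical, and the code only mimics the instruction semantics.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class PackModel
{
    // Models 256-bit vpackuswb: within each 128-bit lane, 8 words from a
    // followed by 8 words from b, each saturated to an unsigned byte.
    public static byte[] PackUnsignedSaturate(short[] a, short[] b)
    {
        var result = new byte[32];
        for (int lane = 0; lane &lt; 2; lane++)
        {
            for (int i = 0; i &lt; 8; i++)
            {
                result[lane * 16 + i] = Saturate(a[lane * 8 + i]);
                result[lane * 16 + 8 + i] = Saturate(b[lane * 8 + i]);
            }
        }
        return result;
    }

    // Signed 16-bit to unsigned 8-bit saturation, as vpackuswb does.
    static byte Saturate(short value) =&gt; (byte)Math.Clamp((int)value, 0, 255);

    // Models vpermq: destination quarter q (64 bits) is the source quarter
    // selected by bits [2q+1:2q] of the control byte.
    public static byte[] Permute4x64(byte[] v, byte control)
    {
        var result = new byte[32];
        for (int q = 0; q &lt; 4; q++)
        {
            int source = (control &gt;&gt; (2 * q)) &amp; 0b11;
            Array.Copy(v, source * 8, result, q * 8, 8);
        }
        return result;
    }

    // vpternlogd immediate: bit (a &lt;&lt; 2) | (b &lt;&lt; 1) | c holds f(a, b, c).
    public static byte TernaryLogicImmediate(Func&lt;bool, bool, bool, bool&gt; f)
    {
        int imm = 0;
        for (int i = 0; i &lt; 8; i++)
        {
            if (f((i &amp; 4) != 0, (i &amp; 2) != 0, (i &amp; 1) != 0)) { imm |= 1 &lt;&lt; i; }
        }
        return (byte)imm;
    }
}
</code></pre></div></div>

<p>Suppose <code class="language-plaintext highlighter-rouge">a</code> holds 0 to 15 and <code class="language-plaintext highlighter-rouge">b</code> holds 16 to 31. Then <code class="language-plaintext highlighter-rouge">PackUnsignedSaturate</code> yields 0-7, 16-23, 8-15, 24-31, and <code class="language-plaintext highlighter-rouge">Permute4x64(packed, 0b_11_01_10_00)</code> restores 0-31 in order. Likewise, <code class="language-plaintext highlighter-rouge">unchecked((sbyte)0b_11_01_10_00)</code> is -40, and <code class="language-plaintext highlighter-rouge">TernaryLogicImmediate((a, b, c) =&gt; a | b | c)</code> is 0xFE, i.e. -2 as a signed byte.</p>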

<h3 id="sepparseravx512to256cmpormovemasktzcntcs-0100"><code class="language-plaintext highlighter-rouge">SepParserAvx512To256CmpOrMoveMaskTzcnt.cs</code> (0.10.0)</h3>

<p>Given that even the tweaked AVX-512 (0.10.0) parser still had issues with mask
registers, I kept thinking there might be a more straightforward approach. After
some searching, and some unfruitful discussions with LLMs, I figured out that one
could just use AVX-512 to load 32 <code class="language-plaintext highlighter-rouge">char</code>s (512 bits). The 16-bit values
can then be narrowed to 8-bit bytes with saturation into a 256-bit register using
<code class="language-plaintext highlighter-rouge">ConvertToVector256ByteWithSaturation</code> (<code class="language-plaintext highlighter-rouge">vpmovuswb</code>), as shown below. The
compares then operate on 256-bit registers, so no mask registers are involved.
This “only” parses 32 <code class="language-plaintext highlighter-rouge">char</code>s at a time, but it is much simpler and avoids
the mask register issue.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">v</span> <span class="p">=</span> <span class="n">ReadUnaligned</span><span class="p">&lt;</span><span class="n">VecUI16</span><span class="p">&gt;(</span><span class="k">ref</span> <span class="n">byteRef</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="n">ISA</span><span class="p">.</span><span class="nf">ConvertToVector256ByteWithSaturation</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">nlsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">nls</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">crsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">crs</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">qtsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">qts</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">spsEq</span> <span class="p">=</span> <span class="n">Vec</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="n">sps</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">lineEndings</span> <span class="p">=</span> <span class="n">nlsEq</span> <span class="p">|</span> <span class="n">crsEq</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">lineEndingsSeparators</span> <span class="p">=</span> <span class="n">spsEq</span> <span class="p">|</span> <span class="n">lineEndings</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">specialChars</span> <span class="p">=</span> <span class="n">lineEndingsSeparators</span> <span class="p">|</span> <span class="n">qtsEq</span><span class="p">;</span>

<span class="c1">// Optimize for the case of no special character</span>
<span class="kt">var</span> <span class="n">specialCharMask</span> <span class="p">=</span> <span class="nf">MoveMask</span><span class="p">(</span><span class="n">specialChars</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">specialCharMask</span> <span class="p">!=</span> <span class="m">0u</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
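<p>Saturation, rather than plain truncation, is what makes narrowing <code class="language-plaintext highlighter-rouge">char</code>s to bytes safe here. A UTF-16 code unit above 255 saturates to 255, so it can never equal an ASCII special character such as <code class="language-plaintext highlighter-rouge">,</code>, <code class="language-plaintext highlighter-rouge">"</code>, <code class="language-plaintext highlighter-rouge">\r</code> or <code class="language-plaintext highlighter-rouge">\n</code>. Truncating to the low byte, by contrast, would turn <code class="language-plaintext highlighter-rouge">'Ĭ'</code> (U+012C) into <code class="language-plaintext highlighter-rouge">','</code> (0x2C). The minimal scalar sketch below shows the difference per element; the helper names are hypothetical.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class Narrowing
{
    // Unsigned saturation, as vpmovuswb does per element: values above 255 become 255.
    public static byte Saturate(char c) =&gt; (byte)Math.Min((int)c, 255);

    // Plain truncation keeps only the low 8 bits.
    public static byte Truncate(char c) =&gt; unchecked((byte)c);
}
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">Narrowing.Truncate('\u012C')</code> equals <code class="language-plaintext highlighter-rouge">(byte)','</code>, a false separator. <code class="language-plaintext highlighter-rouge">Narrowing.Saturate('\u012C')</code> is 255, which matches no special character.</p>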

<p>The resulting assembly is much simpler and more direct (closer to AVX2). It
avoids the mask register issues, and the saturated conversion is also more
straightforward. No permutation is needed, since the narrowed bytes are already
in order in the <code class="language-plaintext highlighter-rouge">ymm4</code> register (the lower 256 bits of <code class="language-plaintext highlighter-rouge">zmm4</code>).</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre><span class="nf">mov</span>      <span class="nb">edi</span><span class="p">,</span> <span class="nb">r9d</span>
<span class="nf">lea</span>      <span class="nb">rdi</span><span class="p">,</span> <span class="nv">bword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nv">r10</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vmovups</span>  <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmmword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>
<span class="nf">vpmovuswb</span> <span class="nv">zmm4</span><span class="p">,</span> <span class="nv">zmm4</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm0</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm6</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm1</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm7</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm2</span>
<span class="nf">vpcmpeqb</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm3</span>
<span class="nf">vpternlogd</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm4</span><span class="p">,</span> <span class="nv">ymm6</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span>
<span class="nf">vpor</span>     <span class="nv">ymm6</span><span class="p">,</span> <span class="nv">ymm5</span><span class="p">,</span> <span class="nv">ymm7</span>
<span class="nf">vpmovmskb</span> <span class="nb">r15d</span><span class="p">,</span> <span class="nv">ymm6</span>
<span class="nf">mov</span>      <span class="nb">r15d</span><span class="p">,</span> <span class="nb">r15d</span>
<span class="nf">test</span>     <span class="nv">r15</span><span class="p">,</span> <span class="nv">r15</span>
<span class="nf">je</span>       <span class="nv">SHORT</span> <span class="nv">G_M000_IG03</span>
<span class="nf">vpmovmskb</span> <span class="nb">r13d</span><span class="p">,</span> <span class="nv">ymm4</span>
<span class="nf">mov</span>      <span class="nb">r13d</span><span class="p">,</span> <span class="nb">r13d</span>
<span class="nf">lea</span>      <span class="nv">r12</span><span class="p">,</span> <span class="p">[</span><span class="nv">r15</span><span class="o">+</span><span class="nv">r8</span><span class="p">]</span>
<span class="nf">cmp</span>      <span class="nv">r13</span><span class="p">,</span> <span class="nv">r12</span>
<span class="nf">je</span>       <span class="nv">G_M000_IG43</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>And this is what brings Sep parsing up to a staggering 21 GB/s on the 9950X! 🚀</p>
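<p>To try the AVX-512-to-256 idea in isolation, here is a minimal, self-contained sketch. It is not Sep’s code. It computes the special character mask for 32 <code class="language-plaintext highlighter-rouge">char</code>s with <code class="language-plaintext highlighter-rouge">Avx512BW.ConvertToVector256ByteWithSaturation</code>, and falls back to a scalar loop when AVX-512BW is not available.</p>

<p>The sketch assumes .NET 8 or later, a hard-coded <code class="language-plaintext highlighter-rouge">,</code> separator and a hypothetical <code class="language-plaintext highlighter-rouge">SpecialChars.Mask32</code> method. Bit <em>i</em> of the result is set when <code class="language-plaintext highlighter-rouge">chars[i]</code> is a separator, quote, <code class="language-plaintext highlighter-rouge">\r</code> or <code class="language-plaintext highlighter-rouge">\n</code>.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class SpecialChars
{
    // Hypothetical helper: bit i is set if chars[i] is ',', '"', '\r' or '\n'.
    public static uint Mask32(ReadOnlySpan&lt;char&gt; chars)
    {
        if (chars.Length &lt; 32) { throw new ArgumentException("Need at least 32 chars.", nameof(chars)); }

        if (Avx512BW.IsSupported)
        {
            // Load 32 chars (512 bits) as unsigned 16-bit values.
            var v = Vector512.Create(MemoryMarshal.Cast&lt;char, ushort&gt;(chars));
            // vpmovuswb: narrow to 32 bytes with unsigned saturation (above 255 becomes 255).
            var bytes = Avx512BW.ConvertToVector256ByteWithSaturation(v);
            var special =
                Vector256.Equals(bytes, Vector256.Create((byte)',')) |
                Vector256.Equals(bytes, Vector256.Create((byte)'"')) |
                Vector256.Equals(bytes, Vector256.Create((byte)'\r')) |
                Vector256.Equals(bytes, Vector256.Create((byte)'\n'));
            // One bit per byte, like vpmovmskb.
            return special.ExtractMostSignificantBits();
        }

        // Scalar fallback with the same result.
        uint mask = 0;
        for (int i = 0; i &lt; 32; i++)
        {
            char c = chars[i];
            if (c == ',' || c == '"' || c == '\r' || c == '\n') { mask |= 1u &lt;&lt; i; }
        }
        return mask;
    }
}
</code></pre></div></div>

<p>Both paths return the same mask, so the sketch can be run on any machine. Only on AVX-512BW capable CPUs does it take the <code class="language-plaintext highlighter-rouge">vpmovuswb</code>-based path. The exact instructions emitted are up to the JIT.</p>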

<h3 id="all-parsers-benchmarks">All Parsers Benchmarks</h3>

<p>Given all the parsers available in Sep, I have added a benchmark that uses the
aforementioned environment variable to run every parser. It compares their
low-level row parsing performance on the same CPU, here the AMD 9950X.</p>

<p>The new AVX-512-to-256 parser is the fastest of all, hitting ~21.6 GB/s, but
the Vector256/AVX2-based parsers are not far behind (about 5% slower).
<code class="language-plaintext highlighter-rouge">SepParserVector256NrwCmpExtMsbTzcnt</code> is the cross-platform <code class="language-plaintext highlighter-rouge">Vector256</code>-based
parser, and notably it is now on par with the AVX2 one. The other cross-platform
<code class="language-plaintext highlighter-rouge">Vector128</code>- and <code class="language-plaintext highlighter-rouge">Vector512</code>-based parsers are not: they are still fast, but
roughly 11-12% slower than the <code class="language-plaintext highlighter-rouge">Vector256</code> one. Even worse, the <code class="language-plaintext highlighter-rouge">Vector512</code> one is
slightly slower than the <code class="language-plaintext highlighter-rouge">Vector128</code> one.</p>

<p><code class="language-plaintext highlighter-rouge">SepParserIndexOfAny</code> is far behind (~7.7x slower than the fastest parser).
This should make it clear that any idea of using <code class="language-plaintext highlighter-rouge">IndexOfAny</code> to compete with
Sep is not realistic. 😉 <code class="language-plaintext highlighter-rouge">Vector64</code> is not hardware accelerated on the 9950X and
is therefore very slow; it is only there for completeness.</p>

<table>
  <thead>
    <tr>
      <th>Parser</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">ns/row</th>
      <th style="text-align: right">Mean</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SepParserAvx512To256CmpOrMoveMaskTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">21597.7</code></td>
      <td style="text-align: right">27.0</td>
      <td style="text-align: right">1.351 ms</td>
    </tr>
    <tr>
      <td>SepParserVector256NrwCmpExtMsbTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">20608.5</code></td>
      <td style="text-align: right">28.3</td>
      <td style="text-align: right">1.416 ms</td>
    </tr>
    <tr>
      <td>SepParserAvx2PackCmpOrMoveMaskTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">20599.3</code></td>
      <td style="text-align: right">28.3</td>
      <td style="text-align: right">1.417 ms</td>
    </tr>
    <tr>
      <td>SepParserAvx512PackCmpOrMoveMaskTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">19944.3</code></td>
      <td style="text-align: right">29.3</td>
      <td style="text-align: right">1.463 ms</td>
    </tr>
    <tr>
      <td>SepParserAvx256To128CmpOrMoveMaskTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">19465.5</code></td>
      <td style="text-align: right">30.0</td>
      <td style="text-align: right">1.499 ms</td>
    </tr>
    <tr>
      <td>SepParserSse2PackCmpOrMoveMaskTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">19312.5</code></td>
      <td style="text-align: right">30.2</td>
      <td style="text-align: right">1.511 ms</td>
    </tr>
    <tr>
      <td>SepParserVector128NrwCmpExtMsbTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">18252.1</code></td>
      <td style="text-align: right">32.0</td>
      <td style="text-align: right">1.599 ms</td>
    </tr>
    <tr>
      <td>SepParserVector512NrwCmpExtMsbTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">18067.4</code></td>
      <td style="text-align: right">32.3</td>
      <td style="text-align: right">1.615 ms</td>
    </tr>
    <tr>
      <td>SepParserIndexOfAny</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">2787.0</code></td>
      <td style="text-align: right">209.4</td>
      <td style="text-align: right">10.471 ms</td>
    </tr>
    <tr>
      <td>SepParserVector64NrwCmpExtMsbTzcnt</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">459.9</code></td>
      <td style="text-align: right">1268.9</td>
      <td style="text-align: right">63.446 ms</td>
    </tr>
  </tbody>
</table>
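<p>The relative differences between parsers can be computed directly from the MB/s column of the table above. The small sketch below is plain arithmetic with hypothetical helper names, not part of Sep or its benchmarks.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public static class Relative
{
    // How much slower (in %) a parser is than a baseline, from MB/s (higher is better).
    public static double PercentSlower(double baselineMBs, double otherMBs) =&gt;
        (1.0 - otherMBs / baselineMBs) * 100.0;

    // Throughput ratio, i.e. how many times faster one parser is than another.
    public static double TimesFaster(double fasterMBs, double slowerMBs) =&gt;
        fasterMBs / slowerMBs;
}
</code></pre></div></div>

<p>With the numbers from the table:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">PercentSlower(21597.7, 20608.5)</code> is about 4.6 (Vector256 vs AVX-512-to-256).</li>
  <li><code class="language-plaintext highlighter-rouge">PercentSlower(20608.5, 18252.1)</code> is about 11.4 and <code class="language-plaintext highlighter-rouge">PercentSlower(20608.5, 18067.4)</code> is about 12.3 (Vector128 and Vector512 vs Vector256).</li>
  <li><code class="language-plaintext highlighter-rouge">TimesFaster(21597.7, 2787.0)</code> is about 7.7 (AVX-512-to-256 vs IndexOfAny).</li>
</ul>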

<h3 id="top-level-5950x-vs-9950x-benchmarks">Top Level 5950X vs 9950X Benchmarks</h3>

<p>Finally, the tables below show the top-level benchmarks on the 5950X and 9950X
CPUs for the package assets and floats data.</p>

<p>Note how on the 9950X the one million package assets rows are parsed in just 72
ms by <code class="language-plaintext highlighter-rouge">Sep_MT</code> (multi-threaded Sep), compared to 119 ms on the 5950X. That is
8 GB/s on the 9950X vs 4.9 GB/s on the 5950X. <strong>~8 GB/s!</strong> 🚀</p>

<p>Similarly, for floats, multi-threaded Sep parses floating-point CSV data at 8
GB/s on the 9950X. <strong>~8 GB/s</strong>! 🌪</p>

<p>That’s about a 1.5x-1.65x improvement for multi-threaded Sep going from the
5950X to the 9950X, similar to the low-level benchmarks. That’s a significant
generational improvement in CPU performance. Kudos to AMD and TSMC.</p>

<h5 id="package-assets-5950x">Package Assets 5950X</h5>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Rows</th>
      <th style="text-align: right">Mean</th>
      <th style="text-align: right">Ratio</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">ns/row</th>
      <th style="text-align: right">Allocated</th>
      <th style="text-align: right">Alloc Ratio</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Sep</td>
      <td>1000000</td>
      <td style="text-align: right">432.887 ms</td>
      <td style="text-align: right">1.00</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1348.6</code></td>
      <td style="text-align: right">432.9</td>
      <td style="text-align: right">260.41 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sep_MT</td>
      <td>1000000</td>
      <td style="text-align: right">119.430 ms</td>
      <td style="text-align: right">0.28</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">4888.1</code></td>
      <td style="text-align: right">119.4</td>
      <td style="text-align: right">261.39 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sylvan</td>
      <td>1000000</td>
      <td style="text-align: right">559.550 ms</td>
      <td style="text-align: right">1.29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1043.3</code></td>
      <td style="text-align: right">559.6</td>
      <td style="text-align: right">260.57 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>ReadLine_</td>
      <td>1000000</td>
      <td style="text-align: right">573.637 ms</td>
      <td style="text-align: right">1.33</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1017.7</code></td>
      <td style="text-align: right">573.6</td>
      <td style="text-align: right">1991.05 MB</td>
      <td style="text-align: right">7.65</td>
    </tr>
    <tr>
      <td>CsvHelper</td>
      <td>1000000</td>
      <td style="text-align: right">1,537.602 ms</td>
      <td style="text-align: right">3.55</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">379.7</code></td>
      <td style="text-align: right">1537.6</td>
      <td style="text-align: right">260.58 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
  </tbody>
</table>

<h5 id="package-assets-9950x">Package Assets 9950X</h5>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Rows</th>
      <th style="text-align: right">Mean</th>
      <th style="text-align: right">Ratio</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">ns/row</th>
      <th style="text-align: right">Allocated</th>
      <th style="text-align: right">Alloc Ratio</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Sep</td>
      <td>1000000</td>
      <td style="text-align: right">291.979 ms</td>
      <td style="text-align: right">1.00</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1999.4</code></td>
      <td style="text-align: right">292.0</td>
      <td style="text-align: right">260.41 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sep_MT</td>
      <td>1000000</td>
      <td style="text-align: right">72.213 ms</td>
      <td style="text-align: right">0.25</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">8084.1</code></td>
      <td style="text-align: right">72.2</td>
      <td style="text-align: right">261.63 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sylvan</td>
      <td>1000000</td>
      <td style="text-align: right">413.265 ms</td>
      <td style="text-align: right">1.42</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1412.6</code></td>
      <td style="text-align: right">413.3</td>
      <td style="text-align: right">260.57 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>ReadLine_</td>
      <td>1000000</td>
      <td style="text-align: right">377.033 ms</td>
      <td style="text-align: right">1.29</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1548.4</code></td>
      <td style="text-align: right">377.0</td>
      <td style="text-align: right">1991.04 MB</td>
      <td style="text-align: right">7.65</td>
    </tr>
    <tr>
      <td>CsvHelper</td>
      <td>1000000</td>
      <td style="text-align: right">1,005.323 ms</td>
      <td style="text-align: right">3.44</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">580.7</code></td>
      <td style="text-align: right">1005.3</td>
      <td style="text-align: right">260.58 MB</td>
      <td style="text-align: right">1.00</td>
    </tr>
  </tbody>
</table>

<h5 id="floats-5950x">Floats 5950X</h5>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Rows</th>
      <th style="text-align: right">Mean</th>
      <th style="text-align: right">Ratio</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">ns/row</th>
      <th style="text-align: right">Allocated</th>
      <th style="text-align: right">Alloc Ratio</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Sep</td>
      <td>25000</td>
      <td style="text-align: right">20.297 ms</td>
      <td style="text-align: right">1.00</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1001.1</code></td>
      <td style="text-align: right">811.9</td>
      <td style="text-align: right">7.97 KB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sep_MT</td>
      <td>25000</td>
      <td style="text-align: right">3.780 ms</td>
      <td style="text-align: right">0.19</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">5375.6</code></td>
      <td style="text-align: right">151.2</td>
      <td style="text-align: right">179.49 KB</td>
      <td style="text-align: right">22.51</td>
    </tr>
    <tr>
      <td>Sylvan</td>
      <td>25000</td>
      <td style="text-align: right">52.343 ms</td>
      <td style="text-align: right">2.58</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">388.2</code></td>
      <td style="text-align: right">2093.7</td>
      <td style="text-align: right">18.88 KB</td>
      <td style="text-align: right">2.37</td>
    </tr>
    <tr>
      <td>ReadLine_</td>
      <td>25000</td>
      <td style="text-align: right">68.698 ms</td>
      <td style="text-align: right">3.38</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">295.8</code></td>
      <td style="text-align: right">2747.9</td>
      <td style="text-align: right">73493.12 KB</td>
      <td style="text-align: right">9,215.89</td>
    </tr>
    <tr>
      <td>CsvHelper</td>
      <td>25000</td>
      <td style="text-align: right">100.913 ms</td>
      <td style="text-align: right">4.97</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">201.4</code></td>
      <td style="text-align: right">4036.5</td>
      <td style="text-align: right">22061.69 KB</td>
      <td style="text-align: right">2,766.49</td>
    </tr>
  </tbody>
</table>

<h5 id="floats-9950x">Floats 9950X</h5>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Rows</th>
      <th style="text-align: right">Mean</th>
      <th style="text-align: right">Ratio</th>
      <th style="text-align: right">MB/s</th>
      <th style="text-align: right">ns/row</th>
      <th style="text-align: right">Allocated</th>
      <th style="text-align: right">Alloc Ratio</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Sep</td>
      <td>25000</td>
      <td style="text-align: right">16.182 ms</td>
      <td style="text-align: right">1.00</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">1255.7</code></td>
      <td style="text-align: right">647.3</td>
      <td style="text-align: right">7.94 KB</td>
      <td style="text-align: right">1.00</td>
    </tr>
    <tr>
      <td>Sep_MT</td>
      <td>25000</td>
      <td style="text-align: right">2.497 ms</td>
      <td style="text-align: right">0.15</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">8136.8</code></td>
      <td style="text-align: right">99.9</td>
      <td style="text-align: right">179.81 KB</td>
      <td style="text-align: right">22.64</td>
    </tr>
    <tr>
      <td>Sylvan</td>
      <td>25000</td>
      <td style="text-align: right">38.800 ms</td>
      <td style="text-align: right">2.40</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">523.7</code></td>
      <td style="text-align: right">1552.0</td>
      <td style="text-align: right">18.72 KB</td>
      <td style="text-align: right">2.36</td>
    </tr>
    <tr>
      <td>ReadLine_</td>
      <td>25000</td>
      <td style="text-align: right">54.117 ms</td>
      <td style="text-align: right">3.34</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">375.5</code></td>
      <td style="text-align: right">2164.7</td>
      <td style="text-align: right">73493.05 KB</td>
      <td style="text-align: right">9,253.27</td>
    </tr>
    <tr>
      <td>CsvHelper</td>
      <td>25000</td>
      <td style="text-align: right">71.601 ms</td>
      <td style="text-align: right">4.42</td>
      <td style="text-align: right"><code class="language-plaintext highlighter-rouge">283.8</code></td>
      <td style="text-align: right">2864.1</td>
      <td style="text-align: right">22061.55 KB</td>
      <td style="text-align: right">2,777.70</td>
    </tr>
  </tbody>
</table>
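<p>The 5950X to 9950X speedups can be computed from the MB/s columns of the four tables above as a simple throughput ratio. The minimal sketch below uses a hypothetical <code class="language-plaintext highlighter-rouge">Speedups</code> class with values copied from the tables.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class Speedups
{
    // (benchmark, 5950X MB/s, 9950X MB/s), copied from the tables above.
    public static readonly (string Name, double Mbs5950X, double Mbs9950X)[] Rows =
    {
        ("Package Assets Sep", 1348.6, 1999.4),
        ("Package Assets Sep_MT", 4888.1, 8084.1),
        ("Floats Sep", 1001.1, 1255.7),
        ("Floats Sep_MT", 5375.6, 8136.8),
    };

    // Speedup from 5950X to 9950X as a throughput ratio (higher is better).
    public static double Of(double mbs5950X, double mbs9950X) =&gt; mbs9950X / mbs5950X;

    public static void Print()
    {
        foreach (var (name, before, after) in Rows)
        {
            Console.WriteLine($"{name}: {Of(before, after):F2}x");
        }
    }
}
</code></pre></div></div>

<p>This gives about 1.65x and 1.51x for <code class="language-plaintext highlighter-rouge">Sep_MT</code> on package assets and floats. Single-threaded Sep gains less: about 1.48x and 1.25x.</p>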

<h2 id="-summary-highlights">🌟 Summary Highlights</h2>

<p>AI generated summary highlights 😁</p>

<ul>
  <li>🚀 <strong>Blazing Fast Parsing</strong>: Sep 0.10.0 achieves an incredible <strong>21 GB/s</strong> CSV
parsing speed on AMD 9950X, a <strong>~3x improvement</strong> since its first release in
2023!</li>
  <li>🖥 <strong>Hardware Boost</strong>: Upgrading from AMD 5950X (Zen 3) to AMD 9950X (Zen 5)
delivers a <strong>~1.6x performance gain</strong>, thanks to AVX-512 support and higher
clock speeds.</li>
  <li>🧠 <strong>Smarter Parsers</strong>: The new <strong>AVX-512-to-256 parser</strong> circumvents mask
register inefficiencies, outperforming AVX2 and older AVX-512 parsers,
achieving <strong>~21 GB/s</strong>!</li>
  <li>📊 <strong>Cross-Platform Excellence</strong>: The <code class="language-plaintext highlighter-rouge">Vector256</code> based cross-platform parser
is now on par with AVX2, ensuring top-tier performance across platforms.</li>
  <li>🔬 <strong>Deep Dive</strong>: Explored .NET 9.0 JIT optimizations, SIMD assembly, and
parser design for CSV parsing.</li>
  <li>🏆 <strong>Multi-Threaded Power</strong>: Sep parses <strong>1 million rows in just 72 ms</strong> on
the 9950X, achieving <strong>8 GB/s</strong> for real-world CSV workloads.</li>
  <li>🔧 <strong>Continuous Improvement</strong>: Incremental optimizations and hardware
advancements have propelled Sep to new heights in just under 2 years.</li>
</ul>

<p>🎉 Sep is a testament to the power of software and hardware working together
to push the boundaries of performance!</p>

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sep 0.10.0 was released April 22nd, 2025 with optimizations for AVX-512 capable CPUs like the AMD 9950X (Zen 5) and updated benchmarks including the 9950X. Sep now achieves a staggering 21 GB/s on the 9950X for the low-level CSV parsing. 🚀 Before 0.10.0, Sep achieved ~18 GB/s on 9950X.]]></summary></entry><entry><title type="html">Sep 0.9.0 - Async Support</title><link href="https://nietras.com/2025/05/08/sep-0-9-0/" rel="alternate" type="text/html" title="Sep 0.9.0 - Async Support" /><published>2025-05-08T00:00:00+00:00</published><updated>2025-05-08T00:00:00+00:00</updated><id>https://nietras.com/2025/05/08/sep-0.9.0</id><content type="html" xml:base="https://nietras.com/2025/05/08/sep-0-9-0/"><![CDATA[<p>Sep 0.9.0 was released February 1st, 2025 - earlier this year - with a major new
feature: Async support for both <code class="language-plaintext highlighter-rouge">SepReader</code> and <code class="language-plaintext highlighter-rouge">SepWriter</code>.</p>

<p>See the <a href="https://github.com/nietras/Sep/releases/tag/v0.9.0">v0.9.0 release</a> for all
changes (the release includes a few other niceties) and the <a href="https://github.com/nietras/Sep">Sep README on
GitHub</a> for full details. Below is a (belated)
blog post focusing on the pragmatic approach used to add async support. First,
however, a copy of the async support section from the Sep README introduces the
feature, followed by details on how this support was added.</p>

<h2 id="async-support">Async Support</h2>

<p>Sep supports efficient <code class="language-plaintext highlighter-rouge">ValueTask</code> based asynchronous reading and writing.</p>

<p>However, given both <code class="language-plaintext highlighter-rouge">SepReader.Row</code> and <code class="language-plaintext highlighter-rouge">SepWriter.Row</code> are <code class="language-plaintext highlighter-rouge">ref struct</code>s, as
they point to internal state and should only be used one at a time,
<code class="language-plaintext highlighter-rouge">async/await</code> usage is only supported on C# 13.0+, which adds support for <strong>“ref
and unsafe in iterators and async methods”</strong> as covered in <a href="https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-13">What’s new in C#
13</a>. Please
consult that for details on the limitations and constraints this entails.</p>
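<p>To make the C# 13.0 requirement concrete, below is a minimal BCL-only sketch (not Sep code) of an async method with a <code class="language-plaintext highlighter-rouge">ReadOnlySpan</code> local, which is a <code class="language-plaintext highlighter-rouge">ref struct</code> just like <code class="language-plaintext highlighter-rouge">SepReader.Row</code>. It compiles on C# 13.0+ (e.g. with the .NET 9 SDK) because the span is never used across an <code class="language-plaintext highlighter-rouge">await</code>. <code class="language-plaintext highlighter-rouge">CountSeparatorsAsync</code> is a hypothetical helper.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Threading.Tasks;

// Requires C# 13.0+: ref struct locals are allowed in async methods.
var separatorCount = 0;
await CountSeparatorsAsync("Sep;🚀;1;1.2;0.1;0.5");
Console.WriteLine(separatorCount); // 5

// Hypothetical helper, not part of Sep.
async Task CountSeparatorsAsync(string line)
{
    await Task.Yield();
    {
        // A span of char is a ref struct. This is allowed since C# 13.0
        // because the span is not used across an await.
        var span = line.AsSpan();
        foreach (var c in span)
        {
            if (c == ';') { separatorCount++; }
        }
    }
    await Task.Yield();
}
</code></pre></div></div>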

<p>Similarly, <code class="language-plaintext highlighter-rouge">SepReader</code> only implements <code class="language-plaintext highlighter-rouge">IAsyncEnumerable&lt;SepReader.Row&gt;</code> (and
<code class="language-plaintext highlighter-rouge">IEnumerable&lt;SepReader.Row&gt;</code>) for .NET 9.0+/C# 13.0+, since only from then on are these interfaces
annotated with <code class="language-plaintext highlighter-rouge">allows ref struct</code> for <code class="language-plaintext highlighter-rouge">T</code>.</p>

<p>Async support is provided on the existing <code class="language-plaintext highlighter-rouge">SepReader</code> and <code class="language-plaintext highlighter-rouge">SepWriter</code> types
similar to how <code class="language-plaintext highlighter-rouge">TextReader</code> and <code class="language-plaintext highlighter-rouge">TextWriter</code> support both sync and async usage.
This means you as a developer are responsible for calling async methods and
using <code class="language-plaintext highlighter-rouge">await</code> when necessary. See below for a simple example and consult tests
on GitHub for more examples.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="s">"""
</span>           <span class="n">A</span><span class="p">;</span><span class="n">B</span><span class="p">;</span><span class="n">C</span><span class="p">;</span><span class="n">D</span><span class="p">;</span><span class="n">E</span><span class="p">;</span><span class="n">F</span>
           <span class="n">Sep</span><span class="p">;</span><span class="err">🚀</span><span class="p">;</span><span class="m">1</span><span class="p">;</span><span class="m">1.2</span><span class="p">;</span><span class="m">0.1</span><span class="p">;</span><span class="m">0.5</span>
           <span class="n">CSV</span><span class="p">;</span><span class="err">✅</span><span class="p">;</span><span class="m">2</span><span class="p">;</span><span class="m">2.2</span><span class="p">;</span><span class="m">0.2</span><span class="p">;</span><span class="m">1.5</span>
           
           <span class="s">"""; // Empty line at end is for line ending
</span>
<span class="k">using</span> <span class="nn">var</span> <span class="n">reader</span> <span class="p">=</span> <span class="k">await</span> <span class="n">Sep</span><span class="p">.</span><span class="nf">Reader</span><span class="p">().</span><span class="nf">FromTextAsync</span><span class="p">(</span><span class="n">text</span><span class="p">);</span>
<span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">writer</span> <span class="p">=</span> <span class="n">reader</span><span class="p">.</span><span class="n">Spec</span><span class="p">.</span><span class="nf">Writer</span><span class="p">().</span><span class="nf">ToText</span><span class="p">();</span>
<span class="k">await</span> <span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">readRow</span> <span class="k">in</span> <span class="n">reader</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">writeRow</span> <span class="p">=</span> <span class="n">writer</span><span class="p">.</span><span class="nf">NewRow</span><span class="p">(</span><span class="n">readRow</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Assert</span><span class="p">.</span><span class="nf">AreEqual</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">writer</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Note how for <code class="language-plaintext highlighter-rouge">SepReader</code> the <code class="language-plaintext highlighter-rouge">FromTextAsync</code> method is suffixed with <code class="language-plaintext highlighter-rouge">Async</code> to
indicate async creation. This is because the reader has to read the first row
of the source at creation to determine both the separator and, if the file has a header,
the column names of the header. The <code class="language-plaintext highlighter-rouge">From*Async</code> call then has to be <code class="language-plaintext highlighter-rouge">await</code>ed.
After that, rows can be enumerated asynchronously simply by putting <code class="language-plaintext highlighter-rouge">await</code>
before <code class="language-plaintext highlighter-rouge">foreach</code>. If one forgets to do that, the rows will be enumerated
synchronously.</p>
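<p>Why creation has to be async can be sketched with plain BCL types: determining the separator and column names means reading the first line, which is I/O. The sketch below is hypothetical. The candidate list and the “first candidate found” heuristic in <code class="language-plaintext highlighter-rouge">DetectSeparator</code> are assumptions for illustration, not Sep’s actual detection logic.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;

var text = "A;B;C\nSep;1;1.2\n";
using var textReader = new StringReader(text);

// The header line must be read (I/O) before the separator is known,
// hence an async reader has to await this during creation.
var header = await textReader.ReadLineAsync() ?? "";
var separator = DetectSeparator(header);
var colNames = header.Split(separator);
Console.WriteLine($"'{separator}' {string.Join(", ", colNames)}"); // ';' A, B, C

// Hypothetical heuristic: pick the first candidate present in the header.
static char DetectSeparator(string header)
{
    foreach (var candidate in new[] { ';', ',', '\t', '|' })
    {
        if (header.Contains(candidate)) { return candidate; }
    }
    return ';';
}
</code></pre></div></div>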

<p>For <code class="language-plaintext highlighter-rouge">SepWriter</code> the usage is somewhat reversed. The <code class="language-plaintext highlighter-rouge">To*</code> methods have no <code class="language-plaintext highlighter-rouge">Async</code>
variants, since creation is synchronous; for example, a <code class="language-plaintext highlighter-rouge">StreamWriter</code> is created by a
simple constructor call. Nothing is written until a header or row is defined and
<code class="language-plaintext highlighter-rouge">Dispose</code>/<code class="language-plaintext highlighter-rouge">DisposeAsync</code> is called on the row.</p>

<p>For <code class="language-plaintext highlighter-rouge">SepReader</code> nothing needs to be disposed asynchronously, so <code class="language-plaintext highlighter-rouge">using</code> does not
require <code class="language-plaintext highlighter-rouge">await</code>. However, for <code class="language-plaintext highlighter-rouge">SepWriter</code> disposing may have to write/flush data
to the underlying <code class="language-plaintext highlighter-rouge">TextWriter</code>, and hence it should use <code class="language-plaintext highlighter-rouge">DisposeAsync</code>, so you
must use <code class="language-plaintext highlighter-rouge">await using</code>.</p>
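<p>The same holds for the BCL’s <code class="language-plaintext highlighter-rouge">StreamWriter</code>, which buffers writes and flushes them on dispose. Below is a minimal sketch (not Sep code) showing that data only reaches the underlying stream once <code class="language-plaintext highlighter-rouge">DisposeAsync</code> runs via <code class="language-plaintext highlighter-rouge">await using</code>.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Text;

var stream = new MemoryStream();
long lengthBeforeDispose;
await using (var writer = new StreamWriter(stream))
{
    await writer.WriteAsync("A;B");
    // Still buffered in the writer, nothing written to the stream yet.
    lengthBeforeDispose = stream.Length;
}
// DisposeAsync flushed the buffer asynchronously and closed the stream;
// MemoryStream.ToArray still works on a closed stream.
var written = Encoding.UTF8.GetString(stream.ToArray());
Console.WriteLine($"{lengthBeforeDispose} {written}"); // 0 A;B
</code></pre></div></div>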

<p>To support cancellation, many methods have overloads that accept a
<code class="language-plaintext highlighter-rouge">CancellationToken</code>, like the <code class="language-plaintext highlighter-rouge">From*Async</code> methods for creating a <code class="language-plaintext highlighter-rouge">SepReader</code> or
<code class="language-plaintext highlighter-rouge">NewRow</code> for <code class="language-plaintext highlighter-rouge">SepWriter</code>. Consult the <a href="https://github.com/nietras/Sep#public-api-reference">Public API
Reference</a> in the Sep README for the full set of available methods.</p>
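<p>How a token flows into async reading can be sketched with the BCL alone (this is not Sep’s API): <code class="language-plaintext highlighter-rouge">TextReader.ReadLineAsync(CancellationToken)</code> (.NET 7+) observes a canceled token, and the <code class="language-plaintext highlighter-rouge">await</code> then throws an <code class="language-plaintext highlighter-rouge">OperationCanceledException</code>.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Threading;

using var reader = new StringReader("A;B\n1;2\n");
using var cts = new CancellationTokenSource();

// Read the header normally.
var header = await reader.ReadLineAsync(cts.Token);

// Request cancellation, e.g. because the user aborted the operation.
cts.Cancel();
var canceled = false;
try
{
    await reader.ReadLineAsync(cts.Token);
}
catch (OperationCanceledException)
{
    canceled = true;
}
Console.WriteLine($"{header} {canceled}"); // A;B True
</code></pre></div></div>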

<p>Additionally, both <a href="https://github.com/nietras/Sep#sepreaderoptions">SepReaderOptions</a> and
<a href="https://github.com/nietras/Sep#sepwriteroptions">SepWriterOptions</a> feature the <code class="language-plaintext highlighter-rouge">bool
AsyncContinueOnCapturedContext</code> option, which is forwarded to internal
<code class="language-plaintext highlighter-rouge">ConfigureAwait</code> calls. See the <a href="https://devblogs.microsoft.com/dotnet/configureawait-faq/">ConfigureAwait
FAQ</a> for details on
that.</p>
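<p>Below is a minimal sketch of what forwarding such an option to <code class="language-plaintext highlighter-rouge">ConfigureAwait</code> looks like. It uses a hypothetical <code class="language-plaintext highlighter-rouge">CountLinesAsync</code> over a plain <code class="language-plaintext highlighter-rouge">TextReader</code> and only mirrors the idea of a <code class="language-plaintext highlighter-rouge">bool</code> option being passed to every internal <code class="language-plaintext highlighter-rouge">await</code>. It does not show Sep’s internals.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Threading.Tasks;

var lineCount = 0;
await CountLinesAsync(new StringReader("A;B\n1;2\n3;4\n"), continueOnCapturedContext: false);
Console.WriteLine(lineCount); // 3

// Hypothetical helper: the bool option is forwarded to every internal await.
async Task CountLinesAsync(TextReader reader, bool continueOnCapturedContext)
{
    while (await reader.ReadLineAsync().ConfigureAwait(continueOnCapturedContext) is not null)
    {
        lineCount++;
    }
}
</code></pre></div></div>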

<h2 id="pragmatic-async-support-implementation">Pragmatic Async Support Implementation</h2>

<p><code class="language-plaintext highlighter-rouge">async/await</code> is viral. For any async method call, however deep it may be, all
methods from the top down to the deepest async method need to be async too. This makes
supporting both sync and async problematic, as you are faced with either
trying to refactor, while still needing to copy entire method chains, or copy-pasting
everything.</p>

<p>For Sep I chose the latter with a twist: isolate the methods that call IO, e.g.
methods using <code class="language-plaintext highlighter-rouge">TextReader</code> for <code class="language-plaintext highlighter-rouge">SepReader</code> and <code class="language-plaintext highlighter-rouge">TextWriter</code> for <code class="language-plaintext highlighter-rouge">SepWriter</code>.
Then create two separate files for these methods: one for the synchronous
methods and one for the asynchronous methods. These files are nearly identical,
differing only by a preprocessor directive defined at the top of each file. For
example:</p>

<p><a href=""><code class="language-plaintext highlighter-rouge">src/Sep/SepReader.IO.Async.cs</code></a></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="c1">//#define SYNC</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p><a href=""><code class="language-plaintext highlighter-rouge">src/Sep/SepReader.IO.Sync.cs</code></a></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="cp">#define SYNC
</span></pre></td></tr></tbody></table></code></pre></div></div>

<p>The rest of these files are then identical, but for each method in these files
<code class="language-plaintext highlighter-rouge">#if #else #endif</code> preprocessor switches are used to handle differences in the
method signature and implementation. All async method names are then suffixed
with <code class="language-plaintext highlighter-rouge">Async</code> to differentiate from the sync methods. For example:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="rouge-code"><pre><span class="cp">#if SYNC
</span>    <span class="k">internal</span> <span class="k">void</span> <span class="nf">Initialize</span><span class="p">(</span><span class="k">in</span> <span class="n">SepReaderOptions</span> <span class="n">options</span><span class="p">)</span>
<span class="cp">#else
</span>    <span class="k">internal</span> <span class="k">async</span> <span class="n">ValueTask</span> <span class="nf">InitializeAsync</span><span class="p">(</span><span class="n">SepReaderOptions</span> <span class="n">options</span><span class="p">,</span>
        <span class="n">CancellationToken</span> <span class="n">cancellationToken</span><span class="p">)</span>
<span class="cp">#endif
</span>    <span class="p">{</span>
<span class="cp">#if SYNC
</span>        <span class="k">if</span> <span class="p">(</span><span class="nf">MoveNext</span><span class="p">())</span>
<span class="cp">#else
</span>        <span class="k">if</span> <span class="p">(</span><span class="k">await</span> <span class="nf">MoveNextAsync</span><span class="p">(</span><span class="n">cancellationToken</span><span class="p">))</span>
<span class="cp">#endif
</span></pre></td></tr></tbody></table></code></pre></div></div>

<p>This makes the code easy to maintain. For consistency, unit tests ensure the two
files are kept in sync in the face of changes by simply comparing each pair of
files and checking that all lines are the same except for the first line.</p>
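<p>Below is a minimal sketch of such a consistency check written as a plain helper rather than a test framework test. The <code class="language-plaintext highlighter-rouge">AreSyncAsyncPairConsistent</code> name and the temporary files are hypothetical. Only the rule itself, that all lines are equal except the first, comes from the text above.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Linq;

// Two hypothetical files differing only in the first line.
var syncPath = Path.GetTempFileName();
var asyncPath = Path.GetTempFileName();
File.WriteAllLines(syncPath, new[] { "#define SYNC", "#if SYNC", "void Read()", "#endif" });
File.WriteAllLines(asyncPath, new[] { "//#define SYNC", "#if SYNC", "void Read()", "#endif" });

var consistent = AreSyncAsyncPairConsistent(syncPath, asyncPath);
Console.WriteLine(consistent); // True

File.Delete(syncPath);
File.Delete(asyncPath);

// Compare all lines except the first, which holds the #define toggle.
static bool AreSyncAsyncPairConsistent(string first, string second)
{
    var firstLines = File.ReadAllLines(first);
    var secondLines = File.ReadAllLines(second);
    return firstLines.Skip(1).SequenceEqual(secondLines.Skip(1));
}
</code></pre></div></div>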

<p>This approach “avoids” duplicating logic while maintaining separate
implementations for sync and async operations. The shared logic is preserved by
keeping the core functionality identical, with only the method signatures and
specific async-related keywords differing.</p>

<p>By using preprocessor directives, each variant has only the relevant code for
either sync or async, ensuring no unnecessary runtime checks or sync over async.</p>

<h3 id="performance">Performance</h3>

<p>All <code class="language-plaintext highlighter-rouge">Async</code> methods are implemented using <code class="language-plaintext highlighter-rouge">ValueTask</code> to avoid the overhead of
allocating <code class="language-plaintext highlighter-rouge">Task</code> instances when not needed (e.g. if data is already available).
This means the overhead is minimal, as can be seen in the benchmarks below (based on
in-memory data in the form of a <code class="language-plaintext highlighter-rouge">StringReader</code>).</p>

<p><code class="language-plaintext highlighter-rouge">Sep_Async</code> is only 1.07x slower than sync <code class="language-plaintext highlighter-rouge">Sep</code> at the very lowest level of
simply parsing the CSV file. For any real workload the difference is negligible.</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="err">BenchmarkDotNet</span> <span class="err">v0.14.0,</span> <span class="err">Windows</span> <span class="err">10</span> <span class="err">(10.0.19044.3086/21H2/November2021Update)</span>
<span class="err">AMD</span> <span class="err">Ryzen</span> <span class="err">9</span> <span class="err">5950X,</span> <span class="err">1</span> <span class="err">CPU,</span> <span class="err">32</span> <span class="err">logical</span> <span class="err">and</span> <span class="err">16</span> <span class="err">physical</span> <span class="err">cores</span>
<span class="err">.NET</span> <span class="err">SDK</span> <span class="err">9.0.102</span>
  <span class="nn">[Host]</span>     <span class="err">:</span> <span class="err">.NET</span> <span class="err">9.0.1</span> <span class="err">(9.0.124.61010),</span> <span class="err">X64</span> <span class="err">RyuJIT</span> <span class="err">AVX2</span>
  <span class="err">Job-RANURT</span> <span class="err">:</span> <span class="err">.NET</span> <span class="err">9.0.1</span> <span class="err">(9.0.124.61010),</span> <span class="err">X64</span> <span class="err">RyuJIT</span> <span class="err">AVX2</span>

<span class="py">Job</span><span class="p">=</span><span class="s">Job-RANURT  EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0  Runtime=.NET 9.0  </span>
<span class="py">Toolchain</span><span class="p">=</span><span class="s">net90  InvocationCount=Default  IterationTime=350ms  </span>
<span class="py">MaxIterationCount</span><span class="p">=</span><span class="s">15  MinIterationCount=5  WarmupCount=6  </span>
<span class="py">Quotes</span><span class="p">=</span><span class="s">False  Reader=String  </span>

<span class="err">|</span> <span class="err">Method</span>       <span class="err">|</span> <span class="err">Scope</span> <span class="err">|</span> <span class="err">Rows</span>    <span class="err">|</span> <span class="err">Mean</span>         <span class="err">|</span> <span class="err">Ratio</span> <span class="err">|</span> <span class="err">MB</span>  <span class="err">|</span> <span class="err">MB/s</span>    <span class="err">|</span> <span class="err">ns/row</span> <span class="err">|</span> <span class="err">Allocated</span>     <span class="err">|</span> <span class="err">Alloc</span> <span class="err">Ratio</span> <span class="err">|</span>
<span class="err">|-------------</span> <span class="err">|------</span> <span class="err">|--------</span> <span class="err">|-------------:|------:|----:|--------:|-------:|--------------:|------------:|</span>
<span class="err">|</span> <span class="err">Sep______</span>    <span class="err">|</span> <span class="err">Row</span>   <span class="err">|</span> <span class="err">50000</span>   <span class="err">|</span>     <span class="err">2.230</span> <span class="err">ms</span> <span class="err">|</span>  <span class="err">1.00</span> <span class="err">|</span>  <span class="err">29</span> <span class="err">|</span> <span class="err">13088.4</span> <span class="err">|</span>   <span class="err">44.6</span> <span class="err">|</span>       <span class="err">1.09</span> <span class="err">KB</span> <span class="err">|</span>        <span class="err">1.00</span> <span class="err">|</span>
<span class="err">|</span> <span class="err">Sep_Async</span>    <span class="err">|</span> <span class="err">Row</span>   <span class="err">|</span> <span class="err">50000</span>   <span class="err">|</span>     <span class="err">2.379</span> <span class="err">ms</span> <span class="err">|</span>  <span class="err">1.07</span> <span class="err">|</span>  <span class="err">29</span> <span class="err">|</span> <span class="err">12264.0</span> <span class="err">|</span>   <span class="err">47.6</span> <span class="err">|</span>       <span class="err">1.02</span> <span class="err">KB</span> <span class="err">|</span>        <span class="err">0.93</span> <span class="err">|</span>
<span class="err">|</span> <span class="err">Sep_Unescape</span> <span class="err">|</span> <span class="err">Row</span>   <span class="err">|</span> <span class="err">50000</span>   <span class="err">|</span>     <span class="err">2.305</span> <span class="err">ms</span> <span class="err">|</span>  <span class="err">1.03</span> <span class="err">|</span>  <span class="err">29</span> <span class="err">|</span> <span class="err">12657.6</span> <span class="err">|</span>   <span class="err">46.1</span> <span class="err">|</span>       <span class="err">1.02</span> <span class="err">KB</span> <span class="err">|</span>        <span class="err">0.93</span> <span class="err">|</span>
</pre></td></tr></tbody></table></code></pre></div></div>
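<p>The <code class="language-plaintext highlighter-rouge">ValueTask</code> benefit can be sketched in isolation. When data is already buffered, a method can return an already completed <code class="language-plaintext highlighter-rouge">ValueTask</code> without allocating a <code class="language-plaintext highlighter-rouge">Task</code>, and only fall back to a real async path when it has to wait for I/O. <code class="language-plaintext highlighter-rouge">EnsureDataAsync</code>, <code class="language-plaintext highlighter-rouge">FillSlowAsync</code> and the simple buffer counter are hypothetical stand-ins for a reader’s buffer refill.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Threading.Tasks;

var buffered = 10; // chars already available in a hypothetical buffer

// Fast path: enough data is buffered, so the ValueTask is already
// completed and no Task is allocated.
var fast = EnsureDataAsync(5);
var fastCompletedSynchronously = fast.IsCompletedSuccessfully;
await fast;
Console.WriteLine(fastCompletedSynchronously); // True

// Slow path: has to wait for (simulated) I/O before data is available.
await EnsureDataAsync(20);
Console.WriteLine(buffered); // 20

ValueTask EnsureDataAsync(int required)
{
    if (buffered >= required) { return ValueTask.CompletedTask; }
    return FillSlowAsync(required);
}

async ValueTask FillSlowAsync(int required)
{
    await Task.Yield(); // stands in for awaiting e.g. TextReader.ReadAsync
    buffered = required;
}
</code></pre></div></div>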

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sep 0.9.0 was released February 1st, 2025 - earlier this year - with a major new feature: Async support for both SepReader and SepWriter.]]></summary></entry><entry><title type="html">Sep 0.8.0 - SepWriter Replace StringBuilder with ArrayPool Array</title><link href="https://nietras.com/2025/05/07/sep-0-8-0/" rel="alternate" type="text/html" title="Sep 0.8.0 - SepWriter Replace StringBuilder with ArrayPool Array" /><published>2025-05-07T00:00:00+00:00</published><updated>2025-05-07T00:00:00+00:00</updated><id>https://nietras.com/2025/05/07/sep-0.8.0</id><content type="html" xml:base="https://nietras.com/2025/05/07/sep-0-8-0/"><![CDATA[<p>Sep 0.8.0 was released January 19th, 2025 - earlier this year - with two notable
changes:</p>

<ul>
  <li>🎯 Remove net7.0 target</li>
  <li>✨ SepWriter.Col: Replace StringBuilder with ArrayPool array and 
DefaultInterpolatedStringHandler</li>
</ul>

<p>See <a href="https://github.com/nietras/Sep/releases/tag/v0.8.0">v0.8.0 release</a> for all
changes and <a href="https://github.com/nietras/Sep">Sep README on GitHub</a> for full
details. Below is a quick (belated) blog post to explain the changes a bit.</p>

<h2 id="sepwriter-vs-textwriter">SepWriter vs TextWriter</h2>

<p><code class="language-plaintext highlighter-rouge">SepWriter</code> hasn’t gotten as much attention as <code class="language-plaintext highlighter-rouge">SepReader</code> here, which is partly
intentional, as <code class="language-plaintext highlighter-rouge">SepWriter</code> is not so much about performance and speed but more
about convenience, ease of use and change. And not much has changed about that
since Sep was introduced. If you want the best speed for writing you would be
better off simply using <code class="language-plaintext highlighter-rouge">TextWriter</code> directly (if done correctly).</p>

<p>Let’s do a quick code comparison of <code class="language-plaintext highlighter-rouge">SepWriter</code> and <code class="language-plaintext highlighter-rouge">TextWriter</code>. Given:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="k">const</span> <span class="kt">string</span> <span class="n">ColNameA</span> <span class="p">=</span> <span class="s">"A"</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">string</span> <span class="n">ColNameB</span> <span class="p">=</span> <span class="s">"B"</span><span class="p">;</span>

<span class="n">ReadOnlySpan</span><span class="p">&lt;</span><span class="kt">int</span><span class="p">&gt;</span> <span class="n">values</span> <span class="p">=</span> <span class="p">[</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">];</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>we want to write a multiple of the values to CSV for each column. With Sep
this can be done as below. The main takeaway here is that with Sep you do not
have to separate the writing of the header (column names) and the values.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="k">using</span> <span class="nn">var</span> <span class="n">sepWriter</span> <span class="p">=</span> <span class="n">Sep</span><span class="p">.</span><span class="n">Default</span><span class="p">.</span><span class="nf">Writer</span><span class="p">().</span><span class="nf">ToText</span><span class="p">();</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">v</span> <span class="k">in</span> <span class="n">values</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">using</span> <span class="nn">var</span> <span class="n">row</span> <span class="p">=</span> <span class="n">sepWriter</span><span class="p">.</span><span class="nf">NewRow</span><span class="p">();</span>
    <span class="n">row</span><span class="p">[</span><span class="n">ColNameA</span><span class="p">].</span><span class="nf">Format</span><span class="p">(</span><span class="n">v</span> <span class="p">*</span> <span class="m">10</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="n">ColNameB</span><span class="p">].</span><span class="nf">Format</span><span class="p">(</span><span class="n">v</span> <span class="p">*</span> <span class="m">100</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">sepWriter</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>One way to do this with a <code class="language-plaintext highlighter-rouge">StringWriter</code> (a <code class="language-plaintext highlighter-rouge">TextWriter</code>) is shown below. While
this is clearly longer, the other issue is that there are two separate parts:
first writing the header and then writing the rows. Not a big issue here, but
when you have many columns, keeping these in sync can be a challenge and a known
source of dev churn.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="k">const</span> <span class="kt">char</span> <span class="n">Separator</span> <span class="p">=</span> <span class="sc">';'</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">var</span> <span class="n">textWriter</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StringWriter</span><span class="p">();</span>
<span class="c1">// Header</span>
<span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">ColNameA</span><span class="p">);</span>
<span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">Separator</span><span class="p">);</span>
<span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">ColNameB</span><span class="p">);</span>
<span class="n">textWriter</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">();</span>
<span class="c1">// Rows</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">v</span> <span class="k">in</span> <span class="n">values</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">v</span> <span class="p">*</span> <span class="m">10</span><span class="p">);</span>
    <span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">Separator</span><span class="p">);</span>
    <span class="n">textWriter</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">v</span> <span class="p">*</span> <span class="m">100</span><span class="p">);</span>
    <span class="n">textWriter</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">textWriter</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="sepwritercol-stringbuilder-issue">SepWriter.Col: StringBuilder Issue</h2>

<p>The above <code class="language-plaintext highlighter-rouge">TextWriter</code> code is basically what Sep does under the hood. However,
for each column (e.g. <code class="language-plaintext highlighter-rouge">var col = row[ColNameA];</code>) Sep would store the column
value in a <code class="language-plaintext highlighter-rouge">StringBuilder</code> until the row was completed by calling <code class="language-plaintext highlighter-rouge">Dispose()</code>
on it, at which point the contents of each <code class="language-plaintext highlighter-rouge">StringBuilder</code> were written to the
underlying <code class="language-plaintext highlighter-rouge">TextWriter</code> that <code class="language-plaintext highlighter-rouge">SepWriter</code> works over. In this way Sep could
utilize all the <code class="language-plaintext highlighter-rouge">StringBuilder</code> functionality to support <code class="language-plaintext highlighter-rouge">Format</code> (e.g.
<a href="https://learn.microsoft.com/en-us/dotnet/api/system.ispanformattable"><code class="language-plaintext highlighter-rouge">ISpanFormattable</code></a>)
and similar. Additionally, Sep would use a pool of <code class="language-plaintext highlighter-rouge">StringBuilder</code>s to reduce
repeated allocations.</p>

<p><code class="language-plaintext highlighter-rouge">StringBuilder</code> does have an underlying issue, though, in that it is basically
implemented as a linked list of <code class="language-plaintext highlighter-rouge">StringBuilder</code>s (chunks), which means that to write
all of its contents to a <code class="language-plaintext highlighter-rouge">TextWriter</code> without creating a string, one
has to enumerate its chunks like:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">chunk</span> <span class="k">in</span> <span class="n">sb</span><span class="p">.</span><span class="nf">GetChunks</span><span class="p">())</span>
<span class="p">{</span>
    <span class="n">_writer</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">chunk</span><span class="p">.</span><span class="n">Span</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Since long columns are rare, it is similarly rare for there to be multiple
chunks. Hence, the enumeration is overhead paid for a case that seldom occurs, and it causes a bit of a performance hit.</p>
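<p>This is easy to check with a quick BCL-only sketch that counts non-empty chunks via <code class="language-plaintext highlighter-rouge">StringBuilder.GetChunks</code>. A short column value occupies a single chunk, while appending a value longer than the initial capacity typically spans more than one. The exact chunk count for long values is an implementation detail, so the sketch only checks for more than one.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Text;

// A typical short column value fits in the initial chunk.
var shortValue = new StringBuilder().Append("1.2");
// A value longer than the default capacity (16) forces additional chunks.
var longValue = new StringBuilder().Append(new string('x', 1000));

Console.WriteLine(CountChunks(shortValue)); // 1
Console.WriteLine(CountChunks(longValue) > 1); // True

static int CountChunks(StringBuilder sb)
{
    var count = 0;
    foreach (var chunk in sb.GetChunks())
    {
        // Each chunk would have to be written to the TextWriter separately.
        if (!chunk.IsEmpty) { count++; }
    }
    return count;
}
</code></pre></div></div>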

<h2 id="sepwritercol-replace-stringbuilder-with-arraypool-array-and-defaultinterpolatedstringhandler">SepWriter.Col: Replace StringBuilder with ArrayPool array and DefaultInterpolatedStringHandler</h2>

<p>Performance is a feature, and while it is not the top priority for <code class="language-plaintext highlighter-rouge">SepWriter</code>, 0.8.0
addresses this issue by swapping out the internal <code class="language-plaintext highlighter-rouge">StringBuilder</code> for a
<code class="language-plaintext highlighter-rouge">char[]</code> from <code class="language-plaintext highlighter-rouge">ArrayPool</code>. To implement formatting, Sep then relies on
<a href="https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.defaultinterpolatedstringhandler"><code class="language-plaintext highlighter-rouge">DefaultInterpolatedStringHandler</code></a>.</p>
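<p>To give a rough idea of what relying on <code class="language-plaintext highlighter-rouge">DefaultInterpolatedStringHandler</code> means, below is a minimal BCL-only sketch (not Sep’s code) that formats a column value through its public API. When constructed without an initial buffer, the handler rents its <code class="language-plaintext highlighter-rouge">char[]</code> from <code class="language-plaintext highlighter-rouge">ArrayPool</code>. The public <code class="language-plaintext highlighter-rouge">ToStringAndClear()</code> used here returns that array to the pool, but it also allocates a <code class="language-plaintext highlighter-rouge">string</code>.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Globalization;
using System.Runtime.CompilerServices;

// literalLength: 1 (the ";"), formattedCount: 2 (the two values).
var handler = new DefaultInterpolatedStringHandler(1, 2, CultureInfo.InvariantCulture);
handler.AppendFormatted(3 * 10);
handler.AppendLiteral(";");
handler.AppendFormatted(1.2, "F2");
// Returns the rented array to the pool, but allocates a string.
var col = handler.ToStringAndClear();
Console.WriteLine(col); // 30;1.20
</code></pre></div></div>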

<p>However, <code class="language-plaintext highlighter-rouge">DefaultInterpolatedStringHandler</code> doesn’t have public APIs for using and managing arrays
from the <code class="language-plaintext highlighter-rouge">ArrayPool</code>. I was then faced with a choice of either copying the
entire implementation of <code class="language-plaintext highlighter-rouge">DefaultInterpolatedStringHandler</code> or finding another
way. That other way was to use the <code class="language-plaintext highlighter-rouge">UnsafeAccessor</code> attribute to access the
internal state of <code class="language-plaintext highlighter-rouge">DefaultInterpolatedStringHandler</code>, as shown below, and reuse
the array from <code class="language-plaintext highlighter-rouge">ArrayPool</code>. This is a bit of a hack, but it works and is fast.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre><span class="c1">// Avoid recreating DefaultInterpolatedStringHandler while still being</span>
<span class="c1">// able to reuse the array from ArrayPool, by using UnsafeAccessor to</span>
<span class="c1">// access its internal state. This works fine for net8.0 and net9.0,</span>
<span class="c1">// but there is no guarantee that this won't change in the future. If</span>
<span class="c1">// it does, consider using #if NET10_0_OR_GREATER or similar to address</span>
<span class="c1">// the changes, or copy the entire DefaultInterpolatedStringHandler</span>
<span class="c1">// source code and adapt it as needed.</span>
 
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">AggressiveInlining</span><span class="p">)]</span>
<span class="p">[</span><span class="nf">UnsafeAccessor</span><span class="p">(</span><span class="n">UnsafeAccessorKind</span><span class="p">.</span><span class="n">Field</span><span class="p">,</span> <span class="n">Name</span> <span class="p">=</span> <span class="s">"_arrayToReturnToPool"</span><span class="p">)]</span>
<span class="k">static</span> <span class="k">extern</span> <span class="k">ref</span> <span class="kt">char</span><span class="p">[]?</span> <span class="nf">ArrayToReturnToPool</span><span class="p">(</span><span class="k">ref</span> <span class="n">DefaultInterpolatedStringHandler</span> <span class="n">handler</span><span class="p">);</span>
 
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">AggressiveInlining</span><span class="p">)]</span>
<span class="p">[</span><span class="nf">UnsafeAccessor</span><span class="p">(</span><span class="n">UnsafeAccessorKind</span><span class="p">.</span><span class="n">Field</span><span class="p">,</span> <span class="n">Name</span> <span class="p">=</span> <span class="s">"_pos"</span><span class="p">)]</span>
<span class="k">static</span> <span class="k">extern</span> <span class="k">ref</span> <span class="kt">int</span> <span class="nf">Position</span><span class="p">(</span><span class="k">ref</span> <span class="n">DefaultInterpolatedStringHandler</span> <span class="n">handler</span><span class="p">);</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The downside is that <code class="language-plaintext highlighter-rouge">UnsafeAccessor</code> is only supported on <code class="language-plaintext highlighter-rouge">net8.0</code> and above.
Given that <code class="language-plaintext highlighter-rouge">net7.0</code> is no longer supported, I decided it was time to drop support for
it.</p>

<p>I don’t have detailed benchmarks here, but the end result is that, for a
simple case of writing multiple short columns, <code class="language-plaintext highlighter-rouge">SepWriter</code> is
10-15% faster while still having zero allocations after warmup (the first rows).
Additionally, the code is simpler, even with the <code class="language-plaintext highlighter-rouge">UnsafeAccessor</code> code.</p>
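<p>To make the general idea concrete, below is a minimal, hypothetical C# sketch (not Sep’s implementation) of appending text to a <code class="language-plaintext highlighter-rouge">char[]</code> rented from <code class="language-plaintext highlighter-rouge">ArrayPool&lt;char&gt;.Shared</code> instead of a <code class="language-plaintext highlighter-rouge">StringBuilder</code>. When the array is full, a larger one is rented, the contents are copied over and the old array is returned to the pool. The local function name <code class="language-plaintext highlighter-rouge">Append</code> is made up for illustration.</p>

```csharp
using System;
using System.Buffers;

var pool = ArrayPool<char>.Shared;
char[] buffer = pool.Rent(4); // the pool may return a larger array than requested
int length = 0;

// Hypothetical helper: append text, growing by renting a larger array when needed.
void Append(ReadOnlySpan<char> text)
{
    if (length + text.Length > buffer.Length)
    {
        var larger = pool.Rent(Math.Max(buffer.Length * 2, length + text.Length));
        buffer.AsSpan(0, length).CopyTo(larger); // keep what was written so far
        pool.Return(buffer);                     // hand the old array back
        buffer = larger;
    }
    text.CopyTo(buffer.AsSpan(length));
    length += text.Length;
}

Append("R1C1");
Append(";");
Append("R1C2");
string row = new string(buffer, 0, length);
Console.WriteLine(row); // R1C1;R1C2

// Return the array when done so it can be reused, e.g. for the next row.
pool.Return(buffer);
```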

<p>For more details take a look at the pull request <a href="https://github.com/nietras/Sep/pull/216">SepWriter.Col: Replace
StringBuilder with ArrayPool array and
DefaultInterpolatedStringHandler</a>.</p>

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sep 0.8.0 was released January 19th, 2025 - earlier this year - with two notable changes:]]></summary></entry><entry><title type="html">Retrieving Azure DevOps Pull Requests for Entire Organization with PowerShell</title><link href="https://nietras.com/2025/03/18/azure-devops-all-last-years-pull-requests/" rel="alternate" type="text/html" title="Retrieving Azure DevOps Pull Requests for Entire Organization with PowerShell" /><published>2025-03-18T00:00:00+00:00</published><updated>2025-03-18T00:00:00+00:00</updated><id>https://nietras.com/2025/03/18/azure-devops-all-last-years-pull-requests</id><content type="html" xml:base="https://nietras.com/2025/03/18/azure-devops-all-last-years-pull-requests/"><![CDATA[<p>In this 99% LLM generated post, I’ll walk you through a PowerShell script that
retrieves pull request (PR) descriptions from your Azure DevOps organization.</p>

<p>The script targets scenarios where you want to list all PRs and filter them
based on criteria such as creation date and the author’s display name (for
example, matching initials or part of a full name). That is, for each project it
gets all pull requests created since a given date, filters them, and accumulates
the results to finally display them sorted by creation date.</p>

<h2 id="why-use-this-script">Why Use This Script?</h2>

<ul>
  <li>Retrieves all projects in your organization.</li>
  <li>Pulls all PRs per project since a given date using the Azure DevOps REST API
and pagination.</li>
  <li>Filters PRs by checking the <code class="language-plaintext highlighter-rouge">createdBy.displayName</code> field.</li>
  <li>Adds the project name to the PR object for easier reporting.</li>
  <li>Outputs the results in a neatly formatted table.</li>
</ul>

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li><strong>Azure DevOps Personal Access Token (PAT):</strong> You need a PAT with at least
<em>Code (read)</em> permissions. Generate one from your Azure DevOps account
settings.</li>
  <li><strong>PowerShell Environment:</strong> This script uses basic PowerShell commands
available on modern Windows systems.</li>
  <li><strong>Basic Understanding of REST APIs:</strong> Familiarity with REST API concepts can
help in customizing the script further.</li>
</ul>
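<p>The <code class="language-plaintext highlighter-rouge">$base64AuthInfo</code> line in the script is plain HTTP Basic authentication with an empty user name and the PAT as the password. For .NET readers, a minimal C# equivalent of that encoding is sketched below. The token value is a placeholder, not a real PAT.</p>

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

string personalAccessToken = "abc"; // placeholder; never hard-code a real PAT
// Basic auth is base64("user:password"); here the user name is empty.
string base64AuthInfo = Convert.ToBase64String(
    Encoding.ASCII.GetBytes($":{personalAccessToken}"));
var header = new AuthenticationHeaderValue("Basic", base64AuthInfo);
Console.WriteLine(header); // Basic OmFiYw==
```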

<h2 id="the-script">The Script</h2>

<p>Below is the complete PowerShell script. Update the variable values (e.g.,
organization name, PAT, author initials, full name) to match your environment.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</pre></td><td class="rouge-code"><pre><span class="c"># Define variables</span><span class="w">
</span><span class="nv">$organization</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"your-org-name"</span><span class="w">          </span><span class="c"># Replace with your Azure DevOps organization name</span><span class="w">
</span><span class="nv">$personalAccessToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"your-pat-token"</span><span class="w">  </span><span class="c"># Replace with your Azure DevOps PAT</span><span class="w">
</span><span class="nv">$top</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">400</span><span class="w">                               </span><span class="c"># Number of PRs to retrieve per request</span><span class="w">
</span><span class="nv">$authorInitials</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"your-initials"</span><span class="w">        </span><span class="c"># Substring to match (e.g., initials)</span><span class="w">
</span><span class="nv">$fullName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"your-name"</span><span class="w">                  </span><span class="c"># Full name to match</span><span class="w">
</span><span class="nv">$lastYearDate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Get-Date</span><span class="p">)</span><span class="o">.</span><span class="nf">ToUniversalTime</span><span class="p">()</span><span class="o">.</span><span class="nf">AddYears</span><span class="p">(</span><span class="nt">-1</span><span class="p">)</span><span class="o">.</span><span class="nf">ToString</span><span class="p">(</span><span class="s2">"yyyy-MM-ddTHH:mm:ssZ"</span><span class="p">)</span><span class="w">  </span><span class="c"># Convert to UTC, since the trailing Z marks the time as UTC</span><span class="w">

</span><span class="c"># Encode PAT for authentication</span><span class="w">
</span><span class="nv">$base64AuthInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">Convert</span><span class="p">]::</span><span class="n">ToBase64String</span><span class="p">([</span><span class="n">Text.Encoding</span><span class="p">]::</span><span class="n">ASCII.GetBytes</span><span class="p">(</span><span class="s2">":</span><span class="nv">$personalAccessToken</span><span class="s2">"</span><span class="p">))</span><span class="w">

</span><span class="c"># Function to retrieve PRs for a given project</span><span class="w">
</span><span class="kr">function</span><span class="w"> </span><span class="nf">Get-PullRequests</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kr">param</span><span class="w"> </span><span class="p">(</span><span class="w">
        </span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="nv">$projectName</span><span class="w">
    </span><span class="p">)</span><span class="w">
    </span><span class="nv">$skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="w">
    </span><span class="nv">$prs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">@()</span><span class="w">
    </span><span class="kr">do</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="c"># Construct the API URL with time range filters</span><span class="w">
        </span><span class="nv">$prUrl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://dev.azure.com/</span><span class="nv">$organization</span><span class="s2">/</span><span class="nv">$projectName</span><span class="s2">/_apis/git/pullrequests?searchCriteria.status=all&amp;</span><span class="se">`$</span><span class="s2">top=</span><span class="nv">$top</span><span class="s2">&amp;</span><span class="se">`$</span><span class="s2">skip=</span><span class="nv">$skip</span><span class="s2">&amp;searchCriteria.minTime=</span><span class="nv">$lastYearDate</span><span class="s2">&amp;searchCriteria.queryTimeRangeType=Created&amp;api-version=7.1"</span><span class="w">
        </span><span class="nv">$response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Invoke-RestMethod</span><span class="w"> </span><span class="nt">-Uri</span><span class="w"> </span><span class="nv">$prUrl</span><span class="w"> </span><span class="nt">-Headers</span><span class="w"> </span><span class="p">@{</span><span class="nx">Authorization</span><span class="o">=</span><span class="err">(</span><span class="s2">"Basic {0}"</span><span class="w"> </span><span class="err">-</span><span class="nx">f</span><span class="w"> </span><span class="nv">$base64AuthInfo</span><span class="err">)</span><span class="p">}</span><span class="w"> </span><span class="nt">-Method</span><span class="w"> </span><span class="nx">Get</span><span class="w">
        </span><span class="nv">$prs</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nv">$response</span><span class="o">.</span><span class="nf">value</span><span class="w">
        </span><span class="nv">$skip</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nv">$top</span><span class="w">
    </span><span class="p">}</span><span class="w"> </span><span class="kr">while</span><span class="w"> </span><span class="p">(</span><span class="nv">$response</span><span class="o">.</span><span class="nf">value</span><span class="o">.</span><span class="nf">Count</span><span class="w"> </span><span class="o">-eq</span><span class="w"> </span><span class="nv">$top</span><span class="p">)</span><span class="w">  </span><span class="c"># Continue if the number of PRs retrieved equals $top</span><span class="w">
    </span><span class="kr">return</span><span class="w"> </span><span class="nv">$prs</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="c"># Get all projects in the organization</span><span class="w">
</span><span class="nv">$projectsUrl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://dev.azure.com/</span><span class="nv">$organization</span><span class="s2">/_apis/projects?api-version=7.1-preview.4"</span><span class="w">
</span><span class="nv">$projects</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Invoke-RestMethod</span><span class="w"> </span><span class="nt">-Uri</span><span class="w"> </span><span class="nv">$projectsUrl</span><span class="w"> </span><span class="nt">-Headers</span><span class="w"> </span><span class="p">@{</span><span class="nx">Authorization</span><span class="o">=</span><span class="err">(</span><span class="s2">"Basic {0}"</span><span class="w"> </span><span class="err">-</span><span class="nx">f</span><span class="w"> </span><span class="nv">$base64AuthInfo</span><span class="err">)</span><span class="p">}</span><span class="w"> </span><span class="nt">-Method</span><span class="w"> </span><span class="nx">Get</span><span class="w">

</span><span class="nv">$allPullRequests</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">@()</span><span class="w">

</span><span class="kr">foreach</span><span class="w"> </span><span class="p">(</span><span class="nv">$project</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="nv">$projects</span><span class="o">.</span><span class="nf">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nv">$projectName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$project</span><span class="o">.</span><span class="nf">name</span><span class="w">
    </span><span class="nv">$prs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Get-PullRequests</span><span class="w"> </span><span class="nt">-projectName</span><span class="w"> </span><span class="nv">$projectName</span><span class="w">

    </span><span class="nv">$filteredPRs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$prs</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Where-Object</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="p">(</span><span class="w"> </span><span class="bp">$_</span><span class="o">.</span><span class="nf">createdBy</span><span class="o">.</span><span class="nf">displayName</span><span class="w"> </span><span class="o">-match</span><span class="w"> </span><span class="nv">$authorInitials</span><span class="w"> </span><span class="o">-or</span><span class="w"> </span><span class="bp">$_</span><span class="o">.</span><span class="nf">createdBy</span><span class="o">.</span><span class="nf">displayName</span><span class="w"> </span><span class="o">-match</span><span class="w"> </span><span class="nv">$fullName</span><span class="w"> </span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ForEach-Object</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="c"># Manually add the project name (since it's not part of the PR object)</span><span class="w">
        </span><span class="bp">$_</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Add-Member</span><span class="w"> </span><span class="nt">-NotePropertyName</span><span class="w"> </span><span class="s2">"Project"</span><span class="w"> </span><span class="nt">-NotePropertyValue</span><span class="w"> </span><span class="nv">$projectName</span><span class="w"> </span><span class="nt">-PassThru</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nv">$allPullRequests</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nv">$filteredPRs</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="c"># Sort the results by creationDate and output desired columns</span><span class="w">
</span><span class="nv">$allPullRequests</span><span class="w"> </span><span class="o">|</span><span class="w">
    </span><span class="n">Sort-Object</span><span class="w"> </span><span class="nx">creationDate</span><span class="w"> </span><span class="o">|</span><span class="w">
    </span><span class="n">Select-Object</span><span class="w"> </span><span class="nx">Project</span><span class="p">,</span><span class="w"> </span><span class="p">@{</span><span class="nx">Name</span><span class="o">=</span><span class="s2">"Repository"</span><span class="p">;</span><span class="w"> </span><span class="nx">Expression</span><span class="o">=</span><span class="p">{</span><span class="w"> </span><span class="bp">$_</span><span class="o">.</span><span class="nf">repository</span><span class="o">.</span><span class="nf">name</span><span class="w"> </span><span class="p">}},</span><span class="w"> </span><span class="nx">pullRequestId</span><span class="p">,</span><span class="w"> </span><span class="nx">title</span><span class="p">,</span><span class="w"> </span><span class="nx">creationDate</span><span class="p">,</span><span class="w"> </span><span class="nx">status</span><span class="w"> </span><span class="o">|</span><span class="w">
    </span><span class="n">Format-Table</span><span class="w"> </span><span class="nt">-AutoSize</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></div></div>

<p>I have verified that this script works for my needs, but use it at your own peril.
The output may look something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre>Project     Repository pullRequestId title    creationDate        status
-------     ---------- ------------- -----    ------------        ------
ProjectName repo                4001 PR title 2024-01-01T11:00:00 completed
ProjectName repo                4002 PR title 2024-01-02T12:00:00 completed
ProjectName repo                4003 PR title 2024-01-03T13:00:00 completed
ProjectName repo                4004 PR title 2024-01-04T14:00:00 completed
</pre></td></tr></tbody></table></code></pre></div></div>
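<p>The <code class="language-plaintext highlighter-rouge">$top</code>/<code class="language-plaintext highlighter-rouge">$skip</code> loop in <code class="language-plaintext highlighter-rouge">Get-PullRequests</code> keeps requesting pages until one comes back with fewer than <code class="language-plaintext highlighter-rouge">$top</code> items. The same logic is sketched in C# below. A hypothetical in-memory <code class="language-plaintext highlighter-rouge">FetchPage</code> function stands in for the REST call. If the total count is an exact multiple of <code class="language-plaintext highlighter-rouge">$top</code>, the loop makes one extra request that returns an empty page.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one REST call: returns up to `top` items after `skip`.
int[] source = Enumerable.Range(1, 1000).ToArray();
int requests = 0;
int[] FetchPage(int skip, int top)
{
    requests++;
    return source.Skip(skip).Take(top).ToArray();
}

const int Top = 400;
var all = new List<int>();
int offset = 0;
int[] page;
do
{
    page = FetchPage(offset, Top);
    all.AddRange(page);
    offset += Top;
} while (page.Length == Top); // a short page means there are no more items

Console.WriteLine($"{all.Count} items in {requests} requests"); // 1000 items in 3 requests
```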

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[In this 99% LLM generated post, I’ll walk you through a PowerShell script that retrieves pull request (PR) descriptions from your Azure DevOps organization.]]></summary></entry><entry><title type="html">Sep 0.7.0 - CSV Escape Support</title><link href="https://nietras.com/2025/01/12/sep-0-7-0/" rel="alternate" type="text/html" title="Sep 0.7.0 - CSV Escape Support" /><published>2025-01-12T00:00:00+00:00</published><updated>2025-01-12T00:00:00+00:00</updated><id>https://nietras.com/2025/01/12/sep-0.7.0</id><content type="html" xml:base="https://nietras.com/2025/01/12/sep-0-7-0/"><![CDATA[<p>Sep 0.7.0 has just been released with the following notable changes:</p>

<ul>
  <li>🎁 Add <code class="language-plaintext highlighter-rouge">SepWriterOptions.Escape</code> for escape support</li>
  <li>🎛️ Add <code class="language-plaintext highlighter-rouge">SepWriterOptions.DisableColCountCheck/ColNotSetOption</code></li>
</ul>

<p>See <a href="https://github.com/nietras/Sep/releases/tag/v0.7.0">v0.7.0 release</a> for all
changes and <a href="https://github.com/nietras/Sep">Sep README on GitHub</a> for full
details. Below I’ll briefly go over these changes. Escape support is one of the
last major features that was missing in Sep compared to other libraries.</p>

<h2 id="escape-support">Escape Support</h2>

<p>Sep now supports escaping via the <code class="language-plaintext highlighter-rouge">Escape</code> property on <code class="language-plaintext highlighter-rouge">SepWriterOptions</code>, which
can be set like this:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="k">using</span> <span class="nn">var</span> <span class="n">writer</span> <span class="p">=</span> <span class="n">Sep</span><span class="p">.</span><span class="nf">Writer</span><span class="p">(</span><span class="n">o</span> <span class="p">=&gt;</span> 
    <span class="n">o</span> <span class="n">with</span> <span class="p">{</span> <span class="n">Escape</span> <span class="p">=</span> <span class="k">true</span> <span class="p">}).</span><span class="nf">ToText</span><span class="p">();</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The result of escaping is shown below in comparison to other popular CSV
libraries. All basically behave the same, except CsvHelper, which also escapes
spaces even though this is not necessary.</p>

<table>
  <thead>
    <tr>
      <th>Input</th>
      <th>CsvHelper</th>
      <th>Sylvan</th>
      <th>Sep¹</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·</code></td>
      <td><code class="language-plaintext highlighter-rouge">"·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">;</code></td>
      <td><code class="language-plaintext highlighter-rouge">";"</code></td>
      <td><code class="language-plaintext highlighter-rouge">";"</code></td>
      <td><code class="language-plaintext highlighter-rouge">";"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">,</code></td>
      <td><code class="language-plaintext highlighter-rouge">,</code></td>
      <td><code class="language-plaintext highlighter-rouge">,</code></td>
      <td><code class="language-plaintext highlighter-rouge">,</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"</code></td>
      <td><code class="language-plaintext highlighter-rouge">""""</code></td>
      <td><code class="language-plaintext highlighter-rouge">""""</code></td>
      <td><code class="language-plaintext highlighter-rouge">""""</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">\r</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\r"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\r"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\r"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">\n</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\n"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\n"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"\n"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a"aa"aaa</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a""aa""aaa"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a""aa""aaa"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a""aa""aaa"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a;aa;aaa</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a;aa;aaa"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a;aa;aaa"</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a;aa;aaa"</code></td>
    </tr>
  </tbody>
</table>

<p>Separator/delimiter is set to semicolon <code class="language-plaintext highlighter-rouge">;</code> (default for Sep).</p>

<p><code class="language-plaintext highlighter-rouge">·</code> (middle dot) represents a space, to make it visible.</p>

<p><code class="language-plaintext highlighter-rouge">\r</code> and <code class="language-plaintext highlighter-rouge">\n</code> represent the carriage return and line feed characters, to make these
visible.</p>

<p>¹ Sep with <code class="language-plaintext highlighter-rouge">Escape = true</code> in <code class="language-plaintext highlighter-rouge">SepWriterOptions</code></p>
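<p>The behavior in the table boils down to a simple rule. A column is quoted if it contains the separator, a quote, a carriage return or a line feed, and any embedded quotes are doubled. A minimal C# sketch of that rule is below. <code class="language-plaintext highlighter-rouge">EscapeCol</code> is a hypothetical name, and the sketch illustrates the table rather than Sep’s actual implementation.</p>

```csharp
using System;

// Quote the column if it contains the separator, a quote, \r or \n,
// doubling any embedded quotes; otherwise return it unchanged.
static string EscapeCol(string col, char separator)
{
    bool needsQuotes = col.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + col.Replace("\"", "\"\"") + "\"" : col;
}

// Inputs from the table above, with ';' as separator.
foreach (var value in new[] { " ", "a", ";", ",", "\"", "\r", "\n", "a\"aa\"aaa", "a;aa;aaa" })
{
    Console.WriteLine(EscapeCol(value, ';'));
}
```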

<h2 id="sepwriteroptionsdisablecolcountcheckcolnotsetoption">SepWriterOptions.DisableColCountCheck/ColNotSetOption</h2>

<p>These new options allow writing CSV files with a different number of columns
per row and define how to handle columns that are not set. Below is just one
example of how this could be used.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">options</span> <span class="p">=</span> <span class="k">new</span> <span class="n">SepWriterOptions</span>
<span class="p">{</span>
    <span class="n">WriteHeader</span> <span class="p">=</span> <span class="k">false</span><span class="p">,</span>
    <span class="n">DisableColCountCheck</span> <span class="p">=</span> <span class="k">true</span><span class="p">,</span>
    <span class="n">ColNotSetOption</span> <span class="p">=</span> <span class="n">SepColNotSetOption</span><span class="p">.</span><span class="n">Skip</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">using</span> <span class="nn">var</span> <span class="n">writer</span> <span class="p">=</span> <span class="n">options</span><span class="p">.</span><span class="nf">ToText</span><span class="p">();</span>
<span class="p">{</span>
    <span class="k">using</span> <span class="nn">var</span> <span class="n">row</span> <span class="p">=</span> <span class="n">writer</span><span class="p">.</span><span class="nf">NewRow</span><span class="p">();</span>
    <span class="n">row</span><span class="p">[</span><span class="s">"A"</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R1C1"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="s">"B"</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R1C2"</span><span class="p">);</span>

<span class="p">}</span>
<span class="p">{</span>
    <span class="k">using</span> <span class="nn">var</span> <span class="n">row</span> <span class="p">=</span> <span class="n">writer</span><span class="p">.</span><span class="nf">NewRow</span><span class="p">();</span>
    <span class="n">row</span><span class="p">[</span><span class="m">0</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R2C1"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="m">1</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R2C2"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="m">2</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R2C3"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="m">3</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R2C4"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">{</span>
    <span class="k">using</span> <span class="nn">var</span> <span class="n">row</span> <span class="p">=</span> <span class="n">writer</span><span class="p">.</span><span class="nf">NewRow</span><span class="p">();</span>
    <span class="n">row</span><span class="p">[</span><span class="s">"A"</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R3C1"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="m">2</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R3C3"</span><span class="p">);</span>
    <span class="n">row</span><span class="p">[</span><span class="m">1</span><span class="p">].</span><span class="nf">Set</span><span class="p">(</span><span class="s">"R3C2"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">var</span> <span class="n">expected</span> <span class="p">=</span>
<span class="s">@"R1C1;R1C2
R2C1;R2C2;R2C3;R2C4
R3C1;R3C2;R3C3
"</span><span class="p">;</span>
<span class="n">Assert</span><span class="p">.</span><span class="nf">AreEqual</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span> <span class="n">writer</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Note how each row has a different number of columns and how any column not
set is skipped. There are also other options, for example writing an empty
column when it is not set.</p>

<p>For more examples see tests on <a href="https://github.com/nietras/Sep">GitHub</a>.</p>

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sep 0.7.0 has just been released with the following notable changes:]]></summary></entry><entry><title type="html">Sep 0.6.0 - CSV Trim Support, .NET 9 and New Benchmarks incl. Apple M1</title><link href="https://nietras.com/2024/12/07/sep-0-6-0/" rel="alternate" type="text/html" title="Sep 0.6.0 - CSV Trim Support, .NET 9 and New Benchmarks incl. Apple M1" /><published>2024-12-07T00:00:00+00:00</published><updated>2024-12-07T00:00:00+00:00</updated><id>https://nietras.com/2024/12/07/sep-0.6.0</id><content type="html" xml:base="https://nietras.com/2024/12/07/sep-0-6-0/"><![CDATA[<p>It’s been a while since the last update on Sep, but recently 0.6.0 was released
with the following notable changes:</p>

<ul>
  <li>🐛 Bug fix to <code class="language-plaintext highlighter-rouge">SepWriter</code> when selecting multiple columns by indices</li>
  <li>✂️ <code class="language-plaintext highlighter-rouge">SepReader</code> trim support</li>
  <li>🤖 .NET 9 ready</li>
  <li>🚀 Updated and new benchmarks incl. Apple M1</li>
</ul>

<p>See <a href="https://github.com/nietras/Sep/releases/tag/v0.6.0">v0.6.0 release</a> for all
changes and <a href="https://github.com/nietras/Sep">Sep README on GitHub</a> for full
details. Below I’ll go over the notable changes briefly. Keep reading for perf
numbers!</p>

<h2 id="bug-fix-to-sepwriter">Bug Fix to SepWriter</h2>

<p>Yet another reminder that great code coverage (Sep has ~100% now) does not
preclude bugs… a functioning brain should. However, sometimes a usually
reasonably well-functioning brain has a day off, and in this case it wrote a test
wrong, which meant a bug snuck into <code class="language-plaintext highlighter-rouge">SepWriter.Row</code> when selecting multiple
columns by indices, e.g. for:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="kt">var</span> <span class="n">cols</span> <span class="p">=</span> <span class="n">row</span><span class="p">[</span><span class="m">3</span><span class="p">,</span> <span class="m">2</span><span class="p">];</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>one would expect the columns at indices 3 and 2 to be selected, but instead the
columns at indices 0 and 1 were selected. A simple mistake, fixed by using
<code class="language-plaintext highlighter-rouge">colIndices[i]</code> instead of just <code class="language-plaintext highlighter-rouge">i</code>. I would guess, or at least hope, that no one
has actually used this. At least there has been no such usage at work.</p>
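<p>To make the mistake concrete, below is a minimal, hypothetical sketch of
selecting values by column indices from a plain <code class="language-plaintext highlighter-rouge">string[]</code>. It is not Sep’s
actual <code class="language-plaintext highlighter-rouge">SepWriter</code> code; <code class="language-plaintext highlighter-rouge">ColSelectionSketch</code>, <code class="language-plaintext highlighter-rouge">SelectBuggy</code> and
<code class="language-plaintext highlighter-rouge">SelectFixed</code> are made-up names used only for illustration.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical illustration of the bug, not Sep's SepWriter code.
public static class ColSelectionSketch
{
    // Buggy: uses the loop counter as the column index, so asking for
    // columns [3, 2] returns the columns at indices 0 and 1 instead.
    public static string[] SelectBuggy(string[] row, ReadOnlySpan&lt;int&gt; colIndices)
    {
        var selected = new string[colIndices.Length];
        for (var i = 0; i &lt; colIndices.Length; i++)
        {
            selected[i] = row[i];
        }
        return selected;
    }

    // Fixed: looks up the requested column index for each position.
    public static string[] SelectFixed(string[] row, ReadOnlySpan&lt;int&gt; colIndices)
    {
        var selected = new string[colIndices.Length];
        for (var i = 0; i &lt; colIndices.Length; i++)
        {
            selected[i] = row[colIndices[i]];
        }
        return selected;
    }
}
</code></pre></div></div>

<p>For a row <code class="language-plaintext highlighter-rouge">A,B,C,D</code> and indices <code class="language-plaintext highlighter-rouge">[3, 2]</code>, <code class="language-plaintext highlighter-rouge">SelectBuggy</code> returns
<code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">B</code>, whereas <code class="language-plaintext highlighter-rouge">SelectFixed</code> returns <code class="language-plaintext highlighter-rouge">D</code>, <code class="language-plaintext highlighter-rouge">C</code> as intended.</p>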

<h2 id="trim-support">Trim Support</h2>

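<p>Sep now supports trimming via the
<a href="https://github.com/nietras/Sep/tree/main/src/Sep/SepTrim.cs"><code class="language-plaintext highlighter-rouge">SepTrim</code></a> flags
enum, which has two options, <code class="language-plaintext highlighter-rouge">Outer</code> and <code class="language-plaintext highlighter-rouge">AfterUnescape</code>, as documented in
the code. To enable trimming, set the option as shown below:</p>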

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="k">using</span> <span class="nn">var</span> <span class="n">reader</span> <span class="p">=</span> <span class="n">Sep</span><span class="p">.</span><span class="nf">Reader</span><span class="p">(</span><span class="n">o</span> <span class="p">=&gt;</span> 
    <span class="n">o</span> <span class="n">with</span> <span class="p">{</span> <span class="n">Trim</span> <span class="p">=</span> <span class="n">SepTrim</span><span class="p">.</span><span class="n">All</span> <span class="p">}).</span><span class="nf">FromText</span><span class="p">(</span><span class="n">text</span><span class="p">);</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Below, the result of combining trimming and unescaping is compared to
<a href="https://joshclose.github.io/CsvHelper/">CsvHelper</a>. Note that unescaping is
enabled for all results shown. It is, of course, also possible to trim without
unescaping.</p>

<p>As can be seen, Sep follows a simple principle of trimming <em>before</em> and/or
<em>after</em> unescaping. Trimming before unescaping matters when a starting quote
is preceded by spaces, since the column is otherwise not recognized as quoted
and hence not unescaped.</p>

<table>
  <thead>
    <tr>
      <th>Input</th>
      <th>CsvHelper Trim</th>
      <th>CsvHelper InsideQuotes</th>
      <th>CsvHelper All¹</th>
      <th>Sep Outer</th>
      <th>Sep AfterUnescape</th>
      <th>Sep All²</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"a"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"·a"</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"·a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"·a·a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·"a"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·"a"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·"·a"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·"·a"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">"·a"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·"a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·"a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">"a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·"·a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·"·a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">"·a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">·"·a·a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">·"·a·a·"·</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
      <td><code class="language-plaintext highlighter-rouge">·a·a·</code></td>
      <td><code class="language-plaintext highlighter-rouge">"·a·a·"</code></td>
      <td><code class="language-plaintext highlighter-rouge">a·a</code></td>
    </tr>
  </tbody>
</table>

<p><code class="language-plaintext highlighter-rouge">·</code> (middle dot) is whitespace to make this visible</p>

<p>¹ CsvHelper with <code class="language-plaintext highlighter-rouge">TrimOptions.Trim | TrimOptions.InsideQuotes</code></p>

<p>² Sep with <code class="language-plaintext highlighter-rouge">SepTrim.All = SepTrim.Outer | SepTrim.AfterUnescape</code> in
<code class="language-plaintext highlighter-rouge">SepReaderOptions</code></p>

<p>Trimming has a cost, of course, benchmarks below show this for <code class="language-plaintext highlighter-rouge">AMD Ryzen 9
5950X</code> when accessing the column <code class="language-plaintext highlighter-rouge">Span</code> for the package assets benchmark where
most (but not all) columns have been prefixed and suffixed with <code class="language-plaintext highlighter-rouge">·"·</code>, which
then needs to be removed. For details on benchmarks see Sep on GitHub. As the
numbers show Sep is about 11.20 / 1.44 = <strong>7.78x</strong> to 11.07 / 1.69 = <strong>6.54x</strong>
faster than CsvHelper for this scenario. Sylvan does not appear to support
trimming.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre>| Method                     | Scope | Rows  | Mean      | Ratio | MB | MB/s   | ns/row | Allocated | Alloc Ratio |
|--------------------------- |------ |------ |----------:|------:|---:|-------:|-------:|----------:|------------:|
| Sep_                       | Cols  | 50000 |  8.599 ms |  1.00 | 41 | 4857.5 |  172.0 |   1.04 KB |        1.00 |
| Sep_Trim                   | Cols  | 50000 | 12.402 ms |  1.44 | 41 | 3368.0 |  248.0 |   1.05 KB |        1.01 |
| Sep_TrimUnescape           | Cols  | 50000 | 13.201 ms |  1.54 | 41 | 3164.1 |  264.0 |   1.06 KB |        1.02 |
| Sep_TrimUnescapeTrim       | Cols  | 50000 | 14.568 ms |  1.69 | 41 | 2867.3 |  291.4 |   1.07 KB |        1.02 |
| CsvHelper_TrimUnescape     | Cols  | 50000 | 96.272 ms | 11.20 | 41 |  433.9 | 1925.4 | 451.52 KB |      432.51 |
| CsvHelper_TrimUnescapeTrim | Cols  | 50000 | 95.183 ms | 11.07 | 41 |  438.8 | 1903.7 | 445.86 KB |      427.09 |
</pre></td></tr></tbody></table></code></pre></div></div>
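<p>The speedups quoted above are simply CsvHelper’s <code class="language-plaintext highlighter-rouge">Ratio</code> divided by Sep’s
<code class="language-plaintext highlighter-rouge">Ratio</code>, since both are relative to the same <code class="language-plaintext highlighter-rouge">Sep_</code> baseline. A tiny sketch of
that arithmetic, with the values copied from the table (<code class="language-plaintext highlighter-rouge">SpeedupSketch</code> is a
made-up name):</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class SpeedupSketch
{
    // Ratio column values from the trim benchmark table (Sep_ = 1.00).
    public const double SepTrim = 1.44;
    public const double SepTrimUnescapeTrim = 1.69;
    public const double CsvHelperTrimUnescape = 11.20;
    public const double CsvHelperTrimUnescapeTrim = 11.07;

    // How many times faster Sep is, given both mean times expressed
    // relative to the same baseline.
    public static double Speedup(double otherRatio, double sepRatio) =&gt;
        otherRatio / sepRatio;
}
</code></pre></div></div>

<p>Since the <code class="language-plaintext highlighter-rouge">Ratio</code> column is rounded to two decimals, dividing the
<code class="language-plaintext highlighter-rouge">Mean</code> column instead gives slightly different values, e.g.
96.272 / 12.402 ≈ 7.76.</p>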

<p>These benchmarks were run using .NET 9, but results do not differ much on .NET
8, which brings us to Sep and .NET 9.</p>

<h2 id="net-9-ready">.NET 9 Ready</h2>

<p>Sep now uses the latest .NET 9 SDK for development and, besides still targeting
<code class="language-plaintext highlighter-rouge">net7.0</code> and <code class="language-plaintext highlighter-rouge">net8.0</code>, now also targets <code class="language-plaintext highlighter-rouge">net9.0</code>. The main reasons are to
add <code class="language-plaintext highlighter-rouge">allows ref struct</code> annotations and to support <code class="language-plaintext highlighter-rouge">params ReadOnlySpan&lt;&gt;</code>.</p>

<p>The former means <code class="language-plaintext highlighter-rouge">SepReader</code> now for .NET 9+ actually implements
<code class="language-plaintext highlighter-rouge">IEnumerable&lt;SepReader.Row&gt;</code>. However, this isn’t particularly useful yet since
.NET apparently hasn’t updated LINQ extension methods to have <code class="language-plaintext highlighter-rouge">allows ref
struct</code> for <code class="language-plaintext highlighter-rouge">TSource</code> in any such methods. Someday perhaps. 🤷</p>

<p>The latter means for most indexing operations <code class="language-plaintext highlighter-rouge">params</code> is supported and one can
write e.g. <code class="language-plaintext highlighter-rouge">row[1, 2, 3]</code> or <code class="language-plaintext highlighter-rouge">row["A", "B", "C"]</code>. Straight-forward and without
allocation since the <code class="language-plaintext highlighter-rouge">params</code> is <code class="language-plaintext highlighter-rouge">ReadOnlySpan&lt;int&gt;</code> or <code class="language-plaintext highlighter-rouge">ReadOnlySpan&lt;string&gt;</code>
and backed by stack allocated storage.</p>
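<p>A minimal sketch, assuming C# 13 (.NET 9 SDK), of an indexer taking
<code class="language-plaintext highlighter-rouge">params ReadOnlySpan&lt;int&gt;</code>. <code class="language-plaintext highlighter-rouge">IntRowSketch</code> is a hypothetical type that
simply sums the selected columns to have something to return; it is not Sep’s
API.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical row of integer columns, only to illustrate params spans;
// this is not Sep's API.
public readonly ref struct IntRowSketch
{
    readonly ReadOnlySpan&lt;int&gt; _values;

    public IntRowSketch(ReadOnlySpan&lt;int&gt; values) =&gt; _values = values;

    // Single column, e.g. row[1].
    public int this[int index] =&gt; _values[index];

    // Multiple columns, e.g. row[3, 2]. With C# 13 params collections the
    // arguments arrive as a ReadOnlySpan&lt;int&gt;, so no int[] is allocated.
    // Summing is just a stand-in for returning a selection of columns.
    public int this[params ReadOnlySpan&lt;int&gt; indices]
    {
        get
        {
            var sum = 0;
            foreach (var index in indices) { sum += _values[index]; }
            return sum;
        }
    }
}
</code></pre></div></div>

<p>For values <code class="language-plaintext highlighter-rouge">10, 20, 30, 40</code>, <code class="language-plaintext highlighter-rouge">row[1]</code> is 20 and <code class="language-plaintext highlighter-rouge">row[3, 2]</code> is 70. The
single-<code class="language-plaintext highlighter-rouge">int</code> overload wins for <code class="language-plaintext highlighter-rouge">row[1]</code>, since C# prefers a normal-form
match over an expanded <code class="language-plaintext highlighter-rouge">params</code> form.</p>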

<p>For more details on the API and usage, please refer to the
<a href="https://github.com/nietras/Sep">README</a>.</p>

<h2 id="updated-and-new-benchmarks">Updated and New Benchmarks</h2>

<p>As part of updating for .NET 9, all benchmarks have been re-run and a new set of
benchmark “machines” has been used, as listed below. <code class="language-plaintext highlighter-rouge">(Virtual)</code> means the
machine is actually a GitHub CI agent, hence it is subject to noisy neighbors and
only has access to a subset of the cores of the full CPU.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">AMD EPYC 7763</code> (Virtual) X64 Platform Information
    <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="py">OS</span><span class="p">=</span><span class="s">Ubuntu 22.04.5 LTS (Jammy Jellyfish)</span>
<span class="err">AMD</span> <span class="err">EPYC</span> <span class="err">7763,</span> <span class="err">1</span> <span class="err">CPU,</span> <span class="err">4</span> <span class="err">logical</span> <span class="err">and</span> <span class="err">2</span> <span class="err">physical</span> <span class="err">cores</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </li>
  <li><code class="language-plaintext highlighter-rouge">AMD Ryzen 7 PRO 7840U</code> (Laptop on battery) X64 Platform Information
    <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="py">OS</span><span class="p">=</span><span class="s">Windows 11 (10.0.22631.4460/23H2/2023Update/SunValley3)</span>
<span class="err">AMD</span> <span class="err">Ryzen</span> <span class="err">7</span> <span class="err">PRO</span> <span class="err">7840U</span> <span class="err">w/</span> <span class="err">Radeon</span> <span class="err">780M</span> <span class="err">Graphics,</span> 
<span class="err">1</span> <span class="err">CPU,</span> <span class="err">16</span> <span class="err">logical</span> <span class="err">and</span> <span class="err">8</span> <span class="err">physical</span> <span class="err">cores</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </li>
  <li><code class="language-plaintext highlighter-rouge">AMD 5950X</code> (Desktop) X64 Platform Information
    <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="py">OS</span><span class="p">=</span><span class="s">Windows 10 (10.0.19044.2846/21H2/November2021Update)</span>
<span class="err">AMD</span> <span class="err">Ryzen</span> <span class="err">9</span> <span class="err">5950X,</span> <span class="err">1</span> <span class="err">CPU,</span> <span class="err">32</span> <span class="err">logical</span> <span class="err">and</span> <span class="err">16</span> <span class="err">physical</span> <span class="err">cores</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </li>
  <li><code class="language-plaintext highlighter-rouge">Apple M1</code> (Virtual) ARM64 Platform Information
    <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="py">OS</span><span class="p">=</span><span class="s">macOS Sonoma 14.7.1 (23H222) [Darwin 23.6.0]</span>
<span class="err">Apple</span> <span class="err">M1</span> <span class="err">(Virtual),</span> <span class="err">1</span> <span class="err">CPU,</span> <span class="err">3</span> <span class="err">logical</span> <span class="err">and</span> <span class="err">3</span> <span class="err">physical</span> <span class="err">cores</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </li>
</ul>

<p>This means the previous Neoverse N1 ARM64 benchmarks have been replaced with
Apple M1 ones. Results on the floats benchmark show Sep is ~3x-7x faster than
the others, from the low level to the top level (Row, Cols and Floats scopes), as
seen below. At the lowest level of just parsing the CSV, i.e. the rows, Sep hits
~5 GB/s vs around 1 GB/s for the others.</p>

<p><strong>Apple M1 Floats Benchmarks</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="rouge-code"><pre>| Method    | Scope  | Rows  | Mean       | Ratio | MB | MB/s   | ns/row | Allocated   | Alloc Ratio |
|---------- |------- |------ |-----------:|------:|---:|-------:|-------:|------------:|------------:|
| Sep______ | Row    | 25000 |   3.887 ms |  1.00 | 20 | 5215.5 |  155.5 |      1.2 KB |        1.00 |
| Sylvan___ | Row    | 25000 |  17.956 ms |  4.62 | 20 | 1129.0 |  718.2 |    10.62 KB |        8.87 |
| ReadLine_ | Row    | 25000 |  14.074 ms |  3.62 | 20 | 1440.4 |  563.0 | 73489.65 KB |   61,381.24 |
| CsvHelper | Row    | 25000 |  27.741 ms |  7.14 | 20 |  730.8 | 1109.6 |    20.28 KB |       16.94 |
|           |        |       |            |       |    |        |        |             |             |
| Sep______ | Cols   | 25000 |   4.726 ms |  1.00 | 20 | 4289.0 |  189.1 |      1.2 KB |        1.00 |
| Sylvan___ | Cols   | 25000 |  20.241 ms |  4.28 | 20 | 1001.5 |  809.6 |    10.62 KB |        8.84 |
| ReadLine_ | Cols   | 25000 |  14.976 ms |  3.17 | 20 | 1353.6 |  599.0 | 73489.65 KB |   61,181.63 |
| CsvHelper | Cols   | 25000 |  29.842 ms |  6.31 | 20 |  679.3 | 1193.7 |  21340.5 KB |   17,766.40 |
|           |        |       |            |       |    |        |        |             |             |
| Sep______ | Floats | 25000 |  24.511 ms |  1.00 | 20 |  827.1 |  980.4 |     8.34 KB |        1.00 |
| Sep_MT___ | Floats | 25000 |   9.422 ms |  0.38 | 20 | 2151.5 |  376.9 |    79.89 KB |        9.58 |
| Sylvan___ | Floats | 25000 |  69.902 ms |  2.85 | 20 |  290.0 | 2796.1 |    18.57 KB |        2.23 |
| ReadLine_ | Floats | 25000 |  79.015 ms |  3.22 | 20 |  256.6 | 3160.6 |  73493.2 KB |    8,816.43 |
| CsvHelper | Floats | 25000 | 104.811 ms |  4.28 | 20 |  193.4 | 4192.4 | 22063.34 KB |    2,646.77 |
</pre></td></tr></tbody></table></code></pre></div></div>

<h3 id="net-9-performance-and-datas-issue">.NET 9 Performance and DATAS Issue</h3>

<p>If you are wondering whether .NET 9 provides any significant performance
improvements over .NET 8 for Sep, the answer is no. Some minor improvements
within 5-10% have been observed, but also minor regressions. This is expected, as
Sep has been thoroughly optimized to produce machine code that is as good as
possible, so while JIT improvements can still help, there is not much left on the table.</p>

<p>However, as detailed in <a href="https://github.com/dotnet/runtime/issues/109047">.NET 8.0.10 vs 9.0.0 RC2 GC Server Performance
Regression in Sep (CSV Parser) Benchmark (due to DATAS
default)</a> the GC has switched
to enable <a href="https://maoni0.medium.com/dynamically-adapting-to-application-sizes-2d72fcb6f1ea">DATAS (Dynamically Adapting To Application Sizes) by default when
using Server GC in .NET
9</a>.</p>

<p>Generally, this means the GC is more aggressive with regard to running garbage
collections. However, for bursty workloads like Sep’s CSV parsing in the
package assets benchmark, where 1 million parsed package asset rows are
accumulated, this can hurt performance by up to 1.7x (for parallel
enumeration) in my testing. Hence, DATAS has been disabled for the Sep benchmarks
using Server GC mode, and I would recommend doing the same if you have lots
of RAM, a dedicated machine and are running machine learning pipelines like we are.</p>
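<p>For reference, my understanding from Microsoft’s GC configuration
documentation is that DATAS can be disabled with the MSBuild property
<code class="language-plaintext highlighter-rouge">&lt;GarbageCollectionAdaptationMode&gt;0&lt;/GarbageCollectionAdaptationMode&gt;</code>, the
<code class="language-plaintext highlighter-rouge">runtimeconfig.json</code> setting <code class="language-plaintext highlighter-rouge">"System.GC.DynamicAdaptationMode": 0</code> or the
environment variable <code class="language-plaintext highlighter-rouge">DOTNET_GCDynamicAdaptationMode=0</code>; verify this against the
docs for your .NET version. The hypothetical helper below (not part of Sep)
reports whether Server GC is active and what, if anything, was configured via
<code class="language-plaintext highlighter-rouge">runtimeconfig.json</code> or MSBuild. An environment variable override is not
visible this way.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime;

// Hypothetical diagnostic helper, not part of Sep.
public static class GcModeSketch
{
    public static string Describe()
    {
        // True when the process runs with Server GC.
        var serverGc = GCSettings.IsServerGC;
        // runtimeconfig.json settings (including those generated from MSBuild
        // properties) are exposed via AppContext.GetData. Null means the setting
        // was not specified there, so the runtime default applies; an
        // environment variable override is not reflected here.
        var datas = AppContext.GetData("System.GC.DynamicAdaptationMode");
        return $"Server GC: {serverGc}, System.GC.DynamicAdaptationMode: {datas ?? "(not set)"}";
    }
}
</code></pre></div></div>

<p>Logging <code class="language-plaintext highlighter-rouge">GcModeSketch.Describe()</code> at startup is a cheap way to confirm a
benchmark or pipeline actually runs with the intended GC settings.</p>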

<p>That’s all!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[It’s been a while since the last update on Sep, but recently 0.6.0 was released with the following notable changes:]]></summary></entry></feed>