.NET 6 vs .NET 4.8 GC stats - ~64% shorter max GC pause times (due to 1 letter removed in runtime)

Recently I tweeted about performance improvements we are seeing from migrating an industrial machine learning application (desktop) from .NET Framework 4.8 to .NET 6 which got more likes than any blog post I’ve made before. Additionally, Alex Yakunin’s blog post on .NET 6 vs .NET 5: up to 40% speedup! got some hacker news love, so I thought I’d give the crowd what it wants and turn the tweets into a quick blog post. 😉

The application in question is used for inspection of products on a conveyor belt and the sorting of these with the following characteristics:

  • Connected to a dozen cameras and processes about ~1 GB/s worth of images with ~1000 inference/s.
  • Uses digital IO to control “latches” that sort out products.
  • Sorting has to be performed with a certain precision, which means latency/jitter caused by GC pauses is an issue.

Therefore, the application is highly optimized with special focus on managed heap allocations utilizing pooling and recycling instances for all numerical processing. Special care is also taken around UI given WPFs propensity for doing lots of heap allocations. The remaining few allocations there are, are small and ephemeral. A best case scenario for the .NET garbage collector, yet we have always been seeing occasional longer pause times on .NET Framework 4.8.

Additionally, we recently switched from running inference on GPU to CPU and saw an increase in these max pause times. At the same time .NET 6 was just about to release, and we finally resolved the issues we had with porting to .NET Core/5+. We, therefore, wondered if running on .NET 6 would provide any improvement to this? It certainly did, as can be seen below.

.NET 6 vs .NET 4.8 GC stats

These are the Memory Group -> GCStats as shown in PerfView, which was also used to record GC collection events from the application running both on .NET Framework 4.8 and .NET 6 on Windows x64. In both cases data was collected for about a few days, but not exactly the same duration. Max pause times drop from 13.5 ms to 4.9 ms, which translates to (13.5-4.9)/13.5 ~= 64% shorter max pause times!

PerfView supports exporting GC events to a CSV-file. Below plots the suspend and total pause times in milliseconds (y-axis) for the same time frame in seconds (x-axis) for .NET 4.8 vs .NET 6. Worst case pause times are all gone!

.NET 6 vs .NET 4.8 GC events

Zooming in on 600 seconds of GC events provides more detail, as shown below. Not only do worst case pause times disappear, time between collections/pauses occuring (while still being shorter) increases by at least 2x. Meaning there are fewer collections. This I expected given the many performance improvements made in .NET 6 compared to .NET 4.8. I.e. less string allocations and similar. However, why did the worst case suspension time drop significantly?

.NET 6 vs .NET 4.8 GC events zoom

Maoni Stephens (.NET GC Architect on the .NET Runtime Team) was kind enough to provide a hint by pointing to the pull request https://github.com/dotnet/coreclr/pull/27578 which states:

Change PING_JIT_TIMEOUT constant from 10 milliseconds to 1 millisecond. This fixes the case where return address hijacking doesn’t work quickly, because the hijacked thread doesn’t return (e.g. because it’s in a loop doing further calls). In this case we have to retry the hijack, and changing the timeout constant makes this happen more quickly.

In short, when a garbage collection has to occur, all managed threads have to be suspended first. This is also referred to as stop-the-world (STW). In Suspending and resuming threads for GC Maoni Stephens covers how:

suspension and resumption of threads is not really part of the GC. GC calls functions to do the suspension and the resumption as a service provided in the CLR. Other components in the CLR also use this service such as the debugger implementation. But it’s true that suspending and resuming because of the GC is the most significant usage of this service.

Suspension is, however, reported as part of the total pause time when a GC occurs. For more details Matt Warren has written a great blog post about this in GC Pauses and Safe Points, so I won’t go into details here. For more info on thread hijacking you can also consult the BOTR (book of the runtime).

In any case, this single letter change, removing 0 and changing the PING_JIT_TIMEOUT timeout from 10 milliseconds to 1 millisecond, is the best performance improvement moving from .NET 4.8 to .NET 6 for our specific use case. Given the many other performance improvements and all the work put into them, this says a lot I think.

PING_JIT_TIMEOUT change

Astute readers will notice this PR was merged on November 2, 2019, and been available since .NET Core 3.1 or 5 (depending on cutoff date for 3.1). However, I’ve haven’t seen any reporting of this change anywhere and completely missed it. So as always only way to know is to measure. Always measure. 📏

Just so you know, whatever you may think, your likes and retweets are truly appreciated. I’m using my blog as a way to force myself to write more. Writing and documenting can be a thankless task. You rarely get any appreciation when you do make the effort, but always hear about when you didn’t 😅, and you are always busy. Writing in public gives an extra incentive and it’s very nice to know someone out there appreciates whatever you are doing. So thank you. ❤️

PS: Whether the single letter change is responsible for all of the improvement observed in max pause times here I don’t know, so I am perhaps extrapolating a bit 😉

2021.11.26