# Section 6. Measured Pause Characteristics: Methodology and Data

> Part of the FablePool research survey. Cross-references: §2 (what each GC type does), §4 (modes/settings), §5 (regions), §7 (allocation sources), §8 (mitigations). Citation keys (`[REF-nn]`) resolve in §9.

**Honesty note on data provenance.** This section provides (a) a fully specified, reproducible **measurement methodology** with working harness code, and (b) **indicative pause ranges** synthesized from published measurements by the .NET GC team and practitioners [REF-02][REF-05][REF-11][REF-15][REF-16] plus first-principles cost models. Absolute numbers vary with CPU generation, memory bandwidth, heap count, survivor volume, and OS configuration. Every figure below is labeled with its driver so you can re-derive it; **run the harness on your own production-identical hardware before relying on any specific number.** That qualification run is itself a deliverable of FablePool Milestone 2.

---

## 6.1 What "a pause" is, precisely

For latency engineering, the only number that matters is the **stop-the-world (STW) window**: the interval during which *your* threads cannot run managed code. Define it from runtime events:

```
pause = t(GCRestartEEEnd) − t(GCSuspendEEBegin)
```

- A **blocking GC** (all gen0/gen1; non-concurrent gen2) has **one** STW window covering suspension + mark + plan + relocate/compact + restart.
- A **background GC (BGC)** of gen2 has **two short STW windows** (initial mark setup; sweep handoff) plus long *concurrent* phases that consume CPU and memory bandwidth but do not stop threads. BGC also typically triggers ephemeral GCs that have their own STW windows (§4.4).
- **Suspension itself** is part of the pause: the runtime asks every managed thread to reach a safepoint (via return-address hijacking or polling). Suspension latency grows with thread count and is tail-dominated by threads slow to reach a safepoint.
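
The pause definition above can be sketched as a pairing pass over a time-ordered event stream. This is a minimal illustration, not the harness of §6.2: the `(Name, TimestampMicros)` tuples and the `PauseWindowsMicros` helper are hypothetical stand-ins for parsed trace records, and the sample timestamps are invented to show one blocking GC followed by the two STW windows of a BGC.

```csharp
using System;
using System.Collections.Generic;

// Pairs each GCSuspendEEBegin with the next GCRestartEEEnd and returns the
// length of every STW window in microseconds. Other events are ignored.
static List<long> PauseWindowsMicros(
    IEnumerable<(string Name, long TimestampMicros)> events)
{
    var windows = new List<long>();
    long? suspendBegin = null;
    foreach (var (name, timestamp) in events)
    {
        if (name == "GCSuspendEEBegin")
        {
            suspendBegin ??= timestamp;   // keep the earliest unmatched begin
        }
        else if (name == "GCRestartEEEnd" && suspendBegin is long begin)
        {
            windows.Add(timestamp - begin);
            suspendBegin = null;          // a stray end without a begin is dropped
        }
    }
    return windows;
}

// Invented sample: one blocking GC, then a BGC with two separate STW windows.
var sampleEvents = new (string, long)[]
{
    ("GCSuspendEEBegin", 1_000), ("GCStart", 1_040), ("GCEnd", 1_400), ("GCRestartEEEnd", 1_450),
    ("GCSuspendEEBegin", 5_000), ("GCStart", 5_050), ("GCRestartEEEnd", 5_900),
    ("GCSuspendEEBegin", 80_000), ("GCRestartEEEnd", 80_700),
};

Console.WriteLine(string.Join(", ", PauseWindowsMicros(sampleEvents)));
// prints "450, 900, 700"
```

Note that the BGC contributes two entries: its concurrent phases between the windows never appear in the result, which is exactly why the wall-clock duration of a GC is the wrong quantity to report.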

**Do not** measure GC cost with `Stopwatch` around `GC.Collect()`: induced GCs take different code paths, and you will miss suspension behavior under real thread loads. Use runtime events.

## 6.2 Measurement methodology

### 6.2.1 Out-of-process (production-grade): ETW / EventPipe

Preferred for production because it adds no allocation or contention inside the process.

- **Windows:** ETW provider `Microsoft-Windows-DotNETRuntime`, keyword `GC (0x1)`, Informational level. PerfView's *GCStats* view computes per-GC pause, "% pause time", and "Max Suspend Msec" automatically [REF-15].
- **Cross-platform:** `dotnet-trace collect -p <PID> --profile gc-collect`, analyze with PerfView/TraceEvent or `dotnet-trace report`.
- Relevant events: `GCStart_V2`, `GCEnd_V2`, `GCSuspendEEBegin_V1`, `GCRestartEEEnd_V1`, `GCHeapStats_V2`, `GCAllocationTick_V4` (fired per ~100 KB allocated, so it doubles as your allocation-rate meter for §7 audits).

### 6.2.2 In-process: `EventListener` harness

Useful in qualification rigs where you want pauses correlated with order-flow timestamps in the same log. Caveat: `EventListener` dispatch itself allocates; keep it on a non-critical thread and accept that it perturbs the measurement slightly (or use it only in qualification, never production).

```csharp
// GcPauseListener.cs (.NET 7+): records every STW window into an HdrHistogram.
using System;
using System.Diagnostics.Tracing;
using System.IO;
using System.Linq;
using System.Threading;
using HdrHistogram;

public sealed class GcPauseListener : EventListener
{
    // Values are microseconds: 10 s ceiling, 3 significant digits.
    // The histograms are written from the event callback without locking,
    // so call Dump only once the run has quiesced.
    private readonly LongHistogram _pausesMicros =
        new(highestTrackableValue: 10_000_000, numberOfSignificantValueDigits: 3);
    private readonly LongHistogram[] _byGen = Enumerable.Range(0, 3)
        .Select(_ => new LongHistogram(10_000_000, 3)).ToArray();
    private long _suspendStartTicks;   // DateTime ticks (100 ns) of the open SuspendEE, 0 if none
    private int _currentGen = -1;

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Informational, (EventKeywords)0x1 /* GC */);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        switch (e.EventName)
        {
            case "GCStart_V2":
                _currentGen = Convert.ToInt32(e.Payload![1]); // "Depth" = condemned gen
                break;

            case "GCSuspendEEBegin_V1":
                // Use the event's own timestamp: runtime events can reach the
                // listener after a delay, so reading a clock inside the callback
                // would time the dispatch rather than the pause.
                Volatile.Write(ref _suspendStartTicks, e.TimeStamp.Ticks);
                break;

            case "GCRestartEEEnd_V1":
                long start = Volatile.Read(ref _suspendStartTicks);
                if (start == 0) return;                     // no matching SuspendEE seen
                Volatile.Write(ref _suspendStartTicks, 0);  // close the window
                long micros = (e.TimeStamp.Ticks - start) / 10; // 100 ns ticks -> µs
                _pausesMicros.RecordValue(Math.Max(1, micros));
                if ((uint)_currentGen < 3)
                    _byGen[_currentGen].RecordValue(Math.Max(1, micros));
                break;
        }
    }

    public void Dump(TextWriter w)
    {
        w.WriteLine($"All pauses p50={_pausesMicros.GetValueAtPercentile(50)}us " +
                    $"p99={_pausesMicros.GetValueAtPercentile(99)}us " +
                    $"p99.9={_pausesMicros.GetValueAtPercentile(99.9)}us " +
                    $"max={_pausesMicros.GetMaxValue()}us n={_pausesMicros.TotalCount}");
        for (int g = 0; g < 3; g++)
            w.WriteLine($"gen{g}: p50={_byGen[g].GetValueAtPercentile(50)}us " +
                        $"p99.9={_byGen[g].GetValueAtPercentile(99.9)}us " +
                        $"max={_byGen[g].GetMaxValue()}us n={_byGen[g].TotalCount}");
    }
}
```

Supporting APIs worth wiring into telemetry regardless of method:

- `GC.GetTotalPauseDuration()` (.NET 7+): cumulative STW time since process start; cheap to poll once per second and difference.
- `GC.CollectionCount(g)`: per-generation GC counts; a cadence sanity check.
- `GC.GetGCMemoryInfo()`: per-GC `PauseDurations`, promoted bytes, fragmentation, heap size after.

> ⚠ The payload index for `Depth` in `GCStart_V2` and the exact event/field names should be verified against your runtime's event manifest. They are stable in practice but are runtime implementation details [REF-15].

### 6.2.3 Measurement discipline

1. **Pin the harness configuration**: same GC mode, heap count, `TieredCompilation` settings, core isolation, and NUMA layout as production. Tiered JIT recompilation early in a run produces non-GC pauses that pollute the first minutes, so discard a warm-up window or disable tiering for qualification.
2. **Drive realistic allocation**: replay captured market data through the actual codec/strategy path. Synthetic uniform allocation understates survivor-driven pauses (§6.3).
3. **Report percentiles from histograms (HdrHistogram), never averages.** A mean gen0 pause is meaningless; HFT risk lives at p99.9–max.
4. **Avoid coordinated omission** [REF-16]: if you measure *request* latency, issue load on a fixed schedule and charge each stalled request its full intended-start-to-completion time. Otherwise a 40 ms gen2 pause shows up as one slow sample, and the requests that queued behind it during those ~40 ms look fast.
5. **Record run metadata** (runtime version, GC env vars, `GC.GetGCMemoryInfo` snapshots) alongside histograms; pause data without configuration is unactionable.
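
The coordinated-omission rule in item 4 can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the harness: `SimulateStall` is a hypothetical helper that models a single-threaded load generator on a fixed schedule, and the interval, service time, and 40 ms stall are invented parameters. It records each request twice, once timed from its actual send and once from its intended start.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simulates requests issued every intervalMicros by one sender that cannot
// send the next request until the previous one completes. One request is hit
// by a stall (for example a blocking GC) of stallMicros.
static (List<long> Naive, List<long> Corrected) SimulateStall(
    int requests, long intervalMicros, long serviceMicros,
    int stallAtRequest, long stallMicros)
{
    var naive = new List<long>();
    var corrected = new List<long>();
    long now = 0;
    for (int i = 0; i < requests; i++)
    {
        long intendedStart = i * intervalMicros;
        long actualStart = Math.Max(now, intendedStart);   // delayed if backlogged
        long cost = serviceMicros + (i == stallAtRequest ? stallMicros : 0);
        long completion = actualStart + cost;

        naive.Add(completion - actualStart);        // hides time spent queued
        corrected.Add(completion - intendedStart);  // charges the full delay
        now = completion;
    }
    return (naive, corrected);
}

// 1 kHz schedule, 100 µs service time, one 40 ms stall at request 10.
var (naiveLatencies, correctedLatencies) = SimulateStall(100, 1_000, 100, 10, 40_000);
Console.WriteLine($"naive slow samples: {naiveLatencies.Count(x => x > 100)}");
Console.WriteLine($"corrected slow samples: {correctedLatencies.Count(x => x > 100)}");
// prints "naive slow samples: 1" then "corrected slow samples: 45"
```

With these inputs the naive view reports a single outlier, while the corrected view shows 45 requests affected as the backlog drains; it is the corrected distribution that belongs in the histogram.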

## 6.3 Cost model: what each pause is proportional to

| GC type | STW work is proportional to… | NOT proportional to… |
|---|---|---|
| Gen0 | Bytes/objects that **survive** gen0 (copy cost) + stack/handle root scan + suspension | Bytes allocated in gen0 (dead objects are free) |
| Gen1 | Survivors of gen0+gen1 + dirty-card scan of gen2→ephemeral references | Total heap size |
| Gen2 blocking | **Entire live set** (mark) + relocation if compacting | Dead bytes |
| BGC STW windows | Root scan + bookkeeping (small, bounded) | Live set (that cost moved to concurrent phases) |
| LOH (with gen2) | Free-list sweep; compaction only if requested (§3.3) | n/a |
| Suspension | Thread count; worst single thread's distance to a safepoint | Heap size |

Two derived quantities you should compute for your own system:

**Gen0 frequency** ≈ allocation rate ÷ gen0 budget (per heap). Example: 200 MB/s steady allocation against a 64 MB effective gen0 budget ⇒ ~3 gen0 GCs/sec ⇒ at 150 µs each, ~0.05 % pause time. That fraction looks negligible, *but* it also means roughly three pause windows every second into which any given market event can land. Frequency, not just duration, is the HFT risk metric.

**Gen2 blocking pause** ≈ live-set GB × (mark+compact throughput)⁻¹. Practitioner-reported effective throughputs cluster around **tens of ms per GB of live data for mark-only**, and substantially more when compacting with heavy reference density [REF-02][REF-05]. A 4 GB live set can plausibly cost 100–400+ ms blocking. This is the number that ends an HFT career; the entire architecture must ensure it never happens intraday (§8, strategy S-1/S-7).

## 6.4 Indicative pause ranges (64-bit, Server GC, modern x86 server, .NET 7/8)

All values are **synthesized indicative ranges** per the provenance note above. "Typical" ≈ p50–p90 of a well-behaved service; "tail" ≈ p99.9-to-max pathologies and their causes.

| Event | Typical | Tail | Tail drivers |
|---|---|---|---|
| Thread suspension (component of every pause) | 10–100 µs | 1–10+ ms | Many threads; a thread in a long non-interruptible stretch; OS descheduling of a to-be-suspended thread on a contended core |
| Gen0 GC | 50 µs – 1 ms | 2–10 ms | High survivor volume (mid-life crisis objects, §7.9); heavy pinning on ≤ .NET 6 segments (§5.3); large stacks/handle tables to scan |
| Gen1 GC | 100 µs – 3 ms | 5–20 ms | Big gen1 survivor waves; very large gen2 dirty-card surface |
| Gen2 **blocking** (incl. compaction) | 10–100 ms per GB live | 100 ms – multiple seconds | Large live sets; LOH compaction requested; high reference density |
| BGC: each of the two STW windows | 0.5–5 ms | 10–20 ms | Root volume; suspension tail as above |
| BGC: total wall-clock (concurrent) | 100 ms – seconds | n/a | Live-set size; competes for CPU/memory bandwidth, which raises *jitter*, not pauses, on saturated cores (§4.4) |
| Ephemeral GCs forced during BGC | as gen0/gen1 above | + alloc-wait stalls | Allocation racing the concurrent sweep |
| LOH allocation triggering gen2/BGC | n/a (it's the trigger) | inherits gen2/BGC cost | Any ≥ 85,000 B allocation on the hot path (§3.2, §7.7) |

Reference points from public sources, for calibration:

- The .NET GC team's design targets put **ephemeral GC pauses in the sub-millisecond-to-few-millisecond band** for healthy workloads, with gen0 commonly well under 1 ms on server hardware [REF-02][REF-11].
- Kokosa's measurements and the PerfView GCStats corpus show **full blocking gen2 on multi-GB live sets ranging from tens of ms into seconds**, dominated by live-set mark/copy [REF-05][REF-15].
- ASP.NET/TechEmpower-class telemetry repeatedly demonstrates that **allocation-rate reduction shifts the entire pause histogram left** more reliably than any GC-mode change, which is the empirical underpinning of §7/§8's priority ordering [REF-04][REF-17].
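
To re-derive the two §6.3 quantities for your own inputs, the arithmetic can be sketched as below. The helper names `Gen0Cadence` and `Gen2BlockingRangeMs` are hypothetical, and the per-GB rates passed in are simply the indicative 10–100 ms/GB band from the table above, not measured constants; substitute figures from your own qualification run.

```csharp
using System;

// Gen0 frequency = allocation rate / gen0 budget; pause-time share follows
// from multiplying that frequency by the per-GC pause length.
static (double GcsPerSecond, double PauseTimePercent) Gen0Cadence(
    double allocMBPerSec, double gen0BudgetMB, double pauseMicros)
{
    double gcsPerSecond = allocMBPerSec / gen0BudgetMB;
    double pauseTimePercent = gcsPerSecond * pauseMicros / 1_000_000.0 * 100.0;
    return (gcsPerSecond, pauseTimePercent);
}

// Blocking gen2 estimate = live set (GB) x cost per GB, evaluated at both
// ends of an assumed ms-per-GB band.
static (double LowMs, double HighMs) Gen2BlockingRangeMs(
    double liveSetGB, double lowMsPerGB, double highMsPerGB)
    => (liveSetGB * lowMsPerGB, liveSetGB * highMsPerGB);

// §6.3 example: 200 MB/s against a 64 MB budget, 150 µs per gen0 GC.
var cadence = Gen0Cadence(200, 64, 150);      // 3.125 GCs/s, about 0.047 % pause time
// §6.5 live set of 1.5 GB at the table's 10-100 ms/GB band.
var gen2Range = Gen2BlockingRangeMs(1.5, 10, 100);   // 15 ms to 150 ms

Console.WriteLine($"gen0: {cadence.GcsPerSecond} GCs/s, {cadence.PauseTimePercent} % pause time");
Console.WriteLine($"gen2 blocking: {gen2Range.LowMs} to {gen2Range.HighMs} ms");
```

The point of keeping both outputs side by side is the one made in §6.3: the pause-time percentage can look harmless while the GC frequency still determines how often a market event can collide with a pause.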

## 6.5 Worked scenario profiles

To make the ranges concrete, here are three configurations of the same hypothetical market-data + order-entry process (8 isolated cores, 4 GC heaps, 1.5 GB live set), with expected histogram shapes derived from the §6.3 model:

**Profile A: naïve idiomatic C#** (LINQ on hot path, string-keyed lookups, async per message; ~400 MB/s allocation, frequent 100 KB+ buffers):

- gen0 every ~150 ms; survivor-heavy ⇒ gen0 p50 ~0.4 ms, p99.9 ~5 ms.
- LOH churn drags gen2/BGC every few minutes; BGC STW pairs ~2–4 ms; occasional blocking gen2 if budgets misfire ⇒ **max pause tens-to-hundreds of ms**. Unacceptable.

**Profile B: settings-only tuning** (Server GC, `SustainedLowLatency`, BGC, bigger gen0 budget; same code):

- gen0 cadence drops ~3–5×; gen2 becomes background-only intraday ⇒ max pause is now the BGC STW pair + ephemeral tail, a **single-digit-ms maximum**, but with BGC CPU-bandwidth jitter on saturated cores. Better; still unfit for single-digit-µs tick-to-trade.

**Profile C: zero-alloc hot path** (§8 disciplines: pooled buffers/POH, struct messages, no LINQ/closures/boxing/async on hot path; allocation ~0 intraday, gen2 forced at session boundaries):

- **No intraday GCs at all** in steady state; the pause histogram over the trading day is empty except for the warm-up window. Residual jitter is OS/scheduler territory, not GC. This is the production pattern reported by low-latency .NET shops [REF-18][REF-19] and the target state for FablePool's later milestones.

The qualitative gap between B and C (*fewer, smaller pauses* vs *no pauses*) is the central empirical finding of this survey.

## 6.6 Section conclusions

1. Measure pauses as **SuspendEE→RestartEE windows from runtime events**, percentile-ranked via HdrHistogram, with coordinated omission handled; everything else is folklore.
2. Pause cost follows **survivors and live set, not allocation volume**, so the two levers are (a) allocate less, (b) keep what survives small and stable.
3. Ephemeral pauses on healthy modern .NET are **sub-millisecond typical with multi-ms tails**; blocking gen2 on real live sets is **tens-to-hundreds of ms** and must be architecturally excluded from trading hours, not merely made rare.
4. Tuning settings (Profile B) buys roughly an order of magnitude; **only allocation discipline (Profile C) buys silence**. This is the quantitative justification for the mitigation hierarchy in §8.
5. All indicative figures here require **site-specific qualification** with the harness above on production-identical hardware; that run is scoped into FablePool Milestone 2.

---

*Citations: [REF-02] .NET GC fundamentals docs; [REF-04] Stephen Toub performance retrospectives; [REF-05] Kokosa, "Pro .NET Memory Management"; [REF-11] Stephens, GC design notes; [REF-15] PerfView/TraceEvent GCStats documentation; [REF-16] Tene, "How NOT to Measure Latency"; [REF-17] BenchmarkDotNet/MemoryDiagnoser methodology; [REF-18][REF-19] practitioner reports from low-latency .NET shops. Full annotations in §9.*