Zero-Allocation C#: Span<T>, ArrayPool, and SIMD in Glacier Engines

Ian Cowley open-sourced Glacier.Polaris, Glacier.Grep, and Glacier.DocTree, achieving near-zero GC pressure by using structs, Span<T>, ArrayPool, and SIMD. This guide details the techniques behind processing millions of rows per second in C#.

3 min read · Jun 15, 2026

The Problem: GC Kills Throughput

Ian Cowley open-sourced three high-performance C# engines: Glacier.Polaris (DataFrame library), Glacier.Grep (text searcher), and Glacier.DocTree (semantic Markdown parser). All target zero-allocation hot paths. The core insight: every new on a reference type allocates on the managed heap, and enough of those allocations eventually trigger a garbage collection. For engines processing millions of rows per second, a GC pause in the hot path stalls the whole pipeline.

Level 1: Structs Over Classes

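The stack is thread-local, unwinds instantly, and never involves the GC. A struct (value type) is stored inline: a local lives on the stack, and an array of structs is one contiguous block rather than an array of pointers to separate heap objects. That removes per-item heap allocations, provided the struct is not boxed by casting it to object or to an interface. [StructLayout(LayoutKind.Sequential)], which the C# compiler already applies to structs by default, keeps the fields of a plain-data struct in declaration order. It does not align anything to cache lines by itself, but small structs stored back to back mean each 64-byte cache line carries several items, which reduces cache misses.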
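As a minimal sketch of the struct-per-row idea, the example below stores rows inline in a single array and walks them by reference. The Tick type and the TickMath helper are hypothetical names made up for illustration; they are not taken from the Glacier source.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical row type for illustration only.
// Two 8-byte fields, so each Tick occupies 16 bytes and four fit in a 64-byte cache line.
[StructLayout(LayoutKind.Sequential)]
public readonly struct Tick
{
    public readonly long Timestamp;
    public readonly double Price;

    public Tick(long timestamp, double price)
    {
        Timestamp = timestamp;
        Price = price;
    }
}

public static class TickMath
{
    // The rows live inline in the caller's buffer; iterating by reference
    // avoids both per-row heap objects and per-row copies.
    public static double SumPrices(ReadOnlySpan<Tick> ticks)
    {
        double total = 0;
        foreach (ref readonly Tick t in ticks)
            total += t.Price;
        return total;
    }
}
```

Calling TickMath.SumPrices(new Tick[1000]) involves exactly one heap allocation (the array itself), however many rows it holds.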

Level 2: Span<T> and Memory<T> for Zero-Allocation Slicing

Traditional Substring() or Skip().Take() allocate new heap objects. Glacier engines use ReadOnlySpan<T> to slice buffers without copying. A Span<T> is a ref struct that can live only on the stack and holds a reference to the first element plus a length. Example:

// Zero-allocation slice
ReadOnlySpan<char> lineSpan = "Error: Connection Timeout".AsSpan();
ReadOnlySpan<char> messageSpan = lineSpan.Slice(7); // "Connection Timeout", no allocation
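The same slicing idea extends to parsing. The sketch below sums comma-separated integers by repeatedly slicing one span instead of calling Split() or Substring(). CsvSum and SumFields are hypothetical names for illustration, and the code assumes every field is a valid integer (an empty field makes int.Parse throw a FormatException).

```csharp
using System;
using System.Globalization;

public static class CsvSum
{
    // Sums comma-separated integers without creating a substring per field.
    public static int SumFields(ReadOnlySpan<char> line)
    {
        int total = 0;
        while (!line.IsEmpty)
        {
            int comma = line.IndexOf(',');
            // Slice out the current field; this only adjusts a reference and a length.
            ReadOnlySpan<char> field = comma < 0 ? line : line.Slice(0, comma);
            total += int.Parse(field, NumberStyles.Integer, CultureInfo.InvariantCulture);
            // Advance past the comma, or finish when no comma is left.
            line = comma < 0 ? ReadOnlySpan<char>.Empty : line.Slice(comma + 1);
        }
        return total;
    }
}
```

For example, CsvSum.SumFields("10,20,30".AsSpan()) returns 60 and never materializes "10", "20", or "30" as strings.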

Because Span<T> can't cross await boundaries, Memory<T> is used in async pipelines. Once synchronous processing resumes, call .Span to get a view over the same memory without copying.
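A minimal sketch of that hand-off, assuming a caller-supplied buffer: the Memory<byte> is held across the await, and a span is created only inside the synchronous helper. AsyncCounter and its methods are hypothetical names for illustration, not Glacier APIs.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncCounter
{
    // Memory<byte> is an ordinary struct, so it may be held across an await.
    public static async Task<int> CountNewlinesAsync(Stream stream, Memory<byte> buffer)
    {
        int count = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer)) > 0)
        {
            // The span exists only for the duration of this synchronous call.
            count += CountNewlines(buffer.Span.Slice(0, read));
        }
        return count;
    }

    // Synchronous hot path: works on a span view of the same buffer, no copy.
    private static int CountNewlines(ReadOnlySpan<byte> chunk)
    {
        int n = 0;
        foreach (byte b in chunk)
        {
            if (b == (byte)'\n') n++;
        }
        return n;
    }
}
```

The caller decides where the buffer comes from (a plain array, a pooled array, or native memory), which keeps the counting logic independent of the allocation strategy.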

Level 3: ArrayPool for Temporary Buffers

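Allocating arrays in a loop creates garbage on every iteration, and an array of 85,000 bytes or more lands on the Large Object Heap, which is collected only with Gen 2. Glacier.Polaris uses System.Buffers.ArrayPool<T>.Shared to rent and return buffers: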

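using System;
using System.Buffers;

// Rent may return an array longer than requested, so slice to the size you need.
int[] buffer = ArrayPool<int>.Shared.Rent(100000);
try
{
    Span<int> workSpan = buffer.AsSpan(0, 100000);
    // process (rented arrays are not cleared, so write before you read)
}
finally
{
    // Always return the array, even if processing throws.
    ArrayPool<int>.Shared.Return(buffer);
}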

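After the pool has warmed up, renting reuses an existing array instead of allocating a new one, so the hot loop adds no GC pressure. Stop using the buffer once it is returned, because the next caller may receive the same array.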
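To show the rent, slice, return pattern inside a reusable method, here is a small sketch that computes a median using pooled scratch space, so the caller's data is left untouched. PooledMath.Median is a hypothetical helper for illustration; for even lengths it returns the upper of the two middle values.

```csharp
using System;
using System.Buffers;

public static class PooledMath
{
    // Sorts a pooled copy of the input rather than allocating a fresh array per call.
    public static int Median(ReadOnlySpan<int> values)
    {
        if (values.IsEmpty)
            throw new ArgumentException("values must not be empty", nameof(values));

        int[] rented = ArrayPool<int>.Shared.Rent(values.Length);
        try
        {
            // The rented array may be longer than requested; use only the prefix.
            Span<int> scratch = rented.AsSpan(0, values.Length);
            values.CopyTo(scratch);
            scratch.Sort();
            return scratch[scratch.Length / 2];
        }
        finally
        {
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

The try/finally matters here: without it, an exception during the copy or sort would leave the array out of the pool, and later calls would fall back to fresh allocations.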

Level 4: SIMD via Vector256 and MemoryMarshal

Once memory is flat and contiguous, you can use CPU vector instructions. Modern .NET (7 and later) provides cross-platform Vector256<T> operations without platform-specific intrinsics. Glacier.Polaris uses MemoryMarshal.GetReference to get an unpinned reference and feeds it into SIMD loops:

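using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static int SimdSum(ReadOnlySpan<int> data)
{
    int sum = 0; // wraps on overflow, like an unchecked scalar loop
    int i = 0;
    ref int current = ref MemoryMarshal.GetReference(data);
    if (Vector256.IsHardwareAccelerated && data.Length >= Vector256<int>.Count)
    {
        Vector256<int> vSum = Vector256<int>.Zero;
        // Main loop: load and add one full vector of ints per iteration.
        for (; i <= data.Length - Vector256<int>.Count; i += Vector256<int>.Count)
        {
            Vector256<int> vData = Vector256.LoadUnsafe(ref current, (nuint)i);
            vSum += vData;
        }
        sum += Vector256.Sum(vSum); // horizontal add across the lanes
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < data.Length; i++)
        sum += Unsafe.Add(ref current, i);
    return sum;
}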

Each vector add processes eight 32-bit integers (256 / 32) on hardware with 256-bit vectors.
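If the vector width should not be fixed at 256 bits, a related option is System.Numerics.Vector<T>, whose lane count matches whatever the hardware offers. The sketch below is a swapped-in alternative for comparison, not the approach the Glacier code is described as using, and PortableSimd is a hypothetical name. It also stays within bounds-checked span APIs.

```csharp
using System;
using System.Numerics;

public static class PortableSimd
{
    // Sums ints using the widest vector the runtime reports, then a scalar tail.
    public static int Sum(ReadOnlySpan<int> data)
    {
        int i = 0;
        int sum = 0;
        if (Vector.IsHardwareAccelerated && data.Length >= Vector<int>.Count)
        {
            Vector<int> acc = Vector<int>.Zero;
            int lastBlock = data.Length - Vector<int>.Count;
            for (; i <= lastBlock; i += Vector<int>.Count)
            {
                // The constructor reads exactly Vector<int>.Count elements from the slice.
                acc += new Vector<int>(data.Slice(i, Vector<int>.Count));
            }
            sum = Vector.Sum(acc); // horizontal add across the lanes
        }
        // Scalar tail for the leftover elements.
        for (; i < data.Length; i++)
            sum += data[i];
        return sum;
    }
}
```

The trade-off is control versus portability: Vector256<T> pins the lane count at eight ints, while Vector<T> lets the same code use 128-bit or wider registers depending on the machine.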

Level 5: Benchmarks

Cowley shared BenchmarkDotNet results:

Method	Mean	Allocated
Standard Substring	18.45 ns	32 B
Glacier Span Slice	0.02 ns	0 B

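The 0 B allocation column is the goal. A mean of 0.02 ns is below a single clock cycle, so read it as too cheap to measure rather than as an exact speedup.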
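The Allocated column can be spot-checked without a benchmark harness. The sketch below is a rough probe built on GC.GetAllocatedBytesForCurrentThread and is no substitute for BenchmarkDotNet. AllocationProbe and its methods are hypothetical names for illustration, and no expected byte counts are given because they vary by runtime.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the managed bytes allocated on this thread while running the action once.
    public static long Measure(Action action)
    {
        action(); // warm-up, so one-time JIT and initialization costs are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Allocates a new string for the tail of the line.
    public static int SubstringLength(string line) => line.Substring(7).Length;

    // Produces a view over the same characters; no new string is created.
    public static int SliceLength(string line) => line.AsSpan().Slice(7).Length;
}
```

Wrapping each variant in a lambda and passing it to Measure (for instance with the line "Error: Connection Timeout") should report a nonzero figure for the Substring path and, on current runtimes, zero for the slice path.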

Conclusion

Identify hot paths where data flows by the gigabyte. Replace class with struct, slice with Span<T>, rent from ArrayPool<T>, and use SIMD via Vector256<T>. The Glacier repositories on GitHub demonstrate these patterns in production-grade code. Study them, and you can build engines that push .NET to its limits.

Editor's Take

I've been writing C# for over a decade, and I've seen too many devs reach for unsafe code or C++ interop when they hit GC bottlenecks. This article shows you can stay in C#, without the unsafe keyword or native interop, and still get C-like performance. The Span<T> + ArrayPool combo is my go-to now for any data pipeline. I only wish the Glacier libraries were around when I built my last ETL system. I would have saved weeks of profiling. The SIMD section is a gem; most devs don't realize modern .NET abstracts away the hardware differences.

— DevDigest Editorial

Key Takeaways

• Replace class with struct on hot paths to avoid heap allocation and improve cache locality.
• Use Span<T> and ReadOnlySpan<T> for zero-allocation slicing of arrays and strings.
• Rent temporary buffers from ArrayPool<T>.Shared instead of allocating new arrays in loops.

Why It Matters

For C# developers building high-throughput systems (data processing, search, real-time analytics), GC pauses are a common throughput killer. These techniques (structs, Span<T>, ArrayPool, SIMD) apply to any .NET project and can bring allocation on hot paths down to zero, with reported speedups of 10-100x depending on the workload.

#performance #C# #SIMD #.NET #garbage-collection
