Faster C# array access

In .NET, memory access is safe by default. This means that every time you access an array element, the index is checked against the length of the array. In this post I explore how you can speed up plain old array access by 18%-34% by convincing the compiler it can skip some of those checks.

I compile the C# code down to x64 native code and analyze that, then run some benchmarks to see if my assumptions are correct.

Bounds checking

To demonstrate the bounds check, let's start off with a simple application that iterates over an array:
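A minimal sketch of such an application (the PublicArray field name comes from the post; the method shape is assumed):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    // Sums the array through the public field. The JIT must bounds-check
    // every element access, because PublicArray could be replaced mid-loop.
    public int Sum()
    {
        int sum = 0;
        for (int i = 0; i < PublicArray.Length; i++)
            sum += PublicArray[i];
        return sum;
    }
}
```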

The x64 JIT Asm output of this is:
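A representative shape of what the JIT emits for such a loop (labels, registers and offsets are illustrative and vary by runtime version):

```asm
; illustrative only - actual output differs between JIT versions
L000d: mov   r8, gword ptr [rdx+8]        ; (re)load PublicArray reference
L0013: cmp   ecx, dword ptr [r8+8]        ; bounds check: index vs length at +8
L0017: jae   short L002e                  ; index out of range -> throw
       mov   eax, dword ptr [r8+rcx*4+16] ; load element i
       add   r9d, eax                     ; sum += element
       inc   ecx                          ; i++
       cmp   ecx, dword ptr [r8+8]        ; loop condition re-reads Length
       jl    short L000d
       ...
L002e: call  CORINFO_HELP_RNGCHKFAIL      ; throws IndexOutOfRangeException
```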

You don't need to be able to read assembly code; I will explain.

At L0013 and L0017 there is a bounds check inside the loop. It is done by comparing the index we are accessing to the length of the array. In 64-bit mode the length of an array is stored 8 bytes into the object, right after the method table pointer, so at the object address + 8, here r8+8.


A cool thing about compilers is how aggressively they can optimize code. But to apply an optimization, they need to be able to prove it has no observable side effects.

So what if the compiler were able to prove that we will never go outside the bounds of the array? Then it could skip the bounds check. Let's try.

First off, an array can't be resized. And we already guarantee that we won't exceed the bounds of the array with the for-loop condition "i < PublicArray.Length". The problem is that we are using a shared (public) field rather than a local variable, so it could be replaced at any point (by our own code, other threads, or callbacks). Therefore a bounds check has to be performed on every access to the array.

If we make a local copy of PublicArray, the compiler should be able to prove that we will never exceed the bounds of the array, since the local reference can't be replaced.
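A sketch of the local-copy version (names assumed):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int SumLocalCopy()
    {
        // Copy the reference (not the data) to a local variable. The JIT can
        // now prove i never exceeds localArray.Length and drop the checks.
        int[] localArray = PublicArray;
        int sum = 0;
        for (int i = 0; i < localArray.Length; i++)
            sum += localArray[i];
        return sum;
    }
}
```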

This gives us the x64 JIT asm output:

Success! We immediately went from 18 instructions to 12. The compiler was able to prove that we will not exceed the bounds of this array, which allowed it to skip the bounds checks.

Fixed raw pointer

There is another way to ensure no bounds check is made, and that is by using a fixed statement and doing pointer arithmetic. This is unsafe, because you can read/write outside the bounds of the array. Fixed statements have a scope; you can use GCHandle to create a long-lived pin. But note that pinned objects have a cost for the garbage collector unless you are using the Pinned Object Heap in .NET 5 or later.
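A sketch of the fixed-pointer version (assuming the same PublicArray field; this requires compiling with unsafe code enabled):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    // Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project.
    public unsafe int SumFixed()
    {
        int sum = 0;
        // Pin the array so the GC can't move it while we hold a raw pointer.
        fixed (int* ptr = PublicArray)
        {
            for (int i = 0; i < PublicArray.Length; i++)
                sum += ptr[i];   // raw pointer arithmetic: no bounds check
        }
        return sum;
    }
}
```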

Note that we are still using PublicArray.Length here. We'll get back to optimizing that in the section on Span<T>.

Which gives us x64 JIT Asm:


So let's run these through some benchmarks.

We test two different array sizes, one small that fits inside L1 cache and one larger that effectively purges the cache.

1024 bytes (1KB)

Using a local copy of the variable gives a 34% performance gain.

1024 MB (1GB)

On a larger array we can see that the overhead of setting up the fixed pointer is negligible, and performance is slightly better than with a local copy.

There is some overhead in accessing main memory, so the relative gain is lower.

Unsafe array: 20% performance gain.
Local copy array: 18% performance gain.


Foreach

So far we have used for. foreach is often assumed to work differently, going through an enumerator. Let's look at the same method using foreach:
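A sketch of the foreach version over the local copy (names assumed):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int SumForeach()
    {
        int[] localArray = PublicArray;
        int sum = 0;
        // The JIT lowers foreach over an array to a plain index loop,
        // so no enumerator object is involved.
        foreach (int value in localArray)
            sum += value;
        return sum;
    }
}
```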

This compiles to assembly code that is 100% identical to its for counterpart.

We see the same with foreach over the public array: it compiles to the same assembly code as its for counterpart and hence keeps the bounds checks.


Span<T>

Span<T> is a structure that can only live on the stack. It can be used to access both managed and unmanaged memory.

If you do not use the .Length property of the array you are looping over, the bounds check will be present.
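For example, a loop bounded by a separate count (a sketch; names assumed):

```csharp
public class ArrayDemo
{
    // The loop bound is an arbitrary count, not array.Length, so the JIT
    // cannot prove the accesses are in range and keeps the bounds check.
    public static int SumFirst(int[] array, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += array[i];
        return sum;
    }
}
```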

Compiles to:

One way to get around this is to use Span<T>. Span<T> has the same constraints for optimization and would normally keep the bounds check, but you can slice it to the desired size and have the for-loop iterate on that. In that case the bounds only need to be checked once, when the slice is created.
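A sketch of the sliced version (names assumed):

```csharp
using System;

public class ArrayDemo
{
    // AsSpan(0, count) performs the range check once, up front; inside the
    // loop the JIT can prove i < span.Length, so per-element checks go away.
    public static int SumFirst(int[] array, int count)
    {
        Span<int> span = array.AsSpan(0, count);
        int sum = 0;
        for (int i = 0; i < span.Length; i++)
            sum += span[i];
        return sum;
    }
}
```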

Compiles to:


In certain cases the compiler can remove the bounds checks when accessing arrays. This gives around an 18%-34% performance increase in accessing the array itself. Alternatively, you can do unsafe pointer arithmetic using a fixed statement and achieve the same performance.

Condition: you access a local variable (pointing to the array) using a for-statement that caps at the array's length.

Bonus: You can create these conditions using Span<T>.

So this:
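A sketch of the fast pattern (consistent with the local-copy example above; names assumed):

```csharp
public class Fast
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int Sum()
    {
        int[] localArray = PublicArray;   // local copy of the reference
        int sum = 0;
        for (int i = 0; i < localArray.Length; i++)
            sum += localArray[i];         // bounds check eliminated
        return sum;
    }
}
```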

Is 18%-34% faster than this:
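The slower baseline, iterating the public field directly (again a sketch; names assumed):

```csharp
public class Slow
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int Sum()
    {
        int sum = 0;
        for (int i = 0; i < PublicArray.Length; i++)
            sum += PublicArray[i];        // bounds check on every access
        return sum;
    }
}
```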

On larger arrays, accessing memory directly through a pointer is about 20% faster.

Accessing single indexes

If we try to access the array by an index, the check is back.
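For example, reading two elements in ascending index order (a sketch; names assumed):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int ReadTwo()
    {
        int[] localArray = PublicArray;
        // Ascending order: proving index 0 is valid says nothing about
        // index 1, so both accesses are bounds-checked.
        return localArray[0] + localArray[1];
    }
}
```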

Gives us two bounds checks:

But if we flip them around and access the highest index first, the compiler understands that there is no need to check the lower indexes.
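A sketch of the flipped version (names assumed):

```csharp
public class ArrayDemo
{
    public int[] PublicArray = new int[] { 1, 2, 3, 4 };

    public int ReadTwoFlipped()
    {
        int[] localArray = PublicArray;
        // Checking the highest index first proves the array has at least
        // two elements, so the access to index 0 needs no further check.
        return localArray[1] + localArray[0];
    }
}
```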

Results in only one bounds check:

2 thoughts on “Faster C# array access”

  1. Great article.

    Really good.


    “In certain cases the compiler can remove the bounds checks when accessing arrays. This leads to around 18%-34% performance increase in accessing the array itself.”

    Would it be possible to include in the final performance increase % the final net negative impact of the increase in Garbage Collection required when creating a local copy of existing public arrays, both large and small?

    James Craig

    • Hi. Thank you. 🙂 The local copy is only of the array reference, not the array itself. The reference is copied onto the stack (scoped within the method) and occupies 4 bytes (32-bit) or 8 bytes (64-bit). GC is not involved.

