Faster C# array access

2020-06-01

In .NET, memory access is safe by default. This means that every time you access an array element, the index is checked against the length of the array. In this post I explore how you can speed up plain old array access by allowing the compiler to prove that you will do no harm.

I compile the C# code down to x64 native code and analyze it, then run some benchmarks to see whether my assumptions are correct.

Bounds checking

To demonstrate the bounds check, let's start off with a simple application that iterates over an array:
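The original listing is not preserved here; a minimal sketch of such a loop, summing an array through a public field (the field name PublicArray comes from later in the post; the class and method names are my assumptions):

```csharp
public class ArraySum
{
    // Public field: the JIT cannot assume it stays the same between iterations.
    public int[] PublicArray = new int[1024];

    public int SumPublic()
    {
        int sum = 0;
        for (int i = 0; i < PublicArray.Length; i++)
            sum += PublicArray[i]; // index checked against the length on every access
        return sum;
    }
}
```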

The x64 JIT Asm output of this is:

You don't need to understand assembly code; I will explain.

At L0013 and L0017 there is a bounds check inside the loop. It is done by comparing the index we are accessing against the size of the array. In 64-bit mode the length of an array is stored 8 bytes into the object's memory, right after the Method Table pointer, so at pointer address + 8, or r8+8 here.

Proof

A cool thing about compilers is that they can optimize code heavily. But to apply an optimization, they need to be able to prove that it does not change the program's observable behavior.

So what if the compiler was able to prove that we will never go outside the bounds of the array? Then it could skip the bounds check. Let's try.

First off, an array can't be resized, and the for-loop condition “i < PublicArray.Length” already guarantees that we won't exceed the bounds of the array. The problem is that we are using a shared (public) field rather than a local variable, so it could be replaced at any point (by our own code, other threads, or callbacks). Therefore the bounds check has to be performed on every access to the array.

If we make a local copy of the PublicArray reference, the compiler should be able to prove that we will never exceed the bounds of the array, because the local reference cannot be replaced mid-loop.
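A sketch of that change, assuming the public int[] field PublicArray from the post (the method name is mine):

```csharp
public int SumLocalCopy()
{
    int[] local = PublicArray; // local copy of the reference; cannot be replaced mid-loop
    int sum = 0;
    for (int i = 0; i < local.Length; i++)
        sum += local[i]; // the JIT can now elide the bounds check
    return sum;
}
```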

Gives us the X64 Jit ASM output:

Success. Immediately we can see that we went from 18 to 12 instructions. The compiler was able to prove that we will not exceed the bounds of this array, which allowed it to skip the bounds checks.

Fixed raw pointer

The .NET garbage collector can compact the heap, moving objects around in memory; when it does, it updates every reference to the moved objects. This means a managed reference is not guaranteed to point at a stable address.

The alternative is to pin the object at a fixed address and access it through a direct pointer. This makes memory access faster, but it has a setup cost and it is unsafe: you can read or write outside the bounds of the array, which is often very, very bad.
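A sketch of pinned access, again assuming the public int[] field PublicArray (the method name is mine, and the project must allow unsafe code):

```csharp
public unsafe int SumFixed()
{
    int sum = 0;
    fixed (int* p = PublicArray) // pins the array so the GC cannot move it
    {
        int length = PublicArray.Length;
        for (int i = 0; i < length; i++)
            sum += p[i]; // raw pointer access: no bounds check
    }
    return sum;
}
```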

Which gives us x64 JIT Asm:

Using fixed to get an unsafe pointer has a higher setup cost, but accessing the pointer directly in unsafe mode removes the bounds check. There are also fewer calculations, as the index is simply an offset from the pointer address.

Benchmark

So let's run these through some benchmarks.

We test two different array sizes: a small one that fits inside the L1 cache, and a larger one that effectively purges the cache.
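The benchmark setup is not shown in this copy; a sketch of how the variants could be compared with BenchmarkDotNet (type and method names are my assumptions, and a byte element type is assumed to match the 1 KB / 1 GB sizes):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ArrayAccessBenchmark
{
    [Params(1024, 1024 * 1024 * 1024)] // 1 KB and 1 GB
    public int Size;

    public byte[] PublicArray;

    [GlobalSetup]
    public void Setup() => PublicArray = new byte[Size];

    [Benchmark(Baseline = true)]
    public int Public()
    {
        int sum = 0;
        for (int i = 0; i < PublicArray.Length; i++) sum += PublicArray[i];
        return sum;
    }

    [Benchmark]
    public int LocalCopy()
    {
        byte[] local = PublicArray;
        int sum = 0;
        for (int i = 0; i < local.Length; i++) sum += local[i];
        return sum;
    }

    [Benchmark]
    public unsafe int FixedPointer()
    {
        int sum = 0;
        fixed (byte* p = PublicArray)
            for (int i = 0; i < Size; i++) sum += p[i];
        return sum;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ArrayAccessBenchmark>();
}
```

The unsafe variant requires `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file.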

1024 bytes (1KB)

Using a local copy of the variable gives a 34% performance gain.

Setting up a fixed pointer for unsafe access has some overhead, so the performance gain on such a small array is 21%.

1024 MB (1GB)

On a larger array the overhead of setting up the fixed pointer is negligible, and performance is slightly better than with a local copy.

There is some overhead in accessing main memory, so the relative percentage is lower.

Unsafe array: 20% performance gain.
Local copy array: 18% performance gain.

Foreach

So far we have used for. Surely foreach works differently, with iterators and all that? Let's look at the same method using foreach:
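A sketch of the foreach version over the local copy, assuming the public int[] field PublicArray (the method name is mine):

```csharp
public int SumForeachLocal()
{
    int[] local = PublicArray;
    int sum = 0;
    foreach (int value in local) // foreach over an array uses no enumerator object
        sum += value;
    return sum;
}
```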

It compiles to exactly the same assembly code as its for counterpart.

We see the same with foreach over the public array: it compiles to the same assembly code as its for counterpart, and hence keeps the bounds checks.

Custom for-range

If you do not use the .Length property of the array you are looping over, the bounds check will remain.
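A sketch of such a loop with a hard-coded bound (the constant 100 is arbitrary; PublicArray is the public int[] field from the post):

```csharp
public int SumFirst100()
{
    int[] local = PublicArray;
    int sum = 0;
    for (int i = 0; i < 100; i++)   // bound is not local.Length...
        sum += local[i];            // ...so the bounds check stays
    return sum;
}
```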

Compiles to:

One way to bypass this is to use Span&lt;T&gt;. A Span&lt;T&gt; has the same constraints for optimization and would normally also be bounds-checked, but you can set its bounds to the desired size and have the for-loop iterate on that. The bounds then only need to be checked once, when the span is created.
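A sketch of the Span&lt;T&gt; variant (the length 100 is arbitrary; the method name is mine):

```csharp
public int SumFirst100Span()
{
    Span<int> span = PublicArray.AsSpan(0, 100); // range checked once, here
    int sum = 0;
    for (int i = 0; i < span.Length; i++)
        sum += span[i]; // loop is capped at span.Length, so no per-access check
    return sum;
}
```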

Compiles to:

Conclusion

In certain cases the compiler can remove the bounds checks when accessing arrays, which gives roughly an 18%-34% performance increase in accessing the array itself.

Condition: you access a local variable (pointing to the array) in a for-loop whose condition is capped at the array's length.

You can create these conditions using Span<T>.

So this:
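A sketch of the faster form (local reference, loop capped at its Length):

```csharp
int[] local = PublicArray;
int sum = 0;
for (int i = 0; i < local.Length; i++)
    sum += local[i];
```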

Is 18%-34% faster than this:
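A sketch of the slower form (every access goes through the public field and is bounds-checked):

```csharp
int sum = 0;
for (int i = 0; i < PublicArray.Length; i++)
    sum += PublicArray[i];
```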

On larger arrays accessing memory directly as pointer is about 20% faster.

Accessing single indexes

If we access the array by individual indexes, the check is back.
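A sketch of such access (the indexes 10 and 20 are arbitrary; PublicArray is the public int[] field from the post):

```csharp
public int SumTwoIndexes()
{
    int[] local = PublicArray;
    return local[10] + local[20]; // each access gets its own bounds check
}
```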

Gives us two bounds checks:

But if we flip them around and access the highest index first, the compiler understands that there is no need to check the lower ones.
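A sketch of the flipped version (same arbitrary indexes):

```csharp
public int SumTwoIndexesFlipped()
{
    int[] local = PublicArray;
    return local[20] + local[10]; // a successful check at 20 proves 10 is in range
}
```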

Results in only one bounds check:
