Why C# multidimensional arrays are slow

By | 2021-10-07

In a previous blog post I detailed how you can speed up array access in certain cases. This time I’m taking a look at what is happening when we access arrays in .Net, and digging into why multidimensional arrays are slow.

First a quick refresher on array types.

Array

This is just a standard array. But in the context of these benchmarks “Singledimensional” uses index calculation. See further down for details, and benchmark comparing plain array with

Multidimensional array

Multidimensional arrays have 2 or more dimensions. For simplicity the picture shows only two dimensions, add one more and you get a cube. We can access the array by specifying row and column. This type of array is constrained to a fixed size for each dimension.

This is all just syntactic sugar. Under the hood it is stored sequentially as a single array. If you ever worked with raw image data you will probably have done this exercise many times: y * width + x. x is the first dimension (columns) and y is the second dimension (rows).

Jagged array

Jagged arrays are arrays of arrays. Think List<int[]>, except it is an array and not a List<>. In other words: It is an array that contains references to other array objects. An advantage with this approach is that each array can have any size, it is not restricted to uniform sizes like multidimensional arrays are. The drawback is that the arrays are actually memory pointers to array objects, so memory can be scattered and accessing it will require an additional lookup of array memory address.

Scattered memory may seem like a small thing, but depending on use it can have noticeable impact on performance. If you are sequentially accessing large amounts of data the CPU prefetcher will help you fetch memory before you need it. In the case of non-sequential memory layout, as can be the case with jagged arrays, the prefetcher may fail to guess the correct memory to fetch. See Latency Numbers Every Programmer Should Know for some hints on what the implications might be.

Benchmark

When it comes to CPU and memory, the answer to “which is faster” is often “it depends”. For this test I am using 32KB array which fits into my L1 cache (AMD x5950 with 64KB L1 per core, 512KB L2 per core, 64MB L3 cache shared), and access it sequentially 100 000 000 times in a loop. This should eliminate latency from memory as a factor and exaggerate the differences in access times. The operating system still moves execution between cores to distribute heat in the CPU (requiring L2/L3 fetch), but the effect from that should be negligible.

To access multidimensional data we need to calculate the offset or in case of jagged array, look up the array address. To calculate the offset of a two dimensional array we can use the formula: y * width + x.

To compare the real alternatives for multidimensional array I perform this calculation on “Singledimensional”. For reference I first ran it with “PlainArray” (standard array access).

MethodMeanErrorStdDevMedianRatioAllocated
PlainArray43.42 ms0.007 ms0.027 ms43.41 ms1.0040 B
Singledimensional72.79 ms0.003 ms0.012 ms72.78 ms1.6869 B
Multidimensional95.07 ms0.004 ms0.018 ms95.07 ms2.1980 B
Jagged82.95 ms0.011 ms0.047 ms82.95 ms1.9169 B

The difference in accessing multidimensional arrays is noticeable, up and above twice as slow in some cases. But we are not really interested in PlainArray, so we run again without it.

MethodMeanErrorStdDevMedianRatioAllocated
Singledimensional72.81 ms0.007 ms0.029 ms72.81 ms1.0069 B
Multidimensional95.23 ms0.039 ms0.163 ms95.15 ms1.3180 B
Jagged83.04 ms0.025 ms0.103 ms83.02 ms1.1469 B

As expected the jagged array is (~15%) slower than a single dimensional array. This is because of the overhead in additional address lookup to access the arrays. Also where the two other array types has sequential memory layout, the jagged array can have more scattered memory since objects are not guaranteed to be kept in sequence.

Finding that the multidimensional array is ~30% slower than single dimensional array is however a bit puzzling. The single dimension array test already contains calculations for accessing multidimensional data (y*width+x), so they should perform approximately* at the same speed. (*=we only check total bounds, not row and column separately.)

Digging into the code

The memory dump and assembly code that follows is compiled in .Net Core 5.0.921.35908 for x64.

Anatomy of an array

.Net objects in 64-bit has a 4 byte alignment padding, 4 byte object header, 8 byte method table and 8 bytes internal data. So any object takes up a minimum of 24 bytes. For arrays the internal data field is used to store array length.

4 bytes
Padding
4 bytes
Header
8 bytes
Method table reference
8 bytes
Array size
Element
1
Element
2
Element
3
Element

Knowing this we can look the memory of an array and have a look. We pin the array, get a reference to its first element and subtract

Single dimensional array access

Looking at the assembly code generated for array access in .Net we see that first bounds is checked (L0008, L000b with exception on L00019). Object references in .Net are to the method table address, so array size would be at position +8.

The important part here is L0008 which skips 8 bytes into array object and multiplies index (edx) with 4 (size of int) to find position in array.

Jagged array access

Looking at the jagged array we see a similar pattern with bounds check (L0008, L000b), get array item (L00010) which is an array, then check next dimension bounds on this new array (L0015, L0019) before data is returned (L001e). So the overhead of jagged arrays in terms of opcodes is essentially an additional address lookup.

Multidimensional arrays

Which brings us to the much slower multidimensional array.

We would expect it to be single dimensional array with the addition of index calculation (y*width+x for two dimensions). So assembly code would look something* like this. This code is close to what was tested for single dimensional array in the benchmark. (*Still only one bounds check.)

Before we jump into the assembler code, lets have a look at the memory layout of multidimensional array. Here the header before actual data is longer. We add 8 bytes + 8 bytes for each dimension. So two dimensions = 16+8 bytes extra. Each dimension has 4 bytes size and 4 bytes lower bound.

4 bytes
Padding
4 bytes
Header
8 bytes
Method table reference
8 bytes
Array size
4 byte
Dim 1 size
4 byte
Dim 2 size
4 byte
Dim 1 lbound
4 byte
Dim 2 lbound
Element
1
Element
2
Element
3
Element

With some modifications of the code we can look at memory of multidimensional arrays as well.

Looking at the assembly code, edx is first dimension and r8d (later rcx) second dimension.

Even though C# always has 0 as lower bound for arrays, CLI supports other lower bounds. In Visual Basic for example you can specify lower bound 1 by “Option Base 1”. The assembly code shows this being calculated and checked on for every dimension.

Conclusion

For standard arrays (single dimension) C#/CLI has an optimized object that only checks if the index you are accessing are within the length of the array.

For multidimensional arrays there is no such optimization, and therefore it will also check lower bound of each dimension.

My singledimensional test has the added advantage of not checking if all dimensions are out of bounds, only the total length of the array is checked.

So to be fast and somewhat safe, consider going for jagged array. If you feel comfortable with spinning your own, do that.

Leave a Reply

Your email address will not be published. Required fields are marked *