In a previous blog post I detailed how you can speed up array access in certain cases. This time I’m taking a look at what is happening when we access arrays in .Net, and digging into why multidimensional arrays are slow.
First a quick refresher on array types.
Array

1 |
int[] sd_array = new int[10]; |
This is just a standard array. But in the context of these benchmarks “Singledimensional” uses index calculation. See further down for details, and benchmark comparing plain array with
Multidimensional array

1 |
int[,] md_array = new int[3,10]; |
Multidimensional arrays have 2 or more dimensions. For simplicity the picture shows only two dimensions, add one more and you get a cube. We can access the array by specifying row and column. This type of array is constrained to a fixed size for each dimension.
This is all just syntactic sugar. Under the hood it is stored sequentially as a single array. If you ever worked with raw image data you will probably have done this exercise many times: y * width + x. x is the first dimension (columns) and y is the second dimension (rows).
1 2 3 4 5 |
var md = new int[3,10]; // 3 rows, 10 columns var sd = new int[30]; // 30 columns _ = md[2, 4]; // Access row 2, column 4 (third row, fifth column) _ = sd[2 * 10 + 4]; // Same |
Jagged array

1 2 3 |
int[][] jagged_array = new int[3][]; for (var i = 0; i < 3; i++) jagged_array[i] = new int[10]; |
Jagged arrays are arrays of arrays. Think List<int[]>, except it is an array and not a List<>. In other words: It is an array that contains references to other array objects. An advantage with this approach is that each array can have any size, it is not restricted to uniform sizes like multidimensional arrays are. The drawback is that the arrays are actually memory pointers to array objects, so memory can be scattered and accessing it will require an additional lookup of array memory address.
Scattered memory may seem like a small thing, but depending on use it can have noticeable impact on performance. If you are sequentially accessing large amounts of data the CPU prefetcher will help you fetch memory before you need it. In the case of non-sequential memory layout, as can be the case with jagged arrays, the prefetcher may fail to guess the correct memory to fetch. See Latency Numbers Every Programmer Should Know for some hints on what the implications might be.
Benchmark
When it comes to CPU and memory, the answer to “which is faster” is often “it depends”. For this test I am using 32KB array which fits into my L1 cache (AMD x5950 with 64KB L1 per core, 512KB L2 per core, 64MB L3 cache shared), and access it sequentially 100 000 000 times in a loop. This should eliminate latency from memory as a factor and exaggerate the differences in access times. The operating system still moves execution between cores to distribute heat in the CPU (requiring L2/L3 fetch), but the effect from that should be negligible.
To access multidimensional data we need to calculate the offset or in case of jagged array, look up the array address. To calculate the offset of a two dimensional array we can use the formula: y * width + x.
To compare the real alternatives for multidimensional array I perform this calculation on “Singledimensional”. For reference I first ran it with “PlainArray” (standard array access).
1 2 3 4 5 6 7 8 |
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update) AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET .NET SDK=6.0.100-rc.1.21463.6 [Host] : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT 32KB : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT |
1 2 3 |
Job=32KB IterationCount=100 LaunchCount=2 WarmupCount=10 Categories=Local |
Method | Mean | Error | StdDev | Median | Ratio | Allocated |
---|---|---|---|---|---|---|
PlainArray | 43.42 ms | 0.007 ms | 0.027 ms | 43.41 ms | 1.00 | 40 B |
Singledimensional | 72.79 ms | 0.003 ms | 0.012 ms | 72.78 ms | 1.68 | 69 B |
Multidimensional | 95.07 ms | 0.004 ms | 0.018 ms | 95.07 ms | 2.19 | 80 B |
Jagged | 82.95 ms | 0.011 ms | 0.047 ms | 82.95 ms | 1.91 | 69 B |

The difference in accessing multidimensional arrays is noticeable, up and above twice as slow in some cases. But we are not really interested in PlainArray, so we run again without it.
1 2 3 4 5 6 7 8 |
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update) AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET .NET SDK=6.0.100-rc.1.21463.6 [Host] : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT 32KB : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT |
1 2 3 |
Job=32KB IterationCount=100 LaunchCount=2 WarmupCount=10 Categories=Local |
Method | Mean | Error | StdDev | Median | Ratio | Allocated |
---|---|---|---|---|---|---|
Singledimensional | 72.81 ms | 0.007 ms | 0.029 ms | 72.81 ms | 1.00 | 69 B |
Multidimensional | 95.23 ms | 0.039 ms | 0.163 ms | 95.15 ms | 1.31 | 80 B |
Jagged | 83.04 ms | 0.025 ms | 0.103 ms | 83.02 ms | 1.14 | 69 B |

As expected the jagged array is (~15%) slower than a single dimensional array. This is because of the overhead in additional address lookup to access the arrays. Also where the two other array types has sequential memory layout, the jagged array can have more scattered memory since objects are not guaranteed to be kept in sequence.
Finding that the multidimensional array is ~30% slower than single dimensional array is however a bit puzzling. The single dimension array test already contains calculations for accessing multidimensional data (y*width+x), so they should perform approximately* at the same speed. (*=we only check total bounds, not row and column separately.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
public int Singledimensional() { var n = 0; var array = _sdArray; var sum = 1; for (var i = 0; i < Iterations; i++) { sum &= array[(n >> RowSizeBits) * RowSize + (n & RowSizeBitMask)]; n = ++n & TotalLengthBitmask; } return sum; } public int Multidimensional() { var n = 0; var array = _mdArray; var sum = 1; for (var i = 0; i < Iterations; i++) { sum &= array[(n >> RowSizeBits), (n & RowSizeBitMask)]; n = ++n & TotalLengthBitmask; } return sum; } |
Digging into the code
The memory dump and assembly code that follows is compiled in .Net Core 5.0.921.35908 for x64.
Anatomy of an array
.Net objects in 64-bit has a 4 byte alignment padding, 4 byte object header, 8 byte method table and 8 bytes internal data. So any object takes up a minimum of 24 bytes. For arrays the internal data field is used to store array length.
#arrayTable1 { text-align: center; border-style: none; } #arrayTable1 td { border-width: 1px; padding: 1px; border-style: solid; border-color: black; background-color: #D3D3D3; -moz-border-radius: ; }4 bytes Padding |
4 bytes Header |
8 bytes Method table reference |
8 bytes Array size |
Element 1 |
Element 2 |
Element 3 |
Element … |
Knowing this we can look the memory of an array and have a look. We pin the array, get a reference to its first element and subtract
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
using System; using System.Runtime.InteropServices; void Main() { Console.WriteLine($"Is64BitProcess: {System.Environment.Is64BitProcess}"); DumpArray(new byte[] { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xAA }); } unsafe void DumpArray(Array array) { var pinnedArray = GCHandle.Alloc(array, GCHandleType.Pinned); var unsafeArray = (byte*)pinnedArray.AddrOfPinnedObject(); var data = new byte[sizeof(byte) * array.Length + 24]; unsafeArray -= 24; Marshal.Copy((IntPtr)unsafeArray, data, 0, data.Length); var arrayLength = BitConverter.ToInt64(data, 16); var dataHex = BitConverter.ToString(data); pinnedArray.Free(); Console.WriteLine($@"Array length: {arrayLength} Raw memory: {dataHex} Breakdown: Padding: {BitConverter.ToString(data[0..4])} Header: {BitConverter.ToString(data[4..8])} Method table ref: {BitConverter.ToString(data[8..16])} Array size: {BitConverter.ToString(data[16..24])} Array data: {BitConverter.ToString(data[24..])} "); } |
1 2 3 4 5 6 7 8 9 10 |
Is64BitProcess: True Array length: 10 Raw memory: 00-00-00-00-00-00-00-00-18-B7-42-7C-FC-7F-00-00-0A-00-00-00-00-00-00-00-11-22-33-44-55-66-77-88-99-AA Breakdown: Padding: 00-00-00-00 Header: 00-00-00-00 Method table ref: 18-B7-42-7C-FC-7F-00-00 Array size: 0A-00-00-00-00-00-00-00 Array data: 11-22-33-44-55-66-77-88-99-AA |
Single dimensional array access
Looking at the assembly code generated for array access in .Net we see that first bounds is checked (L0008, L000b with exception on L00019). Object references in .Net are to the method table address, so array size would be at position +8.
The important part here is L0008 which skips 8 bytes into array object and multiplies index (edx) with 4 (size of int) to find position in array.
1 2 3 4 5 6 7 8 9 10 |
L0000: sub rsp, 0x28 ; Reserve space on stack L0004: mov rax, [rcx+8] ; Get array from reference L0008: cmp edx, [rax+8] ; Compare index (edx) against size of array L000b: jae short L0019 ; If larger, jump to exception L000d: movsxd rdx, edx ; Copy index from 32-bit signed to 64-bit signed L0010: mov eax, [rax+rdx*4+0x10] ; Get data from array at rax+16+index*sizeof(int) L0014: add rsp, 0x28 ; Free stack space L0018: ret ; Return L0019: call 0x00007ff94aaaa370 ; Throw exception L001e: int3 ; Signal breakpoint |
Jagged array access
Looking at the jagged array we see a similar pattern with bounds check (L0008, L000b), get array item (L00010) which is an array, then check next dimension bounds on this new array (L0015, L0019) before data is returned (L001e). So the overhead of jagged arrays in terms of opcodes is essentially an additional address lookup.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
L0000: sub rsp, 0x28 ; Reserve space on stack L0004: mov rax, [rcx+0x18] ; Get array from reference L0008: cmp edx, [rax+8] ; Compare index (edx) against size of array L000b: jae short L0027 ; If larger, jump to exception L000d: movsxd rdx, edx ; Copy index from 32-bit signed to 64-bit signed L0010: mov rax, [rax+rdx*8+0x10] ; Get object reference at rax+16+index*sizeof(IntPtr) L0015: cmp r8d, [rax+8] ; Compare index (r8d) against size of array L0019: jae short L0027 ; If larger, jump to exception L001b: movsxd rdx, r8d ; Copy index from 32-bit signed to 64-bit signed L001e: mov eax, [rax+rdx*4+0x10] ; Get data from array at rax+16+index*sizeof(int) L0022: add rsp, 0x28 ; Free stack space L0026: ret ; Return L0027: call 0x00007ff94aaaa370 ; Throw exception L002c: int3 ; Signal breakpoint |
Multidimensional arrays
Which brings us to the much slower multidimensional array.
We would expect it to be single dimensional array with the addition of index calculation (y*width+x for two dimensions). So assembly code would look something* like this. This code is close to what was tested for single dimensional array in the benchmark. (*Still only one bounds check.)
1 2 3 4 5 6 7 8 9 10 11 12 13 |
L0000: sub rsp, 0x28 ; Reserve space on stack L0004: mov rax, [rcx+8] ; Get array from reference L0008: mov rcx, rax ; Make a copy L000b: imul edx, [rax+8] ; edx contains first dimension index, multiply with array size L0010: add edx, r8d ; Add second dimension index L0013: cmp edx, [rcx+8] ; Check that result is within array bounds L0016: jae short L0024 ; If larger, jump to exception L0018: movsxd rax, edx ; Copy index from 32-bit signed to 64-bit signed L001b: mov eax, [rcx+rax*4+0x10] ; Get data from array at rax+16+index*sizeof(int) L001f: add rsp, 0x28 ; Free stack space L0023: ret ; Return L0024: call 0x00007ff94aaaa370 ; Throw exception L0029: int3 ; Signal breakpoint |
Before we jump into the assembler code, lets have a look at the memory layout of multidimensional array. Here the header before actual data is longer. We add 8 bytes + 8 bytes for each dimension. So two dimensions = 16+8 bytes extra. Each dimension has 4 bytes size and 4 bytes lower bound.
#arrayTable1 { text-align: center; border-style: none; } #arrayTable1 td { border-width: 1px; padding: 1px; border-style: solid; border-color: black; background-color: #D3D3D3; -moz-border-radius: ; }4 bytes Padding |
4 bytes Header |
8 bytes Method table reference |
8 bytes Array size |
4 byte Dim 1 size |
4 byte Dim 2 size |
4 byte Dim 1 lbound |
4 byte Dim 2 lbound |
Element 1 |
Element 2 |
Element 3 |
Element … |
With some modifications of the code we can look at memory of multidimensional arrays as well.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
using System; using System.Runtime.InteropServices; unsafe void Main() { Console.WriteLine($"System.Environment.Is64BitProcess: {System.Environment.Is64BitProcess}"); Console.WriteLine($""); DumpArray(new byte[] { 1, 2, 3, 4, 5, 6 }); DumpArray(new byte[,] { { 1, 2, 3 }, { 4, 5, 6 } }); DumpArray(new byte[,,] { { { 0x10, 0x11, 0x12, 0x13 }, { 0x20, 0x21, 0x22, 0x23 }, { 0x30, 0x31, 0x32,0x33 } }, { { 0xA0, 0xA1, 0xA2, 0xA3 }, { 0xB0, 0xB1, 0xB2, 0xB3 }, { 0xC1, 0xC2, 0xC3,0xC4 } } }); } unsafe void DumpArray(Array array) { var pinnedArray = GCHandle.Alloc(array, GCHandleType.Pinned); var unsafeArray = (byte*)pinnedArray.AddrOfPinnedObject(); // Get total array size by multiplying all dimension upper bounds var size = array.GetUpperBound(0) + 1; if (array.Rank > 0) { for (var r = 1; r < array.Rank; r++) { var ub = array.GetUpperBound(r) + 1; size *= ub; } } // Header is 24 on standard array, and +(8*count) on additional dimensions var headerFwd = 24; if (array.Rank > 1) headerFwd += array.Rank * 8; var data = new byte[sizeof(byte) * size + headerFwd]; unsafeArray -= headerFwd; Marshal.Copy((IntPtr)unsafeArray, data, 0, data.Length); var arrayLength = BitConverter.ToInt64(data, 16); var dataHex = BitConverter.ToString(data); pinnedArray.Free(); Console.WriteLine($@"Array length: {arrayLength} Raw memory: {dataHex} Header size: {headerFwd} Breakdown: Padding: {BitConverter.ToString(data[0..4])} Header: {BitConverter.ToString(data[4..8])} Method table ref: {BitConverter.ToString(data[8..16])} Array size: {BitConverter.ToString(data[16..24])} ({BitConverter.ToInt64(data[16..24])})"); if (array.Rank > 1) { for (var i = 0; i < array.Rank; i++) { var pos = 24 + (i) * 4; var d = data[pos..(pos + 4)]; Console.WriteLine($"Dimension {i} len: {BitConverter.ToString(d)} ({BitConverter.ToInt32(d)})"); } for (var i = 0; i < array.Rank; i++) { var pos = 24 + (array.Rank*4) + (i) * 4; var d = data[pos..(pos + 4)]; Console.WriteLine($"Dimension {i} low: {BitConverter.ToString(d)} ({BitConverter.ToInt32(d)})"); } } Console.WriteLine($@"Array data: {BitConverter.ToString(data[headerFwd..])}"); Console.WriteLine(""); } |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
System.Environment.Is64BitProcess: True Array length: 6 Raw memory: 00-00-00-00-00-00-00-00-18-B7-41-7C-FC-7F-00-00-06-00-00-00-00-00-00-00-01-02-03-04-05-06 Header size: 24 Breakdown: Padding: 00-00-00-00 Header: 00-00-00-00 Method table ref: 18-B7-41-7C-FC-7F-00-00 Array size: 06-00-00-00-00-00-00-00 (6) Array data: 01-02-03-04-05-06 Array length: 6 Raw memory: 00-00-00-00-00-00-00-00-78-91-BF-7C-FC-7F-00-00-06-00-00-00-00-00-00-00-02-00-00-00-03-00-00-00-00-00-00-00-00-00-00-00-01-02-03-04-05-06 Header size: 40 Breakdown: Padding: 00-00-00-00 Header: 00-00-00-00 Method table ref: 78-91-BF-7C-FC-7F-00-00 Array size: 06-00-00-00-00-00-00-00 (6) Dimension 0 len: 02-00-00-00 (2) Dimension 1 len: 03-00-00-00 (3) Dimension 0 low: 00-00-00-00 (0) Dimension 1 low: 00-00-00-00 (0) Array data: 01-02-03-04-05-06 Array length: 24 Raw memory: 00-00-00-00-00-00-00-00-B0-93-BF-7C-FC-7F-00-00-18-00-00-00-00-00-00-00-02-00-00-00-03-00-00-00-04-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-10-11-12-13-20-21-22-23-30-31-32-33-A0-A1-A2-A3-B0-B1-B2-B3-C1-C2-C3-C4 Header size: 48 Breakdown: Padding: 00-00-00-00 Header: 00-00-00-00 Method table ref: B0-93-BF-7C-FC-7F-00-00 Array size: 18-00-00-00-00-00-00-00 (24) Dimension 0 len: 02-00-00-00 (2) Dimension 1 len: 03-00-00-00 (3) Dimension 2 len: 04-00-00-00 (4) Dimension 0 low: 00-00-00-00 (0) Dimension 1 low: 00-00-00-00 (0) Dimension 2 low: 00-00-00-00 (0) Array data: 10-11-12-13-20-21-22-23-30-31-32-33-A0-A1-A2-A3-B0-B1-B2-B3-C1-C2-C3-C4 |
Looking at the assembly code, edx is first dimension and r8d (later rcx) second dimension.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
L0000: sub rsp, 0x28 ; Reserve space on stack L0004: mov rax, [rcx+0x10] ; Get array from reference L0008: sub edx, [rax+0x18] ; Subtract lower bound of dimension 0 from index 0 L000b: cmp edx, [rax+0x10] ; Check that result is within dimension 0 bounds L000e: jae short L0032 ; If larger, jump to exception L0010: mov rcx, r8 ; (Copy) L0013: sub ecx, [rax+0x1c] ; Subtract lower bound of dimension 1 from index 1 (32-bit) L0016: cmp ecx, [rax+0x14] ; Check that result is within dimension 1 bounds L0019: jae short L0032 ; If larger, jump to exception L001b: mov r8d, [rax+0x14] ; Copy dimension 1 length to r8d L001f: imul r8, rdx ; Multiply with index L0023: mov rdx, rcx ; (Copy) L0026: add rdx, r8 ; Add second dimension L0029: mov eax, [rax+rdx*4+0x20] ; Get data from array at rax+32+index*sizeof(int) L002d: add rsp, 0x28 ; Free stack space L0031: ret ; Return L0032: call 0x00007ff94aaaa370 ; Throw exception L0037: int3 ; Signal breakpoints |
Even though C# always has 0 as lower bound for arrays, CLI supports other lower bounds. In Visual Basic for example you can specify lower bound 1 by “Option Base 1”. The assembly code shows this being calculated and checked on for every dimension.
Conclusion
For standard arrays (single dimension) C#/CLI has an optimized object that only checks if the index you are accessing are within the length of the array.
For multidimensional arrays there is no such optimization, and therefore it will also check lower bound of each dimension.
My singledimensional test has the added advantage of not checking if all dimensions are out of bounds, only the total length of the array is checked.
So to be fast and somewhat safe, consider going for jagged array. If you feel comfortable with spinning your own, do that.
For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
public class MyArray2D { public readonly int[] Array; public readonly int Width; public readonly int Height; public MyArray2D(int width, int height) { Width = width; Height = height; Array = new int[Width * Height]; } /// <summary> /// Access array by 2D coordinates. /// </summary> /// <param name="x">Horizontal (width) axis.</param> /// <param name="y">Vertical (height) axis.</param> /// <returns></returns> public int this[int x, int y] { get => Array[y * Width + x]; set => Array[y * Width + x] = value; } } |
You, sir, are a genius. Given this amazing work though I wish you could have made the conclusion a bit more clear. Especially on the last line which could be interpreted a couple ways:
“So to be fast and somewhat safe, consider going for jagged array. <strong>If you feel comfortable with spinning your own</strong>, do that.”
By “If you feel comfortable <em>with spinning your own</em>“, do you mean constructing a singledimensional array (not a jagged array) and accessing like it’s a multidimenisional array, but using one’s own access calculations? Hey thanks!
Thank you. 🙂
Good point. I have added an example. This adds some overhead of class and indexers (methods, can be inlined), so how you choose to solve it depends on what you are doing of course.