Why C# multidimensional arrays are slow

In a previous blog post I detailed how you can speed up array access in certain cases. This time I’m taking a look at what is happening when we access arrays in .Net, and digging into why multidimensional arrays are slow.

First a quick refresher on array types.

Array

int[] sd_array = new int[10];

This is just a standard array. But in the context of these benchmarks “Singledimensional” uses index calculation. See further down for details, and benchmark comparing plain array with

Multidimensional array

int[,] md_array = new int[3,10];

Multidimensional arrays have 2 or more dimensions. For simplicity the picture shows only two dimensions, add one more and you get a cube. We can access the array by specifying row and column. This type of array is constrained to a fixed size for each dimension.

This is all just syntactic sugar. Under the hood it is stored sequentially as a single array. If you ever worked with raw image data you will probably have done this exercise many times: y * width + x. x is the first dimension (columns) and y is the second dimension (rows).

var md = new int[3,10]; // 3 rows, 10 columns
var sd = new int[30];   // 30 columns

_ = md[2, 4];           // Access row 2, column 4 (third row, fifth column)
_ = sd[2 * 10 + 4];     // Same

Jagged array

int[][] jagged_array = new int[3][];
for (var i = 0; i < 3; i++)
   jagged_array[i] = new int[10];

Jagged arrays are arrays of arrays. Think List<int[]>, except it is an array and not a List<>. In other words: It is an array that contains references to other array objects. An advantage with this approach is that each array can have any size, it is not restricted to uniform sizes like multidimensional arrays are. The drawback is that the arrays are actually memory pointers to array objects, so memory can be scattered and accessing it will require an additional lookup of array memory address.

Scattered memory may seem like a small thing, but depending on use it can have noticeable impact on performance. If you are sequentially accessing large amounts of data the CPU prefetcher will help you fetch memory before you need it. In the case of non-sequential memory layout, as can be the case with jagged arrays, the prefetcher may fail to guess the correct memory to fetch. See Latency Numbers Every Programmer Should Know for some hints on what the implications might be.

Benchmark

When it comes to CPU and memory, the answer to “which is faster” is often “it depends”. For this test I am using 32KB array which fits into my L1 cache (AMD x5950 with 64KB L1 per core, 512KB L2 per core, 64MB L3 cache shared), and access it sequentially 100 000 000 times in a loop. This should eliminate latency from memory as a factor and exaggerate the differences in access times. The operating system still moves execution between cores to distribute heat in the CPU (requiring L2/L3 fetch), but the effect from that should be negligible.

To access multidimensional data we need to calculate the offset or in case of jagged array, look up the array address. To calculate the offset of a two dimensional array we can use the formula: y * width + x.

To compare the real alternatives for multidimensional array I perform this calculation on “Singledimensional”. For reference I first ran it with “PlainArray” (standard array access).


BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
.NET SDK=6.0.100-rc.1.21463.6
  [Host] : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT
  32KB   : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT

Job=32KB  IterationCount=100  LaunchCount=2  
WarmupCount=10  Categories=Local

Method	Mean	Error	StdDev	Median	Ratio	Allocated
PlainArray	43.42 ms	0.007 ms	0.027 ms	43.41 ms	1.00	40 B
Singledimensional	72.79 ms	0.003 ms	0.012 ms	72.78 ms	1.68	69 B
Multidimensional	95.07 ms	0.004 ms	0.018 ms	95.07 ms	2.19	80 B
Jagged	82.95 ms	0.011 ms	0.047 ms	82.95 ms	1.91	69 B

The difference in accessing multidimensional arrays is noticeable, up and above twice as slow in some cases. But we are not really interested in PlainArray, so we run again without it.


BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
.NET SDK=6.0.100-rc.1.21463.6
  [Host] : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT
  32KB   : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT

Job=32KB  IterationCount=100  LaunchCount=2  
WarmupCount=10  Categories=Local

Method	Mean	Error	StdDev	Median	Ratio	Allocated
Singledimensional	72.81 ms	0.007 ms	0.029 ms	72.81 ms	1.00	69 B
Multidimensional	95.23 ms	0.039 ms	0.163 ms	95.15 ms	1.31	80 B
Jagged	83.04 ms	0.025 ms	0.103 ms	83.02 ms	1.14	69 B

As expected the jagged array is (~15%) slower than a single dimensional array. This is because of the overhead in additional address lookup to access the arrays. Also where the two other array types has sequential memory layout, the jagged array can have more scattered memory since objects are not guaranteed to be kept in sequence.

Finding that the multidimensional array is ~30% slower than single dimensional array is however a bit puzzling. The single dimension array test already contains calculations for accessing multidimensional data (y*width+x), so they should perform approximately* at the same speed. (*=we only check total bounds, not row and column separately.)

public int Singledimensional()
{
    var n = 0;
    var array = _sdArray;
    var sum = 1;

    for (var i = 0; i < Iterations; i++)
    {
        sum &= array[(n >> RowSizeBits) * RowSize + (n & RowSizeBitMask)];
        n = ++n & TotalLengthBitmask;
    }

    return sum;
}  

public int Multidimensional()
{
    var n = 0;
    var array = _mdArray;
    var sum = 1;

    for (var i = 0; i < Iterations; i++)
    {
        sum &= array[(n >> RowSizeBits), (n & RowSizeBitMask)];
        n = ++n & TotalLengthBitmask;
    }

    return sum;
}

Digging into the code

The memory dump and assembly code that follows is compiled in .Net Core 5.0.921.35908 for x64.

Anatomy of an array

.Net objects in 64-bit has a 4 byte alignment padding, 4 byte object header, 8 byte method table and 8 bytes internal data. So any object takes up a minimum of 24 bytes. For arrays the internal data field is used to store array length.

#arrayTable1 { text-align: center; border-style: none; } #arrayTable1 td { border-width: 1px; padding: 1px; border-style: solid; border-color: black; background-color: #D3D3D3; -moz-border-radius: ; }

4 bytes
Padding

4 bytes
Header

8 bytes
Method table reference

8 bytes
Array size

Element
1

Element
2

Element
3

Element
…

Knowing this we can look the memory of an array and have a look. We pin the array, get a reference to its first element and subtract

using System;
using System.Runtime.InteropServices;

void Main()
{
	Console.WriteLine($"Is64BitProcess: {System.Environment.Is64BitProcess}");

	DumpArray(new byte[] { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xAA });
}

unsafe void DumpArray(Array array)
{
	var pinnedArray = GCHandle.Alloc(array, GCHandleType.Pinned);
	var unsafeArray = (byte*)pinnedArray.AddrOfPinnedObject();
	var data = new byte[sizeof(byte) * array.Length + 24];
	unsafeArray -= 24;
	Marshal.Copy((IntPtr)unsafeArray, data, 0, data.Length);
	var arrayLength = BitConverter.ToInt64(data, 16);
	var dataHex = BitConverter.ToString(data);
	pinnedArray.Free();

	Console.WriteLine($@"Array length: {arrayLength}
Raw memory: {dataHex}

Breakdown:
Padding:          {BitConverter.ToString(data[0..4])}
Header:           {BitConverter.ToString(data[4..8])}
Method table ref: {BitConverter.ToString(data[8..16])}
Array size:       {BitConverter.ToString(data[16..24])}
Array data:       {BitConverter.ToString(data[24..])}
");
}

Is64BitProcess: True
Array length: 10
Raw memory: 00-00-00-00-00-00-00-00-18-B7-42-7C-FC-7F-00-00-0A-00-00-00-00-00-00-00-11-22-33-44-55-66-77-88-99-AA

Breakdown:
Padding:          00-00-00-00
Header:           00-00-00-00
Method table ref: 18-B7-42-7C-FC-7F-00-00
Array size:       0A-00-00-00-00-00-00-00
Array data:       11-22-33-44-55-66-77-88-99-AA

Single dimensional array access

Looking at the assembly code generated for array access in .Net we see that first bounds is checked (L0008, L000b with exception on L00019). Object references in .Net are to the method table address, so array size would be at position +8.

The important part here is L0008 which skips 8 bytes into array object and multiplies index (edx) with 4 (size of int) to find position in array.

L0000: sub rsp, 0x28                ; Reserve space on stack
L0004: mov rax, [rcx+8]             ; Get array from reference
L0008: cmp edx, [rax+8]             ; Compare index (edx) against size of array
L000b: jae short L0019              ; If larger, jump to exception
L000d: movsxd rdx, edx              ; Copy index from 32-bit signed to 64-bit signed
L0010: mov eax, [rax+rdx*4+0x10]    ; Get data from array at rax+16+index*sizeof(int)
L0014: add rsp, 0x28                ; Free stack space
L0018: ret                          ; Return
L0019: call 0x00007ff94aaaa370      ; Throw exception
L001e: int3                         ; Signal breakpoint

Jagged array access

Looking at the jagged array we see a similar pattern with bounds check (L0008, L000b), get array item (L00010) which is an array, then check next dimension bounds on this new array (L0015, L0019) before data is returned (L001e). So the overhead of jagged arrays in terms of opcodes is essentially an additional address lookup.

L0000: sub rsp, 0x28               ; Reserve space on stack
L0004: mov rax, [rcx+0x18]         ; Get array from reference
L0008: cmp edx, [rax+8]            ; Compare index (edx) against size of array
L000b: jae short L0027             ; If larger, jump to exception
L000d: movsxd rdx, edx             ; Copy index from 32-bit signed to 64-bit signed
L0010: mov rax, [rax+rdx*8+0x10]   ; Get object reference at rax+16+index*sizeof(IntPtr)
L0015: cmp r8d, [rax+8]            ; Compare index (r8d) against size of array
L0019: jae short L0027             ; If larger, jump to exception
L001b: movsxd rdx, r8d             ; Copy index from 32-bit signed to 64-bit signed
L001e: mov eax, [rax+rdx*4+0x10]   ; Get data from array at rax+16+index*sizeof(int)
L0022: add rsp, 0x28               ; Free stack space
L0026: ret                         ; Return
L0027: call 0x00007ff94aaaa370     ; Throw exception
L002c: int3                        ; Signal breakpoint

Multidimensional arrays

Which brings us to the much slower multidimensional array.

We would expect it to be single dimensional array with the addition of index calculation (y*width+x for two dimensions). So assembly code would look something* like this. This code is close to what was tested for single dimensional array in the benchmark. (*Still only one bounds check.)

L0000: sub rsp, 0x28              ; Reserve space on stack
L0004: mov rax, [rcx+8]           ; Get array from reference
L0008: mov rcx, rax               ; Make a copy
L000b: imul edx, [rax+8]          ; edx contains first dimension index, multiply with array size
L0010: add edx, r8d               ; Add second dimension index
L0013: cmp edx, [rcx+8]           ; Check that result is within array bounds
L0016: jae short L0024            ; If larger, jump to exception
L0018: movsxd rax, edx            ; Copy index from 32-bit signed to 64-bit signed
L001b: mov eax, [rcx+rax*4+0x10]  ; Get data from array at rax+16+index*sizeof(int)
L001f: add rsp, 0x28              ; Free stack space
L0023: ret                        ; Return
L0024: call 0x00007ff94aaaa370    ; Throw exception
L0029: int3                       ; Signal breakpoint

Before we jump into the assembler code, lets have a look at the memory layout of multidimensional array. Here the header before actual data is longer. We add 8 bytes + 8 bytes for each dimension. So two dimensions = 16+8 bytes extra. Each dimension has 4 bytes size and 4 bytes lower bound.

4 bytes
Padding

4 bytes
Header

8 bytes
Method table reference

8 bytes
Array size

4 byte
Dim 1 size

4 byte
Dim 2 size

4 byte
Dim 1 lbound

4 byte
Dim 2 lbound

Element
1

Element
2

Element
3

Element
…

With some modifications of the code we can look at memory of multidimensional arrays as well.

using System;
using System.Runtime.InteropServices;
unsafe void Main()
{
	Console.WriteLine($"System.Environment.Is64BitProcess: {System.Environment.Is64BitProcess}");
	Console.WriteLine($"");
	DumpArray(new byte[] { 1, 2, 3, 4, 5, 6 });
	DumpArray(new byte[,] { { 1, 2, 3 }, { 4, 5, 6 } });
	DumpArray(new byte[,,] {
		{ { 0x10, 0x11, 0x12, 0x13 }, { 0x20, 0x21, 0x22, 0x23 }, { 0x30, 0x31, 0x32,0x33 } },
		{ { 0xA0, 0xA1, 0xA2, 0xA3 }, { 0xB0, 0xB1, 0xB2, 0xB3 }, { 0xC1, 0xC2, 0xC3,0xC4 } }
		});
}

unsafe void DumpArray(Array array)
{
	var pinnedArray = GCHandle.Alloc(array, GCHandleType.Pinned);
	var unsafeArray = (byte*)pinnedArray.AddrOfPinnedObject();
	// Get total array size by multiplying all dimension upper bounds
	var size = array.GetUpperBound(0) + 1;
	if (array.Rank > 0)
	{
		for (var r = 1; r < array.Rank; r++)
		{
			var ub = array.GetUpperBound(r) + 1;
			size *= ub;
		}
	}
	// Header is 24 on standard array, and +(8*count) on additional dimensions
	var headerFwd = 24;
	if (array.Rank > 1)
		headerFwd += array.Rank * 8;

	var data = new byte[sizeof(byte) * size + headerFwd];
	unsafeArray -= headerFwd;
	Marshal.Copy((IntPtr)unsafeArray, data, 0, data.Length);
	var arrayLength = BitConverter.ToInt64(data, 16);
	var dataHex = BitConverter.ToString(data);
	pinnedArray.Free();

	Console.WriteLine($@"Array length: {arrayLength}
Raw memory: {dataHex}
Header size: {headerFwd}
Breakdown:
Padding:          {BitConverter.ToString(data[0..4])}
Header:           {BitConverter.ToString(data[4..8])}
Method table ref: {BitConverter.ToString(data[8..16])}
Array size:       {BitConverter.ToString(data[16..24])} ({BitConverter.ToInt64(data[16..24])})");
	if (array.Rank > 1)
	{
		for (var i = 0; i < array.Rank; i++)
		{
			var pos = 24 + (i) * 4;
			var d = data[pos..(pos + 4)];
			Console.WriteLine($"Dimension {i} len:  {BitConverter.ToString(d)} ({BitConverter.ToInt32(d)})");
		}
		for (var i = 0; i < array.Rank; i++)
		{
			var pos = 24 + (array.Rank*4) + (i) * 4;
			var d = data[pos..(pos + 4)];
			Console.WriteLine($"Dimension {i} low:  {BitConverter.ToString(d)} ({BitConverter.ToInt32(d)})");
		}
	}
	Console.WriteLine($@"Array data:       {BitConverter.ToString(data[headerFwd..])}");

	Console.WriteLine("");
}

System.Environment.Is64BitProcess: True

Array length: 6
Raw memory: 00-00-00-00-00-00-00-00-18-B7-41-7C-FC-7F-00-00-06-00-00-00-00-00-00-00-01-02-03-04-05-06
Header size: 24
Breakdown:
Padding:          00-00-00-00
Header:           00-00-00-00
Method table ref: 18-B7-41-7C-FC-7F-00-00
Array size:       06-00-00-00-00-00-00-00 (6)
Array data:       01-02-03-04-05-06

Array length: 6
Raw memory: 00-00-00-00-00-00-00-00-78-91-BF-7C-FC-7F-00-00-06-00-00-00-00-00-00-00-02-00-00-00-03-00-00-00-00-00-00-00-00-00-00-00-01-02-03-04-05-06
Header size: 40
Breakdown:
Padding:          00-00-00-00
Header:           00-00-00-00
Method table ref: 78-91-BF-7C-FC-7F-00-00
Array size:       06-00-00-00-00-00-00-00 (6)
Dimension 0 len:  02-00-00-00 (2)
Dimension 1 len:  03-00-00-00 (3)
Dimension 0 low:  00-00-00-00 (0)
Dimension 1 low:  00-00-00-00 (0)
Array data:       01-02-03-04-05-06

Array length: 24
Raw memory: 00-00-00-00-00-00-00-00-B0-93-BF-7C-FC-7F-00-00-18-00-00-00-00-00-00-00-02-00-00-00-03-00-00-00-04-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-10-11-12-13-20-21-22-23-30-31-32-33-A0-A1-A2-A3-B0-B1-B2-B3-C1-C2-C3-C4
Header size: 48
Breakdown:
Padding:          00-00-00-00
Header:           00-00-00-00
Method table ref: B0-93-BF-7C-FC-7F-00-00
Array size:       18-00-00-00-00-00-00-00 (24)
Dimension 0 len:  02-00-00-00 (2)
Dimension 1 len:  03-00-00-00 (3)
Dimension 2 len:  04-00-00-00 (4)
Dimension 0 low:  00-00-00-00 (0)
Dimension 1 low:  00-00-00-00 (0)
Dimension 2 low:  00-00-00-00 (0)
Array data:       10-11-12-13-20-21-22-23-30-31-32-33-A0-A1-A2-A3-B0-B1-B2-B3-C1-C2-C3-C4

Looking at the assembly code, edx is first dimension and r8d (later rcx) second dimension.

L0000: sub rsp, 0x28              ; Reserve space on stack
L0004: mov rax, [rcx+0x10]        ; Get array from reference
L0008: sub edx, [rax+0x18]        ; Subtract lower bound of dimension 0 from index 0
L000b: cmp edx, [rax+0x10]        ; Check that result is within dimension 0 bounds
L000e: jae short L0032            ; If larger, jump to exception
L0010: mov rcx, r8                ; (Copy)
L0013: sub ecx, [rax+0x1c]        ; Subtract lower bound of dimension 1 from index 1 (32-bit)
L0016: cmp ecx, [rax+0x14]        ; Check that result is within dimension 1 bounds
L0019: jae short L0032            ; If larger, jump to exception
L001b: mov r8d, [rax+0x14]        ; Copy dimension 1 length to r8d
L001f: imul r8, rdx               ; Multiply with index
L0023: mov rdx, rcx               ; (Copy)
L0026: add rdx, r8                ; Add second dimension
L0029: mov eax, [rax+rdx*4+0x20]  ; Get data from array at rax+32+index*sizeof(int)
L002d: add rsp, 0x28              ; Free stack space
L0031: ret                        ; Return
L0032: call 0x00007ff94aaaa370    ; Throw exception
L0037: int3                       ; Signal breakpoints

Even though C# always has 0 as lower bound for arrays, CLI supports other lower bounds. In Visual Basic for example you can specify lower bound 1 by “Option Base 1”. The assembly code shows this being calculated and checked on for every dimension.

Conclusion

For standard arrays (single dimension) C#/CLI has an optimized object that only checks if the index you are accessing are within the length of the array.

For multidimensional arrays there is no such optimization, and therefore it will also check lower bound of each dimension.

My singledimensional test has the added advantage of not checking if all dimensions are out of bounds, only the total length of the array is checked.

So to be fast and somewhat safe, consider going for jagged array. If you feel comfortable with spinning your own, do that.

For example:

public class MyArray2D
{
	public readonly int[] Array;
	public readonly int Width;
	public readonly int Height;

	public MyArray2D(int width, int height)
	{
		Width = width;
		Height = height;
		Array = new int[Width * Height];
	}

	/// <summary>
	/// Access array by 2D coordinates.
	/// </summary>
	/// <param name="x">Horizontal (width) axis.</param>
	/// <param name="y">Vertical (height) axis.</param>
	/// <returns></returns>
	public int this[int x, int y]
	{
		get => Array[y * Width + x];
		set => Array[y * Width + x] = value;
	}
}

2 thoughts on “Why C# multidimensional arrays are slow”

Nicholas Petersen

December 23, 2021 at 11:44 pm

You, sir, are a genius. Given this amazing work though I wish you could have made the conclusion a bit more clear. Especially on the last line which could be interpreted a couple ways:
“So to be fast and somewhat safe, consider going for jagged array. <strong>If you feel comfortable with spinning your own</strong>, do that.”

By “If you feel comfortable <em>with spinning your own</em>“, do you mean constructing a singledimensional array (not a jagged array) and accessing like it’s a multidimenisional array, but using one’s own access calculations? Hey thanks!

Loading...

- tedd
  
  March 7, 2022 at 2:26 pm
  
  Thank you. 🙂
  
  Good point. I have added an example. This adds some overhead of class and indexers (methods, can be inlined), so how you choose to solve it depends on what you are doing of course.
  
  Loading...

Why C# multidimensional arrays are slow

Array

Multidimensional array

Jagged array

Benchmark

Digging into the code

Anatomy of an array

Single dimensional array access

Jagged array access

Multidimensional arrays

Conclusion

Like this:

Related

2 thoughts on “Why C# multidimensional arrays are slow”

Leave a ReplyCancel reply

Array

Multidimensional array

Jagged array

Benchmark

Digging into the code

Anatomy of an array

Single dimensional array access

Jagged array access

Multidimensional arrays

Conclusion

Share this:

Like this:

Related

2 thoughts on “Why C# multidimensional arrays are slow”

Leave a ReplyCancel reply

Discover more from Tedds blog