Benchmarking method execution through interface, base class, virtual, override, dynamic, reflection and expression trees.
In .NET there are many ways to execute a method. The fastest is a straightforward call to a static method. But how does its speed compare to the other ways of calling?
There are countless reasons why we sometimes can’t just make a direct call. We use interfaces and class inheritance for code structuring, code logic, readability, future compatibility, various patterns, module support, etc. In some rare cases we don’t even know the type of the object, or we need reflective access to the code itself.
With the exception of reflection, which is well-known to be slow, we usually don’t worry too much about the execution speed of a method call. There are other things that are far more important. As the test results show, the overhead of a call is in most cases negligible.
I had to skip DynamicMethod (IL injection) for now due to time constraints. I also considered direct IL injection/compiling C# code, loading an assembly that overrides base class virtual method or interface and executing that. But this would be a lot of work and is sort of covered by these tests, so that too was skipped.
Execution methods
To give an idea of how the different tests are performed I’m describing each execution type in code first.
Normal
public class MyClass {
public void Method() { }
}
MyClass myClass = new MyClass();
// Execute
myClass.Method();
Interface
Implementing an interface may force a search through all types from front to back to find the method.
public interface MyInterface {
void Method();
}
public class MyClass: MyInterface {
public void Method() { }
}
MyInterface myClass = new MyClass();
// Execute
myClass.Method();
Non-virtual method in base class
Inheriting from a base class may force a search through all types from front to back to find the method.
public class MyBase {
public void Method() { }
}
public class MyClass: MyBase { }
MyClass myClass = new MyClass();
// Execute
myClass.Method();
Virtual method
A virtual method requires a lookup table, causing overhead when being called.
public class MyBase {
public virtual void Method() { }
}
public class MyClass: MyBase { }
MyBase myClass = new MyClass();
// Execute
myClass.Method();
Virtual Override
Similarly to interfaces, a method in a base class can be overridden.
public class MyBase {
public virtual void Method() { }
}
public class MyClass: MyBase {
public override void Method() { }
}
MyBase myClass = new MyClass();
// Execute
myClass.Method();
Dynamic
Dynamic was introduced in .NET Framework 4.0 (C# 4) and allows you to skip compile-time checking of the method. The execution is late bound, and therefore has more overhead than a normal execution.
public class MyClass {
public void Method() { }
}
dynamic myClass = new MyClass();
// Execute
myClass.Method();
Lambda
Lambdas can be used to pass execution in variables as Action or Func<>.
public class MyClass {
public void Method() { }
}
var myClass = new MyClass();
Action lambda = () => myClass.Method();
// Execute
lambda.Invoke();
Delegate
Delegates allow you to reference any method with a matching signature.
public class MyClass {
public void Method() { }
}
public delegate void MethodDelegate();
var myClass = new MyClass();
MethodDelegate methodDelegate = myClass.Method;
// Execute
methodDelegate.Invoke();
Reflection
We use .Net’s System.Reflection to find MethodInfo. This MethodInfo should be cached as looking it up is slow. The advantage of this method is that the class doesn’t have to inherit any interface or class. We can simply check if a method exists, and invoke it if so. The disadvantage is that it is very slow.
public class MyClass {
public void Method() { }
}
MyClass myClass = new MyClass();
var methodInfo = myClass.GetType().GetMethod("Method");
// Execute
methodInfo.Invoke(myClass, null);
Static
This one is not in line with the problem we are looking into. I just added it to see if there were any surprises (none).
public class MyClass {
public static void Method() { }
}
// Execute
MyClass.Method();
Multiple inheritance
I added up to 10 levels of inheritance on interface and base class to see what effect that would have, particularly the difference made by the variable type we use. We can mostly infer this from logic, but it’s interesting to see.
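As a rough illustration of what "levels of inheritance" and "variable type" mean here, the following is a minimal sketch with three levels instead of ten. All type names (ILevel1, Base1, Impl, InheritanceDemo) are hypothetical and not taken from the benchmark source; the methods return an int only so the calls have an observable result.

```csharp
using System;

// Hypothetical names; three levels stand in for the ten used in the benchmark.
public interface ILevel1 { int Method(); }   // the interface that declares Method
public interface ILevel2 : ILevel1 { }
public interface ILevel3 : ILevel2 { }

public class Base1 { public int Method() => 1; }  // non-virtual method in the root base class
public class Base2 : Base1 { }
public class Base3 : Base2 { }

public class Impl : ILevel3 { public int Method() => 2; }

public static class InheritanceDemo
{
    public static int Run()
    {
        Base3 self = new Base3();                        // variable typed as the most derived class
        Base1 asBase = self;                             // same object, typed as the root base class
        ILevel3 viaDerivedInterface = new Impl();        // typed as the most derived interface
        ILevel1 viaRootInterface = viaDerivedInterface;  // typed as the interface declaring Method

        // Each call targets the same method body; only the static type of the variable differs.
        return self.Method() + asBase.Method()
             + viaDerivedInterface.Method() + viaRootInterface.Method();
    }
}
```

The point of the sketch is only that the same object can be reached through variables of different declared types, which is the dimension the benchmark varies.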
Benchmarking tool and source
I’m using BenchmarkDotNet for benchmarking. Source code for the benchmark is located here.
Note that the time scale we are talking about here is tiny. 1 ns = 0.000001 ms, that is 1/1 000 000 000 of a second. So even a “bad” result of one whole ns can execute 1 billion times per second. For example, the inlined static method executed on average 128,531,926,713.1 times per second during testing. That figure does not include the overhead of the test itself; BenchmarkDotNet has removed the overhead of execution. I had to put something in the methods to avoid them being optimized away completely, so every execution returns an integer that is discarded, as can be seen by the “pop” before the “ret” in the MSIL below.
This is, however, in a tight test loop where everything is in the CPU’s L1 cache. More complex executions that require access to multiple memory areas will be far slower, though still of negligible duration.
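The conversion between time per call and calls per second can be written out as below. The 0.0078 ns input is an assumption on my part: it is the smallest mean in the result table, and it gives the same order of magnitude as the calls-per-second figure quoted above.

```latex
\begin{aligned}
1\ \text{ns} &= 10^{-6}\ \text{ms} = 10^{-9}\ \text{s} \\
\text{calls per second} &= \frac{1}{t_{\text{call}}} \\
t_{\text{call}} = 1\ \text{ns} &\;\Rightarrow\; \frac{1}{10^{-9}\ \text{s}} = 10^{9}\ \text{calls/s} \\
t_{\text{call}} = 0.0078\ \text{ns} &\;\Rightarrow\; \frac{1}{0.0078 \times 10^{-9}\ \text{s}} \approx 1.28 \times 10^{11}\ \text{calls/s}
\end{aligned}
```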
Result
- .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3260.0
- Windows Server 2016 v1607, HyperV-enabled
- Intel Core i7-3770K @ 3.5GHz, 4 cores, HT, 256KB L1 cache, 1MB L2 cache, 8MB L3 cache
- 4x16GB dual-channel 1333MHz RAM
| Rank | Ratio | Method | Mean | Median | Error | StdErr | Gen 0/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.11 | ‘Self->10 int’ | 0.0078 ns | 0.0000 ns | 0.0032 ns | 0.0010 ns | – | – |
| 1 | 0.12 | ‘*10 base’ | 0.0086 ns | 0.0000 ns | 0.0054 ns | 0.0016 ns | – | – |
| 1 | 0.24 | ‘Self->1 base’ | 0.0173 ns | 0.0000 ns | 0.0113 ns | 0.0034 ns | – | – |
| 1 | 0.29 | ‘*1 base->10 base’ | 0.0205 ns | 0.0000 ns | 0.0107 ns | 0.0032 ns | – | – |
| 1 | 0.34 | ‘*1 base’ | 0.0244 ns | 0.0000 ns | 0.0113 ns | 0.0034 ns | – | – |
| 1 | 1.00 | *Normal | 0.0715 ns | 0.0000 ns | 0.0198 ns | 0.0060 ns | – | – |
| 1 | 1.12 | *Static | 0.0799 ns | 0.0000 ns | 0.0249 ns | 0.0075 ns | – | – |
| 2 | 8.45 | ‘*1 base->virt abs override’ | 0.6045 ns | 0.6416 ns | 0.0109 ns | 0.0033 ns | – | – |
| 3 | 8.75 | ‘Self->10 base virt override’ | 0.6257 ns | 0.6130 ns | 0.0254 ns | 0.0077 ns | – | – |
| 3 | 9.31 | ‘*1 base->virt override’ | 0.6660 ns | 0.6116 ns | 0.0279 ns | 0.0084 ns | – | – |
| 4 | 9.80 | ‘*Self->virt abs override’ | 0.7004 ns | 0.7225 ns | 0.0193 ns | 0.0058 ns | – | – |
| 5 | 10.06 | ‘*10 base->virt no override’ | 0.7192 ns | 0.6768 ns | 0.0236 ns | 0.0071 ns | – | – |
| 6 | 10.60 | ‘*Self->virt override’ | 0.7579 ns | 0.7227 ns | 0.0338 ns | 0.0102 ns | – | – |
| 7 | 11.07 | ‘*1 base->virt no override’ | 0.7914 ns | 0.6625 ns | 0.0478 ns | 0.0145 ns | – | – |
| 8 | 15.99 | ‘*1 int’ | 1.1436 ns | 1.1123 ns | 0.0245 ns | 0.0074 ns | – | – |
| 9 | 16.38 | ‘*10 int’ | 1.1711 ns | 1.1909 ns | 0.0174 ns | 0.0053 ns | – | – |
| 10 | 16.57 | ‘Self->1 int’ | 1.1846 ns | 1.1314 ns | 0.0213 ns | 0.0064 ns | – | – |
| 11 | 17.21 | ‘1 int->10 int’ | 1.2307 ns | 1.2219 ns | 0.0206 ns | 0.0063 ns | – | – |
| 12 | 17.61 | *Lambda | 1.2588 ns | 1.1337 ns | 0.0464 ns | 0.0141 ns | – | – |
| 13 | 17.76 | *Delegate | 1.2701 ns | 1.2387 ns | 0.0253 ns | 0.0077 ns | – | – |
| 14 | 113.59 | ‘*Expression Tree’ | 8.1217 ns | 7.8778 ns | 0.1324 ns | 0.0401 ns | – | – |
| 15 | 181.58 | *Dynamic | 12.9831 ns | 11.3489 ns | 0.2841 ns | 0.0861 ns | 0.0057 | 24 B |
| 16 | 1989.81 | *Reflection | 142.2713 ns | 132.4650 ns | 2.2464 ns | 0.6805 ns | 0.0055 | 24 B |
Rank 1: Static, instance and non-virtual
These are the calls that are optimized away by compiler. Depending on complexity of the call they are either completely inlined, or they are one jump away.
Use the static modifier to declare a static member, which belongs to the type itself rather than to a specific object.
Microsoft (static)
Normal (call to instance method).
Base class (call to non-virtual base class method).
Microsoft (inheritance)
Even though there is a 10x spread in speed the measurements for these are so small that it is difficult to tell which one is a winner. Results vary with execution. Judging from the ASM, Static should be fastest. But the rest of them would be more affected by other factors such as memory alignment.
Static
The IL shows us that this is a call to a known method with static address, and the ASM shows us that this was completely inlined. We can see from the IL that even without inlining this is as efficient as it gets.
IL
.method public hidebysig instance void Call_StaticClass() cil managed
{
.maxstack 8
L_0000: call int32 Tedd.DynamicBindingBenchmark.Tests.Classes.StaticClass::Method()
L_0005: pop
L_0006: ret
}
ASM
ret
Normal (direct instance call)
Calling an instance method requires first loading the instance reference, then a callvirt on the method.
IL
.method public hidebysig instance void Call_NormalClass() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.Classes.NormalClass Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClass
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.Classes.NormalClass::Method()
L_000b: pop
L_000c: ret
}
ASM
mov rax,qword ptr [rcx+90h]
mov eax,dword ptr [rax+8]
ret
Base class without virtual
We see the same as Normal call.
IL
.method public hidebysig instance void Call_BaseClass10_10() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass10Class Tedd.DynamicBindingBenchmark.Tests.CallTests::_baseClass10_10
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass1Class::Method()
L_000b: pop
L_000c: ret
}
ASM
mov rax,qword ptr [rcx+48h]
mov eax,dword ptr [rax+8]
ret
Rank 2-7: Derived virtual method
At 8.5-11 times slower than a normal instance method execution we find methods marked virtual in base class.
The virtual keyword is used to modify a method, property, indexer, or event declaration and allow for it to be overridden in a derived class.
Microsoft
Base class with virtual, derived not override
With only 1 level from derived to base the lookup is the fastest among the derived calls to virtual. From the ASM we see that it does two extra jumps by following lookups, compared to the same call without virtual.
IL
.method public hidebysig instance void Call_BaseClass1_1NotOverride() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass1ClassVirtual Tedd.DynamicBindingBenchmark.Tests.CallTests::_baseClassVirtualNotOverride1_1
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass1ClassVirtual::Method()
L_000b: pop
L_000c: ret
}
ASM
mov rcx,qword ptr [rcx+60h]
mov rax,qword ptr [rcx]
mov rax,qword ptr [rax+40h]
mov rax,qword ptr [rax+20h]
; Content of Method: (returning an int32)
mov eax,dword ptr [rcx+8]
Base class with virtual, derived override / exact type
We see the same pattern for all of these. They are never inlined and they have two extra jumps in lookup. But if the type is specified directly, so that there can’t be any permutation, the compiler is able to inline the method body, for example in the case of override.
The same setup, with variable type set to the base gives same result but without the inlining. (IL/ASM not shown here as it would be a bit redundant.)
IL
.method public hidebysig instance void Call_BaseClass1_1Override() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass1ClassVirtual Tedd.DynamicBindingBenchmark.Tests.CallTests::_baseClassVirtualOverride1_1
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.Classes.BaseClass1ClassVirtual::Method()
L_000b: pop
L_000c: ret
}
ASM
mov rcx,qword ptr [rcx+58h]
mov rax,qword ptr [rcx]
mov rax,qword ptr [rax+40h]
mov rax,qword ptr [rax+20h]
Rank 8-11: Interfaces
At 15-17 times slower than a normal instance method execution we find methods accessed through interfaces.
An interface contains only the signatures of methods, properties, events or indexers. A class or struct that implements the interface must implement the members of the interface that are specified in the interface definition.
Microsoft
Method defined in interface
All of the calls to an interface method look the same regardless of how many levels of interface they have to go through. This seems to be because the compiler can infer which interface has the method and call it directly. Therefore there is no additional overhead from having multiple levels of interface between the class and the interface with the method definition.
IL
.method public hidebysig instance void Call_Interface10_1() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.Interfaces.Interface1 Tedd.DynamicBindingBenchmark.Tests.CallTests::_interface10_1
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.Interfaces.Interface1::Method()
L_000b: pop
L_000c: ret
}
ASM
mov rcx,qword ptr [rcx+20h]
mov r11,7FF7E4D304B0h
mov rax,qword ptr [r11]
cmp dword ptr [rcx],ecx
Rank 12: Lambda
17 times slower than a normal instance method execution. Lambda has the overhead of executing System.Func<T>.Invoke(). This gives us two extra instructions for calculating the target address.
A lambda expression is an anonymous function that you can use to create delegates or expression tree types. By using lambda expressions, you can write local functions that can be passed as arguments or returned as the value of function calls.
Microsoft
IL
.method public hidebysig instance void Call_NormalClassLambda() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class [mscorlib]System.Func`1<int32> Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClassLambda
L_0006: callvirt instance !0 [mscorlib]System.Func`1<int32>::Invoke()
L_000b: pop
L_000c: ret
}
ASM
mov rax,qword ptr [rcx+0A8h]
lea rcx,[rax+8]
mov rcx,qword ptr [rcx]
mov rax,qword ptr [rax+18h]
Rank 13: Delegate
17 times slower than a normal instance method execution. Although the IL differs, the final ASM looks the same as with lambda.
A delegate is a type that represents references to methods with a particular parameter list and return type. When you instantiate a delegate, you can associate its instance with any method with a compatible signature and return type. You can invoke (or call) the method through the delegate instance.
Microsoft
IL
.method public hidebysig instance void Call_NormalClassDelegate() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class Tedd.DynamicBindingBenchmark.Tests.CallTests/MethodDelegate Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClassDelegate
L_0006: callvirt instance int32 Tedd.DynamicBindingBenchmark.Tests.CallTests/MethodDelegate::Invoke()
L_000b: pop
L_000c: ret
}
ASM
mov rax,qword ptr [rcx+0B8h]
lea rcx,[rax+8]
mov rcx,qword ptr [rcx]
mov rax,qword ptr [rax+18h]
Rank 14: Expression tree
114 times slower than a normal instance method execution. The call to execute the expression tree is as expected the same as with lambda, a call to System.Func<T>.Invoke(). But the execution happening behind the lambda is considerably slower. I haven’t dug into the ASM here so I’ll leave it at that for now.
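The article does not show how the expression tree was built, so the following is only a sketch of one common way to compile a method call into a Func<int> with System.Linq.Expressions. The class MyClass, its return value and the helper ExpressionTreeDemo are assumptions for illustration; the benchmark source may construct its tree differently.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class MyClass
{
    // Returns an int, mirroring the benchmark methods that return a discarded integer.
    public int Method() => 42;
}

public static class ExpressionTreeDemo
{
    // Hypothetical helper: builds () => instance.Method() as an expression tree and compiles it.
    public static Func<int> Build(MyClass instance)
    {
        MethodInfo methodInfo = typeof(MyClass).GetMethod("Method");

        // Expression node representing the call instance.Method()
        MethodCallExpression call = Expression.Call(Expression.Constant(instance), methodInfo);

        // Wrap the call in a parameterless lambda and compile it to a delegate.
        return Expression.Lambda<Func<int>>(call).Compile();
    }
}
```

The compiled delegate would then be cached and executed like the lambda above, for example with compiled.Invoke(), where compiled is the result of ExpressionTreeDemo.Build(new MyClass()). Compiling is expensive, so it should happen once, outside the measured call.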
Rank 15: Dynamic
180 times slower than a normal instance method execution.
Dynamic causes 24 bytes of memory allocation for GC, which I suspect is because of boxing of the return type (int) via Object. It is 24 bytes because the int takes 4 bytes, the object overhead on x64 takes 16 bytes, and .NET’s minimum object size is 12 bytes on x86 and 24 bytes on x64, so the 20 bytes round up to 24. More details in Jon Skeet’s blog post on memory and strings.
C# 4 introduces a new type, dynamic. The type is a static type, but an object of type dynamic bypasses static type checking.
Microsoft
IL
.method public hidebysig instance void Call_NormalClassDynamic() cil managed
{
.maxstack 9
L_0000: ldsfld class [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>> Tedd.DynamicBindingBenchmark.Tests.CallTests/<>o__43::<>p__0
L_0005: brtrue.s L_003b
L_0007: ldc.i4 0x100
L_000c: ldstr "Method"
L_0011: ldnull
L_0012: ldtoken Tedd.DynamicBindingBenchmark.Tests.CallTests
L_0017: call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
L_001c: ldc.i4.1
L_001d: newarr [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfo
L_0022: dup
L_0023: ldc.i4.0
L_0024: ldc.i4.0
L_0025: ldnull
L_0026: call class [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfo [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfo::Create(valuetype [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfoFlags, string)
L_002b: stelem.ref
L_002c: call class [System.Core]System.Runtime.CompilerServices.CallSiteBinder [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.Binder::InvokeMember(valuetype [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpBinderFlags, string, class [mscorlib]System.Collections.Generic.IEnumerable`1<class [mscorlib]System.Type>, class [mscorlib]System.Type, class [mscorlib]System.Collections.Generic.IEnumerable`1<class [Microsoft.CSharp]Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfo>)
L_0031: call class [System.Core]System.Runtime.CompilerServices.CallSite`1<!0> [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>>::Create(class [System.Core]System.Runtime.CompilerServices.CallSiteBinder)
L_0036: stsfld class [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>> Tedd.DynamicBindingBenchmark.Tests.CallTests/<>o__43::<>p__0
L_003b: ldsfld class [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>> Tedd.DynamicBindingBenchmark.Tests.CallTests/<>o__43::<>p__0
L_0040: ldfld !0 [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>>::Target
L_0045: ldsfld class [System.Core]System.Runtime.CompilerServices.CallSite`1<class [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>> Tedd.DynamicBindingBenchmark.Tests.CallTests/<>o__43::<>p__0
L_004a: ldarg.0
L_004b: ldfld object Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClassDynamic
L_0050: callvirt instance void [mscorlib]System.Action`2<class [System.Core]System.Runtime.CompilerServices.CallSite, object>::Invoke(!0, !1)
L_0055: ret
}
ASM
cmp qword ptr [12CC9718h],0
jne M00_L00
mov rcx,7FF7E4F18790h
call clr!InstallCustomModule+0x2320
mov rdi,rax
mov rcx,offset Microsoft_CSharp_ni+0x1af8a
mov edx,1
call clr+0x2690
mov rbx,rax
mov rcx,offset Microsoft_CSharp_ni+0xc0230
call clr+0x2540
mov r8,rax
xor ecx,ecx
mov dword ptr [r8+10h],ecx
xor ecx,ecx
mov qword ptr [r8+8],rcx
mov rcx,rbx
xor edx,edx
call clr+0x4180
mov qword ptr [rsp+20h],rbx
mov rdx,qword ptr [12CC38C0h]
mov r9,rdi
mov ecx,100h
xor r8d,r8d
call Microsoft.CSharp.RuntimeBinder.Binder.InvokeMember(Microsoft.CSharp.RuntimeBinder.CSharpBinderFlags, System.String, System.Collections.Generic.IEnumerable`1, System.Type, System.Collections.Generic.IEnumerable`1)
mov rdx,rax
mov rcx,7FF7E4F72938h
call System.Runtime.CompilerServices.CallSite`1[[System.__Canon, mscorlib]].Create(System.Runtime.CompilerServices.CallSiteBinder)
mov ecx,12CC9718h
mov rdx,rax
call clr+0x3fc0
M00_L00
mov rdx,qword ptr [12CC9718h]
mov rax,qword ptr [rdx+18h]
lea rcx,[rax+8]
mov rcx,qword ptr [rcx]
mov r8,qword ptr [rsi+0C0h]
mov rax,qword ptr [rax+18h]
; Microsoft.CSharp.RuntimeBinder.Binder.InvokeMember(Microsoft.CSharp.RuntimeBinder.CSharpBinderFlags, System.String, System.Collections.Generic.IEnumerable`1, System.Type, System.Collections.Generic.IEnumerable`1)
test cl,2
setne al
movzx eax,al
test cl,4
setne dl
movzx edx,dl
test ecx,100h
setne cl
movzx ecx,cl
xor ebp,ebp
test eax,eax
je Microsoft_CSharp_ni+0x787fc
mov ebp,1
test edx,edx
je Microsoft_CSharp_ni+0x78803
or ebp,2
test ecx,ecx
je Microsoft_CSharp_ni+0x7880a
or ebp,4
; Microsoft.CSharp.RuntimeBinder.CSharpInvokeMemberBinder::.ctor(Microsoft.CSharp.RuntimeBinder.CSharpCallFlags,System.String,System.Type,System.Collections.Generic.IEnumerable`1,System.Collections.Generic.IEnumerable`1)
lea rcx,[Microsoft_CSharp_ni+0xc03b8]
call Microsoft_CSharp_ni+0x62940
mov r14,rax
mov qword ptr [rsp+20h],rdi
mov rdi,qword ptr [rsp+80h]
mov qword ptr [rsp+28h],rdi
mov rcx,r14
mov edx,ebp
mov r8,rsi
mov r9,rbx
call Microsoft.CSharp.RuntimeBinder.CSharpInvokeMemberBinder..ctor(Microsoft.CSharp.RuntimeBinder.CSharpCallFlags, System.String, System.Type, System.Collections.Generic.IEnumerable`1, System.Collections.Generic.IEnumerable`1)
mov rax,r14
; System.Runtime.CompilerServices.CallSite`1[[System.__Canon, mscorlib]].Create(System.Runtime.CompilerServices.CallSiteBinder)
mov rcx,qword ptr [rdi+30h]
mov rcx,qword ptr [rcx]
mov rcx,qword ptr [rcx]
test cl,1
je System_Core_ni+0x318515
mov rcx,qword ptr [rcx-1]
call System_Core_ni+0x2abf70
mov rbx,rax
mov rcx,qword ptr [System_Core_ni+0x96e40]
call System_Core_ni+0x2abf70
mov rdx,rax
mov rcx,rbx
mov rax,qword ptr [rbx]
mov rax,qword ptr [rax+0D8h]
call qword ptr [rax+30h]
test al,al
je System_Core_ni+0x92cd2c
mov rcx,rdi
call System_Core_ni+0x2abf90
mov rdi,rax
lea rcx,[rdi+8]
mov rdx,rsi
call System_Core_ni+0x2abf88
mov rcx,rdi
call System.Runtime.CompilerServices.CallSite`1[[System.__Canon, mscorlib]].GetUpdateDelegate()
lea rcx,[rdi+18h]
mov rdx,rax
call System_Core_ni+0x2abf88
mov rax,rdi
add rsp,30h
pop rbx
pop rsi
pop rdi
ret
int 3
int 3
int 3
int 3
int 3
int 3
push rsi
sub rsp,30h
mov qword ptr [rsp+28h],rcx
mov rsi,rcx
mov rcx,qword ptr [rsi]
mov rdx,qword ptr [rcx+30h]
mov rdx,qword ptr [rdx]
mov rax,qword ptr [rdx+8]
Rank 16: Reflection
A whopping 2000 times slower than a normal instance method execution, and roughly 18 000 times slower than the fastest rank 1 result in the table (142.2713 ns against 0.0078 ns). This is despite the fact that we have cached MethodInfo prior to test execution.
Reflection provides objects (of type Type) that describe assemblies, modules and types. You can use reflection to dynamically create an instance of a type, bind the type to an existing object, or get the type from an existing object and invoke its methods or access its fields and properties
Microsoft
Reflection causes 24 bytes of memory allocation for GC, which I suspect is because of boxing of the return type (int) via Object. It is 24 bytes because the int takes 4 bytes, the object overhead on x64 takes 16 bytes, and .NET’s minimum object size is 12 bytes on x86 and 24 bytes on x64, so the 20 bytes round up to 24. More details in Jon Skeet’s blog post on memory and strings.
IL
.method public hidebysig instance void Call_NormalClassReflection() cil managed
{
.maxstack 8
L_0000: ldarg.0
L_0001: ldfld class [mscorlib]System.Reflection.MethodInfo Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClassReflectionMethodInfo
L_0006: ldarg.0
L_0007: ldfld class Tedd.DynamicBindingBenchmark.Tests.Classes.NormalClass Tedd.DynamicBindingBenchmark.Tests.CallTests::_normalClassReflectionClass
L_000c: ldnull
L_000d: callvirt instance object [mscorlib]System.Reflection.MethodBase::Invoke(object, object[])
L_0012: pop
L_0013: ret
}
ASM
mov rax,qword ptr [rcx+0D0h]
mov rdx,qword ptr [rcx+0C8h]
xor ecx,ecx
mov qword ptr [rsp+20h],rcx
mov qword ptr [rsp+28h],rcx
mov rcx,rax
xor r8d,r8d
xor r9d,r9d
mov rax,qword ptr [rax]
mov rax,qword ptr [rax+58h]
call qword ptr [rax+20h]
nop
Summary
The time it takes to execute most of these techniques is well within negligible time frames. In fact, the times are so small that it is difficult to measure them accurately. Your results may vary from mine.
The tests have been executed multiple times, and in multiple rounds, always giving the same result within a small margin of error.
What we learned
- Some ways of execution are very slow, with Reflection coming in at a clear last place. It is worth noting that all of the “losers” have other strengths.
- Dynamic and Reflection can also cause memory allocations that GC has to handle.
- Static methods, instance method or non-virtual base class methods are candidates to be inlined. In some other scenarios it is not possible for the compiler to consider inlining.