Benchmarking dynamic method invocation in .Net

By | 2018-12-30

Benchmarking method execution through interface, base class, virtual, override, dynamic, reflection and expression trees.

In .Net there are many ways to execute a method. The fastest is a straightforward call to a static method. But how does its speed compare to the alternatives?

There are countless reasons why we sometimes can’t just make a direct call. We use interfaces and class inheritance for code structuring, code logic, readability, future compatibility, various patterns, module support, and so on. In some rare cases we don’t even know the type of the object, or we need reflective access to the code itself.

With the exception of reflection, which is well-known to be slow, we usually don’t worry too much about the execution speed of a method call. There are other things that are far more important. As the test results show, the overhead of a call is in most cases negligible.

I had to skip DynamicMethod (IL injection) for now due to time constraints. I also considered direct IL injection/compiling C# code, loading an assembly that overrides base class virtual method or interface and executing that. But this would be a lot of work and is sort of covered by these tests, so that too was skipped.

Execution methods

To give an idea of how the different tests are performed I’m describing each execution type in code first.

Normal

Interface

Calling through an interface may force a search through the type’s interfaces at run time to find the method.
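The benchmark's own code is not reproduced here. As a minimal sketch of the difference between a normal call and an interface call, assuming hypothetical names (ICalculator, Calculator, InterfaceCallDemo) that are not taken from the benchmark source:

```csharp
using System;

// Hypothetical types for illustration only.
public interface ICalculator
{
    int GetValue();
}

public sealed class Calculator : ICalculator
{
    public int GetValue() => 42;
}

public static class InterfaceCallDemo
{
    // Normal call: the variable's static type is the concrete class,
    // so the target method is known when the code is compiled.
    public static int CallDirect(Calculator c) => c.GetValue();

    // Interface call: the variable's static type is the interface,
    // so the implementing method is resolved at run time.
    public static int CallThroughInterface(ICalculator c) => c.GetValue();
}
```

Both methods return the same value for the same object; only the way the call is dispatched differs.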

Non-virtual method in base class

Inheriting from a base class may force a search up the type hierarchy to find the method.

Virtual method

A virtual method is dispatched through a lookup table, which adds overhead to every call.

Virtual Override

As with an interface method, a virtual method in a base class can be overridden in a derived class.
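A minimal sketch of the base class cases described above (non-virtual, virtual and virtual override), using hypothetical names (BaseWorker, DerivedWorker, VirtualCallDemo) rather than the benchmark's own classes:

```csharp
using System;

// Hypothetical types for illustration only.
public class BaseWorker
{
    // Non-virtual: cannot be overridden, so the target is fixed.
    public int Plain() => 1;

    // Virtual: a derived class may replace the implementation.
    public virtual int Overridable() => 2;
}

public class DerivedWorker : BaseWorker
{
    public override int Overridable() => 3;
}

public static class VirtualCallDemo
{
    // Variable typed as the base class: the override is found at run time.
    public static int CallViaBase(BaseWorker w) => w.Overridable();

    // Variable typed as the exact derived class.
    public static int CallViaExactType(DerivedWorker w) => w.Overridable();

    // Non-virtual method inherited from the base class.
    public static int CallNonVirtual(DerivedWorker w) => w.Plain();
}
```

Passing a DerivedWorker to CallViaBase still runs the override, which is what makes the lookup necessary.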

Dynamic

Dynamic was introduced with C# 4 (.Net Framework 4) and allows you to skip compile-time checking of the method. The call is late bound, and therefore has more overhead than a normal execution.
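As a rough sketch of a late-bound call, assuming hypothetical names (DynamicTarget, DynamicCallDemo) that are not from the benchmark source:

```csharp
using System;

// Hypothetical type for illustration only.
public class DynamicTarget
{
    public int GetValue() => 42;
}

public static class DynamicCallDemo
{
    public static int CallDynamic(object target)
    {
        // The compiler does not check that GetValue exists on the target.
        // The member is looked up and bound when this line runs.
        dynamic d = target;
        return d.GetValue();
    }
}
```

If the object has no matching GetValue method, the failure shows up as an exception at run time instead of a compile error.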

Lambda

Lambdas can be used to pass execution in variables as Action or Func<>.

Delegate

Delegates allow you to reference any method with a matching signature.
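The lambda and delegate cases can be sketched together. The names below (ValueGetter, DelegateTarget, DelegateCallDemo) are hypothetical and only illustrate the two call shapes:

```csharp
using System;

// Hypothetical delegate type: any parameterless method returning int matches it.
public delegate int ValueGetter();

// Hypothetical type for illustration only.
public class DelegateTarget
{
    public int GetValue() => 42;
}

public static class DelegateCallDemo
{
    public static int CallLambda(DelegateTarget target)
    {
        // A lambda stored in a Func<int> variable and then invoked.
        Func<int> viaLambda = () => target.GetValue();
        return viaLambda();
    }

    public static int CallDelegate(DelegateTarget target)
    {
        // A method group converted to a custom delegate type and then invoked.
        ValueGetter viaDelegate = target.GetValue;
        return viaDelegate();
    }
}
```

In a benchmark the delegate instances would normally be created once up front, so that only the invocation is measured.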

Reflection

We use .Net’s System.Reflection to find MethodInfo. This MethodInfo should be cached as looking it up is slow. The advantage of this method is that the class doesn’t have to inherit any interface or class. We can simply check if a method exists, and invoke it if so. The disadvantage is that it is very slow.
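A minimal sketch of a cached reflection call and of the "check if a method exists" idea, assuming hypothetical names (ReflectionTarget, ReflectionCallDemo) that do not come from the benchmark source:

```csharp
using System;
using System.Reflection;

// Hypothetical type for illustration only. It implements no interface.
public class ReflectionTarget
{
    public int GetValue() => 42;
}

public static class ReflectionCallDemo
{
    // Looked up once and reused, because the lookup itself is slow.
    private static readonly MethodInfo CachedGetValue =
        typeof(ReflectionTarget).GetMethod("GetValue");

    public static int CallCached(ReflectionTarget target)
    {
        // Invoke returns object, so the int result is boxed and unboxed here.
        return (int)CachedGetValue.Invoke(target, null);
    }

    // Checks at run time whether the object has a public
    // parameterless method with the given name.
    public static bool HasMethod(object target, string name)
    {
        return target.GetType().GetMethod(name, Type.EmptyTypes) != null;
    }
}
```

HasMethod lets the caller decide whether to invoke at all, without the target type having to implement a shared interface.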

Static

This one is not in line with the problem we are looking into. I just added it to see if there were any surprises (none).

Multiple inheritance

I added up to 10 levels of inheritance on interface and base class to see what effect that would have, particularly the effect of which variable type we use for the call. We can mostly infer this from logic, but it’s interesting to see.
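To make the idea concrete, here is a shortened sketch with three levels instead of ten. All names (ILevel0 to ILevel2, Level0Base to Level2Class, InheritanceDepthDemo) are hypothetical, and the exact layout of the benchmark's own hierarchy is an assumption:

```csharp
using System;

// Hypothetical interface chain: the method is declared at the root.
public interface ILevel0 { int GetValue(); }
public interface ILevel1 : ILevel0 { }
public interface ILevel2 : ILevel1 { }

// Hypothetical class chain: the non-virtual method is declared at the root.
public class Level0Base { public int GetBaseValue() => 7; }
public class Level1Base : Level0Base { }
public class Level2Class : Level1Base, ILevel2
{
    public int GetValue() => 42;
}

public static class InheritanceDepthDemo
{
    // Same object, different variable types for the call.
    public static int ViaSelf(Level2Class x) => x.GetValue();
    public static int ViaNearestInterface(ILevel2 x) => x.GetValue();
    public static int ViaRootInterface(ILevel0 x) => x.GetValue();
    public static int ViaSelfToRootBase(Level2Class x) => x.GetBaseValue();
    public static int ViaRootBase(Level0Base x) => x.GetBaseValue();
}
```

Each method can be given the same Level2Class instance, so only the declared type of the variable changes between the measurements.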

Benchmarking tool and source

I’m using BenchmarkDotNet for benchmarking. Source code for the benchmark is located here.

Note that the time scale we are talking about here is negligible. 1 ns = 0.000001 ms, that is 1/1 000 000 000 of a second. So even a “bad” result of one whole ns can execute 1 billion times per second. For example the inlined static method executed on average 128,531,926,713.1 times per second during testing. That figure excludes the overhead of the test itself, since BenchmarkDotNet subtracts the overhead of its own invocation. I had to put something in the method body to keep it from being optimized away completely, so every execution returns an integer that is discarded, as can be seen by the “pop” before the “ret” in the MSIL below.

This is, however, in a tight test loop where everything is in the CPU’s L1 cache. More complex executions that require access to multiple memory areas will be far slower, though still at a negligible duration.

Result

  • .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3260.0
  • Windows Server 2016 v1607, HyperV-enabled
  • Intel Core i7-3770K @ 3.5GHz, 4 cores, HT, 256KB L1 cache, 1MB L2 cache, 8MB L3 cache
  • 4x16GB dual-channel 1333MHz RAM

Rank  Ratio    Method                           Mean         Median       Error      StdErr     Gen 0/1k Op   Allocated Memory/Op
1     0.11     ‘Self->10 int’                   0.0078 ns    0.0000 ns    0.0032 ns  0.0010 ns
1     0.12     ‘*10 base’                       0.0086 ns    0.0000 ns    0.0054 ns  0.0016 ns
1     0.24     ‘Self->1 base’                   0.0173 ns    0.0000 ns    0.0113 ns  0.0034 ns
1     0.29     ‘*1 base->10 base’               0.0205 ns    0.0000 ns    0.0107 ns  0.0032 ns
1     0.34     ‘*1 base’                        0.0244 ns    0.0000 ns    0.0113 ns  0.0034 ns
1     1.00     *Normal                          0.0715 ns    0.0000 ns    0.0198 ns  0.0060 ns
1     1.12     *Static                          0.0799 ns    0.0000 ns    0.0249 ns  0.0075 ns
2     8.45     ‘*1 base->virt abs override’     0.6045 ns    0.6416 ns    0.0109 ns  0.0033 ns
3     8.75     ‘Self->10 base virt override’    0.6257 ns    0.6130 ns    0.0254 ns  0.0077 ns
3     9.31     ‘*1 base->virt override’         0.6660 ns    0.6116 ns    0.0279 ns  0.0084 ns
4     9.80     ‘*Self->virt abs override’       0.7004 ns    0.7225 ns    0.0193 ns  0.0058 ns
5     10.06    ‘*10 base->virt no override’     0.7192 ns    0.6768 ns    0.0236 ns  0.0071 ns
6     10.60    ‘*Self->virt override’           0.7579 ns    0.7227 ns    0.0338 ns  0.0102 ns
7     11.07    ‘*1 base->virt no override’      0.7914 ns    0.6625 ns    0.0478 ns  0.0145 ns
8     15.99    ‘*1 int’                         1.1436 ns    1.1123 ns    0.0245 ns  0.0074 ns
9     16.38    ‘*10 int’                        1.1711 ns    1.1909 ns    0.0174 ns  0.0053 ns
10    16.57    ‘Self->1 int’                    1.1846 ns    1.1314 ns    0.0213 ns  0.0064 ns
11    17.21    ‘1 int->10 int’                  1.2307 ns    1.2219 ns    0.0206 ns  0.0063 ns
12    17.61    *Lambda                          1.2588 ns    1.1337 ns    0.0464 ns  0.0141 ns
13    17.76    *Delegate                        1.2701 ns    1.2387 ns    0.0253 ns  0.0077 ns
14    113.59   ‘*Expression Tree’               8.1217 ns    7.8778 ns    0.1324 ns  0.0401 ns
15    181.58   *Dynamic                         12.9831 ns   11.3489 ns   0.2841 ns  0.0861 ns  0.0057        24 B
16    1989.81  *Reflection                      142.2713 ns  132.4650 ns  2.2464 ns  0.6805 ns  0.0055        24 B
 

Rank 1: Static, instance and non-virtual

These are the calls that the JIT compiler optimizes away. Depending on the complexity of the call they are either completely inlined, or they are one jump away.

Use the static modifier to declare a static member, which belongs to the type itself rather than to a specific object.

Normal (call to instance method).
Base class (call to non-virtual base class method).

Microsoft (static) Microsoft (inheritance)

Even though there is a 10x spread in speed, the measurements for these are so small that it is difficult to pick a winner, and results vary between runs. Judging from the ASM, Static should be fastest, but the rest would be more affected by other factors such as memory alignment.

Static

The IL shows us that this is a call to a known method with static address, and the ASM shows us that this was completely inlined. We can see from the IL that even without inlining this is as efficient as it gets.

IL

ASM

Normal (direct instance call)

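Calling an instance method requires first loading the instance reference, then a callvirt on the method. The C# compiler emits callvirt even for non-virtual instance methods, because callvirt also checks the reference for null.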

IL

ASM

Base class without virtual

We see the same as for the Normal call.

IL

ASM

Rank 2-7: Derived virtual method

At 8.5-11 times slower than a normal instance method execution we find methods marked virtual in a base class.

The virtual keyword is used to modify a method, property, indexer, or event declaration and allow for it to be overridden in a derived class.

Microsoft

Base class with virtual, derived not override

With only 1 level from derived to base the lookup is the fastest among the derived calls to virtual. From the ASM we see that it does two extra jumps by following lookups, compared to the same call without virtual.

IL

ASM


Base class with virtual, derived override / exact type

We see the same pattern for all of these. They are generally not inlined and they have two extra jumps for the lookup. But if the type is specified directly, so that there can’t be any other implementation behind the variable, the compiler is able to inline the method body. This is the case for the override example, for instance.

The same setup, with variable type set to the base gives same result but without the inlining. (IL/ASM not shown here as it would be a bit redundant.)

IL

ASM

Rank 8-11: Interfaces

At 16-17 times slower than a normal instance method execution we find methods accessed through interfaces.

An interface contains only the signatures of methods, properties, events or indexers. A class or struct that implements the interface must implement the members of the interface that are specified in the interface definition.

Microsoft

Method defined in interface

All of the calls to an interface method look the same regardless of how many levels of interface they have to go through. This seems to be because the compiler can infer which interface has the method and call it directly. Therefore there is no additional overhead from having multiple interface levels between the class and the interface that defines the method.

IL

ASM

Rank 12: Lambda

17 times slower than a normal instance method execution. Lambda has the overhead of executing System.Func<T>.Invoke(). This gives us two extra instructions for calculating the target address.

A lambda expression is an anonymous function that you can use to create delegates or expression tree types. By using lambda expressions, you can write local functions that can be passed as arguments or returned as the value of function calls.

Microsoft

IL

ASM

Rank 13: Delegate

17 times slower than a normal instance method execution. Although the IL differs, the final ASM looks the same as with lambda.

A delegate is a type that represents references to methods with a particular parameter list and return type. When you instantiate a delegate, you can associate its instance with any method with a compatible signature and return type. You can invoke (or call) the method through the delegate instance.

Microsoft

IL

ASM

Rank 14: Expression tree

114 times slower than a normal instance method execution. The call to execute the expression tree is as expected the same as with lambda, a call to System.Func<T>.Invoke(). But the execution happening behind the lambda is considerably slower. I haven’t dug into the ASM here so I’ll leave it at that for now.
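For reference, a minimal sketch of how such a call can be built with System.Linq.Expressions. The names (ExpressionTarget, ExpressionTreeDemo) are hypothetical, and this is not necessarily how the benchmark constructs its tree:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical type for illustration only.
public class ExpressionTarget
{
    public int GetValue() => 42;
}

public static class ExpressionTreeDemo
{
    // Builds and compiles the equivalent of: () => target.GetValue()
    public static Func<int> BuildCall(ExpressionTarget target)
    {
        // The instance the method will be called on, captured as a constant.
        var instance = Expression.Constant(target);

        // A node representing the call instance.GetValue().
        var call = Expression.Call(instance, typeof(ExpressionTarget).GetMethod("GetValue"));

        // Wrap the call in a parameterless lambda and compile it to a delegate.
        return Expression.Lambda<Func<int>>(call).Compile();
    }
}
```

Compile() is the expensive step, so the resulting Func<int> should be built once and reused; each later call goes through Func<int>.Invoke().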

Rank 15: Dynamic

180 times slower than a normal instance method execution.

Dynamic causes 24 bytes of memory allocation that the GC has to handle, which I suspect comes from boxing the return value (int) as an Object. It is 24 bytes because on x64 every object carries 16 bytes of overhead (an 8-byte object header plus an 8-byte method table pointer), the int adds 4 bytes, and the total is rounded up to the 24-byte minimum object size. More details in Jon Skeet’s blog post on memory and strings.

C# 4 introduces a new type, dynamic. The type is a static type, but an object of type dynamic bypasses static type checking.

Microsoft

IL

ASM

Rank 16: Reflection

A whopping 2000 times slower than a normal instance method execution, and close to 20 000 times slower than the fastest call in the table. This is despite the fact that we have cached the MethodInfo prior to test execution.

Reflection provides objects (of type Type) that describe assemblies, modules and types. You can use reflection to dynamically create an instance of a type, bind the type to an existing object, or get the type from an existing object and invoke its methods or access its fields and properties.

Microsoft

Reflection causes 24 bytes of memory allocation that the GC has to handle, which I suspect comes from boxing the return value (int) as an Object. It is 24 bytes because on x64 every object carries 16 bytes of overhead (an 8-byte object header plus an 8-byte method table pointer), the int adds 4 bytes, and the total is rounded up to the 24-byte minimum object size. More details in Jon Skeet’s blog post on memory and strings.

IL

ASM

Summary

The time it takes to execute most of these techniques is well within negligible time frames. In fact, the times are so small that it is difficult to measure them accurately. Your results may vary from mine.

The tests have been executed multiple times, and in multiple rounds, always giving the same result within a small margin of error.

What we learned

  • There are certain ways of execution that are very slow, with Reflection coming in at a clear last place. It is worth noting that all of the “losers” have other strengths.
  • Dynamic and Reflection can also cause memory allocations that GC has to handle.
  • Static methods, instance methods and non-virtual base class methods are candidates for inlining. In some other scenarios it is not possible for the compiler to consider inlining.
