Cost of method wrapper

Introduction

What happens if a method is just a wrapper for another method? Is the extra jump optimized away by the compiler? Does it cost much time? I thought I'd look into this and measure a bit. With the different compilers, JITs and runtimes involved, I thought it would be fun to see what happens.

I’ll use an == operator implementation calling IEquatable<T>.Equals(T other) for testing. A good practice when creating structs is to implement Object.Equals, GetHashCode(), IEquatable<T>, op_Equality (the == operator) and op_Inequality (the != operator). (Read more on Microsoft Docs.) Since Object.Equals(object), Equals(T other), op_Equality and op_Inequality all more or less implement the same logic, I figured one could just call the other. So what’s the cost?

Note that this is not about optimization. The cost we are talking about here is negligible compared to the rest of your code, so this is purely for fun.

And this is not an attempt to measure the cost of an additional JMP, which is well documented and varies depending on the scenario.

Test setup

The benchmark uses public fields, and Count is read for something after the run, since I thought I had an issue with RyuJit being too smart and optimizing the work away.
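The original listing did not survive here. The table below shows the runs were done with BenchmarkDotNet, but as a self-contained sketch, the shape of the setup was roughly this: public fields so the JIT cannot treat the inputs as constants, and Count read after the run so the comparison cannot be eliminated. The struct, its fields and all names here are my assumptions, not the original code.

```csharp
using System;
using System.Diagnostics;

public struct Vec3 : IEquatable<Vec3>
{
    public int X, Y, Z;
    public Vec3(int x, int y, int z) { X = x; Y = y; Z = z; }
    public bool Equals(Vec3 other) => X == other.X && Y == other.Y && Z == other.Z;
    public override bool Equals(object obj) => obj is Vec3 o && Equals(o);
    public override int GetHashCode() => X ^ Y ^ Z;
    public static bool operator ==(Vec3 a, Vec3 b) => a.Equals(b);
    public static bool operator !=(Vec3 a, Vec3 b) => !(a == b);
}

public class Harness
{
    // Public fields: keeps the JIT from assuming the values are constant.
    public Vec3 A = new Vec3(1, 2, 3);
    public Vec3 B = new Vec3(1, 2, 4); // differs on the third int
    public int Count;

    public TimeSpan Run(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            if (A == B)
                Count++;
        sw.Stop();
        // Count is used after the run so the loop body can't be optimized away.
        Console.WriteLine($"Count = {Count}");
        return sw.Elapsed;
    }
}
```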

OpEqualsDirect
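The listing is missing here; presumably the direct variant is a struct whose == operator repeats the comparison inline rather than calling Equals. A sketch under that assumption (struct name and fields are my guesses):

```csharp
using System;

public struct Vec3Direct : IEquatable<Vec3Direct>
{
    public int X, Y, Z;
    public Vec3Direct(int x, int y, int z) { X = x; Y = y; Z = z; }

    public bool Equals(Vec3Direct other) => X == other.X && Y == other.Y && Z == other.Z;
    public override bool Equals(object obj) => obj is Vec3Direct o && Equals(o);
    public override int GetHashCode() => X ^ Y ^ Z;

    // Direct: the operator duplicates the comparison logic instead of calling Equals.
    public static bool operator ==(Vec3Direct a, Vec3Direct b)
        => a.X == b.X && a.Y == b.Y && a.Z == b.Z;
    public static bool operator !=(Vec3Direct a, Vec3Direct b) => !(a == b);
}
```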

OpEqualsIndirect
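Again the listing is missing; the indirect variant is presumably identical except that the == operator forwards to IEquatable<T>.Equals, which is the wrapper whose cost we are measuring. A sketch under that assumption (names are my guesses):

```csharp
using System;

public struct Vec3Indirect : IEquatable<Vec3Indirect>
{
    public int X, Y, Z;
    public Vec3Indirect(int x, int y, int z) { X = x; Y = y; Z = z; }

    public bool Equals(Vec3Indirect other) => X == other.X && Y == other.Y && Z == other.Z;
    public override bool Equals(object obj) => obj is Vec3Indirect o && Equals(o);
    public override int GetHashCode() => X ^ Y ^ Z;

    // Indirect: the operator is just a wrapper around Equals.
    public static bool operator ==(Vec3Indirect a, Vec3Indirect b) => a.Equals(b);
    public static bool operator !=(Vec3Indirect a, Vec3Indirect b) => !(a == b);
}
```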

Decompiled

OpEqualsDirect

This one is pretty much as we would expect.

Bytecode hex

IL
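The IL listing did not survive extraction. For an == operator that compares three int fields inline, the compiled body would look roughly like this sketch (type, field and label names are placeholders, not the original output):

```
.method public hidebysig specialname static bool
        op_Equality(valuetype MyStruct left, valuetype MyStruct right) cil managed
{
    ldarg.0
    ldfld      int32 MyStruct::X
    ldarg.1
    ldfld      int32 MyStruct::X
    bne.un.s   NotEqual
    // ... same load/compare pattern repeated for Y and Z ...
    ldc.i4.1
    ret
NotEqual:
    ldc.i4.0
    ret
}
```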

C#

OpEqualsIndirect

My first question was whether the extra jump would be optimized away. I can’t tell that from decompiling the method directly, but for reference we can see that it loads the argument and calls Equals on the struct instance. Pretty much as expected.

Bytecode hex

IL
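The IL listing is missing here as well. For an == operator that simply forwards to Equals, the body would look roughly like this sketch (type and parameter names are placeholders): it loads the address of the first argument as the receiver, loads the second argument by value, and calls Equals — matching what the decompilation described.

```
.method public hidebysig specialname static bool
        op_Equality(valuetype MyStruct left, valuetype MyStruct right) cil managed
{
    ldarga.s   left    // address of 'left': the receiver for the instance call
    ldarg.1            // 'right' by value
    call       instance bool MyStruct::Equals(valuetype MyStruct)
    ret
}
```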

C#

Caller

So how about the caller? Is the extra call optimized away there?

No, it is calling op_Equality, which in turn calls IEquatable<T>.Equals(T other).

Benchmark

All of this is before the JIT has done its work. So let’s see how it performs with some test runs.

For the test I’m doing a lightweight operation comparing three ints, where the comparison fails on the third.

| Method | Job | Jit | Runtime | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
|--------|-----|-----|---------|-----:|------:|-------:|-------:|---------:|----------:|
| ‘Direct op_equals’ | LegacyJit-Mono | LegacyJit | Mono x64 | 13.4742 ns | 1.0661 ns | 0.0602 ns | 1.00 | 0.00 | N/A |
| ‘Indirect op_equals’ | LegacyJit-Mono | LegacyJit | Mono x64 | 15.5428 ns | 6.9294 ns | 0.3915 ns | 1.15 | 0.02 | N/A |
| ‘Direct op_equals’ | Llvm-Mono | Llvm | Mono x64 | 13.4156 ns | 5.5125 ns | 0.3115 ns | 1.00 | 0.00 | N/A |
| ‘Indirect op_equals’ | Llvm-Mono | Llvm | Mono x64 | 15.9306 ns | 8.7020 ns | 0.4917 ns | 1.19 | 0.04 | N/A |
| ‘Direct op_equals’ | RyuJit-Clr | RyuJit | Clr | 0.9740 ns | 0.8871 ns | 0.0501 ns | 1.00 | 0.00 | 0 B |
| ‘Indirect op_equals’ | RyuJit-Clr | RyuJit | Clr | 1.1444 ns | 1.1916 ns | 0.0673 ns | 1.18 | 0.07 | 0 B |
| ‘Direct op_equals’ | RyuJit-Mono | RyuJit | Mono x64 | 14.6879 ns | 4.9166 ns | 0.2778 ns | 1.00 | 0.00 | N/A |
| ‘Indirect op_equals’ | RyuJit-Mono | RyuJit | Mono x64 | 15.8684 ns | 4.6367 ns | 0.2620 ns | 1.08 | 0.02 | N/A |


Result

For the most part we can see a penalty of 8% to 19% in our simple test scenario. None of the compilers/JITs optimize away the jump. However, RyuJit on the Clr is doing some black (register?) magic here: it still has a relative overhead of 18%, but it is much faster than the other runtimes.
