Introduction
What happens if a method is just a wrapper for another method? Is the extra jump optimized away by the compiler? Does it take much time? With the different compilers, JITs and runtimes around, I thought it would be fun to look into this and measure a bit.
I'll use an == operator implementation that calls IEquatable<T>.Equals(T other) for testing. A good practice when creating structs is to implement Object.Equals, GetHashCode(), IEquatable<T>, op_Equality (the == operator) and op_Inequality (the != operator). (Read more on Microsoft docs.) Since Object.Equals(object), Equals(T other), op_Equality and op_Inequality all implement more or less the same logic, I figured one could just call the other. So what's the cost?
Note that this is not about optimization. The cost we are talking about here is negligible compared to the rest of your code, so this is purely for fun.
Nor is this an attempt to measure the cost of an additional JMP, which is well documented and varies depending on the scenario.
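To make the list of equality members above concrete, here is a minimal sketch of a struct where everything forwards to the typed Equals. Point3 is a hypothetical type made up for illustration (it is not one of the benchmarked structs), and the hash code mixing is just one common pattern, not a recommendation.

```csharp
using System;

// Hypothetical example type; not part of the benchmark.
public struct Point3 : IEquatable<Point3>
{
    private readonly int x;
    private readonly int y;
    private readonly int z;

    public Point3(int x, int y, int z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // The single place where the comparison logic lives.
    public bool Equals(Point3 other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    // Object.Equals forwards to the typed overload after a type check.
    public override bool Equals(object obj)
    {
        return obj is Point3 other && Equals(other);
    }

    // Equal values must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (((x * 397) ^ y) * 397) ^ z;
        }
    }

    // Both operators are thin wrappers around Equals(Point3).
    public static bool operator ==(Point3 left, Point3 right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(Point3 left, Point3 right)
    {
        return !left.Equals(right);
    }
}
```

The operators here are exactly the kind of one-line wrapper whose cost the rest of this post looks at.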
Test setup
```csharp
using System;
using BenchmarkDotNet.Attributes;

// BigJobConfig is the job configuration used for this run (not shown here).
[Config(typeof(BigJobConfig))]
public class OpEqualsTest
{
    // The operands differ only in z, so the comparison fails on the third field.
    public OpEqualsDirect OpEqualsDirect1 = new OpEqualsDirect() { z = 1 };
    public OpEqualsDirect OpEqualsDirect2 = new OpEqualsDirect() { z = 2 };
    public OpEqualsIndirect OpEqualsIndirect1 = new OpEqualsIndirect() { z = 1 };
    public OpEqualsIndirect OpEqualsIndirect2 = new OpEqualsIndirect() { z = 2 };
    public int Count = 0;

    [GlobalSetup()]
    public void GlobalSetup()
    {
        Count = 0;
    }

    [GlobalCleanup()]
    public void GlobalCleanup()
    {
        // Use Count after the run so the comparison result is observable.
        Console.WriteLine(Count);
    }

    [Benchmark(Baseline = true, Description = "Direct op_equals")]
    public void OpEqualsDirect()
    {
        if (OpEqualsDirect1 == OpEqualsDirect2)
            Count++;
    }

    [Benchmark(Baseline = false, Description = "Indirect op_equals")]
    public void OpEqualsIndirect()
    {
        if (OpEqualsIndirect1 == OpEqualsIndirect2)
            Count++;
    }
}
```
The fields are public and Count is used for something after the run, since I thought I had an issue with RyuJit being too smart.
OpEqualsDirect
```csharp
public struct OpEqualsDirect : IEquatable<OpEqualsDirect>
{
    // Public so the object initializers in the benchmark class ({ z = 1 }) compile.
    public int x;
    public int y;
    public int z;

    public bool Equals(OpEqualsDirect other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    // The operators repeat the comparison logic instead of calling Equals.
    public static bool operator ==(OpEqualsDirect o1, OpEqualsDirect other)
    {
        return o1.x == other.x && o1.y == other.y && o1.z == other.z;
    }

    public static bool operator !=(OpEqualsDirect o1, OpEqualsDirect other)
    {
        return !(o1.x == other.x && o1.y == other.y && o1.z == other.z);
    }
}
```
OpEqualsIndirect
```csharp
public struct OpEqualsIndirect : IEquatable<OpEqualsIndirect>
{
    // Public so the object initializers in the benchmark class ({ z = 1 }) compile.
    public int x;
    public int y;
    public int z;

    public bool Equals(OpEqualsIndirect other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    // The operators are thin wrappers that forward to Equals.
    public static bool operator ==(OpEqualsIndirect o1, OpEqualsIndirect other)
    {
        return o1.Equals(other);
    }

    public static bool operator !=(OpEqualsIndirect o1, OpEqualsIndirect other)
    {
        return !o1.Equals(other);
    }
}
```
Decompiled
OpEqualsDirect
This one is pretty much as we would expect.
Bytecode hex
```
B6027B07000004037B07000004331D027B08000004037B08000004330F027B09000004037B09000004FE012A162A
```
IL
```
.method public hidebysig specialname static
    bool op_Equality (
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsDirect o1,
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsDirect other
    ) cil managed
{
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::x
    IL_0006: ldarg.1
    IL_0007: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::x
    IL_000C: bne.un.s IL_002B

    IL_000E: ldarg.0
    IL_000F: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::y
    IL_0014: ldarg.1
    IL_0015: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::y
    IL_001A: bne.un.s IL_002B

    IL_001C: ldarg.0
    IL_001D: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::z
    IL_0022: ldarg.1
    IL_0023: ldfld int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::z
    IL_0028: ceq
    IL_002A: ret

    IL_002B: ldc.i4.0
    IL_002C: ret
} // end of method OpEqualsDirect::op_Equality
```
C#
```csharp
public static bool operator ==(OpEqualsDirect o1, OpEqualsDirect other)
{
    return o1.x == other.x && o1.y == other.y && o1.z == other.z;
}
```
OpEqualsIndirect
My first question was whether the extra jump would be optimized away. I can't tell that from decompiling the method alone, but for reference we can see that it loads the address of the first argument, loads the second argument, and calls Equals on the struct instance. Pretty much as expected.
Bytecode hex
```
260F0003280F0000062A
```
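Both hex dumps start with a one-byte method header rather than an opcode. Assuming the ECMA-335 "tiny" method body format (low two bits set to binary 10, upper six bits holding the code size in bytes), the first byte can be decoded as in the sketch below. TinyHeader is a hypothetical helper name used only for illustration.

```csharp
using System;

// Hypothetical helper for reading the first byte of a method body dump.
public static class TinyHeader
{
    // Returns the IL code size encoded in a tiny method header,
    // or -1 if the first byte does not mark the tiny format.
    public static int CodeSize(string hex)
    {
        byte first = Convert.ToByte(hex.Substring(0, 2), 16);

        // Tiny format: the low two bits are binary 10.
        bool isTiny = (first & 0x3) == 0x2;

        // The remaining six bits hold the number of IL bytes that follow.
        return isTiny ? first >> 2 : -1;
    }

    // Number of bytes in the dump after the one-byte header.
    public static int BytesAfterHeader(string hex)
    {
        return hex.Length / 2 - 1;
    }
}
```

For the two dumps in this post the header bytes are B6 and 26, which decode to 45 and 9 bytes of IL. That matches the IL listings: the direct version ends at IL_002C and the indirect one at IL_0008.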
IL
```
.method public hidebysig specialname static
    bool op_Equality (
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect o1,
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect other
    ) cil managed
{
    .maxstack 8

    IL_0000: ldarga.s o1
    IL_0002: ldarg.1
    IL_0003: call instance bool Tedd.BenchmarkRunner.Cases.OpEqualsIndirect::Equals(valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect)
    IL_0008: ret
} // end of method OpEqualsIndirect::op_Equality
```
C#
```csharp
public static bool operator ==(OpEqualsIndirect o1, OpEqualsIndirect other)
{
    return o1.Equals(other);
}
```
Caller
So how about the caller, the benchmark method that uses the == operator? Is the call optimized away there?
```
.method public hidebysig
    instance void OpEqualsIndirect () cil managed
{
    .custom instance void [BenchmarkDotNet.Core]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor() = (
        01 00 02 00 54 02 08 42 61 73 65 6c 69 6e 65 00
        54 0e 0b 44 65 73 63 72 69 70 74 69 6f 6e 12 49
        6e 64 69 72 65 63 74 20 6f 70 5f 65 71 75 61 6c
        73
    )
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldfld valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect Tedd.BenchmarkRunner.Tests.OpEqualsTest::_opEqualsIndirect1
    IL_0006: ldarg.0
    IL_0007: ldfld valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect Tedd.BenchmarkRunner.Tests.OpEqualsTest::_opEqualsIndirect2
    IL_000C: call bool Tedd.BenchmarkRunner.Cases.OpEqualsIndirect::op_Equality(valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect, valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect)
    IL_0011: brfalse.s IL_0021

    IL_0013: ldarg.0
    IL_0014: ldarg.0
    IL_0015: ldfld int32 Tedd.BenchmarkRunner.Tests.OpEqualsTest::Count
    IL_001A: ldc.i4.1
    IL_001B: add
    IL_001C: stfld int32 Tedd.BenchmarkRunner.Tests.OpEqualsTest::Count

    IL_0021: ret
} // end of method OpEqualsTest::OpEqualsIndirect
```
No, it calls op_Equality, which in turn calls IEquatable<T>.Equals(T other).
Benchmark
All of this is before the JIT gets involved, so let's see how it performs in some test runs.
For the test I'm doing a lightweight operation: comparing three ints, where the comparison fails on the third.
```
BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.125)
Intel Core i7-3930K CPU 3.20GHz (Ivy Bridge), 1 CPU, 12 logical cores and 6 physical cores
Frequency=3124843 Hz, Resolution=320.0161 ns, Timer=TSC
  [Host]         : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0 [AttachedDebugger]
  LegacyJit-Mono : Mono 5.4.1 (Visual Studio), 64bit
  Llvm-Mono      : Mono 5.4.1 (Visual Studio), 64bit
  RyuJit-Clr     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  RyuJit-Mono    : Mono 5.4.1 (Visual Studio), 64bit

MinIterationTime=10.0000 s  Platform=X64  LaunchCount=1
TargetCount=3  WarmupCount=3
```
Method | Job | Jit | Runtime | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
---|---|---|---|---|---|---|---|---|---|
‘Direct op_equals’ | LegacyJit-Mono | LegacyJit | Mono x64 | 13.4742 ns | 1.0661 ns | 0.0602 ns | 1.00 | 0.00 | N/A |
‘Indirect op_equals’ | LegacyJit-Mono | LegacyJit | Mono x64 | 15.5428 ns | 6.9294 ns | 0.3915 ns | 1.15 | 0.02 | N/A |
‘Direct op_equals’ | Llvm-Mono | Llvm | Mono x64 | 13.4156 ns | 5.5125 ns | 0.3115 ns | 1.00 | 0.00 | N/A |
‘Indirect op_equals’ | Llvm-Mono | Llvm | Mono x64 | 15.9306 ns | 8.7020 ns | 0.4917 ns | 1.19 | 0.04 | N/A |
‘Direct op_equals’ | RyuJit-Clr | RyuJit | Clr | 0.9740 ns | 0.8871 ns | 0.0501 ns | 1.00 | 0.00 | 0 B |
‘Indirect op_equals’ | RyuJit-Clr | RyuJit | Clr | 1.1444 ns | 1.1916 ns | 0.0673 ns | 1.18 | 0.07 | 0 B |
‘Direct op_equals’ | RyuJit-Mono | RyuJit | Mono x64 | 14.6879 ns | 4.9166 ns | 0.2778 ns | 1.00 | 0.00 | N/A |
‘Indirect op_equals’ | RyuJit-Mono | RyuJit | Mono x64 | 15.8684 ns | 4.6367 ns | 0.2620 ns | 1.08 | 0.02 | N/A |
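
The Scaled column is roughly the mean of the indirect version divided by the mean of the direct (baseline) version for the same job. As a sketch of that arithmetic, using a hypothetical helper class and the Mean values copied from the table:

```csharp
using System;

// Hypothetical helper; only illustrates the arithmetic behind the Scaled column.
public static class ScaledColumn
{
    // Ratio of the indirect mean to the direct (baseline) mean.
    public static double Scaled(double directNs, double indirectNs)
    {
        return indirectNs / directNs;
    }

    // The same ratio expressed as a percentage overhead.
    public static double OverheadPercent(double directNs, double indirectNs)
    {
        return (Scaled(directNs, indirectNs) - 1.0) * 100.0;
    }
}
```

For example, ScaledColumn.Scaled(13.4742, 15.5428) is about 1.15 for LegacyJit-Mono and ScaledColumn.Scaled(14.6879, 15.8684) is about 1.08 for RyuJit-Mono. For RyuJit-Clr the plain ratio of means comes out at about 1.17 while the table reports 1.18, so the tool's own figure can differ in the last digit, presumably because of how it aggregates the individual measurements.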
Result
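For the most part we see a penalty of 8% to 19% in our simple test scenario. Judging by these numbers, none of the compilers or JITs optimize away the jump. However, RyuJit on CLR is doing some black (register?) magic here: it still has a relative overhead of 18%, but it is much faster than the other runtimes in absolute terms. Keep in mind that with only three measured iterations per job (TargetCount=3) the Error column is large compared to the differences, so the percentages are rough.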
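If one wanted to nudge the JIT towards removing the wrapper call, .NET has MethodImplOptions.AggressiveInlining, which is a hint that the JIT may or may not follow. The sketch below shows where the attribute would go. It was not part of the measurements in this post, and OpEqualsInlined is a hypothetical variant of the indirect struct made up for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical variant of OpEqualsIndirect; not benchmarked in this post.
public struct OpEqualsInlined : IEquatable<OpEqualsInlined>
{
    public int x;
    public int y;
    public int z;

    // Ask the JIT to inline the comparison into its callers if it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(OpEqualsInlined other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    public override bool Equals(object obj)
    {
        return obj is OpEqualsInlined other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (((x * 397) ^ y) * 397) ^ z;
        }
    }

    // The wrappers carry the same hint, so the whole chain is a candidate for inlining.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator ==(OpEqualsInlined o1, OpEqualsInlined other)
    {
        return o1.Equals(other);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator !=(OpEqualsInlined o1, OpEqualsInlined other)
    {
        return !o1.Equals(other);
    }
}
```

Whether this changes anything would have to be measured per JIT and runtime, in the same way as the two variants above.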