Cost of method wrapper

Introduction

What happens if a method is just a wrapper for another method? Is the extra jump optimized away by the compiler? Does it take much time? With the different compilers, JITs and runtimes available, I thought it would be fun to look into this and measure a bit.

I’ll use a == operator implementation calling IEquatable<T>.Equals(T other) for testing. A good practice when creating structs is to implement Object.Equals, GetHashCode(), IEquatable<T>, op_Equality (== operator) and op_Inequality (!= operator). (Read more on Microsoft docs.) Since Object.Equals(object), Equals(T other), op_Equality and op_Inequality all implement more or less the same logic, I figured one could just call the other. So what’s the cost?
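To make the "one calls the other" idea concrete, here is a minimal sketch of a struct where every equality member forwards to Equals(T other). The type name Vec3, its fields and the hash formula are assumptions made up for illustration; they are not taken from the benchmark code.

```csharp
using System;

// Hypothetical example type: all equality members delegate to Equals(Vec3).
public struct Vec3 : IEquatable<Vec3>
{
    public readonly int X, Y, Z;

    public Vec3(int x, int y, int z) { X = x; Y = y; Z = z; }

    // The single place where the comparison logic lives.
    public bool Equals(Vec3 other) => X == other.X && Y == other.Y && Z == other.Z;

    // Object.Equals forwards to the typed overload after a type check.
    public override bool Equals(object obj) => obj is Vec3 other && Equals(other);

    // Simple combining hash; the constant 397 is an arbitrary choice.
    public override int GetHashCode()
    {
        unchecked { return (X * 397 ^ Y) * 397 ^ Z; }
    }

    // Both operators are thin wrappers around Equals(Vec3).
    public static bool operator ==(Vec3 left, Vec3 right) => left.Equals(right);
    public static bool operator !=(Vec3 left, Vec3 right) => !left.Equals(right);
}

public static class Demo
{
    public static void Main()
    {
        var a = new Vec3(0, 0, 1);
        var b = new Vec3(0, 0, 2);
        Console.WriteLine(a == b);                                // False
        Console.WriteLine(a != b);                                // True
        Console.WriteLine(a.Equals((object)new Vec3(0, 0, 1)));   // True
    }
}
```

The upside of this layout is that the comparison logic exists exactly once; the question for the rest of the post is what the forwarding costs.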

Note that this is not for optimization. The cost we are talking about here is negligible compared to the rest of your code, so this is purely for fun.

Nor is this an attempt to measure the cost of an additional JMP instruction, which is well documented and varies by scenario.

Test setup

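using System;
using BenchmarkDotNet.Attributes;

// BigJobConfig is the author's job configuration; it is not shown in this post.
[Config(typeof(BigJobConfig))]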
public class OpEqualsTest
{
    public OpEqualsDirect OpEqualsDirect1 = new OpEqualsDirect() { z = 1 };
    public OpEqualsDirect OpEqualsDirect2 = new OpEqualsDirect() { z = 2 };
    public OpEqualsIndirect OpEqualsIndirect1 = new OpEqualsIndirect() { z = 1 };
    public OpEqualsIndirect OpEqualsIndirect2 = new OpEqualsIndirect() { z = 2 };

    public int Count = 0;

    [GlobalSetup()]
    public void GlobalSetup()
    {
        Count = 0;
    }

    [GlobalCleanup()]
    public void GlobalCleanup()
    {
        Console.WriteLine(Count);
    }

    [Benchmark(Baseline = true, Description = "Direct op_equals")]
    public void OpEqualsDirect()
    {
        if (OpEqualsDirect1 == OpEqualsDirect2)
            Count++;
    }

    [Benchmark(Baseline = false, Description = "Indirect op_equals")]
    public void OpEqualsIndirect()
    {
        if (OpEqualsIndirect1 == OpEqualsIndirect2)
            Count++;
    }
}

The fields are public, and Count is printed after the run, because I suspected RyuJIT was being too smart and optimizing the comparison away.

OpEqualsDirect

public struct OpEqualsDirect : IEquatable<OpEqualsDirect>
{
    // Public so the benchmark class can set z through an object initializer.
    public int x;
    public int y;
    public int z;

    public bool Equals(OpEqualsDirect other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    public static bool operator ==(OpEqualsDirect o1, OpEqualsDirect other)
    {
        return o1.x == other.x && o1.y == other.y && o1.z == other.z;
    }
    public static bool operator !=(OpEqualsDirect o1, OpEqualsDirect other)
    {
        return !(o1.x == other.x && o1.y == other.y && o1.z == other.z);
    }
}

OpEqualsIndirect

public struct OpEqualsIndirect : IEquatable<OpEqualsIndirect>
{
    // Public so the benchmark class can set z through an object initializer.
    public int x;
    public int y;
    public int z;

    public bool Equals(OpEqualsIndirect other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    public static bool operator ==(OpEqualsIndirect o1, OpEqualsIndirect other)
    {
        return o1.Equals(other);
    }
    public static bool operator !=(OpEqualsIndirect o1, OpEqualsIndirect other)
    {
        return !o1.Equals(other);
    }
}

Decompiled

OpEqualsDirect

This one is pretty much as we would expect.

Bytecode hex

B6027B07000004037B07000004331D027B08000004037B08000004330F027B09000004037B09000004FE012A162A
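A dump like this can be approximated with reflection. The following is a sketch under assumptions: the struct Sample and the helper IlDump.Hex are hypothetical names, and GetILAsByteArray returns only the instruction bytes, so the leading method header byte seen in the dump above is not included. The exact bytes depend on compiler settings and metadata tokens, so no expected output is given.

```csharp
using System;
using System.Reflection;

// Hypothetical struct with a direct comparison in its == operator.
public struct Sample
{
    public int X;

    public override bool Equals(object obj) => obj is Sample s && s.X == X;
    public override int GetHashCode() => X;

    public static bool operator ==(Sample a, Sample b) => a.X == b.X;
    public static bool operator !=(Sample a, Sample b) => a.X != b.X;
}

public static class IlDump
{
    // Returns the IL instruction bytes of a method as an uppercase hex string.
    public static string Hex(MethodInfo method)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        return BitConverter.ToString(il).Replace("-", "");
    }

    public static void Main()
    {
        // Operators compile to public static methods named op_Equality / op_Inequality.
        MethodInfo op = typeof(Sample).GetMethod("op_Equality");
        Console.WriteLine(Hex(op));
    }
}
```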

IL

.method public hidebysig specialname static 
    bool op_Equality (
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsDirect o1,
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsDirect other
    ) cil managed 
{
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::x
    IL_0006: ldarg.1
    IL_0007: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::x
    IL_000C: bne.un.s  IL_002B

    IL_000E: ldarg.0
    IL_000F: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::y
    IL_0014: ldarg.1
    IL_0015: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::y
    IL_001A: bne.un.s  IL_002B

    IL_001C: ldarg.0
    IL_001D: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::z
    IL_0022: ldarg.1
    IL_0023: ldfld     int32 Tedd.BenchmarkRunner.Cases.OpEqualsDirect::z
    IL_0028: ceq
    IL_002A: ret

    IL_002B: ldc.i4.0
    IL_002C: ret
} // end of method OpEqualsDirect::op_Equality

C#

public static bool operator ==(OpEqualsDirect o1, OpEqualsDirect other)
{
    return o1.x == other.x && o1.y == other.y && o1.z == other.z;
}

OpEqualsIndirect

My first question was whether the extra jump would be optimized away. Decompiling the method on its own cannot answer that, but for reference we can see that it loads the address of the first argument and the value of the second, then calls Equals on the struct instance. Pretty much as expected.

Bytecode hex

260F0003280F0000062A
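The first byte of each dump (B6 for OpEqualsDirect, 26 here) is not an opcode but the method header. In the tiny header format defined by ECMA-335, the two low bits are 10 and the upper six bits hold the code size in bytes. A small sketch that decodes it, with TinyHeader.CodeSize being a helper name invented for this example:

```csharp
using System;

public static class TinyHeader
{
    // Decodes an ECMA-335 tiny method header: low two bits == 0x2, upper six bits == code size.
    public static int CodeSize(byte header)
    {
        if ((header & 0x3) != 0x2)
            throw new ArgumentException("Not a tiny-format method header.", nameof(header));
        return header >> 2;
    }

    public static void Main()
    {
        Console.WriteLine(CodeSize(0xB6)); // 45: direct op_Equality, IL_0000 through IL_002C
        Console.WriteLine(CodeSize(0x26)); // 9:  indirect op_Equality, IL_0000 through IL_0008
    }
}
```

So the wrapper body is 9 bytes of IL against 45 bytes for the direct comparison, which matches the offsets in the IL listings.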

IL

.method public hidebysig specialname static 
    bool op_Equality (
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect o1,
        valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect other
    ) cil managed 
{
    .maxstack 8

    IL_0000: ldarga.s  o1
    IL_0002: ldarg.1
    IL_0003: call      instance bool Tedd.BenchmarkRunner.Cases.OpEqualsIndirect::Equals(valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect)
    IL_0008: ret
} // end of method OpEqualsIndirect::op_Equality

C#

public static bool operator ==(OpEqualsIndirect o1, OpEqualsIndirect other)
{
    return o1.Equals(other);
}

Caller

So how about the caller? Is the call optimized away there?

.method public hidebysig 
    instance void OpEqualsIndirect () cil managed 
{
    .custom instance void [BenchmarkDotNet.Core]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor() = (
        01 00 02 00 54 02 08 42 61 73 65 6c 69 6e 65 00
        54 0e 0b 44 65 73 63 72 69 70 74 69 6f 6e 12 49
        6e 64 69 72 65 63 74 20 6f 70 5f 65 71 75 61 6c 73
    )
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldfld     valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect Tedd.BenchmarkRunner.Tests.OpEqualsTest::_opEqualsIndirect1
    IL_0006: ldarg.0
    IL_0007: ldfld     valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect Tedd.BenchmarkRunner.Tests.OpEqualsTest::_opEqualsIndirect2
    IL_000C: call      bool Tedd.BenchmarkRunner.Cases.OpEqualsIndirect::op_Equality(valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect, valuetype Tedd.BenchmarkRunner.Cases.OpEqualsIndirect)
    IL_0011: brfalse.s IL_0021

    IL_0013: ldarg.0
    IL_0014: ldarg.0
    IL_0015: ldfld     int32 Tedd.BenchmarkRunner.Tests.OpEqualsTest::Count
    IL_001A: ldc.i4.1
    IL_001B: add
    IL_001C: stfld     int32 Tedd.BenchmarkRunner.Tests.OpEqualsTest::Count

    IL_0021: ret
} // end of method OpEqualsTest::OpEqualsIndirect

No. The benchmark method calls op_Equality, which in turn calls IEquatable<T>.Equals(T other).

Benchmark

All of this is before the JIT gets involved. So let’s see how it performs with some test runs.

For the test I’m doing a lightweight operation: comparing three ints, where the comparison fails on the third.

BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.125)
Intel Core i7-3930K CPU 3.20GHz (Ivy Bridge), 1 CPU, 12 logical cores and 6 physical cores
Frequency=3124843 Hz, Resolution=320.0161 ns, Timer=TSC
  [Host]         : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0  [AttachedDebugger]
  LegacyJit-Mono : Mono 5.4.1 (Visual Studio), 64bit 
  Llvm-Mono      : Mono 5.4.1 (Visual Studio), 64bit 
  RyuJit-Clr     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  RyuJit-Mono    : Mono 5.4.1 (Visual Studio), 64bit 

MinIterationTime=10.0000 s  Platform=X64  LaunchCount=1  
TargetCount=3  WarmupCount=3  
Method               | Job            | Jit       | Runtime  | Mean       | Error     | StdDev    | Scaled | ScaledSD | Allocated
-------------------- | -------------- | --------- | -------- | ---------- | --------- | --------- | ------ | -------- | ---------
'Direct op_equals'   | LegacyJit-Mono | LegacyJit | Mono x64 | 13.4742 ns | 1.0661 ns | 0.0602 ns | 1.00   | 0.00     | N/A
'Indirect op_equals' | LegacyJit-Mono | LegacyJit | Mono x64 | 15.5428 ns | 6.9294 ns | 0.3915 ns | 1.15   | 0.02     | N/A
'Direct op_equals'   | Llvm-Mono      | Llvm      | Mono x64 | 13.4156 ns | 5.5125 ns | 0.3115 ns | 1.00   | 0.00     | N/A
'Indirect op_equals' | Llvm-Mono      | Llvm      | Mono x64 | 15.9306 ns | 8.7020 ns | 0.4917 ns | 1.19   | 0.04     | N/A
'Direct op_equals'   | RyuJit-Clr     | RyuJit    | Clr      |  0.9740 ns | 0.8871 ns | 0.0501 ns | 1.00   | 0.00     | 0 B
'Indirect op_equals' | RyuJit-Clr     | RyuJit    | Clr      |  1.1444 ns | 1.1916 ns | 0.0673 ns | 1.18   | 0.07     | 0 B
'Direct op_equals'   | RyuJit-Mono    | RyuJit    | Mono x64 | 14.6879 ns | 4.9166 ns | 0.2778 ns | 1.00   | 0.00     | N/A
'Indirect op_equals' | RyuJit-Mono    | RyuJit    | Mono x64 | 15.8684 ns | 4.6367 ns | 0.2620 ns | 1.08   | 0.02     | N/A

 

Result

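For the most part we see a penalty of 8% to 19% in this simple test scenario. None of the compilers or JITs appears to optimize away the jump. Bear in mind that with only three measured iterations per job, the Error on the indirect measurement exceeds the gap between the two means in every job, so the percentages are indicative at best. RyuJIT on the CLR is clearly doing some black (register?) magic, though: it still shows a relative overhead of 18%, but it is more than ten times faster than the other runtimes.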
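If the wrapper ever did matter in a hot path, one thing to try would be hinting the JIT to inline it. The sketch below is an assumption-laden variant of OpEqualsIndirect: the struct name InlinedEquals is made up, and MethodImplOptions.AggressiveInlining is only a request that a given JIT may ignore. It was not part of the benchmark above, so no claim is made about its effect.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical variant of OpEqualsIndirect with inlining hints on the wrappers.
public struct InlinedEquals : IEquatable<InlinedEquals>
{
    public int x;
    public int y;
    public int z;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(InlinedEquals other)
    {
        return x == other.x && y == other.y && z == other.z;
    }

    public override bool Equals(object obj) => obj is InlinedEquals o && Equals(o);

    public override int GetHashCode()
    {
        unchecked { return (x * 397 ^ y) * 397 ^ z; }
    }

    // The operators still forward to Equals; the attribute asks the JIT to fold the call.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator ==(InlinedEquals o1, InlinedEquals other) => o1.Equals(other);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool operator !=(InlinedEquals o1, InlinedEquals other) => !o1.Equals(other);
}
```

Whether this changes anything would have to be measured per runtime, using the same benchmark setup as above.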
