Fibres / Microthreading / User space threading

By | 2010-03-20

… yes… even with the cool new advances within multiprocessing we still need more… :)

What we need and what we have

OpenSim needs to be able to:

  1. Compile user scripts, supporting a in a wide variety of languages.
    Implemented
  2. Run them in a secure sandbox as we don’t trust the users or other servers where scripts/binaries may come from.
    Implemented
  3. Schedule thousands of scripts fairly, potentially could all of them run tight while(true)-loops.
    Partially implemented
  4. Be able to save a running program to disk for later to load and resume it.
    Partially implemented
  5. Be able to move a script to a new region (server) while it is running.
    Partially implemented

#3: Since modern computers today don’t have any problem running thousands of scripts we can already run a massive amount of scripts on a single OpenSim server. Currently a running script occupies a whole thread and needs to voluntarily release execution.

#4: Scripts are killed at shutdown and fully restarted at startup. We are not saving any state (afaik).

#5: Scripts can’t be moved successfully between servers while they are running (afaik).

Fiber support

As you may have guessed from the title the solution is to add support for fibers. While threads are controlled and enforced by the operating system kernel, fibers are more like a voluntary threading happening inside the application (in user space).

Most operative systems have some kind of library for fibers, and it used to be a popular way of multitasking. MS-SQL server for example used it for quite some time.

OS fiber implementation

I have done some reading and tests of NT fibers… (Win32 API: ConvertThreadToFiber, CreateFiber, SwitchToFiber, DeleteFiber, GetLastError) Although it is easy to implement and could be made to work I’ve read a few places that it doesn’t work well with .Net. This could probably be made to work as the scripts are performing private executions and will not crash into each other. Though the .Net thread locking/mutex system is useless as it only blocks when attempting to lock something other threads has locked. It does however still require us to modify the script before compile or script assembly before load as we need to voluntarily  release control to next fiber. It does however keep track of execution and parameter stacks for us.

The problem with this implementation is that it only solves #3, it doesn’t solve #4 and #5. It also has to be implemented specifically for each OS. My tests were on Win32 API.

Custom fiber implementation

From what I can see custom fiber implementation is probably the way to go.

How it works

  1. User uploads script (source) to OpenSim region.
  2. Script is compiled to an .Net assembly (*.dll) and saved to disk.
  3. The .Net assembly is opened up and processed (at bytecode/binary level) by injecting fiber code into it. A new .Net assembly is generated and saved, we now have two .Net assemblies (one with and one without fiber support.)
  4. The .Net assembly with fiber code is loaded into a secure AppDomain and executed as normally.
  5. Whenever the server needs to pause the script it sets a global variable like “__ISPAUSING” to true on it. The script will then within a short time pause and release the thread.
  6. Server keeps accounting of what scripts has used much CPU and schedules accordingly to give fair scheduling, for example by calculating a simple 10 second usage score.
  7. The running script can when its paused be serialized, saved to disk (during shutdown) and later reloaded (during startup).
  8. Server can at any time resume script by executing “__RESUME()” method on it.
  9. If script crosses region border it will often be sent to a different server. It is here necessary with three trust levels.
    1. If the neighboring server trusts the server then it can send the already fiberized script assembly. Typically servers with same owner will do this.
    2. If there is no trust then there are two possibilities:
      1. Server sends script source code to neighbor server which compiles the script itself. (safe)
      2. Server sends the original .Net assembly (without fiber implementation) to neighbor and neighbor implements fibers. (unsafe)
        Because there are no good (afaik) bytecode verifiers for Mono and because of unknown factors (is bytecode verifier really checking everything?) it could potentially be a security risk to execute third party assemblies. The reason is simply that the bytecode could be invalid and used to gain access to the server (even though .Net can’t buffer overflow easily it doesn’t mean that the bytecode can’t).

Implementing it

I have been working on this on and off for a month or so now and made great progress. I am using Mono.Cecil to open up the .Net assembly (*.dll), go through everything machine code by machine code and add code at the right places.

The idea is to inject code at the start of each method and before some/all jumps (GOTO-statements.) This code consists of two steps, pausing and resuming script execution. We want to check for pause often time-wise, but not too often CPU-wise. This means that we want to check if script is pausing at all points in the code so that the user can’t make a script that spends 200ms of CPU-time without obeying the pause-signal. But we don’t want to keep checking after every instruction as it will use too much CPU.

I think a reasonable tradeoff is to place the pause-check code at the beginning of every method, just after the method call and before most of the JUMP-instructions (GOTO in assembler.) Every check consists of 10 instructions of which 3 are “Nop” (No operation). It is basically a simple “if (__ISPAUSING) { do_pause(); }” where __ISPAUSING will be False most of the time. We only need to add it around custom local methods (private/internal/public methods inside the script), thus avoiding the overhead of checking 3 times on a simple “string r = s.Replace(“est”, “EST”).ToLower() + intA”. All loops (while, do, for, foreach, etc) are converted to JUMP instructions when compiled, so by adding pause-check around some JUMPS we will get pause-check inside all loops. (Though not all jumps are loops.)

The check will be placed at the top of each method:

  1. If __ISPAUSING is set: Return immediately.
  2. If __ISRESUMING is set:
    1. Jump to location where we were paused last.
    2. Resume execution.
    3. If last pause was not started locally then next jump will be a custom local method. So we feed it with dummy-variables.
      To make implementation easier we could replace all Value method parameters with “object” (do boxing/unboxing) and cast them back at start of method.
  3. For each
    void myMethod() {
    if (__ISRESUMING) {
  4. // Get my current stack
    _currentStack = ScriptStack.Pop();
      switch (
      goto LABEL_PAUSELOC15;
    }

    if (__ISPAUSED) {
      // TODO: Save argument stack
      // TODO: Save local variables (that only exist within the method scope)
      // TODO: Push argument stack, local variables and pointer to current pos to execution stack
      return;
    }
    }

  5. etc… More to come.. I’m tired!

 

I still have at least one problem to overcome…

  1. I need to keep track of the argument stack so that it can be saved and restored into serializable private fields when pausing. Currently it almost works, but I loose track on overloaded methods (method with multiple parameters), for example on “call instance string [mscorlib]System.Int32::ToString()”. The obvious solution here is to compare method signatures with the data on the parameter stack. It seems a bit expensive, but I guess it can be cached fairly well.
  2. Probably more, just too tired to think right now ;)

Proof of concept hardcoded fibers, version 1

Output

myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: True, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() pausing: Stack size: 1, SkipMethod: True, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 2, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 3, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 4, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 5, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 6, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True

ExecuteHardcodedTestThread_Resume

myMethod() resume: Stack size: 5, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 4, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 3, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 2, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 1, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 0, SkipMethod: True, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() returning naturally: Stack size: 0, SkipMethod: True, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: True, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() pausing: Stack size: 1, SkipMethod: True, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 2, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 3, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 4, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 5, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 6, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 7, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 8, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True
myMethod() pausing: Stack size: 9, SkipMethod: False, __ISPAUSING: True, __ISPAUSING_INPROGRESS: True, __ISRESUMING: True

myMethod() resume: Stack size: 8, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 7, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 6, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 5, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 4, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 3, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 2, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 1, SkipMethod: False, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: True
myMethod() resume: Stack size: 0, SkipMethod: True, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() returning naturally: Stack size: 0, SkipMethod: True, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False
myMethod() start: Stack size: 0, SkipMethod: <null>, __ISPAUSING: False, __ISPAUSING_INPROGRESS: False, __ISRESUMING: False

Code

[sourcecode type=”c#”]
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

namespace Tedd.Fibres.TestModule
{
    public class HardcodedTest
    {
        private bool __ISPAUSING;
        private bool __ISPAUSING_INPROGRESS;
        private bool __ISRESUMING
        {
            get
            {
                return __ScriptStack.Count &gt; 0;
            }
        }
        private ScriptStack __ScriptStack = new ScriptStack();
        private class ScriptStack
        {
            private Stack&lt;ScriptStackItem&gt; _Stack = new Stack&lt;ScriptStackItem&gt;();
            public ScriptStackItem Pop()
            {
                return _Stack.Pop();
            }
            public void Push(ScriptStackItem scriptStackItem)
            {
                _Stack.Push(scriptStackItem);
            }

            public int Count
            {
                get
                {
                    return _Stack.Count;
                }
            }
        }
        private class LocalVarObject_17
        {
            internal int VARIABLE1;
            internal string VARIABLE2;
            internal object VARIABLE3;
        }
        private int LOCALVARIABLE1;
        private string LOCALVARIABLE2;
        private object LOCALVARIABLE3;
        private class ScriptStackItem
        {
            public Int16 LastPos;
            public object[] ArgumentStack;
            public object LocalVars;
            public bool SkipMethod;
        }
        private string GetStatus(ScriptStackItem scriptStackItem)
        {
            string sm = "&lt;null&gt;";
            if (scriptStackItem != null)
                sm = scriptStackItem.SkipMethod.ToString();

            return "Stack size: " + __ScriptStack.Count
                   + ", SkipMethod: " + sm
                   + ", __ISPAUSING: " + __ISPAUSING
                   + ", __ISPAUSING_INPROGRESS: " + __ISPAUSING_INPROGRESS
                   + ", __ISRESUMING: " + __ISRESUMING;

        }

        /// &lt;summary&gt;
        /// Is script currently paused in the middle of execution?
        /// &lt;/summary&gt;
        public bool __CANRESUME
        {
            get
            {
                return ResumeMethod != 0;
            }
        }
        // ResumeMethod has a numeric value to bind it to the root method where execution is paused from right now. 0 = Not paused.
        private UInt16 ResumeMethod = 0;

        public void __PAUSE()
        {
            __ISPAUSING = true;
        }

        public void __RESUME()
        {
            // Reset some variables
            __ISPAUSING_INPROGRESS = false;
            __ISPAUSING = false;

            // Resume to correct method
            switch (ResumeMethod)
            {
                case 0:
                    // Nothing to resume
                    break;
                case 1:
                    myMethod();
                    break;
                case 2:
                    mySecondMethod(0, null);
                    break;
            }

            // We are not pausing? Then next time we shouldn’t have any method to resume on
            if (!__ISPAUSING)
                ResumeMethod = 0;

            return;
        }

        public void myMethod()
        {
            // Hardcoded set
            if (ResumeMethod == 0)
                ResumeMethod = 1;

            // We need a local scoped bool to keep track stack
            bool lastMethodWasInStack = false;      // Was last method executed by stack restore?
            ScriptStackItem currentStack = null;    // Our local stack object

            // Anything on the stack?
            if (__ScriptStack.Count &gt; 0)
            {
                // Yes we have a stack: Get stack item for local stack/locals/pos/etc
                currentStack = __ScriptStack.Pop();

                // We got an item so we are resuming to somewhere
                if (currentStack != null)
                {
                    Debug.WriteLine("myMethod() resume: " + GetStatus(currentStack));
                    //
                    // JUMP TO LAST PAUSE-POSITION
                    //
                    // First set next jump to 0 (nothing)
                    Int16 lastPos = currentStack.LastPos;
                    currentStack.LastPos = 0;
                    switch (lastPos)
                    {
                        case 0:
                            // No jump? Something wrong?
                            throw new ApplicationException("We are restoring stack, but it contains no last position to jump to.This needs to be debugged… :)");
                            break;
                        case 1:
                            goto LABEL_PAUSE_LOC_1;
                            //case 2:
                            //    goto LABEL_PAUSE_LOC_2;
                            //case 3:
                            //    goto LABEL_PAUSE_LOC_3;
                    }
                }
            }
            else
            {
                // We are done with resume
                //__ISRESUMING = false;
                Debug.WriteLine("myMethod() start: " + GetStatus(currentStack));

                // Are we entering pause?
                if (__ISPAUSING)
                    goto LABEL_PAUSE_PROCESS_1;

            }

            //////////////////////////////////////
            // ORIGINAL USER SCRIPT STARTS HERE //
            // bla bla bla                      //
            // script script script             //
            /////////////////////////////////////////////////////////////////////////////////////////////////////////////
            // User has a local method called "MySecondMethod(int, string)". We add our stack restore/pause around it. //
            /////////////////////////////////////////////////////////////////////////////////////////////////////////////

            string ret = null;

            Debug.WriteLine("myMethod() execution: Natural");

            // Our first restore position — this is where we jump if we are restoring to this point
            LABEL_PAUSE_LOC_1:

            // Are we restoring from pause? If currentStack is null then we came here naturally.
            if (currentStack != null)
            {
                // We have currentStack, so we are restoring…

                // This keeps track of if custom method is executed by us as part of a stack restore or not
                lastMethodWasInStack = false;

                // Is there more on stack? If so we move one more down the stack by executing the custom method
                // SkipMethod = we are not executing method in this restore
                if (__ScriptStack.Count &gt; 0 &amp;&amp; !currentStack.SkipMethod)
                {
                    // Yes, we should continue into this method. No need to restore our stack/locals just yet.
                    // Just fake our way into this method, we will restore local variables once we get there anyway.
                    Debug.WriteLine("myMethod() enter child: Resume");
                    ret = mySecondMethod(0, null);
                    // We have executed the custom method
                    lastMethodWasInStack = false;
                }

                // If we are not going into pause mode then custom methd has finished by itself and we need to restore stack/locals to continue execution here
                if (!__ISPAUSING)
                {
                    //
                    // START RESTORE OF STACK/LOCALS
                    //

                    // Restore local variables (that only exist within the method scope)
                    if (currentStack.LocalVars == null)
                    {
                        LocalVarObject_17 lvo = currentStack.LocalVars as LocalVarObject_17;
                        LOCALVARIABLE1 = lvo.VARIABLE1;
                        LOCALVARIABLE2 = lvo.VARIABLE2;
                        LOCALVARIABLE3 = lvo.VARIABLE3;
                    }

                    // TODO: Restore argument stack
                    // Can’t be described in C#. Basically we ldstr stuff from object array to argument stack.
                    // { arg0, arg1, arg2 } = currentStack.ArgumentStack;

                    //
                    // RESTORE DONE
                    //
                }

                // We want to skip the custom method because the pause happened after the method returned
                if (currentStack.SkipMethod)
                    lastMethodWasInStack = true;

                // We are done with resume
                //__ISRESUMING = false;

            }

            // If we are not pausing and we did not call this method in the above stack restore code then we have come here naturally and it should be executed naturally
            if (!__ISPAUSING_INPROGRESS &amp;&amp; !lastMethodWasInStack)
            {
                Debug.WriteLine("myMethod() enter child: Natural");
                ret = mySecondMethod(LOCALVARIABLE1, "param");
            }

            LABEL_PAUSE_PROCESS_1:
            // We just exited a local method (either naturally or by PAUSE-return): Check if we are about to pause…
            if (__ISPAUSING)
            {
                if (currentStack == null)
                    currentStack = new ScriptStackItem();

                // If this is where the pause starts
                if (!__ISPAUSING_INPROGRESS)
                {
                    __ISPAUSING_INPROGRESS = true;
                    // .. then we should not execute custom method when restoring next time (as it is above us in code)
                    // TODO: This is set to true to fix a bug where it doesn’t execute custom method on last resume step, causing everything to exit
                    // TODO: Needs fixing… things work now, but not sure how well it would work on all types of scripts
                    currentStack.SkipMethod = true;
                }

                // Set restore position
                currentStack.LastPos = 1;

                // TODO: Save argument stack
                // arg0, arg1 and arg2 are automatically generated to match current argument position
                // Can’t be done in C#: currentStack.ArgumentStack = new object[] { arg0, arg1, arg2 };

                // Save local variables (that only exist within the method scope)
                // NOTE: LocalVarObject_17 is an automatically generated object which contains a mirror set for the local fields in this scope.
                if (currentStack.LocalVars == null)
                    currentStack.LocalVars = new LocalVarObject_17();

                LocalVarObject_17 lvo = currentStack.LocalVars as LocalVarObject_17;
                lvo.VARIABLE1 = LOCALVARIABLE1;
                lvo.VARIABLE2 = LOCALVARIABLE2;
                lvo.VARIABLE2 = LOCALVARIABLE2;

                // Add current state to ScriptStack
                __ScriptStack.Push(currentStack);

                Debug.WriteLine("myMethod() pausing: " + GetStatus(currentStack));
                // Then return
                return;
            }

            ///////////////////////////////////////
            // ORIGINAL USER SCRIPT RESUMES HERE //
            // bla bla bla                       //
            // script script script              //
            ///////////////////////////////////////
            Debug.WriteLine("myMethod() returning naturally: " + GetStatus(currentStack));
            return;
        }

        public string mySecondMethod(int i, string s)
        {
            // Hardcoded set
            if (ResumeMethod == 0)
                ResumeMethod = 2;

            // If we are restoring then skip the delay
            if (__ScriptStack.Count == 0)
                Thread.Sleep(500);

            myMethod();

            return s;
        }
    }
}
[/sourcecode]

3 thoughts on “Fibres / Microthreading / User space threading

  1. Oleksandr Nikitin

    Hi!

    I’m thinking about doing basically the same microthreading implementation.

    How do you get the current stack vars?

    Reply

Leave a Reply