Virtual Machine Overview

2011-11-20 15:06:54

Stack Semantics

This virtual machine is entirely stack-based. Stack frames are used to store information about the functions that control passed through to get to the current point of execution. This information is needed so that each called function knows where to return execution to. Each stack frame also stores a pointer to an exception handler. In addition to the return address and exception handler address, each stack frame stores a pointer to the base of the previous frame and also provides storage for local variables.

Frames stack on top of one another, and the parameters for a given frame are said to reside within that frame. The stack can therefore be visualised as follows:

Figure 1. Stack Frame Layout

The return address and the parameters are optional components of a stack frame. This is because new frames can be entered at any point of execution, not just when a function is called. Specifically, there are separate instructions that manage the calling of (and return from) functions and the creation (and destruction) of new stack frames.

This separation of concerns allows an exception thrown in a function to be handled within the context of that function, instead of unwinding the stack to the previous frame (that of the caller).

The grey areas in the above diagram represent values that are pushed onto the stack before a function call but are not parameters to that function call - this can include temporary working values for the calling frame.
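The frame contents described in this section can be modelled roughly as follows. This is only a sketch: the type name Frame, its field names, the use of 0 for a null handler and the two helper functions are assumptions made for illustration, not the virtual machine's actual data structures. It shows why the return address is optional: a frame entered without a call simply has none.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;

// Hypothetical model of one stack frame; all names are illustrative.
struct Frame {
    bool hasReturnAddress = false;  // true only when the frame was entered by a call
    Word returnAddress = 0;         // meaningful only if hasReturnAddress is true
    Word handlerAddress = 0;        // 0 stands in for "no exception handler"
    std::size_t previousBase = 0;   // base of the previous frame
    std::vector<Word> locals;       // storage for local variables
};

// Enter a frame at an arbitrary point of execution: no return address is stored.
inline Frame enterFrame(std::size_t previousBase, std::size_t localCount,
                        Word handler = 0) {
    Frame f;
    f.handlerAddress = handler;
    f.previousBase = previousBase;
    f.locals.assign(localCount, 0);
    return f;
}

// Enter a frame as part of a function call: the return address is stored too.
inline Frame enterCalledFrame(std::size_t previousBase, std::size_t localCount,
                              Word returnAddress, Word handler = 0) {
    Frame f = enterFrame(previousBase, localCount, handler);
    f.hasReturnAddress = true;
    f.returnAddress = returnAddress;
    return f;
}
```

A frame created with enterFrame can still carry its own handler address, which is what lets an exception be handled inside the function that raised it.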

Calling Convention

There is no explicit calling convention in place. It is ultimately up to the programmer to define which calling convention they wish to use. However, it is recommended to adhere to the caller clean-up convention as instructions exist which facilitate this method.

In the caller clean-up convention suggested, the caller of a function cleans the parameters (for that function call) from the stack. Parameters are pushed onto the stack in left-to-right order (thus differing from cdecl). Return values are pushed onto the stack by returning functions, above the pushed parameters.

For illustration purposes, consider the following pseudo (high-level) code:

int SomeFunction(int a, int b, int c);

int a, b, c, d;
d = SomeFunction(a, b, c);

Such code might produce the following assembly:

nvar 0, a
nvar 1, b
nvar 2, c
nvar 3, d

push a
push b
push c
call SomeFunction
popr d, 3

As can be seen here, the parameters are pushed in order of their occurrence from left-to-right. Also note that the parameters are removed from the stack by the caller using the POPR instruction.
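The effect of this sequence on the stack can be simulated in a few lines of C++. The sketch below assumes that POPR pops the return value into its first operand and then discards the number of parameters given by its second operand; that reading is inferred from the example above rather than taken from an instruction reference. The helper names someFunction, popr and callSomeFunction are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Stack = std::vector<int>;

// Stand-in for the callee: reads its three parameters in place and pushes
// the return value above them, leaving the parameters on the stack.
void someFunction(Stack& s) {
    std::size_t n = s.size();
    int a = s[n - 3], b = s[n - 2], c = s[n - 1];
    s.push_back(a + b + c);
}

// Assumed POPR semantics: pop the return value, then discard `paramCount`
// parameters. This is the caller's clean-up step.
int popr(Stack& s, std::size_t paramCount) {
    int result = s.back();
    s.pop_back();
    s.resize(s.size() - paramCount);
    return result;
}

// Mirrors the assembly: push a, push b, push c, call SomeFunction, popr d, 3.
int callSomeFunction(Stack& s, int a, int b, int c) {
    s.push_back(a);  // parameters are pushed left to right
    s.push_back(b);
    s.push_back(c);
    someFunction(s);
    return popr(s, 3);
}
```

After callSomeFunction returns, the stack holds exactly what it held before the first push, so any temporary working values of the caller are undisturbed.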

The ENTF instruction is used to copy passed parameters into the local variables of the newly created frame. It is this instruction that defines the recommended order of parameters (left-to-right). Consider an example implementation of the aforementioned SomeFunction:

int SomeFunction(int a, int b, int c)
{
    return a + b + c;
}

This may result in the following assembly:

SomeFunction:
    nvar    0, a
    nvar    1, b
    nvar    2, c
    nvar    3, d

    entf    4, 3
    movi    d, 0

    add     d, a
    add     d, b
    add     d, c

    lret    d

Here, the ENTF instruction copies the values of passed parameters a, b and c (from left-to-right) into the local variables of the newly created frame, starting at the first local variable.
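The copy performed by ENTF can be sketched as below. The operand meaning is an assumption drawn from the example: the first operand is taken to be the number of local variables and the second the number of parameters to copy. The sketch also ignores the return address slot that a real call would place on the stack, and the function names entf and someFunctionBody are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed ENTF semantics: create `localCount` locals and copy the last
// `paramCount` values pushed by the caller into locals 0..paramCount-1,
// preserving their left-to-right order.
std::vector<int> entf(const std::vector<int>& stack, std::size_t localCount,
                      std::size_t paramCount) {
    std::vector<int> locals(localCount, 0);
    std::size_t first = stack.size() - paramCount;  // leftmost parameter
    for (std::size_t i = 0; i < paramCount; ++i) {
        locals[i] = stack[first + i];
    }
    return locals;
}

// Mirrors the body of SomeFunction from the assembly listing.
int someFunctionBody(const std::vector<int>& stack) {
    std::vector<int> locals = entf(stack, 4, 3);  // entf 4, 3
    int& a = locals[0];
    int& b = locals[1];
    int& c = locals[2];
    int& d = locals[3];
    d = 0;     // movi d, 0
    d += a;    // add d, a
    d += b;    // add d, b
    d += c;    // add d, c
    return d;  // lret d
}
```

Because the parameters were pushed left to right, the deepest of the three values becomes local 0, which is why the nvar declarations list a, b and c before d.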

Symbol Table

When control is passed to a symbol in the symbol table (using the FARC instruction), different operations in relation to the stack occur depending on whether the symbol is implemented natively or virtually. This difference is, however, irrelevant to the calling code, as when control returns, the stack is in the same state regardless of the implementation type of the symbol.

Native Symbols

When control passes to a native symbol, the implementation of the symbol is passed a reference to a StackInterface class. The native implementation uses this class to interface with the stack of the calling virtual context. The stack of the calling context is not altered during the passing of control.

The public interface of the StackInterface class is as follows:

class StackInterface
{
public:
    // construct from a pointer to the stack-top pointer of the calling context
    StackInterface(uint** stackTop, uint parameterCount = 0);

    // --- stack access ---------------------------------------------------
    // return data element relative to Top()
    template<typename type> type Peek(int offset = 0);
    // return the value at the top of the stack
    template<typename type> type Top();
    // push a value onto the stack top
    template<typename type> void Push(type value);

    // --- parameter access -----------------------------------------------
    // set the number of parameters passed on the stack
    void Params(uint parameterCount);
    // get parameter from stack
    template<typename type> type Param(uint offset);
};

Note that the interface does not provide a Pop() function, so the stack top cannot be decremented. This forces native implementations (and calls to them) to use the caller clean-up convention. There is, however, a Push() function, which can be used to push return values onto the stack.
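To illustrate how a native symbol might use such an interface, here is a small stand-in class with the same public shape. It is not the real StackInterface: the name MockStackInterface, the upward-growing stack, the convention that the stack-top pointer addresses the most recently pushed slot, and the rule that Param(0) is the leftmost parameter are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uint32_t;

// Hypothetical stand-in for StackInterface (see assumptions above).
class MockStackInterface {
public:
    MockStackInterface(Word** stackTop, Word parameterCount = 0)
        : stackTop_(stackTop), lastParam_(*stackTop),
          parameterCount_(parameterCount) {}

    // return data element relative to Top()
    template <typename T> T Peek(int offset = 0) {
        return static_cast<T>(*(*stackTop_ - offset));
    }
    // return the value at the top of the stack
    template <typename T> T Top() { return Peek<T>(0); }
    // push a value onto the stack top
    template <typename T> void Push(T value) {
        ++*stackTop_;
        **stackTop_ = static_cast<Word>(value);
    }
    // set the number of parameters passed on the stack
    void Params(Word parameterCount) { parameterCount_ = parameterCount; }
    // get parameter from stack; offset 0 is the leftmost parameter
    template <typename T> T Param(Word offset) {
        return static_cast<T>(*(lastParam_ - (parameterCount_ - 1 - offset)));
    }

private:
    Word** stackTop_;       // the calling context's stack-top pointer
    Word* lastParam_;       // slot of the last parameter, recorded before any Push
    Word parameterCount_;
};

// A hypothetical native symbol: sums three parameters and pushes the result.
void nativeSum(MockStackInterface& stack) {
    stack.Params(3);
    Word sum = stack.Param<Word>(0) + stack.Param<Word>(1) + stack.Param<Word>(2);
    stack.Push<Word>(sum);  // the return value lands above the parameters
}
```

The native code never removes anything from the stack. The parameters are still in place when control returns, so the caller cleans them up exactly as it would after a CALL.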

Virtual Symbols

The process is slightly more complex when a virtual symbol is called. Firstly, a new runtime context record is created and the stack pointers of the parent context (SS, EBP and ESP) are copied across. The code and data segment pointers are set up to point to the segments that contain the called symbol.

EIP is then set to the specified entry point, and null is pushed onto the stack to fill the return address slot of the newly created frame (if any). The virtual machine then continues with execution of the called symbol. When control returns to the caller, ESP and EBP are copied back into the parent context, with any pushed return values being present at the top of the stack.

The net effect is that issuing a FARC on a virtually implemented symbol results in the same overall operation (in terms of the stack) as if the symbol were local and had been called with the CALL instruction.
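The hand-over between parent and child contexts can be sketched as follows. The Context structure, its field layout and the two helper functions are hypothetical; in particular, a single integer stands in for the code and data segment pointers, and ESP is modelled as the index of the next free stack slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;

// Hypothetical runtime context record; field names follow the register
// names used in the text.
struct Context {
    std::vector<Word>* ss = nullptr;  // SS: the stack segment, shared with the parent
    std::size_t ebp = 0;              // EBP: base of the current frame
    std::size_t esp = 0;              // ESP: next free stack slot
    std::size_t eip = 0;              // EIP: instruction pointer
    int segment = 0;                  // stands in for the code and data segment pointers
};

// Create the child context for a virtual symbol and prepare its stack.
Context enterVirtualSymbol(const Context& parent, int segment,
                           std::size_t entryPoint) {
    Context child;
    child.ss = parent.ss;    // stack pointers are copied across
    child.ebp = parent.ebp;
    child.esp = parent.esp;
    child.segment = segment; // segments containing the called symbol
    child.eip = entryPoint;
    (*child.ss)[child.esp++] = 0;  // null fills the return address slot
    return child;
}

// Copy the stack pointers back when control returns to the caller.
void returnToParent(Context& parent, const Context& child) {
    parent.esp = child.esp;
    parent.ebp = child.ebp;
}
```

The parent's EIP and segment values are never touched, so execution resumes after the FARC with only the stack pointers updated.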

Exception Handling

When an exception is thrown, the exception handler address stored in the current frame is checked to see whether it is null. If it is not, execution jumps to the specified local symbol. If, however, the exception handler pointer is null, the stack is unwound to the nearest ancestral frame that defines a non-null handler address.
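The search for a handler can be sketched as a walk from the current frame towards the oldest one. The Frame type and function names below are hypothetical, 0 is used to represent a null handler address, and returning -1 when no frame defines a handler is a choice made for this sketch only, since the text does not say what happens in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame record holding only what the search needs.
struct Frame {
    unsigned handlerAddress;  // 0 stands in for a null handler pointer
};

// Return the index of the innermost frame with a non-null handler,
// or -1 if no frame defines one (behaviour assumed for this sketch).
int findHandlerFrame(const std::vector<Frame>& frames) {
    for (std::size_t i = frames.size(); i-- > 0;) {
        if (frames[i].handlerAddress != 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Unwind the stack so that the handling frame becomes the current frame,
// and report the address execution should jump to (0 if none was found).
unsigned throwException(std::vector<Frame>& frames) {
    int index = findHandlerFrame(frames);
    if (index < 0) {
        return 0;
    }
    frames.resize(static_cast<std::size_t>(index) + 1);
    return frames.back().handlerAddress;
}
```

If the current frame already defines a handler, no frames are discarded and execution jumps straight to that handler.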

Exceptions can cross the boundaries between native and virtual symbols in either direction (native -> virtual and virtual -> native).