Wednesday, May 21, 2008

Extending Tamarin Tracing with Forth

My previous article on extending Tamarin Tracing with native methods described how to implement the native methods in C. It's also possible to implement native methods in Forth.

Methods implemented in JavaScript are compiled to ABC bytecode by a compiler (currently the asc.jar provided by the Flex SDK). These are compiled to the basic Forth instructions by the Tamarin Tracing engine and those Forth instructions are run by the interpreter. 'Hot' areas of the Forth instructions are traced and compiled to native machine code as needed.

Methods implemented in Forth don't need to be compiled from ABC to Forth. They are immediately available for interpreting and JITing via the tracing mechanism. I'm a little unsure of the exact pros and cons of implementing things in Forth vs C vs JavaScript and would be interested in comments if anyone can help.

As an example of a method in Forth I'm going to use the same fibonacci function in my previous article. A Forth method is marked 'native' just like a method implemented in C. But it has some metadata associated with it to say it is implemented in Forth, and to give the name of the Forth word (in Forth terminology a 'word' is a 'function'):
package testing {
public function fib(n) {
if( n <= 1)
return 1;
else
return fib(n-1)+fib(n-2);
}

public native function fib2(n:int):int;

[forth(word="forth_fib3")]
public native function fib3(n:int):int;
}
Notice the 'forth(word="forth_fib3")' metadata annotation. This tells the asc.jar compiler that the following native function is implemented in Forth by the word 'forth_fib3' rather than in JavaScript or C. I placed this code in 'fib.as' in the 'shell' subdirectory and modified 'shell.py' to build it in exactly the same manner as my previous article.

The 'forth_fib3' word needs to be written. The Tamarin Tracing Forth compiler is implemented in 'utils/fc.py'. It is a 'whole program' compiler in that it needs to have all Forth files listed on the command line so it can analyse and compile everything. The invocation of this compiler is done in 'core/builtin.py'. This means any Forth extensions really need to be added to the 'core' subdirectory and build files. I added a 'core/fib3.fs' as follows:
: fib3 ( n -- n )
DUP 1 <= IF DROP 1 ELSE DUP 1 - fib3 SWAP 2 - fib3 + THEN ;

EXTERN: forth_fib3 ( obj n argc=1 -- int )
DROP NIP fib3 ibox ;
The 'forth_fib3' word is implemented using EXTERN:. This marks it as a word that is an available entry point by external code. The arguments it receives on the stack will be ( obj arg1 arg2 argn argc=n -- result ). In the fibonacci case there is one argument, the number passed to fib. The argc argument is the count of the number of arguments provided, in this case 1. The bottom argument on the stack is the object the method is called on. Since our fib function is 'global' and not part of an object this is not used hence the NIP to remove it.

Note that the stack effect names (the obj, n, argc=1, etc) are for documentation purposes and are not used by the compiler at all - just like most other Forth systems).

So 'forth_fib3' removes the argc and obj arguments and uses just 'n'. It calls a helper function 'fib' which does the actual fibonacci calcuation, leaving the result of that on the stack. The call to 'ibox' tags the final result as an integer number.

'fib' is a pretty standard Forth implementation of fibonacci. It uses IF/ELSE/THEN to do the testing of the number. IF/ELSE/THEN is implemented by the Forth compiler directly (fc.py) since the Tamarin Tracing Forth system doesn't have parsing words.

'core/builtin.py' needs to be modified to include 'fib3.fs' as an argument to the compiler:
 os.system("../utils/fc.py -c vm_fpu prim.fs fpu.fs vm.fs e4x.fs fib3.fs")
There are multiple invocations of the compiler for different variants of the virtual machine (without fpu, minimal VM, full VM, etc). Each of these should be changed.

Running 'core/builtin.py' will compile the Forth code and generate the necessary code. Follow this up with running 'shell/shell.py' to compile the fib.as and other code and build Tamarin Tracing as per my previous article.

Some simple test code:
import testing.*;
print("fib3 30 = " + fib3(30));
With equivalent test functions for the other implementations of fib you can compare the different runtimes:
$ time shell/avmshell fib.abc
fib 30 = 1346269

real 0m0.298s
user 0m0.252s
sys 0m0.032s

$ time shell/avmshell fib2.abc
fib2 30 = 1346269

real 0m0.063s
user 0m0.024s
sys 0m0.028s

$ time shell/avmshell fib3.abc
fib3 30 = 1346269

real 0m0.192s
user 0m0.144s
sys 0m0.024s
As can be seen in the times the C implementation smokes the other two with the Forth code being faster than the JavaScript code.

The Forth implementation has features I haven't explored in this article. This includes different ways of declaring words to take advantage of various optimisations and automatic generation of superwords. It is a 'static' Forth compiler in that it doesn't allow the execution of Forth words at parse or compile time so features like parsing words, CREATE DOES>, interactive development, etc are not available. This makes the implementation of Forth words a bit more verbose than in more full featured Forth implementations.

If you have any tips on using Forth in Tamarin Tracing please leave a comment. I'm keen to see more features of the Forth system is use.

Categories: ,

Labels:

3 Comments:

Blogger edwsmith said...

Doh! while TT does handle recursion (detects it and turns it into a loop on the winding side and the unwinding side), this detection is on JS methods, but not on forth functions.

So, the forth interpreter should be just fine with fib3, but it won't be traced because the call to itself is not detected as a back-edge.

grep for isRecursive() in Interpreter.cpp to see how the detection works for JS methods.

general tradeoffs of forth vs C

+ easy to factor since all side effects are on the stack
+ very dense if you work at it
- interpreted, although superwords mitigate the cost of this somewhat
+ traceable; C isn't.
+ easy to write helpers that manipulate the JS interpreter stack directly (which *is* the forth value stack)
+ easier than writing arrays of bytecodes out by hand
- weird syntax or lack thereof

DOER> DOES would be nice extensions; there have to be some idioms that would benefit from this. we also use CASE heavily, but having ' WORD (address of WORD) and EXECUTE might be cleaner sometimes.

11:40 PM  
Blogger Ian Osgood said...

This example is good for showing a use of the open stack to replace recursion.

: fib4 ( n -- fib )
0 1 rot 0 ?do over + swap loop drop ;


DOER> DOES (now called CREATE DOES> in modern Forth) can be a handy factoring tool if you have lots of repetitious code working on slightly different data. Otherwise, it was mainly a tool to reduce code size in tiny environments, like microcontrollers.

5:18 PM  
Blogger Samuel A. Falvo II said...

CREATE/DOES> really doesn't have much of anything to do with small environments like microcontrollers. CREATE binds a word to an address, but nothing further than this. It's often used to create tables of things and the like. Invoking a CREATEd word, in the absence of DOES>, will return the address to which it is bound. E.g., VARIABLE FOO and CREATE FOO 0 , are semantically identical.

DOES> converts a word recently CREATEd into a parameterized colon definition -- in a sense, quite literally, into a singleton message on a singleton object. The "method's" stack-effects are ( i*x -- i*x self ), where self is a pointer to the object that WOULD have been returned by the CREATEd word had you not overridden its behavior with DOES>.

E.g.:

CREATE Munch
5 , 6 , 7 ,

Munch 3 cells dump

: muncher-boaWithName
CREATE , , ,
DOES> 3 cells dump ;

1 2 3 muncher-boaWithName myMuncher

Invoking myMuncher will dump the three cells comprising the instance of the entity known as myMuncher.

This construct proves invaluable when, e.g., you're writing a cross-assembler in Forth, for example, since many opcodes are of a common "type", with only specific bit-fields differing. E.g.:

( this example is inspired from similar
code in a 6502 assembler in Forth )

: ALU ( declarator for ALU instructions)
CREATE , DOES> @ $20 OR C, ;

$00 ALU ORA
$04 ALU AND
$08 ALU EOR
$0C ALU ROR
$10 ALU ROL
$14 ALU LSR
$18 ALU ASL
$1C ALU CMP

Constructor words may also be forward-parsing, allowing construction of whole matrices of data, the row of which is specified by the word invoked, and the column selected by parameters on the data stack at run-time. I used this technique in a 6502 assembler in Forth years ago.

To conclude, CREATE/DOES> is, in some sense, creates a construct very similar to a closure, and while it can result in substantial static code reduction, it has no direct influence on dynamic memory consumption (in my experience, they're often the same). It's real power comes from the syntactic abstraction it provides without having to resort to macros.

7:56 PM  

Post a Comment

<< Home