Sunday, October 29, 2006

Compilers and Interpreters in Factor

Writing compilers and interpreters in Factor is quite easy. By writing the grammar using parser combinators, and having those combinators produce an abstract syntax tree (AST), it's easy to write a simple compiler or interpreter for the tree. I've already covered writing simple parser combinators so I'll concentrate on processing the AST for a simple language in this post. A later post might generate a parser for it or it could be an 'exercise for the reader'.

For an example to start with I took the evaluator defined in this Lambda the Ultimate posting and converted it to Factor. For the interpreter I tried two different approaches to compare them. The first was to use pattern matching. For the second I use generic functions and Factor's OO system.

For the compiler I show how to compile to Factor code, which can then be interpreted or compiled by Factor itself, and I also compile to Javascript. This javascript can be run in a standalone Javascript interpreter like Rhino, or in a web browser.

To more easily re-use the AST across the two examples I added the ability to pattern match on tuples to the 'match' contrib library. This can be obtained from the standard Factor darcs repository:
darcs get http://www.factorcode.org/repos

The factor 'latest' boot images can be used to bootstrap from this repository. The instructions on how to do this are on factorcode.org. For an Intel x86 linux machine the steps would be:
darcs get http://www.factorcode.org/repos
cd factor/Factor
make linux-x86
wget http://www.factorcode.org/images/latest/boot.image.x86
./f boot.image.x86
The examples here should also work with the released Factor 0.85 as long as you replace 'contrib/match/match.factor' with the version from my repository. To load the 'match' library in Factor use:
"contrib/match" require
USE: match
In the Factor code snippets below I use '=>' to show the result of running Factor words. You shouldn't actually type that. Instead the result will be shown on the Factor stack or you can use '.' to print it.

The code for the examples in this post can be downloaded from interpreter.factor. If you copy it into the 'contrib' directory of your Factor installation you can load it with:

"contrib/interpreter" require
The evaluator that is being implemented has the following description from the Lambda the Ultimate posting:
data Term a where
Lit :: Int -> Term Int
Inc :: Term Int -> Term Int
IsZ :: Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
Pair :: Term a -> Term b -> Term (a,b)
Fst :: Term (a,b) -> Term a
Snd :: Term (a,b) -> Term b

eval :: Term a -> a
eval (Lit i) = i
eval (Inc t) = eval t + 1
eval (IsZ t) = eval t == 0
eval (If b t e) = if eval b then eval t else eval e
eval (Pair a b) = (eval a, eval b)
eval (Fst t) = fst (eval t)
eval (Snd t) = snd (eval t)
Basically we have literal values, which in this example are integers. These can be incremented and tested to see if they are zero. A 'pair' can be created and the first and second elements obtained from the pair. An 'if' expression is available for testing an expression and evaluating a true or false clause. These are represented using Factor tuples:
TUPLE: lit i ;
TUPLE: inc t ;
TUPLE: isz t ;
TUPLE: iff b t e ;
TUPLE: pair a b ;
TUPLE: fst t ;
TUPLE: snd t ;
I used the same names for the types and elements as the example but changed 'if' to 'iff' to prevent clashing with Factor's standard 'if' word. An example tree can be created that is the increment of the number 5 with:
5 <lit> <inc>
When evaluated this should produce the result '6'.

The interpreter implementation that uses pattern matching uses 'match-cond'. This requires match variables to be set up which can be used to destructure sequences and tuples. For example:
5 <lit> T{ lit f ?t } match [ ?t ] bind => 5
'T{ lit f 123 }' is the Factor syntax for a tuple. It creates a tuple at parse time and the actual object is stored in the word. This is different from '123 <lit>' which will produce the literal object at runtime. '?t' is a matching variable and the 'match' and 'match-cond' words use these to destructure matching elements of tuples and sequences. The match variable can then be used to get the actual value obtained. For more on the Factor pattern matching system you can read the documentation from within Factor with:
\ match help
The Factor implementation of the interpreter using pattern matching is:
: eval1 ( a -- a )
{
{ T{ lit f ?i } [ ?i ] }
{ T{ inc f ?t } [ ?t eval1 1+ ] }
{ T{ isz f ?t } [ ?t eval1 zero? ] }
{ T{ iff f ?b ?t ?e } [ ?b eval1 [ ?t ] [ ?e ] if eval1 ] }
{ T{ pair f ?a ?b } [ ?a eval1 ?b eval1 2array ] }
{ T{ fst f ?t } [ ?t eval1 first ] }
{ T{ snd f ?t } [ ?t eval1 second ] }
} match-cond ;
It is pretty much a direct translation of the Haskell example. An example of using it:
5 <lit> <inc> eval1 => 6
The Lambda the Ultimate post gave a usage example:
stuff = Pair (If (IsZ (Inc (Inc (Lit 5))))
(Lit 6)
(Lit 7))
(Fst (Snd (Pair (Lit 42)
(Pair (Lit 8) (Lit 9)))))

Translated to Factor it is:
: driver ( -- v )
5 <lit> <inc> <inc> <isz>
6 <lit>
7 <lit>
<iff>
42 <lit> 8 <lit> 9 <lit> <pair> <pair> <snd> <fst> <pair> ;
Running the 'eval1' word on this produces the answer, { 7 8 }:
driver eval1 => { 7 8 }
Although 'eval1' is quite short and easy to understand it has a disadvantage in that to extend it with new AST types you need to modify the 'eval1' word. Using the Factor OO generic word system you extend the interpreter without having to modify existing code. Here's a generic word approach to the interpreter:
GENERIC: eval2 ( a -- a )

M: lit eval2 ( a -- a ) lit-i ;
M: inc eval2 ( a -- a ) inc-t eval2 1+ ;
M: isz eval2 ( a -- a ) isz-t eval2 zero? ;
M: iff eval2 ( a -- a ) dup iff-b eval2 [ iff-t ] [ iff-e ] if eval2 ;
M: pair eval2 ( a -- a ) dup pair-a eval2 swap pair-b eval2 2array ;
M: fst eval2 ( a -- a ) fst-t eval2 first ;
M: snd eval2 ( a -- a ) snd-t eval2 second ;
The 'M:' word is used to define a method for a generic word. This is similar to how generic functions and methods work in CLOS and Dylan. When a generic word is called the actual method invoked is based on the topmost item of the stack. The first symbol past the 'M:' is the type of that object that that method is for. The second is the name of the generic word the method will be added too. The rest of the definitions should be reasonably clear - they break apart the tuples and return results or call 'eval2' recursively. 'eval2' produces the same result as 'eval1':
driver eval2 => { 7 8 }
Which approach to use is really personal preference. Of the two, 'eval2' is the most efficient. This is because methods can be compiled in Factor whereas the current implementation of 'match-cond' cannot be. So 'eval1' will run in the Factor interpreter whereas 'eval2' will be compiled to native code. This can be seen by trying to compile the examples and running a timed test:
\ eval1 compile => "The word eval1 does not have a stack effect"
\ eval2 compile
\ eval1 compiled? => f
\ eval2 compiled? => t
[ 1000 [ driver eval1 ] times ] time => 487ms run / 5 ms GC time
[ 1000 [ driver eval2 ] times ] time => 3ms run / 0 ms GC time
There is obviously room for improvement in the 'match' routines! I can say that because I wrote them :)

A compiler follows the same structure as the interpreter but instead of computing the result we generate code. This code is then later fed to the compiler. For the compiler examples I'm only going to show the pattern matching based example. The generic word implementation is very similar and would follow the relevant interpreter implementation.

Factor provides a 'make' word that can be used for dynamically appending to a new sequence. From within a 'make' scope you can use ',' to append an element and '%' to splice in a sequence to the constructed sequence. 'make' can be used for constructing arrays, strings, quotations, etc. The type of the constructed sequence is identified by an exemplar sequence passed to 'make'. For example:
[ 1 , 2 , { 3 4 } % { 5 } , ] { } make => { 1 2 3 4 { 5 } }
The top level 'compile' word will set up a 'make' scope for the AST compile routines to append to. Here's the compiler to Factor:
: (compile1) ( a -- )
{
{ T{ lit f ?i } [ ?i , ] }
{ T{ inc f ?t } [ ?t (compile1) \ 1+ , ] }
{ T{ isz f ?t } [ ?t (compile1) \ zero? , ] }
{ T{ iff f ?b ?t ?e } [ ?b (compile1)
[ ?t (compile1) ] [ ] make ,
[ ?e (compile1) ] [ ] make ,
\ if , ] }
{ T{ pair f ?a ?b } [ ?a (compile1) ?b (compile1) \ 2array , ] }
{ T{ fst f ?t } [ ?t (compile1) \ first , ] }
{ T{ snd f ?t } [ ?t (compile1) \ second , ] }
} match-cond ;

: compile1 ( a -- quot )
[ (compile1) ] [ ] make ;
As you can see it is very similar to the interpreter. Instead of returning the result of the AST evaluation '(compile1)' appends the equivalent Factor code to the constructed quotation created in the 'compile1' word.

The implementation for handling 'lit' just appends the numeric value directly. For 'inc' it calls '(compile1)' on the term to be incremented so that gets appended to the quotation. It then appends the Factor '1+' word which will be used to increment it. The '\' is an escape mechanism to tell Factor to store the word itself rather than call it.

The main complication is in the implementation of 'iff'. This word must delay the computation of the two terms of the 'iff' statement. To to this it creates Factor quotations with 'make' and then recursively calls '(compile1)' for the terms. The results of the compilation for this recursive call will be stored in the most recently created quotation rather than the one in the top level 'compile1' word. That is, nested 'make' calls are dynamically scoped.

An example of usage of the compiler:
5 <lit> <inc> compile1 => [ 5 1 + ] 
[ 5 1 + ] call => 6
The 'driver' AST shown previously compiles to Factor correctly:
driver compile1 => [
5 1+ 1+ zero? [
6
] [
7
] if 42 8 9 2array 2array second first 2array
]
This can be compiled to native code using Factor or run in the Factor interpreter:
driver compile1 compile-quot execute => { 7 8 }
driver compile1 call => { 7 8 }
The final example is a compiler to Javascript. Again it follows the same basic outline of the interpreter. Instead of generating a Factor quotation it uses 'make' to generate a string:
: (compile2) ( a -- )
{
{ T{ lit f ?i } [ ?i number>string % ] }
{ T{ inc f ?t } [ ?t (compile2) "+1" % ] }
{ T{ isz f ?t } [ ?t (compile2) "== 0" % ] }
{ T{ iff f ?b ?t ?e } [
"function() {if(" %
?b (compile2)
") {return " %
?t (compile2)
"} else { return " %
?e (compile2)
"}}()" %
]
}
{ T{ pair f ?a ?b } [ "{ first:" % ?a (compile2) ",second:" % ?b (compile2) " }" % ] }
{ T{ fst f ?t } [ ?t (compile2) ".first" % ] }
{ T{ snd f ?t } [ ?t (compile2) ".second" % ] }
} match-cond ;

: compile2 ( a -- quot )
[ "(" % (compile2) ")" % ] "" make ;
The main complication with this compiler was handling 'if'. The Javascript 'if' has no result so cannot be assigned immediately. So I wrap the 'if' in a function which is called immediately. The two terms of the 'if' use 'return' to return the result from the function. A simple example:
5 <lit> <inc> compile2 => "(5+1)"
And 'driver' also compiles successfully:
driver compile2 => 
"({
first:function() {
if(5+1+1== 0) {
return 6
} else {
return 7
}
}(),
second: {
first:42,
second: { first:8,second:9 }
}.second.first
})"
This result evaluates successfully in a browser. You can test it by clicking here.

Categories:

Monday, October 23, 2006

Narrative Javascript beta 1 released

Neil Mix has made Beta 1 of Narrative Javascript available. This release includes an official threading API with support for futures. I'll be revisiting my previous examples of using threading and futures with Narrative Javascript to move things over to this new API.

Categories:

Factor 0.85 released

Slava has released Factor 0.85. There are lots of great additions and improvements.

Categories:

Tuesday, October 17, 2006

Factor Parser Combinator example

I was asked on IRC how to use parser combinators to write a simple translator from s-expressions to Factor quotations. The translation was quite simple - here are some examples of how it was supposed to work:
(set a 10) 
=> [ "set" "a" 10 ]
(set square (lambda (n) (* n n)))
=> [ "set" "square" [ "lambda" [ "n" ] [ "*" "n" "n" ] ] ]
This looked like it should be fairly simple so I put together a quick implementation.

The parser combinators were recently rewritten so this code is likely to work best with the Darcs version of Factor. Instructions on how to get this going are at the Factor website. Basic instructions for an Intel Linux based machine would go something like:
darcs get http://www.factorcode.org/repos
cd repos/Factor
make linux-x86
wget http://www.factorcode.org/boot.image.x86
./f boot.image.x86
./f
To run the examples from within Factor you will need to load the parser-combinators module and use it:
"contrib/parser-combinators" require
USE: parser-combinators
USE: lazy-lists
A parser combinator is a tuple that has specialised the 'parse' generic word. This word has stack effect ( input parser -- result ). The input is a sequence of tokens (usually a string) and the result is what is known as a lazy list of successes.

A 'list of sucesses' is a list of all possible sucessful parse results. It being lazy, each result in the list is not computed (ie. the parse is not done) until that list element is requested. This is how parser combinators handle backtracking. If the first parse result in the list is not what is required, the second can be tried, etc.

Parsers can be written manually or existing parsers can be combined together using combinators. A number of existing parsers and combinators are provided in the 'parser-combinators' vocabulary to build simple parsers.

To start with the s-expression parser we can handle the '(' and ')' using the 'token' word. This word returns a parser that parses only the given string:
LAZY: 'lparens' ( -- parser )
"(" token ;

LAZY: 'rparens' ( -- parser )
")" token ;
Notice these are defined with the 'LAZY:' word instead of ':'. This means the result of the word is a promise and needs to be forced to get the actual value. Although not strictly necessary with these simple parsers it is required for those that recursively call themselves so I generally always define parsers using it. Examples of usage:
"(" 'lparens' parse car . => T{ parse-result f "(" T{ slice f "(" 1 1 } }
The result of the parse is a lazy list. Returning the 'car' of that list shows the first parse result. The 'parse-result' tuple has two slots. The first slot is the parse tree returned by the parser. The second is the rest of the input string remaining to be parsed if any. Usually the latter is a slice for efficiency purposes. The parsers created with 'token' return the token itself as the parse tree.

The s-expression parser needs to parse numbers and identifiers. Numbers are digits repeated multiple times. A parser for digits could be:
LAZY: 'digit' ( -- parser )
[ digit? ] satisfy ;
This uses the 'satisfy' parser generator word. Where 'token' parses an input string for a specified string, 'satisfy' will call a given quotation with the first character of the input string. If the quotation returns 'true' then the parse is succesful otherwise it fails. Examples of using this definition of 'digit' are:
"123" 'digit' parse car parse-result-parsed . => 49
The 'parse-result-parsed' word returns just the parse tree of the result from the 'parse-result' tuple. The result, '49', is the character code of the successfully parsed digit. Ideally I want the actual number returned rather than the character code.

The '<@' word applies a transformation to the parse tree of a parser. It converts an existing parser to one that does what the original parser did but has the transformation applied. '<@' takes a quotation on the stack with stack effect ( old-tree -- new-tree ). This quotation is called to perform the transformation. One way to remember the effect of '<@' is to note that '<' sign points to the parser being transformed. Here is 'digit' converted to return a number:
LAZY: 'digit' ( -- parser )
[ digit? ] satisfy [ digit> ] <@ ;

"123" 'digit' parse car parse-result-parsed . => 1
'digit>' is a standard Factor word that coverts the character code of a digit into the digit itself. A number is one or more digits in a row. The <+> word takes a parser as input and returns a parser that parses one or more instances of the original. It has the same meaning as '+' in regular expressions. This allows a 'number' parser word to be written as:
LAZY: 'number' ( -- parser )
'digit' <+> ;

"123" 'number' parse car parse-result-parsed . => { 1 2 3 }
Notice here that the result is an array of the digits in the number. This is the parse tree that <+> generates - an array of the results of the original parser. By using '<@' this can be converted into a numeric value using Factor's 'reduce' word (basically a left fold):
LAZY: 'number' ( -- parser )
'digit' <+> [ 0 [ swap 10 * + ] reduce ] <@ ;

"123" 'number' parse car parse-result-parsed . => 123
Interestingly 'number' actually returns more than one parse result since "123" contains more than one occurrence of digits in a sequence:
"123" 'number' parse [ parse-result-parsed ] lmap [ . ] leach 
=> 123
12
1
It returns the longest match first which is what we usually want. As the list is lazy if we don't request anything beyond the first match then the remaining parse results aren't actually computed.

Next on the list is identifiers. These are any sequences of characters that are not numbers, spaces, or parenthesis. The 'satisfy' word can be used for this, combined with <+>:
LAZY: 'identifier' ( -- parser )
[
[ blank? not ] keep
[ digit? not ] keep
[ CHAR: ( = not ] keep
CHAR: ) = not
and and and
] satisfy <+> [ >string ] <@ ;

"foo" 'identifier' parse car parse-result-parsed . => "foo"
The '<@' word is used to transform the result into a string otherwise we'd get an array of the character codes in the identifier as a result.

Often it is desired to parse either a number or an identifier. The 'atom' word does this:
LAZY: 'atom' ( -- parser )
'identifier' 'number' <|> ;

"123" 'atom' parse car parse-result-parsed . => 123
"foo" 'atom' parse car parse-result-parsed . => "foo"
It uses '<|>', known as the choice word. It takes two parsers on the stack, and generates a parser which when run will try the first parser on the input string. If it fails it will then try the second parser. It returns the list of successes resulting from both parses.

A first cut at parsing a single expression like '(set a 10)' requires a sequencing parser combinator. This is '<&>'. It takes two parsers and returns a resulting parser which when run will run the first parser on the input string, and then run the second parser on the remaining unparsed portion of the input string of each result from the first parser. A simple expression parser might look like:
LAZY: 'expr1' ( -- parser )
'lparens'
'atom' <&>
'rparens' <&> ;

"(123)" 'expr1' parse car parse-result-parsed . =>
{ { "(" 123 } ")" }
This results in a very ugly parse tree. The '<&>' word returns a parse tree which is an array of the two parsers it uses. Since we have two '<&>' calls we get nested results. This can be fixed using two variants of '<&>'. They are '<&' and '&>'. These words do the same as '<&>' but don't put the result of one of the parsers in the parse tree. The '>' or '<' point to the parser that will have the result returned. Here's a new version that has a nicer parse tree using these words:
LAZY: 'expr2' ( -- parser )
'lparens'
'atom' &>
'rparens' <& ;

"(123)" 'expr2' parse car parse-result-parsed . => 123
Notice the variants of '<&>' used both select the result of the 'atom' parser to be included in the parse tree and not the results of the parenthesis parsers.

There is still a problem with the expression parser. It doesn't handle more than one 'atom' in the expression. Similar to '<+>' there is '<*>' which takes a parser and returns a new parser which when called parses zero or more instances of the original parser. The parse tree for the result of '<*>' is an array of the results of the original parser:
LAZY: 'expr3' ( -- parser )
'lparens'
'atom' sp <*> &>
'rparens' <& ;

"(set a 123)" 'expr3' parse car parse-result-parsed . =>
{ "set" "a" 123 }
This has one other change in that it needs to handle white space between atoms. The 'sp' word takes a parser (in this case 'atom') and returns a parser that first removes all white space from the start of the input string and then calls the original parser. The effect is to produce a parser that ignores white space.

This gets close to what out test case requires but it returns an array not a quotation. Using '<@' fixes this:
LAZY: 'expr4' ( -- parser )
'lparens'
'atom' sp <*> &>
'rparens' <& [ >quotation ] <@ ;

"(set a 123)" 'expr4' parse car parse-result-parsed . =>
[ "set" "a" 123 ]
Unfortunately this fails on our second test case which requires handling nested expressions like '(+ 1 (+ 2 3) 4)'. By recursively calling the 'expr' word this is easy to add:
LAZY: 'expr5' ( -- parser )
'lparens'
'atom' 'expr5' <|> sp <*> &>
'rparens' <& [ >quotation ] <@ ;

"(+ 10 (+ 20 30))" 'expr5' parse car parse-result-parsed . =>
[ "+" 10 [ "+" 20 30 ] ]
This change was as simple as replacing the 'atom' parser used between the two parenthesis with a parser which checks for an atom or an expression. Note that this recursively calls itself, which is safe from infinite recursion due to the lazy evaluation. It doesn't actually get evaluated unless the 'atom' test fails first. One final change is to allow for only atoms as well as expressions and then our simple test cases work and it is completed:
LAZY: 'expr' ( -- parser )
'atom'
'lparens'
'atom' 'expr' <|> sp <*> &>
'rparens' <& [ >quotation ] <@ <|> ;

"(set square (lambda (n) (* n n)))" 'expr' parse car parse-result-parsed . =>
[ "set" "square" [ "lambda" [ "n" ] [ "*" "n" "n" ] ] ]

"123" 'expr' parse car parse-result-parsed . =>
123

"hello" 'expr' parse car parse-result-parsed . =>
"hello"
These examples should work using the parser combinators code in the current Factor darcs repository and in the upcoming 0.85 release. I'm currently writing more documentation for the parser combinators and it will be accessable using the standard Factor help system. The source for this example can be found here.

Categories:

Monday, October 16, 2006

Flapjax

Flapjax is a very interesting looking client side browser framework. It's a superset of Javascript that provides Functional Reactive Programming to the client side browser. From the Flapjax website:
Flapjax is a new programming language designed around the demands of modern, client-based Web applications. Its principal features include:
  1. Templating syntax
  2. Event-driven, reactive evaluation
  3. Persistent data saved on a data store we provide
  4. Convenient data sharing
  5. Access-control for shared data
  6. Interfaces to external Web services
Lambda the Ultimate has a discussion about it and more information from one of the developers can be found here.

Categories: ,

Tuesday, October 03, 2006

Online Poker in Jeopardy

As those that hang out in #concatenative know I've been playing online poker recently. After a slow start and a bit of book study I started winning some multi table tournaments and doing fairly well. I made more in poker in the last two weeks than I do at my job which is pretty neat. Actually that says more about how much I'm paid at work than my poker skills but never mind :-)

So, as luck would have it, just as I find something I can enjoy doing and get the odd win from, disaster strikes. The US government has passed a bill that bans funding of online gaming via credit cards and bank accounts. It looked for awhile like this bill wasn't going to go ahead but it was tacked onto another 'must pass' bill, which had nothing to do with online gaming, and got through. Slashdot recently covered this.

Most online poker sites are not based in the US but many of the players are. There are a lot of professional poker players that make a living playing on various sites. Their income just got destroyed from the looks of things. Recently, the Wold Championship for Online Poker was held. The winner got $600,000 US. See the results here. The top two placegetters in that list are from the US - they are now legally unable to play it seems, when George Bush signs the bill into law.

This innocent seeming bill has also sent a number of companies into a tailspin on the UK stock exchange. Neteller, a company similar to PayPal, handles much of the online gaming monetary transactions for deposits and withdrawals. Their stock has plummeted with the news. Hopefully they'll recover since that's where my winnings are currently!

I know that things like online access to gambling are quite a thorny issue. Personally I think such issues should be up to the individual and those that have trouble dealing with addiction issues should be helped - but not by punishing those without the problems.

It may seem that the law change is to help those with gambling addictions in the US - so they can't immediately gamble all their credit card money online. But the bill apparently excludes state sponsered gambling. State lotteries, Horse Racing, etc are all still allowed. This points to the real reason for the ban in my opinion - the fact that no taxes are obtained by the US from people playing online games for money through overseas companies.

There is much discussion on the issue in Two Plus Two's Poker Legislation Forum, from the perspective of the players, and in rec.gambling.poker.

Categories: