Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2011-12-16

Pattern Matching Against Linear Objects in ATS

In a project I’m working on I’m using linear lists. This is the list_vt type in the ATS prelude. list_vt is similar to the list types in Lisp and functional programming languages except it is linear. The memory for the list is not managed by the garbage collector and the type system enforces the rule that only one reference to the linear object can exist. This sometimes requires a bit of extra effort when using pattern matching against the list_vt instances.

Pattern Matching

When pattern matching against linear objects you can do a destructive match or a non-destructive match. The former will destroy and free the memory allocated for the object automatically. The latter will not. Destructive matches are done by having the pattern match clause prefixed with a ~. For example, the following will print an integer list and destroy the list while it does it:

fun print_list (l: List_vt (int)): void =
  case+ l of
  | ~list_vt_nil () => printf("nil\n", @())
  | ~list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list(xs))

fun test1 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list (a)
}

Things get complicated when doing non-destructive matches. The following won’t typecheck:

fun print_list2 (l: !List_vt (int)): void =
  case+ l of
  | list_vt_nil () => printf("nil\n", @())
  | list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list(xs))

fun test2 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list2 (a)
  val () = list_vt_free (a)
}

The problem with this example is that when the match is made we are effectively taking the linear object out of the variable l. This leaves l with a different type, but we’ve stated in the function signature for print_list2 that the type is not modified or consumed. We need a way of putting the linear object back into l once we’re done using the match. This primitive to do this is fold@ which I briefly introduced in my linear datatypes post. fold@ will change the type of l back to the original and prevent access to the pattern match variables. Usage looks like this:

fun print_list2 (l: !List_vt (int)): void =
  case+ l of
  | list_vt_nil () => (fold@ l; printf("nil\n", @()))
  | list_vt_cons (x, !xs) => (printf("cons %d\n", @(x)); print_list2(!xs); fold@ l)

fun test2 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list2 (a)
  val () = list_vt_free (a)
}

You’ll notice with this version that the match for list_vt_cons has changed the xs parameter to be !xs. The second argument in the cons constructor is a linear object. If the object itself is matched against xs then it is another example of aliasing the linear object. It is taken out of the l and needs to be put back. The way ATS handles this is to require pattern matching with a ! prefixed. This makes xs be a pointer to the object rather than the object itself. So in this example xs has the type ptr addr where addr is the address of the actual List_vt object. This is why the xs is prefixed by ! in the recursive call to print_list2. The ! means dereference the pointer, so the List_vt it is pointing to is passed as the argument to the recursive call.

In this way the linear object is never taken out, we only access it via its pointer. The fold@ call in this clause will change xs back to the List_vt object. The fold@ call is done after the usage of !xs. If it was done before then we wouldn’t have access to the view for xs to be able to derefence it. print_list2 is still tail recursive as the fold@ call is only used during typechecking and is erased afterwards.

Filtering a linear list

In my project I needed to filter a linear list. Unfortunately ATS doesn’t have a filter implementation in the standard prelude for linear lists (it does for persistent lists). My first attempt at writing a list_vt_filter looked like:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) =
  case+ l of
  | list_vt_nil () => (fold@ l; list_vt_nil)
  | list_vt_cons (x, !xs) when f (x) => let
                                          val r = list_vt_cons (x, list_vt_filter (!xs, f))
                                        in
                                          fold@ l; r
                                        end
  | list_vt_cons (x, !xs) => let
                                val r = list_vt_filter (!xs, f)
                              in
                                fold@ l; r
                              end

This should look familiar since it’s very similar to the print_list2 code shown previously in the way it uses non-destructive matching and fold@. The function list_vt_filter takes a list_vt as an argument and a function to apply to each element in the list. That function returns true if the element should be included in the result list. Usage looks like:

val a  = list_vt_cons (1, list_vt_cons (2, list_vt_cons (3, list_vt_cons (4, list_vt_nil ()))))
val b  = list_vt_filter (a, lam (x) => x mod 2 = 0)
val () = list_vt_foreach_fun<int> (a, lam(x) =<> $effmask_all (printf("Value: %d\n", @(x))))
val () = list_vt_free (b)
val () = list_vt_free (a)

One issue with this implementation is it is not tail recursive. It has stack growth proportional to the size of the result list.

Tail Recursive Filtering

In Lisp code I’d often build the result list tail recursively by passing an accumulator, with each new element in the result being prepended to the accumulator. This builds a list in the reverse order so before returning it the list would be reversed. The ATS code for this is:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
  fun loop (l: !List_vt (int), accum: List_vt (int)):<cloptr1> List_vt (int)  =
    case+ l of
    | list_vt_nil () => (fold@ l; accum)
    | list_vt_cons (x, !xs) when f (x) => let
                                            val r = loop (!xs, list_vt_cons (x, accum))
                                          in
                                            (fold@ l; r)
                                          end
    | list_vt_cons (x, !xs) => let
                                 val r = loop (!xs, accum)
                                in
                                  (fold@ l; r)
                                end
in
  list_vt_reverse (loop (l, list_vt_nil))
end

The cloptr1 function annotation marks the inner function as being a closure where the memory for the closure’s environment is managed by the compiler using malloc and free instead of the garbage collector (which is what cloref1 would signify). See my post on closures in ATS for more about the different closure and function types used by ATS.

Unfortunately the requirement to use fold@ after we’ve finished with using the pattern matched variables makes the code slightly more verbose as we need to do the tail recursion, obtaining the result, then do the fold@ and return the result. Remember that the fold@ is erased at type checking type which is how this code remains tail recursive even though the code structure makes it look like it isn’t.

One downside to this approach is we iterate over the list twice. Once to build the result, and once over the result to reverse it.

Single Pass Tail Recursive Filtering

The creation of the result list can be done in a single pass if we could create a cons with no second argument, and fill in that argument later when we have a result to store there that passes filtering. ATS allows construction of datatypes with a ‘hole’ that can be filled in later. The ‘hole’ is an unintialized type and we get a pointer to it. An example of doing this is:

var x = list_vt_cons {int} {0} (1, ?)

This creates a list_vt_cons with the data set to 1 but no second parameter. Instead of that parameter being of type List_vt (int) it is of type List_vt (int)?, the ? signifying it is uninitialized. For this example we have to pass the universal type parameters explicitly (the {int} {0}) as the ATS type inference algorithm can’t compute them.

To get a pointer to the ‘hole’ we have to pattern match:

val+ list_vt_cons (_, !xs) = x
val () = !xs := list_vt_nil
val () = fold@ x

In this example the xs is a pointer, pointing to the List_vt (int)?. It assigns a list_vt_nil to this, making the tail of the cons a list_vt_nil. Just like in our previous pattern matching examples using case, the code has to do a fold@ to change the type of x back to that containing a linear object once we’ve finished using xs.

Now that we can get pointers to the tail of the list we can implement a single pass tail recursive filter function:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
  fun loop (l: !List_vt (int), res: &List_vt (int)? >> List_vt (int)):<cloptr1> void =
    case+ l of
    | list_vt_nil () => (fold@ l; (res := list_vt_nil))
    | list_vt_cons (x, !xs) when f (x) => let
                                            val () = res := list_vt_cons {int} {0} (x, ?)
                                            val+ list_vt_cons (_, !p_xs) = res
                                           in
                                             loop (!xs, !p_xs); fold@ l; fold@ res
                                           end
    | list_vt_cons  (x, !xs) => (loop (!xs, res); fold@ l)

  var res: List_vt (int)?
  val () = loop (l, res)
in
  res
end

The loop function here no longer turns a result. Instead the result is passed via a reference (the & signifies ‘by reference’). When there is something that needs to be stored in the list, a cons is created with a hole in the tail position. This cons is stored in the result we are passing by reference and we tail recursively call with the hole as the new result. ATS converts this to nice C code that is a simple loop rather than recursive function calls.

Miscellaneous

The code examples in this post use List_vt (a). This is actually a typedef for list_vt (a,n) where a is the type and n is the length of the list. The typedef allows shorter examples without needing to specify the sorts for the list length. Using the full type though has the advantage of being able to specifiy a bit more type safety. For example, the original filter function would be declared as:

fun list_vt_filter {n:nat} (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)

This defines the type of the result as having a length equal to or less than that of the original list. This helps prevent errors in the implementation of the filter - it can’t accidentally leave extra items in the list. I cover this type of thing in my post on dependent types.

Another addition to safety that adding the extra sorts can provide is the ability to check that the function terminates. This can be done by adding a termination metric to the function definition:

fun list_vt_filter {n:nat} .<n>. (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)

The compiler checks that n is decreasing on each recursive call. If this fails to happen the recursive calls may not terminate and it becomes a compile error. This is discussed in the Termination-Checking for Recursive Functions section of the Introduction to Programming in ATS book.

A description of how fold@ works is in the ATS/Anairats User’s Guide PDF. It’s in the ‘Dataviewtypes’ section of the ‘Programming with Linear Types’ chapter and is referred to as folding and unfolding a linear type.

It’s the usage of linear types and dealing with their restrictions that makes my examples a bit more complex. If you use ATS mainly with non-linear types and link with the garbage collector then it becomes very much like using any other functional programming language, but with additional features in the type system. My interest has been around avoiding using a garbage collector and having the compiler give errors when memory is not allocated or free’d correctly. Don’t be put off from using ATS by these complex examples if you’re fine with using garbage collection and non-linear datatypes. You might never need to deal with the cases that bring in the extra complexity.

Tags: ats 

2011-11-01

Using Records and C structs in ATS

ATS has record types which are like tuples but each item is referenced via a name. They closely approximate C structs and in the generated C code are represented in this way. The following uses a record to hold x and y values representing a point on a 2D plane:

fun print_point (p: @{x= int, y= int}): void =
  printf("%d@%d\n", @(p.x, p.y))

implement main () = {
  val p1 = @{x=10, y=20}
  val () = print_point (p1)
}

A literal record object in this example is created using the @{ ... } syntax. Dereferencing record fields is done using ’.’. The generated C code shows that the representation for the record is a C struct:

typedef struct {
  ats_int_type atslab_x ;
  ats_int_type atslab_y ;
} anairiats_rec_0 ;

The @ syntax used for the record literal and type marks the record as a ‘flat’ record. Flat records have value semantics and variables of this type have a size in bytes equivalent to the size of the underlying C structure. This is shown by the generated C code for the main function above:

ATSlocal (anairiats_rec_0, tmp4) ;

__ats_lab_mainats:
tmp4.atslab_x = 10 ;
tmp4.atslab_y = 20 ;

/* tmp3 = */ print_point_0 (tmp4) ;

Here the variable tmp4 is the p1 in our ATS code. It is an instance of the C struct representing the record and is created on the stack. It is initialized and passed to the print_point function by value:

ats_void_type
print_point_0 (anairiats_rec_0 arg0) {
  ATSlocal (ats_int_type, tmp1) ;
  ATSlocal (ats_int_type, tmp2) ;

  ...
}

Records can also be defined using a '{...} syntax (note the ' instead of @) for boxed records which are heap allocated and the memory managed by the garbage collector. Boxed records have pointer semantics and have a size in bytes equivalent to the size of a pointer:

implement main () = {
  val x = sizeof<@{x=int}>
  val y = sizeof<'{x=int}>
  val a = int_of_size x
  val b = int_of_size y
  val () = printf ("%d %d\n", @(a, b))
}

This outputs “4 8” showing the flat type as having the size of an int and the boxed type having the size of a pointer. Most usage of records in ATS I’ve done is using flat records as I tend to avoid the use of the garbage collector.

To pass a reference to a flat record so it can be modified by a function you need to mark the function argument as ‘by reference’ using &:

typedef point = @{x= int, y= int}

fun print_point (p: point): void =
  printf("%d@%d\n", @(p.x, p.y))

fun add1 (p: &point): void = {
  val () = p.x := p.x + 1
  val () = p.y := p.y + 1
}

implement main () = {
  var p1 = @{x=10, y=20}
  val () = add1 (p1)
  val () = print_point (p1)
}

In this example I use typedef to create a type alias for the record so I can refer to the type as point. The add1 function takes a point by reference (as indicated by the & prefix). This works like C++ reference arguments. The function effectively takes a pointer to an instance of the struct and can modify the instance passed to the function. For this to work the point passed to the function must be an lvalue. That is, it must be mutable. This is done in ATS by making it a var vs a val.

Note that it’s a type error to create an uninitialized point object and pass it to add1. For example, the following gives a type check error:

implement main () = {
  var p1: point?
  val () = add1 (p1)
  val () = print_point (p1)
}

Types that are unintialized have a ? suffix added to it. The type of p1, due to it being unintialized, is point?. Since add1 takes a point this fails type checking. Initializing it allows it to pass:

implement main () = {
  var p1: point
  val () = p1.x := 5
  val () = p1.y := 10
  val () = add1 (p1)
  val () = print_point (p1)
}

As well as passing by reference to functions you can pass a pointer and deal with the pointer management directly. This requires using the proof system and I hope to go through dealing with pointers and their associated proofs in a later post:

fun add1 {l:agz} (pf: !point @ l | p: ptr l): void = {
  val () = p->x := p->x + 1
  val () = p->y := p->y + 1
}

implement main () = {
  var p1 = @{x=10, y=20}
  val () = add1 (view@ p1 | &p1)
  val () = print_point (p1)
}

When interfacing with C API’s you often have to deal with C structures. In ATS you can declare a type as being defined as a struct in C with a matching ATS record definition so that ATS can be used to access the struct. The following example shows a struct declared in C, a function that uses it in C, and how this is wrapped in ATS:

%{^
typedef struct Point {
  int x;
  int y;
} Point;

void print_point (Point* p) {
  printf("%d@%d\n", p->x, p->y);
}
%}

typedef point = $extype_struct "Point" of {x= int, y= int}
extern fun print_point (p: &point): void = "mac#print_point"

implement main () = {
  var p1: point
  val () = p1.x := 10;
  val () = p1.y := 20;
  val () = print_point (p1)
}

The $extype_struct keyword creates a type that is represented by a C struct with the given name. By using the of {x= int, y= int} suffix we define the record layout as seen via ATS. This will stop ATS from creating its own structure to map the type and instead uses the C structure. The generated C code for the main function looks like:

ATSlocal (Point, tmp1) ;

__ats_lab_mainats:
/* Point tmp1 ; */
ats_select_mac(tmp1, x) = 10 ;
ats_select_mac(tmp1, y) = 20 ;
print_point ((&tmp1)) ;

Note it uses the C struct type directly. I use this approach in my 0MQ wrapper to wrap the zmq_msg_t and zmq_pollitem_t structures.

Tags: ats 

2011-10-28

Performing Compatible Updates of Mozilla Central Git Forks

Earlier this year I posted about how I created my git mirror of the mozilla-central mercurial repository. I’ve been keeping a fork on github updated regularly since then that a number of people have started using.

Users of my mirror have wanted to be able to keep up to date with the mercurial mozilla-central repository themselves, or add updates from other branches of the Mozilla source. It takes a large amount of time to start a conversion from scratch but it’s possible to start from the existing git mirror, perform the incremental updates from mercurial yourself, and still stay compatible with the incremental updates I push to my github mirror.

hg-git uses a git-mapfile in the .hg directory of the mercurial clone to keep track of the mapping between mercurial and git commits. By using the same git-mapfile that I use for my mirror you can start your own incremental update from the latest data in the git-mapfile instead of from the beginning. I keep an up-to-date git-mapfile in git-mapfile.bz2. It’s compressed with bzip2 as the uncompressed version is quite large.

The steps to start your own update process are:

  1. Clone the mercurial mozilla-central repository
  2. Clone my git mirror as a bare repository in .hg/git
  3. Place the git-mapfile in .hg
  4. Do an hg bookmark -f -r default master to mark the commit to convert to
  5. Perform hg gexport to update .hg/git with recent mercurial commits

The .hg/git directory should now be up-to-date with respect to the mercurial repository. And the additional git commits will have the same SHA id as any commits I push into my mirror when I perform my own update.

The steps to do an incremental update are the normal:

  1. Pull from the mercurial mozilla-central repository
  2. Run hg bookmark -f -r default master
  3. Run hg gexport
  4. Push or pull to/from .hg/git as needed

As a working example, the following shell commands on a Linux system should set up your own repository ready for incremental updating:

$ hg clone https://hg.mozilla.org/mozilla-central
$ cd mozilla-central/.hg
$ git clone --bare git://github.com/doublec/mozilla-central.git git
$ wget http://www.bluishcoder.co.nz/git-mapfile.bz2
$ bunzip2 git-mapfile.bz2
$ cd ..
$ hg bookmark -f -r default master
$ hg gexport
$ cd .hg/git
$ git push --all ~/some/other/repo.git

Incremental updates just repeat the last few commands above:

$ hg pull -u
$ hg bookmark -f -r default master
$ hg gexport
$ cd .hg/git
$ git push --all ~/some/other/repo.git

Note that I use hg gexport and push from the .hg/git repository using the git command instead of doing an hg push and relying on hg-git to do the conversion and push. In my original article I did the latter. The hg-git push method uses slower python based routines to do the push which can take a long time and large amounts of memory on big repositories like mozilla-central. By splitting this up into gexport and using the native git command I save a lot of time and memory.

Tags: mozilla  git 

2011-10-24

Building Rust

Update 2011-11-08 - The steps to build Rust have changed since I wrote this post, there’s now no need to build LLVM - it’s included as a submodule in the Rust git repository and will automatically be cloned and built.

A while ago I wrote a quick look at Rust post, describing how to build it and run some simple examples. Since that post the bootstrap compiler has gone away and the rustc compiler, written in Rust, is the compiler to use.

The instructions for building Rust in the wiki are good but I’ll briefly go through how I installed things on 64-bit Linux. The first step is to build the required version of LLVM from the LLVM source repository. I use a git mirror and install to a local directory so as not to clash with other programs that use older LLVM versions.

$ git clone git://github.com/earl/llvm-mirror.git
$ cd llvm-mirror
$ git reset --hard 9af578
$ CXX='g++ -m32' CC='gcc -m32' CFLAGS=-m32 CXXFLAGS=-m32 LDFLAGS=-m32 ./configure \
    --{build,host,target}=i686-unknown-linux-gnu \
    --enable-targets=x86,x86_64,cbe --enable-optimized --prefix=/home/chris/rust
$ make && make install

Note the use of prefix to ensure this custom LLVM build is a local install. I also do a git reset to go to the commit for SVN revision 142082 which is the current version that works with Rust according to the wiki. After installation add the install bin to the PATH and lib to LD_LIBRARY_PATH:

$ export PATH=~/rust/bin:$PATH
$ export LD_LIBRARY_PATH=~/rust/lib:$LD_LIBRARY_PATH

Next step, clone and build Rust:

$ git clone git://github.com/graydon/rust
$ mkdir build
$ cd build
$ ../rust/configure
$ make

The build process downloads an existing build for your platform to bootstrap from. It uses this to build a stage1 version of the compiler. This stage1 is used to build a stage2, and that is then used to build a stage3. All the built compilers should work exactly the same if things are working correctly. The compiler can be run within the build directory, or outside it if you put the stage3/bin directory in the path:

$ export PATH=~/path/to/build/stage3/bin:$PATH
$ rustc
error: No input filename given.

Test with a simple ‘hello world’ program:

$ cat hello.rs
use std;
import std::io::print;

fn main () {
  print ("hello\n");
}
$ rustc hello.rs
$ ./hello
hello
Tags: rust 

2011-10-19

Overloading Functions in ATS

ATS allows overloading of functions where the function that is called is selected based on number of arguments and the type of the arguments. The following example shows overloading based on type to provide a generic print function:

symintr myprint
fun myprint_integer (a: int)    = printf("%d\n", @(a))
fun myprint_double  (a: double) = printf("%f\n", @(a))
fun myprint_string  (a: string) = printf("%s\n", @(a))

overload myprint with myprint_integer
overload myprint with myprint_double
overload myprint with myprint_string

implement main() = {
  val () = myprint ("hello")
  val () = myprint (10)
  val () = myprint (20.0)
}

The keyword symintr introduces a symbol that can be overloaded. The keyword overload will overload that symbol with an existing function. In this example we overload with three functions that take different types. The overload resolution is performed at compile time. The actual C code generated for the main function includes:

/* tmp4 = */ myprint_string_2 (ATSstrcst("hello")) ;
/* tmp5 = */ myprint_integer_0 (10) ;
/* tmp3 = */ myprint_double_1 (20.0) ;

The ATS standard prelude includes a print function that is overloaded in this manner for most of the standard types. One downside to the way overloading works is the overload resolution sometimes fails in template functions. The following code gives a compile error for example:

fun {a:t@ype} printme (x: a) = print(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

The error given is that the symbol print cannot be resolved. The ATS compiler attempts to resolve the overload by looking up the t@ype sort. There is no overload for this so the resolution fails. This can be worked around using a template function to call the overloaded function and partially specialize the implementation of the new template function. The following code demonstrates this:

extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)

fun {a:t@ype} printme (x: a):void = gprint<a>(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

The print symbol can be overloaded with the new gprint function to allow print to be called over t@ype sorts:

extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)

overload print with gprint

fun {a:t@ype} printme (x: a) = print(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

This example is contrived in that you could just specialize printme but in real world code this issue comes up occasionally. The most common example for me has been using = in a template function, comparing arguments that are template type parameters. = is an overloaded function and the overload lookup fails in the same manner as above. A workaround is to create an equals template function specialized over the types you plan to compare as above.

Tags: ats 


Tags

Archives
Links