Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2010-11-23

More on type safety using C and ATS

I wrote previously about using the ATS programming language to safely use C libraries. I've recently been using ATS in a couple of projects and have had to access a few more C libraries. This post goes through a few more ways ATS can be used to make it harder to incorrectly use C libraries by making some classes of errors fail the typechecker.

For my project I needed to parse JSON requests. A good C library for this purpose is jansson. This library has a good implementation and documentation.

JSON objects used in the jansson API are reference counted and the documentation is quite clear about which API calls take ownership of references and which don't. Even so there are some areas where seemingly correct code can cause problems. These edge cases are what I'd hope an ATS wrapper could detect at compile time.

Reference Counting

Mismanaging reference counts are an obvious source of error. The following C program fails to dereference a JSON object and results in a memory leak:

#include <stdio.h>
#include <jansson.h>

int main() {
  json_t* a = json_string("this is a string");
  const char* s = json_string_value(a);
  printf("String value: %s\n", s);
  return 0;
}   

Running the resulting program under valgrind confirms the leak. The fix is to decrement the reference count for the JSON object:

json_t* a = json_string("this is a string");
const char* s = json_string_value(a);
printf("String value: %s\n", s);
json_decref(a); 

By modelling the JSON types as linear types in ATS we can ensure that the object is destroyed at compile time. The is done by declaring an 'absviewtype' for the 'json_t' equivalent and in the function declarations we preserve or consume the linear type as needed. The ATS definitions to wrap the jansson code looks like:

%{^
#include <jansson.h>
%}

absviewtype JSONptr (l:addr)

extern fun json_string
  (value: string)
  : [l:addr] JSONptr l = "#json_string"
extern fun json_string_value
  {l1:agz} (json: !JSONptr l1)
  : string = "#json_string_value"
extern fun json_decref
  {l:agz} (json: JSONptr l)
  : void = "#json_decref"
extern fun JSONptr_is_null
  {l:addr} (x: !JSONptr l)
  :<> bool (l == null) = "#atspre_ptr_is_null"
extern fun JSONptr_isnot_null
  {l:addr} (x: !JSONptr l)
  :<> bool (l > null) = "#atspre_ptr_isnot_null"

With these wrappers defined the body of the 'main' function looks very similar to the C version:

implement main () = () where {
  val a = json_string("this is a string")
  val () = assert_errmsg(JSONptr_isnot_null a, "json_string failed")
  val s = json_string_value(a)
  val () = print(s)
  val () = print_newline()
  val () = json_decref(a)
}

The ATS version however provides a couple of extra safety guarantees. According to the jansson documentation, 'json_string' can return NULL. I've made this part of the type in the ATS definition of 'json_string'. I've declared 'json_string_value' as requiring a non-null pointer. ATS will not compile the program if we don't check 'a' for null.

I've declared 'json_decref' as consuming the linear type. This means if the call to 'json_decref' is excluded like the original C version then the program will fail to compile.

Borrowed Objects

In jansson some API functions return 'borrowed' objects. These are reference counted JSON objects where the reference count has not been incremented. The caller does not own the object. If the caller wants to keep the object around it needs to increment the reference count itself.

This is probably done for performance reasons so the caller can optimize out reference counts if it's just retrieving the object to call a method on it then discarding it.

The jansson documentation is very good at identifying which functions return borrowed objects and which ones return objects owned by the caller.

Here is an example C program that incorrectly uses a borrowed object after it has been destroyed:

int main() {
  json_error_t e;
  json_t* a = json_loads("[\"a\", \"b\", \"c\"]", 0, &e);
  assert(a);
  json_t* a2 = json_array_get(a, 1);
  assert(a2);
  json_decref(a);
  const char* s = json_string_value(a2);
  printf("String value: %s\n", s);
  return 0;
}

Here the 'a2' object is used after 'a' is destroyed. But 'a2' is a borrowed object, owned by 'a'. When 'a' is destroyed so too is 'a2'. This compiles and even runs (on some systems it may fail) but gives an incorrect result. Running under valgrind shows the memory errors.

The equivalent ATS program (note this won't compile):

implement main () = () where {
  var e: json_error_t?
  val a = json_loads("[\"a\", \"b\", \"c\"]", 0, e)
  val () = assert_errmsg(JSONptr_isnot_null a, "json_loads failed")
  val (pf_a2 | a2) = json_array_get(a, 1)
  val () = assert_errmsg(JSONptr_isnot_null a2, "json_array_get failed")
  val () = json_decref(a)
  val s = json_string_value(a2)
  val () = print(s)
  val () = print_newline()
}

This requires ATS definitions for the additional jansson calls:

abst@ype json_error_t = $extype "json_error_t"
extern fun json_array_get
  {n:nat} {l1:agz} (json: !JSONptr l1, index: int n)
  : [l2:addr] (minus(JSONptr l1, JSONptr l2) | JSONptr l2)
  = "#json_array_get"
extern fun json_loads
  (input: string, flags: int, error: &json_error_t?)
  : [l:addr] JSONptr l = "#json_loads" 

The 'json_array_get' function is defined to return a proof. The 'minus' is an ATS standard library proof function. It basically states that the result of 'json_array_get' is owned by the object originally passed to it. This proof must be consumed when the result is no longer needed.

By requiring this proof, and the need for it to be consumed, we ask the programmer to define the lifetime of the borrowed object. If we don't consume the proof we get a compile error. If we call 'json_decref' on the owning object then we can't consume the proof later as the owning object ('a' in this example) is consumed by the 'json_decref'. This forces the proof to be consumed within the lifetime of the owning object.

Another feature of the 'json_array_get' definition is that the 'index' parameter is defined as being a natural number. It is a compile error to pass a negative number as the index.

The correct ATS code becomes:

implement main () = () where {
  var e: json_error_t?
  val a = json_loads("[\"a\", \"b\", \"c\"]", 0, e)
  val () = assert_errmsg(JSONptr_isnot_null a, "json_loads failed")
  val (pf_a2 | a2) = json_array_get(a, 1)
  val () = assert_errmsg(JSONptr_isnot_null a2, "json_array_get failed")
  val s = json_string_value(a2)
  val () = print(s)
  val () = print_newline()
  prval () = minus_addback(pf_a2, a2 | a)
  val () = json_decref(a)
}

Note the call to the proof function 'minus_addback'. This is what returns the borrowed object to the owning object in the eyes of the proof system. Remember that these proof functions all occur at compile time. There is no runtime overhead.

We could control the lifetime of the borrowed object by incrementing its reference count and decrementing it when we're done. But this has runtime overhead. The point of 'borrowed' objects is for efficiency and we use the type system here to keep that but ensure we're still correct. Of course if we need 'a2' to outlive 'a' we can control its reference count.

Pointers to internal memory

Similar to the borrowed object issue are API calls that return pointers to memory internal to JSON objects. In the first example I used a call to 'json_string_value'. What's not documented is who owns this string memory. There is no API to 'free' it so the assumption is it's owned by the JSON object somehow. Looking at the jansson source you can see that the string returned is actually a pointer to the internal string value. This means its lifetime is bound by the lifetime of the JSON object.

There is also a 'json_string_set' call which lets you change the value of the string. This free's the old internal pointer and allocates a new one. Any prior call to 'json_string_value' will be pointing to free'd memory if the pointer to it remains live. Here's an example of a bad C program that does this:

int main() {
  json_t* a = json_string("this is a string");
  const char* s = json_string_value(a);
  json_string_set(a, "oops");
  printf("Old string value: %s\n", s);
  json_decref(a);
  return 0;
}

The call to 'json_set_string' prior to the printing of the prior call to 'json_string_value' results in the print using free'd memory.

In ATS we can enforce the constraint that if any pointer to the returned 'json_string_value' is live then it is a compile error to call 'json_set_string', or to decrement the reference count of the owning JSON object. This is what the ATS code equivalent to the above looks like, and won't compile:

implement main () = () where {
  val a = json_string("this is a string")
  val () = assert_errmsg(JSONptr_isnot_null a, "json_string failed")
  val (pf_s | s) = json_string_value(a)
  val _ = json_string_set(a, "oops")
  val () = print(s)
  val () = print_newline()
  prval () = JSONptr_minus_addback(pf_s, s | a)
  val () = json_decref(a)
} 

Removing the 'json_string_set' call, or moving it to after the 'JSONptr_minus_addback' call allows it to compile. To get this to work I modified the wrappers for the jansson types so that they keep track of the count of callers to 'json_string_value'. The linear type for the 'json_t' pointer becomes:

absviewtype JSONptr (l:addr,n:int)

Previously it was parameterized over the pointer address. Now it includes an integer count. This count starts at zero when an object is created:

extern fun json_string
  (value: string)
  : [l:addr] JSONptr (l, 0) = "#json_string"

A call to 'json_string_value' increments the count. So the type of the JSON object after this call has a non-zero value:

extern fun json_string_value
  {l1:agz} {n:int} (
    json: !JSONptr (l1, n) >> JSONptr (l1, n+1)
  ) : [l2:addr] (JSONptr_minus (l1, strptr l2) | strptr l2)
  = "#json_string_value"

The ATS syntax 'X >> Y' means 'current_type_of_object >> type_of_object_after_call'. The return value has a 'JSONptr_minus' proof which is a modified version of the 'minus' proof discussed previously. It serves the same purpose but is simplified since it doesn't need to care about the 'n' value. There is an equivalent 'JSONptr_minus_addback' as well:

absview JSONptr_minus (addr, view)
extern prfun JSONptr_minus_addback {l1:agz} {v:view} {n:int}
  (pf1: JSONptr_minus (l1, v), pf2: v |
   obj: !JSONptr (l1, n) >> JSONptr (l1, n-1)): void

Notice that the 'JSONptr_minus_addback' call reduces the value of the 'n' parameter in the object. The secret to not allowing 'json_string_set' to be called if any 'json_string_value' results are live is to declare 'json_string_set' as accepting a JSON object where the count of live strings is only zero. 'json_decref' is changed in a similar manner so that it can only be called on objects that don't have internal strings active:

extern fun json_string_set
  {l:agz} (
    json: !JSONptr (l, 0), value: string
  ) : int = "#json_string_set"
extern fun json_decref
  {l:agz} (json: JSONptr (l, 0))
  : void = "#json_decref"

With these changes the type system will prevent the user from making the mistake of keeping pointers to unallocated memory around.

Again, all this wrapper code is used for typechecking only. The runtime code generated by ATS has no counts of live strings and looks almost exactly like the original C code.

Conclusion

Wrapping a C library from ATS gives an appreciation of how difficult manual resource management can be when there is no help from the type system. Coming up with ways to use the type system to make the usage safer makes you more aware when using C of exactly when you should be asking the question "Who owns this object/memory?".

Some of the methods in this post come from a discussion I started on the ATS mailing list. Read that thread for more on the ideas presented here.

There are other things that can be done to improve the use of jansson from ATS. Some of the jansson API functions return an integer representing success or failure. Checking of these values can be enforced. I describe a way to do this in my earlier article. The jansson API allows iteration over JSON objects. These should be wrapped to enforce safety guarantees for the lifetime of the iterator, modification of the object during iteration, etc.

Another possible improvement might be to ensure that incorrect API calls can't be made on JSON objects of the wrong type. For example, calling 'json_string_value' on an array type. This would force asserting the 'json_type' is checked before calling the type specific functions.

I've put my wrapper for the jansson library on github at http://github.com/doublec/ats-jansson. It's still a work in progress but can be used to try out some of the ideas I've written about here.

Tags


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links