Saturday, April 29, 2006

Unifying Events and Threads in Haskell

[ Update: 3 May 2006: Source is now available ]

Lambda the Ultimate points to a great paper, A Language-based Approach to Unifying Events and Threads, by Peng Li and Steve Zdancewic.
This paper presents a language based technique to unify two seemingly opposite programming models for building massively concurrent network services: the event driven model and the multithreaded model. The result is a unified concurrency model providing both thread abstractions and event abstractions.

The paper is about the implementation of the unified concurrency model in Haskell. The threads are 'lightweight' threads and according to the paper scale to 10 million threads. The scheduler outperforms NPTL in I/O benchmarks.
  • Monads provide the thread abstraction by defining an imperative sub-language of Haskell with system calls and thread control primitives.
  • Higher-order functions provide the internal representation of threads in continuation passing style.
  • Lazy data structures provide the event abstraction, which is a lazy tree that represents the trace of system calls generated by threads.

Some interesting benchmark results too. One benchmark test they did involved launching 10 million threads that just yield within a loop. This was running on a machine with 1GB set aside for the GHC heap. The per thread cost was only 48 bytes!

Their multiprocessor benchmark, yielded a 3.65 times speedup over a non-MP version version on a 4CPU machine (The library is able to take advantage of SMP machines).

Other interesting things in the paper are a web server they built using this framework and an application level TCP stack written entirely in Haskell.

I hope the source for the things mention in this paper becomes available. In the meantime, you can read it here.

Categories:

2 Comments:

Anonymous Anonymous said...

It sounds wrong to me to have every language reinvent its own lightweight process abstraction and scheduler, possibly even multiple times, given these are already OS services.

Also, if I am not mistaken, in this paper there is still the issue that threads are cooperative: long-running computations need to be split artificially. The type system does not help there. Erlang gets around this by running the whole computation within a VM, so its scheduler can take care of it.

However, if (arbitrary) preemption is introduced, we are back to mutual exclusion problems and synchronization primitives... back to Square One, UNLESS we do away with shared state, and use something like Erlang-style message passing. Of course, that does not get rid of deadlocks and other synchronization errors, but the protocol is explicit at least. Funnily, in my experience it is as easy (or even easier) to formally verify absense of deadlocks and data corruption in a shared-state system as is a message-passing system.

I cannot help but notice that we seem to be running in circles, with different trade-offs, with no optimal solution in sight.

Still, I am wondering when OSes will scale to millions of threads, but without restrictions of computation time and to non-blocking IO.

Another option would perhaps be to go the Exokernel route, and provide thread abstractions and schedulers as library in the first place, possibly specialized for application classes.

5:21 AM  
Blogger Aaron McDaid said...

The paper has moved to here

This update was in the LtU comments.

5:24 AM  

Post a Comment

<< Home