Saturday, September 22, 2007

How to publish a Git repository

This post is aimed mainly at Factor developers who need to make their repository accessible to Slava to retrieve patches. Factor has recently moved to using git as the version control system.

To set up a repository on a server you should clone the existing Factor repository using the '--bare' option:
git clone --bare http://www.factorcode.org/git/factor.git factor.git
A bare repository is one without a checked out working copy of the code. It only contains the git database. As a general rule you should never push into a repository that contains changes in the working copy. To ensure this doesn't happen, we're making the server repository a bare repository - it has no working copy.
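The difference is easy to see in a throwaway directory. This is only a minimal sketch: the paths under `$tmp` are made up for illustration, and it uses `git init --bare` rather than cloning the Factor repository so that it runs offline.

```shell
#!/bin/sh
set -e
# Scratch area for the demonstration; nothing here touches a real repository.
tmp=$(mktemp -d)

# A bare repository: only the git database, no checked out files.
git init -q --bare "$tmp/bare.git"
# A normal repository: a working copy plus a .git directory.
git init -q "$tmp/work"

# Ask git itself whether each repository is bare.
bare_flag=$(git -C "$tmp/bare.git" rev-parse --is-bare-repository)
work_flag=$(git -C "$tmp/work" rev-parse --is-bare-repository)

echo "bare.git is bare: $bare_flag"
echo "work is bare: $work_flag"
```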

Copy the 'factor.git' directory onto your server. I put it in '/git/factor.git'. Now if you have changes on your local machine that you want to push to your repository you can use something like:
git push yourname@yourserver.com:/git/factor.git
If you want to push changes from a specific branch in your local repository:
git push yourname@yourserver.com:/git/factor.git mybranch:master
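To try the 'mybranch:master' form without a server, something like the following should work. It is a sketch under assumptions: a local bare repository stands in for the ssh URL, and the file name, commit message and user details are invented for the example.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# Stand-in for the server repository (normally reached over ssh).
git init -q --bare "$tmp/factor.git"

# Stand-in for your local repository, with one commit on 'mybranch'.
git init -q "$tmp/local"
cd "$tmp/local"
git symbolic-ref HEAD refs/heads/mybranch
echo "hello" > readme.txt
git add readme.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "First commit"

# Push local 'mybranch' into the 'master' branch of the server repository.
git push -q "$tmp/factor.git" mybranch:master

# Both names should now point at the same commit.
local_id=$(git rev-parse mybranch)
server_id=$(git -C "$tmp/factor.git" rev-parse refs/heads/master)
echo "local mybranch: $local_id"
echo "server master:  $server_id"
```

The part before the colon names the local branch and the part after it names the branch to update on the other side.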
To publish the remote repository you have two options. You can publish via the HTTP protocol, or via the git protocol. The first is slower but usable by people behind restrictive firewalls, while the second is more efficient but requires an open port. I suggest doing both.

To publish via HTTP, you must make the file 'hooks/post-update' executable:
chmod +x /git/factor.git/hooks/post-update
This hook is executed whenever something is pushed to the repository. It runs 'git update-server-info', which updates some files that make HTTP retrieval work. You should also run this once manually:
cd /git/factor.git
git update-server-info
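To see what that command leaves behind, here is a small sketch using a scratch repository. The paths, file name and commit are assumptions made for the example, and a local bare clone stands in for '/git/factor.git'.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# A small source repository with a single commit on master.
git init -q "$tmp/src"
cd "$tmp/src"
git symbolic-ref HEAD refs/heads/master
echo "hello" > readme.txt
git add readme.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "First commit"

# Bare clone playing the role of the published repository.
git clone -q --bare "$tmp/src" "$tmp/factor.git"

# Generate the helper files that plain HTTP clients read.
git -C "$tmp/factor.git" update-server-info

# info/refs lists each ref with the commit it points to.
refs_file="$tmp/factor.git/info/refs"
cat "$refs_file"
```

Newer Git versions ship the sample hook as 'hooks/post-update.sample', so it has to be renamed to 'post-update' as well as made executable.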
Now publish the /git directory via your web server (I symlink to it from the server's doc-root). People can pull from the repository with:
git pull http://yourserver.com/git/factor.git
To set up the git protocol you need to run the 'git daemon' command. You pass it a directory which is the root of your git repositories. It will make public every git repository underneath that root that contains the file 'git-daemon-export-ok'. So first create this file:
touch /git/factor.git/git-daemon-export-ok
Run the daemon with:
git daemon --verbose /git
The '--verbose' option logs details about incoming connections. I run this from within a screen session. You can set it up to run automatically using whatever features your server OS has. Now people can retrieve via the git protocol:
git pull git://yourserver.com/git/factor.git


My repository is accessible from both protocols:
git clone http://www.double.co.nz/git/factor.git
git clone git://double.co.nz/git/factor.git
You can also browse it using gitweb.


Monday, September 17, 2007

Vodka: An Interesting Concurrent Language

Recently on Lambda the Ultimate there was an announcement about the Vodka programming language. The language is a master's thesis project by Tiark Rompf and has some interesting ideas.

It is designed to be a concurrency oriented language, based on the ideas of Join Calculus and Petri Nets. It has multimethod dispatch (like CLOS, Dylan and Nice) and generators, and many interesting features of other languages can be implemented as libraries.

The Vodka web site has some examples. The language itself is explained in Tiark's thesis document and a reference implementation is in an SVN repository. There is a discussion group about it at Google Groups.


Sunday, September 16, 2007

Erlang on the OpenMoko Cellphone

Tony Garnock-Jones has got Erlang running on the OpenMoko phone:
Running the interactive erlang shell on a cellphone is pretty cool. Erlang’s built-in clustering support works fine: I’ve successfully connected an erlang node on my pc to a node on the phone using the USB ethernet support the phone provides.



Thursday, September 06, 2007

Git, Binary Files and Cherry Picking Patches

Steve Dekorte has some things he dislikes about git. This post describes how I work around these issues in my own git repositories.

Git has a heuristic for detecting binary files. You can force other file types to be binary by adding a .gitattributes file to your repository. This file contains a list of glob patterns, followed by attributes to be applied to files matching those patterns. By adding .gitattributes to the repository all cloned repositories will pick this up as well.

For example, if you want all *.foo files to be treated as binary files you can have this line in .gitattributes:
*.foo -crlf -diff -merge
This will mean all files with a .foo extension will not have carriage return/line feed translations done, won't be diffed and merges will result in conflicts leaving the original file untouched.
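You can check which attributes git will apply to a path with 'git check-attr'. A minimal sketch in a scratch repository follows; the path under `$tmp` and the file names 'test.foo' and 'notes.txt' are only examples, and the files do not need to exist.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

# The attribute line from the text: treat every *.foo file as binary.
echo "*.foo -crlf -diff -merge" > .gitattributes

# Ask git which attributes it would apply to two example paths.
foo_attrs=$(git check-attr diff merge -- test.foo)
txt_attrs=$(git check-attr diff merge -- notes.txt)

# 'unset' means the attribute was switched off with a leading '-';
# 'unspecified' means no pattern in .gitattributes mentions it.
echo "$foo_attrs"
echo "$txt_attrs"
```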

Now when you pull from another repository that has changes to a .foo file you'll see something like:
 test.foo |  Bin 32 -> 36 bytes
Note that it shows it is a binary file. If you have also changed test.foo yourself and then pull from another repository with changes to it you'll get:
Auto-merged test.foo
CONFLICT (content): Merge conflict in test.foo
The file will be untouched and you can change it manually to be the correct version, either by leaving it as it is or by copying a new file over it. Then you need to commit the merge conflict fix (even if you left the file untouched):
git commit -a -m "Fix merge conflict in test.foo"
The cherry picking of patches works differently to Darcs. There are a couple of ways of handling this, but I use 'git cherry-pick'. If you have a number of contributors with their own repositories that you regularly pull from you can set up remote tracking branches:
git remote add john http://...
git remote add mary http://...
Now when you want John and Mary's most recent patches you can fetch them:
git fetch john
git fetch mary
This does not make any changes to your local branches. It gets and stores their changes in a separate remote tracking branch. If you want to see what John has changed, compared to your master branch:
git log -p master..john/master
From there you can decide to pull in all John's commits:
git merge john/master
If you want one commit, but not its dependencies then this is where 'cherry-pick' is used.

Given a commit id, 'cherry-pick' will take the patch for that commit and apply it to your current branch. It's used like:
git cherry-pick abcdefgh
This creates a commit with a different commit id than the original, but with the same contents. It needs to be a different id as it doesn't have the same dependencies as the original.

If you decide later you want all John's commits and do a merge which includes the commit that you cherry picked from you might expect conflicts. Git handles this case fine and does an automatic merge, noticing the patches are the same. So it effectively gives you the same functionality as Darcs' selective patch pulling, but without as nice a user interface.
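The whole sequence can be tried in a scratch repository. This is a sketch under assumptions: a local branch called 'john' stands in for the remote tracking branch john/master, and the file names, commit messages and user details are invented for the example.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Shared starting point on master.
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# 'john' stands in for the remote tracking branch john/master.
git checkout -q -b john
echo "a" > a.txt; git add a.txt; git commit -q -m "John: add a"
echo "b" > b.txt; git add b.txt; git commit -q -m "John: add b"
wanted=$(git rev-parse HEAD)

# Take only John's second commit onto master.
git checkout -q master
git cherry-pick "$wanted" > /dev/null
picked=$(git rev-parse HEAD)

# Later, merge everything from John; the change that was already
# picked is identical on both sides, so the merge is automatic.
git merge -q --no-edit john

echo "original commit: $wanted"
echo "picked commit:   $picked"
ls
```

The two commit ids printed at the end differ even though both commits add the same file, because the picked commit has a different parent.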


Wednesday, September 05, 2007

Distributed Channels in Factor

Following on from my Channels implementation, I've now added 'Remote Channels'. These are distributed channels that allow you to access channels in separate Factor instances, even on different machines on the network. It's based on my Distributed Concurrency work.

A channel can be made accessible by remote Factor nodes using the 'publish' word. Given a channel this will return a long Id value that can be used by remote nodes to use the channel. For example:
<channel> [ sieve ] spawn drop publish .
=> "ID12345678901234567890....."
From a remote node you can create a <remote-channel> which contains the hostname and port of the node containing the channel, and the Id of that channel:
"foo.com" 9000 <node> "ID1234..." <remote-channel>
You can use 'from' and 'to' on the remote channel exactly as you can on normal channels. The data is marshalled over the network using the serialization library.
Remote channels are implemented using distributed concurrency so you must start a node on the Factor instance you are using. This is done with 'start-node' giving the hostname and port:
"foo.com" 9000 start-node
Once this is done all published channels become available. Note that the hostname and port must be accessible by the remote machine so it can connect to send the data you request.

As an experiment I published the prime number sieve example mentioned in my last post. It's running on one of my servers. To make it easy to create a <remote-channel> without needing to know the hostname and port I serialized the <remote-channel> instance, saved it in a file and made it available as [...server down sorry...].

You can load this file into Factor, deserialize it and get the <remote-channel> instance. You can then call 'from' on it to get the next prime number in the series. Until they get so big that my Factor instance is DoS'd of course! The code to do this is:
USING: serialization http.client 
channels.remote concurrency.distributed ;

"yourhostname-or-ip-address.com" 9000 start-server
"[server-down-sorry]/prime.ser" http-get-stream 2nip
[ deserialize ] with-stream
dup from .
dup from .
...etc...
The '9000' can be any port number openly accessible on your machine. A current Factor bug means you may get an error in 'start-server' about an address already assigned if you run Linux. This is due to an interaction with IPv6. You can ignore it, as the server will start fine. 'start-server' needs to be run whenever you start your Factor instance.

The 'dup from .' duplicates the <remote-channel>, gets the next number from it and prints it. It may not be in sequence as other users may have gotten the next number before you.

There is a lot of room for improvement and additions to the code. Feel free to hack at it and send in patches. Let me know some ideas on how this could be used in 'real world' applications.


Tuesday, September 04, 2007

Concurrent Channels for Factor

Rob Pike gave a talk at Google about the NewSqueak programming language, and specifically how its concurrency features work. NewSqueak uses channels and is based on the concepts of Communicating Sequential Processes.

To play around with some of the ideas in the video I created a quick implementation of channels for Factor and converted a couple of the NewSqueak examples.

The <channel> word creates a channel that threads can send data to and receive data from. The 'to' word sends data to a channel. It is a synchronous send and blocks waiting for a receiver if there is none. The 'from' word receives data from a channel. It will block if there is no sender.

There can be multiple senders waiting, and if a thread then receives on the channel, a random sender will be released to send its data. There can also be multiple receivers blocking. A random one is selected to receive the data when a sender becomes available.

I'm not sure if I've got the best stack effect ordering for the words but it gives something to experiment with and work out the best interface. I've not yet done 'select' or 'mux' words.

Here is the 'counter' example ported from NewSqueak:
: (counter) ( channel n -- )
[ swap to ] 2keep 1+ (counter) ;

: counter ( channel -- )
2 (counter) ;

: counter-test ( -- n1 n2 n3 )
<channel> [ counter ] spawn drop
[ from ] keep [ from ] keep from ;
Given a channel, the 'counter' word will send numbers to it, incrementing them by one, starting from two. 'counter-test' creates a channel, spawns a process to run 'counter', and then receives a few values from the channel.

It'll be interesting to compare how well CSP works in a concatenative language vs the message passing concurrency library I implemented previously. The CSP implementation is simple enough that I should be able to get it going in the Factor->Javascript system fairly easily. That would give CSP in the browser.
