Dynamic Code Generation and Image Files
An image based programming language like Factor stores executable code in an image file. When Factor starts up it loads the image and starts executing the code stored in it.
I wanted to explore some ideas with images and dynamic code generation so wrote a simple example on how to generate assembly, store it in an image and later load and run it. The same principle is used to generate machine code at runtime and execute it.
As a start I wanted to generate the equivalent of a simple function in the x86 architecture that returns an integer value. The equivalent of:
Allocating a memory area with 'malloc', loading the image into it, and then casting the memory pointer to a function pointer and calling it seems the obvious step but it won't work.
Modern CPUs distinguish between code and data. Code cannot be executed from within a memory area which is not marked as executable. We need to allocate a memory area specifically marked as being executable. The way to do this is different between operating systems. For this example I'm running on Windows and VirtualAlloc is the needed call:
The following pseudo-ish code loads the image, and calls it:
Real code is a bit more complicated than that simple function. It has branches and jumps. To handle this I made the image file format more structured. At the beginning is the size of the generated machine code, the machine code data itself and then some relocation records.
The runtime loaded in C loads the image data first, then walks through the relocation records patching up any references in the code to the address where the image was loaded. This needs to be done if there are any absolute jumps or references in the image to other parts of the image as the image could be loaded at a different address from where it was first generated.
The relocation records exist as a number of 'labels'. Each label contains the offset in the image file for that label, and a number of addresses that reference the label. Those references are updated to be the actual address of the label. Something like this:
I wrote some Javascript code to generate the machine code. It's not a full X86 assembler in Javascript, but just enough to generate the 3-4 instructions I use as a proof of concept. The Javascript to generate the machine code looks like this:
This assembler is very simple. It basically stores the generated bytes for each instruction in an array in the Image object. When saved the Image object writes the generated bytes and the relocation data if there is any.
The javascript code is in image.js, x86assembler.js, armassembler.js and generate.js. It can be run with the Rhino Javascript interpreter. The full distribution is in image1.tar.gz and includes the Rhino Javascript interpreter JAR file. A darcs repository with the code if you want to play with it and send patches is at:
It's an interesting exercise to generate this sort of thing from scratch to get an idea of how systems like Factor do it. An interesting approach might be to use Factor to generate small images for use on memory constrained devices with a tiny C runtime loader.
Although I haven't tried it this might work on Symbian devices as well. Symbian devices are ARM based but their applications are really DLLs that are loaded by the main phone application. By writing a DLL C loader that reads an executes the image file, or different images for each exposed virtual function from the DLL, you could bootstrap a simple Symbian program.
The code I wrote here is very much 'hack away until it worked' code. The assemblers in particular are ugly but it should be just enough to give you an idea of how it works if you want to go off and do something similar.
Categories: javascript, programming, wince
I wanted to explore some ideas with images and dynamic code generation so wrote a simple example on how to generate assembly, store it in an image and later load and run it. The same principle is used to generate machine code at runtime and execute it.
As a start I wanted to generate the equivalent of a simple function in the x86 architecture that returns an integer value. The equivalent of:
int func(void) { return 42; }To see the assembly code this generates I used gcc to compile it and saved the temporary files:gcc -c -save-temps test.cThe relevant part of the output was:
_func:Using objdump I looked at the generated machine code with:
pushl %ebp
movl %esp, %ebp
movl $42, %eax
popl %ebp
ret
objdump -D test.oGiving:
00000000 <_func>:For the first simple test I just wrote some code that output the relevant bytes into a file called 'image.x86'. Now I needed a program to load and execute the code.
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: b8 2a 00 00 00 mov $0x2a,%eax
8: 5d pop %ebp
9: c3 ret
Allocating a memory area with 'malloc', loading the image into it, and then casting the memory pointer to a function pointer and calling it seems the obvious step but it won't work.
Modern CPUs distinguish between code and data. Code cannot be executed from within a memory area which is not marked as executable. We need to allocate a memory area specifically marked as being executable. The way to do this is different between operating systems. For this example I'm running on Windows and VirtualAlloc is the needed call:
void* allocate_code_buffer(int size) {
return VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
}
void free_code_buffer(void* addr) {
VirtualFree(addr, 0, MEM_RELEASE);
}
void flush_icache() {
FlushInstructionCache(GetCurrentProcess(), 0, 0);
}
Also needed is VirtualFree to free the memory area when we no longer need it. Modern CPUs also cache some memory areas. If the memory area where the generated code is stored is cached then our new code in that area won't be seen by the CPU unless we flush the cache. That's what 'FlushInstructionCache' does. This isn't actually needed on x86 machines but on ARM it is. Windows on x86 provides the function anyway and it's good practice to call it so the code is portable.The following pseudo-ish code loads the image, and calls it:
void* load_image(FILE* file) {
void* image = 0;
image = allocate_code_buffer(1024);
fread(image, 1024, 1, file);
flush_icache();
return image;
}
typedef int (*entry_func)(void);
void run_image(void* image) {
entry_func func = (entry_func)image;
printf("Result is: %d\n", func());
}
FILE* file = fopen("image.x86", "rb");
void* image = load_image(file);
fclose(file);
run_image(image);
free_image(image);Once the small C loader is written it can be used with any generated image files. Real code is a bit more complicated than that simple function. It has branches and jumps. To handle this I made the image file format more structured. At the beginning is the size of the generated machine code, the machine code data itself and then some relocation records.
The runtime loaded in C loads the image data first, then walks through the relocation records patching up any references in the code to the address where the image was loaded. This needs to be done if there are any absolute jumps or references in the image to other parts of the image as the image could be loaded at a different address from where it was first generated.
The relocation records exist as a number of 'labels'. Each label contains the offset in the image file for that label, and a number of addresses that reference the label. Those references are updated to be the actual address of the label. Something like this:
void relocate_image(FILE* file, void* image) {
unsigned long num_labels = 0;
int i = 0;
fread(&num_labels, sizeof(num_labels), 1, file);
for(i=0;i<num_labels;++i) {
unsigned long offset = 0;
unsigned long num_references = 0;
int j = 0;
fread(&offset, sizeof(offset), 1, file);
fread(&num_references, sizeof(num_references), 1, file);
for(j=0;j<num_references;++j) {
unsigned long ref = 0;
unsigned char* bimage = (unsigned char*)image;
fread(&ref, sizeof(ref), 1, file);
*(bimage+ref)=bimage+offset;
}
}
}I didn't fully implement jumps in this simple example but to prove the concept I do relocations for 'near relative' jumps. These could actually be computed before writing the image but I did it as part of the image relocation as I'd need to add that for absolute jumps later anyway.I wrote some Javascript code to generate the machine code. It's not a full X86 assembler in Javascript, but just enough to generate the 3-4 instructions I use as a proof of concept. The Javascript to generate the machine code looks like this:
var image = new Image(LITTLEENDIAN);This snippet generates code to load EAX with 42, EBX with 5, and jumps to the label 'end' which results in returning from the function. If you remove the jump, or comment it out, it adds the two numbers together before returning. The resulting image file can be run with the C loaded and prints out the integer value returned by this snippet (which is the result stored in the EAX register).
var assembler = new X86Assembler(image);
assembler.move(42, EAX)
.move(5, EBX)
.jmpNear("end")
.adc(EBX,EAX)
.label("end")
.ret();
image.save("image.x86");
This assembler is very simple. It basically stores the generated bytes for each instruction in an array in the Image object. When saved the Image object writes the generated bytes and the relocation data if there is any.
The javascript code is in image.js, x86assembler.js, armassembler.js and generate.js. It can be run with the Rhino Javascript interpreter. The full distribution is in image1.tar.gz and includes the Rhino Javascript interpreter JAR file. A darcs repository with the code if you want to play with it and send patches is at:
darcs get http://www.bluishcoder.co.nz/repos/image1A simple test run is:
java -jar js.jar generate.jsI also did an ARM version which can be tested on a Windows CE machine. I tested it on the Microsoft Windows Mobile 5 Emulator with run.c compiled with 'cegcc'. In that case I ran the test.arm image which is simple code generated with:
gcc -o run_image run.c
run_image image.x86
var image = new Image(LITTLEENDIAN);This code returns '42' in the R0 register. It's exactly the code generated with the test function described before on x86, but for ARM, and I worked it out using the same compile/objdump method but using cegcc for the compiler.
var assembler = new ARMAssembler(image);
assembler.moveRsRd(arm.SP, arm.IP)
.stmfd(arm.SP, [1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0])
.sub(arm.IP, arm.FP, 4)
.moveImmediate(42, arm.R3)
.moveRsRd(arm.R3, arm.R0)
.ldmfd(arm.SP, [1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0]);
image.save("image.arm");
It's an interesting exercise to generate this sort of thing from scratch to get an idea of how systems like Factor do it. An interesting approach might be to use Factor to generate small images for use on memory constrained devices with a tiny C runtime loader.
Although I haven't tried it this might work on Symbian devices as well. Symbian devices are ARM based but their applications are really DLLs that are loaded by the main phone application. By writing a DLL C loader that reads an executes the image file, or different images for each exposed virtual function from the DLL, you could bootstrap a simple Symbian program.
The code I wrote here is very much 'hack away until it worked' code. The assemblers in particular are ugly but it should be just enough to give you an idea of how it works if you want to go off and do something similar.
Categories: javascript, programming, wince

12 Comments:
wow, the code generation from both of those compilers really sucks!
x86 only needs "mov #42,eax;ret" and ARM only needs "mov #42,r0;branch to link register"
To Gliderguider:
C:\temp>cat 42.c
int func(void) { return 42;}
C:\temp>gcc -S -fomit-frame-pointer 42.c -o 42.s
C:\temp>cat 42.s
.file "42.c"
.text
.globl _func
.def _func; .scl 2; .type 32; .endef
_func:
movl $42, %eax
ret
Thanks anonymous. glider, I didn't use any optimisation switches at all, hence the extra generated code.
Did you consider generating a platform-format shared library instead of creating your own format for relocations etc?
Generating a platform specific shared library did cross my mind.
One of the things that kept going through my head as I was doing the relocation code was "doesn't the platform already do this for me?".
I'm exploring image based environments like Smalltalk, etc which allow saving images as works in progress.
So I need to eventually be able to store references to live objects, handle persisting and restoring network connections, etc.
I could probably fit all that into a shared library as well by writing out the specific format for the platform and including the code to do the restoring of the objects - it's something to look at.
There is at least one variant of Smalltalk that generates shared libraries rather than an image but I don't know if they save a running 'world' with live objects.
One downside might be that virus protection programs tend to get active when they see DLL's and EXE's change - which they will when the image is resaved.
Wouldn't it be simpler to use any of those runtime code generators available? GNU lightning and libjit come to mind.
Also, have you looked at Sassy ?
Yes it would be simpler, but then I wouldn't learn how to do it for myself. For 'real world' usage I'd probably go for libjit, etc.
One issue would be whether those tools allowed persisting the code to an image. I'm sure at least one of them probably allows that sort of thing.
Sassy looks interesting, thanks for the link!
Browsing libjit's documentation I see they do have a way of storing an on-disk representation of the compiled code using elf format, nice:
You've not got much spare time, obviously, but I do wish you'd ask Nokia to give you an n800 so as to get this working there, also!
If a Nokia N800 came my way i'd definitely work on getting it going on that!
It's an ARM CPU so should be pretty easy to get going. The only change would be the memory allocation routines - probably to use mmap if the N800 has that.
Here's another runtime assembler:
http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=64097
Cheers!
This was a very insightful post for me. I've been trying to figure out the whole dynamic code generation/execution concept for a while now, but I guess staring at source code implementations (including factor's) didn't make the penny drop.
Thank you for bringing knowledge to the ignorant! :)
Post a Comment
<< Home