@pete
Created February 4, 2009 06:13
/*
Summary paragraph for this big comment block:
This is a test for only compiling the text segment of a program, manually
"linking" (by which I mean just generating the file with addresses from the
host program) so that no data section is needed. I think this is a more
correct approach, as we only need the compiler to compile code; data
allocation, symbol resolution, etc., can/should be done by Roboto proper.
Details:
This is, I believe, the final proof-of-concept needed for the Roboto
compiler. No idea if this has occurred to anyone else before, but following
the line of reasoning that
  1. All of the data is generated by the Roboto compiler, and thus can be
     allocated and kept in the host binary (the one that does the
     compiling and loading), so there's no need for a data section. We
     just allocate the data internally, and #define (for readability,
     rather than stuffing raw addresses all over) the address in the
     generated code.
  2. Since the generated code will only be calling functions defined by
     Roboto's and C's standard libraries, we can do exactly the same
     thing for functions that we do for (say) strings, meaning no linking
     step is needed at all. We can use #define as an ad-hoc "linking
     step".
  3. If we eliminate .data and .bss altogether, then all the compiler has
     to do is turn portable C into assembly, removing machine dependence
     from Roboto.
leads us to a straightforward, easy to understand, non-hackish, and very
portable solution to the problem: generate a C file that doesn't require
data/bss sections, compile it, dump the text section, copy it to memory, and
run it from there. It at least runs on Linux under x86-64 and ARM, so I
think any problems for other chips or Unix-like OSs will be minor and easy to
fix. (For example, I imagine that the semantics around mprotect may be
different on OS X. I plan on checking this out tonight.)
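For illustration, the "#define as linker" step could be sketched roughly like
this. It is only a sketch: emit_define is a hypothetical helper (the program
underneath just uses a single fprintf), and it assumes that a pointer value
fits in an unsigned long.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper, not part of the program below.  Writes one line of
// the form "#define NAME ((TYPE)0xADDR)" into buf and returns whatever
// snprintf returns: the length of the full line, or a negative value on
// error.  Assumes a pointer value fits in an unsigned long.
int emit_define(char *buf, size_t size, const char *name,
		const char *type, unsigned long addr)
{
	return snprintf(buf, size, "#define %s ((%s)0x%lx)\n",
			name, type, addr);
}
```

A call such as emit_define(line, sizeof line, "str", "char *",
(unsigned long)str) would then produce the line that makes str in the
generated code refer to the host's buffer.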
So, no bytecode VM is necessary to run code dynamically and portably. (Just
between you and me, I was actually getting pretty worried about this today.
I was almost ready to cross this approach off and decide on having Roboto
emit either Forth or bytecode for LLVM, the JVM, Rubinius, or Squeak.)
Drawbacks of this approach:
  1. Until I come up with a better way than the pretty mindless objcopy
     method, one function per object file.
  2. Without some hackery (which may be necessary...I think
     dlopen("/proc/$pid/exe") would do the trick), if we want a
     string-to-function name mapping that is readable, we need that
     mapping at compile time. I don't think this is as hard as it might
     look.
  3. Roboto won't run without a working C compiler and a place to dump
     temp files for the intermediate compile steps. I have a couple of
     solutions in mind, but those are miles down the road and for the
     beta release, Roboto will likely only work under Unix. (Doesn't run
     on Windows? Did I list that as a drawback?)
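As a rough sketch of what drawback 2 could look like without the dlopen
hackery, the readable name-to-address mapping can be a plain table that is
built when the host is compiled. Everything named here (struct symbol,
symbol_table, lookup_symbol) is made up for illustration, and the two entries
are just examples.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical table entry: a readable name plus the function's address.
// The address is stored as a generic function pointer and has to be cast
// back to the function's real type before it is called.
struct symbol {
	const char *name;
	void (*addr)(void);
};

// Illustrative entries only; a real table would list whatever the
// generated code is allowed to call.
static const struct symbol symbol_table[] = {
	{ "exit", (void (*)(void))exit },
	{ "puts", (void (*)(void))puts },
};

// Returns the address registered under name, or NULL if there is none.
void (*lookup_symbol(const char *name))(void)
{
	size_t i;

	for(i = 0; i < sizeof symbol_table / sizeof symbol_table[0]; i++)
		if(strcmp(symbol_table[i].name, name) == 0)
			return symbol_table[i].addr;
	return NULL;
}
```

The address returned by lookup_symbol("exit") is the kind of value that would
be written into the generated file as a #define.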
Please note, by the way, that SECURITY/STABILITY ARE NOT EVEN SORT OF TAKEN
INTO CONSIDERATION below, and everything that doesn't go towards proving the
concept is sloppy and hard-coded. It is a proof-of-concept (and a pretty
trivial one at that), after all. Nonetheless, it should be (somewhat)
readable, portable, and functional. (Fun game, kids: find the race
condition that potentially allows another local user to run code as you!
Hint: it requires that they be able to write to the directory that you run
this program in or that (your_umask & 022) != 022. Did I give it away?)
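One possible way to close that hole, sketched under assumptions rather than
implemented below: keep the intermediate files in a directory that only the
current user can reach. make_private_dir and the /tmp/roboto- prefix are
hypothetical names; mkdtemp is the standard POSIX call.

```c
// Ask for the POSIX.1-2008 declarations (mkdtemp) even in strict ISO C mode.
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: creates a fresh directory under /tmp that only the
// current user can access and copies its path into dir.  Returns 0 on
// success and -1 on failure.
int make_private_dir(char *dir, size_t size)
{
	char template[] = "/tmp/roboto-XXXXXX";

	if(size < sizeof template) return -1;
	if(!mkdtemp(template)) return -1;	// Created with mode 0700.
	strcpy(dir, template);
	return 0;
}
```

auto.c, auto.o, and auto.bin would then be written inside that directory
instead of the current one, and the directory removed once the code is loaded.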
*/
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <malloc.h>
#include <time.h>
#include <string.h>
#include <unistd.h>
#define ENGUGH 0x1000 // A tribute
char str[ENGUGH];
int yay_it_worked(const char* str) {
	return printf("Yay, it worked! HERE IS MY STRING: %s\n", str);
}
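int write_c_file()
{
	FILE *f = NULL;

	f = fopen("auto.c", "w");
	if(!f) exit(__LINE__);
	/* The arguments are cast so that they match the %ld and %lx
	   conversions. This assumes that a pointer fits in an unsigned long,
	   which holds on the Linux x86-64 and ARM targets mentioned above. */
	fprintf(f,
		"/* Built at %ld. */\n"
		"#define yay_it_worked ((int (*)(const char *))0x%lx)\n"
		"#define teleo ((void (*)(int))0x%lx)\n"
		"#define str ((char *)0x%lx)\n"
		"\n"
		"void function() {\n"
		"\tyay_it_worked(str);\n"
		"\tteleo(0);\n"
		"}\n"
		,
		(long)time(NULL),
		(unsigned long)yay_it_worked, (unsigned long)exit,
		(unsigned long)str
	);
	fclose(f);
	return 0;
}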
int build_c_file() {
	/* Stop if compilation fails, so that a stale auto.o is never used. */
	if(system("gcc -Os -c -fPIC auto.c -o auto.o") != 0) exit(__LINE__);
	return system("objcopy -O binary -j .text auto.o auto.bin");
}
void *load_c_file() {
	long page_size = sysconf(_SC_PAGE_SIZE);
	int i, fd;
	void *page = NULL;

	/* POSIX only specifies mprotect() for memory obtained from mmap(), so
	   map an anonymous page rather than taking one from the heap. */
	page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if(page == MAP_FAILED) exit(__LINE__);
	fd = open("auto.bin", O_RDONLY); if(fd < 0) exit(__LINE__);
	i = read(fd, page, page_size); if(i < 0) exit(__LINE__);
	close(fd);
	printf(__FILE__ ":%d: Function compiled to %d bytes.\n", __LINE__, i);
	i = mprotect(page, page_size, PROT_READ | PROT_EXEC);
	if(i < 0) exit(__LINE__);
	return page;
}
int main(int argc, char *argv[]) {
	void (*f)(void) = NULL;

	strcpy(str, "\"Testing, testing...\"");
	write_c_file();
	/* Don't load anything if the build step failed. */
	if(build_c_file() != 0) exit(__LINE__);
	f = (void (*)(void))load_c_file();
	f();
	return 0;
}