Created
February 4, 2009 06:13
| /* | |
| Summary paragraph for this big comment block: | |
| This is a test for only compiling the text segment of a program, manually | |
| "linking" (by which I mean just generating the file with addresses from the | |
| host program) so that no data section is needed. I think this is a more | |
| correct approach, as we only need the compiler to compile code; data | |
| allocation, symbol resolution, etc., can/should be done by Roboto proper. | |
| Details: | |
| This is, I believe, the final proof-of-concept needed for the Roboto | |
| compiler. No idea if this has occurred to anyone else before, but following | |
| the line of reasoning that | |
| 1. All of the data is generated by the Roboto compiler, and thus can be | |
| allocated and kept in the host binary (the one that does the | |
| compiling and loading), so there's no need for a data section. We | |
| just allocate the data internally, and #define (for readability, | |
| rather than stuffing raw addresses all over) the address in the | |
| generated code. | |
| 2. Since the generated code will only be calling functions defined by | |
| Roboto's and C's standard libraries, we can do exactly the same | |
| thing for functions that we do for (say) strings, meaning no linking | |
| step is needed at all. We can use #define as an ad-hoc "linking | |
| step". | |
| 3. If we eliminate .data and .bss altogether, then all the compiler has | |
| to do is turn portable C into assembly, removing machine dependence | |
| from Roboto. | |
| leads us to a straightforward, easy to understand, non-hackish, and very | |
| portable solution to the problem: generate a C file that doesn't require | |
| data/bss sections, compile it, dump the text section, copy it to memory, and | |
| run it from there. It at least runs on Linux under x86-64 and ARM, so I | |
| think any problems for other chips or Unix-like OSs will be minor and easy to | |
| fix. (For example, I imagine that the semantics around mprotect may be | |
| different on OS X. I plan on checking this out tonight.) | |
| So, no bytecode VM is necessary to run code dynamically and portably. (Just | |
| between you and me, I was actually getting pretty worried about this today. | |
| I was almost ready to cross this approach off and decide on having Roboto | |
| emit either Forth or bytecode for LLVM, the JVM, Rubinius, or Squeak.) | |
| Drawbacks of this approach: | |
| 1. One function per object file, at least until I come up with a | |
| better way than the pretty mindless objcopy method. | |
| 2. Without some hackery (which may be necessary...I think | |
| dlopen("/proc/$pid/exe") would do the trick), if we want a | |
| string-to-function name mapping that is readable, we need that | |
| mapping at compile time. I don't think this is as hard as it might | |
| look. | |
| 3. Roboto won't run without a working C compiler and a place to dump | |
| temp files for the intermediate compile steps. I have a couple of | |
| solutions in mind, but those are miles down the road and for the | |
| beta release, Roboto will likely only work under Unix. (Doesn't run | |
| on Windows? Did I list that as a drawback?) | |
| Please note, by the way, that SECURITY/STABILITY ARE NOT EVEN SORT OF TAKEN | |
| INTO CONSIDERATION below, and everything that doesn't go towards proving the | |
| concept is sloppy and hard-coded. It is a proof-of-concept (and a pretty | |
| trivial one at that), after all. Nonetheless, it should be (somewhat) | |
| readable, portable, and functional. (Fun game, kids: find the race | |
| condition that potentially allows another local user to run code as you! | |
| Hint: it requires that they be able to write to the directory that you run | |
| this program in or that (your_umask & 022) != 022. Did I give it away?) | |
| */ | |
| #include <stdlib.h> | |
| #include <stdio.h> | |
| #include <sys/stat.h> | |
| #include <sys/types.h> | |
| #include <sys/mman.h> | |
| #include <fcntl.h> | |
| #include <malloc.h> | |
| #include <time.h> | |
| #include <string.h> | |
| #include <unistd.h> | |
| #define ENGUGH 0x1000 // A tribute | |
| char str[ENGUGH]; | |
| int yay_it_worked(const char* str) { | |
| return printf("Yay, it worked! HERE IS MY STRING: %s\n", str); | |
| } | |
| int write_c_file() | |
| { | |
| FILE *f = NULL; | |
| f = fopen("auto.c", "w"); | |
| if(!f) exit(__LINE__); | |
| /* %ld and %lx expect long-sized arguments, so every argument is cast explicitly. */ | |
| fprintf(f, | |
| "/* Built at %ld. */\n" | |
| "#define yay_it_worked ((int (*)(const char *))0x%lx)\n" | |
| "#define teleo ((void (*)(int))0x%lx)\n" | |
| "#define str ((char *)0x%lx)\n" | |
| "\n" | |
| "void function() {\n" | |
| "\tyay_it_worked(str);\n" | |
| "\tteleo(0);\n" | |
| "}\n" | |
| , | |
| (long)time(NULL), | |
| (unsigned long)yay_it_worked, (unsigned long)exit, (unsigned long)str | |
| ); | |
| fclose(f); | |
| return 0; | |
| } | |
| int build_c_file() { | |
| /* Stop if the compile fails, so objcopy never runs on a stale auto.o. */ | |
| if(system("gcc -Os -c -fPIC auto.c -o auto.o") != 0) return -1; | |
| return system("objcopy -O binary -j .text auto.o auto.bin"); | |
| } | |
| void *load_c_file() { | |
| long page_size = sysconf(_SC_PAGE_SIZE); | |
| int i, fd; | |
| void *page = NULL; | |
| page = memalign(page_size, page_size); if(!page) exit(__LINE__); | |
| fd = open("auto.bin", O_RDONLY); if(fd < 0) exit(__LINE__); | |
| i = read(fd, page, page_size); if(i <= 0) exit(__LINE__); // 0 bytes means objcopy emitted no code | |
| close(fd); | |
| printf(__FILE__ ":%d: Function compiled to %d bytes.\n", __LINE__, i); | |
| i = mprotect(page, page_size, PROT_READ | PROT_EXEC); | |
| if(i < 0) exit(__LINE__); | |
| return page; | |
| } | |
| int main(int argc, char *argv[]) { | |
| void (*f)() = NULL; | |
| strcpy(str, "\"Testing, testing...\""); | |
| if(write_c_file() != 0) exit(__LINE__); | |
| if(build_c_file() != 0) exit(__LINE__); // don't load a missing or stale auto.bin | |
| f = load_c_file(); | |
| f(); | |
| return 0; | |
| } |
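The header comment wonders whether the mprotect semantics differ on other systems. POSIX only specifies mprotect for memory obtained from mmap, so calling it on a memalign'd heap page, as load_c_file does, works on Linux but is not guaranteed elsewhere. Below is a minimal sketch of a more portable loader, assuming the raw text section has already been read into a buffer. The name load_code and its signature are hypothetical and not part of the gist.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the original gist): copy `len` bytes of
 * machine code into fresh page-aligned memory, then mark it read + execute.
 * Returns NULL on failure; on success stores the mapping size in *mapped_len
 * so the caller can munmap() it later.
 */
static void *load_code(const void *code, size_t len, size_t *mapped_len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || len == 0)
        return NULL;

    /* Round the request up to a whole number of pages. */
    size_t ps = (size_t)page_size;
    size_t size = (len + ps - 1) / ps * ps;

    /* Map the pages writable first so the code can be copied in. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Drop write access before the pages become executable. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

    *mapped_len = size;
    return mem;
}
```

In the gist's flow, the contents of auto.bin would be read into a buffer and passed to load_code, and the returned pointer cast to void (*)() before the call. Unlike the single-page read in load_c_file, this also handles functions larger than one page.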