Created
March 28, 2011 01:25
Java Byte Code to Parrot Byte code Proposal for GSoC 2011
Ben Batha
Email: [email protected]
Google ID: applied under: elektronjunge
Preferred contact: bhbatha
Other contact info: bbatha on irc.parrot.org #parrot
Your Project's Title: Java Byte Code to Parrot
Abstract
The focus of this project is to build a translator from Java bytecode to a Parrot-friendly target (PIR).
It will parse JVM code, using a translation table where possible and AST-based code generation where the
table falls short. The resulting compiler should pair with an existing Java standard library for new code
development and should run previously written JVM applications.
Benefits to the Parrot VM and Open Source Community
This project would be a boon to both the Parrot community and the larger open source community. The JVM is
becoming a target for many languages due to the popularity of Java. Unfortunately, the JVM is largely
targeted at Java development and is limiting for other languages. For instance, the JVM does not support
dynamic typing, which makes porting many scripting languages to the JVM difficult. Parrot can fill the same
role as the JVM in many cases, but without the weight of Java and Oracle to drag it down. Doing a direct
translation of Java bytecode would guarantee compatibility with existing Java applications and would include
support for existing JVM languages like Scala and Groovy "for free." This foundation, though not optimal for
Java ports, allows for simple porting of old code and guarantees a robust solution that can keep up with the
development of new JVM languages.
Deliverables
A library with a table and functions for conversion from Java bytecode to PIR.
A tool for .class file conversion. This tool has several pieces, which are described in detail in the
Project Details section.
A tool for .jar file conversion. This may or may not be completed due to time constraints.
A test suite of translated example libraries and programs, using JUnit and Rosella where appropriate.
Complete documentation, using Javadoc and included HTML pages, with a particular focus on how to extend the
project so future improvements can be made.
Build infrastructure for major platforms. Development focus will be Mac OS X and Linux.
Project Details
The project will consist of three pieces.
1. A table-based translator -- this will be broken into several pieces
   a) expression converter -- converts assignments, register allocations, operations, etc.
   b) method converter -- converts calls and definitions
   c) class converter -- converts class definitions and object creation
   d) constant pool converter -- converts the Java constant pool to Parrot-compatible types and decodes
      its modified UTF-8 strings.
2. Class and jar file conversion tools
3. Test and build suite
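As a rough illustration of the table-based approach in item 1, here is a minimal sketch in Python (chosen only for brevity; this proposal does not fix an implementation language). It tracks the JVM operand-stack depth at translation time so that each stack slot maps to a PIR integer register. The opcode numbers are from the JVM specification; the `TABLE` layout, the `translate` function, the register numbering, and the emitted text are hypothetical, and the output has not been run through Parrot. It handles only straight-line integer code with operand-free opcodes.

```python
# Hypothetical translation table: JVM opcode -> (mnemonic, (action, argument)).
# Opcode values are from the JVM specification; only a tiny int subset is listed.
TABLE = {
    0x03: ("iconst_0", ("push_const", 0)),
    0x04: ("iconst_1", ("push_const", 1)),
    0x1A: ("iload_0", ("load", 0)),
    0x1B: ("iload_1", ("load", 1)),
    0x3B: ("istore_0", ("store", 0)),
    0x3C: ("istore_1", ("store", 1)),
    0x60: ("iadd", ("binop", "+")),
    0x64: ("isub", ("binop", "-")),
    0x68: ("imul", ("binop", "*")),
    0xAC: ("ireturn", ("return", None)),
}


def translate(code, max_locals):
    """Walk operand-free bytecode linearly and emit PIR-like lines.

    Assumed register scheme: locals use $I0..$I(max_locals-1) and operand-stack
    slot k uses $I(max_locals + k).
    """
    def local(n):
        return "$I%d" % n

    def slot(k):
        return "$I%d" % (max_locals + k)

    depth = 0  # simulated operand-stack depth
    out = []
    for op in code:
        if op not in TABLE:
            # This is where a real tool would fall back to another strategy.
            raise ValueError("no table entry for opcode 0x%02x" % op)
        _mnemonic, (action, arg) = TABLE[op]
        if action == "push_const":
            out.append("%s = %d" % (slot(depth), arg))
            depth += 1
        elif action == "load":
            out.append("%s = %s" % (slot(depth), local(arg)))
            depth += 1
        elif action == "store":
            depth -= 1
            out.append("%s = %s" % (local(arg), slot(depth)))
        elif action == "binop":
            depth -= 1  # two operands popped, one result pushed
            out.append("%s = %s %s %s" % (slot(depth - 1), slot(depth - 1), arg, slot(depth)))
        elif action == "return":
            depth -= 1
            out.append(".return (%s)" % slot(depth))
    return out


# Body of a method like `static int add(int a, int b) { return a + b; }`
for line in translate(bytes([0x1A, 0x1B, 0x60, 0xAC]), max_locals=2):
    print(line)
```

Branches, opcodes with inline operands, and non-integer types are deliberately left out; those are the cases where a plain table lookup may stop being sufficient.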
The table-based translator will be implemented as a library designed to give simple access to the generated
PIR statements, so that future projects, such as a CLI converter or a Java bytecode to PBC library, do not
have to rewrite this code. The class file converter will use the library to look up appropriate translations
for the files it reads and will output the result. The jar file converter will run the class file converter
on the contained classes and resolve the jar's classpath and manifest into Parrot equivalents.
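The first steps of both converters can be sketched as follows, again in Python and only as an assumption about how the tools might start. A class file begins with the magic number 0xCAFEBABE followed by big-endian minor version, major version, and constant pool count fields, and a jar is a zip archive. The function names `read_class_header` and `class_entries` are hypothetical.

```python
import struct
import zipfile

CLASS_MAGIC = 0xCAFEBABE  # magic number defined by the JVM class file format


def read_class_header(data):
    """Return (minor_version, major_version, constant_pool_count)."""
    if len(data) < 10:
        raise ValueError("too short to be a class file")
    # u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count
    magic, minor, major, cp_count = struct.unpack(">IHHH", data[:10])
    if magic != CLASS_MAGIC:
        raise ValueError("bad magic number: 0x%08X" % magic)
    return minor, major, cp_count


def class_entries(jar_file):
    """List the .class members of a jar (a path or a binary file object)."""
    with zipfile.ZipFile(jar_file) as jar:
        return sorted(name for name in jar.namelist() if name.endswith(".class"))


header = bytes.fromhex("cafebabe00000032000a")
print(read_class_header(header))
```

Major version 50 corresponds to Java SE 6 and 51 to Java SE 7, so this header check is also one place where the converter could decide which class file versions it accepts.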
The test and build suite will have tests for a variety of conversions from Java bytecode. These will depend
on how the project develops and will be chosen at a later date. There will be tests for each completed
feature of the converter. The build system will build on Unix-based systems with Parrot and Java installed
and will integrate with the tests.
<Problem Resolution>
1. Plan ahead to foresee potential problems
2. Interface with Parrot and Java compiler communities for assistance
3. Try different implementations of problematic code sections
4. Revert to backup plan
<Backup plan>
If it seems impossible to do a direct translation via a table, I will need to develop a compiler that parses
Java bytecode, builds an AST, and generates code from it. Tentatively, this can be done using PAST and
other Parrot tools. Criteria for making this decision will be based on the number of incompatible features
and the scope of these features. The scope of these problems will be determined by how other translations are
Project Schedule
March 27 - April 8:
Finish proposal. Get familiar with the Java bytecode specification and code bases. Begin to get familiar
with Parrot. Discuss proposal with mentors.
April 8 - May 23:
Continue familiarizing myself with Parrot internals and the Java bytecode specification. Determine the best
route for the development of a parser. Determine the viability of each of the Java standards (Java 7 versus
Java 6). Discuss these with my mentor and the Parrot community. Find contacts in the Java community to
discuss with as well. Begin to find places where direct translation is impossible.
May 23: Begin work.
May 30: Write file reading framework. Begin expression translation.
Milestone: Reads Java bytecode. Translates basic expressions. Document this.
June 6: Finish expression translation. Control flow translation.
Milestone: Finish up work from previous week. Expression and control flow translation. Tests to go with
this. Documentation matches completed work.
June 13: Potential absence for the work week. I should be around on the weekends to make up for any
major setback to the project. The worst case is that I will be a week behind. If there is no absence, then
I can begin work on more advanced Java features sooner, or have more time for debugging, etc.
June 20: Begin working on method translation.
Milestone: Translation of basic methods.
June 27: Method translation of more complex cases (polymorphism, etc.). Begin work on class translation.
Milestone: Translation of complex methods. Documentation for method translator. Test suite includes
tests of method translation.
July 4: Continue work on translation of classes.
Milestone: Simple class translation (no inheritance, interfaces, static or abstract members, etc.).
July 11: Interim review. Finish class translation for more complex cases.
Milestone: Basic code should convert properly. Test suite for classes and previous functions.
Documentation to this point.
July 18: Begin work on conversion of the constant pool. Finish remaining work on class translation,
documentation, and tests.
Milestone: Classes complete.
July 25: Continue work on converting the constant pool. Begin finding an algorithm to de-mangle UTF-8
strings.
Milestone: Non-UTF-8 strings convert properly. Tests for strings of different types.
August 1: Begin work on jar file conversion.
Milestone: Proper conversion of UTF-8 strings. Update tests and documentation to reflect changes.
August 8: Make a build system around jar file conversion. Find or create test code to test beyond
JUnit tests. Finish any incomplete sections of the project.
Milestone: Finish jar file conversion.
August 15: FEATURE FREEZE. Finish debugging and continue to find and create test cases.
Milestone: Complete build and test system. Complete documentation.
August 22: Final submission.
Milestone: Complete GSoC project.
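The string milestones for July 25 and August 1 concern the "modified UTF-8" encoding that class files use for constant pool strings. It differs from standard UTF-8 in two ways: U+0000 is stored as the two bytes C0 80, and characters above U+FFFF are stored as a surrogate pair with each half encoded in three bytes. The following Python sketch shows one possible way to de-mangle such strings; the function name `decode_modified_utf8` is hypothetical, and a Parrot-side implementation would need its own equivalent.

```python
def decode_modified_utf8(raw):
    """Decode a class file's modified UTF-8 bytes into a normal string."""
    # U+0000 is stored as C0 80. The byte C0 cannot occur anywhere else in
    # well-formed input, so a plain byte replacement is safe.
    raw = raw.replace(b"\xc0\x80", b"\x00")
    # 'surrogatepass' lets the UTF-8 decoder accept the 3-byte surrogate halves.
    text = raw.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each surrogate pair into one code point;
    # an unpaired surrogate, which Java strings may contain, is passed through.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")


# An embedded NUL followed by U+1F600, both in their modified UTF-8 forms.
sample = b"A\xc0\x80" + b"\xed\xa0\xbd\xed\xb8\x80"
print([hex(ord(ch)) for ch in decode_modified_utf8(sample)])
```

Strings that contain neither an embedded NUL nor supplementary characters are byte-for-byte identical in both encodings, which matches the schedule's split between the earlier and the later string milestone.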
References and Likely Mentors
???
IRC Discussions:
Discussed with Whiteknight
License
All Parrot projects should use the Artistic 2.0 license.
Bio
I am a sophomore computer science student at the University of Rochester. I have taken the following
computer science classes:
Programming Language Design and Implementation, which covered the design of compiler front ends and the
basics of optimization. In this class I wrote a compiler for C.
Computation and Formal Systems
Data Structures and Algorithms
Artificial Intelligence
Discrete Math
Linear Algebra & Differential Equations
Calculus I & II
I have experience with Java, C/C++, Python, Ruby, FORTRAN, and a host of small scientific languages, and I
am familiar with Linux/Unix. While I am not familiar with open source development, I do use a variety
of open source software and have participated in private sector software development. I have been
responsible for projects of this size in my previous employment at Los Alamos National Laboratory, where
over three summers I completed three projects:
1) Rewrote an old hydrodynamic code from FORTRAN77 to FORTRAN95.
2) Developed a graphing tool for use with a rad-hydro code.
3) Developed a graphing tool and scripting interface for use with a different rad-hydro code.
Eligibility
I am an eligible student who is a U.S. citizen and has the documentation to prove it.
See my most recent comments on https://gist.github.com/889902. Those comments are relevant to this proposal as well.
Hi, is the code ready? Can we see the code somewhere? Or, if it is not ready, when can we see it?
This project wasn't accepted to GSOC. I had another job so I did no work on it.
Hmm... too sad. Anyway, the project did look promising to me. Well, best of luck to you and thanks for your reply.
Do we need anything as heavy as a real "parser" to read bytecode? It seems like once we have a tool for reading bytecode and finding the code segments, we would be able to walk the bytecode in a linear way to do table-based translations.
PBC is too hard a target to use for this project; we don't have a good PBC generation tool besides Parrot itself (creating a good library to do that would be the same size as a GSoC project, and might even make a good project idea in its own right). What is your target language going to be? PIR? Something higher-level like Winxed/NQP? Explain your decision.
For June 13 you mention a potential absence. Explain how that will affect your schedule.
Break up some of the deliverables into smaller milestones: How long is it going to take you to read the constant pool and translate that? The class definitions?
You mention you might fall back from a table-based translator to a PAST-based translator. When will you make that decision? What will be your criteria for deciding? The earlier, the better because this is a major change in goals. You should probably provide two timelines since the two projects will be very different.
Also, is this translator going to work on .class files? .jar files?
Java Class files do weird mangling of UTF-8 strings in the constant pool. Is un-mangling these string literals something that is part of your goals for the summer, or do we leave that for a later time (I ask because this could turn out to be non-trivial, and we want to know what your priorities are in case time gets tight).