@cotto
Created April 27, 2012 08:02
PACT GSoC proposal

Project Description

There are three main portions to this project: a library to represent Parrot bytecode (PBC) as a set of classes and compile them to a file, an assembly language that uses that library, and a disassembler that produces the new assembly language. The assembler and disassembler act as proofs of concept for the library, but should be useful tools rather than just toys.

The library should have classes that represent a packfile, subs, and instructions (opcode and arguments). These classes should be used as the model that the library can convert to and from normal PBC files. This provides a far more friendly interface than the standard Packfile PMCs.
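
As a rough illustration of the object model described above, the sketch below uses Python only as neutral notation; the project itself proposes Winxed. All class, field, and opcode names here are hypothetical and are not taken from PACT or from the Packfile PMCs.

```python
from dataclasses import dataclass, field
from typing import List, Union

# An argument is kept deliberately loose here: a register name,
# a constant index, or a literal. (Assumption for illustration only.)
Arg = Union[int, float, str]


@dataclass
class Instruction:
    """One opcode plus its arguments."""
    opcode: str
    args: List[Arg] = field(default_factory=list)


@dataclass
class Sub:
    """A named sub holding an ordered list of instructions."""
    name: str
    instructions: List[Instruction] = field(default_factory=list)

    def add(self, opcode: str, *args: Arg) -> Instruction:
        ins = Instruction(opcode, list(args))
        self.instructions.append(ins)
        return ins


@dataclass
class Packfile:
    """Top-level model: constant tables plus the subs that use them."""
    pmc_constants: List[object] = field(default_factory=list)
    float_constants: List[float] = field(default_factory=list)
    subs: List[Sub] = field(default_factory=list)

    def add_sub(self, name: str) -> Sub:
        sub = Sub(name)
        self.subs.append(sub)
        return sub

    def instruction_count(self) -> int:
        return sum(len(s.instructions) for s in self.subs)


# Usage: build a tiny packfile model in memory.
pf = Packfile()
main = pf.add_sub("main")
main.add("say", "hello")   # placeholder opcode names
main.add("end")
```

A model along these lines would be the piece that gets converted to and from PBC; the conversion itself is omitted because it depends on the Packfile PMC interfaces.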

The assembly language is intended to be a simple textual representation of bytecode. It mirrors the multiple segments of the packfile itself: a PMC constant section, a float constant section, then a bytecode section. The bytecode section will contain sub markers and labels so that programmers don't have to determine opcode counts themselves. In all other ways, it should be a direct representation of the bytecode.
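
The label handling mentioned above can be sketched as a two-pass resolver. This is a minimal illustration in Python, not the proposed implementation: the `@name` reference syntax, the tuple layout, the opcode names, and the assumption that each opcode and each argument occupies one word are all hypothetical. It also resolves labels to absolute word offsets, whereas real branch ops may expect a different form.

```python
def resolve_labels(items):
    """Replace label references with word offsets.

    items is a list of ('label', name) or ('op', opcode, [args]) tuples.
    A string argument starting with '@' refers to a label (assumed syntax).
    """
    offsets = {}
    position = 0
    # Pass 1: record the word offset at which each label appears.
    for item in items:
        if item[0] == "label":
            offsets[item[1]] = position
        else:
            # Assumption: one word for the opcode, one per argument.
            position += 1 + len(item[2])

    # Pass 2: emit instructions with label references substituted.
    resolved = []
    for item in items:
        if item[0] != "op":
            continue
        _, opcode, args = item
        new_args = [
            offsets[a[1:]] if isinstance(a, str) and a.startswith("@") else a
            for a in args
        ]
        resolved.append((opcode, new_args))
    return resolved


# Usage: a backward reference (loop) and a forward reference (done).
program = [
    ("label", "loop"),
    ("op", "dec", ["I0"]),
    ("op", "if", ["I0", "@loop"]),
    ("op", "branch", ["@done"]),
    ("op", "noop", []),
    ("label", "done"),
    ("op", "end", []),
]
resolved = resolve_labels(program)
```

Collecting offsets in a first pass is what lets forward references such as `@done` resolve without the programmer counting opcodes by hand.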

The disassembler should take a PBC file and output the language accepted by the assembler. The goal is a round trip: passing that output back to the assembler should produce an identical bytecode file.

Relation to Parrot

Parrot's low-level tools are not in great shape. There is no format to directly represent the contents of a packfile. PASM used to fill this role, but has not been kept up to date. PIR contains many assumptions and hides many details of Parrot's calling conventions (PCC). In addition, the PIR compiler, IMCC, is difficult to alter and maintain. This project would form the framework for eventually deprecating PIR and IMCC. In the short term, this format could be very useful for high-level language (HLL) writers to debug the output of their compilers.

This project relies heavily on the PMCs used to introspect and interact with the low levels of the Parrot VM. In particular, the following PMCs will be used:

Packfile and related PMCs to read and write PBC files
Oplib and Opcode to find information about Parrot opcodes
Key for the opcodes that rely on them

Tools and Technologies Used

Parrot's ecosystem has been evolving quickly and provides many useful libraries and tools. Rather than starting from scratch, I will use:

Winxed as the primary language for its balance of low-level access and high expressiveness
nqp-rx for a grammar to parse the assembly language
Rosella for build and test infrastructure

A GitHub repository already exists for the PACT project; I will commit to that.

Delivered Results

In addition to the library, assembler, and disassembler, the following will be created:

POD documentation of all classes
Unit tests for each feature in the library
A document that describes the language for the assembler
POD pages to describe the usage of the assembler and disassembler

Project Timeline

This schedule is written in terms of milestones, so the work listed on each date will be done in the week(s) prior.

May 23 GSoC Start Date

May 30 Improvements to Key PMC: Creation/introspection of keys with register contents. This is both a useful improvement to Parrot on its own and useful for later portions of this project.

June 6 Build infrastructure: Create a framework that will compile the library to PBC, build documentation, and run unit tests.

June 13 Basic Classes: The classes that represent the packfile. Write documentation and code.

June 20 Disassemble to Objects: Write a program that disassembles PBC files to the above classes to prove that they can accurately represent a packfile. Can use the disasm.winxed file from PACT as a basis.

June 27 Unit Tests: Write unit tests that check the PIR-to-objects and objects-to-bytecode conversions. Half of these should pass.

July 4 Vacation

July 11 Midterm Evaluations: Basic packfile creation. The goal is to get as many of the unit tests working as possible.

July 18 Language Design: Design the format for the assembly language. Document it.

July 25 Disassembly output: Prove the usefulness of the language by outputting the objects to it. This should create new unit tests based on the ones from the original disassembly.

August 1 Basic Assembly: Parse basic assembly and output PBC files.

August 8 More Features: Add convenience features to assembly language.

August 15 Suggested 'pencils down' date: Additional features, if time allows.

August 22 Firm 'pencils down' date: Code cleanup, documentation additions.

Allowances

The design of the textual assembly language can be simplified if earlier portions of the project run long. In the worst case, the project could finish with just the library that the assembler would be built on. It should be fairly simple to add a front end after GSoC is over.

Additional Ideas

If goals are met ahead of schedule, there are a number of features that could be added to the assembly language:

Naming of registers
Basic handling of function calls (PCC)
Annotations
Macros
Immediate subs to create PMC constants