Created
August 8, 2011 18:13
Proposal to change .rbc files to use only the rbc.db mechanism
Rubinius uses .rbc files to cache on disk the compiled bytecode for a Ruby
source code file. Typically, these cache files exist alongside the
corresponding .rb file. However, it is possible to collect all the cache files
into a single directory (and subdirectories) by hashing the full path to the
Ruby source file as a key to find a file in the cache directory.

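The hashed-path lookup described above could look roughly like the following.
This is a minimal sketch, not the Rubinius implementation: the method name
cache_path_for, the choice of SHA-1, and the two-character subdirectory
fan-out are all assumptions made for illustration.

```ruby
require "digest/sha1"

# Hypothetical helper: map a Ruby source file to a cache file in cache_dir.
# SHA-1 and the two-character subdirectory are assumptions, not Rubinius facts.
def cache_path_for(source_path, cache_dir)
  # Hash the full (absolute) path so the key is independent of how the
  # file was referenced when it was loaded.
  full_path = File.expand_path(source_path)
  digest = Digest::SHA1.hexdigest(full_path)

  # Fan out into subdirectories so no single directory grows too large.
  File.join(cache_dir, digest[0, 2], digest[2..-1])
end
```

For example, cache_path_for("lib/foo.rb", "/tmp/cache") always yields the same
file under /tmp/cache for the same absolute source path.
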
In Rubinius 2.0, we have multiple language modes. The bytecode for 1.8
language mode differs from the bytecode for 1.9 language mode. The .rbc file
format was extended to include the language version. This ensures that running
the same Ruby file in different modes will not use the wrong version of
bytecode.

When Rubinius is installed, we pre-compile all the Ruby files in the lib/
directory. This ensures that if Rubinius is installed to a directory where a
user does not have write access, the cache files will still be created and can
be used to speed loading of standard library code.

If the .rbc files are placed alongside the .rb files, the existing arrangement
must be changed to provide different .rbc files depending on language mode. In
other words, just versioning the .rbc file is no longer sufficient, as the
.rbc files created for lib/**/*.rb files would be for one language mode or the
other. The same situation exists for the pre-installed gems, which are not
split into different gem directories for 1.8 and 1.9 mode.

An additional problem with creating .rbc files alongside the .rb files is that
people object to cluttering their source with the cache files.

There have repeatedly been requests for distributing Ruby applications without
Ruby source code. The existing .rbc files can be used for this, but they are
quite primitive and do not make it easy to abstract over other storage
configurations (e.g. encryption).

Finally, there are potentially numerous advantages to storing the compilation
cache in a proper database that would permit storing a great deal of
additional metadata for building tools for Ruby. Abstracting the cache from
the existing .rbc files to the directory of files using the -Xrbc.db option is
a good first step.

To summarize the problems with the existing .rbc mechanism:

1. Multiple different files are required to permit .rbc files in different
   language modes to exist alongside a single .rb file, as is the case with
   pre-compiling the standard library files on install.
2. People object to the files cluttering their source code.
3. The files don't easily permit extending them to store additional, valuable
   metadata.
4. Related to 3, the files don't provide a suitably powerful mechanism for
   distributing Ruby applications without source code.

The existing -Xrbc.db option is a direct replacement for storing the .rbc
files alongside the .rb files and immediately solves problem #1 above. One
issue with the rbc.db option is what to provide for a default value. This is
my proposal:

1. If the user explicitly provides a path with -Xrbc.db, cache all files in
   that path.
2. If the user does not provide a path, use two separate paths as follows:
   a. on boot, record the current working directory (referred to as CWD below).
   b. if the file being loaded has CWD as a prefix, store the cache for the
      file in CWD/.rbx/<wherever>
   c. if the file being loaded does not have CWD as a prefix, store the cache
      for the file in ~/.rbx/<wherever>
3. When hashing the file path to determine the cache file, add the language
   mode so that 1.8 and 1.9 files are separated. This does not replace the use
   of the language version information embedded in the .rbc format, but avoids
   recompile thrashing for e.g. running the specs under 1.8 mode and then
   under 1.9 mode.
4. Only read and write to the cache if the cache directory is owned by the
   user. This avoids a potential security hole where a superuser could be
   running bytecode that was put into the cache maliciously, and prevents the
   superuser from creating files that the user would not be able to overwrite.

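The four rules above can be sketched in Ruby as follows. This is only an
illustration under stated assumptions: the method names cache_root, cache_key
and cache_usable? are hypothetical, explicit_dir stands in for whatever value
was passed to -Xrbc.db, SHA-1 is an arbitrary choice of hash, and the layout
below the .rbx directory (the <wherever> part) is deliberately left open.

```ruby
require "digest/sha1"

# Rules 1 and 2: choose the cache root for a source file.
# explicit_dir is nil when the user did not pass a path (assumed convention).
def cache_root(source_path, boot_cwd, explicit_dir = nil, home = Dir.home)
  return File.expand_path(explicit_dir) if explicit_dir

  cwd = File.expand_path(boot_cwd)
  full_path = File.expand_path(source_path, cwd)

  # Compare against CWD plus a separator so /app does not match /application.
  prefix = cwd.end_with?(File::SEPARATOR) ? cwd : cwd + File::SEPARATOR
  if full_path.start_with?(prefix)
    File.join(cwd, ".rbx")
  else
    File.join(File.expand_path(home), ".rbx")
  end
end

# Rule 3: mix the language mode into the hashed key so that 1.8 and 1.9
# cache entries for the same source file never collide.
def cache_key(source_path, language_mode)
  Digest::SHA1.hexdigest("#{language_mode}:#{File.expand_path(source_path)}")
end

# Rule 4: only use a cache directory owned by the effective user.
def cache_usable?(dir)
  File.directory?(dir) && File.owned?(dir)
end
```

With a recorded CWD of /app, a file such as /app/lib/a.rb would be cached
under /app/.rbx, while a standard library file outside /app would be cached
under the user's ~/.rbx.
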
With the changes above, we have a reasonable default for all files. The
standard library files cache would exist in ~/.rbx/, which is reasonable for a
file installed with Rubinius that isn't going to be changing. The application
files would by default be cached within the application directory, but would
not litter cache files where the source code files are. If the user explicitly
requests an rbc.db directory, all files are written there, but are still
segregated based on language version.

As a related but separate change, since we have full Ruby concurrency in
Rubinius 2.0, I propose making the Writer stage of the bytecode compiler use a
separate thread. Once the CompiledMethod is created, it is enqueued for
writing to the cache and immediately returned. The program can start executing
the method while the separate cache thread figures out where to put it and
marshals the contents to disk.

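A minimal sketch of such a background writer, using Ruby's standard Queue and
Thread, might look like this. The class name CacheWriter and its methods are
hypothetical, and a plain String of bytes stands in for the marshaled
CompiledMethod; the actual Writer stage in Rubinius is not shown here.

```ruby
require "thread"

# Hypothetical cache writer that persists compiled code on its own thread.
class CacheWriter
  def initialize
    @queue = Queue.new
    @thread = Thread.new { drain }
  end

  # Called by the compiler: enqueue the work and return immediately so the
  # program can start executing the method.
  def enqueue(path, bytes)
    @queue << [path, bytes]
    nil
  end

  # Flush pending writes and stop the thread (for example, at exit).
  def shutdown
    @queue << :stop
    @thread.join
  end

  private

  # Runs on the writer thread: pop jobs until the stop marker arrives.
  def drain
    while (job = @queue.pop) != :stop
      path, bytes = job
      begin
        File.open(path, "wb") { |f| f.write(bytes) }
      rescue SystemCallError
        # A failed cache write should never break the running program.
      end
    end
  end
end
```

The compiler would call enqueue and carry on; calling shutdown before the
process exits ensures queued entries reach disk.
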