Bundles are immutable files/directories that represent the code, data, and results of an experimental pipeline. There are two ways to create bundles. First, users can upload bundles: datasets in any format or programs in any programming language. Second, users can create "run bundles" by executing shell commands that depend on the contents of previous bundles. A run bundle is specified by a set of bundle dependencies and an arbitrary shell command. This shell command is executed in a Docker container, in a directory that contains the dependencies. The contents of the run bundle are the files/directories that the shell command writes to the current directory. The resulting dependency graph over bundles captures the research process precisely and immutably.
The dependency graph above might result from the user doing the following from their own machine using the CodaLab CLI:
>> ls
cnn.py
mnist/
>> cl upload cnn.py
0x45d17cd82dd4b9d98f05d8c566eb # uploaded a new bundle with this UUID
>> cl upload mnist # directory that contains the two data files
0x1ba22372a6926d82dd4b9d98f05c
>> cl run cnn.py:0x45d17c data:0x1ba223 --name exp2 'python cnn.py data/train.dat data/test.dat' # maps the UUIDs of the previously created bundles to file/directory names in the Docker container in which CodaLab will run the given command
0x2d419245d17cd82dd4b1ba22376
>> cl cat 0x2d4192/stdout # prints the contents of the 'stdout' file in the resulting run bundle
Training model...
Evaluating model...
F1 score: 0.89231
Implement a new CodaLab CLI command, let's say 'cl ancestors', that takes in a bundle spec, and recursively prints out all of the ancestors of the bundle to stdout. In the simple example above, it might look like this:
>> cl ancestors 0x2d4192
- exp2(0x2d4192)
  - cnn.py(0x45d17c)
  - mnist(0x1ba223)
When a bundle has many great-(great-...)-grandparents, this might be a pretty deeply nested list.
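The recursion itself can be sketched independently of the CodaLab codebase. The snippet below is a minimal illustration, not the expected solution: the BUNDLES dictionary and the format_ancestors() helper are hypothetical stand-ins for whatever the server actually returns, and the two-space indent per level is an assumption about how the nesting should be displayed.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for bundle metadata:
# short UUID -> (bundle name, UUIDs of the bundles it depends on).
BUNDLES: Dict[str, Tuple[str, List[str]]] = {
    "0x2d4192": ("exp2", ["0x45d17c", "0x1ba223"]),
    "0x45d17c": ("cnn.py", []),
    "0x1ba223": ("mnist", []),
}


def format_ancestors(
    uuid: str,
    bundles: Dict[str, Tuple[str, List[str]]] = BUNDLES,
    depth: int = 0,
) -> List[str]:
    """Return one line per bundle, with each dependency nested under its dependent."""
    name, parents = bundles[uuid]
    lines = ["  " * depth + f"- {name}({uuid})"]
    for parent in parents:
        # Recurse one level deeper for every dependency of this bundle.
        lines.extend(format_ancestors(parent, bundles, depth + 1))
    return lines


print("\n".join(format_ancestors("0x2d4192")))
```

Because a run bundle can only depend on bundles that already exist, the graph has no cycles and the recursion terminates. A bundle that is reachable along several paths is printed once per path in this sketch; whether to deduplicate is a design choice left open by the task.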
All of our CLI commands are defined in bundle_cli.py. To define a new command, you will need to define a new method in the BundleCLI class. Take a look around at how the other commands are defined (do_rm_command(), which defines the cl rm command, for example) to get started.
Our CLI talks to the server through a REST API. Unfortunately, our REST API documentation doesn't really exist yet. All of the API endpoints are defined as Bottle routes in the codalab-worksheets/codalab/rest directory. Most importantly, you will want to look at codalab-worksheets/codalab/rest/bundles.py:_fetch_bundles(), which defines the GET /rest/bundles API.
It will also be helpful to poke around in the other CLI commands to see how they work, and use print statements liberally to understand the formats of the JSON responses.
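For inspecting responses, pretty-printing with the standard json module is usually easier to read than a bare print. The sketch below uses a made-up payload purely for illustration; the real shape of the GET /rest/bundles response is not documented here and must be checked against the server. The dump() helper is a hypothetical name.

```python
import json

# Made-up payload for illustration only; the actual response format
# returned by the server may look entirely different.
response = {"data": [{"id": "0x2d4192", "attributes": {"name": "exp2"}}]}


def dump(obj) -> str:
    """Serialize obj as indented JSON with sorted keys, for stable, readable output."""
    return json.dumps(obj, indent=2, sort_keys=True)


print(dump(response))
```

Sorting the keys keeps the output stable between runs, which makes it easier to compare two responses by eye.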
We expect this task to take about two hours. The codebase is quite large and complex, and this task is relatively open-ended, so please don't hesitate to shoot us an email if you have any questions.
- CodaLab Wiki
- REST API Reference (potentially out-of-date)