
@comaniac
Created December 17, 2020 18:27

Lorien: A Hyper-Automated Tuning System for Tensor Operators

Lorien is a system built on top of TVM to massively explore and benchmark the best schedule configs for TOPI schedules.

Motivation

Although TVM already provides TOPI (TVM Operator Inventory) with implementations of algorithms and schedules for commonly used operators such as conv2d and dense, one challenge makes TOPI hard to improve efficiently.

The best schedules for TOPI are stored in TopHub, a JSON file hosted on GitHub. However, this approach has the following problems.

  1. Storing all schedules in a single text file has low accessibility and scalability. AutoTVM has to load the entire JSON file every time just to find the schedule config for a single workload.

  2. The coverage of workloads and platforms is insufficient in the current version. For example, the latest TopHub covers only 690 workloads for the CUDA backend, including conv2d and depthwise conv2d, and only 5 GPU models.

  3. Compared to TVM, which receives several commits every day, TopHub is updated infrequently. As a result, some schedule configs are out of date and can no longer achieve good performance.

Since it is impractical to use TVM CI to benchmark the performance of every pull request, we need a separate system to regularly benchmark and update the stored schedule configs.

Command-Line Interface and Example Usages

The system has a complete CLI with hierarchical commands. All commands can also be specified in a YAML config file and expanded on the command line with the prefix "@". See the following examples for CLI usage, and configs/samples for example configurations. Note that the complete description of each command can be retrieved with the help flag:

python3 -m lorien <commands> -h
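As a rough analogy (not Lorien's actual implementation, which expands YAML config files), the "@" prefix behaves like Python's standard `argparse` feature `fromfile_prefix_chars`, which replaces an `@file` argument with the arguments read from that file:

```python
import argparse
import os
import tempfile

# Sketch: an "@file" argument is expanded into the arguments stored in
# that file before normal parsing proceeds.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("--target")

# Write arguments to a file, one token per line (argparse's default format).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("--target\nllvm\n")
    path = f.name

args = parser.parse_args(["@" + path])
os.unlink(path)
print(args.target)  # llvm
```

Lorien's "@" files are richer (structured YAML rather than raw argument tokens), but the expansion idea is the same.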
  • Extract workloads from a Gluon CV model.
# gcv_models.yaml
gcv:
  - alexnet:
    data: [1, 3, 224, 224]
  - InceptionV3:
    data: [1, 3, 299, 299]
python3 -m lorien generate extract-from-model @gcv_models.yaml --target llvm
  • Extract workloads from a TF model.
# tf_models.yaml
tf:
  - mobilenet.pb:
    Placeholder: [1, 224, 224, 3]
python3 -m lorien generate extract-from-model @tf_models.yaml --target llvm
  • Accept a list of workloads and mutate them to generate new workloads.
python3 -m lorien generate mutate @workloads.yaml @rules.yaml
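A hypothetical sketch of what a mutation rule file could look like; the actual schema is defined by Lorien, and the field names below are illustrative only:

```yaml
# rules.yaml (hypothetical sketch -- field names are NOT Lorien's real schema).
# The idea: each rule mutates one attribute of an existing workload
# (e.g., batch size) to derive new workloads worth tuning.
conv2d:
  batch: [1, 4, 8]       # hypothetical: enumerate batch sizes
  channel: [32, 64]      # hypothetical: enumerate channel counts
```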
  • Tune workloads with RPC workers.

See the tutorial.

  • Tune workloads with AWS batch.

See the tutorial.

System Requirements

  • Python 3.6+

  • Amazon DynamoDB (local or AWS): DynamoDB is used for storing and maintaining the tuned schedules. You can choose either of the following:

    1. Launch a local version using the JVM on your machine, and specify the endpoint URL (e.g. --db "endpoint_url: http://<your IP>:8000") when invoking a tuning process.

    2. Configure AWS credentials on your machine to use the AWS DynamoDB service directly. In this case, you do not have to specify any argument in tuning configurations.
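    For option 1, DynamoDB Local can be launched roughly as follows, assuming you have downloaded and extracted the DynamoDB Local archive from AWS into the current directory:

```shell
# Launch DynamoDB Local on port 8000; Lorien can then be pointed at it
# via --db "endpoint_url: http://<your IP>:8000".
java -Djava.library.path=./DynamoDBLocal_lib \
     -jar DynamoDBLocal.jar -sharedDb -port 8000
```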

  • AWS S3 (optional): S3 is used to store the full tuning logs (JSON files generated by AutoTVM). If you specify --commit-log-to bucket_name and configure AWS credentials on your machine, all complete tuning logs will be uploaded to the S3 bucket for debugging or research purposes. This requirement is optional, so you can omit the --commit-log-to argument if you do not want to keep full tuning logs.

  • AWS Batch (AWS ECR): You have to set up AWS Batch compute environments, job queues, and job definitions in advance to use the Lorien AWS Batch worker for tuning. See this blog post for reference. You may also need to build and upload Lorien docker images to AWS ECR to serve as the container for AWS Batch jobs.
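Pushing a locally built image to ECR follows the standard AWS workflow; the account ID, region, and image tag below are placeholders you must substitute:

```shell
# Authenticate docker to your private ECR registry
# (<account> and <region> are placeholders).
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

# Tag the locally built Lorien image and push it to ECR.
docker tag lorien:latest <account>.dkr.ecr.<region>.amazonaws.com/lorien:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/lorien:latest
```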

Docker Images

You can directly use the pre-built Lorien docker images on Docker Hub, which include two types of images: CPU and CPU+CUDA. The docker images have TVM deployed, so you can launch a tuning process in the container right after cloning Lorien (we have to ask you to clone Lorien manually because the repo is private). The docker images are also used for Lorien CI.

Documentation

https://comaniac.github.io/lorien/
