Check out https://github.com/rapidsai/gpu-xb-ai:

git clone https://github.com/rapidsai/gpu-xb-ai

Create a conda environment from conda/environments/gpu-xb-ai-legate-all.yaml:

conda env create -f conda/environments/gpu-xb-ai-legate-all.yaml
This environment should contain all the dependencies needed. If not, please report what is missing.
Make sure you can access the following repositories (if needed, ask in #swrapids-legate to get access):
- https://github.com/nv-legate/legate.core.internal
- https://github.com/nv-legate/cunumeric.internal
- https://github.com/rapidsai/legate-raft
- https://github.com/rapidsai/legate-dataframe
- https://github.com/rapidsai/legate-boost/
You will need to check out a copy of each of these repositories, set each one to a particular
commit, and then build them. To find the commit ID for each repository, check the
Dockerfile.legate file. One way to check out a repository and set it to a particular commit is:
mkdir legate.core.internal
cd legate.core.internal
git init
git remote add origin <GitHub URL of repo>
git fetch origin <commitID>
git reset --hard FETCH_HEAD
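The steps above can be wrapped in a small helper so each repository is fetched at its pinned commit. This is a sketch: `checkout_at` is a name introduced here, not part of the repo, and the commit IDs must still be taken from Dockerfile.legate.

```shell
# Sketch of a helper around the git commands above.
# checkout_at is a hypothetical name; the commit ID is a placeholder.
checkout_at() {
    repo_url=$1
    commit=$2
    dir=$(basename "$repo_url" .git)   # e.g. legate.core.internal
    mkdir -p "$dir"
    cd "$dir" || return 1
    git init -q
    git remote add origin "$repo_url"
    git fetch origin "$commit"
    git reset --hard FETCH_HEAD
    cd ..
}

# Usage (take the real commit ID from Dockerfile.legate):
# checkout_at https://github.com/nv-legate/legate.core.internal <commitID>
```

Repeating the call once per repository, in the order listed above, mirrors the manual steps.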
After cloning each repository you will have to build it. The best order to build them is the
order in which they are listed above. To see the exact command used to build each project, look
at Dockerfile.legate after line 165; this is how things are built inside the Docker image.
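To see those build commands without opening the file in an editor, something like the following works (line 165 is the number quoted above; adjust it if the Dockerfile has changed):

```shell
# Print everything from line 165 onward, where the build steps start:
sed -n '165,$p' Dockerfile.legate

# Or list just the RUN instructions to scan the build commands quickly:
grep -n '^RUN' Dockerfile.legate
```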
Make sure to activate the gpu-xb-ai-legate-all conda environment before building:

conda activate gpu-xb-ai-legate-all
Once everything is built you should be able to run the benchmark with:

legate use_cases/uc10/legate.py --workdir /tmp --stage=training --iterations=4 /lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/

The last argument is the directory that contains the input data.
/lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/ is the path you can use
on EOS; if you are running somewhere else, replace it with the directory where you have placed
the data (inside the Docker image the data sits under /opt/gpu-xb-ai/data/sf-800.0/uc10/train/).
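Before launching, it can save a failed run to confirm that the data directory actually exists at the path you pass in. The path below is the EOS one from above; substitute your own.

```shell
# Quick sanity check on the input-data path before running the benchmark.
DATA=/lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/
if [ ! -d "$DATA" ]; then
    echo "input data directory not found: $DATA" >&2
fi
```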