This gist contains a shell.nix file that can be used to create a Python environment for running training jobs on the GCP AI Platform.
This is specifically for the following tutorial:
https://cloud.google.com/ai-platform/docs/getting-started-keras
This uses code from https://github.com/GoogleCloudPlatform/cloudml-samples, in the census/tf-keras directory.
However, this shell.nix file should be easy to modify to work for almost any training job.
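The shell.nix itself isn't reproduced here, but a minimal sketch of its general shape might look like the following. The nixpkgs attribute names (google-cloud-sdk, python37) and the tensorflow/pandas packages are assumptions for illustration, not taken from this gist, and may differ across nixpkgs versions:

{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  buildInputs = with pkgs; [
    # Provides the gcloud and gsutil commands.
    google-cloud-sdk
    # A Python with pip available, plus some packages provided by Nix
    # instead of pip.
    (python37.withPackages (ps: with ps; [
      pip
      setuptools
      tensorflow
      pandas
    ]))
  ];
}

mkShell drops you into an environment where gcloud, gsutil, and the chosen Python are all on PATH.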
First, run nix-shell to enter the shell:
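$ nix-shell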
Then, install all required Python packages:
$ pip install -r requirements.txt
If you add additional Python packages to the buildInputs line in the shell.nix file, those system-level packages will be used instead of pip having to download them.
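For example, assuming the sketch above, exposing numpy at the system level is just one more entry in the withPackages list:

    (python37.withPackages (ps: with ps; [ pip setuptools tensorflow pandas numpy ]))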
Now you should be able to run training locally:
$ python3 -m trainer.task --job-dir local-training-output
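If training succeeds, the job directory should contain the training artifacts (exactly what lands there depends on the trainer code):

$ ls local-training-output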
The tutorial linked above recommends the following command instead; however, it fails when using gcloud from Nixpkgs :-\
$ gcloud ai-platform local train --package-path trainer --module-name trainer.task --job-dir local-training-output
As long as training directly with python works, you can launch a training job on the GCP AI Platform.
First, you need to log in with gcloud using OAuth:
$ gcloud auth login
Set your default project ID so you don't have to specify it in each of the following commands (replace the example ID with your own):
$ gcloud config set project inner-melody-274800
Next, you need to create a Cloud Storage bucket to store the trained models:
$ BUCKET_NAME="my-training-example-task-3"
$ REGION="us-central1"
$ gsutil mb -l $REGION gs://$BUCKET_NAME
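You can double-check that the bucket was created:

$ gsutil ls -b gs://$BUCKET_NAME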
Finally, actually launch the training job:
$ gcloud ai-platform jobs submit training "my_first_keras_job" --package-path trainer/ --module-name trainer.task --region $REGION --python-version 3.7 --runtime-version 1.15 --job-dir "gs://$BUCKET_NAME/keras-job-dir" --stream-logs
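Once the job is submitted (the --stream-logs flag already tails the logs for you), you can check its status or re-attach to the logs later with:

$ gcloud ai-platform jobs describe my_first_keras_job
$ gcloud ai-platform jobs stream-logs my_first_keras_job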