We here at Unacast are using Google Cloud Datalab quite a bit for data analysis and exploration and think it's a great product.
That said, the version control experience is a bit clunky, to say the least. Either you have to use the bundled Ungit web interface, or you have to ssh and docker your way into the running Docker container to use the git CLI. Either way, you'll have to work against a Google Cloud Source Repository as the remote, while we really want to utilize Github's .ipynb preview functionality and Pull Request mechanism. Source Repositories do have a Github sync feature, but it only works one way and you have to set it up when you create the repository (which you can't do for Datalab repos).
So this is an attempt at setting up some git tricks to make this workflow a bit smoother, with Github as the remote for the project.
First of all, it's a bit of a pain to docker exec into the container from the Compute Engine instance every time you want to use the git CLI. So I added this alias to my .bashrc file to go straight to the notebooks directory.
alias dl="docker exec -it datalab bash -c 'cd /content/datalab/notebooks && bash'"
Then run dl to open the notebooks folder.
Now it's time to download and run the datalab-github-remote.sh script below inside the notebooks folder.
This will ask you for your Github username, a personal access token and which repo you'd like to set up as a remote. The username is what comes after github.com/ in the URL when you look at your profile on Github. A Personal Access Token can be created in your Github account settings, and it needs repo permissions.
Use this command to run the setup script inside the docker container:
bash -c "$(wget -O - https://git.io/vNLh3)"
NB! Always read through scripts that you're asked to execute this way to check that they don't do anything malicious.
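One way to follow that advice is to download the script to a file first, read it, and only run it once you're happy with what it does. Here is a minimal sketch, assuming a hypothetical helper called review_and_run (it is not part of the setup script, just an illustration):

```shell
# Hypothetical helper: print a downloaded script and run it only
# after an explicit "y" from the user.
review_and_run() {
  local script="$1"
  local answer
  # Show the script so it can be read before anything is executed.
  cat "$script"
  printf 'Run this script? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    bash "$script"
  else
    echo "Skipped."
  fi
}

# Usage (needs network access, so it is left commented out here):
#   wget -O datalab-github-remote.sh https://git.io/vNLh3
#   review_and_run datalab-github-remote.sh
```

Anything other than a plain y skips execution, so the default is the safe choice.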
This adds a few new entries to the project's git config that make Github the only remote, hence removing the Source Repository remote that was automatically set up when the Datalab instance was created.
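To give an idea of what that kind of configuration looks like, here is a rough sketch using plain git commands. It is not the actual content of the setup script. The function name use_github_remote is made up for illustration, and putting the token in the remote URL is an assumption about how authentication could be handled (simple, but the token ends up in plain text in .git/config).

```shell
# Hypothetical sketch: make Github the only remote of a repository.
# Arguments: repo directory, Github username, personal access token, repo name.
use_github_remote() {
  local dir="$1" user="$2" token="$3" repo="$4"
  local name
  # Remove every existing remote (e.g. the auto-created Source Repository).
  for name in $(git -C "$dir" remote); do
    git -C "$dir" remote remove "$name"
  done
  # Add Github as "origin", authenticating over HTTPS with the token.
  git -C "$dir" remote add origin \
    "https://${user}:${token}@github.com/${user}/${repo}.git"
}

# Usage, from inside the notebooks folder (all values are placeholders):
#   use_github_remote . my-username my-token my-repo
```

Afterwards, git remote -v should list only the Github URL.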
You should now be able to use the configured Github repo as your remote both from the Ungit web interface and from the git CLI inside the Docker container. 🎉