GitHub Actions enables users to automate their infrastructure and run code checks, tests, builds, model training, etc., on code changes, such as Pull Requests and merges.
The Faculty Platform has several tasks that can be automated in this fashion, and this guide aims to give some initial guidance on how to get started with integrating the two services. We will look into choosing the right type of runner, installing a self-hosted runner on the platform itself, and showing an example use case of triggering jobs on PRs.
There are two kinds of "runners" available for GitHub Actions:
- the public, GitHub-hosted runners
- self-hosted runners
There is a trade-off between them, which in the case of the Faculty Platform mainly comes down to the following.
Public runners are managed, run, and maintained by GitHub, so they are always up to date. On the other hand, they run on GitHub's own infrastructure, and thus when interacting with the Faculty Platform there's a need for additional administration to whitelist the relevant GitHub infrastructure. The actions themselves also need a more detailed setup to provide the runners with the relevant authorization keys to be able to issue commands.
Self-hosted runners can live inside the existing Faculty Platform infrastructure, in fact running as "apps" on the platform. This makes them very easy to set up (as apps already have all the required libraries and tokens to interact with the platform itself), but on the other hand requires ongoing maintenance (updating the runner software). It is also recommended that self-hosted runners are used only with private repositories (as in public repositories anyone would be able to control the code run on your runner by simply opening a PR, and could potentially extract secrets from your environment).
Thus we recommend using self-hosted runners with private code repositories, especially when the Faculty Platform deployment is firewalled from the rest of the Internet.
You can also install and use self-hosted runners directly on the platform. This only works for JavaScript-type actions (the Docker-type ones need access to the Docker socket, which is neither a great idea nor really possible currently).
Let's say one wants to create GitHub Actions that interact with one of the projects (say, running jobs). The easiest setup is the following:
- enable third-party actions for the given GitHub repo
- on Faculty, run a single server in the project you want to apply the action to (for ease of use)
- using that server, go through the self-hosted runner setup (found in the "Settings/Actions" section of the repo)
Going through these steps, first enable third-party actions:
Start a small server in the project that you want to add a runner to, as the first steps need to add a few things to your workspace. Follow the setup steps shown under "Add runner" for Linux:
... running those commands in the terminal of the server on the Faculty Platform:
The config.sh script has a number of settings (see config.sh --help for the full list); compared to the default invocation shown in the "Add runner" popup, here we set the name and skip the interactive questions:
./config.sh --url ... --token ... --name "somename" --unattended
As per the console screenshot above, our runner is now created:
Now to start it, we have to run run.sh. That can be done from a Custom app:
Which should then show:
Here I'll outline one specific use case for GitHub Actions with the platform and its setup. It should provide inspiration for setting up other scenarios as well.
In our case we'd like to use self-hosted runners to trigger a specific job on the Faculty Platform for each Pull Request received.
The basic setup is outlined in the connection diagram below:
Conceptually the following flow happens:
- a GitHub runner is deployed as an app inside the project that it is going to interact with. It listens for updates from GitHub for a specific repository.
- when GitHub receives a new PR, it checks the defined Action workflows (faculty.yml below) and allows the relevant (e.g. self-hosted) runners to pick up that change. Since the runner is polling GitHub, it doesn't need to be accessible from the wider Internet to receive notifications.
- the runner receives the steps to be taken from the workflow, including code checkout, scripts to run, etc.
- in our case, the workflow uses a Python script (jobrun.py later), which does the actual job triggering and monitoring (including setting the GitHub action's status to succeeded or failed based on the job run).
- the script triggers a pre-set-up job, but through a special script (basic-job-action.sh below). When the job is run, that task is no longer within the runner, but in a server spun up by the Faculty Platform, as jobs normally are. That extra script can take a commit variable to check out the relevant code on the job server, using a deployment key (deployment_ssh_key below), and run the actual job with the remaining parameters (the actual job is somejob.sh below).
We are using the following files in a repository (and examples for these files are attached to this gist):
├── .github
│   └── workflows
│       └── faculty.yml
├── deployment_ssh_key
├── jobs
│   ├── basic-job-action.sh
│   └── somejob.sh
└── workflow
    └── jobrun.py
The faculty.yml file sets what actions GitHub Actions will take. For more details, you can check the relevant GitHub documentation as well. The name of the file is arbitrary; here we've chosen it to make it easier to distinguish. See the attached example. In that file:
name: Faculty
is an arbitrary name that will be shown in GitHub. The on section sets when the action will trigger:
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
This results in triggering on pushes and pull requests that target the master branch.
The actual job definition is:
jobs:
  jobrun-selfhosted:
    name: Trigger Job on Self-hosted Runner
    runs-on: self-hosted
    env:
      FACULTY_JOB_NAME: ${{ secrets.FACULTY_JOB_NAME }}
    steps:
      - uses: actions/checkout@v2
      # We already have python/pip/... installed
      - name: Python version
        run: python -V
      - name: Run a job
        run: python workflow/jobrun.py
Here the action selects to run on a self-hosted runner. The job name (jobrun-selfhosted) is arbitrary; it just has to be unique. The name is a description shown later in GitHub, such as this:
The env section uses GitHub secrets to pass on information, such as the job name, but this is optional and could be hard-coded in this case as well.
Here we are adding that name in the given repository's "Settings > Secrets" section:
The last part of the workflow is the list of steps taken in the action: checking out the code, logging the Python version used (optional), and running the actual payload, jobrun.py.
The logs from each of the workflow jobs can be expanded in the GitHub interface, giving for example a view like this (where the steps are visible, and here the logs from jobrun.py, described in the next section, are expanded):
The attached example works with a job set up like this:
with the command set as:
bash jobs/basic-job-action.sh "$COMMIT" "$MESSAGE" "$CYCLES"
where COMMIT is the value to be used by the code checkout, while the other parameters are passed on to the actual job, somejob.sh, as described later.
jobrun.py then follows this flow:
- loads the relevant environment variables:
- project ID, from the default env vars on a Faculty environment,
- job name, set by the workflow as shown above,
- commit-ish value (here the PR's branch name, most often in practice), set automatically by GitHub Actions
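As a minimal sketch (assuming the platform exposes the project ID as FACULTY_PROJECT_ID, and falling back to GITHUB_SHA on pushes, where GITHUB_HEAD_REF is empty), this step could look like:
import os

# Project ID from the Faculty server's environment (assumed variable name),
# job name from the workflow's env section, and the commit-ish from GitHub Actions
# (GITHUB_HEAD_REF is the PR's source branch and is empty for plain pushes)
project_id = os.environ["FACULTY_PROJECT_ID"]
job_name = os.environ["FACULTY_JOB_NAME"]
commit = os.environ.get("GITHUB_HEAD_REF") or os.environ["GITHUB_SHA"]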
- resolves the job ID from the job's name
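Assuming the faculty Python library's job client is used for this lookup (as the later fragments suggest), a sketch could be:
import faculty

# Client for the Jobs service; find the job whose name matches the one passed in
# from the workflow (assumes list() returns summaries with id and metadata.name)
job_client = faculty.client("job")
myjob = next(
    job for job in job_client.list(project_id) if job.metadata.name == job_name
)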
- sets up the run parameters; here it's an array run with two parameter value sets:
parameter_value_sets = [
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "10"},
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "15"},
]
- triggers the run with the given parameters
run_id = job_client.create_run(project_id, myjob.id, parameter_value_sets)
- waits for it to finish
run_data = job_client.get_run(project_id, myjob.id, run_id)
while run_data.state not in COMPLETED_RUN_STATES:
    run_data = job_client.get_run(project_id, myjob.id, run_id)
    sleep(1)
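Here COMPLETED_RUN_STATES is the set of terminal run states the script waits for; a minimal sketch, assuming the faculty library's RunState enum can be imported from faculty.clients.job (the attached example may include further states):
from faculty.clients.job import RunState

# Terminal states after which the run will not change any more (assumed set)
COMPLETED_RUN_STATES = {RunState.COMPLETED, RunState.FAILED, RunState.CANCELLED}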
- if the run was successful, the script reports success; otherwise (failed, cancelled) it reports a failure in GitHub and shows the resulting state:
if run_data.state == RunState.COMPLETED:
    print("Job completed successfully.")
else:
    sys.exit(f"Job has not finished correctly: {run_data.state}")
The role of this wrapper is to check out the given state of the repository and run the actual job with the settings passed on (see the attached example).
This requires one piece of additional setup, a deployment SSH key, so that the job will be able to pull the code from the (private) repository.
Start up a server in the given project, and run:
ssh-keygen -t ed25519 -f /project/deployment_ssh_key -N ""
which will generate a new key with an empty passphrase:
(Python3) /project$ ssh-keygen -t ed25519 -f /project/deployment_ssh_key -N ""
Generating public/private ed25519 key pair.
Your identification has been saved in /project/deployment_ssh_key.
Your public key has been saved in /project/deployment_ssh_key.pub.
The key fingerprint is:
SHA256:p0c2enRYpYX7FrDCmTrUlRdU7T3PimIJBrA53DNCVjU faculty@cube-a83ccbb7-0ff9-4eb5-bea5-c4ea797aadcc-554588db99-fr26f
The key's randomart image is:
+--[ED25519 256]--+
| ...E +=oo|
| + . =+. .|
| + = o +o= ..|
| * = . *oo ..o|
| o =S.B... oo|
| =B o o o|
| .ooo. o . |
| o+ . . |
| . . |
+----[SHA256]-----+
(Python3) /project$ cat deployment_ssh_key.pub
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIML7ONplcN/rlynNZccUDFlapQLpVBKQ/9I56XsKHMZY faculty@cube-a83ccbb7-0ff9-4eb5-bea5
-c4ea797aadcc-554588db99-fr26f
Then copy the contents of deployment_ssh_key.pub and add it as a new deploy key in the "Settings > Deploy Keys" section of your GitHub repository:
and save it:
The job's wrapper is then set up to use that key and the given repository to pull the code when a job is triggered:
COMMIT=$1
REMOTE="git@github.com:imrehg/faculty-github-actions.git"
DEPLOYMENT_KEY_PATH="/project/deployment_ssh_key"
# Private repo related setup
export GIT_SSH_COMMAND="/usr/bin/ssh -i ${DEPLOYMENT_KEY_PATH} -o StrictHostKeyChecking=no"
where the REMOTE value needs to be updated to the correct repository's SSH clone link, and if a different name is used for the key file, that can be changed too.
The next section of the wrapper clones the code to /code, checks out the given commit, and the rest of the steps are as your job requires:
- if any Python requirements need to be installed, it can do that step
- call the actual job script with the remaining command line flags, here:
bash jobs/somejob.sh "${@:2}"
Note that this script runs before the given code is checked out, so it's kept simple and has to be present in the workspace before it can be used. Also, any changes to the script only take effect once they are present in the workspace.
This part depends completely on your application; here it's a very simple example just to show running something and using the passed flags correctly. It receives two variables (message and cycles), and will just idle for the given number of cycles, with the log lines prefixed by the message.
In practical use your job is most likely a Python script, and thus in the basic-job-action.sh wrapper the call would instead of bash ... rather be something like python somejob.py "${@:2}", etc.
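For illustration, a hypothetical minimal somejob.py along those lines (mirroring what the attached somejob.sh does: taking a message and a cycle count, then idling while logging) could look like:
import sys
import time

# Hypothetical example job: log one line per cycle, prefixed with the message,
# idling for a second between cycles
def main(message, cycles):
    for i in range(cycles):
        print(f"{message}: cycle {i + 1} of {cycles}", flush=True)
        time.sleep(1)

if __name__ == "__main__":
    main(sys.argv[1], int(sys.argv[2]))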
When everything is set up, a new PR will trigger a job like this:
with the correct parameters, such as shown here for the example setup:
and the given run's log is available (both the wrapper's and the actual job's logs).