Note:
- The default GH token does not allow reads from other repos, so I use a GH App to auth the action.
- The GH search API has vicious rate limits: a 3s sleep is not enough, or I am getting labelled as a bot. WTF Microsoft?
- This will open one issue, listing all the images in a table: `|repo|dockerfile|image|`. It should process multi-stage dockerfiles.
- The way it finds dockerfiles is dumb: find anything with `dockerfile` in the name, find the `FROM` line... works fine on my computer. I
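
The `FROM` scan described in the notes can be tried locally before running the workflow. This is a minimal sketch that reuses the awk filter from the workflow; the `extract_images` function name and the sample Dockerfile are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the image of every FROM line read from stdin,
# skipping a leading flag such as --platform=... when present.
extract_images() {
  awk 'toupper($1) == "FROM" { if ($2 ~ /^--/) print $3; else print $2 }'
}

# Made-up multi-stage Dockerfile, used only as sample input.
sample='FROM golang:1.22 AS build
RUN go build -o /app .
FROM --platform=linux/amd64 alpine:3.20
COPY --from=build /app /app'

# Join the images of all stages into one comma-separated string.
images=$(printf '%s\n' "$sample" | extract_images | paste -sd "," -)
echo "$images"
```

One limitation of this approach: a later stage that starts from an earlier one (for example `FROM build`) is reported as if `build` were an image.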
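
Since a fixed sleep did not keep the search API happy, one option is to retry a failed call with a growing delay. This is only a sketch: the `with_backoff` helper, the attempt count, and the delays are assumptions, not something the workflow below uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, doubling the delay after each failure.
# Usage: with_backoff <max_attempts> <initial_delay_seconds> <command...>
with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example call (needs an authenticated gh; ORG and REPO as in the workflow):
# RAW_RESPONSE=$(with_backoff 5 10 gh api "search/code?q=filename:Dockerfile+repo:$ORG/$REPO")
```

The helper retries on any non-zero exit, not just rate limiting, so a permanent error such as a missing repo would also be retried until the attempts run out.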
```yaml
name: List docker images
on:
  schedule:
    - cron: '0 8 * * *' # 8am UTC, midnight to late night in the US
  workflow_dispatch:
    inputs:
      repo_scope:
        description: "scope to what repos"
        required: true
        default: "default"
        type: choice
        options:
          - default
          - all
jobs:
  list-dockerfiles:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      contents: read
    steps:
      - name: generate token
        id: generate_token
        uses: actions/create-github-app-token@v2
        with:
          app-id: ${{ secrets.SECURITY_REPORTER_APP_ID }}
          private-key: ${{ secrets.SECURITY_REPORTER_PRIVATE_KEY }}
          owner: ${{ github.repository_owner }}
      - name: Search and List Dockerfiles
        env:
          GH_TOKEN: ${{ steps.generate_token.outputs.token }}
          ORG: ${{ github.repository_owner }}
          SCOPE: ${{ github.event.inputs.repo_scope }}
          #DEFAULT_REPOS: "myreponame"
          DEFAULT_REPOS: ${{ github.event.repository.name }} # repo where the action runs
        run: |
```
```bash
set -euo pipefail

# summary: echo to the log and the job summary; report: echo to the log and the issue body
summary() { echo -e "$*" | tee -a "$GITHUB_STEP_SUMMARY"; }
report() { echo -e "$*" | tee -a report.md; }

report "## Dockerfile Audit Results"
report "| Repository | Path | Base Image (FROM) |"
report "|------------|------|-------------------|"

summary "SCOPE: $SCOPE"
# SCOPE is empty on scheduled runs, which falls through to the default repo
if [ "$SCOPE" == "all" ]; then
  REPOS=$(gh repo list "$ORG" --limit 200 --json name)
else
  REPOS=$(printf '{"name":"%s"}' "$DEFAULT_REPOS" | jq -s '.')
fi
summary "\nREPOS: $REPOS"
echo "$REPOS" | jq -r '.[].name' | while read -r REPO; do
  [ -z "$REPO" ] && continue
  summary "### Processing \`$REPO\`"
  # `|| true` keeps `set -e` from aborting before the rate-limit check below can run
  RAW_RESPONSE=$(gh api "search/code?q=filename:Dockerfile+repo:$ORG/$REPO" || true)
  # check once per repo, before parsing: an error response carries a top-level "message"
  if echo "$RAW_RESPONSE" | jq -r '.message? // empty' 2>/dev/null | grep -qi "rate limit"; then
    summary "FATAL: Rate limit hit. Exiting."
    exit 1
  fi
  {
    echo "<details>"
    echo "<summary>Raw Response from GH Search API</summary>"
    echo ""
    echo "\`\`\`json"
    echo "$RAW_RESPONSE"
    echo "\`\`\`"
    echo ""
    echo "</details>"
  } >> "$GITHUB_STEP_SUMMARY"
  # opened once per repo; the matching </details> is written after the file loop
  {
    echo "<details>"
    echo "<summary>Files inspected</summary>"
    echo ""
  } >> "$GITHUB_STEP_SUMMARY"
  echo "$RAW_RESPONSE" | jq -r '.items[].path' 2>/dev/null | while read -r FILE_PATH; do
    [ -z "$FILE_PATH" ] && continue
    # Skip paths containing brackets, braces, or spaces.
    # The regex lives in a variable because escaping it inline inside [[ ]] is a pain;
    # `test` has no regex support, and I don't want to pipe this through grep.
    pattern='[][{} ]'
    [[ "$FILE_PATH" =~ $pattern ]] && continue
    echo "-- processing: $FILE_PATH"
    echo "getting raw file from: repos/$ORG/$REPO/contents/$FILE_PATH"
    CONTENT=$(gh api "repos/$ORG/$REPO/contents/$FILE_PATH" -H "Accept: application/vnd.github.raw" || true)
    [ -z "$CONTENT" ] && { echo "ERROR: failed to retrieve file: $FILE_PATH"; continue; }
    echo "content received, parsing for image name"
    # one image per FROM line (multi-stage aware); skip a leading flag such as --platform=...
    IMAGES=$(echo "$CONTENT" | awk 'toupper($1) == "FROM" { if ($2 ~ /^--/) print $3; else print $2 }' | paste -sd "," - | sed 's/,/, /g')
    echo "image parsing done, result: $IMAGES"
    summary "* **inspecting file**: \`$FILE_PATH\`"
    if [ -n "$IMAGES" ]; then
      report "| $REPO | \`$FILE_PATH\` | \`$IMAGES\` |"
      summary " _found image_: \`$IMAGES\`"
    fi
    sleep $((4 + RANDOM % 2))
  done
  {
    echo ""
    echo "</details>"
  } >> "$GITHUB_STEP_SUMMARY"
  sleep $((5 + RANDOM % 2))
done
```
```yaml
      - name: Create Issue
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh issue create \
            --title "Dockerfile Audit - $(date +'%Y-%m-%d')" \
            --body-file report.md \
            --repo ${{ github.repository }}
```