@Wattenberger
Last active January 13, 2024 14:41
#!/bin/bash
# This script will download the contents of a GitHub repo
# and place them in a local directory.
#
# Usage:
# download-repo.sh <repo> <output-path> <nested-path> <branch-name>
#
# Example:
# download-repo.sh wattenberger/kumiko ./kumiko-assets public/assets master
#
# You'll get rate-limited by GitHub, so create a PAT here:
# https://github.com/settings/tokens
# This will also let you download from private repos.
GITHUB_TOKEN="YOUR_TOKEN_HERE"
repo=$1
# split repo name to username and repository name
repo_name=$(echo "$repo" | cut -d/ -f2)
repo_user=$(echo "$repo" | cut -d/ -f1)
output_path=$2
nested_path=$3
branch_name=$4
# if no branch_name is given, use main or master or the first one listed
if [ -z "$branch_name" ]; then
  # get branches from repo
  branches_string=$(curl -s -H "Authorization: token $GITHUB_TOKEN" "https://api.github.com/repos/${repo_user}/${repo_name}/branches")
  branches=$(echo "${branches_string}" | jq -r '.[] | .name')
  # match whole branch names, so that e.g. "maintenance" does not count as "main"
  if echo "${branches}" | grep -qx "main"; then
    branch_name="main"
  elif echo "${branches}" | grep -qx "master"; then
    branch_name="master"
  else
    branch_name=$(echo "${branches_string}" | jq -r '.[0] | .name')
  fi
  echo "fetching from branch ${branch_name}"
fi
# if no output_path is given, use the repo name
if [ -z "$output_path" ]; then
  output_path="./${repo_name}"
fi
url="https://api.github.com/repos/${repo}/git/trees/${branch_name}?recursive=1"
# fetch repo data
full_tree_string=$(curl -s -H "Authorization: token ${GITHUB_TOKEN}" "${url}")
# get paths where type is not tree
# (the response is quoted so the shell does not glob or re-split the JSON)
paths=$(echo "${full_tree_string}" | jq -r '.tree[] | select(.type != "tree") | .path')
# if no paths found, exit
if [ -z "${paths}" ]; then
  echo "No files found in this repo, more info at ${url}"
  exit 1
fi
# filter out lines that don't start with nested_path and remove nested_path prefix
paths=`echo "${paths}" | grep -E "^${nested_path}" | sed "s|^${nested_path}||g"`
number_of_paths=`echo "${paths}" | wc -l | sed "s/^[ \t]*//"`
echo "Found ${number_of_paths} files, fetching contents..."
mkdir -p "${output_path}/"
set -o noclobber
# fetch contents for each line in paths
for path in ${paths}; do
  echo "Fetching ${path}..."
  # use the resolved branch rather than a hard-coded "master"
  url="https://raw.githubusercontent.com/${repo}/${branch_name}/${nested_path}${path}"
  path_without_filename=$(dirname "/${path}")
  full_path="${output_path}${path_without_filename}"
  mkdir -p "${full_path}/"
  # download and save file from url
  curl -s -H "Authorization: token ${GITHUB_TOKEN}" "${url}" > "${output_path}/${path}"
done
echo "All set! 🌈"
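
The prefix filter in the script passes nested_path to grep -E and sed, so characters such as "." in the path are read as regex syntax. Below is a minimal sketch of a literal-prefix alternative. The function name filter_paths and the sample paths are made up for illustration and are not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): read paths on stdin,
# keep only those starting with the given prefix, and strip that prefix.
# The prefix is compared as a literal string, not as a regex.
filter_paths() {
  prefix=$1
  while IFS= read -r path; do
    case "${path}" in
      "${prefix}"*) printf '%s\n' "${path#"${prefix}"}" ;;
    esac
  done
}

# Made-up sample standing in for the jq output
sample_paths="README.md
public/assets/logo.png
public/assets/fonts/a.woff
src/index.js"

filtered=$(printf '%s\n' "${sample_paths}" | filter_paths "public/assets")
printf '%s\n' "${filtered}"
```

As in the script, the remaining paths keep their leading slash, so they can still be appended directly to output_path.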
@gr2m

gr2m commented Jul 22, 2021

Hi Amelia, I tried

./download-repo.sh gr2m/sandbox . data

but it only created the empty "data" folder; it didn't download the octokit.csv file.

I also get a jq error

jq: error (at :1): Cannot iterate over null (null)

I also tried the same command from your tweet, but got the same result

./download-repo.sh wattenberger/kumiko ./kumiko-assets public/assets
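
The jq error quoted above is what jq prints when the expression `.tree[]` runs on a response with no tree, for example when the API returns an error object for a branch that does not exist. Here is a rough sketch of a guard that could run before iterating. The function name list_file_paths and the sample responses are hypothetical, and jq is assumed to be installed, as it is for the script itself.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the file paths from
# a git/trees API response, or report the API's message and return non-zero
# when the response has no "tree" array.
list_file_paths() {
  response=$1
  if ! printf '%s' "${response}" | jq -e '.tree | type == "array"' > /dev/null 2>&1; then
    message=$(printf '%s' "${response}" | jq -r '.message? // "unexpected response"' 2> /dev/null)
    echo "GitHub API error: ${message:-unexpected response}" >&2
    return 1
  fi
  printf '%s' "${response}" | jq -r '.tree[] | select(.type != "tree") | .path'
}

# Made-up sample responses for illustration
ok_response='{"tree":[{"path":"a.txt","type":"blob"},{"path":"dir","type":"tree"},{"path":"dir/b.txt","type":"blob"}]}'
error_response='{"message":"Not Found"}'

list_file_paths "${ok_response}"
list_file_paths "${error_response}" || echo "stopping early instead of iterating over null"
```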

@gr2m

gr2m commented Jul 22, 2021

Also, I think that you cannot download assets from private repositories via https://raw.githubusercontent.com. If you hit the raw button on a file in a private repository, a ?token=... gets appended to the https://raw.githubusercontent.com/{repo}/{branch}/{path} URL. Without it you'll get a 404.

I tested it and it works with private repositories 🎉

@Wattenberger
Author

Ah, thanks for the flag - I updated the code so that it looks for a specified branch, then main, then master.

@gr2m

gr2m commented Jul 22, 2021

works now 🎉

I also totally missed that I have to set the GITHUB_TOKEN value. If I leave the default value, the script does not send an unauthenticated request; it sends invalid authentication.

And download from private repositories worked too, so please ignore my comment above. Very cool!
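
One way the placeholder-token issue described above could be avoided is to send the Authorization header only when a real token is set. This is a sketch under that assumption, not the gist author's code. The helper names has_token and github_curl are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers (not in the original script).

# Succeeds only if GITHUB_TOKEN is non-empty and is not the placeholder value.
has_token() {
  [ -n "${GITHUB_TOKEN:-}" ] && [ "${GITHUB_TOKEN}" != "YOUR_TOKEN_HERE" ]
}

# Fetch a URL, adding the Authorization header only when a token is set,
# so that a missing token results in an unauthenticated request.
github_curl() {
  if has_token; then
    curl -s -H "Authorization: token ${GITHUB_TOKEN}" "$1"
  else
    curl -s "$1"
  fi
}
```

With these in place, the curl calls in the script could be written as github_curl "${url}". Unauthenticated requests are still rate-limited, as the script's header comment notes.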
