#!/bin/bash
# A simple script to backup an organization's GitHub repositories.
#-------------------------------------------------------------------------------
# NOTES:
#-------------------------------------------------------------------------------
# * Under the heading "CONFIG" below you'll find a number of configuration
#   parameters that must be personalized for your GitHub account and org.
#   Replace the `<CHANGE-ME>` strings with the value described in the comments
#   (or overwrite those values at run-time by providing environment variables).
#
# * If you have more than 100 repositories, you'll need to step thru the list
#   of repos returned by GitHub one page at a time, as described at
#   https://gist.github.com/darktim/5582423
#
# * If you want to back up the repos for a USER rather than an ORGANIZATION,
#   there's a small change needed. See the comment on the `REPOLIST` definition
#   below (i.e. search for "REPOLIST" and make the described change).
#
# * Thanks to @Calrion, @vnaum, @BartHaagdorens and other commenters below for
#   various fixes and updates.
#
# * Also see those comments (and related revisions and forks) for more
#   information and general troubleshooting.
#-------------------------------------------------------------------------------

#-------------------------------------------------------------------------------
# CONFIG:
#-------------------------------------------------------------------------------
GHBU_ORG=${GHBU_ORG-"<CHANGE-ME>"}                    # the GitHub organization whose repos will be backed up
                                                      # (if you're backing up a USER's repos, this should be your GitHub username; also see the note below about the `REPOLIST` definition)
GHBU_UNAME=${GHBU_UNAME-"<CHANGE-ME>"}                # the username of a GitHub account (to use with the GitHub API)
GHBU_PASSWD=${GHBU_PASSWD-"<CHANGE-ME>"}              # the password (or personal access token) for that account
#-------------------------------------------------------------------------------
GHBU_BACKUP_DIR=${GHBU_BACKUP_DIR-"github-backups"}   # where to place the backup files
GHBU_GITHOST=${GHBU_GITHOST-"github.com"}             # the GitHub hostname (see comments)
GHBU_PRUNE_OLD=${GHBU_PRUNE_OLD-true}                 # when `true`, old backups will be deleted
GHBU_PRUNE_AFTER_N_DAYS=${GHBU_PRUNE_AFTER_N_DAYS-3}  # the min age (in days) of backup files to delete
GHBU_SILENT=${GHBU_SILENT-false}                      # when `true`, only show error messages
GHBU_API=${GHBU_API-"https://api.github.com"}         # base URI for the GitHub API
GHBU_GIT_CLONE_CMD="git clone --quiet --mirror git@${GHBU_GITHOST}:"  # base command to use to clone GitHub repos
TSTAMP=`date "+%Y%m%d-%H%M"`                          # timestamp suffix appended to archived files
#-------------------------------------------------------------------------------
# (end config)
#-------------------------------------------------------------------------------

# The function `check` will exit the script if the given command fails.
function check {
  "$@"
  status=$?
  if [ $status -ne 0 ]; then
    echo "ERROR: Encountered error (${status}) while running the following:" >&2
    echo "       $@" >&2
    echo "       (at line ${BASH_LINENO[0]} of file $0.)" >&2
    echo "       Aborting." >&2
    exit $status
  fi
}

# The function `tgz` will create a gzipped tar archive of the specified file ($1) and then remove the original.
function tgz {
  check tar zcf $1.tar.gz $1 && check rm -rf $1
}

$GHBU_SILENT || (echo "" && echo "=== INITIALIZING ===" && echo "")
$GHBU_SILENT || echo "Using backup directory $GHBU_BACKUP_DIR"
check mkdir -p $GHBU_BACKUP_DIR

$GHBU_SILENT || echo -n "Fetching list of repositories for ${GHBU_ORG}..."
REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "^    \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`  # hat tip to https://gist.github.com/rodw/3073987#gistcomment-3217943 for the license name workaround
# NOTE: if you're backing up a *user's* repos, not an organization's, use this instead:
# REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "^    \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
$GHBU_SILENT || echo "found `echo $REPOLIST | wc -w` repositories."

$GHBU_SILENT || (echo "" && echo "=== BACKING UP ===" && echo "")
for REPO in $REPOLIST; do
  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}"
  check ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git

  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}.wiki (if any)"
  ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.wiki.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git 2>/dev/null && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git

  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO} issues"
  check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues -q > ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP} && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP}
done

if $GHBU_PRUNE_OLD; then
  $GHBU_SILENT || (echo "" && echo "=== PRUNING ===" && echo "")
  $GHBU_SILENT || echo "Pruning backup files ${GHBU_PRUNE_AFTER_N_DAYS} days old or older."
  $GHBU_SILENT || echo "Found `find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS | wc -l` files to prune."
  find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS -exec rm -fv {} > /dev/null \;
fi

$GHBU_SILENT || (echo "" && echo "=== DONE ===" && echo "")
$GHBU_SILENT || (echo "GitHub backup completed." && echo "")
If you don't want to continuously maintain a script, you can use the Repo Restore service to back up GitHub issues without worrying about maintaining your own bash script for every API change.
Thank you 👍
For everyone who's hitting the license "name" issue: unless you prefer moving the whole thing to a jq parser as mentioned above, checking for the nesting level works, too:
--- backup-github.sh.bak 2020-03-19 07:04:36.655778577 +0000
+++ backup-github.sh 2020-03-19 07:14:45.730777602 +0000
@@ -43,7 +43,7 @@
$GHBU_SILENT || echo -n "Fetching list of repositories for ${GHBU_ORG}..."
-REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
+REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "^    \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
# NOTE: if you're backing up a *user's* repos, not an organizations, use this instead:
# REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
When trying to back up my user, this fails on the repos that I am only a collaborator on, because it assumes they are part of my account. Is there an easy way to have it ignore repos that are not mine?
=== BACKING UP ===
Backing up maxlaumeister/repo-that-is-not-in-my-account
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
ERROR: Encountered error (128) while running the following:
git clone --quiet --mirror git@github.com:maxlaumeister/repo-that-is-not-in-my-account.git github-backups/maxlaumeister-repo-that-is-not-in-my-account-20200714-1244.git
(at line 59 of file ./backup-github.sh.)
Aborting.
I don't know why I didn't notice your comments earlier, but thanks to @vnaum for the easy fix and to @aktary for the nudge. I've just made those changes now.
I think we probably want the same change in the user-specific variation of the REPOLIST definition, but since I haven't tested that myself I've just added a note about it.
I'm pleasantly surprised so many people have found and continue to find this script useful. It's just something I threw together one afternoon to assuage an IT colleague who was worried about the reliability of a hosted version control platform. It's definitely handy, but I never imagined it would be this (moderately) popular or long-lived.
Hi all, and big thanks for this script. But I get an error when I run it:
Fetching list of repositories for 3cie...ERROR: Encountered error (1) while running the following:
grep ^ "name"
(at line 43 of file ./backupGithub.sh.)
Aborting.
Don't know why. Any idea? Thanks.
@gponty - You may be hitting this: https://gist.github.com/rodw/3073987#gistcomment-1704365
Ok, thank you. It's no longer possible to connect with username/password, only with username/access token.
EDIT: I confirm, it works with a token.
I'm using this to back up user repos (as opposed to organisation repos).
I can confirm that the fix noted on line 49 is needed in case of the license "name" issue, and that the fix works.
However, a typo slipped into the URL with the last revision: it should be ${GHBU_API}/user/repos, but the latest revision has ${GHBU_API}/users/repos. With the extra s, the URL no longer works.
Thanks @BartHaagdorens - I've updated the gist to reflect this info
BTW are you sure about the /user/ vs /users/ URL? An older comment (https://gist.github.com/rodw/3073987#gistcomment-2017030) requested the opposite change.
I was pretty sure already: I tested the script first with /users/, which didn't work; with /user/, it did.
To be absolutely sure, I just checked the GitHub API docs:
- /user/repos: lists repositories for the authenticated user (https://docs.github.com/en/rest/reference/repos#list-repositories-for-the-authenticated-user)
- /users/repos: provides publicly available information about the GitHub account "repos", which seems to actually exist (https://docs.github.com/en/rest/reference/users#get-a-user)
Hi team, I would like to back up a GitHub repository into another repository, with a backup made any time there is a new commit. Kindly share a resource for this.
@phemmmie What you're asking for is a different use-case than what this script addresses. I think you may want to look at something like a "post-commit hook" - an action that fires after every commit - that will push the most recent changes to another repository.
A quick google yields this gist - https://gist.github.com/stinoga/3136312 - which I haven't vetted in any way, but which claims to do what you're trying to do.
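For illustration, a minimal sketch of such a hook (untested; it assumes you've added the destination repository as a remote named backup, and the URL shown is hypothetical):

#!/bin/sh
# .git/hooks/post-commit -- runs after every local commit
# one-time setup (hypothetical remote/URL): git remote add backup git@github.com:you/backup-repo.git
git push --quiet backup HEAD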
This fails for forks (and breaks) since tokens apparently don't have access to forked repos. :(
Same for repos you have access to but are not the owner of I think.
This seems to only access public repositories for an org I am an admin of. Any ideas on how to address that?
This seems to only access public repositories for an org I am an admin of. Any ideas on how to address that?
Turns out I needed to be using a personal access token with repo admin rights rather than my GitHub password. Once I changed that, I was able to iterate through the private repositories.
I had to authenticate with an SSH key for this to work; the token wasn't enough. Added it to my ~/.ssh/config file:
Host github.com
  IdentityFile ~/.ssh/github_rsa
Hello, we created the same functionality using the GitHub CLI: https://github.com/druidfi/git-backupper
It's a bash script which can be used as-is, as a GHA workflow, or with a Docker image.
@rodw: thanks for cobbling this together; it saved me some time reinventing the wheel, with next to no dependencies involved :)
A few nits though:
- Auto-pruner blindly deletes files by age. If the backup host goes offline (but stays alive), or GitHub kicks the bit bucket, after 3 days the backups would get nuked... oops! I'd suggest coupling (or replacing) that with "(also?) keep the last N files" (a sketch follows after this list) :)
- Fetching, tarballing and nuking repos locally seems wasteful, especially when there are many repos and many bytes/objects to fetch. I assume it should be possible to keep and just update the mirror-repo directories? This would allow for faster updates of the backup, less storage wear and uplink abuse, etc.
- The nuance for user vs org backups can be handled with yet another envvar option (the first thing I added in my copy locally) :)
- Something should happen about paging for users/orgs with more than 100 repos - perhaps: fetch a page of 100 at offset N*100, check whether there were 100, do N++ and loop (break if fewer than 100 on the last page). I'm sure I did something of the sort elsewhere...
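For what it's worth, a minimal sketch of the "keep the last N files" idea from the first nit (it assumes GNU ls/xargs and the script's space-free archive names; GHBU_KEEP_LAST_N is a hypothetical new variable, and this keeps the newest N archives overall, not per repo):

GHBU_KEEP_LAST_N=${GHBU_KEEP_LAST_N-7}  # hypothetical: how many of the newest archives to keep
# list archives newest-first, skip the first N, delete the rest
ls -1t "$GHBU_BACKUP_DIR"/*.tar.gz 2>/dev/null | tail -n +$((GHBU_KEEP_LAST_N + 1)) | xargs -r rm -f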
UPDATE: All of the above, and some more, solved and pushed to my fork.
In fact, is there a "real repo" version so PRs can be made? ;p
UPDATE: Posted my tweaks to https://github.com/jimklimov/github-scripts - testing so far
On a related note, a gist backup script from https://github.com/aprescott/gist-backup looks nifty
So far I've got an issue of sorts: when listing my (user) repos via the API, I see everything I have access to, perhaps thousands of repos via organizations I am part of (like Jenkins and all its plugins). In the GitHub web UI my account "only" has the 200+ repos that I created or forked under my name.
Is there a different API call (or parameter) to get that list of repo URLs for backup?
Is there a different API call (or parameter) to get that list of repo URLs for backup?
https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-repositories-for-a-user
Seems like you want `type` to be `owner`, although that seems to be the default if you're querying the user endpoint. Does that not work?
@spanthetree: good catch, thanks! I can only guess the API token I used was "too strong", so the metadata returned by default did include repos I have access to but that are owned by other organizations. Also, with the current script structure, that led to fetching repo names not known under my personal account, so the backup failed.
Adding the explicit option to the curl listing did constrain it to the expected count.
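Concretely, the constrained listing can look something like this (the script's user-repos curl call with the documented type=owner parameter added):

REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos\?type=owner\&per_page=100 -q | check grep "^    \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`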
FYI: Stumbled upon an issue with my version of the script:
- its backed-up list of issues was rather short
- there was no PR data

Digging into it, I found that both can be fetched (issues do include PRs, though with somewhat different metadata), and both are paged (so the "short" list was just the default 30 newest items). The script is now updated to maintain a local git repo with exports of both the (paged => JSON-concatenated) lists of issues and pulls, and to walk the resulting list to get each entry's comments (and commit metadata for good measure). By storing this in a local git repo (whose snapshots are tarballed), I get a history of how those discussions evolved.
Now... struggling to get If-Modified-Since posted in a way that GitHub would actually reply with HTTP 304 instead of eating REST API quota points :)
UPDATE: The ETag support went better; not all resources have last-modified in their replies.
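For reference, a minimal sketch of that ETag flow using plain curl (GitHub answers a matching If-None-Match with HTTP 304, and per the GitHub docs conditional requests answered with 304 don't count against the rate limit; the issues.json file name is just an example):

# first fetch: write the body to a file, dump headers to stdout, capture the ETag
ETAG=`curl --silent -u $GHBU_UNAME:$GHBU_PASSWD -D - -o issues.json ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues | awk 'tolower($1)=="etag:" {print $2}' | tr -d '\r'`
# later fetch: send it back; 304 means nothing changed (and no body is downloaded)
curl --silent -u $GHBU_UNAME:$GHBU_PASSWD -H "If-None-Match: ${ETAG}" -o /dev/null -w "%{http_code}\n" ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues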
Having a problem where the script only recognizes 100 repositories, but my organization has 150.
Is there a way to lift this cap?
@JPC18: you need to parse the paginated output of the GitHub REST API.
FWIW, my continuation of this gist as posted in https://github.com/jimklimov/github-scripts seems to have pulled all 222 of my repos (346 if adding issues and PR info available for some of those, which are also git repos under the hood), 187 for a colleague... so this part works quite well :)
I've checked that the orgs I back up happen to all have under 100 repos, though... so that aspect is a bit lacking in real-life testing :)
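A minimal sketch of that paging loop, built on the script's existing REPOLIST pipeline (it keeps requesting pages of 100 until a page comes back empty):

REPOLIST=""
PAGE=1
while true; do
  # no `check` around grep here, since a non-match (the empty last page) is expected
  CHUNK=`curl --silent -u $GHBU_UNAME:$GHBU_PASSWD "${GHBU_API}/orgs/${GHBU_ORG}/repos?per_page=100&page=${PAGE}" -q | grep "^    \"name\"" | awk -F': "' '{print $2}' | sed -e 's/",//g'`
  [ -z "$CHUNK" ] && break  # an empty page means we've seen everything
  REPOLIST="${REPOLIST} ${CHUNK}"
  PAGE=$((PAGE + 1))
done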
I would just install jq, as it's designed to parse JSON, as opposed to trying to chase the API return text with grep -v statements.
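For example, with jq installed, the whole grep/awk/sed pipeline in the REPOLIST definition collapses to a single filter over the same endpoint:

REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 | check jq -r '.[].name'`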