@pascalschulz
Last active November 12, 2020 13:42
This code snippet takes a GitHub organization name as input, crawls the organization's public repository listing page by page, and prints a list of the Git clone URLs for those repositories.
import itertools
import re

import requests as rq

# Your GitHub organization, written with a leading slash (e.g. "/github")
organization = "/<company_name>"

response = rq.request("GET", "https://github.com{0}".format(organization))

# Read the page count from the pagination markup; assume a single page if it is absent.
try:
    pages = re.search(r"data-total-pages=\"(\d+)\">", response.text).group(1)
except AttributeError:
    pages = 1

# Collect the repository names listed on each page.
repositoryUrls = []
for page in range(1, int(pages) + 1):
    response = rq.request("GET", "https://github.com{}?page={}".format(organization, page))
    repositoryUrls.append(re.findall(r"itemprop=\"name codeRepository\".*href=\"" + re.escape(organization) + r"/(.*?)\" class", response.text))

# Flatten the per-page lists and turn each repository name into a clone URL.
repositoryUrls = list(itertools.chain.from_iterable(repositoryUrls))
repositoryUrls = ["https://github.com" + organization + "/{0}.git".format(repo) for repo in repositoryUrls]
print(repositoryUrls)
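
The extraction step can be tried without any network access. The following is a minimal sketch, not part of the original snippet: the helper name `extract_clone_urls`, the organization `/example-org`, and the sample markup are all hypothetical and only imitate the `itemprop="name codeRepository"` links the script searches for. The pattern here is a slightly stricter variant that stops at the closing quote of the `href` attribute.

```python
import re


def extract_clone_urls(html, organization):
    """Return a clone URL for every repository link found in one page of HTML.

    `organization` is the path with a leading slash, e.g. "/example-org".
    """
    # Match the repository anchor, then capture the name that follows the
    # organization path inside the href attribute.
    pattern = (
        r'itemprop="name codeRepository"[^>]*href="'
        + re.escape(organization)
        + r'/([^"]+)"'
    )
    names = re.findall(pattern, html)
    return ["https://github.com{}/{}.git".format(organization, name) for name in names]


# Hypothetical markup, assumed to resemble one page of the repository listing.
sample_html = """
<a itemprop="name codeRepository" data-hovercard-type="repository" href="/example-org/alpha" class="d-inline-block">alpha</a>
<a itemprop="name codeRepository" href="/example-org/beta" class="d-inline-block">beta</a>
"""

urls = extract_clone_urls(sample_html, "/example-org")
print(urls)
```

Keeping the parsing in a function that takes the HTML as a string makes it easy to check against a saved page whenever the site's markup changes, since the regular expression depends entirely on that markup.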