Through the GitHub API it is possible to retrieve individual files from a Git repository using, e.g., curl. To do so, first retrieve the content information for the relevant file (or folder):
curl https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}
For private repositories, authenticate using your username and a personal access token:
curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}
This will return a JSON response:
{
"name": "README.md",
"path": "README.md",
"sha": "41553899f901843f5339794256s2444ed351708a",
"size": 815,
"url": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
"html_url": "https://github.com/{organisation}/{repository}/blob/main/README.md",
"git_url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
"download_url": "https://raw.githubusercontent.com/{organisation}/{repository}/main/README.md?token=AAL57UOYWVQ56ZZGDGWYUAK76WFNO",
"type": "file",
"_links": {
"self": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
"git": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
"html": "https://github.com/{organisation}/{repository}/blob/main/README.md"
}
}
The file can then be downloaded using the sha:
curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a
This gives another JSON response with the file contents in base64 encoding:
{
"sha": "41553899f901843f5339794256s2444ed351708a",
"node_id": "{node id}",
"size": 815,
"url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
"content": "{base64 encoded content}",
"encoding": "base64"
}
Note that for smaller files, the base64 encoded content will already be included in the first call.
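Decoding the content field can be sketched in Python as follows. This is a minimal sketch, not part of the original steps: the function name decode_content is hypothetical, and it assumes the JSON response has already been parsed into a dict.

```python
import base64
from typing import Optional


def decode_content(response: dict) -> Optional[bytes]:
    """Return the decoded file bytes from a contents or blob response.

    Returns None when the response carries no base64 payload, in which
    case the blob endpoint (git_url) would have to be queried instead.
    """
    if response.get("encoding") != "base64" or not response.get("content"):
        return None
    # The base64 text may contain line breaks; b64decode discards
    # characters outside the base64 alphabet by default.
    return base64.b64decode(response["content"])


# Hypothetical response fragment used only for illustration.
blob = {"content": "SGVsbG8s\nIHdvcmxkIQ==\n", "encoding": "base64"}
print(decode_content(blob))  # b'Hello, world!'
```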
Retrieving an LFS file requires a few extra steps. For LFS files, decoding the base64 string will not return the file's content, but information in the following format:
version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
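Pulling the sha and filesize out of this text could look like the sketch below. The name parse_lfs_pointer and the pattern are assumptions for illustration: it only accepts the three-line form shown above, with a sha256 oid of 64 lowercase hex digits.

```python
import re

# Pattern for the three-line pointer shown above (assumed layout:
# version line, sha256 oid, size, optional trailing newline).
POINTER_RE = re.compile(
    r"version https://git-lfs\.github\.com/spec/v1\n"
    r"oid sha256:(?P<oid>[0-9a-f]{64})\n"
    r"size (?P<size>\d+)\n?"
)


def parse_lfs_pointer(text: str) -> dict:
    """Extract oid and size from a decoded LFS pointer, or raise ValueError."""
    match = POINTER_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a Git LFS pointer")
    return {"oid": match["oid"], "size": int(match["size"])}


# Made-up pointer text for illustration only.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "a" * 64 + "\n"
    "size 12345\n"
)
info = parse_lfs_pointer(pointer)
```

The size is converted to an integer here because the batch request expects a number rather than a string.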
Using this information, you need to create a JSON object as follows, filling in the sha and filesize information from the previous step:
{
"operation": "download",
"transfer": ["basic"],
"objects": [
{"oid": "{sha}", "size": {size}}
]
}
Pass this object as the data parameter of a curl request to the LFS API:
curl -X POST \
-H "Accept: application/vnd.git-lfs+json" \
-H "Content-type: application/json" \
-d '{"operation": "download", "transfer": ["basic"], "objects": [{"oid": "{sha}", "size": {size}}]}' \
https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch
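The same request can be prepared from Python's standard library. This is a sketch under assumptions: build_batch_request is a hypothetical helper, the organisation and repository names are placeholders, and authentication for private repositories is left out.

```python
import json
import urllib.request


def build_batch_request(organisation: str, repository: str,
                        oid: str, size: int) -> urllib.request.Request:
    """Prepare (but do not send) the LFS batch download request."""
    payload = {
        "operation": "download",
        "transfer": ["basic"],
        # size is sent as a JSON number, matching the curl command above
        "objects": [{"oid": oid, "size": size}],
    }
    url = (
        f"https://github.com/{organisation}/{repository}"
        ".git/info/lfs/objects/batch"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.git-lfs+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real call would use the oid and size from the pointer.
request = build_batch_request("my-org", "my-repo", "a" * 64, 12345)
```

Sending it would be a matter of passing the request to urllib.request.urlopen and parsing the body with json.load, which needs network access.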
Almost there! This should return a JSON object that tells you where the file is stored:
{
"objects": [
{
"oid": "{sha}",
"size": {size},
"actions": {
"download": {
"href": "https://github-cloud.s3.amazonaws.com/alambic/media/278163869/a2/42/{sha}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20210106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210106T104409Z&X-Amz-Expires=3600&X-Amz-Signature=XXX&X-Amz-SignedHeaders=host&actor_id=XXX&key_id=0&repo_id=XXX&token=1",
"expires_at": "2021-01-06T11:44:09Z",
"expires_in": 3600
}
}
}
]
}
Download the file from the URL stated in the href attribute.
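Reading the href out of the parsed response might be sketched like this. The function name extract_download_urls is hypothetical, and the sketch assumes that an object the server cannot serve simply has no download action, so it is skipped.

```python
def extract_download_urls(batch_response: dict) -> dict:
    """Map each oid to its download href; objects without one are skipped."""
    urls = {}
    for obj in batch_response.get("objects", []):
        href = obj.get("actions", {}).get("download", {}).get("href")
        if href:
            urls[obj["oid"]] = href
    return urls


# Made-up response with a placeholder URL, shaped like the JSON above.
sample = {
    "objects": [
        {
            "oid": "a" * 64,
            "size": 12345,
            "actions": {
                "download": {
                    "href": "https://example.com/lfs/object",
                    "expires_in": 3600,
                }
            },
        },
        {"oid": "b" * 64, "size": 1},  # no actions: nothing to download
    ]
}
urls = extract_download_urls(sample)
```

Since the link expires (expires_in is given in seconds), the download should follow shortly after the batch call.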
@athletic-geek
I don't know how relevant this is to you now, but it may also be helpful to others, so I'm posting it here.
In short: there won't be such an API.
Git LFS is a poorly implemented abstraction on top of the Git we are used to. There is no difference between a real LFS object annotation inside a Git blob (file) and a fake one. No flags are attached to Git blobs that could help determine whether the content of a blob is a Git LFS annotation or not. The Git LFS client turns such blobs from annotations into real big files on the fly, based on .gitattributes files. The first problem arises from the fact that .gitattributes can be placed anywhere: deep inside the directory structure, in the repo's root, or even outside the repo, e.g. inside the user's home folder. There's no pure, independent, and definitive way to safely determine whether it's a Git LFS object or not.
--branch flag added to the git clone command, and as a result see that all the files that previously were big LFS files are just small text files with Git LFS annotations. That's because they are and always have been.
.gitattributes file in your home folder, and they'll be applied globally.
/contents API will say that the file's size is the size you had put into the manually created file. They couldn't have known if it was real and they won't.
So the only thing left is to check whether the file is a valid Git LFS annotation and, if it is, just assume it IS a file stored inside Git LFS. GitHub's API does it the same way. If for some reason it's extremely important to know for sure, you can also ask the LFS server whether it has an object with that specific sha256 hash.
My good-enough marker for a file being a valid Git LFS annotation is whether it matches the following RegExp, available here:
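For readers without access to that RegExp, a check of this kind might be sketched as below. This is not the pattern referred to above; both LFS_POINTER_RE and looks_like_lfs_pointer are hypothetical names, and the pattern assumes the plain three-line annotation with a sha256 oid.

```python
import re

# Hypothetical pattern, written against the three-line annotation format:
# version line, "oid sha256:<64 hex digits>", "size <digits>".
LFS_POINTER_RE = re.compile(
    rb"version https://git-lfs\.github\.com/spec/v1\n"
    rb"oid sha256:[0-9a-f]{64}\n"
    rb"size \d+\n?"
)


def looks_like_lfs_pointer(blob: bytes) -> bool:
    """Return True if the raw blob bytes have the shape of an LFS annotation."""
    return LFS_POINTER_RE.fullmatch(blob) is not None


# Made-up inputs for illustration.
annotation = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:" + b"0" * 64 + b"\n"
    b"size 42\n"
)
print(looks_like_lfs_pointer(annotation))           # True
print(looks_like_lfs_pointer(b"\x89PNG\r\n\x1a\n"))  # False
```

Matching on bytes means binary blobs can be tested directly, without decoding them first. As said above, a match only shows that the blob looks like an annotation, not that the object exists on the LFS server.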