@fkraeutli
Last active October 10, 2024 05:29
How to download Git LFS files

How to retrieve Git LFS files from GitHub

Retrieving non-LFS files

Through the GitHub API it is possible to retrieve individual files from a Git repository with a tool such as curl. To do so, first retrieve the content information for the relevant file (or folder):

curl https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

For private repositories, authenticate using your username and a personal access token:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

This will return a JSON response:

{
  "name": "README.md",
  "path": "README.md",
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
  "html_url": "https://github.com/{organisation}/{repository}/blob/main/README.md",
  "git_url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "download_url": "https://raw.githubusercontent.com/{organisation}/{repository}/main/README.md?token=AAL57UOYWVQ56ZZGDGWYUAK76WFNO",
  "type": "file",
  "_links": {
    "self": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
    "git": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
    "html": "https://github.com/{organisation}/{repository}/blob/main/README.md"
  }
}
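
The same request can be made from Python. The following is a minimal sketch using only the standard library; the function names (contents_url, basic_auth_header, fetch_contents) are made up for illustration, and the basic-auth header is assumed to be equivalent to what curl -u sends.

```python
import base64
import json
import urllib.parse
import urllib.request


def contents_url(organisation, repository, path, ref=None):
    """Build the contents API URL for a file or folder path."""
    quoted_path = urllib.parse.quote(path.strip("/"))
    url = f"https://api.github.com/repos/{organisation}/{repository}/contents/{quoted_path}"
    if ref:
        url += "?" + urllib.parse.urlencode({"ref": ref})
    return url


def basic_auth_header(username, token):
    """Build the header that corresponds to curl's -u {username}:{token}."""
    raw = f"{username}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def fetch_contents(organisation, repository, path, username=None, token=None):
    """Send the request and return the parsed JSON (needs network access)."""
    headers = {}
    if username and token:
        headers.update(basic_auth_header(username, token))
    request = urllib.request.Request(
        contents_url(organisation, repository, path), headers=headers
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The "sha" and "git_url" fields of the returned dictionary are the ones needed for the next step.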

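The file can then be downloaded via its blob sha (the git_url from the response above), again passing your credentials for private repositories:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a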

This gives another JSON response with the file contents in base64 encoding:

{
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "node_id": "{node id}",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "content": "{base64 encoded content}",
  "encoding": "base64"
}

Note that for smaller files, the base64-encoded content is already included in the response to the first call, so the second call is not needed.
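
Decoding the "content" field can be sketched as follows. The function name decode_blob is hypothetical; the sketch assumes the response has already been parsed into a dictionary and that the content may contain line breaks, which base64.b64decode ignores by default.

```python
import base64


def decode_blob(blob):
    """Return the raw bytes of a blob response with base64-encoded content."""
    if blob.get("encoding") != "base64":
        raise ValueError("unexpected encoding: {!r}".format(blob.get("encoding")))
    # Newlines inside the base64 text are discarded by b64decode.
    return base64.b64decode(blob["content"])
```

For a text file, decode_blob(blob).decode("utf-8") then gives the file content as a string.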

Retrieving LFS files

Retrieving an LFS file requires a few extra steps. For LFS files, decoding the base64 string will not return the file's content, but information in the following format:

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
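
A minimal sketch for reading such a pointer in Python is shown below. The name parse_lfs_pointer is made up for illustration, and the checks (a spec URL prefix, a 64-character hex sha256 oid, a numeric size) are assumptions about what a well-formed pointer looks like. Any file whose content has this shape is reported as a pointer, whether or not the repository actually tracks it with LFS.

```python
import re


def parse_lfs_pointer(text):
    """Return {"oid": ..., "size": ...} if text looks like an LFS pointer, else None."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" ")
        if separator:
            fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None
    algorithm, _, digest = fields.get("oid", "").partition(":")
    size = fields.get("size", "")
    if algorithm != "sha256":
        return None
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        return None
    if not re.fullmatch(r"[0-9]+", size):
        return None
    return {"oid": digest, "size": int(size)}
```

The returned oid and size are the two values needed for the next request.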

Using this information, create a JSON object as follows, filling in the oid (the sha256 value, without the "sha256:" prefix) and the file size (as a number) from the previous step:

{
    "operation": "download",
    "transfer": ["basic"],
    "objects": [
        {"oid": "{sha}", "size": {size}}
    ]
}

Pass this object as the data parameter of a curl request to the LFS API:

curl -X POST \
-H "Accept: application/vnd.git-lfs+json" \
-H "Content-type: application/json" \
-d '{"operation": "download", "transfer": ["basic"], "objects": [{"oid": "{sha}", "size": {size}}]}' \
https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch
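
The same POST request can be prepared in Python. This is a sketch with the standard library only; build_batch_request is a hypothetical name, and the headers simply mirror the curl call above. For private repositories an Authorization header would have to be added as well.

```python
import json
import urllib.request


def build_batch_request(organisation, repository, oid, size):
    """Prepare the POST request for the LFS batch endpoint (nothing is sent yet)."""
    url = f"https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch"
    payload = {
        "operation": "download",
        "transfer": ["basic"],
        # size must be a JSON number, not a string
        "objects": [{"oid": oid, "size": int(size)}],
    }
    headers = {
        "Accept": "application/vnd.git-lfs+json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
```

Sending it with urllib.request.urlopen(request) and parsing the body with json.load gives the response described next.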

Almost there! This should return a JSON object that tells you where the file is stored:

{
  "objects": [
    {
      "oid": "{sha}",
      "size": {size},
      "actions": {
        "download": {
          "href": "https://github-cloud.s3.amazonaws.com/alambic/media/278163869/a2/42/{sha}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20210106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210106T104409Z&X-Amz-Expires=3600&X-Amz-Signature=XXX&X-Amz-SignedHeaders=host&actor_id=XXX&key_id=0&repo_id=XXX&token=1",
          "expires_at": "2021-01-06T11:44:09Z",
          "expires_in": 3600
        }
      }
    }
  ]
}

Download the file from the URL stated in the href attribute.
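
This last step can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the batch response has been parsed into a dictionary, that a failed object carries an "error" entry instead of "actions", and that any "header" entry of the download action should be sent along with the request.

```python
import shutil
import urllib.request


def extract_download(batch_response):
    """Return (href, headers) for the first object of a batch response."""
    obj = batch_response["objects"][0]
    if "error" in obj:
        raise RuntimeError("LFS object error: {}".format(obj["error"]))
    action = obj["actions"]["download"]
    return action["href"], action.get("header", {})


def download_to(href, headers, destination):
    """Stream the file behind href to a local path (needs network access)."""
    request = urllib.request.Request(href, headers=headers)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Since the href expires (see expires_at), it should be used right after it has been requested.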

@bauergeorg

@fkraeutli and @MrCsabaToth
In the meantime I got it working. You have to verify your upload. That's all.

Here is an extract of my Python code using the PyGithub and requests modules.

# imports needed by this extract
import json
from pprint import pprint

import requests

# see: https://www.mattmoriarity.com/2019-04-25-uploading-media-with-git-lfs/#initiating-the-transfer

# check for verify lock 
headers = {'Content-Type': 'application/vnd.git-lfs+json', 'Accept': 'application/vnd.git-lfs+json'}
url = 'https://lfs.github.com/{}/{}/locks/verify'.format(self.org, repo.name)
# locks 
ans0 = requests.post(url, headers=headers, auth=(self.token, ''))
if ans0.status_code != 200:
    print('status code: ' +  str(ans0.status_code))
    print('text: ' + ans0.text)
    raise Exception("Status code error")
# lock handling is not implemented
# we expect no locked files (empty "ours" and "theirs" lists)
# see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/locking.md
locks = ans0.json()
if locks.get('ours') or locks.get('theirs'):
    raise Exception("Lock handle error. Please contact your favorite developer to add lfs lock handling!")
# print ans
#pprint(ans0.text)

# get old content
old_lfs_pointer_content = decoded_content.split('\n')
old_sha = old_lfs_pointer_content[1].replace('oid sha256:', '')
old_size = int(old_lfs_pointer_content[2].replace('size ', ''))
# calculate content of lfs pointer to replace
new_size = get_file_size(source_file_path)
new_sha = get_file_hash(source_file_path)
# replace content line by line, so that size digits which happen to occur in the sha are not touched
new_lines = list(old_lfs_pointer_content)
new_lines[1] = 'oid sha256:' + new_sha
new_lines[2] = 'size ' + str(new_size)
new_lfs_pointer_content = '\n'.join(new_lines)

# get url to upload file to lfs
headers = {'Content-Type': 'application/vnd.git-lfs+json', 'Accept': 'application/vnd.git-lfs+json'}
pointer_data = json.dumps({"operation": "upload", "transfer": ["basic"], "objects": [{"oid": new_sha, "size": new_size}]})
url = 'https://github.com/{}/{}.git/info/lfs/objects/batch'.format(self.org, repo.name)
res = requests.post(url, headers=headers, data=pointer_data, auth=(self.token, ''))
ans1 = json.loads(res.text)
pprint(ans1['objects'][0])
upload_href = ans1['objects'][0]['actions']['upload']['href']
upload_header = ans1['objects'][0]['actions']['upload']['header']
verify_href = ans1['objects'][0]['actions']['verify']['href']
verify_header = ans1['objects'][0]['actions']['verify']['header']

# add content type to header
upload_header['Content-Type'] = 'application/octet-stream'

# read new data (not encoded) and close the file afterwards
with open(source_file_path, 'rb') as source_file:
    data = source_file.read()
# send/upload file to git lfs
ans2 = requests.put(url=upload_href, headers=upload_header, data=data)
if ans2.status_code != 200:
    print('status code: ' +  str(ans2.status_code))
    print('text: ' + ans2.text)
    raise Exception("Status code error")
# print ans
pprint(ans2.text)

# verify
# see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/basic-transfers.md#verification
verify_data = '{"oid": "' + new_sha + '", "size": ' + str(new_size) + '}'
ans3 = requests.post(url=verify_href, headers=verify_header, data=verify_data, auth=(self.token, ''))
if ans3.status_code != 200:
    print('status code: ' +  str(ans3.status_code))
    print('text: ' + ans3.text)
    raise Exception("Status code error")
# print ans
#pprint(ans3.text)

# create blob with new content (lfs pointer)
blob = repo.create_git_blob(content=new_lfs_pointer_content, encoding='utf-8')
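
The helpers get_file_size and get_file_hash are not part of the extract above. A minimal sketch of what they might look like, assuming they return the size in bytes and the hex-encoded SHA-256 digest of the file (which is what the LFS oid is):

```python
import hashlib
import os


def get_file_size(path):
    """Size of the file in bytes, as used in the 'size' line of the pointer."""
    return os.path.getsize(path)


def get_file_hash(path, chunk_size=1024 * 1024):
    """Hex SHA-256 digest of the file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```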

Greetings, and thanks a lot for your posts.

@Iqwertz

Iqwertz commented Apr 13, 2023

Thanks for the example!
I am trying to write an implementation that makes GitHub LFS files downloadable via a button on a website. My current approach is to request the oid and file size with a JavaScript GET request. This works as expected, but when I then try to make the POST request to retrieve the download URL, I get the following error:
Access to XMLHttpRequest at 'https://github.com/{organization...}/{repo...}.git/info/lfs/objects/batch' from origin 'http://localhost:9000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
I assume this is because github.com isn't meant to be used as an API and therefore doesn't allow cross-origin requests :/

So does anyone know how I could retrieve the download URL with a JavaScript request that doesn't get blocked by the CORS policy?

@athletic-geek

athletic-geek commented Feb 19, 2024

Hi @fkraeutli,
Thanks for sharing the example, but how does your method deal with the following special case?

Say the content of a small, ordinary file looks like the following. How can it be prevented from being mistakenly treated as an LFS blob?
I think there should be an API to determine whether a file is an ordinary blob or an LFS blob.

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
