@fkraeutli
Last active October 10, 2024 05:29
How to download GIT LFS files

How to retrieve GIT LFS files from GitHub

Retrieving non-LFS files

Through the GitHub API it is possible to retrieve individual files from a Git repository via, e.g. curl. To do so, first retrieve the content information for the relevant file (or folder):

curl https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

For private repositories, authenticate using your username and a personal access token:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

This will return a JSON response:

{
  "name": "README.md",
  "path": "README.md",
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
  "html_url": "https://github.com/{organisation}/{repository}/blob/main/README.md",
  "git_url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "download_url": "https://raw.githubusercontent.com/{organisation}/{repository}/main/README.md?token=AAL57UOYWVQ56ZZGDGWYUAK76WFNO",
  "type": "file",
  "_links": {
    "self": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
    "git": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
    "html": "https://github.com/{organisation}/{repository}/blob/main/README.md"
  }
}

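The file can then be downloaded from the git_url, which identifies the blob by its sha (for private repositories, add -u {username}:{personal access token} as above):

curl https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a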

This gives another JSON response with the file contents in base64 encoding:

{
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "node_id": "{node id}",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "content": "{base64 encoded content}",
  "encoding": "base64"
}

Note that for smaller files, the base64 encoded content will already be included in the first call.
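As a minimal sketch of the decoding step, assuming the JSON response has been saved as a string: the helper name decode_blob and the sample payload are made up for illustration and are not part of the GitHub API.

```python
import base64
import json


def decode_blob(response_text: str) -> bytes:
    """Return the raw file bytes from a blob-style JSON response."""
    payload = json.loads(response_text)
    if payload.get("encoding") != "base64":
        raise ValueError("unexpected encoding: {!r}".format(payload.get("encoding")))
    # The base64 string may be wrapped with newlines; b64decode skips
    # characters outside the base64 alphabet by default.
    return base64.b64decode(payload["content"])


# Hand-made sample (not real API output): "aGVsbG8K" is base64 for "hello\n".
sample = json.dumps({"content": "aGVs\nbG8K\n", "encoding": "base64"})
print(decode_blob(sample))
```

The result is bytes; call .decode("utf-8") on it only if the file is known to be text.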

Retrieving LFS files

Retrieving an LFS file requires a few extra steps. For LFS files, decoding the base64 string will not return the file's content, but information in the following format:

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
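The pointer text can be split into its fields with a few lines of Python. This is a sketch only: parse_lfs_pointer is a hypothetical helper, and it assumes the three-line layout shown above with a sha256 oid.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse LFS pointer text into {'oid': <hex>, 'size': <int>}.

    Raises ValueError if the text does not look like a pointer.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError("not a key/value line: {!r}".format(line))
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("missing or unknown LFS version line")
    algo, _, oid = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(oid) != 64:
        raise ValueError("missing or malformed oid")
    # int("") raises ValueError, so a missing size line is rejected too.
    return {"oid": oid, "size": int(fields.get("size", ""))}


pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "a" * 64 + "\n"
    "size 12345\n"
)
print(parse_lfs_pointer(pointer))
```

Note that this only checks the shape of the text. It cannot tell a genuine LFS pointer from an ordinary file that happens to contain the same three lines.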

Using this information, you need to create a JSON object as follows, filling in the sha and filesize information from the previous step (note that size is a number, not a string):

{
    "operation": "download",
    "transfer": ["basic"],
    "objects": [
        {"oid": "{sha}", "size": {size}}
    ]
}

Pass this object as the data parameter of a curl request to the LFS API (for private repositories, add -u {username}:{personal access token} as before):

curl -X POST \
-H "Accept: application/vnd.git-lfs+json" \
-H "Content-type: application/json" \
-d '{"operation": "download", "transfer": ["basic"], "objects": [{"oid": "{sha}", "size": {size}}]}' \
https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch
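Building the request body by hand is error-prone (quoting, stray braces), so it can help to let a JSON library serialise it. A minimal sketch, where build_batch_request is a hypothetical helper name and the oid value is a dummy:

```python
import json


def build_batch_request(oid: str, size) -> str:
    """Serialise the body for a "download" request to the LFS batch endpoint."""
    body = {
        "operation": "download",
        "transfer": ["basic"],
        # size is coerced to int so it is emitted as a JSON number.
        "objects": [{"oid": oid, "size": int(size)}],
    }
    return json.dumps(body)


print(build_batch_request("a" * 64, "12345"))
```

The returned string can be passed to curl with -d, or sent as the request body from any HTTP client.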

Almost there! This should return a JSON object that tells you where the file is stored:

{
  "objects": [
    {
      "oid": "{sha}",
      "size": {size},
      "actions": {
        "download": {
          "href": "https://github-cloud.s3.amazonaws.com/alambic/media/278163869/a2/42/{sha}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20210106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210106T104409Z&X-Amz-Expires=3600&X-Amz-Signature=XXX&X-Amz-SignedHeaders=host&actor_id=XXX&key_id=0&repo_id=XXX&token=1",
          "expires_at": "2021-01-06T11:44:09Z",
          "expires_in": 3600
        }
      }
    }
  ]
}

Download the file from the URL stated in the href attribute.
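This last step could be sketched in Python as follows. The helper names download_action and fetch are made up for illustration; the code assumes a response shaped like the one above, and that an object which cannot be served carries an "error" entry instead of "actions".

```python
import json
import shutil
import urllib.request


def download_action(batch_response_text: str, oid: str):
    """Return (href, headers) for the given oid from a batch response."""
    payload = json.loads(batch_response_text)
    for obj in payload.get("objects", []):
        if obj.get("oid") != oid:
            continue
        if "error" in obj:
            raise RuntimeError("LFS error for {}: {}".format(oid, obj["error"]))
        action = obj["actions"]["download"]
        # Some servers return extra headers that must accompany the download.
        return action["href"], action.get("header", {})
    raise KeyError(oid)


def fetch(href: str, headers: dict, dest_path: str) -> None:
    """Stream the file at href to dest_path (performs a network request)."""
    request = urllib.request.Request(href, headers=headers)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Because the href is a signed URL with an expiry (expires_in is given in seconds), it is best fetched right after the batch request rather than stored.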

@tljstewart

Wow, thank you for putting this info together. Github needs a better solution, I'm just trying to download a repo as a zip but I can't make any progress unless I do what you've outlined to actually retrieve the datasets :(

@bauergeorg

Florian, thanks a lot for your example!

@fkraeutli
Author

Glad it's useful!

@bauergeorg

bauergeorg commented Apr 8, 2022

@fkraeutli do you know how to add or replace a Git LFS file on GitHub?

@MrCsabaToth

I'm trying to download a file from an LFS-enabled repo to correct a quota overflow. When I execute the curl command it returns "Cookies must be enabled to use GitHub." When I execute the same POST command from ARC it returns 422 Unprocessable Entity with the message "Your browser did something unexpected. Please try again. If the error continues, try disabling all browser extensions." I need help.

@bauergeorg

@fkraeutli and @MrCsabaToth
In the meantime I got it working. You have to verify the upload, that's all.

Here is an extract of my Python code using the PyGithub and requests modules.

# imports needed by this extract
import json
from pprint import pprint
import requests

# see: https://www.mattmoriarity.com/2019-04-25-uploading-media-with-git-lfs/#initiating-the-transfer

# check for verify lock 
headers = {'Content-Type': 'application/vnd.git-lfs+json', 'Accept': 'application/vnd.git-lfs+json'}
url = 'https://lfs.github.com/{}/{}/locks/verify'.format(self.org, repo.name)
# locks 
ans0 = requests.post(url, headers=headers, auth=(self.token, ''))
if ans0.status_code != 200:
    print('status code: ' +  str(ans0.status_code))
    print('text: ' + ans0.text)
    raise Exception("Status code error")
# lock handling is not implemented here
# we expect no locked files (empty answer)
# see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/locking.md
if ans0.text.find('{"ours":[],"theirs":[],"next_cursor":""}') == -1:
    raise Exception("Lock handle error. Please contact your favorite developer to add lfs lock handling!")
# print ans
#pprint(ans0.text)

# get old content
old_lfs_pointer_content = decoded_content.split('\n')
old_sha = old_lfs_pointer_content[1].replace('oid sha256:', '')
old_size = int(old_lfs_pointer_content[2].replace('size ', ''))
# calculate content of lfs pointer to replace
new_size = get_file_size(source_file_path)
new_sha = get_file_hash(source_file_path)
# replace content
# replace only the size line, so digits inside the oid are never touched
new_lfs_pointer_content = decoded_content.replace('size ' + str(old_size), 'size ' + str(new_size))
new_lfs_pointer_content = new_lfs_pointer_content.replace(old_sha, new_sha)

# get url to upload file to lfs
headers = {'Content-Type': 'application/vnd.git-lfs+json', 'Accept': 'application/vnd.git-lfs+json'}
pointer_data = '{"operation": "upload", "transfer": ["basic"], "objects": [{"oid": "' + new_sha + '", "size": ' + str(new_size) + '}]}'
url = 'https://github.com/{}/{}.git/info/lfs/objects/batch'.format(self.org, repo.name)
res = requests.post(url, headers=headers, data=pointer_data, auth=(self.token, ''))
ans1 = json.loads(res.text)
pprint(ans1['objects'][0])
upload_href = ans1['objects'][0]['actions']['upload']['href']
upload_header = ans1['objects'][0]['actions']['upload']['header']
verify_href = ans1['objects'][0]['actions']['verify']['href']
verify_header = ans1['objects'][0]['actions']['verify']['header']

# add content type to header
upload_header['Content-Type'] = 'application/octet-stream'

# read new data (not encoded); the with-block closes the file afterwards
with open(source_file_path, 'rb') as source_file:
    data = source_file.read()
# send/upload file to git lfs
ans2 = requests.put(url=upload_href, headers=upload_header, data=data)
if ans2.status_code != 200:
    print('status code: ' +  str(ans2.status_code))
    print('text: ' + ans2.text)
    raise Exception("Status code error")
# print ans
pprint(ans2.text)

# verify
# see: https://github.com/git-lfs/git-lfs/blob/main/docs/api/basic-transfers.md#verification
verify_data = '{"oid": "' + new_sha + '", "size": ' + str(new_size) + '}'
ans3 = requests.post(url=verify_href, headers=verify_header, data=verify_data, auth=(self.token, ''))
if ans3.status_code != 200:
    print('status code: ' +  str(ans3.status_code))
    print('text: ' + ans3.text)
    raise Exception("Status code error")
# print ans
#pprint(ans3.text)

# create blob with new content (lfs pointer)
blob = repo.create_git_blob(content=new_lfs_pointer_content, encoding='utf-8')
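The extract above calls get_file_size and get_file_hash without showing them. A minimal sketch of what they might look like, assuming they return the size in bytes and the SHA-256 hex digest that an LFS pointer expects (these are guesses at the helpers, not the commenter's actual code):

```python
import hashlib
import os


def get_file_size(path):
    """Return the file size in bytes, as used in the pointer's size line."""
    return os.path.getsize(path)


def get_file_hash(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file, as used in the pointer's oid."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large LFS files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```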

Greetings, and thanks a lot for your posts.

@Iqwertz

Iqwertz commented Apr 13, 2023

Thanks for the example!
I am trying to write an implementation that makes GitHub LFS files downloadable via a button on a website. My current approach is to request the oid and file size with a JavaScript GET request. This works as expected, but when I then try to make the POST request to retrieve the download URL, I get the following error:
Access to XMLHttpRequest at 'https://github.com/{organization...}/{repo...}.git/info/lfs/objects/batch' from origin 'http://localhost:9000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
I assume this is because github.com isn't meant to be used as an API and therefore doesn't allow cross-origin requests :/

So does anyone know how I could retrieve the download URL with a JavaScript request that doesn't get blocked by the CORS policy?

@athletic-geek

athletic-geek commented Feb 19, 2024

Hi @fkraeutli,
Thanks for sharing the example, but how would your method deal with the following special case?

Say the content of a small file looks like the following. How can it be kept from being mistakenly treated as an LFS blob?
I think there should be an API to determine whether a file is an ordinary blob or an LFS blob.

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
