@fkraeutli
Last active February 1, 2025 10:29
How to download GIT LFS files

How to retrieve GIT LFS files from GitHub

Retrieving non-LFS files

Through the GitHub API it is possible to retrieve individual files from a Git repository with a tool such as curl. To do so, first retrieve the content information for the relevant file (or folder):

curl https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}

For private repositories, authenticate using your username and a personal access token:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/contents/{file or folder path}
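For readers scripting this instead of using curl, here is a minimal sketch of how the same request could be prepared in JavaScript. The helper name `buildContentsRequest` and its parameter names are illustrative and not part of any GitHub library. The `Authorization` header mirrors what `curl -u` sends (HTTP Basic authentication), and Node.js's global `Buffer` is assumed.

```javascript
// Hypothetical helper: prepares the URL and headers for the contents API call.
function buildContentsRequest({ organisation, repository, path, username, token }) {
  // Encode each path segment separately so "/" keeps separating folders.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const url = `https://api.github.com/repos/${organisation}/${repository}/contents/${encodedPath}`;

  const headers = {};
  if (username && token) {
    // Same scheme as `curl -u username:token`: HTTP Basic authentication.
    const credentials = Buffer.from(`${username}:${token}`).toString("base64");
    headers.Authorization = `Basic ${credentials}`;
  }
  return { url, headers };
}
```

Passing the result to `fetch(url, { headers })` (Node.js 18 or later) should issue the same GET request as the curl commands.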

This will return a JSON response:

{
  "name": "README.md",
  "path": "README.md",
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
  "html_url": "https://github.com/{organisation}/{repository}/blob/main/README.md",
  "git_url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "download_url": "https://raw.githubusercontent.com/{organisation}/{repository}/main/README.md?token=AAL57UOYWVQ56ZZGDGWYUAK76WFNO",
  "type": "file",
  "_links": {
    "self": "https://api.github.com/repos/{organisation}/{repository}/contents/README.md?ref=main",
    "git": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
    "html": "https://github.com/{organisation}/{repository}/blob/main/README.md"
  }
}

The file can then be downloaded using the sha, that is, through the URL given in git_url:

curl -u {username}:{personal access token} https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a

This gives another JSON response with the file contents in base64 encoding:

{
  "sha": "41553899f901843f5339794256s2444ed351708a",
  "node_id": "{node id}",
  "size": 815,
  "url": "https://api.github.com/repos/{organisation}/{repository}/git/blobs/41553899f901843f5339794256s2444ed351708a",
  "content": "{base64 encoded content}",
  "encoding": "base64"
}

Note that for smaller files, the base64 encoded content will already be included in the first call.
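Decoding the `content` field could look roughly like this in JavaScript. `decodeBlobContent` is a hypothetical name, and the sketch assumes Node.js's global `Buffer`. It returns raw bytes, since the file may be binary.

```javascript
// Hypothetical helper: turns a /contents or /git/blobs response into raw bytes.
function decodeBlobContent(blob) {
  if (blob.encoding !== "base64") {
    throw new Error(`Unexpected encoding: ${blob.encoding}`);
  }
  // The base64 text may contain line breaks; strip whitespace before decoding.
  const compact = blob.content.replace(/\s+/g, "");
  return Buffer.from(compact, "base64");
}

// Made-up example payload (not taken from a real repository).
const exampleBlob = { content: "SGVsbG8s\nIHdvcmxk\nIQ==\n", encoding: "base64" };
console.log(decodeBlobContent(exampleBlob).toString("utf8")); // prints "Hello, world!"
```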

Retrieving LFS files

Retrieving an LFS file requires a few extra steps. For LFS files, decoding the base64 string will not return the file's content, but information in the following format:

version https://git-lfs.github.com/spec/v1
oid sha256:{sha}
size {filesize}
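As an illustration, the sha and file size can be pulled out of this text with a few lines of JavaScript. The function name `parseLfsPointer` is made up for this sketch, and it only handles the three fields shown above.

```javascript
// Hypothetical helper: reads the "key value" lines of a Git LFS pointer.
// Returns { oid, size } or null if the text does not look like a pointer.
function parseLfsPointer(text) {
  const fields = new Map();
  for (const line of text.split("\n")) {
    if (line === "") continue; // skip the empty string after the final newline
    const space = line.indexOf(" ");
    if (space === -1) return null; // not a "key value" line
    fields.set(line.slice(0, space), line.slice(space + 1));
  }
  const oidMatch = /^sha256:([0-9a-f]{64})$/.exec(fields.get("oid") ?? "");
  const sizeOk = /^[0-9]+$/.test(fields.get("size") ?? "");
  if (!fields.has("version") || !oidMatch || !sizeOk) return null;
  return { oid: oidMatch[1], size: Number(fields.get("size")) };
}
```

The returned `oid` (without the `sha256:` prefix) and `size` are the two values needed for the next step.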

Using this information, create a JSON object as follows, filling in the sha and the file size from the previous step (the size is a JSON number, not a string):

{
    "operation": "download", 
    "transfer": ["basic"], 
    "objects": [
        {"oid": "{sha}", "size": {size}}
    ]
}

Pass this object as the data parameter of a curl request to the LFS API:

curl -X POST \
-H "Accept: application/vnd.git-lfs+json" \
-H "Content-type: application/json" \
-d '{"operation": "download", "transfer": ["basic"], "objects": [{"oid": "{sha}", "size": {size}}]}' \
https://github.com/{organisation}/{repository}.git/info/lfs/objects/batch
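A rough JavaScript equivalent of this request is sketched below. The helper `buildBatchRequest` and its parameter names are illustrative; the URL, headers, and body simply mirror the curl command.

```javascript
// Hypothetical helper: prepares the LFS batch request for a single object.
function buildBatchRequest({ organisation, repository, oid, size }) {
  return {
    url: `https://github.com/${organisation}/${repository}.git/info/lfs/objects/batch`,
    init: {
      method: "POST",
      headers: {
        Accept: "application/vnd.git-lfs+json",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        operation: "download",
        transfer: ["basic"],
        // `size` is sent as a JSON number, not a string.
        objects: [{ oid, size: Number(size) }],
      }),
    },
  };
}
```

Sending it with `fetch(url, init)` (Node.js 18 or later) should issue the same POST as the curl command.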

Almost there! This should return a JSON object that tells you where the file is stored:

{
  "objects": [
    {
      "oid": "{sha}",
      "size": {size},
      "actions": {
        "download": {
          "href": "https://github-cloud.s3.amazonaws.com/alambic/media/278163869/a2/42/{sha}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20210106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210106T104409Z&X-Amz-Expires=3600&X-Amz-Signature=XXX&X-Amz-SignedHeaders=host&actor_id=XXX&key_id=0&repo_id=XXX&token=1",
          "expires_at": "2021-01-06T11:44:09Z",
          "expires_in": 3600
        }
      }
    }
  ]
}

Download the file from the URL stated in the href attribute.
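When scripting this step, it may help to check the response before using the URL. The sketch below assumes the response shape shown above. The `error` and `header` fields come from the Git LFS batch API and do not appear in the example response, and `getDownloadHref` is a hypothetical name.

```javascript
// Hypothetical helper: finds the download URL for `oid` in a batch response.
function getDownloadHref(batchResponse, oid) {
  const entry = (batchResponse.objects ?? []).find((object) => object.oid === oid);
  if (!entry) throw new Error(`Object ${oid} missing from batch response`);
  // In the batch API, a failed object carries `error` instead of `actions`.
  if (entry.error) {
    throw new Error(`LFS error ${entry.error.code}: ${entry.error.message}`);
  }
  const download = entry.actions?.download;
  if (!download) throw new Error(`No download action for ${oid}`);
  // `header` is optional; when present, it lists headers to send with the download.
  return { href: download.href, headers: download.header ?? {} };
}
```

Note that the URL is only valid for a limited time (`expires_in` is 3600 seconds in the example), so it should be used soon after it is obtained.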


nikelborm commented Jan 10, 2025

@athletic-geek
I don't know how relevant this is to you now, but it may also be helpful to others, so I'm posting it here.

In short: there won't be such an API.

Git LFS is a poorly implemented abstraction on top of the Git we are used to. There is no difference between a real LFS object annotation inside a Git blob (file) and a fake one. No flags are attached to Git blobs that could help determine whether the content of a blob is a Git LFS annotation or not. The Git LFS client turns such blobs from annotations into real big files on the fly, based on .gitattributes files. The first problem is that .gitattributes can be placed anywhere: deep inside the directory structure, in the repo's root, or even outside the repo, e.g. inside the user's home folder. There's no pure, independent, and definitive way to safely determine whether a blob is a Git LFS object or not.

  1. You can delete .gitattributes files, commit the changes, clone from a specific branch into a temporary folder using the --branch flag added to the git clone command, and as a result see that all the files that previously were big LFS files are just small text files with Git LFS annotations. That's because they are and always have been.
  2. You can also add some rules to handle LFS annotation files into the .gitattributes file in your home folder, and they'll be applied globally.
  3. You can manually create a file with a valid Git LFS annotation text, commit it, and you shouldn't be surprised to see that GitHub's /contents API will say that the file's size is the size you had put into the manually created file. They couldn't have known if it was real and they won't.

So the only thing left is to check if the file is a valid Git LFS annotation or not and if it is, just assume it IS a file stored inside Git LFS. GitHub's API does it the same way. If for some reason it's extremely important to know for sure, you can also ask the LFS server and determine if it has an object with a specific sha256 hash.

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Duck test

My good-enough marker for a file being a valid Git LFS annotation is whether it matches the following RegExp:

const gitLFSInfoRegexp = /^version (?<version>https:\/\/git-lfs\.github\.com\/spec\/v1)\noid sha256:(?<oidSha256>[0-9a-f]{64})\nsize (?<size>[1-9][0-9]{0,11})\n$/m
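A small usage sketch of this duck test follows. The regular expression is repeated unchanged so that the snippet runs on its own; the helper name `matchLfsPointer` and the sample pointer text are made up for illustration.

```javascript
// Copied unchanged from above so that this snippet is self-contained.
const gitLFSInfoRegexp = /^version (?<version>https:\/\/git-lfs\.github\.com\/spec\/v1)\noid sha256:(?<oidSha256>[0-9a-f]{64})\nsize (?<size>[1-9][0-9]{0,11})\n$/m

// Hypothetical helper: returns the pointer fields if `text` passes the duck test, else null.
function matchLfsPointer(text) {
  const groups = gitLFSInfoRegexp.exec(text)?.groups;
  return groups ? { oid: groups.oidSha256, size: Number(groups.size) } : null;
}

// Made-up pointer text with a dummy sha256.
const pointer = `version https://git-lfs.github.com/spec/v1\noid sha256:${"a".repeat(64)}\nsize 815\n`;
console.log(matchLfsPointer(pointer) !== null);      // true
console.log(matchLfsPointer("# README\n") !== null); // false
```

One thing to be aware of: because of the `m` flag, `^` also matches at the start of any later line, so a text with extra lines before the pointer still passes. Dropping the flag would make the check stricter.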


nikelborm commented Jan 10, 2025

Also, you don't have to make two requests to get a base64-encoded string with a Git LFS annotation. You can activate the object media type by adding the Accept: application/vnd.github.v3.object header, and you'll get it right inside the /contents API response.

You'll get a response like this one:

{
    name: '100mb_file.txt',
    path: '100mb_file.txt',
    sha: '7557bc11dbc04337d33e6cd7e6b9bfa2d2d00e2b',
    size: 104857600,
    url: 'https://api.github.com/repos/fetch-gh-folder-tests/public-repo/contents/100mb_file.txt?ref=0362e8aec37c9146e1f946b27d98043a823357b7',
    html_url: 'https://github.com/fetch-gh-folder-tests/public-repo/blob/0362e8aec37c9146e1f946b27d98043a823357b7/100mb_file.txt',
    git_url: 'https://api.github.com/repos/fetch-gh-folder-tests/public-repo/git/blobs/7557bc11dbc04337d33e6cd7e6b9bfa2d2d00e2b',
    download_url: 'https://media.githubusercontent.com/media/fetch-gh-folder-tests/public-repo/0362e8aec37c9146e1f946b27d98043a823357b7/100mb_file.txt',
    type: 'file',
    content: 'dmVyc2lvbiBodHRwczovL2dpdC1sZnMuZ2l0aHViLmNvbS9zcGVjL3YxCm9p\n' +
      'ZCBzaGEyNTY6Y2VlNDFlOThkMGE2YWQ2NWNjMGVjNzdhMmJhNTBiZjI2ZDY0\n' +
      'ZGM5MDA3ZjdmMWM3ZDdkZjY4YjhiNzEyOTFhNgpzaXplIDEwNDg1NzYwMAo=\n',
    encoding: 'base64',
    _links: {
      self: 'https://api.github.com/repos/fetch-gh-folder-tests/public-repo/contents/100mb_file.txt?ref=0362e8aec37c9146e1f946b27d98043a823357b7',
      git: 'https://api.github.com/repos/fetch-gh-folder-tests/public-repo/git/blobs/7557bc11dbc04337d33e6cd7e6b9bfa2d2d00e2b',
      html: 'https://github.com/fetch-gh-folder-tests/public-repo/blob/0362e8aec37c9146e1f946b27d98043a823357b7/100mb_file.txt'
    }
}
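The sample response can be checked by hand. The sketch below copies `size`, `content`, and `encoding` from it, decodes the content with Node.js's global `Buffer`, and compares the size declared in the pointer with the `size` reported by the API. The variable names are illustrative.

```javascript
// `size`, `content` and `encoding` copied from the sample response above.
const sample = {
  size: 104857600,
  content:
    "dmVyc2lvbiBodHRwczovL2dpdC1sZnMuZ2l0aHViLmNvbS9zcGVjL3YxCm9p\n" +
    "ZCBzaGEyNTY6Y2VlNDFlOThkMGE2YWQ2NWNjMGVjNzdhMmJhNTBiZjI2ZDY0\n" +
    "ZGM5MDA3ZjdmMWM3ZDdkZjY4YjhiNzEyOTFhNgpzaXplIDEwNDg1NzYwMAo=\n",
  encoding: "base64",
};

// Strip the line breaks, then decode the base64 text into the pointer file.
const pointerText = Buffer.from(sample.content.replace(/\s+/g, ""), "base64").toString("utf8");
console.log(pointerText);

// The pointer is tiny, yet the API reports the size declared inside it.
const declaredSize = Number(/^size (\d+)$/m.exec(pointerText)[1]);
console.log(declaredSize === sample.size); // true
```

In this sample, the `size` field therefore describes the large file stored in LFS, while `content` only holds the short annotation.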
