@natanlao
Last active April 11, 2025 00:33
Translating GitHub resource IDs to global node IDs

GitHub associates a unique resource ID (or "database ID" or just "ID") with each API-accessible resource. For example, each issue, repository, and user has its own resource ID. In my limited experience, GitHub's REST API generally does not expose endpoints for querying resources by ID (though it does have some undocumented ones). These resource IDs have been superseded by distinct global node IDs (node_id). GitHub's GraphQL API allows retrieval of a node by its global node ID, called a "direct node lookup".

You likely don't have much reason to interact with the older identifiers directly. I ran into them when using the ZenHub API, which interfaces with repositories only by resource ID. In my case, I wanted to retrieve a list of recently closed issues from a set of repositories identified only by their resource ID (and without their owner or name).

This would be possible using the undocumented repository information REST endpoint I mentioned above, but I wanted to use the GraphQL API because of the number of repositories I was interested in. GitHub's GraphQL API does not expose a means of querying repositories (or any object, to my understanding) by these resource IDs, so we need to convert them manually.

It turns out that the global node IDs are base64 encodings of a human-readable format. Take, for example, DataBiosphere/azul:

query {
  azul: repository(owner:"DataBiosphere", name:"azul") {
    id
    databaseId
  }
}
{
  "data": {
    "azul": {
      "id": "MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc=",
      "databaseId": 139095537
    }
  }
}

If we decode the global node ID, we can see the pattern:

$ echo "MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc=" | base64 --decode
010:Repository139095537

which we can infer is roughly equivalent to {type_id}:{type_name}{resource_id}. I'm pretty new to GraphQL, so I may have butchered some of the terminology.
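The same decoding can be sketched in Python. This is only an illustration: the function name `decode_legacy_node_id` and the regular expression are my own assumptions, inferred from the single example above, and other resource types may not follow exactly the same pattern.

```python
import base64
import re


def decode_legacy_node_id(node_id: str) -> tuple[str, str, int]:
    """Split a legacy node ID into (type_id, type_name, resource_id).

    Assumes the decoded form is {type_id}:{type_name}{resource_id},
    as inferred from the Repository example; not an official format.
    """
    decoded = base64.b64decode(node_id).decode("utf-8").strip()
    match = re.fullmatch(r"(\d+):([A-Za-z]+)(\d+)", decoded)
    if match is None:
        raise ValueError(f"unrecognized node ID format: {decoded!r}")
    type_id, type_name, resource_id = match.groups()
    return type_id, type_name, int(resource_id)


print(decode_legacy_node_id("MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc="))
# ('010', 'Repository', 139095537)
```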

In any case, we can reverse this operation to calculate the global node ID of a resource identified only by its resource ID:

$ echo -n "010:Repository139095537" | base64
MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc=
query {
  node(id:"MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc=") {
    ... on Repository {
      nameWithOwner
    }
  }
}
{
  "data": {
    "node": {
      "nameWithOwner": "DataBiosphere/azul"
    }
  }
}
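For the many-repositories case that motivated this, one option is to generate a single query with one aliased node lookup per resource ID. The sketch below only builds the query text and does not call the API. The helper names (`to_legacy_node_id`, `build_repository_query`) and the `repo_<id>` alias scheme are hypothetical, and the `010` type ID is taken from the Repository example above.

```python
import base64


def to_legacy_node_id(type_id: str, type_name: str, resource_id: int) -> str:
    """Encode {type_id}:{type_name}{resource_id} as a legacy node ID."""
    raw = f"{type_id}:{type_name}{resource_id}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def build_repository_query(resource_ids: list[int]) -> str:
    """Build one GraphQL query with an aliased node lookup per repository."""
    lookups = []
    for resource_id in resource_ids:
        node_id = to_legacy_node_id("010", "Repository", resource_id)
        lookups.append(
            f'  repo_{resource_id}: node(id: "{node_id}") {{\n'
            "    ... on Repository {\n"
            "      nameWithOwner\n"
            "    }\n"
            "  }"
        )
    return "query {\n" + "\n".join(lookups) + "\n}"


print(build_repository_query([139095537]))
```

Each alias carries the resource ID, so the response can be mapped back to the ID that produced it.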

Interestingly, the lookup also works when the encoded string includes a trailing newline (i.e., calling echo in the above example without the -n flag).
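To see what the newline changes, here is a small Python comparison of the two encodings. It only illustrates the base64 output and makes no claim about how GitHub parses the ID on its side.

```python
import base64

# Same payload, with and without the trailing newline that plain `echo` appends.
without_newline = base64.b64encode(b"010:Repository139095537").decode("ascii")
with_newline = base64.b64encode(b"010:Repository139095537\n").decode("ascii")

print(without_newline)  # MDEwOlJlcG9zaXRvcnkxMzkwOTU1Mzc=
print(with_newline)     # MDEwOlJlcG9zaXRvcnkxMzkwOTU1MzcK
```

The two strings differ only in the final character: the padding `=` becomes `K` once the newline byte fills out the last 3-byte group.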

@JMalland

@grugnog Is there any way this can be used to encode from a legacy REST ID (i.e. repository ID 966614) into the new node ID format?

Figured it out. sigh -- It was literally in front of my face. encode_legacy_id('K', 0, 966614)

Glad to have a solution though. Thanks everyone for the insight on this!

import base64

import msgpack  # third-party: pip install msgpack


def encode_legacy_id(prefix, type_id, legacy_id):
    # Pack [type_id, legacy_id] as a two-element MessagePack array
    packed_data = msgpack.packb([type_id, legacy_id])
    base64_encoded = base64.b64encode(packed_data).decode('utf-8')
    # Remove any padding '=' characters
    base64_encoded = base64_encoded.rstrip('=')
    return f"{prefix}_{base64_encoded}"
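To make the packed payload less opaque, here is a stdlib-only sketch that hand-builds the same bytes for the example ID above. It is not part of the original comment: `pack_type_and_id` is a hypothetical helper, and it assumes MessagePack's standard encoding (a two-element fixarray, a positive fixint for the type, and a big-endian uint32 for IDs between 65536 and 4294967295).

```python
import base64
import struct


def pack_type_and_id(type_id: int, legacy_id: int) -> bytes:
    """Hand-pack [type_id, legacy_id] as MessagePack would for these ranges."""
    if not 0 <= type_id <= 0x7F:
        raise ValueError("this sketch only covers types that fit a positive fixint")
    if not 0x10000 <= legacy_id <= 0xFFFFFFFF:
        raise ValueError("this sketch only covers IDs that MessagePack stores as uint32")
    # 0x92 = array of 2 elements, then the type as a fixint,
    # then 0xCE = uint32 marker followed by the 4-byte big-endian ID.
    return bytes([0x92, type_id, 0xCE]) + struct.pack(">I", legacy_id)


packed = pack_type_and_id(0, 966614)
print(packed.hex())  # 9200ce000ebfd6
print(base64.b64encode(packed).decode("ascii").rstrip("="))  # kgDOAA6/1g
```

The fixed `92 00 ce` header is why the encoded part starts with `kgDO` for any ID in this range when the type is 0.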
