Notes on Tabby: Llama.cpp, Model Caching, and Access Tokens

Tabby is a self-hosted AI coding assistant that runs and manages local AI models, and it includes a few practical configuration and account details that are useful to keep in mind.

Tabby uses llama.cpp internally

One notable point is that Tabby uses llama.cpp under the hood for local inference. In practice, this means Tabby benefits from the lightweight, efficient local inference that llama.cpp is known for, and the model files it downloads are typically stored in llama.cpp's GGUF format.
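
One easy way to see this connection on disk is to look at the cached model files, which are GGUF artifacts. Below is a minimal Python sketch; the default cache location of ~/.tabby/models is an assumption here, so adjust it to wherever your installation actually keeps its models.

```python
import os
from pathlib import Path

# Root of Tabby's model cache. TABBY_MODEL_CACHE_ROOT (covered in the next
# section) overrides the default; the "~/.tabby/models" fallback used here is
# an assumption -- adjust it to your setup.
cache_root = Path(
    os.environ.get("TABBY_MODEL_CACHE_ROOT", str(Path.home() / ".tabby" / "models"))
)

# GGUF is the on-disk model format used by llama.cpp.
for gguf in sorted(cache_root.rglob("*.gguf")):
    size_gb = gguf.stat().st_size / 1e9
    print(f"{gguf.relative_to(cache_root)}  ({size_gb:.2f} GB)")
```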

Model cache location: TABBY_MODEL_CACHE_ROOT

Tabby supports configuring where it stores cached model files. The environment variable TABBY_MODEL_CACHE_ROOT controls the root directory of Tabby's model cache. If you want to manage disk usage, place models on a faster drive, or standardize storage paths across multiple environments, setting this variable is a straightforward way to do it.
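
For example, here is a small Python sketch that points the cache at a dedicated data drive before starting the server. The /data/tabby-models path is just an example, and the exact tabby serve flags vary between Tabby versions (check tabby serve --help), so treat the invocation as illustrative rather than definitive.

```python
import os
import subprocess

# Redirect Tabby's model cache to a dedicated drive for this server process.
env = dict(os.environ, TABBY_MODEL_CACHE_ROOT="/data/tabby-models")

# Illustrative invocation -- the `--model`/`--device` flags are assumptions
# about your Tabby version; confirm them with `tabby serve --help`.
subprocess.run(
    ["tabby", "serve", "--model", "StarCoder-1B", "--device", "cpu"],
    env=env,
    check=True,
)
```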

Registry reference: registry-tabby

Tabby's model registry can be found in the registry-tabby repository under the TabbyML organization on GitHub: https://github.com/TabbyML/registry-tabby

This repository is a useful reference point for understanding which models are available and how Tabby's registry entries are organized.
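
If you want to explore the registry programmatically rather than in the browser, a small sketch using GitHub's public contents API works as a starting point. It only assumes the repository lives at TabbyML/registry-tabby; the file layout it prints depends on the current state of the repo.

```python
import json
import urllib.request

# List the top-level entries of the registry repository via GitHub's API,
# as a starting point for seeing how registry files are organized.
url = "https://api.github.com/repos/TabbyML/registry-tabby/contents/"
with urllib.request.urlopen(url) as resp:
    entries = json.load(resp)

for entry in entries:
    print(f"{entry['type']:<4}  {entry['name']}")
```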

How to check your token

To view the token you need for authenticated access, you typically:

  1. Open the Tabby server's web interface in a browser.
  2. Create an account and log in.
  3. After logging in, locate and confirm your token in the account or user settings area.

In other words, the token is something you can verify after a one-time, browser-based account setup and login.
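
Once you have the token, it is used as a bearer credential for requests to the Tabby server. The following is a minimal sketch, assuming the server listens on localhost:8080 and exposes a /v1/health endpoint (both assumptions; adjust the URL and port to your deployment and check your Tabby version's API docs).

```python
import urllib.request

# Token copied from the Tabby web UI after account setup and login.
TOKEN = "YOUR_TOKEN_HERE"  # placeholder

# Assumed defaults: localhost:8080 and a /v1/health endpoint accepting a
# Bearer token -- adjust both for your deployment.
request = urllib.request.Request(
    "http://localhost:8080/v1/health",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode())
```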
