Skip to content

Instantly share code, notes, and snippets.

@razhangwei
Last active March 31, 2025 02:49
Show Gist options
  • Save razhangwei/4bf00fd5eb3c901ac205f5c78725e61c to your computer and use it in GitHub Desktop.
Save razhangwei/4bf00fd5eb3c901ac205f5c78725e61c to your computer and use it in GitHub Desktop.
Transformer models published on Huggingface

How can I find the model implementation for a particular model?

  • Usually it's avaiable from transformers/models/xxxx, e.g., for qwen2_5_vl

Typical Configuration Files for OSS LLMs on Hugging Face

Looking at the Qwen2.5-VL-32B-Instruct example, we can see these standard configuration files:

Core Configuration Files

  1. config.json - The primary model configuration file that defines the model architecture, including parameters like hidden size, number of layers, attention heads, and other architectural details. For Qwen2.5 models, it also contains settings for rotary position embeddings and context length.

  2. tokenizer_config.json - Contains settings for the tokenizer, including vocabulary size, special tokens, and tokenization parameters.

  3. generation_config.json - Defines default parameters for text generation such as temperature, top_p, top_k, max length, and repetition penalties.

Tokenizer-related Files

  1. added_tokens.json - Lists additional tokens beyond the base vocabulary.

  2. vocab.json - Contains the mapping of tokens to token IDs.

  3. merges.txt - Used for byte-pair encoding (BPE) tokenization, containing merge rules.

  4. special_tokens_map.json - Maps special token types (like pad, eos, bos) to their actual token values.

Model Template Files

  1. chat_template.json - Defines the format for chat conversations, including how to structure multi-turn dialogues.

Model Weight Files

  1. model-00001-of-00018.safetensors - The actual model weights split into chunks (safetensors format is preferred over older PyTorch .bin files for security and performance).

  2. model.safetensors.index.json - Index file that maps parameter names to their locations in the chunked model files.

Additional Configuration

  1. preprocessor_config.json - For multimodal models like Qwen2.5-VL, this contains settings for processing visual inputs.

  2. configuration.json - Sometimes contains additional configuration metadata.

These files work together to define both the model architecture and how to interact with it. The specific configurations for multimodal models like Qwen2.5-VL include additional settings for handling images and videos, such as resolution parameters and vision encoder specifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment