@davehague
Created April 7, 2025 13:59
Use Llama models in Ollama and LibreChat

To download Llama models and use them in LibreChat on your Mac via Ollama, follow these steps:


Step 1: Download the Llama Models

1. Request Access

  • Visit the Meta Llama Downloads page.
  • Complete the access request form and accept Meta's Community License Agreement.
  • Once approved, you’ll receive a signed URL for downloading the model weights.

2. Install the Llama CLI

  • Open your terminal and install the Llama CLI tool:
    pip install llama-stack
  • Verify installation by running:
    llama --help

3. List Available Models

  • Use the CLI to list available models:
    llama model list
  • Identify the model ID for the version you want (e.g., Llama 3.1-8B or Llama 4).

4. Download the Model

  • Run the following command to download your chosen model:
    llama download --source meta --model-id CHOSEN_MODEL_ID
  • When prompted, provide the signed URL you received via email.
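
For example, to download the 8B instruct variant (the exact ID string comes from llama model list; the one shown here is illustrative):

    llama download --source meta --model-id Llama3.1-8B-Instruct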

Step 2: Prepare for LibreChat

LibreChat supports running local models like Llama as long as they are served by a compatible backend. You can integrate them using Ollama, or through another local serving layer (for example, one built on Hugging Face Transformers) that exposes an OpenAI-compatible API.

Use Ollama

  1. Install Ollama:

    • Download and install Ollama from its official website.
    • Verify installation by running:
      ollama --version
  2. Import the Model into Ollama:

    • Move your downloaded model files into a directory accessible by Ollama. Ollama expects GGUF weights, so Meta's original checkpoints may first need converting (for example, with llama.cpp's conversion scripts).
    • Ollama has no import subcommand; instead, reference the weights from a Modelfile and build the model with ollama create (see the sketch after this list):
      ollama create my-llama -f Modelfile
  3. Run LibreChat with Ollama Backend:

    • Configure LibreChat to use Ollama by adding it as a custom (OpenAI-compatible) endpoint in its librechat.yaml file. Follow LibreChat’s documentation for the full schema; a minimal sketch of that configuration follows below.
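
A minimal sketch of the import from step 2, assuming the weights are already converted to GGUF at /path/to/your/model.gguf and that you want to call the model my-llama (both names are placeholders):

    # Write a one-line Modelfile pointing at the local GGUF weights
    echo 'FROM /path/to/your/model.gguf' > Modelfile

    # Build the model under a local name, then give it a quick test
    ollama create my-llama -f Modelfile
    ollama run my-llama "Say hello"

If you would rather skip the manual download and conversion entirely, ollama pull llama3.1:8b fetches a prebuilt, pre-quantized Llama build straight from the Ollama library.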

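For step 3, one common shape of that configuration, based on LibreChat's custom-endpoint documentation (check the current docs for the exact schema, and set the model name to whatever you created in Ollama):

    # librechat.yaml: register Ollama as an OpenAI-compatible custom endpoint
    endpoints:
      custom:
        - name: "Ollama"
          apiKey: "ollama"
          baseURL: "http://localhost:11434/v1/"
          models:
            default: ["my-llama"]
            fetch: true

Make sure ollama serve is running before you start LibreChat. If LibreChat runs in Docker, replace localhost with host.docker.internal so the container can reach Ollama on the host.
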
Step 3: Optimize for Mac Usage

If you have limited hardware resources, consider quantization to reduce memory usage:

  • Use FP8 or Int4 quantization modes during inference:
    torchrun --nproc_per_node=1 \
      -m models.llama4.scripts.chat_completion \
      --quantization-mode int4_mixed \
      --checkpoint-dir /path/to/checkpoints

This reduces GPU memory requirements while maintaining reasonable performance.
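
If you are serving the model through Ollama rather than Meta's reference scripts, quantization is largely handled for you: the Ollama library publishes pre-quantized GGUF builds, and you can pick a specific level by tag (the tag below is an example; the library page lists what is actually available):

    # Pull a 4-bit quantized build of Llama 3.1 8B Instruct from the Ollama library
    ollama pull llama3.1:8b-instruct-q4_K_M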

