@CraftsMan-Labs
Created July 16, 2024 19:51
Codestral Mamba by Mistral AI

This guide walks through installing and running the Mamba-Codestral model:

Installation

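First, install the required packages. The leading ! is for notebook cells; drop it in a regular shell. The version specifier is quoted so the shell does not treat > as an output redirect:

!pip install "mistral_inference>=1" mamba-ssm causal-conv1d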

If you encounter issues with causal-conv1d, try:

!CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install causal-conv1d

Authentication

Log in to Hugging Face with your access token (avoid committing a real token to a shared notebook):

from huggingface_hub import login

hf_token = "your_access_token_here"
login(hf_token)
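
As an alternative to pasting the token into the notebook, it can be read from the environment. This is a minimal sketch: the helper names get_hf_token and login_from_env are hypothetical, and the HF_TOKEN variable name is an assumption about how you store the token.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to Hugging Face using a token taken from the environment."""
    # Imported here so get_hf_token stays usable without huggingface_hub installed.
    from huggingface_hub import login

    login(token=get_hf_token(env_var))
```

Calling login_from_env() then replaces the two lines above, and the token never appears in the saved notebook.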

Download Model

Download the model weights, parameters file, and tokenizer into a local directory:

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'mamba-codestral-7B-v0.1')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/mamba-codestral-7B-v0.1", 
    allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], 
    local_dir=mistral_models_path
)
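
Before moving on, it can help to confirm that the three files requested in allow_patterns actually landed in the target directory. The following is a small sketch using only the standard library; missing_model_files is a hypothetical helper, not part of huggingface_hub.

```python
from pathlib import Path

# The same file names passed to allow_patterns in the download step.
EXPECTED_FILES = ("params.json", "consolidated.safetensors", "tokenizer.model.v3")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in model_dir."""
    model_dir = Path(model_dir)
    missing = []
    for name in expected:
        path = model_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, missing_model_files(mistral_models_path) returns an empty list when the download completed; a non-empty list names the files to fetch again.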

Inference

To run inference, start an interactive chat session from your terminal:

mistral-chat $HOME/mistral_models/mamba-codestral-7B-v0.1 --instruct --max_tokens 512
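
If you prefer to assemble the command from Python, for instance to vary max_tokens between runs, a sketch like the one below may help. build_chat_command is a hypothetical helper; it only uses the flags shown in the command above and does not launch anything itself.

```python
import shlex
from pathlib import Path


def build_chat_command(model_dir, max_tokens=512, instruct=True):
    """Build the argument list for the mistral-chat call shown above."""
    cmd = ["mistral-chat", str(model_dir)]
    if instruct:
        cmd.append("--instruct")
    cmd += ["--max_tokens", str(max_tokens)]
    return cmd


model_dir = Path.home() / "mistral_models" / "mamba-codestral-7B-v0.1"
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(build_chat_command(model_dir)))
```

Because mistral-chat is interactive, the resulting list is best passed to subprocess.run from a session attached to a terminal rather than from a notebook cell.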

About Mamba-Codestral

Mamba-Codestral (released by Mistral AI as Codestral Mamba) is a 7B parameter code model built on the Mamba2 state space architecture instead of a Transformer. It is not a Mixture of Experts (MoE) model. Key features include:

  • Efficient inference, with time that scales linearly with sequence length
  • Strong performance on code-related tasks
  • Competitive with larger models on general language tasks

The model is reported to perform well on code benchmarks:

  • Outperforms Llama 2 70B on HumanEval
  • Matches or exceeds CodeLlama 34B on many metrics

Mamba-Codestral demonstrates that a non-Transformer architecture can yield a powerful and efficient language model, potentially paving the way for more diverse and specialized AI systems.

Usage Tips

  1. The model excels at code-related tasks, so it's particularly useful for programming assistance.
  2. While it's strong in coding, it also performs well on general language tasks, making it versatile.
  3. Experiment with different prompts and instructions to get the best results for your specific use case.
  4. Remember to respect the model's limitations and verify any generated code or information.
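
As a first step toward verifying generated code (tip 4), a syntax check can catch truncated or malformed Python output before you run it. This is only a sketch: python_syntax_error is a hypothetical helper, it covers Python only, and passing it says nothing about whether the code is correct or safe.

```python
import ast


def python_syntax_error(source: str):
    """Return None if source parses as Python, else a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None
```

A None result means the snippet at least parses; tests and a manual review are still needed before trusting it.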

By building on the Mamba state space architecture, Mamba-Codestral offers an efficient and capable option for natural language processing tasks, especially those related to code generation and understanding.

To learn more, check out: https://open.substack.com/pub/craftsmanlabs/p/codestral-mamba-the-new-game-changer?r=34r3tw&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
