
@Ziggoto
Created July 15, 2024 22:55

Backends:

Local

  • Transformers
  • Ollama
  • llama.cpp
  • ExLlamaV2
  • AutoGPTQ
  • AutoAWQ
  • TensorRT-LLM

Benchmark comparison of LLM inference backends (BentoML blog): https://www.bentoml.com/blog/benchmarking-llm-inference-backends

Cloud

Frontends:

  • oobabooga (text-generation-webui)
  • Stable Diffusion web UI
  • SillyTavern
  • LM Studio
  • Axolotl
  • GPT4All
  • Open WebUI
    • I've used this one
  • Enchanted
    • Mac native

Frameworks/Libs

High-level

  • LangChain (TS & Python)
  • LlamaIndex (TS & Python)
  • ModelFusion (TS)
  • Haystack (Python)
    • Used by AWS, Nvidia, IBM, Intel
  • CrewAI (Python)
  • Transformers (Python)
    • Made by Hugging Face

Low-level

  • PyTorch
  • TensorFlow
  • JAX

Miscellaneous

Benchmarks

YouTube channels about AI:

About models

Models are usually distributed in one of these formats:

  • GGUF
    • The successor to GGML
    • Technical documentation about GGUF (from Hugging Face)
  • GGML
  • Safetensors
  • Exl2
  • AWQ
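A GGUF model is a single binary file that begins with a small fixed-size header. The sketch below reads that header in Python, assuming a little-endian file of GGUF version 2 or later (older version 1 files used 32-bit counts and are not handled). `read_gguf_header` and `HEADER_SIZE` are hypothetical names for illustration, not part of any library.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + uint64 tensor count + uint64 KV count


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header; assumes little-endian, GGUF version 2 or later."""
    if len(data) < HEADER_SIZE:
        raise ValueError("data too short for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic bytes {data[:4]!r})")
    # '<' = little-endian with no padding: uint32 version, uint64 tensor count,
    # uint64 number of metadata key/value pairs.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


# Synthetic header for demonstration. For a real file you would pass
# open("model.gguf", "rb").read(HEADER_SIZE)  ("model.gguf" is a hypothetical path).
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(sample))  # {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

The metadata key/value pairs (architecture, tokenizer, etc.) and tensor descriptions follow this header; parsing them is left out to keep the sketch short.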

These files contain the model's weights (its learned parameters); GGUF files also bundle metadata such as the tokenizer.
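Because the weights dominate a model file, its size can be roughly estimated as parameters × bits per weight ÷ 8. A minimal sketch; it ignores metadata and any tensors stored at a different precision, `weights_size_gb` is a hypothetical helper, and the 7-billion-parameter model is only an illustrative example.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the stored weights in gigabytes (10**9 bytes).

    Ignores metadata, tokenizer data, and mixed-precision tensors.
    """
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


# Hypothetical 7B-parameter model stored as 16-bit floats (e.g. in Safetensors).
print(weights_size_gb(7e9, 16))  # 14.0
```

Plugging in a lower bits-per-weight value shows why quantized formats are so much smaller than the original 16-bit checkpoints.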

1 token ≈ 0.75 words (a rough rule of thumb for English text; the exact ratio depends on the tokenizer)
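The ratio above gives a quick way to check whether a text will fit in a model's context window. A sketch assuming the 0.75 words-per-token average; the helper names are hypothetical, and when precision matters the model's own tokenizer should be used instead.

```python
WORDS_PER_TOKEN = 0.75  # rough English-text average; varies by tokenizer and language


def estimate_tokens(word_count: int) -> int:
    """Approximate number of tokens for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Approximate number of words that fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_tokens(750))   # 1000
print(estimate_words(4096))   # 3072 words in a 4096-token context
```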

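Quantization types (llama.cpp / GGUF)

The number is the bit width of each quantized value; the _0 types store one scale per block of weights, the _1 types store a scale and a minimum.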

  • Q4_0
  • Q4_1
  • Q5_0
  • Q5_1
  • Q8_0
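These legacy types are block quantizations: weights are split into fixed-size blocks (32 in llama.cpp), and each block stores small integers plus a scale. Below is a simplified Q4_0-style sketch in pure Python, loosely following llama.cpp's reference quantizer. It is not bit-exact with the real format (which stores the scale as fp16 and packs two 4-bit codes per byte), and the function names are hypothetical.

```python
BLOCK_SIZE = 32  # weights per block, as in llama.cpp's Q4_0


def quantize_block_q4_0(block: list[float]) -> tuple[float, list[int]]:
    """Quantize one block into a float scale and 4-bit codes in [0, 15]."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} weights, got {len(block)}")
    # Signed value with the largest magnitude; it maps exactly onto code 0.
    max_val = max(block, key=abs)
    scale = max_val / -8
    inv_scale = 1.0 / scale if scale != 0 else 0.0
    # Shift into [0.5, 16.5], truncate to an integer, clamp to the 4-bit range.
    codes = [min(15, int(x * inv_scale + 8.5)) for x in block]
    return scale, codes


def dequantize_block_q4_0(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate weights: w ≈ (q - 8) * scale."""
    return [(q - 8) * scale for q in codes]


weights = [float(i - 16) for i in range(BLOCK_SIZE)]  # toy block: -16.0 .. 15.0
scale, codes = quantize_block_q4_0(weights)
restored = dequantize_block_q4_0(scale, codes)
print(scale, codes[:4])  # 2.0 [0, 1, 1, 2]
```

In the real Q4_0 layout each 32-weight block holds a 16-bit scale plus 32 × 4 bits, 144 bits in total, which works out to about 4.5 bits per weight. The _1 variants additionally store a per-block minimum so that asymmetric ranges are represented better.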

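K-quants (llama.cpp's newer super-block quantizations; the K does not stand for k-means). The S/M/L suffix means small/medium/large: larger variants keep more tensors at higher precision.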

  • Q3_K_S
  • Q3_K_M
  • Q3_K_L
  • Q4_K_S
  • Q4_K_M
  • Q5_K_S
  • Q5_K_M
  • Q6_K