Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

# Clone llama.cpp | |
git clone https://github.com/ggerganov/llama.cpp.git | |
cd llama.cpp | |
# Build it | |
make clean | |
LLAMA_METAL=1 make | |
# Download model | |
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin |
Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962
In the meantime, use the largest that fully fits in your GPU. If you can comfortably fit Q4_K_S, try using a model with more parameters.
See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix
You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis. | |
## Core Principles | |
1. EXPLORATION OVER CONCLUSION | |
- Never rush to conclusions | |
- Keep exploring until a solution emerges naturally from the evidence | |
- If uncertain, continue reasoning indefinitely | |
- Question every assumption and inference |