
macOS Internals

Understand your Mac and iPhone more deeply by tracing the evolution of Mac OS X from prerelease to Swift. John Siracusa delivers the details.

Starting Points

How to use this gist

You've got two main options:

How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

It is now possible to run LLaMA 13B on a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change adds CUDA/cuBLAS support, which lets you pick an arbitrary number of the transformer layers to run on the GPU. This is perfect for low VRAM.
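
For a sense of what this looks like in practice, here is a minimal sketch of running inference with partial offload, assuming a cuBLAS-enabled build of llama.cpp from around this time; the `-ngl`/`--n-gpu-layers` flag sets how many layers live on the GPU, and the model path is just an example:

```
# Offload 18 of the 13B model's 40 transformer layers to the GPU;
# raise or lower --n-gpu-layers until the model fits in your VRAM.
./main -m ./models/13B/ggml-model-q4_0.bin \
  --n-gpu-layers 18 \
  -p "Building a website can be done in 10 simple steps:"
```

The remaining layers run on the CPU, so generation is slower than a full-GPU setup but fits in 6GB.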

  • Get llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d. (A build sketch follows this list.)
  • Use the link at the bottom of the page to apply for research access to the LLaMA model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
  • Set up a micromamba environment with CUDA, Python, and PyTorch so you can run the conversion scripts (an environment sketch also follows this list). Install some packages:
    • micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece
  • Perform the conversion process: (This will produce a file called `ggml-model
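
As promised above, a build sketch for the first step; this assumes the usual llama.cpp workflow of the time, where the `LLAMA_CUBLAS` make flag enables the CUDA/cuBLAS path (treat the exact flag as an assumption if your checkout differs):

```
# Fetch llama.cpp and pin the exact commit mentioned in the list above.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 08737ef720f0510c7ec2aa84d7f70c691073c35d

# Build with cuBLAS support so transformer layers can be offloaded to the GPU.
make LLAMA_CUBLAS=1
```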
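
And a sketch of the environment setup; the environment name `mymamba` comes from the install command above, while the channel and Python version here are assumptions:

```
# Create the environment first, then install the conversion dependencies into it.
micromamba create -n mymamba -c conda-forge python=3.10
micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece

# Work inside the environment when running the conversion scripts
# (requires micromamba's shell hook to be initialized).
micromamba activate mymamba
```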