This is a full account of the steps I ran to get llama.cpp running on the Nvidia Jetson Nano 2GB. It combines fixes and tutorials from multiple sources, whose contributions are referenced at the bottom of this README.
Remark 2025-01-21: This gist is from April 2024. The current version of llama.cpp should compile on the Jetson Nano out of the box. Alternatively, you can run ollama directly on the Jetson Nano; it just works. But inference runs only on the CPU; the GPU is not utilized, and probably never will be. See ollama issue 4140 regarding JetPack 4, CUDA 10.2, and gcc-11.
Note 2025-04-07: This gist does not work. The three changes to the Makefile let it compile in just 7 minutes, and the created `main` and `llama-bench` binaries do work, just not with GPU acceleration. As soon as the parameter `--n-gpu-layers 1` is passed, the system crashes with `GGML_ASSERT: ggml-cuda.cu:255: !"CUDA error"`.
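For reference, the crash can be reproduced with an invocation along these lines; the model path and prompt here are placeholders, and only the `--n-gpu-layers 1` flag is the relevant part:

```sh
# Hypothetical invocation: model path and prompt are placeholders.
# Offloading even a single layer to the GPU triggers the CUDA assert.
./main -m ./models/model.gguf -p "Hello" --n-gpu-layers 1
```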
There