Matthias Kreier (kreier)
llama.cpp with CUDA support on the Nvidia Jetson Nano
It is possible to compile and run a recent llama.cpp with gcc 8.5 and nvcc 10.2 (the latest CUDA compiler Nvidia supports for the 2019 Jetson Nano), with inference on the GPU enabled.
Setup Guide for llama.cpp on Nvidia Jetson Nano 4GB
This is a full account of the steps I ran to get llama.cpp running on the Nvidia Jetson Nano 2GB. It combines several fixes and tutorials, whose contributions are referenced at the bottom of this README.
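As a rough orientation before the detailed steps, the overall shape of the build looks like this. This is a hedged sketch: the exported paths, the gcc-8 pinning, and the `LLAMA_CUBLAS=1` make flag are assumptions based on a typical JetPack 4 install and the spring-2024 Makefile-based build of llama.cpp, not the exact commands from this gist.

```shell
# Assumed JetPack 4 layout: CUDA 10.2 lives under /usr/local/cuda-10.2,
# and gcc-8/g++-8 are installed (nvcc 10.2 rejects newer host compilers).
export PATH=/usr/local/cuda-10.2/bin:$PATH
export CC=gcc-8
export CXX=g++-8

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Makefile-era build with cuBLAS/CUDA enabled; the three Makefile changes
# this gist describes are applied before this step.
make LLAMA_CUBLAS=1 -j4
```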
Remark 2025-01-21: This gist is from April 2024. The current version of llama.cpp should compile on the Jetson Nano out of the box. Alternatively you can run ollama directly on the Jetson Nano; it just works. But inference is done only on the CPU; the GPU is not utilized, and probably never will be. See ollama issue 4140 regarding JetPack 4, CUDA 10.2 and gcc-11.
Note 2025-04-07: This gist no longer delivers GPU acceleration. The three changes to the Makefile let it compile in just 7 minutes, and the created main and llama-bench binaries do work, just not on the GPU. As soon as the parameter --n-gpu-layers 1 is passed, the system crashes with GGML_ASSERT: ggml-cuda.cu:255: !"CUDA error".
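For illustration, the crash is triggered by any invocation that offloads layers to the GPU. The model path below is a placeholder, not a file this gist ships:

```shell
# CPU-only inference works with the compiled binary:
./main -m models/model.gguf -p "Hello"

# Offloading even a single layer to the GPU triggers the crash described above
# (GGML_ASSERT: ggml-cuda.cu:255: !"CUDA error"):
./main -m models/model.gguf -p "Hello" --n-gpu-layers 1
```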