A comprehensive log of attempting to self-host Llama 3.2 on the Qualcomm QCS6490 NPU.
The goal: build a highly efficient, headless Edge AI server using the Radxa Dragon Q6A.
- Hardware: Radxa Dragon Q6A (Qualcomm QCS6490, 12 TOPS Hexagon NPU).
- Software Objective: Self-host open-weight LLMs (specifically Llama 3.2 1B/3B) using the Hexagon NPU (DSP) to offload the CPU.
- Constraint: Use a lightweight Linux OS with containerized services (Docker) for isolation.
The Issue: I initially sought the lightest OS (Armbian Minimal).
The Reality: The Qualcomm QCS6490 relies heavily on proprietary drivers (QAIRT, Hexagon DSP firmware, FastRPC) that are difficult to set up on vanilla distributions.
The Solution:
- Winner: Radxa OS (Ubuntu Server).
- Reason: It includes the necessary kernel drivers, firmware blobs (/lib/firmware/qcom), and device tree configurations out-of-the-box.
The Error: When running the Qualcomm AI Runtime (genie-t2t-run), I hit:
[ERROR] "Failed to create device: 14001"
[ERROR] "Device Creation failure"
dmesg: "no reserved DMA memory for FASTRPC"
The Diagnosis:
- Missing Userspace Drivers: The kernel had the drivers (/dev/fastrpc* existed), but the host OS was missing the userspace libraries (libcdsprpc1) and the daemon (cdsprpcd) needed to handshake with the NPU.
- Kernel Mismatch: The system was running a bleeding-edge kernel (6.18.x-qcom from noble-test) which lacked the correct DMA memory reservations for the NPU. The stable NPU support is currently on the 6.8.x vendor kernel.
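Both diagnoses above can be checked from a shell before touching any packages. The following is a minimal sketch, not an official Radxa or Qualcomm tool: the function names (`kernel_series`, `npu_kernel_status`, `list_fastrpc_nodes`) are hypothetical, and the rule that only the 6.8 series counts as NPU-ready is taken solely from this log.

```shell
#!/bin/sh
# Hypothetical helpers; names and the "6.8 = vendor" rule are assumptions from this log.

# Reduce a kernel release string to its major.minor series, e.g. "6.8.0-foo" -> "6.8".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Label a kernel release as the vendor series described in this log, or untested.
npu_kernel_status() {
    case "$(kernel_series "$1")" in
        6.8) echo "vendor" ;;
        *)   echo "untested" ;;
    esac
}

# Print every FastRPC device node found under a directory (default: /dev).
list_fastrpc_nodes() {
    dir="${1:-/dev}"
    for node in "$dir"/fastrpc-*; do
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}

echo "kernel: $(uname -r) -> $(npu_kernel_status "$(uname -r)")"
echo "fastrpc nodes:"
list_fastrpc_nodes /dev

# Check whether the userspace daemon is running (pgrep comes from procps).
if pgrep -x cdsprpcd >/dev/null 2>&1; then
    echo "cdsprpcd: running"
else
    echo "cdsprpcd: not running"
fi
```

If the nodes are listed but cdsprpcd is not running, the first diagnosis applies. If the kernel is reported as untested, the second one is the likelier culprit.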
The Partial Fix: Installing the missing libraries manually:
sudo apt install fastrpc libcdsprpc1 libadsprpc1
sudo systemctl enable --now fastrpc
The Issue:
Running the NPU workload inside Docker caused failures because the "Listener" daemon (cdsprpcd) was running on the Host, while the AI model files were inside the Container. The daemon (on host) tried to load files it couldn't see.
The Solution:
Run the container with --privileged and bind-mount the firmware and device nodes so the container has direct hardware access.
docker run -it --rm --privileged \
--device /dev/fastrpc-cdsp \
--device /dev/fastrpc-adsp \
--device /dev/fastrpc-cdsp-secure \
-v /lib/firmware:/lib/firmware:ro \
-v /usr/lib/firmware:/usr/lib/firmware:ro \
radxazifeng278/qairt-npu-9075:v1.1
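Docker refuses to start a container when a path given to --device does not exist, so a hard-coded list can fail on an image that exposes a different set of FastRPC nodes. The sketch below builds the flags from whatever nodes are actually present and prints the resulting command as a dry run. The helper name `fastrpc_device_flags` is hypothetical, and the firmware mount is trimmed to one path for brevity.

```shell
#!/bin/sh
# Hypothetical helper: emit "--device <node>" for each FastRPC node that exists.
fastrpc_device_flags() {
    dir="${1:-/dev}"
    flags=""
    for node in "$dir"/fastrpc-*; do
        # Skip the literal pattern when nothing matches the glob.
        [ -e "$node" ] || continue
        flags="$flags --device $node"
    done
    # Drop the leading space left by the first append.
    printf '%s\n' "${flags# }"
}

# Dry run: print the command instead of executing it.
# The substitution is left unquoted on purpose so each flag becomes its own word.
echo docker run -it --rm --privileged $(fastrpc_device_flags /dev) \
    -v /lib/firmware:/lib/firmware:ro \
    radxazifeng278/qairt-npu-9075:v1.1
```

Remove the leading `echo` once the printed command looks right for the board.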
The Catastrophe: While trying to downgrade from the broken 6.18 kernel (mainline/test) to the supported 6.8 kernel (vendor/stable) to fix the DMA memory error:
- The 6.8 kernel installation script failed partially (error exit status 1 in 99-update-overlay).
- I manually disabled the 6.18 bootloader entry to "force" the switch.
- Result: The board became unbootable ("Nuked OS") because the 6.8 kernel was not fully configured.
The Lesson:
Do not mix noble-test (experimental) kernels with production NPU requirements. Stick to the T4/T5 official images which use the 6.8 vendor kernel.
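A cheap guard against repeating this is to confirm the target kernel is actually complete before disabling the entry that still boots. The sketch below assumes the usual Debian/Ubuntu layout of `vmlinuz-<version>` and `initrd.img-<version>` under /boot. I have not verified that Radxa OS with embloader keeps the same layout, so treat the paths and the function names (`kernel_files_present`, `safe_to_switch`) as assumptions.

```shell
#!/bin/sh
# Assumption: Debian-style /boot naming; adjust if the image stores kernels elsewhere.

# Succeed only if both the kernel image and its initrd exist and are non-empty.
kernel_files_present() {
    boot_dir="$1"
    version="$2"
    [ -s "$boot_dir/vmlinuz-$version" ] && [ -s "$boot_dir/initrd.img-$version" ]
}

# Print a verdict and return non-zero when the target kernel looks incomplete.
safe_to_switch() {
    if kernel_files_present "$1" "$2"; then
        echo "ok: kernel $2 looks complete in $1"
    else
        echo "stop: kernel $2 is incomplete in $1; keep the current boot entry"
        return 1
    fi
}

# Example: check the running kernel; substitute the version you intend to boot.
safe_to_switch /boot "$(uname -r)" || true
```

This only proves the files exist, not that the kernel boots. Keeping the working boot entry enabled until the new kernel has booted once is still the safer order of operations.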
If I were to start over today, this is the exact path to success:
Use the Radxa OS T4 (or T5+) image. Do not use the generic Ubuntu/Armbian builds if you need the NPU.
- Image: radxa-dragon-q6a_noble_kde_t4.output_512.img.xz
On the first boot, the NPU drivers might be dormant. Install the specific meta-packages:
sudo apt update
sudo apt install -y task-qualcomm embloader sdboot-is-embloader
sudo reboot
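After the reboot it is worth confirming that the packages really ended up installed, since a half-finished install was part of what went wrong earlier. This sketch relies on `dpkg-query -W -f='${Status}'`, which prints `install ok installed` for a fully installed package. The helper names (`is_installed_status`, `pkg_installed`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around dpkg-query; package names are the ones used above.

# True only for the status string dpkg reports for a fully installed package.
is_installed_status() {
    [ "$1" = "install ok installed" ]
}

# Query dpkg for one package; returns 2 if dpkg-query itself is unavailable.
pkg_installed() {
    command -v dpkg-query >/dev/null 2>&1 || return 2
    is_installed_status "$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null)"
}

for pkg in task-qualcomm embloader sdboot-is-embloader; do
    if pkg_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed (or dpkg-query unavailable)"
    fi
done
```

The same loop can be pointed at fastrpc, libcdsprpc1 and libadsprpc1 to confirm the userspace side.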
Use the official QAIRT container with full device passthrough.
# 1. Download Model (on host)
mkdir -p ~/llama-data
# (Use modelscope or huggingface to pull Llama-3.2-1B-QCS6490 into this folder)
# 2. Run Container
docker run -it --rm --privileged \
--device /dev/fastrpc-cdsp \
--device /dev/fastrpc-adsp \
-v /lib/firmware:/lib/firmware:ro \
-v ~/llama-data:/data \
radxazifeng278/qairt-npu-9075:v1.1
Despite the progress, these issues remain for the community to solve:
- Boot Stalls on T4: After installing task-qualcomm, the boot process hangs significantly on qcom-apm (Audio Policy Manager) timeouts and WiFi driver verbosity (AICWFDBG). The system eventually boots but feels fragile.
- Kernel Fragmentation: The documentation references noble-test repos which install broken 6.18 kernels that break NPU DMA reservations. We need clarity on which kernel tag officially supports the fastrpc memory map.
- Permissions: The cdsprpcd daemon often conflicts when running on Host vs Container. A standardized udev rule or Docker Compose file for the Dragon Q6A would be a massive community contribution.