Status: 🔴 UNRESOLVED - Root cause identified, awaiting fix
Last Updated: January 2026
Hardware: Radxa Dragon Q6A (8GB RAM)
OS: Ubuntu 24.04 (RadxaOS)
Kernel: 6.18.2-1-qcom
- Objective
- Hardware Overview
- The Journey
- Solved Issues
- Root Cause Analysis
- Current Status
- Diagnostic Commands
- Workarounds Attempted
- Community Resources
- Contributing
Goal: Run Llama 3.2 1B (and other AI models) on the Radxa Dragon Q6A using the onboard Qualcomm Hexagon NPU for hardware-accelerated inference.
Expected Outcome: Achieve ~12 tokens/second inference using the 12 TOPS NPU, as demonstrated in Radxa's official documentation.
Actual Outcome: Error 14001 - "Failed to create device" when attempting to initialize the NPU via QNN/QAIRT SDK.
| Component | Specification |
|---|---|
| SoC | Qualcomm QCS6490 |
| CPU | Octa-core Kryo 670 (1×2.7GHz + 3×2.4GHz + 4×1.9GHz) |
| GPU | Adreno 643 |
| NPU | Hexagon 770 DSP with Tensor Accelerator |
| AI Performance | Up to 12 TOPS (INT8) |
| RAM | 8GB LPDDR5 |
| DSP Architecture | Hexagon V68 |
┌─────────────────────────────────────────────────────────────┐
│ User Application │
├─────────────────────────────────────────────────────────────┤
│ QAIRT SDK (QNN / SNPE / Genie) │
├─────────────────────────────────────────────────────────────┤
│ libQnnHtp.so │ libGenie.so │ libQnnHtpV68Skel.so │
├─────────────────────────────────────────────────────────────┤
│ FastRPC (libcdsprpc.so / libxdsprpc.so) │
├─────────────────────────────────────────────────────────────┤
│ cdsprpcd daemon │
├─────────────────────────────────────────────────────────────┤
│ Kernel: qcom_fastrpc driver │
├─────────────────────────────────────────────────────────────┤
│ /dev/fastrpc-cdsp │ /dev/fastrpc-adsp │
├─────────────────────────────────────────────────────────────┤
│ Hexagon DSP Firmware (cdsp.mbn / adsp.mbn) │
└─────────────────────────────────────────────────────────────┘
- Installed RadxaOS (Ubuntu 24.04 Noble) on the Dragon Q6A
- Set up Docker for containerized AI development
- Downloaded QAIRT SDK container from Radxa's Docker Hub:
docker pull radxazifeng278/qairt-npu-9075:v1.1
- Downloaded the pre-compiled Llama 3.2 1B model from ModelScope:
pip3 install modelscope --break-system-packages
modelscope download --model radxa/Llama3.2-1B-1024-qairt-v68 --local ./Llama3.2-1B-1024-qairt-v68
- Attempted inference:
cd Llama3.2-1B-1024-qairt-v68
export LD_LIBRARY_PATH=$(pwd)
./genie-t2t-run -c ./htp-model-config-llama32-1b-gqa.json -p '<prompt>'
Result:
[ERROR] "Failed to create device: 14001"
[ERROR] "Device Creation failure"
Failure to initialize model.
Error 14001 indicates that the QNN HTP backend failed to create a device context for the Hexagon DSP. This prompted a systematic investigation of the entire stack.
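When re-testing after each change, it helps to pull the error code out of the run log mechanically rather than by eye. The following is a minimal sketch; `qnn_device_error_code` is a hypothetical helper name, and it assumes the error line keeps the exact "Failed to create device: <code>" wording shown above.

```shell
#!/bin/sh
# qnn_device_error_code
# Hypothetical helper: reads genie-t2t-run output on stdin and prints the
# numeric code from any "Failed to create device: <code>" line.
# Prints nothing if no such line is present.
qnn_device_error_code() {
    sed -n 's/.*Failed to create device: \([0-9][0-9]*\).*/\1/p'
}
```

One possible use is to save the run output with `./genie-t2t-run ... 2>&1 | tee run.log` and then call `qnn_device_error_code < run.log`.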
Symptom: QNN couldn't find FastRPC transport libraries.
Solution: Install the official Qualcomm FastRPC packages:
sudo apt update
sudo apt install qcom-fastrpc1 libcdsprpc1
This installs:
- /usr/lib/aarch64-linux-gnu/libcdsprpc.so.1
- /usr/lib/aarch64-linux-gnu/libxdsprpc.so
- /usr/bin/cdsprpcd
- Systemd services: cdsprpcd.service, adsprpcd.service
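To confirm the packages actually placed these files, a small existence check can be scripted. This is a sketch only; `report_missing` is a hypothetical helper, and the paths to pass are the ones listed above.

```shell
#!/bin/sh
# report_missing PATH...
# Hypothetical helper: prints "MISSING: <path>" for each path that does not
# exist and returns non-zero if any path was missing.
report_missing() {
    rm_status=0
    for rm_path in "$@"; do
        if [ ! -e "$rm_path" ]; then
            echo "MISSING: $rm_path"
            rm_status=1
        fi
    done
    return "$rm_status"
}
```

For example: `report_missing /usr/lib/aarch64-linux-gnu/libcdsprpc.so.1 /usr/lib/aarch64-linux-gnu/libxdsprpc.so /usr/bin/cdsprpcd && echo "FastRPC files present"`.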
Symptom: FastRPC transport layer couldn't communicate with DSP.
Solution: Enable and start the CDSP RPC daemon:
sudo systemctl enable cdsprpcd.service
sudo systemctl start cdsprpcd.service
sudo systemctl status cdsprpcd.service
Symptom: DSP couldn't load the HTP skeleton library.
Solution: Copy skeleton libraries to DSP search paths:
# Create DSP library directories
sudo mkdir -p /usr/lib/dsp/cdsp /usr/lib/dsp/adsp /dsp
# Copy skeleton library (from QAIRT SDK or downloaded model)
sudo cp libQnnHtpV68Skel.so /usr/lib/dsp/cdsp/
sudo cp libQnnHtpV68Skel.so /usr/lib/dsp/adsp/
sudo cp libQnnHtpV68Skel.so /dsp/
# Set permissions
sudo chmod 644 /usr/lib/dsp/cdsp/libQnnHtpV68Skel.so
Set the environment variable:
export ADSP_LIBRARY_PATH="/usr/lib/dsp/cdsp:/usr/lib/dsp/adsp:/dsp"
Symptom: Permission denied accessing /dev/fastrpc-* devices.
Solution: Add user to the render group:
sudo usermod -aG render $USER
# Log out and back in, or:
newgrp render
Verify permissions:
ls -la /dev/fastrpc*
# Should show: crw-rw-r--+ ... fastrpc:render
Symptom: NPU not accessible from Docker containers.
Solution: Run Docker with full device access:
docker run --privileged -it \
-v /dev:/dev \
-v $(pwd):/workspace \
radxazifeng278/qairt-npu:v1.0 /bin/bash
Note: The -9075 variant container is for QCS9075, not QCS6490. Use the base qairt-npu:v1.0 image for Dragon Q6A.
After resolving all userspace issues, the kernel logs reveal the actual problem:
sudo dmesg | grep -i fastrpc
Output:
[ 5.544594] qcom,fastrpc a300000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FASTRPC
[ 5.566430] qcom,fastrpc 3700000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FASTRPC
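The two log lines above can be reduced to the remoteproc addresses that reported the message, which makes before/after comparisons easier when testing a new kernel or device tree. A minimal sketch follows; `fastrpc_dma_failures` is a hypothetical helper, and it assumes the log format shown above.

```shell
#!/bin/sh
# fastrpc_dma_failures
# Hypothetical helper: reads kernel log text on stdin and prints the
# remoteproc address of every fastrpc instance that logged
# "no reserved DMA memory for FASTRPC", one address per line.
fastrpc_dma_failures() {
    sed -n 's/.*qcom,fastrpc \([0-9a-f][0-9a-f]*\)\.remoteproc.*no reserved DMA memory for FASTRPC.*/\1/p'
}
```

Typical use would be `sudo dmesg | fastrpc_dma_failures`; empty output means the message was not logged during this boot.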
The Linux kernel's qcom_fastrpc driver requires a reserved DMA memory region defined in the device tree. This memory region is used for:
- Shared buffers between CPU and DSP
- DMA transfers for model weights and activations
- Communication buffers for RPC calls
The current device tree does define reserved memory for the DSPs:
sudo dmesg | grep "reserved mem"
[ 0.000000] OF: reserved mem: 0x0000000081800000..0x00000000835fffff (30720 KiB) nomap non-reusable cdsp-secure-heap@81800000
[ 0.000000] OF: reserved mem: 0x0000000084800000..0x0000000086ffffff (40960 KiB) nomap non-reusable adsp@84800000
[ 0.000000] OF: reserved mem: 0x0000000087000000..0x0000000088dfffff (30720 KiB) nomap non-reusable cdsp@87000000
However, the fastrpc node is missing its own memory-region property that would link it to a DMA pool.
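Whether a fastrpc node carries a memory-region property can be checked on a running system, because the live device tree is exposed as a directory hierarchy in which nodes are directories and properties are files. The sketch below walks that hierarchy; `check_fastrpc_memory_region` is a hypothetical helper, and matching on the compatible string "qcom,fastrpc" is an assumption based on the example node shown in this document.

```shell
#!/bin/sh
# check_fastrpc_memory_region [DT_ROOT]
# Hypothetical helper: for every node under DT_ROOT whose "compatible" list
# contains exactly "qcom,fastrpc", prints "present <node>" or
# "missing <node>" depending on whether a memory-region property exists.
# DT_ROOT defaults to the live device tree.
check_fastrpc_memory_region() {
    dt_root=${1:-/sys/firmware/devicetree/base}
    find "$dt_root" -type f -name compatible 2>/dev/null | sort |
    while IFS= read -r compat; do
        # compatible holds NUL-separated strings; split them onto lines.
        if tr '\000' '\n' < "$compat" | grep -qx 'qcom,fastrpc'; then
            node=$(dirname "$compat")
            if [ -e "$node/memory-region" ]; then
                echo "present $node"
            else
                echo "missing $node"
            fi
        fi
    done
}
```

Running `check_fastrpc_memory_region` with no argument inspects the booted device tree. Depending on permissions it may need sudo.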
The fastrpc node needs a memory-region property pointing to a shared-dma-pool region. Example from working configurations:
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    fastrpc_mem: fastrpc@0 {
        compatible = "shared-dma-pool";
        reg = <0x0 0x89000000 0x0 0x2000000>; /* 32MB */
        no-map;
    };
};
/* In the fastrpc node: */
fastrpc {
    compatible = "qcom,fastrpc";
    memory-region = <&fastrpc_mem>;
    /* ... */
};
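Before trying a region like the one in the example, it is worth checking the arithmetic and confirming that it does not collide with the DSP regions already reserved. The sketch below does this for the example base and size against the three regions from the kernel log earlier in this section. `regions_overlap` is a hypothetical helper. The check covers only those three logged regions, not the board's full memory map, and whether 0x89000000 is actually free on this board is not confirmed here.

```shell
#!/bin/sh
# regions_overlap START SIZE OTHER_START OTHER_END
# Hypothetical helper: succeeds (exit 0) if [START, START+SIZE) intersects
# the inclusive range [OTHER_START, OTHER_END]. Accepts hex or decimal.
regions_overlap() {
    ro_s1=$(($1))
    ro_e1=$(($1 + $2 - 1))
    ro_s2=$(($3))
    ro_e2=$(($4))
    [ "$ro_s1" -le "$ro_e2" ] && [ "$ro_s2" -le "$ro_e1" ]
}

# Candidate pool from the example: base 0x89000000, size 0x2000000.
pool_base=0x89000000
pool_size=0x2000000
printf 'pool: 0x%x..0x%x (%d MiB)\n' \
    "$(($pool_base))" "$(($pool_base + $pool_size - 1))" "$(($pool_size / 1048576))"

# Regions copied from the "reserved mem" kernel log lines.
for region in \
    "cdsp-secure-heap 0x81800000 0x835fffff" \
    "adsp 0x84800000 0x86ffffff" \
    "cdsp 0x87000000 0x88dfffff"
do
    set -- $region
    if regions_overlap "$pool_base" "$pool_size" "$2" "$3"; then
        echo "overlap with $1"
    else
        echo "clear of $1"
    fi
done
```

With these values the pool spans 0x89000000..0x8affffff (32 MiB) and starts above the end of the cdsp region at 0x88dfffff, so all three regions report clear.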
Current CMA (Contiguous Memory Allocator) allocation:
cat /proc/meminfo | grep -i cma
CmaTotal: 32768 kB
CmaFree: 24568 kB
The 32MB CMA pool exists but isn't being used by fastrpc due to the missing device tree configuration.
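To track how much of the CMA pool is in use across reboots or configuration changes, the two meminfo fields can be combined into a single line. A small sketch; `cma_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# cma_usage
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# "<total_kB> <free_kB> <used_kB>" for the CMA pool.
cma_usage() {
    awk '/^CmaTotal:/ {t = $2} /^CmaFree:/ {f = $2} END {print t, f, t - f}'
}
```

Usage: `cma_usage < /proc/meminfo`. For the values shown above this gives 32768 total, 24568 free, and 8200 kB in use.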
| Component | Status | Verification |
|---|---|---|
| DSP Firmware | ✅ Loaded | cat /sys/class/remoteproc/remoteproc*/state → "running" |
| Remoteproc | ✅ Running | Both ADSP (remoteproc0) and CDSP (remoteproc1) active |
| Device Nodes | ✅ Present | /dev/fastrpc-cdsp, /dev/fastrpc-adsp exist |
| Permissions | ✅ Correct | User in render group, ACLs configured |
| FastRPC Libraries | ✅ Installed | libcdsprpc.so.1, libxdsprpc.so present |
| cdsprpcd Service | ✅ Running | Systemd service active |
| Skeleton Libraries | ✅ Deployed | libQnnHtpV68Skel.so in DSP paths |
| Component | Status | Error |
|---|---|---|
| FastRPC DMA | ❌ No Memory | "no reserved DMA memory for FASTRPC" |
| QNN Device Creation | ❌ Error 14001 | "Failed to create device" |
| NPU Inference | ❌ Blocked | Cannot establish CPU↔DSP communication |
Error 14001 (QNN)
└─→ Error 4000 (Transport)
└─→ Error 1002 (DspTransport.openSession)
└─→ Error 0x80000600 (AEE_EUNSUPPORTED)
└─→ Kernel: "no reserved DMA memory for FASTRPC"
#!/bin/bash
echo "=== Radxa Dragon Q6A NPU Diagnostic ==="
echo -e "\n--- System Info ---"
uname -a
cat /etc/os-release | grep PRETTY_NAME
echo -e "\n--- Kernel Command Line ---"
cat /proc/cmdline
echo -e "\n--- Remoteproc Status ---"
for rproc in /sys/class/remoteproc/remoteproc*; do
    name=$(cat "$rproc/name" 2>/dev/null || echo "unknown")
    state=$(cat "$rproc/state" 2>/dev/null || echo "unknown")
    echo " $(basename "$rproc"): $name = $state"
done
echo -e "\n--- FastRPC Devices ---"
ls -la /dev/fastrpc* 2>/dev/null || echo " No fastrpc devices found!"
echo -e "\n--- DSP Reserved Memory ---"
sudo dmesg | grep -i "reserved mem" | grep -E "(cdsp|adsp|fastrpc)"
echo -e "\n--- FastRPC Kernel Messages ---"
sudo dmesg | grep -i fastrpc
echo -e "\n--- CMA Memory ---"
cat /proc/meminfo | grep -i cma
echo -e "\n--- cdsprpcd Service ---"
systemctl status cdsprpcd.service --no-pager 2>/dev/null || echo " Service not found"
echo -e "\n--- FastRPC Libraries ---"
ldconfig -p | grep -E "(cdsprpc|xdsprpc|fastrpc)"
echo -e "\n--- DSP Skeleton Libraries ---"
ls -la /usr/lib/dsp/cdsp/ /usr/lib/dsp/adsp/ /dsp/ 2>/dev/null
echo -e "\n--- User Groups ---"
groups
echo -e "\n--- DSP Firmware ---"
ls -la /lib/firmware/qcom/qcs6490/radxa/dragon-q6a/*.mbn 2>/dev/null
# Check remoteproc status
cat /sys/class/remoteproc/remoteproc0/state # ADSP
cat /sys/class/remoteproc/remoteproc1/state # CDSP
# Check firmware versions
cat /sys/class/remoteproc/remoteproc0/firmware
cat /sys/class/remoteproc/remoteproc1/firmware
# Check fastrpc kernel module
lsmod | grep fastrpc
cat /boot/config-$(uname -r) | grep FASTRPC
# Check device tree for fastrpc nodes (they are nested below the top level, so search recursively)
find /sys/firmware/devicetree/base -iname '*fastrpc*'
Attempted to increase CMA via kernel parameter:
# In boot config, add:
cma=256M
Result: CMA size increased, but fastrpc still reports "no reserved DMA memory" because it needs a dedicated memory-region property, not just a larger CMA pool.
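After editing the boot entry, it is easy to confirm that the parameter actually reached the kernel by reading it back from the running command line. A minimal sketch; `cma_param` is a hypothetical helper name.

```shell
#!/bin/sh
# cma_param
# Hypothetical helper: reads a kernel command line on stdin and prints the
# value of its cma= parameter, or nothing if the parameter is absent.
cma_param() {
    tr ' ' '\n' | sed -n 's/^cma=//p'
}
```

Usage: `cma_param < /proc/cmdline`. The result can be compared with CmaTotal in /proc/meminfo to confirm the larger pool took effect.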
Tried multiple Docker images:
- radxazifeng278/qairt-npu-9075:v1.1 (wrong SoC)
- radxazifeng278/qairt-npu:v1.0 (correct but same kernel issue)
Result: Same error - the issue is at the kernel/device-tree level, not containerization.
Attempted to manually configure memory regions via kernel parameters.
Result: The fastrpc driver specifically looks for a memory-region phandle in the device tree; this cannot be overridden at runtime.
Checked available device tree overlays via rsetup:
ls /boot/dtbo/*.dtbo* | grep -i -E "(npu|cdsp|fastrpc)"
Result: No NPU/FastRPC-specific overlays available for Dragon Q6A.
- Radxa Forum - Dragon Q6A
- Radxa Discord - #dragon-q6a channel
- GitHub Issues
- FastRPC DMA Pool Support - Patch series for FastRPC reserved memory
- Qualcomm FastRPC DT Bindings
- If you have a working Dragon Q6A NPU setup, please share:
  - Your kernel version (uname -a)
  - Your system image version
  - Output of diagnostic commands above
  - Your /proc/device-tree fastrpc node contents
- If you're a kernel/DT developer:
  - Help identify the correct device tree fix
  - Submit patches to Radxa's kernel repository
- Report to Radxa:
  - File an issue at: https://github.com/radxa-build/radxa-dragon-q6a/issues
  - Include the diagnostic output and link to this document
**Title**: NPU Error 14001 - "no reserved DMA memory for FASTRPC"
**Hardware**: Radxa Dragon Q6A (8GB)
**OS**: RadxaOS Ubuntu 24.04
**Kernel**: 6.18.2-1-qcom
**Problem**:
NPU inference fails with error 14001 due to missing DMA memory reservation
for the fastrpc driver.
**Kernel Error**:
qcom,fastrpc a300000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FASTRPC
**Expected**:
The fastrpc device tree node should have a `memory-region` property
pointing to a `shared-dma-pool` reserved memory region.
**Reference**:
https://gist.github.com/[YOUR_GIST_ID]
OS: Ubuntu 24.04.3 LTS (Noble)
Kernel: 6.18.2-1-qcom aarch64
Boot: systemd-boot (embloader)
Firmware: QCM6490.LE.1.0-00376-STD.PROD-1
CDSP Firmware: CDSP.HT.2.5.c3-00103-KODIAK-1
ADSP Firmware: ADSP.HT.5.5.c8-00217-KODIAK-1
| Purpose | Path |
|---|---|
| DSP Firmware | /lib/firmware/qcom/qcs6490/radxa/dragon-q6a/ |
| FastRPC Libraries | /usr/lib/aarch64-linux-gnu/ |
| DSP Skeleton Libs | /usr/lib/dsp/cdsp/, /dsp/ |
| Boot Config | /boot/efi/loader/entries/RadxaOS-*.conf |
| Device Tree Overlays | /boot/dtbo/ |
| Date | Change |
|---|---|
| 2026-01 | Initial document - Root cause identified |
This document is released under CC BY-SA 4.0. Please share improvements back to the community.
If this document helped you or you found a solution, please contribute back!