@seddonm1
Last active September 16, 2024 22:13
How to build onnxruntime on an aarch64 NVIDIA device (like Jetson Orin AGX)
On an Orin NX 16GB there was not enough memory to compile, so the swap space (zram) had to be increased.
In /etc/systemd/nvzramconfig.sh, change:
```
# Calculate memory to use for zram (1/2 of ram)
totalmem=`LC_ALL=C free | grep -e "^Mem:" | sed -e 's/^Mem: *//' -e 's/ *.*//'`
mem=$((("${totalmem}" / 2 / "${NRDEVICES}") * 1024))
```
to:
```
# Calculate memory to use for zram (size of ram)
totalmem=`LC_ALL=C free | grep -e "^Mem:" | sed -e 's/^Mem: *//' -e 's/ *.*//'`
mem=$((("${totalmem}" / "${NRDEVICES}") * 1024))
```
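To make the effect of that change concrete, here is a small worked example of the two formulas. The numbers are assumptions for illustration only: a round 16 GiB of RAM as reported by free (which prints KiB by default; a real device reports somewhat less) and 8 zram devices, on the assumption that NRDEVICES matches the CPU core count.

```shell
# Hypothetical inputs, not read from a real device.
totalmem=16777216   # 16 GiB expressed in KiB, as `free` would report it
NRDEVICES=8         # assumed number of zram devices

# Original formula: half of RAM, split across the devices, converted to bytes.
half=$(( (totalmem / 2 / NRDEVICES) * 1024 ))

# Modified formula: all of RAM, split across the devices, converted to bytes.
full=$(( (totalmem / NRDEVICES) * 1024 ))

echo "per device: ${half} bytes before, ${full} bytes after"
echo "total zram: $(( half * NRDEVICES / 1073741824 )) GiB before, $(( full * NRDEVICES / 1073741824 )) GiB after"
```

With these inputs the total zram swap doubles from 8 GiB to 16 GiB. After a reboot, `swapon --show` or `free -h` should reflect the new size.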
```
docker run \
--rm \
-it \
-e ONNXRUNTIME_REPO=https://github.com/microsoft/onnxruntime \
-e ONNXRUNTIME_COMMIT=v1.17.0 \
-e BUILD_CONFIG=Release \
-e CMAKE_VERSION=3.28.3 \
-e CPU_ARCHITECTURE=$(uname -m) \
-v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra:ro \
-v $(pwd):/output \
-w /tmp \
nvcr.io/nvidia/deepstream:6.4-triton-multiarch \
/bin/bash -c "
# set up cmake
apt remove -y cmake &&\
rm -rf /usr/local/bin/cmake &&\
apt update &&\
apt install -y wget &&\
rm -rf /tmp/cmake &&\
mkdir /tmp/cmake &&\
wget https://github.com/Kitware/CMake/releases/download/v\${CMAKE_VERSION}/cmake-\${CMAKE_VERSION}-linux-\${CPU_ARCHITECTURE}.tar.gz &&\
tar zxf cmake-\${CMAKE_VERSION}-linux-\${CPU_ARCHITECTURE}.tar.gz --strip-components=1 -C /tmp/cmake &&\
export PATH=\$PATH:/tmp/cmake/bin &&\
# clone onnxruntime repository and build
apt-get install -y patch &&\
git clone \${ONNXRUNTIME_REPO} onnxruntime &&\
cd onnxruntime &&\
git checkout \${ONNXRUNTIME_COMMIT} &&\
/bin/sh build.sh \
--parallel \
--build_shared_lib \
--allow_running_as_root \
--compile_no_warning_as_error \
--cuda_home /usr/local/cuda \
--cudnn_home /usr/lib/\${CPU_ARCHITECTURE}-linux-gnu/ \
--use_tensorrt \
--tensorrt_home /usr/lib/\${CPU_ARCHITECTURE}-linux-gnu/ \
--config \${BUILD_CONFIG} \
--skip_tests \
--cmake_extra_defines 'onnxruntime_BUILD_UNIT_TESTS=OFF' &&\
# package and copy to output
export ONNXRUNTIME_VERSION=\$(cat /tmp/onnxruntime/VERSION_NUMBER) &&\
rm -rf /tmp/onnxruntime/build/onnxruntime-linux-\${CPU_ARCHITECTURE}-gpu-\${ONNXRUNTIME_VERSION} &&\
BINARY_DIR=build \
ARTIFACT_NAME=onnxruntime-linux-\${CPU_ARCHITECTURE}-gpu-\${ONNXRUNTIME_VERSION} \
LIB_NAME=libonnxruntime.so \
BUILD_CONFIG=Linux/\${BUILD_CONFIG} \
SOURCE_DIR=/tmp/onnxruntime \
COMMIT_ID=\$(git rev-parse HEAD) \
tools/ci_build/github/linux/copy_strip_binary.sh &&\
cd /tmp/onnxruntime/build/onnxruntime-linux-\${CPU_ARCHITECTURE}-gpu-\${ONNXRUNTIME_VERSION}/lib/ &&\
ln -s libonnxruntime.so libonnxruntime.so.\${ONNXRUNTIME_VERSION} &&\
cp -r /tmp/onnxruntime/build/onnxruntime-linux-\${CPU_ARCHITECTURE}-gpu-\${ONNXRUNTIME_VERSION} /output
"
```
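A note on the quoting in the command above: the script passed to /bin/bash -c is double-quoted, so the host shell would expand any bare $VAR before docker starts. Writing \${...} and \$(...) defers expansion to the shell inside the container, where the -e values are set. The unescaped $(uname -m) and $(pwd) are expanded on the host. The sketch below reproduces the difference without Docker, using sh -c as a stand-in for the container shell; the values host-value and container-value are placeholders for illustration.

```shell
# Placeholder value in the calling ("host") shell.
CMAKE_VERSION=host-value

# Unescaped: the calling shell substitutes its own value before sh starts,
# so the value passed in the child's environment is never consulted.
unescaped=$(CMAKE_VERSION=container-value sh -c "echo ${CMAKE_VERSION}")

# Escaped: the literal text ${CMAKE_VERSION} reaches the child shell, which
# expands it from its own environment (the role the -e flags play for docker).
escaped=$(CMAKE_VERSION=container-value sh -c "echo \${CMAKE_VERSION}")

echo "unescaped: ${unescaped}"
echo "escaped:   ${escaped}"
```

The first line prints host-value and the second prints container-value, which is why every variable meant for the container is written with a leading backslash.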
@seddonm1
Author

@adhilcolab can you tell me what is going wrong? I ran this on a Jetson Orin AGX very recently and it worked.

@ykawa2

ykawa2 commented Jan 11, 2024

@seddonm1 Thanks for sharing. I created another docker version here:
https://github.com/ykawa2/onnxruntime-gpu-for-jetson

I shared the built binary under Releases, and it worked on a Jetson Orin AGX.

@shehrozshafiqkh

@ykawa2, thank you for your assistance! I successfully built ONNXRuntime-gpu with TensorRT using ONNXRUNTIME_COMMIT=v1.14.1, and everything went smoothly. I obtained the wheel file and installed it on my system. However, I noticed that the first inference after loading the model takes a significant amount of time, but subsequent inferences perform well. Have you encountered similar performance issues?

@seddonm1 also if you could help here.
