johnnynunez/Run SGLang Thor & Spark.md

Created April 10, 2026 08:30

Star (0) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Select an option

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/johnnynunez/661c022d792bae6eba2dd5cb79a73d21.js"></script>
Save johnnynunez/661c022d792bae6eba2dd5cb79a73d21 to your computer and use it in GitHub Desktop.

Download ZIP

Raw

Run SGLang Thor & Spark.md

Run SGLang Thor & Spark

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Create environment

uv venv .sglang --python 3.12
source .sglang/bin/activate
sudo apt install python3-dev python3.12-dev

Export variables

export TORCH_CUDA_ARCH_LIST=11.0a # 12.1 Spark, for Thor 11.0a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Install SGLang

uv pip install sgl-kernel --prerelease=allow --index-url https://docs.sglang.ai/whl/cu130/
uv pip install sglang --prerelease=allow 
uv pip install --upgrade flashinfer-python
uv pip install --force-reinstall torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130

Clean memory

sudo sysctl -w vm.drop_caches=3

Run nemotron nvfp4

python3 -m sglang.launch_server \
  --model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 \
  --trust-remote-code \
  --tp 1 \
  --attention-backend flashinfer \
  --tool-call-parser qwen3_coder \
  --reasoning-parser nano_v3 \
  --mem-fraction-static 0.6 \
  --cuda-graph-max-bs 16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment