UI-TARS Setup Guide

Conda setting

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
~/miniconda3/bin/conda init
source ~/.bashrc
rm Miniconda3-latest-Linux-x86_64.sh

vLLM Setup


conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create and activate conda environment
conda create -n vlm python=3.10 -y
conda activate vlm

pip install transformers
# 「image processor不整合」が出るなら固定を推奨。
#pip install "transformers==4.49.0"

# Set version variables
VLLM_VERSION=0.6.6
CUDA_VERSION=cu126  # Adjust this if your CUDA version is different or if you are using CPU-only.

# Install vLLM
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/${CUDA_VERSION}
pip install vllm==${VLLM_VERSION}

### Download and Run UI-TARS Model

# NOTE: VRAM 8GB だと max_model_len を上げすぎると KV cache が足りず起動失敗することがある
#       (ValueError: max seq len > KV cache tokens)
#       まずは max_model_len=8192 で起動確認するのが安全。

# NOTE: UI-TARS は画像埋め込みが重いので、image=5 だと context を食い潰しやすい。
#       まずは image=1 で安定動作 → 必要なら増やす。

# NOTE(WSL): ZMQ/IPC絡みで起動が不安定な場合があるので、まずは --disable-frontend-multiprocessing 推奨。

# exec command I created for WSL2 on Windows
python -m vllm.entrypoints.openai.api_server \
  --served-model-name ui-tars \
  --model "bytedance-research/UI-TARS-2B-SFT" \
  --dtype half \
  --max-model-len 8192 \
  --limit-mm-per-prompt image=1 \
  --gpu-memory-utilization 0.90 \
  --disable-frontend-multiprocessing \
  --host 127.0.0.1 --port 8000
  
# Official exec command
# python -m vllm.entrypoints.openai.api_server \
#     --served-model-name ui-tars \
#     --model "bytedance-research/UI-TARS-2B-SFT" \
#     --limit-mm-per-prompt image=5 -tp 1

トラブルシューティング for vLLM Setup on Windows (※ WSL2 Ubuntuでしか動かない)

✅ 前提確認（WSL2で動かす）

WSL上で cat /etc/os-release が Ubuntu を示すこと
vLLMはWindowsネイティブ だと resource が無くて落ちるので、WSL(Ubuntu)で実行する

0. よくあるハマりポイント（今回つまづいた箇所まとめ）

(A) DNSが死んでいて pip / apt が全部失敗する

症状

pip install ... が以下のように失敗
- Temporary failure in name resolution
- Name or service not known
ping 1.1.1.1 は通るが ping pypi.org が失敗
cat /etc/resolv.conf が No such file or directory

原因

WSL内の /etc/resolv.conf が壊れている／存在しない
さらに /etc/resolv.conf が「存在しないファイルへのシンボリックリンク」(dangling symlink) になっていた
例：/etc/resolv.conf -> ../run/resolvconf/resolv.conf だがリンク先が無い

解決（dangling symlink を外して実ファイル化）

# 1) 現状確認（l で始まるならリンク）
ls -l /etc/resolv.conf

# 2) リンクを削除
sudo rm -f /etc/resolv.conf

# 3) resolv.conf を実ファイルとして作成
sudo tee /etc/resolv.conf >/dev/null <<'EOF'
nameserver 1.1.1.1
nameserver 8.8.8.8
options timeout:1 attempts:3
EOF

sudo chmod 644 /etc/resolv.conf

# 4) 確認（- で始まれば実ファイル）
ls -l /etc/resolv.conf
cat /etc/resolv.conf

# 5) DNS確認
getent hosts pypi.org
ping -c 1 pypi.org

WSL起動時の上書き防止（任意）

# resolv.conf をWSLに自動生成させない（手動管理にする）
sudo tee /etc/wsl.conf >/dev/null <<'EOF'
[network]
generateResolvConf=false
EOF

# Windows側PowerShellでWSL再起動（反映のため）
# wsl --shutdown

(B) conda create が Terms of Service (ToS) で止まる

症状

conda create -n vlm python=3.10 -y が以下で止まる CondaToSNonInteractiveError: Terms of Service have not been accepted...

解決案1：ToS同意して進める

conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

(C) vLLM起動で `Failed to infer device type`（WSL側でGPUが見えない）

症状

vLLM起動で以下のエラー
- RuntimeError: Failed to infer device type
WSL側で以下が成立する
- nvidia-smi が無い / 動かない
- /dev/nvidia* が無い
- /dev/dxg も無い（or 確認できていない）

原因（今回の決定打）

wsl -l -v を見ると Ubuntuが VERSION=1（= WSL1） だった
WSL1は GPU(CUDA)が使えない → vLLMがdeviceを自動判定できず落ちる

解決：Ubuntuを WSL2 に変換

Windows PowerShellで実行：

# 既定をWSL2に
wsl --set-default-version 2

# Ubuntu-20.04 をWSL2へ変換
wsl --set-version Ubuntu-20.04 2

# 再起動
wsl --shutdown

# 確認（VERSION が 2 になっていること）
wsl -l -v

vLLMの実行時のトラブルシューティング

（過去に出ていた）`AssertionError` / image processor 不整合（Qwen2VLImageProcessor 系）

状況: transformers の fast/slow image processor 周りで落ちることがある
今回の結果: Using a slow image processor ... のログになっており、起動成功したため一旦OK
メモ: もし再発するなら transformers のバージョン固定や、vLLM側の対応状況確認を検討

`curl http://127.0.0.1:8000/v1/models` が無反応 / `Connection refused` / `not listening`

原因: api_server が途中で落ちていて 8000 で LISTEN していない
確認:
- curl -Ssv http://127.0.0.1:8000/v1/models（-s だけだと失敗しても無音）
- ss -ltnp | grep ':8000' || echo "not listening"
対処: 落ちた“直後の末尾ログ”を見て根本原因を特定（今回=KV cache不足）

VRAMが小さいと起こる： `ValueError: max seq len (...) is larger than the maximum number of tokens that can be stored in KV cache (...)`

原因: VRAM 8GB で gpu_memory_utilization を 0.8 にすると、KV Cache に確保できるトークン数が少ない（例: 8688 tokens）。その上で --max-model-len 16384 などを指定すると KVに入り切らず起動拒否。
対処（必須）:
- --max-model-len を KV上限以下に下げる（今回の成功例: 8192）
- 併せて --gpu-memory-utilization を上げてKVを増やす（今回: 0.90）

画像入力関連の警告：`context length (...) is too short to hold the multi-modal embeddings in the worst case (image ... tokens)`

原因: 画像埋め込みが大きく、--limit-mm-per-prompt image=5 の最悪ケースでは max_model_len を超える可能性がある
対処（推奨）:
- まず --limit-mm-per-prompt image=1 で安定動作を確認
- 画像枚数を増やしたい場合は max_model_len とVRAMの余裕が必要（8GBだと厳しめ）

supertask/ui-tars-windows-instrudctions.md

Select an option

No results found

Select an option

No results found

UI-TARS Setup Guide

Conda setting

vLLM Setup

トラブルシューティング for vLLM Setup on Windows (※ WSL2 Ubuntuでしか動かない)

✅ 前提確認（WSL2で動かす）

0. よくあるハマりポイント（今回つまづいた箇所まとめ）

(A) DNSが死んでいて pip / apt が全部失敗する

症状

原因

解決（dangling symlink を外して実ファイル化）

WSL起動時の上書き防止（任意）

(B) conda create が Terms of Service (ToS) で止まる

症状

解決案1：ToS同意して進める

(C) vLLM起動で `Failed to infer device type`（WSL側でGPUが見えない）

症状

原因（今回の決定打）

解決：Ubuntuを WSL2 に変換

vLLMの実行時のトラブルシューティング

（過去に出ていた）`AssertionError` / image processor 不整合（Qwen2VLImageProcessor 系）

`curl http://127.0.0.1:8000/v1/models` が無反応 / `Connection refused` / `not listening`

VRAMが小さいと起こる： `ValueError: max seq len (...) is larger than the maximum number of tokens that can be stored in KV cache (...)`

画像入力関連の警告：`context length (...) is too short to hold the multi-modal embeddings in the worst case (image ... tokens)`

supertask commented Dec 17, 2025

Uh oh!

supertask/ui-tars-windows-instrudctions.md

UI-TARS Setup Guide

Conda setting

vLLM Setup

トラブルシューティング for vLLM Setup on Windows (※ WSL2 Ubuntuでしか動かない)

✅ 前提確認（WSL2で動かす）

0. よくあるハマりポイント（今回つまづいた箇所まとめ）

(A) DNSが死んでいて pip / apt が全部失敗する

症状

原因

解決（dangling symlink を外して実ファイル化）

WSL起動時の上書き防止（任意）

(B) conda create が Terms of Service (ToS) で止まる

症状

解決案1：ToS同意して進める

(C) vLLM起動で Failed to infer device type（WSL側でGPUが見えない）

症状

原因（今回の決定打）

解決：Ubuntuを WSL2 に変換

vLLMの実行時のトラブルシューティング

（過去に出ていた）AssertionError / image processor 不整合（Qwen2VLImageProcessor 系）

curl http://127.0.0.1:8000/v1/models が無反応 / Connection refused / not listening

VRAMが小さいと起こる： ValueError: max seq len (...) is larger than the maximum number of tokens that can be stored in KV cache (...)

画像入力関連の警告：context length (...) is too short to hold the multi-modal embeddings in the worst case (image ... tokens)

supertask commented Dec 17, 2025

Uh oh!

(C) vLLM起動で `Failed to infer device type`（WSL側でGPUが見えない）

（過去に出ていた）`AssertionError` / image processor 不整合（Qwen2VLImageProcessor 系）

`curl http://127.0.0.1:8000/v1/models` が無反応 / `Connection refused` / `not listening`

VRAMが小さいと起こる： `ValueError: max seq len (...) is larger than the maximum number of tokens that can be stored in KV cache (...)`

画像入力関連の警告：`context length (...) is too short to hold the multi-modal embeddings in the worst case (image ... tokens)`