@GOROman
Last active January 15, 2025 02:47
Having a VLM describe an image with MLX + MLX_VLM + Qwen2-VL-2B-Instruct-4bit
# /// script
# requires-python = "==3.12"
# dependencies = ["mlx==0.21.0", "mlx_vlm"]
# ///
import mlx.core as mx
import numpy as np
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)
# Prepare input
image = ["yellow-hage.jpg"]
prompt = "Describe this image."
# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)
# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=True, dtype=np.float32)
print(output)

GOROman commented Jan 13, 2025

Sample image: yellow-hage.jpg

[Image: yellow-hage]


GOROman commented Jan 13, 2025

Installation (Mac)

  • uv is more convenient than pip
    brew install uv
    - Create a Python 3.12 virtual environment (with 3.13, scipy fails)
    uv venv --python=python3.12
    source .venv/bin/activate
    - Install the packages
    uv pip install mlx-vlm

  • Run (either of the following)
    python mlx_vlm_test.py
    uv run mlx_vlm_test.py
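
Since the steps above depend on Python 3.12 specifically, a small guard at the top of the script can catch a wrong interpreter early. This is only a sketch: the helper name `is_supported_python` is made up for illustration and is not part of mlx_vlm or uv.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for Python 3.12.x, the version used in this walkthrough."""
    return tuple(version_info[:2]) == (3, 12)


if not is_supported_python():
    # Warn instead of exiting so the script can still be tried on other versions.
    print(
        f"Warning: running on Python {sys.version_info.major}.{sys.version_info.minor}; "
        "this walkthrough was only tried on 3.12.",
        file=sys.stderr,
    )
```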

Troubleshooting

  • An error occurs (with mlx==0.22)
  File "/Users/goroman/.cache/uv/archive-v0/yIh1GVWHVj2i0Qmm1xedN/lib/python3.12/site-packages/mlx_vlm/models/qwen2_vl/vision.py", line 91, in __call__
    seq = mx.arange(seqlen, dtype=inv_freq.dtype)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: arange(): incompatible function arguments. The following argument types are supported:
    1. arange(start : Union[int, float], stop : Union[int, float], step : Union[None, int, float], dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array
    2. arange(stop : Union[int, float], step : Union[None, int, float] = None, dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array

Invoked with types: mlx.core.array, kwargs = { dtype: mlx.core.Dtype }

https://github.com/Blaizzy/mlx-vlm/pull/179/files
↑ Apply the fix from this PR (the test changes can be skipped).

Only the file at .venv/lib/python3.12/site-packages/mlx_vlm/models/qwen2_vl/vision.py needs to be edited.
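
The traceback shows `mx.arange` being called with an `mlx.core.array` where it expects a plain `int` or `float`. One plausible local workaround, assuming `seqlen` is a zero-dimensional array, is to convert it to a Python integer first. The helper below is hypothetical and is not necessarily what the linked PR does; check the PR diff for the actual change.

```python
def as_python_int(value):
    """Return a plain int from an int or from a scalar array-like exposing .item()."""
    item = getattr(value, "item", None)
    if callable(item):
        # Scalar arrays (for example a 0-d mlx or NumPy array) expose .item().
        return int(item())
    return int(value)


# Hypothetical use on the failing line in vision.py (an assumption, not the PR's code):
#   seq = mx.arange(as_python_int(seqlen), dtype=inv_freq.dtype)
```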


GOROman commented Jan 13, 2025

If nothing else works, install Cline in VSCode, reproduce the error in the terminal, and let it fix things on its own.
https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev


GOROman commented Jan 13, 2025

If even that fails, give up and hope for better luck in your next life.


GOROman commented Jan 13, 2025


kinneko commented Jan 14, 2025

When called repeatedly, it tends to answer in Chinese.
Forcing the language in the prompt got it to output Japanese.

prompt = "この画像を詳細に説明してください。日本語で応答してください。"
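
To apply the same trick to any prompt, the language instruction can be appended in one place. This is a minimal sketch; `with_language_instruction` is a hypothetical helper, and the default instruction is simply the sentence from the prompt above ("Please respond in Japanese.").

```python
def with_language_instruction(prompt, instruction="日本語で応答してください。"):
    """Append a response-language instruction unless the prompt already contains it."""
    prompt = prompt.strip()
    if instruction in prompt:
        return prompt
    return f"{prompt}{instruction}"


# "Describe this image in detail." plus the Japanese-response instruction.
prompt = with_language_instruction("この画像を詳細に説明してください。")
```

The resulting `prompt` can then be passed to `apply_chat_template` exactly as in the original script.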


GOROman commented Jan 15, 2025

# /// script
# requires-python = "==3.12"
# dependencies = ["mlx_vlm"]
# ///

I added the block above to the top of the script, so it now runs with uv run mlx_vlm_test.py.


GOROman commented Jan 15, 2025

640x640 resized version (runs faster)

[Image: yellow-hage.jpg]


GOROman commented Jan 15, 2025

Pinning mlx to 0.21.0 works:

dependencies = ["mlx==0.21.0", "mlx_vlm"]
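
To confirm which mlx version actually ended up in the environment, the installed package metadata can be queried from the standard library. This is a small sketch; the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True if `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


print("mlx:", installed_version("mlx"))
print("pinned to 0.21.0:", matches_pin("mlx", "0.21.0"))
```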
