1. After we load the original model with `vllm_model = vllm_get_model(vllm_config=vllm_config_for_load)`, `vllm_model` can be inspected with:

   ```python
   for idx, m in enumerate(vllm_model.named_modules()):
       print(idx, '->', m)
   ```

   The full dump is at https://gist.github.com/vanbasten23/56a5cf844c0a527453a37af36efd3193
2. After replacing the layers with LoRA layers (via `load_lora_model`), the same loop prints:

   ```
   0 -> ('', _VllmRunner(
     (vllm_model): Qwen2ForCausalLM(
       (model): Qwen2Model(
         (embed_tokens): VocabParallelEmbedding(num_embeddings=151936, embedding_dim=2048, org_vocab_size=151936, num_embeddings_padded=151936, tp_size=1)
         (layers): ModuleList(
           (0-35): 36 x Qwen2DecoderLayer(
             (self_attn): Qwen2Attention(
               (qkv_proj): MergedQKVParallelLinearWithLoRA(
                 (base_layer): JaxQKVParallelLinear()
               )
   ```
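
For context, the replacement that `load_lora_model` performs follows the standard LoRA pattern: the original projection survives as `base_layer`, and a low-rank update is applied on top of it. Below is a minimal, self-contained sketch of that idea; it is not vLLM's implementation, and the class and parameter names are illustrative only.

```python
import torch
import torch.nn as nn


class LinearWithLoRA(nn.Module):
    """Illustrative LoRA wrapper (hypothetical class, not vLLM's):
    keeps the original projection as `base_layer` and adds a scaled
    low-rank A/B correction, mirroring the `(base_layer): ...` nesting
    visible in the module dump above."""

    def __init__(self, base_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base_layer = base_layer  # original projection, typically frozen
        self.lora_a = nn.Linear(base_layer.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base_layer.out_features, bias=False)
        self.scaling = alpha / rank
        nn.init.zeros_(self.lora_b.weight)  # B starts at zero, so the wrapper is initially a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output + scaled low-rank correction
        return self.base_layer(x) + self.scaling * self.lora_b(self.lora_a(x))


# In-place module replacement, analogous to what the dump shows for qkv_proj:
layer = nn.Linear(2048, 2560)
wrapped = LinearWithLoRA(layer)
print(wrapped)  # shows the same (base_layer): ... nesting as the dump above
```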