1. After we load the original model with `vllm_model = vllm_get_model(vllm_config=vllm_config_for_load)`, `vllm_model` looks like this (dumped via the snippet below):

       for idx, m in enumerate(vllm_model.named_modules()):
           print(idx, '->', m)

   https://gist.github.com/vanbasten23/56a5cf844c0a527453a37af36efd3193

2. After replacing the layers with LoRA layers (via `load_lora_model`), the model looks like the dumps that follow, printed the same way (a sketch of the whole flow comes right after this list):

       for idx, m in enumerate(vllm_model.named_modules()):
           print(idx, '->', m)
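A minimal sketch of the whole load-then-replace-then-inspect flow, for reference. It assumes `vllm_get_model` is an alias for vLLM's model-loader `get_model` and that `load_lora_model` wraps the freshly loaded model; the exact import path and signature vary between vLLM versions, so treat this as an outline rather than the actual implementation:

```python
# Sketch only: the import alias and the load_lora_model signature are assumptions.
from vllm.model_executor.model_loader import get_model as vllm_get_model


def dump_module_tree(model) -> None:
    """Print every (name, module) pair, matching the dumps in this gist."""
    for idx, m in enumerate(model.named_modules()):
        print(idx, '->', m)


def inspect_lora_swap(vllm_config_for_load, load_lora_model):
    """Load the model, dump it, apply the LoRA replacement, dump it again."""
    # 1. Load the original (non-LoRA) model.
    vllm_model = vllm_get_model(vllm_config=vllm_config_for_load)
    dump_module_tree(vllm_model)

    # 2. Swap the target linear layers for their *WithLoRA wrappers.
    #    Placeholder call: the real load_lora_model in the model runner also
    #    takes the model/scheduler/LoRA configs and a device argument.
    vllm_model = load_lora_model(vllm_model)
    dump_module_tree(vllm_model)
    return vllm_model
```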
0 -> ('', _VllmRunner(
  (vllm_model): Qwen2ForCausalLM(
    (model): Qwen2Model(
      (embed_tokens): VocabParallelEmbedding(num_embeddings=151936, embedding_dim=2048, org_vocab_size=151936, num_embeddings_padded=151936, tp_size=1)
      (layers): ModuleList(
        (0-35): 36 x Qwen2DecoderLayer(
          (self_attn): Qwen2Attention(
            (qkv_proj): MergedQKVParallelLinearWithLoRA(
              (base_layer): JaxQKVParallelLinear()
            )
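A quick way to verify a replacement like the dump above is to walk `named_modules()` and look at the `base_layer` attribute that the LoRA wrappers expose; this also shows which implementation sits underneath (`JaxQKVParallelLinear` on the JAX path versus the stock `QKVParallelLinear`). A rough sketch:

```python
# Sketch: report which qkv projections were wrapped and what sits underneath.
for name, module in vllm_model.named_modules():
    if name.endswith("qkv_proj"):
        base = getattr(module, "base_layer", None)  # LoRA wrappers keep the original layer here
        print(name,
              type(module).__name__,                # e.g. MergedQKVParallelLinearWithLoRA
              type(base).__name__ if base is not None else "no base_layer")
```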
0 -> ('', _VllmRunner(
  (vllm_model): Qwen2ForCausalLM(
    (model): Qwen2Model(
      (embed_tokens): VocabParallelEmbedding(num_embeddings=151936, embedding_dim=2048, org_vocab_size=151936, num_embeddings_padded=151936, tp_size=1)
      (layers): ModuleList(
        (0-35): 36 x Qwen2DecoderLayer(
          (self_attn): Qwen2Attention(
            (qkv_proj): MergedQKVParallelLinearWithLoRA(
              (base_layer): QKVParallelLinear(in_features=2048, output_features=2560, bias=True, tp_size=1, gather_output=False)
            )
0 -> ('', Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): VocabParallelEmbedding(num_embeddings=151936, embedding_dim=2048, org_vocab_size=151936, num_embeddings_padded=151936, tp_size=1)
    (layers): ModuleList(
      (0-35): 36 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (qkv_proj): MergedQKVParallelLinearWithLoRA(
            (base_layer): QKVParallelLinear(in_features=2048, output_features=2560, bias=True, tp_size=1, gather_output=False)
          )
          (o_proj): RowParallelLinearWithLoRA(
0 -> ('', Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): VocabParallelEmbedding(num_embeddings=151936, embedding_dim=2048, org_vocab_size=151936, num_embeddings_padded=151936, tp_size=1)
    (layers): ModuleList(
      (0-35): 36 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (qkv_proj): QKVParallelLinear(in_features=2048, output_features=2560, bias=True, tp_size=1, gather_output=False)
          (o_proj): RowParallelLinear(input_features=2048, output_features=2048, bias=False, tp_size=1, reduce_results=True)
          (rotary_emb): RotaryEmbedding(head_size=128, rotary_dim=128, max_position_embeddings=32768, base=1000000.0, is_neox_style=True)
          (attn): Attention(head_size=128, num_heads=16, num_kv_heads=2, scale=0.08838834764831845, backend=PallasAttentionBackendImpl)
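The dump above shows the tree without any LoRA wrappers: `qkv_proj` and `o_proj` are plain `QKVParallelLinear` / `RowParallelLinear` layers with no `base_layer` underneath. A small, hedged check to tell the two states apart programmatically:

```python
# Sketch: count the modules whose class name marks them as LoRA wrappers.
lora_wrapped = [name for name, m in vllm_model.named_modules()
                if type(m).__name__.endswith("WithLoRA")]
print(f"{len(lora_wrapped)} LoRA-wrapped modules")  # 0 before load_lora_model, >0 after
```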
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
INFO 08-05 17:06:45 [__init__.py:241] Automatically detected platform tpu.
INFO 08-05 17:06:45 [tpu.py:202] tpu_commons not found, using vLLM's TpuPlatform
INFO 08-05 17:06:46 [utils.py:326] non-default args: {'model': 'Qwen/Qwen2-1.5B-Instruct', 'max_model_len': 128, 'max_num_batched_tokens': 64, 'max_num_seqs': 4, 'disable_log_stats': True}
INFO 08-05 17:06:52 [config.py:726] Resolved architecture: Qwen2ForCausalLM
INFO 08-05 17:06:52 [config.py:1759] Using max model len 128
INFO 08-05 17:06:52 [config.py:2588] Chunked prefill is enabled with max_num_batched_tokens=64.
INFO 08-05 17:06:52 [tpu.py:112] [TPU] Forcing DYNAMO_ONCE compilation level
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [core.py:619] Waiting for init message from front-end.
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [core.py:71] Initializing a V1 LLM engine (v0.8.5.dev2456+g309c1bb82) with config: model='Qwen/Qwen2-1.5B-Instruct', speculati
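The `non-default args` line in this log corresponds to an engine constructed roughly as follows (a sketch based only on the arguments listed in the log; everything else is left at its default):

```python
from vllm import LLM

# Arguments taken from the "non-default args" log line above.
llm = LLM(
    model="Qwen/Qwen2-1.5B-Instruct",
    max_model_len=128,
    max_num_batched_tokens=64,
    max_num_seqs=4,
    disable_log_stats=True,
)
```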
/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/jax/_src/cloud_tpu_init.py:84: UserWarning: Transparent hugepages are not enabled. TPU runtime startup and shutdown time should be significantly improved on TPU v5e and newer. If not already set, you may need to enable transparent hugepages in your VM image (sudo sh -c "echo always > /sys/kernel/mm/transparent_hugepage/enabled")
  warnings.warn(
INFO 07-17 20:38:09 [__init__.py:244] Automatically detected platform tpu.
/mnt/disks/persist/vllm/vllm/platforms/tpu.py:202: UserWarning: 🚨 CAUTION: You are using 'tpu_commons', which is experimental and NOT intended for production use yet. Please see the README for more details.
  from tpu_commons.platforms import TpuPlatform as TpuCommonsPlatform
Running uLLM without Pathways. Module pathwaysutils is not imported.
INFO 07-17 20:38:23 [config.py:1467] Using max model len 1024
INFO 07-17 20:38:23 [config.py:2267] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 07-17 20:38:23 [tpu_jax.p
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "vllm",
            "type": "debugpy",
            "request": "launch",
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
INFO 07-16 22:24:48 [__init__.py:253] Automatically detected platform tpu.
INFO 07-16 22:24:48 [tpu.py:196] tpu_commons not found, using vLLM's TpuPlatform
/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================= test session starts ==============================
platform linux -- Python 3.12.11, pytest-8.3.3, pluggy-1.5.0 -- /h