1. After we load the original model with `vllm_model = vllm_get_model(vllm_config=vllm_config_for_load)`, `vllm_model` looks as follows when dumped with

   ```python
   for idx, m in enumerate(vllm_model.named_modules()):
       print(idx, '->', m)
   ```

   https://gist.github.com/vanbasten23/56a5cf844c0a527453a37af36efd3193

2. After we replace the layers with LoRA layers (via `load_lora_model`), the same dump of the model gives

   https://gist.github.com/vanbasten23/fc5ab730ea88d60605057b903a0570ea

3. Before we run `shard_model_to_tpu`, the same dump of `self.model` gives

   https://gist.github.com/vanbasten23/dab6a90283c905882647e8aa5d0b9ca1

4. After we run `shard_model_to_tpu`, the same dump of `self.model` gives

   https://gist.github.com/vanbasten23/d5b1f15645f3d358b4224927f1bdcbce

   (A small helper sketch for diffing these four snapshots follows this list.)
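To compare these snapshots without eyeballing four gists, a minimal diff helper can be used. This is a sketch, not part of the original flow; the names `module_types` and `diff_module_types` are hypothetical:

```python
# Hypothetical helpers (not from the gists above): snapshot each
# submodule's class name, then report which modules changed class
# between two stages (e.g. before/after load_lora_model).
def module_types(model):
    return {name: type(mod).__name__ for name, mod in model.named_modules()}

def diff_module_types(before, after):
    # Print only the modules whose class changed between the snapshots.
    for name, cls in before.items():
        if name in after and after[name] != cls:
            print(f"{name}: {cls} -> {after[name]}")
```

For example, snapshotting the model before and after `load_lora_model` and diffing the two dicts surfaces exactly the replacements listed in conclusion (1) below.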
Conclusion:
1. Comparing (1) with (2), we replace (see the wrapper sketch after this list):
   - `QKVParallelLinear` with `MergedQKVParallelLinearWithLoRA`
   - `MergedColumnParallelLinear` with `MergedColumnParallelLinearWithLoRA`
2. Comparing (3) with (4), we change the `base_layer` part and replace (see the swap sketch after this list):
   - `QKVParallelLinear` with `JaxQKVParallelLinear`
   - `RowParallelLinear` with `JaxRowParallelLinear`
   - `Attention` with `JaxAttention`
   - `MergedColumnParallelLinear` with `JaxMergedColumnParallelLinear`
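The `*WithLoRA` layers in (2) follow the usual LoRA wrapper pattern: the original module is kept as a `base_layer` attribute and a low-rank update is added on top. Below is a minimal sketch of that pattern, assuming a plain `nn.Linear` base; it is illustrative only, not vLLM's actual `MergedQKVParallelLinearWithLoRA`:

```python
import torch
import torch.nn as nn

# Minimal sketch of the LoRA wrapper pattern (illustrative, not vLLM's
# implementation). Keeping the original module as `base_layer` is what
# lets shard_model_to_tpu later swap the base without touching the
# LoRA weights.
class LinearWithLoRA(nn.Module):
    def __init__(self, base_layer: nn.Linear, rank: int = 8):
        super().__init__()
        self.base_layer = base_layer  # original, typically frozen, weights
        self.lora_a = nn.Parameter(torch.zeros(rank, base_layer.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base_layer.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + B (A x): the base output plus the low-rank update
        return self.base_layer(x) + x @ self.lora_a.T @ self.lora_b.T
```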
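Likewise, the (3) to (4) change can be read as an in-place swap of each wrapper's `base_layer` for a JAX-backed twin (the `Attention` to `JaxAttention` swap presumably happens directly on the parent module, since `Attention` is not LoRA-wrapped). The traversal below is a guess at what `shard_model_to_tpu` effectively does; `swap_base_layers` and the factory-style constructors are hypothetical:

```python
# Hypothetical sketch of the base_layer swap implied by gists (3)/(4);
# not the actual shard_model_to_tpu implementation.
def swap_base_layers(model, replacements):
    """replacements maps an original layer class to a factory that
    builds its JAX-backed replacement from the existing layer."""
    for module in model.modules():
        base = getattr(module, "base_layer", None)
        if base is not None and type(base) in replacements:
            module.base_layer = replacements[type(base)](base)

# Assumed usage (constructors taking the old layer are an assumption):
# swap_base_layers(self.model, {
#     QKVParallelLinear: JaxQKVParallelLinear,
#     RowParallelLinear: JaxRowParallelLinear,
#     MergedColumnParallelLinear: JaxMergedColumnParallelLinear,
# })
```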