Created August 5, 2025 17:18
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
INFO 08-05 17:06:45 [__init__.py:241] Automatically detected platform tpu.
INFO 08-05 17:06:45 [tpu.py:202] tpu_commons not found, using vLLM's TpuPlatform
INFO 08-05 17:06:46 [utils.py:326] non-default args: {'model': 'Qwen/Qwen2-1.5B-Instruct', 'max_model_len': 128, 'max_num_batched_tokens': 64, 'max_num_seqs': 4, 'disable_log_stats': True}
INFO 08-05 17:06:52 [config.py:726] Resolved architecture: Qwen2ForCausalLM
INFO 08-05 17:06:52 [config.py:1759] Using max model len 128
INFO 08-05 17:06:52 [config.py:2588] Chunked prefill is enabled with max_num_batched_tokens=64.
INFO 08-05 17:06:52 [tpu.py:112] [TPU] Forcing DYNAMO_ONCE compilation level
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [core.py:619] Waiting for init message from front-end.
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [core.py:71] Initializing a V1 LLM engine (v0.8.5.dev2456+g309c1bb82) with config: model='Qwen/Qwen2-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=128, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=None, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"openxla","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":8,"local_cache_dir":null}
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [importing.py:43] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [tpu_worker.py:332] tpu_commons not found, using vLLM's TPUWorker.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_0 pid=721878) INFO 08-05 17:06:53 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=721878) WARNING 08-05 17:07:22 [tpu.py:165] Pin memory is not supported on TPU.
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1880] Using exponential token paddings:
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1882] 16
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1882] 32
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1882] 64
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1846] Preparing request paddings:
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1853] 8
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu_model_runner.py:1228] Loading model from scratch...
(EngineCore_0 pid=721878) xw32 Qwen2ForCausalLM.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Model.__init__
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu.py:53] Cannot use None backend on TPU.
(EngineCore_0 pid=721878) INFO 08-05 17:07:22 [tpu.py:57] Using Pallas V1 backend.
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.__init__
(EngineCore_0 pid=721878) xw32 Qwen2Attention.__init__
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 Qwen2MLP.__init__
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
(EngineCore_0 pid=721878) xw32 RowParallelLinear.__init__
(EngineCore_0 pid=721878) xw32 LinearBase.__init__
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.create_weights
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 LinearBase.__init__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.create_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.load_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:22 [weight_utils.py:296] Using model weights format ['*.safetensors'] | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:22 [weight_utils.py:349] No model.safetensors.index.json found in remote. | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] | |
2025-08-05 17:07:22.654488: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.420394: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.593327: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.772493: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.792158: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.955217: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:23.975176: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:24.138134: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:24.158149: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.load_weights | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.23s/it] | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 MergedColumnParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 QKVParallelLinear.weight_loader | |
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 MergedColumnParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 RowParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) xw32 QKVParallelLinear.weight_loader
(EngineCore_0 pid=721878) INFO 08-05 17:07:24 [default_loader.py:262] Loading weights took 2.33 seconds
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.process_weights_after_loading
(EngineCore_0 pid=721878) xw32 Qwen2ForCausalLM.forward
(EngineCore_0 pid=721878) xw32 Qwen2Model.forward
(EngineCore_0 pid=721878) xw32 Qwen2Model.get_input_embeddings
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2DecoderLayer.forward
(EngineCore_0 pid=721878) xw32 Qwen2Attention.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 Qwen2MLP.forward
(EngineCore_0 pid=721878) xw32 ColumnParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
(EngineCore_0 pid=721878) xw32 RowParallelLinear.forward
(EngineCore_0 pid=721878) xw32 UnquantizedLinearMethod.apply
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
(EngineCore_0 pid=721878) INFO 08-05 17:07:25 [kv_cache_utils.py:829] GPU KV cache size: 942,832 tokens
(EngineCore_0 pid=721878) INFO 08-05 17:07:25 [kv_cache_utils.py:833] Maximum concurrency for 128 tokens per request: 7365.88x
(EngineCore_0 pid=721878) INFO 08-05 17:07:26 [tpu_model_runner.py:1400] Compiling the model with different input shapes.
(EngineCore_0 pid=721878) INFO 08-05 17:07:26 [tpu_model_runner.py:1403] -- num_tokens: 16
(EngineCore_0 pid=721878) /home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/jax/_src/cloud_tpu_init.py:84: UserWarning: Transparent hugepages are not enabled. TPU runtime startup and shutdown time should be significantly improved on TPU v5e and newer. If not already set, you may need to enable transparent hugepages in your VM image (sudo sh -c "echo always > /sys/kernel/mm/transparent_hugepage/enabled")
(EngineCore_0 pid=721878) warnings.warn(
(EngineCore_0 pid=721878) WARNING:root:simplified_key(('bfloat16', 'bfloat16', 12, 2, 128, 16, 16, 128)) is not in ragged attention kernel's tuning table!, the key before simpilification is (dtype(bfloat16), dtype(bfloat16), 12, 2, 128, 16, 16, 8)
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.get_input_embeddings | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
(EngineCore_0 pid=721878) INFO 08-05 17:07:27 [tpu_model_runner.py:1403] -- num_tokens: 32
(EngineCore_0 pid=721878) WARNING:root:simplified_key(('bfloat16', 'bfloat16', 12, 2, 128, 16, 32, 128)) is not in ragged attention kernel's tuning table!, the key before simpilification is (dtype(bfloat16), dtype(bfloat16), 12, 2, 128, 16, 32, 8)
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.get_input_embeddings | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1403] -- num_tokens: 64 | |
[1;36m(EngineCore_0 pid=721878)[0;0m WARNING:root:simplified_key(('bfloat16', 'bfloat16', 12, 2, 128, 16, 64, 128)) is not in ragged attention kernel's tuning table!, the key before simpilification is (dtype(bfloat16), dtype(bfloat16), 12, 2, 128, 16, 64, 8) | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.get_input_embeddings | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1411] Compilation finished in 1.51 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1417] Compiling select_hidden_states with different input shapes. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1432] -- num_tokens: 16, num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1432] -- num_tokens: 32, num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1432] -- num_tokens: 64, num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1440] Compilation finished in 0.00 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1444] Compiling compute_logits with different input shapes. | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.compute_logits | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1453] -- num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1456] Compilation finished in 0.00 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1460] Compiling structured_decoding with different input shapes. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1477] -- num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1480] Compilation finished in 0.01 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:27 [tpu_model_runner.py:1484] Compiling sample_from_logits with different input shapes. | |
2025-08-05 17:07:36.785618: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
2025-08-05 17:07:43.623479: W torch_xla/csrc/runtime/pjrt_computation_client.cpp:691] Failed to deserialize executable: UNIMPLEMENTED: Deserializing serialized executable not supported. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [tpu_model_runner.py:1508] -- num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [tpu_model_runner.py:1511] Compilation finished in 16.00 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [tpu_model_runner.py:1515] Compiling gather_logprobs with different input shapes. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [tpu_model_runner.py:1526] -- num_seqs: 8 | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [tpu_model_runner.py:1529] Compilation finished in 0.00 [secs]. | |
[1;36m(EngineCore_0 pid=721878)[0;0m INFO 08-05 17:07:43 [core.py:199] init engine (profile, create kv cache, warmup model) took 18.84 seconds | |
INFO 08-05 17:07:44 [llm.py:290] Supported_tasks: ['generate'] | |
Adding requests: 0%| | 0/3 [00:00<?, ?it/s] | |
Adding requests: 100%|██████████| 3/3 [00:00<00:00, 384.07it/s] | |
Processed prompts: 0%| | 0/3 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s] | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Model.get_input_embeddings | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2DecoderLayer.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2Attention.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2MLP.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 ColumnParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 RowParallelLinear.forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 UnquantizedLinearMethod.apply | |
[1;36m(EngineCore_0 pid=721878)[0;0m xw32 Qwen2ForCausalLM.compute_logits | |
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.8.5.dev2456+g309c1bb82) with config: model='Qwen/Qwen2-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=128, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=xla:0, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"openxla","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":8,"local_cache_dir":null},
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=0,prompt_token_ids_len=9,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1],),num_computed_tokens=0,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_from_preemption=[], new_token_ids=[], new_block_ids=[], num_computed_tokens=[]), num_scheduled_tokens={0: 9}, total_num_scheduled_tokens=9, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[1], finished_req_ids=[], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] EngineCore encountered a fatal error.
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] Traceback (most recent call last):
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 676, in run_engine_core
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] engine_core.run_busy_loop()
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 703, in run_busy_loop
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] self._process_engine_step()
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 728, in _process_engine_step
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 273, in step
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 259, in execute_model_with_error_logging
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] raise err
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 250, in execute_model_with_error_logging
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return model_fn(scheduler_output)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/executor/abstract.py", line 87, in execute_model
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] output = self.collective_rpc("execute_model",
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/utils/__init__.py", line 2948, in run_method
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return func(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/worker/tpu_worker.py", line 246, in execute_model
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] output = self.model_runner.execute_model(scheduler_output)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return func(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1058, in execute_model
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] selected_token_ids = self.sample_from_logits_func(
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 804, in compile_wrapper
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return fn(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1719, in sample_from_logits
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] def sample_from_logits(
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1005, in _fn
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return fn(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1124, in forward
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return compiled_fn(full_args)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] all_outs = call_func_at_runtime_with_args(
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] out = normalize_as_list(f(args))
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 723, in inner_fn
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] outs = compiled_fn(args)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 525, in wrapper
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return compiled_fn(runtime_args)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 103, in g
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return f(*args)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/backends/torchxla.py", line 39, in fwd
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return compiled_graph(*args)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in __call__
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] raise e
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return self._call_impl(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] return forward_call(*args, **kwargs)
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "<eval_with_key>.5", line 5, in forward
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] optimized_mod = torch_xla__dynamo_dynamo_bridge_optimized_mod(arg0_1); arg0_1 = None
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch_xla/_dynamo/dynamo_bridge.py", line 583, in optimized_mod
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] torch_xla._XLAC._xla_sync_multi(
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] ValueError: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 53.25G of 31.25G hbm. Exceeded hbm capacity by 22.00G.
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685]
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] Total hbm usage >= 53.50G:
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] reserved 260.00M
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] program 25.19G
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] arguments 28.06G
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685]
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] Output size 2.32M; shares 0B with arguments.
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685]
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] Program hbm requirement 25.19G:
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685] HLO temp 25.19G (100.0% utilization: Unpadded (25.18G) Padded (25.18G), 0.1% fragmentation (16.28M))
(EngineCore_0 pid=721878) ERROR 08-05 17:07:46 [core.py:685]
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Largest program allocations in hbm: | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 1. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.368 = copy(p8.86) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 2. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.395 = copy(p258.6709) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 3. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.394 = copy(p249.6464) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 4. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.393 = copy(p240.6219) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 5. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.392 = copy(p231.5974) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 6. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.391 = copy(p222.5729) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 7. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.390 = copy(p213.5484) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 8. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.389 = copy(p204.5239) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 9. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.388 = copy(p195.4994) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 10. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.387 = copy(p186.4749) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 11. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.386 = copy(p177.4504) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 12. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.385 = copy(p168.4259) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 13. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.384 = copy(p159.4014) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 14. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.383 = copy(p150.3769) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 15. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.382 = copy(p141.3524) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 16. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.381 = copy(p132.3279) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 17. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.380 = copy(p123.3034) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 18. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.379 = copy(p114.2789) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 19. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.378 = copy(p105.2544) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] 20. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] XLA label: copy.377 = copy(p96.2299) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ERROR 08-05 17:07:46 [core.py:685] | |
Traceback (most recent call last): | |
File "/home/xiowei/vllm/examples/offline_inference/tpu.py", line 58, in <module> | |
main() | |
File "/home/xiowei/vllm/examples/offline_inference/tpu.py", line 47, in main | |
outputs = llm.generate(prompts, sampling_params) | |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
File "/home/xiowei/vllm/vllm/utils/__init__.py", line 1520, in inner | |
return fn(*args, **kwargs) | |
^^^^^^^^^^^^^^^^^^^ | |
File "/home/xiowei/vllm/vllm/entrypoints/llm.py", line 489, in generate | |
outputs = self._run_engine(use_tqdm=use_tqdm) | |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
File "/home/xiowei/vllm/vllm/entrypoints/llm.py", line 1693, in _run_engine | |
step_outputs = self.llm_engine.step() | |
^^^^^^^^^^^^^^^^^^^^^^ | |
File "/home/xiowei/vllm/vllm/v1/engine/llm_engine.py", line 241, in step | |
outputs = self.engine_core.get_output() | |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
File "/home/xiowei/vllm/vllm/v1/engine/core_client.py", line 634, in get_output | |
raise self._format_exception(outputs) from None | |
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause. | |
[1;36m(EngineCore_0 pid=721878)[0;0m Process EngineCore_0: | |
[1;36m(EngineCore_0 pid=721878)[0;0m Traceback (most recent call last): | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap | |
[1;36m(EngineCore_0 pid=721878)[0;0m self.run() | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/multiprocessing/process.py", line 108, in run | |
[1;36m(EngineCore_0 pid=721878)[0;0m self._target(*self._args, **self._kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 687, in run_engine_core | |
[1;36m(EngineCore_0 pid=721878)[0;0m raise e | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 676, in run_engine_core | |
[1;36m(EngineCore_0 pid=721878)[0;0m engine_core.run_busy_loop() | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 703, in run_busy_loop | |
[1;36m(EngineCore_0 pid=721878)[0;0m self._process_engine_step() | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 728, in _process_engine_step | |
[1;36m(EngineCore_0 pid=721878)[0;0m outputs, model_executed = self.step_fn() | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 273, in step | |
[1;36m(EngineCore_0 pid=721878)[0;0m model_output = self.execute_model_with_error_logging( | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 259, in execute_model_with_error_logging | |
[1;36m(EngineCore_0 pid=721878)[0;0m raise err | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/engine/core.py", line 250, in execute_model_with_error_logging | |
[1;36m(EngineCore_0 pid=721878)[0;0m return model_fn(scheduler_output) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/executor/abstract.py", line 87, in execute_model | |
[1;36m(EngineCore_0 pid=721878)[0;0m output = self.collective_rpc("execute_model", | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc | |
[1;36m(EngineCore_0 pid=721878)[0;0m answer = run_method(self.driver_worker, method, args, kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/utils/__init__.py", line 2948, in run_method | |
[1;36m(EngineCore_0 pid=721878)[0;0m return func(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/worker/tpu_worker.py", line 246, in execute_model | |
[1;36m(EngineCore_0 pid=721878)[0;0m output = self.model_runner.execute_model(scheduler_output) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context | |
[1;36m(EngineCore_0 pid=721878)[0;0m return func(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1058, in execute_model | |
[1;36m(EngineCore_0 pid=721878)[0;0m selected_token_ids = self.sample_from_logits_func( | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 804, in compile_wrapper | |
[1;36m(EngineCore_0 pid=721878)[0;0m return fn(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1719, in sample_from_logits | |
[1;36m(EngineCore_0 pid=721878)[0;0m def sample_from_logits( | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1005, in _fn | |
[1;36m(EngineCore_0 pid=721878)[0;0m return fn(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1124, in forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m return compiled_fn(full_args) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper | |
[1;36m(EngineCore_0 pid=721878)[0;0m all_outs = call_func_at_runtime_with_args( | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args | |
[1;36m(EngineCore_0 pid=721878)[0;0m out = normalize_as_list(f(args)) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 723, in inner_fn | |
[1;36m(EngineCore_0 pid=721878)[0;0m outs = compiled_fn(args) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 525, in wrapper | |
[1;36m(EngineCore_0 pid=721878)[0;0m return compiled_fn(runtime_args) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 103, in g | |
[1;36m(EngineCore_0 pid=721878)[0;0m return f(*args) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/_dynamo/backends/torchxla.py", line 39, in fwd | |
[1;36m(EngineCore_0 pid=721878)[0;0m return compiled_graph(*args) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped | |
[1;36m(EngineCore_0 pid=721878)[0;0m return self._wrapped_call(self, *args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in __call__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m raise e | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__ | |
[1;36m(EngineCore_0 pid=721878)[0;0m return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl | |
[1;36m(EngineCore_0 pid=721878)[0;0m return self._call_impl(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl | |
[1;36m(EngineCore_0 pid=721878)[0;0m return forward_call(*args, **kwargs) | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "<eval_with_key>.5", line 5, in forward | |
[1;36m(EngineCore_0 pid=721878)[0;0m optimized_mod = torch_xla__dynamo_dynamo_bridge_optimized_mod(arg0_1); arg0_1 = None | |
[1;36m(EngineCore_0 pid=721878)[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
[1;36m(EngineCore_0 pid=721878)[0;0m File "/home/xiowei/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch_xla/_dynamo/dynamo_bridge.py", line 583, in optimized_mod | |
[1;36m(EngineCore_0 pid=721878)[0;0m torch_xla._XLAC._xla_sync_multi( | |
[1;36m(EngineCore_0 pid=721878)[0;0m ValueError: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 53.25G of 31.25G hbm. Exceeded hbm capacity by 22.00G. | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m Total hbm usage >= 53.50G: | |
[1;36m(EngineCore_0 pid=721878)[0;0m reserved 260.00M | |
[1;36m(EngineCore_0 pid=721878)[0;0m program 25.19G | |
[1;36m(EngineCore_0 pid=721878)[0;0m arguments 28.06G | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m Output size 2.32M; shares 0B with arguments. | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m Program hbm requirement 25.19G: | |
[1;36m(EngineCore_0 pid=721878)[0;0m HLO temp 25.19G (100.0% utilization: Unpadded (25.18G) Padded (25.18G), 0.1% fragmentation (16.28M)) | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m Largest program allocations in hbm: | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 1. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.368 = copy(p8.86) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 2. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.395 = copy(p258.6709) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 3. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.394 = copy(p249.6464) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 4. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.393 = copy(p240.6219) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 5. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.392 = copy(p231.5974) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 6. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.391 = copy(p222.5729) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 7. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.390 = copy(p213.5484) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 8. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.389 = copy(p204.5239) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 9. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.388 = copy(p195.4994) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 10. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.387 = copy(p186.4749) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 11. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.386 = copy(p177.4504) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 12. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.385 = copy(p168.4259) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 13. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.384 = copy(p159.4014) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 14. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.383 = copy(p150.3769) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 15. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.382 = copy(p141.3524) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 16. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.381 = copy(p132.3279) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 17. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.380 = copy(p123.3034) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 18. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.379 = copy(p114.2789) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 19. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.378 = copy(p105.2544) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m 20. Size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m Shape: bf16[58927,16,4,128]{3,2,1,0:T(4,128)(2,1)} | |
[1;36m(EngineCore_0 pid=721878)[0;0m Unpadded size: 920.73M | |
[1;36m(EngineCore_0 pid=721878)[0;0m XLA label: copy.377 = copy(p96.2299) | |
[1;36m(EngineCore_0 pid=721878)[0;0m Allocation type: HLO temp | |
[1;36m(EngineCore_0 pid=721878)[0;0m ========================== | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
[1;36m(EngineCore_0 pid=721878)[0;0m | |
Processed prompts: 0%| | 0/3 [00:19<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s] |