olmOCR log
Created June 18, 2025 04:20
uv run python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf --model allenai/olmOCR-7B-0225-preview-FP8
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 21:14:55,004 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 21:14:55,004 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 21:14:55,004 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 552.46it/s]
2025-06-17 21:14:55,007 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 21:14:55,163 - __main__ - INFO - Starting pipeline with PID 3059549
2025-06-17 21:14:55,163 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-preview-FP8'
README.md: 100%|████████████████████████████████████████████████████████████████████████| 1.07k/1.07k [00:00<00:00, 20.1MB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████| 6.66k/6.66k [00:00<00:00, 109MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████| 215/215 [00:00<00:00, 3.16MB/s]
added_tokens.json: 100%|████████████████████████████████████████████████████████████████████| 392/392 [00:00<00:00, 3.83MB/s]
preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████| 347/347 [00:00<00:00, 7.13MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████| 73.5k/73.5k [00:00<00:00, 73.5MB/s]
recipe.yaml: 100%|██████████████████████████████████████████████████████████████████████████| 158/158 [00:00<00:00, 3.88MB/s]
.gitattributes: 100%|███████████████████████████████████████████████████████████████████| 1.57k/1.57k [00:00<00:00, 16.0MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████| 613/613 [00:00<00:00, 6.26MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████| 4.29k/4.29k [00:00<00:00, 66.6MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████████| 1.67M/1.67M [00:00<00:00, 3.43MB/s]
vocab.json: 100%|███████████████████████████████████████████████████████████████████████| 2.78M/2.78M [00:00<00:00, 11.1MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████| 11.4M/11.4M [00:04<00:00, 2.47MB/s]
model-00003-of-00003.safetensors: 100%|█████████████████████████████████████████████████| 1.09G/1.09G [01:18<00:00, 13.9MB/s]
model-00002-of-00003.safetensors: 100%|█████████████████████████████████████████████████| 4.03G/4.03G [01:20<00:00, 50.3MB/s]
model-00001-of-00003.safetensors: 100%|█████████████████████████████████████████████████| 4.94G/4.94G [01:20<00:00, 61.3MB/s]
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████████████| 16/16 [01:20<00:00, 5.05s/it]
INFO:olmocr.work_queue:Initialized local queue with 1 work items
2025-06-17 21:16:16,093 - __main__ - WARNING - Attempt 1: Please wait for vllm server to become ready...
2025-06-17 21:16:17,106 - __main__ - WARNING - Attempt 2: Please wait for vllm server to become ready...
2025-06-17 21:16:18,042 - __main__ - INFO - INFO 06-17 21:16:18 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 21:16:18,121 - __main__ - WARNING - Attempt 3: Please wait for vllm server to become ready...
2025-06-17 21:16:19,136 - __main__ - WARNING - Attempt 4: Please wait for vllm server to become ready...
2025-06-17 21:16:20,151 - __main__ - WARNING - Attempt 5: Please wait for vllm server to become ready...
2025-06-17 21:16:21,165 - __main__ - WARNING - Attempt 6: Please wait for vllm server to become ready...
2025-06-17 21:16:22,178 - __main__ - WARNING - Attempt 7: Please wait for vllm server to become ready...
2025-06-17 21:16:22,555 - __main__ - INFO - INFO 06-17 21:16:22 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-17 21:16:23,132 - __main__ - INFO - INFO 06-17 21:16:23 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview-FP8', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-17 21:16:23,191 - __main__ - WARNING - Attempt 8: Please wait for vllm server to become ready...
2025-06-17 21:16:24,206 - __main__ - WARNING - Attempt 9: Please wait for vllm server to become ready...
2025-06-17 21:16:25,219 - __main__ - WARNING - Attempt 10: Please wait for vllm server to become ready...
2025-06-17 21:16:26,233 - __main__ - WARNING - Attempt 11: Please wait for vllm server to become ready...
2025-06-17 21:16:27,247 - __main__ - WARNING - Attempt 12: Please wait for vllm server to become ready...
2025-06-17 21:16:28,263 - __main__ - WARNING - Attempt 13: Please wait for vllm server to become ready...
2025-06-17 21:16:28,623 - __main__ - INFO - INFO 06-17 21:16:28 [config.py:823] This model supports multiple tasks: {'reward', 'embed', 'generate', 'score', 'classify'}. Defaulting to 'generate'.
2025-06-17 21:16:28,787 - __main__ - INFO - INFO 06-17 21:16:28 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-17 21:16:29,276 - __main__ - WARNING - Attempt 14: Please wait for vllm server to become ready...
2025-06-17 21:16:30,085 - __main__ - INFO - WARNING 06-17 21:16:30 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-17 21:16:30,290 - __main__ - WARNING - Attempt 15: Please wait for vllm server to become ready...
2025-06-17 21:16:31,222 - __main__ - INFO - INFO 06-17 21:16:31 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 21:16:31,303 - __main__ - WARNING - Attempt 16: Please wait for vllm server to become ready...
2025-06-17 21:16:32,320 - __main__ - WARNING - Attempt 17: Please wait for vllm server to become ready...
2025-06-17 21:16:33,334 - __main__ - WARNING - Attempt 18: Please wait for vllm server to become ready...
2025-06-17 21:16:33,482 - __main__ - INFO - INFO 06-17 21:16:33 [core.py:455] Waiting for init message from front-end.
2025-06-17 21:16:33,488 - __main__ - INFO - INFO 06-17 21:16:33 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview-FP8', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-17 21:16:33,581 - __main__ - INFO - WARNING 06-17 21:16:33 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f9b740d89d0>
2025-06-17 21:16:33,994 - __main__ - INFO - INFO 06-17 21:16:33 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-06-17 21:16:34,347 - __main__ - WARNING - Attempt 19: Please wait for vllm server to become ready...
2025-06-17 21:16:34,488 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-06-17 21:16:35,361 - __main__ - WARNING - Attempt 20: Please wait for vllm server to become ready...
2025-06-17 21:16:35,418 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
2025-06-17 21:16:36,374 - __main__ - WARNING - Attempt 21: Please wait for vllm server to become ready...
2025-06-17 21:16:37,390 - __main__ - WARNING - Attempt 22: Please wait for vllm server to become ready...
2025-06-17 21:16:38,229 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
2025-06-17 21:16:38,404 - __main__ - WARNING - Attempt 23: Please wait for vllm server to become ready...
2025-06-17 21:16:38,574 - __main__ - INFO - WARNING 06-17 21:16:38 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
2025-06-17 21:16:38,583 - __main__ - INFO - INFO 06-17 21:16:38 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview-FP8...
2025-06-17 21:16:38,767 - __main__ - INFO - INFO 06-17 21:16:38 [gpu_model_runner.py:1600] Loading model from scratch...
2025-06-17 21:16:38,801 - __main__ - INFO - WARNING 06-17 21:16:38 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
2025-06-17 21:16:38,832 - __main__ - INFO - INFO 06-17 21:16:38 [cuda.py:252] Using Flash Attention backend on V1 engine.
2025-06-17 21:16:39,061 - __main__ - INFO - INFO 06-17 21:16:39 [weight_utils.py:292] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
2025-06-17 21:16:39,418 - __main__ - WARNING - Attempt 24: Please wait for vllm server to become ready...
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:00<00:00, 2.69it/s]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:00<00:00, 3.79it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.74it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.87it/s]
2025-06-17 21:16:40,262 - __main__ - INFO -
2025-06-17 21:16:40,375 - __main__ - INFO - INFO 06-17 21:16:40 [default_loader.py:272] Loading weights took 1.16 seconds
2025-06-17 21:16:40,432 - __main__ - WARNING - Attempt 25: Please wait for vllm server to become ready...
2025-06-17 21:16:40,712 - __main__ - INFO - INFO 06-17 21:16:40 [gpu_model_runner.py:1624] Model loading took 9.4248 GiB and 1.619003 seconds
2025-06-17 21:16:41,446 - __main__ - WARNING - Attempt 26: Please wait for vllm server to become ready...
2025-06-17 21:16:42,043 - __main__ - INFO - INFO 06-17 21:16:42 [gpu_model_runner.py:1978] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
2025-06-17 21:16:42,459 - __main__ - WARNING - Attempt 27: Please wait for vllm server to become ready...
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515] EngineCore failed to start.
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515] Traceback (most recent call last):
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:16:42,713 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     self._initialize_kv_caches(vllm_config)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     self.model_runner.profile_run()
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:16:42,714 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     x = blk(
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]         ^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     x = x + self.attn(
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]             ^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return ApplyRotaryEmb.apply(
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     out = apply_rotary(
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]           ^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     rotary_kernel[grid](
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     ^^^^^^^^^^
2025-06-17 21:16:42,715 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:16:42,716 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     self._init_handles()
2025-06-17 21:16:42,716 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:16:42,716 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:16:42,716 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515]                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO - ERROR 06-17 21:16:42 [core.py:515] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:16:42,716 - __main__ - INFO - Process EngineCore_0:
2025-06-17 21:16:42,716 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 21:16:42,716 - __main__ - INFO -     self.run()
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 21:16:42,716 - __main__ - INFO -     self._target(*self._args, **self._kwargs)
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 21:16:42,716 - __main__ - INFO -     raise e
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:16:42,716 - __main__ - INFO -     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:16:42,716 - __main__ - INFO -                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:16:42,716 - __main__ - INFO -     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:16:42,716 - __main__ - INFO -     self._initialize_kv_caches(vllm_config)
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:16:42,716 - __main__ - INFO -     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:16:42,716 - __main__ - INFO -                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:16:42,716 - __main__ - INFO -     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:16:42,716 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:16:42,716 - __main__ - INFO -     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:16:42,716 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:16:42,716 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:16:42,716 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:16:42,716 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:16:42,716 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:16:42,716 - __main__ - INFO -     self.model_runner.profile_run()
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:16:42,716 - __main__ - INFO -     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:16:42,716 - __main__ - INFO -                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:16:42,716 - __main__ - INFO -     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:16:42,716 - __main__ - INFO -                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,716 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:16:42,716 - __main__ - INFO -     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:16:42,716 - __main__ - INFO -                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,717 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:16:42,717 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:16:42,717 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,717 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:16:42,717 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:16:42,717 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:16:42,717 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:16:42,717 - __main__ - INFO - x = blk( | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:16:42,717 - __main__ - INFO - return self._call_impl(*args, **kwargs) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:16:42,717 - __main__ - INFO - return forward_call(*args, **kwargs) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward | |
2025-06-17 21:16:42,717 - __main__ - INFO - x = x + self.attn( | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:16:42,717 - __main__ - INFO - return self._call_impl(*args, **kwargs) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:16:42,717 - __main__ - INFO - return forward_call(*args, **kwargs) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward | |
2025-06-17 21:16:42,717 - __main__ - INFO - q = apply_rotary_pos_emb_vision(q, rotary_pos_emb) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision | |
2025-06-17 21:16:42,717 - __main__ - INFO - output = apply_rotary_emb(t_, cos, sin).type_as(t) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb | |
2025-06-17 21:16:42,717 - __main__ - INFO - return ApplyRotaryEmb.apply( | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply | |
2025-06-17 21:16:42,717 - __main__ - INFO - return super().apply(*args, **kwargs) # type: ignore[misc] | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward | |
2025-06-17 21:16:42,717 - __main__ - INFO - out = apply_rotary( | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary | |
2025-06-17 21:16:42,717 - __main__ - INFO - rotary_kernel[grid]( | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda> | |
2025-06-17 21:16:42,717 - __main__ - INFO - return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run | |
2025-06-17 21:16:42,717 - __main__ - INFO - kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, | |
2025-06-17 21:16:42,717 - __main__ - INFO - ^^^^^^^^^^ | |
2025-06-17 21:16:42,717 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__ | |
2025-06-17 21:16:42,717 - __main__ - INFO - self._init_handles() | |
2025-06-17 21:16:42,718 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles | |
2025-06-17 21:16:42,718 - __main__ - INFO - self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary( | |
2025-06-17 21:16:42,718 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:42,718 - __main__ - INFO - SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats | |
2025-06-17 21:16:43,101 - __main__ - INFO - [rank0]:[W617 21:16:43.222408568 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) | |
2025-06-17 21:16:43,473 - __main__ - WARNING - Attempt 28: Please wait for vllm server to become ready... | |
2025-06-17 21:16:43,783 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module> | |
2025-06-17 21:16:43,783 - __main__ - INFO - sys.exit(main()) | |
2025-06-17 21:16:43,783 - __main__ - INFO - ^^^^^^ | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main | |
2025-06-17 21:16:43,783 - __main__ - INFO - args.dispatch_function(args) | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd | |
2025-06-17 21:16:43,783 - __main__ - INFO - uvloop.run(run_server(args)) | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run | |
2025-06-17 21:16:43,783 - __main__ - INFO - return runner.run(wrapper()) | |
2025-06-17 21:16:43,783 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run | |
2025-06-17 21:16:43,783 - __main__ - INFO - return self._loop.run_until_complete(task) | |
2025-06-17 21:16:43,783 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper | |
2025-06-17 21:16:43,783 - __main__ - INFO - return await main | |
2025-06-17 21:16:43,783 - __main__ - INFO - ^^^^^^^^^^ | |
2025-06-17 21:16:43,783 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server | |
2025-06-17 21:16:43,784 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker | |
2025-06-17 21:16:43,784 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client: | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 21:16:43,784 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client | |
2025-06-17 21:16:43,784 - __main__ - INFO - async with build_async_engine_client_from_engine_args( | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 21:16:43,784 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args | |
2025-06-17 21:16:43,784 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config( | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config | |
2025-06-17 21:16:43,784 - __main__ - INFO - return cls( | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__ | |
2025-06-17 21:16:43,784 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client( | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client | |
2025-06-17 21:16:43,784 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats, | |
2025-06-17 21:16:43,784 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__ | |
2025-06-17 21:16:43,784 - __main__ - INFO - super().__init__( | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__ | |
2025-06-17 21:16:43,784 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only, | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct | |
2025-06-17 21:16:43,784 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address, | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup | |
2025-06-17 21:16:43,784 - __main__ - INFO - wait_for_engine_startup( | |
2025-06-17 21:16:43,784 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup | |
2025-06-17 21:16:43,784 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. " | |
2025-06-17 21:16:43,784 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} | |
2025-06-17 21:16:44,487 - __main__ - WARNING - Attempt 29: Please wait for vllm server to become ready... | |
2025-06-17 21:16:44,625 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 21:16:45,502 - __main__ - WARNING - Attempt 30: Please wait for vllm server to become ready... | |
2025-06-17 21:16:46,515 - __main__ - WARNING - Attempt 31: Please wait for vllm server to become ready... | |
2025-06-17 21:16:46,567 - __main__ - INFO - INFO 06-17 21:16:46 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:16:47,530 - __main__ - WARNING - Attempt 32: Please wait for vllm server to become ready... | |
2025-06-17 21:16:48,544 - __main__ - WARNING - Attempt 33: Please wait for vllm server to become ready... | |
2025-06-17 21:16:49,559 - __main__ - WARNING - Attempt 34: Please wait for vllm server to become ready... | |
2025-06-17 21:16:50,574 - __main__ - WARNING - Attempt 35: Please wait for vllm server to become ready... | |
2025-06-17 21:16:51,055 - __main__ - INFO - INFO 06-17 21:16:51 [api_server.py:1287] vLLM API server version 0.9.1 | |
2025-06-17 21:16:51,587 - __main__ - WARNING - Attempt 36: Please wait for vllm server to become ready... | |
2025-06-17 21:16:51,609 - __main__ - INFO - INFO 06-17 21:16:51 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview-FP8', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True} | |
2025-06-17 21:16:52,603 - __main__ - WARNING - Attempt 37: Please wait for vllm server to become ready... | |
2025-06-17 21:16:53,617 - __main__ - WARNING - Attempt 38: Please wait for vllm server to become ready... | |
2025-06-17 21:16:54,630 - __main__ - WARNING - Attempt 39: Please wait for vllm server to become ready... | |
2025-06-17 21:16:55,643 - __main__ - WARNING - Attempt 40: Please wait for vllm server to become ready... | |
2025-06-17 21:16:56,656 - __main__ - WARNING - Attempt 41: Please wait for vllm server to become ready... | |
2025-06-17 21:16:56,976 - __main__ - INFO - INFO 06-17 21:16:56 [config.py:823] This model supports multiple tasks: {'score', 'generate', 'embed', 'reward', 'classify'}. Defaulting to 'generate'. | |
2025-06-17 21:16:57,102 - __main__ - INFO - INFO 06-17 21:16:57 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048. | |
2025-06-17 21:16:57,670 - __main__ - WARNING - Attempt 42: Please wait for vllm server to become ready... | |
2025-06-17 21:16:58,333 - __main__ - INFO - WARNING 06-17 21:16:58 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234 | |
2025-06-17 21:16:58,683 - __main__ - WARNING - Attempt 43: Please wait for vllm server to become ready... | |
2025-06-17 21:16:59,482 - __main__ - INFO - INFO 06-17 21:16:59 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:16:59,697 - __main__ - WARNING - Attempt 44: Please wait for vllm server to become ready... | |
2025-06-17 21:17:00,709 - __main__ - WARNING - Attempt 45: Please wait for vllm server to become ready... | |
2025-06-17 21:17:01,721 - __main__ - INFO - INFO 06-17 21:17:01 [core.py:455] Waiting for init message from front-end. | |
2025-06-17 21:17:01,721 - __main__ - WARNING - Attempt 46: Please wait for vllm server to become ready... | |
2025-06-17 21:17:01,721 - __main__ - INFO - INFO 06-17 21:17:01 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview-FP8', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} | |
2025-06-17 21:17:01,818 - __main__ - INFO - WARNING 06-17 21:17:01 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x736ac4f60890> | |
2025-06-17 21:17:02,253 - __main__ - INFO - INFO 06-17 21:17:02 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 | |
2025-06-17 21:17:02,734 - __main__ - WARNING - Attempt 47: Please wait for vllm server to become ready... | |
2025-06-17 21:17:02,870 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. | |
2025-06-17 21:17:03,725 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. | |
2025-06-17 21:17:03,747 - __main__ - WARNING - Attempt 48: Please wait for vllm server to become ready... | |
2025-06-17 21:17:04,760 - __main__ - WARNING - Attempt 49: Please wait for vllm server to become ready... | |
2025-06-17 21:17:05,774 - __main__ - WARNING - Attempt 50: Please wait for vllm server to become ready... | |
2025-06-17 21:17:06,555 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors. | |
2025-06-17 21:17:06,788 - __main__ - WARNING - Attempt 51: Please wait for vllm server to become ready... | |
2025-06-17 21:17:06,898 - __main__ - INFO - WARNING 06-17 21:17:06 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. | |
2025-06-17 21:17:06,906 - __main__ - INFO - INFO 06-17 21:17:06 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview-FP8... | |
2025-06-17 21:17:07,088 - __main__ - INFO - INFO 06-17 21:17:07 [gpu_model_runner.py:1600] Loading model from scratch... | |
2025-06-17 21:17:07,117 - __main__ - INFO - WARNING 06-17 21:17:07 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend. | |
2025-06-17 21:17:07,145 - __main__ - INFO - INFO 06-17 21:17:07 [cuda.py:252] Using Flash Attention backend on V1 engine. | |
2025-06-17 21:17:07,361 - __main__ - INFO - INFO 06-17 21:17:07 [weight_utils.py:292] Using model weights format ['*.safetensors'] | |
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] | |
2025-06-17 21:17:07,804 - __main__ - WARNING - Attempt 52: Please wait for vllm server to become ready... | |
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:00, 2.70it/s] | |
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:00<00:00, 3.71it/s] | |
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.71it/s] | |
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.84it/s] | |
2025-06-17 21:17:08,583 - __main__ - INFO - | |
2025-06-17 21:17:08,696 - __main__ - INFO - INFO 06-17 21:17:08 [default_loader.py:272] Loading weights took 1.17 seconds | |
2025-06-17 21:17:08,819 - __main__ - WARNING - Attempt 53: Please wait for vllm server to become ready... | |
2025-06-17 21:17:09,059 - __main__ - INFO - INFO 06-17 21:17:09 [gpu_model_runner.py:1624] Model loading took 9.4248 GiB and 1.618142 seconds | |
2025-06-17 21:17:09,831 - __main__ - WARNING - Attempt 54: Please wait for vllm server to become ready... | |
2025-06-17 21:17:10,397 - __main__ - INFO - INFO 06-17 21:17:10 [gpu_model_runner.py:1978] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size. | |
2025-06-17 21:17:10,845 - __main__ - WARNING - Attempt 55: Please wait for vllm server to become ready... | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] EngineCore failed to start. | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] Traceback (most recent call last): | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] self._initialize_kv_caches(vllm_config) | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] available_gpu_memory = self.model_executor.determine_available_memory() | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] output = self.collective_rpc("determine_available_memory") | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] self.model_runner.profile_run() | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run | |
2025-06-17 21:17:11,075 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] dummy_encoder_outputs = self.model.get_multimodal_embeddings( | |
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings | |
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] vision_embeddings = self._process_image_input(image_input) | |
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return forward_call(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] x = blk(
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return forward_call(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] x = x + self.attn(
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return forward_call(*args, **kwargs)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return ApplyRotaryEmb.apply(
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:17:11,076 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] out = apply_rotary(
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] rotary_kernel[grid](
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] self._init_handles()
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - ERROR 06-17 21:17:11 [core.py:515] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:17:11,077 - __main__ - INFO - Process EngineCore_0:
2025-06-17 21:17:11,077 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 21:17:11,077 - __main__ - INFO - self.run()
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 21:17:11,077 - __main__ - INFO - self._target(*self._args, **self._kwargs)
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 21:17:11,077 - __main__ - INFO - raise e
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:17:11,077 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:17:11,077 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:17:11,077 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:17:11,077 - __main__ - INFO - self._initialize_kv_caches(vllm_config)
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:17:11,077 - __main__ - INFO - available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:17:11,077 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:17:11,077 - __main__ - INFO - output = self.collective_rpc("determine_available_memory")
2025-06-17 21:17:11,077 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:17:11,077 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:17:11,077 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:17:11,077 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 21:17:11,077 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,077 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:17:11,077 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:17:11,078 - __main__ - INFO - self.model_runner.profile_run()
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:17:11,078 - __main__ - INFO - dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:17:11,078 - __main__ - INFO - vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:17:11,078 - __main__ - INFO - image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:17:11,078 - __main__ - INFO - x = blk(
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:17:11,078 - __main__ - INFO - x = x + self.attn(
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:11,078 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:17:11,078 - __main__ - INFO - q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:17:11,078 - __main__ - INFO - output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,078 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:17:11,078 - __main__ - INFO - return ApplyRotaryEmb.apply(
2025-06-17 21:17:11,078 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:17:11,079 - __main__ - INFO - return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:17:11,079 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:17:11,079 - __main__ - INFO - out = apply_rotary(
2025-06-17 21:17:11,079 - __main__ - INFO - ^^^^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:17:11,079 - __main__ - INFO - rotary_kernel[grid](
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:17:11,079 - __main__ - INFO - return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:17:11,079 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:17:11,079 - __main__ - INFO - kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:17:11,079 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:17:11,079 - __main__ - INFO - self._init_handles()
2025-06-17 21:17:11,079 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:17:11,079 - __main__ - INFO - self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:17:11,079 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:11,079 - __main__ - INFO - SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:17:11,470 - __main__ - INFO - [rank0]:[W617 21:17:11.592120367 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-06-17 21:17:11,859 - __main__ - WARNING - Attempt 56: Please wait for vllm server to become ready...
2025-06-17 21:17:12,155 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:17:12,155 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 21:17:12,155 - __main__ - INFO - sys.exit(main())
2025-06-17 21:17:12,155 - __main__ - INFO - ^^^^^^
2025-06-17 21:17:12,155 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 21:17:12,155 - __main__ - INFO - args.dispatch_function(args)
2025-06-17 21:17:12,155 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 21:17:12,155 - __main__ - INFO - uvloop.run(run_server(args))
2025-06-17 21:17:12,155 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 21:17:12,155 - __main__ - INFO - return runner.run(wrapper())
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 21:17:12,156 - __main__ - INFO - return self._loop.run_until_complete(task)
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 21:17:12,156 - __main__ - INFO - return await main
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 21:17:12,156 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 21:17:12,156 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:17:12,156 - __main__ - INFO - return await anext(self.gen)
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 21:17:12,156 - __main__ - INFO - async with build_async_engine_client_from_engine_args(
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:17:12,156 - __main__ - INFO - return await anext(self.gen)
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
2025-06-17 21:17:12,156 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config(
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
2025-06-17 21:17:12,156 - __main__ - INFO - return cls(
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
2025-06-17 21:17:12,156 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client(
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
2025-06-17 21:17:12,156 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats,
2025-06-17 21:17:12,156 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
2025-06-17 21:17:12,156 - __main__ - INFO - super().__init__(
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
2025-06-17 21:17:12,156 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only,
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
2025-06-17 21:17:12,156 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address,
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
2025-06-17 21:17:12,156 - __main__ - INFO - wait_for_engine_startup(
2025-06-17 21:17:12,156 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
2025-06-17 21:17:12,156 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. "
2025-06-17 21:17:12,156 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-06-17 21:17:12,872 - __main__ - WARNING - Attempt 57: Please wait for vllm server to become ready... | |
2025-06-17 21:17:12,978 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 21:17:13,886 - __main__ - WARNING - Attempt 58: Please wait for vllm server to become ready... | |
2025-06-17 21:17:14,868 - __main__ - INFO - INFO 06-17 21:17:14 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:17:14,899 - __main__ - WARNING - Attempt 59: Please wait for vllm server to become ready... | |
2025-06-17 21:17:15,915 - __main__ - WARNING - Attempt 60: Please wait for vllm server to become ready... | |
2025-06-17 21:17:16,929 - __main__ - WARNING - Attempt 61: Please wait for vllm server to become ready... | |
2025-06-17 21:17:17,943 - __main__ - WARNING - Attempt 62: Please wait for vllm server to become ready... | |
2025-06-17 21:17:18,958 - __main__ - WARNING - Attempt 63: Please wait for vllm server to become ready... | |
2025-06-17 21:17:19,317 - __main__ - INFO - INFO 06-17 21:17:19 [api_server.py:1287] vLLM API server version 0.9.1 | |
2025-06-17 21:17:19,884 - __main__ - INFO - INFO 06-17 21:17:19 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview-FP8', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True} | |
2025-06-17 21:17:19,971 - __main__ - WARNING - Attempt 64: Please wait for vllm server to become ready... | |
2025-06-17 21:17:20,984 - __main__ - WARNING - Attempt 65: Please wait for vllm server to become ready... | |
2025-06-17 21:17:21,997 - __main__ - WARNING - Attempt 66: Please wait for vllm server to become ready... | |
2025-06-17 21:17:23,013 - __main__ - WARNING - Attempt 67: Please wait for vllm server to become ready... | |
2025-06-17 21:17:24,028 - __main__ - WARNING - Attempt 68: Please wait for vllm server to become ready... | |
2025-06-17 21:17:25,041 - __main__ - WARNING - Attempt 69: Please wait for vllm server to become ready... | |
2025-06-17 21:17:25,410 - __main__ - INFO - INFO 06-17 21:17:25 [config.py:823] This model supports multiple tasks: {'generate', 'embed', 'classify', 'reward', 'score'}. Defaulting to 'generate'. | |
2025-06-17 21:17:25,836 - __main__ - INFO - INFO 06-17 21:17:25 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048. | |
2025-06-17 21:17:26,054 - __main__ - WARNING - Attempt 70: Please wait for vllm server to become ready... | |
2025-06-17 21:17:27,054 - __main__ - INFO - WARNING 06-17 21:17:27 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234 | |
2025-06-17 21:17:27,069 - __main__ - WARNING - Attempt 71: Please wait for vllm server to become ready... | |
2025-06-17 21:17:28,081 - __main__ - WARNING - Attempt 72: Please wait for vllm server to become ready... | |
2025-06-17 21:17:28,192 - __main__ - INFO - INFO 06-17 21:17:28 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:17:29,094 - __main__ - WARNING - Attempt 73: Please wait for vllm server to become ready... | |
2025-06-17 21:17:30,106 - __main__ - WARNING - Attempt 74: Please wait for vllm server to become ready... | |
2025-06-17 21:17:30,457 - __main__ - INFO - INFO 06-17 21:17:30 [core.py:455] Waiting for init message from front-end. | |
2025-06-17 21:17:30,462 - __main__ - INFO - INFO 06-17 21:17:30 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview-FP8', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} | |
2025-06-17 21:17:30,557 - __main__ - INFO - WARNING 06-17 21:17:30 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x76c9413d9810>
2025-06-17 21:17:30,956 - __main__ - INFO - INFO 06-17 21:17:30 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-06-17 21:17:31,120 - __main__ - WARNING - Attempt 75: Please wait for vllm server to become ready...
2025-06-17 21:17:31,580 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-06-17 21:17:32,135 - __main__ - WARNING - Attempt 76: Please wait for vllm server to become ready...
2025-06-17 21:17:32,471 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
2025-06-17 21:17:33,148 - __main__ - WARNING - Attempt 77: Please wait for vllm server to become ready...
2025-06-17 21:17:34,163 - __main__ - WARNING - Attempt 78: Please wait for vllm server to become ready...
2025-06-17 21:17:35,177 - __main__ - WARNING - Attempt 79: Please wait for vllm server to become ready...
2025-06-17 21:17:35,296 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
2025-06-17 21:17:35,630 - __main__ - INFO - WARNING 06-17 21:17:35 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
2025-06-17 21:17:35,638 - __main__ - INFO - INFO 06-17 21:17:35 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview-FP8...
2025-06-17 21:17:35,811 - __main__ - INFO - INFO 06-17 21:17:35 [gpu_model_runner.py:1600] Loading model from scratch...
2025-06-17 21:17:35,845 - __main__ - INFO - WARNING 06-17 21:17:35 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
2025-06-17 21:17:35,878 - __main__ - INFO - INFO 06-17 21:17:35 [cuda.py:252] Using Flash Attention backend on V1 engine.
2025-06-17 21:17:36,098 - __main__ - INFO - INFO 06-17 21:17:36 [weight_utils.py:292] Using model weights format ['*.safetensors']
2025-06-17 21:17:36,191 - __main__ - WARNING - Attempt 80: Please wait for vllm server to become ready...
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:00, 2.72it/s]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:00<00:00, 3.83it/s]
2025-06-17 21:17:37,205 - __main__ - WARNING - Attempt 81: Please wait for vllm server to become ready...
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.74it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.88it/s]
2025-06-17 21:17:37,301 - __main__ - INFO -
2025-06-17 21:17:37,416 - __main__ - INFO - INFO 06-17 21:17:37 [default_loader.py:272] Loading weights took 1.16 seconds
2025-06-17 21:17:37,757 - __main__ - INFO - INFO 06-17 21:17:37 [gpu_model_runner.py:1624] Model loading took 9.4248 GiB and 1.618518 seconds
2025-06-17 21:17:38,218 - __main__ - WARNING - Attempt 82: Please wait for vllm server to become ready...
2025-06-17 21:17:39,116 - __main__ - INFO - INFO 06-17 21:17:39 [gpu_model_runner.py:1978] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
2025-06-17 21:17:39,232 - __main__ - WARNING - Attempt 83: Please wait for vllm server to become ready...
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515] EngineCore failed to start.
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515] Traceback (most recent call last):
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     self._initialize_kv_caches(vllm_config)
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,792 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     self.model_runner.profile_run()
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     x = blk(
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]         ^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     x = x + self.attn(
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]             ^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:17:39,793 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return ApplyRotaryEmb.apply(
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     out = apply_rotary(
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]           ^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     rotary_kernel[grid](
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     ^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     self._init_handles()
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515]                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO - ERROR 06-17 21:17:39 [core.py:515] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:17:39,794 - __main__ - INFO - Process EngineCore_0:
2025-06-17 21:17:39,794 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 21:17:39,794 - __main__ - INFO -     self.run()
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 21:17:39,794 - __main__ - INFO -     self._target(*self._args, **self._kwargs)
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 21:17:39,794 - __main__ - INFO -     raise e
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:17:39,794 - __main__ - INFO -     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:17:39,794 - __main__ - INFO -                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:17:39,794 - __main__ - INFO -     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:17:39,794 - __main__ - INFO -     self._initialize_kv_caches(vllm_config)
2025-06-17 21:17:39,794 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:17:39,794 - __main__ - INFO -     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:17:39,794 - __main__ - INFO -                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:17:39,795 - __main__ - INFO -     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:17:39,795 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:17:39,795 - __main__ - INFO -     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:17:39,795 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:17:39,795 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:17:39,795 - __main__ - INFO -     self.model_runner.profile_run()
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:17:39,795 - __main__ - INFO -     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:17:39,795 - __main__ - INFO -                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:17:39,795 - __main__ - INFO -     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:17:39,795 - __main__ - INFO -                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:17:39,795 - __main__ - INFO -     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:17:39,795 - __main__ - INFO -                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:17:39,795 - __main__ - INFO -     x = blk(
2025-06-17 21:17:39,795 - __main__ - INFO -         ^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:17:39,795 - __main__ - INFO -     x = x + self.attn(
2025-06-17 21:17:39,795 - __main__ - INFO -             ^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,795 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:17:39,795 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:17:39,795 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:17:39,796 - __main__ - INFO -     q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:17:39,796 - __main__ - INFO -         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:17:39,796 - __main__ - INFO -     output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:17:39,796 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:17:39,796 - __main__ - INFO -     return ApplyRotaryEmb.apply(
2025-06-17 21:17:39,796 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:17:39,796 - __main__ - INFO -     return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:17:39,796 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:17:39,796 - __main__ - INFO -     out = apply_rotary(
2025-06-17 21:17:39,796 - __main__ - INFO -           ^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:17:39,796 - __main__ - INFO -     rotary_kernel[grid](
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:17:39,796 - __main__ - INFO -     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:17:39,796 - __main__ - INFO -                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:17:39,796 - __main__ - INFO -     kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:17:39,796 - __main__ - INFO -     ^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:17:39,796 - __main__ - INFO -     self._init_handles()
2025-06-17 21:17:39,796 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:17:39,796 - __main__ - INFO -     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:17:39,796 - __main__ - INFO -                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:39,796 - __main__ - INFO - SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:17:40,193 - __main__ - INFO - [rank0]:[W617 21:17:40.314911881 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-06-17 21:17:40,246 - __main__ - WARNING - Attempt 84: Please wait for vllm server to become ready...
2025-06-17 21:17:40,876 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:17:40,876 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 21:17:40,876 - __main__ - INFO -     sys.exit(main())
2025-06-17 21:17:40,876 - __main__ - INFO -              ^^^^^^
2025-06-17 21:17:40,876 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 21:17:40,876 - __main__ - INFO -     args.dispatch_function(args)
2025-06-17 21:17:40,876 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 21:17:40,876 - __main__ - INFO -     uvloop.run(run_server(args))
2025-06-17 21:17:40,876 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 21:17:40,876 - __main__ - INFO -     return runner.run(wrapper())
2025-06-17 21:17:40,876 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 21:17:40,877 - __main__ - INFO -     return self._loop.run_until_complete(task)
2025-06-17 21:17:40,877 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:40,877 - __main__ - INFO -   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 21:17:40,877 - __main__ - INFO -     return await main
2025-06-17 21:17:40,877 - __main__ - INFO -            ^^^^^^^^^^
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 21:17:40,877 - __main__ - INFO -     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 21:17:40,877 - __main__ - INFO -     async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:17:40,877 - __main__ - INFO -     return await anext(self.gen)
2025-06-17 21:17:40,877 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:17:40,877 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 21:17:40,877 - __main__ - INFO - async with build_async_engine_client_from_engine_args( | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 21:17:40,877 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 21:17:40,877 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args | |
2025-06-17 21:17:40,877 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config( | |
2025-06-17 21:17:40,877 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config | |
2025-06-17 21:17:40,877 - __main__ - INFO - return cls( | |
2025-06-17 21:17:40,877 - __main__ - INFO - ^^^^ | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__ | |
2025-06-17 21:17:40,877 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client( | |
2025-06-17 21:17:40,877 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client | |
2025-06-17 21:17:40,877 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats, | |
2025-06-17 21:17:40,877 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__ | |
2025-06-17 21:17:40,877 - __main__ - INFO - super().__init__( | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__ | |
2025-06-17 21:17:40,877 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only, | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct | |
2025-06-17 21:17:40,877 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address, | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup | |
2025-06-17 21:17:40,877 - __main__ - INFO - wait_for_engine_startup( | |
2025-06-17 21:17:40,877 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup | |
2025-06-17 21:17:40,877 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. " | |
2025-06-17 21:17:40,877 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} | |
2025-06-17 21:17:41,259 - __main__ - WARNING - Attempt 85: Please wait for vllm server to become ready... | |
2025-06-17 21:17:41,746 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 21:17:42,275 - __main__ - WARNING - Attempt 86: Please wait for vllm server to become ready... | |
2025-06-17 21:17:43,288 - __main__ - WARNING - Attempt 87: Please wait for vllm server to become ready... | |
2025-06-17 21:17:43,636 - __main__ - INFO - INFO 06-17 21:17:43 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:17:44,302 - __main__ - WARNING - Attempt 88: Please wait for vllm server to become ready... | |
2025-06-17 21:17:45,316 - __main__ - WARNING - Attempt 89: Please wait for vllm server to become ready... | |
2025-06-17 21:17:46,330 - __main__ - WARNING - Attempt 90: Please wait for vllm server to become ready... | |
2025-06-17 21:17:47,343 - __main__ - WARNING - Attempt 91: Please wait for vllm server to become ready... | |
2025-06-17 21:17:48,159 - __main__ - INFO - INFO 06-17 21:17:48 [api_server.py:1287] vLLM API server version 0.9.1 | |
2025-06-17 21:17:48,356 - __main__ - WARNING - Attempt 92: Please wait for vllm server to become ready... | |
2025-06-17 21:17:48,722 - __main__ - INFO - INFO 06-17 21:17:48 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview-FP8', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True} | |
2025-06-17 21:17:49,386 - __main__ - WARNING - Attempt 93: Please wait for vllm server to become ready... | |
2025-06-17 21:17:50,399 - __main__ - WARNING - Attempt 94: Please wait for vllm server to become ready... | |
2025-06-17 21:17:51,413 - __main__ - WARNING - Attempt 95: Please wait for vllm server to become ready... | |
2025-06-17 21:17:52,426 - __main__ - WARNING - Attempt 96: Please wait for vllm server to become ready... | |
2025-06-17 21:17:53,441 - __main__ - WARNING - Attempt 97: Please wait for vllm server to become ready... | |
2025-06-17 21:17:53,957 - __main__ - INFO - INFO 06-17 21:17:53 [config.py:823] This model supports multiple tasks: {'score', 'reward', 'generate', 'embed', 'classify'}. Defaulting to 'generate'. | |
2025-06-17 21:17:54,075 - __main__ - INFO - INFO 06-17 21:17:54 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048. | |
2025-06-17 21:17:54,454 - __main__ - WARNING - Attempt 98: Please wait for vllm server to become ready... | |
2025-06-17 21:17:55,314 - __main__ - INFO - WARNING 06-17 21:17:55 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234 | |
2025-06-17 21:17:55,467 - __main__ - WARNING - Attempt 99: Please wait for vllm server to become ready... | |
2025-06-17 21:17:56,450 - __main__ - INFO - INFO 06-17 21:17:56 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 21:17:56,482 - __main__ - WARNING - Attempt 100: Please wait for vllm server to become ready... | |
2025-06-17 21:17:57,495 - __main__ - WARNING - Attempt 101: Please wait for vllm server to become ready... | |
2025-06-17 21:17:58,508 - __main__ - WARNING - Attempt 102: Please wait for vllm server to become ready... | |
2025-06-17 21:17:58,685 - __main__ - INFO - INFO 06-17 21:17:58 [core.py:455] Waiting for init message from front-end. | |
2025-06-17 21:17:58,690 - __main__ - INFO - INFO 06-17 21:17:58 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview-FP8', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} | |
2025-06-17 21:17:58,781 - __main__ - INFO - WARNING 06-17 21:17:58 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7917a4913c50> | |
2025-06-17 21:17:59,198 - __main__ - INFO - INFO 06-17 21:17:59 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 | |
2025-06-17 21:17:59,521 - __main__ - WARNING - Attempt 103: Please wait for vllm server to become ready... | |
2025-06-17 21:17:59,676 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. | |
2025-06-17 21:18:00,536 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. | |
2025-06-17 21:18:00,536 - __main__ - WARNING - Attempt 104: Please wait for vllm server to become ready... | |
2025-06-17 21:18:01,551 - __main__ - WARNING - Attempt 105: Please wait for vllm server to become ready... | |
2025-06-17 21:18:02,564 - __main__ - WARNING - Attempt 106: Please wait for vllm server to become ready... | |
2025-06-17 21:18:03,406 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors. | |
2025-06-17 21:18:03,577 - __main__ - WARNING - Attempt 107: Please wait for vllm server to become ready... | |
2025-06-17 21:18:03,735 - __main__ - INFO - WARNING 06-17 21:18:03 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. | |
2025-06-17 21:18:03,744 - __main__ - INFO - INFO 06-17 21:18:03 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview-FP8... | |
2025-06-17 21:18:03,927 - __main__ - INFO - INFO 06-17 21:18:03 [gpu_model_runner.py:1600] Loading model from scratch... | |
2025-06-17 21:18:03,955 - __main__ - INFO - WARNING 06-17 21:18:03 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend. | |
2025-06-17 21:18:03,982 - __main__ - INFO - INFO 06-17 21:18:03 [cuda.py:252] Using Flash Attention backend on V1 engine. | |
2025-06-17 21:18:04,194 - __main__ - INFO - INFO 06-17 21:18:04 [weight_utils.py:292] Using model weights format ['*.safetensors'] | |
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] | |
2025-06-17 21:18:04,592 - __main__ - WARNING - Attempt 108: Please wait for vllm server to become ready... | |
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:00, 2.70it/s] | |
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:00<00:00, 3.77it/s] | |
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.74it/s] | |
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00, 2.87it/s] | |
2025-06-17 21:18:05,392 - __main__ - INFO - | |
2025-06-17 21:18:05,511 - __main__ - INFO - INFO 06-17 21:18:05 [default_loader.py:272] Loading weights took 1.16 seconds | |
2025-06-17 21:18:05,605 - __main__ - WARNING - Attempt 109: Please wait for vllm server to become ready... | |
2025-06-17 21:18:05,864 - __main__ - INFO - INFO 06-17 21:18:05 [gpu_model_runner.py:1624] Model loading took 9.4248 GiB and 1.594002 seconds | |
2025-06-17 21:18:06,618 - __main__ - WARNING - Attempt 110: Please wait for vllm server to become ready... | |
2025-06-17 21:18:07,221 - __main__ - INFO - INFO 06-17 21:18:07 [gpu_model_runner.py:1978] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size. | |
2025-06-17 21:18:07,631 - __main__ - WARNING - Attempt 111: Please wait for vllm server to become ready... | |
2025-06-17 21:18:07,877 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] EngineCore failed to start. | |
2025-06-17 21:18:07,877 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] Traceback (most recent call last): | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] self._initialize_kv_caches(vllm_config) | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] available_gpu_memory = self.model_executor.determine_available_memory() | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] output = self.collective_rpc("determine_available_memory") | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] self.model_runner.profile_run() | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] dummy_encoder_outputs = self.model.get_multimodal_embeddings( | |
2025-06-17 21:18:07,878 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] vision_embeddings = self._process_image_input(image_input) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] image_embeds = self.visual(pixel_values, grid_thw=grid_thw) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return self._call_impl(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return forward_call(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] x = blk( | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return self._call_impl(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return forward_call(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] x = x + self.attn( | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return self._call_impl(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return forward_call(*args, **kwargs) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] q = apply_rotary_pos_emb_vision(q, rotary_pos_emb) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] output = apply_rotary_emb(t_, cos, sin).type_as(t) | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return ApplyRotaryEmb.apply( | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return super().apply(*args, **kwargs) # type: ignore[misc] | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,879 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] out = apply_rotary( | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] rotary_kernel[grid]( | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda> | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^ | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__ | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] self._init_handles() | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary( | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:07,880 - __main__ - INFO - ERROR 06-17 21:18:07 [core.py:515] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats | |
2025-06-17 21:18:07,880 - __main__ - INFO - Process EngineCore_0: | |
2025-06-17 21:18:07,880 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 21:18:07,880 - __main__ - INFO -     self.run()
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 21:18:07,880 - __main__ - INFO -     self._target(*self._args, **self._kwargs)
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 21:18:07,880 - __main__ - INFO -     raise e
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:18:07,880 - __main__ - INFO -     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:18:07,880 - __main__ - INFO -                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:18:07,880 - __main__ - INFO -     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:18:07,880 - __main__ - INFO -     self._initialize_kv_caches(vllm_config)
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:18:07,880 - __main__ - INFO -     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:18:07,880 - __main__ - INFO -                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:18:07,880 - __main__ - INFO -     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:18:07,880 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,880 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:18:07,880 - __main__ - INFO -     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:18:07,880 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:18:07,881 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:18:07,881 - __main__ - INFO -     return func(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:18:07,881 - __main__ - INFO -     self.model_runner.profile_run()
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:18:07,881 - __main__ - INFO -     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:18:07,881 - __main__ - INFO -                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:18:07,881 - __main__ - INFO -     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:18:07,881 - __main__ - INFO -                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:18:07,881 - __main__ - INFO -     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:18:07,881 - __main__ - INFO -                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:18:07,881 - __main__ - INFO -     x = blk(
2025-06-17 21:18:07,881 - __main__ - INFO -         ^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:18:07,881 - __main__ - INFO -     x = x + self.attn(
2025-06-17 21:18:07,881 - __main__ - INFO -             ^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return self._call_impl(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:07,881 - __main__ - INFO -     return forward_call(*args, **kwargs)
2025-06-17 21:18:07,881 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,881 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:18:07,881 - __main__ - INFO -     q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:18:07,882 - __main__ - INFO -         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:18:07,882 - __main__ - INFO -     output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:18:07,882 - __main__ - INFO -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:18:07,882 - __main__ - INFO -     return ApplyRotaryEmb.apply(
2025-06-17 21:18:07,882 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:18:07,882 - __main__ - INFO -     return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:18:07,882 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:18:07,882 - __main__ - INFO -     out = apply_rotary(
2025-06-17 21:18:07,882 - __main__ - INFO -           ^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:18:07,882 - __main__ - INFO -     rotary_kernel[grid](
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:18:07,882 - __main__ - INFO -     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:18:07,882 - __main__ - INFO -                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:18:07,882 - __main__ - INFO -     kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:18:07,882 - __main__ - INFO -     ^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:18:07,882 - __main__ - INFO -     self._init_handles()
2025-06-17 21:18:07,882 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:18:07,882 - __main__ - INFO -     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:18:07,882 - __main__ - INFO -                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:07,882 - __main__ - INFO - SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:18:08,269 - __main__ - INFO - [rank0]:[W617 21:18:08.390955660 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-06-17 21:18:08,644 - __main__ - WARNING - Attempt 112: Please wait for vllm server to become ready...
2025-06-17 21:18:08,941 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:18:08,941 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 21:18:08,941 - __main__ - INFO -     sys.exit(main())
2025-06-17 21:18:08,941 - __main__ - INFO -              ^^^^^^
2025-06-17 21:18:08,941 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 21:18:08,942 - __main__ - INFO -     args.dispatch_function(args)
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 21:18:08,942 - __main__ - INFO -     uvloop.run(run_server(args))
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 21:18:08,942 - __main__ - INFO -     return runner.run(wrapper())
2025-06-17 21:18:08,942 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 21:18:08,942 - __main__ - INFO -     return self._loop.run_until_complete(task)
2025-06-17 21:18:08,942 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,942 - __main__ - INFO -   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 21:18:08,942 - __main__ - INFO -     return await main
2025-06-17 21:18:08,942 - __main__ - INFO -            ^^^^^^^^^^
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 21:18:08,942 - __main__ - INFO -     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 21:18:08,942 - __main__ - INFO -     async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:18:08,942 - __main__ - INFO -     return await anext(self.gen)
2025-06-17 21:18:08,942 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 21:18:08,942 - __main__ - INFO -     async with build_async_engine_client_from_engine_args(
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:18:08,942 - __main__ - INFO -     return await anext(self.gen)
2025-06-17 21:18:08,942 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,942 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
2025-06-17 21:18:08,942 - __main__ - INFO -     async_llm = AsyncLLM.from_vllm_config(
2025-06-17 21:18:08,942 - __main__ - INFO -                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
2025-06-17 21:18:08,943 - __main__ - INFO -     return cls(
2025-06-17 21:18:08,943 - __main__ - INFO -            ^^^^
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
2025-06-17 21:18:08,943 - __main__ - INFO -     self.engine_core = EngineCoreClient.make_async_mp_client(
2025-06-17 21:18:08,943 - __main__ - INFO -                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
2025-06-17 21:18:08,943 - __main__ - INFO -     return AsyncMPClient(vllm_config, executor_class, log_stats,
2025-06-17 21:18:08,943 - __main__ - INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
2025-06-17 21:18:08,943 - __main__ - INFO -     super().__init__(
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
2025-06-17 21:18:08,943 - __main__ - INFO -     self._init_engines_direct(vllm_config, local_only,
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
2025-06-17 21:18:08,943 - __main__ - INFO -     self._wait_for_engine_startup(handshake_socket, input_address,
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
2025-06-17 21:18:08,943 - __main__ - INFO -     wait_for_engine_startup(
2025-06-17 21:18:08,943 - __main__ - INFO -   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
2025-06-17 21:18:08,943 - __main__ - INFO -     raise RuntimeError("Engine core initialization failed. "
2025-06-17 21:18:08,943 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-06-17 21:18:09,659 - __main__ - WARNING - Attempt 113: Please wait for vllm server to become ready...
2025-06-17 21:18:09,799 - __main__ - WARNING - VLLM server task ended
2025-06-17 21:18:10,675 - __main__ - WARNING - Attempt 114: Please wait for vllm server to become ready...
2025-06-17 21:18:11,688 - __main__ - WARNING - Attempt 115: Please wait for vllm server to become ready...
2025-06-17 21:18:11,688 - __main__ - INFO - INFO 06-17 21:18:11 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 21:18:12,701 - __main__ - WARNING - Attempt 116: Please wait for vllm server to become ready...
2025-06-17 21:18:13,717 - __main__ - WARNING - Attempt 117: Please wait for vllm server to become ready...
2025-06-17 21:18:14,729 - __main__ - WARNING - Attempt 118: Please wait for vllm server to become ready...
2025-06-17 21:18:15,742 - __main__ - WARNING - Attempt 119: Please wait for vllm server to become ready...
2025-06-17 21:18:16,116 - __main__ - INFO - INFO 06-17 21:18:16 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-17 21:18:16,670 - __main__ - INFO - INFO 06-17 21:18:16 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview-FP8', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-17 21:18:16,755 - __main__ - WARNING - Attempt 120: Please wait for vllm server to become ready...
2025-06-17 21:18:17,774 - __main__ - WARNING - Attempt 121: Please wait for vllm server to become ready...
2025-06-17 21:18:18,788 - __main__ - WARNING - Attempt 122: Please wait for vllm server to become ready...
2025-06-17 21:18:19,804 - __main__ - WARNING - Attempt 123: Please wait for vllm server to become ready...
2025-06-17 21:18:20,817 - __main__ - WARNING - Attempt 124: Please wait for vllm server to become ready...
2025-06-17 21:18:21,832 - __main__ - WARNING - Attempt 125: Please wait for vllm server to become ready...
2025-06-17 21:18:21,953 - __main__ - INFO - INFO 06-17 21:18:21 [config.py:823] This model supports multiple tasks: {'generate', 'reward', 'score', 'embed', 'classify'}. Defaulting to 'generate'.
2025-06-17 21:18:22,095 - __main__ - INFO - INFO 06-17 21:18:22 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-17 21:18:22,846 - __main__ - WARNING - Attempt 126: Please wait for vllm server to become ready...
2025-06-17 21:18:23,381 - __main__ - INFO - WARNING 06-17 21:18:23 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-17 21:18:23,860 - __main__ - WARNING - Attempt 127: Please wait for vllm server to become ready...
2025-06-17 21:18:24,516 - __main__ - INFO - INFO 06-17 21:18:24 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 21:18:24,873 - __main__ - WARNING - Attempt 128: Please wait for vllm server to become ready...
2025-06-17 21:18:25,887 - __main__ - WARNING - Attempt 129: Please wait for vllm server to become ready...
2025-06-17 21:18:26,725 - __main__ - INFO - INFO 06-17 21:18:26 [core.py:455] Waiting for init message from front-end.
2025-06-17 21:18:26,731 - __main__ - INFO - INFO 06-17 21:18:26 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview-FP8', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-17 21:18:26,822 - __main__ - INFO - WARNING 06-17 21:18:26 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x72f5f1114c90>
2025-06-17 21:18:26,900 - __main__ - WARNING - Attempt 130: Please wait for vllm server to become ready...
2025-06-17 21:18:27,214 - __main__ - INFO - INFO 06-17 21:18:27 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-06-17 21:18:27,722 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-06-17 21:18:27,912 - __main__ - WARNING - Attempt 131: Please wait for vllm server to become ready...
2025-06-17 21:18:28,574 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
2025-06-17 21:18:28,927 - __main__ - WARNING - Attempt 132: Please wait for vllm server to become ready...
2025-06-17 21:18:29,941 - __main__ - WARNING - Attempt 133: Please wait for vllm server to become ready...
2025-06-17 21:18:30,955 - __main__ - WARNING - Attempt 134: Please wait for vllm server to become ready...
2025-06-17 21:18:31,371 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
2025-06-17 21:18:31,707 - __main__ - INFO - WARNING 06-17 21:18:31 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
2025-06-17 21:18:31,715 - __main__ - INFO - INFO 06-17 21:18:31 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview-FP8...
2025-06-17 21:18:31,894 - __main__ - INFO - INFO 06-17 21:18:31 [gpu_model_runner.py:1600] Loading model from scratch...
2025-06-17 21:18:31,928 - __main__ - INFO - WARNING 06-17 21:18:31 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
2025-06-17 21:18:31,969 - __main__ - INFO - INFO 06-17 21:18:31 [cuda.py:252] Using Flash Attention backend on V1 engine.
2025-06-17 21:18:31,969 - __main__ - WARNING - Attempt 135: Please wait for vllm server to become ready...
2025-06-17 21:18:32,178 - __main__ - INFO - INFO 06-17 21:18:32 [weight_utils.py:292] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:00<00:00,  2.73it/s]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:00<00:00,  3.81it/s]
2025-06-17 21:18:32,984 - __main__ - WARNING - Attempt 136: Please wait for vllm server to become ready...
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00,  2.73it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:01<00:00,  2.87it/s]
2025-06-17 21:18:33,381 - __main__ - INFO -
2025-06-17 21:18:33,509 - __main__ - INFO - INFO 06-17 21:18:33 [default_loader.py:272] Loading weights took 1.17 seconds
2025-06-17 21:18:33,844 - __main__ - INFO - INFO 06-17 21:18:33 [gpu_model_runner.py:1624] Model loading took 9.4248 GiB and 1.627045 seconds
2025-06-17 21:18:33,997 - __main__ - WARNING - Attempt 137: Please wait for vllm server to become ready...
2025-06-17 21:18:35,011 - __main__ - WARNING - Attempt 138: Please wait for vllm server to become ready...
2025-06-17 21:18:35,195 - __main__ - INFO - INFO 06-17 21:18:35 [gpu_model_runner.py:1978] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
2025-06-17 21:18:35,860 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] EngineCore failed to start.
2025-06-17 21:18:35,860 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] Traceback (most recent call last):
2025-06-17 21:18:35,860 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:18:35,860 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:18:35,860 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     self._initialize_kv_caches(vllm_config)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     output = self.collective_rpc("determine_available_memory")
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     return func(*args, **kwargs)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     self.model_runner.profile_run()
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     return self._call_impl(*args, **kwargs)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     return forward_call(*args, **kwargs)
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]     x = blk(
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]         ^^^^
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515]   File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return self._call_impl(*args, **kwargs) | |
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:18:35,861 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return forward_call(*args, **kwargs) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] x = x + self.attn( | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return self._call_impl(*args, **kwargs) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return forward_call(*args, **kwargs) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] q = apply_rotary_pos_emb_vision(q, rotary_pos_emb) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] output = apply_rotary_emb(t_, cos, sin).type_as(t) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return ApplyRotaryEmb.apply( | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return super().apply(*args, **kwargs) # type: ignore[misc] | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] out = apply_rotary( | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] rotary_kernel[grid]( | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda> | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] self._init_handles() | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary( | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 21:18:35,862 - __main__ - INFO - ERROR 06-17 21:18:35 [core.py:515] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats | |
2025-06-17 21:18:35,862 - __main__ - INFO - Process EngineCore_0:
2025-06-17 21:18:35,862 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:18:35,862 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 21:18:35,862 - __main__ - INFO - self.run()
2025-06-17 21:18:35,862 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 21:18:35,862 - __main__ - INFO - self._target(*self._args, **self._kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 21:18:35,863 - __main__ - INFO - raise e
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 21:18:35,863 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 21:18:35,863 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
2025-06-17 21:18:35,863 - __main__ - INFO - self._initialize_kv_caches(vllm_config)
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
2025-06-17 21:18:35,863 - __main__ - INFO - available_gpu_memory = self.model_executor.determine_available_memory()
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
2025-06-17 21:18:35,863 - __main__ - INFO - output = self.collective_rpc("determine_available_memory")
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 21:18:35,863 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 21:18:35,863 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-06-17 21:18:35,863 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
2025-06-17 21:18:35,863 - __main__ - INFO - self.model_runner.profile_run()
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2001, in profile_run
2025-06-17 21:18:35,863 - __main__ - INFO - dummy_encoder_outputs = self.model.get_multimodal_embeddings(
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1277, in get_multimodal_embeddings
2025-06-17 21:18:35,863 - __main__ - INFO - vision_embeddings = self._process_image_input(image_input)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1214, in _process_image_input
2025-06-17 21:18:35,863 - __main__ - INFO - image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:35,863 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:35,863 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 652, in forward
2025-06-17 21:18:35,863 - __main__ - INFO - x = blk(
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^
2025-06-17 21:18:35,863 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:35,863 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:18:35,863 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:35,864 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 419, in forward
2025-06-17 21:18:35,864 - __main__ - INFO - x = x + self.attn(
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-06-17 21:18:35,864 - __main__ - INFO - return self._call_impl(*args, **kwargs)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-06-17 21:18:35,864 - __main__ - INFO - return forward_call(*args, **kwargs)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 323, in forward
2025-06-17 21:18:35,864 - __main__ - INFO - q = apply_rotary_pos_emb_vision(q, rotary_pos_emb)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_vl.py", line 240, in apply_rotary_pos_emb_vision
2025-06-17 21:18:35,864 - __main__ - INFO - output = apply_rotary_emb(t_, cos, sin).type_as(t)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 124, in apply_rotary_emb
2025-06-17 21:18:35,864 - __main__ - INFO - return ApplyRotaryEmb.apply(
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
2025-06-17 21:18:35,864 - __main__ - INFO - return super().apply(*args, **kwargs)  # type: ignore[misc]
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 50, in forward
2025-06-17 21:18:35,864 - __main__ - INFO - out = apply_rotary(
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 203, in apply_rotary
2025-06-17 21:18:35,864 - __main__ - INFO - rotary_kernel[grid](
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in <lambda>
2025-06-17 21:18:35,864 - __main__ - INFO - return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 591, in run
2025-06-17 21:18:35,864 - __main__ - INFO - kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata,
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 413, in __getattribute__
2025-06-17 21:18:35,864 - __main__ - INFO - self._init_handles()
2025-06-17 21:18:35,864 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
2025-06-17 21:18:35,864 - __main__ - INFO - self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
2025-06-17 21:18:35,864 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:35,864 - __main__ - INFO - SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
2025-06-17 21:18:36,025 - __main__ - WARNING - Attempt 139: Please wait for vllm server to become ready...
2025-06-17 21:18:36,256 - __main__ - INFO - [rank0]:[W617 21:18:36.377544288 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-06-17 21:18:36,938 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 21:18:36,938 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 21:18:36,938 - __main__ - INFO - sys.exit(main())
2025-06-17 21:18:36,938 - __main__ - INFO - ^^^^^^
2025-06-17 21:18:36,938 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 21:18:36,938 - __main__ - INFO - args.dispatch_function(args)
2025-06-17 21:18:36,938 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 21:18:36,938 - __main__ - INFO - uvloop.run(run_server(args))
2025-06-17 21:18:36,938 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 21:18:36,938 - __main__ - INFO - return runner.run(wrapper())
2025-06-17 21:18:36,938 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,938 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 21:18:36,938 - __main__ - INFO - return self._loop.run_until_complete(task)
2025-06-17 21:18:36,938 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 21:18:36,939 - __main__ - INFO - return await main
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 21:18:36,939 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 21:18:36,939 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:18:36,939 - __main__ - INFO - return await anext(self.gen)
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 21:18:36,939 - __main__ - INFO - async with build_async_engine_client_from_engine_args(
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 21:18:36,939 - __main__ - INFO - return await anext(self.gen)
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
2025-06-17 21:18:36,939 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config(
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
2025-06-17 21:18:36,939 - __main__ - INFO - return cls(
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
2025-06-17 21:18:36,939 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client(
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
2025-06-17 21:18:36,939 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats,
2025-06-17 21:18:36,939 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
2025-06-17 21:18:36,939 - __main__ - INFO - super().__init__(
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
2025-06-17 21:18:36,939 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only,
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
2025-06-17 21:18:36,939 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address,
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
2025-06-17 21:18:36,939 - __main__ - INFO - wait_for_engine_startup(
2025-06-17 21:18:36,939 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
2025-06-17 21:18:36,939 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. "
2025-06-17 21:18:36,939 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-06-17 21:18:37,038 - __main__ - WARNING - Attempt 140: Please wait for vllm server to become ready...
2025-06-17 21:18:37,786 - __main__ - WARNING - VLLM server task ended
2025-06-17 21:18:37,787 - __main__ - ERROR - Ended up starting the vllm server more than 5 times, cancelling pipeline
2025-06-17 21:18:37,787 - __main__ - ERROR -
2025-06-17 21:18:37,787 - __main__ - ERROR - Please make sure vllm is installed according to the latest instructions here: https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x7b905815a480>
Traceback (most recent call last):
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc
proc.terminate()
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate
self._transport.terminate()
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate
self._check_proc()
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc
raise ProcessLookupError()
ProcessLookupError:
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-2' coro=<vllm_server_host() done, defined at /home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py:670> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 685, in vllm_server_host
    sys.exit(1)
SystemExit: 1
error: Recipe `convert-pdf-to-markdown` failed on line 21 with exit code 1
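Note on the repeated ProcessLookupError: each atexit _kill_proc callback calls proc.terminate() on an asyncio subprocess whose vLLM server process has already exited, and asyncio's subprocess transport raises ProcessLookupError in that case. A minimal sketch of the guard pattern (hypothetical names; not olmocr's actual code) that makes such a cleanup hook a no-op when the child is already gone:

```python
import asyncio
import sys


def kill_proc_safely(proc) -> None:
    """Terminate an asyncio child process, ignoring it if already exited.

    asyncio's subprocess transport raises ProcessLookupError from
    proc.terminate() once the child has exited and the transport is
    closed, which is the source of the "Exception ignored in atexit
    callback" noise in the log above.
    """
    try:
        proc.terminate()
    except ProcessLookupError:
        pass  # child already exited; nothing to clean up


async def main() -> str:
    # Spawn a child that exits immediately, mimicking a server that died.
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    await proc.wait()        # child has exited
    await asyncio.sleep(0)   # let the transport finish closing
    kill_proc_safely(proc)   # safe even though the process is gone
    return "ok"


result = asyncio.run(main())
```

The same try/except wrapper could be applied inside the atexit hook itself, so stale callbacks registered for an already-dead server process exit quietly instead of printing ignored tracebacks.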