olmOCR log
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 14:58:03,862 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 14:58:03,862 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 14:58:03,862 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 337.81it/s]
2025-06-17 14:58:03,866 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
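
Note: the items_per_group value above is a pages-per-group target divided by the sampled average page count. A minimal sketch of the arithmetic in Python, assuming the pipeline targets roughly 500 pages per work group (the target constant itself is not shown in this log):

    pages_per_group = 500       # assumed target; not shown in this log
    avg_pages_per_pdf = 3.00    # reported in the line above
    items_per_group = int(pages_per_group / avg_pages_per_pdf)  # 166, matching the log
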
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 14:58:03,963 - __main__ - INFO - Starting pipeline with PID 2452147
2025-06-17 14:58:03,963 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-preview'
Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 211122.68it/s]
INFO:olmocr.work_queue:Initialized local queue with 1 work items
2025-06-17 14:58:04,195 - __main__ - WARNING - Attempt 1: Please wait for vllm server to become ready...
2025-06-17 14:58:05,211 - __main__ - WARNING - Attempt 2: Please wait for vllm server to become ready...
2025-06-17 14:58:06,225 - __main__ - WARNING - Attempt 3: Please wait for vllm server to become ready...
2025-06-17 14:58:07,239 - __main__ - WARNING - Attempt 4: Please wait for vllm server to become ready...
2025-06-17 14:58:07,729 - __main__ - INFO - INFO 06-17 14:58:07 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:58:08,253 - __main__ - WARNING - Attempt 5: Please wait for vllm server to become ready...
2025-06-17 14:58:09,267 - __main__ - WARNING - Attempt 6: Please wait for vllm server to become ready...
2025-06-17 14:58:10,282 - __main__ - WARNING - Attempt 7: Please wait for vllm server to become ready...
2025-06-17 14:58:11,297 - __main__ - WARNING - Attempt 8: Please wait for vllm server to become ready...
2025-06-17 14:58:12,312 - __main__ - WARNING - Attempt 9: Please wait for vllm server to become ready...
2025-06-17 14:58:13,328 - __main__ - WARNING - Attempt 10: Please wait for vllm server to become ready...
2025-06-17 14:58:13,349 - __main__ - INFO - INFO 06-17 14:58:13 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-17 14:58:13,959 - __main__ - INFO - INFO 06-17 14:58:13 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-17 14:58:14,342 - __main__ - WARNING - Attempt 11: Please wait for vllm server to become ready...
2025-06-17 14:58:15,356 - __main__ - WARNING - Attempt 12: Please wait for vllm server to become ready...
2025-06-17 14:58:16,371 - __main__ - WARNING - Attempt 13: Please wait for vllm server to become ready...
2025-06-17 14:58:17,384 - __main__ - WARNING - Attempt 14: Please wait for vllm server to become ready...
2025-06-17 14:58:18,398 - __main__ - WARNING - Attempt 15: Please wait for vllm server to become ready...
2025-06-17 14:58:19,413 - __main__ - WARNING - Attempt 16: Please wait for vllm server to become ready...
2025-06-17 14:58:19,434 - __main__ - INFO - INFO 06-17 14:58:19 [config.py:823] This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
2025-06-17 14:58:19,508 - __main__ - INFO - INFO 06-17 14:58:19 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-17 14:58:20,426 - __main__ - WARNING - Attempt 17: Please wait for vllm server to become ready...
2025-06-17 14:58:21,040 - __main__ - INFO - WARNING 06-17 14:58:21 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-17 14:58:21,442 - __main__ - WARNING - Attempt 18: Please wait for vllm server to become ready...
2025-06-17 14:58:22,223 - __main__ - INFO - INFO 06-17 14:58:22 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:58:22,456 - __main__ - WARNING - Attempt 19: Please wait for vllm server to become ready...
2025-06-17 14:58:23,469 - __main__ - WARNING - Attempt 20: Please wait for vllm server to become ready...
2025-06-17 14:58:24,463 - __main__ - INFO - INFO 06-17 14:58:24 [core.py:455] Waiting for init message from front-end.
2025-06-17 14:58:24,469 - __main__ - INFO - INFO 06-17 14:58:24 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-17 14:58:24,482 - __main__ - WARNING - Attempt 21: Please wait for vllm server to become ready...
2025-06-17 14:58:24,866 - __main__ - INFO - WARNING 06-17 14:58:24 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x77c2c85c4790>
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] EngineCore failed to start.
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] Traceback (most recent call last):
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] self.model_executor = executor_class(vllm_config)
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] self._init_executor()
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] self.collective_rpc("init_device")
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:58:25,224 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] return func(*args, **kwargs)
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] self.worker.init_device() # type: ignore
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] raise ValueError(
2025-06-17 14:58:25,225 - __main__ - INFO - ERROR 06-17 14:58:25 [core.py:515] ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:58:25,225 - __main__ - INFO - Process EngineCore_0:
2025-06-17 14:58:25,225 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 14:58:25,225 - __main__ - INFO - self.run()
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 14:58:25,225 - __main__ - INFO - self._target(*self._args, **self._kwargs)
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 14:58:25,225 - __main__ - INFO - raise e
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:58:25,225 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:58:25,225 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:58:25,225 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:58:25,225 - __main__ - INFO - self.model_executor = executor_class(vllm_config)
2025-06-17 14:58:25,225 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:58:25,225 - __main__ - INFO - self._init_executor()
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:58:25,225 - __main__ - INFO - self.collective_rpc("init_device")
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:58:25,225 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:58:25,225 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:58:25,225 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 14:58:25,225 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:58:25,225 - __main__ - INFO - self.worker.init_device() # type: ignore
2025-06-17 14:58:25,225 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:25,225 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:58:25,225 - __main__ - INFO - raise ValueError(
2025-06-17 14:58:25,225 - __main__ - INFO - ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:58:25,497 - __main__ - WARNING - Attempt 22: Please wait for vllm server to become ready...
2025-06-17 14:58:26,293 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 14:58:26,293 - __main__ - INFO - sys.exit(main())
2025-06-17 14:58:26,293 - __main__ - INFO - ^^^^^^
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 14:58:26,293 - __main__ - INFO - args.dispatch_function(args)
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 14:58:26,293 - __main__ - INFO - uvloop.run(run_server(args))
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 14:58:26,293 - __main__ - INFO - return runner.run(wrapper())
2025-06-17 14:58:26,293 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 14:58:26,293 - __main__ - INFO - return self._loop.run_until_complete(task)
2025-06-17 14:58:26,293 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,293 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 14:58:26,293 - __main__ - INFO - return await main
2025-06-17 14:58:26,293 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 14:58:26,294 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 14:58:26,294 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 14:58:26,294 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 14:58:26,294 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 14:58:26,293 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 14:58:26,302 - __main__ - INFO - return await anext(self.gen)
2025-06-17 14:58:26,302 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,302 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 14:58:26,302 - __main__ - INFO - async with build_async_engine_client_from_engine_args(
2025-06-17 14:58:26,302 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 14:58:26,302 - __main__ - INFO - return await anext(self.gen)
2025-06-17 14:58:26,302 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,302 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
2025-06-17 14:58:26,302 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config(
2025-06-17 14:58:26,302 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
2025-06-17 14:58:26,303 - __main__ - INFO - return cls(
2025-06-17 14:58:26,303 - __main__ - INFO - ^^^^
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
2025-06-17 14:58:26,303 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client(
2025-06-17 14:58:26,303 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
2025-06-17 14:58:26,303 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats,
2025-06-17 14:58:26,303 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
2025-06-17 14:58:26,303 - __main__ - INFO - super().__init__(
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
2025-06-17 14:58:26,303 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only,
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
2025-06-17 14:58:26,303 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address,
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
2025-06-17 14:58:26,303 - __main__ - INFO - wait_for_engine_startup(
2025-06-17 14:58:26,303 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
2025-06-17 14:58:26,303 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. "
2025-06-17 14:58:26,303 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-06-17 14:58:26,510 - __main__ - WARNING - Attempt 23: Please wait for vllm server to become ready...
2025-06-17 14:58:27,179 - __main__ - WARNING - VLLM server task ended
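
Note: every retry in this log fails for the same root cause. With gpu_memory_utilization=0.8 on a 31.36 GiB device, vLLM wants 0.8 × 31.36 ≈ 25.09 GiB free at startup, but only 7.33 GiB is available, so another process is already holding most of the GPU. A minimal sketch of the same startup check in Python, assuming PyTorch with CUDA is importable in this environment (it is already a vLLM dependency):

    import torch

    # Mirror the check from vllm/v1/worker/gpu_worker.py in the traceback above:
    # compare free device memory against the requested fraction of total memory.
    free, total = torch.cuda.mem_get_info(0)  # bytes free/total on GPU 0
    gib = 1024 ** 3
    needed = 0.8 * total  # gpu_memory_utilization=0.8, as in the non-default args above
    print(f"free={free / gib:.2f} GiB, total={total / gib:.2f} GiB, needed={needed / gib:.2f} GiB")
    if free < needed:
        print("vLLM startup will fail: free memory is below the utilization target.")

If the check fails, either stop the other processes occupying the GPU (nvidia-smi lists them) or lower the utilization fraction passed to vLLM; here a much smaller fraction may not leave room for the 7B model weights, so freeing the device is the more likely fix.
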
2025-06-17 14:58:27,534 - __main__ - WARNING - Attempt 24: Please wait for vllm server to become ready...
2025-06-17 14:58:28,547 - __main__ - WARNING - Attempt 25: Please wait for vllm server to become ready...
2025-06-17 14:58:29,080 - __main__ - INFO - INFO 06-17 14:58:29 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:58:29,560 - __main__ - WARNING - Attempt 26: Please wait for vllm server to become ready...
2025-06-17 14:58:30,574 - __main__ - WARNING - Attempt 27: Please wait for vllm server to become ready...
2025-06-17 14:58:31,588 - __main__ - WARNING - Attempt 28: Please wait for vllm server to become ready...
2025-06-17 14:58:32,602 - __main__ - WARNING - Attempt 29: Please wait for vllm server to become ready...
2025-06-17 14:58:33,618 - __main__ - WARNING - Attempt 30: Please wait for vllm server to become ready...
2025-06-17 14:58:33,653 - __main__ - INFO - INFO 06-17 14:58:33 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-17 14:58:34,235 - __main__ - INFO - INFO 06-17 14:58:34 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-17 14:58:34,632 - __main__ - WARNING - Attempt 31: Please wait for vllm server to become ready...
2025-06-17 14:58:35,646 - __main__ - WARNING - Attempt 32: Please wait for vllm server to become ready...
2025-06-17 14:58:36,660 - __main__ - WARNING - Attempt 33: Please wait for vllm server to become ready...
2025-06-17 14:58:37,673 - __main__ - WARNING - Attempt 34: Please wait for vllm server to become ready...
2025-06-17 14:58:38,686 - __main__ - WARNING - Attempt 35: Please wait for vllm server to become ready...
2025-06-17 14:58:39,518 - __main__ - INFO - INFO 06-17 14:58:39 [config.py:823] This model supports multiple tasks: {'reward', 'classify', 'score', 'generate', 'embed'}. Defaulting to 'generate'.
2025-06-17 14:58:39,591 - __main__ - INFO - INFO 06-17 14:58:39 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-17 14:58:39,699 - __main__ - WARNING - Attempt 36: Please wait for vllm server to become ready...
2025-06-17 14:58:40,712 - __main__ - WARNING - Attempt 37: Please wait for vllm server to become ready...
2025-06-17 14:58:40,752 - __main__ - INFO - WARNING 06-17 14:58:40 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-17 14:58:41,726 - __main__ - WARNING - Attempt 38: Please wait for vllm server to become ready...
2025-06-17 14:58:41,913 - __main__ - INFO - INFO 06-17 14:58:41 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:58:42,739 - __main__ - WARNING - Attempt 39: Please wait for vllm server to become ready...
2025-06-17 14:58:43,753 - __main__ - WARNING - Attempt 40: Please wait for vllm server to become ready...
2025-06-17 14:58:44,119 - __main__ - INFO - INFO 06-17 14:58:44 [core.py:455] Waiting for init message from front-end.
2025-06-17 14:58:44,124 - __main__ - INFO - INFO 06-17 14:58:44 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-17 14:58:44,218 - __main__ - INFO - WARNING 06-17 14:58:44 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7bca317680d0>
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] EngineCore failed to start.
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] Traceback (most recent call last):
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] self.model_executor = executor_class(vllm_config)
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] self._init_executor()
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] self.collective_rpc("init_device")
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:58:44,587 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] return func(*args, **kwargs)
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] self.worker.init_device() # type: ignore
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] raise ValueError(
2025-06-17 14:58:44,588 - __main__ - INFO - ERROR 06-17 14:58:44 [core.py:515] ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:58:44,588 - __main__ - INFO - Process EngineCore_0:
2025-06-17 14:58:44,588 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 14:58:44,588 - __main__ - INFO - self.run()
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 14:58:44,588 - __main__ - INFO - self._target(*self._args, **self._kwargs)
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 14:58:44,588 - __main__ - INFO - raise e
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:58:44,588 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:58:44,588 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:58:44,588 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:58:44,588 - __main__ - INFO - self.model_executor = executor_class(vllm_config)
2025-06-17 14:58:44,588 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:58:44,588 - __main__ - INFO - self._init_executor()
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:58:44,588 - __main__ - INFO - self.collective_rpc("init_device")
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:58:44,588 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:58:44,588 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:58:44,588 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 14:58:44,588 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:58:44,588 - __main__ - INFO - self.worker.init_device() # type: ignore
2025-06-17 14:58:44,588 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:44,588 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:58:44,588 - __main__ - INFO - raise ValueError(
2025-06-17 14:58:44,588 - __main__ - INFO - ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:58:44,766 - __main__ - WARNING - Attempt 41: Please wait for vllm server to become ready...
2025-06-17 14:58:45,576 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 14:58:45,576 - __main__ - INFO - sys.exit(main())
2025-06-17 14:58:45,576 - __main__ - INFO - ^^^^^^
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 14:58:45,576 - __main__ - INFO - args.dispatch_function(args)
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 14:58:45,576 - __main__ - INFO - uvloop.run(run_server(args))
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 14:58:45,576 - __main__ - INFO - return runner.run(wrapper())
2025-06-17 14:58:45,576 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 14:58:45,576 - __main__ - INFO - return self._loop.run_until_complete(task)
2025-06-17 14:58:45,576 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,576 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 14:58:45,576 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 14:58:45,576 - __main__ - INFO - return await main
2025-06-17 14:58:45,576 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
2025-06-17 14:58:45,577 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
2025-06-17 14:58:45,577 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client:
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 14:58:45,577 - __main__ - INFO - return await anext(self.gen)
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
2025-06-17 14:58:45,577 - __main__ - INFO - async with build_async_engine_client_from_engine_args(
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__
2025-06-17 14:58:45,577 - __main__ - INFO - return await anext(self.gen)
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
2025-06-17 14:58:45,577 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config(
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
2025-06-17 14:58:45,577 - __main__ - INFO - return cls(
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
2025-06-17 14:58:45,577 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client(
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
2025-06-17 14:58:45,577 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats,
2025-06-17 14:58:45,577 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
2025-06-17 14:58:45,577 - __main__ - INFO - super().__init__(
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
2025-06-17 14:58:45,577 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only,
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
2025-06-17 14:58:45,577 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address,
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
2025-06-17 14:58:45,577 - __main__ - INFO - wait_for_engine_startup(
2025-06-17 14:58:45,577 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
2025-06-17 14:58:45,577 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. "
2025-06-17 14:58:45,577 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-06-17 14:58:45,780 - __main__ - WARNING - Attempt 42: Please wait for vllm server to become ready...
2025-06-17 14:58:46,407 - __main__ - WARNING - VLLM server task ended
2025-06-17 14:58:46,797 - __main__ - WARNING - Attempt 43: Please wait for vllm server to become ready...
2025-06-17 14:58:47,811 - __main__ - WARNING - Attempt 44: Please wait for vllm server to become ready...
2025-06-17 14:58:48,302 - __main__ - INFO - INFO 06-17 14:58:48 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:58:48,825 - __main__ - WARNING - Attempt 45: Please wait for vllm server to become ready...
2025-06-17 14:58:49,840 - __main__ - WARNING - Attempt 46: Please wait for vllm server to become ready...
2025-06-17 14:58:50,854 - __main__ - WARNING - Attempt 47: Please wait for vllm server to become ready...
2025-06-17 14:58:51,869 - __main__ - WARNING - Attempt 48: Please wait for vllm server to become ready...
2025-06-17 14:58:52,749 - __main__ - INFO - INFO 06-17 14:58:52 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-17 14:58:52,883 - __main__ - WARNING - Attempt 49: Please wait for vllm server to become ready...
2025-06-17 14:58:53,305 - __main__ - INFO - INFO 06-17 14:58:53 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-17 14:58:53,899 - __main__ - WARNING - Attempt 50: Please wait for vllm server to become ready...
2025-06-17 14:58:54,911 - __main__ - WARNING - Attempt 51: Please wait for vllm server to become ready...
2025-06-17 14:58:55,926 - __main__ - WARNING - Attempt 52: Please wait for vllm server to become ready...
2025-06-17 14:58:56,939 - __main__ - WARNING - Attempt 53: Please wait for vllm server to become ready...
2025-06-17 14:58:57,955 - __main__ - WARNING - Attempt 54: Please wait for vllm server to become ready...
2025-06-17 14:58:58,619 - __main__ - INFO - INFO 06-17 14:58:58 [config.py:823] This model supports multiple tasks: {'generate', 'embed', 'classify', 'reward', 'score'}. Defaulting to 'generate'.
2025-06-17 14:58:58,705 - __main__ - INFO - INFO 06-17 14:58:58 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-17 14:58:58,970 - __main__ - WARNING - Attempt 55: Please wait for vllm server to become ready...
2025-06-17 14:58:59,879 - __main__ - INFO - WARNING 06-17 14:58:59 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-17 14:58:59,983 - __main__ - WARNING - Attempt 56: Please wait for vllm server to become ready...
2025-06-17 14:59:00,997 - __main__ - WARNING - Attempt 57: Please wait for vllm server to become ready...
2025-06-17 14:59:01,014 - __main__ - INFO - INFO 06-17 14:59:01 [__init__.py:244] Automatically detected platform cuda.
2025-06-17 14:59:02,011 - __main__ - WARNING - Attempt 58: Please wait for vllm server to become ready...
2025-06-17 14:59:03,025 - __main__ - WARNING - Attempt 59: Please wait for vllm server to become ready...
2025-06-17 14:59:03,255 - __main__ - INFO - INFO 06-17 14:59:03 [core.py:455] Waiting for init message from front-end.
2025-06-17 14:59:03,260 - __main__ - INFO - INFO 06-17 14:59:03 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-17 14:59:03,352 - __main__ - INFO - WARNING 06-17 14:59:03 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7943fee27ed0>
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] EngineCore failed to start.
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] Traceback (most recent call last):
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,721 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] self.model_executor = executor_class(vllm_config)
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] self._init_executor()
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] self.collective_rpc("init_device")
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] return func(*args, **kwargs)
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] self.worker.init_device() # type: ignore
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] raise ValueError(
2025-06-17 14:59:03,722 - __main__ - INFO - ERROR 06-17 14:59:03 [core.py:515] ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:59:03,722 - __main__ - INFO - Process EngineCore_0:
2025-06-17 14:59:03,722 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025-06-17 14:59:03,722 - __main__ - INFO - self.run()
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run
2025-06-17 14:59:03,722 - __main__ - INFO - self._target(*self._args, **self._kwargs)
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
2025-06-17 14:59:03,722 - __main__ - INFO - raise e
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
2025-06-17 14:59:03,722 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs)
2025-06-17 14:59:03,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats,
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - self.model_executor = executor_class(vllm_config)
2025-06-17 14:59:03,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
2025-06-17 14:59:03,722 - __main__ - INFO - self._init_executor()
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
2025-06-17 14:59:03,722 - __main__ - INFO - self.collective_rpc("init_device")
2025-06-17 14:59:03,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
2025-06-17 14:59:03,722 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs)
2025-06-17 14:59:03,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method
2025-06-17 14:59:03,723 - __main__ - INFO - return func(*args, **kwargs)
2025-06-17 14:59:03,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
2025-06-17 14:59:03,723 - __main__ - INFO - self.worker.init_device() # type: ignore
2025-06-17 14:59:03,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:03,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device
2025-06-17 14:59:03,723 - __main__ - INFO - raise ValueError(
2025-06-17 14:59:03,723 - __main__ - INFO - ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
2025-06-17 14:59:04,038 - __main__ - WARNING - Attempt 60: Please wait for vllm server to become ready...
2025-06-17 14:59:04,722 - __main__ - INFO - Traceback (most recent call last):
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module>
2025-06-17 14:59:04,722 - __main__ - INFO - sys.exit(main())
2025-06-17 14:59:04,722 - __main__ - INFO - ^^^^^^
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
2025-06-17 14:59:04,722 - __main__ - INFO - args.dispatch_function(args)
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
2025-06-17 14:59:04,722 - __main__ - INFO - uvloop.run(run_server(args))
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
2025-06-17 14:59:04,722 - __main__ - INFO - return runner.run(wrapper())
2025-06-17 14:59:04,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
2025-06-17 14:59:04,722 - __main__ - INFO - return self._loop.run_until_complete(task)
2025-06-17 14:59:04,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-17 14:59:04,722 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
2025-06-17 14:59:04,722 - __main__ - INFO - return await main
2025-06-17 14:59:04,722 - __main__ - INFO - ^^^^^^^^^^
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server | |
2025-06-17 14:59:04,722 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) | |
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker | |
2025-06-17 14:59:04,722 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client: | |
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:04,722 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:04,722 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:04,722 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client | |
2025-06-17 14:59:04,723 - __main__ - INFO - async with build_async_engine_client_from_engine_args( | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:04,723 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:04,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args | |
2025-06-17 14:59:04,723 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config( | |
2025-06-17 14:59:04,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config | |
2025-06-17 14:59:04,723 - __main__ - INFO - return cls( | |
2025-06-17 14:59:04,723 - __main__ - INFO - ^^^^ | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__ | |
2025-06-17 14:59:04,723 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client( | |
2025-06-17 14:59:04,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client | |
2025-06-17 14:59:04,723 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:04,723 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__ | |
2025-06-17 14:59:04,723 - __main__ - INFO - super().__init__( | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__ | |
2025-06-17 14:59:04,723 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only, | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct | |
2025-06-17 14:59:04,723 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address, | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup | |
2025-06-17 14:59:04,723 - __main__ - INFO - wait_for_engine_startup( | |
2025-06-17 14:59:04,723 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup | |
2025-06-17 14:59:04,723 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. " | |
2025-06-17 14:59:04,723 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} | |
2025-06-17 14:59:05,051 - __main__ - WARNING - Attempt 61: Please wait for vllm server to become ready... | |
2025-06-17 14:59:05,539 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 14:59:06,067 - __main__ - WARNING - Attempt 62: Please wait for vllm server to become ready... | |
2025-06-17 14:59:07,080 - __main__ - WARNING - Attempt 63: Please wait for vllm server to become ready... | |
2025-06-17 14:59:07,437 - __main__ - INFO - INFO 06-17 14:59:07 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 14:59:08,096 - __main__ - WARNING - Attempt 64: Please wait for vllm server to become ready... | |
2025-06-17 14:59:09,108 - __main__ - WARNING - Attempt 65: Please wait for vllm server to become ready... | |
2025-06-17 14:59:10,123 - __main__ - WARNING - Attempt 66: Please wait for vllm server to become ready... | |
2025-06-17 14:59:11,136 - __main__ - WARNING - Attempt 67: Please wait for vllm server to become ready... | |
2025-06-17 14:59:11,878 - __main__ - INFO - INFO 06-17 14:59:11 [api_server.py:1287] vLLM API server version 0.9.1 | |
2025-06-17 14:59:12,150 - __main__ - WARNING - Attempt 68: Please wait for vllm server to become ready... | |
2025-06-17 14:59:12,455 - __main__ - INFO - INFO 06-17 14:59:12 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True} | |
2025-06-17 14:59:13,163 - __main__ - WARNING - Attempt 69: Please wait for vllm server to become ready... | |
2025-06-17 14:59:14,177 - __main__ - WARNING - Attempt 70: Please wait for vllm server to become ready... | |
2025-06-17 14:59:15,190 - __main__ - WARNING - Attempt 71: Please wait for vllm server to become ready... | |
2025-06-17 14:59:16,204 - __main__ - WARNING - Attempt 72: Please wait for vllm server to become ready... | |
2025-06-17 14:59:17,217 - __main__ - WARNING - Attempt 73: Please wait for vllm server to become ready... | |
2025-06-17 14:59:17,927 - __main__ - INFO - INFO 06-17 14:59:17 [config.py:823] This model supports multiple tasks: {'reward', 'classify', 'generate', 'embed', 'score'}. Defaulting to 'generate'. | |
2025-06-17 14:59:17,998 - __main__ - INFO - INFO 06-17 14:59:17 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048. | |
2025-06-17 14:59:18,230 - __main__ - WARNING - Attempt 74: Please wait for vllm server to become ready... | |
2025-06-17 14:59:19,157 - __main__ - INFO - WARNING 06-17 14:59:19 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234 | |
2025-06-17 14:59:19,244 - __main__ - WARNING - Attempt 75: Please wait for vllm server to become ready... | |
2025-06-17 14:59:20,257 - __main__ - WARNING - Attempt 76: Please wait for vllm server to become ready... | |
2025-06-17 14:59:20,316 - __main__ - INFO - INFO 06-17 14:59:20 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 14:59:21,272 - __main__ - WARNING - Attempt 77: Please wait for vllm server to become ready... | |
2025-06-17 14:59:22,285 - __main__ - WARNING - Attempt 78: Please wait for vllm server to become ready... | |
2025-06-17 14:59:22,594 - __main__ - INFO - INFO 06-17 14:59:22 [core.py:455] Waiting for init message from front-end. | |
2025-06-17 14:59:22,600 - __main__ - INFO - INFO 06-17 14:59:22 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} | |
2025-06-17 14:59:22,698 - __main__ - INFO - WARNING 06-17 14:59:22 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x78b845a8cdd0> | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] EngineCore failed to start. | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] Traceback (most recent call last): | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__ | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] self.model_executor = executor_class(vllm_config) | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,022 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] self._init_executor() | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] self.collective_rpc("init_device") | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] self.worker.init_device() # type: ignore | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] raise ValueError( | |
2025-06-17 14:59:23,023 - __main__ - INFO - ERROR 06-17 14:59:23 [core.py:515] ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes. | |
2025-06-17 14:59:23,023 - __main__ - INFO - Process EngineCore_0: | |
2025-06-17 14:59:23,023 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap | |
2025-06-17 14:59:23,023 - __main__ - INFO - self.run() | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run | |
2025-06-17 14:59:23,023 - __main__ - INFO - self._target(*self._args, **self._kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core | |
2025-06-17 14:59:23,023 - __main__ - INFO - raise e | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 14:59:23,023 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 14:59:23,023 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__ | |
2025-06-17 14:59:23,023 - __main__ - INFO - self.model_executor = executor_class(vllm_config) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ | |
2025-06-17 14:59:23,023 - __main__ - INFO - self._init_executor() | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor | |
2025-06-17 14:59:23,023 - __main__ - INFO - self.collective_rpc("init_device") | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 14:59:23,023 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 14:59:23,023 - __main__ - INFO - return func(*args, **kwargs) | |
2025-06-17 14:59:23,023 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,023 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device | |
2025-06-17 14:59:23,023 - __main__ - INFO - self.worker.init_device() # type: ignore | |
2025-06-17 14:59:23,023 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,024 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device | |
2025-06-17 14:59:23,024 - __main__ - INFO - raise ValueError( | |
2025-06-17 14:59:23,024 - __main__ - INFO - ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes. | |
2025-06-17 14:59:23,298 - __main__ - WARNING - Attempt 79: Please wait for vllm server to become ready... | |
2025-06-17 14:59:23,998 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module> | |
2025-06-17 14:59:23,998 - __main__ - INFO - sys.exit(main()) | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main | |
2025-06-17 14:59:23,998 - __main__ - INFO - args.dispatch_function(args) | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd | |
2025-06-17 14:59:23,998 - __main__ - INFO - uvloop.run(run_server(args)) | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run | |
2025-06-17 14:59:23,998 - __main__ - INFO - return runner.run(wrapper()) | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run | |
2025-06-17 14:59:23,998 - __main__ - INFO - return self._loop.run_until_complete(task) | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper | |
2025-06-17 14:59:23,998 - __main__ - INFO - return await main | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server | |
2025-06-17 14:59:23,998 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker | |
2025-06-17 14:59:23,998 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client: | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:23,998 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client | |
2025-06-17 14:59:23,998 - __main__ - INFO - async with build_async_engine_client_from_engine_args( | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:23,998 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args | |
2025-06-17 14:59:23,998 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config( | |
2025-06-17 14:59:23,998 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,998 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config | |
2025-06-17 14:59:23,998 - __main__ - INFO - return cls( | |
2025-06-17 14:59:23,999 - __main__ - INFO - ^^^^ | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__ | |
2025-06-17 14:59:23,999 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client( | |
2025-06-17 14:59:23,999 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client | |
2025-06-17 14:59:23,999 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:23,999 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__ | |
2025-06-17 14:59:23,999 - __main__ - INFO - super().__init__( | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__ | |
2025-06-17 14:59:23,999 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only, | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct | |
2025-06-17 14:59:23,999 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address, | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup | |
2025-06-17 14:59:23,999 - __main__ - INFO - wait_for_engine_startup( | |
2025-06-17 14:59:23,999 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup | |
2025-06-17 14:59:23,999 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. " | |
2025-06-17 14:59:23,999 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} | |
2025-06-17 14:59:24,312 - __main__ - WARNING - Attempt 80: Please wait for vllm server to become ready... | |
2025-06-17 14:59:24,804 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 14:59:25,327 - __main__ - WARNING - Attempt 81: Please wait for vllm server to become ready... | |
2025-06-17 14:59:26,340 - __main__ - WARNING - Attempt 82: Please wait for vllm server to become ready... | |
2025-06-17 14:59:26,694 - __main__ - INFO - INFO 06-17 14:59:26 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 14:59:27,354 - __main__ - WARNING - Attempt 83: Please wait for vllm server to become ready... | |
2025-06-17 14:59:28,368 - __main__ - WARNING - Attempt 84: Please wait for vllm server to become ready... | |
2025-06-17 14:59:29,381 - __main__ - WARNING - Attempt 85: Please wait for vllm server to become ready... | |
2025-06-17 14:59:30,396 - __main__ - WARNING - Attempt 86: Please wait for vllm server to become ready... | |
2025-06-17 14:59:31,116 - __main__ - INFO - INFO 06-17 14:59:31 [api_server.py:1287] vLLM API server version 0.9.1 | |
2025-06-17 14:59:31,409 - __main__ - WARNING - Attempt 87: Please wait for vllm server to become ready... | |
2025-06-17 14:59:31,680 - __main__ - INFO - INFO 06-17 14:59:31 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True} | |
2025-06-17 14:59:32,422 - __main__ - WARNING - Attempt 88: Please wait for vllm server to become ready... | |
2025-06-17 14:59:33,436 - __main__ - WARNING - Attempt 89: Please wait for vllm server to become ready... | |
2025-06-17 14:59:34,449 - __main__ - WARNING - Attempt 90: Please wait for vllm server to become ready... | |
2025-06-17 14:59:35,463 - __main__ - WARNING - Attempt 91: Please wait for vllm server to become ready... | |
2025-06-17 14:59:36,477 - __main__ - WARNING - Attempt 92: Please wait for vllm server to become ready... | |
2025-06-17 14:59:36,995 - __main__ - INFO - INFO 06-17 14:59:36 [config.py:823] This model supports multiple tasks: {'classify', 'score', 'embed', 'reward', 'generate'}. Defaulting to 'generate'. | |
2025-06-17 14:59:37,067 - __main__ - INFO - INFO 06-17 14:59:37 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048. | |
2025-06-17 14:59:37,492 - __main__ - WARNING - Attempt 93: Please wait for vllm server to become ready... | |
2025-06-17 14:59:38,226 - __main__ - INFO - WARNING 06-17 14:59:38 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234 | |
2025-06-17 14:59:38,504 - __main__ - WARNING - Attempt 94: Please wait for vllm server to become ready... | |
2025-06-17 14:59:39,371 - __main__ - INFO - INFO 06-17 14:59:39 [__init__.py:244] Automatically detected platform cuda. | |
2025-06-17 14:59:39,519 - __main__ - WARNING - Attempt 95: Please wait for vllm server to become ready... | |
2025-06-17 14:59:40,533 - __main__ - WARNING - Attempt 96: Please wait for vllm server to become ready... | |
2025-06-17 14:59:41,548 - __main__ - WARNING - Attempt 97: Please wait for vllm server to become ready... | |
2025-06-17 14:59:41,616 - __main__ - INFO - INFO 06-17 14:59:41 [core.py:455] Waiting for init message from front-end. | |
2025-06-17 14:59:41,622 - __main__ - INFO - INFO 06-17 14:59:41 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} | |
2025-06-17 14:59:41,722 - __main__ - INFO - WARNING 06-17 14:59:41 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fab0708d8d0> | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] EngineCore failed to start. | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] Traceback (most recent call last): | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] self.model_executor = executor_class(vllm_config) | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] self._init_executor() | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] self.collective_rpc("init_device") | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] return func(*args, **kwargs) | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] self.worker.init_device() # type: ignore | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] raise ValueError( | |
2025-06-17 14:59:42,109 - __main__ - INFO - ERROR 06-17 14:59:42 [core.py:515] ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes. | |
2025-06-17 14:59:42,109 - __main__ - INFO - Process EngineCore_0: | |
2025-06-17 14:59:42,109 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 14:59:42,109 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap | |
2025-06-17 14:59:42,109 - __main__ - INFO - self.run() | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/multiprocessing/process.py", line 108, in run | |
2025-06-17 14:59:42,110 - __main__ - INFO - self._target(*self._args, **self._kwargs) | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core | |
2025-06-17 14:59:42,110 - __main__ - INFO - raise e | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core | |
2025-06-17 14:59:42,110 - __main__ - INFO - engine_core = EngineCoreProc(*args, **kwargs) | |
2025-06-17 14:59:42,110 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__ | |
2025-06-17 14:59:42,110 - __main__ - INFO - super().__init__(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__ | |
2025-06-17 14:59:42,110 - __main__ - INFO - self.model_executor = executor_class(vllm_config) | |
2025-06-17 14:59:42,110 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ | |
2025-06-17 14:59:42,110 - __main__ - INFO - self._init_executor() | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor | |
2025-06-17 14:59:42,110 - __main__ - INFO - self.collective_rpc("init_device") | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc | |
2025-06-17 14:59:42,110 - __main__ - INFO - answer = run_method(self.driver_worker, method, args, kwargs) | |
2025-06-17 14:59:42,110 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/utils.py", line 2671, in run_method | |
2025-06-17 14:59:42,110 - __main__ - INFO - return func(*args, **kwargs) | |
2025-06-17 14:59:42,110 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device | |
2025-06-17 14:59:42,110 - __main__ - INFO - self.worker.init_device() # type: ignore | |
2025-06-17 14:59:42,110 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:42,110 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 140, in init_device | |
2025-06-17 14:59:42,110 - __main__ - INFO - raise ValueError( | |
2025-06-17 14:59:42,110 - __main__ - INFO - ValueError: Free memory on device (7.33/31.36 GiB) on startup is less than desired GPU memory utilization (0.8, 25.09 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes. | |
2025-06-17 14:59:42,561 - __main__ - WARNING - Attempt 98: Please wait for vllm server to become ready... | |
2025-06-17 14:59:43,135 - __main__ - INFO - Traceback (most recent call last): | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/bin/vllm", line 10, in <module> | |
2025-06-17 14:59:43,136 - __main__ - INFO - sys.exit(main()) | |
2025-06-17 14:59:43,136 - __main__ - INFO - ^^^^^^ | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main | |
2025-06-17 14:59:43,136 - __main__ - INFO - args.dispatch_function(args) | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd | |
2025-06-17 14:59:43,136 - __main__ - INFO - uvloop.run(run_server(args)) | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run | |
2025-06-17 14:59:43,136 - __main__ - INFO - return runner.run(wrapper()) | |
2025-06-17 14:59:43,136 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run | |
2025-06-17 14:59:43,136 - __main__ - INFO - return self._loop.run_until_complete(task) | |
2025-06-17 14:59:43,136 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper | |
2025-06-17 14:59:43,136 - __main__ - INFO - return await main | |
2025-06-17 14:59:43,136 - __main__ - INFO - ^^^^^^^^^^ | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server | |
2025-06-17 14:59:43,136 - __main__ - INFO - await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker | |
2025-06-17 14:59:43,136 - __main__ - INFO - async with build_async_engine_client(args, client_config) as engine_client: | |
2025-06-17 14:59:43,136 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:43,136 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:43,136 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client | |
2025-06-17 14:59:43,137 - __main__ - INFO - async with build_async_engine_client_from_engine_args( | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 210, in __aenter__ | |
2025-06-17 14:59:43,137 - __main__ - INFO - return await anext(self.gen) | |
2025-06-17 14:59:43,137 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args | |
2025-06-17 14:59:43,137 - __main__ - INFO - async_llm = AsyncLLM.from_vllm_config( | |
2025-06-17 14:59:43,137 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config | |
2025-06-17 14:59:43,137 - __main__ - INFO - return cls( | |
2025-06-17 14:59:43,137 - __main__ - INFO - ^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__ | |
2025-06-17 14:59:43,137 - __main__ - INFO - self.engine_core = EngineCoreClient.make_async_mp_client( | |
2025-06-17 14:59:43,137 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client | |
2025-06-17 14:59:43,137 - __main__ - INFO - return AsyncMPClient(vllm_config, executor_class, log_stats, | |
2025-06-17 14:59:43,137 - __main__ - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__ | |
2025-06-17 14:59:43,137 - __main__ - INFO - super().__init__( | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__ | |
2025-06-17 14:59:43,137 - __main__ - INFO - self._init_engines_direct(vllm_config, local_only, | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct | |
2025-06-17 14:59:43,137 - __main__ - INFO - self._wait_for_engine_startup(handshake_socket, input_address, | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup | |
2025-06-17 14:59:43,137 - __main__ - INFO - wait_for_engine_startup( | |
2025-06-17 14:59:43,137 - __main__ - INFO - File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup | |
2025-06-17 14:59:43,137 - __main__ - INFO - raise RuntimeError("Engine core initialization failed. " | |
2025-06-17 14:59:43,137 - __main__ - INFO - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} | |
2025-06-17 14:59:43,574 - __main__ - WARNING - Attempt 99: Please wait for vllm server to become ready... | |
2025-06-17 14:59:43,964 - __main__ - WARNING - VLLM server task ended | |
2025-06-17 14:59:43,964 - __main__ - ERROR - Ended up starting the vllm server more than 5 times, cancelling pipeline | |
2025-06-17 14:59:43,964 - __main__ - ERROR - | |
2025-06-17 14:59:43,964 - __main__ - ERROR - Please make sure vllm is installed according to the latest instructions here: https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html | |
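Note: the failure above is not an installation problem. The root cause, identical in every restart cycle, is insufficient free GPU memory: the device reports 7.33 GiB free of 31.36 GiB, while gpu_memory_utilization=0.8 asks vLLM to reserve 0.8 x 31.36 = 25.09 GiB. Something else on the machine is already holding roughly 24 GiB of the GPU. A minimal sketch of how to size the utilization flag from the actually free memory, using the standard torch.cuda.mem_get_info() API (the ~1 GiB headroom for CUDA context overhead is an assumption, not an olmocr value):

    import torch

    # Free / total memory in bytes on the current CUDA device.
    free_b, total_b = torch.cuda.mem_get_info()
    free_gib, total_gib = free_b / 2**30, total_b / 2**30
    print(f"free {free_gib:.2f} GiB of {total_gib:.2f} GiB")

    # vLLM reserves gpu_memory_utilization * total, so that reservation must
    # fit in the memory that is actually free. Keep ~1 GiB of headroom for
    # the CUDA context and allocator overhead (assumption).
    safe_util = max(free_gib - 1.0, 0.0) / total_gib
    print(f"largest workable gpu_memory_utilization ~= {safe_util:.2f}")

With the numbers in this log that yields roughly 0.20, i.e. about 6.3 GiB, which is far too little to load a 7B bfloat16 model (weights alone need ~15 GiB). So lowering the flag cannot rescue this run; the realistic fix is to free the GPU first (nvidia-smi shows which process holds the other ~24 GiB) and then rerun with the default settings.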
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x785df8056980> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc | |
proc.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate | |
self._transport.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate | |
self._check_proc() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc | |
raise ProcessLookupError() | |
ProcessLookupError: | |
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x785df9c9fc40> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc | |
proc.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate | |
self._transport.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate | |
self._check_proc() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc | |
raise ProcessLookupError() | |
ProcessLookupError: | |
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x785e217bbd80> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc | |
proc.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate | |
self._transport.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate | |
self._check_proc() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc | |
raise ProcessLookupError() | |
ProcessLookupError: | |
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x785df9c9cfe0> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc | |
proc.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate | |
self._transport.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate | |
self._check_proc() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc | |
raise ProcessLookupError() | |
ProcessLookupError: | |
Exception ignored in atexit callback: <function vllm_server_task.<locals>._kill_proc at 0x785e217dc040> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 596, in _kill_proc | |
proc.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/subprocess.py", line 143, in terminate | |
self._transport.terminate() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 149, in terminate | |
self._check_proc() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc | |
raise ProcessLookupError() | |
ProcessLookupError: | |
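The five ProcessLookupError tracebacks above are shutdown noise rather than additional failures: each vllm server subprocess had already exited after its engine core failed to start, so when the atexit _kill_proc hooks call proc.terminate(), asyncio's _check_proc finds no live process and raises ProcessLookupError. A hedged sketch of the guard that would silence this, assuming the same asyncio subprocess handle (hypothetical code, not olmocr's actual pipeline.py):

    import asyncio

    def _kill_proc(proc: asyncio.subprocess.Process) -> None:
        # Terminate the vllm server subprocess, tolerating one that has
        # already died (the common case when engine startup fails).
        try:
            proc.terminate()
        except ProcessLookupError:
            # Process is already gone; nothing left to clean up.
            pass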
ERROR:asyncio:Task exception was never retrieved | |
future: <Task finished name='Task-2' coro=<vllm_server_host() done, defined at /home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py:670> exception=SystemExit(1)> | |
Traceback (most recent call last): | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run | |
return runner.run(main) | |
^^^^^^^^^^^^^^^^ | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run | |
return self._loop.run_until_complete(task) | |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete | |
self.run_forever() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever | |
self._run_once() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once | |
handle._run() | |
File "/home/hongbo-miao/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run | |
self._context.run(self._callback, *self._args) | |
File "/home/hongbo-miao/hongbomiao.com/machine-learning/hm-olmocr/.venv/lib/python3.11/site-packages/olmocr/pipeline.py", line 685, in vllm_server_host | |
sys.exit(1) | |
SystemExit: 1 |