Hongbo Miao hongbo-miao

uv run python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf --model allenai/olmOCR-7B-0225-preview-FP8
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 21:14:55,004 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 21:14:55,004 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 21:14:55,004 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 552.46it/s]
2025-06-17 21:14:55,007 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 21:14:55,163 - __main__ - INFO - Starting pipeline with PID 3059549
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 14:58:03,862 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 14:58:03,862 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 14:58:03,862 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 337.81it/s]
2025-06-17 14:58:03,866 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 14:58:03,963 - __main__ - INFO - Starting pipeline with PID 2452147
2025-06-17 14:58:03,963 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-17 14:46:27,683 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-17 14:46:27,683 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-17 14:46:27,683 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 491.42it/s]
2025-06-17 14:46:27,686 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-06-17 14:46:27,796 - __main__ - INFO - Starting pipeline with PID 2425548
2025-06-17 14:46:27,796 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-
hongbo-miao / gist:fe51beaa5faa2477ddb72c42e1914d96
Last active June 14, 2025 06:35
olmocr log when run on an NVIDIA GeForce RTX 5090 GPU
root@2fdffe8b8e20:~# python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf
INFO:olmocr.check:pdftoppm is installed and working.
2025-06-14 06:27:39,378 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-14 06:27:39,378 - __main__ - INFO - Loading file at olmocr-sample.pdf as PDF document
2025-06-14 06:27:39,378 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 530.66it/s]
2025-06-14 06:27:39,381 - __main__ - INFO - Calculated items_per_group: 166 based on average pages per PDF: 3.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:235: UserWarning:
hongbo-miao / uv.lock
Created January 31, 2025 09:48
mineru uv.lock
version = 1
requires-python = ">=3.12.0, <3.13"
resolution-markers = [
"platform_system == 'Windows' and sys_platform == 'win32'",
"platform_system == 'Windows' and sys_platform != 'win32'",
"platform_machine == 'aarch64' and platform_system == 'Linux'",
"platform_machine != 'aarch64' and platform_system == 'Linux'",
"platform_machine == 'arm64' and platform_system == 'Darwin'",
"platform_machine != 'arm64' and platform_system == 'Darwin'",
"platform_system != 'Darwin' and platform_system != 'Linux' and platform_system != 'Windows' and sys_platform == 'win32'",
ts=2025-01-30T08:59:35.298859187Z level=info "boringcrypto enabled"=false
ts=2025-01-30T08:59:35.29786316Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/[email protected]/memlimit/memlimit.go:170 msg="memory is not limited, skipping" package=github.com/KimMachineGun/automemlimit/memlimit
ts=2025-01-30T08:59:35.298890774Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2025-01-30T08:59:35.298894656Z level=info msg="running usage stats reporter"
ts=2025-01-30T08:59:35.298896901Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=env
ts=2025-01-30T08:59:35.2989009Z level=warn msg="this stdlib function is deprecated; please refer to the documentation for updated usage and alternatives" controller_path=/ controller_id="" function=env
ts=2025-01-30T08:59:35.298903588Z level=info msg="starting complete graph evaluation" controller_pa
This file has been truncated.
DEBUG: Using RE2 regex engine
DEBUG: Parsing configs
DEBUG: Checking for config file in /runner/renovate/job_config.json
DEBUG: Detected config in env RENOVATE_CONFIG
{
"config": {
"extends": [
"mergeConfidence:all-badges"
],
"prFooter": "This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/{{platform}}/{{repository}}).",
hongbo-miao / gist:b10b9785997e6078b9290cb30af5ccf2
Last active October 15, 2024 21:39
LiteLLM log for Continue
21:18:26 - LiteLLM Proxy:DEBUG: litellm_pre_call_utils.py:195 - Request Headers: Headers({'host': 'litellm.example.com', 'user-agent': 'node-fetch', 'content-length': '3883', 'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'api-key': 'anything', 'authorization': 'Bearer anything', 'content-type': 'application/json', 'x-forwarded-for': '172.31.191.224', 'x-forwarded-host': 'litellm.example.com', 'x-forwarded-port': '443', 'x-forwarded-proto': 'https', 'x-forwarded-server': 'horizon-traefik-7765cbd49c-cm5n6', 'x-real-ip': '172.31.191.224'})
21:18:26 - LiteLLM Proxy:DEBUG: litellm_pre_call_utils.py:201 - receiving data: {'model': 'claude-3-5-sonnet', 'max_tokens': 2048, 'temperature': 0.01, 'stream': True, 'stop': ['</COMPLETION>', '\n\n', '\r\n\r\n', '/src/', '#- coding: utf-8', '```', '\ndef', '\nclass', '\n"""#'], 'prompt': 'You are a HOLE FILLER. You are provided with a file containing holes, formatted as \'{{HOLE_NAME}}\'. Your TASK is to complete with a string to replace this hol
hongbo-miao / gist:03b3bb1dd9585d185611e4b848123df6
Created October 7, 2024 23:32
LiteLLM bug: Conversation blocks and tool result blocks cannot be provided in the same turn.
This file has been truncated.
22:34:16 - LiteLLM Proxy:DEBUG: proxy_server.py:3113 - Request received by LiteLLM:
{
"model": "claude-3-opus",
"messages": [
{
"role": "system",
"content": "You are Claude Dev, a highly skilled software developer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.\n\n====\n \nCAPABILITIES\n\n- You can read and analyze code in various programming languages, and can write clean, efficient, and well-documented code.\n- You can debug complex issues and providing detailed explanations, offering architectural insights and design patterns.\n- You have access to tools that let you execute CLI commands on the user's computer, list files, view source code definitions, regex search, inspect websites, read and write files, and ask follow-up questions. These tools help you effectively accomplish a wide range of tasks, such as writing code, making edits or improvements to existing files, understanding the current state of
hongbo-miao / gist:8577107aba2db2cff0b577cede63e12b
Created October 4, 2024 04:23
LiteLLM error log: Conversation blocks and tool result blocks cannot be provided in the same turn.
This file has been truncated.
04:20:27 - LiteLLM Proxy:DEBUG: proxy_server.py:3122 - Request received by LiteLLM:
{
"model": "claude-3-opus",
"messages": [
{
"role": "system",
"content": "You are Claude Dev, a highly skilled software developer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.\n\n====\n \nCAPABILITIES\n\n- You can read and analyze code in various programming languages, and can write clean, efficient, and well-documented code.\n- You can debug complex issues and providing detailed explanations, offering architectural insights and design patterns.\n- You have access to tools that let you execute CLI commands on the user's computer, list files, view source code definitions, regex search, inspect websites, read and write files, and ask follow-up questions. These tools help you effectively accomplish a wide range of tasks, such as writing code, making edits or improvements to existing files, understanding the current state of