pmarreck/d935034385218e428d2ce40b670108c4
Created April 2, 2025 02:11
An OCR system prompt for an LLM
// System prompt for LLM-based OCR: asks the model to transcribe a page into Markdown,
// with tables as HTML and logos, watermarks, and page numbers wrapped in tags.
export const OCR_SYSTEM_PROMPT = `
Convert the following document to markdown.
Return only the markdown with no explanation text. Do not include delimiters like \`\`\`markdown or \`\`\`html.
RULES:
- You must include all information on the page. Do not exclude headers, footers, charts, infographics, or subtext.
- Return tables in HTML format.
- Logos should be wrapped in <logo> tags. Ex: <logo>Coca-Cola</logo>
- Watermarks should be wrapped in <watermark> tags. Ex: <watermark>OFFICIAL COPY</watermark>
- Page numbers should be wrapped in <page_number> tags. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>
- Prefer using ☐ and ☑ for check boxes.
`;
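Because the prompt asks the model to mix tagged metadata into its Markdown, downstream code usually has to separate the two. The sketch below is one hypothetical way to do that; `parseOcrOutput`, `extractTag`, `stripCodeFence`, and `PageMetadata` are illustrative names, not part of the original gist. It assumes the model emits `<tag>value</tag>`, but it also accepts the unclosed `<tag>value<tag>` form, and it strips a stray ```` ```markdown ```` fence in case the model ignores that instruction.

```typescript
// Hypothetical post-processor for output produced under OCR_SYSTEM_PROMPT.
// Pulls tagged metadata out of the Markdown and returns it separately.

interface PageMetadata {
  pageNumbers: string[];
  logos: string[];
  watermarks: string[];
  body: string; // Markdown with the metadata tags removed
}

const TAGS = ["page_number", "logo", "watermark"] as const;
type Tag = (typeof TAGS)[number];

// Collects every <tag>value</tag> (or unclosed <tag>value<tag>) and
// returns the values plus the text with those spans removed.
function extractTag(text: string, tag: Tag): { values: string[]; rest: string } {
  // Non-greedy match; [\s\S] lets a value span multiple lines.
  const pattern = new RegExp(`<${tag}>([\\s\\S]*?)<\\/?${tag}>`, "g");
  const values: string[] = [];
  const rest = text.replace(pattern, (_match: string, value: string) => {
    values.push(value.trim());
    return "";
  });
  return { values, rest };
}

// The prompt forbids ```markdown / ```html fences; remove one if present anyway.
function stripCodeFence(text: string): string {
  const fenced = text.trim().match(/^```(?:markdown|html)?\n([\s\S]*?)\n```$/);
  return fenced?.[1] ?? text;
}

function parseOcrOutput(raw: string): PageMetadata {
  let body = stripCodeFence(raw);
  const found: Record<Tag, string[]> = { page_number: [], logo: [], watermark: [] };
  for (const tag of TAGS) {
    const { values, rest } = extractTag(body, tag);
    found[tag] = values;
    body = rest;
  }
  return {
    pageNumbers: found.page_number,
    logos: found.logo,
    watermarks: found.watermark,
    body: body.trim(),
  };
}

// Usage with a made-up model response:
const sample = [
  "<logo>Coca-Cola</logo>",
  "# Annual Report",
  "☑ Reviewed",
  "<page_number>9/22</page_number>",
].join("\n");
const parsed = parseOcrOutput(sample);
console.log(parsed.logos, parsed.pageNumbers, parsed.body);
```

For this sample, `logos` is `["Coca-Cola"]`, `pageNumbers` is `["9/22"]`, and `body` is the heading followed by the checkbox line. A regex approach is enough for flat, non-nested tags like these. If the model ever nests tags or writes attributes on them, a real HTML/XML parser would be the safer choice.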