jSayal · July 10, 2025 02:54
diff --git a/gistfile1.txt b/gistfile1.txt
 After multiple attempts to get a verbatim PDF-to-Markdown conversion, the following prompt worked well.
 Feel free to add or suggest improvements.

 ```
 **Objective:** Convert the attached PDF to Markdown **verbatim** without summarizing, omitting, or altering any content. Follow this exact workflow:

 #### **Phase 1: Document Analysis**  
 1. **Read Entire PDF**  
   - Process all pages sequentially. Do **not** skip pages or sections.  
   - Preserve every paragraph, table row, and code snippet - **no exceptions**.  

 2. **Structure Identification**  
   - Map chapters/sections to heading levels (`#` → `###`).  
   - Tag special elements with metadata:  
     ```markdown
     <!-- [TABLE] 4-column financial data -->  
     <!-- [CODE] Unidentified language (lines 12-25) -->  
     ```  

 3. **Noise Filtering**  
   - Auto-ignore repeating footers/headers (e.g., "Page 3 of 23").  
   - If uncertain, keep content but flag: `<!-- CHECK: Possible footer -->`.  

 #### **Phase 2: Strict 1:1 Conversion**  
 1. **Text Formatting**  
   - Bold/italics → `**text**`/`*text*`  
   - Lists: Maintain exact indentation (even if inconsistent in PDF).  

 2. **Tables**  
   - Convert **all rows** - never truncate.  
   - Use pipe syntax with alignment hints:  
     ```markdown
     | Header 1 | Header 2 |  
     |:---------|:---------|  
     | Row 1    | Data     |  <!-- Preserve empty cells! -->  
     ```  

 3. **Code Blocks**  
   - Use fences of at least three backticks, with the opening and closing fence each on its own line:  
     ````markdown  
     ```python  
     def example():  # Never join split code lines!  
         pass  
     ```  
     ````  
   - For unidentified languages:  
     ```  
     [UNKNOWN_LANGUAGE]  
     fn obscure_code() { ... }  
     ```  

 4. **Images/Figures**  
   - Placeholder + filename: `![Fig.3: Architecture Diagram](pdf_image_7.png)`  

 #### **Phase 3: Validation**  
 1. **Anti-Summarization Checks**  
   - Compare the word count of each original PDF paragraph against its Markdown output.  
   - If the discrepancy exceeds 5%, revert and flag: `<!-- LENGTH MISMATCH: p.14 paragraph 2 -->`.  

 2. **Ambiguity Protocol**  
   - For unclear formatting:  
     ```markdown  
     <!-- RAW_PDF_EXTRACT_START -->  
     [Strange   spacing]  
     <!-- RAW_PDF_EXTRACT_END -->  
     ```  

 #### **Phase 4: Delivery**  
 - Output **raw Markdown only** (no JSON/XML wrappers).  
 - Include this completion token: `<!-- CONVERSION_COMPLETE_VERBATIM -->`.  

 **Failure Modes That Void Approval:**  
 - Any summarized/joined paragraphs  
 - Truncated tables or code  
 - Unmarked language guesses
 ```
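
The Phase 3 and Phase 4 checks can also be run outside the model, on the returned Markdown. Below is a minimal Python sketch, assuming you already have the PDF paragraph text extracted by some other tool. The function names (`word_count`, `length_mismatch`, `has_completion_token`) are hypothetical and not part of the prompt. Only the 5% threshold, the "Page 3 of 23" footer pattern, and the completion token come from it.

```python
import re

# Completion token required by Phase 4 of the prompt.
COMPLETION_TOKEN = "<!-- CONVERSION_COMPLETE_VERBATIM -->"

# Matches standalone footer lines such as "Page 3 of 23" (Phase 1, Noise Filtering).
PAGE_FOOTER = re.compile(r"^[ \t]*Page \d+ of \d+[ \t]*$", re.MULTILINE)


def word_count(text: str) -> int:
    """Count whitespace-separated words, ignoring HTML comments and page footers."""
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    text = PAGE_FOOTER.sub(" ", text)
    return len(text.split())


def length_mismatch(pdf_paragraph: str, md_paragraph: str, tolerance: float = 0.05) -> bool:
    """Return True when the two word counts differ by more than `tolerance`."""
    original = word_count(pdf_paragraph)
    converted = word_count(md_paragraph)
    if original == 0:
        # Nothing to compare against: any converted text counts as a mismatch.
        return converted != 0
    return abs(original - converted) / original > tolerance


def has_completion_token(markdown: str) -> bool:
    """Check that the output contains the Phase 4 completion token."""
    return COMPLETION_TOKEN in markdown


if __name__ == "__main__":
    source = "The quick brown fox jumps over the lazy dog."
    faithful = "The **quick** brown fox jumps over the lazy dog."
    summarized = "The fox jumps."
    print(length_mismatch(source, faithful))    # inline emphasis does not change the count
    print(length_mismatch(source, summarized))  # a summarized paragraph is flagged
```

Note that this counts plain whitespace-separated tokens, so Markdown markers that stand alone (a heading `#`, table pipes) count as words. It suits paragraph text best; tables and headings would need their markup stripped first.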