@birkin
Last active April 11, 2023 21:28
asks chatGPT for some django-alto advice.

Prompt

Goal...

  • I want a user to be able to perform a search on a word -- and see a list of PDFs that contain the search-word.
  • If the user clicks on one of the PDFs in the result-list, I want the PDF to display with the first search-word highlighted.

Context...

  • I'm using the Django web-framework.
  • I use views.py functions, not classes.
  • I have, stored in the solr-index, the PDF-filepath, the alto-xml (containing the PDF words and coordinates), and the full-text of the PDF.

What would my django code look like to handle that second-step -- to display a selected PDF with the first word highlighted?


Response

To achieve your goal, you can follow these steps:

  1. Install required packages
  2. Create a view for rendering the PDF with the highlighted word
  3. Create a URL pattern for the new view
  4. Add a template to display the PDF with the highlighted word

Here's an outline of the code you would need to accomplish this:

Step 1: Install required packages

Install PyPDF2 and reportlab to handle PDFs in Django:

pip install PyPDF2 reportlab

Step 2: Create a view for rendering the PDF with the highlighted word

In your views.py:

import io
import os
from django.http import FileResponse
from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas

def display_pdf(request, pdf_filepath, search_word):
    # Load the PDF (pdf_filepath comes from the URL, so validate it before opening)
    reader = PdfReader(pdf_filepath)
    writer = PdfWriter()

    # Placeholders: replace with the page and coordinates of the first match from the alto-xml
    target_page = 0  # Index of the page where the search word first appears
    x = 100  # x-coordinate in PDF points, measured from the left edge
    y = 200  # y-coordinate in PDF points, measured from the bottom edge

    # Draw the highlight on an in-memory, one-page overlay the same size as the target page
    box = reader.pages[target_page].mediabox
    overlay_buffer = io.BytesIO()
    c = canvas.Canvas(overlay_buffer, pagesize=(float(box.width), float(box.height)))
    c.setFillColorRGB(1, 1, 0, alpha=0.4)  # Semi-transparent yellow so the text stays readable
    c.rect(x, y, len(search_word) * 6, 12, fill=1, stroke=0)  # Rough box over the search word
    c.save()
    overlay_buffer.seek(0)
    overlay_page = PdfReader(overlay_buffer).pages[0]

    # Copy every page, merging the overlay onto the target page only
    for page_number, page in enumerate(reader.pages):
        if page_number == target_page:
            page.merge_page(overlay_page)
        writer.add_page(page)

    # Write the result to memory instead of a shared temp file
    output = io.BytesIO()
    writer.write(output)
    output.seek(0)

    # Serve the highlighted PDF
    response = FileResponse(output, content_type="application/pdf")
    response["Content-Disposition"] = f'inline; filename="{os.path.basename(pdf_filepath)}"'

    return response
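
The x and y placeholders in the view above have to come from the alto-xml, which measures positions from the top-left corner of the page, while PDF coordinates start at the bottom-left. Here is a minimal sketch of that conversion. The function name alto_box_to_pdf_rect is hypothetical, and it assumes the ALTO page and the PDF page cover the same area, so the ratio of their sizes can serve as the scale whatever the ALTO measurement unit is:

```python
def alto_box_to_pdf_rect(hpos, vpos, width, height,
                         alto_page_width, alto_page_height,
                         pdf_page_width, pdf_page_height):
    """Convert an ALTO word box (top-left origin) to a PDF rectangle (bottom-left origin).

    Returns (x, y, w, h) in PDF points, in the argument order canvas.rect() expects.
    Assumption: the ALTO page and the PDF page describe the same area.
    """
    # Scale factors from ALTO units to PDF points, derived from the two page sizes
    scale_x = pdf_page_width / alto_page_width
    scale_y = pdf_page_height / alto_page_height

    x = hpos * scale_x
    w = width * scale_x
    h = height * scale_y
    # ALTO measures VPOS down from the top edge; PDF measures y up from the bottom edge,
    # so the rectangle's lower edge is the page height minus the word's bottom position
    y = pdf_page_height - (vpos + height) * scale_y
    return x, y, w, h
```

You could then pass the returned values straight to c.rect(x, y, w, h, fill=1, stroke=0) instead of estimating the width from len(search_word).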

Step 3: Create a URL pattern for the new view

In your urls.py:

from django.urls import path
from . import views

urlpatterns = [
    # ... other URL patterns ...
    # The path converter accepts slashes, which a file path will contain; str does not
    path("pdf/<path:pdf_filepath>/<str:search_word>/", views.display_pdf, name="display_pdf"),
]
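
Because this URL pattern hands a file path from the URL to a view that opens it, you will probably want to restrict which files can be reached. The following is a sketch only: resolve_pdf_path is a hypothetical helper, and the idea of a single base directory holding all PDFs is an assumption about your setup:

```python
from pathlib import Path

def resolve_pdf_path(base_dir, requested):
    """Return the absolute path of `requested` inside `base_dir`, or raise ValueError.

    Assumption: every servable PDF lives somewhere under one base directory.
    """
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    # Reject anything that escapes the base directory, such as "../x.pdf" or "/etc/passwd"
    if base not in candidate.parents:
        raise ValueError(f"path is outside the PDF directory: {requested!r}")
    # Only serve files that at least look like PDFs
    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file: {requested!r}")
    return candidate
```

In the view, you could call this before opening the file and return a 404 response when it raises ValueError.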

Step 4: Add a template to display the PDF with the highlighted word

Create a new HTML template file called pdf_viewer.html in your templates folder:

{% load static %}
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF Viewer</title>
</head>
<body>
    <embed src="{% url 'display_pdf' pdf_filepath=pdf_filepath search_word=search_word %}" type="application/pdf" width="100%" height="100%">
</body>
</html>

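Now, when a user clicks on a PDF in the search results, the PDF should be displayed with the first occurrence of the search word highlighted. For that to work, the link in the search results needs to point at a view that renders pdf_viewer.html with the file path and the search word in its template context, and the placeholder coordinates need to be replaced with values extracted from the alto-xml for your specific documents.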
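
As a starting point for the alto-xml extraction, here is a hedged sketch using only the standard library. The function name first_match_box is hypothetical. It assumes each word is an ALTO String element carrying CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, and that each Page element carries WIDTH and HEIGHT. Some ALTO versions make these optional, so check your own files:

```python
import xml.etree.ElementTree as ET

def first_match_box(alto_xml, search_word):
    """Return the position of the first ALTO String matching search_word, or None.

    Assumption: String elements carry CONTENT, HPOS, VPOS, WIDTH and HEIGHT.
    """
    root = ET.fromstring(alto_xml)
    target = search_word.casefold()
    page_index = -1
    page_width = page_height = None

    # iter() walks the tree in document order, so each Page is seen before its words
    for element in root.iter():
        # Drop any "{namespace}" prefix so the code does not depend on the ALTO version
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "Page":
            page_index += 1
            width, height = element.get("WIDTH"), element.get("HEIGHT")
            page_width = float(width) if width is not None else None
            page_height = float(height) if height is not None else None
        elif tag == "String":
            # Compare case-insensitively and ignore punctuation attached to the word
            content = element.get("CONTENT", "").strip(".,;:!?\"'()")
            if content.casefold() == target:
                return {
                    "page_index": page_index,
                    "hpos": float(element.get("HPOS")),
                    "vpos": float(element.get("VPOS")),
                    "width": float(element.get("WIDTH")),
                    "height": float(element.get("HEIGHT")),
                    "page_width": page_width,
                    "page_height": page_height,
                }
    return None
```

The returned page_index tells the view which page to highlight, and the remaining values are in ALTO units, so they still need converting to PDF points before drawing.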
