Goal...
- I want a user to be able to perform a search on a word -- and see a list of PDFs that contain the search-word.
- If the user clicks on one of the PDFs in the result-list, I want the PDF to display with the first search-word highlighted.
Context...
- I'm using the Django web-framework.
- I use views.py functions, not classes.
- I have, stored in the solr-index, the PDF-filepath, the alto-xml (containing the PDF words and coordinates), and the full-text of the PDF.
What would my django code look like to handle that second-step -- to display a selected PDF with the first word highlighted?
To achieve your goal, you can follow these steps:
- Install required packages
- Create a view for rendering the PDF with the highlighted word
- Create a URL pattern for the new view
- Add a template to display the PDF with the highlighted word
Here's an outline of the code you would need to accomplish this:
Install PyPDF2 and reportlab to handle PDFs in Django:
pip install PyPDF2 reportlab
In your views.py:
import io
import os

from django.http import FileResponse
from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas


def display_pdf(request, pdf_filepath, search_word):
    # pdf_filepath comes from the URL; check it against the path stored in solr before opening it
    reader = PdfReader(pdf_filepath)
    writer = PdfWriter()

    # Placeholders: replace with the page and box of the first match taken from the alto-xml
    first_match_page = 0  # 0-based index of the page containing the first match
    x, y = 100, 200  # lower-left corner of the word, in PDF points
    width, height = len(search_word) * 6, 12

    for page_number, page in enumerate(reader.pages):
        if page_number == first_match_page:
            # Draw a translucent yellow rectangle on a one-page overlay the same size as this page
            page_width = float(page.mediabox.width)
            page_height = float(page.mediabox.height)
            overlay = io.BytesIO()
            c = canvas.Canvas(overlay, pagesize=(page_width, page_height))
            c.setFillColorRGB(1, 1, 0, alpha=0.4)  # translucent, so the original text stays readable
            c.rect(x, y, width, height, fill=1, stroke=0)
            c.save()
            overlay.seek(0)
            # Stamp the overlay onto the original page
            page.merge_page(PdfReader(overlay).pages[0])
        writer.add_page(page)

    # Write the highlighted PDF to memory and serve it
    output = io.BytesIO()
    writer.write(output)
    output.seek(0)
    response = FileResponse(output, content_type="application/pdf")
    response["Content-Disposition"] = f'inline; filename="{os.path.basename(pdf_filepath)}"'
    return response
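The view leaves the coordinates as placeholders, so here is a minimal sketch of how they might be pulled from the alto-xml using only the standard library. It assumes each word is a String element carrying CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, that each Page element carries WIDTH and HEIGHT in the same units, and that a whole-word, case-insensitive match is what you want. The function names find_first_word_box and alto_box_to_pdf_rect are hypothetical, not part of any library. ALTO measures from the top-left corner while PDF measures from the bottom-left, so the y value has to be flipped.

```python
import string
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix so the lookup works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def find_first_word_box(alto_xml, search_word):
    """Return the first matching word's box in ALTO units, or None if absent."""
    root = ET.fromstring(alto_xml)
    pages = [el for el in root.iter() if local_name(el.tag) == "Page"]
    target = search_word.casefold()
    for page_index, page in enumerate(pages):
        for el in page.iter():
            if local_name(el.tag) != "String":
                continue
            # Ignore surrounding punctuation such as a trailing comma
            word = el.get("CONTENT", "").strip(string.punctuation).casefold()
            if word == target:
                return {
                    "page_index": page_index,
                    "hpos": float(el.get("HPOS")),
                    "vpos": float(el.get("VPOS")),
                    "width": float(el.get("WIDTH")),
                    "height": float(el.get("HEIGHT")),
                    "page_width": float(page.get("WIDTH")),
                    "page_height": float(page.get("HEIGHT")),
                }
    return None


def alto_box_to_pdf_rect(box, pdf_width, pdf_height):
    """Scale an ALTO box to PDF points and flip the y axis (top-left to bottom-left)."""
    scale_x = pdf_width / box["page_width"]
    scale_y = pdf_height / box["page_height"]
    x = box["hpos"] * scale_x
    width = box["width"] * scale_x
    height = box["height"] * scale_y
    # ALTO's VPOS is the top edge; reportlab wants the bottom edge measured from the page bottom
    y = pdf_height - (box["vpos"] + box["height"]) * scale_y
    return x, y, width, height
```

In display_pdf, the returned page_index would stand in for first_match_page, and the tuple from alto_box_to_pdf_rect (called with that page's mediabox width and height) would stand in for x, y, width and height.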
In your urls.py:
from django.urls import path
from . import views
urlpatterns = [
    # ... other URL patterns ...
    # The path converter is needed because a filepath contains slashes, which str does not match
    path("pdf/<path:pdf_filepath>/<str:search_word>/", views.display_pdf, name="display_pdf"),
]
Create a new HTML template file called pdf_viewer.html in your templates folder. Whichever view renders it needs to pass pdf_filepath and search_word in the template context:
{% load static %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>PDF Viewer</title>
</head>
<body>
<embed src="{% url 'display_pdf' pdf_filepath=pdf_filepath search_word=search_word %}" type="application/pdf" width="100%" height="100%">
</body>
</html>
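Now, when a user clicks on a PDF in the search results, the PDF should be displayed with the first occurrence of the search word highlighted. The coordinates in the view are placeholders: replace them with the page and position of the first match taken from the alto-xml. ALTO measures positions from the top-left corner of the page, while PDF measures from the bottom-left, so scale the values to PDF points and flip the y-coordinate before drawing.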