Recently, I wanted to find the textual differences between two PDFs, in the same way that you would compare plain text files with Git/GitHub. I wanted the nice side-by-side view too, not just the Git diff terminal output.
Edit: After writing this gist, I realized that https://www.diffchecker.com/pdf-compare/ pretty much does what I want. This was still a fun experiment though.
I tried a few different ways of extracting the text (Tesseract OCR, copying and pasting all the text by hand), but eventually I found the best solution for me was a tool called textract, which uses pdftotext under the hood. This did the best job: it didn't produce weird misread symbols like OCR did, and it didn't pick up the extra nonsense that copy and paste dragged along.
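To make the idea concrete, here is a minimal sketch of the extract-then-diff approach in Python. It assumes the `textract` Python package and the `pdftotext` binary are installed, and it uses the standard library's `difflib.HtmlDiff` for the side-by-side view, which is my stand-in here and not necessarily the viewer used elsewhere in this gist. The function names (`extract_text`, `side_by_side_html`, `write_pdf_diff`) and the `diff.html` output path are made up for illustration.

```python
import difflib


def extract_text(pdf_path):
    """Return the text of a PDF as a string via textract's pdftotext method."""
    # Imported lazily so the diff helper below works without textract installed.
    import textract

    # textract.process returns bytes, so decode before diffing.
    return textract.process(pdf_path, method="pdftotext").decode("utf-8")


def side_by_side_html(old_text, new_text, old_label="old", new_label="new"):
    """Build a standalone HTML page showing the two texts side by side."""
    differ = difflib.HtmlDiff(wrapcolumn=80)  # wrap long lines at 80 columns
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc=old_label,
        todesc=new_label,
    )


def write_pdf_diff(old_pdf, new_pdf, out_path="diff.html"):
    """Extract both PDFs and write the side-by-side diff to out_path."""
    html = side_by_side_html(
        extract_text(old_pdf), extract_text(new_pdf), old_pdf, new_pdf
    )
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(html)
```

Calling `write_pdf_diff("old.pdf", "new.pdf")` (hypothetical file names) should leave a `diff.html` that you can open in a browser. Keeping extraction separate from diffing means you can swap in another extractor without touching the diff step.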