@bachya
Last active August 5, 2018 01:46
A method to scrape PDFs for certain text (via Hazel) and save the results to a text file
Aaron was here!
Aaron says hi!
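# pdf-review.sh: Hazel passes the path of the matched file as $1.
output_file="output.txt"
text_to_search_for="Aaron"
# Dump the PDF's text to stdout and keep only the lines containing the search string.
line=$(pdftotext "$1" - | grep "$text_to_search_for")
if [ -n "$line" ]; then
  # Quote the variables so multiple matches stay on separate lines;
  # >> creates the file when it does not exist, so no touch is needed.
  printf '%s\n' "$line" >> "$output_file"
fi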
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
Aaron was here!
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
Aaron says hi!
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff

bachya commented Sep 9, 2015

The Bash script above takes an input file path (presumed to be a PDF, though enforcing that is up to Hazel), scans the file's text for a given string, and, if the string is found, appends every matching line to a text file.
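To try the search-and-append step without a PDF or Hazel, here is a minimal sketch that reads text from stdin instead. The function name `append_matching_lines` and the temporary `demo_file` are hypothetical names used for illustration; they are not part of the gist. It also assumes a literal (fixed-string) match via `grep -F`, which is one possible reading of "a certain string".

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): reads text on stdin and
# appends every line containing the literal string $1 to the file named by $2.
append_matching_lines() {
  local needle="$1" output_file="$2" matches
  # -F treats the needle as a literal string; -- guards against a leading dash.
  # "|| true" keeps a no-match result from counting as a failure.
  matches=$(grep -F -- "$needle" || true)
  # Only write when something matched, so the output file is untouched otherwise.
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches" >> "$output_file"
  fi
}

# Plain text stands in here for the output of: pdftotext "$1" -
demo_file=$(mktemp)
printf '%s\n' \
  "This is a bunch of stuff" \
  "Aaron was here!" \
  "This is a bunch of stuff" \
  "Aaron says hi!" \
  | append_matching_lines "Aaron" "$demo_file"

cat "$demo_file"
# Aaron was here!
# Aaron says hi!
```

In a Hazel rule the same helper could presumably be fed by `pdftotext "$1" - | append_matching_lines "Aaron" output.txt`.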

The base Hazel rule looks like this:

Conditions:

  1. Extension is pdf

Actions:

  1. Run shell script: embedded script (using the pdf-review.sh script above)
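When the script runs outside Hazel, nothing enforces the "Extension is pdf" condition. A rough stand-in for it could look like the sketch below. The function name `has_pdf_extension` is hypothetical, and the case-insensitive match is an assumption, not a statement about how Hazel compares extensions.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): succeeds when the path ends in
# ".pdf", ignoring case. This only approximates Hazel's "Extension is pdf".
has_pdf_extension() {
  case "$1" in
    *.[Pp][Dd][Ff]) return 0 ;;
    *) return 1 ;;
  esac
}

for path in "report.pdf" "SCAN.PDF" "notes.txt"; do
  if has_pdf_extension "$path"; then
    echo "$path: would be processed"
  else
    echo "$path: skipped"
  fi
done
# report.pdf: would be processed
# SCAN.PDF: would be processed
# notes.txt: skipped
```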
