Last active August 5, 2018 01:46
Gist: bachya/ffa07b7c1c0c9c90e51d
A method to scrape PDFs for certain text (via Hazel) and save the results to a text file
Aaron was here!
Aaron says hi!
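# File that collects the matching lines
output_file="output.txt"
# Text to look for in the PDF
text_to_search_for="Aaron"
# Convert the PDF ($1, passed in by Hazel) to text on stdout and keep matching lines
line=$(pdftotext "$1" - | grep "$text_to_search_for")
# Append only when something matched; >> creates the file if needed
if [ -n "$line" ]; then
  echo "$line" >> "$output_file"
fi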
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
Aaron was here!
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
Aaron says hi!
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
This is a bunch of stuff
The Bash script above takes an input file path (presumed to be a PDF, but that's up to Hazel), scans it for a given string, and, if the string is found, appends the matching text to a text file.
The base Hazel rule looks like this:
Conditions:
  Extension is pdf
Actions:
  Run shell script: embedded script (using the pdf-review.sh script above)
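
The search-and-append step of the script can also be written as a small function that reads text from standard input, which makes it easy to try without Hazel or a PDF on hand. This is only a sketch: the function name `append_matches` is hypothetical, and using `grep -F` (match the search text as a literal string rather than a pattern) is an assumption about what is wanted, not something the original script does.

```shell
# Hypothetical helper (name is illustrative, not part of the original script).
# Reads text on stdin and appends every line containing the literal string $1
# to the file named by $2. Returns non-zero and writes nothing if no line matches.
append_matches() {
  local needle="$1" out="$2" matches
  # -F treats the search text as a fixed string; "--" guards against a
  # search text that starts with a dash.
  matches=$(grep -F -- "$needle") || return 1
  # >> creates the output file if it does not exist yet.
  printf '%s\n' "$matches" >> "$out"
}

# Possible use inside the Hazel embedded script, where "$1" is the matched file:
#   pdftotext "$1" - | append_matches "Aaron" "output.txt"
```

Because the function only sees text on stdin, it can be checked with something like `printf 'Aaron was here!\n' | append_matches "Aaron" "output.txt"` before wiring it into the rule.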