Skip to content

Instantly share code, notes, and snippets.

@dannguyen
dannguyen / faa-333-pdf-gathering.md
Last active June 19, 2021 13:18
Using wget + grep to explore inconveniently organized federal data (FAA Section 333 Exemptions)

if !database: wget + grep

The Federal Aviation Administration is posting PDFs of the Section 333 exemptions that it grants, i.e. the exemptions for operators who want to fly drones commercially before the FAA finishes its rulemaking. A journalist wanted to look for exemptions granted to operators in a given U.S. state. But the FAA doesn't appear to have an easy-to-read data file to use and doesn't otherwise list exemptions by location of operator.

However, since their exemptions page is just one giant HTML table for listing the PDFs, we can just use wget to fetch all the PDFs, run pdftotext on each file, and then [grep](https://medium.com/@rualthanzauva/grep-was-a-private-command-of-m