Note: this was originally a Federal Communications Commission challenge at the PDF Liberation Hackathon. The original information can be found here: https://github.com/pdfliberation/pdf-hackathon/blob/master/challenges/fcc-daily-releases.md
"As part of regular business process, the Federal Communications Commission writes and releases many documents. These documents are public notices, rule-makings, proposed rules and many other prose based discussions of technical issues relating to spectrum, broadcasting, broadband, media and other communications issues. In general the legal industry has a need for these documents to not only contain the proper history, content and technical discussions, but also contain standard formatting that the legal industry has developed. This combination of content and formatting fundamentally requires the FCC to release PDF documents. These documents result in less than desirable search, retrieval and display."
Example: http://transition.fcc.gov/Daily_Releases/Daily_Business/2013/db1220/DA-13-2423A1.pdf
Joshua Snyder, https://github.com/jsnyderjsnyderper
The current prototype built during the PDF Liberation Hackathon can handle some electronically-filed forms using the ABBYY Cloud OCR API, a paid cloud OCR service.