@JoaoCarabetta
Created February 2, 2020 21:45
From request to pdf to string
import requests
# Download the PDF; fail early on an HTTP error instead of saving an error page
res = requests.get('https://www.camara.leg.br/proposicoesWeb/prop_mostrarintegra?codteor=938381&filename=PL+2699/2011')
res.raise_for_status()
# To PDF
with open('metadata.pdf', 'wb') as f:
    f.write(res.content)
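# To string (tika needs a Java runtime; it starts a local Tika server on first use)
from tika import parser
rawText = parser.from_file('metadata.pdf')
# 'content' can be None when Tika extracts no text, so fall back to an empty string
rawList = (rawText['content'] or '').splitlines()
# Print the extracted text with blank lines removed
print('\n'.join(r for r in rawList if r != ''))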