The following script, given someone's last name, prints a CSV of financial disclosure PDFs (the first 20, for simplicity's sake) as found on the House Financial Disclosure Reports site. It's meant to be a proof of concept of how to scrape ASPX (and other "stateful") websites using plain old requests -- without too much inconvenience -- rather than resorting to something heavy like the Selenium WebDriver.
The search page can be found here: http://clerk.house.gov/public_disc/financial-search.aspx
Here's a screenshot of what it does when you search via web browser:
I've attached an example of what the form data payload looks like to this gist: sample-form-data-post-response-txt
You can download the script attached to this gist and run it like this -- the first argument is presumed to be the "last name":
$ python house-public-disc-simple-search.py 'king'
The output looks like this:
Searching for last name of: `king` ...
name,office,filing_year,filing,url
"KING, HON.PETER T.",NY03,2007,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2008/8135888.pdf
"KING, HON.PETER T.",NY03,2007,FD Amendment,http://clerk.house.gov/public_disc/financial-pdfs/2008/8138301.pdf
"KING, HON.PETER T.",NY03,2008,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2009/8140250.pdf
"KING, HON.PETER T.",NY03,2009,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2010/8147290.pdf
"KING, HON.PETER T.",NY03,2010,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2011/8202610.pdf
"KING, HON.PETER T.",NY03,2011,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2012/8205535.pdf
"KING, HON.PETER T.",NY02,2012,FD Amendment,http://clerk.house.gov/public_disc/financial-pdfs/2013/8212617.pdf
"KING, HON.PETER T.",NY02,2012,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2013/8212125.pdf
"King, Mr.Peter T.",NY02,2013,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2013/10001239.pdf
"King, Hon.Peter T.",NY02,2014,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2014/10007122.pdf
"King, Hon.Peter T.",NY02,2015,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2015/10012142.pdf
"King, Hon.Peter T.",NY02,2016,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2016/10016910.pdf
"King, Hon.Peter T.",NY02,2017,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2017/10022302.pdf
"KING, HON.STEVE",IA05,2007,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2008/8135889.pdf
"KING, HON.STEVE",IA05,2008,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2009/8140601.pdf
"KING, HON.STEVE",IA05,2009,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2010/8147289.pdf
"KING, HON.STEVE",IA05,2010,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2011/8202611.pdf
"KING, HON.STEVE",IA05,2011,FD Original,http://clerk.house.gov/public_disc/financial-pdfs/2012/8206497.pdf
"KING, HON.STEVE",IA05,2011,Extension,http://clerk.house.gov/public_disc/financial-pdfs/2012/8206900.pdf
"KING, HON.STEVE",IA05,2011,Extension,http://clerk.house.gov/public_disc/financial-pdfs/2012/8206933.pdf
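For reference, output in this shape can be produced with Python's standard csv module. The following is only a minimal sketch, not the code from the attached script: the column names are copied from the header row above, while `rows_to_csv` and `sample_rows` are names made up for this illustration.

```python
import csv
import io

# Column names copied from the header row of the sample output.
FIELDNAMES = ["name", "office", "filing_year", "filing", "url"]


def rows_to_csv(rows):
    """Serialize an iterable of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One row taken from the sample output, used here as illustrative input.
sample_rows = [
    {
        "name": "KING, HON.STEVE",
        "office": "IA05",
        "filing_year": "2007",
        "filing": "FD Original",
        "url": "http://clerk.house.gov/public_disc/financial-pdfs/2008/8135889.pdf",
    },
]

csv_text = rows_to_csv(sample_rows)
print(csv_text, end="")
```

The csv module's default quoting only wraps fields that need it, which is why the name column (it contains a comma) is quoted and the other columns are not.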
(I've put most of the technical details as a clump of comments in the actual script below)
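For orientation, the general pattern for a stateful ASP.NET form can be sketched roughly as follows: fetch the search page, copy every hidden input (ASP.NET WebForms pages typically carry fields such as `__VIEWSTATE` and `__EVENTVALIDATION`) into the payload, add the visible search field, and POST it back within the same session. This is a sketch under assumptions, not the attached script: `HiddenInputCollector`, `hidden_fields`, `build_payload` and `search` are hypothetical helper names, and `last_name_field` stands in for the real input name, which has to be read from the page source or the browser's Form Data panel. The site may also expect other fields, such as the submit button's name.

```python
from html.parser import HTMLParser

SEARCH_URL = "http://clerk.house.gov/public_disc/financial-search.aspx"


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in the given HTML."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


def build_payload(html, user_fields):
    """Merge the page's hidden state fields with the fields we fill in."""
    payload = hidden_fields(html)
    payload.update(user_fields)
    return payload


def search(last_name, last_name_field):
    """Fetch the form, then POST it back with the last name filled in.

    last_name_field is a placeholder for the real input name (an assumption
    here); look it up in the Form Data panel before calling this.
    """
    import requests  # imported lazily so the helpers above work offline

    with requests.Session() as session:
        form_page = session.get(SEARCH_URL)
        payload = build_payload(form_page.text, {last_name_field: last_name})
        return session.post(SEARCH_URL, data=payload).text
```

Keeping the parsing separate from the network calls makes it easy to try `build_payload` against a saved copy of the page before sending any real requests.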
The most annoying part of this is having to do the web search the old-fashioned way -- i.e. by pointing and clicking in your browser -- and inspecting the traffic via the Network Panel when you submit the form.
(Tip: it's recommended to disable JavaScript and the cache for more predictable results)
In the Network Panel, click on the first request (to financial-search.aspx) to see the details, such as the headers:
Scroll down to the Form Data panel to see all the actual form parameters sent in the POST request. With some trial and error, you'll see what the website requires at a minimum to trigger a successful response: