@quoidautre
Created February 13, 2017 17:28
Read a URL with BeautifulSoup and urllib, and write the links to a CSV file
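The script's first step is an HTTP POST that mimics submitting the site's search form. Here is a minimal Python 3 sketch of that step. It assumes the form needs only the two fields the script sends. The request is built but not sent, so it can be inspected offline. `build_search_request` and its `ca_num` parameter are hypothetical names introduced for illustration.

```python
import urllib.parse
import urllib.request


def build_search_request(url, ca_num):
    """Build (but do not send) a form-encoded POST like the script's.

    Hypothetical helper: the name and the ca_num parameter are illustrative.
    """
    fields = {"CaNum": ca_num, "btn_rechercher": "RECHERCHER"}
    # In Python 3 the POST body must be bytes, not str
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Passing data= makes urllib.request issue a POST instead of a GET
    return urllib.request.Request(url, data=body)


req = build_search_request("http://www.entreprises-aix.com/entreprises.php", "3246")
print(req.get_method())  # POST
print(req.data)          # b'CaNum=3246&btn_rechercher=RECHERCHER'
```

Passing the request to `urllib.request.urlopen(req)` would then send it. When no `Content-Type` is set, urllib adds `application/x-www-form-urlencoded` itself.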
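# Python 3 version; assumes the third-party beautifulsoup4 package (bs4) is installed
import csv
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

url = 'http://www.entreprises-aix.com/entreprises.php'
# Fields submitted by the site's search form
data = {"CaNum": "3246", "btn_rechercher": "RECHERCHER"}
# The POST body must be bytes in Python 3
payload = urllib.parse.urlencode(data).encode('ascii')
req = urllib.request.Request(url, data=payload)  # data= makes this a POST

# The context manager closes the response even if reading fails
with urllib.request.urlopen(req) as response:
    html_code = response.read()

data_soup = BeautifulSoup(html_code, 'html.parser')
# Keep only <a> tags that carry an href, so anchor['href'] cannot raise KeyError
links = data_soup.find_all('a', href=True)
print(links)

# newline='' prevents blank lines between rows on Windows;
# the with block closes the file, so no explicit fp.close() is needed
with open('societes.csv', 'w', newline='', encoding='utf-8') as fp:
    companies = csv.writer(fp, delimiter=';')
    companies.writerow(['href'])  # one column: the href of each link
    for anchor in links:
        companies.writerow([anchor['href']])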
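BeautifulSoup is a third-party dependency. If only the `href` values are needed, the standard library's `html.parser` can do the same job. This is a swapped-in technique, not what the script above uses.

The minimal sketch below assumes the page has already been decoded to `str`, because `response.read()` returns bytes. `HrefCollector` and `links_to_csv` are hypothetical names, and the sample HTML is made up. It writes to an in-memory buffer, so it runs without network access or a file.

```python
import csv
import io
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute, so skip empty ones
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_csv(html, out):
    """Write one href per row, semicolon-delimited, under a header row."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["href"])
    for href in parser.hrefs:
        writer.writerow([href])
    return parser.hrefs


sample = '<p><a href="mailto:contact@example.com">Mail</a> <a name="top">no href</a></p>'
buf = io.StringIO()
hrefs = links_to_csv(sample, buf)
print(hrefs)  # ['mailto:contact@example.com']
```

To write to disk instead, pass a file opened with `open('societes.csv', 'w', newline='')` in place of the `StringIO` buffer.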