
@cbscribe
Last active October 25, 2018 01:08
Python BeautifulSoup example
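import time

import bs4
import requests

# Example of looping over many result pages (disabled):
# for n in range(0, 5937):
#     url = 'https://www.fiercebiotech.com/biotech?page=0%2C' + str(n)
#     data = requests.get(url)
#     # pause between requests so the scraper does not flood the server
#     time.sleep(10)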
# Example: load google.com
url = "http://google.com/"

# "data" holds the HTTP response; data.text is the raw HTML
data = requests.get(url, timeout=10)
data.raise_for_status()  # stop with an error on a 4xx/5xx response
# print(data.text)

# "soupdata" holds the parsed HTML tree
soupdata = bs4.BeautifulSoup(data.text, features="html.parser")

# select('a') returns a list of every <a> tag on the page
links = soupdata.select('a')

# print each link's URL and text, with a separator line between links
for link in links:
    print(link.get('href'), "\t", link.string)
    print("-" * 20)
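The parsing step can be tried without touching the network by feeding BeautifulSoup a hand-written HTML string. This is a small sketch; the sample markup and the `extract_links` helper are hypothetical, made up for illustration. It also shows one quirk of `link.string`: it returns `None` when an `<a>` tag has more than one child, so `get_text()` may be the safer choice for link text.

```python
import bs4

# Hypothetical sample page, used only so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/news">News</a>
  <a href="/about">About <b>us</b></a>
  <a>No href here</a>
</body></html>
"""


def extract_links(html):
    """Return (href, text) pairs for every <a> tag in the given HTML."""
    soup = bs4.BeautifulSoup(html, features="html.parser")
    pairs = []
    for link in soup.select("a"):
        # get() returns None instead of raising when the attribute is missing.
        href = link.get("href")
        # get_text() joins all nested strings (here with a space);
        # .string would be None for a tag with several children,
        # such as "About <b>us</b>".
        text = link.get_text(" ", strip=True)
        pairs.append((href, text))
    return pairs


for href, text in extract_links(SAMPLE_HTML):
    print(href, "\t", text)
```

The three links come back as `("/news", "News")`, `("/about", "About us")` and `(None, "No href here")`.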
@magikalcookie

In the "for n in range(0, 5937):" loop, the "url" variable does not hold a list of all the generated strings. It holds only one value, when it should have listed thousands. Is there a way to assign a variable that collects all of the output?
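The loop reassigns `url` on every pass, so after the loop finishes the variable holds only the last value. One way to keep every URL is to create a list before the loop and append to it on each pass. A minimal sketch, assuming the same URL pattern as the commented-out loop; the names `BASE_URL`, `urls` and `urls_again` are hypothetical, and the network request is left as a comment so the snippet runs offline.

```python
# Collect every page URL instead of overwriting a single variable.
BASE_URL = "https://www.fiercebiotech.com/biotech?page=0%2C"

urls = []  # created once, before the loop
for n in range(0, 5937):
    url = BASE_URL + str(n)
    urls.append(url)  # keep this iteration's value
    # To keep the downloaded HTML too, create `pages = []` before
    # the loop and add something like:
    #     data = requests.get(url, timeout=10)
    #     pages.append(data.text)
    #     time.sleep(10)

# Equivalent one-liner using a list comprehension.
urls_again = [BASE_URL + str(n) for n in range(0, 5937)]

print(len(urls))  # 5937
print(urls[0])    # https://www.fiercebiotech.com/biotech?page=0%2C0
```

With a 10-second pause, fetching all 5937 pages would take roughly 16.5 hours, so it may be worth testing on a small range first.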
