@sharvaridhote
Last active January 10, 2021 23:21
Extract Weblinks
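# crawling website: fetch a page and collect the unique hrefs of its <a> tags
from urllib.request import urlopen

from bs4 import BeautifulSoup


def getLinks(url):
    html_page = urlopen(url)
    # Name the parser explicitly; without it BeautifulSoup warns and picks whichever parser is installed
    soup = BeautifulSoup(html_page, "html.parser")
    total_pages = []
    try:
        # Keep each href once, in the order it first appears on the page
        for link in soup.find_all('a', href=True):
            if link.get('href') not in total_pages:
                total_pages.append(link.get('href'))
    except Exception as exc:
        # Catch Exception rather than using a bare except, which would also swallow KeyboardInterrupt
        print("An exception occurred:", exc)
    return total_pages


total_links = getLinks("https://en.wikipedia.org/wiki/Wikipedia:Featured_articles")
print(len(total_links))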
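The snippet above needs network access and the third-party bs4 package. As a hedged, offline sketch of the same idea, the version below swaps BeautifulSoup for Python's standard-library `html.parser.HTMLParser` and parses an HTML string instead of a live page. The names `LinkCollector` and `extract_links` are hypothetical, and resolving relative links with `urllib.parse.urljoin` is an addition that the original `getLinks` does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect unique href values from <a> tags, in first-seen order."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()  # set lookup instead of scanning the list on every link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Mirror find_all('a', href=True): skip <a> tags that have no href attribute
            if name == "href" and value is not None:
                # Assumption: make relative links such as "/wiki/Foo" absolute against the page URL
                absolute = urljoin(self.base_url, value)
                if absolute not in self._seen:
                    self._seen.add(absolute)
                    self.links.append(absolute)


def extract_links(html, base_url=""):
    """Return the unique links found in an HTML string (hypothetical helper)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<a href="/wiki/A">A</a>'
    '<a href="/wiki/A">again</a>'
    '<a href="https://example.org/">ext</a>'
    '<a name="x">no href</a>'
)
print(extract_links(sample, "https://en.wikipedia.org/wiki/Wikipedia:Featured_articles"))
# ['https://en.wikipedia.org/wiki/A', 'https://example.org/']
```

Because links are made absolute before de-duplication, `/wiki/A` and `https://en.wikipedia.org/wiki/A` count as one entry here, so the total can be lower than what `getLinks` reports for the same page.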