@jonathanoheix
Created December 11, 2018 14:55
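import re
from urllib.parse import urljoin

# `soup` (the parsed home page) and `main_url` (its URL) are assumed to be defined earlier.
# Keep every link whose href matches the category path; urljoin resolves relative hrefs against main_url.
categories_urls = [urljoin(main_url, x.get('href')) for x in soup.find_all("a", href=re.compile("catalogue/category/books"))]
categories_urls = categories_urls[1:]  # drop the first link, which points to all the books rather than one category
print(f"{len(categories_urls)} fetched categories URLs")
print("Some examples:")
print(categories_urls[:5])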
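
The snippet above depends on `soup` and `main_url` from earlier in the notebook, so it cannot be run on its own. The following is a minimal, dependency-free sketch of the same filtering step, using the standard library's `html.parser` in place of BeautifulSoup so it runs without a network call. `MAIN_URL`, `SAMPLE_HTML`, `CategoryLinkParser`, and `extract_category_urls` are hypothetical names introduced for illustration, and the sample markup is an assumption about what the page's links look like, not a copy of the real page. Joining with `urljoin` is a choice of this sketch.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-ins for the base URL and the downloaded home page.
MAIN_URL = "http://books.toscrape.com/index.html"
SAMPLE_HTML = """
<ul>
  <li><a href="catalogue/category/books_1/index.html">Books</a></li>
  <li><a href="catalogue/category/books/travel_2/index.html">Travel</a></li>
  <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
  <li><a href="catalogue/a-light-in-the-attic_1000/index.html">A book</a></li>
</ul>
"""

# Same pattern as the gist: it also matches the "all books" link (books_1).
CATEGORY_PATTERN = re.compile("catalogue/category/books")


class CategoryLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose href matches the pattern."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and CATEGORY_PATTERN.search(href):
            self.hrefs.append(href)


def extract_category_urls(html, base_url):
    """Return absolute category URLs, skipping the first ("all books") link."""
    parser = CategoryLinkParser()
    parser.feed(html)
    urls = [urljoin(base_url, href) for href in parser.hrefs]
    return urls[1:]


sample_urls = extract_category_urls(SAMPLE_HTML, MAIN_URL)
print(f"{len(sample_urls)} fetched categories URLs")
print(sample_urls)
```

Dropping the first match by position mirrors the gist and relies on the "all books" link appearing before the individual categories. If that ordering is not guaranteed, a stricter pattern such as `catalogue/category/books/` would exclude it by content instead.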