@victormurcia
Created September 5, 2022 04:30
return all book unique identifiers from a list of raw links
import re

# return all unique book identifiers from a list of raw links
def get_book_identifiers(links):
    # define the url pattern we are looking for; the group captures the numeric id
    pattern = re.compile(r'/ebooks/([0-9]+)')
    # process the list of links for those that match the pattern
    books = set()
    for link in links:
        # check if the link starts with the pattern
        found = pattern.match(link)
        if not found:
            continue
        # extract only the digits of the book id from /ebooks/nnn
        book_id = found.group(1)
        # store in the set, only keep unique ids
        books.add(book_id)
    return books
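
The gist does not show where `links` comes from or how the result is used. The sketch below is one possible way to exercise the function, assuming the links are `href` values pulled from an HTML page. The `LinkCollector` class and the sample HTML are hypothetical and not part of the gist; the function is restated in compact form only so the example runs on its own.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


def get_book_identifiers(links):
    # compact restatement of the gist function, so this sketch is self-contained
    pattern = re.compile(r'/ebooks/([0-9]+)')
    return {m.group(1) for m in map(pattern.match, links) if m}


# made-up HTML fragment, for illustration only
sample_html = '''
<a href="/ebooks/1342">Book A</a>
<a href="/ebooks/84">Book B</a>
<a href="/ebooks/1342">Book A again</a>
<a href="/about/">About</a>
'''

collector = LinkCollector()
collector.feed(sample_html)
book_ids = get_book_identifiers(collector.links)

# sets are unordered, so sort numerically before printing
print(sorted(book_ids, key=int))  # prints ['84', '1342']
```

The duplicate `/ebooks/1342` link collapses to one entry because the ids are stored in a set, and `/about/` is skipped because it does not match the pattern.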