@thricedotted
Last active February 25, 2016 21:11
i got an email from nvidia trying to sell me on their deep learning/ai sessions. suffice it to say i am not sold
from lxml import html
import requests
GTC_URI = 'https://mygtc.gputechconf.com/form/session-listing&doSearch=true&queryInput=&topic_selector=Deep+Learning+%26+Artificial+Intelligence'
page = requests.get(GTC_URI)
tree = html.fromstring(page.content)
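# text_content() returns '' rather than None for empty elements and includes text nested in child tags
bios = [b.text_content() for b in tree.cssselect('.session-speaker-bio')]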
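def contains(text, word_list):
    # Match whole lowercased words; strip surrounding punctuation so "she," still counts
    words = {w.strip('.,;:!?()"\'') for w in text.lower().split()}
    return any(w in words for w in word_list)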
woman_bios = [b for b in bios if contains(b, ('she', 'her'))]
man_bios = [b for b in bios if contains(b, ('he', 'his'))]
unknown_bios = [b for b in bios if b not in man_bios + woman_bios]
print("Total bios: {}".format(len(bios)))
print("Bios containing 'she', 'her': {} ({:.1f}%)".format(
    len(woman_bios), len(woman_bios) * 100.0 / len(bios)))
print("Bios containing 'he', 'his': {} ({:.1f}%)".format(
    len(man_bios), len(man_bios) * 100.0 / len(bios)))
print("Bios containing neither: {} ({:.1f}%)".format(
    len(unknown_bios), len(unknown_bios) * 100.0 / len(bios)))