@kishdubey
Created November 19, 2019 19:40
Getting the organizations that appear most frequently in Google Summer of Code (GSoC)
import requests
from bs4 import BeautifulSoup
from collections import Counter
def get_organizations(url):
    '''
    (String) -> ()
    Appends the names of all organizations listed at the given GSoC URL
    to the global `organizations` list
    '''
    res = requests.get(url)
    html_page = res.content
    soup = BeautifulSoup(html_page, 'html.parser')
    # Organization names are in <h4> elements carrying this class string
    html_text = soup.find_all('h4', {"class": "organization-card__name font-black-54"})
    for organization in html_text:
        organizations.append(organization.text)

def most_common():
    '''
    () -> ()
    Prints the organizations with the most appearances across the different
    years, i.e. those listed in every one of the given URLs
    '''
    common = Counter(organizations).most_common()
    for organization in common:
        # Each entry is a (name, count) pair
        if organization[1] == len(urls):
            print(organization[0])

def main():
    for url in urls:
        get_organizations(url)
    most_common()

if __name__ == '__main__':
    urls = [
        "https://summerofcode.withgoogle.com/archive/2018/organizations/",
        "https://summerofcode.withgoogle.com/archive/2017/organizations/",
        "https://summerofcode.withgoogle.com/archive/2016/organizations/",
    ]
    organizations = []
    main()
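
The counting step above can also be tried without any network access. The following is a minimal sketch, not part of the original script: the function name `organizations_in_every_year` and the sample data are hypothetical, and it assumes the names for each year have already been collected into separate lists. Unlike the script, it counts a name at most once per year, so a duplicate on one page cannot stand in for a missing year.

```python
from collections import Counter


def organizations_in_every_year(names_by_year):
    """Return, sorted, the names that occur in every per-year list."""
    # Use a set per year so a name is counted at most once for that year.
    counts = Counter(name for names in names_by_year for name in set(names))
    return sorted(name for name, n in counts.items() if n == len(names_by_year))


# Hypothetical sample data, not real GSoC results.
sample = [
    ["Org A", "Org B", "Org C"],
    ["Org B", "Org A"],
    ["Org A", "Org D", "Org B", "Org B"],
]
print(organizations_in_every_year(sample))  # prints ['Org A', 'Org B']
```

Returning the list rather than printing it keeps the function free of global state, which makes it easier to reuse with a different set of years.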