Gist py-ranoid/4acb07b12d59b945d5d042fa55715989, last active January 14, 2019 07:03
Downloading Google Code-in tasks without an API key.
""" | |
Note : This approach is an alternative to using the API for fetching instance info by | |
saving the webpages in ~/Downloads/ instead and using BeautifulSoup for parsing the data. | |
Go to https://codein.withgoogle.com/dashboard/task-instances/?sp-order=name&sp-my_tasks=false&sp-page_size=100 | |
Iterate over all pages (1, 2, 3...) and save them. | |
""" | |
import os
from glob import glob

from bs4 import BeautifulSoup as soup
import pandas as pd

all_rows = []
col_names = []
# Matches the pages saved in ~/Downloads/ as described in the docstring
pattern = os.path.expanduser("~/Downloads/Task instances _ Google Code-in *.htm")
for fname in sorted(glob(pattern)):
    with open(fname, encoding="utf-8") as f:
        cont = f.read()
    s = soup(cont, "html.parser")
    # One <tr> per task instance in the table body
    rows = s.select('md-table-container tbody tr')
    for row in rows:
        # Keep only cells that contain text
        vals = [i.text.strip() for i in row.select('td') if i.text.strip()]
        all_rows.append(vals)
    # Column names come from the table header of the saved page
    col_names = [i.text.strip() for i in s.select('md-table-container th') if i.text.strip()]

df = pd.DataFrame(all_rows, columns=col_names)
df.to_csv("GCI_instance_dump.csv")