Last active
October 12, 2020 13:39
name: scrape-wikipedia
# Run on every push to master and at minute 0 of every hour.
on:
  push:
    branches:
      - master
  schedule:
    - cron: "0 */1 * * *"
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: 🍽️ Get working copy
        uses: actions/checkout@master
        with:
          fetch-depth: 1
      - name: 🐍 Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: 💿 Install Requirements
        run: pip install -r requirements.txt
      - name: 🍳 Update dataset
        run: python main.py
      - name: 🚀 Commit and push if it changed
        # "git commit ... || exit 0" ends the step successfully when there is nothing to commit.
        run: |
          git config user.name "${GITHUB_ACTOR}"
          git config user.email "${GITHUB_ACTOR}@users.noreply.github.com"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

import re

import pandas as pd

def scrape_wiki_table(url, table_index=0):
    # Parse the HTML tables on the page and keep the one at table_index.
    df = pd.read_html(url, header=0)[table_index]
    # Strip citation markers such as [1] from the CSV text.
    return re.sub(r"\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).", "", df.to_csv(index=False))

if __name__ == '__main__':
    with open('data.csv', 'w+') as f:
        f.write(scrape_wiki_table('https://en.wikipedia.org/wiki/List_of_chief_executive_officers'))
# Medium Post
df = pd.read_html(url, header=0)[table_index]
Company | Executive | Title | Since | Notes | Updated
---|---|---|---|---|---
Accenture | Julie Sweet[1] | CEO | 2019 | Succeeded Pierre Nanterme, died | 2019-01-31
Aditya Birla Group | Kumar Birla | Chairman | 1995 | Part of the Birla family business house in India[2] | 2018-10-01
Adobe Systems | Shantanu Narayen | Chairman, president and CEO | 2007 | Formerly with Apple Inc.[3] | 2018-10-01
return re.sub(r"\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).", "", df.to_csv(index=False))
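
To see what this substitution removes, here is a minimal sketch that applies the same pattern to a few short strings modeled on the sample rows. The names `CITATION_RE` and `strip_citations` are illustrative only and are not part of the gist, and the `[1, 2]` input is a made-up case added to show how a grouped citation appears to be handled.

```python
import re

# Same pattern as in the gist, compiled once (the constant name is hypothetical).
CITATION_RE = re.compile(r"\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).")


def strip_citations(text):
    """Remove bracketed numeric citation markers such as [1] from text."""
    return CITATION_RE.sub("", text)


if __name__ == "__main__":
    print(strip_citations("Julie Sweet[1]"))               # Julie Sweet
    print(strip_citations("Formerly with Apple Inc.[3]"))  # Formerly with Apple Inc.
    print(strip_citations("Founder[1, 2]"))                # Founder
    print(strip_citations("Since 2007"))                   # Since 2007
```

Plain numbers such as years are left alone because the lookaheads only accept digits that are followed by a closing bracket or by another number inside the same bracket pair.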