James Hare (harej)

@harej
harej / citoid.py
Created April 22, 2016 19:31
Creates a CSV based on Citoid output
import requests
import csv
from collections import defaultdict

def get_citation(inputstring):
    # Query the Citoid API for citation metadata and return the first result
    r = requests.get("https://citoid.wikimedia.org/api?format=mediawiki&search=" + inputstring)
    return r.json()[0]
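The preview cuts off before the CSV step the description mentions. Below is a minimal sketch of how get_citation output might be written to a CSV, assuming the caller supplies a list of search strings and that the columns are the citation's item type, title, and URL; the helper name and the chosen fields are assumptions, not taken from the gist.

def write_citations_csv(search_terms, outfile="citations.csv"):
    # Assumed columns; Citoid's mediawiki format commonly includes these keys
    fields = ["itemType", "title", "url"]
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        for term in search_terms:
            # defaultdict(str) turns missing fields into empty cells
            citation = defaultdict(str, get_citation(term))
            writer.writerow([citation[field] for field in fields])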
@harej
harej / npg_gap_analysis.py
Last active June 9, 2016 21:20
Generates a list of items and properties used on NPG-related Wikidata entries and assesses the existence of labels in other languages
# Step 1: Get a list of every Wikidata item with an NPG ID, plus anything that is a subclass of chemical hazard
# Step 2: Iterate through each item to collect the items and properties it invokes
#         (for each claim, for each subclaim: take 'Q' + str(subclaim['mainsnak']['datavalue']['value']['numeric-id'])
#         and subclaim['mainsnak']['property'] wherever the claim's datatype is 'wikibase-item')
# Step 3: De-duplicate to generate an exhaustive list of each item/property of interest to NIOSH
# Step 4: Check labels in en, es, zh, fr, de
# Step 5: Prepare an HTML table listing each item/property of interest, highlighting cells where labels are missing
# Step 6: Compute the percentage of coverage in each language; save to a timestamped log
import requests
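The preview shows only the plan and the first import. Here is a minimal sketch of step 2, assuming the entity JSON is fetched with the wbgetentities API; the helper name and the choice of wbgetentities are assumptions, and the gist itself may do this differently.

WDAPI = "https://www.wikidata.org/w/api.php"

def invoked_items_and_properties(qid):
    # Fetch one entity and walk its claims, collecting every property used
    # and every wikibase-item value it references (step 2 above)
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    entity = requests.get(WDAPI, params=params).json()["entities"][qid]
    items, properties = set(), set()
    for prop, claims in entity.get("claims", {}).items():
        properties.add(prop)
        for claim in claims:
            snak = claim["mainsnak"]
            if snak.get("datatype") == "wikibase-item" and "datavalue" in snak:
                items.add("Q" + str(snak["datavalue"]["value"]["numeric-id"]))
    return items, properties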
@harej
harej / niosh_scraper.py
Last active September 25, 2015 15:21
A script to scrape the Pocket Guide to Chemical Hazards on NIOSH's website
# public domain
from bs4 import BeautifulSoup
import requests

def main():
    manifest = {}
    for id in range(1, 687):  # starting with PGCH #1 and going to #686, the last one
        if id == 553:  # this one is irregular and should be skipped
            continue
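The preview stops just inside the loop. Below is a minimal sketch of how a single Pocket Guide entry might be fetched and parsed, assuming the entries live at zero-padded URLs of the form https://www.cdc.gov/niosh/npg/npgd0001.html and that the page heading carries the chemical name; both the URL pattern and the parsing are assumptions, not taken from the gist.

def fetch_entry(pgch_id):
    # Build the zero-padded entry URL (assumed pattern) and parse the page
    url = "https://www.cdc.gov/niosh/npg/npgd{:04d}.html".format(pgch_id)
    r = requests.get(url)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    # The first <h1> is assumed to hold the chemical name
    heading = soup.find("h1")
    return {"id": pgch_id, "name": heading.get_text(strip=True) if heading else None}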