James Hare (harej)

@harej
harej / citoid.py
Created April 22, 2016 19:31
Creates a CSV based on Citoid output
import requests
import csv
from collections import defaultdict

def get_citation(inputstring):
    # Query the Citoid API for citation metadata and return the first result
    r = requests.get("https://citoid.wikimedia.org/api?format=mediawiki&search=" + inputstring)
    return r.json()[0]
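The preview cuts off before the CSV step the description mentions. Below is a minimal sketch of how get_citation output might be written to a CSV, assuming the caller supplies a list of search strings and that the columns are the citation's item type, title, and URL; the helper name and the chosen fields are assumptions, not taken from the gist.

def write_citations_csv(search_terms, outfile="citations.csv"):
    # Assumed columns; Citoid's mediawiki format commonly includes these keys
    fields = ["itemType", "title", "url"]
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        for term in search_terms:
            # defaultdict(str) turns missing fields into empty cells
            citation = defaultdict(str, get_citation(term))
            writer.writerow([citation[field] for field in fields])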
@harej
harej / npg_gap_analysis.py
Last active June 9, 2016 21:20
Generates a list of items and properties used on NPG-related Wikidata entries and assesses the existence of labels in other languages
# Step 1: Get a list of every Wikidata item with an NPG ID, plus anything that is a subclass of chemical hazard
# Step 2: Iterate through each item to collect the items and properties it invokes
#         (for each claim, for each subclaim: take 'Q' + str(subclaim['mainsnak']['datavalue']['value']['numeric-id'])
#         and subclaim['mainsnak']['property'] wherever the claim's datatype is 'wikibase-item')
# Step 3: De-duplicate to generate an exhaustive list of each item/property of interest to NIOSH
# Step 4: Check labels in en, es, zh, fr, de
# Step 5: Prepare an HTML table listing each item/property of interest, highlighting cells where labels are missing
# Step 6: Compute the percentage of coverage in each language; save to a timestamped log
import requests
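The preview shows only the plan and the first import. Here is a minimal sketch of step 2, assuming the entity JSON is fetched with the wbgetentities API; the helper name and the choice of wbgetentities are assumptions, and the gist itself may do this differently.

WDAPI = "https://www.wikidata.org/w/api.php"

def invoked_items_and_properties(qid):
    # Fetch one entity and walk its claims, collecting every property used
    # and every wikibase-item value it references (step 2 above)
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    entity = requests.get(WDAPI, params=params).json()["entities"][qid]
    items, properties = set(), set()
    for prop, claims in entity.get("claims", {}).items():
        properties.add(prop)
        for claim in claims:
            snak = claim["mainsnak"]
            if snak.get("datatype") == "wikibase-item" and "datavalue" in snak:
                items.add("Q" + str(snak["datavalue"]["value"]["numeric-id"]))
    return items, properties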
@harej
harej / niosh_scraper.py
Last active September 25, 2015 15:21
A script to scrape the Pocket Guide to Chemical Hazards on NIOSH's website
# public domain
from bs4 import BeautifulSoup
import requests

def main():
    manifest = {}
    for id in range(1, 687):  # starting with PGCH #1 and going to #686, the last one
        if id == 553:  # this one is irregular and should be skipped
            continue
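The preview stops just inside the loop. Below is a minimal sketch of how a single Pocket Guide entry might be fetched and parsed, assuming the entries live at zero-padded URLs of the form https://www.cdc.gov/niosh/npg/npgd0001.html and that the page heading carries the chemical name; both the URL pattern and the parsing are assumptions, not taken from the gist.

def fetch_entry(pgch_id):
    # Build the zero-padded entry URL (assumed pattern) and parse the page
    url = "https://www.cdc.gov/niosh/npg/npgd{:04d}.html".format(pgch_id)
    r = requests.get(url)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    # The first <h1> is assumed to hold the chemical name
    heading = soup.find("h1")
    return {"id": pgch_id, "name": heading.get_text(strip=True) if heading else None}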