Ettore Rizza (ettorerizza)

ettorerizza / xml_split.py
Created April 20, 2019 16:36 — forked from benallard/xml_split.py
Small Python script to split huge XML files into parts. It takes one or two parameters: the first is always the huge XML file, and the second is the desired chunk size in KB (defaults to 1 MB; 0 splits wherever possible). The generated files are named like the original one, with an index inserted between the filename and the extension, like this: bigxml.…
#!/usr/bin/env python
import os
import xml.parsers.expat
from xml.sax.saxutils import escape
from optparse import OptionParser
from math import log10
# How much data we process at a time
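The preview above stops at the imports. As a rough illustration of the same task, here is a minimal sketch that splits a big XML file into parts of roughly the requested size; it uses xml.etree.ElementTree.iterparse rather than the gist's expat-based parser, and the chunk handling and output naming are assumptions based on the description, not the gist's actual code.
# Sketch only: stream the top-level children of a huge XML file and write them
# out in groups of roughly `chunk_kb` kilobytes, named base.0.ext, base.1.ext, ...
import os
import sys
import xml.etree.ElementTree as ET

def split_xml(path, chunk_kb=1024):
    base, ext = os.path.splitext(path)
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)                       # first event is the start of the root
    part, size, children = 0, 0, []
    for event, elem in context:
        if event == "end" and elem in list(root):  # a direct child of the root is complete
            children.append(elem)
            size += len(ET.tostring(elem))
            root.remove(elem)                      # keep memory usage flat
            if size >= chunk_kb * 1024:
                write_part(base, ext, part, root, children)
                part, size, children = part + 1, 0, []
    if children:
        write_part(base, ext, part, root, children)

def write_part(base, ext, index, root, children):
    out = ET.Element(root.tag, root.attrib)
    out.extend(children)
    ET.ElementTree(out).write("%s.%d%s" % (base, index, ext), encoding="utf-8")

if __name__ == "__main__":
    split_xml(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 1024)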
ettorerizza / search_wikidata.py
Created March 7, 2019 13:08
Search the Wikidata API the way the website's search bar does
import requests
api = "https://www.wikidata.org/w/api.php"
query = "some search"
params = {
    'action': 'query',
    'list': 'search',
    'format': 'json',
    'srsearch': query,
    'srprop': 'titlesnippet|snippet',
}
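The preview stops at the parameter dictionary; a typical continuation sends the request and reads the hits, roughly like this (the result handling below is an illustration, not the gist's own code):
# Send the request and print each matching page title with its snippet.
response = requests.get(api, params=params).json()
for hit in response.get('query', {}).get('search', []):
    print(hit['title'], '-', hit.get('snippet', ''))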
ettorerizza / viaf_links.sh
Created January 14, 2019 11:03
Analyze a VIAF dump
ettorerizza / yago_sparql.py
Last active August 12, 2024 14:38
YAGO SPARQL example in Python
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("https://linkeddata1.calcul.u-psud.fr/sparql")
sparql.setQuery("""
    select *
    where {
        <http://yago-knowledge.org/resource/Elvis_Presley> ?property ?valueOrObject .
    }
    LIMIT 100""")
sparql.setReturnFormat(JSON)
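The snippet ends before the query is executed; with SPARQLWrapper the usual continuation converts the response to JSON and walks the bindings (the printing loop is an illustration, not the gist's code):
# Run the query and print each property/value pair returned for Elvis Presley.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["property"]["value"], row["valueOrObject"]["value"])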
ettorerizza / replace.html
Last active December 19, 2018 17:29
Replace text bookmarklet
<a "href=javascript:function replaceText(ot,nt,n){n=n||document.body;var cs=n.childNodes,i=0;while(n=cs[i]){if(n.nodeType==Node.TEXT_NODE){n.textContent=n.textContent.replace(ot,nt);}else{replaceText(ot,nt,n);};i++;}};replaceText('surréalisme','romantisme');">Surréaliste!</a>
ettorerizza / parse_lodcloud.py
Last active December 17, 2018 09:49
LOD cloud to CSV
# -*- coding: utf-8 -*-
import requests
import json
import unicodecsv as csv
url = "https://lod-cloud.net/lod-data.json"
r = requests.get(url)
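The preview stops after fetching lod-data.json; a hedged continuation that flattens it to CSV might look like the following. The file appears to be a dictionary keyed by dataset identifier, and the column choices ("title", "domain", "website", "triples") are assumptions about that layout rather than the gist's actual output.
# Flatten the LOD cloud registry into one CSV row per dataset (field names are
# assumptions about lod-data.json's layout).
data = r.json()
with open("lod-cloud.csv", "wb") as f:
    writer = csv.writer(f, encoding="utf-8")
    writer.writerow(["identifier", "title", "domain", "website", "triples"])
    for identifier, dataset in data.items():
        writer.writerow([
            identifier,
            dataset.get("title", ""),
            dataset.get("domain", ""),
            dataset.get("website", ""),
            dataset.get("triples", ""),
        ])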
ettorerizza / wikidata_api_refine.py
Last active January 16, 2019 08:27
Example of a Python/Jython script to extract Wikipedia sitelinks from values reconciled with Wikidata in OpenRefine
import json
import urllib2
langs = ["fr", "en", "de", "nl"] # ordered list of languages you want to try until there is a match
value = cell.recon.match.id
for lang in langs:
    wiki = lang + "wiki"
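The preview stops inside the loop. A self-contained sketch of how such an OpenRefine Jython expression can finish is below: for each language it asks the Wikidata API's wbgetentities action for that sitelink and returns the first Wikipedia title it finds. The request URL and the return logic are assumptions, not necessarily the gist's code, and the script only runs inside OpenRefine, where `cell` is predefined and a top-level `return` is allowed.
# Hedged sketch of the whole expression (OpenRefine wraps it in a function,
# so the top-level `return` is valid there).
import json
import urllib2

langs = ["fr", "en", "de", "nl"]   # ordered list of languages to try
qid = cell.recon.match.id          # Wikidata QID of the reconciled value
for lang in langs:
    wiki = lang + "wiki"
    url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
           "&format=json&props=sitelinks&ids=" + qid + "&sitefilter=" + wiki)
    data = json.loads(urllib2.urlopen(url).read())
    sitelinks = data["entities"][qid].get("sitelinks", {})
    if wiki in sitelinks:
        return sitelinks[wiki]["title"]
return None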
ettorerizza / tropy2csv.py
Last active November 10, 2018 13:47
Tropy JSON-LD to CSV
import json
import csv
import sys
# Python parser for Tropy JSON-LD files
# Returns the file with its original name, but as CSV.
# To be used from the command line.
# Example usage with a Tropy JSON file named resultats.json:
#> python tropy_parser.py resultats.json
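Only the header comments survive in the preview; below is a hedged sketch of such a converter, continuing from the imports above. The assumption that Tropy's JSON-LD export keeps its items under an "@graph" key, and the decision to flatten only scalar fields, are mine, not necessarily the gist's.
# Hedged sketch: read a Tropy JSON-LD export and write its items to a CSV file
# with the same base name (assumes the items live under "@graph").
def tropy_to_csv(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    rows = []
    for item in data.get("@graph", []):
        rows.append({k: v for k, v in item.items()
                     if isinstance(v, (str, int, float))})
    fieldnames = sorted({key for row in rows for key in row})
    out_path = path.rsplit(".", 1)[0] + ".csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    tropy_to_csv(sys.argv[1])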
ettorerizza / open_refine_cli.py
Created October 21, 2018 10:54
Workaround to use openrefine-client in Python 3
import subprocess
command_line = "openrefine-client_0-3-4_windows.exe --list"
result = subprocess.run(command_line, shell=True, stdout=subprocess.PIPE).stdout.decode('utf8')
print(result)
ettorerizza / split_camel_case.py
Created September 22, 2018 16:59
Split a camel-case string
from re import finditer

def camel_case_split(identifier):
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]
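A quick check of what the regex produces, assuming the version above with finditer imported from re:
print(camel_case_split("HTTPResponseCode"))   # ['HTTP', 'Response', 'Code']
print(camel_case_split("simpleCamelCase"))    # ['simple', 'Camel', 'Case']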