Ettore Rizza (ettorerizza)

ettorerizza / xml_split.py
Created April 20, 2019 16:36 — forked from benallard/xml_split.py
Small Python script to split huge XML files into parts. It takes one or two parameters: the first is always the huge XML file, and the second is the desired chunk size in KB (defaults to 1 MB; 0 splits wherever possible). The generated files are named like the original one, with an index inserted between the filename and the extension, like this: bigxml.…
#!/usr/bin/env python
import os
import xml.parsers.expat
from xml.sax.saxutils import escape
from optparse import OptionParser
from math import log10
# How much data we process at a time
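The preview above stops at the imports. As a rough illustration of the same task, here is a minimal sketch that splits a big XML file into parts of roughly the requested size; it uses xml.etree.ElementTree.iterparse rather than the gist's expat-based parser, and the chunk handling and output naming are assumptions based on the description, not the gist's actual code.
# Sketch only: stream the top-level children of a huge XML file and write them
# out in groups of roughly `chunk_kb` kilobytes, named base.0.ext, base.1.ext, ...
import os
import sys
import xml.etree.ElementTree as ET

def split_xml(path, chunk_kb=1024):
    base, ext = os.path.splitext(path)
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)                       # first event is the start of the root
    part, size, children = 0, 0, []
    for event, elem in context:
        if event == "end" and elem in list(root):  # a direct child of the root is complete
            children.append(elem)
            size += len(ET.tostring(elem))
            root.remove(elem)                      # keep memory usage flat
            if size >= chunk_kb * 1024:
                write_part(base, ext, part, root, children)
                part, size, children = part + 1, 0, []
    if children:
        write_part(base, ext, part, root, children)

def write_part(base, ext, index, root, children):
    out = ET.Element(root.tag, root.attrib)
    out.extend(children)
    ET.ElementTree(out).write("%s.%d%s" % (base, index, ext), encoding="utf-8")

if __name__ == "__main__":
    split_xml(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 1024)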
ettorerizza / search_wikidata.py
Created March 7, 2019 13:08
Search the Wikidata API the way the website's search bar does
import requests
api = "https://www.wikidata.org/w/api.php"
query = "some search"
params = {
    'action': 'query',
    'list': 'search',
    'format': 'json',
    'srsearch': query,
    'srprop': 'titlesnippet|snippet',
}
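The preview stops at the parameter dictionary; a typical continuation sends the request and reads the hits, roughly like this (the result handling below is an illustration, not the gist's own code):
# Send the request and print each matching page title with its snippet.
response = requests.get(api, params=params).json()
for hit in response.get('query', {}).get('search', []):
    print(hit['title'], '-', hit.get('snippet', ''))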
ettorerizza / viaf_links.sh
Created January 14, 2019 11:03
Analyze a VIAF dump
ettorerizza / yago_sparql.py
Last active August 12, 2024 14:38
YAGO SPARQL example in Python
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("https://linkeddata1.calcul.u-psud.fr/sparql")
sparql.setQuery("""
    select *
    where {
        <http://yago-knowledge.org/resource/Elvis_Presley> ?property ?valueOrObject .
    }
    LIMIT 100""")
sparql.setReturnFormat(JSON)
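The snippet ends before the query is executed; with SPARQLWrapper the usual continuation converts the response to JSON and walks the bindings (the printing loop is an illustration, not the gist's code):
# Run the query and print each property/value pair returned for Elvis Presley.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["property"]["value"], row["valueOrObject"]["value"])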
ettorerizza / replace.html
Last active December 19, 2018 17:29
Replace text bookmarklet
<a "href=javascript:function replaceText(ot,nt,n){n=n||document.body;var cs=n.childNodes,i=0;while(n=cs[i]){if(n.nodeType==Node.TEXT_NODE){n.textContent=n.textContent.replace(ot,nt);}else{replaceText(ot,nt,n);};i++;}};replaceText('surréalisme','romantisme');">Surréaliste!</a>
ettorerizza / parse_lodcloud.py
Last active December 17, 2018 09:49
LOD cloud to CSV
# -*- coding: utf-8 -*-
import requests
import json
import unicodecsv as csv
url = "https://lod-cloud.net/lod-data.json"
r = requests.get(url)
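The preview stops after fetching lod-data.json; a hedged continuation that flattens it to CSV might look like the following. The file appears to be a dictionary keyed by dataset identifier, and the column choices ("title", "domain", "website", "triples") are assumptions about that layout rather than the gist's actual output.
# Flatten the LOD cloud registry into one CSV row per dataset (field names are
# assumptions about lod-data.json's layout).
data = r.json()
with open("lod-cloud.csv", "wb") as f:
    writer = csv.writer(f, encoding="utf-8")
    writer.writerow(["identifier", "title", "domain", "website", "triples"])
    for identifier, dataset in data.items():
        writer.writerow([
            identifier,
            dataset.get("title", ""),
            dataset.get("domain", ""),
            dataset.get("website", ""),
            dataset.get("triples", ""),
        ])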
ettorerizza / wikidata_api_refine.py
Last active January 16, 2019 08:27
Example of a Python/Jython script to extract Wikipedia sitelinks from values reconciled with Wikidata in OpenRefine
import json
import urllib2
langs = ["fr", "en", "de", "nl"] # ordered list of languages you want to try until there is a match
value = cell.recon.match.id
for lang in langs:
    wiki = lang + "wiki"
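The preview stops inside the loop. A self-contained sketch of how such an OpenRefine Jython expression can finish is below: for each language it asks the Wikidata API's wbgetentities action for that sitelink and returns the first Wikipedia title it finds. The request URL and the return logic are assumptions, not necessarily the gist's code, and the script only runs inside OpenRefine, where `cell` is predefined and a top-level `return` is allowed.
# Hedged sketch of the whole expression (OpenRefine wraps it in a function,
# so the top-level `return` is valid there).
import json
import urllib2

langs = ["fr", "en", "de", "nl"]   # ordered list of languages to try
qid = cell.recon.match.id          # Wikidata QID of the reconciled value
for lang in langs:
    wiki = lang + "wiki"
    url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
           "&format=json&props=sitelinks&ids=" + qid + "&sitefilter=" + wiki)
    data = json.loads(urllib2.urlopen(url).read())
    sitelinks = data["entities"][qid].get("sitelinks", {})
    if wiki in sitelinks:
        return sitelinks[wiki]["title"]
return None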
ettorerizza / tropy2csv.py
Last active November 10, 2018 13:47
Tropy JSON-LD to CSV
import json
import csv
import sys
# Python parser for Tropy JSON-LD files
# Returns the file with its original name, but as CSV.
# To be used from the command line.
# Example usage with a Tropy JSON file named resultats.json:
#> python tropy_parser.py resultats.json
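Only the header comments survive in the preview; below is a hedged sketch of such a converter, continuing from the imports above. The assumption that Tropy's JSON-LD export keeps its items under an "@graph" key, and the decision to flatten only scalar fields, are mine, not necessarily the gist's.
# Hedged sketch: read a Tropy JSON-LD export and write its items to a CSV file
# with the same base name (assumes the items live under "@graph").
def tropy_to_csv(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    rows = []
    for item in data.get("@graph", []):
        rows.append({k: v for k, v in item.items()
                     if isinstance(v, (str, int, float))})
    fieldnames = sorted({key for row in rows for key in row})
    out_path = path.rsplit(".", 1)[0] + ".csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    tropy_to_csv(sys.argv[1])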
ettorerizza / open_refine_cli.py
Created October 21, 2018 10:54
Workaround to use openrefine-client in Python 3
import subprocess
command_line = "openrefine-client_0-3-4_windows.exe --list"
result = subprocess.run(command_line, shell=True, stdout=subprocess.PIPE).stdout.decode('utf8')
print(result)
ettorerizza / split_camel_case.py
Created September 22, 2018 16:59
Split a camel-case string
from re import finditer

def camel_case_split(identifier):
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]
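A quick check of what the regex produces, assuming the version above with finditer imported from re:
print(camel_case_split("HTTPResponseCode"))   # ['HTTP', 'Response', 'Code']
print(camel_case_split("simpleCamelCase"))    # ['simple', 'Camel', 'Case']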