Ettore Rizza ettorerizza

🏠
Working from home
View GitHub Profile
@ettorerizza
ettorerizza / dbpedia_wikidata_federated_sparql_query.py
Last active September 13, 2018 12:32
dbpedia_wikidata_federated_sparql_query
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://linkeddata.uriburner.com/sparql")
sparql.setQuery("""
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
@ettorerizza
ettorerizza / NLTK_NER.py
Created July 16, 2018 11:57
stanford ner with nltk
# -*- coding: utf-8 -*-
"""
Test of the Stanford NER tagger with the Europeana CRF models
trained on newspapers:
http://lab.kbresearch.nl/static/html/eunews.html
The function is slow --> consider multiprocessing
"""
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=DeprecationWarning)
@ettorerizza
ettorerizza / scrapeDeathsWikipedia.r
Created July 14, 2018 08:16
Retrieves from Wikipedia.fr the list of people who died, by year and by month, since 2010
library(rvest)
library(dplyr)
# build the starting URLs
main_url <- "https://fr.wikipedia.org/wiki/Décès_en_"
annee <- 2010:2018
urls <- paste0(main_url, annee)
# empty vector that will hold the page URLs
liens <- vector()
@ettorerizza
ettorerizza / wikipedia_gsrsearch.py
Last active June 24, 2018 20:25
Do a Wikipedia search for `query` and return a list of tuples (page title, url). Use ngram to get the best match in the list
import requests
import functools
from ngram import NGram
def get_similar(data, target):
    G = NGram(target)
    return G.find(data)
class cache(object):
import subprocess
console = subprocess.run('openrefine_cli.exe --help', shell=True, stdout=subprocess.PIPE).stdout.decode('utf8')
print(console)
@ettorerizza
ettorerizza / get_nominatim.py
Last active June 18, 2018 07:49
Nominatim API alternative to geopy
import requests
import requests_cache
from time import sleep
# Nominatim blocks repeated requests
requests_cache.install_cache('nominatim_cache')
def get_nominatim(value, countrycodes=['BE',''], limit=5, lang="fr"):
    # doc : https://wiki.openstreetmap.org/wiki/Nominatim
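The preview of `get_nominatim` is cut off before its body, so the following is only a sketch of how such a request might be assembled, not the author's implementation. The helper name `build_nominatim_url` is hypothetical; the query parameters (`q`, `format`, `limit`, `accept-language`, `countrycodes`) are the ones documented for the Nominatim search endpoint. It assumes that an empty string in `countrycodes` means "no country restriction", which the original does not confirm. The URL is built but never sent.

```python
from urllib.parse import urlencode

# Public Nominatim search endpoint
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_nominatim_url(value, countrycodes=("BE",), limit=5, lang="fr"):
    """Hypothetical helper: build a Nominatim search URL without sending it."""
    params = {
        "q": value,
        "format": "json",
        "limit": limit,
        "accept-language": lang,
    }
    # Assumption: empty codes mean "no restriction", so they are dropped
    codes = [c.lower() for c in countrycodes if c]
    if codes:
        params["countrycodes"] = ",".join(codes)
    return NOMINATIM_SEARCH + "?" + urlencode(params)


url = build_nominatim_url("Grand-Place, Bruxelles")
print(url)
```

The resulting URL could then be passed to `requests.get`, with `requests_cache` and a `sleep` between calls keeping the request rate low, as the imports in the gist suggest.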
@ettorerizza
ettorerizza / get_wikidata.py
Last active June 18, 2018 07:30
match with wikidata using openrefine api
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 5 12:35:54 2018
@author: ettor
"""
import pandas as pd
import requests
import requests_cache
@ettorerizza
ettorerizza / glance.py
Created June 15, 2018 08:20
Glance at a large Python dictionary
import itertools
def glance(d):
    """Preview an unordered sample of a dictionary."""
    return dict(itertools.islice(d.items(), 3))
glance(json_file)
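A self-contained way to try the function, as a minimal sketch: `json_file` is not defined in the preview, so a small made-up dictionary stands in for it. The optional `n` parameter is an addition for illustration and is not part of the original gist. On Python 3.7 and later, dictionaries keep insertion order, so the preview is simply the first `n` items.

```python
import itertools


def glance(d, n=3):
    """Return the first n items of a dictionary as a new dict."""
    return dict(itertools.islice(d.items(), n))


# Hypothetical sample data, standing in for a large parsed JSON file
sample = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
preview = glance(sample)
print(preview)  # {'a': 1, 'b': 2, 'c': 3}
```

Because `islice` stops after `n` items, the cost does not depend on the size of the dictionary.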
@ettorerizza
ettorerizza / gist:a54ccefbb1059becd0e4fd41f82bc2be
Created June 13, 2018 22:09 — forked from hellbunnie/gist:dfca37537a80ec698a4cf9c773e4566a
Open Refine template for exporting tabular data to DRI-ready Dublin Core XML
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:dcterms="http://purl.org/dc/terms" xmlns:marcrel="http://www.loc.gov/marc.relators" xsi:schemaLocation="http://www.loc.gov/marc.relators http://imlsdcc2.grainger.illinois.edu/registry/marcrel.xsd" xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
{{forNonBlank(cells["id"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Title"], v, "<dc:title>"+v.value+"</dc:title>", "")}}
{{forNonBlank(cells["Creator"], v, "<dc:creator>"+v.value+"</dc:creator>", "")}}
{{forNonBlank(cells["Date"], v, "<dc:date>"+v.value+"</dc:date>", "")}}
{{forNonBlank(cells["Description"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Description2"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Rights"], v, "<dc:rights>"+v.value+"</dc:rights>", "")}}
{{forNonBlank(cells["Type"], v, "<dc:
@ettorerizza
ettorerizza / parse_lodcloud.py
Created June 11, 2018 21:50
Write json keys in a text file
import json
from pprint import pprint
with open(r'C:\Users\ettor\Desktop\lod-data.json') as f:
    data = json.load(f)
myfile = open('test.txt', 'a')
for key in data.keys():
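The preview stops at the loop header, so the rest of the script is unknown. As a rough sketch of what the description ("Write json keys in a text file") could look like end to end, the following writes each top-level key on its own line. The function name `write_json_keys`, the file names, and the sample content are all hypothetical, and the sketch opens the output in write mode (`"w"`) where the gist uses append mode (`"a"`).

```python
import json
import os
import tempfile


def write_json_keys(json_path, out_path):
    """Write each top-level key of a JSON object to out_path, one per line."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # The context manager closes the output file even if writing fails
    with open(out_path, "w", encoding="utf-8") as out:
        for key in data:
            out.write(key + "\n")
    return list(data)


# Demonstration on a throwaway file with made-up content
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.json")
    dst = os.path.join(tmp, "keys.txt")
    with open(src, "w", encoding="utf-8") as f:
        json.dump({"dbpedia": {}, "wikidata": {}}, f)
    keys = write_json_keys(src, dst)
    with open(dst, encoding="utf-8") as f:
        written = f.read().splitlines()
```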