Ettore Rizza ettorerizza

Researcher & PhD student in Information Sciences & Technologies. Open Refine supporter.

ULB
Brussels, Belgium
https://twitter.com/Ettore_Rizza

ettorerizza / Search_Wikipedia.py

Last active June 24, 2016 14:01

# This script takes a list of names and checks whether they exist, first in Wikipedia.fr, then in Wikipedia.nl

	# -*- coding: utf-8 -*-

	######################################################
	#
	# This script takes a list of names and checks
	# whether they exist, first in Wikipedia.fr,
	# then in Wikipedia.nl
	#
	######################################################

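The gist preview stops after the header comment. The fr-then-nl lookup can be sketched as follows, assuming the public MediaWiki Action API (`action=query&titles=...`), which marks a nonexistent title with a `missing` key. The helper names and the injectable `fetch` parameter are hypothetical, added so the logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request


def _fetch_json(url):
    """Download a URL and decode its JSON body (network access required)."""
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-name-check-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def page_exists(title, lang, fetch=_fetch_json):
    """Return True if `title` exists on the `lang` edition of Wikipedia."""
    params = urllib.parse.urlencode(
        {"action": "query", "titles": title, "format": "json", "redirects": 1}
    )
    url = "https://%s.wikipedia.org/w/api.php?%s" % (lang, params)
    pages = fetch(url)["query"]["pages"]
    # Nonexistent or malformed titles come back flagged "missing" or "invalid".
    return not any("missing" in page or "invalid" in page for page in pages.values())


def first_wiki(name, langs=("fr", "nl"), fetch=_fetch_json):
    """Return the first edition (fr, then nl) that has a page for `name`, else None."""
    for lang in langs:
        if page_exists(name, lang, fetch):
            return lang
    return None
```

Calling `first_wiki(name)` on each name of the input list returns `"fr"`, `"nl"` or `None`; passing a fake `fetch` keeps tests off the network.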
ettorerizza / create_column_openrefine.py

Last active March 7, 2017 11:21

This script takes an OpenRefine JSON operation history as input and returns the same file, in which each "transform" and each "mass edit" is documented in its own column

	#!/usr/bin/python3
	import json

	with open("test.json", "r") as infile:
	    data = json.load(infile)

	def transform_to_addcolumn(data):
	    data_trans = dict(data)
	    data_trans["op"] = "core/column-addition"
	    data_trans["expression"] = (

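The preview cuts off while the new expression is being built. The same idea can be sketched for `core/text-transform` steps only, assuming OpenRefine's operation-history keys (`op`, `columnName`, `expression`, `onError`, `engineConfig`, `description`) and the `core/column-addition` keys (`baseColumnName`, `newColumnName`, `columnInsertIndex`). The `_stepN` column names and the insert index of 0 are hypothetical choices, not the gist's; mass edits are passed through unchanged here.

```python
import copy
import json


def transform_to_addcolumn(op, step):
    """Turn a core/text-transform step into a core/column-addition step.

    The original column is left untouched; the expression's result lands in
    a new column named after the base column and the step number (hypothetical).
    """
    if op.get("op") != "core/text-transform":
        return op  # other steps, mass edits included, are kept as-is in this sketch
    return {
        "op": "core/column-addition",
        "engineConfig": copy.deepcopy(op.get("engineConfig", {})),
        "baseColumnName": op["columnName"],
        "expression": op["expression"],
        "onError": op.get("onError", "set-to-blank"),
        "newColumnName": "%s_step%d" % (op["columnName"], step),  # unique per step
        "columnInsertIndex": 0,  # hypothetical: an insert position is required
        "description": op.get("description", ""),
    }


def convert_history(path_in, path_out):
    """Read an extracted history, convert each step, write the new JSON."""
    with open(path_in, "r", encoding="utf-8") as infile:
        history = json.load(infile)
    converted = [transform_to_addcolumn(op, n) for n, op in enumerate(history, 1)]
    with open(path_out, "w", encoding="utf-8") as outfile:
        json.dump(converted, outfile, indent=2)
```

Numbering the new columns by step avoids name collisions when the same column is transformed several times.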
ettorerizza / refinetranslator.py

Last active April 28, 2018 00:04

A mini Python 3 script that turns the list of operations performed in OpenRefine into a text file that is easier to read. To use it, paste your OpenRefine "undo/redo" history into a file named, for example, "operations.json", place this file in the same folder as the Python script, and run this command: python refinetranslator.py operations.json

	#!/usr/bin/python3
	import json
	import sys

	with open(sys.argv[1], "r") as infile:
	    data = json.load(infile)

	outfile = open(sys.argv[1]+".txt", 'w')
	count = 1

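The preview stops before the main loop. The translation step can be sketched as below, assuming the pasted history is a JSON list in which each operation carries `op` and `description` keys; the numbered line format and the `translate`/`main` names are hypothetical.

```python
import json


def translate(operations):
    """Return one numbered, human-readable line per OpenRefine operation."""
    lines = []
    for count, op in enumerate(operations, start=1):
        # Fall back to the operation id when no description is present.
        text = op.get("description") or op.get("op", "unknown operation")
        lines.append("%d. %s" % (count, text))
    return lines


def main(path):
    """Write the readable version next to the input, as <path>.txt."""
    with open(path, "r", encoding="utf-8") as infile:
        data = json.load(infile)
    # The context manager closes the output file even if writing fails.
    with open(path + ".txt", "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(translate(data)) + "\n")
```

`main("operations.json")` produces `operations.json.txt`, matching the gist's output naming.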
ettorerizza / merge_and_reshape_topics_matrice.R

Last active May 21, 2017 21:27

Takes the matrices from several topic models and reshapes them

	library(dplyr)
	library(data.table)
	library(stringr)

	# folder containing the files
	setwd("C:/Users/ettor/Desktop/Eurovoc Topicmodeling/presidencies")

	# merge the three
	# (pattern is a regular expression: anchor it so only .txt files match)
	files <- list.files(path = getwd(),
	                    pattern = "\\.txt$")

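The R preview stops after listing the files. The merge-and-reshape step can be sketched in Python, assuming each file is a tab-separated document-topic matrix with no header, the document id in the first column and one weight per topic in the following columns. That layout and the function names are assumptions, not the gist's.

```python
import csv
import io


def wide_to_long(model_name, text):
    """Reshape one wide doc-topic matrix into (model, doc, topic, weight) rows.

    Assumed layout: tab-separated, no header, first column = document id,
    remaining columns = one weight per topic (topic 0, 1, 2, ...).
    """
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if not record:
            continue  # skip blank lines
        doc, weights = record[0], record[1:]
        for topic, weight in enumerate(weights):
            rows.append((model_name, doc, topic, float(weight)))
    return rows


def merge_models(matrices):
    """Stack several models' long tables; `matrices` maps model name -> file text."""
    merged = []
    for name, text in matrices.items():
        merged.extend(wide_to_long(name, text))
    return merged
```

The long format makes it easy to compare the same document across models, which is the point of merging them.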
ettorerizza / Open Refine fingerprint function in R

Last active May 28, 2017 11:57

Given a character vector as input, get the key collision fingerprint for each element. Forked from the refinr package.

	#' Get key collision fingerprints
	#'
	#' Given a character vector as input, get the key collision fingerprint for
	#' each element.
	#'
	#' Operations, in order:
	#'
	#' - remove leading and trailing whitespace
	#' - change all characters to their lowercase representation
	#' - remove all punctuation and control characters

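The docstring is cut after the third step. OpenRefine documents the full fingerprint keying method as: trim, lowercase, remove punctuation and control characters, normalize extended Western characters to ASCII, split into whitespace-separated tokens, sort and deduplicate the tokens, and join them back. A minimal Python sketch of those steps follows; Unicode categories stand in for Java's `\p{Punct}` class, so symbol handling may differ slightly from OpenRefine and refinr, and characters with no ASCII decomposition (such as `œ`) are simply dropped.

```python
import unicodedata


def _keep(ch):
    """Drop punctuation (Unicode category P*) and non-whitespace control characters."""
    cat = unicodedata.category(ch)
    if cat.startswith("P"):
        return False
    if cat == "Cc" and not ch.isspace():
        return False
    return True


def fingerprint(value):
    """Return the key collision fingerprint of a single string."""
    s = value.strip().lower()
    s = "".join(ch for ch in s if _keep(ch))
    # Decompose accented letters, then drop the combining marks.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    tokens = sorted(set(s.split()))  # sort and remove duplicate tokens
    return " ".join(tokens)


def fingerprints(values):
    """Element-wise version, mirroring the R function's character-vector input."""
    return [fingerprint(v) for v in values]
```

Values that share a fingerprint, such as `"Rizza, Ettore"` and `"ettore rizza"`, fall into the same cluster.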
ettorerizza / parse_jrc-acquis

Created May 30, 2017 16:11

R script to parse the 26,000 XML/TEI files of the European JRC-Acquis corpus and attach their EuroVoc descriptors

	library(XML)
	library(dplyr)
	library(stringr)
	library(readr)
	library(readxl)
	library(tidyr)

	# list of XML files in the English version of the JRC-Acquis corpus (http://optima.jrc.it/Acquis/JRC-Acquis.3.0/corpus/jrc-en.tgz)
	liste <-
	list.files(

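The preview stops at the file listing. Parsing a single document can be sketched with Python's `xml.etree.ElementTree`, assuming each TEI file stores its EuroVoc descriptor codes in `classCode` elements with `scheme="eurovoc"` and its running text in `p` elements under `body`. The code-to-label dictionary stands in for the EuroVoc table the R script reads with `readxl`; its contents, like the function name, are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_document(xml_text, eurovoc_labels):
    """Return the text and EuroVoc descriptors of one JRC-Acquis TEI document.

    Assumed structure: descriptors in <classCode scheme="eurovoc">CODE</classCode>,
    running text in <p> elements under <body>.
    """
    root = ET.fromstring(xml_text)
    codes = [
        (el.text or "").strip()
        for el in root.iter("classCode")
        if el.get("scheme") == "eurovoc"
    ]
    paragraphs = [
        "".join(p.itertext()).strip()
        for body in root.iter("body")
        for p in body.iter("p")
    ]
    return {
        "text": "\n".join(paragraphs),
        "eurovoc_codes": codes,
        # Codes missing from the lookup table map to None instead of raising.
        "eurovoc_labels": [eurovoc_labels.get(code) for code in codes],
    }
```

Looping this over the file list yields one record per document, ready to be written out as a table.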
ettorerizza / levenshtein.py

Last active October 31, 2017 11:16

A function for calculating the Levenshtein edit distance between columns in OpenRefine, using Jython

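	# Decorator that counts how many times the wrapped function is called
	def call_counter(func):
	    def helper(*args, **kwargs):
	        helper.calls += 1
	        return func(*args, **kwargs)
	    helper.calls = 0
	    helper.__name__ = func.__name__
	    return helper

	# Cache of already computed distances
	memo = {}

	@call_counter
	def levenshtein(s, t):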

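The gist computes the distance with a memoized recursive `levenshtein(s, t)`. As an alternative (a swapped-in technique, not the gist's code), the same distance can be computed iteratively with two rows of the dynamic-programming table, which avoids deep recursion on long cells; the syntax below also runs under Jython 2.7.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    if len(s) < len(t):
        s, t = t, s  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        current = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]
```

In an OpenRefine Jython expression, the function can be pasted and followed by something like `return levenshtein(value, cells["Other column"]["value"])`, where `"Other column"` is a hypothetical column name.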
ettorerizza / postag_refine.py

Created July 2, 2017 17:04

OpenRefine/Jython POS tagging with Pattern's parsetree

	import sys
	sys.path.append(r'D:\jython2.7.0\Lib\site-packages')
	from pattern.fr import parsetree

	sentences = parsetree(value, relations=True, lemmata=True)

	liste = []
	for s in sentences:
	    for chunk in s.chunks:
	        for w in chunk.words:

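The preview stops inside the triple loop. One thing such a loop can return is sketched below, assuming Pattern's parse tree API (sentences expose `.chunks`, chunks expose `.words`, and each word exposes `.string` and `.type`, its POS tag). The `word/TAG` output format and the stand-in classes, which let the sketch run without Pattern installed, are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring only the Pattern attributes used below.
Word = namedtuple("Word", ["string", "type"])
Chunk = namedtuple("Chunk", ["words"])
Sentence = namedtuple("Sentence", ["chunks"])


def tag_string(sentences):
    """Flatten sentences -> chunks -> words into a 'word/TAG word/TAG ...' string."""
    liste = []
    for s in sentences:
        for chunk in s.chunks:
            for w in chunk.words:
                liste.append("%s/%s" % (w.string, w.type))
    return " ".join(liste)
```

With Pattern available in Jython, the cell expression would end with `return tag_string(parsetree(value, relations=True, lemmata=True))`.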
ettorerizza / sparql_refine.py

Created July 2, 2017 17:05

OpenRefine/Jython SPARQL query (finds possible locations and persons among tokens)

	import sys
	sys.path.append(r'D:\jython2.7.0\Lib\site-packages')
	from SPARQLWrapper import SPARQLWrapper, JSON
	from langdetect import detect

	dbpedia_version = "http://dbpedia.org/sparql"

	#TEST
	value = "comptoir"

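The preview stops at the test value. A lookup for one token can be sketched as below, assuming DBpedia's `dbo:Place` and `dbo:Person` classes and language-tagged `rdfs:label` values; the query shape and function names are hypothetical, and `SPARQLWrapper` is imported only when a query is actually sent. DBpedia labels are usually capitalized, so an exact match on a lowercase token such as `comptoir` may return nothing.

```python
def build_query(token, lang):
    """SPARQL query listing DBpedia resources labelled `token` that are places or persons."""
    literal = token.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    return """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?entity ?type WHERE {
  VALUES ?type { dbo:Place dbo:Person }
  ?entity rdfs:label "%s"@%s ;
          a ?type .
}
LIMIT 20""" % (literal, lang)


def run_query(token, lang, endpoint="http://dbpedia.org/sparql"):
    """Send the query with SPARQLWrapper (network access required)."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # lazy import: optional dependency
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_query(token, lang))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(b["entity"]["value"], b["type"]["value"])
            for b in results["results"]["bindings"]]
```

The language tag can come from `detect(value)`, as imported from `langdetect` in the gist.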
ettorerizza / extract_names.py

Last active July 9, 2017 08:53

A naive Jython method to detect potential person names in OpenRefine, based on a list of first names

	from unidecode import unidecode

	with open(r"C:\Users\Boulot\Desktop\prenoms.txt", 'r') as f:
	    prenoms = [name.strip().lower() for name in f]

	CHARS = "abcdefghijklmnopqrstuvwxyzéèàçüûùABCDEFGHIJKLMNOPQRSTUVWXYZ- "

	family_joint = ["d'", "de", "du", "der", "den", "vander", "vanden", "van", "le"]

	#TEST

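The preview stops at the test section. The naive rule can be sketched in Python 3, assuming a candidate name is a known first name followed by capitalized tokens, optionally joined by the particles in `family_joint`. Accent folding uses the standard `unicodedata` module in place of `unidecode`; the tokenizing regex and the `find_names` name are hypothetical.

```python
import re
import unicodedata

# Particles copied from the gist's family_joint list.
FAMILY_JOINT = {"d'", "de", "du", "der", "den", "vander", "vanden", "van", "le"}


def fold(token):
    """Lowercase and strip accents, e.g. 'Hélène' -> 'helene'."""
    ascii_form = unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")
    return ascii_form.lower()


def find_names(text, first_names):
    """Return spans that start with a known first name and continue with a surname."""
    tokens = re.findall(r"d'|[\w-]+", text)
    names = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok[0].isupper() and fold(tok) in first_names:
            parts = [tok]
            j = i + 1
            # Extend while the next token is a particle or is capitalized.
            while j < len(tokens) and (tokens[j].lower() in FAMILY_JOINT or tokens[j][0].isupper()):
                parts.append(tokens[j])
                j += 1
            # Drop trailing particles ("Marie Curie de" -> "Marie Curie").
            while len(parts) > 1 and parts[-1].lower() in FAMILY_JOINT:
                parts.pop()
            if len(parts) > 1:
                names.append(" ".join(parts).replace("d' ", "d'"))
                i = j
                continue
        i += 1
    return names
```

`first_names` is expected to hold accent-free lowercase entries, such as the gist's `prenoms` list passed through `fold`.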