Ettore Rizza ettorerizza

🏠

Working from home

Researcher & PhD student in Information Sciences & Technologies. Open Refine supporter.

ULB
Brussels, Belgium
https://twitter.com/Ettore_Rizza

ettorerizza / marc2csv_mcmaster.py

Created July 1, 2019 11:23 — forked from mmccollow/marc2csv_mcmaster.py

	#!/usr/bin/env python

	import csv
	from pymarc import MARCReader
	from os import listdir
	from re import search

	# change this line to match your folder structure
	SRC_DIR = '/path/to/mrc/records'

ettorerizza / import_viaf.pl

Created May 2, 2019 21:26 — forked from phochste/import_viaf.pl

Match authors against VIAF using Catmandu and Linked Data Fragments

	#!/usr/bin/env perl
	#
	# Match authors against VIAF
	#
	# License: http://dev.perl.org/licenses/artistic.html
	#
	# Author: Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
	#
	# Apr 2015
	$|++;

ettorerizza / xml_split.py

Created April 20, 2019 16:36 — forked from benallard/xml_split.py

Small Python script to split huge XML files into parts. It takes one or two parameters: the first is always the huge XML file, and the second is the desired chunk size in Kb (defaults to 1 Mb; 0 splits wherever possible). The generated files are named like the original, with an index between the filename and the extension, like this: bigxml.…

	#!/usr/bin/env python

	import os
	import xml.parsers.expat
	from xml.sax.saxutils import escape
	from optparse import OptionParser
	from math import log10


	# How much data we process at a time
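
The size-based splitting described for this gist can be illustrated with a minimal sketch. It is not the gist's code: the gist imports `xml.parsers.expat`, while this sketch uses `xml.etree.ElementTree` on an in-memory string for brevity. The function name `split_children` and the `max_bytes` parameter are hypothetical, and it only splits between direct children of the root element.

```python
import xml.etree.ElementTree as ET


def split_children(xml_text, max_bytes):
    """Group the root's direct children into parts of roughly max_bytes each.

    Hypothetical helper, not the gist's implementation. A max_bytes of 0 or
    less starts a new part after every child ("split wherever possible").
    Root attributes are not copied to the parts in this sketch.
    """
    root = ET.fromstring(xml_text)
    chunks, current, size = [], [], 0
    for child in root:
        piece = ET.tostring(child, encoding="unicode")
        piece_size = len(piece.encode("utf-8"))
        # Close the current part if adding this child would exceed the budget.
        if current and (max_bytes <= 0 or size + piece_size > max_bytes):
            chunks.append(current)
            current, size = [], 0
        current.append(piece)
        size += piece_size
    if current:
        chunks.append(current)
    # Wrap each part in the root tag so that every part is well-formed XML.
    return ["<%s>%s</%s>" % (root.tag, "".join(c), root.tag) for c in chunks]


if __name__ == "__main__":
    for part in split_children("<r><a>1</a><a>2</a><a>3</a></r>", 20):
        print(part)
```

Because this sketch loads the whole document into memory, it does not suit truly huge files; a streaming parser such as expat, as imported by the gist, avoids that.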

ettorerizza / gist:a54ccefbb1059becd0e4fd41f82bc2be

Created June 13, 2018 22:09 — forked from hellbunnie/gist:dfca37537a80ec698a4cf9c773e4566a

Open Refine template for exporting tabular data to DRI-ready Dublin Core XML

	<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:marcrel="http://www.loc.gov/marc.relators/" xsi:schemaLocation="http://www.loc.gov/marc.relators/ http://imlsdcc2.grainger.illinois.edu/registry/marcrel.xsd" xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
	{{forNonBlank(cells["id"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
	{{forNonBlank(cells["Title"], v, "<dc:title>"+v.value+"</dc:title>", "")}}
	{{forNonBlank(cells["Creator"], v, "<dc:creator>"+v.value+"</dc:creator>", "")}}
	{{forNonBlank(cells["Date"], v, "<dc:date>"+v.value+"</dc:date>", "")}}
	{{forNonBlank(cells["Description"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
	{{forNonBlank(cells["Description2"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
	{{forNonBlank(cells["Rights"], v, "<dc:rights>"+v.value+"</dc:rights>", "")}}
	{{forNonBlank(cells["Type"], v, "<dc:
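
Each visible template row emits a Dublin Core element only when the corresponding cell is non-blank. As a rough illustration outside OpenRefine, the same rule might look like this in Python. The names `FIELDS` and `row_to_dc` are hypothetical, and only the columns shown in full above are mapped. A blank cell is assumed to be `None` or an empty string. Unlike the template, this sketch also XML-escapes the values.

```python
from xml.sax.saxutils import escape

# Column name -> Dublin Core element, copied from the template rows shown above.
FIELDS = [
    ("id", "dc:identifier"),
    ("Title", "dc:title"),
    ("Creator", "dc:creator"),
    ("Date", "dc:date"),
    ("Description", "dc:description"),
    ("Description2", "dc:description"),
    ("Rights", "dc:rights"),
]


def row_to_dc(row):
    """Return one XML line per non-blank cell of a row given as a dict.

    Hypothetical helper: cells that are missing, None, or empty are skipped,
    mirroring the empty-string fallback of forNonBlank in the template.
    """
    lines = []
    for column, element in FIELDS:
        value = row.get(column)
        if value is None or value == "":
            continue
        # Escape &, < and > so the output stays well-formed XML.
        lines.append("<%s>%s</%s>" % (element, escape(str(value)), element))
    return lines


if __name__ == "__main__":
    print("\n".join(row_to_dc({"id": "42", "Title": "A & B", "Creator": ""})))
```

Escaping matters as soon as a title or description contains `&` or `<`; a template that concatenates raw cell values would otherwise produce invalid XML for such rows.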

ettorerizza / airbnb.r

Created July 24, 2017 07:58 — forked from t-andrew-do/airbnb.r

AirBnB Scraping Script

	library(stringr)
	library(purrr)
	library(rvest)

	#------------------------------------------------------------------------------#
	# Author: Andrew Do
	# Purpose: A bunch of utility functions for the main ScrapeCityToPage. The goal
	# is to be able to scrape up to a specified page number for a given city and
	# then to store that information as a data frame. The resulting data frame will
	# be raw and will require additional cleaning, but the structure is more or less