Ettore Rizza ettorerizza

#!/usr/bin/env python
import csv
from pymarc import MARCReader
from os import listdir
from re import search
# change this line to match your folder structure
SRC_DIR = '/path/to/mrc/records'
ettorerizza / import_viaf.pl
Created May 2, 2019 21:26 — forked from phochste/import_viaf.pl
Match authors against VIAF using Catmandu and Linked Data Fragments
#!/usr/bin/env perl
#
# Match authors against VIAF
#
# License: http://dev.perl.org/licenses/artistic.html
#
# Author: Patrick Hochstenbach <[email protected]>
#
# Apr 2015
$|++;
ettorerizza / xml_split.py
Created April 20, 2019 16:36 — forked from benallard/xml_split.py
Small Python script to split huge XML files into parts. It takes one or two parameters: the first is always the huge XML file, and the second is the desired chunk size in KB (defaults to 1 MB; 0 splits wherever possible). The generated files are named like the original, with an index between the filename and the extension, like this: bigxml.…
#!/usr/bin/env python
import os
import xml.parsers.expat
from xml.sax.saxutils import escape
from optparse import OptionParser
from math import log10
# How much data we process at a time
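
The splitting idea in the description above can be illustrated with a much smaller sketch. This is not the gist's expat-based streaming code: it loads the whole document with `xml.etree.ElementTree`, so it only suits inputs that fit in memory. The function name `split_xml` and the choice to measure the limit in bytes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text, max_bytes):
    """Split a document into smaller well-formed documents.

    Hypothetical helper, not the gist's implementation. The children of
    the root element are grouped so that each group's serialized size
    stays within max_bytes where possible. max_bytes == 0 puts every
    child in its own chunk (split wherever possible).
    """
    root = ET.fromstring(xml_text)
    groups, current, size = [], [], 0
    for child in root:
        # Size of this child once serialized as UTF-8.
        n = len(ET.tostring(child, encoding="unicode").encode("utf-8"))
        # Close the current group if this child would push it over the limit.
        if current and (max_bytes == 0 or size + n > max_bytes):
            groups.append(current)
            current, size = [], 0
        current.append(child)
        size += n
    if current:
        groups.append(current)

    chunks = []
    for group in groups:
        # Repeat the root tag and attributes so each chunk parses on its own.
        wrapper = ET.Element(root.tag, dict(root.attrib))
        wrapper.extend(group)
        chunks.append(ET.tostring(wrapper, encoding="unicode"))
    return chunks


if __name__ == "__main__":
    sample = "<root><a>1</a><b>2</b><c>3</c></root>"
    for part in split_xml(sample, 16):
        print(part)
```

A single child larger than the limit still ends up in a chunk of its own, so the limit is a target rather than a guarantee. Writing each chunk to an indexed output file is left out here.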
ettorerizza / gist:a54ccefbb1059becd0e4fd41f82bc2be
Created June 13, 2018 22:09 — forked from hellbunnie/gist:dfca37537a80ec698a4cf9c773e4566a
Open Refine template for exporting tabular data to DRI-ready Dublin Core XML
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:dcterms="http://purl.org/dc/terms" xmlns:marcrel="http://www.loc.gov/marc.relators" xsi:schemaLocation="http://www.loc.gov/marc.relators http://imlsdcc2.grainger.illinois.edu/registry/marcrel.xsd" xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
{{forNonBlank(cells["id"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Title"], v, "<dc:title>"+v.value+"</dc:title>", "")}}
{{forNonBlank(cells["Creator"], v, "<dc:creator>"+v.value+"</dc:creator>", "")}}
{{forNonBlank(cells["Date"], v, "<dc:date>"+v.value+"</dc:date>", "")}}
{{forNonBlank(cells["Description"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Description2"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Rights"], v, "<dc:rights>"+v.value+"</dc:rights>", "")}}
{{forNonBlank(cells["Type"], v, "<dc:
ettorerizza / airbnb.r
Created July 24, 2017 07:58 — forked from t-andrew-do/airbnb.r
AirBnB Scraping Script
library(stringr)
library(purrr)
library(rvest)
#------------------------------------------------------------------------------#
# Author: Andrew Do
# Purpose: A bunch of utility functions for the main ScrapeCityToPage. The goal
# is to be able to scrape up to a specified page number for a given city and
# then to store that information as a data frame. The resulting data frame will
# be raw and will require additional cleaning, but the structure is more or less