Ettore Rizza ettorerizza


Working from home

Researcher & PhD student in Information Sciences & Technologies. OpenRefine supporter.

ettorerizza / csv_to_sqlserver.ps

Last active April 25, 2020 16:49

Bulk-import a folder of CSV files into SQL Server, creating tables on the fly.

	#Install-Module dbatools

	# If script execution is disabled, first run:
	# powershell -noprofile -ExecutionPolicy bypass


	Import-Module dbatools

	# Import every .csv file in the folder, creating a table for each one if needed
	Get-ChildItem -Path "C:\CSV_PATH" -Filter *.csv | ForEach-Object {
		Import-DbaCsv -Path $_.FullName -SqlInstance "DESKTOP-C5EUKT9" -Database "stagging" -AutoCreateTable
	}
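For readers without a SQL Server instance, the idea behind dbatools' `-AutoCreateTable` (derive a table from each CSV's header row, then bulk-insert the rows) can be sketched with Python's built-in `sqlite3` as a stand-in. This is a swapped-in technique, not what dbatools does internally: every column is stored as TEXT, each table is named after the file stem, every row is assumed to have as many fields as the header, and `csv_folder_to_sqlite` is a hypothetical helper.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def csv_folder_to_sqlite(folder: str, conn: sqlite3.Connection) -> list:
    """Create one TEXT-only table per CSV file (named after the file stem)
    and bulk-insert its rows. Returns the names of the tables filled."""
    created = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader, None)
            if not header:
                continue  # skip empty files
            table = quote_ident(path.stem)
            columns = ", ".join(f"{quote_ident(col)} TEXT" for col in header)
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            # The remaining rows of the reader are inserted in one batch
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
        created.append(path.stem)
    conn.commit()
    return created
```

Usage would look like `csv_folder_to_sqlite("C:/CSV_PATH", sqlite3.connect("staging.db"))`. Storing everything as TEXT mirrors the safe default of a staging area: types can be cast later in SQL once the data is loaded.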

ettorerizza / most_common.py

Created March 12, 2020 08:10

Most common elements in a list (with ties)

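	from collections import Counter


	def most_commons(List):
	    """Return a new list with the most common elements
	    in a list, as (element, count) pairs, keeping ties.
	    """
	    count = Counter(List)
	    freq_list = list(count.values())  # dict views have no .count() in Python 3
	    max_cnt = max(freq_list)
	    total = freq_list.count(max_cnt)  # how many elements share the top count
	    most_commons = count.most_common(total)
	    return most_commons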
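A compact alternative, sketched here as a hypothetical `most_common_elements` helper, reads the highest count once and keeps every element that reaches it, so ties survive without computing how many to request from `most_common()`. Unlike the function above, it returns the elements without their counts, in first-seen order (which `Counter` preserves on Python 3.7+), and returns an empty list for empty input.

```python
from collections import Counter


def most_common_elements(items):
    """Return every element tied for the highest frequency, in first-seen order."""
    counts = Counter(items)
    if not counts:
        return []
    top = max(counts.values())
    return [item for item, n in counts.items() if n == top]


print(most_common_elements(["a", "b", "a", "c", "b"]))  # ['a', 'b']
```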

ettorerizza / sms_spam_detector.py

Last active January 16, 2020 09:40

	# Source: https://pythonprogramminglanguage.com/logistic-regression-spam-filter/
	# Dataset: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

	import pandas as pd
	import numpy as np
	from sklearn.feature_extraction.text import TfidfVectorizer
	from sklearn.linear_model import LogisticRegression  # the .logistic submodule path was removed from scikit-learn
	from sklearn.model_selection import train_test_split, cross_val_score

	df = pd.read_csv(r'C:/Users/student/Desktop/spam detect logistic regression python/SMSSpamCollection', delimiter='\t',header=None)
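The SMS Spam Collection file is tab-separated: the label (`ham` or `spam`) comes first, then the raw message. If a message contains double quotes, a reader that honours CSV quoting can mangle it, so one option is to turn quote handling off (in pandas, by passing `quoting=csv.QUOTE_NONE` to `read_csv`). The stdlib sketch below illustrates that choice; `load_sms_collection` is a hypothetical name and the two sample messages are made up.

```python
import csv
import io


def load_sms_collection(fh):
    """Parse 'label<TAB>message' lines into two parallel lists.

    Quote handling is disabled so a message containing double quotes is
    kept verbatim instead of being treated as a quoted CSV field.
    """
    labels, messages = [], []
    for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        labels.append(row[0])
        messages.append("\t".join(row[1:]))  # rejoin any tabs inside the message
    return labels, messages


sample = io.StringIO('ham\tsee you at noon\nspam\t"FREE" prize, reply now\n')
labels, messages = load_sms_collection(sample)
print(labels)  # ['ham', 'spam']
```

The two lists map directly onto the `TfidfVectorizer` input (`messages`) and the classifier target (`labels`).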

ettorerizza / post_request.py

Created January 2, 2020 06:47

How to call a POST API with Jython in OpenRefine

	import urllib
	import urllib2
	import json


	url = 'https://api.monkeylearn.com/v3/classifiers/cl_pi3C7JiL/classify/'


	headers = {
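
A JSON POST request comes down to three parts: the encoded payload, a `Content-Type: application/json` header, and whatever authentication header the API expects. The sketch below builds such a request with Python 3's `urllib.request` without sending it. The `Authorization: Token <API key>` scheme and the `{"data": [...]}` payload shape are assumptions about the MonkeyLearn API, and `build_post_request` is a hypothetical helper. In OpenRefine's Jython (Python 2.7), the closest equivalent is `urllib2.Request(url, data, headers)`, which issues a POST whenever `data` is given.

```python
import json
import urllib.request


def build_post_request(url, payload, api_key):
    """Build (but do not send) a JSON POST request.

    The 'Token <key>' authorization scheme is an assumption; check the API docs.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Token " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_post_request(
    "https://api.monkeylearn.com/v3/classifiers/cl_pi3C7JiL/classify/",
    {"data": ["This is a sample text"]},
    "YOUR_API_KEY",  # placeholder, not a real key
)
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```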

ettorerizza / bookdown_to_pdf.py

Created November 16, 2019 14:16

to improve



	import os

	CHROME_PATH = r"/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
	url = "https://www.tidytextmining.com/tfidf.html"

	def url_to_pdf(url, filename):

	    chrome_args = [CHROME_PATH,
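
If the Chrome command is passed to `subprocess` as a list, no shell is involved, so the path must be the literal file path (`Google Chrome` with a plain space) rather than the shell-escaped `Google\ Chrome` used in `CHROME_PATH`. A minimal sketch under that assumption, using Chrome's `--headless`, `--disable-gpu` and `--print-to-pdf=` switches; `CHROME_BIN`, `chrome_pdf_args` and `url_to_pdf_sketch` are hypothetical names, not part of the original gist.

```python
import subprocess

# Literal path to the Chrome binary on macOS (no shell escaping needed with a list).
CHROME_BIN = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_pdf_args(url, filename, chrome=CHROME_BIN):
    """Return the argument list for printing `url` to `filename` with headless Chrome."""
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={filename}", url]


def url_to_pdf_sketch(url, filename):
    """Run headless Chrome; raises CalledProcessError if Chrome exits with an error."""
    subprocess.run(chrome_pdf_args(url, filename), check=True)


args = chrome_pdf_args("https://www.tidytextmining.com/tfidf.html", "tfidf.pdf")
print(args[-2])  # --print-to-pdf=tfidf.pdf
```

Keeping the argument construction separate from the call makes the command easy to inspect before Chrome is launched.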

ettorerizza / textfolder_to_csv.py

Last active November 3, 2019 13:52

Import the contents of every file in a folder into a single CSV, where each row holds the contents of one file

	#!/usr/bin/env python3
	# -*- coding: utf-8 -*-

	"""
	Import the contents of every file in a folder into a single CSV,
	where each row holds the contents of one file.

	Arguments:

	-i, --inputfolder: path to the folder containing the files
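
A minimal sketch of the behaviour the docstring describes, assuming UTF-8 text files and a two-column output (`filename`, `content`). Only `-i/--inputfolder` comes from the original; the `-o/--output` option, its `output.csv` default, and the names `folder_to_csv` and `main` are hypothetical.

```python
import argparse
import csv
from pathlib import Path


def folder_to_csv(input_folder, output_csv):
    """Write one CSV row per regular file: (file name, full text content).

    Returns the number of rows written (header excluded).
    """
    files = sorted(p for p in Path(input_folder).iterdir() if p.is_file())
    with open(output_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "content"])
        for path in files:
            # errors="replace" keeps going if a file is not valid UTF-8
            writer.writerow([path.name, path.read_text(encoding="utf-8", errors="replace")])
    return len(files)


def main(argv=None):
    parser = argparse.ArgumentParser(description="One CSV row per file in a folder.")
    parser.add_argument("-i", "--inputfolder", required=True,
                        help="folder containing the files")
    parser.add_argument("-o", "--output", default="output.csv",
                        help="CSV file to write (hypothetical option)")
    args = parser.parse_args(argv)
    return folder_to_csv(args.inputfolder, args.output)
```

Call `main()` under an `if __name__ == "__main__":` guard to use it from the command line, or `main(["-i", "texts", "-o", "texts.csv"])` from Python. The csv module quotes contents that contain commas or newlines, so multi-line files stay in a single row.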

ettorerizza / scraping_4instance.py

Created October 6, 2019 12:24

Script for scraping the names of ministerial cabinet staff ("cabinettards") from an old website with truly awful HTML

	#!/usr/bin/env python
	# -*- coding: utf-8 -*-

	"""
	Script for scraping the names of cabinet staff ("cabinettards") from 4instance's old website, whose HTML is truly awful
	"""

	# Import the external modules that will be needed
	# Install them beforehand from the command line (or in the VSCode terminal)
	# Example: pip install bs4 ; pip install requests ; pip install pandas ; pip install regex
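
BeautifulSoup copes well with broken markup, which is presumably why the script relies on `bs4`. For a dependency-free illustration of the same idea, the sketch below swaps in the standard library's `html.parser.HTMLParser`, which also tolerates unclosed tags and mixed-case tag names, and collects the text of every `<b>` element. The target tag, the class name `TagTextCollector`, and the sample HTML are all made up, since the structure of the 4instance pages is not shown.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the whitespace-normalised text inside every occurrence of one tag."""

    def __init__(self, target="b"):
        super().__init__()
        self.target = target
        self.depth = 0      # > 0 while inside the target tag
        self.current = []   # text fragments of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target:  # tag names arrive lowercased
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self._flush()

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

    def _flush(self):
        text = " ".join("".join(self.current).split())  # collapse whitespace
        if text:
            self.results.append(text)
        self.current = []

    def close(self):
        super().close()
        if self.depth:  # target tag left open at the end of the document
            self._flush()
            self.depth = 0


html = "<TABLE><tr><td><B>Jean  Dupont</B><td><b>Marie\nMartin</b></TABLE><b>Unclosed"
parser = TagTextCollector("b")
parser.feed(html)
parser.close()
print(parser.results)  # ['Jean Dupont', 'Marie Martin', 'Unclosed']
```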

ettorerizza / marc2csv_mcmaster.py

Created July 1, 2019 11:23 — forked from mmccollow/marc2csv_mcmaster.py

	#!/usr/bin/env python

	import csv
	from pymarc import MARCReader
	from os import listdir
	from re import search

	# change this line to match your folder structure
	SRC_DIR = '/path/to/mrc/records'
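
The imports (`listdir`, `re.search`) suggest that the script walks `SRC_DIR` and keeps only MARC files before handing each one to `MARCReader`. A hedged sketch of that selection step, assuming the files end in `.mrc` (case-insensitive); `marc_files` is a hypothetical helper and the pymarc part is left out.

```python
import re
from os import listdir
from os.path import isfile, join


def marc_files(src_dir, pattern=r"\.mrc$"):
    """Return the full paths of regular files in src_dir whose names match `pattern`, sorted."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(
        join(src_dir, name)
        for name in listdir(src_dir)
        if regex.search(name) and isfile(join(src_dir, name))
    )
```

Each path can then be opened in binary mode and iterated with pymarc, e.g. `with open(path, "rb") as fh: for record in MARCReader(fh): ...`.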

ettorerizza / urls_to_pdf.py

Last active April 15, 2024 16:50

Convert a list of URLs to PDF with headless Chrome (macOS)

	#!/usr/bin/env python3
	# -*- coding: utf-8 -*-

	import os
	import requests
	from bs4 import BeautifulSoup
	import glob
	from PyPDF2 import PdfFileMerger

	# TODO: debug this function
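
When the per-URL PDFs are collected again with `glob` before merging, the order of the results depends on the file system, and a plain lexical sort puts `10.pdf` before `2.pdf`. One way around this, sketched under the assumption that each PDF is named after its position in the URL list, is to zero-pad the index; `pdf_name` and the `page` prefix are hypothetical.

```python
def pdf_name(index, total, prefix="page"):
    """Zero-padded file name so that lexical order matches the URL order."""
    width = len(str(total))
    return f"{prefix}_{index:0{width}d}.pdf"


names = [pdf_name(i, 12) for i in (1, 2, 10, 12)]
print(names)  # ['page_01.pdf', 'page_02.pdf', 'page_10.pdf', 'page_12.pdf']
```

With this naming, `sorted(glob.glob("page_*.pdf"))` returns the files in URL order, ready to be appended to the merger one by one.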

ettorerizza / import_viaf.pl

Created May 2, 2019 21:26 — forked from phochste/import_viaf.pl

Match authors against VIAF using Catmandu and Linked Data Fragments

	#!/usr/bin/env perl
	#
	# Match authors against VIAF
	#
	# License: http://dev.perl.org/licenses/artistic.html
	#
	# Author: Patrick Hochstenbach
	#
	# Apr 2015
	$|++;