Ettore Rizza ettorerizza

Researcher & PhD student in Information Sciences & Technologies. OpenRefine supporter.

ettorerizza / commas_to_other.py

Created May 6, 2018 10:41

Function to change the separator in a CSV row
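
The gist body is not shown here, so the following is only a minimal sketch of what such a function might look like, assuming the row arrives as one line of text. The name `change_separator` and its parameters are hypothetical, not taken from the gist.

```python
import csv
import io


def change_separator(row, old=",", new=";"):
    """Re-encode one CSV row with a different separator (hypothetical helper).

    The csv module handles quoting, so a field that contains the
    old or the new separator stays a single field.
    """
    fields = next(csv.reader([row], delimiter=old))
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=new).writerow(fields)
    # The writer appends a line terminator; drop it to return a bare row.
    return buffer.getvalue().rstrip("\r\n")


print(change_separator('a,"b,c",d'))
```

Going through `csv` rather than `str.replace` keeps quoted fields intact, at the cost of re-quoting any field that happens to contain the new separator.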

ettorerizza / get_all_tweets.py

Created March 2, 2018 11:54

Get all tweets (max 3200) of a Twitter account

ettorerizza / wdtaxonomy.py

Created February 28, 2018 17:28

Example of calling a command-line tool from Python

	import subprocess
	import json

	json_file = subprocess.run("wdtaxonomy Q634 -f json", shell=True, stdout=subprocess.PIPE).stdout.decode('utf8')

	print(json.loads(json_file))
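
The same pattern can also be written without `shell=True` by passing the command as a list of arguments, which avoids shell quoting issues. This is a sketch, not the author's code: the helper name `run_json` is hypothetical, and the demo uses a Python one-liner as a stand-in so that it runs without `wdtaxonomy` installed.

```python
import json
import subprocess
import sys


def run_json(command):
    """Run a command given as a list of arguments and parse its stdout as JSON."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout.decode("utf8"))


# Stand-in command: a Python one-liner that prints a small JSON object.
demo = run_json([sys.executable, "-c", "print('{\"id\": \"Q634\"}')"])
print(demo)

# Assuming wdtaxonomy is installed, the call from the gist would become:
# run_json(["wdtaxonomy", "Q634", "-f", "json"])
```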

ettorerizza / pandas_snippets.py

Created February 10, 2018 14:11

Useful pandas snippets copied from ?

	# List unique values in a DataFrame column
	# h/t @makmanalp for the updated syntax!
	df['Column Name'].unique()

	# Convert Series datatype to numeric (will error if column has non-numeric values)
	# h/t @makmanalp
	pd.to_numeric(df['Column Name'])

	# Convert Series datatype to numeric, changing non-numeric values to NaN
	# h/t @makmanalp for the updated syntax!
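
The preview stops after the comment. One way to get the behaviour it describes is the `errors='coerce'` option of `pd.to_numeric`; the small DataFrame below is invented for illustration and is not part of the gist.

```python
import pandas as pd

# Illustrative data: two numeric strings and one value that cannot be parsed.
df = pd.DataFrame({"Column Name": ["1", "2.5", "abc"]})

# errors="coerce" turns unparseable values into NaN instead of raising.
numeric = pd.to_numeric(df["Column Name"], errors="coerce")
print(numeric)
```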

ettorerizza / count_pdf_pages.sh

Created February 10, 2018 14:10

Count the pages of each PDF in a folder

	for i in *.pdf; do echo "$i" && pdfinfo "$i" | grep "^Pages:"; done

	# To count the total number of pages

	for i in *.pdf; do pdfinfo "$i" | grep "^Pages:"; done | awk '{s+=$2} END {print s}'
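
For use from Python, the same idea can be sketched as below, assuming `pdfinfo` is on the PATH. The function names `parse_pages` and `total_pages` are hypothetical, and the sample text only imitates the layout of `pdfinfo` output.

```python
import re
import subprocess
from pathlib import Path


def parse_pages(pdfinfo_output):
    """Return the page count from pdfinfo's text output, or 0 if absent."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


def total_pages(folder="."):
    """Sum the page counts of every *.pdf file in a folder (requires pdfinfo)."""
    total = 0
    for pdf in sorted(Path(folder).glob("*.pdf")):
        output = subprocess.run(
            ["pdfinfo", str(pdf)], stdout=subprocess.PIPE, check=True
        ).stdout.decode("utf8", errors="replace")
        total += parse_pages(output)
    return total


# Illustrative pdfinfo output, not taken from a real file.
sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
print(parse_pages(sample))
```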

ettorerizza / pywikidatabot.py

Last active February 10, 2018 12:40

Search Wikidata in Python with pywikibot

	from pywikibot.data import api
	import pywikibot
	import pprint

	def search_entities(site, itemtitle):
	    params = {'action': 'wbsearchentities',
	              'format': 'json',
	              'language': 'en',
	              'type': 'item',
	              'search': itemtitle}
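
The preview ends before the request is sent. As a standard-library illustration of what these parameters describe, and not the pywikibot code from the gist, the sketch below builds the equivalent `wbsearchentities` URL for the public Wikidata API. The function name `build_search_url` is hypothetical and no network call is made.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(itemtitle, language="en"):
    """Build a wbsearchentities request URL using the gist's parameters."""
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "language": language,
        "type": "item",
        "search": itemtitle,
    }
    return WIKIDATA_API + "?" + urlencode(params)


print(build_search_url("Douglas Adams"))
```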

ettorerizza / search_google.py

Created February 9, 2018 20:23

Search Google in Python

	from google import google


	def search_google(query, num_page):
	    """
	    Search Google using the scraper https://github.com/abenassi/Google-Search-API
	    Return the titles of the links and their descriptions.
	    Other elements can be:
	        name # The title of the link
	        link # The external link

ettorerizza / rosette.py

Created February 4, 2018 12:14

Rosette API test

	from rosette.api import API, DocumentParameters, RosetteException

	def rosette(text):
	    """Run the example"""
	    # Create an API instance
	    api = API(user_key="YOUR_KEY",
	              service_url="https://api.rosette.com/rest/v1/")
	    params = DocumentParameters()
	    params["content"] = text
	    params["genre"] = "social-media"

ettorerizza / stanford_ner_europeana

Created February 4, 2018 12:08

Test of the Stanford NER tagger with Europeana's CRF models trained on newspapers: http://lab.kbresearch.nl/static/html/eunews.html

	# -*- coding: utf-8 -*-
	"""
	Test of the Stanford NER tagger with Europeana's CRF models
	trained on newspapers:
	http://lab.kbresearch.nl/static/html/eunews.html
	The function is slow --> consider multiprocessing
	"""

	from nltk.tag import StanfordNERTagger
	from nltk.tokenize import word_tokenize

ettorerizza / jq.R

Created December 28, 2017 20:43

How to use jq with R

	library(jqr)

	data <- readr::read_file("tweets.json")

	data %>% keys()

	data %>% jq("{id: .id, hashtag: .entities.hashtags[].text}",
	            "[.id, .hashtag]") %>% jsonlite::toJSON()

	stri <- "--h"