Ettore Rizza ettorerizza

@ettorerizza
ettorerizza / commas_to_other.py
Created May 6, 2018 10:41
Function to change the separator in a csv row
import re
def changeSeparator(value):
    regex = re.compile(r'("(?:[^"]|"")*"|[^,"\n\r]*)(,|\r?\n|\r)')
    return re.sub(regex, r"\1|||", value)
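A quick usage sketch of the function above, on a made-up row (the sample string is an assumption for illustration). Commas inside double-quoted fields are left alone, while field separators become `|||`. Note that the pattern also matches line breaks, so a row terminator would be replaced as well.

```python
import re

def changeSeparator(value):
    # Same pattern as the gist: a quoted field (with "" as an escaped quote)
    # or an unquoted field, followed by a comma or a line break.
    regex = re.compile(r'("(?:[^"]|"")*"|[^,"\n\r]*)(,|\r?\n|\r)')
    return re.sub(regex, r"\1|||", value)

row = 'a,"b,c",d'  # hypothetical sample row
converted = changeSeparator(row)
print(converted)
```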
@ettorerizza
ettorerizza / get_all_tweets.py
Created March 2, 2018 11:54
Get all tweets (max 3200) of a Twitter account
#!/usr/bin/env python
# encoding: utf-8
import tweepy
import csv
# Twitter API credentials
consumer_key = ""
consumer_secret = ""
@ettorerizza
ettorerizza / wdtaxonomy.py
Created February 28, 2018 17:28
Use a command line tool from Python, example
import subprocess
import json
json_file = subprocess.run("wdtaxonomy Q634 -f json", shell=True, stdout=subprocess.PIPE).stdout.decode('utf8')
print(json.loads(json_file))
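A possible variant of the same pattern without `shell=True`, passing the command as a list of arguments and raising an error if the command fails. This is only a sketch: the helper name `run_json` is hypothetical, and because `wdtaxonomy` may not be installed, the example runs the Python interpreter itself as a stand-in command that prints JSON.

```python
import json
import subprocess
import sys

def run_json(command):
    """Run a command given as a list of arguments and parse its stdout as JSON."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Stand-in command (assumption): the current Python interpreter printing JSON.
# With the real tool installed, the list would be ["wdtaxonomy", "Q634", "-f", "json"].
stand_in = [sys.executable, "-c", "import json; print(json.dumps({'id': 'Q634'}))"]
data = run_json(stand_in)
print(data)
```

Passing a list avoids shell quoting issues when an argument comes from user input.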
@ettorerizza
ettorerizza / pandas_snippets.py
Created February 10, 2018 14:11
Useful pandas snippets copied from ?
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
@ettorerizza
ettorerizza / count_pdf_pages.sh
Created February 10, 2018 14:10
Count the pages of each PDF in a folder
for i in *.pdf; do echo "$i" && pdfinfo "$i" | grep "^Pages:"; done
# To count the total number of pages
for i in *.pdf; do pdfinfo "$i" | grep "^Pages:"; done | awk '{s+=$2} END {print s}'
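The `awk` step above sums the second field of every `Pages:` line. A rough Python equivalent of that summing step could look like the sketch below. The function name `total_pages` and the sample strings are assumptions for illustration; real input would be the text printed by `pdfinfo` for each file.

```python
import re

def total_pages(pdfinfo_outputs):
    """Sum the values of the 'Pages:' lines found in pdfinfo-style outputs."""
    total = 0
    for output in pdfinfo_outputs:
        match = re.search(r"^Pages:\s+(\d+)", output, flags=re.MULTILINE)
        if match:
            total += int(match.group(1))
    return total

# Made-up sample outputs (assumption), shaped like what pdfinfo prints.
samples = ["Title: A\nPages:          12\n", "Pages:          3\nEncrypted: no\n"]
print(total_pages(samples))
```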
@ettorerizza
ettorerizza / pywikidatabot.py
Last active February 10, 2018 12:40
search Wikidata in Python with pywikibot
from pywikibot.data import api
import pywikibot
import pprint
def search_entities(site, itemtitle):
    params = {'action': 'wbsearchentities',
              'format': 'json',
              'language': 'en',
              'type': 'item',
              'search': itemtitle}
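The snippet above is cut off after the parameter dictionary. As a minimal sketch that does not depend on pywikibot, the same parameters can be encoded into a request URL, assuming the public Wikidata API endpoint. The function name `build_search_url` and the search term are hypothetical, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint of Wikidata (assumed target of the search).
WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(itemtitle, language="en"):
    """Build a wbsearchentities URL with the same parameters as the gist."""
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "language": language,
        "type": "item",
        "search": itemtitle,
    }
    return WIKIDATA_API + "?" + urlencode(params)

url = build_search_url("New York")  # hypothetical search term
print(url)
```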
@ettorerizza
ettorerizza / search_google.py
Created February 9, 2018 20:23
Search Google in Python
from google import google
def search_google(query, num_page):
    """
    Search Google using the scraper https://github.com/abenassi/Google-Search-API
    Return the titles of the links and their descriptions.
    Other elements can be:
    name # The title of the link
    link # The external link
@ettorerizza
ettorerizza / rosette.py
Created February 4, 2018 12:14
rosette api test
from rosette.api import API, DocumentParameters, RosetteException
def rosette(text):
    """ Run the example """
    # Create an API instance
    api = API(user_key="YOUR_KEY",
              service_url="https://api.rosette.com/rest/v1/")
    params = DocumentParameters()
    params["content"] = text
    params["genre"] = "social-media"
@ettorerizza
ettorerizza / stanford_ner_europeana
Created February 4, 2018 12:08
Test of the Stanford NER tagger with Europeana's CRF models trained on newspapers: http://lab.kbresearch.nl/static/html/eunews.html
# -*- coding: utf-8 -*-
"""
Test of the Stanford NER tagger with Europeana's CRF models
trained on newspapers:
http://lab.kbresearch.nl/static/html/eunews.html
The function is slow; consider multiprocessing
"""
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize
@ettorerizza
ettorerizza / jq.R
Created December 28, 2017 20:43
How to use jq with R
library(jqr)
data <- readr::read_file("tweets.json")
data %>% keys()
data %>% jq("{id: .id, hashtag: .entities.hashtags[].text}",
"[.id, .hashtag]") %>% jsonlite::toJSON()
stri <- "--h"