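```python
import re
def changeSeparator(value):
    # Group 1: a field, either a double-quoted string ("" escapes a quote) or a run of
    # characters without comma, quote or line break. Group 2: the separator after it.
    regex = re.compile(r'("(?:[^"]|"")*"|[^,"\n\r]*)(,|\r?\n|\r)')
    # Keep each field and replace its separator (comma or line break) with "|||".
    return regex.sub(r"\1|||", value)
```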
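The regex version replaces line breaks with `|||` as well as commas, so every row ends up on a single line. It also keeps the quotes around quoted fields. If you want one output line per row, the standard `csv` module can do the parsing instead of a regex. The sketch below is a swapped-in technique, not the author's method, and `change_separator` and `new_sep` are hypothetical names. Note that `csv.reader` strips the enclosing quotes and turns `""` back into `"`.

```python
import csv
import io

def change_separator(value, new_sep="|||"):
    """Re-join each CSV row with new_sep, one output line per row.

    csv.reader handles quoted fields (commas, doubled quotes and line breaks
    inside quotes) and removes the enclosing quotes.
    """
    # newline="" leaves line endings untranslated, as the csv docs recommend.
    reader = csv.reader(io.StringIO(value, newline=""))
    return "\n".join(new_sep.join(row) for row in reader)


print(change_separator('a,"b,c",d\n1,"say ""hi""",3\n'))
# prints:
# a|||b,c|||d
# 1|||say "hi"|||3
```

Whichever approach you use, a field that itself contains `|||` makes the output ambiguous. A separator that cannot occur in the data avoids that problem.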
```python
#!/usr/bin/env python
# encoding: utf-8
import tweepy
import csv
# Twitter API credentials
consumer_key = ""
consumer_secret = ""
```
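```python
import subprocess
import json
# Run wdtaxonomy on Q634 with JSON output; an argument list avoids shell=True,
# and check=True raises CalledProcessError if the command fails.
result = subprocess.run(["wdtaxonomy", "Q634", "-f", "json"], stdout=subprocess.PIPE, check=True)
json_text = result.stdout.decode('utf8')
print(json.loads(json_text))
```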
```python
import pandas as pd
# df is assumed to be an existing DataFrame.

# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()

# Convert Series datatype to numeric (raises a ValueError if the column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])

# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
```
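Here is a small self-contained run of these recipes on a made-up column; the data is purely illustrative. In `pd.to_numeric`, the `errors="coerce"` option turns values that cannot be parsed into NaN. The default, `errors="raise"`, raises a `ValueError` instead.

```python
import pandas as pd

# Made-up data: three numeric strings and one non-numeric value.
df = pd.DataFrame({"Column Name": ["1", "2", "2", "abc"]})

# Distinct values, in order of first appearance.
print(df["Column Name"].unique())

# Default errors="raise": the non-numeric "abc" makes the conversion fail.
try:
    pd.to_numeric(df["Column Name"])
except ValueError as exc:
    print("conversion failed:", exc)

# errors="coerce": values that cannot be parsed become NaN instead.
numeric = pd.to_numeric(df["Column Name"], errors="coerce")
print(numeric)
```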
```shell
# Print each PDF's name followed by its page count
for i in *.pdf; do echo "$i" && pdfinfo "$i" | grep "^Pages:"; done
# To count the total number of pages
for i in *.pdf; do pdfinfo "$i" | grep "^Pages:"; done | awk '{s+=$2} END {print s}'
```
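You can compute the same total in Python, which may be handier inside a larger script. This is a minimal sketch. It assumes `pdfinfo` (from Poppler) is on the PATH and prints a `Pages:` line, as the shell loop expects. `page_count` and `total_pages` are hypothetical helper names.

```python
import re
import subprocess
from pathlib import Path

# Same match as grep "^Pages:"; the number plays the role of awk's $2.
PAGES_RE = re.compile(r"^Pages:\s+(\d+)", re.MULTILINE)

def page_count(pdfinfo_output):
    """Extract the page count from pdfinfo's text output (0 if there is no Pages: line)."""
    match = PAGES_RE.search(pdfinfo_output)
    return int(match.group(1)) if match else 0

def total_pages(directory="."):
    """Run pdfinfo on every *.pdf in directory, print each count and return the sum."""
    total = 0
    for pdf in sorted(Path(directory).glob("*.pdf")):
        result = subprocess.run(["pdfinfo", str(pdf)], capture_output=True, text=True)
        pages = page_count(result.stdout)
        print(pdf.name, pages)
        total += pages
    return total
```

Calling `total_pages()` from a directory of PDFs prints each file with its page count, then returns the overall total.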
```python
from pywikibot.data import api
import pywikibot
import pprint
def search_entities(site, itemtitle):
    params = {'action': 'wbsearchentities',
              'format': 'json',
              'language': 'en',
              'type': 'item',
              'search': itemtitle}
```
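For comparison, you can send the same `wbsearchentities` parameters to the public Wikidata API without pywikibot. This is a minimal standard-library sketch, not the pywikibot approach. It assumes the endpoint `https://www.wikidata.org/w/api.php` and a JSON response that lists matches under a `search` key with `id` and `label` fields. The helper names and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"  # assumed public Wikidata endpoint

def build_search_url(itemtitle, language="en"):
    """Build a wbsearchentities query URL with the same parameters as the params dict."""
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "language": language,
        "type": "item",
        "search": itemtitle,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

def search_entities_http(itemtitle):
    """Return (id, label) pairs for matching items; performs a network request."""
    request = urllib.request.Request(
        build_search_url(itemtitle),
        headers={"User-Agent": "search-entities-sketch/0.1"},  # hypothetical placeholder UA
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Assumption: matches are listed under "search", each with "id" and usually "label".
    return [(hit["id"], hit.get("label", "")) for hit in data.get("search", [])]
```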
```python
from google import google
def search_google(query, num_page):
    """
    Search Google using the scraper https://github.com/abenassi/Google-Search-API
    Return the titles of the links and their descriptions.
    Other elements can be:
        name  # The title of the link
        link  # The external link
```
```python
from rosette.api import API, DocumentParameters, RosetteException
def rosette(text):
    """ Run the example """
    # Create an API instance
    api = API(user_key="YOUR_KEY",
              service_url="https://api.rosette.com/rest/v1/")
    params = DocumentParameters()
    params["content"] = text
    params["genre"] = "social-media"
```
```python
# -*- coding: utf-8 -*-
"""
Test of the Stanford NER tagger with the Europeana CRF models
trained on newspapers:
http://lab.kbresearch.nl/static/html/eunews.html
The function is slow --> consider multiprocessing
"""
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize
```
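The docstring notes that tagging is slow and suggests multiprocessing. NLTK's Stanford taggers run the tagger in a separate Java process, so a thread pool may already keep several of those processes busy. A thread pool also avoids pickling the tagger object. The sketch below therefore uses `concurrent.futures.ThreadPoolExecutor`, a swapped-in choice rather than the author's method. `tag_document` is a hypothetical stand-in for the real `word_tokenize` + `StanfordNERTagger.tag` call, so the sketch runs without Java or the Europeana models.

```python
from concurrent.futures import ThreadPoolExecutor

def tag_document(text):
    """Hypothetical stand-in for the slow NER call on one document.

    In the real script this would be something like tagger.tag(word_tokenize(text));
    here capitalized tokens are simply labelled "ENT" so the sketch runs on its own.
    """
    return [(token, "ENT" if token[:1].isupper() else "O") for token in text.split()]

def tag_corpus(texts, workers=4):
    """Tag every document concurrently; executor.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(tag_document, texts))

docs = ["Rembrandt lived in Amsterdam", "the paper was printed daily"]
for tags in tag_corpus(docs, workers=2):
    print(tags)
```

If most of the per-call cost is starting Java, batching may help as much as parallelism. The NLTK tagger's `tag_sents` method tags a list of tokenized sentences in one call.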
```r
library(jqr)
data <- readr::read_file("tweets.json")
data %>% keys()
data %>% jq("{id: .id, hashtag: .entities.hashtags[].text}",
            "[.id, .hashtag]") %>% jsonlite::toJSON()
stri <- "--h"
```
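The jq filter `{id: .id, hashtag: .entities.hashtags[].text}` emits one object per hashtag, and the second filter turns each object into an `[id, hashtag]` pair. The following is a rough Python equivalent. It assumes `tweets.json` holds one tweet object per line, and `id_hashtag_pairs` is a hypothetical name. The sample tweets are made up for illustration. One difference from jq: jq stops with an error on a tweet whose `entities` is missing, whereas this version simply skips such tweets.

```python
import json

def id_hashtag_pairs(lines):
    """Yield [tweet id, hashtag text] for every hashtag of every tweet.

    `lines` is an iterable of JSON strings, one tweet per line; blank lines
    are ignored and tweets without entities.hashtags produce nothing.
    """
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        hashtags = (tweet.get("entities") or {}).get("hashtags") or []
        for hashtag in hashtags:
            yield [tweet["id"], hashtag["text"]]

# Made-up sample tweets.
sample = [
    '{"id": 1, "entities": {"hashtags": [{"text": "python"}, {"text": "jq"}]}}',
    '{"id": 2, "entities": {"hashtags": []}}',
]
print(json.dumps(list(id_hashtag_pairs(sample))))  # [[1, "python"], [1, "jq"]]
```

With a real file, you can pass `open("tweets.json", encoding="utf-8")` directly, since iterating over a file yields its lines.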