Luca Soldaini soldni
import re
import codecs
class WarcHeader(dict):
    def __init__(self):
        dict.__init__(self)
        self.__dict__ = self
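The `self.__dict__ = self` line above makes the instance's attribute namespace and its dictionary storage the same object, so a key can be read or written either way. A minimal sketch of that behavior follows; `AttrDict` and the field names are hypothetical stand-ins, not taken from the gist.

```python
class AttrDict(dict):
    """Hypothetical stand-in mirroring the WarcHeader constructor above."""

    def __init__(self):
        dict.__init__(self)
        # Attributes and keys now share one underlying mapping.
        self.__dict__ = self


header = AttrDict()
header["warc_type"] = "response"   # written as a key
header.content_length = 1024       # written as an attribute

print(header.warc_type)            # read back as an attribute
print(header["content_length"])    # read back as a key
```

One trade-off of this pattern is that keys which collide with `dict` method names (for example `items`) remain reachable only through subscript access.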
@soldni
soldni / index_wikipedia.sh
Created September 22, 2016 20:18
Index Wikipedia into Elasticsearch
# This script is largely based on this post from the Elastic blog:
# https://www.elastic.co/blog/loading-wikipedia
# variables
es="$ES_HOST:$ES_PORT"
site="en.wikipedia.org"
index="enwiki"
indexDate="20160919"
indexType="content"
import os
import json
from bs4 import BeautifulSoup
def get_extracted_concepts(doc, ctakes_doc_content):
    ctakes_doc = BeautifulSoup(ctakes_doc_content, 'xml')
    umls_concepts = []
    for cas_FSArray in ctakes_doc.find_all('uima.cas.FSArray'):
@soldni
soldni / cloudSettings
Last active September 23, 2020 06:03
Settings for Visual Studio Code
{"lastUpload":"2020-09-23T06:02:58.659Z","extensionVersion":"v3.4.3"}
from collections import namedtuple
def NamedTuple(key, data):
    data = [k[0] for k in data]
    return namedtuple(key, data)

class __List:
    def __getitem__(_, elem):
        return None
function prompt
{
    # How many characters of the $PWD should be kept
    local pwdmaxlen=30
    # Indicator that there has been directory truncation:
    local trunc_symbol="..."
    if [ ${#PWD} -gt $pwdmaxlen ]
    then
        local pwdoffset=$(( ${#PWD} - $pwdmaxlen ))
        newPWD="${trunc_symbol}${PWD:$pwdoffset:$pwdmaxlen}"
@soldni
soldni / _run_shell.py
Created July 10, 2017 14:06
Call any command line process from Python.
#!/usr/bin/env python
"""
Call any command line process from Python.
Useful as a wrapper in PyCharm or other IDEs that
do not support running scripts in languages
other than the one(s) supported by the IDE.
"""
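The preview ends at the docstring, so the gist's actual implementation is not shown here. As a rough sketch of the idea it describes, assuming the standard-library `subprocess` module is used, a wrapper could look like the following; `run_command` is a hypothetical name, not the gist's.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings.

    Returns (exit_code, combined stdout/stderr text).
    Hypothetical helper; not the gist's own function.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # decode bytes to str
    )
    return completed.returncode, completed.stdout


# Usage: launch the current Python interpreter as the child process,
# so the example does not depend on any external program.
code, output = run_command([sys.executable, "-c", "print('hello from a subprocess')"])
print(code, output, end="")
```

Passing the command as a list rather than a single string avoids shell interpretation of the arguments.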
@soldni
soldni / movie_reviews.py
Created December 22, 2017 03:36
Sentiment analysis of movie reviews (proof of concept with MXNet).
# coding: utf-8
# In[1]:
from thinc.extra import datasets
import mxnet as mx
import random
import re
import tqdm