@psorianom
psorianom / text.py
Created August 23, 2014 13:40
Text feature extractor with Okapi BM25 and delta IDF
# -*- coding: utf-8 -*-
# Authors: Olivier Grisel
#          Mathieu Blondel
#          Lars Buitinck
#          Robert Layton
#          Jochen Wersdörfer
#          Roman Sinayev
#
# License: BSD 3 clause
"""
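The preview of text.py stops here, so the gist's own weighting code is not visible. As a rough illustration of the Okapi BM25 part only, here is a minimal sketch of the textbook per-term weight. The function name `bm25_weight`, its parameters, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for this example, not taken from the gist.

```python
import math


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 weight of one term in one document (illustrative)."""
    # Robertson / Sparck Jones style IDF; it goes negative when df > n_docs / 2.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Length normalisation: documents longer than average are penalised.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term-frequency saturation: extra occurrences give diminishing returns.
    saturation = tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return idf * saturation


# A term seen once, in 2 of 10 documents, in a document of average length.
print(bm25_weight(tf=1, df=2, n_docs=10, doc_len=100, avg_doc_len=100))
```

With `tf=1` and an average-length document the saturation factor is exactly 1, so the weight reduces to the IDF term alone.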
psorianom / xml2txt.py
Created February 20, 2019 13:50
Extract text from CAPP XMLs
import xml.etree.ElementTree
import glob
texts = []
all_files = list(glob.glob('./extracted/*.xml'))
n_files = len(all_files)
with open("all_capp_new.txt", "w") as filo:
    for i, f in enumerate(all_files):
        print("Treating file {0} => {1}/{2}\n".format(f, i + 1, n_files))
        e = xml.etree.ElementTree.parse(f).getroot()
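The preview ends right after each file's root element is parsed, so what the gist does with it is not shown. One simple way to pull plain text out of a parsed tree is sketched below. It assumes that every text node under the root is wanted; the helper name `extract_text` and the tag names in the sample are hypothetical and do not come from the CAPP schema.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Return all text found under the root element, separated by single spaces."""
    root = ET.fromstring(xml_source)
    # itertext() walks the tree in document order and yields both the text
    # inside each element and the "tail" text that follows a child element.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


# Hypothetical sample document; real CAPP files will use different tags.
sample = "<DOC><TITLE>Arrêt</TITLE><BODY>Premier <B>moyen</B> rejeté.</BODY></DOC>"
print(extract_text(sample))
```

Joining with spaces is a simplification: it would split a word that is only partly wrapped in an inline tag, so a real extractor may need to treat inline and block elements differently.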
'''Generates a synthetic dataset (CSV) of persons to test the SNU_assignator
Usage:
    SNU_gen.py <o> [options]
Arguments:
    <o> An output path to store the synthetic data CSV
    -n PER Number of persons to generate [default: 2000:int]
    -f FIL Proportion represented by each filière. Ex: "0.1,0.1,...,0.1" (default: None)
    -r RES Proportion represented by each residence. Ex: "0.1,0.1,...,0.1" (default: None)
psorianom / tacos_de_carnitas.md
Created June 10, 2019 22:56
Tacos de carnitas

Tacos de Carnitas (pork)

Ingredients

  • 1 kg of pork, a combination of maciza + costilla (the matching cuts in France are still to be figured out ¯_(ツ)_/¯), with some fat, cut into small pieces.
  • 2 cups of orange juice
  • 2 teaspoons of salt
  • 1 small bunch of aromatic herbs (which ones?? :/)
  • 2 spoonfuls of pork lard (??)
psorianom / 1_MosesTokenizerSpans.py
Last active April 20, 2020 09:53
Moses tokenizer with spans. Built on sacremoses, the Python port of the Moses tokenizer.
"""
Class that inherits from MosesTokenizer and adds a method that returns the token spans. Kinda flaky around
escaping, unescaping, and detokenizing, so watch out!
"""
from sacremoses import MosesTokenizer, MosesDetokenizer
class MosesTokenizerSpans(MosesTokenizer):
    def __init__(self, lang="en", custom_nonbreaking_prefixes_file=None):
        MosesTokenizer.__init__(self, lang=lang,
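The preview is cut off before the span method appears, so the gist's actual approach is unknown. A common way to recover spans is to search for each token in the original string, moving a cursor forward as you go; the sketch below does that. The function name `token_spans` is hypothetical, and the token list is written by hand rather than produced by sacremoses. The approach assumes every token appears verbatim and in order in the text, which breaks down once tokens are escaped (for example `'` turned into `&apos;`), and that may be the flakiness the docstring warns about.

```python
from typing import List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for each token found in text."""
    spans = []
    cursor = 0
    for token in tokens:
        # Search only from the end of the previous token, so repeated
        # tokens map to successive occurrences rather than the first one.
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans


text = "Hello, world! It's fine."
# Hand-written tokens in a Moses-like style, for illustration only.
tokens = ["Hello", ",", "world", "!", "It", "'s", "fine", "."]
spans = token_spans(text, tokens)
print(spans)
```

Each span can be checked by slicing: `text[start:end]` gives back the token it was computed from.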
psorianom / 1_serialize_deserialize_dash_component.py
Last active April 23, 2020 17:00
Serialization/Deserialization of Dash components
from importlib import import_module
from pprint import pprint
from typing import List, Dict
from dash.development.base_component import Component
from dash_html_components import Div, Mark, P
from dash_interface.helper import serialize_components
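Only the imports are visible in this preview, and `serialize_components` lives in the author's own `dash_interface` package, so its behaviour is not shown. The `import_module` import suggests that deserialization looks classes up by module and class name. Here is a minimal sketch of that idea, assuming a nested dict with `namespace`, `type`, and `props` keys (to my understanding, the shape Dash components produce via `to_plotly_json()`). The function name `deserialize` is hypothetical, and standard-library classes stand in for Dash components so the example runs without Dash installed.

```python
from importlib import import_module
from typing import Any


def deserialize(spec: Any) -> Any:
    """Rebuild objects from nested {"namespace", "type", "props"} dicts."""
    if isinstance(spec, list):
        return [deserialize(item) for item in spec]
    if isinstance(spec, dict) and {"namespace", "type", "props"} <= spec.keys():
        # Look the class up by module name and class name.
        cls = getattr(import_module(spec["namespace"]), spec["type"])
        # Props may themselves hold serialized objects (e.g. children).
        props = {key: deserialize(value) for key, value in spec["props"].items()}
        return cls(**props)
    # Plain values (strings, numbers, ordinary dicts) pass through unchanged.
    return spec


# Stand-in classes from the standard library, used instead of Dash components.
spec = {
    "namespace": "types",
    "type": "SimpleNamespace",
    "props": {
        "children": [
            {"namespace": "fractions", "type": "Fraction",
             "props": {"numerator": 1, "denominator": 2}},
            "plain text",
        ]
    },
}
obj = deserialize(spec)
print(obj.children)
```

Because the module and class names come straight from the data, a scheme like this should only be fed trusted input, or be restricted to an allow-list of namespaces.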

Keybase proof

I hereby claim:

  • I am psorianom on github.
  • I am psoriano (https://keybase.io/psoriano) on keybase.
  • I have a public key ASBEtv4RYHXAyi-Dzj24fMUzLFjCWqwBS88Cg8Oxw0AY4Qo

To claim this, I am signing this object: