Javier Alba fjavieralba

@fjavieralba
fjavieralba / non_overlapping_tagging.py
Created March 23, 2012 10:51
[PYTHON] Non overlapping tagging of a sentence based on a dictionary of expressions
def non_overlapping_tagging(sentence, dict, max_key_size):
    """
    Result is only one tagging of all the possible ones.
    The resulting tagging is determined by these two priority rules:
        - longest matches have higher priority
        - search is made from left to right
    """
    tag_sentence = []
    N = len(sentence)
    if max_key_size == -1:
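The preview above is cut off, so the following is only a minimal sketch of the two priority rules stated in the docstring (longest match first, scanning left to right), not the gist's actual body. The function name `greedy_longest_match_tagging`, the use of `None` instead of `-1` as the "no limit" value, and the shape of the `expressions` dictionary (space-joined expression mapped to a list of tags) are all assumptions made for illustration.

```python
def greedy_longest_match_tagging(tokens, expressions, max_key_size=None):
    """Tag tokens left to right, preferring the longest dictionary match.

    tokens: list of word strings.
    expressions: hypothetical dict mapping a space-joined expression to its tags.
    max_key_size: longest expression length in tokens (None means no limit).
    """
    n = len(tokens)
    if max_key_size is None:
        max_key_size = n
    tagged = []
    i = 0
    while i < n:
        # Try the longest candidate span starting at i first, then shrink it.
        longest_end = min(n, i + max_key_size)
        for j in range(longest_end, i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in expressions:
                tagged.append((candidate, expressions[candidate]))
                i = j  # skip past the match so taggings never overlap
                break
        else:
            # No expression starts at this position: emit the token untagged.
            tagged.append((tokens[i], []))
            i += 1
    return tagged


expressions = {
    "new york": ["place"],
    "new york city": ["place"],
    "big": ["adjective"],
}
tokens = ["new", "york", "city", "is", "big"]
print(greedy_longest_match_tagging(tokens, expressions))
```

Because the scan commits to the first (longest) match at each position and then jumps past it, the result is a single tagging out of the possible ones, which is what the docstring describes.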
@fjavieralba
fjavieralba / NonOverlappingTagging.java
Created March 23, 2012 10:52
[JAVA] Non overlapping tagging of a sentence based on a dictionary of expressions
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
/*
Result is only one tagging of all the possible ones.
The resulting tagging is determined by these two priority rules:
- longest matches have higher priority
- search is made from left to right
*/
@fjavieralba
fjavieralba / preprocessing_text.py
Last active June 21, 2017 06:15
Simple wrapper classes for Splitting and POS-Tagging text using NLTK
text = """What can I say about this place. The staff of the restaurant is nice and the eggplant is not bad. Apart from that, very uninspired food, lack of atmosphere and too expensive. I am a staunch vegetarian and was sorely dissapointed with the veggie options on the menu. Will be the last time I visit, I recommend others to avoid."""
splitter = Splitter()
postagger = POSTagger()
splitted_sentences = splitter.split(text)
print(splitted_sentences)
[['What', 'can', 'I', 'say', 'about', 'this', 'place', '.'], ['The', 'staff', 'of', 'the', 'restaurant', 'is', 'nice', 'and', 'the', 'eggplant', 'is', 'not', 'bad', '.'], ['Apart', 'from', 'that', ',', 'very', 'uninspired', 'food', ',', 'lack', 'of', 'atmosphere', 'and', 'too', 'expensive', '.'], ['I', 'am', 'a', 'staunch', 'vegetarian', 'and', 'was', 'sorely', 'dissapointed', 'with', 'the', 'veggie', 'options', 'on', 'the', 'menu', '.'], ['Will', 'be', 'the', 'last', 'time', 'I', 'visit', ',', 'I', 'recommend', 'others', 'to', 'avoid', '.']]
@fjavieralba
fjavieralba / dictionary_tagger.py
Last active September 6, 2018 09:50
Python class for tagging text with dictionaries
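import yaml

class DictionaryTagger(object):
    def __init__(self, dictionary_paths):
        files = [open(path, 'r') for path in dictionary_paths]
        # safe_load avoids constructing arbitrary Python objects from the YAML files
        dictionaries = [yaml.safe_load(dict_file) for dict_file in files]
        # explicit loop: in Python 3 a bare map() is lazy and would never close the files
        for dict_file in files:
            dict_file.close()
        self.dictionary = {}
        self.max_key_size = 0
        for curr_dict in dictionaries:
            for key in curr_dict:
                if key in self.dictionary: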
@fjavieralba
fjavieralba / basic_sentiment_score.py
Created October 28, 2012 20:36
Basic measure of sentiment score of a tagged text
def value_of(sentiment):
    if sentiment == 'positive': return 1
    if sentiment == 'negative': return -1
    return 0

def sentiment_score(review):
    # score the review passed in, not a global variable
    return sum([value_of(tag) for sentence in review for token in sentence for tag in token[2]])
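As a rough illustration of how this score behaves, here is a small self-contained sketch. It assumes each token is a tuple whose third element (`token[2]`) is a list of tags, which is what the indexing in the snippet suggests; the tuple layout `(form, lemma, tags)` and the sample data are hypothetical.

```python
def value_of(sentiment):
    """Map a tag to +1, -1, or 0 (any non-sentiment tag counts as neutral)."""
    if sentiment == 'positive':
        return 1
    if sentiment == 'negative':
        return -1
    return 0


def sentiment_score(tagged_sentences):
    """Sum the value of every tag on every token of every sentence."""
    return sum(value_of(tag)
               for sentence in tagged_sentences
               for token in sentence
               for tag in token[2])


# Hypothetical tagged review: a list of sentences, each a list of
# (form, lemma, tags) tuples.
review = [
    [("staff", "staff", ["NN"]), ("nice", "nice", ["positive", "JJ"])],
    [("uninspired", "uninspired", ["negative", "JJ"]),
     ("too expensive", "too expensive", ["negative"])],
]
score = sentiment_score(review)
print(score)  # one positive tag and two negative tags give -1
```

Tags other than `'positive'` and `'negative'` (part-of-speech tags here) contribute nothing, so the score is simply the count of positive tags minus the count of negative ones.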
@fjavieralba
fjavieralba / gist:4633857
Created January 25, 2013 11:50
A simple but maybe useful distance definition for texts
from difflib import SequenceMatcher

def distance(url1, url2):
    ratio = SequenceMatcher(None, url1, url2).ratio()
    return 1.0 - ratio
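A short usage sketch of this distance, with made-up example strings: `SequenceMatcher.ratio()` returns a similarity between 0 and 1, so subtracting it from 1 gives 0.0 for identical strings and 1.0 for strings that share no characters.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Return 1 minus the SequenceMatcher similarity ratio of a and b."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


# Example inputs are illustrative only.
same = distance("http://example.com/news/1", "http://example.com/news/1")
close = distance("http://example.com/news/1", "http://example.com/news/2")
different = distance("abc", "xyz")

print(same, close, different)
```

Note that this is a similarity-derived score rather than a formal metric; it is convenient for ranking or thresholding near-duplicate texts, but it is not guaranteed to satisfy the triangle inequality.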
@fjavieralba
fjavieralba / KafkaLocal.java
Last active March 23, 2021 09:57
Embedding Kafka+Zookeeper for testing purposes. Tested with Apache Kafka 0.8
import java.io.IOException;
import java.util.Properties;
import kafka.server.KafkaConfig;
import kafka.server.KafkaServerStartable;
public class KafkaLocal {

    public KafkaServerStartable kafka;
    public ZooKeeperLocal zookeeper;
@fjavieralba
fjavieralba / 0_reuse_code.js
Created June 9, 2014 08:49
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@fjavieralba
fjavieralba / python_resources.md
Created June 9, 2014 08:50 — forked from jookyboi/python_resources.md
Python-related modules and guides.

Packages

  • lxml - Pythonic binding for the C libraries libxml2 and libxslt.
  • boto - Python interface to Amazon Web Services
  • Django - Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.
  • Fabric - Library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
  • PyMongo - Tools for working with MongoDB; the recommended way to work with MongoDB from Python.
  • Celery - Task queue to distribute work across threads or machines.
  • pytz - pytz brings the Olson tz database into Python. This library allows accurate and cross platform timezone calculations using Python 2.4 or higher.

Guides

import logging
import logging.handlers
import sys
if len(sys.argv) < 2:
    print("ERROR: usage: syslog_generator.py <NAME>")
    sys.exit(1)
my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)