Marcos Vanetta malev
@malev
malev / text_extractor.md
Created September 2, 2014 21:37
Text Extraction

TextExtractor

Requirements

  • Works with doc, odt and pdf
  • Works through an API
  • Can handle multiple files at the same time
  • Uses queues (maybe distributed)
  • It's doable
  • Works fast!
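
One way the requirements above could fit together is sketched below: a work queue feeds a small pool of threads, and each file is dispatched to a handler by its extension. This is only an illustration under stated assumptions, not the gist author's design. The names `extract_all`, `EXTRACTORS`, and `_read_raw` are hypothetical, and the handlers are placeholders that read raw bytes; a real service would call a format-specific parser for doc, odt, and pdf.

```python
import queue
import threading
from pathlib import Path


def _read_raw(path):
    # Hypothetical placeholder: a real extractor would parse the format
    # with a dedicated library instead of decoding the raw bytes.
    return Path(path).read_bytes().decode("utf-8", errors="replace")


# Hypothetical registry mapping file extensions to extractor callables.
EXTRACTORS = {".doc": _read_raw, ".odt": _read_raw, ".pdf": _read_raw}


def extract_all(paths, extractors=EXTRACTORS, workers=4):
    """Extract text from many files concurrently using a work queue.

    Returns a dict mapping each path to ("ok", text) or ("error", message).
    """
    jobs = queue.Queue()
    for path in paths:
        jobs.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker drains the queue until no jobs are left.
        while True:
            try:
                path = jobs.get_nowait()
            except queue.Empty:
                return
            handler = extractors.get(Path(path).suffix.lower())
            try:
                if handler is None:
                    raise ValueError("unsupported format")
                outcome = ("ok", handler(path))
            except Exception as exc:
                outcome = ("error", str(exc))
            with lock:
                results[path] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The in-process `queue.Queue` stands in for the "maybe distributed" queue in the list; swapping it for an external broker would leave the dispatch-by-extension logic unchanged. The API requirement is not covered here.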
@malev
malev / bcycle_spots.js
Created August 30, 2014 17:14
Bcycle - Austin - Locations
var icon = '/Controls/StationLocationsMap/Images/marker-active.png';
var back = 'infowin-available';
var point = new google.maps.LatLng(30.26408, -97.74355);
kioskpoints.push(point);
var marker = new createMarker(point, "<div class='location'><strong>2nd & Congress</strong><br />151 E. 2nd St.<br />Austin<br />TX 78701</div><div class='avail'>Bikes available: <strong>8</strong><br />Docks available: <strong>5</strong></div><div></div>", icon, back, false);
markers.push(marker);
var icon = '/Controls/StationLocationsMap/Images/marker-active.png';
var back = 'infowin-available';
var point = new google.maps.LatLng(30.26634, -97.74378);
kioskpoints.push(point);
@malev
malev / gender.py
Created July 28, 2014 04:20
Gender detector API version
import json
from gender_detector import GenderDetector
from bottle import route, run, response
detector_ar = GenderDetector('ar')
detector_uk = GenderDetector('uk')
detector_us = GenderDetector('us')
@route('/<country>/<name>')
def index(country, name):
@malev
malev / scraper.rb
Created July 18, 2014 14:11
Turbot scraper example
require 'open-uri'
require 'json'
require 'mechanize'
require 'pdf-reader'
require 'turbotlib'
SOURCE_URL = "http://www.cityofchicago.org/city/en/depts/doit/supp_info/list_of_contractors.html"
@malev
malev / srap_ba.rb
Created May 13, 2014 16:45
Scraper for list of names in BA
# encoding: UTF-8
require 'open-uri'
require 'nokogiri'
require 'csv'
def gen_url(offset=0)
if offset == 0
"http://www.buenosaires.gob.ar/areas/registrocivil/nombres/busqueda/buscador_nombres.php?&menu_id=16082"
else
@malev
malev / convert.rb
Created May 13, 2014 16:33
Clean Uruguay names & gender dataset
# encoding: UTF-8
require 'csv'
filename = 'nombre_nacim_por_anio_y_sexo.csv'
class Name
attr_reader :name, :gender, :male_count, :female_count, :year
def self.valid?(name)
@malev
malev / gender_detection.py
Created May 5, 2014 16:37
Gender detection testing app (genderPredictor)
import csv
from genderPredictor import genderPredictor
gp = genderPredictor()
gp.trainAndTest()
def gender(name):
output = 'unknown'
tmp = gp.classify(name)
@malev
malev / gender_detection.rb
Created May 5, 2014 16:36
Gender detection testing app
require 'csv'
require 'net/http'
require 'json'
require 'beauvoir'
require 'sexmachine'
names_with_gender = []
CSV.foreach('input.csv') do |row|
names_with_gender << row
@malev
malev / statify.py
Created April 22, 2014 19:19
Filter Medicare dataset by state
#!/usr/bin/env python
import os
import csv
import glob
import optparse
class CSVHandler:
def __init__(self, filename):
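
The preview above is cut off, so the gist's actual filtering logic is not visible. As a rough illustration of what "filter a dataset by state" can look like with the standard `csv` module, here is a minimal sketch. The function `filter_by_state` and the default column name `state` are assumptions made for this example; they are not taken from the gist or from the Medicare dataset's real schema.

```python
import csv


def filter_by_state(src, dst, state, column="state"):
    """Copy rows whose `column` equals `state` from src to dst.

    `column` is a hypothetical header name; adjust it to the dataset.
    Returns the number of rows written (excluding the header).
    """
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] == state:
                writer.writerow(row)
                kept += 1
    return kept
```

Rows are streamed one at a time, so the input file never has to fit in memory, which matters for large public datasets.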
@malev
malev / answer.py
Last active August 29, 2015 14:00
Coreferencer
class Answer(object):
"""docstring for answer
>>> answer = Answer(1, [1,3])
>>> answer.included()
True
>>> answer.excluded()
False
>>> answer.includes()
[1, 3]
>>> answer = Answer(2, [1, 2, 3])