André Santos andrefs

@andrefs
andrefs / pulitzr_columns_example.json
Created July 24, 2015 17:06
pulitzr_columns_example
[
  {
    "title": "Economico",
    "sortTitle": "Data",
    "query": "filter=Source.Name eq \"Economico\"&sort=_InternalRefs.RegisterDate&limit=20",
    "destination": "main",
    "id": 1430395633631,
    "idx": 0,
    "domElement": {
      "0": {
        "jQuery111005286399181932211": 226,
        "sizzle-1434637889294": {"parentNode": [2289, 18, false]}
      },
      "length": 1,
      "prevObject": {"0": {}, "context": {}, "length": 1},
      "context": {}
    }
  },
  {
    "title": "Expresso",
    "sortTitle": "Data",
    "query": "filter=Source.Name eq \"Expresso\"&sort=_InternalRefs.RegisterDate&limit=20",
    "destination": "main",
    "id": 1430401119919,
    "idx": 1,
    "domElement": {
      "0": {
        "jQuery111005593397612683475": 225,
        "sizzle-1433410869016": {"parentNode": [3230, 18, false]}
      },
      "length": 1,
      "prevObject": {"0": {}, "context": {}, "length": 1},
      "context": {}
    }
  },
  {"title":"Renasce
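Each column object stores its feed request in `query` as a URL-style string with `filter`, `sort` and `limit` parameters. A minimal Python sketch of splitting that string into parameters, assuming plain `&`/`=` separators as in the sample; `parse_column_query` is a hypothetical helper, not code from the gist:

```python
from urllib.parse import parse_qsl


def parse_column_query(query: str) -> dict:
    """Split a column's query string into its parameters.

    Hypothetical helper: assumes '&'-separated 'key=value' pairs,
    as in the sample columns.
    """
    params = dict(parse_qsl(query, keep_blank_values=True))
    # 'limit' is numeric in the sample, so convert it when present
    if "limit" in params:
        params["limit"] = int(params["limit"])
    return params


example = 'filter=Source.Name eq "Economico"&sort=_InternalRefs.RegisterDate&limit=20'
print(parse_column_query(example))
# {'filter': 'Source.Name eq "Economico"', 'sort': '_InternalRefs.RegisterDate', 'limit': 20}
```

Note that `parse_qsl` decodes `+` as a space, so a filter containing a literal `+` would need to be percent-encoded first.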
andrefs / upload_cover.sh
Last active August 29, 2015 14:26
upload_cover.sh
#!/bin/bash
DIR=$(dirname "$0")
pubID="$1"
if [ -z "$pubID" ]; then
pubID='4090'
fi
FILE="$2"
if [ -z "$FILE" ]; then
FILE='4.png'
andrefs / newsletter.json
Created September 14, 2015 13:28
Newsletter structure
{
_id: ObjectId("55a53ba2325fdeb559ae14aa"),
Date: "2015-09-14 15:45:26",
Highlights: [
{
Title: "William Carvalho lesionado, Sporting acusa federação",
Description: "O jogador do Sporting William Carvalho tem uma fratura de stress na Tíbia e vai parar de 10 a 12 semanas. O médio, que esteve ao serviço da seleção nacional no Europeu de Sub-21, vai assim falhar a Supertaça, o play-off da Liga dos Campeões e o arranque de campeonato. O Sporting está indignado com o corpo clínico da Federação Portuguesa de Futebol.",
URL: "http://sicnoticias.sapo.pt/desporto/2015-07-14-William-Carvalho-lesionado-Sporting-acusa-federacao",
ProducerName: "SIC Notícias",
Categories: [
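The record above is a MongoDB document (note `ObjectId` and the unquoted keys), not strict JSON. A minimal Python sketch of mapping such a record, once fetched as a dict, onto typed objects; it assumes `Categories` holds plain strings (the preview is cut off before its contents), and the class and function names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Highlight:
    title: str
    description: str
    url: str
    producer_name: str
    # Assumption: categories are plain strings (the preview is truncated)
    categories: List[str] = field(default_factory=list)


@dataclass
class Newsletter:
    date: datetime
    highlights: List[Highlight]


def load_newsletter(doc: dict) -> Newsletter:
    """Map a raw newsletter record (as a dict) onto typed objects."""
    # 'Date' is stored as "YYYY-MM-DD HH:MM:SS" text in the sample
    date = datetime.strptime(doc["Date"], "%Y-%m-%d %H:%M:%S")
    highlights = [
        Highlight(
            title=h["Title"],
            description=h["Description"],
            url=h["URL"],
            producer_name=h["ProducerName"],
            categories=list(h.get("Categories", [])),
        )
        for h in doc.get("Highlights", [])
    ]
    return Newsletter(date=date, highlights=highlights)


record = {
    "Date": "2015-09-14 15:45:26",
    "Highlights": [{
        "Title": "William Carvalho lesionado, Sporting acusa federação",
        "Description": "O jogador do Sporting William Carvalho tem uma fratura de stress na Tíbia e vai parar de 10 a 12 semanas.",
        "URL": "http://sicnoticias.sapo.pt/desporto/2015-07-14-William-Carvalho-lesionado-Sporting-acusa-federacao",
        "ProducerName": "SIC Notícias",
    }],
}
newsletter = load_newsletter(record)
print(newsletter.date.isoformat(), newsletter.highlights[0].producer_name)
# 2015-09-14T15:45:26 SIC Notícias
```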
andrefs / dump2txt.js
Created December 11, 2017 00:58
Script to retrieve news articles from a MongoDB
const mongo = require('promised-mongo');
const db = mongo('metacache', ['contents']);
const fs = require('fs-extra');
const htmlToText = require('html-to-text');
const Promise = require('bluebird');
async function dumpDocs(filter, folder, fields){
console.log('['+(new Date().toISOString())+'] Dumping',filter,'into '+folder);
const _filter = {
...filter,
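The script converts stored HTML article bodies to plain text with the `html-to-text` npm package. As a rough stand-in, here is a Python sketch using only the standard library's `html.parser`: it strips tags, skips `<script>`/`<style>` contents and collapses whitespace, but it does not reproduce the formatting options of `html-to-text` (link rendering, word wrapping, tables). The helper names are hypothetical:

```python
from html.parser import HTMLParser

# Tags that separate blocks of text; a break is inserted so that
# adjacent blocks do not run together once tags are stripped
_BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes entities like &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_plain_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace (including block breaks) into one space
    return " ".join("".join(parser.parts).split())


print(html_to_plain_text("<p>Sporting &amp; Benfica</p><p>empatam</p><script>track()</script>"))
# Sporting & Benfica empatam
```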
andrefs / select_datasets
Created December 11, 2017 01:04
Split a dataset into smaller train, eval and dev subsets
#!/bin/bash
CATEGORIES=(desporto economia)
mkdir -p dev train 'eval'
for c in "${CATEGORIES[@]}"; do
for i in $(find ./_/$c -type f | sort -R | head); do
j=$(basename "$i")
mv "$i" "dev/${c}_${j}"
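The visible part of the script moves 10 randomly chosen files per category (`sort -R | head`, whose default is 10 lines) into `dev/`; how `train` and `eval` are filled is cut off in the preview. A minimal Python sketch of the same kind of random split, assuming a fixed-size `eval` subset and the remainder going to `train`; `split_files` and its `n_eval` default are hypothetical:

```python
import random
from typing import Dict, List, Sequence


def split_files(files: Sequence[str], n_dev: int = 10, n_eval: int = 10,
                seed: int = 0) -> Dict[str, List[str]]:
    """Randomly carve dev and eval subsets out of `files`; the rest is train.

    n_dev=10 mirrors `head`'s default; n_eval=10 is an assumption, since
    the preview does not show how eval is filled.
    """
    if n_dev + n_eval > len(files):
        raise ValueError("not enough files for the requested dev/eval sizes")
    shuffled = sorted(files)  # fixed starting order so the seed alone determines the result
    random.Random(seed).shuffle(shuffled)
    return {
        "dev": shuffled[:n_dev],
        "eval": shuffled[n_dev:n_dev + n_eval],
        "train": shuffled[n_dev + n_eval:],
    }


files = [f"desporto/{i:03d}.txt" for i in range(50)]
split = split_files(files, seed=42)
print({name: len(subset) for name, subset in split.items()})
# {'dev': 10, 'eval': 10, 'train': 30}
```

Seeding the shuffle makes the split reproducible, which `sort -R` on its own does not.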
andrefs / pre_process.R
Created December 11, 2017 01:16
Pre-processing the documents
library(tm)
####################################
# load the corpora and pre process #
####################################
economia.train = VCorpus(DirSource('../news/train/economia'), readerControl = list(reader = readPlain, language='pt'))
desporto.train = VCorpus(DirSource('../news/train/desporto'), readerControl = list(reader = readPlain, language='pt'))
economia.test = VCorpus(DirSource('../news/eval/economia'), readerControl = list(reader = readPlain, language='pt'))
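The preview only shows the corpora being loaded with `tm::VCorpus` (plain-text reader, `language='pt'`); the pre-processing steps themselves are cut off. Assuming the usual `tm` transformations (`content_transformer(tolower)`, `removeNumbers`, `removePunctuation`, `removeWords` with `stopwords("portuguese")`, `stripWhitespace`), a Python sketch of equivalent per-document steps could look like this. The stop-word set is a small illustrative subset, and digits and punctuation are replaced with spaces rather than deleted, so adjacent words stay apart:

```python
import re
import string
from collections import Counter
from typing import Iterable, List

# Small illustrative subset; tm's stopwords("portuguese") list is much longer
STOPWORDS_PT = {"a", "o", "as", "os", "de", "da", "do", "e", "em", "que", "um", "uma"}

# Map every ASCII punctuation character to a space
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})


def preprocess(text: str, stopwords: Iterable[str] = STOPWORDS_PT) -> List[str]:
    """Lowercase, drop digits and punctuation, split on whitespace, remove stop words."""
    stop = set(stopwords)
    text = text.lower()
    text = re.sub(r"\d+", " ", text)
    text = text.translate(_PUNCT_TO_SPACE)
    return [tok for tok in text.split() if tok not in stop]


def term_counts(docs: Iterable[str]) -> List[Counter]:
    """One bag-of-words Counter per document (the rows of a document-term matrix)."""
    return [Counter(preprocess(doc)) for doc in docs]


print(preprocess("O Sporting venceu 2-0 em Lisboa!"))
# ['sporting', 'venceu', 'lisboa']
```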
andrefs / classif.R
Last active December 11, 2017 03:07
Classification using caret
##################
# classification #
##################
# decision trees
set.seed(15973)
dtree <- train(train.d, train.c.vector, method = 'rpart')
conf.mx.dt <- table(test.c, predict(dtree, test.d))
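`table(test.c, predict(dtree, test.d))` cross-tabulates the true classes (rows) against the predicted ones (columns), so correct predictions sit on the diagonal and accuracy is `sum(diag(conf.mx.dt)) / sum(conf.mx.dt)`. A language-neutral Python sketch of the same computation; the function names are hypothetical, and unlike R's `table` the dictionary omits zero cells:

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def confusion_matrix(actual: Sequence[Hashable],
                     predicted: Sequence[Hashable]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Count (actual, predicted) pairs, like R's table(actual, predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


def accuracy(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Share of cases on the diagonal, i.e. sum(diag(m)) / sum(m) in R."""
    if not actual:
        raise ValueError("no cases to score")
    cm = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in cm.items() if a == p)
    return correct / len(actual)


truth = ["desporto", "desporto", "economia", "economia"]
preds = ["desporto", "economia", "economia", "economia"]
print(confusion_matrix(truth, preds))
print(accuracy(truth, preds))  # 0.75
```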
// IE fix: Internet Explorer lacks NodeList.prototype.forEach, so borrow Array's
if (typeof(NodeList.prototype.forEach)!==typeof(alert))
{
NodeList.prototype.forEach=Array.prototype.forEach;
}
function showContent(id){
// get all .content elements
andrefs / CETEMPublicoAnotado2019_10k.txt
Created May 6, 2019 08:51
First 10k lines of CETEMPublico Anotado 2019
<ext n=1 sec=clt sem=92b>
<t>
Um clt 92b um DET_arti 0 S M >N 0 0 0 1->2 ex-libris
revivalismo clt 92b revivalismo N 0 S M NPHR 0 0 0 2->0 ser
refrescante clt 92b refrescante ADJ 0 S M N< 0 0 0 3->2 TOPO
</t>
<p par=ext1-clt-92b-1>
<s>
O clt 92b o DET_artd 0 S M >N 0 0 0 1->5 ex-libris
7 clt 92b 7=e=meio NUM_card 0 S M SUBJ> 0 0 0 2->3 ser
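Each token line in the annotated corpus is whitespace-separated. Judging only from this sample, the first column is the word form, the second and third repeat the `sec` and `sem` attributes of the enclosing `<ext>`, the fourth is the lemma, the fifth the part-of-speech tag, and an `i->h` field links token `i` to its head `h` (with `0` apparently marking the root). A Python sketch that parses those fields, treating the column meanings as assumptions rather than the corpus's documented format:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a field like "3->2" means token 3 depends on head token 2
_ARC = re.compile(r"^(\d+)->(\d+)$")


@dataclass
class Token:
    form: str
    lemma: str
    pos: str
    index: Optional[int]
    head: Optional[int]


def parse_token_line(line: str) -> Optional[Token]:
    """Parse one token line; return None for markup such as <s> or <p ...>."""
    line = line.strip()
    if not line or line.startswith("<"):
        return None
    cols = line.split()
    if len(cols) < 5:
        return None
    index = head = None
    for col in cols[5:]:
        match = _ARC.match(col)
        if match:
            index, head = int(match.group(1)), int(match.group(2))
            break
    # Assumed column order: form, sec, sem, lemma, POS tag, ...
    return Token(form=cols[0], lemma=cols[3], pos=cols[4], index=index, head=head)


sample = """<s>
O clt 92b o DET_artd 0 S M >N 0 0 0 1->5 ex-libris
7 clt 92b 7=e=meio NUM_card 0 S M SUBJ> 0 0 0 2->3 ser"""
tokens = [t for t in map(parse_token_line, sample.splitlines()) if t]
for t in tokens:
    print(t.form, t.lemma, t.pos, t.index, t.head)
```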