Igor Brigadir igorbrigadir

# aggregate 100 random walks, with
# different start points:
# in each case take a walk of ten steps
# and add a 100-dimensional vector
# to an aggregator (allwalks)
# that has ten different entries,
# for the ten possible steps of each
# walk
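The comment above describes an aggregation loop over random walks. A minimal sketch of that idea follows, under loudly stated assumptions: the graph (`neighbors`), the per-node 100-dimensional vectors (`vectors`), and the function name `aggregate_walks` are all hypothetical, since the original gist's data and code are not shown here.

```python
import random

def aggregate_walks(neighbors, vectors, n_walks=100, n_steps=10, dim=100, seed=0):
    """Sum the vector seen at each step position across many random walks.

    neighbors: dict mapping node -> list of adjacent nodes (hypothetical graph)
    vectors:   dict mapping node -> list of `dim` floats (hypothetical embeddings)
    Returns `allwalks`, a list of n_steps rows, each a list of dim floats.
    """
    rng = random.Random(seed)
    # One aggregate vector per step position of the walk.
    allwalks = [[0.0] * dim for _ in range(n_steps)]
    nodes = sorted(neighbors)
    for _ in range(n_walks):
        node = rng.choice(nodes)          # a (possibly) different start point per walk
        for step in range(n_steps):
            vec = vectors[node]
            row = allwalks[step]
            for i in range(dim):
                row[i] += vec[i]          # add this node's vector to the step's entry
            node = rng.choice(neighbors[node])  # move to a random neighbour
    return allwalks

# Toy data (assumption): a ring of 20 nodes with random Gaussian vectors.
data_rng = random.Random(1)
neighbors = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
vectors = {i: [data_rng.gauss(0, 1) for _ in range(100)] for i in range(20)}

allwalks = aggregate_walks(neighbors, vectors)
print(len(allwalks), len(allwalks[0]))
```

Dividing each row of `allwalks` by the number of walks would give the mean vector per step, if an average rather than a sum is wanted.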
@iaincollins
iaincollins / Bills with tags
Last active August 29, 2015 14:14
Combining UK Parliament data + NLP + BBC Things in node.js to tag Bills by topic
Tags for Employment Practices Bill:
{ 'Scottish Parliament':
{ label: 'Scottish Parliament',
hint: 'The Scottish Parliament is the devolved national, unicameral legislature of Scotland, located in the Holyrood area of the capital, Edinburgh. ',
uri: 'http://www.bbc.co.uk/things/59ab9b46-cb29-4394-bea7-59b2d6c74bc2#id',
properties: [Function] },
Wales:
{ label: 'Wales',
hint: 'a nation of the United Kingdom of Great Britain and Northern Ireland',
uri: 'http://www.bbc.co.uk/things/00eb010f-568a-4b89-bbfe-799d5b812bed#id',
@dmasad
dmasad / Intro_to_ICEWS.ipynb
Last active February 4, 2017 18:12
Intro to ICEWS in Python
##vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv##
## Download and examine deleted congress tweets ##
## Data Source: politwoops.sunlightfoundation.com ##
## Analysis: Katherine Ognyanova at www.kateto.net ##
## Visualizations: http://kateto.net/politwoops ##
##vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv##
library(RJSONIO)
library(RCurl)
library(plyr)

version 1.0.3

Text Analysis and Entity Resolution

Entity resolution is a common yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.

Entity Resolution, or "[Record linkage][wiki]", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with records from another that describe the same entity. Other terms with the same meaning include "entity disambiguation/linking", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conflation".

Entity Resol

@achabotl
achabotl / doi2bib
Last active June 24, 2016 15:30
doi2bib
#!/bin/bash
# [[ ... ]] with pattern matching is a bash feature, so this needs bash rather than sh.
if [[ "${1}" == "http"* ]] ; then
    doi="${1}"
else
    doi="http://dx.doi.org/${1}"
fi
# Stopped working around 2015-10-04.
# curl -sLH "Accept: text/bibliography; style=bibtex" "${doi}" | sed 's/^ *//'
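The same URL logic and `Accept` header can be sketched in Python. This is only an illustration of how the request is formed: the function names and the example DOI are hypothetical, nothing is sent over the network, and the script's own comment notes that this content-negotiation request stopped working around 2015-10-04.

```python
from urllib.request import Request

def doi_to_url(arg):
    """Mirror the shell logic: pass URLs through, prefix bare DOIs with the resolver."""
    if arg.startswith("http"):
        return arg
    return "http://dx.doi.org/" + arg

def build_bibtex_request(arg):
    """Build (but do not send) a request asking the resolver for a BibTeX entry."""
    headers = {"Accept": "text/bibliography; style=bibtex"}
    return Request(doi_to_url(arg), headers=headers)

# "10.1000/example" is a made-up DOI used purely for illustration.
req = build_bibtex_request("10.1000/example")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup, and redirects are followed by default, much like `curl -L`.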
@kylemcdonald
kylemcdonald / _tsne.pdf
Last active February 22, 2024 22:13
Exploring antonyms with word2vec.
@johnmyleswhite
johnmyleswhite / statistical_maxims.md
Created December 1, 2015 15:25
Statistical Maxims
  • Correlation is not causation (???)
  • No causation without manipulation. (Holland)
  • All models are wrong, some are useful. (Box)
  • Statistics is the science of uncertainty. (arguably Tukey)
  • Statistics is the science of learning from experience, especially experience that arrives a little bit at a time. (Efron)
rm doit.out ; touch doit.out ; yes | head -200 | awk '{print "echo "NR" `lynx -dump '"'"'https://en.wikipedia.org/wiki/"NR"_(number)'"'"' | wc -l` >> doit.out"}' > ! /tmp/doit.sh ; source /tmp/doit.sh ; sort -n -k 2 < doit.out | head -5
@eevee
eevee / gist:55426e5856f5825317b1
Last active January 28, 2021 22:51
adblock rules to hide mentions from people who don't follow you

Pop open "filter preferences" in adblock plus, and add the following rules to hide mentions from people who don't follow you (and who you don't follow).

For the interactions/notifications page:

twitter.com##.interaction-page [data-follows-you="false"][data-you-follow="false"]:not(.my-tweet)

For the mentions page:

twitter.com##.mentions-page [data-follows-you="false"][data-you-follow="false"]:not(.my-tweet)
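Both rules are element hiding filters: the part before `##` restricts the rule to a domain, and the part after it is a CSS selector for the elements to hide. A minimal sketch that splits a rule into those two parts follows; the helper name `parse_element_hiding_rule` is hypothetical and it handles only this simple `domains##selector` form, not the full filter syntax.

```python
def parse_element_hiding_rule(rule):
    """Split 'domain1,domain2##selector' into (list of domains, CSS selector)."""
    domains, sep, selector = rule.partition("##")
    if not sep:
        raise ValueError("not an element hiding rule: missing '##'")
    domain_list = domains.split(",") if domains else []
    return domain_list, selector

rule = ('twitter.com##.mentions-page '
        '[data-follows-you="false"][data-you-follow="false"]:not(.my-tweet)')
domains, selector = parse_element_hiding_rule(rule)
print(domains)
print(selector)
```

Read as CSS, the selector matches any element inside `.mentions-page` whose `data-follows-you` and `data-you-follow` attributes are both `"false"`, excluding elements with the `my-tweet` class.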