#### Entity resolution is a common yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.

    # aggregate 100 random walks, with
    # different start points:
    # in each case take a walk of ten steps
    # and add a 100-dimensional vector
    # to an aggregator (allwalks)
    # that has ten different entries,
    # for the ten possible steps of each
    # walk

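The comments above describe an aggregation scheme without showing the walk itself. Here is a minimal Python sketch of one possible reading. Several things here are assumptions that do not come from the original: the walk runs on a ring of 100 nodes, the 100-dimensional vector is a one-hot encoding of the walker's position, and the name `aggregate_walks` is hypothetical. Only the counts (100 walks, ten steps, ten aggregator entries) and the name `allwalks` are taken from the comments.

```python
import random


def aggregate_walks(n_walks=100, n_steps=10, n_nodes=100, seed=0):
    """Accumulate position vectors of many short random walks, step by step."""
    rng = random.Random(seed)
    # allwalks has one entry per step; each entry is an n_nodes-dimensional vector.
    allwalks = [[0] * n_nodes for _ in range(n_steps)]
    # Distinct start points, one per walk (requires n_walks <= n_nodes).
    starts = rng.sample(range(n_nodes), n_walks)
    for pos in starts:
        for k in range(n_steps):
            # Assumed dynamics: move one node left or right on a ring.
            pos = (pos + rng.choice((-1, 1))) % n_nodes
            # Adding a one-hot position vector to entry k is a single increment.
            allwalks[k][pos] += 1
    return allwalks


allwalks = aggregate_walks()
```

Each entry `allwalks[k]` then holds, for every node, how many of the walks were at that node after step `k + 1`, so every entry sums to the number of walks.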

    ##vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv##
    ## Download and examine deleted congress tweets ##
    ## Data Source: politwoops.sunlightfoundation.com ##
    ## Analysis: Katherine Ognyanova at www.kateto.net ##
    ## Visualizations: http://kateto.net/politwoops ##
    ##vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv##
    library(RJSONIO)
    library(RCurl)
    library(plyr)


    #!/bin/bash
    if [[ "${1}" == "http"* ]] ; then
      doi="${1}"
    else
      doi="http://dx.doi.org/${1}"
    fi
    # Stopped working around 2015-10-04.
    # curl -sLH "Accept: text/bibliography; style=bibtex" "${doi}" | sed 's/^ *//'

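The script above turns its argument into a resolver URL and then asks for a bibliography through the `Accept` header. The same two steps can be sketched in Python. The function names `doi_to_url`, `bibtex_request` and `fetch_bibtex` are hypothetical, and the `application/x-bibtex` media type is an assumption swapped in for the header that the script reports as no longer working. Whether a given resolver honours it is not confirmed by the source.

```python
import urllib.request


def doi_to_url(arg):
    """Mirror the shell branch: pass URLs through, prefix bare DOIs."""
    if arg.startswith("http"):
        return arg
    return "http://dx.doi.org/" + arg


def bibtex_request(arg):
    """Build, but do not send, a request that asks for BibTeX output."""
    # Assumption: the resolver supports content negotiation for this type.
    headers = {"Accept": "application/x-bibtex"}
    return urllib.request.Request(doi_to_url(arg), headers=headers)


def fetch_bibtex(arg, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(bibtex_request(arg), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping the URL handling separate from the network call means the argument logic can be checked offline, for example `doi_to_url("10.1234/example")` with a placeholder DOI, before anything is fetched.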
- Correlation is not causation (???)
- No causation without manipulation. (Holland)
- All models are wrong, some are useful. (Box)
- Statistics is the science of uncertainty. (arguably Tukey)
- Statistics is the science of learning from experience, especially experience that arrives a little bit at a time. (Efron)

    rm doit.out ; touch doit.out ; yes | head -200 | awk '{print "echo "NR" `lynx -dump '"'"'https://en.wikipedia.org/wiki/"NR"_(number)'"'"' | wc -l` >> doit.out"}' > ! /tmp/doit.sh ; source /tmp/doit.sh ; sort -n -k 2 < doit.out | head -5

Pop open "filter preferences" in Adblock Plus, and add the following rules to hide mentions from people who don't follow you (and whom you don't follow).
For the interactions/notifications page:
twitter.com##.interaction-page [data-follows-you="false"][data-you-follow="false"]:not(.my-tweet)
For the mentions page:
twitter.com##.mentions-page [data-follows-you="false"][data-you-follow="false"]:not(.my-tweet)

