- Graphs: http://boston.lti.cs.cmu.edu/spalakod/
- Source: https://github.com/shriphani/clueweb-disqus
- Machines:
  - Index pages: Boston-Cluster (compute-1-11)
  - Posts: Boston-Cluster (compute-1-10)
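import sys

if __name__ == '__main__':
    # Reverse the dot-separated components of each line in the input file,
    # e.g. "www.example.com" -> "com.example.www", printing one result per line.
    filename = sys.argv[1]
    with open(filename, 'r') as handle:
        for line in handle:
            components = line.strip().split('.')
            components.reverse()
            print('.'.join(components))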
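The same transformation can be written as a reusable function. This is a minimal sketch, assuming the input file holds one dotted hostname per line; `reverse_host` and `reverse_hosts` are hypothetical names, not part of the original script.

```python
from typing import Iterable, Iterator


def reverse_host(host: str) -> str:
    """Reverse the dot-separated labels of a hostname: 'a.b.com' -> 'com.b.a'."""
    return ".".join(reversed(host.strip().split(".")))


def reverse_hosts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the reversed form of each non-blank input line."""
    for line in lines:
        if line.strip():  # skip blank lines instead of emitting empty output
            yield reverse_host(line)
```

To process a file, pass the open handle, e.g. `for h in reverse_hosts(open(path)): print(h)`. Because reversed names start with the top-level domain, sorting the output places sibling hosts such as `forum.example.com` and `www.example.com` next to each other (both begin with `com.example.`).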
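#!/bin/bash
# Decrypt every *.xz.gpg chunk in the corpus directory to a matching *.xz file,
# then delete the encrypted copy.
cd /bos/tmp19/spalakod/clueweb12pp/kba/2013_sample_social_corpus/ || exit 1
for filename in *.xz.gpg
do
    # Only remove the .gpg file if decryption succeeded.
    if gpg -d "$filename" > "${filename/xz.gpg/xz}"; then
        echo "removing $filename"
        rm "$filename"
    fi
done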
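The decryption loop can also be driven from Python. This is a sketch, not the original tooling: it assumes `gpg` is on `PATH` with the needed secret key available, and the helper names are hypothetical. It defaults to a dry run that only returns the commands it would execute, and when run for real it deletes an encrypted file only after `gpg` exits successfully.

```python
import subprocess
from pathlib import Path
from typing import List


def decrypted_path(encrypted: Path) -> Path:
    """Map 'chunk.xz.gpg' to 'chunk.xz' by dropping the final '.gpg' suffix."""
    if encrypted.suffix != ".gpg":
        raise ValueError(f"not a .gpg file: {encrypted}")
    return encrypted.with_suffix("")


def gpg_decrypt_command(encrypted: Path) -> List[str]:
    """Build the gpg call; --batch avoids prompts, --yes overwrites an existing output."""
    return [
        "gpg", "--batch", "--yes",
        "--output", str(decrypted_path(encrypted)),
        "--decrypt", str(encrypted),
    ]


def decrypt_all(directory: Path, dry_run: bool = True) -> List[List[str]]:
    """Decrypt every *.xz.gpg file in directory; return the commands in sorted order."""
    commands = []
    for encrypted in sorted(directory.glob("*.xz.gpg")):
        cmd = gpg_decrypt_command(encrypted)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if gpg fails
            print(f"removing {encrypted}")
            encrypted.unlink()  # reached only after a successful decrypt
    return commands
```

Calling `decrypt_all(Path(corpus_dir))` first and inspecting the returned commands is a cheap check before passing `dry_run=False`.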
#!/bin/bash
# Download each chunk named in $file_listing (one name per line) from the KBA stream corpus on S3.
file_listing="/bos/tmp19/spalakod/clueweb12pp/kba/2013_sample_social_corpus/kba_first_10_days_files.txt"
netloc_and_path="http://s3.amazonaws.com/aws-publicdatasets/trec/kba/kba-streamcorpus-2014-v0_3_0/"
cd /bos/tmp19/spalakod/clueweb12pp/kba/2013_sample_social_corpus/ || exit 1
while read -r filename
do
    [ -z "$filename" ] && continue  # skip blank lines
    wget "$netloc_and_path$filename"
done < "$file_listing"
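A Python sketch of the same download loop, assuming the listing file contains relative chunk paths, one per line, and that the S3 prefix is still reachable; `build_urls` and `download_all` are hypothetical names. `urljoin` needs the base URL to end in `/`, as the one in the script does; otherwise the last segment of the base path would be replaced.

```python
import urllib.request
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urljoin

# Base URL copied from the shell script; the trailing "/" matters for urljoin.
BASE_URL = "http://s3.amazonaws.com/aws-publicdatasets/trec/kba/kba-streamcorpus-2014-v0_3_0/"


def build_urls(base_url: str, names: Iterable[str]) -> List[str]:
    """Turn listing lines into absolute URLs, skipping blank lines."""
    return [urljoin(base_url, name.strip()) for name in names if name.strip()]


def download_all(listing: Path, dest: Path, base_url: str = BASE_URL) -> None:
    """Fetch every listed file into dest, named after the URL's last path segment (like wget)."""
    with listing.open() as handle:
        urls = build_urls(base_url, handle)
    for url in urls:
        local_name = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, local_name)
```

Separating URL construction from the network calls keeps the part most likely to go wrong (path joining) easy to check without downloading anything.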
http://scififantasyforum.com/
http://updatebaba.com/discussion/
| Rank | Brand | Brand Value ($bil) | 1-Yr Value Change (%) | Brand Revenue ($bil) | Company Advertising ($mil) | Industry |
|---|---|---|---|---|---|---|
| 1 | Apple | 104.3 | 20 | 156.5 | 1,100 | Technology |
| 2 | Microsoft | 56.7 | 4 | 77.8 | 2,600 | Technology |
| 3 | Coca-Cola | 54.9 | 9 | 23.5 | 3,342 | Beverages |
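Since the 1-Yr Value Change column is a percentage change, each brand's prior-year value can be estimated as value / (1 + change / 100). A short sketch of that arithmetic on the three rows shown; the results are derived estimates, not figures from the source table, and the whole-number rounding of the published percentages limits their precision. The names `TABLE`, `parse_number`, and `prior_values` are hypothetical.

```python
from typing import Dict, Tuple

# Brand Value ($bil) and 1-Yr Value Change (%) copied from the table above.
TABLE: Dict[str, Tuple[str, str]] = {
    "Apple": ("104.3", "20"),
    "Microsoft": ("56.7", "4"),
    "Coca-Cola": ("54.9", "9"),
}


def parse_number(text: str) -> float:
    """Parse a table cell, dropping comma thousands separators such as '3,342'."""
    return float(text.replace(",", ""))


def implied_prior_value(value_bil: float, change_pct: float) -> float:
    """Solve value = prior * (1 + change_pct / 100) for prior."""
    return value_bil / (1 + change_pct / 100)


def prior_values(table: Dict[str, Tuple[str, str]]) -> Dict[str, float]:
    """Estimated prior-year brand value per brand, rounded to 0.1 $bil."""
    return {
        brand: round(implied_prior_value(parse_number(v), parse_number(c)), 1)
        for brand, (v, c) in table.items()
    }
```

This yields about 86.9, 54.5, and 50.4 ($bil) for Apple, Microsoft, and Coca-Cola. `parse_number` also handles comma-separated cells such as the advertising figure 3,342.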
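(ns cleaner-fuck-up.core
  (:import (org.htmlcleaner HtmlCleaner)))

(defn process-page
  "Parse raw HTML source with HtmlCleaner and return the cleaned root TagNode."
  [page-src]
  (let [cleaner (new HtmlCleaner)
        ;; CleanerProperties for tuning the cleaner; fetched but not customized here.
        props (.getProperties cleaner)]
    (.clean cleaner page-src)))

(defn blah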
;; gorilla-repl.fileformat = 1
;; **
;;; # Gorilla REPL
;;;
;;; Welcome to gorilla :-) Shift + enter evaluates code. Poke the question mark (top right) to learn more ...
;; **
;; @@
(use 'gorilla-plot.core)
<html><head><link href="http://fonts.googleapis.com/css?family=Arvo:400,700,400italic,700italic|Lora:400,700,400italic,700italic" rel="stylesheet" type="text/css" /><link href="http://yandex.st/highlightjs/8.0/styles/default.min.css" rel="stylesheet" type="text/css" /><script src="http://yandex.st/highlightjs/8.0/highlight.min.js"></script><style>
body {
  /*padding-top: 40px;*/
}
div#contents {
  margin-left: 10%;
  margin-right: 10%;
}