#!/usr/bin/ruby
# Script that consumes Mahout collocations and sorts them.
# Example input: Key: 00 a.m: Value: 53.017824619466865
#
# Once we have these, we can go back and look for associations between docs and these collocations
# e.g. find . -exec grep -il 'antibiotic sensitivity' {} \;
# several occurrences of 'antibiotic resistance' in paul_ewald_asks_can_we_domesticate_germs.html
#
# See also http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
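The body of the script is not shown here, so the following is only a minimal sketch of the parse-and-sort step the header describes, assuming every input line has the `Key: <terms>: Value: <score>` shape of the example above. The names `LINE_PATTERN`, `parse_collocations` and `sort_by_score` are hypothetical and not taken from the original script.

```ruby
# Hypothetical pattern for lines such as:
#   Key: 00 a.m: Value: 53.017824619466865
# The greedy (.+) lets the key itself contain spaces, dots or colons.
LINE_PATTERN = /\AKey: (.+): Value: ([-+0-9.eE]+)\z/

# Turn raw lines into [key, score] pairs, silently skipping anything
# that does not match the assumed format (log noise, blank lines).
def parse_collocations(lines)
  lines.map do |line|
    m = LINE_PATTERN.match(line.strip)
    m ? [m[1], m[2].to_f] : nil
  end.compact
end

# Highest score first, so the strongest collocations come out on top.
def sort_by_score(pairs)
  pairs.sort_by { |(_key, score)| -score }
end
```

One way to use it would be `sort_by_score(parse_collocations(ARGF.each_line))`, which reads the dump from the files named on the command line (or from standard input) and returns the pairs ranked by score.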
TellyClub:trunk danbri$ examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
11/09/13 10:24:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
11/09/13 10:30:23 INFO common.AbstractJob: Command line arguments: {--dictionary=mahout-work/reuters-out-seqdir-sparse-kmeans/dictionary.file-0, --dictionaryType=sequencefile, --endPhase=2147483647, --numWords=20, --seqFileDir=mahout-work/reuters-kmeans/clusters-10, --startPhase=0, --substring=100, --tempDir=temp}
:CL-15706{n=519 c=[0:0.014, 0.1:0.038, 0.2:0.013, 0.3:0.024, 0.4:0.013, 0.5:0.012, 0.7:0.031, 0.8:0.0
Top Terms:
vs => 7.6343705245295475
net => 4.940552704136725
mln => 4.394683003884979
shr => 4.391380775870616
cts => 4.295353677231453
loss => 4.157557884392711
oper => 3.606452024051909
#!/bin/bash
#
# The Mahout command script
#
# Environment Variables
#
# MAHOUT_JAVA_HOME The java implementation to use. Overrides JAVA_HOME.
#
# MAHOUT_HEAPSIZE The maximum amount of heap to use, in MB.
# Default is 1000.
#!/usr/bin/ruby
# Read the BBC iPlayer site, and take note of the URLs for potentially playable items
# Currently we ignore the detail of embedded JSON, and just extract pids.
sitemap = `curl -s http://www.bbc.co.uk/iplayer/sitemap.xml.gz | gunzip - | grep '<loc>'`
done = []
topics = []
# String#each was removed in Ruby 1.9; each_line iterates over the <loc> lines.
sitemap.each_line do |sm|
select distinct ?d ?i ?c ?t WHERE {
?d <http://purl.org/dc/terms/subject> ?s .
?d <http://purl.org/dc/terms/title> ?t .
?d <http://purl.org/ontology/bibo/isbn> ?i .
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> ?c .
}
Top Terms:
social_sciences => 6.3393223616576275
power_ => 3.2031174961782436
elite_ => 2.4367908532084943
_united_states => 0.4032046124604249
consensus_ => 0.272245196685248
functionalism_ => 0.23277774145594696
sociology => 0.18494324667173773
philosophy => 0.18243420760402476
_china => 0.17777614661383034
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<key attr.name="label" attr.type="string" for="node" id="label"/>
<key attr.name="Edge Label" attr.type="string" for="edge" id="edgelabel"/>
<key attr.name="weight" attr.type="double" for="edge" id="weight"/>
<key attr.name="Edge Id" attr.type="string" for="edge" id="edgeid"/>
<key attr.name="r" attr.type="int" for="node" id="r"/>
<key attr.name="g" attr.type="int" for="node" id="g"/>
<key attr.name="b" attr.type="int" for="node" id="b"/>
<key attr.name="x" attr.type="float" for="node" id="x"/>
<key attr.name="y" attr.type="float" for="node" id="y"/>
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
MAHOUT_LOCAL is set, running locally
117,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_1.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_100.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_101.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_102.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_103.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_104.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_109.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_110.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_111.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_118.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_119.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_120.txt,/The_Psychopath_Test__A_Journey_Through_the_Madness_Industry_121.txt,/The_Psychopath_Test_
Key: zoo story: Value: 99.5501602332726
Key: zina zina: Value: 24.524853792255954
Key: zina tells: Value: 18.198242925620775
Key: zina sammy: Value: 16.90048778841856
Key: zina joe: Value: 15.925026676875632
Key: zina her: Value: 10.134937961312062
Key: yourself your: Value: 3.7305085407751903
Key: yourself you: Value: 2.3517997572998866
Key: yourself what: Value: 2.5757062135444357
Key: yourself wandering: Value: 20.21486502949483