sh build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0   105k      0  0:01:15  0:01:15 --:--:--  155k
<message from="[email protected]/TellyClub" type="chat" to="[email protected]/danko2" >
<body>{"id":"b008v131","pid":"b008v131", "video":"http://g.bbcredux.com/programme/bbcthree/2011-01-25/23-00-00","title":"HELLO LIBBY", "image":"http://upload.wikimedia.org/wikipedia/commons/6/6d/Rick_Astley_-_Pepsifest_2009.jpg","description":"Some description goes here", "nick":"danko2"}</body>
</message>
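The stanza above carries a JSON document as the text of its body element. As a rough sketch of how a receiving client might unpack it, the Python below parses the XML and then decodes the body. The function name parse_programme_message and the example.org addresses are assumptions made for illustration (the real addresses are obscured in the source), and a production XMPP client would normally receive namespaced stanzas through its own library rather than raw strings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stanza modelled on the message above; the addresses are placeholders.
STANZA = (
    '<message from="sender@example.org/TellyClub" type="chat" '
    'to="receiver@example.org/danko2">'
    '<body>{"id":"b008v131","pid":"b008v131",'
    '"title":"HELLO LIBBY","nick":"danko2"}</body>'
    '</message>'
)


def parse_programme_message(stanza_xml):
    """Return (sender, recipient, payload) for a chat stanza whose body is JSON."""
    message = ET.fromstring(stanza_xml)
    body = message.find("body")
    if body is None or not body.text:
        raise ValueError("stanza has no <body> text")
    # The body text is itself a JSON object describing a programme.
    payload = json.loads(body.text)
    return message.get("from"), message.get("to"), payload


sender, recipient, payload = parse_programme_message(STANZA)
print(sender, recipient, payload["title"])
```

Decoding in two steps keeps transport concerns (who sent what to whom) separate from the programme metadata, which is one plausible way to structure a handler for messages like this.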
Script started on Fri Sep 2 13:15:22 2011
bash-3.2$ MAHOUT_LOCAL=true sh colloc-reuters.sh
./bin/mahout seqdirectory -i ./examples/bin/work/reuters-out/ -o ./examples/bin/work/reuters-out-seqdir -c UTF-8 -chunk 5
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
MAHOUT_LOCAL is set, running locally
CLASSPATH: :/Users/danbri/working/android/sdk:/Users/danbri/working/mahout/trunk/src/conf:/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/tools.jar:/Users/danbri/working/mahout/trunk/mahout-*.jar:/Users/danbri/working/mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar:/Users/danbri/working/mahout/trunk/mahout-examples-*-job.jar:/Users/danbri/working/mahout/trunk/lib/*.jar:/Users/danbri/working/mahout/trunk/examples/target/dependency/antlr-2.7.7.jar:/Users/danbri/working/mahout/trunk/examples/target/dependency/antlr-3.2.jar:/Users/danbri/working/mahout/trunk/examples/target/dependency/antlr-runtime-3.2.jar:/Users/danbri/working/mahout/trunk/examples/target/depen
#!/bin/sh
# Running in top level directory, per http://permalink.gmane.org/gmane.comp.apache.mahout.user/5689
# via https://cwiki.apache.org/MAHOUT/collocations.html
# I've tried this from top level dir, both with and without MAHOUT_LOCAL=true set.
# In both cases, I get seemingly nothing.
#
# e.g. running with cluster I got two files, and analysing
# ./bin/mahout seqdumper -s part-r-00001 ...gives
bash-3.2$ cat miglib.pig
-- Mahout Pig integration
-- only proper piglatin can go in an imported macro; file-management, jar registration etc. has
-- to be run via .pig files.
-- We need piggybank.jar for reading Mahout's Hadoop Sequence files, plus other utilities:
--
REGISTER /Users/danbri/working/pig/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
grunt> mydir = seqdirectory('ted/txt/', 'ted/foo', IGNORE);
grunt> dump mydir;
2011-09-04 17:33:58,860 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: NATIVE
2011-09-04 17:33:59,373 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-09-04 17:33:59,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2011-09-04 17:33:59,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2011-09-04 17:33:59,631 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-09-04 17:33:59,654 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buf
_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
_:genid1 <http://xmlns.com/foaf/0.1/nick> "giul"@en .
_:genid1 <http://xmlns.com/foaf/0.1/name> "giul"@en .
_:genid1 <http://www.livejournal.org/rss/lj/1.0/journaltitle> "Eariel - t.A.T.u. Live Journal"@en .
_:genid1 <http://xmlns.com/foaf/0.1/openid> <http://giul.livejournal.com/> .
<http://www.livejournal.com/directory.bml?opt_sort=ut&s_loc=1&loc_cn=IT> <http://purl.org/dc/elements/1.1/title> "IT" .
_:genid1 <http://blogs.yandex.ru/schema/foaf/country> <http://www.livejournal.com/directory.bml?opt_sort=ut&s_loc=1&loc_cn=IT> .
<http://www.livejournal.com/directory.bml?opt_sort=ut&s_loc=1&loc_cn=IT&loc_st=&loc_ci=Rome> <http://purl.org/dc/elements/1.1/title> "Rome" .
_:genid1 <http://blogs.yandex.ru/schema/foaf/city> <http://www.livejournal.com/directory.bml?opt_sort=ut&s_loc=1&loc_cn=IT&loc_st=&loc_ci=Rome> .
_:genid1 <http://xmlns.com/foaf/0.1/img> <http://l-userpic.livejournal.com/94039030/23437353> .
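The statements above are line-oriented triples (subject, predicate, object, terminated by a full stop), so they can be split with a small regular expression. The Python below is only a sketch under that assumption: it covers the term shapes that appear in this data (IRIs, blank nodes, and literals with an optional language tag or datatype) and is not a complete N-Triples parser. The names TRIPLE, parse_line and SAMPLE are introduced here for illustration.

```python
import re

# One triple per line: subject, predicate, object, then a terminating ".".
# Handles IRIs, blank nodes, and literals with an optional @lang or ^^<datatype>.
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_line(line):
    """Split one triple line into (subject, predicate, object) strings."""
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError("not a recognised triple: %r" % line)
    return match.groups()


# Two lines copied from the data above.
SAMPLE = [
    '_:genid1 <http://xmlns.com/foaf/0.1/nick> "giul"@en .',
    '_:genid1 <http://xmlns.com/foaf/0.1/openid> <http://giul.livejournal.com/> .',
]

# Group predicate/object pairs by subject.
by_subject = {}
for line in SAMPLE:
    subject, predicate, obj = parse_line(line)
    by_subject.setdefault(subject, []).append((predicate, obj))

print(by_subject["_:genid1"])
```

For anything beyond quick inspection, a dedicated RDF library would be a safer choice than a hand-written pattern.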
TellyClub:trunk danbri$ sh spectral.sh
Running on hadoop, using HADOOP_HOME=/Users/danbri/working/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/Users/danbri/working/hadoop/hadoop-0.20.2/conf
MAHOUT-JOB: /Users/danbri/working/mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/07 14:22:46 WARN driver.MahoutDriver: No spectralkmeans.props found on classpath, will use command-line arguments only
11/09/07 14:22:46 INFO common.AbstractJob: Command line arguments: {--clusters=2, --convergenceDelta=0.5, --dimensions=37, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=speccy, --maxIter=10, --output=specout, --startPhase=0, --tempDir=temp}
11/09/07 14:22:46 INFO common.HadoopUtil: Deleting specout/calculations/seqfile-248
11/09/07 14:22:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/07 14:22:51 INFO input.FileInputFormat: Total input paths to process : 2
Index: core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
===================================================================
--- core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java (revision 1163723)
+++ core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java (working copy)
@@ -78,6 +78,9 @@
FileInputFormat.addInputPath(job, markovPath);
FileOutputFormat.setOutputPath(job, outputPath);
+
+ job.setJarByClass(VectorMatrixMultiplicationJob.class);
TellyClub:trunk danbri$ sh spectral.sh
Running on hadoop, using HADOOP_HOME=/Users/danbri/working/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/Users/danbri/working/hadoop/hadoop-0.20.2/conf
MAHOUT-JOB: /Users/danbri/working/mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/07 14:37:49 WARN driver.MahoutDriver: No spectralkmeans.props found on classpath, will use command-line arguments only
11/09/07 14:37:49 INFO common.AbstractJob: Command line arguments: {--clusters=2, --convergenceDelta=0.5, --dimensions=37, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=speccy, --maxIter=10, --output=specout, --startPhase=0, --tempDir=temp}
11/09/07 14:37:50 INFO common.HadoopUtil: Deleting specout/calculations/seqfile-112
11/09/07 14:37:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/07 14:37:51 INFO input.FileInputFormat: Total input paths to process :