
@danbri
Created August 5, 2011 08:31
mahout spectralkmeans --input a/ --output b/ --dimensions 1000 --clusters 10 --maxIter 10
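An annotated form of the invocation above may help; the flag meanings are inferred from the `AbstractJob` argument dump in the log below (e.g. the defaults `convergenceDelta=0.5` and `tempDir=temp`), not from the Mahout docs, so treat the comments as assumptions:

```shell
# Sketch of the same run, one flag per line (comments are assumptions):
mahout spectralkmeans \
  --input a/          `# directory holding the affinity-matrix entries`  \
  --output b/         `# working + final output directory ("b/calculations/..." below)` \
  --dimensions 1000   `# side length n of the n-by-n affinity matrix`    \
  --clusters 10       `# k for the final k-means step`                   \
  --maxIter 10          # cap on k-means iterations
```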
Running on hadoop, using HADOOP_HOME=/Users/danbri/working/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/Users/danbri/working/hadoop/hadoop-0.20.2/conf
MAHOUT-JOB: /Users/danbri/Documents/workspace/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
2011-08-05 10:26:28,833 WARN [org.apache.mahout.driver.MahoutDriver] - No spectralkmeans.props found on classpath, will use command-line arguments only
2011-08-05 10:26:29,188 INFO [org.apache.mahout.common.AbstractJob] - Command line arguments: {--clusters=10, --convergenceDelta=0.5, --dimensions=1000, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=a/, --maxIter=10, --output=b/, --startPhase=0, --tempDir=temp}
2011-08-05 10:26:29,901 INFO [org.apache.mahout.common.HadoopUtil] - Deleting b/calculations/seqfile-136
2011-08-05 10:26:30,079 WARN [org.apache.hadoop.mapred.JobClient] - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-08-05 10:26:31,459 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2011-08-05 10:26:33,200 INFO [org.apache.hadoop.mapred.JobClient] - Running job: job_201108042119_0023
2011-08-05 10:26:34,212 INFO [org.apache.hadoop.mapred.JobClient] - map 0% reduce 0%
2011-08-05 10:27:02,319 INFO [org.apache.hadoop.mapred.JobClient] - map 33% reduce 0%
2011-08-05 10:27:05,698 INFO [org.apache.hadoop.mapred.JobClient] - map 55% reduce 0%
2011-08-05 10:27:07,747 INFO [org.apache.hadoop.mapred.JobClient] - map 85% reduce 0%
2011-08-05 10:27:11,313 INFO [org.apache.hadoop.mapred.JobClient] - map 100% reduce 0%
2011-08-05 10:27:26,431 INFO [org.apache.hadoop.mapred.JobClient] - map 100% reduce 100%
2011-08-05 10:27:28,439 INFO [org.apache.hadoop.mapred.JobClient] - Job complete: job_201108042119_0023
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - Counters: 17
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - Job Counters
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - Launched reduce tasks=1
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - Launched map tasks=1
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - Data-local map tasks=1
2011-08-05 10:27:28,637 INFO [org.apache.hadoop.mapred.JobClient] - FileSystemCounters
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - FILE_BYTES_READ=44000030
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_READ=23880910
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - FILE_BYTES_WRITTEN=66000068
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_WRITTEN=9028077
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - Map-Reduce Framework
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - Reduce input groups=1000
2011-08-05 10:27:28,638 INFO [org.apache.hadoop.mapred.JobClient] - Combine output records=0
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Map input records=1000000
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Reduce shuffle bytes=0
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Reduce output records=1000
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Spilled Records=3000000
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Map output bytes=20000000
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Combine input records=0
2011-08-05 10:27:28,639 INFO [org.apache.hadoop.mapred.JobClient] - Map output records=1000000
2011-08-05 10:27:28,640 INFO [org.apache.hadoop.mapred.JobClient] - Reduce input records=1000000
2011-08-05 10:27:28,853 INFO [org.apache.mahout.common.HadoopUtil] - Deleting b/calculations/diagonal
2011-08-05 10:27:28,986 WARN [org.apache.hadoop.mapred.JobClient] - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-08-05 10:27:31,446 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2011-08-05 10:27:32,970 INFO [org.apache.hadoop.mapred.JobClient] - Running job: job_201108042119_0024
2011-08-05 10:27:33,996 INFO [org.apache.hadoop.mapred.JobClient] - map 0% reduce 0%
2011-08-05 10:27:51,163 INFO [org.apache.hadoop.mapred.JobClient] - map 100% reduce 0%
2011-08-05 10:28:03,189 INFO [org.apache.hadoop.mapred.JobClient] - map 100% reduce 100%
2011-08-05 10:28:05,193 INFO [org.apache.hadoop.mapred.JobClient] - Job complete: job_201108042119_0024
2011-08-05 10:28:05,196 INFO [org.apache.hadoop.mapred.JobClient] - Counters: 17
2011-08-05 10:28:05,196 INFO [org.apache.hadoop.mapred.JobClient] - Job Counters
2011-08-05 10:28:05,196 INFO [org.apache.hadoop.mapred.JobClient] - Launched reduce tasks=1
2011-08-05 10:28:05,196 INFO [org.apache.hadoop.mapred.JobClient] - Launched map tasks=1
2011-08-05 10:28:05,197 INFO [org.apache.hadoop.mapred.JobClient] - Data-local map tasks=1
2011-08-05 10:28:05,197 INFO [org.apache.hadoop.mapred.JobClient] - FileSystemCounters
2011-08-05 10:28:05,197 INFO [org.apache.hadoop.mapred.JobClient] - FILE_BYTES_READ=14006
2011-08-05 10:28:05,197 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_READ=9028077
2011-08-05 10:28:05,289 INFO [org.apache.hadoop.mapred.JobClient] - FILE_BYTES_WRITTEN=28044
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_WRITTEN=8109
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Map-Reduce Framework
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Reduce input groups=1
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Combine output records=0
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Map input records=1000
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Reduce shuffle bytes=0
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Reduce output records=1
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Spilled Records=2000
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Map output bytes=12000
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Combine input records=0
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Map output records=1000
2011-08-05 10:28:05,290 INFO [org.apache.hadoop.mapred.JobClient] - Reduce input records=1000
2011-08-05 10:28:06,788 WARN [org.apache.hadoop.mapred.JobClient] - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-08-05 10:28:08,305 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2011-08-05 10:28:09,623 INFO [org.apache.hadoop.mapred.JobClient] - Running job: job_201108042119_0025
2011-08-05 10:28:10,651 INFO [org.apache.hadoop.mapred.JobClient] - map 0% reduce 0%
2011-08-05 10:28:26,679 INFO [org.apache.hadoop.mapred.JobClient] - map 100% reduce 0%
2011-08-05 10:28:28,784 INFO [org.apache.hadoop.mapred.JobClient] - Job complete: job_201108042119_0025
2011-08-05 10:28:28,787 INFO [org.apache.hadoop.mapred.JobClient] - Counters: 7
2011-08-05 10:28:28,787 INFO [org.apache.hadoop.mapred.JobClient] - Job Counters
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - Launched map tasks=1
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - Data-local map tasks=1
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - FileSystemCounters
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_READ=9036189
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - HDFS_BYTES_WRITTEN=9028077
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - Map-Reduce Framework
2011-08-05 10:28:28,788 INFO [org.apache.hadoop.mapred.JobClient] - Map input records=1000
2011-08-05 10:28:28,789 INFO [org.apache.hadoop.mapred.JobClient] - Spilled Records=0
2011-08-05 10:28:28,789 INFO [org.apache.hadoop.mapred.JobClient] - Map output records=1000
2011-08-05 10:28:28,897 INFO [org.apache.mahout.math.decomposer.lanczos.LanczosSolver] - Finding 20 singular vectors of matrix with 1000 rows, via Lanczos
2011-08-05 10:28:30,896 INFO [org.apache.hadoop.mapred.FileInputFormat] - Total input paths to process : 2
Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/user/danbri/b/calculations/laplacian-240/tmp/data
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:93)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/user/danbri/b/calculations/laplacian-240/tmp/data
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
... 19 more
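The failure is the Lanczos step asking `DistributedRowMatrix.times` to read an intermediate path that no longer exists on HDFS. A minimal way to investigate, using the path copied verbatim from the `FileNotFoundException` above (whether the `laplacian-240` directory was cleaned up prematurely or never written is an assumption to verify, not a conclusion):

```shell
# Path reported missing in the stack trace above.
MISSING="hdfs://localhost:9000/user/danbri/b/calculations/laplacian-240/tmp/data"

# Strip the scheme and authority to get the HDFS-relative path.
REL="${MISSING#hdfs://localhost:9000}"
echo "$REL"

# Standard HDFS shell checks (run against the cluster from the log):
# hadoop fs -ls "$REL"            # is the file really gone, or a permissions issue?
# hadoop fs -ls b/calculations    # which intermediate dirs survived the run?
```

If `b/calculations` shows other `laplacian-*` directories but not `laplacian-240`, a stale `--output` directory from an earlier run is one plausible culprit; re-running against a fresh output directory would distinguish that from a bug in the driver's cleanup.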