@remeniuk
Created August 10, 2012 11:55
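import org.apache.hadoop.fs.Path
import org.apache.mahout.clustering.kmeans.KMeansDriver
import org.apache.mahout.common.distance.CosineDistanceMeasure
import org.apache.mahout.utils.clustering.ClusterDumper

// output directory for the clustering results
val outputClustersPath = new Path("job/output-clusters")
// text file that receives a human-readable dump of the clustering results
val dumpPath = "job/dump"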
println("Running K-means...")
// runs the K-means algorithm for at most 20 iterations (convergence delta 0.01, cosine distance)
// to find clusters of colluding players; collusion is inferred from the number of hands
// a player has played together with other player[s]
KMeansDriver.run(conf, vectorsPath, inputClustersPath, outputClustersPath,
new CosineDistanceMeasure(), 0.01, 20, true, 0, false)
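// For reference, a minimal plain-Scala sketch of the standard cosine distance
// (1 - cosine similarity), which is the kind of measure CosineDistanceMeasure stands for.
// `cosineDistanceSketch` is a hypothetical name and is not part of Mahout; the handling of
// zero vectors below is an assumption made for this sketch, not Mahout's documented behavior.
def cosineDistanceSketch(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same dimension")
  // dot product of the two vectors
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  // product of the Euclidean norms
  val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
  // assumption: a zero vector is treated as maximally distant instead of dividing by zero
  if (norms == 0.0) 1.0 else 1.0 - dot / norms
}
// Players whose vectors point in the same direction (similar proportions of hands played
// with the same opponents) get a distance near 0, regardless of the total number of hands.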
println("Printing results...")
// dumps clusters to a text file
val clusterizationResult = finalClusterPath(conf, outputClustersPath, 20)
val clusteredPoints = new Path(outputClustersPath, "clusteredPoints")
val clusterDumper = new ClusterDumper(clusterizationResult, clusteredPoints)
clusterDumper.setNumTopFeatures(10)
clusterDumper.setOutputFile(dumpPath)
clusterDumper.setTermDictionary(new Path(indexedDictionaryPath, "part-00000").toString,
"sequencefile")
clusterDumper.printClusters(null)
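
// finalClusterPath (used above) is defined outside this snippet. What follows is a minimal
// sketch of such a helper, not the author's implementation. It assumes that Mahout writes the
// last K-means iteration to a directory named "clusters-<n>-final" under the output path.
// `FinalClustersSketch` and `findFinalClustersDir` are hypothetical names, and returning an
// Option rather than null is a choice made for this sketch.
object FinalClustersSketch {
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  def findFinalClustersDir(conf: Configuration, output: Path, maxIterations: Int): Option[Path] = {
    val fs = FileSystem.get(output.toUri, conf)
    // walk down from the iteration cap and return the first final-clusters directory that exists
    (maxIterations to 0 by -1).iterator
      .map(i => new Path(output, "clusters-" + i + "-final"))
      .find(p => fs.exists(p))
  }
}
// Possible usage: FinalClustersSketch.findFinalClustersDir(conf, outputClustersPath, 20)
// yields None when K-means produced no final clusters, so the dump step can be skipped.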