Created
August 10, 2012 11:55
// imports used by this fragment; conf, vectorsPath, inputClustersPath,
// indexedDictionaryPath and finalClusterPath are not defined here
import org.apache.hadoop.fs.Path
import org.apache.mahout.clustering.kmeans.KMeansDriver
import org.apache.mahout.common.distance.CosineDistanceMeasure
import org.apache.mahout.utils.clustering.ClusterDumper

// clustering results
val outputClustersPath = new Path("job/output-clusters")
// textual dump of clustering results
val dumpPath = "job/dump"

println("Running K-means...")
// runs the K-means algorithm with up to 20 iterations to find clusters of colluding players
// (collusion is assumed on the basis of the number of hands a player played together with
// other player[s]); convergence delta is 0.01, and the job runs as MapReduce (not sequentially)
KMeansDriver.run(conf, vectorsPath, inputClustersPath, outputClustersPath,
  new CosineDistanceMeasure(), 0.01, 20, true, 0.0, false)

println("Printing results...")
// dumps clusters to a text file, labelling the top 10 features of each cluster
// with terms from the dictionary
val clusterizationResult = finalClusterPath(conf, outputClustersPath, 20)
val clusteredPoints = new Path(outputClustersPath, "clusteredPoints")
val clusterDumper = new ClusterDumper(clusterizationResult, clusteredPoints)
clusterDumper.setNumTopFeatures(10)
clusterDumper.setOutputFile(dumpPath)
clusterDumper.setTermDictionary(new Path(indexedDictionaryPath, "part-00000").toString,
  "sequencefile")
clusterDumper.printClusters(null)
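To make the `KMeansDriver.run` arguments above easier to read, here is a minimal in-memory sketch of what a cosine-distance K-means loop with a convergence delta and an iteration cap roughly does. It is not Mahout's implementation: `KMeansSketch`, `cosineDistance`, `mean` and `run` are hypothetical names invented for illustration, and the convergence check (every centroid moved by no more than `delta`) is an assumption about the general technique, not a description of Mahout's internals.

```scala
object KMeansSketch {
  type Vec = Array[Double]

  // Cosine distance: 1 - cos(angle between a and b).
  // Zero-length vectors are treated as maximally distant (an assumption of this sketch).
  def cosineDistance(a: Vec, b: Vec): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 1.0 else 1.0 - dot / norms
  }

  // Component-wise mean of a non-empty group of vectors.
  def mean(points: Seq[Vec]): Vec =
    Array.tabulate(points.head.length)(i => points.map(_(i)).sum / points.size)

  // Assign each point to its nearest centroid, recompute the centroids, and stop
  // once no centroid moved by more than `delta` or after `maxIterations` rounds.
  def run(points: Seq[Vec], initial: Seq[Vec], delta: Double, maxIterations: Int): Seq[Vec] = {
    var centroids = initial
    var iteration = 0
    var converged = false
    while (!converged && iteration < maxIterations) {
      val groups = points.groupBy(p => centroids.indices.minBy(i => cosineDistance(p, centroids(i))))
      // A centroid with no assigned points keeps its previous position.
      val updated = centroids.indices.map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
      converged = centroids.zip(updated).forall { case (o, n) => cosineDistance(o, n) <= delta }
      centroids = updated
      iteration += 1
    }
    centroids
  }
}
```

With the same settings as the job above (delta `0.01`, at most `20` iterations), a toy call could look like `KMeansSketch.run(points, Seq(points(0), points(2)), 0.01, 20)`, where `points` is a small `Seq[Array[Double]]` and the second argument plays the role of `inputClustersPath`, the initial centroids.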