Related Setup: https://gist.github.com/hofmannsven/6814278
Related Pro Tips: https://ochronus.com/git-tips-from-the-trenches/
```scala
package org.aja.tej.examples.ml

import org.aja.tej.utils.TejUtils
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SQLContext

/**
 * Created by mageswaran on 25/9/15.
```
```scala
package org.aja.tej.examples

import java.io.File

import org.aja.tej.utils.TejUtils
import org.apache.spark.{SparkConf, SparkContext}

/**
```
```scala
package org.aja.tej.tej.test.spark

/**
 * Created by mageswaran on 9/8/15.
 */

import java.util.Random

import org.apache.spark.{SparkConf, SparkContext}
```
- Format: 7zipped
- Files:
  - **badges**.xml
    - UserId, e.g. "420"
    - Name, e.g. "Teacher"
    - Date, e.g. "2008-09-15T08:55:03.923"
  - **comments**.xml
    - Id
    - PostId
    - Score
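The list above gives only field names and sample values. As a minimal sketch, assuming each badge is stored on its own line as a `<row .../>` element with `name="value"` attributes (an assumption about the dump layout, not something stated here), a single line can be turned into a typed record in plain Scala. `BadgeParser` and `Badge` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch: BadgeParser and Badge are illustrative names, not from the repo.
// Assumes each badges.xml record is one line of the form <row Attr="value" ... />.
object BadgeParser {
  final case class Badge(userId: String, name: String, date: String)

  // Matches attribute pairs such as UserId="420"
  private val Attr = "(\\w+)=\"([^\"]*)\"".r

  /** Collects every name="value" pair on the line into a map. */
  def attributes(line: String): Map[String, String] =
    Attr.findAllMatchIn(line).map(m => m.group(1) -> m.group(2)).toMap

  /** Builds a Badge when UserId, Name and Date are all present, otherwise None. */
  def parse(line: String): Option[Badge] = {
    val attrs = attributes(line)
    for {
      userId <- attrs.get("UserId")
      name   <- attrs.get("Name")
      date   <- attrs.get("Date")
    } yield Badge(userId, name, date)
  }

  def main(args: Array[String]): Unit = {
    // Sample values taken from the field list above
    val line = "<row UserId=\"420\" Name=\"Teacher\" Date=\"2008-09-15T08:55:03.923\" />"
    println(parse(line))
  }
}
```

A regex is used instead of an XML parser only to keep the sketch dependency-free (scala-xml has been a separate module since Scala 2.11); it does not handle XML entities such as `&amp;` inside attribute values.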
```scala
/**
 * Get the Stack Exchange data from https://archive.org/details/stackexchange
 * Data set used here: math.stackexchange.com
 */
// Open the file. The text file is an RDD (Resilient Distributed Dataset)
// of Strings, which are the lines of the file.
val postXML = sc.textFile("Posts.xml")
// Count the lines. Note: run it twice and see the difference ;)
```
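To sanity-check the number Spark reports, a minimal local sketch in plain Scala can count the same file without a cluster. It reports both the raw line count (what `count()` on an RDD from `sc.textFile` returns) and the number of lines that look like records, assuming, as above, one `<row .../>` element per line. `PostsCount` is a hypothetical name, and the default path `Posts.xml` simply mirrors the snippet above.

```scala
import scala.io.Source

// Hypothetical local cross-check for the Spark line count (not part of the repo).
// Assumes each post is a single <row .../> line in Posts.xml.
object PostsCount {
  /** Returns (total lines, lines whose trimmed text starts with "<row"). */
  def count(lines: Iterator[String]): (Long, Long) =
    lines.foldLeft((0L, 0L)) { case ((total, rows), line) =>
      (total + 1, if (line.trim.startsWith("<row")) rows + 1 else rows)
    }

  def main(args: Array[String]): Unit = {
    val path = if (args.nonEmpty) args(0) else "Posts.xml"
    val source = Source.fromFile(path, "UTF-8")
    try {
      val (total, rows) = count(source.getLines())
      println(s"lines=$total rows=$rows")
    } finally source.close()
  }
}
```

In the Spark shell, the record-only count under the same assumption would be `postXML.filter(_.trim.startsWith("<row")).count()`; the gap between the two numbers is the XML header and the wrapping root element lines.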