@earino
Created February 4, 2014 03:34
Better Logistic Example in Apache Spark
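from pyspark.mllib.classification import LogisticRegressionWithSGD
from numpy import array

# Load and parse the data (`sc` is the SparkContext provided by the pyspark shell).
# Each line is space-separated; the code below treats the first value as the label
# and the remaining values as features.
data = sc.textFile("mllib/data/sample_svm_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))

# Build the model
model = LogisticRegressionWithSGD.train(parsedData)

# Evaluate the model on the training data: pair each label with its prediction
labelsAndPreds = parsedData.map(lambda point: (int(point.item(0)),
        model.predict(point.take(range(1, point.size)))))
# Training error is the fraction of points whose prediction differs from the label.
# Index into the pair instead of unpacking it in the lambda signature, which is Python 2 only.
trainErr = labelsAndPreds.filter(lambda vp: vp[0] != vp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))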
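
The training-error step can be checked locally without Spark. The following is a minimal sketch in plain Python, assuming rows laid out label-first as in the example above. The `predict` function is a hypothetical stand-in for `model.predict`, and `rows` is made-up illustration data, not taken from `sample_svm_data.txt`.

```python
def training_error(labels_and_preds):
    """Return the fraction of (label, prediction) pairs that disagree."""
    pairs = list(labels_and_preds)
    mismatches = sum(1 for label, pred in pairs if label != pred)
    return mismatches / float(len(pairs))


def predict(features):
    # Hypothetical stand-in for model.predict (assumption, for illustration only):
    # predicts 1 when the features sum to more than 1.0, otherwise 0.
    return 1 if sum(features) > 1.0 else 0


# Made-up rows in the same layout as the parsed data: label first, then features.
rows = [
    [1.0, 0.5, 2.0],
    [0.0, 1.5, 0.1],
    [1.0, 0.2, 0.3],
]

# Mirror the Spark map step: (label, prediction) for every row.
labels_and_preds = [(int(row[0]), predict(row[1:])) for row in rows]

print("Training Error = " + str(training_error(labels_and_preds)))
```

In the Spark version the same pairing happens inside `parsedData.map(...)`, and `filter(...).count()` does the job of the mismatch count here.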