@bfraiche
bfraiche / build_grid.py
Last active April 2, 2019 22:08
This gist contains code snippets for my blogpost: 'Random Forest with Python and Spark ML'
from pyspark.ml.tuning import ParamGridBuilder
import numpy as np

# `rf` is the RandomForestRegressor defined in add_rf.py.
# Search 3 x 3 = 9 combinations: numTrees in [10, 30, 50], maxDepth in [5, 15, 25].
paramGrid = ParamGridBuilder() \
    .addGrid(rf.numTrees, [int(x) for x in np.linspace(start=10, stop=50, num=3)]) \
    .addGrid(rf.maxDepth, [int(x) for x in np.linspace(start=5, stop=25, num=3)]) \
    .build()
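To make the size of the search explicit, here is a minimal pure-Python sketch of how the two value lists expand into a grid. It does not use Spark or NumPy: `linspace_ints` is a hypothetical helper standing in for `[int(x) for x in np.linspace(...)]`, and the dictionaries only approximate the param maps that `ParamGridBuilder.build()` returns.

```python
from itertools import product


def linspace_ints(start, stop, num):
    """Hypothetical stand-in for [int(x) for x in np.linspace(start, stop, num)]."""
    if num == 1:
        return [int(start)]
    step = (stop - start) / (num - 1)
    return [int(start + i * step) for i in range(num)]


num_trees = linspace_ints(10, 50, 3)  # candidate values for rf.numTrees
max_depth = linspace_ints(5, 25, 3)   # candidate values for rf.maxDepth

# One entry per (numTrees, maxDepth) pair, mirroring one param map per combination.
grid = [{"numTrees": n, "maxDepth": d} for n, d in product(num_trees, max_depth)]

for params in grid:
    print(params)
print("combinations:", len(grid))
```

Widening either list multiplies the grid size, so the number of candidate models grows quickly with each added hyperparameter.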
bfraiche / build_cv.py
Created April 2, 2019 17:41
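from pyspark.ml.tuning import CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

# 3-fold cross-validation over paramGrid; RegressionEvaluator() defaults to RMSE.
# `pipeline` is assumed to be a Pipeline that includes the `rf` stage.
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=RegressionEvaluator(),
                          numFolds=3)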
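As a rough cost estimate for this setup, the sketch below counts how many times the pipeline would be fitted, assuming the 3 x 3 grid from build_grid.py and that `CrossValidator` refits the best parameter map once on the full dataset after the fold loop. The variable names are illustrative and not part of the Spark API.

```python
# Illustrative counts, taken from build_grid.py and the numFolds argument above.
num_param_maps = 3 * 3  # numTrees values x maxDepth values
num_folds = 3

# Each parameter map is trained and evaluated once per fold.
cv_fits = num_param_maps * num_folds

# Assumed: one additional fit of the winning parameter map on all the data.
total_fits = cv_fits + 1

print("fits during cross-validation:", cv_fits)
print("total fits including final refit:", total_fits)
```

Since each fit here trains up to 50 trees of depth up to 25, the grid and fold count are the main levers on total training time.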
bfraiche / best_hp.py
Last active April 2, 2019 22:18
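# `bestModel` is assumed to be the fitted RandomForestRegressionModel chosen by cross-validation.
# On the fitted model, getNumTrees is a property, so it is read without parentheses.
print('numTrees - ', bestModel.getNumTrees)
print('maxDepth - ', bestModel.getOrDefault('maxDepth'))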
bfraiche / add_rf.py
Last active April 2, 2019 17:43
from pyspark.ml.regression import RandomForestRegressor
rf = RandomForestRegressor(labelCol="label", featuresCol="features")