Skip to content

Instantly share code, notes, and snippets.

@yashwanth2804
Last active March 21, 2019 00:55
Show Gist options
  • Save yashwanth2804/9a16fc8d3fde74111d2dd33e4317f79d to your computer and use it in GitHub Desktop.
Save yashwanth2804/9a16fc8d3fde74111d2dd33e4317f79d to your computer and use it in GitHub Desktop.
UDF
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.expressions.UserDefinedFunction;
StructField [] sf1 = new StructField[] {
DataTypes.createStructField("uid",DataTypes.IntegerType, true),
DataTypes.createStructField("mid",DataTypes.IntegerType,true),
DataTypes.createStructField("rating",DataTypes.IntegerType, true),
DataTypes.createStructField("time",DataTypes.IntegerType, true),
};
StructType st1 = DataTypes.createStructType(sf1);
Dataset<Row> mv = spark
.read()
.schema(st1)
.format("com.databricks.spark.csv")
.option("delimiter", "\t")
.csv("/home/hasura/Desktop/SparkData/u.data");
UserDefinedFunction increaserating = udf(
(Integer s) -> s+1,DataTypes.IntegerType
);
mv.
withColumn("rating",increaserating.apply(mv.col("rating")))
.show();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment