@gbraccialli
Last active October 25, 2018 01:43
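// Intended for spark-shell, where `sc` and `spark` are predefined.
// The two imports below are already in scope there; they are listed to make the dependencies explicit.
import spark.implicits._
import org.apache.spark.sql.functions._

case class Person(name: String, age: Int)
val people = List(Person("Guilherme", 35), Person("Isabela", 6), Person("Daniel", 3))
val rdd = sc.parallelize(people) // RDD[Person]
val df = rdd.toDF                // DataFrame with columns name, age
val ds = rdd.toDS                // Dataset[Person]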
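// Count how often each letter occurs across all names, using the RDD, DataFrame, and Dataset APIs.
// Note: foreach(println) prints on the executors, so the output is only visible on the console in local mode.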
rdd.flatMap(p => p.name.toUpperCase.groupBy(n => n).mapValues(_.size)).reduceByKey(_ + _).foreach(println)
rdd.flatMap(p => p.name.toUpperCase).map(c => (c,1)).reduceByKey(_ + _).foreach(println)
// Sum the per-name counts in _2; count("*") would only count the names containing each letter.
df.flatMap(row => row.getAs[String]("name").toUpperCase.groupBy(n => n).map { case (key, value) => (key.toString, value.size) }).groupBy("_1").agg(sum("_2")).show()
df.rdd.flatMap(row => row.getAs[String]("name").toUpperCase.groupBy(n => n).mapValues(_.size)).reduceByKey(_ + _).foreach(println)
df.rdd.flatMap(row => row.getAs[String]("name").toUpperCase).map(c => (c,1)).reduceByKey(_ + _).foreach(println)
// Sum the per-name counts in _2; count("*") would only count the names containing each letter.
ds.flatMap(p => p.name.toUpperCase.groupBy(n => n).map { case (key, value) => (key.toString, value.size) }).groupBy("_1").agg(sum("_2")).show()
ds.rdd.flatMap(p => p.name.toUpperCase.groupBy(n => n).mapValues(_.size)).reduceByKey(_ + _).foreach(println)
ds.rdd.flatMap(p => p.name.toUpperCase).map(c => (c,1)).reduceByKey(_ + _).foreach(println)
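// A possible typed-Dataset variant, added as a sketch and not part of the original gist:
// emit one single-character String per letter and let groupBy/count do the aggregation.
// Assumes spark-shell (or `import spark.implicits._`) so that a String encoder is in scope;
// a Dataset[String] exposes its single column under the default name "value".
ds.flatMap(p => p.name.toUpperCase.map(_.toString)).groupBy("value").count().show()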
