@jamesrajendran
Last active March 2, 2020 13:37
Rdd --> DF --> Table --> SQL --> DS
val lrdd = sc.parallelize(List(1, 2, 3, 4, 5, 3, 5))
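// Without a case class: wrap each Int in a Tuple1, then name the column.
val namedDF = sqlContext.createDataFrame(lrdd.map(Tuple1.apply)).toDF("Id")
// With a case class: the field name becomes the column name.
// (Redefining namedDF works in spark-shell; a compiled program needs a different name.)
case class Dummy(Id: Int)
val namedDF = lrdd.map(x => Dummy(x)).toDF()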
// One-liner DF from a local List; the single column is named "value" by default.
val ldf = List(1, 2, 3, 4, 5, 3, 5).toDS().toDF()
// registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView there.
namedDF.registerTempTable("l_table")
sqlContext.sql("select * from l_table").show()
sqlContext.sql("select * from l_table where Id = 3").show()
sqlContext.sql("select * from l_table where Id in (3, 1)").show()
// like on an Int column: Id is implicitly cast to a string before matching
sqlContext.sql("select * from l_table where Id like '3%'").show()
// no wildcard, so this behaves like an exact match on '3'
sqlContext.sql("select * from l_table where Id like '3'").show()
sqlContext.sql("select Id, count(*) from l_table group by Id").show()
// DataFrame --> Dataset
val ds = namedDF.as[Dummy]
ds.distinct().show()