@jamesrajendran
Last active March 2, 2020 13:37
The difference between map and flatMap can be confusing for beginners; this example might help.
It can be tested in a Spark shell or in the Scala REPL:
scala> val l = List(1,2,3,4,5)
scala> l.map(x => List(x-1, x, x+1))
res1: List[List[Int]] = List(List(0, 1, 2), List(1, 2, 3), List(2, 3, 4), List(3, 4, 5), List(4, 5, 6))
scala> l.flatMap(x => List(x-1, x, x+1))
res2: List[Int] = List(0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6)
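One way to read the two results above is that flatMap behaves like map followed by flatten. The following is a minimal sketch on the same list, meant to be pasted into the Scala REPL or spark-shell; the helper name neighbours is made up for this illustration and is not part of the original example.

```scala
val l = List(1, 2, 3, 4, 5)

// Hypothetical helper: the same function passed to map/flatMap above
def neighbours(x: Int): List[Int] = List(x - 1, x, x + 1)

val nested = l.map(neighbours)      // List[List[Int]]: one inner list per input element
val flat = l.flatMap(neighbours)    // List[Int]: the inner lists concatenated in order

// flatMap gives the same result as map followed by flatten
val sameResult = nested.flatten == flat

// 5 inner lists versus 15 individual elements
println(s"nested.size = ${nested.size}, flat.size = ${flat.size}, same = $sameResult")
```

So map always keeps one output element per input element, while flatMap can produce zero, one, or many output elements per input element.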
===========================================================================
Example with a file:
//create data in a file (run this echo command in a terminal, not in the Spark shell)
echo "This is a simple example
This is to test map function
This also is used to test flatMap function
This will demystify the difference between map and flatmap" > words.txt
//load the file data - replace the local file location with yours
//drop the "file:" prefix if the file is in HDFS
//look carefully at the result of each command below
//after collect, map gives a nested array (Array[Array[String]]) while flatMap gives a single flattened array (Array[String])
val textFile = sc.textFile("file:/home/cloudera/spark_script/words.txt")
textFile.collect
textFile.map(line => (line,1)).collect
textFile.map(line => line.split(" ")).collect
textFile.flatMap(line => line.split(" ")).collect
textFile.map(line => line.split(" ")).map(x => (x,1)).collect
textFile.flatMap(line => line.split(" ")).map(x => (x,1)).collect
//get the wordcount
textFile.flatMap(line => line.split(" ")).map(x => (x,1)).reduceByKey(_+_).collect
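To trace the word count without a Spark installation, the same steps can be mimicked with plain Scala collections. This is only a sketch under stated assumptions: the list named lines stands in for the RDD, and groupBy followed by a sum stands in for reduceByKey; it does not reflect how Spark actually executes the job.

```scala
// Stand-in for textFile: the four lines written to words.txt above
val lines = List(
  "This is a simple example",
  "This is to test map function",
  "This also is used to test flatMap function",
  "This will demystify the difference between map and flatmap"
)

// flatMap: one flat list of words across all lines
val words = lines.flatMap(line => line.split(" "))

// map: pair each word with a count of 1
val pairs = words.map(word => (word, 1))

// Local stand-in for reduceByKey(_+_): group the pairs by word, then sum the 1s
val counts = pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

println(counts("This"))     // 4, once per line
println(counts("map"))      // 2
println(counts("flatMap"))  // 1, keys are case-sensitive so "flatmap" is counted separately
```

Because the keys here are plain strings, equal words land in the same group, which is what reduceByKey relies on as well.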
//if we replace flatMap with map, each key becomes a whole Array[String] and reduceByKey fails with "Cannot use map-side combining with array keys"
textFile.map(line => line.split(" ")).map(x => (x,1)).reduceByKey(_+_).collect
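The reason array keys are a problem is that JVM arrays compare by reference rather than by content, so two lines with identical words would not count as the same key. Below is a small sketch of that behaviour in plain Scala. The commented line at the end is one possible workaround (an assumption, not something from the original example) for the case where whole lines really are meant to be the keys.

```scala
val a = Array("This", "is")
val b = Array("This", "is")

// Arrays compare by reference: same contents, but two different objects
val arraysEqual = a == b                 // false
val contentsEqual = a.sameElements(b)    // true

// A Seq compares element by element, so it can serve as a grouping key
val seqsEqual = a.toSeq == b.toSeq       // true

println(s"arraysEqual=$arraysEqual contentsEqual=$contentsEqual seqsEqual=$seqsEqual")

// Assumed spark-shell analogue: convert each array to a Seq before using it as a key
// textFile.map(line => line.split(" ").toSeq).map(x => (x,1)).reduceByKey(_+_).collect
```

Note that this workaround counts identical lines, not words; for a word count, flatMap remains the right choice.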