@lakshay-arora
Created October 26, 2020 19:53
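from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuses the active context if one exists (e.g. in the PySpark shell)

# 1. Parallelizing an in-memory collection into an RDD
my_list = [1, 2, 3, 4, 5]
my_list_rdd = sc.parallelize(my_list)

# 2. Referencing an external data file (each line of the file becomes one RDD element)
file_rdd = sc.textFile("path_of_file")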