thangarajan8 · August 24, 2021 12:51
diff --git a/Apache Spark Repartition vs coalesce.txt b/Apache Spark Repartition vs coalesce.txt
 Repartition:
  1. Distributes records roughly evenly across the resulting partitions, so work is balanced across executors
  2. Always performs a full shuffle, so it is the more expensive operation
  3. Can be used to increase or decrease the number of partitions

 Coalesce:
  1. Can leave an uneven number of records in the resulting partitions, so the load may be unbalanced
  2. Avoids a full shuffle by merging existing partitions, so it is usually faster
  3. Used to decrease the number of partitions; without a shuffle it cannot increase it
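
 The difference between the two lists above can be sketched with a toy model in plain Python. This is not Spark code and not how Spark implements either operation: the function names repartition and coalesce, the global round-robin placement, and the contiguous grouping rule are all assumptions made for illustration. It only shows why a full redistribution evens out partition sizes while merging whole partitions keeps the skew.

```python
from typing import List

Partition = List[int]


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every record is reassigned round-robin (assumed rule)."""
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for record in part:
            out[i % n].append(record)  # each record may move to any output partition
            i += 1
    return out


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy narrow merge: whole input partitions are glued together, no record-level moves."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow; keep it as it is.
        return [list(p) for p in partitions]
    out: List[Partition] = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Contiguous grouping of input partitions (assumed rule for this sketch).
        out[idx * n // len(partitions)].extend(part)
    return out


# Four skewed input partitions holding 10, 6, 2 and 2 records.
parts: List[Partition] = [
    list(range(0, 10)),
    list(range(10, 16)),
    list(range(16, 18)),
    list(range(18, 20)),
]

print([len(p) for p in repartition(parts, 2)])  # [10, 10]
print([len(p) for p in coalesce(parts, 2)])     # [16, 4]
print(len(coalesce(parts, 5)))                  # 4 (count is not increased)
```

 In this model repartition touches every record, which is the cost of the full shuffle, while coalesce only concatenates existing partitions, which is why it is cheaper but inherits whatever skew the input had.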
  
  
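  When creating an RDD we can specify the number of partitions we want (for example, the numSlices argument of sc.parallelize). createDataFrame takes no such argument, so a DataFrame's partition count is changed after creation with repartition or coalesce.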