- Spark manipulates input data as RDDs (Resilient Distributed Datasets), which are basically distributed datasets.
- RDD transformations (e.g. map, filter) are lazy. They build a roadmap (lineage) of operations to apply over the dataset, but nothing actually runs yet.
- RDD actions (e.g. collect, count, reduce) trigger evaluation of the pending transformations and return the result.
- RDD transformations are re-evaluated on each action by default, unless you cache (persist) the RDD.
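The lazy-transformation vs. action distinction above can be sketched without Spark at all. This is a plain-Python analogy (not the Spark API): a generator expression plays the role of a lazy transformation, consuming it plays the role of an action, and materializing into a list plays the role of `cache()`. The `expensive_square` helper and its counter are invented here just to make recomputation visible.

```python
eval_count = 0  # counts how many times the "expensive" work actually runs

def expensive_square(x):
    global eval_count
    eval_count += 1
    return x * x

data = range(5)

# "Transformation": building the generator does no work yet (lazy).
squares = (expensive_square(x) for x in data)
assert eval_count == 0  # nothing computed so far

# "Action": consuming the generator triggers the computation.
total = sum(squares)  # 0 + 1 + 4 + 9 + 16 = 30
assert total == 30 and eval_count == 5

# Re-running the pipeline recomputes everything, like an uncached RDD
# hit by a second action.
total_again = sum(expensive_square(x) for x in data)
assert eval_count == 10

# "Caching": materialize once, then reuse without recomputation,
# like rdd.cache() followed by an action.
cached = [expensive_square(x) for x in data]
assert eval_count == 15
assert sum(cached) == 30 and max(cached) == 16
assert eval_count == 15  # reusing the cached values did no new work
```

In PySpark terms, the generator expression roughly corresponds to `rdd.map(...)`, `sum(...)` to an action like `reduce` or `count`, and the list to `rdd.cache()` before the first action.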
Tips