@xtrmstep
Created January 26, 2023 11:49
Calculating the size of a Spark data frame
# `spark` is assumed to be an active SparkSession (e.g. in a notebook or spark-shell).
files = [
    "file://path"
]
df = spark.read.json(files)
# Grab the DataFrame's logical (Catalyst) plan through the internal JVM handle.
catalyst_plan = df._jdf.queryExecution().logical()
# Run the plan through the session planner and read the optimizer's estimated size in bytes.
# Note: this relies on internal, undocumented Spark APIs.
df_size_read = spark._jsparkSession.sessionState().executePlan(catalyst_plan).optimizedPlan().stats().sizeInBytes()
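A minimal sketch of turning the result into something readable, assuming the value comes back as a py4j handle to a JVM BigInt whose string form is a plain decimal number:

# Assumption: `df_size_read` stringifies to the decimal byte count, so convert via str().
size_bytes = int(str(df_size_read))
print(f"Estimated size: {size_bytes / (1024 ** 2):.1f} MiB")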
@xtrmstep (Author) commented:

Hm... @VibhavariBellutagi19, maybe it's not working on your version of Spark; it relies on an undocumented feature.
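If the executePlan() call is the part that breaks on your Spark version, a possible workaround (a sketch, not verified here) is to read the statistics directly from the DataFrame's own optimized plan, skipping the sessionState() step:

# Assumption: queryExecution().optimizedPlan() is available on your Spark version;
# this is also an internal, undocumented API and may differ between releases.
size_in_bytes = df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes()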
