@kvnkho
Last active February 9, 2022 03:56
# Pandas
# df is a pandas DataFrame; returns a Series of exact col2 medians indexed by col1
df.groupby("col1")["col2"].median()
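For reference, the pandas line computes an exact median per group. By default it skips missing values, and for a group with an even count it averages the two middle values. The pure-Python sketch below reproduces those semantics without pandas. The helper `group_median` and the sample `rows` are hypothetical and only for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def group_median(rows, key, value):
    """Exact median of `value` per `key`, skipping None/NaN (like pandas' default)."""
    groups = defaultdict(list)
    for row in rows:
        v = row[value]
        # Skip missing values, mirroring pandas' skipna behaviour
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        groups[row[key]].append(v)
    # Note: a group whose values are all missing is omitted here,
    # whereas pandas would report NaN for it
    return {k: median(vs) for k, vs in groups.items()}


rows = [
    {"col1": "a", "col2": 1}, {"col1": "a", "col2": 2},
    {"col1": "a", "col2": 3}, {"col1": "a", "col2": 4},
    {"col1": "b", "col2": 10}, {"col1": "b", "col2": None},
]
print(group_median(rows, "col1", "col2"))  # {'a': 2.5, 'b': 10}
```

Spark's `percentile_approx` is documented to return a value that occurs in the column. It does not average the two middle values. So the PySpark version below would report one of the observed values for group `a` instead of 2.5, even before any approximation error is considered.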
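# PySpark
# df is a Spark DataFrame; percentile_approx(col, percentage, accuracy) returns an
# approximate median. accuracy=20 allows a relative error of 1/20 (5%);
# raise it (default 10000) for a tighter estimate.
import pyspark.sql.functions as F
med_func = F.expr("percentile_approx(col2, 0.5, 20)").alias("median_col2")
df.groupBy("col1").agg(med_func).show()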
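To get a feel for what `accuracy=20` permits, the sketch below treats Spark's documented relative error of `1.0/accuracy` as a bound on rank error. Under that assumption, the returned value can be up to about `n / accuracy` positions away from the target rank in a group of `n` rows. The nearest-rank target `ceil(percentage * n)` and the helper name `acceptable_ranks` are illustrative assumptions and are not taken from Spark's internals.

```python
import math


def acceptable_ranks(n, percentage, accuracy):
    """1-based rank range an approximate percentile may come from, assuming
    the rank error is at most n / accuracy around a nearest-rank target."""
    target = max(1, math.ceil(percentage * n))  # assumed nearest-rank target
    slack = n / accuracy                        # relative error 1/accuracy, in ranks
    lo = max(1, math.ceil(target - slack))
    hi = min(n, math.floor(target + slack))
    return lo, hi


# 1,000 rows in a group, median requested
print(acceptable_ranks(1000, 0.5, 20))     # (450, 550)
print(acceptable_ranks(1000, 0.5, 10000))  # (500, 500)
```

Under these assumptions, `accuracy=20` on a 1,000-row group could return any value ranked roughly between the 450th and 550th. The default accuracy pins it to the middle rank. This is the trade-off against the memory the sketch uses.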