Last active
March 19, 2023 06:46
-
-
Save j-thepac/938f6b0bb9b9b36bdc0ad339039e92ce to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from pyspark.sql.window import Window | |
""" | |
Aggregate: min, max, avg, count, and sum. | |
Ranking: rank, dense_rank, percent_rank, row_num, and ntile | |
Analytical: cume_dist, lag, and lead | |
Custom boundary: rangeBetween and rowsBetween | |
""" | |
1. Only Order :lead(c) , lag(c) | |
Note: 1) perform OrderBy 2) lags | |
df.withColumn("c" , lead(df["id"],1).over(Window.orderBy(df.id))) | |
` select lead(c, 1) over ( order by c2 ) from T ` | |
2. Only partition: max(c), min(c) | |
df.withColumn("c" , max(df["id"]).over(Window.partitionBy(df.id))) | |
` select max(sal) over ( partition by age ) from T ` | |
3. Both partition and Order: first() ,last(), rank() ,dense_rank() , udf | |
df.withColumn("c" , rank().over(Window.orderBy(df.id).partitionBy(df.c))) | |
` SELECT name,country,sal, rank() over (partition by country order by sal asc) FROM EMP; ` | |
#As Variables | |
winSpec = Window.partitionBy("depName").orderBy("salary".desc) | |
empsalary.withColumn("rank", rank().over(winSpec)) | |
winSpec = Window.partitionBy("depName").orderBy("salary") | |
empsalary.withColumn("lag", lag("salary", 2).over(winSpec)) | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment