
@achinta
Last active November 7, 2024 23:16
Forward Fill in PySpark
import pyspark.sql.functions as F
from pyspark.sql import SparkSession, Window

spark = SparkSession.builder.getOrCreate()

# sample data with missing temperature readings
df = spark.createDataFrame([
    ('d1', None),
    ('d2', 10),
    ('d3', None),
    ('d4', 30),
    ('d5', None),
    ('d6', None),
], ('day', 'temperature'))

# window from the first row up to the current row (for forward fill)
w_forward = Window.partitionBy().orderBy('day') \
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
# window from the current row to the last row (for backward fill)
w_backward = Window.partitionBy().orderBy('day') \
    .rowsBetween(Window.currentRow, Window.unboundedFollowing)

# fill_forward carries the last non-null value seen so far;
# fill_both then backfills any leading nulls with the first non-null value ahead
df.withColumn('fill_forward', F.last('temperature', ignorenulls=True).over(w_forward)) \
  .withColumn('fill_both', F.first('fill_forward', ignorenulls=True).over(w_backward)) \
  .show()
'''
+---+-----------+------------+---------+
|day|temperature|fill_forward|fill_both|
+---+-----------+------------+---------+
| d1| null| null| 10|
| d2| 10| 10| 10|
| d3| null| 10| 10|
| d4| 30| 30| 30|
| d5| null| 30| 30|
| d6| null| 30| 30|
+---+-----------+------------+---------+
'''
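Note that Window.partitionBy() with no columns moves every row into a single window partition, which is fine for a toy DataFrame but does not scale to large data. Below is a minimal sketch of the same forward fill done per group, assuming a hypothetical grouping column named city (this column and its values are illustrative, not part of the original gist):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession, Window

spark = SparkSession.builder.getOrCreate()

# hypothetical per-city readings; column names are illustrative only
df_grouped = spark.createDataFrame([
    ('nyc', 'd1', None),
    ('nyc', 'd2', 10),
    ('nyc', 'd3', None),
    ('sfo', 'd1', 20),
    ('sfo', 'd2', None),
], ('city', 'day', 'temperature'))

# forward-fill within each city, ordered by day
w = Window.partitionBy('city').orderBy('day') \
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)

df_grouped.withColumn('fill_forward',
                      F.last('temperature', ignorenulls=True).over(w)).show()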
@ankit4488kumar

Thank you, it's very useful.
But before using it, we need to import these libraries:

import pyspark.sql.functions as F
from pyspark.sql import Window

@nitesh0007-edith

Thank you
