@d4rkc0de
Last active December 14, 2022 17:26
Both framing functions, rowsBetween and rangeBetween, take two arguments, the start and the end of the frame, which can be specified as follows:
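- Window.unboundedPreceding, Window.unboundedFollowing: the entire window, from the first row to the last
- Window.unboundedPreceding, Window.currentRow: from the beginning of the window to the current row; this frame is used for a cumulative sum
- numerical values: 0 means currentRow, while other values are offsets whose meaning depends on the framing function. With rowsBetween they count physical rows before (negative) or after (positive) the current row; with rangeBetween they are added to the ordering value of the current row, and every row whose ordering value falls inside that interval belongs to the frame.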
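To make the difference between numeric offsets in rowsBetween and rangeBetween concrete, here is a small plain-Python emulation of the two frame semantics for a single, already sorted partition. It is only a sketch and not Spark code: the helper names `rows_between_sum` and `range_between_sum` and the sample data are hypothetical, and empty frames yield 0 here where Spark would return null.

```python
# Plain-Python emulation of frame semantics for one sorted partition.
# Not Spark code; helper names and data are made up for illustration.

def rows_between_sum(values, start, end):
    """Sum over physical row offsets [i + start, i + end], like rowsBetween."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i + start)       # clamp the frame to the partition
        hi = min(n - 1, i + end)
        out.append(sum(values[lo:hi + 1]))
    return out

def range_between_sum(keys, values, start, end):
    """Sum over rows whose ordering key lies in [key + start, key + end], like rangeBetween."""
    out = []
    for k in keys:
        out.append(sum(v for kk, v in zip(keys, values) if k + start <= kk <= k + end))
    return out

# Rows sorted by `day`; note the gap between day 2 and day 5.
days = [1, 2, 5, 6]
activity = [10, 20, 30, 40]

rows_result = rows_between_sum(activity, -1, 0)          # previous row + current row
range_result = range_between_sum(days, activity, -1, 0)  # days in [day - 1, day]

print(rows_result)   # [10, 30, 50, 70]
print(range_result)  # [10, 30, 30, 70]
```

The two results differ only on day 5: the previous physical row (day 2) is inside the rows-based frame, but its ordering value lies outside the interval [4, 5], so the range-based frame excludes it.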
df.withColumn('activity_sum', sum('activity').over(w))
https://miro.medium.com/max/1400/1*WYO-zRP1SlrzGqT4S_5Jvw.webp
More details:
https://towardsdatascience.com/spark-sql-102-aggregations-and-window-functions-9f829eaa7549