# Window

In PySpark, the Window class is used to define a window specification for performing calculations across a group of rows related to the current row. Window functions support operations such as ranking, aggregation, and running totals without collapsing the data (i.e., without a group-by): the window specification lets the calculation run while each row keeps its own context.

Syntax: -

from pyspark.sql.window import Window
from pyspark.sql.functions import col, desc, rank, row_number

# Define a Window specification
windowSpec = Window.partitionBy("column_to_group").orderBy("column_to_order")
# For descending order: orderBy(desc("column_to_order")) or orderBy(col("column_to_order").desc())

# Apply a window function
df_with_rank = df.withColumn("rank", rank().over(windowSpec))
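
For ranking, rank() and row_number() differ when rows tie on the ordering column: rank() gives tied rows the same value and leaves a gap afterwards, while row_number() always produces a unique sequence. A minimal sketch (the Dept and Salary column names are illustrative, not from the example above):

from pyspark.sql.window import Window
from pyspark.sql.functions import rank, row_number

windowSpec = Window.partitionBy("Dept").orderBy("Salary")

# Tied salaries share a rank (with gaps afterwards); row_number stays unique
df_ranked = (df.withColumn("rank", rank().over(windowSpec))
               .withColumn("row_number", row_number().over(windowSpec)))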

Example: -
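
The example below assumes a DataFrame df with Name and Age columns. One way to build it (values chosen to match the output shown below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the output below
df = spark.createDataFrame(
    [("Alice", 30), ("Alice", 40), ("Alice", 50),
     ("Bob", 30), ("Bob", 35), ("Bob", 45)],
    ["Name", "Age"],
)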

from pyspark.sql.window import Window
from pyspark.sql.functions import sum

# Define window specification for cumulative sum
windowSpec = Window.partitionBy("Name").orderBy("Age").rowsBetween(Window.unboundedPreceding, Window.currentRow)

# Apply cumulative sum
df_with_cumsum = df.withColumn("cumulative_sum", sum("Age").over(windowSpec))

df_with_cumsum.show()
+-----+---+--------------+
| Name|Age|cumulative_sum|
+-----+---+--------------+
|Alice| 30|            30|
|Alice| 40|            70|
|Alice| 50|           120|
|  Bob| 30|            30|
|  Bob| 35|            65|
|  Bob| 45|           110|
+-----+---+--------------+
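
As noted above, a window aggregate attaches its result to every row instead of collapsing the group the way groupBy().agg() does. A short sketch using the same assumed df, attaching each person's maximum Age to all of their rows:

from pyspark.sql.window import Window
from pyspark.sql.functions import max

# No orderBy: the frame defaults to the entire partition
windowSpec = Window.partitionBy("Name")

# max_age repeats on every row within a Name group; no rows are dropped
df_with_max = df.withColumn("max_age", max("Age").over(windowSpec))
df_with_max.show()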