# Function

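In PySpark, `datediff()` calculates the number of days between two dates or timestamps. `datediff(end, start)` returns an integer equal to `end` minus `start` in days, so the result is negative when `end` is earlier than `start`. Timestamps are cast to dates first, so the time of day does not affect the result.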
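
To make those rules concrete, here is a plain-Python model of the same arithmetic that uses only the standard `datetime` module. It is an illustrative sketch, not Spark's implementation. The helper name `datediff_model` is hypothetical, and the sketch assumes datediff's behavior of ignoring the time of day.

```python
from datetime import date, datetime


def datediff_model(end, start):
    """Hypothetical helper mirroring datediff(end, start) for one pair of values.

    datetime values are truncated to their date part first, so the time of
    day never affects the result.
    """
    # datetime is a subclass of date, so check for datetime and drop the time
    if isinstance(end, datetime):
        end = end.date()
    if isinstance(start, datetime):
        start = start.date()
    return (end - start).days


print(datediff_model(date(2024, 1, 10), date(2024, 1, 1)))  # 9
print(datediff_model(date(2024, 1, 1), date(2024, 1, 10)))  # -9 (end is earlier)
print(datediff_model(datetime(2024, 1, 2, 0, 1), datetime(2024, 1, 1, 23, 59)))  # 1
```

The last call shows why truncation matters: the two timestamps are only two minutes apart, but they fall on different calendar days, so the result is 1.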

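Syntax:

```python
from pyspark.sql.functions import datediff

# datediff(end_date, start_date)
# Returns end_date minus start_date in days; each argument can be a column name or a Column.
```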

If the dates are stored as strings, convert them to DateType with `to_date()` first, passing a pattern that matches the string layout:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import datediff, to_date

spark = SparkSession.builder.getOrCreate()

# Sample DataFrame with string dates
data = [("2023-01-15", "2024-12-27")]
columns = ["Start_Date", "End_Date"]

df = spark.createDataFrame(data, columns)

# Convert the strings to DateType, then compute End_Date minus Start_Date in days
df_with_days_diff = (
    df.withColumn("Start_Date", to_date("Start_Date", "yyyy-MM-dd"))
      .withColumn("End_Date", to_date("End_Date", "yyyy-MM-dd"))
      .withColumn("Days_Difference", datediff("End_Date", "Start_Date"))
)

df_with_days_diff.show()  # Days_Difference is 712 for this row
```
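
The `Days_Difference` value can be cross-checked outside Spark. This sketch repeats the subtraction on the same two sample strings using Python's standard `datetime` module. Here `datetime.strptime` with `%Y-%m-%d` plays the role that `to_date(..., "yyyy-MM-dd")` plays in Spark. Note that the two pattern syntaxes differ.

```python
from datetime import datetime

# Parse the same sample strings used in the DataFrame above
start = datetime.strptime("2023-01-15", "%Y-%m-%d").date()
end = datetime.strptime("2024-12-27", "%Y-%m-%d").date()

# Same order as datediff("End_Date", "Start_Date"): end minus start
days_difference = (end - start).days
print(days_difference)  # 712
```

The range includes 29 February 2024, which date subtraction counts like any other day.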