# unix_timestamp() Function

The unix_timestamp() function in PySpark converts a timestamp or date string into the number of seconds since January 1, 1970 UTC (the Unix epoch). It is particularly useful for computing timestamp differences and for other time-based calculations.

Syntax

unix_timestamp(timestamp=None, format="yyyy-MM-dd HH:mm:ss")

Difference Between Two Timestamps

Calculate the time difference between two timestamp columns using unix_timestamp().

from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp

spark = SparkSession.builder.getOrCreate()

data = [
    ("2024-12-31 08:30:00", "2024-12-31 10:30:00"),
    ("2024-12-31 09:00:00", "2024-12-31 17:00:00"),
]
columns = ["start_time", "end_time"]

df = spark.createDataFrame(data, columns)

# Time difference in seconds: subtract the two epoch values
df = df.withColumn("time_diff_seconds", unix_timestamp("end_time") - unix_timestamp("start_time"))

df.show()
+-------------------+-------------------+-----------------+
|         start_time|           end_time|time_diff_seconds|
+-------------------+-------------------+-----------------+
|2024-12-31 08:30:00|2024-12-31 10:30:00|             7200|
|2024-12-31 09:00:00|2024-12-31 17:00:00|            28800|
+-------------------+-------------------+-----------------+

Important Notes

  1. Default Format: If the timestamp strings already match the default format (yyyy-MM-dd HH:mm:ss), the format argument can be omitted.
  2. Date-only Strings: unix_timestamp() also works with date-only strings such as yyyy-MM-dd; pass format="yyyy-MM-dd" to get the epoch seconds for midnight at the start of that day.
  3. Returns null for Invalid Input: If the input does not match the specified format, the function returns null.