# Function
explode is a PySpark SQL function that flattens an array or map column into multiple rows. It "explodes" the column by creating one new row for each array element or for each key-value pair of a map; the other selected columns are repeated on every new row.
Syntax:
explode(column)
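from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import explode
# Create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()
# Input data: map values are all strings, because a map column has a single value type
data = [
    Row(id=1, info={"age": "25", "city": "NYC"}),
    Row(id=2, info={"age": "30", "city": "LA"})
]
# Create DataFrame
df = spark.createDataFrame(data)
# Explode the 'info' map column into a 'key' column and a 'value' column
exploded_df = df.select("id", explode("info").alias("key", "value"))
# Show the exploded DataFrame
exploded_df.show()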
+---+----+-----+
| id| key|value|
+---+----+-----+
|  1| age|   25|
|  1|city|  NYC|
|  2| age|   30|
|  2|city|   LA|
+---+----+-----+
Use cases:
- Array Columns: Breaking down arrays into individual rows.
- Map Columns: Breaking down key-value pairs into rows.
- Data Normalization: Converting nested or complex structures into flat, relational structures.
explode vs. explode_outer:
- explode: Drops rows where the column being exploded is null or an empty array/map.
- explode_outer: Retains rows with null or empty arrays/maps, filling the exploded column (or the key and value columns, for a map) with null.
- If you are sure your column does not contain null or empty arrays/maps, use explode for simplicity.
- explode_outer is especially helpful in ETL pipelines, where silently dropping rows would hurt data quality and completeness.