explode()

#Function
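explode (pyspark.sql.functions.explode) flattens an array or map column into multiple rows. It "explodes" the elements of an array, or the key-value pairs of a map, into one new row per element or pair, repeating the values of the other selected columns on each new row. By default the output column is named col for an array, and key and value for a map.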

Syntax:

explode(column)

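from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import explode

# Create (or reuse) a SparkSession
spark = SparkSession.builder.appName("explode-demo").getOrCreate()

# Input data: all values of a map must share one type, so "age" is stored as a string
data = [
    Row(id=1, info={"age": "25", "city": "NYC"}),
    Row(id=2, info={"age": "30", "city": "LA"})
]

# Create DataFrame (info is inferred as map<string,string>)
df = spark.createDataFrame(data)

# Explode the 'info' map: one row per key-value pair, split into key and value columns
exploded_df = df.select("id", explode("info").alias("key", "value"))

# Show the exploded DataFrame
exploded_df.show()

Output: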
+---+----+-----+
| id| key|value|
+---+----+-----+
|  1| age|   25|
|  1|city|  NYC|
|  2| age|   30|
|  2|city|   LA|
+---+----+-----+

Common use cases:
Array columns: breaking arrays down into individual rows.
Map columns: breaking key-value pairs down into rows.
Data normalization: converting nested or complex structures into flat, relational structures.
explode vs. explode_outer:
explode: drops rows where the exploded column is null or an empty array/map.
explode_outer: keeps rows where the exploded column is null or an empty array/map, returning null in the exploded output column(s).
Use explode when the column is known to contain no nulls or empty arrays/maps, or when dropping such rows is intended.
Use explode_outer when every input row must survive; this matters for data quality and completeness in ETL pipelines.