Mtds,Func,Trns,Wnd/Functions(Oprt at Column Level)/explode()
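from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

# Reuse the active session (e.g. in a notebook) or create one.
spark = SparkSession.builder.getOrCreate()

data = [
    (1, "Alice,Bob"),
    (2, "Charlie,Diana,Chris"),
    (3, "Edward,Francesca"),
    (4, "Harley"),
]
schema = ["id", "name"]
df = spark.createDataFrame(data, schema)
df.show()

# split() turns the comma-separated string into an array;
# explode() then emits one row per array element.
exploded_df = df.select(col("id"), explode(split(col("name"), ",")).alias("Name"))
exploded_df.show()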
Output of df.show():
+---+-------------------+
| id|               name|
+---+-------------------+
|  1|          Alice,Bob|
|  2|Charlie,Diana,Chris|
|  3|   Edward,Francesca|
|  4|             Harley|
+---+-------------------+

Output of exploded_df.show():
+---+---------+
| id|     Name|
+---+---------+
|  1|    Alice|
|  1|      Bob|
|  2|  Charlie|
|  2|    Diana|
|  2|    Chris|
|  3|   Edward|
|  3|Francesca|
|  4|   Harley|
+---+---------+
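
To make the fan-out concrete, here is a minimal plain-Python sketch of what explode(split(...)) does to each row. It does not use Spark. The helper name explode_names is hypothetical and exists only for illustration. It also assumes every name value is a non-null string, whereas Spark's explode() drops rows whose array is null or empty.

```python
from typing import Iterable, List, Tuple


def explode_names(rows: Iterable[Tuple[int, str]], sep: str = ",") -> List[Tuple[int, str]]:
    """Plain-Python analogue of explode(split(col("name"), sep)).

    Hypothetical helper for illustration only; assumes non-null strings.
    """
    exploded = []
    for row_id, names in rows:
        # split(): the delimited string becomes a list (an array column in Spark).
        for name in names.split(sep):
            # explode(): one output row per list element, with the id repeated.
            exploded.append((row_id, name))
    return exploded


data = [
    (1, "Alice,Bob"),
    (2, "Charlie,Diana,Chris"),
    (3, "Edward,Francesca"),
    (4, "Harley"),
]

for row in explode_names(data):
    print(row)
```

The four input rows become eight output rows, one per name, matching the row count of the exploded table above. Note that Spark's split() treats its delimiter as a regular expression, while str.split() here takes a literal string; for a plain comma the two behave the same.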

Spark SQL solution:
df.createOrReplaceTempView("names")
sql_query = """
    SELECT
        id,
        EXPLODE(SPLIT(name, ',')) AS Name
    FROM names
"""
result = spark.sql(sql_query)
result.show()