Problem Statement:
Given a file like:
%fs head Files/data.csv
Id|Name|Marks
1|Sagar|20,30,40
2|Alex|34,32,12
3|David|45,67,54
4|John|10,34,60
Split the Marks column into separate columns, one per subject: Physics, Chemistry, and Maths (PCM).
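# Read the pipe-delimited file; the first line holds the column names.
# `spark` is the SparkSession the notebook provides.
df = spark \
    .read \
    .format("csv") \
    .option("header", True) \
    .option("sep", "|") \
    .load("Files/data.csv")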
df.show()
+---+-----+--------+
| Id| Name|   Marks|
+---+-----+--------+
|  1|Sagar|20,30,40|
|  2| Alex|34,32,12|
|  3|David|45,67,54|
|  4| John|10,34,60|
+---+-----+--------+
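Marks stays a single string such as "20,30,40" because only "|" is treated as the field separator; the commas are ordinary characters. As a rough illustration outside Spark, the plain-Python sketch below parses the same sample text with the standard `csv` module, using the first line as the header. The names `DATA` and `read_pipe_delimited` are hypothetical and exist only for this example.

```python
import csv
import io

# Sample content copied from the file shown above (hypothetical variable name)
DATA = """Id|Name|Marks
1|Sagar|20,30,40
2|Alex|34,32,12
3|David|45,67,54
4|John|10,34,60
"""

def read_pipe_delimited(text):
    """Parse pipe-delimited text whose first line is the header.

    Roughly mirrors option("header", True) and option("sep", "|"):
    only "|" separates fields, so the commas inside Marks remain
    part of one string value.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return list(reader)

rows = read_pipe_delimited(DATA)
for row in rows:
    print(row)
```

Like Spark's CSV reader without schema inference, every value comes back as a string.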
from pyspark.sql.functions import split, col

# Split "Marks" once into an array, then pick each subject by position
marks = split(col("Marks"), ",")

df_res = df \
    .withColumn("Physics", marks[0]) \
    .withColumn("Chem", marks[1]) \
    .withColumn("Maths", marks[2]) \
    .drop("Marks")
df_res.show()
+---+-----+-------+----+-----+
| Id| Name|Physics|Chem|Maths|
+---+-----+-------+----+-----+
|  1|Sagar|     20|  30|   40|
|  2| Alex|     34|  32|   12|
|  3|David|     45|  67|   54|
|  4| John|     10|  34|   60|
+---+-----+-------+----+-----+
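The split-by-position step can also be sketched in plain Python, which makes the per-row logic easy to check. Each subject takes the value at its index, as the `split(col("Marks"), ",")[i]` calls do, and Marks is dropped afterwards. Two choices here are assumptions of this sketch, not behaviour of the Spark code: marks are converted to `int` (the Spark columns above remain strings), and a missing position yields `None` instead of raising. `SUBJECTS` and `split_marks` are hypothetical names.

```python
# Hypothetical subject order, matching the PCM convention above
SUBJECTS = ["Physics", "Chem", "Maths"]

def split_marks(row, subjects=SUBJECTS):
    """Return a copy of row with Marks replaced by one column per subject."""
    parts = row["Marks"].split(",")
    # Copy every field except Marks (the equivalent of .drop("Marks"))
    out = {k: v for k, v in row.items() if k != "Marks"}
    for i, subject in enumerate(subjects):
        # Assumption: convert to int, and use None when a position is missing
        out[subject] = int(parts[i]) if i < len(parts) else None
    return out

rows = [
    {"Id": "1", "Name": "Sagar", "Marks": "20,30,40"},
    {"Id": "2", "Name": "Alex", "Marks": "34,32,12"},
]
for r in rows:
    print(split_marks(r))
```

Keeping the subject names in one list means a fourth subject only needs a new list entry, not another hand-written column expression.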