Given lists like: -
names = ['Alice', 'Bob', 'Charlie']
ages = [30, 25, 35]
cities = ['New York', 'Los Angeles', 'Chicago']
Combine them to form a PySpark DataFrame.
zip() vs arrays_zip(): -
Python's built-in zip() pairs elements of local lists; PySpark's arrays_zip() does the same for array columns inside a DataFrame. Since these lists are plain Python objects, zip() is the right tool here.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# zip() pairs the lists element-by-element into row tuples.
combined_data = list(zip(names, ages, cities))

schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Ages", IntegerType(), True),
    StructField("City", StringType(), True),
])

df = spark.createDataFrame(combined_data, schema)
df.show()
+-------+----+-----------+
| Name|Ages| City|
+-------+----+-----------+
| Alice| 30| New York|
| Bob| 25|Los Angeles|
|Charlie| 35| Chicago|
+-------+----+-----------+