Merge 2 dataframes with unequal number of columns.
Different columns.
Dataframe API Solution: -
data1 = [("Alice", 25, "F"), ("Bob", 30, "M")]
schema1 = ["name", "age", "gender"]
data2 = [("Charlie", "USA"), ("Diana", "UK")]
schema2 = ["name", "country"]
df1 = spark.createDataFrame(data1,schema1)
df2 = spark.createDataFrame(data2,schema2)
df1.show()
df2.show()
+-----+---+------+
| name|age|gender|
+-----+---+------+
|Alice| 25| F|
| Bob| 30| M|
+-----+---+------+
+-------+-------+
| name|country|
+-------+-------+
|Charlie| USA|
| Diana| UK|
+-------+-------+
Mtds,Func,Trns,Wnd/Methods(Oprt at Dataframe Level)/unionByName()
df_res = df1.unionByName(df2,allowMissingColumns=True)
df_res.show()
+-------+----+------+-------+
| name| age|gender|country|
+-------+----+------+-------+
| Alice| 25| F| NULL|
| Bob| 30| M| NULL|
|Charlie|NULL| NULL| USA|
| Diana|NULL| NULL| UK|
+-------+----+------+-------+
Spark SQL Solution: -
df1.createOrReplaceTempView("df1")
df2.createOrReplaceTempView("df2")
result = spark.sql("""
SELECT
name,
age,
gender,
NULL AS Country
FROM df1
UNION
SELECT
name,
NULL AS Age,
NULL AS Gender,
Country
FROM df2
""")
result.show()