Problem Statement: -
- Write PySpark code to get all employee details.
- Write PySpark code to get only the "First_Name" column from df.
- Write PySpark code to get First_Name in upper case as "First Name".
- Write PySpark code to get First_Name in lower case.
- Write PySpark code to combine First_Name and Last_Name and display the result as "Name" (include a space between the first name and the last name).
DataFrame API Solution: -
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

# Reuse the active session if one exists (e.g. in a notebook), otherwise create one
spark = SparkSession.builder.getOrCreate()

# Sample employee records
data = [
    [1, "Vikas", "Ahlawat", 600000.0, "2013-02-15 11:16:28.290", "IT", "Male"],
    [2, "nikita", "Jain", 530000.0, "2014-01-09 17:31:07.793", "HR", "Female"],
    [3, "Ashish", "Kumar", 1000000.0, "2014-01-09 10:05:07.793", "IT", "Male"],
    [4, "Nikhil", "Sharma", 480000.0, "2014-01-09 09:00:07.793", "HR", "Male"],
    [5, "anish", "kadian", 500000.0, "2014-01-09 09:31:07.793", "Payroll", "Male"],
]

# Create a schema for the DataFrame
schema = StructType([
    StructField("EmployeeID", IntegerType(), True),
    StructField("First_Name", StringType(), True),
    StructField("Last_Name", StringType(), True),
    StructField("Salary", DoubleType(), True),
    StructField("Joining_Date", StringType(), True),
    StructField("Department", StringType(), True),
    StructField("Gender", StringType(), True),
])

# Build the DataFrame and display all employee details
df = spark.createDataFrame(data, schema)
df.show()
+----------+----------+---------+---------+--------------------+----------+------+
|EmployeeID|First_Name|Last_Name|   Salary|        Joining_Date|Department|Gender|
+----------+----------+---------+---------+--------------------+----------+------+
|         1|     Vikas|  Ahlawat| 600000.0|2013-02-15 11:16:...|        IT|  Male|
|         2|    nikita|     Jain| 530000.0|2014-01-09 17:31:...|        HR|Female|
|         3|    Ashish|    Kumar|1000000.0|2014-01-09 10:05:...|        IT|  Male|
|         4|    Nikhil|   Sharma| 480000.0|2014-01-09 09:00:...|        HR|  Male|
|         5|     anish|   kadian| 500000.0|2014-01-09 09:31:...|   Payroll|  Male|
+----------+----------+---------+---------+--------------------+----------+------+
df1 = df.select("first_name")
df1.show()
+----------+
|first_name|
+----------+
|     Vikas|
|    nikita|
|    Ashish|
|    Nikhil|
|     anish|
+----------+
from pyspark.sql.functions import col, upper

# alias() gives the upper-cased column the display name "First Name"
df2 = df.select(upper(col("First_Name")).alias("First Name"))
df2.show()
+----------+
|First Name|
+----------+
|     VIKAS|
|    NIKITA|
|    ASHISH|
|    NIKHIL|
|     ANISH|
+----------+
from pyspark.sql.functions import lower

df1 = df.select("first_name")
# Replace the column with its lower-cased values
df2 = df1.withColumn("First_name", lower("first_name"))
df2.show()
+----------+
|First_name|
+----------+
|     vikas|
|    nikita|
|    ashish|
|    nikhil|
|     anish|
+----------+
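from pyspark.sql.functions import col, concat_ws

# concat_ws() joins the given columns using the separator passed as its first argument
df1 = df.select(concat_ws(" ", col("First_Name"), col("Last_Name")).alias("Name"))
df1.show()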
+-------------+
|         Name|
+-------------+
|Vikas Ahlawat|
|  nikita Jain|
| Ashish Kumar|
|Nikhil Sharma|
| anish kadian|
+-------------+