#stratascratch
https://platform.stratascratch.com/coding/9881-make-a-report-showing-the-number-of-survivors-and-non-survivors-by-passenger-class?code_type=6

Problem Statement: -

Make a report showing the number of survivors and non-survivors by passenger class. Classes are categorized based on the pclass value as:
pclass = 1: first_class
pclass = 2: second_class
pclass = 3: third_class
Output the number of survivors and non-survivors by each class.

Pasted image 20241230193328.png

Dataframe API Solution: -

Key functions: pivot(), when() / otherwise()

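# Namespaced import, so Python's built-in sum() is not shadowed by a star import.
from pyspark.sql import functions as F

# `titanic` is assumed to be the preloaded input DataFrame.
titanic_rpt_df = titanic.select("passengerid", "survived", "pclass")
# titanic_rpt_df.show()

# Count passengers per (survived, pclass), then replace the numeric class with its label.
titanic_cnt = (
    titanic_rpt_df.groupBy("survived", "pclass")
    .agg(F.count("passengerid").alias("count"))
    .withColumn(
        "pclass",
        F.when(F.col("pclass") == 1, "first_class")
        .when(F.col("pclass") == 2, "second_class")
        .otherwise("third_class"),
    )
)
# titanic_cnt.show()

# Pivot the labels into columns; listing them fixes the column order up front.
class_labels = ["first_class", "second_class", "third_class"]
res_df = titanic_cnt.groupBy("survived").pivot("pclass", class_labels).agg(F.sum("count"))

res_df.toPandas()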

Pasted image 20241230193459.png
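
To sanity-check the label, count and pivot logic without a Spark session, the same steps can be sketched in plain Python. This is only an illustration under stated assumptions: the rows below are made-up sample data (not the real Titanic dataset), and `class_label`, `counts` and `report` are hypothetical names that do not appear in the solution above.

```python
from collections import Counter

# Hypothetical sample rows (passengerid, survived, pclass); not the real Titanic data.
rows = [
    (1, 0, 3),
    (2, 1, 1),
    (3, 1, 3),
    (4, 1, 1),
    (5, 0, 3),
    (6, 0, 2),
]


def class_label(pclass):
    # Mirrors the when/otherwise chain: anything other than 1 or 2 becomes third_class.
    if pclass == 1:
        return "first_class"
    if pclass == 2:
        return "second_class"
    return "third_class"


# Count passengers per (survived, class label) pair, like groupBy + count.
counts = Counter((survived, class_label(pclass)) for _, survived, pclass in rows)

# Pivot: one entry per survived value, one key per class label.
labels = ["first_class", "second_class", "third_class"]
report = {
    survived: {label: counts.get((survived, label), 0) for label in labels}
    for survived in sorted({s for _, s, _ in rows})
}

for survived, by_class in report.items():
    print(survived, by_class)
```

One difference to keep in mind: this sketch fills missing combinations with 0, whereas a Spark pivot leaves them as null unless they are filled explicitly.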