#stratascratch
https://platform.stratascratch.com/coding/10353-workers-with-the-highest-salaries?code_type=6
Problem Statement: -
Find the job titles of the employees with the highest salary. If multiple employees have the same highest salary, include the job titles for all such employees.


DataFrame API Solution: -
# Importing necessary PySpark modules
from pyspark.sql.functions import col, rank
from pyspark.sql.window import Window
# Selecting required columns from worker DataFrame
worker_df = worker.select("worker_id", "salary")
# Selecting required columns from title DataFrame
title_df = title.select("worker_ref_id", "worker_title")
# Defining window specification to rank workers by salary in descending order
# (no partitionBy, so Spark moves all rows to a single partition; acceptable for a small table)
wndw_spec = Window.orderBy(col("salary").desc())
# Ranking workers and keeping every worker tied for the highest salary
# (rank() assigns the same rank to equal salaries, so ties all get rnk == 1)
worker_ranked = (
    worker_df
    .withColumn("rnk", rank().over(wndw_spec))
    .filter(col("rnk") == 1)
    .drop("rnk")
)
# Joining the ranked worker DataFrame with the title DataFrame
joined_df = (
    worker_ranked
    .join(title_df, worker_ranked.worker_id == title_df.worker_ref_id)
    .select("worker_title")
)
# Converting the result to a pandas DataFrame
joined_df.toPandas()
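
Plain Python Cross-Check: -
A minimal sketch for sanity-checking the tie handling without a Spark session. It mirrors the logic above (find the top salary, keep every worker who earns it, look up their titles) on made-up rows. The sample data and the helper name top_salary_titles are hypothetical and are not part of the StrataScratch dataset.

```python
# Hypothetical toy rows standing in for the worker and title tables
workers = [
    {"worker_id": 1, "salary": 100000},
    {"worker_id": 2, "salary": 80000},
    {"worker_id": 3, "salary": 100000},
]
titles = [
    {"worker_ref_id": 1, "worker_title": "Manager"},
    {"worker_ref_id": 2, "worker_title": "Executive"},
    {"worker_ref_id": 3, "worker_title": "Asst. Manager"},
]


def top_salary_titles(workers, titles):
    """Return the titles of every worker tied for the highest salary."""
    if not workers:
        return []
    # Equivalent of rank() == 1: every row whose salary equals the maximum
    top_salary = max(w["salary"] for w in workers)
    top_ids = {w["worker_id"] for w in workers if w["salary"] == top_salary}
    # Equivalent of the inner join on worker_id == worker_ref_id
    return [t["worker_title"] for t in titles if t["worker_ref_id"] in top_ids]


result = top_salary_titles(workers, titles)
print(result)  # ['Manager', 'Asst. Manager']
```

Workers 1 and 3 share the top salary, so both titles come back, which is the behaviour the rank-based filter is expected to reproduce.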