#stratascratch
https://platform.stratascratch.com/coding/10353-workers-with-the-highest-salaries?code_type=6

Problem Statement: -

Find the job titles of the employees with the highest salary. If multiple employees have the same highest salary, include the job titles for all such employees.

Pasted image 20241230163256.png

Pasted image 20241230163313.png

Dataframe API Solution: -

Window
join()

# Importing necessary PySpark modules
from pyspark.sql.functions import col, rank
from pyspark.sql.window import Window

# Selecting required columns from worker DataFrame
worker_df = worker.select("worker_id", "salary")

# Selecting required columns from title DataFrame
title_df = title.select("worker_ref_id", "worker_title")

# Defining window specification to rank workers by salary in descending order
wndw_spec = Window.orderBy(col("salary").desc())

# Ranking workers and filtering to get the highest-paid worker
worker_ranked = (
    worker_df
    .withColumn("rnk", rank().over(wndw_spec))
    .filter(col("rnk") == 1)
    .drop("rnk")
)

# Joining the ranked worker DataFrame with the title DataFrame
joined_df = (
    worker_ranked
    .join(title_df, worker_ranked.worker_id == title_df.worker_ref_id)
    .select("worker_title")
)

# Converting the result to a pandas DataFrame
joined_df.toPandas()