#stratascratch
https://platform.stratascratch.com/coding/2099-election-results?code_type=6

Problem Statement: -

An election is held in a city, and everyone can vote for one or more candidates or choose not to vote at all. Each person has 1 vote, so if they vote for multiple candidates, their vote is split equally among those candidates. For example, if a person votes for 2 candidates, each of them receives the equivalent of 0.5 votes. Find out who got the most votes and won the election. Output the name of the winning candidate, or multiple names in case of a tie. To avoid floating-point issues, you can round the number of votes received by each candidate to 3 decimal places.
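As a quick sanity check of the splitting rule, here is a minimal plain-Python sketch. The sample ballots and the helper name `tally_votes` are made up for illustration and are not part of the StrataScratch dataset; a candidate of `None` is assumed to mean the person did not vote.

```python
from collections import Counter, defaultdict


def tally_votes(ballots):
    """Return (winners, rounded totals) for a list of (voter, candidate) pairs.

    A candidate of None is assumed to mean the voter abstained.
    """
    # Number of candidates each voter picked (abstentions are ignored).
    picks = Counter(voter for voter, cand in ballots if cand is not None)

    # Each voter's single vote is split equally across their picks.
    totals = defaultdict(float)
    for voter, cand in ballots:
        if cand is not None:
            totals[cand] += 1 / picks[voter]

    # Round only the final totals, then keep every candidate tied for first.
    rounded = {cand: round(v, 3) for cand, v in totals.items()}
    best = max(rounded.values())
    winners = sorted(c for c, v in rounded.items() if v == best)
    return winners, rounded


# Hypothetical sample data: Ann splits her vote two ways, Dee three ways,
# Bob gives a full vote, and Cid abstains.
ballots = [
    ("Ann", "Ryan"),
    ("Ann", "Paul"),
    ("Bob", "Ryan"),
    ("Cid", None),
    ("Dee", "Paul"),
    ("Dee", "Ryan"),
    ("Dee", "Kate"),
]
winners, totals = tally_votes(ballots)
print(winners)
print(totals)
```

Here Ryan collects 0.5 + 1 + 1/3 of a vote, which is more than any other candidate, so he is the only name returned.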

DataFrame API Solution: -

Window

# Import your libraries
import pyspark
from pyspark.sql.functions import *
from pyspark.sql.window import Window
# voting_results.show()

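# Count how many candidates each voter chose. count() skips nulls,
# so people who did not vote end up with 0 and are filtered out.
voter_df = voting_results \
    .groupBy("voter") \
    .agg(count(col("candidate")).alias("votes_given")) \
    .filter(col("votes_given") > 0)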

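# Each voter's single vote is split equally across their candidates.
# Keep the share unrounded here: rounding 1/3 to 0.333 before summing
# would turn three such shares into 0.999 instead of 1.0 and could break a tie.
voter_df = voter_df \
    .withColumn("votes_given", 1 / col("votes_given"))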

# voter_df.show()

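# Attach each voter's share to every candidate they voted for.
# Joining on the column name keeps a single "voter" column and avoids
# ambiguous references, since voter_df is derived from voting_results.
df_joined = voter_df \
    .join(voting_results, "voter") \
    .select("candidate", "votes_given")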

# df_joined.show()

# Sum the shares per candidate and round the total to 3 decimal places.
df_ttl = df_joined.groupBy("candidate") \
    .agg(round(sum(col("votes_given")), 3).alias("total_votes"))
    
# df_ttl.show()

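# Rank candidates by total votes. rank() gives tied candidates the same rank,
# so every winner is kept in case of a tie.
wndw_spec = Window.orderBy(col("total_votes").desc())

df_result = df_ttl \
    .withColumn("rank", rank().over(wndw_spec)) \
    .filter(col("rank") == 1) \
    .drop("total_votes", "rank")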

df_result.toPandas()
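
One detail worth checking in any solution to this problem is where the rounding happens. The sketch below, in plain Python with made-up numbers, compares rounding each voter's share before summing against rounding only the total. Three voters who each give a candidate a third of their vote should add up to exactly one vote.

```python
# Three voters each split their vote three ways, so each gives this
# hypothetical candidate a share of 1/3.
shares = [1 / 3, 1 / 3, 1 / 3]

# Rounding every share first loses 0.001 in total.
rounded_early = round(sum(round(s, 3) for s in shares), 3)

# Rounding only the final total keeps the full vote.
rounded_late = round(sum(shares), 3)

print(rounded_early)  # 0.999
print(rounded_late)   # 1.0
```

With early rounding, this candidate would lose what should be a tie against someone holding one undivided vote (0.999 vs 1.0), so it is safer to round once, on the per-candidate totals.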