Problem Statement: -
Find the names of the customer that are not referred by the customer with id = 2.
Return the result table in any order.

Dataframe API Solution: -
schema = StructType([ StructField("id", IntegerType(), True), StructField("name", StringType(), True), StructField("referee_id", IntegerType(), True) ])
# Create an RDD with the data
data = [ (1, 'Will', None), (2, 'Jane', None), (3, 'Alex', 2), (4, 'Bill', None), (5, 'Zack', 1), (6, 'Mark', 2) ]
df = spark.createDataFrame(data,schema)
df.show()
+---+----+----------+
| id|name|referee_id|
+---+----+----------+
| 1|Will| NULL|
| 2|Jane| NULL|
| 3|Alex| 2|
| 4|Bill| NULL|
| 5|Zack| 1|
| 6|Mark| 2|
+---+----+----------+
result = df.filter((col("referee_id")!=2)|col("referee_id").isNull()).select("name")
result.show()
+----+
|name|
+----+
|Will|
|Jane|
|Bill|
|Zack|
+----+
Spark SQL Solution: -
df.createOrReplaceTempView("emp")
result = spark.sql("""
SELECT
name
FROM emp
WHERE referee_id IS NULL OR referee_id !=2
""")
result.show()