- Applicable to: DataFrames.
- Functionality: Returns a new DataFrame containing the rows of the first DataFrame that are not present in the second, preserving duplicates.
- Duplicates: Uses multiset semantics: a row that appears m times in the first DataFrame and n times in the second appears max(m − n, 0) times in the result.
Example:

```python
from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("exceptAll Example").getOrCreate()

# Create two DataFrames with the same schema
data1 = [(1, 'Alice'), (2, 'Bob'), (2, 'Bob'), (3, 'Charlie')]
data2 = [(2, 'Bob'), (3, 'Charlie')]
columns = ["id", "name"]
df1 = spark.createDataFrame(data1, columns)
df2 = spark.createDataFrame(data2, columns)

# Use exceptAll() to get rows in df1 that are not in df2,
# keeping duplicates: one (2, 'Bob') survives because df1 has
# two copies and df2 removes only one of them.
result = df1.exceptAll(df2)
result.show()
```

Output:

```
+---+-----+
| id| name|
+---+-----+
|  1|Alice|
|  2|  Bob|
+---+-----+
```
| Feature | subtract() | exceptAll() |
|---|---|---|
| Applicable to | RDDs (also available on DataFrames) | DataFrames |
| Duplicates | Removes duplicates (distinct semantics) | Preserves duplicates (multiset semantics) |
| Schema support | RDD version compares raw elements, no schema | Requires both DataFrames to have identical schemas |
| Output type | RDD (or DataFrame) | DataFrame |
| Complexity | Simpler; element-level comparison | Structured, column-aware comparison |
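The duplicate-handling difference in the table can be modeled in plain Python with `collections.Counter`. This is a sketch of the semantics only, not Spark code; the helper names `except_all` and `subtract` are chosen here for illustration:

```python
from collections import Counter

def except_all(rows1, rows2):
    # Multiset difference, mirroring DataFrame.exceptAll():
    # a row appearing m times in rows1 and n times in rows2
    # survives max(m - n, 0) times in the result.
    diff = Counter(rows1) - Counter(rows2)
    return list(diff.elements())

def subtract(rows1, rows2):
    # Set difference, mirroring subtract() (EXCEPT DISTINCT):
    # the result is deduplicated regardless of input counts.
    seen2 = set(rows2)
    return [row for row in dict.fromkeys(rows1) if row not in seen2]

data1 = [(1, 'Alice'), (2, 'Bob'), (2, 'Bob'), (3, 'Charlie')]
data2 = [(2, 'Bob'), (3, 'Charlie')]

print(except_all(data1, data2))  # [(1, 'Alice'), (2, 'Bob')]
print(subtract(data1, data2))    # [(1, 'Alice')]
```

Note how the duplicate (2, 'Bob') survives in `except_all` (two copies minus one) but is absent from `subtract`, matching the table above.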