isinstance() is a Python built-in function that checks whether an object is an instance of a specific class (or a subclass of it). In PySpark, it is particularly useful for type-checking objects such as DataFrame or RDD when working with dynamic or unknown inputs.
Syntax:
isinstance(object, class_or_tuple)
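For example, a minimal check on a DataFrame might look like the following (the SparkSession setup and sample data here are illustrative, not part of the syntax itself):

from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.appName("isinstance-demo").getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "value"])

# Prints True: df is an instance of pyspark.sql.DataFrame
print(isinstance(df, DataFrame))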
Check Multiple Classes
You can check against multiple classes using a tuple:
from pyspark.sql import DataFrame
from pyspark.rdd import RDD

# Reuse the DataFrame from above and build a small RDD for comparison
rdd = spark.sparkContext.parallelize([1, 2, 3])

objects = [df, rdd, 42]
for obj in objects:
    if isinstance(obj, (DataFrame, RDD)):
        print(f"{obj} is either a DataFrame or an RDD")
    else:
        print(f"{obj} is neither a DataFrame nor an RDD")
Benefits of isinstance() in PySpark
- Type Safety: Ensures that an object is of the expected type before operations are performed on it.
- Dynamic Programming: Useful in pipelines where the input type can vary (see the sketch after this list).
- Error Prevention: Helps avoid runtime errors by validating object types before proceeding.
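For instance, a pipeline step can validate its input up front and fail fast with a clear message. The helper below is a minimal sketch (ensure_dataframe is a hypothetical name, not a PySpark API) that reuses the spark session and df from the earlier examples:

from pyspark.sql import DataFrame
from pyspark.rdd import RDD

def ensure_dataframe(obj):
    """Return obj as a DataFrame, converting an RDD if possible, else raise."""
    if isinstance(obj, DataFrame):
        return obj
    if isinstance(obj, RDD):
        # Assumes the RDD holds Row objects or tuples with an inferable schema
        return spark.createDataFrame(obj)
    raise TypeError(f"Expected DataFrame or RDD, got {type(obj).__name__}")

result = ensure_dataframe(df)   # returned unchanged; an unsupported type raises TypeError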