The .option("sep") method specifies the separator (delimiter) that PySpark uses to split fields when reading a CSV file. It is needed whenever the file uses a delimiter other than the default comma (,).
Syntax: -
df = spark.read.option("sep", "<delimiter>").csv("<file_path>")
- "sep": The option name that sets the delimiter separating values in the CSV file (e.g., |, ;, or tab).
- "<delimiter>": The actual delimiter character, such as |, ;, or \t (for tab).
- "<file_path>": The path to the CSV file.
Example: -
Let's say you have a CSV file where values are separated by a pipe (|) instead of a comma. Here's how to read it:
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder.appName("CSV Separator Example").getOrCreate()
# Read CSV with custom separator (| in this case)
df = spark.read.option("sep", "|").csv("data.csv", header=True, inferSchema=True)
# Show the DataFrame
df.show(truncate=False)