option(multiline)

#Config_Method

In PySpark, .option("multiline", "true") is used when reading JSON files to tell Spark that a single JSON record may span several lines. With this option set, Spark parses each file as one JSON document (a single object, or an array whose elements become separate rows). The option is documented as multiLine, but Spark option names are case-insensitive, so "multiline" works as well.

By default, Spark reads JSON in the JSON Lines format: every line must hold one complete JSON object. Pretty-printed or deeply nested JSON, however, often spreads a single record over many lines. In that case, set the multiline option to true so that Spark can parse the records correctly.
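To make the difference between the two modes concrete, here is a plain-Python sketch that uses only the standard json module, not Spark's own parser. The names read_json_lines, read_multiline and SAMPLE are hypothetical: read_json_lines is a simplified analogue of the default mode (each line is one record), and read_multiline is a simplified analogue of multiline mode (the whole text is one JSON document, and a top-level array yields one record per element).

```python
import json

# Hypothetical sample: two pretty-printed records inside a JSON array.
SAMPLE = """[
  {
    "name": "Alice",
    "age": 25
  },
  {
    "name": "Bob",
    "age": 30
  }
]
"""


def read_json_lines(text):
    """Simplified analogue of the default mode: each non-blank line must be one JSON object.

    Returns (records, malformed_lines). This is a sketch, not Spark's parser.
    """
    records, malformed = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            # A fragment such as '"age": 25' is not valid JSON on its own.
            malformed.append(line)
            continue
        if isinstance(value, dict):
            records.append(value)
        else:
            malformed.append(line)  # simplification: only objects count as records
    return records, malformed


def read_multiline(text):
    """Simplified analogue of multiline mode: the whole text is one JSON document."""
    value = json.loads(text)
    # A top-level array yields one record per element; a single object yields one record.
    return value if isinstance(value, list) else [value]


if __name__ == "__main__":
    print(read_json_lines(SAMPLE))  # no records; every line is reported as malformed
    print(read_multiline(SAMPLE))   # both records are recovered
```

Read line by line, none of the ten lines of the pretty-printed sample is a complete object, so nothing is recovered; parsed as one document, both records come back.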

Syntax: -

df = spark.read.option("multiline", "true").json("path_to_json_file")

Example: -

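[
  {
    "name": "Alice",
    "age": 25
  },
  {
    "name": "Bob",
    "age": 30
  }
]

This JSON file contains two records, and each record spans multiple lines. Without the multiline option, Spark parses every line as a separate record. Fragments such as "name": "Alice", are then treated as malformed (in the default PERMISSIVE mode they end up in a _corrupt_record column) instead of producing name and age columns. The records are wrapped in a JSON array because, in multiline mode, Spark treats each file as one JSON document and turns the elements of a top-level array into rows. If several objects are simply written one after another with no enclosing array, multiline mode generally reads only the first of them.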
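If you control how the file is produced, another approach is to convert it to JSON Lines, which Spark's default reader parses without the multiline option. Below is a minimal plain-Python sketch, assuming the input is either one JSON array or several JSON values written back to back. The helper names iter_json_values, to_json_lines and CONCATENATED are hypothetical; json.JSONDecoder.raw_decode comes from the standard library and returns the decoded value plus the index where it stopped, which lets the loop walk over consecutive values.

```python
import json

# Hypothetical input: two pretty-printed objects written back to back, no enclosing array.
CONCATENATED = """{
  "name": "Alice",
  "age": 25
}
{
  "name": "Bob",
  "age": 30
}
"""


def iter_json_values(text):
    """Yield every top-level JSON value in text, including values written back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so step over it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def to_json_lines(text):
    """Rewrite pretty-printed JSON as JSON Lines: one compact object per line."""
    out = []
    for value in iter_json_values(text):
        # A top-level array contributes one line per element.
        items = value if isinstance(value, list) else [value]
        out.extend(json.dumps(item) for item in items)
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    print(to_json_lines(CONCATENATED), end="")
    # {"name": "Alice", "age": 25}
    # {"name": "Bob", "age": 30}
```

Saved to a .json file, the converted text can be read with spark.read.json(...) and no multiline option, since every line is now a complete record.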

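from pyspark.sql import SparkSession

# Reuse the active session (e.g. in a notebook) or start a new one
spark = SparkSession.builder.getOrCreate()

# Read the multiline JSON file
df = spark.read.option("multiline", "true").json("path_to_your_json_file")

# Show the DataFrame
df.show()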

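Output: -

+---+-----+
|age| name|
+---+-----+
| 25|Alice|
| 30|  Bob|
+---+-----+

The columns appear as age, name rather than in file order because Spark sorts the fields it infers from JSON alphabetically.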