# Function

The `from_json()` function parses a column of JSON strings into a StructType or another complex type (e.g., ArrayType or MapType). It is commonly used when JSON data is stored as strings in a DataFrame column and you need to extract structured fields from it.

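Syntax:

```python
pyspark.sql.functions.from_json(col, schema, options=None)
```

- `col`: the column (or column name) containing the JSON strings.
- `schema`: the target type, given as a StructType, an ArrayType, or a DDL-formatted string such as `"name STRING, age INT"`.
- `options`: an optional dict of JSON parsing options (the same options accepted by the JSON data source).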

Examples:

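  1. Parsing a JSON string:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Example data: an id and a JSON object stored as a plain string
data = [("1", '{"name":"Alice", "age":30}'),
        ("2", '{"name":"Bob", "age":25}')]

# Create a DataFrame
df = spark.createDataFrame(data, ["id", "json_string"])

# Define the schema the JSON strings are parsed into
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Use from_json to parse the JSON strings into a struct column
parsed_df = df.withColumn("parsed_json", from_json(col("json_string"), schema))

# Extract individual fields from the parsed struct
result_df = parsed_df.select(
    "id",
    col("parsed_json.name").alias("name"),
    col("parsed_json.age").alias("age"),
)

# Show result
result_df.show()
```

```text
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 30|
|  2|  Bob| 25|
+---+-----+---+
```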
  2. Parsing a JSON array:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (ArrayType, IntegerType, StringType,
                               StructField, StructType)

spark = SparkSession.builder.getOrCreate()

# Example data: one row whose string holds a JSON array of objects
data = [("1", '[{"name":"Alice", "age":30}, {"name":"Bob", "age":25}]')]

# Create DataFrame
df = spark.createDataFrame(data, ["id", "json_array"])

# Define array schema: an array whose elements are structs
array_schema = ArrayType(StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
]))

# Parse JSON array
parsed_array_df = df.withColumn("parsed_json", from_json(col("json_array"), array_schema))

# Show result (truncate=False prints full cell contents, left-aligned)
parsed_array_df.select("id", "parsed_json").show(truncate=False)
```

```text
+---+------------------------+
|id |parsed_json             |
+---+------------------------+
|1  |[{Alice, 30}, {Bob, 25}]|
+---+------------------------+
```

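Common Use Cases:

  1. Parsing streaming JSON data: Often used in Structured Streaming, where records typically arrive as JSON strings (for example, Kafka message values cast to a string) and must be parsed before their fields can be queried.
  2. Nested data extraction: Once parsed, nested fields can be reached with dot notation (such as `parsed_json.name`) or expanded into separate columns with `parsed_json.*`.
  3. Schema enforcement: Every string is read against one declared schema. In the default PERMISSIVE mode, fields missing from the JSON and records that cannot be parsed come back as null instead of failing the job; since Spark 3.0, `options={"mode": "FAILFAST"}` makes malformed records raise an error instead.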