Convert a Column object to a DataFrame in PySpark


I have a JSON list which I am reading using from_json. How do I convert the resulting Column type to a single-column dataframe?

from pyspark.sql.functions import from_json
from pyspark.sql.types import ArrayType, StringType

jsonlist = '["a","b","c"]'
col = from_json(jsonlist, ArrayType(StringType()))

# how do I create a dataframe?
df = ...

Everything I have tried results in

TypeError: Column is not iterable

For example:

spark.createDataFrame([col], ['item'])

There are 2 best solutions below

saviourofdp On BEST ANSWER

I found that I can do this without using a pyspark function to parse the JSON:

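import json

# use a name other than `json` so the module is not shadowed by the string
json_str = '["a","b","c"]'
jsonlist = json.loads(json_str)
# wrap each value in a 1-tuple so every element becomes one row
df = spark.createDataFrame([(x,) for x in jsonlist], ['item'])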
display(df)
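
The parsing step can also be pulled into a small helper. This is only a sketch: the name json_list_to_rows is made up for illustration, and it assumes the input is a JSON array of scalar values. It uses only the standard library, so it can be checked without a Spark session.

```python
import json


def json_list_to_rows(json_string):
    """Parse a JSON array string and return one 1-tuple per element."""
    values = json.loads(json_string)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array, got " + type(values).__name__)
    # createDataFrame expects each row to be a tuple (or Row), not a bare string
    return [(value,) for value in values]


rows = json_list_to_rows('["a","b","c"]')
print(rows)
# With an active SparkSession named `spark` (assumed, not created here):
# df = spark.createDataFrame(rows, ["item"])
```

Building the tuples first keeps the Spark call trivial and makes it easy to reject input that is not a list before it reaches createDataFrame.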
Lev Gelman On

This code does the job:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
import json

spark = SparkSession.builder \
    .appName("JSON to DataFrame") \
    .getOrCreate()
# single quotes are not valid JSON, so swap them for double quotes before parsing
json_string = "['a','b','c']"
json_list = json.loads(json_string.replace("'", "\""))
schema = StructType([
    StructField("Column1", StringType(), True)
])
df = spark.createDataFrame([(value,) for value in json_list], schema=schema)
df.show()
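
If you would rather keep from_json, as in the question, one possible approach is to evaluate the Column against a DataFrame instead of passing it to createDataFrame. A Column is an unevaluated expression, not data, which is why iterating over it fails. The sketch below is an assumption-laden illustration rather than a definitive answer: the function name explode_json_list and the intermediate column name "raw" are invented here, and an active SparkSession is expected to be passed in.

```python
def explode_json_list(spark, json_string, column_name="item"):
    """Return a single-column DataFrame with one row per JSON array element."""
    # Imported inside the function so it can be defined without Spark installed.
    from pyspark.sql.functions import col, explode, from_json
    from pyspark.sql.types import ArrayType, StringType

    # A Column needs a DataFrame to be evaluated against, so start from
    # a one-row DataFrame that holds the raw JSON string.
    raw = spark.createDataFrame([(json_string,)], ["raw"])
    parsed = from_json(col("raw"), ArrayType(StringType()))
    # explode turns the parsed array into one row per element
    return raw.select(explode(parsed).alias(column_name))


# Usage, assuming an active SparkSession named `spark`:
# explode_json_list(spark, '["a","b","c"]').show()
```

This keeps the parsing inside Spark, which may matter if the JSON strings already live in a DataFrame column rather than in a Python variable.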