Convert a Column object to a DataFrame in PySpark

46 Views Asked by saviourofdp At 28 March 2024 at 09:59

I have a JSON list which I am reading using from_json. How do I convert the resulting Column type to a single-column dataframe?

from pyspark.sql.functions import from_json
from pyspark.sql.types import ArrayType, StringType

jsonlist = '["a","b","c"]'
col = from_json(jsonlist , ArrayType(StringType()))

# how do I create a dataframe?
df = ...

Everything I have tried results in TypeError: Column is not iterable. For example:

spark.createDataFrame([col], ['item'])


There are 2 best solutions below

saviourofdp On 28 March 2024 at 11:59 BEST ANSWER

I found that I can do this without using a pyspark function to parse the JSON:

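import json

# Call the string json_text so it does not shadow the json module
json_text = '["a","b","c"]'
jsonlist = json.loads(json_text)
# zip(*[iter(jsonlist)]) wraps each value in a 1-tuple, i.e. one row per item
df = spark.createDataFrame([x for x in zip(*[iter(jsonlist)])], ['item'])
display(df)  # notebook helper (e.g. Databricks); df.show() works elsewhere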
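
The zip(*[iter(jsonlist)]) expression in this answer is easy to misread. As a small sketch in plain Python (no Spark needed), it produces the same list of 1-tuples as a simple comprehension, which is the row shape createDataFrame expects for a single-column frame. The names json_text, items, rows_via_zip and rows are only illustrative.

```python
import json

json_text = '["a","b","c"]'
items = json.loads(json_text)  # a plain Python list of strings

# zip over a single iterator yields one 1-tuple per element
rows_via_zip = list(zip(*[iter(items)]))

# The same rows, written more directly
rows = [(item,) for item in items]

print(rows_via_zip == rows)  # True
print(rows)                  # [('a',), ('b',), ('c',)]
```

Either list can be passed to spark.createDataFrame(rows, ['item']); the comprehension is probably the easier one to read.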

Lev Gelman On 28 March 2024 at 10:08

This code does the job:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
import json

spark = SparkSession.builder \
    .appName("JSON to DataFrame") \
    .getOrCreate()
json_string = "['a','b','c']"
json_list = json.loads(json_string.replace("'", "\""))
schema = StructType([
    StructField("Column1", StringType(), True)
])
df = spark.createDataFrame([(value,) for value in json_list], schema=schema)
df.show()
