I have a dataframe with 2 columns, "ID" and "input_array" (the values are JSON arrays).
ID input_array
1 [{"A": 300, "B": 400}, {"A": 500, "B": 600}]
2 [{"A": 800, "B": 900}]
The output I need:
ID A B
1 300 400
1 500 600
2 800 900
I tried the from_json and explode functions, but I get a data type mismatch error for the array columns.
[Image: real data]
In the image, the first dataframe is the input that I need to read and convert into the second dataframe: the 3 input rows need to become 5 output rows.
There are two possible interpretations of the data type of your input column "input_array".
If it's a string, you can use `from_json` to parse the JSON string into a Spark array of structs, and then `inline` to explode that array so that each struct becomes a row and each struct field becomes a column.

If it's an array of strings, you can first use `explode` to move every array element into its own row, which results in a column of string type. Then use `from_json` to turn each string into a Spark struct, and finally expand the structs into columns with `*`.