How to explode a nested Struct in Spark using Scala


I'm working through a Databricks example. The schema for the dataframe looks like:

 |-- authors: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- author: struct (nullable = true)
 |    |    |    |-- key: string (nullable = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- type: string (nullable = true)

I'm trying to make the dataframe schema look like this:


 |-- author_key: string (nullable = true)
 |-- key: string (nullable = true)
 |-- type: string (nullable = true)

I have no idea how to explode a nested struct, so I just tried to extract the key and type columns first using explode, but I'm not sure this is the right way. Here is what I did:

Code:
df
  .select(explode($"authors"))
  .select($"col.key", $"col.type")
  .show()
Output:
+---------------+----------------------+
|           key | type                 |
+---------------+----------------------+
|/authors/<key1>| null                 |
|/authors/<key2>| null                 |
|/authors/<key3>| null                 |
|/authors/<key4>| null                 |
|         null  |{"key":"/type/auth..."|
|/authors/<key6>| null                 |
|/authors/<key7>| null                 |
+---------------+----------------------+
There is 1 answer below

Islam Elbanna

You could use the explode function to turn each element of the array into its own row, then extract the nested fields into separate columns, something like this:

import org.apache.spark.sql.functions.explode

// Explode the array so each author struct becomes its own row
val explodedDf = df.select(explode($"authors").alias("elem"))

// Pull the nested fields up into top-level columns, then drop the struct
val result = explodedDf
  .withColumn("author_key", $"elem.author.key")
  .withColumn("key", $"elem.key")
  .withColumn("type", $"elem.type")
  .drop("elem")
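
The same flattening can also be written as a single select, which avoids the intermediate columns entirely. This is a sketch assuming the same `df` and schema as above, and that `spark.implicits._` is in scope so the `$` column syntax works:

```scala
import org.apache.spark.sql.functions.explode

// Explode and project the nested fields in one pass,
// aliasing the doubly-nested author key to "author_key"
val flattened = df
  .select(explode($"authors").alias("elem"))
  .select(
    $"elem.author.key".alias("author_key"),
    $"elem.key",
    $"elem.type"
  )
```

Both versions produce the target schema (`author_key`, `key`, `type`); the `select` form just makes the final column list explicit in one place.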