Dynamically parsing XML in Databricks


A table in my database contains a column with large XML strings. I want to parse that XML dynamically, passing the string from the column rather than a hard-coded value.

This example shows more or less what kind of result I would like to achieve:


SELECT
  from_xml(
    '<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>',
    schema_of_xml(
      '<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>'
    )
  );

However, the function schema_of_xml only accepts a static (literal) value, so I cannot pass it the XML column.

I decided to try solving it with PySpark. The examples below work for individual rows:

[two screenshots of PySpark code; images not available]

I also tried writing a UDF to use with the "map" function. However, I ran into issues when using RDDs: [screenshot of the error; image not available]
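For illustration, here is a minimal sketch of the UDF idea applied as a column expression instead of through an RDD "map". It is not the code from the screenshots. It assumes the XML always has the `<fruits><fruit>...</fruit></fruits>` shape from the SQL example, and the names `parse_fruits` and `make_parse_fruits_udf` are hypothetical. The parsing uses only Python's standard library; pyspark is imported lazily inside the wrapper.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def parse_fruits(xml_str: Optional[str]) -> List[str]:
    """Return the text of every <fruit> element.

    Returns an empty list for null, empty, or malformed input so that a
    single bad row does not fail the whole job.
    """
    if not xml_str:
        return []
    try:
        root = ET.fromstring(xml_str)
    except ET.ParseError:
        return []
    # iter("fruit") walks the whole tree and yields only <fruit> elements.
    return [el.text or "" for el in root.iter("fruit")]


def make_parse_fruits_udf():
    """Wrap parse_fruits as a Spark UDF returning array<string>.

    pyspark is imported here, not at module level, so the parser above
    can be tested without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    return udf(parse_fruits, ArrayType(StringType()))


if __name__ == "__main__":
    sample = "<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>"
    print(parse_fruits(sample))  # ['banana', 'apple']
```

On a DataFrame `df` with an XML column (called `xml_col` here purely as a placeholder), the wrapper could be applied with `df.withColumn("fruits", make_parse_fruits_udf()(df["xml_col"]))`, which avoids RDDs entirely. A Python UDF fixes the output type up front, so this only helps when the target structure is known in advance; it does not infer a schema per row the way schema_of_xml does for a literal.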
