Writing an XML string to Azure Data Lake Storage


When I try to write an XML string to Azure Data Lake Storage, I get a "file not found" error. I am using a Synapse notebook with Python to write the file. The Synapse notebook and the Data Lake storage are in the same resource group.

I tried to_xml({file_path/output.xml}), but this does not work with XML strings.


There are 2 best solutions below

3
Hermann12 On BEST ANSWER

If you are using pandas, which I assume you are:

import pandas as pd
import io
xml = '''<data><row><tex>text example</tex></row></data>'''

df = pd.read_xml(io.StringIO(xml))
print(df)

# Write the DataFrame to an XML file
out = 'StringXML.xml'
df.to_xml(out, index=False)

This writes the following to the file:

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <tex>text example</tex>
  </row>
</data>
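
to_xml is a DataFrame method, so it is not available on a plain Python string. If the XML is already a string, it may be simpler to skip pandas and write the text directly. The following is a minimal sketch using only the standard library; the helper name write_xml_string and the temporary output directory are assumptions for illustration, and it writes to the local file system, not to ADLS.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_xml_string(xml_string: str, path: Path) -> Path:
    """Check that xml_string is well-formed, then write it unchanged to path."""
    ET.fromstring(xml_string)  # raises ET.ParseError if the XML is malformed
    path.write_text(xml_string, encoding="utf-8")
    return path


xml = "<data><row><tex>text example</tex></row></data>"

# A temporary directory stands in for the real output location (assumption).
out_dir = Path(tempfile.mkdtemp())
written = write_xml_string(xml, out_dir / "StringXML.xml")
print(written.read_text(encoding="utf-8"))
```

Unlike the pandas round trip, this keeps the string byte for byte, so no XML declaration or indentation is added.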
2
DileeprajnarayanThumula On

spark.sparkContext.parallelize([xml_string], 1) converts the xml_string into a distributed collection (RDD) and specifies that it should be stored as one partition.

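.saveAsTextFile(adls_path) saves the content of the RDD as text at the specified ADLS Gen2 path. Spark treats the path as a directory and writes the data into part files (such as part-00000) inside it, so output.xml will be a folder rather than a single file.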

I tried the following approach in PySpark:

xml_string = """
<root>
  <person>
    <name>John Doe</name>
    <age>30</age>
  </person>
  <person>
    <name>Jane Smith</name>
    <age>28</age>
  </person>
</root>
"""
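# Replace the <...> placeholders with your own container and storage account names.
adls_path = "abfss://<container>@<storage_account>.dfs.core.windows.net/output.xml"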
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("WriteXMLToADLS").getOrCreate()
spark.sparkContext.parallelize([xml_string], 1).saveAsTextFile(adls_path)
print("XML data has been written to ADLS Gen2.")


  • The code above converts the XML string into an RDD and saves it to the specified ADLS Gen2 path as text output.
  • This approach writes data to ADLS Gen2 using the distributed processing capabilities of PySpark in Azure Synapse.
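
Because saveAsTextFile treats its path as a directory and writes part files into it, a single output.xml file may be easier to get with the Synapse notebook utilities. The sketch below is an assumption-laden illustration, not a tested recipe: build_abfss_uri and write_single_file are hypothetical helper names, the container and storage account names are made up, and mssparkutils.fs.put is only available inside a Synapse Spark session, so it is imported lazily and not called here.

```python
def build_abfss_uri(container: str, account: str, relative_path: str) -> str:
    """Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def write_single_file(uri: str, content: str) -> None:
    """Write content to one file at uri; intended to run inside a Synapse notebook."""
    # Imported lazily because notebookutils exists only in the Synapse runtime.
    from notebookutils import mssparkutils

    mssparkutils.fs.put(uri, content, True)  # True = overwrite an existing file


# Hypothetical container and storage account names (assumptions).
adls_path = build_abfss_uri("mycontainer", "mystorageaccount", "output.xml")
print(adls_path)
```

Inside a Synapse notebook, calling write_single_file(adls_path, xml_string) would then be expected to produce one file at that path, provided the notebook identity has write access to the container.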