Sanitizing XML text in Python (&amp)

245 Views Asked by POVR2 At 22 April 2022 at 01:12

I am writing a Python script that imports work items from IBM RTC and exports them to Microsoft ADS. One issue I found is that some strings from RTC xml data are imported with strange text characters such as &amp:

9.&amp;#09;Customize the rules for Feature work item
9. Customize the rules for Feature work item

1.&#09;Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
1. Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them

Speed Up RTC-&gt;ADS queries
Speed Up RTC->ADS queries

I've tried using the following code to sanitize and normalize the text:

from bs4 import BeautifulSoup
from html import unescape


    soup = BeautifulSoup(unescape(rtc_title), 'lxml')
    ads_title=soup.text

But it is replacing the characters with tabs most of the time, which is incorrect:

1.\tSend out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them

is there a better way to parse and normalize these strings taken from IBM RTC xml data? Thanks

Original Q&A

Sanitizing XML text in Python (&amp)

There are 0 best solutions below

Related Questions in PYTHON

Related Questions in HTML

Related Questions in XML

Related Questions in STRING

Related Questions in SANITIZE

Trending Questions

Popular # Hahtags

Popular Questions