I am writing a Python script that imports work items from IBM RTC and exports them to Microsoft ADS. One issue I found is that some strings from the RTC XML data are imported with strange character sequences starting with &, for example:
9.&#9;Customize the rules for Feature work item
9. Customize the rules for Feature work item
1.&#9;Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
1. Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
Speed Up RTC-&gt;ADS queries
Speed Up RTC->ADS queries
I've tried using the following code to sanitize and normalize the text:
from bs4 import BeautifulSoup
from html import unescape
# Decode the character references first, then let BeautifulSoup strip any remaining markup
soup = BeautifulSoup(unescape(rtc_title), 'lxml')
ads_title = soup.get_text()
But it is replacing the characters with tabs most of the time, which is incorrect:
1.\tSend out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
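One workaround I am considering, as a minimal sketch only: decode the references with `html.unescape` and then collapse every run of whitespace (including the decoded tab) into a single space. This assumes the raw titles contain character references such as `&#9;` (tab) and `&gt;`, and that a tab should always become one space. The helper name `normalize_rtc_text` is hypothetical, not part of any RTC or ADS API:

```python
from html import unescape


def normalize_rtc_text(raw: str) -> str:
    """Decode character references, then collapse whitespace runs to one space."""
    # unescape turns "&#9;" into a tab character and "&gt;" into ">"
    decoded = unescape(raw)
    # str.split() with no arguments splits on any whitespace run (tabs included)
    # and drops leading/trailing whitespace; joining with " " normalizes the rest
    return " ".join(decoded.split())


print(normalize_rtc_text("1.&#9;Send out the on-boarding form"))
print(normalize_rtc_text("Speed Up RTC-&gt;ADS queries"))
```

This does not strip markup, so it would only be enough for fields that hold plain text with escaped characters rather than real HTML.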
Is there a better way to parse and normalize these strings taken from IBM RTC XML data? Thanks