I'm trying to replace a substring in a string with replacement text. The substring should only be replaced where it occurs as a whole word, so I chose a regex with word boundaries (`\b`). The Python code works for English text but fails for Hindi text.
I've tried the following code:
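```python
import re

def replace_str(text, substring_to_replace, replacement_text):
    # \b is meant to restrict matches to whole words;
    # re.escape guards against regex metacharacters in the substring
    modified_text = re.sub(
        rf"\b{re.escape(substring_to_replace)}\b", replacement_text,
        text, flags=re.IGNORECASE
    )
    return modified_text
```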
With English input:
```python
text = "This is a dummy english text."
substring_to_replace = "is"
replacement_text = "##"
modified_text = replace_str(text, substring_to_replace, replacement_text)
print(modified_text)
```
it prints:
```
This ## a dummy english text.
```
But for the Hindi text:
```python
text = "आपको किन विषयों का अध्ययन करने की आवश्यकता है।"
substring_to_replace = "विषय"
replacement_text = "##"
modified_text = replace_str(text, substring_to_replace, replacement_text)
print(modified_text)
```
it prints:
```
आपको किन ##ों का अध्ययन करने की आवश्यकता है।
```
The Hindi substring विषय does not occur in the text as a whole word (it is only the beginning of विषयों), yet it was still replaced.
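A quick way to see why this happens is to inspect the character that follows विषय in the text. Python's `re` treats `\w` as the characters for which `str.isalnum()` is true plus the underscore, and `\b` matches wherever a `\w` character meets a non-`\w` one. Devanagari vowel signs such as ो are combining marks (a Unicode `M*` category), not letters, so `\b` finds a "boundary" between य and ो. This sketch uses only the standard `re` and `unicodedata` modules:

```python
import re
import unicodedata

text = "आपको किन विषयों का अध्ययन करने की आवश्यकता है।"
word = "विषय"

# Character immediately after the first occurrence of the substring
end = text.index(word) + len(word)
next_ch = text[end]

# Code point, Unicode name and general category of that character
print(f"U+{ord(next_ch):04X}", unicodedata.name(next_ch), unicodedata.category(next_ch))

# re's \w does not match a combining mark, so \b sees a word boundary here
print("matches \\w:", re.fullmatch(r"\w", next_ch) is not None)
print("isalnum():", next_ch.isalnum())
```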
I've also tried the `re.UNICODE` flag, with no luck.
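In Python 3, `str` patterns already use Unicode matching, so `re.UNICODE` cannot change what `\w` and `\b` mean. One possible workaround, sketched here under the assumption that combining marks (categories `Mn`, `Mc`, `Me`) should count as part of a word, is to find candidates with the escaped substring and check the neighbouring characters by hand. `is_word_char` and `replace_word` are hypothetical helper names, not part of `re`:

```python
import re
import unicodedata

def is_word_char(ch: str) -> bool:
    # Same idea as re's \w (str.isalnum() or "_"), but combining marks
    # (categories Mn, Mc, Me), such as Devanagari vowel signs, also count.
    return ch.isalnum() or ch == "_" or unicodedata.category(ch).startswith("M")

def replace_word(text: str, word: str, replacement: str) -> str:
    # Hypothetical helper: replace `word` only where it is not glued to other
    # word characters on either side (case-insensitive, like the question).
    pattern = re.compile(re.escape(word), flags=re.IGNORECASE)
    pieces = []
    last = 0
    for m in pattern.finditer(text):
        start, end = m.span()
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            pieces.append(text[last:start])
            pieces.append(replacement)  # inserted literally, no backreferences
            last = end
    pieces.append(text[last:])
    return "".join(pieces)

print(replace_word("This is a dummy english text.", "is", "##"))
print(replace_word("आपको किन विषयों का अध्ययन करने की आवश्यकता है।", "विषय", "##"))
```

Because `finditer` returns non-overlapping matches, a rejected candidate could in principle hide an overlapping one; that only matters if the substring itself contains non-word characters.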