I need to replace every emoji in a text with the form ["emoji here"](emoji/1234567890). I wrote this code:
entities = [. . .] # ids for my emojis
emoji_pattern = re.compile(r"[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2702-\u27B0\u27BF-\u27FF\u2930-\u293F\u2980-\u29FF]")
emojis = [match.group() for match in re.finditer(emoji_pattern, text)]
emoji_dict = {emoji: [] for emoji in set(emojis)}
for i, emoji in enumerate(emojis):
    emoji_dict[emoji].append(i)
new_text = replace_emoji(emoji_dict, entities, text)

def replace_emoji(emoji_dict, entities, text):
    for emoji, indices in emoji_dict.items():
        for index in indices:
            text = re.sub(fr"{emoji}", f"[{emoji}](emoji/{entities[index]})", text)
    return text
emoji_dict looks something like this: {'': [0], '': [1, 2, 3, 4, 5]}, where the numbers are indices into the entities list.
If an emoji occurs in the text only once (as in the case of ), everything is displayed correctly: [](emoji/1234567890). But if an emoji occurs several times (as in the case of ), the output comes out nested, like this: [[](emoji/5235873473821159415)](emoji/5235851187235861094)[[](emoji/5235873473821159415)](emoji/5235851187235861094)
How can I fix this?
Example:
text = '''Hello, #️⃣ user #️⃣ How's your day going? I hope everything is going great for you! If you have any questions, feel free to ask. I'm here to help! '''
. . .
new_text = '''Hello, [#️⃣](emoji/12352352340) user [#️⃣](emoji/12352352340) How's your day going? [](emoji/1245531421) I hope everything is going great for you! [](emoji/523424120) If you have any questions, feel free to ask. I'm here to help! [](emoji/90752893562)'''
When you loop over indices and call re.sub() once per index, each call replaces every occurrence of the emoji in the text, not just one of them.
The first iteration replaces the emoji with f'[{emoji}](emoji/{entities[indices[0]]})'. The second iteration then replaces the emoji inside the [] with f'[{emoji}](emoji/{entities[indices[1]]})', and so on, so you get a series of nested replacements. You don't want to replace inside a previous replacement.
In your desired output, you use the same entity for all the repetitions of an emoji. So there's no need to make a list of indices for each emoji, or to loop over them when making the replacements.
emoji_dict should just have one index for each emoji, and you can replace all occurrences of that emoji with the corresponding entity.
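One way to do that is sketched below. It is only an illustration under a few assumptions: entities is taken to hold one id per emoji occurrence, in order (as the question's indices suggest), and every repetition of an emoji reuses the id of its first occurrence. The sample text and ids are made up. The pattern is the one from the question, and a single emoji_pattern.sub() call with a function as the replacement handles each match exactly once, so nothing gets wrapped twice.

```python
import re

# Same character class as in the question, split over two lines for readability.
emoji_pattern = re.compile(
    r"[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2702-\u27B0"
    r"\u27BF-\u27FF\u2930-\u293F\u2980-\u29FF]"
)


def replace_emoji(text, entities):
    """Wrap every matched emoji as [emoji](emoji/<id>) in a single pass."""
    # Assumption: entities[i] belongs to the i-th emoji occurrence in text.
    # Keep only the id of the first occurrence of each distinct emoji.
    emoji_ids = {}
    for i, match in enumerate(emoji_pattern.finditer(text)):
        emoji = match.group()
        if emoji not in emoji_ids:
            emoji_ids[emoji] = entities[i]

    # The callback runs once per match in the original text, so the
    # replacement text is never scanned again and nothing is nested.
    return emoji_pattern.sub(
        lambda m: f"[{m.group()}](emoji/{emoji_ids[m.group()]})", text
    )


# Made-up sample: U+1F600 appears twice, U+1F680 once.
text = "Hi \U0001F600 there \U0001F680 and \U0001F600 again"
entities = [111, 222, 333]  # hypothetical ids, one per occurrence
new_text = replace_emoji(text, entities)
print(new_text)
```

Here both occurrences of U+1F600 get id 111 and U+1F680 gets 222; the third id is simply unused.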
#️⃣ and are not replaced because they aren't matched by the regexp.
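To see why the keycap is skipped, it can help to print its code points and test them against the pattern from the question. This is just a small diagnostic sketch; the variable names are illustrative.

```python
import re

# Same character class as in the question.
emoji_pattern = re.compile(
    r"[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2702-\u27B0"
    r"\u27BF-\u27FF\u2930-\u293F\u2980-\u29FF]"
)

# The keycap emoji is a sequence of three code points:
# '#', VARIATION SELECTOR-16 and COMBINING ENCLOSING KEYCAP.
keycap = "#\uFE0F\u20E3"
codepoints = [f"U+{ord(ch):04X}" for ch in keycap]
print(codepoints)  # ['U+0023', 'U+FE0F', 'U+20E3']

# None of these code points falls inside the ranges of the character class,
# so the pattern finds nothing to replace.
print(emoji_pattern.search(keycap))  # None
```

Any other emoji that is left untouched can be checked the same way: if none of its code points falls in the listed ranges, the pattern needs an extra range or alternative to cover it.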