Given (i) a string text and (ii) a dictionary made of offset and the substring y:
text = "\u5723\u4f9d\u534e\u5802'\"\uff08\uff09\u662f\u4e00\u5ea7\u4f4d\u4e8e\u610f\u5927\u5229\u7f57\u9a6c\u7684\u7f57\u9a6c\u5929\u4e3b\u6559\u6559\u5802\u3002\u8fd9\u5ea7\u6559\u5802\u88ab\u8ba4\u4e3a\u662f\u5df4\u6d1b\u514b\u6559\u5802\u5efa\u7b51\u7684\u6770\u4f5c\uff0c\u7531\u5efa\u7b51\u5e08\u5f17\u6717\u5207\u65af\u79d1\u00b7\u535a\u7f57\u7c73\u5c3c\u4fee\u5efa\u4e8e1642\u81f31660\u5e74\u3002\n\u5386\u53f2.\n\u8fd9\u5ea7\u6559\u5802\u59cb\u5efa\u4e8e14\u4e16\u7eaa\uff0c\u5f53\u65f6\u662f\u4f5c\u4e3a\u7f57\u9a6c\u4e0a\u667a\u5927\u5b66\u5bab\u7684\u5c0f\u6559\u5802\u3002\u8fd9\u6240\u5927\u5b66\u88ab\u79f0\u4e3a\u4e0a\u667a\uff08\"La Sapienza\"\uff09\uff0c\u800c\u8fd9\u5ea7\u6559\u5802\u4f9b\u5949\u7684\u662f\u5723\u4f9d\u534e\uff08Saint Yves\uff0c\u6cd5\u5b66\u5bb6\u7684\u4e3b\u4fdd\u5723\u4eba\uff09\uff0c\u56e0\u800c\u5f97\u540d\u3002\u6ce2\u7f57\u7c73\u5c3c\u88ab\u8feb\u4f7f\u4ed6\u7684\u8bbe\u8ba1\u9002\u5e94\u73b0\u6709\u7684\u5bab\u6bbf\u3002\u4ed6\u9009\u62e9\u4e00\u4e2a\u7c7b\u4f3c\u4e8e\u5927\u536b\u4e4b\u661f\u7684\u89c4\u5212\uff0c\u5c06\u6559\u5802\u7684\u7acb\u9762\u4e0e\u5bab\u6bbf\u7684\u5ead\u9662\u5408\u5e76\u3002\u5176\u7a79\u9876\u6709\u4e00\u4e2a\u975e\u5e38\u65b0\u9896\u7684\u87ba\u65cb\u5f62\u9876\u5854\u3002\u5185\u90e8\u4ee4\u4eba\u773c\u82b1\u7f2d\u4e71\u7684\u51e0\u4f55\u4f53\u6784\u6210\u590d\u6742\u7684\u97f5\u5f8b\u3002\u8fd9\u662f\u4e00\u5ea7\u7406\u6027\u7684\u5efa\u7b51- \u770b\u8d77\u6765\u5f88\u7eb7\u4e71\uff0c\u4f46\u662f\u5176\u5e73\u9762\u91cd\u53e0\u7740\u4e00\u5708\u5708\u4e24\u4e24\u53e0\u52a0\u7684\u7b49\u8fb9\u4e09\u89d2\u5f62\uff0c\u5728\u4e00\u5ea7\u96c6\u4e2d\u7684\u6559\u5802\u5185\uff0c\u6784\u6210\u4e00\u5217\u516d\u89d2\u5f62\u5c0f\u6559\u5802\u548c\u796d\u53f0\u7684\u57fa\u7840\u3002\u5185\u90e8\u7684\u51f9\u51f8\u8d77\u4f0f\uff0c\u5f62\u6210\u4e0d\u548c\u8c10\u7136\u800c\u4f7f\u4eba\u7729\u6655\u7684\u9b45\u529b\u3002\u88c5\u9970\u662f\u516d\u7ffc\u5929\u4f7f\u5934\u50cf\u548c\u51e0\u4f55\u4f53\uff08\u661f\u661f\uff09\u7684\u7ec4\u5408\uff0c\u6bd4\u8d77\u540c\u65f6\u4ee3\u7684\u6d4e\u5b89\u00b7\u8d1d\u5c3c\u5c3c\u8fc7\u5ea6\u4f7f\u7528\u9540\u91d1\u548c\u77f3\u818f\uff0c\u8f83\u4e3a\u67cf\u62c9\u56fe\u3002\u6cbf\u4e09\u6839\u5706\u9876\u652f\u67f1\u57fa\u7840\u5411\u4e0a\uff0c\u662f\u6559\u5b97\u4e9e\u6b77\u5c71\u5927\u4e03\u4e16\u7684\u57fa\u5409\u5bb6\u65cf\u6807\u5fd7\u201c\u4e00\u9897\u661f\u661f\u4e0b\u65b9\u7684\u516d\u5ea7\u5c71\u201d\u3002"
y = {16: '罗马', 19: '罗马天主教', 24: '教堂', 35: '巴洛克',
50: '弗朗切斯科·博罗米尼', 96: '罗马上智大学', 142: '圣依华',
199: '大卫之星', 376: '济安·贝尼尼', 413: '亞歷山大七世', 420: '基吉家族'}
To see the raw text : https://gist.githubusercontent.com/alvations/4c3460cf2dda5e19d828a8bace532216/raw/28190258c2be8ed7123033e77edefbf9201e1845/text
The goal is to iterate through all the characters in the text and then print
Oif the character is Outside/not in the substring dictionaryBif the character is the Beginning/start of a substring in the dictionaryIif the character is Inside/part of a substring in the dictionary
I've tried this which produces the expected output:
bio = []
bi_end = 0
for i, ch in enumerate(text):
if i in y:
bio.append('B')
bi_end = i + len(y[i])
elif i < bi_end:
bio.append('I')
else:
bio.append('O')
Which kind of worked and produced:
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'O', 'B', 'I', 'I', 'I', 'I', 'B', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'I', 'I', 'O', 'B', 'I', 'I', 'I', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
Or in string:
>>> ''.join(bio)
'OOOOOOOOOOOOOOOOBIOBIIIIBIOOOOOOOOOBIIOOOOOOOOOOOOBIIIIIIIIIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBIIIIIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBIIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBIIIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBIIIIIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBIIIIIOBIIIOOOOOOOOOOOOOOO'
But is there a simpler to achieve the same desired output? Maybe with some regex and replacement?
Or some special str.translate table operation?
This problem is about ranges, rather than characters. You wrote
{16: '罗马', 19: '罗马天主教', ...but it could as easily have beenor
The algorithm you supplied looks great. First case exploits a data structure that offers O(1) constant time access, and the other two cases are even simpler. You need to produce N output characters, and you have a O(N) linear time algorithm.
reand.translate()won't offer improvements on that.