I am looking to get the pinyin of Simplified Mandarin characters, and have come across two packages:
- `pinyin` 0.4.0, which is 6 years old (GitHub repo here)
- `pinyin_jyutping_sentence`, which is more than 2 years old (GitHub repo here)
Both can output the pinyin of characters with or without tone diacritics, but I am curious whether one is more efficient than the other.
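"With or without diacritics" comes down to Unicode: the pinyin tone marks are combining characters, so diacritic output can be reduced to plain pinyin with the standard library alone. The helper below (`strip_tones` is a hypothetical name, not an API of either package) is a minimal sketch. It deliberately keeps the diaeresis so that ü is not collapsed into u.

```python
import unicodedata

# Combining diaeresis (U+0308): kept so that "ü" survives instead of becoming "u".
COMBINING_DIAERESIS = "\u0308"


def strip_tones(pinyin: str) -> str:
    """Remove tone diacritics from pinyin, e.g. 'nǐ hǎo' -> 'ni hao'.

    Hypothetical helper for illustration; not part of `pinyin` or
    `pinyin_jyutping_sentence`.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = [
        ch
        for ch in decomposed
        if not unicodedata.combining(ch) or ch == COMBINING_DIAERESIS
    ]
    # NFC recombines "u" + U+0308 back into the single character "ü".
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_tones("nǐ hǎo"))  # ni hao
print(strip_tones("lǜ sè"))   # lü se
```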
Right off the bat, I noticed that on the first import of `pinyin_jyutping_sentence`, the package builds a prefix dict:
```python
import pinyin_jyutping_sentence as pnyn
```
```
Building prefix dict from Path\to\python\lib\site-packages\pinyin_jyutping_sentence\dict.txt.big ...
Dumping model to file cache Path\to\AppData\Local\Temp\jieba.ue5a383df573783d4e379d21ab891d92a.cache
Loading model cost 0.793 seconds.
Prefix dict has been built successfully.
```
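These are the start-up messages of `jieba`, the Chinese word-segmentation library (note the jieba cache file being dumped), which suggests `pinyin_jyutping_sentence` splits text into words before looking up readings. Word-level lookup is plausibly one reason for its better accuracy, since a polyphonic character such as 行 is read differently in 银行 (háng) than in 行走 (xíng). A "prefix dict" maps every dictionary word, and every prefix of every word, to a frequency, so the segmenter can enumerate candidate words at each position. The sketch below is a simplified illustration of that idea, modeled loosely on jieba's approach; the function names and the tiny word list are hypothetical and this is not jieba's actual code.

```python
from typing import Dict, List, Tuple


def build_prefix_dict(words: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Map each word to its frequency and each unseen proper prefix to 0."""
    pfdict: Dict[str, int] = {}
    total = 0
    for word, freq in words.items():
        pfdict[word] = freq
        total += freq
        # Register every prefix so a scan can tell "keep extending" from "stop".
        for end in range(1, len(word)):
            pfdict.setdefault(word[:end], 0)
    return pfdict, total


def build_dag(sentence: str, pfdict: Dict[str, int]) -> Dict[int, List[int]]:
    """For each start index, list the (inclusive) end indices of known words."""
    dag: Dict[int, List[int]] = {}
    n = len(sentence)
    for start in range(n):
        ends: List[int] = []
        end = start
        frag = sentence[start]
        # Keep extending while the fragment is at least a prefix of some word.
        while end < n and frag in pfdict:
            if pfdict[frag] > 0:  # a real word, not just a prefix
                ends.append(end)
            end += 1
            frag = sentence[start:end + 1]
        if not ends:
            ends.append(start)  # fall back to the single character
        dag[start] = ends
    return dag


# Hypothetical toy word list (word -> frequency).
WORDS = {"北京": 2, "北京大学": 1, "大学": 3, "学": 1}
PFDICT, TOTAL = build_prefix_dict(WORDS)
print(build_dag("北京大学", PFDICT))  # {0: [1, 3], 1: [1], 2: [3], 3: [3]}
```

Building such a table from a large word list like `dict.txt.big` is presumably the one-time work behind the "Loading model cost 0.793 seconds" line; the log also shows jieba caching the result to a temp file.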
Running `import pinyin`, by contrast, did not build any kind of dictionary.
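To put a number on this one-time import cost, one option is to time the first import of each module directly. A minimal sketch using only the standard library; `time_first_import` is a hypothetical helper. Because Python caches modules in `sys.modules`, only the first import in a process measures anything meaningful. Running `python -X importtime -c "import pinyin"` (Python 3.7+) is an alternative that breaks the cost down per module.

```python
import importlib
import sys
import time


def time_first_import(module_name: str) -> float:
    """Return seconds spent importing `module_name` in this process.

    If the module was already imported, Python returns the cached module
    from sys.modules and the measured time will be close to zero.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    if already_loaded:
        print(f"note: {module_name} was already imported; timing is not meaningful")
    return elapsed


if __name__ == "__main__":
    # Run in a fresh interpreter so neither package is cached yet.
    for name in ("pinyin", "pinyin_jyutping_sentence"):
        try:
            print(f"{name}: {time_first_import(name):.3f} s")
        except ImportError:
            print(f"{name}: not installed")
```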
Is there a difference between the two packages in speed and accuracy?
NOTE: Due to Stack Overflow's rules about including Mandarin characters, I was unable to include either the 294-character Mandarin string or the 8-item list of Mandarin names I used to test this.
Because this seems to be an obscure topic with no existing questions/answers here on Stack Overflow, I did a quick efficiency/accuracy analysis of each package using `timeit` and `datetime`. Here is the code:
With the following output:
Based on the `timeit` and `datetime` measurements, `pinyin_jyutping_sentence` is much slower than `pinyin`. However, after comparing the pinyin output of `pinyin_jyutping_sentence` and `pinyin` with each other and with the original Mandarin characters, I found `pinyin_jyutping_sentence` to be far more accurate and readable.*
`pinyin` made several errors in its output for the 294-character string, and on closer examination of its output for the list of names, it got the tone wrong in several places, whereas `pinyin_jyutping_sentence` got it right in every case (as far as I was able to identify).
I will update this answer if I find/test other Mandarin-to-pinyin packages for Python.
*Interestingly, `pinyin_jyutping_sentence` converted numbers in the string into their corresponding pinyin.
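For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a harness that times converter functions with `timeit` and scores tone accuracy per syllable against readings you have checked by hand. The converters are passed in as plain callables (for example, your own wrappers around each package); it assumes they return space-separated syllables with tone diacritics. The helper names, stub converters, and sample readings are hypothetical placeholders, not output from either package.

```python
import timeit
import unicodedata
from typing import Callable, Dict, List

# Combining marks for the four pinyin tones; no mark means neutral tone (5).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def tones(pinyin: str) -> List[int]:
    """Extract one tone number per whitespace-separated syllable."""
    result = []
    for syllable in pinyin.split():
        tone = 5
        # NFD exposes the tone mark as a separate combining character.
        for ch in unicodedata.normalize("NFD", syllable):
            if ch in TONE_MARKS:
                tone = TONE_MARKS[ch]
        result.append(tone)
    return result


def tone_accuracy(convert: Callable[[str], str], reference: Dict[str, str]) -> float:
    """Fraction of reference syllables whose tone the converter got right."""
    correct = total = 0
    for text, expected in reference.items():
        want = tones(expected)
        got = tones(convert(text))
        total += len(want)
        # zip stops at the shorter list, so missing syllables count as wrong.
        correct += sum(w == g for w, g in zip(want, got))
    return correct / total if total else 0.0


def seconds_per_call(convert: Callable[[str], str], text: str, number: int = 1000) -> float:
    """Average seconds per call, measured with timeit."""
    return timeit.timeit(lambda: convert(text), number=number) / number


if __name__ == "__main__":
    # Hypothetical hand-checked readings and stub converters for illustration.
    reference = {"你好": "nǐ hǎo", "银行": "yín háng"}
    stub_a = {"你好": "nǐ hǎo", "银行": "yín háng"}
    stub_b = {"你好": "nǐ hāo", "银行": "yín háng"}  # one tone wrong
    for name, table in (("stub A", stub_a), ("stub B", stub_b)):
        convert = table.__getitem__
        acc = tone_accuracy(convert, reference)
        print(f"{name}: accuracy={acc:.2f}, {seconds_per_call(convert, '你好'):.2e} s/call")
```

Scoring per syllable rather than per string makes the tone errors described above show up as a number instead of a judgment from reading the output side by side.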