Change wikitext from Wiktionary to readable text


How can I convert wikitext (as seen in the Wiktionary source code) into readable text (as seen on the Wiktionary website)?

So this source:

{{ru-verb|ходи́ть|impf|pf=сходи́ть}}

Should be seen as:

ходи́ть • (xodítʹ) impf (perfective сходи́ть)

This construct is called a template in wikitext, but I cannot find in the documentation how to turn a template into human-readable text.

Has anyone had a similar problem before?

Answer by AXO:

Use the MediaWiki parse API (action=parse) to expand the template and get HTML output.
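As a rough sketch of what such a request looks like, the query string can be built from the same parameters used in the URLs in this answer. The helper name build_parse_url is made up for illustration; only the parameter names and values come from the answer itself.

```python
from urllib.parse import urlencode

API_URL = "https://en.wiktionary.org/w/api.php"


def build_parse_url(wikitext, title="page_title"):
    """Return a parse-API URL asking for the rendered HTML of `wikitext`.

    Hypothetical helper: the parameters mirror the URL used in this answer.
    """
    params = {
        "action": "parse",      # run the wikitext through the parser
        "text": wikitext,       # the wikitext to render, e.g. a template call
        "prop": "text",         # return only the rendered HTML
        "title": title,         # page title the text is parsed as
        "formatversion": "2",   # makes ['parse']['text'] a plain string
        "format": "json",
    }
    # urlencode percent-encodes braces, pipes and non-ASCII characters.
    return API_URL + "?" + urlencode(params)


print(build_parse_url("{{ru-verb|ходи́ть|impf|pf=сходи́ть}}"))
```

Letting urlencode do the escaping avoids hand-encoding the braces and Cyrillic characters.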

You can render the HTML by passing it to your browser...
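One possible way to do that from Python, assuming the API returns an HTML fragment rather than a full page: wrap the fragment in a minimal UTF-8 document and write it to a temporary file. The function names wrap_fragment and save_page are hypothetical.

```python
import tempfile
from pathlib import Path


def wrap_fragment(fragment):
    """Wrap an HTML fragment in a minimal page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body>" + fragment + "</body></html>\n"
    )


def save_page(fragment):
    """Write the wrapped fragment to a temporary .html file; return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", encoding="utf-8", delete=False
    ) as handle:
        handle.write(wrap_fragment(fragment))
    return Path(handle.name)
```

The saved file can then be opened with the standard library, for example webbrowser.open(save_page(html).as_uri()). Declaring the charset keeps the Cyrillic and accented characters from being misread.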

I don't think MediaWiki can generate plain text output directly, but if that is what you want, you can use a third-party library. In Python, using Beautiful Soup and its get_text method, the code looks like this:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup(
        requests.get(
            'https://en.wiktionary.org/w/api.php?action=parse&text=%7B%7Bru-verb|%D1%85%D0%BE%D0%B4%D0%B8%CC%81%D1%82%D1%8C%7Cimpf|pf=%D1%81%D1%85%D0%BE%D0%B4%D0%B8%CC%81%D1%82%D1%8C%7D%7D&prop=text&title=page_title&formatversion=2&format=json'
        ).json()['parse']['text'],
        'html.parser'  # name the parser explicitly to avoid the "no parser specified" warning
    ).get_text(strip=True)
'ходи́ть•(xodítʹ)impf(perfectiveсходи́ть)'
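The get_text(strip=True) output above runs the words together, because each text node is stripped before joining. If the spacing matters, one alternative is to collect the text nodes unchanged and normalize whitespace afterwards. The following is a minimal standard-library sketch, not part of the original answer. The class and function names are made up, and skipping style and script content is an assumption about what the returned HTML may contain.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the content of <style> and <script>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &nbsp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace (including non-breaking spaces) to one space.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def html_to_text(html):
    """Return the visible text of an HTML fragment with spacing preserved."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Called on the parse-API result, for example html_to_text(response.json()['parse']['text']), this keeps the spaces between words instead of dropping them.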

Update:

In PHP, use the strip_tags and html_entity_decode functions:

$ php -a
Interactive mode enabled

php > $json = file_get_contents('https://en.wiktionary.org/w/api.php?action=parse&text=%7B%7Bru-verb|%D1%85%D0%BE%D0%B4%D0%B8%CC%81%D1%82%D1%8C%7Cimpf|pf=%D1%81%D1%85%D0%BE%D0%B4%D0%B8%CC%81%D1%82%D1%8C%7D%7D&prop=text&title=page_title&formatversion=2&format=json');
php > $json = json_decode($json, TRUE);
php > $html = $json['parse']['text'];
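php > // Strip tags first, then decode entities, so escaped text such as &lt;b&gt; is not removed as a tag.
php > $plain_text = html_entity_decode(strip_tags($html));
php > echo $plain_text;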
ходи́ть • (xodítʹ) impf (perfective сходи́ть)