Is there a way in the Apertium translator to get the original phrase along with its translation?
I.e., get something like:
phrase: {
original: { Hola, buenos días},
translated: {Hello, good morning}
}
I need this in order to build a mechanism for improving the translations.
If you're sending a corpus through the command-line interface, e.g.
then you can try simply
afterwards to get the input next to the output. That assumes you want to split things on newlines. The sed is there to ensure words aren't moved across newlines (rules tend not to move words across sentence boundaries). This is fast, but it's a bit hacky and there are many edge cases.
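Pairing input and output line by line only works if the translator emits exactly one output line per input line. As a minimal Python sketch (not the command from this answer), the pairing step could look like the following, with a basic line-count check so that merged or split lines at least fail loudly instead of silently misaligning. The function names and the file arguments are hypothetical placeholders.

```python
import sys


def paste_lines(originals: list[str], translations: list[str]) -> list[str]:
    """Pair each input line with its output line, tab-separated (like paste).

    Raises ValueError if the line counts differ, since that usually means
    lines were merged or split and the pairing can't be trusted.
    """
    if len(originals) != len(translations):
        raise ValueError(
            f"line count mismatch: {len(originals)} input vs "
            f"{len(translations)} output"
        )
    return [f"{src}\t{tgt}" for src, tgt in zip(originals, translations)]


def read_lines(path: str) -> list[str]:
    # Strip only the trailing newline so other whitespace is preserved.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical usage: python pair.py corpus.txt translated.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    for row in paste_lines(read_lines(sys.argv[1]), read_lines(sys.argv[2])):
        print(row)
```

This catches only the most obvious failure; a line that was reworded across a boundary but kept the same count will still pair up without complaint.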
If you want more control, one option is to install the JSON API (apertium-apy) locally and send one request per phrase.
If you've got a recent Debian/Ubuntu (or are using one of the apertium repos), you can get the API with
And then you can translate like this:
(or from JavaScript with standard AJAX requests; there are some docs at http://wiki.apertium.org/wiki/Apertium-apy/Debian and http://wiki.apertium.org/wiki/Apertium-apy#Usage )
Note that apertium-apy by default serves the pairs that are in /usr/share/apertium/modes; if you start it manually (instead of through systemctl), you can point it at a different path.
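Sending one phrase at a time means you always hold the original in hand when the translation comes back. Here is a minimal Python sketch using only the standard library. It assumes apertium-apy is listening on its default port 2737 and exposes GET /translate with langpair and q parameters, returning JSON with responseData.translatedText; check the usage docs linked above for your version. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local apertium-apy on its default port.
APY_URL = "http://localhost:2737"


def build_translate_url(text: str, langpair: str = "spa|eng",
                        base: str = APY_URL) -> str:
    """Build the /translate URL, percent-encoding the pair and the text."""
    query = urllib.parse.urlencode({"langpair": langpair, "q": text})
    return f"{base}/translate?{query}"


def extract_translation(body: str) -> str:
    """Pull the translated text out of an (assumed) APy JSON response."""
    data = json.loads(body)
    if data.get("responseStatus") != 200:
        raise RuntimeError(f"APy error: {data.get('responseDetails')}")
    return data["responseData"]["translatedText"]


def translate(text: str, langpair: str = "spa|eng") -> str:
    """Send one phrase to the local APy and return its translation."""
    with urllib.request.urlopen(build_translate_url(text, langpair)) as resp:
        return extract_translation(resp.read().decode("utf-8"))
```

Calling translate("Hola, buenos días") in a loop over your phrases gives you each original next to its translation, at the cost of one HTTP round trip per phrase.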
If you want to produce the JSON format from your example, the easiest way is to use
jq (sudo apt install jq), e.g. or on a corpus:
(A simple test on 500 lines took 23.7 s of wall-clock time with this approach, while the paste version took 5.5 s.)
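If you would rather not depend on jq, the same shape can be produced with Python's json module. This is a minimal sketch, not the jq command from this answer: it assumes input in the tab-separated original/translation form that the paste version produces, and emits one compact JSON object per line. The function names are hypothetical.

```python
import json


def to_phrase_json(original: str, translated: str) -> str:
    """Serialize one pair in the shape from the question, on a single line."""
    # ensure_ascii=False keeps accented characters readable ("días").
    return json.dumps(
        {"phrase": {"original": original, "translated": translated}},
        ensure_ascii=False,
    )


def tsv_to_json_lines(tsv: str) -> list[str]:
    """Convert paste output (original<TAB>translation per line) to JSON lines.

    A line without a tab raises ValueError when unpacking the split.
    """
    out = []
    for line in tsv.splitlines():
        original, translated = line.split("\t", 1)
        out.append(to_phrase_json(original, translated))
    return out
```

For example, to_phrase_json("Hola, buenos días", "Hello, good morning") returns {"phrase": {"original": "Hola, buenos días", "translated": "Hello, good morning"}}. It inherits the same line-alignment caveats as the paste approach it builds on.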