So far I've only worked with left-to-right languages, and NLTK worked fine for tokenization. But while working on a research paper that covers several languages, including RTL languages, my usual procedure has been giving me completely inaccurate translations. Could anyone let me know what the norm is in neural machine translation when working with right-to-left languages like Persian or Hebrew?
I've tried following the steps in the TensorFlow NMT with attention tutorial, where I changed the two regexes to cover the Farsi and Urdu scripts (along with a few other languages) and to separate out the punctuation:
import tensorflow as tf
import tensorflow_text as tf_text

def lowerSplitPunct(text):
    # Apply Unicode compatibility normalization (NFKC) and lowercase any Latin text.
    text = tf_text.normalize_utf8(text, 'NFKC')
    text = tf.strings.lower(text)
    # Keep space, ZWNJ/RLM, the Arabic and Bengali blocks, a to z, and select punctuation.
    text = tf.strings.regex_replace(
        text, '[^ \u0600-\u06FF\uFB8A\u067E\u0686\u06AF\u200C\u200F\u0980-\u09FFa-z۔؟،«»।.?!,]', '')
    # Add spaces around punctuation marks.
    text = tf.strings.regex_replace(text, '[۔؟،«»।.?!,]', r' \0 ')
    # Strip surrounding whitespace and add the start/end tokens.
    text = tf.strings.strip(text)
    text = tf.strings.join(['[START]', text, '[END]'], separator=' ')
    return text
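For reference, this is roughly how I call it on a single sentence; the Persian example below is just a made-up placeholder, not a line from my actual dataset:

example = tf.constant('او کتاب می‌خواند.')
print(lowerSplitPunct(example).numpy().decode('utf-8'))
# Prints something like: [START] او کتاب می‌خواند . [END]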
Even with this preprocessing, it still doesn't solve my problem.
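In case it matters, I plug the function into the vectorization layer the same way the tutorial does. This is only a sketch of my setup, and max_vocab_size is a placeholder value rather than my real setting:

max_vocab_size = 5000

input_text_processor = tf.keras.layers.TextVectorization(
    standardize=lowerSplitPunct,
    max_tokens=max_vocab_size,
    ragged=True)

# Then adapted on the source side of my training dataset, e.g.:
# input_text_processor.adapt(train_ds.map(lambda src, tgt: src))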