How to use googletrans to translate text attributes in a parsed XML file in Python?

I have an XML file with many lines containing elements like this:

<?xml version="1.0" encoding="UTF-8"?>
<PAGEOBJECT XPOS="553.544307086578" YPOS="21613.7017874016" OwnPage="49" ItemID="146799560" PTYPE="4" WIDTH="39.6850393700787" HEIGHT="25.5118" FRTYPE="0" CLIPEDIT="1" PWIDTH="0.963601" PCOLOR="Black" PCOLOR2="Black" PLINEART="1" TEXTFLOWMODE="2" LOCALSCX="1" LOCALSCY="1" LOCALX="0" LOCALY="0" LOCALROT="0" PICART="1" SCALETYPE="1" RATIO="1" COLUMNS="1" COLGAP="14.4" AUTOTEXT="0" EXTRA="0" TEXTRA="2.83464566929134" BEXTRA="0" REXTRA="0" VAlign="0" FLOP="0" PLTSHOW="0" BASEOF="0" textPathType="0" textPathFlipped="0" path="M0 0 L39.685 0 L39.685 25.5118 L0 25.5118 L0 0 Z" copath="M0 0 L-46.8721 0 L-46.8721 -21.6 L0 -21.6 L0 0 Z" gXpos="518.740196456758" gYpos="420.491199031527" gWidth="519.703797456759" gHeight="763.483364779499" ALIGN="1" LAYER="0" NEXTITEM="-1" BACKITEM="-1">
            <StoryText>
                <DefaultStyle ALIGN="1" LINESPMode="0" LINESP="10" FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="White" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0"/>
                <ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="White" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="English word 1"/>
                <para ALIGN="1" LINESP="10"/>
                <ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="White" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="English word 2"/>
                <trail ALIGN="1" LINESP="10"/>
            </StoryText>
        </PAGEOBJECT>

My goal is to iterate over the file to find the CH attribute of each ITEXT element, call googletrans on each text found to translate it from one language to another, and then replace the original text with the translated one in the file.

My code:

from googletrans import Translator
import xml.etree.ElementTree as ET

# file opening
file = open("c:/fir/file.sla", mode="r+", encoding='UTF-8')

# Parse XML
tree = ET.parse(file)
root = tree.getroot()

# find/replace of CH attribute in ITEXT elements
for elem in root.iter("ITEXT"):
    txtcourant = elem.get("CH")
    # Translation
    translator = Translator()
    txtraduit = translator.translate(dest="fr", src="en", text=txtcourant)

    # text replace
    elem.attrib["CH"] = txtraduit

# file closing
file.close()

I have a problem here: I cannot see the changes in the final file. I think I am missing a write step somewhere.

Do you have an idea to help me?

I tried replacing the translated text with a plain string. The error went away, but I still cannot see the changes (the original text is still there, unchanged).
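
For reference, here is a minimal sketch of what the missing write step might look like. ElementTree only modifies the tree in memory, so the tree has to be serialized back to disk with tree.write(). The function name translate_ch_attributes, its translate parameter (any callable taking and returning a str), and the output path in the comment are hypothetical names chosen for illustration. Note also that ElementTree attribute values must be plain strings, while googletrans returns a result object whose .text attribute holds the translated string.

```python
import xml.etree.ElementTree as ET


def translate_ch_attributes(src_path, dst_path, translate):
    """Translate the CH attribute of every ITEXT element and save the result.

    `translate` is a hypothetical hook: any callable mapping str -> str,
    so the translation backend can be swapped or stubbed out.
    Returns the number of attributes that were rewritten.
    """
    tree = ET.parse(src_path)
    count = 0
    for elem in tree.getroot().iter("ITEXT"):
        text = elem.get("CH")
        if text:  # skip ITEXT elements with a missing or empty CH
            elem.set("CH", translate(text))
            count += 1
    # The step missing from the code above: write the modified tree to disk.
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
    return count


# Assumed wiring with a synchronous googletrans release (not executed here;
# newer releases may expose translate() as a coroutine that must be awaited):
#
#   from googletrans import Translator
#   translator = Translator()
#   translate_ch_attributes(
#       "c:/fir/file.sla",
#       "c:/fir/file_fr.sla",  # hypothetical output path
#       lambda s: translator.translate(s, src="en", dest="fr").text,
#   )
```

Writing to a separate output file keeps the original intact while testing; passing the same path for both arguments should overwrite the source in place. Creating the Translator once, outside the loop, also avoids rebuilding it for every element.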
