Gujarati text is not rendered correctly in my existing PDF


I am using the fitz module (PyMuPDF) to insert text into an existing PDF, and the googletrans module to translate the text.
Whenever I run the script, the Gujarati text is not rendered correctly.
Output: રવ્ નિદર નાકરાણી
Expected output: રવિન્દ્ર નાકરાણી

import fitz
from googletrans import Translator

def add_text_to_pdf(pdf_path, output_path,  text_to_add):
    # Open the PDF
    pdf_document = fitz.open(pdf_path)
    # Set the coordinates for the text
    coordinates = (175,215)

    # Get the number of pages in the PDF
    num_pages = pdf_document.page_count

    for page_num in range(num_pages):
        # Only the third page (index 2) receives the text
        if page_num == 2:
            page = pdf_document[page_num]
            # font_path must point to a Gujarati-capable font file; it is not defined in this snippet
            page.insert_text(coordinates, str(text_to_add), fontfile=font_path, fontsize=13, color=(0, 0, 0), fontname="Shruti")

    
    # Save the modified PDF
    pdf_document.save(output_path)
    pdf_document.close()
    
# Example usage
if __name__ == "__main__":
    # file_path (the input PDF) is not defined in this snippet
    input_pdf_path = file_path
    output_pdf_path = "Outpugt.pdf"
    translator = Translator()
    result = translator.translate('Ravindra Nakrani', src='en',dest='gu')
    print(result.text)
    text_to_add = result.text
    add_text_to_pdf(input_pdf_path, output_pdf_path, text_to_add)

There are 2 best solutions below

Jorj McKie On

Here is a multi-language / multi-font example using Page.insert_htmlbox:

import fitz

greetings = (
    "Hello, World!",  # english
    "Hallo, Welt!",  # german
    "سلام دنیا!",  # persian
    "வணக்கம், உலகம்!",  # tamil
    "สวัสดีชาวโลก!",  # thai
    "Привіт Світ!",  # ukrainian
    "שלום עולם!",  # hebrew
    "ওহে বিশ্ব!",  # bengali
    "你好世界!",  # chinese
    "こんにちは世界!",  # japanese
    "안녕하세요, 월드!",  # korean
    "नमस्कार, विश्व !",  # sanskrit
    "हैलो वर्ल्ड!",  # hindi
)
doc = fitz.open()  # make a new empty PDF
page = doc.new_page()  # give it an empty page
rect = (50, 50, 200, 500)  # define a small rectangle on it

# concatenate the greetings into one string.
text = " ... ".join(greetings)
page.insert_htmlbox(rect, text)  # place into the rectangle

# make subset fonts
doc.subset_fonts()

# save with maximum compression / garbage collection
doc.save(__file__.replace(".py", ".pdf"), garbage=3, deflate=True)

Note: I am a maintainer and the original creator of PyMuPDF.

Jorj McKie On
import fitz
doc = fitz.open()
page = doc.new_page()
r = (100, 100, 200, 150)
text = "રવિન્દ્ર નાકરાણી"
page.insert_htmlbox(r, text)
doc.ez_save("gujrati.pdf")

For me, the result looks like your expected text.
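The snippet above writes to a new document, while the question is about an existing PDF. Here is a minimal sketch of the same call applied to an existing file. As far as I understand it, insert_text does not perform the text shaping that Gujarati conjuncts need, whereas insert_htmlbox does. The function names box_from_point and add_shaped_text are made up for this illustration; page index 2 and the point (175, 215) come from the question; the box width and height are generous guesses rather than measured values.

```python
def box_from_point(x, y, fontsize, width=250.0):
    """Return (x0, y0, x1, y1) so that text starts near the point (x, y).

    insert_text treats the point as the text baseline, while insert_htmlbox
    needs a rectangle, so the top edge is moved up by one font size.
    The width and the 2x height are assumptions, not measured values.
    """
    top = y - fontsize
    return (x, top, x + width, top + 2 * fontsize)


def add_shaped_text(pdf_path, output_path, text, page_index=2,
                    point=(175, 215), fontsize=13):
    """Insert `text` on one page of an existing PDF using insert_htmlbox."""
    import fitz  # PyMuPDF; imported here so box_from_point has no dependencies

    doc = fitz.open(pdf_path)
    page = doc[page_index]
    rect = box_from_point(point[0], point[1], fontsize)
    # CSS carries the size and color that insert_text took as arguments
    css = f"* {{font-size: {fontsize}px; color: #000;}}"
    page.insert_htmlbox(rect, text, css=css)
    doc.subset_fonts()  # keep only the glyphs that are actually used
    doc.save(output_path, garbage=3, deflate=True)
    doc.close()
```

If the text does not fit the rectangle, insert_htmlbox scales it down by default, so a box that is too small shows up as smaller text rather than an error.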


You can also supply your own font instead of letting PyMuPDF choose one from the Google Noto font collection (which is what happened here). This must be done using the CSS styling instructions @font-face and font-family.
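A rough sketch of what that CSS could look like, assuming the font file lives in a folder that is handed to PyMuPDF as an Archive. The helper names font_face_css and insert_with_own_font, the family name "myfont", and the font_path argument are all hypothetical; the font itself must contain Gujarati glyphs and shaping tables for the output to be correct.

```python
import os


def font_face_css(family, font_file):
    """Build CSS that registers font_file under `family` and applies it to all text."""
    return (
        f"@font-face {{font-family: {family}; src: url({font_file});}}\n"
        f"* {{font-family: {family};}}"
    )


def insert_with_own_font(page, rect, text, font_path, family="myfont"):
    """Insert text into rect on a PyMuPDF page using the font at font_path."""
    import fitz  # PyMuPDF; imported lazily so font_face_css works without it

    folder, font_file = os.path.split(os.path.abspath(font_path))
    # The Archive tells PyMuPDF in which folder to resolve url(font_file)
    archive = fitz.Archive(folder)
    return page.insert_htmlbox(
        rect, text, css=font_face_css(family, font_file), archive=archive
    )
```

A call could then look like insert_with_own_font(page, (100, 100, 300, 150), text, font_path), where font_path points to the font file you want to use.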