Windows Tesseract OCR: getting scattered hOCR output instead of clean standard format

520 Views Asked by Joe At 09 February 2022 at 08:40

Quick help would be highly appreciated. I am extracting text from a TIFF image with Tesseract OCR. The output I am looking for is hOCR (HTML). The content of the output is perfect, but the formatting looks very disorganized when I open the file in Notepad. When I open the same file in Notepad++, the formatting is clean.

The Windows command line I use is given below:

Tesseract "Path\image.tiff" "Path\output" HOCR

I need your help getting the organized hOCR format in Notepad, as enclosed.

How do I get organized hocr data when I open with notepad?


There is 1 best solution below

user898678 On 09 February 2022 at 12:24

The problem is not in Tesseract but in Notepad. Use a more capable text editor such as Notepad++ or ConTEXT.
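If the file really has to open cleanly in Notepad, one possible workaround is to rewrite its line endings. This is a minimal sketch that rests on an assumption not confirmed in the question: that the .hocr file uses LF-only line endings, which older Notepad versions do not display as line breaks. The function names `to_crlf` and `convert_file` are made up for illustration.

```python
import sys
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalized to CRLF."""
    # First collapse existing CRLF and lone CR to LF so nothing gets doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Then expand every LF to the Windows-style CRLF.
    return unified.replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write a CRLF copy to dst."""
    # Working on bytes leaves the text encoding of the hOCR file untouched.
    Path(dst).write_bytes(to_crlf(Path(src).read_bytes()))


if __name__ == "__main__":
    # Hypothetical usage: python to_crlf.py input.hocr output_crlf.hocr
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
```

The copy differs from the original only in its line endings, so the hOCR markup and the recognized text stay the same. If Notepad still shows the file as disorganized afterwards, the assumption about line endings does not hold for that file.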
