Why does hOCR output not work as expected with Apache Tika?

I'm trying to extract hOCR output from a simple input image:

  1. The plain tesseract command line works as expected: tesseract image_input.png out_hocr hocr
  2. When I try the same through tika-server (or tika-core), the output is just plain text with extra newlines: curl -T image_input.png http://localhost:9998/tika --header "Content-type: image/png" --header "X-Tika-OCRenableImageProcessing: 1" --header "X-Tika-OCRoutputType: hocr" returns output_tika
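
To make the second case easier to reproduce, here is a minimal Python sketch of the same PUT request that curl -T sends, with a crude check for hOCR markup in the response. The helper names (build_headers, build_request, looks_like_hocr) are made up for illustration, and the optional Accept value is only an assumption about something worth varying; I have not confirmed that it changes the result.

```python
import urllib.request

# Class names defined by the hOCR format; finding any of them suggests markup survived.
HOCR_MARKERS = ("ocr_page", "ocr_line", "ocrx_word")


def build_headers(accept=None):
    """Headers copied from the curl command; 'accept' is an optional, untested extra."""
    headers = {
        "Content-type": "image/png",
        "X-Tika-OCRenableImageProcessing": "1",
        "X-Tika-OCRoutputType": "hocr",
    }
    if accept is not None:
        # Assumption: the response format may depend on the Accept header.
        headers["Accept"] = accept
    return headers


def build_request(image_bytes, url="http://localhost:9998/tika", accept=None):
    """Build (but do not send) the PUT request, mirroring curl -T."""
    return urllib.request.Request(
        url, data=image_bytes, headers=build_headers(accept), method="PUT"
    )


def looks_like_hocr(body):
    """Rough check: does the response text contain any hOCR class name?"""
    return any(marker in body for marker in HOCR_MARKERS)


if __name__ == "__main__":
    # Requires a tika-server listening on localhost:9998 and the image on disk.
    with open("image_input.png", "rb") as f:
        request = build_request(f.read())
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8", errors="replace")
    print("hOCR markup present:", looks_like_hocr(body))
```

Running it once with the default headers and once with, for example, accept="text/html" would show whether the markup is being dropped on the way out or never produced at all.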