I have two problems.
- I would like to get character-wise confidence values. At the moment I am just getting the meanConfidence for each word. Let's say "Hello" - meanConfidence: 90. I want it like this:
  - "H" - confidence: 90
  - "e" - confidence: 94
  - ...
- At the moment I'm getting the ocrText and the segment rectangles separately. I need this information together. Let's say:
  - 100 100 100 100 "H"
  - 110 100 110 100 "e"
  - ...
...
private TesseractEngine tesseract = new TesseractEngine(path, "eng", EngineMode.LstmOnly);
....
using (var page = tesseract.Process(image, rec, PageSegMode.Auto))
{
    text = page.GetText(); // returns the OCR text of the whole rectangle
    confidence = page.GetMeanConfidence(); // returns the mean confidence for the whole rectangle, not per character
    List<System.Drawing.Rectangle> rectangles = page.GetSegmentedRegions(PageIteratorLevel.Symbol); // returns the rectangle of each character
}
Thanks for your help! :)
You'd need to obtain a ResultIterator object (via the page.GetIterator() method) and then operate on it at the PageIteratorLevel.Symbol level. Check the PageSerializer class for an example.
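
For illustration, here is a minimal sketch of that approach. It assumes the .NET wrapper's ResultIterator exposes Begin(), Next(), GetText(), GetConfidence() and TryGetBoundingBox() as I remember them, so please check the names against the version you have installed. SymbolDump and PrintSymbols are hypothetical names made up for this example, and printing the box as "x y width height" is only my guess at the format you want.

```csharp
using System;
using Tesseract;

static class SymbolDump
{
    // Hypothetical helper (not part of the library): prints one line per
    // recognized character with its bounding box and its own confidence.
    public static void PrintSymbols(Page page)
    {
        using (var iter = page.GetIterator())
        {
            iter.Begin();
            do
            {
                // Text of the current symbol (normally a single character).
                string symbol = iter.GetText(PageIteratorLevel.Symbol);
                if (string.IsNullOrEmpty(symbol))
                    continue; // jumps to the while condition, i.e. the next symbol

                // Confidence for this symbol only, not the word or page mean.
                float confidence = iter.GetConfidence(PageIteratorLevel.Symbol);

                // Bounding box of the same symbol, so text and rectangle stay paired.
                Rect box;
                if (iter.TryGetBoundingBox(PageIteratorLevel.Symbol, out box))
                {
                    Console.WriteLine("{0} {1} {2} {3} \"{4}\" - confidence: {5:F0}",
                        box.X1, box.Y1, box.Width, box.Height, symbol, confidence);
                }
            } while (iter.Next(PageIteratorLevel.Symbol));
        }
    }
}
```

You would call SymbolDump.PrintSymbols(page) inside your existing using block, while the page is still alive. Since the text, the confidence and the rectangle all come from the same iterator position, this should cover both of your problems at once. Note that the iterator's confidence may be on a different scale than GetMeanConfidence(), so compare a few values before relying on a threshold.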