I want to run OCR (Tesseract) on AWS Lambda using Java. I wanted a basic "Hello World" setup first, so I created the handler function for AWS Lambda like so:
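    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;

    public class Hello implements RequestHandler<Object, String> {
        @Override
        public String handleRequest(Object input, Context context) {
            // Log the invocation and the raw input to CloudWatch.
            context.getLogger().log("Inside handleRequest of Hello class PPPPp ");
            context.getLogger().log("Input: " + input);
            String output = "Hello, " + input + "!";
            return output;
        }
    }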
This works; I have tested it in the AWS Lambda console. Tesseract also runs on my local machine: I tested it by passing it an image, and it prints the words in the image. But I cannot figure out how to combine these two (AWS Lambda and Tesseract) into a single Java program.

I found this link, which runs Tesseract on Python: https://typless.com/tesseract-on-aws-lambda-ocr-as-a-service/ and tried to do the equivalent in Java, but could not, because they were manipulating .py files and I am dealing with .java files. I also came across this GitHub link: https://github.com/tesseract-ocr/tesseract present in the AWS Lambda documentation, and this one: https://github.com/tesseract-ocr/tesseract but I didn't understand how to use the code for my requirement. I am stuck here and new to AWS Lambda. Any help is very much appreciated. Thanks in advance.
Please see the example https://github.com/jlcorradi/tesseract-example/blob/e85b93f3481109c56bb0e7255e0116691463693f/src/main/java/com/playground/tesseract/TesseractPlaygroundLambdaHandle.java
You will need a project/dependency management tool, such as Maven, to specify dependencies (for Tesseract) and to build a standalone jar that you can then upload to your Lambda. This is the setup in its simplest form.
See https://docs.aws.amazon.com/lambda/latest/dg/java-package.html for how to deploy a standalone zip/jar to Lambda.
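As a rough idea of how the handler from the question and Tesseract could meet in one Java program, here is a minimal sketch that shells out to the tesseract command-line tool with only the JDK (Java 11+ assumed). It is one possible approach, not the method used in the linked example. The class name TesseractCli, its method names, and the idea of passing the image as a Base64 string are hypothetical choices for illustration. The sketch also assumes the tesseract binary and its language data are already available inside the Lambda environment (for instance through a layer or a container image), which the standalone jar alone does not provide.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.List;

/** Hypothetical helper: runs the tesseract CLI and returns the recognized text. */
final class TesseractCli {

    /** Builds "tesseract <image> stdout -l <language>"; "stdout" makes the CLI print the text. */
    static List<String> buildCommand(String binary, Path image, String language) {
        return List.of(binary, image.toString(), "stdout", "-l", language);
    }

    /**
     * Decodes a Base64-encoded image, writes it to a temporary file, and runs tesseract on it.
     * The binary path is an assumption: it depends on how tesseract was packaged for Lambda.
     */
    static String ocr(String binary, String base64Image, String language)
            throws IOException, InterruptedException {
        // Temp files land in java.io.tmpdir (/tmp on Linux), the writable directory on Lambda.
        Path image = Files.createTempFile("ocr-input-", ".img");
        try {
            Files.write(image, Base64.getDecoder().decode(base64Image));
            Process process = new ProcessBuilder(buildCommand(binary, image, language))
                    // Forward tesseract's error output to this process's stderr (the Lambda log).
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            // Read all of stdout before waiting, so the child cannot block on a full pipe.
            String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return text.trim();
        } finally {
            Files.deleteIfExists(image);
        }
    }
}
```

Inside handleRequest you would then call something like TesseractCli.ocr(pathToBinary, base64Image, "eng") and return the result, handling the checked exceptions. If you prefer a Java binding over a subprocess, the same handler shape applies, with the library declared as a Maven dependency instead.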