Too much CPU usage of MediaProjection leaving less for OCR?


I am taking screenshots on Android with two different methods:

  1. By running /system/bin/screencap -p $path.
  2. With the MediaProjection API.

Even though it's the exact same screen, I get different results when performing OCR (using Tesseract).

With /system/bin/screencap I get the expected results. With the MediaProjection API, Tesseract is unable to recognise some or all of the text correctly, so I need to preprocess the image with a binarization algorithm.
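Such a binarization step could be, for example, Otsu's global threshold. Here is a standalone sketch operating on a plain grayscale int array rather than an Android Bitmap (the class and method names are illustrative, not from my actual code):

```java
public class Binarizer {
  // Compute Otsu's global threshold for 8-bit grayscale pixels (0..255):
  // pick the cutoff that maximises the between-class variance.
  static int otsuThreshold(int[] gray) {
    int[] hist = new int[256];
    for (int g : gray) hist[g]++;

    int total = gray.length;
    long sumAll = 0;
    for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

    long sumBg = 0;
    int weightBg = 0;
    double bestVar = -1;
    int best = 0;

    for (int t = 0; t < 256; t++) {
      weightBg += hist[t];
      if (weightBg == 0) continue;
      int weightFg = total - weightBg;
      if (weightFg == 0) break;

      sumBg += (long) t * hist[t];
      double meanBg = (double) sumBg / weightBg;
      double meanFg = (double) (sumAll - sumBg) / weightFg;
      double betweenVar =
          (double) weightBg * weightFg * (meanBg - meanFg) * (meanBg - meanFg);

      if (betweenVar > bestVar) {
        bestVar = betweenVar;
        best = t;
      }
    }

    return best;
  }

  // Map every pixel to pure black (0) or white (255) around the threshold.
  static int[] binarize(int[] gray, int threshold) {
    int[] out = new int[gray.length];
    for (int i = 0; i < gray.length; i++) {
      out[i] = gray[i] > threshold ? 255 : 0;
    }
    return out;
  }
}
```

On an Android Bitmap you would first extract the pixels (e.g. via getPixels()) and convert each one to grayscale before applying this.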

Why is that? I have checked the screencap source code and it seems that it uses PNG compression, an ARGB_8888 config and 100% quality, as you can see here: https://android.googlesource.com/platform/frameworks/base/+/master/cmds/screencap/screencap.cpp

This is how I am creating the bitmap by using the MediaProjection API:

public class ImageTransmogrifier implements ImageReader.OnImageAvailableListener {
  private final int width;
  private final int height;
  private final ImageReader imageReader;
  private final ScreenshotService svc;
  private Bitmap latestBitmap=null;

  ImageTransmogrifier(ScreenshotService svc) {
    this.svc=svc;

    Display display=svc.getWindowManager().getDefaultDisplay();
    Point size=new Point();

    display.getRealSize(size);

    int width=size.x;
    int height=size.y;

    // Halve both dimensions until the area fits under 2<<19 (1,048,576) pixels
    while (width*height > (2<<19)) {
      width=width>>1;
      height=height>>1;
    }

    this.width=width;
    this.height=height;

    imageReader=ImageReader.newInstance(width, height,
        PixelFormat.RGBA_8888, 2);
    imageReader.setOnImageAvailableListener(this, svc.getHandler());
  }

  @Override
  public void onImageAvailable(ImageReader reader) {
    final Image image=imageReader.acquireLatestImage();

    if (image!=null) {
      Image.Plane[] planes=image.getPlanes();
      ByteBuffer buffer=planes[0].getBuffer();
      int pixelStride=planes[0].getPixelStride();
      int rowStride=planes[0].getRowStride();
      // Rows may be padded past the visible width; widen the bitmap so
      // copyPixelsFromBuffer() lines up with the plane's row stride
      int rowPadding=rowStride - pixelStride * width;
      int bitmapWidth=width + rowPadding / pixelStride;

      if (latestBitmap == null ||
          latestBitmap.getWidth() != bitmapWidth ||
          latestBitmap.getHeight() != height) {
        if (latestBitmap != null) {
          latestBitmap.recycle();
        }

        latestBitmap=Bitmap.createBitmap(bitmapWidth,
            height, Bitmap.Config.ARGB_8888);
      }

      latestBitmap.copyPixelsFromBuffer(buffer);
      image.close();

      ByteArrayOutputStream baos=new ByteArrayOutputStream();
      // Crop the row padding back off before compressing
      Bitmap cropped=Bitmap.createBitmap(latestBitmap, 0, 0,
        width, height);

      cropped.compress(Bitmap.CompressFormat.PNG, 100, baos);

      byte[] newPng=baos.toByteArray();

      svc.processImage(newPng);
    }
  }

  Surface getSurface() {
    return(imageReader.getSurface());
  }

  int getWidth() {
    return(width);
  }

  int getHeight() {
    return(height);
  }

  void close() {
    imageReader.close();
  }
}
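For reference, the two pieces of arithmetic in that class (the downscale loop in the constructor and the row-padding maths in onImageAvailable()) can be checked in isolation. This standalone sketch mirrors them outside Android; the class and method names are made up for illustration:

```java
public class CaptureMath {
  // Mirror of the constructor's loop: halve both dimensions until the
  // pixel count fits under 2<<19 (1,048,576 pixels).
  static int[] capSize(int width, int height) {
    while (width * height > (2 << 19)) {
      width = width >> 1;
      height = height >> 1;
    }
    return new int[] { width, height };
  }

  // Mirror of the stride arithmetic in onImageAvailable(): the width the
  // bitmap needs once per-row padding is expressed in pixels.
  static int paddedWidth(int width, int rowStride, int pixelStride) {
    int rowPadding = rowStride - pixelStride * width;
    return width + rowPadding / pixelStride;
  }
}
```

For example, capSize(1080, 1920) returns 540x960, i.e. a 1080x1920 screen is captured at a quarter of its pixel count by this code.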

I've been told that, basically, the recording uses more of the processor, leaving less for the OCR: with fewer cycles available in a given amount of time, the accuracy decreases. That would also explain why I don't need to preprocess the screencap images, since a one-time capture presumably costs far less than constantly streaming frames.

Is there any foundation to this? If so, should I use something other than MediaProjection, or simply preprocess the images?
