CameraX custom OpenGL Video Pipeline (`UseCase`/`VideoOutput`)


I have an existing Camera2 app which I want to migrate to CameraX to fix a few quirks on some Android devices (a few Samsung devices crash occasionally).

Currently, I have a custom VideoPipeline where I do the following:

  1. Using an ImageReader, I get CPU access to the Camera Frames and run face detection on it
  2. After faces have been detected, forward the Frame to my OpenGL pipeline (either via an ImageWriter, or by passing the HardwareBuffer as an OpenGL texture)
  3. In the OpenGL pipeline I draw a box around the user's face using OpenGL.
  4. After drawing the box around the face, I render everything into two output surfaces (a minimal sketch of this follows the list):
    1. The Preview View Surface
    2. A MediaRecorder/Encoder Surface (when recording to .mp4)
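Step 4 is essentially "draw the same GL frame once per output surface". A minimal sketch, assuming an already-initialized EGL context and a drawFrame() that renders the camera texture plus the face box (names here are illustrative, not from my actual code):

fun renderToOutputs(
  eglDisplay: EGLDisplay,
  eglContext: EGLContext,
  previewSurface: EGLSurface,
  recorderSurface: EGLSurface?, // null when not recording
  timestampNs: Long,
  drawFrame: () -> Unit
) {
  for (surface in listOfNotNull(previewSurface, recorderSurface)) {
    // Render the same frame into each output EGLSurface
    EGL14.eglMakeCurrent(eglDisplay, surface, surface, eglContext)
    drawFrame()
    if (surface === recorderSurface) {
      // The encoder surface needs the frame timestamp for correct A/V sync
      EGLExt.eglPresentationTimeANDROID(eglDisplay, surface, timestampNs)
    }
    EGL14.eglSwapBuffers(eglDisplay, surface)
  }
}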

In Camera2 I just attached the surface I get from the ImageReader (step 1) to the Camera, and it started streaming into my ImageReader at whatever format I configured the ImageReader to run at (ImageFormat.YUV_420_888 for CPU algorithms or ImageFormat.PRIVATE for GPU algorithms).

See code here: VideoPipeline.kt
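For reference, the Camera2 wiring boils down to this (a stripped-down sketch with illustrative names like cameraDevice and backgroundHandler, not the linked file):

val imageReader = ImageReader.newInstance(width, height, ImageFormat.YUV_420_888, 3)
imageReader.setOnImageAvailableListener({ reader ->
  val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener
  // CPU access for face detection, then hand the frame off to the GL pipeline
  image.close()
}, backgroundHandler)

cameraDevice.createCaptureSession(
  listOf(imageReader.surface),
  object : CameraCaptureSession.StateCallback() {
    override fun onConfigured(session: CameraCaptureSession) {
      val request = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_RECORD).apply {
        addTarget(imageReader.surface)
      }
      session.setRepeatingRequest(request.build(), null, backgroundHandler)
    }
    override fun onConfigureFailed(session: CameraCaptureSession) = Unit
  },
  backgroundHandler
)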


I've been trying to wrap my head around CameraX's rather high-level abstractions of the whole camera pipeline, but I can't figure out how to build a custom video pipeline that does what my Camera2 one did.

So far, this is what I came up with:

class VideoPipeline(
  private val format: Int = ImageFormat.YUV_420_888, // or ImageFormat.PRIVATE
  private val callback: CameraSession.Callback
) : VideoOutput, Closeable {
  companion object {
    private const val MAX_IMAGES = 3
    private const val TAG = "VideoPipeline"
  }

  // Output
  // TODO: Use `Recording`/`Recorder` output from CameraX?

  // TODO: When I didn't override getMediaSpec, CameraX crashed.
  @SuppressLint("RestrictedApi")
  override fun getMediaSpec(): Observable<MediaSpec> {
    val mediaSpec = MediaSpec.builder().setOutputFormat(MediaSpec.OUTPUT_FORMAT_MPEG_4).configureVideo { video ->
      // TODO: Instead of hardcoding that, can I dynamically get those values from the Camera?
      video.setFrameRate(Range(0, 60))
      video.setQualitySelector(QualitySelector.from(Quality.HD))
    }
    return ConstantObservable.withValue(mediaSpec.build())
  }

  @SuppressLint("RestrictedApi")
  override fun onSurfaceRequested(request: SurfaceRequest) {
    val size = request.resolution
    val surfaceSpec = request.deferrableSurface
    Log.i(TAG, "Creating $size Surface... (${request.expectedFrameRate.upper} FPS, expected format: ${surfaceSpec.prescribedStreamFormat}, ${request.dynamicRange})")

    // Create ImageReader
    val imageReader = ImageReader.newInstance(size.width, size.height, format, MAX_IMAGES)

    imageReader.setOnImageAvailableListener({ reader ->
      val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener

      // 1. Detect Faces
      val faces = callback.detectFaces(image)

      // 2. Forward to OpenGL
      onFrame(image)
    }, CameraQueues.videoQueue.handler)

    // Pass the ImageReader surface to CameraX when bound to a lifecycle
    request.provideSurface(imageReader.surface, CameraQueues.videoQueue.executor) { result ->
      imageReader.close()
    }
  }
}

and then to set up the Camera:

val fpsRange = Range(30, 30)

val preview = Preview.Builder().build()
preview.setSurfaceProvider(previewView.surfaceProvider)

val videoPipeline = VideoPipeline(ImageFormat.YUV_420_888, callback)
val video = VideoCapture.Builder(videoPipeline).also { video ->
  video.setMirrorMode(MirrorMode.MIRROR_MODE_ON_FRONT_ONLY)
  video.setTargetFrameRate(fpsRange)
}.build()

val camera = provider.bindToLifecycle(this, cameraSelector, preview, video)

..but this crashes when calling acquireNextImage(), apparently because the surface producer (the Camera) is configured with format 0x1 (RGBA_8888) instead of 0x23 (YUV_420_888):

java.lang.UnsupportedOperationException: The producer output buffer format 0x1 doesn't match the ImageReader's configured buffer format 0x23.
  at android.media.ImageReader.nativeImageSetup(Native Method)
  at android.media.ImageReader.acquireNextSurfaceImage(ImageReader.java:439)
  at android.media.ImageReader.acquireNextImage(ImageReader.java:493)
  at com.mrousavy.camera.core.VideoPipeline.onSurfaceRequested$lambda$1(VideoPipeline.kt:99)
  at com.mrousavy.camera.core.VideoPipeline.$r8$lambda$5heGRS9RL_Z-x-BiU-ziCBMS68U(Unknown Source:0)
  at com.mrousavy.camera.core.VideoPipeline$$ExternalSyntheticLambda1.onImageAvailable(Unknown Source:2)
  at android.media.ImageReader$ListenerHandler.handleMessage(ImageReader.java:800)
  at android.os.Handler.dispatchMessage(Handler.java:112)
  at android.os.Looper.loop(Looper.java:216)
  at android.os.HandlerThread.run(HandlerThread.java:65)

Especially because of the getMediaSpec() method, I'm not sure whether this is the correct approach. Does anyone have any thoughts or ideas? A few specific questions:

  1. Should I extend VideoOutput and then use that in a VideoCapture.Builder, or should I directly extend UseCase and do everything myself?
  2. Can I somehow re-use CameraX's Recorder/Recording instances in my VideoPipeline to stream into that Surface from OpenGL? I don't want to re-write the entire MediaMuxer/MediaEncoder parts...

1 Answer

Answer by Xi 张熹:

Please check out the OverlayEffect API. It allows the app to buffer the GPU stream and wait for the result from the CPU stream before rendering an overlay on top of the GPU stream. As for the CPU stream, you can get it using the ImageAnalysis API. I'm not sure how you detect faces today, but you can get face detection results using ML Kit with the MLKitAnalyzer API.

For code samples, you can check out this prototype change. The SyncedOverlayEffect shows how to sync the CPU stream with the GPU stream. The OverlayFragment shows how to use MLKitAnalyzer to get the detection result.
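A rough sketch of how those pieces could fit together (my own illustration, not code from the prototype change; preview, videoCapture, provider, cameraSelector and mainExecutor are assumed to exist as in the question's setup code):

val faceDetector = FaceDetection.getClient()
var latestFaces: List<Face> = emptyList() // ideally a @Volatile field or otherwise synchronized

// CPU stream: ImageAnalysis + MlKitAnalyzer delivers face detection results
val analysis = ImageAnalysis.Builder().build().apply {
  setAnalyzer(
    mainExecutor,
    MlKitAnalyzer(listOf(faceDetector), ImageAnalysis.COORDINATE_SYSTEM_ORIGINAL, mainExecutor) { result ->
      latestFaces = result.getValue(faceDetector) ?: emptyList()
    }
  )
}

// GPU stream: OverlayEffect draws the boxes on both the Preview and VideoCapture outputs
val boxPaint = Paint().apply { style = Paint.Style.STROKE; strokeWidth = 8f; color = Color.RED }
val overlay = OverlayEffect(
  CameraEffect.PREVIEW or CameraEffect.VIDEO_CAPTURE,
  5, // queueDepth: buffers GPU frames so they can wait for the CPU results
  Handler(Looper.getMainLooper())
) { it.printStackTrace() }
overlay.setOnDrawListener { frame ->
  val canvas = frame.overlayCanvas
  canvas.drawColor(Color.TRANSPARENT, PorterDuff.Mode.CLEAR)
  // Depending on the coordinate system, boxes may need mapping via frame.sensorToBufferTransform
  for (face in latestFaces) canvas.drawRect(RectF(face.boundingBox), boxPaint)
  true // return true to render this frame
}

val group = UseCaseGroup.Builder()
  .addUseCase(preview)
  .addUseCase(videoCapture)
  .addUseCase(analysis)
  .addEffect(overlay)
  .build()
provider.bindToLifecycle(this, cameraSelector, group)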