I'm encountering a very nasty problem while performing a simple frame-averaging procedure with Core Image. In summary, I grab frames from the video buffer in the capture output method:
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let cvBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }
    let newImage = CIImage(cvImageBuffer: cvBuffer)
    ...
    // Frame averaging part using a CIImageAccumulator
    if slowIncrement == 0.0 {
        accumulator?.setImage(newImage)
    } else {
        makeAverageFiler.currentStack = accumulator?.image()
        makeAverageFiler.newImage = newImage
        makeAverageFiler.count = slowIncrement
        guard let processedImage = makeAverageFiler.outputImage else { return }
        accumulator?.setImage(processedImage)
    }
    slowIncrement += 1.0
    ...
}
I made a custom filter with the following kernel:
float4 makeAverage(sample_t currentStack, sample_t newImage, float stackCount) {
    float4 cstack = unpremultiply(currentStack);
    float4 nim = unpremultiply(newImage);
    float4 avg = ((cstack * stackCount) + nim) / (stackCount + 1.0);
    return premultiply(avg);
}
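For context, makeAverageFiler is the filter object that wraps this kernel. A minimal sketch of such a CIFilter wrapper might look like this (assuming the kernel is compiled into the app's default.metallib; the loading details are illustrative, not my exact code):

import Foundation
import CoreImage

final class MakeAverageFilter: CIFilter {
    var currentStack: CIImage?
    var newImage: CIImage?
    var count: CGFloat = 0

    // Load the "makeAverage" color kernel from the compiled Metal library.
    private static let kernel: CIColorKernel = {
        let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
        let data = try! Data(contentsOf: url)
        return try! CIColorKernel(functionName: "makeAverage", fromMetalLibraryData: data)
    }()

    override var outputImage: CIImage? {
        guard let currentStack = currentStack, let newImage = newImage else { return nil }
        // Arguments match the kernel signature: currentStack, newImage, stackCount.
        return Self.kernel.apply(extent: newImage.extent,
                                 arguments: [currentStack, newImage, count])
    }
}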
The algorithm should be correct: when I test it with a small Python snippet on video frames, it works perfectly. In the app it also works, up to a point. However, as the app acquires more and more frames, the colors get messed up and weird color patches start to appear. I suspect that Core Image is not performing the calculations on the color channels properly and that the channels somehow get clipped.
This is how I initialized the CIImageAccumulator:
let accumulator = CIImageAccumulator(extent: CGRect(x: 0, y: 0, width: 3024, height: 4032), format: .RGBAf)
I need to use the accumulator; otherwise, memory usage grows indefinitely and the app stops working.
I can see that changing the format affects the results. However, I could not find a suitable format that would make the problem disappear.
What am I doing wrong? The cvImageBuffer has a 32-bit-per-pixel ARGB pixel format. Is Core Image performing the conversion to 128 bits per pixel automatically?
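For completeness, this is how the incoming pixel format can be verified (a small sketch, separate from my actual pipeline; the logging function is just illustrative):

import CoreVideo

// Print the four-character pixel format code of the incoming buffer
// and whether it is 32-bit ARGB.
func logPixelFormat(of buffer: CVPixelBuffer) {
    let fourCC = CVPixelBufferGetPixelFormatType(buffer)
    let code = String((0..<4).map { i -> Character in
        Character(UnicodeScalar(UInt8((fourCC >> (8 * (3 - i))) & 0xFF)))
    })
    print("Pixel format:", code, "- 32ARGB?", fourCC == kCVPixelFormatType_32ARGB)
}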
Additional things I tried:
- unpremultiply and premultiply do not seem to fix the results
- changing the working format of the CIContext also does not seem to help. In particular, it seems I can only set the sRGB format as the working CIContextOption; other formats raise an exception.
I would really like to avoid using custom Metal shaders and stick to Core Image. Thanks in advance for your help!
Update
Here is an example of the weird patches that start to appear after acquiring for a while. In this case I'm just moving the phone around while capturing. In real-world use cases, this problem appears most severely when acquiring slow-moving clouds.
Update 2
I declare the CIContext as a property of the view controller. Then I initialize it in viewDidLoad as follows:
ciContext = CIContext(mtlDevice: metalView.metalDevice,
                      options: [.workingFormat: CIFormat.RGBAf,
                                .workingColorSpace: NSNull(),
                                .cacheIntermediates: false,
                                .highQualityDownsample: true])
I use the ciContext in several places: to render CIImages to the drawable, to create intermediate CGImages, and to save JPEGs. For example, here is the render call:
self.ciContext.render(centeredImage,
                      to: currentDrawable.texture,
                      commandBuffer: commandBuffer,
                      bounds: CGRect(origin: .zero, size: view.drawableSize),
                      colorSpace: CGColorSpaceCreateDeviceRGB())

The default CIContext pixel format is RGBAh, which is 64 bits per pixel. You need RGBAf, since you are working with 128-bit-per-pixel images. The format needs to be specified explicitly when you create the CIContext.
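For example, a minimal sketch (the system-default Metal device here stands in for whichever device you already render with):

import CoreImage
import Metal

// Request full-float (128-bit RGBAf) precision for the whole Core Image pipeline.
let device = MTLCreateSystemDefaultDevice()!
let ciContext = CIContext(mtlDevice: device,
                          options: [.workingFormat: CIFormat.RGBAf])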
This sets the precision for the entire pipeline that gets executed when the image is rendered.
UPDATE
It turns out the "color clipping" issue has nothing to do with color space conversion or rounding errors; Core Image is doing its calculations just fine.
What is really happening is a "convolution" effect. Due to camera jitter, neighboring pixels get averaged together, and because each new frame is blended in with an ever-diminishing weight, over time the pixels from earlier frames contribute more and more to the final result. Effectively it is similar to applying a convolution kernel (whose size equals the average jitter distance) with greater values towards one side of the matrix. If there is a color gradient in that direction, it gets more pronounced with every iteration.
To confirm this, try the following experiment: instead of live camera frames, run a single still image through the same running-average filter, shifting it slightly each iteration to simulate the camera jitter.
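A minimal sketch of such a loop (the random-walk drift, the test image path, and the built-in filters standing in for the custom kernel are all assumptions of this sketch):

import Foundation
import CoreImage

let input = CIImage(contentsOf: URL(fileURLWithPath: "test.png"))!
let context = CIContext(options: [.workingFormat: CIFormat.RGBAf,
                                  .workingColorSpace: NSNull()])
let accumulator = CIImageAccumulator(extent: input.extent, format: .RGBAf)!

// Scale all four channels of an image by `s` (used to build the weighted average).
func scaled(_ image: CIImage, by s: CGFloat) -> CIImage {
    image.applyingFilter("CIColorMatrix", parameters: [
        "inputRVector": CIVector(x: s, y: 0, z: 0, w: 0),
        "inputGVector": CIVector(x: 0, y: s, z: 0, w: 0),
        "inputBVector": CIVector(x: 0, y: 0, z: s, w: 0),
        "inputAVector": CIVector(x: 0, y: 0, z: 0, w: s)])
}

accumulator.setImage(input)
var drift = CGPoint.zero

for i in 1...3000 {
    // Simulate camera jitter as a slow random walk of the "new frame".
    drift.x += CGFloat.random(in: -1...1)
    drift.y += CGFloat.random(in: -1...1)
    let frame = input
        .transformed(by: CGAffineTransform(translationX: drift.x, y: drift.y))
        .cropped(to: input.extent)

    // Running average: (stack * factor + frame) / (factor + 1).
    // Replace `factor` with a constant, e.g. 1000, for the second experiment.
    let factor = CGFloat(i)
    let stackPart = scaled(accumulator.image(), by: factor / (factor + 1))
    let framePart = scaled(frame, by: 1 / (factor + 1))
    let average = framePart.applyingFilter("CIAdditionCompositing",
                                           parameters: [kCIInputBackgroundImageKey: stackPart])
    accumulator.setImage(average)
}

// Force evaluation and get the final result.
let result = context.createCGImage(accumulator.image(), from: input.extent)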
And here are the results using this image:
after 1000, 2000 and 3000 iterations respectively:



Now, if you replace the factor with a constant number, say 1000, the result will look like this after 3000 iterations:

You can see some color bleeding, but not much, since all the pixels contribute more or less equally, creating more of a blur than a color-gradient effect. If there were any issues with the calculations, they would have shown up in this case as well.