I am attempting to analyze camera preview frames with a TFLite model, using the CameraX API.
This documentation describes using the ImageAnalyzer to process incoming frames. Currently the frames arrive as YUV, and I'm not sure how to pass YUV image data to a TFLite model that expects an input of shape (BATCH x WIDTH x HEIGHT x 3). In the old APIs you could specify preview output formats and change it to RGB; however, this page specifically says "CameraX produces images in YUV_420_888 format."
First, I'm hoping someone has found a way to pass RGB to the Analyzer rather than YUV; secondly, if not, could someone suggest a way of passing a YUV image to a TFLite interpreter? The incoming image object is of type ImageProxy and has 3 planes: Y, U, and V.
AFAIK, the ImageAnalysis use case only provides images in the YUV_420_888 format (you can see it defined here).
The official CameraX documentation provides a way to convert YUV images to RGB bitmaps; it's at the bottom of this section.
For sample code that shows how to convert a Media.Image object from
YUV_420_888 format to an RGB Bitmap object, see YuvToRgbConverter.kt.
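If you'd rather not pull in that whole converter, a rough fallback is to pack the three planes into an NV21 byte array and let YuvImage produce a JPEG that you decode back into a Bitmap. This is only a sketch: it assumes the common device layout where the chroma planes are interleaved with a pixel stride of 2, which is exactly the case YuvToRgbConverter.kt handles more carefully.
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import androidx.camera.core.ImageProxy
import java.io.ByteArrayOutputStream

// Sketch only: assumes the common NV21-compatible plane layout (Y pixel stride 1,
// chroma pixel stride 2 with V and U interleaved). Not guaranteed on every device.
fun ImageProxy.toBitmapSketch(): Bitmap {
    val yBuffer = planes[0].buffer
    val uBuffer = planes[1].buffer
    val vBuffer = planes[2].buffer

    val ySize = yBuffer.remaining()
    val uSize = uBuffer.remaining()
    val vSize = vBuffer.remaining()

    // NV21 is the Y plane followed by interleaved V/U samples.
    val nv21 = ByteArray(ySize + uSize + vSize)
    yBuffer.get(nv21, 0, ySize)
    vBuffer.get(nv21, ySize, vSize)
    uBuffer.get(nv21, ySize + vSize, uSize)

    // Compress to JPEG in memory, then decode back into an RGB Bitmap.
    val yuvImage = YuvImage(nv21, ImageFormat.NV21, width, height, null)
    val jpegStream = ByteArrayOutputStream()
    yuvImage.compressToJpeg(Rect(0, 0, width, height), 100, jpegStream)
    val jpegBytes = jpegStream.toByteArray()
    return BitmapFactory.decodeByteArray(jpegBytes, 0, jpegBytes.size)
}
From the resulting Bitmap you can then build a TensorImage (TensorImage.fromBitmap(bitmap)) for the TFLite interpreter.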
For anyone having this problem now: the ImageAnalysis use case can now provide images in RGBA_8888 as well as YUV_420_888, and RGBA_8888 is supported by the TFLite interpreter.
Usage:
val imageAnalysis = ImageAnalysis.Builder()
    .setTargetAspectRatio(AspectRatio.RATIO_16_9)
    .setTargetRotation(viewFinder.display.rotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_BLOCK_PRODUCER)
    .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .build()

imageAnalysis.setAnalyzer(executor, ImageAnalysis.Analyzer { image ->
    if (!::bitmapBuffer.isInitialized) {
        // The image rotation and RGB image buffer are initialized only once
        // the analyzer has started running
        imageRotationDegrees = image.imageInfo.rotationDegrees
        bitmapBuffer = Bitmap.createBitmap(
            image.width, image.height, Bitmap.Config.ARGB_8888)
    }

    // Copy out the RGBA bits to our shared buffer; use {} also closes the ImageProxy
    image.use { bitmapBuffer.copyPixelsFromBuffer(image.planes[0].buffer) }

    // Build a processor that rotates the frame upright before inference
    // (in production code this could be built once outside the analyzer)
    val imageProcessor = ImageProcessor.Builder()
        .add(Rot90Op(-imageRotationDegrees / 90))
        .build()

    // Preprocess the image and convert it into a TensorImage for detection
    val tensorImage = imageProcessor.process(TensorImage.fromBitmap(bitmapBuffer))
    val results = objectDetector?.detect(tensorImage)
})
Check the official sample app for more details: https://github.com/tensorflow/examples/blob/master/lite/examples/object_detection/android_play_services/app/src/main/java/org/tensorflow/lite/examples/objectdetection/fragments/CameraFragment.kt
I am trying to mock the Camera API in order to build an end-to-end test. The Camera API produces android.media.Image objects and posts them to a Surface to be consumed by ImageReader.acquireLatestImage().
My idea is to create a mechanism based on ImageWriter so I can queue predefined test JPEG images or video files in order to mimic the Camera API's behavior.
As far as I understand, there are two options:
to build the YUV byte buffers manually using some byte manipulation (software/RenderScript/GL) and inject them into the Image object retrieved from ImageWriter.dequeueInputImage
to decode the source media file with MediaCodec in ByteBuffer mode, extract the resulting frames via MediaCodec.getOutputImage, and copy them to the ImageWriter
Unfortunately, I have not had any success with either approach so far.
Does anyone know a working method to mock the Camera dependency while keeping control of the data source?
The libyuv-android library (https://github.com/crow-misia/libyuv-android) helped solve this problem. Something like this:
// Draw or copy the test frame into a Bitmap, convert it to I420 with libyuv,
// then feed the planes to the ImageWriter.
val yuvBuffer = I420Buffer.allocate(width, height)
val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
// ... render the predefined test image into `bitmap` here ...
val argbBuffer = AbgrBuffer.allocate(width, height)
bitmap.copyPixelsToBuffer(argbBuffer.asBuffer())
argbBuffer.convertTo(yuvBuffer)

val imageWriter = ImageWriter.newInstance(targetSurface, 1, ImageFormat.YUV_420_888)
val image = imageWriter.dequeueInputImage()
// Note: this assumes the dequeued Image's plane strides match the I420 buffer;
// otherwise the planes need to be copied row by row.
image.planes[0].buffer.put(yuvBuffer.planeY.buffer)
image.planes[1].buffer.put(yuvBuffer.planeU.buffer)
image.planes[2].buffer.put(yuvBuffer.planeV.buffer)
imageWriter.queueInputImage(image)
I'm building a camera app for Android, in Kotlin, using CameraX. My ImageAnalyzer will do some processing on the images passed from the camera, and I need to be able to process these images as 2-D arrays.
The CameraX demos use this code
val buffer = image.planes[0].buffer
// Extract image data from callback object
val data = buffer.toByteArray()
val pixels = data.map { it.toInt() and 0xFF }
which results in, effectively, the 1-D array pixels. I would prefer to use Bitmap.getPixel(x, y), which would be perfect; unfortunately I don't have a Bitmap and haven't discovered a way to get one.
How can I index the image like a 2-D array?
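If grayscale values are enough, one way to get 2-D style indexing without a Bitmap is to address the Y (luminance) plane directly through its row and pixel strides. A minimal sketch, assuming a YUV_420_888 ImageProxy:
import androidx.camera.core.ImageProxy

// Sketch: treat the Y plane as a 2-D array of luminance values.
// rowStride can be larger than width, so rows must be indexed by the stride.
fun luminanceAt(image: ImageProxy, x: Int, y: Int): Int {
    val plane = image.planes[0]
    val index = y * plane.rowStride + x * plane.pixelStride
    return plane.buffer.get(index).toInt() and 0xFF
}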
I am creating an app for taking pictures and sending them via HTTP POST to my server. Since I only need grayscale data on the server side, it would be much better to capture a grayscale picture directly and not have to convert it.
I am using Camera2 API and I have an issue with setting properties for CaptureRequest.Builder instance. With this:
final CaptureRequest.Builder captureBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_STILL_CAPTURE);
captureBuilder.set(CaptureRequest.CONTROL_EFFECT_MODE, CaptureRequest.CONTROL_EFFECT_MODE_NEGATIVE);
It takes a negative photo.
But this:
final CaptureRequest.Builder captureBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_STILL_CAPTURE);
captureBuilder.set(CaptureRequest.CONTROL_EFFECT_MODE, CaptureRequest.CONTROL_EFFECT_MODE_MONO);
Does absolutely nothing. No grayscale, just a normal picture.
You need to look at the list of supported effects on your device to see whether MONO is actually available on it.
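To check, you can query CameraCharacteristics.CONTROL_AVAILABLE_EFFECTS before building the request. A Kotlin sketch (cameraManager, cameraId and captureBuilder are assumed to come from your existing setup):
import android.hardware.camera2.CameraCharacteristics
import android.hardware.camera2.CameraManager
import android.hardware.camera2.CameraMetadata
import android.hardware.camera2.CaptureRequest

// Sketch: only request MONO if the device actually advertises it.
fun applyMonoIfSupported(
    cameraManager: CameraManager,
    cameraId: String,
    captureBuilder: CaptureRequest.Builder
) {
    val characteristics = cameraManager.getCameraCharacteristics(cameraId)
    val effects = characteristics.get(CameraCharacteristics.CONTROL_AVAILABLE_EFFECTS)
    if (effects != null && CameraMetadata.CONTROL_EFFECT_MODE_MONO in effects) {
        captureBuilder.set(CaptureRequest.CONTROL_EFFECT_MODE, CameraMetadata.CONTROL_EFFECT_MODE_MONO)
    }
}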
If you only care about luminance, you could just capture YUV_420_888 buffers instead of JPEG, and only send the Y buffer to the server. That won't get you automatic JPEG encoding, though.
Also note that generally under the hood, JPEG images are encoded in YUV; so if you dig into your JPEG decoder library, you may be able to get the image data before conversion to RGB, and simply ignore the chroma channels.
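If you go the YUV_420_888 route, the Y plane can be copied out as a tightly packed grayscale array, taking the row stride into account. A small Kotlin sketch, assuming an android.media.Image obtained from an ImageReader:
import android.media.Image

// Sketch: copy only the luminance (Y) plane into a width*height byte array,
// skipping any row-stride padding, so it can be sent to the server as-is.
fun extractLuminance(image: Image): ByteArray {
    val plane = image.planes[0]           // Y plane; pixel stride is 1 for YUV_420_888
    val buffer = plane.buffer.duplicate() // duplicate so the original position is untouched
    val out = ByteArray(image.width * image.height)
    for (row in 0 until image.height) {
        buffer.position(row * plane.rowStride)
        buffer.get(out, row * image.width, image.width)
    }
    return out
}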
You can use this:
captureRequestBuilder.set(CaptureRequest.CONTROL_EFFECT_MODE, CameraMetadata.CONTROL_EFFECT_MODE_MONO);
I'm trying to create an app that processes camera images in real time and displays them on screen. I'm using the camera2 API. I have created a native library to process the images using OpenCV.
So far I have managed to set up an ImageReader that receives images in YUV_420_888 format like this:
mImageReader = ImageReader.newInstance(
        mPreviewSize.getWidth(),
        mPreviewSize.getHeight(),
        ImageFormat.YUV_420_888,
        4);
mImageReader.setOnImageAvailableListener(mOnImageAvailableListener, mImageReaderHandler);
From there I'm able to get the image planes (Y, U and V), get their ByteBuffer objects and pass them to my native function. This happens in the mOnImageAvailableListener:
Image image = reader.acquireLatestImage();
Image.Plane[] planes = image.getPlanes();
Image.Plane YPlane = planes[0];
Image.Plane UPlane = planes[1];
Image.Plane VPlane = planes[2];
ByteBuffer YPlaneBuffer = YPlane.getBuffer();
ByteBuffer UPlaneBuffer = UPlane.getBuffer();
ByteBuffer VPlaneBuffer = VPlane.getBuffer();
myNativeMethod(YPlaneBuffer, UPlaneBuffer, VPlaneBuffer, w, h);
image.close();
On the native side I'm able to get the data pointers from the buffers, create a cv::Mat from the data and perform the image processing.
Now the next step would be to display my processed output, but I'm unsure how to do that. Any help would be greatly appreciated.
Generally speaking, you need to send the processed image data to an Android view.
The most performant option is to get an android.view.Surface object to draw into; you can get one from a SurfaceView (via SurfaceHolder) or a TextureView (via SurfaceTexture). You can then pass that Surface through JNI to your native code (a sketch of the managed side follows the list below) and there use the NDK methods:
ANativeWindow_fromSurface to get an ANativeWindow
The various ANativeWindow methods to set the output buffer size and format, and then draw your processed data into it.
Use setBuffersGeometry() to configure the output size, then lock() to get an ANativeWindow_Buffer. Write your image data to ANativeWindow_Buffer.bits, and then send the buffer off with unlockAndPost().
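For reference, the managed side of that JNI handoff can be quite small. A Kotlin sketch, where nativeSetOutputSurface and the library name are hypothetical placeholders for your own native entry point:
import android.view.Surface
import android.view.SurfaceHolder
import android.view.SurfaceView

// Sketch: hand the SurfaceView's Surface to native code, which can then wrap it
// with ANativeWindow_fromSurface and draw the processed frames into it.
class NativePreview(surfaceView: SurfaceView) : SurfaceHolder.Callback {

    init {
        surfaceView.holder.addCallback(this)
    }

    override fun surfaceCreated(holder: SurfaceHolder) {
        nativeSetOutputSurface(holder.surface)   // native side: ANativeWindow_fromSurface + acquire
    }

    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        // The native side can call ANativeWindow_setBuffersGeometry here if the size changed.
    }

    override fun surfaceDestroyed(holder: SurfaceHolder) {
        nativeSetOutputSurface(null)             // native side: ANativeWindow_release
    }

    // Hypothetical JNI entry point implemented in the native library.
    private external fun nativeSetOutputSurface(surface: Surface?)

    companion object {
        init {
            System.loadLibrary("imageproc")      // hypothetical library name
        }
    }
}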
Generally, you should probably stick to RGBA_8888 as the most compatible format; technically only it and two other RGB variants are officially supported. So if your processed image is in YUV, you'd need to convert it to RGBA first.
You'll also need to ensure that the aspect ratio of your output view matches that of the dimensions you set; by default, Android's Views will just scale those internal buffers to the size of the output View, possibly stretching it in the process.
You can also set the format to one of Android's internal YUV formats, but this is not guaranteed to work!
I've tried the ANativeWindow approach, but it's a pain to set up and I haven't managed to do it correctly. In the end I just gave up and imported the OpenCV4Android library, which simplifies things by converting camera data to an RGBA Mat behind the scenes.
I would like to perform face detection/tracking on a video file (e.g. an MP4 from the user's gallery) using the Android Vision FaceDetector API. I can see many examples of using the CameraSource class to perform face tracking on the stream coming directly from the camera (e.g. on the android-vision GitHub), but nothing on video files.
I tried looking at the source code for CameraSource through Android Studio, but it is obfuscated, and I couldn't find the original online. I imagine there are many commonalities between using the camera and using a file. Presumably I just play the video file on a Surface and then pass that to a pipeline.
Alternatively, I can see that Frame.Builder has setImageData and setTimestampMillis functions. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API? I guess this question is similar, but it has no answers. Similarly, I could decode the video into Bitmap frames and pass those to setBitmap.
Ideally I don't want to render the video to the screen, and the processing should happen as fast as the FaceDetector API is capable of.
Alternatively, I can see that Frame.Builder has setImageData and setTimestampMillis functions. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API?
Simply call SparseArray<Face> faces = detector.detect(frame); where detector has to be created like this:
FaceDetector detector = new FaceDetector.Builder(context)
        .setProminentFaceOnly(true)
        .build();
If processing time is not an issue, using MediaMetadataRetriever.getFrameAtTime solves the problem. As Anton suggested, you can also use FaceDetector.detect:
Bitmap bitmap;
Frame frame;
SparseArray<Face> faces;
MediaMetadataRetriever mMMR = new MediaMetadataRetriever();
mMMR.setDataSource(videoPath);
String durationMs = mMMR.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION); // video duration, in ms
long totalVideoTime = 1000L * Long.parseLong(durationMs); // total video time, in microseconds
long deltaT = 250000; // 1000000 / fps for the desired sampling rate; 250000 = 4 frames per second (see below)
for (long time_us = 1; time_us < totalVideoTime; time_us += deltaT) {
    bitmap = mMMR.getFrameAtTime(time_us, MediaMetadataRetriever.OPTION_CLOSEST_SYNC); // bitmap from the key frame closest to time_us
    if (bitmap == null) break;
    frame = new Frame.Builder().setBitmap(bitmap).build(); // wrap the bitmap in a "Frame" that can be fed to a face detector
    faces = detector.detect(frame); // detect the faces (detector is a FaceDetector)
    // TODO ... do something with "faces"
}
where deltaT = 1000000/fps, and fps is the desired number of frames per second. For example, if you want to extract 4 frames every second, deltaT = 250000.
(Note that faces will be overwritten on every iteration, so you should store or report the results inside the loop.)