Camera2 - most efficient way to obtain a still capture Bitmap - android

To start with the question: what is the most efficient way to initialize and use ImageReader with the camera2 api, knowing that I am always going to convert the capture into a Bitmap?
I'm playing around with the Android camera2 samples, and everything is working quite nicely. However, for my purposes I always need to perform some post-processing on captured still images, for which I require a Bitmap object. Presently I am using BitmapFactory.decodeByteArray(...) on the bytes coming from ImageReader.acquireNextImage().getPlanes()[0].getBuffer() (I'm paraphrasing). While this works acceptably, I still feel like there should be a way to improve performance. The captures are encoded as ImageFormat.JPEG and need to be decoded again to get the Bitmap, which seems redundant. Ideally I'd obtain them in PixelFormat.RGB_888 and just copy that to a Bitmap using Bitmap.copyPixelsFromBuffer(...), but it doesn't seem like initializing an ImageReader with that format has reliable device support. YUV_420_888 could be another option, but looking around SO it seems that it requires jumping through some hoops to decode into a Bitmap. Is there a recommended way to do this?
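For reference, the decode path I described looks roughly like this (a paraphrased sketch, not my exact code; reader is the JPEG-configured ImageReader):

// Grab the JPEG bytes from plane 0 and decode them into a Bitmap
Image image = reader.acquireNextImage();
ByteBuffer buffer = image.getPlanes()[0].getBuffer();
byte[] bytes = new byte[buffer.remaining()];
buffer.get(bytes);
image.close(); // release the Image back to the reader before decoding
Bitmap bitmap = BitmapFactory.decodeByteArray(bytes, 0, bytes.length);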

The question is what you are optimizing for.
JPEG is without doubt the easiest format, supported by all devices. Decoding it to a bitmap is not as redundant as it seems, because encoding the picture into JPEG is usually performed by dedicated hardware. This means it uses minimal bandwidth to transmit the image from the sensor to your application; on some devices this is the only way to get maximum resolution. BitmapFactory.decodeByteArray(...) is often backed by a hardware decoder too. The major problem with this call is that it may cause an out-of-memory exception because the output bitmap is too big. So you will find many examples that do subsampled decoding, tuned for the use case where the bitmap must be displayed on the phone screen.
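For illustration, a subsampled decode typically looks like this (a sketch; bytes, reqWidth and reqHeight are whatever your use case provides):

// Read only the bounds first, then pick a power-of-two inSampleSize so the
// decoded bitmap is no larger than roughly reqWidth x reqHeight.
BitmapFactory.Options options = new BitmapFactory.Options();
options.inJustDecodeBounds = true;
BitmapFactory.decodeByteArray(bytes, 0, bytes.length, options);
int inSampleSize = 1;
while (options.outWidth / (inSampleSize * 2) >= reqWidth
        && options.outHeight / (inSampleSize * 2) >= reqHeight) {
    inSampleSize *= 2;
}
options.inJustDecodeBounds = false;
options.inSampleSize = inSampleSize;
Bitmap subsampled = BitmapFactory.decodeByteArray(bytes, 0, bytes.length, options);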
If your device supports the required resolution with RGB_8888, go for it: this needs minimal post-processing. But scaling such an image down may be more CPU intensive than dealing with JPEG, and memory consumption may be huge. Anyway, only a few devices support this format for camera capture.
As for YUV_420_888 and other YUV formats, the advantages over JPEG are even smaller than for RGB.
If you need the best quality image and don't have memory limitations, you should go for RAW images, which are supported on most high-end devices these days. You will need your own conversion algorithm, and will probably need different adaptations for different devices, but at least you will have full command of the picture acquisition.
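For completeness, a minimal sketch of a RAW capture saved as DNG; it assumes the device advertises the RAW capability, and names like dngFile, captureResult and backgroundHandler stand in for whatever your capture code already has:

Size rawSize = characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)
        .getOutputSizes(ImageFormat.RAW_SENSOR)[0];
ImageReader rawReader = ImageReader.newInstance(
        rawSize.getWidth(), rawSize.getHeight(), ImageFormat.RAW_SENSOR, 2);
rawReader.setOnImageAvailableListener(reader -> {
    try (Image image = reader.acquireNextImage();
         FileOutputStream out = new FileOutputStream(dngFile)) {
        // DngCreator needs the capture result of the request that produced this image
        new DngCreator(characteristics, captureResult).writeImage(out, image);
    } catch (IOException e) {
        e.printStackTrace();
    }
}, backgroundHandler);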

After a while I now sort of have an answer to my own question, albeit not a very satisfying one. After much consideration I attempted the following:
Set up a ScriptIntrinsicYuvToRGB RenderScript of the desired output size
Take the Surface of the input Allocation used, and set this as the target surface for the still capture
Run the script when a new buffer is available in the Allocation and convert the resulting bytes to a Bitmap (roughly as sketched below)
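A rough sketch of that setup (not my exact code; RenderScript has since been deprecated, and context, width and height stand in for whatever you already have):

RenderScript rs = RenderScript.create(context);
ScriptIntrinsicYuvToRGB yuvToRgb = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs));

// Input Allocation backed by a Surface; that Surface becomes the still capture target
Type yuvType = new Type.Builder(rs, Element.YUV(rs))
        .setX(width).setY(height)
        .setYuvFormat(ImageFormat.YUV_420_888)
        .create();
Allocation input = Allocation.createTyped(rs, yuvType,
        Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
Surface captureTarget = input.getSurface();

Type rgbType = new Type.Builder(rs, Element.RGBA_8888(rs))
        .setX(width).setY(height)
        .create();
Allocation output = Allocation.createTyped(rs, rgbType, Allocation.USAGE_SCRIPT);
Bitmap bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);

input.setOnBufferAvailableListener(a -> {
    input.ioReceive();          // latch the frame the camera just produced
    yuvToRgb.setInput(input);
    yuvToRgb.forEach(output);
    output.copyTo(bitmap);      // bitmap now holds the RGBA conversion
});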
This actually worked like a charm, and was super fast. Then I started noticing weird behavior from the camera, which happened on other devices as well. As it turned out, the camera HAL doesn't really recognize this as a still capture. This means that (a) the flash / exposure routines don't fire when they need to, and (b) if you have initiated a precapture sequence before your capture, auto-exposure will remain locked unless you manage to unlock it using AE_PRECAPTURE_TRIGGER_CANCEL (API >= 23) or some other lock / unlock magic, which I couldn't get to work on either device. Unless you're fine with this only working in optimal lighting conditions where no exposure adjustment is necessary, this approach is entirely useless.
I have one more idea, which is to set up an ImageReader with YUV_420_888 output and incorporate the conversion routine from this answer to get RGB pixels from it. However, I'm actually working with Xamarin.Android, and RenderScript user scripts are not supported there. I may be able to hack around that, but it's far from trivial.
For my particular use case I have managed to speed up JPEG decoding to acceptable levels by carefully arranging background tasks with subsampled decodes of the versions I need at multiple stages of my processing, so implementing this likely won't be worth my time any time soon. If anyone is looking for ideas on how to approach something similar, though, that's what you could do.

Change the ImageReader instance to use a different ImageFormat, like this:
ImageReader.newInstance(width, height, ImageFormat.JPEG, 1)
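A slightly fuller sketch of wiring such a reader up (backgroundHandler and captureBuilder are assumed to already exist in your code):

ImageReader reader = ImageReader.newInstance(width, height, ImageFormat.JPEG, 1);
reader.setOnImageAvailableListener(r -> {
    try (Image image = r.acquireNextImage()) {
        // plane 0 holds the complete JPEG byte stream; decode or save it here
    }
}, backgroundHandler);
// Add reader.getSurface() to the session's output list and to the still capture
// request via captureBuilder.addTarget(reader.getSurface()).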

Related

NV21 ByteArray to array2d<rgb_pixel>

Hi, I'm trying to implement dlib facial landmark detection in Android. I know it's a heavy task, but I need as many fps as I can get. There are two issues that I face:
Conversion Chain
Resizing
Currently, I am getting the data from a preview callback set on the camera. It outputs a byte[] containing an NV21 image. Since dlib doesn't know about Image and only knows array2d<dlib::rgb_pixel>, I need to conform the data to it. The implementation I found uses a Bitmap, and when I try to use their code I end up with a chain of conversions, byte[] -> bmp -> array2d; I want to implement a direct byte[] -> array2d conversion.
Now, I need to improve the performance of dlib by manipulating the size of the image fed into it. My use case doesn't involve small faces, so I can down-scale the input image to boost performance. But let's say I am successful in making the byte[] -> array2d conversion: how can I resize the image? Bitmap resizing has many fast implementations, but I need to cut the Bitmap involvement to extract more fps. I have the option of resizing either the byte[] or the converted array2d, but again... how? I'm guessing it's good to do the resizing after the conversion, because it will then be operating in native code and not in Java.
Edit
The down-scaling should take the byte[] (not the dlib::array2d) form as input, since I need to do something with the down-scaled byte[].
So my final problem is to implement these in JNI:
byte[] resize(ByteArray img, Size targetSize);
and
dlib::array2d<rgb_pixel> convert(ByteArray img);
This question helped me a lot and made me understand the NV21 structure. Using the code in the question, I was able to develop a converter from an NV21 byte[] to array2d<rgb>.
What's left unsolved now is the resizing.
Performing any resizing in Java is probably a bad idea because of poor compiler optimization. Your most performant option CPU-wise would probably be to write a specialized NV21 resize in C++, then convert to RGB. Swapping the order may only be slightly slower, though, and much easier to write.
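A nearest-neighbour sketch of that resize, kept entirely in byte[] so it can be ported to C++/JNI almost line for line (it assumes even source and target dimensions and NV21 layout):

public static byte[] resizeNv21(byte[] src, int srcW, int srcH, int dstW, int dstH) {
    byte[] dst = new byte[dstW * dstH * 3 / 2];
    // Y plane: one byte per pixel
    for (int y = 0; y < dstH; y++) {
        int sy = y * srcH / dstH;
        for (int x = 0; x < dstW; x++) {
            int sx = x * srcW / dstW;
            dst[y * dstW + x] = src[sy * srcW + sx];
        }
    }
    // Interleaved VU plane: one V,U pair per 2x2 block of pixels
    int srcUv = srcW * srcH;
    int dstUv = dstW * dstH;
    for (int y = 0; y < dstH / 2; y++) {
        int sy = y * (srcH / 2) / (dstH / 2);
        for (int x = 0; x < dstW / 2; x++) {
            int sx = x * (srcW / 2) / (dstW / 2);
            int s = srcUv + sy * srcW + sx * 2;
            int d = dstUv + y * dstW + x * 2;
            dst[d] = src[s];         // V
            dst[d + 1] = src[s + 1]; // U
        }
    }
    return dst;
}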
Your other option is to do all of this on the GPU using shaders. GPUs are way faster at this sort of thing, but they are finicky. You might need a CPU fallback anyway (if the GPU isn't available for whatever reason; I'm not that familiar with Android).

Android camera2: what kind of upsampling/downsampling is done to fit my streams?

I want to get image streams with the least distortion possible (no noise reduction, etc.) without having to deal with RAW outputs.
I'm working with two streams (one when using the deprecated camera), one for the preview and one for the processing. I understand the camera2 API, but I'm wondering what kind of upsampling/downsampling is used when fitting the sensor output to the surfaces?
More specifically, I'm working on zoomed images, and according to the camera2 documentation concerning cropping and the references:
For non-raw streams, any additional per-stream cropping will be done to maximize the final pixel area of the stream.
The whole concept is easy enough to understand, but it's also mentioned that:
Output streams use this rectangle to produce their output, cropping to a smaller region if necessary to maintain the stream's aspect ratio, then scaling the sensor input to match the output's configured resolution.
But I haven't been able to find any info about this scaling. Which method is used (filter-based, bicubic, edge-directed, etc.)? Is there a way to get this info, and is there a way I can actually choose which one is used?
Concerning the deprecated camera, I'm guessing the zoom is just simpler, in the sense that it's probably equivalent to having SCALER_CROPPING_TYPE_CENTER_ONLY with only a limited set of crop regions corresponding to the exposed zoom ratios. But is the image scaling the same as in camera2? If someone could shed some light I'd be happy.
Real-life example
Camera sensor: 5312x2988 (16:9)
I want a 4x zoom, so the crop region should be (1992, 1120, 1328, 747)
(By the way, what happens with odd sizes? For instance on SCALER_CROPPING_TYPE_CENTER_ONLY devices?)
Now I have a surface of size (1920, 1080); the crop area and the stream ratio fit, but the 1328x747 pixels must be transformed to fill the 1920x1080 surface. The nature of this transformation is what I want to know.
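To make the example concrete, this is roughly how such a centered crop would be computed and applied (a sketch; characteristics and builder are the CameraCharacteristics and CaptureRequest.Builder already in use):

static void applyCenteredZoom(CameraCharacteristics characteristics,
                              CaptureRequest.Builder builder, float zoom) {
    Rect active = characteristics.get(CameraCharacteristics.SENSOR_INFO_ACTIVE_ARRAY_SIZE);
    int cropW = Math.round(active.width() / zoom);
    int cropH = Math.round(active.height() / zoom);
    int left = active.left + (active.width() - cropW) / 2;
    int top = active.top + (active.height() - cropH) / 2;
    // For the 5312x2988 example above and zoom = 4 this gives a 1328x747 region
    // starting at roughly (1992, 1120).
    builder.set(CaptureRequest.SCALER_CROP_REGION,
            new Rect(left, top, left + cropW, top + cropH));
}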
The scaling algorithm used depends on the device; generally, for power efficiency and speed, scaling is done in hardware blocks, usually at the end of the camera image signal processor (ISP) pipeline.
Therefore, you can't generally rely on it being any particular kind of scaling or filtering. Unfortunately, if you want to understand the entire processing pipeline, you have to start with RAW and implement it yourself.
On the same device, the old camera API and the new camera2 API talk to the same hardware abstraction layer and the same hardware scalers, so the scaling output will generally match exactly for the same resolution (with the exception of LEGACY-level devices, where camera2 may need additional GPU-based scaling, which will be bilinear downsampling; but you can't really know when this applies).

Real time image processing Android camera2 api

I'm very new to Android. I'm trying to use the new Android camera2 API to build a real-time image processing application. My application also requires a good FPS rate. Following some examples, I managed to do the image processing inside the onImageAvailable(ImageReader reader) method provided by the ImageReader class. However, by doing so I can only manage a frame rate of around 5-7 FPS.
I've seen that it is advised to use RenderScript for YUV processing with the Android camera2 API. Will using RenderScript gain me higher FPS rates?
If so, can someone please guide me on how to implement that? As I'm new to Android, I'm having a hard time grasping the concepts of Allocation and RenderScript. Thanks in advance.
I don't know what type of image processing you want to perform. But in case you are interested only in the intensity of the image (i.e. the grayscale information), you don't need any conversion of the YUV data array (e.g. into JPEG). For an image consisting of n pixels, the intensity information is given by the first n bytes of the YUV data array. So, just cut those bytes out of the YUV data array:
byte[] intensity = Arrays.copyOfRange(data, 0, width * height);
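With camera2 the frame arrives as an Image rather than a flat byte[], so the equivalent there is to copy plane 0 (the Y plane), taking the row stride into account (a sketch; reader is the YUV_420_888 ImageReader):

Image image = reader.acquireLatestImage();
Image.Plane yPlane = image.getPlanes()[0];
ByteBuffer yBuffer = yPlane.getBuffer();
int rowStride = yPlane.getRowStride();
int width = image.getWidth();
int height = image.getHeight();
byte[] intensity = new byte[width * height];
for (int row = 0; row < height; row++) {
    yBuffer.position(row * rowStride);        // rowStride may be larger than width
    yBuffer.get(intensity, row * width, width);
}
image.close();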
In theory, you can get the available fps ranges with this call:
characteristics.get(CameraCharacteristics.CONTROL_AE_AVAILABLE_TARGET_FPS_RANGES);
and set the desired fps range here:
mPreviewRequestBuilder.set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, bestFPSRange);
So in principle, you should choose a range with the same lower and upper bound, and that should keep your frame rate constant.
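In other words, something along these lines (a sketch; it just picks a fixed 30fps range if one is advertised, otherwise falls back to the first entry):

Range<Integer>[] ranges = characteristics.get(
        CameraCharacteristics.CONTROL_AE_AVAILABLE_TARGET_FPS_RANGES);
Range<Integer> bestFPSRange = ranges[0];
for (Range<Integer> range : ranges) {
    if (range.getLower().equals(range.getUpper()) && range.getUpper() == 30) {
        bestFPSRange = range;   // fixed range keeps the frame rate constant
        break;
    }
}
mPreviewRequestBuilder.set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, bestFPSRange);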
HOWEVER, none of the LEGACY-profile devices I have tested (S5, Z3 Compact, Huawei Mate S, and HTC One M9) have been able to achieve 30fps at 1080p. The only way I was able to achieve that was by using a device (LG G4) that turned out to have a FULL profile.
RenderScript will not buy you anything here if you are going to use it inside the onImageAvailable callback. It appears that getting the image at that point is the bottleneck on LEGACY devices, since the new camera2 API simply wraps the old one and is presumably creating so much overhead that the callback no longer occurs at 30fps. So if RenderScript is to work, you would need to create a Surface and find another way of grabbing the frames off of it.
Here is the kicker though... if you move back to the deprecated API, I would almost guarantee 30fps at whatever resolution you want. At least that is what I found on all of the devices I tested...

Proper camera2 YUV_420_888 to JPEG conversion

I know the topic has already appeared multiple times, but none of the solutions actually seems to work, ranging from answers suitable for the old camera API (where the YUV data comes in a neat byte[] array), via corrupted images, to saving JPEGs all in "green scale" using RenderScript (which, by the way, is the best I am currently able to do).
The way the Camera2Basic example does it is that it simply sets the ImageReader's format to JPEG. The problem with this (as discussed in multiple posts as well) is that it slows down the camera pipeline. YUV_420_888 works much, much faster.
So, did anyone manage to perform a proper YUV_420_888 -> JPEG conversion?
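For reference, the approach I keep seeing suggested is to repack the three planes into an NV21 byte[] (respecting row and pixel strides) and let YuvImage do the JPEG encoding; a sketch of that, assuming the U and V planes share their strides, and not one I can confirm gives correct colors on every device:

static byte[] yuv420888ToJpeg(Image image, int quality) {
    int width = image.getWidth();
    int height = image.getHeight();
    byte[] nv21 = new byte[width * height * 3 / 2];

    // Y plane: copy row by row because rowStride can exceed width
    Image.Plane yPlane = image.getPlanes()[0];
    ByteBuffer yBuf = yPlane.getBuffer();
    int yRowStride = yPlane.getRowStride();
    for (int row = 0; row < height; row++) {
        yBuf.position(row * yRowStride);
        yBuf.get(nv21, row * width, width);
    }

    // Chroma: NV21 wants V and U interleaved at quarter resolution
    ByteBuffer uBuf = image.getPlanes()[1].getBuffer();
    ByteBuffer vBuf = image.getPlanes()[2].getBuffer();
    int uvRowStride = image.getPlanes()[1].getRowStride();
    int uvPixelStride = image.getPlanes()[1].getPixelStride();
    int offset = width * height;
    for (int row = 0; row < height / 2; row++) {
        for (int col = 0; col < width / 2; col++) {
            int uvIndex = row * uvRowStride + col * uvPixelStride;
            nv21[offset++] = vBuf.get(uvIndex);
            nv21[offset++] = uBuf.get(uvIndex);
        }
    }

    YuvImage yuvImage = new YuvImage(nv21, ImageFormat.NV21, width, height, null);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    yuvImage.compressToJpeg(new Rect(0, 0, width, height), quality, out);
    return out.toByteArray();
}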

Real-time image processing with Camera2

I tried searching in a ton of places for how to do this, with no results. I did read that the only way (as far as I know) to obtain image frames is to use an ImageReader, which gives me an Image to work with. However, a lot of work must be done before I have a nice enough image (converting the Image to a byte array, then converting between formats - YUV_420_888 to ARGB_8888 - using RenderScript, then turning it into a Bitmap and rotating it manually, or running the application in landscape mode). By this point a lot of processing has been done, and I haven't even started the actual processing yet (I plan on running some native code on it). Additionally, I tried to lower the resolution, with no success, and there is a significant delay when drawing on the surface.
Is there a better approach to this? Any help would be greatly appreciated.
I'm not sure what exactly you are doing with the images, but a lot of the time only a grayscale image is actually needed (again, depending on your exact goal). If your camera outputs YUV, the grayscale information is in the Y channel. The nice thing is that you don't need to convert between numerous colorspaces, and working with only one layer (as opposed to three) greatly decreases the size of your data set.
If you need color images, then this wouldn't help.
