Using an Android (2.3.3) phone, I can use the camera to retrieve a preview with the onPreviewFrame(byte[] data, Camera camera) method to get the YUV image.
For some image processing, I need to convert this data to an RGB image and show it on the device. Using basic Java / Android methods, this runs at a horrible rate of less than 5 fps...
Now, using the NDK, I want to speed things up. The problem is: How do I convert the YUV array to an RGB array in C? And is there a way to display it (using OpenGL perhaps?) in the native code? Real-time should be possible (the Qualcomm AR demos showed us that).
I cannot use the setTargetDisplay and put an overlay on it!
I know Java, have recently started with the Android SDK, and have zero experience in C.
Have you considered using OpenCV's Android port? It can do a lot more than just color conversion, and it's quite fast.
A Google search returned this page for a C implementation of YUV->RGB565. The author even included the JNI wrapper for it.
You can also succeed by staying with Java. I did this for the image detection in the androangelo-app.
I used the sample code which you can find here by searching for "decodeYUV".
For processing the frames, the essential thing to consider is the image size.
Depending on the device you may get quite large images; for example, on the Galaxy S2
the smallest supported preview size is 640*480, which is a lot of pixels.
What I did was use only every second row and every second column after the YUV-to-RGB decoding, so I effectively process a 320*240 image. That works quite well and allowed me to get frame rates of 20 fps (including some noise reduction, a color conversion from RGB to HSV, and a circle detection).
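As a minimal sketch of that approach, assuming the preview format is the default NV21: the decode math below follows the widely circulated decodeYUV420SP sample, with the row/column subsampling folded into the decode loop for brevity.

// A half-resolution NV21 -> ARGB decode: only every second row and column
// of the preview frame is decoded, so 640*480 input produces a 320*240 image.
static int[] decodeYuv420SpHalf(byte[] yuv, int width, int height) {
    final int outW = width / 2;
    final int outH = height / 2;
    final int frameSize = width * height;
    int[] argb = new int[outW * outH];
    for (int oy = 0; oy < outH; oy++) {
        int y = oy * 2;                                   // skip every second row
        int uvRowStart = frameSize + (y >> 1) * width;    // interleaved V/U plane
        for (int ox = 0; ox < outW; ox++) {
            int x = ox * 2;                               // skip every second column
            int Y = (0xff & yuv[y * width + x]) - 16;
            if (Y < 0) Y = 0;
            int uvIndex = uvRowStart + (x & ~1);
            int V = (0xff & yuv[uvIndex]) - 128;
            int U = (0xff & yuv[uvIndex + 1]) - 128;
            int y1192 = 1192 * Y;
            int r = y1192 + 1634 * V;
            int g = y1192 - 833 * V - 400 * U;
            int b = y1192 + 2066 * U;
            r = Math.max(0, Math.min(262143, r));
            g = Math.max(0, Math.min(262143, g));
            b = Math.max(0, Math.min(262143, b));
            argb[oy * outW + ox] = 0xff000000
                    | ((r << 6) & 0xff0000)
                    | ((g >> 2) & 0xff00)
                    | ((b >> 10) & 0xff);
        }
    }
    return argb;
}

The resulting int[] can then be turned into a Bitmap with Bitmap.createBitmap(argb, outW, outH, Bitmap.Config.ARGB_8888) for display.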
In addition, you should carefully check the size of the image buffer you provide for the preview. If it is too small, garbage collection will spoil everything.
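For reference, a sketch of sizing and registering that buffer, assuming the buffered preview callback (setPreviewCallbackWithBuffer with addCallbackBuffer) is what is meant here:

// Register a reusable preview buffer sized from the actual preview
// size and format (12 bits per pixel for the default NV21 format).
Camera.Parameters params = camera.getParameters();
Camera.Size previewSize = params.getPreviewSize();
int bitsPerPixel = ImageFormat.getBitsPerPixel(params.getPreviewFormat());
int bufferSize = previewSize.width * previewSize.height * bitsPerPixel / 8;
camera.addCallbackBuffer(new byte[bufferSize]);
camera.setPreviewCallbackWithBuffer(previewCallback);

At the end of onPreviewFrame() the buffer then has to be handed back with camera.addCallbackBuffer(data) so the same array keeps getting reused and no per-frame garbage is created.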
For the result, you can check the calibration screen of the androangelo-app; there I overlaid the detected image on the camera preview.
I'm very new to Android. I'm trying to use the new Android Camera2 API to build a real-time image processing application, and it needs to maintain a good FPS rate as well. Following some examples, I managed to do the image processing inside the onImageAvailable(ImageReader reader) callback of the ImageReader class. However, by doing so I can only manage a frame rate of around 5-7 FPS.
I've seen it advised to use RenderScript for YUV processing with the Android Camera2 API. Will using RenderScript get me higher FPS rates?
If so, can someone please guide me on how to implement that? As I'm new to Android, I'm having a hard time grasping the concepts of Allocation and RenderScript. Thanks in advance.
I don't know what type of image processing you want to perform. But if you are only interested in the intensity of the image (i.e. the gray-value information), you don't need any conversion of the YUV data array (e.g. into JPEG). For an image consisting of n pixels, the intensity information is given by the first n bytes of the YUV data array. So just cut those bytes out of it:
// The Y (luminance) plane is the first width*height bytes of the YUV buffer.
byte[] intensity = Arrays.copyOfRange(data, 0, width * height);
In theory, you can get the available fps ranges with this call:
characteristics.get(CameraCharacteristics.CONTROL_AE_AVAILABLE_TARGET_FPS_RANGES);
and set the desired fps range here:
mPreviewRequestBuilder.set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, bestFPSRange);
So in principle, you should choose a range with the same lower and upper bound, and that should keep your frame rate constant.
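For illustration, a sketch of picking such a fixed range (using android.util.Range; preferring the highest fixed range is just one possible policy):

// Pick a fixed AE FPS range (lower == upper) and apply it to the preview request.
Range<Integer>[] ranges = characteristics.get(
        CameraCharacteristics.CONTROL_AE_AVAILABLE_TARGET_FPS_RANGES);
Range<Integer> bestFPSRange = null;
for (Range<Integer> r : ranges) {
    if (r.getLower().equals(r.getUpper())
            && (bestFPSRange == null || r.getUpper() > bestFPSRange.getUpper())) {
        bestFPSRange = r;                 // e.g. [30, 30]
    }
}
if (bestFPSRange != null) {
    mPreviewRequestBuilder.set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, bestFPSRange);
}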
HOWEVER, none of the LEGACY-profile devices I have tested (S5, Z3 Compact, Huawei Mate S, and HTC One M9) have been able to achieve 30 fps at 1080p. The only way I was able to achieve that was with a device (LG G4) that turned out to have a FULL profile.
RenderScript will not buy you anything here if you are going to use it inside the onImageAvailable callback. It appears that getting the image at that point is the bottleneck on LEGACY devices, since the new Camera2 API simply wraps the old one and presumably creates so much overhead that the callback no longer occurs at 30 fps. So if RenderScript is to work, you would need to create a Surface and find another way of grabbing the frames off of it.
Here is the kicker though... if you move back to the deprecated API, I would almost guarantee 30 fps at whatever resolution you want. At least that is what I found on all of the devices I tested...
I have OCR text-scanning logic that needs to work in real time. The app needs to scan any text that appears in the camera view.
Currently I'm using the camera's onPreviewFrame method, and for accuracy I want to convert the byte data that this method gives me into pixels.
How do I do it? So far I could not find an answer on Google.
This is the only code I have tried, because I could not find anything else:
Pix pix = Pix.createFromPix(byteData,imageWidth,imageHeight,imageDepth);
but I have a feeling I'm doing it wrong and this is not pixel formatting from bytes...
From the description of onPreviewFrame(), the content of the bytes depends on the previously set preview format. If you set the format to RAW10 for example, its documentation says:
In an image buffer with this format, starting from the first pixel of each row, each 4 consecutive pixels are packed into 5 bytes (40 bits).
The documentation explains quite in detail how you can get all the pixels.
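As a rough illustration of that packing (a sketch only; it ignores row stride and padding):

// Unpack one group of 4 RAW10 pixels (5 bytes) into four 10-bit values.
// The 5th byte holds the low 2 bits of each of the 4 preceding pixels.
static void unpackRaw10Group(byte[] buf, int offset, int[] out4) {
    int lowBits = buf[offset + 4] & 0xff;
    for (int i = 0; i < 4; i++) {
        int high = buf[offset + i] & 0xff;        // upper 8 bits of pixel i
        int low = (lowBits >> (i * 2)) & 0x3;     // lower 2 bits of pixel i
        out4[i] = (high << 2) | low;              // full 10-bit sample
    }
}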
However, I should think converting the bytes this way might be too slow if implemented in Java, especially if you want real time. The camera provides methods to show preview images constantly in a SurfaceView, but as far as I know you cannot grab a bitmap from there either.
I would have a look at the source code of the ZXing barcode scanner project. They do real-time detection of various barcodes, so you should be able to extract the image-handling part from there.
I have been working on an app that uses the Tesseract API to support OCR. It uses a SurfaceView which shows the camera output (Camera2 API) and an ImageReader instance which is used to get images from the camera. The camera is set up with setRepeatingRequest, so new images are available very frequently. When I call the getUTF8Text() method to get the readable text in an image, it makes the camera preview shown on the SurfaceView lag.
Are there any settings in the Tesseract API that speed up the getUTF8Text() call, or anything else I can do so that the preview SurfaceView doesn't lag?
Any help or guidance is appreciated!
Most of the things that you can do to speed up performance occur separately from the Tesseract API itself:
Run the OCR on a separate, non-UI thread
Grab a new image to start OCR on after OCR has finished on the last image. Try capture instead of setRepeatingRequest.
Downsample the image before OCR, so that it's smaller
Experiment with different Tesseract page segmentation modes to see what's the fastest on your data
Re-train the Tesseract trained data file to use fewer characters and a smaller dictionary, depending on what your app is used for
Modify Tesseract to perform only recognition pass #1
Don't forget to consider OpenCV or other approaches altogether
You didn't say what Tesseract API settings you're using now, and you didn't describe what your app does in a general sense, so it's hard to say exactly where to begin, but these points should get you started.
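For the first and third points above, a minimal sketch; ocrExecutor and recognizeAsync are placeholder names, and tessApi stands for an already initialized Tesseract instance from whatever binding you use (e.g. TessBaseAPI in tess-two):

// Run recognition off the UI thread and shrink the image first.
ExecutorService ocrExecutor = Executors.newSingleThreadExecutor();

void recognizeAsync(final Bitmap frame) {
    ocrExecutor.execute(new Runnable() {
        @Override public void run() {
            // Downsample to half size before OCR (factor chosen arbitrarily here).
            Bitmap small = Bitmap.createScaledBitmap(
                    frame, frame.getWidth() / 2, frame.getHeight() / 2, true);
            tessApi.setImage(small);
            String text = tessApi.getUTF8Text();
            // Post 'text' back to the UI thread however your app reports results.
        }
    });
}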
There are a few other things which you can try.
Init Tesseract with OEM_TESSERACT_ONLY.
Instead of using full-blown training data, use a faster alternative from https://github.com/tesseract-ocr/tessdata_fast.
Move the recognition to a background computation thread.
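For the first point, a sketch with the tess-two binding (dataPath is a placeholder for the directory that contains your tessdata folder); note that the tessdata_fast files are LSTM-only models, so they would be paired with the LSTM engine mode rather than OEM_TESSERACT_ONLY:

// Initialize Tesseract with the legacy engine only (point 1 above).
TessBaseAPI tessApi = new TessBaseAPI();
tessApi.init(dataPath, "eng", TessBaseAPI.OEM_TESSERACT_ONLY);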
I am trying to produce a point cloud where each point has a colour. I can get just the point cloud or I can get the camera to take a picture, but I need them to be as simultaneous as possible. If I could look up an RGB image with a timestamp or call a function to get the current frame when onXYZijAvailable() is called I would be done. I could just go over the points, find out where it would intersect with the image plane and get the colour of that pixel.
As it is now I have not found any way to get the pixel info of an image or get coloured points. I have seen AR apps where the camera is connected to the CameraView and then things are rendered on top, but the camera stream is never touched by the application.
According to this post it should be possible to get the data I want and synchronize the point cloud and the image plane by a simple transformation. This post also says something similar. However, I have no idea how to get the RGB data. I can't find any open-source projects or tutorials.
The closest I have gotten is finding out when a frame is ready by using this:
public void onFrameAvailable(final int cameraId) {
    if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
        // Get the new RGB frame somehow.
    }
}
I am working with the Java API and I would very much like to not delve into JNI and the NDK if at all possible. How can I get the frame that most closely matches the timestamp of my current point cloud?
Thank you for your help.
Update:
I implemented a CPU version of it, and even after optimizing it a bit I only managed to get 0.5 FPS on a small point cloud. This is partly because the colours have to be converted from the Android-native NV21 colour format to the GPU-native RGBA format. I could have optimized it further, but I am not going to get a real-time effect with this; the CPU on the Android device simply cannot perform well enough. If you want to do this on more than a few thousand points, go for the extra hassle of using the GPU or do it in post-processing.
Tango normally delivers color pixel data directly to an OpenGLES texture. In Java, you create the destination texture and register it with Tango.connectTextureId(), then in the onFrameAvailable() callback you update the texture with Tango.updateTexture(). Once you have the color image in a texture, you can access it using OpenGLES drawing calls and shaders.
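In rough outline (a sketch; the method names are from memory of the Tango Java API and may differ slightly between SDK releases):

// Create a GL texture (bound as GL_TEXTURE_EXTERNAL_OES) and hand it to Tango.
int[] tex = new int[1];
GLES20.glGenTextures(1, tex, 0);
mTango.connectTextureId(TangoCameraIntrinsics.TANGO_CAMERA_COLOR, tex[0]);

// Later, on the GL thread, triggered from onFrameAvailable():
double frameTimestamp = mTango.updateTexture(TangoCameraIntrinsics.TANGO_CAMERA_COLOR);
// The texture now holds the latest color image; frameTimestamp can be matched
// against the point cloud timestamp.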
If your goal is to color a Tango point cloud, the most efficient way to do this is in the GPU. That is, instead of pulling the color image out of the GPU and accessing it in Java, you instead pass the point data into the GPU and use OpenGLES shaders to transform the 3D points into 2D texture coordinates and look up the colors from the texture. This is rather tricky to get right if you're doing it for the first time but may be required for acceptable performance.
If you really want direct access to pixel data without using the C API, you need to render the texture into a buffer and then read the color data from the buffer. It's kind of tricky if you aren't used to OpenGL and writing shaders, but there is an Android Studio app that demonstrates that here, further described in this answer. The project demonstrates both how to draw the camera texture to the screen and how to draw to an offscreen buffer and read back RGBA pixels.
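The read-back step itself boils down to something like this (sketch; it assumes the camera texture has already been rendered into an offscreen framebuffer of the given size):

// Read the RGBA pixels of the currently bound framebuffer into a ByteBuffer.
ByteBuffer pixels = ByteBuffer.allocateDirect(width * height * 4)
        .order(ByteOrder.nativeOrder());
GLES20.glReadPixels(0, 0, width, height,
        GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixels);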
If you really want direct access to pixel data but decide that the NDK might be less painful than OpenGLES, the C API has TangoService_connectOnFrameAvailable() which gives you pixel data directly, i.e. without going through OpenGLES. Note, however, that the format of the pixel data is NV21, not RGB or RGBA.
I am doing this now by capturing depth with onXYZijAvailable() and images with onFrameAvailable(). I am using native code, but the same should work in Java. For every onFrameAvailable() I get the image data and put it in a preallocated ring buffer. I have 10 slots and a counter/pointer. Each new image increments the counter, which loops back from 9 to 0. The counter is an index into an array of images. I save the image timestamp in a similar ring buffer. When I get a depth image, onXYZijAvailable(), I grab the data and the timestamp. Then I go back through the images, starting with the most recent and moving backwards, until I find the one with the closest timestamp to the depth data. As you mentioned, you know that the image data will not be from the same frame as the depth data because they use the same camera. But, using these two calls (in JNI) I get within +/- 33msec, i.e. the previous or next frame, on a consistent basis.
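A minimal Java sketch of that ring buffer and the closest-timestamp lookup (field and method names are placeholders):

// Ring buffer holding the last 10 color frames and their timestamps.
static final int SLOTS = 10;
byte[][] frames = new byte[SLOTS][];
double[] timestamps = new double[SLOTS];
int head = 0;

// Called for every new color frame.
void storeColorFrame(byte[] data, double timestamp) {
    frames[head] = data;              // in practice, copy into a preallocated slot
    timestamps[head] = timestamp;
    head = (head + 1) % SLOTS;
}

// Called with the depth timestamp from onXYZijAvailable();
// returns the slot whose image is closest in time, or -1 if none stored yet.
int closestFrame(double depthTimestamp) {
    int best = -1;
    double bestDiff = Double.MAX_VALUE;
    for (int i = 0; i < SLOTS; i++) {
        if (frames[i] == null) continue;
        double diff = Math.abs(timestamps[i] - depthTimestamp);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}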
I have not checked how close it would be to just naively use the most recently updated rgb image frame, but that should be pretty close.
Just make sure to use onXYZijAvailable() to drive the timing, because depth updates more slowly than RGB.
I have found that writing individual images to the file system using OpenCV::imwrite() does not keep up with the real time of the camera. I have not tried streaming to a file using the video codec. That should be much faster. Depending on what you plan to do with the data in the end you will need to be careful how you store your results.
I tried searching in a ton of places about doing this, with no results. I did read that the only way (as far as I know) to obtain image frames is to use an ImageReader, which gives me an Image to work with. However, a lot of work must be done before I have a nice enough image (converting the Image to a byte array, then converting between formats - YUV_420_888 to ARGB_8888 - using RenderScript, then turning it into a Bitmap and rotating it manually - or running the application in landscape mode). By this point a lot of processing has been done, and I haven't even started the actual processing yet (I plan on running some native code on it). Additionally, I tried lowering the resolution, with no success, and there is a significant delay when drawing on the surface.
Is there a better approach to this? Any help would be greatly appreciated.
I'm not sure what exactly you are doing with the images, but a lot of the time only a grayscale image is actually needed (again, depending on your exact goal). If your camera outputs YUV, the grayscale information is in the Y channel. The nice thing is that you don't need to convert between numerous color spaces, and working with only one layer (as opposed to three) greatly decreases the size of your data set.
If you need color images, then this wouldn't help.
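A sketch of pulling just the luminance plane out of a YUV_420_888 Image from the ImageReader (it ignores the row stride for simplicity; real code should honor planes[0].getRowStride()):

// Grab only the Y (grayscale) plane from the latest camera image.
Image image = reader.acquireLatestImage();
if (image != null) {
    ByteBuffer yBuffer = image.getPlanes()[0].getBuffer();
    byte[] gray = new byte[yBuffer.remaining()];
    yBuffer.get(gray);                // roughly one byte per pixel
    image.close();
    // process 'gray' here
}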