I'm designing an application that has an OpenGL processing pipeline (a collection of shaders) and that simultaneously requires the end user to see the unprocessed camera preview.
For the sake of example, suppose you want to show the user the camera preview and at the same time count the number of red objects in the scenes you receive from the camera, but the output of any shaders you use to count the objects (hue filtering, etc.) should not be visible to the user.
How would I go about setting this up properly?
I know I can set up a camera preview and receive the camera frame data in YUV format in the preview callback, then upload that into an OpenGL texture and process the frame that way. However, that has performance problems: the data has to round-trip from the camera hardware to the VM and then back to GPU memory. To avoid this, I'm using a SurfaceTexture to get the data from the camera directly in an OpenGL-understandable format and passing that to my shaders.
I thought I'd be able to show that same unprocessed SurfaceTexture to the end user, but TextureView does not have a constructor or a setter where I can pass it the SurfaceTexture I want it to render. It always creates its own.
This is an overview of my current setup:
GLRenderThread: this class extends Thread, sets up the OpenGL context, display, etc., and uses a SurfaceTexture as the surface (the third parameter of eglCreateWindowSurface).
GLFilterChain: A collection of shaders that perform detection on the input texture.
Camera: uses a separate SurfaceTexture, which serves as the input of GLFilterChain and receives the camera's preview frames.
Finally, a TextureView displays the GLRenderThread's SurfaceTexture.
Obviously, with this setup, I'm showing the processed frames to the user, which is not what I want. Further, the processing of the frames is not real-time: I run the input from the camera through the chain once, and only once all filters are done do I call updateTexImage to grab the next frame from the camera. My processing runs at around 10 frames per second on a Nexus 4.
I feel that I probably need to use 2 GL contexts, one for real-time preview and one for processing, but I'm not certain. I'm hoping someone can push me in the right direction.
Can you please post some of the code you are using?
You might be able to call glDrawArrays with the texture that is created for and bound to the surface view displaying the preview first, then flush, bind a separate texture, and do the analysis with that other texture. Something like:
// Draw the raw camera texture with a pass-through shader for the preview.
GLES20.glUseProgram(simpleProgram);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[0]);
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);

// Then run the analysis shaders against the second texture.
GLES20.glUseProgram(crazyProgram);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[1]);
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
where your camera's preview SurfaceTexture is bound to textures[0] and a separate SurfaceTexture is created for textures[1]
maybe?
Unless your processing runs slower than real time, the answer is simple: keep the original camera texture untouched, render the processed image into a different texture, and display both to the user side by side in a single GLView. Keep a single thread, as all the processing happens on the GPU anyway; multiple threads only complicate matters here.
The number of processing steps does not really matter, as there can be an arbitrary number of intermediate textures (see also ping-ponging) that are never displayed to the user; nothing forces you to display them.
The notion of real time is probably confusing here. Just think of a frame as an indivisible snapshot in time. Doing so ignores the delay it takes for the image to travel from the camera to the screen, but as long as you keep interactive frame rates (at least 20 frames per second), that delay can mostly be ignored.
On the other hand, if your processing is much slower, you need to choose between introducing a delay in the camera feed and processing only every Nth frame, or displaying every camera frame in real time and letting the next processed frame lag behind. Doing that would probably require two separate rendering contexts to enable asynchronous processing, which might be hard to do on Android (or maybe just as simple as creating a second GLView, since you can live without data sharing between the contexts).
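Concretely, a single renderer can do everything per frame in the single-context approach described above. This is only a sketch: cameraSurfaceTexture, cameraOesTextureId, processingFbo, processedTextureId, filterChain and drawTexturedQuad stand in for your own objects, not a specific API.

@Override
public void onDrawFrame(GL10 unused) {
    // Latch the newest camera frame into the external OES texture (the untouched copy).
    cameraSurfaceTexture.updateTexImage();

    // Run the detection chain offscreen, into an FBO-backed texture the user never sees.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, processingFbo);
    GLES20.glViewport(0, 0, fboWidth, fboHeight);
    filterChain.render(cameraOesTextureId);       // hue filtering etc. -> processedTextureId

    // Back to the window surface: raw preview on the left, processed result on the right.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
    GLES20.glViewport(0, 0, viewWidth / 2, viewHeight);
    drawTexturedQuad(cameraOesTextureId, true);   // true = sample as GL_TEXTURE_EXTERNAL_OES
    GLES20.glViewport(viewWidth / 2, 0, viewWidth / 2, viewHeight);
    drawTexturedQuad(processedTextureId, false);  // regular GL_TEXTURE_2D
}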
I am writing some code that adds a watermark to an already existing video using OpenGL.
I took most of the code from ContinuousCaptureActivity in Grafika - https://github.com/google/grafika/blob/master/app/src/main/java/com/android/grafika/ContinuousCaptureActivity.java
Instead of using the camera to render onto a SurfaceTexture, I use the MoviePlayer class, also from Grafika, and instead of rendering random boxes, I render the watermark.
I run MoviePlayer at full speed, i.e. reading from the source and rendering onto the surface as soon as a frame is decoded. This does the job for a 30 s video in 2 s or less.
Now the issue comes with the onFrameAvailable callback: it is called only once for every 4 or 5 frames rendered by the MoviePlayer class, which makes me lose frames in the output video. If I instead make the MoviePlayer thread sleep until the corresponding onFrameAvailable is called, everything is fine and no frames are missed, but processing my 30 s video then takes around 5 s.
My question is how do I make SurfaceTexture faster? Or is there some completely different approach that I have to look into?
Note that I do not need to render anything on the screen.
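For reference, the synchronization I use when making the MoviePlayer thread wait looks roughly like this (field names are mine, not Grafika's):

private final Object frameSyncObject = new Object();
private boolean frameAvailable = false;

@Override
public void onFrameAvailable(SurfaceTexture st) {
    synchronized (frameSyncObject) {
        frameAvailable = true;
        frameSyncObject.notifyAll();
    }
}

// Called on the decoding/GL thread after each frame is released to the SurfaceTexture.
private void awaitNewImage() throws InterruptedException {
    synchronized (frameSyncObject) {
        while (!frameAvailable) {
            frameSyncObject.wait();      // woken up by onFrameAvailable()
        }
        frameAvailable = false;
    }
    surfaceTexture.updateTexImage();     // latch the new frame before drawing the watermark
}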
I'm trying to understand graphics memory usage/flow in Android and specifically with respect to encoding frames from the camera using MediaCodec. In order to do that I'm having to understand a bunch of graphics, OpenGL, and Android terminology/concepts that are unclear to me. I've read the Android graphics architecture material, a bunch of SO questions, and a bunch of source but I'm still confused primarily because it seems that terms have different meanings in different contexts.
I've looked at CameraToMpegTest from fadden's site here. My specific question is how MediaCodec::createInputSurface() works in conjunction with Camera::setPreviewTexture(). It seems that an OpenGL texture is created and then this is used to create an Android SurfaceTexture which can then be passed to setPreviewTexture(). My specific questions:
1. What does calling setPreviewTexture() actually do in terms of which memory buffer the frames from the camera go to?
2. From my understanding, an OpenGL texture is a chunk of memory that is accessible by the GPU. On Android this has to be allocated using gralloc with the correct usage flags. The Android description of SurfaceTexture mentions that it allows you to "stream images to a given OpenGL texture": https://developer.android.com/reference/android/graphics/SurfaceTexture.html#SurfaceTexture(int). What does a SurfaceTexture do on top of an OpenGL texture?
3. MediaCodec::createInputSurface() returns an Android Surface. As I understand it, an Android Surface represents the producer side of a buffer queue, so it may be backed by multiple buffers. The API reference mentions that "the Surface must be rendered with a hardware-accelerated API, such as OpenGL ES". How do the frames captured by the camera get from the SurfaceTexture to the Surface that is the input to the encoder? I see CameraToMpegTest creates an EGLSurface using this Surface somehow, but not knowing much about EGL I don't get this part.
4. Can someone clarify the usage of "render"? I see things such as "render to a surface" and "render to the screen", among other usages that seem to mean different things.
Edit: Follow-up to mstorsjo's responses:
I dug into the code for SurfaceTexture and for CameraClient::setPreviewTarget() in CameraService to try to understand the inner workings of Camera::setPreviewTexture() better, and I have some more questions. Regarding my original question about memory allocation, it seems that SurfaceTexture creates a BufferQueue and that CameraService passes the associated IGraphicBufferProducer to the platform camera HAL implementation. The camera HAL can then set the gralloc usage flags appropriately (e.g. GRALLOC_USAGE_SW_READ_RARELY | GRALLOC_USAGE_SW_WRITE_NEVER | GRALLOC_USAGE_HW_TEXTURE) and also dequeue buffers from this BufferQueue. So the buffers that the camera captures frames into are gralloc-allocated buffers with special usage flags like GRALLOC_USAGE_HW_TEXTURE. I work on ARM platforms with unified memory architectures, where the GPU and CPU can access the same memory, so what impact does the GRALLOC_USAGE_HW_TEXTURE flag have on how the buffer is allocated?
The OpenGL (ES) part of SurfaceTexture seems to be implemented mainly in GLConsumer, and the magic seems to be in updateTexImage(). Are additional buffers allocated for the OpenGL (ES) texture, or can the same gralloc buffer that was filled by the camera be used directly? Does some memory copy have to happen here to get the camera pixel data from the gralloc buffer into the OpenGL (ES) texture? I guess I don't understand what calling updateTexImage() does.
It means that the camera provides the output frames via an opaque handle instead of in a user-provided buffer within the application's address space (as it would with setPreviewCallback or setPreviewCallbackWithBuffer). This opaque handle, the texture, can then be used in OpenGL drawing.
Almost. In this case, the OpenGL texture is not a physical chunk of memory, but a handle to a variable chunk of memory within an EGL context. The sample code itself doesn't actually allocate or size the texture; it only creates a "name"/handle for a texture using glGenTextures, which is basically just an integer. In normal OpenGL (ES), you'd then use OpenGL functions to allocate the actual storage for the texture and fill it with content. In this setup, SurfaceTexture provides an Android-level API/abstraction to populate the texture with data (i.e. allocate storage for it with the right flags, and provide it with a size and content), allowing you to pass the SurfaceTexture to other classes that can fill it with data (either Camera, which takes a SurfaceTexture directly, or by wrapping it in the Surface class so it can be used in other contexts). This allows the OpenGL texture to be filled with content efficiently, without having to pass a buffer of raw data to your application's process and have your app upload it to OpenGL.
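In code it boils down to something like this (a minimal sketch; variable names are illustrative):

// Create a texture "name" only; no storage is allocated here.
int[] textures = new int[1];
GLES20.glGenTextures(1, textures, 0);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[0]);

// Wrap it in a SurfaceTexture so a producer can fill it with content.
SurfaceTexture surfaceTexture = new SurfaceTexture(textures[0]);
camera.setPreviewTexture(surfaceTexture);             // Camera takes a SurfaceTexture directly
// or: Surface surface = new Surface(surfaceTexture); // for producers that want a Surface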
(Answering points 3 and 4 in reverse order.) OpenGL (ES) is a generic API for drawing. In the normal/original setup, say a game, you'd have a number of textures for different parts of the game content (backgrounds, props, actors, etc.), and then you'd draw these to the screen with OpenGL APIs. The textures could either be more or less copied as-is to the screen, or be wrapped around a 3D object built out of triangles. This process, taking the input textures and a set of triangles and drawing them, is what is called "rendering". In the simplest case, you render content straight to the screen, but the GPU can usually do the same rendering into any other output buffer as well. In games, it is common to render a scene into a texture and use that prerendered texture as part of the final render that actually ends up displayed on the screen.
An EGL context is created for passing the output from the camera into the encoder input. An EGL context is basically a context for doing OpenGL rendering. The target for the rendering is the Surface from the encoder; that is, whatever graphics are drawn using OpenGL end up in the encoder's input buffer instead of on the screen. The scene that is drawn could be any sequence of OpenGL calls, rendering e.g. a game scene into the encoder (this is what the Android Breakout game recorder example does). Within the context, a texture handle is created. Instead of filling the texture with content by loading a picture from disk, as in normal game rendering, it is turned into a SurfaceTexture, allowing Camera to fill it with the camera picture. The SurfaceTexture class provides a callback that signals when the Camera has updated the content. When this callback is received, the EGL context is made current and one frame is rendered into the EGL context's output target (which is the encoder input). The rendering itself doesn't do anything fancy; it more or less copies the input texture as-is straight into the output.
This might all sound quite roundabout, but it does give a few benefits:
The actual raw bits of the camera frames never need to be handled directly within the application code (and potentially never within the application's process and address space at all). For low resolutions, this isn't much of an issue, but the setPreviewCallback API is a bottleneck when it comes to higher resolutions.
You can do color adjustments and anything else you can do within OpenGL, almost for free with GPU acceleration.
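Sketched out, the flow described above looks roughly like this. EglEnv, prepareEgl, createExternalOesTexture and drawFullscreenQuad are hypothetical stand-ins for the EGL/GLES boilerplate in CameraToMpegTest, not real APIs:

Surface encoderInput = encoder.createInputSurface();       // MediaCodec in Surface input mode
EglEnv egl = prepareEgl(encoderInput);                      // eglCreateWindowSurface() on the encoder's Surface
egl.makeCurrent();

int oesTexId = createExternalOesTexture();                  // glGenTextures + bind as GL_TEXTURE_EXTERNAL_OES
SurfaceTexture cameraTexture = new SurfaceTexture(oesTexId);
camera.setPreviewTexture(cameraTexture);                    // camera fills the gralloc buffers behind this texture
camera.startPreview();

// For every "frame available" signal (handled on the thread that owns the EGL context):
cameraTexture.updateTexImage();                             // latch the newest camera buffer into oesTexId
drawFullscreenQuad(oesTexId);                               // the "render": copy the texture into the encoder surface
EGLExt.eglPresentationTimeANDROID(egl.display, egl.surface, cameraTexture.getTimestamp());
EGL14.eglSwapBuffers(egl.display, egl.surface);             // hands the rendered frame to the encoder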
I want to extend an app from Camera1 to Camera2 depending on the API level. One core mechanism of the app consists in taking preview pictures at a rate of about 20 pictures per second. With Camera1 I implemented this by creating a SurfaceView, adding a Callback to its holder, and, once the surface was created, grabbing the preview pictures via periodic setOneShotPreviewCallback calls. That was pretty easy and reliable.
Now, when studying Camera2, I came "from the end" and managed to convert YUV420_888 to Bitmap (see YUV420_888 to Bitmap Conversion). However, I am now struggling with the "capture technique". From the Google example I see that you need to make a "setRepeating" CaptureRequest with CameraDevice.TEMPLATE_PREVIEW for displaying the preview, e.g. on a SurfaceView. That is fine. However, in order to take an actual picture I need to make another capture request with (this time) builder.addTarget(imageReader.getSurface()), i.e. the data will be available within the onImageAvailable method of the ImageReader.
The problem: creating the capture request is a rather heavy operation, taking about 200 ms on my device. Therefore, issuing a capture request (whether with template STILL_CAPTURE or PREVIEW) cannot be a feasible approach for capturing 20 images per second, which is what I need. The proposals I found here on SO are primarily based on the (educationally only moderately effective) Google example, which I don't really understand...
I feel the solution must be to feed the ImageReader with a continuous stream of preview pictures, which can be picked from there at a given frequency. Can someone please give some guidance on how to implement this? Many thanks.
If you want to send a buffer to both the preview SurfaceView and to your YUV ImageReader for every frame, simply add both Surfaces to the repeating preview request as targets.
Generally, a capture request can target any subset (or all) of the session's configured output targets.
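A minimal sketch, assuming the capture session was configured with both surfaces and that previewSurface, yuvImageReader, captureSession and backgroundHandler already exist from the usual Camera2 setup:

CaptureRequest.Builder builder =
        cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
builder.addTarget(previewSurface);                // keeps the on-screen preview running
builder.addTarget(yuvImageReader.getSurface());   // every frame also arrives in onImageAvailable()
captureSession.setRepeatingRequest(builder.build(), null, backgroundHandler);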
Also, if you do want to only capture an occasional frame to your YUV ImageReader with .capture(), you don't have to recreate the capture request builder each time; just call .build() again on the same builder, or just reuse the actual constructed CaptureRequest if you're not changing any settings.
Even with this occasional capture, you probably want to include the preview Surface as a target in the YUV capture request, so that there's no skipped frame in the displayed preview.
I am trying to produce a point cloud where each point has a colour. I can get just the point cloud or I can get the camera to take a picture, but I need them to be as simultaneous as possible. If I could look up an RGB image with a timestamp or call a function to get the current frame when onXYZijAvailable() is called I would be done. I could just go over the points, find out where it would intersect with the image plane and get the colour of that pixel.
As it is now I have not found any way to get the pixel info of an image or get coloured points. I have seen AR apps where the camera is connected to the CameraView and then things are rendered on top, but the camera stream is never touched by the application.
According to this post it should be possible to get the data I want and synchronize the point cloud and the image plane with a simple transformation. This post also says something similar. However, I have no idea how to get the RGB data, and I can't find any open-source projects or tutorials.
The closest I have gotten is finding out when a frame is ready by using this:
public void onFrameAvailable(final int cameraId) {
    if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
        // Get the new RGB frame somehow.
    }
}
I am working with the Java API and I would very much like to not delve into JNI and the NDK if at all possible. How can I get the frame that most closely matches the timestamp of my current point cloud?
Thank you for your help.
Update:
I implemented a CPU version of it, and even after optimising it a bit I only managed to get 0.5 FPS on a small point cloud. This is partly because the colours have to be converted from the Android-native NV21 colour space to the GPU-native RGBA colour space. I could have optimised it further, but I am not going to get a real-time effect with this; the CPU on the Android device simply cannot perform well enough. If you want to do this for more than a few thousand points, go for the extra hassle of using the GPU, or do it in post.
Tango normally delivers color pixel data directly to an OpenGLES texture. In Java, you create the destination texture and register it with Tango.connectTextureId(), then in the onFrameAvailable() callback you update the texture with Tango.updateTexture(). Once you have the color image in a texture, you can access it using OpenGLES drawing calls and shaders.
If your goal is to color a Tango point cloud, the most efficient way to do this is in the GPU. That is, instead of pulling the color image out of the GPU and accessing it in Java, you instead pass the point data into the GPU and use OpenGLES shaders to transform the 3D points into 2D texture coordinates and look up the colors from the texture. This is rather tricky to get right if you're doing it for the first time but may be required for acceptable performance.
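To make the idea concrete, the shaders would look something along these lines. The uniform names and the depth-to-color projection matrix are placeholders; deriving that matrix correctly from the Tango pose and color camera intrinsics is the tricky part and is not shown here.

static final String VERTEX_SHADER =
        "uniform mat4 uMvpMatrix;        // point cloud -> clip space for display\n" +
        "uniform mat4 uDepthToColorProj; // point cloud -> color image plane\n" +
        "attribute vec4 aPosition;\n" +
        "varying vec2 vTexCoord;\n" +
        "void main() {\n" +
        "  gl_Position = uMvpMatrix * aPosition;\n" +
        "  gl_PointSize = 3.0;\n" +
        "  vec4 p = uDepthToColorProj * aPosition;\n" +
        "  vTexCoord = (p.xy / p.w) * 0.5 + 0.5;  // NDC -> [0,1] texture coords\n" +
        "}\n";

static final String FRAGMENT_SHADER =
        "#extension GL_OES_EGL_image_external : require\n" +
        "precision mediump float;\n" +
        "uniform samplerExternalOES uColorTexture; // the texture updateTexture() fills\n" +
        "varying vec2 vTexCoord;\n" +
        "void main() {\n" +
        "  gl_FragColor = texture2D(uColorTexture, vTexCoord);\n" +
        "}\n";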
If you really want direct access to pixel data without using the C API, you need to render the texture into a buffer and then read the color data from the buffer. It's kind of tricky if you aren't used to OpenGL and writing shaders, but there is an Android Studio app that demonstrates that here, and it is further described in this answer. The project demonstrates both how to draw the camera texture to the screen and how to draw to an offscreen buffer and read RGBA pixels.
If you really want direct access to pixel data but decide that the NDK might be less painful than OpenGLES, the C API has TangoService_connectOnFrameAvailable() which gives you pixel data directly, i.e. without going through OpenGLES. Note, however, that the format of the pixel data is NV21, not RGB or RGBA.
I am doing this now by capturing depth with onXYZijAvailable() and images with onFrameAvailable(). I am using native code, but the same should work in Java. For every onFrameAvailable() I get the image data and put it in a preallocated ring buffer. I have 10 slots and a counter/pointer. Each new image increments the counter, which loops back from 9 to 0. The counter is an index into an array of images. I save the image timestamp in a similar ring buffer. When I get a depth image, onXYZijAvailable(), I grab the data and the timestamp. Then I go back through the images, starting with the most recent and moving backwards, until I find the one with the closest timestamp to the depth data. As you mentioned, you know that the image data will not be from the same frame as the depth data because they use the same camera. But, using these two calls (in JNI) I get within +/- 33msec, i.e. the previous or next frame, on a consistent basis.
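In rough Java, the matching looks like this (field names are illustrative; a simple linear scan over the ten slots is plenty fast):

static final int SLOTS = 10;
byte[][] imageRing = new byte[SLOTS][];      // most recent camera images
double[] timestampRing = new double[SLOTS];  // their timestamps
int writeIndex = 0;

void onColorFrame(byte[] pixels, double timestamp) {   // called from onFrameAvailable()
    imageRing[writeIndex] = pixels;
    timestampRing[writeIndex] = timestamp;
    writeIndex = (writeIndex + 1) % SLOTS;
}

int findClosestImage(double depthTimestamp) {          // called from onXYZijAvailable()
    int best = -1;
    double bestDiff = Double.MAX_VALUE;
    for (int i = 0; i < SLOTS; i++) {
        if (imageRing[i] == null) continue;
        double diff = Math.abs(timestampRing[i] - depthTimestamp);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;    // index of the image closest in time to the depth data
}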
I have not checked how close it would be to just naively use the most recently updated rgb image frame, but that should be pretty close.
Just make sure to use the onXYZijAvailable() to drive the timing, because depth updates more slowly than rgb.
I have found that writing individual images to the file system using OpenCV::imwrite() does not keep up with the real time of the camera. I have not tried streaming to a file using the video codec. That should be much faster. Depending on what you plan to do with the data in the end you will need to be careful how you store your results.
My Problem:
I have a video (let's say 25 FPS) that has to be rendered on the screen with OpenGL ES 2.0.
For reading the video I use a decoder that decodes it into OpenGL ES textures. With a render pass I draw this texture on the screen.
What I have to do is get the image from the decoder, upload it to the GPU, call the shader program and render the image on the screen. If the video has 25 FPS, I have to update the screen in 40 ms steps (1000 ms / 25 FPS).
In each step I have to do the following:
get the image from the decoder
push it to the gpu memory
render the screen
swap buffers
So far it is working.
Now it happens that the decoder sometimes takes longer than 40 ms to decode a frame. It does not happen all the time, but it does happen.
A solution would be to build a cache, meaning I render, say, 5 images before showing the first one. The problem is that this has to happen asynchronously, so that the cache can be built while the screen is being rendered; if it doesn't, you can see it in the video, because playback is no longer "fluid".
My Question:
Is there a solution for that?
Is it possible to create some kind of buffer that can be copied onto the back buffer of the render surface, so that I can build a cache of such buffers and copy them onto the back buffer without blocking the other thread that is creating them?
OR
How can I fill the back buffer with another buffer?
What I have already tried:
Rendering framebuffer textures as a cache. This works almost perfectly, except that the cached texture has to be rendered as well. Because it is asynchronous, if a cache frame is being built at the same time as the image for the screen, you have to synchronize the render methods with a mutex, otherwise the program crashes. But synchronizing defeats the whole point of doing it asynchronously, so this is not a good solution.
Remember that in OpenGL, if you do not clear and redraw the screen, the previous image will persist. If a new frame is not ready in time, simply do nothing.
It sounds like you have two threads: one decoding frames, and one rendering them. This is fine.
If render() is called and a new frame is not ready in time, your render method should return immediately. Do not clear or swap buffers. The screen will be preserved.
Now, the user may notice occasional hiccups when a frame is repeated twice. 25 fps is an awkward frame rate for a typical 60 Hz display (which divides evenly only into 60/30/15/etc.), so it will not align perfectly with the screen refresh rate.
You could live with this (user likely won't notice). Or you could force playback to 30 fps by buffering frames.
A good idea is to place a message queue between your decoder and your renderer. It could be one or several frames deep, and it could be an array, a linked list, or a ring buffer. This allows the decoder to upload into many cached textures while the renderer is drawing a different texture.
The decoder adds frames to the queue as they come in. The renderer runs at a fixed rate (30 fps). You could pause rendering until N frames have been buffered.
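A minimal sketch of that hand-off, assuming the decoder has already uploaded each frame into its own texture; drawFullscreenQuad and swapBuffers are placeholders for your existing draw/present code:

BlockingQueue<Integer> frameQueue = new ArrayBlockingQueue<>(5);   // the frame cache

// Decoder thread: put() blocks when the cache is full, so it never runs too far ahead.
void onFrameDecoded(int textureId) throws InterruptedException {
    frameQueue.put(textureId);
}

// Render thread, driven at the display rate:
void render() {
    Integer textureId = frameQueue.poll();   // non-blocking
    if (textureId == null) {
        return;                              // no new frame: don't clear, don't swap, keep the last image
    }
    drawFullscreenQuad(textureId);
    swapBuffers();                           // only swap when a new frame was actually drawn
}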