We are trying to decode AVC/H.264 bitstreams using the new NdkMediaCodec API. While decoding works fine now, we are struggling to get the contents of the decoded
video frame mapped to GLES2 for rendering.
The API allows passing an ANativeWindow at configuration time, but we want to control the scheduling of video rendering and ultimately just provide N textures which are filled
with the decoded frame data.
All attempts to map the memory returned by getOutputBuffer() to GLES via eglCreateImageKHR/external images have failed. NdkMediaCodec seems to use libstagefright/OMX internally,
so the output buffers are very likely allocated using gralloc - aren't they? Is there a way to get the gralloc handle/GraphicBuffer to bind the frame to EGL/GLES2?
Since there are lots of pixel formats for media frames without any further documentation on their memory layout, it's hard to use NdkMediaCodec robustly.
Thanks a lot for any hints!
For general MediaCodec in Java, create a SurfaceTexture for the GL ES texture you want the data in, then create a Surface from this SurfaceTexture, and use it as the target for the MediaCodec decoder. See http://bigflake.com/mediacodec/ (e.g. EncodeDecodeTest) for an example of doing this.
The SurfaceTexture and Surface classes aren't directly available in the NDK right now (as far as I know), though, so you'll need to call them via JNI. Then you can create an ANativeWindow from the Surface using ANativeWindow_fromSurface.
You're right that the output buffers are gralloc buffers, but since there are public APIs for doing this, it's safer to rely on those than to take shortcuts.
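To make the Java-side flow concrete, here is a minimal sketch of the setup described above. It assumes an EGL context is already current on the calling thread and that the MediaFormat came from MediaExtractor; the class and method names are placeholders, not part of any real API.

```java
import android.graphics.SurfaceTexture;
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.opengl.GLES11Ext;
import android.opengl.GLES20;
import android.view.Surface;

import java.io.IOException;

public class DecoderSetupSketch {
    // Hypothetical helper: wires a decoder to a GL ES external texture.
    // Assumes an EGL context is already current on this thread.
    public static MediaCodec createDecoder(MediaFormat videoFormat)
            throws IOException {
        // 1. Generate a texture name for the external texture.
        int[] texIds = new int[1];
        GLES20.glGenTextures(1, texIds, 0);
        GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, texIds[0]);

        // 2. Wrap it in a SurfaceTexture, then in a Surface.
        SurfaceTexture surfaceTexture = new SurfaceTexture(texIds[0]);
        Surface surface = new Surface(surfaceTexture);

        // 3. Hand the Surface to the decoder at configure time.
        MediaCodec decoder = MediaCodec.createDecoderByType(
                videoFormat.getString(MediaFormat.KEY_MIME));
        decoder.configure(videoFormat, surface, null, 0);
        decoder.start();

        // Per frame: releaseOutputBuffer(index, true) renders the frame,
        // then surfaceTexture.updateTexImage() latches it into the texture.
        return decoder;
    }
}
```

From the NDK side, the SurfaceTexture and Surface objects created here would be reached via JNI, and the Surface turned into an ANativeWindow with ANativeWindow_fromSurface.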
Related
I am currently trying to develop a video player on Android, but am struggling with color formats.
Context: I extract and decode a video through the standard combination of MediaExtractor/MediaCodec. Because I need the extracted frames to be available as OpenGLES textures (RGB), I set up my decoder (MediaCodec) so that it feeds an external GLES texture (GL_TEXTURE_EXTERNAL_OES) through a SurfaceTexture. I know the data output by my HW decoder is in the NV12 (YUV420SemiPlanar) format, and I need to convert it to RGB by rendering it (with a fragment shader doing the conversion).
MediaCodec ---> GLES External Texture (NV12) [1] ---> Rendering ---> GLES Texture (RGB)
The point where I struggle is: how do I access the specific Y, U, and V values contained in the GLES external texture ([1])? I have no idea how the GLES texture memory is laid out, nor how to access it (except for the "texture()" and "texelFetch()" GLSL functions).
Is there a way to access the data as I would access a simple array (pointer + offset)?
Did I overthink the whole thing?
Do either Surface or SurfaceTexture take care of conversions? (I don't think so)
Do either Surface or SurfaceTexture change the memory layout of the data while populating the GLES External Texture ([1]) so components can be accessed through GLES texture access functions?
Yes, I would say you're overthinking it. Did you test things and run into an actual issue that you could describe, or is this only theoretical so far?
Even though the raw decoder itself outputs NV12, this detail is hidden when you access it via a SurfaceTexture - then you get to access it as any RGB texture. Since the physical memory layout of the texture is hidden, you don't really know if it actually is converted all at once before you get it, or if the texture accessors do an on-the-fly conversion each time you sample it. As far as I know, the implementation is free to do it in either of these ways, and the implementation details of how it is done are not observable through the public APIs at all.
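Concretely, the fragment shader that samples a SurfaceTexture-backed texture contains no YUV math at all: it only declares the sampler as external and reads back RGBA. A sketch of such a shader, embedded as a Java string as is common in GLES2 code (the varying name is a placeholder):

```java
public class ExternalTextureShader {
    // Fragment shader for a SurfaceTexture-backed (external) texture.
    // Note there is no NV12/YUV handling here: whatever conversion is
    // needed happens behind the samplerExternalOES abstraction.
    public static final String FRAGMENT_SHADER =
            "#extension GL_OES_EGL_image_external : require\n"
            + "precision mediump float;\n"
            + "varying vec2 vTexCoord;\n"
            + "uniform samplerExternalOES sTexture;\n"
            + "void main() {\n"
            + "    gl_FragColor = texture2D(sTexture, vTexCoord);\n"
            + "}\n";
}
```

The only differences from an ordinary 2D-texture shader are the extension directive and the samplerExternalOES type; the texture2D() call is unchanged.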
I'm trying to understand graphics memory usage/flow in Android and specifically with respect to encoding frames from the camera using MediaCodec. In order to do that I'm having to understand a bunch of graphics, OpenGL, and Android terminology/concepts that are unclear to me. I've read the Android graphics architecture material, a bunch of SO questions, and a bunch of source but I'm still confused primarily because it seems that terms have different meanings in different contexts.
I've looked at CameraToMpegTest from fadden's site here. My specific question is how MediaCodec::createInputSurface() works in conjunction with Camera::setPreviewTexture(). It seems that an OpenGL texture is created and then this is used to create an Android SurfaceTexture which can then be passed to setPreviewTexture(). My specific questions:
What does calling setPreviewTexture() actually do in terms of what memory buffer the frames go to from the camera?
From my understanding an OpenGL texture is a chunk of memory that is accessible by the GPU. On Android this has to be allocated using gralloc with the correct usage flags. The Android description of SurfaceTexture mentions that it allows you to "stream images to a given OpenGL texture": https://developer.android.com/reference/android/graphics/SurfaceTexture.html#SurfaceTexture(int). What does a SurfaceTexture do on top of an OpenGL texture?
MediaCodec::createInputSurface() returns an Android Surface. As I understand it an Android Surface represents the producer side of a buffer queue so it may be multiple buffers. The API reference mentions that "the Surface must be rendered with a hardware-accelerated API, such as OpenGL ES". How do the frames captured by the camera get from the SurfaceTexture to this Surface that is input to the encoder? I see CameraToMpegTest creates an EGLSurface using this Surface somehow but not knowing much about EGL I don't get this part.
Can someone clarify the usage of "render"? I see things such as "render to a surface", "render to the screen" among other usages that seem to maybe mean different things.
Edit: Follow-up to mstorsjo's responses:
I dug into the code for SurfaceTexture and CameraClient::setPreviewTarget() in CameraService some more to try to understand the inner workings of Camera::setPreviewTexture() better, and have some more questions. On my original question about understanding the memory allocation: it seems that SurfaceTexture creates a BufferQueue, and CameraService passes the associated IGraphicBufferProducer to the platform camera HAL implementation. The camera HAL can then set the gralloc usage flags appropriately (e.g. GRALLOC_USAGE_SW_READ_RARELY | GRALLOC_USAGE_SW_WRITE_NEVER | GRALLOC_USAGE_HW_TEXTURE) and also dequeue buffers from this BufferQueue. So the buffers that the camera captures frames into are gralloc-allocated buffers with some special usage flags like GRALLOC_USAGE_HW_TEXTURE. I work on ARM platforms with unified memory architectures, where the GPU and CPU can access the same memory, so what kind of impact would the GRALLOC_USAGE_HW_TEXTURE flag have on how the buffer is allocated?
The OpenGL (ES) part of SurfaceTexture seems to mainly be implemented as part of GLConsumer and the magic seems to be in updateTexImage(). Are there additional buffers being allocated for the OpenGL (ES) texture or is the same gralloc buffer that was filled by the camera able to be used? Is there some memory copying that has to happen here to get the camera pixel data from the gralloc buffer into the OpenGL (ES) texture? I guess I don't understand what calling updateTexImage() does.
It means that the camera provides the output frames via an opaque handle instead of in a user-provided buffer within the application's address space (if using setPreviewCallback or setPreviewCallbackWithBuffer). This opaque handle, the texture, can be used within OpenGL drawing.
Almost. In this case, the OpenGL texture is not a physical chunk of memory, but a handle to a variable chunk of memory within an EGL context. In this case, the sample code itself doesn't actually allocate or size the texture; it only creates a "name"/handle for a texture using glGenTextures - it's basically just an integer. Within normal OpenGL (ES), you'd use OpenGL functions to allocate the actual storage for the texture and fill it with content. In this setup, SurfaceTexture provides an Android-level API/abstraction to populate the texture with data (i.e. allocate storage for it with the right flags, and provide it with a size and content), allowing you to pass the SurfaceTexture to other classes that can fill it with data (either Camera, which takes a SurfaceTexture directly, or wrapped in the Surface class to be usable in other contexts). This allows filling the OpenGL texture with content efficiently, without having to pass a buffer of raw data to your application's process and having your app upload it to OpenGL.
(Answering points 3 and 4 in reverse order.) OpenGL (ES) is a generic API for drawing. In the normal/original setup, consider a game: you'd have a number of textures for different parts of the game content (backgrounds, props, actors, etc.), and then with OpenGL APIs draw these to the screen. The textures could either be more or less copied as-is to the screen, or be wrapped around a 3D object built out of triangles. This is the process called "rendering": taking the input textures and set of triangles and drawing them. In the simplest cases, you would render content straight to the screen. The GPU can usually do the same rendering into any other output buffer as well. In games, it is common to render some scene into a texture, and use that prerendered texture as part of the final render which actually ends up displayed on the screen.
An EGL context is created for passing the output from the camera into the encoder input. An EGL context is basically a context for doing OpenGL rendering. The target for the rendering is the Surface from the encoder. That is, whatever graphics are drawn using OpenGL end up in the encoder input buffer instead of on the screen. Now the scene that is drawn using OpenGL could be any sequence of OpenGL function calls, rendering a game scene into the encoder. (This is what the Android Breakout game recorder example does.) Within the context, a texture handle is created. Instead of filling the texture with content by loading a picture from disk, as in normal game graphics rendering, this is made into a SurfaceTexture, to allow Camera to fill it with the camera picture. The SurfaceTexture class provides a callback, giving a signal when the Camera has updated the content. When this callback is received, the EGL context is activated and one frame is rendered into the EGL context output target (which is the encoder input). The rendering itself doesn't do anything fancy, but more or less copies the input texture as-is straight into the output.
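The per-frame loop just described can be sketched as follows. The identifiers (drawFullScreenQuad, the EGL objects) are placeholders, and in real code the callback must run on a thread where the EGL context is current; see CameraToMpegTest on bigflake for the complete version.

```java
import android.graphics.SurfaceTexture;
import android.opengl.EGL14;
import android.opengl.EGLDisplay;
import android.opengl.EGLExt;
import android.opengl.EGLSurface;

public class CameraToEncoderSketch {
    private final float[] texMatrix = new float[16];

    // Assumes: eglSurface was created from the encoder's input Surface
    // (MediaCodec.createInputSurface()), and surfaceTexture wraps the
    // external GL texture handed to Camera.setPreviewTexture().
    public void hookUp(SurfaceTexture surfaceTexture,
                       EGLDisplay eglDisplay, EGLSurface eglSurface) {
        surfaceTexture.setOnFrameAvailableListener(st -> {
            st.updateTexImage();              // latch camera frame into texture
            st.getTransformMatrix(texMatrix); // per-frame texture transform
            drawFullScreenQuad(texMatrix);    // "render": copy texture to output
            EGLExt.eglPresentationTimeANDROID(eglDisplay, eglSurface,
                    st.getTimestamp());       // timestamp for the encoded frame
            EGL14.eglSwapBuffers(eglDisplay, eglSurface); // submit to encoder
        });
    }

    private void drawFullScreenQuad(float[] texMatrix) {
        // Placeholder: issue the GLES draw calls for a textured quad here.
    }
}
```

The eglSwapBuffers call is what actually pushes the rendered frame onto the encoder's input queue; eglPresentationTimeANDROID sets the timestamp the encoder will attach to it.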
This might all sound quite roundabout, but it does give a few benefits:
The actual raw bits of the camera frames never need to be handled directly within the application code (and potentially never within the application's process and address space at all). For low resolutions, this isn't much of an issue, but the setPreviewCallback API is a bottleneck when it comes to higher resolutions.
You can do color adjustments and anything else you can do within OpenGL, almost for free with GPU acceleration.
I am using MediaCodec's decoder to output data to a surface. Using the .configure function, I passed a surface created through SurfaceComposerClient. The problem is that the codec fails to start. I presume this is an issue with the way my surface is set up (when I set the surface to NULL, the codec starts).
Looking at MediaCodec decoder java examples it seems like I need to create an EGL backed SurfaceTexture. Is it possible to natively create a surface texture using C++/NDK? Are there any examples out there of this?
I assume this is not a "normal" app, since you're interacting with SurfaceFlinger directly.
You can find examples in some internal OpenGL tests -- the code was fixed up for the 5.0 Lollipop release. Take a look at the San Angeles demo, which uses the WindowSurface class to get a surface from SurfaceComposerClient.
You don't need a SurfaceTexture, or do anything with EGL, to decode a video to a surface. Surfaces have a producer-consumer structure, and EGL and MediaCodec are two different examples of producers. (SurfaceFlinger is the consumer.)
It's never easy to know why MediaCodec is failing. You can try drawing on the surface with GLES to see if it's valid, but my guess is that your problem is elsewhere.
For a SurfaceTexture, the app is both the producer and the consumer; it provides a way to decode the video to a surface that you can then manipulate as a GLES texture. This adds unnecessary overhead if you just want the video to play on screen.
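As the answer notes, plain decode-to-surface involves no SurfaceTexture or EGL at all. A hedged Java-side sketch of the equivalent flow (the NDK's AMediaCodec API mirrors these calls; input feeding is elided, and the class name is a placeholder):

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.view.Surface;

import java.io.IOException;

public class DecodeToSurfaceSketch {
    // "surface" can come from a SurfaceView, a SurfaceTexture, or (as in
    // the question) directly from the window compositor.
    public static void decodeLoop(MediaFormat format, Surface surface)
            throws IOException {
        MediaCodec decoder = MediaCodec.createDecoderByType(
                format.getString(MediaFormat.KEY_MIME));
        decoder.configure(format, surface, null, 0); // no EGL involved
        decoder.start();

        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        while (true) {
            // (input-side queueInputBuffer calls elided)
            int index = decoder.dequeueOutputBuffer(info, 10_000);
            if (index >= 0) {
                // "true" renders the frame to the surface on release.
                decoder.releaseOutputBuffer(index, true);
            }
            if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                break;
            }
        }
        decoder.stop();
        decoder.release();
    }
}
```

The decoder acts as the surface's producer here; releaseOutputBuffer(index, true) is the only call needed to display a frame.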
Refer to SimplePlayer.h/.cpp in the Android 4.4 source code. It's used to decode a media file and output the decoded video to a surface. I think it is similar to your scenario.
The example DecodeEditEncodeTest.java on bigflake.com demonstrates simple editing (swapping the color channels using an OpenGL fragment shader).
Here, I want to do some complicated image processing (such as adding something in it) of each frame.
In this case, does it mean I cannot use a surface, and instead need to use buffers?
But from EncodeDecodeTest.java, it says:
(1) Buffer-to-buffer. Buffers are software-generated YUV frames in ByteBuffer objects, and decoded to the same. This is the slowest (and least portable) approach, but it allows the application to examine and modify the YUV data.
(2) Buffer-to-surface. Encoding is again done from software-generated YUV data in ByteBuffers, but this time decoding is done to a Surface. Output is checked with OpenGL ES, using glReadPixels().
(3) Surface-to-surface. Frames are generated with OpenGL ES onto an input Surface, and decoded onto a Surface. This is the fastest approach, but may involve conversions between YUV and RGB.
If I use Buffer-to-buffer, from what the above says, it is the slowest and least portable. How slow would it be?
Or should I use surface-to-surface and read the pixels out from the surface?
Which way is more feasible?
Any example available?
When I decode a video to a surface, I want to save selected frames as bitmap/JPEG files. I don't want to draw on the screen; I just want to save the content of the SurfaceTexture as an image file.
You have to render the texture.
If it were a normal texture, and you were using GLES 2 or later, you could attach it to an FBO and read directly from that. A SurfaceTexture is backed by an "external texture", and might be in a format that the GL driver doesn't support a full set of operations on, so you can't do that. You need to render it, and read the result.
FWIW, the way you go about saving the frame can have a significant performance impact. A full example demonstrating the use of MediaExtractor, MediaCodec, glReadPixels(), and PNG file creation is now up on bigflake (ExtractMpegFramesTest).
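The read-back path used by examples like ExtractMpegFramesTest can be sketched like this (the method name and output path are placeholders; assumes the frame was just rendered into the current EGL surface, e.g. a pbuffer):

```java
import android.graphics.Bitmap;
import android.opengl.GLES20;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FrameSaverSketch {
    // Reads the current EGL surface back as RGBA8888 and writes a PNG.
    public static void saveFrame(String path, int width, int height)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(width * height * 4);
        buf.order(ByteOrder.LITTLE_ENDIAN);
        GLES20.glReadPixels(0, 0, width, height,
                GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, buf);
        buf.rewind();

        Bitmap bmp = Bitmap.createBitmap(width, height,
                Bitmap.Config.ARGB_8888);
        bmp.copyPixelsFromBuffer(buf);
        try (FileOutputStream out = new FileOutputStream(path)) {
            // Note: GL's origin is bottom-left, so flip during rendering
            // (or post-process the bitmap) to get an upright image.
            bmp.compress(Bitmap.CompressFormat.PNG, 100, out);
        }
        bmp.recycle();
    }
}
```

The glReadPixels call is the slow part; PNG compression also adds noticeable time per frame, which is why the choice of output format matters for performance.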
I've been looking at this lately, on the Android platform. Summing up the various options and why they are/aren't applicable.
glReadPixels()
The only option Android Java coders currently really have. Said to be slow. It reads from a framebuffer, not a texture (so one must render the texture to an internal framebuffer first, unless one wants to record the screen itself). Okay; got things to work.
EGL_KHR_image_base()
An extension that seems to be available at the native (NDK) level, but not in Java.
glGetTexImage()
Looked promising, but not available in the OpenGL ES 2.0 variant.
Pixel Buffer Objects
Probably the 'right' way to do things, but requires OpenGL ES 3.0 (i.e. selected Android 4.3+ devices).
I'm not saying this adds any info that wouldn't be available elsewhere, but having so many seemingly similar options (that still wouldn't work) was confusing. I'm not an OpenGL expert, so any mistakes above are gladly corrected.