Android Camera2 video aspect ratio

I don't believe the following has been explicitly answered:
Using the Camera2 API, is it possible to record outside a native device resolution (say, 1:1) without the use of post-processing such as ffmpeg?

You haven't given a lot of detail about exactly what resolution you want to use, but it sounds like you want to record at an aspect ratio/resolution that doesn't match one of the supported aspect ratios/resolutions of the device's camera.
You can always send the image data to the GPU, do whatever cropping/scaling you want there, and then send it on to a video encoder.
Generally, that'd involve:
Setting up an EGL context
Creating a render target in EGL, and getting an Android Surface for it
Creating a SurfaceTexture to send texture data to EGL
Connect EGL input SurfaceTexture to the camera (by first creating a Surface from it, if using camera2)
Connect EGL output Surface to a MediaRecorder
Write shader code to process the incoming frames from camera, crop/scale/modify as needed, then render them to the output surface
Set up camera and media recorder, then start them both
In the SurfaceTexture onFrameAvailable, trigger your rendering code.
This is quite a bit of scaffolding to build, but should be rather efficient when done.
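To make the steps above concrete, here is a condensed sketch of the setup, assuming API 18+ with camera2 feeding a MediaRecorder whose video source is SURFACE. Error checking, the shader program, and the camera2 session plumbing are omitted, and names like CroppingRecorderPipeline and the 1440x1440 buffer size are purely illustrative:

    import android.graphics.SurfaceTexture;
    import android.media.MediaRecorder;
    import android.opengl.*;
    import android.view.Surface;

    class CroppingRecorderPipeline {
        // 0x3142 is EGL_RECORDABLE_ANDROID (EGLExt.EGL_RECORDABLE_ANDROID on API 26+).
        private static final int EGL_RECORDABLE_ANDROID = 0x3142;

        private EGLDisplay eglDisplay;
        private EGLContext eglContext;
        private EGLSurface encoderEglSurface;
        private SurfaceTexture cameraTexture;
        private Surface cameraSurface;   // add this as a target of the camera2 CaptureRequest
        private int externalTexId;

        void setUp(MediaRecorder recorder) {
            // 1) EGL context.
            eglDisplay = EGL14.eglGetDisplay(EGL14.EGL_DEFAULT_DISPLAY);
            int[] version = new int[2];
            EGL14.eglInitialize(eglDisplay, version, 0, version, 1);
            int[] cfgAttribs = {
                    EGL14.EGL_RED_SIZE, 8, EGL14.EGL_GREEN_SIZE, 8, EGL14.EGL_BLUE_SIZE, 8,
                    EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
                    EGL_RECORDABLE_ANDROID, 1,    // frames must be consumable by the encoder
                    EGL14.EGL_NONE };
            EGLConfig[] configs = new EGLConfig[1];
            int[] num = new int[1];
            EGL14.eglChooseConfig(eglDisplay, cfgAttribs, 0, configs, 0, 1, num, 0);
            int[] ctxAttribs = { EGL14.EGL_CONTEXT_CLIENT_VERSION, 2, EGL14.EGL_NONE };
            eglContext = EGL14.eglCreateContext(eglDisplay, configs[0],
                    EGL14.EGL_NO_CONTEXT, ctxAttribs, 0);

            // 2) Render target: an EGL window surface wrapping the recorder's input Surface.
            //    MediaRecorder.getSurface() is valid after prepare() when the video source is SURFACE.
            encoderEglSurface = EGL14.eglCreateWindowSurface(eglDisplay, configs[0],
                    recorder.getSurface(), new int[] { EGL14.EGL_NONE }, 0);
            EGL14.eglMakeCurrent(eglDisplay, encoderEglSurface, encoderEglSurface, eglContext);

            // 3) SurfaceTexture backed by an external GLES texture; camera2 renders into it.
            int[] tex = new int[1];
            GLES20.glGenTextures(1, tex, 0);
            externalTexId = tex[0];
            GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, externalTexId);
            cameraTexture = new SurfaceTexture(externalTexId);
            cameraTexture.setDefaultBufferSize(1440, 1440);   // example 1:1 buffer size
            cameraSurface = new Surface(cameraTexture);

            // 4) Each camera frame triggers a render pass into the encoder surface
            //    (updateTexImage, crop/scale shader, eglSwapBuffers) on the GL thread.
            cameraTexture.setOnFrameAvailableListener(st -> { /* trigger rendering */ });
        }
    }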

Related

Understanding Android camera SurfaceTexture and MediaCodec Surface usage

I'm trying to understand graphics memory usage/flow in Android and specifically with respect to encoding frames from the camera using MediaCodec. In order to do that I'm having to understand a bunch of graphics, OpenGL, and Android terminology/concepts that are unclear to me. I've read the Android graphics architecture material, a bunch of SO questions, and a bunch of source but I'm still confused primarily because it seems that terms have different meanings in different contexts.
I've looked at CameraToMpegTest from fadden's site here. My specific question is how MediaCodec::createInputSurface() works in conjunction with Camera::setPreviewTexture(). It seems that an OpenGL texture is created and then this is used to create an Android SurfaceTexture which can then be passed to setPreviewTexture(). My specific questions:
What does calling setPreviewTexture() actually do in terms of what memory buffer the frames go to from the camera?
From my understanding an OpenGL texture is a chunk of memory that is accessible by the GPU. On Android this has to be allocated using gralloc with the correct usage flags. The Android description of SurfaceTexture mentions that it allows you to "stream images to a given OpenGL texture": https://developer.android.com/reference/android/graphics/SurfaceTexture.html#SurfaceTexture(int). What does a SurfaceTexture do on top of an OpenGL texture?
MediaCodec::createInputSurface() returns an Android Surface. As I understand it an Android Surface represents the producer side of a buffer queue so it may be multiple buffers. The API reference mentions that "the Surface must be rendered with a hardware-accelerated API, such as OpenGL ES". How do the frames captured by the camera get from the SurfaceTexture to this Surface that is input to the encoder? I see CameraToMpegTest creates an EGLSurface using this Surface somehow but not knowing much about EGL I don't get this part.
Can someone clarify the usage of "render"? I see things such as "render to a surface", "render to the screen" among other usages that seem to maybe mean different things.
Edit: Follow-up to mstorsjo's responses:
I dug into the code for SurfaceTexture and CameraClient::setPreviewTarget() in CameraService some more to try and understand the inner workings of Camera::setPreviewTexture() better, and have some more questions. To my original question of understanding the memory allocation, it seems like SurfaceTexture creates a BufferQueue and CameraService passes the associated IGraphicBufferProducer to the platform camera HAL implementation. The camera HAL can then set the gralloc usage flags appropriately (e.g. GRALLOC_USAGE_SW_READ_RARELY | GRALLOC_USAGE_SW_WRITE_NEVER | GRALLOC_USAGE_HW_TEXTURE) and also dequeue buffers from this BufferQueue. So the buffers that the camera captures frames into are gralloc-allocated buffers with some special usage flags like GRALLOC_USAGE_HW_TEXTURE. I work on ARM platforms with unified memory architectures, so the GPU and CPU can access the same memory; what kind of impact would the GRALLOC_USAGE_HW_TEXTURE flag have on how the buffer is allocated?
The OpenGL (ES) part of SurfaceTexture seems to mainly be implemented as part of GLConsumer and the magic seems to be in updateTexImage(). Are there additional buffers being allocated for the OpenGL (ES) texture or is the same gralloc buffer that was filled by the camera able to be used? Is there some memory copying that has to happen here to get the camera pixel data from the gralloc buffer into the OpenGL (ES) texture? I guess I don't understand what calling updateTexImage() does.
It means that the camera provides the output frames via an opaque handle instead of in a user-provided buffer within the application's address space (as it would if using setPreviewCallback or setPreviewCallbackWithBuffer). This opaque handle, the texture, can be used within OpenGL drawing.
Almost. In this case, the OpenGL texture is not a physical chunk of memory, but a handle to a variable chunk of memory within an EGL context. In this case, the sample code itself doesn't actually allocate or size the texture, it only creates a "name"/handle for a texture using glGenTextures - it's basically just an integer. Within normal OpenGL (ES), you'd use OpenGL functions to allocate the actual storage for the texture and fill it with content. In this setup, SurfaceTexture provides an Android-level API/abstraction to populate the texture with data (i.e. allocate storage for it with the right flags, provide it with a size and content) - allowing you to pass the SurfaceTexture to other classes that can fill it with data (either Camera, which takes a SurfaceTexture directly, or wrapped in the Surface class to be usable in other contexts). This allows filling the OpenGL texture with content efficiently, without having to pass a buffer of raw data to your application's process and have your app upload it to OpenGL.
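As a minimal sketch of that name-plus-wrapper relationship, using the legacy android.hardware.Camera API from the question (error handling and the GL context setup are omitted, and the class/method names are just illustrative):

    import android.graphics.SurfaceTexture;
    import android.hardware.Camera;
    import android.opengl.GLES11Ext;
    import android.opengl.GLES20;
    import android.view.Surface;
    import java.io.IOException;

    class PreviewTextureExample {
        void attachCameraToTexture() throws IOException {
            // glGenTextures only hands out a name (an int); no storage is allocated yet.
            int[] textures = new int[1];
            GLES20.glGenTextures(1, textures, 0);
            GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[0]);

            // SurfaceTexture wraps that name so a producer can supply the storage and content.
            SurfaceTexture surfaceTexture = new SurfaceTexture(textures[0]);

            Camera camera = Camera.open();
            camera.setPreviewTexture(surfaceTexture);   // the camera now fills the texture's buffers

            // For consumers that want a Surface instead, wrap the SurfaceTexture:
            Surface surface = new Surface(surfaceTexture);
        }
    }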
(Answering points 3 and 4 in reverse order.) OpenGL (ES) is a generic API for drawing. In the normal/original setup (consider a game), you'd have a number of textures for different parts of the game content (backgrounds, props, actors, etc.), and then with OpenGL APIs draw these to the screen. The textures could either be more or less just copied as such to the screen, or be wrapped around a 3D object built out of triangles. This is the process called "rendering": taking the input textures and the set of triangles and drawing them. In the simplest cases, you would render content straight to the screen. The GPU usually can do the same rendering into any other output buffer as well. In games, it is common to render some scene into a texture, and use that prerendered texture as part of the final render which actually ends up displayed on the screen.
An EGL context is created for passing the output from the camera into the encoder input. An EGL context is basically a context for doing OpenGL rendering. The target for the rendering is the Surface from the encoder. That is, whatever graphics are drawn using OpenGL end up in the encoder input buffer instead of on the screen. Now the scene that is drawn using OpenGL could be any sequence of OpenGL function calls, rendering a game scene into the encoder. (This is what the Android Breakout game recorder example does.) Within the context, a texture handle is created. Instead of filling the texture with content by loading a picture from disk, as in normal game graphics rendering, this is made into a SurfaceTexture, to allow Camera to fill it with the camera picture. The SurfaceTexture class provides a callback, giving a signal when the Camera has updated the content. When this callback is received, the EGL context is activated and one frame is rendered into the EGL context output target (which is the encoder input). The rendering itself doesn't do anything fancy, but more or less copies the input texture as-is straight into the output.
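Sketched in code, that per-frame path might look roughly like this (field names such as eglDisplay, encoderEglSurface and cameraTexture are illustrative state set up during the EGL/encoder initialization, and drawFullScreenQuad() stands in for the trivial copy shader; in a real implementation the work would be forwarded to the thread that owns the EGL context):

    @Override
    public void onFrameAvailable(SurfaceTexture st) {
        // Make the EGL context current with the encoder's input Surface as the draw target.
        EGL14.eglMakeCurrent(eglDisplay, encoderEglSurface, encoderEglSurface, eglContext);

        // Latch the newest camera frame onto the external texture.
        cameraTexture.updateTexImage();
        float[] texMatrix = new float[16];
        cameraTexture.getTransformMatrix(texMatrix);

        // "Render": copy (or crop/scale/color-adjust) the external texture to the output.
        drawFullScreenQuad(externalTexId, texMatrix);

        // Tag the frame with the camera timestamp, then hand it to the encoder.
        EGLExt.eglPresentationTimeANDROID(eglDisplay, encoderEglSurface, cameraTexture.getTimestamp());
        EGL14.eglSwapBuffers(eglDisplay, encoderEglSurface);
    }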
This might all sound quite roundabout, but it does give a few benefits:
The actual raw bits of the camera frames never need to be handled directly within the application code (and potentially never within the application's process and address space at all). For low resolutions, this isn't much of an issue, but the setPreviewCallback API is a bottleneck when it comes to higher resolutions.
You can do color adjustments and anything else you can do within OpenGL, almost for free with GPU acceleration.

How to understand the underneath of setDisplay/setSurface/setPreviewDisplay/setPreviewTexture for Android

Since Android API level 1, we can attach a MediaPlayer or Camera to a Surface with setDisplay or setPreviewDisplay, and the image data can then be transferred to the GPU and processed much faster.
After SurfaceTexture was introduced, we can create our own texture with the target GL_TEXTURE_EXTERNAL_OES and attach the MediaPlayer or Camera to OpenGL ES.
These are well known, but what I want to talk about is what happens underneath, which comes down to the Android graphics architecture. (Android Graphics architecture)
The data produced is on the CPU side, so it must be transferred to the GPU very quickly.
How does every Android device manage to transfer the data so fast, and how is this done underneath?
Or is this a hardware issue which has nothing to do with Android?
The data is not produced on the CPU side. The camera and hardware video codecs store their data in buffers allocated by the kernel gralloc mechanism (referenced from native code through the non-public GraphicBuffer). Surfaces communicate through BufferQueue objects, which pass the frames around by handle, without copying the data itself.
It's up to the OEM to ensure that the camera, video codecs, and GPU can use common formats. The YUV output by the video codec must be something that the GLES implementation can handle as an external texture. This is also why EGL_RECORDABLE_ANDROID is needed when sending GLES rendering to a MediaCodec: you need to let the EGL implementation know that the frame it's rendering must be recognizable by the video codec.
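For reference, requesting that bit looks roughly like the following when choosing an EGLConfig for a surface that will feed MediaCodec (a sketch; 0x3142 is the EGL_RECORDABLE_ANDROID value, exposed as EGLExt.EGL_RECORDABLE_ANDROID on newer API levels, and eglDisplay is assumed to be an already initialized display):

    final int EGL_RECORDABLE_ANDROID = 0x3142;
    int[] attribList = {
            EGL14.EGL_RED_SIZE, 8,
            EGL14.EGL_GREEN_SIZE, 8,
            EGL14.EGL_BLUE_SIZE, 8,
            EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
            EGL_RECORDABLE_ANDROID, 1,   // frames rendered here must be consumable by the encoder
            EGL14.EGL_NONE
    };
    EGLConfig[] configs = new EGLConfig[1];
    int[] numConfigs = new int[1];
    EGL14.eglChooseConfig(eglDisplay, attribList, 0, configs, 0, 1, numConfigs, 0);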

Editing frames and encoding with MediaCodec

I was able to decode an mp4 video. If I configure the decoder using a Surface, I can see the video on screen. Now, I want to edit the frame (adding a yellow line, or even better, overlaying a tiny image) and encode the video as a new video. It is not necessary to show the video, and I don't care about performance for now. (If I show the frames while editing, I could have a gap if the editing function takes a lot of time.) So, what do you recommend: configure the decoder with a GL-backed Surface anyway and use OpenGL (GLES), or configure it with null and somehow convert the ByteBuffer to a Bitmap, modify it, and encode the bitmap as a byte array? Also, I saw on the Grafika page that you can use a Surface with a custom Renderer and use OpenGL (GLES). Thanks
You will have to use OpenGL ES; the ByteBuffer/Bitmap approach cannot give realistic performance/features.
Now that you've been able to decode the video (using MediaExtractor and MediaCodec) to a Surface, you need to use the SurfaceTexture that was used to create the Surface as an external texture and render, using GLES, to another Surface retrieved from a MediaCodec configured as an encoder.
Though Grafika doesn't have an exactly similar complete project, you can start with your existing project and then try to use either of the following subprojects in Grafika: Continuous Camera or Show + capture camera, which currently render camera frames (fed to a SurfaceTexture) to a video (and the display).
So essentially, the only change is the MediaCodec feeding frames to SurfaceTexture instead of the Camera.
Google CTS DecodeEditEncodeTest does exactly the same and can be used as a reference in order to make the learning curve smoother.
Using this approach, you can certainly do all sorts of things like manipulating the playback speed of video (fast forward and slow-down), adding all sorts of overlays on the scene, play with colors/pixels in the video using shaders etc.
Check out the filters in Show + capture camera for an illustration of the same.
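As a flavor of what such a shader edit looks like, here is a hypothetical fragment shader (as a Java string constant) that samples the decoded frame from the external texture and applies a simple warm tint; the surrounding GLES program setup, along the lines of Grafika's Texture2dProgram, is not shown:

    // Hypothetical color-tweaking fragment shader for an external (decoded video) texture.
    private static final String TINT_FRAGMENT_SHADER =
            "#extension GL_OES_EGL_image_external : require\n" +
            "precision mediump float;\n" +
            "varying vec2 vTextureCoord;\n" +
            "uniform samplerExternalOES sTexture;\n" +
            "void main() {\n" +
            "    vec4 color = texture2D(sTexture, vTextureCoord);\n" +
            "    // Keep red, slightly damp green/blue for a warm tint.\n" +
            "    gl_FragColor = vec4(color.r, color.g * 0.8, color.b * 0.8, 1.0);\n" +
            "}\n";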
Decode-Edit-Encode flow
When using OpenGLES, 'editing' of the frame happens via rendering using GLES to the Encoder's input surface.
If decoding and rendering+encoding are separated out into different threads, you're bound to skip a few frames every now and then, unless you implement some sort of synchronisation between the two threads to keep the decoder waiting until the render+encode for that frame has happened on the other thread.
Although modern hardware codecs support simultaneous video encoding and decoding, I'd suggest doing the decoding, rendering and encoding in the same thread, especially in your case, when performance is not a major concern right now. That will help avoid the problems of having to handle synchronisation on your own and/or frame jumps.
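A rough skeleton of that single-threaded decode, render and encode loop, modeled loosely on DecodeEditEncodeTest, might look like the sketch below. Input-side EOS handling, format changes and muxer setup are omitted, and feedOneInputBufferFromExtractor(), awaitNewImage(), renderToEncoderSurface() and drainEncoderToMuxer() are hypothetical helpers standing in for the extractor feed, the SurfaceTexture latch, the GLES pass into the encoder's input surface, and the encoder drain:

    void decodeEditEncode(MediaExtractor extractor, MediaCodec decoder,
                          MediaCodec encoder, MediaMuxer muxer) {
        final long TIMEOUT_US = 10_000;
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        boolean done = false;
        while (!done) {
            feedOneInputBufferFromExtractor(extractor, decoder);       // hypothetical helper

            // 1) Drain one decoded frame, sending it to the SurfaceTexture-backed output Surface.
            int outIndex = decoder.dequeueOutputBuffer(info, TIMEOUT_US);
            if (outIndex >= 0) {
                boolean hasFrame = info.size != 0;
                decoder.releaseOutputBuffer(outIndex, hasFrame);       // true == render to the Surface
                if (hasFrame) {
                    // 2) Edit + encode: wait for onFrameAvailable/updateTexImage, then draw
                    //    the edited frame into the encoder's input surface.
                    awaitNewImage();
                    renderToEncoderSurface(info.presentationTimeUs);
                }
                if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    encoder.signalEndOfInputStream();
                    done = true;
                }
            }

            // 3) Drain any pending encoder output and write it to the muxer.
            drainEncoderToMuxer(encoder, muxer);                       // hypothetical helper
        }
    }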

MediaCodec decoding to surface natively

I am using MediaCodec's decoder to output data to a surface. Using the .configure function, I passed a surface created through SurfaceComposerClient. The problem is that the codec fails to start. I presume this is an issue with the way my surface is set up (when I set the surface to NULL, the codec starts).
Looking at MediaCodec decoder java examples it seems like I need to create an EGL backed SurfaceTexture. Is it possible to natively create a surface texture using C++/NDK? Are there any examples out there of this?
I assume this is not a "normal" app, since you're interacting with SurfaceFlinger directly.
You can find examples in some internal OpenGL tests -- the code was fixed up for the 5.0 Lollipop release. Take a look at the San Angeles demo, which uses the WindowSurface class to get a surface from SurfaceComposerClient.
You don't need a SurfaceTexture, or do anything with EGL, to decode a video to a surface. Surfaces have a producer-consumer structure, and EGL and MediaCodec are two different examples of producers. (SurfaceFlinger is the consumer.)
It's never easy to know why MediaCodec is failing. You can try drawing on the surface with GLES to see if it's valid, but my guess is that your problem is elsewhere.
For a SurfaceTexture, the app is both the producer and the consumer; it provides a way to decode the video to a surface that you can then manipulate as a GLES texture. This adds unnecessary overhead if you just want the video to play on screen.
Refer to SimplePlayer.h/.cpp in the Android 4.4 source code. It's used to decode a media file and output the decoded video to a surface. I think it is similar to your scenario.

How to pass Camera preview to the Surface created by MediaCodec.createInputSurface()?

Ideally I'd like to accomplish two goals:
Pass the Camera preview data to a MediaCodec encoder via a Surface. I can create the Surface using MediaCodec.createInputSurface() but the Camera.setPreviewDisplay() takes a SurfaceHolder, not a Surface.
In addition to passing the Camera preview data to the encoder, I'd also like to display the preview on-screen (so the user can actually see what they are encoding). If the encoder wasn't involved then I'd use a SurfaceView, but that doesn't appear to work in this scenario since SurfaceView creates its own Surface and I think I need to use the one created by MediaCodec.
I've searched online quite a bit for a solution and haven't found one. Some examples on bigflake.com seem like a step in the right direction but they take an approach that adds a bunch of EGL/SurfaceTexture overhead that I'd like to avoid. I'm hoping there is a simpler example or solution where I can get the Camera and MediaCodec talking more directly without involving EGL or textures.
As of Android 4.3 (API 18), the bigflake CameraToMpegTest approach is the correct way.
The EGL/SurfaceTexture overhead is currently unavoidable, especially for what you want to do in goal #2. The idea is:
Configure the Camera to send the output to a SurfaceTexture. This makes the Camera output available to GLES as an "external texture".
Render the SurfaceTexture to the Surface returned by MediaCodec#createInputSurface(). That feeds the video encoder.
Render the SurfaceTexture a second time, to a GLSurfaceView. That puts it on the display for real-time preview.
The only data copying that happens is performed by the GLES driver, so you're doing hardware-accelerated blits, which will be fast.
The only tricky bit is you want the external texture to be available to two different EGL contexts (one for the MediaCodec, one for the GLSurfaceView). You can see an example of creating a shared context in the "Android Breakout game recorder patch" sample on bigflake -- it renders the game twice, once to the screen, once to a MediaCodec encoder.
Update: This is implemented in Grafika ("Show + capture camera").
Update: The multi-context approach in "Show + capture camera" is somewhat flawed. The "Continuous capture" Activity uses a plain SurfaceView, and is able to do both screen rendering and video recording with a single EGL context. This is recommended.
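The per-frame work in that single-context setup looks roughly like the sketch below (names such as displayEglSurface, encoderEglSurface and drawFrame() are illustrative): one EGL context, one EGL window surface wrapping the SurfaceView's Surface and one wrapping MediaCodec#createInputSurface(), with the camera frame drawn to each in turn.

    // Latch the newest camera frame onto the external texture.
    cameraTexture.updateTexImage();
    float[] texMatrix = new float[16];
    cameraTexture.getTransformMatrix(texMatrix);

    // Pass 1: on-screen preview (EGL window surface wrapping the SurfaceView's Surface).
    EGL14.eglMakeCurrent(eglDisplay, displayEglSurface, displayEglSurface, eglContext);
    drawFrame(cameraTexId, texMatrix);
    EGL14.eglSwapBuffers(eglDisplay, displayEglSurface);

    // Pass 2: recording (EGL window surface wrapping MediaCodec#createInputSurface()).
    EGL14.eglMakeCurrent(eglDisplay, encoderEglSurface, encoderEglSurface, eglContext);
    drawFrame(cameraTexId, texMatrix);
    EGLExt.eglPresentationTimeANDROID(eglDisplay, encoderEglSurface, cameraTexture.getTimestamp());
    EGL14.eglSwapBuffers(eglDisplay, encoderEglSurface);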
