I'm encoding a set of JPEGs into an MP4 using the MediaCodec API. The photos can have any resolution, but I adjust each one to be a multiple of 16 so the size is compatible with MediaCodec, and I make sure it is within the supported sizes returned by the codec's video capabilities.
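The alignment itself is just a round down to the nearest multiple of 16, along these lines (a simplified sketch; the real code also checks the range reported by the codec's VideoCapabilities):

    // Round a dimension down to the nearest multiple of 16 (illustrative helper).
    static int alignTo16(int dimension) {
        return (dimension / 16) * 16;
    }

    int alignedWidth  = alignTo16(photoWidth);   // photoWidth/photoHeight: source JPEG size
    int alignedHeight = alignTo16(photoHeight);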
I've found that on some old devices using the OMX.qcom.video.encoder.avc codec, some resolutions produce garbled videos, as seen in the samples below with different aspect ratios. The problem does not happen with standard aspect ratios such as 16:9 or 4:3, only with custom ones.
[Sample 1: original vs. garbled result]
[Sample 2: original vs. garbled result]
Investigating the issue, I discovered through another user's question that this could be related to the fact that old Qualcomm devices require the Y plane of the YUV data to be aligned at a 2K boundary. However, I'm not working with YUV data directly at all; I'm using an input Surface and rendering through OpenGL.
My guess is that the codec's underlying system for the input Surface works with YUV buffers anyway and the Qualcomm codec handles all the conversion, but it is just a guess. If so, is there any formula I could use to adjust the resolution and align it to that boundary requirement, even if it produces some cropping? And if my guess is wrong, what could be causing the issue?
See the accepted answer to the following question for the statement about the 2K boundary alignment.
How to get stride and Y plane alignment values for MediaCodec encoder
Related
I'm trying to capture Android views as bitmaps and save them as an .mp4 file.
I'm using MediaCodec to encode bitmaps and MediaMuxer to mux them into .mp4.
Using the YUV420p color format, I expect MediaCodec's input buffers to be of size resWidth * resHeight * 1.5, but Qualcomm's OMX.qcom.video.encoder.avc gives me more than that (no matter what resolution I choose). I believe it wants me to do some alignment in my input byte stream, but I have no idea how to find out what exactly it expects me to do.
This is what I get when I pack my data tightly in input buffers on Nexus 7 (2013) using Qualcomm's codec: https://www.youtube.com/watch?v=JqJD5R8DiC8
And this video is made by the very same app run on a Nexus 10 (codec OMX.Exynos.AVC.Encoder): https://www.youtube.com/watch?v=90RDXAibAZI
So it looks like the luma plane is alright in the faulty video, but what happened to the chroma plane is a mystery to me.
I prepared a minimal (two classes) working code example exposing this issue: https://github.com/eeprojects/MediaCodecExample
You can reproduce the videos shown above just by running this app (you will get the same artefacts if your device uses Qualcomm's codec).
There are multiple ways of storing YUV 420 in buffers; you need to check the individual pixel format you chose. MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Planar and MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420PackedPlanar are in practice the same, called planar or I420 for short. The others, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420PackedSemiPlanar and MediaCodecInfo.CodecCapabilities.COLOR_TI_FormatYUV420PackedSemiPlanar, are called semiplanar or NV12.
In the semiplanar layouts, you don't have two separate planes for U and V; instead you have one single plane with interleaved U,V pairs.
See https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java (lines 925-949) for an example of how to fill in the buffer for the semiplanar formats.
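To make the layout difference concrete, here is a simplified sketch of writing one solid-color frame by hand in each layout (not the CTS code itself; it assumes a tightly packed buffer with no row padding, which a given encoder may not accept as-is):

    // Fill a width x height YUV 4:2:0 frame with a single color.
    // semiPlanar = false -> planar/I420, semiPlanar = true -> semiplanar/NV12.
    static void fillFrame(byte[] frame, int width, int height,
                          byte y, byte u, byte v, boolean semiPlanar) {
        int ySize = width * height;
        for (int i = 0; i < ySize; i++) {
            frame[i] = y;                              // full-resolution luma plane
        }
        int chromaPixels = (width / 2) * (height / 2); // chroma is subsampled 2x2
        if (semiPlanar) {
            // NV12: a single plane of interleaved U,V pairs after the Y plane
            for (int i = 0; i < chromaPixels; i++) {
                frame[ySize + 2 * i] = u;
                frame[ySize + 2 * i + 1] = v;
            }
        } else {
            // I420: the full U plane followed by the full V plane
            for (int i = 0; i < chromaPixels; i++) {
                frame[ySize + i] = u;
                frame[ySize + chromaPixels + i] = v;
            }
        }
    }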
I want to get image streams with the least possible distortion (no noise reduction, etc.) without having to deal with RAW outputs.
I'm working with two streams (only one when using the deprecated Camera API), one for the preview and one for processing. I understand the camera2 API, but I'm wondering what kind of upsampling/downsampling is used when fitting the sensor output to the surfaces.
More specifically, I'm working on zoomed images, and according to the camera2 documentation concerning cropping and the references:
For non-raw streams, any additional per-stream cropping will be done to maximize the final pixel area of the stream.
The whole concept is easy enough to understand, but it's also mentioned that:
Output streams use this rectangle to produce their output, cropping to a smaller region if necessary to maintain the stream's aspect ratio, then scaling the sensor input to match the output's configured resolution.
But I haven't been able to find any info about this scaling. Which method is used (filter-based, bicubic, edge-directed, etc.)? Is there a way to get this info? And is there a way I can actually choose which one is used?
Concerning the deprecated Camera API, I'm guessing the zoom is just simpler, in the sense that it's probably equivalent to having SCALER_CROPPING_TYPE_CENTER_ONLY with only a limited set of crop regions corresponding to the exposed zoom ratios. But is the image scaling the same as in camera2? If someone could shed some light on this I'd be happy.
Real life example
Camera sensor: 5312x2988 (16:9)
I want a 4x zoom, so the crop region should be (left, top, width, height) = (1992, 1120, 1328, 747).
(By the way, what happens with odd sizes, for instance on SCALER_CROPPING_TYPE_CENTER_ONLY devices?)
Now I have a surface of size (1920, 1080); the crop area and the stream ratio fit, but the 1328x747 pixels must be transformed to fill the 1920x1080 surface. The nature of this transformation is what I want to know.
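For reference, the crop region above can be computed and applied roughly like this (a sketch; captureRequestBuilder stands in for a CaptureRequest.Builder, and the numbers match the 4x example):

    // Active array size from CameraCharacteristics.SENSOR_INFO_ACTIVE_ARRAY_SIZE;
    // hard-coded here to match the 5312x2988 sensor in the example.
    Rect activeArray = new Rect(0, 0, 5312, 2988);
    float zoom = 4f;

    int cropW = Math.round(activeArray.width() / zoom);   // 1328
    int cropH = Math.round(activeArray.height() / zoom);  // 747
    int left = (activeArray.width() - cropW) / 2;          // 1992
    int top = (activeArray.height() - cropH) / 2;          // 1120

    // Rect takes (left, top, right, bottom)
    Rect cropRegion = new Rect(left, top, left + cropW, top + cropH);
    captureRequestBuilder.set(CaptureRequest.SCALER_CROP_REGION, cropRegion);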
The scaling algorithm used depends on the device; generally, for power efficiency and speed, scaling is done in hardware blocks, usually at the end of the camera image signal processor (ISP) pipeline.
Therefore, you can't generally rely on it being any particular kind of scaling or filtering. Unfortunately, if you want to understand the entire processing pipeline, you have to start with RAW and implement it yourself.
On the same device, the old Camera API and the new camera2 API talk to the same hardware abstraction layer and the same hardware scalers, so the scaling output will generally match exactly for the same resolution (with the exception of LEGACY-level devices, where camera2 may need additional GPU-based scaling, which will be bilinear downsampling - but you don't really know when this applies).
I've got an Android application that does motion detection and video recording. It supports both the Camera and camera2 APIs in order to provide backwards compatibility. I'm using an ImageReader with the camera2 API to do motion detection. I'm currently requesting JPEG-format images, which are very slow. I understand that requesting YUV images would be faster, but is it true that the YUV format varies depending on which device is being used? I just wanted to check before I give up on optimizing this.
All devices will support NV21 and YV12 formats for the old camera API (since API 12), and for camera2, all devices will support YUV_420_888.
YUV_420_888 is a flexible YUV format, so it can represent multiple underlying formats (including NV21 and YV12). So you'll need to check the pixel and row strides in the Images from the ImageReader to ensure you're reading through the 3 planes of data correctly.
If you need full frame rate, you need to work in YUV - JPEG has a lot of encoding overhead and generally won't run faster than 2-10fps, while YUV will run at 30fps at least at preview resolutions.
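For the stride check mentioned above, a minimal sketch of reading just the luma plane from a YUV_420_888 Image while honoring the reported row stride might look like this (imageReader is assumed to be configured for YUV_420_888; error handling omitted):

    Image image = imageReader.acquireLatestImage();
    int width = image.getWidth();
    int height = image.getHeight();

    Image.Plane yPlane = image.getPlanes()[0];
    ByteBuffer yBuffer = yPlane.getBuffer();
    int rowStride = yPlane.getRowStride();   // may be larger than width on some devices

    byte[] luma = new byte[width * height];
    for (int row = 0; row < height; row++) {
        yBuffer.position(row * rowStride);
        yBuffer.get(luma, row * width, width);
    }

    // For the U and V planes (indices 1 and 2), also check getPixelStride():
    // 1 means tightly packed chroma (planar), 2 means interleaved (semiplanar).
    image.close();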
I solved this problem by using only the luminance (Y) values, whose format doesn't vary between devices. For the purposes of motion detection, a black-and-white image is fine. This also gets around the problem on API level 21 where some of the U and V data is missing when using the ImageReader.
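A minimal sketch of a luma-only motion check along those lines (names and thresholds are purely illustrative):

    // Compare two consecutive luma frames and report motion if enough pixels changed.
    static boolean motionDetected(byte[] previousLuma, byte[] currentLuma,
                                  int pixelDiffThreshold, int changedPixelThreshold) {
        int changed = 0;
        for (int i = 0; i < currentLuma.length; i++) {
            int diff = Math.abs((currentLuma[i] & 0xFF) - (previousLuma[i] & 0xFF));
            if (diff > pixelDiffThreshold) {
                changed++;
            }
        }
        return changed > changedPixelThreshold;
    }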
I am passing the output of a MediaExtractor into a MediaCodec decoder, and then passing the decoder's output buffers to an encoder's input buffers. The problem I have is that I need to reduce the resolution from the decoder's 1920x1080 output to 1280x720 by the time it comes out of the encoder. I can do this using a Surface, but I am targeting Android 4.1, so I need to achieve this another way. Does anyone know how to change the resolution of a video file using MediaCodec in a way that is compatible with 4.1?
You can use libswscale from libav/ffmpeg, or libyuv, or any other YUV handling library, or write your own downscaling routine - it's not very hard actually.
Basically, when you feed the output from the decoder's output buffer into the encoder's input buffer, you already can't assume you can do a plain copy, because the two may use different color formats. So, to be flexible, your code for copying data already needs to be able to convert any supported decoder output color format into any supported encoder input color format. In this copy step, you can also scale the data down. A trivial nearest-neighbor downscale is very simple to implement; better-looking scaling requires a bit more work.
You don't need to do a full software decode/encode; you can just use software to adjust the data in the intermediate copy step. But as fadden pointed out, MediaCodec isn't completely stable prior to 4.3 anyway, so it may still not work on all devices.
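For illustration, a trivial nearest-neighbor downscale of a tightly packed I420 frame could look like this (a sketch; real decoder/encoder buffers may use different color formats and strides, as noted above):

    // Nearest-neighbor scale of a tightly packed I420 frame from srcW x srcH to dstW x dstH.
    static void scaleI420(byte[] src, int srcW, int srcH,
                          byte[] dst, int dstW, int dstH) {
        // Luma plane
        for (int y = 0; y < dstH; y++) {
            int sy = y * srcH / dstH;
            for (int x = 0; x < dstW; x++) {
                dst[y * dstW + x] = src[sy * srcW + x * srcW / dstW];
            }
        }
        // Chroma planes (each a quarter of the luma size)
        int srcYSize = srcW * srcH, dstYSize = dstW * dstH;
        int srcCW = srcW / 2, srcCH = srcH / 2, dstCW = dstW / 2, dstCH = dstH / 2;
        for (int plane = 0; plane < 2; plane++) {
            int srcOff = srcYSize + plane * srcCW * srcCH;
            int dstOff = dstYSize + plane * dstCW * dstCH;
            for (int y = 0; y < dstCH; y++) {
                int sy = y * srcCH / dstCH;
                for (int x = 0; x < dstCW; x++) {
                    dst[dstOff + y * dstCW + x] = src[srcOff + sy * srcCW + x * srcCW / dstCW];
                }
            }
        }
    }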
I am using the GL_OES_EGL_image_external extension to play a video with OpenGL. The problem is that on some devices the video dimensions exceed the maximum OpenGL texture size. Is there any way to deal with this dynamically, e.g. by downscaling the frames on the fly, or do I have to reduce the video size beforehand?
If you are really hitting the maximum texture size in OpenGL ES (FWIW, I believe this is about 2048x2048 on recent devices), then you could do a few things:
You could set setVideoScalingMode(VIDEO_SCALING_MODE_SCALE_TO_FIT) on your MediaPlayer. I believe this will scale the video resolution to the size of the SurfaceTexture/Surface that it is attached to.
You could alternatively have four videos playing and render them to separate TEXTURE_EXTERNAL_OES textures, then render those four textures separately in GL. However, that could kill your performance.
If I saw the error message and some context of the code I could maybe provide some more information.
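As a rough sketch of the first suggestion (mediaPlayer, videoWidth and videoHeight are placeholders; whether the scaling mode actually helps depends on the behaviour described above):

    // Query the device's texture size limit (requires a current GL context).
    int[] maxTextureSize = new int[1];
    GLES20.glGetIntegerv(GLES20.GL_MAX_TEXTURE_SIZE, maxTextureSize, 0);

    // If the video is larger than the limit, ask MediaPlayer to fit the output surface.
    if (videoWidth > maxTextureSize[0] || videoHeight > maxTextureSize[0]) {
        mediaPlayer.setVideoScalingMode(MediaPlayer.VIDEO_SCALING_MODE_SCALE_TO_FIT);
    }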