I'm trying to stream video on Android through FFmpeg; the output I get after decoding is in YUV format. Is it possible to render YUV frames directly to the Android screen?
Yes and no.
The output of the camera and hardware video decoders is generally YUV. Frames from these sources are generally sent directly to the display. They may be converted by the driver, typically with a hardware scaler and format converter. This is necessary for efficiency.
There isn't an API to allow an app to pass YUV frames around the same way. The basic problem is that "YUV" covers a lot of ground. The buffer format used by the video decoder may be a proprietary internal format that the various hardware modules can process efficiently; for your app to create a surface in this format, it would have to perform a conversion, and you're right back where you were performance-wise.
You should be able to use GLES2 shaders to do the conversion for you on the way to the display, but I don't have a pointer to code that demonstrates this.
Update: an answer to this question has a link to a WebRTC source file that demonstrates doing the YUV conversion in a GLES2 shader.
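For reference, a minimal sketch of that kind of shader, written here as a GLSL string in Java (this is not the WebRTC code; it assumes the Y, U and V planes have been uploaded as three separate single-channel luminance textures, and the BT.601 constants are approximate):

    // Sketch of a GLES2 fragment shader converting planar YUV (I420) to RGB.
    // Assumes the app uploads the Y, U and V planes as three GL_LUMINANCE
    // textures; names and BT.601 constants are illustrative.
    public final class YuvToRgbShader {
        public static final String FRAGMENT_SHADER =
                "precision mediump float;\n" +
                "varying vec2 vTexCoord;\n" +
                "uniform sampler2D uTexY;\n" +
                "uniform sampler2D uTexU;\n" +
                "uniform sampler2D uTexV;\n" +
                "void main() {\n" +
                "    float y = texture2D(uTexY, vTexCoord).r;\n" +
                "    float u = texture2D(uTexU, vTexCoord).r - 0.5;\n" +
                "    float v = texture2D(uTexV, vTexCoord).r - 0.5;\n" +
                "    float r = y + 1.402 * v;\n" +
                "    float g = y - 0.344 * u - 0.714 * v;\n" +
                "    float b = y + 1.772 * u;\n" +
                "    gl_FragColor = vec4(r, g, b, 1.0);\n" +
                "}\n";

        private YuvToRgbShader() {}
    }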
Related
I am currently trying to develop a video player on Android, but am struggling with color formats.
Context: I extract and decode a video through the standard combination of MediaExtractor/MediaCodec. Because I need the extracted frames to be available as OpenGL ES textures (RGB), I set up my decoder (MediaCodec) so that it feeds an external GLES texture (GL_TEXTURE_EXTERNAL_OES) through a SurfaceTexture. I know the data output by my HW decoder is in the NV12 (YUV420SemiPlanar) format, and I need to convert it to RGB by rendering it (with a fragment shader doing the conversion).
MediaCodec ---> GLES External Texture (NV12) [1] ---> Rendering ---> GLES Texture (RGB)
The point where I struggle is: how do I access the specific Y, U and V values contained in the GLES External Texture ([1])? I have no idea how the GLES texture memory is laid out, nor how to access it (except for the "texture()" and "texelFetch()" GLSL functions).
Is there a way to access the data as I would access a simple array (pointer + offset)?
Did I overthink the whole thing?
Does either Surface or SurfaceTexture take care of the conversion? (I don't think so)
Does either Surface or SurfaceTexture change the memory layout of the data while populating the GLES External Texture ([1]), so that components can be accessed through the GLES texture access functions?
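For context, the setup described in the question corresponds roughly to the sketch below (the data source, track selection and identifiers are placeholders, not the original code):

    import android.graphics.SurfaceTexture;
    import android.media.MediaCodec;
    import android.media.MediaExtractor;
    import android.media.MediaFormat;
    import android.view.Surface;
    import java.io.IOException;

    // Sketch: route MediaCodec decoder output into a GL_TEXTURE_EXTERNAL_OES
    // texture via a SurfaceTexture. textureId must already be generated and
    // bound as GL_TEXTURE_EXTERNAL_OES on the GL thread.
    public final class DecoderToOesTexture {
        public static MediaCodec configure(String videoPath, int textureId)
                throws IOException {
            MediaExtractor extractor = new MediaExtractor();
            extractor.setDataSource(videoPath);
            extractor.selectTrack(0);                 // assume track 0 is video
            MediaFormat format = extractor.getTrackFormat(0);

            SurfaceTexture surfaceTexture = new SurfaceTexture(textureId);
            Surface decoderSurface = new Surface(surfaceTexture);

            MediaCodec decoder = MediaCodec.createDecoderByType(
                    format.getString(MediaFormat.KEY_MIME));
            decoder.configure(format, decoderSurface, null, 0);
            decoder.start();
            // After releasing each decoded output buffer with render == true,
            // call surfaceTexture.updateTexImage() on the GL thread to latch
            // the frame into the external texture.
            return decoder;
        }

        private DecoderToOesTexture() {}
    }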
Yes, I would say you're overthinking it. Did you test things and run into an actual issue that you could describe, or is this only theoretical so far?
Even though the raw decoder itself outputs NV12, this detail is hidden once you access it via a SurfaceTexture - then you get to access it like any RGB texture. Since the physical memory layout of the texture is hidden, you don't really know whether it is actually converted all at once before you get it, or whether the texture accessors do an on-the-fly conversion each time you sample it. As far as I know, the implementation is free to do it either way, and the implementation details of how it is done are not observable through the public APIs at all.
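To illustrate: once the frames arrive through the SurfaceTexture, the fragment shader simply samples the external texture and gets RGB back; there is no per-component Y/U/V access. A minimal sketch (identifier names are illustrative):

    // Sketch of a GLES2 fragment shader sampling the external texture fed by
    // a SurfaceTexture. The decoder's NV12 layout is invisible here:
    // texture2D() returns RGB(A) regardless of the internal format.
    public final class ExternalTextureShader {
        public static final String FRAGMENT_SHADER =
                "#extension GL_OES_EGL_image_external : require\n" +
                "precision mediump float;\n" +
                "varying vec2 vTexCoord;\n" +
                "uniform samplerExternalOES uTexture;\n" +
                "void main() {\n" +
                "    gl_FragColor = texture2D(uTexture, vTexCoord);\n" +
                "}\n";

        private ExternalTextureShader() {}
    }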
I am passing the output of a MediaExtractor into a MediaCodec decoder, and then passing the decoder's output buffer to an encoder's input buffer. The problem I have is that I need to reduce the resolution from the 1920x1080 output of the decoder to 1280x720 by the time it comes out of the encoder. I can do this using a Surface, but I am targeting Android 4.1, so I will need to achieve this another way. Does anyone know how to change the resolution of a video file using MediaCodec in a way that is compatible with 4.1?
You can use libswscale from libav/ffmpeg, or libyuv, or any other YUV handling library, or write your own downscaling routine - it's not very hard actually.
Basically, when you feed the output from the decoder's output buffer into the encoder's input buffer, you already can't assume you can do a plain copy, because the two may use different color formats. So to be flexible, your copy code already needs to be able to convert any supported decoder output color format into any supported encoder input color format. In this copy step, you can simply scale the data down. A trivial nearest-neighbor downscale is very simple to implement; better-looking scaling requires a bit more work.
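For illustration, a nearest-neighbor downscale of a single 8-bit plane can be as small as the sketch below (call it once per plane; for 4:2:0 formats the chroma planes use half the luma dimensions, and NV12's interleaved UV plane would need to be stepped in two-byte pairs):

    // Sketch: nearest-neighbor downscale of one 8-bit plane (e.g. the Y plane
    // of an I420 frame). Pass each plane with its own dimensions.
    public final class PlaneScaler {
        public static void scalePlane(byte[] src, int srcW, int srcH,
                                      byte[] dst, int dstW, int dstH) {
            for (int y = 0; y < dstH; y++) {
                int srcY = y * srcH / dstH;          // nearest source row
                for (int x = 0; x < dstW; x++) {
                    int srcX = x * srcW / dstW;      // nearest source column
                    dst[y * dstW + x] = src[srcY * srcW + srcX];
                }
            }
        }

        private PlaneScaler() {}
    }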
You don't need to do a full SW decode/encode, you can just use SW to adjust the data in the intermediate copy step. But as fadden pointed out, MediaCodec isn't completely stable prior to 4.3 anyway, so it may still not work on all devices.
The example DecodeEditEncodeTest.java on bigflake.com demonstrates simple editing (swapping the color channels with an OpenGL fragment shader).
Here, I want to do some more complicated image processing of each frame (such as adding something to it).
Does that mean I cannot use a Surface, and instead need to use buffers?
But EncodeDecodeTest.java says:
(1) Buffer-to-buffer. Buffers are software-generated YUV frames in ByteBuffer objects, and decoded to the same. This is the slowest (and least portable) approach, but it allows the application to examine and modify the YUV data.
(2) Buffer-to-surface. Encoding is again done from software-generated YUV data in ByteBuffers, but this time decoding is done to a Surface. Output is checked with OpenGL ES, using glReadPixels().
(3) Surface-to-surface. Frames are generated with OpenGL ES onto an input Surface, and decoded onto a Surface. This is the fastest approach, but may involve conversions between YUV and RGB.
If I use Buffer-to-buffer, from what the above says, it is the slowest and least portable. How slow would it be?
Or should I use surface-to-surface and read the pixels out from the surface?
Which way is more feasible?
Any example available?
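For reference, the glReadPixels check mentioned in option (2) above boils down to something like this sketch (a current EGL context and surface of the given size are assumed):

    import android.opengl.GLES20;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Sketch: read the current GLES framebuffer back into CPU memory with
    // glReadPixels so the frame can be examined or modified in software.
    public final class PixelReader {
        public static ByteBuffer readRgba(int width, int height) {
            ByteBuffer pixels = ByteBuffer.allocateDirect(width * height * 4)
                    .order(ByteOrder.nativeOrder());
            GLES20.glReadPixels(0, 0, width, height,
                    GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixels);
            pixels.rewind();
            return pixels;             // rows come back bottom-up
        }

        private PixelReader() {}
    }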
I want to create a live painting video as an export feature for a painting application.
I can create a video from a series of images using a library (FFmpeg or MediaCodec), but this would require too much processing power to compare the images and encode the video.
While drawing, I know exactly which pixels have changed, so I can save a lot of processing if I can pass this information to FFmpeg instead of having FFmpeg figure it out from the images.
Is there a way to efficiently encode the video for this purpose?
It should not require "too much processing power" with MediaCodec: devices are capable of writing video in real time, and some of them record full HD. There is another thing to consider: each MediaCodec encoder requires pixel data in a specific format, so you should query the API for supported capabilities before using it. It will also be tricky to make your app work on all devices if it produces only one pixel format, because probably not all devices will support it (in other words, different vendors have different MediaCodec implementations).
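For example, the capability query could look something like this sketch (using the pre-API-21 MediaCodecList calls; the MIME type is just an example):

    import android.media.MediaCodecInfo;
    import android.media.MediaCodecList;

    // Sketch: list the color formats each encoder for a given MIME type
    // accepts, so the app can pick a pixel format the device supports.
    public final class EncoderFormatQuery {
        public static void dumpColorFormats(String mimeType) {  // e.g. "video/avc"
            for (int i = 0; i < MediaCodecList.getCodecCount(); i++) {
                MediaCodecInfo info = MediaCodecList.getCodecInfoAt(i);
                if (!info.isEncoder()) {
                    continue;
                }
                for (String type : info.getSupportedTypes()) {
                    if (!type.equalsIgnoreCase(mimeType)) {
                        continue;
                    }
                    MediaCodecInfo.CodecCapabilities caps =
                            info.getCapabilitiesForType(type);
                    for (int colorFormat : caps.colorFormats) {
                        System.out.println(info.getName() + ": " + colorFormat);
                    }
                }
            }
        }

        private EncoderFormatQuery() {}
    }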
I am testing imaging algorithms using an Android phone's camera as input, and need a way to test the algorithms consistently. Ideally I want to take a pre-recorded video feed and have the phone 'pretend' that the video feed is live video from the camera.
My ideal solution would be where the app running the algorithms has no knowledge that the video is pre-recorded. I do not want to load the video file directly into the app, but rather read it in as sensor data if at all possible.
Is this approach possible? If so, any pointers in the right direction would be extremely helpful, as Google searches have failed me so far.
Thanks!
Edit: To clarify, my understanding is that the Camera class uses a camera service to read video from the hardware. Rather than do something application-side, I would like to create a custom camera service that reads from a video file instead of the hardware. Is that doable?
When you are doing processing on a live Android video feed, you will need to build your own custom camera application that feeds you individual frames via the PreviewCallback interface that Android provides.
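As a rough sketch of that callback path (the preview target, camera permissions and open/release handling are assumed to be set up elsewhere):

    import android.hardware.Camera;

    // Sketch: receive NV21 preview frames from the pre-API-21 Camera API via
    // PreviewCallback and hand them to the algorithm under test.
    @SuppressWarnings("deprecation")
    public final class PreviewFrameSource {
        public static void start(Camera camera) {
            camera.setPreviewCallback(new Camera.PreviewCallback() {
                @Override
                public void onPreviewFrame(byte[] data, Camera cam) {
                    // 'data' is one preview frame, NV21 by default: a full-size
                    // Y plane followed by interleaved VU samples.
                    processFrame(data, cam.getParameters().getPreviewSize());
                }
            });
            camera.startPreview();
        }

        private static void processFrame(byte[] nv21, Camera.Size size) {
            // Hand the frame to the imaging algorithm under test.
        }

        private PreviewFrameSource() {}
    }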
Now, simulating this would be a little bit tricky, seeing as the format of the preview frames will generally be NV21. If you are using a pre-recorded video, I don't think there is any clear way of reading frames one by one unless you try the getFrameAtTime method, which will give you Bitmaps in an entirely different format.
This leads me to suggest that you could probably test with these Bitmaps from the getFrameAtTime method (though I'm really not sure what you are trying to do here). For this code to then work on a live camera preview, you would need to convert your NV21 frames from the PreviewCallback interface into the same format as the Bitmaps from getFrameAtTime, or you could adapt your algorithm to process NV21 frames directly. NV21 is a pretty neat format, presenting luminance and color data separately, but it can be tricky to use.
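And a sketch of pulling Bitmaps out of a pre-recorded file with getFrameAtTime (the path, sampling interval and seek option are placeholders; keep in mind the frames come back as RGB Bitmaps, not NV21):

    import android.graphics.Bitmap;
    import android.media.MediaMetadataRetriever;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: sample frames from a recorded video via getFrameAtTime for
    // offline testing. This is not a true frame-by-frame read.
    public final class RecordedFrameSource {
        public static List<Bitmap> sampleFrames(String videoPath,
                                                long intervalUs, int count) {
            MediaMetadataRetriever retriever = new MediaMetadataRetriever();
            retriever.setDataSource(videoPath);
            List<Bitmap> frames = new ArrayList<Bitmap>();
            for (int i = 0; i < count; i++) {
                Bitmap frame = retriever.getFrameAtTime(
                        i * intervalUs, MediaMetadataRetriever.OPTION_CLOSEST);
                if (frame != null) {
                    frames.add(frame);
                }
            }
            try {
                retriever.release();
            } catch (Exception ignored) {
                // release() declares IOException on newer SDKs.
            }
            return frames;
        }

        private RecordedFrameSource() {}
    }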