I am writing some code that adds a watermark to an already existing video using OpenGL.
I took most of the code from ContinuousCaptureActivity in Grafika - https://github.com/google/grafika/blob/master/app/src/main/java/com/android/grafika/ContinuousCaptureActivity.java
Instead of using the camera to render onto a SurfaceTexture, I use the MoviePlayer class, also present in Grafika. Also, instead of rendering random boxes, I render the watermark.
I run MoviePlayer at its full speed, i.e., reading from source and rendering on to the surface as soon as a frame is decoded. This does the job for a 30s video in 2s or less.
Now the issue comes with the onFrameAvailable callback. It is called only once for every 4 or 5 frames rendered by the MoviePlayer class, which makes me lose frames in the output video. If I make the MoviePlayer thread sleep until the corresponding onFrameAvailable is called, everything is fine and no frames are missed, but processing my 30s video then takes around 5s.
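For reference, this is roughly the handshake I use when I make the MoviePlayer thread wait (the field names are mine, and mSurfaceTexture is the SurfaceTexture the decoder renders into; it's the usual SurfaceTexture wait/notify pattern, not code copied from Grafika):

private final Object mFrameSyncObject = new Object();
private boolean mFrameAvailable;

// Runs on the MoviePlayer thread after each output buffer is released to the Surface.
private void awaitNewImage() {
    synchronized (mFrameSyncObject) {
        while (!mFrameAvailable) {
            try {
                mFrameSyncObject.wait(2500);   // bail out if the callback never arrives
                if (!mFrameAvailable) {
                    throw new RuntimeException("frame wait timed out");
                }
            } catch (InterruptedException ie) {
                throw new RuntimeException(ie);
            }
        }
        mFrameAvailable = false;
    }
    mSurfaceTexture.updateTexImage();          // latch the new frame into the GL texture
}

@Override  // SurfaceTexture.OnFrameAvailableListener
public void onFrameAvailable(SurfaceTexture st) {
    synchronized (mFrameSyncObject) {
        mFrameAvailable = true;
        mFrameSyncObject.notifyAll();
    }
}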
My question is how do I make SurfaceTexture faster? Or is there some completely different approach that I have to look into?
Note that I do not need to render anything on the screen.
Related
I have a simple android app with a camera preview.
I would like to set the preview so that it shows what happened x seconds before.
I'm trying to build a buffer, but it looks like there is no way to control what's inside the preview.
I use camera2 and a textureView for the preview.
Do you have any ideas, or a library that could help me?
Thanks!
You need to cache ~30 preview buffers somehow.
One possible way is to use an ImageReader where you wait for 30 onImageAvailable callbacks to fire before you acquire the first Image. But this requires you to draw the Image yourself to the preview TextureView once you start acquiring them, which is difficult to do correctly before Android Q's ImageReader with usage flags constructor.
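A rough sketch of that ImageReader approach, assuming a camera2 session already targets the reader's surface; drawYuvToPreview() is a placeholder for whatever you use to get the Image onto the screen:

import android.graphics.ImageFormat;
import android.media.Image;
import android.media.ImageReader;
import android.os.Handler;
import java.util.concurrent.atomic.AtomicInteger;

ImageReader createDelayedReader(int width, int height, Handler handler) {
    final int DELAY_FRAMES = 30;                    // ~1 second at 30 fps
    // maxImages must cover every frame you want queued, plus some headroom;
    // not every device can allocate this many full-resolution buffers.
    ImageReader reader = ImageReader.newInstance(
            width, height, ImageFormat.YUV_420_888, DELAY_FRAMES + 2);
    AtomicInteger received = new AtomicInteger();
    reader.setOnImageAvailableListener(r -> {
        if (received.incrementAndGet() <= DELAY_FRAMES) {
            return;                                 // let the first 30 frames pile up inside the reader
        }
        Image image = r.acquireNextImage();         // oldest queued frame, roughly 1 second old
        if (image != null) {
            drawYuvToPreview(image);                // placeholder: you must render the Image yourself
            image.close();                          // frees the slot so the camera keeps producing
        }
    }, handler);
    return reader;
}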
You can also cache things in OpenGL: use a SurfaceTexture and a GLSurfaceView, and copy the SurfaceTexture frames to a list of 30 other textures in a circular buffer, then start rendering once the 30th one is drawn. But it requires quite a bit of scaffolding code to implement.
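And a very condensed sketch of the GL route, assuming a Grafika-style FullFrameRect helper (mFullScreenExt) for drawing the external camera texture, plus 30 preview-sized textures each attached to its own FBO (allocation and viewport handling omitted); drawToScreen() is a placeholder:

int[] ringTex = new int[30];                  // preview-sized GL_TEXTURE_2D textures
int[] ringFbo = new int[30];                  // one FBO per texture, texture attached as color buffer
int writeIndex = 0;
boolean primed = false;                       // becomes true once the ring has filled

void onCameraFrame(float[] texMatrix) {
    mCameraSurfaceTexture.updateTexImage();   // latch the newest camera frame
    if (primed) {
        drawToScreen(ringTex[writeIndex]);    // this slot holds the frame from ~30 frames ago
    }
    // Overwrite the slot with the newest frame by rendering the camera texture into its FBO.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, ringFbo[writeIndex]);
    mFullScreenExt.drawFrame(mCameraTexId, texMatrix);
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);

    writeIndex = (writeIndex + 1) % ringTex.length;
    if (writeIndex == 0) {
        primed = true;                        // the ring wrapped once; start showing delayed frames
    }
}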
The basic issue I am trying to solve is to delay what is sent to a virtual display by a second or so. So basically, I am trying to shift all frames by 1 second after the initial recording. Note that a surface is used as an input and another surface is used as an output through this virtual display. My initial hunch is to explore a few ideas, given that modification of the Android framework or use of non-public APIs is fine. Java or native C/C++ is fine.
a) I tried delaying frames posted to the virtual display or output surface by a second or two in SurfaceFlinger. This does not work as it causes all surfaces to be delayed by the same amount of time (synchronous processing of frames).
b) MediaCodec uses a surface as an input to encode, and then produces the encoded data. Is there any way to use MediaCodec such that it does not actually encode and only produces unencoded raw frames? Seems unlikely. Moreover, how does MediaCodec do this under the hood? Does it process things frame by frame? If I can extrapolate the method I might be able to extract frames one by one from my input surface and create a ring buffer delayed by the amount of time I require.
c) How do software decoders, such as FFmpeg, actually do this in Android? I assume they take in a surface, but how would they extrapolate and process frame by frame?
Note that I can certainly encode and decode to retrieve the frames and post them, but I want to avoid actually decoding.
I also found this: Getting a frame from SurfaceView
It seems like option d) could be using a SurfaceTexture but I would like to avoid the process of encoding/decoding.
As I understand it, you have a virtual display that is sending its output to a Surface. If you just use a SurfaceView for output, frames output by the virtual display appear on the physical display immediately. The goal is to introduce one second of latency between when the virtual display generates a frame and when the Surface consumer receives it, so that (again using SurfaceView as an example) the physical display shows everything a second late.
The basic concept is easy enough: send the virtual display output to a SurfaceTexture, and save the frame into a circular buffer; meanwhile another thread is reading frames out of the tail end of the circular buffer and displaying them. The trouble with this is what @AdrianCrețu pointed out in the comments: one second of full-resolution screen data at 60fps will occupy a significant fraction of the device's memory. Not to mention that copying that much data around will be fairly expensive, and some devices might not be able to keep up.
(It doesn't matter whether you do it in the app or in SurfaceFlinger... the data for up to 60 screen-sized frames has to be held somewhere for a full second.)
You can reduce the volume of data in various ways:
Reduce the resolution. Scaling 2560x1600 to 1280x800 removes 3/4 of the pixels. The loss of quality should be difficult to notice on most displays, but it depends on what you're viewing.
Reduce the color depth. Switching from ARGB8888 to RGB565 will cut the size in half. This will be noticeable though.
Reduce the frame rate. You're generating the frames for the virtual display, so you can choose to update it more slowly. Animation is still reasonably smooth at 30fps, halving the memory requirements.
Apply image compression, e.g. PNG or JPEG. Fairly effective, but too slow without hardware support.
Encode inter-frame differences. If not much is changing from frame to frame, the incremental changes can be very small. Desktop-mirroring technologies like VNC do this. Somewhat slow to do in software.
A video codec like AVC will both compress frames and encode inter-frame differences. That's how you get 1GByte/sec down to 10Mbit/sec and still have it look pretty good.
Consider, for example, the "continuous capture" example in Grafika. It feeds the Camera output into a MediaCodec encoder, and stores the H.264-encoded output in a ring buffer. When you hit "capture", it saves the last 7 seconds. This could just as easily play the camera feed with a 7-second delay, and it only needs a few megabytes of memory to do it.
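For a sense of scale, the encoder side of that approach looks roughly like this; resolution and bitrate are just example values, not Grafika's exact code:

import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;

MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, 1280, 720);
format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
format.setInteger(MediaFormat.KEY_BIT_RATE, 6_000_000);     // ~6 Mbps: a second of video is under 1 MB
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);     // frequent keyframes so old data can be trimmed

MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
Surface inputSurface = encoder.createInputSurface();        // point the virtual display / camera here
encoder.start();
// Encoded packets from dequeueOutputBuffer() go into a byte ring buffer sized for N seconds;
// a decoder reading from the old end of that buffer gives you the delayed feed.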
The "screenrecord" command can dump H.264 output or raw frames across the ADB connection, though in practice ADB is not fast enough to keep up with raw frames (even on tiny displays). It's not doing anything you can't do from an app (now that we have the mediaprojection API), so I wouldn't recommend using it as sample code.
If you haven't already, it may be useful to read through the graphics architecture doc.
I am referring to the demo app Grafika, in which the CameraCaptureActivity records a video while showing a live preview of the effects applied.
While recording in the CameraCaptureActivity, any effect that is applied to the frame that comes from the camera is done twice.
Once for the preview and once while saving the video to the file.
Since the same frame that is previewed is being saved to the file, it would save a lot of processing if this could be done just once.
The rendering of the frames happens directly on two surfaces: one is the GLSurfaceView (for the preview) and the other is the MediaCodec input surface (for saving).
Is there a way to render the OpenGL effect only once?
If I could copy the contents of one surface to the other it would be great.
Is there a way to do this?
Yes: you can render to an FBO, then blit the output twice, once for display and once for recording.
Grafika's "record GL app" activity demonstrates three different approaches to the problem (one of which only works on GLES 3.0+). The doFrame() method does the work; the draw-once-blit-twice approach is currently here.
My Problem:
I have a video (with, let's say, 25 FPS) that has to be rendered on the screen with OpenGL ES 2.0.
For reading the video I use a decoder that decodes it into OpenGL ES textures. With a render pass I draw this texture on the screen.
What I have to do is get the image from the decoder, upload it to the GPU, call the shader program, and render the image on the screen. If the video has 25 FPS, I have to update the screen in 40 ms steps (1000 ms / 25 FPS).
In each step I have to do the following:
get the image from the decoder
push it to the gpu memory
render the screen
swap buffers
So far it is working.
Now it happens that the decoder takes longer than 40 ms to decode a frame. Not all the time, but sometimes.
A solution would be to build a cache, meaning I render, say, 5 images before showing the first. The problem is that this has to happen asynchronously, so the cache can be built and the screen rendered at the same time. When that happens you can see it in the video, because playback is no longer "fluid".
My Question:
Is there a solution for that?
Is it possible to create some kind of buffer that can be copied onto the back buffer of the render surface, so that I can build a cache out of such buffers and copy one onto the back buffer without blocking the other thread that is creating these buffers?
OR
How can I fill the back buffer with another buffer?
I tried already:
Rendering to framebuffer textures (FBOs) as a cache. This works almost perfectly, except that the cached texture still has to be rendered. Because it is asynchronous, if a cache frame is being built at the same time as the image for the screen, you have to synchronize (mutex) the render methods, otherwise the program crashes. But synchronizing defeats the whole point of doing it asynchronously, so this is not a good solution.
Remember that in OpenGL, if you do not clear and redraw the screen, the previous image will persist. If a new frame is not ready in time, simply do nothing.
It sounds like you have two threads: one decoding frames, and one rendering them. This is fine.
If render() is called and a new frame is not ready in time, your render method should return immediately. Do not clear or swap buffers. The screen will be preserved.
Now, the user may notice occasional hiccups when a frame is repeated. 25 fps is an awkward frame rate on a display that refreshes at 60 Hz (which only divides evenly into 60/30/20/15 fps), so it will not align perfectly with the screen refresh.
You could live with this (user likely won't notice). Or you could force playback to 30 fps by buffering frames.
A good idea is to place a message queue between your decoder and your renderer. It could be one or several frames deep, implemented as an array, linked list, or ring buffer. This allows the decoder to upload into several cached textures while the renderer is drawing a different one.
The decoder adds frames to the queue as they come in. The renderer runs at a fixed rate (30 fps). You could pause rendering until N frames have been buffered.
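A minimal sketch of that hand-off, where Frame, decodeNext(), drawFrame() and the running flag are placeholders for your own types and helpers:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

BlockingQueue<Frame> queue = new ArrayBlockingQueue<>(5);   // a few frames of slack

// Decoder thread: runs ahead of the display, blocks when the queue is full.
void decodeLoop() throws InterruptedException {
    while (running) {
        Frame f = decodeNext();      // may occasionally take longer than 40 ms
        queue.put(f);                // blocks until the renderer frees a slot
    }
}

// Render thread: called at the display rate (e.g. from Choreographer or onDrawFrame).
void renderTick() {
    Frame f = queue.poll();          // non-blocking
    if (f == null) {
        return;                      // nothing new: don't clear, don't swap; the old image stays up
    }
    drawFrame(f);                    // upload the texture, draw, swap buffers
}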
I'm designing an application that has an OpenGL processing pipeline (a collection of shaders) and simultaneously requires the end user to see the unprocessed camera preview.
For the sake of example, suppose you want to show the user the camera preview and at the same time count the number of red objects in the scenes you receive from the camera, but any shaders you utilize to count the objects such as hue filtering, etc. should not be seen by the user.
How would I go about setting this up properly?
I know I can set up a camera preview and then, in the callback, receive camera frame data in YUV format, dump that into an OpenGL texture and process the frame that way; however, that has performance problems: I have to round-trip the data from the camera hardware to the VM and then pass it back to GPU memory. To solve this I'm using a SurfaceTexture, which gets the data from the camera directly in an OpenGL-understandable format and passes it to my shaders.
I thought I'd be able to show that same unprocessed SurfaceTexture to the end user, but TextureView does not have a constructor or a setter where I can pass it the SurfaceTexture I want it to render. It always creates its own.
This is an overview of my current setup:
GLRenderThread: this class extends Thread, sets up the OpenGL context, display, etc., and uses a SurfaceTexture as the surface (3rd parameter of eglCreateWindowSurface).
GLFilterChain: A collection of shaders that perform detection on the input texture.
Camera: Uses a separate SurfaceTexture, which serves as the input of GLFilterChain and receives the camera's preview.
Finally, a TextureView that displays the GLRenderThread's SurfaceTexture.
Obviously, with this setup, I'm showing the processed frames to the user which is not what I want. Further, the processing of the frames is not real-time. Basically, I run the input from Camera through the chain once and once all filters are done, I call updateTexImage to grab the next frame from the Camera. My processing is around 10 frames per second on Nexus 4.
I feel that I probably need to use 2 GL contexts, one for real-time preview and one for processing, but I'm not certain. I'm hoping someone can push me in the right direction.
Can you please upload some of the code you are using?
You might be able to call glDrawArrays on a texture created for and bound to the surface view you are using to display the preview, and then flush it and bind a separate texture to do the analysis with. Something like:
// Pass 1: draw the camera preview texture (bound to the display surface) with the simple shader
GLES20.glUseProgram(simpleProgram);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[0]);
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);

// Pass 2: switch to the analysis shader and draw with the second texture
GLES20.glUseProgram(crazyProgram);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[1]);
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
where your camera's preview SurfaceTexture is bound to textures[0] and a separate SurfaceTexture is created for textures[1].
maybe?
Unless your processing runs slower than real time, the answer is simple: keep the original camera texture untouched, render the processed image into a different texture, and display both to the user side by side in a single GLView. Keep a single thread, as all the processing happens on the GPU anyway. Multiple threads only complicate matters here.
The number of processing steps does not really matter, as there can be an arbitrary number of intermediate textures (see also ping-ponging) that are never displayed to the user - no one and nothing is forcing you to.
The notion of real time is probably confusing here. Just think of a frame as an indivisible time snapshot. By doing so, you will ignore the delay it takes for the image to go from the camera to the screen, but if you can keep interactive frame rates (such as at least 20 frames per second), this can mostly be ignored.
On the other hand, if your processing is much slower, you need to choose between introducing a delay in the camera feed and processing only every Nth frame, or displaying every camera frame in real time and letting the processed result lag behind. To do that, you would probably need two separate rendering contexts to enable asynchronous processing, which might be hard to do on Android (or maybe just as simple as creating a second GLView, since you can live without data sharing between the contexts).
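For the simple case, a hedged sketch of the side-by-side display inside a GLSurfaceView.Renderer, where cameraTexId is the untouched external camera texture, processedTexId is the last texture of your filter chain, and drawTexturedQuad() stands in for your own quad-drawing code:

import android.opengl.GLES20;
import javax.microedition.khronos.opengles.GL10;

@Override
public void onDrawFrame(GL10 unused) {
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);

    // Left half: the unprocessed camera frame (GL_TEXTURE_EXTERNAL_OES from the SurfaceTexture).
    GLES20.glViewport(0, 0, surfaceWidth / 2, surfaceHeight);
    drawTexturedQuad(cameraTexId, /*external=*/ true);

    // Right half: the final output of the filter chain (ordinary GL_TEXTURE_2D from the last FBO).
    GLES20.glViewport(surfaceWidth / 2, 0, surfaceWidth / 2, surfaceHeight);
    drawTexturedQuad(processedTexId, /*external=*/ false);
}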