SurfaceView refresh rate implementation

SurfaceView refresh rate implementation - android

I'm constantly updating a SurfaceView in a dedicated thread with a tight loop that starts with lockCanvas and ends with unlockCanvasAndPost.
No matter how simple is the drawing, I do not get more than 60FPS, which makes sense as that is the refresh rate for the display.
My question is: how is this maximum rate enforced?
I would expect lockCanvas to be the blocking method (i.e., the method that inserts the appropriate delay), but profiling seems to show that both lockCanvas and unlockCanvasAndPost contribute to the delay that bounds the rate.

lockCanvas() acquires a buffer to draw in, unlockCanvas() submits the buffer to the system compositor. If there are no free buffers, lockCanvas() will block until one is available. unlockCanvas() could stall if SurfaceFlinger doesn't process the IPC message right away, but generally the unlock should return quickly.
If you want to understand how the graphics system works, see the graphics architecture doc. Of particular interest will be the section on game loops; what you're doing is "queue stuffing", which is easy to do but has slightly worse latency characteristics than a Choreographer-based approach.

Related

Double buffering slows frame rendering | systrace analysis

Im working at a simple 2D Game with Custom View canvas drawing (postInvalidate()) and HardwareAcceleration. After weeks of performance analysis i decided to sync my update and drawing operations with the VSYNC pulse over the Interface Choreographer.FrameCallback. Im thinking thats the right way to get smooth movements.
However im still experiencing choppy movements. I analyzed it with systrace and noticed that is has something to do with my BufferQueue. As soon as double buffering sets in, the frame time exceeds the 16ms. I made a screenshot of my trace with some explanations:
The whole draw operation waits for the buffer release of the SurfaceFlinger (consumer) to dequeue its own new empty Buffer.
Can you tell me if this is a regular behavior or what could be the reason for this?

On your graph, you have a note, "SurfaceFlinger misses VSYNC".
However, if you look at the BufferQueue row, you can see that the buffer arrived after the VSYNC deadline. SurfaceFlinger woke up, but there was nothing to do.
Your app then provided an additional buffer, which meant you had two buffers pending. Since you continued to provide a buffer on every VSYNC, the queue never got back down to zero buffers. With the queue stuffed full, every attempt to add additional buffers results in blocking.
FWIW, your BufferQueue is triple-buffered: two are in the queue, one is on the display.
There are a few things you can do:
Have the app drop frames if you've missed the deadline.
Specify a presentation time for the frames so SurfaceFlinger will drop them if the time is passed.
Deliberately drop a frame every once in a while to let the queue empty. (Not the preferred approach.)
#2 only works with GLES on a SurfaceView, so we can ignore that one.
#1 might work for you; you can see an example in Grafika. It essentially says, "if the next VSYNC is firing in less than 2ms, or has already fired, don't bother rendering the current frame." The View/invalidate approach doesn't give you the same fine-grained control that GLES does though, so I'm not sure how well that will work.
The key to smooth animation on a busy device isn't hitting every frame at 60fps. The key is to make your updates based on delta time, so things look smooth even if you drop a frame or two.
For additional details on the graphics architecture, see this doc.

Need to run coroutines after rendering frame

My project is for Android. I need to run code after rendering the frame. This is because rendering time could vary vaguely for different frames. I have some time consuming mesh and collider creation routines (So basically there is no room for multithreading because of direct interaction with unity's API). Let's say the profiler shows something like this:
When I do my coroutines I can't know how much time the rendering takes. So I can't decide how much time budget I have for the coroutine and when to yield it. It could be 20ms or it could be 1ms.
Let's reserve a constant time ,say 10ms, for coroutine execution and assume that target frame rate is 30 and physx time is negligible . Now if rendering take 5ms then 5+10=15ms and I dropped the opportunity to use 15 more milliseconds. On the contrary if the frame had a spike and took 25ms, Then 25+10=35ms and that is greater than the 30FPS target framerate resulting in a visible FPS drop. So either way the constant time has very bad consequence.
But if I could run code after rendering time, I could know exactly how much time I have before the screen refresh and yield at the right moment without losing precious time or the risk of creating a spike with my own hands!!!
I know the codes run according to unity's execution order http://docs.unity3d.com/Manual/ExecutionOrder.html. I need a workaround or any viable strategy to do coroutines reliably.

IEnumerator MyCoroutine()
{
while(true){
yield return new WaitForEndOfFrame();
// My code
}
}
http://docs.unity3d.com/ScriptReference/WaitForEndOfFrame.html
As mentioned in the docs, waits for all to be done, right before swapping the buffer. I don't think you can do anything after that without hacking the engine.

Syncing openGL rendering with c++ game loop on android

I am building a game like application using android NDK and openGL ES 2.0
So far I understand the concept of vertices and shaders and programs.
The main game loop would be a loop in a single thread as follows
step 1. Read all the user input
step 2. Update game objects (if needed) based on the input
step 3. make draw calls for all the objects
step 4. call glSwapBuffers
then loop back to step 1
But I ran into various confusions regarding sync and threading so I'm listing all the question together.
1.Since open GL draw calls are asynchronous the draw calls and glSwapBuffers may get called many times before the gpu has even rendered actually a single frame from calls from last iteration of loop. Will this be problematic? buffer overflow or tearing ?
2.Assuming VSYNC is enabled then does point 1 still causes problem?
3.Since all calls are async how do I measure the time spent rendering each frame? glSwapBuffers would return immediately so how can I know when was the frame actually done?
4.loading textures will occupy space in the ram is checking free memory before loading texture standard way or I should keep loading textures until I reach OUT_OF_MEMORY_ERROR?
5.If I switch to multithreaded approach calling just glswapbuffers at a fixed 60 times per second without any regard to the thread which is processing input and giving out draw calls then what is supposed to happen?
Also how do I control the fps in game loop? I know the exact fps depends on a large no of factors but how can you go close to that

The SwapBuffers() will not be executed out of order. Issuing it after all of the draw commands for the frame is fine. The driver will take care about it, you don't need to sync anything. You can only screw this about by using multiple threads or multiple contexts, but even that would take a lot of effort.
There is no problem with 1, and VSYNC does not directly change anything here.
The calls might be asynchronous, but the driver will not queue up an unlimit amount of work. Sooner or later, it will have to block, if you try to issue too many calls in advance. When vsync is on, the typicial behavior is that the driver will queue up at most a few frames (or just one, depending on the driver settings), and SwapBuffers() will block when that limit is reached. So the timing statistics you get there are accurate, after the first few frames. Note that this is still much better than completely flushing the queue, as the driver unblocks as soon as the first pending buffer swap was carried out.
This is a totally new topic, which probably belongs into another question. However: It is very unlikely that you get any of the current desktop GL implementations to ever generate GL_OUT_OF_MEMORY. The driver will automatically page textures (and other objects) between VRAM and system RAM (and the OS might even page that to disk). The GL also provides no means to query the available memory.
In that scenario, you will need to synchronize manually. That approach does not make the slightest sense and seems like trying to solve a problem which does not exist. If you want your game to use multithreading, still put all the gl rendering (and swapbuffers) into the same thread. You can use different threads for input processing, sound, physics, update of the scene, general game logic and whatever. But you should just use a single thread/single context approach for the GL. That way, it also won't hurt you when SwapBuffers() blocks your render thread, as your game logic and input handling is still done, and the render thread will just render new frames with the newest available data in the frequency the display needs (with vsync on) or as fast as the CPU and GPU can work (if vsync is off).

What does "SurfaceView" represent in systrace result?

When investigating an game play stuttering issue, I found that bewteen eglSwapBuffer() from the game and postFramebuffer() in surfaceflinger, there is always a delay in "SurfaceView" lasts from 0.5ms to 10ms which seems pretty random and irrelevant to CPU load. What does this really represents? Does it has anything to do with VSYNC point of display?
http://i.stack.imgur.com/n8MvG.png

That row represents a BufferQueue. The height of the element (0 or 1 in the visible portion of your trace) indicates how many buffers are present in the queue.
In this case, it's the queue of graphics buffers that are being presented on your SurfaceView Surface. When your app calls eglSwapBuffers(), it submits a buffer to the queue. When SurfaceFlinger wakes up on a VSYNC signal, it latches a buffer from the queue if one is available, and composites it for the next refresh.
Update: BufferQueues and their uses are described in some detail here. Appendix C mentions their appearance in systrace.

Can I use performTraversals to measure FPS on Android?

I'd like to calculate FPS to detect performance issue of an application based on existing Android profiling tool .
I noted that on Systrace, it can record the length of performTraversals. As far as I know, performTraversals performs measure, layout and draw, which include most of jobs when updating a frame. So can performTraversals be representative enough to measure whether a frame will take 60 ms to update?
I also noted that Systrace record the time spending on SurfaceFlinger. I know SurfaceFlinger served for rendering purpose, but I don't know the exact beginning point and ending point of a frame. Should I also considering the time spent on SurfaceFlinger to the frame rate? (Though I do observe that SurfaceFlinger perform more frequently than performTraversals, which means SurfaceFlinger may not necessarily follow performTraversals. It will also be triggered in other scenarios.)
P.S. I'm aware of the sysdump gfxinfo, but it can only record 128 frames(~2 seconds), while what I want may last much longer.

Systrace is not useful for measuring FPS overall, but you can do that trivially with a frame counter and System.nanoTime(). If you're not hitting your target framerate, though, it can help you figure out why not.
The official docs provide some useful pointers, but there's a lot of information and the interactions can be complex. The key things to know are:
The device display panel generates a vsync signal. You can see that on the VSYNC line. Every time it transitions between 1 and 0 is a refresh.
The vsync wakes surfaceflinger, which gathers up the incoming buffers for the various windows and composites them (either itself using OpenGL ES, or through the Hardware Composer).
If your app was running faster than the panel refresh rate (usually 60fps), it will have blocked waiting for surfaceflinger (in, say, eglSwapBuffers()). Once surfaceflinger acquires the buffer, the app is free to continue and generate another frame.
Unless you're rendering offscreen, you can't go faster than surfaceflinger.
As of Android 4.3 (API 18) you can add your own events to the systrace output using the android.os.Trace class. Wrapping your draw method with trace markers can be extremely informative. You have to enable their tag with systrace to see them.
If you want to be running at 60fps, your rendering must finish in well under 16.7ms. If you see a single invocation of performTraversals taking longer than that, you're not going to hit maximum speed.

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.