What does "SurfaceView" represent in systrace result? - android

When investigating an game play stuttering issue, I found that bewteen eglSwapBuffer() from the game and postFramebuffer() in surfaceflinger, there is always a delay in "SurfaceView" lasts from 0.5ms to 10ms which seems pretty random and irrelevant to CPU load. What does this really represents? Does it has anything to do with VSYNC point of display?
http://i.stack.imgur.com/n8MvG.png

That row represents a BufferQueue. The height of the element (0 or 1 in the visible portion of your trace) indicates how many buffers are present in the queue.
In this case, it's the queue of graphics buffers that are being presented on your SurfaceView Surface. When your app calls eglSwapBuffers(), it submits a buffer to the queue. When SurfaceFlinger wakes up on a VSYNC signal, it latches a buffer from the queue if one is available, and composites it for the next refresh.
Update: BufferQueues and their uses are described in some detail here. Appendix C mentions their appearance in systrace.

Related

Double buffering slows frame rendering | systrace analysis

Im working at a simple 2D Game with Custom View canvas drawing (postInvalidate()) and HardwareAcceleration. After weeks of performance analysis i decided to sync my update and drawing operations with the VSYNC pulse over the Interface Choreographer.FrameCallback. Im thinking thats the right way to get smooth movements.
However im still experiencing choppy movements. I analyzed it with systrace and noticed that is has something to do with my BufferQueue. As soon as double buffering sets in, the frame time exceeds the 16ms. I made a screenshot of my trace with some explanations:
The whole draw operation waits for the buffer release of the SurfaceFlinger (consumer) to dequeue its own new empty Buffer.
Can you tell me if this is a regular behavior or what could be the reason for this?
On your graph, you have a note, "SurfaceFlinger misses VSYNC".
However, if you look at the BufferQueue row, you can see that the buffer arrived after the VSYNC deadline. SurfaceFlinger woke up, but there was nothing to do.
Your app then provided an additional buffer, which meant you had two buffers pending. Since you continued to provide a buffer on every VSYNC, the queue never got back down to zero buffers. With the queue stuffed full, every attempt to add additional buffers results in blocking.
FWIW, your BufferQueue is triple-buffered: two are in the queue, one is on the display.
There are a few things you can do:
Have the app drop frames if you've missed the deadline.
Specify a presentation time for the frames so SurfaceFlinger will drop them if the time is passed.
Deliberately drop a frame every once in a while to let the queue empty. (Not the preferred approach.)
#2 only works with GLES on a SurfaceView, so we can ignore that one.
#1 might work for you; you can see an example in Grafika. It essentially says, "if the next VSYNC is firing in less than 2ms, or has already fired, don't bother rendering the current frame." The View/invalidate approach doesn't give you the same fine-grained control that GLES does though, so I'm not sure how well that will work.
The key to smooth animation on a busy device isn't hitting every frame at 60fps. The key is to make your updates based on delta time, so things look smooth even if you drop a frame or two.
For additional details on the graphics architecture, see this doc.

SurfaceView refresh rate implementation

I'm constantly updating a SurfaceView in a dedicated thread with a tight loop that starts with lockCanvas and ends with unlockCanvasAndPost.
No matter how simple is the drawing, I do not get more than 60FPS, which makes sense as that is the refresh rate for the display.
My question is: how is this maximum rate enforced?
I would expect lockCanvas to be the blocking method (i.e., the method that inserts the appropriate delay), but profiling seems to show that both lockCanvas and unlockCanvasAndPost contribute to the delay that bounds the rate.
lockCanvas() acquires a buffer to draw in, unlockCanvas() submits the buffer to the system compositor. If there are no free buffers, lockCanvas() will block until one is available. unlockCanvas() could stall if SurfaceFlinger doesn't process the IPC message right away, but generally the unlock should return quickly.
If you want to understand how the graphics system works, see the graphics architecture doc. Of particular interest will be the section on game loops; what you're doing is "queue stuffing", which is easy to do but has slightly worse latency characteristics than a Choreographer-based approach.

Frame Drop issue with video playback

I am experiencing frame drop issue with video playback. We just moved from ICS to KK4.4. Video is very small 320x240 resolution. There is no audio to make things simple.
Systrace is at following location: https://www.dropbox.com/s/bee6xymg3kpn4ft/mytrace2.html?dl=0
I have enabled Triple Buffering and hwcomposer is generating fake vsync's to the SurfaceFlinger.
I can see following issues:
Triple buffering not enabled properly since videodecoder allocates 7 buffers queue. In case Triple buffering was working fine for each frame that gets queued from TimedEventQueue(OnVideoEvent), the buffer that should get dequeued should be 2 slots behind. For Ex: if we queue buf 4 then buf 2 should be dequeued but whats is getting dequeued is immediate previous buffer which surfaceflinger only releases when it gets the chance to run the next time. Hence the delay and whichin turn causes cancelbuffers for video to catch up.
SurfaceFlinger itself taking some time to finish.
Vsync not getting turned on at appropriate times say every 33ms for a 30 fps video. An Issue with vsync generation logic in HWComposer or vsync not being enabled by eventControl due to no actual buffers being Queued?
Updating from the below comment that I made:
Other things that I have noted is that async and mDequeueBufferCannotBlock flags are both false and hence getMinUndequeuedBufferCount() returns 1 and thus we see the immediate previous buffer being asked for dequeue rather than a buffer 2 slots behind. PLease let me know if there is hole in above understanding. And anything that i can do to get around this
Any help is greatly appreciated.
The video codecs decide how many buffers they need. BufferQueue configuration is a negotiation.
I don't see where SurfaceFlinger is taking a long time to finish. Look at the /system/bin/surfaceflinger line -- it's working quickly.
There is no value in waking up and doing work when there's no work to do, so VSYNC gets switched off if there's nothing happening. Your latter assumption is correct -- if you look at the SurfaceView row you can see a correlation between no buffers available and no wake-up in SurfaceFlinger.
I'm seeing 180ms between the arrival of frames on the SurfaceView BufferQueue, which is about 5.5fps. There's some long onVideoEvent sections in the mediaserver process -- what codec are you using?
You described this as a "frame drop" issue -- which player are you using?

Android triple buffering - expected behavior?

I'm investigating the performance of my app since I noticed it dropping some frames while scrolling. I ran systrace (on a Nexus 4 running 4.3) and noticed an interesting section in the output.
Everything is fine at first. Zooming in on the left section, we can see that drawing starts on every vsync, finishes with time to spare, and waits until the next vsync. Since it's triple buffered, it should be drawing into a buffer that will be posted on the following vsync after it's done.
On the 4th vsync in the zoomed in screenshot, the app does some work and the draw operation doesn't finish in time for the next vsync. However, we don't drop any frames because the previous draws were working a frame ahead.
After this happens though, the draw operations don't make up for the missed vsync. Instead, only one draw operation starts per vsync, and now they're not drawing one frame ahead anymore.
Zooming in on the right section, the app does some more work and misses another vsync. Since we weren't drawing a frame ahead, a frame actually gets dropped here. After this, it goes back to drawing one frame ahead.
Is this expected behavior? My understanding was that triple buffering allowed you to recover if you missed a vsync, but this behavior looks like it drops a frame once every two vsyncs you miss.
Follow up questions
On the right side of this screenshot, the app is actually rendering buffers faster than the display is consuming them. During performTraversals #1 (labeled in the screenshot), let's say buffer A is being displayed and buffer B is being rendered. #1 finishes long before the vsync and puts buffer B in the queue. At this point, shouldn't the app be able to immediately start rendering buffer C? Instead, performTraversals #2 doesn't start until the next vsync, wasting the precious time in between.
In a similar vein, I'm a bit confused about the need for waitForever on the left side here. Let's say buffer A is being displayed, buffer B is in the queue, and buffer C is being rendered. When buffer C is finished rendering, why isn't it immediately added to the queue? Instead it does a waitForever until buffer B is removed from the queue, at which point it adds buffer C, which is why the queue seems to always stay at size 1 no matter how fast the app is rendering buffers.
The amount of buffering provided only matters if you keep the buffers full. That means rendering faster than the display is consuming them.
The labels don't appear in your images, but I'm guessing that the purple row above the green vsync row is the BufferQueue status. You can see that it generally has 0 or 1 full buffers at any time. At the very left of the "zoomed-in on the left" image you can see that it's got two buffers, but after that it only has one, and 3/4 of the way across the screen you see a very short purple bar the indicates it just barely rendered the frame in time.
See this post and this post for background.
Update for the added questions...
The detail in the other post barely scratched the surface. We must go deeper.
The BufferQueue count shown in systrace is the number of queued buffers, i.e. the number of buffers that have content in them. When SurfaceFlinger grabs a buffer for display, it releases the buffer immediately, changing its state to "free". This is particularly exciting when the buffer is being shown on an overlay, because the display is rendering directly from the buffer (as opposed to compositing into a scratch buffer and displaying that).
Let me say that again: the buffer from which the display is actively reading data for display on the screen is marked as "free" in the BufferQueue. The buffer has an associated fence that is initially "active". While it's active, nobody is allowed to modify the buffer contents. When the display no longer needs the buffer, it signals the fence.
So the reason why the code over on the left of your trace is in waitForever() is because it's waiting for the fence to signal. When VSYNC hits, the display switches to a different buffer, signals the fence, and your app can start using the buffer immediately. This eliminates the latency that would be incurred if you had to wait for SurfaceFlinger to wake up, see that the buffer was no longer in use, send an IPC through BufferQueue to release the buffer, etc.
Note that the calls to waitForever() only show up when you're not falling behind (left side and right side of the trace). I'm not sure offhand why it's happening at all when the queue has only 1 full buffer -- it should be dequeueing the oldest buffer, which should already have signaled.
The bottom line is that you'll never see the BufferQueue go above two for triple buffering.
Not all devices work as described above. Nexus 7 (2012) doesn't use the "explicit sync" mechanism, and pre-ICS devices don't have BufferQueues at all.
Going back to your numbered screenshot, yes, there's plenty of time between '1' and '2' where your app could run performTraversals(). It's hard to say for sure without knowing what your app is doing, but I would guess you've got a Choreographer-driven animation cycle that wakes up every VSYNC and does work. It doesn't run more often than that.
If you systrace Android Breakout you can see what it looks like when you render as fast as you can ("queue stuffing") and rely on BufferQueue back-pressure to regulate the game speed.
It's especially interesting to compare N4 running 4.3 with N4 running 4.4. On 4.3, the trace is similar to yours, with the queue largely hovering at 1, with regular drops to 0 and occasional spikes to 2. On 4.4, the queue is almost always at 2 with an occasional drop to 1. In both cases it's sleeping in eglSwapBuffers(); in 4.3 the trace usually shows waitForever() below that, while in 4.4 it shows dequeueBuffer(). (I don't know the reason for this offhand.)
Update 2: The reason for the difference between 4.3 and 4.4 appears to be a Nexus 4 driver change. The 4.3 driver used the old dequeueBuffer call, which turns into dequeueBuffer_DEPRECATED() (Surface.cpp line 112). The old interface doesn't take the fence as an "out" parameter, so the call has to call waitForever() itself. The newer interface just returns the fence to the GL driver, which does the wait when it needs to (which might not be right away).
Update 3: An even longer explanation is now available here.

Can I use performTraversals to measure FPS on Android?

I'd like to calculate FPS to detect performance issue of an application based on existing Android profiling tool .
I noted that on Systrace, it can record the length of performTraversals. As far as I know, performTraversals performs measure, layout and draw, which include most of jobs when updating a frame. So can performTraversals be representative enough to measure whether a frame will take 60 ms to update?
I also noted that Systrace record the time spending on SurfaceFlinger. I know SurfaceFlinger served for rendering purpose, but I don't know the exact beginning point and ending point of a frame. Should I also considering the time spent on SurfaceFlinger to the frame rate? (Though I do observe that SurfaceFlinger perform more frequently than performTraversals, which means SurfaceFlinger may not necessarily follow performTraversals. It will also be triggered in other scenarios.)
P.S. I'm aware of the sysdump gfxinfo, but it can only record 128 frames(~2 seconds), while what I want may last much longer.
Systrace is not useful for measuring FPS overall, but you can do that trivially with a frame counter and System.nanoTime(). If you're not hitting your target framerate, though, it can help you figure out why not.
The official docs provide some useful pointers, but there's a lot of information and the interactions can be complex. The key things to know are:
The device display panel generates a vsync signal. You can see that on the VSYNC line. Every time it transitions between 1 and 0 is a refresh.
The vsync wakes surfaceflinger, which gathers up the incoming buffers for the various windows and composites them (either itself using OpenGL ES, or through the Hardware Composer).
If your app was running faster than the panel refresh rate (usually 60fps), it will have blocked waiting for surfaceflinger (in, say, eglSwapBuffers()). Once surfaceflinger acquires the buffer, the app is free to continue and generate another frame.
Unless you're rendering offscreen, you can't go faster than surfaceflinger.
As of Android 4.3 (API 18) you can add your own events to the systrace output using the android.os.Trace class. Wrapping your draw method with trace markers can be extremely informative. You have to enable their tag with systrace to see them.
If you want to be running at 60fps, your rendering must finish in well under 16.7ms. If you see a single invocation of performTraversals taking longer than that, you're not going to hit maximum speed.

Categories

Resources