I am using MediaCodec to play 1080p@60fps video. This is on the Freescale SabreSD platform with Android Lollipop 5.1.
Initially, because of BufferQueue synchronous mode, the FPS was way below 60. I can now play at 70 FPS by changing the BufferQueue to asynchronous mode, as in JB.
The next challenge I am facing is that the video lags and the FPS drops drastically to 40 when I start interacting with the screen (pulling down the notification bar, pressing the volume button, etc.).
So I ran Grafika's MultiSurfaceActivity and Record GL. All the tests play smoothly when the screen is not touched or disturbed, but as soon as I start scrolling the notification bar from the top and keep doing that for a long time, the frame rate drops to 35-40 FPS.
I have confirmed the same test on KitKat 4.4.2 and JB 4.2.2, and they seem to work fine.
The behaviour is the same when playing an MP4 from Gallery: the video gets stuck and lags a lot when we start playing with the notification bar.
Can anyone explain what has changed from KitKat to Lollipop that can cause this issue (VSync, triple buffering)?
Regurgitating a bit from the Grafika issue tracker:
The bouncing ball is software-rendered, so anything that soaks up CPU time is going to make it slow down. On devices with medium-grade CPUs and big displays (e.g. Nexus 10) it never gets close to 60fps. So a slowdown while you are playing with the nav bar doesn't surprise me, but if it continues to be slow even after you stop playing with the nav bar, then that's a little weird.
Video playback should be less affected, as that does less with the CPU.
Investigation into such problems usually begins by using systrace to capture traces in "good" and "bad" states, and comparing the two.
The key point of BufferQueue "async mode" is to allow frames to drop if the consumer can't keep up with the producer. It's primarily meant for SurfaceTexture, where producer and consumer are in the same app, potentially on the same thread, so having the producer stall waiting for the consumer could cause the program to hang. I'm not sure what you mean by needing it to exceed 60fps, but I would guess you're throwing frames at the display faster than it can render them... so you're not really increasing the frame rate, you're just using the BufferQueue to drop the frames instead of using Choreographer to decide when you need to drop them yourself.
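To make the "dropping frames instead of increasing the frame rate" point concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the producer finishes frames at a perfectly steady rate, the display latches at most one frame per refresh, and "async mode" is modelled as a single slot that always holds the newest frame. The function name simulate_async is made up for illustration and is not part of any Android API.

```python
def simulate_async(producer_fps, display_hz, seconds):
    """Count (displayed, dropped) frames for a newest-frame-wins queue.

    Toy model only: frame k finishes at k / producer_fps seconds and
    refresh v happens at v / display_hz seconds. Integer cross
    multiplication is used so the comparison has no rounding error.
    """
    produced = producer_fps * seconds
    refreshes = display_hz * seconds
    displayed = 0
    dropped = 0
    queued = 0          # frames the producer has queued so far
    pending = False     # True if a frame is waiting for the next refresh

    for v in range(1, refreshes + 1):
        # Queue every frame that finishes at or before refresh v.
        while queued < produced and (queued + 1) * display_hz <= v * producer_fps:
            if pending:
                dropped += 1    # an older frame is replaced, never shown
            pending = True
            queued += 1
        # The display latches whatever is pending, if anything.
        if pending:
            displayed += 1
            pending = False
    return displayed, dropped


if __name__ == "__main__":
    for fps in (30, 60, 70):
        shown, lost = simulate_async(fps, 60, 1)
        print(f"producer {fps} fps -> displayed {shown}, dropped {lost}")
```

In this model a 70 fps producer on a 60 Hz display still only gets 60 frames on screen per second; the remaining frames are discarded by the queue, which is the behaviour described above.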
In any event, I left Google back in June 2014, well before Lollipop was completed. If something works correctly on KitKat but weirdly on Lollipop, I'm afraid I can't provide much insight. If you can reproduce the behavior easily, it might be worth capturing a video that demonstrates the problem (point a second smart phone at the device exhibiting the problem, so they can see how you manipulate the device) and filing a bug on http://b.android.com/.
Some traces uploaded by the OP:
https://www.dropbox.com/s/luwovq7ohozccdy/Lollipop_bad.zip
https://www.dropbox.com/s/zkv0aqw0shecpw2/lollipop_good.zip
https://www.dropbox.com/s/g7qe01xvmfyvpak/kitkat.zip
Looking at the kitkat trace, something weird is going on in SurfaceFlinger. The main thread is sitting in postFrameBuffer for a very long time (23-32ms). It eventually wakes up, and the CPU row suggests it was waiting on activity from a "galcore daemon", which I'm not familiar with (seems particular to Vivante GPU).
The lollipop traces only show the CPU rows, as if the capture were done without the necessary tags. I don't believe the systrace capture command changed significantly between kitkat and lollipop, so I'm puzzled as to why the user-space-initiated logging would vanish but the kernel thread scheduling stuff would remain. Make sure you have sched gfx view specified.
The newer lollipop traces only have about a second of good data. When you see "Did Not Finish" it means a "start" record had no matching "end" record. You can increase the systrace logging buffer size with the -b flag. I think there's enough there though.
Looking at the /system/bin/surfaceflinger row you can see that, in the "good" trace, postFrameBuffer usually finishes in about 16ms, but it's still waiting on galcore. Zoom in on 388ms (use WASD keys). At 388.196ms, on the CPU 2 row, you can see galcore do something. Right after it completes, the thin line at the top of the surfaceflinger row changes from light grey (sleeping) to green (running). At 388.548ms, again on CPU 2, galcore runs again, and right after that on the surfaceflinger row you see queueBuffer start to execute.
The "bad" trace looks identical. For example, you can see two galcore executions at 101.146ms and 101.666ms, with what appear to be similar effects on the surfaceflinger row. The key difference is the time spent in postFrameBuffer, which is around 16ms for "good" and around 30ms for "bad".
So this doesn't appear to be a behavioral shift; rather, things are taking longer and deadlines are being missed.
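A rough way to see why ~16ms versus ~30ms matters so much: if each frame has to wait for a refresh boundary, the time per frame is rounded up to a whole number of refresh periods. The sketch below assumes a 60 Hz panel and a strictly serialized pipeline, which is a simplification of what SurfaceFlinger actually does; effective_fps is a hypothetical helper, not something taken from the traces.

```python
import math


def effective_fps(frame_ms, refresh_hz=60.0):
    """Frame rate when every frame must wait for a refresh boundary."""
    period_ms = 1000.0 / refresh_hz
    # A frame that overruns one period is held until the next refresh,
    # so its cost is rounded up to a whole number of periods.
    periods_used = max(1, math.ceil(frame_ms / period_ms))
    return refresh_hz / periods_used


if __name__ == "__main__":
    for ms in (16.0, 23.0, 30.0):
        print(f"{ms} ms per frame -> {effective_fps(ms)} fps")
```

Under these assumptions a 16ms frame just fits into one refresh period, while anything in the 23-32ms range costs two periods. A mix of on-time and late frames would land somewhere between the two rates.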
As far as I can tell, SurfaceFlinger is being held up by galcore daemon. This is true in both "good" and "bad" cases. To see what the timing should look like you can run systrace on a Nexus device, or compare to traces from other devices (e.g. the one in this case study or this SO question). If you zoom in you can see doComposition executing in a few milliseconds, and postFrameBuffer finishing in a few tenths of a millisecond.
Summing up: you don't have good and bad, you have bad and worse. :-) I don't know what galcore is, but you'll likely need to have a conversation with the GPU OEM.
Related
I am having trouble debugging a performance issue on a div with overflow:auto. As you can see in the image below (and read in the breakdown below that), it looks like the webview is wasting precious time by not delivering frames.
Weirdest thing is that the same content scrolls pretty smoothly on two older devices (Moto G and X 2013, Lollipop 5.1) but shows noticeable framedrops on newer ones (Moto X 2014 (Marshmallow 6.0) and Pixel C (Nougat 7.0)).
Between 990 and 995ms it does some compositing, rasterizing and gpu stuff. Then it sits idly for 15ms (that's almost an entire frame), updates the layer tree, then waits for another 18ms.
So we've missed ~2 frames by now even when it looks like it should have been able to deliver at least one.
Then it fires a scroll event and gets to chew on the JavaScript that is tied to that. That takes ~5ms and now it does some painting and compositing, then waits ANOTHER 10ms and updates the layer tree, then more waiting and then, finally, delivers its first new frame since 990ms.
This entire thing took 66ms but, according to the timeline, the webview used most of that time to sit on its ass. And this is not an exception, I'm seeing this pattern during the entire recording of the scroll.
When I look at a timeline taken from the Moto X 1st gen it looks like the webview wastes a lot less time and tries to deliver frames as often as possible. Sure, it's not doing 60fps all the time but at this point I'm happy it does 40 instead of 20 or even lower.
The obvious question here is "what the hell is happening?" - or maybe more accurate: "why are things (read: frames) NOT happening?"
PS: I've checked the Android System Webview version and all of the devices I tested this on have v52 installed. The only thing I can think of is that the OS is 'causing' this. Did something change since Marshmallow?
UPDATE
I've been able to solve most of the lag by moving most of my onscroll logic into requestAnimationFrame and hiding content that is off-screen (with visibility:hidden). But this doesn't really explain why Chrome seems to just skip frames when it's not really doing anything. It seems to have to do with scrolling a large area of fairly complex content, but I'd expect to see that in the timeline in some form. Instead, it just shows large empty spaces that block frames from rendering..?
I am attempting to determine (to within 1 ms) when particular screen flips happen on Android. Choreographer fires every time a frame flips, but gives no way of determining which frame is actually being displayed. According to https://source.android.com/devices/graphics/architecture.html, there are several layers in the process: the user land buffer, which flips to a triple-buffered queue, which flips to the surface flinger, which flips to the hardware. Each of these layers can potentially drop a frame, but at this point I have only determined how to monitor the user land buffer. Is there a way to monitor the other buffers/flips (in real time, on a non-rooted, non-custom phone)?
I have observed unexpected frame delays on the HTC M8 (about 1 every 5 minutes), but the Nexus 7 does not appear to have this problem. I measure the delays by using a Cedrus StimTracker (http://cedrus.com/stimtracker/) with a photo sensor and the Lab Streaming Layer (https://github.com/sccn/labstreaminglayer). I have tried using eglPresentationTimeANDROID to control when screens are flipped, and that has not fixed the problem.
Note that I'm using the ndk, but I can usually use the JNI to get access to non-ndk features when I need to.
The reason I care is in order to use Android for psychological and neurological experiments, where 1 ms precision is highly desirable.
As far as accessible APIs go, it sounds like you've found the relevant bits and pieces. If you haven't yet, please read through this stackoverflow item.
Using Choreographer and extrapolation, you can guess at when the next display refresh will occur. Using eglPresentationTimeANDROID() on an Android 5.0+ device, you can tell SurfaceFlinger when you want a particular frame to be sent to the display. Assuming SurfaceFlinger is properly accounting for all latency (such as additional frames added by "smart" panels), that should get you reliable timing.
(Bear in mind that the timing is based on when the display latches the next frame, not when the next frame is fully visible on the display... the latency there will depend on the panel.)
Grafika's "scheduled swap" Activity uses this feature, but it sounds like you're already familiar.
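The extrapolation step is just integer arithmetic on timestamps. The sketch below shows one way to do it, assuming you already have the timestamp of the most recent refresh (for example the frameTimeNanos value passed to a Choreographer frame callback) and the display's refresh period in nanoseconds. The function names and the frames_ahead parameter are illustrative, not part of any Android API, and the result is only as good as the period you feed in.

```python
def next_vsync_ns(last_vsync_ns, period_ns, now_ns):
    """Extrapolate the first refresh strictly after now_ns."""
    if now_ns < last_vsync_ns:
        # The reported refresh is still in the future; use it as-is.
        return last_vsync_ns
    periods_elapsed = (now_ns - last_vsync_ns) // period_ns
    return last_vsync_ns + (periods_elapsed + 1) * period_ns


def target_refresh_ns(last_vsync_ns, period_ns, now_ns, frames_ahead=2):
    """Pick the refresh that is frames_ahead refreshes from now.

    The value returned here is the kind of timestamp one might hand to
    eglPresentationTimeANDROID(); whether to aim exactly at the refresh
    or slightly before it is a tuning decision left out of this sketch.
    """
    first = next_vsync_ns(last_vsync_ns, period_ns, now_ns)
    return first + (frames_ahead - 1) * period_ns


if __name__ == "__main__":
    period = 16_666_667          # roughly 60 Hz, in nanoseconds
    last = 1_000_000_000         # hypothetical last refresh timestamp
    now = 1_020_000_000          # 20 ms later
    print(next_vsync_ns(last, period, now))
    print(target_refresh_ns(last, period, now, frames_ahead=2))
```

Scheduling a couple of refreshes ahead, rather than aiming at the very next one, leaves room for the frame to travel through the queue, in line with the "schedule ahead rather than react" advice.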
The only way to get signaled by the display when it does the swap would be to dup() the display-retire fence fd from the previous frame, and wait on it. Some of the code in SurfaceFlinger does this, notably DispSync watches the retire fences to see if the software "VSYNC" is drifting. There is no public API for fences, and the user-space response time could certainly be more than 1ms anyway... it usually works out better to schedule ahead than it does to react. Your requirement for non-rooted non-custom devices makes this problematic.
If you're mostly seeing correct behavior, but occasionally seeing a miss, your best bet is to use systrace to track down the cause.
I'm trying to use the output of systrace to detect janky scrolling during automated tests: I want to notice it early, without having to sit there watching.
I spent some time trying to fathom the trace, and found this ebook very helpful: https://www.safaribooksonline.com/library/view/high-performance-android/9781491913994/ch04.html
The most promising hypothesis was checking whether VSYNC-sf ever stopped ticking on phones displaying VSYNC-sf.
On other machines, SurfaceFlinger seems to be started by either HW_SYNC_0 or VSYNC (sometimes one or both of those VSYNCs stop) but SurfaceFlinger also seems to be involved with VsyncOn, which sometimes appears to keep track of whether there are activity buffers outstanding, and sometimes whether there are input events that need delivering. Confusingly, sometimes input events are delivered during half-second pauses when there's no surface flinger activity, no application drawing, and when even the VSYNC and HW_VSYNC signals decide to pause.
Does anyone know what's going on there?
Should I simply expect to see Surface Flinger always busy - not alternately busy and idle with each tick - and always aligned with one or other of the VSYNCs?
I also sometimes see SurfaceFlinger taking longer than a tick to complete its processing - is that the application's fault for having a very complicated display, or is it just something that happens because some queue isn't empty enough?
I'd prefer to miss a possible jank than claim to have found one which isn't there.
Thanks!
Testing Display Performance lists how to use the new framestats command from dumpsys to get this type of information. It will provide information on which frames you've missed, and how many of them you've missed.
It's also worth noting that SurfaceFlinger isn't always busy. It's only active when part of the screen needs to be updated. If nothing on the screen needs updating, then no new rendering occurs and, as such, SurfaceFlinger should be idle.
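For automated tests, the output of "adb shell dumpsys gfxinfo <PACKAGE_NAME> framestats" can be post-processed with a small script. The sketch below assumes the layout described in that documentation: a CSV block delimited by ---PROFILEDATA--- lines, a header row naming the columns, timestamps in nanoseconds, and a Flags column that is non-zero for rows that should be ignored. The SAMPLE text is made up and trimmed to four columns for readability; real output has many more columns, which is why the parser reads column names from the header instead of hard-coding positions.

```python
SAMPLE = """\
---PROFILEDATA---
Flags,IntendedVsync,Vsync,FrameCompleted,
0,1000000000,1000000000,1010000000,
0,1016666667,1016666667,1050000000,
1,1033333334,1033333334,1090000000,
---PROFILEDATA---
"""


def parse_framestats(text):
    """Return one dict per frame row found inside PROFILEDATA blocks."""
    rows = []
    in_block = False
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---PROFILEDATA---":
            in_block = not in_block   # the marker opens and closes a block
            header = None
            continue
        if not in_block or not line:
            continue
        # Rows end with a trailing comma, so drop empty fields.
        fields = [f for f in line.split(",") if f]
        if header is None:
            header = fields
            continue
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def janky_frame_durations_ns(rows, budget_ns=16_666_667):
    """Durations of frames that took longer than budget_ns.

    Rows with a non-zero Flags value are skipped, since they are
    documented as outliers that should not count toward jank.
    """
    late = []
    for row in rows:
        if row["Flags"] != 0:
            continue
        duration = row["FrameCompleted"] - row["IntendedVsync"]
        if duration > budget_ns:
            late.append(duration)
    return late


if __name__ == "__main__":
    frames = parse_framestats(SAMPLE)
    print(len(frames), "frames,", len(janky_frame_durations_ns(frames)), "over budget")
```

The 16.67ms budget is an assumption for a 60 Hz display; a test that would rather miss a jank than report a false one could raise the budget or require several late frames in a row before failing.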
You can get a bigger-picture view of the Android rendering pipeline with the Rendering Performance 101 video from Android Performance Patterns.
I'm investigating the performance of my app since I noticed it dropping some frames while scrolling. I ran systrace (on a Nexus 4 running 4.3) and noticed an interesting section in the output.
Everything is fine at first. Zooming in on the left section, we can see that drawing starts on every vsync, finishes with time to spare, and waits until the next vsync. Since it's triple buffered, it should be drawing into a buffer that will be posted on the following vsync after it's done.
On the 4th vsync in the zoomed in screenshot, the app does some work and the draw operation doesn't finish in time for the next vsync. However, we don't drop any frames because the previous draws were working a frame ahead.
After this happens though, the draw operations don't make up for the missed vsync. Instead, only one draw operation starts per vsync, and now they're not drawing one frame ahead anymore.
Zooming in on the right section, the app does some more work and misses another vsync. Since we weren't drawing a frame ahead, a frame actually gets dropped here. After this, it goes back to drawing one frame ahead.
Is this expected behavior? My understanding was that triple buffering allowed you to recover if you missed a vsync, but this behavior looks like it drops a frame once every two vsyncs you miss.
Follow up questions
On the right side of this screenshot, the app is actually rendering buffers faster than the display is consuming them. During performTraversals #1 (labeled in the screenshot), let's say buffer A is being displayed and buffer B is being rendered. #1 finishes long before the vsync and puts buffer B in the queue. At this point, shouldn't the app be able to immediately start rendering buffer C? Instead, performTraversals #2 doesn't start until the next vsync, wasting the precious time in between.
In a similar vein, I'm a bit confused about the need for waitForever on the left side here. Let's say buffer A is being displayed, buffer B is in the queue, and buffer C is being rendered. When buffer C is finished rendering, why isn't it immediately added to the queue? Instead it does a waitForever until buffer B is removed from the queue, at which point it adds buffer C, which is why the queue seems to always stay at size 1 no matter how fast the app is rendering buffers.
The amount of buffering provided only matters if you keep the buffers full. That means rendering faster than the display is consuming them.
The labels don't appear in your images, but I'm guessing that the purple row above the green vsync row is the BufferQueue status. You can see that it generally has 0 or 1 full buffers at any time. At the very left of the "zoomed-in on the left" image you can see that it's got two buffers, but after that it only has one, and 3/4 of the way across the screen you see a very short purple bar that indicates it just barely rendered the frame in time.
See this post and this post for background.
Update for the added questions...
The detail in the other post barely scratched the surface. We must go deeper.
The BufferQueue count shown in systrace is the number of queued buffers, i.e. the number of buffers that have content in them. When SurfaceFlinger grabs a buffer for display, it releases the buffer immediately, changing its state to "free". This is particularly exciting when the buffer is being shown on an overlay, because the display is rendering directly from the buffer (as opposed to compositing into a scratch buffer and displaying that).
Let me say that again: the buffer from which the display is actively reading data for display on the screen is marked as "free" in the BufferQueue. The buffer has an associated fence that is initially "active". While it's active, nobody is allowed to modify the buffer contents. When the display no longer needs the buffer, it signals the fence.
So the reason why the code over on the left of your trace is in waitForever() is because it's waiting for the fence to signal. When VSYNC hits, the display switches to a different buffer, signals the fence, and your app can start using the buffer immediately. This eliminates the latency that would be incurred if you had to wait for SurfaceFlinger to wake up, see that the buffer was no longer in use, send an IPC through BufferQueue to release the buffer, etc.
Note that the calls to waitForever() only show up when you're not falling behind (left side and right side of the trace). I'm not sure offhand why it's happening at all when the queue has only 1 full buffer -- it should be dequeueing the oldest buffer, which should already have signaled.
The bottom line is that you'll never see the BufferQueue go above two for triple buffering.
Not all devices work as described above. Nexus 7 (2012) doesn't use the "explicit sync" mechanism, and pre-ICS devices don't have BufferQueues at all.
Going back to your numbered screenshot, yes, there's plenty of time between '1' and '2' where your app could run performTraversals(). It's hard to say for sure without knowing what your app is doing, but I would guess you've got a Choreographer-driven animation cycle that wakes up every VSYNC and does work. It doesn't run more often than that.
If you systrace Android Breakout you can see what it looks like when you render as fast as you can ("queue stuffing") and rely on BufferQueue back-pressure to regulate the game speed.
It's especially interesting to compare N4 running 4.3 with N4 running 4.4. On 4.3, the trace is similar to yours, with the queue largely hovering at 1, with regular drops to 0 and occasional spikes to 2. On 4.4, the queue is almost always at 2 with an occasional drop to 1. In both cases it's sleeping in eglSwapBuffers(); in 4.3 the trace usually shows waitForever() below that, while in 4.4 it shows dequeueBuffer(). (I don't know the reason for this offhand.)
Update 2: The reason for the difference between 4.3 and 4.4 appears to be a Nexus 4 driver change. The 4.3 driver used the old dequeueBuffer call, which turns into dequeueBuffer_DEPRECATED() (Surface.cpp line 112). The old interface doesn't take the fence as an "out" parameter, so the call has to call waitForever() itself. The newer interface just returns the fence to the GL driver, which does the wait when it needs to (which might not be right away).
Update 3: An even longer explanation is now available here.
I have an OpenGL game for Android. It runs at a good 60fps when the screen is touched. When I release my finger it goes back down to around 30fps. Does the touch event/release raise/lower a thread's priority and, if so, how can I replicate this to keep it at a constant 60fps? This only seems to be an issue on Galaxy Note 2 so far.
I'll assume you are using onDrawFrame and setRenderMode(RENDERMODE_CONTINUOUSLY).
30 and 60 FPS indicate that your implementation of onDrawFrame is called as the device's screen refreshes. Most displays refresh at 60Hz, giving you 60FPS.
It is likely that the Galaxy Note 2 has some power saving feature that limits screen refresh to 30Hz when there are no touches on screen. Check if there's any way to disable this feature.
AFAIK, OpenGL ES does not specify a standard for screen refresh rates, so you will need a throttling function to ensure that your game runs/feels the same (i.e. at the same speed) despite differences in FPS.
Yes.
The best way to observe this phenomenon is to use systrace with the "freq" tag enabled. You probably need a rooted device, and you definitely need one on which systrace is enabled.
systrace will record changes in the clock frequency for various components. It varies by device, but you can usually get the per-core CPU clocks and GPU memory rate. You will likely see several of them drop significantly at the same time your frame rate drops.
The motivation for doing this is to reduce power requirements and extend battery life. The assumption is that, while your finger is in contact with the screen, you're actively doing something and the device should be as responsive as possible. After a brief period of time, the clocks will slow to a level appropriate for the current workload. The heuristics that determine how long to wait before slowing, and how much to slow down, are tuned for each device.
(This has caused some people to create a thread that just sits and spins on multi-core devices as a way to artificially prop up the CPU clock rate. Not recommended. See also this answer.)
The bottom line is that this isn't a simple matter of adjusting thread priorities. You have to choose between recognizing that the slowdown will happen and adapting to it (by making your game updates independent of frame rate), or figuring out some way to fool the device into staying in a higher-power mode when you want smooth animation.
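As a sketch of what "independent of frame rate" means in practice, compare moving an object by a fixed amount per frame with moving it by speed multiplied by elapsed time. The Python below is a stand-in for whatever your game loop looks like; the function names, the speed of 120 units per second, and the 2 units per frame step are all made-up values for illustration.

```python
def move_per_frame(fps, seconds, step=2.0):
    """Naive update: a fixed step every frame, so speed depends on fps."""
    x = 0.0
    for _ in range(int(fps * seconds)):
        x += step
    return x


def move_by_elapsed_time(fps, seconds, speed=120.0):
    """Time-based update: distance = speed * time since the last frame."""
    dt = 1.0 / fps      # a real loop would measure this per frame
    x = 0.0
    for _ in range(int(fps * seconds)):
        x += speed * dt
    return x


if __name__ == "__main__":
    for fps in (30, 60):
        print(fps, "fps:",
              "per-frame", move_per_frame(fps, 1.0),
              "time-based", round(move_by_elapsed_time(fps, 1.0), 6))
```

With the per-frame update the object covers half the distance when the device drops to 30fps; with the time-based update it ends up in the same place either way, just with coarser motion.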
(For anyone who wants to play along at home: build a copy of Grafika and start the "Record GL app" activity. If you drag your finger around the screen all will be well, but if you leave it alone for a few seconds you may start to see the dropped-frame counter rising as the app falls behind. Seen on Nexus 5, Nexus 7 (2013), and others.)