I'm trying to use the output of systrace to detect janky scrolling during automated tests: I want to notice it early, without having to sit there watching.
I spent some time trying to fathom the trace, and found this ebook very helpful: https://www.safaribooksonline.com/library/view/high-performance-android/9781491913994/ch04.html
The most promising hypothesis was checking whether VSYNC-sf ever stopped ticking on phones displaying VSYNC-sf.
On other machines, SurfaceFlinger seems to be started by either HW_SYNC_0 or VSYNC (sometimes one or both of those VSYNCs stop) but SurfaceFlinger also seems to be involved with VsyncOn, which sometimes appears to keep track of whether there are activity buffers outstanding, and sometimes whether there are input events that need delivering. Confusingly, sometimes input events are delivered during half-second pauses when there's no surface flinger activity, no application drawing, and when even the VSYNC and HW_VSYNC signals decide to pause.
Does anyone know what's going on there?
Should I simply expect to see Surface Flinger always busy - not alternately busy and idle with each tick - and always aligned with one or other of the VSYNCs?
I also sometimes see SurfaceFlinger taking longer than a tick to complete its processing - is that the application's fault for having a very complicated display, or is it just something that happens because some queue isn't empty enough?
I'd prefer to miss a possible jank than claim to have found one which isn't there.
Thanks!
Testing Display Performance Lists how to use the new framestats command from dumpsys to get this type of information. It will provide information on what frames you've missed, and how many of them you've missed.
It's also worth noting that SurfaceFlinger isn't always busy. It's only active when part of the screen needs to be updated. If nothing on the screen needs updating, then no new rendering occurs, and such, SurfaceFlinger should be idle.
You can get a bigger-picture view of the Android rendering pipeline with the Rendering Performance 101 video from Android Performance Patterns.
Related
I am using MediaCodec to play 1080p#60fps video. This is on freescale SabreSD platform with Android Lollipop 5.1.
Initially because of BufferQueue Synchronous Mode, the FPS was way below 60.I could now manage to play at 70FPS by changing the BufferQueue to Asynchronous as in JB.
Now the next challenge I am facing is the video lags and FPS drops drastically to 40 when I start interacting with the screen (pulling down notification bar , pressing volume button etc).
So I ran rafika MultiSurfaceActivity and Record GL, I can see all the test play smoothly when no screen is touched or disturbed, but as soon as I start scrolling the notification bar from top and continue that for long time, the fps gets reduced to 35-40FPS.
I have confirmed the same test on Kitkat 4.4.2 and JB 4.2.2 and they seems to work fine.
Same behaviour when playing MP4 from Gallery. The video gets stuck and lags a lot when we start playing with Notification bar
Can anyone explain what has change from Kitkat to Lollipop which can cause this issue (VSync, Triple Buffering ?).
Regurgitating a bit from the Grafika issue tracker:
The bouncing ball is software-rendered, so anything that soaks up CPU time is going to make it slow down. On devices with medium-grade CPUs and big displays (e.g. Nexus 10) it never gets close to 60fps. So a slowdown while you are playing with the nav bar doesn't surprise me, but if it continues to be slow even after you stop playing with the nav bar, then that's a little weird.
Video playback should be less affected, as that does less with the CPU.
Investigation into such problems usually begins by using systrace to capture traces in "good" and "bad" states, and comparing the two.
The key point of BufferQueue "async mode" is to allow frames to drop if the consumer can't keep up with the producer. It's primarily meant for SurfaceTexture, where producer and consumer are in the same app, potentially on the same thread, so having the producer stall waiting for the consumer could cause the program to hang. I'm not sure what you mean by needing it to exceed 60fps, but I would guess you're throwing frames at the display faster than it can render them... so you're not really increasing the frame rate, you're just using the BufferQueue to drop the frames instead of using Choreographer to decide when you need to drop them yourself.
In any event, I left Google back in June 2014, well before Lollipop was completed. If something works correctly on KitKat but weirdly on Lollipop, I'm afraid I can't provide much insight. If you can reproduce the behavior easily, it might be worth capturing a video that demonstrates the problem (point a second smart phone at the device exhibiting the problem, so they can see how you manipulate the device) and filing a bug on http://b.android.com/.
Some traces uploaded by the OP:
https://www.dropbox.com/s/luwovq7ohozccdy/Lollipop_bad.zip
https://www.dropbox.com/s/zkv0aqw0shecpw2/lollipop_good.zip
https://www.dropbox.com/s/g7qe01xvmfyvpak/kitkat.zip
Looking at the kitkat trace, something weird is going on in SurfaceFlinger. The main thread is sitting in postFrameBuffer for a very long time (23-32ms). It eventually wakes up, and the CPU row suggests it was waiting on activity from a "galcore daemon", which I'm not familiar with (seems particular to Vivante GPU).
The lollipop traces only show the CPU rows, as if the capture were done without the necessary tags. I don't believe the systrace capture command changed significantly between kitkat and lollipop, so I'm puzzled as to why the user-space-initiated logging would vanish but the kernel thread scheduling stuff would remain. Make sure you have sched gfx view specified.
The newer lollipop traces only have about a second of good data. When you see "Did Not Finish" it means a "start" record had no matching "end" record. You can increase the systrace logging buffer size with the -b flag. I think there's enough there though.
Looking at the /system/bin/surfaceflinger row you can see that, in the "good" trace, postFrameBuffer usually finishes in about 16ms, but it's still waiting on galcore. Zoom in on 388ms (use WASD keys). At 388.196ms, on the CPU 2 row, you can see galcore do something. Right after it completes, the thin line at the top of the surfaceflinger row changes from light grey (sleeping) to green (running). At 388.548ms, again on CPU 2, galcore runs again, and right after that on the surfaceflinger row you see queueBuffer start to execute.
The "bad" trace looks identical. For example, you can see two galcore executions at 101.146ms and 101.666ms, with what appear to be similar effects on the surfaceflinger row. The key difference is the time spent in postFrameBuffer, which is around 16ms for "good" and around 30ms for "bad".
So this doesn't appear to be a behavioral shift; rather, things are taking longer and deadlines are being missed.
As far as I can tell, SurfaceFlinger is being held up by galcore daemon. This is true in both "good" and "bad" cases. To see what the timing should look like you can run systrace on a Nexus device, or compare to traces from other devices (e.g. the one in this case study or this SO question). If you zoom in you can see doComposition executing in a few milliseconds, and postFrameBuffer finishing in a few tenths of a millisecond.
Summing up: you don't have good and bad, you have bad and worse. :-) I don't know what galcore is, but you'll likely need to have a conversation with the GPU OEM.
I am attempting to determine (to within 1 ms) when particular screen flips happen on Android. Choreographer fires every time a frame flips, but gives no way of determining which frame is actually being displayed. According to https://source.android.com/devices/graphics/architecture.html, there are several layers in the process: the user land buffer, which flips to a triple-buffered queue, which flips to the surface flinger, which flips to the hardware. Each of these layers can potentially drop a frame, but at this point I have only determined how to to monitor the user land buffer. Is there a way to monitor the other buffers/flips (in real time, on a non-rooted, non-custom phone)?
I have observed unexpected frame delays on the HTC M8 (about 1 every 5 minutes), but the Nexus 7 does not appear to have this problem. I measure the delays by using a Cedrus StimTracker (http://cedrus.com/stimtracker/) with a photo sensor and the Lab Streaming Layer (https://github.com/sccn/labstreaminglayer). I have tried using eglPresentationTimeANDROID to control when screens are flipped, and that has not fixed the problem.
Note that I'm using the ndk, but I can usually use the JNI to get access to non-ndk features when I need to.
The reason I care is in order to use Android for psychological and neurological experiments, where 1 ms precision is highly desirable.
As far as accessible APIs go, it sounds like you've found the relevant bits and pieces. If you haven't yet, please read through this stackoverflow item.
Using Choreographer and extrapolation, you can guess at when the next display refresh will occur. Using eglPresentationTimeANDROID() on an Android 5.0+ device, you can tell SurfaceFlinger when you want a particular frame to be sent to the display. Assuming SurfaceFlinger is properly accounting for all latency (such as additional frames added by "smart" panels), that should get you reliable timing.
(Bear in mind that the timing is based on when the display latches the next frame, not when the next frame is fully visible on the display... the latency there will depend on the panel.)
Grafika's "scheduled swap" Activity uses this feature, but it sounds like you're already familiar.
The only way to get signaled by the display when it does the swap would be to dup() the display-retire fence fd from the previous frame, and wait on it. Some of the code in SurfaceFlinger does this, notably DispSync watches the retire fences to see if the software "VSYNC" is drifting. There is no public API for fences, and the user-space response time could certainly be more than 1ms anyway... it usually works out better to schedule ahead than it does to react. Your requirement for non-rooted non-custom devices makes this problematic.
If you're mostly seeing correct behavior, but occasionally seeing a miss, your best bet is to use systrace to track down the cause.
I'm trying to get a better understanding of the Android display subsystem, but one item that's still confusing to me is how VSYNC signals are handled, and why so many exist in the first place.
Android is designed to use VSYNC at its core, but there are multiple VSYNC signals that it employs. Via https://source.android.com/devices/graphics/implement.html in the "VSYNC Offset" section, there is a flow diagram which diagrams three VSYNC signals: HW_VSYNC_0, VSYNC, and SF-VSYNC. I understand that HW_VSYNC is used to update the timing in DispSync, and that VSYNC and SF-VSYNC are used by the apps and surfaceflinger, but why are these individual signals necessary at all? Furthermore, how do the offsets impact these signals? Is there a timing diagram available anywhere which better explains this?
Thanks for any help you can offer.
To understand this stuff, it's best to start with the System-Level Graphics Architecture document, taking particular note of The Need for Triple-Buffering section and the associated diagram (which ideally would be an animated GIF). The sentence that begins, "If the app starts rendering halfway between VSYNC signals" is talking specifically about DispSync. Once you've read that, hopefully the DispSync section of the device graphics doc makes more sense.
Most devices don't have DispSync offsets configured, so there is really only one VSYNC signal. In what follows I'm assuming DispSync is enabled.
The hardware only provides one VSYNC signal, corresponding to the primary display refresh. The others are generated in software by the SurfaceFlinger DispSync code, firing at fixed offsets from the actual VSYNC. Some clever software is used to keep the timings from slipping out of phase.
The signals are used to trigger SurfaceFlinger composition and app rendering. If you follow the section in the architecture document, you can see that this establishes two frames of latency between when the app renders its content, and when the content appears on the screen. Think of it like this: given three occurrences of VSYNC, the app draws at V0, the system does composition at V1, and the composed frame is sent to the display at V2.
If you're trying to track touch input, perhaps moving a map around under the user's finger, any latency will be seen by the user as sluggish touch response. The goal is to minimize the latency to improve the user experience. Suppose we delayed the events slightly, so the app draws at V0.5, we composite at V1.2, and then swap to the display at V2. By offsetting the app and SF activity we reduce the total latency from 2 frames to 1.5, as shown below.
That's what DispSync is for. In the feedback diagram on the page you linked, HW_VSYNC_0 is the hardware refresh for the physical display, VSYNC causes the app to render, and SF_VSYNC causes SurfaceFlinger to perform composition. Referring to them as "VSYNC" is a bit of a misnomer, but on an LCD panel referring to anything as "VSYNC" is probably a misnomer.
The "retire fence timestamps" noted in the feedback loop diagram refers to a clever optimization. Since we're not doing any work on the actual hardware VSYNC, we can be slightly more efficient if we turn the refresh signal off. The DispSync code will instead use the timestamps from retire fences (which is a whole other discussion) to see if it is falling out of sync, and will temporarily re-enable the hardware signal until it's back on track.
Edit: you can see how the values are configured in the Nexus 5 boardconfig. Note the settings for VSYNC_EVENT_PHASE_OFFSET_NS and SF_VSYNC_EVENT_PHASE_OFFSET_NS.
In the android displaying processes, SurfaceFlinger does an important role in that situation.
By the way, are there any methods to notice the SurfaceFlinger starting or stopping at the application level?
Or any other displaying processes to observe?
I want to know about the time difference between touching and displaying.
The SurfaceFlinger process does not start or stop while applications are running. If it does, the system restarts.
It sounds like you're interested in knowing the latency between when you touch the screen, and when the results of that touch are visible. You can use systrace to observe the various events, though this requires a fair understanding of the system. (Start with this doc.)
In general, an app can expect 2 to 2.5 frames of latency between cause and effect. On the N5, with the DispSync mechanism, this can be reduced to 1.5 - 2 frames.
There's no public API for 3rd party apps to access the surface flinger. If you are working at the platform level (your own device or custom ROM) then you'd have to add your own timing mechanism between the input subsystem and the surface flinger.
I'd like to calculate FPS to detect performance issue of an application based on existing Android profiling tool .
I noted that on Systrace, it can record the length of performTraversals. As far as I know, performTraversals performs measure, layout and draw, which include most of jobs when updating a frame. So can performTraversals be representative enough to measure whether a frame will take 60 ms to update?
I also noted that Systrace record the time spending on SurfaceFlinger. I know SurfaceFlinger served for rendering purpose, but I don't know the exact beginning point and ending point of a frame. Should I also considering the time spent on SurfaceFlinger to the frame rate? (Though I do observe that SurfaceFlinger perform more frequently than performTraversals, which means SurfaceFlinger may not necessarily follow performTraversals. It will also be triggered in other scenarios.)
P.S. I'm aware of the sysdump gfxinfo, but it can only record 128 frames(~2 seconds), while what I want may last much longer.
Systrace is not useful for measuring FPS overall, but you can do that trivially with a frame counter and System.nanoTime(). If you're not hitting your target framerate, though, it can help you figure out why not.
The official docs provide some useful pointers, but there's a lot of information and the interactions can be complex. The key things to know are:
The device display panel generates a vsync signal. You can see that on the VSYNC line. Every time it transitions between 1 and 0 is a refresh.
The vsync wakes surfaceflinger, which gathers up the incoming buffers for the various windows and composites them (either itself using OpenGL ES, or through the Hardware Composer).
If your app was running faster than the panel refresh rate (usually 60fps), it will have blocked waiting for surfaceflinger (in, say, eglSwapBuffers()). Once surfaceflinger acquires the buffer, the app is free to continue and generate another frame.
Unless you're rendering offscreen, you can't go faster than surfaceflinger.
As of Android 4.3 (API 18) you can add your own events to the systrace output using the android.os.Trace class. Wrapping your draw method with trace markers can be extremely informative. You have to enable their tag with systrace to see them.
If you want to be running at 60fps, your rendering must finish in well under 16.7ms. If you see a single invocation of performTraversals taking longer than that, you're not going to hit maximum speed.