Interpreting Multicore Performance Trace (Eclipse/Android) - android

I'm working on an android game, and I started noticing a little sluggishness during development so I wanted to try to utilize multithreading for fun and learning.
My application has 3 threads:
UI thread (should be mostly idle)
Game Logic Thread
Graphics Thread
I minimized the critical section between threads 2 and 3 as best I could, with the idea that the game logic could update independently of the rendering thread, and then at the end of both threads I could have a short as possible window where I push all the graphics updates from the logic thread to the game loop. This should allow the two threads to work independently for a good majority of the time. In theory sounds like a performance win.
However once I got around to implementing, my performance took a big dive. It much worse than before, one loop of updating and rendering is taking like 50 ms (20fps), so it looks like garbage. This is just rendering some 20 triangles and maybe 20 textured quads, a really simple workload (I afraid to think of what it will be when I implement proper graphics).
Anyway I took a DDMS trace in android to profile where things were going wrong or could be improved.
http://i.stack.imgur.com/DDUYE.png
This is a view of roughly 3 frames of my game. So far it seems to be doing roughly what I expected. The parts that are highlighted in blue is the locked section, which looks about right (keeps the glThread mostly waiting while it is locked). However once I unlock it I should see both threads working simultaneously, and it looks like they are, but if I look closer:
http://i.stack.imgur.com/vukXQ.png
I'm doing my development on a dual core phone, but if I understand the trace right it doesn't look like it's ever doing anything in parallel, and what's worse it appears to be switching the active thread hundreds of times per millisecond! (unless I'm interpreting this incorrectly). All this context switching seems like it would be awful for performance, so I'm not sure why it would want to switch back and forth so fast.
So after my long winded explanation, I'm wondering a few things:
Is my understanding correct, that the filled rectangles in the trace are the active threads, and the colored lines are sleeping threads? Otherwise what do they mean?
Why don't I ever see my threads running simultaneously on a supposedly dual core phone?
Why is it switching active threads so rapidly?
In DDMS I get the warning "WARNING: a debugger is active; method-tracing results will be skewed". Is this something to worry about? How can I get rid of this warning? (I launced app via Run, not via Debug if it makes a difference)

Very nice question, let me start with answers:
You have mixed up threads/methods/activeMethod. Each line in traceview is thread (and if you named your threads, you'll see it's name on left side, like "GL Thread", "main", etc..). Rectangles(colored) represents active executing methods inside each thread, while colored lines represents "paused" methods inside thread. By "paused", i mean "method is still executing, but context was switched to some other thread, and when context switched again to this thread, this method will continue to work. In terminology you've used in your question, ye, lines are sleeping thread's methods, and rectangle is active thread executing method. You can find more info about DDMS traceview here.
Distributing threads among cores is another story and heavily depends on underlying Android OS mechanisms. First of all, be sure that target Android OS is started with SMP (Symmetric Multi-Processing) option on, which is default case for multicore phones, i guess :), but i'm not expert in those things. Some words about SMP you can find here.
Thread switching depends on OS Thread/Process scheduler, thread priority, etc. More info about this things you can find in this answers.
Even if you ran application in non-debugging mode, when you connect with DDMS, and do things such Method profiling, you'll activate debugging parts of davlik vm. More details about debugging here, section "Implementation".
Hope you'll find this answer helpful.

Thanks for the question. A full answer by an insider will be helpful to me, too. I'll say what I know.
Some (all?) phones have an option to enable/disable the second core. Have you checked that yours is turned on?
In my own app I've noticed that merely going from one thread to two (on one core) with no change in total work done causes a factor of 1.5 slowdown, so clearly threading itself has a cost.
It's been in the news that Intel is calling Google out on poor implementation of multicore threading:
http://www.pcworld.com/article/257307/dual_core_processors_wasted_on_android_intel_claims.html
Your results validate this.
One other thing to bear in mind is that multi-core is not multi-processor. You're sharing cache and memory controller bandwidth between cores. One can stall while it waits for the other to finish with a shared resource, in particular for writes on shared cache lines. However this effect ought not account for the single-threading you are seeing.

Related

Heavy CPU analysis while running foreground service

So, I've a Service that is important for applications logic and thus needs to be alive all the time.
First thing is first, I created it as a foreground service.
Whenever user starts to run, Service starts to do a lot of heavy things. It uses FusedLocationAPI different type of sensors and do a lot of different calculations. At this point phone starts to heat up (CPU usage is high). When running stops, heating stops and CPU drops lower.
This makes me think that the main problem is whenever these sensors are used and calculations are made, but there's a lot of possibilities that could cause this and I wanted to understand, how can I deeply analyze batter usage in this case? I read that Battery Historian should be way to go, but the information there is complicated and do not give me much information. How can I use this data to understand which class/runnable/code part is held responsible for the CPU usage? Maybe there's better ways to solve this? Thanks in advance.
Detailed analysis on what's happening using CPU profiler (after suggestion).
Image. Application is opened and few different screens are switch to and from. Nothing special. Phone does not heat up and from CPU analysis everything also looks pretty okay.
2.User starts to run and enter heavy state. The yellow (light green) rectangle shows previously mentioned "important service" that bring important role to application. Called functions does not use too much time of CPU considering the % of whole running trip - which makes me think, I've to look somewhere else..
What I saw is that CPU increases heavily whenever I lock the phone (while service is running. From CPU Bottom Up analysis I cannot understand what is causing the problem (orange rectangle) - looks like a bunch of Android stuff is happening, but it does not give me any clue.
From CPU profiler I do understand that I have a serious problem considering CPU usage, but right now I do not understand what is causing it (which piece of code/class/function). I know for the fact that whenever my important service is created (on app start) I use PARTIAL_WAKE_LOCK to not allow CPU go to sleep and also make service to be alive at all times. As I understand I need to find different solution for this, because CPU drains too much of battery.
How could I found this using only profiler? I now this fact only for code and profiler stack does not tell me much about this (maybe Im not looking at the right place?)
I found out the root cause of high CPU usage.
I did detailed investigation on threads and found out that DefaultDispatcher threads are using A LOT of CPU.
Turns out the has been a bug in Kotlin-Coroutines library and latest version provides fixes. When I did tests I used version 1.3.1, but changing version number to 1.3.3 provided me with big improvements: from 40% usage to 10% CPU usage.

Why can't UI thread and Render thread run at the same time?

I am a mediocore android developer for years. I like android but there's a big problem; frame drops. Even the most powerful ones can stutter so frequently while IOS devices can run at constant 60fps. I just can't understand why. I want to know it. So first thing i did was watching an I/O presentation about performance. And i didn't really understand one thing. Why can't ui and render thread run at the same time ? Yeah i know the basics like render thread can't know what to render while ui thread is doing it's thing but why can't render thread render the frame before? You can see the video here:
https://youtu.be/9HtTL_RO2wI?t=491
And here's a diagram what am i asking for:
You get the idea. I don't know about low level things about android, can anyone explain this like i'm five.
Your process' main thread is responsible for the rendering of the frames that will be presented to the user, so you should keep the code running there as fast and light as possible. If you have to do some heavy processing or access any IO (network, sdcard, etc) that may impact on the fluidity of the application since the thread may be waiting for a response.
As a good practice you should start that IO access/heavy processing on another thread to run in background and let the system decide the priority to run it, if necessary is recommended to present some feedback to the user like a ProgressBar or something to indicate that something is being processed.
Also, the Render Thread need to know what to render before it does it, so the UI Thread have to process which information the app would like to present to the user.
As #JonGoodwin points out, they both run in parallel, but usually in two cores of the same processor, as nowadays phones have at least two cores. Both threads are run in CPU, where RenderThread sends rendering commands to the GPU. Notice that this is true since API 21 (RenderThread is what enables things like ripple effect).
The problem, though, is what #LucianoFerruzzi points out: usually poor code that does too many things in the UI thread (RenderThread is not accessible, at least not with standard mechanisms).
Also, see the following episode of Android Developers Backstage: Episode 74: Graphics

Context switching is too expensive

I am writing a video processing app and have come across the following performance issue:
Most of the methods in my app have great differences between cpu time and real time.
I have investigated using the DDMS TraceView and have discovered that the main culprit for these discrepancies is context switching in some base methods, such as MediaCodec.start() or MediaCodec.dequeueOutputBuffer()
MediaCodec.start() for example has 0.7ms Cpu time and 24.2ms Real time. 97% of this real time is used up by the context switch.
This would not be a real problem, but the method is called quite often, and it is not the only one that presents this kind of symptom.
I also need to mention that all of the processing happens in a single AsyncTask, therefore on a single non-UI thread.
Is context switching a result of poor implementation, or an inescapable reality of threading?
I would very much appreciate any advice in this matter.
First, I doubt the time is actually spent context-switching. MediaCodec.start() is going to spend some amount of time waiting for the mediaserver process to talk to the video driver, and that's probably what you're seeing. (Unless you're using a software codec, your process doesn't do any of the actual work -- it sends IPC requests to mediaserver, which talks to the hardware codec.) It's possible traceview is just reporting its best guess at where the time went.
Second, AsyncTask threads are executed at a lower priority. Since MediaCodec should be doing all of the heavy lifting in the hardware codec, this won't affect throughput, but it's possible that it's having some effect on latency because other threads will be prioritized by the scheduler. If you're worried about performance, stop using AsyncTask. Either do the thread management yourself, or use the handy helpers in java.util.concurrent.
Third, if you really want to know what's happening when multiple threads and processes are involved, you should be using systrace, not traceview. An example of using systrace with custom trace markers (to watch CPU cores spin up) can be found here.

Debugging Threads in Android - How can I see in real time number of threads in queue etc

I'm loading a bunch of images using AsyncTasks, creating bitmaps. Lots of recycling views going on etc. Without going into the gory details, I would like to know if there is any way I can get some realtime stats on threads that might be helpful. In particular what I am noticing is that the doInBackground runs really fast once it gets kicked off, but it seems they it takes a while for these tasks to run. So I was wondering how I can know how many threads are running at a given time. I have seen the dreaded 128 limit on thread exception with 10 in queue, but thats once there is an overload, I would like to be able to watch this as the program is running. Hopefully this visibility will tell me something. BTW, I did try bumping of the thread priority within the doInBackground() but again its really not that it is not fast once it runs, its that it does not get scheduled to run fast. I'm on Android Studio, what kind of tools are available?
When debugging and stopped on a breakpoint you can scroll through all the launched threads and their current execution points (the spinner on the left of the debug tab).
Knowing where your threads are started you can launch method profiling on this method and see how your threads are performing (a "timer" button on the left of the android tab).

Why does traceview give inconsistent measurements?

I am trying to speed up my app start-up time (currently ~5 seconds due to slow Guice binding), and when I run traceview I'm seeing pretty big variations (as high as 30%) in measurements from executions of the same code.
I would assume this is from garbage collection differences, but the time spent in startGC according to traceview is completely insignificant.
This is particularly aggravating because it's very difficult to determine what the effects were of my optimizations when the measurements are so variable.
Why does this happen? Is there any way to make the measurements more consistent?
I suppose you are starting profiling from the code rather than turning it on manually? But anyway even if you use Debug.startMethodTracing and Debug.stopMethodTracing from a specific point of your code you will receive different measurments.
You can see here that Traceview disables the JIT and I believe some other optimizations so during profiling your code is executed slower than without it. Also your code performance depends on overall system load. If some other app is doing any heavy operation in background your code will execute longer. So you should definitely get results that a slightly different and so start-up time couldn't be a constant.
Generally it is not so important how long your method executes but how much CPU time it consumes comparing to other methods.
Sounds like measurement is not your ultimate goal. Your ultimate goal is to make it faster.
The way to do that is by finding what activities are accounting for a large fraction of time, so you can find a better way to do them.
I said "finding", not "measuring", and I said "activities", not "routines".
To do this, it is only necessary to sample the program's state.
Many profilers collect a large number of samples of the program's state, but then they all fall into the same logic - they summarize, on the theory that all you want is measurements, and you don't really care of what.
In fact, if rather than getting summaries you could examine some of the samples in detail, it would tell you a great deal more about how the program is spending its time.
What's more, if on as few as two(2) samples you could see the program pursuing some goal, and it was something you could improve significantly, you would see a significant speedup.
This process can be repeated several times, and that's how you can really optimize it.
The process is explained in more detail here, and there's a use case here.
If you are doing any network related activity on startup then this tool can help you understand what is happening and how you might be able to optimize connections and caching. http://developer.att.com/developer/legalAgreementPage.jsp?passedItemId=9700312

Categories

Resources