Is there a good (proper, or effective) way to debug slow-running code?
I have a thread which runs multiple loops and then recurses, and my code is running very slowly.
Is there a good way to debug different loops or sections of code to find out which is running slowest?
If the debugger already does this, can someone please explain how?
Many thanks
What you need is not a debugger, but a profiler. Check out this tool from the Android SDK: Traceview.
One of the most primitive ways to determine slow points is to litter the code with print statements. Bottlenecks then show up as delays between prints. This can be improved by printing the system time as you move from one loop to another, making it trivial to determine the slowest loops.
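Here is a minimal sketch of the print-the-time approach. The question doesn't name a language, so Java is assumed (the rest of this page is about Android); SectionTimer and the placeholder loops are hypothetical stand-ins for your own code. Calling mark() as you move from one loop to the next accumulates the elapsed time per section, and System.nanoTime() is used because, unlike wall-clock time, it is monotonic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Crude stopwatch instrumentation: accumulates elapsed time per named section.
class SectionTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();
    private String current;   // section being timed, or null
    private long start;       // System.nanoTime() when `current` began

    // Ends the current section (if any) and starts timing `section`.
    void mark(String section) {
        long now = System.nanoTime();
        if (current != null) {
            totals.merge(current, now - start, Long::sum);  // accumulate repeated visits
        }
        current = section;
        start = now;
    }

    // Ends the last open section without starting a new one.
    void stop() {
        mark(null);
    }

    Map<String, Long> totals() {
        return totals;
    }

    void print() {
        totals.forEach((name, nanos) ->
                System.out.printf("%-8s %10.3f ms%n", name, nanos / 1e6));
    }
}

class SectionTimerDemo {
    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        long sink = 0;

        timer.mark("loopA");
        for (int i = 0; i < 1_000_000; i++) sink += i % 7;    // placeholder work

        timer.mark("loopB");
        for (int i = 0; i < 10_000_000; i++) sink += i % 7;   // ~10x the work of loopA

        timer.stop();
        timer.print();
        System.out.println("sink = " + sink);  // use the result so the loops aren't optimized away
    }
}
```

Because repeated mark() calls with the same name add up, this also works for a loop section that runs many times inside the recursion.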
A solution that is potentially easier and more thorough is to use a performance profiler. Most mainstream languages will have standalone profilers or debuggers with performance profiling built-in. A good profiler will determine the percentage of execution time spent on each area of your code and offer useful information for optimizing performance bottlenecks.
If you need more specific information, it would be helpful to post the language you are using as well as relevant sections of the code.
I am currently chasing some dropped frames in my app. I turned to systrace for help, but unfortunately I can't make sense of its output.
Here is my traceview.
My problem is basically that in my adapter I am creating quite large list items. That means that when scrolling, the adapter does nothing for some time and then has to create a quite large view. And even though I did a lot of optimizations (obviously I recycle views, and I avoided basically all redundant object instantiations, ...), there are still some dropped frames: it takes more than 16 ms :/
Back to my main problem: I thought I would see traces of the methods I call inside my adapter during the getView invocation, but I can't see them there. Am I doing anything wrong? Can you tell from this traceview where the main bottleneck in my code is? I am lost :/
Thank you.
I think you've already spotted the key thing that systrace has to tell you: in pid 527, performTraversals is taking 20+ milliseconds. This seems to be split evenly between getDisplayList and drawDisplayList.
If you look at the Falcon Pro case study you'll see a number of similarities, although in that case the majority of the time was spent drawing (because of major overdraw). Most of the techniques described in that article are generally useful though; use traceview, hierarchyviewer, and the on-device developer options to look for low-hanging performance issues.
If you want to see your own methods in systrace, you can add your own application-specific trace sections (android.os.Trace.beginSection() and endSection()) starting in Android 4.3. There is an example here.
I am trying to speed up my app start-up time (currently ~5 seconds due to slow Guice binding), and when I run traceview I'm seeing pretty big variations (as high as 30%) in measurements from executions of the same code.
I would assume this is from garbage collection differences, but the time spent in startGC according to traceview is completely insignificant.
This is particularly aggravating because it's very difficult to determine what effect my optimizations had when the measurements are so variable.
Why does this happen? Is there any way to make the measurements more consistent?
I suppose you are starting profiling from the code rather than turning it on manually? Either way, even if you use Debug.startMethodTracing and Debug.stopMethodTracing at a specific point in your code, you will get different measurements.
You can see here that Traceview disables the JIT (and, I believe, some other optimizations), so during profiling your code runs slower than it does without it. Your code's performance also depends on overall system load: if some other app is doing a heavy operation in the background, your code will take longer. So you should definitely expect slightly different results from run to run; start-up time can't be a constant.
Generally, what matters is not how long your method takes in absolute terms but how much CPU time it consumes compared to other methods.
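The answer stops short of a remedy. One common way (not from the answer) to make comparisons less noisy is to warm up, repeat, and compare medians, and to measure per-thread CPU time instead of wall-clock time, which fits the advice to compare CPU time across methods. This sketch uses the standard JVM's ThreadMXBean, which Android does not provide; there, android.os.SystemClock.currentThreadTimeMillis() is the closest built-in equivalent. RepeatedMeasure is a hypothetical helper, and it does nothing about Traceview itself disabling the JIT.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Repeats a task and reports the median cost, measured as CPU time of the
// calling thread where the JVM supports it (wall-clock time otherwise).
class RepeatedMeasure {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();
    private static final boolean CPU_TIME =
            MX.isCurrentThreadCpuTimeSupported() && MX.isThreadCpuTimeEnabled();

    private static long now() {
        return CPU_TIME
                ? MX.getCurrentThreadCpuTime()   // nanoseconds of CPU used by this thread
                : System.nanoTime();             // fallback: wall-clock time
    }

    static long medianNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.run();   // let JIT and caches settle first
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = now();
            task.run();
            samples[i] = now() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];   // the median resists outliers (GC, background apps)
    }

    // Fraction of the total that each measurement accounts for.
    static double[] shares(long... times) {
        long total = 0;
        for (long t : times) total += t;
        double[] out = new double[times.length];
        for (int i = 0; i < times.length; i++) {
            out[i] = total == 0 ? 0.0 : (double) times[i] / total;
        }
        return out;
    }
}
```

For example, time each start-up step with medianNanos(step, 2, 9) and pass the results to shares() to see which step dominates; relative shares tend to be more stable between runs than absolute times.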
Sounds like measurement is not your ultimate goal. Your ultimate goal is to make it faster.
The way to do that is by finding what activities are accounting for a large fraction of time, so you can find a better way to do them.
I said "finding", not "measuring", and I said "activities", not "routines".
To do this, it is only necessary to sample the program's state.
Many profilers collect a large number of samples of the program's state, but then they all fall into the same trap: they summarize, on the theory that all you want is measurements and you don't really care what is being measured.
In fact, if rather than getting summaries you could examine some of the samples in detail, it would tell you a great deal more about how the program is spending its time.
What's more, if in as few as two (2) samples you could see the program pursuing some goal, and it was something you could improve significantly, you would see a significant speedup.
This process can be repeated several times, and that's how you can really optimize it.
The process is explained in more detail here, and there's a use case here.
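The answer describes the idea without code. As a rough illustration in plain Java (an assumption, not the answerer's own tooling), a helper can grab a handful of full stack snapshots of the thread under study with Thread.getStackTrace() and print each one as-is instead of aggregating them. StackSampler and StackSamplerDemo are hypothetical names; pausing the app in a debugger a few times and reading the call stack gives you the same kind of sample.

```java
import java.util.ArrayList;
import java.util.List;

// Takes a few snapshots of one thread's full call stack and keeps each one
// intact, so you can read what the program was doing rather than a summary.
class StackSampler {
    static List<StackTraceElement[]> sample(Thread target, int count, long intervalMillis)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count && target.isAlive(); i++) {
            Thread.sleep(intervalMillis);
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {      // empty if the thread ended meanwhile
                samples.add(stack);
            }
        }
        return samples;
    }

    static void print(List<StackTraceElement[]> samples) {
        for (int i = 0; i < samples.size(); i++) {
            System.out.println("--- sample " + (i + 1));
            for (StackTraceElement frame : samples.get(i)) {
                System.out.println("    at " + frame);
            }
        }
    }
}

class StackSamplerDemo {
    static volatile long sink;

    // Stand-in for the hot spot you are hunting.
    static void slowPart() {
        for (int i = 0; i < 10_000_000; i++) sink += i;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) slowPart();
        }, "worker");
        worker.setDaemon(true);
        worker.start();
        List<StackTraceElement[]> samples = StackSampler.sample(worker, 3, 50);
        worker.interrupt();
        StackSampler.print(samples);   // most samples should show slowPart() near the top
    }
}
```

Reading each sample's whole stack, not just the top frame, is what shows *why* the thread is there, which is the point the answer makes about examining samples rather than summaries.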
If you are doing any network-related activity on startup, then this tool can help you understand what is happening and how you might be able to optimize connections and caching. http://developer.att.com/developer/legalAgreementPage.jsp?passedItemId=9700312
I'm working on an Android game, and I started noticing a little sluggishness during development, so I wanted to try to use multithreading for fun and learning.
My application has 3 threads:
UI thread (should be mostly idle)
Game Logic Thread
Graphics Thread
I minimized the critical section between threads 2 and 3 as best I could, with the idea that the game logic could update independently of the rendering thread, and then at the end of both threads' loops I could have as short a window as possible where I push all the graphics updates from the logic thread to the game loop. This should allow the two threads to work independently for the large majority of the time. In theory, that sounds like a performance win.
However, once I got around to implementing it, my performance took a big dive. It's much worse than before: one loop of updating and rendering is taking about 50 ms (20 fps), so it looks like garbage. This is just rendering some 20 triangles and maybe 20 textured quads, a really simple workload (I'm afraid to think of what it will be like when I implement proper graphics).
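The hand-off described above, where each thread works on its own data and a brief locked step transfers the finished update, is commonly implemented as a triple buffer. A minimal sketch under that assumption (TripleBuffer is a hypothetical class, not an Android API): the lock guards only a few reference swaps, so neither thread waits on the other's real work, and the logic thread never touches the buffer being drawn.

```java
// Three buffers rotate between a logic thread, a render thread, and one
// shared "latest finished frame" slot. Only reference swaps happen under the lock.
class TripleBuffer<T> {
    private final Object lock = new Object();
    private T shared;        // most recently published frame (or a spare)
    private boolean fresh;   // true if `shared` holds a frame render hasn't taken yet

    TripleBuffer(T initialShared) {
        this.shared = initialShared;
    }

    // Logic thread: hand over a finished buffer and get back one to fill next.
    // The returned buffer may contain an old frame, so overwrite it completely.
    T publish(T finished) {
        synchronized (lock) {
            T spare = shared;
            shared = finished;
            fresh = true;
            return spare;
        }
    }

    // Render thread: swap in the newest frame if there is one; otherwise
    // keep drawing the current buffer.
    T acquire(T current) {
        synchronized (lock) {
            if (!fresh) {
                return current;
            }
            T latest = shared;
            shared = current;
            fresh = false;
            return latest;
        }
    }
}
```

The logic thread would call publish() at the end of each update and write into whatever it gets back; the GL thread would call acquire() at the start of each frame and draw the buffer it returns. If logic publishes twice before a render, the older unrendered frame is simply dropped.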
Anyway, I took a DDMS trace on Android to find where things were going wrong or could be improved.
http://i.stack.imgur.com/DDUYE.png
This is a view of roughly 3 frames of my game. So far it seems to be doing roughly what I expected. The parts highlighted in blue are the locked section, which looks about right (it keeps the glThread mostly waiting while it is locked). However, once I unlock it I should see both threads working simultaneously, and it looks like they are, but if I look closer:
http://i.stack.imgur.com/vukXQ.png
I'm doing my development on a dual-core phone, but if I understand the trace right, it doesn't look like it's ever doing anything in parallel, and what's worse, it appears to be switching the active thread hundreds of times per millisecond! (Unless I'm interpreting this incorrectly.) All this context switching seems like it would be awful for performance, so I'm not sure why it would switch back and forth so fast.
So after my long-winded explanation, I'm wondering a few things:
Is my understanding correct, that the filled rectangles in the trace are the active threads, and the colored lines are sleeping threads? Otherwise what do they mean?
Why don't I ever see my threads running simultaneously on a supposedly dual core phone?
Why is it switching active threads so rapidly?
In DDMS I get the warning "WARNING: a debugger is active; method-tracing results will be skewed". Is this something to worry about? How can I get rid of this warning? (I launched the app via Run, not via Debug, if it makes a difference.)
Very nice question; let me answer point by point:
You have mixed up threads, methods, and the active method. Each row in Traceview is a thread (if you named your threads, you'll see the name on the left side, like "GL Thread", "main", etc.). Colored rectangles represent methods actively executing in each thread, while colored lines represent "paused" methods in a thread. By "paused" I mean the method is still executing, but the context was switched to some other thread; when the context switches back to this thread, the method will continue. In the terminology you used in your question: yes, lines are a sleeping thread's methods, and rectangles are an active thread's executing method. You can find more info about DDMS Traceview here.
Distributing threads among cores is another story and depends heavily on the underlying Android OS mechanisms. First of all, make sure the target Android OS is started with the SMP (Symmetric Multi-Processing) option on, which I guess is the default for multicore phones :), but I'm not an expert in those things. You can find a few words about SMP here.
Thread switching depends on the OS thread/process scheduler, thread priority, etc. You can find more info about these things in these answers.
Even if you ran the application in non-debugging mode, when you connect with DDMS and do things such as method profiling, you activate the debugging parts of the Dalvik VM. More details about debugging are here, in the section "Implementation".
Hope you find this answer helpful.
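The first point above relies on thread names to tell the Traceview rows apart. In Java a name can be passed to the Thread constructor (or set later with setName()), so the rows read "Game Logic Thread" rather than a generic "Thread-N". NamedThreads is a hypothetical helper, and the name mirrors one of the threads listed in the question.

```java
// Starts a thread with a descriptive name so profilers and thread dumps
// show that name instead of a generated "Thread-N".
class NamedThreads {
    static Thread start(String name, Runnable body) {
        Thread t = new Thread(body, name);   // the name is fixed before the thread runs
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread logic = start("Game Logic Thread",
                () -> System.out.println("running in " + Thread.currentThread().getName()));
        logic.join();
    }
}
```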
Thanks for the question. A full answer by an insider would be helpful to me, too. I'll say what I know.
Some (all?) phones have an option to enable/disable the second core. Have you checked that yours is turned on?
In my own app I've noticed that merely going from one thread to two (on one core) with no change in total work done causes a 1.5x slowdown, so clearly threading itself has a cost.
It's been in the news that Intel is calling Google out on poor implementation of multicore threading:
http://www.pcworld.com/article/257307/dual_core_processors_wasted_on_android_intel_claims.html
Your results validate this.
One other thing to bear in mind is that multi-core is not multi-processor. You're sharing cache and memory controller bandwidth between cores. One core can stall while it waits for the other to finish with a shared resource, in particular for writes to shared cache lines. However, this effect shouldn't account for the single-threaded behavior you are seeing.
I have an Android app that is getting fairly large and complex now, and it seems to have intermittent performance problems. One time I will run the app and it's fine; another time it will struggle when switching views.
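The shared-cache-line effect can be seen on a desktop JVM with a small experiment, sketched here as an assumption rather than a measurement of the phone in the question: two threads each increment only their own counter, first in adjacent slots of one array (which usually share a 64-byte cache line) and then in slots 128 bytes apart. FalseSharingDemo is hypothetical; the atomic increments keep the JIT from holding the counters in registers, and the timings depend entirely on the hardware, so no expected numbers are given.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// "False sharing": two threads never touch each other's counter, yet they can
// slow each other down when both counters live in the same cache line.
class FalseSharingDemo {
    // Two threads increment slotA and slotB of `counters`; returns elapsed ms.
    static long runMillis(AtomicLongArray counters, int slotA, int slotB, int iterations)
            throws InterruptedException {
        Thread t1 = new Thread(() -> bump(counters, slotA, iterations));
        Thread t2 = new Thread(() -> bump(counters, slotB, iterations));
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    static void bump(AtomicLongArray counters, int slot, int iterations) {
        for (int i = 0; i < iterations; i++) {
            counters.incrementAndGet(slot);   // a real memory write on every iteration
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 20_000_000;
        // Slots 0 and 1 are 8 bytes apart: usually the same cache line.
        long near = runMillis(new AtomicLongArray(2), 0, 1, n);
        // Slots 0 and 16 are 128 bytes apart: different lines with 64-byte lines.
        long far = runMillis(new AtomicLongArray(17), 0, 16, n);
        System.out.println("same line: " + near + " ms, separate lines: " + far + " ms");
    }
}
```

Because each thread writes only its own slot, the final counts are exact either way; any difference in time comes from the cores passing the cache line back and forth.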
How can I detect the causes of the performance problem using debugging tools so that I may correct it?
Use the DDMS tool that comes with the SDK. It has a nice feature called Allocation Tracker that lets you see in real time how much memory your code is allocating and which specific line is responsible.
In most cases, your app slows down because of bad adapter implementations, poor layout inflation techniques, or not using a cache when decoding Bitmaps (for example, one based on SoftReference).
Take a look at this article for a brief explanation: Tracking Memory Allocations
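A minimal sketch of a SoftReference-based cache like the one the answer mentions, kept in plain Java so it is self-contained; in an app, the loader function would be something like a BitmapFactory decode call. SoftCache is a hypothetical name. Note that later Android guidance recommends android.util.LruCache instead, because from Android 2.3 on the garbage collector clears soft references aggressively, which makes caches like this much less effective.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Caches expensive-to-build values (e.g. decoded bitmaps) through SoftReferences,
// so the GC may reclaim them under memory pressure and they get rebuilt on demand.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;   // builds a value on a cache miss

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {                 // never loaded, or reclaimed by the GC
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
    // Cleared entries keep their key in the map; a fuller version would use a
    // ReferenceQueue to purge them.
}
```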
In addition to the tool Cristian mentioned, Traceview is another helpful one. It's not very well documented, but it can give you information about how often methods are being called and which methods are taking a lot of time.
Another good memory tracking tool is MAT; here is a page that describes how to use it with Android: http://ttlnews.blogspot.com/2010/01/attacking-memory-problems-on-android.html
Both the tracing and the heap dumps can be done through the DDMS panel if you prefer not to work with the command line. In Eclipse, in the Devices panel, under the device/emulator you are using, click on your app (listed by package name); you can then use Start/Stop Method Profiling to get a trace and Dump HPROF to get a heap dump. Note that the dumps need to be converted to work with the MAT plugin; the attacking-memory-problems-on-android page linked above describes how to do that.
I am currently writing a paper on the Android platform. After some research, it's clear that Dalvik has room for improvement. I was wondering: what do you think would be the best use of a developer's time toward this goal?
JIT compilation seems like the big one, but I've also heard it would be of limited use on such a low-resource machine. Does anyone have a resource or data that backs this up?
Are there any other options that should be considered, aside from developing a robust native development kit to bypass the VM?
For those who are interested, there is a lecture that has been recorded and put online regarding the Dalvik VM.
Any thoughts are welcome. As this question appears subjective, I'll clarify that the answer I accept must give some justification for its proposed changes. Any data to back it up, such as the improvement in the Sun JVM when JIT was introduced, would be a massive plus.
Better garbage collection: compacting at minimum (to eliminate memory fragmentation problems experienced today), ideally less CPU intensive at doing the collection itself (to reduce the "my game frame rates suck" complaints)
JIT, as you cite
Enough documentation that, when coupled with an NDK, somebody sufficiently crazy could compile Dalvik bytecode to native code for an AOT compilation option
Make it separable from Android itself, such that other projects might experiment with it and community contributions might arrive in greater quantity and at a faster clip
I'm sure I could come up with other ideas if you need them.
JIT. The stuff about it not helping is a load of crap. You might be more selective about what code you JIT, but having 1/10th the performance of native code is always going to be limiting.
Decent GC. Modern generational garbage collectors do not have big stutters.
Better code analysis. There are a lot of cases where allocations/frees don't need to be made, locks don't need to be held, and so on. It allows you to write clean code rather than doing optimizations that the machine is better at.
In theory, most higher-level languages (Java, JavaScript, Python, ...) should be within 20% of native code performance in most cases. But that requires the platform vendor to invest hundreds of developer-years. Sun's Java is getting good, but they have also been working on it for 10 years.
One of the main problems with Dalvik is performance, which I've heard is terrible, but one of the things I would like most is the addition of more languages.
The JVM has had community projects getting Python and Ruby running on the platform, and even special languages such as Scala, Groovy and Clojure developed for it. It would be nice to see these (and/or others) on the Dalvik platform as well. Sun has also been working on the Da Vinci Machine, a dynamic-typing extension of the JVM, which indicates a major shift away from the "one language fits all" philosophy Sun has followed over the last 15 years.