I am writing a video processing app and have come across the following performance issue:
Most of the methods in my app show a large gap between CPU time and real time.
I have investigated with the DDMS TraceView and found that the main culprit for these discrepancies is context switching in some base methods, such as MediaCodec.start() or MediaCodec.dequeueOutputBuffer().
MediaCodec.start(), for example, has 0.7 ms CPU time and 24.2 ms real time; 97% of that real time is taken up by the context switch.
This would not be a real problem, but the method is called quite often, and it is not the only one that presents this kind of symptom.
I also need to mention that all of the processing happens in a single AsyncTask, therefore on a single non-UI thread.
Is context switching a result of poor implementation, or an inescapable reality of threading?
I would very much appreciate any advice in this matter.
First, I doubt the time is actually spent context-switching. MediaCodec.start() is going to spend some amount of time waiting for the mediaserver process to talk to the video driver, and that's probably what you're seeing. (Unless you're using a software codec, your process doesn't do any of the actual work -- it sends IPC requests to mediaserver, which talks to the hardware codec.) It's possible traceview is just reporting its best guess at where the time went.
Second, AsyncTask threads are executed at a lower priority. Since MediaCodec should be doing all of the heavy lifting in the hardware codec, this won't affect throughput, but it's possible that it's having some effect on latency because other threads will be prioritized by the scheduler. If you're worried about performance, stop using AsyncTask. Either do the thread management yourself, or use the handy helpers in java.util.concurrent.
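For illustration, a minimal sketch of what doing the thread management yourself might look like here; the thread name and the chosen priority constant are my own, not anything from the question:

    import android.os.Handler;
    import android.os.HandlerThread;
    import android.os.Process;

    // A dedicated worker instead of AsyncTask, so the scheduling priority is
    // explicit rather than defaulting to the background level AsyncTask uses.
    HandlerThread codecThread = new HandlerThread("codec-worker",
            Process.THREAD_PRIORITY_DEFAULT);
    codecThread.start();
    Handler codecHandler = new Handler(codecThread.getLooper());

    // Post the codec work onto that thread.
    codecHandler.post(new Runnable() {
        @Override
        public void run() {
            // MediaCodec.start(), dequeueOutputBuffer(), etc. would run here.
        }
    });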
Third, if you really want to know what's happening when multiple threads and processes are involved, you should be using systrace, not traceview. An example of using systrace with custom trace markers (to watch CPU cores spin up) can be found here.
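For reference, custom markers of that kind are just wrapped around the code you care about; the section name and the 10 ms timeout below are made up for the example:

    import android.media.MediaCodec;
    import android.os.Trace;

    // The named section shows up as a slice in the systrace timeline,
    // next to the kernel's scheduler and CPU-frequency events.
    int drainOnce(MediaCodec codec, MediaCodec.BufferInfo info) {
        Trace.beginSection("dequeueOutputBuffer");
        try {
            return codec.dequeueOutputBuffer(info, 10000 /* microseconds */);
        } finally {
            Trace.endSection(); // must be called on the same thread as beginSection()
        }
    }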
Related
I have been a mediocre Android developer for years. I like Android, but there's a big problem: frame drops. Even the most powerful devices can stutter frequently while iOS devices run at a constant 60 fps, and I just can't understand why. I want to know. So the first thing I did was watch an I/O presentation about performance, and there is one thing I didn't really understand: why can't the UI and render threads run at the same time? Yes, I know the basics, like the render thread not knowing what to render while the UI thread is doing its thing, but why can't the render thread render the previous frame? You can see the video here:
https://youtu.be/9HtTL_RO2wI?t=491
And here's a diagram of what I am asking for:
You get the idea. I don't know about the low-level parts of Android, so can anyone explain this like I'm five?
Your process's main thread is responsible for rendering the frames that are presented to the user, so you should keep the code running there as fast and light as possible. If you do heavy processing or access any IO (network, sdcard, etc.) there, it may hurt the fluidity of the application, since the thread may be left waiting for a response.
As good practice, you should start that IO access/heavy processing on another thread so it runs in the background and the system decides its scheduling priority; if necessary, it is recommended to present some feedback to the user, like a ProgressBar, to indicate that something is being processed.
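A bare-bones version of that pattern, assuming an Activity that already has a progressBar and a textView, and where loadFromNetworkOrDisk() stands in for the heavy work:

    import android.os.Handler;
    import android.os.Looper;
    import android.view.View;

    final Handler mainHandler = new Handler(Looper.getMainLooper());

    progressBar.setVisibility(View.VISIBLE);   // feedback before the work starts

    new Thread(new Runnable() {
        @Override
        public void run() {
            // Heavy IO happens off the UI thread.
            final String result = loadFromNetworkOrDisk();

            // Hop back to the main thread before touching any views.
            mainHandler.post(new Runnable() {
                @Override
                public void run() {
                    progressBar.setVisibility(View.GONE);
                    textView.setText(result);
                }
            });
        }
    }, "io-worker").start();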
Also, the RenderThread needs to know what to render before it does so, so the UI thread has to work out what information the app wants to present to the user.
As @JonGoodwin points out, they do run in parallel, but usually on two cores of the same processor, as phones nowadays have at least two cores. Both threads run on the CPU, and the RenderThread sends rendering commands to the GPU. Note that this is only true since API 21 (RenderThread is what enables things like the ripple effect).
The problem, though, is what @LucianoFerruzzi points out: usually poor code that does too many things on the UI thread (RenderThread is not accessible, at least not with standard mechanisms).
Also, see the following episode of Android Developers Backstage: Episode 74: Graphics
I have reposted my question from Android Enthusiasts here, as this is more of a programming question, and it was recommended.
Anyway. Here it is:
I am making an app that changes key values in the build.prop of a ROM. However, Android often gives me an ANR warning, since I am doing all the work on the UI thread. The Android documentation tells me that I should use worker threads and not do any work on the UI thread. But I am building this system app to go with a ROM for a single-core device.
Why would I want to use worker threads; isn't that less efficient? Android has to halt the UI thread, load the worker thread, and when the UI is used again, halt the worker thread and load the UI thread again. Isn't that less efficient?
So, should I use worker threads (which slow the UI thread down anyway), or just do all of my work on the UI thread (even if the application UI becomes really slow)?
If your users were robots, your logic would make perfect sense. No context switching equals (very slightly) less overall computation time. You could benchmark it and see how much exactly.
However, in the present (and near future) your users will most likely be humans, and with that you need to start thinking about psychology: a moving progress bar, or responsiveness in general, will give your users the impression that the task is taking less time than it would without any feedback. The subjective speed is much higher with feedback.
There exist numerous papers on the subject of subjective speed, the first one I could find on the web has a nice comparison of progress bars in a video (basically, some bars seem to go faster than others, thus reducing the subjective overall wait time).
Use worker threads.
As you've said, doing everything on the UI thread locks your UI until the operation is completed. This means you can't update progress, can't handle input events (such as the user pressing a cancel button), etc.
Your concern about the speed of context switching is misplaced - this happens all the time anyway, as core system processes and other apps run in the background. Some quick Googling shows that context switching between threads within the same process is typically faster than a process-level context switch anyway. There is slightly more overhead introduced by creating the threads and then the subsequent context switches, but it's likely to be minute - especially if you only have one thread doing the work. For the reasons I've listed above alone (UI updates and the ability to accept user input), take the few-millisecond overall performance hit.
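To make the cancel/progress point concrete, a rough sketch; cancelButton, progressBar, steps and doOneChunkOfWork() are invented for the example:

    import java.util.concurrent.atomic.AtomicBoolean;
    import android.os.Handler;
    import android.os.Looper;
    import android.view.View;

    final AtomicBoolean cancelled = new AtomicBoolean(false);
    final Handler ui = new Handler(Looper.getMainLooper());

    // The UI thread stays free, so the cancel button still responds.
    cancelButton.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            cancelled.set(true);
        }
    });

    new Thread(new Runnable() {
        @Override
        public void run() {
            for (int i = 0; i < steps && !cancelled.get(); i++) {
                doOneChunkOfWork(i);               // e.g. write one build.prop value
                final int done = i + 1;
                ui.post(new Runnable() {
                    @Override
                    public void run() {
                        progressBar.setProgress(done);   // progress keeps updating
                    }
                });
            }
        }
    }, "worker").start();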
The short version:
I'm developing a synth app using OpenSL with low latency. I was doing all the audio calculation in the OpenSL callback function (I know I should not, but I did anyway). Now the calculations take about 75% CPU time on my Nexus 4, so the next step is to do all the calculations in multiple threads instead.
The problem I ran into is that the audio started to stutter, since the callback thread obviously runs at a high priority while my new thread doesn't. If I use more/bigger buffers the problem goes away, but so does the real-time behaviour. Setting a higher priority on the new thread doesn't seem to work.
So, is threaded low-latency audio even possible, or do I have to do everything in the callback for it to work?
I have a buffer of 256 samples, which is about 5 ms, and that should be ages for the thread scheduler to run my calculation thread.
I think the fundamental problem lies in the performance of your synth engine. A decent channel count is achievable on a single core of a Cortex-A8 or -A9 CPU. What language have you implemented it in? If it happens to be Java, I recommend porting it to C++.
Using multiple threads for synthesis is certainly possible, but brings with it new problems - namely that each thread must synchronise before the generated audio can be mixed.
Unless you accept the additional latency that would come from running the synthesis threads asynchronously, the likely set-up is that in your render callback you signal the additional synthesis threads and then wait for them to complete before mixing the audio from all of them together.
(an obvious optimisation is that the render call-back runs some of the processing itself as it's already running on the CPU and would otherwise be doing nothing).
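A rough Java sketch of that set-up, purely to illustrate the hand-off (the real callback would be native OpenSL code, and synthesizeVoices() and mixInto() are placeholders):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    static final int WORKERS = 2;
    static final ExecutorService synthPool = Executors.newFixedThreadPool(WORKERS);

    // Called once per audio buffer, conceptually from the render callback.
    void renderBuffer(final short[][] partials, short[] out) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(WORKERS);
        for (int w = 0; w < WORKERS; w++) {
            final int id = w;
            synthPool.execute(new Runnable() {
                @Override
                public void run() {
                    synthesizeVoices(id, partials[id]);  // placeholder: render this worker's voices
                    done.countDown();
                }
            });
        }
        // The callback thread could render its own share here instead of idling.
        done.await();               // block until every worker has filled its buffer
        mixInto(out, partials);     // placeholder: sum the partial buffers into the output
    }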
Herein lies the problem. Unless you can be certain that your synth render threads run with real-time priority, you can potentially take a scheduling hit each time the render callback runs, and potentially another if you block the callback thread waiting for the synth render threads to catch up.
Last time I looked at audio on Android, Bionic lacked a means of setting real-time thread priority (e.g. SCHED_FIFO). In any case, whether this is even allowed is a matter of operating-system policy: on a desktop Linux system you either need to be root or to have adjusted the appropriate ulimit (as root) - I'm not sure what Android does here, but I very much suspect that downloaded apps aren't given this permission by default. Nor the other useful permission, which is to mlock() the code and its likely stack needs into physical memory.
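For completeness, the closest thing available from Java is android.os.Process, which only adjusts the nice level of an ordinary SCHED_OTHER thread; it may shave a little latency, but it is not a real-time guarantee, which is exactly the limitation described above:

    import android.os.Process;

    Thread synthThread = new Thread(new Runnable() {
        @Override
        public void run() {
            // Raises this thread to the priority band used for audio;
            // underneath it is still SCHED_OTHER, not a real-time class.
            Process.setThreadPriority(Process.THREAD_PRIORITY_URGENT_AUDIO);
            // ... synthesis loop ...
        }
    }, "synth-worker");
    synthThread.start();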
I'm working on an android game, and I started noticing a little sluggishness during development so I wanted to try to utilize multithreading for fun and learning.
My application has 3 threads:
UI thread (should be mostly idle)
Game Logic Thread
Graphics Thread
I minimized the critical section between threads 2 and 3 as best I could, with the idea that the game logic can update independently of the rendering thread, and then at the end of both loops there is as short a window as possible where I push all the graphics updates from the logic thread over to the render loop. This should allow the two threads to work independently for a good majority of the time. In theory it sounds like a performance win.
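The hand-off I'm describing looks roughly like this (RenderState, buildRenderState() and drawFrame() are just stand-ins for my actual game objects):

    // Shared between the logic thread (producer) and the GL thread (consumer).
    // The lock is only held long enough to swap a snapshot reference.
    final Object frameLock = new Object();
    RenderState pendingFrame;                    // written by logic, read by renderer

    // Logic thread, once per tick:
    RenderState snapshot = buildRenderState();   // copy out positions etc. outside the lock
    synchronized (frameLock) {
        pendingFrame = snapshot;                 // the critical section is just this swap
    }

    // GL thread, once per frame:
    RenderState toDraw;
    synchronized (frameLock) {
        toDraw = pendingFrame;
    }
    if (toDraw != null) {
        drawFrame(toDraw);                       // issue GL calls outside the lock
    }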
However, once I got around to implementing it, my performance took a big dive. It's much worse than before: one loop of updating and rendering takes about 50 ms (20 fps), so it looks like garbage. This is just rendering some 20 triangles and maybe 20 textured quads, a really simple workload (I'm afraid to think of what it will be when I implement proper graphics).
Anyway I took a DDMS trace in android to profile where things were going wrong or could be improved.
http://i.stack.imgur.com/DDUYE.png
This is a view of roughly 3 frames of my game. So far it seems to be doing roughly what I expected. The parts highlighted in blue are the locked section, which looks about right (it keeps the glThread mostly waiting while it is locked). However, once I unlock it I should see both threads working simultaneously, and it looks like they are, but if I look closer:
http://i.stack.imgur.com/vukXQ.png
I'm doing my development on a dual-core phone, but if I understand the trace right, it doesn't look like it's ever doing anything in parallel, and what's worse, it appears to be switching the active thread hundreds of times per millisecond (unless I'm interpreting this incorrectly). All this context switching seems like it would be awful for performance, so I'm not sure why it would want to switch back and forth so fast.
So after my long winded explanation, I'm wondering a few things:
Is my understanding correct, that the filled rectangles in the trace are the active threads, and the colored lines are sleeping threads? Otherwise what do they mean?
Why don't I ever see my threads running simultaneously on a supposedly dual core phone?
Why is it switching active threads so rapidly?
In DDMS I get the warning "WARNING: a debugger is active; method-tracing results will be skewed". Is this something to worry about? How can I get rid of this warning? (I launched the app via Run, not via Debug, if it makes a difference.)
Very nice question; let me start with the answers:
You have mixed up threads, methods, and active methods. Each line in traceview is a thread (and if you named your threads, you'll see each name on the left side, like "GL Thread", "main", etc.). The coloured rectangles represent the methods actively executing in each thread, while the coloured lines represent "paused" methods in a thread. By "paused" I mean the method is still executing, but the context was switched to some other thread; when the context switches back to this thread, the method will continue. In the terminology you used in your question: yes, the lines are a sleeping thread's methods, and a rectangle is an active thread executing a method. You can find more info about the DDMS traceview here.
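(As an aside, naming threads so they show up like that is plain Java; renderLoop and the names below are just examples:)

    // Name a thread when constructing it...
    Thread glThread = new Thread(renderLoop, "GL Thread");

    // ...or rename the current thread from inside its run() method.
    Thread.currentThread().setName("Game Logic");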
Distributing threads among cores is another story and depends heavily on the underlying Android OS mechanisms. First of all, make sure that the target Android OS was built with SMP (Symmetric Multi-Processing) enabled, which should be the default for multicore phones, I guess :), but I'm not an expert on those things. You can find some words about SMP here.
Thread switching depends on the OS thread/process scheduler, thread priority, etc. You can find more info about these things in this answer.
Even if you ran the application in non-debugging mode, when you connect with DDMS and do things such as method profiling, you activate the debugging parts of the Dalvik VM. More details about debugging are here, in the "Implementation" section.
Hope you'll find this answer helpful.
Thanks for the question. A full answer by an insider will be helpful to me, too. I'll say what I know.
Some (all?) phones have an option to enable/disable the second core. Have you checked that yours is turned on?
In my own app I've noticed that merely going from one thread to two (on one core) with no change in total work done causes a factor of 1.5 slowdown, so clearly threading itself has a cost.
It's been in the news that Intel is calling Google out on poor implementation of multicore threading:
http://www.pcworld.com/article/257307/dual_core_processors_wasted_on_android_intel_claims.html
Your results validate this.
One other thing to bear in mind is that multi-core is not multi-processor. You're sharing cache and memory-controller bandwidth between cores. One core can stall while it waits for the other to finish with a shared resource, in particular for writes on shared cache lines. However, this effect ought not to account for the single-threading you are seeing.
Just wondering if threading with one processor improves things for me.
I am building an application that performs data intensive calculations (fft on pcm data) while a UI is running and needs to run smoothly.
I have been looking at AsyncTask but was thinking:
If I have a single-core processor (a 600 MHz ARM11) running in my Optimus One, will threading make a difference? I thought that for threads to run independently you would need multiple processors? Or have I gone wrong somewhere?
In order to guarantee responsiveness, it is imperative to leave the main or UI thread to do the UI things. This excludes intensive drawing or 3d rendering in games. When you start to do computationally intensive things in your main thread, the user will see lag. A classic example:
on a button click, sleep(1000). Compare this with, on a button click, start an AsyncTask that sleeps(1000).
An AsyncTask (and other threading) allows the app to process the calculations and the UI interactions "simultaneously".
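The sleep example above in code, roughly; button is whatever View the click handler is attached to:

    import android.os.AsyncTask;
    import android.view.View;

    // Blocks the UI thread: the whole interface freezes for a second,
    // and an ANR dialog is only a few more seconds of this away.
    button.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        }
    });

    // The same wait moved into an AsyncTask: the UI keeps drawing and
    // responding while doInBackground() sleeps on a worker thread.
    button.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            new AsyncTask<Void, Void, Void>() {
                @Override
                protected Void doInBackground(Void... params) {
                    try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
                    return null;
                }

                @Override
                protected void onPostExecute(Void result) {
                    // back on the UI thread; safe to touch views here
                }
            }.execute();
        }
    });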
As far as how concurrency works, context switching is the name of the game (as Dan posts).
Multithreading on a single-core CPU will not increase your performance. In fact, the overhead associated with the context switching will actually decrease your performance. HOWEVER, who cares how fast your app is working when the user gets frustrated with the UI and just closes the app?
Asynctask is the way to go, for sure.
Take a look at the Dev Guide article Designing for Responsiveness.
Android uses the Linux kernel and other specialized software to create the Android operating system. It uses several processes, and each process has at least one thread. Multithreading on single-processor (and multi-processor) hardware is accomplished by context switching. This gives the illusion of running more than one thread per processor at a time.