I'm trying to profile my renderer, and I'm seeing some weird profiling behavior that I can't explain.
I'm using a GLSurfaceView, which I have set to render continuously.
This is how my onDrawFrame() is structured:
public void onDrawFrame(GL10 unused) {
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
    executeAllDrawCommands();
}
This was behaving slowly under light load, so I created a timer class and started profiling it. I was quite surprised by what I saw.
I put some probes on my onDrawFrame method like so:
public void onDrawFrame(GL10 unused) {
    swapTimer.end();
    clearTimer.start();
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
    clearTimer.end();
    drawTimer.start();
    executeAllDrawCommands();
    drawTimer.end();
    swapTimer.start();
}
clearTimer measures the time taken by the glClear call, drawTimer measures the time taken by all my draw calls, and swapTimer measures the time from when onDrawFrame exits to when it is next entered (i.e., the time spent in eglSwapBuffers).
When I ran a very lightly loaded scene, I got some really strange numbers I can't explain:
swapTimer : 20ms (average)
clearTimer : 11ms (average)
drawTimer : 2ms (average)
I expected the swap time to be fairly large, as I believe the device has vsync forcibly enabled at ~30fps, but I don't understand why the actual 'clear' call blocks for 11 milliseconds. I thought it was just supposed to issue an asynchronous command and return.
When I draw a much more busy scene, the numbers change quite a bit:
swapTimer : 2ms (average)
clearTimer : 0ms (average)
drawTimer : 44ms (average)
In this scene my draw calls take so much time that they appear to hide most of the vsync period, and the block on the clear call goes away entirely.
Is there any explanation for why glClear is blocking on my lightly loaded scene?
Link to my 'Timer' class source code in case someone is suspicious of my measuring technique: http://pastebin.com/bhXt368W
I put a glFinish in (with finishTimer.start()/end() around it), and it takes all the time away from glClear: now glFinish takes several milliseconds and glClear becomes instant.
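Roughly, the probed frame then looked like this (a sketch; I placed the glFinish just ahead of the clear, which is where the block had been showing up):

public void onDrawFrame(GL10 unused) {
    swapTimer.end();

    finishTimer.start();
    GLES20.glFinish();   // blocks until the GPU has drained everything queued so far
    finishTimer.end();

    clearTimer.start();
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
    clearTimer.end();

    drawTimer.start();
    executeAllDrawCommands();
    drawTimer.end();

    swapTimer.start();
}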
That explains it.
When your scene is very light and the drawing finishes very quickly, clearing and filling the pixels with the new color still takes some time (it always takes time; otherwise the renderer would be running behind, still busy drawing new content). Newer Android devices have fillrate limits. The Nexus One, for example, has its fillrate locked at 30 Hz: the screen is synced at that frequency no matter how fast your actual drawing finishes. If the drawing finishes in less than one 30 Hz period, the renderer waits to sync up with the screen. That is why you notice this delay, and you would notice it even if you removed the glClear() call: the renderer is ahead of, and faster than, the screen's updates.
When the renderer has many objects to draw, that waiting disappears (as your busy scene's profile data shows), because the renderer now runs behind the screen's updates.
When you use glFinish(), it absorbs the time that glClear() would otherwise take; following the same fillrate logic, glFinish() is now the call that ends up synchronizing with the screen.
Calculations:
F = 1/T
Easy scene:
F = 1/T = 1/((20+11+2)*10^-3) =~ 30 Hz
The synchronization delay shows up in your profiler; the renderer is being synchronized with the screen. That means that if you removed the glClear() or glFinish() call, the delay would simply show up somewhere else.
Heavy scene:
F = 1/T = 1/((2+0+44)*10^-3) =~ 22 Hz
The synchronization delay does not show up in your profiler; the renderer is running behind the screen's update rate.
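One way to see this directly is to log the frame-to-frame interval itself: on the light scene it should sit near the ~33 ms vsync period no matter which GL call ends up absorbing the wait. A minimal sketch (uses android.util.Log; lastFrameNanos is just a field added for logging):

private long lastFrameNanos;

public void onDrawFrame(GL10 unused) {
    long now = System.nanoTime();
    if (lastFrameNanos != 0) {
        // total frame period, including whatever the driver blocked on internally
        Log.d("FrameTime", "interval = " + (now - lastFrameNanos) / 1e6 + " ms");
    }
    lastFrameNanos = now;

    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
    executeAllDrawCommands();
}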
It seems like this is all related to vsync
That seems to be correct.
Related
I'm using Delphi on Android, and it looks like an OpenGL problem. I'm having trouble getting smooth scrolling; I always get jerks :( To render a frame we do this:
while running do begin      // << main loop
  doSomething;              // << e.g. calculate positions of controls from velocity
  executeAllTimerProcedure;
  eglMakeCurrent(..);
  paintEverything;          // << paint all textures; this takes around 5 ms
  eglSwapBuffers(...);      // << stuck waiting for the vsync signal
end;
The problem is that eglSwapBuffers waits for the vsync signal. Since vsync fires every 16.6 ms (60 fps), the main loop blocks at the eglSwapBuffers step. This can cause jerks in the scrolling if the calculations made in doSomething are delayed, and all timers are delayed as well.
I tried setting eglSwapInterval(eglGetCurrentDisplay, 0); this makes eglSwapBuffers non-blocking, but now (and I don't know why) the jerks are even worse :(
So what can I do to make my scrolling/animation smooth, without any jerks?
In the new Android Vitals section in the console I'm getting warnings about more than 60% of sessions being affected by slow UI render times (missed Vsync: 1.02%, slow UI thread: 14.29%, slow draw commands: 96.84%). I've turned on GPU profiling on my test device (using the production version of the app) and I'm seeing the following TextView update causing render times well over 16ms (around 24-30ms):
updateTimer = new Timer();
updateTimer.scheduleAtFixedRate(new TimerTask() {
    @Override
    public void run() {
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                timeLeftView.setText(timeLeftString);
            }
        });
    }
}, 100, 500);
When I comment out the TextView update, nothing changes on the screen and the profiler doesn't create any new bars.
One clue is that when the activity with the timer opens, the first 3-4 timer updates render at about 8ms, but then the times rise to around 24-30ms.
Another clue is that when I touch any part of the screen, the render times drop back to around 8ms for a few seconds before shooting up to 24-30ms again. When I stop touching, the render times again drop for a few seconds before shooting back up.
So what I'd like to know is:
Is it normal for such a simple TextView update to cause high render times?
Is this what's messing up my Android Vitals, even though it only runs twice a second? Could the problem be elsewhere? The code above is the only thing creating high bars in GPU profiling; the other parts of the app work fine, and long ListViews with multiple TextViews and images render in around 8ms.
What can I do to reduce these draw times? I've tried removing the centering and gravity in the layout for the TextView, as well as wrap_content (as suggested in another answer), but neither has any effect. Apart from that, I'm unsure what to do.
If you put a lot of layers in your XML, it forces Android to render multiple times (if you have a lot of layers, refactor your layout!).
I strongly recommend reading this: https://developer.android.com/training/improving-layouts/index.html
As for rendering the TextView multiple times, the rendering speed depends on the device you are running your application on.
Tried pretty much every suggestion.
Finally solved it by increasing the frequency of the runnable from 500ms to 50ms or shorter. The problem was that the low frequency of the runnable let the CPU/GPU go to a low power state so draws took longer. By increasing the frequency of the runnable and the draws, the CPU/GPU doesn't go into low power state and frames are drawn much faster. Yes, it's more taxing on the battery but not as much as the screen being on in the first place. No users have complained either way and Android vitals are happy now.
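In terms of the code from the question, roughly the only change was the period passed to scheduleAtFixedRate (a sketch with a 50 ms period):

updateTimer = new Timer();
updateTimer.scheduleAtFixedRate(new TimerTask() {
    @Override
    public void run() {
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                timeLeftView.setText(timeLeftString);
            }
        });
    }
}, 100, 50); // period dropped from 500 ms to 50 ms so the CPU/GPU never idles into a low-power state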
Besides, looking at how default/official apps from device manufacturers work (including from Google itself), this is exactly how they handle TextView updates. Google's clock app for example (countdown timer, not stopwatch) updates the TextView ~60 times a second even though once a second would be all that's needed and most frugal.
I am developing an Android video player. I use ffmpeg in native code to decode video frames. In the native code I have a thread called decode_thread that calls avcodec_decode_video2():
int decode_thread(void *arg) {
    avcodec_decode_video2(codecCtx, pFrame, &frameFinished, pkt);
}
I have another thread called display_thread that uses ANativeWindow to display a decoded frame on a SurfaceView.
The problem is that if I let decode_thread run continuously without a delay, the performance of avcodec_decode_video2() drops significantly; sometimes it takes about 0.1 seconds to decode a frame. However, if I put a delay in decode_thread, something like this:
int decode_thread(void *arg) {
    avcodec_decode_video2(codecCtx, pFrame, &frameFinished, pkt);
    usleep(20*1000);
}
then the performance of avcodec_decode_video2() is really good, about 0.001 seconds per frame. However, putting a delay in decode_thread is not a good solution because it affects playback. Could anyone explain this behavior of avcodec_decode_video2() and suggest a solution?
It is very unlikely that the performance of the video decoding function itself improves just because your thread sleeps. Most likely the decoding thread gets preempted by another thread, and that is why you measure the longer time (your thread simply wasn't running for part of it). When you add the call to usleep, you yield to the other thread at a point of your choosing, so when your decoding thread is scheduled again it starts with a full CPU slice and is no longer interrupted in the middle of avcodec_decode_video2.
What should you do? You surely want to decode packets a little ahead of when you show them; the performance of avcodec_decode_video2 certainly isn't constant, and if you try to stay just one frame ahead, you might not have enough time to decode some frames.
I'd create a bounded producer-consumer queue of decoded frames. The decoder thread is the producer: it should run until the queue is full, then wait until there's room for another frame. The display thread is the consumer: it takes frames from the queue and displays them.
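A minimal sketch of that structure (shown in Java for brevity; in your NDK player the same pattern would be built in C with a mutex and condition variable, and Frame, decodeNextFrame() and render() below are placeholders for your own types and calls):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class FramePipeline {
    // Placeholder for a decoded frame; in the real player this would wrap the AVFrame data.
    static class Frame { long pts; byte[] pixels; }

    // Bounded queue: the decoder may run at most 4 frames ahead of the display.
    private final BlockingQueue<Frame> decoded = new ArrayBlockingQueue<Frame>(4);

    // Producer: runs on decode_thread.
    void decodeLoop() throws InterruptedException {
        while (true) {
            Frame f = decodeNextFrame();   // placeholder for the call into avcodec_decode_video2
            decoded.put(f);                // blocks while the queue is full, so the decoder pauses instead of spinning
        }
    }

    // Consumer: runs on display_thread.
    void displayLoop() throws InterruptedException {
        while (true) {
            Frame f = decoded.take();      // blocks while the queue is empty
            render(f);                     // placeholder for the ANativeWindow draw
        }
    }

    private Frame decodeNextFrame() { /* decode one frame and copy it out */ return new Frame(); }
    private void render(Frame f)    { /* present the frame at its pts */ }
}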
I'm writing a simple NDK OpenSL ES audio app that records the user's touches on a virtual piano keyboard and then plays them back forever over a set loop. After much experimenting and reading, I've settled on using a separate POSIX thread with a timing loop to achieve this. As you can see in the code, it subtracts any processing time taken from the sleep time, to make the interval of each loop iteration as close to the desired sleep interval as possible (in this case 5,000,000 nanoseconds).
void init_timing_loop() {
    pthread_t fade_in;
    pthread_create(&fade_in, NULL, timing_loop, (void*)NULL);
}

void* timing_loop(void* args) {
    while (1) {
        clock_gettime(CLOCK_MONOTONIC, &timing.start_time_s);
        tic_counter(); // simple logic gates that cycle the current tic
        play_all_parts(); // for-loops through all parts and plays any notes (from an OpenSL buffer) that fall on the current tic
        clock_gettime(CLOCK_MONOTONIC, &timing.finish_time_s);
        timing.diff_time_s.tv_nsec = (5000000 - (timing.finish_time_s.tv_nsec - timing.start_time_s.tv_nsec));
        nanosleep(&timing.diff_time_s, NULL);
    }
    return NULL;
}
The problem is that even with this, the results are better but still quite inconsistent: sometimes notes are delayed by perhaps as much as 50ms at a time, which makes for very wonky playback.
Is there a better way of approaching this? To debug I ran the following code:
gettimeofday(&timing.curr_time, &timing.tzp);
__android_log_print(ANDROID_LOG_DEBUG, "timing_loop", "gettimeofday: %d %d",
timing.curr_time.tv_sec, timing.curr_time.tv_usec);
This gives a fairly consistent readout that doesn't reflect the playback inaccuracies at all. Are there other forces at work in Android preventing accurate timing? Or is OpenSL ES a potential issue? All the buffer data is loaded into memory; could there be bottlenecks there?
Happy to post more OpenSL code if needed... but at this stage I'm trying to figure out whether this thread loop is accurate or whether there's a better way to do it.
You should take the seconds field into account when using clock_gettime as well: timing.start_time_s.tv_nsec may be greater than timing.finish_time_s.tv_nsec, because tv_nsec wraps back to zero whenever tv_sec is incremented. So instead of
timing.diff_time_s.tv_nsec =
(5000000 - (timing.finish_time_s.tv_nsec - timing.start_time_s.tv_nsec));
try something like
#define NS_IN_SEC 1000000000LL

(timing.finish_time_s.tv_sec * NS_IN_SEC + timing.finish_time_s.tv_nsec) -
(timing.start_time_s.tv_sec * NS_IN_SEC + timing.start_time_s.tv_nsec)
I'm working on creating an app that allows very low bandwidth communication via high frequency sound waves. I've gotten to the point where I can create a frequency and do the fourier transform (with the help of Moonblink's open source code for Audalyzer).
But here's my problem: I'm unable to get the code to run with the correct timing. Let's say I want a piece of code to execute every 10ms, how would I go about doing this?
I've tried using a TimerTask, but there is a huge delay before the code actually executes, like up to 100ms.
I also tried the method below, which simply polls the current time and executes only when that time has elapsed, but there is still a delay problem. Do you have any ideas?
Thread analysis = new Thread(new Runnable()
{
    @Override
    public void run()
    {
        android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_URGENT_DISPLAY);
        long executeTime = System.currentTimeMillis();
        manualAnalyzer.measureStart();
        while (FFTransforming)
        {
            if (System.currentTimeMillis() >= executeTime)
            {
                // Reset the timer to execute again in 10ms
                executeTime += 10;
                // Perform Fourier Transform
                manualAnalyzer.doUpdate(0);
                // TODO: Analyze the results of the transform here...
            }
        }
        manualAnalyzer.measureStop();
    }
});
analysis.start();
I would recommend a very different approach: Do not try to run your code in real time.
Instead, rely only on the low-level audio code running in real time, by recording (or playing) continuously for a period of time encompassing the events of interest.
Your code then runs somewhat asynchronously to this, decoupled by the audio buffers. Your code's sense of time is determined not by the system clock as it executes, but by the defined inter-sample interval of the audio data you work with (i.e., if you are sampling at 48 kS/s, then 10 ms later is 480 samples later).
You may need to modify the protocol governing interaction between the devices to widen the time window in which transmissions can be expected to occur. That is, you can have precise timing with respect to the actual modulation and symbols within a "packet", but you should not expect nearly the same precision in determining when a packet is sent or received - you will have to "find" it amidst a longer recording containing noise.
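For a concrete picture of what "the audio data defines the clock" means, here is a rough sketch using AudioRecord (the 48 kHz rate and 10 ms hop size are just illustrative assumptions, it needs the RECORD_AUDIO permission, and your Audalyzer-based capture would play the same role):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

class SampleClockedAnalyzer {
    private static final int SAMPLE_RATE = 48000;        // assumed; use whatever rate you record at
    private static final int HOP = SAMPLE_RATE / 100;    // 10 ms worth of samples (480 at 48 kHz)

    private volatile boolean running = true;

    void run() {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
                Math.max(minBuf, HOP * 2 * 4));           // a few hops of headroom, in bytes
        short[] hop = new short[HOP];
        long samplesSeen = 0;                             // this counter, not the wall clock, is "time"
        rec.startRecording();
        while (running) {
            int read = rec.read(hop, 0, HOP);             // blocks until ~10 ms of audio is available
            if (read > 0) {
                samplesSeen += read;
                processHop(hop, read, samplesSeen);       // FFT / symbol detection, timed by sample count
            }
        }
        rec.stop();
        rec.release();
    }

    private void processHop(short[] buf, int n, long samplePos) {
        // placeholder: run the Fourier transform on this 10 ms block
    }
}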
Your thread/loop strategy is probably roughly as close as you're going to get. However, 10ms is not a lot of time, most Android devices are not super-powerful, and a Fourier transform is a lot of work to do. I find it unlikely that you'll be able to fit that much work in 10ms. I suspect you're going to have to increase that period.
I changed your code so that it takes the execution time of doUpdate() into account. Using System.nanoTime() should also increase accuracy:
public void run() {
    android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_URGENT_DISPLAY);
    long executeTime = 0;
    long nextTime = System.nanoTime();
    manualAnalyzer.measureStart();
    while (FFTransforming)
    {
        if (System.nanoTime() >= nextTime)
        {
            executeTime = System.nanoTime();
            // Perform Fourier Transform
            manualAnalyzer.doUpdate(0);
            // TODO: Analyze the results of the transform here...
            executeTime = System.nanoTime() - executeTime;
            // Guard against the case that doUpdate took longer than 10ms
            final long i = executeTime / 10000000;
            // Set the timer to execute again at the next full 10ms interval
            nextTime += 10000000 + i * 10000000;
        }
    }
    manualAnalyzer.measureStop();
}
What else could you do?
eliminate Garbage Collection
go native with the NDK (just an idea; this might not give any benefit)