I have a problem with very slow rendering on an Android tablet using the NDK and the EGL commands. I have timed calls to eglSwapBuffers, and it takes a variable amount of time, frequently exceeding the device's frame interval. I know it synchronizes to the refresh, but that is around 60FPS, and the frame rate here drops well below that.
The only command I issue between calls to swap is glClear, so I know it isn't anything I'm drawing that causes the problem. Even with just the clear, the frame rate drops to 30FPS (erratically, though).
On the same device a simple GL program in Java easily renders at 60FPS, so I know it isn't fundamentally a hardware issue. I've looked through the Android Java code for setting up the GL context and can't see any significant difference. I've also played with every config attribute, and while some alter the speed slightly, none (that I can find) eliminates this horrible frame rate drop.
To ensure the event polling wasn't an issue I moved the rendering into a thread. That thread now only does rendering, thus just calls clear and swap repeatedly. The slow performance still persists.
I'm out of ideas what to check and am looking for suggestions as to what the problem might be.
There's really not enough info here (like which device you are testing on, what your exact config was, etc.) to answer this 100% reliably, but this kind of behavior is usually caused by a window/surface pixel format mismatch, e.g. 16-bit (RGB565) vs. 32-bit.
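If the mismatch is the cause, the usual NDK-side fix is to force the native window's pixel format to match the chosen EGLConfig before creating the surface. A minimal sketch, assuming display, config (from eglChooseConfig) and window (an ANativeWindow*) already exist:

    #include <EGL/egl.h>
    #include <android/native_window.h>

    EGLSurface create_matched_surface(EGLDisplay display, EGLConfig config,
                                      ANativeWindow *window) {
        EGLint format;
        /* Ask the config which native pixel format it expects... */
        eglGetConfigAttrib(display, config, EGL_NATIVE_VISUAL_ID, &format);
        /* ...and force the window's buffers into that format (0,0 keeps
           the window's own dimensions). A mismatch here can force a
           conversion on every swap. */
        ANativeWindow_setBuffersGeometry(window, 0, 0, format);
        return eglCreateWindowSurface(display, config, window, NULL);
    }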
The FB_MULTI_BUFFER=3 environment variable will enable multi-buffering on the Freescale i.MX 6 (Sabrelite) board with some recent LTIB builds (without X). Your GFX driver may need something like this.
Related
I recently upgraded my old Galaxy S2 phone to a brand new Galaxy S7, and was very surprised to find an old game I wrote seemed to be performing worse on the new phone. After cutting everything down to a bare bones project, I have discovered the problem - the GLES20.glFinish() call I was performing at the end of every onDrawFrame. With this in there, with a glClear but no draw calls, the FPS hovered around 40. Without the glFinish, solid 60 FPS. My old S2 had solid 60 FPS regardless.
I then went back to my game, and removed the glFinish method call, and sure enough performance went back to being perfect and there was no obvious downside to its removal.
Why was glFinish slowing down my frame rate on my new phone but not my old phone?
I think a speculative answer is as good as it's going to get, so — apologies for almost certainly repeating a lot of what you already know:
Commands sent to OpenGL go through three states, named relative to the GPU side of things:
unsubmitted
submitted but pending
completed
Communicating with the code running the GPU is usually expensive. So most OpenGL implementations accept your calls and just queue the work up inside your memory space for a while. At some point it'll decide that a communication is justified and will pay the cost to transfer all the calls at once, promoting them to the submitted state. Then the GPU will complete each one (potentially out-of-order, subject to not breaking the API).
glFinish:
... does not return until the effects of all previously called GL
commands are complete. Such effects include all changes to GL state,
all changes to connection state, and all changes to the frame buffer
contents.
So for some period when that CPU thread might have been doing something else, it now definitely won't. But if you don't glFinish then your output will probably still appear, it's just unclear when. glFlush is often the correct way forwards — it'll advance everything to submitted but not wait for completed, so everything will definitely appear shortly, you just don't bother waiting for it.
OpenGL bindings to the OS vary a lot; in general, though, you almost certainly want to flush rather than finish if your environment permits it. If it's valid to neither flush nor finish, and the OS isn't pushing things along for you on some other criterion, it's possible you're incurring extra latency (e.g. the commands you issue in one frame may not reach the GPU until you fill up the unsubmitted queue again during the next frame), but if you're doing GL work indefinitely then output will almost certainly still proceed.
Android sits upon EGL. Per the spec, 3.9.3:
... eglSwapBuffers and eglCopyBuffers perform an implicit flush operation
on the context ...
I therefore believe that you are not required to perform either a flush or a finish in Android if you're double buffering. A call to swap the buffers will cause a buffer swap as soon as drawing is complete without blocking the CPU.
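A minimal sketch of that conclusion, assuming a double-buffered window surface and a context current on the calling thread; the swap is the only synchronisation point you need:

    #include <EGL/egl.h>
    #include <GLES2/gl2.h>

    void render_loop(EGLDisplay display, EGLSurface surface, const int *running) {
        while (*running) {
            glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            /* ... draw calls ... */

            /* Implicit flush per EGL spec 3.9.3; no glFlush()/glFinish()
               needed. Returns once a back buffer is available, not when
               the GPU has finished drawing. */
            eglSwapBuffers(display, surface);
        }
    }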
As to the real question: the S7 (in its Snapdragon variant) has an Adreno 530 GPU; the S2 has a Mali-400 MP4. The Malis are produced by ARM, the Adrenos by Qualcomm, so they're completely different architectures and driver implementations. So the difference that causes the blocking could be almost anything, but the driver is permitted to block. glFinish isn't required and is a very blunt instrument; it's probably not one of the drivers' major optimisation targets.
I am attempting to determine (to within 1 ms) when particular screen flips happen on Android. Choreographer fires every time a frame flips, but gives no way of determining which frame is actually being displayed. According to https://source.android.com/devices/graphics/architecture.html, there are several layers in the process: the user-land buffer, which flips to a triple-buffered queue, which flips to the surface flinger, which flips to the hardware. Each of these layers can potentially drop a frame, but at this point I have only determined how to monitor the user-land buffer. Is there a way to monitor the other buffers/flips (in real time, on a non-rooted, non-custom phone)?
I have observed unexpected frame delays on the HTC M8 (about 1 every 5 minutes), but the Nexus 7 does not appear to have this problem. I measure the delays by using a Cedrus StimTracker (http://cedrus.com/stimtracker/) with a photo sensor and the Lab Streaming Layer (https://github.com/sccn/labstreaminglayer). I have tried using eglPresentationTimeANDROID to control when screens are flipped, and that has not fixed the problem.
Note that I'm using the ndk, but I can usually use the JNI to get access to non-ndk features when I need to.
The reason I care is in order to use Android for psychological and neurological experiments, where 1 ms precision is highly desirable.
As far as accessible APIs go, it sounds like you've found the relevant bits and pieces. If you haven't yet, please read through this stackoverflow item.
Using Choreographer and extrapolation, you can guess at when the next display refresh will occur. Using eglPresentationTimeANDROID() on an Android 5.0+ device, you can tell SurfaceFlinger when you want a particular frame to be sent to the display. Assuming SurfaceFlinger is properly accounting for all latency (such as additional frames added by "smart" panels), that should get you reliable timing.
(Bear in mind that the timing is based on when the display latches the next frame, not when the next frame is fully visible on the display... the latency there will depend on the panel.)
Grafika's "scheduled swap" Activity uses this feature, but it sounds like you're already familiar.
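For reference, a sketch of scheduling a swap for a specific vsync, assuming an Android 5.0+ device exporting EGL_ANDROID_presentation_time; last_vsync_ns and period_ns are hypothetical inputs you would feed in from Choreographer, and the two-refresh lead is one plausible policy, not a requirement:

    #include <stdint.h>
    #include <EGL/egl.h>

    typedef EGLBoolean (*PresentationTimeFn)(EGLDisplay, EGLSurface, int64_t);

    void swap_for_vsync(EGLDisplay dpy, EGLSurface surf,
                        int64_t last_vsync_ns, int64_t period_ns) {
        static PresentationTimeFn set_time = NULL;
        if (set_time == NULL)
            set_time = (PresentationTimeFn)
                eglGetProcAddress("eglPresentationTimeANDROID");

        if (set_time != NULL) {
            /* Ask SurfaceFlinger to latch this frame two refreshes out;
               must be called before the swap it applies to. */
            set_time(dpy, surf, last_vsync_ns + 2 * period_ns);
        }
        eglSwapBuffers(dpy, surf);
    }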
The only way to get signaled by the display when it does the swap would be to dup() the display-retire fence fd from the previous frame, and wait on it. Some of the code in SurfaceFlinger does this, notably DispSync watches the retire fences to see if the software "VSYNC" is drifting. There is no public API for fences, and the user-space response time could certainly be more than 1ms anyway... it usually works out better to schedule ahead than it does to react. Your requirement for non-rooted non-custom devices makes this problematic.
If you're mostly seeing correct behavior, but occasionally seeing a miss, your best bet is to use systrace to track down the cause.
I am building a game-like application using the Android NDK and OpenGL ES 2.0.
So far I understand the concepts of vertices, shaders, and programs.
The main game loop would be a loop in a single thread as follows
step 1. Read all the user input
step 2. Update game objects (if needed) based on the input
step 3. make draw calls for all the objects
step 4. call glSwapBuffers
then loop back to step 1
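A bare-bones sketch of that loop; poll_input(), update_objects(), draw_objects() and the EGL handles are hypothetical placeholders for your own code:

    while (running) {
        poll_input();                      /* step 1: read all user input   */
        update_objects();                  /* step 2: advance game state    */
        draw_objects();                    /* step 3: issue GL draw calls   */
        eglSwapBuffers(display, surface);  /* step 4: present, then loop    */
    }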
But I ran into various confusions regarding sync and threading, so I'm listing all the questions together.
1. Since OpenGL draw calls are asynchronous, the draw calls and glSwapBuffers may get called many times before the GPU has actually rendered even a single frame from the previous iteration's calls. Will this be problematic? Buffer overflow or tearing?
2. Assuming VSYNC is enabled, does point 1 still cause a problem?
3. Since all calls are async, how do I measure the time spent rendering each frame? glSwapBuffers would return immediately, so how can I know when the frame was actually done?
4. Loading textures will occupy space in RAM. Is checking free memory before loading a texture the standard way, or should I keep loading textures until I hit GL_OUT_OF_MEMORY?
5. If I switch to a multithreaded approach, calling just glSwapBuffers at a fixed 60 times per second without any regard to the thread that is processing input and issuing draw calls, what is supposed to happen?
Also, how do I control the FPS in the game loop? I know the exact FPS depends on a large number of factors, but how can I get close to a target?
The SwapBuffers() will not be executed out of order. Issuing it after all of the draw commands for the frame is fine. The driver will take care of it; you don't need to sync anything. You can only screw this up by using multiple threads or multiple contexts, and even that would take a lot of effort.
There is no problem with 1, and VSYNC does not directly change anything here.
The calls might be asynchronous, but the driver will not queue up an unlimited amount of work. Sooner or later, it will have to block if you try to issue too many calls in advance. When vsync is on, the typical behavior is that the driver will queue up at most a few frames (or just one, depending on the driver settings), and SwapBuffers() will block when that limit is reached. So the timing statistics you get there are accurate after the first few frames. Note that this is still much better than completely flushing the queue, as the driver unblocks as soon as the first pending buffer swap has been carried out.
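Under that assumption, a minimal NDK-side sketch for timing frames (swap-to-swap, using clock_gettime); function names are mine:

    #include <stdint.h>
    #include <time.h>
    #include <android/log.h>

    static int64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* Call right after eglSwapBuffers(); once the driver's queue is full,
       the swap-to-swap time converges on the true frame time. */
    void log_frame_time(void) {
        static int64_t last_ns = 0;
        int64_t t = now_ns();
        if (last_ns != 0)
            __android_log_print(ANDROID_LOG_DEBUG, "frametime",
                                "%.2f ms", (t - last_ns) / 1e6);
        last_ns = t;
    }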
This is a totally new topic, which probably belongs into another question. However: It is very unlikely that you get any of the current desktop GL implementations to ever generate GL_OUT_OF_MEMORY. The driver will automatically page textures (and other objects) between VRAM and system RAM (and the OS might even page that to disk). The GL also provides no means to query the available memory.
In that scenario, you will need to synchronize manually. That approach doesn't make the slightest sense, though, and seems like trying to solve a problem that doesn't exist. If you want your game to use multithreading, still put all the GL rendering (and SwapBuffers) into the same thread. You can use different threads for input processing, sound, physics, scene updates, general game logic, and whatever else. But you should use a single-thread/single-context approach for the GL. That way, it also won't hurt you when SwapBuffers() blocks your render thread, as your game logic and input handling still run, and the render thread just renders new frames with the newest available data at the frequency the display needs (with vsync on) or as fast as the CPU and GPU can work (with vsync off).
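A minimal sketch of that split, with a hypothetical Scene struct: the logic thread publishes snapshots under a mutex, and the render thread (the only one touching GL) copies the latest one each frame:

    #include <pthread.h>
    #include <EGL/egl.h>

    typedef struct { float player_x, player_y; /* ... */ } Scene;

    static Scene g_pending;                   /* written by the logic thread */
    static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

    void publish_scene(const Scene *next) {   /* logic thread */
        pthread_mutex_lock(&g_lock);
        g_pending = *next;
        pthread_mutex_unlock(&g_lock);
    }

    void render_frame(EGLDisplay display, EGLSurface surface) { /* GL thread */
        Scene snapshot;
        pthread_mutex_lock(&g_lock);
        snapshot = g_pending;
        pthread_mutex_unlock(&g_lock);

        /* ... draw from `snapshot` ... */
        eglSwapBuffers(display, surface);  /* may block; the logic thread
                                              keeps running regardless */
    }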
I'd like to calculate FPS to detect performance issues in an application, based on the existing Android profiling tools.
I noted that Systrace can record the length of performTraversals. As far as I know, performTraversals performs measure, layout, and draw, which covers most of the work of updating a frame. So is performTraversals representative enough to tell whether a frame misses the 60 fps (16.7 ms) deadline?
I also noted that Systrace records the time spent in SurfaceFlinger. I know SurfaceFlinger serves the rendering pipeline, but I don't know the exact beginning and ending points of a frame. Should I also count the time spent in SurfaceFlinger toward the frame time? (Though I do observe that SurfaceFlinger runs more frequently than performTraversals, meaning SurfaceFlinger doesn't necessarily follow performTraversals; it can also be triggered in other scenarios.)
P.S. I'm aware of dumpsys gfxinfo, but it can only record 128 frames (~2 seconds), while what I want to measure may last much longer.
Systrace is not useful for measuring FPS overall, but you can do that trivially with a frame counter and System.nanoTime(). If you're not hitting your target framerate, though, it can help you figure out why not.
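For the counter itself, a minimal NDK-side sketch (the Java version would just use System.nanoTime()); the function name is mine, not from any profiling API:

    #include <time.h>
    #include <android/log.h>

    /* Call once per frame; logs the average FPS about once a second. */
    void count_fps(void) {
        static struct timespec start;
        static int frames = 0;
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (frames++ == 0)
            start = now;
        double elapsed = (now.tv_sec - start.tv_sec) +
                         (now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed >= 1.0) {
            __android_log_print(ANDROID_LOG_DEBUG, "fps",
                                "%.1f fps", frames / elapsed);
            frames = 0;  /* restart the measurement window */
        }
    }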
The official docs provide some useful pointers, but there's a lot of information and the interactions can be complex. The key things to know are:
The device display panel generates a vsync signal. You can see that on the VSYNC line. Every time it transitions between 1 and 0 is a refresh.
The vsync wakes surfaceflinger, which gathers up the incoming buffers for the various windows and composites them (either itself using OpenGL ES, or through the Hardware Composer).
If your app is running faster than the panel refresh rate (usually 60fps), it will block waiting for surfaceflinger (in, say, eglSwapBuffers()). Once surfaceflinger acquires the buffer, the app is free to continue and generate another frame.
Unless you're rendering offscreen, you can't go faster than surfaceflinger.
As of Android 4.3 (API 18) you can add your own events to the systrace output using the android.os.Trace class. Wrapping your draw method with trace markers can be extremely informative. You have to enable their tag with systrace to see them.
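If your rendering lives in native code, a hedged note: on API 23+ the NDK exposes the same custom sections via <android/trace.h> (on API 18-22 you would call android.os.Trace through JNI instead). A sketch, with draw_frame as a hypothetical placeholder:

    #include <android/trace.h>

    void draw_frame(void) {
        ATrace_beginSection("draw_frame"); /* shows up on your app's row
                                              in the systrace output */
        /* ... GL calls ... */
        ATrace_endSection();
    }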
If you want to be running at 60fps, your rendering must finish in well under 16.7ms. If you see a single invocation of performTraversals taking longer than that, you're not going to hit maximum speed.
I'm writing a graphically intense game for the Nexus One, using the NDK (revision 4) and OpenGL ES 2.0. We're really pushing the hardware here, and for the most part it works well, except every once in a while I get a serious crash with this log message:
W/SharedBufferStack( 398): waitForCondition(LockCondition) timed out
(identity=9, status=0). CPU may be pegged. trying again.
The entire system locks up, repeats this message over and over, and will either restart after a couple minutes or we have to reboot it manually. We're using Android OS 2.1, update 1.
I know a few other people out there have seen this bug, sometimes in relation to audio. In my case it's caused by the SharedBufferStack, so I'm guessing it's an OpenGL issue. Has anyone encountered this, and better yet fixed it? Or does anyone know what's going on with the SharedBufferStack to help me narrow things down?
I don't believe such an error can occur in audio code; SharedBufferStack is only used in the Surface libraries. Most probably this is a bug in the EGL swapBuffers or SurfaceFlinger implementation, and you should file it in the bug tracker.
I got "CPU may be pegged" messages in LogCat because I had an ArrayBlockingQueue in my code. If you have any blocking queue (as seems to be the case with audio buffers), be sure to call BlockingQueue.put() only if you have enough timing control to BlockingQueue.take() elements in time to make room. Otherwise, have a look at BlockingQueue.offer().
The waitForCondition() causes the lockup (system freeze), but it is not the root cause. This seems to be an issue with either:
The audio framework (does your game have sounds?)
-or-
The GL rendering subsystem.
Are there any "CPU may be pegged" messages in the log?
You might want to take a look at this:
http://soledadpenades.com/2009/08/25/is-the-cpu-pegged-and-friends/
There seems to be a driver problem with eglSwapBuffers():
http://code.google.com/p/android/issues/detail?id=20833&q=cpu%20may%20be%20pegged&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars
One workaround is to call glFinish() immediately before your call to eglSwapBuffers(); however, this will incur a performance hit.
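A sketch of what that workaround looks like inside the render loop; it serialises CPU and GPU work, hence the cost:

    glFinish();                          /* block until all queued GL work completes */
    eglSwapBuffers(display, surface);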
FWIW, I hit this issue recently while developing on Android 2.3.4 using GL ES 2 on a Samsung Galaxy S.
The issue for me was a bug in my glDrawArrays call - I was rendering past the end of the buffer, i.e. the "count" I was passing in was greater than the actual count. Interestingly, that call did not throw an exception, but it would intermittently result in the issue you described. Also, the buffer I ended up rendering looked wrong so I knew something was off. The "CPU may be pegged" thing just made it more annoying to track down the real issue.
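To illustrate the bug class (buffer contents hypothetical): the count argument of glDrawArrays is in vertices, not floats or bytes, and overrunning a client-side array does not raise a GL error:

    #include <GLES2/gl2.h>

    #define FLOATS_PER_VERTEX 3   /* x, y, z */
    static const GLfloat verts[] = {
         0.0f,  0.5f, 0.0f,
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
    };
    static const GLsizei kVertexCount =
        sizeof(verts) / sizeof(verts[0]) / FLOATS_PER_VERTEX;

    void draw(void) {
        glDrawArrays(GL_TRIANGLES, 0, kVertexCount);      /* correct */
        /* glDrawArrays(GL_TRIANGLES, 0, kVertexCount * 3);  reads past the
           end of the buffer, silently, as described above */
    }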