I'm writing a graphically intense game for the Nexus One, using the NDK (revision 4) and OpenGL ES 2.0. We're really pushing the hardware here, and for the most part it works well, except every once in a while I get a serious crash with this log message:
W/SharedBufferStack( 398): waitForCondition(LockCondition) timed out
(identity=9, status=0). CPU may be pegged. trying again.
The entire system locks up, repeats this message over and over, and either restarts after a couple of minutes or has to be rebooted manually. We're using Android OS 2.1, update 1.
I know a few other people out there have seen this bug, sometimes in relation to audio. In my case it's caused by the SharedBufferStack, so I'm guessing it's an OpenGL issue. Has anyone encountered this, and better yet fixed it? Or does anyone know what's going on with the SharedBufferStack to help me narrow things down?
I don't believe such an error can occur in audio code; SharedBufferStack is only used in the Surface libraries. Most probably this is a bug in the EGL swapBuffers or SurfaceFlinger implementation, and you should file it on the bug tracker.
I got "CPU may be pegged" messages in LogCat because I had an ArrayBlockingQueue in my code. If you have any blocking queue (as seems to be the case with audio buffers), be sure to call BlockingQueue.put() only if you have enough timing control to properly BlockingQueue.take() elements and make room for it. Otherwise, have a look at BlockingQueue.offer().
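For illustration, a minimal sketch of the offer()-based approach, assuming a hypothetical producer/consumer audio buffer queue (all names here are made up):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AudioBufferQueue {
    private final BlockingQueue<short[]> buffers = new ArrayBlockingQueue<short[]>(8);

    // Producer side: offer() returns false instead of blocking when the queue is
    // full, so a slow consumer can never stall (or "peg") the producer thread.
    public boolean submit(short[] buffer) {
        return buffers.offer(buffer); // drop or recycle the buffer if there is no room
    }

    // Consumer side: take() blocks until a buffer is available.
    public short[] nextBuffer() throws InterruptedException {
        return buffers.take();
    }
}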
The waitForCondition() causes the lockup (system freeze), but it is not the root cause. This seems to be an issue with either:
The audio framework (does your game have sounds?)
-or-
The GL rendering subsystem.
Any "CPU pegged" messages in the log?
You might want to take a look at this:
http://soledadpenades.com/2009/08/25/is-the-cpu-pegged-and-friends/
There seems to be a driver problem with eglSwapBuffers():
http://code.google.com/p/android/issues/detail?id=20833&q=cpu%20may%20be%20pegged&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars
One workaround is to call glFinish() immediately before your call to eglSwapBuffers(); however, this will incur a performance hit.
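The question is about the NDK, where the workaround is literally a glFinish() call right before eglSwapBuffers(). A rough Java equivalent for a GLSurfaceView-based renderer is sketched below (the class name is made up); GLSurfaceView performs the swap right after onDrawFrame() returns:

import android.opengl.GLES20;
import android.opengl.GLSurfaceView;
import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;

public class FinishBeforeSwapRenderer implements GLSurfaceView.Renderer {
    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        // set up shaders, buffers, textures...
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int width, int height) {
        GLES20.glViewport(0, 0, width, height);
    }

    @Override
    public void onDrawFrame(GL10 gl) {
        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
        // ... draw the scene ...

        // Workaround: block until the GPU has finished all queued work before the swap.
        // GLSurfaceView calls eglSwapBuffers() right after onDrawFrame() returns.
        GLES20.glFinish();
    }
}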
FWIW, I hit this issue recently while developing on Android 2.3.4 using GL ES 2 on a Samsung Galaxy S.
The issue for me was a bug in my glDrawArrays call - I was rendering past the end of the buffer, i.e. the "count" I was passing in was greater than the actual count. Interestingly, that call did not throw an exception, but it would intermittently result in the issue you described. Also, the buffer I ended up rendering looked wrong so I knew something was off. The "CPU may be pegged" thing just made it more annoying to track down the real issue.
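As a hedged illustration of the mistake (not the answerer's actual code), the count argument of glDrawArrays is a vertex count, not a float or byte count:

import android.opengl.GLES20;

public final class DrawArraysCount {
    private DrawArraysCount() {}

    // Derive the vertex count from the array length so glDrawArrays never
    // reads past the end of the vertex data (assumes interleaved positions
    // and that the attribute pointers are already bound).
    public static void drawTriangles(float[] vertices, int floatsPerVertex) {
        int vertexCount = vertices.length / floatsPerVertex;
        // Wrong (reads past the buffer): GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, vertices.length);
        GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, vertexCount);
    }
}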
Two questions below.
We have a graphics OpenGL ES 2 application that has worked well for a few years on Windows, Linux, MacOS, iPhones, iPads, and Android phones. In the last few months we started receiving feedback from users of some Android devices (like the Toshiba Thrive, HTC One X, Nexus 7 or Asus Transformer, API 15 and 17) regarding issues with a black or flickering screen or, rarely, an app crash. Our app targets API 9 and up; it is written with the NDK using NativeActivity, based directly on the nvidia Android examples and demos, and it has been thoroughly tested on all platforms: no memory leaks, no invalid memory accesses, and it only rarely calls some small Java code.
Looking at LogCat, we noticed two kinds of error messages on these devices:
(1) JNI ERROR: env->self != thread-self (0x11734c0 vs. 0xd6d360); auto-correcting
(2) NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0) followed by GL_OUT_OF_MEMORY
Regarding (1), we know about the threads vs. JNI issues, and we hopefully know how to fix this. I have read this information and my question here is: does "auto-correcting" mean that we have to worry about some ERROR, or is it just a warning meaning that the code will behave badly IN THE FUTURE, but now it works perfectly well (corrected!) and this is not related to issue (2)? The reason I'm asking is that sometimes we also see the following lines:
E/libEGL: call to OpenGL ES API with no current context (logged once per thread)
E/NvEGLUtil: Failure: eglSwapBuffers, error = 0x0000300d (swap:422)
which look serious. We have tested our app on an API 17 emulator with JNIcheck enabled; no issues are reported, and the app works well.
Now, regarding message (2), I have found a few forums (for example here, here and also this) where people report this message, and the reasons are unclear. It looks like a firmware or driver issue, or GPU memory leaks or memory fragmentation... Many games are affected by screen flicker, and people try rebooting/resetting the device, clearing the cache, upgrading, etc., but the issue seems to persist. This problem concerns quite a few popular devices. Despite the GL_OUT_OF_MEMORY error code, "not enough memory" is not a justified explanation, because the app we used for testing used small 32x32 textures instead of the 512x512 textures used in the regular version (and those bigger textures work perfectly well on older devices). Does anyone have experience with fixing this, and is it fixable on our side at all? Is this an officially confirmed hardware/firmware/OS bug? I am looking for a known reason and a real solution to this problem, not a trial-and-error workaround that accidentally helps without our knowing why.
Thanks!
So, after a few years of trying to identify the problem, it is time for the answer :-) The issue was extremely painful, time-consuming and difficult (almost impossible) to debug: it was non-deterministic, rare, affected only some specific devices, and it appeared to be correlated with a specific version of the system, or even with running (or not running) other programs at the same time...
In our C++ code, at the end of the nvidia framework's bool Engine::initUI() function we called our own keepScreenOn(getApp()) function, which, using the current activity as its argument, called our own static Java method:
//Keep the screen on.
//Note that flag modification must be done in the UI thread:
//https://android-developers.googleblog.com/2009/05/painless-threading.html
static void keepScreenOn(Activity a) {
    final Window w = a.getWindow();
    if (w != null) {
        a.runOnUiThread(new Runnable() {
            public void run() {
                w.addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
            }
        });
    }
}
As I understand it, modifying the window flag causes the window to be destroyed and recreated (anyone please correct me if I'm wrong), which is obviously not a good idea while the app is in the process of starting. It seems that this is what caused, albeit extremely rarely, a race condition between threads or problems for some graphics drivers... which resulted in delayed error messages like "NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0)" and then "GL_OUT_OF_MEMORY".
The fact that setting the window flag causes such delayed GL problems was surprising, and it was not discovered by deduction (we spent a few years trying to find the cause of this problem in our OpenGL code). It was discovered, rather, by desperately commenting out any piece of code that could influence the display... And the solution was to introduce our own subclass of NativeActivity which creates the main application window with the proper flag right from the start:
public class OurSubclassOfNativeActivity extends NativeActivity
{
    @Override
    protected void onCreate(Bundle savedInstanceState)
    {
        getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        super.onCreate(savedInstanceState);
    }
}
We wanted to avoid introducing our own subclass of NativeActivity, but it seems the need to set FLAG_KEEP_SCREEN_ON forces us to do so.
I have an Android game using the libgdx framework.
On the Google Play store there are reports of:
java.lang.RuntimeException: eglSwapBuffers failed: EGL_SUCCESS
at android.opengl.GLSurfaceView$EglHelper.throwEglException(GLSurfaceView.java:1085)
at android.opengl.GLSurfaceView$EglHelper.swap(GLSurfaceView.java:1043)
at android.opengl.GLSurfaceView$GLThread.guardedRun(GLSurfaceView.java:1369)
at android.opengl.GLSurfaceView$GLThread.run(GLSurfaceView.java:1123)
What can I do?
Devices reported: Samsung GT-S5830i, Samsung Galaxy Y, LGE LG-P990, Motorola Photon 4G, Motorola Droid X2.
This problem has been reported here before. There is already an issue filed.
You can help by providing more details to this issue.
I did some research and found out that this problem occurs on low-end devices because they have little memory. Loading and unloading textures between two scenes crashes the buffer swap and hence throws this RuntimeException.
The most annoying thing about this issue is that when I tested on such devices I didn't get any such error, but on the Play Store I got many reports of it.
So, we can tackle this issue in two ways:
1) Filter out low-end devices from the compatible list.
2) Catch the exception with an UncaughtExceptionHandler and tell the user about the low-memory problem (see the sketch below).
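A minimal sketch of option 2, assuming the handler is installed once at startup; the class name, log tag, and message check are illustrative, not a confirmed fix:

import android.util.Log;

public final class SwapFailureHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;

    private SwapFailureHandler(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    // Call once, e.g. in Application.onCreate() or the launcher Activity.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new SwapFailureHandler(Thread.getDefaultUncaughtExceptionHandler()));
    }

    @Override
    public void uncaughtException(Thread thread, Throwable e) {
        String msg = e.getMessage();
        if (e instanceof RuntimeException && msg != null && msg.contains("eglSwapBuffers failed")) {
            // Record the failure so the next launch can tell the user about the
            // low-memory problem instead of just crashing silently.
            Log.e("MyGame", "eglSwapBuffers failed, possibly a low-memory device", e);
        }
        if (previous != null) {
            previous.uncaughtException(thread, e); // keep the default crash behaviour
        }
    }
}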
Solution in last edit.
It actually did happen on my low-end devices as well (GT-S5830 and GT-S5830i).
The thing is, it did not happen because of low memory; I logged my game's current memory usage and it never crossed 3 megabytes, while I had 80 or more megabytes of free RAM. I even ran System.gc() consistently, which hints to the garbage collector that it should free up some memory.
I have no workaround but will update this answer as soon as I find one.
After some searching: GPU-related resources (like textures) are not managed by the garbage collector (that's why they have to be disposed manually), so calling System.gc() is somewhat pointless. Still, I'm disposing all of my textures, and my game's memory usage is pretty low.
I tried all sorts of solutions and nothing worked, BUT here is something that should fix the problem (I haven't tried this one, but it should work nonetheless):
Simply don't load so many textures over and over. My game used to dispose and then reinitialize all textures whenever the user navigated away from a screen. That is possibly what causes the problem. What you need to do is keep the loaded textures/texture atlases in memory (don't lose their references). That way, navigating back to a screen doesn't reload all the textures.
Avoid using raw Textures and instead use a POT (Power Of Two) TextureAtlas.
I will apply these two steps to my project, and if the problem goes away, I'll come back to confirm my solution.
That wasn't the problem at all. I ran a really long loop of disposing and loading textures, and no exception/error was thrown. My above suggestion is not the solution. The problem seemed to be related to excessive Screen switching, but I guess not, since it also happens when changing the screen orientation repeatedly from portrait to landscape and vice versa.
Solution:
I thought that Game's setScreen(screen) called Screen's dispose() automatically (which is not the case). dispose() was where I disposed all of my underlying textures. I simply solved the problem by calling dispose() from the Screen's overridden hide() method.
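A sketch of that fix for a libgdx Screen (illustrative names; the atlas file is hypothetical). Game.setScreen() calls hide() on the outgoing screen but never dispose(), so the screen releases its own textures from hide():

import com.badlogic.gdx.Screen;
import com.badlogic.gdx.graphics.g2d.TextureAtlas;

public class GameplayScreen implements Screen {
    private final TextureAtlas atlas = new TextureAtlas("gameplay.atlas"); // hypothetical atlas file

    @Override public void show() {}
    @Override public void render(float delta) { /* draw using regions from the atlas */ }
    @Override public void resize(int width, int height) {}
    @Override public void pause() {}
    @Override public void resume() {}

    @Override
    public void hide() {
        dispose(); // hide() is called by Game.setScreen() when this screen is replaced
    }

    @Override
    public void dispose() {
        atlas.dispose(); // free the GPU textures backing the atlas
    }
}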
Using TextureAtlases is very important because you reduce the number of handles attached to each Texture (which may be the reason for the EGL_SUCCESS error).
Tested on both GT-S5830 and GT-S5830i (Samsung Galaxy Ace and Samsung Galaxy Y). Problem no longer occurs.
I have a problem with very slow rendering on an Android tablet using the NDK and the EGL commands. I have timed the calls to eglSwapBuffers, and they take a variable amount of time, frequently exceeding the device's frame time. I know it synchronizes to the refresh, but that is around 60 FPS, and the rate here drops well below that.
The only command I issue between calls to swap is glClear, so I know it isn't anything I'm drawing that causes the problem. Even with just a clear, the frame rate drops to 30 FPS (and erratically at that).
On the same device a simple GL program in Java easily renders at 60FPS, thus I know it isn't fundamentally a hardware issue. I've looked through the Android Java code for setting up the GL context and can't see any significant difference. I've also played with every config attribute, and while some alter the speed slightly, none (that I can find) change this horrible frame rate drop.
To ensure the event polling wasn't an issue I moved the rendering into a thread. That thread now only does rendering, thus just calls clear and swap repeatedly. The slow performance still persists.
I'm out of ideas what to check and am looking for suggestions as to what the problem might be.
There's really not enough info (like what device you are testing on, what your exact config was, etc.) to answer this 100% reliably, but this kind of behavior is usually caused by a window and surface pixel format mismatch, e.g. 16-bit (RGB565) vs. 32-bit.
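For reference, on the Java side "matching" the formats can look roughly like this (an illustrative sketch with assumed values; in native code the analogous step is choosing the EGL config and the window buffer format consistently):

import android.content.Context;
import android.graphics.PixelFormat;
import android.opengl.GLSurfaceView;

// Keep the window buffer format and the requested EGL config consistent
// (both 32-bit RGBA here). The class name is made up; a renderer still has
// to be set with setRenderer() before the view is attached.
public class MatchedFormatSurfaceView extends GLSurfaceView {
    public MatchedFormatSurfaceView(Context context) {
        super(context);
        setEGLContextClientVersion(2);
        setEGLConfigChooser(8, 8, 8, 8, 16, 0);       // request an RGBA8888 config with 16-bit depth
        getHolder().setFormat(PixelFormat.RGBA_8888); // make the window buffers match the config
    }
}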
The FB_MULTI_BUFFER=3 environment variable will enable multi-buffering on the Freescale i.MX 6 (Sabrelite) board with some recent LTIB builds (without X). Your GFX driver may need something like this.
I'm developing an engine and a game at the same time in C++, and I'm using Box2D for the physics back end. I'm testing on different Android devices, and on 2 out of 3 devices the game runs fine and so do the physics. However, on my Galaxy Tab 10.1 I sporadically get a sort of "stutter". Here is a YouTube video demonstrating it:
http://www.youtube.com/watch?v=DSbd8vX9FC0
The first device the game is running on is an Xperia Play... the second device is a Galaxy Tab 10.1. Needless to say, the Galaxy Tab has much better hardware than the Xperia Play, yet Box2D is lagging at random intervals for random lengths of time. The code for both machines is exactly the same. Also, the rest of the engine/game is not actually lagging; the entire time it runs at a solid 60 fps. So this "stuttering" seems to be some kind of delay or glitch in actually reading values from Box2D.
The sprites you see moving check at render time whether they have an attached physical body and set their position from the world position of that body. So it seems to be in this specific process that Box2D gets out of sync with the rest of the application. Quite odd. I realize it's a long shot, but I figured I'd post it here anyway to see if anyone has ideas... since I'm totally stumped. Thanks for any input in advance!
Oh, P.S.: I am using a fixed time step, since that seems to be the most commonly suggested solution for things like this. I moved to a fixed time step while developing this on my desktop, where I ran into a similar (but more severe) issue, and the fixed step was the solution. Also, as I said, the game runs steadily at 60 fps, which is controlled by a low-latency timer, so I doubt simple lag is the issue. Thanks again!
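(For reference, a hedged Java sketch of the classic fixed-timestep accumulator loop, with assumed names and a 60 Hz step; this is not the asker's actual code.)

public class FixedStepLoop {
    private static final float STEP = 1f / 60f;   // fixed physics step, in seconds
    private float accumulator = 0f;
    private long lastNanos = System.nanoTime();

    public void frame() {
        long now = System.nanoTime();
        accumulator += (now - lastNanos) / 1.0e9f;
        lastNanos = now;

        // Step the physics in constant increments, however long the frame took.
        while (accumulator >= STEP) {
            stepPhysics(STEP);       // e.g. world.step(STEP, 6, 2) in Box2D
            accumulator -= STEP;
        }
        render(accumulator / STEP);  // interpolation factor between physics states
    }

    private void stepPhysics(float dt) { /* advance the physics world */ }
    private void render(float alpha) { /* draw, optionally interpolating by alpha */ }
}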
As I mentioned in the comments here, this came down to being a timer resolution issue. I was using a timer class which was supposed to access the highest-resolution system timer, cross-platform. Everything worked great, except when it came to Android: some versions worked and some did not. The Galaxy Tab 10.1 was one such case.
I ended up re-writing my getSystemTime() method to use a new addition to C++11 called std::chrono::high_resolution_clock. This also worked great (everywhere but Android)... except it has yet to be implemented in any NDK for Android. It is supposed to be implemented in version 5 of the CrystaX NDK R7, which at the time of this post is 80% complete.
I did some research into various methods of accessing the system time, or anything else I could base a reliable timer on from the NDK side, but what it comes down to is that these methods are not supported consistently across all platforms. I've gone through the painful process of writing my own engine from scratch simply so that I could support every version of Android, so betting on methods that are inconsistently implemented is nonsensical.
The only sensible solution for anyone facing this problem, in my opinion, is to simply abandon the idea of implementing such code on the NDK side. I'm going to do this on the Java end instead, since thus far in all my tests this has been sufficiently reliable across all devices that I've tested on. More on that here:
http://www.codeproject.com/Articles/189515/Androng-a-Pong-clone-for-Android#Gettinghigh-resolutiontimingfromAndroid7
Update
I have now implemented my proposed solution (doing the timing on the Java side) and it has worked. I also discovered that handling any relatively large number on the NDK side, regardless of data type (a number such as the nanoseconds from calling the monotonic clock), also results in serious lagging on some versions of Android. As such I've optimized this as much as possible by passing around a pointer to the system time, to ensure we're not passing by copy.
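A minimal sketch of that arrangement, assuming the Java side reads the monotonic clock and hands only a small delta across JNI (the native method and library names are made up):

public class JavaSideTimer {
    static {
        System.loadLibrary("game"); // hypothetical native library name
    }

    private long lastNanos = System.nanoTime();

    // Implemented in native code; only a small float ever crosses the JNI boundary.
    private native void nativeUpdate(float deltaSeconds);

    public void onFrame() {
        long now = System.nanoTime();            // monotonic, Java side
        float deltaSeconds = (now - lastNanos) / 1.0e9f;
        lastNanos = now;
        nativeUpdate(deltaSeconds);              // the NDK side never handles the raw nanosecond value
    }
}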
One last thing: my statement that calling the monotonic clock from the NDK side is unreliable would seem, however, to be false. From the Android docs on System.nanoTime():
...and System.nanoTime(). This clock is guaranteed to be monotonic,
and is the recommended basis for the general purpose interval timing
of user interface events, performance measurements, and anything else
that does not need to measure elapsed time during device sleep.
So it would seem, if this can be trusted, that calling the clock is reliable; but, as mentioned, other issues then arise, like allocating and handling the massive number that results, which alone nearly cut my frame rate in half on the Galaxy Tab 10.1 with Android 3.2. Ultimate conclusion: supporting all Android devices equally is either damn near or flat out impossible, and using native code seems to make it worse.
I am very new to game development, and you seem a lot more experienced, so it may be silly to ask, but are you using delta time to update your world? Although you say you have a constant frame rate of 60 fps, maybe your frame counter calculates something wrong, and you should use delta time to skip some frames when the FPS is low, or your world will seem to "stay behind". I am pretty sure that you are familiar with this, but I think a good example is here: DeltaTimeExample, although it is a C implementation. If you need it, I can paste some code from my Android projects showing how I use delta time, which I developed following this book: Beginning Android Games.
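Not the answerer's book code, just a minimal sketch of the kind of delta-time update being described (the names and the speed constant are made up):

public class DeltaTimeLoop {
    private static final float PLAYER_SPEED = 120f; // world units per second, assumed value
    private long lastNanos = System.nanoTime();
    private float playerX = 0f;

    // Advance positions by velocity * elapsed seconds, so movement speed
    // does not depend on the frame rate.
    public void update() {
        long now = System.nanoTime();
        float deltaSeconds = (now - lastNanos) / 1.0e9f;
        lastNanos = now;
        playerX += PLAYER_SPEED * deltaSeconds;
    }
}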