Android: how to forcefully reproduce "OpenGL context loss" issue? - android

An Android OpenGL application can lose its GL context while it is in the background. So, to keep things simple, if you get an unexpected Renderer.onSurfaceCreated call - whoa, you're lucky. The system wiped you out and you have to recover all your GL stuff from scratch.
One thing that bothers me, and that the Google documentation seems to be silent about: how can one reliably and efficiently reproduce the issue during development?

Related

Crash with SurfaceView in Android NDK when pausing/resuming app fast

When I pause/unpause my app really fast then I get the following problem:
E/BufferQueueProducer( 177): [SurfaceView] connect(P): already connected (cur=1 req=1)
E/libEGL (25863): eglCreateWindowSurface: native_window_api_connect (win=0xb4984508) failed (0xffffffea) (already connected to another API?)
E/libEGL (25863): eglCreateWindowSurface:416 error 3003 (EGL_BAD_ALLOC)
I'm pretty sure that I am stopping/starting my render thread correctly, and this issue really only occurs when I pause/resume the app really fast (like when you mash the open-apps button).
Any ideas what might be the cause for eglCreateWindowSurface returning EGL_NO_SURFACE here? My guess would be it has to do with something still being connected to the SurfaceView.
It sounds like you're trying to create an EGLSurface for a Surface that already has one. If speed is an issue it's usually because of the lag in Surface callback handling -- the SurfaceView Surface is partially handled by the Window Manager, which requires inter-process communication.
Perhaps your native code still has a handle to the old SurfaceHolder, and if you moved more slowly the handle would be replaced by an upcoming surfaceCreated()? It's hard to say without knowing exactly what your code does. One way to approach these sorts of problems is by adding logging at all the interesting state change points, and comparing the logs from "slow" pause/resume and "fast" pause/resume.
It should be possible to avoid these situations by managing the SurfaceView state carefully. This appendix to the graphics arch doc talks about the difference between the Activity and SurfaceView lifecycles, and two ways to structure an app to avoid issues.
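The logging suggestion above can be sketched as a small in-memory trace that records each state change with a timestamp and thread name, so that a "slow" and a "fast" pause/resume run can be compared side by side. This is only a minimal sketch: the class name StateTrace and the event names are hypothetical, and in a real app you would call mark() from your own surfaceCreated()/surfaceDestroyed() callbacks and around your EGL surface creation and teardown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of any Android API: collects state-change
// markers so two runs (slow vs. fast pause/resume) can be compared.
final class StateTrace {
    private static final class Entry {
        final long nanos;
        final String thread;
        final String event;

        Entry(long nanos, String thread, String event) {
            this.nanos = nanos;
            this.thread = thread;
            this.event = event;
        }
    }

    // Synchronized because UI-thread callbacks and the render thread both log.
    private final List<Entry> entries =
            Collections.synchronizedList(new ArrayList<Entry>());

    /** Records an event together with the current time and thread name. */
    void mark(String event) {
        entries.add(new Entry(System.nanoTime(),
                Thread.currentThread().getName(), event));
    }

    /** Returns the event names in the order they were recorded. */
    List<String> events() {
        List<String> names = new ArrayList<String>();
        synchronized (entries) {
            for (Entry e : entries) {
                names.add(e.event);
            }
        }
        return names;
    }

    /** Formats the trace with times relative to the first recorded event. */
    String dump() {
        StringBuilder sb = new StringBuilder();
        synchronized (entries) {
            long start = entries.isEmpty() ? 0L : entries.get(0).nanos;
            for (Entry e : entries) {
                sb.append(String.format("%10.3f ms  [%s] %s%n",
                        (e.nanos - start) / 1e6, e.thread, e.event));
            }
        }
        return sb.toString();
    }
}
```

If the fast run shows a surface-creation marker before the matching teardown marker of the previous surface, that ordering is a likely candidate for the "already connected" error.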

Android OpenGL ES: auto-correcting env->self and NvRmChannelSubmit failed

Two questions below.
We have a graphics OpenGL ES 2 application that worked well for a few years on Windows, Linux, MacOS, iPhones, iPads, and Android phones. In the last few months we started receiving feedback from users of some of the Android devices (like Toshiba Thrive, HTC One X, Nexus 7 or Asus Transformer, API 15 and 17) regarding issues with black or flickering screen, or rarely, an app crash. Our app targets API 9 and up, and it is written in NDK using NativeActivity, based directly on nvidia android examples and demos, it has been thoroughly tested on all platforms, no memory leaks, no invalid memory accesses, it rarely calls some small java code.
Looking at LogCat, we noticed two kinds of error messages on these devices:
(1) JNI ERROR: env->self != thread-self (0x11734c0 vs. 0xd6d360); auto-correcting
(2) NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0) followed by GL_OUT_OF_MEMORY
Regarding (1), we know about the threads vs. JNI issues, and we hopefully know how to fix this. I have read this information and my question here is: does "auto-correcting" mean that we have to worry about some ERROR, or is it just a warning meaning that the code will behave badly IN THE FUTURE, but now it works perfectly well (corrected!) and this is not related to issue (2)? The reason I'm asking is that sometimes we also see the following lines:
E/libEGL: call to OpenGL ES API with no current context (logged once per thread)
E/NvEGLUtil: Failure: eglSwapBuffers, error = 0x0000300d (swap:422)
which look serious. We have tested our app on an API 17 emulator with JNIcheck enabled - no issues are reported, and the app works well.
Now, regarding message (2), I have found a few forums (for example here, here and also this) where people reported this message, and the reasons are unclear. It looks like a firmware or driver issue, or GPU memory leaks or memory fragmentation... Many games are affected by screen flicker, and people are trying to reboot/reset the device, clear cache, upgrade, etc., but the issue seems to persist. This problem concerns quite a few popular devices. Despite the GL_OUT_OF_MEMORY error code, "not enough memory" is not a plausible explanation, because the app we used for tests used small 32x32 textures instead of the 512x512 textures that are used in the regular version (and these bigger textures work perfectly well on older devices). Does anyone have any experience with how to fix this, and is it fixable on our side at all? Is this an officially confirmed hardware/firmware/OS bug? I am looking for a known reason and a real solution to this problem, not a trial-and-error workaround that would accidentally help without knowing why.
Thanks!
So, after a few years of trying to identify the problem, it is time for the answer :-) The issue was extremely painful, time-consuming and difficult (almost impossible) to debug, it was non-deterministic, rare, would only affect some specific devices, it appeared that it was correlated with a specific version of the system or even with running (or not) other programs at the same time...
In our C++ code, at the end of the nvidia framework's bool Engine::initUI() function we called our own keepScreenOn(getApp()) function, which, using the argument of the current activity, called our own static java method:
// Keep the screen on.
// Note that flag modification must be done in the UI thread:
// https://android-developers.googleblog.com/2009/05/painless-threading.html
static void keepScreenOn(Activity a) {
    final Window w = a.getWindow();
    if (w != null) {
        a.runOnUiThread(new Runnable() {
            public void run() {
                w.addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
            }
        });
    }
}
As I understand, modifying the Window flag causes the window to be destroyed and recreated (anyone please correct me if I'm wrong), which is obviously not a good idea when the app is in the process of starting. It seems that this is what caused – albeit extremely rarely – some race condition between threads or problems to some graphics drivers... which resulted in delayed error messages like "NvRmChannelSubmit failed (err = 196623, SyncPointValue = 0)" and then "GL_OUT_OF_MEMORY".
The fact that setting the window flag causes such delayed GL problems was surprising, and it was not discovered by deduction (we spent a few years trying to find the cause of this problem in our OpenGL code). Rather, it was discovered by desperately commenting out every piece of code that could influence the display... And the solution was to introduce our own subclass of NativeActivity which creates the main application window with the proper flag right from the start:
import android.app.NativeActivity;
import android.os.Bundle;
import android.view.WindowManager;

public class OurSubclassOfNativeActivity extends NativeActivity
{
    @Override
    protected void onCreate(Bundle savedInstanceState)
    {
        // Set the flag before the window is created by super.onCreate().
        getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        super.onCreate(savedInstanceState);
    }
}
We wanted to avoid introducing our own subclass of NativeActivity, but it seems that the need to set FLAG_KEEP_SCREEN_ON forces us to do so.

eglSwapBuffers is erratic/slow

I have a problem with very slow rendering on an Android tablet using the NDK and the EGL commands. I have timed calls to eglSwapBuffers, and they take a variable amount of time, frequently exceeding the device's frame period. I know it synchronizes to the refresh, but that is around 60 FPS, and the frame rate here drops well below that.
The only command I issue between calls to swap is glClear, so I know it isn't anything that I'm drawing causing the problem. Even with just the clear, the frame rate drops to 30 FPS (erratically, though).
On the same device a simple GL program in Java easily renders at 60FPS, thus I know it isn't fundamentally a hardware issue. I've looked through the Android Java code for setting up the GL context and can't see any significant difference. I've also played with every config attribute, and while some alter the speed slightly, none (that I can find) change this horrible frame rate drop.
To ensure the event polling wasn't an issue I moved the rendering into a thread. That thread now only does rendering, thus just calls clear and swap repeatedly. The slow performance still persists.
I'm out of ideas what to check and am looking for suggestions as to what the problem might be.
There's really not enough info (like what device you are testing on, what your exact config was, etc.) to answer this with 100% certainty, but this kind of behavior is usually caused by a window and surface pixel format mismatch, e.g. 16-bit (RGB565) vs. 32-bit.
Setting the FB_MULTI_BUFFER=3 environment variable enables multi-buffering on the Freescale i.MX 6 (Sabrelite) board with some recent LTIB builds (without X). Your GFX driver may need something like this.

Unloading resources before Activity finish() in AndEngine

I've used System.exit(0) before to quit my game. But as this is a no-no in Android I tried calling just activity.finish(). Now if I start the game again right after quitting it, all textures are messed up (white, stretched, or otherwise messed up).
I'm using both managed and unmanaged textures in AndEngine. And AndEngine version 1 (so no OpenGL ES 2.0).
What unloading should I do manually before quitting the game to avoid this? What do you normally unload in OpenGL-based Android games? Any tips and tricks are very welcome.
Well, this is a really old question, but my problem really was that I had static references (actually Scala objects) that would hold on to the textures even after finishing the activity, and only killing the process would help. Really bad design. Be careful with your references on Android.

Nexus One / Android "CPU may be pegged" bug

I'm writing a graphically intense game for the Nexus One, using the NDK (revision 4) and OpenGL ES 2.0. We're really pushing the hardware here, and for the most part it works well, except every once in a while I get a serious crash with this log message:
W/SharedBufferStack( 398): waitForCondition(LockCondition) timed out
(identity=9, status=0). CPU may be pegged. trying again.
The entire system locks up, repeats this message over and over, and will either restart after a couple minutes or we have to reboot it manually. We're using Android OS 2.1, update 1.
I know a few other people out there have seen this bug, sometimes in relation to audio. In my case it's caused by the SharedBufferStack, so I'm guessing it's an OpenGL issue. Has anyone encountered this, and better yet fixed it? Or does anyone know what's going on with the SharedBufferStack to help me narrow things down?
I don't believe such an error can occur in audio code; SharedBufferStack is only used in the Surface libraries. Most probably this is a bug in the EGL swapBuffers or SurfaceFlinger implementation, and you should file it in the bug tracker.
I got "CPU may be pegged" messages in LogCat because I had an ArrayBlockingQueue in my code. If you have any blocking queue (as seems to be the case with audio buffers), be sure to call BlockingQueue.put() only if you have enough timing control to BlockingQueue.take() elements promptly and make room for it. Otherwise, consider using BlockingQueue.offer().
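As a rough illustration of the put() versus offer() difference described above: put() blocks the calling thread until space is available, while offer() returns false immediately (or after a timeout) when the queue is full. The sketch below assumes a single producer thread, and the class BoundedProducer and its methods are hypothetical names, not taken from the original code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper: never blocks the producer indefinitely.
// The dropped counter is not thread-safe; a single producer is assumed.
final class BoundedProducer {
    private final BlockingQueue<byte[]> queue;
    private int dropped = 0;

    BoundedProducer(int capacity) {
        this.queue = new ArrayBlockingQueue<byte[]>(capacity);
    }

    /** Enqueues without blocking; returns false and counts a drop if full. */
    boolean submit(byte[] buffer) {
        boolean accepted = queue.offer(buffer);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    /** Waits at most timeoutMs for space instead of blocking forever. */
    boolean submit(byte[] buffer, long timeoutMs) throws InterruptedException {
        boolean accepted = queue.offer(buffer, timeoutMs, TimeUnit.MILLISECONDS);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    /** Consumer side: returns the next buffer, or null if the queue is empty. */
    byte[] poll() {
        return queue.poll();
    }

    int droppedCount() {
        return dropped;
    }
}
```

Whether dropping a buffer is acceptable depends on the data; for audio it usually means a glitch, which may still be preferable to a producer thread that stalls indefinitely.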
The waitForCondition() call causes the lockup (system freeze), but it is not the root cause. This seems to be an issue with either the audio framework (does your game have sound?) or the GL rendering subsystem.
Any "CPU-pegged" messages in the log?
You might want to take a look at this:
http://soledadpenades.com/2009/08/25/is-the-cpu-pegged-and-friends/
There seems to be a driver problem with eglSwapBuffers():
http://code.google.com/p/android/issues/detail?id=20833&q=cpu%20may%20be%20pegged&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars
One workaround is to call glFinish() preceding your call to eglSwapBuffers(), however this will induce a performance hit.
FWIW, I hit this issue recently while developing on Android 2.3.4 using GL ES 2 on a Samsung Galaxy S.
The issue for me was a bug in my glDrawArrays call - I was rendering past the end of the buffer, i.e. the "count" I was passing in was greater than the actual count. Interestingly, that call did not throw an exception, but it would intermittently result in the issue you described. Also, the buffer I ended up rendering looked wrong so I knew something was off. The "CPU may be pegged" thing just made it more annoying to track down the real issue.
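Since glDrawArrays does not report an over-long count, one way to catch this class of bug early is to validate the range yourself in debug builds before issuing the draw call. The following is a minimal sketch assuming client-side vertex data in a FloatBuffer with a fixed number of floats per vertex; DrawRange and its methods are hypothetical helpers, not part of the GL API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical debug helper: checks that a draw range fits inside the buffer.
final class DrawRange {
    private DrawRange() {}

    /** Number of whole vertices that fit in the buffer. */
    static int vertexCapacity(FloatBuffer vertices, int floatsPerVertex) {
        if (floatsPerVertex <= 0) {
            throw new IllegalArgumentException("floatsPerVertex must be positive");
        }
        return vertices.capacity() / floatsPerVertex;
    }

    /** Throws if [first, first + count) reaches past the end of the buffer. */
    static void checkDrawRange(FloatBuffer vertices, int floatsPerVertex,
                               int first, int count) {
        int available = vertexCapacity(vertices, floatsPerVertex);
        // Written as "first > available - count" to avoid overflow in first + count.
        if (first < 0 || count < 0 || first > available - count) {
            throw new IllegalArgumentException("draw range first=" + first
                    + " count=" + count + " exceeds " + available + " vertices");
        }
    }

    /** Small demonstration with a buffer holding three xyz vertices. */
    static FloatBuffer exampleBuffer() {
        return ByteBuffer.allocateDirect(9 * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }
}
```

In use, you would call DrawRange.checkDrawRange(vertices, 3, first, count) immediately before GLES20.glDrawArrays(GLES20.GL_TRIANGLES, first, count), so a bad count fails with a clear exception instead of an intermittent lockup.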
