I've been porting a cross platform C++ engine to Android, and noticed that it will inexplicably (and inconsistently) block when calling pthread_mutex_lock. This engine has already been working for many years on several platforms, and the problematic code hasn't changed in years, so I doubt it's a deadlock or otherwise buggy code. It must be my port to Android..
So far there are several places in the code that block on pthread_mutex_lock. It isn't entirely reproducible either. When it hangs, there's no suspicious output in LogCat.
I modified the mutex code like this (edited for brevity... real code checks all return values):
void MutexCreate( Mutex* m )
{
#ifdef WINDOWS
InitializeCriticalSection( m );
#else ANDROID
pthread_mutex_init( m, NULL );
#endif
}
void MutexDestroy( Mutex* m )
{
#ifdef WINDOWS
DeleteCriticalSection( m );
#else ANDROID
pthread_mutex_destroy( m, NULL );
#endif
}
void MutexLock( Mutex* m )
{
#ifdef WINDOWS
EnterCriticalSection( m );
#else ANDROID
pthread_mutex_lock( m );
#endif
}
void MutexUnlock( Mutex* m )
{
#ifdef WINDOWS
LeaveCriticalSection( m );
#else ANDROID
pthread_mutex_unlock( m );
#endif
}
I tried modifying MutexCreate to make error-checking and recursive mutexes, but it didn't matter. I wasn't even getting errors or log output either, so either that means my mutex code is just fine, or the errors/logs weren't being shown. How exactly does the OS notify you of bad mutex usage?
The engine makes heavy use of static variables, including mutexes. I can't see how, but is that a problem? I doubt it because I modified lots of mutexes to be allocated on the heap instead, and the same behavior occurred. But that may be because I missed some static mutexes. I'm probably grasping at straws here.
I read several references including:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
http://www.embedded-linux.co.uk/tutorial/mutex_mutandis
http://linux.die.net/man/3/pthread_mutex_init
Android NDK Mutex
Android NDK problem pthread_mutex_unlock issue
The "errorcheck" mutexes will check a couple of things (like attempts to use a non-recursive mutex recursively) but nothing spectacular.
You said "real code checks all return values", so presumably your code explodes if any pthread call returns a nonzero value. (Not sure why your pthread_mutex_destroy takes two args; assuming copy & paste error.)
The pthread code is widely used within Android and has no known hangups, so the issue is not likely in the pthread implementation itself.
The current implementation of mutexes fits in 32 bits, so if you print *(pthread_mutex_t* mut) as an integer you should be able to figure out what state it's in (technically, what state it was in at some point in the past). The definition in bionic/libc/bionic/pthread.c is:
/* a mutex is implemented as a 32-bit integer holding the following fields
*
* bits: name description
* 31-16 tid owner thread's kernel id (recursive and errorcheck only)
* 15-14 type mutex type
* 13 shared process-shared flag
* 12-2 counter counter of recursive mutexes
* 1-0 state lock state (0, 1 or 2)
*/
"Fast" mutexes have a type of 0, and don't set the tid field. In fact, a generic mutex will have a value of 0 (not held), 1 (held), or 2 (held, with contention). If you ever see a fast mutex whose value is not one of those, chances are something came along and stomped on it.
It also means that, if you configure your program to use recursive mutexes, you can see which thread holds the mutex by pulling the bits out (either by printing the mutex value when trylock indicates you're about to stall, or dumping state with gdb on a hung process). That, plus the output of ps -t, will let you know if the thread that locked the mutex still exists.
Related
I implemented an algorithm on android using OpenCL and OpenMP. The OpenMP implementation runs about 10 times slower than the OpenCL one.
OpenMP: ~250 ms
OpenCL: ~25 ms
But overall, if I measure the time from the java android side, I get roughly the same time to call and get my values.
For example:
Java code:
// calls C implementation using JNI (Java Native Interface)
bool useOpenCL = true;
myFunction(bitmap, useOpenCL); // ~300 ms, timed with System.nanoTime() here, but omitted code for clarity
myFunction(bitmap, !useOpenCL); // ~300 ms, timed with System.nanoTime() here, but omitted code for clarity
C code:
JNIEXPORT void JNICALL Java_com_xxxxx_myFunctionNative(JNIEnv * env, jobject obj, jobject pBitmap, jboolean useOpenCL)
{
// same before, setting some variables
clock_t startTimer, stopTimer;
startTimer = clock();
if ((bool) useOpenCL) {
calculateUsingOpenCL(); // runs in ~25 ms, timed here, using clock()
}
else {
calculateUsingOpenMP(); // runs in ~250 ms
}
stopTimer = clock();
__android_log_print(ANDROID_LOG_VERBOSE, APPNAME, "Time in ms: %f\n", 1000.0f* (float)(stopTimer - startTimer) / (float)CLOCKS_PER_SEC);
// same from here on, e.g.: copying values to java side
}
The Java code, in both cases executes roughly in the same time, around 300 ms. To be more precise, elapsedTime is a bit more for OpenCL, that is OpenCL is slower on average.
Looking at the individual run-times of the OpenMP, and OpenCL implementations, OpenCL version should be much faster overall. But for some reason, there is an overhead that I cannot find.
I also compared OpenCL vs Normal native code (no OpenMP), I still got the same results, with roughly same runtime overall, even though the calculateUsingOpenCL ran at least 10 times faster.
Ideas:
Maybe the GPU (in OpenCL case) is less efficient in general, because it has less memory available. There are few variables that we need to preallocate, which are used every frame. So, we checked the time it takes for android to draw a bitmap in both cases (OpenMP, OpenCL). In the OpenCL case, sometimes drawing a bitmap took longer (3 times longer), but not by the amount that would equalize the overall run time of the program.
Does JNI use GPU to accelerate some calls, which could cause the OpenCL version to be slower?
EDIT:
Is it possible that Java Garbage collection is triggered by OpenCL, causing the big overhead?
It turns out, clock() is unreliable (on Android), so instead we used the following method to measure time. With this method, everything is ok.
int64_t getTimeNsec() {
struct timespec now;
clock_gettime(CLOCK_MONOTONIC, &now);
return (int64_t) now.tv_sec*1000000000LL + now.tv_nsec;
}
clock_t startTimer, stopTimer;
startTimer = getTimeNsec();
function_to_measure();
stopTimer = getTimeNsec();
__android_log_print(ANDROID_LOG_VERBOSE, APPNAME, "Runtime in milliseconds (ms): %f", (float)(stopTimer - startTimer) / 1000000.0f);
This was suggested here:
How to obtain computation time in NDK
So I noticed that my app crashes after repeated calls of the following method.
JNIEXPORT void JNICALL Java_com_kitware_VolumeRender_VolumeRenderLib_DummyFunction(JNIEnv * env,jobject obj, jlong udp, jdoubleArray rotation, jdoubleArray translation){
jboolean isCopy1, isCopy2 ;
jdouble* rot = env->GetDoubleArrayElements(rotation,&isCopy1);
jdouble* trans = env->GetDoubleArrayElements(translation,&isCopy2);
if(isCopy1 == JNI_TRUE){
env->ReleaseDoubleArrayElements(rotation,rot, JNI_ABORT);
}
if(isCopy2 == JNI_TRUE){
env->ReleaseDoubleArrayElements(translation,trans, JNI_ABORT);
}
}
I thought this would be due to some missing memory space but I do free the memory here don't I? Still after 512 calls to that method I get my app crashing.
I could provide you with the Logcat if needed but it's a pretty long one. And after investigating a little I'm pretty sure the error is in the memory allocation/free process (i.e commenting out the two GetDoubleArrayElements() get me a running app no matter how many times I call the function).
In android docs: http://developer.android.com/training/articles/perf-jni.html
it is clearly stated:
You must Release every array you Get. Also, if the Get call fails, you must ensure that your code doesn't try to Release a NULL pointer later.
the number 512 is as far as I remember a limit on the number of local references which your code exceeds. So you should remove those checks: if(isCopy2 == JNI_TRUE){.
Still, above docs has a paragraph on JNI_ABORT, which explains it might be used together with isCopy - but its a bit confusing. You might search android sources on how to use JNI_ABORT, ie here is some code:
http://androidxref.com/6.0.0_r1/xref/frameworks/ml/bordeaux/learning/multiclass_pa/jni/jni_multiclass_pa.cpp#77
In my code I often use PushLocalFrame/PopLocalFrame to prevent local references leaks.
I know that OpenMP is included in NDK (usage example here: http://recursify.com/blog/2013/08/09/openmp-on-android ). I've done what it says on that page but when I use: #pragma omp for on a simple for loop that scans a vector, the app crashes with the famous "fatal signal 11".
What am I missing here? Btw I use a modified example from the Android samples, it's Tutorial 2 Mixed Processing. All I want is to parallelize (multithread) some of the for loops and nested for loops that I have in the jni c++ file while using OpenCV.
Any help/suggestion is appreciated!
Edit: sample code added:
#pragma omp parallel for
Mat tmp(iheight, iwidth, CV_8UC1);
for (int x = 0; x < iheight; x++) {
for (int y = 0; y < iwidth; y++) {
int value = (int) buffer[x * iwidth + y];
tmp.at<uchar>(x, y) = value;
}
}
Based on this: http://www.slideshare.net/noritsuna/how-to-use-openmp-on-native-activity
Thanks!
I think this is a known issue in GOMP, see Bug 42616 and Bug 52738.
It's about your app will crash if you try to use OpenMP directives or functions on a non-main thread, and can be traced back to the gomp_thread() function (see libgomp/libgomp.h # line 362 and 368) which returns NULL for threads you create:
#ifdef HAVE_TLS
extern __thread struct gomp_thread gomp_tls_data;
static inline struct gomp_thread *gomp_thread (void)
{
return &gomp_tls_data;
}
#else
extern pthread_key_t gomp_tls_key;
static inline struct gomp_thread *gomp_thread (void)
{
return pthread_getspecific (gomp_tls_key);
}
#endif
As you can see GOMP uses different implementation depending on whether or not thread-local storage (TLS) is available.
If it is available, then HAVE_TLS flag is set, and a global variable is used to track the state of each thread,
Otherwise, thread-local data will be managed via the function pthread_setspecific.
In the earlier version of NDKs the thread-local storage (the __thread keyword) isn't supported so HAVE_TLS won't be defined, therefore pthread_setspecific will be used.
Remark: I'm not sure whether __thread is supported or not in the last version of NDK, but here you can read the same answers about Android TLS.
When GOMP creates a worker thread, it sets up the thread specific data in the function gomp_thread_start() (line 72):
#ifdef HAVE_TLS
thr = &gomp_tls_data;
#else
struct gomp_thread local_thr;
thr = &local_thr;
pthread_setspecific (gomp_tls_key, thr);
#endif
But, when the application creates a thread independently, the thread specific data isn't set, and so the gomp_thread() function returns NULL. This causes the crash and this isn't a problem when TLS is supported, since the global variable that's used will always be available
I remember that this issue had been fixed android-ndk-r10d, but it only works with background processes (no Java). It means when you enable OpenMP and create a native thread from JNI (what is called from Java Android) then your app will crash remains.
I am using the official Android port of SDL 1.3, and using it to set up the GLES2 renderer. It works for most devices, but for one user, it is not working. Log output shows the following error:
error of type 0x500 glGetIntegerv
I looked up 0x500, and it refers to GL_INVALID_ENUM. I've tracked down where the problem occurs to the following code inside the SDL library: (the full source is quite large and I cut out logging and basic error-checking lines, so let me know if I haven't included enough information here)
glGetIntegerv( GL_NUM_SHADER_BINARY_FORMATS, &nFormats );
glGetBooleanv( GL_SHADER_COMPILER, &hasCompiler );
if( hasCompiler )
++nFormats;
rdata->shader_formats = (GLenum *) SDL_calloc( nFormats, sizeof( GLenum ) );
rdata->shader_format_count = nFormats;
glGetIntegerv( GL_SHADER_BINARY_FORMATS, (GLint *) rdata->shader_formats );
Immediately after the last line (the glGetIntegerv for GL_SHADER_BINARY_FORMATS), glGetError() returns GL_INVALID_ENUM.
The problem is the GL_ARB_ES2_compatibility extension is not properly supported on your system.
By GL_INVALID_ENUM it means that it does not know the GL_NUM_SHADER_BINARY_FORMATS and GL_SHADER_BINARY_FORMATS enums, which are a part of the said extension.
In contrast, GL_SHADER_COMPILER was recognized, which is strange.
You can try using GL_ARB_get_program_binary and using these two instead:
#define GL_NUM_PROGRAM_BINARY_FORMATS 0x87fe
#define GL_PROGRAM_BINARY_FORMATS 0x87ff
Note that these are different from:
#define GL_SHADER_BINARY_FORMATS 0x8df8
#define GL_NUM_SHADER_BINARY_FORMATS 0x8df9
But they should pretty much do the same.
We ( http://www.mosync.com ) have compiled our ARM recompiler with the Android NDK which takes our internal byte code and generates ARM machine code. When executing recompiled code we see an enormous increase in performance, with one small exception, we can't use any Java Bitmap operations.
The native system uses a function which takes care of all the calls to the Java side which the recompiled code is calling. On the Java (Dalvik) side we then have bindings to Android features. There are no problems while recompiling the code or when executing the machine code. The exact same source code works on Symbian and Windows Mobile 6.x so the recompiler seems to generate correct ARM machine code.
Like I said, the problem we have is that we can't use Java Bitmap objects. We have verified that the parameters which are sent from the Java code is correct, and we have tried following the execution down in Android's own JNI systems. The problem is that we get an UnsupportedOperationException with "size must fit in 32 bits.". The problem seems consistent on Android 1.5 to 2.3. We haven't tried the recompiler on any Android 3 devices.
Is this a bug which other people have encountered, I guess other developers have done similar things.
I found the message in dalvik_system_VMRuntime.c:
/*
* public native boolean trackExternalAllocation(long size)
*
* Asks the VM if <size> bytes can be allocated in an external heap.
* This information may be used to limit the amount of memory available
* to Dalvik threads. Returns false if the VM would rather that the caller
* did not allocate that much memory. If the call returns false, the VM
* will not update its internal counts.
*/
static void Dalvik_dalvik_system_VMRuntime_trackExternalAllocation(
const u4* args, JValue* pResult)
{
s8 longSize = GET_ARG_LONG(args, 1);
/* Fit in 32 bits. */
if (longSize < 0) {
dvmThrowException("Ljava/lang/IllegalArgumentException;",
"size must be positive");
RETURN_VOID();
} else if (longSize > INT_MAX) {
dvmThrowException("Ljava/lang/UnsupportedOperationException;",
"size must fit in 32 bits");
RETURN_VOID();
}
RETURN_BOOLEAN(dvmTrackExternalAllocation((size_t)longSize));
}
This method is called, for example, from GraphicsJNI::setJavaPixelRef:
size_t size = size64.get32();
jlong jsize = size; // the VM wants longs for the size
if (reportSizeToVM) {
// SkDebugf("-------------- inform VM we've allocated %d bytes\n", size);
bool r = env->CallBooleanMethod(gVMRuntime_singleton,
gVMRuntime_trackExternalAllocationMethodID,
jsize);
I would say it seems that the code you're calling is trying to allocate a too big size. If you show the actual Java call which fails and values of all the arguments that you pass to it, it might be easier to find the reason.
I managed to find a work-around. When I wrap all the Bitmap.createBitmap calls inside a Activity.runOnUiThread() It works.