Android NDK - Debugging a random crash with a bad callstack

My Android native application crashes randomly but frequently, and I am unable to get sufficient information out of ndk-gdb. This is the message following the crash:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 19983]
0x4012c6ac in memcpy () from /Users/Andreas/dev/android/obj/local/armeabi-v7a/libc.so
bt returns an unusable callstack:
#0 0x4012c6ac in memcpy () from /Users/Andreas/dev/android/obj/local/armeabi-v7a/libc.so
#1 0x67337388 in ?? ()
Cannot access memory at address 0x7
#2 0x67337388 in ?? ()
Cannot access memory at address 0x7
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I am using NDK r8e.
I have checked all uses of memcpy() in my program, and they're not responsible for this (verified by making them call another memcpy-like function with a different name, and still getting the above crash with exactly the same signature).
Any ideas how to get a more usable call stack, or how to debug this further? Does the NDK offer any memory-checking functionality, in case this is a memory overwrite?
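Independently of any NDK tooling, one hand-rolled way to catch a heap overwrite closer to where it happens is to surround each allocation with guard bytes and verify them later. Below is a minimal C sketch of that technique, assuming you can route the suspect allocations through it; guarded_malloc, guarded_check and guarded_free are hypothetical helpers, not NDK APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator. Each block is laid out as
 *   [header: user size][front guard][user bytes][back guard]
 * so a write past either end of the user bytes lands in a guard region. */
#define HEADER_SIZE 16 /* multiples of 16 keep the user pointer as aligned as malloc's result */
#define GUARD_SIZE 16
#define GUARD_BYTE 0xAB

void *guarded_malloc(size_t size) {
    unsigned char *raw = malloc(HEADER_SIZE + GUARD_SIZE + size + GUARD_SIZE);
    if (raw == NULL) return NULL;
    memcpy(raw, &size, sizeof size); /* remember the user size */
    unsigned char *front = raw + HEADER_SIZE;
    memset(front, GUARD_BYTE, GUARD_SIZE);
    memset(front + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return front + GUARD_SIZE;
}

static unsigned char *raw_from_user(void *p) {
    return (unsigned char *)p - GUARD_SIZE - HEADER_SIZE;
}

/* Returns 1 if both guard regions are intact, 0 if something wrote
 * past either end of the block. */
int guarded_check(void *p) {
    assert(p != NULL);
    unsigned char *raw = raw_from_user(p);
    size_t size;
    memcpy(&size, raw, sizeof size);
    unsigned char *front = raw + HEADER_SIZE;
    unsigned char *back = front + GUARD_SIZE + size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE || back[i] != GUARD_BYTE) return 0;
    }
    return 1;
}

/* Aborts on a damaged block, so the backtrace points at the free()
 * of the corrupted buffer rather than at some later, unrelated crash. */
void guarded_free(void *p) {
    if (p == NULL) return;
    if (!guarded_check(p)) {
        fprintf(stderr, "guarded_free: overwrite detected around %p\n", p);
        abort();
    }
    free(raw_from_user(p));
}
```

This only detects damage when a block is checked or freed, not at the moment of the bad write, and it misses writes that jump past the guard region entirely. Sprinkling guarded_check() calls along a suspect code path can still narrow down which write corrupts the buffer.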

Related

using continue in LLDB stops the process and throws a SIGSEGV

I'm trying to debug a native C++ app that is crashing, using LLDB.
I also have a sleep(5) at the start of android_main so that I can attach to my app in that time, if that matters.
After attaching, the app is paused, so I use continue.
But the process stops immediately after the continue, with a SIGSEGV:
(lldb) continue
Process 4158 resuming
Process 4158 stopped
* thread #19, name = 'com.example.app', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007cefe26282c8
-> 0x7cefe26282c8: movq (%rcx), %rdx
0x7cefe26282cb: movq %rdx, 0x18a0(%rax)
0x7cefe26282d2: movl 0x8(%rcx), %ecx
0x7cefe26282d5: movl %ecx, 0x18a8(%rax)
After another continue, the app just exits:
(lldb) continue
Process 4158 exited with status = 11 (0x0000000b)
How do I fix this and just continue execution as normal?
Your app crashed because register rcx was supposed to hold the address of some object, but instead held a value that does not point to readable memory (here the fault address is 0x0 and the faulting instruction reads from (%rcx), so rcx held a null pointer). That's what SIGSEGV means: a request was made to access memory that couldn't be fulfilled. You can't "continue execution as normal", because the program didn't get data it needs, so it has no way to move forward. If you knew the value that SHOULD have been in rcx, you could set rcx to that value and then continue. But what you really have to do is figure out why that value was bad, fix the code, rebuild, and rerun.
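As a hypothetical illustration, the four instructions shown are consistent with copying a 12-byte value (8 bytes, then 4 more at offset 8) from the address in rcx into a field at offset 0x18a0 of the object whose address is in rax. The C sketch below reconstructs that shape with invented type and field names, plus a guarded variant; the null check only stops this particular crash, and you would still need to find out why the pointer was null.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 12-byte value: 8 bytes copied by movq, then 4 more
 * bytes at offset 8 copied by movl, as in the disassembly. */
typedef struct {
    int32_t x, y, z;
} vec3_t;

/* Hypothetical object whose `latest` field sits at offset 0x18a0,
 * mirroring the stores to 0x18a0(%rax) and 0x18a8(%rax). */
typedef struct {
    unsigned char other_state[0x18a0];
    vec3_t latest;
} app_state_t;

_Static_assert(sizeof(vec3_t) == 12, "vec3_t should be 12 bytes");
_Static_assert(offsetof(app_state_t, latest) == 0x18a0, "latest should sit at 0x18a0");

/* Crashing shape: if src is NULL, reading *src faults at address 0x0. */
void store_latest_unchecked(app_state_t *state, const vec3_t *src) {
    state->latest = *src;
}

/* Guarded version: returns 0 on success, -1 if src is NULL.
 * This avoids the crash; the real fix is finding why src was NULL. */
int store_latest(app_state_t *state, const vec3_t *src) {
    assert(state != NULL);
    if (src == NULL) {
        fprintf(stderr, "store_latest: src is NULL\n");
        return -1;
    }
    state->latest = *src;
    return 0;
}
```

Which source line actually produced these instructions can only be confirmed with symbols, for example by rebuilding with debug info so LLDB can map the faulting address back to your code.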

ANR on mkdirs() and exists()

I'm pretty baffled by the ANRs my application is getting, since I don't understand how they could happen.
I've got multiple ANRs for this code:
applicationContext.filesDir.mkdirs()
applicationContext.filesDir.exists()
and I get the following ANR report:
main (native): tid=1 systid=30195
#00 pc 0xc57c8 libc.so
#01 pc 0x21580 libopenjdk.so
at java.io.UnixFileSystem.createDirectory0(UnixFileSystem.java)
at java.io.UnixFileSystem.createDirectory(UnixFileSystem.java:354)
at java.io.File.mkdir(File.java:1325)
at java.io.File.mkdirs(File.java:1352)
#01 pc 0x21fc0 libjavacore.so
at libcore.io.Linux.access(Linux.java)
at libcore.io.ForwardingOs.access(ForwardingOs.java:131)
at libcore.io.BlockGuardOs.access(BlockGuardOs.java:76)
at libcore.io.ForwardingOs.access(ForwardingOs.java:131)
at android.app.ActivityThread$AndroidOs.access(ActivityThread.java:8068)
at java.io.UnixFileSystem.checkAccess(UnixFileSystem.java:281)
at java.io.File.exists(File.java:813)
My application supports Android 5 through Android 12, and only Android 11 and Android 12 devices are getting these ANRs.
Does anyone have an idea how to solve this? Should I call mkdirs() on a separate IO thread to avoid blocking?
You should definitely perform all file operations off the main thread, regardless of what else is happening. It's definitely not a permissions issue: if it were, you would get a permission denial, not an ANR.
Having said that, you generally wouldn't get an ANR just from checking your app's filesDir, even on the main thread. My guess is that on the devices this ANR came from, either the internal storage is really slow, or your app was moved to external storage, where checking availability takes longer.
Either way, as I said earlier, it doesn't actually matter why it's happening. You should just perform all file IO on a separate thread.
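The same advice applies to native code: a JNI method invoked on the main thread blocks that thread for as long as its filesystem calls take. A minimal C sketch, assuming a hypothetical ensure_dir_async() helper that runs mkdir() and access() on a POSIX worker thread; in an app the path would come from Context.getFilesDir(), and completion would typically be reported back rather than joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request handed to the worker thread. */
typedef struct {
    char path[4096];
    int result; /* 0 on success, otherwise an errno value */
} dir_request;

/* Runs on the worker thread: the blocking filesystem calls happen here.
 * Note that mkdir() creates one level only, unlike File.mkdirs(). */
static void *ensure_dir_worker(void *arg) {
    dir_request *req = arg;
    if (mkdir(req->path, 0700) != 0 && errno != EEXIST) {
        req->result = errno;
    } else if (access(req->path, F_OK) != 0) {
        req->result = errno;
    } else {
        req->result = 0;
    }
    return NULL;
}

/* Starts the work and returns immediately; the caller must keep `req`
 * alive until the thread finishes (join it, or signal completion). */
int ensure_dir_async(dir_request *req, const char *path, pthread_t *thread) {
    assert(req != NULL && path != NULL && thread != NULL);
    snprintf(req->path, sizeof req->path, "%s", path);
    req->result = -1;
    return pthread_create(thread, NULL, ensure_dir_worker, req);
}
```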

Are kernel signals on Android ever benign?

I've installed a native-code crash reporting system in my Android app, based on signal handling. It makes sigaction() calls for all the common crash signals, dumps a crash report, and sends it home to the bug tracker.
I'm receiving fairly consistent reports about some signals that are definitely not in my code. There are multiple instances of SIGSEGV that originate in android.view.GLES20Canvas.nDrawDisplayList. There's another that comes from within android.webkit.JWebCoreJavaBridge.sharedTimerFired.
Question: is there any chance those signals are part of normal program operation and won't cause a crash or termination if I don't catch them? Naturally, I can't reproduce any of them at will; they happen on customer devices.
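Whatever the answer is for those particular signals, a sigaction()-based reporter generally should not swallow them: other code in the process, possibly the Android runtime itself, may have installed a handler for the same signal first. A minimal C sketch with hypothetical names: the reporter saves the previously installed action, logs using async-signal-safe calls only, and then forwards to that action or re-raises with the default disposition.

```c
#define _GNU_SOURCE /* for NSIG on glibc */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Actions that were installed before ours (by the runtime, another
 * library, or the default disposition), indexed by signal number. */
static struct sigaction g_previous[NSIG];

/* Number of signals the reporter has seen (illustration only). */
static volatile sig_atomic_t g_reports = 0;

static void forward_to_previous(int sig, siginfo_t *info, void *ctx) {
    const struct sigaction *prev = &g_previous[sig];
    if (prev->sa_flags & SA_SIGINFO) {
        prev->sa_sigaction(sig, info, ctx);
    } else if (prev->sa_handler == SIG_IGN) {
        return; /* previously ignored: stay ignored */
    } else if (prev->sa_handler == SIG_DFL) {
        sigaction(sig, prev, NULL); /* restore the default action ... */
        raise(sig);                 /* ... which runs once this handler returns */
    } else {
        prev->sa_handler(sig);
    }
}

static void crash_reporter(int sig, siginfo_t *info, void *ctx) {
    static const char msg[] = "crash_reporter: signal caught\n";
    g_reports++;
    /* Only async-signal-safe calls in here: write(), not printf(). */
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    forward_to_previous(sig, info, ctx);
}

/* Installs the reporter for `sig`, saving whatever was there before. */
int install_crash_reporter(int sig) {
    assert(sig > 0 && sig < NSIG);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_reporter;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(sig, &sa, &g_previous[sig]);
}
```

install_crash_reporter() is meant to be called once per signal; calling it twice would save the reporter itself as the previous action and make it forward to itself.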

Android native code fork() has issues with IPC/Binder

I have an Android native server app, compiled as a privileged platform module, that forks itself. This module also uses Android services, like SurfaceFlinger. I need to fork to have one sandboxed process per client.
fork() works fine, and the parent process has no issue at all. But in the child process, when I try to access any Android service or resource, I get:
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr xxxxxxxx ... ...
/system/lib/libbinder.so (android::Parcel::ipcSetDataReference
...
/system/lib/libbinder.so (android::BpBinder::transact
NativeCrashListener( 1203): Couldn't find ProcessRecord for pid XXXX
This happens even when I create a new client, so no previously created reference is being used.
NativeCrashListener doesn't know about my child process, so maybe ActivityManager doesn't either.
I looked at the Zygote code but have not found anything helpful there. I'm probably missing some step, or failing to call some function in the child process. Any ideas? =)
You can't create a new Binder process this way.
The problem is that fork() only clones the current thread, not all threads. In the new process, the Binder IPC code will expect the Binder helper threads to be running, but none of them will be. You need to fork() and then exec().
The zygote process avoids this issue by having only one thread running when fork() is called. It deliberately defers initialization of the Binder code to the child process. (In the current implementation, it actually has a couple of threads running in Dalvik, but the internal fork handling stops and restarts those threads on every fork).
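The fork()-then-exec() approach described above can be sketched in C as follows. The helper program path and arguments are placeholders; the exec'd program starts as a fresh process, so it sets up its own Binder state if it needs one instead of relying on state inherited from the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a standalone program in a child process and waits for it.
 * Returns the child's exit status, or -1 on failure.
 * Between fork() and exec() the child only makes async-signal-safe calls,
 * because only the forking thread exists in the child: no Binder helper
 * threads, and no other threads to release locks they might hold. */
int run_helper(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127); /* only reached if exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a desktop Linux machine, for example, run_helper("/bin/sh", argv) with argv = {"sh", "-c", "exit 3", NULL} returns 3.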
fadden is right: fork() cannot be used to create a new process that uses Android APIs reliably. The best you can do with it is exec() a standalone command-line program; anything else is unlikely to work as you expect.
However, the platform supports sandboxed processes, in the form of isolated service processes. See http://developer.android.com/guide/topics/manifest/service-element.html#isolated for more details. In essence, this runs your service in a special process under a random UID that has no permissions.
For the record, this is what Chrome on Android uses to isolate 'tabs' into sandboxed 'renderer processes'.

Continue after assert with Android NDK and GDB

Is it possible to trigger a GDB break from C++ in an Android NDK program that still allows the program to resume afterwards?
Meaning, I hit the assert causing GDB to halt the program, and I want to be able to press the "Play" button in Eclipse to resume the program, continuing beyond the assert.
Right now I am using:
__asm__ ("bkpt 0");
This halts the program and brings me to the line of code that triggered it, but does not allow me to resume afterwards.
GDB outputs the following at the time the program is halted:
(gdb)
82 info signal SIGBUS
&"info signal SIGBUS\n"
~"Signal Stop\tPrint\tPass to program\tDescription\n"
~"SIGBUS Yes\tYes\tYes\t\tBus error\n"
82^done
(gdb)
If I press "resume" at this point I get the following output in the LogCat:
Fatal signal 11 (SIGSEGV) at 0xfffffffd (code=1)
Perhaps my question is how to trigger a non-fatal break?
The standard Linux way of detecting if your process is being debugged is:
if (ptrace(PTRACE_TRACEME, 0, NULL, 0) == -1) {
    // Yes, we're running under GDB
}
With that in mind, do a conditional hard breakpoint (bkpt 0) that fires only when running under a debugger.
I'm not sure whether Java-only debugging on Android would affect ptrace. Give it a try.
EDIT: call raise(SIGABRT) to break. Then in GDB, type signal 0 to continue. Other signals, like SIGINT and SIGTRAP, might work too.
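One caveat with the ptrace() check: when the process is not already being traced, PTRACE_TRACEME succeeds and makes the parent process its tracer as a side effect. A side-effect-free alternative on Linux and Android is to read the TracerPid field of /proc/self/status. The C sketch below does that and raises SIGTRAP, one of the signals the edit mentions, only when a tracer is attached; the helper names are hypothetical. Whether a plain continue then resumes, or you need signal 0 as the edit describes, depends on how your debugger is configured to handle SIGTRAP.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extracts TracerPid from the text of /proc/<pid>/status.
 * Returns the tracer's pid (0 if not traced), or -1 if the field is missing. */
long parse_tracer_pid(const char *status_text) {
    assert(status_text != NULL);
    const char *field = strstr(status_text, "TracerPid:");
    if (field == NULL) return -1;
    return strtol(field + strlen("TracerPid:"), NULL, 10); /* skips the tab */
}

/* Returns 1 if a debugger (or any ptrace tracer) is attached, else 0. */
int being_debugged(void) {
    char buf[4096];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf) > 0;
}

/* Stops in the debugger if one is attached; otherwise does nothing,
 * so runs without a debugger are not killed by the signal. */
void debug_break(void) {
    if (being_debugged()) {
        raise(SIGTRAP);
    }
}
```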
