Can low memory cause seg faults in native code? - android

I have a group of crashes in native code that are rare but happen consistently, involving SEGV_MAPERR or SEGV_ACCERR. These crashes are almost always reported by Crashlytics with very low free RAM (typically 1-5%). 'Normal' crashes (i.e., ones I have debugged) show no pattern in free RAM.
Is it possible these crashes are caused by a low-memory condition? What would be the mechanism for this? Is there any way to tell whether these are low-memory-related crashes or programming errors (using pointers incorrectly, etc.)? In many cases the crash happens in a library that I can't debug, and I can't reproduce the crashes on my devices.
Here's some of these crashes pulled from the Developer Console because it provides a little more detail than Crashlytics in the trace in these cases:
********** Crash dump: **********
Build fingerprint: 'htc/a32eul_metropcs_us/htc_a32eul:5.1/LMY47O/637541.3:user/release-keys'
pid: 10902, tid: 10989, name: .xxx.xxxx >>> com.xxx.xxxxx <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x97f78000
Stack frame #00 pc 0004cd80 /data/app/xxx.xxx.xxxxx-1/lib/arm/libxxx.so: Routine xxxxxMixerInterleavedFloatOutput at libgcc2.c:?
********** Crash dump: **********
Build fingerprint: 'Xiaomi/land/land:6.0.1/MMB29M/V8.1.1.0.MALMIDI:user/release-keys'
pid: 2661, tid: 2746, name: .xxx.xxxx >>> com.xxx.xxxx <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Stack frame #00 pc 00016954 /system/lib/libc.so (__memcpy_base+36)
Stack frame #01 pc 0000b14c /data/app/com.xxx.xxxx-2/lib/arm/libswresample-2.so: Routine ?? at ??:0

There are two general possibilities:
A low memory condition in and of itself is not going to somehow trigger a segfault in a running application. What can happen is that when the application asks for additional memory, the allocation request fails. This is a well-defined condition: the relevant system calls are documented to fail when memory cannot be allocated. But what often happens is that applications are not written to check for a failed allocation request, and they crash for that reason (see the sketch below these two points). In that case it is not the low memory condition that is responsible for the segfault; it is an application bug.
The Linux kernel overcommits the available memory. As a result, when all available RAM has been exhausted, the kernel may have no option but to select a process to be killed.
However, when the OOM killer kicks in, the chosen victims are terminated with SIGKILL. A SIGSEGV indicates an application bug.
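As an illustration of that first possibility, here is a minimal sketch (a hypothetical function, not your code) of an unchecked allocation failing under memory pressure: malloc returns null, and the write through the pointer then faults with SEGV_MAPERR at address 0x0, which is the same signature as the second dump above (__memcpy_base, fault addr 0x0).
#include <cstdlib>
#include <cstring>

// Under memory pressure malloc() can return nullptr; without a check,
// the memcpy() below dereferences it and the process gets SIGSEGV with
// SEGV_MAPERR at fault addr 0x0.
void copy_unchecked(const char* src, size_t len) {
    char* dst = static_cast<char*>(malloc(len));
    // Missing: if (dst == nullptr) { /* handle allocation failure */ }
    memcpy(dst, src, len);  // crashes here when the allocation failed
    free(dst);
}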

Related

How do I get more call stack depth in crash dumps on Android?

I have some code in C++ that uses NDK. When a crash occurs in the C++ code (on device; not through emulator), I get a tombstone (crash dump), that contains a call stack that is always 2 levels deep:
I/DEBUG ( 5089): pid: 5048, tid: 5062 >>> com.example.site <<<
I/DEBUG ( 5089): #00 pc 0059e08c /data/data/com.example.site/lib/libexample.so (_ZNK10MyNamespaceAPI11MyClass12GetDataEv)
I/DEBUG ( 5089): #01 lr 5bc9ef2c /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e764 5bce3070 /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e774 5bce309c /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e784 5bce2af4 /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e788 5c27ea9c /data/data/com.example.site/lib/libexample.so
Is there a way to configure my app or Android to provide more detail and depth in the call stack printed to the crash dump? What actually determines this? I've seen some examples where people get up to 15 levels of call stack depth.
The backtrace mechanism, which has evolved over the past few years, shows as many frames as it is able to find (up to a fixed limit of 32, IIRC). It will stop early if something prevents it from walking any farther up the stack.
The call mechanism on ARM puts the return address in the link register (LR), but the compiler is allowed to spill that onto the stack. For "noreturn" functions it technically doesn't have to set it at all. There are assembler pseudo-ops that add metadata to help unwinders figure out where the return address can be found, and in more recent versions of Android that should all work.
When you get a two-deep stack trace, it means that unwinding has failed on the current method, and it can only show you the value of the program counter (PC) and the value that happens to be in LR.
Make sure you're compiling with -g so debug information is emitted.
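Beyond -g, unwind tables matter for backtrace depth. A sketch of the relevant flags, assuming the ndk-build toolchain (these are standard GCC/Clang options; adjust for your build system):
# Android.mk: keep unwind tables and frame pointers so the unwinder
# can walk past the first frame
LOCAL_CFLAGS += -g -funwind-tables -fno-omit-frame-pointer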
Is the failing function called directly from JNI? In some older versions of Android the trace would stop at the JNI call bridge because of the way the code was structured, though that was fixed in Dalvik back in 2011. Recent devices use ART, though, which I expect has a different way of doing things.
Similar question here.

Google Developer Console Crash Reading for cocos2dx crashes

I have some stack traces in the Crash tab of my Developer Console, and I was wondering how to read and debug them. The code is in cocos2dx, so it's really not human-readable.
I know the technique with ndk-stack, where it tells you the class and line number, etc., but can it be applied in this scenario, since I don't have the obj/local/armeabi folder with me?
adb logcat | ndk-stack -sym ./obj/local/armeabi is the command which I am talking about.
Is there any other way to solve this problem, so that I can see what crashes users are experiencing?
Thank you.
Here is a stack trace from the application. How can we trace this? It's crashing on just one device, a Galaxy Tab 3 10.1, with logs like this:
Build fingerprint: 'samsung/santos103gxx/santos103g:4.2.2/JDQ39/P5200XXUANC1:user/release-keys'
pid: 11321, tid: 11321, name: ldhelloworld >>> com.test.helloworld <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000005
Stack frame #00 pc 0002b098
Stack frame #01 pc 00000020
Crash dump is completed
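If you still have the unstripped .so that was packaged in that APK, you can resolve these addresses offline; the paths and library name below are hypothetical. ndk-stack can read a saved trace from a file with -dump, and addr2line can resolve a single pc value:
ndk-stack -sym /path/to/unstripped/libs -dump console_trace.txt
arm-linux-androideabi-addr2line -C -f -e /path/to/unstripped/libhelloworld.so 0002b098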

Heap Corruption - SEGV_MAPERR in Android Native code

I am trying to create a small library for streaming AES encryption. I started my work based on the Facebook Conceal project (https://github.com/facebook/conceal), just changing some things and improving the wrapper around the native code to support ciphers with padding.
It is working and it can decipher files without problems but I get random Heap Memory Corruptions when I work with large streams, and after a lot of time debugging I have been unable to find the error.
Here is my code:
https://gist.github.com/frisco82/9782725
I have tried to find memory allocation or free problems, but there are almost no malloc or free calls, the JNI calls should be safe, and the same goes for the OpenSSL ones (I have compiled my own, but the Conceal-provided binaries also fail).
CheckJNI does not warn about anything, and while the context handling is a bit unconventional, it doesn't seem broken (indeed, Android's conscrypt seems to use something similar).
Also, if someone can point me to an Android native AES multistep (multiple update calls) library, I will switch to that and forget this.
The error varies from time to time, but it is usually similar to this:
03-26 10:33:02.065: A/dalvikvm(2475): ### ABORTING: DALVIK: HEAP MEMORY CORRUPTION IN mspace_malloc addr=0x0
03-26 10:33:02.065: A/libc(2475): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 2494 (AsyncTask #1)
03-26 10:33:02.205: I/DEBUG(933): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-26 10:33:02.205: I/DEBUG(933): Build fingerprint: 'generic_x86/google_sdk_x86/generic_x86:4.4.2/KK/999428:eng/test-keys'
03-26 10:33:02.205: I/DEBUG(933): Revision: '0'
03-26 10:33:02.205: I/DEBUG(933): pid: 2475, tid: 2494, name: AsyncTask #1 >>> com.proton <<<
03-26 10:33:02.205: I/DEBUG(933): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
Full stack traces:
http://pastebin.com/f6mDuQEj
It is working and it can decipher files without problems but I get random Heap Memory Corruptions when I work with large streams.
From the line above, it looks to me like your program is overwriting memory that was allocated, implicitly or explicitly, by your code. I tried to understand your code, but it was not fully clear to me. Looking at it from a memory corruption angle, though, your program does have malloc/free calls that might lead to a memory overrun.
EVP_CIPHER_CTX *ctx = (EVP_CIPHER_CTX*) malloc(sizeof(EVP_CIPHER_CTX));
EVP_CIPHER_CTX_init(ctx);
I tried to check the layout of the EVP_CIPHER_CTX structure, but it is not in your code. I did see that these pointers are used in various contexts within your program. You should check under which scenarios your buffer can be overwritten: in some places you use a different keyLength, and depending on it your program executes a different function. I think you may want to review that code and see whether an overflow is possible.
Since your application runs on an Android-based system where we cannot easily run a dynamic analysis tool (Valgrind, WinDbg, PageHeap, ...), I suggest reviewing the code and adding logging at the important places to see where you are overwriting memory.
I hope the above information helps you understand your problem.
In the end I was able to work around this problem. EVP_CipherUpdate (or JNI's ReleaseByteArrayElements) sometimes overflows the output buffer, causing the heap corruption. Nothing in my code was wrong, and it was not a problem with the caller either: replacing the EVP_CipherUpdate call with a memcpy using the same parameters worked as expected, with no heap corruption.
So the solution was to add some extra length to the output buffer passed to nativeUpdate, and the error was gone.
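For what it's worth, this headroom matches OpenSSL's documented contract rather than being a pure workaround: EVP_CipherUpdate may write up to inl + cipher_block_size - 1 bytes for block ciphers with padding. A minimal sketch of sizing the buffer that way (function and variable names are illustrative):
#include <vector>
#include <openssl/evp.h>

// Size the output for EVP_CipherUpdate's documented worst case:
// up to inLen + block_size - 1 bytes when padding is enabled.
std::vector<unsigned char> cipherUpdate(EVP_CIPHER_CTX* ctx,
                                        const unsigned char* in, int inLen) {
    int blockSize = EVP_CIPHER_CTX_block_size(ctx);
    std::vector<unsigned char> out(inLen + blockSize - 1);  // the needed headroom
    int outLen = 0;
    if (EVP_CipherUpdate(ctx, out.data(), &outLen, in, inLen) != 1) {
        return {};  // error path: empty result on failure
    }
    out.resize(outLen);
    return out;
}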
I have made the full working version of the library for others to use at:
https://github.com/frisco82/conceal

Android native application running even after segmentation fault with same PID

I'm running a custom Android build on an i.MX51 board and observed a strange issue with an application.
I got logs in logcat of a segmentation fault in an application (native, written using the NDK):
03-19 15:26:46.763 I/DEBUG ( 2234): pid: 2257, tid: 2257 >>> /usr/bin/powerMgr <<<
03-19 15:26:46.763 I/DEBUG ( 2234): signal 8 (SIGFPE), code 0 (?), fault addr 000008d1
Even after this, the application continued to run with the same PID (2257), which I confirmed from both logcat and the ps command. Is this possible? If so, how?
That's not a segmentation fault (SIGSEGV, signal 11). You got a SIGFPE, signal 8, possibly as the result of an integer divide-by-zero. The signal handling didn't kill the process, so it just continued executing.
Many ARM CPUs lack hardware division instructions, so the SIGFPE is raised explicitly from the software divide function. As a result you don't get a meaningful value in "fault addr".
The treatment of this has changed over time; newer versions of Android are a bit better about it.
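A minimal sketch of how a process survives this (illustrative only, not the powerMgr code): ARM's software divide helper raises SIGFPE via raise(), and if the process has a handler that simply returns, execution resumes after the raise() call, so it keeps running under the same PID.
#include <csignal>
#include <cstdio>

// Intentionally empty: returning from the handler lets a software-raised
// SIGFPE pass without terminating the process.
static void on_fpe(int) {}

int main() {
    std::signal(SIGFPE, on_fpe);
    std::raise(SIGFPE);          // stands in for ARM's __aeabi_idiv0 path
    std::puts("still running");  // same process, same PID
    return 0;
}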

Android JNI: root-causing deadd00d (dvmAbort)

Comments on a number of StackOverflow questions have pointed out that a fault address of deadd00d indicates a deliberate VM abort.
I DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadd00d
And indeed, when running the logs through ndk-stack, I see that the top of the stack frame decodes to:
Stack frame #00 pc 00050b0e /system/lib/libdvm.so (dvmAbort)
Then the comments say to look earlier in your logs for the problem. What exactly am I looking for -- is there a particular tag or string to search for? (dalvikvm perhaps?) I've scrolled through many pages of logs without finding anything relevant -- is that normal, or should it be immediately before the fault?
The deadd00d most frequently happens inside a particular call to GetObjectClass(). I've tried calling env->ExceptionCheck immediately before that line, but it doesn't report any prior errors.
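One thing worth guarding against: an abort inside GetObjectClass() is often triggered by a null or stale jobject. A defensive sketch (the helper name is made up):
#include <jni.h>

// Guard GetObjectClass against common abort triggers: a null jobject and
// a pending exception left over from an earlier JNI call.
static jclass GetClassSafely(JNIEnv* env, jobject obj) {
    if (obj == nullptr) {
        return nullptr;  // GetObjectClass(null) is JNI misuse; CheckJNI may abort
    }
    if (env->ExceptionCheck()) {
        env->ExceptionDescribe();  // log the pending exception before more JNI calls
        env->ExceptionClear();
    }
    return env->GetObjectClass(obj);
}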
I've also tried turning on CheckJNI with
adb shell setprop debug.checkjni 1
per the instructions here and here, but when killing and re-launching the app, I don't see the expected message
D Late-enabling CheckJNI
but rather
D AndroidRuntime: CheckJNI is OFF
Using adb shell getprop indicates that the property really is on, so I'm not sure what's going on there.
If it is a native crash, you can search the log for "backtrace".
It will point to where your native code crashed; then you should analyze those methods.
