I'm running a custom Android build on an i.MX51 board and observed a strange issue with an application.
I got logcat logs showing a segmentation fault in a native application (written using the NDK):
03-19 15:26:46.763 I/DEBUG ( 2234): pid: 2257, tid: 2257 >>> /usr/bin/powerMgr <<<
03-19 15:26:46.763 I/DEBUG ( 2234): signal 8 (SIGFPE), code 0 (?), fault addr 000008d1
Even after this, the application continued to run with the same PID (2257), which I confirmed from both logcat and the ps command. Is this possible? If so, how?
That's not a segmentation fault (SIGSEGV, signal 11). You got a SIGFPE, signal 8, possibly as the result of an integer divide-by-zero. The signal handling didn't kill the process, so it just continued executing.
Many ARM CPUs lack hardware division instructions, so the SIGFPE is raised explicitly from the software divide function. As a result, you don't get a meaningful value in "fault addr".
The treatment of this has changed over time; newer versions of Android are a bit better about it.
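To make the "continued executing" part concrete, here is a minimal sketch, assuming a POSIX-like libc. It sends SIGFPE with raise(), which is a software-sent signal like the one described above, and shows that a process with a handler installed simply carries on. The names on_sigfpe, survives_raised_sigfpe and checked_div are hypothetical and not from the original application. One detail consistent with a self-sent signal: 0x8d1 is 2257 in decimal, the same value as the pid in the log, which is what you would expect if the field printed as "fault addr" actually held the sender's pid.

```cpp
#include <cassert>
#include <climits>
#include <csignal>

// Written by the handler; sig_atomic_t is the type that is safe to set there.
static volatile std::sig_atomic_t g_sigfpe_seen = 0;

extern "C" void on_sigfpe(int /*signo*/) { g_sigfpe_seen = 1; }

// Installs a handler, sends SIGFPE to the current process with raise(),
// and reports whether the handler ran. Reaching the return statement at all
// means the process was not killed by the signal.
bool survives_raised_sigfpe() {
    g_sigfpe_seen = 0;
    std::signal(SIGFPE, on_sigfpe);
    std::raise(SIGFPE);            // software-sent, so returning from the handler is fine
    std::signal(SIGFPE, SIG_DFL);  // restore the default disposition
    return g_sigfpe_seen == 1;
}

// Hypothetical helper: divides only when the result is well defined,
// so the divide-by-zero path is never reached in the first place.
bool checked_div(int num, int den, int* out) {
    assert(out != nullptr);
    if (den == 0) return false;                      // would raise SIGFPE / be undefined
    if (num == INT_MIN && den == -1) return false;   // quotient does not fit in int
    *out = num / den;
    return true;
}
```

Guarding the divisor as in checked_div is usually a better fix than relying on a handler, because returning from a handler after a hardware-generated SIGFPE is not well defined.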
Related
For more than three weeks I have been stuck with fatal signal errors that started when I began using Visual Studio 2019 or 2022 to compile my Xamarin Android project.
I have used ndk-stack and other tools to try to get a meaningful stack trace, but without success, and our project is blocked because of this.
Please note that the exact same code base is perfectly stable if I produce the APK using Visual Studio 2017.
Visual Studio 2019 uses Xamarin.Android.SDK 12.0.0.3
Visual Studio 2017 uses Xamarin.Android.SDK 9.7.1.0
I have attempted to upgrade to Android 12, but our project is huge and the upgrade requires a significant amount of work. It does not seem to be an option at this point.
The reason I want to use 2019 or 2022 is that the Google Play Store needs an app bundle instead of an APK, and Visual Studio 2017 does not have an app bundle option.
My projects target Android 9, and I am also using the PortSip libraries. I have done extensive googling but have not found any solid evidence of whether the problem is in the Xamarin Android SDK shipped with 2019 or in my Samsung Tab S6 device. The only answer I found here is:
Why application is dying randomly?
If this is the case, why does an APK produced by 2017 not show any segmentation faults or race conditions?
I have handlers in C# to catch any error, but none of them catches these. My app uses Google Maps, and we draw many layers and updates on the map.
My app also receives many SignalR updates and makes calls to WCF services.
Examples of errors:
03-09 19:54:53.811 9984 12254 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x40000000080 in tid 12254 (Thread Pool Wor), pid 9984 (------)
03-09 19:54:53.836 12404 12404 E chromium: [0309/195453.835730:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Operation not permitted (1)
03-11 00:35:05.469 27236 27236 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7c13725238
We had a very similar problem with PortSip (a null reference exception raised on the garbage collection thread in PortSip). We found that the problem was that the constructor of a key PortSip class set up mutual references with other classes. When those classes were disposed, a null reference exception occurred on the garbage collection thread.
You reported that "I have handlers in C# to catch any error but none of them catches these". We found that to be the case as well. It turned out that, since the crash was happening on the GC thread inside the PortSip library, adding error handling in C# had no effect.
If your crashes show the same symptoms, I will dig into the code and document our fix.
I have a large Android application including C/C++ libraries (via NDK/SWIG).
At some point the app crashes and the tombstone file shows:
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'Invalid address 0xbc9b306d passed to free: value not allocated'
Unfortunately it does not tell me which variable is at address 0xbc9b306d. How can I find out?
Restriction: due to the scale of the code base, I cannot set breakpoints in Android Studio.
BR, Rene
I have a group of crashes in native code that are rare but happen consistently, involving SEGV_MAPERR or SEGV_ACCERR. These crashes are almost always reported by Crashlytics with very low free RAM (typically 1-5%). 'Normal' crashes (i.e., ones I have debugged) show no pattern in free RAM.
Is it possible these crashes are caused by a low memory condition? What would be the mechanism for this? Is there any way to tell if these are low memory related crashes or programming errors (using pointers wrongly, etc)? In many cases, the crash is happening in a library which I can't debug and I can't replicate the crashes on my devices.
Here's some of these crashes pulled from the Developer Console because it provides a little more detail than Crashlytics in the trace in these cases:
********** Crash dump: **********
Build fingerprint: 'htc/a32eul_metropcs_us/htc_a32eul:5.1/LMY47O/637541.3:user/release-keys'
pid: 10902, tid: 10989, name: .xxx.xxxx >>> com.xxx.xxxxx <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x97f78000
Stack frame #00 pc 0004cd80 /data/app/xxx.xxx.xxxxx-1/lib/arm/libxxx.so: Routine xxxxxMixerInterleavedFloatOutput at libgcc2.c:?
********** Crash dump: **********
Build fingerprint: 'Xiaomi/land/land:6.0.1/MMB29M/V8.1.1.0.MALMIDI:user/release-keys'
pid: 2661, tid: 2746, name: .xxx.xxxx >>> com.xxx.xxxx <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Stack frame #00 pc 00016954 /system/lib/libc.so (__memcpy_base+36)
Stack frame #01 pc 0000b14c /data/app/com.xxx.xxxx-2/lib/arm/libswresample-2.so: Routine ??
??:0
There are two general possibilities:
1. A low memory condition in and of itself is not going to trigger a segfault in a running application. What can happen is that when the application asks for additional memory, the allocation request fails. This is a well-defined condition: it is documented that the relevant system calls can fail to allocate memory. But what often happens is that applications are not coded properly to check for a failed memory allocation request, and they crash for that reason. In that case, the low memory condition is not responsible for the segfault; it is an application bug.
2. The Linux kernel overcommits the available memory. As a result, it is possible that the kernel will have no option but to select a process to be killed when all available RAM has been exhausted.
However, when the OOM killer kicks in, the chosen victims are terminated with a SIGKILL. A segfault indicates an application bug.
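The first possibility can be sketched as follows. This is a minimal illustration, assuming a C-style allocator that returns a null pointer on failure; copy_samples, AllocFn and always_fail are hypothetical names and are not taken from the crashing libraries. The allocator is passed in as a parameter only so that the failure path can be exercised without actually exhausting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Allocator signature matching malloc, so a failing stand-in can be injected.
using AllocFn = void* (*)(std::size_t);

// Hypothetical helper: duplicates `count` floats from `src`.
// Returns nullptr when the allocation fails (or the byte count would overflow)
// instead of writing through a null pointer, which is what would surface as
// SIGSEGV with SEGV_MAPERR and a fault address near 0.
float* copy_samples(const float* src, std::size_t count, AllocFn alloc = std::malloc) {
    assert(src != nullptr);
    if (count == 0 || count > static_cast<std::size_t>(-1) / sizeof(float)) {
        return nullptr;
    }
    void* mem = alloc(count * sizeof(float));
    if (mem == nullptr) {
        return nullptr;  // the check that buggy code tends to skip
    }
    std::memcpy(mem, src, count * sizeof(float));
    return static_cast<float*>(mem);
}

// Stand-in allocator that simulates an out-of-memory condition.
void* always_fail(std::size_t) { return nullptr; }
```

Calling copy_samples(src, n, always_fail) returns nullptr rather than crashing; the caller then has to handle that result, for example by dropping the audio buffer.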
I am trying to port Android Lollipop to an Arndale board and I am facing the following issue with an ART (Android Runtime) crash.
> I/art ( 2264): RelocateImage: /system/bin/patchoat --input-image-location=/system/framework/boot.art --output-image-file=/data/dalvik-cach6
> F/libc ( 2443): No [stack] line found in "/proc/self/task/2443/maps"!
> F/libc ( 2443): Fatal signal 6 (SIGABRT), code -6 in tid 2443 (patchoat)
> W/art ( 2702): Could not create image space with image file /system/framework/boot.art. Attempting to fall back to imageless running
STEPS FOLLOWED FOR PORTING
1. Download the vexpress Android L 32-bit code from the link below.
http://releases.linaro.org/15.05/android
2. Download the Arndale Android KK 32-bit source with the 3.9 kernel from http://releases.linaro.org/14.08/android/arndale
3. Replace the vexpress kernel source in the code downloaded in step 1 with the Arndale KK 3.9 kernel source downloaded in step 2.
4. Replace the vexpress HAL (device/linaro/vexpress) with the Arndale HAL (device/linaro/arndale).
5. Solve minor compilation issues related to bionic and the build environment.
After flashing the images and powering on the board, I am stuck at the Android logo and the kernel crashes:
> [ 37.790000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> Modules linked in:
> [ 37.790000] CPU: 0 Tainted: G W (3.9.1 #8)
> [ 37.790000] PC is at __copy_to_user_std+0x4c/0x3a8
> [ 37.790000] LR is at 0x6c000000
and logcat gives
> I/art ( 2264): RelocateImage: /system/bin/patchoat --input-image-location=/system/framework/boot.art --output-image-file=/data/dalvik-cach6
> F/libc ( 2443): No [stack] line found in "/proc/self/task/2443/maps"!
> F/libc ( 2443): Fatal signal 6 (SIGABRT), code -6 in tid 2443 (patchoat)
> W/art ( 2702): Could not create image space with image file /system/framework/boot.art. Attempting to fall back to imageless running.
POINT OF EXACT FAILURE
1. ART calls Thread::InitStackHwm from art/runtime/thread.cc.
2. The above call triggers __pthread_attr_getstack_main_thread(stack_base, stack_size) in bionic/libc/bionic/pthread_attr.cpp, which reports No [stack] line found in "/proc/self/task/2443/maps"!, and ART crashes with SIGABRT. It seems as if no stack is being created for thread 2443, but how can I solve this?
It would be great if anyone can help me to solve this issue.
Thanks,
Devarsh
This is a side effect of using the 3.9 kernel with the Linaro vexpress Android platform, which expects a 3.10 kernel (for which Arndale support is not available).
As a workaround, comment out the call to InitStackHwm() in Thread::Init in art/runtime/thread.cc.
I think that if 3.10 kernel support for Arndale were available, this workaround would not be needed and ART would work straight away.
void Thread::Init(ThreadList* thread_list, JavaVMExt* java_vm) {
  // This function does all the initialization that must be run by the native thread it applies to.
  // (When we create a new thread from managed code, we allocate the Thread* in Thread::Create so
  // we can handshake with the corresponding native thread when it's ready.) Check this native
  // thread hasn't been through here already...
  CHECK(Thread::Current() == nullptr);
  SetUpAlternateSignalStack();
  InitCpu();
  InitTlsEntryPoints();
  RemoveSuspendTrigger();
  InitCardTable();
  InitTid();
  // Set pthread_self_ ahead of pthread_setspecific, that makes Thread::Current function, this
  // avoids pthread_self_ ever being invalid when discovered from Thread::Current().
  tlsPtr_.pthread_self = pthread_self();
  CHECK(is_started_);
  CHECK_PTHREAD_CALL(pthread_setspecific, (Thread::pthread_key_self_, this), "attach self");
  DCHECK_EQ(Thread::Current(), this);
  tls32_.thin_lock_thread_id = thread_list->AllocThreadId(this);
  // InitStackHwm();  // This is the workaround
  tlsPtr_.jni_env = new JNIEnvExt(this, java_vm);
  thread_list->Register(this);
}
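To see what the failing check is looking for, the lookup can be pictured as a scan of the maps file for a line tagged [stack]. The sketch below is not bionic's implementation; find_stack_mapping and StackRange are hypothetical names, and it assumes the usual layout of a /proc maps line (an address range in hex, followed by permissions, offset, device, inode and an optional name).

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Address range of the mapping that is tagged as the main stack.
struct StackRange {
    unsigned long long start = 0;
    unsigned long long end = 0;
};

// Scans maps-formatted text for a line ending in "[stack]" and parses its
// "start-end" hex range. Returns false when no such line exists, which is
// the situation the "No [stack] line found" message reports.
bool find_stack_mapping(std::istream& maps, StackRange* out) {
    assert(out != nullptr);
    const std::string tag = "[stack]";
    std::string line;
    while (std::getline(maps, line)) {
        if (line.size() < tag.size() ||
            line.compare(line.size() - tag.size(), tag.size(), tag) != 0) {
            continue;  // not the stack mapping
        }
        unsigned long long lo = 0, hi = 0;
        if (std::sscanf(line.c_str(), "%llx-%llx", &lo, &hi) == 2) {
            out->start = lo;
            out->end = hi;
            return true;
        }
    }
    return false;
}
```

On the board, passing a std::ifstream opened on /proc/self/maps to find_stack_mapping is a quick way to check whether the running kernel labels the stack mapping at all, before deciding to patch ART.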
Comments on a number of StackOverflow questions have pointed out that a fault address of deadd00d indicates a deliberate VM abort.
I DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadd00d
And indeed, when running the logs through ndk-stack, I see that the top of the stack frame decodes to:
Stack frame #00 pc 00050b0e /system/lib/libdvm.so (dvmAbort)
Then the comments say to look earlier in your logs for the problem. What exactly am I looking for -- is there a particular tag or string to search for? (dalvikvm perhaps?) I've scrolled through many pages of logs without finding anything relevant -- is that normal, or should it be immediately before the fault?
The deadd00d most frequently happens inside a particular call to GetObjectClass(). I've tried calling env->ExceptionCheck immediately before that line, but it doesn't report any prior errors.
I've also tried turning on CheckJNI with
adb shell setprop debug.checkjni 1
per the instructions here and here, but when killing and re-launching the app, I don't see the expected message
D Late-enabling CheckJNI
but rather
D AndroidRuntime: CheckJNI is OFF
Using adb shell getprop indicates that the property really is on, so I'm not sure what's going on there.
If it is a native crash, you can search the log for "backtrace".
It will show where your native code crashed; you should then analyze those methods.
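Searching for "backtrace" can also be done programmatically on a saved log. The following is a rough sketch, assuming the crash dump contains a line with "backtrace:" followed by frame lines of the form "#NN pc <address> <library>"; extract_backtrace is a hypothetical helper, and the exact log layout varies between Android versions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the frame lines that directly follow the first line containing
// "backtrace:". Any logcat prefix before the '#' is stripped from each frame.
std::vector<std::string> extract_backtrace(const std::vector<std::string>& log_lines) {
    std::vector<std::string> frames;
    bool in_backtrace = false;
    for (const std::string& line : log_lines) {
        if (!in_backtrace) {
            if (line.find("backtrace:") != std::string::npos) {
                in_backtrace = true;
            }
            continue;
        }
        // A frame looks like "#00 pc 00050b0e ...": '#', two digits, then " pc ".
        const std::size_t hash = line.find('#');
        const bool is_frame =
            hash != std::string::npos && hash + 2 < line.size() &&
            std::isdigit(static_cast<unsigned char>(line[hash + 1])) &&
            std::isdigit(static_cast<unsigned char>(line[hash + 2])) &&
            line.compare(hash + 3, 4, " pc ") == 0;
        if (!is_frame) {
            break;  // the backtrace section has ended
        }
        frames.push_back(line.substr(hash));
    }
    return frames;
}
```

The first returned frame is the innermost one. If it points into a system library, the next frames that fall inside your own .so are usually the ones worth inspecting.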