I have a bunch of various Android phones in front of me all running 4.3/4.4 and they all seem to be suffering from some bug in Bluetooth. The app I am running is simply scanning for other bluetooth devices around it using this callback: http://developer.android.com/reference/android/bluetooth/BluetoothAdapter.LeScanCallback.html
Just LogCatting the data and still having problems...
Does anyone know about this bug and have a fix for it? I really need to get bluetooth scanning stable for a deadline I have tomorrow for a demo of my application...
Thanks.
EDIT: Supposedly in 4.4.3 (or 4.4.4) this was resolved. (Of course the day of our presentation for the project...did us no good). The main issue was the XML file keeping track of mac addresses growing over the size of 2000 and then crashing...a system reset would clear the xml file, thus solving the problem temporarily.
This is a bug in the Android bluetooth code which does not appear to have a resolution at present. Since other people keep finding this as well, I'm going to post what I found when tracing the problem through the bluetooth stack, even though it cannot really be applied as a resolution unless one is prepared to make major changes to an AOSP-based install.
Fundamentally, the problem is a SIGSEGV in btif_config.c at find_add_node() when alloc_node() fails after hearing too many unique BTLE hardware addresses.
Informative part of stack trace
D/BtGatt.btif(22509): btif_gattc_upstreams_evt: Event 4096
D/BtGatt.btif(22509): btif_gattc_add_remote_bdaddr device added idx=1
D/BtGatt.btif(22509): btif_gattc_update_properties BLE device name=beacon len=6 dev_type=2
F/libc (22509): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 22530 (BTIF)
I/DEBUG ( 171): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG ( 171): Build fingerprint: 'google/occam/mako:4.4.2/KOT49H/937116:user/release-keys'
I/DEBUG ( 171): Revision: '11'
I/DEBUG ( 171): pid: 22509, tid: 22530, name: BTIF >>> com.android.bluetooth <<<
I/DEBUG ( 171): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
I/DEBUG ( 171): r0 ffffffff r1 00007d00 r2 00007c60 r3 74c7cf00
I/DEBUG ( 171): r4 74c7cf10 r5 00000000 r6 756f95a8 r7 7503c671
I/DEBUG ( 171): backtrace:
I/DEBUG ( 171): #00 pc 0004e68c /system/lib/hw/bluetooth.default.so
I/DEBUG ( 171): #01 pc 0004ea65 /system/lib/hw/bluetooth.default.so (btif_config_set+156)
Disassembling, the code in question is this rather obviously problematic series of clearing r5 and then attempting to de-reference it as a base pointer:
4e68a: 2500 movs r5, #0
4e68c: 6829 ldr r1, [r5, #0]
4e68e: b919 cbnz r1, 4e698 <btif_gattc_test_command_impl+0x74c>
4e690: 4630 mov r0, r6
4e692: f7dd ef78 blx 2c584 <strdup#plt>
This corresponds to the “if(!node->name)” check at the end of find_add_node()
static cfg_node* find_add_node(cfg_node* p, const char* name)
{
int i = -1;
cfg_node* node = NULL;
if((i = find_inode(p, name)) < 0)
{
if(!(node = find_free_node(p)))
{
int old_size = alloc_node(p, CFG_GROW_SIZE);
if(old_size >= 0)
{
i = GET_NODE_COUNT(old_size);
node = &p->child[i];
ADD_CHILD_COUNT(p, 1);
} /* else clause to handle failure of alloc_node() is missing here */
} else ADD_CHILD_COUNT(p, 1);
}
else node = &p->child[i];
if(!node->name) /* this will SIGSEGV if node is still NULL */
node->name = strdup(name);
return node;
}
Specifically, there is no else clause to handle the failure of alloc_node(), so when that happens (presumably due to running out of storage after hearing too many device addresses) the code falls through and attempts to dereference the name member of the node pointer without ever having set it to a non-null address.
A fix would presumably need to involve:
non-crash handling of this error case when a new record cannot be allocated
more aggressive discarding of past-heard addresses when new ones keep being heard and the number of records being stored becomes unreasonable
Someone just opened an issue: https://code.google.com/p/android/issues/detail?id=67272 . Any supporting evidence should go there and hopefully Google fixes this in next release.
This is probably a rare cause, but I had this issue when trying to use an L Tone with my Galaxy Nexus. I tried various solutions, but nothing worked, then I remembered that I installed an app (in my rooted phone) that enabled Bluetooth LE (Normally 4.3 on Galaxy Nexus doesn't have access). Once I uninstalled the files the app installed, it seems to be working fine.
So, if thing aren't working still.. ask yourself if you've done anything custom on the phone that might be conflicting.
Fix that worked for my S4 (i9500) :
After installing a custom ROM for Android 5.1.1, I started facing this 'Bluetooth stopped' issue. Since my phone was rooted, I installed 'Root Uninstaller' app and FROZE bluetooth services. I haven't had any issues after that. But again I am not sure if my bluetooth is working properly. It's just that I dont get those annoying pop-ups anymore.
Hope this helps someone !
Related
I have a group of crashes in native code that are rare but happen consistently inolving SEGV_MAPERR or SEGV_ACCERR. These crashes are almost always reported by Crashlytics with very low RAM free (1-5% typically). 'Normal' crashes (ie, ones I have debugged) have no pattern in RAM free.
Is it possible these crashes are caused by a low memory condition? What would be the mechanism for this? Is there any way to tell if these are low memory related crashes or programming errors (using pointers wrongly, etc)? In many cases, the crash is happening in a library which I can't debug and I can't replicate the crashes on my devices.
Here's some of these crashes pulled from the Developer Console because it provides a little more detail than Crashlytics in the trace in these cases:
********** Crash dump: **********
Build fingerprint: 'htc/a32eul_metropcs_us/htc_a32eul:5.1/LMY47O/637541.3:user/release-keys'
pid: 10902, tid: 10989, name: .xxx.xxxx >>> com.xxx.xxxxx <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x97f78000
Stack frame #00 pc 0004cd80 /data/app/xxx.xxx.xxxxx-1/lib/arm/libxxx.so: Routine xxxxxMixerInterleavedFloatOutput at libgcc2.c:?
********** Crash dump: **********
Build fingerprint: 'Xiaomi/land/land:6.0.1/MMB29M/V8.1.1.0.MALMIDI:user/release-keys'
pid: 2661, tid: 2746, name: .xxx.xxxx >>> com.xxx.xxxx <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Stack frame #00 pc 00016954 /system/lib/libc.so (__memcpy_base+36)
Stack frame #01 pc 0000b14c /data/app/com.xxx.xxxx-2/lib/arm/libswresample-2.so: Routine ??
??:0
There are two general possibilities:
A low memory condition in of itself is not going to somehow trigger a segfault in a running application. What can happen is that when the application asks for additional memory to be allocated to it, the memory allocation request fails. This is a well defined memory condition. It is documented that the relevant system calls can fail in allocating memory. But what often happens is that the application are not coded properly to check for a failed memory allocation request, and they crash for that reason. In that case, it is not true that a low memory condition is responsible for an application segfault, it is an application bug.
The Linux kernel overcommits the available memory. As a result of that it is possible that the kernel will have no option but to select a process to be killed, when all available RAM has been exhausted.
However, in the case of the OOM killer kicking in, the chosen victims are terminated with a SIGKILL. A SEGFAULT indicates an application bug.
This code fragment is extracted from an Android crash report on a Samsung Tab S:
Build fingerprint: 'samsung/chagallwifixx/chagallwifi:5.0.2/LRX22G/T800XXU1BOCC:user/release-keys'
Revision: '7'
ABI: 'arm'
r0 a0d840bc r1 a0dcb880 r2 00000001 r3 a0d840bc
r4 a0dc3c4c r5 00000000 r6 a066d200 r7 00000000
r8 32d68f40 r9 a0c359a8 sl 00000014 fp bef3ba84
ip a0dc3fb8 sp bef3ba10 lr a0c35a0c pc a0c34bc8 cpsr 400d0010
r0 through r9 are pretty clearly general purpose registers, sp (r13) is the stack pointer, and pc (r15) is the program counter (instruction pointer). Referring to the Wikipedia's ARM Architecture page Registers section (one of many pages I looked through), I find that lr (r14) is the link register, and cpsr is the "Current Program Status Register."
I would like to know what sl (r10), fp (r11) and ip (r12) are. I expect ip is not the "instruction pointer" because that function is done by pc (r15).
Is there a reference document I haven't found that illustrates these names?
The current ARM EABI procedure call standard outlines the standard 'special' names for r12-r15:
PC (r15): Program counter
LR (r14): Link register
SP (r13): Stack pointer
IP (r12): Intra-procedure scratch register*
The GNU tools also still support names from the deprecated legacy APCS as identifiers for the given register numbers, even though they no longer necessarily have any meaning:
FP (r11): Frame pointer - may still be true for ARM code; Thumb code tends to keep actual frame pointers in r7, and of course the code may be compiled without frame pointers at all, in which cases "fp" is just another callee-saved general register.
SL (r10): Stack limit - I don't actually know the history of that one, but in most modern code r10 is no more special than r4-r8.
Note that r9 is not necessarily a general-purpose register - the EABI reserves it for platform-specific purposes. Under linux-gnueabi it's nothing special, but other platforms may use it for special purposes like a TLS or global object table pointer, so it may also go by SB (static base) or TR (thread register).
* The story behind that the limited range of the PC-relative branch instructions - if the linker finds the target of a call ends up more than 32MB away, it may generate a veneer (some extra instructions within range of the call site) as the branch target, that computes the real address and performs an absolute branch, for which it may need a scratch register.
I have some code in C++ that uses NDK. When a crash occurs in the C++ code (on device; not through emulator), I get a tombstone (crash dump), that contains a call stack that is always 2 levels deep:
I/DEBUG ( 5089): pid: 5048, tid: 5062 >>> com.example.site <<<
I/DEBUG ( 5089): #00 pc 0059e08c /data/data/com.example.site/lib/libexample.so (_ZNK10MyNamespaceAPI11MyClass12GetDataEv)
I/DEBUG ( 5089): #01 lr 5bc9ef2c /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e764 5bce3070 /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e774 5bce309c /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e784 5bce2af4 /data/data/com.example.site/lib/libexample.so
I/DEBUG ( 5089): 5cc6e788 5c27ea9c /data/data/com.example.site/lib/libexample.so
Is there a way to configure my app or Android to provide more detail and depth in the call stack printed to the crash dump? What actually determines this? I've seen some examples where people get up to 15 levels of call stack depth.
The backtrace mechanism, which has evolved over the past few years, shows as many frames as it is able to find (up to a fixed limit of 32, IIRC). It will stop early if something prevents it from walking any farther up the stack.
The call mechanism on ARM puts the return address in the link register (LR), but the compiler is allowed to spill that onto the stack. For "noreturn" functions it technically doesn't have to set it at all. There are assembler pseudo-ops that add meta-data that help unwinders figure out where the return address can be found, and in the more recent versions of Android that should all work.
When you get a two-deep stack trace, it means that unwinding has failed on the current method, and it can only show you the value of the program counter (PC) and the value that happens to be in LR.
Make sure you're compiling with -g to enable debugging.
Is the failing function called directly from JNI? In some older versions of Android the trace would stop at the JNI call bridge because of the way the code was structured, though that was fixed in Dalvik back in 2011. Recent devices use Art, though, which I expect has a different way of doing things.
Similar question here.
I have some stack traces in my Crash Tab in Developer Console, I was wondering how to read and debug them. The code is in cocos2dx so its really not human readable.
I know the technique with the ndk-stack where it tell you the class and line number etc. but can it be applicable in this scenario since I don't have the obj/local/armeabi folder with me.
adb logcat | ndk-stack -sym ./obj/local/armeabi is the command which I am talking about.
Is there any other solution to solve this problem, so that I can see what crashes users are experiencing.
Thank you.
A stack trace from the application. How can we trace this? And its crashing on just one device. Galaxy Tab 3 10.1 with such logs.
Build fingerprint:
'samsung/santos103gxx/santos103g:4.2.2/JDQ39/P5200XXUANC1:user/release-keys'
pid: 11321, tid: 11321, name: ldhelloworld >>> com.test.helloworld
<<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000005
Stack frame #00 pc 0002b098 Stack frame #01 pc 00000020
Crash dump is completed
I'm running custom android on i.MX51 board and observed a strange issue with an application.
I got logs in n logcat of segmentation fault of an application (native, written using NDK) :
03-19 15:26:46.763 I/DEBUG ( 2234): pid: 2257, tid: 2257 >>> /usr/bin/powerMgr <<<
03-19 15:26:46.763 I/DEBUG ( 2234): signal 8 (SIGFPE), code 0 (?), fault addr 000008d1
Even after this the application continued to run with same PID (2257) which I confirmed from both logcat and ps command. Is this possible ? If yes, how ??
That's not a segmentation fault (SIGSEGV, signal 11). You got a SIGFPE, signal 8, possibly as the result of an integer divide-by-zero. The signal handling didn't kill the process, so it just continued executing.
Many ARM CPUs lack hardware division instructions, so the SIGFPEis thrown explicitly from the software divide function. As a result you don't get a meaningful value in "fault addr".
The treatment of this has changed over time; newer versions of Android are a bit better about it.