I'm catching C++ signals so I print some debugging information. But doing so I am unable to get the crash dump that the NDK prints when you crash.
Can you manually print the crash dump. I see debuggerd.c (http://kobablog.wordpress.com/2011/05/12/debuggerd-of-android/) does the work but not sure how I would use it. Otherwise is there some way to rethrow the signal without my signal handler catching it and get the crash dump still.
Here is what I currently do:
struct sigaction psa, oldPsa;
void CESignalHandler::init() {
CELogI("Crash handler started");
psa.sa_sigaction = handleCrash;
psa.sa_flags = SA_SIGINFO;
//sigaction(SIGBUS, &psa, &oldPsa);
sigaction(SIGSEGV, &psa, &oldPsa);
//sigaction(SIGSYS, &psa, &oldPsa);
//sigaction(SIGFPE, &psa, &oldPsa);
//sigaction(SIGILL, &psa, &oldPsa);
//sigaction(SIGHUP, &psa, &oldPsa);
}
void CESignalHandler::handleCrash(int signalNumber, siginfo_t *sigInfo, void *context) {
static volatile sig_atomic_t fatal_error_in_progress = 0;
if (fatal_error_in_progress) //Stop a signal loop.
_exit(1);
fatal_error_in_progress = 1;
char* j;
asprintf(&j, "Crash Signal: %d, crashed on: %x, UID: %ld\n", signalNumber, (long) sigInfo->si_addr, (long) sigInfo->si_uid); //%x prints out the faulty memory address in hex
CELogE(j);
CESignalHandler::getStackTrace();
sigaction(signalNumber, &oldPsa, NULL);
}
You need to reset the signal handler(s) to the previous function, and then crash again -- ideally at the point where the signal was originally thrown. You can do this by passing a struct sigaction in as the 3rd argument to sigaction(), and using the saved value to restore the original behavior in the signal handler.
This can be a bit tricky because of the way debuggerd works (and because the way it works has changed over time). For a "hard" failure like a segmentation fault, returning from the signal handler just causes the same signal to be re-thrown. The Android crash handler uses this to its advantage by contacting debuggerd, waiting for it to attach with ptrace, and then resuming. debuggerd then gets to watch the process as it crashes (for the second time).
This doesn't work for "soft" failures, e.g. somebody manually sends your process a SIGABRT or gets a SIGPIPE from a write(). If the signal handler contacts debuggerd and resumes, the process just clears the signal and continues on, leaving debuggerd to wait indefinitely for a second crash that never happens. This was partially fixed a couple of releases back; now the debug code re-issues the signal itself (which doesn't actually do anything until the signal handler returns, because the signal is blocked while the handler runs). This usually works, and when it doesn't, debuggerd will time out and drop the connection.
So. If you receive a segmentation fault or bus error, you can just restore the original signal handler and then return from yours, and when the process crashes again the debuggerd handler will deal with it. If somebody sent you a SIGHUP, you should handle it entirely yourself, because debuggerd doesn't care about that signal at all.
Things get weird with SIGFPE. This is a "soft" failure, because most ARM CPUs don't have a hardware integer divide instruction, and the signal is actually being sent explicitly from the libgcc __div0 function. You can restore the signal handler, and then re-send the signal yourself; but depending on what version of Android you're running you might have to send it twice. Ideally you'd like to be doing this from the code that encountered the arithmetic problem, rather than the signal handler, but that's tricky unless you can replace __div0. You would need to send the signal with tgkill(), not kill(), as the latter will result in the signal being sent to the main thread of the process, which would cause debuggerd to dump the stack for the wrong thread.
You might be tempted to copy the handler out of bionic/linker/debugger.cpp, but that's a bad idea -- the protocol used to communicate between bionic and debuggerd has changed in the past and will likely change again.
You need call oldPsa manually like this
oldPsa(signalNumber, sigInfo, context)
Related
I try to access the accelerometer from the NDK. So far it works. But the way events are written to the eventqueue seems a little bit strange.
See the following code:
ASensorManager* AcquireASensorManagerInstance(void) {
typedef ASensorManager *(*PF_GETINSTANCEFORPACKAGE)(const char *name);
void* androidHandle = dlopen("libandroid.so", RTLD_NOW);
PF_GETINSTANCEFORPACKAGE getInstanceForPackageFunc = (PF_GETINSTANCEFORPACKAGE) dlsym(androidHandle, "ASensorManager_getInstanceForPackage");
if (getInstanceForPackageFunc) {
return getInstanceForPackageFunc(kPackageName);
}
typedef ASensorManager *(*PF_GETINSTANCE)();
PF_GETINSTANCE getInstanceFunc = (PF_GETINSTANCE) dlsym(androidHandle, "ASensorManager_getInstance");
return getInstanceFunc();
}
void init() {
sensorManager = AcquireASensorManagerInstance();
accelerometer = ASensorManager_getDefaultSensor(sensorManager, ASENSOR_TYPE_ACCELEROMETER);
looper = ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
accelerometerEventQueue = ASensorManager_createEventQueue(sensorManager, looper, LOOPER_ID_USER, NULL, NULL);
auto status = ASensorEventQueue_enableSensor(accelerometerEventQueue,
accelerometer);
status = ASensorEventQueue_setEventRate(accelerometerEventQueue,
accelerometer,
SENSOR_REFRESH_PERIOD_US);
}
That's how I initialize everything. My SENSOR_REFRESH_PERIOD_US is 100.000 - so 10 refreshs per second. Now I have the following method to receive the events of the event queue.
vector<sensorEvent> update() {
ALooper_pollAll(0, NULL, NULL, NULL);
vector<sensorEvent> listEvents;
ASensorEvent event;
while (ASensorEventQueue_getEvents(accelerometerEventQueue, &event, 1) > 0) {
listEvents.push_back(sensorEvent{event.acceleration.x, event.acceleration.y, event.acceleration.z, (long long) event.timestamp});
}
return listEvents;
}
sensorEvent at this point is a custom struct which I use. This update method gets called via JNI from Android every 10 seconds from an IntentService (to make sure it runs even when the app itself is killed). Now I would expect to receive 100 values (10 per second * 10 seconds). In different tests I received around 130 which is also completly fine for me even it's a bit off. Then I read in the documentation of ASensorEventQueue_setEventRate that it's not forced to follow the given refresh period. So if I would get more than I wanted it would be totally fine.
But now the problem: Sometimes I receive like 13 values in 10 seconds and when I continue to call update 10 secods later I get the 130 values + the missing 117 of the run before. This happens completly random and sometimes it's not the next run but the fourth following or something like that.
I am completly fine with being off from the refresh period by having more values. But can anyone explain why it happens that there are so many values missing and they appear 10 seconds later in the next run? Or is there maybe a way to make sure I receive them in their desired run?
Your code is correct and as i see only one reason can be cause such behaviour. It is android system, for avoid drain battery, decreases frequency of accelerometer stream of events in some time after app go to background or device fall asleep.
You need to revise all axelerometer related logic and optimize according
Doze and App Standby
Also you can try to work with axelerometer in foreground service.
Before you tell me that I should not kill threads and instead send a signal/set a flag for them to react, let me explain the scenario:
I'm developing an audio player in Android NDK with OpenSL API (play a local mp3 file), but the Android implementation has a bug where If I perform repeatedly a seek operation on the file, the thread sadly hangs in a kind of internal deadlock when I try to free resources (SLObjectItf->Destroy).
So I moved the destroy routine to a child thread and wait for a fixed amount of time for it to finish, if it doesn't, I consider the thread as hanged and continue execution leaking some resources, which is preferable than having to go to the system settings and manually kill the app.
I tried to kill the child thread with pthread_kill using the signals SIGTERM and SIGKILL, but it seems both are terminating my whole application and Android restarts it. I cannot use pthread_cancel because the thread is hanged and also that method is not supported on Android NDK.
Is there any way to kill the child thread without killing the entire app?
EDIT: Here is the thread and the code starting it
static void *destroyDecoderInBackground(void *ignoredArgument)
{
if (decoderObject != NULL)
{
__android_log_print(ANDROID_LOG_INFO, "OpenSLES", "Destroying decoder object");
(*decoderObject)->Destroy(decoderObject);
__android_log_print(ANDROID_LOG_INFO, "OpenSLES", "Decoder object destroyed");
decoderObject = NULL;
decoderPlay = NULL;
decoderSeek = NULL;
decoderBufferQueue = NULL;
}
pthread_mutex_lock(&destroyDecoderLock);
pthread_cond_signal(&destroyDecoderCond);
pthread_mutex_unlock(&destroyDecoderLock);
pthread_exit(0);
}
static void destroyDecoder(JNIEnv* env)
{
logJava("Trying to destroy decoder");
struct timespec timeToWait;
struct timeval now;
// get absolute future time to wait
clock_gettime(CLOCK_REALTIME, &timeToWait);
timeToWait.tv_nsec = timeToWait.tv_nsec + (500 * 1000000);
// wait for destroy decoder thread to complete
pthread_mutex_lock(&destroyDecoderLock);
pthread_create(&destroyDecoderThread, NULL, &destroyDecoderInBackground, NULL);
logJava("Starting waiting");
pthread_cond_timedwait(&destroyDecoderCond, &destroyDecoderLock, &timeToWait);
pthread_mutex_unlock(&destroyDecoderLock);
logJava("Finished waiting");
if(decoderObject != NULL)
{
logJava("Destroy decoder hanged, killing thread, resources will leak!!!");
pthread_kill(destroyDecoderThread, SIGTERM);
decoderObject = NULL;
decoderPlay = NULL;
decoderSeek = NULL;
decoderBufferQueue = NULL;
}
}
From the pthread_kill man page:
Signal dispositions are process-wide: if a signal handler is
installed, the handler will be invoked in the thread thread, but if
the disposition of the signal is "stop", "continue", or "terminate",
this action will affect the whole process.
In Dalvik the signals used for special handling (e.g SIGQUIT dumps the stacks, SIGUSR1 causes a GC) are blocked before any threads are created, and then unblocked in the SignalCatcher thread using sigwait(). You can't alter the block status for the threads you don't control, so this won't work for you.
What you can do instead is install a signal handler for an otherwise unused signal (e.g. I don't think SIGUSR2 is used by shipping versions of Dalvik), and have it call pthread_exit(). As noted in the man page for that function:
When a thread terminates, process-shared resources (e.g., mutexes,
condition variables, semaphores, and file descriptors) are not
released, and functions registered using atexit(3) are not called.
This sounds like the "desired" behavior.
Having said all that, please don't give up on the hope of not doing this. :-) It sounds like you recognize part of the problem (resource leaks), but you also have to consider the possibility of inconsistent state, e.g. mutexes that think they're held by the thread that exited. You can end up in a state where other threads will now hang or act strangely.
I am writing android renderscript code which requires back to back kernel calls (sometimes output of one kernel become input of other). I also have some global pointers, binded to memory from Java layer. Each kernel updates those global pointers and outputs something. I hav e to make sure that execute of kernel1 is finished, before kernel2 starts execution.
I looked at android renderscript docs, but couldn't understand syncAll(Usage) and finish() well. Can anyone clarify how to achieve this behaviour?
Thanks
mScript.forEach_kernel1(mColorImageAllocation, tempAlloc);
// make sure kernel1 finishes, from android rs doc, copyTo should block
tempAlloc.copyTo(testOutputBitmap);
for (short i = 0; i < NUM_DIST; i++) {
mScript.set_gCurrentDistanceIndex(i);
mScript.forEach_kernel2(tempAlloc);
mRS.finish(); // wait till kernel2 finishes
}
In the above example, same kernel2 is called with different global parameters on kernel1's output.
For this code you don't need either. RS is a pipeline model so any work which could impact the result of a later command must be finished first by the driver.
syncAll() is used to sync memory spaces not execution. For example to propagate changes from script memory to graphics memory.
I'm trying to write a service that communicates with a USB device using USB Interrupt transfer. Basically I'm blocking on UsbConnection.requestWait() in a thread to wait for interrupts transfers in, then pass those to the activity using an intent.
I seem to be having problems when the USB devices sends me a largish number of interrupt packets in a row (about 50). It sometimes works but usually the app crash with a message of that flavor:
02-23 01:55:53.387: A/libc(8460): ### ABORTING: heap corruption detected by tmalloc_small
02-23 01:55:53.387: A/libc(8460): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 8460 (pf.mustangtamer)
it's not always a malloc call that fails, I have seen several flavors of malloc (dlmalloc, malloc_small) as well as dlfree. In every instance I get a Fatal Signal 11 and a reference to 0xdeadbaad so somehow I am corrupting the heap.
It's not obvious from the heap dump what is causing the corruption.
Here is what I believe is the offending code (the problem only occurs when receiving many packets back to back to back):
private class ReceiverThread extends Thread {
public ReceiverThread(String string) {
super(string);
}
public void run() {
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
buffer.clear();
UsbRequest inRequest = new UsbRequest();
inRequest.initialize(mUsbConnection, mUsbEndpointIn);
while(mUsbDevice != null ) {
if (inRequest.queue(buffer, BUFFER_SIZE) == true) {
// (mUsbConnection.requestWait() is blocking
if (mUsbConnection.requestWait() == inRequest){
buffer.flip();
byte[] bytes = new byte[buffer.remaining()];
buffer.get(bytes);
//TODO: use explicit intent, not broadcast
Intent intent = new Intent(RECEIVED_INTENT);
intent.putExtra(DATA_EXTRA, bytes);
sendBroadcast(intent);
} else{
Log.d(TAG, "mConnection.requestWait() returned for a different request (likely a send operation)");
}
} else {
Log.e(TAG, "failed to queue USB request");
}
buffer.clear();
}
Log.d(TAG, "RX thread terminating.");
}
}
Right now the activity is not consuming the intents, I'm trying to get the USB communication to stop crashing before I implement that side.
I'm not seeing how the code above can corrupt the heap, possibly through some non-thread safe behavior. Only one request is queued at a time so I think "buffer" is safe.
My target is a tablet running JB 4.3.1 if that makes a difference.
I'm not seeing anything wrong with this either. You may want to try removing code from your loop and see if it still corrupts the heap to help you zoom on the offending area.
Remember that heap operations are usually delayed, the garbage collector doesn't run immediately, so you could be corrupting it somewhere else, and it's only showing up in this loop because it is very heap intensive.
try to use a larger heap size by setting android:largeHeap="true" in your application manifest.
I would have asked these questions in a comment, but alas, not enough rep.
I see nothing directly wrong with the code above, but I would check the following:
What is BUFFER_SIZE? crazily, I've had very strange problems with UsbRequest.queue() for sizes greater than 15KB. I'm pretty sure that this wouldn't cause your heap corruption, but it could lead to weirdness later. I had to break my requests into multiple calls to queue() to do large reads.
Are you using a bulk USB endpoint? I don't know what your application is, so I cant say for sure if you should be using a bulk endpoint or not, but its the type of endpoint intended for large transfers.
Lastly, when I encountered this 0xdeadbaad problem (detected by tmalloc_large), it had nothing to do with the code I thought was at fault (the code near the malloc) - it was of course a threading issue in which I had JNI native code reading/writing the same buffers on multiple separate threads! Its only that it gets detected when malloc is called, as user3343927 mentioned.
I am writing network communication program with Android ndk, using epoll.
I found the method ‘epoll_wait’ woken not very accurate
while(1){
struct epoll_event events[3];
log_string("epoll_wait start");//here will print start time
events_len = epoll_wait(_epoll_fd, events, 3, 20 * 1000);// wait 20 second,for test,I use pipe instead of socket,monitor a pipe EPOLLIN event
if (events_len <= 0) {
log_string("epoll_wait end events_len=%d,errno=%d", events_len, errno);//Normally,the events_len always is 0,and errno is 0
}
}
The above code runs on the PC(like Ubuntun PC) is very normal,as expected.
If it runs on Android Phone(use Android Service , separate thread to run) is as expected at first.
After some time,epoll_wait becomes not very accurate,sometimes got -1 and errno=4,sometimes waited very long.
So I only know that phenomenon, but do not know why.
Can you tell why and tell me the best practices for use android epoll?
thx
4 is EINTR, which means your app got a signal. This isn't really an error, just restart epoll.
Regarding "waited very long", does your app hold at least a partial wakelock?