I'm trying to write a service that communicates with a USB device using USB interrupt transfers. Basically, I'm blocking on UsbDeviceConnection.requestWait() in a thread to wait for incoming interrupt transfers, then passing the data to the activity using an intent.
I seem to be having problems when the USB device sends me a largish number of interrupt packets in a row (about 50). It sometimes works, but usually the app crashes with a message like this:
02-23 01:55:53.387: A/libc(8460): ### ABORTING: heap corruption detected by tmalloc_small
02-23 01:55:53.387: A/libc(8460): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 8460 (pf.mustangtamer)
It's not always the same malloc call that fails: I have seen several flavors of malloc (dlmalloc, malloc_small) as well as dlfree. In every instance I get a Fatal signal 11 and a reference to 0xdeadbaad, so somehow I am corrupting the heap.
It's not obvious from the heap dump what is causing the corruption.
Here is what I believe is the offending code (the problem only occurs when receiving many packets back to back to back):
private class ReceiverThread extends Thread {
    public ReceiverThread(String name) {
        super(name);
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
        buffer.clear();
        UsbRequest inRequest = new UsbRequest();
        inRequest.initialize(mUsbConnection, mUsbEndpointIn);
        while (mUsbDevice != null) {
            if (inRequest.queue(buffer, BUFFER_SIZE)) {
                // mUsbConnection.requestWait() blocks until a queued request completes
                if (mUsbConnection.requestWait() == inRequest) {
                    buffer.flip();
                    byte[] bytes = new byte[buffer.remaining()];
                    buffer.get(bytes);
                    // TODO: use an explicit intent, not a broadcast
                    Intent intent = new Intent(RECEIVED_INTENT);
                    intent.putExtra(DATA_EXTRA, bytes);
                    sendBroadcast(intent);
                } else {
                    Log.d(TAG, "mUsbConnection.requestWait() returned for a different request (likely a send operation)");
                }
            } else {
                Log.e(TAG, "failed to queue USB request");
            }
            buffer.clear();
        }
        inRequest.close(); // release the native request once the loop ends
        Log.d(TAG, "RX thread terminating.");
    }
}
Right now the activity is not consuming the intents; I'm trying to get the USB communication to stop crashing before I implement that side.
I don't see how the code above could corrupt the heap, except possibly through some non-thread-safe behavior. Only one request is queued at a time, so I think "buffer" is safe.
My target is a tablet running JB 4.3.1 if that makes a difference.
I'm not seeing anything wrong with this either. You may want to try removing code from your loop to see whether it still corrupts the heap; that can help you zoom in on the offending area.
Remember that heap operations are often deferred (the garbage collector doesn't run immediately), so you could be corrupting the heap somewhere else, and it only shows up in this loop because the loop is very heap-intensive.
Try using a larger heap by setting android:largeHeap="true" on the <application> element of your manifest.
I would have asked these questions in a comment, but alas, not enough rep.
I see nothing directly wrong with the code above, but I would check the following:
What is BUFFER_SIZE? Crazily, I've had very strange problems with UsbRequest.queue() for sizes greater than 15 KB. I'm pretty sure this wouldn't cause your heap corruption, but it could lead to weirdness later. I had to break my requests into multiple calls to queue() to do large reads.
Are you using a bulk USB endpoint? I don't know what your application is, so I can't say for sure whether you should be using a bulk endpoint, but it's the endpoint type intended for large transfers.
Lastly, when I encountered this 0xdeadbaad problem (detected by tmalloc_large), it had nothing to do with the code I thought was at fault (the code near the malloc). It was, of course, a threading issue: I had JNI native code reading and writing the same buffers on multiple separate threads! The corruption only gets detected when malloc is called, as user3343927 mentioned.
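For the chunking workaround in the first point (several queue() calls instead of one large read), the control flow is roughly the loop below. This is a plain C sketch, not Android code: read_chunk_fn is a hypothetical stand-in for one queue()/requestWait() round trip, fake_device_read exists only so the loop can be tried without hardware, and the 15 KB figure is the answerer's empirical observation rather than a documented limit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one queue()/requestWait() round trip: fills up to
 * `len` bytes at `dst` and returns the byte count, or a negative value on error. */
typedef long (*read_chunk_fn)(uint8_t *dst, size_t len, void *ctx);

/* Reads up to `total` bytes into `dst`, never asking for more than `max_chunk`
 * bytes per request. Returns the number of bytes read, or -1 on error. */
long read_in_chunks(uint8_t *dst, size_t total, size_t max_chunk,
                    read_chunk_fn read_chunk, void *ctx)
{
    size_t done = 0;
    while (done < total) {
        size_t want = total - done;
        if (want > max_chunk)
            want = max_chunk;
        long got = read_chunk(dst + done, want, ctx);
        if (got < 0)
            return -1;
        done += (size_t)got;
        if ((size_t)got < want)  /* short read: treat it as the end of the transfer */
            break;
    }
    return (long)done;
}

/* Fake device so the loop can be exercised without hardware: serves bytes
 * from memory and records the largest single request it received. */
struct fake_device {
    const uint8_t *data;
    size_t size;
    size_t pos;
    size_t largest_request;
};

long fake_device_read(uint8_t *dst, size_t len, void *ctx)
{
    struct fake_device *dev = ctx;
    size_t n = dev->size - dev->pos;
    if (len > dev->largest_request)
        dev->largest_request = len;
    if (n > len)
        n = len;
    memcpy(dst, dev->data + dev->pos, n);
    dev->pos += n;
    return (long)n;
}
```

Stopping on a short read mirrors how a USB transfer ends with a short packet, so the caller gets back however much the device actually sent.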
I'm receiving a large batch of data from a BLE server via NOTIFY. On the iOS counterpart of the app, this transfer is pretty straightforward and has no performance drop; however, on Android I can't seem to find an implementation that doesn't have terrible performance.
The initial implementation simply appended each notification to a ByteArray:
// Warning! Semi-pseudocode
var rawData = ByteArray(0)
characteristic.onReceive { data ->
    rawData += data
}
This works reasonably well up to around 3 MB transferred. After that, the appending slows down dramatically: for example, the first 10% (~1 MB) transfers in under a minute, the second 10% in around a minute, the third 10% in almost 2 minutes, and each following batch takes longer by roughly the time the previous batch took (think Fibonacci sequence). I understand this is because every onReceive call creates a new ByteArray, copies everything received so far into it, and throws away the previous object, so by the end the GC is collecting 2-3 MB objects over and over.
This is obviously not optimal, especially when the full transfer is ~16 MB.
I've tried a preallocated ByteBuffer (both native and Java-based allocation), a LinkedList and an Array of ByteArrays, and so on.
The onReceive part is solved with Kotlin coroutines (ReceiveChannel<ByteArray> and Flow<ByteArray>), and if I'm not saving the data, it is even faster than on iOS (mainly due to the MTU difference: 162 vs 250 bytes received).
What would be the optimal way of collating these received batches into a single ByteArray?
I wouldn't be confident your issue is related to buffer allocation, but we use a ByteArrayOutputStream to aggregate MTU-sized segments coming from characteristic notifications.
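ByteArrayOutputStream inputBuffer = new ByteArrayOutputStream();

@Override
public void onCharacteristicChanged(BluetoothGatt gatt, BluetoothGattCharacteristic chr) {
    // write(byte[], int, int) does not throw IOException, unlike the inherited write(byte[])
    byte[] value = chr.getValue();
    inputBuffer.write(value, 0, value.length);
}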
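The reason ByteArrayOutputStream copes where rawData += data did not: += allocates a new array and copies everything received so far on every notification, which is quadratic in the total size, while ByteArrayOutputStream keeps spare capacity and grows its internal array geometrically, so the total copying stays proportional to the data received. Below is a minimal C sketch of that growth strategy (not the JDK's code; the starting capacity of 256 bytes and the doubling factor are arbitrary choices for illustration, and overflow checks are omitted).

```c
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer: capacity doubles when full, so appending n bytes in
 * total costs O(n) copying overall instead of O(n^2) for copy-on-every-append. */
struct byte_buf {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* Appends `n` bytes from `src`; returns 0 on success, -1 if allocation fails. */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 256;
        while (new_cap - b->len < n)
            new_cap *= 2;
        unsigned char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Releases the storage and resets the buffer to empty. */
void byte_buf_free(struct byte_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

If the expected size is known up front (about 16 MB here), new ByteArrayOutputStream(expectedSize) reserves it in one allocation, and toByteArray() then makes a single final copy.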
Before you tell me that I should not kill threads and instead send a signal/set a flag for them to react, let me explain the scenario:
I'm developing an audio player in the Android NDK with the OpenSL ES API (playing a local MP3 file), but the Android implementation has a bug: if I repeatedly seek within the file, the thread hangs in what looks like an internal deadlock when I try to free resources (SLObjectItf->Destroy).
So I moved the destroy routine to a child thread and wait a fixed amount of time for it to finish. If it doesn't finish, I consider the thread hung and continue execution, leaking some resources, which is preferable to having to go to the system settings and kill the app manually.
I tried to kill the child thread with pthread_kill using SIGTERM and SIGKILL, but both seem to terminate my whole application, and Android restarts it. I cannot use pthread_cancel because the thread is hung, and that function is not supported in the Android NDK anyway.
Is there any way to kill the child thread without killing the entire app?
EDIT: Here is the thread function and the code that starts it:
static void *destroyDecoderInBackground(void *ignoredArgument)
{
    if (decoderObject != NULL)
    {
        __android_log_print(ANDROID_LOG_INFO, "OpenSLES", "Destroying decoder object");
        (*decoderObject)->Destroy(decoderObject);
        __android_log_print(ANDROID_LOG_INFO, "OpenSLES", "Decoder object destroyed");
        decoderObject = NULL;
        decoderPlay = NULL;
        decoderSeek = NULL;
        decoderBufferQueue = NULL;
    }
    pthread_mutex_lock(&destroyDecoderLock);
    pthread_cond_signal(&destroyDecoderCond);
    pthread_mutex_unlock(&destroyDecoderLock);
    pthread_exit(0);
}
static void destroyDecoder(JNIEnv* env)
{
    logJava("Trying to destroy decoder");
    struct timespec timeToWait;

    // get absolute future time to wait (now + 500 ms)
    clock_gettime(CLOCK_REALTIME, &timeToWait);
    timeToWait.tv_nsec += 500 * 1000000L;
    // tv_nsec must stay below one second, otherwise pthread_cond_timedwait() may fail with EINVAL
    if (timeToWait.tv_nsec >= 1000000000L)
    {
        timeToWait.tv_sec += 1;
        timeToWait.tv_nsec -= 1000000000L;
    }

    // wait for destroy decoder thread to complete
    pthread_mutex_lock(&destroyDecoderLock);
    pthread_create(&destroyDecoderThread, NULL, &destroyDecoderInBackground, NULL);
    logJava("Starting waiting");
    pthread_cond_timedwait(&destroyDecoderCond, &destroyDecoderLock, &timeToWait);
    pthread_mutex_unlock(&destroyDecoderLock);
    logJava("Finished waiting");

    if (decoderObject != NULL)
    {
        logJava("Destroy decoder hanged, killing thread, resources will leak!!!");
        pthread_kill(destroyDecoderThread, SIGTERM);
        decoderObject = NULL;
        decoderPlay = NULL;
        decoderSeek = NULL;
        decoderBufferQueue = NULL;
    }
}
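Independent of the thread-killing question, two details make this kind of timed wait more robust: the absolute deadline's tv_nsec has to stay below 1,000,000,000 (an out-of-range value is invalid, and pthread_cond_timedwait() may fail immediately with EINVAL), and condition variables can wake spuriously, so the waiter should loop on a flag set under the same mutex rather than infer the outcome from decoderObject afterwards. A sketch with hypothetical names (done, signal_done(), wait_done()); the background thread would call signal_done() after Destroy() returns.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state: set by the background thread once Destroy() returns. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done = false;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now, with any
 * overflow of tv_nsec carried into tv_sec so tv_nsec stays in [0, 1e9). */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Called by the background thread when it has finished. */
void signal_done(void)
{
    pthread_mutex_lock(&done_lock);
    done = true;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
}

/* Waits up to `ms` milliseconds for signal_done(); returns true if it was
 * called in time. Re-checking `done` in a loop makes spurious wakeups harmless,
 * and any non-zero return (ETIMEDOUT or an error) ends the wait. */
bool wait_done(long ms)
{
    struct timespec deadline = deadline_after_ms(ms);
    int rc = 0;
    pthread_mutex_lock(&done_lock);
    while (!done && rc == 0)
        rc = pthread_cond_timedwait(&done_cond, &done_lock, &deadline);
    bool finished = done;
    pthread_mutex_unlock(&done_lock);
    return finished;
}
```

Because the flag is read and written under done_lock, the worker can be started before or after wait_done() is entered without losing the wakeup; done would need to be reset (under the same lock) before each new attempt.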
From the pthread_kill man page:
Signal dispositions are process-wide: if a signal handler is
installed, the handler will be invoked in the thread thread, but if
the disposition of the signal is "stop", "continue", or "terminate",
this action will affect the whole process.
In Dalvik, the signals used for special handling (e.g. SIGQUIT dumps the stacks, SIGUSR1 causes a GC) are blocked before any threads are created, and then unblocked in the SignalCatcher thread using sigwait(). You can't alter the block status for threads you don't control, so this won't work for you.
What you can do instead is install a signal handler for an otherwise unused signal (e.g. I don't think SIGUSR2 is used by shipping versions of Dalvik), and have it call pthread_exit(). As noted in the man page for that function:
When a thread terminates, process-shared resources (e.g., mutexes,
condition variables, semaphores, and file descriptors) are not
released, and functions registered using atexit(3) are not called.
This sounds like the "desired" behavior.
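A minimal sketch of that approach, assuming SIGUSR2 really is unused in your process (as the answer says, that is an assumption about the runtime, not a guarantee). The handler is installed once, before the thread that might hang is started; stuck_worker and demo_kill_stuck_thread are hypothetical stand-ins for the thread blocked in Destroy() and for destroyDecoder(). pthread_exit() is not on POSIX's list of async-signal-safe functions, so calling it from a handler relies on the C library's behavior, and the caveats about leaked resources and inconsistent state still apply.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Runs in whichever thread receives SIGUSR2 and ends only that thread.
 * Nothing is cleaned up: locks it holds stay locked, memory stays leaked. */
static void exit_thread_handler(int sig)
{
    (void)sig;
    pthread_exit(NULL);
}

/* Install once, before starting the thread that might hang. */
int install_thread_exit_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = exit_thread_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Hypothetical stand-in for a thread stuck inside Destroy(): blocks forever. */
static void *stuck_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
}

/* Starts a stuck worker, ends just that thread with SIGUSR2, and joins it.
 * Returns 0 if the thread was reaped and the process is still running. */
int demo_kill_stuck_thread(void)
{
    pthread_t worker;
    struct timespec settle = { 0, 100 * 1000000L };  /* 100 ms */

    if (install_thread_exit_handler() != 0)
        return -1;
    if (pthread_create(&worker, NULL, stuck_worker, NULL) != 0)
        return -1;
    nanosleep(&settle, NULL);  /* give the worker time to block */
    if (pthread_kill(worker, SIGUSR2) != 0)
        return -1;
    return pthread_join(worker, NULL);
}
```

In destroyDecoder(), this would mean replacing pthread_kill(destroyDecoderThread, SIGTERM) with pthread_kill(destroyDecoderThread, SIGUSR2) once the handler is installed.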
Having said all that, please don't give up on the hope of not doing this. :-) It sounds like you recognize part of the problem (resource leaks), but you also have to consider the possibility of inconsistent state, e.g. mutexes that think they're held by the thread that exited. You can end up in a state where other threads will now hang or act strangely.
I'm catching signals in my C++ code so that I can print some debugging information, but doing so prevents me from getting the crash dump that the NDK prints when the app crashes.
Can I print the crash dump manually? I see that debuggerd.c (http://kobablog.wordpress.com/2011/05/12/debuggerd-of-android/) does the work, but I'm not sure how I would use it. Otherwise, is there some way to re-raise the signal without my signal handler catching it, and still get the crash dump?
Here is what I currently do:
struct sigaction psa, oldPsa;

void CESignalHandler::init() {
    CELogI("Crash handler started");
    sigemptyset(&psa.sa_mask);
    psa.sa_sigaction = handleCrash;
    psa.sa_flags = SA_SIGINFO;
    //sigaction(SIGBUS, &psa, &oldPsa);
    sigaction(SIGSEGV, &psa, &oldPsa);
    //sigaction(SIGSYS, &psa, &oldPsa);
    //sigaction(SIGFPE, &psa, &oldPsa);
    //sigaction(SIGILL, &psa, &oldPsa);
    //sigaction(SIGHUP, &psa, &oldPsa);
}

void CESignalHandler::handleCrash(int signalNumber, siginfo_t *sigInfo, void *context) {
    static volatile sig_atomic_t fatal_error_in_progress = 0;
    if (fatal_error_in_progress) // stop a signal loop
        _exit(1);
    fatal_error_in_progress = 1;

    // note: asprintf() and free() are not async-signal-safe
    char* j;
    // %p prints the faulting address in hex (%x with a long argument is undefined behavior)
    if (asprintf(&j, "Crash Signal: %d, crashed on: %p, UID: %ld\n", signalNumber, sigInfo->si_addr, (long) sigInfo->si_uid) != -1) {
        CELogE(j);
        free(j);
    }
    CESignalHandler::getStackTrace();
    sigaction(signalNumber, &oldPsa, NULL);
}
You need to reset the signal handler(s) to the previous function, and then crash again -- ideally at the point where the signal was originally thrown. You can do this by passing a struct sigaction in as the 3rd argument to sigaction(), and using the saved value to restore the original behavior in the signal handler.
This can be a bit tricky because of the way debuggerd works (and because the way it works has changed over time). For a "hard" failure like a segmentation fault, returning from the signal handler just causes the same signal to be re-thrown. The Android crash handler uses this to its advantage by contacting debuggerd, waiting for it to attach with ptrace, and then resuming. debuggerd then gets to watch the process as it crashes (for the second time).
This doesn't work for "soft" failures, e.g. somebody manually sends your process a SIGABRT or gets a SIGPIPE from a write(). If the signal handler contacts debuggerd and resumes, the process just clears the signal and continues on, leaving debuggerd to wait indefinitely for a second crash that never happens. This was partially fixed a couple of releases back; now the debug code re-issues the signal itself (which doesn't actually do anything until the signal handler returns, because the signal is blocked while the handler runs). This usually works, and when it doesn't, debuggerd will time out and drop the connection.
So. If you receive a segmentation fault or bus error, you can just restore the original signal handler and then return from yours, and when the process crashes again the debuggerd handler will deal with it. If somebody sent you a SIGHUP, you should handle it entirely yourself, because debuggerd doesn't care about that signal at all.
Things get weird with SIGFPE. This is a "soft" failure, because most ARM CPUs don't have a hardware integer divide instruction, and the signal is actually being sent explicitly from the libgcc __div0 function. You can restore the signal handler, and then re-send the signal yourself; but depending on what version of Android you're running you might have to send it twice. Ideally you'd like to be doing this from the code that encountered the arithmetic problem, rather than the signal handler, but that's tricky unless you can replace __div0. You would need to send the signal with tgkill(), not kill(), as the latter will result in the signal being sent to the main thread of the process, which would cause debuggerd to dump the stack for the wrong thread.
You might be tempted to copy the handler out of bionic/linker/debugger.cpp, but that's a bad idea -- the protocol used to communicate between bionic and debuggerd has changed in the past and will likely change again.
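You need to call the previous handler stored in oldPsa manually, like this:
oldPsa.sa_sigaction(signalNumber, sigInfo, context); // valid only if (oldPsa.sa_flags & SA_SIGINFO); otherwise call oldPsa.sa_handler(signalNumber), and neither applies if the old handler is SIG_DFL or SIG_IGN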
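Putting both answers together: save the previous action when installing the handler, do your own logging, then either call the previous handler directly or, if it was SIG_DFL or SIG_IGN, restore it and re-raise. POSIX specifies that raise() in a multithreaded process is equivalent to pthread_kill(pthread_self(), sig), so it targets the current thread (the point of the tgkill() advice above) rather than the process as a whole. A sketch in C; install_chaining_handler, demo_chaining and stand_in_previous are hypothetical names, and SIGUSR1 is used only so the demo can run without crashing.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

static struct sigaction previous_action;       /* saved when our handler is installed */
static volatile sig_atomic_t our_handler_ran = 0;

static void chaining_handler(int sig, siginfo_t *info, void *context)
{
    /* Our own work first; write() is async-signal-safe, unlike asprintf(). */
    static const char msg[] = "signal caught, passing it on\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    our_handler_ran = 1;

    if (previous_action.sa_handler == SIG_DFL || previous_action.sa_handler == SIG_IGN) {
        /* Restore the old disposition and re-raise. The signal is blocked while
         * this handler runs, so it stays pending until we return. */
        sigaction(sig, &previous_action, NULL);
        raise(sig);
    } else if (previous_action.sa_flags & SA_SIGINFO) {
        previous_action.sa_sigaction(sig, info, context);
    } else {
        previous_action.sa_handler(sig);
    }
}

int install_chaining_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = chaining_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, &previous_action);
}

/* Demo: a stand-in "previous" handler on SIGUSR1, then ours on top of it. */
static volatile sig_atomic_t previous_ran = 0;

static void stand_in_previous(int sig)
{
    (void)sig;
    previous_ran = 1;
}

int demo_chaining(void)
{
    struct sigaction prev;
    memset(&prev, 0, sizeof prev);
    prev.sa_handler = stand_in_previous;
    sigemptyset(&prev.sa_mask);
    if (sigaction(SIGUSR1, &prev, NULL) != 0 || install_chaining_handler(SIGUSR1) != 0)
        return -1;
    raise(SIGUSR1);  /* returns only after the handler has run */
    return (our_handler_ran && previous_ran) ? 0 : -1;
}
```

For a hard fault such as SIGSEGV, the simpler option from the first answer (restore the old action and return, so the faulting instruction runs again) avoids calling the previous handler from inside yours.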
My application has a UI (implemented with an Activity) and a service (implemented with an IntentService). The service is used to send data (synchronously, using NetworkStream.Write) to a remote server, as well as to report the transmission status to the UI (implemented using a BroadcastReceiver).
Here is my problem:
The application works properly if the size of the buffer used for the NetworkStream.Write is 11 KB or less.
However, if the buffer is larger than 11 KB, say 20 KB (the size needed to send JPG images), the service keeps working properly (verified with a log file), but the UI is gone (as if the device's back button had been pressed) and I can't find a way to bring it back. It's important to point out that in this case the Activity does not go through OnStop() or OnDestroy().
At first I thought this was an Application Not Responding (ANR) issue caused by a server delay, yet the UI crashes after about 5 seconds.
Moreover, this only happens on real hardware. The emulator version works fine.
// SEND STREAM:
Byte[] outStream = new Byte[20000];
// -- Set up TCP connection: --
TcpClient ClientSock = new TcpClient();
ClientSock.Connect("myserver.com", 5555);
NetworkStream serverStream = ClientSock.GetStream();
serverStream.Write(outStream, 0, outStream.Length);
serverStream.Flush();
// . . .
// RECEIVE STREAM:
Array.Clear(inStream, 0, inStream.Length); // Clears any previous value (Array.Initialize() does nothing for a byte[]).
int nBytesRead = 0;
nBytesRead = serverStream.Read(inStream, 0, 1024);
// -- Closing communications socket: --
ClientSock.Close();
One thing first: I would have commented on the question to clarify one thing before giving an answer, but unfortunately I don't have enough reputation yet.
What I would have asked is: why do you need a buffer larger than 11 KB to send a JPG image?
I do nearly the same thing in an (async) task with a 260 KB image, but with a buffer of 10,240 bytes, and it works without difficulties.
byte[] buffer = new byte[10240];
for (int length = 0; (length = in.read(buffer)) > 0;) {
    outputStream.write(buffer, 0, length);
    outputStream.flush();
    bytesWritten += length;
    progress = (int) ((double) bytesWritten * 100 / totalBytes);
    publishProgress();
}
outputStream.flush();
I use this loop to read a JPG image from resources or the SD card and post it to my server.
You may want to change your application to use AsyncTask and take a look at the guide:
http://developer.android.com/training/basics/network-ops/connecting.html
Network operations can involve unpredictable delays. To prevent this from causing a poor user experience, always perform network operations on a separate thread from the UI.
Since Android 3.0 (Honeycomb), apps targeting that API level or higher get a NetworkOnMainThreadException if they perform network operations on the UI thread. Also, just to be clear: http://developer.android.com/guide/components/services.html
Caution: A service runs in the main thread of its hosting process—the
service does not create its own thread and does not run in a separate
process
I'm having some trouble understanding why this code
public class BitmapAllocTest extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        byte[] b = new byte[20 * 1000 * 1000];
        b = null;
        Bitmap.createBitmap(2500, 2000, Bitmap.Config.ARGB_8888);
    }
}
throws an OutOfMemoryError on a device with a 24 MB heap limit. If I comment out either of the allocations, it runs fine: the byte array is 20,000,000 bytes and the ARGB_8888 bitmap needs 2500 × 2000 × 4 = 20,000,000 bytes, so each fits on its own but not both at once. I was under the impression that the Java VM would try to garbage collect before throwing an OutOfMemoryError.
I suspect it has to do with Android allocating bitmaps on the native heap.
I posted this on the issue tracker and got this answer:
There are a couple of things going on.
The VM on older devices uses conservative collection. Most (but not all) devices running >= 2.0 will use type-precise GC, but none of them yet have live-precise GC.
What this means is, the fact that you set "b = null" doesn't guarantee that all copies of that reference are gone -- a copy might still be sitting in a register somewhere, and without liveness detection the GC can't know that it will never be used again. It's also perfectly legal for the compiler to discard the "b = null" assignment since you never look at "b" again.
Bitmap pixel data uses the magical "external allocation" mechanism rather than the usual heap allocator. Sometimes you get unpleasant interactions.
We're working on fixing all of these issues.
Link: http://code.google.com/p/android/issues/detail?id=10821
I was under the impression that the java vm would try to garbage collect before throwing OutOfMemory exceptions.
You have to trigger the GC by yourself and retry. I had to do that recently and couldn't figure out another way to do that.