ANR Input dispatching timed out (server) is not responding

In my Android application I am getting a very strange crash at random. When I open the application, I fetch data from the server on the splash screen. When the application lands on the home page, the whole application suddenly closes, and this is what gets printed in the log:
ANR in com.test (com.test/.activities.SplashActivity)
PID: 16020
Reason: Input dispatching timed out (49abf6 com.test/com.test.activities.SplashActivity (server) is not responding. Waited 5004ms for FocusEvent(hasFocus=false))
Parent: com.test/.activities.SplashActivity
Load: 29.87 / 30.79 / 30.77
----- Output from /proc/pressure/memory -----
some avg10=0.00 avg60=0.00 avg300=0.00 total=3115344
full avg10=0.00 avg60=0.00 avg300=0.00 total=1338179
----- End output from /proc/pressure/memory -----

The cause is that the UI thread is blocked and your application has not responded to input for more than 5 seconds.
Double-check that the code fetching data from the server is not executed on the main thread. Also, if you are using a broadcast receiver, any logic in it that takes around 10 seconds will lead to the same issue.
Something else to check is the exception handling in the code that fetches data from the server.
You can also take a look at the following page:
https://developer.android.com/training/articles/perf-anr.html#anr
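As a rough illustration of keeping the fetch off the calling thread, here is a minimal sketch in plain java.util.concurrent. The class BackgroundFetch, its methods, and the "io-worker" thread name are hypothetical and not taken from the question's code; the supplier stands in for the real network call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: runs a blocking fetch on a worker thread, never on the caller. */
final class BackgroundFetch {
    // Single worker thread standing in for the app's I/O executor (an assumption).
    private static final ExecutorService IO = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "io-worker");
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    });

    /** Starts the blocking call on the worker and returns immediately. */
    static <T> CompletableFuture<T> fetchAsync(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, IO);
    }

    /** Name of the thread that actually executes the work, for demonstration. */
    static String workerThreadName() {
        return fetchAsync(() -> Thread.currentThread().getName()).join();
    }

    public static void main(String[] args) {
        // "payload" is a stand-in for the data returned by the server.
        CompletableFuture<String> result = fetchAsync(() -> "payload");
        // join() is only acceptable in this command-line demo; see the note below.
        System.out.println(result.join());
    }
}
```

In an Activity you would not call join() on the main thread, since that blocks it again. Instead, attach a callback such as thenAccept(...) and post the UI update back to the main thread.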

Related

How to handle: java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds errors?

We're seeing a number of TimeoutExceptions in GcWatcher.finalize, BinderProxy.finalize, and PlainSocketImpl.finalize. 90+% of them happen on Android 4.3. We're getting reports of this from Crittercism from users out in the field.
The error is a variation of: "com.android.internal.BinderInternal$GcWatcher.finalize() timed out after 10 seconds"
java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds
at android.os.BinderProxy.destroy(Native Method)
at android.os.BinderProxy.finalize(Binder.java:459)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:187)
at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:170)
at java.lang.Thread.run(Thread.java:841)
So far we haven't had any luck reproducing the problem in house or figuring out what might have caused it.
Any ideas what can cause this?
Any idea how to debug this and find out which part of the app causes this?
Anything that sheds light on the issue helps.
More Stacktraces:
1 android.os.BinderProxy.destroy
2 android.os.BinderProxy.finalize Binder.java, line 482
3 java.lang.Daemons$FinalizerDaemon.doFinalize Daemons.java, line 187
4 java.lang.Daemons$FinalizerDaemon.run Daemons.java, line 170
5 java.lang.Thread.run Thread.java, line 841
2
1 java.lang.Object.wait
2 java.lang.Object.wait Object.java, line 401
3 java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 102
4 java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 73
5 java.lang.Daemons$FinalizerDaemon.run Daemons.java, line 170
6 java.lang.Thread.run
3
1 java.util.HashMap.newKeyIterator HashMap.java, line 907
2 java.util.HashMap$KeySet.iterator HashMap.java, line 913
3 java.util.HashSet.iterator HashSet.java, line 161
4 java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers ThreadPoolExecutor.java, line 755
5 java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers ThreadPoolExecutor.java, line 778
6 java.util.concurrent.ThreadPoolExecutor.shutdown ThreadPoolExecutor.java, line 1357
7 java.util.concurrent.ThreadPoolExecutor.finalize ThreadPoolExecutor.java, line 1443
8 java.lang.Daemons$FinalizerDaemon.doFinalize Daemons.java, line 187
9 java.lang.Daemons$FinalizerDaemon.run Daemons.java, line 170
10 java.lang.Thread.run
4
1 com.android.internal.os.BinderInternal$GcWatcher.finalize BinderInternal.java, line 47
2 java.lang.Daemons$FinalizerDaemon.doFinalize Daemons.java, line 187
3 java.lang.Daemons$FinalizerDaemon.run Daemons.java, line 170
4 java.lang.Thread.run

Full disclosure - I'm the author of the previously mentioned talk in TLV DroidCon.
I had a chance to examine this issue across many Android applications and to discuss it with other developers who encountered it, and we all reached the same conclusion: this issue cannot be avoided, only minimized.
I took a closer look at the default implementation of the Android garbage collector code to better understand why this exception is thrown and what the possible causes could be. I even found a possible root cause during experimentation.
The root of the problem is the point at which a device "goes to sleep" for a while: the OS has decided to lower battery consumption by stopping most userland processes, turning the screen off, reducing CPU cycles, and so on. This is done at the Linux system level, where processes are paused mid-run. It can happen at any time during normal application execution, but the process will stop at a native system call, because the context switch is done at the kernel level. This is where the Dalvik GC joins the story.
The Dalvik GC code (as implemented in the Dalvik project on the AOSP site) is not a complicated piece of code. The basic way it works is covered in my DroidCon slides. What I did not cover is the basic GC loop, at the point where the collector has a list of objects to finalize (and destroy). The loop logic can be simplified like this:
take starting_timestamp,
remove an object from the list of objects to release,
release the object: call finalize() and, if required, the native destroy(),
take end_timestamp,
calculate (end_timestamp - starting_timestamp) and compare it against a hard-coded timeout value of 10 seconds,
if the timeout has been reached, throw the java.util.concurrent.TimeoutException and kill the process.
Now consider the following scenario:
Application runs along doing its thing.
This is not a user facing application, it runs in the background.
During this background operation, objects are created, used and need to be collected to release memory.
Application does not bother with a WakeLock - as this will affect the battery adversely, and seems unnecessary.
This means the Application will invoke the GC from time to time.
Normally the GC run completes without a hitch.
Sometimes (very rarely) the system will decide to sleep in the middle of the GC run.
This will happen if you run your application long enough, and monitor the Dalvik memory logs closely.
Now consider the timestamp logic of the basic GC loop: it is possible for the device to start the run, take the starting_timestamp, and go to sleep at the destroy() native call on a system object.
When it wakes up and resumes the run, destroy() will finish, and the interval measured at end_timestamp will be the time the destroy() call took plus the sleep time.
If the sleep time was long (more than 10 seconds), the java.util.concurrent.TimeoutException will be thrown.
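The timestamp arithmetic in this scenario can be modelled with a few lines of Java. This is only a sketch of the subtraction described above, not the AOSP code; the class FinalizerTimeoutModel, its method names, and the 5 ms and 12 s figures are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Simplified model of the timeout check described above; not the AOSP code. */
final class FinalizerTimeoutModel {
    static final long TIMEOUT_NANOS = TimeUnit.SECONDS.toNanos(10);

    /** True when the time between the two timestamps exceeds the 10 second limit. */
    static boolean timedOut(long startNanos, long endNanos) {
        return endNanos - startNanos > TIMEOUT_NANOS;
    }

    /** Elapsed time as seen by the check: the real work plus any time the process was paused. */
    static long observedNanos(long workNanos, long pausedNanos) {
        return workNanos + pausedNanos;
    }

    public static void main(String[] args) {
        long work = TimeUnit.MILLISECONDS.toNanos(5); // destroy() itself is fast (assumed value)
        long sleep = TimeUnit.SECONDS.toNanos(12);    // device slept mid-call (assumed value)
        System.out.println(timedOut(0, observedNanos(work, 0)));     // prints false
        System.out.println(timedOut(0, observedNanos(work, sleep))); // prints true
    }
}
```

The point of the model is that the check cannot tell a slow finalizer from a fast one that was paused: both produce the same large difference between the two timestamps.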
I have seen this in the graphs generated from the analysis python script - for Android System Applications, not just my own monitored apps.
Collect enough logs and you will eventually see it.
Bottom line:
The issue cannot be avoided - you will encounter it if your app runs in the background.
You can mitigate it by taking a WakeLock to prevent the device from sleeping, but that is a different story altogether, and a new headache, and maybe another talk at another con.
You can minimize the problem by reducing GC calls, which makes the scenario less likely (tips are in the slides).
I have not yet had the chance to go over the Dalvik 2 (a.k.a. ART) GC code, which boasts a new generational compacting feature, nor have I performed any experiments on Android Lollipop.
Added 7/5/2015:
After reviewing the crash report aggregation for this crash type, it looks like crashes from version 5.0+ of Android OS (Lollipop with ART) account for only 0.5% of this crash type. This means that the ART GC changes have reduced the frequency of these crashes.
Added 6/1/2016:
Looks like the Android project has added a lot of info on how the GC works in Dalvik 2.0 (a.k.a ART).
You can read about it here - Debugging ART Garbage Collection.
It also discusses some tools to get information on the GC behavior for your app.
Sending a SIGQUIT to your app process will essentially cause an ANR, and dump the application state to a log file for analysis.

We see this constantly, all over our app, using Crashlytics. The crash usually happens way down in platform code. A small sampling:
android.database.CursorWindow.finalize() timed out after 10 seconds
java.util.regex.Matcher.finalize() timed out after 10 seconds
android.graphics.Bitmap$BitmapFinalizer.finalize() timed out after 10 seconds
org.apache.http.impl.conn.SingleClientConnManager.finalize() timed out after 10 seconds
java.util.concurrent.ThreadPoolExecutor.finalize() timed out after 10 seconds
android.os.BinderProxy.finalize() timed out after 10 seconds
android.graphics.Path.finalize() timed out after 10 seconds
The devices on which this happens are overwhelmingly (but not exclusively) devices manufactured by Samsung. That could just mean that most of our users are using Samsung devices; alternately it could indicate a problem with Samsung devices. I'm not really sure.
I suppose this doesn't really answer your questions, but I just wanted to reinforce that this seems quite common, and is not specific to your application.

I found some slides about this issue.
http://de.slideshare.net/DroidConTLV/android-crash-analysis-and-the-dalvik-garbage-collector-tools-and-tips
In these slides the author says that it seems to be a problem with the GC when there are a lot of objects or huge objects in the heap. The slides also include a reference to a sample app and a Python script to analyze this issue.
https://github.com/oba2cat3/GCTest
https://github.com/oba2cat3/logcat2memorygraph
Furthermore, I found a hint in comment #3 on this page: https://code.google.com/p/android/issues/detail?id=53418#c3

Here is an effective solution from Didi to this problem. Since this bug is very common and its cause is difficult to find, it looks more like a system problem. Why can't we ignore it directly? Of course we can. Here is the sample code:
// Requires: import java.util.concurrent.TimeoutException;
final Thread.UncaughtExceptionHandler defaultUncaughtExceptionHandler =
        Thread.getDefaultUncaughtExceptionHandler();
Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (t.getName().equals("FinalizerWatchdogDaemon") && e instanceof TimeoutException) {
            // Swallow the finalizer watchdog timeout instead of crashing.
        } else {
            // Everything else still goes to the previously installed handler.
            defaultUncaughtExceptionHandler.uncaughtException(t, e);
        }
    }
});
By setting a special default uncaught exception handler, an application can change the way uncaught exceptions are handled for threads that would otherwise accept whatever default behavior the system provides. When an uncaught TimeoutException is thrown from a thread named FinalizerWatchdogDaemon, this handler breaks the handler chain: the system handler is not called, so the crash is avoided.
In practice, no other bad effects were found. The GC system keeps working, and the timeouts ease as CPU usage decreases.
For more details see: https://mp.weixin.qq.com/s/uFcFYO2GtWWiblotem2bGg

We solved the problem by stopping the FinalizerWatchdogDaemon.
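// Requires: import java.lang.reflect.Field; import java.lang.reflect.Method;
public static void fix() {
    try {
        Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
        // stop() is looked up on the superclass of the watchdog daemon.
        Method method = clazz.getSuperclass().getDeclaredMethod("stop");
        method.setAccessible(true);
        // INSTANCE is the static singleton of the watchdog daemon.
        Field field = clazz.getDeclaredField("INSTANCE");
        field.setAccessible(true);
        method.invoke(field.get(null));
    } catch (Throwable e) {
        // Reflection on runtime internals can fail; the failure is only logged.
        e.printStackTrace();
    }
}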
You can call the method early in the Application lifecycle, for example in attachBaseContext().
For the same reason, you can also apply the fix only for specific phone manufacturers; it's up to you.

Broadcast receivers time out after 10 seconds. Possibly you're doing an asynchronous call (wrong) from a broadcast receiver and 4.3 actually detects it.

One thing that is invariably true is that, at this time, the device is starved for memory (which is usually why GC gets triggered in the first place).
As mentioned by almost all authors earlier, this issue surfaces when Android tries to run GC while the app is in the background. In most of the cases where we observed it, the user had paused the app by locking their screen.
This might also indicate a memory leak somewhere in the application, or a device that is already too loaded.
So the only legitimate way to minimize it is:
to ensure there are no memory leaks, and
to reduce the memory footprint of the app in general.

try {
    Class<?> c = Class.forName("java.lang.Daemons");
    Field maxField = c.getDeclaredField("MAX_FINALIZE_NANOS");
    maxField.setAccessible(true);
    maxField.set(null, Long.MAX_VALUE);
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} catch (NoSuchFieldException e) {
    e.printStackTrace();
} catch (IllegalAccessException e) {
    e.printStackTrace();
}

The finalizer queue may be too long.
I think Java would need something like .NET's GC.SuppressFinalize() and GC.ReRegisterForFinalize() to let users shorten the finalizer queue explicitly.
If the runtime's source code is available, as it is for Android ROM makers, one could implement such methods oneself.

It seems like an Android Runtime bug. There seems to be a finalizer that runs in its own thread and calls the finalize() method on objects if they are not in the current frame of the stack trace.
For example, the following code (created to verify this issue) ended with the crash.
Let's take a cursor that does something in its finalize method (e.g. the SqlCipher ones call close(), which locks on the database that is currently in use):
private static class MyCur extends MatrixCursor {
    public MyCur(String[] columnNames) {
        super(columnNames);
    }
    @Override
    protected void finalize() {
        super.finalize();
        try {
            for (int i = 0; i < 1000; i++)
                Thread.sleep(30);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
And we do some long-running work while holding an open cursor:
for (int i = 0; i < 7; i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            MyCur cur = null;
            try {
                cur = new MyCur(new String[]{});
                longRun();
            } finally {
                cur.close();
            }
        }
        private void longRun() {
            try {
                for (int i = 0; i < 1000; i++)
                    Thread.sleep(30);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }).start();
}
This causes the following error:
FATAL EXCEPTION: FinalizerWatchdogDaemon
Process: la.la.land, PID: 29206
java.util.concurrent.TimeoutException: MyCur.finalize() timed out after 10 seconds
at java.lang.Thread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:371)
at java.lang.Thread.sleep(Thread.java:313)
at MyCur.finalize(MessageList.java:1791)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:222)
at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:209)
at java.lang.Thread.run(Thread.java:762)
The production variant with SqlCipher is very similar:
12-21 15:40:31.668: E/EH(32131): android.content.ContentResolver$CursorWrapperInner.finalize() timed out after 10 seconds
12-21 15:40:31.668: E/EH(32131): java.util.concurrent.TimeoutException: android.content.ContentResolver$CursorWrapperInner.finalize() timed out after 10 seconds
12-21 15:40:31.668: E/EH(32131): at java.lang.Object.wait(Native Method)
12-21 15:40:31.668: E/EH(32131): at java.lang.Thread.parkFor$(Thread.java:2128)
12-21 15:40:31.668: E/EH(32131): at sun.misc.Unsafe.park(Unsafe.java:325)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.LockSupport.park(LockSupport.java:161)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:840)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:873)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:200)
12-21 15:40:31.668: E/EH(32131): at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
12-21 15:40:31.668: E/EH(32131): at net.sqlcipher.database.SQLiteDatabase.lock(SourceFile:518)
12-21 15:40:31.668: E/EH(32131): at net.sqlcipher.database.SQLiteProgram.close(SourceFile:294)
12-21 15:40:31.668: E/EH(32131): at net.sqlcipher.database.SQLiteQuery.close(SourceFile:136)
12-21 15:40:31.668: E/EH(32131): at net.sqlcipher.database.SQLiteCursor.close(SourceFile:510)
12-21 15:40:31.668: E/EH(32131): at android.database.CursorWrapper.close(CursorWrapper.java:50)
12-21 15:40:31.668: E/EH(32131): at android.database.CursorWrapper.close(CursorWrapper.java:50)
12-21 15:40:31.668: E/EH(32131): at android.content.ContentResolver$CursorWrapperInner.close(ContentResolver.java:2746)
12-21 15:40:31.668: E/EH(32131): at android.content.ContentResolver$CursorWrapperInner.finalize(ContentResolver.java:2757)
12-21 15:40:31.668: E/EH(32131): at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:222)
12-21 15:40:31.668: E/EH(32131): at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:209)
12-21 15:40:31.668: E/EH(32131): at java.lang.Thread.run(Thread.java:762)
Summary: close cursors as soon as possible, at least on the Samsung S8 with Android 7, where the issue has been seen.
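One way to make the close deterministic is try-with-resources, so that nothing is left for the finalizer thread to do. The sketch below uses a hypothetical FakeCursor as a stand-in so that it runs without Android; android.database.Cursor is Closeable, so the same pattern should apply to a real cursor.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a cursor-like resource. */
final class FakeCursor implements AutoCloseable {
    // Counts cursors that were opened but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeCursor() {
        OPEN.incrementAndGet();
    }

    int rowCount() {
        return 0; // the fake cursor is always empty
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

final class CursorUsage {
    /** Reads from the cursor and closes it deterministically, even if the body throws. */
    static int countRows() {
        try (FakeCursor cursor = new FakeCursor()) {
            return cursor.rowCount();
        } // close() runs here, so no finalizer has to take the database lock later
    }

    public static void main(String[] args) {
        int rows = countRows();
        System.out.println(rows + " rows, open cursors: " + FakeCursor.OPEN.get());
    }
}
```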

For classes that you create (i.e. that are not part of Android) it's possible to avoid the crash completely.
Any class that implements finalize() has some unavoidable probability of crashing, as explained by @oba. So instead of using finalizers to perform cleanup, use a PhantomReference registered with a ReferenceQueue.
For an example, check out the implementation in React Native: https://github.com/facebook/react-native/blob/master/ReactAndroid/src/main/java/com/facebook/jni/DestructorThread.java
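To give an idea of the pattern, here is a minimal sketch of finalizer-free cleanup with PhantomReference and ReferenceQueue. It is not the React Native implementation; PhantomCleaner, CleanupRef, register() and drain() are names invented for this example, and main() enqueues the reference by hand only to simulate what the collector would do.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical finalizer-free cleanup registry built on PhantomReference. */
final class PhantomCleaner {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keeps the references themselves strongly reachable until their cleanup has run.
    private static final Set<CleanupRef> PENDING = ConcurrentHashMap.newKeySet();

    static final class CleanupRef extends PhantomReference<Object> {
        private final Runnable cleanup; // must not capture the owner, or it never becomes unreachable

        CleanupRef(Object owner, Runnable cleanup) {
            super(owner, QUEUE);
            this.cleanup = cleanup;
        }
    }

    /** Registers a cleanup action to run after the owner has become unreachable. */
    static CleanupRef register(Object owner, Runnable cleanup) {
        CleanupRef ref = new CleanupRef(owner, cleanup);
        PENDING.add(ref);
        return ref;
    }

    /** Runs cleanup for every reference found on the queue; returns how many ran. */
    static int drain() {
        int ran = 0;
        Reference<?> ref;
        while ((ref = QUEUE.poll()) != null) {
            CleanupRef c = (CleanupRef) ref;
            if (PENDING.remove(c)) {
                c.cleanup.run();
                ran++;
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        CleanupRef ref = register(new Object(), () -> System.out.println("native handle released"));
        ref.enqueue(); // simulate the collector enqueueing the reference
        drain();
    }
}
```

In a real app a dedicated daemon thread would typically block on ReferenceQueue.remove() instead of polling. Because that thread is yours, the finalizer watchdog timeout discussed above does not apply to it.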

How to deal with this case: Android epoll_wait returns -1 and errno=4 when using the NDK

I am writing a network communication program with the Android NDK, using epoll.
I found that the wake-up of epoll_wait is not very accurate:
while (1) {
    struct epoll_event events[3];
    log_string("epoll_wait start"); // prints the start time
    // Wait 20 seconds. For testing, a pipe is used instead of a socket, monitored for EPOLLIN.
    events_len = epoll_wait(_epoll_fd, events, 3, 20 * 1000);
    if (events_len <= 0) {
        // Normally events_len is always 0 and errno is 0.
        log_string("epoll_wait end events_len=%d,errno=%d", events_len, errno);
    }
}
The above code behaves normally, as expected, on a PC (such as an Ubuntu PC).
When it runs on an Android phone (in an Android Service, on a separate thread), it behaves as expected at first.
After some time, epoll_wait becomes inaccurate: sometimes it returns -1 with errno=4, and sometimes it waits for a very long time.
So I only know the symptom, not the cause.
Can you tell me why, and what the best practices are for using epoll on Android?
Thanks

4 is EINTR, which means your app got a signal. This isn't really an error; just call epoll_wait again.
Regarding "waited very long", does your app hold at least a partial wakelock?

Catch C++ signal in Android NDK and still print crash dump

I'm catching C++ signals so that I can print some debugging information. But when I do so, I am unable to get the crash dump that the NDK prints when you crash.
Can you manually print the crash dump? I see that debuggerd.c (http://kobablog.wordpress.com/2011/05/12/debuggerd-of-android/) does the work, but I'm not sure how I would use it. Otherwise, is there some way to rethrow the signal without my signal handler catching it, and still get the crash dump?
Here is what I currently do:
struct sigaction psa, oldPsa;
void CESignalHandler::init() {
    CELogI("Crash handler started");
    psa.sa_sigaction = handleCrash;
    psa.sa_flags = SA_SIGINFO;
    //sigaction(SIGBUS, &psa, &oldPsa);
    sigaction(SIGSEGV, &psa, &oldPsa);
    //sigaction(SIGSYS, &psa, &oldPsa);
    //sigaction(SIGFPE, &psa, &oldPsa);
    //sigaction(SIGILL, &psa, &oldPsa);
    //sigaction(SIGHUP, &psa, &oldPsa);
}
void CESignalHandler::handleCrash(int signalNumber, siginfo_t *sigInfo, void *context) {
    static volatile sig_atomic_t fatal_error_in_progress = 0;
    if (fatal_error_in_progress) // Stop a signal loop.
        _exit(1);
    fatal_error_in_progress = 1;
    char* j;
    // %p prints the faulting memory address
    asprintf(&j, "Crash Signal: %d, crashed on: %p, UID: %ld\n", signalNumber, sigInfo->si_addr, (long) sigInfo->si_uid);
    CELogE(j);
    CESignalHandler::getStackTrace();
    sigaction(signalNumber, &oldPsa, NULL); // restore the previously installed handler
}

You need to reset the signal handler(s) to the previous function, and then crash again -- ideally at the point where the signal was originally thrown. You can do this by passing a struct sigaction in as the 3rd argument to sigaction(), and using the saved value to restore the original behavior in the signal handler.
This can be a bit tricky because of the way debuggerd works (and because the way it works has changed over time). For a "hard" failure like a segmentation fault, returning from the signal handler just causes the same signal to be re-thrown. The Android crash handler uses this to its advantage by contacting debuggerd, waiting for it to attach with ptrace, and then resuming. debuggerd then gets to watch the process as it crashes (for the second time).
This doesn't work for "soft" failures, e.g. somebody manually sends your process a SIGABRT or gets a SIGPIPE from a write(). If the signal handler contacts debuggerd and resumes, the process just clears the signal and continues on, leaving debuggerd to wait indefinitely for a second crash that never happens. This was partially fixed a couple of releases back; now the debug code re-issues the signal itself (which doesn't actually do anything until the signal handler returns, because the signal is blocked while the handler runs). This usually works, and when it doesn't, debuggerd will time out and drop the connection.
So. If you receive a segmentation fault or bus error, you can just restore the original signal handler and then return from yours, and when the process crashes again the debuggerd handler will deal with it. If somebody sent you a SIGHUP, you should handle it entirely yourself, because debuggerd doesn't care about that signal at all.
Things get weird with SIGFPE. This is a "soft" failure, because most ARM CPUs don't have a hardware integer divide instruction, and the signal is actually being sent explicitly from the libgcc __div0 function. You can restore the signal handler, and then re-send the signal yourself; but depending on what version of Android you're running you might have to send it twice. Ideally you'd like to be doing this from the code that encountered the arithmetic problem, rather than the signal handler, but that's tricky unless you can replace __div0. You would need to send the signal with tgkill(), not kill(), as the latter will result in the signal being sent to the main thread of the process, which would cause debuggerd to dump the stack for the wrong thread.
You might be tempted to copy the handler out of bionic/linker/debugger.cpp, but that's a bad idea -- the protocol used to communicate between bionic and debuggerd has changed in the past and will likely change again.

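You need to call the handler saved in oldPsa manually, like this (assuming the previous handler was installed with SA_SIGINFO):
oldPsa.sa_sigaction(signalNumber, sigInfo, context);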

How to monitor and log what module holds locks in linux kernel?

Is it possible to monitor what module holds a lock in linux kernel?
I mean how can I know who locks a spin_lock,semaphore, mutex etc.
for user space:
Linux: How can I find the thread which holds a particular lock?

Normally, when a soft-lockup is detected, the kernel will print the current stack trace. For example:
INFO: task bdi-default:19 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bdi-default D 0000000000000001 0 19 2 0x00000000
ffff88013cd7fc60 0000000000000046 ffff88013cd7fc10 ffffffff810096f0
ffff88013806f4f8 0000000000000000 0000000000d7fc20 ffff880028313b40
ffff88013cd7b0b8 ffff88013cd7ffd8 000000000000f4e8 ffff88013cd7b0b8
Call Trace:
[<ffffffff810096f0>] ? __switch_to+0xd0/0x320
[<ffffffff814eca40>] ? thread_return+0x4e/0x77e
[<ffffffff814ed8c5>] schedule_timeout+0x215/0x2e0
[<ffffffff814ed543>] wait_for_common+0x123/0x180
[<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
[<ffffffff814ed65d>] wait_for_completion+0x1d/0x20
[<ffffffff810909f9>] kthread_create+0x99/0x120
[<ffffffff81134d40>] ? bdi_start_fn+0x0/0x100
[<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff81134bb7>] bdi_forker_task+0x187/0x310
[<ffffffff81134a30>] ? bdi_forker_task+0x0/0x310
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
If the call trace doesn't provide enough information, try enabling lockup detection in the kernel (requires kernel recompilation):
make menuconfig
--> Kernel Hacking
    --> Detect Soft Lockups
Then the call trace will show more information starting with:
BUG: soft lockup detected on CPU#0!
To see the call trace, check dmesg, or better, use a serial console to catch the whole kernel log (in some cases the soft lockup may hang the kernel before all the logs are written to the file, but with a serial console, or alternatives such as netconsole, you can still catch them).

Android HTTP Multipart POST - Locks WiFi and leads to phone crash

I'm trying to post a video file to our server and monitor its progress. I followed the steps outlined in Can't grab progress on http POST file upload (Android) and that worked great, but with larger files my program hangs while writing to the output socket, which subsequently causes my phone to lock the WiFi so I can't turn it on or off, and then causes it to crash (after a short period).
So I attempted to write my own HTTPClient and it works, but also with intermittent success, still falling victim to the random crashes of the method outlined above. It seems this only occurs on files > 5MB, but I've had it die around 1.3MB and I've even had it successfully transfer a 13MB file. The fact that it's so random and sporadic is infuriating but I'm convinced there's some reason it's happening.
Here's my connection code:
socket.connect(new InetSocketAddress(host, port));
socket.setSendBufferSize(1024 * 65);
int bytesSent = 0;
PrintStream out = new PrintStream(socket.getOutputStream());
out.print(headersBuffer);
out.print(bodyBuffer);
// Count both buffers (the original added headersBuffer.length() twice).
bytesSent += headersBuffer.length() + bodyBuffer.length();
byte[] bytes = new byte[1024 * 65];
int size;
while ((size = fileStream.read(bytes)) > 0) {
    mListener.transferred(bytesSent);
    Log.i(TAG, "bytes sent: " + bytesSent);
    bytesSent += size;
    out.write(bytes, 0, size); // Random freezes (/blocking?) on this line
    out.flush();
}
Log.i(TAG, "Made it!");
out.print(lastBoundary);
out.flush();
I've used the debugger to see where it's getting to in the stack when the write just seems to block and it's the OSNetworkSystem.writeSocketImpl() function. That function just never returns...
So my next thought was - if the socket will just sit there, perhaps I can interrupt it and force it to close so at least the phone doesn't crash and the user can retry... I read up on force closing sockets in Android here (since it seems there are some problems): http://code.google.com/p/android/issues/detail?can=2&q=7933&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars&id=7933
Basically what I did was create a Listener thread that looks at how many bytes have been transferred every 500ms and if there hasn't been a change, attempt to force close the socket by means of
socket.shutdownOutput();
socket.close();
However the socket returns that it is closed and everything proceeds to fail as outlined above.
Here's the general sequence of events in Logcat:
12-21 14:25:26.802 2234 2340 V UploadService: Bytes transferred: 5959800 of: 13191823
12-21 14:25:26.802 2234 2340 V UploadService: Bytes transferred: 5963896 of: 13191823
12-21 14:25:26.802 2234 2340 V UploadService: Bytes transferred: 5967992 of: 13191823
12-21 14:26:00.693 1262 1270 D WifiService: acquireWifiLockLocked: WifiLock{NetworkLocationProvider type=2 binder=android.os.Binder@45b48958}
12-21 14:26:11.083 1262 1289 D WifiHW : 'DRIVER LINKSPEED' command timed out.
12-21 14:26:21.130 1262 1500 D WifiHW : 'AP_SCAN 2' command timed out.
12-21 14:26:31.177 1262 1500 D WifiHW : 'SCAN' command timed out.
And after a few minutes the really bad stuff starts happening and the phone crashes!
Please help! Thank you.
EDIT: Works great over 3G. I'm going to try at home and see if it's some sort of router issue. However, how can I catch this problem and prevent the phone from crashing?
