Linux kernel: wait_for_completion_timeout not woken up by complete

I am working on a strange issue with the i2c-omap driver. I am not sure whether the problem happens at other times, but it happens in about 5% of my attempts to power off the system.
During system power off, I write to some registers in the PMIC via I2C. In i2c-omap.c, I can see that the calling thread is waiting in wait_for_completion_timeout with a timeout of 1 second, and I can see the IRQ handler call complete (I added a printk AFTER the call to complete). However, after complete is called, wait_for_completion_timeout does not return. Instead, it takes up to 5 MINUTES before it returns. The return value of wait_for_completion_timeout is positive, indicating that there was no timeout, and the whole I2C transaction was successful.
In the meantime, I can see printk messages from other drivers, and the serial console still works. This is on Android, and if I use "top" I can see that system_server is taking about 95% of the CPU. Killing system_server makes wait_for_completion_timeout return immediately.
So my question is: what could a user-space app (system_server) do to keep a kernel wait_for_completion_timeout from being woken up?
Thanks!

wait_for_completion_timeout only guarantees that the waiting thread becomes runnable when either (i) the completion happens or (ii) the timeout expires.
After that, it is the scheduler's job to pick that thread and change its state from runnable to running. Neither the thread itself nor the completion framework decides when a runnable thread actually gets the CPU.
As you have pointed out, system_server is consuming 95% of the CPU, which makes it hard for the waiting thread to get scheduled. That would explain why the thread does not run promptly after complete is called.

Well, I kind of figured it out.
In the CFS scheduler, enqueue_entity does "vruntime += min_vruntime" under some conditions, and dequeue_entity does the opposite under some conditions. However, the two are not always executed in pairs. So under some unknown condition, when min_vruntime is large, the vruntime can become very large, so the task is placed on the right side of the rbtree and does not get scheduled for a long time.
I am not sure what the best way is to fix the root cause. What I did is a hack in enqueue_entity: if I find vruntime > min_vruntime and the function is called for WAKEUP, I always set vruntime = min_vruntime, so the task is placed relatively far to the left in the rbtree.
The kernel version I am using is 2.6.37.
Does anyone have a suggestion on how this could be fixed in a better way?

Related

Can you tell Android how long to wait, or tell it that the app is still "active", before it displays "App isn't responding"? If so, how?

I'd like to know the code or configuration needed to set that.
In my app, there are some places where I deliberately make the app sleep for several seconds with Thread.sleep(long millis), because it's needed for some reasons.
The problem is that on some Android API levels, at least 25 and 26, that system message usually pops up within a few seconds. This confuses the user, and if the user ends the app, the operations that need to happen during that sleep may not complete, which might even cause the application to malfunction.
I'd like to find a way of either forcing Android to wait for a reasonable time (for example, 1 minute) or making Android aware that the app is not unresponsive but is deliberately inside Thread.sleep.
Is there any way to do that?

"I'd like to find a way of either forcing Android to wait for a reasonable time (for example, 1 minute) or making Android aware that the app is not unresponsive but is deliberately inside Thread.sleep."
TL;DR there is none.
Android apps should at all times be able to yield their position in the foreground to other apps. It's up to the users if they want to wait while some lengthy download is taking place or if they prefer to do something else and come back later.
You can't execute Thread.sleep() on the UI thread for long because this would "freeze the UI".
An example: Users should be able to leave your app by pressing the BACK Button at any time they wish to. If your method is blocking the UI thread, Activity#onBackPressed() can't be executed so the users can't quit.
What can you do? Move the heavy work to another thread (using e.g. AsyncTask, IntentService or a plain worker thread) and show some type of progress indicator to the users if necessary. You can (and should) also toggle the visibility or enabled state of Buttons etc. where required, to avoid clicks which can't be processed at that point in time.

I think you have an implementation problem. The system message, known as ANR (Application Not Responding), occurs when the application cannot respond to user input. This can be caused by blocking the UI thread, which may be your case.
To avoid blocking the UI thread, run your long-running operations asynchronously. There are many ways to do that: you could use AsyncTask, AsyncTaskLoader, Thread, RxJava... Here are some links to help you with that:
https://developer.android.com/training/articles/perf-anr
https://google-developer-training.gitbooks.io/android-developer-fundamentals-course-concepts/content/en/Unit%203/71c_asynctask_and_asynctaskloader_md.html
http://www.vogella.com/tutorials/RxJava/article.html

Explain behavior of Unix sleep() function executed on Android

I am currently compiling and executing some C++ code on a rooted Android device. I use adb (adb shell). To compile my code, I don't use the NDK; I cross-compile with CMake.
I'm using the function sleep(seconds) of unistd.h.
I've experienced some curious behaviors with that function on Android: Basically, I have a for loop in which I std::cout something and then call sleep(x).
If I call sleep(1), the behavior is the one expected: The program waits 1 second, and then executes the next instructions.
If I call sleep(2), the behavior isn't the one expected. The program gets stuck on that instruction forever, until I hit a key on my PC keyboard (not the device's one). Then it gets stuck on the next sleep(2) until I hit a key again, and so on.
This behavior happens only when the device screen is off. As soon as I click on the power button to turn the screen on, the program resumes and has the expected behavior.
N.B: The behavior is the same with usleep(useconds)
I have tried to see where the limit is between 1 and 2 seconds:
1.5 s, 1.25 s, 1.125 s -> always stays blocked | 1.0625 s -> ~50% chance of staying blocked.
Apparently, something prevents a thread from waking up if it sleeps for more than about 1 second (and certainly for 2 seconds).
So my question is: does anyone have any idea why this is happening, and a detailed explanation of the process?
Thank you !

Android puts applications in the background when they aren't doing any user interaction; Unix sleep, Java timers etc. won't wake them up. You have to use an Android alarm or a Handler with a postDelayed Runnable.

wait_event_interruptible_timeout always expires, even though wake_up event occurs in time

I have a question about what seems to be weird behavior of wait_event and wake_up on an Android embedded platform (Exynos5 Dual based) with a preemptive Linux 3.0 kernel.
It does not happen on a normal SMP laptop with a non-preemptive kernel (any version).
We have a linux device driver with a classic sleeper/waker scenario and here's what happens:
T0: taskA:
    if (!flag)
        wait_event_interruptible_timeout(wq, flag == true, timeout = 0.5 sec)
T1: (after a few msec) taskB:
    atomically set flag
    wake_up_interruptible()
T2: (after timeout msec) taskA:
    wait_event_interruptible_timeout expires (ret 0) instead of waking up at T1
All reads and writes of flag are atomic. We have gone from using atomic bitops (kernel set/test bit), to a volatile atomic_t, to using memory barriers for each read/write of the atomic_t variables (according to this).
If taskA actually starts waiting (the wait_event_* kernel functions first check the condition, so it may not always be the case), then it waits for the full timeout instead of getting woken up by taskB when the flag changes value and wake_up() is called.
We suspect that the two tasks occur on different cores. Core1 deep-sleeps after wait_event_..() and cannot be woken up by wake_up_interruptible() which occurs on Core2.
Does anyone know if this is true, or if something else is to blame?
NOTE: The issue seems to go away if we save the sleeper's task struct pointer and then call wake_up_process(saved_ptr) before (and in addition to) wake_up_interruptible(). We find this less than optimal and wonder if there is a better way.

Possible states for native threads on Android?

What are all the possible thread states during execution for native (C/C++) threads on an Android device? Are they the same as the Java Thread States? Are they Linux threads? POSIX threads?
Not required, but bonus points for providing examples of what can cause a thread to enter each state.
Edit: As requested, here's the motivation:
I'm designing the interface for a sampling profiler that works with native C/C++ code on Android. The profiler reports will show thread states over time. I need to know what all the states are in order to a) know how many distinct states I will need to possibly visually differentiate, and b) design a color scheme that visually differentiates and groups the desirable states versus the undesirable states.

I've been told that native threads on Android are just lightweight processes. This agrees with what I've found for Linux in general. Quoting this wiki page:
A process (which includes a thread) on a Linux machine can be in any of the following states:
TASK_RUNNING - The process is either executing on a CPU or waiting to be executed.
TASK_INTERRUPTIBLE - The process is suspended (sleeping) until some condition becomes true. Raising a hardware interrupt, releasing a system resource the process is waiting for, or delivering a signal are examples of conditions that might wake up the process (put its state back to TASK_RUNNING). Typically blocking IO calls (disk/network) will result in the task being marked as TASK_INTERRUPTIBLE. As soon as the data it is waiting on is ready to be read, an interrupt is raised by the device and the interrupt handler changes the state of the task to TASK_RUNNING. Also processes in idle mode (i.e. not performing any task) should be in this state.
TASK_UNINTERRUPTIBLE - Like TASK_INTERRUPTIBLE, except that delivering a signal to the sleeping process leaves its state unchanged. This process state is seldom used. It is valuable, however, under certain specific conditions in which a process must wait until a given event occurs without being interrupted. Ideally not too many tasks will be in this state.
For instance, this state may be used when a process opens a device file and the corresponding device driver starts probing for a corresponding hardware device. The device driver must not be interrupted until the probing is complete, or the hardware device could be left in an unpredictable state.
Atomic write operations may require a task to be marked as UNINTERRUPTIBLE
NFS access sometimes results in access processes being marked as UNINTERRUPTIBLE
reads/writes from/to disk can be marked thus for a fraction of a second
I/O following a page fault marks a process UNINTERRUPTIBLE
I/O to the same disk that is being accessed for page faults can result in a process marked as UNINTERRUPTIBLE
Programmers may mark a task as UNINTERRUPTIBLE instead of using INTERRUPTIBLE
TASK_STOPPED - Process execution has been stopped; the process enters this state after receiving a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal.
TASK_TRACED - Process execution has been stopped by a debugger.
EXIT_ZOMBIE - Process execution is terminated, but the parent process has not yet issued a wait4() or waitpid() system call. The OS will not clear zombie processes until the parent issues a wait()-like call.
EXIT_DEAD - The final state: the process is being removed by the system because the parent process has just issued a wait4() or waitpid() system call for it. Changing its state from EXIT_ZOMBIE to EXIT_DEAD avoids race conditions due to other threads of execution that execute wait()-like calls on the same process.
Edit: And yet the Dalvik VM Debug Monitor provides different states. From its documentation:
"thread state" must be one of:
1 - running (now executing or ready to do so)
2 - sleeping (in Thread.sleep())
3 - monitor (blocked on a monitor lock)
4 - waiting (in Object.wait())
5 - initializing
6 - starting
7 - native (executing native code)
8 - vmwait (waiting on a VM resource)
"suspended" [a separate flag in the data structure] will be 0 if the thread is running, 1 if not.

If you are designing a system app that has to work with threads in a more advanced way than a usual app, I'd start by examining what API is available on Android to access threads.
The answer is pthreads (POSIX threads), declared in the pthread.h header file and implemented in the Bionic C library. That gives you a starting point for knowing what you can achieve.
Note that Android doesn't implement the full pthread interface, only the subset needed for Android to run.
More on threads and Bionic here, and how they interact with Java and the VM is described here. I also believe that a thread is actually a process, since my code uses setpriority(PRIO_PROCESS, gettid(), pr); to set a new thread's priority. I don't recall where I got this info, but it works.
I assume that a thread may be in a running, finished or blocked (e.g. waiting for a mutex) state, but my knowledge here is limited since I never needed other thread states.
The question now is whether your app can actually retrieve these states using the API available in the NDK and, if there are more states, whether your users would really be interested in them.
Anyway, you may start by displaying a possibly incomplete set of thread states, and if your users really care, you'd learn about other states from their feedback and requests.

Google:
Thread.State BLOCKED The thread is blocked and waiting for a lock.
Thread.State NEW The thread has been created, but has never been started.
Thread.State RUNNABLE The thread may be run.
Thread.State TERMINATED The thread has been terminated.
Thread.State TIMED_WAITING The thread is waiting for a specified amount of time.
Thread.State WAITING The thread is waiting.
These states are not very well explained - I don't see the difference between BLOCKED and WAITING, for example.
Interestingly, there is no 'RUNNING' state - do these devices ever do anything?

Application priority

Will an application that runs in the foreground for a long time acquire more priority over time?
Let me explain my problem. I ported communication software that talks to a fixed infrastructure to Android, and I'm running some tests. Each test consists of 5 experiments (the mobile node sends some queries to the infrastructure and evaluates the number of successful queries and the mean time), and the result of the test is the mean of the results of these experiments.
During the test the application is always in the foreground.
Across the experiments the results improve, e.g. 10%, 15%, 30%, 40%, 55% of queries OK.
I implemented the system as an activity and not yet as a service.
For the test the app acquires the locks SCREEN_DIM_WAKE_LOCK and WIFI_MODE_FULL.
Thanks

It will not get more priority, and you shouldn't do that on the UI thread.
There are several issues:
The user can dismiss the app, and your important upload process will be paused or canceled. You could add logic to resume after a dismissal, but for this case it doesn't make sense.
When the user dismisses the app, it might get closed by the OS.
You might leave the screen without updates, and if that happens you will get a Force Close.
AFAIK, in upcoming versions of Android, doing network logic on the UI thread will result in an FC, similar to Gingerbread's StrictMode.
Use a Service and spawn a thread with max priority. I am not sure whether setting max priority on a thread in Android will make any difference, but try it.
