RenderScript blocking function invocation

I am new to RenderScript and still do not have a good grasp of which calls from the Java layer block and which do not. The general question is: which calls block the Java code until RenderScript has finished? In particular:
1) From Java I invoked a kernel using forEach_kernel(), and the call did not block; I had to add an extra Allocation.copyTo() before I could use the result. Is there another way?
2) I read somewhere that if there are two kernels, launching the second blocks until the first has finished. Under what conditions does this happen? Perhaps only when both work on the same allocation?
3) Will invokable functions block a) each other, b) kernels? In particular, I have a custom initializer invokable function that prepares some data which is later used by the kernel. This preparation might take some time, so I would like to know whether it is dangerous to call script.invoke_somefunc() from Java and then immediately call script.forEach_kernel().

1) You could use rs.finish() to make sure you wait for the kernel to finish. Kernel execution is indeed asynchronous in RS.
2) We only allow one kernel to execute at a time (ignoring ScriptGroup, where you have a DAG of kernels, and thus maybe some additional room for optimizations). In this case, your second kernel won't start running until the first kernel completes.
3) Invokable functions (i.e. things you run with invoke_*() from Java) are not asynchronous. You will block until they complete on the Java side. Thus, they will block each other, or kernels. If you have a kernel followed by an invoke, you will asynchronously start the kernel, but the invoke won't begin until the kernel finishes. You will then be waiting for the invoke to finish as well.
One more detail. If your initializer doesn't require parameters, you can put it in an actual "void init(void)" function. Those get run once when the ScriptC is created.

My experiments showed that even though the functions are invoked asynchronously at the Java level, they are executed one after another in RenderScript. So basically, having:
script.invoke_somefunc();
script.forEach_kernel();
alloc.copyTo(); // or rs.finish();
returns immediately from the first two lines, but at the RenderScript level the kernel will not start until somefunc has finished. This was not so obvious from the documentation. The third line blocks until everything is finished.
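The ordering described above can be mimicked in plain Java. This is only an analogy, not RenderScript code: a single-threaded executor stands in for the RenderScript command queue, and the class and method names (CommandQueueDemo, run, pause) are made up for illustration. Submitting work returns immediately, the queued commands run strictly in order, and only the final wait blocks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java analogy only: one worker thread stands in for the
// RenderScript command queue. None of these names are RenderScript APIs.
class CommandQueueDemo {
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService queue = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for script.invoke_somefunc(): submit() returns at once.
            queue.submit(() -> {
                pause(50); // pretend the initializer is slow
                log.add("somefunc done");
            });
            // Stand-in for script.forEach_kernel(): also returns at once, but
            // the single worker cannot start it before the previous command ends.
            Future<?> kernel = queue.submit(() -> {
                log.add("kernel done");
            });
            // Stand-in for rs.finish() or Allocation.copyTo(): blocks the caller.
            kernel.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            queue.shutdown();
        }
        return new ArrayList<>(log);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model the slow initializer always completes before the kernel stand-in runs, even though the caller never waited between the two submissions.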

Related

UI-blocking loop behaviour differs (Oreo vs Marshmallow)

I have a small Android application which does a server call to post some User data to a server.
Following is the code :
private boolean completed = false;
private String response; // shared with the background thread

public String postData(final Data data) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                response = callApi(data);
                completed = true;
            } catch (Exception e) {
                Log.e("API Error", e.getMessage());
                completed = true;
            }
        }
    }).start();
    // Busy-wait until the background thread signals completion.
    while (!completed) {
        // Log.i("Inside loop","yes");
    }
    return response.toString();
}
The above method calls the API to post data and returns the response received which works fine.
The loop at the bottom is a UI blocking loop which blocks the UI until a response is received or an error.
The problem :
I tried the same code on a Marshmallow device and an Oreo device, and the results were different.
For Marshmallow : Things moved in line with my expectation. :)
For Oreo (8.1.0) :
The very first API call after I open the app works well enough. However, subsequent API calls cause the UI to block forever, even though an error or response is received from the server (verified by logging and debugging).
However, on setting breakpoints(running in Debug mode) the App moves with much less trouble.
It seems the system is unable to exit the UI blocking loop although the condition is met.
A second observation: when I log a message inside the UI-blocking loop, the system is able to exit the loop and return from the method, though the API response is not logged.
Could someone help me understand this inconsistency between these two versions of Android, and what change might cause this behavior on Oreo but not on Marshmallow?
Any insight would be extremely helpful.

It's more likely to be differences in the processor cache implementation in the two different hardware devices you're using. Probably not the JVM at all.
Memory consistency is a pretty complicated topic, I recommend checking out a tutorial like this for a more in-depth treatment. Also see this java memory model explainer for details on the guarantees that the JVM will provide, irrespective of your hardware.
I'll explain a hypothetical scenario in which the behavior you've observed could happen, without knowing the specific details of your chipset:
HYPOTHETICAL SCENARIO
Two threads: Your "UI thread" (let's say it's running on core 1), and the "background thread" (core 2). Your variable, completed, is assigned a single, fixed memory location at compile time (assume that we have dereferenced this, etc., and we've established what that location is). completed is represented by a single byte, initial value of "0".
The UI thread, on core 1, quickly reaches the busy-wait loop. The first time it tries to read completed, there is a "cache miss". Thus the request goes through the cache, and reads completed (along with the other 31 bytes in the cache line) out of main memory. Now that the cache line is in core 1's L1 cache, it reads the value, and it finds that it is "0". (Cores are not connected directly to main memory; they can only access it via their cache.) So the busy-wait continues; core 1 requests the same memory location, completed, again and again, but instead of a cache miss, L1 is now able to satisfy each request, and need no longer communicate with main memory.
Meanwhile, on core 2, the background thread is working to complete the API call. Eventually it finishes, and attempts to write a "1" to that same memory location, completed. Again, there is a cache miss, and the same sort of thing happens. Core 2 writes a "1" into appropriate location in its own L1 cache. But that cache line doesn't necessarily get written back to main memory yet. Even if it did, core 1 isn't referencing main memory anyway, so it wouldn't see the change. Core 2 then completes the thread, returns, and goes off to do work someplace else.
(By the time core 2 is assigned to a different process, its cache has probably been synchronized to main memory, and flushed. So, the "1" does make it back to main memory. Not that that makes any difference to core 1, which continues to run exclusively from its L1 cache.)
And things continue in this way, until something happens to suggest to core 1's cache that it is dirty, and it needs to refresh. As I mentioned in the comments, this could be a fence occurring as part of a System.out.println() call, debugger entry, etc. Naturally, if you had used a synchronized block, the compiler would've placed a fence in your own code.
TAKEAWAYS
...and that's why you always protect accesses to shared variables with a synchronized block! (So you don't have to spend days reading processor manuals, trying to understand the details of the memory model on the particular hardware you are using, just to share a byte of information between two threads.) A volatile keyword will also solve the problem, but see some of the links in the Jenkov article for scenarios in which this is insufficient.
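As a minimal sketch of the two fixes mentioned in the takeaway, here is the question's method reworked in plain Java. Whatever the underlying cause on a given device, the Java memory model guarantees that a write to a volatile field is visible to later reads of that field from other threads. The class name PostDataDemo and the stub callApi are assumptions made for illustration, and a String stands in for the question's Data type. The second method replaces the busy-wait with a CountDownLatch, which gives the same visibility guarantee without spinning.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PostDataDemo {
    // volatile: the worker's write is guaranteed to become visible to the waiting thread.
    private volatile boolean completed = false;
    // Written before the volatile write and read after the volatile read, so it is visible too.
    private String response;

    // Hypothetical stand-in for the question's callApi(data).
    private String callApi(String data) {
        return "echo:" + data;
    }

    String postDataWithVolatile(String data) {
        completed = false;
        new Thread(() -> {
            try {
                response = callApi(data);
            } finally {
                completed = true; // volatile write publishes 'response' as well
            }
        }).start();
        while (!completed) {
            Thread.yield(); // still a busy-wait, but it now terminates reliably
        }
        return response;
    }

    String postDataWithLatch(String data) {
        CountDownLatch done = new CountDownLatch(1);
        String[] result = new String[1];
        new Thread(() -> {
            try {
                result[0] = callApi(data);
            } finally {
                done.countDown();
            }
        }).start();
        try {
            // Sleeps instead of spinning; gives up after five seconds.
            if (!done.await(5, TimeUnit.SECONDS)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return result[0];
    }
}
```

Either version still blocks the calling thread for the duration of the call, so on Android it would be better not to wait on the UI thread at all.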

How does Xposed Framework hook methods in Android

I am going through the Xposed framework on Android, specifically reading the blog post http://d3adend.org/blog/?p=589 on potential countermeasures, and I have a couple of questions along those lines.
When we hook a method using Xposed, the framework makes that method native and executes the hook code instead. So how is it that the original method still appears in the stack trace?
com.example.hookdetection.DoStuff->getSecret //This one
de.robv.android.xposed.XposedBridge->invokeOriginalMethodNative
de.robv.android.xposed.XposedBridge->handleHookedMethod
com.example.hookdetection.DoStuff->getSecret //This one again
com.example.hookdetection.MainActivity->onCreate
android.app.Activity->performCreate
android.app.Instrumentation->callActivityOnCreate
android.app.ActivityThread->performLaunchActivity
android.app.ActivityThread->handleLaunchActivity
android.app.ActivityThread->access$800
android.app.ActivityThread$H->handleMessage
android.os.Handler->dispatchMessage
android.os.Looper->loop
android.app.ActivityThread->main
java.lang.reflect.Method->invokeNative
java.lang.reflect.Method->invoke
com.android.internal.os.ZygoteInit$MethodAndArgsCaller->run
com.android.internal.os.ZygoteInit->main
de.robv.android.xposed.XposedBridge->main
dalvik.system.NativeStart->main
Also, why does it appear twice in the stack trace? I want to understand the order in which they are executed.
Is the actual method even run? Since the hook code executes, I would not expect the original method code to execute. So how can we possibly add a stack trace detection mechanism in the same method, knowing it would be replaced?

Xposed's inner workings aren't easy to understand if you aren't comfortable with low-level code and the Android kernel. To keep it short: when you open an app on your Android device, a master process called Zygote spawns it as its child process.
The purpose of Xposed is to control Zygote and detect whenever a process is about to be spawned, so that methods can be hooked by replacing their definitions before any calls are made to them.
Xposed gives you a lot of control: you can replace the entire method body, so the original code never gets called, or you can use before-call and after-call hooks, which are basically an application of the trampoline technique.
With a trampoline, a call to the method doesn't go directly to the original code but to an injected code block where the hooker can do anything (dump or change parameters, etc.); it then jumps back to the genuine code. The same can be done after the genuine code runs, to capture the method's output. Xposed implements this with beforeHookedMethod and afterHookedMethod.
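The before/after idea can be sketched in plain Java. This is not the Xposed API and not a real trampoline (which patches the method entry at the native level); it only illustrates the call order "before hook, original, after hook". The names HookSketch, Hook, hook, getSecret and demo are all made up for the example.

```java
import java.util.function.Function;

class HookSketch {
    // Hypothetical callback type, loosely mirroring a before/after hook pair.
    interface Hook {
        String before(String arg);    // may inspect or rewrite the argument
        String after(String result);  // may inspect or rewrite the result
    }

    // Wraps 'original' so every call runs: before -> original -> after.
    static Function<String, String> hook(Function<String, String> original, Hook h) {
        return arg -> h.after(original.apply(h.before(arg)));
    }

    // Stand-in for the method being hooked.
    static String getSecret(String user) {
        return "secret-of-" + user;
    }

    static String demo() {
        Function<String, String> hooked = hook(HookSketch::getSecret, new Hook() {
            @Override
            public String before(String arg) {
                return arg + "!";             // change the parameter
            }

            @Override
            public String after(String result) {
                return "[" + result + "]";    // change the return value
            }
        });
        return hooked.apply("alice");
    }
}
```

Here demo() returns "[secret-of-alice!]": the original body still runs, but both its input and its output passed through the injected code.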
Adding a stack trace detection mechanism won't help at all. You have to call Java methods to get the actual stack trace, and that can be defeated easily by hooking getStackTrace: save a valid genuine stack trace, and whenever getStackTrace is called and the result contains Xposed methods, return the previously saved genuine stack trace.
Your best bet is to rely on native code to detect it, but even then any determined and experienced hacker with full device control can manage to defeat it eventually.
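For reference, the kind of stack trace check under discussion might look like the sketch below. The package prefix comes from the frames in the question's stack trace; the class and method names (StackTraceCheck, containsXposedFrame, calledThroughXposed) are illustrative only. As explained above, a check like this runs entirely in Java, so it can itself be hooked and should be treated as a weak signal at best.

```java
class StackTraceCheck {
    // Package prefix taken from the frames shown in the question's stack trace.
    static final String XPOSED_PREFIX = "de.robv.android.xposed.";

    // Returns true if any frame belongs to a class under the Xposed package.
    static boolean containsXposedFrame(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) {
            if (frame.getClassName().startsWith(XPOSED_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    // Inspects the current thread's own call stack.
    static boolean calledThroughXposed() {
        return containsXposedFrame(Thread.currentThread().getStackTrace());
    }
}
```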

To add to the above points: when you call XposedHelpers.findAndHookMethod, the callback can be either -
XC_MethodHook : Callback class for method hooks. Usually, anonymous subclasses of this class are created which override beforeHookedMethod(XC_MethodHook.MethodHookParam) and/or afterHookedMethod(XC_MethodHook.MethodHookParam).
XC_MethodReplacement : A special case of XC_MethodHook which completely replaces the original method.
The first just provides hooks to execute code before and after the original method, whereas the second replaces it completely. E.g. the Xposed example on GitHub.
Couple of posts I have written -
Creating a new Xposed module in Android
Installing Xposed Framework on Android devices

Modifying Dalvik Virtual Machine to intercept methods of Application code

In my current implementation, I can only intercept the Method_Entry event of some class initialization methods, including:
*.<init> or *.<clinit>
* stands for any class
All the methods written in Java applications are missing.
Currently, I have inserted fprintf() calls in the following places:
stack.cpp: dvmCallMethod()
stack.cpp: dvmCallMethodV()
stack.cpp: dvmCallMethodA()
stack.cpp: dvmInvokeMethod()
Interp.cpp: dvmInterpret()
Mterp.cpp: dvmMterpStd()
When these places in the DVM are executed, a message is printed to my log file. However, only the class initialization functions have triggered my fprintf() code. In other words, it looks like the execution of application methods does not go through the above places in the DVM. I don't know which part of the DVM is responsible for executing application methods. Can anyone give me a clue?

The easiest way to figure out how things work is to look at how the method profiling works. Profiling adds an entry to a log file every time a method is called. The key file is dalvik/vm/Profile.h, which defines macros like TRACE_METHOD_ENTER. (In gingerbread, this was all you needed to look for. The situation changed quite a bit in ICS, when the interaction between debugging, profiling, and JIT compilation got reworked. And KitKat added the "sampling" profiler into the mix. So it's a bit more twisty now, and there are some other functions to be aware of, like dvmFastMethodTraceEnter().)
The entry points you've identified in your question are for reflection and calls in and out of native code. Calls between interpreted code are handled by updating the stack and program counter, and just continuing to loop through the interpreter. You can see this at line 3928 in the portable interpreter.
The non-obvious part is the FINISH() macro, defined on line 415. This calls into dvmCheckBefore(), line 1692 in Interp.cpp. This function checks the subMode field to see if there is anything interesting to do; you can find the various meanings in the definition, line 50 in InterpState.h. In short, flags are used for various profiling, debugging, and JIT compilation features.
You can see a subMode check on line 3916 in the portable interpreter, in the method invocation handling. It calls into dvmReportInvoke(), over in Interp.cpp, which invokes the TRACE_METHOD_ENTER macro.
If you're just trying to have something happen every time any method is invoked, you should probably just wire it into the profiling system, as that's already doing what you want. If you don't need the method profiling features, just replace them with your code.
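To make the dispatch described in this answer concrete, here is a toy model in Java. It is not Dalvik code: the class ToyInterpreter, the flag SUBMODE_METHOD_TRACE, and the representation of a method as a list of callee names are all simplifications invented for the sketch. It shows why a print statement in the external entry point fires only once, while a flag check at the invoke site inside the loop sees every call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; the names echo the answer but this is not Dalvik code.
class ToyInterpreter {
    static final int SUBMODE_METHOD_TRACE = 1; // hypothetical flag bit

    // Each "method" is just the ordered list of methods it calls.
    final Map<String, String[]> methods = new HashMap<>();
    final List<String> traced = new ArrayList<>();
    int subMode = 0;
    int callMethodHits = 0;

    private static final class Frame {
        final String[] code;
        int pc = 0;

        Frame(String[] code) {
            this.code = code;
        }
    }

    // Analogue of an external entry point: used only when entering from outside.
    void callMethod(String name) {
        callMethodHits++;
        interpret(name);
    }

    private void interpret(String entry) {
        Deque<Frame> stack = new ArrayDeque<>();
        enter(stack, entry);
        while (!stack.isEmpty()) {
            Frame frame = stack.peek();
            if (frame.pc == frame.code.length) {
                stack.pop(); // "return": drop the frame and resume the caller
                continue;
            }
            // "invoke": push a frame and keep looping; callMethod() is not involved.
            enter(stack, frame.code[frame.pc++]);
        }
    }

    private void enter(Deque<Frame> stack, String name) {
        // The flag check at the invoke site is where tracing hooks in.
        if ((subMode & SUBMODE_METHOD_TRACE) != 0) {
            traced.add(name);
        }
        stack.push(new Frame(methods.get(name)));
    }
}
```

With main calling a and b, and a calling b, a single callMethod("main") records four method entries in traced when the flag is set, while callMethodHits stays at 1.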

Android application and time-consuming native code

I am working on an image processing Android application. Suppose you have a C++ singleton object that provides some time-consuming functions and allocates its own memory. Furthermore, the C++ library provides some other functions that do time-consuming work as well. These functions are called by the singleton object. They can allocate their own temporary memory (which is freed when the function returns) and need to exchange data with the singleton object. The workflow is the following:
1) The native C++ library is loaded and the singleton object is created (it allocates memory and loads data from the asset directory).
2) The user, through the application interface, selects an image and loads it.
3) The image is passed to the singleton object, which computes some information.
4) The user can request a particular image processing algorithm; the singleton object is asked to call the corresponding function.
5) Repeat from 4, or go to 2 if the user loads another image (the singleton object is reset, but the memory allocated in step 1 is retained until the application is terminated).
Step 2 and 3 are the most time-consuming parts of the app. I would like the user to be able to stop the current processing if too much time has passed, and the application to remain responsive during the time-consuming processing algorithms. The simplest way to write this app is to call the native functions and wait for them to return, but this will probably block the UI. Another way is to design those functions to check a flag every N processed pixels to know whether the function must stop (this would allow me to free memory when that happens). A third option could be to use Java threads, but how?

You will have to run the time consuming task off the UI thread. You could do this with a native thread, but it would be simpler to call the native function from a background thread in java - there are several ways you can do that, such as an async task, etc which you can read about.
When you start the time consuming operation, you'll want the UI to display some sort of busy indicator to the user. The UI thread will have to remain responsive (ie, the user can 'back' or 'home') but you can disable most of your other controls if you wish.
Your native operation in the background thread would, as you suggested, periodically check a stop request flag. You will probably find it easiest to make that a native flag and set it with another (brief) native function called from the UI thread; there's the option of making it a java flag and calling java from C to check it, but that seems more complicated.
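A rough sketch of that arrangement, written in plain Java so it can run anywhere: the long-running process() method stands in for the native function, and the stop flag is an AtomicBoolean rather than a native variable. All names here (CancellableWork, requestStop, process, demo) are hypothetical. The worker polls the flag every checkEvery pixels, as suggested in the question.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CancellableWork {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Called from the UI thread: brief and never blocks.
    void requestStop() {
        stopRequested.set(true);
    }

    // Stand-in for the long native call; returns the number of pixels processed.
    int process(int totalPixels, int checkEvery) {
        int done = 0;
        while (done < totalPixels) {
            // Poll the flag every 'checkEvery' pixels rather than on every pixel.
            if (done % checkEvery == 0 && stopRequested.get()) {
                break; // a real implementation would free temporary memory here
            }
            done++; // pretend to process one pixel
        }
        return done;
    }

    // Runs the work on a background thread and cancels it from the calling thread.
    static int demo() {
        CancellableWork work = new CancellableWork();
        int[] processed = new int[1];
        Thread worker = new Thread(() -> processed[0] = work.process(Integer.MAX_VALUE, 1024));
        worker.start();
        work.requestStop();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed[0];
    }
}
```

In a real app the caller would not join() on the UI thread; the worker would instead report back when it finishes, for example through whatever background-task mechanism the app already uses.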
If your processing is going to be especially lengthy, arguably you should do the work not only in the background, but in the context of an Android service rather than that of an activity. To a first approximation, native code will not care about the difference, however there are potential implications for what happens if the activity goes to the background during processing - if the work is being done in a service (or more specifically, if the process contains a service which is active), Android will try to let it keep running if possible. In contrast, if the process only has an activity which is now not active because something else is in the foreground, Android is more likely to kill it or throttle its available CPU. Ultimately, whatever you do your native code will need to deal with the possibility of its process being killed before the work is done - ie, you have to be able to recover from such a state when a new process is created as the user returns your activity to the foreground. Having your flag also able to notify the native code of an onDestroy() call as an alert to save its work could be a help, but it will still need to be able to recover (at least cleanly re-do) from being killed without the courtesy of that notification.

How does the Dalvik VM save and restore its registers between method calls?

Semantically, the Dalvik VM has a fresh set of registers for each method, and does not have instructions to access the call stack. But in terms of its implementation, the registers should be saved somehow on method calls and restored on method returns. How does the (Google's implementation of) Dalvik do this?

The registers that Dalvik bytecode refers to are not machine registers at all; they are actually locations on the call stack. Whenever you call into a method, Dalvik allocates enough memory in that method's stack frame to hold all the registers that the method needs.
Note that not every calculation modifies the value on the stack immediately; the VM obviously has to load the values into machine registers in order to do the calculations. The results may be kept in a machine register for later use without immediately being written back to the corresponding stack location, at the discretion of the VM. The values are flushed back to the call stack if and when needed (e.g. when you call into another method, use various sync constructs, or the VM needs the machine register for another calculation).
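A toy model may make the "registers are slots in the frame" idea easier to picture. This is a simplification, not Dalvik's implementation: frames are heap objects here for brevity, and the names ToyFrames, Frame and factorial are invented. The point it illustrates is that nothing needs to be saved or restored around a call, because the callee simply gets its own frame and the caller's slots are never touched.

```java
class ToyFrames {
    // One frame per method invocation; 'registers' plays the role of v0..vN-1.
    static final class Frame {
        final int[] registers;
        final Frame caller; // link back to the calling frame

        Frame(int registerCount, Frame caller) {
            this.registers = new int[registerCount];
            this.caller = caller;
        }
    }

    // A "method" that needs two registers: v0 holds the argument, v1 the result.
    static int factorial(int n, Frame caller) {
        Frame frame = new Frame(2, caller); // fresh registers for this call
        frame.registers[0] = n;
        if (frame.registers[0] <= 1) {
            frame.registers[1] = 1;
        } else {
            // The callee allocates its own frame, so v0 here is untouched by the call.
            int sub = factorial(frame.registers[0] - 1, frame);
            frame.registers[1] = frame.registers[0] * sub;
        }
        return frame.registers[1];
    }
}
```

Calling factorial(5, null) yields 120; each of the five nested invocations reads its own v0 after the recursive call returns.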

Here is the source repository for Dalvik; you can walk through it to find the implementation: android source
