Is memcpy() / mktime() thread-safe on iOS & Android?

I have a C library which I'm cross-compiling to use in Android & iOS apps.
It makes use of memcpy() and mktime() so I want to know if these functions are implicitly thread-safe when used in multi-threaded environments.
iOS apps compiled with modern Xcode and Android libraries compiled with modern Android NDK use a clang compiler which is LLVM-based.
I've reviewed the following questions, but have been unable to find a definitive answer:
Is memcpy process-safe?
Are functions in the C standard library thread safe?

POSIX requires of conforming implementations that all functions it standardizes be thread safe, with the exception of a relatively short list of functions. memcpy() and mktime() are both covered by POSIX, and neither is on the list of exceptions, so POSIX requires them to be thread safe (but read on).
Note well, however, that this is not a matter of the compiler used, but rather of the C library that supports your application. I recall Apple's C libraries being non-conforming in some areas. Nevertheless, there's nothing in particular about memcpy() and mktime() that makes them inherently risky from a thread safety perspective. That is, there's no reason to expect that they access any shared data, except any provided to them via their arguments.
And there's the rub. You can rely on memcpy() and mktime() not to, say, rely internally on static data, but POSIX's requirement for thread safety does not extend to working as documented in the face of data races you create through choice of arguments. Thus, for example, if two different threads call memcpy(), and the target region of one call overlaps either the source or target region of the other, then you need some flavor of synchronization between the threads.
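As a minimal sketch of that last point, assuming POSIX threads are available (as they are on both Android and iOS), the following C code guards a shared destination buffer with a mutex so that concurrent memcpy() calls cannot interleave. The names shared_buf, shared_buf_write, shared_buf_read and run_two_writers are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define BUF_SIZE 64

/* Shared destination buffer plus the mutex that guards it (hypothetical names). */
static char shared_buf[BUF_SIZE];
static pthread_mutex_t shared_buf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy src into the shared buffer while holding the lock, so two
   writers can never interleave their bytes. */
void shared_buf_write(const char *src, size_t len)
{
    assert(len <= BUF_SIZE);
    pthread_mutex_lock(&shared_buf_lock);
    memcpy(shared_buf, src, len);
    pthread_mutex_unlock(&shared_buf_lock);
}

/* Copy the shared buffer out under the same lock to get a consistent snapshot. */
void shared_buf_read(char *dst, size_t len)
{
    assert(len <= BUF_SIZE);
    pthread_mutex_lock(&shared_buf_lock);
    memcpy(dst, shared_buf, len);
    pthread_mutex_unlock(&shared_buf_lock);
}

/* Thread entry point: fill the whole shared buffer with one repeated byte. */
static void *writer(void *arg)
{
    char local[BUF_SIZE];
    memset(local, *(const char *)arg, sizeof local);
    shared_buf_write(local, sizeof local);
    return NULL;
}

/* Run two writers concurrently. Returns 1 if the final buffer is uniform
   (all 'A' or all 'B'), i.e. no bytes from the two copies were mixed. */
int run_two_writers(void)
{
    pthread_t t1, t2;
    char a = 'A', b = 'B';
    char snap[BUF_SIZE];

    pthread_create(&t1, NULL, writer, &a);
    pthread_create(&t2, NULL, writer, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    shared_buf_read(snap, sizeof snap);
    for (size_t i = 1; i < sizeof snap; i++)
        if (snap[i] != snap[0])
            return 0;
    return snap[0] == 'A' || snap[0] == 'B';
}
```

Which writer wins is unspecified; the lock only guarantees that the buffer ends up holding one complete copy rather than a mixture of both.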

Whether memcpy() is thread-safe is open to some debate.
I would say that memcpy() is indeed thread-safe. It doesn't rely on any (global) state that could be corrupted by multiple instances of memcpy() running at once. This, however, doesn't mean that some magic prevents a memory area that is concurrently the copy destination of several threads' memcpy() calls from being mangled; the copy as a whole is not atomic. You have to take care of that yourself, using mutexes to ensure atomicity.
mktime() is trivially thread-safe, since it doesn't return a pointer to a static buffer or keep similar state between calls. The man page mentions a few functions from that family that are not thread-safe (those have corresponding *_r functions), but mktime() is not among them.
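One detail worth illustrating: mktime() normalizes the struct tm it is given in place, so that struct is data you pass in, and it should not be shared between threads without synchronization. A minimal sketch, using a hypothetical helper normalize_local_date() that keeps its struct tm on the calling thread's stack:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: builds a private struct tm for noon on the given
   local date, lets mktime() normalize it, and copies the result to *out.
   Nothing here is shared between threads except what the caller passes in. */
time_t normalize_local_date(int year, int month, int day, struct tm *out)
{
    assert(out != NULL);

    struct tm local = {0};
    local.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    local.tm_mon = month - 1;      /* and months from 0 */
    local.tm_mday = day;           /* may be out of range; mktime() fixes it */
    local.tm_hour = 12;
    local.tm_isdst = -1;           /* let the library decide about DST */

    time_t t = mktime(&local);     /* rewrites the fields of local in place */
    *out = local;
    return t;
}
```

Calling normalize_local_date(2021, 1, 32, &out) from several threads at once is fine as long as each thread passes its own out; "January 32" comes back as February 1.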

Related

c++ thread_local destructors with pthread destructors

I want to do some work after all C++ thread_local destructors have been called.
This is platform-specific (Android), so I have access to pthreads.
The question is: when are destructors registered with pthread_key_create called, before or after the C++ thread_local destructors? Or can they be interleaved?
I tested on Linux Mint, and there the pthread destructors were called after the C++ ones.
bionic/pthread_exit.cpp currently has the same order:
void pthread_exit(void* return_value) {
  // Call dtors for thread_local objects first.
  __cxa_thread_finalize();
  // Call the TLS destructors.
  pthread_key_clean_all();
However, this is not documented behavior and you should not build something relying on it.
libstdc++ from GCC uses pthread_key_create in case the platform does not provide __cxa_thread_atexit_impl. In this case, C++ destructors run somewhere in the middle of the POSIX destructors.
To my knowledge, there is no standard which requires any particular behavior here because C++ does not know about POSIX and POSIX does not know about C++, so neither standard says what happens here. There are also some corner cases involving the resurrection of thread-local data during thread destruction which will vary among implementations. (A typical example is a per-thread logger object which is used to log from destructors of thread-local variables.)
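For reference, here is a minimal sketch of the POSIX side of this mechanism in plain C. It only shows that a destructor registered with pthread_key_create() runs at thread exit when the thread's value for the key is non-NULL; it makes no claim about ordering relative to C++ thread_local destructors, which, as noted above, is not specified. The names cleanup_key, on_thread_exit and run_worker_and_count_destructors are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t cleanup_key;
static pthread_once_t cleanup_key_once = PTHREAD_ONCE_INIT;
static int destructor_runs;   /* written by the exiting thread, read after join */

/* Called by the pthread runtime at thread exit, but only for threads
   whose value for cleanup_key is non-NULL. */
static void on_thread_exit(void *value)
{
    free(value);
    destructor_runs++;
}

/* Create the key exactly once, no matter how many threads race here. */
static void make_key(void)
{
    int rc = pthread_key_create(&cleanup_key, on_thread_exit);
    assert(rc == 0);
    (void)rc;
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&cleanup_key_once, make_key);

    /* Storing a non-NULL value arms the destructor for this thread. */
    void *token = malloc(1);
    assert(token != NULL);
    pthread_setspecific(cleanup_key, token);

    return NULL;   /* returning from the start routine ends the thread */
}

/* Start one worker, wait for it, and report how often the destructor ran. */
int run_worker_and_count_destructors(void)
{
    pthread_t t;
    destructor_runs = 0;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return destructor_runs;
}
```

Because the ordering against thread_local destructors is undocumented, work placed in on_thread_exit() should not assume that C++ thread-local objects are still alive, or already gone.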

Android JNI under the hood

I cannot find any detailed explanation of how JNI works on Android, so:
Since every Android application runs in its own process, with its own instance of the Dalvik/ART virtual machine, I think that the native code will be executed in the same process, am I right?
I read that when the VM invokes a function, it passes a JNIEnv pointer, a jobject pointer, and any Java arguments declared by the Java method.
But how is this made at assembly level (under the hood)?
I read that you can instantiate objects, call methods, and so on, like Reflection, using the functions provided by the JNIEnv. Therefore, my question is: have I a "direct" memory access to the VM, or do I always have to use the JNIEnv's functions?
The Android JVM is under the Apache license, so the most detailed and precise description can be found in the form of source code. Note that there are two different JVMs: Dalvik and ART. Under the hood they are very different, to the extent that a user of JNI may need to consider special adaptations.
the native code will be executed in the same process
Exactly. Note that an Android app can run in more than one process, and also it can spawn child processes (normal Unix behavior). But JNI is not IPC.
how is this made at assembly level?
More or less, this is described in a related question: What does a JVM have to do when calling a native method?
have I a "direct" memory access to the VM?
Yes, you have. There is no security barrier between your C code and the JVM. You can reverse engineer the data structures, and do whatever you like. The exact implementations of the JVM not only depend on the Android version, but may be modified without notice by the vendor, as long as the public API of the JVM (including JNI) is compatible. The chances that you will do something useful with direct memory access to JVM are minimal, but the risk that it will crash is very high.
Note that this is not a security issue: your C code runs in the same process as your Java code, separate from other apps, and is subject to the same permission restrictions as the Java code. It has no access to the private memory of other apps or processes. Whatever you change in your instance of the JVM will not affect the VMs that run other apps.

How does AndFix patch methods?

I learned recently of an Android library AndFix which allows for live method patching. Now, as far as I know, Dalvik does not allow runtime manipulation of bytecode or dex.
Can someone provide a good explanation on how AndFix does live patching?
Looking at the sources, you can see the patch mechanism for Dalvik here. The dalvik_replaceMethod() function is modifying the internal Dalvik state, changing the Method struct to point to a replacement method.
It doesn't modify the DEX on disk or in memory, just routes the method calls to a replacement method. This approach is highly version-dependent, as changes to Method or the way methods work will break things. Dalvik hasn't changed much since mid-2011, which makes it easy, but if you look at the nearby "art" directory you can see different implementations for each major version of Android.
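The idea of rerouting calls by editing a method record, rather than the code itself, can be illustrated with a toy C model. This is only an analogy: struct toy_method and the functions below are hypothetical and do not mirror Dalvik's actual Method struct or AndFix's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM's per-method record. */
typedef int (*method_entry_fn)(int);

struct toy_method {
    const char *name;
    method_entry_fn entry;   /* where calls to this method are routed */
};

/* The "shipped" implementation with a bug, and its replacement. */
static int buggy_double(int x) { return x + x + 1; }
static int fixed_double(int x) { return x * 2; }

/* Every caller goes through the record instead of calling the code directly. */
int toy_invoke(const struct toy_method *m, int arg)
{
    assert(m != NULL && m->entry != NULL);
    return m->entry(arg);
}

/* "Patch" the method: the original code stays untouched in memory,
   only the record is redirected to the replacement's entry point. */
void toy_replace_method(struct toy_method *target,
                        const struct toy_method *replacement)
{
    assert(target != NULL && replacement != NULL);
    target->entry = replacement->entry;
}

/* Returns the result of calling the method after patching it. */
int toy_patch_demo(int arg)
{
    struct toy_method m = { "doubleIt", buggy_double };
    struct toy_method fix = { "doubleIt", fixed_double };
    toy_replace_method(&m, &fix);
    return toy_invoke(&m, arg);
}
```

The fragility described above shows up even in this toy: anything that changes the layout or meaning of the record breaks the patcher, which is why a real implementation needs per-version variants.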

Is RenderScript local thread synchronization possible?

On a recent SO question, I explained how calling a RenderScript kernel multiple times will effectively force all threads to be globally synchronized between calls.
I am currently working with multiple convolutions applied in sequence to image data. Since the convolution algorithm requires reading surrounding pixel data of the input image, I have implemented a workflow where my own custom kernel is called multiple times -- to make sure that at every step, all data from the previous convolution is ready and available at the correct coordinates. This technique has worked great for me so far.
However, in my constant quest for optimization, I have noticed that there is much performance to be obtained by keeping intermediate values in local registers for a thread, instead of writing them back to the global memory allocation in between kernel calls. If I were able to chain these convolutions in such a way, things would run much quicker. The problem is obviously that accessing the registers of surrounding threads is not really possible. Furthermore, this would require threads to run in synch to make sure these intermediate values in between stages get calculated in the expected order.
In CUDA and OpenCL, these issues are very common, and are addressed by well-known barrier synchronization + shared memory tiling techniques, which in turn depend on the concept of CUDA thread blocks or OpenCL work groups. I believe these concepts are non-existent in RenderScript, as this issue is very much tied to the wildly different architectures between desktop-class GPU's and mobile SoC's.
So my obvious question here is, are such things possible in RenderScript? That is, better management of threads and possibly thread groups for quicker data sharing among them.
In the Google I/O 2013 RenderScript talk by Jason Sams and Tim Murray, it is discussed how Script Groups might be able to do some behind-the-scenes optimizations, such as cross-device parallelization, memory tiling, and kernel fusion; all this by analyzing at runtime the dependency DAG in the group, and either automatically creating allocations where needed or possibly optimizing them away. I'm assuming this last bit refers to fusing kernels so that they work off their own local data, kind of how I mentioned above keeping data in local registers and combining separate steps inside a single kernel.
All this seems very much in line with what I'm looking for, especially since my application is indeed a well-defined DAG of inter-dependent operations (for a Convolutional Neural Network). So if Script Groups are indeed a plausible mobile-centric alternative to these mechanisms, I'm wondering if there is any way of influencing how and where these optimizations happen. Or if not, how much can the runtime be trusted to make the correct inference from my data dependencies given the hardware it's running on -- in the specific case of "surrounding" pixel data access of the convolutional algorithm.
I realize this might all still be work in progress, and methods would be highly hardware dependent at this point. So if there is no straight solution for such matters at the present time -- I'd be very much willing to accept a speculative answer on how this kind of workflow might potentially be approached by RenderScript in future releases.
I'd be immensely grateful for some insight about this, as it would greatly affect the development direction of my own project going forward, not to mention there are surely many other people out there wondering how such general parallel computing tasks can be handled in RS.
Thank you very much!
As you've discovered, there's no way in RS to directly share data across threads. However, what you are describing can be done using a ScriptGroup. The catch is that each script in the group has to be unique, so you cannot feed your same script over and over. At least, not as it is written now. You could certainly put the "core" of your script in a RS header and include it from multiple kernels. The ScriptGroup allows you to have the output from one script become the input of another, or the output of one script becomes a global field in another. The documentation states that the kernel to kernel (output to input) is the more efficient use case. Using this approach, your synchronization issue would be resolved as the engine will execute the first script against the entire input data set before starting the second script, etc. The scripts themselves will be parallelized appropriately for the hardware (using either CPU or GPU/DSP). The engine will not have to pop back out to Java between scripts and can also manage the data allocations behind the scenes, if needed.
Something you may notice is the ScriptGroup utilizes Script.KernelID or Script.FieldID in order to identify the exact script or field in which to connect two kernels. Your custom scripts have these things auto-generated as long as you explicitly call out your kernel function using the RS compiler attribute pragma. Then you can call getKernelID_<name> (where 'name' is the kernel function name from your script) to get the kernel ID.

Dalvik VM & Java Memory Model (Concurrent programming on Android)

I am working on Android projects that involve a lot of concurrent programming, and I am going to implement some custom inter-thread communication mechanisms (the ones in java.util.concurrent are not well suited to my purposes).
Concurrent programming is not easy in general, but with Dalvik it seems to be even harder. To write correct code you need to know some specific things, and that is where the problem arises with Dalvik: I just can't find detailed documentation about the Dalvik VM. Most Android resources (even developer.android.com) focus on the platform API and don't provide any deep information about non-trivial (or low-level) matters.
For example, to which edition of the Java Language Specification does the Dalvik VM conform? Depending on the answer, the treatment of volatile variables differs, which affects any concurrent code that uses them.
There are already some related questions:
Is Dalvik's memory model the same as Java's?
Double checked locking in Android
and some answers by fadden are very useful, but I still want a more detailed and complete understanding of the matter.
So below are the raw questions I am interested in (I will update the list if necessary as answers to earlier questions arrive):
1. Where can I find details about the Dalvik VM that would answer the questions below?
2. To which edition of the Java Language Specification does the Dalvik VM conform?
3. If the answer to (2) is "third edition", then how complete is Dalvik's support for the Java Memory Model defined in this specification? And especially, how complete is the support for the semantics of volatile variables?
4. In Double checked locking in Android, fadden provides the following comment:
Yup. With the addition of the "volatile" keyword, this will work on uniprocessor (all versions of Android) and SMP (3.0 "honeycomb" and later)
Does this mean that a Samsung Galaxy S II, which has a dual-core CPU but only Android 2.3, may execute concurrent code incorrectly? (Of course the Galaxy is only an example; the question is about any multicore device with a pre-Android 3.0 platform.)
5. In Is Dalvik's memory model the same as Java's?, fadden provides an answer containing the following sentence:
No currently-shipping version of Dalvik is entirely correct with respect to JSR-133
Does this mean that any existing, correct concurrent Java code may work incorrectly on any Android version released up to the date that comment was posted?
Update #1: Answer to @gnat's comment (too long to be a comment)
@gnat posted a comment:
@Alexey Dalvik does not conform to any JLS edition, because conformance requires passing JCK which is not an option for Dalvik. Does it mean that you even can't apply standard Java compiler because it conform to standard specification ? does that matter? if yes, how?
Well, my question was somewhat ambiguous. What I actually meant is that the JLS is not only a set of rules for Java compiler implementations but also an implicit guideline for any JVM implementation. Indeed the JLS, for example, states that reads and writes of some types are atomic operations. That is not very interesting for a compiler writer, because a read or write is translated into just a single opcode. But it is essential for any JVM implementation, which has to implement these opcodes properly. Now you should see what I am talking about. While Dalvik accepts and executes programs compiled with the standard Java compiler, there are no guarantees that they are executed correctly (as you may expect), simply because no one (except maybe Dalvik's developers) knows whether all the JLS features used in the program are supported by Dalvik.
It is clear that JCK is not an option for Dalvik and it is Ok, but programmers really should know on which features of JLS they may rely when execute their code on Dalvik. But there is no any words about this in documentation. While you may expect that the simplest operators like =, +, -, *, etc. work as you expect, what about non-trivial features like the semantics of volatile variables (which differ between the 2nd and 3rd editions of the JLS)? And the latter is not the most non-trivial thing you can find in the JLS, and in the Java Memory Model in particular.
I haven't read your question completely,
but first of all, do not use volatile; even OpenGL ES coders do not use it for separate UI and renderer threads.
Use volatile if and only if one thread writes (say, to some class's static property)
and another reads; even then you have to synchronize. Read this for some good ways to handle counts:
How to synchronize a static variable among threads running different instances of a class in java?
Always synchronize.
Do not jump into large projects for such difficult topics as concurrent programming.
Go through the Android game samples, where they discuss the concept of runnables,
handlers, and exchanging messages between threads (UI thread and renderer thread).
I think you answered your own question. Although you have given no details as to why the java.util.concurrent package does not fit your needs, most mobile apps simply use asynchronous I/O and a minimum of threading. These devices aren't supercomputers capable of serious distributed processing, so I have a little difficulty understanding why java.util.concurrent won't meet your needs.
Secondly, if you have questions about the Dalvik implementation and whether it conforms to the JLS (it doesn't), it would stand to reason that the only reliable support for threading mechanisms would be those which the language defines: java.util.concurrent, Runnable, and thread-local storage.
Hand-rolling anything outside of the built-in language support is only asking for trouble and, as your question suggests, will likely not be supported in a consistent manner on Dalvik.
As always when you think you can do threading better than the guys who wrote Java, think again.
<copied from comment> Dalvik does not conform to any JLS edition, because conformance requires passing JCK which is not an option for Dalvik. </copied from comment>
programmers really should know on which features of JLS they may rely when execute their code on Dalvik
I think the only way for them to know is to study Dalvik test suite (I bet there's one and I expect it is open source isn't it?). For any feature you need, 1) try to find a test that would fail if your feature is implemented incorrectly and check if the test looks good enough. If there's no such test or it's not good enough, 1a) add new or improve existing test. Then, 2) find out if the test has been successfully run against your target implementation. If test hasn't run then 2a) run it yourself and find out if it passes or fails.
BTW above is roughly about how JCK works. The main difference is that one has to invest own time and effort with Dalvik for things one gets from Sun/Oracle for granted. Another difference seems to be that for Dalvik this is not documented while Snorcle has clear docs on that iirc
But there is no any words about this in documentation.
well if there are no words on that then I'd say quality of Dalvik documentation is suboptimal. Softly speaking
Here's the honest answer. If java.util.concurrent is not up to the task for your implementation, then your problem isn't java.util.concurrent but your original design specification. Revisit your design, and maybe post here what in your design makes simple mutexes not up to the task, so the community can show you how to design it better.
