I am working on Android projects that involve a lot of concurrent programming, and I am going to implement some custom inter-thread communication mechanisms (the ones from java.util.concurrent are not well suited to my purposes).
Concurrent programming is not easy in general, but with Dalvik it seems to be even harder. To write correct code you need to know some platform-specific details, and that is where the problem arises with Dalvik: I just can't find detailed documentation about the Dalvik VM. Most Android resources (even developer.android.com) focus on the platform API and don't provide any deep information about non-trivial (or low-level) topics.
For example, to which edition of the Java Language Specification does the Dalvik VM conform? Depending on the answer, the treatment of volatile variables is different, and this affects any concurrent code that uses volatile variables.
There are already some related questions:
Is Dalvik's memory model the same as Java's?
Double checked locking in Android
and some answers by fadden are very useful, but I still want to get a more detailed and complete understanding of the matter in question.
So below are the raw questions I am interested in (I will update the list if necessary as answers to the previous questions arrive):
Where can I find details about the Dalvik VM that may provide the answers to the questions below?
To which edition of the Java Language Specification does the Dalvik VM conform?
If the answer to (2) is "third edition", how complete is Dalvik's support of the Java Memory Model defined in that specification? In particular, how complete is the support for the semantics of volatile variables?
In Double checked locking in Android, fadden provides the following comment:
Yup. With the addition of the "volatile" keyword, this will work on uniprocessor (all versions of Android) and SMP (3.0 "honeycomb" and later)
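For context, the pattern being discussed is the double-checked locking idiom with a volatile field, roughly as in the following minimal sketch (the class name is illustrative):

```java
// Double-checked locking: only correct when 'instance' is declared volatile
// (and only under the JSR-133 memory model).
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        Singleton result = instance;          // first (unsynchronized) check
        if (result == null) {
            synchronized (Singleton.class) {
                result = instance;            // second check, under the lock
                if (result == null) {
                    instance = result = new Singleton();
                }
            }
        }
        return result;
    }
}
```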
Does this mean that a Samsung Galaxy S II, which has a dual-core CPU but only Android 2.3, may execute concurrent code incorrectly? (Of course, the Galaxy is only an example; the question is about any multi-core device running a pre-3.0 Android platform.)
In Is Dalvik's memory model the same as Java's?, fadden provides an answer with the following sentence:
No currently-shipping version of Dalvik is entirely correct with respect to JSR-133
Does this mean that existing, correct concurrent Java code may work incorrectly on any Android version released up to the date of that comment?
Update #1: Answer to @gnat's comment (too long to be a comment itself)
@gnat posted a comment:
#Alexey Dalvik does not conform to any JLS edition, because conformance requires passing JCK which is not an option for Dalvik. Does it mean that you even can't apply standard Java compiler because it conform to standard specification ? does that matter? if yes, how?
Well, my question was somewhat ambiguous. What I actually meant is that the JLS contains not only the rules for Java compiler implementations, but also implicit guidelines for any JVM implementation. For example, the JLS states that reads and writes of some types are atomic operations. This is not very interesting for a compiler writer, because such reads and writes translate into single opcodes; but it is essential for any JVM implementation, which must implement those opcodes properly. Now you should see what I am talking about. While Dalvik accepts and executes programs compiled with a standard Java compiler, there is no guarantee that they are executed correctly (i.e., as you would expect), simply because no one (except perhaps Dalvik's developers) knows whether all the JLS features used in the program are supported by Dalvik.
It is clear that the JCK is not an option for Dalvik, and that is OK, but programmers really should know which features of the JLS they may rely on when their code executes on Dalvik. Yet there is not a word about this in the documentation. While you may expect the simplest operators like =, +, -, *, etc. to work as expected, what about non-trivial features such as the semantics of volatile variables (which differ between the 2nd and 3rd editions of the JLS)? And the latter is far from the most non-trivial thing you may find in the JLS, and in the Java Memory Model in particular.
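To make the difference concrete: under the 3rd-edition (JSR-133) semantics, a volatile write also publishes all earlier plain writes to a thread that subsequently reads the volatile field, which the older model did not guarantee. A minimal sketch of code whose correctness depends on that guarantee (class and field names are illustrative):

```java
// Correct only under JSR-133 (JLS 3rd edition) volatile semantics.
class Publisher {
    private int payload;            // plain (non-volatile) field
    private volatile boolean ready; // volatile flag

    void producer() {
        payload = 42;   // plain write
        ready = true;   // volatile write: under JSR-133 this also publishes 'payload'
    }

    void consumer() {
        if (ready) {                 // volatile read
            int v = payload;         // guaranteed to see 42 under JSR-133;
            System.out.println(v);   // under the pre-JSR-133 model this could see a stale value
        }
    }
}
```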
I haven't read your question completely, but first of all: do not use volatile; even OpenGL ES coders do not use it between the UI and renderer threads.
Use volatile if and only if one thread writes (say, to some class's static property) and another reads; even then you have to synchronize. Read this for some good ways to handle counts:
How to synchronize a static variable among threads running different instances of a class in java?
Always use synchronized.
Do not jump into large projects for such a difficult topic as concurrent programming.
Go through the Android game samples, which discuss the concept of runnables, handlers, and exchanging messages between threads (the UI thread and the renderer thread).
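For reference, a minimal sketch of that message-passing pattern, with a worker thread posting a result to the UI thread via a Handler (the message code, view id, and worker logic are made up for illustration, not taken from the samples):

```java
import android.app.Activity;
import android.os.Handler;
import android.os.Looper;
import android.os.Message;
import android.widget.TextView;

public class MainActivity extends Activity {
    private static final int MSG_RESULT = 1; // hypothetical message code

    // Handler bound to the main (UI) thread's looper.
    private final Handler uiHandler = new Handler(Looper.getMainLooper()) {
        @Override
        public void handleMessage(Message msg) {
            if (msg.what == MSG_RESULT) {
                // Safe to touch views here: we are on the UI thread.
                ((TextView) findViewById(R.id.status)).setText("Result: " + msg.arg1);
            }
        }
    };

    private void startWork() {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int result = 42; // stand-in for real background work
                uiHandler.obtainMessage(MSG_RESULT, result, 0).sendToTarget();
            }
        }).start();
    }
}
```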
I think you answered your own question. Although you have given no details as to why the java.util.concurrent package does not fit your needs, most mobile apps simply use asynchronous I/O and a minimum of threading. These devices aren't supercomputers capable of serious distributed processing, so I have a little difficulty understanding why java.util.concurrent won't meet your needs.
Secondly, if you have questions about the Dalvik implementation and whether it conforms to the JLS (it doesn't), it stands to reason that the only reliable threading mechanisms are the ones the language defines: java.util.concurrent, Runnable, and thread-local storage.
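As an illustration of how far the built-in support already goes, here is a minimal producer/consumer sketch of inter-thread communication with a BlockingQueue (class and message names are just for the example):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue shared between a producer and a consumer thread.
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(16);

        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    for (int i = 0; i < 5; i++) {
                        queue.put("message-" + i); // blocks if the queue is full
                    }
                    queue.put("DONE");             // sentinel to stop the consumer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread consumer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    String msg;
                    while (!(msg = queue.take()).equals("DONE")) { // blocks until a message arrives
                        System.out.println("got " + msg);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```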
Hand-rolling anything outside of the built-in language support is only asking for trouble and, as your question suggests, will likely not be supported in a consistent manner on Dalvik.
As always, when you think you can do threading better than the people who wrote Java, think again.
<copied from comment> Dalvik does not conform to any JLS edition, because conformance requires passing JCK which is not an option for Dalvik. </copied from comment>
programmers really should know which features of the JLS they may rely on when their code executes on Dalvik
I think the only way for them to know is to study the Dalvik test suite (I bet there is one, and I expect it is open source, isn't it?). For any feature you need: 1) try to find a test that would fail if the feature were implemented incorrectly, and check whether the test looks good enough; if there is no such test, or it's not good enough, 1a) add a new test or improve an existing one. Then, 2) find out whether the test has been run successfully against your target implementation; if it hasn't, 2a) run it yourself and see whether it passes or fails.
BTW, the above is roughly how the JCK works. The main difference is that with Dalvik one has to invest one's own time and effort into things one gets from Sun/Oracle for granted. Another difference seems to be that for Dalvik this is not documented, while Sun/Oracle has clear docs on it, IIRC.
Yet there is not a word about this in the documentation.
Well, if there is nothing about it, then I'd say the quality of the Dalvik documentation is suboptimal, to put it mildly.
Here's the honest answer: if java.util.concurrent is not up to the task for your implementation, then your problem isn't java.util.concurrent but rather your original design specifications. Revisit your design; maybe post here what in your design makes simple mutexes inadequate, and then the community can show you how to design it better.
Related
On a recent SO question, I explained how calling a RenderScript kernel multiple times will effectively force all threads to be globally synchronized between calls.
I am currently working with multiple convolutions applied in sequence to image data. Since the convolution algorithm requires reading surrounding pixel data of the input image, I have implemented a workflow where my own custom kernel is called multiple times -- to make sure that at every step, all data from the previous convolution is ready and available at the correct coordinates. This technique has worked great for me so far.
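Roughly, that multi-pass workflow looks like the following sketch (ScriptC_convolve, its convolve kernel, and the input global are hypothetical stand-ins for my actual script):

```java
import android.renderscript.Allocation;

class ConvolvePipeline {
    // ScriptC_convolve is the (hypothetical) class generated from my .rs file,
    // with a 'convolve' kernel that reads neighbouring pixels from an 'input' global.
    void runPasses(ScriptC_convolve script, Allocation a, Allocation b, int passes) {
        Allocation src = a;
        Allocation dst = b;
        for (int i = 0; i < passes; i++) {
            script.set_input(src);         // bind the previous pass's output as input
            script.forEach_convolve(dst);  // one full pass; all threads finish before we continue
            Allocation tmp = src;          // ping-pong the buffers for the next pass
            src = dst;
            dst = tmp;
        }
    }
}
```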
However, in my constant quest for optimization, I have noticed that there is much performance to be gained by keeping intermediate values in a thread's local registers, instead of writing them back to the global memory allocation between kernel calls. If I were able to chain these convolutions in that way, things would run much quicker. The problem, obviously, is that accessing the registers of surrounding threads is not really possible. Furthermore, this would require threads to run in sync to make sure the intermediate values between stages get calculated in the expected order.
In CUDA and OpenCL, these issues are very common, and are addressed by the well-known barrier synchronization and shared-memory tiling techniques, which in turn depend on the concept of CUDA thread blocks or OpenCL work groups. I believe these concepts are non-existent in RenderScript, as this issue is very much tied to the wildly different architectures of desktop-class GPUs and mobile SoCs.
So my obvious question here is, are such things possible in RenderScript? That is, better management of threads and possibly thread groups for quicker data sharing among them.
In the Google I/O 2013 RenderScript talk by Jason Sams and Tim Murray, it is discussed how Script Groups might be able to do some behind-the-scenes optimizations, such as cross-device parallelization, memory tiling, and kernel fusion, all by analyzing the dependency DAG in the group at runtime and either automatically creating allocations where needed or possibly optimizing them away. I'm assuming this last bit refers to fusing kernels so that they work off their own local data, similar to what I mentioned above about keeping data in local registers and combining separate steps inside a single kernel.
All this seems very much in line with what I'm looking for, especially since my application is indeed a well-defined DAG of inter-dependent operations (for a Convolutional Neural Network). So if Script Groups are indeed a plausible mobile-centric alternative to these mechanisms, I'm wondering whether there is any way of influencing how and where these optimizations happen. Or, if not, how much can the runtime be trusted to make the correct inference from my data dependencies given the hardware it's running on, in the specific case of "surrounding" pixel data access in the convolution algorithm?
I realize this might all still be a work in progress, and that methods would be highly hardware-dependent at this point. So if there is no straight solution for such matters at the present time, I'd be very much willing to accept a speculative answer on how this kind of workflow might potentially be approached by RenderScript in future releases.
I'd be immensely grateful for some insight into this, as it would greatly affect the development direction of my own project going forward, not to mention that there are surely many other people out there wondering how such general parallel computing tasks can be handled in RS.
Thank you very much!
As you've discovered, there's no way in RS to directly share data across threads. However, what you are describing can be done using a ScriptGroup. The catch is that each script in the group has to be unique, so you cannot feed your same script over and over. At least, not as it is written now. You could certainly put the "core" of your script in a RS header and include it from multiple kernels. The ScriptGroup allows you to have the output from one script become the input of another, or the output of one script becomes a global field in another. The documentation states that the kernel to kernel (output to input) is the more efficient use case. Using this approach, your synchronization issue would be resolved as the engine will execute the first script against the entire input data set before starting the second script, etc. The scripts themselves will be parallelized appropriately for the hardware (using either CPU or GPU/DSP). The engine will not have to pop back out to Java between scripts and can also manage the data allocations behind the scenes, if needed.
Something you may notice is that ScriptGroup utilizes Script.KernelID or Script.FieldID in order to identify the exact script or field with which to connect two kernels. Your custom scripts have these auto-generated as long as you explicitly mark your kernel function with the RS compiler attribute pragma. You can then call getKernelID_<name> (where 'name' is the kernel function name from your script) to get the kernel ID.
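A minimal sketch of wiring two kernels together with the ScriptGroup.Builder API (the ScriptC_pass1/ScriptC_pass2 classes, the stage1/stage2 kernel names, and the element type are assumptions about your setup, not your actual code):

```java
import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.RenderScript;
import android.renderscript.ScriptGroup;
import android.renderscript.Type;

class GroupedConvolution {
    // ScriptC_pass1 / ScriptC_pass2 are hypothetical generated classes,
    // each defining one kernel ('stage1' and 'stage2' respectively).
    Allocation run(RenderScript rs, ScriptC_pass1 s1, ScriptC_pass2 s2,
                   Allocation input, Allocation output, int width, int height) {
        // Type of the intermediate allocation that connects the two kernels.
        Type connect = new Type.Builder(rs, Element.U8_4(rs))
                .setX(width).setY(height).create();

        ScriptGroup.Builder builder = new ScriptGroup.Builder(rs);
        builder.addKernel(s1.getKernelID_stage1());
        builder.addKernel(s2.getKernelID_stage2());
        // Output of stage1 feeds the input of stage2 (the efficient kernel-to-kernel case).
        builder.addConnection(connect, s1.getKernelID_stage1(), s2.getKernelID_stage2());
        ScriptGroup group = builder.create();

        group.setInput(s1.getKernelID_stage1(), input);
        group.setOutput(s2.getKernelID_stage2(), output);
        group.execute();   // the engine runs stage1 over the whole set, then stage2
        return output;
    }
}
```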
I am embarking on some Android NATIVE coding (i.e. C++, not Java), and need to use the fairly undocumented sp<> ("strong pointer") ref-counted pointer class.
As far as I can tell, the Android sp<> template looks VERY much like the more familiar Boost shared_ptr<> template: a standard ref-count mechanism. They do NOT appear to be part-for-part compatible, though. For instance, strong pointers do NOT appear to be thread-safe. What other gotchas are there between the two?
A wider question would be: why is there no online reference for the NDK? Are they having enough diskspace problems on developer.android.com that they cannot fit it there? Grumble.
Android's sp<> is undocumented because it is part of the platform, and its implementation might change between platform revisions. You should not use it in NDK code, unless you copy all of the headers and corresponding source files into your own project.
It is intentionally not thread-safe for performance reasons: doing thread-safe ref-counting requires memory-barrier instructions, which slow the operation down significantly on ARM (not so much on x86 and x86_64, though). Even Chrome uses two different classes to implement ref-counting for this reason (i.e. base::RefCounted and base::RefCountedThreadSafe).
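To illustrate the trade-off in Java terms (hypothetical classes loosely analogous to Chrome's pair, not the actual sp<> or Chrome code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Cheap, single-threaded ref-count: a plain field, no memory barriers.
class RefCounted {
    private int refs = 1;
    void addRef()     { refs++; }
    boolean release() { return --refs == 0; } // caller frees the object when this returns true
}

// Thread-safe variant: atomic operations imply the memory barriers that cost extra on ARM.
class RefCountedThreadSafe {
    private final AtomicInteger refs = new AtomicInteger(1);
    void addRef()     { refs.incrementAndGet(); }
    boolean release() { return refs.decrementAndGet() == 0; }
}
```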
As for other gotchas, I can't really tell, but I guess the implementation of weak pointers is also different from Boost's. In any case, if you don't understand what this code does, don't use it; it's not meant for general consumption.
I was wondering: how 'safe' are the functions provided by the Android libraries when developing other native libraries on Android?
Are there things like Microsoft's strsafe.h or bstring? Or can those be ported over?
There are usually safe variants of unsafe functions you can use, so that manipulation problems are detected and dealt with before they introduce difficult-to-detect bugs that are only noticed later in execution. If I understand your question correctly, you may want to look at things like snprintf in place of sprintf, strncat instead of strcat, and variants of malloc that follow the 'succeed or die' convention when creating character arrays.
I find these references helpful when coding in C for Android (I know the native library is lacking a bit).
http://www.cplusplus.com/reference/clibrary/cstring/
http://en.wikipedia.org/wiki/C_string_handling#Overview_of_functions
Using variants that require additional information, such as a maximum buffer size, or that trigger easy-to-spot errors on failure, is generally helpful for avoiding subtle bugs that can be a hassle later on.
Could anyone suggest some detailed info about the internals of Android? I'm interested in knowing its differences from other Linux systems, a detailed view of the Android architecture, etc. I've heard loads of scattered info, e.g. about SurfaceFlinger, something called Stagefright, that it's got wake locks, etc., but I can't put that info together into something meaningful to me.
I just need to understand how Android works, and in detail. I'm not very interested in the SDK or NDK.
Android uses an optimized Linux kernel, but it does not use glibc for the communication between the kernel and the system; instead, Bionic is used.
Apps for Android are written in Java, but they are not compiled to standard Java bytecode, and they don't run on a JVM from Oracle. The VM used by Android is the Dalvik Virtual Machine, which executes Dalvik bytecode. Its core class libraries are derived from Apache Harmony, an open-source Java implementation.
Also, I found an article on the same topic: Link
I believe I read at some point that, because Android runs on the Dalvik VM, dynamic languages for the JVM (Clojure, Jython, JRuby, etc.) would be hard pressed to obtain good performance on Dalvik (and hence on Android). If I recall correctly, the reasoning was that, under the hood, achieving the dynamic typing involves quite a bit of fiddling with Java bytecode, and the bytecode-to-Dalvik translation wouldn't pick this up easily.
So should I avoid a dynamic JVM language if I want to develop for Android?
EDIT: I guess I should have provided a bit more context. I was considering using Clojure to develop apps for Android. I was thinking about using Clojure for a few reasons:
I want to learn FP.
I don't really care to learn Java.
Clojure seems to have some very interesting language concepts (STM, for example).
However, when I tried to write apps for Android in Clojure, I found the performance unacceptable. I also found a blog post saying that dynamically typed languages (Clojure, for example) would have problems due to the bytecode manipulation needed to get the dynamic typing, so I was looking for independent confirmation of whether this is true. I should have known better than to assume that, for this particular issue, all dynamically typed JVM languages could be treated the same. I guess I asked a fairly broad question, so I shouldn't be surprised that people didn't quite understand what I was asking.
Dan Bornstein gave a presentation on Dalvik at Google I/O. It's worth watching to learn about the system in general, including the constraints you care about. The specific issue of non-Java languages compiled into Java bytecode comes up during the Q&A.
Remco van 't Veer has a github project where he's patched Clojure to work on Android. Tim Riddell has written a tutorial on how to use it.
As mentioned here by @sean, there is sometimes a bigger problem than just performance. Dan Bornstein discusses it when asked about Jython, at ~54:00 in the video. There is currently no support for dynamic languages that generate bytecode on the fly (because the bytecode translation is not available at runtime).
Android just got scripting
There are some patches to make Clojure work.
http://riddell.us/tutorial/clojure_android/clojure_android.html
I think the real issue is the use of bytecode generators by some dynamic languages; they won't generate bytecode for the Dalvik VM. Therefore eval will not work.
Given the relatively cramped hardware of the phone it runs on, you should probably just target Java and not worry about a dynamic JVM language. Dynamic languages on the JVM aren't going to be as efficient as Java, to my understanding.
Besides, the Android SDK is pretty sane and easy to write for; I don't think you'll see many benefits from using something else.
dynamic languages for the JVM would be hard pressed to obtain good performance on Dalvik
Dynamic languages are hard pressed to obtain good performance, period. If you want performance, use a statically typed language like Java (or C#, F# etc.).