This question answers it in a general sense, but it doesn't specify what happens if the UUID generation is not centralized.
I recently saw an architecture where all the devices (1M+ iOS and Android) were generating UUID4 keys (using their own generator functions/libraries), and those keys, when synced, were marked unique in the server's database. I fear that since around a million devices will be generating keys locally, the chance of collision will be higher than described in the question above.
I've used centralized UUIDs before but am new to this type of distributed system, so I'm reaching for the analogy of a prime number generator running in parallel in different environments, where the combined output would no longer be prime/unique. Please correct me if my understanding is wrong.
Also, please share any good articles on gotchas when using UUIDs in distributed environments.
Edit: This answer about Python UUID generation talks about collisions, and with UUID1 rather than UUID4. I'm wondering if there's any documentation that confirms this with respect to randomness on Android and iOS devices. Moreover, how should I calculate/estimate the probability of such collisions?
The whole point of a UUID is that it is just that: universally unique.
A version 1 UUID is based on things like the device's MAC address and a timestamp, among possibly other things, while a version 4 UUID is built from random bits. Either way, a million devices generating several UUIDs per second won't have any collisions, ever.
Unless Apple or Google screwed up their implementation for generating UUIDs, you have nothing to worry about.
Again, the whole point of UUIDs is that you don't need a central, single server generating all of the IDs.
Many of the answers to the question you link contain references to details about UUID algorithms. And that question really has nothing to do with UUID generation being centralized.
Update - since the focus is on UUID4, here is an excerpt from the Wikipedia article about the probability of duplicates for UUID4:
To put these numbers into perspective, the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion,[4] which means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.
Based on my experience with iOS, it uses UUID4. Given the above, I'm not worried about any collisions.
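If you want to put a number on your own workload, the birthday bound is enough. Here is a minimal Java sketch; the device count and per-device UUID count are assumptions, so substitute your own figures:

    // Birthday-bound estimate of the chance of at least one duplicate among
    // n randomly generated UUID4 values (a UUID4 carries 122 random bits).
    double possibleValues = Math.pow(2, 122);
    double n = 1_000_000d * 10_000d; // assumed: 1M devices x 10k UUIDs each
    // P(collision) = 1 - exp(-n(n-1) / (2 * 2^122)); expm1 avoids the
    // result underflowing to exactly 0 for tiny probabilities.
    double p = -Math.expm1(-n * (n - 1) / (2 * possibleValues));
    System.out.printf("P(collision) = %.3e%n", p); // ~9.4e-18

Even at that generous volume the estimate is on the order of 10^-17, which is why decentralized UUID4 generation is fine as long as each device's random source has the entropy described above.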
Related
We're facing an upcoming internationalization project for our Android app, expanding significantly on the three currently supported languages. All the languages we're initially supporting will use existing layouts and are left-to-right, and we are already familiar with language-specific setups (separate resource files etc.) as required, so fortunately nothing too exotic is in store for now.
I would like to be able to visually verify that our layouts work and that everything looks OK. Are there any good tools or best practices available for this need? Trying to find resources, I've only been able to identify tools for managing the actual string resources themselves or different platforms for sourcing translations, but nothing that would render them in their actual context for verification.
The only immediate thing I can think of is to essentially run our test scripts with something like this screenshot lib, and then rinse and repeat for every language we support. Is there a better, less work- and time-intensive approach (a single test run times the number of supported languages will probably take at least a few hours on a single device)?
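For reference, the brute-force version I have in mind would look roughly like the following, run from an InstrumentationTestCase. The Screenshot helper is a stand-in for whatever screenshot library we end up choosing, and forcing the locale through updateConfiguration is a test-only trick:

    // Iterate over target locales, relaunch the activity under each one,
    // and capture a screenshot for later visual review.
    for (Locale locale : Arrays.asList(Locale.GERMANY, Locale.FRANCE, new Locale("es"))) {
        Locale.setDefault(locale);
        Resources res = getInstrumentation().getTargetContext().getResources();
        Configuration config = new Configuration(res.getConfiguration());
        config.locale = locale;
        res.updateConfiguration(config, res.getDisplayMetrics());

        Activity activity = launchActivity("com.example.app", MainActivity.class, null);
        Screenshot.capture(activity, "main_" + locale); // assumed helper
        activity.finish();
    }

That still leaves the run-time problem, hence the question.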
On a recent SO question, I explained how calling a RenderScript kernel multiple times will effectively force all threads to be globally synchronized between calls.
I am currently working with multiple convolutions applied in sequence to image data. Since the convolution algorithm requires reading surrounding pixel data of the input image, I have implemented a workflow where my own custom kernel is called multiple times -- to make sure that at every step, all data from the previous convolution is ready and available at the correct coordinates. This technique has worked great for me so far.
However, in my constant quest for optimization, I have noticed that there is much performance to be gained by keeping intermediate values in local registers for a thread, instead of writing them back to the global memory allocation between kernel calls. If I were able to chain these convolutions in such a way, things would run much quicker. The problem, obviously, is that accessing the registers of surrounding threads is not really possible. Furthermore, this would require threads to run in sync to make sure these intermediate values between stages get calculated in the expected order.
In CUDA and OpenCL, these issues are very common, and are addressed by well-known barrier synchronization + shared memory tiling techniques, which in turn depend on the concept of CUDA thread blocks or OpenCL work groups. I believe these concepts are non-existent in RenderScript, as this issue is very much tied to the wildly different architectures of desktop-class GPUs and mobile SoCs.
So my obvious question here is, are such things possible in RenderScript? That is, better management of threads and possibly thread groups for quicker data sharing among them.
In the Google I/O 2013 RenderScript talk by Jason Sams and Tim Murray, it is discussed how Script Groups might be able to do some behind-the-scenes optimizations, such as cross-device parallelization, memory tiling, and kernel fusion; all this by analyzing the dependency DAG in the group at runtime, and either automatically creating allocations where needed or possibly optimizing them away. I'm assuming this last bit refers to fusing kernels so that they work off their own local data, similar to what I mentioned above about keeping data in local registers and combining separate steps inside a single kernel.
All this seems very much in line with what I'm looking for, especially since my application is indeed a well-defined DAG of inter-dependent operations (for a Convolutional Neural Network). So if Script Groups are indeed a plausible mobile-centric alternative to these mechanisms, I'm wondering whether there is any way of influencing how and where these optimizations happen. Or, if not, how much can the runtime be trusted to make the correct inference from my data dependencies given the hardware it's running on -- in the specific case of the "surrounding" pixel data access of the convolution algorithm?
I realize this might all still be work in progress, and methods would be highly hardware-dependent at this point. So if there is no straight solution for such matters at the present time, I'd be very much willing to accept a speculative answer on how this kind of workflow might be approached by RenderScript in future releases.
I'd be immensely grateful for some insight on this, as it would greatly affect the development direction of my own project going forward; not to mention there are surely many other people out there wondering how such general parallel computing tasks can be handled in RS.
Thank you very much!
As you've discovered, there's no way in RS to directly share data across threads. However, what you are describing can be done with a ScriptGroup. The catch is that each script in the group has to be unique, so you cannot feed the same script in over and over; at least, not as it is written now. You could certainly put the "core" of your script in an RS header and include it from multiple kernels.

The ScriptGroup allows the output of one script to become the input of another, or to become a global field in another. The documentation states that kernel to kernel (output to input) is the more efficient use case. Using this approach, your synchronization issue is resolved, as the engine executes the first script against the entire input data set before starting the second script, and so on. The scripts themselves are parallelized appropriately for the hardware (using either CPU or GPU/DSP). The engine does not have to pop back out to Java between scripts and can also manage the data allocations behind the scenes, if needed.
Something you may notice is that the ScriptGroup uses Script.KernelID or Script.FieldID to identify the exact kernel or field at which to connect two scripts. Your custom scripts get these auto-generated as long as you explicitly mark your kernel function with the RS compiler's kernel attribute/pragma. Then you can call getKernelID_<name> (where 'name' is the kernel function name from your script) to get the kernel ID.
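To make that concrete, here is a sketch of chaining two convolution passes; ScriptC_convolve1/ScriptC_convolve2 and the kernel names are hypothetical stand-ins for your own generated script classes:

    // Build a ScriptGroup where the output of the first kernel feeds the
    // input of the second. The engine finishes the first kernel over the
    // whole allocation before starting the second, which provides the
    // global synchronization between stages.
    RenderScript rs = RenderScript.create(context);
    ScriptC_convolve1 s1 = new ScriptC_convolve1(rs);
    ScriptC_convolve2 s2 = new ScriptC_convolve2(rs);

    ScriptGroup.Builder builder = new ScriptGroup.Builder(rs);
    builder.addKernel(s1.getKernelID_convolve1());
    builder.addKernel(s2.getKernelID_convolve2());
    // Kernel-to-kernel (output to input): the more efficient connection type
    builder.addConnection(inputAllocation.getType(),
            s1.getKernelID_convolve1(), s2.getKernelID_convolve2());
    ScriptGroup group = builder.create();

    group.setInput(s1.getKernelID_convolve1(), inputAllocation);
    group.setOutput(s2.getKernelID_convolve2(), outputAllocation);
    group.execute();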
My team needs to develop a solution to encrypt binary data (stored as a byte[]) in the context of an Android application written in Java. The encrypted data will be transmitted and stored in various ways, during which data corruption cannot be ruled out. Eventually another Android application (again written in Java) will have to decrypt the data.
It has already been decided that the encryption algorithm has to be AES, with a key of 256 bits.
However, I would like to make an informed decision about which AES implementation and/or "mode" we should use. I have read about something called GCM mode, and we have done some tests with it (using BouncyCastle/SpongyCastle), but it is not entirely clear to me what exactly AES-GCM is for and what it "buys" us compared to plain AES, and whether there are any trade-offs to be taken into account.
Here's a list of concerns/requirements/questions we have:
Padding: the data we need to encrypt will not always be a multiple of 128 bits, so the AES implementation/mode should add padding, yet only when necessary.
I was under the impression that a plain AES implementation, such as provided by javax.crypto.Cipher, would not do that, but initial tests indicated that it does. So I'm guessing the padding requirement in itself is no reason to resort to something like GCM instead of "plain" AES. Is that correct?
Authentication: We need a foolproof way of detecting if data corruption has occurred. However, ideally we also want to detect when decryption is attempted with an incorrect key. Hence, we want to be able to differentiate between both of these cases. The reason I ended up considering GCM in the first place was due to this Stackoverflow question, where one of the responders seems to imply that making this distinction is possible using AES-GCM, although he does not provide a detailed explanation (let alone code).
Minimise overhead: We need to limit overhead on storage and transmission of the encrypted data. Therefore we wish to know whether, and to what extent, the choice for a specific AES implementation/mode influences the amount of overhead.
Encryption/decryption performance: Although it is not a primary concern we are wondering to what extent the choice of a specific AES implementation/mode influences encryption and decryption performance, both in terms of CPU time and memory footprint.
Thanks in advance for any advice, clarification and/or code examples.
EDIT: delnan helpfully pointed out there is no such thing as "plain AES". So to clarify, what I meant by that is using Java's built-in AES support.
Like so:

    Cipher localCipher = Cipher.getInstance("AES");
    // On most providers this resolves to "AES/ECB/PKCS5Padding".
In 2012 the answer is to go for GCM, unless you have serious compatibility issues.
GCM is an Authenticated Encryption mode. It provides you with confidentiality (encryption), integrity, and authentication (MAC) in one go.
So far, the normal modes of operation have been ECB (which is the default for Java), CBC, CTR, OFB, and a few others. They all provide encryption only. Confidentiality by itself is seldom useful without integrity, though; one had to combine such classic modes with integrity checks in an ad-hoc way, and since cryptography is hard to get right, such combinations were often insecure, slower than necessary, or both.
Authenticated Encryption modes were (fairly recently) created by cryptographers to solve that problem. GCM is one of the most successful: it has been selected by NIST, it is efficient, it is patent-free, and it can carry Additional Authenticated Data (that is, data which stays in the clear but whose authenticity you can verify). For a description of other modes, see this excellent article by Matthew Green.
Coming to your concerns:
Padding: by default, Java uses PKCS#7 padding. That works, but it is often vulnerable to padding oracle attacks, which are best defeated with a MAC. GCM already embeds a MAC (called GMAC).
Authentication: AES-GCM only takes an AES key as input, not a password. It will tell you if the AES key is wrong or the payload has been tampered with, but those two conditions are reported as one. You should use an appropriate key derivation algorithm, like PBKDF2 or bcrypt, to derive the AES key from the password (a sketch follows after this list). I don't think it is always possible to tell whether the password is incorrect or the payload has been modified, because the data needed to verify the former can itself be corrupted. You can encrypt a small known string (with ECB AES), send it along, and use it to verify whether the password is correct.
Minimise overhead: at the end of the day, all modes lead to roughly the same overhead (around 10-20 bytes) if you want authentication. Unless you are working with very small payloads, this point can be ignored.
Performance: GCM is pretty good in that it is an online mode (no need to buffer the whole payload, so less memory), it is parallelizable, and it requires one AES operation and one Galois multiplication per plaintext block. Classic modes like ECB are faster (one AES operation per block only), but - again - you must also factor in the integrity logic, which may end up being slower than GMAC.
Having said that, be aware that GCM's security relies on good random number generation for creating the IV; reusing an IV under the same key is catastrophic.
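To tie the pieces together, here is a minimal sketch of password-based AES-256-GCM encryption. The iteration count, salt handling, and 12-byte IV are illustrative choices, and the installed provider must support AES/GCM/NoPadding (e.g. SpongyCastle on older Android):

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.PBEKeySpec;
    import javax.crypto.spec.SecretKeySpec;

    public final class GcmSketch {
        public static byte[] encrypt(char[] password, byte[] salt, byte[] plaintext)
                throws Exception {
            // Derive a 256-bit AES key from the password (iteration count illustrative)
            SecretKeyFactory kf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            byte[] keyBytes = kf.generateSecret(
                    new PBEKeySpec(password, salt, 10000, 256)).getEncoded();
            SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

            // A fresh random IV per message is essential for GCM security
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext); // GCM tag is appended

            // Prepend the IV for the receiver; overhead = 12-byte IV + 16-byte tag
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        }
    }

On decryption, doFinal throws AEADBadTagException both for a wrong key and for corrupted data, which is exactly why the two cases are indistinguishable without an extra known-plaintext check.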
A coworker and I were talking (after a fashion) about an article I read (HTC permission security risk). Basically, the argument came down to whether or not it was possible to log every action that an application takes. Then someone (an abstract, theoretical person) would go through the log and see if the app was doing what it was supposed to do and not trying to be all malicious-like.
I have been programming in Android for a year now, and as far as I know if -- if -- that was possible, you would have to hack Dalvik and output what each process was doing. Even if you were to do that, I think it would be completely indecipherable because of the sheer amount of stuff each process was doing.
Can I get some input one way or the other? Is it completely impractical to even attempt to log what a foreign application is doing?
I have been programming in Android for a year now, and as far as I know if -- if -- that was possible, you would have to hack Dalvik and output what each process was doing.
Not so much "hack Dalvik" as "hack the android.* class library", and perhaps a few other things (e.g., java.net).
Even if you were to do that, I think it would be completely indecipherable because of the sheer amount of stuff each process was doing.
You might be able to do some fancy pattern matching or something on the output -- given that you have determined patterns of inappropriate actions. Of course, there is also the small matter of having to manually test the app (to generate the output).
Is it completely impractical to even attempt to log what a foreign application is doing?
From an SDK app? I damn well hope so.
From a device running a modded firmware with the aforementioned changes? I'd say it is impractical unless you have a fairly decent-sized development team, at which point it is merely expensive.
This is both possible and practical if you are compiling your own ROM. Android is based on Linux, and I know of several projects like this for Linux, such as the Linux Trace Toolkit. I also know of research into visualizing the results and into detecting malicious apps from them.
Another thing functionality like this is often used for is performance and reliability monitoring. You can read about the DTrace functionality in Solaris to learn more about how this sort of thing is used in business rather than academia.
In recent weeks I have been busy with the issue of cross-platform development. It started with the task of writing a wrapper for the communication API of MoSync (I didn't know this SDK, or others for cross-platform development, before). It should be used in our Java environment, for instance to easily create a Bluetooth connection to different phones, and so on.
For me the other question now is: how can I use SDKs like MoSync, Titanium, and others in an existing project? In my opinion it is not possible; either you develop natively or with a cross-platform framework.
I would also be interested in when you would recommend these frameworks (I know there are already some other threads about this). Personally, I would say there isn't a great future for these SDKs because of technical drawbacks and dependencies. In addition, the market for cross-platform solutions (hybrid, interpreted, cross-compiler) is at least as fragmented as the market for mobile operating systems themselves.
What are your experiences?
Martin
Cross-platform implementations of any type, on mobile or anywhere else, exist primarily to reduce time to market. That statement may look oversimplified, but it more or less holds true. So the ideal situation to use one would be an application/game that uses the common denominator of features across smartphones, which could include touch, a decent UI, the network, the accelerometer, and in some cases LBS. That way, you land on multiple phones in less time and at reduced development cost.
If you are looking to use a lot of hardware-specific features, then we land in what's commonly known as unknown territory. Then you have to do what people always do: gather more information about the landscape of phones to target and see whether the "chosen framework" has the power to dish out the features on those platforms. In this case, you can't possibly deploy one off-the-shelf.