I just ran a benchmark to compare the access performance of local variables, member variables, member variables of other objects, and getters/setters. The benchmark increments a variable in a loop with 10 million iterations. Here is the output:
BENCHMARK: local 101, member 1697, foreign member 151, getter setter 268
This was done on a Motorola XOOM tablet running Android 3.2. The numbers are milliseconds of execution time. Can anybody explain the deviation for the member variable, especially compared to the other object's member variable? Based on these figures it seems worthwhile to copy member variables into local variables before using their values in calculations. By the way, I ran the same benchmark on an HTC One X with Android 4.1 and it showed the same deviation.
Are those numbers reasonable, or is there a systematic error I am missing?
Here is the benchmark function:
private int mID;

public void testMemberAccess() {
    // compare access times for local variables, members, members of other classes
    // and getter/setter functions
    final int numIterations = 10000000;
    final Item item = new Item();
    int i = 0;

    long start = SystemClock.elapsedRealtime();
    for (int k = 0; k < numIterations; k++) {
        mID++;
    }
    long member = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime();
    for (int k = 0; k < numIterations; k++) {
        item.mID++;
    }
    long foreignMember = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime();
    for (int k = 0; k < numIterations; k++) {
        item.setID(item.getID() + 1);
    }
    long getterSetter = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime();
    for (int k = 0; k < numIterations; k++) {
        i++;
    }
    long local = SystemClock.elapsedRealtime() - start;

    // read all results so the loops aren't optimized away
    final int dummy = item.mID + i + mID;
    Log.d(Game.ENGINE_NAME, String.format("BENCHMARK: local %d, member %d, foreign member %d, getter setter %d, dummy %d",
            local, member, foreignMember, getterSetter, dummy));
}
Edit:
I put each loop in its own function and called them 100 times in random order. Result:
BENCHMARK: local 100, member 168, foreign member 190, getter setter 271
Looks good, thanks.
The foreign object was created as a final class member, not inside the function.
Well, I'd say that the Dalvik VM's optimizer is pretty smart ;-) I do know that the Dalvik VM is register-based. I don't know the guts of the Dalvik VM, but I would assume that something like the following is going on (more or less):
In the local case, you are incrementing a method-local variable inside a loop. The optimizer recognizes that this variable isn't accessed until the loop completes, so it can keep the value in a register, apply the increments there, and store the result back into the local variable once the loop is done. This yields: 1 fetch, 10,000,000 register increments and 1 store.
In the member case, you are incrementing a member variable inside a loop. The optimizer cannot determine whether the member variable is accessed while the loop is running (by another method, object or thread), so it is forced to fetch, increment and store the value back into the member variable on every iteration. This yields: 10,000,000 fetches, 10,000,000 increments and 10,000,000 stores.
In the foreign member case, you are incrementing a member variable of an object inside a loop. You created that object within the method, so the optimizer recognizes that it cannot be accessed (by another object, method or thread) until the loop completes; it can therefore keep the value in a register, apply the increments there, and store the result back into the foreign member variable once the loop is done. This yields: 1 fetch, 10,000,000 register increments and 1 store.
In the getter/setter case, I am going to assume that the compiler and/or optimizer is smart enough to "inline" the getter/setter (i.e. it doesn't really make a method call; it replaces item.setID(item.getID() + 1) with item.mID = item.mID + 1). After that inlining, this case reduces to the foreign member case above, with the same result: 1 fetch, 10,000,000 register increments and 1 store.
I can't really explain why the getter/setter timing is twice the foreign-member timing; it may be the time it takes the optimizer to figure this out, or something else.
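If the member case really is that much slower on a given device, the workaround the question hints at is easy to sketch: copy the field into a local before the loop and write it back once afterwards. A minimal illustration (the class, field and loop count are mine, not from the benchmark):

```java
class Counter {
    private int mID;

    // Naive version: reads and writes the field on every iteration.
    void incrementDirect(int iterations) {
        for (int k = 0; k < iterations; k++) {
            mID++;
        }
    }

    // Copies the field into a local first, so the VM can keep it in a register.
    void incrementViaLocal(int iterations) {
        int id = mID;
        for (int k = 0; k < iterations; k++) {
            id++;
        }
        mID = id;
    }

    int getID() {
        return mID;
    }
}
```

Both methods produce the same result; only the number of field accesses differs.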
An interesting test would be to move the creation of the foreign object out of the method and see if that changes anything. Try moving this line:
final Item item = new Item();
outside of the method (i.e. declare it as a private member variable of the enclosing object instead). I would guess that performance would be much worse.
Disclaimer: I'm not a Dalvik engineer.
Besides varying their order, there are other things you can do to try to eliminate any interference:
1- Eliminate edge effects by timing the first loop a second time, preferably into a second long variable.
2- Increase the number of iterations by a factor of 10. 10,000,000 seems like a big number, but as the first suggestion shows, incrementing a variable 10 million times is so fast on a modern CPU that a lot of other things, like filling the various caches, start to matter.
3- Add spurious instructions, such as dummy long l = SystemClock.elapsedRealtime() - start calculations. This will help show that 10,000,000 iterations is really a small number.
4- Add the volatile keyword to the mID field. This is probably the best way to factor out any compiler- or CPU-related optimization.
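Suggestion 4 can be sketched like this (the class and field names are mine): marking the field volatile forces a real load and store on every iteration, so neither the compiler nor the CPU may hoist it into a register:

```java
class VolatileBench {
    private volatile int mID; // volatile: every ++ is a real load + store
    private int mPlain;       // plain field: free to be optimized into a register

    void run(int iterations) {
        for (int k = 0; k < iterations; k++) {
            mID++;
        }
        for (int k = 0; k < iterations; k++) {
            mPlain++;
        }
    }

    int volatileCount() { return mID; }
    int plainCount() { return mPlain; }
}
```

Comparing the timings of the two loops would then show how much of the measured difference comes from optimization rather than from field access itself.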
Background
Suppose I have a RecyclerView whose items are only unique if you look at two IDs they have, not just one of them.
The first ID is the primary one. Usually no two items have the same primary ID, but it can occasionally happen, which is why there is a secondary ID.
The problem
The RecyclerView adapter needs a "long" to be returned from getItemId():
https://developer.android.com/reference/android/support/v7/widget/RecyclerView.Adapter.html#getItemId(int)
What I've tried
The easy way to overcome this is to use a HashMap and a counter.
The HashMap maps each combined key to the ID that should be returned, and the counter is used to generate the next ID when a new combined key appears. The combined key can be a "Pair" class in this case.
Suppose each item in the RecyclerView data has 2 long-type keys:
HashMap<Pair<Long, Long>, Long> keyToIdMap = new HashMap<>();
long idGenerator = 0;
This is what to do in getItemId:
Pair<Long, Long> combinedKey = new Pair<>(item.getPrimaryId(), item.getSecondary());
Long uniqueId = keyToIdMap.get(combinedKey);
if (uniqueId == null)
    keyToIdMap.put(combinedKey, uniqueId = idGenerator++);
return uniqueId;
This has the drawback of taking more and more memory. Not much, admittedly; it stays small and proportional to the data you already have, but still...
However, it has the advantage of handling all types of IDs, and you can use even more IDs if you wish (you just need something similar to Pair).
Another advantage is that it will use all IDs starting from 0.
The question
Is there perhaps a better way to achieve this?
Maybe a mathematical way? I remember learning in the past about using prime numbers for similar tasks. Would that work here somehow?
Do the existing primary and secondary IDs use the entire 64-bit range of longs? If not, it's possible to compute a unique 64-bit long from their values with, e.g., bit slicing.
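For instance, if both IDs are known to fit in 32 bits, they can be packed losslessly into a single long; this is a sketch under that assumption (the class and method names are mine):

```java
class IdPacker {
    // Packs two IDs that each fit in 32 bits into one collision-free long:
    // the primary ID occupies the high 32 bits, the secondary the low 32.
    static long pack(long primaryId, long secondaryId) {
        if ((primaryId >>> 32) != 0 || (secondaryId >>> 32) != 0) {
            throw new IllegalArgumentException("IDs must fit in 32 bits");
        }
        return (primaryId << 32) | secondaryId;
    }
}
```

Because the packing is injective, no two distinct (primary, secondary) pairs can collide.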
Another approach would be to hash the two together with a hash function with a very low collision probability (a cryptographic hash like SHA-2, for example) and use the first 64 bits of the result. A 64-bit range means you can comfortably have millions of items before a collision becomes likely: by the birthday bound, the chance of a collision reaches 50% only around sqrt(2^64) = 2^32 items, which is more than 4 billion.
Finally, a separate independent mapping is very versatile, and as long as the map is always accessible it works fine (it gets tricky when you try to synchronize the generated IDs across machines, etc.). In Java you could try to squeeze out performance by avoiding the boxed Longs and the separate Pair instance with a custom map implementation, but that's micro-optimizing.
Example using SHA1:
With Guava - the usage is clean and obvious.
HashFunction hf = Hashing.sha1();
long hashedId = hf.newHasher()
        .putLong(primary)
        .putLong(secondary)
        .hash()
        .asLong();
With just the standard JDK it's pretty verbose and can probably be made more efficient; it should look something like this (I'm ignoring checked exceptions):
static void updateDigestWithLong(MessageDigest md, long l) {
    // feed all 8 bytes of the long into the digest, least significant first
    for (int i = 0; i < 8; i++) {
        md.update((byte) (l >>> (i * 8)));
    }
}
// this is from the Guava sources, can reimplement if you prefer
static long padToLong(byte[] bytes) {
    long retVal = (bytes[0] & 0xFF);
    for (int i = 1; i < Math.min(bytes.length, 8); i++) {
        retVal |= (bytes[i] & 0xFFL) << (i * 8);
    }
    return retVal;
}
static long hashLongsToLong(long primary, long secondary) {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    updateDigestWithLong(md, primary);
    updateDigestWithLong(md, secondary);
    return padToLong(md.digest());
}
I think my original idea is the best one I can think of.
It covers all possible IDs with the least possible chance of collision.
My application records user movement with geofence boundaries; if the user exits the geofence, alerts are escalated appropriately. These alerts are counted and displayed in a summary at the end of the activity. However, I would like to create a stats page that displays the last week or month of activities, as well as the number of alerts, so that I can show them in a chart. Is there any way to do this effectively without using a database?
I had thought of writing the data to a log file and reading it back, but I'm curious whether there is a better option.
You can use SharedPreferences, but it will require a lot of bookkeeping, probably more than creating a database. If you insist on not using a database, put an integer into your shared preferences that stores the count of your data; that integer also becomes your ID. Then you can store your data in a loop over it.
Here is how to write your data to shared preferences:
SharedPreferences mSharedPrefs = getSharedPreferences("MyStoredData", MODE_PRIVATE);
SharedPreferences.Editor mPrefsEditor = mSharedPrefs.edit();
int count = mSharedPrefs.getInt("storedDataCount", 0);
for (int i = 0; i < yourCurrentDataCount; i++) {
    mPrefsEditor.putString("data" + count, yourData.get(i));
    count++;
}
mPrefsEditor.putInt("storedDataCount", count);
mPrefsEditor.apply(); // persist the changes
And to read your data back:
int count = mSharedPrefs.getInt("storedDataCount", 0);
for (int i = 0; i < count; i++) {
    yourData.add(mSharedPrefs.getString("data" + i, "defaultData"));
}
Edit:
I should have added some explanation. The idea is to save the count of your data to generate an ID, and to save each tag according to it. It works like this: let's say you have 5 strings. Since you don't have a MyStoredData XML file yet, it gets created. Then, since you don't have the "storedDataCount" tag yet, you get 0 as the count. Your loop iterates 5 times, and in each iteration you add a tag to your XML, like <data0>your first data</data0><data1>your second data</data1>... After your loop is done, you update storedDataCount and it becomes <storedDataCount>5</storedDataCount>. The next time you use your app, your count starts from 5, so your tags start from <data5>. For reading, you iterate through the tags by checking "data0", "data1", and so on.
You can use Java serialization if you don't want to use a database.
You can also use XML/JSON for storing data.
I support the already-mentioned favoritism towards using a DB for this task. Nevertheless, if I were to do it via the file system, I would use a transactional async library like Square's tape.
In your case I would keep the data during a session in a JSON object (structure), persist it in onPause(), and restore it in onResume() with tape's GSON ObjectConverter.
Should be easy out of the box, I believe.
Tape website: http://square.github.io/tape/
Alternatively to manually persisting a file or using a 3rd party library like tape, you could always (de)serialize your JSON to SharedPreferences.
In the following SO question: https://stackoverflow.com/questions/2067955/fast-bitmap-blur-for-android-sdk #zeh claims that a port of a Java blur algorithm to C runs 40 times faster.
Given that the bulk of the code involves only calculations, and all allocations are done "one time" before the actual number crunching, can anyone explain why this code runs 40 times faster? Shouldn't the Dalvik JIT translate the bytecode and dramatically reduce the gap to native compiled-code speed?
Note: I have not confirmed the 40x performance gain myself for this algorithm, but every serious image-manipulation algorithm I encounter for Android uses the NDK, which supports the notion that NDK code runs much faster.
For algorithms that operate over arrays of data, there are two things that significantly change performance between a language like Java, and C:
Array bounds checking: Java will check every access, bmap[i], and confirm that i is within the array bounds. If the code tries to access out of bounds, you get a useful exception. C and C++ do not check anything and just trust your code. The best-case response to an out-of-bounds access is a page fault. A more likely result is "unexpected behavior".
Pointers: You can significantly reduce the operations by using pointers.
Take this innocent example of a common filter (similar to blur, but 1D):
for (int i = 0; i < ndata - ncoef; ++i) {
    z[i] = 0;
    for (int k = 0; k < ncoef; ++k) {
        z[i] += c[k] * d[i + k];
    }
}
When you access an array element such as coef[k], the steps are:
Load address of array coef into register;
Load value k into a register;
Sum them;
Go get memory at that address.
Every one of those array accesses can be improved because you know that the indexes are sequential. Neither the compiler nor the JIT can prove that the indexes are sequential, so they cannot optimize fully (although they keep trying).
In C++, you would write code more like this:
int d[10000];
int z[10000];
int coef[10];
int* zptr;
int* dptr;
int* cptr;

zptr = &(z[0]); // Just being overly explicit here; more likely you would write zptr = z;
for (int i = 0; i < (ndata - ncoef); ++i) {
    *zptr = 0;
    cptr = coef;   // point at the first coefficient
    dptr = d + i;  // point at the current window into the data
    for (int k = 0; k < ncoef; ++k) {
        *zptr += *cptr * *dptr;
        cptr++;
        dptr++;
    }
    zptr++;
}
When you first do something like this (and succeed in getting it correct), you will be surprised how much faster it can be. All the array-address calculations, fetching the index and summing it with the base address, are replaced with an increment instruction.
For 2D array operations such as a blur on an image, innocent code like data[r][c] involves extra fetches plus a multiply and an add just to compute each element's address. So with 2D arrays, the pointer approach additionally lets you remove the multiply operations.
So the language allows real reduction in the operations the CPU must perform. The cost is that the C++ code is horrendous to read and debug. Errors in pointers and buffer overflows are food for hackers. But when it comes to raw number grinding algorithms, the speed improvement is too tempting to ignore.
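Java doesn't have pointers, but part of the same saving is available by flattening the 2D array and hoisting the row offset out of the inner loop; a sketch with made-up data (the class and method names are mine):

```java
class RowSum {
    // Sums a flattened height-by-width array, computing each row's base
    // offset once per row instead of multiplying r * width on every access.
    static long sum(int[] data, int height, int width) {
        long total = 0;
        for (int r = 0; r < height; r++) {
            int rowBase = r * width; // one multiply per row, not per element
            for (int c = 0; c < width; c++) {
                total += data[rowBase + c];
            }
        }
        return total;
    }
}
```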
Another factor not mentioned above is the garbage collector. Garbage collection takes time, and it can run at any time. This means a Java program that creates lots of temporary objects (note that some kinds of String operations are bad for this) will often trigger the garbage collector, which in turn slows down the program (app).
The following is a list of programming languages based on their levels:
Assembly language (machine language, lower level)
C language (middle level)
C++, Java, .NET (higher level)
Here the lower-level language has direct access to the hardware. As the level increases, the access to the hardware decreases. So assembly language code runs at the highest speed, while other languages' code runs according to their levels.
This is the reason that C code runs much faster than Java code.
I have a Service that sends out updates every few hundred milliseconds. I have a String that gets created each time with a description of the event ("Time elapsed is 32 seconds"). Unfortunately I can't just use ints, because the content can change depending on the event (though it's usually the same event type) and the feedback goes back to the user. Is there a way I can statically reuse the same String so that there aren't hundreds of String allocations per minute? Even if I reuse the same variable, i.e.:
mEventUpdate = "Time elapsed is " + time + " seconds";
I still see a lot of String allocations being made.
At least you can use String.format() to reduce the number of created objects:
mEventUpdate = String.format("Time elapsed is %d seconds", time);
A String in Java is an immutable object. Once created, you cannot change it any more. So if it really has to be a String, there is no way to avoid the allocations.
Use StringBuffer:
If you have a method returning a string, and you know that its result will always be appended to a StringBuffer anyway, change your signature and implementation so that the function does the append directly, instead of creating a short-lived temporary object.
http://developer.android.com/training/articles/perf-tips.html#ObjectCreation
I won't bother with a long answer; either use format() as someone said, or use the often-overlooked StringBuffer, which I use when joining a large number of Strings together, say in a loop, where using format() wouldn't be possible.
http://developer.android.com/reference/java/lang/StringBuffer.html
(I like Android's reference because look at it compared to Oracle's nasty one :P)
Declare this as a global static variable:
public static StringBuilder mEventUpdate = new StringBuilder();
Then reuse it for each update:
mEventUpdate.setLength(0); // clear the previous content
mEventUpdate.append("Time elapsed is ");
mEventUpdate.append(time);
mEventUpdate.append(" seconds");
// To display:
mEventUpdate.toString();
I came across a presentation (dalvik-vm-internals) on the Dalvik VM, in which it is mentioned that, for the loops below, we should use (2) and (3) and avoid (7).
(1) for (int i = initializer; i >= 0; i--)
(2) int limit = calculate limit;
    for (int i = 0; i < limit; i++)
(3) Type[] array = get array;
    for (Type obj : array)
(4) for (int i = 0; i < array.length; i++)
(5) for (int i = 0; i < this.var; i++)
(6) for (int i = 0; i < obj.size(); i++)
(7) Iterable list = get list;
    for (Type obj : list)
Comments: I feel that (1) and (2) are the same.
(3)
(4) has to calculate the length of the array every time, so this can be avoided
(5)
(6) same as (4), calculating the size every time
(7) asked to avoid because the list is of an Iterable type??
One more thing: in case we have infinite data (assume the data is coming as a stream), which loop should we use for better efficiency?
Please comment on this...
If that's what they recommend, that's what they've optimized the compiler and VM for. The ones you feel are the same aren't necessarily implemented the same way: the compiler can use all sorts of tricks with data- and path-analysis to avoid naively expensive operations. For instance, the array.length result can be cached, since an array's length is immutable.
They're ranked from most to least efficient, but (1) is 'unnatural', wouldn't you agree? The trouble with (7) is that an iterator object is created and has to be GC'ed.
Note carefully when the advice should be heeded. It's clearly intended for bounded iteration over a known collection, not the streaming case. It's only relevant if the loop has a significant effect on performance and energy consumption. The first law of optimization is "Don't optimize". The second law (for experts) is "Don't optimize yet". Measure first (both execution times and CPU consumption), optimize later: this applies even to mobile devices.
What you should consider is the preceding slides: try to sleep as often and as long as possible, while responding quickly to changes. How you do that depends on what kind of stream you're dealing with.
Finally, note that the presentation is two years old and may not fully apply to 2.2 devices, where, among other things, a JIT is implemented.
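As a concrete Java sketch of the advice, here are patterns (6) and (2) side by side; hoisting the size into a local keeps the loop condition from calling a method on every iteration (the class and method names are mine):

```java
import java.util.List;

class LoopPatterns {
    // Pattern (6): calls size() in the loop condition on every iteration.
    static int sumNaive(List<Integer> list) {
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i);
        }
        return total;
    }

    // Pattern (2): the limit is computed once, before the loop.
    static int sumHoisted(List<Integer> list) {
        int total = 0;
        final int limit = list.size();
        for (int i = 0; i < limit; i++) {
            total += list.get(i);
        }
        return total;
    }
}
```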
With infinite data, none of the examples is good enough. The best would be to do:
for (;;) {
    list.poll(); // handle concurrency; in Java, for example, use a blocking queue
}
(1) and (2) are really different: (2) needs an extra subtraction to compute the comparison against limit; (1) doesn't.
Even better, on most processors (with well-optimized code) no explicit comparison is needed for i >= 0. The processor can use the negative flag resulting from the last decrement (i--).
So the end of loop (1) looks like this (in pseudo-assembler):
--i
jump-if-neg
while loop (2) looks like:
++i
limit - i   # sets the negative flag if i > limit
jump-if-neg
That doesn't make a big difference unless the code inside your loop is really small (like a basic C string operation).
That might not work with interpreted languages.
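In Java, the count-down form of loop (1) looks like this (the method name and data are mine); whether the VM actually exploits the flag trick is up to the JIT, but the loop is correct either way:

```java
class CountDown {
    // Loop (1): counts down to zero, so the exit test compares against 0,
    // which many CPUs derive directly from the decrement's flags.
    static int sumDown(int[] data) {
        int total = 0;
        for (int i = data.length - 1; i >= 0; i--) {
            total += data[i];
        }
        return total;
    }
}
```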