In the following SO question: https://stackoverflow.com/questions/2067955/fast-bitmap-blur-for-android-sdk @zeh claims that a port of a Java blur algorithm to C runs 40 times faster.
Given that the bulk of the code includes only calculations, and all allocations are only done "one time" before the actual algorithm number crunching - can anyone explain why this code runs 40 times faster? Shouldn't the Dalvik JIT translate the bytecode and dramatically reduce the gap to native compiled code speed?
Note: I have not confirmed the x40 performance gain myself for this algorithm, but all the serious image manipulation algorithms I have encountered for Android use the NDK, which supports the notion that NDK code runs much faster.
For algorithms that operate over arrays of data, there are two things that significantly change performance between a language like Java, and C:
Array bounds checking: Java checks every access, such as bmap[i], and confirms that i is within the array bounds. If the code tries to access out of bounds, you get a useful exception. C and C++ do not check anything and just trust your code. The best-case response to an out-of-bounds access is a page fault. A more likely result is "unexpected behavior" (formally, undefined behavior).
Pointers: You can significantly reduce the operations by using pointers.
Take this innocent example of a common filter (similar to blur, but 1D):
for(int i = 0; i < ndata - ncoef; ++i) {
z[i] = 0;
for(int k = 0; k < ncoef; ++k) {
z[i] += coef[k] * d[i + k];
}
}
When you access an array element such as coef[k], the CPU has to:
Load address of array coef into register;
Load value k into a register;
Sum them;
Go get memory at that address.
Every one of those array accesses can be improved because you know that the indexes are sequential. Neither the compiler nor the JIT can always prove that the indexes are sequential, so they cannot always optimize fully (although they keep trying).
In C++, you would write code more like this:
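const int ndata = 10000;
const int ncoef = 10;
int d[ndata];
int z[ndata];
int coef[ncoef];
int* zptr;
int* dptr;
int* cptr;
dptr = &(d[0]); // Just being overly explicit here, more likely you would dptr = d;
zptr = &(z[0]); // or zptr = z;
for(int i = 0; i < (ndata - ncoef); ++i) {
    *zptr = 0;
    cptr = coef;  // assign the pointer itself (not *cptr): back to the first coefficient
    dptr = d + i; // assign the pointer itself (not *dptr): start of the window for z[i]
    for(int k = 0; k < ncoef; ++k) {
        *zptr += *cptr * *dptr;
        cptr++;
        dptr++;
    }
    zptr++;
}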
When you first do something like this (and succeed in getting it correct) you will be surprised how much faster it can be. All the array address calculations of fetching the index and summing the index and base address are replaced with an increment instruction.
For 2D array operations such as blurring an image, an innocent-looking access like data[r,c] involves two value fetches, a multiply, and a sum. So with 2D arrays, pointers also let you remove the multiply operations.
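To make the index arithmetic concrete, here is a minimal sketch in Java. Java has no pointers, so a running index plays the role of the incremented pointer. The class and method names are hypothetical, and the image is assumed to be stored row-major in one flat int array. A JIT may perform this transformation on its own, so treat it as an illustration of the operation count rather than a guaranteed speedup.

```java
final class FlatIndexDemo {
    // Sums a width x height image stored row-major in a flat array,
    // recomputing r * width + c for every element.
    static long sumWithMultiply(int[] data, int width, int height) {
        long sum = 0;
        for (int r = 0; r < height; r++) {
            for (int c = 0; c < width; c++) {
                sum += data[r * width + c]; // one multiply and one add per access
            }
        }
        return sum;
    }

    // Same traversal with a running index: the multiply disappears and
    // each step is a single increment, as in the pointer version.
    static long sumWithRunningIndex(int[] data, int width, int height) {
        long sum = 0;
        int idx = 0;
        for (int r = 0; r < height; r++) {
            for (int c = 0; c < width; c++) {
                sum += data[idx++];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] pixels = {1, 2, 3, 4, 5, 6}; // 3 wide, 2 high
        System.out.println(sumWithMultiply(pixels, 3, 2));     // prints 21
        System.out.println(sumWithRunningIndex(pixels, 3, 2)); // prints 21
    }
}
```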
So the language allows real reduction in the operations the CPU must perform. The cost is that the C++ code is horrendous to read and debug. Errors in pointers and buffer overflows are food for hackers. But when it comes to raw number grinding algorithms, the speed improvement is too tempting to ignore.
Another factor not mentioned above is the garbage collector. The problem is that garbage collection takes time, plus it can run at any time. This means that a Java program which creates lots of temporary objects (note that some types of String operations can be bad for this) will often trigger the garbage collector, which in turn will slow down the program (app).
Following is a list of programming languages by level:
Assembly Language (Machine Language, Lower Level)
C Language (Middle Level)
C++, Java, .NET (Higher Level)
A lower-level language has more direct access to the hardware. As the level increases, access to the hardware decreases. So assembly code runs at the highest speed, while code in other languages runs at a speed that depends on its level.
This is the reason that C code runs much faster than Java code.
I can easily read 2e15 as "two quadrillion" at a glance, but for 2000000000000000 I have to count the zeroes, which takes longer and can lead to errors.
Why can't I declare an int or long using a literal such as 2e9 or 1.3e6? I understand that a negative power of 10, such as 2e-3, or a power of 10 that is less than the number of decimal places, such as 1.0003e3, would produce a floating point number, but why doesn't Java allow such declarations, and simply truncate the floating-point part and issue a mild warning in cases where the resulting value is non-integral?
Is there a technical reason why this is a bad idea, or is this all about type-safety? Wouldn't it be trivial for the compiler to simply parse a statement like
long x = 2e12 as long x = 2000000000000 //OK for long
and int y = 2.1234e3 as int y = 2123.4 //warning: loss of precision
It's because when you use the scientific notation you create a floating point number (a double in your example). And you can't assign a floating point to an integer (that would be a narrowing primitive conversion, which is not a valid assignment conversion).
So this would not work either for example:
int y = 2d; //can't convert double to int
You have a few options:
explicitly cast the floating point to an integer: int y = (int) 2e6;
with Java 7+ use underscores as digit separators: int y = 2_000_000;
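Both options can be seen side by side in a small sketch. The class and constant names are hypothetical; the cast and the underscore syntax are standard Java.

```java
final class LiteralDemo {
    // 2e6 is a double literal; the explicit cast performs the narrowing conversion.
    static final int CAST_FROM_DOUBLE = (int) 2e6;

    // Underscores in numeric literals (Java 7+) keep the literal an int.
    static final int WITH_UNDERSCORES = 2_000_000;

    // The same two options for a long; the L suffix makes the literal a long.
    static final long LONG_CAST = (long) 2e12;
    static final long LONG_WITH_UNDERSCORES = 2_000_000_000_000L;

    // A non-integral value is silently truncated toward zero by the cast.
    static final int TRUNCATED = (int) 2.1234e3;

    public static void main(String[] args) {
        System.out.println(CAST_FROM_DOUBLE == WITH_UNDERSCORES);  // prints true
        System.out.println(LONG_CAST == LONG_WITH_UNDERSCORES);    // prints true
        System.out.println(TRUNCATED);                             // prints 2123
    }
}
```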
Because it's a shortcoming of Java.
(Specifically, there is clearly a set of literals in scientific notation that are exactly representable as ints and longs, and it is reasonable to desire a way to express those literals as ints and longs. But in Java there isn't a way to do that, because the language definition makes every scientific-notation literal a floating-point literal.)
You are asking about the rules for writing integer literals. See this reference:
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
The capability to use scientific notation as an integer literal might make things easier indeed but has not been implemented. I do not see any technical reason that would prevent such a feature from being implemented.
Background
Suppose I have a RecyclerView, which has items that can only be unique if you look at 2 ids they have, but not just one of them.
The first Id is the primary one. Usually there aren't 2 items that have the same primary ID, but sometimes it might occur, which is why there is a secondary ID.
In my
The problem
The RecyclerView adapter needs to have a "long" type being returned:
https://developer.android.com/reference/android/support/v7/widget/RecyclerView.Adapter.html#getItemId(int)
What I've tried
The easy way to overcome this, is to have a HashMap and a counter.
The HashMap will contain the combined keys, and the value will be the id that should be returned. The counter is used to generate the next id in case of a new combined key. The combined key can be a "Pair" class in this case.
Suppose each item in the RecyclerView data has 2 long-type keys:
HashMap<Pair<Long,Long>,Long> keyToIdMap=new HashMap<>();
long idGenerator=0;
this is what to do in getItemId :
Pair<Long,Long> combinedKey=new Pair<>(item.getPrimaryId(), item.getSecondary());
Long uniqueId=keyToIdMap.get(combinedKey);
if(uniqueId==null) {
    uniqueId=idGenerator++; // first time this key pair is seen: assign the next id
    keyToIdMap.put(combinedKey,uniqueId);
}
return uniqueId;
This has the drawback of taking more and more memory. Not much though, and it's very small and proportional to the data you already have, but still...
However, this has the advantage of being able to handle all types of IDs, and you can use even more IDs as you wish (just need something similar to Pair).
Another advantage is that it will use all IDs starting from 0.
The question
Is there perhaps a better way to achieve this?
Maybe a mathematical way? I remember I learned in the past of using prime numbers for similar tasks. Will it work here somehow?
Do the existing primary and secondary ids use the entire 64-bit range of longs? If not then it's possible to compute a unique 64-bit long from their values with e.g bit slicing.
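As a minimal sketch of the bit-slicing idea, assuming both ids happen to fit in an unsigned 32-bit range (an assumption about your data, not something the question states), the two values can share one long. The class and method names are hypothetical.

```java
final class IdPacker {
    // Assumption: both ids are in the range 0 .. 2^32 - 1.
    // The primary id goes in the upper 32 bits, the secondary in the lower 32.
    static long pack(long primary, long secondary) {
        if ((primary >>> 32) != 0 || (secondary >>> 32) != 0) {
            throw new IllegalArgumentException("id does not fit in 32 bits");
        }
        return (primary << 32) | secondary;
    }

    // Recovers the primary id from the upper 32 bits.
    static long unpackPrimary(long packed) {
        return packed >>> 32;
    }

    // Recovers the secondary id from the lower 32 bits.
    static long unpackSecondary(long packed) {
        return packed & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long id = pack(7, 9);
        System.out.println(unpackPrimary(id));   // prints 7
        System.out.println(unpackSecondary(id)); // prints 9
    }
}
```

Because the mapping is reversible, two different (primary, secondary) pairs can never produce the same long. One caveat: if both ids are 0xFFFFFFFF the packed value is -1, which is the value of RecyclerView.NO_ID, so that pair would need special handling.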
Another approach would be to hash the two together using a hash with very few collisions (a crypto hash like SHA2 for example) and use the first 64 bits of the result. Having a range of 64 bits means you can comfortably have millions of items before a collision becomes likely: by the birthday bound, the chance of a collision only reaches roughly 50% at around sqrt(2**64) = 2**32 items, which is more than 4 billion.
Finally, having a unique independent mapping is very versatile, and assuming the map is always accessible it's fine (it gets tricky when you try to synchronize that new id across machines etc). In Java you can attempt to increase performance by avoiding the boxed Longs and a separate Pair instance using a custom map implementation, but that's micro-optimizing.
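The birthday estimate mentioned above can be checked numerically. The sketch below uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)) and assumes the hash output behaves like uniformly random values; the class and method names are hypothetical.

```java
final class BirthdayBound {
    // Approximate probability of at least one collision among n uniformly
    // random values of the given bit width: 1 - exp(-n(n-1) / 2^(bits+1)).
    static double collisionProbability(double n, int bits) {
        double exponent = n * (n - 1) / Math.pow(2.0, bits + 1);
        // expm1 keeps precision when the probability is tiny.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        // One million items in a 64-bit space: a vanishingly small chance.
        System.out.println(collisionProbability(1_000_000, 64));
        // 2^32 items in a 64-bit space: the same order of magnitude as 50%.
        System.out.println(collisionProbability(4294967296.0, 64));
    }
}
```

Under this approximation a million items give a probability below one in ten million, and the probability is already around 39% at 2**32 items, with the 50% point slightly higher (around 5 billion items).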
Example using SHA1:
With Guava - the usage is clean and obvious.
HashFunction hf = Hashing.sha1();
long hashedId = hf.newHasher()
.putLong(primary)
.putLong(secondary)
.hash()
.asLong();
Just the standard JDK, it's pretty horrible and can probably be more efficient, should look something like this (I'm ignoring checked exceptions):
static void updateDigestWithLong(MessageDigest md, long l) {
    // feed all 8 bytes of the long, least significant byte first
    for (int shift = 0; shift < 64; shift += 8) {
        md.update((byte)(l >>> shift));
    }
}
// this is from the Guava sources, can reimplement if you prefer
static long padToLong(byte[] bytes) {
long retVal = (bytes[0] & 0xFF);
for (int i = 1; i < Math.min(bytes.length, 8); i++) {
retVal |= (bytes[i] & 0xFFL) << (i * 8);
}
return retVal;
}
static long hashLongsToLong(long primary, long secondary) {
MessageDigest md = MessageDigest.getInstance("SHA-1");
updateDigestWithLong(md, primary);
updateDigestWithLong(md, secondary);
return padToLong(md.digest());
}
I think my original idea is the best one I can think of.
Should cover all possible ids, with least possible collision.
I have programmed for many years, and the issue I am presenting now is probably one of the strangest I have come across.
There is a block of code in my app which randomly generates a sequence of tokens, with three possible types, let's say A, B or C.
So 10 tokens might be ABCCAAABAC.
At the beginning of the block of code, the random number generator seed is initialised like so:
math.randomseed(seed)
math.random()
Now, unsurprisingly when the seed value stays constant, I always get the same sequence of tokens, because the random generation code executes in a deterministic manner. Well, almost always.
Actually, on rare occasions, out of the blue, I will get a different random sequence given the same seed. Then it's back to normal before I know it. You're probably thinking - ah, side effects, this is probably a state related issue whereby the block of code which generates the random sequence of tokens utilises a variable which changes how many times it calls random() (for instance). However, I am 99% certain that I've controlled for all obvious side effects. There's only a few places in the block of code which access the external state, and they all remain constant.
The plot thickens even more - this issue has only been apparent to me on an Android deployment of the app that I've been building. Admittedly, this is a rare bug, and I can't seem to reliably repeat it. So it might also be present in the iOS deployments. But I've yet to come across it on other platforms. I might as well mention that I'm using lua scripting via the Corona SDK to develop the app.
I have given this issue much thought, and have narrowed it down to a few possibilities:
Interaction with another thread which uses the same random number generator, which I'm unaware of
(Is this even possible in lua?) Some sort of heap corruption is leading to strange side effects
I've messed up and there's some damn obvious reference to external state which I've missed throughout many hours of debugging
The most painful aspect of all this is the non-repeatability of the bug. Most of the time the block of code acts perfectly deterministically given a repeated seed. Then it is as though there's a phase of non-determinism, which then dissipates again after some unknown amount of time. I would love to pick the brains of an expert here.
What could be going on here? Also - is it possible there could be anything platform specific going on with this particular issue, since I've only seen it on Android deployments?
For reference, here is the full block of code. It's actually generating tokens with two random properties (one of three colours, and one of three shapes), but that doesn't mean much in terms of the essence of the problem.
math.randomseed(currentRandomSeed)
math.random()
local tokenListPlan = {}
-- randomly assign weighting distribution
local thresh1, thresh2
while (true) do
local s0 = math.random(1, 99)
local s1 = math.random(1, 99)
local c0 = s0
local c1 = s1 - c0
local c2 = 100 - c1 - c0
if (c0 >= eng.DEVIATION_THRESHOLD and c1 >= eng.DEVIATION_THRESHOLD and c2 >= eng.DEVIATION_THRESHOLD) then
thresh1 = c0
thresh2 = c0 + c1
break
end
end
-- generate tokens (deterministic based on seed)
for i = 1, sortedCountTarget do
local token
local c = 1
local rnd = math.random(1, 100)
if (rnd < thresh1) then -- skewed dist
c = 1
elseif (rnd < thresh2) then
c = 2
else
c = 3
end
if (paramGameMode == eng.GAME_MODE_COLOR) then
local rnd46 = math.random(4, 6)
token = {color = c, shape = rnd46}
elseif (paramGameMode == eng.GAME_MODE_SHAPE) then
local rnd13 = math.random(1, 3)
token = {color = rnd13, shape = c + 3}
else
local rnd13 = math.random(1, 3)
local rnd46 = math.random(4, 6)
token = {color = rnd13, shape = rnd46}
end
tokenListPlan[#tokenListPlan + 1] = token
end
https://docs.coronalabs.com/api/library/math/random.html states:
This function is an interface to the simple pseudo-random generator function rand provided by ANSI C. No guarantees can be given for its statistical properties.
This makes me wonder if other programs use the same function.
That could lead to those conflicts, while also more or less explaining why they only happen sometimes.
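To illustrate why a shared generator could produce this symptom, here is a minimal sketch. It is written in Java rather than Lua, and java.util.Random merely stands in for the C rand() behind math.random. Whether anything in Corona actually draws from the same generator is an assumption, not something confirmed here. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class SharedGeneratorDemo {
    // Draws `count` values from the given generator.
    static long[] draw(Random rng, int count) {
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = rng.nextLong();
        }
        return out;
    }

    // Simulates some other code consuming one value from the same generator
    // after it was seeded but before our own draws start.
    static long[] drawAfterForeignCall(long seed, int count) {
        Random shared = new Random(seed);
        shared.nextLong(); // hypothetical "other consumer"
        return draw(shared, count);
    }

    public static void main(String[] args) {
        long[] expected = draw(new Random(42), 4);
        long[] disturbed = drawAfterForeignCall(42, 3);
        // Same seed, but the sequence is shifted by one position.
        System.out.println(Arrays.toString(expected));
        System.out.println(Arrays.toString(disturbed));
    }
}
```

If that is what is happening, one extra call between seeding and generating shifts every later value, which would match a sequence that is usually identical and occasionally different. Using a generator instance that only your own code can reach avoids the interference.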
I'm trying to write a simple ASCII-style game with a randomly generated world for the Android terminal, using the c4droid IDE. It has C++ support, and basically I'm generating array[width][height] tiles using the rule rand() % 2, where 1 creates a walkable tile and 0 is a wall. But there is a problem: each time I 'randomly' generate the map it looks the same, because rand() isn't really random.
I heard about using entropy created by HDDs or other hardware. The problem is that I'm on Android, so this is awkward for me to implement; C++ is not used there as much as Java, so I couldn't find a solution on Google.
So, short question: how can I generate "pretty much real" random numbers using C++ on Android?
You need to seed your random number generator with srand(time(NULL)). This seeds the generator from the system time, so each run produces a different pseudo-random sequence.
a link to reference: http://www.cplusplus.com/reference/clibrary/cstdlib/srand/
EDIT: note that you need to seed rand() only once, usually at the beginning of the program.
#include <cstdlib> // srand, rand
#include <ctime>   // time
int main()
{
    srand(time(NULL)); // only needs to be called ONCE
    // code and rand() calls afterward
}
I think rand() should work for what you're doing. Are you seeding the random number generator?
srand(time(NULL));
// Should be a different set of numbers each time you run.
for(unsigned i = 0; i < 10; ++i) {
cout << rand() % 2; // 0 (wall) or 1 (walkable)
}
I came across a presentation (dalvik-vm-internals) on the Dalvik VM which says that, of the loops below, we should use (2) and (3) and avoid (7).
(1) for (int i = initializer; i >= 0; i--)
(2) int limit = calculate limit;
for (int i = 0; i < limit; i++)
(3) Type[] array = get array;
for (Type obj : array)
(4) for (int i = 0; i < array.length; i++)
(5) for (int i = 0; i < this.var; i++)
(6) for (int i = 0; i < obj.size(); i++)
(7) Iterable list = get list;
for (Type obj : list)
Comments: I feel that (1) and (2) are the same.
(3)
(4) every time it has to calculate the length of array, so this can be avoided
(5)
(6) same as (4), calculating the size every time
(7) asked to avoid, as list is of an Iterable type?
One more: if we have infinite data (assume the data is coming in as a stream), which loop should we consider for better efficiency?
Please comment on this.
If that's what they recommend, that's what they've optimized the compiler and VM for. The ones you feel are the same aren't necessarily implemented the same way: the compiler can use all sorts of tricks with data and path analysis to avoid naively expensive operations. For instance, the array.length value can be cached, since the length of an array never changes.
They're ranked from most to least efficient: but (1) is 'unnatural'. I agree, wouldn't you? The trouble with (7) is that an iterator object is created and has to be GC'ed.
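For reference, the recommended forms (2) and (3) and the discouraged form (7) can be written out as a small sketch. The class and method names are hypothetical, and the sketch only shows that the three forms compute the same result; it does not measure their relative cost.

```java
import java.util.Arrays;
import java.util.List;

final class LoopForms {
    // Form (2): compute the limit once, then use an indexed loop.
    static long sumCachedLimit(int[] values) {
        long sum = 0;
        int limit = values.length;
        for (int i = 0; i < limit; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Form (3): enhanced for over an array; no iterator object is involved.
    static long sumArrayForEach(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Form (7): enhanced for over an Iterable; the loop calls iterator(),
    // which typically allocates an Iterator object each time it runs.
    static long sumIterable(Iterable<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4};
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumCachedLimit(array));  // prints 10
        System.out.println(sumArrayForEach(array)); // prints 10
        System.out.println(sumIterable(list));      // prints 10
    }
}
```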
Note carefully when the advice should be heeded. It's clearly intended for bounded iteration over a known collection, not the stream case. It's only relevant if the loop has significant effect on performance and energy consumption ('operating on the computer scale'). The first law of optimization is "Don't optimize". The second law (for experts) is "Don't optimize, yet.". Measure first (both execution times and CPU consumption), optimize later: this applies even to mobile devices.
What you should consider is the preceding slides: try to sleep as often and as long as possible, while responding quickly to changes. How you do that depends on what kind of stream you're dealing with.
Finally, note that the presentation is two years old, and may not fully apply to Android 2.2 devices, where among other things a JIT is implemented.
With infinite data, none of the examples are good enough. Best would be to do
for(;;) {
list.poll(); //handle concurrency, in java for example, use a blocking queue
}
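A slightly fuller sketch of that idea, using java.util.concurrent.BlockingQueue: take() puts the consumer to sleep until data arrives, so the loop does not spin. The END_OF_STREAM sentinel and the class name are hypothetical and exist only so the example can terminate; a truly infinite stream would simply never send it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class StreamConsumer {
    // Hypothetical sentinel marking the end of the stream.
    static final int END_OF_STREAM = -1;

    // Consumes items until the sentinel arrives and returns their sum.
    static long consume(BlockingQueue<Integer> queue) {
        long sum = 0;
        try {
            for (;;) {
                int item = queue.take(); // blocks until an item is available
                if (item == END_OF_STREAM) {
                    return sum;
                }
                sum += item;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return sum;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        // Producer thread: pushes three items, then the sentinel.
        new Thread(() -> {
            queue.add(1);
            queue.add(2);
            queue.add(3);
            queue.add(END_OF_STREAM);
        }).start();
        System.out.println(consume(queue)); // prints 6
    }
}
```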
(1) and (2) are really different. (2) needs an extra subtraction to compute i < limit; i >= 0 doesn't.
Even better, on most processors (and with well-optimized code) no comparison is needed for i >= 0. The processor can use the negative flag resulting from the last decrement (i--).
So the end of loop (1) looks like (in pseudo-assembler)
--i
jump-if-neg
while the end of loop (2) looks like
++i
limit-i # set negative flag if i >limit
jump-if-neg
That doesn't make a big difference, except if the code in your loop is really small (like basic C string operation)
That might not work with interpreted languages.