My app uses a proprietary implementation of Canny edge detection based on RenderScript. I tested this on numerous devices with various API levels and it worked very reliably. Now I got the new Samsung S7, running API 23. Here (and only here) I encountered a rather ugly problem: some of the edge pictures are studded with thousands of artifacts that stem from the gradient-magnitude calculation kernel and are NOT based on actual image information. After trying all kinds of target APIs, toggling renderscript.support.mode on and off, etc., I finally found that the problem only arises when the RenderScript (and Script) instances are used for the second or a later time. It does not arise when they are used for the first time.
For efficiency reasons I created the RenderScript and Script instances only once, in the onCreate method of MainActivity, and reused them repeatedly thereafter. Naturally, I would prefer not to change that.
Does anyone have a solution to this problem? Thanks.
UPDATE: Crazy things are going on here. It seems that freshly created Allocations are NOT empty from the outset! When running:
// Build a width x height U8 Allocation, copy its contents straight out
// into a Java byte array, and log every non-zero byte found.
Type.Builder typeUCHAR1 = new Type.Builder(rs, Element.U8(rs));
typeUCHAR1.setX(width).setY(height);
Allocation alloc = Allocation.createTyped(rs, typeUCHAR1.create());

byte[] se = new byte[width * height];
alloc.copyTo(se);
for (int i = 0; i < width * height; i++) {
    if (se[i] != 0) {
        Log.e("content: ", String.valueOf(se[i]));
    }
}
... the byte array se is full of funny numbers... HELP! Any idea what is going on here?
UPDATE 2: I stumbled over my own ignorance here, and really don't deserve a point for this masterpiece... However, in my defense I must say that the problem was slightly more subtle than it appears here. The context was that I needed a global allocation (Byte/U8) which initially should be empty (i.e. all zeros) and which then, within the kernel, gets partially set to 1 (only where the edges are) via rsSetElementAt_uchar(). Since this had worked for many months, I was no longer aware of the fact that I had never explicitly assigned the zeros in this allocation... This only had consequences on API 23, so maybe this can help others avoid falling into the same trap. So, note: unlike numeric Java arrays, which are filled with 0 by default, Allocations cannot be assumed to be full of zeros at creation. Thanks, Sakridge.
Allocation data for primitive types (non-struct/object) is not initialized by default when an Allocation is created, unless it is created from a bitmap via the createFromBitmap API. If you are expecting zeros, then possibly you have a bug in your app which is simply not exposed when the driver happens to initialize to 0s. It would help if you could post example code which reproduces the problem.
Initialize your allocations by copying from a bitmap or Java array.
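For example, a minimal sketch of that fix, reusing the rs, width, and height names from the snippet above: a freshly created Java array is zero-filled by default, so copying one in guarantees the Allocation really starts out at zero.

import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.Type;

// Minimal zero-initialization sketch (rs, width, height as in the
// question's snippet): copy in a freshly created -- and therefore
// zero-filled -- Java byte array before any kernel runs.
Type.Builder typeUCHAR1 = new Type.Builder(rs, Element.U8(rs));
typeUCHAR1.setX(width).setY(height);
Allocation alloc = Allocation.createTyped(rs, typeUCHAR1.create());
alloc.copyFrom(new byte[width * height]);  // explicit zeros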
I am programming in Kotlin via Android Studio 3.1.3. I created an array of type Long that was apparently too large to compile. After playing around with it for a while, I found that the maximum size array I could get to compile contained 8,207 elements. An array with 8,208 or more elements caused a compilation error. There are 350 lines of elements in the array, which contains prime numbers in numerical order. Two questions:
Does anyone have any idea why this limit would exist? 8,208 is (2^13 + 2^4), but that seems like an odd tipping point. So, I doubt that is the reason for the limitation.
Is there any way to increase the allowed size of the array?
Note: On the Android forum, it was suggested that I use ArrayList instead of ArrayLong. I appreciate that suggestion and intend to try it, but the limitation on a Long Array still seems odd to me. If anyone has a more elegant solution or an explanation for the limit, I would love to hear it! Thank you for your time.
So, what you're trying to do is something like:
var a = longArrayOf(1,2,3,4,5,6,7,8...)
There's a limitation imposed by the JVM: the bytecode of a single method cannot exceed 64 KB, and the Kotlin compiler translates a longArrayOf(...) literal into a method that stores each element individually. If you decompile your code, you'll see something like this for each element in the array:
DUP             // duplicate the array reference on the stack
SIPUSH 8206     // push the element index
LDC 8207        // push the element value from the constant pool
LASTORE         // store the long into the array
And that's where you hit the limit: at roughly seven to eight bytes of bytecode per element, around 8,200 elements is just where a single method crosses the 65,535-byte ceiling.
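If you really need thousands of constants, the usual way around the ceiling is to keep the data out of method bodies altogether. A hedged sketch (shown in Java, since the 64 KB limit is a JVM constraint rather than a Kotlin one; the asset name primes.txt and the one-value-per-line format are my assumptions, not anything from the question):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical workaround: read the values from an asset at runtime,
// so no single method's bytecode grows with the number of elements.
List<Long> primes = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(context.getAssets().open("primes.txt")))) {
    String line;
    while ((line = reader.readLine()) != null) {
        primes.add(Long.parseLong(line.trim()));
    }
} catch (IOException e) {
    throw new RuntimeException("could not load primes.txt", e);
}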
I want to combine two RenderScript scripts into a ScriptGroup. The first one is the ScriptIntrinsicBlur. Taking the blurred U8 allocation as input, the second script calculates two things: the gradient and the gradient direction. The gradient direction is the formal out-Allocation of the second kernel; the gradient itself goes into a global allocation filled via rsSetElementAt_float(). Now I find that this global allocation is returned empty after execution of the ScriptGroup.
Question: Is my assumption correct that with a ScriptGroup you cannot use script globals, or at least not change them via rsSetElementAt_(...)?
UPDATE: I realized that the performance gain from using U8 both as the output of ScriptIntrinsicBlur and as the input of the proprietary kernel is already more than satisfactory, even in a simple sequential setup of the two scripts. This is primarily because it avoids having to copyTo the ScriptIntrinsicBlur out-Allocation into a Java array before passing it as a separate input Allocation to the second kernel.
Before, I used U8_4 (i.e. the Bitmap equivalent) as the output of ScriptIntrinsicBlur and then converted it to a one-dimensional greyscale int[] array before passing it as the in-Allocation to the proprietary kernel. Now I convert to a greyscale byte[] (i.e. U8) before the allocation even enters ScriptIntrinsicBlur, and use U8 as the input of the second kernel as well; a sketch of this sequential setup follows below.
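For reference, a hedged sketch of that sequential setup (ScriptC_gradient, its forEach_gradient kernel, and the allocation names are illustrative stand-ins of mine, not the actual code): the blurred U8 Allocation feeds the second kernel directly, with no copyTo into a Java array in between.

import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.ScriptIntrinsicBlur;
import android.renderscript.Type;

// Sequential U8 data flow: greyscale bytes in, blur, then straight
// into the second kernel -- no Java-side copy between the two scripts.
Type u8Type = new Type.Builder(rs, Element.U8(rs)).setX(width).setY(height).create();
Allocation gray      = Allocation.createTyped(rs, u8Type);
Allocation blurred   = Allocation.createTyped(rs, u8Type);
Allocation direction = Allocation.createTyped(rs,
        new Type.Builder(rs, Element.F32(rs)).setX(width).setY(height).create());

gray.copyFrom(grayBytes);  // byte[] greyscale image, prepared in Java

ScriptIntrinsicBlur blur = ScriptIntrinsicBlur.create(rs, Element.U8(rs));
blur.setRadius(2.5f);
blur.setInput(gray);
blur.forEach(blurred);     // U8 out-Allocation of the intrinsic

ScriptC_gradient script = new ScriptC_gradient(rs);  // hypothetical wrapper class
script.forEach_gradient(blurred, direction);         // blurred U8 used directly as input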
This is what I realize again and again when working with RS: it is really worth simplifying data flows to the extent possible; the speed gains are fantastic. (Maybe I will revisit the ScriptGroup question at a later stage; for now I am happy with the result.)
There should be no issue with using a script global like this. It's not as efficient as an output allocation, but it is possible. You mentioned the out allocation is empty; what are you seeing in the script global?
Background
I'm new to RenderScript, and I would like to try some experiments with it (small ones, not the complex ones found in the SDK samples), so I thought of an exercise to try out, based on a previous question of mine (which used the NDK).
What I want to do
In short, I would like to pass bitmap data to RenderScript and then have it copy the data to another bitmap whose dimensions are swapped relative to the first one, so that the second bitmap is a rotation of the first.
For illustration:
From this bitmap (width: 2, height: 4):
01
23
45
67
I would like it rotated (by 90 degrees counter-clockwise) to:
1357
0246
The problem
I've noticed that when I try to change the signature of the root function, Eclipse gives me errors about it.
Even creating new functions produces new errors. I also tried the exact code from Google's blog post (here), but I couldn't work out how the author created the functions he used, or why I can't change the filter function to take the input and output bitmap arrays.
What can I do in order to customize the parameters I send to RenderScript, and to use the data inside it?
Is it OK not to use "filter" or "root" functions (API 11 and above)? What can I do in order to have more flexibility in what I can do there?
You are asking a bunch of separate questions here, so I will answer them in order.
1) You want to rotate a non-square bitmap. Unfortunately, the bitmap model for RenderScript won't allow you to do this easily, because the input and output allocations must have the same shape (i.e. the same number of dimensions and the same values for those dimensions, even if the Types are different). In order to get the effect you want, you should use a root function that has only an output allocation of the new shape (i.e. input columns X input rows). You can create an rs_allocation global variable to hold your input bitmap (which you can then create/bind on the Java side). The kernel then merely needs to set each output cell to the result of rsGetElementAt(globalInAlloc, y, x); a host-side sketch follows below.
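To make that concrete, here is a hedged host-side sketch. The file rotate.rs, its rs_allocation global gIn, and the reflected ScriptC_rotate wrapper with set_gIn()/forEach_root() are illustrative names following the standard reflected-layer conventions, not code from the answer:

import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.Type;

// The kernel in rotate.rs would be essentially one line, e.g.:
//   *v_out = rsGetElementAt_uchar4(gIn, y, x);
// (or the generic rsGetElementAt with a cast on older API levels).
Allocation inAlloc = Allocation.createFromBitmap(rs, src);  // src: w x h Bitmap
Type outType = new Type.Builder(rs, Element.U8_4(rs))
        .setX(src.getHeight())   // output width  = input height
        .setY(src.getWidth())    // output height = input width
        .create();
Allocation outAlloc = Allocation.createTyped(rs, outType);

ScriptC_rotate script = new ScriptC_rotate(rs);
script.set_gIn(inAlloc);        // bind the input bitmap to the script global
script.forEach_root(outAlloc);  // kernel reads gIn at (y, x) for each output (x, y)
outAlloc.copyTo(dst);           // dst: h x w Bitmap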
2) If you are using API 11, you can't adjust the signature of the root() function (you can pass NULL allocations as input or output on the Java side if you are not using them). You also can't create more than one kernel per source file on these older API levels, so you are forced to have only a single root() function. If you want to use more kernels per source file, consider targeting a higher API level.
In a game I need to keep tabs on which of my pooled sprites are in use. When "activating" multiple sprites at once, I want to transfer them from my passivePool to my activePool, both of which are immutable HashSets (OK, to be exact, I'll be creating new sets each time). So my basic idea is something along the lines of:
activePool ++= passivePool.take(5)
passivePool = passivePool.drop(5)
but reading the Scala documentation, I'm guessing that the 5 I take might be different from the 5 I then drop, which is definitely not what I want. I could also say something like:
val moved = passivePool.take(5)
activePool ++= moved
passivePool --= moved
but as this is something I need to do pretty much every frame in real time on a limited device (an Android phone), I guess this would be much slower, since I would have to look up each of the moved sprites in the passivePool one by one.
Any clever solutions? Or am I missing something basic? Remember, efficiency is a primary concern here. And I can't use Lists instead of Sets, because I also need random-access removal of sprites from the activePool when sprites are destroyed in the game.
There's nothing like benchmarking for getting answers to these questions. Let's take 100 sets of size 1000 and drop them 5 at a time until they're empty, and see how long it takes.
passivePool.take(5); passivePool.drop(5) // 2.5 s
passivePool.splitAt(5) // 2.4 s
val a = passivePool.take(5); passivePool --= a // 0.042 s
repeat(5){ val a = passivePool.head; passivePool -= a } // 0.020 s
What is going on?
The reason things work this way is that immutable.HashSet is built as a hash trie with optimized (effectively O(1)) add and remove operations, but many of the other methods are not re-implemented; instead, they are inherited from collections that don't support add/remove and therefore can't get the efficient methods for free. They therefore mostly rebuild the entire hash set from scratch. Unless your hash set has only a handful of elements in it, this is a bad idea. (In contrast to the 50-100x slowdown with sets of size 1000, a set of size 100 has "only" a 6-10x slowdown...)
So, bottom line: until the library is improved, do it the "inefficient" way. You'll be vastly faster.
I think there may be some mileage in using splitAt here, which will give you back both the five sprites to move and the trimmed pool in a single method invocation:
val (moved, newPassivePool) = passivePool.splitAt(5)
activePool ++= moved
passivePool = newPassivePool
Bonus points if you can assign directly back to passivePool on the first line, though I don't think it's possible in a short example where you're defining the new variable moved as well.
I'm very new to Android and I'm writing my first app which consists of drawing STL data (a bunch of triangles).
For a small number of triangles (~ 4000), my app works great. But as soon as I try to load large data (~ 100000 triangles), I get memory allocation problems. Here is basically what I do:
I am using a GLSurfaceView for my rendering
a) I read in my data, creating a list of triangles (3 * X/Y/Z and a normal vector for each triangle)
b) I create ByteBuffers using allocateDirect() for the vertex data, the normals and the indices.
c) I add the data into the ByteBuffers
d) I call glVertexPointer, glNormalPointer and I assign the bytebuffers
e) I call glDrawElements() using the indexBuffer
As soon as I try to allocate around 10 MB by calling allocateDirect(), my application crashes. I tried calling allocate() instead, which works, but then the glVertexPointer method crashes (even for a small number of triangles).
Am I doing something wrong?
Also, do I have to call glVertexPointer, glNormalPointer every time I redraw or is it enough to call it in the surfaceChanged method?
Thanks a lot,
Mark
You should only create the VBOs once. After that, if you need to load new data into them, prepare it in local arrays and then load it into the VBOs all at once using a "put" call.
The largest type that you can use for the index VBO is "short", so you can only draw 64k vertexes at a time. If you are drawing 100k vertexes, you will have to do it in two passes.
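As a hedged illustration of both points (the variable names are mine, and the fixed-function GL10 calls match the question's setup): the buffer is created once, direct and native-ordered, and refilled with put(); the index type is unsigned short, which is what caps a single draw call at 64k vertices.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import javax.microedition.khronos.opengles.GL10;

// Created once: direct and native-ordered. A non-direct buffer from
// allocate() is exactly what makes glVertexPointer crash -- the GL
// bindings require direct NIO buffers.
ByteBuffer bb = ByteBuffer.allocateDirect(maxVertices * 3 * 4);  // 3 floats per vertex
bb.order(ByteOrder.nativeOrder());
FloatBuffer vertexBuffer = bb.asFloatBuffer();

// On every new data load, refill the same buffer instead of reallocating:
vertexBuffer.clear();
vertexBuffer.put(vertexArray);   // float[] prepared in Java
vertexBuffer.position(0);

// Unsigned-short indices address at most 64k vertices per call, so a
// 100k-vertex model needs its triangles split across two draws:
gl.glVertexPointer(3, GL10.GL_FLOAT, 0, vertexBuffer);
gl.glDrawElements(GL10.GL_TRIANGLES, indexCount, GL10.GL_UNSIGNED_SHORT, indexBuffer);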
Edit: regarding your memory limitations, find out what the heap limit is for your device. The heap limit can be 16, 24, 32, or 48 MB. You can use the android:largeHeap="true" manifest option to greatly increase the limit, but I'm not sure whether that option is available on pre-Honeycomb versions of Android.
I'm no expert in 3D rendering, but 10 MB of vertex data strikes me as quite a lot, especially for a mobile application, so I'm not surprised you're encountering memory allocation issues! Even if the allocations did succeed, you'd likely find the app unusable due to poor performance.
Does the entire scene really need to be in video memory all the time? You may want to explore techniques to cull vertices that are far away from the camera and only upload/draw what is necessary.