RenderScript ScriptGroups - getting Output from script global in second Kernel - android

I want to combine two RenderScript scripts into a ScriptGroup. The first one is the ScriptIntrinsicBlur. Using the blurred U8 allocation as input, the second script calculates two things: the gradient and the gradient direction. The latter is the formal out-Allocation of the second kernel; the former is a global allocation filled via rsSetElementAt_float(). Now I find that this global allocation comes back empty after execution of the ScriptGroup.
Question: Is my assumption correct that with a scriptGroup you cannot use script globals - or at least not change them via rsSetElementAt_(...)?
UPDATE: I realized that the performance gain from using U8 both as output of the ScriptIntrinsicBlur and as input of the proprietary kernel is already more than satisfactory, even in a simple sequential set-up of the two scripts. This is primarily because it avoids having to copyTo() the ScriptIntrinsicBlur's out-Allocation into a Java array first before passing it as a separate input Allocation to the 2nd kernel.
Before, I used U8_4 (i.e. the Bitmap equivalent) as output of ScriptIntrinsicBlur and then converted it to a one-dimensional greyscale int[] array before passing it as the in-Allocation to the proprietary kernel... Now I convert to a greyscale byte[] (i.e. U8) before the allocation even enters ScriptIntrinsicBlur, and use U8 as the input of the 2nd kernel as well.
This is what I realize again and again when working with RS: it is really worth simplifying data flows as far as possible; the speed gains are fantastic. (Maybe I will check the ScriptGroup question at a later stage; for now I am happy with the result.)
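Just to illustrate, the simplified sequential set-up looks roughly like this (ScriptC_gradient, forEach_gradient and the allocation names are placeholders for my proprietary kernel; the F32 type for the direction output is only an assumption, and width, height and greyBytes come from the greyscale conversion):

RenderScript rs = RenderScript.create(context);
Type u8Type = new Type.Builder(rs, Element.U8(rs)).setX(width).setY(height).create();

Allocation aIn   = Allocation.createTyped(rs, u8Type);   // greyscale input
Allocation aBlur = Allocation.createTyped(rs, u8Type);   // blurred U8, stays on the RS side
Allocation aDir  = Allocation.createTyped(rs,
        new Type.Builder(rs, Element.F32(rs)).setX(width).setY(height).create());
aIn.copyFrom(greyBytes);                                  // byte[] greyscale data

ScriptIntrinsicBlur blur = ScriptIntrinsicBlur.create(rs, Element.U8(rs));
blur.setRadius(3.f);
blur.setInput(aIn);
blur.forEach(aBlur);

ScriptC_gradient grad = new ScriptC_gradient(rs);
grad.forEach_gradient(aBlur, aDir);                       // direct hand-over, no copyTo() in between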

There should be no issue with using a script global like this. It's not as efficient as the output allocation, but it is possible. You mentioned the allocation comes back empty; what exactly are you seeing in the script global?
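A minimal sketch of the script-global pattern itself, outside of a ScriptGroup (the names gGradient, ScriptC_edges and the kernel direction are made up here; this is only the plain, non-ScriptGroup usage):

// edges.rs (sketch)
#pragma version(1)
#pragma rs java_package_name(com.example.edges)

rs_allocation gGradient;   // "side" output, written via rsSetElementAt_float()

float __attribute__((kernel)) direction(uchar in, uint32_t x, uint32_t y) {
    float magnitude = (float) in;                 // placeholder for the real gradient math
    rsSetElementAt_float(gGradient, magnitude, x, y);
    return 0.f;                                   // formal output: the gradient direction
}

And on the Java side:

ScriptC_edges script = new ScriptC_edges(rs);
Allocation aGradient = Allocation.createTyped(rs,
        new Type.Builder(rs, Element.F32(rs)).setX(width).setY(height).create());
script.set_gGradient(aGradient);                  // bind the script global
script.forEach_direction(aBlurred, aDirection);   // formal in/out allocations
aGradient.copyTo(gradientArray);                  // float[] gradientArray: read the global back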

Related

RenderScript instability encountered in API 23

My app uses a proprietary implementation of Canny edge detection based on RenderScript. I tested this on numerous devices with various APIs and it worked very reliably. Now I got the new Samsung S7 running API 23, and here (and only here) I encountered a rather ugly problem: some of the edge pictures are studded with thousands of artifacts that stem from the magnitude gradient calculation kernel and are NOT based on actual image information. After trying all kinds of target APIs, switching renderscript.support.mode on and off, etc., I finally found that the problem only arises when the RenderScript (and Script) instances are used for the second time or more. It does not arise when they are used for the first time.
For efficiency reasons I created the RenderScript and Script instances only once, in the onCreate method of MainActivity, and used them repeatedly thereafter. Of course I would prefer not to change that.
Does anyone have a solution to this problem? Thanks.
UPDATE: Crazy things are going on here. It seems that freshly created Allocations are NOT empty from the outset! When running:
Type.Builder typeUCHAR1 = new Type.Builder(rs, Element.U8(rs));
typeUCHAR1.setX(width).setY(height);
Allocation alloc = Allocation.createTyped(rs, typeUCHAR1.create());
byte[] se = new byte[width * height];
alloc.copyTo(se);
for (int i = 0; i < width * height; i++) {
    if (se[i] != 0) {
        Log.e("content: ", String.valueOf(se[i]));
    }
}
... the byte array se is full of funny numbers... HELP! Any idea what is going on here?
UPDATE2: I stumbled over my own ignorance here - and really don't deserve a point for this masterpiece... However, in my defense I must say that the problem was slightly more subtle than it appears here. The context was that I needed a global allocation (Byte/U8) which initially should be empty (i.e. all zeros) and then, within the kernel, gets partially set to 1 (only where the edges are) via rsSetElementAt_uchar(). Since this worked for many months, I was no longer aware of the fact that I had never explicitly assigned the zeros in this allocation... This only had consequences in API 23, so maybe this can help others avoid the same trap... So, note: unlike numerical Java arrays, which are filled with 0 by default, Allocations cannot be assumed to be full of zeros at creation. Thanks, Sakridge.
Allocation data for primitive types (non-struct/object) is not initialized by default when an Allocation is created, unless it is created from a bitmap via the createFromBitmap API. If you are expecting zero-initialization, then possibly you have a bug in your app which is simply not exposed when the driver happens to initialize to 0s. It would help if you could post example code which reproduces the problem.
Initialize your allocations by copying from a bitmap or Java array.
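For a 2D U8 Allocation, for example, explicit zeroing right after creation could look like this (a sketch; Java arrays are zero-filled by default, so an empty byte[] of the right size does the job):

Type t = new Type.Builder(rs, Element.U8(rs)).setX(width).setY(height).create();
Allocation alloc = Allocation.createTyped(rs, t);
alloc.copyFrom(new byte[width * height]);   // the new byte[] is all zeros, so this clears the Allocation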

What are the available kernel-functions that you can create on Renderscript?

Background
I'm learning how to use Renderscript, and I found this part in the docs:
In most respects, this is identical to a standard C function. The first notable feature is the __attribute__((kernel)) applied to the function prototype.
and they show sample code of a kernel function:
uchar4 __attribute__((kernel)) invert(uchar4 in, uint32_t x, uint32_t y) {
    uchar4 out = in;
    out.r = 255 - in.r;
    out.g = 255 - in.g;
    out.b = 255 - in.b;
    return out;
}
The problem
Some samples show that the parameters of kernel functions can differ from those that appear above.
Example:
uchar4 __attribute__((kernel)) grayscale(uchar4 v_in) {
    float4 f4 = rsUnpackColor8888(v_in);
    float3 mono = dot(f4.rgb, gMonoMult);
    return rsPackColorTo8888(mono);
}
The thing is, the generated Java method is still the same for all of those functions:
void forEach_FUNCTIONNAME(Allocation ain, Allocation aout)
where FUNCTIONNAME is the name of the function on the RS side.
So I assume that not every possible function can be a kernel function, and all of them need to follow some rules (besides the __attribute__((kernel)) part, which needs to be added).
Yet I can't find those rules.
The only things I found are in the docs:
A kernel may have an input Allocation, an output Allocation, or both. A kernel may not have more than one input or one output Allocation. If more than one input or output is required, those objects should be bound to rs_allocation script globals and accessed from a kernel or invokable function via rsGetElementAt_type() or rsSetElementAt_type().
A kernel may access the coordinates of the current execution using the x, y, and z arguments. These arguments are optional, but the type of the coordinate arguments must be uint32_t.
The questions
What are the rules for creating kernel functions, besides what's written?
Which other parameters are allowed? Is it only those two "templates" of functions that I can use, or can I write kernel functions that have other sets of parameters?
Is there a list of valid kernel functions? One that shows which parameters sets are allowed?
Is it possible to customize those kernel functions to take more parameters? For example, if I had a blurring function of my own (I know there is a built-in one), I could set the radius and the blurring algorithm.
Basically, all of those questions are about the same thing.
There really aren't that many rules. You have to have an input and/or an output, because kernels are executed over the range present there (i.e. if you have a 2-D Allocation with x=200, y=400, the kernel will execute on each cell of the input/output). We do support an Allocation-less launch, but it is only available in the latest Android release, and thus not usable on most devices. We also support multi-input as of Android M, but earlier target APIs won't build with that (unless you are using the compatibility library).
Parameters are usually primitive types (char, int, unsigned int, long, float, double, ...) or vector types (e.g. float4, int2, ...). You can also use structures, provided that they don't contain pointers in their definition. You cannot use pointer types unless you are using the legacy kernel API, but even then, you are limited to a single pointer to a non-pointer piece of data. https://android.googlesource.com/platform/cts/+/master/tests/tests/renderscript/src/android/renderscript/cts/kernel_all.rs has a lot of simple kernels that we use for trivial testing. It shows how to combine most of the types.
You can optionally include the rs_kernel_context parameter (which lets you look up information about the size of the launch). You can also optionally pass x, y, and/or z (with uint32_t type each) to get the actual indices on which the current execution is happening. Each x/y/z coordinate will be unique for a single launch, letting you know what cell is being operated on.
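A small sketch of a kernel using these optional special arguments (the kernel name is made up; rs_kernel_context and rsGetDimX()/rsGetDimY() require API 23):

#pragma version(1)
#pragma rs java_package_name(com.example)

// Weights each cell by its relative position within the launch dimensions.
float __attribute__((kernel)) positionWeight(uchar in, rs_kernel_context ctx,
                                             uint32_t x, uint32_t y) {
    uint32_t width  = rsGetDimX(ctx);
    uint32_t height = rsGetDimY(ctx);
    return ((float) in / 255.f) * ((float) x / (float) width) * ((float) y / (float) height);
}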
For your question 4, you can't use a radius the way that you want to. It would have to be a global variable rather than a kernel input, since kernel inputs traditionally vary from cell to cell of the input/output Allocations. You can look at https://android.googlesource.com/platform/cts/+/master/tests/tests/renderscript/src/android/renderscript/cts/intrinsic_blur.rs for an example about blur specifically.
Just some key points I was struggling with when I started to learn RS. Basically, the quoted passages above contain all the RS wisdom, but in a form too compact to digest easily. To answer your questions 1 and 2, you have to differentiate between two types of allocations. The first type I call the "formal" allocations. In the kernel expression
uchar4 __attribute__((kernel)) invert(uchar4 in, uint32_t x, uint32_t y) {
these are the input allocation in (of type uchar4, i.e. a vector of four 8-bit unsigned integers) and the output allocation, which is also uchar4 - this is the type you see on the left-hand side of the kernel signature. The output is what is given back via "return", just as in a Java function. You need at least one formal allocation (i.e. one input OR one output OR both).
The other type of allocations I call "side" allocations. These are what you handle via script globals, and they can be inputs or outputs as well. If you use them as input, you fill them from the Java side via copyFrom(); if you use them as output, you read the result back on the Java side via copyTo().
Now, the point is that there is no qualitative difference between formal and side allocations; the only thing you need to take care of is that you use at least one formal allocation.
All allocations in the kernel (whether "formal" or "side") have the same dimensions in terms of width and height.
Question 3 is implicitly answered by 1 and 2. A kernel can have:
1. only a formal input allocation,
2. only a formal output allocation, or
3. both a formal input and a formal output allocation.
Each of 1.-3. can have any number of additional "side" allocations.
Question 4: Yes. In your Gaussian blur example, if you want to pass the blur radius (e.g. 1-100) or the blurring algorithm (e.g. types 1, 2 and 3), you would simply use one global variable for each of these, so that they can be used within the kernel. Here I would not speak of an "allocation" in the above sense, since allocations always have the same dimensions as the grid spanned by the kernel (typically width x height). Nevertheless, you still need to pass these parameters from the Java side, via the generated set_...() methods, as sketched below.
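As a sketch (the script and kernel names are made up; the point is only the generated set_gRadius() setter):

// myblur.rs (sketch)
int gRadius = 3;   // plain script global, not an allocation: one value for the whole launch

And on the Java side:

ScriptC_myblur script = new ScriptC_myblur(rs);
script.set_gRadius(25);            // generated setter for the script global
script.forEach_blur(aIn, aOut);    // the kernel reads gRadius in every cell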
Hope this helps a bit.

Android RenderScript copy allocation in rs file

I'm passing an allocation created from a Bitmap into an rs file, and inside the script I'm trying to copy the allocation to a new one using the rsAllocationCopy2DRange function, but I get a force close when I try to run the app.
Can someone please explain how to use the function correctly, and what exactly the arguments are that it takes?
I looked at the reference site: http://developer.android.com/reference/renderscript/rs__allocation_8rsh.html#a7f7e2369b3ed7d7db31729b6db7ba07e
but I still don't know what dstMip and dstFace are and how to get them.
Edit: I want to implement the Sobel operator, and in the implementation I need to work with negative values after the convolution with the kernel, which is not possible using the built-in allocation created from a Bitmap and the built-in convolve3x3 intrinsic, because that allocation uses uchar4. So I thought of implementing the convolution inside a separate script, so that I can use the negative values before they are stored back into the allocation. I want to pass only one allocation and the kernel matrix; inside the script I want to create a new output allocation from the input allocation, run the convolution on it and then copy the result back to the input allocation. I don't want to create the output allocation outside the script in the Java code; I want the entire process to be as transparent as possible, without the need to add objects that are unknown to the user.
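For reference, rsAllocationCopy2DRange() takes, for each of the destination and the source: the allocation, an x/y offset, a mip level and a cubemap face, plus the width and height of the region to copy. For ordinary 2D allocations the mip level is 0, and the face argument only matters for cubemap allocations, so RS_ALLOCATION_CUBEMAP_FACE_POSITIVE_X is passed as a placeholder. A sketch of an in-script copy (the global names are made up; note that the destination allocation still has to be created and bound from the Java side unless you are on API 24+, where rsCreateAllocation_* becomes available):

rs_allocation gIn;    // created from the Bitmap and bound from Java
rs_allocation gOut;   // destination, same type and size, also bound from Java

void copyInput() {    // call from Java via script.invoke_copyInput()
    uint32_t w = rsAllocationGetDimX(gIn);
    uint32_t h = rsAllocationGetDimY(gIn);
    rsAllocationCopy2DRange(gOut, 0, 0, 0, RS_ALLOCATION_CUBEMAP_FACE_POSITIVE_X,
                            w, h,
                            gIn,  0, 0, 0, RS_ALLOCATION_CUBEMAP_FACE_POSITIVE_X);
}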

How to customize parameters used on renderscript root function?

Background
I'm new to renderscript, and I would like to try some experiments with it (but small ones and not the complex ones we find in the SDK), so I thought of an exercise to try out, which is based on a previous question of mine (using NDK).
What I want to do
In short, I would like to pass bitmap data to renderscript, and then have it copy the data to another bitmap whose dimensions are swapped relative to the first, so that the second bitmap is a rotation of the first one.
For illustration:
From this bitmap (width:2 , height:4):
01
23
45
67
I would like it to rotate (counter clock-wise of 90 degrees) to:
1357
0246
The problem
I've noticed that when I try to change the signature of the root function, Eclipse gives me errors about it.
Even making new functions creates new errors. I've even tried the same code from Google's blog post (here), but I couldn't figure out how they created the functions they used, or why I can't change the filter function to take the input and output bitmap arrays.
What can I do in order to customize the parameters I send to renderscript, and use the data inside it?
Is it OK not to use the "filter" or "root" functions (API 11 and above)? What can I do in order to have more flexibility about what I can do there?
You are asking a bunch of separate questions here, so I will answer them in order.
1) You want to rotate a non-square bitmap. Unfortunately, the bitmap model for Renderscript won't allow you to do this easily. The reason is that the input and output allocations must have the same shape (i.e. the same number of dimensions and the same values for those dimensions, even if the Types are different). In order to get the effect you want, you should use a root function that only has an output allocation of the new shape (i.e. input columns x input rows). You can create an rs_allocation global variable for holding your input bitmap (which you can then create/bind on the Java side). The kernel then merely needs to set the output cell to the result of rsGetElementAt(globalInAlloc, y, x); a sketch follows after this answer.
2) If you are using API 11, you can't adjust the signature of the root() function (you can pass NULL allocations as input or output on the Java side if you are not using them). You also can't create more than one kernel per source file on these older API levels, so you are forced to have only a single "root()" function. If you want to use more kernels per source file, consider targeting a higher API level.
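A sketch of the approach from 1), written with the newer __attribute__((kernel)) style for brevity (the names are made up; rsGetElementAt_uchar4() needs API 18+, and on API 11 the same idea goes into root() with the legacy signature):

#pragma version(1)
#pragma rs java_package_name(com.example.rotate)

rs_allocation gInAlloc;   // the original width x height bitmap, bound from Java

// Output-only kernel, launched over the height x width output allocation.
uchar4 __attribute__((kernel)) rotate(uint32_t x, uint32_t y) {
    // Swapping the coordinates transposes the image; additionally flip one coordinate
    // (e.g. read at (rsAllocationGetDimX(gInAlloc) - 1 - y, x)) for a true 90 degree rotation.
    return rsGetElementAt_uchar4(gInAlloc, y, x);
}

And on the Java side:

ScriptC_rotate script = new ScriptC_rotate(rs);
script.set_gInAlloc(aInput);        // width x height, e.g. from Allocation.createFromBitmap()
script.forEach_rotate(aOutput);     // height x width, launched over the output only
aOutput.copyTo(rotatedBitmap);      // Bitmap created with the swapped dimensions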

OpenGL Memory allocation problems for drawing

I'm very new to Android and I'm writing my first app which consists of drawing STL data (a bunch of triangles).
For a small number of triangles (~ 4000), my app works great. But as soon as I try to load large data (~ 100000 triangles), I get memory allocation problems. Here is basically what I do:
I am using a GLSurfaceView for my rendering
a) I read in my data, creating a list of triangles (3 * X/Y/Z and a normal vector for each triangle)
b) I create ByteBuffers using allocateDirect() for the vertex data, the normals and the indices.
c) I add the data into the ByteBuffers
d) I call glVertexPointer, glNormalPointer and I assign the bytebuffers
e) I call glDrawElements() using the indexBuffer
As soon as I try to allocate around 10MB by calling allocateDirect(), my application crashes. I tried to call allocate() instead, which works, but then the glVertexPointer method crashes (even for a small number of triangles).
Am I doing something wrong?
Also, do I have to call glVertexPointer, glNormalPointer every time I redraw or is it enough to call it in the surfaceChanged method?
Thanks a lot,
Mark
You should only create the VBOs once. After that, if you need to load new data into them, build it in local arrays and then load it into the VBOs all at once with a "put" call.
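A rough sketch of that reuse pattern with the client-side buffers from the question (maxVertexCount, vertexArray and gl are placeholders):

// Created once, e.g. in onSurfaceCreated(), with capacity for the largest expected mesh.
FloatBuffer vertexBuffer = ByteBuffer
        .allocateDirect(maxVertexCount * 3 * 4)   // 3 floats per vertex, 4 bytes each
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer();

// Per update: refill the same buffer instead of allocating a new one.
vertexBuffer.clear();
vertexBuffer.put(vertexArray);                    // float[] built on the Java side
vertexBuffer.position(0);
gl.glVertexPointer(3, GL10.GL_FLOAT, 0, vertexBuffer);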
The largest type that you can use for the index VBO is "short", so you can only draw 64k vertexes at a time. If you are drawing 100k vertexes, you will have to do it in two passes.
Edit: regarding your memory limitations, find out what the heap limit is for your device. The heap limit can be 16, 24, 32, or 48 MB. You can use the android:largeHeap="true" manifest option to greatly increase the limit, but I'm not sure whether that option is available for pre-Honeycomb versions of Android.
I'm not expert with 3D rendering, but 10MB of vertex data strikes me as quite a lot, especially for a mobile application, so I'm not surprised you're encountering memory allocation issues! Even if they did successfully get allocated, you'd likely find it unusable due to poor performance.
Does the entire scene really need to be in video memory all the time? You may want to explore techniques to cull vertices that are far away from the camera and only upload/draw what is necessary.
