I've been working with RenderScript for a few days, but I can't figure out how to properly pass an array from Java to RenderScript. I've seen some examples, but none of them worked for me, and I'm getting stuck because of the lack of documentation.
In this code I'm trying to do some checks between the bb and coords arrays for each index that root() receives:
RenderScript code:
#pragma version(1)
#pragma rs java_package_name(com.me.example)
int4 bb;
rs_allocation coords;
void __attribute__((kernel)) root(int32_t in)
{
int index = in;
if(bb[index] > rsGetElementAt_int(coords, index))
{
if(bb[index + 1] > rsGetElementAt_int(coords, index + 1))
{
//do something
}
}
}
Java code:
RenderScript mRS = RenderScript.create(this);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.match);
// These arrays come with data from somewhere else
int[] indices;
int[] coords;
// Create allocations
Allocation mIN = Allocation.createSized(mRS, Element.I32(mRS), indices.length);
Allocation mOUT = Allocation.createSized(mRS, Element.I32(mRS), indices.length);
Allocation coordsAlloc = Allocation.createSized(mRS, Element.I32(mRS), coords.length);
// Fill it with data
mIN.copyFrom(indices);
coordsAlloc.copyFrom(coords);
// Set the data array
script.set_coords(coordsAlloc);
// Create bb and run
script.set_bb(new Int4 (x, y, width, height));
script.forEach_root(mIN);
When I execute it I get this error on the set_coords() statement:
Script::setVar unable to set allocation, invalid slot index
And the program exits:
Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 12274 ...
The issue is that rs_allocation is an opaque handle, not a primitive or custom structure that the reflected set_*() methods understand. Change your RS code to make coords an int32_t * and add a second global for the length:
int32_t *coords;
int32_t coordsLen;
...
void __attribute__((kernel)) root(int32_t in)
{
int index = in;
if(bb[index] > coords[index])
{
if(bb[index + 1] > coords[index + 1])
{
//do something
}
}
}
Then, in your Java code, you create the allocation as before, but you now also set coordsLen via the reflected set_coordsLen() method (so you can properly bounds check in your RS code; a sketch of that check follows the Java snippet below), and you bind the Java-side array's allocation to the RS pointer:
...
Allocation coordsAlloc = Allocation.createSized(mRS, Element.I32(mRS), coords.length);
// Fill it with data
mIN.copyFrom(indices);
coordsAlloc.copyFrom(coords);
// Set the data array
script.set_coordsLen(coords.length);
script.bind_coords(coordsAlloc);
// Create bb and run
script.set_bb(new Int4 (x, y, width, height));
script.forEach_root(mIN);
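For completeness, the bounds check mentioned above could look roughly like this on the RS side (a sketch only; it keeps the bb logic from your original kernel and assumes index + 1 is the largest offset you read):
void __attribute__((kernel)) root(int32_t in)
{
    int index = in;
    // Guard against reading past the end of the bound coords array.
    if (index + 1 >= coordsLen) {
        return;
    }
    if (bb[index] > coords[index]) {
        if (bb[index + 1] > coords[index + 1]) {
            //do something
        }
    }
}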
Thanks for your reply, Larry. I tried your approach and got a new error on the set_coordsLen() statement:
Script::setSlot unable to set allocation, invalid slot index
So I started to think that I must have another problem in my script. I checked everything again from the beginning and found the problem in another class: I was creating my script from the wrong .rs file (copy-paste fail):
ScriptC_script exmple = new ScriptC_script(mRS, getResources(), R.raw.exmple);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.exmple);
Instead of:
ScriptC_script exmple = new ScriptC_script(mRS, getResources(), R.raw.exmple);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.script);
I was getting those errors because I was setting allocations on nonexistent variables. This little (and shameful) mistake cost me too many hours of frustration. It's still weird that Eclipse let me invoke those set methods.
I tried both ways of passing the array and both worked this time. I prefer yours, though.
P.S.: I watched your AnDevCon presentation video. Double thanks for sharing your knowledge about RS.
Any advice on optimizing the following code? The code first grayscales, inverts and then thresholds the image (that code is not included because it is trivial). It then sums the elements of each row and each column (all elements are either 1 or 0), and finally finds the indices of the row and the column with the highest sum.
The code is supposed to find the centroid of the image, and it works, but I want to make it faster.
I'm developing for API 23, so a reduction kernel cannot be used.
Java snippet:
private int[] sumValueY = new int[640];
private int[] sumValueX = new int[480];
rows_indices_alloc = Allocation.createSized( rs, Element.I32(rs), height, Allocation.USAGE_SCRIPT);
col_indices_alloc = Allocation.createSized( rs, Element.I32(rs), width, Allocation.USAGE_SCRIPT);
public RenderscriptProcessor(RenderScript rs, int width, int height)
{
mScript.set_gIn(mIntermAllocation);
mScript.forEach_detectX(rows_indices_alloc);
mScript.forEach_detectY(col_indices_alloc);
rows_indices_alloc.copyTo(sumValueX);
col_indices_alloc.copyTo(sumValueY);
}
Renderscript.rs snippet:
#pragma version(1)
#pragma rs java_package_name(org.gearvrf.renderscript)
#include "rs_debug.rsh"
#pragma rs_fp_relaxed
const int mImageWidth=640;
const int mImageHeight=480;
int32_t maxsX=-1;
int32_t maxIndexX;
int32_t maxsY=-1;
int32_t maxIndexY;
rs_allocation gIn;
void detectX(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageWidth); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, i, x));
sum+=(int)f4.r;
}
if((sum>maxsX)){
maxsX=sum;
maxIndexX = x;
}
}
void detectY(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageHeight); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
}
if((sum>maxsY)){
maxsY=sum;
maxIndexY = x;
}
}
Any help would be appreciated
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
This converts from int to float and then back to int again. I think you can simplify by just doing this:
sum += rsGetElementAt_uchar4(gIn, x, i).r;
I don't know exactly how your previous stages work because you haven't posted them, but you should try generating packed values to read here. So either put your grayscale channels in .rgba or use a single channel format and then use rsAllocationVLoad_uchar4 to fetch 4 values at once.
Also, try combining the previous stages with this one; if you don't need the intermediate results of those calculations, it may be cheaper to do the memory load once and then do those transformations in registers.
You might also play with how many values each thread operates on. You could try having each kernel invocation process width/2, width/4 or width/8 elements and see how they perform. This will give GPUs more threads to work with, especially on lower-resolution images, but with the trade-off of more reduction steps.
You also have a multiple-writers race condition on the maxsX/maxsY and maxIndexX/maxIndexY variables. All of those writes need to use atomics if you care about getting exactly the right answer. I think you may have posted the wrong code, because you never store into the *_indices_alloc allocations, yet you copy from them at the end. So you should actually store all the sums into those allocations and then use either a single-threaded function or a kernel with atomics to get the absolute max and the max index.
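To make that last suggestion concrete, here is a rough, untested sketch of what it could look like on the RS side; it keeps your gIn and mImageWidth globals and assumes you add an int32_t allocation for the per-row sums (a matching column version would follow the same pattern):
rs_allocation gRowSums;   // int32_t per row; set from Java with set_gRowSums()
int32_t maxIndexX = -1;

// Return the sum for row x; on the Java side this would be invoked as
// forEach_detectX(rows_indices_alloc, rowSumsAlloc) with an output allocation.
int32_t __attribute__((kernel)) detectX(int32_t v_in, uint32_t x) {
    int32_t sum = 0;
    for (int i = 0; i < mImageWidth; i++) {
        sum += rsGetElementAt_uchar4(gIn, i, x).r;
    }
    return sum;
}

// Single-threaded scan over the per-row sums, so there is no write race.
// Invoke from Java with invoke_findMaxX(height) and read back get_maxIndexX().
void findMaxX(int32_t height) {
    int32_t best = -1;
    for (int r = 0; r < height; r++) {
        int32_t s = rsGetElementAt_int(gRowSums, r);
        if (s > best) {
            best = s;
            maxIndexX = r;
        }
    }
}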
I use the following code to transfer an array of float numbers to a renderscript kernel:
float[] bufName = new float[3];
bufName [0] = 255;
bufName [1] = 255;
bufName [2] = 0;
Allocation alloc1 = Allocation.createSized(mRs, Element.F32(mRs), 3);
alloc1.copy1DRangeFrom(0, 3, bufName);
ScriptC_foo foo = new ScriptC_foo(mRs);
foo.set_gIn(alloc1);
And I have defined gIn in the foo.rs file as follows:
rs_allocation gIn;
I would like to work with 16 bit floating point numbers. I know that I should change the allocation creation to this:
Allocation alloc1 = Allocation.createSized(mRs, Element.F16(mRs), 3);
However, I cannot find a solution for copying the bufName array to the allocation. Any help is appreciated.
Java does not define half-precision floats, so you'll have to do some manipulation of your own to get this to work. If you use Float.valueOf(f).shortValue() (where f is the specific float you'd like to represent as half-precision), this should cast the float down to the new bit size. Then create an Allocation with Element.F16. You should be able to copy an array of short values down to RenderScript to do the work.
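Building on that, here is a minimal, untested sketch of one way to produce the 16-bit values by hand and copy them down. It builds the IEEE 754 half-precision bit pattern (which is what an F16 element stores) rather than doing a plain numeric cast, it ignores denormals and rounding, and it assumes copyFromUnchecked() for the raw 16-bit copy:
// Convert a float to its IEEE 754 half-precision bit pattern (rough version:
// rounds toward zero, flushes underflow to zero, maps overflow/Inf/NaN to Inf).
static short floatToHalfBits(float f) {
    int bits = Float.floatToIntBits(f);
    int sign = (bits >>> 16) & 0x8000;            // sign bit in half position
    int exp  = ((bits >>> 23) & 0xff) - 127 + 15; // re-bias exponent (127 -> 15)
    int mant = (bits >>> 13) & 0x3ff;             // top 10 mantissa bits
    if (exp <= 0)  return (short) sign;            // too small -> signed zero
    if (exp >= 31) return (short) (sign | 0x7c00); // too large -> infinity
    return (short) (sign | (exp << 10) | mant);
}

float[] bufName = {255f, 255f, 0f};
short[] halfBits = new short[bufName.length];
for (int i = 0; i < bufName.length; i++) {
    halfBits[i] = floatToHalfBits(bufName[i]);
}

Allocation alloc1 = Allocation.createSized(mRs, Element.F16(mRs), halfBits.length);
alloc1.copyFromUnchecked(halfBits); // raw 16-bit copy into the F16 allocation
foo.set_gIn(alloc1);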
Background
I'm trying to learn RenderScript, so I want to try out some simple operations I've thought of.
The problem
I thought of rotating a bitmap, which is something that's simple enough to manage.
In C/C++ it's a simple thing to do (search for "jniRotateBitmapCw90"):
https://github.com/AndroidDeveloperLB/AndroidJniBitmapOperations/blob/master/JniBitmapOperationsLibrary/jni/JniBitmapOperationsLibrary.cpp
The thing is, when I try this in RenderScript, I get this error:
android.support.v8.renderscript.RSRuntimeException: Dimension mismatch
between parameters ain and aout!
Here's what I do:
RS:
void rotate90CW(const uchar4 *in, uchar4 *out, uint32_t x, uint32_t y) {
// XY. ..X ... ...
// ...>..Y>...>Y..
// ... ... .YX X..
out[...]=in[...] ...
}
Java:
mRenderScript = RenderScript.create(this);
mInBitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sample_photo);
mOutBitmap = Bitmap.createBitmap(mInBitmap.getHeight(), mInBitmap.getWidth(), mInBitmap.getConfig());
final Allocation input = Allocation.createFromBitmap(mRenderScript, mInBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
final Allocation output = Allocation.createFromBitmap(mRenderScript, mOutBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
ScriptC_test script = new ScriptC_test(mRenderScript, getResources(), R.raw.test);
...
script.forEach_rotate90CW(input, output);
output.copyTo(mOutBitmap);
Even when I set both allocations to the same size (a square bitmap) and just set the output to the input:
out[width * y + x] = in[width * y+x];
then what I get is a bitmap with holes... How come?
The questions
Does this mean I can't do this kind of operation?
Does it mean that I can't use allocations of various sizes/dimensions?
Is it possible to overcome this issue (and still use Renderscript, of course) ? If so, how?
Maybe I could add an array variable inside the RS side, and set the allocation to it, instead?
Why do I get holes in the bitmap, for the case of a square input&output?
EDIT: This is my current code:
RS
rs_allocation *in;
uchar4 __attribute__((kernel)) rotate90CW(uint32_t x, uint32_t y){
// XY. ..X ... ...
// ...>..Y>...>Y..
// ... ... .YX X..
uchar4 curIn = rsGetElementAt_uchar4(in, 0, 0);
return curIn; //just for testing...
}
Java:
mRenderScript = RenderScript.create(this);
mInBitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sample_photo);
mOutBitmap = Bitmap.createBitmap(mInBitmap.getHeight(), mInBitmap.getWidth(), mInBitmap.getConfig());
final Allocation input = Allocation.createFromBitmap(mRenderScript, mInBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
final Allocation output = Allocation.createFromBitmap(mRenderScript, mOutBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
ScriptC_test script = new ScriptC_test(mRenderScript, getResources(), R.raw.test);
script.bind_in(input);
script.forEach_rotate90CW(output);
output.copyTo(mOutBitmap);
mImageView.setImageBitmap(mOutBitmap);
Here goes:
Does this mean I can't do this kind of operation?
No, not really. You just have to craft things correctly.
Does it mean that I can't use allocations of various sizes/dimensions?
No, but it does mean you can't use different size allocations in the way you currently are doing things. The default kernel in/out mechanism expects the input and output sizes to match so it can iterate over all of the elements correctly. If you need something different, it's up to you to manage it. More on that below.
Is it possible to overcome this issue... how?
The easiest solution would be to create an Allocation for input and bind it to the renderscript instance rather than pass it as a parameter. Then your RS would only need an output allocation (and your kernel only take output, x and y). From there you can determine which coordinate within the input allocation you want and place it directly into the output location:
int inX = ...;
int inY = ...;
uchar4 curIn = rsGetElementAt_uchar4(inAlloc, inX, inY);
*out = curIn;
Why do I get holes in the bitmap, for the case of a square input&output?
It's because you cannot use the x and y parameters to offset into the input and output allocations. Those in/out parameters already point to the correct (same) location in both the input and the output. The indexing you're doing is unnecessary and not really supported. Each time your kernel is called, it is called for one element location within the allocation. This is why the input and output sizes must match when both are provided as parameters.
This should solve your problem
RS
rs_allocation *in;
uchar4 __attribute__((kernel)) rotate90CW(uint32_t x, uint32_t y){
...
uchar4 curIn = rsGetElementAt_uchar4(in, x, y);
return curIn;
}
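For reference, here is a rough, untested sketch of how the actual rotation could be wired up on top of that. Note it declares the input as a plain rs_allocation (which is what rsGetElementAt_uchar4 takes) and sets it with set_in() rather than bind_in(), and it passes the input height in as a script global:
RS:
rs_allocation in;   // the W x H input, set from Java with set_in()
int32_t inHeight;   // H, needed to compute the source coordinate

// x and y are coordinates in the H x W *output* allocation.
uchar4 __attribute__((kernel)) rotate90CW(uint32_t x, uint32_t y) {
    // 90 degrees clockwise: out(x, y) = in(y, H - 1 - x)
    return rsGetElementAt_uchar4(in, y, inHeight - 1 - x);
}
Java:
script.set_in(input);
script.set_inHeight(mInBitmap.getHeight());
script.forEach_rotate90CW(output);   // iterate over the output only
output.copyTo(mOutBitmap);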
Recently I found that RenderScript is a good choice for image processing on Android. The performance is wonderful, but there is not much documentation on it. I am wondering whether I can merge multiple photos into one result photo with RenderScript.
http://developer.android.com/guide/topics/renderscript/compute.html says:
A kernel may have an input Allocation, an output Allocation, or both. A kernel may not have more than one input or one output Allocation. If more than one input or output is required, those objects should be bound to rs_allocation script globals and accessed from a kernel or invokable function via rsGetElementAt_type() or rsSetElementAt_type().
Is there any code example for this issue?
For a kernel with multiple inputs, you have to handle the additional inputs manually.
Let's say you want 2 inputs.
example.rs:
rs_allocation extra_alloc;
uchar4 __attribute__((kernel)) kernel(uchar4 i1, uint32_t x, uint32_t y)
{
// Manually getting current element from the extra input
uchar4 i2 = rsGetElementAt_uchar4(extra_alloc, x, y);
// Now process i1 and i2 and generate out
uchar4 out = ...;
return out;
}
Java:
Bitmap bitmapIn = ...;
Bitmap bitmapInExtra = ...;
Bitmap bitmapOut = Bitmap.createBitmap(bitmapIn.getWidth(),
bitmapIn.getHeight(), bitmapIn.getConfig());
RenderScript rs = RenderScript.create(this);
ScriptC_example script = new ScriptC_example(rs);
Allocation inAllocation = Allocation.createFromBitmap(rs, bitmapIn);
Allocation inAllocationExtra = Allocation.createFromBitmap(rs, bitmapInExtra);
Allocation outAllocation = Allocation.createFromBitmap(rs, bitmapOut);
// Execute this kernel on two inputs
script.set_extra_alloc(inAllocationExtra);
script.forEach_kernel(inAllocation, outAllocation);
// Get the data back into bitmap
outAllocation.copyTo(bitmapOut);
You want to do something like this:
rs_allocation input1;
rs_allocation input2;
uchar4 __attribute__((kernel)) kernel(uint32_t x, uint32_t y) {
... // body of kernel goes here
uchar4 out = ...;
return out;
}
Call set_input1 and set_input2 from your Java code to set those to the appropriate Allocations, then call forEach_kernel with your output Allocation.
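A brief, hypothetical sketch of that Java side (the class name ScriptC_example and the allocation variables are placeholders for whatever your project generates):
ScriptC_example script = new ScriptC_example(rs);
script.set_input1(allocationOne);     // the two rs_allocation globals above
script.set_input2(allocationTwo);
script.forEach_kernel(outAllocation); // the kernel iterates over the output only
outAllocation.copyTo(bitmapOut);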
This is how you do it:
In the .rs file:
uchar4 RS_KERNEL myKernel(float4 in1, int in2, uint32_t x, uint32_t y)
{
//My code
}
In Java:
myScript.forEach_myKernel(allocationInput1, allocationInput2, allocationOutput);
uchar4, float4 and int are just used as examples. This works for me, and you can add more than two inputs.
I am trying to access the raw data of a Bitmap in ARGB_8888 format on Android, using the copyPixelsToBuffer and copyPixelsFromBuffer methods. However, those calls always seem to apply the alpha channel to the RGB channels. I need the raw data in a byte[] or similar (to pass through JNI; yes, I know about bitmap.h in Android 2.2, but I cannot use that).
Here is a sample:
// Create 1x1 Bitmap with alpha channel, 8 bits per channel
Bitmap one = Bitmap.createBitmap(1,1,Bitmap.Config.ARGB_8888);
one.setPixel(0,0,0xef234567);
Log.v("?","hasAlpha() = "+Boolean.toString(one.hasAlpha()));
Log.v("?","pixel before = "+Integer.toHexString(one.getPixel(0,0)));
// Copy Bitmap to buffer
byte[] store = new byte[4];
ByteBuffer buffer = ByteBuffer.wrap(store);
one.copyPixelsToBuffer(buffer);
// Change value of the pixel
int value=buffer.getInt(0);
Log.v("?", "value before = "+Integer.toHexString(value));
value = (value >> 8) | 0xffffff00;
buffer.putInt(0, value);
value=buffer.getInt(0);
Log.v("?", "value after = "+Integer.toHexString(value));
// Copy buffer back to Bitmap
buffer.position(0);
one.copyPixelsFromBuffer(buffer);
Log.v("?","pixel after = "+Integer.toHexString(one.getPixel(0,0)));
The log then shows
hasAlpha() = true
pixel before = ef234567
value before = 214161ef
value after = ffffff61
pixel after = 619e9e9e
I understand that the order of the ARGB channels is different; that's fine. But I don't want the alpha channel to be applied on every copy (which is what it seems to be doing).
Is this how copyPixelsToBuffer and copyPixelsFromBuffer are supposed to work? Is there any way to get the raw data in a byte[]?
Added in response to answer below:
Putting buffer.order(ByteOrder.nativeOrder()); before the copyPixelsToBuffer call does change the result, but still not in the way I want:
pixel before = ef234567
value before = ef614121
value after = ffffff41
pixel after = ff41ffff
It seems to suffer from essentially the same problem (alpha being applied on each copyPixelsFrom/ToBuffer).
One way to access the data in a Bitmap is to use the getPixels() method. Below is an example I used to get a grayscale image from ARGB data and then back from a byte array to a Bitmap (of course, if you need RGB you reserve 3x the bytes and save them all...):
/*Free to use licence by Sami Varjo (but nice if you retain this line)*/
public final class BitmapConverter {
private BitmapConverter(){};
/**
* Get grayscale data from argb image to byte array
*/
public static byte[] ARGB2Gray(Bitmap img)
{
int width = img.getWidth();
int height = img.getHeight();
int[] pixels = new int[height*width];
byte grayIm[] = new byte[height*width];
img.getPixels(pixels,0,width,0,0,width,height);
int pixel=0;
int count=width*height;
while(count-->0){
int inVal = pixels[pixel];
//Get the pixel channel values from int
double r = (double)( (inVal & 0x00ff0000)>>16 );
double g = (double)( (inVal & 0x0000ff00)>>8 );
double b = (double)( inVal & 0x000000ff) ;
grayIm[pixel++] = (byte)( 0.2989*r + 0.5870*g + 0.1140*b );
}
return grayIm;
}
/**
* Create a gray scale bitmap from byte array
*/
public static Bitmap gray2ARGB(byte[] data, int width, int height)
{
int count = height*width;
int[] outPix = new int[count];
int pixel=0;
while(count-->0){
int val = data[pixel] & 0xff; //convert byte to unsigned
outPix[pixel++] = 0xff000000 | val << 16 | val << 8 | val ;
}
Bitmap out = Bitmap.createBitmap(outPix,0,width,width, height, Bitmap.Config.ARGB_8888);
return out;
}
}
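Usage would then be along these lines (srcBitmap here stands for whatever ARGB_8888 bitmap you start from):
byte[] gray = BitmapConverter.ARGB2Gray(srcBitmap);
// ... pass gray through JNI or process it as needed ...
Bitmap grayBitmap = BitmapConverter.gray2ARGB(gray, srcBitmap.getWidth(), srcBitmap.getHeight());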
My guess is that this might have to do with the byte order of the ByteBuffer you are using. ByteBuffer uses big endian by default.
Set the endianness on the buffer with
buffer.order(ByteOrder.nativeOrder());
See if it helps.
Moreover, copyPixelsFromBuffer/copyPixelsToBuffer do not change the pixel data in any way; the pixels are copied raw.
I realize this is very stale and probably won't help you now, but I came across this recently in trying to get copyPixelsFromBuffer to work in my app. (Thank you for asking this question, btw! You saved me tons of time in debugging.) I'm adding this answer in the hopes it helps others like me going forward...
Although I haven't used it yet to make sure it works, it looks like, as of API Level 19, we'll finally have a way to specify that the alpha should not be "applied" (a.k.a. premultiplied) within a Bitmap. They're adding a setPremultiplied(boolean) method that should help in situations like this by allowing us to pass false.
I hope this helps!
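For what it's worth, a minimal sketch of how that might be used against the example from the question (untested, API 19+; note the framework may refuse to draw a non-premultiplied bitmap, so this is only for getting at the raw bytes):
Bitmap one = Bitmap.createBitmap(1, 1, Bitmap.Config.ARGB_8888);
one.setPremultiplied(false);        // ask the bitmap to store channel bytes as-is
one.setPixel(0, 0, 0xef234567);

ByteBuffer buffer = ByteBuffer.allocate(one.getByteCount());
buffer.order(ByteOrder.nativeOrder());
one.copyPixelsToBuffer(buffer);     // should now yield the raw RGBA bytes, alpha not applied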
This is an old question, but I ran into the same issue and just figured out that the bitmap bytes are pre-multiplied. As of API 19 you can set the bitmap not to pre-multiply the buffer, but the API makes no guarantees.
From the docs:
public final void setPremultiplied(boolean premultiplied)
Sets whether the bitmap should treat its data as pre-multiplied.
Bitmaps are always treated as pre-multiplied by the view system and Canvas for performance reasons. Storing un-pre-multiplied data in a Bitmap (through setPixel, setPixels, or BitmapFactory.Options.inPremultiplied) can lead to incorrect blending if drawn by the framework.
This method will not affect the behaviour of a bitmap without an alpha channel, or if hasAlpha() returns false.
Calling createBitmap or createScaledBitmap with a source Bitmap whose colors are not pre-multiplied may result in a RuntimeException, since those functions require drawing the source, which is not supported for un-pre-multiplied Bitmaps.