How to use half precision in renderscript?

I use the following code to transfer an array of float numbers to a renderscript kernel:
float[] bufName = new float[3];
bufName [0] = 255;
bufName [1] = 255;
bufName [2] = 0;
Allocation alloc1 = Allocation.createSized(mRs, Element.F32(mRs), 3);
alloc1.copy1DRangeFrom(0, 3, bufName);
ScriptC_foo foo = new ScriptC_foo(mRs);
foo.set_gIn(alloc1);
And I have defined gIn in the foo.rs file as follows:
rs_allocation gIn;
I would like to work with 16 bit floating point numbers. I know that I should change the allocation creation to this:
Allocation alloc1 = Allocation.createSized(mRs, Element.F16(mRs), 3);
However, I cannot find a solution for copying the bufName array to the allocation. Any help is appreciated.

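Java does not define half-precision floats, so you'll have to do your own manipulation to get this to work. Note that Float.valueOf(f).shortValue() (where f is the specific float you'd like to represent as half-precision) is not enough on its own: it narrows the value to a 16-bit integer rather than producing the half-precision bit pattern. Instead, convert each float to its 16-bit IEEE 754 half-precision bit pattern and store that in a short (on API level 26 and later, android.util.Half.toHalf(f) does this). Then create an Allocation with Element.F16. You should be able to copy the array of short values down to RenderScript to do the work.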
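As a rough illustration of the manual conversion, here is a minimal sketch in plain Java. The class name HalfFloat and the method toHalfBits are hypothetical names made up for this example, not part of any Android API. It rounds to nearest-even and handles zero, subnormals, infinity and NaN, but treat it as a starting point rather than a vetted implementation.

```java
// Hypothetical helper (not an Android API): packs a float into
// IEEE 754 binary16 bits so it can be stored in a short[].
final class HalfFloat {
    private HalfFloat() {}

    /** Returns the half-precision bit pattern for value, rounded to nearest-even. */
    static short toHalfBits(float value) {
        int bits = Float.floatToRawIntBits(value);
        int sign = (bits >>> 16) & 0x8000;   // sign moved to bit 15
        int exp = (bits >>> 23) & 0xFF;      // biased float exponent
        int mant = bits & 0x007FFFFF;        // 23-bit float mantissa

        if (exp == 0xFF) {                   // infinity or NaN
            return (short) (sign | 0x7C00 | (mant != 0 ? 0x0200 : 0));
        }
        int halfExp = exp - 127 + 15;        // re-bias for binary16
        if (halfExp >= 0x1F) {               // too large: becomes infinity
            return (short) (sign | 0x7C00);
        }
        if (halfExp <= 0) {                  // subnormal half, or zero
            if (halfExp < -10) {
                return (short) sign;         // too small: signed zero
            }
            mant |= 0x00800000;              // restore the implicit leading 1
            int shift = 14 - halfExp;        // between 14 and 24
            int halfMant = mant >>> shift;
            int rem = mant & ((1 << shift) - 1);
            int halfway = 1 << (shift - 1);
            if (rem > halfway || (rem == halfway && (halfMant & 1) != 0)) {
                halfMant++;
            }
            return (short) (sign | halfMant);
        }
        int halfMant = mant >>> 13;          // keep the top 10 mantissa bits
        int rem = mant & 0x1FFF;             // the 13 bits that were dropped
        int out = sign | (halfExp << 10) | halfMant;
        if (rem > 0x1000 || (rem == 0x1000 && (halfMant & 1) != 0)) {
            out++;                           // a carry into the exponent is intended
        }
        return (short) out;
    }

    public static void main(String[] args) {
        float[] bufName = {255f, 255f, 0f};
        short[] halves = new short[bufName.length];
        for (int i = 0; i < bufName.length; i++) {
            halves[i] = toHalfBits(bufName[i]);
            System.out.println(String.format("0x%04X", halves[i] & 0xFFFF));
        }
    }
}
```

The resulting short[] is what would be copied into the Element.F16 allocation. Depending on the API level, the checked copy methods may reject a short[] for an F16 element, in which case an unchecked copy such as copyFromUnchecked is the likely fallback.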

Related

Packing Java bitmap into ByteBuffer - byte order doesn't match pixel format and endianness (ARM)

I'm a bit puzzled with internal representation of Bitmap's pixels in ByteBuffer (testing on ARM/little endian):
1) In the Java layer I create an ARGB bitmap and fill it with 0xff112233 color:
Bitmap sampleBitmap = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
Canvas canvas = new Canvas(sampleBitmap);
Paint paint = new Paint();
paint.setStyle(Paint.Style.FILL);
paint.setColor(Color.rgb(0x11,0x22, 0x33));
canvas.drawRect(0,0, sampleBitmap.getWidth(), sampleBitmap.getHeight(), paint);
To test, sampleBitmap.getPixel(0,0) indeed returns 0xff112233 that matches ARGB pixel format.
2) The bitmap is packed into direct ByteBuffer before passing to the native layer:
final int byteSize = sampleBitmap.getAllocationByteCount();
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(byteSize);
//byteBuffer.order(ByteOrder.LITTLE_ENDIAN);// See below
sampleBitmap.copyPixelsToBuffer(byteBuffer);
To test, regardless of the buffer's order setting, in the debugger I see the byte layout which doesn't quite match ARGB but more like a big endian RGBA (or little endian ABGR!?)
byteBuffer.rewind();
final byte [] out = new byte[4];
byteBuffer.get(out, 0, out.length);
out = {byte[4]#12852}
0 = (0x11)
1 = (0x22)
2 = (0x33)
3 = (0xFF)
Now, I'm passing this bitmap to the native layer where I must extract pixels and I would expect Bitmap.Config.ARGB_8888 to be represented, depending on buffer's byte order as:
a) byteBuffer.order(ByteOrder.LITTLE_ENDIAN):
out = {byte[4]#12852}
0 = (0x33)
1 = (0x22)
2 = (0x11)
3 = (0xFF)
or
b) byteBuffer.order(ByteOrder.BIG_ENDIAN):
out = {byte[4]#12852}
0 = (0xFF)
1 = (0x11)
2 = (0x22)
3 = (0x33)
I can make the code which extracts the pixels work based on above output but I don't like it since I can't explain the behaviour which I hope someone will do :)
Thanks!

Let's take a look at the implementation. Both getPixel and copyPixelsToBuffer just call their native counterparts.
Bitmap_getPixel specifies an output format:
SkImageInfo dstInfo = SkImageInfo::Make(1, 1, kBGRA_8888_SkColorType, kUnpremul_SkAlphaType, sRGB);
bitmap.readPixels(dstInfo, &dst, dstInfo.minRowBytes(), x, y);
It effectively asks the bitmap to give the pixel value converted to BGRA_8888 (which becomes ARGB because the native layer and Java differ in endianness).
Bitmap_copyPixelsToBuffer in its turn just copies raw data:
memcpy(abp.pointer(), src, bitmap.computeByteSize());
There is no conversion here: it basically returns the data in the same format that is used to store it. Let's find out what this inner format is.
Bitmap_creator is used to create a new bitmap and it gets the format from the config passed by calling
SkColorType colorType = GraphicsJNI::legacyBitmapConfigToColorType(configHandle);
Looking at the legacyBitmapConfigToColorType implementation, ARGB_8888 (which has index 5) becomes kN32_SkColorType.
kN32_SkColorType comes from the Skia library, and its definition carries this comment:
kN32_SkColorType is an alias for whichever 32bit ARGB format is the
"native" form for skia's blitters. Use this if you don't have a swizzle
preference for 32bit pixels.
and below is the definition:
#if SK_PMCOLOR_BYTE_ORDER(B,G,R,A)
kN32_SkColorType = kBGRA_8888_SkColorType,
#elif SK_PMCOLOR_BYTE_ORDER(R,G,B,A)
kN32_SkColorType = kRGBA_8888_SkColorType,
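According to the definition of SK_PMCOLOR_BYTE_ORDER, SK_PMCOLOR_BYTE_ORDER(R,G,B,A) is true on a little-endian machine, which is our case. So the bitmap is stored internally in kRGBA_8888_SkColorType format, that is, with the bytes in R, G, B, A order in memory. This is also why the ByteBuffer's order setting makes no difference: the pixels are copied byte for byte, and the order only affects multi-byte accessors such as getInt().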
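To make the layout concrete, here is a small plain-Java sketch (no Android classes) that starts from the four bytes seen in the debugger and shows how they relate to the ARGB int returned by getPixel. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: interprets one pixel stored as R, G, B, A bytes in memory.
final class PixelBytes {
    private PixelBytes() {}

    /** Rebuilds an ARGB color int from bytes laid out as R, G, B, A. */
    static int argbFromRgbaBytes(byte[] px, int offset) {
        int r = px[offset] & 0xFF;
        int g = px[offset + 1] & 0xFF;
        int b = px[offset + 2] & 0xFF;
        int a = px[offset + 3] & 0xFF;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    /** Reads the same four bytes as a single 32-bit int in the given byte order. */
    static int readAsInt(byte[] px, ByteOrder order) {
        return ByteBuffer.wrap(px).order(order).getInt(0);
    }

    public static void main(String[] args) {
        // The byte layout observed after copyPixelsToBuffer in the question.
        byte[] px = {0x11, 0x22, 0x33, (byte) 0xFF};
        System.out.println(Integer.toHexString(argbFromRgbaBytes(px, 0)));
        System.out.println(Integer.toHexString(readAsInt(px, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(Integer.toHexString(readAsInt(px, ByteOrder.BIG_ENDIAN)));
    }
}
```

Read as a little-endian int the bytes look like ABGR, and read as a big-endian int they look like RGBA, which matches the two interpretations guessed at in the question.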

Copying to and from half-precision F16 allocation in android (renderscript)

Also asked here with no luck (https://groups.google.com/forum/#!topic/android-developers/Rh_L9Jv_S8Q)
I'm trying to figure out how to do half-precision using types like half and half4. The only problem seems to be getting the numbers from java to renderscript and back.
The Java Code:
private float[] input;
private float[] half_output;
private RenderScript mRS;
private ScriptC_mono mScript;
private final int dimen = 15;
...
//onCreate
input = new float[dimen * dimen * 3]; //later loaded from file 182.24 3.98 105.83 226.08 15.2 80.01...
half_output = new float[dimen * dimen * 3];
...
//function calling renderscript
mRS = RenderScript.create(this);
ScriptC_halfPrecision mScript = new ScriptC_halfPrecision(mRS);
Allocation input2 = Allocation.createSized(mRS, Element.F16(mRS), dimen * dimen * 3);
input2.copyFromUnchecked(input); //copy float values to F16 allocation
Allocation halfIndex = Allocation.createSized(mRS, Element.F16(mRS), dimen * dimen);
Type.Builder half_output_type = new Type.Builder(mRS, Element.F16(mRS)).setX(dimen * dimen * 3);
Allocation output3 = Allocation.createTyped(mRS, half_output_type.create());
mScript.set_half_in(input2);
mScript.set_half_out(output3);
mScript.forEach_half_operation(halfIndex);
output3.copy1DRangeToUnchecked(0, dimen * dimen * 3, half_output); //copy F16 allocation back to float array
The Renderscript:
#pragma version(1)
#pragma rs java_package_name(com.example.android.rs.hellocompute)
rs_allocation half_in;
rs_allocation half_out;
half __attribute__((kernel)) half_operation(uint32_t x) {
half4 out = rsGetElementAt_half4(half_in, x);
out.x /= 2.0;
out.y /= 2.0;
out.z /= 2.0;
out.w /= 2.0;
rsSetElementAt_half4(half_out, out, x);
}
I also tried this instead of the last line shown in the Java code:
float temp_half[] = new float[1];
for (int i = 0; i < dimen * dimen * 3; ++i) { //copy F16 allocation back to float array
output3.copy1DRangeToUnchecked(i, 1, temp_half);
half_output[i]=temp_half[0];
}
All the above code works perfectly for float4 variables in the renderscript and F32 allocations in the java.
This is obviously because there is no issue going from renderscript float to java float.
But trying to go from java float (since there is no java half) to renderscript half and back again is very difficult.
Can anyone tell me how to do it?
Both of the above versions of the java code result in seemingly random values in the half_output array.
They are obviously not random because they are the same values every time I run it, no matter what the operation in the half_operation(uint32_t x) function.
I've tried changing the out.x /= 2.0; (and corresponding y,z,w code) to out.x /= 2000000.0; or out.x *= 2000000.0;
and still the values that end up in the half_output array are the same every time I run it.
Using input of 182.24 3.98 105.83 226.08 15.2 80.01...
Using this java
output3.copy1DRangeToUnchecked(0, dimen * dimen * 3, half_output); //copy F16 allocation back to float array
The resulting half_output is 46657.44 27094.48 3891.45 965.1825 36223.44 14959.08...
Using this java
float temp_half[] = new float[1];
for (int i = 0; i < dimen * dimen * 3; ++i) { //copy F16 allocation back to float array
output3.copy1DRangeToUnchecked(i, 1, temp_half);
half_output[i]=temp_half[0];
}
The resulting half_output is 2.3476E-41 2.5546E-41 6.2047E-41 2.5407E-41 1.9802E-41 2.4914E-41...
Again these are the results no matter what I change the out.x /= 2.0; algorithm to.

The problem is that this copy does not do any conversion:
input2.copyFromUnchecked(input); //copy float values to F16 allocation
It just puts your source FP32 values into memory, and when you then try to interpret those bytes as FP16, the values are incorrect.
You might port something like the answer from this question to renderscript:
32-bit to 16-bit Floating Point Conversion
If your input doesn't have denorms/infinity/nan/overflow/underflow this seems like an ok solution:
uint32_t x = *((uint32_t*)&f);
uint16_t h = ((x>>16)&0x8000)|((((x&0x7f800000)-0x38000000)>>13)&0x7c00)|((x>>13)&0x03ff);
Really the solution is to have your source values in the file in fp16 binary format already. Read them into a java byte[] array and then do the copy into the fp16 input allocation. Then when the renderscript kernel interprets them as fp16 then you should have no problem.
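The same applies in the other direction: results copied out of an F16 allocation are raw 16-bit patterns and have to be decoded before they make sense as Java floats. Below is a minimal sketch of that decoding step in plain Java, assuming the output has first been copied into a short[] (for example with an unchecked copy like the one in the question). HalfDecoder, toFloat and toFloats are hypothetical names used only here.

```java
// Hypothetical helper (not an Android API): decodes IEEE 754 binary16
// bit patterns, stored in shorts, into regular Java floats.
final class HalfDecoder {
    private HalfDecoder() {}

    /** Decodes a single half-precision bit pattern. */
    static float toFloat(short half) {
        int h = half & 0xFFFF;
        int sign = (h & 0x8000) << 16;       // sign moved to bit 31
        int exp = (h >>> 10) & 0x1F;         // 5-bit biased exponent
        int mant = h & 0x03FF;               // 10-bit mantissa

        if (exp == 0x1F) {                   // infinity or NaN
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {                      // zero or subnormal: mant * 2^-24
            float magnitude = mant * (1.0f / (1 << 24));
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal number: re-bias the exponent (127 - 15 = 112) and widen the mantissa.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    /** Decodes a whole array, e.g. the contents copied out of an F16 allocation. */
    static float[] toFloats(short[] halves) {
        float[] out = new float[halves.length];
        for (int i = 0; i < halves.length; i++) {
            out[i] = toFloat(halves[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] raw = {(short) 0x3C00, (short) 0x5BF8, (short) 0xC000};
        for (float f : toFloats(raw)) {
            System.out.println(f);
        }
    }
}
```

Every half-precision value is exactly representable as a float, so this direction needs no rounding.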

Can I set input and output allocations on Renderscript to be of different sizes/dimensions?

Background
I'm trying to learn Renderscript, so I wish to try to do some simple operations that I think about.
The problem
I thought of rotating a bitmap, which is something that's simple enough to manage.
In C/C++, it's a simple thing to do (search for "jniRotateBitmapCw90"):
https://github.com/AndroidDeveloperLB/AndroidJniBitmapOperations/blob/master/JniBitmapOperationsLibrary/jni/JniBitmapOperationsLibrary.cpp
Thing is, when I try this on Renderscript, I get this error:
android.support.v8.renderscript.RSRuntimeException: Dimension mismatch
between parameters ain and aout!
Here's what I do:
RS:
void rotate90CW(const uchar4 *in, uchar4 *out, uint32_t x, uint32_t y) {
// XY. ..X ... ...
// ...>..Y>...>Y..
// ... ... .YX X..
out[...]=in[...] ...
}
Java:
mRenderScript = RenderScript.create(this);
mInBitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sample_photo);
mOutBitmap = Bitmap.createBitmap(mInBitmap.getHeight(), mInBitmap.getWidth(), mInBitmap.getConfig());
final Allocation input = Allocation.createFromBitmap(mRenderScript, mInBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
final Allocation output = Allocation.createFromBitmap(mRenderScript, mOutBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
ScriptC_test script = new ScriptC_test(mRenderScript, getResources(), R.raw.test);
...
script.forEach_rotate90CW(input, output);
output.copyTo(mOutBitmap);
Even when I do set both allocations to be of the same size (squared bitmap), and I just set the output to be the input:
out[width * y + x] = in[width * y+x];
then what I get is a bitmap with holes... How come?
The questions
Does this mean I can't do this kind of operation?
Does it mean that I can't use allocations of various sizes/dimensions?
Is it possible to overcome this issue (and still use Renderscript, of course) ? If so, how?
Maybe I could add an array variable inside the RS side, and set the allocation to it, instead?
Why do I get holes in the bitmap, for the case of a square input&output?
EDIT:This is my current code:
RS
rs_allocation *in;
uchar4 __attribute__((kernel)) rotate90CW(uint32_t x, uint32_t y){
// XY. ..X ... ...
// ...>..Y>...>Y..
// ... ... .YX X..
uchar4 curIn =rsGetElementAt_uchar4(in, 0, 0);
return curIn; //just for testing...
}
Java:
mRenderScript = RenderScript.create(this);
mInBitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sample_photo);
mOutBitmap = Bitmap.createBitmap(mInBitmap.getHeight(), mInBitmap.getWidth(), mInBitmap.getConfig());
final Allocation input = Allocation.createFromBitmap(mRenderScript, mInBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
final Allocation output = Allocation.createFromBitmap(mRenderScript, mOutBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
ScriptC_test script = new ScriptC_test(mRenderScript, getResources(), R.raw.test);
script.bind_in(input);
script.forEach_rotate90CW(output);
output.copyTo(mOutBitmap);
mImageView.setImageBitmap(mOutBitmap);

Here goes:
Does this mean I can't do this kind of operation?
No, not really. You just have to craft things correctly.
Does it mean that I can't use allocations of various sizes/dimensions?
No, but it does mean you can't use different size allocations in the way you currently are doing things. The default kernel in/out mechanism expects the input and output sizes to match so it can iterate over all of the elements correctly. If you need something different, it's up to you to manage it. More on that below.
Is it possible to overcome this issue...how?
The easiest solution would be to create an Allocation for the input and bind it to the RenderScript instance rather than pass it as a parameter. Then your RS would only need an output allocation (and your kernel would only take the output, x and y). From there you can determine which coordinate within the input allocation you want and place it directly into the output location:
int inX = ...;
int inY = ...;
uchar4 curIn = rsGetElementAt_uchar4(inAlloc, inX, inY);
*out = curIn;
Why do I get holes in the bitmap, for the case of a square input&output?
It's because you cannot use the x and y parameters to offset into the input and output allocation. Those in/out parameters are already pointing to the correct (same) location in both the input and output. The indexing you're doing is unnecessary and not really supported. Each time your kernel is called, it is being called for 1 element location within the allocation. This is why the input and output sizes must be the same when provided as parameters.
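For the rotation itself, the inX/inY computation left open above can be checked on the CPU first. The following plain-Java sketch iterates over the output the way the kernel would and reads from the input at the mapped coordinate. It assumes row-major storage and a 90 degree clockwise rotation; Rotate90 and rotateCw are hypothetical names for this example only.

```java
// Hypothetical CPU reference for the coordinate mapping a rotate-90-CW
// kernel would use: iterate over the output, read from the input.
final class Rotate90 {
    private Rotate90() {}

    /**
     * Rotates a row-major image 90 degrees clockwise.
     * The output is inHeight pixels wide and inWidth pixels high.
     */
    static int[] rotateCw(int[] in, int inWidth, int inHeight) {
        int outWidth = inHeight;
        int outHeight = inWidth;
        int[] out = new int[in.length];
        for (int y = 0; y < outHeight; y++) {
            for (int x = 0; x < outWidth; x++) {
                // Output pixel (x, y) comes from input pixel (inX, inY).
                int inX = y;
                int inY = inHeight - 1 - x;
                out[y * outWidth + x] = in[inY * inWidth + inX];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 wide, 2 high:  1 2 3
        //                  4 5 6
        int[] rotated = rotateCw(new int[] {1, 2, 3, 4, 5, 6}, 3, 2);
        System.out.println(java.util.Arrays.toString(rotated));
    }
}
```

In the kernel, the same two assignments would feed rsGetElementAt_uchar4 on the input allocation, with the input height passed in as a script global.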

This should solve your problem
RS
rs_allocation *in;
uchar4 __attribute__((kernel)) rotate90CW(uint32_t x, uint32_t y){
...
uchar4 curIn =rsGetElementAt_uchar4(in, x, y);
return curIn;
}

Renderscript: 3D lookup table to convert an RGB triplet to a byte value

I load a 3D lookup map which associates triplets of RGB byte values with a single byte value. I define my allocation like this:
Type.Builder tbLookup = new Type.Builder(rs, Element.U8(rs));
tbLookup.setX(256);
tbLookup.setY(256);
tbLookup.setZ(256);
tbLookup.setMipmaps(false);
tbLookup.setFaces(false);
lookup = Allocation.createTyped(rs, tbLookup.create(), Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_GRAPHICS_CONSTANTS);
int ncolors = 256*256*256;
byte[] sampledata = new byte[ncolors];
lookup.copyFrom(sampledata);
script.set_gLookup(lookup); //global variable gLookup in renderscript
Then I define my kernel in renderscript:
rs_allocation gLookup;
uchar4 __attribute__((kernel)) lookItUp(const uchar4 in, uint32_t x, uint32_t y)
{
uchar4 out = in;
uchar p = rsGetElementAt_uchar(gLookup, in.r,in.g,in.b);
out.r = p;
out.g = p;
out.b = p;
return out;
}
This doesn't work: it outputs zero values (a black image) and runs extremely slowly. If I don't do the rsGetElementAt_uchar, then it runs fast (I can assign a fixed value and it's okay). So I must be doing something wrong with the lookup table type. Any clue?
Thank you!
P.S: A 3d lookup table is not a crazy idea, there already is an Intrinsic function for converting RGB to RGBA by using a 3D lookup table. But I need my own lookup table.

That should be USAGE_SCRIPT. I don't know what USAGE_GRAPHICS_CONSTANTS will do there, but it's definitely not what you want.

How to work properly with array allocations in RenderScript

I've been working with RenderScript for a few days, but I can't figure out how to properly pass an array from Java to RenderScript. I saw some examples, but none of them worked for me, and I'm getting stuck because of the lack of documentation.
In this code I'm trying to do some checks between bb and the coords array for each index that root() receives:
RenderScript code:
#pragma version(1)
#pragma rs java_package_name(com.me.example)
int4 bb;
rs_allocation coords;
void __attribute__((kernel)) root(int32_t in)
{
int index = in;
if(bb[index] > rsGetElementAt_int(coords, index))
{
if(bb[index + 1] > rsGetElementAt_int(coords, index + 1))
{
//do something
}
}
}
Java code:
RenderScript mRS = RenderScript.create(this);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.match);
// These arrays come with data from another place
int[] indices;
int[] coords;
// Create allocations
Allocation mIN = Allocation.createSized(mRS, Element.I32(mRS), indices.length);
Allocation mOUT = Allocation.createSized(mRS, Element.I32(mRS), indices.length);
Allocation coordsAlloc = Allocation.createSized(mRS, Element.I32(mRS), coords.length);
// Fill it with data
mIN.copyFrom(indices);
coordsAlloc.copyFrom(coords);
// Set the data array
script.set_coords(coordsAlloc);
// Create bb and run
script.set_bb(new Int4 (x, y, width, height));
script.forEach_root(mIN);
When I execute it I get this error on set_coords() statement:
Script::setVar unable to set allocation, invalid slot index
And program exits:
Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 12274 ...

The issue is that rs_allocation is an opaque handle and not a primitive or custom structure the reflected set_*() methods understand. Change your RS code to make coords an int32_t * and add a second variable for the length:
int32_t *coords;
int32_t coordsLen;
...
void __attribute__((kernel)) root(int32_t in)
{
int index = in;
if(bb[index] > coords[index])
{
if(bb[index + 1] > coords[index + 1])
{
//do something
}
}
}
Then in your Java code, you create the allocation as before, but now you have to set coordsLen using the reflected method set_coordsLen() (so you can properly bounds check in your RS code, not shown here), and then bind the allocation to the coords pointer with bind_coords():
...
Allocation coordsAlloc = Allocation.createSized(mRS, Element.I32(mRS), coords.length);
// Fill it with data
mIN.copyFrom(indices);
coordsAlloc.copyFrom(coords);
// Set the data array
script.set_coordsLen(coords.length);
script.bind_coords(coordsAlloc);
// Create bb and run
script.set_bb(new Int4 (x, y, width, height));
script.forEach_root(mIN);

Thanks for your reply, Larry. I tried your approach and I got a new error at the set_coordsLen() statement:
Script::setSlot unable to set allocation, invalid slot index
So I started to think that I must have another problem in my script. I checked everything again from the beginning and I found the problem in another class. I was creating my script wrong, from another .rs file (copy-paste fail):
ScriptC_script exmple = new ScriptC_script(mRS, getResources(), R.raw.exmple);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.exmple);
Instead of:
ScriptC_script exmple = new ScriptC_script(mRS, getResources(), R.raw.exmple);
ScriptC_script script = new ScriptC_script(mRS, getResources(), R.raw.script);
I was getting those errors because I was setting allocations on nonexistent variables. This little (and shameful) mistake cost me too many hours of frustration. It's still weird that Eclipse let me invoke those set methods.
I tried both ways of passing the array and both worked this time. I prefer yours though.
P.S.: I watched your AnDevCon presentation video. Double thanks for sharing your knowledge about RS.
