Optimizing renderscript summation of row and column cells - android

Any advise in optimizing the following code? The code first grayscales, inverts and then thresholds the image (code not included, because it is trivial). It then sums the elements of each row and column (all elements are either 1 or 0). It then finds the row and column index of the row and column with the highest value.
The code is supposed to find the centroid of the image and it works, but I want to make it faster
I'm developing for API 23, so a reduction kernel can not be used.
Java snippet:
private int[] sumValueY = new int[640];
private int[] sumValueX = new int[480];
rows_indices_alloc = Allocation.createSized( rs, Element.I32(rs), height, Allocation.USAGE_SCRIPT);
col_indices_alloc = Allocation.createSized( rs, Element.I32(rs), width, Allocation.USAGE_SCRIPT);
public RenderscriptProcessor(RenderScript rs, int width, int height)
{
mScript.set_gIn(mIntermAllocation);
mScript.forEach_detectX(rows_indices_alloc);
mScript.forEach_detectY(col_indices_alloc);
rows_indices_alloc.copyTo(sumValueX);
col_indices_alloc.copyTo(sumValueY);
}
Renderscript.rs snippet:
#pragma version(1)
#pragma rs java_package_name(org.gearvrf.renderscript)
#include "rs_debug.rsh"
#pragma rs_fp_relaxed
const int mImageWidth=640;
const int mImageHeight=480;
int32_t maxsX=-1;
int32_t maxIndexX;
int32_t maxsY=-1;
int32_t maxIndexY;
rs_allocation gIn;
void detectX(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageWidth); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, i, x));
sum+=(int)f4.r;
}
if((sum>maxsX)){
maxsX=sum;
maxIndexX = x;
}
}
void detectY(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageHeight); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
}
if((sum>maxsY)){
maxsY=sum;
maxIndexY = x;
}
}
Any help would be appreciated

float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
This converts from int to float and then back to int again. I think you can simplify by just doing this:
sum += rsGetElementAt_uchar4(gIn, x, i).r;
I don't know exactly how your previous stages work because you haven't posted them, but you should try generating packed values to read here. So either put your grayscale channels in .rgba or use a single channel format and then use rsAllocationVLoad_uchar4 to fetch 4 values at once.
Also, try combining previous stages with this one, if you don't need the intermediate results of those calculations it may be cheaper to do the memory load once and then do those transformations in registers.
You might also play with how many values your threads operate on. You could try having each kernel processing width/2, width/4, width/8 elements and see how they perform. This will give GPUs more threads to play with especially on lower-resolution images but with the trade off of having more reduction steps.
You also have a multiple-writers race condition on the maxsX/maxsY and maxIndexX/maxIndexY variables. All those writes need to use atomics if you care about the exact right answer. I think maybe you posted the wrong code because you don't store to the *_indices_alloc but you copy from them at the end. So, actually you should store all the sums to those and then use either a single threaded function or a kernel with atomics to get the absolute max and max index.

Related

Need help getting pixel location for Android Renderscript

I am working with Android Renderscript to analyze preview frames received from Camera2API. I intend to analyze each pixel and based on some rules(dependent on intensity and location of the pixel) I need to update a counter. I intend to use a ForEach loop but how do I get the pixel coordinates.
An example java loop would be.
for (int i = 0; i < 240; i++)
{
for (int j = 0; j < 320; j++)
{
tempPixelIntensity = image.getPixel(i,j);
x = i;
y = j ;
if(tempPixelIntensity=zzz&x<zzzandy<zzz)
{
counter++;
}
}
}
How would I go about doing the same in a renderscript? Thanks
You might try something like this:
#pragma rs_fp_relaxed // needed for some GPUs
uint32_t counter;
void RS_KERNEL process(uchar tempPixelIntensity, uint32_t x, uint32_t y)
{
if(tempPixelIntensity=zzz&x<zzzandy<zzz)
{
rsAtomicInc(&counter);
}
}
RS kernels are SPMD (single program multiple data). So you write only the inner part of your loop for a single pixel element and the framework does the looping.
In the java side you will do something like:
Type.Builder tb = new Type.Builder(rs, Element.U8(rs));
tb.setX(320);
tb.setY(240);
Allocation input = Allocation.createTyped(rs, tb.create(), Allocation.USAGE_SCRIPT);
script.forEach_process(input);
So the dimensions of the input allocation determines the bounds that the kernel will operate over. In this case x will vary from [0,319] and y will vary from [0,239]. The x,y parameters to the kernel are special parameters that are filled in by the RS runtime, similarly the tempPixelIntensity value will be populated by the value of the input allocation pixel at the given x,y coordinate.

Android Renderscript, remove white/whiteish background from Bitmap

I'm working on a picture based app and I'm blocked on an issue with Renderscript.
My purpose is pretty simple in theory, I want to remove the white background from the images loaded by the user, to show them on another image i've set as background. More specifically what I want to to is to simulate the effect of printing a user uploaded graphic on paper canvas (also a picture) with a realistic effect.
I cannot assume the user is able to upload nice PNGs with alpha channels, and one of the requirement is to operate with JPGs.
I've been trying to solve this with RenderScripts, with something like this which sets alpha 0 to anything with R,G,and B all equals or greater than 240:
#pragma version(1)
#pragma rs java_package_name(mypackagename)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
const static float th = 239.f/256.f;
void root(const uchar4 v_in, uchar4 v_out, const void* usrData, uint32_t x,uint32_t y){
float4 f4 = rsUnpackColor8888(*v_in);
if(f4.r > th && f4.g > th && f4.b > th)
{
f4.a = 0;
}
*v_out = rsPackColorTo8888(f4);
}
void filter() {
rsForEach(gScript, gIn, gOut);
}
but the results are not satisfactory for mainly two reasons:
if a photo has a whiteish gradient not on the background the script causes an ugly noise effect
images with shadows close to the edges get a noise effects close to the edges
I understand that passing from alpha 0 to alpha 1 is too extreme and I've tried different solution involving linear increasing the alpha when the sum of the R,G,B components decrease but I still have noisy pixels and blocks around.
With plain white, or regular background (e.g. a snapshot of the Google home page) it works perfectly but with photos it's very far from anything acceptable.
I think that if I'd be able to process one "line" of pixels or one "block" of pixels instead that a single one it could be easier to detect flat backgrounds and to avoid hitting gradients but I don't know enough about renderscripts to do that.
Can anyone point me in the right direction?
PS
I can't use PorterDuff and multiply because the background and the foreground have different dimensions and moreover since I need to be able to drag the uploaded image around the background canvas once the effect is applied. If I multiply the image with a region of the background moving the result image around would cause a section of the background to move around as well.
If I get it right, you wants to determine whether the current pixel can is a white background based on a line/block of neighboring pixels.
You can try the use rsGetElementAt. For example, to process a line in your original code:
#pragma version(1)
#pragma rs java_package_name(mypackagename)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
const static float th = 239.f/256.f;
void root(const uchar4 v_in, uchar4 v_out, const void* usrData, uint32_t x,uint32_t y){
float4 f4 = rsUnpackColor8888(*v_in);
uint32_t width = rsAllocationGetDimX(gIn);
// E.g: Processing a line from x to x+5.
bool isBackground = true;
for (uint32_t i=0; i<=5 && x+i<width; i++) {
uchar4 nPixel_u4 = rsGetElementAt_uchar4(gIn, x+i, y);
float4 nPixel_f4 = rsUnpackColor8888(nPixel_u4);
if(nPixel_f4.r <= th || nPixel_f4.g <= th || nPixel_f4.b <= th) {
isBackground = false;
break;
}
}
if (isBackground) {
f4.a = 0.0f;
*v_out = rsPackColorTo8888(f4);
}
}
void filter() {
rsForEach(gScript, gIn, gOut);
}
This is just a naive example of how you can use rsGetElementAt to get the data from a given position in a global Allocation. There is a corresponding rsSetElementAt for saving the data to a global Allocation. I am hoping it helps your project.

YUV_420_888 interpretation on Samsung Galaxy S7 (Camera2)

I wrote a conversion from YUV_420_888 to Bitmap, considering the following logic (as I understand it):
To summarize the approach: the kernel’s coordinates x and y are congruent both with the x and y of the non-padded part of the Y-Plane (2d-allocation) and the x and y of the output-Bitmap. The U- and V-Planes, however, have a different structure than the Y-Plane, because they use 1 byte for coverage of 4 pixels, and, in addition, may have a PixelStride that is more than one, in addition they might also have a padding that can be different from that of the Y-Plane. Therefore, in order to access the U’s and V’s efficiently by the kernel I put them into 1-d allocations and created an index “uvIndex” that gives the position of the corresponding U- and V within that 1-d allocation, for given (x,y) coordinates in the (non-padded) Y-plane (and, so, the output Bitmap).
In order to keep the rs-Kernel lean, I excluded the padding area in the yPlane by capping the x-range via LaunchOptions (this reflects the RowStride of the y-plane which thus can be ignored WITHIN the kernel). So we just need to consider the uvPixelStride and uvRowStride within the uvIndex, i.e. the index used in order to access to the u- and v-values.
This is my code:
Renderscript Kernel, named yuv420888.rs
#pragma version(1)
#pragma rs java_package_name(com.xxxyyy.testcamera2);
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
uint picWidth, uvPixelStride, uvRowStride ;
rs_allocation ypsIn,uIn,vIn;
// The LaunchOptions ensure that the Kernel does not enter the padding zone of Y, so yRowStride can be ignored WITHIN the Kernel.
uchar4 __attribute__((kernel)) doConvert(uint32_t x, uint32_t y) {
// index for accessing the uIn's and vIn's
uint uvIndex= uvPixelStride * (x/2) + uvRowStride*(y/2);
// get the y,u,v values
uchar yps= rsGetElementAt_uchar(ypsIn, x, y);
uchar u= rsGetElementAt_uchar(uIn, uvIndex);
uchar v= rsGetElementAt_uchar(vIn, uvIndex);
// calc argb
int4 argb;
argb.r = yps + v * 1436 / 1024 - 179;
argb.g = yps -u * 46549 / 131072 + 44 -v * 93604 / 131072 + 91;
argb.b = yps +u * 1814 / 1024 - 227;
argb.a = 255;
uchar4 out = convert_uchar4(clamp(argb, 0, 255));
return out;
}
Java side:
private Bitmap YUV_420_888_toRGB(Image image, int width, int height){
// Get the three image planes
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer();
byte[] y = new byte[buffer.remaining()];
buffer.get(y);
buffer = planes[1].getBuffer();
byte[] u = new byte[buffer.remaining()];
buffer.get(u);
buffer = planes[2].getBuffer();
byte[] v = new byte[buffer.remaining()];
buffer.get(v);
// get the relevant RowStrides and PixelStrides
// (we know from documentation that PixelStride is 1 for y)
int yRowStride= planes[0].getRowStride();
int uvRowStride= planes[1].getRowStride(); // we know from documentation that RowStride is the same for u and v.
int uvPixelStride= planes[1].getPixelStride(); // we know from documentation that PixelStride is the same for u and v.
// rs creation just for demo. Create rs just once in onCreate and use it again.
RenderScript rs = RenderScript.create(this);
//RenderScript rs = MainActivity.rs;
ScriptC_yuv420888 mYuv420=new ScriptC_yuv420888 (rs);
// Y,U,V are defined as global allocations, the out-Allocation is the Bitmap.
// Note also that uAlloc and vAlloc are 1-dimensional while yAlloc is 2-dimensional.
Type.Builder typeUcharY = new Type.Builder(rs, Element.U8(rs));
//using safe height
typeUcharY.setX(yRowStride).setY(y.length / yRowStride);
Allocation yAlloc = Allocation.createTyped(rs, typeUcharY.create());
yAlloc.copyFrom(y);
mYuv420.set_ypsIn(yAlloc);
Type.Builder typeUcharUV = new Type.Builder(rs, Element.U8(rs));
// note that the size of the u's and v's are as follows:
// ( (width/2)*PixelStride + padding ) * (height/2)
// = (RowStride ) * (height/2)
// but I noted that on the S7 it is 1 less...
typeUcharUV.setX(u.length);
Allocation uAlloc = Allocation.createTyped(rs, typeUcharUV.create());
uAlloc.copyFrom(u);
mYuv420.set_uIn(uAlloc);
Allocation vAlloc = Allocation.createTyped(rs, typeUcharUV.create());
vAlloc.copyFrom(v);
mYuv420.set_vIn(vAlloc);
// handover parameters
mYuv420.set_picWidth(width);
mYuv420.set_uvRowStride (uvRowStride);
mYuv420.set_uvPixelStride (uvPixelStride);
Bitmap outBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Allocation outAlloc = Allocation.createFromBitmap(rs, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(0, width); // by this we ignore the y’s padding zone, i.e. the right side of x between width and yRowStride
//using safe height
lo.setY(0, y.length / yRowStride);
mYuv420.forEach_doConvert(outAlloc,lo);
outAlloc.copyTo(outBitmap);
return outBitmap;
}
Testing on Nexus 7 (API 22) this returns nice color Bitmaps. This device, however, has trivial pixelstrides (=1) and no padding (i.e. rowstride=width). Testing on the brandnew Samsung S7 (API 23) I get pictures whose colors are not correct - except of the green ones. But the Picture does not show a general bias towards green, it just seems that non-green colors are not reproduced correctly. Note, that the S7 applies an u/v pixelstride of 2, and no padding.
Since the most crucial code line is within the rs-code the Access of the u/v planes uint uvIndex= (...) I think, there could be the problem, probably with incorrect consideration of pixelstrides here. Does anyone see the solution? Thanks.
UPDATE: I checked everything, and I am pretty sure that the code regarding the access of y,u,v is correct. So the problem must be with the u and v values themselves. Non green colors have a purple tilt, and looking at the u,v values they seem to be in a rather narrow range of about 110-150. Is it really possible that we need to cope with device specific YUV -> RBG conversions...?! Did I miss anything?
UPDATE 2: have corrected code, it works now, thanks to Eddy's Feedback.
Look at
floor((float) uvPixelStride*(x)/2)
which calculates your U,V row offset (uv_row_offset) from the Y x-coordinate.
if uvPixelStride = 2, then as x increases:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 1
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 3
and this is incorrect. There's no valid U/V pixel value at uv_row_offset = 1 or 3, since uvPixelStride = 2.
You want
uvPixelStride * floor(x/2)
(assuming you don't trust yourself to remember the critical round-down behavior of integer divide, if you do then):
uvPixelStride * (x/2)
should be enough
With that, your mapping becomes:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 0
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 2
See if that fixes the color errors. In practice, the incorrect addressing here would mean every other color sample would be from the wrong color plane, since it's likely that the underlying YUV data is semiplanar (so the U plane starts at V plane + 1 byte, with the two planes interleaved)
For people who encounter error
android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type
use buffer.capacity() instead of buffer.remaining()
and if you already made some operations on the image, you'll need to call rewind() method on the buffer.
Furthermore for anyone else getting
android.support.v8.renderscript.RSIllegalArgumentException: Array too
small for allocation type
I fixed it by changing yAlloc.copyFrom(y); to yAlloc.copy1DRangeFrom(0, y.length, y);
Posting full solution to convert YUV->BGR (can be adopted for other formats too) and also rotate image to upright using renderscript. Allocation is used as input and byte array is used as output. It was tested on Android 8+ including Samsung devices too.
Java
/**
* Renderscript-based process to convert YUV_420_888 to BGR_888 and rotation to upright.
*/
public class ImageProcessor {
protected final String TAG = this.getClass().getSimpleName();
private Allocation mInputAllocation;
private Allocation mOutAllocLand;
private Allocation mOutAllocPort;
private Handler mProcessingHandler;
private ScriptC_yuv_bgr mConvertScript;
private byte[] frameBGR;
public ProcessingTask mTask;
private ImageListener listener;
private Supplier<Integer> rotation;
public ImageProcessor(RenderScript rs, Size dimensions, ImageListener listener, Supplier<Integer> rotation) {
this.listener = listener;
this.rotation = rotation;
int w = dimensions.getWidth();
int h = dimensions.getHeight();
Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.YUV(rs));
yuvTypeBuilder.setX(w);
yuvTypeBuilder.setY(h);
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888);
mInputAllocation = Allocation.createTyped(rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
//keep 2 allocations to handle different image rotations
mOutAllocLand = createOutBGRAlloc(rs, w, h);
mOutAllocPort = createOutBGRAlloc(rs, h, w);
frameBGR = new byte[w*h*3];
HandlerThread processingThread = new HandlerThread(this.getClass().getSimpleName());
processingThread.start();
mProcessingHandler = new Handler(processingThread.getLooper());
mConvertScript = new ScriptC_yuv_bgr(rs);
mConvertScript.set_inWidth(w);
mConvertScript.set_inHeight(h);
mTask = new ProcessingTask(mInputAllocation);
}
private Allocation createOutBGRAlloc(RenderScript rs, int width, int height) {
//Stored as Vec4, it's impossible to store as Vec3, buffer size will be for Vec4 anyway
//using RGB_888 as alternative for BGR_888, can be just U8_3 type
Type.Builder rgbTypeBuilderPort = new Type.Builder(rs, Element.RGB_888(rs));
rgbTypeBuilderPort.setX(width);
rgbTypeBuilderPort.setY(height);
Allocation allocation = Allocation.createTyped(
rs, rgbTypeBuilderPort.create(), Allocation.USAGE_SCRIPT
);
//Use auto-padding to be able to copy to x*h*3 bytes array
allocation.setAutoPadding(true);
return allocation;
}
public Surface getInputSurface() {
return mInputAllocation.getSurface();
}
/**
* Simple class to keep track of incoming frame count,
* and to process the newest one in the processing thread
*/
class ProcessingTask implements Runnable, Allocation.OnBufferAvailableListener {
private int mPendingFrames = 0;
private Allocation mInputAllocation;
public ProcessingTask(Allocation input) {
mInputAllocation = input;
mInputAllocation.setOnBufferAvailableListener(this);
}
#Override
public void onBufferAvailable(Allocation a) {
synchronized(this) {
mPendingFrames++;
mProcessingHandler.post(this);
}
}
#Override
public void run() {
// Find out how many frames have arrived
int pendingFrames;
synchronized(this) {
pendingFrames = mPendingFrames;
mPendingFrames = 0;
// Discard extra messages in case processing is slower than frame rate
mProcessingHandler.removeCallbacks(this);
}
// Get to newest input
for (int i = 0; i < pendingFrames; i++) {
mInputAllocation.ioReceive();
}
int rot = rotation.get();
mConvertScript.set_currentYUVFrame(mInputAllocation);
mConvertScript.set_rotation(rot);
Allocation allocOut = rot==90 || rot== 270 ? mOutAllocPort : mOutAllocLand;
// Run processing
// ain allocation isn't really used, global frame param is used to get data from
mConvertScript.forEach_yuv_bgr(allocOut);
//Save to byte array, BGR 24bit
allocOut.copyTo(frameBGR);
int w = allocOut.getType().getX();
int h = allocOut.getType().getY();
if (listener != null) {
listener.onImageAvailable(frameBGR, w, h);
}
}
}
public interface ImageListener {
/**
* Called when there is available image, image is in upright position.
*
* #param bgr BGR 24bit bytes
* #param width image width
* #param height image height
*/
void onImageAvailable(byte[] bgr, int width, int height);
}
}
RS
#pragma version(1)
#pragma rs java_package_name(com.affectiva.camera)
#pragma rs_fp_relaxed
//Script convers YUV to BGR(uchar3)
//current YUV frame to read pixels from
rs_allocation currentYUVFrame;
//input image rotation: 0,90,180,270 clockwise
uint32_t rotation;
uint32_t inWidth;
uint32_t inHeight;
//method returns uchar3 BGR which will be set to x,y in output allocation
uchar3 __attribute__((kernel)) yuv_bgr(uint32_t x, uint32_t y) {
// Read in pixel values from latest frame - YUV color space
uchar3 inPixel;
uint32_t xRot = x;
uint32_t yRot = y;
//Do not rotate if 0
if (rotation==90) {
//rotate 270 clockwise
xRot = y;
yRot = inHeight - 1 - x;
} else if (rotation==180) {
xRot = inWidth - 1 - x;
yRot = inHeight - 1 - y;
} else if (rotation==270) {
//rotate 90 clockwise
xRot = inWidth - 1 - y;
yRot = x;
}
inPixel.r = rsGetElementAtYuv_uchar_Y(currentYUVFrame, xRot, yRot);
inPixel.g = rsGetElementAtYuv_uchar_U(currentYUVFrame, xRot, yRot);
inPixel.b = rsGetElementAtYuv_uchar_V(currentYUVFrame, xRot, yRot);
// Convert YUV to RGB, JFIF transform with fixed-point math
// R = Y + 1.402 * (V - 128)
// G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
// B = Y + 1.772 * (U - 128)
int3 bgr;
//get red pixel and assing to b
bgr.b = inPixel.r +
inPixel.b * 1436 / 1024 - 179;
bgr.g = inPixel.r -
inPixel.g * 46549 / 131072 + 44 -
inPixel.b * 93604 / 131072 + 91;
//get blue pixel and assign to red
bgr.r = inPixel.r +
inPixel.g * 1814 / 1024 - 227;
// Write out
return convert_uchar3(clamp(bgr, 0, 255));
}
On a Samsung Galaxy Tab 5 (Tablet), android version 5.1.1 (22), with alleged YUV_420_888 format, the following renderscript math works well and produces correct colors:
uchar yValue = rsGetElementAt_uchar(gCurrentFrame, x + y * yRowStride);
uchar vValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) );
uchar uValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) + (xSize * ySize) / 4);
I do not understand why the horizontal value (i.e., y) is scaled by a factor of four instead of two, but it works well. I also needed to avoid use of rsGetElementAtYuv_uchar_Y|U|V. I believe the associated allocation stride value is set to zero instead of something proper. Use of rsGetElementAt_uchar() is a reasonable work-around.
On a Samsung Galaxy S5 (Smart Phone), android version 5.0 (21), with alleged YUV_420_888 format, I cannot recover the u and v values, they come through as all zeros. This results in a green looking image. Luminous is OK, but image is vertically flipped.
This code requires the use of the RenderScript compatibility library (android.support.v8.renderscript.*).
In order to get the compatibility library to work with Android API 23, I updated to gradle-plugin 2.1.0 and Build-Tools 23.0.3 as per Miao Wang's answer at How to create Renderscript scripts on Android Studio, and make them run?
If you follow his answer and get an error "Gradle version 2.10 is required" appears, do NOT change
classpath 'com.android.tools.build:gradle:2.1.0'
Instead, update the distributionUrl field of the Project\gradle\wrapper\gradle-wrapper.properties file to
distributionUrl=https\://services.gradle.org/distributions/gradle-2.10-all.zip
and change File > Settings > Builds,Execution,Deployment > Build Tools > Gradle >Gradle to Use default gradle wrapper as per "Gradle Version 2.10 is required." Error.
Re: RSIllegalArgumentException
In my case this was the case that buffer.remaining() was not multiple of stride:
The length of last line was less than stride (i.e. only up to where actual data was.)
An FYI in case someone else gets this as I was also getting "android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type" when trying out the code. In my case it turns out that the when allocating the buffer for Y i had to rewind the buffer because it was being left at the wrong end and wasn't copying the data. By doing buffer.rewind(); before allocation the new bytes array makes it work fine now.

RenderScript one input- and two output-Allocations

I managed to write a Kernel that transforms an input-Bitmap to a float[] of Sobel gradients (two separate Kernels for SobelX and SobelY). I did this by assigning the input-Bitmap as a global variable and then passing the Kernel based on the Output allocation and referencing the neighbors of the Input-Bitmap via rsGetElementAt. Since I actually want to calculate the Magnitude (hypot(Sx,Sy) AND the Direction (atan2(Sy,Sx)) it would be nice to do the whole thing in one Kernel-pass. If I only had to calculate the Magnitude Array, this could be done with the same structure (1 intput Bitmap, 1 Output float[]). Now I wonder, whether it is possible to just add an additional Allocation for the Direction Output (also a float[]). I tried this with the rs-function rsSetElementAt() as follows:
#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed
rs_allocation gIn, direction;
int32_t width;
int32_t height;
// Sobel, Magnitude und Direction
float __attribute__((kernel)) sobel_XY(uint32_t x, uint32_t y) {
float outX=0, outY=0;
if (x>0 && y>0 && x<(width-1) && y<(height-1)){
uchar4 c11=rsGetElementAt_uchar4(gIn, x-1, y-1); uchar4 c12=rsGetElementAt_uchar4(gIn, x-1, y);uchar4 c13=rsGetElementAt_uchar4(gIn, x-1, y+1);
uchar4 c21=rsGetElementAt_uchar4(gIn, x, y-1);uchar4 c23=rsGetElementAt_uchar4(gIn, x, y+1);
uchar4 c31=rsGetElementAt_uchar4(gIn, x+1, y-1);uchar4 c32=rsGetElementAt_uchar4(gIn, x+1, y);uchar4 c33=rsGetElementAt_uchar4(gIn, x+1, y+1);
float4 f11=rsUnpackColor8888(c11);float4 f12=rsUnpackColor8888(c12);float4 f13=rsUnpackColor8888(c13);
float4 f21=rsUnpackColor8888(c21); float4 f23=rsUnpackColor8888(c23);
float4 f31=rsUnpackColor8888(c31);float4 f32=rsUnpackColor8888(c32);float4 f33=rsUnpackColor8888(c33);
outX= f11.r-f31.r + 2*(f12.r-f32.r) + f13.r-f33.r;
outY= f11.r-f13.r + 2*(f21.r-f23.r) + f31.r-f33.r;
float d = atan2(outY, outX);
rsSetElementAt_float(direction, d, x, y);
return hypot(outX, outY);
}
}
And the corresponding Java code:
ScriptC_sobel script;
script=new ScriptC_sobel(rs);
script.set_gIn(Allocation.createFromBitmap(rs, bmpGray));
Type.Builder TypeOut = new Type.Builder(rs, Element.F32(rs));
TypeOut.setX(width).setY(height);
Allocation outAllocation = Allocation.createTyped(rs, TypeOut.create());
// the following 3 lines shall reflect the allocation to the Direction output
Type.Builder TypeDir = new Type.Builder(rs, Element.F32(rs));
TypeDir.setX(width).setY(height);
Allocation dirAllocation = Allocation.createTyped(rs, TypeDir.create());
script.forEach_sobel_XY(outAllocation);
outAllocation.copyTo(gm) ;
dirAllocation.copyTo(gd);
Unfortunately this does not work. I am not sure, whether the problem is with the structural logic of the rs-kernel or is it because I cannot use a second Type.Builder assignment within the Java code (because the kernel is already tied to the Magnitude Output-allocation). Any help is highly appreciated.
PS: I see that there is no link between the second Type.Builder assignment and the "direction" allocaton in rs - but how can this be achieved?
The outAllocation is passed as a parameter to the kernel. But the existence and location of dirAllocation also has to be communicated to the Renderscript side. Do this just before starting the script:
script.set_direction(dirAllocation);
Also, read about memory allocation in Renderscript.

Passing Array to rsForEach in Renderscript Compute

I found there's lacking good documentation in RenderScript, for what I know, forEach in RS is to execute the root() for each individual item in the allocation.
I am trying to make a library for Renderscript that does Image processing, as a starting point, I reached this great answer. But the problem, is that the blur operation is on Each pixel and each pixel requires another loop (n with blur width) of calculation. Although running on multi-core, it is still a bit too slow.
I am trying to modify it to allow (two-pass) box filter, but that requires working on a single row or column instead of cell. So, is there any way to ask foreach to send an array to root()?
rsForEach can only operate upon Allocations.
If you want to have the rsForEach function call root() for each of the image rows you have to pass in an Allocation that is sized to be the same length as the number of rows and then work out which row you should be operating on inside root() (similarly for operating on each column). RenderScript should then divide up the work to run on the resources available (more than one row being processed at the same time on multi core devices).
One way you could do that is by passing in an Allocation that give the offsets (within the image data array) of the image rows. The v_in argument inside the root() will then be the row offset. Since the Allocations the rsForEach call is operating upon is not the image data you cannot write the image out using the v_out argument and you must bind the output image separately.
Here is some RenderScript that show this:
#pragma version(1)
#pragma rs java_package_name(com.android.example.hellocompute)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
int mImageWidth;
const uchar4 *gInPixels;
uchar4 *gOutPixels;
void init() {
}
static const int kBlurWidth = 20;
//
// This is called per row.
// The row indices are passed in as v_in or you could also use the x argument and multiply it by image width.
//
void root(const int32_t *v_in, int32_t *v_out, const void *usrData, uint32_t x, uint32_t y) {
float3 blur[kBlurWidth];
float3 cur_colour = {0.0f, 0.0f, 0.0f};
for ( int i = 0; i < kBlurWidth; i++) {
float3 init_colour = {0.0f, 0.0f, 0.0f};
blur[i] = init_colour;
}
int32_t row_index = *v_in;
int blur_index = 0;
for ( int i = 0; i < mImageWidth; i++) {
float4 pixel_colour = rsUnpackColor8888(gInPixels[i + row_index]);
cur_colour -= blur[blur_index];
blur[blur_index] = pixel_colour.rgb;
cur_colour += blur[blur_index];
blur_index += 1;
if ( blur_index >= kBlurWidth) {
blur_index = 0;
}
gOutPixels[i + row_index] = rsPackColorTo8888(cur_colour/(float)kBlurWidth);
//gOutPixels[i + row_index] = rsPackColorTo8888(pixel_colour);
}
}
void filter() {
rsDebug("Number of rows:", rsAllocationGetDimX(gIn));
rsForEach(gScript, gIn, gOut, NULL);
}
This would be setup using the following Java:
mBlurRowScript = new ScriptC_blur_row(mRS, getResources(), R.raw.blur_row);
int row_width = mBitmapIn.getWidth();
//
// Create an allocation that indexes each row.
//
int num_rows = mBitmapIn.getHeight();
int[] row_indices = new int[num_rows];
for ( int i = 0; i < num_rows; i++) {
row_indices[i] = i * row_width;
}
Allocation row_indices_alloc = Allocation.createSized( mRS, Element.I32(mRS), num_rows, Allocation.USAGE_SCRIPT);
row_indices_alloc.copyFrom(row_indices);
//
// The image data has to be bound to the pointers within the RenderScript so it can be accessed
// from the root() function.
//
mBlurRowScript.bind_gInPixels(mInAllocation);
mBlurRowScript.bind_gOutPixels(mOutAllocation);
// Pass in the image width
mBlurRowScript.set_mImageWidth(row_width);
//
// Pass in the row indices Allocation as the input. It is also passed in as the output though the output is not used.
//
mBlurRowScript.set_gIn(row_indices_alloc);
mBlurRowScript.set_gOut(row_indices_alloc);
mBlurRowScript.set_gScript(mBlurRowScript);
mBlurRowScript.invoke_filter();

Categories

Resources