I am processing audio buffers in Android, the setup I have is as follows:
get system callback with a short buffer
convert short buffer to float buffer
do some DSP with float buffer
convert float buffer to short buffer
deliver short buffer to system
I want to reduce the latency of steps 2 and 4, the short to float and float to short conversions. (leaving aside the latency in the 3- DSP since I will take care of that later).
So, I would like to use NEON SIMD to calculate multiple values at a time.
What I currently have for 2 and 4 is the following code:
#define CONV16BIT 32768
#define CONVMYFLT (1./32768.)
static int i;
float * floatBuffer;
short * shortInBuffer;
short * shortOutBuffer;
...(malloc and init buffers method)
...(inside callback)
//2- short to float
for(i = 0; i < bufferSize; i++) {
floatBuffer[i] = (float) (shortInBuffer[i] * CONVMYFLT);
}
...(do dsp)
//4- float to short
for(i = 0; i < bufferSize; i++) {
shortOutBuffer[i] = (short) (floatBuffer[i] * CONV16BIT);
}
I believe that the steps I need for taking advantage of NEON are:
(for the short to float part)
Load the 16-bit shorts form short buffer
Convert them to 32-bit integers
Convert them to float
Multiply them by CONVMYFLT
Store them into float buffer
Found this info in this post (selected answer)
__m128 factor = _mm_set1_ps(1.0f / value);
for (int i = 0; i < W*H; i += 8)
{
// Load 8 16-bit ushorts.
// vi = {a,b,c,d,e,f,g,h}
__m128i vi = _mm_load_si128((const __m128i*)(source + i));
// Convert to 32-bit integers
// vi0 = {a,0,b,0,c,0,d,0}
// vi1 = {e,0,f,0,g,0,h,0}
__m128i vi0 = _mm_cvtepu16_epi32(vi);
__m128i vi1 = _mm_cvtepu16_epi32(_mm_unpackhi_epi64(vi,vi));
// Convert to float
__m128 vf0 = _mm_cvtepi32_ps(vi0);
__m128 vf1 = _mm_cvtepi32_ps(vi1);
// Multiply
vf0 = _mm_mul_ps(vf0,factor);
vf1 = _mm_mul_ps(vf1,factor);
// Store
_mm_store_ps(destination + i + 0,vf0);
_mm_store_ps(destination + i + 4,vf1);
}
However this is SIMD for intel SSE4.1 not for NEON.
What would be the equivalent implementation for NEON in Android? (had a hard time understanding the NEON intrinsics)
Update 1
From the answer of fsheikh I was able to build this:
- I was able to get int16_t from the system callback
- and all my buffer sizes are multiple of 8:
int16x8_t i16v;
int32x4_t i32vl, i32vh;
float32x4_t f32vl, f32vh;
for(i = 0; i < bufferSize; i += 8) {
//load 8 16-bit lanes on vector
i16v = vld1q_s16((const int16x8_t*) int16_t_inBuffer[i]);
// convert into 32-bit signed integer
i32vl = vmovl_s16 (i16v);
i32vh = vmovl_s16 (vzipq_s16(i16v, i16v).val[0]);
//convert to 32-bit float
f32vl = vcvtq_f32_s32(i32vl);
f32vh = vcvtq_f32_s32(i32vh);
//multiply by scalar
f32vl = vmulq_n_f32(f32vl, CONVMYFLT);
f32vh = vmulq_n_f32(f32vh, CONVMYFLT);
//store in float buffer
vst1q_f32(floatBuffer[i], f32vl);
vst1q_f32(floatBuffer[i + 4], f32vh);
}
Should this work right?
I have doubts over i should use the low or high part of the interleaved vector returned by vmovl_s16:
i32vh = vmovl_s16 (vzipq_s1 6(i16v, i16v).val[ 0 ]); or
i32vh = vmovl_s16 (vzipq_s16(i16v, i16v).val[ 1 ]);
Once you have the SSE version, you can use the GCC ARM NEON intrinsics list to port SSE macros to NEON. https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/ARM-NEON-Intrinsics.html
So for example:
// Load unsigned short
uint16x4_t vld1_u16 (const uint16_t *)
// Convert to unsigned int
uint32x4_t vmovl_u16 (uint16x4_t)
// Convert to float
float32x4_t vcvtq_f32_s32 (int32x4_t)
// Multiply floats with a scalar
float32x4_t vmulq_n_f32 (float32x4_t, float32_t)
// Store results into a float buffer
void vst1q_f32 (float32_t *, float32x4_t)
I am working with Android Renderscript to analyze preview frames received from Camera2API. I intend to analyze each pixel and based on some rules(dependent on intensity and location of the pixel) I need to update a counter. I intend to use a ForEach loop but how do I get the pixel coordinates.
An example java loop would be.
for (int i = 0; i < 240; i++)
{
for (int j = 0; j < 320; j++)
{
tempPixelIntensity = image.getPixel(i,j);
x = i;
y = j ;
if(tempPixelIntensity=zzz&x<zzzandy<zzz)
{
counter++;
}
}
}
How would I go about doing the same in a renderscript? Thanks
You might try something like this:
#pragma rs_fp_relaxed // needed for some GPUs
uint32_t counter;
void RS_KERNEL process(uchar tempPixelIntensity, uint32_t x, uint32_t y)
{
if(tempPixelIntensity=zzz&x<zzzandy<zzz)
{
rsAtomicInc(&counter);
}
}
RS kernels are SPMD (single program multiple data). So you write only the inner part of your loop for a single pixel element and the framework does the looping.
In the java side you will do something like:
Type.Builder tb = new Type.Builder(rs, Element.U8(rs));
tb.setX(320);
tb.setY(240);
Allocation input = Allocation.createTyped(rs, tb.create(), Allocation.USAGE_SCRIPT);
script.forEach_process(input);
So the dimensions of the input allocation determines the bounds that the kernel will operate over. In this case x will vary from [0,319] and y will vary from [0,239]. The x,y parameters to the kernel are special parameters that are filled in by the RS runtime, similarly the tempPixelIntensity value will be populated by the value of the input allocation pixel at the given x,y coordinate.
I use the following code to transfer an array of float numbers to a renderscript kernel:
float[] bufName = new float[3];
bufName [0] = 255;
bufName [1] = 255;
bufName [2] = 0;
Allocation alloc1 = Allocation.createSized(mRs, Element.F32(mRs), 3);
alloc1.copy1DRangeFrom(0, 3, mtmd);
ScriptC_foo foo = new ScriptC_foo(mRs);
foo.set_gIn(alloc1);
And I have defined gIn in the foo.rs file as follows:
rs_allocation gIn;
I would like to work with 16 bit floating point numbers. I know that I should change the allocation creation to this:
Allocation alloc1 = Allocation.createSized(mRs, Element.F16(mRs), 3);
However, I cannot find a solution for copying the bufName array to the allocation. Any help is appreciated.
Java does not define half-precision floats, so you'll have to do your own manipulation to get this to work. If you use Float.valueOf(f).shortValue() (where f is the specific flow you'd like to represent as half-precision). This should properly cast down the float to the new bit size. Then create an Allocation of Element.F16 size. You should be able to copy an array of short values down to RenderScript to do the work.
I wrote a conversion from YUV_420_888 to Bitmap, considering the following logic (as I understand it):
To summarize the approach: the kernel’s coordinates x and y are congruent both with the x and y of the non-padded part of the Y-Plane (2d-allocation) and the x and y of the output-Bitmap. The U- and V-Planes, however, have a different structure than the Y-Plane, because they use 1 byte for coverage of 4 pixels, and, in addition, may have a PixelStride that is more than one, in addition they might also have a padding that can be different from that of the Y-Plane. Therefore, in order to access the U’s and V’s efficiently by the kernel I put them into 1-d allocations and created an index “uvIndex” that gives the position of the corresponding U- and V within that 1-d allocation, for given (x,y) coordinates in the (non-padded) Y-plane (and, so, the output Bitmap).
In order to keep the rs-Kernel lean, I excluded the padding area in the yPlane by capping the x-range via LaunchOptions (this reflects the RowStride of the y-plane which thus can be ignored WITHIN the kernel). So we just need to consider the uvPixelStride and uvRowStride within the uvIndex, i.e. the index used in order to access to the u- and v-values.
This is my code:
Renderscript Kernel, named yuv420888.rs
#pragma version(1)
#pragma rs java_package_name(com.xxxyyy.testcamera2);
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
uint picWidth, uvPixelStride, uvRowStride ;
rs_allocation ypsIn,uIn,vIn;
// The LaunchOptions ensure that the Kernel does not enter the padding zone of Y, so yRowStride can be ignored WITHIN the Kernel.
uchar4 __attribute__((kernel)) doConvert(uint32_t x, uint32_t y) {
// index for accessing the uIn's and vIn's
uint uvIndex= uvPixelStride * (x/2) + uvRowStride*(y/2);
// get the y,u,v values
uchar yps= rsGetElementAt_uchar(ypsIn, x, y);
uchar u= rsGetElementAt_uchar(uIn, uvIndex);
uchar v= rsGetElementAt_uchar(vIn, uvIndex);
// calc argb
int4 argb;
argb.r = yps + v * 1436 / 1024 - 179;
argb.g = yps -u * 46549 / 131072 + 44 -v * 93604 / 131072 + 91;
argb.b = yps +u * 1814 / 1024 - 227;
argb.a = 255;
uchar4 out = convert_uchar4(clamp(argb, 0, 255));
return out;
}
Java side:
private Bitmap YUV_420_888_toRGB(Image image, int width, int height){
// Get the three image planes
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer();
byte[] y = new byte[buffer.remaining()];
buffer.get(y);
buffer = planes[1].getBuffer();
byte[] u = new byte[buffer.remaining()];
buffer.get(u);
buffer = planes[2].getBuffer();
byte[] v = new byte[buffer.remaining()];
buffer.get(v);
// get the relevant RowStrides and PixelStrides
// (we know from documentation that PixelStride is 1 for y)
int yRowStride= planes[0].getRowStride();
int uvRowStride= planes[1].getRowStride(); // we know from documentation that RowStride is the same for u and v.
int uvPixelStride= planes[1].getPixelStride(); // we know from documentation that PixelStride is the same for u and v.
// rs creation just for demo. Create rs just once in onCreate and use it again.
RenderScript rs = RenderScript.create(this);
//RenderScript rs = MainActivity.rs;
ScriptC_yuv420888 mYuv420=new ScriptC_yuv420888 (rs);
// Y,U,V are defined as global allocations, the out-Allocation is the Bitmap.
// Note also that uAlloc and vAlloc are 1-dimensional while yAlloc is 2-dimensional.
Type.Builder typeUcharY = new Type.Builder(rs, Element.U8(rs));
//using safe height
typeUcharY.setX(yRowStride).setY(y.length / yRowStride);
Allocation yAlloc = Allocation.createTyped(rs, typeUcharY.create());
yAlloc.copyFrom(y);
mYuv420.set_ypsIn(yAlloc);
Type.Builder typeUcharUV = new Type.Builder(rs, Element.U8(rs));
// note that the size of the u's and v's are as follows:
// ( (width/2)*PixelStride + padding ) * (height/2)
// = (RowStride ) * (height/2)
// but I noted that on the S7 it is 1 less...
typeUcharUV.setX(u.length);
Allocation uAlloc = Allocation.createTyped(rs, typeUcharUV.create());
uAlloc.copyFrom(u);
mYuv420.set_uIn(uAlloc);
Allocation vAlloc = Allocation.createTyped(rs, typeUcharUV.create());
vAlloc.copyFrom(v);
mYuv420.set_vIn(vAlloc);
// handover parameters
mYuv420.set_picWidth(width);
mYuv420.set_uvRowStride (uvRowStride);
mYuv420.set_uvPixelStride (uvPixelStride);
Bitmap outBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Allocation outAlloc = Allocation.createFromBitmap(rs, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(0, width); // by this we ignore the y’s padding zone, i.e. the right side of x between width and yRowStride
//using safe height
lo.setY(0, y.length / yRowStride);
mYuv420.forEach_doConvert(outAlloc,lo);
outAlloc.copyTo(outBitmap);
return outBitmap;
}
Testing on Nexus 7 (API 22) this returns nice color Bitmaps. This device, however, has trivial pixelstrides (=1) and no padding (i.e. rowstride=width). Testing on the brandnew Samsung S7 (API 23) I get pictures whose colors are not correct - except of the green ones. But the Picture does not show a general bias towards green, it just seems that non-green colors are not reproduced correctly. Note, that the S7 applies an u/v pixelstride of 2, and no padding.
Since the most crucial code line is within the rs-code the Access of the u/v planes uint uvIndex= (...) I think, there could be the problem, probably with incorrect consideration of pixelstrides here. Does anyone see the solution? Thanks.
UPDATE: I checked everything, and I am pretty sure that the code regarding the access of y,u,v is correct. So the problem must be with the u and v values themselves. Non green colors have a purple tilt, and looking at the u,v values they seem to be in a rather narrow range of about 110-150. Is it really possible that we need to cope with device specific YUV -> RBG conversions...?! Did I miss anything?
UPDATE 2: have corrected code, it works now, thanks to Eddy's Feedback.
Look at
floor((float) uvPixelStride*(x)/2)
which calculates your U,V row offset (uv_row_offset) from the Y x-coordinate.
if uvPixelStride = 2, then as x increases:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 1
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 3
and this is incorrect. There's no valid U/V pixel value at uv_row_offset = 1 or 3, since uvPixelStride = 2.
You want
uvPixelStride * floor(x/2)
(assuming you don't trust yourself to remember the critical round-down behavior of integer divide, if you do then):
uvPixelStride * (x/2)
should be enough
With that, your mapping becomes:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 0
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 2
See if that fixes the color errors. In practice, the incorrect addressing here would mean every other color sample would be from the wrong color plane, since it's likely that the underlying YUV data is semiplanar (so the U plane starts at V plane + 1 byte, with the two planes interleaved)
For people who encounter error
android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type
use buffer.capacity() instead of buffer.remaining()
and if you already made some operations on the image, you'll need to call rewind() method on the buffer.
Furthermore for anyone else getting
android.support.v8.renderscript.RSIllegalArgumentException: Array too
small for allocation type
I fixed it by changing yAlloc.copyFrom(y); to yAlloc.copy1DRangeFrom(0, y.length, y);
Posting full solution to convert YUV->BGR (can be adopted for other formats too) and also rotate image to upright using renderscript. Allocation is used as input and byte array is used as output. It was tested on Android 8+ including Samsung devices too.
Java
/**
* Renderscript-based process to convert YUV_420_888 to BGR_888 and rotation to upright.
*/
public class ImageProcessor {
protected final String TAG = this.getClass().getSimpleName();
private Allocation mInputAllocation;
private Allocation mOutAllocLand;
private Allocation mOutAllocPort;
private Handler mProcessingHandler;
private ScriptC_yuv_bgr mConvertScript;
private byte[] frameBGR;
public ProcessingTask mTask;
private ImageListener listener;
private Supplier<Integer> rotation;
public ImageProcessor(RenderScript rs, Size dimensions, ImageListener listener, Supplier<Integer> rotation) {
this.listener = listener;
this.rotation = rotation;
int w = dimensions.getWidth();
int h = dimensions.getHeight();
Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.YUV(rs));
yuvTypeBuilder.setX(w);
yuvTypeBuilder.setY(h);
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888);
mInputAllocation = Allocation.createTyped(rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
//keep 2 allocations to handle different image rotations
mOutAllocLand = createOutBGRAlloc(rs, w, h);
mOutAllocPort = createOutBGRAlloc(rs, h, w);
frameBGR = new byte[w*h*3];
HandlerThread processingThread = new HandlerThread(this.getClass().getSimpleName());
processingThread.start();
mProcessingHandler = new Handler(processingThread.getLooper());
mConvertScript = new ScriptC_yuv_bgr(rs);
mConvertScript.set_inWidth(w);
mConvertScript.set_inHeight(h);
mTask = new ProcessingTask(mInputAllocation);
}
private Allocation createOutBGRAlloc(RenderScript rs, int width, int height) {
//Stored as Vec4, it's impossible to store as Vec3, buffer size will be for Vec4 anyway
//using RGB_888 as alternative for BGR_888, can be just U8_3 type
Type.Builder rgbTypeBuilderPort = new Type.Builder(rs, Element.RGB_888(rs));
rgbTypeBuilderPort.setX(width);
rgbTypeBuilderPort.setY(height);
Allocation allocation = Allocation.createTyped(
rs, rgbTypeBuilderPort.create(), Allocation.USAGE_SCRIPT
);
//Use auto-padding to be able to copy to x*h*3 bytes array
allocation.setAutoPadding(true);
return allocation;
}
public Surface getInputSurface() {
return mInputAllocation.getSurface();
}
/**
* Simple class to keep track of incoming frame count,
* and to process the newest one in the processing thread
*/
class ProcessingTask implements Runnable, Allocation.OnBufferAvailableListener {
private int mPendingFrames = 0;
private Allocation mInputAllocation;
public ProcessingTask(Allocation input) {
mInputAllocation = input;
mInputAllocation.setOnBufferAvailableListener(this);
}
#Override
public void onBufferAvailable(Allocation a) {
synchronized(this) {
mPendingFrames++;
mProcessingHandler.post(this);
}
}
#Override
public void run() {
// Find out how many frames have arrived
int pendingFrames;
synchronized(this) {
pendingFrames = mPendingFrames;
mPendingFrames = 0;
// Discard extra messages in case processing is slower than frame rate
mProcessingHandler.removeCallbacks(this);
}
// Get to newest input
for (int i = 0; i < pendingFrames; i++) {
mInputAllocation.ioReceive();
}
int rot = rotation.get();
mConvertScript.set_currentYUVFrame(mInputAllocation);
mConvertScript.set_rotation(rot);
Allocation allocOut = rot==90 || rot== 270 ? mOutAllocPort : mOutAllocLand;
// Run processing
// ain allocation isn't really used, global frame param is used to get data from
mConvertScript.forEach_yuv_bgr(allocOut);
//Save to byte array, BGR 24bit
allocOut.copyTo(frameBGR);
int w = allocOut.getType().getX();
int h = allocOut.getType().getY();
if (listener != null) {
listener.onImageAvailable(frameBGR, w, h);
}
}
}
public interface ImageListener {
/**
* Called when there is available image, image is in upright position.
*
* #param bgr BGR 24bit bytes
* #param width image width
* #param height image height
*/
void onImageAvailable(byte[] bgr, int width, int height);
}
}
RS
#pragma version(1)
#pragma rs java_package_name(com.affectiva.camera)
#pragma rs_fp_relaxed
//Script convers YUV to BGR(uchar3)
//current YUV frame to read pixels from
rs_allocation currentYUVFrame;
//input image rotation: 0,90,180,270 clockwise
uint32_t rotation;
uint32_t inWidth;
uint32_t inHeight;
//method returns uchar3 BGR which will be set to x,y in output allocation
uchar3 __attribute__((kernel)) yuv_bgr(uint32_t x, uint32_t y) {
// Read in pixel values from latest frame - YUV color space
uchar3 inPixel;
uint32_t xRot = x;
uint32_t yRot = y;
//Do not rotate if 0
if (rotation==90) {
//rotate 270 clockwise
xRot = y;
yRot = inHeight - 1 - x;
} else if (rotation==180) {
xRot = inWidth - 1 - x;
yRot = inHeight - 1 - y;
} else if (rotation==270) {
//rotate 90 clockwise
xRot = inWidth - 1 - y;
yRot = x;
}
inPixel.r = rsGetElementAtYuv_uchar_Y(currentYUVFrame, xRot, yRot);
inPixel.g = rsGetElementAtYuv_uchar_U(currentYUVFrame, xRot, yRot);
inPixel.b = rsGetElementAtYuv_uchar_V(currentYUVFrame, xRot, yRot);
// Convert YUV to RGB, JFIF transform with fixed-point math
// R = Y + 1.402 * (V - 128)
// G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
// B = Y + 1.772 * (U - 128)
int3 bgr;
//get red pixel and assing to b
bgr.b = inPixel.r +
inPixel.b * 1436 / 1024 - 179;
bgr.g = inPixel.r -
inPixel.g * 46549 / 131072 + 44 -
inPixel.b * 93604 / 131072 + 91;
//get blue pixel and assign to red
bgr.r = inPixel.r +
inPixel.g * 1814 / 1024 - 227;
// Write out
return convert_uchar3(clamp(bgr, 0, 255));
}
On a Samsung Galaxy Tab 5 (Tablet), android version 5.1.1 (22), with alleged YUV_420_888 format, the following renderscript math works well and produces correct colors:
uchar yValue = rsGetElementAt_uchar(gCurrentFrame, x + y * yRowStride);
uchar vValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) );
uchar uValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) + (xSize * ySize) / 4);
I do not understand why the horizontal value (i.e., y) is scaled by a factor of four instead of two, but it works well. I also needed to avoid use of rsGetElementAtYuv_uchar_Y|U|V. I believe the associated allocation stride value is set to zero instead of something proper. Use of rsGetElementAt_uchar() is a reasonable work-around.
On a Samsung Galaxy S5 (Smart Phone), android version 5.0 (21), with alleged YUV_420_888 format, I cannot recover the u and v values, they come through as all zeros. This results in a green looking image. Luminous is OK, but image is vertically flipped.
This code requires the use of the RenderScript compatibility library (android.support.v8.renderscript.*).
In order to get the compatibility library to work with Android API 23, I updated to gradle-plugin 2.1.0 and Build-Tools 23.0.3 as per Miao Wang's answer at How to create Renderscript scripts on Android Studio, and make them run?
If you follow his answer and get an error "Gradle version 2.10 is required" appears, do NOT change
classpath 'com.android.tools.build:gradle:2.1.0'
Instead, update the distributionUrl field of the Project\gradle\wrapper\gradle-wrapper.properties file to
distributionUrl=https\://services.gradle.org/distributions/gradle-2.10-all.zip
and change File > Settings > Builds,Execution,Deployment > Build Tools > Gradle >Gradle to Use default gradle wrapper as per "Gradle Version 2.10 is required." Error.
Re: RSIllegalArgumentException
In my case this was the case that buffer.remaining() was not multiple of stride:
The length of last line was less than stride (i.e. only up to where actual data was.)
An FYI in case someone else gets this as I was also getting "android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type" when trying out the code. In my case it turns out that the when allocating the buffer for Y i had to rewind the buffer because it was being left at the wrong end and wasn't copying the data. By doing buffer.rewind(); before allocation the new bytes array makes it work fine now.
I managed to write a Kernel that transforms an input-Bitmap to a float[] of Sobel gradients (two separate Kernels for SobelX and SobelY). I did this by assigning the input-Bitmap as a global variable and then passing the Kernel based on the Output allocation and referencing the neighbors of the Input-Bitmap via rsGetElementAt. Since I actually want to calculate the Magnitude (hypot(Sx,Sy) AND the Direction (atan2(Sy,Sx)) it would be nice to do the whole thing in one Kernel-pass. If I only had to calculate the Magnitude Array, this could be done with the same structure (1 intput Bitmap, 1 Output float[]). Now I wonder, whether it is possible to just add an additional Allocation for the Direction Output (also a float[]). I tried this with the rs-function rsSetElementAt() as follows:
#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed
rs_allocation gIn, direction;
int32_t width;
int32_t height;
// Sobel, Magnitude und Direction
float __attribute__((kernel)) sobel_XY(uint32_t x, uint32_t y) {
float outX=0, outY=0;
if (x>0 && y>0 && x<(width-1) && y<(height-1)){
uchar4 c11=rsGetElementAt_uchar4(gIn, x-1, y-1); uchar4 c12=rsGetElementAt_uchar4(gIn, x-1, y);uchar4 c13=rsGetElementAt_uchar4(gIn, x-1, y+1);
uchar4 c21=rsGetElementAt_uchar4(gIn, x, y-1);uchar4 c23=rsGetElementAt_uchar4(gIn, x, y+1);
uchar4 c31=rsGetElementAt_uchar4(gIn, x+1, y-1);uchar4 c32=rsGetElementAt_uchar4(gIn, x+1, y);uchar4 c33=rsGetElementAt_uchar4(gIn, x+1, y+1);
float4 f11=rsUnpackColor8888(c11);float4 f12=rsUnpackColor8888(c12);float4 f13=rsUnpackColor8888(c13);
float4 f21=rsUnpackColor8888(c21); float4 f23=rsUnpackColor8888(c23);
float4 f31=rsUnpackColor8888(c31);float4 f32=rsUnpackColor8888(c32);float4 f33=rsUnpackColor8888(c33);
outX= f11.r-f31.r + 2*(f12.r-f32.r) + f13.r-f33.r;
outY= f11.r-f13.r + 2*(f21.r-f23.r) + f31.r-f33.r;
float d = atan2(outY, outX);
rsSetElementAt_float(direction, d, x, y);
return hypot(outX, outY);
}
}
And the corresponding Java code:
ScriptC_sobel script;
script=new ScriptC_sobel(rs);
script.set_gIn(Allocation.createFromBitmap(rs, bmpGray));
Type.Builder TypeOut = new Type.Builder(rs, Element.F32(rs));
TypeOut.setX(width).setY(height);
Allocation outAllocation = Allocation.createTyped(rs, TypeOut.create());
// the following 3 lines shall reflect the allocation to the Direction output
Type.Builder TypeDir = new Type.Builder(rs, Element.F32(rs));
TypeDir.setX(width).setY(height);
Allocation dirAllocation = Allocation.createTyped(rs, TypeDir.create());
script.forEach_sobel_XY(outAllocation);
outAllocation.copyTo(gm) ;
dirAllocation.copyTo(gd);
Unfortunately this does not work. I am not sure, whether the problem is with the structural logic of the rs-kernel or is it because I cannot use a second Type.Builder assignment within the Java code (because the kernel is already tied to the Magnitude Output-allocation). Any help is highly appreciated.
PS: I see that there is no link between the second Type.Builder assignment and the "direction" allocaton in rs - but how can this be achieved?
The outAllocation is passed as a parameter to the kernel. But the existence and location of dirAllocation also has to be communicated to the Renderscript side. Do this just before starting the script:
script.set_direction(dirAllocation);
Also, read about memory allocation in Renderscript.