How to write a convolution multiplication in Android Renderscript? - android

I am new to Android Renderscript.
I need to write a convolution multiplication in RenderScript since the final application is going to run on Android. Data stream is going to be an image.
More specifically, I am not able to write the core logic using the forEach functionality. I can do it in Java, but it is too slow!
Please help!
Steve

During the rsForEach call (or other Renderscript function), you can access the neighbouring pixels of the original image (or whatever type of data you are using) by binding the original image allocation to a pointer within the Renderscript where it can then be accessed as an array. Here is an example based upon the HelloCompute example:
#pragma version(1)
#pragma rs java_package_name(com.android.example.hellocompute)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
static int mImageWidth;
const uchar4 *gPixels;
const float4 kWhite = {
1.0f, 1.0f, 1.0f, 1.0f
};
const float4 kBlack = {
0.0f, 0.0f, 0.0f, 1.0f
};
void init() {
}
static const int kBlurWidth = 20;
static const float kMultiplier = 1.0f / (float)(kBlurWidth * 2 + 1);
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y) {
float4 original = rsUnpackColor8888(*v_in);
float4 colour = original * kMultiplier;
int y_component = mImageWidth * y;
for ( int i = -kBlurWidth; i < 0; i++) {
float4 temp_colour;
if ( (int)x + i >= 0) {
temp_colour = rsUnpackColor8888(gPixels[x+i + y_component]);
}
else {
temp_colour = kWhite;
}
colour += temp_colour * kMultiplier;
}
for ( int i = 1; i <= kBlurWidth; i++) {
float4 temp_colour;
if ( x + i < mImageWidth) {
temp_colour = rsUnpackColor8888(gPixels[x+i + y_component]);
}
else {
temp_colour = kWhite;
}
colour += temp_colour * kMultiplier;
}
colour.a = 1.0f;
*v_out = rsPackColorTo8888(colour);
}
void filter() {
mImageWidth = rsAllocationGetDimX(gIn);
rsDebug("Image size is ", rsAllocationGetDimX(gIn), rsAllocationGetDimY(gOut));
rsForEach(gScript, gIn, gOut, NULL);
}
This is called from the following Java code. Note the call to mScript.bind_gPixels(mInAllocation), which binds the original image data to the gPixels pointer in the Renderscript and therefore makes the image data available as an array.
mRS = RenderScript.create(this);
mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
Allocation.MipmapControl.MIPMAP_NONE,
Allocation.USAGE_SCRIPT);
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
mScript = new ScriptC_blur(mRS, getResources(), R.raw.blur);
mScript.bind_gPixels(mInAllocation);
mScript.set_gIn(mInAllocation);
mScript.set_gOut(mOutAllocation);
mScript.set_gScript(mScript);
mScript.invoke_filter();
mOutAllocation.copyTo(mBitmapOut);
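To check the kernel's output, it can help to have a slow but simple CPU reference. The following is only a sketch of the same horizontal box blur in plain Java. The class and method names are hypothetical, and it assumes the image is available as packed ARGB_8888 ints (for example from Bitmap.getPixels()).

```java
/** Plain-Java reference for the horizontal box blur; names are illustrative only. */
final class BoxBlurReference {
    /**
     * Blurs each row with a (2 * radius + 1)-wide box filter.
     * Neighbours outside the row are treated as white, as in the kernel above.
     */
    static int[] blurHorizontal(int[] pixels, int width, int height, int radius) {
        int[] out = new int[pixels.length];
        float weight = 1.0f / (2 * radius + 1);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float r = 0f, g = 0f, b = 0f;
                for (int i = -radius; i <= radius; i++) {
                    int nx = x + i;
                    // Out-of-range neighbours contribute white (255, 255, 255).
                    int p = (nx >= 0 && nx < width) ? pixels[y * width + nx] : 0xFFFFFFFF;
                    r += ((p >> 16) & 0xFF) * weight;
                    g += ((p >> 8) & 0xFF) * weight;
                    b += (p & 0xFF) * weight;
                }
                // Alpha is forced to opaque, matching colour.a = 1.0f in the kernel.
                out[y * width + x] = 0xFF000000
                        | (toByte(r) << 16) | (toByte(g) << 8) | toByte(b);
            }
        }
        return out;
    }

    /** Rounds to the nearest integer and clamps to [0, 255]. */
    static int toByte(float v) {
        return Math.max(0, Math.min(255, Math.round(v)));
    }
}
```

Comparing a few pixels from this reference against mBitmapOut is a cheap way to confirm that the binding and indexing in the script are correct.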

Related

Unable to use remap in c++ native in Android JNI

I am developing an Android application using Java. My application does some image processing, so I am using C++ and OpenCV for it and calling the C++ function through JNI. I am trying to convert an equirectangular/spherical image to a cubemap image.
I found this link for conversion, https://code.i-harness.com/en/q/1c4dbae/. I am passing Mat from Java and trying to return the converted image back to Java.
This is my C++ code
#include <jni.h>
#include <string>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
using namespace std;
using namespace cv;
extern "C"
JNIEXPORT jstring
JNICALL
Java_media_memento_memento_VRPhotoSphereActivity_convertEquiRectToCubeMap(
JNIEnv *env,
jobject /* this */, jlong addrMat, jlong addrNewMat) {
Mat& mat = *(Mat*)addrMat;
Mat& newMat = *(Mat*)addrNewMat;
newMat.create(mat.rows, mat.cols, mat.type());
memcpy(newMat.data, mat.data , sizeof(mat.data) -1);
//EquiRec to Cubemap conversion starts from here
float faceTransform[6][2] =
{
{0, 0},
{M_PI / 2, 0},
{M_PI, 0},
{-M_PI / 2, 0},
{0, -M_PI / 2},
{0, M_PI / 2}
};
//conversion ends here
const Mat &in= mat;
Mat face = newMat;
int faceId = 0;
const int width = -1;
const int height = -1;
float inWidth = in.cols;
float inHeight = in.rows;
// Allocate map
Mat mapx(height, width, CV_32F);
Mat mapy(height, width, CV_32F);
// Calculate adjacent (ak) and opposite (an) of the
// triangle that is spanned from the sphere center
//to our cube face.
const float an = sin(M_PI / 4);
const float ak = cos(M_PI / 4);
const float ftu = faceTransform[faceId][0];
const float ftv = faceTransform[faceId][1];
// For each point in the target image,
// calculate the corresponding source coordinates.
for(int y = 0; y < height; y++) {
for(int x = 0; x < width; x++) {
// Map face pixel coordinates to [-1, 1] on plane
float nx = (float)y / (float)height - 0.5f;
float ny = (float)x / (float)width - 0.5f;
nx *= 2;
ny *= 2;
// Map [-1, 1] plane coords to [-an, an]
// thats the coordinates in respect to a unit sphere
// that contains our box.
nx *= an;
ny *= an;
float u, v;
// Project from plane to sphere surface.
if(ftv == 0) {
// Center faces
u = atan2(nx, ak);
v = atan2(ny * cos(u), ak);
u += ftu;
} else if(ftv > 0) {
// Bottom face
float d = sqrt(nx * nx + ny * ny);
v = M_PI / 2 - atan2(d, ak);
u = atan2(ny, nx);
} else {
// Top face
float d = sqrt(nx * nx + ny * ny);
v = -M_PI / 2 + atan2(d, ak);
u = atan2(-ny, nx);
}
// Map from angular coordinates to [-1, 1], respectively.
u = u / (M_PI);
v = v / (M_PI / 2);
// Warp around, if our coordinates are out of bounds.
while (v < -1) {
v += 2;
u += 1;
}
while (v > 1) {
v -= 2;
u += 1;
}
while(u < -1) {
u += 2;
}
while(u > 1) {
u -= 2;
}
// Map from [-1, 1] to in texture space
u = u / 2.0f + 0.5f;
v = v / 2.0f + 0.5f;
u = u * (inWidth - 1);
v = v * (inHeight - 1);
// Save the result for this pixel in map
mapx.at<float>(x, y) = u;
mapy.at<float>(x, y) = v;
}
}
// Recreate output image if it has wrong size or type.
if(face.cols != width || face.rows != height ||
face.type() != in.type()) {
face = Mat(width, height, in.type());
}
// Do actual resampling using OpenCV's remap
Mat i = in;
Mat f = face;
remap(i, f, mapx, mapy,
CV_INTER_LINEAR, BORDER_CONSTANT, Scalar(0, 0, 0));
//send the image back here. For now the feature is not implemented yet.
std::string hello = "Spherical equirectangular photo converted to cubemap face photo";
return env->NewStringUTF(hello.c_str());
}
When I try to run my application, it gives me this compilation error:
Error:(146) undefined reference to `cv::remap(cv::_InputArray const&, cv::_OutputArray const&, cv::_InputArray const&, cv::_InputArray const&, int, int, cv::Scalar_<double> const&)'
How can I fix that error?
Edit
Actually, the errors start from these lines:
Mat mapx(height, width, CV_32F);
Mat mapy(height, width, CV_32F);

RenderScript wrongly manipulating output of kernel

I'm trying to use Android's RenderScript to render a semi-transparent circle behind an image, but things go very wrong when returning a value from the RenderScript kernel.
This is my kernel:
#pragma version(1)
#pragma rs java_package_name(be.abyx.aurora)
// We don't need very high precision floating points
#pragma rs_fp_relaxed
// Center position of the circle
int centerX = 0;
int centerY = 0;
// Radius of the circle
int radius = 0;
// Destination colour of the background can be set here.
float destinationR;
float destinationG;
float destinationB;
float destinationA;
static int square(int input) {
return input * input;
}
uchar4 RS_KERNEL circleRender(uchar4 in, uint32_t x, uint32_t y) {
//Convert input uchar4 to float4
float4 f4 = rsUnpackColor8888(in);
// Check if the current coordinates fall inside the circle
if (square(x - centerX) + square(y - centerY) < square(radius)) {
// Check if current position is transparent, we then need to add the background!)
if (f4.a == 0) {
uchar4 temp = rsPackColorTo8888(0.686f, 0.686f, 0.686f, 0.561f);
return temp;
}
}
return rsPackColorTo8888(f4);
}
Now, the rsPackColorTo8888() function takes 4 floats with a value between 0.0 and 1.0. The resulting ARGB-color is then found by calculating 255 times each float value. So the given floats correspond to the color R = 0.686 * 255 = 175, G = 0.686 * 255 = 175, B = 0.686 * 255 = 175 and A = 0.561 * 255 = 143.
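The arithmetic above can be reproduced on the Java side. This is a minimal sketch with hypothetical names. It assumes the packing clamps each channel to [0.0, 1.0], scales by 255 and rounds to the nearest integer; the rounding mode is an assumption, chosen because it reproduces the 175 quoted above.

```java
/** Illustrative float-to-byte colour packing; not the RenderScript implementation itself. */
final class PackColor {
    /** Converts a channel in [0.0, 1.0] to an 8-bit value, rounding to nearest (assumed). */
    static int toByte(float channel) {
        float clamped = Math.max(0.0f, Math.min(1.0f, channel));
        return (int) (clamped * 255.0f + 0.5f);
    }

    /** Packs four float channels into {r, g, b, a}, each held as an int in 0..255. */
    static int[] pack(float r, float g, float b, float a) {
        return new int[] { toByte(r), toByte(g), toByte(b), toByte(a) };
    }
}
```

Calling PackColor.pack(0.686f, 0.686f, 0.686f, 0.561f) yields 175, 175, 175 and 143, which are the values expected from the kernel.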
The rsPackColorTo8888() function itself works correctly, but when the resulting uchar4 value is returned from the kernel, something really weird happens. The R, G and B values change to Red * Alpha = 56, Green * Alpha = 56 and Blue * Alpha = 56 respectively, where Alpha is 0.561. This means that no value of R, G and B can ever be larger than A = 0.561 * 255.
Setting the output manually, instead of using rsPackColorTo8888(), yields exactly the same behavior. The following code produces the exact same result, which in turn proves that rsPackColorTo8888() is not the problem:
if (square(x - centerX) + square(y - centerY) < square(radius)) {
// Check if current position is transparent, we then need to add the background!)
if (f4.a == 0) {
uchar4 temp;
temp[0] = 175;
temp[1] = 175;
temp[2] = 175;
temp[3] = 143;
return temp;
}
}
This is the Java-code from which the script is called:
@Override
public Bitmap renderParallel(Bitmap input, int backgroundColour, int padding) {
ResizeUtility resizeUtility = new ResizeUtility();
// We want to end up with a square Bitmap with some padding applied to it, so we use
// the length of the largest dimension (width or height) as the width of our square.
int dimension = resizeUtility.getLargestDimension(input.getWidth(), input.getHeight()) + 2 * padding;
Bitmap output = resizeUtility.createSquareBitmapWithPadding(input, padding);
output.setHasAlpha(true);
RenderScript rs = RenderScript.create(this.context);
Allocation inputAlloc = Allocation.createFromBitmap(rs, output);
Type t = inputAlloc.getType();
Allocation outputAlloc = Allocation.createTyped(rs, t);
ScriptC_circle_render circleRenderer = new ScriptC_circle_render(rs);
circleRenderer.set_centerX(dimension / 2);
circleRenderer.set_centerY(dimension / 2);
circleRenderer.set_radius(dimension / 2);
circleRenderer.set_destinationA(((float) Color.alpha(backgroundColour)) / 255.0f);
circleRenderer.set_destinationR(((float) Color.red(backgroundColour)) / 255.0f);
circleRenderer.set_destinationG(((float) Color.green(backgroundColour)) / 255.0f);
circleRenderer.set_destinationB(((float) Color.blue(backgroundColour)) / 255.0f);
circleRenderer.forEach_circleRender(inputAlloc, outputAlloc);
outputAlloc.copyTo(output);
inputAlloc.destroy();
outputAlloc.destroy();
circleRenderer.destroy();
rs.destroy();
return output;
}
When alpha is set to 255 (or 1.0 as a float), the returned color-values (inside my application's Java-code) are correct.
Am I doing something wrong, or is this really a bug somewhere in the RenderScript-implementation?
Note: I've checked and verified this behavior on a Oneplus 3T (Android 7.1.1), a Nexus 5 (Android 7.1.2), Android-emulator version 7.1.2 and 6.0
Instead of passing the values with the type:
uchar4 temp = rsPackColorTo8888(0.686f, 0.686f, 0.686f, 0.561f);
Try creating a float4 and passing that:
float4 newFloat4 = { 0.686, 0.686, 0.686, 0.561 };
uchar4 temp = rsPackColorTo8888(newFloat4);

ScriptIntrinsicYuvToRGB or YUV420 Allocation is broken?

I am trying to use the ScriptIntrinsicYuvToRGB class from RenderScript to convert YUV to RGB, where the source is in YUV420 format.
I have three raw planes, which I read from files, feed into a YUV Allocation and pass through ScriptIntrinsicYuvToRGB.forEach.
It converts luma (the Y plane) correctly, but fails on colors because the chroma channels seem to read all their values from the buf[w*h] location (see the commented-out line in the code sample). It looks like a bug where the Allocation doesn't properly address the UV planes. I assume so because I tested in a script using the rsGetElementAtYuv_uchar_U function on the allocation, and it gives the same value (from buf[w*h]) for any coordinates.
I searched everywhere for a way to further specify the YUV format, such as strides/offsets, but didn't find anything more than setting Element.DataKind.PIXEL_YUV and Type.Builder.setYuvFormat(ImageFormat.YUV_420_888).
Can someone help with this?
{
int w = 320, h = 172;
ScriptIntrinsicYuvToRGB yc = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs));
{
Element elemYUV = Element.createPixel(rs, Element.DataType.UNSIGNED_8, Element.DataKind.PIXEL_YUV);
Type typeYUV = new Type.Builder(rs, elemYUV).setX(w).setY(h).setYuvFormat(ImageFormat.YUV_420_888).create();
Allocation yuv = Allocation.createTyped(rs, typeYUV);
byte[] buf = new byte[yuv.getBytesSize()];
int offs = 0;
for(int i=0; i<3; i++){
int sz = w*h;
if(i>0)
sz /= 4;
InputStream is = new FileInputStream("/sdcard/yuv/"+(i==0 ? 'y' : i==1 ? 'u' : 'v'));
int n = is.read(buf, offs, sz);
if(n!=sz)
throw new AssertionError("!");
offs += sz;
is.close();
}
// buf[w*h] = 0x40;
yuv.copyFrom(buf);
yc.setInput(yuv);
}
Type outType = new Type.Builder(rs, Element.U8_4(rs)).setX(w).setY(h).create();
Allocation out = Allocation.createTyped(rs, outType);
yc.forEach(out);
int[] buf = new int[out.getBytesSize()/4];
out.copy1DRangeToUnchecked(0, w*h, buf);
bm = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
bm.setPixels(buf, 0, bm.getWidth(), 0, 0, bm.getWidth(), bm.getHeight());
iv.setImageBitmap(bm);
yc.destroy();
}
I believe that you need setYuvFormat() for your Type. Here are the two lines that I use to build my Allocation:
Type typeYUV = new Type.Builder(rs, Element.YUV(rs)).setYuvFormat(ImageFormat.NV21).create();
Allocation yuv = Allocation.createSized(rs, typeYUV.getElement(), width*height*3/2);
One solution is to just fill in a U8 allocation and do the indexing yourself in a custom script:
#pragma rs_fp_relaxed
rs_allocation yuv_in;
uint32_t width;
uint32_t offset_to_u;
uint32_t offset_to_v;
uchar4 RS_KERNEL yuv_to_rgba(uint32_t x, uint32_t y) {
uint32_t index = y * width + x;
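// Each chroma plane is (width / 2) x (height / 2), so its row stride is width / 2.
uint32_t uv_index = (y >> 1) * (width >> 1) + (x >> 1);
float Y = (float)rsGetElementAt_uchar(yuv_in, index);
// Chroma samples are stored with a bias of 128; remove it before converting.
float U = (float)rsGetElementAt_uchar(yuv_in, uv_index + offset_to_u) - 128.0f;
float V = (float)rsGetElementAt_uchar(yuv_in, uv_index + offset_to_v) - 128.0f;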
float3 f_out;
f_out.r = Y + 1.403f * V;
f_out.g = Y - 0.344f * U - 0.714f * V;
f_out.b = Y + 1.770f * U;
f_out = clamp(f_out, 0.f, 255.f);
uchar4 out;
out.rgb = convert_uchar3(f_out);
out.a = 255;
return out;
}
Java:
sc.set_yuv_in(yuv_allocation);
sc.set_width(width);
sc.set_offset_to_u(width * height);
sc.set_offset_to_v(width * height + (width / 2) * (height / 2));
sc.forEach_yuv_to_rgba(out);
YUV_420_888 is more of a generic YUV type for importing from other YUV resources in android. I don't think there is a way to set the stride/offset values for the u/v planes to make it useful for a custom conversion.
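The manual indexing can be sanity-checked on the CPU before moving it into a script. The sketch below uses hypothetical names and assumes a planar layout (full-resolution Y plane, followed by quarter-size U and V planes), chroma samples centred on 128, and the same conversion coefficients as the script above.

```java
/** Illustrative CPU reference for planar YUV 4:2:0 to ARGB conversion. */
final class I420Reference {
    /** Byte offset of the U plane: it follows the full-resolution Y plane. */
    static int offsetToU(int width, int height) {
        return width * height;
    }

    /** Byte offset of the V plane: it follows the quarter-size U plane. */
    static int offsetToV(int width, int height) {
        return width * height + (width / 2) * (height / 2);
    }

    /** Converts the pixel at (x, y) to a packed, opaque ARGB value. */
    static int toArgb(byte[] yuv, int width, int height, int x, int y) {
        // Chroma planes are subsampled by two in both directions.
        int chromaIndex = (y / 2) * (width / 2) + (x / 2);
        float luma = yuv[y * width + x] & 0xFF;
        float u = (yuv[offsetToU(width, height) + chromaIndex] & 0xFF) - 128f;
        float v = (yuv[offsetToV(width, height) + chromaIndex] & 0xFF) - 128f;
        int r = clamp(luma + 1.403f * v);
        int g = clamp(luma - 0.344f * u - 0.714f * v);
        int b = clamp(luma + 1.770f * u);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    /** Rounds to the nearest integer and clamps to [0, 255]. */
    static int clamp(float value) {
        return Math.max(0, Math.min(255, Math.round(value)));
    }
}
```

For the 320 x 172 frame from the question, the offsets come out as 55040 for U and 68800 for V, and a pixel with neutral chroma (U = V = 128) maps to a grey whose channels all equal Y.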

Android Renderscript set neighbor pixel transparent

I have a script that clears pixels of a certain color.
uchar red = 100;
uchar green = 100;
uchar blue = 100;
float treshold = 100;
uchar4 __attribute__((kernel)) saturation(uchar4 in,uint32_t x, uint32_t y)
{
float ddd = ((in.r - red)*(in.r - red) + (in.g - green)*(in.g - green) + (in.b - blue)*(in.b - blue));
float dif = sqrt( ddd );
if (dif <= treshold){
in.a = 0;
in.r = 0;
in.g = 0;
in.b = 0;
}
return in;
}
I run it from Java like this:
mScript.set_red((short)r);
mScript.set_blue((short)b);
mScript.set_green((short)g);
mScript.set_treshold(treshold);
mScript.forEach_saturation(mInAllocation, mOutAllocations);
It works, but I also need to clear the pixels neighboring a pixel of that color in RenderScript. The saturation kernel processes each pixel on its own, and I don't know how to get access to the other pixels.
Use a global rs_allocation variable and then use the rsGetElementAt_uchar4 function to sample the image at other locations:
#pragma rs_fp_relaxed
rs_allocation image;
int width_minus_one;
void RS_KERNEL root(uchar4 in, uint32_t x, uint32_t y) {
int newX = min(x + 1, width_minus_one);
uchar4 pixel = rsGetElementAt_uchar4(image, newX, y);
}
Java:
mScript.set_image(mInAllocation);
mScript.set_width_minus_one(mInAllocation.getType().getX() - 1);
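The same clamped neighbour lookup can be written in plain Java to make the indexing explicit. This is only a sketch with hypothetical names; it reads from the untouched input array and writes to a separate output, mirroring how the kernel samples the global allocation rather than its own output.

```java
/** Illustrative clamped neighbour access on a row-major pixel array. */
final class NeighbourLookup {
    /** Returns the pixel to the right of (x, y), clamping at the right edge. */
    static int rightNeighbour(int[] pixels, int width, int x, int y) {
        int newX = Math.min(x + 1, width - 1);
        return pixels[y * width + newX];
    }

    /** Clears (sets to 0) every pixel whose right-hand neighbour equals target. */
    static int[] clearLeftOf(int[] pixels, int width, int height, int target) {
        int[] out = pixels.clone();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Always sample the original image, never the partially written output.
                if (rightNeighbour(pixels, width, x, y) == target) {
                    out[y * width + x] = 0;
                }
            }
        }
        return out;
    }
}
```

Because of the clamping, a pixel in the last column is compared with itself, so a matching pixel on the right edge is cleared as well.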

Why is ScriptIntrinsicBlur faster than my method?

I use RenderScript to apply a Gaussian blur to an image,
but no matter what I do, ScriptIntrinsicBlur is much faster.
Why does this happen? Does ScriptIntrinsicBlur use a different method?
This is my RS code:
#pragma version(1)
#pragma rs java_package_name(top.deepcolor.rsimage.utils)
//gaussian blur algorithm.
//the max radius of gaussian blur
static const int MAX_BLUR_RADIUS = 1024;
//the ratio of pixels when blur
float blurRatio[(MAX_BLUR_RADIUS << 2) + 1];
//the acquiescent blur radius
int blurRadius = 0;
//the width and height of bitmap
uint32_t width;
uint32_t height;
//bind to the input bitmap
rs_allocation input;
//the temp allocation
rs_allocation temp;
//set the radius
void setBlurRadius(int radius)
{
if(1 > radius)
radius = 1;
else if(MAX_BLUR_RADIUS < radius)
radius = MAX_BLUR_RADIUS;
blurRadius = radius;
/**
calculate the blurRadius by Gaussian function
when the pixel is far way from the center, the pixel will not contribute to the center
so take the sigma is blurRadius / 2.57
*/
float sigma = 1.0f * blurRadius / 2.57f;
float deno = 1.0f / (sigma * sqrt(2.0f * M_PI));
float nume = -1.0 / (2.0f * sigma * sigma);
//calculate the gaussian function
float sum = 0.0f;
for(int i = 0, r = -blurRadius; r <= blurRadius; ++i, ++r)
{
blurRatio[i] = deno * exp(nume * r * r);
sum += blurRatio[i];
}
//normalization to 1
int len = radius + radius + 1;
for(int i = 0; i < len; ++i)
{
blurRatio[i] /= sum;
}
}
/**
the gaussian blur is decomposed into two steps:
1.blur in the horizontal
2.blur in the vertical
*/
uchar4 RS_KERNEL horizontal(uint32_t x, uint32_t y)
{
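// Accumulators must start at zero; locals are not initialised automatically.
float a = 0.0f, r = 0.0f, g = 0.0f, b = 0.0f;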
for(int k = -blurRadius; k <= blurRadius; ++k)
{
int horizontalIndex = x + k;
if(0 > horizontalIndex) horizontalIndex = 0;
if(width <= horizontalIndex) horizontalIndex = width - 1;
uchar4 inputPixel = rsGetElementAt_uchar4(input, horizontalIndex, y);
int blurRatioIndex = k + blurRadius;
a += inputPixel.a * blurRatio[blurRatioIndex];
r += inputPixel.r * blurRatio[blurRatioIndex];
g += inputPixel.g * blurRatio[blurRatioIndex];
b += inputPixel.b * blurRatio[blurRatioIndex];
}
uchar4 out;
out.a = (uchar) a;
out.r = (uchar) r;
out.g = (uchar) g;
out.b = (uchar) b;
return out;
}
uchar4 RS_KERNEL vertical(uint32_t x, uint32_t y)
{
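// Accumulators must start at zero; locals are not initialised automatically.
float a = 0.0f, r = 0.0f, g = 0.0f, b = 0.0f;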
for(int k = -blurRadius; k <= blurRadius; ++k)
{
int verticalIndex = y + k;
if(0 > verticalIndex) verticalIndex = 0;
if(height <= verticalIndex) verticalIndex = height - 1;
uchar4 inputPixel = rsGetElementAt_uchar4(temp, x, verticalIndex);
int blurRatioIndex = k + blurRadius;
a += inputPixel.a * blurRatio[blurRatioIndex];
r += inputPixel.r * blurRatio[blurRatioIndex];
g += inputPixel.g * blurRatio[blurRatioIndex];
b += inputPixel.b * blurRatio[blurRatioIndex];
}
uchar4 out;
out.a = (uchar) a;
out.r = (uchar) r;
out.g = (uchar) g;
out.b = (uchar) b;
return out;
}
Renderscript intrinsics are implemented very differently from what you can achieve with a script of your own. There are several reasons for this, but the main one is that they are built by the RS driver developers for individual devices in a way that makes the best possible use of that particular hardware/SoC configuration, and they most likely make low-level calls to the hardware that are simply not available at the RS programming layer.
Android does provide a generic implementation of these intrinsics though, to sort of "fall back" in case no lower hardware implementation is available. Seeing how these generic ones are done will give you some better idea of how these intrinsics work. For example, you can see the source code of the generic implementation of the 3x3 convolution intrinsic here rsCpuIntrinsicConvolve3x3.cpp.
Take a very close look at the code starting from line 98 of that source file, and notice how they use no for loops whatsoever to do the convolution. This is known as unrolled loops, where you add and multiply explicitly the 9 corresponding memory locations in the code, thereby avoiding the need of a for loop structure. This is the first rule you must take into account when optimizing parallel code. You need to get rid of all branching in your kernel. Looking at your code, you have a lot of if's and for's that cause branching -- this means the control flow of the program is not straight through from beginning to end.
If you unroll your for loops, you will immediately see a boost in performance. Note that by removing your for structures you will no longer be able to generalize your kernel for all possible radius amounts. In that case, you would have to create fixed kernels for different radii, and this is exactly why you see separate 3x3 and 5x5 convolution intrinsics, because this is just what they do. (See line 99 of the 5x5 intrinsic at rsCpuIntrinsicConvolve5x5.cpp).
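To make the idea of unrolling concrete, here is a rough Java sketch of a fixed 3x3 convolution at one interior pixel, with all nine multiply-adds written out and no loop over the window. It is not the intrinsic's actual source. The names are hypothetical, and it assumes a single-channel, row-major float image and a caller that only passes interior coordinates.

```java
/** Illustrative unrolled 3x3 convolution on a single-channel, row-major image. */
final class Convolve3x3 {
    /**
     * Applies a 3x3 kernel k (row-major, 9 coefficients) at interior pixel (x, y).
     * The caller must ensure 1 <= x <= width - 2 and that rows y - 1 and y + 1 exist.
     */
    static float at(float[] img, int width, int x, int y, float[] k) {
        int r0 = (y - 1) * width + x;  // row above
        int r1 = y * width + x;        // current row
        int r2 = (y + 1) * width + x;  // row below
        // Nine explicit multiply-adds: no loop counters and no bounds branches.
        return img[r0 - 1] * k[0] + img[r0] * k[1] + img[r0 + 1] * k[2]
             + img[r1 - 1] * k[3] + img[r1] * k[4] + img[r1 + 1] * k[5]
             + img[r2 - 1] * k[6] + img[r2] * k[7] + img[r2 + 1] * k[8];
    }
}
```

The price of this form is that the window size is baked into the code, which is the trade-off described above for fixed 3x3 and 5x5 kernels.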
Furthermore, the fact that you have two separate kernels doesn't help. If you're doing a gaussian blur, the convolutional kernel is indeed separable and you can do 1xN + Nx1 convolutions as you've done there, but I would recommend putting both passes together in the same kernel.
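The separability mentioned here can be checked numerically. The sketch below uses hypothetical names and borrows the sigma = radius / 2.57 choice from the question's script. It builds the normalised 1-D weights and the 2-D kernel that a horizontal pass followed by a vertical pass is equivalent to. The constant factor in front of the Gaussian is omitted because normalisation cancels it.

```java
/** Illustrative Gaussian weights and the 2-D kernel implied by two 1-D passes. */
final class GaussianWeights {
    /** Normalised 1-D weights for offsets -radius..radius; radius must be at least 1. */
    static float[] weights(int radius) {
        float sigma = radius / 2.57f;
        float[] w = new float[2 * radius + 1];
        float sum = 0f;
        for (int r = -radius; r <= radius; r++) {
            w[r + radius] = (float) Math.exp(-(r * r) / (2.0 * sigma * sigma));
            sum += w[r + radius];
        }
        // Normalise so the weights sum to 1 and the blur preserves brightness.
        for (int i = 0; i < w.length; i++) {
            w[i] /= sum;
        }
        return w;
    }

    /** Outer product w * w^T: the 2-D kernel equivalent to a 1xN pass then an Nx1 pass. */
    static float[][] outerProduct(float[] w) {
        float[][] k = new float[w.length][w.length];
        for (int i = 0; i < w.length; i++) {
            for (int j = 0; j < w.length; j++) {
                k[i][j] = w[i] * w[j];
            }
        }
        return k;
    }
}
```

For a radius of N, the two 1-D passes read 2 * (2N + 1) samples per pixel, whereas applying the full 2-D kernel directly reads (2N + 1)^2.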
Keep in mind though, that even doing these tricks will probably still not give you as fast results as the actual intrinsics, because those have probably been highly optimized for your specific device(s).
