I am trying to create an image processing library built on RenderScript. I have been playing with the Android samples for RenderScript.
RenderScript appears to have everything I need to create this library. Unfortunately, I cannot get many of the examples to work.
The ImageProcessing example is typical of how things go for me. Most of the ScriptIntrinsics work out of the box with no errors. However, as soon as I move to a ScriptC file, even basic things tend to fail. By fail, I mean:
Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 21581 (enderscripttest)
So, to help debug, I have created a github repo with literally the most basic example I could come up with. It basically just attempts to apply a brightness filter to an imageview.
Relevant code:
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
imageView = (ImageView)findViewById(R.id.image);
BitmapFactory.Options opts = new BitmapFactory.Options();
opts.inSampleSize = 8;
originalBitmap = BitmapFactory.decodeResource(getResources(),R.drawable.colors,opts);
filteredBitmap = Bitmap.createBitmap(originalBitmap.getWidth(),originalBitmap.getHeight(), originalBitmap.getConfig());
//RENDERSCRIPT ALLOCATION
mRS = RenderScript.create(this);
mInAllocation = Allocation.createFromBitmap(mRS, originalBitmap,Allocation.MipmapControl.MIPMAP_NONE,Allocation.USAGE_SCRIPT);
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
mOutAllocation.copyFrom(originalBitmap);
mOutAllocation.copyTo(filteredBitmap);
ScriptC_brightnessfilter helloworldScript = new ScriptC_brightnessfilter(mRS);
helloworldScript.set_brightnessValue(4.0f);
helloworldScript.bind_gPixels(mInAllocation);
helloworldScript.set_gIn(mInAllocation);
helloworldScript.set_gOut(mOutAllocation);
helloworldScript.set_gScript(helloworldScript);
helloworldScript.invoke_filter();
mOutAllocation.copyTo(filteredBitmap);
}
Then a renderscript file
#pragma version(1)
#pragma rs java_package_name(com.dss.renderscripttest)
float brightnessValue;
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
static int mImageWidth;
const uchar4 *gPixels;
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y) {
float4 apixel = rsUnpackColor8888(*v_in);
float3 pixel = apixel.rgb;
float factor = brightnessValue;
pixel = pixel + factor;
pixel = clamp(pixel,0.0f,1.0f);
*v_out = rsPackColorTo8888(pixel.rgb);
}
void filter() {
mImageWidth = rsAllocationGetDimX(gIn);
rsDebug("Image size is ", rsAllocationGetDimX(gIn), rsAllocationGetDimY(gOut));
rsForEach(gScript, gIn, gOut, 0, 0);
}
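To check the kernel's arithmetic off the device, the per-pixel math in root() can be mirrored in plain Java. This is only a sketch under assumptions: the class and method names are made up for illustration, pixels are packed as 0xAARRGGBB ints, and rounding is to nearest, which may differ slightly from what rsPackColorTo8888 does. It also makes one thing visible: because the channels are normalized to 0..1 before the offset is added, a brightnessValue of 4.0f saturates every pixel to white.

```java
// Plain-Java reference for the brightness kernel (illustrative names, not part of the repo).
final class BrightnessReference {

    /** Adds offset (in normalized 0..1 units) to R, G and B of one 0xAARRGGBB pixel. */
    static int brighten(int argb, float offset) {
        int r = shift((argb >> 16) & 0xFF, offset);
        int g = shift((argb >> 8) & 0xFF, offset);
        int b = shift(argb & 0xFF, offset);
        // The kernel packs a float3, so alpha comes out fully opaque.
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    /** Applies brighten() to every pixel of an image stored as a flat array. */
    static int[] brightenAll(int[] pixels, float offset) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = brighten(pixels[i], offset);
        }
        return out;
    }

    // Normalize one 8-bit channel, add the offset, clamp to 0..1, and convert back.
    private static int shift(int channel, float offset) {
        float v = channel / 255.0f + offset;
        v = Math.max(0.0f, Math.min(1.0f, v));
        return Math.round(v * 255.0f);
    }
}
```

Comparing the output of brightenAll() against the bitmap produced on a working device (the Nexus 4 here) would at least separate arithmetic problems from driver crashes.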
I have only tested this on the Galaxy S3 and S4. I will test on the Nexus 4 tonight and see if I get a different result.
Edit:
I have confirmed that this code works on the Nexus 4. I will run through some other devices for good measure. I will also see if I can get an APK together in case Stephen Hines or Tim Murray wants to look, but right now it seems only the Galaxy S3 and S4 (both on 4.3) are affected.
Edit 2: I believe this is the same issue that is happening over here. I will update as both issues progress
I have an overheating issue: my phone turns off after running for a couple of hours. I want to run this 24/7, so please help me improve it:
I use Camera2 interface, RAW format followed by a renderscript to convert YUV420888 to rgba. My renderscript is as below:
#pragma version(1)
#pragma rs java_package_name(com.sensennetworks.sengaze)
#pragma rs_fp_relaxed
rs_allocation gCurrentFrame;
rs_allocation gByteFrame;
int32_t gFrameWidth;
uchar4 __attribute__((kernel)) yuv2RGBAByteArray(uchar4 prevPixel,uint32_t x,uint32_t y)
{
// Read in pixel values from latest frame - YUV color space
// The functions rsGetElementAtYuv_uchar_? require API 18
uchar4 curPixel;
curPixel.r = rsGetElementAtYuv_uchar_Y(gCurrentFrame, x, y);
curPixel.g = rsGetElementAtYuv_uchar_U(gCurrentFrame, x, y);
curPixel.b = rsGetElementAtYuv_uchar_V(gCurrentFrame, x, y);
// uchar4 rsYuvToRGBA_uchar4(uchar y, uchar u, uchar v);
// This function uses the NTSC formulae to convert YUV to RGB
uchar4 out = rsYuvToRGBA_uchar4(curPixel.r, curPixel.g, curPixel.b);
rsSetElementAt_uchar(gByteFrame, out.r, 4 * (y*gFrameWidth + x) + 0 );
rsSetElementAt_uchar(gByteFrame, out.g, 4 * (y*gFrameWidth + x) + 1 );
rsSetElementAt_uchar(gByteFrame, out.b, 4 * (y*gFrameWidth + x) + 2 );
rsSetElementAt_uchar(gByteFrame, 255, 4 * (y*gFrameWidth + x) + 3 );
return out;
}
This is where I call the renderscript to convert to rgba:
@Override
public void onBufferAvailable(Allocation a) {
inputAllocation.ioReceive();
// Run processing pass if we should send a frame
final long current = System.currentTimeMillis();
if ((current - lastProcessed) >= frameEveryMs) {
yuv2rgbaScript.forEach_yuv2RGBAByteArray(scriptAllocation, outputAllocation);
if (rgbaByteArrayCallback != null) {
outputAllocationByte.copyTo(outBufferByte);
rgbaByteArrayCallback.onRGBAArrayByte(outBufferByte);
}
lastProcessed = current;
}
}
And this is the callback to run image processing using OpenCV:
@Override
public void onRGBAArrayByte(byte[] rgbaByteArray) {
try {
/* Fill images. */
rgbaMat.put(0, 0, rgbaByteArray);
analytic.processFrame(rgbaMat);
/* Send fps to UI for debug purpose. */
calcFPS(true);
} catch (Exception e) {
e.printStackTrace();
}
}
The whole thing runs at ~22 fps. I've checked carefully and there are no memory leaks. But after running for some time, even with the screen off, the phone gets very hot and turns itself off. Note that if I remove the image processing part, the issue persists. What could be wrong? I can open the phone's camera app and leave it running for hours without a problem.
Does renderscript cause the heat?
Does 22fps cause the heat? Maybe I should reduce it?
Does Android background service cause heat?
Thanks.
ps: I tested this on LG G4 with full Camera2 interface support.
In theory, your device should throttle itself if it starts to overheat, and never shut down. This would just reduce your frame rate as the device warms up. But some devices aren't as good at this as they should be, unfortunately.
Basically, anything that reduces your CPU / GPU usage will reduce power consumption and heat generation. Basic tips:
Do not copy buffers. Each copy is very expensive when you're doing it at ~30fps. Here, you're copying from Allocation to byte[], and then from that byte[] to the rgbaMat. That's 2x as expensive as just copying from the Allocation to the rgbaMat. Unfortunately, I'm not sure there's a direct way to copy from the Allocation to the rgbaMat, or to create an Allocation that's backed by the same memory as the rgbaMat.
Are you sure you can't do your OpenCV processing on YUV data instead? That would save you a lot of overhead here; the YUV->RGB conversion is not cheap when it is not done in hardware.
There's also an RS intrinsic, ScriptIntrinsicYuvToRGB, which may give you better performance than your hand-written kernel.
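Since less work per second means less heat, lowering the processed frame rate is the cheapest experiment. Below is a minimal sketch of a frame throttle, assuming a hypothetical FrameThrottle helper (not part of any Android API) that takes timestamps from a monotonic clock such as System.nanoTime(). It does the same job as the frameEveryMs check in the question, but the clock is passed in so the logic can be tested in isolation.

```java
// Hypothetical helper: decides whether a frame should be processed or dropped.
final class FrameThrottle {
    private final long minIntervalNanos;
    private long lastAcceptedNanos;
    private boolean hasAccepted;

    FrameThrottle(double maxFps) {
        if (maxFps <= 0) {
            throw new IllegalArgumentException("maxFps must be positive");
        }
        this.minIntervalNanos = (long) (1_000_000_000L / maxFps);
    }

    /** Returns true if the frame arriving at nowNanos should be processed. */
    boolean shouldProcess(long nowNanos) {
        if (hasAccepted && nowNanos - lastAcceptedNanos < minIntervalNanos) {
            return false; // too soon: skip the kernel launch and both buffer copies
        }
        lastAcceptedNanos = nowNanos;
        hasAccepted = true;
        return true;
    }
}
```

In onBufferAvailable the call would be something like throttle.shouldProcess(System.nanoTime()), placed after ioReceive() as in the question's code, so that a dropped frame costs neither the kernel launch nor the copies.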
Exactly as the title says.
I have a parallelized image creating/processing algorithm that I'd like to use. This is a kind of perlin noise implementation.
// Logging is never used here
#pragma version(1)
#pragma rs java_package_name(my.package.name)
#pragma rs_fp_full
float sizeX, sizeY;
float ratio;
static float fbm(float2 coord)
{ ... }
uchar4 RS_KERNEL root(uint32_t x, uint32_t y)
{
float u = x / sizeX * ratio;
float v = y / sizeY;
float2 p = {u, v};
float res = fbm(p) * 2.0f; // rs.: 8245 ms, fs: 8307 ms; fs 9842 ms on tablet
float4 color = {res, res, res, 1.0f};
//float4 color = {p.x, p.y, 0.0, 1.0}; // rs.: 96 ms
return rsPackColorTo8888(color);
}
As a comparison, this exact algorithm runs at 30 fps or more when I implement it on the GPU via a fragment shader on a textured quad.
The overhead of running the RenderScript should be at most 100 ms, which I measured by generating a simple bitmap that returns the normalized x and y coordinates.
This means that if it were running on the GPU, it would surely not take 10 seconds.
The code I am using the RenderScript with:
// The non-support version gives at least an extra 25% performance boost
import android.renderscript.Allocation;
import android.renderscript.RenderScript;
public class RSNoise {
private RenderScript renderScript;
private ScriptC_noise noiseScript;
private Allocation allOut;
private Bitmap outBitmap;
final int sizeX = 1536;
final int sizeY = 2048;
public RSNoise(Context context) {
renderScript = RenderScript.create(context);
outBitmap = Bitmap.createBitmap(sizeX, sizeY, Bitmap.Config.ARGB_8888);
allOut = Allocation.createFromBitmap(renderScript, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_GRAPHICS_TEXTURE);
noiseScript = new ScriptC_noise(renderScript);
}
// The render function is benchmarked only
public Bitmap render() {
noiseScript.set_sizeX((float) sizeX);
noiseScript.set_sizeY((float) sizeY);
noiseScript.set_ratio((float) sizeX / (float) sizeY);
noiseScript.forEach_root(allOut);
allOut.copyTo(outBitmap);
return outBitmap;
}
}
If I change it to FilterScript, following this answer (https://stackoverflow.com/a/14942723/4420543), it gets several hundred milliseconds slower with the support library and takes about twice as long with the non-support one. The precision setting did not influence the results.
I have also checked every related question on Stack Overflow, but most of them are outdated. I have tried a Nexus 5 (OS version 7.1.1) among several other new devices, and the problem remains.
So, when does RenderScript run on GPU? It would be enough if someone could give me an example on a GPU-running RenderScript.
Can you try to run it with rs_fp_relaxed instead of rs_fp_full?
#pragma rs_fp_relaxed
rs_fp_full forces your script to run on the CPU, since most GPUs don't support full-precision floating-point operations.
I can agree with your guess.
On a Nexus 7 (2013, Jelly Bean 4.3) I wrote a RenderScript and a FilterScript, respectively, to calculate the famous Mandelbrot set.
Compared to an OpenGL fragment shader doing the same thing (all with 32 bit floats), the scripts were about 3 times slower. I assume OpenGL uses GPUs where renderscript (and filterscript !) do not.
Then I compared camera preview conversion (NV21 format -> RGB) with a renderscript, a filterscript and the ScriptIntrinsicYuvToRGB, respectively.
Here the Intrinsic is about 4 times faster than the self written scripts.
Again I see no differences in performance between renderscript and filterscript. In this case I assume the self written scripts again use CPUs only where the Intrinsic makes use of GPUs (too ?).
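For anyone who wants to validate such a conversion script against known output, here is a plain-Java sketch of NV21 to RGB. It rests on several assumptions: the class name is made up, width and height are even, and the coefficients are the full-range BT.601 ones used by JFIF, which are not necessarily what the intrinsic or rsYuvToRGBA_uchar4 use, so small differences in the results are to be expected.

```java
// CPU reference for NV21 -> packed 0xAARRGGBB (illustrative; assumes even width and height).
final class Nv21Reference {

    static int[] toArgb(byte[] nv21, int width, int height) {
        int[] out = new int[width * height];
        int chromaStart = width * height; // interleaved V,U pairs follow the Y plane
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int luma = nv21[y * width + x] & 0xFF;
                // One V,U pair is shared by each 2x2 block of luma samples.
                int chroma = chromaStart + (y / 2) * width + (x / 2) * 2;
                int v = (nv21[chroma] & 0xFF) - 128;
                int u = (nv21[chroma + 1] & 0xFF) - 128;
                int r = clamp(Math.round(luma + 1.402f * v));
                int g = clamp(Math.round(luma - 0.344136f * u - 0.714136f * v));
                int b = clamp(Math.round(luma + 1.772f * u));
                out[y * width + x] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        return out;
    }

    // Keep a channel inside the 8-bit range.
    private static int clamp(int c) {
        return Math.max(0, Math.min(255, c));
    }
}
```

With neutral chroma (U = V = 128) every pixel comes out gray with R = G = B = Y, which gives a quick sanity check for any of the three implementations.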
Any advice on optimizing the following code? The code first grayscales, inverts, and then thresholds the image (code not included because it is trivial). It then sums the elements of each row and each column (all elements are either 1 or 0), and finds the index of the row and of the column with the highest sum.
The code is supposed to find the centroid of the image, and it works, but I want to make it faster.
I'm developing for API 23, so a reduction kernel cannot be used.
Java snippet:
private int[] sumValueY = new int[640];
private int[] sumValueX = new int[480];
rows_indices_alloc = Allocation.createSized( rs, Element.I32(rs), height, Allocation.USAGE_SCRIPT);
col_indices_alloc = Allocation.createSized( rs, Element.I32(rs), width, Allocation.USAGE_SCRIPT);
public RenderscriptProcessor(RenderScript rs, int width, int height)
{
mScript.set_gIn(mIntermAllocation);
mScript.forEach_detectX(rows_indices_alloc);
mScript.forEach_detectY(col_indices_alloc);
rows_indices_alloc.copyTo(sumValueX);
col_indices_alloc.copyTo(sumValueY);
}
Renderscript.rs snippet:
#pragma version(1)
#pragma rs java_package_name(org.gearvrf.renderscript)
#include "rs_debug.rsh"
#pragma rs_fp_relaxed
const int mImageWidth=640;
const int mImageHeight=480;
int32_t maxsX=-1;
int32_t maxIndexX;
int32_t maxsY=-1;
int32_t maxIndexY;
rs_allocation gIn;
void detectX(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageWidth); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, i, x));
sum+=(int)f4.r;
}
if((sum>maxsX)){
maxsX=sum;
maxIndexX = x;
}
}
void detectY(int32_t v_in, int32_t x, int32_t y) {
int32_t sum=0;
for ( int i = 0; i < (mImageHeight); i++) {
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
}
if((sum>maxsY)){
maxsY=sum;
maxIndexY = x;
}
}
Any help would be appreciated
float4 f4 = rsUnpackColor8888(rsGetElementAt_uchar4(gIn, x, i));
sum+=(int)f4.r;
This converts from int to float and then back to int again. I think you can simplify by just doing this:
sum += rsGetElementAt_uchar4(gIn, x, i).r;
I don't know exactly how your previous stages work because you haven't posted them, but you should try generating packed values to read here. So either put your grayscale channels in .rgba or use a single channel format and then use rsAllocationVLoad_uchar4 to fetch 4 values at once.
Also, try combining previous stages with this one, if you don't need the intermediate results of those calculations it may be cheaper to do the memory load once and then do those transformations in registers.
You might also play with how many values your threads operate on. You could try having each kernel processing width/2, width/4, width/8 elements and see how they perform. This will give GPUs more threads to play with especially on lower-resolution images but with the trade off of having more reduction steps.
You also have a multiple-writers race condition on the maxsX/maxsY and maxIndexX/maxIndexY variables. All those writes need to use atomics if you care about the exact right answer. I think maybe you posted the wrong code because you don't store to the *_indices_alloc but you copy from them at the end. So, actually you should store all the sums to those and then use either a single threaded function or a kernel with atomics to get the absolute max and max index.
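The two-stage structure suggested above can be sketched in plain Java. This is a CPU illustration under assumptions, not RenderScript code: the class and method names are hypothetical, and the image is taken to be a row-major array of 0/1 values. In stage one every row (or column) writes its sum to its own slot, so there are no shared writes; stage two is a single-threaded scan for the maximum.

```java
// Illustrative CPU version of "store all sums, then find the max in one pass".
final class RowColumnMax {

    /** Stage 1: one sum per row; each row owns its output slot. */
    static int[] rowSums(byte[] mask, int width, int height) {
        int[] sums = new int[height];
        for (int y = 0; y < height; y++) {
            int sum = 0;
            for (int x = 0; x < width; x++) {
                sum += mask[y * width + x];
            }
            sums[y] = sum;
        }
        return sums;
    }

    /** Stage 1: one sum per column; each column owns its output slot. */
    static int[] columnSums(byte[] mask, int width, int height) {
        int[] sums = new int[width];
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int y = 0; y < height; y++) {
                sum += mask[y * width + x];
            }
            sums[x] = sum;
        }
        return sums;
    }

    /** Stage 2: index of the largest value (first one wins on ties), or -1 if empty. */
    static int argMax(int[] values) {
        int best = -1;
        for (int i = 0; i < values.length; i++) {
            if (best < 0 || values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

In the RenderScript version, the stage-one loops would be the detectX/detectY kernels writing into rows_indices_alloc and col_indices_alloc, and argMax could run in Java on the arrays returned by copyTo.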
I'm working on a picture based app and I'm blocked on an issue with Renderscript.
My goal is pretty simple in theory: I want to remove the white background from the images loaded by the user, so I can show them on top of another image I've set as the background. More specifically, I want to simulate the effect of printing a user-uploaded graphic on a paper canvas (also a picture) realistically.
I cannot assume the user is able to upload nice PNGs with alpha channels, and one of the requirements is to work with JPGs.
I've been trying to solve this with RenderScript, with something like this, which sets alpha to 0 for any pixel whose R, G, and B are all greater than or equal to 240:
#pragma version(1)
#pragma rs java_package_name(mypackagename)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
const static float th = 239.f/256.f;
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y) {
float4 f4 = rsUnpackColor8888(*v_in);
if(f4.r > th && f4.g > th && f4.b > th)
{
f4.a = 0;
}
*v_out = rsPackColorTo8888(f4);
}
void filter() {
rsForEach(gScript, gIn, gOut);
}
but the results are not satisfactory for mainly two reasons:
if a photo has a whiteish gradient not on the background the script causes an ugly noise effect
images with shadows close to the edges get a noise effects close to the edges
I understand that jumping from alpha 0 to alpha 1 is too extreme, and I've tried different solutions that increase the alpha linearly as the sum of the R, G, B components decreases, but I still get noisy pixels and blocks.
With plain white, or regular background (e.g. a snapshot of the Google home page) it works perfectly but with photos it's very far from anything acceptable.
I think that if I were able to process one "line" or one "block" of pixels instead of a single one, it would be easier to detect flat backgrounds and avoid hitting gradients, but I don't know enough about RenderScript to do that.
Can anyone point me in the right direction?
PS
I can't use PorterDuff and multiply because the background and the foreground have different dimensions and moreover since I need to be able to drag the uploaded image around the background canvas once the effect is applied. If I multiply the image with a region of the background moving the result image around would cause a section of the background to move around as well.
If I understand correctly, you want to determine whether the current pixel is white background based on a line or block of neighboring pixels.
You can try using rsGetElementAt. For example, to process a line in your original code:
#pragma version(1)
#pragma rs java_package_name(mypackagename)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
const static float th = 239.f/256.f;
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y) {
float4 f4 = rsUnpackColor8888(*v_in);
uint32_t width = rsAllocationGetDimX(gIn);
// E.g: Processing a line from x to x+5.
bool isBackground = true;
for (uint32_t i=0; i<=5 && x+i<width; i++) {
uchar4 nPixel_u4 = rsGetElementAt_uchar4(gIn, x+i, y);
float4 nPixel_f4 = rsUnpackColor8888(nPixel_u4);
if(nPixel_f4.r <= th || nPixel_f4.g <= th || nPixel_f4.b <= th) {
isBackground = false;
break;
}
}
if (isBackground) {
f4.a = 0.0f;
}
// Always write the output; otherwise non-background pixels are never stored.
*v_out = rsPackColorTo8888(f4);
}
void filter() {
rsForEach(gScript, gIn, gOut);
}
This is just a naive example of how you can use rsGetElementAt to get the data from a given position in a global Allocation. There is a corresponding rsSetElementAt for saving the data to a global Allocation. I am hoping it helps your project.
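The same line-based test can be tried on the CPU first, which makes it easier to tune the run length and threshold before moving the logic into a script. The sketch below is plain Java with hypothetical names, working on 0xAARRGGBB pixels in a row-major array. A pixel is made transparent only if it and the next run pixels to its right are all near-white, and every output pixel is written.

```java
// Illustrative CPU version of the "look at a run of neighbours" background test.
final class BackgroundMask {

    /** True if R, G and B are all at or above the threshold (0..255). */
    static boolean isNearWhite(int argb, int threshold) {
        return ((argb >> 16) & 0xFF) >= threshold
                && ((argb >> 8) & 0xFF) >= threshold
                && (argb & 0xFF) >= threshold;
    }

    /** Returns a copy of the image with alpha cleared where a run of near-white pixels starts. */
    static int[] removeBackground(int[] pixels, int width, int height, int run, int threshold) {
        int[] out = pixels.clone(); // non-background pixels keep their original value
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                boolean background = true;
                // Inspect this pixel and up to `run` pixels to its right, staying in the row.
                for (int i = 0; i <= run && x + i < width; i++) {
                    if (!isNearWhite(pixels[y * width + x + i], threshold)) {
                        background = false;
                        break;
                    }
                }
                if (background) {
                    out[y * width + x] = pixels[y * width + x] & 0x00FFFFFF; // alpha = 0
                }
            }
        }
        return out;
    }
}
```

A call such as removeBackground(pixels, width, height, 5, 240) corresponds to the x to x+5 window and the 240 cutoff used above; extending the inner loop to a block of rows would follow the same pattern.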
I would like to create something similar to this question Can I convert an image into a grid of dots? but I cannot find any answer for my problem. The basic idea is to load a picture from the phone and apply this grid of dots. I would appreciate any suggestions to this.
As others may suggest, your problem can also be solved using a fragment shader in OpenGL Shading Language (GLSL). GLSL might require painful setup.
Here is my solution using Android RenderScript (a lot like GLSL, but designed specifically for Android; it isn't used much). First, set up the RenderScript > Hello Compute sample from the official Android SDK samples. Next, replace mono.rs with the following:
#pragma version(1)
#pragma rs java_package_name(com.android.example.hellocompute)
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
static int mImageWidth;
const uchar4 *gPixels;
const float4 kBlack = {
0.0f, 0.0f, 0.0f, 1.0f
};
// There are two radii for each circle, for anti-aliasing reasons.
const static uint32_t radius = 15;
const static uint32_t smallerRadius = 13;
// Used so that we have smooth circle edges
static float smooth_step(float start_threshold, float end_threshold, float value) {
if (value < start_threshold) {
return 0;
}
if (value > end_threshold) {
return 1;
}
value = (value - start_threshold)/(end_threshold - start_threshold);
// As defined at http://en.wikipedia.org/wiki/Smoothstep
return value*value*(3 - 2*value);
}
void root(const uchar4 *v_in, uchar4 *v_out, uint32_t u_x, uint32_t u_y) {
int32_t diameter = radius * 2;
// Compute distance from center of the circle
int32_t x = u_x % diameter - radius;
int32_t y = u_y % diameter - radius;
float dist = hypot((float)x, (float)y);
// Compute center of the circle
uint32_t center_x = u_x /diameter*diameter + radius;
uint32_t center_y = u_y /diameter*diameter + radius;
float4 centerColor = rsUnpackColor8888(gPixels[center_x + center_y*mImageWidth]);
float amount = smooth_step(smallerRadius, radius, dist);
*v_out = rsPackColorTo8888(mix(centerColor, kBlack, amount));
}
void filter() {
mImageWidth = rsAllocationGetDimX(gIn);
rsForEach(gScript, gIn, gOut); // You may need a fourth parameter, depending on your target SDK.
}
Inside HelloCompute.java, replace createScript() with the following:
private void createScript() {
mRS = RenderScript.create(this);
mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
Allocation.MipmapControl.MIPMAP_NONE,
Allocation.USAGE_SCRIPT);
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
mScript = new ScriptC_mono(mRS, getResources(), R.raw.mono);
mScript.bind_gPixels(mInAllocation);
mScript.set_gIn(mInAllocation);
mScript.set_gOut(mOutAllocation);
mScript.set_gScript(mScript);
mScript.invoke_filter();
mOutAllocation.copyTo(mBitmapOut);
}
The end result will look like this
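The per-pixel math of the script can also be checked in plain Java. The sketch below uses made-up names and mirrors the smooth_step and centre computations from mono.rs. One deliberate difference, which is my assumption and not in the script: the centre coordinates are clamped to the image, because when the image dimensions are not multiples of the diameter, the last cell's centre can fall outside the bitmap, and the script's gPixels lookup has no bounds check.

```java
// Illustrative CPU reference for the dot-grid kernel.
final class DotGridReference {
    static final int RADIUS = 15;
    static final int SMALLER_RADIUS = 13;

    /** Smoothstep between start and end, as in the script. */
    static float smoothStep(float start, float end, float value) {
        if (value < start) {
            return 0f;
        }
        if (value > end) {
            return 1f;
        }
        float t = (value - start) / (end - start);
        return t * t * (3f - 2f * t);
    }

    /** Blend factor toward black for pixel (x, y): 0 inside a dot, 1 between dots. */
    static float blackAmount(int x, int y) {
        int diameter = RADIUS * 2;
        int dx = x % diameter - RADIUS;
        int dy = y % diameter - RADIUS;
        float dist = (float) Math.hypot(dx, dy);
        return smoothStep(SMALLER_RADIUS, RADIUS, dist);
    }

    /** Index of the cell centre whose colour fills the dot, clamped to stay inside the image. */
    static int centerIndex(int x, int y, int width, int height) {
        int diameter = RADIUS * 2;
        int cx = Math.min(x / diameter * diameter + RADIUS, width - 1);
        int cy = Math.min(y / diameter * diameter + RADIUS, height - 1);
        return cy * width + cx;
    }
}
```

The output pixel would then be the colour at centerIndex blended toward black by blackAmount, which is what the mix() call in the script computes.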
ALTERNATIVE
If you don't care about having each dot a solid color, you can do the following:
There is a very easy way to do this. You need a BitmapDrawable for the picture and a BitmapDrawable for the overlay tile (let's call it overlayTile). On overlayTile, call
overlayTile.setTileModeX(Shader.TileMode.REPEAT);
overlayTile.setTileModeY(Shader.TileMode.REPEAT);
Next, combine the two Drawables into a single Drawable using LayerDrawable. You can use the resulting LayerDrawable as the src for an ImageView, if you wish. Or you can convert the Drawable to a Bitmap and save it to disk.
I think studying OpenGL might help in what you want to achieve.
You may want to go through the basics of Displaying Graphics with OpenGL ES
Hope that helps. :)