Allocation.copyTo(Bitmap) corrupting pixel values - Android

I'm new to RenderScript and am running into issues with my first script. As far as I can see (from debugging statements I've inserted) my code works fine, but the computed values are getting mangled when they are copied back to the Bitmap by the Allocation.copyTo(Bitmap) method.
I was getting weird colours out, so I eventually stripped my script down to this sample, which shows the problem:
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y)
{
    *v_out = rsPackColorTo8888(1.f, 0.f, 0.f, 1.f);
    if (x == 0 && y == 0) {
        rsDebug("v_out ", v_out->x, v_out->y, v_out->z, v_out->w);
    }
}
Here we are just writing out an opaque red pixel. The debug line seems to print the right value (255 0 0 255) and indeed I get a red pixel in the bitmap.
However, if I change the alpha on the red pixel slightly:
*v_out = rsPackColorTo8888(1.f, 0.f, 0.f, 0.998f);
The debug prints (255 0 0 254), which still seems correct, but the final pixel value ends up being (0 0 0 254), i.e. black.
Obviously I suspected it was a premultiplied alpha issue, but my understanding is that the Allocation routines that copy to and from Bitmaps are supposed to handle that for you. At least that's what Chet Haase suggests in this blog post: https://plus.google.com/u/0/+ChetHaase/posts/ef6Deey6xKA.
Also none of the example compute scripts out there seem to mention any issues with pre-multiplied alpha. My script was based on the HelloComputer example from the SDK.
If I am making a mistake, I would love an RS guru to point it out for me.
It's a shame that after 2+ years the documentation for Renderscript is still so poor.
PS: The Bitmaps I'm using are ARGB_8888, and I am building with and targeting SDK 18 (Android 4.3).

The example works fine because the example does not modify alpha.
If you are going to modify alpha and then use the Allocation as a normal bitmap, you should return (r*a, g*a, b*a, a).
However, if you were sending the Allocation to a GL surface which is not pre-multiplied, your code would work as-is.
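For example, a premultiplied version of the one-pixel kernel from the question (a minimal sketch, not the only way to do it) would be:

void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y)
{
    float4 c = {1.f, 0.f, 0.f, 0.998f};
    c.rgb *= c.a;  // premultiply, so the stored value is (r*a, g*a, b*a, a)
    *v_out = rsPackColorTo8888(c);
}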

RenderScript low performance on Samsung Galaxy S8

Context
I have an Android app that takes a picture, blurs the picture, removes the blur based on a mask and applies a final layer (not relevant). The last 2 steps, removing the blur based on a mask and applying a final layer, are done repeatedly, each time with a new mask (150 masks).
The output gets drawn on a canvas (SurfaceView). This way the app effectively creates a view of the image with an animated blur.
Technical details & code
All of these image processing steps are achieved with RenderScript.
I'm leaving out the code for step 1, blurring the picture, since this is irrelevant for the problem I'm facing.
Step 2: removing the blur based on a mask
I have a custom kernel which takes an in Allocation as an argument and holds 2 global variables, which are Allocations as well.
These 3 Allocations all get their data from bitmaps using Allocation.copyFrom(bitmap).
Step 3: applying a final layer
Here I have a custom kernel as well, which takes an in Allocation as an argument and holds 3 global variables, of which 1 is an Allocation and 2 are floats.
How these kernels work is irrelevant to this question, but just to be sure I have included some simplified snippets below.
Another thing to note is that I am following all best practices to ensure performance is at its best regarding Allocations, RenderScript and my SurfaceView.
So common mistakes such as creating a new RenderScript instance each time, not re-using Allocations when possible, etc. are safe to ignore; a stripped-down sketch of the per-frame loop follows.
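A simplified, purely illustrative sketch (all names are hypothetical); the RenderScript context, script and Allocations are created once and reused across frames:

// Created once, outside the render loop
RenderScript rs = RenderScript.create(context);
ScriptC_blurMask blurMaskScript = new ScriptC_blurMask(rs);
Allocation allocOriginal = Allocation.createFromBitmap(rs, originalBitmap);
Allocation allocBlur = Allocation.createFromBitmap(rs, blurredBitmap);
Allocation allocMask = Allocation.createFromBitmap(rs, maskBitmaps.get(0));
Allocation allocOut = Allocation.createTyped(rs, allocOriginal.getType());
blurMaskScript.set_allocBlur(allocBlur);
blurMaskScript.set_allocBlurMask(allocMask);

// Per frame: only the mask contents change, every Allocation is reused
for (Bitmap mask : maskBitmaps) {
    allocMask.copyFrom(mask);
    blurMaskScript.forEach_blur_mask(allocOriginal, allocOut);
    allocOut.copyTo(outBitmap);  // outBitmap is then drawn on the SurfaceView canvas
}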
blurMask.rs
#pragma version(1)
#pragma rs java_package_name(com.example.rs)
#pragma rs_fp_relaxed

// Extra allocations
rs_allocation allocBlur;
rs_allocation allocBlurMask;

/*
 * RenderScript kernel that performs a masked blur manipulation.
 * Blur pseudo-code: out = original * blurMask + blur * (1.0 - blurMask)
 * -> Execute this for all channels
 */
uchar4 __attribute__((kernel)) blur_mask(uchar4 inOriginal, uint32_t x, uint32_t y) {
    // Manually fetch the current element from the blur and mask allocations
    uchar4 inBlur = rsGetElementAt_uchar4(allocBlur, x, y);
    uchar4 inBlurMask = rsGetElementAt_uchar4(allocBlurMask, x, y);

    // Normalize to 0.0 -> 1.0
    float4 inOriginalNorm = rsUnpackColor8888(inOriginal);
    float4 inBlurNorm = rsUnpackColor8888(inBlur);
    float4 inBlurMaskNorm = rsUnpackColor8888(inBlurMask);

    // Float literals carry the f suffix so the math stays in single precision (rs_fp_relaxed)
    inBlurNorm.rgb = inBlurNorm.rgb * 0.7f + 0.3f;

    float4 outNorm = inOriginalNorm;
    outNorm.rgb = inOriginalNorm.rgb * inBlurMaskNorm.rgb + inBlurNorm.rgb * (1.0f - inBlurMaskNorm.rgb);
    return rsPackColorTo8888(outNorm);
}
myKernel.rs
#pragma version(1)
#pragma rs java_package_name(com.example.rs)
#pragma rs_fp_relaxed

// Extra allocations
rs_allocation allocExtra;

// Randoms; values are set from Kotlin, the values below just act as minimum placeholders.
float randB = 0.1f;
float randW = 0.75f;

/*
 * RenderScript kernel that performs a manipulation.
 */
uchar4 __attribute__((kernel)) my_kernel(uchar4 inOriginal, uint32_t x, uint32_t y) {
    // Manually fetch the current element from the extra allocation
    uchar4 inExtra = rsGetElementAt_uchar4(allocExtra, x, y);

    // Normalize to 0.0 -> 1.0
    float4 inOriginalNorm = rsUnpackColor8888(inOriginal);
    float4 inExtraNorm = rsUnpackColor8888(inExtra);

    float4 outNorm = inOriginalNorm;
    if (inExtraNorm.r > 0.0f) {
        outNorm.rgb = inOriginalNorm.rgb * 0.7f + 0.3f;
        // Separate per-channel operations since we are using inExtraNorm.r everywhere
        outNorm.r = outNorm.r * inExtraNorm.r + inOriginalNorm.r * (1.0f - inExtraNorm.r);
        outNorm.g = outNorm.g * inExtraNorm.r + inOriginalNorm.g * (1.0f - inExtraNorm.r);
        outNorm.b = outNorm.b * inExtraNorm.r + inOriginalNorm.b * (1.0f - inExtraNorm.r);
    }
    else if (inExtraNorm.g > 0.0f) {
        ...
    }
    return rsPackColorTo8888(outNorm);
}
Problem
So the app works great on a range of devices, even on low-end devices. I manually cap the FPS at 15, but when I remove this cap I get results ranging from 15-20 FPS on low-end devices to 35-40 FPS on high-end devices.
The Samsung Galaxy S8 is where my problem occurs: for some reason I only manage to get around 10 FPS. If I use adb to force RenderScript to use the CPU instead:
adb shell setprop debug.rs.default-CPU-driver 1
I get around 12-15 FPS, but obviously I want it to run on the GPU.
An important, weird thing I noticed
If I trigger a touch event, no matter where (even outside the app), the performance dramatically increases to around 35-40 FPS. When I lift my finger from the screen again, it drops back to 10 FPS.
NOTE: drawing the result on the SurfaceView can be excluded as an impacting factor, since the results are the same with just the computation in RenderScript and no actual drawing.
Questions
So I have more than one question really:
What could be the reason behind the low performance?
Why would a touch event improve this performance so dramatically?
How could I solve or work around this issue?

Renderscript Documentation and Advice - Android

I have been following this guide on how to use RenderScript on Android:
http://www.jayway.com/2014/02/11/renderscript-on-android-basics/
My code is this (I got a wrapper class for the script):
public class PixelCalcScriptWrapper {
    private Allocation inAllocation;
    private Allocation outAllocation;
    RenderScript rs;
    ScriptC_pixelsCalc script;

    public PixelCalcScriptWrapper(Context context) {
        rs = RenderScript.create(context);
        script = new ScriptC_pixelsCalc(rs, context.getResources(), R.raw.pixelscalc);
    }

    public void setInAllocation(Bitmap bmp) {
        inAllocation = Allocation.createFromBitmap(rs, bmp);
    }

    public void setOutAllocation(Bitmap bmp) {
        outAllocation = Allocation.createFromBitmap(rs, bmp);
    }

    public void forEach_root() {
        script.forEach_root(inAllocation, outAllocation);
    }
}
This method calls the script:
public Bitmap processBmp(Bitmap bmp, Bitmap bmpCopy) {
    pixelCalcScriptWrapper.setInAllocation(bmp);
    pixelCalcScriptWrapper.setOutAllocation(bmpCopy);
    pixelCalcScriptWrapper.forEach_root();
    return bmpCopy;
}
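One thing worth flagging here (an observation, not from the original post): nothing copies the script's output back into bmpCopy. Allocation.createFromBitmap may share the Bitmap's backing memory on newer API levels, but the portable pattern is an explicit copy back, for example (a sketch, passing the output Bitmap through):

public void forEach_root(Bitmap outBmp) {
    script.forEach_root(inAllocation, outAllocation);
    // Copy the result back into the Bitmap the output Allocation was created from
    outAllocation.copyTo(outBmp);
}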
and here is my script:
#pragma version(1)
#pragma rs java_package_name(test.foo)

void root(const uchar4 *in, uchar4 *out, uint32_t x, uint32_t y) {
    float3 pixel = convert_float4(in[0]).rgb;
    if (pixel.z < 128) {
        pixel.z = 0;
    } else {
        pixel.z = 255;
    }
    if (pixel.y < 128) {
        pixel.y = 0;
    } else {
        pixel.y = 255;
    }
    if (pixel.x < 128) {
        pixel.x = 0;
    } else {
        pixel.x = 255;
    }
    out->xyz = convert_uchar3(pixel);
}
Now, where can I find some documentation about this?
For example, I have these questions:
1) What does this convert_float4(in[0]) do?
2) What does the .rgb return here: convert_float4(in[0]).rgb?
3) What is float3?
4) I don't know where to start with this line: out->xyz = convert_uchar3(pixel);
5) I am assuming that, in the parameters, in and out are the Allocations passed? What are x and y?
http://developer.android.com/guide/topics/renderscript/reference/rs_convert.html#android_rs:convert
What does this convert_float4(in[0]) do?
convert_float4 will convert from a uchar4 to a float4;
.rgb turns it into a float3 of the first 3 elements.
What does the rgb return?
RenderScript vector types have .r .g .b .a or .x .y .z .w representing the first, second, third and fourth elements respectively. You can use any combination (e.g. .xy or .xwy).
What is float3?
float3 is a "vector type" sort of like a float but 3 of them.
There are float2, float3 and float4 vector types of float.
(there are uchar4, int4 etc.)
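For instance (an illustrative snippet):

float4 p = {0.2f, 0.4f, 0.6f, 1.0f};
float3 rgb = p.rgb;   // the first three elements as a float3
float  red = p.r;     // the same element as p.x
float2 xw  = p.xw;    // arbitrary combinations are allowed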
http://developer.android.com/guide/topics/renderscript/reference/overview.html might be helpful
I hope this helps.
1) In the kernel, the in pointer is a 4-element unsigned char, that is, it represents a pixel color with R, G, B and A values in the 0-255 range. So convert_float4 simply casts each of the four uchars to a float. In this particular code, it probably doesn't make much sense to work with floats, since you're doing a simple threshold and could just as well have worked with the uchar data directly. Floats are better suited to other types of image processing algorithms where you do need the extra precision (example: blurring an image).
2) The .rgb suffix is a shorthand to return only the first three values of the float4, i.e. the R, G, and B values. If you had used only .r it would give you the first value as a regular float, if you had used .g it would give you the second value as a float, etc... These three values are then assigned to that float3 variable, which now represents the pixel with only three color channels (that is, no A alpha channel).
3) See #2.
4) Now convert_uchar3 is again another cast that converts the float3 pixel variable back to a uchar3 variable. You are assigning the three values to each of the x, y, and z elements in that order. This is probably a good time to mention that X, Y and Z are completely interchangeable with R, G and B. That statement could just as well have used out->rgb, and it would actually have been more readable that way. Note that out is a uchar4, and by doing this, you are assigning only the first three "rgb" or "xyz" elements in that pointer, the fourth element is left undefined here.
5) Yes, in is the input pixel, out is the output pixel. Then x and y are the x and y coordinates of the pixel in the overall image. This kernel function is going to be called once for every pixel in the image/allocation you're working with, so it's usually good to know what coordinate you're at when processing an image. In this particular example, since it's only thresholding all pixels in the same way, the coordinates are irrelevant.
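As a concrete illustration of the note in #4, the kernel's last line could carry the alpha channel through as well, so that the fourth element is no longer left undefined (a minimal sketch):

out->rgb = convert_uchar3(pixel);  // identical to out->xyz
out->a = in->a;                    // carry the input alpha through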
Good documentation on RenderScript is very hard to find. I would greatly recommend you take a look at these two videos though, as they will give you a much better sense of how RenderScript works:
AnDevCon: A Deep Dive into RenderScript
Google I/O 2013 - High Performance Applications with RenderScript
Keep in mind that both videos are a couple of years old, so some minor details may have changed in the more recent APIs, but overall, those are probably the best sources of information for RS.

RenderScript rendering is much slower than OpenGL rendering on Android

BACKGROUND:
I want to add a live filter based on the code of the Android camera app. But the architecture of the Android camera app is based on OpenGL ES 1.x, while I need shaders to customize our filter implementation. However, it is too difficult to update the camera app to OpenGL ES 2.0, so I had to find some other method to implement the live filter instead of OpenGL. After some research, I decided to use RenderScript.
PROBLEM:
I have written a demo of a simple filter in RenderScript. It shows that the fps is much lower than the OpenGL implementation: about 5 fps vs 15 fps.
QUESTIONS:
The official Android site says: "The RenderScript runtime will parallelize work across all processors available on a device, such as multi-core CPUs, GPUs, or DSPs, allowing you to focus on expressing algorithms rather than scheduling work or load balancing." Then why is the RenderScript implementation slower?
If RenderScript cannot satisfy my requirements, is there a better way?
CODE DETAILS:
Hi, I am on the same team as the questioner. We want to write a RenderScript-based live-filter camera. In our test demo project, we use a simple filter: a YuvToRGB intrinsic script combined with an overlay-filter ScriptC script.
In the OpenGL version, we set the camera data as textures and do the image filtering with a shader, like this:
GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureYHandle);
GLES20.glUniform1i(shader.uniforms.get("uTextureY"), 0);
GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, mTextureWidth,
        mTextureHeight, GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE,
        mPixelsYBuffer.position(0));
In the RenderScript version, we set the camera data as an Allocation and do the image filtering with script kernels, like this:
// The following code is from onPreviewFrame(byte[] data, Camera camera), which delivers the camera frame data
byte[] imageData = datas[0];
long timeBegin = System.currentTimeMillis();

mYUVInAllocation.copyFrom(imageData);
mYuv.setInput(mYUVInAllocation);
mYuv.forEach(mRGBAAllocationA);
// Copy out here to make sure the YUV-to-RGBA pass has finished (kernel launches are asynchronous)
mRGBAAllocationA.copyTo(mOutBitmap);
Log.e(TAG, "RS time: YUV to RGBA : " + String.valueOf((System.currentTimeMillis() - timeBegin)));

mLayerScript.forEach_overlay(mRGBAAllocationA, mRGBAAllocationB);
mRGBAAllocationB.copyTo(mOutBitmap);
Log.e(TAG, "RS time: overlay : " + String.valueOf((System.currentTimeMillis() - timeBegin)));

mCameraSurPreview.refresh(mOutBitmap, mCameraDisplayOrientation, timeBegin);
The two problems are:
(1) The RenderScript process seems slower than the OpenGL process.
(2) According to our time log, the YUV-to-RGBA pass, which uses the intrinsic script, is very quick and takes about 6 ms; but the overlay pass, which uses the ScriptC kernel, is very slow and takes about 180 ms. Why does this happen?
Here is the rs kernel code of the ScriptC we use (mLayerScript):
#pragma version(1)
#pragma rs java_package_name(**.renderscript)
#pragma stateFragment(parent)
#include "rs_graphics.rsh"

static rs_allocation layer;
static uint32_t dimX;
static uint32_t dimY;

void setLayer(rs_allocation layer1) {
    layer = layer1;
}

void setBitmapDim(uint32_t dimX1, uint32_t dimY1) {
    dimX = dimX1;
    dimY = dimY1;
}

static float BlendOverlayf(float base, float blend) {
    return (base < 0.5f ? (2.0f * base * blend) : (1.0f - 2.0f * (1.0f - base) * (1.0f - blend)));
}

static float3 BlendOverlay(float3 base, float3 blend) {
    float3 blendOverLayPixel = {BlendOverlayf(base.r, blend.r), BlendOverlayf(base.g, blend.g), BlendOverlayf(base.b, blend.b)};
    return blendOverLayPixel;
}

uchar4 __attribute__((kernel)) overlay(uchar4 in, uint32_t x, uint32_t y) {
    float4 inPixel = rsUnpackColor8888(in);

    // Scale the coordinates to the (possibly smaller) layer allocation
    uint32_t layerDimX = rsAllocationGetDimX(layer);
    uint32_t layerDimY = rsAllocationGetDimY(layer);
    uint32_t layerX = x * layerDimX / dimX;
    uint32_t layerY = y * layerDimY / dimY;

    uchar4 *p = (uchar4 *) rsGetElementAt(layer, layerX, layerY);
    float4 layerPixel = rsUnpackColor8888(*p);

    float3 color = BlendOverlay(inPixel.rgb, layerPixel.rgb);
    float4 outf = {color.r, color.g, color.b, inPixel.a};
    uchar4 outc = rsPackColorTo8888(outf.r, outf.g, outf.b, outf.a);
    return outc;
}
Renderscript does not use any GPU or DSP cores. That is a common misconception encouraged by Google's deliberately vague documentation. Renderscript used to have an interface to OpenGL ES, but that has been deprecated and has never been used for much beyond animated wallpapers. Renderscript will use multiple CPU cores, if available, but I suspect Renderscript will be replaced by OpenCL.
Take a look at the Effects class and the Effects demo in the Android SDK. It shows how to use OpenGL ES 2.0 shaders to apply effects to images without writing OpenGL ES code.
http://software.intel.com/en-us/articles/porting-opengl-games-to-android-on-intel-atom-processors-part-1
UPDATE:
It's wonderful when I learn more answering a question than asking one, and that is the case here. You can see from the lack of answers that Renderscript is hardly used outside of Google, because of its strange architecture that ignores industry standards like OpenCL, and because of the almost non-existent documentation on how it actually works.
Nonetheless, my answer did evoke a rare response from the Renderscript development team, which includes only one link that actually contains any useful information about Renderscript - this article by Alexandru Voica at IMG, the PowerVR GPU vendor:
http://withimagination.imgtec.com/index.php/powervr/running-renderscript-efficiently-with-powervr-gpus-on-android
That article has some good information which was new to me. There are comments posted there from more people who are having trouble getting Renderscript code to actually run on the GPU.
But I was incorrect to assume that Renderscript is no longer being developed at Google. Although my statement that "Renderscript does not use any GPU or DSP cores" was true until just fairly recently, I have learned that this changed as of one of the Jelly Bean releases.
It would have been great if one of the Renderscript developers could have explained that. Or even if they had a public webpage that explained it, or that listed which GPUs are actually supported and how you can tell whether your code actually runs on a GPU.
My opinion is that Google will replace Renderscript with OpenCL eventually and I would not invest time developing with it.

Where is the Filterscript documentation (and how can I use it)? [closed]

When Jelly Bean 4.2 was announced a month ago, Filterscript was also announced. It appears to be a language that is a subset of Renderscript, with a different file extension. And that's about all I know about the language.
I have read the two total paragraphs that exist about Filterscript on the entire Internet and created a small .fs file with #pragma rs_fp_relaxed, but it does not get picked up by the ADT builders the way a normal .rs file in the same location is.
My ADT is the latest public version (21.0.0), which seems to be too low for Filterscript. tools.android.com appears to have 21.0.1 Preview, but there is no mention of Filterscript in the release notes (in fact, it's just a bugfix release). There's just no documentation anywhere!
How can I use Filterscript? Where is its documentation?
What I have tried:
https://www.google.com/search?q=filterscript+site:android.com&tbs=li:1
http://developer.android.com/about/versions/android-4.2.html#Renderscript
http://developer.android.com/tools/sdk/eclipse-adt.html#notes
http://tools.android.com/recent/2101preview1
I have not found any documentation either, but maybe I can give you some useful information about what I have investigated so far:
Pointers are not available
Kernel functions need the attribute __attribute__((kernel)), otherwise the compiler goes mad and expects pointer types, which are illegal
Renderscript API can be used (at least everything I tried so far was working)
Attribute "Min SDK version" must be set to "17" in AndroidManifest.xml -> "Uses Sdk"
I discovered most of the following information while reading the sources of the llvm-rs-cc compiler. Any further information, or a link to real documentation for Filterscript, would be appreciated!
Output allocation
In Filterscript you don't have a parameter for the output allocation. Instead, you return the value to write at the current position (given by the global thread IDs x and y):
uchar4 __attribute__((kernel)) root(uint32_t x, uint32_t y)
which generates:
public void forEach_root(Allocation aout)
Input allocation
You can optionally hand over an input allocation as a parameter:
uchar4 __attribute__((kernel)) root(const uchar4 in, uint32_t x, uint32_t y)
which generates:
public void forEach_root(Allocation ain, Allocation aout)
This is useful in only rare cases (e.g. point operators, like the sketch below), because you can access the input allocation only at the current position.
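A trivial point operator in this style (a hypothetical invert filter, for illustration only):

uchar4 __attribute__((kernel)) root(const uchar4 in, uint32_t x, uint32_t y) {
    uchar4 out = in;       // keep alpha as-is
    out.r = 255 - in.r;    // invert the colour channels;
    out.g = 255 - in.g;    // only the current element is ever read
    out.b = 255 - in.b;
    return out;
}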
Global allocation
If you want random access to input allocations, you will need global allocations. Here is a small example of a window operator using a global allocation, which works for me.
blur.fs:
#pragma version(1)
#pragma rs java_package_name(com.example.myproject)

rs_allocation in;
uint32_t width;
uint32_t height;

uchar4 __attribute__((kernel)) root(uint32_t x, uint32_t y) {
    uint4 sum = 0;
    uint count = 0;
    // Cast to int before the +/- 1 arithmetic so the unsigned coordinates
    // don't wrap around at the image border (x == 0 or y == 0)
    for (int yi = (int) y - 1; yi <= (int) y + 1; ++yi) {
        for (int xi = (int) x - 1; xi <= (int) x + 1; ++xi) {
            if (xi >= 0 && xi < width && yi >= 0 && yi < height) {
                sum += convert_uint4(rsGetElementAt_uchar4(in, xi, yi));
                ++count;
            }
        }
    }
    return convert_uchar4(sum / count);
}
MainActivity.java:
...
mRS = RenderScript.create(this);

mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
        Allocation.MipmapControl.MIPMAP_NONE,
        Allocation.USAGE_SCRIPT);
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());

mScript = new ScriptC_blur(mRS, getResources(), R.raw.blur);
mScript.set_in(mInAllocation);
mScript.set_width(mBitmapIn.getWidth());
mScript.set_height(mBitmapIn.getHeight());
mScript.forEach_root(mOutAllocation);

mOutAllocation.copyTo(mBitmapOut);
...
Couple things here:
Yeah, we are behind on docs. We know, we've been busy. It's on my agenda for the relatively near future.
FS is intended as a more restrictive variant of RS that enables additional optimization opportunities for compiler backends. We don't have any of those in our CPU backend today that aren't available from equivalent RS files, but it is possible that an OEM may improve performance on their SoCs with FS files versus generic RS files. In general, it requires __attribute__((kernel)), no pointers, and no unions, and fp_relaxed is implied by the file type.
The host-side API is completely identical; the only difference is in what we actually pass around as the kernel binaries.
Some minor corrections to ofp's answer:
You should be using rsGetElementAt_(type). It's significantly cleaner than rsGetElementAt because you don't need casting or additional dereferencing or anything like that (see the comparison after this list).
#pragma rs_fp_relaxed is implied by the .fs extension and is not necessary in FS files.
You don't have to include rs_allocation.rsh (it's implied as well for all RS/FS files).
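For example, the difference looks like this (an illustrative snippet):

// Untyped getter: returns a const void*, so a cast and a dereference are needed
uchar4 a = *(const uchar4 *) rsGetElementAt(alloc, x, y);
// Typed getter: returns the element directly
uchar4 b = rsGetElementAt_uchar4(alloc, x, y);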
Here is a full introduction to Filterscript, along with a lot of demos: http://developer.android.com/guide/topics/renderscript/compute.html

Weird issue with OpenCV drawContours on Android

I stumbled upon a weird problem with OpenCV drawContours on Android.
Sometimes (with no apparent pattern) the drawContours function produces this:
(screenshot: http://img17.imageshack.us/img17/9031/screenshotgps.png)
while it should obviously produce just the white part.
To put it in context:
I detect edges using the Canny algorithm and then I find contours with:
Imgproc.findContours(dil, contours, dummy, Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);
Then I select several contours that fit some requirements and add them to a list:
List<MatOfPoint> goodContours = new ArrayList<MatOfPoint>();
After that, I randomly select one contour and draw it (filled with white) on a Mat, then convert it to an Android Bitmap:
Mat oneContour = new Mat(orig.rows(), orig.cols(), CvType.CV_8UC1);
int index = (int) (Math.random() * goodContours.size());
Imgproc.drawContours(oneContour, goodContours, index, new Scalar(255, 255, 255), -1);
Bitmap oneContourBitmap = Bitmap.createBitmap(oneContour.cols(), oneContour.rows(), Bitmap.Config.ARGB_8888);
Utils.matToBitmap(oneContour, oneContourBitmap);
Most of the time I get what I expect: a white patch on a pure black background, but sometimes I get the above. I'm totally at a loss here. I suspect there could be some memory leakage, but I try hard to release all Mats immediately after they are of no use any more (I also tried releasing them at the end of the function where it all happens, but without effect), and I'm unable to pinpoint the source of the problem.
Has anyone had similar issues?
I first discovered this on OpenCV 2.4.0 but it stays the same on 2.4.3.
Any suggestion is appreciated.
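One thing worth checking (a suggestion, not from the original thread): new Mat(rows, cols, type) does not initialize its pixel buffer in OpenCV, so stale memory could show through wherever the contour is not drawn. A zero-initialized scratch Mat, a minimal change to the snippet above, would rule that out:

// Mat.zeros() guarantees a black background; the plain Mat constructor
// leaves the buffer uninitialized, so stale data can show through
Mat oneContour = Mat.zeros(orig.rows(), orig.cols(), CvType.CV_8UC1);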
