Android renderscript never runs on the gpu

Android renderscript never runs on the gpu - android

Exactly as the title says.
I have a parallelized image creating/processing algorithm that I'd like to use. This is a kind of perlin noise implementation.
// Logging is never used here
#pragma version(1)
#pragma rs java_package_name(my.package.name)
#pragma rs_fp_full
float sizeX, sizeY;
float ratio;
static float fbm(float2 coord)
{ ... }
uchar4 RS_KERNEL root(uint32_t x, uint32_t y)
{
float u = x / sizeX * ratio;
float v = y / sizeY;
float2 p = {u, v};
float res = fbm(p) * 2.0f; // rs.: 8245 ms, fs: 8307 ms; fs 9842 ms on tablet
float4 color = {res, res, res, 1.0f};
//float4 color = {p.x, p.y, 0.0, 1.0}; // rs.: 96 ms
return rsPackColorTo8888(color);
}
As a comparison, this exact algorithm runs with at least 30 fps when I implement it on the gpu via fragment shader on a textured quad.
The overhead for running the RenderScript should be max 100 ms which I calculated from making a simple bitmap by returning the x and y normalized coordinates.
Which means that in case it would use the gpu it would surely not become 10 seconds.
The code I am using the RenderScript with:
// The non-support version gives at least an extra 25% performance boost
import android.renderscript.Allocation;
import android.renderscript.RenderScript;
public class RSNoise {
private RenderScript renderScript;
private ScriptC_noise noiseScript;
private Allocation allOut;
private Bitmap outBitmap;
final int sizeX = 1536;
final int sizeY = 2048;
public RSNoise(Context context) {
renderScript = RenderScript.create(context);
outBitmap = Bitmap.createBitmap(sizeX, sizeY, Bitmap.Config.ARGB_8888);
allOut = Allocation.createFromBitmap(renderScript, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_GRAPHICS_TEXTURE);
noiseScript = new ScriptC_noise(renderScript);
}
// The render function is benchmarked only
public Bitmap render() {
noiseScript.set_sizeX((float) sizeX);
noiseScript.set_sizeY((float) sizeY);
noiseScript.set_ratio((float) sizeX / (float) sizeY);
noiseScript.forEach_root(allOut);
allOut.copyTo(outBitmap);
return outBitmap;
}
}
If I change it to FilterScript, from using this help (https://stackoverflow.com/a/14942723/4420543), I get several hundred milliseconds worse in case of support library and about double time worse in case of the non-support one. The precision did not influence the results.
I have also checked every question on stackoverflow, but most of them are outdated and I have also tried it with a nexus 5 (7.1.1 os version) among several other new devices, but the problem still remains.
So, when does RenderScript run on GPU? It would be enough if someone could give me an example on a GPU-running RenderScript.

Can you try to run it with rs_fp_relaxed instead of rs_fp_full?
#pragma rs_fp_relaxed
rs_fp_full will force your script running on CPU, since most GPUs don't support full precision floating point operations.

I can agree with your guess.
On Nexux 7 (2013, JellyBean 4.3) I wrote a renderscript and a filterscript, respectively, to calculate the famous Mandelbrot set.
Compared to an OpenGL fragment shader doing the same thing (all with 32 bit floats), the scripts were about 3 times slower. I assume OpenGL uses GPUs where renderscript (and filterscript !) do not.
Then I compared camera preview conversion (NV21 format -> RGB) with a renderscript, a filterscript and the ScriptIntrinsicYuvToRGB, respectively.
Here the Intrinsic is about 4 times faster than the self written scripts.
Again I see no differences in performance between renderscript and filterscript. In this case I assume the self written scripts again use CPUs only where the Intrinsic makes use of GPUs (too ?).

Related

Count pixels on the render target

Using OpenGL ES 2.0 and Galaxy S4 phone, I have a Render Target 1024x1024 RGBA8888 where some textures are rendered each frame. I need to calculate how much red RGBA(1, 0, 0, 1) pixels was rendered on the render target (twice a second).
The main problem is that getting the texture from the GPU is very performance-expensive (~300-400 ms), and freezes are not applicable for my application.
I know about OES_shader_image_atomic extension for atomic counters (simply to increment some value when frag shader works), but it's available only in OpenGL ES 3.1 (and later), I have to stick to ES 2.0.
Is there any common solution I missed?

What you can try is to "reduce" texture in question to a significantly smaller one and read back to CPU that one (which should be less expensive performance-wise). For example, you can split your texture into squares N by N (where N is preferably is a power of two), then render a "whole screen" quad into a 1024/N by 1024/N texture with a fragment shader that sums number of red pixels in corresponding square:
sampler2D texture;
void main(void) {
vec2 offset = N * gl_FragCoord.xy;
int cnt = 0;
for (float x = 0.; x < float(N); x += 1) {
for(float y = 0.; y < float(N); y += 1) {
if (texture2D(texture, (offset + vec2(x, y)) / 1024.) == vec4(1, 0, 0, 1)) {
cnt += 1;
}
}
}
gl_FragColor = vec4((cnt % 256) / 255., ((cnt / 256) % 256) / 255., /* ... */);
}
Also remember that readPixels synchronously wait till GPU is done with all previously issued draws to the texture. So it may be beneficial to have two textures,
on each frame one is being rendered to, and the other is being read from. The next frame you swap them. That will somewhat delay obtaining the desired data, but should eliminate some freezes.

Renderscript fails on GPU enabled driver if USAGE_SHARED

We are using renderscript for audio dsp processing. It is simple and improves performance significantly for our use-case. But we run into an annoying issue with USAGE_SHARED on devices that have custom driver with GPU execution enabled.
As you may know, USAGE_SHARED flag makes the renderscript allocation to reuse the given memory without having to create a copy of it. As a consequence, it not only saves memory, in our case, improves performance to desired level.
The following code with USAGE_SHARED works fine on default renderscript driver (libRSDriver.so). With custom driver (libRSDriver_adreno.so) USAGE_SHARED does not reuse given memory and thus data.
This is the code that makes use of USAGE_SHARED and calls renderscript kernel
void process(float* in1, float* in2, float* out, size_t size) {
sp<RS> rs = new RS();
rs->init(app_cache_dir);
sp<const Element> e = Element::F32(rs);
sp<const Type> t = Type::create(rs, e, size, 0, 0);
sp<Allocation> in1Alloc = Allocation::createTyped(
rs, t,
RS_ALLOCATION_MIPMAP_NONE,
RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
in1);
sp<Allocation> in2Alloc = Allocation::createTyped(
rs, t,
RS_ALLOCATION_MIPMAP_NONE,
RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
in2);
sp<Allocation> outAlloc = Allocation::createTyped(
rs, t,
RS_ALLOCATION_MIPMAP_NONE,
RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
out);
ScriptC_x* rsX = new ScriptC_x(rs);
rsX->set_in1Alloc(in1Alloc);
rsX->set_in2Alloc(in2Alloc);
rsX->set_size(size);
rsX->forEach_compute(in1Alloc, outAlloc);
}
NOTE: This variation of Allocation::createTyped() is not mentioned in the documentation, but code rsCppStructs.h has it. This is the allocation factory method that allows providing backing pointer and respects USAGE_SHARED flag. This is how it is declared:
/**
* Creates an Allocation for use by scripts with a given Type and a backing pointer. For use
* with RS_ALLOCATION_USAGE_SHARED.
* #param[in] rs Context to which the Allocation will belong
* #param[in] type Type of the Allocation
* #param[in] mipmaps desired mipmap behavior for the Allocation
* #param[in] usage usage for the Allocation
* #param[in] pointer existing backing store to use for this Allocation if possible
* #return new Allocation
*/
static sp<Allocation> createTyped(
const sp<RS>& rs, const sp<const Type>& type,
RsAllocationMipmapControl mipmaps,
uint32_t usage,
void * pointer);
This is the renderscript kernel
rs_allocation in1Alloc, in2Alloc;
uint32_t size;
// JUST AN EXAMPLE KERNEL
// Not using reduction kernel since it is only available in later API levels.
// Not sure if support library helps here. Anyways, unrelated to the current problem
float compute(float ignored, uint32_t x) {
float result = 0.0f;
for (uint32_t i=0; i<size; i++) {
result += rsGetElementAt_float(in1Alloc, x) * rsGetElementAt_float(in2Alloc, size-i-1); // just an example computation
}
return result;
}
As mentioned, out doesn't have any of the result of the calculation.
syncAll(RS_ALLOCATION_USAGE_SHARED) also didn't help.
The following works though (but much slower)
void process(float* in1, float* in2, float* out, size_t size) {
sp<RS> rs = new RS();
rs->init(app_cache_dir);
sp<const Element> e = Element::F32(rs);
sp<const Type> t = Type::create(rs, e, size, 0, 0);
sp<Allocation> in1Alloc = Allocation::createTyped(rs, t);
in1Alloc->copy1DFrom(in1);
sp<Allocation> in2Alloc = Allocation::createTyped(rs, t);
in2Alloc->copy1DFrom(in2);
sp<Allocation> outAlloc = Allocation::createTyped(rs, t);
ScriptC_x* rsX = new ScriptC_x(rs);
rsX->set_in1Alloc(in1Alloc);
rsX->set_in2Alloc(in2Alloc);
rsX->set_size(size);
rsX->forEach_compute(in1Alloc, outAlloc);
outAlloc->copy1DTo(out);
}
Copying makes it to work, but in our testing, copying back and forth significantly degrades performance.
If we switch off GPU execution through debug.rs.default-CPU-driver system property, we could see that custom driver works well with desired performance.
Aligning memory given to renderscript to 16,32,.., or 1024, etc did not help to make the custom driver respect USAGE_SHARED.
Question
So, our question is this: How to make this kernel work for devices that use custom renderscript driver that enables GPU execution?

You need to have the copy even if you use USAGE_SHARED.
USAGE_SHARED is just a hint to the driver, it doesn’t have to use it.
If the driver does share the memory the copy will be ignored and performance will be the same.

Android Camera2 Renderscript overheat issue

I have this overheat issue, that it turns off my phone after running for a couple of hours. I want to run this 24/7, please help me to improve this:
I use Camera2 interface, RAW format followed by a renderscript to convert YUV420888 to rgba. My renderscript is as below:
#pragma version(1)
#pragma rs java_package_name(com.sensennetworks.sengaze)
#pragma rs_fp_relaxed
rs_allocation gCurrentFrame;
rs_allocation gByteFrame;
int32_t gFrameWidth;
uchar4 __attribute__((kernel)) yuv2RGBAByteArray(uchar4 prevPixel,uint32_t x,uint32_t y)
{
// Read in pixel values from latest frame - YUV color space
// The functions rsGetElementAtYuv_uchar_? require API 18
uchar4 curPixel;
curPixel.r = rsGetElementAtYuv_uchar_Y(gCurrentFrame, x, y);
curPixel.g = rsGetElementAtYuv_uchar_U(gCurrentFrame, x, y);
curPixel.b = rsGetElementAtYuv_uchar_V(gCurrentFrame, x, y);
// uchar4 rsYuvToRGBA_uchar4(uchar y, uchar u, uchar v);
// This function uses the NTSC formulae to convert YUV to RBG
uchar4 out = rsYuvToRGBA_uchar4(curPixel.r, curPixel.g, curPixel.b);
rsSetElementAt_uchar(gByteFrame, out.r, 4 * (y*gFrameWidth + x) + 0 );
rsSetElementAt_uchar(gByteFrame, out.g, 4 * (y*gFrameWidth + x) + 1 );
rsSetElementAt_uchar(gByteFrame, out.b, 4 * (y*gFrameWidth + x) + 2 );
rsSetElementAt_uchar(gByteFrame, 255, 4 * (y*gFrameWidth + x) + 3 );
return out;
}
This is where I call the renderscript to convert to rgba:
#Override
public void onBufferAvailable(Allocation a) {
inputAllocation.ioReceive();
// Run processing pass if we should send a frame
final long current = System.currentTimeMillis();
if ((current - lastProcessed) >= frameEveryMs) {
yuv2rgbaScript.forEach_yuv2RGBAByteArray(scriptAllocation, outputAllocation);
if (rgbaByteArrayCallback != null) {
outputAllocationByte.copyTo(outBufferByte);
rgbaByteArrayCallback.onRGBAArrayByte(outBufferByte);
}
lastProcessed = current;
}
}
And this is the callback to run image processing using OpenCV:
#Override
public void onRGBAArrayByte(byte[] rgbaByteArray) {
try {
/* Fill images. */
rgbaMat.put(0, 0, rgbaByteArray);
analytic.processFrame(rgbaMat);
/* Send fps to UI for debug purpose. */
calcFPS(true);
} catch (Exception e) {
e.printStackTrace();
}
}
The whole thing runs at ~22fps. I've checked carefully and there is no memory leaks. But after running this for some time even with the screen off, the phone gets very hot, and turn off itself. Note if I remove the image processing part, the issue still persists. What could be wrong with this? I could turn on the phone camera app and leave it running for hours without a problem.
Does renderscript cause the heat?
Does 22fps cause the heat? Maybe I should reduce it?
Does Android background service cause heat?
Thanks.
ps: I tested this on LG G4 with full Camera2 interface support.

In theory, your device should throttle itself if it starts to overheat, and never shut down. This would just reduce your frame rate as the device warms up. But some devices aren't as good at this as they should be, unfortunately.
Basically, anything that reduces your CPU / GPU usage will reduce power consumption and heat generation. Basic tips:
Do not copy buffers. Each copy is very expensive when you're doing it at ~30fps. Here, you're copying from Allocation to byte[], and then from that byte[] to the rgbaMat. That's 2x as expensive as just copying from the Allocation to the rgbaMat. Unfortunately, I'm not sure there's a direct way to copy from the Allocation to the rgbaMat, or to create an Allocation that's backed by the same memory as the rgbaMat.
Are you sure you can't do your OpenCV processing on YUV data instead? That'll save you a lot of overhead here; the RGB->YUV conversion is not cheap when not done in hardware.
There's also an RS intrinsic, ScriptIntrinsicYuvToRgb, which may give you better performance than your hand-written loop.

YUV_420_888 interpretation on Samsung Galaxy S7 (Camera2)

I wrote a conversion from YUV_420_888 to Bitmap, considering the following logic (as I understand it):
To summarize the approach: the kernel’s coordinates x and y are congruent both with the x and y of the non-padded part of the Y-Plane (2d-allocation) and the x and y of the output-Bitmap. The U- and V-Planes, however, have a different structure than the Y-Plane, because they use 1 byte for coverage of 4 pixels, and, in addition, may have a PixelStride that is more than one, in addition they might also have a padding that can be different from that of the Y-Plane. Therefore, in order to access the U’s and V’s efficiently by the kernel I put them into 1-d allocations and created an index “uvIndex” that gives the position of the corresponding U- and V within that 1-d allocation, for given (x,y) coordinates in the (non-padded) Y-plane (and, so, the output Bitmap).
In order to keep the rs-Kernel lean, I excluded the padding area in the yPlane by capping the x-range via LaunchOptions (this reflects the RowStride of the y-plane which thus can be ignored WITHIN the kernel). So we just need to consider the uvPixelStride and uvRowStride within the uvIndex, i.e. the index used in order to access to the u- and v-values.
This is my code:
Renderscript Kernel, named yuv420888.rs
#pragma version(1)
#pragma rs java_package_name(com.xxxyyy.testcamera2);
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
uint picWidth, uvPixelStride, uvRowStride ;
rs_allocation ypsIn,uIn,vIn;
// The LaunchOptions ensure that the Kernel does not enter the padding zone of Y, so yRowStride can be ignored WITHIN the Kernel.
uchar4 __attribute__((kernel)) doConvert(uint32_t x, uint32_t y) {
// index for accessing the uIn's and vIn's
uint uvIndex= uvPixelStride * (x/2) + uvRowStride*(y/2);
// get the y,u,v values
uchar yps= rsGetElementAt_uchar(ypsIn, x, y);
uchar u= rsGetElementAt_uchar(uIn, uvIndex);
uchar v= rsGetElementAt_uchar(vIn, uvIndex);
// calc argb
int4 argb;
argb.r = yps + v * 1436 / 1024 - 179;
argb.g = yps -u * 46549 / 131072 + 44 -v * 93604 / 131072 + 91;
argb.b = yps +u * 1814 / 1024 - 227;
argb.a = 255;
uchar4 out = convert_uchar4(clamp(argb, 0, 255));
return out;
}
Java side:
private Bitmap YUV_420_888_toRGB(Image image, int width, int height){
// Get the three image planes
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer();
byte[] y = new byte[buffer.remaining()];
buffer.get(y);
buffer = planes[1].getBuffer();
byte[] u = new byte[buffer.remaining()];
buffer.get(u);
buffer = planes[2].getBuffer();
byte[] v = new byte[buffer.remaining()];
buffer.get(v);
// get the relevant RowStrides and PixelStrides
// (we know from documentation that PixelStride is 1 for y)
int yRowStride= planes[0].getRowStride();
int uvRowStride= planes[1].getRowStride(); // we know from documentation that RowStride is the same for u and v.
int uvPixelStride= planes[1].getPixelStride(); // we know from documentation that PixelStride is the same for u and v.
// rs creation just for demo. Create rs just once in onCreate and use it again.
RenderScript rs = RenderScript.create(this);
//RenderScript rs = MainActivity.rs;
ScriptC_yuv420888 mYuv420=new ScriptC_yuv420888 (rs);
// Y,U,V are defined as global allocations, the out-Allocation is the Bitmap.
// Note also that uAlloc and vAlloc are 1-dimensional while yAlloc is 2-dimensional.
Type.Builder typeUcharY = new Type.Builder(rs, Element.U8(rs));
//using safe height
typeUcharY.setX(yRowStride).setY(y.length / yRowStride);
Allocation yAlloc = Allocation.createTyped(rs, typeUcharY.create());
yAlloc.copyFrom(y);
mYuv420.set_ypsIn(yAlloc);
Type.Builder typeUcharUV = new Type.Builder(rs, Element.U8(rs));
// note that the size of the u's and v's are as follows:
// ( (width/2)*PixelStride + padding ) * (height/2)
// = (RowStride ) * (height/2)
// but I noted that on the S7 it is 1 less...
typeUcharUV.setX(u.length);
Allocation uAlloc = Allocation.createTyped(rs, typeUcharUV.create());
uAlloc.copyFrom(u);
mYuv420.set_uIn(uAlloc);
Allocation vAlloc = Allocation.createTyped(rs, typeUcharUV.create());
vAlloc.copyFrom(v);
mYuv420.set_vIn(vAlloc);
// handover parameters
mYuv420.set_picWidth(width);
mYuv420.set_uvRowStride (uvRowStride);
mYuv420.set_uvPixelStride (uvPixelStride);
Bitmap outBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Allocation outAlloc = Allocation.createFromBitmap(rs, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(0, width); // by this we ignore the y’s padding zone, i.e. the right side of x between width and yRowStride
//using safe height
lo.setY(0, y.length / yRowStride);
mYuv420.forEach_doConvert(outAlloc,lo);
outAlloc.copyTo(outBitmap);
return outBitmap;
}
Testing on Nexus 7 (API 22) this returns nice color Bitmaps. This device, however, has trivial pixelstrides (=1) and no padding (i.e. rowstride=width). Testing on the brandnew Samsung S7 (API 23) I get pictures whose colors are not correct - except of the green ones. But the Picture does not show a general bias towards green, it just seems that non-green colors are not reproduced correctly. Note, that the S7 applies an u/v pixelstride of 2, and no padding.
Since the most crucial code line is within the rs-code the Access of the u/v planes uint uvIndex= (...) I think, there could be the problem, probably with incorrect consideration of pixelstrides here. Does anyone see the solution? Thanks.
UPDATE: I checked everything, and I am pretty sure that the code regarding the access of y,u,v is correct. So the problem must be with the u and v values themselves. Non green colors have a purple tilt, and looking at the u,v values they seem to be in a rather narrow range of about 110-150. Is it really possible that we need to cope with device specific YUV -> RBG conversions...?! Did I miss anything?
UPDATE 2: have corrected code, it works now, thanks to Eddy's Feedback.

Look at
floor((float) uvPixelStride*(x)/2)
which calculates your U,V row offset (uv_row_offset) from the Y x-coordinate.
if uvPixelStride = 2, then as x increases:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 1
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 3
and this is incorrect. There's no valid U/V pixel value at uv_row_offset = 1 or 3, since uvPixelStride = 2.
You want
uvPixelStride * floor(x/2)
(assuming you don't trust yourself to remember the critical round-down behavior of integer divide, if you do then):
uvPixelStride * (x/2)
should be enough
With that, your mapping becomes:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 0
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 2
See if that fixes the color errors. In practice, the incorrect addressing here would mean every other color sample would be from the wrong color plane, since it's likely that the underlying YUV data is semiplanar (so the U plane starts at V plane + 1 byte, with the two planes interleaved)

For people who encounter error
android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type
use buffer.capacity() instead of buffer.remaining()
and if you already made some operations on the image, you'll need to call rewind() method on the buffer.

Furthermore for anyone else getting
android.support.v8.renderscript.RSIllegalArgumentException: Array too
small for allocation type
I fixed it by changing yAlloc.copyFrom(y); to yAlloc.copy1DRangeFrom(0, y.length, y);

Posting full solution to convert YUV->BGR (can be adopted for other formats too) and also rotate image to upright using renderscript. Allocation is used as input and byte array is used as output. It was tested on Android 8+ including Samsung devices too.
Java
/**
* Renderscript-based process to convert YUV_420_888 to BGR_888 and rotation to upright.
*/
public class ImageProcessor {
protected final String TAG = this.getClass().getSimpleName();
private Allocation mInputAllocation;
private Allocation mOutAllocLand;
private Allocation mOutAllocPort;
private Handler mProcessingHandler;
private ScriptC_yuv_bgr mConvertScript;
private byte[] frameBGR;
public ProcessingTask mTask;
private ImageListener listener;
private Supplier<Integer> rotation;
public ImageProcessor(RenderScript rs, Size dimensions, ImageListener listener, Supplier<Integer> rotation) {
this.listener = listener;
this.rotation = rotation;
int w = dimensions.getWidth();
int h = dimensions.getHeight();
Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.YUV(rs));
yuvTypeBuilder.setX(w);
yuvTypeBuilder.setY(h);
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888);
mInputAllocation = Allocation.createTyped(rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
//keep 2 allocations to handle different image rotations
mOutAllocLand = createOutBGRAlloc(rs, w, h);
mOutAllocPort = createOutBGRAlloc(rs, h, w);
frameBGR = new byte[w*h*3];
HandlerThread processingThread = new HandlerThread(this.getClass().getSimpleName());
processingThread.start();
mProcessingHandler = new Handler(processingThread.getLooper());
mConvertScript = new ScriptC_yuv_bgr(rs);
mConvertScript.set_inWidth(w);
mConvertScript.set_inHeight(h);
mTask = new ProcessingTask(mInputAllocation);
}
private Allocation createOutBGRAlloc(RenderScript rs, int width, int height) {
//Stored as Vec4, it's impossible to store as Vec3, buffer size will be for Vec4 anyway
//using RGB_888 as alternative for BGR_888, can be just U8_3 type
Type.Builder rgbTypeBuilderPort = new Type.Builder(rs, Element.RGB_888(rs));
rgbTypeBuilderPort.setX(width);
rgbTypeBuilderPort.setY(height);
Allocation allocation = Allocation.createTyped(
rs, rgbTypeBuilderPort.create(), Allocation.USAGE_SCRIPT
);
//Use auto-padding to be able to copy to x*h*3 bytes array
allocation.setAutoPadding(true);
return allocation;
}
public Surface getInputSurface() {
return mInputAllocation.getSurface();
}
/**
* Simple class to keep track of incoming frame count,
* and to process the newest one in the processing thread
*/
class ProcessingTask implements Runnable, Allocation.OnBufferAvailableListener {
private int mPendingFrames = 0;
private Allocation mInputAllocation;
public ProcessingTask(Allocation input) {
mInputAllocation = input;
mInputAllocation.setOnBufferAvailableListener(this);
}
#Override
public void onBufferAvailable(Allocation a) {
synchronized(this) {
mPendingFrames++;
mProcessingHandler.post(this);
}
}
#Override
public void run() {
// Find out how many frames have arrived
int pendingFrames;
synchronized(this) {
pendingFrames = mPendingFrames;
mPendingFrames = 0;
// Discard extra messages in case processing is slower than frame rate
mProcessingHandler.removeCallbacks(this);
}
// Get to newest input
for (int i = 0; i < pendingFrames; i++) {
mInputAllocation.ioReceive();
}
int rot = rotation.get();
mConvertScript.set_currentYUVFrame(mInputAllocation);
mConvertScript.set_rotation(rot);
Allocation allocOut = rot==90 || rot== 270 ? mOutAllocPort : mOutAllocLand;
// Run processing
// ain allocation isn't really used, global frame param is used to get data from
mConvertScript.forEach_yuv_bgr(allocOut);
//Save to byte array, BGR 24bit
allocOut.copyTo(frameBGR);
int w = allocOut.getType().getX();
int h = allocOut.getType().getY();
if (listener != null) {
listener.onImageAvailable(frameBGR, w, h);
}
}
}
public interface ImageListener {
/**
* Called when there is available image, image is in upright position.
*
* #param bgr BGR 24bit bytes
* #param width image width
* #param height image height
*/
void onImageAvailable(byte[] bgr, int width, int height);
}
}
RS
#pragma version(1)
#pragma rs java_package_name(com.affectiva.camera)
#pragma rs_fp_relaxed
//Script convers YUV to BGR(uchar3)
//current YUV frame to read pixels from
rs_allocation currentYUVFrame;
//input image rotation: 0,90,180,270 clockwise
uint32_t rotation;
uint32_t inWidth;
uint32_t inHeight;
//method returns uchar3 BGR which will be set to x,y in output allocation
uchar3 __attribute__((kernel)) yuv_bgr(uint32_t x, uint32_t y) {
// Read in pixel values from latest frame - YUV color space
uchar3 inPixel;
uint32_t xRot = x;
uint32_t yRot = y;
//Do not rotate if 0
if (rotation==90) {
//rotate 270 clockwise
xRot = y;
yRot = inHeight - 1 - x;
} else if (rotation==180) {
xRot = inWidth - 1 - x;
yRot = inHeight - 1 - y;
} else if (rotation==270) {
//rotate 90 clockwise
xRot = inWidth - 1 - y;
yRot = x;
}
inPixel.r = rsGetElementAtYuv_uchar_Y(currentYUVFrame, xRot, yRot);
inPixel.g = rsGetElementAtYuv_uchar_U(currentYUVFrame, xRot, yRot);
inPixel.b = rsGetElementAtYuv_uchar_V(currentYUVFrame, xRot, yRot);
// Convert YUV to RGB, JFIF transform with fixed-point math
// R = Y + 1.402 * (V - 128)
// G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
// B = Y + 1.772 * (U - 128)
int3 bgr;
//get red pixel and assing to b
bgr.b = inPixel.r +
inPixel.b * 1436 / 1024 - 179;
bgr.g = inPixel.r -
inPixel.g * 46549 / 131072 + 44 -
inPixel.b * 93604 / 131072 + 91;
//get blue pixel and assign to red
bgr.r = inPixel.r +
inPixel.g * 1814 / 1024 - 227;
// Write out
return convert_uchar3(clamp(bgr, 0, 255));
}

On a Samsung Galaxy Tab 5 (Tablet), android version 5.1.1 (22), with alleged YUV_420_888 format, the following renderscript math works well and produces correct colors:
uchar yValue = rsGetElementAt_uchar(gCurrentFrame, x + y * yRowStride);
uchar vValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) );
uchar uValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) + (xSize * ySize) / 4);
I do not understand why the horizontal value (i.e., y) is scaled by a factor of four instead of two, but it works well. I also needed to avoid use of rsGetElementAtYuv_uchar_Y|U|V. I believe the associated allocation stride value is set to zero instead of something proper. Use of rsGetElementAt_uchar() is a reasonable work-around.
On a Samsung Galaxy S5 (Smart Phone), android version 5.0 (21), with alleged YUV_420_888 format, I cannot recover the u and v values, they come through as all zeros. This results in a green looking image. Luminous is OK, but image is vertically flipped.

This code requires the use of the RenderScript compatibility library (android.support.v8.renderscript.*).
In order to get the compatibility library to work with Android API 23, I updated to gradle-plugin 2.1.0 and Build-Tools 23.0.3 as per Miao Wang's answer at How to create Renderscript scripts on Android Studio, and make them run?
If you follow his answer and get an error "Gradle version 2.10 is required" appears, do NOT change
classpath 'com.android.tools.build:gradle:2.1.0'
Instead, update the distributionUrl field of the Project\gradle\wrapper\gradle-wrapper.properties file to
distributionUrl=https\://services.gradle.org/distributions/gradle-2.10-all.zip
and change File > Settings > Builds,Execution,Deployment > Build Tools > Gradle >Gradle to Use default gradle wrapper as per "Gradle Version 2.10 is required." Error.

Re: RSIllegalArgumentException
In my case this was the case that buffer.remaining() was not multiple of stride:
The length of last line was less than stride (i.e. only up to where actual data was.)

An FYI in case someone else gets this as I was also getting "android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type" when trying out the code. In my case it turns out that the when allocating the buffer for Y i had to rewind the buffer because it was being left at the wrong end and wasn't copying the data. By doing buffer.rewind(); before allocation the new bytes array makes it work fine now.

Simple Renderscript Example failing

I am trying to create an image processing library, built in renderscript. I have been playing with the android samples for Renderscript here.
Renderscript appears to have everything I need to create this library, unfortunately I cannot seem to get many of the examples to work for me.
The ImageProcecssing example is a good example of how things tend to work for me. Most of the Script Intrinsics work out of the box, no errors. However, as soon as I move to a ScriptC file, even doing basic thngs tends to fail. And by fail, I mean
Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 21581 (enderscripttest)
So, to help debug, I have created a github repo with literally the most basic example I could come up with. It basically just attempts to apply a brightness filter to an imageview.
Relevant code:
#Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
imageView = (ImageView)findViewById(R.id.image);
BitmapFactory.Options opts = new BitmapFactory.Options();
opts.inSampleSize = 8;
originalBitmap = BitmapFactory.decodeResource(getResources(),R.drawable.colors,opts);
filteredBitmap = Bitmap.createBitmap(originalBitmap.getWidth(),originalBitmap.getHeight(), originalBitmap.getConfig());
//RENDERSCRIPT ALLOCATION
mRS = RenderScript.create(this);
mInAllocation = Allocation.createFromBitmap(mRS, originalBitmap,Allocation.MipmapControl.MIPMAP_NONE,Allocation.USAGE_SCRIPT);
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
mOutAllocation.copyFrom(originalBitmap);
mOutAllocation.copyTo(filteredBitmap);
ScriptC_brightnessfilter helloworldScript = new ScriptC_brightnessfilter(mRS);
helloworldScript.set_brightnessValue(4.0f);
helloworldScript.bind_gPixels(mInAllocation);
helloworldScript.set_gIn(mInAllocation);
helloworldScript.set_gOut(mOutAllocation);
helloworldScript.set_gScript(helloworldScript);
helloworldScript.invoke_filter();
mOutAllocation.copyTo(filteredBitmap);
}
Then a renderscript file
#pragma version(1)
#pragma rs java_package_name(com.dss.renderscripttest)
float brightnessValue;
rs_allocation gIn;
rs_allocation gOut;
rs_script gScript;
static int mImageWidth;
const uchar4 *gPixels;
void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData, uint32_t x, uint32_t y) {
float4 apixel = rsUnpackColor8888(*v_in);
float3 pixel = apixel.rgb;
float factor = brightnessValue;
pixel = pixel + factor;
pixel = clamp(pixel,0.0f,1.0f);
*v_out = rsPackColorTo8888(pixel.rgb);
}
void filter() {
mImageWidth = rsAllocationGetDimX(gIn);
rsDebug("Image size is ", rsAllocationGetDimX(gIn), rsAllocationGetDimY(gOut));
rsForEach(gScript, gIn, gOut, 0, 0);
}
I have only tested this on the Galaxy S3 and S4. I will test on the Nexus 4 tonight and see if I get a different result.
Edit:
I have confirmed that this code works on the Nexus 4. I will run through some other devices for good measure. Also I will see if I can get an APK together in case Stephen Hines or Tim Murray wants to look, but right now it seems only Galaxy S3 and S4 (both 4.3) are effected
Edit 2: I believe this is the same issue that is happening over here. I will update as both issues progress

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.

Android renderscript never runs on the gpu - android

Can you try to run it with rs_fp_relaxed instead of rs_fp_full? #pragma rs_fp_relaxed rs_fp_full will force your script running on CPU, since most GPUs don't support full precision floating point operations.

Related

Count pixels on the render target

Renderscript fails on GPU enabled driver if USAGE_SHARED

Android Camera2 Renderscript overheat issue

YUV_420_888 interpretation on Samsung Galaxy S7 (Camera2)

Simple Renderscript Example failing

Categories

Resources