The x, y parameters definition - Android

I'm trying to understand mapping kernels in RenderScript.
A sample mapping kernel looks like this:
uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
    uchar4 out = in;
    out.r = 255 - in.r;
    out.g = 255 - in.g;
    out.b = 255 - in.b;
    return out;
}
However, there is not much clarity about what the x and y parameters refer to (whether x indexes the width or the height of the bitmap at the given pixel).
The official documentation says only this much about x and y:
A mapping kernel function or a reduction kernel accumulator function may access the coordinates of the current execution using the special arguments x, y, and z, which must be of type int or uint32_t. These arguments are optional.
This is critical information, as interchanging them when accessing data can lead to out-of-bounds errors. If you have worked with this, please share your insights.

The x and y arguments (and z, if you're using a 3D allocation) are the coordinates of the cell currently being processed: x is the column index and runs from 0 to width - 1, while y is the row index and runs from 0 to height - 1 (and z runs over the depth for 3D). The in parameter of your kernel function is the element of your input allocation at the point (x, y).
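For a concrete illustration, here is a minimal host-side sketch (checkDims and the literal dimensions are purely illustrative) showing which allocation dimension each coordinate runs over:

import android.content.Context;
import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.RenderScript;

void checkDims(Context context) {
    // A bitmap 640 columns wide and 480 rows tall.
    Bitmap bmp = Bitmap.createBitmap(640, 480, Bitmap.Config.ARGB_8888);
    RenderScript rs = RenderScript.create(context);
    Allocation in = Allocation.createFromBitmap(rs, bmp);
    // A kernel like invert() above runs once per cell of this allocation:
    int w = in.getType().getX(); // 640 -> x runs 0..639 (width, columns)
    int h = in.getType().getY(); // 480 -> y runs 0..479 (height, rows)
}

So swapping x and y in an rsGetElementAt call on a non-square image is exactly what produces the out-of-bounds errors mentioned above.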

RenderScript low performance on Samsung Galaxy S8

Context
I have an Android app that takes a picture, blurs it, removes the blur based on a mask, and applies a final layer (not relevant). The last two steps, removing the blur based on a mask and applying a final layer, are done repeatedly, each time with a new mask (150 masks).
The output gets drawn on a canvas (SurfaceView). This way the app effectively creates a view of the image with an animated blur.
Technical details & code
All of these image processing steps are achieved with RenderScript.
I'm leaving out the code for step 1, blurring the picture, since this is irrelevant for the problem I'm facing.
Step 2: removing the blur based on a mask
I have a custom kernel which takes an in Allocation as argument and holds 2 global variables, which are Allocations as well.
These 3 Allocations all get their data from bitmaps using Allocation.copyFrom(bitmap).
Step 3: applying a final layer
Here I have a custom kernel as well, which takes an in Allocation as argument and holds 3 global variables, of which one is an Allocation and two are floats.
How these kernels work is irrelevant to this question but just to be sure I included some simplified snippets below.
Another thing to note is that I am following all best practices to ensure performance is at its best regarding Allocations, RenderScript and my SurfaceView.
So common mistakes such as creating a new RenderScript instance each time, or not re-using Allocations when possible, are safe to ignore.
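(For reference, the kind of one-time setup meant here, as a rough sketch; the field and method names are illustrative:)

// Created once and reused for every frame:
private RenderScript rs;
private ScriptC_blurMask blurMaskScript;
private Allocation allocOriginal, allocBlur, allocBlurMask, allocOut;

private void initRenderScriptOnce(Context context, int width, int height) {
    rs = RenderScript.create(context);
    blurMaskScript = new ScriptC_blurMask(rs);
    Type type = Type.createXY(rs, Element.U8_4(rs), width, height);
    allocOriginal = Allocation.createTyped(rs, type);
    allocBlur     = Allocation.createTyped(rs, type);
    allocBlurMask = Allocation.createTyped(rs, type);
    allocOut      = Allocation.createTyped(rs, type);
}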
blurMask.rs
#pragma version(1)
#pragma rs java_package_name(com.example.rs)
#pragma rs_fp_relaxed
// Extra allocations
rs_allocation allocBlur;
rs_allocation allocBlurMask;
/*
* RenderScript kernel that performs a masked blur manipulation.
* Blur Pseudo: out = original * blurMask + blur * (1.0 - blurMask)
* -> Execute this for all channels
*/
uchar4 __attribute__((kernel)) blur_mask(uchar4 inOriginal, uint32_t x, uint32_t y) {
    // Manually getting the current element from the blur and mask allocations
    uchar4 inBlur = rsGetElementAt_uchar4(allocBlur, x, y);
    uchar4 inBlurMask = rsGetElementAt_uchar4(allocBlurMask, x, y);
    // normalize to 0.0 -> 1.0
    float4 inOriginalNorm = rsUnpackColor8888(inOriginal);
    float4 inBlurNorm = rsUnpackColor8888(inBlur);
    float4 inBlurMaskNorm = rsUnpackColor8888(inBlurMask);
    inBlurNorm.rgb = inBlurNorm.rgb * 0.7 + 0.3;
    float4 outNorm = inOriginalNorm;
    outNorm.rgb = inOriginalNorm.rgb * inBlurMaskNorm.rgb + inBlurNorm.rgb * (1.0 - inBlurMaskNorm.rgb);
    return rsPackColorTo8888(outNorm);
}
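For context, the per-frame host side that drives this kernel would look roughly like the following sketch (the set_ and forEach_ names are what the toolchain generates from the globals and the kernel declared above):

// Per-frame: bind the extra allocations, then launch the kernel.
void applyBlurMask(ScriptC_blurMask script,
                   Allocation original, Allocation blur,
                   Allocation mask, Allocation out) {
    script.set_allocBlur(blur);     // backs rs_allocation allocBlur
    script.set_allocBlurMask(mask); // backs rs_allocation allocBlurMask
    script.forEach_blur_mask(original, out);
}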
myKernel.rs
#pragma version(1)
#pragma rs java_package_name(com.example.rs)
#pragma rs_fp_relaxed
// Extra allocations
rs_allocation allocExtra;
// Randoms; values are set from Kotlin, the values below just act as minimum placeholders.
float randB = 0.1f;
float randW = 0.75f;
/*
* RenderScript kernel that performs a manipulation.
*/
uchar4 __attribute__((kernel)) my_kernel(uchar4 inOriginal, uint32_t x, uint32_t y) {
    // Manually getting the current element from the extra allocation
    uchar4 inExtra = rsGetElementAt_uchar4(allocExtra, x, y);
    // normalize to 0.0 -> 1.0
    float4 inOriginalNorm = rsUnpackColor8888(inOriginal);
    float4 inExtraNorm = rsUnpackColor8888(inExtra);
    float4 outNorm = inOriginalNorm;
    if (inExtraNorm.r > 0.0) {
        outNorm.rgb = inOriginalNorm.rgb * 0.7 + 0.3;
        // Separate channel operations since we are using inExtraNorm.r everywhere
        outNorm.r = outNorm.r * inExtraNorm.r + inOriginalNorm.r * (1.0 - inExtraNorm.r);
        outNorm.g = outNorm.g * inExtraNorm.r + inOriginalNorm.g * (1.0 - inExtraNorm.r);
        outNorm.b = outNorm.b * inExtraNorm.r + inOriginalNorm.b * (1.0 - inExtraNorm.r);
    }
    else if (inExtraNorm.g > 0.0) {
        ...
    }
    return rsPackColorTo8888(outNorm);
}
Problem
So the app works great on a range of devices, even low-end ones. I manually cap the FPS at 15, but when I remove this cap I get results ranging from 15-20 FPS on low-end devices to 35-40 FPS on high-end devices.
The Samsung Galaxy S8 is where my problem occurs. For some reason I only manage to get around 10 FPS. If I use adb to force RenderScript to use the CPU instead:
adb shell setprop debug.rs.default-CPU-driver 1
I get around 12-15 FPS, but obviously I want it to run on the GPU.
An important, weird thing I noticed
If I trigger a touch event, no matter where (even outside the app), performance dramatically increases to around 35-40 FPS. As soon as I lift my finger from the screen, it drops back to 10 FPS.
NOTE: drawing the result on the SurfaceView can be excluded as a factor, since the results are the same when running just the RenderScript computation without drawing the result.
Questions
So I have more than one question really:
What could be the reason behind the low performance?
Why would a touch event improve this performance so dramatically?
How could I solve or work around this issue?

How to check ray intersection with an object in ARCore

Is there a way to check if I touched an object on the screen? As I understand it, the HitResult class lets me check if I touched the recognized and mapped surface. But I want to check whether I touched the object that is placed on that surface.
ARCore doesn't really have a concept of an object, so we can't directly provide that. I suggest looking at ray-sphere tests as a starting point; a sketch follows the ray code below.
However, I can help with getting the ray itself (to be added to HelloArActivity):
/**
 * Returns a world coordinate frame ray for a screen point. The ray is
 * defined using a 6-element float array containing the head location
 * followed by a normalized direction vector.
 */
float[] screenPointToWorldRay(float xPx, float yPx, Frame frame) {
    float[] points = new float[12]; // {clip query, camera query, camera origin}
    // Set up the clip-space coordinates of our query point
    // +x is right:
    points[0] = 2.0f * xPx / mSurfaceView.getMeasuredWidth() - 1.0f;
    // +y is up (android UI Y is down):
    points[1] = 1.0f - 2.0f * yPx / mSurfaceView.getMeasuredHeight();
    points[2] = 1.0f; // +z is forwards (remember clip, not camera)
    points[3] = 1.0f; // w (homogeneous coordinates)
    float[] matrices = new float[32]; // {proj, inverse proj}
    // If you'll be calling this several times per frame, factor out
    // the next two lines to run when Frame.isDisplayRotationChanged().
    mSession.getProjectionMatrix(matrices, 0, 1.0f, 100.0f);
    Matrix.invertM(matrices, 16, matrices, 0);
    // Transform the clip-space point to camera space.
    Matrix.multiplyMV(points, 4, matrices, 16, points, 0);
    // points[4,5,6] is now a camera-space vector. Transform to world space
    // to get a point along the ray.
    float[] out = new float[6];
    frame.getPose().transformPoint(points, 4, out, 3);
    // Use points[8,9,10] as a zero vector to get the ray head position in world space.
    frame.getPose().transformPoint(points, 8, out, 0);
    // Normalize the direction vector:
    float dx = out[3] - out[0];
    float dy = out[4] - out[1];
    float dz = out[5] - out[2];
    float scale = 1.0f / (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
    out[3] = dx * scale;
    out[4] = dy * scale;
    out[5] = dz * scale;
    return out;
}
If you're calling this several times per frame, see the comment above about the getProjectionMatrix and invertM calls.
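With the ray in hand, a minimal ray-sphere test might look like this (a sketch; the sphere center and radius are assumptions you would take from wherever the object was placed):

/**
 * Returns the distance along the ray (origin + t * dir, t >= 0) to the
 * nearest intersection with the sphere, or -1 if the ray misses it.
 * ray is the 6-element {origin, normalized direction} array returned
 * by screenPointToWorldRay() above.
 */
float raySphereDistance(float[] ray, float[] center, float radius) {
    float ox = ray[0] - center[0];
    float oy = ray[1] - center[1];
    float oz = ray[2] - center[2];
    float dx = ray[3], dy = ray[4], dz = ray[5];
    // Solve |o + t * d|^2 = r^2; since d is normalized, a == 1.
    float b = 2f * (ox * dx + oy * dy + oz * dz);
    float c = ox * ox + oy * oy + oz * oz - radius * radius;
    float disc = b * b - 4f * c;
    if (disc < 0f) return -1f;                           // no intersection
    float t = (-b - (float) Math.sqrt(disc)) / 2f;       // near hit
    if (t < 0f) t = (-b + (float) Math.sqrt(disc)) / 2f; // ray starts inside
    return t < 0f ? -1f : t;
}

Testing each placed object's bounding sphere this way and taking the smallest positive distance gives you the tapped object.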
Apart from mouse picking with ray casting (cf. Ian's answer), the other commonly used technique is a picking buffer, explained in detail (with C++ code) here:
The trick behind 3D picking is very simple. We will attach a running index to each triangle and have the FS output the index of the triangle that the pixel belongs to. The end result is that we get a "color" buffer that doesn't really contain colors. Instead, for each pixel which is covered by some primitive we get the index of this primitive. When the mouse is clicked on the window we will read back that index (according to the location of the mouse) and render the selected triangle red. By combining a depth buffer in the process we guarantee that when several primitives are overlapping the same pixel we get the index of the top-most primitive (closest to the camera).
So in a nutshell:
Every object's draw method needs a running index and a boolean for whether this draw renders to the picking buffer or not.
The render method converts the index into a grayscale color and the scene is rendered.
After the whole rendering is done, retrieve the pixel color at the touch position with GL11.glReadPixels(x, y, ...) (x and y being the pixel you want the colour of). Then translate the color back to an index and the index back to an object. Voilà, you have your clicked object.
To be fair, for a mobile use case you should probably read a 10x10 rectangle, iterate through it, and pick the first non-background color found, because touches are never that precise.
This approach works independently of the complexity of your objects.
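A minimal sketch of the index-to-color round trip (here packing the index across the RGB channels rather than plain grayscale, which allows more than 255 objects; the GLES20 calls assume you are on the GL thread with the picking pass already rendered):

import android.opengl.GLES20;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Encode a 1-based object index as an RGBA color for the picking pass
// (index 0 is reserved for the background).
static float[] indexToColor(int index) {
    return new float[] {
            ((index >> 16) & 0xFF) / 255f,
            ((index >> 8)  & 0xFF) / 255f,
            ( index        & 0xFF) / 255f,
            1f };
}

// Read back the pixel under the touch point and recover the index.
// Note: GL's y axis points up, Android touch coordinates point down.
static int pickIndexAt(int touchX, int touchY, int surfaceHeight) {
    ByteBuffer buf = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
    GLES20.glReadPixels(touchX, surfaceHeight - touchY, 1, 1,
            GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, buf);
    int r = buf.get(0) & 0xFF, g = buf.get(1) & 0xFF, b = buf.get(2) & 0xFF;
    return (r << 16) | (g << 8) | b; // 0 means background
}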

RenderScript Documentation and Advice - Android

I have been following this guide on how to use RenderScript on Android:
http://www.jayway.com/2014/02/11/renderscript-on-android-basics/
My code is this (I have a wrapper class for the script):
public class PixelCalcScriptWrapper {
    private Allocation inAllocation;
    private Allocation outAllocation;
    RenderScript rs;
    ScriptC_pixelsCalc script;

    public PixelCalcScriptWrapper(Context context) {
        rs = RenderScript.create(context);
        script = new ScriptC_pixelsCalc(rs, context.getResources(), R.raw.pixelscalc);
    }

    public void setInAllocation(Bitmap bmp) {
        inAllocation = Allocation.createFromBitmap(rs, bmp);
    }

    public void setOutAllocation(Bitmap bmp) {
        outAllocation = Allocation.createFromBitmap(rs, bmp);
    }

    public void forEach_root() {
        script.forEach_root(inAllocation, outAllocation);
    }
}
This method calls the script:
public Bitmap processBmp(Bitmap bmp, Bitmap bmpCopy) {
    pixelCalcScriptWrapper.setInAllocation(bmp);
    pixelCalcScriptWrapper.setOutAllocation(bmpCopy);
    pixelCalcScriptWrapper.forEach_root();
    return bmpCopy;
}
and here is my script:
#pragma version(1)
#pragma rs java_package_name(test.foo)
void root(const uchar4 *in, uchar4 *out, uint32_t x, uint32_t y) {
    float3 pixel = convert_float4(in[0]).rgb;
    if (pixel.z < 128) {
        pixel.z = 0;
    } else {
        pixel.z = 255;
    }
    if (pixel.y < 128) {
        pixel.y = 0;
    } else {
        pixel.y = 255;
    }
    if (pixel.x < 128) {
        pixel.x = 0;
    } else {
        pixel.x = 255;
    }
    out->xyz = convert_uchar3(pixel);
}
Now, where can I find some documentation about this?
For example, I have these questions:
1) What does convert_float4(in[0]) do?
2) What does the .rgb return here: convert_float4(in[0]).rgb?
3) What is float3?
4) I don't know where to start with this line: out->xyz = convert_uchar3(pixel);
5) I am assuming in and out in the parameters are the Allocations passed? What are x and y?
http://developer.android.com/guide/topics/renderscript/reference/rs_convert.html#android_rs:convert
What does this convert_float4(in[0]) do?
convert_float4 will convert from a uchar4 to a float4;
.rgb turns it into a float3 of the first 3 elements.
What does the .rgb return?
RenderScript vector types have .r .g .b .a or .x .y .z .w representing the first, second, third and fourth element respectively. You can use any combination (e.g. .xy or .xwy).
What is float3?
float3 is a "vector type", sort of like a float but three of them.
There are float2, float3 and float4 vector types of float
(and there are uchar4, int4, etc.).
http://developer.android.com/guide/topics/renderscript/reference/overview.html might be helpful
I hope this helps.
1) In the kernel, the in pointer is a 4-element unsigned char, that is, it represents a pixel color with R, G, B and A values in the 0-255 range. So convert_float4 simply casts each of the four uchars to a float. In this particular code, it probably doesn't make much sense to work with floats, since you're doing a simple threshold and could just as well have worked with the uchar data directly. Using floats is better suited to other types of image processing algorithms where you do need the extra precision (example: blurring an image).
2) The .rgb suffix is a shorthand to return only the first three values of the float4, i.e. the R, G, and B values. If you had used only .r it would give you the first value as a regular float, if you had used .g it would give you the second value as a float, etc... These three values are then assigned to that float3 variable, which now represents the pixel with only three color channels (that is, no A alpha channel).
3) See #2.
4) Now convert_uchar3 is again another cast that converts the float3 pixel variable back to a uchar3 variable. You are assigning the three values to each of the x, y, and z elements in that order. This is probably a good time to mention that X, Y and Z are completely interchangeable with R, G and B. That statement could just as well have used out->rgb, and it would actually have been more readable that way. Note that out is a uchar4, and by doing this, you are assigning only the first three "rgb" or "xyz" elements in that pointer, the fourth element is left undefined here.
5) Yes, in is the input pixel and out is the output pixel. Then x and y are the x and y coordinates of the pixel in the overall image. This kernel function is going to be called once for every pixel in the image/allocation you're working with, so it's usually good to know what coordinate you're at when processing an image. In this particular example, since it's only thresholding all pixels in the same way, the coordinates are irrelevant.
Good documentation on RenderScript is very hard to find. I would highly recommend you take a look at these two videos, though, as they will give you a much better sense of how RenderScript works:
AnDevCon: A Deep Dive into RenderScript
Google I/O 2013 - High Performance Applications with RenderScript
Keep in mind that both videos are a couple of years old, so some minor details may have changed in the recent APIs, but overall those are probably the best sources of information for RS.

RenderScript Bound Pointers vs. Allocations

Does RenderScript guarantee the memory layout or stride in global pointers bound from the Java layer?
I read somewhere that it is best to use rsGetElementAt / rsSetElementAt functions because the layout is not guaranteed.
But elsewhere it was said to avoid those when targeting GPU optimizations, whereas bound pointers are OK.
In my particular case, I need the kernel to access the values of many surrounding pixels. So far, I have done quite well with float pointers bound from the Java layer.
Java:
script.set_width(inputWidth);
script.bind_input(inputAllocation);
RS:
int width;
float *input;

void root(const float *v_in, float *v_out, uint32_t x, uint32_t y) {
    int current = x + width * y;
    int above = current - width;
    int below = current + width;
    *v_out = input[above - 1]   + input[above]   + input[above + 1]   +
             input[current - 1] + input[current] + input[current + 1] +
             input[below - 1]   + input[below]   + input[below + 1];
}
This is a trivial simplification of what I'm actually doing, just to illustrate with an easy example. In reality I'm doing far more of these combinations, and with multiple input images at the same time, so much so that simply pre-computing the positions of the "above" and "below" rows helps a great deal with the processing time.
As long as memory is guaranteed to be sequential and in the same order you'd normally expect, all is good, and so far I haven't had any problems on my test devices.
But if this memory layout is truly not guaranteed across all devices/processors, and the stride can actually vary, then my code would obviously break and I'd be forced to use rsGetElementAt, such as:
Java:
script.set_input(inputAllocation);
RS:
rs_allocation input;

void root(const float *v_in, float *v_out, uint32_t x, uint32_t y) {
    *v_out = rsGetElementAt_float(input, x - 1, y - 1) + rsGetElementAt_float(input, x, y - 1) + rsGetElementAt_float(input, x + 1, y - 1) +
             rsGetElementAt_float(input, x - 1, y    ) + rsGetElementAt_float(input, x, y    ) + rsGetElementAt_float(input, x + 1, y    ) +
             rsGetElementAt_float(input, x - 1, y + 1) + rsGetElementAt_float(input, x, y + 1) + rsGetElementAt_float(input, x + 1, y + 1);
}
The average execution time of the script using rsGetElementAt() (710 ms) is almost twice that of the kernel using input[] (390 ms), I'm guessing because each call must independently re-compute the memory offset for the given x, y coordinates.
My script needs to run continuously, so I'm trying to get every possible bit of performance out of it, and it would be a real pity to ignore such a considerable speedup.
So I'm wondering if anyone could shed some light on this.
Are there really any cases under which bound pointers will not be fully sequential, and is there a way to force them to be?
Is rsGetElementAt() truly necessary in this case, or is it safe to keep using bound pointers relying on a pre-defined stride?
Bound pointers are only guaranteed to be sequential for simple 1D allocations. Anything with more than one dimension should be accessed with the typed rsGetElementAt_* / rsSetElementAt_* functions.
Comments on performance:
rsGetElementAt_float() will typically outperform rsGetElementAt() because it knows the type and can avoid the lookup for stride. This is true of all the typed get/set methods.
Which OS version are you testing on? 4.4 brought some major improvements for this type of code, which should be able to pull the address calculations out of the loops in many cases.
The pointer-manipulation approach will force some GPU drivers to fall back to the safe path.
Some newer drivers (4.4.1) use the HW address calculation unit, removing the overhead completely.
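To make the layout point concrete: the bound-pointer kernel above bakes in the assumption stride == width, while the rsGetElementAt family applies the allocation's real row stride internally. The arithmetic, as a sketch:

// Row-major layout with possible per-row padding: element (x, y) lives at
// y * stridePixels + x, where stridePixels may exceed width on some drivers.
int elementIndex(int x, int y, int stridePixels) {
    return y * stridePixels + x; // stridePixels == width gives x + width * y
}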

Difference between rotation methods?

My question is about the rotate methods in android.graphics.Camera. In the docs, I saw these descriptions:
public void rotateX (float deg)    Since: API Level 1
    Applies a rotation transform around the X axis.

public void rotate (float x, float y, float z)    Since: API Level 12
    Applies a rotation transform around all three axes.
Here is my question: what is the difference between using rotate(float x, float y, float z) and a sequence of rotate* methods? For example, the difference between these two snippets A and B:
A)
camera.rotate(x, y, z);
B)
camera.rotateX(x);
camera.rotateY(y);
camera.rotateZ(z);
The importance lies in the order in which the rotations are applied.
Consider for example, an aircraft flying forward which first rotates 90 degrees on its Z axis (roll) and then rotates 90 degrees on its X axis (pitch). The result is that the aircraft is now flying to the right with its right wing pointing downward. Now consider the operation in reverse order with a 90 degree pitch followed by a 90 degree roll. The aircraft is now flying up with its right wing pointing forward (these results may vary depending on your coordinate system).
camera.rotate provides a quick and easy function for applying all three rotations in one call. The reason for the other three rotation functions is to allow for situations in which the developer wants to apply one or more of the rotations in a specific order.
Looking at the source in frameworks/base/core/jni/android/graphics/Camera.cpp, there is no difference:
static void Camera_rotate(JNIEnv* env, jobject obj, jfloat x, jfloat y, jfloat z) {
    Sk3DView* v = (Sk3DView*)env->GetIntField(obj, gNativeInstanceFieldID);
    v->rotateX(SkFloatToScalar(x));
    v->rotateY(SkFloatToScalar(y));
    v->rotateZ(SkFloatToScalar(z));
}
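Given that source, the equivalence is easy to verify from the Java side; a small sketch comparing the resulting matrices:

import android.graphics.Camera;
import android.graphics.Matrix;

static boolean rotationsMatch(float x, float y, float z) {
    Camera a = new Camera();
    a.rotate(x, y, z);            // single call
    Matrix ma = new Matrix();
    a.getMatrix(ma);

    Camera b = new Camera();
    b.rotateX(x);                 // same order as the JNI source above
    b.rotateY(y);
    b.rotateZ(z);
    Matrix mb = new Matrix();
    b.getMatrix(mb);

    return ma.equals(mb);         // true: identical transforms
}

Reordering the rotateX/rotateY/rotateZ calls in the second snippet, however, will generally produce a different matrix, which is the point made above about rotation order.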
