I've got a bitmap and I need to remove all pixels that have alpha. Sounds easy, but I'm stuck with it.
I've got this Java code:
public static Bitmap overdrawAlphaBits(Bitmap image, int color) {
Bitmap coloredBitmap = image.copy(Bitmap.Config.ARGB_8888, true);
for (int y = 0; y < coloredBitmap.getHeight(); y++) {
for (int x = 0; x < coloredBitmap.getWidth(); x++) {
int pixel = coloredBitmap.getPixel(x, y);
if (pixel != 0) {
coloredBitmap.setPixel(x, y, color);
}
}
}
return coloredBitmap;
}
It works, but slowly: processing one bitmap takes around 2 seconds.
I'm trying RenderScript instead. It is fast, but not stable.
Here is my code:
public static Bitmap overdrawAlphaBits(Bitmap image, Context context) {
Bitmap blackbitmap = Bitmap.createBitmap(image.getWidth(), image.getHeight(), image.getConfig());
RenderScript mRS = RenderScript.create(context);
ScriptC_replace_with_main_green_color script = new ScriptC_replace_with_main_green_color(mRS);
Allocation allocationRaster0 = Allocation.createFromBitmap(mRS, image, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
Allocation allocationRaster1 = Allocation.createTyped(mRS, allocationRaster0.getType());
script.forEach_root(allocationRaster0, allocationRaster1);
allocationRaster1.copyTo(blackbitmap);
allocationRaster0.destroy();
allocationRaster1.destroy();
script.destroy();
mRS.destroy();
return blackbitmap;
}
And my .rs file:
void root(const uchar4 *v_in, uchar4 *v_out) {
uint32_t rValue = v_in->r;
uint32_t gValue = v_in->g;
uint32_t bValue = v_in->b;
uint32_t aValue = v_in->a;
if(rValue!=0 || gValue!=0 || bValue!=0 || aValue!=0){
v_out->r = 0x55;
v_out->g = 0xED;
v_out->b = 0x69;
}
}
I use this method on multiple bitmaps. The first bitmap comes out fine, but after that I get corrupted images. When I apply the method to the first bitmap again, it comes out corrupted too.
It looks like a memory allocation that is not released, or some shared resource, but I don't know.
Any ideas, please?
Maybe there is an easier solution?
Thanks everyone in advance!
Actually, you can use the getPixels method to read all pixels into an array and then manipulate them. That is fast enough. The problem is that getPixel is slow.
So here is the code:
public static Bitmap overdrawAlphaBits(Bitmap image, int color) {
int[] pixels = new int[image.getHeight() * image.getWidth()];
image.getPixels(pixels, 0, image.getWidth(), 0, 0, image.getWidth(), image.getHeight());
for (int i = 0; i < image.getWidth() * image.getHeight(); i++) {
if (pixels[i] != 0) {
pixels[i] = color;
}
}
image.setPixels(pixels, 0, image.getWidth(), 0, 0, image.getWidth(), image.getHeight());
return image;
}
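If the goal is to recolor every pixel that is not fully transparent, the test can look at the alpha byte alone instead of the whole ARGB value. Here is a minimal sketch on a plain int[] of the kind getPixels fills. The class and method names are made up for illustration, and unlike the method above it returns a new array instead of modifying its input:

```java
final class AlphaOverdraw {
    /**
     * Returns a copy of argbPixels in which every pixel with a non-zero
     * alpha byte is replaced by color. Fully transparent pixels are kept.
     */
    static int[] overdraw(int[] argbPixels, int color) {
        int[] out = new int[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            int alpha = argbPixels[i] >>> 24;   // top byte of an ARGB int
            out[i] = (alpha != 0) ? color : argbPixels[i];
        }
        return out;
    }
}
```

The result can be written back with setPixels on a mutable bitmap. Note the difference from the pixel != 0 check: a fully transparent pixel that still carries color bits is left alone here.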
In your .rs file, I think the rValue, gValue, etc should be of type uchar, not uint32_t. Also the if-statement is missing an else-clause where the v_in values are copied to v_out, otherwise you get undefined output values. Note that in your Java code, the output bitmap is initialised as a copy. This is not the case in the renderscript code, where you create an output allocation of the same type as the input, but the values are not copied. Therefore you need to copy the input values in the kernel.
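To make the point about the missing else-clause concrete, here is a plain-Java model of an out-of-place pass. It is only a sketch of the logic, not RenderScript code; the class name is hypothetical, and writing an opaque alpha into the green pixel is an assumption (the original kernel never writes alpha at all):

```java
final class KernelModel {
    // Opaque version of the color the kernel writes (r=0x55, g=0xED, b=0x69).
    static final int GREEN = 0xFF55ED69;

    /** Out-of-place pass: every element of out is written exactly once. */
    static void run(int[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            if (in[i] != 0) {
                out[i] = GREEN;
            } else {
                out[i] = in[i];   // the branch the original kernel was missing
            }
        }
    }
}
```

Because both branches assign out[i], whatever the output buffer held before the pass no longer matters.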
I'm trying to convert an YUV image to grayscale, so basically I just need the Y values.
To do so I wrote this little piece of code (with frame being the YUV image):
imageConversionTime = System.currentTimeMillis();
size = frame.getSize();
byte nv21ByteArray[] = frame.getImage();
int lol;
for (int i = 0; i < size.width; i++) {
for (int j = 0; j < size.height; j++) {
lol = size.width*j + i;
yMatrix.put(j, i, nv21ByteArray[lol]);
}
}
bitmap = Bitmap.createBitmap(size.width, size.height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(yMatrix, bitmap);
imageConversionTime = System.currentTimeMillis() - imageConversionTime;
However, this takes about 13500 ms. I need it to be A LOT faster (on my computer it takes 8.5 ms in python) (I work on a Motorola Moto E 4G 2nd generation, not super powerful but it should be enough for converting images right?).
Any suggestions?
Thanks in advance.
First of all, I would assign size.width and size.height to local variables. I don't think the compiler optimizes this by default, but I am not sure.
Furthermore, build an int[] that holds the result instead of writing into a Matrix element by element.
Then you could do something like this:
int[] grayScalePixels = new int[size.width * size.height];
In your inner loop set
int y = nv21ByteArray[lol] & 0xFF; // bytes are signed in Java, so mask to 0..255
grayScalePixels[lol] = 0xFF000000 | (y * 0x010101); // opaque gray pixel at its row-major index
To get your final image do the following:
Bitmap grayScaleBitmap = Bitmap.createBitmap(grayScalePixels, size.width, size.height, Bitmap.Config.ARGB_8888);
Hope it works properly (I have not tested it, but at least the principle should apply: rely on a plain array instead of a Matrix).
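Since the Y plane of an NV21 frame is already stored row by row, the nested loops and the index arithmetic can collapse into one linear pass. A minimal sketch, assuming a tightly packed NV21 buffer whose first width * height bytes are luma; the class and method names are hypothetical:

```java
final class LumaToGray {
    /** Expands the Y plane of an NV21 frame into opaque gray ARGB pixels. */
    static int[] toArgb(byte[] nv21, int width, int height) {
        int count = width * height;          // size of the Y plane
        int[] pixels = new int[count];
        for (int i = 0; i < count; i++) {
            int y = nv21[i] & 0xFF;          // bytes are signed in Java
            pixels[i] = 0xFF000000 | (y * 0x010101);
        }
        return pixels;
    }
}
```

The returned array can go straight into Bitmap.createBitmap(pixels, width, height, Bitmap.Config.ARGB_8888).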
Probably 2 years too late but anyways ;)
To convert to gray scale, all you need to do is set the u/v values to 128 and leave the y values as is. Note that this code is for YUY2 format. You can refer to this document for other formats.
private void convertToBW(byte[] ptrIn, String filePath) {
// set all u and v values to 128; the cast wraps, so (byte) 128 stores the bit pattern 0x80
byte[] ptrOut = Arrays.copyOf(ptrIn, ptrIn.length);
for (int i = 0, ptrInLength = ptrOut.length; i < ptrInLength; i++) {
if (i % 2 != 0) {
ptrOut[i] = (byte) 128;
}
}
convertToJpeg(ptrOut, filePath);
}
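For NV21/NV12 the chroma bytes follow the Y plane, which takes up the first two thirds of a tightly packed buffer, so I think the loop would change to:
for (int i = ptrOut.length / 3 * 2, ptrInLength = ptrOut.length; i < ptrInLength; i++) { ptrOut[i] = (byte) 128; } // every byte in this range is chroma
Note: I didn't try this myself.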
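For the NV21/NV12 case, a self-contained sketch could look like the following. It assumes a tightly packed frame (width * height luma bytes followed by the interleaved chroma bytes, no row padding); the class and method names are made up for illustration:

```java
import java.util.Arrays;

final class Nv21Gray {
    /** Returns a copy of an NV21/NV12 frame with all chroma set to neutral. */
    static byte[] neutralizeChroma(byte[] frame, int width, int height) {
        byte[] out = Arrays.copyOf(frame, frame.length);
        int chromaStart = width * height;    // the Y plane comes first
        // (byte) 128 wraps to -128, which is the bit pattern 0x80.
        Arrays.fill(out, chromaStart, out.length, (byte) 128);
        return out;
    }
}
```

The luma bytes are untouched, so the frame still decodes with any NV21 decoder but comes out gray.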
Also I would suggest to profile your utils method and createBitmap functions separately.
So my issue is that, for a video call, I get the frames in my C code as an I420 byte array. I then convert it to NV21 and send the byte array to Java to create the bitmap. But because I need to create a YuvImage from the byte array, and then a bitmap from that, I have a conversion overhead that causes delays and a loss in quality.
I am wondering if there is another way to do this, so that I can create the bitmap directly in the C code, and maybe even draw it to a surface view from the C code. Or simply pass the bitmap to my function so I can set it there, without needing to create the bitmap in Android.
This is what I do with the byte array in the C code:
if(size == 0)
return;
jboolean isAttached;
JNIEnv *env;
jint jParticipant;
jint jWidth;
jint jHeight;
jbyteArray jRawImageBytes;
env = getJniEnv(&isAttached);
if (env == NULL)
goto FAIL0;
//LOGE(".... **** ....TRYING TO FIND CALLBACK");
LOGI("FrameReceived will reach here 1");
char *modifiedRawImageBytes = malloc(size);
memcpy(modifiedRawImageBytes, rawImageBytes, size);
jint sizeWH = width * height;
jint quarter = sizeWH/4;
jint v0 = sizeWH + quarter;
for (int u = sizeWH, v = v0, o = sizeWH; u < v0; u++, v++, o += 2) {
modifiedRawImageBytes[o] = rawImageBytes[v]; // For NV21, V first
modifiedRawImageBytes[o + 1] = rawImageBytes[u]; // For NV21, U second
}
if(remote)
{
if(frameReceivedRemoteMethod == NULL)
frameReceivedRemoteMethod = getApplicationJniMethodId(env, applicationJniObj, "vidyoConferenceFrameReceivedRemoteCallback", "(III[B)V");
if (frameReceivedRemoteMethod == NULL) {
//LOGE(".... **** ....CALLBACK NOT FOUND");
goto FAIL1;
}
}
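For reference, the plane shuffle done by the loop above can also be written in Java. This is only a sketch under the same assumptions the C code makes (tightly packed I420, even width and height); the class name is hypothetical:

```java
final class I420ToNv21 {
    /** Converts a tightly packed I420 frame (Y, U, V planes) to NV21 (Y, then VU pairs). */
    static byte[] convert(byte[] i420, int width, int height) {
        int ySize = width * height;
        int quarter = ySize / 4;             // size of each chroma plane
        byte[] nv21 = new byte[ySize + 2 * quarter];
        System.arraycopy(i420, 0, nv21, 0, ySize);   // Y plane is identical
        int uStart = ySize;
        int vStart = ySize + quarter;
        for (int k = 0, o = ySize; k < quarter; k++, o += 2) {
            nv21[o] = i420[vStart + k];      // NV21 stores V first
            nv21[o + 1] = i420[uStart + k];  // then U
        }
        return nv21;
    }
}
```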
This is what I do in the Android java code:
remoteResolution = width + "x" + height;
remoteBAOS = new ByteArrayOutputStream();
remoteYUV = new YuvImage(rawImageBytes, ImageFormat.NV21, width, height, null);
remoteYUV.compressToJpeg(new Rect(0, 0, width, height), 100, remoteBAOS);
remoteBA = remoteBAOS.toByteArray();
remoteBitmap = BitmapFactory.decodeByteArray(remoteBA, 0, remoteBA.length);
new Handler(Looper.getMainLooper()).post(new Runnable() {
@Override
public void run() {
remoteView.setImageBitmap(remoteBitmap);
}
});
This is how the sample app of the SDK I am using does it, but I feel that this is not at all best practice, and there has to be a way to get the Bitmap from the byte array more quickly, preferably in the C code. Any ideas on how to improve this?
EDIT:
I modified my Java code. I now use this library: https://github.com/silvaren/easyrs
So my code is now:
remoteBitmap = Nv21Image.nv21ToBitmap(rs, rawImageBytes, width, height);
new Handler(Looper.getMainLooper()).post(new Runnable() {
@Override
public void run() {
remoteView.setImageBitmap(remoteBitmap);
}
});
Where nv21ToBitmap does this:
public static Bitmap yuvToRgb(RenderScript rs, Nv21Image nv21Image) {
long startTime = System.currentTimeMillis();
Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.U8(rs))
.setX(nv21Image.nv21ByteArray.length);
Type yuvType = yuvTypeBuilder.create();
Allocation yuvAllocation = Allocation.createTyped(rs, yuvType, Allocation.USAGE_SCRIPT);
yuvAllocation.copyFrom(nv21Image.nv21ByteArray);
Type.Builder rgbTypeBuilder = new Type.Builder(rs, Element.RGBA_8888(rs));
rgbTypeBuilder.setX(nv21Image.width);
rgbTypeBuilder.setY(nv21Image.height);
Allocation rgbAllocation = Allocation.createTyped(rs, rgbTypeBuilder.create());
ScriptIntrinsicYuvToRGB yuvToRgbScript = ScriptIntrinsicYuvToRGB.create(rs, Element.RGBA_8888(rs));
yuvToRgbScript.setInput(yuvAllocation);
yuvToRgbScript.forEach(rgbAllocation);
Bitmap bitmap = Bitmap.createBitmap(nv21Image.width, nv21Image.height, Bitmap.Config.ARGB_8888);
rgbAllocation.copyTo(bitmap);
Log.d("NV21", "Conversion to Bitmap: " + (System.currentTimeMillis() - startTime) + "ms");
return bitmap;
}
This is faster, but I feel there is still some delay. Now that I get my bitmap from RenderScript instead of going through a YuvImage, is it possible to set it on my ImageView faster, or to draw it on a SurfaceView instead?
EDIT: Solved! See below.
I need to crop my image (YUV422888 color space) which I obtain from the onImageAvailable listener of Camera2. I don't want or need to convert it to Bitmap as it affects performance a lot, and also I'm actually interested in luma and not in RGB information (which is contained in Plane 0 of the Image).
I came up with the following solution:
1. Get the Y' information contained in the Plane 0 of the Image object made available by Camera2 in the listener.
2. Convert the Y' Plane into a byte[] array in.
3. Convert the byte[] array to a 2d byte[][] array in order to crop.
4. Use some for loops to crop at desired left, right, top and bottom coordinates.
5. Fold the 2d byte[][] array back to a 1d byte[] array out, containing cropped luma Y' information.
Point 4 unfortunately yields a corrupt image. What am I doing wrong?
In the onImageAvailableListener of Camera2 (please note that although I am computing a bitmap, it's only to see what's happening, as I'm not interested in the Bitmap/RGB data):
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer(); // Grab just the Y' Plane.
buffer.rewind();
byte[] data = new byte[buffer.capacity()];
buffer.get(data);
Bitmap bitmap = cropByteArray(data, image.getWidth(), image.getHeight()); // Just for preview/sanity check purposes. The bitmap is **corrupt**.
runOnUiThread(new bitmapRunnable(bitmap) {
@Override
public void run() {
image_view_preview.setImageBitmap(this.bitmap);
}
});
The cropByteArray function needs fixing. It outputs a bitmap that is corrupt, and should output an out byte[] array similar to in, but containing only the cropped area:
public Bitmap cropByteArray(byte[] in, int inw, int inh) {
int l = 100; // left crop start
int r = 400; // right crop end
int t = 400; // top crop start
int b = 700; // bottom crop end
int outw = r-l;
int outh = b-t;
byte[][] in2d = new byte[inw][inh]; // input width and height are 1080 x 1920.
byte[] out = new byte[outw*outh];
int[] pixels = new int[outw*outh];
int i = 0;
for(int col = 0; col < inw; col++) {
for(int row = 0; row < inh; row++) {
in2d[col][row] = in[i++];
}
}
i = 0;
for(int col = l; col < r; col++) {
for(int row = t; row < b; row++) {
//out[i++] = in2d[col][row]; // out is the desired output of the function, but for now we output a bitmap instead
int grey = in2d[col][row] & 0xff;
pixels[i++] = 0xFF000000 | (grey * 0x00010101);
}
}
return Bitmap.createBitmap(pixels, inw, inh, Bitmap.Config.ARGB_8888);
}
EDIT Solved thanks to the suggestion by Eddy Talvala. The following code will yield the Y' (luma plane 0 from ImageReader) cropped to the desired coordinates. The cropped data is in the out byte array. The bitmap is generated just for confirmation. I am also attaching the handy YUVtoGrayscale() function below.
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer();
int stride = planes[0].getRowStride();
buffer.rewind();
byte[] Y = new byte[buffer.capacity()];
buffer.get(Y);
int t=200; int l=600;
int out_h = 600; int out_w = 600;
byte[] out = new byte[out_w*out_h];
int firstRowOffset = stride * t + l;
for (int row = 0; row < out_h; row++) {
buffer.position(firstRowOffset + row * stride);
buffer.get(out, row * out_w, out_w);
}
Bitmap bitmap = YUVtoGrayscale(out, out_w, out_h);
Here goes the YUVtoGrayscale().
public Bitmap YUVtoGrayscale(byte[] yuv, int width, int height) {
int[] pixels = new int[yuv.length];
for (int i = 0; i < yuv.length; i++) {
int grey = yuv[i] & 0xff;
pixels[i] = 0xFF000000 | (grey * 0x00010101);
}
return Bitmap.createBitmap(pixels, width, height, Bitmap.Config.ARGB_8888);
}
There are some remaining issues. I am using the front camera, and although the preview orientation is correct inside the TextureView, the image returned by ImageReader is rotated clockwise and flipped vertically on my device, which has a sensor orientation of 270 degrees (a person appears to be lying on their right cheek in the preview, except that the right cheek is the left cheek because of the vertical flip). Is there an accepted way to get both the preview and the saved photos in the same, correct orientation with Camera2?
Cheers.
It'd be helpful if you described how the image is corrupt - do you see a valid image but it's distorted, or is it just total garbage, or just total black?
But I'm guessing you're not paying attention to the row stride of the Y plane (https://developer.android.com/reference/android/media/Image.Plane.html#getRowStride() ), which would typically result in an image that's skewed (vertical lines become angled lines).
When accessing the Y plane, the byte index of pixel (x,y) is:
y * rowStride + x
not
y * width + x
because row stride may be larger than width.
I'd also avoid copying so much; you really don't need the 2D array, and a large byte[] for the image also wastes memory.
You can instead position() the buffer at the start of each output row, and then read only the bytes you need straight into your destination byte[] out with ByteBuffer.get(byte[], offset, length).
That'd look something like
int stride = planes[0].getRowStride();
ByteBuffer img = planes[0].getBuffer();
int firstRowOffset = stride * t + l;
for (int row = 0; row < outh; row++) {
img.position(firstRowOffset + row * stride);
img.get(out, row * outw, outw);
}
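Wrapped up as a standalone helper, the same idea might look like this. It is a sketch, not tested against a real camera: the class and method names are made up, and it is up to the caller to keep left + outW within the image width and top + outH within the image height:

```java
import java.nio.ByteBuffer;

final class PlaneCrop {
    /** Copies an outW x outH window out of a plane whose rows are rowStride bytes apart. */
    static byte[] crop(ByteBuffer plane, int rowStride,
                       int left, int top, int outW, int outH) {
        ByteBuffer src = plane.duplicate();  // own position, shared content
        byte[] out = new byte[outW * outH];
        for (int row = 0; row < outH; row++) {
            // Byte index of pixel (left, top + row) is y * rowStride + x.
            src.position((top + row) * rowStride + left);
            src.get(out, row * outW, outW);
        }
        return out;
    }
}
```

Working on a duplicate() leaves the position of the original buffer alone, so other code reading the same plane is not affected.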
I am developing an application that includes filters and cropping. I am using a cropping library, and the filters use 8*8 LUTs like the sample LUT. I want to crop the filtered image (8*8 LUT).
Here is the logic to crop the image.
Bitmap cropbitmap = ivCropimageView.getCroppedImage();
Using this bitmap I generate a thumbnail bitmap like below.
Bitmap thumbImage = ThumbnailUtils.extractThumbnail(cropbitmap, 190, 250);
When I generate thumbnails for all filters, the thumbnails come out very noisy, like this.
This is the result when I implemented the answer from renderscript.
So if anyone has an idea, please help me.
I'm working on a LUT applier library which eases the use of LUT images in Android. Now it also guesses the color axes of the LUT:
https://github.com/dntks/easyLUT/wiki
It uses the algorithm I mentioned in the other post.
You can go through this; I hope it helps you get the process right.
photo is the main bitmap here.
mLut3D is the array of LUT images stored in drawable
RenderScript mRs;
Bitmap mLutBitmap, mBitmap;
ScriptIntrinsic3DLUT mScriptlut;
Bitmap mOutputBitmap;
Allocation mAllocIn;
Allocation mAllocOut;
Allocation mAllocCube;
int mFilter = 0;
mRs = RenderScript.create(yourActivity.this);
public Bitmap filterapply() {
int redDim, greenDim, blueDim;
int w, h;
int[] lut;
if (mScriptlut == null) {
mScriptlut = ScriptIntrinsic3DLUT.create(mRs, Element.U8_4(mRs));
}
if (mBitmap == null) {
mBitmap = photo;
}
mOutputBitmap = Bitmap.createBitmap(mBitmap.getWidth(),
mBitmap.getHeight(), mBitmap.getConfig());
mAllocIn = Allocation.createFromBitmap(mRs, mBitmap);
mAllocOut = Allocation.createFromBitmap(mRs, mOutputBitmap);
// }
mLutBitmap = BitmapFactory.decodeResource(getResources(),
mLut3D[mFilter]);
w = mLutBitmap.getWidth();
h = mLutBitmap.getHeight();
redDim = w / h;
greenDim = redDim;
blueDim = redDim;
int[] pixels = new int[w * h];
lut = new int[w * h];
mLutBitmap.getPixels(pixels, 0, w, 0, 0, w, h);
int i = 0;
for (int r = 0; r < redDim; r++) {
for (int g = 0; g < greenDim; g++) {
int p = r + g * w;
for (int b = 0; b < blueDim; b++) {
lut[i++] = pixels[p + b * h];
}
}
}
Type.Builder tb = new Type.Builder(mRs, Element.U8_4(mRs));
tb.setX(redDim).setY(greenDim).setZ(blueDim);
Type t = tb.create();
mAllocCube = Allocation.createTyped(mRs, t);
mAllocCube.copyFromUnchecked(lut);
mScriptlut.setLUT(mAllocCube);
mScriptlut.forEach(mAllocIn, mAllocOut);
mAllocOut.copyTo(mOutputBitmap);
return mOutputBitmap;
}
Increase the mFilter value to get a different filter effect from a different LUT image.
You can go through this link on GitHub for more help; I got the answer from here:
https://github.com/RenderScript/RsLutDemo
I hope it helps.
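The index arithmetic in the nested loop above implies a particular LUT image layout: a strip that is dim * dim pixels wide and dim pixels high, where red runs along x inside each dim-wide tile, green runs along y, and each tile holds one blue level. A standalone sketch of just that unpacking step makes the assumption explicit (the class name is hypothetical, and a LUT image with a different layout would need different indexing):

```java
final class LutStrip {
    /**
     * Unpacks a row-major LUT strip (w = dim * dim, h = dim) into the
     * flat order used by the loop above: red outermost, blue innermost.
     */
    static int[] unpack(int[] pixels, int w, int h) {
        int dim = w / h;
        int[] lut = new int[dim * dim * dim];
        int i = 0;
        for (int r = 0; r < dim; r++) {
            for (int g = 0; g < dim; g++) {
                for (int b = 0; b < dim; b++) {
                    int x = r + b * h;       // tile b, column r
                    int y = g;               // row g
                    lut[i++] = pixels[x + y * w];
                }
            }
        }
        return lut;
    }
}
```

If the thumbnails look noisy, comparing the actual LUT image layout against this assumption is one thing worth checking.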
In order to align the intensity values of two grayscale Images (as a first step for further processing) I wrote a Java method that:
1) converts the bitmaps of the two images into two int[] arrays containing the bitmap's intensities (I just take the red component here, since it's grayscale, i.e. r=g=b ).
public static int[] bmpToData(Bitmap bmp){
int width = bmp.getWidth();
int height = bmp.getHeight();
int anzpixel = width*height;
int [] pixels = new int[anzpixel];
int [] data = new int[anzpixel];
bmp.getPixels(pixels, 0, width, 0, 0, width, height);
for (int i = 0 ; i < anzpixel ; i++) {
int p = pixels[i];
int r = (p & 0xff0000) >> 16;
//int g = (p & 0xff00) >> 8;
//int b = p & 0xff;
data[i] = r;
}
return data;
}
2) aligns the cumulated intensity distributions of Bitmap 2 to that of Bitmap 1
// Aligns the intensity distribution of a grayscale picture "moving" (given by int[] data2)
// to the intensity distribution of a reference picture "fixed" (given by int[] data1).
public static int[] histMatch(int[] data1, int[] data2){
int anzpixel = data1.length;
int[] histogram_fixed = new int[256];
int[] histogram_moving = new int[256];
int[] cumhist_fixed = new int[256];
int[] cumhist_moving = new int[256];
int i=0;
int j=0;
//read intensities of fixed und moving in histogram
for (int n = 0; n < anzpixel; n++) {
histogram_fixed[data1[n]]++;
histogram_moving[data2[n]]++;
}
// calc cumulated distributions
cumhist_fixed[0]=histogram_fixed[0];
cumhist_moving[0]=histogram_moving[0];
for ( i=1; i < 256; ++i ) {
cumhist_fixed[i] = cumhist_fixed[i-1]+histogram_fixed[i];
cumhist_moving[i] = cumhist_moving[i-1]+histogram_moving [i];
}
// look-up-table lut[]. For each quantile i of the moving picture search the
// value j of the fixed picture where the quantile is the same as that of moving
int[] lut = new int[anzpixel];
j=0;
for ( i=0; i < 256; ++i ){
while(cumhist_fixed[j]< cumhist_moving[i]){
j++;
}
// check, whether the distance to the next-lower intensity is even lower, and if so, take this value
if ((j!=0) && ((cumhist_fixed[j-1]- cumhist_fixed[i]) < (cumhist_fixed[j]- cumhist_fixed[i]))){
lut[i]= (j-1);
}
else {
lut[i]= (j);
}
}
// apply the lut[] to moving picture.
i=0;
for (int n = 0; n < anzpixel; n++) {
data2[n]=(int) lut[data2[n]];
}
return data2;
}
3) converts the int[] arrays back to Bitmap.
public static Bitmap dataToBitmap(int[] data, int width, int heigth) {
int index=0;
Bitmap bmp = Bitmap.createBitmap(width, heigth, Bitmap.Config.ARGB_8888);
for (int x = 0; x < width; x++) {
for (int y = 0; y < heigth; y++) {
index=y*width+x;
int c = data[index];
bmp.setPixel(x,y,Color.rgb(c, c, c));
}
}
return bmp;
}
While the core procedure 2) is straightforward and fast, the conversion steps 1) and 3) are rather inefficient. It would be great to do the whole thing in Renderscript. But, honestly, I am completely lost because of the missing documentation: while there are many impressive examples of what Renderscript COULD do, I don't see a way to benefit from these possibilities (no books, no docs). Any advice is highly appreciated!
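Independently of Renderscript, step 3) can avoid the per-pixel setPixel calls by packing the intensities into ARGB ints first and handing the whole array over in one call. A minimal sketch of the packing, with a hypothetical class name and assuming intensities in the range 0..255:

```java
final class GrayPack {
    /** Packs gray intensities (0..255) into opaque ARGB ints, same order as the input. */
    static int[] toArgb(int[] gray) {
        int[] argb = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            int c = gray[i] & 0xFF;
            argb[i] = 0xFF000000 | (c << 16) | (c << 8) | c;
        }
        return argb;
    }
}
```

The packed array can then be passed to Bitmap.createBitmap(argb, width, height, Bitmap.Config.ARGB_8888), which takes the pixels in the same row-major order that getPixels produced in step 1).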
As a starting point, use Android Studio to "Import Sample..." and select Basic Render Script. This will give you a working project that we will now modify.
First, let's add more Allocation references to MainActivity. We will use them to communicate image data, histograms and the LUT between Java and Renderscript.
private Allocation mInAllocation;
private Allocation mInAllocation2;
private Allocation[] mOutAllocations;
private Allocation mHistogramAllocation;
private Allocation mHistogramAllocation2;
private Allocation mLUTAllocation;
Then in onCreate() load another image, which you will also need to add to res/drawable/.
mBitmapIn2 = loadBitmap(R.drawable.cat_480x400);
In createScript() create additional allocations:
mInAllocation2 = Allocation.createFromBitmap(mRS, mBitmapIn2);
mHistogramAllocation = Allocation.createSized(mRS, Element.U32(mRS), 256);
mHistogramAllocation2 = Allocation.createSized(mRS, Element.U32(mRS), 256);
mLUTAllocation = Allocation.createSized(mRS, Element.U32(mRS), 256);
And now the main part (in RenderScriptTask):
/*
* Invoke histogram kernel for both images
*/
mScript.bind_histogram(mHistogramAllocation);
mScript.forEach_compute_histogram(mInAllocation);
mScript.bind_histogram(mHistogramAllocation2);
mScript.forEach_compute_histogram(mInAllocation2);
/*
* Variables copied verbatim from your code.
*/
int []histogram_fixed = new int[256];
int []histogram_moving = new int[256];
int[] cumhist_fixed = new int[256];
int[] cumhist_moving = new int[256];
int i=0;
int j=0;
// copy computed histograms to Java side
mHistogramAllocation.copyTo(histogram_fixed);
mHistogramAllocation2.copyTo(histogram_moving);
// your code again...
// calc cumulated distributions
cumhist_fixed[0]=histogram_fixed[0];
cumhist_moving[0]=histogram_moving[0];
for ( i=1; i < 256; ++i ) {
cumhist_fixed[i] = cumhist_fixed[i-1]+histogram_fixed[i];
cumhist_moving[i] = cumhist_moving[i-1]+histogram_moving [i];
}
// look-up-table lut[]. For each quantile i of the moving picture search the
// value j of the fixed picture where the quantile is the same as that of moving
int[] lut = new int[256];
j=0;
for ( i=0; i < 256; ++i ){
while(cumhist_fixed[j]< cumhist_moving[i]){
j++;
}
// check, whether the distance to the next-lower intensity is even lower, and if so, take this value
if ((j!=0) && ((cumhist_fixed[j-1]- cumhist_fixed[i]) < (cumhist_fixed[j]- cumhist_fixed[i]))){
lut[i]= (j-1);
}
else {
lut[i]= (j);
}
}
// copy the LUT to Renderscript side
mLUTAllocation.copyFrom(lut);
mScript.bind_LUT(mLUTAllocation);
// Apply LUT to the destination image
mScript.forEach_apply_histogram(mInAllocation2, mInAllocation2);
/*
* Copy to bitmap and invalidate image view
*/
//mOutAllocations[index].copyTo(mBitmapsOut[index]);
// copy back to Bitmap in preparation for viewing the results
mInAllocation2.copyTo((mBitmapsOut[index]));
A couple of notes:
In your part of the code I also fixed the LUT allocation size: only 256 entries are needed.
As you can see, I left the computation of cumulative histogram and LUT on Java side. These are rather difficult to efficiently parallelize due to data dependencies and small scale of the calculations, but considering the latter I don't think it's a problem.
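The Java-side part can be isolated into a small helper that is easy to test on its own. The sketch below is a variant of the loop above, not a copy: it compares each candidate against the moving image's cumulative value using an absolute distance, and it guards j against running past index 255. Both changes are assumptions about the intended behavior. It also assumes both images have the same number of pixels, and the class and method names are hypothetical:

```java
final class HistogramMatch {
    /** Running sum of a 256-bin histogram. */
    static int[] cumulative(int[] hist) {
        int[] cum = new int[256];
        int sum = 0;
        for (int i = 0; i < 256; i++) {
            sum += hist[i];
            cum[i] = sum;
        }
        return cum;
    }

    /** Builds a 256-entry LUT that maps moving intensities onto the fixed distribution. */
    static int[] buildLut(int[] histFixed, int[] histMoving) {
        int[] cumFixed = cumulative(histFixed);
        int[] cumMoving = cumulative(histMoving);
        int[] lut = new int[256];
        int j = 0;
        for (int i = 0; i < 256; i++) {
            // First fixed intensity whose cumulative count reaches the moving one.
            while (j < 255 && cumFixed[j] < cumMoving[i]) {
                j++;
            }
            // Take the next-lower intensity if its cumulative count is closer.
            if (j > 0 && Math.abs(cumFixed[j - 1] - cumMoving[i])
                       < Math.abs(cumFixed[j] - cumMoving[i])) {
                lut[i] = j - 1;
            } else {
                lut[i] = j;
            }
        }
        return lut;
    }
}
```

The two histograms copied back from the allocations can be passed in directly, and the returned array goes to mLUTAllocation.copyFrom().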
Finally, the Renderscript code. The only non-obvious part is the use of rsAtomicInc() to increase values in histogram bins - this is necessary due to potentially many threads attempting to increase the same bin concurrently.
#pragma version(1)
#pragma rs java_package_name(com.example.android.basicrenderscript)
#pragma rs_fp_relaxed
int32_t *histogram;
int32_t *LUT;
void __attribute__((kernel)) compute_histogram(uchar4 in)
{
volatile int32_t *addr = &histogram[in.r];
rsAtomicInc(addr);
}
uchar4 __attribute__((kernel)) apply_histogram(uchar4 in)
{
uchar val = LUT[in.r];
uchar4 result;
result.r = result.g = result.b = val;
result.a = in.a;
return(result);
}
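When debugging kernels like these, it can help to have a slow plain-Java reference to compare against. The following sketch mirrors what the two kernels compute, on ARGB ints as returned by Bitmap.getPixels; it is a testing aid under that assumption, with a hypothetical class name, and not part of the sample project:

```java
final class KernelReference {
    /** Same result as compute_histogram: counts of the red channel. */
    static int[] histogramOfRed(int[] argb) {
        int[] hist = new int[256];
        for (int p : argb) {
            hist[(p >>> 16) & 0xFF]++;
        }
        return hist;
    }

    /** Same result as apply_histogram: gray value from the LUT, alpha kept. */
    static int[] applyLut(int[] argb, int[] lut) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int v = lut[(p >>> 16) & 0xFF] & 0xFF;
            out[i] = (p & 0xFF000000) | (v << 16) | (v << 8) | v;
        }
        return out;
    }
}
```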