OpenGL ES 3.1, Android.
I have set up an SSBO with the intention of writing something in the fragment shader and reading it back in the application. Things almost work, i.e. I can read back the value I have written, with one issue: when I read an int, its bytes come back reversed (a 17 = 0x00000011 written in the shader comes back as 285212672 = 0x11000000).
Here's how I do it:
Shader
(...)
layout (std140,binding=0) buffer SSBO
{
int ssbocount[];
};
(...)
ssbocount[0] = 17;
(...)
Application code
int SIZE = 40;
int[] mSSBO = new int[1];
ByteBuffer buf = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
(...)
glGenBuffers(1,mSSBO,0);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, mSSBO[0]);
glBufferData(GL_SHADER_STORAGE_BUFFER, SIZE, null, GL_DYNAMIC_READ);
buf = (ByteBuffer) glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, SIZE, GL_MAP_READ_BIT );
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,0, mSSBO[0]);
(...)
int readValue = buf.getInt(0);
Now print out the value, and it comes back with its bytes reversed (285212672 instead of 17).
Notice I DO allocate the ByteBuffer with 'nativeOrder'. Of course, I could manually flip the bytes, but the concern is this would only sometimes work, depending on the endianness of the host machine...
The fix is to use native endianness and create an integer view of the ByteBuffer using ByteBuffer.asIntBuffer(). The reason getInt() seems to ignore the endianness you set is that glMapBufferRange returns a new ByteBuffer object whose order defaults to BIG_ENDIAN; the nativeOrder() set on the buffer from allocateDirect() does not carry over, so the order must be set on the mapped buffer itself.
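A minimal sketch of the fix, reusing the names from the question (the surrounding bind and unmap calls are assumed):
ByteBuffer mapped = (ByteBuffer) glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, SIZE, GL_MAP_READ_BIT);
// The mapped buffer is a NEW ByteBuffer whose order defaults to BIG_ENDIAN.
mapped.order(ByteOrder.nativeOrder());
IntBuffer ints = mapped.asIntBuffer(); // the int view inherits the (now native) order
int readValue = ints.get(0);           // 17 on both little- and big-endian devices
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);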
Related
I'm a bit puzzled by the internal representation of a Bitmap's pixels in a ByteBuffer (testing on ARM/little-endian):
1) In the Java layer I create an ARGB bitmap and fill it with 0xff112233 color:
Bitmap sampleBitmap = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
Canvas canvas = new Canvas(sampleBitmap);
Paint paint = new Paint();
paint.setStyle(Paint.Style.FILL);
paint.setColor(Color.rgb(0x11,0x22, 0x33));
canvas.drawRect(0,0, sampleBitmap.getWidth(), sampleBitmap.getHeight(), paint);
To test: sampleBitmap.getPixel(0,0) indeed returns 0xff112233, which matches the ARGB pixel format.
2) The bitmap is packed into direct ByteBuffer before passing to the native layer:
final int byteSize = sampleBitmap.getAllocationByteCount();
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(byteSize);
//byteBuffer.order(ByteOrder.LITTLE_ENDIAN);// See below
sampleBitmap.copyPixelsToBuffer(byteBuffer);
To test: regardless of the buffer's order setting, in the debugger I see a byte layout that doesn't quite match ARGB, but looks more like big-endian RGBA (or little-endian ABGR!?):
byteBuffer.rewind();
final byte [] out = new byte[4];
byteBuffer.get(out, 0, out.length);
out = {byte[4]#12852}
0 = (0x11)
1 = (0x22)
2 = (0x33)
3 = (0xFF)
Now, I'm passing this bitmap to the native layer where I must extract the pixels, and I would expect Bitmap.Config.ARGB_8888 to be represented, depending on the buffer's byte order, as:
a) byteBuffer.order(ByteOrder.LITTLE_ENDIAN):
out = {byte[4]#12852}
0 = (0x33)
1 = (0x22)
2 = (0x11)
3 = (0xFF)
or
b) byteBuffer.order(ByteOrder.BIG_ENDIAN):
out = {byte[4]#12852}
0 = (0xFF)
1 = (0x11)
2 = (0x22)
3 = (0x33)
I can make the pixel-extraction code work based on the above output, but I don't like it, since I can't explain the behaviour, which I hope someone will :)
Thanks!
Let's take a look at the implementation. Both getPixel and copyPixelsToBuffer just call their native counterparts.
Bitmap_getPixels specifies an output format:
SkImageInfo dstInfo = SkImageInfo::Make(1, 1, kBGRA_8888_SkColorType, kUnpremul_SkAlphaType, sRGB);
bitmap.readPixels(dstInfo, &dst, dstInfo.minRowBytes(), x, y);
It effectively asks the bitmap to return the pixel value converted to BGRA_8888 (which becomes ARGB when those bytes are read back as a Java int, because of the different native and Java endianness).
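As a quick worked example of that parenthetical (a sketch, not framework code):
byte[] bgra = { 0x33, 0x22, 0x11, (byte) 0xFF }; // BGRA_8888 bytes for our pixel
int argb = ByteBuffer.wrap(bgra).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
// argb == 0xff112233, i.e. the Java-level ARGB value that getPixel() returns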
Bitmap_copyPixelsToBuffer in its turn just copies raw data:
memcpy(abp.pointer(), src, bitmap.computeByteSize());
It does no conversion: it returns the data in the same format it uses to store it. Let's find out what this internal format is.
Bitmap_creator is used to create a new bitmap, and it gets the format from the passed config by calling
SkColorType colorType = GraphicsJNI::legacyBitmapConfigToColorType(configHandle);
Looking at the legacyBitmapConfigToColorType implementation, ARGB_8888 (which has index 5) becomes kN32_SkColorType.
kN32_SkColorType comes from the Skia library, and looking at the definitions we find the comment:
kN32_SkColorType is an alias for whichever 32bit ARGB format is the
"native" form for skia's blitters. Use this if you don't have a swizzle
preference for 32bit pixels.
and below is the definition:
#if SK_PMCOLOR_BYTE_ORDER(B,G,R,A)
kN32_SkColorType = kBGRA_8888_SkColorType,
#elif SK_PMCOLOR_BYTE_ORDER(R,G,B,A)
kN32_SkColorType = kRGBA_8888_SkColorType,
SK_PMCOLOR_BYTE_ORDER is defined in Skia's headers, and SK_PMCOLOR_BYTE_ORDER(R,G,B,A) is true on a little-endian machine, which is our case. So the bitmap is stored internally in kRGBA_8888_SkColorType format.
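A small self-check of that conclusion (a sketch; it assumes a little-endian device like the one tested above):
Bitmap bmp = Bitmap.createBitmap(1, 1, Bitmap.Config.ARGB_8888);
bmp.eraseColor(Color.rgb(0x11, 0x22, 0x33)); // alpha = 0xFF
ByteBuffer raw = ByteBuffer.allocateDirect(bmp.getAllocationByteCount());
bmp.copyPixelsToBuffer(raw);
raw.rewind();
// Expect 11 22 33 ff in memory order, i.e. kRGBA_8888; the buffer's order
// setting is irrelevant because single-byte get() calls ignore it.
Log.d("PixelOrder", String.format("%02x %02x %02x %02x",
        raw.get(0) & 0xFF, raw.get(1) & 0xFF, raw.get(2) & 0xFF, raw.get(3) & 0xFF));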
The TensorFlow Lite GPU delegate documentation describes a faster method for running TFLite inference using OpenGL and an SSBO on Android [3]. The documentation provides sample code to create and bind an SSBO for an image already in GPU memory. How can we copy or convert an image from the Android live camera into an SSBO using OpenGL shader code? When we just dump CPU memory into an SSBO, performance becomes worse than normal GPU delegate execution. So what is the proper or most efficient way to pass a camera image to an SSBO so as to make TFLite inference faster?
In the following code we have tried to convert the camera frame to a bitmap, then convert that to a texture, and finally copy it to an SSBO. However, this method is comparatively slower than the normal GPU delegate execution pipeline (where data is copied from CPU to GPU, with the associated overhead). The aim is to reduce the CPU-to-GPU copying of image data by making the image data available in GPU memory and then passing it to the model.
We are able to run the model [1] at 40-50 ms using the standard GPU delegate inference mechanism, whereas it takes 90-100 ms using the aforesaid SSBO method (unoptimized). The above timing refers to the time for running the interpreter.run() method in TensorFlow Lite. Also, it looks like this SSBO mechanism only works with OpenGL ES 3.1 or higher.
The ideal use case (as suggested by TensorFlow) is the following [2]:
1. Get the camera input in the form of a surface texture.
2. Create an OpenGL shader storage buffer object (SSBO).
3. Use GpuDelegate.bindGlBufferToTensor() to associate that SSBO with the input tensor.
4. Write a small shader program to dump the surface texture of step 1 into the SSBO of step 2 efficiently.
5. Run inference.
We are able to get camera frames as raw bytes, or convert them into a texture, and even render that to a GLSurfaceView. But we are unable to achieve the speedup suggested by TensorFlow.
[1] https://github.com/tensorflow/tensorflow/issues/26297
[2] https://github.com/tensorflow/tensorflow/issues/25657#issuecomment-466489248
[3] https://www.tensorflow.org/lite/performance/gpu_advanced#android_2
Android Code:
public int[] initializeShaderBuffer(){
android.opengl.EGLContext eglContext = eglGetCurrentContext();
int[] id = new int[1];
GLES31.glGenBuffers(id.length, id, 0);
GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, id[0]);
GLES31.glBufferData(GL_SHADER_STORAGE_BUFFER, 257*257*3*4, null, GLES31.GL_STREAM_COPY);
GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);// unbind
return id;
}
@Override
public void onSurfaceCreated(GL10 glUnused, EGLConfig config) {
.....
.....
mTextureDataHandle0 = TextureHelper.loadTexture(mActivityContext,
R.drawable.srcim);//No error
}
@Override
public void onDrawFrame(GL10 glUnused) {
int inputSsboId = initializeShaderBuffer()[0];
interpreter = new Interpreter(GLActivity.tfliteModel);
Tensor inputTensor = interpreter.getInputTensor(0);
GpuDelegate gpuDelegate = new GpuDelegate();
gpuDelegate.bindGlBufferToTensor(inputTensor, inputSsboId);
interpreter.modifyGraphWithDelegate(gpuDelegate);
final int computeShaderHandle = ShaderHelper.compileShader(
GLES31.GL_COMPUTE_SHADER, fragmentShader);//No error
mProgramHandle = ShaderHelper.createAndLinkProgram(vertexShaderHandle,
computeShaderHandle);//No error
mTextureUniformHandle0 = GLES31.glGetUniformLocation(mProgramHandle,
"u_Texture0");
/**
* First texture map
*/
// Set the active texture0 unit to texture unit 0.
GLES31.glActiveTexture(GLES31.GL_TEXTURE0 );
// Bind the texture to this unit.
GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, mTextureDataHandle0);
// Tell the texture uniform sampler to use this texture in the shader by
// binding to texture unit 0.
GLES31.glUniform1i(mTextureUniformHandle0, 0);
GLES31.glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, inputSsboId, 0, 257*257*3*4);
GLES31.glUseProgram(mProgramHandle);
if(compute==1)//Always set to 1
GLES31.glDispatchCompute(16,16,1);
GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind
GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, 0); // unbind
//Tflite code ...
byte [][] outputArray = new byte [1][66049];//size based on model output
Log.d("GPU_CALL_RUN","DONE");
long oms1=System.currentTimeMillis();
interpreter.run(null,outputArray);
long cms1=System.currentTimeMillis();
Log.d("TIME_RUN_MODEL",""+(cms1-oms1));
Log.d("OUTVAL", Arrays.deepToString(outputArray));
}
Compute Shader:
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D u_Texture0;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
//if (gid.x >= 257 || gid.y >= 257) return;
vec3 pixel = texelFetch(u_Texture0, gid, 0).xyz;
int linear_index = 3 * (gid.y * 257 + gid.x);
output_data.elements[linear_index + 0] = pixel.x;
output_data.elements[linear_index + 1] = pixel.y;
output_data.elements[linear_index + 2] = pixel.z;
}
There is no simple way to dump a SurfaceTexture to an SSBO directly. The simplest path would be SurfaceTexture -> GlTexture -> SSBO. The TFLite GPU team is also trying to introduce another API (bindGlTextureToTensor), but until that is available, here is a shader program I used for the GlTexture -> SSBO conversion:
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D input_texture;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
if (gid.x >= 224 || gid.y >= 224) return;
vec3 pixel = texelFetch(input_texture, gid, 0).xyz;
int linear_index = 3 * (gid.y * 224 + gid.x);
output_data.elements[linear_index + 0] = pixel.x;
output_data.elements[linear_index + 1] = pixel.y;
output_data.elements[linear_index + 2] = pixel.z;
}
Note that this was for MobileNet v1, with an input tensor size of 224x224x3.
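For completeness, a hedged sketch of the dispatch side in Java (the program, texture, and SSBO ids are assumed to have been created as elsewhere on this page; the workgroup rounding and the memory barrier are the details most often missed):
int groups = (224 + 15) / 16; // local_size is 16x16, so round up to whole workgroups (= 14)
GLES31.glUseProgram(textureToSsboProgram);                  // assumed: linked from the shader above
GLES31.glActiveTexture(GLES31.GL_TEXTURE0);
GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, inputTexture2d); // assumed: camera frame already copied into a 2D texture
GLES31.glBindBufferBase(GLES31.GL_SHADER_STORAGE_BUFFER, 1, inputSsboId);
GLES31.glDispatchCompute(groups, groups, 1);
// Make the SSBO writes visible before TFLite reads the buffer in interpreter.run().
GLES31.glMemoryBarrier(GLES31.GL_SHADER_STORAGE_BARRIER_BIT);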
I'm using the Camera2 API to save JPEG images to disk. I currently get 3-4 fps on my Nexus 5X; I'd like to improve that to 20-30. Is it possible?
Changing the image format to YUV, I manage to generate 30 fps. Is it possible to save them at this frame rate, or should I give up and live with my 3-4 fps?
Obviously I can share code if needed, but if everyone agrees that it's not possible, I'll just give up. Using the NDK (with libjpeg, for instance) is an option (but obviously I'd prefer to avoid it...).
Thanks
EDIT: here is how I convert the YUV android.media.Image to a single byte[]:
private byte[] toByteArray(Image image, File destination) {
ByteBuffer buffer0 = image.getPlanes()[0].getBuffer();
ByteBuffer buffer2 = image.getPlanes()[2].getBuffer();
int buffer0_size = buffer0.remaining();
int buffer2_size = buffer2.remaining();
byte[] bytes = new byte[buffer0_size + buffer2_size];
buffer0.get(bytes, 0, buffer0_size);
buffer2.get(bytes, buffer0_size, buffer2_size);
return bytes;
}
EDIT 2: another method I found to convert the YUV image into a byte[]:
private byte[] toByteArray(Image image, File destination) {
Image.Plane yPlane = image.getPlanes()[0];
Image.Plane uPlane = image.getPlanes()[1];
Image.Plane vPlane = image.getPlanes()[2];
int ySize = yPlane.getBuffer().remaining();
// be aware that this size does not include the padding at the end, if there is any
// (e.g. if pixel stride is 2 the size is ySize / 2 - 1)
int uSize = uPlane.getBuffer().remaining();
int vSize = vPlane.getBuffer().remaining();
byte[] data = new byte[ySize + (ySize/2)];
yPlane.getBuffer().get(data, 0, ySize);
ByteBuffer ub = uPlane.getBuffer();
ByteBuffer vb = vPlane.getBuffer();
int uvPixelStride = uPlane.getPixelStride(); //stride guaranteed to be the same for u and v planes
if (uvPixelStride == 1) {
uPlane.getBuffer().get(data, ySize, uSize);
vPlane.getBuffer().get(data, ySize + uSize, vSize);
}
else {
// if pixel stride is 2 there is padding between each pixel
// converting it to NV21 by filling the gaps of the v plane with the u values
vb.get(data, ySize, vSize);
for (int i = 0; i < uSize; i += 2) {
data[ySize + i + 1] = ub.get(i);
}
}
return data;
}
The dedicated JPEG encoder units on mobile phones are efficient, but not generally optimized for throughput. (Historically, users took one photo every second or two). At full resolution, the 5X's camera pipeline will not generate JPEGs at faster than a few FPS.
If you need higher rates, you need to capture in uncompressed YUV. As mentioned by CommonsWare, there's not enough disk bandwidth to stream full-resolution uncompressed YUV to disk, so you can only hold on to some number of frames before you run out of memory.
You can use libjpeg-turbo or some other high-efficiency JPEG encoder and see how many frames per second you can compress yourself; this may be higher than the hardware JPEG unit. The simplest way to maximize the rate is to capture YUV at 30fps and run some number of JPEG encoding threads in parallel. For maximum speed, you'll want to hand-write the code talking to the JPEG encoder, because your source data is YUV, not the RGB that most JPEG encoding interfaces accept (even though the colorspace of an encoded JPEG is typically YUV as well).
Whenever an encoder thread finishes the previous frame, it can grab the next frame that comes from the camera (you can maintain a small circular buffer of the latest YUV Images to make this simpler).
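As a hedged sketch of that threading structure (encodeYuvToJpeg() and writeToDisk() are hypothetical stand-ins for a libjpeg-turbo JNI wrapper and your file I/O, not Android APIs):
import android.media.Image;
import android.media.ImageReader;
import java.util.concurrent.*;
final class ParallelJpegEncoder {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final BlockingQueue<Image> frames = new ArrayBlockingQueue<>(8);
    ParallelJpegEncoder() {
        for (int i = 0; i < 4; i++) pool.submit(this::encodeLoop);
    }
    // Called from the ImageReader's OnImageAvailableListener.
    void onFrame(ImageReader reader) {
        Image img = reader.acquireLatestImage();
        if (img != null && !frames.offer(img)) {
            img.close(); // queue full: drop the frame rather than stall the camera
        }
    }
    private void encodeLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Image img = frames.take();
                byte[] jpeg = encodeYuvToJpeg(img); // hypothetical JNI call
                writeToDisk(jpeg);                  // hypothetical file I/O
                img.close();                        // frees the slot in the ImageReader
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    private native byte[] encodeYuvToJpeg(Image img); // hypothetical
    private void writeToDisk(byte[] jpeg) { /* ... */ }
}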
I'm trying to save, through JNI, the output of the camera as modified by OpenGL ES 2 on my tablet.
To achieve this, I use the libjpeg library compiled with NDK-r8b.
I use the following code:
In the rendering function:
renderImage();
if (iIsPictureRequired)
{
savePicture();
iIsPictureRequired=false;
}
The saving procedure:
bool Image::savePicture()
{
bool l_res =false;
char p_filename[]={"/sdcard/Pictures/testPic.jpg"};
// Allocates the image buffer (RGBA)
int l_size = iWidth*iHeight*4*sizeof(GLubyte);
GLubyte *l_image = (GLubyte*)malloc(l_size);
if (l_image==NULL)
{
LOGE("Image::savePicture:could not allocate %d bytes",l_size);
return l_res;
}
// Reads pixels from the color buffer (byte-aligned)
glPixelStorei(GL_PACK_ALIGNMENT, 1);
checkGlError("glPixelStorei");
// Saves the pixel buffer
glReadPixels(0,0,iWidth,iHeight,GL_RGBA,GL_UNSIGNED_BYTE,l_image);
checkGlError("glReadPixels");
// Stores the file
FILE* l_file = fopen(p_filename, "wb");
if (l_file==NULL)
{
LOGE("Image::savePicture:could not create %s:errno=%d",p_filename,errno);
free(l_image);
return l_res;
}
// JPEG structures
struct jpeg_compress_struct cinfo;
struct jpeg_error_mgr jerr;
cinfo.err = jpeg_std_error(&jerr);
jerr.trace_level = 10;
jpeg_create_compress(&cinfo);
jpeg_stdio_dest(&cinfo, l_file);
cinfo.image_width = iWidth;
cinfo.image_height = iHeight;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_RGB;
jpeg_set_defaults(&cinfo);
// Image quality [0..100]
jpeg_set_quality (&cinfo, 70, true);
jpeg_start_compress(&cinfo, true);
// Saves the buffer
JSAMPROW row_pointer[1]; // pointer to a single row
// JPEG stores the image from top to bottom (OpenGL does the opposite)
while (cinfo.next_scanline < cinfo.image_height)
{
row_pointer[0] = (JSAMPROW)&l_image[(cinfo.image_height-1-cinfo.next_scanline)* (cinfo.input_components)*iWidth];
jpeg_write_scanlines(&cinfo, row_pointer, 1);
}
// End of the process
jpeg_finish_compress(&cinfo);
fclose(l_file);
free(l_image);
l_res =true;
return l_res;
}
The display is correct, but the generated JPEG appears tripled, with the copies overlapping from left to right.
What did I do wrong?
It appears that the internal formats of the JPEG lib and the canvas do not match: one reads/encodes RGBRGBRGB, the other RGBARGBARGBA.
You might be able to rearrange the image data, if everything else fails...
GLubyte *dst_ptr = l_image;
GLubyte *src_ptr = l_image;
for (int i = 0; i < iWidth * iHeight; i++)
{
    *dst_ptr++ = *src_ptr++; // R
    *dst_ptr++ = *src_ptr++; // G
    *dst_ptr++ = *src_ptr++; // B
    src_ptr++;               // skip A
}
EDIT: now that the cause is verified, there is an even simpler modification. You might be able to get the data from the GL pixel buffer in the correct format directly:
int l_size = iWidth*iHeight*3*sizeof(GLubyte);
...
glReadPixels(0,0,iWidth,iHeight,GL_RGB,GL_UNSIGNED_BYTE,l_image);
And one more piece of warning: if this compiles but the output is tilted, it means that your screen width is not a multiple of 4, and OpenGL wants to start each new row on a 4-byte boundary (GL_PACK_ALIGNMENT defaults to 4; the code above sets it to 1, which avoids this). In that case there is also a good chance of a crash, because l_size would need to be 1 to 3 bytes per row larger than expected.
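The padding arithmetic, written out as a sketch (in Java for consistency with the rest of this page's app code; the same formula applies to the C code above):
int packAlignment = 4;                 // OpenGL's default; glPixelStorei(GL_PACK_ALIGNMENT, 1) above disables padding
int unpaddedRow = iWidth * 3;          // GL_RGB, GL_UNSIGNED_BYTE
int paddedRow = (unpaddedRow + packAlignment - 1) / packAlignment * packAlignment;
int safeSize = paddedRow * iHeight;    // allocate this many bytes when alignment > 1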
I am trying to access the raw data of a Bitmap in ARGB_8888 format on Android, using the copyPixelsToBuffer and copyPixelsFromBuffer methods. However, invocation of those calls seems to always apply the alpha channel to the RGB channels. I need the raw data in a byte[] or similar (to pass through JNI; yes, I know about bitmap.h in Android 2.2, but I cannot use that).
Here is a sample:
// Create 1x1 Bitmap with alpha channel, 8 bits per channel
Bitmap one = Bitmap.createBitmap(1,1,Bitmap.Config.ARGB_8888);
one.setPixel(0,0,0xef234567);
Log.v("?","hasAlpha() = "+Boolean.toString(one.hasAlpha()));
Log.v("?","pixel before = "+Integer.toHexString(one.getPixel(0,0)));
// Copy Bitmap to buffer
byte[] store = new byte[4];
ByteBuffer buffer = ByteBuffer.wrap(store);
one.copyPixelsToBuffer(buffer);
// Change value of the pixel
int value=buffer.getInt(0);
Log.v("?", "value before = "+Integer.toHexString(value));
value = (value >> 8) | 0xffffff00;
buffer.putInt(0, value);
value=buffer.getInt(0);
Log.v("?", "value after = "+Integer.toHexString(value));
// Copy buffer back to Bitmap
buffer.position(0);
one.copyPixelsFromBuffer(buffer);
Log.v("?","pixel after = "+Integer.toHexString(one.getPixel(0,0)));
The log then shows
hasAlpha() = true
pixel before = ef234567
value before = 214161ef
value after = ffffff61
pixel after = 619e9e9e
I understand that the order of the ARGB channels is different; that's fine. But I don't want the alpha channel to be applied on every copy (which is what it seems to be doing).
Is this how copyPixelsToBuffer and copyPixelsFromBuffer are supposed to work? Is there any way to get the raw data in a byte[]?
Added in response to answer below:
Putting buffer.order(ByteOrder.nativeOrder()); before the copyPixelsToBuffer call does change the result, but still not in the way I want:
pixel before = ef234567
value before = ef614121
value after = ffffff41
pixel after = ff41ffff
It seems to suffer from essentially the same problem (alpha being applied on each copyPixelsFrom/ToBuffer).
One way to access the data in a Bitmap is to use the getPixels() method. Below is an example I used to get a grayscale image from ARGB data, and then to go back from a byte array to a Bitmap (of course, if you need RGB you reserve 3x the bytes and save them all...):
/*Free to use licence by Sami Varjo (but nice if you retain this line)*/
public final class BitmapConverter {
private BitmapConverter(){};
/**
* Get grayscale data from argb image to byte array
*/
public static byte[] ARGB2Gray(Bitmap img)
{
int width = img.getWidth();
int height = img.getHeight();
int[] pixels = new int[height*width];
byte grayIm[] = new byte[height*width];
img.getPixels(pixels,0,width,0,0,width,height);
int pixel=0;
int count=width*height;
while(count-->0){
int inVal = pixels[pixel];
//Get the pixel channel values from int
double r = (double)( (inVal & 0x00ff0000)>>16 );
double g = (double)( (inVal & 0x0000ff00)>>8 );
double b = (double)( inVal & 0x000000ff) ;
grayIm[pixel++] = (byte)( 0.2989*r + 0.5870*g + 0.1140*b );
}
return grayIm;
}
/**
* Create a gray scale bitmap from byte array
*/
public static Bitmap gray2ARGB(byte[] data, int width, int height)
{
int count = height*width;
int[] outPix = new int[count];
int pixel=0;
while(count-->0){
int val = data[pixel] & 0xff; //convert byte to unsigned
outPix[pixel++] = 0xff000000 | val << 16 | val << 8 | val ;
}
Bitmap out = Bitmap.createBitmap(outPix,0,width,width, height, Bitmap.Config.ARGB_8888);
return out;
}
}
My guess is that this might have to do with the byte order of the ByteBuffer you are using. ByteBuffer uses big endian by default.
Set the endianness on the buffer with
buffer.order(ByteOrder.nativeOrder());
See if it helps.
Moreover, copyPixelsFromBuffer/copyPixelsToBuffer do not change the pixel data in any way; the bytes are copied raw.
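In fact, the numbers in the question are consistent with the answers below: the data is premultiplied when setPixel() stores it, before any copy happens. A quick check of the arithmetic (a sketch of the rounding, not framework code):
int a = 0xef, r = 0x23, g = 0x45, b = 0x67; // setPixel(0,0,0xef234567)
int pr = (r * a + 127) / 255;               // 0x21
int pg = (g * a + 127) / 255;               // 0x41
int pb = (b * a + 127) / 255;               // 0x61
// Stored RGBA bytes: 21 41 61 ef; read as a native little-endian int this is
// 0xef614121, exactly the "value before" printed in the question's update.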
I realize this is very stale and probably won't help you now, but I came across this recently while trying to get copyPixelsFromBuffer to work in my app. (Thank you for asking this question, btw! You saved me tons of time in debugging.) I'm adding this answer in the hope that it helps others like me going forward...
Although I haven't used it yet to confirm that it works, it looks like, as of API Level 19, we finally have a way to specify not to "apply the alpha" (a.k.a. premultiply) within Bitmap: the setPremultiplied(boolean) method, which should help in situations like this by allowing us to pass false.
I hope this helps!
This is an old question, but I ran into the same issue and just figured out that the bitmap bytes are premultiplied. As of API 19 you can set the bitmap not to premultiply the buffer, but the API makes no guarantees.
From the docs:
public final void setPremultiplied(boolean premultiplied)
Sets whether the bitmap should treat its data as pre-multiplied.
Bitmaps are always treated as pre-multiplied by the view system and Canvas for performance reasons. Storing un-pre-multiplied data in a Bitmap (through setPixel, setPixels, or BitmapFactory.Options.inPremultiplied) can lead to incorrect blending if drawn by the framework.
This method will not affect the behaviour of a bitmap without an alpha channel, or if hasAlpha() returns false.
Calling createBitmap or createScaledBitmap with a source Bitmap whose colors are not pre-multiplied may result in a RuntimeException, since those functions require drawing the source, which is not supported for un-pre-multiplied Bitmaps.
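A minimal sketch of using that API on the question's 1x1 example (hedged: per the docs above, the framework makes no guarantees, and such a bitmap must not be drawn):
Bitmap one = Bitmap.createBitmap(1, 1, Bitmap.Config.ARGB_8888);
one.setPremultiplied(false);      // store values as-is; skip the premultiply step
one.setPixel(0, 0, 0xef234567);
ByteBuffer buffer = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
one.copyPixelsToBuffer(buffer);   // bytes should now hold the raw, unmultiplied channels
buffer.rewind();
one.copyPixelsFromBuffer(buffer); // and the round trip should no longer rescale them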