PBO on Android doesn't improve glReadPixels performance

PBO on Android doesn't improve glReadPixels performance - android

I want to take screenshot of current frame in OpenGL for further processing and I'm trying to improve the performance of glReadPixels by using PBO to asynchronously read framebuffers.
I'm under the impression that glReadPixels after GL_PIXEL_PACK_BUFFER is bound to buffer should return immediately, but it actually takes similar or even more time than not using PBO.
Here are samples of my codes:
// Setup PBO
GLES30.glGenBuffers(nPbo, pboIndex, 0);
for(int i=0;i<nPbo; i++){
GLES30.glBindBuffer (GL_PIXEL_PACK_BUFFER, pboIndex[i]);
GLES30.glBufferData(GL_PIXEL_PACK_BUFFER, size, null,GL_STREAM_READ);
}
GLES30.glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
......
// For each frame, trigger async transfer of framebuffer to PBO.
// Note that I don't even map the PBO to memory yet
GLES30.glBindBuffer (GL_PIXEL_PACK_BUFFER, pboIndex[index]);
// The following is a JNI method to overload glReadPixels in GLES20.glReadPixels,
// to allow passing int offset to the last param in order to use PBO,
// and slowdown (around 500ms on my device) happens here
GLES3PBOReadPixelsFix.glReadPixelsPBO(0, 0, mWidth, mHeight, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
GLES30.glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
Based on this article, the cause of the slowdown could be due to conversion between internal format, which may be GL_BGRA, and pixel transfer format, which is GL_RGBA in my code. Changing the transfer format to GL_RGB will reduce the latency of glReadPixels to around 100ms, but when I map the buffer with GLES30.glMapBufferRange the output frame doesn't look rendered correctly. I also tried the GL_BGRA format in GLES11Ext but it will cause GL_INVALID_OPERATION in glReadPixel.
Is there any other way to make glReadPixels on Android return immediately so that PBO can improve performance?

As Reto has suggested, it turns out to be an implementation specific issue. The GPU that I was originally testing with is Adreno 306. When I test the same codes on Samsung Note 4 (Adreno 420), it works as expected. So it's always worthwhile to test on different devices and GPUs for such types of issues.

Related

glReadPixels to EGLImage direct texture slower than glReadPixels to ByteBuffer and glTexSubImage2D?

I have an Android OpenGL-ES application featuring two threads. Call Thread 1 the "display thread" which "blends" its current texture with a texture emanating from Thread 2 a.k.a the "worker thread". Thread 2 performs off-screen rendering (render to texture), and then Thread 1 combines this texture with it's own texture to generate the frame which is displayed to the user.
I have a working solution but I know it is inefficient and am trying to improve upon it. In it's OnSurfaceCreated() method, Thread 1 creates two textures. Thread 2, in it's draw method, does a glReadPixels() into a ByteBuffer (let's refer to it as bb). Thread 2 then signals to Thread 1 that a new frame is ready, at which point Thread 1 invokes glTexSubImage2D(bb) to update it's texture with the new data from Thread 2, and proceed with it's "blending" in order to generate a new frame.
This architecture works better on some Android devices than others, and I have been able to garner a slight improvement in performance by using PBOs. But I figured that by using so-called "direct textures" via the EGL Image extension (https://software.intel.com/en-us/articles/using-opengl-es-to-accelerate-apps-with-legacy-2d-guis) I would gain some benefit by removing the need for the costly glTexSubImage2D() call. Yes, I'd still have the glReadPixels() call which bothers me still but at least I should measure some improvement. In fact, at least on a Samsung Galaxy Tab S (Mali T628 GPU) my new code is dramatically slower than before! How can this be?
In the new code Thread 1 instantiates the EGLImage object by using gralloc and proceeds to bind it to a texture:
// note gbuffer::create() is a wrapper around gralloc
buffer = gbuffer::create(width, height, gbuffer::FORMAT_RGBA_8888);
EGLClientBuffer anb = buffer->getNativeBuffer();
EGLImageKHR pEGLImage = _eglCreateImageKHR(eglGetCurrentDisplay(), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID, (EGLClientBuffer)anb, attrs);
glBindTexture(GL_TEXTURE_2D, texid); // texid from glGenTextures(...)
_glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, pEGLImage);
Then Thread 2 in it's main loop does it's off-screen render-to-texture stuff and essentially pushes the data back over to Thread 1 via glReadPixels() with the destination address as the backing storage behind the EGLImage:
void* vaddr = buffer->lock();
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, vaddr);
buffer->unlock();
How can this be slower than glReadPixels() into a ByteBuffer followed by glTexSubImage2D from the aforementioned ByteBuffer? I'm also interested in alternative techniques as I am not limited to OpenGL-ES 2.0 and can use OpenGL-ES 3.0. I have tried FBOs but ran into some issues.
In response to the first answer I decided to take a stab at implementing a different approach. Namely, sharing the textures between Thread 1 and Thread 2. While I don't have the sharing part working yet, I do have Thread 1's EGLContext passed down to Thread 2's EGLContext so that in theory, Thread 2 can share textures with Thread 1. With these changes in place, and with the glReadPixels() and glTexSubImage2D() calls remaining, the app works but is far slower than before. Strange.
The other oddity I uncovered is the deal with the difference between javax.microedition.khronos.egl.EGLContext and android.opengl.EGLContext. GLSurfaceView exposes an interface method setEGLContextFactory() which allows me to pass Thread 1's EGLContext to Thread 2, as in the following:
public Thread1SurfaceView extends GLSurfaceView {
public Thread1SurfaceView(Context context) {
super(context);
// here is how I pass Thread 1's EGLContext to Thread 2
setEGLContextFactory(new EGLContextFactory() {
#Override
public javax.microedition.khronos.egl.EGLContext createContext(
final javax.microedition.khronos.egl.EGL10 egl,
final javax.microedition.khronos.egl.EGLDisplay display,
final javax.microedition.khronos.egl.EGLConfig eglConfig) {
// Configure context for OpenGL ES 3.0.
int[] attrib_list = {EGL14.EGL_CONTEXT_CLIENT_VERSION, 3, EGL14.EGL_NONE};
javax.microedition.khronos.egl.EGLContext renderContext =
egl.eglCreateContextdisplay, eglConfig, EGL10.EGL_NO_CONTEXT, attrib_list);
mThread2 = new Thread2(renderContext);
}
});
}
Previously, I used stuff out of the EGL14 namespace but since the interface for GLSurfaceView apparently relies on EGL10 stuff I had to change the implementation for Thread 2. Everywhere I used EGL14 I replaced with javax.microedition.khronos.egl.EGL10. Then my shaders stopped compiling until I added GLES3 to the attribute list. Now things work, albeit slower than before (but next I will remove the calls to glReadPixels and glTexSubImage2D).
My follow-on question is, is this the right way to handle the javax.microedition.khronos.egl.* versus android.opengl.* issue? Can I typecast javax.microedition.khronos.egl.EGL10 to android.opengl.EGL14, javax.microedition.khronos.egl.EGLDisplay to android.opengl.EGLDisplay, and javax.microedition.khronos.egl.EGLContext to android.opengl.EGLContext? What I have right now just seems ugly and doesn't feel right, although this proposed casting doesn't sit right either. Am I missing something?

Based on the description of what you're trying to do, both approaches sound way more complicated and inefficient than necessary.
The way I've always understood EGLImage, it's a mechanism for sharing images between different processes, and possibly different APIs.
For multiple OpenGL ES contexts in the same process, you can simply share the textures. All you need to do is make both contexts part of the same share group, and they can both use the same textures. In your use case, you can then have one thread rendering to a texture using an FBO, and then sample from it in the other thread. This way, there is no extra data copying, and your code should become much simpler.
The only slightly tricky aspect is synchronization. ES 2.0 does not have synchronization mechanisms in the API. The best you can probably do is call glFinish() in one thread (e.g. after your thread 2 finished rendering to the texture), and then use standard IPC mechanisms to signal the other thread. ES 3.0 has sync objects, which make this much more elegant.
My earlier answer here sketches some of the steps needed to create multiple contexts that are in the same share group: about opengles and texture on android. The key part of creating multiple contexts in the same share group is the 3rd argument to eglCreateContext, where you specify the context that you want to share objects with.

Display a stream of bitmaps at 60fps smoothly on Android 4.x

(This is due to the limitations of the server software I will be using, if I could change it, I would).
I am receiving a sequence of 720x480 JPEG files (about 6kb in size), over a socket. I have benchmarked the network, and have found that I am capable of receiving those JPEGs smoothly, at 60FPS.
My current drawing operation is on a Nexus 10 display of 2560x1600, and here's my decoding method, once I have received the byte array from the socket:
public static void decode(byte[] tmp, Long time) {
try {
BitmapFactory.Options options = new BitmapFactory.Options();
options.inPreferQualityOverSpeed = false;
options.inDither = false;
Bitmap bitmap = BitmapFactory.decodeByteArray(tmp, 0, tmp.length, options);
Bitmap background = Bitmap.createScaledBitmap
(bitmap, MainActivity.screenwidth, MainActivity.screenheight, false);
background.setHasAlpha(false);
Canvas canvas = MainActivity.surface.getHolder().lockCanvas();
canvas.drawColor(Color.BLACK);
canvas.drawBitmap(background, 0, 0, new Paint());
MainActivity.surface.getHolder().unlockCanvasAndPost(canvas);
} catch (Exception e) {
e.printStackTrace();
}
}
As you can see, I am clearing the canvas from a SurfaceView, and then drawing the Bitmap to the SurfaceView. My issue is that it is very, very, slow.
Some tests based on adding System.currentTimeMillis() before and after the lock operation result in approximately a 30ms difference between getting the canvas, drawing the bitmap, and then pushing the canvas back. The displayed SurfaceView is very laggy, sometimes it jumps back and forth, and the frame rate is terrible.
Is there a referred method for drawing like this? Again, I can't modify what I'm getting from the server, but I'd like the bitmaps to be displayed at 60FPS when possible.
(I've tried setting the contents of an ImageView, and am receiving similar results). I have no other code in the SurfaceView that could impact this. I have set the holder to the RGBA_8888 format:
getHolder().setFormat(PixelFormat.RGBA_8888);
Is it possible to convert this stream of Bitmaps into a VideoView? Would that be faster?
Thanks.

Whenever you run into performance questions, use Traceview to figure out exactly where your problem lies. Using System.currentTimeMillis() is like attempting to trim a steak with a hammer.
The #1 thing her is to get the bitmap decoding off the main application thread. Do that in a background thread. Your main application thread should just be drawing the bitmaps, pulling them off of a queue populated by that background thread. Android has the main application thread set to render on a 60fps basis as of Android 4.1 (a.k.a., "Project Butter"), so as long as you can draw your Bitmap in a couple of milliseconds, and assuming that your network and decoding can keep your queue current, you should get 60fps results.
Also, always use inBitmap with BitmapFactory.Options on Android 3.0+ when you have images of consistent size, as part of your problem will be GC stealing CPU time. Work off a pool of Bitmap objects that you rotate through, so that you generate less garbage and do not fragment your heap so much.
I suspect that you are better served letting Android scale the image for you in an ImageView (or just by drawing to a View canvas) than you are in having BitmapFactory scale the image, as Android can take advantage of hardware graphics acceleration for rendering, which BitmapFactory cannot. Again, Traceview is your friend here.
With regards to:
and have found that I am capable of receiving those JPEGs smoothly, at 60FPS.
that will only be true sometimes. Mobile devices tend to be mobile. Assuming that by "6kb" you mean 6KB (six kilobytes), you are assuming a ~3Mbps (three megabits per second) connection, and that's far from certain.
With regards to:
Is it possible to convert this stream of Bitmaps into a VideoView?
VideoView is a widget that plays videos, and you do not have a video.
Push come to shove, you might need to drop down to the NDK and do this in native code, though I would hope not.

Using Pixel Buffer Objects (PBO's) on Android

On Android, I'm trying to perform some OpenGL processing on camera frames, show those frames in the camera preview and then encode the frames in a video file. I'm trying to do this with OpenGL, using the GLSurfaceView and GLSurfaceView.Renderer and with FFMPEG for video encoding.
I've successfully processed the image frames using a shader. Now I need to encode the processed frames to video. The GLSurfaceView.Renderer provides the onDrawFrame(GL10 ..) method. It's in this method that I'm attempting to read the image frames using just glReadPixels() and then place the frames on a queue for encoding to video. On it's own, glReadPixels() is much too slow - my frame rate is in the single digits. I'm attempting to speed this up using Pixel Buffer Objects. This is not working. After plugging in the pbo, the frame rate is unchanged. This is my first time using OpenGL and I do not know where to begin looking for the problem. Am I doing this right? Can anyone give me some direction? Thanks in advance.
public class MainRenderer implements GLSurfaceView.Renderer, SurfaceTexture.OnFrameAvailableListener {
.
.
public void onDrawFrame ( GL10 gl10 ) {
//Create a buffer to hold the image frame
ByteBuffer byte_buffer = ByteBuffer.allocateDirect(this.width * this.height * 4);
byte_buffer.order(ByteOrder.nativeOrder());
//Generate a pointer to the frame buffers
IntBuffer image_buffers = IntBuffer.allocate(1);
GLES20.glGenBuffers(1, image_buffers);
//Create the buffer
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, image_buffers.get(0));
GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, byte_buffer.limit(), byte_buffer, GLES20.GL_STATIC_DRAW);
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, image_buffers.get(0));
//Read the pixel data into the buffer
gl10.glReadPixels(0, 0, this.width, this.height, GL10.GL_RGBA, GL10.GL_UNSIGNED_BYTE, byte_buffer);
//encode the frame to video
enQueueForEncoding(byte_buffer);
//unbind the buffer
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, 0);
}
.
.
}

I have never tried something like that before (opengl+video enconding) but I can tell you that reading from device memory is SLOW. Try double buffering, this may help since the GPU can keep rendering to the second buffer while the DMA controller reads back stuff.
Load a profiler (check your devices' GPU vendor), this may give you some idea. Another thing that may help is setting internal pbuffer format to something else, try lower numbers and dropping a channel (alpha).
EDIT: If you feel like that, you can encode the video at the GPU, that's going to boost, memory and processing wise, your application.

As I remember glBufferData() is not mapping your internal buffer onto GPU memory, it just copies data from your memory into the buffer (initializes).
To get access to the memory, which is allocated by glBufferData(), you should use glMapBufferRange(). That function returns a Java Buffer object which you can read.

Handling large bitmaps on Android - int[] larger than max heap size

I'm using very large bitmaps and I store data in a big int[]. The images can be really large and I can't downsample them (I'm getting the bitmaps over the wire and rendering them).
The problem I'm hitting is on very large bitmaps (bitmap size = 64MB), where I try to allocate int array with size 16384000. I'm testing this on Samsung Galaxy SII, which should have enough memory to handle this, but it seems there is a "cap" on heap size. The method Runtime.getRuntime().maxMemory() returns 64MB, so this is the max heap size for this particular device.
The API level is set to 10, so I can't use android:largeHeap attribute suggested elsewhere (and I don't even know if that would help).
Is there any way to allocate more than 64MB? I tried allocating the array in native (using JNI NewIntArray function), but that fails as well. It seems as though it is bound by the same limit as jvm.
I could however allocate memory on the native side using NewDirectByteBuffer, but since this byte buffer is not backed by an array, I can not get access to int[] (using asIntBuffer().array() in java which I need in order to display the image using setPixels or createBitmap. I guess OpenGL would be a way to go, but I have (so far) 0 experience with OpenGL.
Is there a way to somehow access allocated memory as int[] that I am missing?

So, the only way I've found so far is to allocate image using NDK. Furthermore, since Bitmap does not use existing Buffer as pixel "storage" (the method copyPixelsFromBuffer is also bound to memory limits; and judging by the method name, it copies the data anyway).
The solution (I've only prototyped it roughly) is to malloc whatever the size of the image is, and fill it using c/c++ and then use ByteBuffer in Java with OpenGLES.
The current prototype creates a simple plane and applies this image as a texture to it (luckily, OpenGLES methods take Buffer as input, which seems to work as expected). I'm using glTexImage2D to apply this buffer as a texture to a plane. Here is a sample, where mImageData is ByteBuffer allocated (and filled) on the native side.
int[] textures = new int[1];
gl.glGenTextures(1, textures, 0);
mTextureId = textures[0];
gl.glBindTexture(GL10.GL_TEXTURE_2D, mTextureId);
gl.glTexParameterf(GL10.GL_TEXTURE_2D, GL10.GL_TEXTURE_WRAP_T, GL10.GL_REPEAT);
gl.glTexImage2D(GL10.GL_TEXTURE_2D, 0, GL10.GL_RGBA, 4000, 4096, 0, GL10.GL_RGBA, GL10.GL_UNSIGNED_BYTE, mImageData);

I assume OP already solved this, but if you have a Stream, you can stream the bitmap into a file and read that file using inSampleSize

Android OpenGL buffering and glFlush

So I'm making an Android 2.2 app that uses GLSurfaceView. My question is, since OpenGL tends to buffer commands, does that mean it requires associated memory (e.g. the bitmap in a call to glTexSubImage2D() ) to stick around until it is done? Or does it make itself a copy of any memory needed for buffered commands?
I ask as this code tends to cause a long stall and an eventual crash on hardware (HTC Desrire) but not on the emulator:
//bm is a Bitmap stored in a vector from previous commands
//pt is a Point stored at the same time
GLUtils.texSubImage2D(GL10.GL_TEXTURE_2D, 0, pt.x, pt.y, bm);
bm.recycle();
Now if I add glFlush() like so:
//bm is a Bitmap stored in a vector from previous commands
//pt is a Point stored at the same time
GLUtils.texSubImage2D(GL10.GL_TEXTURE_2D, 0, pt.x, pt.y, bm);
**AGL.glFlush();** //or glFinish
bm.recycle();
It appears to work great. Now is this an actual functionality for glFlush/glFinish, to prevent memory from being cleared out from underneath OpenGL?

Good question. With Textures, and Vertex Buffer Objects incidentally, once you have called the methods that actually load the image data for the Texture, you do not need to hold onto the buffer/array you have in client memory.
Don't mistake the rendering buffer with memory associated with Textures. The Texture graphics are saved in memory for the GPU, they are not part of the buffer that is being flushed to finally draw.

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.