On Android, I'm trying to perform some OpenGL processing on camera frames, show those frames in the camera preview, and then encode the frames into a video file. I'm trying to do this with OpenGL, using GLSurfaceView and GLSurfaceView.Renderer, and with FFmpeg for the video encoding.
I've successfully processed the image frames using a shader. Now I need to encode the processed frames to video. GLSurfaceView.Renderer provides the onDrawFrame(GL10 ..) method, and it's in this method that I'm attempting to read the image frames with glReadPixels() and then place them on a queue for encoding to video. On its own, glReadPixels() is much too slow - my frame rate is in the single digits. I'm attempting to speed this up using a Pixel Buffer Object, but it isn't working: after plugging in the PBO, the frame rate is unchanged. This is my first time using OpenGL and I don't know where to begin looking for the problem. Am I doing this right? Can anyone give me some direction? Thanks in advance.
public class MainRenderer implements GLSurfaceView.Renderer, SurfaceTexture.OnFrameAvailableListener {
.
.
    public void onDrawFrame(GL10 gl10) {
        //Create a buffer to hold the image frame
        ByteBuffer byte_buffer = ByteBuffer.allocateDirect(this.width * this.height * 4);
        byte_buffer.order(ByteOrder.nativeOrder());

        //Generate a pointer to the frame buffers
        IntBuffer image_buffers = IntBuffer.allocate(1);
        GLES20.glGenBuffers(1, image_buffers);

        //Create the buffer
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, image_buffers.get(0));
        GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, byte_buffer.limit(), byte_buffer, GLES20.GL_STATIC_DRAW);
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, image_buffers.get(0));

        //Read the pixel data into the buffer
        gl10.glReadPixels(0, 0, this.width, this.height, GL10.GL_RGBA, GL10.GL_UNSIGNED_BYTE, byte_buffer);

        //encode the frame to video
        enQueueForEncoding(byte_buffer);

        //unbind the buffer
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, 0);
    }
.
.
}
I have never tried something like that before (OpenGL + video encoding), but I can tell you that reading from device memory is SLOW. Try double buffering: the GPU can keep rendering into the second buffer while the DMA controller reads back the first.
Load a profiler (check your device's GPU vendor); it may give you some idea of where the time goes. Another thing that may help is setting the internal pbuffer format to something else - try lower bit depths and dropping a channel (alpha).
EDIT: If you feel like it, you can encode the video on the GPU; that will boost your application both memory- and processing-wise.
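If you go that route, the usual approach on Android (API 18+) is MediaCodec with a Surface input rather than FFmpeg, so frames never leave the GPU. A rough sketch, with width, height and the bitrate as placeholder values:

MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
format.setInteger(MediaFormat.KEY_BIT_RATE, 4000000);
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
// Render your processed frames into this Surface via an EGL window surface;
// the encoder consumes them directly, no glReadPixels involved.
Surface encoderInput = encoder.createInputSurface();
encoder.start();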
As I remember, glBufferData() does not map your buffer into GPU memory; it just copies data from your memory into the buffer object (initializes it).
To get access to the memory allocated by glBufferData(), you should use glMapBufferRange(). That function returns a Java Buffer object which you can read.
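To illustrate, here is a rough sketch of the usual pixel-pack PBO pattern with two buffers (GLES 3.0; the int-offset variant of glReadPixels is only in the Java bindings from API 24, older releases need a small JNI shim). The buffers are created once, the read is issued into one PBO, and the PBO filled on the previous frame is mapped, so the CPU never waits on the transfer it just started. enQueueForEncoding is the question's own queue method:

// One-time setup: two PBOs, each big enough for one RGBA frame.
int[] pbo = new int[2];
GLES30.glGenBuffers(2, pbo, 0);
for (int i = 0; i < 2; i++) {
    GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo[i]);
    GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, width * height * 4, null, GLES30.GL_STREAM_READ);
}
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);

// Per frame: start an async read into pbo[index] ...
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo[index]);
GLES30.glReadPixels(0, 0, width, height, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);

// ... then map the PBO written one frame ago and hand it to the encoder queue.
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo[1 - index]);
ByteBuffer pixels = (ByteBuffer) GLES30.glMapBufferRange(
        GLES30.GL_PIXEL_PACK_BUFFER, 0, width * height * 4, GLES30.GL_MAP_READ_BIT);
if (pixels != null) {
    enQueueForEncoding(pixels); // copy the data out before unmapping if the queue is asynchronous
    GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
}
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
index = 1 - index;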
I have a FloatBuffer as an output from a neural network, where the RGB channels are encoded as values in [-1 .. +1]. I would like to render them on-screen, using GLSurfaceView. What is the best way to handle this?
I can dump the buffer into an SSBO and write a compute shader which maps it to a ByteBuffer in the [0 .. 255] range, then somehow bind that to a regular texture. Or maybe I can set up my compute shader to output directly to some texture buffer? Or maybe I am supposed to read my SSBO directly from the fragment shader (and implement my own linear interpolation)?
So, what is the best way to render this via OpenGL ES? Please help.
You can try to load it into a texture, but it depends on how many updates you need per second - that's something to test on your machine.
First bind your texture (you must create one first), then when your input buffer is ready use:
GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, width, height, GLES30.GL_RGB, GLES20.GL_FLOAT, InputFloatBuffer);
It works well with a ByteBuffer; I did not try with float data, but there is no signed-float format.
Use a kernel to convert the signed floats to bytes.
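For example, a minimal sketch of that flow, assuming an RGB byte texture of your output size and an rgbBytes ByteBuffer produced by such a kernel:

// One-time setup: create a texture and allocate its storage.
int[] tex = new int[1];
GLES20.glGenTextures(1, tex, 0);
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, tex[0]);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
GLES20.glPixelStorei(GLES20.GL_UNPACK_ALIGNMENT, 1); // rows of width*3 bytes are not 4-aligned
GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGB, width, height, 0,
        GLES20.GL_RGB, GLES20.GL_UNSIGNED_BYTE, null);

// Whenever a new frame is ready: overwrite the existing storage and redraw.
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, tex[0]);
GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, width, height,
        GLES20.GL_RGB, GLES20.GL_UNSIGNED_BYTE, rgbBytes);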
I want to take a screenshot of the current frame in OpenGL for further processing, and I'm trying to improve the performance of glReadPixels by using a PBO to read the framebuffer asynchronously.
I'm under the impression that glReadPixels should return immediately once a GL_PIXEL_PACK_BUFFER is bound, but it actually takes similar or even more time than not using a PBO.
Here are samples of my code:
// Setup PBOs
GLES30.glGenBuffers(nPbo, pboIndex, 0);
for (int i = 0; i < nPbo; i++) {
    GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pboIndex[i]);
    GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, size, null, GLES30.GL_STREAM_READ);
}
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
......
// For each frame, trigger an async transfer of the framebuffer into the PBO.
// Note that I don't even map the PBO to memory yet.
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pboIndex[index]);
// The following is a JNI method that overloads GLES20.glReadPixels
// to allow passing an int offset as the last param in order to use the PBO;
// the slowdown (around 500 ms on my device) happens here.
GLES3PBOReadPixelsFix.glReadPixelsPBO(0, 0, mWidth, mHeight, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
Based on this article, the cause of the slowdown could be a conversion between the internal format, which may be GL_BGRA, and the pixel transfer format, which is GL_RGBA in my code. Changing the transfer format to GL_RGB reduces the latency of glReadPixels to around 100 ms, but when I map the buffer with GLES30.glMapBufferRange the output frame doesn't look correctly rendered. I also tried the GL_BGRA format from GLES11Ext, but it causes GL_INVALID_OPERATION in glReadPixels.
Is there any other way to make glReadPixels on Android return immediately so that PBO can improve performance?
As Reto suggested, it turns out to be an implementation-specific issue. The GPU I was originally testing on is an Adreno 306. When I test the same code on a Samsung Note 4 (Adreno 420), it works as expected. So it's always worthwhile to test on different devices and GPUs for this type of issue.
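One more note on the format-conversion point above: GLES lets you query which read-back format/type pair the implementation prefers, so you can check it instead of guessing. A short sketch (GL_RGBA/GL_UNSIGNED_BYTE is always legal; the queried pair is the second, usually faster, option; TAG is assumed to be your log tag):

// Ask the driver which format/type pair glReadPixels can return without a conversion pass.
int[] fmt = new int[1];
int[] type = new int[1];
GLES20.glGetIntegerv(GLES20.GL_IMPLEMENTATION_COLOR_READ_FORMAT, fmt, 0);
GLES20.glGetIntegerv(GLES20.GL_IMPLEMENTATION_COLOR_READ_TYPE, type, 0);
Log.d(TAG, "Preferred readback format=0x" + Integer.toHexString(fmt[0])
        + " type=0x" + Integer.toHexString(type[0]));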
I am trying to apply face detection on camera preview frames. I am using OpenGL and OpenCV to process these camera frames at run-time.
@Override
public void onDrawFrame(GL10 unused) {
    if (VERBOSE) {
        Log.d(TAG, "onDrawFrame tex=" + mTextureId);
    }
    mSurfaceTexture.updateTexImage();
    mSurfaceTexture.getTransformMatrix(mSTMatrix);
    // TODO: need to implement
    //JniCppManager.processFrame();
    drawFrame(mTextureId, mSTMatrix);
}
I am trying to implement processFrame() in C++. How can I get a Mat object in C++ from the transformation matrix? Could anyone provide me with some pointers to a solution?
Your pipeline is currently:
Camera (produces frame)
SurfaceTexture (receives frame, converts to GLES "external" texture)
[missing stuff]
Array of RGB bytes passed to C++
What you need to do for [missing stuff] is render the pixels to an off-screen pbuffer and read them back with glReadPixels(). You can do this from code written in Java or native; for the former you'd want to read them into a "direct" ByteBuffer so you can easily access the pixels from native code. The EGL context used by GLES is held in thread-local storage, so the native code running on the GLSurfaceView render thread will be able to access it.
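A rough sketch of that readback step (the buffer is allocated once and reused; here processFrame is assumed to be extended to take the buffer and its dimensions, which is not the signature in the question):

// Allocate once: a direct ByteBuffer whose backing memory native code can see.
ByteBuffer pixelBuf = ByteBuffer.allocateDirect(width * height * 4)
        .order(ByteOrder.nativeOrder());

// Each frame, after rendering the external texture into the off-screen surface:
pixelBuf.rewind();
GLES20.glReadPixels(0, 0, width, height, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixelBuf);
// On the native side, GetDirectBufferAddress() yields a pointer that can be
// wrapped in cv::Mat(height, width, CV_8UC4, ptr) without copying.
JniCppManager.processFrame(pixelBuf, width, height);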
An example of this can be found in the bigflake ExtractMpegFramesTest, which differs primarily in that it's grabbing frames from a video rather than a Camera.
For API 19+, if you can process frames in YV12 or NV21 rather than RGB, you can feed the Camera to an ImageReader and get access to the data without having to copy/convert it.
I'm using RenderScript and Allocation to obtain YUV_420_888 frames from the Android Camera2 API, but once I copy the byte[] from the Allocation I receive only the Y plane of the 3 planes which compose the frame, while the U and V plane values are set to 0 in the byte[]. I'm trying to mimic onPreviewFrame from the previous camera API in order to perform in-app processing of the camera frames. My Allocation is created like this:
Type.Builder yuvTypeBuilderIn = new Type.Builder(rs, Element.YUV(rs));
yuvTypeBuilderIn.setX(dimensions.getWidth());
yuvTypeBuilderIn.setY(dimensions.getHeight());
yuvTypeBuilderIn.setYuvFormat(ImageFormat.YUV_420_888);
allocation = Allocation.createTyped(rs, yuvTypeBuilderIn.create(),
        Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
while my script looks like:
#pragma version(1)
#pragma rs java_package_name(my_package)
#pragma rs_fp_relaxed
rs_allocation my_frame;
The Android sample app HdrViewfinderDemo uses RenderScript to process YUV data from camera2.
https://github.com/googlesamples/android-HdrViewfinder
Specifically, the ViewfinderProcessor sets up the Allocations, and hdr_merge.rs reads from them.
Yes, I did this myself, since I couldn't find anything useful. But I didn't go the proposed way of defining an Allocation backed by a Surface. Instead I just converted the output of the three image planes to RGB. The reason for this approach is that I use the YUV_420_888 data in two ways: first, on a high-frequency basis, just the intensity values (Y); second, I need to make some color Bitmaps too. Hence the following solution. The script takes about 80 ms for a 1280x720 YUV_420_888 image - maybe not ultra fast, but OK for my purpose.
UPDATE: I deleted the code here, since I wrote a more general solution (YUV_420_888 -> Bitmap conversion) that takes pixelStride and rowStride into account too.
I think that you can use an ImageReader to get the frames of your camera in YUV_420_888:
reader = ImageReader.newInstance(previewSize.getWidth(), previewSize.getHeight(), ImageFormat.YUV_420_888, 2);
Then you set an OnImageAvailableListener on the reader:
reader.setOnImageAvailableListener(new ImageReader.OnImageAvailableListener() {
    @Override
    public void onImageAvailable(ImageReader reader) {
        int jump = 4; // The number of frames to skip before processing one, to free up memory
        Image readImage = reader.acquireNextImage();
        Image.Plane yPlane = readImage.getPlanes()[0]; // The Y plane
        Image.Plane uPlane = readImage.getPlanes()[1]; // The U plane
        Image.Plane vPlane = readImage.getPlanes()[2]; // The V plane
        readImage.close();
    }
}, null);
Hope that will help you
I'm using almost the same method as widea in their answer.
The exception you keep getting after ~50 frames might be due to the fact that you're processing every frame by using acquireNextImage. The documentation suggests:
Warning: Consider using acquireLatestImage() instead, as it will automatically release older images, and allow slower-running processing routines to catch up to the newest frame. [..]
So in case your exception is an IllegalStateException, switching to acquireLatestImage might help.
And make sure you call close() on all images retrieved from ImageReader.
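Putting both points together, a small sketch of the listener (only the Y plane is copied here; U and V are planes 1 and 2 and follow the same pattern, but with a pixelStride that may be 2):

@Override
public void onImageAvailable(ImageReader reader) {
    Image image = reader.acquireLatestImage(); // older frames are released automatically
    if (image == null) return;
    try {
        Image.Plane yPlane = image.getPlanes()[0];
        ByteBuffer yBuf = yPlane.getBuffer();
        int rowStride = yPlane.getRowStride(); // may be larger than the image width
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] y = new byte[width * height];
        for (int row = 0; row < height; row++) {
            yBuf.position(row * rowStride);
            yBuf.get(y, row * width, width);
        }
        // ... process y[] here ...
    } finally {
        image.close(); // always hand the Image back to the reader
    }
}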
This is how I made an array of triangles:
float[] tableVerticesWithTriangle = {
// triangle 1
0f, 0f, 9f, 14f, 0f, 14f,
// triangle 2
0f, 0f, 9f, 0f, 9f, 14f
};
and this is how I have allocated the block in the native environment:
vertexData = ByteBuffer
        .allocateDirect(tableVerticesWithTriangle.length * BYTES_PER_FLOAT)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer();
vertexData.put(tableVerticesWithTriangle);
The reason people use ByteBuffer.allocateDirect() is that other buffer classes, like FloatBuffer, do not have an allocateDirect() method. Only ByteBuffer can be allocated as a direct buffer. So allocating a ByteBuffer, and then using the memory as a FloatBuffer, is the only way to get a directly allocated FloatBuffer.
What is a direct buffer?
The documentation of isDirect() of the FloatBuffer class explains it like this:
Indicates whether this buffer is direct. A direct buffer will try its best to take advantage of native memory APIs and it may not stay in the Java heap, so it is not affected by garbage collection.
A float buffer is direct if it is based on a byte buffer and the byte buffer is direct.
In other (less formal) words, a native buffer is a native memory allocation that Java is not messing with.
When are direct buffers required?
Strangely enough, I have never been able to find clear documentation for this. So the following is a hypothesis that I confirmed with experiments, without finding any counter-examples so far.
Direct buffers have to be used when a buffer is passed to an OpenGL API where the memory is used by the OpenGL implementation after the call returns.
There is only one example of this I could find: client side vertex arrays (which BTW are marked as a legacy feature in ES 3.0, but still supported). This is the glVertexAttribPointer() call with the following signature, which supports vertex arrays without the use of VBOs:
glVertexAttribPointer(int indx, int size, int type, boolean normalized,
int stride, Buffer ptr)
In this case, OpenGL will pull vertex data from the buffer in later draw calls, so the buffer content has to remain accessible to OpenGL after the call returns, and will potentially be read directly by the GPU.
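For instance, drawing the question's two triangles as a client-side vertex array would look roughly like this (aPositionLocation is a placeholder for the attribute location from your shader program), and it only works reliably because vertexData is direct:

// OpenGL keeps reading from vertexData after this call, so the buffer must be
// direct and must stay alive until the draw calls that use it have executed.
vertexData.position(0);
GLES20.glVertexAttribPointer(aPositionLocation, 2, GLES20.GL_FLOAT, false, 0, vertexData);
GLES20.glEnableVertexAttribArray(aPositionLocation);
GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, 6);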
In all other cases (again according to my hypothesis), it is not necessary to use direct buffers. You can for example do the following:
float[] vertexData = {...};
GLES20.glBufferData(GL_ARRAY_BUFFER, vertexData.length * 4,
FloatBuffer.wrap(vertexData), GLES20.GL_STATIC_DRAW);
The glBufferData() call consumes the data during the call, and the original buffer can not be accessed by OpenGL after the call returns. Therefore, it is not necessary to use a direct buffer.