How to properly use glDiscardFramebufferEXT - android

This question relates to the OpenGL ES 2.0 Extension EXT_discard_framebuffer.
It is unclear to me which cases justify the use of this extension. If I call glDiscardFramebufferEXT() and it puts the specified attachable images into an undefined state, this means one of the following:
- I don't care about the content anymore since it has been used with glReadPixels() already,
- I don't care about the content anymore since it has been used with glCopyTexSubImage() already,
- I shouldn't have made the render in the first place.
Clearly, only the first two cases make sense. Or are there other cases in which glDiscardFramebufferEXT() is useful? If so, which ones?

glDiscardFramebufferEXT is a performance hint to the driver. Mobile GPUs use tile-based (deferred) rendering. In that context, marking any of your framebuffer attachments as discarded saves the GPU work and memory bandwidth, because it does not need to write them back to main memory.
Typically you will discard:
- the depth buffer, as it is not presented on screen; it is only used during rendering on the GPU,
- the MSAA buffer, as it is resolved to a smaller buffer for presenting to the screen.
More generally, any buffer that is only used during rendering on the GPU should be discarded so it is not written back to main memory.
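As an illustration, here is a minimal sketch of the end-of-frame discard, assuming an application FBO with color, depth and stencil attachments (the endFrame name and the runtime function-pointer lookup are illustrative, not taken from the question):

#include <EGL/egl.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

void endFrame(GLuint fbo)
{
    // The extension entry point is usually fetched at runtime on Android.
    static PFNGLDISCARDFRAMEBUFFEREXTPROC discardFramebuffer =
        (PFNGLDISCARDFRAMEBUFFEREXTPROC)eglGetProcAddress("glDiscardFramebufferEXT");

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    // ... draw calls for this frame ...

    // Color is still needed (it will be presented or sampled later), but the
    // depth and stencil contents only matter while rasterizing this frame, so
    // tell the driver not to write them back to main memory.
    const GLenum toDiscard[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };
    if (discardFramebuffer)
        discardFramebuffer(GL_FRAMEBUFFER, 2, toDiscard);
}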

The main situation where I've seen DiscardFramebuffer used is when you have a multi-sampled renderbuffer that you just resolved to a texture using BlitFramebuffer or ResolveMultisampleFramebufferAPPLE (on iOS) in which case you no longer care about the contents of the original buffer.
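On OpenGL ES 3.0, where glBlitFramebuffer is core and glInvalidateFramebuffer supersedes the EXT entry point, that resolve-then-discard pattern might look roughly like this (a sketch; the FBO names and attachment list are illustrative):

#include <GLES3/gl3.h>

// Resolve a multisampled FBO into a single-sampled one, then tell the driver
// the multisampled contents no longer need to reach main memory.
void resolveAndDiscard(GLuint msaaFbo, GLuint resolveFbo, int width, int height)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, msaaFbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFbo);
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);

    // The resolved copy lives in resolveFbo; the original multisampled color
    // and depth are no longer needed.
    const GLenum attachments[] = { GL_COLOR_ATTACHMENT0, GL_DEPTH_ATTACHMENT };
    glInvalidateFramebuffer(GL_READ_FRAMEBUFFER, 2, attachments);
}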

Related

OpenGL: Fetch depth on Tiled GPU with color-only fetch extension

I want to fetch depth from tile local memory with OpenGL ES.
That is needed to implement a soft-particles effect and color-only deferred decals in a video game.
ARM_shader_framebuffer_fetch_depth_stencil extension works great and gives direct access to depth value.
Now I want to achieve the same result with ARM_shader_framebuffer_fetch and EXT_shader_framebuffer_fetch.
In the GDC talk "Bringing Fortnite to Mobile with Vulkan and OpenGL ES" I saw that one possible solution is writing the depth value to alpha.
This approach doesn't work for me because of major precision loss; my alpha channel is only 8 bits.
I am considering adding a second attachment with enough precision and writing to it with MRT.
The question is: is MRT the way to go, or am I missing some important trick?
Implementations that support ARM_shader_framebuffer_fetch are not guaranteed to support MRT at all. If they do support it, then only the color of attachment zero can be retrieved. There are also some restrictions around color format; e.g. the extension only supports unorm color formats, so it likely cannot provide enough precision for depth even if you put the depth value in attachment zero.
Using EXT_shader_framebuffer_fetch is more generic and adds full MRT support, but not all tile-based GPUs support it.
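If you do go the MRT route on a GPU that supports it, the attachment setup might look roughly like the sketch below (ES 3.0 style; the R32F format and the function name are illustrative, and rendering to float color targets itself requires EXT_color_buffer_float):

#include <GLES3/gl3.h>

// Create an FBO with a normal color attachment plus a high-precision single-
// channel attachment that the fragment shader writes linear depth into.
GLuint createSceneFbo(int w, int h, GLuint* outColorTex, GLuint* outDepthTex)
{
    GLuint tex[2];
    glGenTextures(2, tex);

    glBindTexture(GL_TEXTURE_2D, tex[0]);                 // ordinary color
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h);

    glBindTexture(GL_TEXTURE_2D, tex[1]);                 // depth stored as color
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32F, w, h);

    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex[0], 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, tex[1], 0);

    // Route fragment shader outputs 0 and 1 to the two attachments.
    const GLenum bufs[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
    glDrawBuffers(2, bufs);

    *outColorTex = tex[0];
    *outDepthTex = tex[1];
    return fbo;
}

A real depth renderbuffer would still be attached for depth testing; it is omitted here for brevity.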

Most efficient way of creating large textures at runtime in OpenGL ES for Android

I'm working on an Android app built in Unity3D that needs to create new textures at runtime every so often, based on different images' pixel data.
Since Unity for Android uses OpenGL ES and my app is a graphical one that needs to run at ideally a solid 60 frames per second, I've created a C++ plugin that uses OpenGL calls directly instead of Unity's slow Texture2D texture construction. The plugin allows me to upload the pixel data to a new OpenGL texture, then let Unity know about it through Texture2D's CreateExternalTexture() function.
Since OpenGL ES in this setup is unfortunately single-threaded, to stay within the frame budget I call glTexImage2D() in the first frame with an already generated texture ID but null data, and then call glTexSubImage2D() with successive sections of my pixel buffer over multiple subsequent frames to fill out the whole texture, essentially doing the texture creation synchronously but chunking the operation up over multiple frames (sketched below)!
Now, the problem I'm having is that every time I create a new texture with large dimensions, that very first glTexImage2D() call will still lead to a frame-out, even though I'm putting null data into it. I'm guessing that the reason for this is that there is still a pretty large memory allocation going on in the background with that first glTexImage2D() call, even though I'm not filling in the image until later.
Unfortunately, these images that I'm creating textures for are of varying sizes that I don't know of beforehand and so I can't just create a bunch of textures up front on load, I need to specify a new width and height with each new texture every time. =(
Is there any way I can avoid this memory allocation, maybe by allocating a huge block of memory at the start and using it as a pool for new textures? I've read around and people seem to suggest using FBOs instead? I may have misunderstood, but it seems to me that you still need a glTexImage2D() call to allocate the texture before attaching it to the FBO?
Any and all advice is welcome, thanks in advance! =)
PS: I don't come from a Graphics background, so I'm not aware of best practices with OpenGL or other graphics libraries, I'm just trying to create new textures at runtime without framing out!
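For reference, the chunked upload described in the question might look roughly like this (a sketch only; the PendingUpload struct, rowsPerFrame, and the RGBA format are illustrative, not taken from the question's plugin):

#include <algorithm>
#include <cstddef>
#include <GLES2/gl2.h>

struct PendingUpload
{
    GLuint tex = 0;
    int width = 0, height = 0, nextRow = 0;
    const unsigned char* pixels = nullptr;   // tightly packed RGBA source data
};

void beginUpload(PendingUpload& u)
{
    glGenTextures(1, &u.tex);
    glBindTexture(GL_TEXTURE_2D, u.tex);
    // First frame: allocate storage only; this is where the large allocation happens.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, u.width, u.height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
}

// Call once per frame until it returns true (upload finished).
bool continueUpload(PendingUpload& u, int rowsPerFrame)
{
    if (u.nextRow >= u.height)
        return true;
    const int rows = std::min(rowsPerFrame, u.height - u.nextRow);
    glBindTexture(GL_TEXTURE_2D, u.tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, u.nextRow, u.width, rows,
                    GL_RGBA, GL_UNSIGNED_BYTE,
                    u.pixels + static_cast<size_t>(u.nextRow) * u.width * 4);
    u.nextRow += rows;
    return u.nextRow >= u.height;
}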
I haven't dealt with the specific problem you've faced, but I've found texture pools to be immensely useful in OpenGL in terms of getting efficient results without having to put much thought into it.
In my case the problem was that I can't use the same texture for an input to a deferred shader as the texture used to output the results. Yet I often wanted to do just that:
// Make the texture blurry.
blur(texture);
Yet instead I was having to create 11 different textures with varying resolutions and having to swap between them as inputs and outputs for horizontal/vertical blur shaders with FBOs to get a decent-looking blur. I never liked GPU programming very much because some of the most complex state management I've ever encountered was often there. It felt incredibly wrong that I needed to go to the drawing board just to figure out how to minimize the number of textures allocated due to this fundamental requirement that texture inputs for shaders cannot also be used as texture outputs.
So I created a texture pool and OMG, it simplified things so much! It made it so I could just create temporary texture objects left and right and not worry about it, because destroying a texture object doesn't actually call glDeleteTextures; it simply returns it to the pool. So I was finally able to just do:
blur(texture);
... as I wanted all along. And for some funny reason, when I started using the pool more and more, it sped up frame rates. I guess even with all the thought I put into minimizing the number of textures being allocated, I was still allocating more than I needed in ways the pool eliminated (note that the actual real-world example does a whole lot more than blurs including DOF, bloom, hipass, lowpass, CMAA, etc, and the GLSL code is actually generated on the fly based on a visual programming language the users can use to create new shaders on the fly).
So I really recommend starting with exploring that idea. It sounds like it would be helpful for your problem. In my case I used this:
struct GlTextureDesc
{
...
};
... and it's a pretty hefty structure given how many texture parameters we can specify (pixel format, number of color components, LOD level, width, height, etc. etc.).
Yet the structure is comparable and hashable, and it ends up being used as the key in a hash table (like unordered_multimap), with the actual texture handle as the associated value.
That allows us to then do this:
// Provides a pool of textures. This allows us to conveniently and rapidly
// create and destroy texture objects without allocating and freeing an
// excessive number of textures.
class GlTexturePool
{
public:
    // Creates an empty pool.
    GlTexturePool();

    // Cleans up any textures which haven't been accessed in a while.
    void cleanup();

    // Allocates a texture with the specified properties, retrieving an existing
    // one from the pool if available. The function returns a handle to the
    // allocated texture.
    GLuint allocate(const GlTextureDesc& desc);

    // Returns the texture with the specified key and handle to the pool.
    void free(const GlTextureDesc& desc, GLuint texture);

private:
    ...
};
At which point we can create temporary texture objects left and right without worrying about excessive calls to glTexImage2D and glDeleteTextures. I found it enormously helpful.
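For reference, a minimal sketch of what allocate() and free() might look like internally, assuming GlTextureDesc is hashable and comparable as described above; the field names used below (width, height, internalFormat, format, type) are assumptions about that struct, and the timestamps used for cleanup (described next) are omitted:

#include <unordered_map>
#include <GLES2/gl2.h>

using TexturePool = std::unordered_multimap<GlTextureDesc, GLuint>;

GLuint poolAllocate(TexturePool& pool, const GlTextureDesc& desc)
{
    auto it = pool.find(desc);
    if (it != pool.end())
    {
        // Reuse an idle texture with identical parameters: no glTexImage2D call.
        GLuint tex = it->second;
        pool.erase(it);
        return tex;
    }
    // Nothing suitable in the pool, so allocate a brand new texture.
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, desc.internalFormat, desc.width, desc.height,
                 0, desc.format, desc.type, nullptr);
    return tex;
}

void poolFree(TexturePool& pool, const GlTextureDesc& desc, GLuint texture)
{
    // "Freeing" just parks the texture for reuse; a periodic cleanup pass decides
    // when to actually call glDeleteTextures on long-idle entries.
    pool.emplace(desc, texture);
}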
Finally, note the cleanup function above. When I store textures in the hash table, I put a timestamp on them (using the system's real time). Periodically I call this cleanup function, which scans through the textures in the hash table and checks the timestamps. If a certain period of time has passed while a texture has just been sitting there idling in the pool (say, 8 seconds), I call glDeleteTextures and remove it from the pool. I use a separate thread along with a condition variable to build up a list of textures to remove the next time a valid context is available, by periodically scanning the hash table; but if your application is all single-threaded, you might just invoke this cleanup function every few seconds in your main loop.
That said, I work in VFX which doesn't have quite as tight realtime requirements as, say, AAA games. There's more of a focus on offline rendering in my field and I'm far from a GPU wizard. There might be better ways to tackle this problem. However, I found it enormously helpful to start with this texture pool and I think it might be helpful in your case as well. And it's fairly trivial to implement (just took me half an hour or so).
This could still end up allocating and deleting lots and lots of textures if the texture sizes and formats and parameters you request to allocate/free are all over the place. There it might help to unify things a bit, like at least using POT (power of two) sizes and so forth and deciding on a minimum number of pixel formats to use. In my case that wasn't that much of a problem since I only use one pixel format and the majority of the texture temporaries I wanted to create are exactly the size of a viewport scaled up to the ceiling POT.
As for FBOs, I'm not sure how they help your immediate problem with excessive texture allocation/freeing either. I use them primarily for deferred shading, to do post-processing for effects like DOF after rendering geometry in multiple passes, in a compositing-style way applied to the resulting 2D textures. I use FBOs for that naturally, but I can't think of how FBOs immediately reduce the number of textures you have to allocate/deallocate, unless you can just use one big texture with an FBO and render into sections of it as offscreen outputs. In that case it wouldn't be the FBO helping directly so much as the ability to create one huge texture whose sections you can use as inputs/outputs instead of many smaller ones.

Lowest overhead camera to CPU to GPU approach on android

My application needs to do some processing on live camera frames on the CPU, before rendering them on the GPU. There's also some other stuff being rendered on the GPU which is dependent on the results of the CPU processing; therefore it's important to keep everything synchronised so we don't render the frame itself on the GPU until the results of the CPU processing for that frame are also available.
The question is what's the lowest overhead approach for this on android?
The CPU processing in my case just needs a greyscale image, so a YUV format where the Y plane is packed is ideal (and tends to be a good match to the native format of the camera devices too). NV12, NV21 or fully planar YUV would all provide ideal low-overhead access to greyscale, so that would be preferred on the CPU side.
In the original camera API, setPreviewCallbackWithBuffer() was the only sensible way to get data onto the CPU for processing. This had the Y plane separate, so it was ideal for the CPU processing. Getting that frame available to OpenGL for rendering in a low-overhead way was the more challenging aspect. In the end I wrote a NEON color conversion routine to output RGB565 and just use glTexSubImage2D to get this available on the GPU. This was first implemented in the Nexus 1 timeframe, where even a 320x240 glTexSubImage2D call took 50 ms of CPU time (poor drivers trying to do texture swizzling, I presume - this was significantly improved in a system update later on).
Back in the day I looked into things like eglImage extensions, but they don't seem to be available or well documented enough for user apps. I had a little look into the internal android GraphicsBuffer classes but ideally want to stay in the world of supported public APIs.
The android.hardware.camera2 API had promise, with the ability to attach both an ImageReader and a SurfaceTexture to a capture session. Unfortunately I can't see any way of ensuring the right sequential pipeline here - holding off calling updateTexImage() until the CPU has processed the frame is easy enough, but if another frame has arrived during that processing then updateTexImage() will skip straight to the latest frame. It also seems that with multiple outputs there will be independent copies of the frames in each of the queues, which ideally I'd like to avoid.
Ideally this is what I'd like:
- Camera driver fills some memory with the latest frame
- CPU obtains pointer to the data in memory, can read Y data without a copy being made
- CPU processes data and sets a flag in my code when frame is ready
- When beginning to render a frame, check if a new frame is ready
- Call some API to bind the same memory as a GL texture
- When a newer frame is ready, release the buffer holding the previous frame back into the pool
I can't see a way of doing exactly that zero-copy style with public API on android, but what's the closest that it's possible to get?
One crazy thing I tried that seems to work, but is not documented: the ANativeWindow NDK API can accept data in NV12 format, even though the appropriate format constant is not one of the ones in the public headers. That allows a SurfaceTexture to be filled with NV12 data by memcpy(), avoiding CPU-side colour conversion and any swizzling that happens driver-side in glTexImage2D. That is still an extra copy of the data that feels like it should be unnecessary, though, and as it's undocumented it might not work on all devices. A supported sequential zero-copy Camera -> ImageReader -> SurfaceTexture or equivalent would be perfect.
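For what it's worth, that undocumented ANativeWindow path might look something like the sketch below. Everything here is public NDK API except the NV12 format constant, which is deliberately left to the caller rather than reproduced, since it is not in the public headers; the function and parameter names are illustrative.

#include <cstring>
#include <cstdint>
#include <android/native_window.h>

// Copy one NV12 frame (Y plane followed by interleaved UV) into the window
// backing a SurfaceTexture, avoiding any CPU-side color conversion.
// nv12FormatConstant: the non-public format value the question refers to.
bool pushNv12Frame(ANativeWindow* window, const uint8_t* nv12,
                   int width, int height, int32_t nv12FormatConstant)
{
    ANativeWindow_setBuffersGeometry(window, width, height, nv12FormatConstant);

    ANativeWindow_Buffer buffer;
    if (ANativeWindow_lock(window, &buffer, nullptr) != 0)
        return false;

    // Copy the Y plane row by row, respecting the window's row stride.
    uint8_t* dst = static_cast<uint8_t*>(buffer.bits);
    for (int row = 0; row < height; ++row)
        std::memcpy(dst + row * buffer.stride, nv12 + row * width, width);

    // Copy the UV plane (half height; assumes the chroma rows share the luma
    // stride, which is typical for NV12 but not guaranteed on every device).
    const uint8_t* uvSrc = nv12 + width * height;
    uint8_t* uvDst = dst + buffer.stride * height;
    for (int row = 0; row < height / 2; ++row)
        std::memcpy(uvDst + row * buffer.stride, uvSrc + row * width, width);

    return ANativeWindow_unlockAndPost(window) == 0;
}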
The most efficient way to process video is to avoid the CPU altogether, but it sounds like that's not an option for you. The public APIs are generally geared toward doing everything in hardware, since that's what the framework itself needs, though there are some paths for RenderScript. (I'm assuming you've seen the Grafika filter demo that uses fragment shaders.)
Accessing the data on the CPU used to mean slow Camera APIs or working with GraphicBuffer and relatively obscure EGL functions (e.g. this question). The point of ImageReader was to provide zero-copy access to YUV data from the camera.
You can't really serialize Camera -> ImageReader -> SurfaceTexture as ImageReader doesn't have a "forward the buffer" API. Which is unfortunate, as that would make this trivial. You could try to replicate what SurfaceTexture does, using EGL functions to package the buffer as an external texture, but again you're into non-public GraphicBuffer-land, and I worry about ownership/lifetime issues of the buffer.
I'm not sure how the parallel paths help you (Camera2 -> ImageReader, Camera2 -> SurfaceTexture), as what's being sent to the SurfaceTexture wouldn't have your modifications. FWIW, it doesn't involve an extra copy -- in Lollipop or thereabouts, BufferQueue was updated to allow individual buffers to move through multiple queues.
It's entirely possible there's some fancy new APIs I haven't seen yet, but from what I know your ANativeWindow approach is probably the winner. I suspect you'd be better off with one of the Camera formats (YV12 or NV21) than NV12, but I don't know for sure.
FWIW, you will drop frames if your processing takes too long, but unless your processing is uneven (some frames take much longer than others) you'll have to drop frames no matter what. Getting into the realm of non-public APIs again, you could switch the SurfaceTexture to "synchronous" mode, but if your buffers fill up you're still dropping frames.

Image processing in android using OpenGL. glReadPixels is slow and don't understand how to get EGL_KHR_image_base included and working in my project

So I'm trying to get the camera pixel data, monitor any major changes in luminosity, and then save the image. I have decided to use OpenGL as I figured it would be quicker to do the luminosity checks in the fragment shader.
I bind a surface texture to the camera to get the image to the shader and am currently using glReadPixels to get the pixels back which I then put in a bitmap and save.
The bottleneck on glReadPixels is crazy, so I looked into other options and saw that EGL_KHR_image_base was probably my best bet as I'm using OpenGL ES 2.0.
Unfortunately I have no experience with extensions and don't know where to find exactly what I need. I've downloaded the ndk but am pretty stumped. Could anyone point me in the direction of some documentation and help explain it if I don't understand fully?
Copying pixels with glReadPixels() can be slow, though it may vary significantly depending on the specific device and pixel format. Some tests using glReadPixels() to save frames from video data (which is also initially YUV) found that 96.5% of the time was spent in PNG compression and file I/O on a Nexus 5.
In some cases, the time required goes up substantially if the source and destination formats don't match. On one particular device I found that copying to RGBA, instead of RGB, reduced the time required.
The EGL calls can work but require non-public API calls. And it's a bit tricky; see e.g. this answer. (I think the comment in the edit would allow it to work, but I never got back around to trying it, and I'm not in a position to do so now.)
The only solution would be to use a pixel buffer object (PBO) bound as a pixel pack buffer, where the read-back is asynchronous. However, to take advantage of that asynchrony, you need two PBOs and must use them as a ping-pong buffer.
Following the approach described at http://www.jianshu.com/p/3bc4db687546, I reduced the read time for a 1080p frame from 40 ms to 20 ms.
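A rough sketch of that ping-pong read-back (requires OpenGL ES 3.0 for pixel pack buffers; the struct and the RGBA format are illustrative):

#include <GLES3/gl3.h>

// Two pixel-pack buffers used in alternation: each call starts an asynchronous
// read into one PBO and maps the other, which holds the previous frame's pixels.
struct PboReader
{
    GLuint pbo[2] = {0, 0};
    int index = 0;
    int width = 0, height = 0;

    void init(int w, int h)
    {
        width = w;
        height = h;
        glGenBuffers(2, pbo);
        for (int i = 0; i < 2; ++i)
        {
            glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
            glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }

    // Starts reading the current framebuffer and returns a pointer to the
    // *previous* frame's RGBA data (uninitialized on the very first call).
    // Call finish() when done with the returned pointer.
    const void* readAsync()
    {
        // Kick off this frame's transfer into pbo[index]; returns immediately.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[index]);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

        // Map the other PBO, which the GPU filled during the previous frame.
        index = 1 - index;
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[index]);
        return glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height * 4,
                                GL_MAP_READ_BIT);
    }

    void finish()
    {
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }
};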

Reason for no simple way to determine 3D coordinate of screen coordinate in Open GL ES 2.0 Android

My question comes from why OpenGL and/or Android does not have a way to simply grab the current matrix and store it as a float[]. All the research I have found suggests using a class called MatrixGrabber, which it looks like I have to download and add to my project, in order to grab the current state of the OpenGL matrices.
My overall goal is to easily determine the OpenGL world-space location corresponding to a touch event on the screen, from which I can retrieve the X and Y coordinates.
The best workaround I have found is Android OpenGL 3D picking, but I wonder why there isn't a way to simply retrieve the matrices you want and then just call
GLU.gluUnProject(...);
My question comes from why OpenGL and/or Android does not have a way to simply grab the current matrix and store it as a float[].
Because OpenGL ES 2.0 (and core desktop GL 3.1 and above) do not necessarily have a "current matrix." All transforms are done via shader logic, so matrices don't even have to be involved. It could be doing anything in there.
There is no current matrix, so there is nothing to get. And nothing to unproject.
In ES 1 you can grab the current matrix and store it as a float[] using glGetFloatv, with the pname GL_MODELVIEW_MATRIX, GL_PROJECTION_MATRIX or GL_TEXTURE_MATRIX as applicable.
GLU is not inherently part of OpenGL ES because ES is intended to be a minimal specification and GLU is sort of an optional extra. But you can grab SGI's reference implementation of GLU and use its gluUnProject quite easily.
EDIT: and to round off the thought, as Nicol points out there's no such thing as the current matrix in ES 2. You supply to your shaders arbitrarily many matrices for arbitrarily many purposes, and since you supplied them in the first place you shouldn't need to ask GL to get them back again.
I just took a look at http://developer.android.com/resources/samples/ApiDemos/src/com/example/android/apis/graphics/spritetext/MatrixGrabber.html which looks like the MatrixGrabber you're referring to and it doesn't look especially complicated — in fact, it's overbuilt. You can just use gl2.getMatrix directly and plug that into gluUnProject.
The probable reason for the design of the MatrixGrabber code is that it caches the values for multiple uses — because the GPU runs asynchronously to the CPU, your code may have to wait for the getMatrix response, so it is more efficient to get it as few times as possible and reuse the data.
Another source of complexity in the general problem is that a touch specifies only two dimensions. A single touch does not indicate any depth, so you have to determine that in some application-specific way. The obvious approach is to read the depth buffer (though some OpenGL implementations don't support that), but that doesn't work if you have, e.g., things that should be "transparent" to touches. An alternative is to construct a ray (such as by unprojecting twice with two different depths) and then do raycasting into your scene.
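A minimal sketch of that ray construction, assuming you already have the combined inverse of projection * view as a column-major float[16] (the helper names and the Vec3 type are illustrative, not from any Android API):

struct Vec3 { float x, y, z; };

// Multiply a column-major 4x4 matrix by the vector (v[0], v[1], v[2], v[3]).
static void mulMat4Vec4(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row)
        out[row] = m[row] * v[0] + m[4 + row] * v[1] +
                   m[8 + row] * v[2] + m[12 + row] * v[3];
}

// Unproject a window-space point (depth in [0, 1]) back into world space.
// Touch Y usually grows downward, so it is flipped here.
static Vec3 unproject(float winX, float winY, float depth,
                      const float invViewProj[16], int viewportW, int viewportH)
{
    float ndc[4] = {
        2.0f * winX / viewportW - 1.0f,
        1.0f - 2.0f * winY / viewportH,
        2.0f * depth - 1.0f,
        1.0f
    };
    float world[4];
    mulMat4Vec4(invViewProj, ndc, world);
    return { world[0] / world[3], world[1] / world[3], world[2] / world[3] };
}

// Unprojecting at depth 0 and depth 1 gives two points that define the pick ray:
// Vec3 nearPt = unproject(touchX, touchY, 0.0f, invViewProj, w, h);
// Vec3 farPt  = unproject(touchX, touchY, 1.0f, invViewProj, w, h);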
