Does opengl render a large mesh broken into smaller triangle faster? - android

If you render a large quad that extends beyond the borders of the screen would opengl render faster if you break that quad into smaller triangle/segments? To a point of course, say instead of two triangles in the quad you managed to break it up into 18 triangles or so. This way the triangles that are off the screen would not be sent to the fragment shader? Too many triangles obviously defeats the purpose and slows it down. Am I way off on that one? Based on my understanding of opengl when triangles are off screen they aren't rendered (sent to fragment shader) even if a draw call is called. The fragment shader is the slowest part and if less fo the quad is going to it, it would be more efficient?

I wouldn't expect splitting large triangles into smaller triangles to be beneficial for performance.
Clipping happens in a fixed function block between vertex shader and fragment shader. The fragment shader will never see fragments that are outside the window. So you don't have to worry about processing fragments outside the window. They will be eliminated no matter how big or small the triangles are.
The spec suggests that clipping happens on the vertices in clip coordinate space, before rasterization. But as always, OpenGL implementations are free to handle this differently, as long as the end result is the same. So some hardware architectures may clip the vertices, while others may rasterize first and then perform clipping on the resulting fragments. Or they may use a hybrid between the two approaches.
Now, if you're running on a hardware architecture that (partly) clips in fragment space, it's theoretically possible that you have more rasterization overhead with large triangles that are mostly off screen. But it seems very unlikely that this would ever turn into a real bottleneck.
If a large part of your geometry is getting clipped, it may be beneficial to apply some form of culling before rendering, so that objects completely outside the view volume are never rendered.

Related

OpenGL ES 2.0 How to increase performance in a voxel world?

I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to using glDrawArrays(GL_TRIANGLE_STRIPS) make any real difference? And also creating VBO's that hold 1,000 quads each instead of just 1 quad is a possibility, but that would mean I would have to allocate a lot more space in the VBO's. (Right now there is only one quad stored in the VBO which is transformed by a matrix to its position/rotation).
if using Octtrees (wich is definitely THE WAY) does not suit you, you can optimize the code for calling the vbo lists.
In my work, I started with a scene rendering at 3fps rate, just optimizing the opengl calls and context switches, now runs on 53fps (wich is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if a object is behind another, draw the front object first, cause depth test is faster than fragment calculation).
use shaders without "discard;" wich is costly for the cpu to process.
use reversed loops to get a little bit of cpu speed
dont select the texture if it is already the same than selected in the GPU (a cpu 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
if you post some pieces of code I can help you better.
I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue but that I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 Chunks of 32*32*32 Blocks rendered (FPS > 30). You should load the Chunk and only add those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. Therefore you have one rendercall per Chunk with the minimum amout of faces
In 2D is looks like this
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outter faces. In addition you can use frustrum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you just need to make a render call for those chunks which are actually inside your frustrum. Do not render chunks behind the camera. OpenGL will make a lot of calculations for all vertices of the chunk, but then the chunk is not visible so why render it in the first place. This can happen in your java code.
A third optimazation could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know the shading is processed before depth testing and throwing away those triangels/ faces occluded by others, you can speed up your shader using deferred shading as you only shade those vertices which will pass the depth-testing.
There are a lot of more ways to optimize voxel rendering but for me this are the most basic operations. The given tutorial behind the first link isn't finished yet, but he shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, which different textures for each cube, I recommend to place all textures in a big one, so you do not need to swap textures, a simple texture lookup is much more faster than swapping a texture (glBindTexture(..)) and then make a lookup and later swap back to this texture. Use one big huge texture and apply the right UV coordinates to your vertices.
You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" wich go in the different axis.
Then, you check if the camera can see something inside the cube, if it can't you discard all the blocks in that section (wich can speed up to 8x). Then, inside the block, you divide again in 8 sections, and check again if they are visible. An so on, speeding checks and renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octree can be accelerated using "portals" (and I dont mean GladOs ;) ) wich discard voxels and Octrees depending on the visibility inside doors and windows, but is only good for interiors.

OpenGL/Android big sprite quads bad performance

I am developping a game on android using opengl and am having a little performance problem.
Let's say for example I want to draw a background partially filled with grass "bushes". Bushes have different x,y,z, different sizes and so on (each bush is a 2D sprite), and potentially partially hide each other (I use a perspective camera). I am having a big performance problem if those sprites are big (i.e. the quad sizes, not the texture size/resolution) :
If I use a classical front to back draw (to avoid overdraw), I find myself having problems because of (I think) alpha testing. Even if the bushes have only opaque and fully transparent pixels (no partial transparency), and if I use the proper alpha testing comparison (GL_EQUAL 1) the performances are bad because a lot of pixels have to be alpha tested (If I understand right).
If I use a back to front display with alpha testing disabled, I lose a lot of performance too (but this time because of overdraw problems), even when disabling depth buffer writing (not sure if it does anything if depth test is disabled by the way).
I am having good performances if using front to back without alpha testing, but of course sprite cutout is completely gone, which is really really bad.
All the bushes have the same texture, I use 16 bit colors, mip mapping, geometry batching, cull faces, no shaders, etc. All what I can think of to improve performances (which are not bad in other cases), except texture compression. I even filter the sprites to avoid "displaying" the ones out the screen. I have also tried some "violent optimizations" for test purposes, such as making the textures fully opaque, lowering the texture resolution a lot, disabling blending, etc, but nothing was fantastic performance-wise except the alpha testing removal.
I was wondering if I was forgetting something here to help with the performance. Back to front creates overdraw, front to back is slow because of alpha testing (and I do not want my bushes to be "square" images so I cannot disable alpha testing). If I create smaller sprites performances are far better (even with a lot more sprites), but this is only a workaround.
To summarize, how can you display overlapping big quads needing cutout, without losing performance?
PS : I am testing on a nexus one.
PS2 : Some optimizations suggest to not create quads but geometries more "fitting" the texture, but it seems to be a really tedious process, and would not help me a lot I think.
Drawing front-to-back is normally a benefit because of early-z: the hardware can do the depth test right after rasterization, before doing the texture fetch or shading. With front-to-back sorting, most fragments fail the depth test, and you save a lot of texture bandwidth, shading throughput, and zbuffer-write bandwidth.
But alpha test breaks that. If a fragment passes the depth test, it might still be killed by alpha test, so zwrite can't happen until after texturing/shading. Most hardware that can do early-z still has to do the depth test at the same point in the pipeline as it does zwrite, so with alpha test you end up doing ztest + zwrite after texturing and shading. As a result, front-to-back sorting only saves you zwrite bandwidth, nothing else.
I think you have two options, if you really want large sprites that overlap significantly:
(a) Only use two or three distinct Z values for your sprites. Draw them back-to-front with blending (and alpha-test, if it helps). No overlap within a layer: you can pre-render each layer either in the original assets or once at runtime, then just shift it left and right.
(b) If your sprites have large opaque regions surrounded by a semi-transparent border, you can draw the opaque regions in a first pass with no alpha test, then draw borders as a separate pass. This will cut down on the number of alpha-tested fragments.

Android / Offscreen rendering to texture

I am making a 2D graphical app that will display planets. I say 2D because the majority of the app will be 2D. I however want to render some 3D objects into dynamic sprites offscreen (to a texture), with transparent (possibly translucent) areas, and subsequently render those rendered textures to the active screen as 2D textured quads. Rendering directly to the screen as 3D objects is not optimal in this case, because it would require me to implement some sort of 3D picking. I am not that advanced in math yet. Note also that the main screen render will be orthographic, while the offscreen render would be perspective.
How can I accomplish this (general idea, no need for specifics), and what would be the most efficient way to do this? Would this reduce support for a wide variety of devices? Also, if the 3D sprite renderings were constantly refreshed every frame (such as being rotated fine amounts) would that kill framerates with continuous unloading/reloading of texture to memory? I suppose that some scenes could have as many as 10 of these 3D offscreen sprites.
Thanks for the help
If you really must use the offscreen rendering just search for FBO(frame buffer object) and attach a texture to it, then use the texture in your main view as 2D. It is quite a straight forward procedure but might decrease the speed. You will probably not be able to do any multithreading on it so you should create just 1 FBO. Its dimensions will probably have to be a power of 2 so the resolution might be different then you wish. This procedure does not continually load/unload anything, the data is allocated when creating the texture and GL draws/reads directly from it. The largest drawback here will be the memory.. You will create as many as 10 of this textures just to draw on them and present once.
It might be very easy to place this objects on a specific place on your main buffer though: Make all the logic as if you would want to draw a full screen planet but use "viewport" method to place it to a specific part of the screen.
If those planet images will be updated only on user request (you don't want to draw them every frame) then I suggest you try to make a combination of both: Create a FBO with a texture of same size or larger then main view and draw all the planets to this single texture using "viewport" method. Then you can update any you want, just don't clear the buffer, rather draw a clear rect on the specific part of the buffer/texture. And keep drawing the whole texture to the main buffer.

OpenGL ES drop shadows for 2D sprites

I've got a an OpenGL scene rendered with a bunch of sprites, and I'd like to automagically add drop shadows to all of them. Here's a picture showing what I mean:
The scene uses orthographic projection, the sprites are textured quads, and I'm using the depth buffer to draw them front to back. I'm working with OpenGL ES 2.0, but thoughts from the iOS or non-ES worlds would be appreciated as well. I've tossed a few ideas around in my head of how I can go about this, and I'd like to find out which has the most promise.
Draw each sprite twice, the first normally, the second with some kind of drop shadow shader a bit deeper in the scene. Not sure if this is possible?
Draw a sprite, then draw it again, darkened and with some alpha, several times with some random jitter applied to the verticies. This may look silly and not at all like a shadow.
Draw the base scene without background to a texture, then blur and darken it to create one large drop shadow. Then draw the base scene over the drop shadow texture, then finally over the background. This would lose the shadows between sprites, though.
SSAO in a post-processing pass. Might be the most dynamic and automatic, but could look fuzzy/grainy and really slow things down.
At creation time, generate a shadow texture for each sprite. For rendering, draw a sprite and then its shadow texuture a bit deeper in the scene. I think I'd like to avoid this due to the loading time and extra memory requirements, but this may be the fastest and best looking?
I don't want to do any shadow work with external textures, since I use the same sprite textures at varying scales, and pre-baked shadows would scale unnaturally.
So are any of these better than the others? Are there other options I'm not thinking of? Thanks!
Those are all some well thought out options, here are my thoughts on each
It is definitely possible to use a shader but it might not be the most performant option, since the blurring will have to be done inside the shader and might involve multiple texture lookups.
Drawing the texture multiple times would work and would look like a shadow, because each "jittered" image would have slightly modified alpha values. But again, blending and multiple renders of each sprite would add up and might affect performance.
I like and recommend this option, because you can set a shader that puts black pixels instead of colored pixels (considering alpha) into a render target smaller than the screen (1/4th?) and then use this as the shadow texture. Since the texture is now being stretched, you'd get the "blurring" for free, too. The pixel shader that does the "blackening" would be very simple and not affect performance too much.
Unless you really need high-quality shadows (and the previous method doesn't suffice) I wouldn't recommend this.
This is of course the most flexible option and has an x2 rendering complexity. Unfortunately, it will consume more memory than all the other options above.
Hope this helps!

Scrolling/zooming a scene in OpenGL and subdivision

We are to develop a scrolling/zooming scene in OpenGL ES on Android, very much like a level in Angry Birds but more like a level in World Of Goo. More like the latter as the world will not consist of repeated layers as featured in Angry Birds but of a large image. As the scene needs to scroll/zoom and therefore a lot of it will not be visible, I was wondering about the most efficient way to implement the rendering, focusing on the environment only (ie not the objects within the world but background layers).
We will be using an orthographic projection.
The first that comes to mind is creating a large 4 vertices rectangle at world size, which has the background texture mapped to it, and translate/scale this using glTranslatef / glScalef. However, I was wondering if the non visible area outside of the screens boundaries is still being rendered by OpenGL as it is not being culled (you would lose the visible area as well as there are only 4 vertices). Therefore, would it be more efficient to subdivide this rectangle, so non visible smaller rectangles can be culled?
Another option would be creating a 4 vertice rectangle that would fill the screen, then move the background by adjusting its texture coordinates. However, I guess we would run into problems when building bigger worlds, considering the texture size limit. It seems like a nice implementation for repeated backgrounds like AngryBirds has.
Maybe there is another way..?
If someone has an idea on how it might be done in AngryBirds / World of Goo, please share as I'd love to hear. They seem to have implemented a system that allows for the world to be moved and zoomed very (WorldOfGoo = VERY) smoothly.
This is probably your best bet for implementation.
In my experience, keeping a large texture in memory is very expensive on Android. I would get quite a few OutOfMemoryError exceptions for the background texture before I moved to tiling.
I think the biggest rendering bottleneck would be with memory transfer speeds and fill rate instead of any graphics computation.
Edit: Check out 53:28 of this presentation from Google I/O 2009.
You could split the background rectangle into smaller rectangles, so that OpenGL only renders the visible rectangles. You won't have a big ass rectangle with a big ass texture loaded but smallers rectangles with smaller textures that you could load/unload, depending on what is visible on screen...
Afaik there would be no performance drop due to large areas being rendered off-screen, subdividing and culling is normally done just to reduce vertex count, but you would actually be adding to it here.
Putting that aside for now; from the way you phrased the question I am unsure whether you have a large background texture or a small repeating one. If it is large, then you will need to subdivide because of texture size limitations anyway, so the question is moot! If it is small, then I would suggest the second method, fit a quad to the screen and move the background by changing the texture coordinates.
I feel like I may have missed something, though, as I am unsure why you mentioned the texture size limitation issue when talking about the the texture coordinate method and not the large quad method. Surely for both this is not a problem for repeating textures as you can use GL_REPEAT texture wrap mode...
But for both it is a problem for a single large texture unless you subdivide, which would make the texture coordinate tactic way more complicated than necessary. In this case subdividing the mesh along texture subdivisions would be best, and culling off-screen sections. Deciding which parts to cull should be trivial with this technique.
Cheers.

Categories

Resources