How to optimize per pixel lighting on Android?

How to optimize per pixel lighting on Android? - android

I'm drawing a scene in 3D using per pixel lighting on Android in OpenGL ES 2.0. When I render only a few small objects I get 60 FPS, but if I try to use a landscape to put the objects on, the framerate drops heavily, to about 15 frames. I guess this is because it has to calculate the lighting for all those pixels from the landscape. Is there any way I could optimize this? I wouldn't want to use per vertex lighting on the landscape alone because I would like to cast shadows on it.

From comments above I see that you believe the code of shaders could not be faster than it is and you are stuck into GPU fillrate limit.
In this case you can improve performance with wise culling of object: first draw nearest objects which use fast and simple shaders and which occlude your landscape object (which uses complex shaders for per-pixel lighting). This way GPU will draw only visible parts of landscape, doing less heavy computations in landscape shader. In my case I was able to gain extra 3 fps by reordering objects to more effectively occlude each other.
However, if performance drop is really that huge you can post code of your shaders, maybe there are some ways to improve performance of shaders too.

I suggest you to partition your landscape depending on distance to camera. For the nearest partition you can draw per pixel and also add more objects and detail. For far partition use vertex lighting. Also use impostors for really far objects. When you partition your landscape, you need take into your model how much area of the near landscape is projected into screen space, because if you are near the ground and let the player see down, all your screen will need to be renderer with per-pixel lighting and will suffer from rate drop.

Related

OpenGL ES 2.0 How to increase performance in a voxel world?

I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to using glDrawArrays(GL_TRIANGLE_STRIPS) make any real difference? And also creating VBO's that hold 1,000 quads each instead of just 1 quad is a possibility, but that would mean I would have to allocate a lot more space in the VBO's. (Right now there is only one quad stored in the VBO which is transformed by a matrix to its position/rotation).

if using Octtrees (wich is definitely THE WAY) does not suit you, you can optimize the code for calling the vbo lists.
In my work, I started with a scene rendering at 3fps rate, just optimizing the opengl calls and context switches, now runs on 53fps (wich is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if a object is behind another, draw the front object first, cause depth test is faster than fragment calculation).
use shaders without "discard;" wich is costly for the cpu to process.
use reversed loops to get a little bit of cpu speed
dont select the texture if it is already the same than selected in the GPU (a cpu 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
if you post some pieces of code I can help you better.

I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue but that I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 Chunks of 32*32*32 Blocks rendered (FPS > 30). You should load the Chunk and only add those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. Therefore you have one rendercall per Chunk with the minimum amout of faces
In 2D is looks like this
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outter faces. In addition you can use frustrum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you just need to make a render call for those chunks which are actually inside your frustrum. Do not render chunks behind the camera. OpenGL will make a lot of calculations for all vertices of the chunk, but then the chunk is not visible so why render it in the first place. This can happen in your java code.
A third optimazation could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know the shading is processed before depth testing and throwing away those triangels/ faces occluded by others, you can speed up your shader using deferred shading as you only shade those vertices which will pass the depth-testing.
There are a lot of more ways to optimize voxel rendering but for me this are the most basic operations. The given tutorial behind the first link isn't finished yet, but he shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, which different textures for each cube, I recommend to place all textures in a big one, so you do not need to swap textures, a simple texture lookup is much more faster than swapping a texture (glBindTexture(..)) and then make a lookup and later swap back to this texture. Use one big huge texture and apply the right UV coordinates to your vertices.

You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" wich go in the different axis.
Then, you check if the camera can see something inside the cube, if it can't you discard all the blocks in that section (wich can speed up to 8x). Then, inside the block, you divide again in 8 sections, and check again if they are visible. An so on, speeding checks and renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octree can be accelerated using "portals" (and I dont mean GladOs ;) ) wich discard voxels and Octrees depending on the visibility inside doors and windows, but is only good for interiors.

Android OpenGL ES 1.1 flickering

While you are promoting my Android project, I discovered a strange.
I can display the map in the ocean Android OpenGL ES 2D graphics.
So, to be used only to determine the phase order of the object, the value is reduced to about 0.0001 Z-axis.
I tried over 1000 times the size of the object In the meantime.
Then, a phenomenon depending on the zoom in / zoom out, some objects flickering occurred.
Why such problems occur??
It is the problem of the target terminal-specific this can not be resolved if?
Or is it a problem of Android OpenGL ES itself?
***More....
The photo below is what you screen shot every time the screen of the actual device.
***I occurs when such a phenomenon to zoom in / zoom out each time.

I assume what you are experiencing is z-fighting: http://en.wikipedia.org/wiki/Z-fighting
This results due to the fact that your objects are too close together so that the z-buffer for certain pixels can't distinguish between which pixel is below or above the other.
You have three choices now:
1) Adjust your projection, specifically adjust znear and zfar values. Read more here: http://www.opengl.org/archives/resources/faq/technical/depthbuffer.htm
2) Increase the distance between both objects
3) Since you are drawing a 2D scene, you might use orthogonal projection. In that case it might be worth not to use depth buffering at all and just draw the objects from back to front (Painters Algorithm, http://en.wikipedia.org/wiki/Painters_algorithm).

OpenGL/Android big sprite quads bad performance

I am developping a game on android using opengl and am having a little performance problem.
Let's say for example I want to draw a background partially filled with grass "bushes". Bushes have different x,y,z, different sizes and so on (each bush is a 2D sprite), and potentially partially hide each other (I use a perspective camera). I am having a big performance problem if those sprites are big (i.e. the quad sizes, not the texture size/resolution) :
If I use a classical front to back draw (to avoid overdraw), I find myself having problems because of (I think) alpha testing. Even if the bushes have only opaque and fully transparent pixels (no partial transparency), and if I use the proper alpha testing comparison (GL_EQUAL 1) the performances are bad because a lot of pixels have to be alpha tested (If I understand right).
If I use a back to front display with alpha testing disabled, I lose a lot of performance too (but this time because of overdraw problems), even when disabling depth buffer writing (not sure if it does anything if depth test is disabled by the way).
I am having good performances if using front to back without alpha testing, but of course sprite cutout is completely gone, which is really really bad.
All the bushes have the same texture, I use 16 bit colors, mip mapping, geometry batching, cull faces, no shaders, etc. All what I can think of to improve performances (which are not bad in other cases), except texture compression. I even filter the sprites to avoid "displaying" the ones out the screen. I have also tried some "violent optimizations" for test purposes, such as making the textures fully opaque, lowering the texture resolution a lot, disabling blending, etc, but nothing was fantastic performance-wise except the alpha testing removal.
I was wondering if I was forgetting something here to help with the performance. Back to front creates overdraw, front to back is slow because of alpha testing (and I do not want my bushes to be "square" images so I cannot disable alpha testing). If I create smaller sprites performances are far better (even with a lot more sprites), but this is only a workaround.
To summarize, how can you display overlapping big quads needing cutout, without losing performance?
PS : I am testing on a nexus one.
PS2 : Some optimizations suggest to not create quads but geometries more "fitting" the texture, but it seems to be a really tedious process, and would not help me a lot I think.

Drawing front-to-back is normally a benefit because of early-z: the hardware can do the depth test right after rasterization, before doing the texture fetch or shading. With front-to-back sorting, most fragments fail the depth test, and you save a lot of texture bandwidth, shading throughput, and zbuffer-write bandwidth.
But alpha test breaks that. If a fragment passes the depth test, it might still be killed by alpha test, so zwrite can't happen until after texturing/shading. Most hardware that can do early-z still has to do the depth test at the same point in the pipeline as it does zwrite, so with alpha test you end up doing ztest + zwrite after texturing and shading. As a result, front-to-back sorting only saves you zwrite bandwidth, nothing else.
I think you have two options, if you really want large sprites that overlap significantly:
(a) Only use two or three distinct Z values for your sprites. Draw them back-to-front with blending (and alpha-test, if it helps). No overlap within a layer: you can pre-render each layer either in the original assets or once at runtime, then just shift it left and right.
(b) If your sprites have large opaque regions surrounded by a semi-transparent border, you can draw the opaque regions in a first pass with no alpha test, then draw borders as a separate pass. This will cut down on the number of alpha-tested fragments.

Rotating all OpenGL output

External requirements --- you have to hate them...
I have an OpenGL ES game, which uses EGL and OpenGL ES to draw on the screen. I don't have source to this; it's supplied as a binary blob. I'm implementing the interface layer that mediates between the game's calls to EGL and OpenGL and the platform's implementation.
It works fine. But I now have the unexpected external requirement that I need to be able to rotate the entire game's output 90 degrees.
Can anyone suggest any good (easy, fast) ways to do this? Off the top of my head, I can think of:
insert the appropriate transformation into the game's projection matrix. This seems to me to be the fastest solution; but I don't think I have enough knowledge of the game's manipulation of the projection matrix to do this reliably. Plus it'll confuse the game if it uses any OpenGL calls to access the screen which don't go through the projection matrix. (glReadPixels(), for example.)
give the game a rendering context to an off-screen buffer; it renders there, and then when the game calls eglSwapBuffers() I copy the result onto the screen. Render-to-texture would help here. Problems: this will affect performance as I'm effectively doing two drawing passes instead of one; and render-to-texture isn't standardised in OpenGL ES. (My target platform, Android, doesn't even reliably support shared contexts.)
render into the colour buffer, then use glReadPixels() to copy the data out and do a software rotate onto the screen. Problems: dead slow, and I have no control of the size of the buffer (i.e. if the screen is 640x480 and we're drawing 90° rotated, I really want to give the game a 480x640 colour buffer).
other?
Game-specific hacks aren't an option here because I need to be able to swap out the game binary with another one; this has to be a generic fix. Changing the game isn't an option because we don't have control of the game source code.
Any suggestions? Other than the non-technical one of trying to persuade the requirement to go away?

What is the issue with you have to use glRotate along the z axis ??
Approach 1 is the way to go.
Pixel operations are heavy and it is possible, that you could be messing up with the aspect ratio, etc etc.
The steps which go into drawing are
1. Set the transformation matrix (the model/ projection)
If landscape, apply the glRotate
2. Set the view port (this might change each time you rotate the screen)
if landscape - set a b as height/widht respectively
if landscape - set b a as height/widht respectively
3. Draw the matrix
When you rotate the screen, the objects are rendered again. So glRotate is the best way to go.

Scrolling/zooming a scene in OpenGL and subdivision

We are to develop a scrolling/zooming scene in OpenGL ES on Android, very much like a level in Angry Birds but more like a level in World Of Goo. More like the latter as the world will not consist of repeated layers as featured in Angry Birds but of a large image. As the scene needs to scroll/zoom and therefore a lot of it will not be visible, I was wondering about the most efficient way to implement the rendering, focusing on the environment only (ie not the objects within the world but background layers).
We will be using an orthographic projection.
The first that comes to mind is creating a large 4 vertices rectangle at world size, which has the background texture mapped to it, and translate/scale this using glTranslatef / glScalef. However, I was wondering if the non visible area outside of the screens boundaries is still being rendered by OpenGL as it is not being culled (you would lose the visible area as well as there are only 4 vertices). Therefore, would it be more efficient to subdivide this rectangle, so non visible smaller rectangles can be culled?
Another option would be creating a 4 vertice rectangle that would fill the screen, then move the background by adjusting its texture coordinates. However, I guess we would run into problems when building bigger worlds, considering the texture size limit. It seems like a nice implementation for repeated backgrounds like AngryBirds has.
Maybe there is another way..?
If someone has an idea on how it might be done in AngryBirds / World of Goo, please share as I'd love to hear. They seem to have implemented a system that allows for the world to be moved and zoomed very (WorldOfGoo = VERY) smoothly.

This is probably your best bet for implementation.
In my experience, keeping a large texture in memory is very expensive on Android. I would get quite a few OutOfMemoryError exceptions for the background texture before I moved to tiling.
I think the biggest rendering bottleneck would be with memory transfer speeds and fill rate instead of any graphics computation.
Edit: Check out 53:28 of this presentation from Google I/O 2009.

You could split the background rectangle into smaller rectangles, so that OpenGL only renders the visible rectangles. You won't have a big ass rectangle with a big ass texture loaded but smallers rectangles with smaller textures that you could load/unload, depending on what is visible on screen...

Afaik there would be no performance drop due to large areas being rendered off-screen, subdividing and culling is normally done just to reduce vertex count, but you would actually be adding to it here.
Putting that aside for now; from the way you phrased the question I am unsure whether you have a large background texture or a small repeating one. If it is large, then you will need to subdivide because of texture size limitations anyway, so the question is moot! If it is small, then I would suggest the second method, fit a quad to the screen and move the background by changing the texture coordinates.
I feel like I may have missed something, though, as I am unsure why you mentioned the texture size limitation issue when talking about the the texture coordinate method and not the large quad method. Surely for both this is not a problem for repeating textures as you can use GL_REPEAT texture wrap mode...
But for both it is a problem for a single large texture unless you subdivide, which would make the texture coordinate tactic way more complicated than necessary. In this case subdividing the mesh along texture subdivisions would be best, and culling off-screen sections. Deciding which parts to cull should be trivial with this technique.
Cheers.

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.