I am making a 2D graphical app that will display planets. I say 2D because the majority of the app will be 2D. I however want to render some 3D objects into dynamic sprites offscreen (to a texture), with transparent (possibly translucent) areas, and subsequently render those rendered textures to the active screen as 2D textured quads. Rendering directly to the screen as 3D objects is not optimal in this case, because it would require me to implement some sort of 3D picking. I am not that advanced in math yet. Note also that the main screen render will be orthographic, while the offscreen render would be perspective.
How can I accomplish this (general idea, no need for specifics), and what would be the most efficient way to do this? Would this reduce support for a wide variety of devices? Also, if the 3D sprite renderings were constantly refreshed every frame (such as being rotated fine amounts) would that kill framerates with continuous unloading/reloading of texture to memory? I suppose that some scenes could have as many as 10 of these 3D offscreen sprites.
Thanks for the help
If you really must use the offscreen rendering just search for FBO(frame buffer object) and attach a texture to it, then use the texture in your main view as 2D. It is quite a straight forward procedure but might decrease the speed. You will probably not be able to do any multithreading on it so you should create just 1 FBO. Its dimensions will probably have to be a power of 2 so the resolution might be different then you wish. This procedure does not continually load/unload anything, the data is allocated when creating the texture and GL draws/reads directly from it. The largest drawback here will be the memory.. You will create as many as 10 of this textures just to draw on them and present once.
It might be very easy to place this objects on a specific place on your main buffer though: Make all the logic as if you would want to draw a full screen planet but use "viewport" method to place it to a specific part of the screen.
If those planet images will be updated only on user request (you don't want to draw them every frame) then I suggest you try to make a combination of both: Create a FBO with a texture of same size or larger then main view and draw all the planets to this single texture using "viewport" method. Then you can update any you want, just don't clear the buffer, rather draw a clear rect on the specific part of the buffer/texture. And keep drawing the whole texture to the main buffer.
Related
I'm trying to make a 2d map (for a game, think tiled world map) in OpenGL ES 2.0 for an android game. Basically, there are a few tile types that have different textures, and the map is randomly generated from these types, so from game-to-game the map changes but for the duration of a single game it stays the same.
My first thought was to generate a single large texture / image / bitmap (independent from OpenGL) beforehand basically stitching duplicate tile textures together to make the larger map, and then using this single texture for one large map rectangle. In theory I think this is simple and would work fine, but I'm worried that it won't scale well for larger maps and especially on mobile I'll run out of memory with such a large image map. Plus, there's a small set of tiles that are duplicated over and over so it seems like a tremendous waste to duplicate the pixel data in a big texture over and over.
My second thought was having many textures, one for each of the tile textures. But I'm not sure how this would work, texture-binding-wise, would I need the shaders to contain multiple texture references and within the shader have logic for using the right one?
Finally, I thought using a texture atlas could work, have one texture / image with all of the tile data in it, this would be relatively small. But I'm struggling to imagine how to get the maths to work out such that "tiles" or subsections of the map rectangle would use completely different texture coordinates.
Am I approaching this the wrong way? Should I be using a rectangle for each tile? At least this way I can pass the shaders both vertex and texture coordinates for each tile independently. This seems easier, but also seems wrong since the map really is just one rectangle that won't be changing.
My first thought was to generate a single large texture...
Actualy, something like this has already been used in id Software's id Tech since version 4. It's called MegaTexture. Basicaly, it's a big texture, which could also hold additional data.
My second thought was having many textures...
You don't need to hold all the textures in a shader. Do it like this:
Implement a loop with n iterations, where n is how much different types of textures are used.
Inside a loop, bind the current texture type.
Pass any data, like position/color/texture coords to shaders.
Draw all tiles that use the bounded texture. You could use GLES30.glDrawElementsInstanced or GLES30.glDrawArraysInstanced if you are targeting devices with GLES 3.x or an appropriate extension support. Otherwise, draw your tiles using GLES20.glDrawArrays or GLES20.glDrawElements.
Shaders won't be complicated with this approach.
Finally, I thought using a texture atlas could work...
You could use loop here too and compute the texture coordinates for each tile type on CPU, then just pass them to shaders.
Considering your map is not changing through a game session, MegaTexture approach looks good. However, it depends on how large your map is and how much memory is available. Also, note that max texture size is limited. Max size differs from device to device but should be (AFAIK) equal or greater than screen size and at least 64 texels(16 for cube-mapped textures). You can get the maximum texture size on any device using glGet(GL_MAX_TEXTURE_SIZE ).
Take an application containing a glSurfaceView which loads several separate images on start (GLUtils.texImage2D(GL10.GL_TEXTURE_2D, 0, bitmap, 0); for each image).
This application will then have to call gl.glBindTexture(GL10.GL_TEXTURE_2D, texture pointer int) before it can proceed to draw a different texture to a "texturable square" gl object.
Is it recommended to instead load one bitmap with all the textures (ie: a sprite sheet), and then create an array of "texturable squares" that each map a different area of the giant single image, like that gl.glBindTexture(...) only needs be called once...?
Or perhaps is there no significant difference between the two techniques?
As far as I know, once a texture has been loaded via texImage2D, binding a texture is simply a matter of pointing the native OpenGL library to the correct preloaded texture, so performance wise, it shouldn't be costly.
However, you raise a good option which you should probably consider regardless of performance issues.
Normally, the textures you need don't have to be in power of two dimensions, but are set to that size anyway because of the requirements of OpenGL. This often results in a very wasteful memory allocation. Utilising "sprite sheets" as you put it can help save the time & memory of loading multiple bitmaps into textures which are usually larger than the parts you'll be rendering anyway.
For this reason, I would recommend using sprite sheets anyway, simply because it saves calls to texImage2D (which are quite costly), and potentially saves memory as well. If you properly manage the texture coordinates when switching between objects you want to render, this is the method I would recommend.
I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to using glDrawArrays(GL_TRIANGLE_STRIPS) make any real difference? And also creating VBO's that hold 1,000 quads each instead of just 1 quad is a possibility, but that would mean I would have to allocate a lot more space in the VBO's. (Right now there is only one quad stored in the VBO which is transformed by a matrix to its position/rotation).
if using Octtrees (wich is definitely THE WAY) does not suit you, you can optimize the code for calling the vbo lists.
In my work, I started with a scene rendering at 3fps rate, just optimizing the opengl calls and context switches, now runs on 53fps (wich is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if a object is behind another, draw the front object first, cause depth test is faster than fragment calculation).
use shaders without "discard;" wich is costly for the cpu to process.
use reversed loops to get a little bit of cpu speed
dont select the texture if it is already the same than selected in the GPU (a cpu 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
if you post some pieces of code I can help you better.
I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue but that I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 Chunks of 32*32*32 Blocks rendered (FPS > 30). You should load the Chunk and only add those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. Therefore you have one rendercall per Chunk with the minimum amout of faces
In 2D is looks like this
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outter faces. In addition you can use frustrum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you just need to make a render call for those chunks which are actually inside your frustrum. Do not render chunks behind the camera. OpenGL will make a lot of calculations for all vertices of the chunk, but then the chunk is not visible so why render it in the first place. This can happen in your java code.
A third optimazation could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know the shading is processed before depth testing and throwing away those triangels/ faces occluded by others, you can speed up your shader using deferred shading as you only shade those vertices which will pass the depth-testing.
There are a lot of more ways to optimize voxel rendering but for me this are the most basic operations. The given tutorial behind the first link isn't finished yet, but he shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, which different textures for each cube, I recommend to place all textures in a big one, so you do not need to swap textures, a simple texture lookup is much more faster than swapping a texture (glBindTexture(..)) and then make a lookup and later swap back to this texture. Use one big huge texture and apply the right UV coordinates to your vertices.
You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" wich go in the different axis.
Then, you check if the camera can see something inside the cube, if it can't you discard all the blocks in that section (wich can speed up to 8x). Then, inside the block, you divide again in 8 sections, and check again if they are visible. An so on, speeding checks and renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octree can be accelerated using "portals" (and I dont mean GladOs ;) ) wich discard voxels and Octrees depending on the visibility inside doors and windows, but is only good for interiors.
Although I'm technically working in the android platform with OpenGL 2.0 ES, I believe this can be applied to more OpenGL technologies.
I have a list of objects (enemies, characters, etc) that I'm attempting to draw onto a grid, each space being 1x1, and each object matching. Presently, each object is self translating... that is, it's taking its model coordinates and going through a simple loop to adjust them to be located in the world coordinates in its appropriate grid location. (i.e. if it should be at (3,2) it will translate it's coordinates accordingly.
The problem I've reached is I'm not sure how to effeciently draw them. I have a loop going through all the objects and calling draw for each object, similar to the android tutorial, but this seems wildly ineffecient.
The objects are each textured with their own square images, matching the 1x1 grid they fill. They likely will never need their own unique shaders, so the only thing that seems to change between objects is the verticies and the shaders.
Is there an effecient way to get each model into the pipeline without flushing because of uniform changes?
This probably requires some try and error procedure an probably is hardware dependent. I would use buffer objects for the meshes with GL_STATIC_DRAW, pack some textures in a bigger one and draw all objects depending on that bigger texture in batch to avoid states changes as much as possible. Profile and get us more information on where is your bottleneck.
We are to develop a scrolling/zooming scene in OpenGL ES on Android, very much like a level in Angry Birds but more like a level in World Of Goo. More like the latter as the world will not consist of repeated layers as featured in Angry Birds but of a large image. As the scene needs to scroll/zoom and therefore a lot of it will not be visible, I was wondering about the most efficient way to implement the rendering, focusing on the environment only (ie not the objects within the world but background layers).
We will be using an orthographic projection.
The first that comes to mind is creating a large 4 vertices rectangle at world size, which has the background texture mapped to it, and translate/scale this using glTranslatef / glScalef. However, I was wondering if the non visible area outside of the screens boundaries is still being rendered by OpenGL as it is not being culled (you would lose the visible area as well as there are only 4 vertices). Therefore, would it be more efficient to subdivide this rectangle, so non visible smaller rectangles can be culled?
Another option would be creating a 4 vertice rectangle that would fill the screen, then move the background by adjusting its texture coordinates. However, I guess we would run into problems when building bigger worlds, considering the texture size limit. It seems like a nice implementation for repeated backgrounds like AngryBirds has.
Maybe there is another way..?
If someone has an idea on how it might be done in AngryBirds / World of Goo, please share as I'd love to hear. They seem to have implemented a system that allows for the world to be moved and zoomed very (WorldOfGoo = VERY) smoothly.
This is probably your best bet for implementation.
In my experience, keeping a large texture in memory is very expensive on Android. I would get quite a few OutOfMemoryError exceptions for the background texture before I moved to tiling.
I think the biggest rendering bottleneck would be with memory transfer speeds and fill rate instead of any graphics computation.
Edit: Check out 53:28 of this presentation from Google I/O 2009.
You could split the background rectangle into smaller rectangles, so that OpenGL only renders the visible rectangles. You won't have a big ass rectangle with a big ass texture loaded but smallers rectangles with smaller textures that you could load/unload, depending on what is visible on screen...
Afaik there would be no performance drop due to large areas being rendered off-screen, subdividing and culling is normally done just to reduce vertex count, but you would actually be adding to it here.
Putting that aside for now; from the way you phrased the question I am unsure whether you have a large background texture or a small repeating one. If it is large, then you will need to subdivide because of texture size limitations anyway, so the question is moot! If it is small, then I would suggest the second method, fit a quad to the screen and move the background by changing the texture coordinates.
I feel like I may have missed something, though, as I am unsure why you mentioned the texture size limitation issue when talking about the the texture coordinate method and not the large quad method. Surely for both this is not a problem for repeating textures as you can use GL_REPEAT texture wrap mode...
But for both it is a problem for a single large texture unless you subdivide, which would make the texture coordinate tactic way more complicated than necessary. In this case subdividing the mesh along texture subdivisions would be best, and culling off-screen sections. Deciding which parts to cull should be trivial with this technique.
Cheers.