Although I'm technically working on the Android platform with OpenGL ES 2.0, I believe this applies to other OpenGL technologies as well.
I have a list of objects (enemies, characters, etc.) that I'm attempting to draw onto a grid, each space being 1x1 and each object sized to match. Presently, each object is self-translating: it takes its model coordinates and goes through a simple loop to move them to its grid location in world coordinates (i.e. if it should be at (3,2), it translates its coordinates accordingly).
The problem I've reached is that I'm not sure how to draw them efficiently. I have a loop going through all the objects and calling draw for each object, similar to the Android tutorial, but this seems wildly inefficient.
The objects are each textured with their own square images, matching the 1x1 grid they fill. They likely will never need their own unique shaders, so the only things that seem to change between objects are the vertices and the textures.
Is there an efficient way to get each model into the pipeline without flushing because of uniform changes?
This probably requires some trial and error and is probably hardware dependent. I would use buffer objects for the meshes with GL_STATIC_DRAW, pack several textures into a bigger one, and draw all objects that use that bigger texture in one batch to avoid state changes as much as possible. Profile and give us more information on where your bottleneck is.
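As a rough illustration of the batching idea, the sketch below builds one vertex array and one index array for many 1x1 grid quads on the CPU, so they could be uploaded once and drawn with a single call. The class and method names (GridBatch, buildVertices, buildIndices) are made up for this example, texture coordinates are left out for brevity, and the actual upload and draw calls are not shown.

```java
/** Hypothetical helper: packs many 1x1 grid quads into one batch. */
final class GridBatch {
    static final int FLOATS_PER_QUAD = 4 * 2;  // 4 corners, (x, y) each
    static final int INDICES_PER_QUAD = 6;     // 2 triangles per quad

    /** cells holds {col, row} pairs; the result holds world-space corners. */
    static float[] buildVertices(int[][] cells) {
        float[] v = new float[cells.length * FLOATS_PER_QUAD];
        int o = 0;
        for (int[] c : cells) {
            float x = c[0], y = c[1];
            v[o++] = x;     v[o++] = y;      // bottom-left
            v[o++] = x + 1; v[o++] = y;      // bottom-right
            v[o++] = x + 1; v[o++] = y + 1;  // top-right
            v[o++] = x;     v[o++] = y + 1;  // top-left
        }
        return v;
    }

    /** 16-bit indices, so this sketch is limited to 16384 quads per batch. */
    static short[] buildIndices(int quadCount) {
        short[] idx = new short[quadCount * INDICES_PER_QUAD];
        int o = 0;
        for (int q = 0; q < quadCount; q++) {
            int base = q * 4;
            idx[o++] = (short) base;
            idx[o++] = (short) (base + 1);
            idx[o++] = (short) (base + 2);
            idx[o++] = (short) base;
            idx[o++] = (short) (base + 2);
            idx[o++] = (short) (base + 3);
        }
        return idx;
    }
}
```

With the translation baked into the vertices like this, no per-object uniform has to change between objects; whether that is actually faster on a given device is something to confirm by profiling.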
I'm trying to make a 2D map (for a game, think tiled world map) in OpenGL ES 2.0 for an Android game. Basically, there are a few tile types that have different textures, and the map is randomly generated from these types, so from game to game the map changes but for the duration of a single game it stays the same.
My first thought was to generate a single large texture / image / bitmap (independent from OpenGL) beforehand basically stitching duplicate tile textures together to make the larger map, and then using this single texture for one large map rectangle. In theory I think this is simple and would work fine, but I'm worried that it won't scale well for larger maps and especially on mobile I'll run out of memory with such a large image map. Plus, there's a small set of tiles that are duplicated over and over so it seems like a tremendous waste to duplicate the pixel data in a big texture over and over.
My second thought was having many textures, one for each of the tile textures. But I'm not sure how this would work texture-binding-wise: would I need the shaders to contain multiple texture references, with logic inside the shader for using the right one?
Finally, I thought using a texture atlas could work, have one texture / image with all of the tile data in it, this would be relatively small. But I'm struggling to imagine how to get the maths to work out such that "tiles" or subsections of the map rectangle would use completely different texture coordinates.
Am I approaching this the wrong way? Should I be using a rectangle for each tile? At least this way I can pass the shaders both vertex and texture coordinates for each tile independently. This seems easier, but also seems wrong since the map really is just one rectangle that won't be changing.
My first thought was to generate a single large texture...
Actually, something like this has been used in id Software's id Tech since version 4. It's called MegaTexture. Basically, it's one big texture, which can also hold additional data.
My second thought was having many textures...
You don't need to hold all the textures in a shader. Do it like this:
Implement a loop with n iterations, where n is the number of different texture types in use.
Inside a loop, bind the current texture type.
Pass any data, like position/color/texture coords to shaders.
Draw all tiles that use the bound texture. You could use GLES30.glDrawElementsInstanced or GLES30.glDrawArraysInstanced if you are targeting devices with GLES 3.x or support for an appropriate extension. Otherwise, draw your tiles using GLES20.glDrawArrays or GLES20.glDrawElements.
Shaders won't be complicated with this approach.
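The grouping behind that loop can be sketched on the CPU side as follows. This is only an illustration under assumptions: the map is taken to be an int[row][col] array of texture type ids, and TextureGroups is a made-up name. The bind and draw calls are indicated in comments rather than executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: buckets tiles by texture type so each type is bound once. */
final class TextureGroups {
    /** tileTypes[row][col] is the texture type of that tile; values are {col, row}. */
    static Map<Integer, List<int[]>> group(int[][] tileTypes) {
        Map<Integer, List<int[]>> groups = new TreeMap<>();
        for (int row = 0; row < tileTypes.length; row++) {
            for (int col = 0; col < tileTypes[row].length; col++) {
                groups.computeIfAbsent(tileTypes[row][col], k -> new ArrayList<>())
                      .add(new int[] {col, row});
            }
        }
        return groups;
    }
    // In a renderer, one pass over the result would be:
    //   for each (type, tiles): bind the texture for `type`, then draw all `tiles`.
}
```

Since the map does not change during a game, the grouping only needs to be computed once after the map is generated.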
Finally, I thought using a texture atlas could work...
You could use loop here too and compute the texture coordinates for each tile type on CPU, then just pass them to shaders.
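A minimal sketch of that CPU-side computation, assuming the atlas is a regular grid of equally sized cells numbered row by row starting at the top-left. The class name AtlasUv and this layout are assumptions for illustration, not something stated in the question.

```java
/** Hypothetical helper: texture coordinates of one cell in a grid atlas. */
final class AtlasUv {
    /** Returns {u0, v0, u1, v1} for cell `index` in an atlas of cols x rows cells. */
    static float[] rect(int index, int cols, int rows) {
        int cx = index % cols;   // column of the cell
        int cy = index / cols;   // row of the cell
        float w = 1f / cols;     // cell width in normalized texture space
        float h = 1f / rows;     // cell height in normalized texture space
        return new float[] {cx * w, cy * h, (cx + 1) * w, (cy + 1) * h};
    }
}
```

Each tile quad then uses (u0, v0) through (u1, v1) for its four corners instead of the full 0..1 range. With linear filtering, neighbouring cells can bleed into each other at the edges, so some padding between cells may be needed.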
Considering your map does not change during a game session, the MegaTexture approach looks good. However, it depends on how large your map is and how much memory is available. Also, note that the maximum texture size is limited. It differs from device to device but should be (AFAIK) equal to or greater than the screen size, and at least 64 texels (16 for cube-mapped textures). You can query the maximum texture size on any device using glGetIntegerv(GL_MAX_TEXTURE_SIZE, ...).
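To get a feel for the numbers, here is a small back-of-the-envelope sketch. It assumes an uncompressed RGBA texture at 4 bytes per texel and takes the device limit as a parameter (the value you would read from GL_MAX_TEXTURE_SIZE). The class name and the example map sizes are made up.

```java
/** Hypothetical helper: estimates the cost of one big pre-stitched map texture. */
final class MapTextureBudget {
    /** Bytes needed for the whole map at 4 bytes per texel (RGBA, no mipmaps). */
    static long bytesRgba8888(int tilesX, int tilesY, int tilePx) {
        long width = (long) tilesX * tilePx;
        long height = (long) tilesY * tilePx;
        return width * height * 4L;
    }

    /** True if both dimensions stay within the device's maximum texture size. */
    static boolean fits(int tilesX, int tilesY, int tilePx, int maxTextureSize) {
        return (long) tilesX * tilePx <= maxTextureSize
            && (long) tilesY * tilePx <= maxTextureSize;
    }
}
```

For example, a 64x64 map of 32-pixel tiles is a 2048x2048 texture taking 16 MiB, while a 128x128 map of the same tiles would need 4096x4096 texels and 64 MiB, which no longer fits on a device whose limit is 2048.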
I have imported a model (e.g. a teapot) using Rajawali into my scene.
What I would like is to label parts of the model (e.g. the lid, body, foot, handle and the spout) using plain Android views, but I have no idea how this could be achieved. Specifically, positioning the labels in the right place seems challenging. The idea is that when I transform my model's position in the scene, the tips of the labels are still correctly positioned.
The Rajawali tutorial shows how Android views can be placed on top of the scene here: https://github.com/Rajawali/Rajawali/wiki/Tutorial-08-Adding-User-Interface-Elements
I also understand how, using the transformation matrices, a 3D coordinate on the model can be transformed into a 2D coordinate on the screen, but I have no idea how to determine the exact 3D coordinates on the model itself. The model is exported to OBJ format using Blender, so I assume there is some clever way of determining the coordinates in Blender and exporting them to a separate file, or including them somehow in the OBJ file (without rendering those points, only including them as metadata), but I have no idea how I could do that.
Any ideas are very appreciated! :)
I would use a screenquad, not a view. This is a general GL solution, and will also work with iOS.
You must determine the indices of the desired model vertices. Using the text rendering algo below, you can just fiddle them until you hit the right ones.
1. Create a reasonable ARGB bitmap with same aspect ratio as the screen.
2. Create the screenquad texture using this bitmap
3. Create a canvas using this bitmap
4. The rest happens in onDrawFrame(). Clear the canvas using clear paint.
5. Use the MVP matrix to convert desired model vertices to canvas coordinates.
6. Draw your desired text at the canvas coordinates
7. Update the texture.
Your text will render very precisely at the vertices you specified. The GL thread will double-buffer and loop you back to #4. Super smooth 3D text animation!
Use double-precision floating-point math to avoid loss of precision during coordinate conversion, which results in wobbly text. You could even use the z value of the vertex to scale the text. Fancy!
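The conversion from a model vertex to canvas coordinates could look roughly like the sketch below. It assumes a column-major 4x4 MVP matrix (the layout android.opengl.Matrix uses), converted to doubles, and a canvas whose origin is the top-left corner. LabelProjector and toCanvas are names invented for this example.

```java
/** Hypothetical helper: projects a model-space vertex onto a 2D canvas. */
final class LabelProjector {
    /**
     * mvp is a column-major 4x4 matrix. Returns {canvasX, canvasY, ndcZ},
     * or null when the vertex is not in front of the camera (w <= 0).
     */
    static double[] toCanvas(double[] mvp, double x, double y, double z,
                             int width, int height) {
        // Clip-space position = MVP * (x, y, z, 1).
        double cx = mvp[0] * x + mvp[4] * y + mvp[8]  * z + mvp[12];
        double cy = mvp[1] * x + mvp[5] * y + mvp[9]  * z + mvp[13];
        double cz = mvp[2] * x + mvp[6] * y + mvp[10] * z + mvp[14];
        double cw = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
        if (cw <= 0) return null;
        // Perspective divide gives normalized device coordinates in [-1, 1].
        double ndcX = cx / cw;
        double ndcY = cy / cw;
        // Map to pixels; canvas y grows downwards, so flip the y axis.
        return new double[] {
            (ndcX + 1.0) * 0.5 * width,
            (1.0 - ndcY) * 0.5 * height,
            cz / cw
        };
    }
}
```

The third component is the normalized depth, which could drive the text scaling mentioned above.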
The performance bottleneck is #7 since the entire bitmap must be copied to GL texture memory, every frame. Try to keep the bitmap as small as possible, maintaining aspect ratio. Maybe let the user toggle the labels.
Note that the copy to GL texture memory is redundant since in OpenGL-ES, GL memory is just regular memory. For compatibility reasons, a redundant chunk of regular memory is reserved to artificially enforce the copy.
I am working on an Android project a bit like Minecraft. I am finding this a great way to learn about OpenGL Performance.
I have moved over to a vertex buffer object which has given me huge performance gains but now I am seeing the down sides.
Am I right in thinking I need a vertex buffer object per:
Different mesh
Different texture
Different colour
Am I also right in thinking that every time the player adds a cube I need to add that on to the end of the VBO and every time the user removes a cube I need to regenerate the VBO?
I can't see how you could map an object with properties to its place in the VBO.
Does anyone know if Minecraft-type games use VBOs?
Yes: as with a malloc()'d block of memory, if you want to expand a VBO because you need more space, you have to create a new one. If you want to show fewer cubes, I guess you could play with IBOs, but you would still have to rearrange the VBO at some point.
I'm not really sure what you mean by object properties, but if you want them to be shown then I think you'll need a different VBO for each property/cube-type/shader combination, and draw them in groups.
If you want to store other kinds of properties, then you shouldn't store them in the VBO that you pass to OpenGL.
I have no idea what Minecraft uses, but my best advice is to store the cubes that are unlikely to be reached in VBOs and the cubes that are likely to be used in an easy-to-modify container. (I don't know if it would help or not.)
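On the question of mapping an object to its place in the VBO, one possible bookkeeping scheme (not something the answer above proposes) is to keep a CPU-side mirror of the buffer plus an id-to-slot table, and to remove a cube by moving the last cube into the freed slot. The sketch below assumes unique integer cube ids and a made-up layout of three floats per cube; CubeSlots is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical CPU-side mirror of a VBO with an id -> slot lookup. */
final class CubeSlots {
    static final int FLOATS_PER_CUBE = 3;  // assumed layout: x, y, z per cube
    private float[] data = new float[4 * FLOATS_PER_CUBE];
    private int[] idAtSlot = new int[4];
    private final Map<Integer, Integer> slotOfId = new HashMap<>();
    private int count;

    /** Appends a cube at the end and returns its slot. */
    int add(int id, float x, float y, float z) {
        if (count == idAtSlot.length) {          // grow both arrays together
            idAtSlot = Arrays.copyOf(idAtSlot, count * 2);
            data = Arrays.copyOf(data, count * 2 * FLOATS_PER_CUBE);
        }
        int o = count * FLOATS_PER_CUBE;
        data[o] = x; data[o + 1] = y; data[o + 2] = z;
        idAtSlot[count] = id;
        slotOfId.put(id, count);
        return count++;
    }

    /** Removes a cube by moving the last cube into its slot; returns that slot or -1. */
    int remove(int id) {
        Integer slot = slotOfId.remove(id);
        if (slot == null) return -1;
        int last = --count;
        if (slot != last) {
            System.arraycopy(data, last * FLOATS_PER_CUBE,
                             data, slot * FLOATS_PER_CUBE, FLOATS_PER_CUBE);
            int movedId = idAtSlot[last];
            idAtSlot[slot] = movedId;
            slotOfId.put(movedId, slot);
        }
        return slot;
    }

    int slotOf(int id) { Integer s = slotOfId.get(id); return s == null ? -1 : s; }
    int count() { return count; }
    float[] data() { return data; }
}
```

With this scheme a removal changes only one slot, so only that range would need re-uploading (for example with glBufferSubData) instead of regenerating the whole buffer. The draw order of cubes changes, which does not matter for opaque geometry.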
I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to glDrawArrays(GL_TRIANGLE_STRIP) make any real difference? Creating VBOs that hold 1,000 quads each instead of just 1 quad is also a possibility, but that would mean I would have to allocate a lot more space in the VBOs. (Right now there is only one quad stored in the VBO, which is transformed by a matrix to its position/rotation.)
If using octrees (which is definitely THE WAY) does not suit you, you can optimize the code that draws the VBO lists.
In my work, I started with a scene rendering at 3 fps; just by optimizing the OpenGL calls and context switches, it now runs at 53 fps (which is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if an object is behind another, draw the front object first, because the depth test is faster than fragment calculation).
use shaders without "discard;", which is costly for the GPU to process.
use reversed loops to get a little bit of cpu speed
don't bind the texture if it is already the one selected in the GPU (a CPU 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
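The first and the last two points can be illustrated with a small sketch: sort the draw items by shader program and then by texture, and only switch state when it actually differs from what is currently bound. The item layout ({programId, textureId} with non-negative ids) and the class name StateSort are assumptions for this example. It counts switches instead of issuing GL calls, and it ignores the separate back-to-front ordering that translucent objects need.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper: orders draw items to minimize program/texture switches. */
final class StateSort {
    /** Each item is {programId, textureId}. Sorts in place by program, then texture. */
    static void sort(int[][] items) {
        Arrays.sort(items, Comparator.<int[]>comparingInt(a -> a[0])
                                     .thenComparingInt(a -> a[1]));
    }

    /** Counts the switches a renderer that skips redundant binds would issue. */
    static int countSwitches(int[][] items) {
        int switches = 0;
        int program = -1;   // nothing bound yet (ids are assumed non-negative)
        int texture = -1;
        for (int[] item : items) {
            if (item[0] != program) { program = item[0]; switches++; }  // would call glUseProgram
            if (item[1] != texture) { texture = item[1]; switches++; }  // would call glBindTexture
        }
        return switches;
    }
}
```

For the five items {1,5}, {2,5}, {1,6}, {2,5}, {1,5} this goes from 8 switches in the given order to 5 after sorting.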
if you post some pieces of code I can help you better.
I am currently implementing a voxel world using Java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue, but then I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 chunks of 32*32*32 blocks rendered (FPS > 30). You should load the chunk and add only those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. That way you have one render call per chunk with the minimum amount of faces.
In 2D it looks like this:
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outer faces. In addition you can use frustum culling: How to check if an object lies outside the clipping volume in OpenGL?
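A minimal sketch of the face test, assuming the chunk is stored as a boolean[x][y][z] array of solid blocks and treating everything outside the chunk as empty. ChunkFaces is a made-up name, and the sketch only counts faces; a real mesher would append the face's vertices to the chunk's array at the marked spot.

```java
/** Hypothetical helper: finds the block faces that are not hidden by a neighbour. */
final class ChunkFaces {
    private static final int[][] NEIGHBORS = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    static int countVisibleFaces(boolean[][][] solid) {
        int faces = 0;
        for (int x = 0; x < solid.length; x++) {
            for (int y = 0; y < solid[0].length; y++) {
                for (int z = 0; z < solid[0][0].length; z++) {
                    if (!solid[x][y][z]) continue;
                    for (int[] d : NEIGHBORS) {
                        if (!isSolid(solid, x + d[0], y + d[1], z + d[2])) {
                            faces++;  // a mesher would emit this face's vertices here
                        }
                    }
                }
            }
        }
        return faces;
    }

    /** Positions outside the chunk count as empty (an assumption of this sketch). */
    private static boolean isSolid(boolean[][][] s, int x, int y, int z) {
        return x >= 0 && x < s.length
            && y >= 0 && y < s[0].length
            && z >= 0 && z < s[0][0].length
            && s[x][y][z];
    }
}
```

For a completely filled 2x2x2 chunk this leaves 24 faces instead of 48, and the saving grows quickly with chunk size because interior blocks contribute nothing.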
So you only need to make a render call for those chunks which are actually inside your frustum. Do not render chunks behind the camera. Otherwise OpenGL will do a lot of calculations for all vertices of the chunk even though the chunk is not visible, so why render it in the first place? This check can happen in your Java code.
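One common way to do this check on the CPU is to test a bounding sphere around each chunk against the six frustum planes. The sketch below assumes the planes are already available as {a, b, c, d} rows with unit-length normals pointing into the frustum; how they are extracted from the camera matrices is not shown, and FrustumCull is a hypothetical name.

```java
/** Hypothetical helper: conservative sphere-versus-frustum visibility test. */
final class FrustumCull {
    /**
     * planes: rows of {a, b, c, d} with inward-pointing unit normals, so that
     * a*x + b*y + c*z + d >= 0 holds for points on the inner side of a plane.
     */
    static boolean sphereVisible(double[][] planes,
                                 double cx, double cy, double cz, double radius) {
        for (double[] p : planes) {
            double distance = p[0] * cx + p[1] * cy + p[2] * cz + p[3];
            if (distance < -radius) {
                return false;  // entirely on the outer side of this plane
            }
        }
        return true;  // inside or intersecting; may include a few false positives
    }
}
```

For a cubic chunk with edge length s, a sphere centred on the chunk with radius s * sqrt(3) / 2 encloses it. The test can report some chunks near frustum corners as visible when they are not, which only costs an unnecessary draw call.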
A third optimization could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know, shading is processed before depth testing throws away the triangles/faces occluded by others, so you can speed up your shader using deferred shading, as you only shade those fragments which pass the depth test.
There are many more ways to optimize voxel rendering, but for me these are the most basic ones. The tutorial behind the first link isn't finished yet, but it shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, with different textures for each cube, I recommend placing all textures in one big texture so you do not need to swap textures. A simple texture lookup is much faster than swapping a texture (glBindTexture(..)), doing a lookup, and later swapping back to the first texture. Use one big texture and apply the right UV coordinates to your vertices.
You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" by splitting it along each of the axes.
Then you check if the camera can see something inside a cube; if it can't, you discard all the blocks in that section (which can speed things up by up to 8x). Then, inside the cube, you divide again into 8 sections and check again whether they are visible. And so on, speeding up both the checks and the rendering.
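The recursion can be sketched as below. This is a simplified, implicit octree over a dense boolean[x][y][z] grid whose edge length is a power of two; no node objects are stored, and the visibility test is passed in as a callback so that any frustum check can be plugged in. All names (OctreeCull, Visibility, collect) are invented for this example.

```java
import java.util.List;

/** Hypothetical helper: recursive 8-way subdivision with early rejection. */
final class OctreeCull {
    /** Reports whether the cube with corner (x, y, z) and the given edge size is visible. */
    interface Visibility {
        boolean boxVisible(int x, int y, int z, int size);
    }

    /**
     * Adds visible solid blocks to `out` as {x, y, z} and returns how many
     * visibility tests were performed. `size` must be a power of two.
     */
    static int collect(boolean[][][] solid, int x, int y, int z, int size,
                       Visibility vis, List<int[]> out) {
        int tests = 1;
        if (!vis.boxVisible(x, y, z, size)) {
            return tests;  // the whole subtree is discarded with one test
        }
        if (size == 1) {
            if (solid[x][y][z]) out.add(new int[] {x, y, z});
            return tests;
        }
        int half = size / 2;
        for (int i = 0; i < 8; i++) {  // the bits of i select the octant
            tests += collect(solid,
                             x + (i & 1) * half,
                             y + ((i >> 1) & 1) * half,
                             z + ((i >> 2) & 1) * half,
                             half, vis, out);
        }
        return tests;
    }
}
```

In a 4x4x4 world where only the half with x < 2 is visible, this performs 41 tests to find the 32 visible blocks, compared with 64 tests when checking every block individually; the gap widens as the world grows.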
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octrees can be accelerated using "portals" (and I don't mean GLaDOS ;) ), which discard voxels and octree nodes depending on visibility through doors and windows, but this is only good for interiors.
There are many examples for OpenGL ES 2 of how to visualize a single triangle or rectangle.
Google provides an example for drawing shapes (triangles, rectangles) by creating a Triangle and a Rectangle class, which basically do all the OpenGL work required to visualize these objects.
But what should you do if you have more than one triangle? What if you have objects consisting of hundreds of triangles of different colors, sizes and positions? I can't find any good tutorial for dealing with complex scenarios in OpenGL ES.
My approaches:
So I tried it out. First of all, I slightly changed the Triangle class into a more dynamic class (the constructor now takes the position and the color of the triangle). Basically this is "enough" for drawing complex scenes. Every object would consist of hundreds of these Triangle instances, and I render each of them separately. But this consumes a lot of computing power, and I think most of the steps in the rendering process are redundant.
So I tried to "group" triangles into different categories. Now every object has its own vertex buffer and puts all of its triangles into it at once. The performance is far better than before (where every triangle had its own buffer), but I still think that it's not the correct way to go.
Is there any good example on the internet where someone draws more than simple triangles, or do you know where I can get this information? I really like OpenGL, but it's pretty hard for beginners because of the lack of tutorials (for OpenGL ES 2 on Android).
The standard representation of (triangle) meshes for rendering uses a vertex array containing all the vertices in the mesh and an index array storing the connectivity (triangles). You definitely want at most one draw call per object (but you might even be able to coalesce several objects).
Interleaved attribute arrays are the most efficient variant wrt. cache efficiency, so one Buffer object for the VA per object is enough. You might even combine several objects into one buffer object, even if you can not use a single draw call for both.
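As a small illustration of interleaving, the sketch below merges separate position (x, y, z) and color (r, g, b, a) arrays into one array laid out vertex by vertex. The attribute choice, the class name and the constants are assumptions for this example; the stride and offset are the values that would be handed to glVertexAttribPointer for such a layout.

```java
/** Hypothetical helper: builds an interleaved x,y,z,r,g,b,a vertex array. */
final class Interleave {
    static final int FLOATS_PER_VERTEX = 7;                 // 3 position + 4 color
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * 4;  // bytes between vertices
    static final int COLOR_OFFSET_BYTES = 3 * 4;            // color follows position

    /** positions holds xyz per vertex, colors holds rgba per vertex. */
    static float[] positionsAndColors(float[] positions, float[] colors) {
        int n = positions.length / 3;
        if (colors.length != n * 4) {
            throw new IllegalArgumentException("need 4 color floats per vertex");
        }
        float[] out = new float[n * FLOATS_PER_VERTEX];
        for (int i = 0; i < n; i++) {
            System.arraycopy(positions, i * 3, out, i * FLOATS_PER_VERTEX, 3);
            System.arraycopy(colors, i * 4, out, i * FLOATS_PER_VERTEX + 3, 4);
        }
        return out;
    }
}
```

All attributes of a vertex then sit next to each other in memory, and a single buffer object holds everything the object needs.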
As GLES might be limited to 16-bit indices, large models must be split into several "patches".
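One simple way to do such a split is a greedy pass over the triangle list that starts a new patch whenever the next triangle would push the current patch over the vertex limit, re-numbering vertices per patch. The sketch below is an illustration only: PatchSplitter and Patch are made-up names, the limit is a parameter (65536 for 16-bit indices), and no attempt is made to keep patches spatially coherent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits an indexed triangle list into 16-bit-indexable patches. */
final class PatchSplitter {
    static final class Patch {
        /** Maps each local vertex index to the original vertex index. */
        final List<Integer> sourceVertices = new ArrayList<>();
        /** Local indices; values above 32767 wrap in Java but are read as unsigned by GL. */
        final List<Short> indices = new ArrayList<>();
    }

    static List<Patch> split(int[] triangles, int maxVerticesPerPatch) {
        if (maxVerticesPerPatch < 3) throw new IllegalArgumentException("limit too small");
        List<Patch> patches = new ArrayList<>();
        Patch cur = new Patch();
        Map<Integer, Integer> remap = new HashMap<>();  // original index -> local index
        for (int t = 0; t + 2 < triangles.length; t += 3) {
            int fresh = 0;  // vertices of this triangle not yet in the patch
            for (int k = 0; k < 3; k++) {
                if (!remap.containsKey(triangles[t + k])) fresh++;
            }
            if (cur.sourceVertices.size() + fresh > maxVerticesPerPatch) {
                patches.add(cur);  // current patch is full, start a new one
                cur = new Patch();
                remap.clear();
            }
            for (int k = 0; k < 3; k++) {
                int v = triangles[t + k];
                Integer local = remap.get(v);
                if (local == null) {
                    local = cur.sourceVertices.size();
                    cur.sourceVertices.add(v);
                    remap.put(v, local);
                }
                cur.indices.add((short) local.intValue());
            }
        }
        if (!cur.indices.isEmpty()) patches.add(cur);
        return patches;
    }
}
```

Vertices shared across a patch boundary end up duplicated, once in each patch that uses them, which is the price of keeping every patch self-contained.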