I've been programming for about a couple years now, and I know the syntax of different languages pretty well, but I've spent most of my time making programs in basic Lua. I have played around with game developing with Java and C#, with DirectX and OpenGl, but most times I don't even know terms that others may know really well such as 'shaders' and 'VBOs'. I've heard of them, but I really don't know what they mean, so please explain to me like I'm five if you have a answer.
I'm trying to make a simple Android game using Open GL ES 2.0, and, I've seen many tutorials explaining how to draw one triangle, and not many. For my case, I have a world full of triangles, its simply an array of true and false's, where if there is a true a triangle will be drawn in that position, if false the triangle won't be drawn. I've made it so it will only draw triangle's until they are off screen, but I realized one problem still, and that is that rendering is still super slow, takes about 2 seconds to render one frame.
At the moment, I currently 1 triangle class, and when it checks if a triangle is supposed to belong where it is, it will create vertex points for that triangle and pass them into the triangle class, then creating a new triangle object with those updated vertices. Now I can see how wrong this can go, but it's honestly the only thing I can come up with, with my very-limited knowledge on OpenGL ES
What I am looking for is to draw all of these triangles, in their correct position, with the same color and size, in a very efficient way, so that it wouldn't take about 2 seconds to draw one frame.
If anyone has a solution, Thank you.
What you are looking for is for instanced draw calls that basically means that you are going to send one VBO (Vertex Buffer Object - this holds all the information about the vertex of a mesh) and an array of MVPM(Model View Projection Matrices - These hold all the transformations information of every draw you want to do of that VBO) to the shader, all this in one single draw call.
unfortunately for us (i'm using right now OpenGL Es 2.0 as well) there is no way provided by opengl to do this.
Any way, there is this article about fake that but i think is too much trouble.
In other words ir are pretty much trapped with a draw call per mesh.
Related
I'm trying to implement deferred rendering on an Android phone using OPENGL ES 3.0. I've gotten this to work ok but only very slowly, which rather defeats the whole point. What really slows things up is the multiple calls to the shaders. Here, briefly, is what my code does:
Geometry Pass:
Render scene - output position, normal and colour to off-screen buffers.
For each light:
a) Stencil Pass:
Render a sphere at the current light position, sized according to the lights intensity. Mark these pixels as influenced by current light. No actual output.
b) Light Pass:
Render a sphere again, this time using the data from the geometry pass to apply lighting equations to pixels marked in the previous step. Add this to off-screen buffer
Blit to screen
It's this restarting the shaders for each light causing the bottleneck. For example, with 25 lights the above steps run at about 5 fps. If instead I do: Geometry Pass / Stencil Pass - draw 25 lights / Light Pass - draw 25 lights it runs at around 30 fps. So, does anybody know how I can avoid having to re-initialize the shaders? Or, in fact, just explain what's taking up the time? Would it help or even be possible (and I'm sorry if this sounds daft) to keep the shader 'open' and overwrite the previous data rather than doing whatever it is that takes so much time restarting the shader? Or should I give this up as a method for having multiple lights, on a mobile devise anyway.
Well, I solved the problem of having to swap shaders for each light by using an integer texture as a stencil map, where a certain bit is set to represent each light. (So, limited to 32 lights.) This means step 2a (above) can be looped, then a single change of shader, and looping step 2b. However, (ahahaha!) it turns out that this didn't really speed things up as it's not, after all, swapping shaders that's the problem but changing write destination. That is, multiple calls to glDrawBuffers. As I had two such calls in the stencil creation loop - one to draw nowhere when drawing a sphere to calculate which pixels are influenced and one to draw to the integer texture used as the stencil map. I finally realized that as I use blending (each write with a colour where a singe bit is on) it doesn't matter if I write at the pixel calculation stage, so long as it's with all zeros. Getting rid of the unnecessary calls to glDrawBuffers takes the FPS from single figures to the high twenties.
In summary, this method of deferred rendering is certainly faster than forward rendering but limited to 32 lights.
I'd like to say that me code was written just to see if this was a viable method and many small optimizations could be made. Incidentally, as I was limited to 4 draw buffers, I had to scratch the position map and instead recover this from gl_FragCoord.xyz. I don't have proper benchmarking tools so I'd be interested to hear from anyone who can tell me what difference this makes, speedwise.
I'm a newbie in the OpenGL ES world, and learning some basics on 3d graphics on Android OpenGL ES. I'm wondering how to create a image plane that emitting light? This is easy to be implemented in 3d model software like Blender (using the Cycles Render), see the image below for effects I'm looking for. Through some research, I learnt that they may be related to Blur or Bloom effect using shader. But I'm not very sure, and I don't know how to implement them.
As per Paul-Jan's comment, what you want is far from basic in OpenGL.
The default approach for OpenGL is forward rendering. i.e. every time you specify a piece of geometry the calculation goes forwards from triangle to pixels, a function is applied to determine the colour for each of those pixels and they're forwarded to the frame buffer. So the starting position is that each individual pixel has no concept of the world around it. Each exists in isolation.
In your scene, the floor below the box has no idea it should be blue because it has no idea that there is a box above it.
Programs like Blender use a different approach, which in this context could accurate be called backwards rendering. It starts from each pixel and asks what geometry lies behind it. In doing that it explicitly has an idea of all the geometry in the scene. So when it spots that the floor is behind a certain position it can then continue and ask "and which light sources can the floor see?" to establish lighting.
The default OpenGL approach is long established for real-time rendering. If you look at old video games you'll notice evidence of it all over the place: objects often don't cast shadows on each other (or such shadows are very rough approximations), there's only one source of light which is infinitely far away (i.e. it's in a fixed position as far as geometry is concerned; no need to know about the scene really).
So solutions are to invest the geometry with some knowledge of the whole scene. A common approach is to perform internal renderings of the scene from the point of view of the light source. That generates a depth buffer. By handing the light position and depth buffer off to every piece of geometry in the scene they can calculate whether they're visible to the light source. If so then they're illuminated by it. If not then they're not.
Another option is deferred rendering; you do a standard pass of your scene, populating at each pixel the depth, the surface colour, the surface normal, etc. So you get the full scene information broken down into pixel-by-pixel storage from the point of view of the camera. You then pretend that everything the camera can see is everything that there is. So you just need to pass that buffer around for pixels to be able to work out, approximately, which light sources they can and can't see. You can also have different parts of the screen only consider which lights they're close enough to by a broad-phase 2d distance check, which saves time.
In either case we're actually talking about relatively advanced OpenGL stuff.
I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to using glDrawArrays(GL_TRIANGLE_STRIPS) make any real difference? And also creating VBO's that hold 1,000 quads each instead of just 1 quad is a possibility, but that would mean I would have to allocate a lot more space in the VBO's. (Right now there is only one quad stored in the VBO which is transformed by a matrix to its position/rotation).
if using Octtrees (wich is definitely THE WAY) does not suit you, you can optimize the code for calling the vbo lists.
In my work, I started with a scene rendering at 3fps rate, just optimizing the opengl calls and context switches, now runs on 53fps (wich is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if a object is behind another, draw the front object first, cause depth test is faster than fragment calculation).
use shaders without "discard;" wich is costly for the cpu to process.
use reversed loops to get a little bit of cpu speed
dont select the texture if it is already the same than selected in the GPU (a cpu 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
if you post some pieces of code I can help you better.
I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue but that I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 Chunks of 32*32*32 Blocks rendered (FPS > 30). You should load the Chunk and only add those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. Therefore you have one rendercall per Chunk with the minimum amout of faces
In 2D is looks like this
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outter faces. In addition you can use frustrum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you just need to make a render call for those chunks which are actually inside your frustrum. Do not render chunks behind the camera. OpenGL will make a lot of calculations for all vertices of the chunk, but then the chunk is not visible so why render it in the first place. This can happen in your java code.
A third optimazation could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know the shading is processed before depth testing and throwing away those triangels/ faces occluded by others, you can speed up your shader using deferred shading as you only shade those vertices which will pass the depth-testing.
There are a lot of more ways to optimize voxel rendering but for me this are the most basic operations. The given tutorial behind the first link isn't finished yet, but he shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, which different textures for each cube, I recommend to place all textures in a big one, so you do not need to swap textures, a simple texture lookup is much more faster than swapping a texture (glBindTexture(..)) and then make a lookup and later swap back to this texture. Use one big huge texture and apply the right UV coordinates to your vertices.
You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" wich go in the different axis.
Then, you check if the camera can see something inside the cube, if it can't you discard all the blocks in that section (wich can speed up to 8x). Then, inside the block, you divide again in 8 sections, and check again if they are visible. An so on, speeding checks and renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octree can be accelerated using "portals" (and I dont mean GladOs ;) ) wich discard voxels and Octrees depending on the visibility inside doors and windows, but is only good for interiors.
There are many examples for OpenGL ES 2 in how to visualize a single triangle or rectangle.
Google provides an example for drawing shapes (triangles, rectangles) by creating a Triangle and Rectangle class which basically do all the opengl-stuff required for visualize these objects.
But what should you do, if you have more than one triangle? What if you have objects, consists of hundreds of triangles of different colors, different sizes and positions? I can't find any good tutorial for dealing with complex scenarios in opengl es.
My approaches:
So I tried it out. First of all I've slightely changed the Triangle-Class to a more dynamic class (the constructor now gets the position and the color of the triangle). Basically this is "enough" for drawing complexe scenes. Every object would consist out of hundreds of these Triangle-classes and I render each of them seperately. But this consumes much computing power and I think most of the steps in the rendering process are redundant.
So I tried to "group" triangles into different categories. Now every object has his only vertexbuffer and puts every triangle at once in it. Now the performance is far better than before (where every triangle had his own buffer) but I still think, that it's not the correct way to go.
Is there any good example in the internet, where someone is drawing more than simple triangles or do you know where I can get these information from? I really like OpenGL but it's pretty hard for beginners because of the lack of tutorials (for OpenGL ES 2 in Android).
The standard representation of (triangle) meshes for rendering is using a vertex array containing all the vertices in the mesh, and an index array connecting storing the connectivity (triangles). You definitively want at most one draw call per object (but you might even be able to coalesce several objects).
Interleaved attribute arrays are the most efficient variant wrt. cache efficiency, so one Buffer object for the VA per object is enough. You might even combine several objects into one buffer object, even if you can not use a single draw call for both.
As GLES might be limited to 16 Bit indices, large models must be splitted into several "patches".
I'm currently working through a set of tutorials on Android OpenGL ES (1.1) and feel like I'm starting to get a grasp of how the vertices and textures work, along with some sprite animation. As I understand, the only primitives here are points, straight lines, and triangles.
I'm now trying to create a simple curve and really don't know where to start.
I want the curve to be drawn dynamically to represent something like a beam deflection like this where I could input a force and have the curve change.
Is it something I would create with a line loop or triangle fan with a ton of vertices? Or perhaps a texture that I then manipulate?
Any input or a point in the right direction is much appreciated, thanks.
I can recommend this blog post http://blog.uncle.se/2012/02/opengl-es-tutorial-for-android-part-ii-building-a-polygon/ sadly the original source returns a 404. Hopefully the link provides the same quality of information. Anyhow, a good read for openGL.
You have the general idea. Whatever you do must be made of lines, points or triangles. You can generate all the numbers for any pseudo-curve however you want, but you're always going to be passing the resulting vertices to OPENGL then connecting those with lines and triangles.