i've tried using
floatbuffer.put(float[]);
but as i am handling more than 200 squares, all with diferent texture coordinates that are updated each frame, my fps drop drastically, and the game becomes too far to be fluid.
y thought the method mentioned on badlogicgames.com, about of, instead use floatbuffer, use a intbuffer but, is the same thing, equal of slow at the moment of the method "put" of the buffer.
so, how i could update all my floatbuffers with a best performance?
EDIT: i've solved my problem, the "put" method itself is not slow, the problem is when a new float is initialized for each floatbuffer, instead of that, i just change the value of each element contained in the floatarray and that avoids many GC activity.. well i think.
There are a number of ways to improve performance. Briefly, here are a few:
1) Initialize the floatbuffer and an apply a transformation each each frame: - If your squares are moving across the screen, place them once in a floatbuffer, and then apply a separate matrix transformation to each one. In this method, you fill the floatbuffer once, and just update the transformations each turn.
2) Put only unique shapes in the floatbuffer - If any of your squares are duplicates of each other, call the same buffer each frame to conserve memory.
3) Use Vertex Buffer Objects and Index Buffer Objects - It sounds like you might have done this, but store the data in a vertex buffer, and call the vertices using the indices of the array.
If you are still having issues, I recommend taking a look at this tutorial: http://www.learnopengles.com/android-lesson-one-getting-started/
Related
So Im trying to figure out how to draw a single textured quad many times. My issue is that since these are create and deleted and every one of them has a unique position and rotation. Im not sure a vbo is the best solution as I've heard modifying buffers is extremely slow on android and it seems I would need to create a new one each frame since different quads might disappear randomly (collide with an enemy). If I simply do a draw call for each one I get 20fps around 100, which is unusable. any advice?
Edit: I'm trying to create a bullethell, but figuring out how to draw 500+ things is hurting my head.
I think you're after a particle system. A similar question is here: Drawing many textured particles quickly in OpenGL ES 1.1.
Using point sprites is quite cheap, but you have to do extra work in the fragment shader and I'm not sure if GLES2 supports gl_PointSize if you need different sized particles. gl_PointSize Corresponding to World Space Size
My go-to particle system is storing positions in a double buffered texture, then draw using a single draw call and a static array of quads. This is related but I'll describe it a bit more here...
Create a texture (floating point if you can, but this may limit the supported devices). Each pixel holds the particle position and maybe rotation information.
[EDITED] If you need to animate the particles you want to change the values in the texture each frame. To make it fast, get the GPU to do it in a shader. Using an FBO, draw a fullscreen polygon and update the values in the fragment shader. The problem is you can't read and write to the same texture (or shouldn't). The common approach is to double buffer the texture by creating a second one to render to while you read from the first, then ping-pong between them.
Create a VBO for drawing triangles. The positions are all the same, filling a -1 to 1 quad. However make texture coordinates for each quad address the correct pixel in the above texture.
Draw the VBO, binding your positions texture. In the vertex shader, read the position given the vertex texture coordinate. Scale the -1 to 1 vertex positions to the right size, apply the position and any rotation. Use the original -1 to 1 position as the texture coordinate to pass to the fragment shader to add any regular colour textures.
If you ever have a GLSL version with gl_Vertex, I quite like generating these coordinates in the vertex shader, saving storing unnecessarily trivial data just to draw simple objects. This for example.
To spawn particles, use glTexSubImage2D and write a block of particles into the position texture. You may need a few textures if you start storing more particle attributes.
I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to using glDrawArrays(GL_TRIANGLE_STRIPS) make any real difference? And also creating VBO's that hold 1,000 quads each instead of just 1 quad is a possibility, but that would mean I would have to allocate a lot more space in the VBO's. (Right now there is only one quad stored in the VBO which is transformed by a matrix to its position/rotation).
if using Octtrees (wich is definitely THE WAY) does not suit you, you can optimize the code for calling the vbo lists.
In my work, I started with a scene rendering at 3fps rate, just optimizing the opengl calls and context switches, now runs on 53fps (wich is quite fine considering the starting point).
So, try not to change any register inside the gpu between calls:
order all the objects with the same shader to render them all at the time using only one glUseProgram
order objects with transparency, so you only draw translucent objects at the end.
draw objects in such a fashion that fragments are drawn only once (if a object is behind another, draw the front object first, cause depth test is faster than fragment calculation).
use shaders without "discard;" wich is costly for the cpu to process.
use reversed loops to get a little bit of cpu speed
dont select the texture if it is already the same than selected in the GPU (a cpu 'if' is less costly than a GPU register change).
try not to update the shader attributes if there is no need to (cpu if is less costly).
if you post some pieces of code I can help you better.
I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue but that I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem having 10 Chunks of 32*32*32 Blocks rendered (FPS > 30). You should load the Chunk and only add those faces which are not occluded by other faces (so that they are visible to the player) to an array which will be uploaded to a VBO. Therefore you have one rendercall per Chunk with the minimum amout of faces
In 2D is looks like this
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outter faces. In addition you can use frustrum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you just need to make a render call for those chunks which are actually inside your frustrum. Do not render chunks behind the camera. OpenGL will make a lot of calculations for all vertices of the chunk, but then the chunk is not visible so why render it in the first place. This can happen in your java code.
A third optimazation could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know the shading is processed before depth testing and throwing away those triangels/ faces occluded by others, you can speed up your shader using deferred shading as you only shade those vertices which will pass the depth-testing.
There are a lot of more ways to optimize voxel rendering but for me this are the most basic operations. The given tutorial behind the first link isn't finished yet, but he shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, which different textures for each cube, I recommend to place all textures in a big one, so you do not need to swap textures, a simple texture lookup is much more faster than swapping a texture (glBindTexture(..)) and then make a lookup and later swap back to this texture. Use one big huge texture and apply the right UV coordinates to your vertices.
You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" wich go in the different axis.
Then, you check if the camera can see something inside the cube, if it can't you discard all the blocks in that section (wich can speed up to 8x). Then, inside the block, you divide again in 8 sections, and check again if they are visible. An so on, speeding checks and renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octree can be accelerated using "portals" (and I dont mean GladOs ;) ) wich discard voxels and Octrees depending on the visibility inside doors and windows, but is only good for interiors.
Dabbling around with OGLES in Android. I want to create some triangles and translate them. Straightforward enough.
I look at the examples and see that FloatBuffers are recommended as the storage construct for polygon coordinates using GLES in Android, because they are on native code steroids. Good stuff! Explanation here.
Why FloatBuffer instead of float[]?
Now what's confusing to me is the Matrix class in Android operates almost exclusively on java float arrays. So on one hand I'm being told to use FloatBuffers for speed, but on the other hand every transformation I want to do now has to look like this?
float[] myJavaFloatArray = new float[polylength];
myFloatBuffer.get(myJavaFloatArray);
Matrix.translateM(myJavaFloatArray,0,-1.0f,0.0f,0.0f);
myFloatBuffer.put(myJavaFloatArray);
Is this really the best practice? Is the assumption just that my ratio of translates to draws will be low enough that this expense is worth it? What if I'm doing lots of translates, one on every object per frame, is that an argument for using java float arrays throughout?
First of all, isn't there a method array() for FloatBuffer that returns float[] value pointing directly to the buffer data? You should use that in translate method so you don't copy the data around the memory, this should turn your 4 lines of code into a single line:
Matrix.translateM(myFloatBuffer.array(),0,-1.0f,0.0f,0.0f);
Second it is not best practice to transform your vertex data on the CPU, you have a vertex shader for that. Use matrices and push them to the GPU so you can benefit from it. Also if you transform those data the way you do you kind of corrupt them: If the float buffer contained vertex array for some object you want to draw in several location you would have to either translate it back to 0 before translating it to a new location for each draw call or have multiple instances of the vertex data.
Any way, although it can be very handy to be able to transform the array buffer you should avoid using it per frame if possible. Do it in loading time or in background when preparing some element. As for per frame rather try to use vertex shader for that. In case you are dealing with ES1 and you have no vertex shader there are still matrix operations that work in the same way (translate, rotate, scale, set, push, pop...)
I'm implementing some native 2D-draw functions in my graphics engine for android, but now there's another question coming up, when I observe the performance of my program.
At the moment I'm implementing a drawLine/drawImage function. In summary, there are following different values for drawing each different line / image:
the color
the alpha value
the width of the line
rotation (only for images)
size/scale (also for images)
blending method (subrtract, add, normal-alpha)
Now, when an imageLine is drawn,
I put the CPU-calculated vertex-positions and uv-values for 6 vertices (2 triangles), into a Floatbuffer and draw it immediately with drawArrays, after passing information for drawing (color,alpha, etc.) via uniforms to the shader. When I draw an image, the pre-set VBO is directly drawn after passing information.
The first fact I recognized, is: of course drawing Images is much faster, than imagelines (beacuse of VBOs), but also: I cannot pre-put vertex-data into a VBO for imageLines, because imageLines have no static shape like normal images (varying linelength, varying linewidth and the vertex positions of x1,y1 and x2,y2 change too often)
That's why I use a normal Floatbuffer, instead of a VBO.
So my question is: What's the best way for managing images, and other 2D-graphics functions. For me it's some kind of important, that the user of the engine is able to draw as many images/2D graphics as possible, without loosing to much performance.
You can find the functions for drawing images, imagelines, rects, quads, etc. here:
https://github.com/Chrise55/LLama3D/blob/master/Llama3DLibrary/src/com/llama3d/object/graphics/image/ImageBase.java
Here an example how it looks with many images (testing artificial neural networks), it works fine, but already little bit slow with that many images... :(
What is the best way: if I use glDrawArrays, or if I use glDrawElements? Any difference?
For both, you pass OpenGL some buffers containing vertex data.
glDrawArrays is basically "draw this contiguous range of vertices, using the data I gave you earlier".
Good:
You don't need to build an index buffer
Bad:
If you organise your data into GL_TRIANGLES, you will have duplicate vertex data for adjacent triangles. This is obviously wasteful.
If you use GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN to try and avoid duplicating data: it isn't terribly effective and you'd have to make a rendering call for each strip and fan. OpenGL calls are expensive and should be avoided where possible
With glDrawElements, you pass in buffer containing the indices of the vertices you want to draw.
Good
No duplicate vertex data - you just index the same data for different triangles
You can just use GL_TRIANGLES and rely on the vertex cache to avoid processing the same data twice - no need to re-organise your geometry data or split rendering over multiple calls
Bad
Memory overhead of index buffer
My recommendation is to use glDrawElements
The performance implications are probably similar on the iphone, the OpenGL ES Programming Guide for iOS recommends using triangle strips and joining multiple strips through degenerate triangles.
The link has a nice illustration of the concept. This way you could reuse some vertices and still do all the drawing in one step.
For best performance, your models should be submitted as a single unindexed triangle strip using glDrawArrays with as few duplicated vertices as possible. If your models require many vertices to be duplicated (because many vertices are shared by triangles that do not appear sequentially in the triangle strip or because your application merged many smaller triangle strips), you may obtain better performance using a separate index buffer and calling glDrawElements instead. There is a trade off: an unindexed triangle strip must periodically duplicate entire vertices, while an indexed triangle list requires additional memory for the indices and adds overhead to look up vertices. For best results, test your models using both indexed and unindexed triangle strips, and use the one that performs the fastest.
Where possible, sort vertex and index data so that that triangles that share common vertices are drawn reasonably close to each other in the triangle strip. Graphics hardware often caches recent vertex calculations, so locality of reference may allow the hardware to avoid calculating a vertex multiple times.
The downside is that you probably need a preprocessing step that sorts your mesh in order to obtain long enough strips.
I could not come up with a nice algorithm for this yet, so I can not give any performance or space numbers compared to GL_TRIANGLES. Of course this is also highly dependent on the meshes you want to draw.
Actually you can degenerate the triangle strip to create continuous strips so that you don't have to split it while using glDrawArray.
I have been using glDrawElements and GL_TRIANGLES but thinking about using glDrawArray instead with GL_TRIANGLE_STRIP. This way there is no need for creating inidices vector.
Anyone that knows more about the vertex cache thing that was mentioned above in one of the posts? Thinking about the performance between glDrawElements/GL_TRIANGLE vs glDrawArray/GL_TRIANGLE_STRIP.
The accepted answer is slightly outdated. Following the doc link in Jorn Horstmann's answer, OpenGL ES Programming Guide for iOS, Apple describes how to use "degenerate-triangles" trick with DrawElements, thereby gaining the best of both worlds.
The minor savings of a few indices by using DrawArrays isn't worth the savings you get by combining all your data into a single GL call, DrawElements. (You could combine all using DrawArrays, but then any "wasted elements" would be wasting vertices, which are much larger than indices, and require more render time too.)
This also means you don't need to carefully consider all your models, as to whether most of them can be rendered as a minimal number of strips, or they are too complex. One uniform solution, that handles everything. (But do try to organize into strips where possible, to minimize data sent, and maximize GPU likeliness of re-using recently cached vertex calculations.)
BEST: A single DrawElements call with GL_TRIANGLE_STRIP, containing all your data (that is changing in each frame).