Bad performance using VBO with OpenGL ES in Android


I'm making an Android app and I need to draw some shapes using OpenGL ES. I'm able to render them but I'm disappointed with performance. I updated the code to use VBO but I didn't notice any improvement. I want to render at 60 frames per second (16 ms per frame).
I have a test project where I render several triangles on the screen. When I render 1000 triangles it takes about 20 ms per frame (depending on the device).
I want to keep the rendering under 10 ms because I need the rest (6 ms) to perform other calculations (e.g. update positions, detect collisions, etc.).
Here is the code where I render a triangle:
https://github.com/mauriciotogneri/test/blob/master/src/com/testopengl/Polygon.java#L51-66
Here is the code where I iterate over the triangles:
https://github.com/mauriciotogneri/test/blob/master/src/com/testopengl/MapRenderer.java#L117-139
(Change the value of NUMBER_OF_TRIANGLES to display more triangles)
From what I understand, GLES20.glDrawArrays(...) takes too much time when I have to call it 1000 times per frame (once per triangle).
Is there another way to render many polygons that doesn't take as long?
Notes:
In the example all the triangles have a fixed position on the screen but in the real scenario they will move around
In the example I assign a random color to each triangle but in the real scenario each of them will have a fixed color

Put your positions/colors/normals ... in one VBO and draw them all in one call.
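A minimal sketch of that idea, assuming 2D triangles with one flat color each. The class name TriangleBatch and the interleaved layout (x, y, r, g, b per vertex) are made up for illustration and are not taken from the linked project. It only builds the data on the CPU; the comments note where the real GLES20 calls would go.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: packs every triangle into one interleaved array
// (x, y, r, g, b per vertex) so the whole set fits in a single VBO.
final class TriangleBatch {
    static final int FLOATS_PER_VERTEX = 5;      // x, y, r, g, b
    static final int VERTICES_PER_TRIANGLE = 3;

    // positions: 6 floats per triangle (x0, y0, x1, y1, x2, y2)
    // colors:    3 floats per triangle (r, g, b), shared by its 3 vertices
    static float[] pack(float[] positions, float[] colors) {
        int triangles = positions.length / 6;
        float[] out = new float[triangles * VERTICES_PER_TRIANGLE * FLOATS_PER_VERTEX];
        int o = 0;
        for (int t = 0; t < triangles; t++) {
            for (int v = 0; v < VERTICES_PER_TRIANGLE; v++) {
                out[o++] = positions[t * 6 + v * 2];
                out[o++] = positions[t * 6 + v * 2 + 1];
                out[o++] = colors[t * 3];
                out[o++] = colors[t * 3 + 1];
                out[o++] = colors[t * 3 + 2];
            }
        }
        return out;
    }

    // Direct buffer in native byte order, the form Android's GL bindings expect.
    static FloatBuffer toBuffer(float[] data) {
        FloatBuffer fb = ByteBuffer.allocateDirect(data.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        fb.put(data);
        fb.position(0);
        return fb;
    }

    // On Android the buffer would then be uploaded once with
    // GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, ...) and the whole batch drawn
    // with a single GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, triangles * 3).
}
```

Since the triangles move in the real scenario, the packed array would have to be rebuilt and re-uploaded when positions change (for example with GLES20.glBufferSubData), which is usually still far cheaper than 1000 separate draw calls.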

Related

how to show audio frequency in waveform?

I want to render a waveform that shows the frequency data of an audio track.
I have 150 data points per second.
I have rendered it with a canvas, drawing one line per data value, so I show 150 lines for 1 second of the song. It displays correctly, but scrolling the view lags.
Is there any library that can render the data points using OpenGL, canvas, or any other method and stay smooth while scrolling?
These are two waves. Each line represents one data point; the minimum value is zero and the maximum is the highest value in the data set.
How can I render this wave in OpenGL or with any other library? It lags while scrolling when rendered with a canvas.

Maybe you could show an example of what it looks like. How do you create the lines? Are the points scattered? Do you have to connect them, or do you have a fixed point?
Usually in OpenGL ES the process looks like this:
- read in your audio data
- sort it so that OpenGL knows how to connect the points
- upload it as vertex data for your vertex shader

I would really recommend this tutorial. I don't know your OpenGL background, so it is a good place to start.
Your application shouldn't be too complicated, and the tutorial should give you enough information for the case where you want to visualize each second with 150 points.
Just a small overview:
- Learn how to set up a window with OpenGL.
- You described a 2D application:
  - define the x values as e.g. -75 to 75
  - define the y values as your data
  - define the lines as an x,y data set
- Use this to draw (desktop OpenGL, immediate mode):
glBegin(GL_LINES);
glVertex2f((float) x value of each line, (float) low y value of each line);
glVertex2f((float) x value of each line, (float) high y value of each line);
glEnd();
- If you have to target mobile graphics, you need shaders: OpenGL ES 2.0 and later has no glBegin/glEnd and only offers a programmable pipeline with GLSL shaders.
- Define your OpenGL camera!
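For OpenGL ES, where immediate mode is not available, the same lines can be prepared as one vertex array. This is only a sketch under assumptions: the class name WaveformLines is invented, x is spread over [-1, 1] instead of -75 to 75, and y is normalized by the largest sample as the question describes.

```java
// Hypothetical helper: turns audio samples into vertex data for GL_LINES.
final class WaveformLines {
    // Returns 4 floats per sample: (x, 0) and (x, y), i.e. one vertical line each.
    static float[] build(float[] samples) {
        float max = 0f;
        for (float s : samples) {
            max = Math.max(max, s);
        }
        float[] vertices = new float[samples.length * 4];
        for (int i = 0; i < samples.length; i++) {
            // Spread the samples evenly over [-1, 1] on the x axis.
            float x = samples.length == 1 ? 0f : -1f + 2f * i / (samples.length - 1);
            // Scale so the highest value in the data set maps to 1.
            float y = max == 0f ? 0f : samples[i] / max;
            vertices[i * 4] = x;
            vertices[i * 4 + 1] = 0f;
            vertices[i * 4 + 2] = x;
            vertices[i * 4 + 3] = y;
        }
        return vertices;
    }
}
```

The resulting array could be uploaded to a VBO once and drawn with a single glDrawArrays call using GL_LINES; scrolling then only needs a changed translation uniform rather than rebuilding the lines.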

Is there anything I can do about the overhead from running a shader multiple times

I'm trying to implement deferred rendering on an Android phone using OpenGL ES 3.0. I've gotten this to work OK but only very slowly, which rather defeats the whole point. What really slows things down is the multiple calls to the shaders. Here, briefly, is what my code does:
Geometry Pass:
Render scene - output position, normal and colour to off-screen buffers.
For each light:
a) Stencil Pass:
Render a sphere at the current light position, sized according to the light's intensity. Mark these pixels as influenced by current light. No actual output.
b) Light Pass:
Render a sphere again, this time using the data from the geometry pass to apply lighting equations to pixels marked in the previous step. Add this to off-screen buffer
Blit to screen
It's this restarting of the shaders for each light that causes the bottleneck. For example, with 25 lights the above steps run at about 5 fps. If instead I do: Geometry Pass / Stencil Pass - draw 25 lights / Light Pass - draw 25 lights, it runs at around 30 fps. So, does anybody know how I can avoid having to re-initialize the shaders? Or, in fact, just explain what's taking up the time? Would it help, or even be possible (and I'm sorry if this sounds daft), to keep the shader 'open' and overwrite the previous data rather than doing whatever it is that takes so much time restarting the shader? Or should I give this up as a method for having multiple lights, on a mobile device anyway?

Well, I solved the problem of having to swap shaders for each light by using an integer texture as a stencil map, where a certain bit is set to represent each light. (So, limited to 32 lights.) This means step 2a (above) can be looped, followed by a single change of shader and then a loop over step 2b. However, (ahahaha!) it turns out that this didn't really speed things up, because the problem is not, after all, swapping shaders but changing the write destination, that is, multiple calls to glDrawBuffers. I had two such calls in the stencil creation loop: one to draw nowhere when drawing a sphere to calculate which pixels are influenced, and one to draw to the integer texture used as the stencil map. I finally realized that, as I use blending (each write uses a colour where a single bit is on), it doesn't matter if I write at the pixel calculation stage, so long as it's with all zeros. Getting rid of the unnecessary calls to glDrawBuffers takes the FPS from single figures to the high twenties.
In summary, this method of deferred rendering is certainly faster than forward rendering but limited to 32 lights.
I'd like to say that my code was written just to see if this was a viable method, and many small optimizations could be made. Incidentally, as I was limited to 4 draw buffers, I had to scrap the position map and instead recover the position from gl_FragCoord.xyz. I don't have proper benchmarking tools, so I'd be interested to hear from anyone who can tell me what difference this makes speed-wise.
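To make the 32-light limit concrete, here is a CPU-side illustration of the one-bit-per-light bookkeeping. It is not the shader code from this answer (that is not shown here); the class and method names are invented, and in the real renderer the mask would live in each texel of the integer texture.

```java
// Hypothetical CPU-side model of the per-pixel light mask described above.
final class LightMask {
    static final int MAX_LIGHTS = 32; // one bit per light in a 32-bit integer

    // Returns the mask with the bit for the given light switched on.
    static int mark(int mask, int light) {
        if (light < 0 || light >= MAX_LIGHTS) {
            throw new IllegalArgumentException("light index must be in 0..31");
        }
        return mask | (1 << light);
    }

    // True if the pixel with this mask is influenced by the given light.
    static boolean isLit(int mask, int light) {
        if (light < 0 || light >= MAX_LIGHTS) {
            return false;
        }
        return (mask & (1 << light)) != 0;
    }
}
```

With this layout the stencil loop only ever ORs bits into the same target, and the light pass tests one bit per light, which is why a 33rd light has nowhere to go.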

OpenGL ES 2.0 How to increase performance in a voxel world?

I created a voxel world using OpenGL ES 2.0 using a VBO to store a basic cube and using a different position matrix for each cube. I am able to get 30fps on my Galaxy S3 when there are 500-600 cubes being rendered, but anything more than 1500 cubes isn't able to run at a faster rate than 8 fps. This is unacceptable because the voxel world should be able to handle more than 5,000 voxels being rendered at a stable 30fps. I have played other mobile games on my phone that run at good framerates and render much more than 5000 blocks at a time. What kind of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique to this? I don't know how to group all the blocks into one single call to glDrawArrays because that would mean the VBO would need a huge allocation, to add all the vertex data for every single cube, and it is impossible to know how much space the VBO would need before drawing them. What I was thinking was to allocate a VBO for every 500 or so blocks so that if it needs more space for blocks it can always create a new VBO for it. And this way it wouldn't be allocating too much extra space since it will only allocate enough space for 500 blocks, and this way if we have 5000 blocks in the world, there will be only 10 calls to glDrawArrays instead of having thousands of those calls.
Another idea I have is that instead of having a VBO for the cube, I could make a VBO for a quad, and use a transformation matrix on each quad. This would require even more calls to glDrawArrays since I would have to call it for each face of the cube, but the plus side is that this way I can remove the faces that already have a block next to them. For the floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. This would save drawing those 4 quads for each block, but it would require more than double the amount of glDrawArrays calls. To reduce the amount of glDrawArrays calls I could create a new VBO for every 500 or so quads, and add/remove quads to the current VBOs whenever necessary. This would reduce the amount of glDrawArrays calls, but it would mean that I have to group each quad based on its texture, which is another issue because if I have to create a VBO for each texture, that would require me to allocate a lot of extra unnecessary space because there might be just one block that uses a certain texture and I may end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes because this way I can skip over the faces that are not visible. After that I also added frustum culling so that only blocks visible inside the frustum are shown. This increased the performance so that I can render a decent sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to glDrawArrays(GL_TRIANGLE_STRIP) make any real difference? Creating VBOs that hold 1,000 quads each instead of just 1 quad is also a possibility, but that would mean I would have to allocate a lot more space in the VBOs. (Right now there is only one quad stored in the VBO, which is transformed by a matrix to its position/rotation.)

If using octrees (which is definitely THE WAY) does not suit you, you can optimize the code that issues the VBO draw calls.
In my work, I started with a scene rendering at 3 fps; just by optimizing the OpenGL calls and context switches, it now runs at 53 fps (which is quite good considering the starting point).
So, try not to change any GPU state between calls:
- Group all the objects that use the same shader so you render them together with a single glUseProgram.
- Sort objects by transparency so that translucent objects are drawn last.
- Draw objects so that each fragment is shaded only once (if one object is behind another, draw the front object first, because the depth test is cheaper than the fragment calculation).
- Use shaders without "discard;", which is costly because on many GPUs it prevents fragments from being rejected early by the depth test.
- Use reversed loops to gain a little CPU speed.
- Don't bind a texture if it is already the one bound on the GPU (a CPU 'if' is cheaper than a GPU state change).
- Try not to update shader attributes when there is no need to (a CPU 'if' is cheaper).
If you post some pieces of code I can help you better.
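The first and sixth points can be sketched like this. DrawSorter and DrawItem are hypothetical names, and the integer ids stand in for whatever program and texture handles your renderer uses (assumed non-negative). The counting method models a renderer that only calls glUseProgram or glBindTexture when the value actually changes.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order draw items so equal GPU state ends up adjacent.
final class DrawSorter {
    static final class DrawItem {
        final int program; // shader program id (assumed >= 0)
        final int texture; // texture id (assumed >= 0)

        DrawItem(int program, int texture) {
            this.program = program;
            this.texture = texture;
        }
    }

    // Sort by program first, then by texture within each program.
    static void sortByState(List<DrawItem> items) {
        items.sort(Comparator
                .comparingInt((DrawItem d) -> d.program)
                .thenComparingInt(d -> d.texture));
    }

    // Number of program/texture switches a renderer would issue if it
    // skips the GL call whenever the requested state is already bound.
    static int countStateChanges(List<DrawItem> items) {
        int changes = 0;
        int program = -1;
        int texture = -1;
        for (DrawItem d : items) {
            if (d.program != program) {
                program = d.program;
                changes++;
            }
            if (d.texture != texture) {
                texture = d.texture;
                changes++;
            }
        }
        return changes;
    }
}
```

For opaque geometry the order within a state group is free, so it can additionally be sorted front to back; translucent objects still need their own back-to-front pass at the end.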

I am currently implementing a voxel world using java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue, but then I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem rendering 10 chunks of 32*32*32 blocks (FPS > 30). You should load the chunk and add only those faces that are not occluded by other faces (i.e. that are visible to the player) to an array, which is then uploaded to a VBO. That way you have one render call per chunk with the minimum amount of faces.
In 2D it looks like this:
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outer faces. In addition you can use frustum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you only need to make a render call for the chunks that are actually inside your frustum. Do not render chunks behind the camera. Otherwise OpenGL does a lot of calculations for all the vertices of a chunk that turns out not to be visible, so why render it in the first place? This check can happen in your Java code.
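The face selection described above can be sketched as follows. ChunkMesher is a made-up name and the chunk is assumed to be a plain boolean grid; the sketch only counts the faces that would be emitted, where a real mesher would append the vertices of each visible face to the chunk's array.

```java
// Hypothetical sketch: find the block faces that are not hidden by a neighbour.
final class ChunkMesher {
    // Offsets to the six neighbours (+x, -x, +y, -y, +z, -z).
    private static final int[][] NEIGHBOURS = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    // solid[x][y][z] == true means a block is present at that cell.
    static int countVisibleFaces(boolean[][][] solid) {
        int faces = 0;
        for (int x = 0; x < solid.length; x++) {
            for (int y = 0; y < solid[0].length; y++) {
                for (int z = 0; z < solid[0][0].length; z++) {
                    if (!solid[x][y][z]) {
                        continue;
                    }
                    for (int[] n : NEIGHBOURS) {
                        // A face is visible only if the cell next to it is empty.
                        if (!isSolid(solid, x + n[0], y + n[1], z + n[2])) {
                            faces++; // a real mesher would add this face's vertices here
                        }
                    }
                }
            }
        }
        return faces;
    }

    // Cells outside the chunk count as empty in this sketch.
    private static boolean isSolid(boolean[][][] s, int x, int y, int z) {
        return x >= 0 && x < s.length
                && y >= 0 && y < s[0].length
                && z >= 0 && z < s[0][0].length
                && s[x][y][z];
    }
}
```

A completely filled 3x3x3 chunk has 162 block faces in total, but only the 54 on the outside survive this test. Treating cells outside the chunk as empty is a simplification; neighbouring chunks could be consulted to drop the faces on shared borders as well.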
A third optimization could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know, shading is processed before the depth test throws away the triangles/faces occluded by others, so you can speed up your shader with deferred shading, because you only shade the fragments that pass the depth test.
There are many more ways to optimize voxel rendering, but for me these are the most basic ones. The tutorial behind the first link isn't finished yet, but it shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, with different textures for each cube, I recommend placing all the textures in one big texture so you do not need to swap textures. A simple texture lookup is much faster than swapping the texture (glBindTexture(..)), doing the lookup, and later swapping back. Use one big texture and apply the right UV coordinates to your vertices.
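Picking "the right UV coordinates" is simple arithmetic if the big texture is a square grid of equally sized tiles. That grid layout is an assumption for this sketch, as is the name AtlasUv.

```java
// Hypothetical sketch: UV rectangle of one tile in a square grid atlas.
final class AtlasUv {
    // Returns {u0, v0, u1, v1} for tile `index`, counting row by row,
    // in an atlas that holds tilesPerRow x tilesPerRow equal tiles.
    static float[] tileUv(int index, int tilesPerRow) {
        float size = 1f / tilesPerRow;
        int col = index % tilesPerRow;
        int row = index / tilesPerRow;
        return new float[] {
            col * size, row * size, (col + 1) * size, (row + 1) * size
        };
    }
}
```

Each face then uses the four corners of that rectangle as its texture coordinates, and the whole chunk can be drawn with one bound texture.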

You should use BSP Octrees to discard big blocks of offscreen cubes.
You divide the world into 8 "space cubes" by splitting it along each axis.
Then you check whether the camera can see anything inside a cube; if it can't, you discard all the blocks in that section (which can speed things up by up to 8x). Then, inside each visible cube, you divide again into 8 sections and check again whether they are visible. And so on, speeding up both the checks and the rendering.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octrees can be accelerated using "portals" (and I don't mean GLaDOS ;) ), which discard voxels and octree nodes depending on what is visible through doors and windows, but this is only good for interiors.
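A rough sketch of the recursive check. To keep it short, the "can the camera see it" test is replaced by an overlap test against an axis-aligned box, which is an assumption standing in for a real frustum test; the class name and fields are invented too.

```java
import java.util.List;

// Hypothetical sketch: octree whose nodes are discarded wholesale when not visible.
final class Octree {
    final float minX, minY, minZ, size; // this node's axis-aligned cube
    final Octree[] children;            // null for a leaf

    Octree(float minX, float minY, float minZ, float size, int depth) {
        this.minX = minX;
        this.minY = minY;
        this.minZ = minZ;
        this.size = size;
        Octree[] kids = null;
        if (depth > 0) {
            kids = new Octree[8];
            float half = size / 2f;
            for (int i = 0; i < 8; i++) {
                // Bits 0, 1 and 2 of i choose the lower or upper half on x, y and z.
                kids[i] = new Octree(
                        minX + ((i & 1) != 0 ? half : 0f),
                        minY + ((i & 2) != 0 ? half : 0f),
                        minZ + ((i & 4) != 0 ? half : 0f),
                        half, depth - 1);
            }
        }
        this.children = kids;
    }

    // Stand-in visibility test: does this cube overlap the query box?
    boolean intersects(float[] qMin, float[] qMax) {
        return minX < qMax[0] && minX + size > qMin[0]
                && minY < qMax[1] && minY + size > qMin[1]
                && minZ < qMax[2] && minZ + size > qMin[2];
    }

    // Adds every visible leaf to `out`; invisible subtrees are skipped in one step.
    void collectVisible(float[] qMin, float[] qMax, List<Octree> out) {
        if (!intersects(qMin, qMax)) {
            return; // everything below this node is discarded at once
        }
        if (children == null) {
            out.add(this);
            return;
        }
        for (Octree child : children) {
            child.collectVisible(qMin, qMax, out);
        }
    }
}
```

In a voxel world each leaf would hold a chunk of blocks, and the visible leaves are the only ones that get a draw call.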

Android / Offscreen rendering to texture

I am making a 2D graphical app that will display planets. I say 2D because the majority of the app will be 2D. I however want to render some 3D objects into dynamic sprites offscreen (to a texture), with transparent (possibly translucent) areas, and subsequently render those rendered textures to the active screen as 2D textured quads. Rendering directly to the screen as 3D objects is not optimal in this case, because it would require me to implement some sort of 3D picking. I am not that advanced in math yet. Note also that the main screen render will be orthographic, while the offscreen render would be perspective.
How can I accomplish this (general idea, no need for specifics), and what would be the most efficient way to do this? Would this reduce support for a wide variety of devices? Also, if the 3D sprite renderings were constantly refreshed every frame (such as being rotated fine amounts) would that kill framerates with continuous unloading/reloading of texture to memory? I suppose that some scenes could have as many as 10 of these 3D offscreen sprites.
Thanks for the help

If you really must use offscreen rendering, just search for FBO (framebuffer object) and attach a texture to it, then use the texture in your main view as a 2D quad. It is quite a straightforward procedure but might decrease the speed. You will probably not be able to do any multithreading on it, so you should create just 1 FBO. Its dimensions will probably have to be a power of 2, so the resolution might be different than you wish. This procedure does not continually load/unload anything: the data is allocated when the texture is created, and GL draws to and reads from it directly. The largest drawback here will be memory. You will create as many as 10 of these textures just to draw on them and present them once.
It might be very easy, though, to place these objects at a specific place in your main buffer: write all the logic as if you wanted to draw a full-screen planet, but use the "viewport" method to place it in a specific part of the screen.
If those planet images will be updated only on user request (you don't want to draw them every frame), then I suggest you try a combination of both: create an FBO with a texture the same size as or larger than the main view and draw all the planets to this single texture using the "viewport" method. Then you can update any of them you want; just don't clear the whole buffer, but rather draw a clear rect on the specific part of the buffer/texture. And keep drawing the whole texture to the main buffer.
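One possible way to lay out the planets in that single texture, assuming a square texture split into a grid of equal cells. The class name, the grid layout and the parameter names are assumptions made for this sketch.

```java
// Hypothetical sketch: viewport rectangle for one sprite cell in a shared texture.
final class SpriteSheetLayout {
    // Returns {x, y, width, height} for cell `index`, counting row by row.
    // These are the four values one would pass to glViewport before
    // rendering that planet into the FBO texture.
    static int[] viewportFor(int index, int textureSize, int cellsPerRow) {
        int cell = textureSize / cellsPerRow;
        int col = index % cellsPerRow;
        int row = index / cellsPerRow;
        return new int[] { col * cell, row * cell, cell, cell };
    }
}
```

With a 1024 x 1024 texture and 4 cells per row there is room for 16 sprites of 256 x 256 pixels, enough for the 10 mentioned in the question. The same rectangle can be used to clear only that cell before redrawing a planet, for example with a scissor test around glClear.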

Can OpenGL ES render textures of non base 2 dimensions?

This is just a quick question before I dive deeper into converting my current rendering system to openGL. I heard that textures needed to be in base 2 sizes in order to be stored for rendering. Is this true?
My application is very tight on memory, but most of the bitmaps are not powers of two. Does storing non-base 2 textures consume more memory?

It depends on the OpenGL ES version. OpenGL ES 1.0/1.1 have the power-of-two restriction. OpenGL ES 2.0 doesn't have that limitation, but it restricts non-power-of-two textures: only the GL_CLAMP_TO_EDGE wrap mode is allowed, and mipmapped minification filters cannot be used.
Creating bigger textures to match POT dimensions does waste texture memory.
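To get a feel for how much memory the padding costs, here is a small calculation sketch. The 4 bytes per pixel (an RGBA8888 bitmap) is an assumption, and PotPadding is an invented name.

```java
// Hypothetical sketch: memory lost when padding a bitmap to power-of-two size.
final class PotPadding {
    // Smallest power of two that is >= n.
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n must be in 1..2^30");
        }
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    // Bytes wasted by the padding, assuming 4 bytes per pixel (RGBA8888).
    static long wastedBytes(int width, int height) {
        long padded = (long) nextPowerOfTwo(width) * nextPowerOfTwo(height);
        return (padded - (long) width * height) * 4L;
    }
}
```

A 300 x 200 bitmap, for instance, has to grow to 512 x 256, so more than half of the allocated texture is padding.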

Suresh, the power of 2 limitation was built into OpenGL back in the (very) early days of computer graphics (before affordable hardware acceleration), and it was done for performance reasons. Low-level rendering code gets a decent performance boost when it can be hard-coded for power-of-two textures. Even in modern GPUs, POT textures are faster than NPOT textures, but the speed difference is much smaller than it used to be (though it may still be noticeable on many ES devices).
GuyNoir, what you should do is build a texture atlas. I just solved this problem myself this past weekend for my own Android game. I created a class called TextureAtlas, and its constructor calls glTexImage2D() to create a large texture of any size I choose (passing null for the pixel values). Then I can call add(id, bitmap), which calls glTexSubImage2D(), repeatedly to pack in the smaller images. The TextureAtlas class tracks the used and free space within the larger texture and the rectangles each bitmap is stored in. Then the rendering code can call get(id) to get the rectangle for an image within the atlas (which it can then convert to texture coordinates).
Side note #1: Choosing the best way to pack in various texture sizes is NOT a trivial task. I chose to start with simple logic in the TextureAtlas class (think typewriter + carriage return + line feed) and make sure I load the images in the best order to take advantage of that logic. In my case, that was to start with the smallest square-ish images and work my way up to the medium square-ish images. Then I load any short+wide images, force a CR+LF, and then load any tall+skinny images. I load the largest square-ish images last.
Side note #2: If you need multiple texture atlases, try to group images inside each that will be rendered together to minimize the number of times you need to switch textures (which can kill performance). For example, in my Android game I put all the static game board elements into one atlas and all the frames of various animation effects in a second atlas. That way I can bind atlas #1 and draw everything on the game board, then I can bind atlas #2 and draw all the special effects on top of it. Two texture selects per frame is very efficient.
Side note #3: If you need repeating/mirroring textures, they need to go into their own textures, and you need to scale them (not add black pixels to fill in the edges).
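The "typewriter" placement from side note #1 could look roughly like this. It is not the TextureAtlas class described above (that code is not shown here); ShelfPacker is an invented name, and it takes plain widths and heights instead of bitmaps so the placement logic stands on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of left-to-right, row-by-row ("typewriter") packing.
final class ShelfPacker {
    private final int width;
    private final int height;
    private int cursorX = 0;   // next free x in the current row
    private int cursorY = 0;   // top of the current row
    private int rowHeight = 0; // tallest image placed in the current row
    private final Map<String, int[]> rects = new HashMap<>();

    ShelfPacker(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Places a w x h image and returns {x, y, w, h}, or null if it does not fit.
    int[] add(String id, int w, int h) {
        if (w > width) {
            return null;
        }
        if (cursorX + w > width) {
            // Carriage return + line feed: start a new row below the current one.
            cursorX = 0;
            cursorY += rowHeight;
            rowHeight = 0;
        }
        if (cursorY + h > height) {
            return null;
        }
        int[] rect = { cursorX, cursorY, w, h };
        rects.put(id, rect);
        cursorX += w;
        rowHeight = Math.max(rowHeight, h);
        return rect;
    }

    // Rectangle of a previously added image, or null if unknown.
    int[] get(String id) {
        return rects.get(id);
    }
}
```

On Android, the returned x and y would be the offsets for the sub-image upload, and dividing the rectangle by the atlas size gives the texture coordinates. Because a row is as tall as its tallest image, the load order matters, which is exactly the point of side note #1.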

No, the dimensions must be powers of two. However, you can get around this by padding your image with black pixels and then using the texture coordinates array to restrict which part of the image is mapped. For example, let's say you have a 13 x 16 pixel texture. You can add 3 pixels of black to the right side, making it 16 x 16, then do the following:
/* the horizontal coordinate stops at 13/16 because the padding is on the right */
static const GLfloat texCoords[] = {
0.0, 0.0,
0.0, 1.0,
13.0/16.0, 0.0,
13.0/16.0, 1.0
};
Now you have a power-of-two image file, but a non-power-of-two texture. Just make sure you use linear scaling :)

This is a bit late, but non-power-of-2 textures are supported under OpenGL ES 1/2 through extensions.
The main one is GL_OES_texture_npot. There are also GL_IMG_texture_npot and GL_APPLE_texture_2D_limited_npot for iOS devices.
Check for these extensions by calling glGetString(GL_EXTENSIONS) and searching for the extension you need.
I would also advise keeping your textures to sizes that are multiples of 4 as some hardware stretches textures if not.
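A small sketch of the extension check. The extension string is passed in as a parameter (on Android it would come from glGetString(GL_EXTENSIONS) on the GL thread); GlExtensions is an invented name. It compares whole space-separated tokens, since a plain substring search could match a longer extension name that merely contains the one you are looking for.

```java
// Hypothetical sketch: look for one extension in the space-separated list.
final class GlExtensions {
    static boolean has(String extensions, String name) {
        if (extensions == null) {
            return false; // no current GL context, or the query failed
        }
        for (String extension : extensions.trim().split("\\s+")) {
            if (extension.equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```

Typical use would be to call has(extensions, "GL_OES_texture_npot") once at start-up and fall back to padded or atlas textures when it returns false.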
