I am trying to batch draw a bunch of lines on Android using OpenGL ES 2.0 and I need to know the best way to do this.
Right now I made a class called LineEngine which builds up a FloatBuffer of all the vertices to draw and then draws all the lines at once. The problem is that apparently FloatBuffer.put() is very slow and is gobbling up CPU time like crazy.
Here is my class
public class LineEngine {
private static final float[] IDENTIY = new float[16];
private FloatBuffer mLinePoints;
private FloatBuffer mLineColors;
private int mCount;
public LineEngine(int maxLines) {
Matrix.setIdentityM(IDENTIY, 0);
ByteBuffer byteBuf = ByteBuffer.allocateDirect(maxLines * 2 * 4 * 4);
byteBuf.order(ByteOrder.nativeOrder());
mLinePoints = byteBuf.asFloatBuffer();
byteBuf = ByteBuffer.allocateDirect(maxLines * 2 * 4 * 4);
byteBuf.order(ByteOrder.nativeOrder());
mLineColors = byteBuf.asFloatBuffer();
reset();
}
public void addLine(float[] position, float[] color){
mLinePoints.put(position, 0, 8); //These lines
mLineColors.put(color, 0, 4); // are taking
mLineColors.put(color, 0, 4); // the longest!
mCount++;
}
public void reset(){
mLinePoints.position(0);
mLineColors.position(0);
mCount = 0;
}
public void draw(){
mLinePoints.position(0);
mLineColors.position(0);
GraphicsEngine.setMMatrix(IDENTIY);
GraphicsEngine.setColors(mLineColors);
GraphicsEngine.setVertices4d(mLinePoints);
GraphicsEngine.disableTexture();
GLES20.glDrawArrays(GLES20.GL_LINES, 0, mCount * 2);
GraphicsEngine.disableColors();
reset();
}
}
Is there a better way to batch all these lines together?
What you are trying to do is called SpriteBatching. If you want your program to be robust in openGL es you have to check the following( a list of what makes your program slow ) :
-Changing to many opengl ES states each frame.
-Enable//Dissable (textures etc) again and again. Once you enable something you dont have to do it again it will be applied automaticly its frame.
-Using to many assets instead of spriteSheets. Yes thats make a huge performance impact. For example if you have 10 images you have to load for each image a different texture and that is SLOW. A better way to do this is to create a spriteSheet.(You can check google for free spriteSheet creators).
-Your images are high quality. Yes! check your resolution and file extension. Prefer .png files.
-Care for the api 8 float buffer bug(you have to use int buffer instead to fix the bug).
-Using java containers. This is the biggest pitfall, if you use string concatenation(that's not a container but it returns a new string each time) or a List or any other container that RETURNS A NEW CLASS INSTANCE chances are your program will be slow due to garbage collection. For input handling i would suggest you to search a teqnique called the Pool Class. Its use is to recycle objects instead of creating new ones. Garbage collector is the big enemy, try to make him happy and avoid any programming technique that might call him.
-Never load things on the fly, instead make a loader class and load all the necessary assets in the begining of the app.
If you do implement this list then your chances are your program will be robust.
One last adition. What is a sprite batcher? A spriteBatcher is a class that uses a single texture to render multiple objects. It will create automaticly vertices, indices, colors, u - v coords and it will render them as a batch. This pattern will save GPU power as well as cpu power. From your posted code i can't be sure what causes the cpu to slow down but from my experience is due to one(or more) things of the list i previously mention. Check the list, follow it, search google for spriteBatching and i am sure your program will run fast. Hope i helped!
Edit: I think i found what causes your program to slow down, you dont flip the buffer dude! You just reset the position. You just add add add more objects and you cause buffer overload. In the reset method just flip the buffer. mLineColors.flip mLinePaints.flip will do the job. Make sure you call them each frame if you send new verices each frame.
FloatBuffer.put(float[]) with a relatively large float array should be considerably faster. The single put(float) calls have plenty of overhead.
Just go for a very simple native function which will be called from your class. You can put float[] to OpenGL directly, no need to kill CPU time with a silly buffer interface.
Related
I have an Android OpenGL-ES application featuring two threads. Call Thread 1 the "display thread" which "blends" its current texture with a texture emanating from Thread 2 a.k.a the "worker thread". Thread 2 performs off-screen rendering (render to texture), and then Thread 1 combines this texture with it's own texture to generate the frame which is displayed to the user.
I have a working solution but I know it is inefficient and am trying to improve upon it. In it's OnSurfaceCreated() method, Thread 1 creates two textures. Thread 2, in it's draw method, does a glReadPixels() into a ByteBuffer (let's refer to it as bb). Thread 2 then signals to Thread 1 that a new frame is ready, at which point Thread 1 invokes glTexSubImage2D(bb) to update it's texture with the new data from Thread 2, and proceed with it's "blending" in order to generate a new frame.
This architecture works better on some Android devices than others, and I have been able to garner a slight improvement in performance by using PBOs. But I figured that by using so-called "direct textures" via the EGL Image extension (https://software.intel.com/en-us/articles/using-opengl-es-to-accelerate-apps-with-legacy-2d-guis) I would gain some benefit by removing the need for the costly glTexSubImage2D() call. Yes, I'd still have the glReadPixels() call which bothers me still but at least I should measure some improvement. In fact, at least on a Samsung Galaxy Tab S (Mali T628 GPU) my new code is dramatically slower than before! How can this be?
In the new code Thread 1 instantiates the EGLImage object by using gralloc and proceeds to bind it to a texture:
// note gbuffer::create() is a wrapper around gralloc
buffer = gbuffer::create(width, height, gbuffer::FORMAT_RGBA_8888);
EGLClientBuffer anb = buffer->getNativeBuffer();
EGLImageKHR pEGLImage = _eglCreateImageKHR(eglGetCurrentDisplay(), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID, (EGLClientBuffer)anb, attrs);
glBindTexture(GL_TEXTURE_2D, texid); // texid from glGenTextures(...)
_glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, pEGLImage);
Then Thread 2 in it's main loop does it's off-screen render-to-texture stuff and essentially pushes the data back over to Thread 1 via glReadPixels() with the destination address as the backing storage behind the EGLImage:
void* vaddr = buffer->lock();
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, vaddr);
buffer->unlock();
How can this be slower than glReadPixels() into a ByteBuffer followed by glTexSubImage2D from the aforementioned ByteBuffer? I'm also interested in alternative techniques as I am not limited to OpenGL-ES 2.0 and can use OpenGL-ES 3.0. I have tried FBOs but ran into some issues.
In response to the first answer I decided to take a stab at implementing a different approach. Namely, sharing the textures between Thread 1 and Thread 2. While I don't have the sharing part working yet, I do have Thread 1's EGLContext passed down to Thread 2's EGLContext so that in theory, Thread 2 can share textures with Thread 1. With these changes in place, and with the glReadPixels() and glTexSubImage2D() calls remaining, the app works but is far slower than before. Strange.
The other oddity I uncovered is the deal with the difference between javax.microedition.khronos.egl.EGLContext and android.opengl.EGLContext. GLSurfaceView exposes an interface method setEGLContextFactory() which allows me to pass Thread 1's EGLContext to Thread 2, as in the following:
public Thread1SurfaceView extends GLSurfaceView {
public Thread1SurfaceView(Context context) {
super(context);
// here is how I pass Thread 1's EGLContext to Thread 2
setEGLContextFactory(new EGLContextFactory() {
#Override
public javax.microedition.khronos.egl.EGLContext createContext(
final javax.microedition.khronos.egl.EGL10 egl,
final javax.microedition.khronos.egl.EGLDisplay display,
final javax.microedition.khronos.egl.EGLConfig eglConfig) {
// Configure context for OpenGL ES 3.0.
int[] attrib_list = {EGL14.EGL_CONTEXT_CLIENT_VERSION, 3, EGL14.EGL_NONE};
javax.microedition.khronos.egl.EGLContext renderContext =
egl.eglCreateContextdisplay, eglConfig, EGL10.EGL_NO_CONTEXT, attrib_list);
mThread2 = new Thread2(renderContext);
}
});
}
Previously, I used stuff out of the EGL14 namespace but since the interface for GLSurfaceView apparently relies on EGL10 stuff I had to change the implementation for Thread 2. Everywhere I used EGL14 I replaced with javax.microedition.khronos.egl.EGL10. Then my shaders stopped compiling until I added GLES3 to the attribute list. Now things work, albeit slower than before (but next I will remove the calls to glReadPixels and glTexSubImage2D).
My follow-on question is, is this the right way to handle the javax.microedition.khronos.egl.* versus android.opengl.* issue? Can I typecast javax.microedition.khronos.egl.EGL10 to android.opengl.EGL14, javax.microedition.khronos.egl.EGLDisplay to android.opengl.EGLDisplay, and javax.microedition.khronos.egl.EGLContext to android.opengl.EGLContext? What I have right now just seems ugly and doesn't feel right, although this proposed casting doesn't sit right either. Am I missing something?
Based on the description of what you're trying to do, both approaches sound way more complicated and inefficient than necessary.
The way I've always understood EGLImage, it's a mechanism for sharing images between different processes, and possibly different APIs.
For multiple OpenGL ES contexts in the same process, you can simply share the textures. All you need to do is make both contexts part of the same share group, and they can both use the same textures. In your use case, you can then have one thread rendering to a texture using an FBO, and then sample from it in the other thread. This way, there is no extra data copying, and your code should become much simpler.
The only slightly tricky aspect is synchronization. ES 2.0 does not have synchronization mechanisms in the API. The best you can probably do is call glFinish() in one thread (e.g. after your thread 2 finished rendering to the texture), and then use standard IPC mechanisms to signal the other thread. ES 3.0 has sync objects, which make this much more elegant.
My earlier answer here sketches some of the steps needed to create multiple contexts that are in the same share group: about opengles and texture on android. The key part of creating multiple contexts in the same share group is the 3rd argument to eglCreateContext, where you specify the context that you want to share objects with.
EDIT:
More debugging led me to the fact that glGetAttribLocation returns -1, except for the first start of the Application. Program ID is valid (I guess?), it was 12 in my testing right now. I also tried to retrieve attribute location right before drawing again, but this did not work out neither.
My shader "architecture" now looks like this:
I've turned the shader into a singleton. I.e. only one instance. Using it:
public void useProgram() {
GLES20.glUseProgram(iProgram);
getUniformLocations();
getAttributeLocations();
}
I.e. program will be sent to OpenGL, afterwards I'm retrieving uniform and attribute locations for all my variables, they are stored within a HashMap (one for each shader):
#Override
protected void getAttributeLocations() {
addToGLMap(A_NORMAL, GLES20.glGetAttribLocation(iProgram, A_NORMAL));
addToGLMap(A_POSITION, GLES20.glGetAttribLocation(iProgram, A_POSITION));
addToGLMap(A_COLOR, GLES20.glGetAttribLocation(iProgram, A_COLOR));
}
I don't understand, why the program's ID is for example 12, but all the attribute locations are non-existent in the second and the following run of my Application...
In my Application, I am loading a Wavefront object, as well as I am drawing several lines and cubes, just to try something. After starting the Application "clean", i.e. after rebooting or installing it, everything looks as intended. But if I close the Application and re-open it, it looks weird, screenshot is at the bottom.
What I'm currently doing:
onSurfaceCreated:
Taking care of culling, clear color, etc, etc.
Clear all loaded objects (just for testing, will of course not delete memory in later phase).
Reload objects (threaded).
My objects are stored like this:
public class WavefrontObject {
private FloatBuffer mPositionBuffer = null;
private FloatBuffer mColorBuffer = null;
private FloatBuffer mNormalBuffer = null;
private ShortBuffer mIndexBuffer = null;
}
Buffers are filled upon creation of the element.
They are drawn:
mColorBuffer.position(0);
mNormalBuffer.position(0);
mIndexBuffer.position(0);
mPositionBuffer.position(0);
GLES20.glVertexAttribPointer(mShader.getGLLocation(BaseShader.A_POSITION), 3, GLES20.GL_FLOAT, false,
0, mPositionBuffer);
GLES20.glEnableVertexAttribArray(mShader.getGLLocation(BaseShader.A_POSITION));
// etc...
GLES20.glDrawElements(GLES20.GL_TRIANGLES, mIndexBuffer.capacity(), GLES20.GL_UNSIGNED_SHORT, mIndexBuffer);
Do I need to disable the VertexAttribArrays after drawing them? I am currently overwriting the buffer for each drawing loop, but do they maybe interact with other models being drawn?
The model I am loading displays a small toy-plane. After restarting the Application, it looks like this (loading the object, all colors are set to white (for testing)):
So to me it looks like the buffers either have left-over stuff in them? What's the "best practice" for using these buffers? Disable the arrays? Does OpenGL ES2.0 offer some sort of "clear buffer" method that I can use before putting my values in them?
What was expected to be drawn: At the point where the "weird triangles" and colors origin from, there should be the plane-model. All in white.
When your application loses context its OpenGL context is destroyed.
So all objects (programs and its uniform/attribute handles, etc) are invalidated.
During reopening you have to clear/invalidate all singleton objects like yours...
Hey all I'm at a crossroads with my app that I've been working on.
It's a game and an 'arcade / action' one at that, but I've coded it using Surfaceview rather than Open GL (it just turned out that way as the game changed drastically from it's original design).
I find myself plagued with performance issues and not even in the game, but just in the first activity which is an animated menu (full screen background with about 8 sprites floating across the screen).
Even with this small amount of sprites, I can't get perfectly smooth movement. They move smoothly for a while and then it goes 'choppy' or 'jerky' for a split second.
I noticed that (from what I can tell) the background (a pre-scaled image) is taking about 7 to 8 ms to draw. Is this reasonable? I've experimented with different ways of drawing such as:
canvas.drawBitmap(scaledBackground, 0, 0, null);
the above code produces roughly the same results as:
canvas.drawBitmap(scaledBackground, null, screen, null);
However, if I change my holder to:
getHolder().setFormat(PixelFormat.RGBA_8888);
The the drawing of the bitmap shoots up to about 13 MS (I am assuming because it then has to convert to RGB_8888 format.
The strange thing is that the rendering and logic move at a very steady 30fps, it doesn't drop any frames and there is no Garbage Collection happening during run-time.
I've tried pretty much everything I can think of to get my sprites moving smoothly
I recently incorporated interpolation into my gameloop:
float interpolation = (float)(System.nanoTime() + skipTicks - nextGameTick)
/ (float)(skipTicks);
I then pass this into my draw() method:
onDraw(interpolate)
I have had some success with this and it has really helped smooth things out, but I'm still not happy with the results.
Can any one give me any final tips on maybe reducing the time taken to draw my bitmaps or any other tips on what may be causing this or do you think it's simply a case of Surfaceview not being up to the task and therefore, should I scrap the app as it were and start again with Open GL?
This is my main game loop:
int TICKS_PER_SECOND = 30;
int SKIP_TICKS = 1000 / TICKS_PER_SECOND;
int MAX_FRAMESKIP = 10;
long next_game_tick = GetTickCount();
int loops;
bool game_is_running = true;
while( game_is_running ) {
loops = 0;
while( GetTickCount() > next_game_tick && loops < MAX_FRAMESKIP) {
update_game();
next_game_tick += SKIP_TICKS;
loops++;
}
interpolation = float( GetTickCount() + SKIP_TICKS - next_game_tick )
/ float( SKIP_TICKS );
display_game( interpolation );
}
Thanks
You shouldn't use Canvas to draw fast sprites, especially if you're drawing a fullscreen image. Takes way too long, I tell you from experience. I believe Canvas is not hardware accelerated, which is the main reason you'll never get good performance out of it. Even simple sprites start to move slow when there are ~15 on screen. Switch to OpenGL, make an orthographic projection and for every Sprite make a textured quad. Believe me, I did it, and it's worth the effort.
EDIT: Actually, instead of a SurfaceView, the OpenGL way is to use a GLSurfaceView. You create your own class, derive from it, implement surfaceCreated, surfaceDestroyed and surfaceChanged, then you derive from Renderer too and connect both. Renderer handles an onDraw() function, which is what will render, GLSurfaceView manages how you will render (bit depth, render modes, etc.)
I'm developing a 2D videogame in Android, using JAVA and OpenGL-ES.
I'm having an issue with I think it's the Garbage Collector. Every few seconds, the game freezes, no matter what I'm doing.
I have been reading some documentation and so on about it, and I removed all the loop iterators I had, now I use for(i=0,...), among others solutions. The case is, it didn't do anything with the perfomance...
I have been looking my code and I found something I think that could be making problems, and it's the way I change between sprites in an animation.
For instance, I have a hero which can move pressing some keys. When it walks, his sprite changes between frames. To do this, I move the texture buffer to aim the part of the image I want. And every time it does, I use this function:
protected void SetTextureBuffer(float xo, float xf, float yo, float yf) {
float textureVertexs[] = {
xo, yf, // top left
xf, yf, // bottom left
xf, yo, // top right
xo, yo // bottom right
};
// a float has 4 bytes so we allocate for each coordinate 4 bytes
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(textureVertexs.length * 4);
byteBuffer.order(ByteOrder.nativeOrder());
// allocates the memory from the byte buffer
textureBuffer = byteBuffer.asFloatBuffer();
// fill the vertexBuffer with the vertices
textureBuffer.put(textureVertexs);
// set the cursor position to the beginning of the buffer
textureBuffer.position(0);
}
It allocates memory to create the buffer every time is called, which could be a lot of times every second, since there are more entities with animations...
Do you think this could be a problem? Or Am I looking this wrongly? If this is a problem... how could I solve this in another more efficient way?
Try removing these lines:
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(textureVertexs.length * 4);
byteBuffer.order(ByteOrder.nativeOrder());
textureBuffer = byteBuffer.asFloatBuffer();
Allocate textureBuffer once in your class constructor/initializer. No need to recreate the buffer again since you don't need to keep previous data. So you just overwrite the old data in the same buffer.
I want to do an "chain" or circular loop of animations as can be described below:
LABEL start
do Anim1->Anim2->Anim3->Anim4
GOTO start
The above will do a circular loop: Anim1->Anim2->Anim3->Anim4 and back to Anim1 and so on.
I am not able to merge all the PNGs in one Texture because Andengine/Android is limited in loading the resources. However, when I split my initial large tile into 4 smaller tiles, everything works fine.
I tried to use an AnimationListener inside Anim1. When onAnimationFinished() is called, I detach Anim1, and run Anim2 and do this in chain of inner functions. However, when I am in Anim4, I do not know how to go back to the start and attach Anim1.
Note: All this problem could be solved if you know how I can pack a set of 150 PNGs that individually quite large but fit in a tile of 4096x4096 px.
Thank you for your help!
EDIT (following JiMMaR's proposed solution):
I am using Texture Packer and the overall Texture exceeds 4096*4096, causing an OutOfMemory error on Android.
At the moment, I have split the Textures into four tiles and I four PNG tilemaps.
You can use 'Texture Packer' to see if all your images can fit in 4096 x 4096.
You can download Texture Packer from here. (NOTE: Texture Packer supports AndEngine data output too)
You can use "BuildableBitmapTextureAtlas" and "BlackPawnTextureAtlasBuilder" classes to pack your PNGs into one texture atlas.
You should post some code so we can see the implementation.
Try to use several AnimatedSprites with animation listeners in each one. This way you can start the animation of the next sprite in the onAnimationFinished() call.
private class AnimationLooperListener implements IAnimationListener{
private AnimatedSprite nextSpriteToAnimate;
public AnimationLooperListener(AnimatedSprite sprite){
nextSpriteToAnimate = sprite;
}
/** other methods are hidden */
public void onAnimationFinished(AnimatedSprite sprite){
sprite.setVisible(false);
nextSpriteToAnimate.setVisible(true);
nextSpriteToAnimate.animate(100, new AnimationLooperListener(sprite);
}
}
AnimatedSprite sprite1 = new AnimatedSprite(0, 0, tiledTextureRegion, vertex);
AnimatedSprite sprite2 = new AnimatedSprite(0, 0, tiledTextureRegion, vertex);
AnimatedSprite sprite2 = new AnimatedSprite(0, 0, tiledTextureRegion, vertex);
AnimationLooperListener listener1 = new AnimationLooperListener(sprite2);
AnimationLooperListener listener2 = new AnimationLooperListener(sprite3);
AnimationLooperListener listener3 = new AnimationLooperListener(sprite1);
sprite1.animate(100, listener1);
sprite2.animate(100, listener2);
sprite3.animate(100, listener3);
This way, you have an animation loop, between several sprites that can be created using several TiledTextureRegions.