I am building a game-like application using the Android NDK and OpenGL ES 2.0.
So far I understand the concept of vertices and shaders and programs.
The main game loop would be a loop in a single thread, as follows:
step 1. Read all the user input
step 2. Update game objects (if needed) based on the input
step 3. Make draw calls for all the objects
step 4. Call eglSwapBuffers
then loop back to step 1.
But I ran into various confusions regarding sync and threading, so I'm listing all the questions together.
1. Since OpenGL draw calls are asynchronous, the draw calls and eglSwapBuffers may be issued many times before the GPU has actually rendered even a single frame from the previous iteration of the loop. Will this be problematic (buffer overflow or tearing)?
2. Assuming VSYNC is enabled, does point 1 still cause problems?
3. Since all calls are asynchronous, how do I measure the time spent rendering each frame? eglSwapBuffers would return immediately, so how can I know when the frame was actually done?
4. Loading textures occupies RAM. Is checking free memory before loading a texture the standard way, or should I keep loading textures until I hit GL_OUT_OF_MEMORY?
5. If I switch to a multithreaded approach, calling just eglSwapBuffers at a fixed 60 times per second without any regard to the thread that is processing input and issuing draw calls, what is supposed to happen?
Also, how do I control the FPS in the game loop? I know the exact FPS depends on a large number of factors, but how can I get close to a target?
1. SwapBuffers() will not be executed out of order. Issuing it after all of the draw commands for the frame is fine. The driver will take care of it; you don't need to sync anything. You can only screw this up by using multiple threads or multiple contexts, and even that would take a lot of effort.
2. There is no problem with 1, and VSYNC does not directly change anything here.
3. The calls might be asynchronous, but the driver will not queue up an unlimited amount of work. Sooner or later it will have to block if you try to issue too many calls in advance. When vsync is on, the typical behavior is that the driver will queue up at most a few frames (or just one, depending on the driver settings), and SwapBuffers() will block when that limit is reached. So the timing statistics you get there are accurate after the first few frames. Note that this is still much better than completely flushing the queue, as the driver unblocks as soon as the first pending buffer swap has been carried out.
4. This is a totally new topic, which probably belongs in another question. However: it is very unlikely that you will get any of the current desktop GL implementations to ever generate GL_OUT_OF_MEMORY. The driver will automatically page textures (and other objects) between VRAM and system RAM (and the OS might even page that to disk). The GL also provides no means to query the available memory.
5. In that scenario, you will need to synchronize manually. That approach does not make the slightest sense and seems like trying to solve a problem which does not exist. If you want your game to use multithreading, still put all the GL rendering (and SwapBuffers) into the same thread. You can use different threads for input processing, sound, physics, scene updates, general game logic and whatever else. But you should just use a single-thread, single-context approach for the GL. That way it also won't hurt you when SwapBuffers() blocks your render thread, as your game logic and input handling still run, and the render thread will just render new frames with the newest available data at the frequency the display needs (with vsync on) or as fast as the CPU and GPU can work (with vsync off).
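As a rough illustration of the "render thread draws the newest available data" idea, here is a minimal Java sketch. The names GameState, StateMailbox and RenderLoopSketch are hypothetical and not part of any Android or GL API, and the GL calls themselves are left out; it only shows one possible way to hand immutable snapshots from a logic thread to a render thread.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of the game state; the fields are placeholders (assumption).
final class GameState {
    final long tick;
    final float playerX;

    GameState(long tick, float playerX) {
        this.tick = tick;
        this.playerX = playerX;
    }
}

// Hand-off point between the logic thread (writer) and the render thread (reader).
final class StateMailbox {
    private final AtomicReference<GameState> latest =
            new AtomicReference<>(new GameState(0, 0f));

    // Called from the logic thread after each update step.
    void publish(GameState s) {
        latest.set(s);
    }

    // Called from the render thread at the start of each frame.
    GameState snapshot() {
        return latest.get();
    }
}

final class RenderLoopSketch {
    // Runs a logic thread for `ticks` updates, waits for it to finish, and
    // returns the state the render side would then read.
    static GameState runDemo(int ticks) {
        StateMailbox mailbox = new StateMailbox();
        Thread logic = new Thread(() -> {
            float x = 0f;
            for (long t = 1; t <= ticks; t++) {
                x += 1.5f;                          // pretend physics step
                mailbox.publish(new GameState(t, x));
            }
        });
        logic.start();
        try {
            logic.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // In a real app this read would sit inside the render thread's loop,
        // followed by the draw calls and the buffer swap.
        return mailbox.snapshot();
    }
}
```

Because the render side only ever reads a complete, immutable snapshot, a blocked buffer swap delays drawing but never stalls the logic thread.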
Related
I'm working on a simple 2D game with custom View canvas drawing (postInvalidate()) and hardware acceleration. After weeks of performance analysis I decided to sync my update and drawing operations with the VSYNC pulse via the Choreographer.FrameCallback interface. I think that's the right way to get smooth movement.
However, I'm still experiencing choppy movement. I analyzed it with systrace and noticed that it has something to do with my BufferQueue. As soon as double buffering sets in, the frame time exceeds 16 ms. I made a screenshot of my trace with some explanations:
The whole draw operation waits for SurfaceFlinger (the consumer) to release a buffer so that it can dequeue a new empty buffer of its own.
Can you tell me whether this is normal behavior, or what could be the reason for it?
On your graph, you have a note, "SurfaceFlinger misses VSYNC".
However, if you look at the BufferQueue row, you can see that the buffer arrived after the VSYNC deadline. SurfaceFlinger woke up, but there was nothing to do.
Your app then provided an additional buffer, which meant you had two buffers pending. Since you continued to provide a buffer on every VSYNC, the queue never got back down to zero buffers. With the queue stuffed full, every attempt to add additional buffers results in blocking.
FWIW, your BufferQueue is triple-buffered: two are in the queue, one is on the display.
There are a few things you can do:
1. Have the app drop frames if you've missed the deadline.
2. Specify a presentation time for the frames so SurfaceFlinger will drop them if the time is passed.
3. Deliberately drop a frame every once in a while to let the queue empty. (Not the preferred approach.)
#2 only works with GLES on a SurfaceView, so we can ignore that one.
#1 might work for you; you can see an example in Grafika. It essentially says, "if the next VSYNC is firing in less than 2ms, or has already fired, don't bother rendering the current frame." The View/invalidate approach doesn't give you the same fine-grained control that GLES does though, so I'm not sure how well that will work.
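The frame-dropping rule quoted above can be sketched as a small pure function. This is only an illustration of the rule as worded here, not Grafika's actual code; FrameDropPolicy and its parameter names are hypothetical, and the 2 ms margin is taken from the quote.

```java
final class FrameDropPolicy {
    // 2 ms safety margin, taken from the rule of thumb quoted above.
    static final long SAFETY_MARGIN_NANOS = 2_000_000L;

    // frameTimeNanos:     timestamp of the VSYNC that triggered this callback
    // nowNanos:           current time on the same clock
    // refreshPeriodNanos: display refresh period (about 16_666_667 at 60 Hz)
    //
    // Returns true when the next VSYNC is less than 2 ms away or has already
    // fired, in which case rendering the current frame is not worth it.
    static boolean shouldSkipFrame(long frameTimeNanos, long nowNanos,
                                   long refreshPeriodNanos) {
        long nextVsync = frameTimeNanos + refreshPeriodNanos;
        return nextVsync - nowNanos < SAFETY_MARGIN_NANOS;
    }
}
```

In a Choreographer.FrameCallback, frameTimeNanos would be the argument passed to doFrame and nowNanos would come from System.nanoTime(); the refresh period has to be obtained from the display and is assumed to be known here.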
The key to smooth animation on a busy device isn't hitting every frame at 60fps. The key is to make your updates based on delta time, so things look smooth even if you drop a frame or two.
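A minimal sketch of a delta-time update, assuming a hypothetical Mover object whose speed is expressed in world units per second. The point is that the distance covered depends on elapsed time, not on how many frames were drawn.

```java
final class Mover {
    double x;               // position in world units
    final double speed;     // world units per second

    Mover(double speed) {
        this.speed = speed;
    }

    // Advance by the time that actually elapsed since the previous update.
    void update(long dtNanos) {
        x += speed * (dtNanos / 1_000_000_000.0);
    }
}
```

Two 16 ms updates and one 32 ms update (a dropped frame) move the object by the same amount, so a missed frame shows up as a slightly larger step rather than as a slowdown.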
For additional details on the graphics architecture, see this doc.
I recently upgraded my old Galaxy S2 phone to a brand new Galaxy S7, and was very surprised to find an old game I wrote seemed to be performing worse on the new phone. After cutting everything down to a bare bones project, I have discovered the problem - the GLES20.glFinish() call I was performing at the end of every onDrawFrame. With this in there, with a glClear but no draw calls, the FPS hovered around 40. Without the glFinish, solid 60 FPS. My old S2 had solid 60 FPS regardless.
I then went back to my game, and removed the glFinish method call, and sure enough performance went back to being perfect and there was no obvious downside to its removal.
Why was glFinish slowing down my frame rate on my new phone but not my old phone?
I think a speculative answer is as good as it's going to get, so — apologies for almost certainly repeating a lot of what you already know:
Commands sent to OpenGL go through three states, named relative to the GPU side of things:
unsubmitted
submitted but pending
completed
Communicating with the code running the GPU is usually expensive. So most OpenGL implementations accept your calls and just queue the work up inside your memory space for a while. At some point it'll decide that a communication is justified and will pay the cost to transfer all the calls at once, promoting them to the submitted state. Then the GPU will complete each one (potentially out-of-order, subject to not breaking the API).
glFinish:
... does not return until the effects of all previously called GL
commands are complete. Such effects include all changes to GL state,
all changes to connection state, and all changes to the frame buffer
contents.
So for some period when that CPU thread might have been doing something else, it now definitely won't. But if you don't glFinish then your output will probably still appear, it's just unclear when. glFlush is often the correct way forwards — it'll advance everything to submitted but not wait for completed, so everything will definitely appear shortly, you just don't bother waiting for it.
OpenGL bindings to the OS vary a lot; in general though you almost certainly want to flush rather than finish if your environment permits you to do so. If it's valid to neither flush nor finish and the OS isn't pushing things along for you based on any criteria then it's possible you're incurring some extra latency (e.g. the commands you issue one frame may not reach the GPU until you fill up the unsubmitted queue again during the next frame) but if you're doing GL work indefinitely then output will almost certainly still proceed.
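To make the three states concrete, here is a toy Java model of a command queue. It is purely illustrative: ToyCommandQueue is a hypothetical class, real drivers are far more complex, and gpuStep() merely stands in for the GPU making progress on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class ToyCommandQueue {
    private final Queue<String> unsubmitted = new ArrayDeque<>();
    private final Queue<String> submitted = new ArrayDeque<>();
    private final List<String> completed = new ArrayList<>();

    // A GL call: the command is only queued in the application's memory.
    void issue(String cmd) {
        unsubmitted.add(cmd);
    }

    // Like glFlush: hand everything to the "GPU" and return right away.
    void flush() {
        submitted.addAll(unsubmitted);
        unsubmitted.clear();
    }

    // Stand-in for the GPU finishing one submitted command.
    void gpuStep() {
        String cmd = submitted.poll();
        if (cmd != null) {
            completed.add(cmd);
        }
    }

    // Like glFinish: flush, then do not return until everything is complete.
    void finish() {
        flush();
        while (!submitted.isEmpty()) {
            gpuStep();
        }
    }

    int unsubmittedCount() { return unsubmitted.size(); }
    int submittedCount()   { return submitted.size(); }
    int completedCount()   { return completed.size(); }
}
```

In this model flush() returns while work is still pending, whereas finish() keeps the caller busy until the submitted queue drains, which is the CPU time the paragraph above says you give up.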
Android sits upon EGL. Per the spec, 3.9.3:
... eglSwapBuffers and eglCopyBuffers perform an implicit flush operation
on the context ...
I therefore believe that you are not required to perform either a flush or a finish in Android if you're double buffering. A call to swap the buffers will cause a buffer swap as soon as drawing is complete without blocking the CPU.
As to the real question, the S7 has an Adreno 530 GPU (at least in its Snapdragon variant). The S2 most commonly shipped with a Mali-400 MP4 GPU. The Malis are produced by ARM, the Adrenos by Qualcomm, so they're completely different architectures and driver implementations. So the difference that causes the blocking could be almost anything, and an implementation is permitted to behave that way. glFinish isn't required and is a very blunt instrument; it's probably not one of the major optimisation targets.
I'd like to calculate FPS to detect performance issues in an application, based on the existing Android profiling tools.
I noticed that Systrace records the length of performTraversals. As far as I know, performTraversals performs measure, layout and draw, which covers most of the work of updating a frame. So is performTraversals representative enough to measure whether a frame will take 60 ms to update?
I also noticed that Systrace records the time spent in SurfaceFlinger. I know SurfaceFlinger serves a rendering purpose, but I don't know the exact start and end points of a frame. Should I also count the time spent in SurfaceFlinger toward the frame rate? (I do observe that SurfaceFlinger runs more frequently than performTraversals, which means SurfaceFlinger does not necessarily follow performTraversals; it is also triggered in other scenarios.)
P.S. I'm aware of dumpsys gfxinfo, but it can only record 128 frames (~2 seconds), while what I want to measure may last much longer.
Systrace is not useful for measuring FPS overall, but you can do that trivially with a frame counter and System.nanoTime(). If you're not hitting your target framerate, though, it can help you figure out why not.
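A frame counter of that kind might look like the following sketch. FpsCounter is a hypothetical helper; the timestamp is passed in as a parameter so the class can be exercised without a real render loop, and in an app you would pass System.nanoTime() once per frame.

```java
final class FpsCounter {
    private boolean started;
    private long windowStartNanos;
    private int frames;
    private double lastFps;

    // Call once per frame, for example right after the buffer swap returns,
    // passing System.nanoTime().
    void onFrame(long nowNanos) {
        if (!started) {
            started = true;
            windowStartNanos = nowNanos;
            return;
        }
        frames++;
        long elapsed = nowNanos - windowStartNanos;
        if (elapsed >= 1_000_000_000L) {      // report roughly once per second
            lastFps = frames * 1_000_000_000.0 / elapsed;
            frames = 0;
            windowStartNanos = nowNanos;
        }
    }

    // Most recent completed measurement; 0 until the first window has elapsed.
    double fps() {
        return lastFps;
    }
}
```

Averaging over a one-second window keeps the number stable; the window length is an arbitrary choice for this sketch.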
The official docs provide some useful pointers, but there's a lot of information and the interactions can be complex. The key things to know are:
The device display panel generates a vsync signal. You can see that on the VSYNC line. Every time it transitions between 1 and 0 is a refresh.
The vsync wakes surfaceflinger, which gathers up the incoming buffers for the various windows and composites them (either itself using OpenGL ES, or through the Hardware Composer).
If your app was running faster than the panel refresh rate (usually 60fps), it will have blocked waiting for surfaceflinger (in, say, eglSwapBuffers()). Once surfaceflinger acquires the buffer, the app is free to continue and generate another frame.
Unless you're rendering offscreen, you can't go faster than surfaceflinger.
As of Android 4.3 (API 18) you can add your own events to the systrace output using the android.os.Trace class. Wrapping your draw method with trace markers can be extremely informative. You have to enable their tag with systrace to see them.
If you want to be running at 60fps, your rendering must finish in well under 16.7ms. If you see a single invocation of performTraversals taking longer than that, you're not going to hit maximum speed.
I'm trying to use Android and OpenGL 2.0 to create a sort-of desert racing game. At least that's the end goal. For the time being I'm really just working with generating an endless desert, through the use of a Perlin noise algorithm. However, I'm coming across a lot of problems with regard to concurrency and synchronization. The program consists of three threads: a "render" thread, a "geometry" thread which essentially sits in the background generating tiles of perlin noise (eventually sending them through to the render thread to process in its own time) and a "main" thread which updates the camera's position and updates the geometry thread if new perlin noise tiles need to be created.
Aforementioned perlin tiles are stored in VBOs and only rendered when they're within a certain distance of the camera. Buffer initialization always begins immediately.
This all works well, without any noticeable problems.
HOWEVER.
When the tiles are uploaded to the GPU through glBufferData() (after processing by the separate geometry thread), the render thread always appears to block. I presume this is because Android implicitly calls glFinish() before the screen buffer is rendered. Obviously, I'd like the data uploading to be performed in the background while everything else is being drawn - even taking place over multiple frames if necessary.
I've looked on google and the only solution I could find is to use glMapBuffer/glMapBufferRange(), but these two methods aren't supported in GLES2.0. Neither are any of the synchronization objects - glFenceSync etc. so...
....
any help?
P.S. I haven't provided any code as I didn't think it was necessary, as the problem seems more theoretical to me. However I can certainly produce some on request.
A screenshot of the game so far:
http://i.stack.imgur.com/Q6S0k.png
Android does not call glFinish() (glFinish() is actually a no-op on IMG's GPUs). The problem is that glBufferData() is not an asynchronous API. What you really want is PBOs which are only available in OpenGL ES 3.0 and do offer the ability to perform asynchronous copies (including texture uploads.)
Are you always using glBufferData()? You should use glBufferSubData() as much as possible to avoid reallocating your VBO every time.
My question is about game loop design in OpenGL.
I believe that a game loop can be split into 3 main steps:
1. accept input: it may come from the user's click on the screen, a system alert, or anything else.
2. do logic
3. update view
To keep it simple, let's just focus on steps 2 and 3.
I think it would be wrong to run them in parallel or to mix them into one step.
Let me give you an example.
Let's say you are creating a war game, and you need to draw 100 soldiers fighting on the screen. The logic will be to update their positions and the background area, and then you need to draw the result. You can't start drawing one soldier before you have updated the position of another soldier.
So according to these simple steps, it is clear that steps 2 and 3 need to be synchronized somehow, and step 2 must be done before step 3.
Now tell me how it is possible to run a game loop on more than one thread, and more than one process? Does OpenGL use multiCore? How?
Edit: one way to use multithreading is to precalculate the game logic, or in other words to use vectors. But vectors have two big disadvantages that make them almost not worth recommending:
Sometimes you want to change your vector, so a lot of the calculation you did is never going to be used.
In most cases you are trying to reach 60+ FPS, which means about 16 milliseconds per game-loop iteration. Switching threads requires some kind of synchronization, and any synchronization is bad for performance. From what I saw, even a simple Handler.post() in Android (just adding a task to a queue to run it on another thread) may take up to 3 milliseconds (about 18% of your frame time), so unless your calculation takes longer than that, don't do it! So far I have not found anything that takes that long.
Now tell me how it is possible to run a game loop on more than one thread
The idea of multicore computing is parallelization, i.e. splitting up computationally intensive tasks into independent working sets that can be processed in parallel. Games have surprisingly little room for parallelization, as you found out yourself.
The usual way to use multiple cores in games is to run I/O in parallel with logic operations, i.e. doing all the networking and user interaction in one thread, and AI and scene management in another. Sound is usually moved to its own thread, too.
Does OpenGL use multiCore? How?
The OpenGL specification doesn't specify this. Some implementations may choose to utilize multiple cores, though it's quite unlikely. Why? Because it creates unnecessary cache management overhead.
I will try to explain graphically why 2 threads, one for rendering and one for logic, are much better and don't result in inconsistencies:
The single-threaded design you proposed would run as follows:
logic    |-----|       |-----|
graphics        |-----|       |-----| and so forth.
But as you have multiple CPUs, you should utilize them. That means that after the logic has finished for the first time, the graphics can render that game state. Meanwhile, the next logic step can be calculated:
logic    |-----||-----||-----|
graphics        |-----||-----| and so forth.
As you see, you can almost double your performance this way. Of course there is some overhead, but it will be faster than the single-threaded version anyway.
Also, you can assume that the game logic will take less time to calculate than the rendering. This makes the threading quite easy, because the logic thread can wait for a callback from the render thread, hand the render thread its new instructions, and then calculate the next logic step.
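The "almost double" claim can be checked with a little arithmetic. The sketch below is an idealized model that ignores synchronization overhead and assumes every logic step takes logicMs and every render step takes renderMs; PipelineTiming is a hypothetical helper, not part of any API.

```java
final class PipelineTiming {
    // One thread: each frame runs its logic step, then its render step.
    static long singleThreaded(int frames, long logicMs, long renderMs) {
        return frames * (logicMs + renderMs);
    }

    // Two threads: logic for frame k+1 overlaps with rendering of frame k.
    // The first logic step and the last render step cannot overlap with
    // anything; in between, the slower stage sets the pace.
    static long pipelined(int frames, long logicMs, long renderMs) {
        if (frames <= 0) {
            return 0;
        }
        return logicMs + (frames - 1) * Math.max(logicMs, renderMs) + renderMs;
    }
}
```

With 100 frames and 5 ms for each stage, the single-threaded loop needs 1000 ms while the pipelined one needs 505 ms. When one stage is much slower than the other, the gain shrinks, because the slower stage dominates the total.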