I'm trying to use Android and OpenGL 2.0 to create a sort-of desert racing game. At least that's the end goal. For the time being I'm really just working with generating an endless desert, through the use of a Perlin noise algorithm. However, I'm coming across a lot of problems with regard to concurrency and synchronization. The program consists of three threads: a "render" thread, a "geometry" thread which essentially sits in the background generating tiles of perlin noise (eventually sending them through to the render thread to process in its own time) and a "main" thread which updates the camera's position and updates the geometry thread if new perlin noise tiles need to be created.
Aforementioned perlin tiles are stored in VBOs and only rendered when they're within a certain distance of the camera. Buffer initialization always begins immediately.
This all works well, without any noticeable problems.
HOWEVER.
When the tiles are uploaded to the GPU through glBufferData() (after processing by the separate geometry thread), the render thread always appears to block. I presume this is because Android implicitly calls glFinish() before the screen buffer is rendered. Obviously, I'd like the data uploading to be performed in the background while everything else is being drawn - even taking place over multiple frames if necessary.
I've looked on google and the only solution I could find is to use glMapBuffer/glMapBufferRange(), but these two methods aren't supported in GLES2.0. Neither are any of the synchronization objects - glFenceSync etc. so...
....
any help?
P.S. I haven't provided any code as I didn't think it was necessary, as the problem seems more theoretical to me. However I can certainly produce some on request.
A screenshot of the game so far:
http://i.stack.imgur.com/Q6S0k.png
Android does not call glFinish() (glFinish() is actually a no-op on IMG's GPUs). The problem is that glBufferData() is not an asynchronous API. What you really want is PBOs which are only available in OpenGL ES 3.0 and do offer the ability to perform asynchronous copies (including texture uploads.)
Are you always using glBufferData()? You should use glBufferSubData() as much as possible to avoid reallocating your VBO every time.
Related
I am building a game like application using android NDK and openGL ES 2.0
So far I understand the concept of vertices and shaders and programs.
The main game loop would be a loop in a single thread as follows
step 1. Read all the user input
step 2. Update game objects (if needed) based on the input
step 3. make draw calls for all the objects
step 4. call glSwapBuffers
then loop back to step 1
But I ran into various confusions regarding sync and threading so I'm listing all the question together.
1.Since open GL draw calls are asynchronous the draw calls and glSwapBuffers may get called many times before the gpu has even rendered actually a single frame from calls from last iteration of loop. Will this be problematic? buffer overflow or tearing ?
2.Assuming VSYNC is enabled then does point 1 still causes problem?
3.Since all calls are async how do I measure the time spent rendering each frame? glSwapBuffers would return immediately so how can I know when was the frame actually done?
4.loading textures will occupy space in the ram is checking free memory before loading texture standard way or I should keep loading textures until I reach OUT_OF_MEMORY_ERROR?
5.If I switch to multithreaded approach calling just glswapbuffers at a fixed 60 times per second without any regard to the thread which is processing input and giving out draw calls then what is supposed to happen?
Also how do I control the fps in game loop? I know the exact fps depends on a large no of factors but how can you go close to that
The SwapBuffers() will not be executed out of order. Issuing it after all of the draw commands for the frame is fine. The driver will take care about it, you don't need to sync anything. You can only screw this about by using multiple threads or multiple contexts, but even that would take a lot of effort.
There is no problem with 1, and VSYNC does not directly change anything here.
The calls might be asynchronous, but the driver will not queue up an unlimit amount of work. Sooner or later, it will have to block, if you try to issue too many calls in advance. When vsync is on, the typicial behavior is that the driver will queue up at most a few frames (or just one, depending on the driver settings), and SwapBuffers() will block when that limit is reached. So the timing statistics you get there are accurate, after the first few frames. Note that this is still much better than completely flushing the queue, as the driver unblocks as soon as the first pending buffer swap was carried out.
This is a totally new topic, which probably belongs into another question. However: It is very unlikely that you get any of the current desktop GL implementations to ever generate GL_OUT_OF_MEMORY. The driver will automatically page textures (and other objects) between VRAM and system RAM (and the OS might even page that to disk). The GL also provides no means to query the available memory.
In that scenario, you will need to synchronize manually. That approach does not make the slightest sense and seems like trying to solve a problem which does not exist. If you want your game to use multithreading, still put all the gl rendering (and swapbuffers) into the same thread. You can use different threads for input processing, sound, physics, update of the scene, general game logic and whatever. But you should just use a single thread/single context approach for the GL. That way, it also won't hurt you when SwapBuffers() blocks your render thread, as your game logic and input handling is still done, and the render thread will just render new frames with the newest available data in the frequency the display needs (with vsync on) or as fast as the CPU and GPU can work (if vsync is off).
I have imported into Eclipse Juno the sample of OpenCv4Android, ver. 2.4.5, called "cameracontrol". It can be found here:Camera Control OpenCv4Android Sample.
Now I want to use this project as base for mine. I want to process every frame with image-processing techniques, so, in order to improve performance, I want to split the main activity of the project in two classes: one that is merely the activity and one (a thread) that is responsible for preview.
How can I do? Are there any examples about this?
This might not be a complete answer as I'm only learning about this myself at the moment, but I'll provide as much info as I can.
You will likely have to grab the images from the camera yourself and dispatch it to threads. This is because your activity in the example gets called with a frame from the camera and has to return the frame to be displayed (immediately) as the return value. You can't get 2+ frames to process in parallel without showing a blank screen in the meantime or some other hacky stuff. You'll probably want to allocate a (fixed sized) buffer somewhere, then start processing a frame with a worker thread when you get one (this would be the dispatcher). Once your worker thread is done he notifies the dispatcher who gives the image to the view. If frames come from the camera while all worker threads are busy (i.e. there are no free slots in the buffer), the frame is dropped. Once a space in the buffer frees up again, the next frame is accepted and processed.
You can look at the code of the initialitzeCamera function of JavaCameraView and NativeCameraView to get an idea of how to do this (also google should help, as this is how apps without OpenCV have to do it as well). For me the native camera performs significantly better though (even without heavy processing it's just much smoother), but ymmv...
I can't help with actual details about the implementation since I'm not that far into it myself yet. I dope this provides some ideas though.
I am using the Rajawali framework to make OpenGL ES based live wallpapers. To achieve many of my animation effects I have created some functions that are called from the onDrawFrame method. These functions vary from simple x,y,z rotations to more complicated equations with conditional statements that simulate wind or other randomized motions. It works well currently, and it is highly responsive especially when sleeping and waking the device.
As my live wallpapers become more complex I am worried that my crude solution will eventually start causing performance issues. Is that the case?
Is there a better way to make these types of cyclical or repetitive changes, like maybe making background threads?
Try to move as much computing as posible into vertex and fragment shaders. It's easier to hit other limits than CPU limits. Nobody can answer your question easily without examples and without knowing in depth what are you trying to archive.
Best TIP: Measure often and early during development.
Hope that helps.
Rajawali onDrawFrame() method is called by a timer, which is set with the frameRate() method. This gives much smoother animation than estimating thread sleeps (an alternate solution which is not advised).
If you are at 100fps, this gives you 10ms in between frames to do your calculations - that's it, but that is a lot of time to a modern CPU.
Putting the GPU calculations in a separate thread is problematic since this is not thread safe. You could put any non-GPU calcs in a separate thread, but this could result in sync issues.
Shaders are compiled into native code, but really should be limited to bitwise (integer) operations required to render the actual pixels (always the bottleneck) after all the expensive 3D calcs are done.
For example, an anaglyph shader would take care of overlaying two stereoscopic frames.
Hope this helps!
I am rendering objects via OpenGL, and got a nice smooth framerate of 60fps in most situations.
UNTIL I do something heavy in a background thread, like fetching stuff from a REST API, processing it, and adding objects to the graph (low-priority stuff, I care more about UI fluidity). The renderer will then pause for a very long period, up to 1 second (ca. as long as the background thread runs), and then resume as if nothing had happened. I noticed this because an animation is started at the same time, and it gets stuck for this period. The background thread is set to minimum priority, and garbage collection does take up to 100-200ms, but not the whole second. When I set a debug point anywhere in the background task, rendering continues just fine, without any delays.
Is it possible that my heavy background thread starves the OpenGL thread? If so, what can I do?
Of course! A GPU needs to be fed data, and that's done by the CPU. So, if something in the system bottlenecks, like I/O or CPU processing, then the GPU can't get fed. For example, animation is traditionally done on the CPU. This is why you get a lot of games on the PC getting higher frame rates with the same graphic chips but with different CPUs.
I also agree that profiling is a very good idea. If you can, I would suggest profiling to make sure it's actually the REST call, or if the REST call is one of many things
One thing I noticed about REST processing, and this happened to me. Since REST sometimes processes a lot of strings, and if you don't use StringBuilder, you can end up firing off a lot of garbage collection. However, it doesn't sound like you are getting this.
My question is about game loop design in opengl.
I believe that game loop can be spited in 3 main steps
accept input - it may come from users click on the screen or system alert, or anything else.
do logic
update view
To make it simple let's just focus on step 2 and 3.
I think that it will be wrong to run them in parallel order or to mix them in to one step.
I give you an example.
Lats say you are creating a war game, and you need to draw 100 soldiers fighting on the screen, the logic will be to update their position, to update the background area, and then you need to draw the result. You can't start drawing one soldier before updated the position of another soldier.
So according this simple steps, it is clear that step 2 and 3 need to be synchronized some how, and step 2 must be done before step 3.
Now tell me how it is possible to run game loop on more then one thread, and more then one process? Does opnegl use multiCore? How?
Edit: one way to use multithreading is to precalculate the game logic, or in another words using Vectors. But there are two big disadvantages in vectors that make them almost unrecommend to use.
Some times you wan't to change your vector, so there were lots of calculation that you did and you are not going to use them
in most of the cases you are trying to reach 60+ FPS witch mean 16 milliseconds for game-loop, switching threads requires some kind of synchronization, any synchronization is bad for performance, from what I saw, even a simple Handler.post() in android(just adding task to queue to run it on other thread) may take up to 3 milliseconds(18% from your frame rate time), so unless your calculation take longer then that, don't do it! For now I did not found anything taking so much time.
Now tell me how it is possible to run game loop on more then one thread
The idea of multicore computing is parallelization, i.e. splitting up computational intensive tasks into independent working sets that can be processed in parallel. Games have surprisingly little space for parallelization, as you found out yourself.
The usual way to use multiple cores in games is to parallelize I/O with logic operations. I.e. doing all the networking and user interaction in one thread, AI and scene management in another. Sound is usually parallelized away, too.
Does OpenGL use multiCore? How?
The OpenGL specification doesn't specify this. But some implementations may choose to utilize multiple cores. though it's quite unlikely. Why? Because it creates unneccessary cache management overhead.
I will try to explain graphically, why 2 threads, one for rendering and one for logic are much better and doesn't result in inconsistencies:
The single threaded design you proposed would run as follows:
logic |-----| |-----|
graphics |-----| |-----| and so forth.
but as you have multiple cpus, you should utilize them. That means, after the first time the logic has finished, the graphics can render that game state. meanwhile, the next logic step can be calculated:
logic |-----||-----||-----|
graphics |-----||-----| and so forth.
As you see, you can almost double your performance in this way: of course there is some overhead, but it will be faster than the single threaded version anyway.
Also, you can assume, that the game logic will take less time calculating then the render thread: This makes the threading quite easy, because you can wait in the logic thread for a call back from the renderthread to give the render thread new instructions and then calculating the next logic step.