I'm developing an app that renders the camera preview straight to a custom GLSurfaceView I have created. It's pretty basic for someone who uses OpenGL on a regular basis.
The problem I'm experiencing is low FPS on some devices, and I came up with a solution: choose which shader to apply at runtime. Now, I don't want to load an OpenGL program, measure the FPS and then swap in a lighter shader, because that would create noticeable lag.
What I would like to do is somehow determine the GPU's strength before linking the GL program (right after creating the OpenGL context).
After some hours of investigation I pretty much understood that it isn't going to be easy, mostly because rendering time depends on details hidden from the developer: the device's GPU memory, and an OpenGL pipeline that may be implemented differently on different devices.
As I see it I have only one or two options. Either render a texture off-screen and measure its rendering time - if it takes longer than 16 ms (the frame budget recommended by Romain Guy in this post) I'll use the lighter shader (sketched below).
Or check the OS version and available RAM (though that is really inaccurate).
I really hope for a more elegant and accurate solution.
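For reference, here is roughly what I mean by option 1 - a minimal sketch assuming a GLSurfaceView.Renderer whose context already exists, where drawFullscreenQuad() is a placeholder for whatever draw code exercises the heavy shader:

    // Time a few off-screen frames with the heavy program; fall back to the
    // lighter shader if the average frame exceeds the 16 ms budget.
    // Must run on the GL thread, right after the context is created.
    private boolean heavyShaderFastEnough(int heavyProgram, int size) {
        int[] fbo = new int[1], tex = new int[1];
        GLES20.glGenFramebuffers(1, fbo, 0);
        GLES20.glGenTextures(1, tex, 0);
        GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, tex[0]);
        GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA, size, size,
                0, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null);
        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fbo[0]);
        GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER,
                GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D, tex[0], 0);
        GLES20.glViewport(0, 0, size, size);
        // (production code would check glCheckFramebufferStatus here)

        GLES20.glUseProgram(heavyProgram);
        GLES20.glFinish();                       // drain pending GL work first
        long start = System.nanoTime();
        final int frames = 10;
        for (int i = 0; i < frames; i++) {
            drawFullscreenQuad();                // placeholder draw routine
        }
        GLES20.glFinish();                       // block until the GPU is done
        long avgMs = (System.nanoTime() - start) / frames / 1_000_000;

        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
        GLES20.glDeleteFramebuffers(1, fbo, 0);
        GLES20.glDeleteTextures(1, tex, 0);
        return avgMs < 16;                       // 60 fps frame budget
    }

The glFinish() calls are what make the timing meaningful: without them the loop would mostly measure command submission to the driver rather than actual GPU execution.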
Related
I created an application with Starling. On the new mobile devices it performs amazingly well, but on the older devices (e.g. iPhone 4) I encounter a very odd lag.
As far as I can tell, I have a completely static situation:
There are quite a few display objects added to the stage, many of them buttons in case it matters, and their properties are not changed at all after initialization (x, y, rotation, etc.).
There are no enterframes / timeouts / intervals / requests of any kind in the background.
I'm not allocating / deallocating any memory.
In this situation, there's an average of 10 FPS out of 30, which is very odd.
Since Starling is a well established framework, I imagine it's me who's doing something wrong / not understanding something / not aware of something.
Any idea what might be causing it?
Has anyone else experienced this sort of problem?
Edit:
After reading a little, I've made every optimization I could according to this thread:
http://wiki.starling-framework.org/manual/performance_optimization
I reduced the draw calls from around 90 to 12, flattened sprites, set the blend mode to none in specific cases to ease the load on the CPU, and so on...
To my surprise when I tested again, the FPS was unaffected:
fps: 6 / 60
mem: 19
drw: 12
Is it even possible to get normal fps with Starling on mobile? What am I missing?
I am using big textures that are scaled down to the size of the device. Is it possible that such a thing affects the FPS that much?
Regarding "Load textures from files/URLs", I'm downloading different piles of assets for different situations, therefore I assumed compiling each pile into a SWF would be way faster than sending a separate request for each file. The problem is, for that I can only use embed, which apparently uses twice the memory. Do you have any solution in mind to enjoy the best of both worlds?
Instead of downloading your assets 'over-the-wire' and manually caching them for re-use, you can include the assets in your app bundle (rather than embedding them) and then use the Starling AssetManager to load the textures at the resolution/scale you need for the device, e.g.:
    assets.enqueue(
        appDir.resolvePath("audio"),
        appDir.resolvePath(formatString("fonts/{0}x", scaleFactor)),
        appDir.resolvePath(formatString("textures/{0}x", scaleFactor))
    );
Ref: https://github.com/Gamua/Starling-Framework/blob/master/samples/scaffold_mobile/src/Scaffold_Mobile.as
Your application bundle gets bigger, of course, but you do not take the 2x RAM hit of using 'embed'.
Misc perf ideas from my comment:
Are you testing FPS in "Release" mode?
Are you using textures that are scaled down to match the resolution of the device before loading them?
Are you mixing BLEND modes that are causing additional draw calls?
Ref: The Performance Optimization guide is great reading for optimizing your usage of Starling.
Starling is not a miracle solution for mobile devices. There's quite a lot of code running in the background in order to make the GPU display anything. You, the coder, have to make sure the number of draw calls is kept to a minimum: the weaker the device, the fewer draw calls you can afford. It's not rare to see people use Starling and pay no attention to their draw calls.
The size of the graphics used is mostly relevant to GPU upload time, not so much to GPU display time. So of course all relevant textures need to be uploaded prior to displaying any scene; you simply cannot upload a new texture while a scene is playing. Even uploading a small texture will cause idling.
Displaying everything through Starling is not always a smart choice. In render mode the GPU gets a lot of power, but the CPU still has some capacity remaining. You can reduce the amount of GPU uploading and GPU load by displaying static UI elements with the classic display list (which is where the Starling framework design falls short). Starling was originally made in a way that makes it very difficult to use both display systems together; that's one of the downsides of this framework, and most professionals I know, myself included, don't use Starling for that reason.
Your system must be flexible: embed your assets for mobile, avoid external SWFs as much as possible, and be able to switch to another system for the web. If you expect to use one asset system for the mobile/desktop/web versions of your app, you are setting yourself up for failure. Embedding on mobile is critical for memory management, because the AIR platform internally manages the cache of embedded assets. Thanks to that, memory consumption stays under control when you create new instances of those assets; if you don't embed, you are on your own.
Regarding overall performance: a very weak Android device will probably never get past 10 FPS with Starling, or any Stage3D framework, because of the amount of code those frameworks need to run (draw calls) in the background. On weak devices that amount of code is already enough to completely overload the CPU. On the other hand, on a weak device you can still get good performance and a good user experience by using GPU mode instead of render mode (so no Stage3D) and displaying mostly raster graphics.
IN RESPONSE TO YOUR EDIT:
12 draw calls is very good (90 was pretty high).
That you still get low FPS on some devices is not that surprising. Low-end Android devices in particular will always have low FPS in render mode with a Stage3D framework, because of the amount of code those frameworks have to run to render one frame. The size of the textures you are using should not affect the FPS that much (that's the point of Stage3D), but reducing the size of those graphics would help with GPU upload time.
Now optimization is the key, and optimizing on a low-end device with low FPS is the best way to go, since whatever you gain there will have an even greater effect on better devices. Start by running tests that display only static graphics, with no or very little code on your part, just to see how far the Stage3D framework can go on its own on those weak devices without losing any FPS, and optimize from there. The number of objects displayed on screen plus the number of draw calls is what affects FPS with these Stage3D frameworks, so keep a count of both and always look for ways to reduce them. On some low-end devices it's not practical to try to hold 60 FPS, so switch to 30 and adjust your rendering accordingly.
I am constantly getting 60 frames per second and doing no CPU-intensive operations in a simple application displaying a single rotating triangle. Unfortunately, when using OpenGL ES 2.0 on a Samsung Galaxy Express I am seeing slight hiccups in the rendering, as if some frames are not being drawn.
The funny thing is that with OpenGL ES 1.0 there is no such hiccup, so I know it has nothing to do with the rotation method or with using System.nanoTime() to measure the elapsed time between frames. I use the exact same method in both, and as per my research here and the "Fix Your Timestep!" article, I have smoothed the game loop. I get the same results.
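For reference, my loop follows the usual fixed-timestep pattern, roughly like the sketch below inside the GLSurfaceView.Renderer, where updateRotation() and drawTriangle() stand in for my actual code:

    // Fixed-timestep loop: simulate in constant steps, carry the remainder
    // over as an interpolation fraction for rendering.
    private static final long STEP_NS = 16_666_667L;  // ~60 updates per second
    private long previous = System.nanoTime();
    private long accumulator = 0;

    @Override
    public void onDrawFrame(GL10 unused) {
        long now = System.nanoTime();
        accumulator += now - previous;
        previous = now;
        while (accumulator >= STEP_NS) {
            updateRotation(STEP_NS);              // advance the angle one step
            accumulator -= STEP_NS;
        }
        float alpha = (float) accumulator / STEP_NS;
        drawTriangle(alpha);                      // render the interpolated state
    }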
To make matters even funnier, the spinning-triangle example from Google's own developer.android.com introduction to OpenGL has the same issue, as do the Play Store examples from the 'learnopengles' website, both of which render a simple spinning triangle.
As per research into similar threads here, I have tried continuous rendering as well as dirty rendering. The slight flicker remains.
After 3 days I am wondering whether it is my device, or whether all OpenGL ES 2.0 applications have a slight stutter that is simply more noticeable on a single spinning triangle.
I have no other test device and no money for one, so I cannot say whether it is a problem with my device, or the Samsung Galaxy Express in general, or something else.
Is there anything else I can do to fix this issue?
Is this slight stutter normal behavior?
Are there any examples of code I can test myself that does not exhibit this behavior?
Does the Samsung Galaxy Express have a known issue with OpenGL ES 2.0?
Thank you for reading.
I'm writing an OpenGL ES 2.0 game in C++ running on multiple (mobile) platforms.
On iOS, Android, ... basically everything runs fine, except on one device:
the computation for one frame in a specific scene takes about 8 ms on an HTC Desire.
Another device, a Samsung Galaxy Nexus, which is much newer, takes 18-20 ms.
I dug into the problem and found out that it is related to enabling/disabling GL_DEPTH_TEST: if I comment out all glEnable(GL_DEPTH_TEST)/glDisable(GL_DEPTH_TEST) calls, the time needed for one frame drops to 1-2 ms on the Nexus.
So I optimized the glEnable()/glDisable() calls to occur only when absolutely needed. I have 3D and 2D parts in my scene, and therefore need to render the 2D without the depth test and the 3D with it.
Currently I enable the depth test, draw the 3D, disable the depth test, and draw the 2D - but the computation still takes 18-20 ms on the Nexus.
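In GLES20/Java terms (the C++ calls are the same minus the class prefix), the per-frame ordering I mean is roughly this, where draw3dScene() and draw2dOverlay() are placeholders for my own passes:

    // One frame, reduced to the depth-test-relevant calls:
    // clear once, then a single enable and a single disable per frame.
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
    GLES20.glEnable(GLES20.GL_DEPTH_TEST);
    draw3dScene();                               // depth-tested 3D pass
    GLES20.glDisable(GLES20.GL_DEPTH_TEST);
    draw2dOverlay();                             // 2D pass, no depth test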
I also checked whether the depth buffer is cleared more often than needed, but it is only cleared at the start of the frame.
Is it possible that switching the depth test on and off takes that much time?
Does anyone have other ideas about what to check?
UPDATE
I found out that the 3D object I render is somehow responsible for the slow computation.
If I remove the 3D object, the performance is good.
But the same 3D object is used in another scene without causing such trouble.
And the weirdest thing: the Nexus runs Android 4.2, which has an option in the developer options to visualize the CPU load as an overlay. If I enable this setting and start the game, the computation time is 5-6 ms instead of 18-20 ms. How can this be related?
I've written an OpenGL live wallpaper for Android that uses 17 pixel and 17 vertex shaders. On my HTC Legend, these take about 3 seconds to load and compile. Loading is about 20% of this; the rest is compiling.
A live wallpaper has its OpenGL context destroyed every time a full-screen app is run, and when the wallpaper becomes visible again, all shaders, textures and so on need to be reloaded, causing the screen to freeze for about 3 seconds each time, which is unacceptable to me :(
I've done some reading, and apparently it's not possible to precompile the shaders. What else could I do to fix this? Is it possible to load and compile shaders in a background thread? I could show some kind of progress animation in that case. It wouldn't be great, but better than nothing...
[EDIT1]
Another big reason to speed this up is that the whole OpenGL-based live wallpaper life cycle is difficult to get working properly on all devices (and that is an understatement). Introducing long load times whenever the context is lost/recreated adds more headaches than I want. Anyway:
As answer 1 suggests, I tried looking at the GL_OES_get_program_binary extension, to build some kind of compile-once, store-the-compiled-version-per-installed-app scheme, but I'm worried about how widely this extension is implemented. For example, my Tegra 2 powered tablet does not seem to support it.
Other approaches I'm considering:
1) Ubershader: putting all pixel shaders into one big shader, with a switch or if statements. Would this slow down the pixel shader dramatically? Would it make the shader too big and make me overrun all those pesky register/instruction-count/texture-lookup limits? The same idea applies to the vertex shaders. This would reduce my entire shader count to 1 pixel and 1 vertex shader, and hopefully make compiling/linking a lot faster. Has anyone tried this?
[EDIT2] I just tried this. Don't. Compiling/linking now takes 8 seconds before giving up with a vague "link failed" error :(
2) Poor man's background loading: don't load/compile the shaders at the beginning, but load/compile one shader per frame update for the first 17 frames (sketched below). At least I would be refreshing the display, and I could show a progress bar so the user sees something happening. This would work fine on the slow devices, but on the fast devices it would probably make the whole shader load/compile phase slower than it needs to be...
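Roughly what I have in mind, inside the renderer; shaderSources, programs, compileAndLink(), drawProgressBar() and drawWallpaper() are placeholders for my own code:

    // One compile per onDrawFrame: the display keeps refreshing and a
    // progress bar can be shown while the 17 programs are built one by one.
    private int compiled = 0;                    // programs built so far

    @Override
    public void onDrawFrame(GL10 unused) {
        if (compiled < shaderSources.length) {
            programs[compiled] = compileAndLink(shaderSources[compiled]);
            compiled++;
            drawProgressBar(compiled / (float) shaderSources.length);
            return;                              // normal rendering resumes later
        }
        drawWallpaper();
    }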
Check if your implementation supports OES_get_program_binary.
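A quick runtime check, on the GL thread once the context exists:

    String extensions = GLES20.glGetString(GLES20.GL_EXTENSIONS);
    boolean canCacheProgramBinaries =
            extensions != null && extensions.contains("OES_get_program_binary");

Note that the android.opengl.GLES20 Java bindings don't expose the OES entry points themselves, so actually saving and restoring the binary means going through native code (or, on newer devices, the ES 3.0 bindings, where glGetProgramBinary/glProgramBinary are core).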
I'm working on a 2D game for Android using OpenGL ES 1.1 and I would like to know if this idea is good/bad/useless.
I have the screen divided into 3 sections, so I use the scissor test to keep objects in one view from overlapping into another.
I roughly understand the low-level implementation of scissoring, and since my draws take up a big part of the computation, I'm looking for ideas to speed them up. My current setup looks roughly like the sketch below.
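This is GLES10/Java for illustration; screenW, screenH and the drawSectionN() calls are placeholders for my own code:

    // Three non-overlapping horizontal bands, each clipped with the scissor test.
    GLES10.glEnable(GLES10.GL_SCISSOR_TEST);
    GLES10.glScissor(0, 0, screenW, screenH / 3);                // bottom band
    drawSection1();
    GLES10.glScissor(0, screenH / 3, screenW, screenH / 3);      // middle band
    drawSection2();
    GLES10.glScissor(0, 2 * screenH / 3, screenW, screenH / 3);  // top band
    drawSection3();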
My current idea is as follows:
If I set a glScissor rectangle around each object before I draw it, would that increase the speed of my application?
The idea is that with a glScissor of (center ± texture size), the OpenGL pipeline will have fewer tests to do, since it can discard 90-99% of the surface thanks to the scissor.
So, to all the OpenGL experts: is this good, bad, or will it have no impact? And why?
It shouldn't have any impact, IMHO. I'm not an expert, but my thinking is as follows:
The scissor test saves on your GPU's fill rate (the number of fragments/pixels the hardware can put into the framebuffer per second). If you put a glScissor around each object, the test won't actually cut anything off - the same number of pixels will be rendered, so no fill rate will be saved.
If you want your rendering optimized, a good place to start is to make sure you're batching optimally and to reduce the number of draw calls and expensive state switches (texture switches).
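For example, sorting by texture before drawing turns one glBindTexture per object into one per texture. A sketch, where Sprite is a hypothetical type holding a GL texture handle and its own draw routine:

    // Hypothetical sprite: a GL texture handle plus its own draw code.
    class Sprite {
        int textureId;
        void draw() { /* issue this sprite's vertices */ }
    }

    void drawBatched(List<Sprite> sprites) {
        // Sort by texture so glBindTexture runs once per texture, not per sprite.
        Collections.sort(sprites, new Comparator<Sprite>() {
            @Override public int compare(Sprite a, Sprite b) {
                return Integer.compare(a.textureId, b.textureId);
            }
        });
        int bound = -1;
        for (Sprite s : sprites) {
            if (s.textureId != bound) {
                GLES10.glBindTexture(GLES10.GL_TEXTURE_2D, s.textureId);
                bound = s.textureId;
            }
            s.draw();
        }
    }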
Of course, the correct approach to optimization is to diagnose why your rendering is slow in the first place, so the above is just my guess, which may or may not help in your particular situation.