I've written an OpenGL live wallpaper for Android that uses 17 pixel and 17 vertex shaders. On my HTC Legend, these take about 3 seconds to load and compile. Loading time is about 20% of this, the rest is compiling.
A live wallpaper has its OpenGL context destroyed every time a full-screen app is run. When the wallpaper becomes visible again, all shaders, textures and so on need to be reloaded, which freezes the screen for about 3 seconds each time. That is unacceptable to me :(
I've done some reading and apparently, it's not possible to precompile the shaders. What else could I do to fix this? Is it possible to load and compile shaders in a background thread? I could show some kind of progress animation in that case. Wouldn't be great, but better than nothing...
[EDIT1]
Another big reason to speed this up is that the whole OpenGL based Live Wallpaper life cycle is difficult to get working properly on all devices (and that is an understatement). Introducing long load times whenever the context is lost/recreated adds more headaches than I want. Anyway:
As answer 1 suggests, I tried looking at the GL_OES_get_program_binary extension, to make some kind of compile-once-store-compiled-version-per-installed-app, but I'm worried about how widely this extension is implemented. For example, my Tegra2 powered tablet does not seem to support it.
Other approaches I'm considering:
1) Ubershader: putting all pixel shaders into one big shader, with a switch or if statements. Would this slow down the pixel shader dramatically? Would it make the shader too big and make me overrun all those pesky register/instruction count/texture lookup limits? Same idea for the vertex shaders. This would reduce my entire shader count to 1 pixel and 1 vertex shader, and hopefully make compiling/linking a lot faster. Has anyone tried this?
[EDIT2] I just tried this. Don't. Compiling/linking now takes 8 seconds before giving up with a vague "link failed" error :(
2) Poor man's background loading: don't load/compile the shaders at the beginning, but load/compile one shader per frame update for the first 17 frames. At least I would be refreshing the display, and I could show a progress bar to make the user see something is happening. This would work fine on the slow devices, but on the fast devices this would probably make the whole shader load/compile phase slower than it needs to be...
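One way to soften the "slower than it needs to be on fast devices" drawback of approach 2 is to give each frame a time budget instead of a fixed one-shader-per-frame rule. Below is a minimal sketch of that idea in plain Java. `BudgetedLoader` and `runFrame` are hypothetical names, and each queued `Runnable` is assumed to wrap one shader's load/compile/link calls, executed on the GL thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Runs queued load steps (e.g. one shader compile each) within a per-frame time budget. */
final class BudgetedLoader {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private final int total;

    BudgetedLoader(Iterable<Runnable> steps) {
        for (Runnable step : steps) {
            pending.add(step);
        }
        total = pending.size();
    }

    /**
     * Call once per frame on the GL thread. Always runs at least one step,
     * then keeps going until the budget is used up or nothing is left.
     * Returns progress in [0, 1], e.g. to drive a progress bar.
     */
    float runFrame(long budgetNanos) {
        long start = System.nanoTime();
        do {
            Runnable step = pending.poll();
            if (step == null) {
                break; // everything is loaded
            }
            step.run();
        } while (System.nanoTime() - start < budgetNanos);
        return total == 0 ? 1f : (total - pending.size()) / (float) total;
    }

    boolean isDone() {
        return pending.isEmpty();
    }
}
```

With a budget of a few milliseconds, a slow device would still compile one shader per frame, while a fast device would get through several per frame and finish sooner.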
Check if your implementation supports OES_get_program_binary.
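A rough sketch of such a check, assuming you already have the space-separated extension string (on Android it would typically come from `GLES20.glGetString(GLES20.GL_EXTENSIONS)` while a context is current). `GlExtensions` is a hypothetical helper name. It matches whole tokens, so a longer extension name that merely starts with the same text does not count as a hit.

```java
import java.util.Arrays;

final class GlExtensions {
    /** True if the space-separated extension string contains the exact extension name. */
    static boolean has(String extensionString, String name) {
        if (extensionString == null) {
            return false; // glGetString can return null when no context is current
        }
        return Arrays.asList(extensionString.trim().split("\\s+")).contains(name);
    }
}
```

Usage would look like `GlExtensions.has(extensions, "GL_OES_get_program_binary")`, falling back to normal compilation when it returns false.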
Over the last month, I've taken a dive into trying my hand at a simple 2d side scroller using Android's default canvas APIs.
In short, my question boils down to "Is the Canvas API performant enough to pull 60fps with a simple side scroller with a multi-layer, parallaxing background?".
Now hear me out, because I've tried a ton of different approaches, and so far I've come up fairly empty handed on how to squeeze any more efficiency out of what I've attempted.
First, to address the easy problems:
I'm not allocating anything in the game loop
The Surface is Hardware Accelerated
Bitmaps are being loaded prior to starting the game, and drawn without scaling.
I don't believe I'm doing anything significant aside from the bitmap drawing in the game loop (mostly updating position variables and getting time)
The problems started, as you may have guessed, with the parallaxing BG. Seven layers in total, though I've been experimenting with as few as 3 while still unable to maintain 60fps. I'm clipping the parts that are overlapping to minimize overdraw, but I would guess I'm still drawing ~2-3x overdraw, all summed up.
As the game runs, I often start out at ~ 60fps, but about 30/40 seconds in, the fps drops to about 20. The time is random, and it seems to be a state of the phone, rather than a result of anything in the code (force killing the app & restarting causes the new app to start at ~20fps, but letting the phone sit for a while results in higher fps). My guess here is thermal throttling on the CPU...
Now I'm testing on a 5x, and naively thought these issues might disappear on a faster device (6P). Of course, due to the larger screen, the problems got worse, as it ran at ~15fps continuously.
A co-worker had suggested loading in small bitmaps and stretching them on draw, instead of scaling the bitmaps on load to the size they would appear on the screen. Implementing this by making each bitmap 1/3rd the size, and using the canvas.drawBitmap(bitmap, srcRect, destRect, Paint) method to compensate for size and scale, yielded worse performance overall (though I'm sure it helped the memory footprint). I haven't tried the drawMesh method, but I imagined it wouldn't be more performant than the plain old drawBitmap.
In another attempt, I gave an array of ImageViews a shot, thinking the Android View class might be doing some magic I wasn't, but after an hour of fiddling with it, it didn't look any more promising.
Lastly, the GPU profiling on the device remains far beneath the green line, indicating 60fps, but the screen doesn't seem to reflect that. Not sure what's going on there.
Should I bite the bullet and switch over to OpenGL? I thought that Canvas would be up to the task of a side-scroller with a parallaxing background, but I'm either doin' it wrong™, or using the wrong tool for the job, it seems.
Any help or suggestions are sincerely appreciated, and apologies for the long-winded post.
I have made some full-screen renders using OpenGL ES 2.0 on Android devices.
In these renders I used a custom fragment shader that uses a uniform time parameter as part of the animation.
I have experienced major image tearing/massive fps drops and pixelated result as the render went on.
After playing around with values and trying to fix it, I found the problem to be in the size of the time parameter, as the value got bigger and bigger the result got worse.
Changing the float precision to highp in the fragment shader didn't fix it, though the animation started degrading later than before, as you'd expect.
I found a solution by limiting the size of the parameter before it was sent to the shader, by using the mod operator on it.
On the other hand, I copied the exact shader code into a browser that runs a web-gl environment to render the same thing that runs on my phone, and there is no problem with the parameter size, no fps drop, no nothing.
I can understand that the graphics card on mobile devices is weaker than what I have on my PC, and it is only natural to assume that my PC graphics card can hold much larger values.
But, my question is, what possible solution can I work with to go around this problem of parameters sizes?
I would like my animation to go on forever*, and not be forced to loop around after 5 seconds.
Here is a link to the website with the animation: website link
*Not actually forever, but for quite a long time, just like in the browser.
If uFloat is meant to represent a timestamp, I'd advise passing in a unix timestamp instead, in milli- or nanoseconds, as an int or long. I don't know whether it'll fix the framerate issues, but if the cause of your woes is related to the precision of the timestamp, that'll probably fix that issue.
Edit: Based on the comments, this may be a precision issue. As uFloat grows, the spacing between adjacent representable float values grows with it, so the value loses absolute precision. What could very well be happening is that the per-frame increments to uFloat become too small to register. For debugging purposes, print the full floating-point value of uFloat each frame and compare consecutive frames to see whether the number stops increasing consistently.
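To put numbers on that, here is a small sketch in plain Java (the names `ShaderClock`, `wrappedSeconds` and `resolutionAt` are made up for illustration). It shows how coarse a 32-bit float gets at large values, and one way to do the mod on the CPU in double precision before handing the value to the shader. Near 100,000 seconds a float can only step in increments of about 0.0078 s, roughly half of a 60 fps frame (0.0167 s); a mediump float on a mobile GPU may have far fewer mantissa bits than that, so the effect would show up much sooner.

```java
final class ShaderClock {
    /**
     * Wraps elapsed time into [0, periodSeconds) using double math,
     * then narrows to float for upload as a uniform.
     */
    static float wrappedSeconds(long elapsedNanos, double periodSeconds) {
        double seconds = elapsedNanos / 1e9;
        return (float) (seconds % periodSeconds);
    }

    /** Smallest step a 32-bit float can represent near the given value. */
    static float resolutionAt(float value) {
        return Math.ulp(value);
    }
}
```

The wrap is only invisible if the animation itself repeats with that period, for example a period that is a whole multiple of the cycle length of every sin/cos term driven by the time uniform. That is an assumption about the shader, not something the wrap guarantees.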
I'm developing an app that renders the camera preview straight to a custom GlSurfaceView I have created.
It's pretty basic for someone who uses OpenGL on a regular basis.
The problem I'm experiencing is low fps on some devices, and I came up with a solution: choosing which shader to apply at runtime. I don't want to load an OpenGL program, measure the fps and then switch to a lighter shader, because that would cause noticeable lag.
What I would like to do is somehow determine the GPU's strength before linking the GL program (right after creating the OpenGL context).
After some hours of investigation I came to understand that it isn't going to be easy, mostly because rendering time depends on hidden device details such as GPU memory and the OpenGL pipeline, which may be implemented differently on different devices.
As I see it I have only one or two options. One is to render a texture off-screen and measure its rendering time: if it takes longer than 16 ms (the recommended frame time according to Romain Guy in this post), I'll use the lighter shader.
The other is to check the OS version and available RAM (though that is really inaccurate).
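The decision step of the off-screen timing option could look roughly like the sketch below. It is plain Java and only covers turning a handful of measured frame times into a choice; `ShaderTierPicker` is a hypothetical name. It assumes each sample was taken on the device by timing a draw into an off-screen framebuffer and calling `GLES20.glFinish()` before stopping the clock, so that the GPU work is included in the measurement.

```java
import java.util.Arrays;

final class ShaderTierPicker {
    /** Frame budget for ~60 fps, in nanoseconds. */
    static final long FRAME_BUDGET_NANOS = 16_666_667L;

    /**
     * Median of the samples (must be non-empty). The median is less sensitive
     * than the mean to one-off stalls such as first-use shader setup.
     */
    static long medianNanos(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** True if the heavy shader's median frame time exceeds the frame budget. */
    static boolean shouldUseLightShader(long[] heavyShaderFrameNanos) {
        return medianNanos(heavyShaderFrameNanos) > FRAME_BUDGET_NANOS;
    }
}
```

Taking several samples and discarding outliers this way is a design choice; a single measurement right after context creation is likely to be dominated by warm-up costs.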
I really hope for a more elegant and accurate solution.
I'm in the process of writing an Android game and I seem to be having performance issues with drawing to the Canvas. My game has multiple levels, and each has (obviously) a different number of objects in it.
The strange thing is that one level, which contains 45 images, runs flawlessly (almost 60 fps). However, another level, which contains 81 images, barely runs at all (11 fps); it is pretty much unplayable. Does this seem odd to anybody besides me?
All of the images that I use are .png's and the only difference between the aforementioned levels is the number of images.
What's going on here? Can the Canvas simply not draw this many images each game loop? How would you guys recommend that I improve this performance?
Thanks in advance.
Seems strange to me as well. I am also developing a game with lots of levels, and I can easily have 100 game objects on screen; I have not seen a similar problem.
Used properly, drawBitmap should be very fast indeed; it is little more than a copy command. I don't even draw circles natively; I have Bitmaps of pre-rendered circles.
However, the performance of Bitmaps in Android is very sensitive to how you do it. Creating Bitmaps can be very expensive, as Android can by default auto-scale the pngs which is CPU intensive. All this stuff needs to be done exactly once, outside of your rendering loop.
I suspect that you are looking in the wrong place. If you create and use the same sorts of images in the same sorts of ways, then doubling the number of screen images should not reduce performance by a factor of over 4. At most it should be linear (a factor of 2).
My first suspicion would be that most of your CPU time is spent in collision detection. Unlike drawing bitmaps, this usually goes up as the square of the number of interacting objects, because every object has to be tested for collision against every other object. You doubled the number of game objects but your performance went down to a quarter, ie according to the square of the number of objects. If this is the case, don't despair; there are ways of doing collision detection which do not grow as the square of the number of objects.
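As an illustration of one such approach, here is a minimal uniform-grid sketch in plain Java. It is a simplified example rather than a drop-in solution: `GridCollisions` is a hypothetical name, every object is assumed to be a circle with the same radius, and it allocates on every call, which a real game loop would want to avoid by reusing the buckets. Each object is only tested against objects in its own and the eight neighbouring cells instead of against every other object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GridCollisions {
    /** Returns index pairs {i, j} with i < j for circles that overlap. All circles share one radius. */
    static List<int[]> findPairs(float[] xs, float[] ys, float radius) {
        float cell = 2f * radius; // overlapping circles are at most one cell apart
        Map<Long, List<Integer>> grid = new HashMap<>();
        for (int i = 0; i < xs.length; i++) {
            long k = key(cellOf(xs[i], cell), cellOf(ys[i], cell));
            grid.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }
        List<int[]> pairs = new ArrayList<>();
        float maxDistSq = cell * cell;
        for (int i = 0; i < xs.length; i++) {
            int cx = cellOf(xs[i], cell);
            int cy = cellOf(ys[i], cell);
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    List<Integer> bucket = grid.get(key(cx + dx, cy + dy));
                    if (bucket == null) continue;
                    for (int j : bucket) {
                        if (j <= i) continue; // report each pair once
                        float ddx = xs[i] - xs[j];
                        float ddy = ys[i] - ys[j];
                        if (ddx * ddx + ddy * ddy < maxDistSq) {
                            pairs.add(new int[] {i, j});
                        }
                    }
                }
            }
        }
        return pairs;
    }

    private static int cellOf(float v, float cell) {
        return (int) Math.floor(v / cell);
    }

    /** Packs two cell coordinates into one map key. */
    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }
}
```

When objects are spread out, the work per frame stays close to linear in the number of objects; if they all pile into a few cells it degrades back towards the all-pairs case.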
In the meantime, I would do some basic testing. What happens if you don't actually draw half the objects? Does the game run much faster? If not, it's nothing to do with drawing.
I think this lecture will help you. Go to the 45-minute mark. There is a graph comparing the Canvas method and the OpenGL method. I think it is the answer.
I encountered a similar problem with performance, i.e. level 1 ran great and level 2 didn't.
It turned out it wasn't the rendering that was at fault (at least not specifically). It was something else specific to the level logic that was causing a bottleneck.
Point is ... Traceview is your best friend.
The method profiling showed where the CPU was spending its time and why the glitch in the framerate was happening. (Incidentally, the rendering cost was also higher in level 2, but it wasn't the bottleneck.)
I'm working on a 2D game for android using OpenGL ES 1.1 and I would like to know if this idea is good/bad/useless.
I have a screen divided in 3 sections, so I used scissors to avoid object overlapping from one view to the other.
I roughly understand the low level implementation of scissor and since my draws take a big part of the computation, I'm looking for ideas to speed it up.
My current idea is as follows:
If I put a glScissor around each object before I draw it, would I increase the speed of my application?
The idea is that if I set a glScissor (center +/- texture size), then the OpenGL pipeline will have fewer tests to do, since it can discard 90~99% of the surface thanks to the scissor.
So, to all OpenGL experts: is this good, bad, or will it have no impact? And why?
It shouldn't have any impact, IMHO. I'm not an expert, but my thinking is as follows:
The scissor test saves on your GPU's fill rate (the number of fragments/pixels the hardware can write to the framebuffer per second).
If you put a glScissor around each object, the test won't actually cut off anything: the same number of pixels will be rendered, so no fill rate will be saved.
If you want to have your rendering optimized, a good place to start is to make sure you're doing optimal batching and reduce the number of draw calls or complex state switches (texture switches).
Of course, the correct approach to optimization is to diagnose why your rendering is slow, so the above is just my guess, which may or may not help in your particular situation.