I'm testing OpenGL performance on an Android phone (HTC Wildfire, to be exact), and I came across a strange thing: when I try to draw a textured, indexed rectangle the size of the screen (320 x 480), I can only get a framerate of up to 40 fps, and that's when I use a 32x32 texture.
If I increase the texture size to 256x256, the performance drops to 35 fps.
My question is: how is it possible for all those Android games to run smoothly and still be so full of cool graphics?
There are a bunch of different ways to eke performance out of your device.
Make sure you aren't doing blending or drawing you don't need; from experience, a lot of the lower-end HTC devices (e.g. the Desire) are fill-rate limited.
Use triangle strips instead of triangles.
Use glDrawElements and cache your calls, e.g. load your vertex buffer once and call draw multiple times per frame.
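For illustration, here's a rough sketch of that pattern in Java with OpenGL ES 1.0/1.1 (the QuadBatch class and the buffer/field names are mine, not anything from the question): the vertex and index buffers are built once up front and only reused at draw time.

    import java.nio.FloatBuffer;
    import java.nio.ShortBuffer;
    import javax.microedition.khronos.opengles.GL10;

    // Sketch of the "load once, draw many times" idea with glDrawElements.
    // Assumes the buffers were filled during setup and that the
    // GL_VERTEX_ARRAY / GL_TEXTURE_COORD_ARRAY client states are enabled.
    class QuadBatch {
        private final FloatBuffer vertexBuffer;   // quad positions, built once
        private final FloatBuffer texCoordBuffer; // UVs, built once
        private final ShortBuffer indexBuffer;    // 6 indices for two triangles

        QuadBatch(FloatBuffer vertices, FloatBuffer texCoords, ShortBuffer indices) {
            this.vertexBuffer = vertices;
            this.texCoordBuffer = texCoords;
            this.indexBuffer = indices;
        }

        void draw(GL10 gl, float[] xs, float[] ys, int count) {
            // Point at the cached buffers once per frame...
            gl.glVertexPointer(3, GL10.GL_FLOAT, 0, vertexBuffer);
            gl.glTexCoordPointer(2, GL10.GL_FLOAT, 0, texCoordBuffer);
            // ...then issue many draw calls without re-uploading vertex data.
            for (int i = 0; i < count; i++) {
                gl.glPushMatrix();
                gl.glTranslatef(xs[i], ys[i], 0f);
                gl.glDrawElements(GL10.GL_TRIANGLES, 6, GL10.GL_UNSIGNED_SHORT, indexBuffer);
                gl.glPopMatrix();
            }
        }
    }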
But most importantly, use DDMS with method profiling to determine where your bottlenecks actually are; they may be in places you don't expect, e.g. logging, GC, or a slow operation inside a loop. You will be able to see how much of the time is spent rendering and how much goes to other operations (look for GLImpl.glDrawElements).
My biggest bottleneck was the GC kicking in too often (5 times per second, making my fps very sluggish), something you will see when memory allocations happen in places you wouldn't even think of. E.g. concatenating a string to a float with the + operator, using traditional Java get() functions everywhere, or (worst) collections can create a large number of objects that trigger GC.
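For example (a hedged sketch; the HudRenderer class and its fields are made up for illustration), the same HUD text can be drawn with or without per-frame allocations:

    import android.graphics.Canvas;
    import android.graphics.Paint;

    class HudRenderer {
        // Reused buffer for the allocation-free path.
        private final StringBuilder hudText = new StringBuilder(16);
        private int lastFps = -1;

        // Allocation-heavy: "FPS: " + fps builds a new StringBuilder and String
        // every frame - exactly the kind of garbage that keeps triggering GC.
        void drawSlow(Canvas canvas, int fps, Paint paint) {
            canvas.drawText("FPS: " + fps, 10, 20, paint);
        }

        // Reuse one StringBuilder and only rebuild the text when the value changes.
        void drawFast(Canvas canvas, int fps, Paint paint) {
            if (fps != lastFps) {
                hudText.setLength(0);
                hudText.append("FPS: ").append(fps);
                lastFps = fps;
            }
            canvas.drawText(hudText, 0, hudText.length(), 10, 20, paint);
        }
    }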
Also, if you have expensive operations, move them out into a separate thread.
Also, I am assuming you are creating your texture once and reusing the same texture ID each time; I have seen some tutorials that create the texture every time a frame is rendered.
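In case it helps, a rough sketch of that pattern for OpenGL ES 1.x in Java (the TextureHolder class is purely illustrative): the texture is generated and uploaded once when the surface is created, and only the cached id is bound while drawing.

    import javax.microedition.khronos.opengles.GL10;
    import android.graphics.Bitmap;
    import android.opengl.GLUtils;

    class TextureHolder {
        private int textureId = -1;

        // Called once, e.g. from GLSurfaceView.Renderer.onSurfaceCreated().
        void create(GL10 gl, Bitmap bitmap) {
            int[] ids = new int[1];
            gl.glGenTextures(1, ids, 0);
            textureId = ids[0];
            gl.glBindTexture(GL10.GL_TEXTURE_2D, textureId);
            gl.glTexParameterf(GL10.GL_TEXTURE_2D, GL10.GL_TEXTURE_MIN_FILTER, GL10.GL_LINEAR);
            gl.glTexParameterf(GL10.GL_TEXTURE_2D, GL10.GL_TEXTURE_MAG_FILTER, GL10.GL_LINEAR);
            GLUtils.texImage2D(GL10.GL_TEXTURE_2D, 0, bitmap, 0); // upload once
        }

        // Called every frame: only binds the cached id, no re-upload.
        void bind(GL10 gl) {
            gl.glBindTexture(GL10.GL_TEXTURE_2D, textureId);
        }
    }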
With extensive use of the DDMS I was able to take fps from 12 to 50 on the HTC Desire for a busy scene.
Hope that gets you going in the right direction :)
In my own experience, they don't on devices like the Wildfire.
I created an application with Starling; on new mobile devices it performs amazingly well, but on older devices (e.g. iPhone 4) I encounter a very odd lag.
As far as I can tell, I have a completely static situation:
There are quite a few display objects added to the stage (many of them are buttons, in case it matters), and their properties are not changed at all after initialization (x, y, rotation, etc...).
There are no enterframes / timeouts / intervals / requests of any kind in the background.
I'm not allocating / deallocating any memory.
In this situation, there's an average of 10 FPS out of 30, which is very odd.
Since Starling is a well established framework, I imagine it's me who's doing something wrong / not understanding something / not aware of something.
Any idea what might be causing it?
Has anyone else experienced this sort of problem?
Edit:
After reading a little, I optimized in every possible way according to this thread:
http://wiki.starling-framework.org/manual/performance_optimization
I reduced the draw calls from around 90 to 12, flattened sprites, set the blend mode to none in specific cases to ease the load on the CPU, and so on...
To my surprise when I tested again, the FPS was unaffected:
fps: 6 / 60
mem: 19
drw: 12
Is it even possible to get normal fps with Starling on mobile? What am I missing?
I am using big textures that are scaled down to the size of the device, is it possible that such a thing affects the fps that much?
Regarding "Load textures from files/URLs", I'm downloading different piles of assets for different situations, therefore I assumed compiling each pile into a SWF would be way faster than sending a separate request for each file. The problem is, for that I can only use embed, which apparently uses twice the memory. Do you have any solution in mind to enjoy the best of both worlds?
Instead of downloading your assets 'over the wire' and manually caching them for reuse, you can package the asset files inside your app bundle (rather than [Embed]ing them) and then use the Starling AssetManager to load the textures at the resolution/scale you need for the device, i.e.:
assets.enqueue(
    appDir.resolvePath("audio"),
    appDir.resolvePath(formatString("fonts/{0}x", scaleFactor)),
    appDir.resolvePath(formatString("textures/{0}x", scaleFactor))
);
Ref: https://github.com/Gamua/Starling-Framework/blob/master/samples/scaffold_mobile/src/Scaffold_Mobile.as
Your application bundle gets bigger, of course, but you do not take the 2x RAM hit of using [Embed].
Misc perf ideas from my comment:
Are you testing FPS in "Release" mode?
Are you using textures that are scaled down to match the resolution of the device before loading them?
Are you mixing BLEND modes that are causing additional draw calls?
Ref: the Performance Optimization wiki page is great reading for optimizing your usage of Starling.
Starling is not a miracle solution for mobile devices. There's quite a lot of code running in the background in order to make the GPU display anything. You, the coder, have to make sure the number of draw calls is kept to a minimum: the weaker the device, the fewer draw calls you should issue. It's not rare to see people using Starling without paying any attention to their draw calls.
The size of the graphics you use mostly matters for GPU upload time, not so much for GPU display time. Of course, all relevant textures need to be uploaded before any scene starts playing; you simply cannot upload new textures while a scene is playing, as even a small texture upload will cause stalls.
Displaying everything through Starling is not always a smart choice. In render mode the GPU does a lot of the work, but the CPU still has capacity remaining. You can reduce the amount of GPU uploading and GPU load by displaying static UI elements with the classic display list (which is where the Starling framework's design falls short). Starling was originally made in a way that makes it very difficult to use both display systems together; that's one of the downsides of this framework, and most professionals I know, including myself, don't use Starling for that reason.
Your system must be flexible: embed your assets for mobile, avoid external SWFs as much as possible, and be able to switch to another system for the web. If you expect to use one asset system for the mobile/desktop/web versions of your app, you are setting yourself up for failure. Embedding on mobile is critical for memory management, because the AIR platform internally manages the cache of those embedded assets; thanks to that, memory consumption stays under control when you create new instances of those assets. If you don't embed, you are on your own.
Regarding overall performance: a very weak Android device will probably never be able to go past 10 fps when using Starling or any Stage3D framework, because of the amount of code those frameworks need to run in the background for the draw calls. On a weak device that amount of code is already enough to completely overload the CPU. On the other hand, on a weak device you can still get good performance and a good user experience by using GPU mode instead of render mode (so no Stage3D) and displaying mostly raster graphics.
IN RESPONSE TO YOUR EDIT:
12 draw calls is very good (90 was pretty high).
That you still get low FPS on some devices is not that surprising. Low-end Android devices in particular will always have low FPS in render mode with a Stage3D framework, because of the amount of code those frameworks have to run to render one frame. The size of the textures you are using should not affect the FPS that much (that's the point of Stage3D), although reducing their size would help with GPU upload time.
Now, optimization is the key, and optimizing on a low-end device with low FPS is the best way to go, since whatever you do will have a great effect on better devices as well. Start by running tests that display only static graphics, with little or no code of your own, just to see how far the Stage3D framework can go on its own on those weak devices without losing any FPS, and then optimize from there. The number of objects displayed on screen plus the number of draw calls is what affects FPS with these Stage3D frameworks, so keep a count of both and always look for ways to reduce them. On some low-end devices it's not practical to try to hold 60 fps, so try switching to 30 and adjust your rendering accordingly.
I am making a 3D game with Unity for Android. In the game there is a main character and up to around 10 opponents possibly on the screen at the same time, all using the same model (prefabs for "opponents" differ only in one material for other colors).
The model was designed using MakeHuman and Blender.
On PC (since rendering is a lot faster) there are no problems, but when testing it on an Android device, the rendering time drops the frame rate to around 25-30 FPS when there are 3-4 or more bodies on the screen, creating a really "laggy" feeling (I am expecting a frame rate of about 60 FPS).
Before importing the Blender model, I used placeholder spheres and there was no such behavior. Since this is the first time I am using Blender and 3D models like this, I am not sure whether my model is within the expected size range for a mobile game. My current model consists of 5,956 verts, 10,456 faces, and 10,819 tris, with a file size of around 6.5 MB (it was generated even larger by MakeHuman at first, and I managed to compress and optimize it significantly, but still without major effect).
I attempted different solutions, including merging all meshes in the model into one, turning off shadows, using as few materials as possible, etc. All attempts yielded no or very limited improvement.
Any ideas are welcome. Cheers!
My current model consists of: 5,956 Verts, 10,456 Faces, 10,819 Tris
with a file size of around 6.5 MB
Sounds like way too much to me.
Also, keep in mind that if you're doing anything such as collision calculations against the mesh, every bit of extra complexity costs considerably more time.
For a mobile game I would suggest you remesh/retopologize the model (it's not pleasant work, but the results are much better than the Decimate modifier, for example) and then bake normal maps for it from the high-res model.
1.
TIMING in a game:
Is there any way other than System.currentTimeMillis() - startTime > XX to time updates in a game? Is it safe or CPU expensive?
2.
In my game I have 20 items (moving square vertices); when it gets up to 60-70 vertices, the FPS drops from 60 to 30-40 (testing on a Galaxy S i9000 phone).
Is there a way to cap my game at 30 FPS? Is it a good idea to update my game at 30 FPS, or do I not need to handle this? (Because at a lower FPS there will be lag - everything will be slow.)
How can I make my objects move at the same speed at any frame rate?
3.
What is the best way: do the physics and all the other stuff in onDrawFrame, or start a Thread that does the math for me? Which is faster and better?
TIMING in a game: Is there any way other than System.currentTimeMillis() - startTime > XX to time updates in a game? Is it safe or CPU expensive?
You should use a time delta system. There are a lot of tutorials about this. This is a very good tutorial: deWiTTERS Game Loop.
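The core of it is to scale every movement by the time elapsed since the last frame. A minimal sketch (the class and method names are mine):

    // Movement is expressed in units per second, so objects cover the same
    // distance per real second whether the device renders at 30 or 60 FPS.
    class GameLoop {
        private long lastFrameTime = System.nanoTime();

        void onFrame() {
            long now = System.nanoTime();
            float dt = (now - lastFrameTime) / 1000000000f; // seconds since last frame
            lastFrameTime = now;
            update(dt);
        }

        void update(float dt) {
            // e.g. position += velocity * dt, instead of a fixed step per frame
        }
    }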
In my game I have 20 items (moving square vertices); when it gets up to 60-70 vertices, the FPS drops from 60 to 30-40 (testing on a Galaxy S i9000 phone). Is there a way to cap my game at 30 FPS? Is it a good idea to update my game at 30 FPS, or do I not need to handle this? (Because at a lower FPS there will be lag - everything will be slow.)
That depends on which method you're using. For 3D, you should consider using Vertex Buffer Objects (VBOs), which are like vertex arrays that live in your device's GPU memory. That makes a huge difference, since the data doesn't have to be copied from CPU to GPU on every iteration; there's a rough setup sketch a bit further down, after the list below.
If 2D, you can still use VBOs, but if the draw_texture extension is supported on the device, that's recommended.
However, the options are:
Vertex Arrays (slowest).
Vertex Buffer Objects (fastest in 3D, slower than draw_texture in 2D).
draw_texture extension (fastest in 2D, but doesn't render 3D stuff).
You can support all of these methods to cover the whole range of Android devices, but remember to check the extensions reported by the device's OpenGL driver. Some devices might support all of them, while others may only support Vertex Arrays and VBOs (for example).
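Here is the rough VBO setup for OpenGL ES 1.1 in Java promised above (the VboMesh class is illustrative, and the buffers are assumed to be filled and rewound already):

    import java.nio.FloatBuffer;
    import java.nio.ShortBuffer;
    import javax.microedition.khronos.opengles.GL11;

    class VboMesh {
        private final int[] bufferIds = new int[2];
        private int indexCount;

        // Upload the geometry into GPU memory once (e.g. in onSurfaceCreated).
        void upload(GL11 gl, FloatBuffer vertices, ShortBuffer indices) {
            indexCount = indices.capacity();
            gl.glGenBuffers(2, bufferIds, 0);

            gl.glBindBuffer(GL11.GL_ARRAY_BUFFER, bufferIds[0]);
            gl.glBufferData(GL11.GL_ARRAY_BUFFER, vertices.capacity() * 4, vertices, GL11.GL_STATIC_DRAW);

            gl.glBindBuffer(GL11.GL_ELEMENT_ARRAY_BUFFER, bufferIds[1]);
            gl.glBufferData(GL11.GL_ELEMENT_ARRAY_BUFFER, indexCount * 2, indices, GL11.GL_STATIC_DRAW);
        }

        // Draw from GPU memory each frame; no per-frame copy from CPU to GPU.
        void draw(GL11 gl) {
            gl.glBindBuffer(GL11.GL_ARRAY_BUFFER, bufferIds[0]);
            gl.glVertexPointer(3, GL11.GL_FLOAT, 0, 0);
            gl.glBindBuffer(GL11.GL_ELEMENT_ARRAY_BUFFER, bufferIds[1]);
            gl.glDrawElements(GL11.GL_TRIANGLES, indexCount, GL11.GL_UNSIGNED_SHORT, 0);
        }
    }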
I've answered a related question here, but just to show you, here's a slide from one of Chris Pruett's lectures at Google I/O:
How can I make my objects move at the same speed at any frame rate?
You can't. Android is a multi-processing operating system, and you have no idea what else is going on (maybe another Service is busy updating?). What you can do is use a time-delta system, as I mentioned above.
What is the best way: do the physics and all the other stuff in onDrawFrame, or start a Thread that does the math for me? Which is faster and better?
Multi-threading is recommended (two Threads running in parallel). In short, do your drawing inside onDrawFrame and your update logic inside your own Thread/Runnable.
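Sketched out, the split can look like this (the names are illustrative, and a real game needs proper synchronization around any shared state):

    // Game logic runs in its own thread; the renderer only draws the latest state.
    class GameThread extends Thread {
        volatile boolean running = true;

        @Override
        public void run() {
            long last = System.nanoTime();
            while (running) {
                long now = System.nanoTime();
                float dt = (now - last) / 1000000000f;
                last = now;
                updateGame(dt);       // physics and all the heavy math happen here
                try {
                    Thread.sleep(16); // roughly 60 updates per second
                } catch (InterruptedException e) {
                    running = false;
                }
            }
        }

        void updateGame(float dt) { /* move objects, resolve collisions, ... */ }
    }

    // Meanwhile, in your GLSurfaceView.Renderer:
    // public void onDrawFrame(GL10 gl) { drawCurrentState(gl); } // rendering only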
Recommended resources:
Google I/O 2009 and Google I/O 2010
Game Development for Android: A Quick Primer
I'm in the process of writing an Android game and I seem to be having performance issues with drawing to the Canvas. My game has multiple levels, and each has (obviously) a different number of objects in it.
The strange thing is that one level, which contains 45 images, runs flawlessly (almost 60 fps), while another level, which contains 81 images, barely runs at all (11 fps); it is pretty much unplayable. Does this seem odd to anybody besides me?
All of the images I use are PNGs, and the only difference between the aforementioned levels is the number of images.
What's going on here? Can the Canvas simply not draw this many images each game loop? How would you guys recommend that I improve this performance?
Thanks in advance.
Seems strange to me as well. I am also developing a game with lots of levels; I can easily have 100 game objects on screen and have not seen a similar problem.
Used properly, drawBitmap should be very fast indeed; it is little more than a copy command. I don't even draw circles natively; I use Bitmaps of pre-rendered circles.
However, the performance of Bitmaps on Android is very sensitive to how you use them. Creating Bitmaps can be very expensive, because Android can by default auto-scale the PNGs, which is CPU intensive. All of this needs to be done exactly once, outside of your rendering loop.
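For example, something like this (a sketch; the class name and resource id are made up) keeps all decoding out of the render loop and stops Android from resampling the PNG on load:

    import android.content.res.Resources;
    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import android.graphics.Canvas;

    class SpriteCache {
        private Bitmap playerBitmap;

        // Decode once, e.g. when the level loads.
        void load(Resources res) {
            BitmapFactory.Options opts = new BitmapFactory.Options();
            opts.inScaled = false; // avoid the CPU cost of density auto-scaling
            playerBitmap = BitmapFactory.decodeResource(res, R.drawable.player, opts);
        }

        // The render loop only blits the pre-decoded bitmap.
        void draw(Canvas canvas, float x, float y) {
            canvas.drawBitmap(playerBitmap, x, y, null);
        }
    }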
I suspect that you are looking in the wrong place. If you create and use the same sorts of images in the same sorts of ways, then doubling the number of on-screen images should not reduce performance by a factor of more than 4. At most the cost should grow linearly (a factor of 2).
My first suspicion would be that most of your CPU time is spent in collision detection. Unlike drawing bitmaps, this usually grows with the square of the number of interacting objects, because every object has to be tested for collision against every other object. You doubled the number of game objects and your performance dropped to roughly a quarter, i.e. according to the square of the number of objects. If this is the case, don't despair: there are ways of doing collision detection whose cost does not grow with the square of the number of objects.
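One common approach is a simple uniform grid as a broad phase: only objects that land in the same cell get the expensive pairwise test. A rough sketch (the GameObject type, cell size and narrow-phase test are placeholders):

    import java.util.ArrayList;
    import java.util.List;

    class CollisionGrid {
        private final int cols, rows;
        private final float cellSize;
        private final List<List<GameObject>> cells = new ArrayList<List<GameObject>>();

        CollisionGrid(float worldWidth, float worldHeight, float cellSize) {
            this.cellSize = cellSize;
            this.cols = (int) Math.ceil(worldWidth / cellSize);
            this.rows = (int) Math.ceil(worldHeight / cellSize);
            for (int i = 0; i < cols * rows; i++) {
                cells.add(new ArrayList<GameObject>());
            }
        }

        // Rebuild the grid each frame, then test only pairs that share a cell.
        void rebuild(List<GameObject> objects) {
            for (List<GameObject> cell : cells) {
                cell.clear();
            }
            for (GameObject o : objects) {
                int cx = Math.min(cols - 1, Math.max(0, (int) (o.x / cellSize)));
                int cy = Math.min(rows - 1, Math.max(0, (int) (o.y / cellSize)));
                cells.get(cy * cols + cx).add(o);
            }
        }

        void checkCollisions() {
            for (List<GameObject> cell : cells) {
                for (int i = 0; i < cell.size(); i++) {
                    for (int j = i + 1; j < cell.size(); j++) {
                        // A full version would also check neighbouring cells for
                        // objects that straddle a cell border.
                        narrowPhaseTest(cell.get(i), cell.get(j));
                    }
                }
            }
        }

        void narrowPhaseTest(GameObject a, GameObject b) { /* precise overlap test */ }
    }

    class GameObject {
        float x, y;
    }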
In the meantime, I would do some basic testing. What happens if you don't actually draw half the objects? Does the game run much faster? If not, it's nothing to do with drawing.
I think this lecture will help you. Go to the 45-minute mark; there is a graph comparing the Canvas method and the OpenGL method. I think that is the answer.
I encountered a similar problem with performance - i.e. level 1 ran great and level 2 didn't.
It turned out it wasn't the rendering that was at fault (at least not specifically); it was something else specific to the level logic that was causing a bottleneck.
Point is ... Traceview is your best friend.
Method profiling showed where the CPU was spending its time and why the glitch in the framerate was happening. (Incidentally, the rendering cost was also higher in level 2, but it wasn't the bottleneck.)
I'm working on a 2D game for Android using OpenGL ES 1.1, and I would like to know if this idea is good/bad/useless.
I have a screen divided into 3 sections, so I use scissor tests to keep objects from overlapping from one view into another.
I roughly understand the low-level implementation of the scissor test, and since my draws take a big part of the computation, I'm looking for ideas to speed things up.
My current idea is as follows:
If I put a glScissor around each object before I draw it, would I increase the speed of my application?
The idea is that if I set a glScissor rectangle (center +/- texture size), the OpenGL pipeline will have fewer tests to do, since it can discard 90-99% of the surface thanks to the scissor.
So, to all OpenGL experts: is this good, bad, or will it have no impact? And why?
It shouldn't have any impact, IMHO. I'm not an expert, but my thinking is as follows:
The scissor test saves on your GPU's fill rate (the number of fragments/pixels the hardware can write to the framebuffer per second);
if you put a glScissor around each object, the test won't actually cut off anything - the same number of pixels will be rendered, so no fill rate is saved.
If you want your rendering optimized, a good place to start is to make sure you're doing optimal batching and to reduce the number of draw calls and expensive state switches (such as texture switches).
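For instance, simply sorting sprites by texture before drawing lets consecutive sprites share one bind, so the renderer switches textures far less often (the Sprite type and textureId field are placeholders here):

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class DrawOrder {
        // Group sprites that use the same texture next to each other.
        static void sortByTexture(List<Sprite> sprites) {
            Collections.sort(sprites, new Comparator<Sprite>() {
                @Override
                public int compare(Sprite a, Sprite b) {
                    return a.textureId - b.textureId;
                }
            });
        }
    }

    class Sprite {
        int textureId;
    }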
Of course, the correct approach to optimization is to diagnose why your rendering is slow, so the above is just my guess, which may or may not help in your particular situation.