I am developing a simple 2D game. I have multiple sprites, and each sprite has around 80 PNG frames of 256x256. I used libGDX's TexturePacker to pack the atlas, and I am enabling mipmapping with the following code:
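import com.badlogic.gdx.graphics.Texture;
import com.badlogic.gdx.tools.texturepacker.TexturePacker;

TexturePacker.Settings settings = new TexturePacker.Settings();
settings.combineSubdirectories = true;
// With a MipMap* min filter, the atlas pages are loaded with mipmaps
settings.filterMin = Texture.TextureFilter.MipMapNearestLinear;
settings.filterMag = Texture.TextureFilter.Linear;
// f, outputFolderName and atlasFileName are defined elsewhere in my code
TexturePacker.process(settings, f.getPath(), outputFolderName, atlasFileName);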
Are 80 images/frames for a single sprite too many?
Is 0.5 GB of memory usage too much for a simple game like Fruit Ninja?
How can I reduce my memory usage?
Is there anything else I should try?
Here is the screenshot taken from the Android Profiler.
80 frames is kind of a lot, but that number has never been a problem in my projects. That said, most of our 60-frame pre-rendered animations are small, like a flashing icon graphic in 48x48 segments, packed on one sheet (so there is minimal context switching). This has never been a performance issue for us ... BUT your saying 256x256 scares me a little, especially if the frames are uncropped and a lot of PNGs are created!
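To put rough numbers on this, here is a back-of-the-envelope sketch in plain Java (no libGDX). It assumes uncompressed RGBA8888 textures (4 bytes per pixel), untrimmed 256x256 frames and, for the mipmapped case, a full mip chain, which adds roughly one third on top of the base level. The class and method names are made up for illustration.

```java
final class TextureMemory {
    /** Bytes for one uncompressed texture, optionally including its full mipmap chain. */
    static long textureBytes(int width, int height, int bytesPerPixel, boolean mipmaps) {
        long total = (long) width * height * bytesPerPixel;
        if (mipmaps) {
            int w = width, h = height;
            // Each mip level halves both dimensions (never below 1) down to 1x1
            while (w > 1 || h > 1) {
                w = Math.max(1, w / 2);
                h = Math.max(1, h / 2);
                total += (long) w * h * bytesPerPixel;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long frame = textureBytes(256, 256, 4, false);
        long sprite = 80 * frame; // 80 untrimmed frames of one sprite
        System.out.printf("one frame: %d bytes, 80 frames: %.1f MiB%n", frame, sprite / (1024.0 * 1024.0));
        System.out.printf("one 2048x2048 page with mipmaps: %.1f MiB%n",
                textureBytes(2048, 2048, 4, true) / (1024.0 * 1024.0));
    }
}
```

Under these assumptions, each sprite's 80 frames cost about 20 MiB before trimming, so ten such sprites already come to about 200 MiB, or roughly 267 MiB with mipmaps. Two-byte formats such as RGBA4444 or RGB565 halve these numbers.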
While my projects have sprites with similar frame counts, they have been optimized via TexturePacker. Make sure you have Trim mode set to "Trim" and not "None" (the setting is near the bottom, under Sprites). This setting trims the transparent border off each 256x256 frame wherever possible, which reduces the total number of sheets and texture bindings (and, I think, also context switches). I'm not totally sure how you packed your sprites or whether they can even be trimmed, but this could potentially be a performance life saver.
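Trimming amounts to cropping each frame to the bounding box of its non-transparent pixels; the atlas keeps the offsets so the frame is still drawn in the same place. In libGDX's own TexturePacker, the comparable options are the stripWhitespaceX and stripWhitespaceY fields of TexturePacker.Settings. The plain-Java sketch below only illustrates the idea; the class name and the packed-ARGB int[] input are assumptions.

```java
final class AlphaTrim {
    /**
     * Returns {x, y, width, height} of the smallest rectangle containing every pixel
     * whose alpha is non-zero, or null if the frame is fully transparent.
     * Pixels are packed ARGB ints in row-major order.
     */
    static int[] opaqueBounds(int[] argb, int width, int height) {
        int minX = width, minY = height, maxX = -1, maxY = -1;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int alpha = argb[y * width + x] >>> 24;
                if (alpha != 0) {
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // nothing visible in this frame
        }
        return new int[] {minX, minY, maxX - minX + 1, maxY - minY + 1};
    }

    public static void main(String[] args) {
        // Hypothetical 256x256 frame whose visible content is a 128x192 block
        int[] frame = new int[256 * 256];
        for (int y = 32; y < 224; y++) {
            for (int x = 64; x < 192; x++) {
                frame[y * 256 + x] = 0xFF00FF00; // opaque green
            }
        }
        int[] b = opaqueBounds(frame, 256, 256);
        System.out.println(java.util.Arrays.toString(b)); // [64, 32, 128, 192]
        System.out.println("pixels saved: " + (256 * 256 - b[2] * b[3])); // pixels saved: 40960
    }
}
```

In this made-up example the trimmed frame keeps 24,576 of 65,536 pixels, i.e. 37.5% of the original frame's memory.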
Also, if you provided the code showing how your animations are created, we could double-check that you aren't binding 80 textures when only 1 is needed. I create my animations as follows:
TextureAtlas atlas = assetManager.get(atlas_name);
Sprite[] spriteFrames = new Sprite[numFrames];
for (int index = 0; index < numFrames; index++) {
    // Each frame is a region of an atlas page texture, not a separate Texture
    if (atlas.findRegion(lookup_name, index) == null) {
        Engine.console("[ERROR] problem loading and finding region for " + atlas_name + " " + lookup_name + " index: " + index + " not found in spritesheet .. fix immediately");
    } else {
        spriteFrames[index] = atlas.createSprite(lookup_name, index);
    }
}
Animation<Sprite> animation = new Animation<Sprite>(timeBetweenFrames, spriteFrames);
I hope this helps provide some insight.
I know this has been asked in general, but the answer is always "it depends", so I'm asking a concrete question in the hope of getting a concrete answer.
I know IFs in GLSL can be evil: they can be really expensive, and on some hardware all of the code (both branches) is executed anyway.
I have a fragment shader from an example (a dual paraboloid shadow map) which uses ifs to decide which map to use and to compute the depth. I know it's very easy to replace those ifs with a multiplier, but there is a texture sample inside the branches. What would be faster: using an if, or using a multiplier to filter out the unused data?
These are the two proposed versions:
IF version:
// alpha is computed on the fly and cannot be replaced
float depth = 0.0;
float mydepth = 0.0;
if (alpha >= 0.5)
{
    depth = texture2D(ShadowFrontS, P0.xy).x;
    mydepth = P0.z;
}
else
{
    depth = texture2D(ShadowBackS, P1.xy).x;
    mydepth = P1.z;
}
Filter version:
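// step() returns 1.0 when alpha >= 0.5 and 0.0 otherwise (same test as the if version)
float mlt = step(0.5, alpha);
float depth = 0.0;
float mydepth = 0.0;
depth = texture2D(ShadowFrontS, P0.xy).x * mlt;
mydepth = P0.z * mlt;
mlt = 1.0 - mlt;
// Both maps are always sampled; the unused result is multiplied by 0.0
depth = depth + (texture2D(ShadowBackS, P1.xy).x * mlt);
mydepth = mydepth + (P1.z * mlt);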
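Before profiling, it is worth checking that both versions select the same sample. The plain-Java model below (hypothetical names; GLSL's ceil() and step() are emulated, and alpha is assumed to stay in [0, 1]) shows that a ceil(alpha - 0.5) multiplier picks the back map at exactly alpha == 0.5, whereas the if version picks the front map there. A step(0.5, alpha) multiplier, which is 0.0 when alpha < 0.5 and 1.0 otherwise, matches the if condition exactly.

```java
final class ShadowSelect {
    /** The if version: front map when alpha >= 0.5, back map otherwise. */
    static float selectIf(float alpha, float front, float back) {
        return alpha >= 0.5f ? front : back;
    }

    /** Multiplier from GLSL ceil(alpha - 0.5), blended as in the filter version. */
    static float selectCeil(float alpha, float front, float back) {
        float mlt = (float) Math.ceil(alpha - 0.5f);
        return front * mlt + back * (1.0f - mlt);
    }

    /** Multiplier from GLSL step(0.5, alpha): 0.0 if alpha < 0.5, else 1.0. */
    static float selectStep(float alpha, float front, float back) {
        float mlt = alpha < 0.5f ? 0.0f : 1.0f;
        return front * mlt + back * (1.0f - mlt);
    }

    public static void main(String[] args) {
        float front = 1f, back = 2f; // stand-ins for the two depth samples
        for (float alpha : new float[] {0.25f, 0.5f, 0.75f}) {
            System.out.println("alpha=" + alpha + " if=" + selectIf(alpha, front, back)
                    + " ceil=" + selectCeil(alpha, front, back)
                    + " step=" + selectStep(alpha, front, back));
        }
    }
}
```

The two multiplier formulations only disagree at the boundary value, but step() removes the mismatch entirely and also stays correct if alpha ever exceeds 1.5, where ceil() would produce a multiplier of 2.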
P.S.: I'm targeting desktop and mobile devices, so performance on low-end hardware is a must.
Branching is not "evil" per se on massively SIMD architectures. If all the threads in a "bunch" (NVIDIA calls them warps) follow the same code path, i.e. take the same branches, everything is fine.
Only if a branch is taken by some threads of a bunch and not by others must both branches be executed; the calculations and data fetches that are not relevant for a given thread are then discarded.
In your case it takes some careful profiling to see which variant suits your GPU better. But my gut instinct tells me it's the branching version. Why? Usually the value on which you branch depends on the screen-space position, and large contiguous areas of fragments often share the same code path, so the performance penalty applies only to those "bunches" that cover a border region. These bunches are usually only a few pixels² in size (8×8 or 16×16).
The shader you have there is not compute limited (i.e. limited by the computational capabilities of the GPU) but memory bandwidth limited, i.e. limited by the throughput of the GPU's memory link, because of the texture2D fetch operations. In that case, reducing the actual number of fetches, and thereby the required memory bandwidth, will probably benefit your program more than reducing the number of computations.
The branchless mix-multiplex variant of your shader will always fetch both textures; the branching one does that only within the border regions. So from that heuristic I'd guess that your branching variant is actually the better choice.
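The heuristic can be illustrated with a toy model. It is an assumption-laden sketch, not a GPU simulator: fragments are grouped into square tiles that stand in for warps, a tile whose fragments all take the same branch pays one fetch slot per fragment, and a divergent tile pays for both paths. The class name, the tile size and the "front map for x >= 30" split are invented for the example.

```java
import java.util.function.BiPredicate;

final class BranchCostModel {
    /** Branchless version: every fragment samples both shadow maps. */
    static long branchlessFetchSlots(int width, int height) {
        return 2L * width * height;
    }

    /**
     * If version: a tile (stand-in for a warp) whose fragments all take the same
     * branch costs one fetch slot per fragment; a divergent tile runs both paths.
     */
    static long branchingFetchSlots(int width, int height, int tile,
                                    BiPredicate<Integer, Integer> takesFront) {
        long slots = 0;
        for (int ty = 0; ty < height; ty += tile) {
            for (int tx = 0; tx < width; tx += tile) {
                boolean anyFront = false, anyBack = false;
                int fragments = 0;
                for (int y = ty; y < Math.min(ty + tile, height); y++) {
                    for (int x = tx; x < Math.min(tx + tile, width); x++) {
                        if (takesFront.test(x, y)) {
                            anyFront = true;
                        } else {
                            anyBack = true;
                        }
                        fragments++;
                    }
                }
                slots += (anyFront && anyBack) ? 2L * fragments : fragments;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        // 64x64 fragments, 8x8 tiles, front map chosen for x >= 30 (the border cuts one tile column)
        long branching = branchingFetchSlots(64, 64, 8, (x, y) -> x >= 30);
        long branchless = branchlessFetchSlots(64, 64);
        System.out.println("branching: " + branching + ", branchless: " + branchless);
    }
}
```

Here only the one column of tiles straddling the border pays double, so the if version needs 4,608 fetch slots versus 8,192 for the branchless one. Real hardware differs in many ways, which is why profiling remains the final word.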
But to be sure you have to profile it.
I am working on an Android project in which I am using OpenCV to detect faces in all the images in the gallery. The face detection runs in a service, which keeps working until all the images are processed. It stores the detected faces in internal storage and also shows them in a GridView if the activity is open.
My code is:
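CascadeClassifier mJavaDetector = null;

public void getFaces() {
    for (int i = 0; i < size; i++) {
        try {
            File file = new File(urls.get(i));
            // BitmapFactory.decodeFile expects a path String, not a File
            defaultBitmap = BitmapFactory.decodeFile(file.getAbsolutePath(), bitmapFatoryOptions);
            // Loads the LBP cascade (done once per image in this loop)
            mJavaDetector = new CascadeClassifier(FaceDetector.class.getResource("lbpcascade_frontalface").getPath());
            // Mat takes (rows, cols, type), i.e. (height, width)
            Mat image = new Mat(defaultBitmap.getHeight(), defaultBitmap.getWidth(), CvType.CV_8UC1);
            Utils.bitmapToMat(defaultBitmap, image);
            MatOfRect faceDetections = new MatOfRect();
            mJavaDetector.detectMultiScale(image, faceDetections, 1.1, 10, 0, new Size(20, 20), new Size(image.width(), image.height()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}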
Everything works, but face detection is very slow. When I debugged the code, I found that the line taking the time is:
mJavaDetector.detectMultiScale(image,faceDetections,1.1, 10, 0, new Size(20,20), new Size(image.width(), image.height()));
I have checked multiple posts about this problem but didn't find a solution.
Please tell me what I should do to solve this problem.
Any help would be greatly appreciated. Thank you.
You should pay attention to the parameters of detectMultiScale():
scaleFactor – Parameter specifying how much the image size is reduced at each image scale. This parameter is used to create a scale pyramid. It is necessary because the model has a fixed size from training. Without a pyramid, the only size that could be detected would be this fixed one (which can also be read from the XML). However, face detection can be made scale-invariant by using a multi-scale representation, i.e., detecting large and small faces using the same detection window.
scaleFactor depends on the size of your trained detector, but in practice you should set it as high as possible while still getting "good" results, so it has to be determined empirically.
Your value of 1.1 can be a good one for this purpose. It means a relatively small resizing step is used (the size is reduced by roughly 10% each time), which increases the chance of finding a size that matches the detection model. If your trained detector is 10x10, you can detect faces of about 11x11, 12x12 and so on. But a factor of 1.1 requires roughly twice as many pyramid layers (and about twice the computation time) as 1.2 does.
minNeighbors – Parameter specifying how many neighbours each candidate rectangle should have to retain it.
The cascade classifier works with a sliding-window approach: you slide a window over the image, then resize the image and search again, until it cannot be resized any further. In every iteration the positive outputs of the cascade classifier are stored, but unfortunately it also detects many false positives. To eliminate false positives and get the proper face rectangle out of the detections, a neighbourhood approach is applied. 3-6 is a good value for it (you are using 10). If the value is too high, you can lose true positives too.
minSize – Related to the sliding-window approach described under minNeighbors, this is the smallest window the cascade can detect. Objects smaller than that are ignored. Usually cv::Size(20, 20) is enough for face detection.
maxSize – Maximum possible object size. Objects bigger than that are ignored.
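To see why scaleFactor and minSize matter so much, here is a rough cost model in plain Java (hypothetical names; it ignores OpenCV's internal optimizations). It assumes a square image and a 20x20 detection window, walks the scale pyramid as described above, skips levels whose window would be smaller than minSize, and counts the work of each level as the area of the shrunken image relative to the original.

```java
final class CascadeCostModel {
    /** Number of pyramid levels for which the window still fits in the image. */
    static int pyramidLevels(int imageSide, int windowSize, double scaleFactor) {
        int levels = 0;
        for (double factor = 1.0; windowSize * factor <= imageSide; factor *= scaleFactor) {
            levels++;
        }
        return levels;
    }

    /**
     * Relative work: sum over scanned levels of the shrunken image area (1 / factor^2),
     * skipping levels whose window (in original pixels) is smaller than minSize.
     */
    static double relativeWork(int imageSide, int windowSize, double scaleFactor, int minSize) {
        double work = 0.0;
        for (double factor = 1.0; windowSize * factor <= imageSide; factor *= scaleFactor) {
            if (windowSize * factor < minSize) {
                continue; // level too small to report; the loop update still advances factor
            }
            work += 1.0 / (factor * factor);
        }
        return work;
    }

    public static void main(String[] args) {
        System.out.println("levels 1.1: " + pyramidLevels(480, 20, 1.1)); // levels 1.1: 34
        System.out.println("levels 1.2: " + pyramidLevels(480, 20, 1.2)); // levels 1.2: 18
        System.out.printf("work 1.1 / 1.2: %.2f%n",
                relativeWork(480, 20, 1.1, 20) / relativeWork(480, 20, 1.2, 20));
        System.out.printf("work minSize 20 / minSize 80: %.1f%n",
                relativeWork(480, 20, 1.1, 20) / relativeWork(480, 20, 1.1, 80));
    }
}
```

In this model, 1.1 needs about twice as many pyramid levels as 1.2 (34 vs 18) and roughly 1.8x the work, while raising minSize from 20 to 80 skips the largest, most expensive levels and cuts the work by more than an order of magnitude.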
Finally, you can try classifiers based on different features (such as Haar, LBP, HOG). LBP classifiers are usually a few times faster than Haar classifiers, but also less accurate.
And it is also strongly recommended to look over these questions:
Recommended values for OpenCV detectMultiScale() parameters
OpenCV detectMultiScale() minNeighbors parameter
Instead of reading images as Bitmaps and then converting them to Mat with Utils.bitmapToMat(defaultBitmap,image), you can read them directly with Mat image = Highgui.imread(imagepath); (in OpenCV 3.x and later, imread() lives in the Imgcodecs class). See the OpenCV documentation for the imread() function.
Also, the line below takes too much time because the detector is looking for faces as small as Size(20, 20), which is pretty small. A video visualizing OpenCV's face detection illustrates this well.
mJavaDetector.detectMultiScale(image,faceDetections,1.1, 10, 0, new Size(20,20), new Size(image.width(), image.height()));
So I'm getting really confused here. The designer I work with wants high-quality images (PNG files) for Android tablets, but the game also has smaller images for less powerful devices. I figured that the amount of free heap memory would be the metric for deciding which set of images to use, computed as Runtime.getRuntime().maxMemory() - Runtime.getRuntime().totalMemory(). That doesn't seem to be the case, though. BlueStacks loads the high-quality images just fine with around 40,000,000 bytes available. The designer's Galaxy Nexus shows black boxes for some of the larger images (which I understand is due to a lack of memory for loading the image), but it has about 50,000,000 bytes available, which is even more than BlueStacks.
So what is the limiting factor? And on a related matter, how is it that there are mobile games that have impressive quality visuals, yet I can't manage to load a few images? What am I doing wrong?
Note that I am using AndEngine; below is an example of how I'm loading the images.
BuildableBitmapTextureAtlas resetTA = new BuildableBitmapTextureAtlas(this.getTextureManager(), 310 / d, 190 / d,
resetTR = BitmapTextureAtlasTextureRegionFactory.createTiledFromAsset(resetTA, this, "gfx/" + lowres + "reset.png", 1, 1);
try {
    resetTA.build(new BlackPawnTextureAtlasBuilder<IBitmapTextureAtlasSource, BitmapTextureAtlas>(0, 0, 0));
} catch (TextureAtlasBuilderException e) {
    e.printStackTrace();
}
One of the images that isn't loading on the Galaxy Nexus is a 2320x464 sprite sheet PNG.
There are two limiting factors here. The first is heap memory. You can find the available heap for your app with the Runtime class methods, but they tell you the maximum memory your app can use before it crashes outright. The limit your app should respect on Android can be found this way:
ActivityManager am = (ActivityManager) getSystemService(ACTIVITY_SERVICE);
// Approximate per-app heap limit in megabytes
int memoryClass = am.getMemoryClass();
Log.d("MyTag", "Heap: " + memoryClass + " MB");
The second is the GL_MAX_TEXTURE_SIZE value, which limits the maximum width and height of a texture on a given device. It varies, but the minimum these days seems to be 2048, so your textures can be as large as 2048x2048 pixels; note that your 2320x464 sprite sheet is wider than that. Beyond that, the only general expectation is that the limit is larger than the screen dimensions, and the real value is up to the phone's manufacturer.
I think you can use the following code to find out the size:
// Must run on the GL thread (e.g. in onSurfaceCreated) while a GL context is current
int[] maxTextureSize = new int[1];
GLES20.glGetIntegerv(GLES20.GL_MAX_TEXTURE_SIZE, maxTextureSize, 0);
Log.d("MyTag", "GL_MAX_TEXTURE_SIZE: " + maxTextureSize[0]);
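Once you have the value, you can check each sheet against it before loading. The plain-Java sketch below uses hypothetical names, and its repacking step assumes the 2320x464 sheet is a single row of five 464x464 frames (2320 = 5 × 464), which is only a guess from its dimensions.

```java
final class SheetLayout {
    static boolean fits(int width, int height, int maxTextureSize) {
        return width <= maxTextureSize && height <= maxTextureSize;
    }

    /**
     * Rearranges equally sized frames into a grid no wider than maxTextureSize.
     * Returns {cols, rows, width, height}; check the result with fits() as well.
     */
    static int[] grid(int frameW, int frameH, int frameCount, int maxTextureSize) {
        int cols = Math.min(frameCount, maxTextureSize / frameW);
        if (cols == 0) {
            throw new IllegalArgumentException("a single frame is wider than the maximum texture size");
        }
        int rows = (frameCount + cols - 1) / cols; // ceiling division
        return new int[] {cols, rows, cols * frameW, rows * frameH};
    }

    public static void main(String[] args) {
        System.out.println(fits(2320, 464, 2048)); // false
        System.out.println(java.util.Arrays.toString(grid(464, 464, 5, 2048))); // [4, 2, 1856, 928]
    }
}
```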
Games can have impressive graphics because they split big textures into smaller ones, load only what is needed and reuse as much as possible. I've made a game where some levels have 50,000 px wide ground, assembled from 256x256 pieces, which keeps the game sharp even on full HD tablets. The pieces were distributed over several 2048x2048 textures.
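As rough arithmetic for that kind of setup, here is a plain-Java sketch with hypothetical names. It assumes a 512 px tall strip of ground, tiles packed edge to edge without padding, RGBA8888 pages without mipmaps, and, in the first case, no tile reuse at all. A 2048x2048 page holds 64 tiles of 256x256.

```java
final class TiledLevel {
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Tiles needed to cover a levelWidth x levelHeight area with square tiles. */
    static int tilesNeeded(int levelWidth, int levelHeight, int tileSize) {
        return ceilDiv(levelWidth, tileSize) * ceilDiv(levelHeight, tileSize);
    }

    /** Texture pages needed for uniqueTiles tiles, packed edge to edge (no padding). */
    static int pagesNeeded(int uniqueTiles, int pageSize, int tileSize) {
        int perPage = (pageSize / tileSize) * (pageSize / tileSize);
        return ceilDiv(uniqueTiles, perPage);
    }

    public static void main(String[] args) {
        int tiles = tilesNeeded(50_000, 512, 256);  // 196 x 2 = 392
        int pages = pagesNeeded(tiles, 2048, 256);  // 64 tiles per page -> 7
        long bytesPerPage = 2048L * 2048 * 4;       // RGBA8888, no mipmaps
        System.out.println(tiles + " tiles, " + pages + " pages, "
                + (pages * bytesPerPage >> 20) + " MiB if no tile is reused");
    }
}
```

If the ground were instead built from, say, 40 distinct pieces that repeat, everything would fit on a single page, which is why reusing as much as possible pays off.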
Hey all, I'm at a crossroads with an app I've been working on.
It's a game, an 'arcade/action' one at that, but I've coded it using SurfaceView rather than OpenGL (it just turned out that way, as the game changed drastically from its original design).
I find myself plagued with performance issues, and not even in the game itself but in the first activity, which is an animated menu (a full-screen background with about 8 sprites floating across the screen).
Even with this small amount of sprites, I can't get perfectly smooth movement. They move smoothly for a while and then it goes 'choppy' or 'jerky' for a split second.
I noticed that (from what I can tell) the background (a pre-scaled image) is taking about 7 to 8 ms to draw. Is this reasonable? I've experimented with different ways of drawing such as:
canvas.drawBitmap(scaledBackground, 0, 0, null);
The above code produces roughly the same results as:
canvas.drawBitmap(scaledBackground, null, screen, null);
However, if I change my holder to:
Then the drawing of the bitmap shoots up to about 13 ms (I assume because it then has to convert to RGBA_8888 format).
The strange thing is that the rendering and logic run at a very steady 30 fps; no frames are dropped and there is no garbage collection happening at run time.
I've tried pretty much everything I can think of to get my sprites moving smoothly.
I recently incorporated interpolation into my game loop:
float interpolation = (float)(System.nanoTime() + skipTicks - nextGameTick)
/ (float)(skipTicks);
I then pass this into my draw() method:
I have had some success with this and it has really helped smooth things out, but I'm still not happy with the results.
Can anyone give me any final tips on reducing the time taken to draw my bitmaps, or on what else may be causing this? Or do you think SurfaceView is simply not up to the task, and I should scrap the app and start again with OpenGL?
This is my main game loop:
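long next_game_tick = GetTickCount();
int loops;
float interpolation;
bool game_is_running = true;
while( game_is_running ) {
    loops = 0;
    // Run fixed-size logic steps until caught up (at most MAX_FRAMESKIP per frame)
    while( GetTickCount() > next_game_tick && loops < MAX_FRAMESKIP) {
        // ... update the game logic by one fixed step here ...
        next_game_tick += SKIP_TICKS;
        loops++;
    }
    // Fraction of a tick elapsed since the last logic step
    interpolation = float( GetTickCount() + SKIP_TICKS - next_game_tick )
                    / float( SKIP_TICKS );
    display_game( interpolation );
}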
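For comparison, here is the same loop structure in Java with the interpolation actually used for drawing: the sprite is drawn at its last logic position plus speed × interpolation, i.e. predicted forward by the fraction of a tick that has elapsed. It is a sketch under assumptions: time is passed in as milliseconds instead of being read from a clock (so it can be tested), the 40 ms tick and the constant sprite speed are made up, and the class name is hypothetical.

```java
final class FixedStepLoop {
    static final long SKIP_TICKS = 40;  // ms per logic update (25 updates/s), assumed
    static final int MAX_FRAMESKIP = 5;

    private final float speedPerTick;   // pixels moved per logic update, assumed constant
    private long nextGameTick;
    private float spriteX;              // position after the most recent logic update

    FixedStepLoop(long startMillis, float speedPerTick) {
        this.nextGameTick = startMillis;
        this.speedPerTick = speedPerTick;
    }

    /** One pass of the outer loop at time nowMillis; returns the x coordinate to draw. */
    float frame(long nowMillis) {
        int loops = 0;
        while (nowMillis > nextGameTick && loops < MAX_FRAMESKIP) {
            spriteX += speedPerTick;    // one fixed-size logic step
            nextGameTick += SKIP_TICKS;
            loops++;
        }
        float interpolation = (float) (nowMillis + SKIP_TICKS - nextGameTick) / SKIP_TICKS;
        // Draw slightly ahead of the last logic state instead of at it
        return spriteX + speedPerTick * interpolation;
    }

    public static void main(String[] args) {
        FixedStepLoop loop = new FixedStepLoop(0, 4f);
        // Uneven frame times still give evenly spaced positions (x = 4 + 0.1 * t)
        for (long t : new long[] {20, 50, 70, 130}) {
            System.out.println("t=" + t + " ms -> x=" + loop.frame(t));
        }
    }
}
```

Because the drawn position depends only on elapsed time, the sprite keeps a constant on-screen speed even when frames arrive unevenly, which is what smooths out the choppiness.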
You shouldn't use Canvas to draw fast sprites, especially if you're drawing a full-screen image. It takes way too long; I can tell you from experience. I believe the Canvas you get from a SurfaceView is not hardware accelerated, which is the main reason you'll never get good performance out of it. Even simple sprites start to move slowly when there are ~15 on screen. Switch to OpenGL, set up an orthographic projection, and make a textured quad for every sprite. Believe me, I did it, and it's worth the effort.
EDIT: Actually, instead of a SurfaceView, the OpenGL way is to use a GLSurfaceView. You create your own class deriving from it (overriding surfaceCreated, surfaceDestroyed and surfaceChanged if you need to), then implement the GLSurfaceView.Renderer interface and connect the two with setRenderer(). The Renderer provides onDrawFrame(), which is where you render; the GLSurfaceView manages how you render (bit depth, render modes, etc.).
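The "orthographic projection plus one textured quad per sprite" part is mostly a little matrix math. The plain-Java sketch below builds a column-major orthographic matrix with the same layout that android.opengl.Matrix.orthoM produces, for pixel coordinates with the origin at the top-left; the 800x480 screen size and the class name are assumptions. On Android you would upload this matrix as a shader uniform and draw each sprite's quad with the texture coordinates of its region.

```java
final class OrthoProjection {
    /** Column-major orthographic matrix (OpenGL convention). */
    static float[] ortho(float left, float right, float bottom, float top, float near, float far) {
        float[] m = new float[16];
        m[0] = 2f / (right - left);
        m[5] = 2f / (top - bottom);
        m[10] = -2f / (far - near);
        m[12] = -(right + left) / (right - left);
        m[13] = -(top + bottom) / (top - bottom);
        m[14] = -(far + near) / (far - near);
        m[15] = 1f;
        return m;
    }

    /** Applies m to the point (x, y, 0, 1) and returns clip-space {x, y}. */
    static float[] apply(float[] m, float x, float y) {
        return new float[] {m[0] * x + m[4] * y + m[12], m[1] * x + m[5] * y + m[13]};
    }

    /** Corners of a sprite quad in pixels, in triangle-strip order: TL, BL, TR, BR. */
    static float[] quad(float x, float y, float w, float h) {
        return new float[] {x, y, x, y + h, x + w, y, x + w, y + h};
    }

    public static void main(String[] args) {
        // bottom = 480, top = 0 puts (0, 0) at the top-left of an assumed 800x480 screen
        float[] proj = ortho(0, 800, 480, 0, -1, 1);
        float[] q = quad(368, 208, 64, 64); // a 64x64 sprite centred on screen
        for (int i = 0; i < q.length; i += 2) {
            float[] c = apply(proj, q[i], q[i + 1]);
            System.out.printf("(%.0f, %.0f) -> (%.2f, %.2f)%n", q[i], q[i + 1], c[0], c[1]);
        }
    }
}
```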
I'm trying to create a game for Android and I have a problem with high-speed objects: they don't collide.
I have a Sphere with a Sphere Collider and a Bouncy material, and a Rigidbody with these parameters: Gravity = false, Interpolate = Interpolate, Collision Detection = Continuous Dynamic.
I also have 3 walls with Box Colliders and the Bouncy material.
This is my code for the Sphere:
function IncreaseBallVelocity() {
    rigidbody.velocity *= 1.05;
}

function Awake () {
    rigidbody.AddForce(4, 4, 0, ForceMode.Impulse);
    InvokeRepeating("IncreaseBallVelocity", 2, 2);
}
In Project Settings I set "Min Penetration For Penalty Force" = 0.001 and "Solver Iteration Count" = 50.
At the start it works fine (it bounces), but when the speed gets too high, the Sphere passes straight through the wall.
Can anyone help me?
var hit : RaycastHit;
var mainGameScript : MainGame;
var particles_splash : GameObject;

function Awake () {
    rigidbody.AddForce(4, 4, 0, ForceMode.Impulse);
    InvokeRepeating("IncreaseBallVelocity", 2, 2);
}

function Update() {
    if (rigidbody.SweepTest(transform.forward, hit, 0.5))
        Debug.Log(hit.distance + "mts distance to obstacle");
    if (transform.position.y < -3) {
        // ...
    }
}

function IncreaseBallVelocity() {
    rigidbody.velocity *= 1.05;
}

function OnCollisionEnter(collision : Collision) {
    Instantiate(particles_splash, transform.position, transform.rotation);
}
EDIT: added more info
Fixed Timestep = 0.02, Maximum Allowed Timestep = 0.333
There is no difference between running the game in the editor player and on Android.
No. It looks OK when I set 0.01.
My paddle is a Box Collider without a Rigidbody; the walls are the same.
They are all in the same layer (at normal speed everything works). The values in the PhysicsManager are the defaults (same as in the image) except "Solver Iteration Co..." = 50.
No. When I change the speed, it passes through a different wall.
I am using a standard cube, but I expand/shrink it to fit my screen and other objects. When I expand the wall more, it's OK: it bounces.
No. It's a simple project, a simple example from this video: http://www.youtube.com/watch?v=edfd1HJmKPY
I don't use gravity.
1. Similar SO question
2. A community script that uses raycasting to help manage fast objects
3. UnityAnswers post leading to the script in (2)
You could also try changing the fixed time step for physics. The smaller this value, the more times Unity calculates the physics of a scene. But be warned, making this value too small, say <= 0.005, will likely result in an unstable game, especially on a portable device.
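Why a smaller fixed timestep helps can be seen with simple arithmetic: with discrete collision checks, the ball's position is only sampled once per physics step, so it can skip over a wall entirely when it moves farther in one step than the wall thickness plus the ball's diameter. The plain-Java sketch below uses hypothetical names, the 6.7 m/s mentioned further down, and made-up wall and ball sizes; it ignores continuous collision detection, which is meant to catch exactly this case.

```java
final class TunnelingCheck {
    /** Distance covered in one physics step. */
    static double stepDistance(double speed, double fixedTimestep) {
        return speed * fixedTimestep;
    }

    /**
     * With the position sampled once per step, the ball overlaps the wall only while its
     * centre is within (thickness + 2 * radius) of the wall; a longer step can jump that span.
     */
    static boolean canTunnel(double speed, double fixedTimestep, double wallThickness, double ballRadius) {
        return stepDistance(speed, fixedTimestep) > wallThickness + 2 * ballRadius;
    }

    public static void main(String[] args) {
        double speed = 6.7; // m/s
        for (double dt : new double[] {0.02, 0.01}) {
            System.out.printf("dt=%.2f s: %.3f m per step, can tunnel (0.01 m wall, 0.05 m ball radius): %b%n",
                    dt, stepDistance(speed, dt), canTunnel(speed, dt, 0.01, 0.05));
        }
    }
}
```

Halving the timestep halves the distance per step, which is why a smaller Fixed Timestep can stop the ball from skipping thin walls, at the cost of more physics work per second.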
The script above is best for bullets or small objects. You can also manually force rigidbody collision tests:
using UnityEngine;

public class example : MonoBehaviour {
    public RaycastHit hit;
    void Update() {
        // Sweeps the rigidbody's colliders up to 10 units along transform.forward
        if (rigidbody.SweepTest(transform.forward, out hit, 10))
            Debug.Log(hit.distance + "mts distance to obstacle");
    }
}
I think the main problem is the manipulation of Rigidbody's velocity. I would try the following to solve the problem.
Redesign your code to ensure that IncreaseBallVelocity and every other manipulation of Rigidbody is called within FixedUpdate. Check that there are no other manipulations to Transform.position.
Try replacing direct assignments to velocity with AddForce or similar methods, so that the physics engine has a better chance of calculating all dependencies.
If there are more items (main player character, ...) involved related to the physics calculation, ensure that their code runs in FixedUpdate too.
Another point I stumbled upon was heavily scaled meshes. Having a GameObject with a scale <= 0.01 or >= 100 definitely has a negative impact on physics calculations. According to the docs and a Unity forum entry from one of the gurus, you should avoid Transform.localScale values != 1.
Still not happy? OK, then the next test is to start with high velocities but no acceleration. At this stage we want to know whether the high velocity itself or the acceleration is to blame for the problem. It would be interesting to know the velocity values at which the physics engine starts to fail; please post them so that we can compare.
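To estimate when a given speed is reached, note that InvokeRepeating("IncreaseBallVelocity", 2, 2) multiplies the velocity by 1.05 every 2 seconds, starting 2 seconds in. The plain-Java sketch below uses hypothetical names, assumes the Rigidbody's default mass of 1 (so the impulse AddForce(4, 4, 0, ForceMode.Impulse) gives a starting speed of √32 ≈ 5.66 m/s), and ignores anything else that could change the speed, such as energy lost in bounces.

```java
final class SpeedSchedule {
    /** Speed after the n-th call of IncreaseBallVelocity (velocity *= factor each call). */
    static double speedAfter(double startSpeed, double factor, int calls) {
        return startSpeed * Math.pow(factor, calls);
    }

    /** Seconds until speed >= target when calls happen at t = interval, 2 * interval, ... */
    static double secondsUntil(double startSpeed, double targetSpeed, double factor, double intervalSeconds) {
        if (factor <= 1.0 && startSpeed < targetSpeed) {
            throw new IllegalArgumentException("speed never reaches the target");
        }
        int calls = 0;
        while (speedAfter(startSpeed, factor, calls) < targetSpeed) {
            calls++;
        }
        return calls * intervalSeconds;
    }

    public static void main(String[] args) {
        double start = Math.sqrt(4 * 4 + 4 * 4); // impulse (4, 4, 0) on an assumed mass of 1
        System.out.printf("start speed: %.2f m/s%n", start);
        System.out.println("6.7 m/s reached after " + secondsUntil(start, 6.7, 1.05, 2.0) + " s");
    }
}
```

Under these assumptions the ball passes 6.7 m/s about eight seconds in, and the speed keeps growing by 5% every 2 seconds after that (roughly doubling every 28 seconds).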
EDIT: Some more things to investigate
6.7 m/s does not sound like much, so I guess there is a special reason, or a combination of reasons, why things go wrong.
Is your Maximum Allowed Timestep high enough? For testing I suggest 5 to 10x the Fixed Timestep. Note that this might kill the frame rate, but that can be fixed later.
Is there any difference between running the game in editor player and on Android?
Did you notice any drops in frame rate because of the 0.01 FixedTimestep? This would indicate that the physics engine might be in trouble.
Could it be that there are static colliders (objects having a collider but no Rigidbody) that are moved around or manipulated otherwise? This would cause heavy recalculations within PhysX.
What about the layers: are all walls on the same layer, or, if not, are the layers involved configured appropriately in the collision matrix?
Does the no-bounce effect always happen at the same wall? If so, can you just copy the 1st wall and put it in place of the second one to see if there is something wrong with this specific wall.
If it's not too much effort, I would try setting up some standard cubes as walls, just to be sure that transform.scale is not to blame (I have had really bad experiences with this).
Do you manipulate gravity or TimeManager.timeScale from within a script?
BTW: are you using gravity? (Should be no problem just