I am making a 3D game with Unity for Android. In the game there is a main character and up to around 10 opponents possibly on the screen at the same time, all using the same model (prefabs for "opponents" differ only in one material for other colors).
The model was designed using MakeHuman and Blender.
On PC (since rendering is a lot faster) there are no problems, but when testing it on an Android device, the rendering time drops the frame rate to around 25-30 FPS when there are 3-4 or more bodies on the screen, creating a really "laggy" feeling (I am expecting a frame rate of about 60 FPS).
Before importing the Blender model I used placeholder spheres and there was no such behavior. Since this is the first time I am using Blender and such 3D models, I am not sure whether my model is within the expected sizes for a mobile game. My current model consists of: 5,956 Verts, 10,456 Faces, 10,819 Tris with a file size of around 6.5 MB (it was generated even larger by MakeHuman at first, but I managed to compress it and optimize it significantly, but still without major effects).
I attempted different solutions, including merging all meshes in the model into one, turning off shadows, using as least materials as possible, etc. All attempts were with no or very limited improvement.
Any ideas are welcome. Cheers!
My current model consists of: 5,956 Verts, 10,456 Faces, 10,819 Tris
with a file size of around 6.5 MB
Sounds like way too much to me.
Also, keep in mind that if you're doing anything such collision calculations, every extra complexity takes exponentially more time
for a mobile game I would suggest you to remesh the model (even though its not a nice work, but the results are much better than the decimate-modifyer for example) an than bake normalmaps for it from the high-res model.
Related
as part of my project, I need to plot 2D and 3D functions in android using android studio. I know how to plot 2D functions but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work, I just need to know what I need to learn to be able to do it myself.
Thanks in advance.
I know how to plot 2D functions but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work, I just need to know what I need to learn to be able to do it myself.
Since you already understand 2D and want to advance to 3D there's a simple and non-optimal method:
Decide on how much z depth you desire:
EG: Currently your limits for x and y in your drawing functions are 1920 and 1080 (or even 4096x4096), if you want to save memory and have things a bit low resolution use a size of 1920x1080x1000 - that's going to use 1000x more memory and has the potential to increase the drawing time of some calls by 1000 times - but that's the cost of 3D.
A more practical limit is matrices of 8192,8192,16384 but be aware that video games at that resolution need 4-6GB graphic cards to work half decently (more being a bit better) - so you'll be chewing up some main memory starting at that size.
It's easy enough to implement a smaller drawing space and simply increase your allocation and limit variables later, not only does that test that future increases will go smoothly but it allows everything to run faster while you're ironing the bugs out.
Add a 3rd dimension to the functions:
EG: Instead of a function that is simply line_draw(x,y) change it to line_draw(x,y,z), use the global variable z_limit (or whatever you decide to name it) to test that z doesn't exceed the limit.
You'll need to decide if objects at the maximum distance are a dot or not visible. While testing having all z's past the limit changed to the limit value (thus making them a visible dot) is useful. For the finished product once it goes past the limit that you are implementing it's best that it isn't visible.
Start by allocating the drawing buffer and implementing a single function first, there's no point (and possibly great frustration) changing everything and hoping it's just going to work - it should but if it doesn't you'll have a lot on your plate if there's a common fault in every function.
Once you have this 3D buffer filled with an image (start with just a few 3D lines, such as a full screen sized "X" and "+") you draw to your 2D screen X,Y by starting at the largest value of Z first (EG: z=1000). Once you finish that layer decrement z and continue, drawing each layer until you reach zero, the objects closest to you.
That's the simplest (and slowest) way to make certain that closest objects obscure the furthest objects.
Now does it look OK? You don't want distant objects the same size (or larger) than the closest objects, you want to make certain that you scale them.
The reason to choose numbers such as 8192 is because after writing your functions in C (or whichever language you choose) you'll want to optimize them with several versions each, written in assembly language, optimized for specific CPUs and GPU architectures. Without specifically optimized versions everything will be extremely slow.
I understand that you don't want to use a library but looking at several should give you an idea of the work involved and how you might implement your own. No need to copy, improve instead.
There are similar questions and answers that might fill in the blanks:
Reddit - "I want to create a 3D engine from scratch. Where do I start?"
Davrous - "Tutorial series: learning how to write a 3D soft engine from scratch in C#, TypeScript or JavaScript"
GameDev.StackExchange - "How to write my own 3-D graphics library for Windows? [closed]"
I am developing app in which I need to get face landmarks points on a cam like mirror cam or makeup cam. I want it to be available for iOS too. Please guide me for a robust solution.
I have used Dlib and Luxand.
DLIB: https://github.com/tzutalin/dlib-android-app
Luxand: http://www.luxand.com/facesdk/download/
Dlib is slow and having a lag of 2 sec approximately (Please look at the demo video on the git page) and luxand is ok but it's paid. My priority is to use an open source solution.
I have also use the Google vision but they are not offering much face landmarks points.
So please give me a solution to make the the dlib to work fast or any other option keeping cross-platform in priority.
Thanks in advance.
You can make Dlib detect face landmarks in real-time on Android (20-30 fps) if you take a few shortcuts. It's an awesome library.
Initialization
Firstly you should follow all the recommendations in Evgeniy's answer, especially make sure that you only initialize the frontal_face_detector and shape_predictor objects once instead of every frame. The frontal_face_detector will initialize faster if you deserialize it from a file instead of using the get_serialized_frontal_faces() function. The shape_predictor needs to be initialized from a 100Mb file, and takes several seconds. The serialize and deserialize functions are written to be cross-platform and perform validation on the data, which is robust but makes it quite slow. If you are prepared to make assumptions about endianness you can write your own deserialization function that will be much faster. The file is mostly made up of matrices of 136 floating point values (about 120000 of them, meaning 16320000 floats in total). If you quantize these floats down to 8 or 16 bits you can make big space savings (e.g. you can store the min value and (max-min)/255 as floats for each matrix and quantize each separately). This reduces the file size down to about 18Mb and it loads in a few hundred milliseconds instead of several seconds. The decrease in quality from using quantized values seems negligible to me but YMMV.
Face Detection
You can scale the camera frames down to something small like 240x160 (or whatever, keeping aspect ratio correct) for faster face detection. It means you can't detect smaller faces but it might not be a problem depending on your app. Another more complex approach is to adaptively crop and resize the region you use for face detections: initially check for all faces in a higher res image (e.g. 480x320) and then crop the area +/- one face width around the previous location, scaling down if need be. If you fail to detect a face one frame then revert to detecting the entire region the next one.
Face Tracking
For faster face tracking, you can run face detections continuously in one thread, and then in another thread, track the detected face(s) and perform face feature detections using the tracked rectangles. In my testing I found that face detection took between 100 - 400ms depending on what phone I used (at about 240x160), and I could do 7 or 8 face feature detections on the intermediate frames in that time. This can get a bit tricky if the face is moving a lot, because when you get a new face detection (which will be from 400ms ago), you have to decide whether to keep tracking from the new detected location or the tracked location of the previous detection. Dlib includes a correlation_tracker however unfortunately I wasn't able to get this to run faster than about 250ms per frame, and scaling down the resolution (even drastically) didn't make much of a difference. Tinkering with internal parameters produced increase speed but poor tracking. I ended up using a CAMShift tracker based on the chroma UV planes of the preview frames, generating the color histogram based on the detected face rectangles. There is an implementation of CAMShift in OpenCV, but it's also pretty simple to roll your own.
Hope this helps, it's mostly a matter of picking the low hanging fruit for optimization first and just keep going until you're happy it's fast enough. On a Galaxy Note 5 Dlib does face+feature detections at about 100ms, which might be good enough for your purposes even without all this extra complication.
Dlib is fast enough for most cases. The most of processing time is taken to detect face region on image and its slow because modern smartphones are producing high-resolution images (10MP+)
Yes, face detection can take 2+ seconds on 3-5MP image, but it tries to find very small faces of 80x80 pixels size. I am really sure, that you dont need such small faces on high resolution images and the main optimization here is to reduce the size of image before finding faces.
After the face region is found, the next step - face landmarks detection is extremely fast and takes < 3 ms for one face, this time does not depend on resolution.
dlib-android port is not using dlib's detector the right way for now. Here is a list of recommendations how to make dlib-android port work much faster:
https://github.com/tzutalin/dlib-android/issues/15
Its very simple and you can implement it yourself. I am expecting performance gain about 2x-20x
Apart from OpenCV and Google Vision, there are widely-available web services like Microsoft Cognitive Services. The advantage is that it would be completely platform-independent, which you've listed as a major design goal. I haven't personally used them in an implementation yet but based on playing with their demos for awhile they seem quite powerful; they're pretty accurate and can offer quite a few details depending on what you want to know. (There are similar solutions available from other vendors as well by the way).
The two major potential downsides to something like that are the potential for added network traffic and API pricing (depending on how heavily you'll be using them).
Pricing-wise, Microsoft currently offers up to 5,000 transactions a month for free with added transactions beyond that being some fraction of a penny (depending on traffic, you can actually get a discount for high volume), but if you're doing, for example, millions of transactions per month the fees can start adding up surprisingly quickly. This is actually a fairly typical pricing model; before you select a vendor or implement this kind of a solution make sure you understand how they're going to charge you and how much you're likely to end up paying and how much you could be paying if you scale your user base. Depending on your traffic and business model it could be either very reasonable or cost-prohibitive.
The added network traffic may or may not be a problem depending on how your app is written and how much data you're sending. If you can do the processing asynchronously and be guaranteed reasonably fast Wi-Fi access that obviously wouldn't be a problem but unfortunately you may or may not have that luxury.
I am currently working with the Google Vision API and it seems to be able to detect landmarks out of the box. Check out the FaceTracker here:
google face tracker
This solution should detect the face, happiness, and left and right eye as is. For other landmarks, you can call the getLandmarks on a Face and it should return everything you need (thought I have not tried it) according to their documentation: Face reference
I'm testing OpenGL performance on an Android phone ( HTC Wildfire to be exact ), and I came across a strange thing - when I try to draw a textured indexed rectangle the size of the screen ( 320 x 480 ), I can get a framerate up to 40 fps!!! - and that's only when I use a 32x32 texture.
If I increase the texture size to 256x256, the performance drops down to 35 frames.
My question is - how is it possible for all those Android games to run smoothly and still be so full of cool graphics.
There are a bunch of different ways to eek performance out of your device.
Make sure you aren't doing blending, or drawing that you don't need and from experience a lot of the lower end HTC devices (eg the desire) are fill rate limited.
Use triangle strips instead of triangles.
Use draw elements and cache your calls eg load your vertex buffer and call draw multiple times for a frame.
But most importantly use the DDMS with method profiling to determine where your bottle necks actually are they may be places you don't expect them eg logging, gc, slow operation in a loop. You will be able to see how much of the time is due to rendering and how much for other operations. (look for GLImpl.glDrawElements)
My biggest ones were with GC kicking in too often (5 times per second and causing my fps to be very sluggish), something you will see with mem allocations often in places you won't even think of. Eg if you are concatenating a string to a float with the + operator or using traditional java get() functions everywhere or (worst) collections, these create a large number of objects that can trigger GC.
Also if you have expensive operations separate them out into a separate thread.
Also I am assuming you are creating your texture once and using the same index each time. I have seen some tutorials which create the texture every time a frame is rendered.
With extensive use of the DDMS I was able to take fps from 12 to 50 on the HTC Desire for a busy scene.
Hope that gets you going in the right direction :)
In my own experience, they don't on devices like the wildfire.
I'm in the process of writing an Android game and I seem to be having performance issues with drawing to the Canvas. My game has multiple levels, and each has (obviously) a different number of objects in it.
The strange thing is that in one level, which contains 45 images, runs flawlessly (almost 60 fps). However, another level, which contains 81 images, barely runs at all (11 fps); it is pretty much unplayable. Does this seem odd to anybody besides me?
All of the images that I use are .png's and the only difference between the aforementioned levels is the number of images.
What's going on here? Can the Canvas simply not draw this many images each game loop? How would you guys recommend that I improve this performance?
Thanks in advance.
Seems strange to me as well. I am also developing a game, lots of levels, I can easily have a 100 game objects on screen, have not seen a similar problem.
Used properly, drawbitmap should be very fast indeed; it is little more than a copy command. I don't even draw circles natively; I have Bitmaps of pre-rendered circles.
However, the performance of Bitmaps in Android is very sensitive to how you do it. Creating Bitmaps can be very expensive, as Android can by default auto-scale the pngs which is CPU intensive. All this stuff needs to be done exactly once, outside of your rendering loop.
I suspect that you are looking in the wrong place. If you create and use the same sorts of images in the same sorts of ways, then doubling the number of screen images should not reduce performance by a a factor of over 4. At most it should be linear (a factor of 2).
My first suspicion would be that most of your CPU time is spent in collision detection. Unlike drawing bitmaps, this usually goes up as the square of the number of interacting objects, because every object has to be tested for collision against every other object. You doubled the number of game objects but your performance went down to a quarter, ie according to the square of the number of objects. If this is the case, don't despair; there are ways of doing collision detection which do not grow as the square of the number of objects.
In the mean time, I would do basic testing. What happens if you don't actually draw half the objects? Does the game run much faster? If not, its nothing to do with drawing.
I think this lecture will help you. Go to the 45 minute . There is a graph comparing the Canvas method and the OpenGl method. I think it is the answer.
I encountered a similar problem with performance - ie, level 1 ran great and level 2 didn't
Turned it wasn't the rendering that was a fault (at least not specifically). It was something else specific to the level logic that was causing a bottleneck.
Point is ... Traceview is your best friend.
The method profiling showed where the CPU was spending its time and why the glitch in the framerate was happening. (incidentally, the rendering cost was also higher in Level 2 but wasn't the bottleneck)
I'm working on a 2D game for android using OpenGL ES 1.1 and I would like to know if this idea is good/bad/useless.
I have a screen divided in 3 sections, so I used scissors to avoid object overlapping from one view to the other.
I roughly understand the low level implementation of scissor and since my draws take a big part of the computation, I'm looking for ideas to speed it up.
My current idea is as follows:
If I put a glscissor around each object before I draw it, would I increase the speed of my application.
The idea is if I put a GLScissor, (center+/-sizetexture), then the OpenGL pipeline will have less tests to do (since it can discard 90~99% of the surface thanks to the glscissors.
So to all opengl experts, is this good, bad or will have no impact ? And why?
It shouldn't have any impact, IMHO. I'm not an expert, but my thinking is as follows:
Scissor test saves on your GPU's fill rate (the amount of fragments/pixels a hardware can put in the framebuffer per second),
if you put a glScissor around each object, the test won't actually cut off anything - the same number of pixels will be rendered, so no fill rate will be saved.
If you want to have your rendering optimized, a good place to start is to make sure you're doing optimal batching and reduce the number of draw calls or complex state switches (texture switches).
Of course the correct approach to optimizations is to try to diagnose why is your rendering slow, so the above is just my guess which may or may not help in your particular situation.