I use the SimpleBlobDetector of OpenCV to find a specific set of small features in images. I work in native C++ (JNI) on Android. On my newer, faster phone it works nicely.
However, on an older, slower phone it is way too slow. I have discovered that the slowest part is the thresholding. Modifying the three threshold parameters to speed things up simply makes the algorithm stop working.
I found a version of the source code on some web page and started modifying it.
I tried using adaptive thresholding instead, followed by an erode and a dilate for good measure, but I couldn't get any reasonable results. Perhaps the parameters are way off?
adaptiveThreshold(mGr, mBin, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY_INV, 25, 30);
Mat kernel = getStructuringElement(MORPH_CROSS, Size(3,3), Point(1,1));
erode(mBin, mBin, kernel);
dilate(mBin, mBin, kernel, Point(-1,-1), 5);
I get confused when there are too many parameters to fiddle with. I am also concerned that the image conditions will vary and then other parameters will have to be used. I'd want an "adaptive adaptive" thresholding, if you know what I mean?
What can I do to make it work, and what other approaches could I use to get higher speed?
Assuming you are dealing with video, rather than a random set of images, one technique to reduce the load on your device when doing this type of detection is to not do it every frame.
For example, you might do it every 10th frame rather than every frame.
You can experiment with different intervals to see if you can find one that reduces the load while still detecting quickly enough for your chosen use cases.
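As a minimal sketch of the idea in the native processing function (the function and variable names here are illustrative, not from your code, and 10 is just a starting interval to tune):

#include <opencv2/features2d.hpp>
#include <vector>

// Sketch: only run the expensive blob detection every Nth frame and keep
// reusing the last result on the frames in between.
void processFrame(const cv::Mat& gray,
                  cv::Ptr<cv::SimpleBlobDetector>& detector,
                  std::vector<cv::KeyPoint>& lastKeypoints)
{
    static int frameCounter = 0;
    const int detectEveryNth = 10;   // larger = less CPU, slower reaction

    if (frameCounter++ % detectEveryNth == 0) {
        detector->detect(gray, lastKeypoints);   // expensive call, run rarely
    }
    // On all other frames, keep using lastKeypoints from the last detection.
}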
I am developing an app in which I need to get face landmark points on a live camera feed, like a mirror cam or makeup cam. I want it to be available for iOS too. Please guide me towards a robust solution.
I have used Dlib and Luxand.
DLIB: https://github.com/tzutalin/dlib-android-app
Luxand: http://www.luxand.com/facesdk/download/
Dlib is slow, with a lag of approximately 2 seconds (please look at the demo video on the git page), and Luxand is OK but it's paid. My priority is to use an open-source solution.
I have also used Google Vision, but it does not offer many face landmark points.
So please suggest a way to make Dlib work fast, or any other option, keeping cross-platform support as a priority.
Thanks in advance.
You can make Dlib detect face landmarks in real-time on Android (20-30 fps) if you take a few shortcuts. It's an awesome library.
Initialization
Firstly you should follow all the recommendations in Evgeniy's answer, especially make sure that you only initialize the frontal_face_detector and shape_predictor objects once instead of every frame. The frontal_face_detector will initialize faster if you deserialize it from a file instead of using the get_serialized_frontal_faces() function. The shape_predictor needs to be initialized from a 100Mb file, and takes several seconds. The serialize and deserialize functions are written to be cross-platform and perform validation on the data, which is robust but makes it quite slow. If you are prepared to make assumptions about endianness you can write your own deserialization function that will be much faster. The file is mostly made up of matrices of 136 floating point values (about 120000 of them, meaning 16320000 floats in total). If you quantize these floats down to 8 or 16 bits you can make big space savings (e.g. you can store the min value and (max-min)/255 as floats for each matrix and quantize each separately). This reduces the file size down to about 18Mb and it loads in a few hundred milliseconds instead of several seconds. The decrease in quality from using quantized values seems negligible to me but YMMV.
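To illustrate the quantization idea only (this is not dlib's actual file format, and the struct and function names are made up), each fixed-size float vector can be stored as a min/scale pair plus one byte per value:

#include <cstdint>
#include <vector>
#include <algorithm>

// Sketch: quantize one float vector to 8 bits and back. The real
// shape_predictor layout differs; this only shows the space-saving idea.
struct QuantizedVec {
    float minVal;
    float scale;                  // (max - min) / 255
    std::vector<uint8_t> data;
};

QuantizedVec quantize(const std::vector<float>& v) {
    QuantizedVec q;
    auto mm = std::minmax_element(v.begin(), v.end());
    q.minVal = *mm.first;
    q.scale  = (*mm.second - *mm.first) / 255.0f;
    q.data.reserve(v.size());
    for (float x : v)
        q.data.push_back(static_cast<uint8_t>(
            q.scale > 0 ? (x - q.minVal) / q.scale + 0.5f : 0));
    return q;
}

std::vector<float> dequantize(const QuantizedVec& q) {
    std::vector<float> v;
    v.reserve(q.data.size());
    for (uint8_t b : q.data)
        v.push_back(q.minVal + q.scale * b);
    return v;
}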
Face Detection
You can scale the camera frames down to something small like 240x160 (or whatever, keeping aspect ratio correct) for faster face detection. It means you can't detect smaller faces but it might not be a problem depending on your app. Another more complex approach is to adaptively crop and resize the region you use for face detections: initially check for all faces in a higher res image (e.g. 480x320) and then crop the area +/- one face width around the previous location, scaling down if need be. If you fail to detect a face one frame then revert to detecting the entire region the next one.
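A rough sketch of the crop-around-the-previous-detection step using OpenCV types (assuming you keep last frame's face rectangle around; the names are illustrative):

#include <opencv2/imgproc.hpp>

// Sketch: crop +/- one face width around the previous detection, clamped to
// the frame, and fall back to the full frame if the last detection failed.
cv::Mat cropForDetection(const cv::Mat& frame, const cv::Rect& lastFace, bool haveFace)
{
    if (!haveFace)
        return frame;   // no previous detection: search the whole frame

    int w = lastFace.width, h = lastFace.height;
    cv::Rect roi(lastFace.x - w, lastFace.y - h, 3 * w, 3 * h);
    roi &= cv::Rect(0, 0, frame.cols, frame.rows);   // clamp to image bounds
    return frame(roi);
}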
Face Tracking
For faster face tracking, you can run face detections continuously in one thread, and then in another thread, track the detected face(s) and perform face feature detections using the tracked rectangles. In my testing I found that face detection took between 100-400 ms depending on what phone I used (at about 240x160), and I could do 7 or 8 face feature detections on the intermediate frames in that time. This can get a bit tricky if the face is moving a lot, because when you get a new face detection (which will be from 400 ms ago), you have to decide whether to keep tracking from the newly detected location or the tracked location of the previous detection. Dlib includes a correlation_tracker, but unfortunately I wasn't able to get this to run faster than about 250 ms per frame, and scaling down the resolution (even drastically) didn't make much of a difference. Tinkering with internal parameters produced increased speed but poor tracking. I ended up using a CAMShift tracker based on the chroma UV planes of the preview frames, generating the color histogram based on the detected face rectangles. There is an implementation of CAMShift in OpenCV, but it's also pretty simple to roll your own.
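For reference, a minimal sketch of the OpenCV CamShift route (this assumes you already built a histogram over the detected face region with calcHist; variable names are illustrative and untested on device):

#include <opencv2/video/tracking.hpp>
#include <opencv2/imgproc.hpp>

// Sketch: track a face rectangle with CamShift given a precomputed histogram.
// 'hist' would be built with cv::calcHist over the face ROI whenever a fresh
// detection arrives; 'window' is updated in place with the tracked rectangle.
cv::RotatedRect trackFace(const cv::Mat& hueOrUV, const cv::Mat& hist, cv::Rect& window)
{
    cv::Mat backproj;
    float range[] = {0, 180};                 // hue range; adjust for UV planes
    const float* ranges[] = {range};
    int channels[] = {0};
    cv::calcBackProject(&hueOrUV, 1, channels, hist, backproj, ranges);

    return cv::CamShift(backproj, window,
                        cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
}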
Hope this helps; it's mostly a matter of picking the low-hanging fruit for optimization first and keeping going until you're happy it's fast enough. On a Galaxy Note 5, Dlib does face+feature detections in about 100 ms, which might be good enough for your purposes even without all this extra complication.
Dlib is fast enough for most cases. Most of the processing time is spent detecting the face region in the image, and it is slow because modern smartphones produce high-resolution images (10 MP+).
Yes, face detection can take 2+ seconds on a 3-5 MP image, but that is because it tries to find very small faces, down to 80x80 pixels. I am quite sure you don't need such small faces in high-resolution images, so the main optimization here is to reduce the size of the image before finding faces.
After the face region is found, the next step, face landmark detection, is extremely fast and takes < 3 ms per face; this time does not depend on resolution.
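A hedged sketch of that resize-first idea with dlib (assuming a grayscale dlib image; the scale factor and function name are just examples to tune, not part of dlib):

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_transforms.h>
#include <vector>

// Sketch: detect on a downscaled copy, then scale the rectangles back up.
std::vector<dlib::rectangle> detectFacesScaled(
    dlib::frontal_face_detector& detector,
    const dlib::array2d<unsigned char>& img,
    double scale = 0.25)                      // e.g. 4x smaller in each dimension
{
    dlib::array2d<unsigned char> small_img(
        static_cast<long>(img.nr() * scale),
        static_cast<long>(img.nc() * scale));
    dlib::resize_image(img, small_img);       // resize to small_img's dimensions

    std::vector<dlib::rectangle> faces = detector(small_img);
    for (auto& r : faces)
        r = dlib::rectangle(static_cast<long>(r.left()   / scale),
                            static_cast<long>(r.top()    / scale),
                            static_cast<long>(r.right()  / scale),
                            static_cast<long>(r.bottom() / scale));
    return faces;
}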
The dlib-android port is not using dlib's detector the right way for now. Here is a list of recommendations on how to make the dlib-android port work much faster:
https://github.com/tzutalin/dlib-android/issues/15
It's very simple and you can implement it yourself. I would expect a performance gain of about 2x-20x.
Apart from OpenCV and Google Vision, there are widely available web services like Microsoft Cognitive Services. The advantage is that such a service would be completely platform-independent, which you've listed as a major design goal. I haven't personally used them in an implementation yet, but based on playing with their demos for a while they seem quite powerful; they're pretty accurate and can offer quite a few details depending on what you want to know. (There are similar solutions available from other vendors as well, by the way.)
The two major potential downsides to something like that are the potential for added network traffic and API pricing (depending on how heavily you'll be using them).
Pricing-wise, Microsoft currently offers up to 5,000 transactions a month for free, with additional transactions beyond that costing some fraction of a penny (depending on traffic, you can actually get a discount for high volume), but if you're doing, for example, millions of transactions per month, the fees can start adding up surprisingly quickly. This is actually a fairly typical pricing model; before you select a vendor or implement this kind of solution, make sure you understand how they're going to charge you, how much you're likely to end up paying, and how much you could be paying if your user base scales. Depending on your traffic and business model it could be either very reasonable or cost-prohibitive.
The added network traffic may or may not be a problem depending on how your app is written and how much data you're sending. If you can do the processing asynchronously and are guaranteed reasonably fast Wi-Fi access, that obviously wouldn't be a problem, but unfortunately you may or may not have that luxury.
I am currently working with the Google Vision API and it seems to be able to detect landmarks out of the box. Check out the FaceTracker here:
google face tracker
This solution should detect the face, happiness, and left and right eye as is. For other landmarks, you can call getLandmarks() on a Face and it should return everything you need (though I have not tried it) according to their documentation: Face reference
I created an application with Starling, on the new mobile devices it performs amazingly well, however on the older devices (e.g. iPhone 4) I encounter a very odd lag.
I have as far as I can tell a completely static situation:
There are quite a few display objects added to stage, many of them are buttons in case it matters, their properties are not changed at all after initialization (x, y, rotation, etc...).
There are no enterframes / timeouts / intervals / requests of any kind in the background.
I'm not allocating / deallocating any memory.
In this situation, there's an average of 10 FPS out of 30, which is very odd.
Since Starling is a well established framework, I imagine it's me who's doing something wrong / not understanding something / not aware of something.
Any idea what might be causing it?
Has anyone else experienced this sort of problem?
Edit:
After reading a little I've made great optimizations in every possible way according to this thread:
http://wiki.starling-framework.org/manual/performance_optimization
I reduced the draw calls from around 90 to 12, flattened sprites and set blend mode to none in specific cases to ease the load on the CPU, and so on...
To my surprise when I tested again, the FPS was unaffected:
fps: 6 / 60
mem: 19
drw: 12
Is it even possible to get normal fps with Starling on mobile? What am I missing?
I am using big textures that are scaled down to the size of the device, is it possible that such a thing affects the fps that much?
Regarding "Load textures from files/URLs", I'm downloading different piles of assets for different situations, therefore I assumed compiling each pile into a SWF would be way faster than sending a separate request for each file. The problem is, for that I can only use embed, which apparently uses twice the memory. Do you have any solution in mind to enjoy the best of both worlds?
Instead of downloading your assets 'over-the-wire' and manually caching them for re-use, you can include the assets in your app bundle as files (rather than embedding them) and then use the Starling AssetManager to load the textures at the resolution/scale that you need for the device:
i.e.
assets.enqueue(
appDir.resolvePath("audio"),
appDir.resolvePath(formatString("fonts/{0}x", scaleFactor)),
appDir.resolvePath(formatString("textures/{0}x", scaleFactor))
);
Ref: https://github.com/Gamua/Starling-Framework/blob/master/samples/scaffold_mobile/src/Scaffold_Mobile.as
Your application bundle gets bigger of course, but you do not take the 2x ram hit of using 'embed'.
Misc perf ideas from my comment:
Are you testing FPS with a "Release" mode build?
Are you using textures that are scaled down to match the resolution of the device before loading them?
Are you mixing BLEND modes that are causing additional draw calls?
Ref: The Performance Optimization page is great reading for optimizing your usage of Starling.
Starling is not a miracle solution for mobile devices. There's quite a lot of code running in the background in order to make the GPU display anything. You, the coder, have to make sure the number of draw calls is kept to a minimum: the weaker the device, the fewer draw calls you can afford. It's not rare to see people using Starling without paying any attention to their draw calls.
The size of the graphics used is only relevant for GPU upload time, not so much for GPU display time. So of course all relevant textures need to be uploaded prior to displaying any scene. You simply cannot upload any new texture while a scene is playing; even a small texture upload will cause idling.
Displaying everything using Starling is not always a smart choice. In render mode the GPU gets a lot of power but the CPU still has some capacity remaining. You can reduce the amount of GPU uploading and GPU load by displaying static UI elements using the classic display list (which is where the Starling framework design falls short). Starling makes it very difficult to use both display systems together, and that's one of the downsides of using this framework. Most professionals I know, including myself, don't use Starling for that reason.
Your system must be flexible: you should embed your assets for mobile, avoid external SWFs as much as possible, and be able to switch to another system for the web. If you expect to use one asset system for the mobile/desktop/web versions of your app, you are setting yourself up for failure. Embedding on mobile is critical for memory management, as the AIR platform internally manages the cache of those embedded assets. Thanks to that, when creating new instances of those assets the memory consumption stays under control; if you don't embed, then you are on your own.
Regarding overall performance, a very weak Android device will probably never be able to go past 10 fps when using Starling or any Stage3D framework, because of the amount of code those frameworks need to run (draw calls) in the background. On a weak device that amount of code is already enough to completely overload the CPU. On the other hand, on a weak device you can still get good performance and user experience by using GPU mode instead of render mode (so no Stage3D) and displaying mostly raster graphics.
IN RESPONSE TO YOUR EDIT:
12 draw calls is very good (90 was pretty high).
That you still get low FPS on some devices is not that surprising. Low-end Android devices in particular will always have low FPS in render mode with a Stage3D framework because of the amount of code those frameworks have to run to render one frame. Now, the size of the textures you are using should not affect the FPS that much (that's the point of Stage3D). It would help with GPU upload time if you reduced the size of those graphics.
Now optimization is the key, and optimizing on a low-end device with low FPS is the best way to go, since whatever you do will have a great effect on better devices as well. Start by running tests that only display static graphics with no or very little code on your part, just to see how far the Stage3D framework can go on its own on those weak devices without losing any FPS, and then optimize from there. The number of objects displayed on screen plus the number of draw calls is what affects FPS with these Stage3D frameworks, so keep a count of both and always look for ways to reduce them. On some low-end devices it's not practical to try to hold 60 fps, so try switching to 30 and adjust your rendering accordingly.
So I'm trying to get the camera pixel data, monitor any major changes in luminosity, and then save the image. I have decided to use OpenGL, as I figured it would be quicker to do the luminosity checks in the fragment shader.
I bind a surface texture to the camera to get the image to the shader and am currently using glReadPixels to get the pixels back which I then put in a bitmap and save.
The bottleneck on glReadPixels is crazy, so I looked into other options and saw that EGL_KHR_image_base was probably my best bet, as I'm using OpenGL ES 2.0.
Unfortunately I have no experience with extensions and don't know where to find exactly what I need. I've downloaded the ndk but am pretty stumped. Could anyone point me in the direction of some documentation and help explain it if I don't understand fully?
Copying pixels with glReadPixels() can be slow, though it may vary significantly depending on the specific device and pixel format. Some tests with using glReadPixels() to save frames from video data (which is also initially YUV) found that 96.5% of the time was in PNG compression and file I/O on a Nexus 5.
In some cases, the time required goes up substantially if the source and destination formats don't match. On one particular device I found that copying to RGBA, instead of RGB, reduced the time required.
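OpenGL ES lets you query the fast read-back format and type at runtime, which is one way to find the cheap path on a given device (a minimal sketch; GL_RGBA/GL_UNSIGNED_BYTE is always supported, the queried pair may be faster):

#include <GLES2/gl2.h>

// Sketch: ask the driver which format/type glReadPixels can return quickly.
void queryFastReadbackFormat(GLint& readFormat, GLint& readType)
{
    glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT, &readFormat);
    glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_TYPE, &readType);
    // Then call glReadPixels(0, 0, w, h, readFormat, readType, buffer);
}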
The EGL calls can work but require non-public API calls. And it's a bit tricky; see e.g. this answer. (I think the comment in the edit would allow it to work, but I never got back around to trying it, and I'm not in a position to do so now.)
One solution is to use a Pixel Pack Buffer (PBO), where the read-back is asynchronous. However, to actually benefit from this asynchrony, you need two PBOs and use them as a ping-pong buffer.
I refer to http://www.jianshu.com/p/3bc4db687546 where I reduced the read time for 1080p from 40 ms to 20 ms.
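A minimal sketch of the ping-pong PBO idea (this requires OpenGL ES 3.0, since ES 2.0 has no pixel pack buffers; the struct and names are illustrative, not from the linked article):

#include <GLES3/gl3.h>
#include <cstring>

// Sketch: two pack PBOs used in ping-pong fashion. Each frame we start an
// asynchronous read into one buffer and map the one filled on the previous
// frame, so the CPU is not stalled waiting for the current frame's pixels.
struct PboReader {
    GLuint pbo[2] = {0, 0};
    int index = 0;
    int width = 0, height = 0;

    void init(int w, int h) {
        width = w; height = h;
        glGenBuffers(2, pbo);
        for (int i = 0; i < 2; ++i) {
            glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
            glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 4, nullptr, GL_STREAM_READ);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }

    // Copies the pixels read on the previous frame into dst
    // (on the very first call the buffer contents are undefined).
    void readAsync(unsigned char* dst) {
        int next = 1 - index;

        // Kick off an async read into the current PBO.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[index]);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

        // Map the other PBO, which was filled a frame ago and should be ready.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[next]);
        void* ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                     width * height * 4, GL_MAP_READ_BIT);
        if (ptr) {
            std::memcpy(dst, ptr, width * height * 4);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        index = next;
    }
};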
Drawing on Android itself is a herculean task. My requirement is to see how robustly I can draw at least 10 million points with different intensity levels.
Some methods I came across:
Android draws with Canvas and Bitmaps
SurfaceView with OpenGL
Using libGDX fastest drawing library
Custom view to refresh & update automatically
What is the best method to go about it? If I need to draw 10 million or more points, maybe over a static image, on Android, how can I do this without degrading performance? Every second I need to refresh and draw another 10 million points. Is this possible; is Android even capable of such a task?
As your question states 10 million points per second, I understand that you want them in real time, so OpenGL is the way to go, leaving you with options 2, 3 and 4.
You would definitely need to batch those calls.
You can think about using point sprites to reduce the amount of data you need to transfer to the GPU (see the sketch after the link below).
Android as OS is capable of anything your machine can support. Your specific device may have performance issues, or not.
Don't optimize prematurely; try option 3 (libGDX) first. It would be the easiest way to set up and achieve your task. If it isn't performant enough, I'd think about rolling my own OpenGL-based solution.
https://gamedev.stackexchange.com/questions/11095/opengl-es-2-0-point-sprites-size
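For reference, a rough sketch of what a point-sprite style shader pair looks like in OpenGL ES 2.0, drawing one vertex per point with a per-vertex intensity (the attribute/uniform names are made up for illustration):

// Sketch: minimal GLSL ES 2.0 shaders for batched points. Upload all point
// positions/intensities into one buffer and issue a single
// glDrawArrays(GL_POINTS, 0, pointCount) per batch.
static const char* kVertexShader = R"(
    attribute vec2 aPosition;
    attribute float aIntensity;
    varying float vIntensity;
    uniform mat4 uMvp;
    void main() {
        gl_Position = uMvp * vec4(aPosition, 0.0, 1.0);
        gl_PointSize = 1.0;          // 1-pixel points; increase if needed
        vIntensity = aIntensity;
    }
)";

static const char* kFragmentShader = R"(
    precision mediump float;
    varying float vIntensity;
    void main() {
        gl_FragColor = vec4(vec3(vIntensity), 1.0);
    }
)";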
I'm trying to put a particle system together in Android, using OpenGL. I want a few thousand particles, most of which will probably be offscreen at any given time. They're fairly simple particles visually, and my world is 2D, but they will be moving, changing colour (not size - they're 2x2), and I need to be able to add and remove them.
I currently have an array which I iterate through, handling velocity changes, managing lifecycling (killing old ones, adding new ones), and plotting them using glDrawArrays. What OpenGL is pointing at for this call, though, is a single vertex; I glTranslatex it to the relevant co-ords for each particle I want to plot, one at a time, set the colour with glColor4x, then glDrawArrays it. It works, but it's a bit slow and only works for a few hundred particles. I'm handling the clipping myself.
I've written a system to support static particles, which I load into a vertex/colour array and plot using glDrawArrays, but this approach only seems suitable for particles which never change relative location (i.e. I move all of them using glTranslate) or colour, and where I don't need to add/remove particles. A few tests on my phone (HTC Desire) suggest that trying to alter the contents of those arrays (which are ByteBuffers, pointed to by OpenGL) is extremely slow.
Perhaps there's some way of manually writing the screen myself with the CPU. If I'm just plotting 1x1/2x2 dots on the screen, and I'm purely interested in writing and not doing any blending/antialiasing, is this an option? Would it be quicker than whatever OpenGL is doing?
(200 or so particles on a 1 GHz machine with megabytes of RAM. This is way slower than I was getting 20 years ago on a 7 MHz machine with <500k of RAM! I appreciate I'm using Java here, but surely there must be a better solution. Do I have to use the NDK to get the power of C++, or is what I'm after possible?)
I've been hoping somebody might answer this definitively, as I'll be needing particles on Android myself. (I'm working in C++, though -- Currently using glDrawArrays(), but haven't pushed particles to the limit yet.)
I found this thread on gamedev.stackexchange.com (not Android-specific), and nobody can agree on the best approach there, but you might want to try a few things out and see for yourself.
I was going to suggest glDrawArrays(GL_POINTS, ...) with glPointSize(), but the guy asking the question there seemed unhappy with it.
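As a starting point anyway, here is a rough, untested sketch of that batched GL_POINTS idea for ES 1.x, matching the fixed-function calls in the question (the particle struct and names are assumptions): instead of one glTranslate/glDrawArrays pair per particle, pack every live particle into one position array and one colour array each frame and draw them all with a single call.

#include <GLES/gl.h>
#include <vector>

// Hypothetical particle state; the real struct will differ.
struct Particle { float x, y; float r, g, b, a; };

void drawParticles(const std::vector<Particle>& particles,
                   std::vector<GLfloat>& positions,   // reused scratch buffers
                   std::vector<GLfloat>& colors)
{
    positions.clear();
    colors.clear();
    for (const Particle& p : particles) {
        positions.push_back(p.x);
        positions.push_back(p.y);
        colors.push_back(p.r); colors.push_back(p.g);
        colors.push_back(p.b); colors.push_back(p.a);
    }

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, positions.data());
    glColorPointer(4, GL_FLOAT, 0, colors.data());
    glPointSize(2.0f);                                  // 2x2 particles
    glDrawArrays(GL_POINTS, 0, (GLsizei)particles.size());
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}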
Let us know if you find a good solution!