I'd like to ask for some help regarding the sampling rate and jitter on the magnetometer.
I'm working on a project with some people that involves high-rate magnetic field sampling. Even though we have developed an algorithm to work around the jitter and other issues we encountered, we'd like to improve the sampling rate somehow and, at the same time, if possible, reduce the sampling jitter. Improving the sampling rate would allow us to achieve better results for our application. We are using a Samsung Nexus S and, according to the tests we performed, we observed sampling intervals between 15 ms and 20 ms, with occasional peaks around 50 ms (measured between consecutive events).
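For reference, this is roughly how we measure the interval between consecutive events at the Java level (a minimal sketch; the listener registration would live in an Activity or Service, and the log tag is just illustrative):

```java
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.util.Log;

// Logs the interval between consecutive magnetometer events in milliseconds.
public class MagnetometerJitterLogger implements SensorEventListener {
    private long lastTimestampNs = 0;

    public void start(SensorManager sensorManager) {
        Sensor magnetometer = sensorManager.getDefaultSensor(Sensor.TYPE_MAGNETIC_FIELD);
        // SENSOR_DELAY_FASTEST asks for the highest rate the stack will deliver.
        sensorManager.registerListener(this, magnetometer, SensorManager.SENSOR_DELAY_FASTEST);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        if (lastTimestampNs != 0) {
            // event.timestamp is in nanoseconds, stamped close to the driver.
            double intervalMs = (event.timestamp - lastTimestampNs) / 1e6;
            Log.d("MagJitter", "interval = " + intervalMs + " ms");
        }
        lastTimestampNs = event.timestamp;
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```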
We have come up with different approaches to address these issues, but without any success so far.
Firstly, we thought of modifying the current magnetometer (AK8973) device driver, but we soon realized that the bottleneck couldn't be there, as the device driver directly implements the correct sensor operation modes and data reading, and respects the sensor's hardware timing constraints.
As a second alternative, we developed a small program using the Android NDK to obtain samples and compare the times between consecutive events, i.e. between samples, with the code developed at the Java level. Sadly, the result was pretty much the same.
As a final alternative, we are currently trying to understand how the events are handled by the API and passed to Java. That said, if the bottleneck is there we'd try to change the code to solve the issues. However, we are not sure if the bottleneck is in the underlying hardware or software API.
The code we used for the NDK is based on the example provided by the Android documentation (NativeActivity) and some other examples we came across by googling (Google Groups and other articles). The articles we found are quite interesting (Native Sampling, Sensor Sampling Performance). Even though native sampling is reported to allow better performance, in our case that does not seem to happen.
We'd like to know if it is actually possible to obtain a higher sampling rate at all or if anyone has already developed a solution. Is the bottleneck at the software or hardware level?
In the articles referenced above, it is mentioned that a custom library (FreeMotion) is able to deliver better performance, as a replacement for the original sensor library, because it works with the drivers directly. Has anyone used this library before and, if so, could you share your results?
With another smartphone, a Samsung Galaxy Nexus, we decided to collect more magnetometer data samples, do some statistical analysis, and compare the results with the Samsung Nexus S. This time we used Android v4.1.2. Again, we observed that the rate at which we are able to collect data does not improve significantly when comparing the NDK and SDK APIs on either smartphone, using the values from ASensor_getMinDelay() and SENSOR_DELAY_FASTEST, respectively, which give maximum performance. The timestamp jitter reduction with the NDK API is, however, very significant for both smartphones, regardless of the approach used: polling or callback-based. Polling, in general, provides little or no improvement and should be more CPU intensive.
The Samsung Galaxy Nexus sensor hardware is far superior, so fine-grained tuning of the desired event rate is possible for rates above ASensor_getMinDelay(), naturally. For the Samsung Nexus S, however, this was not possible: when we requested lower rates, the target rate was not satisfied and samples were acquired at an even slower rate. With multiple sensors activated, the overall jitter reduction is greater than when using only a single sensor.
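For completeness, this is roughly how we request a specific event rate at the Java level; the int argument to registerListener can be either a SENSOR_DELAY_* constant or an explicit period in microseconds, and Sensor.getMinDelay() is the SDK counterpart of ASensor_getMinDelay() (a minimal sketch):

```java
import android.hardware.Sensor;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

public class RateTuning {
    // Registers the listener at either the fastest supported rate or an
    // explicit period (in microseconds) above that minimum.
    public static void register(SensorManager sm, SensorEventListener listener,
                                int requestedPeriodUs) {
        Sensor magnetometer = sm.getDefaultSensor(Sensor.TYPE_MAGNETIC_FIELD);
        int minDelayUs = magnetometer.getMinDelay();   // fastest period the sensor supports
        int periodUs = Math.max(requestedPeriodUs, minDelayUs);
        sm.registerListener(listener, magnetometer, periodUs);
    }
}
```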
I'm trying to use the TFLite benchmark tool with a MobileNet model and checking the final inference time in microseconds to compare different models. The issue I am facing is varying results between runs. I also found a section in the documentation which pertains to reducing variance between runs on Android. It explains how one can set the CPU affinity before running the benchmark to get consistent results between runs. I am currently using a Redmi Note 4 and a OnePlus for this work.
Can someone please explain what I should set the CPU affinity value to for my experiments?
Can I find the affinity masks for different mobiles online or on the Android phone?
When I increase the --warmup_runs parameter I get less variation in the results. Are there more ways in which I can make my results more consistent?
Are background processes on the Android phone affecting the inference time, and is there a way I can stop them to reduce the variance in results?
As the docs suggest, any value is fine, as long as you stay consistent with one across experiments. The one thing to consider is whether to use a big core or a little core (if you're on a big.LITTLE architecture), and it's usually good to try both (they have varying cache sizes, etc.).
Yes you can typically find this information online. See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0515b/CJHBGEBA.html as an example. You'll want to look at your particular phone, see the particular CPU it uses, and then google for more info from there.
I've tried --warmup_runs = 2000+ and typically it's pretty stable. There's a bit more variance with smaller models. For intensive models (at least for the particular device), you might want to see if the devices are overheating, etc. I haven't seen this for mid-tier phones, but heard that people sometimes keep their devices in a cool area (fan, fridge) for this.
They may, but it's unavoidable. The best you can do is close all applications and disconnect from the internet. I personally haven't seen them introduce too much variance though.
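To make the mask question concrete: the affinity value is just a bitmask over core indices, so on a hypothetical layout where cores 4-7 form the big cluster, the mask for the big cores would be 0xf0. A small sketch of that arithmetic (the core numbering here is an assumption and varies by device):

```java
// Builds a CPU affinity bitmask from a list of core indices.
// E.g. big cores 4-7 -> 0xf0, little cores 0-3 -> 0xf (layout is device-specific).
public class AffinityMask {
    public static long maskFor(int... cores) {
        long mask = 0;
        for (int core : cores) {
            mask |= 1L << core;
        }
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(maskFor(4, 5, 6, 7))); // prints "f0"
        System.out.println(Long.toHexString(maskFor(0, 1, 2, 3))); // prints "f"
    }
}
```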
I am developing an app in which I need to get face landmark points on a camera preview, like a mirror cam or makeup cam. I want it to be available for iOS too. Please guide me towards a robust solution.
I have used Dlib and Luxand.
DLIB: https://github.com/tzutalin/dlib-android-app
Luxand: http://www.luxand.com/facesdk/download/
Dlib is slow, with a lag of approximately 2 seconds (please look at the demo video on the GitHub page), and Luxand is OK but it's paid. My priority is to use an open source solution.
I have also used Google Vision, but it does not offer many face landmark points.
So please give me a solution to make Dlib work fast, or any other option, keeping cross-platform support as a priority.
Thanks in advance.
You can make Dlib detect face landmarks in real-time on Android (20-30 fps) if you take a few shortcuts. It's an awesome library.
Initialization
Firstly you should follow all the recommendations in Evgeniy's answer, especially make sure that you only initialize the frontal_face_detector and shape_predictor objects once instead of every frame. The frontal_face_detector will initialize faster if you deserialize it from a file instead of using the get_serialized_frontal_faces() function.
The shape_predictor needs to be initialized from a 100Mb file, and takes several seconds. The serialize and deserialize functions are written to be cross-platform and perform validation on the data, which is robust but makes it quite slow. If you are prepared to make assumptions about endianness you can write your own deserialization function that will be much faster.
The file is mostly made up of matrices of 136 floating point values (about 120000 of them, meaning 16320000 floats in total). If you quantize these floats down to 8 or 16 bits you can make big space savings (e.g. you can store the min value and (max-min)/255 as floats for each matrix and quantize each separately). This reduces the file size down to about 18Mb and it loads in a few hundred milliseconds instead of several seconds. The decrease in quality from using quantized values seems negligible to me but YMMV.
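For illustration, a minimal sketch of that 8-bit quantization scheme (a per-matrix min and scale stored as floats, with approximate floats reconstructed on load); the dlib-specific file layout is not shown and the class name is just for the example:

```java
// Quantizes one matrix of floats to 8 bits using a per-matrix min and scale,
// and dequantizes it back to (approximate) floats when the model is loaded.
public class MatrixQuantizer {

    public static class Quantized {
        public final float min;
        public final float scale;   // (max - min) / 255
        public final byte[] values;

        Quantized(float min, float scale, byte[] values) {
            this.min = min;
            this.scale = scale;
            this.values = values;
        }
    }

    public static Quantized quantize(float[] matrix) {
        float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
        for (float v : matrix) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        float scale = (max - min) / 255f;
        if (scale == 0f) scale = 1f;   // constant matrix: everything maps to min
        byte[] out = new byte[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            out[i] = (byte) Math.round((matrix[i] - min) / scale);
        }
        return new Quantized(min, scale, out);
    }

    public static float[] dequantize(Quantized q) {
        float[] out = new float[q.values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = q.min + (q.values[i] & 0xFF) * q.scale;
        }
        return out;
    }
}
```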
Face Detection
You can scale the camera frames down to something small like 240x160 (or whatever, keeping aspect ratio correct) for faster face detection. It means you can't detect smaller faces but it might not be a problem depending on your app. Another more complex approach is to adaptively crop and resize the region you use for face detections: initially check for all faces in a higher res image (e.g. 480x320) and then crop the area +/- one face width around the previous location, scaling down if need be. If you fail to detect a face one frame then revert to detecting the entire region the next one.
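A minimal sketch of the scale-down step, assuming the camera frame is available as a Bitmap (the 240-pixel target width matches the example above; the detector call itself is not shown):

```java
import android.graphics.Bitmap;

public class DetectionScaler {
    // Scales the frame down to a fixed width while keeping the aspect ratio,
    // and returns the factor needed to map detected rectangles back to the
    // original resolution.
    public static Bitmap scaleForDetection(Bitmap frame, int targetWidth, float[] scaleOut) {
        float scale = (float) targetWidth / frame.getWidth();
        int targetHeight = Math.round(frame.getHeight() * scale);
        scaleOut[0] = 1f / scale;  // multiply detected coordinates by this afterwards
        return Bitmap.createScaledBitmap(frame, targetWidth, targetHeight, true);
    }
}
```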
Face Tracking
For faster face tracking, you can run face detections continuously in one thread, and then in another thread, track the detected face(s) and perform face feature detections using the tracked rectangles. In my testing I found that face detection took between 100-400 ms depending on what phone I used (at about 240x160), and I could do 7 or 8 face feature detections on the intermediate frames in that time. This can get a bit tricky if the face is moving a lot, because when you get a new face detection (which will be from 400 ms ago), you have to decide whether to keep tracking from the newly detected location or from the tracked location of the previous detection.
Dlib includes a correlation_tracker, but unfortunately I wasn't able to get this to run faster than about 250 ms per frame, and scaling down the resolution (even drastically) didn't make much of a difference. Tinkering with internal parameters produced increased speed but poor tracking. I ended up using a CAMShift tracker based on the chroma UV planes of the preview frames, generating the color histogram from the detected face rectangles. There is an implementation of CAMShift in OpenCV, but it's also pretty simple to roll your own.
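For reference, a rough sketch of the CAMShift approach using the OpenCV Java bindings, seeded from a detected face rectangle. It builds the histogram over the U and V planes of a YUV frame, which is one way to approximate what I described; camera plumbing, the thread handoff, and error handling are omitted, and this is not my exact code:

```java
import java.util.Arrays;

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfInt;
import org.opencv.core.Rect;
import org.opencv.core.RotatedRect;
import org.opencv.core.TermCriteria;
import org.opencv.imgproc.Imgproc;
import org.opencv.video.Video;

public class FaceCamShiftTracker {
    private static final MatOfInt CHANNELS = new MatOfInt(1, 2);      // U and V planes
    private static final MatOfInt HIST_SIZE = new MatOfInt(32, 32);
    private static final MatOfFloat RANGES = new MatOfFloat(0f, 256f, 0f, 256f);

    private final Mat hist = new Mat();
    private Rect window;

    // Builds the colour histogram from the detected face rectangle.
    public void init(Mat yuvFrame, Rect detectedFace) {
        window = detectedFace.clone();
        Mat roi = new Mat(yuvFrame, detectedFace);
        Imgproc.calcHist(Arrays.asList(roi), CHANNELS, new Mat(), hist, HIST_SIZE, RANGES);
        Core.normalize(hist, hist, 0, 255, Core.NORM_MINMAX);
    }

    // Tracks the face into the next frame; returns the updated rectangle.
    public Rect track(Mat yuvFrame) {
        Mat backProjection = new Mat();
        Imgproc.calcBackProject(Arrays.asList(yuvFrame), CHANNELS, hist,
                backProjection, RANGES, 1.0);
        TermCriteria criteria =
                new TermCriteria(TermCriteria.EPS | TermCriteria.COUNT, 10, 1.0);
        RotatedRect result = Video.CamShift(backProjection, window, criteria);
        window = result.boundingRect();   // search window for the next frame
        return window;
    }
}
```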
Hope this helps, it's mostly a matter of picking the low hanging fruit for optimization first and just keep going until you're happy it's fast enough. On a Galaxy Note 5 Dlib does face+feature detections at about 100ms, which might be good enough for your purposes even without all this extra complication.
Dlib is fast enough for most cases. Most of the processing time is spent detecting the face region in the image, and it is slow because modern smartphones produce high-resolution images (10 MP+).
Yes, face detection can take 2+ seconds on a 3-5 MP image, but it tries to find very small faces, down to 80x80 pixels. I am really sure that you don't need such small faces in high-resolution images, so the main optimization here is to reduce the size of the image before finding faces.
After the face region is found, the next step, face landmark detection, is extremely fast and takes < 3 ms for one face; this time does not depend on resolution.
The dlib-android port is not using dlib's detector the right way for now. Here is a list of recommendations on how to make the dlib-android port work much faster:
https://github.com/tzutalin/dlib-android/issues/15
It's very simple and you can implement it yourself. I am expecting a performance gain of about 2x-20x.
Apart from OpenCV and Google Vision, there are widely available web services like Microsoft Cognitive Services. The advantage is that they would be completely platform-independent, which you've listed as a major design goal. I haven't personally used them in an implementation yet, but based on playing with their demos for a while they seem quite powerful; they're pretty accurate and can offer quite a few details depending on what you want to know. (There are similar solutions available from other vendors as well, by the way.)
The two major potential downsides to something like that are the potential for added network traffic and API pricing (depending on how heavily you'll be using them).
Pricing-wise, Microsoft currently offers up to 5,000 transactions a month for free, with additional transactions beyond that costing some fraction of a penny (depending on traffic, you can actually get a discount for high volume), but if you're doing, for example, millions of transactions per month, the fees can start adding up surprisingly quickly. This is actually a fairly typical pricing model; before you select a vendor or implement this kind of solution, make sure you understand how they're going to charge you, how much you're likely to end up paying, and how much you could be paying if you scale up your user base. Depending on your traffic and business model it could be either very reasonable or cost-prohibitive.
The added network traffic may or may not be a problem depending on how your app is written and how much data you're sending. If you can do the processing asynchronously and be guaranteed reasonably fast Wi-Fi access that obviously wouldn't be a problem but unfortunately you may or may not have that luxury.
I am currently working with the Google Vision API and it seems to be able to detect landmarks out of the box. Check out the FaceTracker here:
google face tracker
This solution should detect the face, happiness, and left and right eyes as is. For other landmarks, you can call getLandmarks() on a Face and it should return everything you need (though I have not tried it) according to their documentation: Face reference
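A minimal sketch of reading the landmarks off a detected Face with the Mobile Vision API, assuming the FaceDetector was built with setLandmarkType(FaceDetector.ALL_LANDMARKS):

```java
import android.util.Log;

import com.google.android.gms.vision.face.Face;
import com.google.android.gms.vision.face.Landmark;

public class LandmarkLogger {
    // Logs every landmark (eyes, nose base, mouth corners, cheeks, ears...) the
    // detector found on this face; availability depends on pose and detector settings.
    public static void logLandmarks(Face face) {
        for (Landmark landmark : face.getLandmarks()) {
            Log.d("Landmarks", "type=" + landmark.getType()
                    + " x=" + landmark.getPosition().x
                    + " y=" + landmark.getPosition().y);
        }
    }
}
```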
The Google Fit app, when installed, measures the duration you are walking or running, and also the number of steps, all the time. However, strangely, using it does not seem to drain the battery. Other apps like Moves, which seem to record the number of steps pretty accurately, declare that they use a lot of power because they constantly monitor the GPS and the accelerometer.
I imagine several possibilities:
It wakes up the phone every minute or so, analyses the sensors for a few seconds, and then sleeps again. However, the records seem to be accurate to the minute, so the wake-ups must be frequent.
It actually turns on the accelerometer all the time, and analyzes the data only after the accelerometer's measurement buffer is full. However, I think the accelerometer has only a small buffer for storing the latest measurements.
It uses GPS to estimate the number of steps instead of actually counting them. However, this should not be the case, since it works even indoors.
The app still feels magical. Counting steps the whole time without perceptible battery drain.
Thanks for asking this question!
Battery is one of our top most concerns and we work hard to optimize Google Fit's battery usage and provide a magical experience.
Google Fit uses a mix of sensors (accelerometer, step counter, significant motion sensor), machine learning, and heuristics to get the data right. Our algorithm is pretty similar to your 1st option, plus a little bit of magic.
We periodically poll accelerometer and use Machine Learning and heuristics to correctly identify the activity and duration.
For devices with hardware step counters, we use these step counters to monitor step counts. For older devices, we use the activity detected to predict the right number of steps.
Our algorithms merge these activities, steps and sometimes location to correlate and further increase accuracy.
We do not poll GPS to estimate steps or detect activities.
-- Engineer on Google Fit Team.
On some very recent phones like the Nexus 5 (released in late 2013 with Android 4.4 KitKat), there is a dedicated low-power CPU core that can serve as a pedometer. Since this core consumes very little power and can compute steps by itself without the need for the entire CPU or the GPS, overall battery use is reduced greatly. On the recent iPhones, there is a similar microcontroller called the M7 coprocessor in the iPhone 5s and the M8 in the iPhone 6.
More information here:
https://developer.android.com/about/versions/kitkat.html
http://nexus5.wonderhowto.com/how-to/your-nexus-5-has-real-pedometer-built-in-heres-you-use-0151267/
http://www.androidbeat.com/2014/01/pedometer-nexus5-hardware-count-steps-walked/
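For anyone who wants to read that hardware counter directly from an app, it is just a normal sensor registration; a minimal sketch (Sensor.TYPE_STEP_COUNTER reports the cumulative step count since the last reboot, so the app tracks the delta):

```java
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.util.Log;

public class StepCounterReader implements SensorEventListener {
    private float stepsAtStart = -1;

    public boolean start(SensorManager sensorManager) {
        Sensor counter = sensorManager.getDefaultSensor(Sensor.TYPE_STEP_COUNTER);
        if (counter == null) {
            return false;  // no hardware step counter on this device
        }
        // The hardware batches steps, so even SENSOR_DELAY_NORMAL is cheap.
        return sensorManager.registerListener(this, counter, SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        // values[0] is the cumulative step count since the last reboot.
        if (stepsAtStart < 0) {
            stepsAtStart = event.values[0];
        }
        Log.d("Steps", "steps since start: " + (event.values[0] - stepsAtStart));
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```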
Having a 3-year-old HTC One X, I can say that THERE IS NO DEDICATED HARDWARE; Google Fit just uses standard sensors in a very clever way. I come from Runtastic Pedometer: there is a clear battery drain when it is in use, and it would be impossible to keep it on all the time as it needs the full accelerometer power. On the other hand, if you stand still and shake the phone, Runtastic will count the shakes, while Google Fit apparently does nothing... Still, it works perfectly when you actually walk or run. Magic.
Google Fit tries to learn the user's step pattern and build its own personal walking patterns and clusters. This eliminates the need for heavy mathematical calculations every time sensor data is received, which makes Google Fit more power efficient than other software pedometer apps. Having said that, there is a compromise on accuracy here; in the power-accuracy trade-off, Google seems to be more aligned towards the power factor.
At the moment, the most power-efficient detection happens on Samsung flagships and its other high-end models, thanks to Samsung's dedicated hardware chip. No matter how power efficient your software pedometer algorithm is, it's hard to beat the advantage of a dedicated hardware unit. I have also heard that Google is bringing a dedicated hardware pedometer unit to upcoming Nexus devices.
It would seem the solution is device dependent: on devices where a motion coprocessor or "wimpier" core is available for low-power operations, it would default to that once the buffer is full or a similar condition is met. On devices where a low-power core is not available, it seems waking the device could trigger a just-in-time operation that would/should finish by the time the app is called.
While the Nexus 5 does have a dedicated "low-power" pedometer built in, it isn't as "low power" as you might think.
My Nexus 5 battery life was decreased by about 25% when I had Google Fit Activity Detection switched on.
Also, the pedometer doesn't show up in the battery usage stats. Presumably, because it is a hardware thing.
I don't know for the other phones out there, but Google Fit was really draining my battery life on my Nexus 5. Disabling it definitely improved my battery life.
I asked this question over in the Android Developer's user group, last week. Nobody responded, so I thought I'd ask it over here.
Does anyone have any suggestions about how to schedule video events to happen at an exact clock time? I've been thinking about an application that would require two adjacent phones to display the same thing at exactly the same time. I'm wondering what that granularity of "exactly" is going to be.
I've done some testing on a couple of devices and it seems that the delay between an invalidate and the subsequent redraw can be as much as 16 ms. Perhaps I can do better with OpenGL?
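The measurement I used is roughly this: a minimal sketch of a custom View that times the gap between invalidate() and the next onDraw():

```java
import android.content.Context;
import android.graphics.Canvas;
import android.util.Log;
import android.view.View;

public class RedrawLatencyView extends View {
    private long invalidateTimeNs;

    public RedrawLatencyView(Context context) {
        super(context);
    }

    public void requestTimedRedraw() {
        invalidateTimeNs = System.nanoTime();
        invalidate();
    }

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);
        if (invalidateTimeNs != 0) {
            double delayMs = (System.nanoTime() - invalidateTimeNs) / 1e6;
            Log.d("RedrawLatency", "invalidate -> onDraw: " + delayMs + " ms");
            invalidateTimeNs = 0;
        }
    }
}
```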
Ideas? Anyone?
OpenGL itself is capable of very high framerates (unless I am mistaken). What I can tell you is that plenty of games have been written to run at and maintain 30 frames per second. That's one frame every 33.3 ms. At that speed, the change should be imperceptible to the human eye, or so I've heard (the estimated limit is 5 ms).
However, there is a major difference between what OpenGL can do and what the device running OpenGL can do. Again, unless I am mistaken, you should be able to instruct OpenGL to run at 200 frames per second. The caveat is that if the machine you are running the animation on can't handle that framerate, it will either frame-skip or lag, and in either case will hog the processor and GPU like no other.
Again, as I don't know the specifics, I can only guess, but I would think that this is less of an issue with OpenGL vs the other leading brand, and more of an issue of the devices you are trying to sync. With the right code, a proven framework, two powerful machines, and high-speed data transfer capability (read: LAN at the least), there is no reason why you shouldn't be able to sync up the video. If any of these things are not the case, all bets are off.
-Cody
I'm exploring voice recognition and DSP, and so I would like to implement a simple sound frequency analyzer on my smartphone (I have both an iPhone and a Samsung Nexus S running Android). I have done basic DSP in Matlab previously.
From my understanding, I need to perform an FFT to get the fundamental frequencies of a signal.
So now, I would like to sample the microphone at 44100 Hz. If I use a sliding window of sample size 512 with 50% overlap, that means I need to do an FFT every 256 samples, or 0.00580 seconds.
That rate seems really high, particularly if I program in Java for Android. Will my smartphone be able to handle that speed? I am aware that you can program in C/C++ on Android, but I would like to keep it with Java for the time being.
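To make the setup concrete, this is roughly the capture loop I have in mind: a minimal sketch that records at 44,100 Hz and shifts a 512-sample window by 256 samples each iteration (needs the RECORD_AUDIO permission; analyzeWindow() stands in for the FFT step):

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class SlidingWindowCapture {
    private static final int SAMPLE_RATE = 44100;
    private static final int WINDOW_SIZE = 512;
    private static final int HOP_SIZE = 256;      // 50% overlap

    public void run() {
        int minBuffer = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, Math.max(minBuffer, WINDOW_SIZE * 4));
        short[] window = new short[WINDOW_SIZE];
        short[] hop = new short[HOP_SIZE];
        recorder.startRecording();
        try {
            readFully(recorder, window, WINDOW_SIZE);        // prime the first full window
            while (!Thread.currentThread().isInterrupted()) {
                analyzeWindow(window);                       // e.g. windowing + FFT here
                readFully(recorder, hop, HOP_SIZE);          // read the next 256 samples
                // Slide the window: drop the oldest 256 samples, append the new ones.
                System.arraycopy(window, HOP_SIZE, window, 0, WINDOW_SIZE - HOP_SIZE);
                System.arraycopy(hop, 0, window, WINDOW_SIZE - HOP_SIZE, HOP_SIZE);
            }
        } finally {
            recorder.stop();
            recorder.release();
        }
    }

    private static void readFully(AudioRecord recorder, short[] buffer, int count) {
        int offset = 0;
        while (offset < count) {
            int read = recorder.read(buffer, offset, count - offset);
            if (read <= 0) break;   // error or recorder stopped
            offset += read;
        }
    }

    private void analyzeWindow(short[] samples) {
        // placeholder for the FFT / analysis step
    }
}
```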
Performing a real-to-complex FFT requires ~5/2 n lg n floating-point operations (additions and multiplications). In your case, n=512, so:
flops per fft ~= (5/2) * 512 * 9 = 11520
So 172 ffts per second requires about 2 million floating-point operations per second. That sounds like a lot, but it really isn't that many. The hardware of a typical armv7-class smartphone is capable of hundreds of millions or billions of floating-point operations per second.
Note however that you will want to have a carefully-written high-performance FFT; poorly written FFTs are notoriously inefficient. On the iPhone, you can use the Accelerate framework (built right into the OS, and available in the SDK), which provides a nice set of FFT functions; I'm not sure what's available on Android.
For the iPhone, the Accelerate framework for iOS can do all the FFTs you specify using on the order of 1% of CPU time (exact percentage depending on device model and FFT data types).
For Android, you might strongly want to consider using an NDK native library for processor intensive numerical calculations.
Also note that an FFT will give you the peak frequencies, which will not necessarily include the fundamental or voice pitch frequency.
ADDED: This Java benchmark web page suggests that Android phones are capable of somewhere in the range of 5 to over 50 MFLOPS using Java for well-written matrix math. A well-written FFT should fall in roughly the same performance range in MFLOPS. @Stephan Cannon posted that on the order of 2 MFLOPS might be required for your spec.
Your Android device will be able to handle this fine. I've written realtime, FFT-based frequency analyzers that ran on Windows Mobile devices from a few years ago (using pure C#), and these devices had much worse processors than current Android devices. The most computationally expensive aspect of FFT is the trig functions, and since you're using a fixed-size window you can easily replace the trig function calls with a pre-calculated lookup table.
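To illustrate the lookup-table point, here is a minimal sketch of a fixed-size radix-2 FFT that precomputes its twiddle factors once in the constructor (written for clarity rather than maximum speed):

```java
// Radix-2 iterative FFT with precomputed twiddle tables for a fixed power-of-two size n.
public final class FixedFft {
    private final int n;
    private final double[] cos, sin;

    public FixedFft(int n) {
        this.n = n;                        // must be a power of two, e.g. 512
        cos = new double[n / 2];
        sin = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {  // twiddle factors computed once, no trig per call
            cos[i] = Math.cos(-2 * Math.PI * i / n);
            sin[i] = Math.sin(-2 * Math.PI * i / n);
        }
    }

    // In-place FFT of (re, im), both of length n.
    public void transform(double[] re, double[] im) {
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages, reading twiddles from the precomputed tables.
        for (int len = 2; len <= n; len <<= 1) {
            int step = n / len;
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    int tw = k * step;
                    double wr = cos[tw], wi = sin[tw];
                    int a = i + k, b = i + k + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr;        im[a] += xi;
                }
            }
        }
    }
}
```

For a 512-sample window you would construct this once and reuse it for every frame, so the per-frame cost is just the butterflies.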
As an aside, you can probably cut down your computation time by reducing your sampling rate. Speech doesn't have much energy above 8 kHz, so you could likely downsample your audio to 16 kHz before doing any FFTs without losing much accuracy. At 16 kHz your FFTs would be smaller, and so faster.
Wikipedia claims that 16 kHz is a standard sampling rate for speech recognition in desktop applications.
(I realize that this doesn't answer the OP's question, but I think it might be helpful to him nonetheless, given his application.)