In an Android app I'm making, I would like to detect when a user who is holding the phone in his hand makes a gesture like he would when throwing a frisbee. I have seen a couple of apps implementing this, but I can't find any example code or tutorial on the web.
It would be great to get some thoughts on how this could be done, and of course even better to get some example code or a link to a tutorial.
The accelerometer provides you with a stream of 3D vectors. If your phone is held still in the hand, the vector points opposite to Earth's gravitational pull and its magnitude equals that of gravity (about 9.81 m/s²). This way you can determine the phone's orientation.
If the user lets it fall, the vector's magnitude drops to about 0 (the same weightlessness as on a space station).
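The two observations above translate into a simple per-sample check. This is a minimal Python sketch, not a tested recipe: it assumes readings in m/s² (as Android reports them), and the names `classify_sample` and `up_direction` as well as both thresholds are hypothetical values you would tune on recorded data.

```python
import math

G = 9.81  # standard gravity in m/s^2

def magnitude(x, y, z):
    """Length of one accelerometer sample vector."""
    return math.sqrt(x * x + y * y + z * z)

def classify_sample(x, y, z, tolerance=1.5, free_fall_threshold=2.0):
    """Rough state of the phone from a single raw sample.

    The thresholds are illustrative guesses, not tuned values.
    """
    m = magnitude(x, y, z)
    if m < free_fall_threshold:
        return "free_fall"  # magnitude near 0: weightlessness
    if abs(m - G) < tolerance:
        return "steady"     # roughly only gravity: held still or lying down
    return "moving"         # extra acceleration from a gesture

def up_direction(x, y, z):
    """Unit vector pointing away from the ground; meaningful only when steady."""
    m = magnitude(x, y, z)
    return (x / m, y / m, z / m)
```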
If the user makes a gesture without throwing the phone, the direction will shift and the amplitude will rise, then fall, and then rise again (when the user stops the movement). To find out what this looks like, you can do some research by recording accelerometer data while performing the desired gestures.
Keep in mind that the accelerometer is pretty noisy; you will have to average over nearby values to get meaningful results.
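A minimal sketch of the averaging step, assuming samples arrive as `(x, y, z)` tuples; `MovingAverage` is a hypothetical helper and the window length of 10 samples is only a starting point. A longer window smooths more, but it also delays and blunts short, sharp gestures such as a throw.

```python
from collections import deque

class MovingAverage:
    """Smooths a stream of 3-axis samples by averaging the last `window` samples."""

    def __init__(self, window=10):
        self.samples = deque(maxlen=window)  # oldest sample drops out automatically

    def add(self, sample):
        """Add an (x, y, z) sample and return the current smoothed (x, y, z)."""
        self.samples.append(sample)
        n = len(self.samples)
        return tuple(sum(s[i] for s in self.samples) / n for i in range(3))
```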
I think that one workable approach to matching gestures would be invariant moments (like the Hu moments used in image recognition): the accelerometer vector over time defines a 4-dimensional space, and you will need a set of scaling/rotation-invariant moments. Designing such a set is not easy, but computing it is not complicated.
Once you have your moments, you can use standard techniques for matching vectors to clusters (see the "moments" and "cluster" modules of our javaocr project: http://javaocr.svn.sourceforge.net/viewvc/javaocr/trunk/plugins/).
PS: you may get away with just speed over time, which produces a 2-dimensional space and can be analysed with javaocr right away.
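To make the moments idea concrete for the simpler 2-D case from the PS, here is a plain-Python sketch. It does not use javaocr; it is a stand-in written for illustration, and all names are hypothetical. It computes central moments of a (time, speed) trace, normalized so that translating or uniformly scaling the trace does not change them (they are not rotation-invariant like Hu's set), then assigns the feature vector to the nearest cluster centroid. The centroids would come from averaging the moments of recorded example gestures.

```python
import math

def invariant_moments(points, orders=((2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3))):
    """Translation- and scale-invariant central moments of a 2-D point set.

    points: list of (t, v) pairs, e.g. time and speed.
    Each central moment mu_pq is divided by (mu_20 + mu_02) ** ((p + q) / 2),
    so uniformly scaling all coordinates leaves the result unchanged.
    Assumes the points are not all identical (otherwise the scale is 0).
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n

    def mu(p, q):
        return sum((x - cx) ** p * (y - cy) ** q for x, y in points) / n

    scale = mu(2, 0) + mu(0, 2)
    return [mu(p, q) / scale ** ((p + q) / 2) for p, q in orders]

def nearest_cluster(features, centroids):
    """Return the label whose centroid is closest (Euclidean) to `features`."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

Uniform scaling means time and speed are scaled by the same factor; if they should be normalized independently, rescale each axis (for example to unit range) before computing the moments.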
Not exactly what you are looking for:
Store orientation to an array - and compare
Tracking orientation works well. Perhaps you can do something similar with the accelerometer data (without any integration).
A similar question is Drawing in air with Android phone.
I am curious what other answers you will get.
Related
I want to detect a specific pattern of motion on an Android mobile phone, e.g. if I do five sit-stands.
[Note: I am currently detecting the motion, but the motion looks the same in every direction.]
What I need is:
I need to differentiate the motion downward, upward, forward and backward.
I need to find the height of the mobile phone from ground level (and the height of the person holding it).
Is there any sample project which has pattern motion detection implemented?
This isn't impossible, but it may not be extremely accurate, even though the accuracy of the accelerometers and gyroscopes in phones has improved a lot.
What your app will be doing is taking sensor data and doing a regression analysis.
1) You will need to build a model of the data that you classify as five sit-stands. This could be done by asking the user to do five sit-stands, or by shipping the app with a more fine-tuned model built from data that you've collected beforehand. There may be tricks you could use, such as loading several models of people with different heights and asking the user to enter their own height in the app, so that the best model is used.
2) When run, your app will be trying to fit the data from the sensors (Android has great libraries for this), to the model that you've made. Hopefully, when the user performs five sit-stands, he will generate a set of motion data similar enough to your definition of five sit-stands that your algorithm accepts it as such.
A lot of the work here is assembling and classifying your model, and playing with it until you get acceptable accuracy. Focus on what distinguishes a sit-stand from other up-and-down motions. For instance, there might be a telltale sign of extending the legs in the data, followed by a different shape for straightening up fully. Or, if you expect the phone to be in a pocket, you may not see a lot of rotational motion, so you can reject test sets that registered lots of change on the gyroscope.
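A simplified sketch of the two steps above, assuming windows of vertical acceleration that have already been cut and resampled to a common length (that part is not shown). Instead of a true regression fit it uses a nearest-template distance, which is an editorial substitution, and it applies the gyroscope rejection rule from the last paragraph. `build_template`, `matches` and both thresholds are hypothetical.

```python
import math

def build_template(recordings):
    """Average several equal-length recordings of one sit-stand into a template.

    recordings: list of lists of vertical-acceleration samples, already
    resampled to the same length.
    """
    length = len(recordings[0])
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(length)]

def matches(window, template, gyro_window, max_distance, max_rotation):
    """Accept `window` as a sit-stand if it is close to the template and the
    phone did not rotate much (e.g. it stayed in a pocket).

    gyro_window: angular speeds in rad/s, one per sample. At a fixed sample
    rate their summed absolute value is a crude proxy for total rotation.
    """
    if sum(abs(w) for w in gyro_window) > max_rotation:
        return False  # too much rotation: probably not a pocketed sit-stand
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, template)))
    return distance <= max_distance
```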
It is impossible. You can recognize downward and upward motion by comparing the acceleration with the main gravity force, but how do you know whether your phone is in your back pocket as you rise or just in your waving hand as you say hello? Was it 5 stand-ups or 5 hellos?
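The part this answer does concede, separating upward from downward motion, can be sketched by projecting each sample onto the gravity direction. A minimal Python sketch; it assumes Android-style readings in m/s² and a gravity estimate obtained by low-pass filtering the accelerometer (or from Android's `TYPE_GRAVITY` sensor). `vertical_acceleration` is a hypothetical helper.

```python
def vertical_acceleration(sample, gravity):
    """Component of `sample` along the 'up' direction given by `gravity`.

    sample, gravity: (x, y, z) in m/s^2 in the phone's own axes. At rest the
    accelerometer (and the gravity estimate) points away from the ground.
    Returns positive values while accelerating upward, negative downward.
    """
    gx, gy, gz = gravity
    norm = (gx * gx + gy * gy + gz * gz) ** 0.5
    dot = sample[0] * gx + sample[1] * gy + sample[2] * gz
    return dot / norm - norm  # subtract the 1 g that is always present
```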
Forward and backward are even more unpredictable. What is forward for an upside-down phone? What is forward at all, from the phone's point of view?
And ground level, as well as height, are completely beyond measurement. The phone will move and produce accelerations in exactly the same way for a dwarf or a giant; it depends more on the person's behavior or stillness than on their height.
It's a topic of research and I'm probably way too late to post this here, but I'm foraging through the literature anyway, so why not?
All kinds of machine learning approaches have been applied to the problem; I'll mention some along the way. Andrew Ng's MOOC on machine learning gives you an entry point into the field and into Matlab/Octave, which you can put into practice instantly, and it demystifies the monsters too ("support vector machine").
I'd like to detect whether somebody is drunk from phone acceleration and maybe angle, so I'm flirting with neural networks for this (they're basically good for every problem, if you can afford the hardware), since I don't want to assume predefined patterns to look for.
Your task could be approached in a pattern-based way, it seems. That approach has been applied to classifying golf motions, dancing, everyday walking patterns and, twice, to drunk-driving detection. One of the latter addresses the problem of finding a baseline for what actually is longitudinal motion as opposed to every other direction, which might help you find the baselines you need, such as ground level.
It is a dense thicket of aspects and approaches; below are just a few more.
Lim et al. 2009: Real-time End Point Detection Specialized for Acceleration Signal
He & Yin 2009: Activity Recognition from Acceleration Data Based on Discrete Cosine Transform and SVM
Dhoble et al. 2012: Online Spatio-Temporal Pattern Recognition with Evolving Spiking Neural Networks utilising Address Event Representation, Rank Order, and Temporal Spike Learning
Panagiotakis et al.: Temporal segmentation and seamless stitching of motion patterns for synthesizing novel animations of periodic dances
This one uses visual data, but walks you through a Matlab implementation of a neural network classifier:
Symeonidis 2000: Hand Gesture Recognition Using Neural Networks
I do not necessarily agree with Alex's response. This is possible (although maybe not as accurate as you would like) using the accelerometer, device rotation and A LOT of trial and error and data mining.
The way I see it, this can work by defining a specific way that the user holds the device (or by having the device locked and positioned on the user's body). As they go through the motions, the orientation combined with acceleration and time will determine what sort of motion is being performed. You will need to use classes and interfaces like OrientationEventListener, SensorEventListener, SensorManager and Sensor, plus various timers, e.g. Runnables or TimerTasks.
From there, you need to gather a lot of data. Observe, record and study what the numbers are for specific actions, and then come up with a range of values that defines each movement and sub-movement. What I mean by sub-movements is that a situp might have five parts:
1) Rest position where phone orientation is x-value at time x
2) Situp started where phone orientation is range of y-values at time y (greater than x)
3) Situp is at final position where phone orientation is range of z-values at time z (greater than y)
4) Situp is in rebound (the user is falling back down to the floor) where phone orientation is range of y-values at time v (greater than z)
5) Situp is back at rest position where phone orientation is x-value at time n (greatest and final time)
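The five-part breakdown above maps naturally onto a small state machine. This sketch reduces orientation to a single hypothetical pitch angle in degrees; the ranges in `PHASES` and the 5-second timeout are made-up placeholders for the values your own data mining would produce, and the acceleration conditions discussed next are left out.

```python
# Hypothetical pitch ranges (degrees) for each phase, in the order they must occur.
PHASES = [
    ("rest", (-10.0, 10.0)),
    ("started", (10.0, 60.0)),
    ("top", (60.0, 100.0)),
    ("rebound", (10.0, 60.0)),
    ("rest", (-10.0, 10.0)),
]

def count_situps(samples, timeout_s=5.0):
    """Count situps in time-ordered (time_s, pitch_deg) samples.

    A situp is counted when the pitch passes through the phases in order.
    If a situp takes longer than `timeout_s`, the partial match is dropped.
    Samples that do not fall into the next expected phase are ignored.
    """
    count, phase, start = 0, 0, None
    for t, pitch in samples:
        if phase > 0 and t - start > timeout_s:
            phase = 0  # too slow: treat it as some other movement and start over
        low, high = PHASES[phase + 1][1]
        if low <= pitch < high:
            if phase == 0:
                start = t  # time at which this situp started
            phase += 1
            if phase == len(PHASES) - 1:  # back at rest: one complete situp
                count += 1
                phase = 0
    return count
```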
Add acceleration to this as well, because in certain circumstances acceleration can be assumed. For example, my hypothesis is that people perform the actual situp (steps 1-3 in the breakdown above) with greater acceleration than when they fall back. In general, most people fall back more slowly because they cannot see what's behind them. That can also be used as an additional condition to determine the user's direction. This is probably not true in all cases, however, which is why data mining is necessary. I can also hypothesize that if someone has done many situps, the final situp is very slow and then they just collapse back to the rest position from exhaustion. In that case the acceleration will be the opposite of my initial hypothesis.
Lastly, check out Motion Sensors: http://developer.android.com/guide/topics/sensors/sensors_motion.html
All in all, it is really a numbers game combined with your own "guesstimation". But you might be surprised at how well it works; perhaps (hopefully) well enough for your purposes.
Good luck!
I'm trying to make an Android application that uses a smartphone moved along on a flat surface (e.g. a desk) as a mouse. Since I want to emulate a mouse, I ignore the z-axis, and figure that the best way to utilize the accelerometer data would be to construct a two dimensional vector that I could then scale to the size of the screen.
I've read other answers on SO and I see that the integration method has a large error as t increases, but I'm not sure if this error is a factor considering the short duration and position change of mouse movements (How long is the average mouse movement? I'd assume less than 2 sec.).
How would I go about designing an algorithm that meets my needs? Is an integration-based algorithm sufficient?
Yes, accelerometer data have a high error, which would create large errors if we tried to get absolute coordinates out of them. But a mouse needs no absolute coordinates; relative ones are absolutely enough. Use your integration, no doubt about it.
"The integration method has a large error as t increases": correct, but the user is really interested only in the last movement. So it will work as a mouse, and it will feel like a normal mouse. How good the mouse will be depends on the concrete device and the task. I am not at all sure about serious gaming, for example; you will have to do your own survey on that. But it will make a really bad tablet/pen simulator.
Be careful about ignoring the Z axis. Note that even for placing a point on a map, GPS uses all three coordinates, for better precision. Movements will often not have a Z change equal to 0, and simply ignoring one of the coordinates, instead of converting all three into the two you really need, will cause greater errors. I am not sure you can afford that. And you simply don't need to: it is NOT a heavy algorithm that devours time and battery. Also, for a user, being able to move the device in the air could be very convenient; not everybody wants to scrape their device across a table. So COMPUTE two coordinates from the three source ones; don't simply TAKE two of the source ones and ignore the third.
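One way to compute two coordinates from all three axes is to project the linear acceleration (gravity removed, e.g. from Android's `TYPE_LINEAR_ACCELERATION` sensor) onto the plane perpendicular to gravity. A minimal sketch under that assumption; the helper names are hypothetical, and tying the in-plane axes to the phone's y axis is an arbitrary choice that fails when that axis points straight up.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    n = dot(v, v) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)

def to_plane(linear_acc, gravity):
    """Express a 3-D linear acceleration as 2-D coordinates in the plane
    perpendicular to gravity (the desk surface), using all three axes.

    The vertical part of the motion is dropped by the projection; the other
    two axes are mixed according to how the phone is tilted.
    """
    up = normalize(gravity)
    u = normalize(cross((0.0, 1.0, 0.0), up))  # in-plane, roughly the phone's x
    v = cross(up, u)                            # in-plane, roughly the phone's y
    return dot(linear_acc, u), dot(linear_acc, v)
```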
The problem will be elsewhere. When you use a mouse and error has accumulated, you can lift the mouse, move it to another spot and start anew from there. You should implement something similar, because your device will accumulate errors over time as well.
I am trying to create an application that will track movement of the device in 2D space. After doing research online, all I could find is that one way to do it is to integrate linear acceleration twice, but the error is horrible.
Are there any solutions to this problem? I would like to be able to move my phone up, which would cause a vertical line to be drawn on the screen, to scale of how far the phone was moved. Then if I move the phone to the left, horizontal line would be drawn - effectively allowing me to draw on the screen using movements of the phone.
Can this be done at all? If so, what direction should I take in the development? I don't know where to start...
EDIT: More about the project:
I am trying to make an exercise app that will track the movement of the leg/arm: for example, when you are doing stomach crunches and the phone is attached with an armstrap to your ankle.
The app would track repeated movements of the leg.
Unfortunately, the accelerometers in these phones are nowhere near what you need to implement an inertial measurement unit. The big problem is that you are integrating twice, and an integration always comes with a constant, ∫x dx = x²/2 + C; this constant is what makes it difficult. To make things worse, you get it twice: once when integrating to get velocity and once to get position.
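A worked version of that argument, as a sketch: assume, purely for illustration, that the sensor adds a constant bias b to the true acceleration a(t). The two integration constants are the unknown initial velocity v₀ and position p₀, and the bias turns into a velocity error that grows linearly and a position error that grows quadratically:

```latex
\begin{aligned}
v(t) &= v_0 + \int_0^t \bigl(a(\tau) + b\bigr)\,d\tau
      = v_0 + \int_0^t a(\tau)\,d\tau + b\,t \\
p(t) &= p_0 + \int_0^t v(\sigma)\,d\sigma
      = p_0 + v_0\,t + \int_0^t\!\!\int_0^{\sigma} a(\tau)\,d\tau\,d\sigma + \tfrac{1}{2}\,b\,t^2
\end{aligned}
```

With an assumed bias of b = 0.05 m/s², the position error from the bias alone is ½ · 0.05 · 2² = 0.1 m after 2 s, but ½ · 0.05 · 10² = 2.5 m after 10 s, which is why short, relative movements fare much better than long ones.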
One method of fixing this that I have seen in commercial inertial measurement units is called a zero velocity null (also known as a zero-velocity update, or ZUPT): you use some other source of data to tell when the motion of the device has stopped, so you can zero out the velocity. For example, I saw a project put an inertial measurement unit on a shoe and zero the velocity whenever it detected the shoe being put on the ground, which vastly improved the accuracy. It's possible that you could use a camera or something to determine this, but I have not seen it done. If you would like to start messing with this, then you are an awesome person and I would love to hear how it turns out.
Edit: I should clarify that the constant I mention above is where the error accumulates. If you can zero velocity null it, you periodically drop the accumulated error from your stored current velocity. The error in position will still accumulate, but this would keep it from drifting while the user holds the device relatively still, which may make it passable for drawing.
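A minimal one-axis sketch of the zero velocity null idea, assuming linear acceleration in m/s² with gravity already removed. Here "stopped" is inferred from the accelerometer itself (several consecutive near-zero readings), which stands in for the external signal described above, such as the shoe touching the ground; `integrate_with_zupt` and both thresholds are hypothetical.

```python
def integrate_with_zupt(accels, dt, still_threshold=0.05, still_samples=5):
    """Double-integrate 1-D linear acceleration (m/s^2) sampled every `dt` s.

    Zero-velocity update: when the last `still_samples` readings are all
    below `still_threshold`, the device is assumed to be stationary and the
    velocity estimate is reset to 0, discarding the error accumulated in it.
    Returns the position after each sample.
    """
    velocity, position = 0.0, 0.0
    quiet = 0  # number of consecutive near-zero readings
    positions = []
    for a in accels:
        quiet = quiet + 1 if abs(a) < still_threshold else 0
        velocity += a * dt  # first integration
        if quiet >= still_samples:
            velocity = 0.0  # zero velocity null
        position += velocity * dt  # second integration
        positions.append(position)
    return positions
```

With a small constant bias and the phone actually lying still, the plain double integral drifts steadily, while the reset keeps the position almost fixed; a real move followed by a stop is still tracked.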
I know of no other way than integrating the acceleration twice.
Moreover, I think it's not possible unless you know about the other sensors that might be in your device (for example, one of my devices has 7 (seven) sensors related to various physical signals the device might be receiving).
Other than that, remember that the sensor data is noisy and almost always must be pre-filtered. For example, you can use the arithmetic mean of the last 10 samples (a moving average). That should lower your error by providing smoother input data to the integrating function.
I'm building an application for Android devices that needs to recognize, from accelerometer data, the difference between walking noise and the user double-tapping the device. I'm trying to solve this problem using neural networks.
At the start it went pretty well, teaching it to tell the taps apart from noise such as standing up/sitting down and walking around at a slower pace. But when it came to normal walking, it never seemed to learn, even though I fed it a large proportion of noise data.
My question: Are there any serious flaws in my approach? Is the problem based on lack of data?
The network
I've chosen a multilayer perceptron with 25 inputs and 1 output, which I am training with backpropagation. The input is the change in acceleration every 20 ms, and the output ranges from -1 (no tap) to 1 (tap). I've tried pretty much every configuration of hidden neurons there is, but had the most luck with 3 to 10.
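For reference, the input layout described above can be built as sliding windows over the sample stream. A sketch that assumes one acceleration value per 20 ms tick (for example the magnitude; which quantity is used is an assumption, since the question does not say); `input_vectors` is a hypothetical helper. Each vector holds 25 consecutive changes, i.e. it spans 26 samples or 0.5 s.

```python
def input_vectors(samples, size=25):
    """Sliding windows of `size` consecutive changes in acceleration.

    samples: one acceleration value per 20 ms tick.
    Returns one list of `size` deltas per window position (stride 1).
    """
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return [deltas[i:i + size] for i in range(len(deltas) - size + 1)]
```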
I'm using Neuroph's easyNeurons for the training and exporting to Java.
The data
My total training data is about 50 double taps and about 3k noise samples. But I've also tried training it with an amount of noise proportional to the number of double taps.
The data looks like this (ranges from +10 to -10):
Sitting double taps:
Fast walking:
So to reiterate my questions: Are there any serious flaws in my approach here? Do I need more data for it to recognize the difference between walking and double tapping? Any other tips?
Update
OK, so after much adjusting we've boiled the essential problem down to recognizing double taps while taking a brisk walk. Sitting and regular (in-house) walking we can handle pretty well.
Brisk walk
So this is some test data of me first walking, then stopping and standing still, then walking again and doing 5 double taps while walking.
If anyone is interested in the raw data, I linked it for the latest (brisk walk) data here
Do you insist on using a neural network? If not, here is an idea:
Take a window of 0.5 seconds and consider the area under the curve (or, since your signal is discrete, the sum of the absolute values of the sensor readings: the red area in the attached image). You will probably find that this sum is high when the user is walking and much lower when they are sitting and/or tapping. You can set a threshold above which you consider a given window to have been recorded while the user was walking. Alternatively, since you have labelled data, you can train any binary classifier to distinguish walking from not walking.
You can probably improve your system by considering other features of the signal, such as how jagged the line is. If the phone is lying on a table, the line will be almost flat. If the user is tapping, the line will be fairly flat, with a spike every now and then. If they are walking, you will see something like a sine wave.
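A minimal sketch of both features, assuming one axis of gravity-free acceleration sampled at 50 Hz (the question's 20 ms interval), so a 0.5 s window holds 25 samples. The function names are hypothetical, and the threshold in `is_walking` must be tuned on the labelled data. The mean absolute difference between consecutive samples is just one possible jaggedness measure, chosen here for simplicity.

```python
def window_features(samples, rate_hz=50, window_s=0.5):
    """Split one acceleration axis into non-overlapping windows and compute,
    per window, the summed absolute value (discrete area under the curve)
    and the mean absolute difference between consecutive samples (jaggedness).
    """
    size = int(rate_hz * window_s)
    features = []
    for start in range(0, len(samples) - size + 1, size):
        w = samples[start:start + size]
        area = sum(abs(s) for s in w)
        jagged = sum(abs(b - a) for a, b in zip(w, w[1:])) / (size - 1)
        features.append((area, jagged))
    return features

def is_walking(area, threshold):
    """Threshold rule on the area feature; `threshold` is a tuning parameter."""
    return area > threshold
```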
Have you considered that the "fast walking" and "fast walking + double tapping" signals might be too similar to differentiate using only accelerometer data? It may simply not be possible to achieve accuracy above a certain amount.
Otherwise, neural networks are probably a good choice for your data, and it still may be possible to get better performance out of them.
This very useful paper (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) recommends that you whiten your dataset so that it has zero mean and unit covariance.
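A minimal sketch of the simplest version of this preprocessing. Note the simplification: it standardizes each feature to zero mean and unit variance, which handles only the diagonal of the covariance; full whitening would also decorrelate the features (for example with PCA). The function name is hypothetical.

```python
def standardize(rows):
    """Shift and scale each feature (column) to mean 0 and variance 1.

    Returns the transformed rows plus the per-feature means and standard
    deviations, which must be reused unchanged on new data at runtime.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # constant feature: avoid dividing by 0
    out = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return out, means, stds
```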
Also, since yours is a classification problem, you should make sure that you are training your network with a cross-entropy criterion (http://arxiv.org/pdf/1103.0398v1.pdf) rather than RMSE. (I have no idea whether Neuroph supports cross-entropy.)
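To see why the criterion matters, compare the two losses on a confidently wrong prediction. This sketch assumes a tanh-style output in [-1, 1], as in the question, mapped to a probability before computing cross-entropy; the helper names are hypothetical.

```python
import math

def to_probability(y):
    """Map a tanh-style output in [-1, 1] (-1 = no tap, 1 = tap) to [0, 1]."""
    return (y + 1.0) / 2.0

def cross_entropy(target, output, eps=1e-12):
    """Binary cross-entropy for one example; target and output in [0, 1]."""
    output = min(max(output, eps), 1.0 - eps)  # keep log() finite
    return -(target * math.log(output) + (1.0 - target) * math.log(1.0 - output))

def squared_error(target, output):
    return (target - output) ** 2
```

With an output of -0.98 for an actual tap (probability 0.01), cross-entropy is about 4.6 and keeps growing as the output approaches -1, while the squared error stays below 1, so cross-entropy pushes harder on exactly the examples the network gets badly wrong.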
Another relatively simple thing you could try, as other posters suggested, is transforming your data. Using an FFT or DCT to transform your data to the frequency domain is relatively standard for time-series classification.
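A minimal sketch of the frequency-domain idea, using a naive DFT so it stays dependency-free (on a phone you would use an FFT library instead). The function names are hypothetical, and the sample rate is whatever your sensor delivers. A walking signal would be expected to show a peak near the step rate; whether taps stand out elsewhere in the spectrum is something to check on your own data.

```python
import cmath

def magnitude_spectrum(samples, rate_hz):
    """Naive O(n^2) DFT; returns (frequency_hz, magnitude) for bins 1..n/2.

    The DC bin (0 Hz, e.g. the constant gravity offset) is skipped.
    """
    n = len(samples)
    result = []
    for k in range(1, n // 2 + 1):
        s = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        result.append((k * rate_hz / n, abs(s)))
    return result

def dominant_frequency(samples, rate_hz):
    """Frequency of the strongest non-DC bin."""
    return max(magnitude_spectrum(samples, rate_hz), key=lambda fm: fm[1])[0]
```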
You could also try training networks on different sized windows and averaging the results.
If you want to try some more difficult NN architectures, you could look at the Time-Delay Neural Network (just google it for the paper), which takes multiple windows into account in its structure. It should be relatively straightforward to use one of the Torch libraries (http://www.torch.ch/) to implement this, but it might be hard to export the network to an Android environment.
Finally, another way to get better classification performance on time-series data is to consider the relationships between adjacent labels. Conditional Neural Fields (http://code.google.com/p/cnf/; note: I have never used this code) do this by integrating neural networks into conditional random fields and, depending on the patterns of behavior in your actual data, may do a better job.
What would probably work is to filter the data using a Fourier transform first. Walking has a sinusoidal amplitude, so your double taps would stand out in the transform result as a different frequency. I guess a neural network could then determine whether the data contains your double taps, because it has the extra frequency (the double-tap frequency). Some questions remain:
How long does the data sample need to be?
Can your phone do all the work it needs to do, does it have enough processing power?
You might even want to consider using the GPU for this.
Another option is to use the Fourier output and some good old Fuzzy Logic.
This sounds like fun...
I am developing a simple mobile app for the iPhone and Android platforms, and I am looking for algorithms that would allow me to trigger certain events (functions) when a certain gesture is detected by the internal accelerometer. I work with PhoneGap, which uses HTML5 and JavaScript and reads three coordinates (x, y and z) from the accelerometer at a preset interval (e.g. every 0.04 s).
I wrote a simple function that detects a shaking motion and it works quite well, but it is primitive (it only detects shaking, not the direction), and I want to detect some other gestures, such as:
- tilt (to the left/right)
- shake up/down
- shake left/right
- circular motion
- turn upside down
- etc....
Does anybody have algorithms (or at least mathematical formulas/functions) that can calculate (detect) these kinds of gestures based on the input values I have (x, y, z and the time interval for each call)?
I am looking for code in any programming language (I will rewrite it in JavaScript myself). Thanks in advance!
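For the simpler gestures on the list, a per-window rule is often enough before reaching for a trained recognizer. A Python sketch (easy to port to JavaScript), assuming readings in m/s² and Android-style axes: x to the right, y up along the screen, z out of the screen, with about +9.81 on z when lying flat, screen up. Under that convention a positive tilt angle means the left edge is lower. Other platforms may use different signs or units, so verify on each device. The function names and the 0.8 g threshold are hypothetical.

```python
import math

def tilt_degrees(x, y, z):
    """Left/right tilt (rotation about the phone's long y axis) in degrees,
    from a gravity-dominated sample; 0 when lying flat, screen up."""
    return math.degrees(math.atan2(x, z))

def is_upside_down(x, y, z, g=9.81):
    """True when the screen faces the ground (z reads about -g)."""
    return z < -0.8 * g

def shake_axis(window):
    """Axis ('x', 'y' or 'z') with the largest variance over a window of
    (x, y, z) samples; e.g. 'y' for an up/down shake of an upright phone."""
    n = len(window)
    variances = []
    for i in range(3):
        mean = sum(s[i] for s in window) / n
        variances.append(sum((s[i] - mean) ** 2 for s in window) / n)
    return "xyz"[variances.index(max(variances))]
```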
Dynamic Time Warping (DTW) does a good job, but I would recommend using Fast Dynamic Time Warping (FastDTW). Especially for mobile scenarios, FastDTW is really applicable!
For a detailed version, take a look at this research paper: http://cs.fit.edu/~pkc/papers/tdm04.pdf
Edit: Some time ago, I wrote my thesis about 3D gestures for controlling devices in a smart-home setting. See it in action here (there is a link to the PDF, too). I used FastDTW for recognizing gestures on an iPhone.
You might want to try dynamic time warping. An illustrative example is here.
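A minimal sketch of classic DTW used as a nearest-template recognizer: record one example per gesture, then label a new gesture with the template that has the smallest warped distance. Samples are `(x, y, z)` tuples and the per-sample cost is Euclidean distance; `dtw_distance` and `recognize` are hypothetical names, and this unconstrained version costs O(n·m) per comparison.

```python
import math

def dtw_distance(a, b):
    """Classic (unconstrained) DTW distance between two sequences of
    (x, y, z) samples, using Euclidean distance per sample pair."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b waits
                                 cost[i][j - 1],      # b advances, a waits
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(gesture, templates):
    """Label of the recorded template closest to `gesture` under DTW.

    templates: dict mapping a label to one recorded (x, y, z) sequence.
    """
    return min(templates, key=lambda label: dtw_distance(gesture, templates[label]))
```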
If I may be so bold, Fast DTW (and the related, but different FTW of Sakurai and Faloutsos) are not good solutions.
If you constrain the warping (a):
Then using a lower-bound DTW is as fast as Euclidean distance (b, c)
The accuracy will improve (a, b).
For constrained warping FTW and Fast DTW are slower than brute force due to overhead (Ira Assent among others have shown this).
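A sketch of the two ideas above for 1-D sequences of equal length: DTW restricted to a Sakoe-Chiba band of half-width r, and the LB_Keogh lower bound (c) used to skip full DTW computations during a nearest-neighbour search. The function names are hypothetical; for multi-axis accelerometer data you would apply this per axis or to a derived 1-D signal.

```python
import math

def dtw_band(q, c, r):
    """DTW between equal-length 1-D sequences with warping limited to
    |i - j| <= r. Sums squared differences and returns the square root,
    so r = 0 gives the Euclidean distance."""
    n = len(q)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])

def lb_keogh(q, c, r):
    """LB_Keogh lower bound of dtw_band(q, c, r): distance from q to the
    upper/lower envelope of c. The envelope is built naively here (O(n * r));
    a sliding-window min/max would make it O(n)."""
    total = 0.0
    for i, x in enumerate(q):
        window = c[max(0, i - r):i + r + 1]
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return math.sqrt(total)

def nearest(query, candidates, r):
    """1-nearest-neighbour search that skips the full DTW whenever the cheap
    lower bound already reaches the best distance found so far."""
    best, best_index = float("inf"), -1
    for index, c in enumerate(candidates):
        if lb_keogh(query, c, r) >= best:
            continue  # cannot beat the current best match
        d = dtw_band(query, c, r)
        if d < best:
            best, best_index = d, index
    return best_index, best
```

Because `lb_keogh` never exceeds `dtw_band` for the same r, skipping a candidate whose bound is already worse than the best match cannot change the result; it only saves work.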
a) Ratanamahatana, C. A. and Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong.
b) Wang, X., Ding, H., Trajcevski, G., Scheuermann, P. and Keogh, E. J. (2010). Experimental Comparison of Representation Methods and Distance Measures for Time Series Data. CoRR abs/1012.2789.
c) http://www.cs.ucr.edu/~eamonn/LB_Keogh.htm