I want to record a dog bark, save the file, and compare it with several files containing different types of bark (warning bark, crying bark, etc.).
How could I do that comparison in order to get a match? What is the process to follow in this type of app?
Thank you for the tips.
There is no simple answer to your problem. However, for starters, you might look into how audio fingerprinting works. This paper, written by the creators of Shazam, is an excellent start:
http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
I'm not sure how well that approach would work for dog barking, but there are some concepts there that might prove useful.
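To make the paper's core idea concrete, here is a rough Java sketch of its "combinatorial hashing" step. It assumes you have already picked spectral peaks out of a spectrogram as (time frame, frequency bin) pairs; the class name and the FAN_OUT and MAX_DT values are my own illustrative choices, not values from the paper. Each peak is paired with a few later peaks, each pair is hashed as (f1, f2, dt), and the match score is the largest number of shared hashes that agree on a single time offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of landmark ("combinatorial") hashing in the spirit of the Shazam paper.
// Assumes spectral peaks were already extracted and are sorted by time.
class LandmarkFingerprint {
    // One spectral peak: time-frame index and frequency-bin index.
    static final class Peak {
        final int time, freq;
        Peak(int time, int freq) { this.time = time; this.freq = freq; }
    }

    static final int FAN_OUT = 3; // hypothetical: pairs formed per anchor peak
    static final int MAX_DT = 64; // hypothetical: target-zone width in frames

    // Packs (f1, f2, dt) into one long, 20 bits each (values assumed < 2^20).
    static long hash(int f1, int f2, int dt) {
        return ((long) f1 << 40) | ((long) f2 << 20) | dt;
    }

    // Builds hash -> list of anchor times for one recording.
    static Map<Long, List<Integer>> index(List<Peak> peaks) {
        Map<Long, List<Integer>> table = new HashMap<>();
        for (int i = 0; i < peaks.size(); i++) {
            Peak anchor = peaks.get(i);
            int paired = 0;
            for (int j = i + 1; j < peaks.size() && paired < FAN_OUT; j++) {
                Peak target = peaks.get(j);
                int dt = target.time - anchor.time;
                if (dt <= 0) continue;  // skip peaks in the same frame
                if (dt > MAX_DT) break; // sorted by time, so nothing later fits
                table.computeIfAbsent(hash(anchor.freq, target.freq, dt), k -> new ArrayList<>())
                     .add(anchor.time);
                paired++;
            }
        }
        return table;
    }

    // Score = largest number of matching hashes that agree on one time offset.
    static int score(Map<Long, List<Integer>> reference, List<Peak> query) {
        Map<Integer, Integer> offsetCounts = new HashMap<>();
        int best = 0;
        for (Map.Entry<Long, List<Integer>> e : index(query).entrySet()) {
            List<Integer> refTimes = reference.get(e.getKey());
            if (refTimes == null) continue;
            for (int queryTime : e.getValue()) {
                for (int refTime : refTimes) {
                    int count = offsetCounts.merge(refTime - queryTime, 1, Integer::sum);
                    best = Math.max(best, count);
                }
            }
        }
        return best;
    }
}
```

A score well above what unrelated recordings get suggests a match. Barks vary much more from one instance to the next than a recorded song does, so expect this to be far less reliable for your use case.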
Another thing to look into is how the FFT works. Here's a tutorial with code that I wrote for pitch tracking, which is one way to use the FFT. You are looking more at how the tone and pitch interact with the formant structure of a given dog. So the parameters you'll want to derive might include the fundamental pitch (which, alone, might be enough to distinguish whining from other kinds of barks) and the ratio of the fundamental pitch to the higher harmonics, which could help identify how aggressive the bark is (I'm guessing a bit here):
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
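As a rough illustration of the kind of feature extraction described above (not the code from the linked post), the Java sketch below applies a Hann window to a frame, runs a radix-2 FFT, reports the frequency of the strongest bin, and computes a hypothetical "harmonic ratio": the magnitude at f0 divided by the summed magnitudes at 2*f0 up to N*f0. It assumes mono samples and a power-of-two frame length. The strongest bin is not always the fundamental (a loud harmonic can win), so treat dominantFrequency as a first approximation only.

```java
// Sketch: dominant pitch and a crude fundamental-vs-harmonics ratio via FFT.
// Assumes mono samples and a power-of-two frame length.
class PitchFeatures {
    // In-place iterative radix-2 Cooley-Tukey FFT on separate real/imag arrays.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes of increasing length.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double curRe = 1, curIm = 0;
                for (int k = 0; k < len / 2; k++) {
                    int a = i + k, b = i + k + len / 2;
                    double vRe = re[b] * curRe - im[b] * curIm;
                    double vIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - vRe; im[b] = im[a] - vIm;
                    re[a] += vRe; im[a] += vIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
    }

    // Magnitude spectrum (bins 0 .. n/2-1) of a Hann-windowed frame.
    static double[] magnitudes(double[] samples) {
        int n = samples.length;
        double[] re = new double[n], im = new double[n];
        for (int i = 0; i < n; i++) {
            double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (n - 1));
            re[i] = samples[i] * hann;
        }
        fft(re, im);
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) mag[k] = Math.hypot(re[k], im[k]);
        return mag;
    }

    // Frequency (Hz) of the strongest bin, ignoring DC. Resolution is sampleRate / n.
    static double dominantFrequency(double[] samples, double sampleRate) {
        double[] mag = magnitudes(samples);
        int best = 1;
        for (int k = 2; k < mag.length; k++) if (mag[k] > mag[best]) best = k;
        return best * sampleRate / samples.length;
    }

    // Magnitude at f0 divided by the summed magnitudes at 2*f0 .. maxHarmonic*f0.
    static double harmonicRatio(double[] samples, double sampleRate, double f0, int maxHarmonic) {
        double[] mag = magnitudes(samples);
        double binHz = sampleRate / samples.length;
        double fundamental = mag[(int) Math.round(f0 / binHz)];
        double harmonics = 0;
        for (int h = 2; h <= maxHarmonic; h++) {
            int k = (int) Math.round(h * f0 / binHz);
            if (k < mag.length) harmonics += mag[k];
        }
        return harmonics == 0 ? Double.POSITIVE_INFINITY : fundamental / harmonics;
    }
}
```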
Finally, you might want to do some research into basic speech recognition and speech processing, as there will be some overlap. Wikipedia will probably be enough to get you started.
EDIT: Oh, also, once you've identified some parameters to use for comparison, you'll need a way to compare those multiple parameters against your database of sounds, each described by the same parameters. I don't think the techniques in the Shazam article will work for that. One thing you could try is logistic regression. There are other options, but this is probably the simplest.
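A minimal sketch of binary logistic regression in plain Java, trained with batch gradient descent. It assumes each recording has already been reduced to a small feature vector (for example the pitch and harmonic ratio mentioned above), scaled to roughly 0..1, and labeled 1 for one class (say, whining) and 0 for everything else. For several bark types you could train one such classifier per type (one-vs-rest) and pick the most confident one. The class name, learning rate, and epoch count are illustrative.

```java
// Sketch: binary logistic regression trained with batch gradient descent.
// Each row of x is a feature vector (e.g. [scaledPitch, harmonicRatio]); y is 0 or 1.
class BarkClassifier {
    final double[] w; // w[0] is the bias; w[1..] are the feature weights

    BarkClassifier(int numFeatures) { w = new double[numFeatures + 1]; }

    static double sigmoid(double z) { return 1 / (1 + Math.exp(-z)); }

    // Estimated probability that the features belong to class 1.
    double probability(double[] features) {
        double z = w[0];
        for (int j = 0; j < features.length; j++) z += w[j + 1] * features[j];
        return sigmoid(z);
    }

    // Minimizes the average log loss; its gradient is (p - y) * x per example.
    void train(double[][] x, int[] y, double learningRate, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < x.length; i++) {
                double err = probability(x[i]) - y[i];
                grad[0] += err;
                for (int j = 0; j < x[i].length; j++) grad[j + 1] += err * x[i][j];
            }
            for (int j = 0; j < w.length; j++) w[j] -= learningRate * grad[j] / x.length;
        }
    }
}
```

Feature scaling matters here: raw pitch in Hz next to a ratio near 1 makes plain gradient descent converge very slowly.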
I'd check out the open-source musicg library, hosted on Google Code: http://code.google.com/p/musicg/
It's written in Java, so it works on Android, and it gives similarity metrics for two audio files.
However, it only supports .wav files.
I'm writing an Android game using LibGDX and Box2D. I'm planning on adding a turn-based multiplayer feature to it.
Now, if on both clients I step the Box2D world at the same rate with the same time steps and I start a simulation on both clients with the exact same initial parameters, when the simulations are over, will the final state of both simulations be exactly the same? In other words, is a Box2D simulation perfectly deterministic?
If it's not, then every time a simulation is over, one client acting as the host will have to tell the other to throw away its final simulation results and use the host's instead.
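If you do go the host-authoritative route, one cheap way to detect divergence is to compare a hash of each client's final state instead of sending the whole world every turn. The sketch assumes a hypothetical flat snapshot (x, y, angle for each body, in a fixed body order) that you would fill from your own bodies; it is not a LibGDX or Box2D API. Hashing Float.floatToIntBits makes even a one-ulp difference change the hash.

```java
// Sketch: detect divergence between clients by hashing the exact bits of a flat
// state snapshot. The layout (x, y, angle per body, fixed body order) is a
// hypothetical format you fill yourself; it is not part of Box2D.
class StateSync {
    static long stateHash(float[] snapshot) {
        long h = 17;
        for (float v : snapshot) {
            // floatToIntBits uses the exact bit pattern, so a one-ulp drift changes the hash.
            h = 31 * h + Float.floatToIntBits(v);
        }
        return h;
    }

    // True when the local result matches the host's, so no resync is needed.
    static boolean inSync(float[] localSnapshot, long hostHash) {
        return stateHash(localSnapshot) == hostHash;
    }
}
```

The client sends its hash to the host; only when the hashes differ does the host need to send its full snapshot for the client to apply.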
Official FAQ quote
The official FAQ, archived at http://web.archive.org/web/20160131050402/https://github.com/erincatto/Box2D/wiki/FAQ#is-box2d-deterministic, had a quote that confirms what you deduced:
Is Box2D deterministic?
For the same input, and same binary, Box2D will reproduce any simulation. Box2D does not use any random numbers nor base any computation on random events (such as timers, etc).
However, people often want more stringent determinism. People often want to know if Box2D can produce identical results on different binaries and on different platforms. The answer is no. The reason for this answer has to do with how floating point math is implemented in many compilers and processors. I recommend reading this article if you are curious: http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html
Or in other words: Fixed-size floating point types
Why the wiki was deleted, I do not know. Humans. I'm glad he lowercased the project name though.
After looking around, the answer is "No", even if the same time steps are used! The reason for this answer has to do with how floating point math is implemented in many compilers and processors. Small discrepancies on each cycle add up resulting in significantly different simulations.
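A small Java illustration of both halves of that explanation (this is not Box2D code). The first pair of methods shows that the same three additions in a different order produce different doubles, which is the kind of difference a compiler's reordering or fused multiply-add can introduce. The second method runs a chaotic iteration (the logistic map with r = 4, chosen only as a stand-in for a sensitive simulation) from two starting points one rounding error apart and reports the largest gap between them.

```java
// Illustration only, not Box2D: evaluation order changes results, and an
// unstable iteration amplifies a one-ulp difference into a large one.
class FloatDrift {
    static double sumLeftToRight(double a, double b, double c) { return (a + b) + c; }

    static double sumRightToLeft(double a, double b, double c) { return a + (b + c); }

    // Iterates x -> 4x(1 - x) from two starting points and returns the
    // largest gap observed between the two trajectories.
    static double maxDivergence(double x0, double y0, int steps) {
        double x = x0, y = y0, maxGap = Math.abs(x - y);
        for (int i = 0; i < steps; i++) {
            x = 4 * x * (1 - x);
            y = 4 * y * (1 - y);
            maxGap = Math.max(maxGap, Math.abs(x - y));
        }
        return maxGap;
    }
}
```

Here sumLeftToRight(0.1, 0.2, 0.3) returns 0.6000000000000001 while sumRightToLeft(0.1, 0.2, 0.3) returns 0.6, and trajectories started at 0.1 + 0.2 and at 0.3 become unrelated well before the 200th step.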
I managed to make Box2D deterministic for an experiment, but it was not pretty. The way b2Body::GetTransform()/SetTransform() works does not allow reading the transform and then setting it back to the exact same values. I also had to delete and re-create the contact list for each body every frame. It would be possible to fix these cleanly and more efficiently, but it would add enough overhead that it would be hard to get the change merged.
I have seen a lot of questions on this topic and read a lot of articles, but I still can't find the best solution for what I am looking for.
I want to build an app (Android/iOS/...whatever) which has this feature:
when the user enters text by voice (as a keyboard replacement), the app should recognize the speech and convert it to text with 99.9% accuracy. I don't mind if the user has to record his voice first to improve accuracy. I want it to be "live" like Google Services, unlike Siri, which writes the text only after you finish talking.
I have found this site:
http://cmusphinx.sourceforge.net
and I want to start working with it, but before starting I wanted to make sure it is the best approach.
Can anyone give some advice?
Thanks
*Edit: I don't mind building a new model for another language if needed (it's not English).
If you do some research, you'll see that 99% accuracy in speech-to-text is only a very recent thing; one example is Nuance's Dragon.
High-accuracy speech-to-text can cost around $600 for a license. It's not an easy thing to create, and you have to pay for high-accuracy speech recognition libraries.
For what you're doing, though, a really good service I have used is Wit.ai. It's very accurate, and it's getting faster every week.
Another possibility for you might be the AT&T speech engine (Watson) found here: http://developer.att.com/
They offer 1 million API calls per month for a (low) fee and allow you to customize the "library" you use to recognize speech. It might be what you are looking for, given your latest statements. You can try it for free, though it is throttled until you pay.
I'm writing a game for Google Glass, but unfortunately the SpeechRecognizer API isn't available in the current builds of the Google Glass GDK.
So I've been thinking about implementing an algorithm for a very simple voice recognition.
Let's say I want to recognize only: "Yes" and "No".
Do you know of any example code or helpful resources for implementing this?
Is it so hard that I should drop the idea and go with a big framework like CMUSphinx?
What about recognizing: up, down, right, left or numbers from 1 to 10 ?
As far as I know, the usual approach is to transform the signal to the frequency domain with a fast Fourier transform (FFT) and analyze it there. You also need a dictionary of recorded spoken words to correlate the frequencies against.
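A toy Java sketch of that idea: each dictionary word is stored as the magnitude spectrum of one recording, and an utterance is assigned to the template whose spectrum it correlates with best (cosine similarity). It uses a naive O(n^2) DFT to stay short, and the class and method names are hypothetical. A single whole-utterance spectrum throws away timing, so it can only separate words whose overall spectral content differs clearly; real small-vocabulary recognizers typically use per-frame features (such as MFCCs) with dynamic time warping or HMMs, which is what frameworks like CMU Sphinx handle for you.

```java
import java.util.Map;

// Toy "frequency correlation" recognizer: compare an utterance's magnitude
// spectrum with stored template spectra and return the closest word.
class WordMatcher {
    // Naive DFT magnitude spectrum, bins 0 .. n/2-1.
    static double[] spectrum(double[] frame) {
        int n = frame.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = 2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(ang);
                im -= frame[t] * Math.sin(ang);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    // Cosine similarity: 1 means identical spectral shape, 0 means no overlap.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na * nb) + 1e-12);
    }

    // Returns the dictionary word whose template spectrum best matches the utterance.
    static String recognize(double[] utterance, Map<String, double[]> templates) {
        double[] s = spectrum(utterance);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double score = cosine(s, e.getValue());
            if (score > bestScore) { bestScore = score; best = e.getKey(); }
        }
        return best;
    }
}
```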
Please see these links:
CMU Sphinx has a Java implementation.
David Wagner has a good article and a MATLAB implementation.
P.S. Oh, if you speak Russian, why not read this article? It's very simple, with Java examples.
P.P.S. Honestly, I have never used this framework, but if you have only a superficial knowledge of speech recognition, the most robust and easiest way is to use an existing complete solution such as a framework or library; otherwise you need to spend time reaching the necessary level of knowledge. In that case, you can read this article.
I'm implementing a face tracker on Android, and as a literature study, would like to identify the underlying technique of Android's FaceDetector.
Simply put: I want to understand how the android.media.FaceDetector classifier works.
A brief Google search didn't yield anything informative, so I thought I'd take a look at the code.
By looking at the Java source code, FaceDetector.java, there isn't much to be learned: FaceDetector is simply a class that is provided the image dimensions and number of faces, then returns an array of faces.
The Android source contains the JNI code for this class. I followed through the function calls, where, reduced to the bare essentials, I learned:
The "FaceFinder" is created in FaceFinder.c:75
On line 90, bbs_MemSeg_alloc returns a btk_HFaceFinder object (which contains the function to actually find faces), essentially copying it into the hsdkA->contextE.memTblE.espArrE array of the original btk_HSDK object initialized within initialize() (FaceDetector_jni.cpp:145) by btk_SDK_create()
It appears that a maze of functions provide each other with pointers and instances of btk_HSDK, but nowhere can I find a concrete instantiation of sdk->contextE.memTblE.espArrE[0] that supposedly contains the magic.
What I have discovered is a small clue: the JNI code references an FFTEm library that I can't find the source code for. By the looks of it, however, FFT is Fast Fourier Transform, which is probably used together with a pre-trained neural network. The only literature I can find that aligns with this theory is a paper by Ben-Yacoub et al.
I don't even really know if I'm set on the right path, so any suggestions at all would undoubtedly help.
Edit: I've added a +100 bounty for anybody who can give any insight.
I found a couple of links too. Not sure if they would help you:
http://code.google.com/p/android-playground-erdao/source/browse/#svn/trunk/SnapFace
http://code.google.com/p/jjil/
http://benosteen.wordpress.com/2010/03/03/face-recognition-much-easier-than-expected/
I'm on a phone, so I can't respond extensively, but the Google keywords "neven vision algorithm" pull up some useful papers.
Also, US patent 6222939 is related.
Possibly also some of the links on http://peterwilliams97.blogspot.com/2008/09/google-picasa-to-have-face-recognition.html might be handy...
Have a look at this:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1562271
I think I once saw some MATLAB code doing this in a presentation.
Maybe it's somewhere online.
Greetings,
Lars
I'd like to make an Android app that lets a user apply cool effects to photos taken with the camera. There are already a few out there, I know, but I'd like to try my own hand at one.
I have been googling and stack-overflowing, but so far I've mostly found some references to published papers or books. I am ordering this one from Amazon presently - Digital Image Processing: An Algorithmic Introduction using Java
After some reading, I think I have a basic understanding of manipulating the RGB values of all the pixels in the image. My main question is: how do I come up with a transformation that produces cool effects?
By cool effects I mean some like those in these iPhone apps:
ToyCamera
Polarize
I already have quite a bit of experience with Java, and I've made my first app for android already. Any ideas? Thanks in advance.
There are specific classes to do this for you!
Take a look here for grayscale image processing
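Since the linked example is not reproduced here, a minimal grayscale sketch in plain Java: it works on packed ARGB ints, the format Android's Bitmap.getPixels() fills in and Bitmap.setPixels() accepts (setPixels needs a mutable bitmap). The BT.601 luma weights are one common choice, not the only one, and the class name is my own.

```java
// Sketch: luminance-weighted grayscale on packed ARGB ints (the format used by
// Android's Bitmap.getPixels/setPixels). Pure Java, so it also runs off-device.
class Grayscale {
    static void toGray(int[] argb) {
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int a = p >>> 24, r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            // ITU-R BT.601 luma weights; they sum to 1, so y stays within 0..255.
            int y = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            // Keep the original alpha, write the same value to all three channels.
            argb[i] = (a << 24) | (y << 16) | (y << 8) | y;
        }
    }
}
```

On Android you would fill the array with bitmap.getPixels(pixels, 0, width, 0, 0, width, height), call toGray(pixels), and write it back with setPixels using the same arguments.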
Here is something in C# which is similar enough to Java that you should get the idea.
If you want to do something unique, you might have to experiment with tweaking the RGB ratios yourself.
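One structured way to experiment with RGB ratios is a color matrix: each output channel is a weighted sum of the input red, green, and blue plus an offset. The sketch below applies a 3x4 matrix to packed ARGB pixels; the SEPIA coefficients are a commonly quoted sepia approximation, and the class name is my own. Changing the effect is then just a matter of editing the numbers.

```java
// Sketch: apply a 3x4 color matrix (rows = output r, g, b; columns = input r, g, b
// weights plus a constant offset) to packed ARGB pixels, clamping to 0..255.
class ColorMatrixEffect {
    // A commonly quoted sepia approximation.
    static final double[][] SEPIA = {
        {0.393, 0.769, 0.189, 0},
        {0.349, 0.686, 0.168, 0},
        {0.272, 0.534, 0.131, 0},
    };

    static int clamp(double v) { return (int) Math.max(0, Math.min(255, Math.round(v))); }

    static void apply(int[] argb, double[][] m) {
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int a = p >>> 24; // alpha is passed through unchanged
            double r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            int nr = clamp(m[0][0] * r + m[0][1] * g + m[0][2] * b + m[0][3]);
            int ng = clamp(m[1][0] * r + m[1][1] * g + m[1][2] * b + m[1][3]);
            int nb = clamp(m[2][0] * r + m[2][1] * g + m[2][2] * b + m[2][3]);
            argb[i] = (a << 24) | (nr << 16) | (ng << 8) | nb;
        }
    }
}
```

Android also ships android.graphics.ColorMatrix and ColorMatrixColorFilter, which apply a 4x5 matrix while drawing with a Paint, so once you've found numbers you like you may not need a per-pixel loop at all.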