I'm thinking of starting an Android project which records audio signals and does some processing to denoise them. My question is: since many (nearly all) denoising algorithms involve an FFT, is it possible for me to write a real-time program? By real-time I mean the program records and processes at the same time, so I don't have to wait for the processing after I finish recording.
I have made a sample project which applies a Fourier transform to the audio signal and implements a simple algorithm called sub-spectrum (spectral subtraction). But I found it difficult to run this algorithm in real time: after I press the 'stop' button, it takes a while to finish the processing and save the file (I'm also wondering how commercial recorder apps manage to record sound and save it at the same time). I know that my FFT may not be the fastest, but I'd like to know whether I could achieve 'real-time' if I fully optimized it or used the fastest FFT code available. Thanks a lot!
It sounds like you are talking about broadband denoising, so I'll address my answer to that. There are other kinds of denoising, from simple filtering to adaptive filtering to dynamic range expansion, and probably others.
I don't think anyone can answer this question with a simple yes or no. You will have to try it and see what can be done.
First off, there are a variety of FFT implementations of varying speed you could try, including FFTW. Some are faster than others, but at the end of the day they are all going to deliver comparable results.
This is one place where native C/C++ will outperform Java/Dalvik code, because it can truly take advantage of vector instructions. For that to work, you'll probably need to write some assembler, or find code that is already Android-optimized. I'm not aware of an Android-optimized FFT, but I'm sure one exists.
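If you want to stay in Java, JTransforms (mentioned further down this page) is a reasonable place to start. Just as an illustration, here's a minimal sketch of getting the magnitude spectrum of one frame of 16-bit PCM samples; the frame size is an assumption, and the package name differs between JTransforms versions:

```java
// Minimal sketch: magnitude spectrum of one audio frame with JTransforms.
// Older JTransforms builds use edu.emory.mathcs.jtransforms.fft instead of org.jtransforms.fft.
import org.jtransforms.fft.DoubleFFT_1D;

public class FrameFft {

    private static final int FRAME_SIZE = 1024;          // assumption: power-of-two frame
    private final DoubleFFT_1D fft = new DoubleFFT_1D(FRAME_SIZE);

    /** Returns the magnitude spectrum of one frame of 16-bit PCM samples. */
    public double[] magnitudeSpectrum(short[] pcm) {
        double[] buf = new double[FRAME_SIZE];
        for (int i = 0; i < FRAME_SIZE; i++) {
            buf[i] = pcm[i] / 32768.0;                    // normalize to [-1, 1)
        }
        fft.realForward(buf);                             // in-place, packed real/imag layout

        double[] mag = new double[FRAME_SIZE / 2];
        mag[0] = Math.abs(buf[0]);                        // DC bin; Nyquist bin (buf[1]) is skipped here
        for (int k = 1; k < FRAME_SIZE / 2; k++) {
            double re = buf[2 * k];
            double im = buf[2 * k + 1];
            mag[k] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }
}
```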
The real performance win will come from how you structure your overall denoising algorithm. All denoising I'm familiar with is extremely processor intensive and probably won't work on a phone in real-time, although it might on a tablet. That's just a(n educated) guess, though.
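To make that concrete, here's a rough sketch (not a tuned implementation) of what "recording and processing at the same time" can look like on Android: read the microphone frame by frame with AudioRecord and hand each frame to the denoising step as it arrives, instead of waiting for the stop button. The sample rate and frame size are assumptions, and denoiseFrame()/writeToOutputFile() are hypothetical placeholders for your own processing and file output:

```java
// Sketch of frame-by-frame capture: process each buffer as it is read,
// rather than recording everything first and processing afterwards.
// Requires the RECORD_AUDIO permission in the manifest.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class StreamingRecorder {

    private static final int SAMPLE_RATE = 44100;   // assumption
    private static final int FRAME_SIZE = 1024;     // assumption: samples per processed frame

    private volatile boolean running = true;

    public void captureAndProcess() {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, Math.max(minBuf, FRAME_SIZE * 4));

        short[] frame = new short[FRAME_SIZE];
        recorder.startRecording();
        try {
            while (running) {
                int read = recorder.read(frame, 0, FRAME_SIZE);   // blocks until a frame is available
                if (read > 0) {
                    short[] cleaned = denoiseFrame(frame, read);  // must finish faster than the frame duration
                    writeToOutputFile(cleaned, read);             // append incrementally, e.g. to a WAV stream
                }
            }
        } finally {
            recorder.stop();
            recorder.release();
        }
    }

    // Hypothetical placeholder: FFT -> spectral processing -> inverse FFT would go here.
    private short[] denoiseFrame(short[] frame, int length) {
        return frame;
    }

    // Hypothetical placeholder: stream the processed samples to disk as they are produced.
    private void writeToOutputFile(short[] samples, int length) {
    }
}
```

If denoiseFrame() consistently finishes in less time than one frame of audio takes to record (about 23 ms for 1024 samples at 44.1 kHz), you are effectively real-time; if it doesn't, frames back up and you are back to post-processing after the stop button.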
I am working on a voice messaging application and I need to compare two voices, like this:
Register with the app by recording your voice.
Send a voice message to another user by recording it, but first compare this voice to the recorded voice in the user's profile.
It's for security purposes: I need to know whether the recorded message is from a specific user or not.
I tried:
Compare two sound in Android
http://www.dreamincode.net/forums/topic/274280-using-fft-to-compare-two-audio-files-and-then-realtime-comparison/
but I'm not getting any idea about voice comparison from them.
Please share if anybody knows about this. I didn't find any sample that does it.
Since you indicated it's for security purpose, I'd like to first share a few things on voice biometry :-)
The problem with authenticating someone is that you'd need to be sure he was actually there saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.
Algorithms extracting voice features from a sample and later calculating the distance between a new sample and the first one can easily be fooled by a recording made up by an attacker.
Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But not completely impossible...
There is very good-sounding software created for the music industry that will, for example, take some voice audio input and make it sound (intonation- and time-wise) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. You could further tweak the audio by hand using other voice editing software and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.
Depending on what the attacker wants your user to say, the process complexity could range from an hour to a day provided he has all the recording material he wants...
To fight against this type of attack, you need to detect that the audio sample has been edited. Digital editing will leave unnatural traces, e.g. in the background noise surrounding the voice.
AFAICT, only the best commercial software achieves this level of security check, but I can't tell how far they go in detecting such edits.
From a pure security perspective, you'd also need to be sure the device was not compromised, so these voice verification checks should happen server-side and not on the phone itself.
Please note these are general considerations and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable, but it helps raise the bar so fewer attackers could potentially steal the car...
Another thing to consider is that biometry is by definition a statistical process and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you'll be able to lower one of them at the cost of raising the other.
Selecting an appropriate threshold will require you to have a fair amount of test data. Say, 1-minute recordings of at least 200 speakers to start getting a picture.
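To make the trade-off concrete, here's a tiny, purely illustrative sketch of sweeping an acceptance threshold over genuine and impostor scores from a test set and watching the false accept and false reject rates move in opposite directions (all scores and thresholds below are made up):

```java
// Hypothetical illustration: how an acceptance threshold trades false accepts
// against false rejects. Scores are arbitrary "similarity" values where
// higher means "more likely the same speaker".
public class ThresholdSweep {

    static double falseAcceptRate(double[] impostorScores, double threshold) {
        int accepted = 0;
        for (double s : impostorScores) {
            if (s >= threshold) accepted++;              // impostor wrongly accepted
        }
        return (double) accepted / impostorScores.length;
    }

    static double falseRejectRate(double[] genuineScores, double threshold) {
        int rejected = 0;
        for (double s : genuineScores) {
            if (s < threshold) rejected++;               // genuine user wrongly rejected
        }
        return (double) rejected / genuineScores.length;
    }

    public static void main(String[] args) {
        double[] genuine  = {0.91, 0.85, 0.78, 0.66, 0.95};   // made-up test data
        double[] impostor = {0.40, 0.55, 0.62, 0.71, 0.35};

        for (double t = 0.5; t <= 0.9; t += 0.1) {
            System.out.printf("threshold %.1f -> FAR %.2f, FRR %.2f%n",
                    t, falseAcceptRate(impostor, t), falseRejectRate(genuine, t));
        }
    }
}
```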
One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick, which in some cases might render the voice unrecognizable. The emotional state might also play a role: sadness or anger will yield different-sounding voices...
And last but not least, the surrounding noise might pose a problem. Say the user enrolled while at home and later records a message while on the go in a busy city environment: the system might have trouble making sure it's actually the same person speaking. The signal-to-noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance of the microphone to the mouth, the ratio will be quite different. You'll get a much better result when the user holds the phone close to their face, as in a regular phone conversation, than when the user looks at the screen while recording the message.
Voice variability and signal to noise ratio are probably the main reasons behind false negative results.
Hopefully, you now have a better understanding of the challenges awaiting you and I can start sharing some pointers for open source and commercial libraries.
AFAIK, there are no open source libraries that include fraudster detection...
You may want to check Nuance Communications for the state of the art. There are plenty of other vendors; just check with Google. I only mention Nuance because of its reputation.
There is an OSS library called Alize (written in C++, under the LGPL license) which uses features called MFCC (Mel Frequency Cepstrum Coefficients). MFCC is known to bring excellent results. Expect a steep learning curve, as this software is aimed at researchers willing to improve the state of the art on this topic, and the vocabulary used is very specific.
I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers, so you should be able to test it in a matter of minutes. The lib is very young and I first focused on its API before improving the algorithms. The algorithm I use for the moment is called Linear Predictive Coding (LPC) and is known to bring good results (and I do get good results, provided the recordings are of the same level of quality :-)). I'm currently in the process of releasing a new version including a likelihood coefficient in the match results. An MFCC implementation is on the roadmap.
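For the curious, here's a rough, generic sketch of what LPC feature extraction boils down to: autocorrelation of a windowed frame followed by the Levinson-Durbin recursion. This is not Recognito's actual code, just an illustration of the technique; the model order and frame handling are assumptions:

```java
// Generic LPC sketch (not Recognito's code): autocorrelation + Levinson-Durbin.
// The resulting coefficients can be used as a per-frame voice feature vector.
public final class Lpc {

    /** Computes 'order' LPC coefficients for one windowed audio frame. */
    static double[] lpcCoefficients(double[] frame, int order) {
        // Autocorrelation r[0..order]
        double[] r = new double[order + 1];
        for (int lag = 0; lag <= order; lag++) {
            double sum = 0;
            for (int n = lag; n < frame.length; n++) {
                sum += frame[n] * frame[n - lag];
            }
            r[lag] = sum;
        }

        // Levinson-Durbin recursion
        double[] a = new double[order + 1];   // a[0] is implicitly 1
        double error = r[0];
        for (int i = 1; i <= order; i++) {
            double acc = r[i];
            for (int j = 1; j < i; j++) {
                acc -= a[j] * r[i - j];
            }
            double k = (error == 0) ? 0 : acc / error;   // reflection coefficient

            double[] next = a.clone();
            next[i] = k;
            for (int j = 1; j < i; j++) {
                next[j] = a[j] - k * a[i - j];
            }
            a = next;
            error *= (1 - k * k);
        }

        // a[1..order] are the LPC coefficients used as features
        double[] coeffs = new double[order];
        System.arraycopy(a, 1, coeffs, 0, order);
        return coeffs;
    }
}
```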
There is plenty of javadoc and the code should be very straightforward...
https://github.com/amaurycrickx/recognito
Recognito has a dependency on javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
Given many people need something for android, I'll do something about it in the near future instead of saying how one should modify the lib :-)
HTH
I am trying to animate an object (mouth, eyebrows and some other expressions) to move in accordance with incoming sound. I was thinking of reading the sound modulation, detecting changes, and animating the movement of the objects accordingly.
Is this a good approach?
If not, how should I approach coding such a feature? Does this have to be done in OpenGL, or can I use the Android SDK and its animations?
I assume you mean frequency modulation, as opposed to amplitude modulation? Perhaps a combination of both? That might be pretty neat. I don't think there's any reason NOT to do this...
As to whether it's a "good" approach or not? I think it might be pretty cool.
The Android SDK has pretty robust animation built in at this point. You should be able to do the animations using the SDK just fine. If you get to the point where you really want to go wild with it, you might need to step down a layer for speed reasons.
Look at:
https://stackoverflow.com/questions/17163446/what-is-the-best-2d-game-engine-for-android/17166794#17166794
for pretty robust coverage of 2D games in Android, which might point you in the right direction.
Very interesting idea. I think there are two parts to the problem: input (sound) and output (graphics). For the graphics side, I don't think you need OpenGL per se. You could check out this guide to game development: http://www.techrepublic.com/blog/software-engineer/the-abcs-of-android-game-development-prepare-the-canvas/2157/
I think it is relevant because it deals with moving graphics in real time.
For the sound, it would be nice to have a numeric value derived from the frequency and amplitude of the sound. For analyzing the sound, perhaps this library could help you out: http://code.google.com/p/musicg/
It can:
- Read amplitude-time domain data
- Read frequency-time domain data
Then, you could progress through the amplitude data of a sound in real time and update the graphics accordingly.
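As a concrete illustration of that last idea (and not tied to musicg), here's a rough sketch that reads microphone buffers with AudioRecord, turns each one into an RMS amplitude, and maps it to a 0..1 "mouth openness" value you could feed into a View property or canvas drawing. The sample rate, buffer size and scaling factor are assumptions:

```java
// Rough sketch: map live microphone amplitude to a 0..1 value that can drive
// an animation (e.g. how far a mouth graphic is open). Not tied to musicg.
// Requires the RECORD_AUDIO permission; run this loop on a background thread.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class MouthDriver {

    private static final int SAMPLE_RATE = 22050;   // assumption
    private static final int BUFFER_SIZE = 2048;    // assumption: roughly 93 ms per update

    public interface Listener {
        void onMouthOpenness(float openness);        // 0 = closed, 1 = fully open
    }

    public void run(Listener listener) {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, Math.max(minBuf, BUFFER_SIZE * 2));

        short[] buf = new short[BUFFER_SIZE];
        recorder.startRecording();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                int read = recorder.read(buf, 0, BUFFER_SIZE);
                if (read <= 0) continue;

                // RMS amplitude of this buffer, normalized to 0..1
                double sum = 0;
                for (int i = 0; i < read; i++) {
                    double s = buf[i] / 32768.0;
                    sum += s * s;
                }
                double rms = Math.sqrt(sum / read);

                // Arbitrary scaling so normal speech roughly spans the range
                float openness = (float) Math.min(1.0, rms * 8.0);
                listener.onMouthOpenness(openness);   // update your View/canvas here (post to the UI thread)
            }
        } finally {
            recorder.stop();
            recorder.release();
        }
    }
}
```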
I am taking this crazy class on Mobile Programming. We have to do a final project, and I would like to do some sort of simple guitar processor app.
I wanted to do this in iOS, but it seems like the learning curve for iOS is too impractical for a short class.
No offense to anyone, but Droid is easier to program, at least for me. But I am confused about whether you can even get guitar input from a jack (not the mic) and then do some processing on the input and feed it to the output.
I'm aware of latency, which may or may not be a big deal for a class.
Does anyone know if Droid can do anything like this? If so, are there any articles or somewhere to start? I know with iOS you can at least buy a jack, and it seems to have tons of open source processing code, but I can't seem to find anything for Droid. All I have seen is "Ghetto Amp" for guitar stuff.
Any ideas?
Thanks
You may want to look at this project:
http://code.google.com/p/moonblink/wiki/Audalyzer
should be pretty useful :)
However the core class you will be using to pick up and look at audio streams is: http://developer.android.com/reference/android/net/rtp/AudioStream.html
I wrote a MIDI guitar for a college project a long time ago, in assembly for a Texas Instruments DSP. As long as you just played exactly one note, and were really careful about it, it could tell what you'd played.
Not much amplification was needed. In fact, I could get some notes even from an unamplified signal. I had oscilloscopes and a pretty generalized ADC to work with; you might have to amplify the signal... but if you do, be careful not to fry your input. Start low... and really, the more you can read up on the tolerances, the better.
Looks like they never made any hi-fi micro-USB 24-bit ADCs or wrote drivers for them. I guess there's no market. :) But if you're doing a school project and not producing the latest Muse album, get a path from your guitar to the headset line in:
http://androidforums.com/android-media/194740-questions-about-audio-recording-droid.html
I'd probably just sacrifice a cheap or broken headset to get the headset plug. (Maybe they sell appropriate tips at Radio Shack, but I've learned not to assume such things anymore :-/ ) After building a cable, I'd feed it an amplified signal from the guitar so I could control the gain level to whatever I wanted.
Depending on latency requirements you can use Java or NDK. Note this answer:
Need help about sound processing
(I have one of the original Droids sitting around in a drawer, I'm sure I could use it for something but I just haven't figured out what!)
I made a test application in Delphi that beeps Morse code using the Windows API Beep function. Then I made an application in Android that stores this Morse code in a WAV file. Now I want the Android application to decode the Morse code. Are there any tutorials on sound processing, or can somebody post some simple code (I think there's no simplicity here) as an example? Or maybe the steps that I need to take to get it to work?
I also downloaded the JTransforms and jfttw libraries but don't really know where to start.
Regards,
evilone
An FFT is overkill for this - you can just use a simple Goertzel filter to isolate the Morse tone from background noise, then decode the output of that filter.
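For reference, here's a minimal Goertzel sketch in Java that measures how much energy a block of samples contains at one target frequency; compare the result against a threshold to decide whether the tone is on or off. The tone frequency, sample rate and block size are up to you and should match the beep you generated:

```java
// Minimal Goertzel filter sketch: energy of one block of samples at a single
// target frequency. Tone frequency, sample rate and block size are assumptions.
public final class Goertzel {

    /**
     * Returns the squared magnitude at targetFreq (Hz) for one block of samples.
     * Compare this against a threshold to decide "tone present" vs "silence".
     */
    static double power(short[] samples, int sampleRate, double targetFreq) {
        double omega = 2.0 * Math.PI * targetFreq / sampleRate;
        double coeff = 2.0 * Math.cos(omega);

        double sPrev = 0, sPrev2 = 0;
        for (short sample : samples) {
            double s = sample / 32768.0 + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        // Squared magnitude of the filter output after the whole block
        return sPrev2 * sPrev2 + sPrev * sPrev - coeff * sPrev * sPrev2;
    }
}
```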
I think an older issue of QST magazine had an article on DSP for Morse/CW decoding several years back. You might want to try searching their archives.
Basically, you need DSP code to determine whether or not a tone is present at any given point in time, and an estimate of the onset and off-time of each tone. Then scale the duration of each tone and the gap times between the tones for the expected code speed, and compare against a table of timings for each Morse code letter to estimate the probability of each or any letter being present.
In the simplest case, you might have a dot-dash-space decision tree. In severe noise and fading plus highly personalized fist/timing you might need some sophisticated statistical and/or adaptive audio pattern matching techniques for decent results.
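To illustrate the simplest case mentioned above, here's a hypothetical sketch of the dot/dash/space decision once the tone detector has given you on/off durations; the dot length and the (truncated) lookup table are assumptions chosen for illustration:

```java
// Hypothetical sketch of the simplest timing decision tree: classify tone
// durations as dot or dash and gaps as intra-letter or inter-letter spaces,
// then look the dot/dash pattern up in a Morse table. Dot length is assumed known.
import java.util.HashMap;
import java.util.Map;

public class MorseTimingDecoder {

    private static final Map<String, Character> MORSE = new HashMap<>();
    static {
        MORSE.put(".-", 'A');  MORSE.put("-...", 'B'); MORSE.put("-.-.", 'C');
        MORSE.put("...", 'S'); MORSE.put("---", 'O');  // ...extend for the full alphabet
    }

    /**
     * toneDurations and gapDurations are milliseconds, as measured by the tone
     * detector; dotMs is the estimated length of one dot.
     */
    static String decode(int[] toneDurations, int[] gapDurations, int dotMs) {
        StringBuilder text = new StringBuilder();
        StringBuilder letter = new StringBuilder();

        for (int i = 0; i < toneDurations.length; i++) {
            // A dash is nominally 3 dots long; split the difference at 2 dots.
            letter.append(toneDurations[i] < 2 * dotMs ? '.' : '-');

            int gap = (i < gapDurations.length) ? gapDurations[i] : Integer.MAX_VALUE;
            if (gap >= 2 * dotMs) {                      // gap of ~3 dots ends the letter
                Character c = MORSE.get(letter.toString());
                text.append(c != null ? c : '?');
                letter.setLength(0);
                if (gap >= 5 * dotMs && i < toneDurations.length - 1) {
                    text.append(' ');                    // gap of ~7 dots ends the word
                }
            }
        }
        return text.toString();
    }
}
```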