processing human voice - android

I am trying to make an Android app that checks whether a person's recorded voice is high-frequency or not. I have finished the recording part but don't know how to proceed further.
After searching, I found that an FFT algorithm must be used, but the problem is how to get the array of values that must be passed as input to the algorithm.
Can anyone help please?

Assuming you have defined what is meant by "contains high frequency", and you merely need a measure of this (no need to visualize the frequency content in a graph), there is really no need to calculate the FFT.
I would calculate the RMS values of the signal (a measure of the total energy), then apply a low-pass filter on the data (in the time domain) and calculate the RMS values again on the filtered signal. Comparing the loss of energy is your measure of how much high frequency content was responsible for your initial energy value.
REPLY TO COMMENT:
You need data in order to process it! Perhaps I don't understand your question: what do you wish to "get exact values of"? You have stated that you "completed the recording part", so I assume you have the signal stored in memory. Now you need to calculate the total energy of the signal in order to either A) calculate the change of energy after filtering or B) compare the energy to some predefined hardcoded value (a bad idea, by the way).
Either way, this should be done in the time domain if all you want is a measure/value. Parseval's theorem says that the energy computed in the time domain equals the energy computed in the frequency domain, so there is no need to perform CPU-intensive processing and go over to the frequency domain to calculate the energy of a signal. http://en.wikipedia.org/wiki/Parseval's_theorem
ELABORATION:
When you record the user's voice (collect data for your signal) you need to ensure the data is not lost and is properly stored in memory (in some array-type object) and that you have a reference to this array. Once the data is collected, you don't need to convert your signal into values; it is already stored as a sequence of values. Therefore, you are now ready to perform some calculation in order to get a measure of "how much high frequency content there is"...
The RMS (root mean square) value is a standardized way of measuring the total energy of a signal - you take the "square-root of the average of all values squared". See http://mathworld.wolfram.com/Root-Mean-Square.html
The RMS is quick and easy to calculate, but it gives you the energy of the total signal, low frequency components and high frequency components together, and there is no way of knowing if a high RMS value is due to a lot of high frequency components or low frequency components. Therefore, I suggest you remove the high frequency components and calculate the RMS value again to see how much the total energy changed in doing so, i.e. how much the high frequencies were responsible for the initial "raw" RMS value. Dividing the two values gives your high frequency ratio measure... I'm not sure this is what you want to do, but it's what I would do.
In order to perform low-pass filtering you need to pick a frequency value Fcut and say that anything over it is considered "high", then apply a low-pass filter with the cut-off point set to Fcut. Applying a filter is done in the time domain by means of convolution.
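The RMS-before-and-after-filtering idea can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the class and method names are made up for illustration, the samples are assumed to be already available as a double[], and a recursive one-pole low-pass filter is used for brevity instead of an explicit convolution with a filter kernel. The cut-off of 1000 Hz in main is an arbitrary placeholder for Fcut.

```java
final class HighFrequencyRatio {

    /** Root mean square: square root of the mean of the squared samples. */
    static double rms(double[] x) {
        if (x.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += v * v;
        }
        return Math.sqrt(sumSquares / x.length);
    }

    /** One-pole low-pass filter; the filter type is an assumption of this sketch. */
    static double[] lowPass(double[] x, double cutoffHz, double sampleRateHz) {
        double rc = 1.0 / (2.0 * Math.PI * cutoffHz);
        double dt = 1.0 / sampleRateHz;
        double alpha = dt / (rc + dt);
        double[] y = new double[x.length];
        double state = 0.0;
        for (int i = 0; i < x.length; i++) {
            state += alpha * (x[i] - state);
            y[i] = state;
        }
        return y;
    }

    /** RMS after filtering divided by raw RMS; near 1 means little energy above the cut-off. */
    static double filteredToRawRatio(double[] x, double cutoffHz, double sampleRateHz) {
        double raw = rms(x);
        return raw == 0.0 ? 1.0 : rms(lowPass(x, cutoffHz, sampleRateHz)) / raw;
    }

    /** Test tone generator, used only for the demonstration in main. */
    static double[] sine(double freqHz, double sampleRateHz, int n) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = Math.sin(2.0 * Math.PI * freqHz * i / sampleRateHz);
        }
        return x;
    }

    public static void main(String[] args) {
        double fs = 44100.0;
        double fcut = 1000.0; // hypothetical Fcut
        System.out.println(filteredToRawRatio(sine(100.0, fs, 44100), fcut, fs));
        System.out.println(filteredToRawRatio(sine(8000.0, fs, 44100), fcut, fs));
    }
}
```

A ratio close to 1 means the filter removed little energy (mostly low frequencies); a small ratio means most of the energy sat above the cut-off. Where to draw the line between the two is still a decision you have to make for your application.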

The AudioRecord class is usually used for this. It delivers raw PCM data, on which you can then do the calculations.
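Once the raw PCM bytes are in a buffer, turning them into an array of numbers is plain Java. The sketch below assumes 16-bit mono samples in little-endian byte order (the usual layout of .wav data); the class and method names are hypothetical, and the Android recording code itself is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecoder {

    /**
     * Converts 16-bit little-endian PCM bytes to samples in the range [-1.0, 1.0).
     * A trailing odd byte, if any, is ignored.
     */
    static double[] toNormalizedSamples(byte[] pcm) {
        int sampleCount = pcm.length / 2;
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            samples[i] = buffer.getShort() / 32768.0;
        }
        return samples;
    }
}
```

The resulting double[] is the kind of array that an RMS calculation, a filter or an FFT routine takes as input.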

Related

How to compare audio in android?

I'm making an application in which I record audio directly from the phone's microphone, save the recording, and then need to compare it with audio already stored on the device.
The recordings are of engine "noise"; the idea is that the app indicates which of the saved cases the new recording resembles.
That is, I have two cases, a good engine and a damaged engine, and when I finish recording it must say "this audio belongs to a damaged engine".
From my reading, this has to be done with artificial intelligence, which is really complex. I have also read that you can "decompose" the audio into a vector of numbers or make comparisons using the FFT, but I can't find much information about it. I'd really appreciate your help.
The saved file type is .wav.
Comparing audio signals is a nontrivial task.
The audio is just a sequence of values (numbers), where the index is the "time" and the value is the amplitude (loudness) of the sound.
If you compare audio data as two arrays (sequences) element by element, iterating through the index, you would be lucky to get anything reasonable. You need some transformation of the array that gives aggregated information about the sequence of numbers as a whole (for example, the spectrum of the signal).
There are mathematical tools for this task, for example the well-known Fourier Transform that you mentioned, and the statistical tool of correlation (autocorrelation, or cross-correlation when two different sequences are compared), which measures how similar sequences of numbers are.
The correlation method can be relatively simple: you iterate over the two arrays of data and calculate their correlation. But you pay for that simplicity in the required quality (or preparation/normalization) of the signals: they should have similar duration. The value of the resulting normalized correlation shows how much the two sequences differ: a value near 0 means they are unrelated and a value near 1 means they are almost the same.
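As a rough illustration of the correlation approach, the sketch below computes a normalized correlation coefficient (Pearson's r) between two equal-length sample arrays at zero lag. The names are hypothetical, and it assumes the recordings have already been trimmed and aligned to the same length, which is exactly the preparation cost mentioned above.

```java
final class SignalCorrelation {

    /**
     * Normalized correlation of two equal-length sequences at zero lag.
     * Returns a value in [-1, 1]; 0.0 is returned if either sequence is constant.
     */
    static double correlation(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;

        double covariance = 0.0;
        double varianceA = 0.0;
        double varianceB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            covariance += da * db;
            varianceA += da * da;
            varianceB += db * db;
        }
        double denominator = Math.sqrt(varianceA * varianceB);
        return denominator == 0.0 ? 0.0 : covariance / denominator;
    }
}
```

Values near 1 indicate very similar shapes, values near 0 indicate unrelated sequences, and negative values indicate an inverted shape. For real engine recordings a time shift between the two signals would lower the result, so in practice the correlation is often evaluated over several lags.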
Implementing the Fourier Transform (FFT) is not a problem either: you can take a well-described algorithm and implement it yourself in any language without using third-party libraries. It does the job very well.
The FT gives you the spectrum of the signal, i.e. another set of values: a set of amplitudes per frequency (roughly, the array index is frequency instead of time as in the raw input signal). You can then compare two spectra almost like two arrays, iterating through the index (frequency), and decide on their similarity: calculate the deltas and see whether they fall into some acceptance interval (or use more rigorous statistical methods, e.g. a correlation function).
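The spectrum comparison can be sketched as below. To keep the example short it uses a direct O(n^2) discrete Fourier transform rather than a real FFT (an FFT produces the same magnitudes, only faster), and it uses the mean absolute difference between bins as the "delta"; both choices, and all names, are assumptions made for illustration.

```java
final class SpectrumCompare {

    /** Magnitude of each frequency bin from 0 to n/2, computed with a direct DFT. */
    static double[] magnitudeSpectrum(double[] x) {
        int n = x.length;
        double[] magnitudes = new double[n / 2 + 1];
        for (int k = 0; k < magnitudes.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            magnitudes[k] = Math.hypot(re, im);
        }
        return magnitudes;
    }

    /** Average per-bin difference between two spectra of equal length. */
    static double meanAbsoluteDifference(double[] s1, double[] s2) {
        if (s1.length != s2.length || s1.length == 0) {
            throw new IllegalArgumentException("spectra must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < s1.length; i++) {
            sum += Math.abs(s1[i] - s2[i]);
        }
        return sum / s1.length;
    }
}
```

A new recording could then be assigned to whichever stored reference (good engine or damaged engine) yields the smaller difference, provided all recordings use the same sample rate, length and overall level.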
As for a noisy signal, the noise is usually subtracted from the given data set (but for that you need to know the type of noise).
All of this belongs to the field of signal processing, and if you are working on such a project you need to learn more about it.
Bonus: a book for example

Using raw GNSS Measurement could one potentially increase sample rates?

I'm looking for ways to increase the position sample rate using an android phone. How to get a higher sample rate has been asked before about once a year here at SO.
But my question is more specific. By using the new Raw GNSS Measurements would it be possible to get a higher sample rate if I use the raw data and calculate the position myself?
Maybe I have misunderstood the point of the raw GNSS data, but in my ignorance I'm thinking that a phone like the Pixel 2, which supports data from GPS, GLONASS, GALILEO, BeiDou & QZSS, should theoretically get the data much more frequently than 1 Hz. But the chip itself only calculates and sends positions to the system at a 1 Hz sample rate.
But since there is raw data from five positioning systems, shouldn't it be possible to get not only a higher sample rate but also more accuracy?
So my question is whether it's possible, using the raw data, to get higher sample rates and better accuracy. Reading through the page above doesn't suggest much about it, and raw positioning data is not a specialty of mine.
The interval for raw GPS updates really depends on the internal GPS receiver's capabilities. No matter what feature Android provides, it can't invent more samples than the receiver provides.
Secondly, by supporting multiple satellite constellations, there is a higher chance that you will obtain a 3D fix - because there is more to choose from by the receiver - but that is not guaranteed. For example, if you are driving in downtown Manhattan N.Y., being surrounded by tall buildings will reduce satellite visibility across the board. Combining low precision samples from multiple constellations to generate high precision data would be quite complicated (I won't say impossible though).
I don't know if modern receivers perform this sort of combination, so I typically do not assume they do. And relegating this complex computation to your application - via Raw GNSS measurements - would be an interesting experiment...

Extract only high frequency from FFT

I am trying to do an FFT and extract high-frequency features on smartphones. It turns out to be too slow to do a full FFT on data sampled at 44100 Hz on smartphones, but downsampling will kill the high-frequency information because of the Nyquist theorem. Is there a way to speed up the FFT while retaining the higher frequencies?
It is not clear if you want to use the FFT information or if it is just a way to implement some filter.
For the first case you can subsample the data, i.e., run a highpass filter and then compress (downsample) the sequence. Yes, there will be aliasing, but you can still map particular frequencies from the FFT back to the original higher frequencies.
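The frequency mapping mentioned here can be sketched as follows. For a real signal that is downsampled to a new rate, a component at frequency f shows up at the distance between f and the nearest multiple of the new rate. The helper names and the "zone" parameter (which half-rate-wide band the high-pass filter kept, counted from 0) are assumptions made for this illustration.

```java
final class AliasMap {

    /** Frequency at which a component appears after sampling at decimatedRateHz. */
    static double aliasedFrequency(double freqHz, double decimatedRateHz) {
        double nearestMultiple = Math.rint(freqHz / decimatedRateHz) * decimatedRateHz;
        return Math.abs(freqHz - nearestMultiple);
    }

    /**
     * Maps an aliased frequency back to the original one, given the band that
     * was kept by the filter. Zone z covers z * rate / 2 up to (z + 1) * rate / 2;
     * odd zones are mirrored, even zones are not.
     */
    static double originalFrequency(double aliasHz, int zone, double decimatedRateHz) {
        int k = (zone + 1) / 2; // integer division on purpose
        return zone % 2 == 0
                ? k * decimatedRateHz + aliasHz
                : k * decimatedRateHz - aliasHz;
    }
}
```

For example, with 44100 Hz data downsampled by 4 (to 11025 Hz) after a high-pass filter that keeps only the top band (zone 3, 16537.5 Hz to 22050 Hz), a component at 18000 Hz appears at 4050 Hz, and originalFrequency(4050, 3, 11025) maps it back to 18000 Hz. This only works if the filter really removes everything outside the chosen band, since otherwise several original frequencies land on the same aliased one.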
If it is filtering, the filter should be reasonably long before you get any benefit from applying transform-based filtering. Also, if you do this, make sure you read up on overlap-add and overlap-save filtering and do not go with the all too common "let's take the FFT, multiply with an 'ideal' response and then an IFFT". This will in general not give the expected result (unless you expect a transfer function which is time varying and different from the 'ideal').

Generate FFT and decode on Arduino

I really struggle with FFT, and now I need to communicate from the headphone jack of my Android device to an Arduino. There's currently a library for Arduino (discussed in the blog post Real-time spectrum analyzer powered by Arduino) and one for Android too!
How should I start? How should I build audio signals which ultimately can be turned into FFTs and the Arduino can analyse the same using the library and I can actuate anything?
You are asking a very fuzzy question: "How should I build audio signals which ultimately can be turned into FFTs and the Arduino can analyse the same using the library and I can actuate anything?". I am going to help you think through the problem - asking yourself the right questions is essential to get any answers.
Presumably, your audio signals are "coming from somewhere" - i.e. they are sound. This means that you need to convert them into a stream of numbers first.
Problem #1: converting audio signal into a stream of numbers
This breaks down into three separate sub problems:
1. Getting the signal to the right amplitude
2. Choosing the sampling rate needed
3. Digitizing and storing the data for later processing
Items (1) and (3) are related, since you need to know how you are going to digitize the signal before you can choose the right amplitude. For example, if you have a microphone as your sound input source, you will need to amplify the signal (and maybe add some automatic gain control) before feeding it into an ADC (analog to digital converter) that has a 5 V input range, since the microphone may have an output in the mV range. Without more information about the hardware you are using, there's not a lot to add here. It sounds from your tag that you are trying to do that inside an Android device - in which case I wonder how you intend to move the digital signal to the Arduino (over USB?).
The second point, "choosing the sampling rate", is actually very important. A sound signal contains many different frequencies - think of them as keys on the piano. In order to detect a high frequency, you need to sample the signal "faster than it is changing". There is a formal theorem called "Nyquist's Theorem" that states that you have to sample at more than 2x the highest frequency that is present in your signal. Note - it's not just "that you are interested in", but "that is present". If you sample a high frequency signal with a low frequency sample clock, it will appear "aliased" - it will show up in your output as something completely different. So before you digitize a signal you have to decide what the frequencies of interest are, and remove all higher frequencies with a filter. Let's say you are interested in frequencies up to 500 Hz (about 1 octave above middle C on a piano). To give your filter a chance to work, you might choose to cut off all frequencies above 1 kHz (filters "roll off" - i.e. they increase in strength over a range of frequencies), and would sample at 2 kHz. This means you get 2000 samples per second, and you need to figure out where to put them on your Arduino (memory fills up quickly on the little board).
Problem #2: analyzing the signal
Assuming that you have somehow captured a digital signal, your next task is analyzing it. The FFT is basically some clever math that tells you, for a given sound sample, "what keys on the piano were hit, and how hard". It breaks the sound signal into a series of frequency "bins", and determines how much energy is in each bin (it also computes the phase, but let's keep it simple). So if the input of an FFT algorithm is a sound sample, the output is an array of values telling you what frequencies were present in the signal. This is approximate, since it will find the "nearest bin". Sticking with the same analogy - if you were hitting a piano that's out of tune, the algorithm won't return "out of tune", but rather "a bit of C, and a bit of C sharp", since it cannot actually measure anything in between. The accuracy of an FFT is determined by the sampling frequency (which gives you the upper limit on the frequency you can detect) and the sample length: the longer you "listen" to the sample, the more subtle the differences you can "hear". So you have another trade-off to consider: if your audio signal changes rapidly, you have to sample for a short time (to capture the quick changes); but if you need an accurate frequency, you have to sample for a long time. For example, if you are writing a Morse decoder, your sampling has to be short compared to a pause between "dits" and "dashes" - or they will slur together. Figuring out that a Morse tone is present is pretty easy though, since there will be a single tone (one bin in the FFT) that is much larger than the others.
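The trade-off between listening time and frequency accuracy comes down to simple arithmetic: the spacing between FFT bins is the sampling rate divided by the number of samples. The sketch below shows that arithmetic in Java for readability (it is not Arduino code); the helper names are hypothetical, and the 2 kHz rate is the example figure used earlier in this answer.

```java
final class FftBins {

    /** Spacing between neighbouring bins: sampling rate divided by FFT length. */
    static double binWidthHz(double sampleRateHz, int fftSize) {
        return sampleRateHz / fftSize;
    }

    /** Index of the bin whose centre is closest to the given frequency. */
    static int nearestBin(double freqHz, double sampleRateHz, int fftSize) {
        return (int) Math.round(freqHz * fftSize / sampleRateHz);
    }

    /** Centre frequency of a bin. */
    static double binCenterHz(int bin, double sampleRateHz, int fftSize) {
        return bin * sampleRateHz / fftSize;
    }
}
```

At 2 kHz with 256 samples (0.128 s of audio) the bins are 7.8125 Hz apart, so tones at 500 Hz and 503 Hz both fall into bin 64 and cannot be told apart. With 2048 samples (about 1 s of audio) they fall into bins 512 and 515, at the cost of reacting eight times more slowly to changes in the signal.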
Exactly how you implement these things depends on your application. The third step, "doing something with it", requires you to decide what is a meaningful signal. Again, if you are making a Morse decoder, you would perhaps turn an LED ON when a single tone is present (one or two bins in the FFT have much bigger value than the mean of the others), and OFF when it is not (all noise - lots of bins with approximately the same size). But without a LOT more information from you, there's not much more one can say to help you.
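The "one bin much bigger than the mean of the others" test for the Morse example could look roughly like this. It is written in Java for illustration only, assumes the FFT magnitudes are already available as an array, and uses a hypothetical ratio parameter that would have to be tuned on real data.

```java
final class ToneDetector {

    /**
     * Returns true when the largest bin exceeds the mean of all other bins
     * by more than the given ratio, i.e. when a single tone dominates.
     */
    static boolean tonePresent(double[] magnitudes, double ratio) {
        if (magnitudes.length < 2) {
            return false;
        }
        int peak = 0;
        double sum = 0.0;
        for (int i = 0; i < magnitudes.length; i++) {
            sum += magnitudes[i];
            if (magnitudes[i] > magnitudes[peak]) {
                peak = i;
            }
        }
        double meanOfOthers = (sum - magnitudes[peak]) / (magnitudes.length - 1);
        return magnitudes[peak] > ratio * meanOfOthers;
    }
}
```

The LED would be switched ON while tonePresent returns true and OFF otherwise. A tone that falls between two bins spreads over both, so a practical detector might sum the peak and its neighbours before comparing.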
You might learn a lot from reading the following articles:
http://www.arduinoos.com/2010/10/sound-capture/
http://www.arduinoos.com/2010/10/fast-fourier-transform-fft/
http://interface.khm.de/index.php/lab/experiments/frequency-measurement-library/

How to best determine volume of a signal?

I want to determine the volume of an audio signal.
I have found two options:
1. Compute the root mean square (RMS) of the amplitude
2. Find the maximum amplitude
Are there advantages to using #1 or #2?
Here is what I am trying to do:
I want my Android to analyze audio from the microphone. I want the device to detect a loud noise. The input is a short [].
If you use the maximum amplitude (2), your volume level is determined by a single sample (which you might not even be able to hear). When calculating a value that correlates with your impression of the loudness of the sound, such as the Sound Pressure Level or the Sound Power Level, you need to use the RMS (1).
Because your ear is not equally sensitive to all frequencies, a better correlate of your perception can be had by applying an A-weighting to the signal. Split (filter) the signal into octave bands, calculate the RMS for each band and apply the A-weighting.
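The difference between the two measures is easy to see on a short[] buffer like the one in the question. The following is a minimal sketch with hypothetical names; it leaves out the A-weighting step and only contrasts plain RMS with the peak value.

```java
final class Loudness {

    /** Root mean square of 16-bit samples. */
    static double rms(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short s : samples) {
            sumSquares += (double) s * s;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /** Largest absolute sample value in the buffer. */
    static int peak(short[] samples) {
        int max = 0;
        for (short s : samples) {
            max = Math.max(max, Math.abs((int) s));
        }
        return max;
    }
}
```

A buffer of 1000 silent samples containing a single click of 30000 has a peak of 30000 but an RMS of only about 949, whereas a steady signal at 10000 has both a peak and an RMS of 10000. A loud-noise detector based on the peak would therefore react to the click, and one based on RMS would react to the sustained sound.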
If you want to check the volume level, just compute its dB value (I assume the signal is normalized, i.e. 1 == maximum level):
level[n] = 20 * log10(abs(signal[n]));
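In Java the same calculation might look like the sketch below. It assumes a base-10 logarithm and takes the absolute value so that negative samples are handled; the class and method names are made up for illustration.

```java
final class Decibels {

    /**
     * Level in dB relative to full scale for a sample normalized so that
     * 1.0 is the maximum level. A sample of exactly 0 yields negative infinity.
     */
    static double toDecibels(double normalizedSample) {
        return 20.0 * Math.log10(Math.abs(normalizedSample));
    }
}
```

A full-scale sample gives 0 dB and every factor of ten in amplitude is worth 20 dB, so 0.1 gives -20 dB. For a short[] input, each value would first be divided by 32768 to normalize it.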
However, detecting audio noise is not a trivial task. The most common and simple technique is an algorithm called a noise gate, which basically compares the signal level with some dB threshold value: if the signal level is below the threshold, the output is zeroed. In that bare form it is unusable in practice; there must also be some attack and release times for smooth thresholding, otherwise it would also affect the real signal (music, speech) and produce a kind of clipping.
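A very reduced noise gate with attack and release smoothing could be sketched as follows. A gate normally mutes the output while the level is below the threshold and lets it through while the level is above it. Everything here is an assumption made for illustration: the names, the use of the instantaneous sample magnitude as the "level", and the per-sample smoothing coefficients between 0 and 1 that stand in for proper attack and release times.

```java
final class NoiseGate {

    /**
     * Applies a gate to samples normalized to [-1, 1].
     * The gain moves towards 1 when the level is at or above the threshold
     * (at the attack rate) and towards 0 when it is below (at the release rate).
     */
    static double[] apply(double[] x, double thresholdDb, double attack, double release) {
        double threshold = Math.pow(10.0, thresholdDb / 20.0);
        double[] y = new double[x.length];
        double gain = 0.0; // the gate starts closed
        for (int i = 0; i < x.length; i++) {
            double target = Math.abs(x[i]) >= threshold ? 1.0 : 0.0;
            double coefficient = target > gain ? attack : release;
            gain += coefficient * (target - gain);
            y[i] = x[i] * gain;
        }
        return y;
    }
}
```

A fast attack with a slow release (for example 0.5 and 0.01) lets the gate open quickly but close gradually, so the zero crossings of a loud waveform do not chop the signal. A more realistic gate would measure the level with an envelope follower or a short RMS window rather than from single samples.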
Check Google, it will give you a lot of resources about NoiseGate algorithm and noise removal techniques:
http://en.wikipedia.org/wiki/Noise_gate
http://www.developer.com/java/other/article.php/3599661/Adaptive-Noise-Cancellation-using-Java.htm
