Guitar pitch detection in Android

I am trying to develop a guitar game on the Android platform.
I need to do real-time pitch detection to get the frequency of a guitar string/chord.
I will take the input from the microphone and then analyze it (to determine which guitar string/chord is being played).
I have found two methods I could use: one is YIN, the other is FFT.
Which method gives better performance and more accurate results?

You need to first understand what 'pitch' really is (read the Wikipedia link below). When a single note is made on a guitar or piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd Harmonic, 785 Hz would be the 3rd Harmonic, 1046 Hz would be the 4th Harmonic, etc. These later harmonics are integer multiples of the fundamental frequency, 261.6 Hz (e.g. 2 x 261.6 ≈ 523, 3 x 261.6 ≈ 785, 4 x 261.6 ≈ 1046).
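To make the arithmetic concrete, here is a quick Java sketch (my own illustration, not code from any of the projects mentioned here) that prints the harmonic series of a fundamental:

```java
public class HarmonicSeries {
    public static void main(String[] args) {
        double fundamental = 261.6; // Middle C on a piano, in Hz
        // Each harmonic sits at an integer multiple of the fundamental.
        for (int n = 1; n <= 4; n++) {
            System.out.printf("Harmonic %d: %.1f Hz%n", n, n * fundamental);
        }
    }
}
```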
Below, at GitHub.com, is the C++ source code for an unusual two-stage algorithm that I devised which can do Realtime Pitch Detection on polyphonic MP3 files while they are being played on Windows. This free application (PitchScope Player, available on the web) is frequently used to detect the notes of a guitar or saxophone solo on an MP3 recording. You could download the executable for Windows to see my algorithm at work on an MP3 file of your choosing. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within a MP3 or WAV music file. Note onsets are accurately inferred by a change in the most dominant pitch (a musical note) at any given moment during the MP3 recording.
I use a modified DFT Logarithmic Transform (similar to an FFT) to first detect these possible harmonics by looking for frequencies with peak levels (see diagram below). Because of the way that I gather data for my modified Log DFT, I do NOT have to apply a Windowing Function to the signal, nor do I need overlap-add. And I have created the DFT so its frequency channels are logarithmically located in order to directly align with the frequencies where harmonics are created by the notes on a guitar, saxophone, etc.
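As a rough illustration of what 'logarithmically located' channels mean (my own sketch, not the PitchScope source), the equal-tempered note frequencies that such channels can be centered on are computed like this:

```java
public class LogSpacedChannels {
    public static void main(String[] args) {
        // Equal-tempered note frequencies: f(n) = 440 * 2^((n - 69) / 12),
        // where n is the MIDI note number (69 = A4 = 440 Hz).
        // Log-spaced DFT channels can be centered on these frequencies so
        // every channel lands on a possible note (and on its harmonics).
        for (int midi = 40; midi <= 52; midi++) { // E2..E3, the low guitar range
            double freq = 440.0 * Math.pow(2.0, (midi - 69) / 12.0);
            System.out.printf("MIDI %d -> %.2f Hz%n", midi, freq);
        }
    }
}
```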
My Pitch Detection Algorithm is actually a two-stage process: a) first the ScalePitch is detected ('ScalePitch' has 12 possible pitch values: {E, F, F#, G, G#, A, A#, B, C, C#, D, D#}); b) after the ScalePitch is determined, the Octave is calculated by examining all the harmonics for the 4 possible Octave-Candidate notes. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within a polyphonic MP3 file. That usually corresponds to the notes of an instrumental solo. Those interested in the C++ source code for my Two-Stage Pitch Detection algorithm might want to start at the Estimate_ScalePitch() function within the SPitchCalc.cpp file at GitHub.com.
https://github.com/CreativeDetectors/PitchScope_Player
https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection
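Purely as an illustration of the general idea behind octave-candidate scoring (this is my assumption about the shape of such an approach, not the actual Calc_Best_Octave_Candidate() code), one might sum the spectrum magnitude at each candidate's harmonics and keep the strongest:

```java
public class OctaveCandidateSketch {
    /**
     * Scores each octave candidate of a detected ScalePitch by summing the
     * spectrum magnitude at its integer-multiple harmonics, then returns the
     * fundamental with the strongest harmonic support. 'magnitudeAt' is a
     * stand-in for a lookup into the (log) DFT output.
     */
    static double pickFundamental(double[] octaveCandidatesHz,
                                  java.util.function.DoubleUnaryOperator magnitudeAt) {
        double bestFreq = octaveCandidatesHz[0];
        double bestScore = -1.0;
        for (double f0 : octaveCandidatesHz) {
            double score = 0.0;
            for (int h = 1; h <= 6; h++) {       // examine the first 6 harmonics
                score += magnitudeAt.applyAsDouble(h * f0);
            }
            if (score > bestScore) {
                bestScore = score;
                bestFreq = f0;
            }
        }
        return bestFreq;
    }
}
```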
Below is an image of a Logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic MP3 recording. It shows how the harmonics appear for individual notes on a guitar while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic has the same time-width. After the Octave of the note is determined, we know the frequency of the Fundamental.
The diagram below demonstrates the Octave Detection algorithm which I developed to pick the correct Octave-Candidate note (that is, the correct Fundamental), once the ScalePitch for that note has been determined. Those wishing to see that method in C++ should go to the Calc_Best_Octave_Candidate() function inside the file called FundCandidCalcer.cpp, which is contained in my source code at GitHub.


Recognize knock (clap) by sound in Android

In my task, our Android mobile app needs to recognize a knock sound (a knock on the surface of the mobile device) in order to open the app.
I tried several approaches, but they only recognize about 80% of knocks (sometimes I knock the phone but it is not recognized as a knock sound) and sometimes they recognize other sounds as knocks, such as the vowel 'a'.
Here are the 3 methods we used:
1. Recognizing with a high-pass filter
2. Using the sum of magnitudes from 13 kHz to 18 kHz (see the referenced article)
3. Using a library (see the referenced link)
All of these efforts only recognize about 80% of knock sounds, and sometimes they recognize other sounds as knocks.
I am not sure about knock characteristics or how to recognize a knock reliably (it does recognize a knock when I clap the phone directly). Any help is greatly appreciated!
Recognize by high-pass filter
No relation to knock
Using the sum of magnitudes from 13 kHz to 18 kHz
This is a reasonable direction, but you need to add more features, in particular the energy in nearby frames.
Using a library
Not relevant
None of your methods works because none of them relates to the properties of a knock. To detect a knock properly, you need to figure out what distinguishes it from other sounds:
A knock is very short in time
Knock frequencies are in the higher part of the spectrum
So you need to implement the following algorithm:
Split the audio into frames
Compute an FFT for every frame
Analyze the FFT of every frame together with its neighboring frames, and check the following:
Spectral energy for the frame is concentrated in the upper part of the spectrum
The energy of the frame is significantly higher than the energy of the neighboring frames
Once you see both features, you can signal that a knock was detected.
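Here is a minimal Java sketch of that frame analysis. The sample rate, frame size, band edges, and both thresholds are assumptions that would need tuning, and the naive DFT should be replaced with a real FFT in production:

```java
import java.util.ArrayList;
import java.util.List;

/** Frame-based knock detector sketch. */
public class KnockDetector {
    static final int SAMPLE_RATE = 44100;
    static final int FRAME = 1024;  // ~23 ms per frame

    /** Naive DFT energy of one frame in [fLo, fHi] Hz.
     *  Fine for a sketch; swap in a real FFT for production use. */
    static double bandEnergy(short[] x, int off, double fLo, double fHi) {
        int kLo = (int) (fLo * FRAME / SAMPLE_RATE);
        int kHi = (int) (fHi * FRAME / SAMPLE_RATE);
        double energy = 0;
        for (int k = kLo; k <= kHi; k++) {
            double re = 0, im = 0;
            for (int n = 0; n < FRAME; n++) {
                double w = 2 * Math.PI * k * n / FRAME;
                re += x[off + n] * Math.cos(w);
                im -= x[off + n] * Math.sin(w);
            }
            energy += re * re + im * im;
        }
        return energy;
    }

    /** Returns the sample offsets of frames that look like knocks. */
    static List<Integer> detect(short[] pcm) {
        int frames = pcm.length / FRAME;
        double[] total = new double[frames];   // time-domain energy per frame
        for (int i = 0; i < frames; i++) {
            for (int n = 0; n < FRAME; n++) {
                double s = pcm[i * FRAME + n];
                total[i] += s * s;
            }
        }
        List<Integer> knocks = new ArrayList<>();
        for (int i = 1; i < frames - 1; i++) {
            // 1) short transient: much more energy than neighboring frames
            double neighbors = (total[i - 1] + total[i + 1]) / 2 + 1e-9;
            if (total[i] < 5 * neighbors) continue;
            // 2) spectral energy concentrated in the upper part of the spectrum
            double high = bandEnergy(pcm, i * FRAME, 13000, 18000);
            double low  = bandEnergy(pcm, i * FRAME,   300, 13000);
            if (high > low) knocks.add(i * FRAME);
        }
        return knocks;
    }
}
```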
For reference, see a spectrogram of a knock (figure omitted here).
A related algorithm with explanation is also covered here:
Given an audio stream, find when a door slams (sound pressure level calculation?)
If you want to further discriminate between sounds, for example to distinguish clicks and door slams from claps, then you might want to implement a classifier for the spectrum. You will need to collect more examples of claps and other sounds and apply a machine learning toolkit to the FFT values. An SVM should work reasonably well for this task.

what is the maximum sound recording capacity of mobile hardware?

I am developing an Android app for recording sound. In my app I will display the SPL (Sound Pressure Level) in dB. In the course of my research, I came across a claim that mobile hardware can only record sounds up to about 110 dB. The reason given is that mobiles are designed for recording the human voice, which falls within a range of about 60 dB. So if I need to record sounds louder than 110 dB, how will the mobile hardware respond? Do I need to rely on external devices rather than the mobile itself? Please provide your comments.
Thanks & regards,
Siva.
Your question is in fact about the dynamic range of the audio input of a mobile phone - any value you record must be capable of being represented in the scale used to measure it.
There is an associated question of the largest sound pressure level a particular phone can record, but this is ultimately limited by the dynamic range and the design of the transducer used. Any absolute measure is relative to a calibration point, which in digital audio systems is dB FSD (i.e. the ratio of a sample to the maximum), yielding negative values.
The dynamic range in dB of an ideal PCM system is limited by quantisation noise and is related directly to the bit depth (Q) of the samples:
SQNR = 20*log10(2^Q) ≈ 6.02*Q dB
State-of-the-art ADCs used in pro-audio equipment typically have a 24-bit sample depth, giving an SQNR of 144 dB. It's worth noting that in silicon ADCs and DACs, the thermal noise floor of the analogue section of the converter leaves a usable dynamic range smaller than this, so the LSB might as well be random.
AFAIK, Android uses 16-bit PCM, which has an SQNR of 96 dB. This is the same performance as the CD audio standard. An SNR of 110 dB wouldn't be bad for pro-audio equipment.
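The relationship is easy to check numerically:

```java
public class Sqnr {
    public static void main(String[] args) {
        // SQNR of an ideal PCM system: 20*log10(2^Q) ≈ 6.02*Q dB
        for (int q : new int[] {8, 16, 24}) {
            double sqnr = 20.0 * Math.log10(Math.pow(2.0, q));
            System.out.printf("%2d-bit: %.1f dB%n", q, sqnr);
        }
    }
}
```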
In practice, audio quality is rarely a headline feature of phones and most get nowhere near this. Most users use crappy headphones or the on-board speaker of their phone for voice calls and won't notice the difference. It's an obvious corner to cut from both a cost and power budget point of view for a phone manufacturer.
Additionally, good digital audio design is a black art. Factors such as decoupling of digital signals from ground and physical proximity of analogue components come into play. You find that in tear-downs of Apple kit, they often place the codec right next to the headphone jack and away from the main board of the system. Other cost-conscious manufacturers don't do this, and it degrades the dynamic range of the system.
In order to get meaningful measurements from the audio input you will need to disable both automatic gain control (AGC) and probably the HPF (high-pass filter, used to remove DC bias, and often set with Fc > 100 Hz for voice calls).
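On Android specifically, something along these lines should get you closer to a raw signal (a sketch: the UNPROCESSED source requires API 24 and is not guaranteed on every device, and the RECORD_AUDIO permission is needed):

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AutomaticGainControl;

public class RawAudioInput {
    static AudioRecord openUnprocessed() {
        int rate = 44100;
        int bufSize = AudioRecord.getMinBufferSize(rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        // UNPROCESSED (API 24+) requests a path without AGC, HPF, or noise
        // suppression; support varies by device, and VOICE_RECOGNITION is a
        // common fallback source with less processing than the default.
        AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.UNPROCESSED,
                rate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufSize);
        // Belt and braces: if the platform attached an AGC effect to this
        // session, disable it explicitly.
        if (AutomaticGainControl.isAvailable()) {
            AutomaticGainControl agc = AutomaticGainControl.create(record.getAudioSessionId());
            if (agc != null) agc.setEnabled(false);
        }
        return record;
    }
}
```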
If your intention is to record absolute SPL, you will need to calibrate the audio system of the device to a set-point. There is no standardisation of this between manufacturers (or even devices from any given manufacturer). Unless you fancy doing this for the devices on the market (of which there are a lot), you'll never provide universally accurate measurements.

Generate FFT and decode on Arduino

I really struggle with FFT, and now I need to communicate from the headphone jack of my Android device to an Arduino. There is currently a library for the Arduino (discussed in the blog post Real-time spectrum analyzer powered by Arduino) and one for Android too!
How should I start? How should I build audio signals which can ultimately be turned into FFTs, so that the Arduino can analyze them using the library and I can actuate things?
You are asking a very fuzzy question: "How should I build audio signals which can ultimately be turned into FFTs, so that the Arduino can analyze them and I can actuate things?". I am going to help you think through the problem; asking yourself the right questions is essential to get any answers.
Presumably, your audio signals are "coming from somewhere" - i.e. they are sound. This means that you need to convert them into a stream of numbers first.
Problem #1: converting the audio signal into a stream of numbers
This breaks down into three separate sub-problems:
Getting the signal to the right amplitude
Choosing the sampling rate needed
Digitizing and storing the data for later processing
Items (1) and (3) are related, since you need to know how you are going to digitize the signal before you can choose the right amplitude. For example, if you have a microphone as your sound input source, you will need to amplify the signal (and maybe add some automatic gain control) before feeding it into an ADC (analog to digital converter) that has a 5 V input range, since the microphone may have an output in the mV range. Without more information about the hardware you are using, there's not a lot to add here. It sounds from your tag that you are trying to do that inside an Android device - in which case I wonder how you intend to move the digital signal to the Arduino (over USB?).
The second point, "choosing the sampling rate", is actually very important. A sound signal contains many different frequencies - think of them as keys on the piano. In order to detect a high frequency, you need to sample the signal "faster than it is changing". There is a formal theorem called "Nyquist's Theorem" that states that you have to sample at 2x the highest frequency that is present in your signal. Note - it's not just "that you are interested in", but "that is present". If you sample a high-frequency signal with a low-frequency sample clock, it will appear "aliased" - it will show up in your output as something completely different. So before you digitize a signal you have to decide what the frequencies of interest are, and remove all higher frequencies with a filter. Let's say you are interested in frequencies up to 500 Hz (about 1 octave above middle C on a piano). To give your filter a chance to work, you might choose to cut off all frequencies above 1 kHz (filters "roll off" - i.e. they increase in strength over a range of frequencies), and would sample at 2 kHz. This means you get 2000 samples per second, and you need to figure out where to put them on your Arduino (memory fills up quickly on the little board).
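The aliasing effect mentioned above is easy to compute: a tone at frequency f, sampled at rate fs, shows up at the distance from f to the nearest integer multiple of fs. A small demonstration using the 2 kHz example:

```java
public class AliasDemo {
    public static void main(String[] args) {
        double fs = 2000.0; // sample rate from the example above: 2 kHz
        // A tone above fs/2 folds back to |f - fs*round(f/fs)|.
        for (double f : new double[] {400, 900, 1500, 1900}) {
            double alias = Math.abs(f - fs * Math.round(f / fs));
            System.out.printf("%.0f Hz tone appears at %.0f Hz%n", f, alias);
        }
    }
}
```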
Problem #2: analyzing the signal
Assuming that you have somehow captured a digital signal, your next task is analyzing it. The FFT is basically some clever math that tells you, for a given sound sample, "what keys on the piano were hit, and how hard". It breaks the sound signal into a series of frequency "bins", and determines how much energy is in each bin (it also computes the phase, but let's keep it simple). So if the input of an FFT algorithm is a sound sample, the output is an array of values telling you what frequencies were present in the signal. This is approximate, since it will find the "nearest bin". Sticking with the same analogy - if you were hitting a piano that's out of tune, the algorithm won't return "out of tune", but rather "a bit of C, and a bit of C sharp", since it cannot actually measure anything in between. The accuracy of an FFT is determined by the sampling frequency (which gives you the upper limit on the frequency you can detect) and the sample length: the longer you "listen" to the sample, the more subtle the differences you can "hear". So you have another trade-off to consider: if your audio signal changes rapidly, you have to sample for a short time (to capture the quick changes); but if you need an accurate frequency, you have to sample for a long time. For example if you are writing a Morse decoder, your sampling has to be short compared to a pause between "dits" and "dashes" - or they will slur together. Figuring out that a Morse tone is present is pretty easy though, since there will be a single tone (one bin in the FFT) that is much larger than the others.
Exactly how you implement these things depends on your application. The third step, "doing something with it", requires you to decide what is a meaningful signal. Again, if you are making a Morse decoder, you would perhaps turn an LED on when a single tone is present (one or two bins in the FFT have a much bigger value than the mean of the others), and off when it is not (all noise - lots of bins with approximately the same size). But without a LOT more information from you, there's not much more one can say to help you.
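For the Morse example, the "one bin much larger than the mean of the others" test can be sketched as follows (naive DFT for clarity, and the dominance factor is a guess to be tuned):

```java
public class ToneDetector {
    /** Returns true if a single tone dominates the sample: the largest DFT
     *  bin is far above the mean of the remaining bins.
     *  Usage: tonePresent(frame, 10.0) for a dominance factor of 10. */
    static boolean tonePresent(double[] samples, double dominanceFactor) {
        int n = samples.length;
        double[] mag = new double[n / 2];
        for (int k = 1; k < n / 2; k++) {   // naive DFT; use an FFT library for speed
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double w = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(w);
                im -= samples[t] * Math.sin(w);
            }
            mag[k] = Math.sqrt(re * re + im * im);
        }
        double max = 0, sum = 0;
        for (int k = 1; k < n / 2; k++) {
            max = Math.max(max, mag[k]);
            sum += mag[k];
        }
        double meanOfRest = (sum - max) / (n / 2 - 2);
        return max > dominanceFactor * meanOfRest;
    }
}
```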
You might learn a lot from reading the following articles:
http://www.arduinoos.com/2010/10/sound-capture/
http://www.arduinoos.com/2010/10/fast-fourier-transform-fft/
http://interface.khm.de/index.php/lab/experiments/frequency-measurement-library/

audio, balance the sound from 2 sources

I am working on phone-recording software (Android) which records a conversation between two people on a phone call. The output of each call is an audio file containing the sound of both the caller and the callee.
However, most of the time the voice from the phone this software runs on is clearer than the other.
So the problem I have now is: I have a sound file containing voices from two sources at different volumes. What should I do to make the volume of the voices from the two sources equal, given that the noise should not be amplified? Since this is a phone call, at any specific time only one person is speaking.
I see at least one straightforward solution: write a program that analyzes the waveform of the sound file, identifies the parts coming from the source with the quieter voice, and amplifies them to a level that balances with the other. However, this will not be easy to implement, and I also hope there is a better solution out there. Do you have any suggestions?
Thank you.
Well, the first thing to do is to get rid of all of the noise that you do not care about.
The spectrum you want to keep is 300 Hz to 3500 Hz.
You can cut all of the other frequencies, which will substantially cut your noise. You can then apply an auto-equalization gain profile, or even tap into the DSP profiles available on several devices.
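As a sketch of that band-limiting step (the Q value and the 8 kHz sample rate are assumptions), two standard RBJ-cookbook biquad sections, a high-pass at 300 Hz cascaded with a low-pass at 3500 Hz, look like this in Java:

```java
/** Telephone-band filter: high-pass at 300 Hz cascaded with low-pass at
 *  3500 Hz. Coefficients follow the well-known RBJ audio-EQ cookbook. */
public class VoiceBandFilter {
    final double[] b = new double[3], a = new double[3];
    double x1, x2, y1, y2;   // filter state (Direct Form I)

    VoiceBandFilter(double fc, double fs, boolean highpass) {
        double w0 = 2 * Math.PI * fc / fs;
        double alpha = Math.sin(w0) / (2 * 0.7071);  // Q = 1/sqrt(2)
        double cosw = Math.cos(w0);
        double a0 = 1 + alpha;
        if (highpass) {
            b[0] = (1 + cosw) / 2 / a0; b[1] = -(1 + cosw) / a0; b[2] = b[0];
        } else {
            b[0] = (1 - cosw) / 2 / a0; b[1] = (1 - cosw) / a0;  b[2] = b[0];
        }
        a[1] = -2 * cosw / a0;
        a[2] = (1 - alpha) / a0;
    }

    double process(double x) {
        double y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }

    public static void main(String[] args) {
        double fs = 8000;
        VoiceBandFilter hp = new VoiceBandFilter(300, fs, true);
        VoiceBandFilter lp = new VoiceBandFilter(3500, fs, false);
        double[] signal = new double[800]; // replace with real samples
        for (int i = 0; i < signal.length; i++) {
            signal[i] = lp.process(hp.process(signal[i]));
        }
    }
}
```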
I would also take a look at this whitepaper if you have a chance. (IEEE or ACM membership required).
An Auto-Equalization System Based on DirectShow Technology and Its Application in Audio Broadcast System of Radio Station
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5384659&contentType=Conference+Publications&searchWithin%3Dp_Authors%3A.QT.Bai+Xinyue.QT.
This is how I have solved this problem:
1. I decode the audio into a series of integer values, thanks to it being stored in WAV format.
The result is [xi], 0 < xi < 255.
2. Then I have to decide two custom values:
- A noise threshold: if xi > threshold, it is not noise (pretty naive!)
- How long a sound must last to count as a chunk of human voice
I chose 5 for the first value and 100 ms for the second.
3. My algorithm analyzes [xi] into [Yi], where each Y is an array of x values and each Y represents a chunk of human sound.
After that, I apply k-means with k = 2 and get two clusters of Y: one belonging to the person whose voice is louder, the other to the one with the softer voice.
4. What is left is pretty straightforward: I decide on a parameter M, multiply each x belonging to a Y of the softer voice by M, and get the final result.
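A compact Java sketch of those four steps (the threshold, the chunk length, the choice of M as the loud/soft ratio, and treating 8-bit WAV samples as unsigned values centered on 128 are all assumptions drawn from the description above):

```java
import java.util.ArrayList;
import java.util.List;

/** Segment the file into voice chunks, split the chunks into two loudness
 *  clusters with 1-D k-means (k = 2), then boost the quieter cluster. */
public class VoiceBalancer {
    static final int NOISE_THRESHOLD = 5; // |sample - 128| above this is "not noise"
    static final int MIN_CHUNK = 800;     // ~100 ms at 8 kHz

    static int[] balance(int[] x) {       // x: unsigned 8-bit WAV samples, 0..255
        // Step 1+2: collect [start, end) ranges that stay above the threshold
        List<int[]> chunks = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= x.length; i++) {
            boolean voiced = i < x.length && Math.abs(x[i] - 128) > NOISE_THRESHOLD;
            if (voiced && start < 0) start = i;
            if (!voiced && start >= 0) {
                if (i - start >= MIN_CHUNK) chunks.add(new int[] {start, i});
                start = -1;
            }
        }
        // Step 3a: mean absolute level of each chunk
        double[] level = new double[chunks.size()];
        for (int c = 0; c < chunks.size(); c++) {
            int[] r = chunks.get(c);
            for (int i = r[0]; i < r[1]; i++) level[c] += Math.abs(x[i] - 128);
            level[c] /= (r[1] - r[0]);
        }
        // Step 3b: 1-D k-means, k = 2 ("loud" vs "soft" speaker)
        double loud = 0, soft = Double.MAX_VALUE;
        for (double l : level) { loud = Math.max(loud, l); soft = Math.min(soft, l); }
        boolean[] isSoft = new boolean[level.length];
        for (int iter = 0; iter < 20; iter++) {
            double sumL = 0, sumS = 0;
            int nL = 0, nS = 0;
            for (int c = 0; c < level.length; c++) {
                isSoft[c] = Math.abs(level[c] - soft) < Math.abs(level[c] - loud);
                if (isSoft[c]) { sumS += level[c]; nS++; } else { sumL += level[c]; nL++; }
            }
            if (nL > 0) loud = sumL / nL;
            if (nS > 0) soft = sumS / nS;
        }
        // Step 4: boost the soft cluster by M so both speakers sit at a similar level
        double m = soft > 0 ? loud / soft : 1.0;
        int[] out = x.clone();
        for (int c = 0; c < chunks.size(); c++) {
            if (!isSoft[c]) continue;
            int[] r = chunks.get(c);
            for (int i = r[0]; i < r[1]; i++) {
                int v = (int) Math.round(128 + (x[i] - 128) * m);
                out[i] = Math.max(0, Math.min(255, v)); // clamp to 8-bit range
            }
        }
        return out;
    }
}
```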

what's the "mapping" between audio analog input(voltage) and digital data for microphone?

I want to implement RS-232 communication over the audio jack of an Android phone. I ran into a problem when trying to convert the audio voltage to digital data: I don't know what digital value a given audio voltage will be converted to. In other words, I am asking about the "mapping" between analog audio and digital data.
Thanks!
There is no standard mapping between volts and "digits". With pro gear, several standards have been proposed. I have most often calibrated at 0 dBu = -10 dBFS, but a lot of (by no means all!) modern pro analog gear is pretty linear well above +10 dBu, so I'm not sure that calibration makes sense.
The mapping is defined by each individual A/D converter chip and affected by the associated electronic circuitry, which may add or subtract signal gain. In principle, a given A/D converter will convert its complete input analog range (whatever that is) to its complete output range (whatever that is). If I remember correctly (which I may not), several popular 16-bit Analog Devices A/D converters range over ±2.2 V, while others can operate in different ranges depending on what is supplied. In the 2.2 V case, that would mean that 0 V in is close to digital 0, +2.2 V is close to digital 32767, and -2.2 V is close to digital -32768. I say "close to" both because of the obvious asymmetry on the digital side, and the not-so-obvious effects of temperature, noise, frequency and so on.
Once the signal is converted from volts to "digits" by hardware, it may be further converted to a floating-point representation by software. There is no standard way this is done either. See: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html At least the various methods get close.
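For illustration, here are two common and mutually incompatible int-to-float conventions; which one a given piece of software uses is, as the linked article explains, not standardized:

```java
public class PcmToFloat {
    public static void main(String[] args) {
        short[] pcm = {Short.MIN_VALUE, -16384, 0, 16384, Short.MAX_VALUE};
        for (short s : pcm) {
            // Convention A: divide by 32768. Full scale on the negative side,
            // but positive samples can never quite reach +1.0.
            double a = s / 32768.0;
            // Convention B: divide by 32767. +32767 maps exactly to +1.0,
            // but -32768 slightly overshoots -1.0.
            double b = s / 32767.0;
            System.out.printf("%6d -> A: %+.6f  B: %+.6f%n", s, a, b);
        }
    }
}
```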
