I'm using AudioRecord to record sound and do some processing in pseudo-realtime on an Android phone.
I'm facing a problem involving the FFT and the convolution of an audio signal:
I perform an FFT on a known signal (a sine waveform), and I always correctly find the single tone contained in it.
Now I want to do the same thing using convolution (as an exercise): I perform 5000 convolutions of that signal, using 5000 filters. Each filter is a sine waveform at a different frequency between 0 and 5000 Hz.
Then I search for the peak of each convolution output. This way I should find the maximum peak when I use the filter with the same tone contained in the signal.
Indeed, with a 2 kHz tone I find the maximum with the 2 kHz filter.
The problem is that when I receive a 4 kHz tone, I find the maximum on the convolution with the 4200 Hz filter (while the FFT always works fine).
Is that mathematically possible?
What is the problem in my convolution?
This is the convolution function I wrote:
// I do the convolution and return the max
// in is the array with the signal
// dataSize is the size of the array in
// kernel is the filter containing the sine at the selected frequency
int convolveAndGetPeak(short[] in, int dataSize, double[] kernel) {
    // to avoid risking overflow, the kernel must have a maximum amplitude of 1/10 of the max
    int i, j, k;
    int kernelSize = kernel.length;
    int tmpSignalAfterFilter = 0;
    double out;
    // convolution from out[0] to out[kernelSize-2]
    for (i = 0; i < kernelSize - 1; ++i) {
        out = 0; // init to 0 before sum
        for (j = i, k = 0; j >= 0; --j, ++k)
            out += in[j] * kernel[k];
        if (Math.abs((int) out) > tmpSignalAfterFilter) {
            tmpSignalAfterFilter = Math.abs((int) out);
        }
    }
    // convolution from out[kernelSize-1] to out[dataSize-1] (last), continuing from where we left off
    for (; i < dataSize; ++i) {
        out = 0; // initialize to 0 before accumulating
        for (j = i, k = 0; k < kernelSize; --j, ++k)
            out += in[j] * kernel[k];
        if (Math.abs((int) out) > tmpSignalAfterFilter) {
            tmpSignalAfterFilter = Math.abs((int) out);
        }
    }
    return tmpSignalAfterFilter;
}
The kernel, used as the filter, is generated this way:
// curFreq is the frequency of the filter in Hz
// kernelSamplesSize is the desired length of the filter (number of samples); for time-precision reasons I'm using a 20-sample length
// sampleRate is the sampling frequency
double[] generateKernel(int curFreq, int kernelSamplesSize, int sampleRate) {
    double[] curKernel = new double[kernelSamplesSize];
    for (int kernelIndex = 0; kernelIndex < curKernel.length; kernelIndex++) {
        // the part that makes this a sine wave
        curKernel[kernelIndex] = Math.sin(kernelIndex * (2.0 * Math.PI * curFreq / (double) sampleRate));
    }
    return curKernel;
}
If you want to try a convolution, the data contained in the IN array is here:
http://www.tr3ma.com/Dati/signal.txt
Note 1: the sampling frequency is 44100 Hz.
Note 2: the tone contained in the signal is a single 4 kHz tone (even though the convolution has its maximum peak with the 4200 Hz filter).
EDIT: I also repeated the test in an Excel sheet. The result is the same (of course, I'm using the same algorithm), and the algorithm still seems correct to me.
This is the Excel sheet I prepared, if you prefer to work in Excel: http://www.tr3ma.com/Dati/convolutions.xlsm
You change the bandwidth via two factors:
a) the length of your kernel (e.g. a length t of 5 ms gives a rough bandwidth of f >= 200 Hz, estimated as 1/0.005 because Δt·Δf >= 1, see "Heisenberg"), and
b) the window function (which you should definitely implement to make your algorithm work in real-world applications, because otherwise in some cases the sidelobes of some filter outputs could carry more energy than the main lobe of the expected filter output).
But you have another problem: you need to convolve with a 2nd kernel consisting of cosine waves (which means that you need the same waves as in the 1st kernel but shifted by 90 degrees). Why is that? Because with only the sine kernel, you get a phase-dependent modulation of the filter outputs (e.g. if the phase difference between the input signal and the kernel wave with the identical frequency is 90 degrees you get the amplitude 0).
Finally, you combine the outputs of both kernels with Pythagoras.
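For illustration, here is a minimal Java sketch of that idea: a sine kernel and a cosine kernel whose outputs are combined with Pythagoras. The function name and the sliding-window peak search are my assumptions, not code from the question, and no window function is applied yet:

// Sketch: measure the energy of one frequency using a sine and a cosine kernel,
// then combine the two convolution outputs with Pythagoras so the result no
// longer depends on the phase of the input tone.
double detectTone(short[] in, int curFreq, int kernelSize, int sampleRate) {
    double[] sinKernel = new double[kernelSize];
    double[] cosKernel = new double[kernelSize];
    double w = 2.0 * Math.PI * curFreq / sampleRate;
    for (int k = 0; k < kernelSize; k++) {
        sinKernel[k] = Math.sin(w * k);
        cosKernel[k] = Math.cos(w * k); // same wave, shifted by 90 degrees
    }
    double maxMag = 0.0;
    // slide the window over the signal (full-overlap positions only)
    for (int i = 0; i + kernelSize <= in.length; i++) {
        double sinOut = 0.0, cosOut = 0.0;
        for (int k = 0; k < kernelSize; k++) {
            sinOut += in[i + k] * sinKernel[k];
            cosOut += in[i + k] * cosKernel[k];
        }
        double mag = Math.sqrt(sinOut * sinOut + cosOut * cosOut); // Pythagoras
        if (mag > maxMag) maxMag = mag;
    }
    return maxMag;
}

A window (e.g. Hann) would be applied to both kernels in the same loop that generates them.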
It all seems correct, apart from the number of samples in the kernel (the filter).
Increasing the size of the filter makes the result more accurate.
I don't know how to calculate the bandwidth of this filter, but it seems clear to me that it's a matter of filter bandwidth. So the filter bandwidth also depends on the number of samples of the filter used in the convolution, relative to the sampling frequency (and maybe also to the tone frequency). Unfortunately I cannot increase the number of samples of my filter too much, since otherwise the phone cannot perform the filtering in real time.
Note: I need the convolution because I need to identify the precise moment when the tone was fired.
EDIT: I made a comparison between a filter with 20 samples and a filter with 40 samples.
I don't know the formula to obtain the filter bandwidth, but the difference between the two filters is clear in the following image.
EDIT 2: A few days after posting the solution I found how to calculate the bandwidth of such a filter: it is just the inverse of the filter duration. So, for example, a kernel of 40 samples at 44100 Hz has a duration of about 907 µs, and the filter bandwidth, with this kernel and a window of the same length, is 1/907 µs ≈ 1.1 kHz.
[image: comparison between the 20-sample filter and the 40-sample filter] (source: tr3ma.com)
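A hedged one-liner for that rule of thumb (the helper name is mine):

// Rough filter bandwidth: the inverse of the kernel duration.
// Example: 40 samples at 44100 Hz -> ~907 µs -> ~1.1 kHz.
static double estimateBandwidthHz(int kernelSamples, int sampleRate) {
    return (double) sampleRate / kernelSamples; // == 1 / (kernelSamples / sampleRate)
}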
I've started implementing the Oboe C++ library for Android (following the "Build a Musical Game using Oboe" guide).
I just scale the samples to increase the volume, and it works, but with crackling and popping.
Can I increase the amplitude without getting the crackling and popping?
I tried saving my sample sounds with a little extra gain, but it sounds very bad.
Thanks.
By the way, without increasing the volume it sounds clear, but the volume is very low compared to other music apps.
for (int i = 0; i < mNextFreeTrackIndex; ++i) {
    mTracks[i]->renderAudio(mixingBuffer, numFrames);
    for (int j = 0; j < numFrames * kChannelCount; ++j) {
        audioData[j] += (mixingBuffer[j] * ((float) volume));
    }
}
Edited:
int16_t Mixer::hardLimiter(int16_t sample) {
    // scale in 32 bits so the value can exceed the int16_t range before clamping
    int32_t scaled = static_cast<int32_t>(sample * volume);
    if (scaled >= INT16_MAX) {
        return INT16_MAX;
    } else if (scaled <= INT16_MIN) {
        return INT16_MIN;
    }
    return static_cast<int16_t>(scaled);
}
The code which you've posted is from Mixer::renderAudio(int16_t *audioData, int32_t numFrames). Its job is to mix the sample values from the individual tracks together into a single array of 16-bit samples.
If you're mixing 2 or more tracks together without reducing the values first then you may exceed the maximum sample value of 32,767 (aka INT16_MAX). Doing so would cause wraparound (i.e. writing 32,768 to an int16_t will result in a value of -32,768 being stored) and therefore audible distortion/crackling.
With this in mind you could write a very basic (hard) limiter - do your volume scaling using an int32_t and only write it into the int16_t array if the value doesn't exceed the maximum, otherwise just write the maximum.
This isn't really a good approach though because you shouldn't be hitting the limits of 16-bit values. Better would be to scale down your input sample values first, then add a gain stage after the mixer (or on individual tracks inside the mixer) to bring the overall amplitude up to an acceptable level.
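For illustration only (the real Oboe mixer is C++), here is a minimal Java sketch of the hard-limited mixing idea described above; the method name, track layout, and gain parameter are assumptions of this sketch:

// Mix several 16-bit tracks into one buffer, accumulating in 32-bit so the sum
// cannot wrap around, then clamp to the int16 range before writing it back.
static void mixWithHardLimit(short[][] tracks, short[] out, float gain) {
    for (int j = 0; j < out.length; j++) {
        int acc = 0;
        for (short[] track : tracks) {
            acc += track[j];                        // 32-bit accumulation: no wraparound
        }
        int scaled = Math.round(acc * gain);        // apply volume after mixing
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;   // hard limit
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
        out[j] = (short) scaled;
    }
}

Even with the limiter in place, the cleaner fix is the one suggested above: keep the per-track levels low enough (for example by dividing by the number of tracks) that the limiter rarely engages, and add make-up gain afterwards.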
I'm analyzing audio signals on Android. I first tried with the MIC and succeeded. Now I'm trying to apply an FFT to MP3 data that comes from Visualizer.OnDataCaptureListener's onWaveFormDataCapture method, which is linked to a MediaPlayer. There is a byte array called byte[] waveform, and I get spectral leakage or overlap when I apply the FFT to this data.
public void onWaveFormDataCapture(Visualizer visualizer, byte[] waveform, int samplingRate)
I tried to convert the data into the -1..1 range by using the code below in a for loop:
// waveform varies in range of -128..+127
raw[i] = (double) waveform[i];
// change it to range -1..1
raw[i] /= 128.0;
Then I copy raw into the FFT buffers:
fftre[i] = raw[i];
fftim[i] = 0;
Then I call the FFT function:
fft.fft(fftre, fftim); // in: audio signal, out: fft data
As the final step, I convert these into magnitudes in dB and then draw the frequencies on screen:
// Ignore the first fft data which is DC component
for (i = 1, j = 0; i < waveform.length / 2; i++, j++)
{
magnitude = (fftre[i] * fftre[i] + fftim[i] * fftim[i]);
magnitudes[j] = 20.0 * Math.log10(Math.sqrt(magnitude) + 1e-5); // [dB]
}
When I play a sweep signal from 20 Hz to 20 kHz, I don't see what I see with the MIC. It doesn't draw a single moving line, but several symmetric lines moving away or coming closer. Somehow there is a weaker symmetric signal at the other end of the visualizer.
The same code, using 32768 instead of 128 in the division, works very well on MIC input with AudioRecord.
Where am I going wrong?
(And yes, I know there is a direct FFT output.)
The input audio is 8-bit unsigned mono. The line raw[i] = (double) waveform[i] causes an unintentional unsigned-to-signed conversion, and since raw is biased to approximately a 128 DC level, a small sine wave ends up getting changed into a high-amplitude modified square wave, as the signal crosses the 127/-128 boundary. That causes a bunch of funny harmonics (which caused the "symmetric lines coming and going" you were talking about).
Solution
Change to (double) (waveform[i] & 0xFF) so that the converted value lies in the range 0..255, instead of -128..127.
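Putting the fix into the question's loop, the conversion would look something like this (subtracting the 128 bias is optional, since the DC bin is skipped later, but it keeps the range symmetric around zero):

// waveform holds 8-bit unsigned samples delivered as Java (signed) bytes
for (int i = 0; i < waveform.length; i++) {
    int unsigned = waveform[i] & 0xFF;   // 0..255, no sign flip at the 127/128 boundary
    raw[i] = (unsigned - 128) / 128.0;   // remove the 128 DC bias, scale to -1..1
}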
I am trying to display some visualization effect of some PCM data.
The target is to display something like the following: [image: bar-style spectrum visualization]
I searched and found that JTransform is the correct library to use. However, I cannot find a good guide on how to use this library. How can I translate my PCM data into the band/frequency data that can be used to draw the bars?
Thanks a lot.
PCM audio is the digitized simplification of an analog audio curve ... this time-domain signal can be fed into a Discrete Fourier Transform API call to transform the data into its frequency-domain equivalent ... imaginary numbers and Euler's formula are your friends.
The easy part is calling the FFT; it's more involved to parse its output ...
Fill a buffer with at least 1024 points (make sure it's a power of 2) from your PCM and feed this into some FFT API call ... this will return its frequency-domain equivalent ... nail down the docs of whichever Discrete Fourier Transform API you use ... look up the notion of the Nyquist limit ... master the idea of a frequency bin ... keep at hand the number of samples per buffer and the sample rate of your PCM audio.
Be aware that as you increase the number of audio samples (PCM points on the audio curve) you feed into a Fourier Transform, the frequency resolution returned from that call gets finer; however, if your audio is a dynamic signal like music (as opposed to a static tone), this lowers the temporal specificity.
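As a small illustration of the frequency-bin idea (the helper name is mine): the center frequency of bin k is k times the sample rate divided by the FFT size, and only the first half of the bins is unique for real input.

// Frequency represented by FFT bin k of an N-point transform.
// Bins 0 .. N/2 (up to the Nyquist limit) carry the unique information for real input.
static double binFrequencyHz(int k, int fftSize, double sampleRateHz) {
    return k * sampleRateHz / fftSize;
}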
Here is a function I wrote in Go which wraps a call to a DFT. I feed it a raw PCM audio buffer normalized into floating point varying from -1 to +1; it makes the Discrete Fourier Transform (FFT) call, then calculates the magnitude of each frequency bin using the array of complex numbers returned from the DFT ... it is a portion of a project which synthesizes audio by watching video (an image at a time), then listens to that audio to synthesize output images ... the achieved goal is that the output photo largely matches the input photo ... input image -> audio -> output image.
func discrete_time_fourier_transform(aperiodic_audio_wave []float64, flow_data_spec *Flow_Spec) ([]discrete_fft, float64, float64, []float64) {
min_freq := flow_data_spec.min_freq
max_freq := flow_data_spec.max_freq
// https://www.youtube.com/watch?v=mkGsMWi_j4Q
// Discrete Fourier Transform - Simple Step by Step
var complex_fft []complex128
complex_fft = fft.FFTReal(aperiodic_audio_wave) // input time domain ... output frequency domain of equally spaced freqs
number_of_samples := float64(len(complex_fft))
nyquist_limit_index := int(number_of_samples / 2)
all_dft := make([]discrete_fft, 0) // 20171008
/*
0th term of complex_fft is sum of all other terms
often called the bias shift
*/
var curr_real, curr_imag, curr_mag, curr_theta, max_magnitude, min_magnitude float64
max_magnitude = -999.0
min_magnitude = 999.0
all_magnitudes := make([]float64, 0)
curr_freq := 0.0
incr_freq := flow_data_spec.sample_rate / number_of_samples
for index, curr_complex := range complex_fft { // we really only use half this range + 1
// if index <= nyquist_limit_index {
if index <= nyquist_limit_index && curr_freq >= min_freq && curr_freq < max_freq {
curr_real = real(curr_complex) // pluck out real portion of imaginary number
curr_imag = imag(curr_complex) // ditto for im
curr_mag = 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples
curr_theta = math.Atan2(curr_imag, curr_real)
curr_dftt := discrete_fft{
real: 2.0 * curr_real,
imaginary: 2.0 * curr_imag,
magnitude: curr_mag,
theta: curr_theta,
}
if curr_dftt.magnitude > max_magnitude {
max_magnitude = curr_dftt.magnitude
}
if curr_dftt.magnitude < min_magnitude {
min_magnitude = curr_dftt.magnitude
}
// ... now stow it
all_dft = append(all_dft, curr_dftt)
all_magnitudes = append(all_magnitudes, curr_mag)
}
curr_freq += incr_freq
}
return all_dft, max_magnitude, min_magnitude, all_magnitudes
}
Now you have an array all_magnitudes where each element is the magnitude of that frequency bin ... the frequency bins are evenly spaced by the frequency increment defined by incr_freq above ... normalize the magnitudes using min_magnitude and max_magnitude and they are ready to feed into an X,Y plot to give you the spectrogram visualization.
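Since the question mentions JTransforms, here is a hedged Java sketch of the equivalent magnitude computation; it assumes the DoubleFFT_1D.realForward in-place packing (the package name differs between JTransforms versions) and the same 2/N normalization used in the Go code above:

import org.jtransforms.fft.DoubleFFT_1D; // older builds: edu.emory.mathcs.jtransforms.fft

// pcm: normalized samples in -1..1, length a power of two
static double[] binMagnitudes(double[] pcm) {
    int n = pcm.length;
    double[] data = pcm.clone();            // realForward transforms in place
    new DoubleFFT_1D(n).realForward(data);
    double[] mags = new double[n / 2];
    for (int k = 1; k < n / 2; k++) {       // skip bin 0 (DC); bin k is k * sampleRate / n Hz
        double re = data[2 * k];
        double im = data[2 * k + 1];
        mags[k] = 2.0 * Math.sqrt(re * re + im * im) / n;
    }
    return mags;
}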
I suggest cracking open some books ... watch the video I mention in the comments above ... my voyage of exploration into the wonders of the Fourier Transform has been ongoing since my EE undergrad days, and it is loaded with surprising applications; its theory continues to be a very active research domain.
In my app I allow the user to record audio using the phone's camera. While the recording is in progress, I update a Path, using time as the X value and a normalized form of getMaxAmplitude() for the Y value.
float amp = Math.min(mRecorder.getMaxAmplitude(), mMaxAmplitude)
/ (float) mMaxAmplitude;
This works rather well.
My problem occurs when I go to play back the audio (after transporting it over the network). I want to recreate the waveform generated while recording, but the MediaPlayer class does not possess the same getMaxAmplitude() method.
I have been attempting to use the Visualizer class provided by the framework, but I'm having a difficult time getting a usable result for the Y value. The byte array returned contains values between -128 and 127, but when I look at the actual values they do not appear to represent the waveform as I would expect.
How do I use the values returned from the visualizer to get a value related to the loudness of the sound?
Your byte array is probably an array of 16-, 24- or 32-bit signed values. Assuming they are 16-bit signed, the bytes will alternate between the high byte (with the MSB being the sign bit) and the low byte. Or, depending on the endianness, it could be the low byte followed by the high byte. Moreover, if you have two channels of data, the samples are probably interleaved. Again, assuming 16 bits, you can decode the samples in a manner similar to this:
for (int i = 0; i < numBytes / 2; ++i)
{
    // mask the low byte so its sign bit doesn't wipe out the high byte
    sample[i] = (bytes[i * 2] << 8) | (bytes[i * 2 + 1] & 0xFF);
}
According to the documentation of getMaxAmplitude, it returns the maximum absolute amplitude that was sampled since the last call. I guess this means the peak amplitude but it's not totally clear from the documentation. To compute the peak amplitude, just compute the max of the abs of all the samples.
int maxPeak = 0;
for (int i = 0; i < numSamples; ++i)
{
    maxPeak = Math.max(maxPeak, Math.abs(samples[i]));
}
I'm using @LeffelMania's library: https://github.com/LeffelMania/android-midi-lib
I'm a musician, but I've always made studio recordings, not MIDI, so I don't understand some things.
The thing I want to understand is this piece of code:
// 2. Add events to the tracks
// Track 0 is the tempo map
TimeSignature ts = new TimeSignature();
ts.setTimeSignature(4, 4, TimeSignature.DEFAULT_METER, TimeSignature.DEFAULT_DIVISION);
Tempo tempo = new Tempo();
tempo.setBpm(228);
tempoTrack.insertEvent(ts);
tempoTrack.insertEvent(tempo);
// Track 1 will have some notes in it
final int NOTE_COUNT = 80;
for(int i = 0; i < NOTE_COUNT; i++)
{
    int channel = 0;
    int pitch = 1 + i;
    int velocity = 100;
    long tick = i * 480;
    long duration = 120;
    noteTrack.insertNote(channel, pitch, velocity, tick, duration);
}
OK, I have 228 beats per minute, and I know that I have to insert each note after the previous note. What I don't understand is the duration: is it in milliseconds? It doesn't make sense if I keep duration = 120 and set my BPM to 60, for example. I don't understand the velocity either.
MY GOAL
I want to insert notes of pitch X with duration Y.
Could anyone give me a clue?
The way MIDI files are designed, notes are in terms of musical length, not time. So when you insert a note, its duration is a number of ticks, not a number of seconds. By default, there are 480 ticks per quarter note. So that code snippet is inserting 80 sixteenth notes since there are four sixteenths per quarter and 480 / 4 = 120. If you change the tempo, they will still be sixteenth notes, just played at a different speed.
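To make that concrete, here is a small hedged Java helper (names are mine) that converts between ticks and real time under the library's default 480 PPQ:

// With PPQ = 480 ticks per quarter note, a duration in ticks is pure musical
// length; the tempo alone decides how long a tick lasts in real time.
static final int PPQ = 480;

// e.g. a sixteenth note = PPQ / 4 = 120 ticks, regardless of tempo
static long ticksForBeats(double beats) {
    return Math.round(beats * PPQ);
}

// One tick lasts 60000 / (bpm * PPQ) milliseconds, so 120 ticks at 228 bpm
// is about 65.8 ms, while the same 120 ticks at 60 bpm is 250 ms.
static double ticksToMillis(long ticks, double bpm) {
    return ticks * 60000.0 / (bpm * PPQ);
}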
If you think of playing a key on a piano, the velocity parameter is the speed at which the key is struck. The valid values are 1 to 127. A velocity of 0 means to stop playing the note. Typically a higher velocity means a louder note, but really it can control any parameter the MIDI instrument allows it to control.
A note in a MIDI file consists of two events: a Note On and a Note Off. If you look at the insertNote code you'll see that it is inserting two events into the track. The first is a Note On command at time tick with the specified velocity. The second is a Note On command at time tick + duration with a velocity of 0.
Pitch values also run from 0 to 127. If you do a Google search for "MIDI pitch numbers" you'll get dozens of hits showing you how pitch number relates to note and frequency.
There is a nice description of timing in MIDI files here. Here's an excerpt in case the link dies:
In a standard MIDI file, there’s information in the file header about “ticks per quarter note”, a.k.a. “parts per quarter” (or “PPQ”). For the purpose of this discussion, we’ll consider “beat” and “quarter note” to be synonymous, so you can think of a “tick” as a fraction of a beat. The PPQ is stated in the last word of information (the last two bytes) of the header chunk that appears at the beginning of the file. The PPQ could be a low number such as 24 or 96, which is often sufficient resolution for simple music, or it could be a larger number such as 480 for higher resolution, or even something like 500 or 1000 if one prefers to refer to time in milliseconds.
What the PPQ means in terms of absolute time depends on the designated tempo. By default, the time signature is 4/4 and the tempo is 120 beats per minute. That can be changed, however, by a “meta event” that specifies a different tempo. (You can read about the Set Tempo meta event message in the file format description document.) The tempo is expressed as a 24-bit number that designates microseconds per quarter-note. That’s kind of upside-down from the way we normally express tempo, but it has some advantages. So, for example, a tempo of 100 bpm would be 600000 microseconds per quarter note, so the MIDI meta event for expressing that would be FF 51 03 09 27 C0 (the last three bytes are the Hex for 600000). The meta event would be preceded by a delta time, just like any other MIDI message in the file, so a change of tempo can occur anywhere in the music.
Delta times are always expressed as a variable-length quantity, the format of which is explained in the document. For example, if the PPQ is 480 (standard in most MIDI sequencing software), a delta time of a dotted quarter note (720 ticks) would be expressed by the two bytes 82 D0 (hexadecimal).
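As a hedged illustration of the excerpt's arithmetic, here is a small Java helper (the name is mine) that turns a BPM value into the three data bytes of the Set Tempo meta event:

// Set Tempo stores microseconds per quarter note as a 24-bit big-endian value.
// 100 bpm -> 60,000,000 / 100 = 600,000 µs -> 0x09 0x27 0xC0, matching the
// FF 51 03 09 27 C0 example above.
static byte[] tempoBytes(double bpm) {
    int usPerQuarter = (int) Math.round(60_000_000.0 / bpm);
    return new byte[] {
        (byte) ((usPerQuarter >> 16) & 0xFF),
        (byte) ((usPerQuarter >> 8) & 0xFF),
        (byte) (usPerQuarter & 0xFF)
    };
}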