Android Superpowered SDK Audio - Frequency Domain example - memset frequency manipulation

I'm trying to understand the Superpowered SDK, but I'm new to both Android and C++, as well as to audio signals. I have the Frequency Domain example from here:
https://github.com/superpoweredSDK/Low-Latency-Android-Audio-iOS-Audio-Engine/tree/master/Examples_Android/FrequencyDomain
running on my Nexus 5X. In the FrequencyDomain.cpp file:
static SuperpoweredFrequencyDomain *frequencyDomain;
static float *magnitudeLeft, *magnitudeRight, *phaseLeft, *phaseRight, *fifoOutput, *inputBufferFloat;
static int fifoOutputFirstSample, fifoOutputLastSample, stepSize, fifoCapacity;
#define FFT_LOG_SIZE 11 // 2^11 = 2048
static bool audioProcessing(void * __unused clientdata, short int *audioInputOutput, int numberOfSamples, int __unused samplerate) {
SuperpoweredShortIntToFloat(audioInputOutput, inputBufferFloat, (unsigned int)numberOfSamples); // Converting the 16-bit integer samples to 32-bit floating point.
frequencyDomain->addInput(inputBufferFloat, numberOfSamples); // Input goes to the frequency domain.
// In the frequency domain we are working with 1024 magnitudes and phases for every channel (left, right), if the fft size is 2048.
while (frequencyDomain->timeDomainToFrequencyDomain(magnitudeLeft, magnitudeRight, phaseLeft, phaseRight)) {
// You can work with frequency domain data from this point.
// This is just a quick example: we remove the magnitude of the first 20 bins, meaning total bass cut between 0-430 Hz.
memset(magnitudeLeft, 0, 80);
memset(magnitudeRight, 0, 80);
I understand how the first 20 bins are 0-430 Hz from here:
How do I obtain the frequencies of each value in an FFT?
but I don't understand the value of 80 in memset... being 4*20, is it 4 bytes for a float * 20 bins? Does magnitudeLeft hold data for all the frequencies? How would I then remove, for example, 10 bins of frequencies from the middle or the highest from the end? Thank you!

Every value in magnitudeLeft and magnitudeRight is a float, which is 32 bits (4 bytes).
memset takes a number-of-bytes parameter, so 20 bins * 4 bytes = 80 bytes.
That is how memset clears the first 20 bins.
Both magnitudeLeft and magnitudeRight represent the full frequency range with 1024 floats. Their size is always the FFT size divided by two, so 2048 / 2 = 1024 in this example.
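For reference, each bin spans (sample rate / FFT size) Hz. Assuming a 44.1 kHz sample rate with the 2048-point FFT in this example, that is about 21.5 Hz per bin, so the first 20 bins cover roughly 0-430 Hz, which matches the comment in the code.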
Removing from the middle and the top looks something like:
memset(&magnitudeLeft[index_of_first_bin_to_remove], 0, number_of_bins * sizeof(float));
Note that the first parameter is not multiplied by sizeof(float), because the compiler knows that magnitudeLeft points to floats, so the indexing automatically produces the correct address. For example, zeroing 10 bins starting at bin 512 would be memset(&magnitudeLeft[512], 0, 10 * sizeof(float)), and zeroing the top 24 bins would be memset(&magnitudeLeft[1000], 0, 24 * sizeof(float)).

Related

Understanding onWaveFormDataCapture byte array format

I'm analyzing audio signals on Android. I first tried with the mic and succeeded. Now I'm trying to apply an FFT to MP3 data that comes from Visualizer.OnDataCaptureListener's onWaveFormDataCapture method, which is linked to a MediaPlayer. There is a byte array called byte[] waveform, and I get spectral leakage or overlap when I apply the FFT to this data.
public void onWaveFormDataCapture(Visualizer visualizer, byte[] waveform, int samplingRate)
I tried to convert the data into the -1..1 range by using the code below in a for loop:
// waveform varies in range of -128..+127
raw[i] = (double) waveform[i];
// change it to range -1..1
raw[i] /= 128.0;
Then I copy raw into the FFT buffers:
fftre[i] = raw[i];
fftim[i] = 0;
Then I call the fft function;
fft.fft(fftre, fftim); // in: audio signal, out: fft data
As a final step I convert them into magnitudes in dB and then draw the frequencies on screen:
// Ignore the first fft data which is DC component
for (i = 1, j = 0; i < waveform.length / 2; i++, j++)
{
magnitude = (fftre[i] * fftre[i] + fftim[i] * fftim[i]);
magnitudes[j] = 20.0 * Math.log10(Math.sqrt(magnitude) + 1e-5); // [dB]
}
When I play a sweep signal from 20 Hz to 20 kHz, I don't see what I see with the mic. It doesn't draw a single walking line, but several symmetric lines going away or coming closer. Somehow there is a weaker symmetric signal at the other end of the visualizer.
The same code, using 32768 instead of 128 as the divisor, works very well on mic input with AudioRecord.
Where am I going wrong?
(And yes, I know there is a direct FFT output.)
The input audio is 8-bit unsigned mono. The line raw[i] = (double) waveform[i] causes an unintentional unsigned-to-signed conversion, and since raw is biased to approximately a 128 DC level, a small sine wave ends up getting changed into a high-amplitude modified square wave, as the signal crosses the 127/-128 boundary. That causes a bunch of funny harmonics (which caused the "symmetric lines coming and going" you were talking about).
Solution
Change to (double) (waveform[i] & 0xFF) so that the converted value lies in the range 0..255, instead of -128..127.
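A minimal sketch of the corrected conversion loop (assuming raw has the same length as waveform); subtracting the ~128 DC offset is optional since the DC bin is ignored later, but it keeps the converted signal centered around zero:
for (int i = 0; i < waveform.length; i++) {
    int unsigned = waveform[i] & 0xFF;   // undo the sign extension: values are now 0..255
    raw[i] = (unsigned - 128) / 128.0;   // remove the ~128 bias and scale to roughly -1..1
}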

getMaxAmplitude() alternative for Visualizer

In my app I allow the user to record audio using the phone's camera; while the recording is in progress, I update a Path using time as the X value and a normalized form of getMaxAmplitude() for the Y value.
float amp = Math.min(mRecorder.getMaxAmplitude(), mMaxAmplitude)
/ (float) mMaxAmplitude;
This works rather well.
My problem occurs when I go to play back the audio (after transporting it over the network). I want to recreate the waveform generated while recording, but the MediaPlayer class does not possess the same getMaxAmplitude() method.
I have been attempting to use the Visualizer class provided by the framework, but am having a difficult time getting a usable result for the Y value. The byte array returned contains values between -128 and 127, but when I look at the actual values they do not appear to represent the waveform as I would expect it to be.
How do I use the values returned from the visualizer to get a value related to the loudness of the sound?
Your byte array is probably an array of 16-, 24- or 32-bit signed values. Assuming they are 16-bit signed, the bytes will alternate: the hi byte, with the MSB being the sign bit, then the lo byte. Or, depending on the endianness, it could be the lo byte followed by the hi byte. Moreover, if you have two channels of data, the samples are probably interleaved. Again, assuming 16 bits, you can decode the samples in a manner similar to this:
for (int i = 0 ; i < numBytes/2 ; ++i)
{
    // hi byte first; mask the lo byte so its sign bit is not extended
    sample[i] = (short) ((bytes[i*2] << 8) | (bytes[i*2+1] & 0xFF));
}
According to the documentation of getMaxAmplitude, it returns the maximum absolute amplitude that was sampled since the last call. I guess this means the peak amplitude but it's not totally clear from the documentation. To compute the peak amplitude, just compute the max of the abs of all the samples.
int maxPeak = 0;
for (int i = 0 ; i < numSamples ; ++i)
{
    maxPeak = Math.max(maxPeak, Math.abs(samples[i]));
}

convolution of audio signal

I'm using AudioRecord to record sound and do some processing in pseudo-realtime on an Android phone.
I'm facing a problem between the FFT and the convolution of an audio signal:
I perform an FFT on a known signal (a sine waveform), and by using the FFT I always correctly find the single tone contained in it.
Now I want to do the same thing by using a convolution (it's an exercise): I perform 5000 convolutions of that signal by using 5000 filters. Each filter is a sine waveform at a different frequency between 0 and 5000 Hz.
Then I search for the peak of each convolution output. This way I should find the maximum peak when I'm using the filter with the same tone contained in the signal.
In fact, with a 2 kHz tone I find the max with the 2 kHz filter.
The problem is that when I receive a 4 kHz tone, I find the max on the convolution with the 4200 Hz filter (while the FFT always works fine).
Is it mathematically possible?
What is the problem in my convolution?
This is the convolution function that I wrote:
//i do the convolution and return the max
//IN is the array with the signal
//DATASIZE is the size of the array IN
//KERNEL is the filter containing the sine at the selected frequency
int convolveAndGetPeak(short[] in,int dataSize, double[] kernel) {
//to avoid overflow, the kernel must have a maximum amplitude of 1/10 of the max
int i, j, k;
int kernelSize=kernel.length;
int tmpSignalAfterFilter=0;
double out;
// convolution from out[0] to out[kernelSize-2]
//let's start
for(i=0; i < kernelSize - 1; ++i)
{
out = 0; // init to 0 before sum
for(j = i, k = 0; j >= 0; --j, ++k)
out += in[j] * kernel[k];
if (Math.abs((int) out)>tmpSignalAfterFilter ){
tmpSignalAfterFilter=Math.abs((int) out);
}
}
// start convolution from out[kernelSize-1] to out[dataSize-1] (last)
//continue from where we left off
for( ; i < dataSize; ++i)
{
out = 0; // initialize to 0 before accumulate
for(j = i, k = 0; k < kernelSize; --j, ++k)
out += in[j] * kernel[k];
if (Math.abs((int) out)>tmpSignalAfterFilter ){
tmpSignalAfterFilter=Math.abs((int) out);
}
}
return tmpSignalAfterFilter;
}
The kernel, used as the filter, is generated this way:
//curFreq is the frequency of the filter in Hz
//kernelSamplesSize is the desired length of the filter (number of samples); for time-precision reasons I'm using a 20-sample length.
//sampleRate is the sampling frequency
double[] generateKernel(int curFreq,int kernelSamplesSize,int sampleRate){
double[] curKernel= new double[kernelSamplesSize] ;
for (int kernelIndex=0;kernelIndex<curKernel.length;kernelIndex++){
curKernel[kernelIndex]=Math.sin( (double)kernelIndex * ((double)(2*Math.PI) * (double)curFreq / (double)sampleRate)); //the part that makes this a sine wave....
}
return curKernel;
}
If you want to try a convolution, the data contained in the IN array is here:
http://www.tr3ma.com/Dati/signal.txt
Note 1: the sampling frequency is 44100 Hz.
Note 2: the tone contained in the signal is a single 4 kHz tone (even though the convolution has its max peak with the 4200 Hz filter).
EDIT: I also repeated the test in an Excel sheet. The result is the same (of course, I'm using the same algorithm) and the algorithm seems correct to me...
This is the Excel sheet I prepared, if you prefer to work in Excel: http://www.tr3ma.com/Dati/convolutions.xlsm
You change the bandwidth via two factors:
a) The length of your kernel (e.g. a length t of 5 ms produces a rough bandwidth of f >= 200 Hz, estimated from 1/0.005 because Δt·Δf >= 1, see "Heisenberg"), and
b) the window function (which you definitely should implement to make your algorithm work in real-world applications, because otherwise in some cases the sidelobes of some filter outputs could carry more energy than the main lobe of the expected filter output).
But you have another problem: you need to convolve with a 2nd kernel consisting of cosine waves (which means that you need the same waves as in the 1st kernel but shifted by 90 degrees). Why is that? Because with only the sine kernel, you get a phase-dependent modulation of the filter outputs (e.g. if the phase difference between the input signal and the kernel wave with the identical frequency is 90 degrees you get the amplitude 0).
Finally, you combine the outputs of both kernels with Pythagoras.
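A rough Java sketch of that idea, reusing the shape of generateKernel from the question; generateCosineKernel and peakEnvelope are hypothetical helpers, and outSin/outCos are assumed to hold the full convolution outputs of the sine and cosine kernels (so convolveAndGetPeak would have to be adapted to return the whole output array rather than only its peak):
// Same as generateKernel, but 90 degrees out of phase (cosine instead of sine).
double[] generateCosineKernel(int curFreq, int kernelSamplesSize, int sampleRate) {
    double[] kernel = new double[kernelSamplesSize];
    for (int i = 0; i < kernelSamplesSize; i++) {
        kernel[i] = Math.cos((double) i * 2.0 * Math.PI * (double) curFreq / (double) sampleRate);
    }
    return kernel;
}
// Combine the two filter outputs sample by sample: envelope = sqrt(sin^2 + cos^2),
// which removes the phase dependence, then take the peak of the envelope.
double peakEnvelope(double[] outSin, double[] outCos) {
    double peak = 0.0;
    for (int i = 0; i < outSin.length; i++) {
        double envelope = Math.sqrt(outSin[i] * outSin[i] + outCos[i] * outCos[i]);
        if (envelope > peak) peak = envelope;
    }
    return peak;
}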
It all seems correct, apart from the number of samples of the kernel (the filter).
Increasing the size of the filter makes the result more accurate.
I don't know how to calculate the bandwidth of this filter, but it seems clear to me that it's a matter of filter bandwidth. So the filter bandwidth also depends on the number of samples of the filter used in the convolution, relative to the sampling frequency (and maybe also to the tone frequency). Unfortunately I cannot increase the number of samples of my filter too much, since otherwise the phone cannot perform the filtering in realtime.
Note: I need the convolution because I need to identify the precise moment when the tone was fired.
EDIT: I made a comparison between a filter with 20 samples and a filter with 40 samples.
I don't know the formula to obtain the filter bandwidth, but the difference between the two filters is clear in the following image.
EDIT 2: A few days after posting, I found how to calculate the bandwidth of such a filter: it is just the inverse of the filter duration. For example, a kernel of 40 samples at 44100 Hz has a duration of about 907 µs, so the filter bandwidth with this kernel and a window of the same length is 1/907 µs ≈ 1.1 kHz.
[image comparing the 20-sample and 40-sample filters] (source: tr3ma.com)

FFT audio input

I want to apply an FFT to a signal recorded with AudioRecord and saved to a WAV file. The FFT I am using has a Complex[] input parameter. I am confused: is there a difference between converting from bytes to complex by dividing by 32768, and converting by just adding 0 to the imaginary part and leaving the real part as a byte?
Edit:
public Complex[] convertToComplex(byte[] file)
{
int size= file.length;
double[]x=new double[size];
Complex[]data= new Complex[size];
for(int i=0;i<size;i++)
{
x[i]=file[i]/32768.0;
data[i]=new Complex(x[i],0);
// Log.d("tag", "indice"+i+":"+data[i]);
}
return data;
}
If you are working with audio with a bit depth of 16 bits (each sample has 16 bits), then each byte will only hold half of a sample. What you need to do is cast your bytes to 16-bit samples, then divide the resulting number by 32768 (the magnitude of the smallest number a 2's-complement 16-bit number can store, i.e. 2^15) to get the actual audio sample, which is a number between -1 and 1. You then convert this number to a complex number by setting its imaginary component to 0.
A small C# sample can be seen below (indicative code):
byte[] myAudioBytes = readAudio();
int numBytes = myAudioBytes.Length;
var myAudioSamples = new List<short>();
for( int i = 0; i < numBytes; i = i + 2)
{
//Cast to 16 bit audio and then add sample
short sample = (short) (myAudioBytes[i] << 8 | myAudioBytes[i + 1]);
myAudioSamples.Add(sample);
}
//Change real audio to Complex audio
var complexAudio = new Complex[myAudioSamples.Length];
int i = 0;
foreach(short sample in myAudioSamples)
complexAudio[i++] = new Complex(){ Real = sample / 32768.0, Imaginary = 0 }; // normalize to -1..1 here
//Now you can proceed to getting the FFT of your Audio here
Hope the code has guided you on how you should handle your audio.
Generalized FFT functions work with arrays of complex inputs and outputs. So, for input, you might need to create an array of complex numbers which conforms to the Complex data structure that the FFT library wants. This will probably consist of a real and an imaginary component for each sample. Just set the imaginary portion to 0. The real portion is probably a signed floating-point number that is expected to fall between -1.0 and 1.0, so you are on the right track with dividing the integer PCM samples. However, when you wrote "converting bytes", that raised a red flag. These are probably signed, little-endian, 16-bit integer PCM samples, so be sure to assemble and cast them accordingly before dividing by 32768 (but this is Java, so types will be enforced a bit more stringently anyway).
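A minimal Java sketch of that conversion, assuming 16-bit signed little-endian mono PCM and a two-argument Complex(real, imaginary) constructor like the one used in the question:
Complex[] pcmToComplex(byte[] pcm) {
    int n = pcm.length / 2;                         // two bytes per 16-bit sample
    Complex[] data = new Complex[n];
    for (int i = 0; i < n; i++) {
        int lo = pcm[2 * i] & 0xFF;                 // low byte first (little-endian), mask off sign extension
        int hi = pcm[2 * i + 1];                    // high byte keeps its sign
        short sample = (short) ((hi << 8) | lo);
        data[i] = new Complex(sample / 32768.0, 0); // scale to -1..1, imaginary part 0
    }
    return data;
}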

Read and transmit wav files in Android in a way similar to Chirp

I want to read each character from an existing WAV file and assign it to a certain frequency.
I specifically want to transfer WAV files over sound from one phone to another, like the "Chirp" Android application.
And for that I need to map all the data to certain frequencies and play the generated tones so that the other phone can decode them and reconstitute the WAV file.
Take a look at this: chirp.io/tech
For example, the first line of a WAV file is:
52 49 46 46 E0...
My idea is to do something like:
5 --> 100 Hz
2 --> 200 Hz
4 --> 300 Hz
...
Is there a way to split them without changing the data?
I think I should mention that my WAV file is formatted as:
static int sampleRate=44100;
static int numSample=duration*sampleRate;
long mySubChunk1Size = 16;
static short myBitsPerSample= 16;
int myFormat = 1;
static int myChannels = 1;
long myByteRate = sampleRate * myChannels * myBitsPerSample/8;
int myBlockAlign = myChannels * myBitsPerSample/8;
long myChunk2Size = generatedSnd.length* myChannels * myBitsPerSample/8;
long myChunkSize = 36 + myChunk2Size;
The way you are trying to do it is simply... naive.
I tell you why:
0x59 is a byte (decimal: 89) but you want 2 different sounds from it (0x50 and 0x09)?
It doesn't seem a good idea, since you would have 2 frequencies (for the LSB and the MSB).
Moreover, you will need to map all the 256 byte values you can have in a file,
from 0x00 to 0xFF (decimal: 255).
Plus, assigning 0x50 (decimal: 80) 100 Hz and 0x09 (decimal: 9) 200 Hz, again, doesn't
make much sense to me...
Now, a possible way to do that:
I would implement an algorithm such as (byte * 32) + 440 that gives me:
440 Hz for 0x00, which is (0 * 32) + 440
...
3288 Hz for 0x59, which is (89 * 32) + 440
...
8600 Hz for 0xFF, which is (255 * 32) + 440.
All bytes will be "encoded" into frequencies in the audible spectrum.
In a way similar to that used in the aforementioned "Chirp" method.
And I don't have to tell it that 0 is 440 Hz (nor any other association); the algorithm does it for me.
Moreover, you can send out (and receive) any kind of file. Not bad.
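A minimal Java sketch of that mapping, assuming a 44100 Hz sample rate; toneForByte is a hypothetical helper, and the caller would feed the returned 16-bit samples to AudioTrack or write them into a WAV file:
// Map one byte (0..255) to a frequency with (byte * 32) + 440 and synthesize a short tone for it.
static short[] toneForByte(byte b, double durationSec, int sampleRate) {
    double freq = (b & 0xFF) * 32 + 440;               // 440 Hz for 0x00 ... 8600 Hz for 0xFF
    int numSamples = (int) (durationSec * sampleRate);
    short[] tone = new short[numSamples];
    for (int i = 0; i < numSamples; i++) {
        double s = Math.sin(2.0 * Math.PI * freq * i / sampleRate);
        tone[i] = (short) (s * Short.MAX_VALUE * 0.5);  // half amplitude to leave headroom
    }
    return tone;
}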
[EDIT]
Since the media (acoustic speakers/microphones) are limited to the Low Frequency range (audible sounds), you have to use audible sounds.
If you were to use Radio transmission, then you could use High Frequencies as well.
Since it really depends on construction quality, I suggested a "safe" range that every speaker/microphone coupling will be able to deal with.
For reference on audible tones: http://en.wikipedia.org/wiki/Audio_frequency
