AudioRecorder | Interpreting FFT data for Spectrum Analyzer - android

I am building an app that needs to be able to display a real-time spectral analyzer. Here is the version I was able to successfully make on iOS:
I am using Wendykierp JTransforms library to perform the FFT calculations, and have managed to capture audio data and execute the FFT functions. See below:
short sData[] = new short[BufferElements2Rec];
int result = audioRecord.read(sData, 0, BufferElements2Rec);
try {
    //Initiate FFT
    DoubleFFT_1D fft = new DoubleFFT_1D(sData.length);
    //Convert sample data from short[] to double[]
    double[] fftSamples = new double[sData.length];
    for (int i = 0; i < sData.length; i++) {
        //IMPORTANT: We cannot simply cast the short value to double.
        //A short is only 2 bytes (values -32768 to 32767),
        //so we divide by 32768 to normalise before the cast.
        fftSamples[i] = (double) sData[i] / 32768;
    }
    //Perform fft calcs
    fft.realForward(fftSamples);
    //TODO - Convert FFT data into 20 "bands"
} catch (Exception e) {
    //TODO - handle the exception rather than swallowing it
}
In iOS, I was using a library (Tempi-FFT) which had built in functionality for calculating magnitude, frequency, and providing averaged data for any given number of bands (I am using 20 bands as you can see in the image above). It seems I don't have that luxury with this library and I need to calculate this myself.
Looking for any good examples or tutorials on how to interpret the data returned by the FFT calculations. Here is some sample data I am receiving:
-11387.0, 183.0, -384.9121475854448, -224.66315714636642, -638.0173005872095, -236.2318653974911, -1137.1498541119106, -437.71599514435786, 1954.683405957685, -2142.742125980924 ...
Looking for a simple explanation of how to interpret this data. Some other questions I have looked at, which I was either unable to understand or which did not explain how to determine a given number of bands:
Power Spectral Density from jTransforms DoubleFFT_1D
How to develop a Spectrum Analyser from a realtime audio?

Your question can be split into two parts: finding the magnitude of all frequencies (interpreting the output), and averaging the frequencies into bands.
Finding the magnitude of all frequencies:
I won't go into the intricacies of the Fast Fourier Transform/Discrete Fourier Transform (if you would like to gain a basic understanding see this video), but know that there is a real and an imaginary part of each output.
The documentation of the realForward function describes where both the imaginary and the real parts are located in the output array (I'm assuming you have an even sample size):
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
a is equivalent to your fftSamples, which means we can translate this documentation into code as follows (I've changed Re and Im to realPart and imaginaryPart respectively):
int n = fftSamples.length;
double[] realPart = new double[n / 2];
double[] imaginaryPart = new double[n / 2];
realPart[0] = fftSamples[0]; // DC component, purely real
imaginaryPart[0] = 0;        // fftSamples[1] actually holds Re[n/2], not Im[0]
for (int k = 1; k < n / 2; k++) {
    realPart[k] = fftSamples[k * 2];
    imaginaryPart[k] = fftSamples[k * 2 + 1];
}
// The Nyquist component Re[n/2] is packed at fftSamples[1]; it is not
// needed for the band display, so it is dropped here.
Now we have the real and imaginary parts of each frequency. We could plot these on an x-y coordinate plane using the real part as the x value and the imaginary part as the y value. This creates a triangle, and the length of the triangle's hypotenuse is the magnitude of the frequency. We can use the pythagorean theorem to get this magnitude:
double[] spectrum = new double[n / 2];
for (int k = 1; k < n / 2; k++) {
    spectrum[k] = Math.sqrt(Math.pow(realPart[k], 2) + Math.pow(imaginaryPart[k], 2));
}
spectrum[0] = realPart[0];
Note that the 0th index of the spectrum doesn't have an imaginary part. This is the DC component of the signal (we won't use this).
Now, we have an array with the magnitudes of each frequency across your spectrum. Note that a real-input FFT only covers frequencies from 0 Hz up to half the sampling rate (the Nyquist frequency): if your sampling frequency is 44100 Hz, the n/2 magnitudes span 0 Hz to 22050 Hz, and index k corresponds to the frequency k * 44100 / n Hz.
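For example, a small sketch of that index-to-frequency mapping (here n is the number of input samples; the 4096 / 44100 values are just illustrative assumptions):

```java
// Maps an index in the spectrum[] array to the frequency (in Hz) it
// represents: bin k is centred at k * sampleRate / n for n input samples.
public class BinFrequency {
    static double binToFrequency(int k, int n, double sampleRate) {
        return k * sampleRate / n;
    }

    public static void main(String[] args) {
        int n = 4096;        // FFT size (number of input samples)
        double fs = 44100.0; // sample rate in Hz
        System.out.println(binToFrequency(1, n, fs));         // width of one bin
        // The last usable bin (n/2 - 1) sits just below the Nyquist frequency fs/2:
        System.out.println(binToFrequency(n / 2 - 1, n, fs));
    }
}
```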
Averaging the frequencies into bands:
Now that we've converted the FFT output to data that we can use, we can move on to the second part of your question: finding the averages of different bands of frequencies. This is relatively simple. We just need to split the array into different bands and find the average of each band. This can be generalized like so:
int NUM_BANDS = 20; //This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = (n / 2) / NUM_BANDS;
for (int i = 0; i < NUM_BANDS; i++) {
    //Add up each part
    double total = 0; //must be initialised, or Java won't compile
    for (int j = samplesPerBand * i; j < samplesPerBand * (i + 1); j++) {
        total += spectrum[j];
    }
    //Take average
    bands[i] = total / samplesPerBand;
}
Final Code:
And that's it! You now have an array called bands with the average magnitude of each band of frequencies. The code above is purposefully not optimized in order to show how each step works. Here is a shortened and optimized version:
int numFrequencies = fftSamples.length / 2;
double[] spectrum = new double[numFrequencies];
for (int k = 1; k < numFrequencies; k++) {
    spectrum[k] = Math.sqrt(Math.pow(fftSamples[k*2], 2) + Math.pow(fftSamples[k*2+1], 2));
}
spectrum[0] = fftSamples[0];
int NUM_BANDS = 20; //This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = numFrequencies / NUM_BANDS;
for (int i = 0; i < NUM_BANDS; i++) {
    //Add up each part
    double total = 0;
    for (int j = samplesPerBand * i; j < samplesPerBand * (i + 1); j++) {
        total += spectrum[j];
    }
    //Take average
    bands[i] = total / samplesPerBand;
}
//Use bands in view!
This has been a really long answer, and I haven't tested the code yet (though I do plan to). Feel free to comment if you find any mistakes.

Related

Android Image U and V buffers too large

I'm reading the YUV values from a android image using the camera2 api. Hence I have the 3 planes.
for (int x = 0; x < imageSheaf[0].Width; x++)
{
    for (int y = 0; y < imageSheaf[0].Height; y++)
    {
        imageYuv[x, y] = new yuv();
    }
}
for (int j = 0; bufferY.HasRemaining; j++)
{
    for (int i = 0; i < rowStrideY / 2; i += 2)
    {
        if (i > width / 2 - 1 || j > height / 2 - 1)
            Log.Info("Processing", "Out of Bounds");
        imageYuv[i, j].y = bufferY.Get();
        bufferY.Get(); // skip a pixel due to 4:2:0 sub sampling
    }
    for (int i = 0; i < rowStrideY / 2; i++) // skip a line due to 4:2:0 sub sampling
    {
        bufferY.Get();
        bufferY.Get();
    }
    if (!bufferY.HasRemaining)
        Log.Debug("Processing", "finished");
}
for (int j = 0; bufferU.HasRemaining; j++)
{
    for (int i = 0; i < rowStrideU; i++)
    {
        if (!bufferU.HasRemaining)
            Log.Debug("Processing", "finished");
        imageYuv[i, j].u = bufferU.Get();
    }
    if (!bufferU.HasRemaining)
        Log.Debug("Processing", "finished");
}
for (int j = 0; bufferV.HasRemaining; j++)
{
    for (int i = 0; i < rowStrideV; i++)
    {
        if (!bufferV.HasRemaining)
            Log.Debug("Processing", "finished");
        imageYuv[i, j].v = bufferV.Get();
    }
    if (!bufferV.HasRemaining)
        Log.Debug("Processing", "finished");
}
This is the code that I'm using to get the Y, U and V values from the byte buffers.
The ImageFormat is YUV_420_888. It is my understanding that the 4:2:0 subsampling means that for every U or V value there are 4 Y pixels.
My issue is that the size of the byte buffers for the U and V planes are larger than they should be causing array out of bounds exceptions:
[Processing] RowstrideY = 720
[Processing] RowstrideU = 368
[Processing] RowstrideV = 368
[Processing] y.remaining = 345600, u.remaining = 88312, v.remaining = 88312
(the size of the image is 720x480)
YUV420 has 8 bits per pixel for Y, and 8 bits per four-pixel group for each of U and V. So at 720x480, you'd expect each of the U and V planes to be 360x240.
However, the actual hardware may have additional alignment or stride restrictions. In this case, it appears the hardware requires the stride to be a multiple of 16, so it increases it from 360 to 368.
You'd expect that to turn into a length of 368*240=88320, but remember, the last eight bytes on every line are simply padding. So the buffer can actually be (368*239)+360 = 88312 bytes without omitting any data. If you're getting array-bounds exceptions it's because you're attempting to read the end-of-row pad bytes from the last line, but that's not allowed. The API only guarantees that you will be able to read the data.
The motivation for this is that, if the padding on the last line happened to cross a page boundary, the system would need to allocate an additional unnecessary page for each buffer.
You can modify your code to copy the data bytes from each row, then have a second loop that just consumes the padding bytes (if any) at the end of the row.
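A rough sketch of that approach using Java's ByteBuffer (the plane dimensions here are made-up illustrations; on Android you would take width, height and rowStride from the Image.Plane):

```java
import java.nio.ByteBuffer;

// Copy only `width` data bytes per row, then skip whatever padding remains
// before the next row. The last row may have no padding bytes at all
// (as Android's YUV_420_888 buffers are allowed to do), so we only skip
// what is actually available in the buffer.
public class PlaneCopy {
    static byte[] copyPlane(ByteBuffer buffer, int width, int height, int rowStride) {
        byte[] out = new byte[width * height];
        for (int row = 0; row < height; row++) {
            buffer.get(out, row * width, width);            // the real pixels
            int pad = Math.min(rowStride - width, buffer.remaining());
            buffer.position(buffer.position() + pad);       // skip end-of-row padding
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulate a 4x3 plane with rowStride 6 where the final row's
        // two padding bytes are missing from the buffer.
        byte[] raw = new byte[6 * 2 + 4];
        for (int i = 0; i < raw.length; i++) raw[i] = (byte) i;
        byte[] pixels = copyPlane(ByteBuffer.wrap(raw), 4, 3, 6);
        System.out.println(pixels.length); // 12
    }
}
```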

Cross correlation to find sonar echoes

I'm trying to detect echoes of my chirp in my sound recording on Android and it seems cross correlation is the most appropriate way of finding where the FFTs of the two signals are similar and from there I can identify peaks in the cross correlated array which will correspond to distances.
From my understanding, I have come up with the following cross correlation function. Is this correct? I wasn't sure whether to add zeros to the beginning and start a few elements back.
public double[] xcorr1(double[] recording, double[] chirp) {
    double[] recordingZeroPadded = new double[recording.length + chirp.length];
    for (int i = recording.length; i < recording.length + chirp.length; ++i)
        recordingZeroPadded[i] = 0;
    for (int i = 0; i < recording.length; ++i)
        recordingZeroPadded[i] = recording[i];
    double[] result = new double[recording.length + chirp.length - 1];
    for (int offset = 0; offset < recordingZeroPadded.length - chirp.length; ++offset)
        for (int i = 0; i < chirp.length; ++i)
            result[offset] += chirp[i] * recordingZeroPadded[offset + i];
    return result;
}
Secondary question:
According to this answer, it can also be calculated like
corr(a, b) = ifft(fft(a_and_zeros) * fft(b_and_zeros[reversed]))
which I don't understand at all but seems easy enough to implement. That said, I have failed to get it working (assuming my xcorr1 is correct). Have I completely misunderstood this?
public double[] xcorr2(double[] recording, double[] chirp) {
    // assume same length arguments for now
    DoubleFFT_1D fft = new DoubleFFT_1D(recording.length);
    fft.realForward(recording);
    reverse(chirp);
    fft.realForward(chirp);
    double[] result = new double[recording.length];
    for (int i = 0; i < result.length; ++i)
        result[i] = recording[i] * chirp[i];
    fft.realInverse(result, true);
    return result;
}
Assuming I got both working, which function would be most appropriate given that the arrays will contain a few thousand elements?
EDIT: Btw, I have tried adding zeros to both ends of both arrays for the FFT version.
EDIT after SleuthEye's response:
Can you just verify that, because I'm dealing with 'actual' data, I need only do half the computations (the real parts) by doing a real transform?
From your code, it looks as though the odd numbered elements in the array returned by the REAL transform are imaginary. What's going on here?
How am I going from an array of real numbers to complex? Or is this the purpose of a transform; to move real numbers into the complex domain? (but the real numbers are just a subset of the complex numbers and so wouldn't they already be in this domain?)
If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward? And how do I interpret the results? The magnitude of the complex number?
I apologise for my lack of understanding with regard to transforms, I have only so far studied fourier series.
Thanks for the code. Here is 'my' working implementation:
public double[] xcorr2(double[] recording, double[] chirp) {
    // pad to power of 2 for optimisation
    int y = 1;
    while (Math.pow(2, y) < recording.length + chirp.length)
        ++y;
    int paddedLength = (int) Math.pow(2, y);
    double[] paddedRecording = new double[paddedLength];
    double[] paddedChirp = new double[paddedLength];
    for (int i = 0; i < recording.length; ++i)
        paddedRecording[i] = recording[i];
    for (int i = recording.length; i < paddedLength; ++i)
        paddedRecording[i] = 0;
    for (int i = 0; i < chirp.length; ++i)
        paddedChirp[i] = chirp[i];
    for (int i = chirp.length; i < paddedLength; ++i)
        paddedChirp[i] = 0;
    // no reverse(chirp) needed: multiplying by the complex conjugate
    // below takes care of the time reversal
    DoubleFFT_1D fft = new DoubleFFT_1D(paddedLength);
    fft.realForward(paddedRecording);
    fft.realForward(paddedChirp);
    double[] result = new double[paddedLength];
    result[0] = paddedRecording[0] * paddedChirp[0]; // value at f=0Hz is real-valued
    result[1] = paddedRecording[1] * paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
    for (int i = 1; i < result.length / 2; ++i) {
        double a = paddedRecording[2*i];
        double b = paddedRecording[2*i + 1];
        double c = paddedChirp[2*i];
        double d = paddedChirp[2*i + 1];
        // (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
        result[2*i] = a*c + b*d;
        result[2*i + 1] = b*c - a*d;
    }
    fft.realInverse(result, true);
    // discard trailing zeros
    double[] result2 = new double[recording.length + chirp.length - 1];
    for (int i = 0; i < result2.length; ++i)
        result2[i] = result[i];
    return result2;
}
However, up to about 5000 elements each, xcorr1 seems to be quicker. Am I doing anything particularly slow (perhaps the constant 'new'ing of memory — maybe I should use an ArrayList?)? Or is it the arbitrary way in which I generated the arrays to test them? Or should I take the complex conjugate instead of reversing? That said, performance isn't really an issue, so unless there's something obvious you needn't bother pointing out optimisations.
Your implementation of xcorr1 does correspond to the standard signal-processing definition of cross-correlation.
Regarding your question about adding zeros at the beginning: adding chirp.length-1 zeros would make index 0 of the result correspond to the start of transmission. Note however that the peak of the correlation output occurs chirp.length-1 samples after the start of an echo (the chirp has to be aligned with the full received echo). To obtain echo delays from peak indices, you would then have to adjust for that correlator delay, either by subtracting it or by discarding the first chirp.length-1 output values. Since the added zeros correspond to exactly that many extra outputs at the beginning, you're probably better off not adding them in the first place.
For xcorr2 however, a few things need to be addressed. First, if the recording and chirp inputs are not already zero-padded to at least chirp+recording data length you would need to do so (preferably to a power of 2 length for performance reasons). As you are aware, they would both need to be padded to the same length.
Second, you didn't take into account that the multiplications indicated in the posted reference answer are in fact complex multiplications (whereas the DoubleFFT_1D.realForward API works on packed arrays of doubles). If you are going to implement a complex multiplication with the chirp's FFT, you might as well multiply by the complex conjugate of the chirp's FFT (the alternate implementation indicated in the reference answer), which removes the need to reverse the time-domain values.
Also accounting for DoubleFFT_1D.realForward packing order for even length transforms, you would get:
// [...]
fft.realForward(paddedRecording);
fft.realForward(paddedChirp);
result[0] = paddedRecording[0] * paddedChirp[0]; // value at f=0Hz is real-valued
result[1] = paddedRecording[1] * paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
for (int i = 1; i < result.length / 2; ++i) {
    double a = paddedRecording[2*i];
    double b = paddedRecording[2*i+1];
    double c = paddedChirp[2*i];
    double d = paddedChirp[2*i+1];
    // (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
    result[2*i] = a*c + b*d;
    result[2*i+1] = b*c - a*d;
}
fft.realInverse(result, true);
// [...]
Note that the result array would be of the same size as paddedRecording and paddedChirp, but only the first recording.length+chirp.length-1 should be kept.
Finally, relative to which function is the most appropriate for arrays of a few thousand elements, the FFT version xcorr2 is likely going to be much faster (provided you restrict array lengths to powers of 2).
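As a sketch of what you might then do with the correlator output for the sonar use case (the speed-of-sound constant and the sample values here are illustrative assumptions, not part of the original answer):

```java
// Once the cross-correlation array is computed, the echo delay is the index
// of its largest peak (after discarding the first chirp.length-1 outputs if
// leading zeros were added). Distance is then delay / fs * c / 2, halved
// because the sound travels out to the obstacle and back.
public class EchoDistance {
    static int peakIndex(double[] xcorr) {
        int best = 0;
        for (int i = 1; i < xcorr.length; i++)
            if (xcorr[i] > xcorr[best]) best = i;
        return best;
    }

    static double distanceMetres(int delaySamples, double sampleRate) {
        double speedOfSound = 343.0; // m/s in air at ~20 degrees C
        return delaySamples / sampleRate * speedOfSound / 2.0;
    }

    public static void main(String[] args) {
        double[] xcorr = {0.1, 0.3, 5.0, 0.2, 0.4}; // toy correlator output
        System.out.println(peakIndex(xcorr));                // 2
        System.out.println(distanceMetres(4410, 44100.0));   // 0.1 s round trip
    }
}
```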
The direct version doesn't require zero-padding first. You just take recording of length M and chirp of length N and calculate result of length N+M-1. Work through a tiny example by hand to grok the steps:
recording = [1, 2, 3]
chirp     = [4, 5]

  1 2 3
4 5

1 2 3
4 5

1 2 3
  4 5

1 2 3
    4 5

result = [1*5, 1*4 + 2*5, 2*4 + 3*5, 3*4] = [5, 14, 23, 12]
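You can check that arithmetic with a direct implementation of the same sliding-product definition (a self-contained sketch, separate from the xcorr1 above):

```java
public class TinyXcorr {
    // Cross-correlation computed as convolution with the reversed chirp,
    // matching the sliding diagram above. Output length is M + N - 1.
    static double[] xcorr(double[] recording, double[] chirp) {
        int n = recording.length + chirp.length - 1;
        double[] out = new double[n];
        for (int k = 0; k < n; k++)
            for (int i = 0; i < chirp.length; i++) {
                int j = k - i; // index into recording under chirp[len-1-i]
                if (j >= 0 && j < recording.length)
                    out[k] += recording[j] * chirp[chirp.length - 1 - i];
            }
        return out;
    }

    public static void main(String[] args) {
        double[] r = xcorr(new double[]{1, 2, 3}, new double[]{4, 5});
        System.out.println(java.util.Arrays.toString(r)); // [5.0, 14.0, 23.0, 12.0]
    }
}
```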
The FFT method is much faster if you have long arrays. In this case you have to zero-pad each input to size M+N-1 so that both input arrays are the same size before taking the FFT.
Also, the FFT output is complex numbers, so you need to use complex multiplication. (1+2j)*(3+4j) is -5+10j, not 3+8j. I don't know how your complex numbers are arranged or handled, but make sure this is right.
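A minimal helper makes that rule concrete (representing a complex number as a two-element {re, im} array, purely for illustration):

```java
// The complex product quoted above: (a+bj)*(c+dj) = (ac - bd) + (ad + bc)j
public class ComplexMul {
    static double[] mul(double a, double b, double c, double d) {
        return new double[]{a * c - b * d, a * d + b * c};
    }

    public static void main(String[] args) {
        double[] p = mul(1, 2, 3, 4);
        System.out.println(p[0] + " + " + p[1] + "j"); // -5.0 + 10.0j
    }
}
```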
Or is this the purpose of a transform; to move real numbers into the complex domain?
No, the Fourier transform transforms from the time domain to the frequency domain. The time domain data can be either real or complex, and the frequency domain data can be either real or complex. In most cases you have real data with a complex spectrum. You need to read up on the Fourier transform.
If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward?
The real FFT takes a real input, while the complex FFT takes a complex input. Both transforms produce complex numbers as their output. That's what the DFT does. The only time a DFT produces real output is if the input data is symmetrical (in which case you can use the DCT to save even more time).

how to voice frequency detect?

I'm a beginner Android programmer. (My home language is not English, so my English is poor.)
I want to make an app that gets the frequency of a recorded human voice and shows the note, like "C3" or "G#4".
So I want to detect the human voice frequency, but it is too difficult.
I tried using FFT; it detects piano (or guitar) sounds pretty well (though in some ranges above octave 4 it didn't detect low piano or guitar sounds), but it can't detect the human voice.
(I used a piano program based on General MIDI.)
I found lots of information, but I can't understand it.
Most people say to use a pitch detection algorithm and just link the wiki.
Please tell me about the pitch detection algorithm in detail.
(Actually, I want example code :(
Or is there any other idea I can use in my app?
HERE IS MY SOURCE CODE:
public void Frequency(double[] array) {
    int sampleSize = array.length;
    double[] win = window.generate(sampleSize);
    // signals for fft input
    double[] signals = new double[sampleSize];
    for (int i = 0; i < sampleSize; i++) {
        signals[i] = array[i] * win[i];
    }
    double[] fftArray = new double[sampleSize * 2];
    for (int i = 0; i < sampleSize; i++) { // was sampleSize - 1, which dropped the last sample
        fftArray[2 * i] = signals[i];
        fftArray[2 * i + 1] = 0;
    }
    FFT.complexForward(fftArray);
    getFrequency(fftArray);
}
private void getFrequency(double[] array) {
    // ========== Values ========== //
    int RATE = sampleRate;
    int CHUNK_SIZE_IN_SAMPLES = RECORDER_BUFFER_SIZE;
    int MIN_FREQUENCY = 50; // Hz
    int MAX_FREQUENCY = 2000; // Hz
    // cast before dividing, otherwise the division is done in integer
    // arithmetic and Math.round has nothing left to round
    int min_frequency_fft = Math.round((float) MIN_FREQUENCY * CHUNK_SIZE_IN_SAMPLES / RATE);
    int max_frequency_fft = Math.round((float) MAX_FREQUENCY * CHUNK_SIZE_IN_SAMPLES / RATE);
    // ============================ //
    double best_frequency = min_frequency_fft;
    double best_amplitude = 0;
    for (int i = min_frequency_fft; i <= max_frequency_fft; i++) {
        double current_frequency = i * 1.0 * RATE / CHUNK_SIZE_IN_SAMPLES;
        double current_amplitude = Math.pow(array[i * 2], 2) + Math.pow(array[i * 2 + 1], 2);
        double normalized_amplitude = current_amplitude * Math.pow(MIN_FREQUENCY * MAX_FREQUENCY, 0.5) / current_frequency;
        if (normalized_amplitude > best_amplitude) {
            best_frequency = current_frequency;
            best_amplitude = normalized_amplitude;
        }
    }
    FrequencyArray[FrequencyArrayIndex] = best_frequency;
    FrequencyArrayIndex++;
}
I refer to this : http://code.google.com/p/android-guitar-tuner/
Pitch_detection_algorithm
use Jtransforms
The Wikipedia page on pitch detection links to another Wikipedia page explaining autocorrelation: http://en.m.wikipedia.org/wiki/Autocorrelation#section_3 , which is one of many pitch estimation methods you could try.
Running the example code you posted can show that FFT peak frequency estimation is quite poor at musical pitch detection and estimation for many common pitched sounds.
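If you want to experiment with autocorrelation, a bare-bones time-domain sketch looks like this (the frequency limits are illustrative; real pitch detectors add normalisation, voicing thresholds and peak interpolation):

```java
// Minimal autocorrelation pitch estimate: find the lag, within a plausible
// pitch range, whose autocorrelation sum is largest, and convert it to Hz.
public class AutocorrPitch {
    static double estimatePitch(double[] x, double sampleRate,
                                double minHz, double maxHz) {
        int minLag = (int) (sampleRate / maxHz); // short lag = high pitch
        int maxLag = (int) (sampleRate / minHz); // long lag = low pitch
        int bestLag = minLag;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag && lag < x.length; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < x.length; i++)
                sum += x[i] * x[i + lag];
            if (sum > best) { best = sum; bestLag = lag; }
        }
        return sampleRate / bestLag;
    }

    public static void main(String[] args) {
        // A 220 Hz sine at 44100 Hz should come out close to 220 Hz.
        double fs = 44100, f = 220;
        double[] x = new double[4096];
        for (int i = 0; i < x.length; i++)
            x[i] = Math.sin(2 * Math.PI * f * i / fs);
        System.out.println(estimatePitch(x, fs, 50, 1000));
    }
}
```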

Best Buffer Size

What is the best value for the buffer size when implementing a guitar tuner using FFT? I am getting an output, but the value displayed is not as accurate as I expected. I think it's an issue with the buffer size I allocated. I'm using 8000 as the buffer size. Are there any other suggestions to retrieve a more accurate result?
You can kinda wiggle the results around a bit. It's been a while since I've done FFT work, but if I recall, the Nth bucket sits at N * (sampleRate / bufferSize) Hz. With a buffer of 8000 samples at an 8000 Hz sample rate, the 49th through 51st buckets are 49, 50, and 51 Hz.
You can then do an FFT with a slightly different number of buckets. If you dropped down to 6000 buckets, the spacing becomes 8000 / 6000 ≈ 1.33 Hz, so the 37th and 38th buckets are 49.33 and 50.67 Hz.
Now you've got an algorithm that you can use to home in on the specific frequency by seeing where the peak lands on the two grids. I think it's O((log M) * (N log N)), where N is roughly the number of buckets you use each time and M is the precision.
Update: Sample Stretching
public byte[] stretch(byte[] input, int newLength) {
    byte[] result = new byte[newLength];
    result[0] = input[0];
    for (int i = 1; i < newLength; i++) {
        // cast before dividing, otherwise this is integer division
        float t = (float) i * input.length / newLength;
        int j = Math.min((int) Math.ceil(t), input.length - 1); // index rounded up
        float d = j - t; // fraction of input[j - 1] to use
        result[i] = (byte) (input[j - 1] * d + input[j] * (1 - d));
    }
    return result;
}
You might have to fix some of the casting to make sure you get the right numbers, but that looks about right.
i = index in result[]
j = index in input[] (rounded up)
d = percentage of input[j - 1] to use
1 - d = percentage of input[j] to use

Android audio FFT to display fundamental frequency

I have been working on an Android project for a while that displays the fundamental frequency of an input signal (to act as a tuner). I have successfully implemented the AudioRecord class and am getting data from it. However, I am having a hard time performing an FFT on this data to get the fundamental frequency of the input signal. I have been looking at the post here, and am using FFT in Java and the Complex class to go with it.
I have successfully used the FFT function found in FFT in Java, but I am not sure if I am obtaining the correct results. For the magnitude of the FFT (sqrt[re*re + im*im]) I am getting values that start high, around 15000, and then slowly diminish to about 300. That doesn't seem right.
Also, as far as the raw data from the mic goes, the data seems fine, except that the first 50 values or so are always the number 3, unless I hit the tuning button again while still in the application and then I only get about 15. Is that normal?
Here is a bit of my code.
First of all, I convert the short data (obtained from the microphone) to a double using the following code which is from the post I have been looking at. This snippet of code I do not completely understand, but I think it works.
//Conversion from short to double
double[] micBufferData = new double[bufferSizeInBytes]; //size may need to change
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 1.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bufferSizeInBytes - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
    double sample = 0;
    for (int b = 0; b < bytesPerSample; b++) {
        int v = audioData[index + b];
        if (b < bytesPerSample - 1 || bytesPerSample == 1) {
            v &= 0xFF;
        }
        sample += v << (b * 8);
    }
    double sample32 = amplification * (sample / 32768.0);
    micBufferData[floatIndex] = sample32;
}
The code then continues as follows:
//Create Complex array for use in FFT
Complex[] fftTempArray = new Complex[bufferSizeInBytes];
for (int i = 0; i < bufferSizeInBytes; i++) {
    fftTempArray[i] = new Complex(micBufferData[i], 0);
}
//Obtain array of FFT data
final Complex[] fftArray = FFT.fft(fftTempArray);
final Complex[] fftInverse = FFT.ifft(fftTempArray);
//Create an array of magnitude of fftArray
double[] magnitude = new double[fftArray.length];
for (int i = 0; i < fftArray.length; i++) {
    magnitude[i] = fftArray[i].abs();
}
fft.setTextColor(Color.GREEN);
fft.setText("fftArray is " + fftArray[500] + " and fftTempArray is " + fftTempArray[500] + " and fftInverse is " + fftInverse[500] + " and audioData is " + audioData[500] + " and magnitude is " + magnitude[1] + ", " + magnitude[500] + ", " + magnitude[1000] + " Good job!");
for (int i = 2; i < samples; i++) {
    fft.append(" " + magnitude[i] + " Hz");
}
That last bit is just to check what values I am getting (and to keep me sane!). In the post referred to above, it talks about needing the sampling frequency and gives this code:
private double ComputeFrequency(int arrayIndex) {
    return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
How do I implement this code? I don't really understand where fftOutWindowSize and arrayIndex come from.
Any help is greatly appreciated!
Dustin
Recently I worked on a project which required almost the same thing. You probably don't need any help anymore, but I will give my thoughts anyway; maybe someone will need this in the future.
I'm not sure whether the short-to-double function works; I don't understand that snippet of code either. It was written for byte-to-double conversion.
In the line "double[] micBufferData = new double[bufferSizeInBytes];", I think the size of micBufferData should be "bufferSizeInBytes / 2", since every sample takes two bytes and the size of micBufferData should be the number of samples.
FFT algorithms do require an FFT window size, and it has to be a power of 2. However, many implementations can accept an arbitrary number of input samples and handle the rest for you; the documentation of the algorithm should state its input requirements. In your case, the Complex array can be the input to the FFT. I don't know the details of your FFT implementation, but I don't think the inverse transform is needed here.
To use the code you gave last, you should first find the peak index in the sample array. I used a double array as input instead of Complex, so in my case it is something like:
double maxVal = -1;
int maxIndex = -1;
for (int j = 0; j < mFftSize / 2; ++j) {
    double v = fftResult[2*j] * fftResult[2*j] + fftResult[2*j+1] * fftResult[2*j+1];
    if (v > maxVal) {
        maxVal = v;
        maxIndex = j;
    }
}
2*j is the real part and 2*j+1 is the imaginary part. maxIndex is the index of the peak magnitude you want (More detail here), and use it as input to the ComputeFrequency function. The return value is the frequency of the sample array you want.
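Putting the two pieces together, the question's ComputeFrequency formula applied to that peak index could look like this (the sample rate, FFT size and bin index here are illustrative assumptions):

```java
// frequency = (sampleRate / fftWindowSize) * arrayIndex, where arrayIndex
// is the peak bin found by the loop above.
public class PeakFrequency {
    static double computeFrequency(int arrayIndex, double sampleRate, int fftWindowSize) {
        return sampleRate / fftWindowSize * arrayIndex;
    }

    public static void main(String[] args) {
        // e.g. a peak at bin 41 of a 4096-point FFT at 44100 Hz:
        System.out.println(computeFrequency(41, 44100.0, 4096)); // ~441.4 Hz
    }
}
```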
Hopefully it can help someone.
You should pick an FFT window size depending on your time versus frequency resolution requirements, and not just use the audio buffer size when creating your FFT temp array.
The array index is your int i, as used in your magnitude[i] print statement.
The fundamental pitch frequency for music is often different from FFT peak magnitude, so you may want to research some pitch estimation algorithms.
I suspect that the strange results you're getting are because you might need to unpack the FFT. How this is done will depend on the library that you're using (see here for docs on how it's packed in GSL, for example). The packing may mean that the real and imaginary components are not in the positions in the array that you expect.
For your other questions about window size and resolution, if you're creating a tuner then I'd suggest trying a window size of about 20ms (eg 1024 samples at 44.1kHz). For a tuner you need quite high resolution, so you could try zero-padding by a factor of 8 or 16 which will give you a resolution of 3-6Hz.
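A sketch of that zero-padding step (the window and padding factor are the ones suggested above; the sample rates in the comment are assumptions):

```java
// Copy the analysis window into a buffer `factor` times longer before the
// FFT. This interpolates the spectrum (finer bin spacing) but does not add
// true resolution, so a window function applied to the original samples
// is still advisable.
public class ZeroPad {
    static double[] zeroPad(double[] window, int factor) {
        double[] padded = new double[window.length * factor];
        System.arraycopy(window, 0, padded, 0, window.length);
        return padded; // the remaining entries are already 0.0
    }

    public static void main(String[] args) {
        double[] padded = zeroPad(new double[1024], 8);
        System.out.println(padded.length); // 8192
        // Bin spacing drops from 44100/1024 ~ 43 Hz to 44100/8192 ~ 5.4 Hz.
    }
}
```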
