What is the best value for the buffer size when implementing a guitar tuner using FFT? I am getting an output, but the displayed value is not as accurate as I expected. I think the problem is the buffer size I allocated; I'm using 8000. Are there any other suggestions for getting a more accurate result?
You can kinda wiggle the results around a bit. It's been a while since I've done FFT work, but if I recall, with a buffer of 8000, the Nth bucket would be (8000 / 2) / N Hz (is that right? It's been a long time). So the 79th through 81st buckets are 50.63, 50, and 49.38 Hz.
You can then do a FFT with a slightly different number of buckets. So if you dropped down to 6000 buckets, the 59th through 61st buckets would be 50.84, 50, and 49.18 Hz.
Now you've got an algorithm that you can use to home in on the specific frequency. I think it's O((log M) * (N log N)) where N is roughly the number of buckets you use each time, and M is the precision.
Update: Sample Stretching
public byte[] stretch(byte[] input, int newLength) {
    byte[] result = new byte[newLength];
    for (int i = 0; i < newLength; i++) {
        // Position in input[] that result[i] maps to (float division, not integer division)
        float t = (float) i * input.length / newLength;
        int j = (int) t;                               // index at or below t
        int next = Math.min(j + 1, input.length - 1);  // clamp at the last sample
        float d = t - j;                               // fractional distance past j
        result[i] = (byte) (input[j] * (1 - d) + input[next] * d);
    }
    return result;
}
You might have to fix some of the casting to make sure you get the right numbers, but that looks about right.
i = index in result[]
j = index in input[] (rounded down)
d = fraction of input[j + 1] to use
1 - d = fraction of input[j] to use
I'm trying to convert a YUV image to grayscale, so basically I just need the Y values.
To do so I wrote this little piece of code (with frame being the YUV image):
imageConversionTime = System.currentTimeMillis();
size = frame.getSize();
byte nv21ByteArray[] = frame.getImage();
int lol;
for (int i = 0; i < size.width; i++) {
for (int j = 0; j < size.height; j++) {
lol = size.width*j + i;
yMatrix.put(j, i, nv21ByteArray[lol]);
}
}
bitmap = Bitmap.createBitmap(size.width, size.height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(yMatrix, bitmap);
imageConversionTime = System.currentTimeMillis() - imageConversionTime;
However, this takes about 13500 ms. I need it to be A LOT faster (on my computer it takes 8.5 ms in python) (I work on a Motorola Moto E 4G 2nd generation, not super powerful but it should be enough for converting images right?).
Any suggestions?
Thanks in advance.
First of all, I would assign size.width and size.height to local variables. I don't think the compiler will optimize this by default, but I am not sure.
Furthermore, create an int[] of pixels for the result instead of using a Matrix.
Then you could do something like this:
int[] grayScalePixels = new int[size.width * size.height];
int cntPixels = 0;
In your inner loop (with rows in the outer loop and columns in the inner loop, so pixels are written in row-major order), set:
int y = nv21ByteArray[lol] & 0xFF; // luma as 0..255
grayScalePixels[cntPixels] = 0xFF000000 | (y << 16) | (y << 8) | y; // opaque gray ARGB pixel
cntPixels++;
To get your final image, do the following:
Bitmap grayScaleBitmap = Bitmap.createBitmap(grayScalePixels, size.width, size.height, Bitmap.Config.ARGB_8888);
Hope it works properly (I have not tested it, but the principle should apply: rely on a plain array instead of a Matrix).
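To make the array idea concrete, here is a minimal sketch in plain Java (no Android classes, so it can be run anywhere). It assumes the first width * height bytes of the NV21 buffer are the Y plane stored row by row with no row padding; the class and method names are made up for illustration.

```java
final class Nv21Gray {
    /**
     * Builds opaque gray ARGB pixels from the Y (luma) plane of an NV21 frame.
     * Assumption: the first width * height bytes of nv21 are luma, row by row.
     */
    static int[] toGrayArgb(byte[] nv21, int width, int height) {
        int[] pixels = new int[width * height];
        for (int p = 0; p < pixels.length; p++) {
            int y = nv21[p] & 0xFF; // bytes are signed in Java; mask to 0..255
            pixels[p] = 0xFF000000 | (y << 16) | (y << 8) | y;
        }
        return pixels;
    }
}
```

The returned array is in the layout that Bitmap.createBitmap(int[], int, int, Bitmap.Config) takes for ARGB_8888, and a single flat loop avoids the per-pixel put() call on the matrix.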
Probably 2 years too late but anyways ;)
To convert to gray scale, all you need to do is set the u/v values to 128 and leave the y values as is. Note that this code is for YUY2 format. You can refer to this document for other formats.
private void convertToBW(byte[] ptrIn, String filePath) {
// set all u and v values to 128 (0x80); the (byte) cast keeps the bit pattern
byte[] ptrOut = Arrays.copyOf(ptrIn, ptrIn.length);
for (int i = 0, ptrInLength = ptrOut.length; i < ptrInLength; i++) {
if (i % 2 != 0) {
ptrOut[i] = (byte) 128;
}
}
convertToJpeg(ptrOut, filePath);
}
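For NV21/NV12, the Y plane fills the first two thirds of the buffer and the interleaved chroma bytes fill the last third, so I think the loop would change to:
for (int i = ptrOut.length * 2 / 3, ptrInLength = ptrOut.length; i < ptrInLength; i++) { ptrOut[i] = (byte) 128; }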
Note: (didn't try this myself)
Also I would suggest to profile your utils method and createBitmap functions separately.
I am building an app that needs to be able to display a real-time spectral analyzer. Here is the version I was able to successfully make on iOS:
I am using Wendykierp JTransforms library to perform the FFT calculations, and have managed to capture audio data and execute the FFT functions. See below:
short sData[] = new short[BufferElements2Rec];
int result = audioRecord.read(sData, 0, BufferElements2Rec);
try
{
//Initiate FFT
DoubleFFT_1D fft = new DoubleFFT_1D(sData.length);
//Convert sample data from short[] to double[]
double[] fftSamples = new double[sData.length];
for (int i = 0; i < sData.length; i++) {
//Convert each 16-bit sample to a double in the range [-1, 1).
//A short is 2 bytes (values -32768 to 32767), so we widen it
//to double and divide by 32768.
fftSamples[i] = (double) sData[i] / 32768;
}
//Perform fft calcs
fft.realForward(fftSamples);
//TODO - Convert FFT data into 20 "bands"
} catch (Exception e)
{
}
In iOS, I was using a library (Tempi-FFT) which had built in functionality for calculating magnitude, frequency, and providing averaged data for any given number of bands (I am using 20 bands as you can see in the image above). It seems I don't have that luxury with this library and I need to calculate this myself.
Looking for any good examples or tutorials on how to interpret the data returned by the FFT calculations. Here is some sample data I am receiving:
-11387.0, 183.0, -384.9121475854448, -224.66315714636642, -638.0173005872095, -236.2318653974911, -1137.1498541119106, -437.71599514435786, 1954.683405957685, -2142.742125980924 ...
Looking for simple explanation of how to interpret this data. Some other questions I have looked at that I was either unable to understand, or did not provide information on how to determine a given number of bands:
Power Spectral Density from jTransforms DoubleFFT_1D
How to develop a Spectrum Analyser from a realtime audio?
Your question can be split into two parts: finding the magnitude of all frequencies (interpreting the output) and averaging the frequencies into bands
Finding the magnitude of all frequencies:
I won't go into the intricacies of the Fast Fourier Transform/Discrete Fourier Transform (if you would like to gain a basic understanding see this video), but know that there is a real and an imaginary part of each output.
The documentation of the realForward function describes where both the imaginary and the real parts are located in the output array (I'm assuming you have an even sample size):
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
a is equivalent to your fftSamples, which means we can translate this documentation into code as follows (I've changed Re and Im to realPart and imaginaryPart respectively):
int n = fftSamples.length;
// One extra slot so that index n / 2 (the Nyquist bin) fits
double[] realPart = new double[n / 2 + 1];
double[] imaginaryPart = new double[n / 2 + 1];
for(int k = 0; k < n / 2; k++) {
    realPart[k] = fftSamples[k * 2];
    imaginaryPart[k] = fftSamples[k * 2 + 1];
}
imaginaryPart[0] = 0; // a[1] holds Re[n/2], not Im[0]
realPart[n / 2] = fftSamples[1];
Now we have the real and imaginary parts of each frequency. We could plot these on an x-y coordinate plane using the real part as the x value and the imaginary part as the y value. This creates a triangle, and the length of the triangle's hypotenuse is the magnitude of the frequency. We can use the pythagorean theorem to get this magnitude:
double[] spectrum = new double[n / 2];
for(int k = 1; k < n / 2; k++) {
spectrum[k] = Math.sqrt(Math.pow(realPart[k], 2) + Math.pow(imaginaryPart[k], 2));
}
spectrum[0] = realPart[0];
Note that the 0th index of the spectrum doesn't have an imaginary part. This is the DC component of the signal (we won't use this).
Now we have an array with the magnitudes of each frequency across your spectrum. (If your sampling frequency is 44100Hz, the array covers the frequencies between 0Hz and the Nyquist frequency of 22050Hz, and if you have 441 values in your array, then each index represents a 50Hz step.)
Averaging the frequencies into bands:
Now that we've converted the FFT output to data that we can use, we can move on to the second part of your question: finding the averages of different bands of frequencies. This is relatively simple. We just need to split the array into different bands and find the average of each band. This can be generalized like so:
int NUM_BANDS = 20; //This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = (n / 2) / NUM_BANDS;
for(int i = 0; i < NUM_BANDS; i++) {
//Add up each part
double total = 0;
for(int j = samplesPerBand * i ; j < samplesPerBand * (i+1); j++) {
total += spectrum[j];
}
//Take average
bands[i] = total / samplesPerBand;
}
Final Code:
And that's it! You now have an array called bands with the average magnitude of each band of frequencies. The code above is purposefully not optimized in order to show how each step works. Here is a shortened and optimized version:
int numFrequencies = fftSamples.length / 2;
double[] spectrum = new double[numFrequencies];
for(int k = 1; k < numFrequencies; k++) {
spectrum[k] = Math.sqrt(Math.pow(fftSamples[k*2], 2) + Math.pow(fftSamples[k*2+1], 2));
}
spectrum[0] = fftSamples[0];
int NUM_BANDS = 20; //This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = numFrequencies / NUM_BANDS;
for(int i = 0; i < NUM_BANDS; i++) {
//Add up each part
double total = 0;
for(int j = samplesPerBand * i ; j < samplesPerBand * (i+1); j++) {
total += spectrum[j];
}
//Take average
bands[i] = total / samplesPerBand;
}
//Use bands in view!
This has been a really long answer, and I haven't tested the code yet (though I do plan to). Feel free to comment if you find any mistakes.
I'm trying to detect echoes of my chirp in my sound recording on Android and it seems cross correlation is the most appropriate way of finding where the FFTs of the two signals are similar and from there I can identify peaks in the cross correlated array which will correspond to distances.
From my understanding, I have come up with the following cross correlation function. Is this correct? I wasn't sure whether to add zeros to the beginning as well and start a few elements back?
public double[] xcorr1(double[] recording, double[] chirp) {
double[] recordingZeroPadded = new double[recording.length + chirp.length];
for (int i = recording.length; i < recording.length + chirp.length; ++i)
recordingZeroPadded[i] = 0;
for (int i = 0; i < recording.length; ++i)
recordingZeroPadded[i] = recording[i];
double[] result = new double[recording.length + chirp.length - 1];
for (int offset = 0; offset < recordingZeroPadded.length - chirp.length; ++offset)
for (int i = 0; i < chirp.length; ++i)
result[offset] += chirp[i] * recordingZeroPadded[offset + i];
return result;
}
Secondary question:
According to this answer, it can also be calculated like
corr(a, b) = ifft(fft(a_and_zeros) * fft(b_and_zeros[reversed]))
which I don't understand at all but seems easy enough to implement. That said I have failed (assuming my xcorr1 is correct). I feel like I've completely misunderstood this?
public double[] xcorr2(double[] recording, double[] chirp) {
// assume same length arguments for now
DoubleFFT_1D fft = new DoubleFFT_1D(recording.length);
fft.realForward(recording);
reverse(chirp);
fft.realForward(chirp);
double[] result = new double[recording.length];
for (int i = 0; i < result.length; ++i)
result [i] = recording[i] * chirp[i];
fft.realInverse(result, true);
return result;
}
Assuming I got both working, which function would be most appropriate given that the arrays will contain a few thousand elements?
EDIT: Btw, I have tried adding zeros to both ends of both arrays for the FFT version.
EDIT after SleuthEye's response:
Can you just verify that, because I'm dealing with 'actual' data, I need only do half the computations (the real parts) by doing a real transform?
From your code, it looks as though the odd numbered elements in the array returned by the REAL transform are imaginary. What's going on here?
How am I going from an array of real numbers to complex? Or is this the purpose of a transform; to move real numbers into the complex domain? (but the real numbers are just a subset of the complex numbers and so wouldn't they already be in this domain?)
If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward? And how do I interpret the results? The magnitude of the complex number?
I apologise for my lack of understanding with regard to transforms, I have only so far studied fourier series.
Thanks for the code. Here is 'my' working implementation:
public double[] xcorr2(double[] recording, double[] chirp) {
// pad to power of 2 for optimisation
int y = 1;
while (Math.pow(2,y) < recording.length + chirp.length)
++y;
int paddedLength = (int)Math.pow(2,y);
double[] paddedRecording = new double[paddedLength];
double[] paddedChirp = new double[paddedLength];
for (int i = 0; i < recording.length; ++i)
paddedRecording[i] = recording[i];
for (int i = recording.length; i < paddedLength; ++i)
paddedRecording[i] = 0;
for (int i = 0; i < chirp.length; ++i)
paddedChirp[i] = chirp[i];
for (int i = chirp.length; i < paddedLength; ++i)
paddedChirp[i] = 0;
reverse(chirp);
DoubleFFT_1D fft = new DoubleFFT_1D(paddedLength);
fft.realForward(paddedRecording);
fft.realForward(paddedChirp);
double[] result = new double[paddedLength];
result[0] = paddedRecording[0] * paddedChirp[0]; // value at f=0Hz is real-valued
result[1] = paddedRecording[1] * paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
for (int i = 1; i < result.length / 2; ++i) {
double a = paddedRecording[2*i];
double b = paddedRecording[2*i + 1];
double c = paddedChirp[2*i];
double d = paddedChirp[2*i + 1];
// (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
result[2*i] = a*c + b*d;
result[2*i + 1] = b*c - a*d;
}
fft.realInverse(result, true);
// discard trailing zeros
double[] result2 = new double[recording.length + chirp.length - 1];
for (int i = 0; i < result2.length; ++i)
result2[i] = result[i];
return result2;
}
However, until about 5000 elements each, xcorr1 seems to be quicker. Am I doing anything particularly slow (perhaps the constant 'new'ing of memory -- maybe I should cast to an ArrayList)? Or the arbitrary way in which I generated the arrays to test them? Or should I do the conjugates instead of reversing it? That said, performance isn't really an issue so unless there's something obvious you needn't bother pointing out optimisations.
Your implementation of xcorr1 does correspond to the standard signal-processing definition of cross-correlation.
Regarding your question about adding zeros at the beginning: adding chirp.length-1 zeros would make index 0 of the result correspond to the start of transmission. Note, however, that the peak of the correlation output occurs chirp.length-1 samples after the start of echoes (the chirp has to be aligned with the full received echo). Using the peak index to obtain echo delays, you would then have to adjust for that correlator delay either by subtracting the delay or by discarding the first chirp.length-1 output results. Since the additional zeros correspond to that many extra outputs at the beginning, you'd probably be better off not adding those zeros in the first place.
For xcorr2 however, a few things need to be addressed. First, if the recording and chirp inputs are not already zero-padded to at least chirp+recording data length you would need to do so (preferably to a power of 2 length for performance reasons). As you are aware, they would both need to be padded to the same length.
Second, you didn't take into account that the multiplications indicated in the posted reference answer are in fact complex multiplications (whereas the DoubleFFT_1D.realForward API uses doubles). Now, if you are going to implement a complex multiplication with the chirp's FFT, you might as well implement the multiplication with the complex conjugate of the chirp's FFT (the alternate implementation indicated in the reference answer), which removes the need to reverse the time-domain values.
Also accounting for DoubleFFT_1D.realForward packing order for even length transforms, you would get:
// [...]
fft.realForward(paddedRecording);
fft.realForward(paddedChirp);
result[0] = paddedRecording[0]*paddedChirp[0]; // value at f=0Hz is real-valued
result[1] = paddedRecording[1]*paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
for (int i = 1; i < result.length/2; ++i) {
double a = paddedRecording[2*i];
double b = paddedRecording[2*i+1];
double c = paddedChirp[2*i];
double d = paddedChirp[2*i+1];
// (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
result[2*i] = a*c + b*d;
result[2*i+1] = b*c - a*d;
}
fft.realInverse(result, true);
// [...]
Note that the result array would be of the same size as paddedRecording and paddedChirp, but only the first recording.length+chirp.length-1 should be kept.
Finally, relative to which function is the most appropriate for arrays of a few thousand elements, the FFT version xcorr2 is likely going to be much faster (provided you restrict array lengths to powers of 2).
The direct version doesn't require zero-padding first. You just take recording of length M and chirp of length N and calculate result of length N+M-1. Work through a tiny example by hand to grok the steps:
recording = [1, 2, 3]
chirp = [4, 5]
      1 2 3
    4 5
      1 2 3
      4 5
      1 2 3
        4 5
      1 2 3
          4 5
result = [1*5, 1*4 + 2*5, 2*4 + 3*5, 3*4] = [5, 14, 23, 12]
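The sliding procedure in the tiny example can be sketched in Java roughly as follows. This is only an illustration of the direct method under the convention used in the hand example (the chirp starts one sample before full overlap at the left edge); the class and method names are hypothetical.

```java
final class DirectXcorr {
    /**
     * Full sliding cross-correlation of length M + N - 1.
     * At output index k, chirp[i] sits under recording[k - (N - 1) + i];
     * positions that fall outside the recording contribute nothing.
     */
    static double[] correlate(double[] recording, double[] chirp) {
        int m = recording.length;
        int n = chirp.length;
        double[] result = new double[m + n - 1];
        for (int k = 0; k < result.length; k++) {
            for (int i = 0; i < n; i++) {
                int r = k - (n - 1) + i; // recording index aligned with chirp[i]
                if (r >= 0 && r < m) {
                    result[k] += recording[r] * chirp[i];
                }
            }
        }
        return result;
    }
}
```

Calling DirectXcorr.correlate(new double[]{1, 2, 3}, new double[]{4, 5}) walks through the same four alignments as the diagram. The cost grows as M * N, which is why the FFT route pays off for long arrays.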
The FFT method is much faster if you have long arrays. In this case you have to zero-pad each input to size M+N-1 so that both input arrays are the same size before taking the FFT.
Also, the FFT output is complex numbers, so you need to use complex multiplication. (1+2j)*(3+4j) is -5+10j, not 3+8j. I don't know how your complex numbers are arranged or handled, but make sure this is right.
Or is this the purpose of a transform; to move real numbers into the complex domain?
No, the Fourier transform transforms from the time domain to the frequency domain. The time domain data can be either real or complex, and the frequency domain data can be either real or complex. In most cases you have real data with a complex spectrum. You need to read up on the Fourier transform.
If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward?
The real FFT takes a real input, while the complex FFT takes a complex input. Both transforms produce complex numbers as their output. That's what the DFT does. The only time a DFT produces real output is if the input data is symmetrical (in which case you can use the DCT to save even more time).
I am trying to build some kind of sound meter for the Android platform (I am using an HTC Wildfire). I use the AudioRecord class for that goal; however, it seems that the values returned from its "read" method are not reasonable.
This is how i created the AudioRecord object:
int minBufferSize =
AudioRecord.getMinBufferSize(sampleRateInHz,
android.media.AudioFormat.CHANNEL_IN_MONO,
android.media.AudioFormat.ENCODING_PCM_16BIT);
audioRecored = new AudioRecord( MediaRecorder.AudioSource.MIC,
44100,
android.media.AudioFormat.CHANNEL_IN_MONO,
android.media.AudioFormat.ENCODING_PCM_16BIT,
minBufferSize );
This is how i try to read data from it:
short[] audioData = new short[bufferSize];
int offset =0;
int shortRead = 0;
int sampleToReadPerGet = 1000;//some value in order to avoid momentaraly nosies.
//start tapping into the microphone
audioRecored.startRecording();
//start reading from the microphone to an internal buffer - chuck by chunk
while (offset < sampleToReadPerGet)
{
shortRead = audioRecored.read(audioData, offset ,sampleToReadPerGet - offset);
offset += shortRead;
}
//stop tapping into the microphone
audioRecored.stop();
//average the buffer
int averageSoundLevel = 0;
for (int i = 0 ; i < sampleToReadPerGet ; i++)
{
averageSoundLevel += audioData[i];
}
averageSoundLevel /= sampleToReadPerGet;
What are those values? are they decibels?
Edit:
The values go from -200 to 3000.
The value of shortRead is sampleToReadPerGet (1000).
Not sure what "those values" you are referring to, the raw output or the averaged values, but the raw output are instantaneous amplitude levels. It's important to realize that such values are not referenced to anything in particular. That is, just because you are reading 20, does not tell you 20 of what.
Taking the average of these values doesn't make any sense, because those values swing above and below zero. Do it long enough and you'll just get zero.
It might make sense to take the average of the squares, and then find the square root of the average. This is called the RMS. However, without a fixed buffer size to average over, this is hazardous at best.
To measure dB, you will have to use the formula dB = 20 log_10 (|A|/A_r) where A is the amplitude and A_r is the reference amplitude -- clearly, you must decide what you are referencing (you can calibrate the HTC, or measure against the maximum level or something like that).
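As a rough sketch of the RMS and dB steps described above, something like the following could work. The reference amplitude is whatever you decide on; using 32768 (16-bit full scale) in the usage note is an assumption for illustration, not a calibrated value, and the class and method names are made up.

```java
final class SoundLevel {
    /** Root mean square of a fixed-size block of 16-bit PCM samples. */
    static double rms(short[] samples) {
        double sumSquares = 0;
        for (short s : samples) {
            sumSquares += (double) s * s;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /** Level in dB relative to a chosen reference: 20 * log10(|A| / A_r). */
    static double decibels(double amplitude, double reference) {
        return 20.0 * Math.log10(Math.abs(amplitude) / reference);
    }
}
```

For example, SoundLevel.decibels(SoundLevel.rms(audioData), 32768.0) gives a level relative to full scale, which is 0 or negative. A block of pure silence has an RMS of 0 and yields negative infinity, so you may want to clamp that case.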
Negative values are expected: with ENCODING_PCM_16BIT the samples are signed 16-bit values, so they range from -32768 to 32767. The values have no units.
Also, I recommend root-mean-squared instead of an average for determining volume. It is more stable.
What you should try:
Increase the buffer size by 3: Your app may not be reading it fast enough, so you need some space. Otherwise you might be getting some buffer overflow errors (which you are not checking for in your code).
Try the code in gast-lib: It helps you periodically record audio and also provides you an AsyncTask.
Root mean squared:
public static double rootMeanSquared(short[] nums)
{
double ms = 0;
for (int i = 0; i < nums.length; i++)
{
ms += nums[i] * nums[i];
}
ms /= nums.length;
return Math.sqrt(ms);
}
I have been working on an Android project for awhile that displays the fundamental frequency of an input signal (to act as a tuner). I have successfully implemented the AudioRecord class and am getting data from it. However, I am having a hard time performing an FFT on this data to get the fundamental frequency of the input signal. I have been looking at the post here, and am using FFT in Java and Complex class to go with it.
I have successfully used the FFT function found in FFT in Java, but I am not sure if I am obtaining the correct results. For the magnitude of the FFT (sqrt[re*re+im*im]) I am getting values that start high, around 15000 Hz, and then slowly diminish to about 300 Hz. Doesn't seem right.
Also, as far as the raw data from the mic goes, the data seems fine, except that the first 50 values or so are always the number 3, unless I hit the tuning button again while still in the application and then I only get about 15. Is that normal?
Here is a bit of my code.
First of all, I convert the short data (obtained from the microphone) to a double using the following code which is from the post I have been looking at. This snippet of code I do not completely understand, but I think it works.
//Conversion from short to double
double[] micBufferData = new double[bufferSizeInBytes];//size may need to change
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 1.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bufferSizeInBytes - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = audioData[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = amplification * (sample / 32768.0);
micBufferData[floatIndex] = sample32;
}
The code then continues as follows:
//Create Complex array for use in FFT
Complex[] fftTempArray = new Complex[bufferSizeInBytes];
for (int i=0; i<bufferSizeInBytes; i++)
{
fftTempArray[i] = new Complex(micBufferData[i], 0);
}
//Obtain array of FFT data
final Complex[] fftArray = FFT.fft(fftTempArray);
final Complex[] fftInverse = FFT.ifft(fftTempArray);
//Create an array of magnitude of fftArray
double[] magnitude = new double[fftArray.length];
for (int i=0; i<fftArray.length; i++){
magnitude[i]= fftArray[i].abs();
}
fft.setTextColor(Color.GREEN);
fft.setText("fftArray is "+ fftArray[500] +" and fftTempArray is "+fftTempArray[500] + " and fftInverse is "+fftInverse[500]+" and audioData is "+audioData[500]+ " and magnitude is "+ magnitude[1] + ", "+magnitude[500]+", "+magnitude[1000]+" Good job!");
for(int i = 2; i < samples; i++){
fft.append(" " + magnitude[i] + " Hz");
}
That last bit is just to check what values I am getting (and to keep me sane!). In the post referred to above, it talks about needing the sampling frequency and gives this code:
private double ComputeFrequency(int arrayIndex) {
return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
How do I implement this code? I don't really understand where fftOutWindowSize and arrayIndex come from.
Any help is greatly appreciated!
Dustin
Recently I'm working on a project which requires almost the same. Probably you don't need any help anymore but I will give my thoughts anyway. Maybe someone need this in the future.
I'm not sure whether the short-to-double function works; I don't understand that snippet of code either. It was written for byte-to-double conversion.
In the code "double[] micBufferData = new double[bufferSizeInBytes];" I think the size of micBufferData should be "bufferSizeInBytes / 2", since every sample takes two bytes and the size of micBufferData should be the number of samples.
FFT algorithms do require an FFT window size, and for many implementations it has to be a power of 2. However, many algorithms accept an arbitrary length as input and handle the rest. The documentation of the algorithm should state its input requirements. In your case, the size of the Complex array can be the FFT input size. I don't know the details of that FFT algorithm, but I think the inverse transform is not needed.
To use the code you gave at the end, you should first find the peak index in the FFT output array. I used a double array as input instead of Complex, so in my case it is something like:
double maxVal = -1;
int maxIndex = -1;
for( int j=0; j < mFftSize / 2; ++j ) {
double v = fftResult[2*j] * fftResult[2*j] + fftResult[2*j+1] * fftResult[2*j+1];
if( v > maxVal ) {
maxVal = v;
maxIndex = j;
}
}
2*j is the real part and 2*j+1 is the imaginary part. maxIndex is the index of the peak magnitude you want (More detail here), and use it as input to the ComputeFrequency function. The return value is the frequency of the sample array you want.
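Putting the peak search and the frequency conversion together, a minimal self-contained sketch might look like this. It assumes the FFT output is interleaved as (real, imaginary) pairs with fftSize values in total, and it skips bin 0 (DC) on purpose, which is a small departure from the loop above; the names are made up for illustration.

```java
final class PeakFrequency {
    /** Index of the strongest bin in interleaved (re, im) FFT output, skipping DC. */
    static int peakBin(double[] fftResult, int fftSize) {
        double maxVal = -1;
        int maxIndex = -1;
        for (int j = 1; j < fftSize / 2; j++) {
            double re = fftResult[2 * j];
            double im = fftResult[2 * j + 1];
            double power = re * re + im * im; // squared magnitude is enough for comparing
            if (power > maxVal) {
                maxVal = power;
                maxIndex = j;
            }
        }
        return maxIndex;
    }

    /** Center frequency in Hz of a bin: bin * sampleRate / fftSize. */
    static double binToHz(int bin, double sampleRate, int fftSize) {
        return bin * sampleRate / fftSize;
    }
}
```

Here binToHz plays the role of ComputeFrequency: the array index is the peak bin, and fftOutWindowSize is the number of samples that went into the FFT.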
Hopefully it can help someone.
You should pick an FFT window size depending on your time versus frequency resolution requirements, and not just use the audio buffer size when creating your FFT temp array.
The array index is your int i, as used in your magnitude[i] print statement.
The fundamental pitch frequency for music is often different from FFT peak magnitude, so you may want to research some pitch estimation algorithms.
I suspect that the strange results you're getting are because you might need to unpack the FFT. How this is done will depend on the library that you're using (see here for docs on how it's packed in GSL, for example). The packing may mean that the real and imaginary components are not in the positions in the array that you expect.
For your other questions about window size and resolution, if you're creating a tuner then I'd suggest trying a window size of about 20ms (eg 1024 samples at 44.1kHz). For a tuner you need quite high resolution, so you could try zero-padding by a factor of 8 or 16 which will give you a resolution of 3-6Hz.
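To see where the 3-6Hz figure comes from, here is a small sketch of the bin-spacing arithmetic. It only computes the spacing between FFT bins after zero-padding (sample rate divided by the padded length); the class and method names are hypothetical.

```java
final class FftResolution {
    /** Spacing between FFT bins in Hz once the window is zero-padded by the given factor. */
    static double binSpacingHz(double sampleRate, int windowSize, int zeroPadFactor) {
        return sampleRate / ((double) windowSize * zeroPadFactor);
    }
}
```

With 1024 samples at 44.1kHz, a factor of 8 gives about 5.4Hz per bin and a factor of 16 gives about 2.7Hz per bin. Zero-padding interpolates the spectrum more finely; it does not add new information beyond what the 1024 samples contain.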