I am mixing two 16-bit PCM sources into a short buffer.
// This is our buffer for PCM audio data
mp3Buffer = new short[minBufferSize];
wavBuffer = new short[minBufferSize];
mixedBuffer = new short[minBufferSize];
I am filling these buffers with samples from both the mp3 and wav files. I found out that the wav file will always be in mono and the mp3 will always be stereo.
I've read that to convert mono to stereo you should "Just allocate a buffer twice the size of the original PCM data, and for every sample in the original buffer put it twice in the new buffer", so I tried:
short[] stereoWavBuffer = new short[minBufferSize];
int k = 1;
for (int j = 0; j < minBufferSize / 2; j += 2)
{
    stereoWavBuffer[j] = wavBuffer[j];
    stereoWavBuffer[k] = wavBuffer[k];
    k += 2;
}

// TO DO - Add the 2 buffers together
for (int i = 0; i < minBufferSize; i++) {
    mixedBuffer[i] = (short) (mp3Buffer[i] + stereoWavBuffer[i]);
}
track.write(mixedBuffer, 0, minBufferSize);
How can I accomplish this? With the code above the wav audio now plays at regular speed, but it sounds like a chipmunk.
It looks to me as if your first for loop should be
j < minBufferSize - 1
The /2 means you will never read all of the wav buffer or write your entire stereo buffer. Even if you only read half the wav buffer (because that's all the data, since it's mono), you still need to write the entire stereo buffer. Also, you need to increment j by 1, not 2, so you read each mono sample.
The speed issue appears to be because you should set stereoWavBuffer at both output positions equal to wavBuffer at j. As written, you are in fact just duplicating half of the original mono data and then playing it back as stereo (i.e. at double the byte rate).
I would think the first loop should look something more like this:
int k = 0;
for (int j = 0; j < minBufferSize / 2; j++) // Assuming you only have half a buffer since mono???
{
    stereoWavBuffer[k] = wavBuffer[j];
    stereoWavBuffer[k + 1] = wavBuffer[j];
    k += 2;
}
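One thing the answer above doesn't touch on is the summing step itself: adding two 16-bit samples can overflow a short and wrap around audibly. A minimal sketch of a clamped mix (my own addition, reusing the buffers from the question):
for (int i = 0; i < minBufferSize; i++) {
    // Sum in int to avoid overflow, then clamp to the 16-bit range
    int mixed = mp3Buffer[i] + stereoWavBuffer[i];
    if (mixed > Short.MAX_VALUE) mixed = Short.MAX_VALUE;
    if (mixed < Short.MIN_VALUE) mixed = Short.MIN_VALUE;
    mixedBuffer[i] = (short) mixed;
}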
I am building an app that needs to be able to display a real-time spectral analyzer. Here is the version I was able to successfully make on iOS:
I am using Wendykierp JTransforms library to perform the FFT calculations, and have managed to capture audio data and execute the FFT functions. See below:
short sData[] = new short[BufferElements2Rec];
int result = audioRecord.read(sData, 0, BufferElements2Rec);
try
{
    // Initiate FFT
    DoubleFFT_1D fft = new DoubleFFT_1D(sData.length);
    // Convert sample data from short[] to double[]
    double[] fftSamples = new double[sData.length];
    for (int i = 0; i < sData.length; i++) {
        // IMPORTANT: We cannot simply cast the short value to double.
        // A short is only 2 bytes (values -32768 to 32767),
        // so we divide by 32768 to scale the samples to the range -1.0 to 1.0.
        fftSamples[i] = (double) sData[i] / 32768;
    }
    // Perform fft calcs
    fft.realForward(fftSamples);
    // TODO - Convert FFT data into 20 "bands"
} catch (Exception e)
{
}
In iOS, I was using a library (Tempi-FFT) which had built in functionality for calculating magnitude, frequency, and providing averaged data for any given number of bands (I am using 20 bands as you can see in the image above). It seems I don't have that luxury with this library and I need to calculate this myself.
Looking for any good examples or tutorials on how to interpret the data returned by the FFT calculations. Here is some sample data I am receiving:
-11387.0, 183.0, -384.9121475854448, -224.66315714636642, -638.0173005872095, -236.2318653974911, -1137.1498541119106, -437.71599514435786, 1954.683405957685, -2142.742125980924 ...
Looking for a simple explanation of how to interpret this data. Some other questions I have looked at that I was either unable to understand, or that did not explain how to compute a given number of bands:
Power Spectral Density from jTransforms DoubleFFT_1D
How to develop a Spectrum Analyser from a realtime audio?
Your question can be split into two parts: finding the magnitude of all frequencies (interpreting the output) and averaging the frequencies into bands.
Finding the magnitude of all frequencies:
I won't go into the intricacies of the Fast Fourier Transform/Discrete Fourier Transform (if you would like to gain a basic understanding see this video), but know that there is a real and an imaginary part of each output.
The documentation of the realForward function describes where both the imaginary and the real parts are located in the output array (I'm assuming you have an even sample size):
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
a is equivalent to your fftSamples, which means we can translate this documentation into code as follows (I've changed Re and Im to realPart and imaginaryPart respectively):
int n = fftSamples.length;
double[] realPart = new double[n / 2 + 1]; // one extra slot so Re[n/2] fits below
double[] imaginaryPart = new double[n / 2];
for (int k = 0; k < n / 2; k++) {
    realPart[k] = fftSamples[k * 2];
    imaginaryPart[k] = fftSamples[k * 2 + 1];
}
realPart[n / 2] = fftSamples[1]; // per the layout above, a[1] holds Re[n/2]
imaginaryPart[0] = 0;            // bin 0 (DC) has no imaginary part
Now we have the real and imaginary parts of each frequency. We could plot these on an x-y coordinate plane, using the real part as the x value and the imaginary part as the y value. Together with the origin, each point forms a right triangle, and the length of its hypotenuse is the magnitude of that frequency. We can use the Pythagorean theorem to get this magnitude:
double[] spectrum = new double[n / 2];
for (int k = 1; k < n / 2; k++) {
    spectrum[k] = Math.sqrt(Math.pow(realPart[k], 2) + Math.pow(imaginaryPart[k], 2));
}
spectrum[0] = realPart[0];
Note that the 0th index of the spectrum doesn't have an imaginary part. This is the DC component of the signal (we won't use this).
Now we have an array with the magnitudes of each frequency across your spectrum. (If your sampling frequency is 44100Hz, the array covers the frequencies from 0Hz up to the Nyquist frequency of 22050Hz, so if you have 441 values in your array, each index represents a 50Hz step.)
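To relate an index back to a frequency in Hz, you can compute it directly (a small sketch of my own; sampleRate is assumed to hold your sampling frequency and n is the FFT length from above):
// Center frequency represented by spectrum[k]
double binFrequencyHz = k * sampleRate / (double) n;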
Averaging the frequencies into bands:
Now that we've converted the FFT output to data that we can use, we can move on to the second part of your question: finding the averages of different bands of frequencies. This is relatively simple. We just need to split the array into different bands and find the average of each band. This can be generalized like so:
int NUM_BANDS = 20; // This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = (n / 2) / NUM_BANDS;
for (int i = 0; i < NUM_BANDS; i++) {
    // Add up each part
    double total = 0;
    for (int j = samplesPerBand * i; j < samplesPerBand * (i + 1); j++) {
        total += spectrum[j];
    }
    // Take average
    bands[i] = total / samplesPerBand;
}
Final Code:
And that's it! You now have an array called bands with the average magnitude of each band of frequencies. The code above is purposefully not optimized in order to show how each step works. Here is a shortened and optimized version:
int numFrequencies = fftSamples.length / 2;
double[] spectrum = new double[numFrequencies];
for (int k = 1; k < numFrequencies; k++) {
    spectrum[k] = Math.sqrt(Math.pow(fftSamples[k * 2], 2) + Math.pow(fftSamples[k * 2 + 1], 2));
}
spectrum[0] = fftSamples[0];
int NUM_BANDS = 20; // This can be any positive integer.
double[] bands = new double[NUM_BANDS];
int samplesPerBand = numFrequencies / NUM_BANDS;
for (int i = 0; i < NUM_BANDS; i++) {
    // Add up each part
    double total = 0;
    for (int j = samplesPerBand * i; j < samplesPerBand * (i + 1); j++) {
        total += spectrum[j];
    }
    // Take average
    bands[i] = total / samplesPerBand;
}
// Use bands in view!
This has been a really long answer, and I haven't tested the code yet (though I do plan to). Feel free to comment if you find any mistakes.
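One optional extra step, not covered in the answer above: spectrum analyzer displays usually show levels in decibels rather than raw magnitudes, so before drawing the bands you may want something like this (my own sketch):
for (int i = 0; i < NUM_BANDS; i++) {
    // 20*log10 converts magnitude to dB; the small offset avoids log(0)
    bands[i] = 20 * Math.log10(bands[i] + 1e-12);
}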
So far in my quest to concatenate videos with MediaCodec I've finally managed to resample 48 kHz audio to 44.1 kHz.
I've been testing joining two videos together: the first one has an audio track in 22050 Hz 2-channel format, the second one has an audio track in 24000 Hz 1-channel format. Since my decoder outputs 44100 Hz 2-channel raw audio for the first video and 48000 Hz 2-channel raw audio for the second one, I resampled the ByteBuffers that the second video's decoder outputs from 48000 Hz down to 44100 Hz using this method:
private byte[] minorDownsamplingFrom48kTo44k(byte[] origByteArray)
{
    int origLength = origByteArray.length;
    int moddedLength = origLength * 147 / 160; // 44100/48000 = 147/160
    //int moddedLength = 187*36;
    int delta = origLength - moddedLength;
    byte[] resultByteArray = new byte[moddedLength];
    int arrayIndex = 0;
    // Keep the first 40 of every 44 bytes, dropping the rest
    for (int i = 0; i < origLength; i += 44)
    {
        for (int j = i; j < (i + 40 > origLength ? origLength : i + 40); j++)
        {
            resultByteArray[arrayIndex] = origByteArray[j];
            arrayIndex++;
        }
        //Log.i("array_iter", i+" "+arrayIndex);
    }
    //smoothArray(resultByteArray, 3);
    return resultByteArray;
}
However, in the output video file, the video plays at a slower speed upon reaching the second video with the downsampled audio track. The pitch is the same and the noise is gone, but the audio samples just play slower.
My output format is actually 22050 Hz 2 channels, following the first video.
EDIT: It's as if the player still plays the audio as if it has a sample rate of 48000 Hz even after it's downsampled to 44100 Hz.
My questions:
How do I mitigate this problem? Because I don't think changing the timestamps works in this case. I just use the decoder-provided timestamps with some offset based on the first video's last timestamp.
Is the issue related to the CSD-0 ByteBuffers?
If MediaCodec has the option of changing the video bitrate on the fly, would a new feature of changing the audio sample rate or channel count on the fly be feasible?
Turns out it was something as simple as limiting the size of my ByteBuffers.
The decoder outputs 8192 bytes (2048 samples).
After downsampling, the data becomes 7524 bytes (1881 samples) - originally 7526 bytes but that amounts to 1881.5 samples, so I rounded it down.
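For reference, the arithmetic behind those numbers, assuming 16-bit stereo PCM (4 bytes per sample frame):
8192 bytes / 4 bytes per frame  = 2048 frames at 48000 Hz
2048 frames * 44100 / 48000     = 1881.6 frames
1881 frames * 4 bytes per frame = 7524 bytes at 44100 Hz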
The main mistake was in this piece of code, where I have to bring the sample rate close to the original:
byte[] finalByteBufferContent = new byte[size / 2]; // here
for (int i = 0; i < bufferSize; i += 2) {
    if ((i + 1) * ((int) samplingFactor) > testBufferContents.length) {
        finalByteBufferContent[i] = 0;
        finalByteBufferContent[i + 1] = 0;
    } else {
        finalByteBufferContent[i] = testBufferContents[i * ((int) samplingFactor)];
        finalByteBufferContent[i + 1] = testBufferContents[i * ((int) samplingFactor) + 1];
    }
}
bufferSize = finalByteBufferContent.length;
Where size is the decoder output ByteBuffer's length and testBufferContents is the byte array I use to modify its contents (and is the one that was downsampled to 7524 bytes).
The resulting byte array's length was still 4096 bytes instead of 3762 bytes.
Changing new byte[size / 2] to new byte[testBufferContents.length / 2] resolved that problem.
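In code, the fix from the last sentence amounts to (using the same variable names as above):
// Size the output after the downsampled data, not after the decoder's original buffer
byte[] finalByteBufferContent = new byte[testBufferContents.length / 2];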
I'm trying to implement a high-pass audio filter on the microphone data that I get from the AudioRecord.
The data I get from the microphone is a 16-bit PCM audio byte array. I was trying to use TarsosDSP, which provides an API for high-pass filtering. However, it requires a float array as input, so I converted the byte array into a float array and ran the high-pass filter. To confirm the results I saved the filtered data to a wave file, but it sounds totally distorted.
public static byte[] highPassFilter(byte[] buffer, WaveHeader waveHeader, float frequency) {
    HighPass highPass = new HighPass(frequency, waveHeader.getSampleRate());
    TarsosDSPAudioFormat format = new TarsosDSPAudioFormat(waveHeader.getSampleRate(),
            waveHeader.getBitsPerSample(), waveHeader.getChannels(), true, false);
    AudioEvent audioEvent = new AudioEvent(format);
    float[] f_buffer = bytesToFloats(buffer);
    audioEvent.setFloatBuffer(f_buffer);
    highPass.process(audioEvent);
    buffer = audioEvent.getByteBuffer();
    byte[] data = PCMtoWav(buffer, waveHeader.getSampleRate(), waveHeader.getChannels(), waveHeader.getBitsPerSample());
    writeWavFile(data);
    return buffer;
}
public static float[] bytesToFloats(byte[] bytes) {
    float[] floats = new float[bytes.length / 2];
    for (int i = 0; i < bytes.length; i += 2) {
        floats[i / 2] = bytes[i] | (bytes[i + 1] < 128 ? (bytes[i + 1] << 8) : ((bytes[i + 1] - 256) << 8));
    }
    return floats;
}
The data in the waveHeader is:
Sample rate = 11025
getBitsPerSample = 16
getChannels = 1
My best guess is that the bytesToFloats conversion is wrong. To verify this I just set the float buffer of the audioEvent with audioEvent.setFloatBuffer and then retrieved it with audioEvent.getByteBuffer which also resulted in a totally distorted audio file.
The byte buffer is read from the audioRecord:
audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, 11025, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 220500);
....
buffer = new byte[frameByteSize];
audioRecord.read(buffer, 0, frameByteSize);
Does anybody have any idea how to fix this, or suggestions for different high-pass filters that I could use on a byte array in Android?
Update: I figured it out. This is my updated function to convert from bytes to floats:
public static float[] bytesToFloats(byte[] bytes) {
    float[] floats = new float[bytes.length / 2];
    short[] shorts = new short[bytes.length / 2];
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
    for (int i = 0; i < bytes.length; i += 2) {
        floats[i / 2] = shorts[i / 2] / 32768f;
    }
    return floats;
}
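As an aside, if a library ever feels too heavy for this, a basic first-order high-pass filter can also be applied directly to the float samples. This is only a rough sketch of my own (names and cutoff handling are illustrative, not part of TarsosDSP):
// Simple one-pole high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])
public static void highPassInPlace(float[] samples, float cutoffHz, float sampleRate) {
    float rc = 1f / (2f * (float) Math.PI * cutoffHz);
    float dt = 1f / sampleRate;
    float a = rc / (rc + dt);
    float prevIn = samples[0];
    float prevOut = samples[0];
    for (int i = 1; i < samples.length; i++) {
        float out = a * (prevOut + samples[i] - prevIn);
        prevIn = samples[i];
        prevOut = out;
        samples[i] = out;
    }
}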
Do the two-byte samples really represent float values? They are more likely signed shorts within the range of -32,768 to 32,767. Also, for floating-point representation of samples, values within the range of -1.0 to 1.0 are common.
I would try:
short sample = (short) ((bytes[i] & 0xff) | (bytes[i + 1] << 8)); // assumes little-endian data
floats[i / 2] = (float) sample / 32768f;
You need to convert each pair of bytes into a signed short and then scale it to a float in the range of -1.0 to 1.0.
One of the following lines, depending on the endianness of the data, will convert a pair of bytes to a signed 16-bit value:
short shortSample = (short) ((bytes[i] & 0xff) | (bytes[i + 1] << 8)); // little-endian
short shortSample = (short) ((bytes[i] << 8) | (bytes[i + 1] & 0xff)); // big-endian
And then scale to float:
float sample = shortSample / 32768f;
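Putting the two steps together (assuming little-endian data, which matches the working bytesToFloats above), a complete conversion might look roughly like this:
public static float[] toFloatSamples(byte[] bytes) {
    float[] floats = new float[bytes.length / 2];
    for (int i = 0; i + 1 < bytes.length; i += 2) {
        // Assemble the signed 16-bit sample, then scale to -1.0..1.0
        short shortSample = (short) ((bytes[i] & 0xff) | (bytes[i + 1] << 8));
        floats[i / 2] = shortSample / 32768f;
    }
    return floats;
}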
Below is the code for my play() method which simply generates an arbitrary set of frequencies and blends them into one tone.
The problem is that it only plays for a split second; I need it to play continuously. I would appreciate suggestions on how to generate the sound continuously using the AudioTrack class in Android. I believe it has something to do with the MODE_STREAM constant, but I can't quite work out how.
Here is the link to AudioTrack class documentation:
http://developer.android.com/reference/android/media/AudioTrack.html
EDIT: I forgot to mention one important aspect: it can't simply loop. Because sometimes 50+ frequencies are mixed, looping a short buffer will sound choppy, since there is no common period for all the frequency peaks, or it is too far down the waveform to store as one sound.
/**
 * play - begins playing the sound
 */
public void play() {
    // Get array of frequencies with their relative strengths
    double[][] soundData = getData();
    // Track samples array
    final double samples[] = new double[1024];
    // Calculate the average sum in the array and write it to sample
    for (int i = 0; i < samples.length; ++i) {
        double valueSum = 0;
        for (int j = 0; j < soundData.length; j++) {
            valueSum += Math.sin(2 * Math.PI * i / (SAMPLE_RATE / soundData[j][0]));
        }
        samples[i] = valueSum / soundData.length;
    }
    // Obtain a minimum buffer size
    int minBuffer = AudioTrack.getMinBufferSize(SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
    if (minBuffer > 0) {
        // Create an AudioTrack
        mTrack = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE, AudioFormat.CHANNEL_CONFIGURATION_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuffer, AudioTrack.MODE_STREAM);
        // Begin playing track
        mTrack.play();
        // Fill the buffer
        if (mBuffer.length < samples.length) {
            mBuffer = new short[samples.length];
        }
        for (int k = 0; k < samples.length; k++) {
            mBuffer[k] = (short) (samples[k] * Short.MAX_VALUE);
        }
        // Write audio data to track for real-time audio synthesis
        mTrack.write(mBuffer, 0, samples.length);
    }
    // Once everything has successfully begun, indicate such.
    isPlaying = true;
}
It looks like the code is almost there. It just needs a loop to keep generating the samples, putting them in the buffer, and writing them to the AudioTrack. Right now just one buffer's worth gets written before the method exits, which is why it stops so quickly.
void getSamples(double[] samples) {
    // Get array of frequencies with their relative strengths
    double[][] soundData = getData();
    // Calculate the average sum in the array and write it to sample
    for (int i = 0; i < samples.length; ++i) {
        double valueSum = 0;
        for (int j = 0; j < soundData.length; j++) {
            valueSum += Math.sin(2 * Math.PI * i / (SAMPLE_RATE / soundData[j][0]));
        }
        samples[i] = valueSum / soundData.length;
    }
}

public void endPlay() {
    done = true;
}

/**
 * play - begins playing the sound
 */
public void play() {
    // Obtain a minimum buffer size
    int minBuffer = AudioTrack.getMinBufferSize(SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
    if (minBuffer > 0) {
        // Create an AudioTrack
        mTrack = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE, AudioFormat.CHANNEL_CONFIGURATION_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuffer, AudioTrack.MODE_STREAM);
        // Begin playing track
        mTrack.play();
        // Track samples array
        final double samples[] = new double[1024];
        while (!done) {
            // Fill the buffer
            if (mBuffer.length < samples.length) {
                mBuffer = new short[samples.length];
            }
            getSamples(samples);
            for (int k = 0; k < samples.length; k++) {
                mBuffer[k] = (short) (samples[k] * Short.MAX_VALUE);
            }
            // Write audio data to track for real-time audio synthesis
            mTrack.write(mBuffer, 0, samples.length);
            // Once everything has successfully begun, indicate such.
            isPlaying = true;
        }
    }
    // Once everything is done, indicate such.
    isPlaying = false;
}
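One note of my own on top of this: in MODE_STREAM, AudioTrack.write blocks until the data has been queued, so the while loop above should run on a background thread rather than the UI thread, for example:
// Run the blocking playback loop off the UI thread
Thread playbackThread = new Thread(new Runnable() {
    @Override
    public void run() {
        play();
    }
});
playbackThread.start();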
I have made an application that records from the phone's microphone using AudioRecord and 16-bit encoding, and I am able to play back the recording. For compatibility reasons I need to use 8-bit encoding, but when I try to run the same program with that encoding I keep getting an Invalid Audio Format error. My code is:
int bufferSize = AudioRecord.getMinBufferSize(11025,
AudioFormat.CHANNEL_CONFIGURATION_MONO,
AudioFormat.ENCODING_PCM_8BIT);
AudioRecord recordInstance = new AudioRecord(
MediaRecorder.AudioSource.MIC, 11025,
AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_8BIT,
bufferSize);
Does anyone know what the problem is? According to the documentation, AudioRecord is capable of 8-bit encoding.
If you look at the source, it only supports little-endian, but Android is writing out big-endian, so you have to convert to little-endian and then to 8-bit. This worked for me, and you can probably combine the two loops:
// Swap each pair of bytes to convert the endianness
for (int i = 0; (offset + i + 1) < bytes.length; i += 2) {
    lens[i] = bytes[offset + i + 1];
    lens[i + 1] = bytes[offset + i];
}
// Then keep every second byte to reduce the 16-bit samples to 8-bit
for (int i = 1, j = 0; i < length; i += 2, j++) {
    lens[j] = lens[i];
}
Here is a simpler version without the endian swap:
// Take one byte of every 16-bit sample
for (int i = 0, j = 0; (offset + i) < bytes.length; i += 2, j++) {
    lens[j] = bytes[offset + i];
}
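One extra point of my own: if the 8-bit data is destined for a format that expects unsigned 8-bit PCM (such as a WAV file), the signed high byte also has to be shifted into the 0..255 range. A rough sketch, where pcm16 is assumed to hold little-endian 16-bit PCM:
byte[] eightBit = new byte[pcm16.length / 2];
for (int i = 0; i < eightBit.length; i++) {
    // Take the high byte of each 16-bit sample and flip its sign bit
    // to map the signed range -128..127 onto unsigned 0..255
    eightBit[i] = (byte) (pcm16[2 * i + 1] ^ 0x80);
}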