Modify audio pitch / tempo while encoding with Android MediaCodec

I'm using AudioRecord to get audio in real-time from the device microphone, and encoding / saving it to file in the background using the MediaCodec and MediaMuxer classes.
Is there any way to change the Pitch and (or) Tempo of the audio stream before it is saved to file?

By pitch/tempo, do you mean the frequency itself, or the playback speed of the samples? If you mean the speed, then each sample needs to be projected into a shorter or longer period of time:
Example:
private static byte[] changePitch(byte[] samples, float ratio) {
    // Nearest-neighbour resampling: stretches or squeezes the buffer by `ratio`,
    // which changes pitch and speed together. For 16-bit PCM you would index
    // whole 2-byte samples rather than single bytes.
    byte[] result = new byte[(int) Math.floor(samples.length * ratio)];
    for (int i = 0; i < result.length; i++) {
        int pointer = (int) (i / ratio);   // nearest source index for this output position
        result[i] = samples[pointer];
    }
    return result;
}
If you just want to change the pitch without affecting the speed, then you need to read about the phase vocoder. This is well-studied audio DSP, and there are plenty of projects that implement it: https://en.wikipedia.org/wiki/Phase_vocoder

To modify the pitch/tempo of the audio stream you'll have to resample it yourself before feeding it to the encoder. Keep in mind that you also need to adjust the timestamps if you change the tempo of the stream.
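For the timestamp part, here is a minimal sketch, assuming the usual MediaCodec/MediaMuxer encode loop; the names tempoFactor, audioTrackIndex, encodedBuffer and info are illustrative, not from the question:

void writeScaledSample(MediaMuxer muxer, int audioTrackIndex,
                       ByteBuffer encodedBuffer, MediaCodec.BufferInfo info,
                       float tempoFactor) {
    // If the PCM is resampled to play `tempoFactor` times faster, each encoded
    // buffer covers proportionally less time, so the presentation timestamp
    // handed to the muxer must shrink by the same factor.
    info.presentationTimeUs = (long) (info.presentationTimeUs / tempoFactor);
    muxer.writeSampleData(audioTrackIndex, encodedBuffer, info);
}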

Related

How to avoid crackling/popping sounds after increasing the volume in Oboe?

I've started to implement the Oboe C++ library for Android
(following the Build a Musical Game using Oboe guide).
I just scale the samples to increase the volume, and it works, but with crackling and popping.
Can I increase the amplitude without getting the crackling and popping?
I tried saving my sample sounds with a bit of extra gain, but it sounds very bad.
Thanks.
By the way, without increasing the volume it sounds clean, but very quiet compared to other music apps.
// Mixing loop from Mixer::renderAudio(): each track is rendered into
// mixingBuffer and added into audioData, scaled by `volume`.
for (int i = 0; i < mNextFreeTrackIndex; ++i) {
    mTracks[i]->renderAudio(mixingBuffer, numFrames);
    for (int j = 0; j < numFrames * kChannelCount; ++j) {
        audioData[j] += (mixingBuffer[j] * ((float) volume));
    }
}
Edited:
int16_t Mixer::hardLimiter(int16_t sample) {
    // Scale in 32 bits so the intermediate value can exceed the int16_t range
    // without wrapping around before it is clamped below.
    int32_t scaled = sample * volume;
    if (scaled >= INT16_MAX) {
        return INT16_MAX;
    } else if (scaled <= INT16_MIN) {
        return INT16_MIN;
    }
    return static_cast<int16_t>(scaled);
}
The code which you've posted is from Mixer::renderAudio(int16_t *audioData, int32_t numFrames). Its job is to mix the sample values from the individual tracks together into a single array of 16-bit samples.
If you're mixing 2 or more tracks together without reducing the values first then you may exceed the maximum sample value of 32,767 (aka INT16_MAX). Doing so would cause wraparound (i.e. writing 32,768 to an int16_t will result in a value of -32,768 being stored) and therefore audible distortion/crackling.
With this in mind you could write a very basic (hard) limiter - do your volume scaling using an int32_t and only write it into the int16_t array if the value doesn't exceed the maximum, otherwise just write the maximum.
This isn't really a good approach though because you shouldn't be hitting the limits of 16-bit values. Better would be to scale down your input sample values first, then add a gain stage after the mixer (or on individual tracks inside the mixer) to bring the overall amplitude up to an acceptable level.
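As an illustrative sketch only (the project above is C++/Oboe, so this Java version just shows the idea of widening before scaling and clamping once on the way back to 16-bit; the method name is made up):

static void applyGain(short[] frames, float gain) {
    for (int i = 0; i < frames.length; i++) {
        int scaled = Math.round(frames[i] * gain);              // widen, then scale
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE; // hard limit instead of wraparound
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
        frames[i] = (short) scaled;
    }
}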

Understanding onWaveFormDataCapture byte array format

I'm analyzing audio signals on Android. I first tried with the MIC and succeeded. Now I'm trying to apply an FFT to MP3 data that comes from Visualizer.OnDataCaptureListener's onWaveFormDataCapture method, which is linked to a MediaPlayer. It provides a byte array called byte[] waveform, and I get spectral leakage or overlap when I apply the FFT to this data.
public void onWaveFormDataCapture(Visualizer visualizer, byte[] waveform, int samplingRate)
I tried to convert the data into the -1..1 range using the code below inside a for loop:
// waveform varies in range of -128..+127
raw[i] = (double) waveform[i];
// change it to range -1..1
raw[i] /= 128.0;
Then I copy raw into the FFT buffers:
fftre[i] = raw[i];
fftim[i] = 0;
Then I call the FFT function:
fft.fft(fftre, fftim); // in: audio signal, out: fft data
As the final step I convert the results into magnitudes in dB and then draw the frequencies on screen:
// Ignore the first FFT bin, which is the DC component
for (i = 1, j = 0; i < waveform.length / 2; i++, j++)
{
    magnitude = (fftre[i] * fftre[i] + fftim[i] * fftim[i]);
    magnitudes[j] = 20.0 * Math.log10(Math.sqrt(magnitude) + 1e-5); // [dB]
}
When I play a sweep signal from 20 Hz to 20 kHz, I don't see what I see with the MIC. Instead of a single moving line, it draws several symmetric lines moving towards or away from each other; somehow there is a weaker mirrored signal at the other end of the visualizer.
The same code, dividing by 32768 instead of 128, works very well on MIC input with AudioRecord.
Where am I going wrong?
(And yes, I know there is a direct FFT output.)
The input audio is 8-bit unsigned mono. The line raw[i] = (double) waveform[i] causes an unintentional unsigned-to-signed conversion, and since raw is biased to approximately a 128 DC level, a small sine wave ends up getting changed into a high-amplitude modified square wave, as the signal crosses the 127/-128 boundary. That causes a bunch of funny harmonics (which caused the "symmetric lines coming and going" you were talking about).
Solution
Change to (double) (waveform[i] & 0xFF) so that the converted value lies in the range 0..255, instead of -128..127.
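A minimal sketch of the corrected conversion loop, reusing the question's waveform and raw arrays (subtracting 128 afterwards is optional here, since the DC bin is skipped later anyway, but it keeps raw centered in -1..1):

for (int i = 0; i < waveform.length; i++) {
    int unsigned = waveform[i] & 0xFF;   // reinterpret as unsigned 0..255
    raw[i] = (unsigned - 128) / 128.0;   // remove the 128 bias -> -1..1
}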

Resampling audio in Android from 48kHz to 44.1kHz and vice versa - pure Java, or OpenSL ES?

I've managed to join audio tracks of video files using MediaCodec. There are no problems doing so if the channel count and the sample rate of both audio tracks are the same.
(For some reason OMX.SEC.aac.dec always outputs 44100 Hz 2-channel audio if the original track is 22050 Hz, and 48000 Hz 2-channel audio if the original track is 24000 Hz.)
The problem comes in when I try appending a 24000 Hz audio track after a 22050 Hz audio track. Assuming I want to output a 22050 Hz audio track consisting of both said tracks, I'll have to resample the 24000 Hz one.
I tried this:
private byte[] minorDownsamplingFrom48kTo44k(byte[] origByteArray)
{
    int origLength = origByteArray.length;
    int moddedLength = origLength * 147/160;
    int delta = origLength - moddedLength;
    byte[] resultByteArray = new byte[moddedLength];
    int arrayIndex = 0;
    for(int i = 0; i < origLength; i+=11)
    {
        for(int j = i; j < i+10; j++)
        {
            resultByteArray[arrayIndex] = origByteArray[j];
            arrayIndex++;
        }
    }
    return resultByteArray;
}
It returns a byte array of 3700-something bytes, and after encoding the correct audio is there... but buried behind a very loud scrambled sound.
My questions:
How do I correctly downsample the audio track without leaving such artifacts? Should I use average values?
Should I use a resampler implemented using OpenSL ES to make the process faster and/or better?
The main issue is that you're just skipping bytes when you should be skipping samples.
Each sample is 16 bits, so two bytes. If the audio is stereo there are four bytes per sample frame. You always have to skip in multiples of that size, otherwise your samples will get completely mixed up.
Keeping the same ratio (10/11), you can work in blocks of 44 bytes and copy 40 of them, so you always drop one whole four-byte frame and the samples stay intact.
As to why the resulting video is playing at a different speed, that's something completely different.
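A rough sketch of the frame-aligned skipping the answer describes, assuming 16-bit stereo PCM (the method name is made up; note this only approximates the 147/160 ratio, so a real resampler with interpolation and filtering will sound better):

private static byte[] dropEveryEleventhFrame(byte[] src) {
    final int frameSize = 4;                   // 2 bytes per sample * 2 channels
    int frames = src.length / frameSize;
    int keptFrames = frames - frames / 11;     // keep 10 of every 11 frames
    byte[] dst = new byte[keptFrames * frameSize];
    int out = 0;
    for (int f = 0; f < frames; f++) {
        if (f % 11 == 10) continue;            // skip a whole frame, never single bytes
        System.arraycopy(src, f * frameSize, dst, out * frameSize, frameSize);
        out++;
    }
    return dst;
}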

Changing Pitch and Speed in android by NDK

I have PCM decoded via the NDK (mpg123) and I want to change pitch and speed dynamically, but I don't know how to work on speed and pitch separately. All I can do is change the sample rate of the AudioTrack. I have also integrated Sonic-NDK, whose demo plays a .bin file from the raw folder perfectly. Now I am trying to play a .wav through Sonic-NDK; the thing is, Sonic needs to be fed PCM, so I am decoding to PCM with mpg123 and feeding it dynamically. But the whole process is not fast enough and I hear a gap between chunks in the loop. The code is:
void recursive() {
    buffer = new short[minBufferSize];
    // decode one chunk of MP3 into 16-bit PCM
    err = NativeWrapper.decodeMP3(minBufferSize * 2, buffer);
    byte[] bufferbyte = ShortToByte_Twiddle_Method(buffer);
    // push the PCM through Sonic and play whatever is ready
    sonic.putBytes(bufferbyte, bufferbyte.length);
    int available = sonic.availableBytes();
    modifiedSamples = new byte[available];
    sonic.receiveBytes(modifiedSamples, available);
    track.write(modifiedSamples, 0, available); // write to AudioTrack
    if (err != MPG123_DONE) {
        recursive();
    }
}
So now I have two questions:
How can I remove the gap and noise between chunks (code above)?
Is there any other way to operate on pitch and speed dynamically? Is there another library I should use, or something else I should do?
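As a hedged sketch of one way to restructure the feeding code, using the same objects as in the question: a plain loop on a background thread instead of recursion, so the call stack doesn't grow and the blocking AudioTrack write paces the decoder. This is only a sketch under those assumptions, not a guaranteed fix for the gap.

void decodeLoop() {
    short[] buffer = new short[minBufferSize];
    int err;
    do {
        // decode one chunk of MP3 into 16-bit PCM
        err = NativeWrapper.decodeMP3(minBufferSize * 2, buffer);
        byte[] bufferBytes = ShortToByte_Twiddle_Method(buffer);
        // feed the PCM through Sonic and play whatever is ready
        sonic.putBytes(bufferBytes, bufferBytes.length);
        int available = sonic.availableBytes();
        if (available > 0) {
            byte[] modified = new byte[available];
            sonic.receiveBytes(modified, available);
            track.write(modified, 0, available);
        }
    } while (err != MPG123_DONE);
}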

getMaxAmplitude() alternative for Visualizer

In my app I let the user record audio using the phone's camera. While the recording is in progress I update a Path, using time as the x value and a normalized form of getMaxAmplitude() as the y value.
float amp = Math.min(mRecorder.getMaxAmplitude(), mMaxAmplitude)
/ (float) mMaxAmplitude;
This works rather well.
My problem occurs when I go to play back the audio (after transporting it over the network). I want to recreate the waveform generated while recording, but the MediaPlayer class does not possess the same getMaxAmplitude() method.
I have been attempting to use the Visualizer class provided by the framework, but I am having a difficult time getting a usable result for the y value. The returned byte array contains values between -128 and 127, but when I look at the actual values they do not appear to represent the waveform as I would expect.
How do I use the values returned from the visualizer to get a value related to the loudness of the sound?
Your byte array is probably an array of 16-, 24- or 32-bit signed values. Assuming they are 16-bit signed, the bytes will alternate between the hi byte (with the MSB being the sign bit) and the lo byte, or, depending on the endianness, the lo byte followed by the hi byte. Moreover, if you have two channels of data, the samples are probably interleaved. Again assuming 16 bits, you can decode the samples in a manner similar to this:
short[] sample = new short[numBytes / 2];
for (int i = 0; i < numBytes / 2; ++i)
{
    // big-endian case: high byte first; mask the low byte to avoid sign extension
    sample[i] = (short) ((bytes[i * 2] << 8) | (bytes[i * 2 + 1] & 0xFF));
}
According to the documentation of getMaxAmplitude, it returns the maximum absolute amplitude that was sampled since the last call. I guess this means the peak amplitude but it's not totally clear from the documentation. To compute the peak amplitude, just compute the max of the abs of all the samples.
int maxPeak = 0;
for (int i = 0; i < numSamples; ++i)
{
    maxPeak = Math.max(maxPeak, Math.abs(sample[i]));
}
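If the goal is the same normalized y value as before, the peak can then be scaled the way the question scales getMaxAmplitude(), assuming 16-bit samples so that mMaxAmplitude plays the same role as in the recording code:

float amp = Math.min(maxPeak, mMaxAmplitude) / (float) mMaxAmplitude;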
