I have PCM decoded via the NDK (mpg123) and want to change pitch and speed dynamically, but I don't know how to work on speed and pitch separately. All I can do is change the sample rate on the AudioTrack. I have also implemented Sonic-NDK: the .bin file in its raw folder plays perfectly, but Sonic needs to be fed PCM, so I am converting my file to PCM with mpg123 and feeding it dynamically. The whole process is not fast enough and I hear a gap on every loop iteration. The code is:
void recursive() {
    buffer = new short[minBufferSize];
    err = NativeWrapper.decodeMP3(minBufferSize * 2, buffer);   // decode one chunk of MP3 to 16-bit PCM
    byte[] bufferbyte = ShortToByte_Twiddle_Method(buffer);     // short[] -> byte[] for Sonic
    sonic.putBytes(bufferbyte, bufferbyte.length);              // feed Sonic (applies speed/pitch change)
    int available = sonic.availableBytes();
    modifiedSamples = new byte[available];
    sonic.receiveBytes(modifiedSamples, available);
    track.write(modifiedSamples, 0, available);                 // write processed chunk to AudioTrack
    if (err != MPG123_DONE) {
        recursive();
    }
}
So now I have two questions:
How can I remove the gap and noise between chunks (code above)?
Is there another way to change pitch and speed dynamically and independently? Is there another library I should use, or what else should I do?
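For comparison, here is a minimal sketch of the same pipeline written as a plain loop with reused buffers (it only uses the NativeWrapper, Sonic, ShortToByte_Twiddle_Method, minBufferSize and track names already shown above, and is not a complete fix); deep recursion and allocating new buffers on every chunk both add overhead that can contribute to the gaps:

short[] buffer = new short[minBufferSize];
byte[] modified = new byte[0];
int err;
do {
    err = NativeWrapper.decodeMP3(minBufferSize * 2, buffer);   // decode one chunk to 16-bit PCM
    byte[] bufferBytes = ShortToByte_Twiddle_Method(buffer);    // short[] -> byte[] for Sonic
    sonic.putBytes(bufferBytes, bufferBytes.length);            // feed Sonic (speed/pitch change)
    int available = sonic.availableBytes();
    if (modified.length < available) {
        modified = new byte[available];                         // grow the output buffer only when needed
    }
    sonic.receiveBytes(modified, available);
    track.write(modified, 0, available);                        // blocking write to AudioTrack
} while (err != MPG123_DONE);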
I am trying to display a visualization of some PCM data.
The target is to display something like the following:
I searched and found that JTransform is the right library to use. However, I cannot find a good guide on how to use it. How can I translate my PCM data into the band/frequency data needed to draw the bars?
Thanks a lot.
PCM audio is the digitized approximation of an analog audio curve ... this time-domain signal can be fed into a Discrete Fourier Transform API call to transform the data into its frequency-domain equivalent ... imaginary numbers and Euler's formula are your friends
The easy part is calling the fft; it's more involved to parse its output ...
Fill a buffer with at least 1024 points from your PCM (make sure it's a power of 2) and feed it into some fft API call ... it will return the frequency-domain equivalent ... read the docs closely for whichever Discrete Fourier Transform API you use ... look up the notion of the Nyquist limit ... master the idea of a frequency bin ... keep at hand the number of samples per buffer and the sample rate of your PCM audio
Be aware that the more audio samples (PCM points on the audio curve) you feed into a Fourier Transform, the finer the frequency resolution it returns; however, if your audio is a dynamic signal like music (as opposed to a static tone), this lowers the temporal specificity
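To make that concrete in Java with JTransforms (the library mentioned in the question), here is a minimal sketch; it assumes the org.jtransforms package name (older releases use edu.emory.mathcs.jtransforms) and a 1024-sample window of mono PCM already normalized to -1..1:

import org.jtransforms.fft.DoubleFFT_1D;

double[] window = new double[1024];                 // power-of-two chunk of normalized PCM samples
// ... fill window from your PCM data ...
DoubleFFT_1D fft = new DoubleFFT_1D(window.length);
fft.realForward(window);                            // in-place: window now holds packed Re/Im pairs

int sampleRate = 44100;                             // the sample rate of your PCM audio
double binWidthHz = (double) sampleRate / 1024;     // frequency spacing between bins
double[] magnitude = new double[1024 / 2];
for (int k = 1; k < 1024 / 2; k++) {                // bins up to the Nyquist limit (sampleRate / 2)
    double re = window[2 * k];
    double im = window[2 * k + 1];
    magnitude[k] = Math.sqrt(re * re + im * im);    // height of the bar for the band at k * binWidthHz
}

Each magnitude[k] is the bar height for the band centered at k * binWidthHz; scale it however looks best for your visualization.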
Here is a function I wrote in golang which is a wrapper around a DFT call ... I feed it a raw PCM audio buffer normalized into floating point varying from -1 to +1, it makes the Discrete Fourier Transform (fft) call, then it calculates the magnitude of each frequency bin using the array of complex numbers returned from the DFT ... it is a portion of a project which synthesizes audio by watching video (an image at a time) and can then listen to that audio to synthesize output images ... it achieved the goal where the output photo largely matches the input photo ...... input image -> audio -> output image
func discrete_time_fourier_transform(aperiodic_audio_wave []float64, flow_data_spec *Flow_Spec) ([]discrete_fft, float64, float64, []float64) {
    min_freq := flow_data_spec.min_freq
    max_freq := flow_data_spec.max_freq
    // https://www.youtube.com/watch?v=mkGsMWi_j4Q
    // Discrete Fourier Transform - Simple Step by Step
    var complex_fft []complex128
    complex_fft = fft.FFTReal(aperiodic_audio_wave) // input time domain ... output frequency domain of equally spaced freqs
    number_of_samples := float64(len(complex_fft))
    nyquist_limit_index := int(number_of_samples / 2)
    all_dft := make([]discrete_fft, 0) // 20171008
    /*
        0th term of complex_fft is sum of all other terms
        often called the bias shift
    */
    var curr_real, curr_imag, curr_mag, curr_theta, max_magnitude, min_magnitude float64
    max_magnitude = -999.0
    min_magnitude = 999.0
    all_magnitudes := make([]float64, 0)
    curr_freq := 0.0
    incr_freq := flow_data_spec.sample_rate / number_of_samples
    for index, curr_complex := range complex_fft { // we really only use half this range + 1
        // if index <= nyquist_limit_index {
        if index <= nyquist_limit_index && curr_freq >= min_freq && curr_freq < max_freq {
            curr_real = real(curr_complex) // pluck out real portion of imaginary number
            curr_imag = imag(curr_complex) // ditto for im
            curr_mag = 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples
            curr_theta = math.Atan2(curr_imag, curr_real)
            curr_dftt := discrete_fft{
                real:      2.0 * curr_real,
                imaginary: 2.0 * curr_imag,
                magnitude: curr_mag,
                theta:     curr_theta,
            }
            if curr_dftt.magnitude > max_magnitude {
                max_magnitude = curr_dftt.magnitude
            }
            if curr_dftt.magnitude < min_magnitude {
                min_magnitude = curr_dftt.magnitude
            }
            // ... now stow it
            all_dft = append(all_dft, curr_dftt)
            all_magnitudes = append(all_magnitudes, curr_mag)
        }
        curr_freq += incr_freq
    }
    return all_dft, max_magnitude, min_magnitude, all_magnitudes
}
Now you have an array all_magnitudes where each element is the magnitude of that frequency bin ... the frequency bins are evenly spaced by the frequency increment defined by the incr_freq variable above ... normalize the magnitudes using min_magnitude and max_magnitude and they are ready to feed into an X,Y plot to give you the spectrogram visualization
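The normalization is just, per bin, (magnitude - min) / (max - min); a tiny Java sketch (variable names hypothetical):

double[] normalized = new double[allMagnitudes.length];
for (int k = 0; k < allMagnitudes.length; k++) {
    // scale each bin into 0..1 so the largest bin fills the plot
    normalized[k] = (allMagnitudes[k] - minMagnitude) / (maxMagnitude - minMagnitude);
}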
I suggest cracking open some books ... watch the video I mention in the comments above ... my voyage of exploration into the wonders of the Fourier Transform has been ongoing since my EE undergrad days, and it is loaded with surprising applications; its theory continues to be a very active research domain
I've managed to join audio tracks of video files using MediaCodec. There are no problems doing so if the channel count and the sample rate of both audio tracks are the same.
(For some reason OMX.SEC.aac.dec always outputs 44100 Hz 2-channel audio if the original track is 22050 Hz, and outputs 48000 Hz 2-channel audio if the original track is 24000 Hz.)
The problem comes in when I try appending a 24000 Hz audio track after a 22050 Hz audio track. Assuming I want to output a 22050 Hz audio track consisting of both said tracks, I'll have to resample the 24000 Hz one.
I tried this:
private byte[] minorDownsamplingFrom48kTo44k(byte[] origByteArray)
{
    int origLength = origByteArray.length;
    int moddedLength = origLength * 147 / 160;
    int delta = origLength - moddedLength;
    byte[] resultByteArray = new byte[moddedLength];
    int arrayIndex = 0;
    for (int i = 0; i < origLength; i += 11)
    {
        for (int j = i; j < i + 10; j++)
        {
            resultByteArray[arrayIndex] = origByteArray[j];
            arrayIndex++;
        }
    }
    return resultByteArray;
}
It returns a byte array of 3700-something bytes and the correct audio after the encoding... behind a very loud scrambled sound.
My questions:
How do I correctly downsample the audio track without leaving such artifacts? Should I use average values?
Should I use a resampler implemented using OpenSL ES to make the process faster and/or better?
The main issue is that you're just skipping bytes when you should be skipping samples.
Each sample is 16 bits, so two bytes. If the audio is stereo, there are four bytes per sample frame. You always have to skip in multiples of that; otherwise your samples get completely mixed up.
Using the same ratio (10/11), work in blocks of 44 bytes and keep 40 of them, so you always drop a whole four-byte frame and the samples stay properly aligned.
As to why the resulting video is playing at a different speed, that's something completely different.
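To illustrate the answer, here is a minimal sketch of a frame-wise version of the question's loop (dropEveryEleventhFrame is a hypothetical name; it assumes 16-bit stereo, i.e. four bytes per frame, and keeps 10 of every 11 frames, the same ratio as the original byte-based loop):

// Drops 1 of every 11 four-byte frames (16-bit stereo), keeping the samples aligned
private byte[] dropEveryEleventhFrame(byte[] orig) {
    final int frameSize = 4;                          // 2 bytes per sample * 2 channels
    int totalFrames = orig.length / frameSize;
    int keptFrames = totalFrames - totalFrames / 11;
    byte[] result = new byte[keptFrames * frameSize];
    int out = 0;
    for (int frame = 0; frame < totalFrames; frame++) {
        if (frame % 11 == 10) continue;               // skip every 11th frame
        System.arraycopy(orig, frame * frameSize, result, out * frameSize, frameSize);
        out++;
    }
    return result;
}

This keeps the byte-to-sample alignment the answer describes; it still drops frames outright rather than interpolating, so averaging neighbouring samples (as the question suggests) would reduce the remaining artifacts further.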
I'm using AudioRecord to get audio in real-time from the device microphone, and encoding / saving it to file in the background using the MediaCodec and MediaMuxer classes.
Is there any way to change the Pitch and (or) Tempo of the audio stream before it is saved to file?
By pitch/tempo, do you mean the frequency itself, or really the playback speed of the samples? If you mean the speed, then each sample should be projected over a shorter or longer period of time:
Example:
private static byte[] ChangePitch(byte[] samples, float ratio) {
    // Note: this indexes raw bytes; for 16-bit PCM you would index whole samples instead
    byte[] result = new byte[(int)Math.Floor(samples.Length * ratio)];
    for (int i = 0; i < result.Length; i++) {
        var pointer = (int)((float)i / ratio);
        result[i] = samples[pointer];
    }
    return result;
}
If you just want to change the pitch without affecting the speed, then you need to read about the phase vocoder. This is well-studied audio science, and there are plenty of projects that implement it. https://en.wikipedia.org/wiki/Phase_vocoder
To modify the pitch/tempo of the audio stream you'll have to resample it yourself before you encode it using the codec. Keep in mind that you also need to modify the timestamps if you change the tempo of the stream.
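As a minimal sketch of the timestamp part (tempoFactor, encodedBuffer, and audioTrackIndex are hypothetical names; bufferInfo is the MediaCodec.BufferInfo for the output buffer being written): if the audio is sped up, its presentation timestamps have to shrink by the same factor before they go to MediaMuxer.

float tempoFactor = 1.5f;   // hypothetical: >1 speeds playback up, <1 slows it down
// bufferInfo comes from MediaCodec.dequeueOutputBuffer(); encodedBuffer is the matching ByteBuffer
bufferInfo.presentationTimeUs = (long) (bufferInfo.presentationTimeUs / tempoFactor);
muxer.writeSampleData(audioTrackIndex, encodedBuffer, bufferInfo);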
I want to apply an FFT to a signal recorded by AudioRecord and saved to a wav file. The FFT I am using takes a Complex[] input parameter. I am confused: is there a difference between converting from bytes to complex by dividing by 32768, and converting by just adding 0 as the imaginary part and leaving the real part as a byte?
Edit:
public Complex[] convertToComplex(byte[] file)
{
    int size = file.length;
    double[] x = new double[size];
    Complex[] data = new Complex[size];
    for (int i = 0; i < size; i++)
    {
        x[i] = file[i] / 32768.0;
        data[i] = new Complex(x[i], 0);
        // Log.d("tag", "indice" + i + ":" + data[i]);
    }
    return data;
}
If you are working with audio with a bit depth of 16 bits (each sample has 16 bits), then each byte only holds half of a sample. What you need to do is combine your bytes into 16-bit samples, then divide the resulting number by 32768 (the magnitude of the smallest number a two's-complement 16-bit number can store, i.e. 2^15) to get the actual audio sample as a number between -1 and 1. You then convert this number to a complex number by setting its imaginary component to 0.
A small C# sample can be seen below (indicative code):
byte[] myAudioBytes = readAudio();
int numBytes = myAudioBytes.Length;
var myAudioSamples = new List<short>();
for (int i = 0; i < numBytes; i = i + 2)
{
    // Combine two bytes into one 16-bit sample (WAV PCM is little-endian)
    short sample = (short)(myAudioBytes[i + 1] << 8 | myAudioBytes[i]);
    myAudioSamples.Add(sample);
}
// Change real audio to complex audio, scaling into -1..1 by dividing by 32768
var complexAudio = new Complex[myAudioSamples.Count];
int index = 0;
foreach (short sample in myAudioSamples)
    complexAudio[index++] = new Complex() { Real = sample / 32768.0, Imaginary = 0 };
// Now you can proceed to getting the FFT of your audio here
Hope the code has guided you on how you should handle your audio.
Generalized FFT functions work with arrays of complex inputs and outputs. So, for input, you might need to create an array of complex numbers conforming to the Complex data structure that the FFT library wants. This will probably consist of a real and an imaginary component for each element; just set the imaginary portion to 0. The real portion is probably a signed floating-point number expected to fall between -1.0 and 1.0, so you are on the right track with dividing the integer PCM samples. However, when you wrote "converting bytes", that raised a red flag: these are probably signed, little-endian, 16-bit integer PCM samples, so be sure to combine pairs of bytes into 16-bit samples before dividing by 32768 (this is Java, so the types will be enforced fairly strictly anyway).
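Since the question is in Java, here is a minimal sketch of that conversion (assuming signed 16-bit little-endian mono PCM, which is what a typical WAV recorded by AudioRecord contains; the method name is made up):

// Converts raw little-endian 16-bit PCM bytes into doubles in -1.0..1.0
double[] toSamples(byte[] pcm) {
    double[] samples = new double[pcm.length / 2];
    for (int i = 0; i < samples.length; i++) {
        int lo = pcm[2 * i] & 0xFF;           // low byte, treated as unsigned
        int hi = pcm[2 * i + 1];              // high byte keeps the sign
        samples[i] = ((hi << 8) | lo) / 32768.0;
    }
    return samples;
}

Each resulting value can then be wrapped in the Complex type with an imaginary part of 0, as in the question's convertToComplex().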
I am writing an Android app where I plan to encode several images into a live H.264 video stream that can be replayed in any browser. I am using the MediaCodec API for encoding and then MediaMuxer to write it to a file, as per the example here http://bigflake.com/mediacodec/.
What I am stuck on is how to tell the encoder/muxer to encode the stream so that it can be played back progressively. From the examples, the video file only gets the right metadata headers, etc. once the encoder/muxer stop() and release() calls are made.
Thanks
I guess you are considering the time at which each frame is shown.
You need to give MediaMuxer, along with the frame, the right "MediaCodec.BufferInfo," whose "presentationTimeUs" is set accordingly.
For example, there are 3 frames, each is shown for 1 second in the video:
sec 0---------1---------2-----------
    frame1    frame2    frame3
int[] timestampSec = {0, 1, 2};
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
for (int i = 0; i < 3; i++) {
    // offset, size, presentationTimeUs (microseconds), flags
    info.set(0, frame[i].remaining(), timestampSec[i] * 1000000L, 0);
    muxer.writeSampleData(trackId, frame[i], info);
}
As to the initialization and ending of MediaMuxer (a sketch of the whole sequence follows these steps):
addTrack: when MediaCodec.dequeueOutputBuffer() returns index == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED, pass the new output format to MediaMuxer to initialize a track for it (it's "video/avc" in this case).
mediamuxer.start()
start writing frames as above
mediamuxer.stop(), then release()
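Putting those steps together, a minimal sketch of the drain loop (assuming an already configured MediaCodec encoder and MediaMuxer muxer, as in the bigflake example the question links to):

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int trackId = -1;
boolean muxerStarted = false;
while (true) {
    int index = encoder.dequeueOutputBuffer(info, 10000);
    if (index == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
        trackId = muxer.addTrack(encoder.getOutputFormat());   // "video/avc" format from the encoder
        muxer.start();
        muxerStarted = true;
    } else if (index >= 0) {
        ByteBuffer encoded = encoder.getOutputBuffer(index);
        boolean isConfig = (info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0;
        if (muxerStarted && info.size > 0 && !isConfig) {
            muxer.writeSampleData(trackId, encoded, info);      // presentationTimeUs comes in via info
        }
        encoder.releaseOutputBuffer(index, false);
        if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
            break;                                              // all frames written
        }
    }
}
muxer.stop();
muxer.release();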