I'm currently trying to play back audio using AudioTrack. Audio is received over the network, and the application continuously reads the data and adds it to an internal buffer. A separate thread consumes that data and uses AudioTrack to play it back.
Problems:
Audio playback fluctuates continuously (it feels like the audio drops out at regular intervals), making it unclear.
Playback speed is too high or too low, making the audio sound unrealistic.
To avoid network latency and other factors, I made the application wait until it had read enough data before playing it back.
This makes the audio play really fast. Here is a basic sample of the logic I use:
sampleRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT),
        AudioTrack.MODE_STREAM);
audioTrack.play();

short[] shortBuffer = new short[AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)];
while (!stopRequested) {
    readData(shortBuffer);
    audioTrack.write(shortBuffer, 0, shortBuffer.length, AudioTrack.WRITE_BLOCKING);
}
Is it correct to say that the Android AudioTrack class doesn't have built-in functionality to control audio playback based on environmental conditions? If so, are there better libraries that provide a simpler way to play back audio?
The first issue I see is the arbitrary sampling rate.
AudioTrack.getNativeOutputSampleRate returns the sampling rate used by the sound system. It may be 44100, 48000, 96000, 192000 or whatever. But it looks like you have audio data from some independent source, which produces the data at its own exact sampling rate.
Let's say the audio data from the source is sampled at 44100 samples per second. If you start playing it at 96000, it will be sped up and higher pitched.
So use the sampling rate, along with the number of channels, sample format, etc., as given by the source, instead of relying on system defaults.
The second issue: are you sure the readData procedure will always be fast enough to fill the buffer, however small the buffer is, and return before the buffer has been played out?
You have created the AudioTrack with AudioTrack.getMinBufferSize passed as the bufferSizeInBytes parameter.
The getMinBufferSize function returns the minimum possible buffer size that can be used with these parameters. Let's say it returned a size corresponding to a buffer 10 ms long.
That means the new data has to be prepared within that time interval, i.e. the time between one write returning control and the next write being performed must be less than the duration of the buffer.
So if the readData function is for some reason delayed longer than that interval, playback will pause for that time and you'll hear small gaps in the playback.
The reasons readData may be delayed can vary: if it reads data from a file, it may be waiting on I/O; if it allocates Java objects, it may run into a garbage collector pause; if it uses some kind of decoder or another kind of audio source that does its own buffering, it may periodically stall while refilling that buffer.
In any case, unless you're building some kind of real-time synthesizer that has to react to user input as quickly as possible, use a reasonably large buffer size, but never less than what getMinBufferSize returns. I.e.:
sampleRate = 44100; // sampling rate of the source
int bufSize = sampleRate * 4; // 1 second length; 4 is the frame size: 2 channels * 2 bytes per sample
bufSize = Math.max(bufSize, AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)); // not less than getMinBufferSize returns
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufSize,
        AudioTrack.MODE_STREAM);
As user pskink said:
Most likely your sampleRate (or any other parameter passed to the
AudioTrack constructor) is invalid.
So I would start by checking what value you are actually setting for the sample rate.
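For example, a minimal debugging sketch (assuming the audioTrack from the code above; the "AudioDebug" log tag is made up). getMinBufferSize returns a negative error code if the parameters are unsupported, and getState() tells you whether the constructor actually accepted them:
int nativeRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
Log.d("AudioDebug", "native output sample rate = " + nativeRate);
Log.d("AudioDebug", "min buffer size = " + AudioTrack.getMinBufferSize(
        nativeRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT));

if (audioTrack.getState() != AudioTrack.STATE_INITIALIZED) {
    Log.e("AudioDebug", "AudioTrack rejected the constructor parameters");
}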
For reference, you can also set the playback speed of AudioTrack by calling the setPlaybackParams method:
public void setPlaybackParams (PlaybackParams params)
If you check the AudioTrack docs, you can find the PlaybackParams docs; that object lets you set the speed and pitch of the output audio, and it can then be passed to your AudioTrack object to apply those playback parameters.
However, it is unlikely that you will need this if your only issue is the sampleRate passed to the original constructor (since we cannot see where the sampleRate variable comes from).
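For illustration, a minimal sketch of adjusting speed and pitch this way (requires API level 23 or higher; the 1.0f values are placeholders, not taken from the question):
// Sketch only: PlaybackParams and AudioTrack.setPlaybackParams require API 23+.
PlaybackParams params = new PlaybackParams();
params.setSpeed(1.0f);   // 1.0f = normal speed; the value here is illustrative
params.setPitch(1.0f);   // 1.0f = normal pitch
audioTrack.setPlaybackParams(params);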
Related
I'm using AudioRecord to record the audio stream during a camera capture process on an Android device.
Since I want to process the frame data and handle the audio/video samples myself, I do not use MediaRecorder.
I run AudioRecord in another thread, calling read() to gather the raw audio data.
Once I get a data stream, I feed it into a MediaCodec configured as an AAC audio encoder.
Here is some of my code for the audio recorder / encoder:
m_encode_audio_mime = "audio/mp4a-latm";
m_audio_sample_rate = 44100;
m_audio_channels = AudioFormat.CHANNEL_IN_MONO;
m_audio_channel_count = (m_audio_channels == AudioFormat.CHANNEL_IN_MONO ? 1 : 2);
int audio_bit_rate = 64000;
int audio_data_format = AudioFormat.ENCODING_PCM_16BIT;
m_audio_buffer_size = AudioRecord.getMinBufferSize(m_audio_sample_rate, m_audio_channels, audio_data_format) * 2;
m_audio_recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, m_audio_sample_rate,
m_audio_channels, audio_data_format, m_audio_buffer_size);
m_audio_encoder = MediaCodec.createEncoderByType(m_encode_audio_mime);
MediaFormat audio_format = new MediaFormat();
audio_format.setString(MediaFormat.KEY_MIME, m_encode_audio_mime);
audio_format.setInteger(MediaFormat.KEY_BIT_RATE, audio_bit_rate);
audio_format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, m_audio_channel_count);
audio_format.setInteger(MediaFormat.KEY_SAMPLE_RATE, m_audio_sample_rate);
audio_format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
audio_format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, m_audio_buffer_size);
m_audio_encoder.configure(audio_format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
I found that the first call of AudioRecord.read() takes longer to return, while the successive read() calls have time intervals that are much closer to the real duration of the audio data.
For example, my audio format is 44100 Hz, 16-bit, 1 channel, and the buffer size of AudioRecord is 16384 bytes, so a full buffer corresponds to 185.76 ms. When I record the system time before each call of read() and subtract a base time, I get the following sequence:
time before each read(): 0ms, 345ms, 543ms, 692ms, 891ms, 1093ms, 1244ms, ...
I feed these raw data to the audio encoder with the above time values as PTS, and the encoder outputs encoded audio samples with the following PTS:
encoder output PTS: 0ms, 185ms, 371ms, 557ms, 743ms, 928ms, ...
It looks like the encoder treats each chunk of data as covering the same time period. I believe the encoder works correctly, since I give it raw data of the same size (16384 bytes) every time. However, if I use the encoder output PTS as the input to the muxer, I get a video whose audio content runs faster than its video content.
I want to ask:
Is it expected that the first call of AudioRecord.read() blocks longer? I'm sure the call takes more than 300 ms, while it only records 16384 bytes, which corresponds to 186 ms of audio. Is this also something that depends on the device / Android version?
What should I do to achieve audio/video synchronization? I have a workaround that measures the delay of the first call of read() and then shifts the PTS of the audio samples by that delay. Is there a better way to handle this?
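For what it's worth, evenly spaced timestamps like the encoder output above are exactly what you get when the PTS is derived from the amount of PCM data fed in rather than from wall-clock time. A minimal sketch of that calculation (variable names are illustrative, not from the code above):
// Sketch: derive the presentation timestamp (in microseconds) from the total
// number of PCM bytes fed to the encoder so far, instead of the system time.
// Assumes 44100 Hz, 16-bit, 1 channel, matching the question's format.
int bytesPerSecond = 44100 * 2 * 1;     // sampleRate * bytesPerSample * channels
long totalBytesFed = 0;                 // accumulated across read() calls

long ptsUs = totalBytesFed * 1_000_000L / bytesPerSecond;
// ... queue the buffer to the encoder with ptsUs ...
totalBytesFed += 16384;
// Each 16384-byte buffer advances ptsUs by ~185.76 ms, which matches the
// spacing of the encoder output PTS listed above.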
Convert the mono input to stereo. I was pulling my hair out for some time before I realised that the AAC encoder exposed by MediaCodec only works with stereo input.
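A minimal sketch of that conversion, assuming 16-bit PCM as in the question (the method name and buffer handling are illustrative, not from the original code): duplicate each two-byte mono sample into the left and right slots of an interleaved stereo buffer.
// Sketch: expand 16-bit mono PCM (two bytes per sample) into interleaved stereo
// by copying each sample into both the left and the right channel slot.
// Assumes length is even (a whole number of 16-bit samples).
byte[] monoToStereo(byte[] mono, int length) {
    byte[] stereo = new byte[length * 2];
    for (int i = 0; i < length; i += 2) {
        stereo[2 * i]     = mono[i];        // left, low byte
        stereo[2 * i + 1] = mono[i + 1];    // left, high byte
        stereo[2 * i + 2] = mono[i];        // right, low byte
        stereo[2 * i + 3] = mono[i + 1];    // right, high byte
    }
    return stereo;
}
If you go this route, remember to also set KEY_CHANNEL_COUNT to 2 in the MediaFormat.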
I use the following code in a Thread to capture raw audio samples from the microphone and play them back through the speaker.
public void run() {
    short[] lin = new short[SIZE_OF_RECORD_ARRAY];
    int num = 0;
    // am = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE); // -> MOVED THESE TO init()
    // am.setMode(AudioManager.MODE_IN_COMMUNICATION);
    record.startRecording();
    track.play();
    while (passThroughMode) {
    // while (!isInterrupted()) {
        num = record.read(lin, 0, SIZE_OF_RECORD_ARRAY);
        for (i = 0; i < lin.length; i++)
            lin[i] *= WAV_SAMPLE_MULTIPLICATION_FACTOR;
        track.write(lin, 0, num);
    }
    // /*
    record.stop();
    track.stop();
    record.release();
    track.release();
    // */
}
where record is an AudioRecord and track is an AudioTrack. I need to know in detail (and in as simplified a way as possible) how AudioRecord stores PCM data and how AudioTrack plays PCM data. This is how I have understood it so far:
As the while() loop runs continuously, record obtains SIZE_OF_RECORD_ARRAY samples (1024 for now), as shown in the figure. The samples are saved contiguously in the lin[] array of shorts (16-bit shorts, as I am using 16-bit PCM encoding). This is done by record.read(). Then track.write() hands these samples to the speaker path, where they are played by the hardware. Is this correct, or am I missing something here?
As for how the samples are laid out in memory: they're just arrays of linear approximations to a sound wave, taken at discrete times (as your figure shows). In the case of stereo, the samples are interleaved (LRLRLRLR...).
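For example, a minimal sketch of how an interleaved stereo buffer would be assembled from separate left and right sample arrays (the method is illustrative, not part of the code above):
// Sketch: interleave two equal-length mono 16-bit channels into the LRLRLR...
// layout that AudioTrack expects for CHANNEL_OUT_STEREO.
short[] interleave(short[] left, short[] right) {
    short[] stereo = new short[left.length * 2];
    for (int i = 0; i < left.length; i++) {
        stereo[2 * i]     = left[i];   // L
        stereo[2 * i + 1] = right[i];  // R
    }
    return stereo;
}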
When it comes to the path the audio takes, you're essentially right, although there are a few more steps involved:
Writing data to your Java AudioTrack causes it to make a JNI (Java Native Interface) call to a native helper class, which in turn calls the native AudioTrack class.
The AudioTracks are owned by the AudioFlinger, which periodically takes data from all the AudioTracks on a given output thread (which have been mixed by the AudioMixer) and writes it to the audio HAL output stream class.
From there the data goes to the user-space ALSA library, and through a couple of intermediate steps to the kernel-space PCM driver. From there it typically passes through some kind of DSP that applies various acoustic compensation filters, and eventually makes its way to the hardware codec, which controls the speaker DAC and amplifiers.
When recording from the internal microphone(s) you'd have more or less the same steps, except that they'd be done in the opposite order.
Note that some of these steps (essentially everything from the audio HAL and below) are platform-specific, and therefore might differ between platforms from different vendors (and even different platforms from the same vendor).
Here is my code
private static int Fs = 44100;
private byte[] recorderAudiobuffer = new byte[1024];
AudioRecord recorder = new AudioRecord(AudioSource.MIC, Fs, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 4096);
//start recorder
recorder.startRecording();
Each value is made up of two consecutive bytes, recorderAudiobuffer[n] and recorderAudiobuffer[n+1].
What unit are these values in?
The time between consecutive samples would be 1/44100 of a second, since your sample rate is 44100 Hz (with 16-bit PCM, each sample occupies two consecutive bytes of recorderAudiobuffer[]).
As for the values the bytes hold, they can correspond to any sound level. If the sensitivity is low, the maximum value of the bytes could be, I don't know, 12, even if the sound you're recording is loud. On the other hand, with the sensitivity turned up, 255 could be a whisper, and anything louder is railed (clipped).
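Since the format in the code above is ENCODING_PCM_16BIT, here is a sketch of how the two bytes combine into one signed amplitude (assuming little-endian byte order, the native order on typical Android devices):
// Sketch: combine recorderAudiobuffer[n] (low byte) and recorderAudiobuffer[n+1]
// (high byte) into one signed 16-bit amplitude in the range -32768..32767.
short sample = (short) ((recorderAudiobuffer[n] & 0xFF)
        | (recorderAudiobuffer[n + 1] << 8));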
This is more of an audio question :), but here it goes, as simply as possible: sound data has two dimensions, one time-based and one amplitude-based. At a sample rate of 44100 Hz with 16 bits per sample, one second of sound takes 44100 * 16 = 705600 bits, or 88200 bytes. With that sampling, a 1024-byte buffer therefore holds roughly 11.6 milliseconds of audio.
And of course you are asking about amplitude values.
Hope this helps, and enjoy your work.
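As a concrete illustration of that arithmetic (a sketch; the 1024-byte figure is the size of recorderAudiobuffer[] in the question):
// Sketch: how long (in milliseconds) a PCM byte buffer lasts.
// 44100 samples/s * 2 bytes/sample * 1 channel = 88200 bytes per second.
int sampleRate = 44100;
int bytesPerSample = 2;   // 16-bit PCM
int channels = 1;         // mono
int bufferBytes = 1024;   // size of recorderAudiobuffer[]

double bufferMs = 1000.0 * bufferBytes / (sampleRate * bytesPerSample * channels);
// bufferMs is about 11.6 ms, matching the figure above.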
I've been trying to get my application to record the sound coming from the microphone and play it back in (approximately) real time, but without success.
I'm using the AudioRecord and AudioTrack classes for recording and playback, respectively. I've tried different approaches: I recorded the incoming sound and wrote it to a file, and that worked fine. I also played back sound from that file AFTERWARDS with AudioTrack, and that worked fine too. The problem is when I try to play the sound in real time, instead of reading a file after it's written.
Here is the code:
//variables
private int audioSource = MediaRecorder.AudioSource.MIC;
private int samplingRate = 44100; /* in Hz */
private int channelConfig = AudioFormat.CHANNEL_CONFIGURATION_MONO;
private int audioFormat = AudioFormat.ENCODING_PCM_16BIT;
private int bufferSize = AudioRecord.getMinBufferSize(samplingRate, channelConfig, audioFormat);
private int sampleNumBits = 16;
private int numChannels = 1;
// …
byte[] data = new byte[bufferSize]; // buffer for the raw PCM read from the recorder

AudioRecord recorder = new AudioRecord(audioSource, samplingRate, channelConfig, audioFormat, bufferSize);
recorder.startRecording();
isRecording = true;

AudioTrack audioPlayer = new AudioTrack(AudioManager.STREAM_MUSIC, 44100, AudioFormat.CHANNEL_CONFIGURATION_MONO,
        AudioFormat.ENCODING_PCM_16BIT, bufferSize, AudioTrack.MODE_STREAM);

if (audioPlayer.getPlayState() != AudioTrack.PLAYSTATE_PLAYING)
    audioPlayer.play();

//capture data and record to file
int readBytes = 0, writtenBytes = 0;
do {
    readBytes = recorder.read(data, 0, bufferSize);
    if (AudioRecord.ERROR_INVALID_OPERATION != readBytes) {
        writtenBytes += audioPlayer.write(data, 0, readBytes);
    }
} while (isRecording);
A java.lang.IllegalStateException is thrown, caused by "play() called on an uninitialized AudioTrack".
However, if I change the AudioTrack initialization to use, for example, a sampling rate of 8000 Hz and 8-bit samples (instead of 16), it doesn't throw the exception anymore and the application runs, although it produces horrible noise.
When I play with AudioTrack from a file, there is no problem initializing the AudioTrack: I tried 44100 Hz and 16 bits and it worked properly, producing the correct sound.
Any help?
All native Android audio is encoded. You can only play PCM formats in real time, or use a special streaming codec, which I don't think is trivial on Android.
The point is that if you want to record and play audio simultaneously, you have to create your own audio buffer and store raw PCM audio samples in it (I'm not sure whether you're thinking "duh!" or whether this is all over your head, so I'll try to be clear without over-explaining).
PCM is a digital representation of an analog signal in which your audio samples are a set of "snapshots" of the original acoustic wave. Because all kinds of clever mathematicians and engineers saw the potential in reducing the number of bits used to represent this data, they came up with all sorts of encoders. The encoded (compressed) signal is represented very differently from the raw PCM signal and has to be decoded (en-cod-er + dec-oder = codec). Unless you're using special algorithms and media streaming codecs, it's impossible to play back an encoded signal the way you're trying to, because it's not encoded sample by sample but rather frame by frame, and you need the whole frame of samples, if not the complete signal, to decode that frame.
The way to do it is to manually store the audio samples coming from the microphone buffer and manually feed them to the output buffer. You will have to do some coding for that, but I believe there are some open-source apps you can look at and take a peek at their source (unless you're planning to sell your app later on, of course, but that's a whole different discussion).
If you're developing for Android 2.3 or later and are not too scared of programming in native code, you can try using OpenSL ES. The Android-specific features of OpenSL ES are listed here. This platform allows you somewhat more flexible audio manipulation and you might find just what you need, if your app will be highly reliant on audio processing.
A java.lang.IllegalStateException is thrown, caused by
"play() called on an uninitialized AudioTrack".
It is because the buffer size is too small. I tried bufferSize += 2048; and it was OK then.
I had a similar problem and I solved it by adding this permission to the manifest file:
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS"/>
Make sure that your data variable is large enough for the samplingRate.
E.g. if you use a samplingRate of 44100, your data byte array's length should be 44101 or more.
I'm trying to develop an application like iRig for Android, so the first step is to capture the mic input and play it back at the same time.
I have it working, but the problem is that I get some latency that makes this unusable, and if I start processing the buffer I'm afraid it will become totally unusable.
I use AudioRecord and AudioTrack like this:
new Thread(new Runnable() {
    public void run() {
        while (mRunning) {
            mRecorder.read(mBuffer, 0, mBufferSize);
            // TODO: apply filters to the buffer here and then play it modified
            mPlayer.write(mBuffer, 0, mBufferSize);
            // Log.v("MY AMP","ARA");
        }
    }
}).start();
And the initialization is done this way:
// ==================== INITIALIZE ========================= //
public void initialize() {
    mBufferSize = AudioRecord.getMinBufferSize(mHz,
            AudioFormat.CHANNEL_CONFIGURATION_MONO,
            AudioFormat.ENCODING_PCM_16BIT);
    mBufferSize2 = AudioTrack.getMinBufferSize(mHz,
            AudioFormat.CHANNEL_CONFIGURATION_MONO,
            AudioFormat.ENCODING_PCM_16BIT);
    mBuffer = new byte[mBufferSize];
    Log.v("MY AMP", "Buffer size:" + mBufferSize);
    mRecorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
            mHz,
            AudioFormat.CHANNEL_CONFIGURATION_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            mBufferSize);
    mPlayer = new AudioTrack(AudioManager.STREAM_MUSIC,
            mHz,
            AudioFormat.CHANNEL_CONFIGURATION_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            mBufferSize2,
            AudioTrack.MODE_STREAM);
}
Do you know how to get a faster response?
Thanks!
Android's AudioTrack/AudioRecord classes have high latency due to their minimum buffer sizes.
The reason for those buffer sizes is to minimize dropouts when GC pauses occur, according to Google (which is the wrong decision in my opinion; you can optimize your own memory management).
What you want to do is use OpenSL, which is available from 2.3. It contains native APIs for streaming audio.
Here's some docs:
http://mobilepearls.com/labs/native-android-api/opensles/index.html
Just a thought, but shouldn't you be reading fewer than mBufferSize bytes?
As mSparks pointed out, streaming should be done using a smaller read size: you don't need to read the full buffer to stream data!
int read = mRecorder.read(mBuffer, 0, 256); /* or any other magic number */
if (read > 0) {
    mPlayer.write(mBuffer, 0, read);
}
This will drastically reduce your latency. If mHz is 44100 and you are in MONO configuration, then with 256 your latency will be no less than 1000 * 256 / 44100 milliseconds = ~5.8 ms.
256 / 44100 is the conversion from samples to seconds, so multiplying by 1000 gives you milliseconds.
The problem is the internal implementation of the player; you don't have control over that from Java. Hope this helps someone :)
My first instinct was to suggest initializing the AudioTrack in static mode rather than streaming mode, since static mode has notably smaller latency. However, static mode is more appropriate for short sounds that fit entirely in memory rather than for a sound you are capturing from elsewhere. But just as a wild guess: what if you set the AudioTrack to static mode and feed it discrete chunks of your input audio?
If you want tighter control over audio, I'd recommend taking a look at OpenSL ES for Android. The learning curve will be a bit steeper, but you get much more fine-grained control and lower latency.