Android oboe glitch/noise/distortion

I'm trying to use oboe in my audio/video communication app, and I'm trying the onAudioReady round-trip callback as in the oboe guide: https://github.com/google/oboe/blob/main/docs/FullGuide.md
Now I'm frustrated.
If the read writes directly into *audioData, the sound quality is perfect, i.e.:
auto result = recordingStream->read(audioData, numFrames, 0);
But if I add a buffer between them, there is significant noise/glitching, which is very undesirable:
auto result = recordingStream->read(buffer, numFrames, 0);
std::copy(buffer, buffer + numFrames, static_cast<int16_t *>(audioData));
According to the log, this buffering step completes within 1 ms, so I assume it shouldn't hurt?
Both variants use the PCM_I16 audio format; buffer is an int16_t * sized for numFrames elements.
Hopefully someone can point out what is causing this. Sorry, I lack audio processing and C++ knowledge.

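I've figured it out: the stream is stereo, so there are 2 samples per frame. The read fills numFrames * 2 samples, so the buffer must hold that many and the copy must cover them all, i.e.: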
auto result = recordingStream->read(buffer, numFrames, 0);
std::copy(buffer, buffer + numFrames * 2, static_cast<int16_t *>(audioData));
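The underlying rule is that the API counts in frames, and one frame holds one sample per channel. Below is a minimal sketch of that arithmetic, written in Java purely to illustrate the counting. The class and method names are hypothetical and this is not Oboe code.

```java
final class FrameMath {
    // One frame = one sample per channel, so a stereo frame holds 2 samples.
    static int samplesIn(int numFrames, int channelCount) {
        return numFrames * channelCount;
    }

    // Copies numFrames interleaved frames. Both arrays must hold at least
    // numFrames * channelCount samples, otherwise arraycopy throws.
    static void copyFrames(short[] src, short[] dst, int numFrames, int channelCount) {
        System.arraycopy(src, 0, dst, 0, samplesIn(numFrames, channelCount));
    }

    public static void main(String[] args) {
        short[] in = {1, -1, 2, -2, 3, -3}; // 3 stereo frames: L R L R L R
        short[] out = new short[in.length];
        copyFrames(in, out, 3, 2);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Copying only numFrames samples from a stereo buffer would transfer half of each callback's audio, which is consistent with the glitching described in the question.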

Related

Android MediaCodec realtime h264 encoding/decoding latency

I'm working with Android MediaCodec, using it to encode and decode H264 frames from the camera in real time. I use MediaCodec in synchronous mode and render the decoder output to a Surface. Everything works fine except for a long delay relative to real time: it takes 1.5-2 seconds, and I'm very confused about why.
I measured the total time of the encoding and decoding steps and it stays around 50-65 milliseconds, so I don't think they are the problem.
I tried changing the encoder configuration but it didn't help. It is currently configured like this:
val formatEncoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
formatEncoder.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
formatEncoder.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5)
formatEncoder.setInteger(MediaFormat.KEY_BIT_RATE, 1920 * 1080)
formatEncoder.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
val encoder = MediaCodec.createEncoderByType("video/avc")
encoder.configure(formatEncoder, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = encoder.createInputSurface() // I use it to send frames from camera to encoder
encoder.start()
Changing the decoder configuration didn't help either. It is currently configured like this:
val formatDecoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
val decoder = MediaCodec.createDecoderByType("video/avc")
decoder.configure(formatDecoder , outputSurface, null, 0) // I use outputSurface to render decoded frames into it
decoder.start()
I use the following timeouts when waiting for available encoder/decoder buffers. I tried reducing them but it didn't help, so I left them like this:
var TIMEOUT_IN_BUFFER = 10000L // microseconds
var TIMEOUT_OUT_BUFFER = 10000L // microseconds
I also measured how long it takes the inputSurface to consume a frame: 0.03-0.05 milliseconds, so that isn't a bottleneck. I measured every place where a bottleneck could be but found nothing, so I think the problem is in the encoder or decoder itself, in their configuration, or perhaps I need some special routine for sending frames to the encoder/decoder.
I also tried a hardware-accelerated codec, and it's the only thing that helped: the latency drops to ~500-800 milliseconds, but that is still too much for real-time streaming.
It seems the encoder or decoder buffers several frames before it starts displaying them on the surface, which leads to the latency. If that is really so, how can I disable the buffering or reduce its duration?
Please help. I've been stuck on this problem for about half a year and have no idea how to reduce the latency. I'm sure it's possible, because popular apps like Telegram, Viber, WhatsApp etc. work fine without noticeable latency, so what's the secret?
UPD 07.07.2021:
I still haven't found a way to get rid of the latency. I've tried changing H264 profiles and increasing and decreasing the I-frame interval, bitrate and frame rate, but the result is the same. The only thing that helps a little is lowering the resolution from 1920x1080 to e.g. 640x480, but this "solution" doesn't suit me because I want to encode/decode real-time video at 1920x1080.
UPD 08.07.2021:
I found out that changing TIMEOUT_IN_BUFFER and TIMEOUT_OUT_BUFFER from 10_000L to 100_000L decreases the latency a bit, but it considerably increases the delay before the first frame is shown after the encoding/decoding process starts.

It's possible your encoder is producing B frames (bidirectionally predicted frames). They increase quality at a given bitrate, but also latency, which is great for movies and no good for low-latency applications.
Key frames = I (intra-coded frames)
Predicted frames = P (difference from previous frames)
Bidirectionally predicted frames = B (depend on both earlier and later frames)
A sequence of frames including B frames might look like this:
IBBBPBBBPBBBPBBBI
         11111111
12345678901234567
The encoder must encode each P frame, and the decoder must decode it, before the preceding B frames make any sense. So in this example the frames get encoded out of order like this:
1 5 2 3 4 9 6 7 8 13 10 11 12 17 14 15 16
In this example the decoder can't handle frame 2 until the encoder has sent frame 5.
On the other hand, this sequence without B frames allows coding and decoding the frames in order.
IPPPPPPPPPPIPPPPPPPPP
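The reordering described above can be sketched in a few lines. This is a simplified model, assuming every B frame depends only on the nearest I or P frame on each side; real encoders can use more elaborate reference structures. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameReorder {
    // Takes frame types in display order (e.g. "IBBBP") and returns the
    // 1-based display indices in the order the frames must be coded.
    static List<Integer> codingOrder(String displayTypes) {
        List<Integer> order = new ArrayList<>();
        List<Integer> pendingB = new ArrayList<>();
        for (int i = 0; i < displayTypes.length(); i++) {
            if (displayTypes.charAt(i) == 'B') {
                // A B frame has to wait for the next I or P frame.
                pendingB.add(i + 1);
            } else {
                // The reference frame goes first, then the B frames before it.
                order.add(i + 1);
                order.addAll(pendingB);
                pendingB.clear();
            }
        }
        // Trailing B frames with no later reference: a simplification.
        order.addAll(pendingB);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(codingOrder("IBBBPBBBPBBBPBBBI"));
        System.out.println(codingOrder("IPPPP"));
    }
}
```

With three B frames between references, frame 2 cannot be coded until frame 5 has been captured, so the pipeline holds back at least three frame intervals before anything else is counted. Without B frames the coding order equals the display order.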
Try using the Constrained Baseline Profile setting. It's designed for low latency and low power use. It suppresses B frames. I think this works.
mediaFormat.setInteger(
"profile",
CodecProfileLevel.AVCProfileConstrainedBaseline);

I believe Android H264 decoders add latency (at least in most cases I've tried). That is probably why the Android developers added PARAMETER_KEY_LOW_LATENCY in API level 30.
However, I could reduce the delay by a few frames by querying for output a few more times.
Reason: no idea. It's just the result of tedious trial and error.
int inputIndex = m_codec.dequeueInputBuffer(-1); // Pass in -1 here because we don't have a playback time reference
if (inputIndex >= 0) {
    ByteBuffer buffer;
    if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.LOLLIPOP) {
        buffer = m_codec.getInputBuffer(inputIndex);
    } else {
        ByteBuffer[] bbuf = m_codec.getInputBuffers();
        buffer = bbuf[inputIndex];
    }
    buffer.put(frame);
    // tell the decoder to process the frame
    m_codec.queueInputBuffer(inputIndex, 0, frame.length, 0, 0);
}
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
// Poll the output three times per input frame (timeout 0 = non-blocking)
for (int attempt = 0; attempt < 3; attempt++) {
    int outputIndex = m_codec.dequeueOutputBuffer(info, 0);
    if (outputIndex >= 0) {
        m_codec.releaseOutputBuffer(outputIndex, true);
    }
}

You need to configure custom low-latency parameters for each CPU vendor (or use KEY_LOW_LATENCY where it is supported). This is a common problem on Android phones.
Check this code https://github.com/moonlight-stream/moonlight-android/blob/master/app/src/main/java/com/limelight/binding/video/MediaCodecHelper.java

Underrun in Oboe/AAudio playback stream

I'm working on an Android app that deals with a device which is basically a USB microphone. I need to read the input data and process it. Sometimes, I need to send data to the device (4 shorts * the number of channels, which is usually 2), and this data does not depend on the input.
I'm using Oboe, and all the phones I use for testing use AAudio underneath.
The reading part works, but when I try to write data to the output stream, I get the following warning in logcat and nothing is written to the output:
W/AudioTrack: releaseBuffer() track 0x78e80a0400 disabled due to previous underrun, restarting
Here's my callback:
oboe::DataCallbackResult
OboeEngine::onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) {
// check if there's data to write, agcData is a buffer previously allocated
// and h2iaudio::getAgc() returns true if data's available
if (h2iaudio::getAgc(this->agcData)) {
// padding the buffer
short* padPos = this->agcData+ 4 * playStream->getChannelCount();
memset(padPos, 0,
static_cast<size_t>((numFrames - 4) * playStream->getBytesPerFrame()));
// write the data
oboe::ResultWithValue<int32_t> result =
this->playStream->write(this->agcData, numFrames, 1);
if (result != oboe::Result::OK){
LOGE("Failed to create stream. Error: %s",
oboe::convertToText(result.error()));
return oboe::DataCallbackResult::Stop;
}
}else{
// if there's nothing to write, write silence
memset(this->agcData, 0,
static_cast<size_t>(numFrames * playStream->getBytesPerFrame()));
}
// data processing here
h2iaudio::processData(static_cast<short*>(audioData),
static_cast<size_t>(numFrames * oboeStream->getChannelCount()),
oboeStream->getSampleRate());
return oboe::DataCallbackResult::Continue;
}
//...
oboe::AudioStreamBuilder *OboeEngine::setupRecordingStreamParameters(
oboe::AudioStreamBuilder *builder) {
builder->setCallback(this)
->setDeviceId(this->recordingDeviceId)
->setDirection(oboe::Direction::Input)
->setSampleRate(this->sampleRate)
->setChannelCount(this->inputChannelCount)
->setFramesPerCallback(1024);
return setupCommonStreamParameters(builder);
}
As seen in setupRecordingStreamParameters, I'm registering the callback to the input stream. In all the Oboe examples, the callback is registered on the output stream, and the reading is blocking. Does this have an importance? If not, how many frames do I need to write to the stream to avoid underruns?
EDIT
In the meantime, I found the source of the underruns. The output stream was not consuming the same number of frames as the input stream delivered (which in hindsight seems logical), so writing the number of frames given by playStream->getFramesPerBurst() fixed my issue. Here's my new callback:
oboe::DataCallbackResult
OboeEngine::onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) {
int framesToWrite = playStream->getFramesPerBurst();
// memset takes a size in bytes, so multiply by bytes per frame rather than the channel count
memset(agcData, 0, static_cast<size_t>(framesToWrite *
this->playStream->getBytesPerFrame()));
h2iaudio::getAgc(agcData);
oboe::ResultWithValue<int32_t> result =
this->playStream->write(agcData, framesToWrite, 0);
if (result != oboe::Result::OK) {
LOGE("Failed to write AGC data. Error: %s",
oboe::convertToText(result.error()));
}
// data processing here
h2iaudio::processData(static_cast<short*>(audioData),
static_cast<size_t>(numFrames * oboeStream->getChannelCount()),
oboeStream->getSampleRate());
return oboe::DataCallbackResult::Continue;
}
It works this way. I'll move the callback to the other stream if I notice any performance issue; for now I'll keep it as is.

Sometimes, I need to send data to the device
You always need to write data to the output. Generally you need to write at least numFrames, maybe more. If you don't have any valid data to send then write zeros.
Warning: in your else block you are calling memset() but not writing to the stream.
->setFramesPerCallback(1024);
Do you need 1024 specifically? Is that for an FFT? If not then AAudio can optimize the callbacks better if the FramesPerCallback is not specified.
In all the Oboe examples, the callback is registered on the output stream,
and the reading is blocking. Does this have an importance?
Actually the read is NON-blocking. Whatever stream does not have the callback should be non-blocking. Use a timeoutNanos=0.
It is important to use the output stream for the callback if you want low latency. That is because the output stream can only provide low latency mode with callbacks and not with direct write()s. But an input stream can provide low latency with both callback and with read()s.
Once the streams have stabilized, you can read or write the same number of frames in each callback. Before they are stable, you may need to read or write extra frames.
With an output callback you should drain the input for a while so that it is running close to empty.
With an input callback you should fill the output for a while so that it is running close to full.
write(this->agcData, numFrames, 1);
Your 1 nanosecond timeout is very small. But Oboe will still block. You should use a timeoutNanos of 0 for non-blocking mode.

According to the Oboe documentation, during the onAudioReady callback you have to write exactly numFrames frames directly into the buffer pointed to by audioData. You do not call Oboe's write function for that stream; instead, you fill the buffer yourself.
I'm not sure how your getAgc() function works, but maybe you can pass it the audioData pointer as an argument to avoid copying data from one buffer to another.
If you really need the onAudioReady callback to request the same number of frames every time, set that number while building the AudioStream using:
oboe::AudioStreamBuilder::setFramesPerCallback(int framesPerCallback)
Look here at the things that you should not do during an onAudioReady callback and you will find that oboe write function is forbidden:
https://google.github.io/oboe/reference/classoboe_1_1_audio_stream_callback.html

Regulate Android AudioTrack playback speed

I'm currently trying to play back audio using AudioTrack. Audio is received over the network; the application continuously reads data and adds it to an internal buffer. A separate thread consumes the data and uses AudioTrack for playback.
Problems:
Audio playback fluctuates continuously (it feels like the audio drops out at regular intervals), making it unclear.
Playback speed is too high or too low, making it sound unrealistic.
To rule out network latency and other factors, I made the application wait until it had read enough data and play it all back at the end.
This makes the audio play really fast. Here is a basic sample of the logic I use.
sampleRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT,
AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT),
AudioTrack.MODE_STREAM);
audioTrack.play();
short shortBuffer[] = new short[AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)];
while (!stopRequested){
readData(shortBuffer);
audioTrack.write(shortBuffer, 0, shortBuffer.length, AudioTrack.WRITE_BLOCKING);
}
Is it correct to say that the Android AudioTrack class has no built-in functionality to regulate audio playback based on environmental conditions? If so, are there better libraries that offer a simpler way to play audio?

The first issue I see is an arbitrary sampling rate.
AudioTrack.getNativeOutputSampleRate returns the sampling rate used by the sound system. It may be 44100, 48000, 96000, 192000 or whatever. But it looks like your audio data comes from an independent source that produces it at its own fixed sampling rate.
Let's say the audio data from the source is sampled at 44100 samples per second. If you play it at 96000, it will be sped up and higher pitched.
So use the sampling rate, along with the number of channels, sample format etc., as given by the source, rather than relying on system defaults.
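To put numbers on the mismatch, here is a minimal sketch of the arithmetic. It assumes the samples are played unmodified (no resampling), and the class and method names are hypothetical.

```java
final class RateMismatch {
    // How many times faster than intended the audio plays when data sampled
    // at sourceRate is fed to an output running at playbackRate.
    static double speedFactor(int sourceRate, int playbackRate) {
        return (double) playbackRate / sourceRate;
    }

    // The corresponding pitch shift in semitones: 12 * log2(speed factor).
    static double pitchShiftSemitones(int sourceRate, int playbackRate) {
        return 12.0 * Math.log(speedFactor(sourceRate, playbackRate)) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("speed x%.2f, pitch %+.1f semitones%n",
                speedFactor(44100, 96000), pitchShiftSemitones(44100, 96000));
    }
}
```

For the 44100 to 96000 example above the factor is roughly 2.18, which means more than an octave up in pitch. A factor below 1 gives the slow, low-pitched case.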
Second: are you sure readData will always be fast enough to fill the buffer, however small the buffer is, and return before the previously written data has finished playing?
You created the AudioTrack with AudioTrack.getMinBufferSize passed as the bufferSizeInBytes parameter.
getMinBufferSize returns the minimum buffer size that can be used with these parameters. Let's say it returned a size corresponding to 10 ms of audio.
That means new data must be prepared within this interval, i.e. the time between the previous write returning and the next write being issued must be shorter than the duration of the buffer.
So if readData is delayed for longer than that, playback pauses for that time and you hear small gaps.
readData may be delayed for various reasons: if it reads from a file, it may wait on I/O; if it allocates Java objects, it may run into garbage collector pauses; if it uses a decoder or another audio source with its own buffering, it may stall periodically while refilling that buffer.
In any case, unless you're building some kind of real-time synthesizer that must react to user input as quickly as possible, always use a reasonably large buffer, and never one smaller than what getMinBufferSize returns. I.e.:
sampleRate = 44100;// sampling rate of the source
int bufSize = sampleRate * 4; // 1 second of audio; 4 is the frame size: 2 channels * 2 bytes per sample
bufSize = Math.max(bufSize, AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)); // Not less than what getMinBufferSize returns
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT,
bufSize,
AudioTrack.MODE_STREAM);
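The relation between a buffer size in bytes and the time the producer has to refill it can be sketched as follows. This is plain arithmetic with hypothetical helper names, assuming uncompressed PCM.

```java
final class BufferMath {
    // Duration in milliseconds of a PCM buffer of the given size in bytes.
    static double durationMs(int bufferBytes, int sampleRate, int channelCount, int bytesPerSample) {
        int bytesPerFrame = channelCount * bytesPerSample;
        return 1000.0 * (bufferBytes / (double) bytesPerFrame) / sampleRate;
    }

    // Size in bytes of a PCM buffer holding the given number of seconds.
    static int bytesForSeconds(double seconds, int sampleRate, int channelCount, int bytesPerSample) {
        int frames = (int) Math.round(seconds * sampleRate);
        return frames * channelCount * bytesPerSample;
    }

    public static void main(String[] args) {
        // 16-bit stereo at 44100 Hz: one second of audio
        System.out.println(bytesForSeconds(1.0, 44100, 2, 2));
        System.out.println(durationMs(176400, 44100, 2, 2));
    }
}
```

Running durationMs on the value returned by getMinBufferSize shows how little slack the minimum buffer leaves for readData.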

Like user #pskink said,
Most likely your sampleRate (or any other parameter passed to the
AudioTrack constructor) is invalid.
So I would start by checking what value you are actually setting the sample rate to.
For reference, you can also set the speed of an AudioTrack by calling the setPlaybackParams method:
public void setPlaybackParams (PlaybackParams params)
If you check the AudioTrack docs, you can see the PlaybackParams docs and can set the speed and pitch of the output audio. This object can then be passed to set the playback parameters within your AudioTrack object.
However, it is unlikely that you will need to use this if your only issue is the original constructor sampleRate (since we cannot see where the variable sampleRate comes from).

How to handle the PTS correctly using Android AudioRecord and MediaCodec as audio encoder?

I'm using AudioRecord to record the audio stream during camera capture on an Android device.
Since I want to process the frame data and handle the audio/video samples myself, I do not use MediaRecorder.
I run AudioRecord in a separate thread, calling read() to gather the raw audio data.
Once I have the data, I feed it into a MediaCodec configured as an AAC audio encoder.
Here is some of my code for the audio recorder / encoder:
m_encode_audio_mime = "audio/mp4a-latm";
m_audio_sample_rate = 44100;
m_audio_channels = AudioFormat.CHANNEL_IN_MONO;
m_audio_channel_count = (m_audio_channels == AudioFormat.CHANNEL_IN_MONO ? 1 : 2);
int audio_bit_rate = 64000;
int audio_data_format = AudioFormat.ENCODING_PCM_16BIT;
m_audio_buffer_size = AudioRecord.getMinBufferSize(m_audio_sample_rate, m_audio_channels, audio_data_format) * 2;
m_audio_recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, m_audio_sample_rate,
m_audio_channels, audio_data_format, m_audio_buffer_size);
m_audio_encoder = MediaCodec.createEncoderByType(m_encode_audio_mime);
MediaFormat audio_format = new MediaFormat();
audio_format.setString(MediaFormat.KEY_MIME, m_encode_audio_mime);
audio_format.setInteger(MediaFormat.KEY_BIT_RATE, audio_bit_rate);
audio_format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, m_audio_channel_count);
audio_format.setInteger(MediaFormat.KEY_SAMPLE_RATE, m_audio_sample_rate);
audio_format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
audio_format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, m_audio_buffer_size);
m_audio_encoder.configure(audio_format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
I found that the first call to AudioRecord.read() takes longer to return, while successive read() calls are spaced at intervals closer to the real duration of the audio data.
For example, my audio format is 44100 Hz, 16-bit, 1 channel, and the AudioRecord buffer size is 16384 bytes, so a full buffer corresponds to 185.76 ms. When I record the system time before each call to read() and subtract a base time, I get the following sequence:
time before each read(): 0ms, 345ms, 543ms, 692ms, 891ms, 1093ms, 1244ms, ...
I feed these raw data to the audio encoder with the above time values as PTS, and the encoder outputs encoded audio samples with the following PTS:
encoder output PTS: 0ms, 185ms, 371ms, 557ms, 743ms, 928ms, ...
It looks like the encoder treats each chunk of data as covering the same time period. I believe the encoder works correctly, since I give it raw data of the same size (16384) every time. However, if I use the encoder output PTS as the muxer input, I get a video whose audio runs faster than the video content.
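The evenly spaced output timestamps match what you get by counting samples instead of reading the clock. The sketch below reproduces that arithmetic. Deriving the input PTS this way and adding a start offset taken from the clock the video uses is one common approach, not something confirmed in this thread; the class name, method name and baseUs parameter are hypothetical.

```java
final class AudioPts {
    private final long baseUs;       // start offset in microseconds (assumed to share the video clock)
    private final int sampleRate;
    private final int bytesPerFrame;
    private long framesSubmitted = 0;

    AudioPts(long baseUs, int sampleRate, int channelCount, int bytesPerSample) {
        this.baseUs = baseUs;
        this.sampleRate = sampleRate;
        this.bytesPerFrame = channelCount * bytesPerSample;
    }

    // Returns the PTS in microseconds for the chunk about to be queued,
    // then advances the running frame count by the size of that chunk.
    long nextPtsUs(int chunkBytes) {
        long ptsUs = baseUs + framesSubmitted * 1_000_000L / sampleRate;
        framesSubmitted += chunkBytes / bytesPerFrame;
        return ptsUs;
    }

    public static void main(String[] args) {
        AudioPts pts = new AudioPts(0, 44100, 1, 2); // 44100 Hz, mono, 16-bit
        for (int i = 0; i < 6; i++) {
            System.out.println(pts.nextPtsUs(16384) / 1000 + " ms");
        }
    }
}
```

Because the timestamps come from a running frame count, jitter in when read() returns does not leak into them; only the start offset has to be measured.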
My questions are:
Is it expected that the first call to AudioRecord.read() blocks longer? I'm sure the call takes more than 300 ms while it only records 16384 bytes, i.e. 186 ms of audio. Does this also depend on the device / Android version?
What should I do to achieve audio/video synchronization? My workaround is to measure the delay of the first read() call and shift the PTS of the audio samples by that delay. Is there a better way to handle this?

Convert the mono input to stereo. I was pulling my hair out for some time before I realised that the AAC encoder exposed by MediaCodec only works with stereo input.

Need to understand how AudioRecord and AudioTrack work for raw PCM capture and playback

I use the following code in a Thread to capture raw audio samples from the microphone and play it back through the speaker.
public void run(){
short[] lin = new short[SIZE_OF_RECORD_ARRAY];
int num = 0;
// am = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE); // -> MOVED THESE TO init()
// am.setMode(AudioManager.MODE_IN_COMMUNICATION);
record.startRecording();
track.play();
while (passThroughMode) {
// while (!isInterrupted()) {
num = record.read(lin, 0, SIZE_OF_RECORD_ARRAY);
for(i=0;i<lin.length;i++)
lin[i] *= WAV_SAMPLE_MULTIPLICATION_FACTOR;
track.write(lin, 0, num);
}
// /*
record.stop();
track.stop();
record.release();
track.release();
// */
}
where record is an AudioRecord and track is an AudioTrack. I need to know in detail (and in a simplified way if possible) how AudioRecord stores PCM data and how AudioTrack plays it. This is how I understand it so far:
As the while() loop runs, record obtains SIZE_OF_RECORD_ARRAY samples (1024 for now) as shown in the figure. The samples are stored contiguously in the lin[] array of shorts (16-bit shorts, as I am using 16-bit PCM encoding). This is done by record.read(). Then track.write() hands these samples to the speaker, where they are played by the hardware. Is this correct, or am I missing something?

As for how the samples are laid out in memory; they're just arrays of linear approximations to a sound wave, taken at discrete times (like your figure shows). In the case of stereo, the samples will be interleaved (LRLRLRLR...).
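A minimal sketch of the interleaved layout, splitting a 16-bit stereo buffer into separate left and right channels. The class and method names are hypothetical; the only assumption is the LRLR ordering described above.

```java
final class StereoSplit {
    // Splits interleaved stereo samples (L R L R ...) into {left, right}.
    // A trailing unpaired sample, if any, is ignored.
    static short[][] split(short[] interleaved) {
        int frames = interleaved.length / 2;
        short[] left = new short[frames];
        short[] right = new short[frames];
        for (int f = 0; f < frames; f++) {
            left[f] = interleaved[2 * f];      // even indices: left channel
            right[f] = interleaved[2 * f + 1]; // odd indices: right channel
        }
        return new short[][] {left, right};
    }

    public static void main(String[] args) {
        short[][] channels = split(new short[] {10, -10, 20, -20, 30, -30});
        System.out.println(java.util.Arrays.toString(channels[0]));
        System.out.println(java.util.Arrays.toString(channels[1]));
    }
}
```

A mono recording has one sample per frame, so the array is simply the sample sequence in time order.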
When it comes to the path the audio takes, you're essentially right, although there are a few more steps involved:
Writing data to your Java AudioTrack causes it to make a JNI (Java Native Interface) call to a native helper class, which in turn calls the native AudioTrack class.
The AudioTracks are owned by the AudioFlinger, which periodically takes data from all the AudioTracks on a given output thread (which have been mixed by the AudioMixer) and writes it to the audio HAL output stream class.
From there the data goes to the user-space ALSA library, and through a couple of intermediate steps to the kernel-space PCM driver. Then it continues from there, typically going through some kind of DSP that applies various acoustic compensation filters, and eventually making its way to the hardware codec, which controls the speaker DAC and amplifiers.
When recording from the internal microphone(s) you'd have more or less the same steps, except that they'd be done in the opposite order.
Note that some of these steps (essentially everything from the audio HAL and below) are platform-specific, and therefore might differ between platforms from different vendors (and even different platforms from the same vendor).
