Accurate POSIX thread timing using NDK


I'm writing a simple NDK OpenSL ES audio app that records the user's touches on a virtual piano keyboard and then plays them back forever over a set loop. After much experimenting and reading, I've settled on using a separate POSIX thread loop to achieve this. As you can see in the code, it subtracts any processing time taken from the sleep time in order to make the interval of each loop as close to the desired sleep interval as possible (in this case it's 5000000 nanoseconds).
void init_timing_loop() {
pthread_t fade_in;
pthread_create(&fade_in, NULL, timing_loop, (void*)NULL);
}
void* timing_loop(void* args) {
while (1) {
clock_gettime(CLOCK_MONOTONIC, &timing.start_time_s);
tic_counter(); // simple logic gates that cycle the current tic
play_all_parts(); // for-loops through all parts and plays any notes (From an OpenSL buffer) that fall on the current tic
clock_gettime(CLOCK_MONOTONIC, &timing.finish_time_s);
timing.diff_time_s.tv_nsec = (5000000 - (timing.finish_time_s.tv_nsec - timing.start_time_s.tv_nsec));
nanosleep(&timing.diff_time_s, NULL);
}
return NULL;
}
The problem is that even with this the results are better but still quite inconsistent. Sometimes notes will be delayed by perhaps as much as 50ms at a time, which makes for very wonky playback.
Is there a better way of approaching this? To debug I ran the following code:
gettimeofday(&timing.curr_time, &timing.tzp);
__android_log_print(ANDROID_LOG_DEBUG, "timing_loop", "gettimeofday: %d %d",
timing.curr_time.tv_sec, timing.curr_time.tv_usec);
Which gives a fairly consistent readout - that doesn't reflect the playback inaccuracies whatsoever. Are there other forces at work with Android preventing accurate timing? Or is OpenSL ES a potential issue? All the buffer data is loaded into memory - could there be bottlenecks there?
Happy to post more OpenSL code if needed... but at this stage I'm trying to figure out if this thread loop is accurate or if there's a better way to do it.

You should take seconds into account when using clock_gettime as well: timing.start_time_s.tv_nsec can be greater than timing.finish_time_s.tv_nsec, because tv_nsec restarts from zero whenever tv_sec is incremented.
timing.diff_time_s.tv_nsec =
(5000000 - (timing.finish_time_s.tv_nsec - timing.start_time_s.tv_nsec));
Try something like
#define NS_IN_SEC 1000000000LL
(timing.finish_time_s.tv_sec * NS_IN_SEC + timing.finish_time_s.tv_nsec) -
(timing.start_time_s.tv_sec * NS_IN_SEC + timing.start_time_s.tv_nsec)
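A minimal sketch of that calculation in C, under stated assumptions: timespec_diff_ns, timespec_add_ns and run_ticks are hypothetical helper names, not part of the question's code. It also shows one common alternative to subtracting the processing time by hand, namely sleeping until an absolute deadline with clock_nanosleep and TIMER_ABSTIME, so that errors do not accumulate from one tick to the next. This keeps the arithmetic right but does not remove scheduler jitter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NS_IN_SEC 1000000000LL

/* Elapsed nanoseconds from start to finish, taking tv_sec into account. */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *finish) {
    return ((int64_t)finish->tv_sec - (int64_t)start->tv_sec) * NS_IN_SEC
         + ((int64_t)finish->tv_nsec - (int64_t)start->tv_nsec);
}

/* Advance a deadline by interval_ns, keeping tv_nsec below one second. */
void timespec_add_ns(struct timespec *t, int64_t interval_ns) {
    assert(interval_ns >= 0);
    int64_t nsec = (int64_t)t->tv_nsec + interval_ns;
    t->tv_sec += (time_t)(nsec / NS_IN_SEC);
    t->tv_nsec = (long)(nsec % NS_IN_SEC);
}

/* Hypothetical loop: call tick() n times, one call per interval_ns. */
void run_ticks(int n, int64_t interval_ns, void (*tick)(void)) {
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < n; i++) {
        tick();
        timespec_add_ns(&next, interval_ns);
        /* Sleep until the absolute deadline; retry if a signal interrupts. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL)
               == EINTR) {
        }
    }
}
```

With this shape, the question's loop body (tic_counter and play_all_parts) would go inside tick(), and interval_ns would be 5000000.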

Related

audio latency issues

In the application which I want to create, I face some technical obstacles. I have two music tracks in the application. For example, a user imports the music background as a first track. The second path is a voice recorded by the user to the rhythm of the first track played by the speaker device (or headphones). At this moment we face latency. After recording and playing back in the app, the user hears the loss of synchronisation between tracks, which occurs because of the microphone and speaker latencies.
Firstly, I try to detect the delay by filtering the input sound. I use android’s AudioRecord class, and the method read(). This method fills my short array with audio data.
I found that the initial values of this array are zeros, so I decided to cut them out before I start writing the data into the output stream.
So I treat those zeros as the "warm-up" latency of the microphone. Is this approach correct? This operation gives some results, but it doesn't resolve the problem, and at this stage I'm far from solving it.
But the worse case is the delay between starting the speakers and the music actually playing. This delay I cannot filter or detect. I tried to create a calibration feature which measures the delay. I play a "beep" sound through the speakers, and when I start to play it, I also begin to measure time. Then I start recording and listen for this sound being detected by the microphone. When I recognise this sound in the app, I stop measuring time. I repeat this process several times, and the final value is the average of those results. That is how I try to measure the latency of the device. Now, when I have this value, I can simply shift the second track backwards to achieve synchronisation of both recordings (I will lose some initial milliseconds of the recording, but I am ignoring that for now; there are ways to fix it).
I thought that this approach would resolve the problem, but it turned out this is not as simple as I thought. I found two issues here:
1. Delay while playing two tracks simultaneously
2. Randomness in device audio latency.
The first: I play two tracks using AudioTrack class and I run method play() like this:
val firstTrack = //creating a track
val secondTrack = //creating a track
firstTrack.play()
secondTrack.play()
This code causes delays at the stage of playing tracks. Now I don't even have to think about latency while recording; I cannot play two tracks simultaneously without delays. I tested this with an external audio file (not recorded in my app): I start the same audio file twice using the code above, and I can notice a delay. I also tried it with the MediaPlayer class, and I got the same results. In this case, I even tried starting the tracks when the OnPreparedListener callback is invoked:
val firstTrack = // a MediaPlayer
val secondTrack = // a MediaPlayer
secondTrack.setOnPreparedListener {
firstTrack.start()
secondTrack.start()
}
And it doesn’t help.
I know that there is one more class provided by Android called SoundPool. According to the documentation, it can be better at playing tracks simultaneously, but I can't use it because it supports only small audio files and I can't accept that limitation.
How can I resolve this problem? How can I start playing two tracks precisely at the same time?
The second: Audio latency is not deterministic - sometimes it is smaller, and sometimes it’s huge, and it’s out of my hands. So measuring device latency can help but again - it cannot resolve the problem.
To sum up: is there any solution which can give me the exact latency per device (or per app session), or some other trigger which detects the actual delay, to provide the best synchronisation while playing back two tracks at the same time?
Thank you in advance!

Synchronising audio for karaoke apps is tough. The main issue you seem to be facing is variable latency in the output stream.
This is almost certainly caused by "warm up" latency: the time it takes from hitting "play" on your backing track to the first frame of audio data being rendered by the audio device (e.g. headphones). This can have large variance and is difficult to measure.
The first (and easiest) thing to try is to use MODE_STREAM when constructing your AudioTrack and prime it with bufferSizeInBytes of data prior to calling play. This should result in lower, more consistent "warm up" latency.
A better way is to use the Android NDK to have a continuously running audio stream which is just outputting silence until the moment you hit play, then start sending audio frames immediately. The only latency you have here is the continuous output latency.
If you decide to go down this route I recommend taking a look at the Oboe library (full disclosure: I am one of the authors).
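The always-running stream idea can be sketched without any particular audio API. The following is a minimal illustration in C, assuming mono 16-bit PCM and a render function that the platform's audio data callback would call; silent_stream and its functions are hypothetical names, not part of Oboe, AAudio or OpenSL ES.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const int16_t *track;   /* mono PCM data, already loaded in memory */
    size_t track_frames;    /* number of frames in track */
    size_t position;        /* next frame to play; audio thread only */
    atomic_bool playing;    /* set by the UI thread when "play" is hit */
} silent_stream;

void silent_stream_init(silent_stream *s, const int16_t *track,
                        size_t track_frames) {
    s->track = track;
    s->track_frames = track_frames;
    s->position = 0;
    atomic_init(&s->playing, false);
}

/* Called from the UI thread: playback begins on the next callback. */
void silent_stream_start(silent_stream *s) {
    atomic_store(&s->playing, true);
}

/* Called from the audio callback for every buffer, whether or not
 * playback has started. Outputs silence until playing is set, then
 * track data, then silence again once the track is exhausted. */
void silent_stream_render(silent_stream *s, int16_t *out, size_t frames) {
    assert(out != NULL);
    size_t copied = 0;
    if (atomic_load(&s->playing)) {
        size_t remaining = s->track_frames - s->position;
        copied = frames < remaining ? frames : remaining;
        memcpy(out, s->track + s->position, copied * sizeof(int16_t));
        s->position += copied;
    }
    memset(out + copied, 0, (frames - copied) * sizeof(int16_t));
}
```

Because the stream never stops, hitting play only flips a flag; the delay before the first audible frame is then at most one callback period plus the steady output latency.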
To answer one of your specific questions...
Is there a way to calculate the latency of the audio output stream programmatically?
Yes. The easiest way to explain this is with a code sample (this is C++ for the AAudio API but the principle is the same using Java AudioTrack):
// Get the index and time that a known audio frame was presented for playing
int64_t existingFrameIndex;
int64_t existingFramePresentationTime;
AAudioStream_getTimestamp(stream, CLOCK_MONOTONIC, &existingFrameIndex, &existingFramePresentationTime);
// Get the write index for the next audio frame
int64_t writeIndex = AAudioStream_getFramesWritten(stream);
// Calculate the number of frames between our known frame and the write index
int64_t frameIndexDelta = writeIndex - existingFrameIndex;
// Calculate the time which the next frame will be presented
int64_t frameTimeDelta = (frameIndexDelta * NANOS_PER_SECOND) / sampleRate_;
int64_t nextFramePresentationTime = existingFramePresentationTime + frameTimeDelta;
// Assume that the next frame will be written into the stream at the current time
int64_t nextFrameWriteTime = get_time_nanoseconds(CLOCK_MONOTONIC);
// Calculate the latency
*latencyMillis = (double) (nextFramePresentationTime - nextFrameWriteTime) / NANOS_PER_MILLISECOND;
A caveat: This method relies on accurate timestamps being reported by the audio hardware. I know this works on Google Pixel devices but have heard reports that it isn't so accurate on other devices so YMMV.

Following the answer of donturner, here's a Java version (that also uses other methods depending on the SDK version)
/** The audio latency has not been estimated yet */
private static long AUDIO_LATENCY_NOT_ESTIMATED = Long.MIN_VALUE+1;
/** The audio latency default value if we cannot estimate it */
private static long DEFAULT_AUDIO_LATENCY = 100L * 1000L * 1000L; // 100ms
/**
* Estimate the audio latency
*
* Not accurate at all, depends on SDK version, etc. But that's the best
* we can do.
*/
private static long estimateAudioLatency(AudioTrack track, long audioFramesWritten) {
long estimatedAudioLatency = AUDIO_LATENCY_NOT_ESTIMATED;
// First method. SDK >= 19.
if (Build.VERSION.SDK_INT >= 19 && track != null) {
AudioTimestamp audioTimestamp = new AudioTimestamp();
if (track.getTimestamp(audioTimestamp)) {
// Calculate the number of frames between our known frame and the write index
long frameIndexDelta = audioFramesWritten - audioTimestamp.framePosition;
// Calculate the time which the next frame will be presented
long frameTimeDelta = _framesToNanoSeconds(frameIndexDelta);
long nextFramePresentationTime = audioTimestamp.nanoTime + frameTimeDelta;
// Assume that the next frame will be written at the current time
long nextFrameWriteTime = System.nanoTime();
// Calculate the latency
estimatedAudioLatency = nextFramePresentationTime - nextFrameWriteTime;
}
}
// Second method. SDK >= 18.
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED && Build.VERSION.SDK_INT >= 18) {
Method getLatencyMethod;
try {
getLatencyMethod = AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
estimatedAudioLatency = (Integer) getLatencyMethod.invoke(track, (Object[]) null) * 1000000L;
} catch (Exception ignored) {}
}
// If no method has successfully given us a value, let's try a third method
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
AudioManager audioManager = (AudioManager) CRT.getInstance().getSystemService(Context.AUDIO_SERVICE);
try {
Method getOutputLatencyMethod = audioManager.getClass().getMethod("getOutputLatency", int.class);
estimatedAudioLatency = (Integer) getOutputLatencyMethod.invoke(audioManager, AudioManager.STREAM_MUSIC) * 1000000L;
} catch (Exception ignored) {}
}
// No method gave us a value. Let's use a default value. Better than nothing.
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
estimatedAudioLatency = DEFAULT_AUDIO_LATENCY;
}
return estimatedAudioLatency;
}
private static long _framesToNanoSeconds(long frames) {
return frames * 1000000000L / SAMPLE_RATE;
}

The Android MediaPlayer class is notoriously slow to begin audio playback. I experienced an issue in an app I was creating where there was a delay of more than one second before an audio clip began playing. I resolved it by switching to ExoPlayer, which resulted in playback starting within 100ms. I've also read that ffmpeg has an even faster audio startup time than ExoPlayer, but I haven't used it so I can't make any promises.

OpenSL ES Android: "Too many objects" SL_RESULT_MEMORY_FAILURE

I'm having a problem with OpenSL ES on Android. I'm using OpenSL to play sound effects. Currently I'm creating a new player each time I play a sound. (I know this isn't terribly efficient, but it's "good enough" for the time being.)
After a while of playback, I start to get these errors:
E/libOpenSLES(25131): Too many objects
W/libOpenSLES(25131): Leaving Engine::CreateAudioPlayer (SL_RESULT_MEMORY_FAILURE)
I'm tracking my create/destroy pattern and I never go above 4 outstanding objects at any given time, well below the system limit of 32. Of course, this is assuming that the Destroy is properly working.
My only guess right now is that I'm doing something incorrectly when I clean up the player objects. One possible issue is that the Destroy is often called in the context of the player callback (basically destroying the player after it's finished playing), although I can't find any reference suggesting this is a problem. Are there any other cleanup steps I should be taking besides "Destroy"-ing the player object? Do the Interfaces need to be cleaned up somehow as well?
-- Added --
After more testing, it happens consistently after the 30th player is created (there is an engine and a mix too, so that brings the total to 32 objects). So I must not be destroying the object properly. Here's the code--I'd love to know what's going wrong:
SLuint32 playerState = 0;
SLresult result = (*pPlayerObject)->GetState(pPlayerObject, &playerState);
return_if_fail(result);
if (playerState == SL_OBJECT_STATE_REALIZED)
{
(*pPlayerObject)->AbortAsyncOperation(pPlayerObject);
(*pPlayerObject)->Destroy(pPlayerObject);
}
else
{
__android_log_print(1, LOG_TAG, "Player object in unexpected state (%d)", playerState);
return 1002;
}

if (playerState == SL_OBJECT_STATE_REALIZED)
is not needed. Try to do it always.
AbortAsyncOperation is called in Destroy => not needed.
So try just (*pPlayerObject)->Destroy(pPlayerObject); it should be enough.
Edit:
I tested, and found the solution.
You cannot call Destroy() from the player callback. You should keep a "destroy" list and destroy the objects on it somewhere else, for example in the main thread.
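One way such a "destroy" list could look is sketched below in C. This is an assumption-laden illustration, not OpenSL ES code: destroy_list, destroy_list_defer and destroy_list_drain are hypothetical names, objects are stored as opaque pointers, and mark_destroyed is a stand-in for the real (*pPlayerObject)->Destroy(pPlayerObject) call. The callback only records the object; the actual destruction runs later on another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 32

typedef void (*destroy_fn)(void *object);

typedef struct {
    pthread_mutex_t lock;
    void *pending[MAX_PENDING];
    size_t count;
} destroy_list;

void destroy_list_init(destroy_list *list) {
    pthread_mutex_init(&list->lock, NULL);
    list->count = 0;
}

/* Call from the player callback: records the object, destroys nothing.
 * Returns 0 on success, -1 if the list is full. */
int destroy_list_defer(destroy_list *list, void *object) {
    int rc = -1;
    pthread_mutex_lock(&list->lock);
    if (list->count < MAX_PENDING) {
        list->pending[list->count++] = object;
        rc = 0;
    }
    pthread_mutex_unlock(&list->lock);
    return rc;
}

/* Call from the main thread: destroys every recorded object outside
 * the lock and returns how many were destroyed. */
size_t destroy_list_drain(destroy_list *list, destroy_fn destroy) {
    void *batch[MAX_PENDING];
    size_t n;
    assert(destroy != NULL);
    pthread_mutex_lock(&list->lock);
    n = list->count;
    for (size_t i = 0; i < n; i++) {
        batch[i] = list->pending[i];
    }
    list->count = 0;
    pthread_mutex_unlock(&list->lock);
    for (size_t i = 0; i < n; i++) {
        destroy(batch[i]);
    }
    return n;
}

/* Stand-in for the real Destroy call, used only for illustration. */
void mark_destroyed(void *object) {
    *(int *)object = 1;
}
```

Taking a mutex inside an audio callback is not ideal for low-latency code; the lock here is held only for a few assignments, but a lock-free queue would be the stricter choice.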

Strange performance of avcodec_decode_video2

I am developing an Android video player. I use ffmpeg in native code to decode video frame. In the native code, I have a thread called decode_thread that calls avcodec_decode_video2()
int decode_thread(void *arg) {
avcodec_decode_video2(codecCtx, pFrame, &frameFinished,pkt);
}
I have another thread called display_thread that uses aNativeWindow to display a decoded frame on a SurfaceView.
The problem is that if I let the decode_thread run continuously without a delay, the performance of avcodec_decode_video2() drops significantly. Sometimes it takes about 0.1 seconds to decode a frame. However, if I put a delay in the decode_thread, something like this:
int decode_thread(void *arg) {
avcodec_decode_video2(codecCtx, pFrame, &frameFinished,pkt);
usleep(20*1000);
}
With the delay, the performance of avcodec_decode_video2() is really good, about 0.001 seconds per frame. However, putting a delay in the decode_thread is not a good solution because it affects the playback. Could anyone explain the behavior of avcodec_decode_video2() and suggest a solution?

It seems implausible that the video decoding function itself would get faster just because your thread sleeps. Most likely the video decoding thread gets preempted by another thread, so the time you measure includes time during which your thread was not running. When you add a call to usleep, it triggers a context switch to another thread. So when your decoding thread is scheduled again, it starts with a full CPU time slice and is no longer interrupted inside avcodec_decode_video2.
What should you do? You surely want to decode packets a little ahead of when you show them: the performance of avcodec_decode_video2 certainly isn't constant, and if you try to stay just one frame ahead, you might not have enough time to decode one of the frames.
I'd create a producer-consumer queue with the decoded frames, with the top limit. The decoder thread is a producer, and it should run until it fills up the queue, and then it should wait until there's room for another frame. The display thread is a consumer, it would take frames from this queue and display them.
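A minimal sketch of such a bounded producer-consumer queue in C with pthreads follows. The names (frame_queue, frame_queue_push, frame_queue_pop) and the capacity of 8 are assumptions for illustration, and frames are passed as opaque pointers rather than as ffmpeg AVFrame objects.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8   /* upper limit on decoded frames held */

typedef struct {
    void *frames[QUEUE_CAPACITY];
    size_t head;               /* index of the oldest frame */
    size_t count;              /* number of frames currently queued */
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* signalled when a slot frees up */
    pthread_cond_t not_empty;  /* signalled when a frame arrives */
} frame_queue;

void frame_queue_init(frame_queue *q) {
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer (decoder thread): blocks while the queue is full. */
void frame_queue_push(frame_queue *q, void *frame) {
    assert(frame != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY) {
        pthread_cond_wait(&q->not_full, &q->lock);
    }
    q->frames[(q->head + q->count) % QUEUE_CAPACITY] = frame;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer (display thread): blocks while the queue is empty. */
void *frame_queue_pop(frame_queue *q) {
    void *frame;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) {
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
    frame = q->frames[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}
```

The decoder thread would call frame_queue_push after each successfully decoded frame, and the display thread would call frame_queue_pop at its presentation rate, so the decoder naturally pauses once it is QUEUE_CAPACITY frames ahead.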

Scheduling latency of Android sensors handlers

Rather than an answer, I'm looking for an idea here.
I'd like to measure the scheduling latency of sensor sampling in Android. In particular I want to measure the time from the sensor interrupt request to when the bottom half, which is in charge of the data read, is executed.
The bottom half already has, besides the data read, a timestamping instruction. Indeed samples are collected by applications (being java or native, no difference) as a tuple [measurement, timestamp].
The timestamp follows the clock source clock_gettime(CLOCK_MONOTONIC, &t);
So assuming that the bottom-half is not preempted, somehow this timestamp gives an indication of the task scheduling instant. What is missing is a direct or indirect way to find out its corresponding irq instant.
Safely assume that we can request any sampling rate from the sensor. The driver skeleton is the following (the Galaxy S3's gyroscope):
err = request_threaded_irq(data->client->irq, NULL,
        lsm330dlc_gyro_interrupt_thread,
        IRQF_TRIGGER_RISING | IRQF_ONESHOT,
        "lsm330dlc_gyro", data);
static irqreturn_t lsm330dlc_gyro_interrupt_thread(int irq,
        void *lsm330dlc_gyro_data_p) {
...
struct lsm330dlc_gyro_data *data = lsm330dlc_gyro_data_p;
...
res = lsm330dlc_gyro_read_values(data->client,
&data->xyz_data, data->entries);
...
input_report_rel(data->input_dev, REL_RX, gyro_adjusted[0]);
input_report_rel(data->input_dev, REL_RY, gyro_adjusted[1]);
input_report_rel(data->input_dev, REL_RZ, gyro_adjusted[2]);
input_sync(data->input_dev);
...
}
The key constraint is that I need to (well, I only have enough resources to) perform this measurement from user space, on a commercial device, without touching and recompiling the kernel, and hopefully with a limited impact on the experiment's accuracy. I don't know if such an experiment is possible with this constraint, and so far I couldn't figure out any reasonable method.
I might consider also recompiling the kernel if the experiment then becomes straightforward.
Thanks.

First, it's not possible to perform this measurement without touching the kernel.
Second, I didn't see any bottom half configured in your ISR code.
Third, if a bottom half is scheduled and the kernel can be recompiled, you can sample the jiffies value in the ISR and sample it again in the bottom half. Take the difference between the two samples and subtract that offset from the timestamp that is exported to user space.

Android - Scheduling an Event to Occur Every 10ms?

I'm working on creating an app that allows very low bandwidth communication via high frequency sound waves. I've gotten to the point where I can create a frequency and do the fourier transform (with the help of Moonblink's open source code for Audalyzer).
But here's my problem: I'm unable to get the code to run with the correct timing. Let's say I want a piece of code to execute every 10ms, how would I go about doing this?
I've tried using a TimerTask, but there is a huge delay before the code actually executes, like up to 100ms.
I also tried simply polling the current time and executing only once that time has elapsed, but there is still a delay problem. Do you guys have any ideas?
Thread analysis = new Thread(new Runnable()
{
@Override
public void run()
{
android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_URGENT_DISPLAY);
long executeTime = System.currentTimeMillis();
manualAnalyzer.measureStart();
while (FFTransforming)
{
if(System.currentTimeMillis() >= executeTime)
{
//Reset the timer to execute again in 10ms
executeTime+=10;
//Perform Fourier Transform
manualAnalyzer.doUpdate(0);
//TODO: Analyze the results of the transform here...
}
}
manualAnalyzer.measureStop();
}
});
analysis.start();

I would recommend a very different approach: Do not try to run your code in real time.
Instead, rely on only the low-level audio code running in real time, by recording (or playing) continuously for a period of time encompassing the events of interest.
Your code then runs somewhat asynchronously to this, decoupled by the audio buffers. Your code's sense of time is determined not by the system clock as it executes, but rather by the defined inter-sample interval of the audio data you work with (i.e., if you are using 48 ksps, then 10 ms later is 480 samples later).
You may need to modify your protocol governing interaction between the devices to widen the time window in which transmissions can be expected to occur. That is, you can have precise timing with respect to the actual modulation and symbols within a "packet", but you should not expect nearly the same order of precision in determining when a packet is sent or received: you will have to "find" it amidst a longer recording containing noise.
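The sample arithmetic in this answer (48 ksps, 10 ms, 480 samples) can be written as two small conversions. This is a sketch in C with hypothetical helper names; it uses integer division, so results are truncated when the values do not divide evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Number of samples that span the given number of milliseconds. */
int64_t ms_to_samples(int64_t sample_rate_hz, int64_t ms) {
    assert(sample_rate_hz > 0);
    return sample_rate_hz * ms / 1000;
}

/* Time offset in microseconds of a sample index within a recording. */
int64_t samples_to_us(int64_t sample_rate_hz, int64_t samples) {
    assert(sample_rate_hz > 0);
    return samples * 1000000 / sample_rate_hz;
}
```

With these, an event found at sample index n in a recorded buffer has a timestamp of samples_to_us(rate, n) relative to the start of that buffer, independent of when the analysing thread happened to run.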

Your thread/loop strategy is probably roughly as close as you're going to get. However, 10ms is not a lot of time, most Android devices are not super-powerful, and a Fourier transform is a lot of work to do. I find it unlikely that you'll be able to fit that much work in 10ms. I suspect you're going to have to increase that period.

I changed your code so that it takes the execution time of doUpdate into account. The use of System.nanoTime() should also increase accuracy.
public void run() {
android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_URGENT_DISPLAY);
long executeTime=0;
long nextTime = System.nanoTime();
manualAnalyzer.measureStart();
while (FFTransforming)
{
if(System.nanoTime() >= nextTime)
{
executeTime = System.nanoTime();
//Perform Fourier Transform
manualAnalyzer.doUpdate(0);
//TODO: Analyze the results of the transform here...
executeTime = System.nanoTime() - executeTime;
//guard against the case that doUpdate took longer than 10ms
final long i = executeTime/10000000;
//set the timer to execute again at the next full 10ms interval
nextTime += 10000000 + i*10000000;
}
}
manualAnalyzer.measureStop();
}
What else could you do?
eliminate Garbage Collection
go native with the NDK (just an idea, this might as well give no benefit)
