RTMP adaptive bitrate algorithm - android

I searched online but there is very little information about this.
I have a live broadcasting app where I send encoded H264 video frames and AAC audio chunks, produced from the camera & mic via the Android MediaCodec SDK, over an RTMP stack.
My live streams are 720p and I aim for great quality at 2500 Kbps. This obviously requires a very good network connection, which means 4G if you're on a data plan.
The problem is that even with the best connection there will be dips and congestion, so there will be moments when the network can't sustain such a heavy stream. Because I want to offer high reliability, I want to include automatic adaptive bitrate in my app so that image quality is dropped in favor of reliability.
The thing is: how do I achieve this automatic adaptation to network conditions without losing frames? Is it even possible? I've used professional encoding devices like Cerevo and they never lose frames; with my app, however, I always get some awful dragging due to P-frames being lost on the network.
This is what I currently have:
private long adaptBitrate(long idleNanos, Frame frame) {
    int bytes = frame.getSize();
    long nowNanos = System.nanoTime();
    if (nowNanos - mLastNanos > 1000L * 1000 * 1000) {
        double idle = (double) idleNanos / (double) (nowNanos - mLastNanos);
        float actualBitrate = newBitrate;
        int size = mBuffer.size();
        String s = "Bitrate: " + actualBitrate / 1000
                + " kbps In-Flight:" + bytes
                + " idle: " + idle;
        if (size > MAX_BUF_SIZE && size > mLastSize) {
            Log.i(TAG, "adaptBitrate: Dropping bitrate");
            newBitrate = (int) ((double) actualBitrate * BITRATE_DROP_MULTIPLIER);
            if (newBitrate < MIN_BITRATE) {
                newBitrate = MIN_BITRATE;
            }
            s += " late => " + newBitrate;
            mRtmpHandler.requestBitrate(newBitrate);
        } else if (size <= 2 && idle > IDLE_THRESHOLD) {
            mIdleFrames++;
            if (mIdleFrames >= MIN_IDLE_FRAMES) {
                Log.i(TAG, "adaptBitrate: Raising bitrate");
                newBitrate = (int) ((double) newBitrate * BITRATE_RAISE_MULTIPLIER);
                if (newBitrate > MAX_BITRATE) {
                    newBitrate = MAX_BITRATE;
                }
                s += " idle => " + newBitrate;
                mRtmpHandler.requestBitrate(newBitrate);
                mIdleFrames = 0;
            }
        }
        debugThread(Log.VERBOSE, s);
        mLastNanos = System.nanoTime();
        mLastSize = size;
        idleNanos = 0;
    }
    return idleNanos;
}
So: if my buffer exceeds a threshold, I lower the bitrate. If my app spends too much time waiting for a new frame, over a number of consecutive frames, then I raise the bitrate.
No matter how cautious I am with the threshold values, I am always losing important information and my stream breaks until the next keyframe arrives (2 seconds). Sometimes it seems that the network can hold a certain bitrate (stable at 1500 kbps, for instance) but the image will still have some dragging as though a frame was lost on the way. With good network conditions, everything is smooth.
How do these streaming devices handle these situations? It always looks great with them, no dragging or skipped frames at all...

There is indeed no information online about adaptive bitrate from the broadcaster side, surprisingly. When I had to implement something like this with RTSP and two RTP sockets, I took a similar approach, creating a polling class that would modestly increase the bitrate of the MediaCodec when the packet buffer was >$GOOD_PCT free, aggressively halve it when the queue was less than $BAD_PCT free, and do nothing if it was in between. Partially seen here. I'm not sure I have a complete picture of your solution based on the posted code, but you are adjusting the MediaCodec bitrate directly, correct? The only time I had corruption was when I requested a sync frame from the MediaCodec, so avoid that if it's in your code. Hope this helps.
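For reference, on API 19+ the encoder bitrate can be changed on the fly without reconfiguring the codec, via MediaCodec.setParameters() and PARAMETER_KEY_VIDEO_BITRATE. A minimal sketch of what a requestBitrate-style call might forward to (mEncoder here is an assumed reference to your running video encoder, not part of your posted code):

import android.media.MediaCodec;
import android.os.Build;
import android.os.Bundle;

// Sketch: change the encoder's target bitrate on the fly (API 19+).
static void applyBitrate(MediaCodec mEncoder, int newBitrateBps) {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
        Bundle params = new Bundle();
        params.putInt(MediaCodec.PARAMETER_KEY_VIDEO_BITRATE, newBitrateBps);
        mEncoder.setParameters(params); // takes effect on subsequent frames
    }
    // Deliberately no PARAMETER_KEY_REQUEST_SYNC_FRAME here: requesting sync
    // frames was the only thing that gave me corruption.
}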

Related

Android MediaCodec realtime h264 encoding/decoding latency

I'm working with Android MediaCodec and use it for realtime H264 encoding and decoding of frames from the camera. I use MediaCodec in synchronous mode and render the output to the decoder's Surface, and everything works fine except that I have a long latency relative to real time: it takes 1.5-2 seconds, and I'm very confused about why that is.
I measured the total time of the encoding and decoding processes and it stays around 50-65 milliseconds, so I think the problem isn't there.
I tried to change the configuration of the encoder but it didn't help, and it's currently configured like this:
val formatEncoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
formatEncoder.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
formatEncoder.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5)
formatEncoder.setInteger(MediaFormat.KEY_BIT_RATE, 1920 * 1080)
formatEncoder.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
val encoder = MediaCodec.createEncoderByType("video/avc")
encoder.configure(formatEncoder, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = encoder.createInputSurface() // I use it to send frames from camera to encoder
encoder.start()
Changing the configuration of the decoder also didn't help at all, and currently I have it configured like this:
val formatDecoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
val decoder = MediaCodec.createDecoderByType("video/avc")
decoder.configure(formatDecoder , outputSurface, null, 0) // I use outputSurface to render decoded frames into it
decoder.start()
I use the following timeouts when waiting for available encoder/decoder buffers. I tried to reduce their values, but it didn't help, so I left them like this:
var TIMEOUT_IN_BUFFER = 10000L // microseconds
var TIMEOUT_OUT_BUFFER = 10000L // microseconds
I also measured the time it takes the inputSurface to consume a frame, and it takes 0.03-0.05 milliseconds, so it isn't a bottleneck. Actually, I measured all the places where a bottleneck could be, but I didn't find anything. I think the problem is in the encoder or decoder itself, or in their configurations, or maybe I should use some special routine for sending frames for encoding/decoding.
I also tried to use a HW-accelerated codec and it's the only thing that helped: with it the latency drops to ~500-800 milliseconds, but that still isn't good enough for realtime streaming.
It seems to me that the encoder or decoder buffers several frames before it starts displaying them on the surface, which leads to the latency. If that's really so, how can I disable this buffering or reduce it?
Please help me; I've been stuck on this problem for about half a year and have no idea how to reduce the latency. I'm sure it's possible because popular apps like Telegram, Viber, WhatsApp etc. work fine and without noticeable latency, so what's the secret here?
UPD 07.07.2021:
I still haven't found a solution to get rid of the latency. I've tried to change H264 profiles, increase and decrease the I-frame interval, bitrate and framerate, but the result is the same. The only thing that helps a little is downgrading the resolution from 1920x1080 to e.g. 640x480, but this "solution" doesn't suit me because I want to encode/decode realtime video at 1920x1080.
UPD 08.07.2021:
I found out that if I change the values of TIMEOUT_IN_BUFFER and TIMEOUT_OUT_BUFFER from 10_000L to 100_000L, the latency decreases a bit, but the delay before the first frame is shown after starting the encoding/decoding process increases quite a lot.
It's possible your encoder is producing B frames (bi-directionally predicted frames). They increase quality and latency, and are great for movies, but they're no good for low-latency applications.
Key frames = I (intra-coded frames)
Predicted frames = P (difference from previous frames)
Bidirectionally predicted frames = B (interpolated from previous and following frames)
A sequence of frames including B frames might look like this:
IBBBPBBBPBBBPBBBI
         11111111
12345678901234567
The encoder must encode each P frame, and the decoder must decode it, before the preceding B frames make any sense. So in this example the frames get encoded out of order like this:
1 5 2 3 4 9 6 7 8 13 10 11 12 17 14 15 16
In this example the decoder can't handle frame 2 until the encoder has sent frame 5.
On the other hand, this sequence without B frames allows coding and decoding the frames in order.
IPPPPPPPPPPIPPPPPPPPP
Try using the Constrained Baseline Profile setting. It's designed for low latency and low power use. It suppresses B frames. I think this works.
mediaFormat.setInteger(
        MediaFormat.KEY_PROFILE, // the literal key is "profile"
        CodecProfileLevel.AVCProfileConstrainedBaseline);
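If you want to confirm the device's AVC encoder actually advertises that profile before relying on it, something along these lines should work. A sketch only; note the AVCProfileConstrainedBaseline constant itself was only added in API 27:

import android.media.MediaCodecInfo;
import android.media.MediaCodecList;

// Sketch: returns true if any AVC encoder on the device lists
// Constrained Baseline among its supported profile/level pairs.
static boolean encoderSupportsConstrainedBaseline() {
    MediaCodecList list = new MediaCodecList(MediaCodecList.REGULAR_CODECS);
    for (MediaCodecInfo info : list.getCodecInfos()) {
        if (!info.isEncoder()) continue;
        try {
            MediaCodecInfo.CodecCapabilities caps = info.getCapabilitiesForType("video/avc");
            for (MediaCodecInfo.CodecProfileLevel pl : caps.profileLevels) {
                if (pl.profile == MediaCodecInfo.CodecProfileLevel.AVCProfileConstrainedBaseline) {
                    return true;
                }
            }
        } catch (IllegalArgumentException ignored) {
            // this codec doesn't handle video/avc
        }
    }
    return false;
}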
I believe the Android H264 decoder has latency (at least in most of the cases I've tried). That's probably why the Android developers added PARAMETER_KEY_LOW_LATENCY in API level 30.
However, I could decrease the delay by a few frames by querying for output a few more times.
Reason: no idea. It's just the result of tedious trial and error.
int inputIndex = m_codec.dequeueInputBuffer(-1); // pass in -1 here because we don't have a playback time reference
if (inputIndex >= 0) {
    ByteBuffer buffer;
    if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.LOLLIPOP) {
        buffer = m_codec.getInputBuffer(inputIndex);
    } else {
        ByteBuffer[] bbuf = m_codec.getInputBuffers();
        buffer = bbuf[inputIndex];
    }
    buffer.put(frame);
    // tell the decoder to process the frame
    m_codec.queueInputBuffer(inputIndex, 0, frame.length, 0, 0);
}

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
    m_codec.releaseOutputBuffer(outputIndex, true);
}

outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
    m_codec.releaseOutputBuffer(outputIndex, true);
}

outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
    m_codec.releaseOutputBuffer(outputIndex, true);
}
You need to configure custom low-latency parameters (or KEY_LOW_LATENCY, if it is supported) for the different chipset vendors. This is a common problem on Android phones.
Check this code https://github.com/moonlight-stream/moonlight-android/blob/master/app/src/main/java/com/limelight/binding/video/MediaCodecHelper.java
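As a minimal starting point (a sketch, not the vendor-tuned handling that MediaCodecHelper does): on API 30+ you can at least try the generic low-latency key on the decoder's MediaFormat before configure():

import android.media.MediaCodec;
import android.media.MediaFormat;
import android.os.Build;
import android.view.Surface;

// Sketch: request the platform's generic low-latency mode when available.
// Vendor-specific keys would be added to the same format before configure().
static void configureLowLatencyDecoder(MediaCodec decoder, MediaFormat format, Surface surface) {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
        format.setInteger(MediaFormat.KEY_LOW_LATENCY, 1);
    }
    decoder.configure(format, surface, null, 0);
}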

How to use AudioTrack.getTimestamp() on Android to calculate latency?

I see many resources recommending that AudioTrack.getTimestamp() be used on modern Android versions to calculate audio latency for audio/video sync.
For instance:
https://stackoverflow.com/a/37625791/332798
https://developer.amazon.com/docs/fire-tv/audio-video-synchronization.html#section1-1
https://groups.google.com/forum/#!topic/android-platform/PoHfyNK54ps
However, none of these explain how to use the timestamp to calculate the latency. I'm struggling to figure out what to do with the timestamp's framePosition/nanoTime to come up with a latency number.
So prior to this API, you would use AudioTrack.getPlaybackHeadPosition() which was just an approximation. Thus, to account for latency you had to offset that value with a latency value from one of two hidden methods: AudioManager.getOutputLatency() or AudioTrack.getLatency().
With the new AudioTrack.getTimestamp() API, you get a snapshot of the playhead position at a given time, taken directly at the output. As such, it is fully accurate and already accounts for device latency. Thus there's no need to call any other APIs now to add/remove latency.
The caveat is that this timestamp is only a snapshot, and the docs recommend you don't call this new method very often. So the trick to getting the "current" position is to use your last snapshot and linearly interpolate what the current value should be:
playheadPos = timestamp.framePosition +
(System.nanoTime() - timestamp.nanoTime) * samplerate / 1e9;
This position can then be compared against how many frames you've written into the AudioTrack, by maintaining another counter which increments every time AudioTrack.write() completes:
int bytesWritten = track.write(...);
writtenPos += bytesWritten / pcmFrameSize;
If you're working with ENCODING_AC3, the playhead position reported by AudioTrack is still in terms of samples. You will either need to convert it to bytes, or convert the number of bytes you've written back into samples. Either way, you will need to know the bitrate of your AC3 stream (e.g. 384000 bps):
int bytesWritten = track.write(...);
writtenPos += bytesWritten * samplerate / (bitrate / 8);
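Putting the pieces together, the latency estimate is just the difference between what you've written and the interpolated playhead position, converted to time. A rough sketch for the PCM case (sampleRate and the framesWritten counter are assumed to come from your own bookkeeping):

import android.media.AudioTimestamp;
import android.media.AudioTrack;

// Sketch: estimate how much audio is still queued between the last write
// and the playback head, in milliseconds.
static double estimateLatencyMs(AudioTrack track, long framesWritten, int sampleRate) {
    AudioTimestamp ts = new AudioTimestamp();
    if (!track.getTimestamp(ts)) {
        return -1; // no timestamp available yet (e.g. right after play())
    }
    // Interpolate the current playhead position from the last snapshot.
    double playheadFrames = ts.framePosition
            + (System.nanoTime() - ts.nanoTime) * sampleRate / 1e9;
    // Frames written but not yet played, expressed as time.
    return (framesWritten - playheadFrames) * 1000.0 / sampleRate;
}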

audio latency issues

In the application I want to create, I face some technical obstacles. The application has two music tracks. For example, a user imports a music background as the first track. The second track is the user's voice, recorded to the rhythm of the first track played through the device's speaker (or headphones). At this point we face latency: after recording and playing back in the app, the user hears a loss of synchronisation between the tracks, which occurs because of the microphone and speaker latencies.
Firstly, I try to detect the delay by filtering the input sound. I use android’s AudioRecord class, and the method read(). This method fills my short array with audio data.
I found that the initial values of this array are zeros, so I decided to cut them out before writing the data into the output stream.
So I consider those zeros to be the "warm-up" latency of the microphone. Is this approach correct? This operation gives some results, but it doesn't resolve the problem, and at this stage I'm still far from a solution.
But the worse case is with the delay between starting the speakers and playing the music. This delay I cannot filter or detect. I tried to create some calibration feature which counts the delay. I play a „beep” sound through the speakers, and when I start to play it, I also begin to measure time. Then, I start recording and listen for this sound being detected by the microphone. When I recognise this sound in the app, I stop measuring time. I repeat this process several times, and the final value is the average from those results. That is how I try to measure the latency of the device. Now, when I have this value, I can simply shift the second track backwards to achieve synchronisation of both records (I will lose some initial milliseconds of the recording, but I skip this case, for now, there are some possibilities to fix it).
I thought that this approach would resolve the problem, but it turned out this is not as simple as I thought. I found two issues here:
1. Delay while playing two tracks simultaneously
2. Random (non-deterministic) device audio latency.
The first issue: I play two tracks using the AudioTrack class and call its play() method like this:
val firstTrack = //creating a track
val secondTrack = //creating a track
firstTrack.play()
secondTrack.play()
This code causes delays at the stage of playing the tracks. Now, I don't even have to think about latency while recording; I cannot play two tracks simultaneously without delays. I tested this with an external audio file (not recorded in my app): I start the same audio file twice using the code above, and I can see a delay. I also tried it with the MediaPlayer class, with the same results. I even tried to start the tracks from the OnPreparedListener callback:
val firstTrack = // AudioPlayer
val secondTrack = // AudioPlayer
secondTrack.setOnPreparedListener {
    firstTrack.start()
    secondTrack.start()
}
And it doesn’t help.
I know that there is one more class provided by Android called SoundPool. According to the documentation, it can be better at playing tracks simultaneously, but I can't use it because it supports only small audio files, and I can't accept that limitation.
How can I resolve this problem? How can I start playing two tracks precisely at the same time?
The second: Audio latency is not deterministic - sometimes it is smaller, and sometimes it’s huge, and it’s out of my hands. So measuring device latency can help but again - it cannot resolve the problem.
To sum up: is there any solution that can give me the exact latency per device (or per app session), or some other way to detect the actual delay, so I can get the best synchronisation while playing back two tracks at the same time?
Thank you in advance!
Synchronising audio for karaoke apps is tough. The main issue you seem to be facing is variable latency in the output stream.
This is almost certainly caused by "warm up" latency: the time it takes from hitting "play" on your backing track to the first frame of audio data being rendered by the audio device (e.g. headphones). This can have large variance and is difficult to measure.
The first (and easiest) thing to try is to use MODE_STREAM when constructing your AudioTrack and prime it with bufferSizeInBytes of data prior to calling play (more here). This should result in lower, more consistent "warm up" latency.
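A rough sketch of that priming step, assuming 48 kHz stereo 16-bit PCM (the legacy AudioTrack constructor is used for brevity; AudioTrack.Builder works just as well on API 23+):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Sketch: build a streaming AudioTrack and write one buffer of silence
// before play() so the device starts with a full buffer.
int sampleRate = 48000;
int bufferSizeInBytes = AudioTrack.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(
        AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        bufferSizeInBytes, AudioTrack.MODE_STREAM);
track.write(new byte[bufferSizeInBytes], 0, bufferSizeInBytes); // prime with silence
track.play();
// ...then keep writing real audio data as fast as the track accepts it.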
A better way is to use the Android NDK to have a continuously running audio stream which is just outputting silence until the moment you hit play, then start sending audio frames immediately. The only latency you have here is the continuous output latency.
If you decide to go down this route I recommend taking a look at the Oboe library (full disclosure: I am one of the authors).
To answer one of your specific questions...
Is there a way to calculate the latency of the audio output stream programmatically?
Yes. The easiest way to explain this is with a code sample (this is C++ for the AAudio API but the principle is the same using Java AudioTrack):
// Get the index and time that a known audio frame was presented for playing
int64_t existingFrameIndex;
int64_t existingFramePresentationTime;
AAudioStream_getTimestamp(stream, CLOCK_MONOTONIC, &existingFrameIndex, &existingFramePresentationTime);
// Get the write index for the next audio frame
int64_t writeIndex = AAudioStream_getFramesWritten(stream);
// Calculate the number of frames between our known frame and the write index
int64_t frameIndexDelta = writeIndex - existingFrameIndex;
// Calculate the time which the next frame will be presented
int64_t frameTimeDelta = (frameIndexDelta * NANOS_PER_SECOND) / sampleRate_;
int64_t nextFramePresentationTime = existingFramePresentationTime + frameTimeDelta;
// Assume that the next frame will be written into the stream at the current time
int64_t nextFrameWriteTime = get_time_nanoseconds(CLOCK_MONOTONIC);
// Calculate the latency
*latencyMillis = (double) (nextFramePresentationTime - nextFrameWriteTime) / NANOS_PER_MILLISECOND;
A caveat: This method relies on accurate timestamps being reported by the audio hardware. I know this works on Google Pixel devices but have heard reports that it isn't so accurate on other devices so YMMV.
Following the answer of donturner, here's a Java version (that also uses other methods depending on the SDK version)
/** The audio latency has not been estimated yet */
private static long AUDIO_LATENCY_NOT_ESTIMATED = Long.MIN_VALUE + 1;

/** The audio latency default value if we cannot estimate it */
private static long DEFAULT_AUDIO_LATENCY = 100L * 1000L * 1000L; // 100ms

/**
 * Estimate the audio latency
 *
 * Not accurate at all, depends on SDK version, etc. But that's the best
 * we can do.
 */
private static long estimateAudioLatency(AudioTrack track, long audioFramesWritten) {
    long estimatedAudioLatency = AUDIO_LATENCY_NOT_ESTIMATED;

    // First method. SDK >= 19.
    if (Build.VERSION.SDK_INT >= 19 && track != null) {
        AudioTimestamp audioTimestamp = new AudioTimestamp();
        if (track.getTimestamp(audioTimestamp)) {
            // Calculate the number of frames between our known frame and the write index
            long frameIndexDelta = audioFramesWritten - audioTimestamp.framePosition;
            // Calculate the time at which the next frame will be presented
            long frameTimeDelta = _framesToNanoSeconds(frameIndexDelta);
            long nextFramePresentationTime = audioTimestamp.nanoTime + frameTimeDelta;
            // Assume that the next frame will be written at the current time
            long nextFrameWriteTime = System.nanoTime();
            // Calculate the latency
            estimatedAudioLatency = nextFramePresentationTime - nextFrameWriteTime;
        }
    }

    // Second method. SDK >= 18.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED && Build.VERSION.SDK_INT >= 18) {
        Method getLatencyMethod;
        try {
            getLatencyMethod = AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
            estimatedAudioLatency = (Integer) getLatencyMethod.invoke(track, (Object[]) null) * 1000000L;
        } catch (Exception ignored) {}
    }

    // If no method has successfully given us a value, let's try a third method
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
        AudioManager audioManager = (AudioManager) CRT.getInstance().getSystemService(Context.AUDIO_SERVICE);
        try {
            Method getOutputLatencyMethod = audioManager.getClass().getMethod("getOutputLatency", int.class);
            estimatedAudioLatency = (Integer) getOutputLatencyMethod.invoke(audioManager, AudioManager.STREAM_MUSIC) * 1000000L;
        } catch (Exception ignored) {}
    }

    // No method gave us a value. Let's use a default value. Better than nothing.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
        estimatedAudioLatency = DEFAULT_AUDIO_LATENCY;
    }

    return estimatedAudioLatency;
}

private static long _framesToNanoSeconds(long frames) {
    return frames * 1000000000L / SAMPLE_RATE;
}
The Android MediaPlayer class is notoriously slow to begin audio playback. I experienced an issue in an app I was creating where there was a greater-than-one-second delay before an audio clip started playing. I resolved it by switching to ExoPlayer, which got playback starting within 100 ms. I've also read that ffmpeg has an even faster audio startup time than ExoPlayer, but I haven't used it so I can't make any promises.
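For reference, a minimal sketch of the switch I'm describing, assuming a recent ExoPlayer 2.x dependency (under Media3 the classes live in androidx.media3 instead, and older 2.x versions use SimpleExoPlayer.Builder):

import android.content.Context;
import android.net.Uri;
import com.google.android.exoplayer2.ExoPlayer;
import com.google.android.exoplayer2.MediaItem;

// Sketch: start playback of an audio clip with ExoPlayer.
static ExoPlayer playClip(Context context, Uri audioUri) {
    ExoPlayer player = new ExoPlayer.Builder(context).build();
    player.setMediaItem(MediaItem.fromUri(audioUri));
    player.prepare();
    player.play(); // in my experience playback starts far sooner than with MediaPlayer
    return player;
}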

avcodec_decode_video2 fails to decode after frame resolution change

I'm using ffmpeg in an Android project via JNI to decode a real-time H264 video stream. On the Java side I'm only sending the byte arrays into the native module. The native code runs a loop and checks the data buffers for new data to decode. Each data chunk is processed with:
int bytesLeft = data->GetSize();
int paserLength = 0;
int decodeDataLength = 0;
int gotPicture = 0;
const uint8_t* buffer = data->GetData();
while (bytesLeft > 0) {
    AVPacket packet;
    av_init_packet(&packet);
    paserLength = av_parser_parse2(_codecPaser, _codecCtx, &packet.data, &packet.size, buffer, bytesLeft, AV_NOPTS_VALUE, AV_NOPTS_VALUE, AV_NOPTS_VALUE);
    bytesLeft -= paserLength;
    buffer += paserLength;
    if (packet.size > 0) {
        decodeDataLength = avcodec_decode_video2(_codecCtx, _frame, &gotPicture, &packet);
    }
    else {
        break;
    }
    av_free_packet(&packet);
}
if (gotPicture) {
    // pass the frame to rendering
}
if (gotPicture) {
// pass the frame to rendering
}
The system works pretty well until incoming video's resolution changes. I need to handle transition between 4:3 and 16:9 aspect ratios. While having AVCodecContext configured as follows:
_codecCtx->flags2 |= CODEC_FLAG2_FAST;
_codecCtx->thread_count = 2;
_codecCtx->thread_type = FF_THREAD_FRAME;
if (_codec->capabilities & CODEC_FLAG_LOW_DELAY) {
    _codecCtx->flags |= CODEC_FLAG_LOW_DELAY;
}
I wasn't able to continue decoding new frames after the video resolution changed. The got_picture_ptr flag that avcodec_decode_video2 sets when a whole frame is available was never true after that.
This ticket made me wonder if the issue is connected with multithreading. The only useful thing I've noticed is that when I change thread_type to FF_THREAD_SLICE the decoder is not always blocked after a resolution change; about half of my attempts were successful. Switching to single-threaded processing is not possible, as I need more computing power: setting the context to one thread does not solve the problem anyway, and the decoder cannot keep up with the incoming data.
Everything works well after an app restart.
I can only think of one workaround (it doesn't really solve the problem): unloading and reloading the whole library after the stream resolution changes (e.g. as mentioned here). I don't think it's a good one though; it will probably introduce other bugs and take a lot of time (from the user's viewpoint).
Is it possible to fix this issue?
EDIT:
I've dumped the stream data that is passed to the decoding pipeline. I changed the resolution a few times while the stream was being captured. Playing it with ffplay showed that at the moment the resolution changed and the preview in my application froze, ffplay managed to continue, though its preview was glitchy for a second or so. You can see the full ffplay log here. In this case the video preview stopped when I changed the resolution to 960x720 for the second time (Reinit context to 960x720, pix_fmt: yuv420p in the log).

What is the best way to achieve Audio Video Synchronization in Android Based Media Player Application using MediaCodec API?

I'm trying to implement a media player in Android using the MediaCodec API.
I've created three threads
Thread 1 : To de-queue the input buffers to get free indices and then queuing the audio and video frames in respective codec's input buffer
Thread 2 : To de-queue the audio codec's output buffer and render it using AudioTrack class' write method
Thread 3 : To de-queue the video codec's output buffer and render it using releaseBuffer method
I'm facing a lot of problems achieving synchronization between audio and video frames. I never drop audio frames, and before rendering a video frame I check whether the decoded frame is late by more than 30 ms; if it is, I drop the frame, and if it is more than 10 ms early I don't render it.
To find the difference between audio and video I use the following logic:
public long calculateLateByUs(long timeUs) {
    long nowUs = 0;
    if (hasAudio && audioTrack != null) {
        synchronized (audioTrack) {
            if (first_audio_sample && startTimeUs >= 0) {
                System.out.println("First video after audio Time Us: " + timeUs);
                startTimeUs = -1;
                first_audio_sample = false;
            }
            nowUs = (audioTrack.getPlaybackHeadPosition() * 1000000L) /
                    audioCodec.format.getInteger(MediaFormat.KEY_SAMPLE_RATE);
        }
    } else if (!hasAudio) {
        nowUs = System.currentTimeMillis() * 1000;
        startTimeUs = 0;
    } else {
        nowUs = System.currentTimeMillis() * 1000;
    }
    if (startTimeUs == -1) {
        startTimeUs = nowUs - timeUs;
    }
    if (syslog) {
        System.out.println("Timing Statistics:");
        System.out.println("Key Sample Rate :" + audioCodec.format.getInteger(MediaFormat.KEY_SAMPLE_RATE) + " nowUs: " + nowUs + " startTimeUs: " + startTimeUs + " timeUs: " + timeUs + " return value :" + (nowUs - (startTimeUs + timeUs)));
    }
    return (nowUs - (startTimeUs + timeUs));
}
timeUs is the presentation time of the video frame in microseconds. nowUs is supposed to contain the duration in microseconds for which audio has been playing. startTimeUs is the initial difference between audio and video frames, which always has to be maintained.
The first if block checks whether there is indeed an audio track and it has been initialized, and sets the value of nowUs by calculating it from the AudioTrack.
If there is no audio (the first else), nowUs is set to the system time and the initial gap is set to zero. startTimeUs is initialized to zero in the main function.
The if statement inside the synchronized block is used for the case where the first frame to be rendered is audio and the audio frame joins later. The first_audio_sample flag is initially set to true.
Please let me know if anything is not clear.
Also, if you know of any open-source project where a media player for an A/V file has been implemented using MediaCodec, that would be great.
If you are working on one of the latest releases of Android, you can consider retrieving the AudioTimestamp from AudioTrack directly. Please refer to this documentation for more details. Similarly, you could also consider retrieving the sampling rate via getSampleRate.
If you wish to continue with your algorithm, you could consider a relatively similar implementation in this native example. SimplePlayer implements a player engine by employing MediaCodec and has an a-v sync section too. Please refer to this section of code where the synchronization is performed. I feel this should help as a good reference.
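For the first suggestion, a hedged sketch of what the nowUs calculation could look like with AudioTimestamp instead of getPlaybackHeadPosition() (sampleRate is assumed to match the track's configuration):

import android.media.AudioTimestamp;
import android.media.AudioTrack;

// Sketch: derive the current audio clock in microseconds from the last
// AudioTimestamp snapshot, interpolating forward with the system clock.
static long audioNowUs(AudioTrack track, int sampleRate, long fallbackUs) {
    AudioTimestamp ts = new AudioTimestamp();
    if (!track.getTimestamp(ts)) {
        return fallbackUs; // e.g. fall back to the getPlaybackHeadPosition-based value
    }
    long framesNow = ts.framePosition
            + (System.nanoTime() - ts.nanoTime) * sampleRate / 1000000000L;
    return framesNow * 1000000L / sampleRate;
}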
