I'm using MediaCodec to encode an image into a video. It is a requirement that videos can have sub-second length (e.g. 3.5 seconds long).
My thinking in order to achieve this is is to determine the video frame rate like so.
int lengthInMillis = 3500;
float seconds = lengthInMillis / 1000f;
int ordinal = (int) seconds; // ordinal == 3
float point = seconds - ordinal;
float numFrames = seconds / point; // numFrames == 7
float fps = seconds / numFrames; // fps = 0.7
this.numFrames = (int) numFrames;
Unfortunately when attempting to configure the MediaCodec with a KEY_FRAME_RATE of less than 1 an IllegalStateException. So this method doesn't work. Is it possible to use MediaCodec to create a video with a running time that ends at a fraction of a second?
The length of the video, and the frame rate of the video, are not related.
A 3.5-second video with 7 frames is running at 2 fps, not 0.7 fps. You should be computing "frames per second" as "frames / seconds", not seconds / numFrames.
In any event, the frame-rate value is actually deprecated in API 21:
Note: On LOLLIPOP, the format to MediaCodecList.findDecoder/EncoderForFormat must not contain a frame rate. Use format.setString(MediaFormat.KEY_FRAME_RATE, null) to clear any existing frame rate setting in the format.
Related
I'm working with Android MediaCodec and use it for a realtime H264 encoding and decoding frames from camera. I use MediaCodec in synchronous manner and render the output to the Surface of decoder and everething works fine except that I have a long latency from a realtime, it takes 1.5-2 seconds and I'm very confused why is it so.
I measured a total time of encoding and decoding processes and it keeps around 50-65 milliseconds so I think the problem isn't in them.
I tried to change the configuration of the encoder but it didn't help and currently it configured like this:
val formatEncoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
formatEncoder.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
formatEncoder.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5)
formatEncoder.setInteger(MediaFormat.KEY_BIT_RATE, 1920 * 1080)
formatEncoder.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
val encoder = MediaCodec.createEncoderByType("video/avc")
encoder.configure(formatEncoder, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = encoder.createInputSurface() // I use it to send frames from camera to encoder
encoder.start()
Changing the configuration of the decoder also didn't help me at all and currently I configured it like this:
val formatDecoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
val decoder = MediaCodec.createDecoderByType("video/avc")
decoder.configure(formatDecoder , outputSurface, null, 0) // I use outputSurface to render decoded frames into it
decoder.start()
I use the following timeouts for waiting for available encoder/decoder buffers I tried to reduce their values but it didn't help me and I left them like this:
var TIMEOUT_IN_BUFFER = 10000L // microseconds
var TIMEOUT_OUT_BUFFER = 10000L // microseconds
Also I measured the time of consuming the inputSurface a frame and this time takes 0.03-0.05 milliseconds so it isn't a bottleneck. Actually I measured all the places where a bottleneck could be, but I wasn't found anything and I think the problem is in the encoder or decoder itself or in their configurations, or maybe I should use some special routine for sending frames to encoding/decoding..
I also tried to use HW accelerated codec and it's the only thing that helped me, when I use it the latency reduces to ~ 500-800 milliseconds but it still doesn't fit me for a realtime streaming.
It seems to me that the encoder or decoder buffers several frames before start displaying them on the surface and eventually it leads to the latency and if it really so then how can I disable bufferization or reduce the time of it?
Please help me I'm stucking on this problem for about half a year and have no idea how to reduce the latency, I'm sure that it's possible because popular apps like Telegram, Viber, WhatsApp etc. work fine and without latency so what's the secret here?
UPD 07.07.2021:
I still haven't found a solution to get rid of the latency. I've tried to change h264 profiles, increase and decrease I-frame inteval, bitrate, framerate, but result the same, the only thing that hepls a little to reduce the latency - downgrade the resolution from 1920x1080 to e.g. 640x480, but this "solution" doesn't suit me because I want to encode/decode a realtime video with 1920x1080 resolution.
UPD 08.07.2021:
I found out that if I change the values of TIMEOUT_IN_BUFFER and TIMEOUT_OUT_BUFFER from 10_000L to 100_000L it decreases the latency a bit but increases the delay of showing the first frame quite a lot after start encoding/decoding process.
It's possible your encoder is producing B frames -- bilinear interpolation frames. They increase quality and latency, and are great for movies. But no good for low-latency applications.
Key frames = I (interframes)
Predicted frames = P (difference from previous frames)
Interpolated frames = B
A sequence of frames including B frames might look like this:
IBBBPBBBPBBBPBBBI
11111111
12345678901234567
The encoder must encode each P frame, and the decoder must decode it, before the preceding B frames make any sense. So in this example the frames get encoded out of order like this:
1 5 2 3 4 9 6 7 8 13 10 11 12 17 17 13 14 15
In this example the decoder can't handle frame 2 until the encoder has sent frame 5.
On the other hand, this sequence without B frames allows coding and decoding the frames in order.
IPPPPPPPPPPIPPPPPPPPP
Try using the Constrained Baseline Profile setting. It's designed for low latency and low power use. It suppresses B frames. I think this works.
mediaFormat.setInteger(
"profile",
CodecProfileLevel.AVCProfileConstrainedBaseline);
I believe android h264 decoder have latency (at-least in most cases i've tried). Probably that's why android developers added PARAMETER_KEY_LOW_LATENCY from API level 30.
However I could decrease the delay some frames by querying for the output some more times.
Reason: no idea. It's just result of boring trial and errors
int inputIndex = m_codec.dequeueInputBuffer(-1);// Pass in -1 here bc we don't have a playback time reference
if (inputIndex >= 0) {
ByteBuffer buffer;
if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.LOLLIPOP) {
buffer = m_codec.getInputBuffer(inputIndex);
} else {
ByteBuffer[] bbuf = m_codec.getInputBuffers();
buffer = bbuf[inputIndex];
}
buffer.put(frame);
// tell the decoder to process the frame
m_codec.queueInputBuffer(inputIndex, 0, frame.length, 0, 0);
}
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
m_codec.releaseOutputBuffer(outputIndex, true);
}
outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
m_codec.releaseOutputBuffer(outputIndex, true);
}
outputIndex = m_codec.dequeueOutputBuffer(info, 0);
if (outputIndex >= 0) {
m_codec.releaseOutputBuffer(outputIndex, true);
}
You need to configure customized(or KEY_LOW_LATENCY if it is supported) low latency parameters for different cpu venders. It is a common problem for android phone.
Check this code https://github.com/moonlight-stream/moonlight-android/blob/master/app/src/main/java/com/limelight/binding/video/MediaCodecHelper.java
I searched online but there is very little information about this.
I have a live broadcasting app where I send encoded H264 video frames and AAC audio chunks resulting from camera & mic using the Android MediaCodec SDK, over a RTMP stack.
My live streams are 720p and I aim for great quality with 2500Kbps. This obviously requires a very good network connection which means 4G if you use a data plan.
Problem is even with the greatest connection there will be low peaks and congestion, so there will be moments where the network cant hold such heavy stream. Because I want to offer high reliability, I want to include automatic adaptive bitrate on my app so that the image quality is dropped in favor or reliability.
The thing is -- how to achieve this automatic adaptation to the network conditions without losing frames? Is it even possible? I've used professional encoding devices like Cerevo and they dont ever lose frames -- however with my app I always get some awful dragging due to p-frames being lost in the network.
This is what I currently have:
private long adaptBitrate(long idleNanos, Frame frame) {
int bytes = frame.getSize();
long nowNanos = System.nanoTime();
if (nowNanos - mLastNanos > 1000L * 1000 * 1000) {
double idle = (double) idleNanos / (double) (nowNanos - mLastNanos);
float actualBitrate = newBitrate;
int size = mBuffer.size();
String s = "Bitrate: " + actualBitrate / 1000
+ " kbps In-Flight:" + bytes
+ " idle: " + idle;
if (size > MAX_BUF_SIZE && size > mLastSize) {
Log.i(TAG, "adaptBitrate: Dropping bitrate");
newBitrate = (int) ((double) actualBitrate * BITRATE_DROP_MULTIPLIER);
if (newBitrate < MIN_BITRATE) {
newBitrate = MIN_BITRATE;
}
s += " late => " + newBitrate;
mRtmpHandler.requestBitrate(newBitrate);
} else if (size <= 2 && idle > IDLE_THRESHOLD) {
mIdleFrames++;
if(mIdleFrames >= MIN_IDLE_FRAMES){
Log.i(TAG, "adaptBitrate: Raising bitrate");
newBitrate = (int) ((double) newBitrate * BITRATE_RAISE_MULTIPLIER);
if (newBitrate > MAX_BITRATE) {
newBitrate = MAX_BITRATE;
}
s += " idle => " + newBitrate;
mRtmpHandler.requestBitrate(newBitrate);
mIdleFrames = 0;
}
}
debugThread(Log.VERBOSE, s);
mLastNanos = System.nanoTime();
mLastSize = size;
idleNanos = 0;
}
return idleNanos;
}
So if my buffer is exceeding a threshold, I lower the bitrate. If my app is spending too much time waiting for a new frame, for a number of consecutive frames, then I raise the bitrate.
No matter how cautious I am with the threshold values, I am always losing important information and my stream breaks until the next keyframe arrives (2 secs). Sometimes it seems that the network can hold a certain bitrate (stable at 1500kbps, for instance) but the image will still have some dragging as though a frame was lost in the way. With good network conditions, everything is smooth.
How do these streaming devices handle these situations? It always looks great with them, no dragging or skipped frames at all...
There is indeed no information online about adaptive bitrate from the broadcaster side, surprisingly. When I had to implement something like this with RTSP and two rtp sockets, I took a similar approach, creating a polling class that would modestly increase the bitrate of the mediacodec when the packet buffer was >$GOOD_PCT free, aggressively halve it when the queue was less than $BAD_PCT free, and do nothing if it was in between. Partially seen here. I'm not sure I have a complete picture of your solution based on the posted code, but you are adjusting the mediacodec bitrate directly, correct? The only time I had corruption is when I requested a sync frame from the mediacodec, so avoid that if it's in your code. Hope this helps.
I'm trying to calculate the exact song duration of an mp3 file in Android.
I tried using the calculation: song duration = filesize / bitrate, which produces extremely close results, but I'd like to calculate it more precisely.
I found a solution in Java here where it suggests this calculation: song duration = filesize / (framesize * framerate) and provides this example code:
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);
AudioFormat format = audioInputStream.getFormat();
long audioFileLength = file.length();
int frameSize = format.getFrameSize();
float frameRate = format.getFrameRate();
float durationInSeconds = (audioFileLength / (frameSize * frameRate));
However, AudioInputStream is no longer supported in Android. So then I tried figuring out how to obtain the frame size and the frame rate with Android, but I can't find any solutions there either. Any ideas for how to calculate this in Android?
As an aside, using MediaMetadataRetriever.METADATA_KEY_DURATION as suggested here doesn't produce the correct song duration for me. The reason is because I'm streaming my mp3 file, and attempting to calculate the song duration strictly from the file's header information. That way, I can correctly update my progress bar very early on in the streaming process.
I am currently writing some code for a sample sequencer in Android. I am using the AudioTrack class. I have been told the only proper way to have accurate timing is to use the timing of the AudioTrack. EG I know that if I write a buffer of X samples to AudioTrack playing at a rate of 44100 samples per second, that the time to write will be (1/44100)X secs.
Then you use that info to know what samples should be written when.
I am trying to implement my first attempt using this approach. I am using only one sample and am writing it as continuous 16th notes at a tempo of 120bpm. But for some reason it is playing at a rate of 240bpm.
First I checked my code to derive the time of a 16th (nanoseconds) note at tempo X. It checks outs.
private void setPeriod()
{
period=(int)((1/(((double)TEMPO)/60))*1000);
period=(period*1000000)/4;
Log.i("test",String.valueOf(period));
}
Then I verified that my code to get the time for my buffer to be played at 44100khz in nanoseconds and it is correct.
long bufferTime=(1000000000/SAMPLE_RATE)*buffSize;
So now I am left thinking that the audio track is playing at a rate that is different from 44100. Maybe 96000khz, which would explain the doubling of speed. But when I instantiate the audioTrack, it was indeed set to 44100khz.
final int SAMPLE_RATE is set to 44100
buffSize = AudioTrack.getMinBufferSize(SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO,
AudioFormat.ENCODING_PCM_16BIT);
track = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE,
AudioFormat.CHANNEL_OUT_MONO,
AudioFormat.ENCODING_PCM_16BIT,
buffSize,
AudioTrack.MODE_STREAM);
So I am confused as to why my tempo is being doubled. I ran a debug to compare time elapsed audioTrack to time elapsed system time, and the it seems that the audiotrack is indeed playing twice as fast as it should be. I am confused.
Just to make sure, this is my play loop.
public void run() {
// TODO Auto-generated method stub
int buffSize=192;
byte[] output = new byte[buffSize];
int pos1=0;//index for output array
int pos2=0;//index for sample array
long bufferTime=(1000000000/SAMPLE_RATE)*buffSize;
long elapsed=0;
int writes=0;
currTrigger=trigger[triggerPointer];
Log.i("test","period="+String.valueOf(period));
Log.i("test","bufferTime="+String.valueOf(bufferTime));
long time=System.nanoTime();
while(play)
{
//fill up the buffer
while(pos1<buffSize)
{
output[pos1]=0;
if(currTrigger&&pos2<sample.length)
{
output[pos1]=sample[pos2];
pos2++;
}
pos1++;
}
track.write(output, 0, buffSize);
elapsed=elapsed+bufferTime;
writes++;
//time passed is more than one 16th note
if(elapsed>=period)
{
Log.i("test",String.valueOf(writes));
Log.i("test","elapsed A.T.="+String.valueOf(elapsed)+" elapsed S.T.="+String.valueOf(System.nanoTime()-time));
time=System.nanoTime();
writes=0;
elapsed=0;
triggerPointer++;
if(triggerPointer==16)
triggerPointer=0;
currTrigger=trigger[triggerPointer];
pos2=0;
}
pos1=0;
}
}
}
edited : rephrased and updated due to initial erroneous assumption that system time was used to synchronize sequenced audio :)
As for audio playing back at twice the speed, this is a bit strange as the "write"-method of the AudioTrack is blocking until the native layer has enqueued the next buffer, are you sure the render loop isn't invoked from two different sources (although I assume from your example you invoke the loop from within a thread).
However, what is certain is that there is a time synchronization issue to address: the problem herein lies with the calculation of the buffer time you use in your example:
(1000000000/SAMPLE_RATE)*buffSize;
Which will always return 4353741 at a buffer size of 192 samples at a sample rate of 44100 Hz, thus disregarding any cues in tempo (for instance this will be the same at 300 BPM or 40 BPM), Now, in your example this doesn't have any consequences for the actual syncing per se, but I'd like to point this out as we'll return to it shortly further on in this text.
Also, nanoseconds are a nicely precise unit, but too much as milliseconds will suffice for audio operations. As such, I will continue the illustration in milliseconds.
Your calculation for the period of a 16th note at 120 BPM indeed checks out at the correct value of 125 ms. The previously mentioned calculation for the period corresponding to each buffer size is 4.3537 ms. This indicates you will iterate the buffer loop 28.7112 times before the time of a single sixteenth note passes. In your example however, you check whether the "offset" for this sixteenth note has passed at the END of the buffer iteration loop (where the period for a single buffer has already been added to the elapsed time!), by using:
elapsed>=period
Which will already lead to drift at the first occasion as at this moment "elapsed" would be at (192 * 29 iterations) 5568 samples (or 126.26 ms), rather than at (192 * 28.7112 iterations) 5512 samples (or 126 ms). This is a difference of 56 samples (or when speaking in time : 1.02 ms). This wouldn't of course lead to samples playing back FASTER than expected (as you stated), but already leads to a irregularity in playback. For the second 16th note (which would occur at the 57.4224th iteration, the drift would be 11136 - 11025 = 111 samples or 2.517 ms (more than half your buffer time!) As such, you must perform this check WITHIN the
while(pos1<buffSize)
loop, where you are incrementing the read pointer up until the size of the buffer has been reached. As such you will need to increase another variable by a fraction of the buffer period PER buffer sample.
I hope the above example illustrates why I'd initially proposed counting time by sample iterations rather than elapsed time (of course the samples DO indicate time, as they are merely translations of a unit of time to an amount of samples in a buffer, but you can use these numbers as the markers, rather than adding a fixed interval to a counter as in your render loop).
First of all, some convenience math to help you with getting these values :
// calculate the amount of samples are necessary for storing the given length of time
// ( in milliSeconds ) at the given sample rate ( in Hz )
int millisecondsToSamples( int milliSeconds, int sampleRate )
{
return ( int ) ( milliSeconds * ( sampleRate / 1000 ));
}
OR : These calculations which are more convenient when thinking in a musical context like you mentioned in your post. Calculate the amount of samples that are present in a single bar of music at the given sample rate ( in Hz ), tempo ( in BPM ) and time signature ( timeSigBeatUnit being the "4" and timeSigBeatAmount being the "3" in a time signature of 3/4 - although most sequencers limit themselves to 4/4 I've added the calculation for explaining the logic).
int samplesPerBeat = ( int ) (( sampleRate * 60 ) / tempo );
int samplesPerBar = samplesPerBeat * timeSigBeatAmount;
int samplesPerSixteenth = ( int ) ( samplesPerBeat / 4 ); // 1/4 of a beat being a 16th
etc.
The way you then write the timed samples into the output buffer is by keeping track of the "playback position' in your buffer callback, i.e. each time you write a buffer, you'll be incrementing the playback position with the length of the buffer. Returning to a musical context: if you were to be "looping a single bar of 120 bpm in 4/4 time", when the playback position would exceed (( sampleRate * 60 ) / 120 * 4 = 88200 samples, you reset it to 0 to "loop" from the beginning.
So let's assume you have two "events" of audio that occur in a sequence of a single bar of 4/4 time at 120 BPM. One event is to play on the 1st beat of a bar and lasts for a quaver (1/8 of a bar) and the other is to play on the 3rd beat of a bar and lasts for another quaver. These two "events" (which you could represent in a value object) would have the following properties, for the first event:
int start = 0; // buffer position 0 is at the 1st beat/start of the bar
int length = 11025; // 1/8 of the full bar size
int end = 11025; // start + length
and the second event:
int start = 44100; // 3rd beat (or half-way through the bar)
int length = 11025;
int end = 55125; // start + length
These value objects could have two additional properties such as "sample" which could be the buffer containing the actual audio and "readPointer" which would hold the last sample-buffer index the sequencer has read from last.
Then in the buffer write loop:
int playbackPosition = 0; // at start of bar
int maximumPlaybackPosition = 88200; // i.e. a single bar of 4/4 at 120 bpm
public void run()
{
// loop through list of "audio events" / samples
for ( CustomValueObject audioEvent : audioEventList )
{
// loop through the buffer length this cycle will write
for ( int i = 0; i < bufferSize; ++i )
{
// calculate "sequence position" from playback position and current iteration
int seqPosition = playbackPosition + i;
// sequence position within start and end range of audio event ?
if ( seqPosition >= audioEvent.start && seqPosition <= audioEvent.end )
{
// YES! write its sample content into the output buffer
output[ i ] += audioEvent.sample[ audioEvent.readPointer ];
// update the sample read pointer to the next slot (but keep in bounds)
if ( ++audioEvent.readPointer == audioEvent.length )
audioEvent.readPointer = 0;
}
}
// update playback position and keep within sequencer range for looping
if ( playbackPosition += bufferSize > maximumPosition )
playbackPosition -= maximumPosition;
}
}
This should give you a perfectly timed approach in writing audio. There's still some magic you have to work out when you're hitting the iteration where the sequence will loop (i.e. read the remaining unprocessed buffer length from the start of the sample for seamless looping) but I hope this gives you a general idea on a working approach.
I'm looking at getInputBufferSize(...) function in AudioHardwareALSA.cpp and it returns hardcoded the value of 320. My question is: How is this value calculated?
I've done some pre-cals but still there are some questions.
sample_rate = 8000
format = S16_LE = 2 bytes/sample
period_time = 10000 us (guessing)
buffer_size = 2 * period_size
period_size = period_time * bytes/sec
buffer_size = 2 * (0.01 * sample_rate * 2) = 320 bytes.
I can't find the period_time in the code, so one question is: where is it defined or is just a rough calculation?
I'm also trying to add some more sample rates i.e 16000 and 32000 (later maybe more). How to calculate the right minimum buffer size? Is the delay always 10 ms for all the sample rates?
Any help is appreciated.
I believe Google implemented NB-AMR encode to start with. later they added support for AAC. In the case of NB-AMR, the frame size is 320 bytes.
You may be aware that for NB-AMR:
sampling rate = 8000 samples / sec
frame duration = 20ms
sample size = 2 bytes
channels = mono
So, each frame contains
8000 samples / sec * 0.02 sec * 2 bytes / sample / channel * 1 channels = 320 bytes
For AAC, these parameters are different and hence the framesize