I've noticed when using MediaExtractor that, since a movie has far more audio samples than video frames, a One-Image-Decode / One-SoundSample-Decode / One-Image-Encode / One-SoundSample-Encode loop is not a good strategy: it always ends up with a lot more audio samples queued for encoding.
Is it possible, when using MediaExtractor, to have a custom PCM sample size greater than 4096 bytes (or whatever the size is for the context)? 8192 would be great.
For:
int size = videoExtractor.ReadSampleData (decoderInputBuffer, 0);
size will always be 4096.
This means that for 44100 Hz, stereo, 16-bit sound, each chunk represents roughly 23 ms, i.e. about 43 audio chunks per second against, sometimes, 24 images per second. Doubling the PCM chunk size would let the video always have an equal or greater amount of audio ready for encoding. It would even allow me to sync at 24 fps and play the audio chunks using AudioTrack, so it opens up the possibility of a live preview of my rendering.
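For reference, here is the arithmetic behind those figures (a quick sketch; the constants are just the values assumed above):

int bytesPerSecond = 44100 * 2 * 2;                  // 44.1 kHz, stereo, 16-bit PCM = 176400 bytes/s
double chunkMs = 4096 * 1000.0 / bytesPerSecond;     // ~23.2 ms of audio per 4096-byte chunk
double chunksPerSecond = bytesPerSecond / 4096.0;    // ~43 chunks per second
double chunksPerVideoFrame = chunksPerSecond / 24.0; // ~1.8 chunks per 24 fps video frame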
I've tried:
inputFormat.SetInteger (MediaFormat.KeyMaxInputSize, 8192);
But it looks more like an optimization setting that fixes a minimum size for the decoder's internal buffers.
Is the size customizable?
[EDIT]
For now, I queue two audio chunks for each video frame, and the change in video encoding speed is not noticeable. As predicted, the audio and video now finish encoding at almost the same time.
while (_shouldCopyAudio && encodeMoreAudioThanVideo++ < 2) {
    [...]
    audioEncoder.QueueInputBuffer (encoderInputBufferIndex, 0, size,
        pcmChunk.PresentationTimeUs, (MediaCodecBufferFlags)pcmChunk.Flags);
I have an Android TV application that plays two different UDP video streams using ExoPlayer. The images below (image 1 and image 2) show the specifications of each video. The first one, with the lower resolution, plays well without any buffering problems; the second one suffers from bad buffering and streaming.
image 1: the first video, with the lower resolution (1280x720), which plays well without any freezing or problems
image 2: the second video, with the higher resolution (1920x1080), which plays with freezing and discontinuous buffering
Below is my ExoPlayer initialisation:
ExoPlayer.Builder playerBuilder = new ExoPlayer.Builder(WelcomeActivity.this);

LoadControl loadControl = new DefaultLoadControl.Builder()
        .setBufferDurationsMs(15000, 50000, 2000, 5000)
        .setTargetBufferBytes(DefaultLoadControl.DEFAULT_TARGET_BUFFER_BYTES)
        .setAllocator(new DefaultAllocator(true, C.DEFAULT_BUFFER_SEGMENT_SIZE))
        .setPrioritizeTimeOverSizeThresholds(DefaultLoadControl.DEFAULT_PRIORITIZE_TIME_OVER_SIZE_THRESHOLDS)
        .build();

player = playerBuilder
        .setLoadControl(loadControl)
        .setRenderersFactory(buildRenderersFactory(WelcomeActivity.this))
        .build();

player.setTrackSelectionParameters(
        player.getTrackSelectionParameters()
                .buildUpon()
                .setMaxVideoSize(1280, 720)
                .setPreferredTextLanguage("ar")
                .build());
As you can see, I am setting the maximum video size to the lower resolution with .setMaxVideoSize(1280, 720), but this does not change anything with regard to the bad buffering of the second, higher-resolution video.
The size and bit rate of an encoded video is determined by how it is encoded - e.g. what codec is used and what parameters and configuration the encoding uses.
Nearly all encoders try to reduce the size of the encoded video, to make it easier to store and to transmit, and there is a balance between the size reduction and the visual quality of the video when it is subsequently decoded and played.
Generally, smaller size/bit rate means less quality, but it does depend on the content itself - for example some cartoons are very tolerant of lower bit rates. More advanced commercial encoders can do 'context aware' encoding and adjust the quality and bit rate depending on the scenes.
The most common codec at this time is the h.264 codec and ffmpeg is probably the most common open source tool for encoding.
ffmpeg provides a guide for encoding, or re-encoding/transcoding, videos with h.264, and it includes notes and examples on the quality vs bit rate trade-off:
https://trac.ffmpeg.org/wiki/Encode/H.264
The two key sections in the above are for Constant Rate Factor encoding and two pass encoding.
CRF: "allows the encoder to attempt to achieve a certain output quality for the whole file when output file size is of less importance. This provides maximum compression efficiency with a single pass."
Two Pass: "Use this rate control mode if you are targeting a specific output file size, and if output quality from frame to frame is of less importance."
There is also a note on constrained encoding (VBV / maximum bit rate) which would be good to look at too.
I think you might be best to start with CRF encoding and experiment with the CRF value to see if you can find a quality/bit rate combination you are happy with.
Finally, it would be worth checking out ABR (adaptive bit rate) streaming as well, especially if you plan to host many videos and want your users to have a consistent experience. For ABR you create multiple copies of the video, each encoded with a different resolution/bit rate.
The client device or player downloads the video in chunks, e.g. 10-second chunks, and selects the next chunk from the bit rate most appropriate to the current network conditions. See some more info in this answer also: https://stackoverflow.com/a/42365034/334402
Is there any way to get or set output sample size using MediaCodec in NDK?
I am using AMediaCodec_createDecoderByType to decode different codecs. For some codecs the output sample count is fixed, e.g. 1152 for MP3 and 1024 for AAC, but in the case of .wav I am getting a different sample size on different devices; it varies from 4096 to 16384 (uncertain).
Since I am maintaining a ring buffer of blocks after resampling, and playing from that buffer, I want to know at least the maximum sample size up front so I can allocate the buffer accordingly. Any API or trick to find it would be very helpful.
Thanks,
I want to trim an existing AAC/MP4 audio file. As a first step I want to "trim" 0 bytes, basically just copying the file using MediaCodec/MediaExtractor.
Questions:
Is the header a fixed size, so that I can just copy it from the old file? Or does it contain information about the track duration that I need to update? If it is a fixed size, what is that size (so that I know how many bytes to copy from the old file)?
Should I only use the extractor's readSampleData(ByteBuffer, offset) and advance(), or should I also use MediaCodec to extract (decode) the samples and then encode them again with an encoder, writing the encoded values?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how you want to do the trimming. It's absolutely simplest to not involve MediaCodec at all, but just copy packets from MediaExtractor to MediaMuxer, and skip the packets at the start that you want to omit (or use seekTo() for seeking to the right start position).
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which for 48 kHz audio is about 21 milliseconds. So if you only copy individual packets, you can't get any finer trimming granularity than 21 milliseconds at 48 kHz. This is probably fine for most cases, but if the audio has a lower sample rate, say 8 kHz, the granularity ends up as high as 128 ms.
If you want to trim to a more exact position than the individual packets allow you, you need to decode using MediaCodec, skip the right amount of samples, repackage output frames from the decoder into new full frames for the encoder, and encode this.
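For the simple packet-copy case described above, a minimal sketch could look like the following (srcPath, dstPath and trimStartUs are placeholders; single track, audio-only MP4, no error handling):

import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.io.IOException;
import java.nio.ByteBuffer;

class TrackCopier {
    // Copy the first track of srcPath into dstPath, skipping everything before trimStartUs.
    // No MediaCodec involved; the trimming granularity is one packet.
    static void copyTrack(String srcPath, String dstPath, long trimStartUs) throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(srcPath);
        extractor.selectTrack(0);
        MediaFormat format = extractor.getTrackFormat(0);

        MediaMuxer muxer = new MediaMuxer(dstPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int dstTrack = muxer.addTrack(format);
        muxer.start();

        // Jump to the closest sync sample around the desired start position.
        extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);

        ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        while (true) {
            info.size = extractor.readSampleData(buffer, 0);
            if (info.size < 0) {
                break;                                       // end of stream
            }
            info.offset = 0;
            info.presentationTimeUs = extractor.getSampleTime() - trimStartUs;
            info.flags = extractor.getSampleFlags();         // e.g. sync-sample flag
            muxer.writeSampleData(dstTrack, buffer, info);
            extractor.advance();
        }

        muxer.stop();
        muxer.release();
        extractor.release();
    }
}

For the "trim 0 bytes" case in the question, trimStartUs is simply 0 and the whole file is copied.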
I am encoding raw audio to AAC with the MediaCodec API of Android. The problem: I need to send the AAC stream to a server in chunks of one second, so I need to split the stream. Right now, since an AAC frame is 1024 samples, I take round(SAMPLE_RATE/1024) AAC frames for each chunk. However, because of "priming samples" this simple cutting of the AAC stream does not work.
More details follow. After a chunk is sent to the server, a client receives it in the Chrome web browser and plays all received chunks using the Web Audio API. The playback is done in such a way as to be gapless: a large AudioBuffer is allocated initially, the received chunks are decoded and copied into it, and the AudioBuffer is played.
Now, this does not work with AAC (it works with Ogg/Vorbis, though). With AAC I get artifacts in the generated sound: at the boundary of each second the audio starts at zero and then gradually grows back to its normal amplitude; this lasts for 10 to 20 milliseconds.
I believe the problem is caused by missing "priming samples". Maybe the Web Audio API expects "priming samples" at the start of each AAC chunk, does not find them, and thus modifies the actual audio.
The question is: how can I split the original AAC stream and send "good" AAC chunks of one second?
From what I have understood, I should include at the start of each chunk the previous two frames (the last two frames of the previous chunk). However, this number may vary and there is not much documentation. Some expert advice is appreciated.
I am using the following method. I am not an AAC expert so I may be missing something, but experimentally it is working.
Assuming that the Chrome decoder expects priming samples at the start of each chunk, I do the following: before sending a chunk to the server, I prepend to it the last 4 AAC frames of the previous chunk (I skip this for the first chunk). Client-side, I retrieve a chunk, decode it and remove the first 4*1024 samples (1024 = samples in one AAC frame).
This is working.
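For illustration, a rough sketch of the sending side of that scheme (AacChunker, sendChunk() and the 4-frame overlap are just this example's assumptions; frames arrive as raw AAC packets from the encoder):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AacChunker {
    static final int OVERLAP_FRAMES = 4;        // frames prepended to each chunk
    static final int SAMPLE_RATE = 44100;
    static final int SAMPLES_PER_FRAME = 1024;  // AAC-LC frame size

    private final Deque<byte[]> lastFrames = new ArrayDeque<>();  // tail of the previous chunk
    private final List<byte[]> currentChunk = new ArrayList<>();
    private boolean firstChunk = true;

    void onEncodedAacFrame(byte[] frame) {
        currentChunk.add(frame);
        if (currentChunk.size() < Math.round((float) SAMPLE_RATE / SAMPLES_PER_FRAME)) {
            return;                             // chunk is not ~1 second long yet
        }
        List<byte[]> toSend = new ArrayList<>();
        if (!firstChunk) {
            toSend.addAll(lastFrames);          // priming context for the decoder
        }
        toSend.addAll(currentChunk);
        sendChunk(toSend);                      // placeholder for the actual upload

        // Remember the tail of this chunk to prepend to the next one.
        lastFrames.clear();
        int from = Math.max(0, currentChunk.size() - OVERLAP_FRAMES);
        lastFrames.addAll(currentChunk.subList(from, currentChunk.size()));
        currentChunk.clear();
        firstChunk = false;
    }

    void sendChunk(List<byte[]> frames) { /* upload to the server */ }
}

On the client, every chunk except the first is decoded and the first OVERLAP_FRAMES * 1024 decoded samples are discarded, as described above.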
I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.
I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place in which uncompressed audio data is available for editing.
For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:
MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);
My problem is that the data adjustments appear to affect only the data copied into the output audio buffer, but not to shorten the audio frame that gets written by the MediaMuxer. The output video either ends up with several milliseconds of missing audio at the end of the clip, or, if I write too much data, the audio frame gets dropped completely from the end of the clip.
How to properly trim an Audio Frame?
There are a few things at play here:
As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer - you already took the decoder output buffer's offset into account when you set the buffer position with decoderOutputBuffer.position(info.offset);. But perhaps you already update it somewhere else.
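In code, that would be something like the following (a sketch reusing the question's variable names):

// The decoder output buffer was already positioned at info.offset and limited to
// info.offset + info.size before being copied into encoderInputBuffer at position 0,
// so the encoder should be told the payload starts at offset 0.
audioEncoder.queueInputBuffer(encoderIndex, /* offset */ 0, info.size,
        presentationTime, info.flags);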
I'm not sure if MediaCodec audio encoders allow you to pass audio data in arbitrarily sized chunks, or if you need to send exactly full audio frames at a time. I think they might accept arbitrary chunks, though - in that case you're fine. If not, you need to buffer the audio up yourself and pass it to the encoder once you have a full frame (in case you trimmed out some samples at the start).
Keep in mind that audio is also frame based (for AAC it's 1024-sample frames, unless you use the low delay variants or HE-AAC), so for 44 kHz you can only adjust the audio duration with a granularity of about 23 ms. If you want your audio to end precisely after the right number of samples, you need to use container signaling to indicate this. I'm not sure whether the MediaCodec audio encoder flushes whatever half frame you have at the end, or whether you manually need to pass it extra zeros at the end in order to get the last few samples out, if you aren't aligned to the frame size. It might not be needed, though.
Encoding AAC audio does introduce some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number of these depends on the encoder - for the software encoder in Android for AAC-LC, it's probably 2048 samples, but it might also vary). For the case of 2048 samples, it exactly lines up with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the 2 first output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to be able to properly represent whatever samples your stream starts with, otherwise it will more or less converge towards your intended audio within 2048 samples.)
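If you do decide to drop the encoder delay, a rough sketch of the drain loop could look like this (audioEncoder, muxer and audioTrackIndex are assumed to exist as in a typical transcoding loop, and the 2048-sample delay is an assumption, not a value reported by MediaCodec):

// Drop the first couple of encoder output packets to remove the assumed encoder delay.
// With a 2048-sample delay and 1024-sample AAC frames, that is exactly 2 packets.
final int assumedPrimingSamples = 2048;
final int samplesPerAacFrame = 1024;
final int packetsToDrop = assumedPrimingSamples / samplesPerAacFrame;
int droppedPackets = 0;

MediaCodec.BufferInfo outInfo = new MediaCodec.BufferInfo();
int outIndex = audioEncoder.dequeueOutputBuffer(outInfo, 10000);
while (outIndex >= 0) {   // INFO_TRY_AGAIN_LATER / INFO_OUTPUT_FORMAT_CHANGED handling omitted
    boolean isConfig = (outInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0;
    if (!isConfig && droppedPackets < packetsToDrop) {
        droppedPackets++;                                   // skip a priming packet
    } else if (!isConfig) {
        ByteBuffer encoded = audioEncoder.getOutputBuffer(outIndex);
        muxer.writeSampleData(audioTrackIndex, encoded, outInfo);
    }
    audioEncoder.releaseOutputBuffer(outIndex, false);
    outIndex = audioEncoder.dequeueOutputBuffer(outInfo, 10000);
}

The codec config packet (BUFFER_FLAG_CODEC_CONFIG) is skipped in either case, since MediaMuxer gets that data from the track's MediaFormat rather than from writeSampleData.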