Android MediaCodec: How to Frame-Accurately Trim Audio

I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary audio frames so that they match their video frame counterparts.
I believe the audio frames must be trimmed in the decoder output buffer, since that is where uncompressed audio data is available for editing.
For in/out trims I am calculating the necessary offset and size adjustments to the raw audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:
MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
// Select the trimmed range of decoded PCM, as computed from the in/out points.
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
// Copy the trimmed data to the start of the encoder input buffer.
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);
My problem is that the data adjustments appear to affect only the data copied into the encoder's input buffer, not the length of the audio frame that gets written to the MediaMuxer. The output video either ends up with several milliseconds of missing audio at the end of the clip, or, if I write too much data, the final audio frame gets dropped from the clip entirely.
How do I properly trim an audio frame?

There are a few things at play here:
As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer - you already took the decoder output buffer's offset into account when you set the buffer position with decoderOutputBuffer.position(info.offset). (Perhaps you already compensate for this somewhere, though.)
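As a minimal sketch, assuming the same variable names as the snippet in the question:

// The trimmed data was copied to position 0 of the encoder input buffer,
// so the offset passed to the encoder must be 0; info.offset only
// describes where the data sat in the decoder's output buffer.
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
audioEncoder.queueInputBuffer(encoderIndex, 0, info.size, presentationTime, info.flags);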
I'm not sure whether MediaCodec audio encoders allow you to pass audio data in arbitrarily sized chunks, or whether you need to send exactly full audio frames at a time. I think they might accept arbitrary chunks - in that case you're fine. If not, you need to buffer the audio up yourself and pass it to the encoder once you have a full frame (in case you trimmed some away at the start); see the sketch after the next point.
Keep in mind that audio is also frame based (for AAC it's 1024-sample frames, unless you use the low-delay variants or HE-AAC), so for 44.1 kHz you can only control the audio duration with a granularity of about 23 ms. If you want your audio to end precisely after the right number of samples, you need to use container signaling to indicate this. I'm not sure whether the MediaCodec audio encoder flushes whatever half frame you have at the end, or whether you need to manually pass it extra zeros at the end to get the last few samples out if you aren't aligned to the frame size. It might not be needed, though.
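If the encoder does turn out to need whole frames, a simple accumulator along these lines would work. This is only a sketch, assuming 16-bit stereo PCM and 1024 samples per AAC frame; sendFrameToEncoder() is a hypothetical helper:

// Accumulates decoded PCM and hands the encoder exactly one AAC frame
// at a time. 16-bit stereo: 1024 samples * 2 channels * 2 bytes = 4096 bytes.
private static final int FRAME_BYTES = 1024 * 2 * 2;
private final ByteBuffer pcmAccumulator = ByteBuffer.allocate(FRAME_BYTES * 8);

void onDecodedPcm(ByteBuffer chunk) {
    pcmAccumulator.put(chunk);
    pcmAccumulator.flip();
    while (pcmAccumulator.remaining() >= FRAME_BYTES) {
        byte[] frame = new byte[FRAME_BYTES];
        pcmAccumulator.get(frame);
        sendFrameToEncoder(frame);   // hypothetical helper
    }
    pcmAccumulator.compact();
}

void onEndOfStream() {
    pcmAccumulator.flip();
    if (pcmAccumulator.hasRemaining()) {
        byte[] last = new byte[FRAME_BYTES];   // tail frame, zero-padded
        pcmAccumulator.get(last, 0, pcmAccumulator.remaining());
        sendFrameToEncoder(last);
    }
}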
Encoding AAC audio does introduce some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number depends on the encoder - for Android's software AAC-LC encoder it's probably 2048 samples, but it can vary). For the case of 2048 samples, it lines up exactly with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the first 2 output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to properly represent whatever samples your stream starts with; otherwise the output more or less converges towards your intended audio within 2048 samples.)
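For example, a hedged sketch of dropping those leading packets on the muxing side, assuming the delay really is two 1024-sample frames (worth verifying against your device's encoder):

// Drop the first PRIMING_PACKETS encoded packets, which carry only the
// encoder's priming/delay samples. The value 2 assumes a 2048-sample
// delay; it varies by encoder and is not signaled by MediaCodec.
private static final int PRIMING_PACKETS = 2;
private int encodedPacketCount = 0;

void onEncodedAudioPacket(ByteBuffer packet, MediaCodec.BufferInfo bufferInfo) {
    if (encodedPacketCount++ < PRIMING_PACKETS) {
        return;   // skip priming output instead of writing it to the muxer
    }
    muxer.writeSampleData(audioTrackIndex, packet, bufferInfo);
}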

Related

Trim aac-mp4 audio in android (mediaCodec/extractor)

I want to trim an existing aac-mp4 audio file. As a first step I want to "trim" 0 bytes, basically just copying the file using MediaCodec/MediaExtractor.
Questions:
Is the header a fixed size that I can just copy from the old file? Or does it contain info about the track duration that I would need to update? If it is a fixed size, what is that size (so I know how many bytes to copy from the old file)?
Should I only use the extractor's getSampleData(ByteBuffer, offset) and advance(), or should I also use MediaCodec to decode the samples and then re-encode them with an encoder, writing the encoded values?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how you want to do the trimming. It's simplest not to involve MediaCodec at all, but just to copy packets from MediaExtractor to MediaMuxer, skipping the packets at the start that you want to omit (or using seekTo() to jump to the right start position).
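A minimal sketch of that copy loop, assuming a single AAC track at index 0 and a trimStartUs start point:

MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(inputPath);
extractor.selectTrack(0);   // assuming track 0 is the AAC audio track

MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
int dstTrack = muxer.addTrack(extractor.getTrackFormat(0));   // copy the MediaFormat as-is
muxer.start();

// For audio, every packet is a sync point, so this lands on a packet boundary.
extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);

ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
while (true) {
    info.size = extractor.readSampleData(buffer, 0);
    if (info.size < 0) break;   // end of stream
    info.offset = 0;
    // Rebase timestamps so the output starts at zero.
    info.presentationTimeUs = Math.max(0, extractor.getSampleTime() - trimStartUs);
    info.flags = extractor.getSampleFlags();
    muxer.writeSampleData(dstTrack, buffer, info);
    extractor.advance();
}
muxer.stop();
muxer.release();
extractor.release();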
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which for 48 kHz audio is 21 milliseconds. So if you only copy individual packets, you can't get a trimming granularity any finer than 21 milliseconds at 48 kHz. That's probably fine for most cases, but if the audio has a lower sample rate, say 8 kHz, the granularity ends up as coarse as 128 ms.
If you want to trim to a more exact position than individual packets allow, you need to decode with MediaCodec, skip the right number of samples, repackage the decoder's output into new full frames for the encoder, and re-encode.
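The sample-skip arithmetic is straightforward; a sketch, assuming 16-bit PCM:

// Samples between the sync point the extractor seeked to and the desired
// trim position, converted to bytes (16-bit = 2 bytes per sample per channel).
long samplesToSkip = (trimStartUs - firstSampleTimeUs) * sampleRate / 1000000L;
int bytesToSkip = (int) (samplesToSkip * channelCount * 2);
decodedBuffer.position(decodedBuffer.position() + bytesToSkip);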

Splitting an AAC stream, priming / padding samples problems (gapless playback)

I am encoding raw audio to AAC with the MediaCodec API of Android. The problem: I need to send to a server the AAC stream in chunks of one second. So I need to split the stream. Right now, since an AAC frame is 1024 samples, I take round(SAMPLE_RATE/1024) AAC frames for each chunk. However, because of "priming samples" this simple cutting of the AAC stream does not work.
More details follow. After a chunk is sent to the server, a client receives it in the Chrome web browser and plays all received chunks using the Web Audio API. The playback is designed to be gapless: a large AudioBuffer is allocated up front, the received chunks are decoded and copied into it, and the AudioBuffer is played.
Now, this does not work with AAC (it works with Ogg/Vorbis, though). With AAC I get artifacts in the generated sound: at the boundary of each second, the audio starts at zero and the waveform then gradually grows back to its normal size. This lasts for 10 to 20 milliseconds.
I believe the problem is caused by missing "priming samples". Maybe the Web Audio API expects "priming samples" at the start of each AAC chunk, does not find them, and thus modifies the actual audio.
The question is: how can I split the original AAC stream and send "good" AAC chunks of one second?
From what I have understood, I should include at the start of each chunk the last two frames of the previous chunk. However, this number can vary and there is not much documentation. Some expert advice is appreciated.
I am using the following method. I am not an AAC expert, so I may be missing something, but experimentally it works.
Assuming that the Chrome decoder expects priming samples at the start of each chunk, I do the following: before sending a chunk to the server, I prepend the last 4 AAC frames of the previous chunk (skipping this for the first chunk). Client-side, I retrieve a chunk, decode it, and then remove the first 4*1024 samples (1024 = samples in one AAC frame).
This is working.
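In code, the sending side of that scheme might look like this sketch, where each chunk is a list of encoded AAC frames and OVERLAP_FRAMES = 4 is the experimentally determined overlap:

import java.util.ArrayList;
import java.util.List;

// Prepend the last OVERLAP_FRAMES encoded AAC frames of the previous chunk,
// giving the decoder material to prime on. The client decodes each chunk and
// discards the first OVERLAP_FRAMES * 1024 samples (except for chunk 0).
static final int OVERLAP_FRAMES = 4;

List<byte[]> buildChunkToSend(List<byte[]> previousChunk, List<byte[]> currentChunk) {
    List<byte[]> out = new ArrayList<>();
    if (previousChunk != null) {
        int n = previousChunk.size();
        out.addAll(previousChunk.subList(Math.max(0, n - OVERLAP_FRAMES), n));
    }
    out.addAll(currentChunk);
    return out;
}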

Increase PCM sample data size

I've noticed when using MediaExtractor that, since a movie has far more audio samples than video frames, a one-image-decode / one-sound-sample-decode / one-image-encode / one-sound-sample-encode loop is not a good strategy: it always ends up with many more audio samples queued for encoding.
Is it possible when using MediaExtractor to have a custom PCM sample size greater than 4096 bytes (or whatever the size is for the context)? 8192 would be great.
For:
int size = videoExtractor.ReadSampleData (decoderInputBuffer, 0);
Size will always be 4096.
It means that for 44100 Hz stereo 16-bit sound, each read represents roughly 23 ms, i.e. about 43 audio chunks per second, against sometimes only 24 images per second. Doubling the PCM sample size would let the video always have at least as much audio as it needs ready for encoding. It would even allow me to sync at 24 fps and play the audio samples using AudioTrack, opening up the possibility of a live preview of my rendering.
I've tried:
inputFormat.SetInteger (MediaFormat.KeyMaxInputSize, 8192);
But it looks more like an optimization hint that sets the decoder's minimum internal buffer size.
Is the size customizable?
[EDIT]
For now, I encode two audio samples for each video frame when queuing, and the change is not noticeable in the video encoding speed. As predicted, audio and video now finish at almost the same time.
while (_shouldCopyAudio && encodeMoreAudioThanVideo++ < 2) {
    [...]
    audioEncoder.QueueInputBuffer (encoderInputBufferIndex, 0, size,
        pcmChunk.PresentationTimeUs, (MediaCodecBufferFlags)pcmChunk.Flags);
}
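In plain Java terms, the same workaround looks roughly like the sketch below; pcmQueue, PcmChunk, and TIMEOUT_US are hypothetical names standing in for the real pipeline state:

// Feed up to two queued PCM chunks to the audio encoder per video frame,
// so audio keeps pace with video instead of piling up behind it.
int audioChunksThisFrame = 0;
while (shouldCopyAudio && audioChunksThisFrame < 2 && !pcmQueue.isEmpty()) {
    int encoderIndex = audioEncoder.dequeueInputBuffer(TIMEOUT_US);
    if (encoderIndex < 0) break;   // encoder not ready; retry on the next frame
    PcmChunk chunk = pcmQueue.poll();
    ByteBuffer encoderInputBuffer = audioEncoder.getInputBuffer(encoderIndex);
    encoderInputBuffer.put(chunk.data);
    audioEncoder.queueInputBuffer(encoderIndex, 0, chunk.data.length,
            chunk.presentationTimeUs, chunk.flags);
    audioChunksThisFrame++;
}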

Mux video with my own audio PCM track

Using Android MediaMuxer, what would be a decent way to add my own PCM track as the audio track in the final movie?
In the movie, at a certain time, I slow down, stop, then accelerate and restart the video. For the video part it's easy to directly adjust the presentation time, but for audio the chunk-by-chunk processing makes it less intuitive to handle a slowdown, a stop, and a restart in the audio track.
Currently, while iterating through the buffers I receive from the source, I do the following to slow down the whole track:
// Multiply the presentation time by the slowdown ratio (3 here).
audioEncoderOutputBufferInfo.PresentationTimeUs =
    audioEncoderOutputBufferInfo.PresentationTimeUs * ratio;
// Expand the sample data by 3. (I realize I haven't respected the
// sample alignment, but the problem here isn't white noise...)
encoderOutputBuffer = Slowdown(encoderOutputBuffer, 3);
// Then write it to the muxer.
muxer.WriteSampleData(outputAudioTrack, encoderOutputBuffer, audioEncoderOutputBufferInfo);
But this just doesn't play. Of course, since the MediaFormat from the source was copied to the destination, the declared duration is 3 times shorter than the actual audio data.
Could I just take the whole PCM from an input, edit the byte[] array, and add it as a track to the MediaMuxer?
If you want to slow down your audio samples you need to do this before you encode them, so before you queue the input buffer of your audio codec.
From my experience, audio presentation timestamps are ignored by most players out there (I tried with VLC and ffplay). If you want to make sure that audio and video stay in sync, you must actually have enough audio samples to fill the gap between two PTS values; otherwise the player will just start playing the following samples regardless of their PTS.
Furthermore you cannot just mux PCM samples using the MediaMuxer, you need to encode them first.
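As a sketch of the PCM approach described above - naive sample repetition, which also lowers the pitch; a real time-stretch algorithm (e.g. WSOLA) would sound better - assuming 16-bit interleaved PCM:

// Stretch 16-bit interleaved PCM by an integer ratio by repeating each
// frame (one sample per channel) 'ratio' times. Run this on decoder
// output and feed the result to the encoder - never to encoded AAC.
static short[] slowDownPcm(short[] input, int channels, int ratio) {
    int frames = input.length / channels;
    short[] out = new short[frames * ratio * channels];
    int o = 0;
    for (int f = 0; f < frames; f++) {
        for (int r = 0; r < ratio; r++) {
            for (int c = 0; c < channels; c++) {
                out[o++] = input[f * channels + c];
            }
        }
    }
    return out;
}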

Android MediaCodec - long processing for each frame

Edit, as I wasn't clear the first time:
I'm trying to use Android MediaCodec to get each frame from an existing video file (videoBefore.MP4), process the frame (e.g. blur it), and then encode each frame into a new video file (videoAfter.MP4).
The new video has to have the same duration as the first.
Just one condition:
Every frame may take an unlimited time to process, meaning a 10-second video could take 1 minute of processing.
So far I have only seen examples with quick processing (like a blue shift) that can be done in real time.
Is there any way to grab the frames from the video, "take my time" processing each one, and still preserve the new video's frame rate and frame timing?
(It would be even better if I could preserve the audio too, but the frames are what matter.)
Thanks!
You can take as long as you like. The timing of the frames is determined by the presentation time stamps embedded in the .mp4 file, not the rate at which the codec is fed.
You get the time value for each frame from MediaExtractor#getSampleTime(), pass it into the decoder's queueInputBuffer(), and receive it in the BufferInfo struct associated with the decoder's output buffer. Do your processing and submit the frame to the encoder, again specifying the time stamp in queueInputBuffer(). It will be passed through BufferInfo to the output side of the encoder, and you just pass the whole BufferInfo to MediaMuxer#writeSampleData().
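A sketch of that hand-off, with placeholder variable names rather than any specific test's code:

// Extractor -> decoder: attach the container timestamp to the input buffer.
long ptsUs = extractor.getSampleTime();
decoder.queueInputBuffer(inputIndex, 0, sampleSize, ptsUs, 0);

// Decoder output: the same timestamp arrives in BufferInfo.
int outputIndex = decoder.dequeueOutputBuffer(decoderInfo, TIMEOUT_US);
// ... arbitrarily slow frame processing happens here ...

// Processed frame -> encoder: pass the preserved timestamp straight through.
encoder.queueInputBuffer(encoderIndex, 0, frameSize, decoderInfo.presentationTimeUs, 0);

// Encoder output -> muxer: writeSampleData() reads the timestamp from BufferInfo.
muxer.writeSampleData(videoTrackIndex, encodedBuffer, encoderInfo);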
You can see the extraction side in ExtractMpegFramesTest and the muxing side in EncodeAndMuxTest. The DecodeEditEncodeTest does the encode/decode preserving the time stamp, but doesn't show the MediaExtractor or MediaMuxer usage.
Bear in mind that the codecs don't really care about time stamps. It's just the extractor/muxer code that handles the .mp4 file that cares. The value gets passed through the codec partly as a convenience, and partly because it's possible for encoded frames to appear out of order. (The decoded frames, i.e. what comes out of the decoder, will always be in order.)
If you fail to preserve the presentation times, you will get video that either lasts zero seconds (and isn't very interesting), or possibly video that lasts a very, very long time. The screenrecord command introduced in Android 4.4 uses the time stamps to avoid recording frames when the screen isn't being updated.
