I want to trim an existing aac-mp4 audio file. For the first time I want to "trim" 0 bytes, basically just to copy the file using MediaCodec/MediaExtractor.
Questions:
The header is fixed size and I can just copy it from the old file? Or it has some infos about the track duration and I need to update it? If it has fixed size which is that (in order to know how many bytes should I copy from the old file)?
Should I only use the extractor's getSampleData(ByteBuffer, offset) and advance() or I should also use the MediaCodec and extract the samples(decode) and then encode them again with an encoder - and write the encoded values?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how you want to do the trimming. It's absolutely simplest to not involve MediaCodec at all, but just copy packets from MediaExtractor to MediaMuxer, and skip the packets at the start that you want to omit (or use seekTo() for seeking to the right start position).
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which for 48 kHz audio is 21 milliseconds. So if you only copy individual packets, you can't get any closer trimming granularity than 21 milliseconds, for 48 kHz. This probably is fine for most cases, but if the audio has a lower sample rate, say 8 kHZ, the granularity ends up as high as 128 ms.
If you want to trim to a more exact position than the individual packets allow you, you need to decode using MediaCodec, skip the right amount of samples, repackage output frames from the decoder into new full frames for the encoder, and encode this.
Related
How do I decode something on the order of a 1000 bytes of PCM audio from an mp3 file, without decoding the whole thing?
I need to mix together four to six tracks, to one, so that they're played simultaneously on an AudioTrack in the Android app.
This can be done if I can get a stream of PCM samples, and so simple add the decoded tracks together (and maybe adjust for clipping and volume), and then write them to an AudioTrack buffer.
That part is simple.
But how do I decode the individual mp3 files, to inputstreams I can get byte arrays from? I've found something called JLayer, but its not quite clear to me how to do this.
I'd rather avoid doing it in C++ (I'm a bit rusty, and my team doesn't like it), though if that's needed I can do it. Though I'd need a short example of how get say 240 decoded bytes from a file via mpg123, or other such libraries.
Any help is appreciated.
The smallest you can do is 576 samples, which is the smallest MP3 frame size. However, most MP3 streams use the bit reservoir meaning you likely have to decode frames around the frame you want to decode as well.
Complicating things further, bare MP3 streams don't have any internal timestamping, so if you want to drop accurately in the middle of a file, you have to decode up until that point. (MP3 frame headers don't contain byte lengths, so you can't just skim frame headers accurately.) You can try to needle-drop into the middle of the file based on byte length, but this isn't an accurate way of seeking and can be off by several seconds, even for CBR. For VBR, it's all over the place.
It sounds like all you need to do is have a stream decoder, so that the decoding happens as playback is occurring. I'm no Android developer, but it seems you can just use AudioTrack from the framework, in streaming mode. https://developer.android.com/reference/android/media/AudioTrack.html And then the MediaCodec to actually do the decoding. https://developer.android.com/reference/android/media/MediaCodec.html Android devices support MP3, so you don't need to do anything else.
i would like to use opus codec in my linphone application
but i have a few questions , if someone with opus codec knowledge could help me out would appreciate
OPUS. Does this codec compress as well as package the data?
What is the output data structure from OPUS?
Is the output data streaming or packets?
What does the audio sampling scheme look like?
and….
Within the audio sampling scheme, what are the values for silence?
Within the audio sampling scheme, what are the values for speech?
thx in advance
I see you asked this on the mailing list too, but I'll answer here. I'm not sure what you mean in some the questions, but here's a start. You've tagged your post as relating to Android; I mostly know about the reference C implementation, so if you're asking about the java interface available to Android applications this won't be much help.
OPUS. Does this codec compress as well as package the data?
The Opus codec compressed pcm audio data into packets. There's internal structure, but the codec requires a transport layer like RTP to keep track of the boundaries between compressed packets.
What is the output data structure from OPUS?
The reference encoder accepts a given duration of pcm audio data an fills in a given buffer with compressed data up to a maximum requested size. See opus_encode() and opus_encode_float() in the encoder documentation for details.
Is the output data streaming or packets?
Opus produces a sequence of packets.
What does the audio sampling scheme look like? and….
The reference encoder accepts interleaved mono, stereo, or surround pcm audio data with either 16-bit signed integer or floating point samples at 8, 12, 16, 24, or 48 kHz.
Within the audio sampling scheme, what are the values for silence?
Zero pcm values are silence. As a perceptual codec Opus will try to encode low-level noise if there is no other signal. There is also support for special zero-data compressed packets for sending silence or handling discontinuous transmission.
Within the audio sampling scheme, what are the values for speech?
I'm not sure what you're asking here. Speech is treated the same as music, and will sound equally normal down to 64 kbps. The codec can maintain transparency for speech down to much lower bitrates than for music (something like 24 kbps for mono) and is intelligible down to 6 kbps for narrowband speech.
I am encoding raw audio to AAC with the MediaCodec API of Android. The problem: I need to send to a server the AAC stream in chunks of one second. So I need to split the stream. Right now, since an AAC frame is 1024 samples, I take round(SAMPLE_RATE/1024) AAC frames for each chunk. However, because of "priming samples" this simple cutting of the AAC stream does not work.
More details follow. After sending a chunk to the server, a client receives it in the web browser Chrome and using Web Audio API plays all received chunks. The playback is done in such a way to be gapless: a large audiobuffer is initially allocated, the received chunks are decoded and copied in the audiobuffer, the audiobuffer is played.
Now, this does not work with AAC (it works with Ogg/Vorbis though). With AAC I have artifacts in the generated sound. At end of each second the start of the next second is zero, then, gradually, the waveform grows until it has a normal size. This lasts for 10, 20 milliseconds.
I believe the problem is caused by missing "priming samples". Maybe the Web Audio API is expecting "priming samples" at the start of each AAC chunk, it does not find them and thus modifies the actual audio.
The question is: how can I split the original AAC stream and send "good" AAC chunks of one second?
From what I have understood, I should include at the start of each chunk the previous two frames (last two frames of the previous chunk). However, this number should vary and there is not much documentation. Some expert advice is appreciated.
I am using the following method. I am not an expert of AAC so I may be missing something, but experimentally it is working.
Assuming that the Chrome decoder is expecting priming samples at the start of each chunk I do the following: before sending a chunk to the server, I add at its beginning the last 4 AAC frames of the previous chunk (if it is the first chunk I do not do this). Client-side, I retrieve a chunk, I decode it and the remove the first 4*1024 samples (1024 = samples in one AAC frame).
This is working.
I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.
I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place in which uncompressed audio data is available for editing.
For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:
MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);
My problem is that the data adjustments appear to affect only the data copied onto the output audio buffer, but not to shorten the audio frame that gets written into the MediaMuxer. The output video either ends up with several milli-seconds of missing audio at the end of the clip, or if I write too much data the audio frame gets dropped completely from the end of the clip.
How to properly trim an Audio Frame?
There's a few things at play here:
As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer - you already took the offset of the decoder output buffer into account when you set the buffer position with decoderOutputBuffer.position(info.offset);. But perhaps you update it somehow already.
I'm not sure if MediaCodec audio encoders allow you to pass audio data in arbitrary sized chunks, or it you need to send it exactly full audio frames at a time. I think it might accept it though - then you're fine. If not, you need to buffer the audio up yourself and pass it to the encoder once you have a full frame (in case you trimmed out some at the start)
Keep in mind that audio also is frame based (for AAC, it's 1024 samples frames unless you use the low delay variants or HE-AAC), so for 44 kHz, you can have audio duration only with a 23 ms granularity. If you want your audio to end precisely after the right amount of samples, you need to use container signaling to indicate this. I'm not sure if the MediaCodec audio encoder flushes whatever half frame you have at the end, or if you manually need to pass it extra zeros at the end in order to get the last few samples, if you aren't aligned to the frame size. It might not be needed though.
Encoding AAC audio does introduce some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number of these depends on the encoder - for the software encoder in Android for AAC-LC, it's probably 2048 samples, but it might also vary). For the case of 2048 samples, it exactly lines up with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the 2 first output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to be able to properly represent whatever samples your stream starts with, otherwise it will more or less converge towards your intended audio within 2048 samples.)
I've noticed when using MediaExtractor that since there is a lot more audio samples than video frames for a movie, using a One-Image-Decode/One-SoundSample-Decode/One-Image-Encode/OneSound-SoundSample-Encode is not a good strategy as it always ends up by having a lot more audio samples queued for the encoding.
Is this possible when using MediaExtractor to have a custom PCM sample size greater than 4096 bytes (or whatever the size is for the context)? 8192 would be great.
For:
int size = videoExtractor.ReadSampleData (decoderInputBuffer, 0);
Size will always be 4096.
It means that for a 44100, stereo, 16 bits, sound, this represents +- 23ms, which is more or less 43.5 audio samples per second for, sometime, 24 images per second. Doubling the sample size of PCM would allow the video to always have equally or more of audio ready for encoding. This would even allow me to maybe sync it at 24 fps and play the audio samples using AudioTrack, so it opens possibilities for a live preview of my rendering.
I've tried:
inputFormat.SetInteger (MediaFormat.KeyMaxInputSize, 8192);
But it looks more like an optimization setting to set the minimum internal buffer for the decoder.
Is the size customizable?
[EDIT]
For now, I encode two audio samples for each video frame when queuing and the performance change is not noticeable for the video encoding speed. As predicted this ends almost at the same time.
while (_shouldCopyAudio && encodeMoreAudioThanVideo++ < 2) {
[...]
audioEncoder.QueueInputBuffer (encoderInputBufferIndex, 0, size,
pcmChunk.PresentationTimeUs, (MediaCodecBufferFlags)pcmChunk.Flags);