I have some automated tests that decode a few m4a files to PCM data using Android's MediaCodec and MediaExtractor. The files are generated with various encoders: fdk-aac, ffmpeg (with fdk or the default aac encoder), and iOS.
On Android 9 the test fails for the clips created with ffmpeg, producing empty PCM files. The same clips decode fine on older versions of Android.
I double-checked my code, and the decoding process goes as expected (a minimal sketch of the loop follows this list):
Extract the compressed data using MediaExtractor.
Enqueue it to the codec.
Dequeue the output buffers from the codec.
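A minimal sketch of that loop, assuming a single audio track at index 0 (inputPath is a placeholder; error handling omitted):

MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(inputPath);
MediaFormat format = extractor.getTrackFormat(0);
extractor.selectTrack(0);

MediaCodec codec = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME));
codec.configure(format, null, null, 0);
codec.start();

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
boolean inputDone = false, outputDone = false;
while (!outputDone) {
    if (!inputDone) {
        int inIndex = codec.dequeueInputBuffer(10000);
        if (inIndex >= 0) {
            int size = extractor.readSampleData(codec.getInputBuffer(inIndex), 0);
            if (size < 0) {
                // No more compressed data: signal end of stream to the codec.
                codec.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                inputDone = true;
            } else {
                codec.queueInputBuffer(inIndex, 0, size, extractor.getSampleTime(), 0);
                extractor.advance();
            }
        }
    }
    int outIndex = codec.dequeueOutputBuffer(info, 10000);
    if (outIndex >= 0) {
        ByteBuffer pcm = codec.getOutputBuffer(outIndex);
        // ... write info.size bytes of PCM from 'pcm' to the output file ...
        codec.releaseOutputBuffer(outIndex, false);
        outputDone = (info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0;
    }
}
codec.stop();
codec.release();
extractor.release();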
The issue is that by the time the last available input buffer is enqueued and the output buffer with MediaCodec.BUFFER_FLAG_END_OF_STREAM is dequeued, all output buffers are empty!
Then I noticed that the MediaFormat info extracted from the audio file with MediaExtractor.getTrackFormat(int track) contains an undocumented "encoder-delay" key.
For Android 8 and lower, that key is only present for m4a clips that carry the iTunSMPB tag info. Here's a summary of the values I get for my test files:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: key not present
ffmpeg: key not present
ffmpeg (fdk): key not present
On Android 9, instead, I get the following results:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: 2048 frames
ffmpeg: 45158 frames
ffmpeg (fdk): 90317 frames
It looks like something has changed and MediaExtractor is now able to retrieve the encoder delay for all the files under test. This is good in theory, since the files with no "encoder-delay" info do show a delay in the decoded PCM data (this was a known issue).
But... while the value for the "fdkaac with ISO delay info" case is correct and leads to a valid PCM file with no initial padding (finally!), the values for the ffmpeg-generated files look huge and are most likely wrong!
I know the real encoder delay is 1024 frames for the ffmpeg case and 2048 for the ffmpeg (fdk) case, and I think the huge value for the key in the extracted format is the reason why the resulting file is empty.
In fact, if I set the "encoder-delay" key to 0 in the format just before passing it to MediaCodec.configure(...), I get the correct uncompressed data with the expected delay.
My guess at this point is that MediaExtractor's encoder-delay retrieval has a bug, but maybe there's something I am overlooking.
Since ffmpeg is very popular, it's quite likely that many of my app's users will try importing files generated with it, and at this point I can't see a foolproof solution to the issue.
Does anyone have a suggestion / workaround?
I opened an issue on the android issue tracker:
https://issuetracker.google.com/issues/118398811
For now I've implemented a workaround: when the "encoder-delay" key is present in the MediaFormat and holds an impossibly high value, I set it to zero. Something like:
// THRESHOLD is an arbitrary sanity limit, e.g. a few times the largest
// legitimate delay seen in practice (2112 frames in these tests).
if (format.containsKey("encoder-delay") && format.getInteger("encoder-delay") > THRESHOLD) {
    format.setInteger("encoder-delay", 0);
}
NB: this means the initial gap will not be trimmed away, but for m4a files that lack such info this is already the case on pre-Android-9 devices.
Related
I want to trim an existing AAC/MP4 audio file. As a first step I want to "trim" 0 bytes, basically just copying the file using MediaCodec/MediaExtractor.
Questions:
Is the header a fixed size so I can just copy it from the old file? Or does it contain info about the track duration that I'd need to update? If it is a fixed size, how big is it (so I know how many bytes to copy from the old file)?
Should I only use the extractor's readSampleData(ByteBuffer, offset) and advance(), or should I also use MediaCodec to decode the samples and then re-encode them with an encoder, writing out the encoded data?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how you want to do the trimming. It's absolutely simplest to not involve MediaCodec at all, but just copy packets from MediaExtractor to MediaMuxer, and skip the packets at the start that you want to omit (or use seekTo() for seeking to the right start position).
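A rough sketch of that packet-copy approach, assuming a single audio track at index 0 (inputPath, outputPath and trimStartUs are placeholders; error handling omitted):

MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(inputPath);
MediaFormat format = extractor.getTrackFormat(0);
extractor.selectTrack(0);

MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
int dstTrack = muxer.addTrack(format);
muxer.start();

extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int size;
while ((size = extractor.readSampleData(buffer, 0)) >= 0) {
    // The sync-frame bit of the extractor's sample flags lines up with
    // MediaCodec.BUFFER_FLAG_SYNC_FRAME, so it can be passed through here.
    info.set(0, size, extractor.getSampleTime(), extractor.getSampleFlags());
    muxer.writeSampleData(dstTrack, buffer, info);
    extractor.advance();
}
muxer.stop();
muxer.release();
extractor.release();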
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which at 48 kHz is about 21 ms. So if you only copy whole packets, you can't trim with finer granularity than about 21 ms at 48 kHz. That's probably fine for most cases, but if the audio has a lower sample rate, say 8 kHz, the granularity grows to 128 ms.
If you want to trim to a more exact position than individual packets allow, you need to decode with MediaCodec, skip the right number of samples, repackage the decoder's output frames into new full frames for the encoder, and re-encode.
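On the decode side, the skipping could look something like this, assuming 16-bit PCM output (trimStartUs, sampleRate and channelCount are placeholders, and feedToEncoder() is a hypothetical helper that repackages the remaining samples into full frames for the encoder):

// Bytes to drop from the start of the decoded stream (2 bytes per 16-bit sample).
long bytesToSkip = trimStartUs * sampleRate / 1000000 * channelCount * 2;

void onDecodedBuffer(ByteBuffer pcm, MediaCodec.BufferInfo info) {
    int skip = (int) Math.min(bytesToSkip, info.size);
    bytesToSkip -= skip;
    pcm.position(info.offset + skip);
    pcm.limit(info.offset + info.size);
    feedToEncoder(pcm); // hypothetical: buffers and re-encodes the remaining PCM
}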
I'm having a very difficult time with MediaCodec. I've used it previously to decode a raw H.264 stream and learned a significant amount. At least I thought I had.
My stream is H.264 in Annex B format. Looking at the raw data, the structure of my NAL unit types is as follows:
[0x09][0x09][0x06] and then 8 packets of [0x21].
[0x09][0x09][0x27][0x28][0x06] and then 8 packets of [0x21].
This is not how I receive them, though; I am attempting to build a complete Access Unit from these raw NAL unit types.
The first thing that's strange to me is the double [0x09], which is the Access Unit Delimiter packet. I'm pretty sure the H.264 spec allows only one AUD per Access Unit. BTW, I am able to record the raw data and play it with ffmpeg, both with and without the extra AUD. For now, I detect this case and strip the first AUD off before sending the entire Access Unit to the MediaCodec (sketched below).
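Roughly, the stripping looks like this (a sketch assuming 4-byte start codes and the usual 2-byte AUD NAL; a robust version would scan for start codes instead of using fixed offsets):

static byte[] stripDuplicateAud(byte[] au) {
    // au[4] is the first NAL header after the 00 00 00 01 start code; an AUD is
    // 2 bytes (header 0x09 + primary_pic_type byte), so a second start code
    // would begin at offset 6 and its NAL header would sit at offset 10.
    boolean twoAuds = au.length > 10
            && (au[4] & 0x1F) == 9
            && au[9] == 1 && (au[10] & 0x1F) == 9;
    return twoAuds ? Arrays.copyOfRange(au, 6, au.length) : au;
}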
The second thing is, I have hardcoded the SPS/PPS byte arrays [0x27/0x28], and I set them in the MediaFormat used to initialize the MediaCodec, similar to:
format.setByteBuffer("csd-0", ByteBuffer.wrap( mySPS ));
format.setByteBuffer("csd-1", ByteBuffer.wrap( myPPS ));
My video stream provider vendor tells me the video is 1280 x 720; however, when I convert it to an mp4 file, the metadata says it's 960 x 720. Another oddity.
Changing these various parameters around, I am still unable to get a valid buffer index in the thread that processes the decoder output (dequeueOutputBuffer returns -1, INFO_TRY_AGAIN_LATER). I have also varied the timeout, to no avail. If I manually send the SPS/PPS as the first packet instead of using the example above, I do get the -3 INFO_OUTPUT_BUFFERS_CHANGED result, which is meaningless since I am using API 20. But everything else I get back is -1.
I've read about the Emulation Prevention Byte encoding in H.264. I am able to strip those bytes out and send the result to the MediaCodec; it doesn't seem to make a difference. Also, the MediaCodec documentation doesn't explicitly say whether it expects the EPBs to be stripped out or not...?
Other than the video frame resolution, the only other difference from my previous success is the presence of the SEI packet type [0x06]. I'm not sure if I should be doing something special with it or not.
I know a number of folks who have used MediaCodec have had issues with it, mostly because the documentation is not very good. Can anyone offer any advice as to what I could be doing wrong?
I am successfully using MediaCodec to decode audio, however when I load a file with 24-bit samples, I have no way of knowing this has occurred. Since the application was assuming 16-bit samples, it fails.
When I print the MediaFormat, I see
{mime=audio/raw, durationUs=239000000, bits-format=6, channel-count=2, channel-mask=0, sample-rate=96000}
I assumed that "bits-format" would be a hint; however, this key is not declared in the API, and it is not actually emitted when the output format changes. Instead I get
{mime=audio/raw, what=1869968451, channel-count=2, channel-mask=0, sample-rate=96000}
(By the way, what is the "what" key? I notice that if I interpret it as a four-char code, it is "outC"... just a flag that it is an output format?)
So what is the best recourse here? If I feed the ByteBuffer straight to AudioTrack (configured for 16-bit PCM), it plays static, of course.
If I know the value, then I can convert it myself!
I understand from other questions that you cannot dictate the output format either.
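For what it's worth, if the output really were packed little-endian 24-bit PCM, the conversion to 16-bit could be as simple as dropping the low byte of each sample (a hypothetical sketch, no dithering):

static byte[] pcm24To16(byte[] in) {
    byte[] out = new byte[in.length / 3 * 2];
    for (int i = 0, o = 0; i + 2 < in.length; i += 3, o += 2) {
        out[o] = in[i + 1];     // middle byte becomes the new low byte
        out[o + 1] = in[i + 2]; // high byte stays the high byte
    }
    return out;
}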
I'm trying to draw a custom waveform view from a decoded audio file. The problem is that when I use the MediaCodec class to decode an m4a sound file to pcm, the decoding process may take a long time.
I followed this CTS test to get the decoded array.
Is there any way to speed up the decoding process, or to get the relevant information about the audio frames without decoding the whole file each time?
You don't need to wait for the whole file to be decoded; you can use the decoded data from each frame as soon as it is available.
In the CTS test you linked, have a look at lines 223-234 - here a buffer of decoded data has been received from the decoder, which is copied into an output buffer. If you change this to do the processing of the data right there, you can do your handling of the decoded data immediately without waiting for the whole file. (This is how playback of a file would be done as well.)
Keep in mind that you can't really assume the size of the output frames; it depends on the frame size of the codec used in the file. (For the most common AAC profiles it is 1024 samples.) So depending on how you want to process your output data, you might want to buffer it up into slightly larger chunks.
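For a waveform view, the per-buffer handling could be a sketch like this, assuming 16-bit PCM output (call it once for each output buffer you dequeue, before releasing it):

void onDecodedChunk(ByteBuffer pcm, List<short[]> peaks) {
    ShortBuffer samples = pcm.order(ByteOrder.nativeOrder()).asShortBuffer();
    short min = Short.MAX_VALUE, max = Short.MIN_VALUE;
    while (samples.hasRemaining()) {
        short s = samples.get();
        if (s < min) min = s;
        if (s > max) max = s;
    }
    peaks.add(new short[] { min, max }); // one min/max pair per chunk, drawable as it arrives
}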
I'm writing an audio streaming app that buffers AAC file chunks, decodes those chunks to PCM byte arrays, and writes the PCM audio data to AudioTrack. Occasionally, I get the following error when I try to either skip to a different song, call AudioTrack.pause(), or AudioTrack.flush():
obtainbuffer timed out -- is cpu pegged?
When that happens, a split second of audio continues to play. I've tried reading a set of AAC files from the sdcard and got the same result. The behavior I'm expecting is that the audio stops immediately. Does anyone know why this happens? I wonder if it's an audio latency issue with Android 2.3.
edit: The AAC audio contains an ADTS header. The header plus the audio payload constitute what I'm calling an ADTSFrame; these are fed to the decoder one frame at a time. The resulting PCM byte array returned from the C layer to the Java layer is fed to Android's AudioTrack API.
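In case it's relevant, the frame splitting uses the 13-bit frame_length field of the ADTS header (it spans bytes 3-5), roughly like:

// Total ADTS frame length (header + payload) from the 13-bit frame_length field.
static int adtsFrameLength(byte[] data, int offset) {
    return ((data[offset + 3] & 0x03) << 11)
         | ((data[offset + 4] & 0xFF) << 3)
         | ((data[offset + 5] & 0xE0) >> 5);
}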
edit 2: I got my Nexus 7 (Android 4.1) today and loaded the same app onto the device. I didn't have any of these problems at all.
This is quite possibly a sample rate issue: one of your devices may support the sample rate you used while the other does not. Please check it. I had the same issue, and it turned out to be the sample rate; use 44.1 kHz (44100) and try again.
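Something like this, for 16-bit stereo PCM (using the pre-API-26 AudioTrack constructor, which fits the Android 2.3 era in question):

int minBuf = AudioTrack.getMinBufferSize(44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        minBuf, AudioTrack.MODE_STREAM);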