Drawing an audio waveform from a decoded sound file in Android

I'm trying to draw a custom waveform view from a decoded audio file. The problem is that when I use the MediaCodec class to decode an m4a sound file to PCM, the decoding process can take a long time.
I followed this CTS test to get the decoded array.
Is there any way to speed up the decoding, or to get the relevant information about the audio frames without decoding the whole file each time?

You don't need to wait for the whole file to be decoded; you can use the decoded data from each frame as soon as it is available.
In the CTS test you linked, have a look at lines 223-234: there, a buffer of decoded data received from the decoder is copied into an output buffer. If you instead process the data right there, you can handle each decoded chunk immediately without waiting for the whole file. (This is how playback of a file would be done as well; see the sketch below.)
Keep in mind that you can't really assume the size of the output frames; it depends on the frame size of the codec used in the file. (For the most common versions of AAC, it is 1024 samples.) So depending on how you process your output data, you may want to buffer it up into slightly larger chunks.
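For reference, here's a minimal sketch of such a loop using the newer (API 21+) per-index buffer accessors instead of the deprecated buffer arrays used in the CTS test. onPcmChunk() is a placeholder for whatever per-chunk waveform processing you do (e.g. tracking the peak amplitude per bucket):
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(path);                       // assumes track 0 is the audio track
MediaFormat format = extractor.getTrackFormat(0);
extractor.selectTrack(0);

MediaCodec decoder = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME));
decoder.configure(format, null, null, 0);
decoder.start();

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
boolean inputDone = false, outputDone = false;
while (!outputDone) {
    if (!inputDone) {
        int inIndex = decoder.dequeueInputBuffer(10000);
        if (inIndex >= 0) {
            int size = extractor.readSampleData(decoder.getInputBuffer(inIndex), 0);
            if (size < 0) {
                decoder.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                inputDone = true;
            } else {
                decoder.queueInputBuffer(inIndex, 0, size, extractor.getSampleTime(), 0);
                extractor.advance();
            }
        }
    }
    int outIndex = decoder.dequeueOutputBuffer(info, 10000);
    if (outIndex >= 0) {
        ByteBuffer pcm = decoder.getOutputBuffer(outIndex);
        onPcmChunk(pcm, info.size);                  // process this chunk immediately
        decoder.releaseOutputBuffer(outIndex, false);
        if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) outputDone = true;
    }
}
decoder.stop(); decoder.release(); extractor.release();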

Related

Android 9 AAC decoder outputs zero samples with ffmpeg-encoded files

I have some automated tests that decode a few m4a files to PCM data using Android's MediaCodec and MediaExtractor. The files are generated with various encoders: fdk-aac, ffmpeg (with fdk or the default aac encoder), and iOS.
On Android 9 the test fails for the clips created with ffmpeg, resulting in empty PCM files. The same clips are decoded fine on older versions of Android.
I double-checked my code, and the decoding process goes as expected:
Extract compressed data using MediaExtractor.
Enqueue it to the codec.
Dequeue the output buffer from the codec.
The issue is that by the time the last available input buffer is enqueued and the output buffer with MediaCodec.BUFFER_FLAG_END_OF_STREAM is dequeued, all output buffers are empty!
Then I noticed that the MediaFormat info extracted from the audio file with MediaExtractor.getTrackFormat(int track) contains an undocumented "encoder-delay" key.
For Android 8 and lower, that key is only present for m4a clips encoded with the iTunSMPB tag info. Here's a summary of the values I get for my test files:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: key not present
ffmpeg: key not present
ffmpeg (fdk): key not present
On Android 9, instead, I get the following results:
iOS-encoded file: 2112 frames
fdkaac with iTunSMPB tag: 2048 frames
fdkaac with ISO delay info: 2048 frames
ffmpeg: 45158 frames
ffmpeg (fdk): 90317 frames
It looks like something has changed and MediaExtractor is now able to retrieve the encoder delay for all the files under test. This is good in theory, since the files with no "encoder-delay" info do show a delay in the decoded PCM data (this was a known issue).
But... while the value for the "fdkaac with ISO delay info" case is correct and leads to a valid PCM file with no initial padding (finally!), the values for the ffmpeg-generated files look huge and likely wrong!
I know the real encoder delay values are 1024 for the ffmpeg case and 2048 for the ffmpeg (fdk) case, and I think the huge value for the key in the extracted format is the reason why the output file is empty.
In fact, if I set the "encoder-delay" key to 0 in the format just before passing it to MediaCodec.configure(...), I get the correct uncompressed data with the expected delay.
My guess at this point is that MediaExtractor's encoder-delay retrieval has a bug, but maybe there's something I'm overlooking.
Since ffmpeg is quite popular, many of my app users are likely to import files generated with it, and at this point I can't see a foolproof solution to the issue.
Does anyone have a suggestion / workaround?
I opened an issue on the Android issue tracker:
https://issuetracker.google.com/issues/118398811
And for now I just implemented a workaround: when the "encoder-delay" value is present in the MediaFormat object and is impossibly high, I set it to zero. Something like:
// Treat absurdly large values as bogus and clear them before configure().
if (format.containsKey("encoder-delay") && format.getInteger("encoder-delay") > THRESHOLD) {
    format.setInteger("encoder-delay", 0);
}
NB: This means the initial gap will not be trimmed away, but for m4a files that don't have such info this is already the case on pre-Android-9 devices.
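For context, a minimal sketch of where that sanitization sits in the setup flow (audioTrackIndex and THRESHOLD are placeholders):
MediaFormat format = extractor.getTrackFormat(audioTrackIndex);
// Clear a bogus delay value before the decoder ever sees it.
if (format.containsKey("encoder-delay") && format.getInteger("encoder-delay") > THRESHOLD) {
    format.setInteger("encoder-delay", 0);
}
codec.configure(format, null, null, 0);              // configure with the sanitized format
codec.start();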

Decoding only some PCM bytes at a time from an mp3 file

How do I decode something on the order of 1000 bytes of PCM audio from an mp3 file, without decoding the whole thing?
I need to mix four to six tracks into one so that they're played simultaneously on an AudioTrack in the Android app.
This can be done if I can get a stream of PCM samples: simply add the decoded tracks together (and maybe adjust for clipping and volume), then write them to an AudioTrack buffer.
That part is simple.
But how do I decode the individual mp3 files into input streams I can get byte arrays from? I've found something called JLayer, but it's not quite clear to me how to do this.
I'd rather avoid doing it in C++ (I'm a bit rusty, and my team doesn't like it), though if that's needed I can do it. In that case I'd need a short example of how to get, say, 240 decoded bytes from a file via mpg123 or another such library.
Any help is appreciated.
The smallest unit you can decode is one MP3 frame, which is 576 samples at minimum (MPEG-2/2.5 Layer III; MPEG-1 Layer III frames are 1152 samples). However, most MP3 streams use the bit reservoir, meaning you will likely have to decode the frames around the frame you want as well.
Complicating things further, bare MP3 streams don't have any internal timestamping, so if you want to drop in accurately in the middle of a file, you have to parse frames up until that point. (Frame headers don't carry timestamps, so knowing where you are in time means counting frames from the start.) You can try to needle-drop into the middle of the file based on byte length, but this isn't an accurate way of seeking and can be off by several seconds, even for CBR. For VBR, it's all over the place.
It sounds like all you need is a streaming decoder, so that decoding happens as playback occurs. I'm no Android developer, but it seems you can use AudioTrack from the framework in streaming mode (https://developer.android.com/reference/android/media/AudioTrack.html), with MediaCodec doing the actual decoding (https://developer.android.com/reference/android/media/MediaCodec.html). Android devices support MP3 decoding out of the box, so you don't need anything else.
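A rough sketch of that arrangement, assuming every source is decoded (via MediaCodec) to 16-bit stereo PCM at 44.1 kHz; mixChunks() and the loop wiring are illustrative fragments, not a complete player:
// Streaming AudioTrack that mixed PCM gets pushed into as it's produced.
int minBuf = AudioTrack.getMinBufferSize(44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        minBuf, AudioTrack.MODE_STREAM);
track.play();

// Sum equal-length 16-bit chunks (one per source) and clip to the short range.
static short[] mixChunks(short[][] chunks) {
    short[] out = new short[chunks[0].length];
    for (int i = 0; i < out.length; i++) {
        int sum = 0;
        for (short[] c : chunks) sum += c[i];
        out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
    }
    return out;
}

// In the playback loop: decode one chunk per track, then
//   short[] mixed = mixChunks(chunks);
//   track.write(mixed, 0, mixed.length);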

Trim AAC/MP4 audio in Android (MediaCodec/MediaExtractor)

I want to trim an existing aac-mp4 audio file. To start with, I want to "trim" 0 bytes, i.e. just copy the file using MediaCodec/MediaExtractor.
Questions:
Is the header a fixed size that I can just copy from the old file? Or does it contain info about the track duration that I need to update? If it is a fixed size, what is it (so I know how many bytes to copy from the old file)?
Should I only use the extractor's getSampleData(ByteBuffer, offset) and advance(), or should I also use MediaCodec to extract (decode) the samples and then encode them again with an encoder, writing the encoded values?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how you want to do the trimming. It's absolutely simplest to not involve MediaCodec at all, but just copy packets from MediaExtractor to MediaMuxer, and skip the packets at the start that you want to omit (or use seekTo() for seeking to the right start position).
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which for 48 kHz audio is 21 milliseconds. So if you only copy individual packets, you can't get any closer trimming granularity than 21 milliseconds, for 48 kHz. This probably is fine for most cases, but if the audio has a lower sample rate, say 8 kHZ, the granularity ends up as high as 128 ms.
If you want to trim to a more exact position than the individual packets allow you, you need to decode using MediaCodec, skip the right amount of samples, repackage output frames from the decoder into new full frames for the encoder, and encode this.
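A minimal sketch of the no-recode path described above, assuming track 0 is the AAC track and trimStartUs is the trim point in microseconds (error handling omitted):
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(srcPath);
MediaFormat format = extractor.getTrackFormat(0);
extractor.selectTrack(0);
extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);

MediaMuxer muxer = new MediaMuxer(dstPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
int dstTrack = muxer.addTrack(format);               // copy the MediaFormat as-is
muxer.start();

ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
while (true) {
    info.offset = 0;
    info.size = extractor.readSampleData(buffer, 0);
    if (info.size < 0) break;                        // end of stream
    long ptsUs = extractor.getSampleTime();
    if (ptsUs < trimStartUs) { extractor.advance(); continue; }  // skip pre-trim packets
    info.presentationTimeUs = ptsUs - trimStartUs;   // re-base timestamps
    info.flags = MediaCodec.BUFFER_FLAG_KEY_FRAME;   // every AAC packet is a sync sample
    muxer.writeSampleData(dstTrack, buffer, info);
    extractor.advance();
}
muxer.stop(); muxer.release(); extractor.release();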

Transcode video to lower bitrate and stream

I have a working app that streams video to Chromecast (using NanoHTTPD) and everything works fine. Now my problem is: videos recorded on new devices are too large to stream, so I want to re-encode them to a lower bitrate.
I tried ffmpeg, but the results are not satisfactory and it increases the APK size by 14 MB.
Now I am trying the MediaCodec API. It is faster than ffmpeg, but it takes an input file and writes to an output file, whereas I want to re-encode the byte data that is to be served by NanoHTTPD.
Now a solution comes to mind: transcode the video and stream the output file. But that has two drawbacks:
If the file is large and the user doesn't watch the whole video, a lot of CPU and battery is wasted.
What if the user fast-forwards a long video to a point that hasn't been re-encoded yet?
1. MediaCodec does just one thing: decode/encode. It gives you the raw bytes of the newly encoded data, and it's up to you whether to dump them into a container (an .mp4 file) using a muxer or serve them directly, so there's no need to write everything back to a file first (see the sketch below).
2. Seek to the proper chunk of input data and restart MediaCodec from there.
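As a sketch of the first point: once the encoder is running, the drain loop can hand each encoded chunk to the HTTP layer instead of a muxer. serveBytes() is a placeholder for writing into the NanoHTTPD response pipe; note that most players will still expect some container framing around the raw access units:
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int outIndex;
while ((outIndex = encoder.dequeueOutputBuffer(info, 10000)) >= 0) {
    ByteBuffer encoded = encoder.getOutputBuffer(outIndex);
    byte[] chunk = new byte[info.size];
    encoded.position(info.offset);
    encoded.get(chunk, 0, info.size);
    serveBytes(chunk);                               // placeholder: push to the HTTP response
    encoder.releaseOutputBuffer(outIndex, false);
}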

Android MediaCodec How to Frame Accurately Trim Audio

I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.
I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place in which uncompressed audio data is available for editing.
For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:
MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);
My problem is that the data adjustments appear to affect only the data copied into the output audio buffer, but not to shorten the audio frame that gets written into the MediaMuxer. The output video either ends up with several milliseconds of missing audio at the end of the clip, or, if I write too much data, the audio frame gets dropped completely from the end of the clip.
How do I properly trim an audio frame?
There are a few things at play here:
As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer - you already took the offset of the decoder output buffer into account when you set the buffer position with decoderOutputBuffer.position(info.offset). But perhaps you already account for that somewhere.
I'm not sure if MediaCodec audio encoders allow you to pass audio data in arbitrarily sized chunks, or if you need to send exactly full audio frames at a time. I think they might accept arbitrary chunks - then you're fine. If not, you need to buffer the audio up yourself and pass it to the encoder once you have a full frame (in case you trimmed out some samples at the start).
Keep in mind that audio is also frame-based (for AAC it's 1024-sample frames, unless you use the low-delay variants or HE-AAC), so at 44.1 kHz you can only set the audio duration with a granularity of about 23 ms. If you want your audio to end precisely after the right number of samples, you need to use container signaling to indicate this. I'm not sure if the MediaCodec audio encoder flushes whatever half frame you have at the end, or if you need to manually pad with zeros to get the last few samples out when you aren't aligned to the frame size. It might not be needed, though.
Encoding AAC audio does introduce some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number of these depends on the encoder - for the software encoder in Android for AAC-LC, it's probably 2048 samples, but it might also vary). For the case of 2048 samples, it exactly lines up with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the 2 first output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to be able to properly represent whatever samples your stream starts with, otherwise it will more or less converge towards your intended audio within 2048 samples.)
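To illustrate the buffering idea from the second point, a hedged sketch that accumulates trimmed PCM and hands the encoder only whole frames. FRAME_BYTES (1024 samples x channels x 2 bytes for 16-bit input) and the method names are assumptions, not a known-good implementation:
ByteArrayOutputStream pcmQueue = new ByteArrayOutputStream();

void onTrimmedPcm(byte[] pcm) {                      // called with each trimmed decoder chunk
    pcmQueue.write(pcm, 0, pcm.length);
}

void feedEncoder(MediaCodec encoder, long ptsUs, int sampleRate) {
    byte[] pending = pcmQueue.toByteArray();
    int whole = (pending.length / FRAME_BYTES) * FRAME_BYTES;   // whole frames only
    int offset = 0;
    while (offset < whole) {
        int inIndex = encoder.dequeueInputBuffer(10000);
        if (inIndex < 0) break;
        encoder.getInputBuffer(inIndex).put(pending, offset, FRAME_BYTES);
        encoder.queueInputBuffer(inIndex, 0, FRAME_BYTES, ptsUs, 0);
        ptsUs += 1024L * 1000000L / sampleRate;      // one frame of presentation time
        offset += FRAME_BYTES;
    }
    pcmQueue.reset();                                // keep the remainder for next time
    pcmQueue.write(pending, offset, pending.length - offset);
}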
