Splitting an AAC stream, priming / padding samples problems (gapless playback) - android

I am encoding raw audio to AAC with Android's MediaCodec API. The problem: I need to send the AAC stream to a server in chunks of one second, so I have to split the stream. Right now, since an AAC frame holds 1024 samples, I take round(SAMPLE_RATE/1024) AAC frames for each chunk. However, because of "priming samples", this simple cutting of the AAC stream does not work.
More details follow. After a chunk is sent to the server, a client receives it in Chrome and plays all received chunks using the Web Audio API. Playback is arranged to be gapless: a large AudioBuffer is allocated up front, the received chunks are decoded and copied into it, and the AudioBuffer is played.
Now, this does not work with AAC (it does work with Ogg/Vorbis). With AAC I get artifacts in the generated sound: at the boundary between chunks, the start of the next second is silent, and the waveform then gradually grows back to its normal amplitude. This lasts for 10 to 20 milliseconds.
I believe the problem is caused by missing "priming samples": the Web Audio API decoder presumably expects priming samples at the start of each AAC chunk, does not find them, and therefore distorts the actual audio.
The question is: how can I split the original AAC stream and send "good" AAC chunks of one second?
From what I have understood, I should include at the start of each chunk the last two frames of the previous chunk. However, this number can vary between encoders, and there is not much documentation. Some expert advice would be appreciated.

I am using the following method. I am not an expert on AAC, so I may be missing something, but experimentally it works.
Assuming that the Chrome decoder expects priming samples at the start of each chunk, I do the following: before sending a chunk to the server, I prepend the last 4 AAC frames of the previous chunk (skipping this for the first chunk). Client-side, I retrieve a chunk, decode it, and then remove the first 4*1024 decoded samples (1024 = samples in one AAC frame).
This is working.
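For illustration, a minimal sketch of the sender side of this scheme, assuming each encoded AAC frame has been collected as its own byte[] from the MediaCodec encoder output (buildChunk, previousTail and OVERLAP_FRAMES are made-up names):

// Sketch of the overlap scheme described above; each element of "frames"
// is one encoded AAC frame (1024 samples) drained from the encoder.
static final int OVERLAP_FRAMES = 4; // found experimentally, may vary

private List<byte[]> previousTail = new ArrayList<>();

byte[] buildChunk(List<byte[]> frames) {
    List<byte[]> out = new ArrayList<>(previousTail); // prepend the overlap
    out.addAll(frames);
    // remember the last 4 frames of this chunk for the next one
    previousTail = new ArrayList<>(frames.subList(
            Math.max(0, frames.size() - OVERLAP_FRAMES), frames.size()));
    int total = 0;
    for (byte[] f : out) total += f.length;
    byte[] payload = new byte[total]; // concatenated chunk to send
    int pos = 0;
    for (byte[] f : out) {
        System.arraycopy(f, 0, payload, pos, f.length);
        pos += f.length;
    }
    return payload;
}

On the client, the first OVERLAP_FRAMES * 1024 decoded samples of each chunk (except the first) are discarded before copying into the AudioBuffer.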

Related

Decoding only some PCM bytes at a time from an mp3 file

How do I decode something on the order of 1000 bytes of PCM audio from an mp3 file, without decoding the whole thing?
I need to mix four to six tracks down to one so that they play simultaneously on an AudioTrack in the Android app.
This can be done if I can get a stream of PCM samples and simply add the decoded tracks together (perhaps adjusting for clipping and volume), then write them to an AudioTrack buffer.
That part is simple.
But how do I decode the individual mp3 files into input streams I can get byte arrays from? I've found something called JLayer, but it's not quite clear to me how to do this.
I'd rather avoid doing it in C++ (I'm a bit rusty, and my team doesn't like it), though if that's needed I can do it. In that case I'd need a short example of how to get, say, 240 decoded bytes from a file via mpg123 or another such library.
Any help is appreciated.
The smallest unit you can decode is 576 samples, which is the smallest MP3 frame size. However, most MP3 streams use the bit reservoir, meaning you likely have to decode the frames around the frame you want as well.
Complicating things further, bare MP3 streams have no internal timestamping, so if you want to land accurately in the middle of a file, you have to decode up to that point. (MP3 frame headers don't contain byte lengths, so you can't just skim frame headers accurately.) You can try to needle-drop into the middle of the file based on byte offset, but this isn't an accurate way of seeking and can be off by several seconds, even for CBR. For VBR, it's all over the place.
It sounds like all you really need is a stream decoder, so that decoding happens while playback is occurring. I'm no Android developer, but it seems you can just use AudioTrack from the framework in streaming mode (https://developer.android.com/reference/android/media/AudioTrack.html) and MediaCodec to do the actual decoding (https://developer.android.com/reference/android/media/MediaCodec.html). Android devices support MP3, so you don't need anything else.
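For what it's worth, a rough sketch of that pipeline (MediaExtractor feeding a MediaCodec decoder, whose PCM output goes to an AudioTrack in streaming mode); the file path is illustrative and the drain loop is only outlined:

MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource("/sdcard/track.mp3"); // illustrative path
extractor.selectTrack(0);
MediaFormat format = extractor.getTrackFormat(0);
int sampleRate = format.getInteger(MediaFormat.KEY_SAMPLE_RATE);

MediaCodec decoder = MediaCodec.createDecoderByType(
        format.getString(MediaFormat.KEY_MIME));
decoder.configure(format, null, null, 0);
decoder.start();

int minBuf = AudioTrack.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        minBuf, AudioTrack.MODE_STREAM);
track.play();

// Main loop (outline): feed extractor.readSampleData() into the decoder's
// input buffers, drain its output buffers, and track.write() the PCM data.
// Mixing several tracks means running one such decoder per file and summing
// the PCM (with clipping protection) before the write.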

Trim aac-mp4 audio in android (mediaCodec/extractor)

I want to trim an existing aac-mp4 audio file. As a first step I want to "trim" 0 bytes, basically just copying the file using MediaCodec/MediaExtractor.
Questions:
Is the header a fixed size so that I can just copy it from the old file? Or does it hold information about the track, such as its duration, that I need to update? If it is a fixed size, how large is it (so I know how many bytes to copy from the old file)?
Should I only use the extractor's getSampleData(ByteBuffer, offset) and advance(), or should I also use MediaCodec to extract (decode) the samples, encode them again with an encoder, and write the encoded values?
If you use MediaExtractor, you probably aren't going to read the raw file yourself, so I don't see what header you're proposing to copy. This is probably easiest to do with MediaExtractor + MediaMuxer; just copy the MediaFormat and the packets you get from MediaExtractor to MediaMuxer.
This depends on how precisely you want to trim. The absolute simplest approach doesn't involve MediaCodec at all: just copy packets from MediaExtractor to MediaMuxer, skipping the packets at the start that you want to omit (or using seekTo() to jump to the right start position); see the sketch below.
But keep in mind that audio frames have a certain length; for AAC-LC it's usually 1024 samples, which at 48 kHz is about 21 milliseconds. So if you only copy whole packets, you can't get finer trimming granularity than 21 ms at 48 kHz. That's probably fine for most cases, but if the audio has a lower sample rate, say 8 kHz, the granularity can be as coarse as 128 ms.
If you want to trim to a more exact position than whole packets allow, you need to decode with MediaCodec, skip the right number of samples, repackage the decoder's output into new full frames for the encoder, and encode that.
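A minimal sketch of the packet-copy variant, under the assumption that track 0 is the AAC audio track; the paths and the 5-second trim are illustrative:

MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource("/sdcard/input.m4a"); // illustrative path
extractor.selectTrack(0); // assuming track 0 is the audio track

MediaMuxer muxer = new MediaMuxer("/sdcard/output.m4a",
        MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
int outTrack = muxer.addTrack(extractor.getTrackFormat(0)); // copy the format
muxer.start();

long trimStartUs = 5 * 1000 * 1000L; // illustrative: drop the first 5 seconds
extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);

ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
while (true) {
    info.size = extractor.readSampleData(buffer, 0);
    if (info.size < 0) break; // end of stream
    info.offset = 0;
    info.presentationTimeUs = extractor.getSampleTime() - trimStartUs;
    info.flags = extractor.getSampleFlags();
    muxer.writeSampleData(outTrack, buffer, info);
    extractor.advance();
}
muxer.stop();
muxer.release();
extractor.release();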

Android MediaCodec How to Frame Accurately Trim Audio

I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.
I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place in which uncompressed audio data is available for editing.
For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:
MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);
My problem is that the data adjustments appear to affect only the data copied into the output audio buffer, not the length of the audio frame that gets written to the MediaMuxer. The output video either ends up with several milliseconds of missing audio at the end of the clip, or, if I write too much data, the final audio frame gets dropped from the clip entirely.
How do I properly trim an audio frame?
There are a few things at play here:
As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer; you already took the decoder output buffer's offset into account when you set the buffer position with decoderOutputBuffer.position(info.offset). But perhaps you already adjust for this somewhere.
I'm not sure whether MediaCodec audio encoders accept audio data in arbitrarily sized chunks, or whether you need to send exactly full audio frames at a time. I think they might accept arbitrary chunks; in that case you're fine. If not, you need to buffer the audio yourself and pass it to the encoder once you have a full frame (in case you trimmed some samples out at the start).
Keep in mind that audio is also frame based (for AAC it's 1024-sample frames, unless you use the low-delay variants or HE-AAC), so at 44.1 kHz you can only control the audio duration with a roughly 23 ms granularity. If you want your audio to end precisely after the right number of samples, you need container signaling to indicate this. I'm not sure whether the MediaCodec audio encoder flushes whatever half frame you have at the end, or whether you need to manually pass in extra zeros to flush out the last few samples if you aren't aligned to the frame size. It might not be needed, though.
Encoding AAC introduces some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number depends on the encoder: for Android's software AAC-LC encoder it's probably 2048 samples, but it may also vary). 2048 samples lines up exactly with 2 frames of audio, but the delay can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the first 2 output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to properly represent whatever samples your stream starts with; without them, the decoded output only converges towards your intended audio over the first 2048 samples.)
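If you do decide to drop the leading packets, a hedged sketch of the idea, assuming the delay really is 2048 samples (ENCODER_DELAY_FRAMES = 2, which MediaCodec does not confirm); info, TIMEOUT_US, muxer and audioTrackIndex are assumed to exist in the surrounding drain loop:

static final int ENCODER_DELAY_FRAMES = 2; // guess: 2048 priming samples
int encodedPacketCount = 0;
...
int outIndex = audioEncoder.dequeueOutputBuffer(info, TIMEOUT_US);
if (outIndex >= 0) {
    ByteBuffer out = audioEncoder.getOutputBuffer(outIndex);
    boolean isConfig = (info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0;
    if (!isConfig && encodedPacketCount++ < ENCODER_DELAY_FRAMES) {
        // drop the presumed priming packets instead of muxing them
        // (presentation timestamps of later packets may need shifting)
    } else if (!isConfig && info.size > 0) {
        muxer.writeSampleData(audioTrackIndex, out, info);
    }
    audioEncoder.releaseOutputBuffer(outIndex, false);
}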

Decoding Raw H264 stream in android?

I have a project where I have been asked to display a video stream in Android. The stream is raw H.264, and I will connect to a server and receive a byte stream from it.
Basically I'm wondering: is there a way to send raw bytes to a decoder in Android and display them on a Surface?
I have successfully decoded H.264 wrapped in an mp4 container using the new MediaCodec and MediaExtractor APIs in Android 4.1; unfortunately, I have not found a way to decode a raw H.264 file or stream using these APIs.
I understand that one way is to compile and use FFmpeg, but I'd rather use a built-in method that can use HW acceleration. I also understand that RTSP streaming is supported in Android, but this is not an option. Android version is not an issue.
I can't provide any code for this, unfortunately, but I'll do my best to explain how I got it to work.
So here is my overview of how I got raw H.264 encoded video to work using the MediaCodec class.
The link above has an example of getting the decoder set up and of how to use it; you will need to set it up for decoding H.264 AVC.
The H.264 format is made up of NAL units, each starting with a three-byte start prefix with the values 0x00, 0x00, 0x01; each unit has a different type depending on the value of the 4th byte, right after these 3 start bytes. One NAL unit IS NOT one frame of the video; each frame is made up of a number of NAL units.
Basically I wrote a method that finds each individual unit and passes it to the decoder (one NAL unit being the start prefix and any bytes thereafter, up until the next start prefix).
Now, if you have the decoder set up for decoding H.264 AVC and have an InputBuffer from the decoder, then you are ready to go. You fill this InputBuffer with a NAL unit, pass it back to the decoder, and continue doing this for the length of the stream.
But, to make this work, I had to pass the decoder an SPS (Sequence Parameter Set) NAL unit first. This unit has a byte value of 0x67 after the start prefix (as the 4th byte); on some devices the decoder would crash unless it received this unit first.
Basically, until you find this unit, ignore all other NAL units and keep parsing the stream until you get it; then you can pass all the other units to the decoder.
Some devices didn't need the SPS first and some did, but you are better off passing it in first.
Now, if you passed a Surface to the decoder when you configured it, then once it gets enough NAL units for a frame, it should display it on the Surface.
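To make the parsing concrete, a rough sketch of such a method, assuming the whole stream is already in a byte array and the decoder was configured with a Surface (the helper names are made up; a live stream would need incremental parsing):

// Find the next 0x00 0x00 0x01 start prefix at or after "from", or -1.
static int nextStartCode(byte[] d, int from) {
    for (int i = from; i + 2 < d.length; i++) {
        if (d[i] == 0 && d[i + 1] == 0 && d[i + 2] == 1) return i;
    }
    return -1;
}

// Feed one NAL unit (start prefix included) per input buffer.
void feedNalUnits(MediaCodec decoder, byte[] stream) {
    ByteBuffer[] inputs = decoder.getInputBuffers(); // pre-API-21 style
    long ptsUs = 0;
    int start = nextStartCode(stream, 0);
    while (start >= 0) {
        int next = nextStartCode(stream, start + 3);
        int end = (next >= 0) ? next : stream.length;
        int inIndex = decoder.dequeueInputBuffer(-1); // block for a free buffer
        ByteBuffer in = inputs[inIndex];
        in.clear();
        in.put(stream, start, end - start);
        decoder.queueInputBuffer(inIndex, 0, end - start, ptsUs, 0);
        ptsUs += 40000; // raw H.264 carries no timing; pretend 25 fps
        start = next;
    }
    // Output buffers must be drained in parallel; releasing them with
    // releaseOutputBuffer(index, true) renders frames to the Surface.
}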
You can download the raw H.264 from the server, serve it via a local HTTP server running on the phone, and then let VLC for Android play it back from that HTTP server. You should use VLC's http/h264:// scheme to force the demuxer to raw H.264 (if you don't force the demuxer, VLC may not be able to recognize the stream, even when the MIME type returned by the HTTP server is set correctly). See
https://github.com/rauljim/tgs-android/blob/integrate_record/src/com/tudelft/triblerdroid/first/VideoPlayerActivity.java#L211
for an example on how to create an Intent that will launch VLC.
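For reference, the Intent might look roughly like this (an untested sketch; the package name and URL are illustrative, and the http/h264:// prefix is the scheme mentioned above):

Intent intent = new Intent(Intent.ACTION_VIEW);
intent.setDataAndType(Uri.parse("http/h264://127.0.0.1:8080/stream"),
        "video/h264");
intent.setPackage("org.videolan.vlc"); // assumed VLC for Android package
startActivity(intent);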
Note: raw H.264 apparently has no timing info, so VLC will play it as fast as possible.
Embedding it in MPEG-TS first would be better; I haven't found an Android lib that will do that yet.
Here are the resources I've found helpful in a similar project:
This video was super insightful for understanding, at a high level, how MediaCodec handles raw H.264 streams.
This thread goes into a bit more detail on handling the SPS/PPS NAL units specifically. As mentioned above, you need to separate the individual NAL units using the start prefix and then hand the remaining data to MediaCodec.
This repo (libstreaming) is a great example of decoding an H264 stream in Android using RTSP/RTP for transmission.

Android 2.3 AudioTrack issue(obtainbuffer timed out -- is cpu pegged?)

I'm writing an audio streaming app that buffers AAC file chunks, decodes those chunks to PCM byte arrays, and writes the PCM audio data to AudioTrack. Occasionally, I get the following error when I try to skip to a different song, call AudioTrack.pause(), or call AudioTrack.flush():
obtainbuffer timed out -- is cpu pegged?
And then what happens is that a split second of audio continues to play. I've tried reading a set of AAC files from the sdcard and got the same result. The behavior I'm expecting is that the audio stops immediately. Does anyone know why this happens? I wonder if it's an audio latency issue with Android 2.3.
edit: The AAC audio contains an ADTS header. The header plus the audio payload constitute what I'm calling an ADTSFrame. These are fed to the decoder one frame at a time. The resulting PCM byte array that gets returned from the C layer to the Java layer gets fed to Android's AudioTrack API.
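Since each ADTSFrame is delimited by its header, the total frame length can be read straight from the header; a small sketch (assuming data[offset] points at an ADTS syncword):

// The 13-bit frame_length field spans bytes 3..5 of the ADTS header and
// counts header + payload bytes, so it tells you where the next frame starts.
static int adtsFrameLength(byte[] data, int offset) {
    return ((data[offset + 3] & 0x03) << 11)
         | ((data[offset + 4] & 0xFF) << 3)
         | ((data[offset + 5] & 0xE0) >>> 5);
}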
edit 2: I got my Nexus 7 (Android 4.1) today. I loaded the same app onto the device and didn't have any of these problems at all.
It is quite possibly about the sample rate: one of your devices might support the sample rate you used while the other does not. Please check it. I had the same issue, and it was the sample rate; use 44.1 kHz (44100) and try again.
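A quick way to check, sketched for the pre-Lollipop AudioTrack API the question is using (the stream type and format values here are common defaults, not taken from the question):

// Compare the device's native output rate with the rate you request.
int nativeRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
int minBuf = AudioTrack.getMinBufferSize(44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        minBuf, AudioTrack.MODE_STREAM);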
