How do I get duration of an AMR file?
mRecorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
mRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
I want to get the duration of the file after the recording is stopped WITHOUT creating any MediaPlayer and get the duration from it. For a regular Wav file I simply do this:
fileLength / byteRate
but for AMR I don't know the byte rate, and I'm not sure the approach would even be valid, since WAV is raw (uncompressed) PCM data while AMR is compressed.
Maybe the 3GP container contains information about the content length? The 3GPP file format spec is available if you want to read it.
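If the recording really does end up in a 3GP container (as the THREE_GPP output format above suggests), one hedged option that avoids MediaPlayer entirely is MediaMetadataRetriever, which reads the duration from the container metadata. A minimal sketch, where filePath is a hypothetical variable pointing at the finished recording:

import android.media.MediaMetadataRetriever;

// Sketch: read the duration (in milliseconds) from the container metadata
// without creating a MediaPlayer. filePath is assumed, not from the question.
MediaMetadataRetriever retriever = new MediaMetadataRetriever();
try {
    retriever.setDataSource(filePath);
    String durationStr = retriever.extractMetadata(
            MediaMetadataRetriever.METADATA_KEY_DURATION);
    long durationMs = (durationStr != null) ? Long.parseLong(durationStr) : -1;
} catch (Exception e) {
    // handle/log as appropriate
} finally {
    try { retriever.release(); } catch (Exception ignored) { }
}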
For a raw .amr file you'd have to traverse all the frames to find the length of the audio, since each frame can be encoded with a different bitrate.
The process for doing this would be:
Skip the first 6 bytes of the file (the AMR signature).
The rest of the file consists of audio frames, each of which starts with a one-byte header. Read that byte and look at bits 3..6 (the codec mode). For AMR-NB the valid codec modes are 0..7, which you can map to the size of the frame in bytes using the table below.
Once you know the size of the current frame, skip past it and parse the next frame. Repeat until you reach the end of the file.
If you've counted the number of frames in the file you can multiply that number by 20 to get the length of the audio in milliseconds.
Frame size table:

Codec mode   Frame size (bytes)
-------------------------------
0            13
1            14
2            16
3            18
4            20
5            21
6            27
7            32
(Source)
Java Code:
https://blog.csdn.net/fjh658/article/details/12869073
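For reference, here is a minimal Java sketch of the frame-walking approach described above (this is an illustration, not the code behind the link; the method and variable names are made up):

import java.io.IOException;
import java.io.RandomAccessFile;

// Walk the frames of a raw AMR-NB file and count them; each frame covers 20 ms.
public static long getAmrNbDurationMs(String path) throws IOException {
    // Frame sizes (including the 1-byte header) for AMR-NB codec modes 0..7,
    // plus mode 8 (SID, size taken from the C# snippet below).
    final int[] frameSizes = {13, 14, 16, 18, 20, 21, 27, 32, 6};
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
        long length = file.length();
        long pos = 6;                  // skip the 6-byte "#!AMR\n" signature
        long frameCount = 0;
        while (pos < length) {
            file.seek(pos);
            int header = file.read();
            if (header < 0) break;
            int mode = (header >> 3) & 0x0F;        // bits 3..6 of the frame header
            if (mode >= frameSizes.length) break;   // corrupt or unsupported frame
            pos += frameSizes[mode];
            frameCount++;
        }
        return frameCount * 20;        // 20 ms of audio per frame
    }
}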
C# Code (From MemoryStream):
private double getAmrDuration(MemoryStream originalAudio)
{
    double duration = 0;
    // Frame payload sizes (without the 1-byte header) for codec modes 0..7,
    // plus mode 8 (SID); the remaining entries are unused.
    int[] packedSize = new int[] { 12, 13, 15, 17, 19, 20, 26, 31, 5, 0, 0, 0, 0, 0, 0, 0 };
    long length = originalAudio.Length;
    int pos = 6;                        // skip the 6-byte AMR signature
    int frameCount = 0;
    int packedPos = -1;
    byte[] datas = new byte[1];

    while (pos <= length)
    {
        originalAudio.Seek(pos, SeekOrigin.Begin);
        if (originalAudio.Read(datas, 0, 1) != 1)
        {
            // Could not read a frame header: fall back to a rough estimate.
            duration = length > 0 ? ((length - 6) / 650) : 0;
            break;
        }
        packedPos = (datas[0] >> 3) & 0x0F;     // codec mode from bits 3..6
        pos += packedSize[packedPos] + 1;       // payload plus 1-byte header
        frameCount++;
    }

    duration = duration + (frameCount * 20);    // 20 ms per frame
    return duration / 1000;                     // seconds
}
Related
I'm receiving an RTP stream which I only know to be AMR-WB, octet-aligned, 100 ms per packet. A 3rd-party client can receive the same stream and it is audible, so the stream itself is fine. Now I'm receiving this data and trying to decode it, without luck...
init:
val sampleRate = 16000
val mc = MediaCodec.createDecoderByType(MediaFormat.MIMETYPE_AUDIO_AMR_WB)
val mf = MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AMR_WB, sampleRate, 1)
mf.setInteger(MediaFormat.KEY_SAMPLE_RATE, sampleRate) // is it needed?
mc.configure(mf, null, null, 0)
mc.start()
decode each packet separately:
private fun decode(decoder: MediaCodec, mediaFormat: MediaFormat, rtpPacket: RtpPacket): ByteArray {
    var outputBuffer: ByteBuffer
    var outputBufferIndex: Int
    val inputBuffers: Array<ByteBuffer> = decoder.inputBuffers
    var outputBuffers: Array<ByteBuffer> = decoder.outputBuffers

    // input
    val inputBufferIndex = decoder.dequeueInputBuffer(-1L)
    if (inputBufferIndex >= 0) {
        val inputBuffer = inputBuffers[inputBufferIndex]
        inputBuffer.clear()
        inputBuffer.put(rtpPacket.payload)
        // native ACodec/MediaCodec crash in here (log below)
        decoder.queueInputBuffer(inputBufferIndex, 0, rtpPacket.payload.size, System.nanoTime() / 1000, 0)
    }

    // output
    val bufferInfo: MediaCodec.BufferInfo = MediaCodec.BufferInfo()
    outputBufferIndex = decoder.dequeueOutputBuffer(bufferInfo, -1L)
    Timber.i("outputBufferIndex: ${outputBufferIndex}")
    when (outputBufferIndex) {
        MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED -> {
            Timber.d("INFO_OUTPUT_BUFFERS_CHANGED")
            outputBuffers = decoder.outputBuffers
        }
        MediaCodec.INFO_OUTPUT_FORMAT_CHANGED -> {
            val format: MediaFormat = decoder.outputFormat
            Timber.d("INFO_OUTPUT_FORMAT_CHANGED $format")
            audioTrack.playbackRate = format.getInteger(MediaFormat.KEY_SAMPLE_RATE)
        }
        MediaCodec.INFO_TRY_AGAIN_LATER -> Timber.d("INFO_TRY_AGAIN_LATER")
        else -> {
            val outBuffer = outputBuffers[outputBufferIndex]
            outBuffer.position(bufferInfo.offset)
            outBuffer.limit(bufferInfo.offset + bufferInfo.size)
            val chunk = ByteArray(bufferInfo.size)
            outBuffer[chunk]
            outBuffer.clear()
            audioTrack.write(
                chunk,
                bufferInfo.offset,
                bufferInfo.offset + bufferInfo.size
            )
            decoder.releaseOutputBuffer(outputBufferIndex, false)
            Timber.v("chunk size:${chunk.size}")
            return chunk
        }
    }

    // All decoded frames have been rendered, we can stop playing now
    if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM != 0) {
        Timber.d("BUFFER_FLAG_END_OF_STREAM")
    }
    return ByteArray(0)
}
Sadly, on some (clean) Android 10 devices I'm getting:
E/ACodec: [OMX.google.amrwb.decoder] ERROR(0x80001001)
E/ACodec: signalError(omxError 0x80001001, internalError -2147483648)
E/MediaCodec: Codec reported err 0x80001001, actionCode 0, while in state 6
E/RtpReceiver: java.lang.IllegalStateException
at android.media.MediaCodec.native_dequeueInputBuffer(Native Method)
at android.media.MediaCodec.dequeueInputBuffer(MediaCodec.java:2727)
I should probably wrap dequeueOutputBuffer plus the when block in some while(true) loop, but then I get logs similar to the above, just with error 0x8000100b instead.
On another device - a Pixel running Android 12 - I get something similar:
D/BufferPoolAccessor2.0: bufferpool2 0xb400007067901978 : 4(32768 size) total buffers - 4(32768 size) used buffers - 0/5 (recycle/alloc) - 0/0 (fetch/transfer)
D/CCodecBufferChannel: [c2.android.amrwb.decoder#471] work failed to complete: 14
E/MediaCodec: Codec reported err 0xe, actionCode 0, while in state 6/STARTED
E/RtpReceiver: java.lang.IllegalStateException
at android.media.MediaCodec.native_dequeueOutputBuffer(Native Method)
at android.media.MediaCodec.dequeueOutputBuffer(MediaCodec.java:3535)
I'm obviously stripping the RTP header (the payload used above), but nothing else. Should I also parse the AMR payload header? It contains, for example, FT - the frame type index - which determines the bitrate, so should the decoder get this parameter before the start() call? Or can I pass the whole payload (CMR, ToC with FT, Q bits, etc.) straight to the decoder and I've just initialized it badly? Or is my decode method implemented incorrectly? In short: how do I properly decode (and play) AMR-WB received from an RTP stream?
Edit: worth mentioning that the payload starts with F0 84 84 84 84 04 in every packet.
It turned out that I also have to "unpack" the AMR payload header and "re-pack" the data into individual AMR frames. The first bytes of the payload posted in the question are the ToC (table of contents) list.
F0 is the CMR and may be omitted. Starting at position 1 we can calculate the ToC size: the number of consecutive bytes with 1 as the MSB (i.e. byte value >= 128, or first hex digit >= 8), plus 1. So if payload[1] starts with a 0 hex digit, the ToC size is 1, the payload is a single frame, and we can pass it to the decoder (don't forget to skip the leading CMR byte!). In my sample the ToC size is 5, so I have to split the rest of the payload and interleave it with the ToC bytes, where one "frame" = one ToC byte + its frame payload.
My whole payload has 91 bytes:
91 - 1 (CMR) - 5 (ToC bytes) = 85 bytes of frame data for 5 frames (the ToC size),
which gives 5 frames of 1 (ToC byte) + 17 (85/5 bytes of AMR payload) each.
We could just split the rest of the payload evenly, but it's worth verifying that size by reading the bitrate mode carried in each frame's ToC byte and comparing it with the fixed frame size for that bitrate (see index in the code below).
fun decode(rtpPacket: RtpPacket): ByteArray {
    var outData = ByteArray(0)
    var position = 0
    position++ // skip payload header, ignore CMR - rtpPacket.payload[0]

    // count ToC entries: consecutive bytes with the MSB set, plus the final one
    var tocLen = 0
    while (getBit(rtpPacket.payload[position].toInt(), 7)) {
        // byte has 1 at msb, so another ToC entry follows
        position++
        tocLen++
    }
    if (tocLen > 0) { // if there is any ToC detected
        // the first byte which does NOT have 1 at msb also belongs to the ToC
        position++
        tocLen++
    }
    //Timber.i("decoded tocListSize: $tocLen")

    if (tocLen > 0) {
        // starting from 1 because this is the first ToC byte position after omitting the CMR
        for (i in 1 until (tocLen + 1)) {
            val index = rtpPacket.payload[i].toInt() shr 3 and 0xf
            if (index >= 9) {
                Timber.w("Bad AMR ToC, index=$index")
                break
            }
            // fixed AMR-WB frame payload sizes (bytes) per ToC frame type; last entry (5) is SID
            val amr_frame_sizes = intArrayOf(17, 23, 32, 36, 40, 46, 50, 58, 60, 5)
            val frameSize = amr_frame_sizes[index]
            //Timber.i("decoded i:$i index:$index frameSize:$frameSize position:$position")
            if (position + frameSize > rtpPacket.payloadLength) {
                Timber.w("Truncated AMR frame")
                break
            }
            // re-pack: one ToC byte followed by the frame payload
            val frame = ByteArray(1 + frameSize)
            frame[0] = rtpPacket.payload[i]
            System.arraycopy(rtpPacket.payload, position, frame, 1, frameSize)
            outData = outData.plus(decode(frame))
            position += frameSize
        }
    } else { // single frame case, NOT TESTED!!
        outData = ByteArray(rtpPacket.payloadLength - 1) // without CMR
        System.arraycopy(rtpPacket.payload, 1, outData, 0, outData.size)
        outData = decode(outData)
    }
    return outData
}

// Helper used above (not shown in the original snippet): true if bit `bit` of `value` is set.
private fun getBit(value: Int, bit: Int): Boolean = (value shr bit) and 1 == 1
The returned data can be used in place of rtpPacket.payload in the decode method posted in the question (the decoder code itself could be improved a bit, as its last lines are unreachable, but it works even in this form).
amr_frame_sizes is a constant array for my case, in which 100 ms of AMR is split into 5 frames. The sizes correspond to that case - 20 ms frames - and are indexed by the "changeable" bitrate mode carried in each ToC byte.
Update #6 Discovered I was accessing RGB values improperly. I assumed I was accessing data from an Int[], but was instead accessing byte information from a Byte[]. Changing to access an Int[] gave the following image:
Update #5 Adding code used to get RGBA ByteBuffer for reference
private void screenScrape() {
    Log.d(TAG, "In screenScrape");

    //read pixels from frame buffer into PBO (GL_PIXEL_PACK_BUFFER)
    mSurface.queueEvent(new Runnable() {
        @Override
        public void run() {
            Log.d(TAG, "In Screen Scrape 1");
            //generate and bind buffer ID
            GLES30.glGenBuffers(1, pboIds);
            checkGlError("Gen Buffers");
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pboIds.get(0));
            checkGlError("Bind Buffers");
            //creates and initializes data store for PBO. Any pre-existing data store is deleted
            GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, (mWidth * mHeight * 4), null, GLES30.GL_STATIC_READ);
            checkGlError("Buffer Data");
            //glReadPixelsPBO(0,0,w,h,GLES30.GL_RGB,GLES30.GL_UNSIGNED_SHORT_5_6_5,0);
            glReadPixelsPBO(0, 0, mWidth, mHeight, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
            checkGlError("Read Pixels");
            //GLES30.glReadPixels(0,0,w,h,GLES30.GL_RGBA,GLES30.GL_UNSIGNED_BYTE,intBuffer);
        }
    });

    //map PBO data into client address space
    mSurface.queueEvent(new Runnable() {
        @Override
        public void run() {
            Log.d(TAG, "In Screen Scrape 2");
            //read pixels from PBO into a byte buffer for processing. Unmap buffer for use in next pass
            mapBuffer = ((ByteBuffer) GLES30.glMapBufferRange(GLES30.GL_PIXEL_PACK_BUFFER, 0, 4 * mWidth * mHeight, GLES30.GL_MAP_READ_BIT)).order(ByteOrder.nativeOrder());
            checkGlError("Map Buffer");
            GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
            checkGlError("Unmap Buffer");
            isByteBufferEmpty(mapBuffer, "MAP BUFFER");
            convertColorSpaceByteArray(mapBuffer);
            mapBuffer.clear();
        }
    });
}
Update #4 For reference, here is the original image to compare against.
Update #3 This is the output image after interleaving all of the U/V data into a single array and loading it into the Image object at inputImagePlanes[1]; inputImagePlanes[2] is unused:
The next image is the same interleaved UV data, but loaded into inputImagePlanes[2] instead of inputImagePlanes[1]:
Update #2 This is the output image after padding the U/V buffers with a zero in between each byte of 'real' data. uArray[uvByteIndex] = (byte) 0;
Update #1 As suggested by a comment, here are the row and pixel strides I get from calling getPixelStride and getRowStride
Y Plane Pixel Stride = 1, Row Stride = 960
U Plane Pixel Stride = 2, Row Stride = 960
V Plane Pixel Stride = 2, Row Stride = 960
The goal of my application is to read pixels out from the screen, compress them, and then send that h264 stream over WiFi to be played by a receiver.
Currently I'm using the MediaMuxer class to convert the raw h264 stream to an MP4 and then save it to a file. However, the end result video is messed up and I can't figure out why. Let's walk through some of the processing and see if anything jumps out.
Step 1 Set up the encoder. I'm currently taking screen images once every 2 seconds, and using "video/avc" for MIME_TYPE
//create codec for compression
try {
    mCodec = MediaCodec.createEncoderByType(MIME_TYPE);
} catch (IOException e) {
    Log.d(TAG, "FAILED: Initializing Media Codec");
}

//set up format for codec
MediaFormat mFormat = MediaFormat.createVideoFormat(MIME_TYPE, mWidth, mHeight);
mFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible);
mFormat.setInteger(MediaFormat.KEY_BIT_RATE, 16000000);
mFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 1/2); //note: 1/2 is integer division, so this actually sets the frame rate to 0
mFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5);
Step 2 Read pixels out from screen. This is done using openGL ES, and the pixels are read out in RGBA format. (I've confirmed this part to be working)
Step 3 Convert the RGBA pixels to YUV420 (IYUV) format. This is done using the following method. Note that I have 2 methods for encoding called at the end of this method.
private void convertColorSpaceByteArray(ByteBuffer rgbBuffer) {
    long startTime = System.currentTimeMillis();
    Log.d(TAG, "In convertColorspace");
    final int frameSize = mWidth * mHeight;
    final int chromaSize = frameSize / 4;

    byte[] rgbByteArray = new byte[rgbBuffer.remaining()];
    rgbBuffer.get(rgbByteArray);

    byte[] yuvByteArray = new byte[inputBufferSize];
    Log.d(TAG, "Input Buffer size = " + inputBufferSize);

    byte[] yArray = new byte[frameSize];
    byte[] uArray = new byte[(frameSize / 4)];
    byte[] vArray = new byte[(frameSize / 4)];

    isByteBufferEmpty(rgbBuffer, "RGB BUFFER");

    int yIndex = 0;
    int uIndex = frameSize;
    int vIndex = frameSize + chromaSize;

    int yByteIndex = 0;
    int uvByteIndex = 0;

    int R, G, B, Y, U, V;
    int index = 0;

    //this loop controls the rows
    for (int i = 0; i < mHeight; i++) {
        //this loop controls the columns
        for (int j = 0; j < mWidth; j++) {
            R = (rgbByteArray[index] & 0xff0000) >> 16;
            G = (rgbByteArray[index] & 0xff00) >> 8;
            B = (rgbByteArray[index] & 0xff);

            Y = ((66 * R + 129 * G + 25 * B + 128) >> 8) + 16;
            U = ((-38 * R - 74 * G + 112 * B + 128) >> 8) + 128;
            V = ((112 * R - 94 * G - 18 * B + 128) >> 8) + 128;

            //clamp and load in the Y data
            yuvByteArray[yIndex++] = (byte) ((Y < 16) ? 16 : ((Y > 235) ? 235 : Y));
            yArray[yByteIndex] = (byte) ((Y < 16) ? 16 : ((Y > 235) ? 235 : Y));
            yByteIndex++;

            if (i % 2 == 0 && index % 2 == 0) {
                //clamp and load in the U & V data
                yuvByteArray[uIndex++] = (byte) ((U < 16) ? 16 : ((U > 239) ? 239 : U));
                yuvByteArray[vIndex++] = (byte) ((V < 16) ? 16 : ((V > 239) ? 239 : V));
                uArray[uvByteIndex] = (byte) ((U < 16) ? 16 : ((U > 239) ? 239 : U));
                vArray[uvByteIndex] = (byte) ((V < 16) ? 16 : ((V > 239) ? 239 : V));
                uvByteIndex++;
            }
            index++;
        }
    }
    encodeVideoFromImage(yArray, uArray, vArray);
    encodeVideoFromBuffer(yuvByteArray);
}
Step 4 Encode the data! I currently have two different ways of doing this, and each has a different output. One uses a ByteBuffer returned from MediaCodec.getInputBuffer();, the other uses an Image returned from MediaCodec.getInputImage();
Encoding using ByteBuffer
private void encodeVideoFromBuffer(byte[] yuvData) {
    Log.d(TAG, "In encodeVideo");
    int inputSize = 0;

    //create index for input buffer
    inputBufferIndex = mCodec.dequeueInputBuffer(0);
    //create the input buffer for submission to encoder
    ByteBuffer inputBuffer = mCodec.getInputBuffer(inputBufferIndex);

    //clear, then copy yuv buffer into the input buffer
    inputBuffer.clear();
    inputBuffer.put(yuvData);

    //flip buffer before reading data out of it
    inputBuffer.flip();

    mCodec.queueInputBuffer(inputBufferIndex, 0, inputBuffer.remaining(), presentationTime, 0);
    presentationTime += MICROSECONDS_BETWEEN_FRAMES;

    sendToWifi();
}
And the associated output image (note: I took a screenshot of the MP4)
Encoding using Image
private void encodeVideoFromImage(byte[] yToEncode, byte[] uToEncode, byte[] vToEncode) {
    Log.d(TAG, "In encodeVideo");
    int inputSize = 0;

    //create index for input buffer
    inputBufferIndex = mCodec.dequeueInputBuffer(0);
    //create the input buffer for submission to encoder
    Image inputImage = mCodec.getInputImage(inputBufferIndex);
    Image.Plane[] inputImagePlanes = inputImage.getPlanes();

    ByteBuffer yPlaneBuffer = inputImagePlanes[0].getBuffer();
    ByteBuffer uPlaneBuffer = inputImagePlanes[1].getBuffer();
    ByteBuffer vPlaneBuffer = inputImagePlanes[2].getBuffer();

    yPlaneBuffer.put(yToEncode);
    uPlaneBuffer.put(uToEncode);
    vPlaneBuffer.put(vToEncode);

    yPlaneBuffer.flip();
    uPlaneBuffer.flip();
    vPlaneBuffer.flip();

    mCodec.queueInputBuffer(inputBufferIndex, 0, inputBufferSize, presentationTime, 0);
    presentationTime += MICROSECONDS_BETWEEN_FRAMES;

    sendToWifi();
}
And the associated output image (note: I took a screenshot of the MP4)
Step 5 Convert H264 Stream to MP4. Finally I grab the output buffer from the codec, and use MediaMuxer to convert the raw h264 stream to an MP4 that I can play and test for correctness
private void sendToWifi() {
    Log.d(TAG, "In sendToWifi");
    MediaCodec.BufferInfo mBufferInfo = new MediaCodec.BufferInfo();

    //Check to see if encoder has output before proceeding
    boolean waitingForOutput = true;
    boolean outputHasChanged = false;
    int outputBufferIndex = 0;

    while (waitingForOutput) {
        //access the output buffer from the codec
        outputBufferIndex = mCodec.dequeueOutputBuffer(mBufferInfo, -1);

        if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            outputFormat = mCodec.getOutputFormat();
            outputHasChanged = true;
            Log.d(TAG, "OUTPUT FORMAT HAS CHANGED");
        }

        if (outputBufferIndex >= 0) {
            waitingForOutput = false;
        }
    }

    //this buffer now contains the compressed YUV data, ready to be sent over WiFi
    ByteBuffer outputBuffer = mCodec.getOutputBuffer(outputBufferIndex);

    //adjust output buffer position and limit. As of API 19, this is not automatic
    if (mBufferInfo.size != 0) {
        outputBuffer.position(mBufferInfo.offset);
        outputBuffer.limit(mBufferInfo.offset + mBufferInfo.size);
    }

    ////////////////////////////////FOR DEBUG/////////////////////////////
    if (muxerNotStarted && outputHasChanged) {
        //set up track
        mTrackIndex = mMuxer.addTrack(outputFormat);
        mMuxer.start();
        muxerNotStarted = false;
    }
    if (!muxerNotStarted) {
        mMuxer.writeSampleData(mTrackIndex, outputBuffer, mBufferInfo);
    }
    ////////////////////////////END DEBUG//////////////////////////////////

    //release the buffer
    mCodec.releaseOutputBuffer(outputBufferIndex, false);
    muxerPasses++;
}
If you've made it this far you're a gentleman (or lady!) and a scholar! Basically I'm stumped as to why my image is not coming out properly. I'm relatively new to video processing so I'm sure I'm just missing something.
If you're API 19+, might as well stick with encoding method #2, getImage()/encodeVideoFromImage(), since that is more modern.
Focusing on that method: One problem was, you had an unexpected image format. With COLOR_FormatYUV420Flexible, you know you're going to have 8-bit U and V components, but you won't know in advance where they go. That's why you have to query the Image.Plane formats. Could be different on every device.
In this case, the UV format turned out to be interleaved (very common on Android devices). If you're using Java, and you supply each array (U/V) separately, with the "stride" requested ("spacer" byte in-between each sample), I believe one array ends up clobbering the other, because these are actually "direct" ByteBuffers, and they were intended to be used from native code, like in this answer. The solution I explained was to copy an interleaved array into the third (V) plane, and ignore the U plane. On the native side, these two planes actually overlap each other in memory (except for the first and last byte), so filling one causes the implementation to fill both.
If you use the second (U) plane instead, you'll find things work, but the colors look funny. That's also because of the overlapping arrangement of these two planes; what that does, effectively, is shift every array element by one byte (which puts U's where V's should be, and vice versa.)
...In other words, this solution is actually a bit of a hack. Probably the only way to do this correctly, and have it work on all devices, is to use native code (as in the answer I linked above).
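To make the workaround concrete, here is a hedged sketch of what the answer describes: interleave the U and V arrays into one buffer and write it only into the third (V) plane. Variable names follow the question's encodeVideoFromImage(), but this is an illustration, not the author's exact code, and the interleave order may need to be swapped on some devices:

// Interleave the half-resolution chroma arrays from convertColorSpaceByteArray().
byte[] interleavedUV = new byte[uToEncode.length + vToEncode.length];
for (int i = 0; i < uToEncode.length; i++) {
    interleavedUV[2 * i]     = vToEncode[i];  // V first here; swap with U if the colors look wrong
    interleavedUV[2 * i + 1] = uToEncode[i];
}

ByteBuffer yPlane = inputImagePlanes[0].getBuffer();
ByteBuffer vPlane = inputImagePlanes[2].getBuffer();  // the U plane (index 1) is intentionally left alone

yPlane.put(yToEncode);
// On devices with this semi-planar layout the U and V planes overlap in memory
// (except for one byte), so filling one plane with the interleaved data fills both.
// Guard the length because the overlapping plane can be one byte shorter.
vPlane.put(interleavedUV, 0, Math.min(interleavedUV.length, vPlane.remaining()));
// ...then queue the input buffer as in encodeVideoFromImage().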
Once the color plane problem is fixed, that leaves all the funny overlapping text and vertical striations. These were actually caused by your interpretation of the RGB data, which had the wrong stride.
And, once that is fixed, you have a decent-looking picture. It's been mirrored vertically; I don't know the root cause of that, but I suspect it's an OpenGL issue.
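For reference, a hedged sketch of reading R, G and B out of the RGBA buffer with the right stride: glReadPixels with GL_RGBA/GL_UNSIGNED_BYTE returns 4 bytes per pixel in R, G, B, A order, so the index has to advance by 4 per pixel rather than 1 (illustrative only, not the author's final code). The vertical mirror is likely the usual glReadPixels bottom-up row order, which can be undone by walking the rows in reverse during conversion.

// Walk an RGBA byte buffer with a 4-bytes-per-pixel stride.
int rgbaIndex = 0;  // byte index into rgbByteArray, advances 4 per pixel
for (int row = 0; row < mHeight; row++) {
    for (int col = 0; col < mWidth; col++) {
        int R = rgbByteArray[rgbaIndex]     & 0xff;
        int G = rgbByteArray[rgbaIndex + 1] & 0xff;
        int B = rgbByteArray[rgbaIndex + 2] & 0xff;
        // rgbByteArray[rgbaIndex + 3] is alpha and can be ignored
        rgbaIndex += 4;
        // ...feed R, G, B into the existing YUV conversion...
    }
}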
So far in my quest to concatenate videos with MediaCodec I've finally managed to resample 48 kHz audio to 44.1 kHz.
I've been testing joining videos together with two videos, the first one having an audio track with 22050 Hz 2 channels format, the second one having an audio track with 24000 Hz 1 channel format. Since my decoder just outputs 44100 Hz 2 channels raw audio for the first video and 48000 Hz 2 channels raw audio for the second one, I resampled the ByteBuffers that the second video's decoder outputs from 48000 Hz down to 44100 Hz using this method:
private byte[] minorDownsamplingFrom48kTo44k(byte[] origByteArray)
{
    int origLength = origByteArray.length;
    int moddedLength = origLength * 147/160;
    //int moddedLength = 187*36;
    int delta = origLength - moddedLength;
    byte[] resultByteArray = new byte[moddedLength];
    int arrayIndex = 0;
    for(int i = 0; i < origLength; i += 44)
    {
        for(int j = i; j < (i+40 > origLength ? origLength : i + 40); j++)
        {
            resultByteArray[arrayIndex] = origByteArray[j];
            arrayIndex++;
        }
        //Log.i("array_iter", i+" "+arrayIndex);
    }
    //smoothArray(resultByteArray, 3);
    return resultByteArray;
}
However, in the output video file, the video plays at a slower speed upon reaching the second video with the downsampled audio track. The pitch is the same and the noise is gone, but the audio samples just play slower.
My output format is actually 22050 Hz 2 channels, following the first video.
EDIT: It's as if the player still plays the audio as if it has a sample rate of 48000 Hz even after it's downsampled to 44100 Hz.
My questions:
How do I mitigate this problem? Because I don't think changing the timestamps works in this case. I just use the decoder-provided timestamps with some offset based on the first video's last timestamp.
Is the issue related to the CSD-0 ByteBuffers?
If MediaCodec has the option of changing the video bitrate on the fly, would a new feature of changing the audio sample rate or channel count on the fly be feasible?
Turns out it was something as simple as limiting the size of my ByteBuffers.
The decoder outputs 8192 bytes (2048 samples).
After downsampling, the data becomes 7524 bytes (1881 samples) - originally 7526 bytes but that amounts to 1881.5 samples, so I rounded it down.
The prime mistake was in this code, where I have to bring the sample count back in line with the original:
byte[] finalByteBufferContent = new byte[size / 2]; //here
for (int i = 0; i < bufferSize; i += 2) {
    if ((i + 1) * ((int) samplingFactor) > testBufferContents.length) {
        finalByteBufferContent[i] = 0;
        finalByteBufferContent[i + 1] = 0;
    } else {
        finalByteBufferContent[i] = testBufferContents[i * ((int) samplingFactor)];
        finalByteBufferContent[i + 1] = testBufferContents[i * ((int) samplingFactor) + 1];
    }
}
bufferSize = finalByteBufferContent.length;
Where size is the decoder output ByteBuffer's length and testBufferContents is the byte array I use to modify its contents (and is the one that was downsampled to 7524 bytes).
The resulting byte array's length was still 4096 bytes instead of 3762 bytes.
Changing new byte[size / 2] to new byte[testBufferContents.length / 2] resolved that problem.
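To make that concrete, a hedged sketch of the corrected allocation, reusing the names above; samplingFactor is assumed to be 2 here, and the loop bound is tied to the new array so the sketch stays in bounds on its own:

// Size the output from the downsampled data, not from the decoder's original output size.
byte[] finalByteBufferContent = new byte[testBufferContents.length / 2];
for (int i = 0; i + 1 < finalByteBufferContent.length; i += 2) {
    finalByteBufferContent[i] = testBufferContents[i * ((int) samplingFactor)];
    finalByteBufferContent[i + 1] = testBufferContents[i * ((int) samplingFactor) + 1];
}
bufferSize = finalByteBufferContent.length;  // 3762 bytes for the 7524-byte downsampled buffer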
I have succeeded in decoding audio data from an MP4 using avcodec_decode_audio4, and I want to save the decoded frames, so I tried the following:
if (got_frame) {
    int size;
    uint8_t *data;
    int ref = 0;
    ret = swr_convert(swr, &data, frame->nb_samples, (const uint8_t **)frame->extended_data, frame->nb_samples);
    //fwrite(data, 1, frame->nb_samples, fp_audio);
    ref++;
    int szie = av_samples_get_buffer_size(NULL, 2, 1024, AV_SAMPLE_FMT_FLTP, 1);
    for (int i = 0; i < frame->linesize[0]/4; i++)
    {
        fwrite(frame->data[0] + 4*i, 1, 4, fp_audio);
        fwrite(frame->data[1] + 4*i, 1, 4, fp_audio);
        ref++;
    }
    av_frame_unref(frame);
}
but the PCM sounds strange. I also tried writing directly as follows:
fwrite(frame->data[0], 1, frame->linesize[0], fp_audio);
or:
fwrite(frame->data[0], 1, frame->linesize[0], fp_audio);
fwrite(frame->data[1], 1, frame->linesize[0], fp_audio);
I know that the decoded PCM is AV_SAMPLE_FMT_FLTP.
Any help would be appreciated.
FLTP is planar float, so in case of stereo, you have two buffers, data[0] and data[1], which are per-channel planes.
For things like .wav or so, you typically want to write interleaved data, so basically an array where each even entry is left and each odd entry is right channel. To do that, convert to FLT (without P). Also note that .wav typically uses int16, not float, so for that, convert to S16.
Decoders output planar because that's how compressed streams typically layout their data, so for the individual decoders, this makes more sense.
I'm building an android app that pulses an icon - simple pulse, 2x size at loudest volume and 1x at no volume - based on audio. Worth noting my min api is 15.
The user selects the mode (file) to play and I use AudioTrack to play it back on an infinite loop. Each wav sample ranges from less than a second to 2 or 3 seconds. AudioTrack lets me set the volume and pitch in real time based on user input (SoundPool wasn't correctly changing pitch on KitKat).
As the volume changes within each audiotrack, I'm trying to shrink and grow the icon. So far I've tried visualizer to get the waveform and fft data as the track is playing, but I'm not sure that's correct.
Is there a way to get (as close as possible to) real-time dB changes from an AudioTrack? The waveform values always seem to be between 108 and 112, so I don't think I'm using it correctly. The easiest pulse.wav example is here
My audiotrack init using a byte[] from pcm data
AudioTrack mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT, getMinBuffer(sound), AudioTrack.MODE_STATIC);
mAudioTrack.write(mSound, 0, mSound.length);
mAudioTrack.setLoopPoints(0, (int) (mSound.length / 4), -1);
My Visualizer
Visualizer mVisualizer = new Visualizer(mAudioTrack.getAudioSessionId());
mVisualizer.setEnabled(false);
mVisualizer.setCaptureSize(Visualizer.getCaptureSizeRange()[1]);
mVisualizer.setDataCaptureListener(new Visualizer.OnDataCaptureListener() {
    @Override
    public void onWaveFormDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
        double sum = 0;
        for (int i = 0; i < bytes.length; i++) {
            sum += Math.abs(bytes[i]) * Math.abs(bytes[i]);
        }
        double volume = (double) Math.sqrt(1.0d * sum / bytes.length);
        //THIS IS THE RESIZE FUNCTION//
        //resizeHeart((double) volume);
        System.out.println("Volume: " + volume); //always prints out between 108 and 112.
    }

    @Override
    public void onFftDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
        //not sure what to do here.
    }
}, Visualizer.getMaxCaptureRate() / 2, true, true);
mVisualizer.setEnabled(true);
The problem is that you're treating the bytes as samples even though you've specified a 16-bit sample size. Try something like this (note the abs is unnecessary since you're squaring anyway):
for (int i = 0; i + 1 < bytes.length; i += 2) {
    int sample = (bytes[i] << 8) | (bytes[i + 1] & 0xff);
    sum += sample * sample;
}
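Building on that, here is a hedged sketch of turning the summed squares into an approximate dB value for the resize. It assumes the captured data really is 16-bit PCM as described above; resizeHeart comes from the question's commented-out code, and the -60 dB..0 dB mapping is just an illustrative choice:

// Convert the summed squares into RMS, then into dBFS, then into a 1x..2x icon scale.
int sampleCount = bytes.length / 2;
double rms = Math.sqrt(sum / sampleCount);          // sum from the loop above
double db = 20.0 * Math.log10(rms / 32768.0);       // 0 dBFS = full-scale 16-bit sample
// Map roughly -60 dBFS..0 dBFS onto 0..1, then onto the 1x..2x pulse scale.
double normalized = Math.max(0.0, Math.min(1.0, (db + 60.0) / 60.0));
double scale = 1.0 + normalized;                    // 1x at silence, 2x at loudest
// resizeHeart(scale);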