I have an mp4 video file on my SD card. I would like to extract the audio from the video and then save the extracted audio as a separate file on the SD card using the MediaExtractor API. Here is the code I've tried:
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(MEDIA_PATH_To_File_On_SDCARD);
MediaCodec decoder = null;
for (int i = 0; i < extractor.getTrackCount(); i++) {
    MediaFormat format = extractor.getTrackFormat(i);
    String mime = format.getString(MediaFormat.KEY_MIME);
    if (mime.startsWith("audio/")) {
        extractor.selectTrack(i);
        decoder = MediaCodec.createDecoderByType(mime);
        if (decoder != null) {
            decoder.configure(format, null, null, 0);
        }
        break;
    }
}
I'm stuck here: I have no idea how to take the selected audio track and save it to the SD card.
Late to the party, but this can be done by using the MediaExtractor and MediaMuxer APIs together. Check out the working code below; the full gist is linked at the end.
/**
 * @param srcPath  the path of the source video file.
 * @param dstPath  the path of the destination video file.
 * @param startMs  starting time in milliseconds for trimming. Set to
 *                 negative if starting from the beginning.
 * @param endMs    end time for trimming in milliseconds. Set to negative if
 *                 no trimming at the end.
 * @param useAudio true if keeping the audio track from the source.
 * @param useVideo true if keeping the video track from the source.
 * @throws IOException
 */
@SuppressLint("NewApi")
public void genVideoUsingMuxer(String srcPath, String dstPath, int startMs, int endMs, boolean useAudio, boolean useVideo) throws IOException {
// Set up MediaExtractor to read from the source.
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(srcPath);
int trackCount = extractor.getTrackCount();
// Set up MediaMuxer for the destination.
MediaMuxer muxer;
muxer = new MediaMuxer(dstPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
// Set up the tracks and retrieve the max buffer size for selected
// tracks.
HashMap<Integer, Integer> indexMap = new HashMap<Integer, Integer>(trackCount);
int bufferSize = -1;
for (int i = 0; i < trackCount; i++) {
MediaFormat format = extractor.getTrackFormat(i);
String mime = format.getString(MediaFormat.KEY_MIME);
boolean selectCurrentTrack = false;
if (mime.startsWith("audio/") && useAudio) {
selectCurrentTrack = true;
} else if (mime.startsWith("video/") && useVideo) {
selectCurrentTrack = true;
}
if (selectCurrentTrack) {
extractor.selectTrack(i);
int dstIndex = muxer.addTrack(format);
indexMap.put(i, dstIndex);
if (format.containsKey(MediaFormat.KEY_MAX_INPUT_SIZE)) {
int newSize = format.getInteger(MediaFormat.KEY_MAX_INPUT_SIZE);
bufferSize = newSize > bufferSize ? newSize : bufferSize;
}
}
}
if (bufferSize < 0) {
bufferSize = DEFAULT_BUFFER_SIZE;
}
// Set up the orientation and starting time for extractor.
MediaMetadataRetriever retrieverSrc = new MediaMetadataRetriever();
retrieverSrc.setDataSource(srcPath);
String degreesString = retrieverSrc.extractMetadata(MediaMetadataRetriever.METADATA_KEY_VIDEO_ROTATION);
if (degreesString != null) {
int degrees = Integer.parseInt(degreesString);
if (degrees >= 0) {
muxer.setOrientationHint(degrees);
}
}
if (startMs > 0) {
extractor.seekTo(startMs * 1000, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
}
// Copy the samples from MediaExtractor to MediaMuxer. We will loop
// for copying each sample and stop when we get to the end of the source
// file or exceed the end time of the trimming.
int offset = 0;
int trackIndex = -1;
ByteBuffer dstBuf = ByteBuffer.allocate(bufferSize);
MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();
muxer.start();
while (true) {
bufferInfo.offset = offset;
bufferInfo.size = extractor.readSampleData(dstBuf, offset);
if (bufferInfo.size < 0) {
Log.d(TAG, "Saw input EOS.");
bufferInfo.size = 0;
break;
} else {
bufferInfo.presentationTimeUs = extractor.getSampleTime();
if (endMs > 0 && bufferInfo.presentationTimeUs > (endMs * 1000)) {
Log.d(TAG, "The current sample is over the trim end time.");
break;
} else {
bufferInfo.flags = extractor.getSampleFlags();
trackIndex = extractor.getSampleTrackIndex();
muxer.writeSampleData(indexMap.get(trackIndex), dstBuf, bufferInfo);
extractor.advance();
}
}
}
muxer.stop();
muxer.release();
return;
}
You can invoke the above method with a single line:
genVideoUsingMuxer(videoFile, originalAudio, -1, -1, true, false)
Also, read the comments to use this method more efficiently.
GIST: https://gist.github.com/ArsalRaza/132a6e99d59aa80b9861ae368bc786d0
Take a look at my post "Decoding Video and Encoding again by Mediacodec gets a corrupted file", where there is an example (pay attention to the answer too).
You have to use a MediaMuxer: call addTrack for the video track and write the encoded data for that track to the muxer after encoding each frame. You have to add a track for the audio too. If you only want audio, ignore the video part and just write the audio data to the muxer (a stripped-down, audio-only sketch is shown below). You can see some examples on the Grafika page, for instance: https://github.com/google/grafika/
Also you can find more examples here: http://www.bigflake.com/mediacodec/
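To make the audio-only case concrete, here is a minimal sketch of the same MediaExtractor-to-MediaMuxer copy, stripped down to just the audio track. The class name, the MPEG-4 output container, and the 256 KB sample buffer are assumptions for illustration, not requirements:
import java.io.IOException;
import java.nio.ByteBuffer;
import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import android.media.MediaMuxer;

public class AudioOnlyCopy {
    // Copies the first audio track of srcPath into dstPath without re-encoding.
    public static void extractAudio(String srcPath, String dstPath) throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(srcPath);
        MediaMuxer muxer = new MediaMuxer(dstPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int dstTrack = -1;
        for (int i = 0; i < extractor.getTrackCount(); i++) {
            MediaFormat format = extractor.getTrackFormat(i);
            if (format.getString(MediaFormat.KEY_MIME).startsWith("audio/")) {
                extractor.selectTrack(i);
                dstTrack = muxer.addTrack(format); // compressed samples are copied as-is
                break;
            }
        }
        if (dstTrack < 0) { // no audio track found
            muxer.release();
            extractor.release();
            return;
        }
        ByteBuffer buffer = ByteBuffer.allocate(256 * 1024); // assumed max sample size
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        muxer.start();
        while (true) {
            info.size = extractor.readSampleData(buffer, 0);
            if (info.size < 0) {
                break; // end of stream
            }
            info.presentationTimeUs = extractor.getSampleTime();
            info.flags = extractor.getSampleFlags();
            muxer.writeSampleData(dstTrack, buffer, info);
            extractor.advance();
        }
        muxer.stop();
        muxer.release();
        extractor.release();
    }
}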
Thanks
I ran into a problem, as described below.
I use the Android framework's MediaCodec API to encode the camera preview stream from an OpenGL texture, and I want to generate a TS file.
Since MediaCodec does not support generating a TS file, I use ffmpeg to do so.
Everything is OK: the TS file is successfully generated and it can be played by a media player on either my Android phone or my PC, but two problems are still bothering me:
1. The video file does not have a cover image (thumbnail), wherever the xxx.ts is shown, on my Android phone or on my PC.
2. When I move the xxx.ts onto my PC (Windows) and right-click the file to check its properties, the frame-rate property is also empty.
Does anyone have ideas about these issues?
The encoder MediaCodec configuration is as below:
mBufferInfo = new MediaCodec.BufferInfo();
MediaFormat format = MediaFormat.createVideoFormat(MIME_TYPE, width, height);
format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
format.setInteger(MediaFormat.KEY_BIT_RATE, 4 * 1024 * 1024);
format.setInteger(MediaFormat.KEY_FRAME_RATE, 25);
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);
mEncoder = MediaCodec.createEncoderByType("video/avc");
mEncoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
mInputSurface = mEncoder.createInputSurface();
mEncoder.start();
mFFmpegMuxer = new FFmpegMuxer();
mFFmpegMuxer.prepare();
FFmpegMuxer->prepare(), i.e. the ffmpeg muxer's configuration, is as below:
mOutputFormatCtx = avformat_alloc_context();
AVOutputFormat * outputFormat = av_guess_format(nullptr, "xxx.ts", nullptr);
mOutputFormatCtx->oformat = outputFormat;
AVStream *stream = avformat_new_stream(mOutputFormatCtx, nullptr);
stream->codecpar->codec_id = AV_CODEC_ID_H264;
stream->codecpar->format = AV_PIX_FMT_RGBA;
stream->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
stream->codecpar->codec_tag = av_codec_get_tag(mOutputFormatCtx->oformat->codec_tag,
AV_CODEC_ID_H264);
stream->codecpar->width = 1080;
stream->codecpar->height = 1200;
stream->codecpar->bit_rate = 4 * 1024 * 1024;
stream->time_base.num = 1;
stream->time_base.den = 25;
mOutputStreamInd = stream->index;
if (mOutputFormatCtx->oformat->flags & AVFMT_GLOBALHEADER) {
mOutputFormatCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
avio_open2(&mOutputFormatCtx->pb, outputPath.c_str(), AVIO_FLAG_WRITE, nullptr, nullptr);
AVDictionary *opts = nullptr;
av_dict_set(&opts, "movflags", "faststart", 0);
avformat_write_header(mOutputFormatCtx, &opts);
av_dict_free(&opts);
Every time MediaCodec successfully encodes a packet, the packet is enqueued into ffmpeg:
void FFmpegMuxer::enqueueBuffer(uint8_t *data, int offset, int size, long pts, bool keyFrame) {
if (mPacket == nullptr) { mPacket = av_packet_alloc(); }
av_init_packet(mPacket);
mPacket->stream_index = mOutputStreamInd;
mPacket->size = size;
mPacket->data = data + offset;
if (mRecStartPts == 0) {
mRecStartPts = pts;
mPacket->pts = 0;
mPacket->dts = 0;
} else {
int64_t dstPts = pts - mRecStartPts;
dstPts = av_rescale_q(dstPts, AV_TIME_BASE_Q,
mOutputFormatCtx->streams[mOutputStreamInd]->time_base);
mPacket->pts = dstPts;
mPacket->dts = dstPts;
}
if (keyFrame) {
mPacket->flags = AV_PKT_FLAG_KEY;
}
int status = av_interleaved_write_frame(mOutputFormatCtx, mPacket);
if (status < 0) {
..........
}
av_packet_unref(mPacket);
}
When the recording needs to be stopped, the code is as below:
av_write_trailer(mOutputFormatCtx);
These code snippets are all the information I can supply. Can anyone find out what is wrong?
Finally I found out what is wrong: TS is a real-time bit stream, so I need to insert the SPS and PPS before every I-frame. That is the answer.
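For anyone hitting the same issue, here is a hedged sketch of what that fix can look like on the MediaCodec side: cache the SPS/PPS delivered as a BUFFER_FLAG_CODEC_CONFIG output buffer and prepend it to every key frame before the Annex-B data goes to the TS muxer. The FFmpegMuxer Java wrapper and its enqueueBuffer() signature are assumptions based on the snippets above.
import java.nio.ByteBuffer;
import android.media.MediaCodec;

public class TsKeyFrameRepacker {
    private byte[] spsPps; // cached codec config (SPS + PPS, with Annex-B start codes)

    // Call this for every output buffer drained from the video MediaCodec.
    public void onEncodedBuffer(ByteBuffer encoded, MediaCodec.BufferInfo info, FFmpegMuxer muxer) {
        byte[] data = new byte[info.size];
        encoded.position(info.offset);
        encoded.get(data, 0, info.size);

        if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
            spsPps = data; // keep it, do not write it to the muxer as a frame of its own
            return;
        }

        boolean keyFrame = (info.flags & MediaCodec.BUFFER_FLAG_KEY_FRAME) != 0;
        if (keyFrame && spsPps != null) {
            // TS carries no global header, so every I-frame must be self-describing.
            byte[] withHeader = new byte[spsPps.length + data.length];
            System.arraycopy(spsPps, 0, withHeader, 0, spsPps.length);
            System.arraycopy(data, 0, withHeader, spsPps.length, data.length);
            data = withHeader;
        }
        // Assumed Java wrapper for the native enqueueBuffer() shown above.
        muxer.enqueueBuffer(data, 0, data.length, info.presentationTimeUs, keyFrame);
    }
}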
I am encoding raw data on Android using ffmpeg libraries. The native code reads the audio data from an external device and encodes it into AAC format in an mp4 container. I am finding that the audio data is successfully encoded (I can play it with Groove Music, my default Windows audio player). But the metadata, as reported by ffprobe, has an incorrect duration of 0.05 secs - it's actually several seconds long. Also the bitrate is reported wrongly as around 65kbps even though I specified 192kbps.
I've tried recordings of various durations but the result is always similar - the (very small) duration and bitrate. I've tried various other audio players such as Quicktime but they play only the first 0.05 secs or so of the audio.
I've removed error-checking from the following. The actual code checks every call and no problems are reported.
Initialisation:
void AudioWriter::initialise( const char *filePath )
{
AVCodecID avCodecID = AVCodecID::AV_CODEC_ID_AAC;
int bitRate = 192000;
const char *containerFormat = "mp4";
int sampleRate = 48000;
int nChannels = 2;
mAvCodec = avcodec_find_encoder(avCodecID);
mAvCodecContext = avcodec_alloc_context3(mAvCodec);
mAvCodecContext->codec_id = avCodecID;
mAvCodecContext->codec_type = AVMEDIA_TYPE_AUDIO;
mAvCodecContext->sample_fmt = AV_SAMPLE_FMT_FLTP;
mAvCodecContext->bit_rate = bitRate;
mAvCodecContext->sample_rate = sampleRate;
mAvCodecContext->channels = nChannels;
mAvCodecContext->channel_layout = AV_CH_LAYOUT_STEREO;
avcodec_open2( mAvCodecContext, mAvCodec, nullptr );
mAvFormatContext = avformat_alloc_context();
avformat_alloc_output_context2(&mAvFormatContext, nullptr, containerFormat, nullptr);
mAvFormatContext->audio_codec = mAvCodec;
mAvFormatContext->audio_codec_id = avCodecID;
mAvOutputStream = avformat_new_stream(mAvFormatContext, mAvCodec);
avcodec_parameters_from_context(mAvOutputStream->codecpar, mAvCodecContext);
if (!(mAvFormatContext->oformat->flags & AVFMT_NOFILE))
{
avio_open(&mAvFormatContext->pb, filePath, AVIO_FLAG_WRITE);
}
if ( mAvFormatContext->oformat->flags & AVFMT_GLOBALHEADER )
{
mAvCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
avformat_write_header(mAvFormatContext, NULL);
mAvAudioFrame = av_frame_alloc();
mAvAudioFrame->nb_samples = mAvCodecContext->frame_size;
mAvAudioFrame->format = mAvCodecContext->sample_fmt;
mAvAudioFrame->channel_layout = mAvCodecContext->channel_layout;
av_samples_get_buffer_size(NULL, mAvCodecContext->channels, mAvCodecContext->frame_size,
mAvCodecContext->sample_fmt, 0);
av_frame_get_buffer(mAvAudioFrame, 0);
av_frame_make_writable(mAvAudioFrame);
mAvPacket = av_packet_alloc();
}
Encoding:
// SoundRecording is a custom class with the raw samples to be encoded
bool AudioWriter::encodeToContainer( SoundRecording *soundRecording )
{
int ret;
int frameCount = mAvCodecContext->frame_size;
int nChannels = mAvCodecContext->channels;
float *buf = new float[frameCount*nChannels];
while ( soundRecording->hasReadableData() )
{
//Populate the frame
int samplesRead = soundRecording->read( buf, frameCount*nChannels );
// Planar data
int nFrames = samplesRead/nChannels;
for ( int i = 0; i < nFrames; ++i )
{
for (int c = 0; c < nChannels; ++c )
{
samples[c][i] = buf[nChannels*i +c];
}
}
// Fill a gap at the end with silence
if ( samplesRead < frameCount*nChannels )
{
for ( int i = samplesRead; i < frameCount*nChannels; ++i )
{
for (int c = 0; c < nChannels; ++c )
{
samples[c][i] = 0.0;
}
}
}
encodeFrame( mAvAudioFrame );
}
finish();
}
bool AudioWriter::encodeFrame( AVFrame *frame )
{
//send the frame for encoding
int ret = 0;
if ( frame != nullptr )
{
frame->pts = mAudFrameCounter++;
}
avcodec_send_frame(mAvCodecContext, frame );
while (ret >= 0)
{
ret = avcodec_receive_packet(mAvCodecContext, mAvPacket);
if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF )
{
break;
}
else
if (ret < 0) {
return false;
}
av_packet_rescale_ts(mAvPacket, mAvCodecContext->time_base, mAvOutputStream->time_base);
mAvPacket->stream_index = mAvOutputStream->index;
av_interleaved_write_frame(mAvFormatContext, mAvPacket);
av_packet_unref(mAvPacket);
}
return true;
}
void AudioWriter::finish()
{
// Flush by sending a null frame
encodeFrame( nullptr );
av_write_trailer(mAvFormatContext);
}
Since the resultant file contains the recorded music, the code to manipulate the audio data seems to be correct (unless I am overwriting other memory somehow).
The inaccurate duration and bitrate suggest that information concerning time is not being properly managed. I set the pts of the frames using a simple increasing integer. I'm unclear what the code that sets the timestamp and stream index achieves - and whether it's even necessary: I copied it from supposedly working code but I've seen other code without it.
Can anyone see what I'm doing wrong?
The timestamps need to be correct. Set the time_base to 1/sample_rate and increment the timestamp by 1024 for each frame. Note: 1024 is AAC-specific; if you change codecs, you need to change the frame size.
Using the MediaExtractor class, I am able to get encoded audio sample data from a saved mp4 video with the code below:
ByteBuffer byteBuffer = ByteBuffer.allocate(1024 * 256);
MediaExtractor audioExtractor = new MediaExtractor();
try {
int trackIndex = -1;
audioExtractor.setDataSource(originalMediaItem.getFilePath());
for (int i = 0; i < audioExtractor.getTrackCount(); i++) {
MediaFormat format = audioExtractor.getTrackFormat(i);
String mime = format.getString(MediaFormat.KEY_MIME);
if (mime.startsWith("audio/")) {
trackIndex = i;
break;
}
}
audioExtractor.selectTrack(trackIndex);
mAudioFormatMedia = audioExtractor.getTrackFormat(trackIndex);
mAudioTrackIndex = mMediaMuxer.addTrack(mAudioFormatMedia);
int size = audioExtractor.readSampleData(byteBuffer, 0);
do {
if (audioExtractor.getSampleTrackIndex() == 1) {
long presentationTime = audioExtractor.getSampleTime();
mInputBufferHashMap.put(presentationTime, byteBuffer);
audioExtractor.advance();
size = audioExtractor.readSampleData(byteBuffer, 0);
}
} while (size >= 0);
audioExtractor.release();
audioExtractor = null;
} catch (IOException e) {
e.printStackTrace();
}
I have a video source coming from a GlSurface and want to use a MediaMuxer to mux this video with the audio extraction mentioned previously. Audio is interleaved into the muxer via the hashmap as the video is being processed. I am successful in muxing both the video and audio and creating a playable mp4 video; however, the audio does not sound anything like the audio of the original mp4.
I do see the expected bufferInfo.size and bufferInfo.presentationTimeUs when I write to the muxer:
mMediaMuxer.writeSampleData(mAudioTrackIndex, buffer, mAudioBufferInfo);
Log.d(TAG, String.format("Wrote %d audio bytes at %d", mAudioBufferInfo.size, mAudioBufferInfo.presentationTimeUs));
I've tried to use the standard inputBuffer/outputBuffer approach with MediaCodec, like this https://gist.github.com/a-m-s/1991ab18fbcb0fcc2cf9, but this produces the same audio, and from my understanding MediaExtractor should already be producing encoded audio data, so the data should be able to be piped through directly.
What is also interesting is that when I check the flags during the initial extraction:
if( (audioExtractor.getSampleFlags() & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0)
Log.d(TAG, "BUFFER_FLAG_END_OF_STREAM")
Neither of the above gets printed for the original mp4 video. I am now questioning the original mp4 video, wondering whether it is possible for an mp4 to have a non-extractable audio track, and how I can confirm this.
I believe I've looked at most, if not all, of the MediaExtractor questions on Stack Overflow and a lot of the singleton solutions for MediaExtractor on GitHub. Does anyone know of another way to extract audio, e.g. using ExoPlayer (preferably not ffmpeg, because it adds a ton of overhead to the Android project)? Any insights would help if there are any errors in my current implementation!
EDIT 1: This is the format returned by audioExtractor.getTrackFormat(trackIndex):
{max-bitrate=512000, sample-rate=48000, track-id=2, durationUs=22373187, mime=audio/mp4a-latm, profile=2, channel-count=4, language=```, aac-profile=2, bitrate=512000, max-input-size=1764, csd-0=java.nio.HeapByteBuffer[pos=0 lim=2 cap=2]}
The problem was attempting to create a Map for the audio data; the audio data in it was not correct. I was able to solve this by batching audio sample data while writing the video data, using a method like the one below (a usage sketch follows the method):
private void writeAudioSampleData(
MediaExtractor audioExtractor, MediaMuxer muxer, int filterStart, int filterEnd) {
mFilterStart = filterEnd;
MediaCodec.BufferInfo audioBufferInfo = new MediaCodec.BufferInfo();
boolean audioExtractorDone = false;
audioExtractor.seekTo(filterStart, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
synchronized (mAudioLockObject) {
while (!audioExtractorDone) {
try {
audioBufferInfo.size =
audioExtractor.readSampleData(audioInputBuffer, 0);
} catch (Exception e) {
e.printStackTrace();
}
if (DEBUG) {
Log.d(TAG, "audioBufferInfo.size: " + audioBufferInfo.size);
}
if (audioBufferInfo.size < 0) {
audioBufferInfo.size = 0;
audioExtractorDone = true;
} else {
audioBufferInfo.presentationTimeUs = audioExtractor.getSampleTime();
if (audioBufferInfo.presentationTimeUs > filterEnd) {
break; //out of while
}
if (audioBufferInfo.presentationTimeUs >= filterStart &&
audioBufferInfo.presentationTimeUs <= filterEnd) {
audioBufferInfo.presentationTimeUs -= mOriginalMediaItem.mRecordingStartTs;
audioBufferInfo.flags = audioExtractor.getSampleFlags();
try {
muxer.writeSampleData(mAudioTrackIndex, audioInputBuffer,
audioBufferInfo);
if (DEBUG)Log.d(TAG, String.format("Wrote %d audio bytes at %d",
audioBufferInfo.size, audioBufferInfo.presentationTimeUs));
} catch(IllegalArgumentException | IllegalStateException |
NullPointerException ignore) {}
}
audioExtractor.advance();
}
}
}
}
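A hedged usage sketch of how this method can be driven: advance a time window alongside the video writer and batch the matching audio for each window. writeVideoSamples(), mAudioExtractor, mMediaMuxer and WINDOW_US are hypothetical names, not part of the answer above; times are in microseconds, matching presentationTimeUs.
private static final int WINDOW_US = 500_000; // copy audio in 0.5 s batches

private void muxInWindows(long videoDurationUs) {
    for (int startUs = 0; startUs < videoDurationUs; startUs += WINDOW_US) {
        writeVideoSamples(startUs, startUs + WINDOW_US); // app-specific video writing
        writeAudioSampleData(mAudioExtractor, mMediaMuxer, startUs, startUs + WINDOW_US);
    }
}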
I have an application (Qt + Android) that creates a live stream from Android's Camera (AVC) + AudioRecorder (AAC) and then sends the encoded data to an RTMP server using the librtmp library (v2.4).
AVC MediaCodec main func.:
public void videoEncode(byte[] data) {
// Video buffers
videoCodecInputBuffers = videoMediaCodec.getInputBuffers();
videoCodecOutputBuffers = videoMediaCodec.getOutputBuffers();
int inputBufferIndex = videoMediaCodec.dequeueInputBuffer(-1);
if (inputBufferIndex >= 0) {
videoInputBuffer = videoCodecInputBuffers[inputBufferIndex];
videoCodecInputData = YV12toYUV420Planar(data, encWidth * encHeight);
videoInputBuffer.clear();
videoInputBuffer.put(videoCodecInputData);
videoMediaCodec.queueInputBuffer(inputBufferIndex, 0, videoCodecInputData.length, 0, 0);
}
// Get AVC/H.264 frame
int outputBufferIndex = videoMediaCodec.dequeueOutputBuffer(videoBufferInfo, 0);
while(outputBufferIndex >= 0) {
videoOutputBuffer = videoCodecOutputBuffers[outputBufferIndex];
videoOutputBuffer.get(videoCodecOutputData, 0, videoBufferInfo.size);
// H.264 / AVC header
if(videoCodecOutputData[0] == 0x00 && videoCodecOutputData[1] == 0x00 && videoCodecOutputData[2] == 0x00 && videoCodecOutputData[3] == 0x01) {
// I-frame
boolean keyFrame = false;
if((videoBufferInfo.flags & MediaCodec.BUFFER_FLAG_SYNC_FRAME) == MediaCodec.BUFFER_FLAG_SYNC_FRAME) {
resetTimestamp();
keyFrame = true;
}
int currentTimestamp = cameraAndroid.calcTimestamp();
if(prevTimestamp == currentTimestamp) currentTimestamp++;
sendVideoData(videoCodecOutputData, videoBufferInfo.size, currentTimestamp, cameraAndroid.calcTimestamp()); // Native C func
prevTimestamp = currentTimestamp;
// SPS / PPS sent
spsPpsFrame = true;
}
videoMediaCodec.releaseOutputBuffer(outputBufferIndex, false);
outputBufferIndex = videoMediaCodec.dequeueOutputBuffer(videoBufferInfo, 0);
}
}
AAC MediaCodec main func.:
public void audioEncode(byte[] data) {
// Audio buffers
audioCodecInputBuffers = audioMediaCodec.getInputBuffers();
audioCodecOutputBuffers = audioMediaCodec.getOutputBuffers();
// Add raw chunk into buffer
int inputBufferIndex = audioMediaCodec.dequeueInputBuffer(-1);
if (inputBufferIndex >= 0) {
audioInputBuffer = audioCodecInputBuffers[inputBufferIndex];
audioInputBuffer.clear();
audioInputBuffer.put(data);
audioMediaCodec.queueInputBuffer(inputBufferIndex, 0, data.length, 0, 0);
}
// Encode AAC
int outputBufferIndex = audioMediaCodec.dequeueOutputBuffer(audioBufferInfo, 0),
audioOutputBufferSize = 0;
while(outputBufferIndex >= 0) {
audioOutputBuffer = audioCodecOutputBuffers[outputBufferIndex];
audioOutputBuffer.get(audioCodecOutputData, 0, audioBufferInfo.size);
if(spsPpsFrame || esdsChunk) {
int currentTimestamp = cameraAndroid.calcTimestamp();
if(prevTimestamp == currentTimestamp) currentTimestamp++;
sendAudioData(audioCodecOutputData, audioBufferInfo.size, currentTimestamp); // Native C func
prevTimestamp = currentTimestamp;
esdsChunk = false;
}
// Next chunk
audioMediaCodec.releaseOutputBuffer(outputBufferIndex, false);
outputBufferIndex = audioMediaCodec.dequeueOutputBuffer(audioBufferInfo, 0);
}
}
Camera frames are encoded in setPreviewCallbackWithBuffer, and the AudioRecorder's chunks are encoded in another thread:
audioThread = new Thread(new Runnable() {
public void run() {
audioBufferSize = AudioRecord.getMinBufferSize(44100, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
while(!audioThread.interrupted()) {
int ret = mic.read(audioCodecInputData, 0, audioBufferSize);
if(ret >= 0)
cameraAndroid.audioEncode(audioCodecInputData);
}
}
});
sendVideoData and sendAudioData are native C functions (librtmp functions + JNI):
public synchronized native void sendVideoData(byte[] buf, int size, int timestamp, boolean keyFrame);
public synchronized native void sendAudioData(byte[] buf, int size, int timestamp);
The main thing that I can't understand is: why is the live stream completely unstable when I play it in Adobe Flash Player?
The first 1-2 seconds of the stream are absolutely correct, but then I see only the I-frames every 2 seconds (videoMediaFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2)) and a very bad audio stream that I can hear for a few milliseconds around each I-frame before it cuts out.
Can someone please show me the correct way to create a stable live stream? Where am I wrong?
Also, I am posting the AVC/AAC MediaCodec settings here (maybe something is wrong here?):
// H.264/AVC (advanced video coding) format
MediaFormat videoMediaFormat = MediaFormat.createVideoFormat("video/avc", encWidth, encHeight);
videoMediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Planar);
videoMediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, encWidth * encHeight * 4); // bits per second
videoMediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, fps); // FPS
videoMediaFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, iFrameInterval); // interval in seconds between I-frames
videoMediaCodec = MediaCodec.createEncoderByType("video/avc");
videoMediaCodec.configure(videoMediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
// AAC (advanced audio coding) format
MediaFormat audioMediaFormat = MediaFormat.createAudioFormat("audio/mp4a-latm", 44100, 1); // mime-type, sample rate, channel count
audioMediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 64 * 1000); // kbps
audioMediaFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
audioMediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, audioBufferSize); // 4096 (default) / 4736 * 1 (min audio buffer size)
audioMediaCodec = MediaCodec.createEncoderByType("audio/mp4a-latm");
audioMediaCodec.configure(audioMediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
Update:
I tried to play the stream with ffmpeg (thanks @Robert Rowntree), and this is what I see constantly in the console:
Non-monotonous DTS in output stream 0:1; previous: 95054, current:
46136; changing to 95056. This may result in incorrect timestamps in
the output file.
So I checked the output from the Android app, but I can't see any wrong lines (a = encoded AAC chunk, v = encoded AVC frame, integer value = timestamp in milliseconds): output.txt
Are those timestamps correct?
I would like to produce an mp4 file by multiplexing audio from the mic (overriding didGetAudioData) and video from the camera (overriding onPreviewFrame). However, I encountered an audio/video synchronization problem: the video appears faster than the audio. I wondered whether the problem is related to incompatible configurations or to presentationTimeUs; could someone guide me on how to fix it? Below is my setup.
Video configuration
formatVideo = MediaFormat.createVideoFormat(MIME_TYPE_VIDEO, 640, 360);
formatVideo.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar);
formatVideo.setInteger(MediaFormat.KEY_BIT_RATE, 2000000);
formatVideo.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
formatVideo.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5);
I get the video presentation PTS as below:
if(generateIndex == 0) {
videoAbsolutePtsUs = 132;
StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
}else {
CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
videoAbsolutePtsUs =132+ CurrentVideoAbsolutePtsUs-StartVideoAbsolutePtsUs;
}
generateIndex++;
Audio configuration
format = MediaFormat.createAudioFormat(MIME_TYPE, 48000/*sample rate*/, AudioFormat.CHANNEL_IN_MONO /*Channel config*/);
format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
format.setInteger(MediaFormat.KEY_SAMPLE_RATE,48000);
format.setInteger(MediaFormat.KEY_CHANNEL_COUNT,1);
format.setInteger(MediaFormat.KEY_BIT_RATE,64000);
I get the audio presentation PTS as below:
if(generateIndex == 0) {
audioAbsolutePtsUs = 132;
StartAudioAbsolutePtsUs = System.nanoTime() / 1000L;
}else {
CurrentAudioAbsolutePtsUs = System.nanoTime() / 1000L;
audioAbsolutePtsUs =CurrentAudioAbsolutePtsUs - StartAudioAbsolutePtsUs;
}
generateIndex++;
audioAbsolutePtsUs = getJitterFreePTS(audioAbsolutePtsUs, audioInputLength / 2);
long startPTS = 0;
long totalSamplesNum = 0;
private long getJitterFreePTS(long bufferPts, long bufferSamplesNum) {
long correctedPts = 0;
long bufferDuration = (1000000 * bufferSamplesNum) / 48000;
bufferPts -= bufferDuration; // accounts for the delay of acquiring the audio buffer
if (totalSamplesNum == 0) {
// reset
startPTS = bufferPts;
totalSamplesNum = 0;
}
correctedPts = startPTS + (1000000 * totalSamplesNum) / 48000;
if(bufferPts - correctedPts >= 2*bufferDuration) {
// reset
startPTS = bufferPts;
totalSamplesNum = 0;
correctedPts = startPTS;
}
totalSamplesNum += bufferSamplesNum;
return correctedPts;
}
Was my issue caused by applying the jitter function to audio only? If yes, how could I apply the jitter function to video? I also tried to find the correct audio and video presentation PTS via https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java, but EncodeDecodeTest only provides a video PTS. That's why my implementation uses the system nanotime for both audio and video. If I want to use the video presentationPTS from EncodeDecodeTest, how do I construct a compatible audio presentationPTS? Thanks for the help!
Below is how I queue a YUV frame to the video MediaCodec, for reference. The audio part is identical except for a different presentationPTS.
int videoInputBufferIndex;
int videoInputLength;
long videoAbsolutePtsUs;
long StartVideoAbsolutePtsUs, CurrentVideoAbsolutePtsUs;
int put_v =0;
int get_v =0;
int generateIndex = 0;
public void setByteBufferVideo(byte[] buffer, boolean isUsingFrontCamera, boolean Input_endOfStream){
if(Build.VERSION.SDK_INT >=18){
try{
endOfStream = Input_endOfStream;
if(!Input_endOfStream){
ByteBuffer[] inputBuffers = mVideoCodec.getInputBuffers();
videoInputBufferIndex = mVideoCodec.dequeueInputBuffer(-1);
if (VERBOSE) {
Log.w(TAG,"[put_v]:"+(put_v)+"; videoInputBufferIndex = "+videoInputBufferIndex+"; endOfStream = "+endOfStream);
}
if(videoInputBufferIndex>=0) {
ByteBuffer inputBuffer = inputBuffers[videoInputBufferIndex];
inputBuffer.clear();
inputBuffer.put(mNV21Convertor.convert(buffer));
videoInputLength = buffer.length;
if(generateIndex == 0) {
videoAbsolutePtsUs = 132;
StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
}else {
CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
videoAbsolutePtsUs =132+ CurrentVideoAbsolutePtsUs - StartVideoAbsolutePtsUs;
}
generateIndex++;
if (VERBOSE) {
Log.w(TAG, "[put_v]:"+(put_v)+"; videoAbsolutePtsUs = " + videoAbsolutePtsUs + "; CurrentVideoAbsolutePtsUs = "+CurrentVideoAbsolutePtsUs);
}
if (videoInputLength == AudioRecord.ERROR_INVALID_OPERATION) {
Log.w(TAG, "[put_v]ERROR_INVALID_OPERATION");
} else if (videoInputLength == AudioRecord.ERROR_BAD_VALUE) {
Log.w(TAG, "[put_v]ERROR_ERROR_BAD_VALUE");
}
if (endOfStream) {
Log.w(TAG, "[put_v]:"+(put_v++)+"; [get] receive endOfStream");
mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
} else {
Log.w(TAG, "[put_v]:"+(put_v++)+"; receive videoInputLength :" + videoInputLength);
mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, 0);
}
}
}
}catch (Exception x) {
x.printStackTrace();
}
}
}
How I solved this in my application was by setting the PTS of all video and audio frames against a shared "sync clock" (the sync also means it's thread-safe) that starts when the first video frame (having a PTS of 0 on its own) is available. So if audio recording starts sooner than video, the audio data is discarded (it doesn't go into the encoder) until video starts, and if it starts later, then the first audio PTS will be relative to the start of the entire video.
Of course you are free to allow the audio to start first, but players will usually skip it or wait for the first video frame anyway. Also be careful: encoded audio frames can arrive "out of order", and MediaMuxer will fail with an error sooner or later. My solution was to queue them all like this: sort them by PTS when a new one comes in, then write everything that is older than 500 ms (relative to the newest one) to the MediaMuxer, but only frames with a PTS higher than the latest written frame. Ideally this means data is smoothly written to the MediaMuxer with a 500 ms delay. Worst case, you will lose a few audio frames. A sketch of such a queue is below.
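Here is a minimal sketch of that reorder queue (one instance per track). The class name, the per-sample copy, and the 500 ms constant are my own assumptions rather than the exact code from my app:
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.PriorityQueue;
import android.media.MediaCodec;
import android.media.MediaMuxer;

public class SortedSampleQueue {
    private static final long MAX_DELAY_US = 500_000; // 500 ms reorder window

    private static class Sample {
        final ByteBuffer data;
        final MediaCodec.BufferInfo info;
        Sample(ByteBuffer data, MediaCodec.BufferInfo info) {
            this.data = data;
            this.info = info;
        }
    }

    private final PriorityQueue<Sample> queue = new PriorityQueue<>(64, new Comparator<Sample>() {
        @Override
        public int compare(Sample a, Sample b) {
            return Long.compare(a.info.presentationTimeUs, b.info.presentationTimeUs);
        }
    });
    private long newestPtsUs = Long.MIN_VALUE;
    private long lastWrittenPtsUs = Long.MIN_VALUE;

    // Copy the encoded sample out of the codec buffer and keep it in PTS order.
    public synchronized void enqueue(ByteBuffer codecOutput, MediaCodec.BufferInfo info) {
        ByteBuffer copy = ByteBuffer.allocate(info.size);
        codecOutput.position(info.offset);
        codecOutput.limit(info.offset + info.size);
        copy.put(codecOutput);
        copy.flip();
        MediaCodec.BufferInfo infoCopy = new MediaCodec.BufferInfo();
        infoCopy.set(0, info.size, info.presentationTimeUs, info.flags);
        queue.add(new Sample(copy, infoCopy));
        newestPtsUs = Math.max(newestPtsUs, info.presentationTimeUs);
    }

    // Write every buffered sample that is at least MAX_DELAY_US older than the newest
    // one; drop anything that would make the PTS go backwards.
    public synchronized void drainTo(MediaMuxer muxer, int trackIndex) {
        while (!queue.isEmpty()
                && queue.peek().info.presentationTimeUs <= newestPtsUs - MAX_DELAY_US) {
            Sample s = queue.poll();
            if (s.info.presentationTimeUs > lastWrittenPtsUs) {
                muxer.writeSampleData(trackIndex, s.data, s.info);
                lastWrittenPtsUs = s.info.presentationTimeUs;
            }
        }
    }
}
drainTo() would typically be called right after each enqueue(), once per track, from the thread that owns the MediaMuxer.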