I am writing an Android app where I plan to encode several images into a live H.264 video stream that can be replayed in any browser. I am using the MediaCodec API for encoding and then MediaMuxer to write it to a file, as per the example here http://bigflake.com/mediacodec/.
What I am stuck on is how to tell the encoder/muxer to encode the stream so that it can be played back progressively. From the examples, the video file only gets the right metadata headers once the encoder/muxer stop() and release() calls are made.
Thanks
I guess you are considering the time at which each frame is shown.
You need to give MediaMuxer, along with each frame, the right MediaCodec.BufferInfo, whose presentationTimeUs is set accordingly.
For example, say there are 3 frames, each shown for 1 second in the video:
sec 0---------1---------2-----------
frame1 frame2 frame3
int[] timestampSec = {0, 1, 2};
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
for (int i = 0; i < 3; i++) {
    // frame[i] is the encoded frame in a ByteBuffer; presentationTimeUs is in microseconds
    info.set(0, frame[i].remaining(), timestampSec[i] * 1000000L, 0);
    muxer.writeSampleData(trackId, frame[i], info);
}
As for initializing and finishing the MediaMuxer:
addTrack: when MediaCodec.dequeueOutputBuffer() returns MediaCodec.INFO_OUTPUT_FORMAT_CHANGED, pass the encoder's output format to MediaMuxer.addTrack() to initialize a new track for that format (it's "video/avc" in this case).
mediamuxer.start()
start writing frames as above
mediamuxer.stop(), then release()
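Putting those steps together, a minimal sketch of the drain loop might look like the following (a sketch assuming a single video track; names such as encoder, outputPath and done are illustrative, not from the original example):

MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int trackId = -1;
boolean muxerStarted = false;
boolean done = false;
while (!done) {
    int index = encoder.dequeueOutputBuffer(info, 10000);
    if (index == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
        // the real output format (including csd-0/csd-1) is only known here
        trackId = muxer.addTrack(encoder.getOutputFormat());
        muxer.start();
        muxerStarted = true;
    } else if (index >= 0) {
        ByteBuffer encoded = encoder.getOutputBuffer(index);
        if (muxerStarted && info.size > 0
                && (info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) == 0) {
            muxer.writeSampleData(trackId, encoded, info); // info carries presentationTimeUs
        }
        done = (info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0;
        encoder.releaseOutputBuffer(index, false);
    }
}
muxer.stop();
muxer.release();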
Related
I've managed to join the audio tracks of video files using MediaCodec. There are no problems doing so if the channel count and the sample rate of both audio tracks are the same.
(For some reason the OMX.SEC.aac.dec decoder always outputs 44100 Hz 2-channel audio if the original track is 22050 Hz, and outputs 48000 Hz 2-channel audio if the original track is 24000 Hz.)
The problem comes in when I try appending a 24000 Hz audio track after a 22050 Hz audio track. Assuming I want to output a 22050 Hz audio track consisting of both said tracks, I'll have to resample the 24000 Hz one.
I tried this:
private byte[] minorDownsamplingFrom48kTo44k(byte[] origByteArray)
{
    int origLength = origByteArray.length;
    int moddedLength = origLength * 147/160;
    int delta = origLength - moddedLength;
    byte[] resultByteArray = new byte[moddedLength];
    int arrayIndex = 0;
    for(int i = 0; i < origLength; i+=11)
    {
        for(int j = i; j < i+10; j++)
        {
            resultByteArray[arrayIndex] = origByteArray[j];
            arrayIndex++;
        }
    }
    return resultByteArray;
}
It returns a byte array of 3700-something bytes, and after encoding the audio content is correct... but buried under a very loud scrambled noise.
My questions:
How do I correctly downsample the audio track without leaving such artifacts? Should I use average values?
Should I use a resampler implemented using OpenSL ES to make the process faster and/or better?
The main issue is that you're just skipping bytes when you should be skipping samples.
Each sample is 16 bits, so two bytes. If the audio is stereo, there are four bytes per sample frame. You always have to skip whole multiples of that, otherwise your samples will get completely mixed up.
Using the same ratio (10/11), you can copy 40 bytes out of every 44 so that you always drop one complete four-byte sample and keep the remaining samples intact, as sketched below.
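A minimal sketch of that idea, assuming 16-bit stereo PCM (the method name is illustrative, and like the original it still drops samples outright rather than interpolating):

import java.io.ByteArrayOutputStream;

// Keeps 10 out of every 11 four-byte frames, so sample boundaries stay intact.
private static byte[] dropEveryEleventhFrame(byte[] orig) {
    final int frameSize = 4; // 2 bytes per sample * 2 channels
    int totalFrames = orig.length / frameSize;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int f = 0; f < totalFrames; f++) {
        if (f % 11 == 10) continue; // drop every 11th frame
        out.write(orig, f * frameSize, frameSize);
    }
    return out.toByteArray();
}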
As to why the resulting video is playing at a different speed, that's something completely different.
I'm using ffmpeg in an Android project via JNI to decode a real-time H.264 video stream. On the Java side I'm only sending the byte arrays into the native module. The native code runs a loop and checks the data buffers for new data to decode. Each data chunk is processed with:
int bytesLeft = data->GetSize();
int paserLength = 0;
int decodeDataLength = 0;
int gotPicture = 0;
const uint8_t* buffer = data->GetData();
while (bytesLeft > 0) {
    AVPacket packet;
    av_init_packet(&packet);
    paserLength = av_parser_parse2(_codecPaser, _codecCtx, &packet.data, &packet.size, buffer, bytesLeft, AV_NOPTS_VALUE, AV_NOPTS_VALUE, AV_NOPTS_VALUE);
    bytesLeft -= paserLength;
    buffer += paserLength;

    if (packet.size > 0) {
        decodeDataLength = avcodec_decode_video2(_codecCtx, _frame, &gotPicture, &packet);
    }
    else {
        break;
    }
    av_free_packet(&packet);
}
if (gotPicture) {
    // pass the frame to rendering
}
The system works pretty well until the incoming video's resolution changes. I need to handle the transition between 4:3 and 16:9 aspect ratios. My AVCodecContext is configured as follows:
_codecCtx->flags2 |= CODEC_FLAG2_FAST;
_codecCtx->thread_count = 2;
_codecCtx->thread_type = FF_THREAD_FRAME;

if (_codec->capabilities & CODEC_FLAG_LOW_DELAY) {
    _codecCtx->flags |= CODEC_FLAG_LOW_DELAY;
}
With that configuration I wasn't able to continue decoding new frames after a video resolution change. The got_picture_ptr flag that avcodec_decode_video2 sets when a whole frame is available was never true after that.
This ticket made me wonder if the issue is connected with multithreading. The only useful thing I've noticed is that when I change thread_type to FF_THREAD_SLICE, the decoder is not always blocked after a resolution change; about half of my attempts were successful. Switching to single-threaded processing is not an option since I need more computing power; besides, setting up the context with one thread does not solve the problem, and the decoder then cannot keep up with the incoming data.
Everything works well after an app restart.
I can only think of one workaround (it doesn't really solve the problem): unloading and reloading the whole library after a stream resolution change (e.g. as mentioned here). I don't think that's a good approach though; it will probably introduce other bugs and take a lot of time (from the user's point of view).
Is it possible to fix this issue?
EDIT:
I've dumped the stream data that is passed to the decoding pipeline. I changed the resolution a few times while the stream was being captured. Playing the dump with ffplay showed that, at the moment the resolution changed and the preview in my application froze, ffplay managed to continue, though its preview was glitchy for a second or so. You can see the full ffplay log here. In this case the video preview stopped when I changed the resolution to 960x720 for the second time (Reinit context to 960x720, pix_fmt: yuv420p in the log).
I have a problem with RTMP streaming of an Android surface to a client application. My solution has very high latency, because my surface does not produce frames 60 times a second; it can produce them at any time (once in 30 seconds, for example). So I want to show each newly produced frame to the client immediately.
Android pushes every frame, and that part looks fine. The client app (jwplayer or vlc) receives them but seems to be waiting for something: it only starts showing video after receiving a number of frames. But I need to see every incoming frame on the client side as soon as it has been received.
How it works now:
I have a Surface object obtained from the MediaCodec class. MediaCodec is set up for H.264 video encoding.
MediaCodec mEncoder;
.....
MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
format.setInteger(MediaFormat.KEY_COLOR_FORMAT, colorFormat);
format.setInteger(MediaFormat.KEY_BIT_RATE, videoBitrate);
format.setInteger(MediaFormat.KEY_FRAME_RATE, videoFramePerSecond);
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, iframeInterval);
try {
    mEncoder = MediaCodec.createEncoderByType("video/avc");
} catch (IOException e) {
    e.printStackTrace();
}
mEncoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
mSurface = mEncoder.createInputSurface();
if (mSurfaceCallback != null)
    mSurfaceCallback.onSurfaceCreated(mSurface);
mEncoder.start();
Sometimes Android draws to the surface. I can't control the rate of these drawings, and I also can't draw anything to that surface myself. When something changes on the surface, MediaCodec produces a new ByteBuffer with an H.264 frame, and I send this frame over RTMP.
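The draining side is not shown above, but presumably looks something like this rough sketch; sendOverRtmp() is a placeholder for whatever RTMP library is actually used, and streaming is an illustrative flag:

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
while (streaming) {
    int index = mEncoder.dequeueOutputBuffer(info, 10000);
    if (index >= 0) {
        ByteBuffer encoded = mEncoder.getOutputBuffer(index);
        encoded.position(info.offset);
        encoded.limit(info.offset + info.size);
        byte[] frame = new byte[info.size];
        encoded.get(frame);
        sendOverRtmp(frame, info.presentationTimeUs); // placeholder for the RTMP send
        mEncoder.releaseOutputBuffer(index, false);
    }
}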
On the client side I have an HTML page with jwplayer:
<pre id="myElement"></pre>

<script>
var playerInstance = jwplayer("myElement");
playerInstance.setup({
    file: "rtmp://127.0.0.1:1935/live/stream",
    height: 800,
    width: 480,
    autostart: true,
    controls: false,
    rtmp: {
        bufferlength: 0.1
    }
});
</script>
I've tried changing iframeInterval, the encoding fps, and bufferlength... Nothing really helps.
Is there any way to show incoming frames immediately?
What do you mean?
If I understood right, you have:
vlc (client) ---- rtmp protocol ---- android (producer)
You encode video from something (maybe a camera) using MediaCodec, and in vlc there is latency, right?
First: what are you using, the direct input buffers or MediaCodec.Callback()?
With the callback you can check every frame in onOutputBufferAvailable() and measure the time from one frame to the next; this will show you whether the problem is on the Android side.
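A minimal sketch of that measurement with the asynchronous API (the lastFrameTimeMs field and the log tag are illustrative; note that setCallback() has to be called before configure()):

private long lastFrameTimeMs = 0; // illustrative field for the measurement

mEncoder.setCallback(new MediaCodec.Callback() {
    @Override
    public void onOutputBufferAvailable(MediaCodec codec, int index, MediaCodec.BufferInfo info) {
        long now = System.currentTimeMillis();
        Log.d("EncLatency", "encoded frame, " + (now - lastFrameTimeMs) + " ms since previous");
        lastFrameTimeMs = now;
        // ... send the encoded buffer over rtmp as before ...
        codec.releaseOutputBuffer(index, false);
    }
    @Override public void onInputBufferAvailable(MediaCodec codec, int index) { }
    @Override public void onError(MediaCodec codec, MediaCodec.CodecException e) { }
    @Override public void onOutputFormatChanged(MediaCodec codec, MediaFormat format) { }
});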
Then you can try to resolve the frame transfer problem.
You can use Wireshark to determine the frame sending timing and check whether it is a network problem.
Next, vlc and other players try to fill an internal buffer and only start showing video after that. Try turning off the vlc buffer (https://forum.videolan.org/viewtopic.php?t=40408). It is also common that vlc waits for an IDR frame. You can set the interval for sending IDR frames in code:
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, iframeInterval);
iframeInterval is in seconds (try setting it to 1 second).
(This will increase the stream size.)
Sorry for my bad English.
You can hopefully generate video frames at a constant rate, even more than 20 fps, to produce smooth video with acceptable latency. An H.264 encoder will handle a static picture (one changing once every ~30 seconds) gracefully, and when there is no change the frame size will be minimal.
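One surface-encoder option that may help approximate a constant frame rate, in case it applies here, is the repeat-previous-frame setting (available since API 21; the 50 ms value below is just an example, roughly 20 fps):

// Ask the encoder to resubmit the previous frame if the input surface has not
// been updated within the given period (in microseconds).
// Must be set on the format before mEncoder.configure(...).
format.setLong(MediaFormat.KEY_REPEAT_PREVIOUS_FRAME_AFTER, 50_000);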
I'm using AudioRecord to get audio in real-time from the device microphone, and encoding / saving it to file in the background using the MediaCodec and MediaMuxer classes.
Is there any way to change the pitch and/or tempo of the audio stream before it is saved to a file?
By pitch/tempo, do you mean the frequency itself, or really the speed of the samples? If you mean the speed, then each sample should be projected over a shorter or longer period of time:
Example:
private static byte[] changePitch(byte[] samples, float ratio) {
    // Naive speed change: stretch or shrink the array by 'ratio' using
    // nearest-neighbour lookup (no interpolation or filtering).
    // Note: this indexes raw bytes; for 16-bit PCM it should really step in
    // whole 2-byte (mono) or 4-byte (stereo) frames to avoid mixing sample halves.
    byte[] result = new byte[(int) Math.floor(samples.length * ratio)];
    for (int i = 0; i < result.length; i++) {
        int pointer = (int) (i / ratio);
        result[i] = samples[pointer];
    }
    return result;
}
If you just want to change the pitch without affecting the speed, then you need to read about phase vocoders. This is well-studied audio science, and there are a lot of projects that implement it. https://en.wikipedia.org/wiki/Phase_vocoder
To modify the pitch/tempo of the audio stream you'll have to resample it yourself before you encode it using the codec. Keep in mind that you also need to modify the timestamps if you change the tempo of the stream.
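As a rough sketch of the timestamp side (an assumption about how the PTS is generated; channelCount, sampleRate and stretchedBuffer are illustrative names): deriving the PTS from the running count of PCM frames actually fed to the encoder after the tempo change keeps the timestamps consistent automatically.

long framesWritten = 0;                 // frames fed to the encoder so far
int bytesPerFrame = 2 * channelCount;   // 16-bit PCM

// for every (already tempo-stretched) buffer queued to the encoder:
long presentationTimeUs = framesWritten * 1_000_000L / sampleRate;
framesWritten += stretchedBuffer.length / bytesPerFrame;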
I'm using AudioRecord to record the audio stream during a camera capture process on an Android device.
Since I want to process the frame data and handle the audio/video samples myself, I do not use MediaRecorder.
I run AudioRecord in another thread, calling read() to gather the raw audio data.
Once I get a chunk of data, I feed it into a MediaCodec configured as an AAC audio encoder.
Here is some of my code for the audio recorder / encoder:
m_encode_audio_mime = "audio/mp4a-latm";
m_audio_sample_rate = 44100;
m_audio_channels = AudioFormat.CHANNEL_IN_MONO;
m_audio_channel_count = (m_audio_channels == AudioFormat.CHANNEL_IN_MONO ? 1 : 2);
int audio_bit_rate = 64000;
int audio_data_format = AudioFormat.ENCODING_PCM_16BIT;
m_audio_buffer_size = AudioRecord.getMinBufferSize(m_audio_sample_rate, m_audio_channels, audio_data_format) * 2;
m_audio_recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, m_audio_sample_rate,
m_audio_channels, audio_data_format, m_audio_buffer_size);
m_audio_encoder = MediaCodec.createEncoderByType(m_encode_audio_mime);
MediaFormat audio_format = new MediaFormat();
audio_format.setString(MediaFormat.KEY_MIME, m_encode_audio_mime);
audio_format.setInteger(MediaFormat.KEY_BIT_RATE, audio_bit_rate);
audio_format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, m_audio_channel_count);
audio_format.setInteger(MediaFormat.KEY_SAMPLE_RATE, m_audio_sample_rate);
audio_format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
audio_format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, m_audio_buffer_size);
m_audio_encoder.configure(audio_format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
I found that the first call of AudioRecord.read() takes a longer time to return, while successive read() calls have time intervals closer to the real duration of the audio data.
For example, my audio format is 44100 Hz 16-bit 1-channel, and the buffer size of AudioRecord is 16384 bytes, so a full buffer corresponds to 185.76 ms. When I record the system time before each call of read() and subtract a base time from it, I get the following sequence:
time before each read(): 0ms, 345ms, 543ms, 692ms, 891ms, 1093ms, 1244ms, ...
I feed these raw data to the audio encoder with the above time values as PTS, and the encoder outputs encoded audio samples with the following PTS:
encoder output PTS: 0ms, 185ms, 371ms, 557ms, 743ms, 928ms, ...
It looks like the encoder treats each chunk of data as covering the same time period. I believe the encoder works correctly since I give it raw data of the same size (16384 bytes) every time. However, if I use the encoder output PTS as the input to the muxer, I get a video whose audio content is faster than its video content.
I want to ask:
Is it expected that the first call of AudioRecord.read() blocks longer? I'm sure the call takes more than 300 ms while it only records 16384 bytes, which is 186 ms of audio. Is this also something that depends on the device / Android version?
What should I do to achieve audio/video synchronization? I have a workaround: measure the delay of the first call of read(), then shift the PTS of the audio samples by that delay (sketched below). Is there another, better way to handle this?
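For reference, a minimal sketch of that workaround (recording, audioBuffer and the timing variables are illustrative; the resulting presentationTimeUs is what would go into the BufferInfo queued to the encoder):

long baseTimeUs = System.nanoTime() / 1000;
long firstReadExtraDelayUs = 0;
boolean firstRead = true;

while (recording) {
    long beforeReadUs = System.nanoTime() / 1000 - baseTimeUs;
    int bytesRead = m_audio_recorder.read(audioBuffer, 0, audioBuffer.length);
    if (firstRead) {
        long afterReadUs = System.nanoTime() / 1000 - baseTimeUs;
        // how much longer the first read() blocked than the audio it returned
        // (16-bit mono: 2 bytes per sample)
        long bufferDurationUs = bytesRead * 1_000_000L / (2L * m_audio_sample_rate);
        firstReadExtraDelayUs = (afterReadUs - beforeReadUs) - bufferDurationUs;
        firstRead = false;
    }
    // shift every PTS back by the extra delay measured on the first read()
    long presentationTimeUs = Math.max(0, beforeReadUs - firstReadExtraDelayUs);
    // ... queue 'audioBuffer' into m_audio_encoder with presentationTimeUs ...
}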
Convert the mono input to stereo. I was pulling my hair out for some time before I realised the AAC encoder exposed by MediaCodec only works with stereo input.