I am reading the Android documentation on MediaCodec and other online tutorials/examples. As I understand it, the way to use MediaCodec is like this (decoder example in pseudocode):
//-------- prepare audio decoder, format, buffers, and files --------
MediaExtractor extractor;
MediaCodec codec;
ByteBuffer[] codecInputBuffers;
ByteBuffer[] codecOutputBuffers;
extractor = new MediaExtractor();
extractor.setDataSource();
MediaFormat format = extractor.getTrackFormat(0);
String mime = format.getString(MediaFormat.KEY_MIME);
//---------------- start decoding ----------------
codec = MediaCodec.createDecoderByType(mime);
codec.configure(format, null /* surface */, null /* crypto */, 0 /* flags */);
codec.start();
codecInputBuffers = codec.getInputBuffers();
codecOutputBuffers = codec.getOutputBuffers();
extractor.selectTrack(0);
//---------------- decoder loop ----------------
while (MP3_file_not_EOS) {
//-------- grasp control of input buffer from codec --------
codec.dequeueInputBuffer();
//---- fill input buffer with data from MP3 file ----
extractor.readSampleData();
//-------- release input buffer so codec can have it --------
codec.queueInputBuffer();
//-------- grasp control of output buffer from codec --------
codec.dequeueOutputBuffer();
//-- copy PCM samples from output buffer into another buffer --
short[] PCMoutBuffer = copy_of(OutputBuffer);
//-------- release output buffer so codec can have it --------
codec.releaseOutputBuffer();
//-------- write PCMoutBuffer into a file, or play it -------
}
//---------------- stop decoding ----------------
codec.stop();
codec.release();
Is this the right way to use the MediaCodec? If not, please enlighten me with the right approach. If this is the right way, how do I measure the performance of the MediaCodec? Is it the time difference between when codec.dequeueOutputBuffer() returns and when codec.queueInputBuffer() returns? I'd like an accuracy/precision of microseconds. Your ideas and thoughts are appreciated.
(merging comments and expanding slightly)
You can't simply time how long a single buffer submission takes, because the codec might want to queue up more than one buffer before doing anything. You will need to measure it in aggregate, timing the duration of the entire file decode with System.nanoTime(). If you turn the copy_of operation into a no-op and just discard the decoded data, you'll keep the output side (writing the decoded data to disk) out of the calculation.
Excluding the I/O from the input side is more difficult. As noted in the MediaCodec docs, the encoded input/output "is not a stream of bytes, it's a stream of access units". So you'd have to populate any necessary codec-specific-data keys in MediaFormat, and then identify individual frames of input so you can properly feed the codec.
An easier but less accurate approach would be to conduct a separate pass in which you time how long it takes to read the input data, and then subtract that from the total time. In your sample code, you would keep the operations on extractor (like readSampleData), but do nothing with codec (maybe dequeue one buffer and just re-use it every time). That way you only measure the MediaExtractor overhead. The trick here is to run it twice, immediately before the full test, and ignore the results from the first -- the first pass "warms up" the disk cache.
If you're interested in performance differences between devices, it may be the case that the difference in input I/O time, especially from a "warm" cache, is similar enough and small enough that you can just disregard it and not go through all the extra gymnastics.
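For illustration, here is a minimal sketch of the aggregate-timing approach, assuming the question's extract/decode loop is wrapped in a hypothetical decodeWholeFile() helper in which the copy_of step is turned into a no-op so the decoded PCM is simply discarded:

long startNs = System.nanoTime();
decodeWholeFile();   // hypothetical helper: runs the full MediaExtractor/MediaCodec loop above
long elapsedNs = System.nanoTime() - startNs;
Log.d(TAG, "Whole-file decode took " + (elapsedNs / 1000) + " us");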
I have a server that encodes real-time voice into mono or stereo mp3 thanks to libmp3lame and sends it chunk by chunk through a WebSocket.
I'm trying to make an Android app that receives those MP3 chunks and plays them with the most appropriate audio player Android has. I went with AudioTrack since it seems easy to feed chunks to the player and is "stream" oriented (I'm sending the track byte arrays rather than a complete song stored locally on the phone).
Since AudioTrack does not support compressed audio formats (such as MP3), I have to decode those chunks into PCM to play them afterwards. I'm using the well-known JLayer library to do this real-time decoding. Thanks to that, I can write each decoded sample to my AudioTrack and hear what the server is sending.
My problem is that the received/played audio is badly chopped up. (I can understand everything the speaker is saying, but the quality is poor, as if the speaker had a "robotic voice".)
Here is the code I'm using to receive/decode/play those byte[].
public void addSample(byte[] data) throws BitstreamException, DecoderException, IOException {
    // JLayer decoder
    Decoder decoder = new Decoder();
    // Input stream wrapping the byte[] voice data
    InputStream bis = new ByteArrayInputStream(data);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    Bitstream bits = new Bitstream(bis);
    // Decode the MP3 data into PCM samples in a SampleBuffer
    SampleBuffer pcmBuffer = (SampleBuffer) decoder.decodeFrame(bits.readFrame(), bits);
    // Write the PCM data to the AudioTrack to play it
    mTrack.write(pcmBuffer.getBuffer(), 0, pcmBuffer.getBufferLength());
    bits.closeFrame();
}
And here is my AudioTrack initialization
mTrack = new AudioTrack.Builder()
        .setAudioAttributes(new AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build())
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(48000)
                .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
                .build())
        .setBufferSizeInBytes(AudioTrack.getMinBufferSize(48000, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT))
        .build();
mTrack.play();
So to understand what was happening, I logged every value contained in the pcmBuffer. It turns out a huge part of that data is 0 at the very beginning of the buffer (I'd say about 1/5 of the buffer is 0, all of it located at the beginning). So I took an oscilloscope and looked at the signal my Android phone was receiving. Here is the result:
As you can see, each frame is present, but it has some "blank" or zero data values. Those zeros at the beginning of each frame make the signal choppy and pretty annoying to listen to.
I have no idea whether this comes from the MP3 signal itself, the way I'm playing it, AudioTrack, JLayer, or the way I'm decoding it. So if anyone has an idea it would be really awesome.
EDIT:
I found out something interesting. By decoding each frame header, I can access a lot of information, such as the duration in milliseconds of each frame. I logged it:
System.out.println(bits.readFrame().ms_per_frame());
I found out that each of my frames is 24 ms long. Looking back at the oscilloscope, I can see that each frame does indeed take 24 ms, but the beginning/end of each frame is filled with zeros. So first of all, is this a decoding problem? If not, how can I get a clean signal without a small break in each frame?
I've been printing all the data that each frame gives me, and each frame starts with a lot of zeros. How am I supposed to get a clean signal if each frame has this kind of audio void?
If I print the MP3 data that I'm receiving for each frame (96 bytes), the first four bytes (probably the header?) always have the same value:
"-1, -5, 20, -60"
Then there is a fifth byte that is always equal to 0, and sometimes a sixth byte that is also equal to 0. Should I be removing those?
I've managed to combine multiple videos with audio tracks, but then I realized that if I combine multiple videos and one of them has no audio track, I have to add silence to the combined audio track.
So, how do I go about doing it? Should I encode a ByteBuffer filled with 0s with timestamps for silence?
Essentially, yes. I am using the function below to encode silence at a given presentation time.
For the length of your video with no audio, you should encode silence at a regular interval. I determined that the interval should match the audio before it; in my case, the period between audio presentation times in my first video was 21333 us.
Using that info I started encoding silence:
- starting from the last presentation time of the first video's audio + 21333 us,
- at intervals of 21333 us, until I had encoded enough silence to cover the full video.
I am still trying to figure out how to use a video with no audio (as the first video) followed by a video with audio. I will update my answer if I figure it out.
private byte[] zerodArray = new byte[2048]; // Used to encode silent audio... not really sure how big this should be

private void encodeSilenceForFrame(long presentationTime) {
    // mAudioEncoder is the audio encoder you are using to combine the other videos' audio.
    final int TIMEOUT_USEC = 10000;
    int encoderInputBufferIndex = mAudioEncoder.dequeueInputBuffer(TIMEOUT_USEC);
    if (encoderInputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
        if (VERBOSE) Log.d(TAG, "no audio encoder input buffer");
        return; // no input buffer available right now
    }
    if (VERBOSE) {
        Log.d(TAG, "audio encoder: returned input buffer: " + encoderInputBufferIndex);
    }
    ByteBuffer encoderInputBuffer = mAudioEncoder.getInputBuffer(encoderInputBufferIndex);
    encoderInputBuffer.position(0);
    encoderInputBuffer.put(zerodArray);
    Log.d(TAG, "audio silence: pending buffer for time " + presentationTime);
    mAudioEncoder.queueInputBuffer(
            encoderInputBufferIndex,
            0,
            zerodArray.length,
            presentationTime, 0);
}
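For illustration, here is a minimal sketch of how the function above might be driven, where lastAudioPresentationTimeUs (the last audio presentation time of the preceding video) and silenceDurationUs (the length of the gap to fill) are hypothetical names:

// Fill a silent gap by encoding zeroed buffers at the same interval
// as the preceding audio track (21333 us in this example).
final long FRAME_INTERVAL_US = 21333;
long presentationTimeUs = lastAudioPresentationTimeUs + FRAME_INTERVAL_US;
long endOfSilenceUs = lastAudioPresentationTimeUs + silenceDurationUs;
while (presentationTimeUs < endOfSilenceUs) {
    encodeSilenceForFrame(presentationTimeUs);
    // Drain mAudioEncoder's output here and feed it to the muxer,
    // the same way the other videos' audio is handled.
    presentationTimeUs += FRAME_INTERVAL_US;
}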
I have seen the example below for encoding/decoding using the MediaCodec API:
https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java
In it, there is a comparison of the computed ("guessed") presentation time and the presentation time received in the decoder's output buffer info:
assertEquals("Wrong time stamp", computePresentationTime(checkIndex),
info.presentationTimeUs);
Because the decoder just decodes the data in the encoded buffer, I assume there must be some timestamp info that can be parsed from the encoder's output H.264 stream.
I am writing an Android application which muxes an H.264 stream (.h264) encoded by MediaCodec into an MP4 container using ffmpeg (libavformat).
I don't want to use MediaMuxer because it requires Android 4.3 (API 18), which is too high for me.
However, ffmpeg does not seem to recognize the presentation timestamps in packets encoded by MediaCodec, so I always get NO_PTS values when trying to read a frame from the stream.
Does anyone know how to get the correct presentation timestamp in this situation?
To send timestamps from the MediaCodec encoder to ffmpeg, you need to convert them like this:
jint Java_com_classclass_WriteVideoFrame(JNIEnv * env, jobject this, jbyteArray data, jint datasize, jlong timestamp) {
    ....
    AVPacket pkt;
    av_init_packet(&pkt);
    AVCodecContext *c = m_pVideoStream->codec;
    // timestamp arrives in milliseconds; rescale it into the stream's time_base units
    pkt.pts = (long)((double)timestamp * (double)c->time_base.den / 1000.0);
    pkt.stream_index = m_pVideoStream->index;
    pkt.data = rawjBytes;   // frame bytes, presumably obtained from 'data' in the elided code above
    pkt.size = datasize;
where time_base depends on framerate
Update regarding how timestamps flow through the pipeline:
Neither the decoder nor the encoder knows timestamps on its own. Timestamps are passed to these components via
decoder.queueInputBuffer(inputBufIndex, 0, info.size, info.presentationTimeUs, info.flags);
or
encoder.queueInputBuffer(inputBufIndex, 0, 0, ptsUsec, info.flags);
These timestamps can be taken from the extractor, come from the camera, or be generated by the app, but the decoder/encoder just passes them through without changing them. As a result, timestamps travel unchanged from source to sink (the muxer).
There are some exceptions, of course: if the frame frequency is changed (frame-rate conversion, for example), or if the encoder produces B-frames and reordering happens.
An encoder can also add timestamps to the encoded frame headers, but that is optional and not mandated by the standard. I don't think any of this applies to the current Android version, codecs, or your usage scenario.
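As a rough Java-side illustration of that pass-through (assuming extractor and decoder are already set up, and inputBufIndex/sampleSize come from dequeueInputBuffer()/readSampleData() as in the other snippets):

// The presentation time passed in with an input buffer is the one that
// comes back attached to the corresponding output buffer.
long ptsUs = extractor.getSampleTime();                  // timestamp from the source
decoder.queueInputBuffer(inputBufIndex, 0, sampleSize, ptsUs, 0);

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int outIndex = decoder.dequeueOutputBuffer(info, 10000 /* timeout in us */);
if (outIndex >= 0) {
    // info.presentationTimeUs is the same value that was queued on the input
    // side; hand it to the muxer (or to ffmpeg) unchanged.
    long outputPtsUs = info.presentationTimeUs;
    decoder.releaseOutputBuffer(outIndex, false);
}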
I have followed this example to convert raw audio data coming from AudioRecord into MP3, and it works: if I store the MP3 data in a file and play that file with a music player, it is audible.
Now, instead of storing the MP3 data in a file, I need to play it with AudioTrack. The data comes from a Red5 media server as a live stream, but the problem is that AudioTrack can only play PCM data, so all I hear from my data is noise.
I am now using JLayer for this task.
My code is as follows.
int readresult = recorder.read(audioData, 0, recorderBufSize);
int encResult = SimpleLame.encode(audioData,audioData, readresult, mp3buffer);
and this mp3buffer data is sent to the other user via a Red5 stream.
The data received by the other user arrives as a stream, so the code for playing it is:
Bitstream bitstream = new Bitstream(data.read());
Decoder decoder = new Decoder();
Header frameHeader = bitstream.readFrame();
SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
short[] pcm = output.getBuffer();
player.write(pcm, 0, pcm.length);
But my code freezes at bitstream.readFrame() after 2-3 seconds, and no sound is produced before that.
Any guess as to what the problem might be? Any suggestion is appreciated.
Note: I don't need to store the MP3 data, so I can't use MediaPlayer, which requires a file or file descriptor.
Just a tip, but try calling
output.close();
bitstream.closeFrame();
after your write call. I'm processing MP3 the same way you do, but I close the buffers after use and I have no problems.
Second tip: do it in a Thread or some other background process. Since you mentioned those silent 2 seconds, the media player may be waiting until you process the whole stream, because you are loading it on the same thread.
Try both tips (you should anyway). For the first, the problem could be in the internal buffers; for the second, you are probably filling the media player's input buffer and locking the app (same thread: the full buffer cannot receive your input, and the code that plays it and releases that same buffer is never invoked because the write blocks on it...).
Also, if you aren't doing it already, check for frameHeader == null to detect the end of the stream.
Good luck.
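For illustration, a minimal sketch of both tips combined, assuming data is the incoming MP3 InputStream and player is the already-initialized AudioTrack:

// Run the JLayer decode loop off the main thread and close each frame
// after writing it to the AudioTrack.
new Thread(() -> {
    try {
        Bitstream bitstream = new Bitstream(data);
        Decoder decoder = new Decoder();
        Header frameHeader;
        while ((frameHeader = bitstream.readFrame()) != null) {
            SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
            player.write(output.getBuffer(), 0, output.getBufferLength());
            bitstream.closeFrame();   // release the frame before reading the next one
        }
        bitstream.close();
    } catch (BitstreamException | DecoderException e) {
        Log.e("Mp3Decode", "MP3 decode failed", e);
    }
}, "mp3-decode").start();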
You need to loop through the frames like this:
Header frameHeader;
while ((frameHeader = bitstream.readFrame()) != null) {
    SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
    short[] pcm = output.getBuffer();
    player.write(pcm, 0, pcm.length);
    bitstream.closeFrame();
}
bitstream.close();
And make sure you are not running this on the main thread (that is probably the reason for the freezing).
I am trying to write a video player using the MediaCodec class, and I came across a problem that is blocking my development.
Here is a code snippet:
extractor = new MediaExtractor();
extractor.setDataSource(filename);
MediaFormat format = extractor.getTrackFormat(0);
String mime = format.getString(MediaFormat.KEY_MIME);
extractor.selectTrack(0);
MediaCodec decoder = MediaCodec.createDecoderByType(mime);
decoder.configure(format, null, null, 0);
decoder.start();
ByteBuffer[] inputBuffers = decoder.getInputBuffers();
ByteBuffer[] outputBuffers = decoder.getOutputBuffers();
Log.d(TAG, " "+decoder.getOutputFormat());
The problem is that the printed output format changes from device to device, which makes it impossible to reliably render the decoded output to an OpenGL texture.
Is there a way to force the decoder to always output the same format?
If not, does anyone know of any libraries available to do the conversion?
Thanks a lot for any insights
The decoder's output format can't be set. In fact, you probably shouldn't even examine it until you receive an updated MediaFormat from dequeueOutputBuffer() -- before the first chunk of data is returned, you'll get a MediaCodec.INFO_OUTPUT_FORMAT_CHANGED result.
If you want to access the decoded frame from OpenGL ES, you should create a Surface from a SurfaceTexture and pass that into the decoder configure() call. It's more efficient, and doesn't require you to know anything about the data format.
For an example, see the DecodeEditEncodeTest.java CTS test, and note in particular the OutputSurface class and how it's used in checkVideoFile().
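For a quick illustration of that Surface-based setup, here is a minimal sketch, assuming texId is a GL_TEXTURE_EXTERNAL_OES texture name already created on a thread with a current EGL context, and mime/format come from the extractor as in the question:

// Route decoded frames to an external OES texture instead of reading raw YUV buffers.
SurfaceTexture surfaceTexture = new SurfaceTexture(texId);
Surface decoderSurface = new Surface(surfaceTexture);

MediaCodec decoder = MediaCodec.createDecoderByType(mime);
decoder.configure(format, decoderSurface, null, 0);   // decoded frames are rendered to the Surface
decoder.start();

// Later, after releaseOutputBuffer(outputIndex, true /* render */) for a frame:
surfaceTexture.updateTexImage();                      // latch the newest frame into the texture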