Android - Choosing between MediaRecorder, MediaCodec and Ffmpeg


I am working on a video recording and sharing application for Android. The specifications of the app are as follows:
Recording a 10-second (maximum) video from inside the app (not using the device's camera app)
No further editing on the video
Storing the video in a Firebase Cloud Storage (GCS) bucket
Downloading and playing of the said video by other users
From the research I did on SO and other sources, I have found the following (please correct me if I am wrong):
The three options and their respective features are:
1. FFmpeg
Capable of achieving the above goal and has extensive answers and explanations on sites like SO, however
Increases the APK size by 20-30 MB (large library)
Runs the risk of not working properly on certain 64-bit devices
2.MediaRecorder
Reliable and supported by most devices
Will store files in .mp4 format (unless converted to H.264)
Easier for playback (no decoding needed)
Adds the MP4 and 3GP headers
Increases latency according to this question
3. MediaCodec
Low level
Will require MediaCodec, MediaMuxer, and MediaExtractor
Outputs a raw H.264 stream (MediaMuxer is needed to wrap it in a playable container)
Good for video manipulations (though, not required in my use case)
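Not fully supported on devices before Android 4.3 (API 18); MediaCodec itself arrived in API 16, but MediaMuxer and Surface input were added in API 18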
More difficult to implement and code (my opinion - please correct me if I am wrong)
Unavailability of extensive information, tutorials, answers or samples (Bigflake.com being the only exception)
After spending days on this, I still can't figure out which approach suits my particular use case. Please elaborate on what I should do for my application. If there's a completely different approach, then I am open to that as well.
My main criteria are that the video encoding process be as efficient as possible and that the video stored in the cloud use as little space as possible without compromising on video quality.
Also, I'd be grateful if you could suggest the appropriate format for saving and distributing the video in Firebase Storage, and point me to tutorials or samples of your suggested approach.
Thank you in advance! And sorry for the long read.

Your overview of this topic is accurate.
I'll just add my two cents on a few points you might have missed:
1. FFmpeg
+/-If you build your own .so, you can reduce the size to about 2-3 MB, depending on the use case of course. Editing a 6000-line build script takes time and effort, though.
++Supports wide range of formats (almost everything)
++Results are the same for every device
++Any resolution supported
--High energy consumption due to software encoding/decoding, which also makes it slow. There is a plugin to support libstagefright, but it doesn't work on many devices (as of May 2016)
--Licensing can be problematic depending on your location and use-case. I'm not a lawyer, but we had legal consulting on this topic and it's quite complex.
2. MediaRecorder
++Easiest to implement (simplified access to MediaCodec/libstagefright). Raw data gets passed to the encoder directly, so no messing around there
++HW Accelerated on most devices. Makes it fast and energy saving.
++Delay only applies to live streaming
--Dependent on the hardware manufacturers' implementations
--Results may vary from device to device
++No licensing problems
3. MediaCodec
+/-Most of 2.MediaRecorder applies to this as well (apart from ease of use)
++Most flexible access to HW-en-/decoding
--Hard to use for cases that were not thought of (e.g. mixing videos from different sources)
+/-Delay for streaming can be eliminated (is tricky though)
--HW manufacturers sometimes don't implement things correctly (e.g. the Samsung Galaxy S5 sometimes produces a SIGSEGV if live data from some DSLRs is fed to the encoder. It works fine for a while, then all of a sudden it's SIGSEGV. This might be the DSLR's fault, but the SIGSEGV is not avoidable and crashes the app, which in the end is the app developer's fault ;) )
--If used without MediaMuxer you need either good understanding of media containers or rely on 3rd party libraries
The list is obviously not complete and some points might not be correct. The last time I worked with video was almost half a year ago.
As for your use case, I would recommend MediaRecorder, since it is the easiest to implement, supported on all devices, and offers a good range of quality/size options. FFmpeg produces better results for the same storage size, but takes longer (in an extreme case, hardware encoding of live DSLR footage was 30 times faster) and consumes more energy.
As far as I understand your use case, there is no need to fiddle around with MediaCodec, since you only want to encode and decode.
I suggest using VP8 or VP9, since you won't run into licensing problems. Again, I'm no lawyer, but distributing H.264 from your own server might make you a broadcasting station, or so I was told.
Hope this helps you with your decision.
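Since the asker's main criterion is storage size, it may help to remember that an encoded clip's size is roughly bit rate times duration. The hypothetical helper below (not part of any Android API) does that back-of-the-envelope arithmetic and ignores container overhead; the 2 Mbit/s video and 128 kbit/s audio figures in the usage are example values, not recommendations.

```java
/** Rough size budget for a short clip; ignores container and header overhead. */
final class ClipBudget {
    private ClipBudget() {}

    /** Estimated size in bytes for the given bit rates (bits per second) and duration. */
    static long estimateBytes(int videoBitsPerSecond, int audioBitsPerSecond, int durationSeconds) {
        long totalBits = ((long) videoBitsPerSecond + audioBitsPerSecond) * durationSeconds;
        return totalBits / 8;
    }

    /** Largest video bit rate that keeps the clip under maxBytes after reserving room for audio. */
    static int maxVideoBitRate(long maxBytes, int audioBitsPerSecond, int durationSeconds) {
        long totalBitsPerSecond = maxBytes * 8 / durationSeconds;
        return (int) Math.max(0L, totalBitsPerSecond - audioBitsPerSecond);
    }
}
```

For example, `ClipBudget.estimateBytes(2_000_000, 128_000, 10)` gives 2,660,000 bytes for a 10-second clip. With MediaRecorder, such a value would be passed to `setVideoEncodingBitRate()`, and `setMaxDuration(10_000)` caps the recording at 10 seconds; the actual file size still depends on the device's encoder.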

Related

Is there a way to play an .ogg file in Android at an increased speed?

I have sound files that I want to play at 1x, 2x and 3x speed, with pitch correction. There are many solutions for changing the file offline and simply shipping three files. However, that means my app takes more disk space.
Is there a straightforward way to increase the speed at playback time, so that I only have to include one version of the file in my app?
My app is closed source, so I can't use a GPL library.

The feature you are probably looking for is audio time stretching. Simply changing the sample rate is not an option, as it will induce pitch variation (similar to the effect produced by analogue records or cassettes).
If you want true time stretching, try a real-time digital signal processing library. If you're willing to add a library to your project, TarsosDSP is a pure-Java framework that works on Android out of the box.
https://github.com/JorenSix/TarsosDSP
Their repository even includes audio time-stretching example code, which comes with a Swing interface.
EDIT: TarsosDSP is GPL'd. Audio timestretching is a big deal in the audio industry so many of the algorithms used are either proprietary or GPL'd.
If you are willing to learn some DSP, I would recommend checking out
https://github.com/philburk/jsyn
It is under the Apache license and supports Android.
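To show what "time stretching" means without a GPL dependency, here is a minimal sketch of plain overlap-add (OLA), a simpler relative of the WSOLA-style algorithms real libraries use (this is not TarsosDSP's or JSyn's code). Windowed frames are read from the input with a larger hop than they are written to the output, so duration shrinks while each frame keeps its original waveform, and therefore its pitch. Plain OLA produces audible "phasiness" on tonal material; the class name, frame size and mono float input are all assumptions.

```java
/** Plain overlap-add time-scale modification: changes duration but not pitch. */
final class OlaTimeStretch {
    private OlaTimeStretch() {}

    /**
     * @param input     mono samples, e.g. in [-1, 1]
     * @param speed     playback speed factor; 2.0 yields output about half as long
     * @param frameSize frame length in samples, e.g. 1024
     */
    static float[] stretch(float[] input, double speed, int frameSize) {
        if (speed <= 0 || frameSize < 2) throw new IllegalArgumentException("bad parameters");
        if (input.length < frameSize) return input.clone();
        int synthesisHop = frameSize / 2;                                     // 50% overlap on output
        int analysisHop = Math.max(1, (int) Math.round(synthesisHop * speed)); // read faster than we write
        int frames = (input.length - frameSize) / analysisHop + 1;
        int outLength = (frames - 1) * synthesisHop + frameSize;

        float[] output = new float[outLength];
        float[] weight = new float[outLength];
        double[] window = hann(frameSize);

        for (int f = 0; f < frames; f++) {
            int readPos = f * analysisHop;
            int writePos = f * synthesisHop;
            for (int i = 0; i < frameSize; i++) {
                output[writePos + i] += (float) (input[readPos + i] * window[i]);
                weight[writePos + i] += (float) window[i];
            }
        }
        // Divide by the summed window so the overall gain stays at 1.
        for (int i = 0; i < outLength; i++) {
            if (weight[i] > 1e-6f) output[i] /= weight[i];
        }
        return output;
    }

    /** Periodic Hann window of length n. */
    private static double[] hann(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / n);
        }
        return w;
    }
}
```

Calling `stretch(samples, 2.0, 1024)` on a one-second 44.1 kHz buffer returns roughly 22,000 samples. Production algorithms such as WSOLA additionally search for the best-aligned input segment before each overlap to reduce the phase artifacts.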

Maximum number of simultaneous MediaRecorder instances on android?

I created an Android app that records the device screen (using the MediaProjection API) and video from the camera at the same time. I use MediaRecorder in both cases. I need a way to find out whether a device is actually capable of recording two video streams simultaneously. I assume there is some limit on the number of streams that can be encoded simultaneously on a given device, but I cannot find any API on the Android platform to query for that information.
Things I discovered so far:
Documentation for MediaRecorder.release() advises to release MediaRecorder as soon as possible as:
" Even if multiple instances of the same codec are supported, some performance degradation may be expected when unnecessary multiple instances are used at the same time."
This suggests that there's a limit on the number of codec instances, which directly limits the number of MediaRecorders.
I wrote test code that creates MediaRecorders (configured to use MPEG4/H264) and starts them in a loop. On a Nexus 5 it always fails with java.io.IOException: prepare failed when calling prepare() on the 6th instance. This suggests you can have only 5 instances of MediaRecorder on a Nexus 5.
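The probing approach described above can be wrapped so that every instance is released afterwards, in line with the `MediaRecorder.release()` advice quoted earlier. This is a generic sketch: `Factory` and `Releaser` are hypothetical interfaces, and in an app `create()` would configure and `prepare()` a MediaRecorder at the resolution you actually intend to use, since (as the answer below notes) the limit depends on resolution and bandwidth.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Counts how many instances of a limited resource can be alive at once. */
final class InstanceProbe {
    /** Creates and prepares one instance (hypothetical wrapper around e.g. MediaRecorder setup). */
    interface Factory<T> {
        T create() throws IOException;
    }

    /** Frees one instance (e.g. a call to MediaRecorder.release()). */
    interface Releaser<T> {
        void release(T instance);
    }

    private InstanceProbe() {}

    /** Creates instances until creation fails or `cap` is reached, then releases all of them. */
    static <T> int probe(Factory<T> factory, Releaser<T> releaser, int cap) {
        Deque<T> live = new ArrayDeque<>();
        try {
            while (live.size() < cap) {
                try {
                    live.push(factory.create());
                } catch (IOException | RuntimeException e) {
                    break; // e.g. "prepare failed": assume the limit is reached
                }
            }
            return live.size();
        } finally {
            while (!live.isEmpty()) {
                releaser.release(live.pop()); // release promptly so real recordings can proceed
            }
        }
    }
}
```

Keep in mind that the count only reflects the device state at probe time; other apps or a software-codec fallback can change it.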

I'm not aware of anything you can query for this information, though it's possible something went into Lollipop that I didn't see.
There is a limit on the number of hardware codec instances that is largely dependent on hardware bandwidth. It's not a simple question of how many streams the device can handle -- some devices might be able to encode two 720p streams but not two 1080p streams.
On some devices the codec may fall back to a software implementation if it runs out of hardware resources. Things will work but will be considerably slower. (I've seen this for H.264 decoding, but I don't know if it also happens for encoding.)
I don't believe there is a minimum system requirement in CTS. It would be useful to know that all devices could, say, decode two 1080p streams and encode one 1080p simultaneously, so that a video editor could be made for all devices, but I don't know if such a thing has been added. (Some very inexpensive devices would struggle to meet that.)

I think it really depends on the device and its RAM capacity. You could read the buffers for the screen and the camera as much as you like, but only one read at a time, not simultaneously, to prevent concurrency issues. Honestly, though, I don't know for sure.

Android: playing many video simultaneously

I am developing a chat app, and we have high-quality emoticons as MP4 files (about 300 KB each). GIF is not used because of its poor quality and limited palette (256 colors).
I need to display the files in a ListView as looping videos.
Now I'm trying to do this using TextureView and MediaCodec classes.
Sources can be found at https://github.com/google/grafika.
The problem is that when I try to play more than four videos simultaneously, an error occurs:
IllegalStateException at android.media.MediaCodec.dequeueOutputBuffer.
I think this happens because of the large memory consumption: on my device (HTC One M7), while playing four videos, the processor load is over 60%!
How can I solve this problem? Maybe I need to use third-party codecs?
Or is the idea of using video to display smileys bad, and should I give it up and use something like GIF?

There is a limit on the number of simultaneous decoders, if for no other reason than at some point you'll exceed the maximum bandwidth of the hardware. On some devices I've seen it switch to software decoding after two hardware decoders are configured. AFAIK there's no enforced behavior here.
One possible solution to your problem is to have a single multiplexed video, where you have all of your emoticons in a single .mp4 file. Play that into a SurfaceTexture, which is then used as a "sprite sheet". This approach requires that all animations have roughly the same number of frames, so you may have to adjust some or just pad out the sequence.
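With a single multiplexed video, only one decoder runs, and each list item draws the same SurfaceTexture with different texture coordinates. The hypothetical helper below computes those coordinates for a grid layout; it assumes cells are numbered left-to-right, top-to-bottom, and GL-style coordinates where v = 0 is the bottom edge of the frame. Check that against the matrix from `SurfaceTexture.getTransformMatrix()`, which should still be applied when sampling.

```java
/** Maps a cell index in a grid "sprite sheet" video frame to texture coordinates. */
final class SpriteSheet {
    private final int columns;
    private final int rows;

    SpriteSheet(int columns, int rows) {
        if (columns <= 0 || rows <= 0) throw new IllegalArgumentException("grid must be non-empty");
        this.columns = columns;
        this.rows = rows;
    }

    /** Returns {uMin, vMin, uMax, vMax} for the cell at index. */
    float[] cellUv(int index) {
        if (index < 0 || index >= columns * rows) throw new IndexOutOfBoundsException("cell " + index);
        int col = index % columns;
        int row = index / columns;
        float cellWidth = 1f / columns;
        float cellHeight = 1f / rows;
        float uMin = col * cellWidth;
        float vMax = 1f - row * cellHeight; // row 0 is the top strip of the frame
        return new float[] {uMin, vMax - cellHeight, uMin + cellWidth, vMax};
    }
}
```

For a 4x2 grid, cell 5 (second row, second column) maps to u in [0.25, 0.5] and v in [0, 0.5]. Because all cells share one timeline, every emoticon loops at the same frame count, which is why the answer suggests padding shorter animations.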
Update: according to this link, the 'M' release is scheduled to add MediaCodecInfo.CodecCapabilities.getMaxSupportedInstances(), which provides "a hint for the max number of the supported concurrent codec instances." Doesn't really help with your issue, but at least it'd give you a number. Hopefully the API will take the video resolution(s) into account.

Appropriate audio capture and noise reduction

In my android application I need to capture the user's speech from the microphone and then pass it to the server. Currently, I use the MediaRecorder class. However, it doesn't satisfy my needs, because I want to make glowing effect, based on the current volume of input sound, so I need an AudioStream, or something like that, I guess. Currently, I use the following:
this.recorder = new MediaRecorder();
this.recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
this.recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
this.recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
this.recorder.setOutputFile(FILENAME);
I am targeting API level 7, so the only AudioEncoder available to me is AMR Narrow Band. Maybe that's the reason for the awful noise I hear in my recordings.
The second problem I am facing is poor sound quality and noise, so I want to reduce (cancel, suppress) it, because it is really awful, especially on my no-name Chinese tablet. This should happen server-side because, as far as I know, it requires a lot of resources, and not all modern gadgets (especially no-name Chinese tablets) can do it fast enough. I am free to choose the server platform, so it can be ASP.NET, PHP, JSP, or whatever helps me make the sound better. Speaking of ASP.NET, I have come across a library called NAudio; maybe it can help me in some way. I know that the library has no built-in noise reduction, but I have found some examples of FFT and autocorrelation using it, so it may help.
To be honest, I have never worked with sound this closely before and I have no idea where to start. I have googled a lot for noise reduction techniques and code examples and found nothing. You guys are my last hope.
Thanks in advance.

Have a look at this article.
Long story short, it uses MediaRecorder.AudioSource.VOICE_RECOGNITION instead of AudioSource.MIC, which gave me really good results and noise in the background did reduce very much.
The great thing about this solution is, it can be used with both AudioRecord and MediaRecorder class.

For audio capture you can use the AudioRecord class. This lets you record raw audio, i.e. you are not restricted to "narrow band" and you can also measure the volume.
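Measuring the volume from raw AudioRecord data comes down to computing the RMS level of each buffer. A minimal sketch, assuming 16-bit PCM filled by `AudioRecord.read(short[], int, int)`; the class name and the `glowIntensity` mapping (with its -60 dB floor in the usage) are hypothetical choices for driving the glow effect, not part of any Android API.

```java
/** Loudness helpers for 16-bit PCM buffers (e.g. filled by AudioRecord.read). */
final class PcmLevel {
    private PcmLevel() {}

    /** Root-mean-square amplitude of the first `count` samples. */
    static double rms(short[] buffer, int count) {
        if (count <= 0) return 0;
        double sumSquares = 0;
        for (int i = 0; i < count; i++) {
            double s = buffer[i];
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / count);
    }

    /** Level relative to full scale: 0 dBFS is the loudest possible, silence is -infinity. */
    static double dbfs(short[] buffer, int count) {
        double r = rms(buffer, count);
        return r == 0 ? Double.NEGATIVE_INFINITY : 20 * Math.log10(r / 32768.0);
    }

    /** Maps a dBFS value onto [0, 1]; anything at or below floorDb (e.g. -60) maps to 0. */
    static float glowIntensity(double db, double floorDb) {
        if (db <= floorDb) return 0f;
        return (float) Math.min(1.0, 1.0 - db / floorDb);
    }
}
```

In a recording loop, pass the number of shorts returned by `read()` as `count`, then feed `glowIntensity(dbfs(buffer, n), -60)` to the UI thread.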

Many smartphones have two microphones: one is the MIC you are using, the other is near the camera for video recording and is called CAMCORDER. You can get data from both of them to do noise reduction. There are many papers about audio noise reduction with multiple microphones.
Ref: http://developer.android.com/reference/android/media/MediaRecorder.AudioSource.html
https://www.google.com/search?q=noise+reduction+algorithm+with+two+mic

Which is the best SIP compatible codec type for Android

I want to develop an Android app that will use my client's SIP server. My client exposes a couple of REST APIs from the SIP server for communicating with the apps.
I want to know which would be the best codec type for this app?
Basically, I want to create a SIP-Stack and send the SIP Packets to the Server. So, there should be a coding and decoding system for the packets. My client prefers 16 kb/sec but I am not sure which should I use.

As others have said, SIP does not transfer audio or video. Although in theory you can send data over any transport, including ATM, analog lines, a DS0, etc., in the real world RTP is the most common. RTP (Real-time Transport Protocol), accompanied by RTCP (RTP Control Protocol), or SRTP (Secure RTP) usually carries the audio and video.
As far as codecs go, you will be limited by what your server supports. Here are a few common codecs and some pros and cons of each.
G.711 - Toll quality (i.e. as good as a good analog phone line, or even a bit better). "Universal" in that virtually every device supports G.711. Takes a lot of bandwidth, since it doesn't really compress data (G.711 is a "compander"). The baseline G.711 is pretty bare-bones (it's really a couple of lookup tables). Appendix I adds packet loss concealment (PLC) and Appendix II adds silence suppression and comfort noise generation.
GSM - used on cellphones, sounds ok, good PLC, good compression
G.729A - widely used, near toll quality, good compression (8Kbps)
G.723.1 - widely used, almost as good as G.729, better compression (5.3 or 6.3 Kbps)
G.722 - sounds better than G.711, wideband (twice the audio bandwidth of G.711 or an analog call), same bandwidth used on the line as G.711
GIPS - various implementations exist, one is free. IIRC, uses about 13.5 Kbps on the line; sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor it uses. Also, all of these particular codecs are lossy codecs: they lose some of the data. This refers to compression, not to portions of the audio being dropped due to poor routing or poor line quality, much like MP3 is considered a lossy codec while FLAC is considered lossless. If you're interested, the following Wikipedia article explains this in further detail: http://en.wikipedia.org/wiki/Lossy_compression
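To make the "compander" description of G.711 concrete, here is a sketch of mu-law encoding and decoding in the bias-and-segment form used by many public-domain G.711 implementations. It computes the segment with a loop instead of the lookup tables mentioned above, and it is an illustration rather than a vetted codec. The error it introduces is a quantization step that grows with amplitude, which is why G.711 is described as lossy only in a limited sense.

```java
/** G.711 mu-law companding: 16-bit linear PCM to 8-bit codes and back. */
final class MuLaw {
    private static final int BIAS = 0x84;   // 132, added so every magnitude has a leading 1 bit
    private static final int CLIP = 32635;  // largest magnitude that still fits after adding BIAS

    private MuLaw() {}

    /** Compresses one 16-bit sample to an 8-bit mu-law code. */
    static byte encode(short pcm) {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;    // 0x80 for negative samples
        if (sign != 0) sample = -sample;
        if (sample > CLIP) sample = CLIP;
        sample += BIAS;
        // Exponent = segment number 0..7, i.e. position of the highest set bit above bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;   // 4 bits of detail within the segment
        return (byte) ~(sign | (exponent << 4) | mantissa); // codes are transmitted bit-inverted
    }

    /** Expands an 8-bit mu-law code back to a 16-bit sample (midpoint of its quantization step). */
    static short decode(byte code) {
        int u = ~code & 0xFF;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) ((u & 0x80) != 0 ? -magnitude : magnitude);
    }
}
```

Each 16-bit sample becomes one byte, so 8000 samples per second give exactly the 64 Kbps that G.711 uses on the line before packet overhead.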

You need to know what codecs and protocols this SIP server will support. If you control both ends and want to stick to 16Kbps, you'll want iLBC (no royalties) or G.729 (royalties apply). G.711 and (now) G.722 have no royalties either, but both use ~64Kbps.
The list given before is good, with a few issues:
GIPS - various implementations exist, one is free. IIRC, uses about 13.5 Kbps on the line; sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
GIPS is not a codec - iLBC and iSAC are codecs designed by GIPS. iLBC is free and standardized. iLBC is high quality, 13 or 15 Kbps, and is very resilient to packet loss compared to G.729 or even G.711. You can have 30 or even 50% loss with iLBC and still be understood. I'm not sure I'd say it uses a lot of CPU compared to, say, G.729.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor used. Also, all codecs are lossy codecs: they lose some of the data.
Well, G.711 isn't really lossy per se (in theory yes, but it's almost more quantization-level loss). 64K G.722 isn't very lossy either. G.723 sucks dead gerbils through garden hoses. :-)
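When comparing codec rates against a budget such as the client's 16 kb/sec, note that the figures above are codec payload rates. Each RTP packet also carries 12 bytes of RTP, 8 of UDP and 20 of IPv4 header. The sketch below assumes IPv4, no header compression, and ignores link-layer overhead; `ptimeMillis` is the packetization interval (20 ms is a common default).

```java
/** On-the-wire bandwidth of one RTP audio stream (one direction). */
final class RtpBandwidth {
    /** Per-packet header bytes: RTP (12) + UDP (8) + IPv4 (20); no header compression assumed. */
    static final int HEADER_BYTES = 12 + 8 + 20;

    private RtpBandwidth() {}

    /** IP-level bit rate for a codec bit rate and packetization interval in milliseconds. */
    static double ipBitsPerSecond(int codecBitsPerSecond, int ptimeMillis) {
        double packetsPerSecond = 1000.0 / ptimeMillis;
        double payloadBytes = codecBitsPerSecond / 8.0 * ptimeMillis / 1000.0;
        return packetsPerSecond * (payloadBytes + HEADER_BYTES) * 8;
    }
}
```

With 20 ms packets, G.711 (64 Kbps) needs 80 Kbps per direction and G.729 (8 Kbps) needs 24 Kbps, so a 16 Kbps codec budget and a 16 Kbps link budget are very different requirements. Longer packets reduce the overhead at the cost of added latency.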

It sounds like a bad idea to do it yourself. Developing a SIP client is not a trivial task, since there are several protocols you would have to implement. Choosing the codec is not a very important decision compared to the rest.
IMHO you should use one of the open-source SIP stacks available (like pjsip) or build your application on top of an open-source SIP client (like sipdroid).
But since you asked for codec: Use the GSM codec. Saved bandwidth and sounds OK. G.711 is otherwise the standard codec that 99% of all sip servers support.

Any?
SIP does not send and deal with ANY data packets. SIP is the Session Initiation Protocol and it handles the NEGOTIATION OF SESSIONS.
The sessions then are, in the case of audio and video, based on RTP, with the codecs negotiated via SDP carried in the SIP messages. So your question indicates a REAL lack of knowledge of what you need to do. The real question is: you need an RTP-compatible codec.
Which is similarly senseless. RTP is just a carrier protocol. This is like asking "what is an HTTP-compatible image format?" HTTP does not care. The browser does.
In the case of RTP, it means RTP does not care. It can transport ANY data. What is important is that BOTH SIDES know the same codec. So, in your case it means:
If you program both sides then it is your choice.
If you program only one side (like a SIP phone system), then the question is what "normal" programs handle.
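The "RTP does not care" point is visible in the packet format: the fixed 12-byte header from RFC 3550 only carries a 7-bit payload type number, and both sides must agree what it means. RFC 3551 assigns static numbers to some codecs (0 = PCMU/G.711 mu-law, 3 = GSM, 8 = PCMA/G.711 A-law, 9 = G722, 18 = G729), while others such as iLBC or Speex use dynamic types 96-127 mapped through SDP `a=rtpmap` lines. A minimal sketch of building and reading that header (no padding, extension or CSRCs):

```java
import java.nio.ByteBuffer;

/** Builds and inspects the fixed 12-byte RTP header defined in RFC 3550. */
final class RtpHeader {
    private RtpHeader() {}

    static byte[] build(int payloadType, boolean marker, int sequence, long timestamp, long ssrc) {
        ByteBuffer buf = ByteBuffer.allocate(12); // ByteBuffer defaults to big-endian (network order)
        buf.put((byte) 0x80);                     // version 2, no padding, no extension, 0 CSRCs
        buf.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F)));
        buf.putShort((short) sequence);           // 16-bit sequence number, wraps around
        buf.putInt((int) timestamp);              // 32-bit media clock (8000 Hz for G.711/G.729)
        buf.putInt((int) ssrc);                   // 32-bit synchronization source identifier
        return buf.array();
    }

    static int version(byte[] packet) { return (packet[0] >> 6) & 0x03; }
    static boolean marker(byte[] packet) { return (packet[1] & 0x80) != 0; }
    static int payloadType(byte[] packet) { return packet[1] & 0x7F; }
    static int sequence(byte[] packet) { return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF); }
}
```

The codec payload simply follows these 12 bytes; nothing in the header says how to decode it beyond the agreed payload type.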

There are two things you need to take into consideration here:
What other devices/servers that handle media are being deployed or are planned to be deployed
Is your customer looking for narrowband or wideband solution - this will affect the voice quality of the call greatly
Once you have nailed down the answers to the two questions above you will be able to select.
For mobile devices, the voice codecs usually used are AMR-NB or AMR-WB. For SIP it is usually G.729 or G.722.x.
You also have Speex, ISAC and SILK to choose from.
You will probably need to do G.711 in any case just to interoperate with everything - bandwidth will be higher though.
No easy answer here. If your customer can select, or state what other devices are being used - it will be easier for you to select.
