Segmenting videos during camera capture for higher upload efficiency - Android

I want to develop an Android app that captures a video with the phone's camera, then uploads the video (as a single file) to the cloud. I then want to make it more efficient by segmenting the recorded video into small chunks (say, 5 seconds each) and uploading the chunks to the cloud, and finally compare the two approaches to show that the second one is more efficient and uploads faster, according to this blog.
Is the chunking approach more efficient? In what ways? And how can I implement it? Should I wait until the video is finished and then chunk it up, or can it be done in real time as capture progresses? Any tips or experience doing this would certainly help.

Breaking a video into chunks and performing processing, such as encoding or packaging, in parallel is a very common technique for VOD video today.
It allows for faster processing as long as you have the computing resources to tackle tasks in parallel, which many multi-core computers and the cloud do have.
It can also allow you to schedule the processing for when your resources are most available, for example during a period of low load from other jobs, or when they are cheapest, which can help with cloud computing costs.
It's hard to say whether that is more efficient, as it depends on what you are measuring - if you added up the total compute instructions or cycles, it quite likely takes more this way, but for the reasons above it is often the preferred approach anyway.
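To make that parallelism trade-off concrete, here is a minimal JDK-only sketch - the chunk names and the processChunk helper are hypothetical placeholders for whatever per-chunk work (encoding, packaging, uploading) you actually do:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelChunkDemo {

        // Hypothetical stand-in for your real per-chunk work.
        static void processChunk(String chunkName) {
            System.out.println("Processing " + chunkName + " on "
                    + Thread.currentThread().getName());
        }

        public static void main(String[] args) throws InterruptedException {
            List<String> chunks = Arrays.asList(
                    "chunk0.mp4", "chunk1.mp4", "chunk2.mp4", "chunk3.mp4");

            // One worker per core: total CPU work is the same or slightly
            // higher, but wall-clock time drops while cores are free.
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());

            for (String chunk : chunks) {
                pool.submit(() -> processChunk(chunk));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }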
For video transfer or transport, if you were able to send the different chunks over different paths, each of which had a capacity limit that you could fill with that segment, it might indeed provide time savings or efficiency.
However, if your device has a single transmission path, for example its Wi-Fi IP connection, then it's not clear that segmenting the video has any benefit beyond the 'breaking up' of the video that already happens when you send it packet by packet over an IP network anyway.
If your goal is fast video transmission, it is likely worth taking a look at some of the specialist protocols that exist for fast and efficient live video transmission. Some of these are UDP based rather than TCP based, in which case you may need to check that your target network firewalls and routing rules will support them. SRT is a good example (see the link below), but others exist, such as Zixi, which is proprietary:
https://en.wikipedia.org/wiki/Secure_Reliable_Transport
Focusing on transmission, internet video streams are actually broken into chunks anyway, because they are delivered over packet networks. The stream is also frequently 'chunked' at a higher level, in addition to the encoding and container (e.g. MP4) groupings, to support ABR protocols, which segment the video so that different bitrate renditions can be switched between (see: https://stackoverflow.com/a/42365034/334402).
Breaking a video into chunks can also help with playback if the chunks can start playing before the entire video is downloaded - again this is typical of most streamed videos on the internet.
If we ignore the effects of the different protocols' retry and packet-loss strategies, and you have a single transmission 'pipe' of a set capacity, breaking the video into chunks does not make it faster to transfer.
If, however, you have multiple transmission 'pipes', then breaking the video up and sending different parts in parallel across the different pipes may indeed speed it up. It's important to remember that, even then, if the video is a live stream you are limited by the live video rate itself - i.e. you can't transfer the video faster than the source is producing it.
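On the 'can I do it in real time' part of the question: yes, one simple route is to let MediaRecorder cap each segment's duration and restart it as each chunk completes. A minimal sketch, assuming this lives in an Activity or Service that already holds the camera and the usual permissions, and where uploadAsync() is a hypothetical hook into your upload code:

    // Real-time segmenting: each segment auto-stops at 5 s, is handed
    // off for upload, and the next one starts immediately.
    private MediaRecorder recorder;
    private int segmentIndex = 0;

    private void startSegment() throws IOException {
        final File segment =
                new File(getFilesDir(), "segment" + segmentIndex++ + ".mp4");

        recorder = new MediaRecorder();
        recorder.setAudioSource(MediaRecorder.AudioSource.CAMCORDER);
        recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
        recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);
        recorder.setOutputFile(segment.getAbsolutePath());
        recorder.setMaxDuration(5000); // 5-second chunks

        recorder.setOnInfoListener((mr, what, extra) -> {
            if (what == MediaRecorder.MEDIA_RECORDER_INFO_MAX_DURATION_REACHED) {
                mr.stop();
                mr.release();
                uploadAsync(segment); // hypothetical: upload while capture continues
                try {
                    startSegment();   // roll straight into the next chunk
                } catch (IOException e) {
                    // handle/log: could not restart the recorder
                }
            }
        });

        recorder.prepare();
        recorder.start();
    }

Be aware that the stop/restart cycle drops a few frames, so there is a small gap at each 5-second boundary; truly gapless segmenting means driving MediaCodec yourself and rotating MediaMuxer instances, which is considerably more work.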

Related

Android - Choosing between MediaRecorder, MediaCodec and FFmpeg

I am working on a video recording and sharing application for Android. The specifications of the app are as follows:
Recording a 10 second (maximum) video from inside the app (not using the device's camera app)
No further editing on the video
Storing the video in a Firebase Cloud Storage (GCS) bucket
Downloading and playing of the said video by other users
From the research I did on SO and other sources, I have found the following (please correct me if I am wrong):
The three options and their respective features are:
1. FFmpeg
Capable of achieving the above goal and has extensive answers and explanations on sites like SO, however
Increases the APK size by 20-30 MB (it is a large library)
Runs the risk of not working properly on certain 64-bit devices
2. MediaRecorder
Reliable and supported by most devices
Will store files in .mp4 format (unless converted to H.264)
Easier for playback (no decoding needed)
Adds the mp4 and 3gp headers
Increases latency according to this question
3. MediaCodec
Low level
Will require MediaCodec, MediaMuxer, and MediaExtractor
Output in H.264 (without using MediaMuxer for playback)
Good for video manipulations (though, not required in my use case)
Not supported by pre-4.3 (API 18) devices
More difficult to implement and code (my opinion - please correct me if I am wrong)
Unavailability of extensive information, tutorials, answers or samples (Bigflake.com being the only exception)
After spending days on this, I still can't figure out which approach suits my particular use case. Please elaborate on what I should do for my application. If there's a completely different approach, then I am open to that as well.
My biggest criteria are that the video encoding process be as efficient as possible and that the video stored in the cloud use the lowest possible space without compromising video quality.
Also, I'd be grateful if you could suggest the appropriate format for saving and distributing the video in Firebase Storage, and point me to tutorials or samples of your suggested approach.
Thank you in advance! And sorry for the long read.
Your overview of this topic is accurate and to the point.
I'll just add my 2 cents on a few things you might have missed:
1. FFmpeg
+/- If you build your own .so you can reduce the size down to about 2-3 MB, depending on the use case of course. Editing a 6000-line build script takes time and effort, though
++ Supports a wide range of formats (almost everything)
++ Results are the same on every device
++ Any resolution supported
-- High energy consumption due to software en-/decoding, which also makes it slow. There is a plugin to support libstagefright, but it doesn't work on many devices (as of May 2016)
-- Licensing can be problematic depending on your location and use case. I'm not a lawyer, but we had legal consulting on this topic and it's quite complex
2. MediaRecorder
++ Easiest to implement (simplified access to MediaCodec/libstagefright); raw data gets passed to the encoder directly, so no messing around there
++ HW accelerated on most devices, which makes it fast and energy saving
++ Delay only applies to live streaming
-- Dependent on the HW manufacturers' implementations
-- Results may vary from device to device
++ No licensing problems
3. MediaCodec
+/- Most of 2. MediaRecorder applies to this as well (apart from ease of use)
++ Most flexible access to HW en-/decoding
-- Hard to use for cases that were not anticipated (e.g. mixing videos from different sources)
+/- Delay for streaming can be eliminated (it is tricky, though)
-- HW manufacturers sometimes don't implement things correctly (e.g. the Samsung Galaxy S5 sometimes produces a SIGSEGV if live data from certain DSLRs is fed to the encoder. It works fine for a while, then all of a sudden it's a SIGSEGV. This might be the DSLR's fault, but the crash is not avoidable and takes down the app, which in the end is the app developer's fault ;) )
-- If used without MediaMuxer, you need either a good understanding of media containers or to rely on 3rd-party libraries
The list is obviously not complete, and some points might no longer be correct - the last time I worked with video was almost half a year ago.
For your use case I would recommend MediaRecorder, since it is the easiest to implement, is supported on all devices, and offers a good quality/size trade-off. FFmpeg produces better results for the same storage size, but takes much longer (in one extreme case, the hardware path encoded DSLR live footage 30 times faster) and consumes more energy.
As far as I understand your use case, there is no need to fiddle around with MediaCodec, since you only want to encode and decode.
I suggest using VP8 or VP9, since you won't run into licensing problems. Again, I'm no lawyer, but distributing H.264 from your own server might make you a broadcasting station, or so I was told.
Hope this helps you in your decision making.
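For illustration, a minimal sketch of the MediaRecorder route for the 10-second use case - the resolution, frame rate and bitrate values are my own placeholders, not tested recommendations, so tune them against your own quality/storage target before pushing to Firebase:

    MediaRecorder recorder = new MediaRecorder();
    recorder.setAudioSource(MediaRecorder.AudioSource.CAMCORDER);
    recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);
    recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4); // .mp4 container
    recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
    recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);   // HW accelerated on most devices
    recorder.setVideoSize(1280, 720);
    recorder.setVideoFrameRate(30);
    recorder.setVideoEncodingBitRate(2_000_000); // ~2 Mbps keeps a 10 s clip around 2.5 MB
    recorder.setMaxDuration(10_000);             // enforce the 10-second maximum
    recorder.setOutputFile(outputFile.getAbsolutePath()); // outputFile: your destination File
    recorder.prepare();
    recorder.start();
    // stop()/release() when done, or from an OnInfoListener firing on
    // MEDIA_RECORDER_INFO_MAX_DURATION_REACHED.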

Using smartphones' cameras for video surveillance

I have two smartphones that I no longer use and I'd like to use their cameras for a basic video surveillance system, rather than buying expensive cameras.
My idea is to record overnight, and save the videos on my laptop.
It would be a good opportunity for me to learn a bit more about Android programming as well.
So I guess the approach is:
a TCP/IP server, gathering the information coming from the two (or N) phones;
a TCP/IP client, to run on each phone, recording and sending the information to the server.
I am not sure what that "information" should be, though. Should that be the single frames captured by the cameras, or is there a way to stream a video?
In case I wanted to implement a basic motion detection, would it be better to do it on the clients or on the server?
Is my approach above correct?
Have a look at the open-source libstreaming project, which will allow you to stream video from your phones. But if you want a time-lapse recording, e.g. 1 frame per second or less, then sending single frames may be preferable.
Note that your smartphones will need all-time power supply, because camera and communications will drain any battery in very short time. Also keep in mind that phone cameras do not perform well in low-illumination conditions, and this may justify investment in expensive dedicated cameras.
You can use OpenCV for motion detection, either on the device or on the server, or even both. The choice depends on your needs and resources; e.g. you may significantly reduce overall data volume if the device only sends video when it detects motion.
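As a rough illustration of the OpenCV option, a toy frame-differencing detector that could run on either the phones or the server - it assumes the OpenCV Java bindings are loaded, and the blur kernel, threshold and '1% of pixels' trigger are hypothetical tuning knobs:

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;

    public class MotionDetector {
        private Mat previousGray;

        public boolean detect(Mat frameBgr) {
            Mat gray = new Mat();
            Imgproc.cvtColor(frameBgr, gray, Imgproc.COLOR_BGR2GRAY);
            Imgproc.GaussianBlur(gray, gray, new Size(21, 21), 0); // suppress sensor noise

            if (previousGray == null) {
                previousGray = gray;
                return false; // first frame: nothing to compare against yet
            }

            Mat diff = new Mat();
            Core.absdiff(previousGray, gray, diff);
            Imgproc.threshold(diff, diff, 25, 255, Imgproc.THRESH_BINARY);

            double changed = (double) Core.countNonZero(diff)
                    / (diff.rows() * diff.cols());
            previousGray = gray;
            return changed > 0.01; // >1% of pixels changed => motion
        }
    }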

expected live streaming lag from an android device

How much lag is expected when an android device's camera is streamed live?
I checked Miracast, and it has a lag of around 150-300 ms (usually around 160-190 ms).
I have a Bluetooth app and a Wi-Fi Direct app, and both lag by around 400-550 ms. I was wondering if it would be possible to reproduce, or at least come closer to, Miracast's performance.
I am encoding the camera frames in H.264 and using my own custom protocol to transmit the encoded frames over a TCP connection (in the Wi-Fi Direct case).
Video encoding is compute intensive, so any help you can get from the hardware usually speeds things up. It is worth making sure you are using codecs that leverage the hardware - i.e. avoid doing your H.264 encoding entirely in software.
There is a video here which discusses how to access HW-accelerated video codecs; it is a little old, but it still applies to older devices:
https://www.youtube.com/watch?v=tLH-oGfiGaA
I believe that MediaCodec now provides a Java API to the hardware codecs (I have not tried it or seen speed-comparison tests myself), which makes things easier:
http://developer.android.com/about/versions/android-4.1.html#Multimedia
See the note here also about using the camera's Surface preview as a speed aid:
https://stackoverflow.com/a/17243244/334402
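For illustration, a minimal sketch (API 18+) of configuring a typically hardware-backed H.264 encoder through MediaCodec with a camera-facing input Surface, avoiding software colour conversion - the resolution and bitrate numbers are placeholders:

    import android.media.MediaCodec;
    import android.media.MediaCodecInfo;
    import android.media.MediaFormat;
    import android.view.Surface;
    import java.io.IOException;

    Surface startHardwareEncoder() throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

        MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        Surface inputSurface = encoder.createInputSurface(); // camera draws here
        encoder.start();
        // ...then drain encoded frames with dequeueOutputBuffer() and
        // push them over your TCP connection...
        return inputSurface;
    }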
As an aside, 500 ms does not seem that bad, and you are probably into diminishing returns, so the effort to reduce it further may be high if you can live with your current lag.
Also, I assume you are measuring the lag (or latency) at the server (receiving) side? If so, you need to look at how the server is decoding and presenting as well, especially if you are comparing your own player with a third-party one.
It is worth looking at the jitter buffer on the receiving side, irrespective of the latency of the stream itself - in simple terms, waiting for a larger number of packets before starting playback adds startup delay, but it may provide a better overall user experience. This is because a larger buffer is more tolerant of delayed packets before it has to go into a 'buffering' mode, which users tend not to like.
It's a balance, and your needs may dictate a bias one way or the other. If you were planning a video-chat-like application, for example, then delay is very important, as it starts to become annoying to users above 200-300 ms. If you are providing a feed from a sports event, then delay may be less important, and avoiding buffering pauses may give better user-perceived quality.
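As a toy illustration of that balance, a sketch of a sequence-ordered jitter buffer - the MIN_DEPTH value is a hypothetical knob; raising it adds startup delay but tolerates more network jitter before rebuffering:

    import java.util.PriorityQueue;

    public class JitterBuffer<P> {
        private static final int MIN_DEPTH = 10; // illustrative pre-buffer depth

        public static class Packet<P> implements Comparable<Packet<P>> {
            final long sequence;
            final P payload;
            public Packet(long sequence, P payload) {
                this.sequence = sequence;
                this.payload = payload;
            }
            @Override public int compareTo(Packet<P> other) {
                return Long.compare(sequence, other.sequence);
            }
        }

        private final PriorityQueue<Packet<P>> queue = new PriorityQueue<>();
        private long nextSequence = 0;
        private boolean started = false;

        public void add(Packet<P> packet) {
            queue.add(packet);
        }

        // Returns the next in-order payload, or null if the player should
        // keep waiting (pre-buffering, or the next packet hasn't arrived).
        public P poll() {
            if (!started) {
                if (queue.size() < MIN_DEPTH) return null; // still pre-buffering
                started = true;
            }
            Packet<P> head = queue.peek();
            if (head != null && head.sequence == nextSequence) {
                nextSequence++;
                return queue.poll().payload;
            }
            return null; // gap: packet 'nextSequence' is late or lost
        }
    }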

Streaming Android Screen

I'm currently working on an app with the end goal of being roughly analogous to an Android version of AirPlay for iDevices.
Streaming media and all that is easy enough, but I'd like to be able to include games as well. The problem with that is that to do so I'd have to stream the screen.
I've looked around at various things about taking screenshots (this question and the derivatives from it in particular), but I'm concerned about the frequency/latency. When gaming, anything less than 15-20 fps simply isn't going to cut it, and I'm not certain such is possible with the methods I've seen so far.
Does anyone know if such a thing is plausible, and if so what it would take?
Edit: To make it more clear, I'm basically trying to create a more limited form of "remote desktop" for Android. Essentially, capture what the device is currently doing (movie, game, whatever) and replicate it on another device.
My initial thoughts are to simply grab the audio buffer and the frame buffer and pass them through a socket to the other device, but I'm concerned that the methods I've seen for capturing the frame buffer are too slow for the intended use. I've seen people throwing around comments of 3 FPS limits and whatnot on some of the more common ways of accessing the frame buffer.
What I'm looking for is a way to get at the buffer without those limitations.
I am not sure what you are trying to accomplish when you refer to "streaming" a video game.
But if you are trying to mimic AirPlay, all you need to do is connect to a device via a Bluetooth/internet connection and allow sound, then save the results or handle them accordingly.
But video games do not "stream" a screen, because the mobile device cannot handle much of that workload. There are other problems too, such as how you will handle the game if the person loses their internet connection while playing. On top of that, this would require a lot of servers and bandwidth to support the game workload on the backend.
But if you are trying to create an online game, essentially all you need to do is send and receive messages from a server. That is simple. If you want to "stream" to another device, simply connect the mobile device to speakers or a TV. Just about all mobile video games and applications send simple messages via JSON or something similar; this reduces overhead, uses simple syntax, and can be used across multiple platforms.
It sounds like you should take a look at this (repost):
https://stackoverflow.com/questions/2885533/where-to-start-game-programming-for-android
If not, this is more of an open question about how to implement a video game.

Which is the best SIP compatible codec type for Android

I want to develop an Android app that will use my client's SIP server. My client is exposing a couple of REST APIs from the SIP server for communicating with the apps.
I want to know which would be the best codec type for this app?
Basically, I want to create a SIP stack and send the SIP packets to the server, so there should be a coding and decoding system for the packets. My client prefers 16 kbps, but I am not sure which codec I should use.
As others have said, SIP does not transfer audio or video. Although in theory, you can send data over any transport, including ATM, analog lines, a DS0, etc, in the real world, RTP is the most common. RTP (Real Time Protocol) and RTCP (Real Time Control Protocol) or SRTP (Secure RTP) usually carry the audio and video.
As far as codecs go, you will be limited by what your server supports. Here are a few common codecs and some pros and cons of each.
G.711 - Toll quality (i.e. as good as a good analog phone line, or even a bit better). "Universal" in that virtually every device supports G.711. Takes a lot of bandwidth; it doesn't really compress the data (G.711 is a "compander"). The baseline G.711 is pretty bare-bones (it's really a couple of lookup tables). Appendix I adds packet loss concealment (PLC) and Appendix II adds silence suppression and comfort noise generation.
GSM - used on cellphones, sounds OK, good PLC, good compression
G.729A - widely used, near toll quality, good compression (8 Kbps)
G.723.1 - widely used, almost as good as G.729, better compression (4-5 Kbps)
G.722 - sounds better than G.711, wideband (twice the audio bandwidth of G.711 or an analog call), same bandwidth on the line as G.711
GIPS - various implementations exist, one of which is free. IIRC it uses about 13.5 Kbps on the line; the sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor is used. Also, all of these particular codecs are lossy codecs - they lose some of the data. This means the loss comes from the compression itself, not that portions of the audio are dropped due to poor routing or poor line quality - much like MP3 is considered a lossy codec while FLAC is considered lossless. If you're interested, the following Wikipedia article explains in further detail: http://en.wikipedia.org/wiki/Lossy_compression
You need to know which codecs and protocols this SIP server will support. If you control both ends and want to stick to 16 Kbps, you'll want iLBC (no royalties) or G.729 (royalties apply). G.711 and (now) G.722 have no royalties either, but both use ~64 Kbps.
The list given before is good, with a few issues:
GIPS - various implementations exist, one of which is free. IIRC it uses about 13.5 Kbps on the line; the sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
GIPS is not a codec - iLBC and iSAC are codecs designed by GIPS. iLBC is free and standardized. iLBC is high quality at 13 or 15 Kbps, and is very resilient to packet loss compared to G.729 or even G.711. You can have 30 or even 50% loss with iLBC and still be understood. I'm not sure I'd say it uses a lot of CPU compared to, say, G.729.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor is used. Also, all codecs are lossy codecs - they lose some of the data.
Well, G.711 isn't really lossy per se (in theory yes, but it's almost more quantization-level loss). 64K G.722 isn't very lossy either. G.723 sucks dead gerbils through garden hoses. :-)
It sounds like a bad idea to do it yourself. Developing a SIP client is not a trivial task, since there are several protocols you would have to implement. Choosing the codec is not a very important decision compared to the rest.
IMHO you should use one of the open-source SIP stacks available (like PJSIP) or build your application on top of an open-source SIP client (like Sipdroid).
But since you asked for a codec: use the GSM codec. It saves bandwidth and sounds OK. Otherwise, G.711 is the standard codec that 99% of all SIP servers support.
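For instance, a minimal sketch using Android's built-in android.net.sip stack, as one way of not rolling your own - the account details are placeholders, and since not every device ships this API you should check isApiSupported() first:

    import android.content.Context;
    import android.net.sip.SipException;
    import android.net.sip.SipManager;
    import android.net.sip.SipProfile;

    void openSipProfile(Context context)
            throws java.text.ParseException, SipException {
        if (!SipManager.isApiSupported(context)) {
            return; // no platform SIP stack: fall back to e.g. PJSIP
        }
        SipManager manager = SipManager.newInstance(context);
        SipProfile profile = new SipProfile.Builder("alice", "sip.example.com")
                .setPassword("secret")
                .build();
        manager.open(profile); // opens the profile for making calls
    }

Note that with the platform stack the codec negotiation is handled for you, whereas a stack like PJSIP lets you set codec priorities explicitly.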
Any?
SIP does not send or deal with ANY data packets itself. SIP is the Session Initiation Protocol, and it handles the NEGOTIATION OF SESSIONS.
The sessions themselves - in the case of audio and video - are based on RTP, with RTCP for control. So your question indicates a real lack of knowledge of what you need to do - the real question is: you need an RTP-compatible codec.
Which is similarly senseless. RTP is just a carrier protocol. This is like asking "what is an HTTP-compatible image format?" HTTP does not care; the browser does.
In the case of RTP it means: RTP does not care. It can transport ANY data. What is important is that BOTH SIDES know the same codec. So, in your case it means:
If you program both sides, then it is your choice.
If you program only one side (like a SIP phone system), then the question is what "normal" programs handle.
There are two things you need to take into consideration here:
What other devices/servers that handle media are deployed, or are planned to be deployed
Whether your customer is looking for a narrowband or wideband solution - this will greatly affect the voice quality of the call
Once you have nailed down the answers to the two questions above you will be able to select.
For mobile devices, the voice codecs usually used are AMR-NB or AMR-WB. For SIP it is usually G.729 or G.722.x.
You also have Speex, iSAC and SILK to choose from.
You will probably need to support G.711 in any case, just to interoperate with everything - the bandwidth will be higher, though.
There is no easy answer here. If your customer can select, or can state which other devices are being used, it will be easier for you to choose.
