Expected live streaming lag from an Android device

How much lag is expected when an Android device's camera is streamed live?
I checked Miracast and it has a lag of around 150-300 ms (usually it's around 160-190 ms).
I have a Bluetooth app and a Wi-Fi Direct app, and both lag by around 400-550 ms. I was wondering whether it would be possible to reproduce, or at least come closer to, Miracast's performance.
I am encoding the camera frames in H.264 and using my own custom protocol to transmit the encoded frames over a TCP connection (in the Wi-Fi Direct case).

Video encoding is compute-intensive, so any help you can get from the hardware usually speeds things up. It is worth making sure you are using codecs which leverage the hardware - i.e. avoid having your H.264 encoding done entirely in software.
There is a video here which discusses how to access hardware-accelerated video codecs; it is a little old, but it still applies to older devices:
https://www.youtube.com/watch?v=tLH-oGfiGaA
I believe that MediaCodec now provides a Java API to the hardware codecs (I have not tried it or seen speed comparisons myself), which makes things easier:
http://developer.android.com/about/versions/android-4.1.html#Multimedia
See the note here also about using the camera's Surface preview as a speed aid:
https://stackoverflow.com/a/17243244/334402
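To make this concrete, below is a minimal sketch of the MediaCodec route: configuring an H.264 encoder (usually hardware-backed) with a Surface input, so camera frames never pass through Java byte arrays. The resolution, bitrate and frame rate are placeholder values, and the Surface input path assumes API 18+.

    import android.media.MediaCodec;
    import android.media.MediaCodecInfo;
    import android.media.MediaFormat;
    import android.view.Surface;

    // Configure a (typically hardware-backed) H.264 encoder with Surface input.
    MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
    format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
    format.setInteger(MediaFormat.KEY_BIT_RATE, 2000000);
    format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
    format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

    MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
    encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
    Surface input = encoder.createInputSurface(); // route the camera output to this Surface
    encoder.start();
    // Then drain encoded frames with dequeueOutputBuffer() and pass them to your transport.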
As an aside, 500 ms does not seem that bad, and you are probably into diminishing returns, so the effort to reduce it further may be high if you can live with your current lag.
Also, I assume you are measuring the lag (or latency) on the server side? If so, you need to look at how the server is decoding and presenting the video as well, especially if you are comparing your own player with a third-party one.
It is worth looking at your jitter buffer on the receiving side, irrespective of the stream latency itself. In simple terms, waiting for a larger number of packets before you start playback adds start-up delay, but it may provide a better overall user experience, because the larger buffer is more tolerant of delayed packets before the player drops into a 'buffering' state that users tend not to like.
It's a balance, and your needs may dictate a bias one way or the other. If you were planning a video-chat-like application, for example, then the delay is very important, as it starts to become annoying to users above 200-300 ms. If you are providing a feed from a sports event, then delay may be less important, and avoiding buffering pauses may give better perceived quality.
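As a rough illustration of that trade-off, a toy jitter buffer might look like the sketch below (the class and field names are made up for the example): playback only starts once a minimum number of frames has been queued, and an underrun drops it back into a buffering state.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Toy jitter buffer: hold back playback until a minimum number of frames
    // has arrived, trading startup delay for tolerance to late packets.
    class JitterBuffer {
        private final BlockingQueue<byte[]> frames = new ArrayBlockingQueue<>(256);
        private final int prebufferFrames;   // e.g. 5 frames ~ 150 ms at 30 fps
        private boolean playing = false;

        JitterBuffer(int prebufferFrames) {
            this.prebufferFrames = prebufferFrames;
        }

        void onFrameReceived(byte[] frame) throws InterruptedException {
            frames.put(frame);
            if (!playing && frames.size() >= prebufferFrames) {
                playing = true;              // enough margin built up, start playback
            }
        }

        // Called by the decoder/render loop at the playout rate.
        byte[] nextFrame() throws InterruptedException {
            if (!playing) return null;       // still pre-buffering
            byte[] f = frames.poll(5, TimeUnit.MILLISECONDS);
            if (f == null) playing = false;  // underrun: fall back to buffering
            return f;
        }
    }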

Related

Segmenting videos during camera capture for higher upload efficiency

I want to develop an Android app that captures a video with the phone's camera and uploads the video (as a single file) to the cloud. I then want to make this more efficient by segmenting the recorded video into small chunks (say, 5 seconds each) and uploading the chunks to the cloud, and finally compare the two approaches to show that the second one is more efficient and uploads faster, as claimed in this blog.
Is the chunking approach more efficient? In what ways? And how can I implement it? Should I wait until the video has finished recording and then chunk it up, or can it be done in real time as capture progresses? Any tips or experience doing this would certainly help.
Breaking a video into chunks and performing processing, such as encoding or packaging, in parallel is a very common technique for VOD video at this time.
It allows for faster processing so long as you have the computing resources to tackle tasks in parallel, which many multi core computers and the cloud do have.
It can also allow you to schedule the processing for when your resources are most available, for example during a period of low load from other jobs, or when they are cheapest, which can help with cloud computing costs.
It's hard to say whether that is more efficient, as it depends on what you are measuring - if you added up the total compute instructions or cycles, it quite likely takes more this way, but for the reasons above it is often the preferred approach anyway.
For video transfer or transport, if you were able to send the different chunks over different paths, each with capacity that you could fill with its segment, it might indeed provide time savings or efficiency.
However, if your device has a single transmission path, for example its Wi-Fi IP connection, then it's not clear that segmenting the video provides any benefit beyond the 'breaking up' that already happens when you send it packet by packet over an IP network anyway.
If your goal is fast video transmission, then it would likely be worth looking at some of the specialist protocols which exist for fast and efficient live video transmission. Some of these are UDP based rather than TCP, and if so you may need to check that your target network's firewalls and routing rules will support them. SRT would be a good example to look at (see the link below), but others exist, such as Zixi, which is proprietary:
https://en.wikipedia.org/wiki/Secure_Reliable_Transport
Focusing on transmission, internet video streams are actually broken into chunks anyway, because they are delivered over packet networks. The video stream is also frequently 'chunked' at a higher level, in addition to encoding and container (e.g. MP4) groupings, to support ABR protocols, which segment the video so that different bitrate renditions can be switched between (see: https://stackoverflow.com/a/42365034/334402).
Breaking a video into chunks can also help with playback if the chunks can start playing before the entire video is downloaded - again this is typical of most streamed videos on the internet.
If we ignore the effects of the different protocols' retry and packet-loss strategies, and you have a single transmission 'pipe' of a set capacity, breaking the video into chunks does not make it faster to transfer.
If, however, you have multiple transmission 'pipes', then breaking the video up and sending different parts in parallel across the different pipes may indeed speed it up. It's important to remember that, even then, if the video is a live stream you are limited by the actual live video rate itself - i.e. you cannot transfer the video faster than the source is producing it.
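If you do want to create the chunks in real time as capture progresses, one way (sketched below, with the camera/preview wiring and the upload helper left out as placeholders) is to give MediaRecorder a maximum duration and start a new output file each time it is reached; on API 26+ setNextOutputFile() can switch files without the small gap this restart introduces.

    import android.media.MediaRecorder;
    import java.io.File;
    import java.io.IOException;

    // Record in ~5-second chunks by restarting the recorder when the max duration
    // is hit, uploading each finished chunk while the next one is being recorded.
    void startChunk(MediaRecorder recorder, File outputDir, int index) throws IOException {
        // Camera / preview-surface wiring is omitted here for brevity.
        recorder.setAudioSource(MediaRecorder.AudioSource.CAMCORDER);
        recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
        recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);
        recorder.setMaxDuration(5000); // 5-second chunks
        final File chunk = new File(outputDir, "chunk_" + index + ".mp4");
        recorder.setOutputFile(chunk.getAbsolutePath());
        recorder.setOnInfoListener((mr, what, extra) -> {
            if (what == MediaRecorder.MEDIA_RECORDER_INFO_MAX_DURATION_REACHED) {
                mr.stop();
                mr.reset();
                uploadInBackground(chunk); // hypothetical helper: queue the chunk for upload
                try {
                    startChunk(mr, outputDir, index + 1); // start recording the next chunk
                } catch (IOException ignored) {
                }
            }
        });
        recorder.prepare();
        recorder.start();
    }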

Using smartphones' cameras for video surveillance

I have two smartphones that I no longer use and I'd like to use their cameras for a basic video surveillance system, rather than buying expensive cameras.
My idea is to record overnight, and save the videos on my laptop.
It would be a good opportunity for me to learn a bit more about Android programming as well.
So I guess the approach is:
a TCP/IP server, gathering the information coming from the two (or N) phones;
a TCP/IP client, to run on each phone, recording and sending the information to the server.
I am not sure what that "information" should be, though. Should that be the single frames captured by the cameras, or is there a way to stream a video?
In case I wanted to implement a basic motion detection, would it be better to do it on the clients or on the server?
Is my approach above correct?
Have a look at the open-source libstreaming project, which will allow you to stream video from your phones. But if you want a time-lapse recording, e.g. 1 frame per second or less, then sending single frames may be preferable.
Note that your smartphones will need a constant power supply, because the camera and communications will drain any battery in a very short time. Also keep in mind that phone cameras do not perform well in low-light conditions, and this may justify the investment in expensive dedicated cameras.
You can use OpenCV for motion detection either on the device or on the server, or even both. The choice depends on your needs and resources; e.g. you may significantly reduce overall data volumes if the device only sends video when it detects motion.
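If you go the OpenCV route, the simplest form of motion detection is frame differencing. A minimal sketch (the threshold values are only illustrative) could look like this:

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;

    // Very basic motion detection: compare the current frame with the previous
    // grayscale frame and report motion when enough pixels have changed.
    boolean motionDetected(Mat previousGray, Mat currentFrame, double minChangedFraction) {
        Mat gray = new Mat();
        Imgproc.cvtColor(currentFrame, gray, Imgproc.COLOR_RGBA2GRAY);
        Imgproc.GaussianBlur(gray, gray, new Size(21, 21), 0); // suppress sensor noise

        Mat diff = new Mat();
        Core.absdiff(previousGray, gray, diff);
        Imgproc.threshold(diff, diff, 25, 255, Imgproc.THRESH_BINARY);

        double changedFraction = (double) Core.countNonZero(diff) / (diff.rows() * diff.cols());
        gray.copyTo(previousGray); // keep this frame for the next comparison
        return changedFraction > minChangedFraction; // e.g. 0.01 for 1% of pixels
    }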

Android OpenSL ES crystal frequency

We have an app with mobile audio clients written in low-level OpenSL ES to achieve low-latency input from the microphone. We then send 10 ms frames, encapsulated in UDP datagrams, to a server.
On the server we do some post-processing which crucially depends on the assumption that frames from the mobile clients arrive at fixed intervals (e.g. 10 ms per frame), so we can align them.
It seems that internal crystal frequencies on mobile phones can vary a lot, and because of this we get perfect alignment at the beginning but poor alignment after a few minutes.
I know that ALSA on Linux can tell you the exact frequency of the crystal, so you can correct your counts based on it. Unfortunately, I don't know how to get this information on Android.
Thanks for any help.
The essence of the problem you face is that you have an ADC and a DAC on separate systems with different local oscillators. You're presumably timing your packets against a 3rd (and possibly 4th) CPU clock.
The correct solution to this problem is some kind of clock-recovery algorithm. To do this properly you need some means of accurately timestamping transmitted packets (e.g. to bit accuracy), and then a PLL to drive the clock rate of the receiver's sample clock. This is precisely the approach that both IEEE 1394 audio and MPEG-2 transport streams use.
Since you probably can't do either of these things, your approach is most likely going to involve dropping or repeating samples (or even entire packets) periodically to keep your receive buffer from under- or over-flowing.
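A crude version of that drop-or-repeat approach, driven by how far the playout buffer has drifted from its target fill level, might look like the sketch below (the 480-sample thresholds are illustrative, roughly 10 ms at 48 kHz):

    // Nudge the playout buffer back towards its target fill level by occasionally
    // dropping or repeating a single sample of the incoming 10 ms frame.
    short[] adaptFrame(short[] frame, int bufferedSamples, int targetSamples) {
        if (bufferedSamples > targetSamples + 480) {
            // The sender's clock is running fast relative to ours: drop one sample.
            short[] shorter = new short[frame.length - 1];
            System.arraycopy(frame, 0, shorter, 0, shorter.length);
            return shorter;
        }
        if (bufferedSamples < targetSamples - 480) {
            // The sender's clock is running slow: repeat the last sample once.
            short[] longer = new short[frame.length + 1];
            System.arraycopy(frame, 0, longer, 0, frame.length);
            longer[frame.length] = frame[frame.length - 1];
            return longer;
        }
        return frame;
    }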
USB Audio has a similar lack of hardware support for clock recovery, and the approaches used there may be applicable to your situation.
Relying on the transmission and reception timing of network packets is a terrible idea. The jitter on delivery times is horrendous - particularly with Wi-Fi or cellular connections. You'd be well advised not to rely on it at all, and instead do as both IEEE 1394 audio and MPEG-2 TS do, which is to decouple audio data transport from consumption using a FIFO model in which data is consumed at a constant rate and delivered to it in packets of unreliable timing.
As for ALSA, all it can do (unless it has an accurate external timing reference) is measure the drift between the sample clock of the audio interface and the CPU's clock. This does not yield 'the exact frequency' of anything, as neither oscillator is likely to be accurate, and both may drift with temperature.

Android Getting distance using sound between two devices

The idea is that Phone A sends a sound signal and a Bluetooth signal at the same time, and Phone B calculates the delay between the two signals.
In practice I am getting inconsistent results, with delays of 90-160 ms.
I tried optimizing both ends as much as possible.
On the output end:
Tone is generated once
Bluetooth and audio output each have their own thread
Bluetooth only outputs after AudioTrack.write and AudioTrack is in streaming mode so it should start outputting before the write is even completed.
On the receiving end:
Again two separate threads
System time is recorded before each AudioRecord.read
Sampling specs:
44.1 kHz
Reading the entire buffer
Sampling 100 samples at a time using FFT
Taking into account how many samples transformed since initial read()
Your method relies on basically zero latency throughout the whole pipeline, which is realistically impossible. You just can't synchronize it with that degree of accuracy. If you could get the delays down to 5-6ms, it might be possible, but you'll beat your head into your keyboard before that happens. Even then, it could only possibly be accurate to 1.5 meters or so.
Consider the lower end of the delays you're receiving. In 90 ms, sound can travel slightly over 30 m. That's the very end of the marketed Bluetooth range, without even considering that you'll likely be in non-ideal transmitting conditions.
Here's a thread discussing low latency audio in Android. TL;DR is that it sucks, but is getting better. With the latest APIs and recent devices, you may be able to get it down to 30ms or so, assuming you run some hand-tuned audio functions. No simple AudioTrack here. Even then, that's still a good 10-meter circular error probability.
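To put numbers on that, the back-of-envelope arithmetic looks like this:

    // Speed of sound in air is roughly 343 m/s at room temperature.
    double speedOfSound = 343.0;                            // m/s
    double measuredDelay = 0.090;                           // 90 ms observed delay
    double impliedDistance = speedOfSound * measuredDelay;  // ~30.9 m
    double latencyError = speedOfSound * 0.030;             // 30 ms of audio latency -> ~10.3 m of error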
Edit:
A better approach, assuming you can synchronize the devices' clocks, would be to embed a timestamp into the audio signal, using simple AM/FM modulation or a pulse train. Then you could decode it at the other end and know when it was sent. You still have to deal with the latency problem, but it simplifies the whole thing nicely. There's no need for Bluetooth at all, since it isn't a reliable clock anyway and can be regarded as having latency problems of its own.
This gives you a pretty good approach
http://netscale.cse.nd.edu/twiki/pub/Main/Projects/Analyze_the_frequency_and_strength_of_sound_in_Android.pdf
You have to create a 1 kHz sound with some amplitude (measured in dB) and measure the amplitude of the sound that arrives at the other device. From the attenuation you might be able to estimate the distance.
As I remember: a0 = 20*log10(4*pi*distance/lambda), where a0 is the attenuation and lambda is the wavelength (you can compute it from the 1 kHz frequency).
But in such a sensitive environment, noise might spoil the whole thing. It's just an idea of how I would do it if I were you.
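Rearranged for distance, and assuming ideal free-space spreading (which indoor reflections and background noise will break in practice), the estimate would be computed roughly like this (the measured attenuation value is just an example):

    // a0 = 20 * log10(4 * pi * d / lambda)  =>  d = (lambda / (4 * pi)) * 10^(a0 / 20)
    double frequency = 1000.0;                              // 1 kHz tone
    double lambda = 343.0 / frequency;                      // wavelength ~0.343 m
    double attenuationDb = 30.0;                            // example measured level drop
    double distance = lambda / (4 * Math.PI) * Math.pow(10, attenuationDb / 20); // ~0.86 m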

Which is the best SIP compatible codec type for Android

I want to develop an Android app which will use my client's SIP server. My client is exposing a couple of REST APIs from the SIP server for communicating with the apps.
I want to know which would be the best codec type for this app?
Basically, I want to create a SIP stack and send the SIP packets to the server. So there should be a coding and decoding system for the packets. My client prefers 16 kb/s, but I am not sure which codec I should use.
As others have said, SIP does not transfer audio or video. Although in theory you can send the data over any transport, including ATM, analog lines, a DS0, etc., in the real world RTP is the most common. RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol), or SRTP (Secure RTP), usually carry the audio and video.
As far as codecs go, you will be limited by what your server supports. Here are a few common codecs and some pros and cons of each.
G.711 - Toll quality (i.e. as good as a good analog phone line, or even a bit better). "Universal", in that virtually every device supports G.711. Takes a lot of bandwidth; it doesn't really compress the data (G.711 is a "compander"). The baseline G.711 is pretty bare-bones (it's really a couple of look-up tables). Appendix I adds packet loss concealment (PLC) and Appendix II adds silence suppression and comfort noise generation.
GSM - used on cellphones, sounds ok, good PLC, good compression
G.729A - widely used, near toll quality, good compression (8 kbps)
G.723.1 - widely used, almost as good as G.729, better compression (5.3-6.3 kbps)
G.722 - sounds better than G.711, wideband (twice the audio bandwidth of G.711 or an analog call), same bandwidth used on the line as G.711
GIPS - various implementations exist; one is free. IIRC, it uses about 13.5 kbps on the line; the sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor is used. Also, all of these particular codecs are lossy codecs - they lose some of the data. This means that there is compression, not that portions of the audio are dropped due to poor routing or poor line quality - much as MP3 is considered a lossy codec while FLAC is considered lossless. If you're interested, the following Wikipedia article explains this in further detail: http://en.wikipedia.org/wiki/Lossy_compression
You need to know what codecs and protocols this SIP server will support. If you control both ends and want to stick to 16 kbps, you'll want iLBC (no royalties) or G.729 (royalties apply). G.711 and (now) G.722 have no royalties either, but both use ~64 kbps.
The list given before is good, with a few issues:
GIPS - various implementations exist; one is free. IIRC, it uses about 13.5 kbps on the line; the sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
GIPS is not a codec - iLBC and iSAC are codecs designed by GIPS. iLBC is free and standardized. iLBC is high quality, 13 or 15 kbps, and is very resilient to packet loss compared to G.729 or even G.711. You can have 30 or even 50% loss with iLBC and still be understood. I'm not sure I'd say it uses a lot of CPU compared to, say, G.729.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth), the more processor is used. Also, all codecs are lossy codecs - they lose some of the data.
Well, G.711 isn't really lossy per se (in theory yes, but it's almost more quantization-level loss). 64 kbps G.722 isn't very lossy either. G.723 sucks dead gerbils through garden hoses. :-)
It sounds like a bad idea to do it yourself. Developing a SIP client is not a trivial task, since there are several protocols you would have to implement. Choosing the codec is not a very important decision compared to the rest.
IMHO you should use one of the open-source SIP stacks available (like PJSIP) or build your application on top of an open-source SIP client (like Sipdroid).
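If a basic solution is acceptable, note that Android also ships a platform SIP stack (android.net.sip, API 9+, not present on every device) that handles media and codec negotiation for you. The sketch below uses placeholder account details and is only meant to show how little you have to implement yourself compared with writing a stack from scratch.

    import android.content.Context;
    import android.net.sip.SipAudioCall;
    import android.net.sip.SipException;
    import android.net.sip.SipManager;
    import android.net.sip.SipProfile;
    import java.text.ParseException;

    // Requires the USE_SIP and INTERNET permissions in the manifest.
    void callPeer(Context context) throws SipException, ParseException {
        SipManager manager = SipManager.newInstance(context);
        SipProfile me = new SipProfile.Builder("alice", "sip.example.com") // placeholder account
                .setPassword("secret")
                .build();
        manager.open(me);

        SipAudioCall.Listener listener = new SipAudioCall.Listener() {
            @Override
            public void onCallEstablished(SipAudioCall call) {
                call.startAudio(); // codec negotiation happened under the hood
            }
        };
        manager.makeAudioCall(me.getUriString(), "sip:bob@sip.example.com", listener, 30);
    }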
But since you asked about codecs: use the GSM codec. It saves bandwidth and sounds OK. G.711 is otherwise the standard codec that 99% of all SIP servers support.
Any?
SIP does not send or deal with ANY data packets. SIP is the Session Initiation Protocol, and it handles the NEGOTIATION OF SESSIONS.
The sessions themselves - in the case of audio and video - are based on RTP, with RTCP for control. So your question indicates a REAL lack of knowledge of what you need to do - the real question is: you need an RTP-compatible codec.
Which is similarly senseless. RTP is just a carrier protocol. This is like asking "what is an HTTP-compatible image format?" HTTP does not care; the browser does.
In the case of RTP it means: RTP does not care. It can transport ANY data. What is important is that BOTH SIDES know the same codec. So, in your case it means:
If you program both sides, then it is your choice.
If you program only one side (like a SIP phone system), then the question is what "normal" programs handle.
There are two things you need to take into consideration here:
What other devices/servers that handle media are being deployed or are planned to be deployed
Is your customer looking for a narrowband or a wideband solution - this will greatly affect the voice quality of the call
Once you have nailed down the answers to the two questions above you will be able to select.
For mobile devices, the voice codecs usually used are AMR-NB or AMR-WB. For SIP it is usually G.729 or G.722.x.
You also have Speex, ISAC and SILK to choose from.
You will probably need to do G.711 in any case just to interoperate with everything - bandwidth will be higher though.
There is no easy answer here. If your customer can select a codec, or can state what other devices are being used, it will be easier for you to choose.
