We have an app with mobile audio clients written in low-level OpenSL ES to achieve low-latency input from the microphone. We then send 10 ms frames encapsulated in UDP datagrams to a server.
On the server we do some post-processing which crucially depends on the assumption that frames from the mobile clients arrive at fixed intervals (e.g. 10 ms per frame), so we can align them.
It seems that the internal crystal frequencies on mobile phones can vary a lot, and because of this we get perfect alignment at the beginning but poor alignment after a few minutes.
I know that ALSA on Linux can tell you the exact frequency of the crystal, so you can correct your counts based on it. Unfortunately, I don't know how to get this information on Android.
Thanks for any help.
The essence of the problem you face is that you have an ADC and a DAC on separate systems with different local oscillators. You're presumably timing your packets against a 3rd (and possibly 4th) CPU clock.
The correct solution to this problem is some kind of clock recovery algorithm. To do this properly you need some means of accurately timestamping (e.g. to bit accuracy) transmitted packets, and then use a PLL to drive the clock rate of the receiver's sample clock. This is precisely the approach that both IEEE 1394 audio and MPEG-2 Transport Streams use.
Since you probably can't do either of these things, your approach is most likely going to involve dropping or repeating samples (or even entire packets) periodically to keep your receive buffer from under- or over-flowing.
USB Audio has a similar lack of hardware support for clock recovery, and the approaches used there may be applicable to your situation.
Relying on the transmission and reception timing of network packets is a terrible idea. The jitter on delivery times is horrendous, particularly with Wi-Fi or cellular connections. You'd be well advised not to rely on it at all, and instead do as both IEEE 1394 audio and MPEG-2 TS do, which is to decouple audio data transport from consumption using a FIFO model in which data is consumed at a constant rate and delivered to it in packets of unreliable timing.
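As a rough illustration of that decoupled-FIFO idea, here is a minimal sketch in Java (not a full clock-recovery scheme; the frame size, high-water mark and the simple drop/repeat policy are illustrative assumptions): consumption runs at a fixed rate and the buffer absorbs both network jitter and the slow drift between the sender's and receiver's sample clocks.

import java.util.ArrayDeque;

public class AudioFifo {
    private static final int FRAME_SAMPLES = 441;   // 10 ms at an assumed 44.1 kHz
    private static final int HIGH_WATER = 20;       // frames; drop above this

    private final ArrayDeque<short[]> fifo = new ArrayDeque<>();
    private short[] lastFrame = new short[FRAME_SAMPLES];   // starts as silence

    // Called from the network thread whenever a UDP frame arrives (jittery timing).
    public synchronized void push(short[] frame) {
        if (fifo.size() >= HIGH_WATER) {
            fifo.poll();            // sender's clock runs fast: drop the oldest frame
        }
        fifo.add(frame);
    }

    // Called by the consumer exactly once per 10 ms tick (constant rate).
    public synchronized short[] pull() {
        short[] frame = fifo.poll();
        if (frame == null) {
            frame = lastFrame;      // sender's clock runs slow or packet is late: repeat
        }
        lastFrame = frame;
        return frame;
    }
}

Drops and repeats should be rare (crystal drift is typically tens of parts per million), so with a sensible high-water mark this is usually far less audible than the misalignment you are seeing now.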
As for ALSA, all it can do (unless it has an accurate external timing reference) is measure the drift between the sample clock of the audio interface and the CPU's clock. This does not yield 'the exact frequency' of anything, as neither oscillator is likely to be accurate, and both may drift with temperature.
How much lag is expected when an android device's camera is streamed live?
I checked Miracast and it has a lag of around 150-300 ms (usually it's around 160-190 ms).
I have a Bluetooth app and a Wi-Fi Direct app, and both lag by around 400-550 ms. I was wondering if it would be possible to reproduce, or at least come closer to, Miracast's performance.
I am encoding the camera frames in H.264 and using my own custom protocol to transmit the encoded frames over a TCP connection (in the case of Wi-Fi Direct).
Video encoding is compute intensive, so any help you can get from the hardware usually speeds things up. It is worth making sure you are using codecs which leverage the hardware, i.e. avoid having your H.264 encoding done entirely in software.
There is a video here which discusses how to access hardware-accelerated video codecs; it is a little old, but it still applies to older devices:
https://www.youtube.com/watch?v=tLH-oGfiGaA
I believe that MediaCodec now provides a Java API to the hardware codecs (I have not tried it or seen speed-comparison tests myself), which makes things easier:
http://developer.android.com/about/versions/android-4.1.html#Multimedia
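As a loose sketch of what that looks like (resolution, bitrate and frame rate below are just illustrative values, and createInputSurface() requires API 18+), configuring MediaCodec this way hands the H.264 encoding to the platform codec, which is hardware-backed on most devices:

import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;
import java.io.IOException;

public final class HwEncoderSetup {
    public final MediaCodec encoder;
    public final Surface inputSurface;   // feed camera frames into this Surface

    public HwEncoderSetup() throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);

        encoder = MediaCodec.createEncoderByType("video/avc");
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        // Letting the camera render straight into the encoder's input Surface
        // avoids copying frames through Java byte arrays (see the Surface note below).
        inputSurface = encoder.createInputSurface();
        encoder.start();
    }
}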
See the note here also about using the camera's Surface preview as a speed aid:
https://stackoverflow.com/a/17243244/334402
As an aside, 500 ms does not seem that bad, and you are probably into diminishing returns, so the effort needed to reduce it may be high if you can live with your current lag.
Also, I assume you are measuring the lag (or latency) on the server side? If so, you need to look at how the server is decoding and presenting as well, especially if you are comparing your own player with a third-party one.
It is worth looking at your jitter buffer on the receiving side, irrespective of the stream latency: in simple terms, waiting for a larger number of packets before you start playback may cause more start-up delay, but it may also provide a better overall user experience, because the larger buffer will be more tolerant of delayed packets before going into a 'buffering' mode that users tend not to like.
It's a balance, and your needs may dictate a bias one way or the other. If you were planning a video-chat-like application, for example, then the delay is very important, as it starts to become annoying to users above 200-300 ms. If you are providing a feed from a sports event, then delay may be less important and avoiding buffering pauses may give better user-perceived quality.
I'm trying to see how GSM affects the data in a phone call. Here is what I'm trying to do: one person will be talking on a phone and I will record his voice from the phone's mic while he speaks, and on the other phone I will capture the data coming over GSM and compare the two. I want to write an Android application to get that data. Is that possible on Android, or can you suggest another way to achieve this?
Some background (you may know this already)...
When you make a GSM call, the analogue signal in the phone microphone corresponding to your speech is converted into a series of digital values and then encoded with a voice codec. This is basically a clever algorithm to capture as much of the speech as possible, in as little data as possible.
The idea is to maintain very good speech quality while saving on the amount of bandwidth needed for a call. Techniques used include not transmitting quiet periods (when you are not speaking) and various compression and predictive encoding algorithms. There have been, and still are, a number of codecs in use in GSM, but the latest and generally preferred codec is called AMR-Narrowband.
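If all you need on the handset side is a recording in the same codec family, the stock MediaRecorder can capture AMR-NB directly. A minimal sketch (the output path is just an example, and note that recording the microphone during an active call is restricted or silently muted on many devices):

import android.media.MediaRecorder;
import java.io.IOException;

public final class AmrCapture {
    // Records microphone audio encoded as AMR-NB, the same codec family
    // commonly used on the GSM side, so the two recordings are at least
    // codec-comparable.
    public static MediaRecorder start(String outputPath) throws IOException {
        MediaRecorder recorder = new MediaRecorder();
        recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.AMR_NB);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
        recorder.setOutputFile(outputPath);   // e.g. getExternalFilesDir(null) + "/call.amr"
        recorder.prepare();
        recorder.start();
        return recorder;                      // call stop() and release() when done
    }
}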
Nearly all GSM deployments encrypt speech between the phone and the base station - while there are publicised weaknesses in the various encryption algorithms, I am assuming (hoping...) that decrypting is not what you are looking for.
Your question: 'I want to see if there will be data loss or corruption when the voice travels over GSM.'
Firstly, it is worth noting that speech is 'relatively' tolerant of small amounts of data loss and corruption, at least compared to data traffic. It is quite common to have bursts of packet loss in VoIP networks, and it may cause a temporary degradation in voice quality. Secondly, packet loss in a VoIP network will include delayed packets, which can be confusing: if a packet arrives too late to be included in the sound being played from the receiver's speaker, then it is effectively lost from the VoIP point of view, even though other measures may show that it simply arrived late.
To actually measure the loss between the GSM phone and the base station you would need access to the data received at the base station, which you will not usually have unless you are the operator.
If you do an end-to-end test, from one GSM phone to another, your speech path will traverse other network nodes as well, so you will not know whether any loss or corruption is happening over the GSM air interface or in one or more of the other nodes.
You would also need to be aware of handover from one cell to another and from 2G to 3G (GSM to UMTS) which may affect your tests (even stationary phones can handover in certain circumstances).
If your interest is purely academic, then the easiest thing might be to create your own GSM base station and test against that; there exist several open-source GSM 'network in a box' projects which should allow you to do this. I have not used it myself, but this one looks the most actively supported at this time; check out the mailing list under the community tab as a good place to follow up your investigations:
http://openbts.org
The idea is that Phone A sends a sound signal and a Bluetooth signal at the same time, and Phone B calculates the delay between the two signals.
In practice I am getting inconsistent results, with delays from 90 ms to 160 ms.
I tried optimizing both ends as much as possible.
On the output end:
Tone is generated once
Bluetooth and audio output each have their own thread
Bluetooth only outputs after AudioTrack.write, and AudioTrack is in streaming mode, so it should start outputting before the write is even completed.
On the receiving end:
Again two separate threads
System time is recorded before each AudioRecord.read
Sampling specs:
44.1 kHz
Reading the entire buffer
Processing 100 samples at a time using an FFT
Taking into account how many samples have been transformed since the initial read()
Your method relies on basically zero latency throughout the whole pipeline, which is realistically impossible. You just can't synchronize it with that degree of accuracy. If you could get the delays down to 5-6ms, it might be possible, but you'll beat your head into your keyboard before that happens. Even then, it could only possibly be accurate to 1.5 meters or so.
Consider the lower end of the delays you're receiving. In 90ms, sound can travel slightly over 30m. That's the very end of the marketed bluetooth range, without even considering that you'll likely be in non-ideal transmitting conditions.
Here's a thread discussing low latency audio in Android. TL;DR is that it sucks, but is getting better. With the latest APIs and recent devices, you may be able to get it down to 30ms or so, assuming you run some hand-tuned audio functions. No simple AudioTrack here. Even then, that's still a good 10-meter circular error probability.
Edit:
A better approach, assuming you can synchronize the devices' clocks, would be to embed a timestamp into the audio signal, using simple AM/FM modulation or a pulse train. Then you could decode it at the other end and know when it was sent. You still have to deal with the latency problem, but it simplifies the whole thing nicely. There's no need for Bluetooth at all; it isn't a reliable clock anyway, since it has latency problems of its own.
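Here is a very rough sketch of the pulse-train variant, just to make the idea concrete: a small timestamp value is sent as on-off-keyed bursts of a 1 kHz tone (one bit per slot), and the receiver measures the 1 kHz energy per slot with a Goertzel filter to recover the bits. The sample rate, slot length, tone frequency and bit count are all illustrative choices, and real-world detection would need thresholding, synchronization and noise handling on top of this.

public final class ToneTimestamp {
    static final int SAMPLE_RATE = 44100;
    static final int SLOT_SAMPLES = 2205;   // 50 ms per bit slot
    static final double TONE_HZ = 1000.0;
    static final int BITS = 16;             // timestamp truncated to 16 bits

    // Encode the low 16 bits of a timestamp as a PCM pulse train (feed to AudioTrack.write()).
    public static short[] encode(int timestamp) {
        short[] pcm = new short[BITS * SLOT_SAMPLES];
        for (int bit = 0; bit < BITS; bit++) {
            if (((timestamp >> bit) & 1) == 0) continue;        // silent slot encodes a 0
            for (int i = 0; i < SLOT_SAMPLES; i++) {
                double t = (double) i / SAMPLE_RATE;
                pcm[bit * SLOT_SAMPLES + i] =
                        (short) (Math.sin(2 * Math.PI * TONE_HZ * t) * 0.5 * Short.MAX_VALUE);
            }
        }
        return pcm;
    }

    // Goertzel energy of the 1 kHz component in one slot of recorded samples;
    // the receiver slices the AudioRecord stream into slots and thresholds this.
    public static double toneEnergy(short[] pcm, int offset) {
        double k = Math.round(SLOT_SAMPLES * TONE_HZ / SAMPLE_RATE);
        double w = 2 * Math.PI * k / SLOT_SAMPLES;
        double coeff = 2 * Math.cos(w);
        double s0, s1 = 0, s2 = 0;
        for (int i = 0; i < SLOT_SAMPLES; i++) {
            s0 = pcm[offset + i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }
}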
This gives you a pretty good approach
http://netscale.cse.nd.edu/twiki/pub/Main/Projects/Analyze_the_frequency_and_strength_of_sound_in_Android.pdf
You have to create a 1 kHz sound with some amplitude (measured in dB) and try to measure the amplitude of the sound that arrives at the other device. From the attenuation you might be able to estimate the distance.
As I remember: a0 = 20*log10(4*pi*distance/lambda), where a0 is the attenuation and lambda is the wavelength (you can compute it from the 1 kHz frequency).
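Solved for distance, that relation looks something like the sketch below (purely illustrative, assuming the formula above, a 1 kHz tone and roughly 343 m/s for the speed of sound in air; real rooms will deviate badly from free-space behaviour):

public final class PathLoss {
    // distance = lambda / (4*pi) * 10^(a0 / 20), from a0 = 20*log10(4*pi*d/lambda)
    public static double distanceMeters(double attenuationDb) {
        double lambda = 343.0 / 1000.0;   // wavelength of a 1 kHz tone in air, ~0.34 m
        return lambda / (4 * Math.PI) * Math.pow(10, attenuationDb / 20.0);
    }
}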
But in such a sensitive environment, noise might spoil the whole thing. It's just an idea of how I would do it if I were you.
I'm reviewing all kinds of Android sound APIs and I'd like to know which one I should use.
My goal is to get low latency audio or, at least, deterministic behavior regarding delay of playback.
We've had a lot of problems and it seems that Android sound API is crap, so I'm exploring possibilities.
The problem we have is that there is significant delay between sound_out.write(sound_samples); and actual sound played from the speakers. Usually it is around 300 ms. The problem is that on all devices it's different; some don't have that problem, but most are crippled (however, CS call has zero latency). The biggest issue with this ridiculous delay is that on some devices this delay appears to be some random value (i.e. it's not always 300ms).
I'm reading about OpenSL ES and I'd like to know if anybody has experience with it, or is it the same shit wrapped in a different package?
I prefer to have native access, but I don't mind a Java-layer indirection as long as I can get deterministic behavior: either the delay has to be constant (for a given device), or I'd like to get access to the current playback position instead of guessing it with an error range of ±300 ms...
EDIT: 1.5 years later, I have tried multiple Android phones to see how to get the best possible latency for real-time voice communication. Using specialized tools I measured the delay of the waveout path. The best results were over 100 ms; most phones were in the 180 ms range. Anybody have ideas?
SoundPool is the lowest-latency interface on most devices, because the pool is stored in the audio process. All of the other audio paths require inter-process communication. OpenSL is the best choice if SoundPool doesn't meet your needs.
Why OpenSL? AudioTrack and OpenSL have similar latencies, with one important difference: AudioTrack buffer callbacks are serviced in Dalvik, while OpenSL callbacks are serviced in native threads. The current implementation of Dalvik isn't capable of servicing callbacks at extremely low latencies, because there is no way to suspend garbage collection during audio callbacks. This means that the minimum size for AudioTrack buffers has to be larger than the minimum size for OpenSL buffers to sustain glitch-free playback.
On most Android releases this difference between AudioTrack and OpenSL did not matter at all. But with Jellybean, Android now has a low-latency audio path. The actual latency is still device dependent, but it can be considerably lower than before. For instance, http://code.google.com/p/music-synthesizer-for-android/ uses 384-frame buffers on the Galaxy Nexus for a total output latency of under 30 ms. This requires the audio thread to service buffers approximately once every 8 ms, which was not feasible on previous Android releases. It is still not feasible in a Dalvik thread.
This explanation assumes two things: first, that you are requesting the smallest possible buffers from OpenSL and doing your processing in the buffer callback rather than with a buffer queue. Second, that your device supports the low-latency path. On most current devices you will not see much difference between AudioTrack and OpenSL ES. But on devices that support Jellybean+ and low-latency audio, OpenSL ES will give you the lowest-latency path.
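On the AudioTrack side, the main Java-level knob you have is the buffer size. A minimal sketch of requesting the smallest streaming buffer the device will grant (mono 16-bit PCM here is just an illustrative configuration, and whether this actually lands on the low-latency path is still device and release dependent):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

public final class LowLatencyTrack {
    public static AudioTrack create(int sampleRate) {
        // Ask the platform for the minimum buffer it will accept for streaming...
        int minBytes = AudioTrack.getMinBufferSize(
                sampleRate,
                AudioFormat.CHANNEL_OUT_MONO,
                AudioFormat.ENCODING_PCM_16BIT);
        // ...and create the track with exactly that size in streaming mode.
        return new AudioTrack(
                AudioManager.STREAM_MUSIC,
                sampleRate,
                AudioFormat.CHANNEL_OUT_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                minBytes,
                AudioTrack.MODE_STREAM);
    }
}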
IIRC, OpenSL is passed through the same interface as AudioTrack, so at best it will match AudioTrack. (FWIW, I'm currently using OpenSL for "low latency" output)
The sad truth is there is no such thing as low latency audio on Android. There isn't even a proper way to flag and/or filter devices based on audio latency.
What interface you'll want to use to minimize latency is going to depend on what you are trying to do.
If you want to have an audio stream you'll be looking at either OpenSL or AudioTrack.
If you want to trigger some static oneshots you may want to use SoundPool. For static oneshots SoundPool will have low latency as the samples are preloaded to the hardware. I think it's possible to preload oneshots using OpenSL as well, but I haven't tried.
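A minimal sketch of that SoundPool one-shot pattern (the resource name and stream count are illustrative; the point is that the sample is loaded once up front and play() only triggers it):

import android.content.Context;
import android.media.AudioManager;
import android.media.SoundPool;

public final class OneShotPlayer {
    private final SoundPool pool = new SoundPool(4, AudioManager.STREAM_MUSIC, 0);
    private int soundId;
    private boolean loaded;

    // Preload the sample once, e.g. from res/raw/click.ogg (hypothetical resource).
    public void load(Context context, int resId) {
        pool.setOnLoadCompleteListener((sp, sampleId, status) -> loaded = (status == 0));
        soundId = pool.load(context, resId, 1);
    }

    // Trigger the preloaded sample; this is the low-latency one-shot path.
    public void play() {
        if (loaded) pool.play(soundId, 1f, 1f, 1, 0, 1f);
    }
}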
The lowest latency you can get is from SoundPool. There's a limit on how big of a sound you can play that way, but if you're under the limit (1Mb, IIRC) it's pretty low latency. Even that's probably not 40ms, though.
But it is faster than what you can get by streaming, at least in my experience.
Caveat: You may see occasional crashes in SoundPool on Samsung devices. I have a theory that it only happens when you access the SoundPool from multiple threads, but I haven't verified this.
EDIT: OpenSL ES apparently has extremely HIGH latency on Kindle Fire, while SoundPool is much better, but the reverse may be true on other platforms.
About the problem of a deterministic/constant latency, here you can find an interesting article:
APPROACHES FOR CONSTANT AUDIO LATENCY ON ANDROID
The core of their investigation is this: because the audio HAL, which is one of the deeper levels of the audio path and is responsible for the timing of the audio callback events, is vendor-implemented, the relative latencies can vary, especially on cheap hardware.
So they suggest two approaches to reduce the variance of the latency. One is to take control of the callback timing by inserting audio at fixed intervals; the other is to filter the observed callback times with a smoothing filter, to estimate the time at which a constant-latency callback should have occurred.
With these two approaches they could significantly reduce the variance of the latency.
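A loose sketch of the second idea, in the spirit of that article rather than its actual code: smooth the observed callback timestamps so downstream code sees an almost constant-period clock instead of the jittery times the vendor HAL delivers. The filter constant and the nanosecond time base are illustrative choices.

public final class CallbackClockFilter {
    private final double periodNs;   // nominal callback period, e.g. bufferFrames / sampleRate
    private final double alpha = 0.05;
    private double smoothedNs = -1;

    public CallbackClockFilter(double periodNs) {
        this.periodNs = periodNs;
    }

    // Call once per audio callback with System.nanoTime(); returns an estimate of
    // when a constant-latency callback "should" have happened.
    public long onCallback(long nowNs) {
        if (smoothedNs < 0) {
            smoothedNs = nowNs;                                    // first callback: seed
        } else {
            double predicted = smoothedNs + periodNs;              // ideal next callback time
            smoothedNs = predicted + alpha * (nowNs - predicted);  // nudge slowly toward reality
        }
        return (long) smoothedNs;
    }
}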
It should also be mentioned that there is a new native Android audio API, AAudio.
AAudio API
It's available/stable from Android Oreo 8.1 (API Level 27).
There is also a wrapper which dynamically chooses between OpenSL ES and AAudio and is much easier to code against than OpenSL ES. It's still in developer preview.
Oboe Audio Library
The best way to get low latency for native code on Android is to use Oboe.
https://github.com/google/oboe
Oboe wraps AAudio on newer devices. AAudio offers the lowest possible latency paths. If AAudio is not available then Oboe calls OpenSL ES. Oboe is much easier to use than OpenSL ES.
AAudio either calls through AudioTrack or through a new MMAP data path. AAudio makes it easier to get a FAST track because you can leave some parameters unspecified. AAudio will then choose the right parameters needed for a FAST track.
I want to develop an Android app which will use my client's SIP server. My client is exposing a couple of REST APIs from the SIP server for communicating with the apps.
I want to know which would be the best codec for this app.
Basically, I want to create a SIP stack and send the SIP packets to the server, so there needs to be an encoding and decoding system for the packets. My client prefers 16 kb/s, but I am not sure which codec I should use.
As others have said, SIP does not transfer audio or video. Although in theory, you can send data over any transport, including ATM, analog lines, a DS0, etc, in the real world, RTP is the most common. RTP (Real Time Protocol) and RTCP (Real Time Control Protocol) or SRTP (Secure RTP) usually carry the audio and video.
As far as codecs go, you will be limited by what your server supports. Here are a few common codecs and some pros and cons of each.
G.711 - Toll quality (i.e. as good as a good analog phone line, or even a bit better). "Universal" in that virtually every device supports G.711. Takes a lot of bandwidth; it doesn't really compress data (G.711 is a "compander"). The baseline G.711 is pretty bare-bones (it's really a couple of look-up tables). Appendix I adds packet loss concealment (PLC) and Appendix II adds silence suppression and comfort noise generation.
GSM - used on cellphones, sounds ok, good PLC, good compression
G.729A - widely used, near toll quality, good compression (8Kbps)
G.723.1 - widely used, almost as good as G.729, better compression (4-5Kbps)
G.722 - sounds better than G.711, wideband (twice the audio bandwidth of G.711 or an analog call), same bandwidth used on the line as G.711
GIPS - various implementations exist; one is free. IIRC, it uses about 13.5 Kbps on the line, and the sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
All the codecs use some processor and other system resources; as a rule of thumb, the more aggressive the codec (the smaller the bandwidth) the more processor is used. Also, all of these particular codecs are lossy codecs - they lose some of the data. This means that there is compression, not that portions of the audio are dropped due to poor routing and poor line quality. Much like an MP3 is considered a lossy codec while FLAC is considered lossless. If you're interested, the following Wikipedia article explains this in further detail: http://en.wikipedia.org/wiki/Lossy_compression
You need to know what codecs and protocols this SIP server will support. If you control both ends and want to stick to 16Kbps, you'll want iLBC (no royalties) or G.729 (royalties apply). G.711 and (now) G.722 have no royalties either, but both use ~64Kbps.
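Note that the per-codec rates above are payload only; on the wire, small voice packets pick up a significant amount of RTP/UDP/IP header overhead. A back-of-envelope sketch, assuming 20 ms packets and RTP (12) + UDP (8) + IPv4 (20) byte headers (real stacks add layer-2 framing on top):

public final class VoipBandwidth {
    // On-the-wire bit rate in kbps for a given payload size and packet rate.
    public static double kbpsOnWire(int payloadBytesPerPacket, double packetsPerSecond) {
        int headerBytes = 12 + 8 + 20;   // RTP + UDP + IPv4
        return (payloadBytesPerPacket + headerBytes) * 8 * packetsPerSecond / 1000.0;
    }

    public static void main(String[] args) {
        double pps = 50;                 // 20 ms packets
        System.out.printf("G.711: %.0f kbps%n", kbpsOnWire(160, pps)); // 64 kbps payload -> ~80 kbps
        System.out.printf("G.729: %.0f kbps%n", kbpsOnWire(20,  pps)); // 8 kbps payload  -> ~24 kbps
    }
}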
The list given before is good, with a few issues:
GIPS - various implementations exist, one is free. IIRC, uses about 13.5 Kbps on the line, sound is not as good as G.723.1 (but this is a perceptual metric, YMMV). Takes a lot of processor.
GIPS is not a codec - iLBC and iSAC are codecs designed by GIPS. iLBC is free and standardized. iLBC is high quality, 13 or 15 Kbps, and is very resilient to packet loss compared to G.729 or even G.711. You can have 30 or even 50% loss with iLBC and still be understood. I'm not sure I'd say it uses a lot of CPU compared to, say, G.729.
All the codecs use some processor and other system resources, as a rule of thumb the more aggressive the codec (the smaller the bandwidth) the more processor used. Also, all codecs are lossy codecs--they lose some of the data.
Well, G.711 isn't really lossy per-se (in theory yes, but it's almost more quantization-level loss). 64K G.722 isn't very lossy either. G.723 sucks dead gerbils through garden hoses. :-)
It sounds like a bad idea to do it yourself. Developing a SIP client is not a trivial task, since there are several protocols you would have to implement. Choosing the codec is not a very important decision compared to the rest.
IMHO you should use one of the open-source SIP stacks available (like pjsip) or build your application on top of an open-source SIP client (like sipdroid).
But since you asked about codecs: use the GSM codec. It saves bandwidth and sounds OK. G.711 is otherwise the standard codec that 99% of all SIP servers support.
Any?
SIP does not send or deal with ANY data packets. SIP is the Session Initiation Protocol and it handles the NEGOTIATION OF SESSIONS.
The sessions themselves - in the case of audio and video - are based on RTP, with RTCP alongside it for control. So your question indicates a REAL lack of knowledge of what you need to do - the real question is: you need an RTP-compatible codec.
Which is similarly senseless. RTP is just a carrier protocol. This is like asking "what is an HTTP-compatible image format?" HTTP does not care; the browser does.
In the case of RTP it means: RTP does not care. It can transport ANY data. What is important is that BOTH SIDES know the same codec. So, in your case it means:
If you program both sides then it is your choice.
If you program only one side (like a SIP phone system), then the question is what "normal" programs handle.
There are two things you need to take into consideration here:
What other devices/servers that handle media are deployed or planned to be deployed
Whether your customer is looking for a narrowband or wideband solution - this will greatly affect the voice quality of the call
Once you have nailed down the answers to the two questions above you will be able to select.
For mobile devices, the voice codecs usually used are AMR-NB or AMR-WB. For SIP it is usually G.729 or G.722.x.
You also have Speex, ISAC and SILK to choose from.
You will probably need to do G.711 in any case just to interoperate with everything - bandwidth will be higher though.
No easy answer here. If your customer can select a codec, or state what other devices are being used, it will be easier for you to choose.