I would like to design a small Android app which triggers an event when the microphone's threshold is above a specific raw value.
Something along the lines of this: http://code.google.com/p/android-labs/wiki/NoiseAlert
As it is an always-on service, I'd like to keep battery consumption to a minimum.
I understand that sound amplitude is not something that is handled by the system (am I right?), so I must calculate it from the raw values. In addition, I cannot measure amplitude without recording...
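(For reference, by "calculate it from the raw values" I mean something like this minimal sketch; the buffer would come from AudioRecord.read() and the names are just placeholders:)

    // Minimal sketch: peak amplitude over a buffer of raw 16-bit PCM samples.
    // 'buffer' and 'read' would come from an AudioRecord.read(short[], int, int)
    // call; the RECORD_AUDIO permission is needed for that.
    static int peakAmplitude(short[] buffer, int read) {
        int peak = 0;
        for (int i = 0; i < read; i++) {
            peak = Math.max(peak, Math.abs(buffer[i]));
        }
        return peak; // 0..32767, compared against my raw threshold
    }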
So that leaves me with keeping the CPU effort to a minimum. I can use the lowest sample rate of 8 kHz, but even that is overkill just for calculating amplitude.
Is there a way to use an ultra low sampling rate, like 50Hz or even 20Hz?
And in general, what advice is there to help me keep battery consumption to a minimum with this application...?
Thanks a lot in advance!
This is going to suck the battery dry really quickly. And a phone with a flat battery is a terrible user experience.
I doubt the sample rate will have a huge effect on power consumption. However, the buffer period will.
Whilst the phone will need to keep the audio codec, DMA engines and memory controller active pretty much the whole time, you can at least limit the number of times the CPU is woken up (a power-hungry operation). In addition, processing a large number of samples at once is considerably cheaper per sample than processing a small number frequently.
Whether AudioFlinger modifies its own buffer period in response to active applications, I don't know. You might have to go direct with ALSA (on phones that have it). This has the added benefit of reducing the amount of code executed for each buffer period.
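To make the buffer-period point concrete, here is a rough sketch that asks AudioRecord for a buffer several times the minimum and processes large chunks per wake-up. The x8 multiplier and the 8 kHz rate are arbitrary illustrative values, not a recommendation:

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    // Rough sketch: trade latency for fewer CPU wake-ups by allocating a record
    // buffer several times the minimum size and reading it back in large chunks.
    public class BatchedRecorder {
        private volatile boolean running = true;

        public void record() {
            int rate = 8000;
            int minBuf = AudioRecord.getMinBufferSize(rate,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            int bigBuf = minBuf * 8;                       // fewer, larger transfers
            AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC, rate,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bigBuf);
            short[] chunk = new short[bigBuf / 2];         // bytes -> shorts
            rec.startRecording();
            while (running) {
                int n = rec.read(chunk, 0, chunk.length);  // one big read per wake-up
                // ... compute amplitude (or whatever) over all n samples in one pass ...
            }
            rec.stop();
            rec.release();
        }

        public void stop() { running = false; }
    }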
Related
I have an application that records all sound, analyzes it and notifies me when it detects a specific tone.
So this app consumes battery power because it runs all the time, listening for the wanted tone.
I need an idea to avoid this problem, please.
Thanks in advance.
It appears that you are not allowing the processor to drop into a quiescent low power state. To allow the processor to conserve power, you need to have the processor idle as much as possible. If you are continuously sampling, this isn't going to happen. My answer here can give you some background.
I suggest you do the following:
Find out the minimum fidelity you can use and still identify the tones you want. To put it differently, determine the maximum sampling interval. For example, you may find that you can get by with sampling every quarter second and still identify the tone you want. This will allow the processor to drop into an energy-conservation state.
Make sure you are using interrupts and not polling, i.e. use something like usleep(). So to check every 0.25 sec, you'd use something like while (running) { sampleTone(); usleep(250000); } (a Java sketch of this loop follows after these suggestions).
Check your sound sampling device's capabilities. It may have the ability to do something more sophisticated that will further minimize the number of samples/sec you need. For example, it may allow you to send the samples directly to disk or memory without requiring the CPU to wake up.
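Here is a minimal Java flavour of the duty-cycled check described above; sampleTone() and running are placeholders for your own capture/analysis code and stop flag:

    // Duty-cycled detector sketch: do a short burst of capture + analysis, then
    // sleep so the processor can drop into a low-power state between checks.
    // sampleTone() and running are placeholders.
    Thread detector = new Thread(new Runnable() {
        @Override
        public void run() {
            while (running) {
                sampleTone();              // short capture + tone check
                try {
                    Thread.sleep(250);     // idle for ~0.25 s between checks
                } catch (InterruptedException e) {
                    return;                // stop cleanly if interrupted
                }
            }
        }
    });
    detector.start();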
We have an app with mobile audio clients written in low-level OpenSL ES to achieve low-latency input from the microphone. Then we are sending 10 ms frames encapsulated in UDP datagrams to a server.
On the server we are doing some post-processing which is crucially dependent on the assumption that frames from mobile clients arrive at fixed intervals (e.g. 10 ms per frame), so we can align them.
It seems that the internal crystal frequencies on mobile phones can vary a lot, and because of this we are getting perfect alignment at the beginning but poor alignment after a few minutes.
I know that ALSA on Linux can tell you the exact frequency of the crystal, so you can correct your counts based on it. Unfortunately I don't know how to get this information on Android.
Thanks for the help.
The essence of the problem you face is that you have an ADC and a DAC on separate systems with different local oscillators. You're presumably timing your packets against a 3rd (and possibly 4th) CPU clock.
The correct solution to this problem is some kind of clock recovery algorithm. To do this properly you need some means of accurately timestamping (e.g. to bit accuracy) transmitted packets, and then use a PLL to drive the clock rate of the receiver's sample clock. This is precisely the approach that both IEEE 1394 audio and MPEG-2 Transport Streams use.
Since you probably can't do either of these things, your approach is most likely going to involve periodically dropping or repeating samples (or even entire packets) to keep your receive buffer from under- or over-flowing.
USB Audio has a similar lack of hardware support for clock recovery, and the approaches used there may be applicable to your situation.
Relying on the transmission and reception timing of network packets is a terrible idea. The jitter on delivery times is horrendous - particularly with WiFi or cellular connections. You'd be well advised not to rely on it at all, and instead do as both IEEE 1394 audio and MPEG-2 TS do, which is to decouple audio data transport from consumption using a model FIFO in which data is consumed at a constant rate and delivered in packets of unreliable timing.
As for ALSA, all it can do (unless it has an accurate external timing reference) is measure the drift between the sample clock of the audio interface and the CPU's clock. This does not yield 'the exact frequency' of anything, as neither oscillator is likely to be accurate, and both may drift depending on temperature.
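For illustration only, here is a crude sketch of that drop/repeat idea: a model FIFO that the network thread pushes frames into and the consumer pulls from at a constant rate, with arbitrary target/slack levels. A real implementation would use a primitive ring buffer rather than boxed Shorts:

    import java.util.ArrayDeque;

    // Crude rate-matching sketch: the fill level is nudged back toward a target
    // by occasionally dropping or repeating a single sample. TARGET and SLACK
    // are arbitrary examples (~100 ms and ~10 ms at 44.1 kHz).
    class ModelFifo {
        private final ArrayDeque<Short> fifo = new ArrayDeque<Short>();
        private static final int TARGET = 4410;
        private static final int SLACK = 441;

        synchronized void push(short[] frame, int n) {
            for (int i = 0; i < n; i++) {
                fifo.addLast(frame[i]);
            }
        }

        synchronized void pull(short[] out) {
            // Once per block: correct slow drift by dropping or repeating one sample.
            if (fifo.size() > TARGET + SLACK) {
                fifo.pollFirst();                    // running ahead: drop a sample
            } else if (fifo.size() < TARGET - SLACK && !fifo.isEmpty()) {
                fifo.addLast(fifo.peekLast());       // running behind: repeat a sample
            }
            for (int i = 0; i < out.length; i++) {
                Short s = fifo.pollFirst();          // null if the FIFO under-ran completely
                out[i] = (s == null) ? 0 : s.shortValue();
            }
        }
    }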
I want an app that drains the battery by using CPU resources in a controlled fashion. Here, by controlled fashion, what I mean is: let's say 'X units/ms' is the maximum battery drain rate and 'Y units/ms' is the minimum battery drain rate.
Now, I want to give an integer from 1 to 100 as an input to the program and have my app generate a battery drain corresponding to that value. Assume only this app is running on the system.
So, is there any way to do this?
Due to the differences in hardware and configuration, such an app would likely need to calibrate itself. That is, it should run power-consuming tasks while monitoring the battery, to estimate how much power those power-consuming tasks take.
So, there are two things needed:
Battery Monitoring
Android provides an Intent for getting battery information; there's an SDK tutorial on it. Unfortunately, the granularity of the results will be limited, likely to each percentage point. This means you need longer calibration tests, and your results (and thus the drain control) will be of limited accuracy.
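A minimal sketch of that, assuming context is a valid Context (ACTION_BATTERY_CHANGED is sticky, so passing a null receiver simply returns the last broadcast):

    import android.content.Intent;
    import android.content.IntentFilter;
    import android.os.BatteryManager;

    // Sketch: read the current battery level from the sticky battery broadcast.
    // 'context' is assumed to be an Activity/Service Context.
    Intent battery = context.registerReceiver(null,
            new IntentFilter(Intent.ACTION_BATTERY_CHANGED));
    int level = battery.getIntExtra(BatteryManager.EXTRA_LEVEL, -1);
    int scale = battery.getIntExtra(BatteryManager.EXTRA_SCALE, -1);
    float percent = 100f * level / scale;   // typically whole-percent granularity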
Power Consuming Tasks
The CPU is generally not the biggest power consumer in a mobile device. Whether the LCD is on might affect the drain more than tying up all of your CPU cores. Radio hardware (3G/GPS/WiFi) can also produce a higher drain than the CPU. An LCD at maximum brightness will drain more power than an LCD at minimum brightness, and an AMOLED display would drain less power than an LCD.
The performance of different tasks will vary greatly depending on the hardware being used. This is what necessitates calibration.
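For the load-generation side, one simple knob (CPU only; screen and radios are separate problems) is a duty-cycled busy loop, where the 1-100 input sets the fraction of each period spent spinning. A sketch under those assumptions:

    // Sketch: duty-cycled CPU burner. 'load' is the 1..100 input and sets the
    // fraction of each 100 ms period spent busy-spinning; the mapping from load
    // to actual battery drain is what the calibration step has to measure.
    static void burnCpu(int load, long durationMs) throws InterruptedException {
        final long periodMs = 100;
        final long busyMs = periodMs * load / 100;
        long endAt = System.currentTimeMillis() + durationMs;
        while (System.currentTimeMillis() < endAt) {
            long spinUntil = System.currentTimeMillis() + busyMs;
            while (System.currentTimeMillis() < spinUntil) {
                // busy-wait: burn cycles for the "on" part of the duty cycle
            }
            Thread.sleep(periodMs - busyMs);   // idle for the "off" part
        }
    }
    // Run one of these per core (e.g. Runtime.getRuntime().availableProcessors()
    // threads) if you want to load the whole chip.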
So I'm trying to build an Android app which acts as a real-time audio analyzer as a precursor to a project that will involve detecting and filtering out certain sounds.
I think I've got the basics of discrete Fourier transforms down; however, I'm not sure what the best parameters should be for doing real-time frequency analysis.
I get the impression that under ideal conditions (unlimited computing power), I would take all the samples from the 44100 samples/sec PCM stream I'm getting from the AudioRecord class and put them through a 44100-element FIFO "window" (zero-padded to 2^16 and maybe with a tapering function?), running an FFT on the window every time a new sample came in. This would (I think) give me the spectrum for 0 to ~22 kHz, updated 44100 times per second.
It seems like this is not going to happen on a smartphone. The thing is, I'm not sure which parameters of the computation I should reduce in order to make it tractable on my Galaxy Nexus while still holding on to as much quality as possible. Eventually I would like to use an external microphone with better sensitivity.
I figure it will involve moving the window by more than one sample between FFTs, but I have no idea at what point this becomes more detrimental to accuracy/aliasing/whatever than just doing the FFT on a smaller window, or if there is a third option I'm overlooking.
With the natively implemented KissFFT I'm using from libgdx, I seem to be able to do roughly 30 to 42 FFTs of 44100 elements per 44100 samples and still have it be responsive (meaning that the buffer being filled by the thread doing AudioRecord.read() isn't filling up faster than the thread doing the FFTs can drain it).
So my questions are:
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be doing something stupid, because much faster speeds are possible?
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
I'd be happy to show any of my code if that would help answer my questions, but there's a lot of it so I figured I would do so selectively instead of posting it all.
if there is a third option I'm overlooking
Yes: doing both at the same time, i.e. reducing the FFT size as well as using a larger step size. In a comment you pointed out that you want to detect "sniffling/chewing with mouth". So what you want to do is similar to the typical task of speech recognition. There, you typically extract a feature vector in steps of 10 ms (meaning with Fs = 44.1 kHz, every 441 samples), and the signal window to transform is roughly double the step size, so about 20 ms, which (rounded up to a power of two) gives an FFT size of 1024 samples (make sure you choose an FFT size that is a power of 2, because it is faster).
Any increase in window size or reduction in step size increases the data but mainly adds redundancy.
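A sketch of that framing with the numbers above (hop = 441 samples ~ 10 ms, frame = 882 samples ~ 20 ms, zero-padded to a 1024-point FFT; the Hamming windowing from the hints below is included, and fft() stands in for whatever transform you use, e.g. KissFFT via libgdx):

    // Sketch: overlapping frames with a 441-sample hop, an 882-sample window,
    // Hamming-windowed and zero-padded to 1024 points before the FFT.
    static final int HOP = 441;        // ~10 ms at 44.1 kHz
    static final int FRAME = 882;      // ~20 ms at 44.1 kHz
    static final int FFT_SIZE = 1024;  // next power of two >= FRAME

    static void processStream(short[] pcm) {
        float[] window = new float[FRAME];
        for (int n = 0; n < FRAME; n++) {
            window[n] = 0.54f - 0.46f * (float) Math.cos(2.0 * Math.PI * n / (FRAME - 1));
        }
        for (int start = 0; start + FRAME <= pcm.length; start += HOP) {
            float[] frame = new float[FFT_SIZE];            // tail stays zero (padding)
            for (int n = 0; n < FRAME; n++) {
                frame[n] = pcm[start + n] * window[n];      // windowing = per-sample multiply
            }
            // fft(frame);  // hand the 1024-point frame to the FFT / feature extraction
        }
    }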
Additional hints:
#SztupY correctly pointed out that you need to "window" your signal prior to the FFT, typically with a Hamming window. (But this is not "filtering". It is just multiplying each sample value by the corresponding window value, without accumulating the result.)
The raw FFT output is hardly suited to recognize "sniffling/chewing with mouth", a classical recognizer consists of HMMs or ANNs which process sequences of MFCCs and their deltas.
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be doing something stupid, because much faster speeds are possible?
It's close to the best, but you are wasting all the CPU power estimating highly redundant data, leaving no CPU power for the recognizer.
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
After considering my answer you might re-think your approach.
The idea is that Phone A sends a sound signal and a Bluetooth signal at the same time, and Phone B calculates the delay between the two signals.
In practice I am getting inconsistent results with delays from 90ms-160ms.
I tried optimizing both ends as much as possible.
On the output end:
Tone is generated once
Bluetooth and audio output each have their own thread
Bluetooth only outputs after AudioTrack.write, and AudioTrack is in streaming mode, so it should start outputting before the write is even completed.
On the receiving end:
Again two separate threads
System time is recorded before each AudioRecord.read
Sampling specs:
44.1 kHz
Reading the entire buffer
Sampling 100 samples at a time using FFT
Taking into account how many samples have been transformed since the initial read()
Your method relies on basically zero latency throughout the whole pipeline, which is realistically impossible. You just can't synchronize it with that degree of accuracy. If you could get the delays down to 5-6ms, it might be possible, but you'll beat your head into your keyboard before that happens. Even then, it could only possibly be accurate to 1.5 meters or so.
Consider the lower end of the delays you're receiving. In 90 ms, sound can travel slightly over 30 m. That's the very end of the marketed Bluetooth range, without even considering that you'll likely be in non-ideal transmitting conditions.
Here's a thread discussing low latency audio in Android. TL;DR is that it sucks, but is getting better. With the latest APIs and recent devices, you may be able to get it down to 30ms or so, assuming you run some hand-tuned audio functions. No simple AudioTrack here. Even then, that's still a good 10-meter circular error probability.
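To put numbers on that (using a speed of sound of about 343 m/s in air):

    // Quick sanity check: how much range error a given timing error implies.
    static double rangeErrorMeters(double latencyErrorMs) {
        final double SPEED_OF_SOUND = 343.0;        // m/s in air at ~20 C
        return SPEED_OF_SOUND * latencyErrorMs / 1000.0;
    }
    // rangeErrorMeters(90.0) ~ 30.9 m
    // rangeErrorMeters(30.0) ~ 10.3 m
    // rangeErrorMeters(5.0)  ~  1.7 m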
Edit:
A better approach, assuming you can synchronize the devices' clocks, would be to embed a timestamp into the audio signal, using simple AM/FM modulation or a pulse train. Then you could decode it at the other end and know when it was sent. You still have to deal with the latency problem, but it simplifies the whole thing nicely. There's no need for Bluetooth at all, since it isn't a reliable timing reference anyway; it has latency problems of its own.
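A very rough sketch of the pulse-train idea, just to show its shape: on/off-keyed tone bursts carrying a 32-bit timestamp. The bit rate, carrier frequency and amplitude are arbitrary choices, and the decoder (envelope detection, bit-clock recovery, error handling) is the hard part and is not shown:

    // Sketch: encode a 32-bit timestamp as on/off-keyed tone bursts, one bit per
    // 10 ms burst on a 4 kHz carrier at 44.1 kHz output. All values are arbitrary
    // examples; play the result with AudioTrack.write() on the sending side.
    static short[] encodeTimestamp(int timestamp) {
        final int SAMPLE_RATE = 44100;
        final int SAMPLES_PER_BIT = SAMPLE_RATE / 100;   // 10 ms per bit
        final double CARRIER_HZ = 4000.0;
        short[] out = new short[32 * SAMPLES_PER_BIT];
        for (int bit = 0; bit < 32; bit++) {
            boolean on = ((timestamp >>> (31 - bit)) & 1) != 0;
            for (int i = 0; i < SAMPLES_PER_BIT; i++) {
                int idx = bit * SAMPLES_PER_BIT + i;
                double sample = on
                        ? Math.sin(2.0 * Math.PI * CARRIER_HZ * idx / SAMPLE_RATE)
                        : 0.0;
                out[idx] = (short) (sample * 0.8 * Short.MAX_VALUE);  // leave headroom
            }
        }
        return out;
    }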
This gives you a pretty good approach:
http://netscale.cse.nd.edu/twiki/pub/Main/Projects/Analyze_the_frequency_and_strength_of_sound_in_Android.pdf
You have to create a 1 kHz sound with a known amplitude (measured in dB) and try to measure the amplitude of the sound arriving at the other device. From the attenuation you might be able to estimate the distance.
As I remember: a0 = 20*log10(4*pi*distance/lambda), where a0 is the attenuation and lambda is the wavelength (you can compute it from the 1 kHz frequency).
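If you trust that free-space model, inverting it gives distance from the measured attenuation; this ignores reflections, directivity and noise, so treat it as a rough estimate at best:

    // Sketch: invert a0 = 20*log10(4*pi*d/lambda) to estimate distance from the
    // measured attenuation a0 (in dB). lambda = speed of sound / frequency,
    // roughly 0.343 m for the 1 kHz tone. Free-space assumption only.
    static double distanceMeters(double attenuationDb, double frequencyHz) {
        final double SPEED_OF_SOUND = 343.0;               // m/s
        double lambda = SPEED_OF_SOUND / frequencyHz;
        return lambda * Math.pow(10.0, attenuationDb / 20.0) / (4.0 * Math.PI);
    }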
But in such a sensitive environment, noise might spoil the whole thing. This is just an idea of how I would do it if I were you.