I am developing an Android application in which I have to record audio from the phone and find the power of a specific frequency using the Goertzel algorithm. Using that power value, I make decisions about secret messages sent through the audio.
In order for this application to be usable on a variety of Android phones, I need to make sure that the microphone gains of all the phones are the same. But unfortunately, different phones have different gains
(Sony phones have very small gains, while Samsung and HTC phones have very large gains). Is there a way to set a common gain for the microphones through Android?
One other option is to record the audio to its full length and normalize it. However, I cannot do this because I am doing the processing in real time: I have to process each and every frame received from the AudioRecord object and calculate the power value; the app cannot wait until the end of the audio to decide the message.
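For context, the per-frame computation is essentially the standard Goertzel recurrence; a minimal sketch (the frame size, target frequency and sample rate are placeholders):

```java
// Minimal Goertzel power sketch for one frame of 16-bit PCM.
// targetFreq and sampleRate are placeholders for your own values.
static double goertzelPower(short[] frame, double targetFreq, double sampleRate) {
    double coeff = 2.0 * Math.cos(2.0 * Math.PI * targetFreq / sampleRate);
    double s1 = 0, s2 = 0;
    for (short sample : frame) {
        double s0 = sample + coeff * s1 - s2; // Goertzel recurrence
        s2 = s1;
        s1 = s0;
    }
    // Squared magnitude (power) at targetFreq for this frame
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```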
So, if possible, please suggest some methods I can try to overcome this.
Related
I am developing an Android app for recording sound. In my app I will display the SPL (Sound Pressure Level) in dB. In the course of my research, I came across the claim that mobile hardware can only record sounds up to about 110 dB. The reason given is that mobiles are designed for recording the human voice, which falls in the range of around 60 dB. So if I need to record sounds louder than 110 dB, how will the mobile hardware respond? Do I need to depend on external devices rather than the mobile itself? Please provide your comments.
Thanks & regards,
Siva.
Your question is in fact about the dynamic range of the audio input of a mobile phone - any value you record must be capable of being represented in the scale used to measure it.
There is an associated question of what the largest sound pressure level a particular phone can record is, but this is ultimately limited by the dynamic range and the design of the transducer used. Any absolute measure is relative to a calibration point, which in digital audio systems is dB FSD (i.e. the ratio of the sample to the maximum value), yielding negative values.
The dynamic range in dB of an ideal PCM system is limited by quantisation noise and is related directly to the bit depth (Q) of the sample:
SQNR = 20*log10(2^Q) ≈ 6.02*Q dB
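For example, for 16-bit samples this gives 20*log10(2^16) = 20*log10(65536) ≈ 96.3 dB, and for 24-bit samples ≈ 144.5 dB.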
State-of-the-art ADCs used in pro-audio equipment typically have 24-bit sample depth, giving an SQNR of 144 dB. It's worth noting that in silicon ADCs and DACs, the thermal noise floor of the analogue section of the converter sits above the quantisation noise floor, so the LSB might as well be random.
AFAIK, Android is using 16-bit PCM, which has an SQNR of 96 dB. This is the same performance as the CD audio standard. An SNR of 110 dB wouldn't be bad for pro-audio equipment.
In practice, audio quality is rarely a headline feature of phones and most get nowhere near this. Most users use crappy headphones or the on-board speaker of their phone for voice calls and won't notice the difference. It's an obvious corner to cut from both a cost and power budget point of view for a phone manufacturer.
Additionally, good digital audio design is a black art. Factors such as decoupling of digital signals from ground and the physical proximity of analogue components come into play. You'll find that in tear-downs of Apple kit, they often place the codec right next to the headphone jack, away from the main board of the system. Other cost-conscious manufacturers don't do this, and it degrades the dynamic range of the system.
In order to get meaningful measurements from the audio input you will need to disable both automatic gain control (AGC) and probably the HPF (high-pass filter, used to remove DC bias, and often set with Fc > 100 Hz for voice calls).
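For example, on API 16+ the platform exposes AGC as an audio effect that you can ask to disable, with no guarantee that a given vendor honours it; a sketch (the sample rate here is an assumption, probe per device):

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AutomaticGainControl;

// Sketch: request a lightly-processed capture path. VOICE_RECOGNITION is
// the source vendors most often leave least processed, and the platform
// AGC effect can be explicitly disabled where it is exposed.
static AudioRecord openRecorderWithoutAgc() {
    int sampleRate = 44100; // assumption; probe per device
    int bufSize = AudioRecord.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord recorder = new AudioRecord(
            MediaRecorder.AudioSource.VOICE_RECOGNITION,
            sampleRate, AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT, bufSize);
    if (AutomaticGainControl.isAvailable()) {
        AutomaticGainControl agc =
                AutomaticGainControl.create(recorder.getAudioSessionId());
        if (agc != null) agc.setEnabled(false); // ask the platform to turn AGC off
    }
    return recorder;
}
```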
If your intention is to record absolute SPL, you will need to calibrate the audio system of the device to a set-point. There is no standardisation of this between manufacturers (or even between devices from a given manufacturer). Unless you fancy doing this for every device on the market (of which there are a lot), you'll never provide universally accurate measurements.
I am developing an android application which records audio in PCM using the AudioRecord API. I want to adjust the mic sensitivity to low, medium and high as the user chooses it in the settings.
Is it possible to adjust the mic sensitivity? Your answers will be highly appreciated :)
Not really. It's usually possible to get at least two different "sensitivities" (acoustic tunings used by the platform) implicitly by using different AudioSources. There should at least be one tuning for handset recording and one for far-field recording. On some devices you might also have different far-field tunings, e.g. one for recording audio a few decimeters away and one for recording audio a few meters away.
The problem is that you can't really know which AudioSource corresponds to which tuning, as there's no standard for it. CAMCORDER typically means far-field, and VOICE_RECOGNITION often means handset mode, but there's no guarantee of it. You should also keep in mind that vendors typically apply automatic gain control, noise reduction, etc. that you as a user / app developer can't disable, in order to meet acoustic requirements for their products.
Your best bet would probably be to use a single AudioSource and then do attenuation of the signal in your app to simulate a lower mic sensitivity. You could do amplification as well, but that would be akin to using digital zoom in the camera app (it works, but doesn't look all that good because you're just scaling the existing data).
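A minimal sketch of that attenuation on 16-bit PCM (the gain values are illustrative, e.g. 0.25 / 0.5 / 1.0 for low / medium / high):

```java
// Simulate lower mic sensitivity by attenuating 16-bit PCM in place.
// gain is the user-selected factor; values > 1.0 amplify and may clip.
static void applyGain(short[] buffer, int validSamples, float gain) {
    for (int i = 0; i < validSamples; i++) {
        int scaled = Math.round(buffer[i] * gain);
        // Clamp in case gain > 1.0 is ever used (amplification clips)
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
        buffer[i] = (short) scaled;
    }
}
```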
If someone wants to write an Android application that interacts with a physical device, specifically a card reader plugged into the mobile's audio jack
(e.g. like Square Inc. does), how is this done?
Are there APIs to interact with the reader and get the card's data?
When a company creates a reader (a physical device), does it provide the relevant APIs?
Are the physical details abstracted away from the application programmer?
I have found the AudioRecord class, which can record magnetic stripe data from the audio jack, but I can't figure out how to capture the actual card swipe event or how to extract meaningful data from the raw data.
Can anyone help me with this? Any input is highly welcome!
The way this usually works is by encoding the data signal sent out by the device, like the card reader, in such a way that it can be decoded on the other end. Sound is a wave: different amplitudes correspond to different loudness levels, and different frequencies correspond to different pitches. Imagine you have a sine wave that alternates between a high and a low frequency that are sufficiently different from each other to be easily distinguishable. The device sending out binary data (0s and 1s) can translate this data into an audio signal that varies by frequency (an alternative is varying the amplitude). The receiver, in this case the mobile device, decodes the signal back into 0s and 1s. This is called "frequency-shift keying" (read more here: http://en.wikipedia.org/wiki/Frequency-shift_keying).
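As a toy illustration of the receiving side (the frequencies and method are illustrative, not Square's actual protocol), a crude FSK demodulator can estimate the frequency of each bit window by counting zero crossings:

```java
// Toy FSK demodulator sketch: classify one bit window of 16-bit PCM by
// counting zero crossings. freqZero/freqOne are the two expected tones
// (illustrative values might be 1200 Hz and 2200 Hz).
static int decodeBit(short[] window, int sampleRate, int freqZero, int freqOne) {
    int crossings = 0;
    for (int i = 1; i < window.length; i++) {
        if ((window[i - 1] >= 0) != (window[i] >= 0)) crossings++;
    }
    // Two zero crossings per cycle -> estimated frequency of this window
    double estFreq = (crossings / 2.0) * sampleRate / window.length;
    return estFreq > (freqZero + freqOne) / 2.0 ? 1 : 0;
}
```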
The simplest way to implement this is to try to find an open library that already does it. The device sending the data will also need to contain some kind of microcontroller that can perform the initial modulation. If you come across any good libraries, let me know, because I'm currently looking.
To answer your question, companies do not generally provide APIs etc to perform this.
This may seem like a lot of extra work to convert a digital signal into an audio signal and back, and you're right. However, every mobile device has essentially the same headphone jack, whereas the USB port on an Android device is drastically different from an iPhone's Lightning connector, or the connector on previous iPhones. Sending the data wirelessly through a network or Bluetooth is also an option, but those have their disadvantages as well.
Note that the mobile device must use a headphone jack that supports microphones; otherwise it cannot receive input, only output sound. Most smartphones can do this.
Radios work on this principle (FM = Frequency modulation, AM = amplitude modulation).
Old dial-up modems used FSK, which is why you heard those weird noises each time one connected.
Hope that helps!
I am writing an application that will behave similarly to the existing voice recognition, but will send the sound data to a proprietary web service to perform the speech recognition part. I am using the standard MediaRecorder (which is AMR-NB encoded), which seems to be perfect for speech recognition. The only data it provides is the amplitude, via the getMaxAmplitude() method.
I am trying to detect when the person starts talking, so that when the person stops talking for about 2 seconds I can proceed to send the sound data to the web service. Right now I am using a threshold for the amplitude: if it goes over a value (e.g. 1500), then I assume the person is speaking. My concern is that amplitude levels may vary by device (e.g. Nexus One vs. Droid), so I am looking for a more standard approach that can be derived from the amplitude values.
P.S.
I looked at graphing-amplitude but it doesn't provide a way to do it with just the amplitude.
Well, this might not be of much help, but how about starting by measuring the background noise captured by the device's microphone when the application starts, and applying the threshold dynamically based on that? That way you would make it adapt to different devices' microphones and also to the environment the user is in at a given time.
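A sketch of that idea (the constants are illustrative, not tuned): seed a noise-floor estimate from the first frame, adapt it only while the input looks quiet, and treat anything well above it as speech:

```java
// Dynamic threshold sketch driven by getMaxAmplitude() readings.
static double noiseFloor = -1;    // calibrated from the first frame
static final double ALPHA = 0.05; // smoothing factor for the floor
static final double FACTOR = 3.0; // "speech" = this far above the floor

static boolean isSpeaking(int maxAmplitude) {
    if (noiseFloor < 0) {         // first frame: seed the floor
        noiseFloor = maxAmplitude;
        return false;
    }
    boolean speaking = maxAmplitude > noiseFloor * FACTOR;
    if (!speaking) {
        // Only adapt while quiet, so speech doesn't raise the floor
        noiseFloor = (1 - ALPHA) * noiseFloor + ALPHA * maxAmplitude;
    }
    return speaking;
}
```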
1500 is too low a number. Measuring the change in amplitude will work better.
However, it will still result in missed detections.
I fear the only way to solve this problem is to figure out how to recognize a simple word or tone rather than simply detect noise.
There are now multiple VAD (voice activity detection) libraries designed for Android. One of them is:
https://github.com/gkonovalov/android-vad
Most smartphones come with a proximity sensor, and Android has an API for using these sensors. This would be adequate for the job you described: when the user moves the phone near to their ear, you can code the app to start recording. It should be easy enough.
Sensor class for Android
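A minimal sketch of that approach, assuming it runs inside an Activity (e.g. in onCreate()); startRecording() and stopRecording() stand in for your own recorder control code:

```java
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

// Inside an Activity, e.g. in onCreate():
SensorManager sm = (SensorManager) getSystemService(SENSOR_SERVICE);
final Sensor proximity = sm.getDefaultSensor(Sensor.TYPE_PROXIMITY);
sm.registerListener(new SensorEventListener() {
    @Override public void onSensorChanged(SensorEvent event) {
        // "Near" readings are below the sensor's maximum range
        if (event.values[0] < proximity.getMaximumRange()) {
            startRecording();  // phone raised to the ear (hypothetical hook)
        } else {
            stopRecording();   // phone moved away (hypothetical hook)
        }
    }
    @Override public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}, proximity, SensorManager.SENSOR_DELAY_NORMAL);
```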
I'm trying to build a gadget that detects pistol shots using Android. It's part of a training aid for pistol shooters that shows how the shots are distributed in time, and I use an HTC Tattoo for testing.
I use the MediaRecorder and its getMaxAmplitude method to get the highest amplitude during the last 1/100 s, but it does not work as expected: speech gives me values from getMaxAmplitude in the range from 0 to about 25000, while the pistol shots (or shouting!) only reach about 15000. With a sampling frequency of 8 kHz there should be some samples with a considerably higher level.
Does anyone know how these things work? Are there filters applied before the max amplitude is registered? If so, are they hardware or software?
Thanks,
/George
It seems there's an AGC (automatic gain control) filter in place. You should also be able to identify the shot by its frequency characteristics. I would expect it to show up across most of the audible spectrum, but get a spectrum analyzer (there are a few on the app market, like SpectralView) and try identifying the event by its frequency "signature" and amplitude. If you clap your hands, what do you get for max amplitude? You could also try covering the phone with something that muffles the sound, like a few layers of cloth.
It seems like the AGC is in the media recorder. When I use AudioRecord I can detect shots using the amplitude, even though it sometimes reacts to sounds other than shots. This is not a problem, since the shooter usually doesn't make any other noise while shooting.
But I will do some FFT too to get it perfect :-)
Sounds like you figured out your AGC problem. One further suggestion: I'm not sure the FFT is the right tool for the job. You might get better detection and lower CPU use with a sliding power estimator.
e.g.
signal => square => moving average => peak detection
All of the above can be implemented very efficiently using fixed point math, which fits well with mobile android platforms.
You can find more info by searching for "Parseval's theorem" and "CIC filter" (cascaded integrator-comb).
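A sketch of that chain in fixed point (the window length and threshold are illustrative, not tuned values):

```java
// square -> moving average -> peak detection, using integer math only.
static final int WINDOW = 64;          // moving-average length (illustrative)
static long[] ring = new long[WINDOW]; // circular buffer of squared samples
static int idx = 0;
static long runningSum = 0;

// Returns true when average power over the window crosses the threshold.
static boolean processSample(short sample, long threshold) {
    long sq = (long) sample * sample;      // square
    runningSum += sq - ring[idx];          // integrator-comb style update
    ring[idx] = sq;
    idx = (idx + 1) % WINDOW;
    return (runningSum / WINDOW) > threshold; // crude peak detection
}
```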
Sorry for the late response; I didn't see this question until I started searching for a different problem...
I have started an application to do what I think you're attempting: an audio-based lap timer (a button to start/stop recording, and loud audio noises for lap marking). It's not finished, but it might provide you with a decent base to get started.
Right now, it allows you to monitor the signal volume coming from the mic and set the ambient noise amount. It's also under the new BSD license, so feel free to check out the code here: http://code.google.com/p/audio-timer/. It's set up to use the 1.5 API to include as many devices as possible.
It's not finished, in that it has two main issues:
- The audio capture doesn't currently work on emulated devices because of the unsupported frequency requested.
- The timer functionality doesn't work yet; I was focusing on getting the audio capture working first.
I'm looking into the frequency support, but Android doesn't seem to have a way to find out which sample rates are supported without per-device trial and error.
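The usual trick is to make the trial and error explicit: AudioRecord.getMinBufferSize() returns an error code for rates the device can't capture at, so you can probe a candidate list at startup. A sketch (the rate list is illustrative; on the 1.5-era API the mono channel constant was CHANNEL_CONFIGURATION_MONO rather than CHANNEL_IN_MONO):

```java
import android.media.AudioFormat;
import android.media.AudioRecord;

// Probe for a capture sample rate the device supports.
static int findSupportedSampleRate() {
    int[] candidates = {44100, 22050, 16000, 11025, 8000}; // illustrative list
    for (int rate : candidates) {
        int bufSize = AudioRecord.getMinBufferSize(rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        if (bufSize > 0) return rate; // not ERROR or ERROR_BAD_VALUE
    }
    return -1; // nothing worked
}
```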
I also have some extra code on my local dev machine to create a layout for the ListView items to display "lap" information; I got sidetracked by the frequency problem, though. But since the display and audio capture are pretty much done, using the system time to fill in the display values for timing should be relatively straightforward, and then it shouldn't be too difficult to add the ability to export the data table to a CSV file on the SD card.
Let me know if you want to join this project, or if you have any questions.