Mic sensitivity in the Android CDD

The Android Compatibility Definition Document states that:
(1) "Audio input sensitivity SHOULD be set such that a 90 dB sound power level (SPL) source at 1000 Hz yields RMS of 2500 for 16-bit samples."
(2) "PCM amplitude levels SHOULD linearly track input SPL changes over at least a 30 dB range from -18 dB to +12 dB re 90 dB SPL at the microphone."
Questions:
Does (1) refer to the microphone sensitivity alone, or to the sensitivity plus the internal gain of the Android device, in order to reach an RMS of 2500?
Does (2) apply to the microphone's maximum acoustic level only, or does it include the internal gain of the Android device as well?

Your questions are confusing me. I think you are mixing different levels and gains.
An acoustic level of 90dBspl rms is transferred into the electrical domain through the microphone. Microphone sensitivity is specified against a different acoustic reference: sound pressure in dBPa or Pa (94dBspl = 0dBPa). A specified sensitivity of -42dBV/Pa means that if you have 0dBPa, i.e. 1Pa (which is the same sound pressure level), then you will get -42dBV out of the microphone in the analog electrical domain. -42dBV = 7.94mV (0dBV = 1V)
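To make the arithmetic concrete, here is a small Java sketch of the SPL-to-voltage conversion described above (the -42 dBV/Pa figure is just the example sensitivity used in this answer):

    public class MicLevel {
        public static void main(String[] args) {
            double sensitivityDbV = -42.0;          // mic sensitivity in dBV/Pa
            double splDb = 90.0;                    // acoustic level in dB SPL
            double dbPa = splDb - 94.0;             // 94 dB SPL == 0 dBPa == 1 Pa
            double outDbV = sensitivityDbV + dbPa;  // electrical level in dBV
            double outVolts = Math.pow(10.0, outDbV / 20.0);
            System.out.printf("%.1f dB SPL -> %.2f dBV -> %.2f mV rms%n",
                    splDb, outDbV, outVolts * 1000.0);
            // Prints roughly: 90.0 dB SPL -> -46.00 dBV -> 5.01 mV rms
        }
    }

At 94 dB SPL (0 dBPa) the same math gives the 7.94 mV quoted above.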
Now, from this point there can be different gains, analog and digital. First you can have some analog gain, then an A/D converter; after that you are in the digital domain, where you can have digital gain as well. The Android requirement does not specify these gains. It specifies what final digital level you should get for a given acoustic sound pressure level. You can of course calculate each and every step inside the sound chain, but the easiest way is to set all digital gains to 0dB, perhaps set the analog gain to something around +20dB (if possible), and then produce an acoustic sound source at the proper sound pressure level. You will need a sound pressure level meter and a sinusoidal 1kHz tone played through a loudspeaker at perhaps 20cm distance in a fairly non-reverberant, echo-free room.
Now you record the 90dBspl 1kHz tone with your device and analyze the recording in the digital domain. If you can, you should adjust the gain in the analog domain; then the digital headroom will be correct. If you do not know what you are doing, you could easily adjust too much in the digital domain, leading to digital clipping or quantization noise. Digital gain should only be applied once you have done everything you can in the analog domain.
If everything is correctly adjusted you will have a good match between a 90dBspl rms acoustic level and a recorded digital level of -22dBFS rms, which is the level of 2500 rms in a 16-bit system (this is, however, a very strange way of specifying it). 0dBFS rms is a fully saturated square wave in such a system; a fully saturated sinusoid is -3dBFS rms, or 0dBFS peak.
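A sketch of how you might verify this in the digital domain, in Java; this is plain PCM math, nothing Android-specific, and the 2500 / -22 dBFS figures are the CDD values quoted above:

    // Sketch: compute the rms level of a 16-bit PCM buffer and express it
    // in dBFS rms, where 0 dBFS rms is a full-scale square wave (+/-32767).
    public static double rmsDbfs(short[] pcm) {
        double sumSquares = 0.0;
        for (short s : pcm) {
            sumSquares += (double) s * s;
        }
        double rms = Math.sqrt(sumSquares / pcm.length);
        return 20.0 * Math.log10(rms / 32767.0);
    }
    // A buffer recorded from a 90 dB SPL tone should come out near
    // 20*log10(2500/32767) = -22.35 dBFS rms on a compliant device.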
Be aware that if you have enabled any automatic gain control, you will probably not be able to comply with the linearity requirement.

Related

what is the maximum sound recording capacity of mobile hardware?

I am developing an Android app for recording sound. In my app I will display the SPL (Sound Pressure Level) in dB. In my research I came across the claim that mobile hardware can only record sounds up to about 110 dB, the reasoning being that mobiles are designed for recording the human voice, which falls in the range of around 60 dB. So if I need to record sounds louder than 110 dB, how will the mobile hardware respond? Do I need to depend on external devices rather than the mobile itself? Please provide your comments.
Thanks & regards,
Siva.
Your question is in fact about the dynamic range of the audio input of a mobile phone - any value you record must be capable of being represented in the scale used to measure it.
There is an associated question of the largest sound pressure level a particular phone can record, but this is ultimately limited by the dynamic range and the design of the transducer used. Any absolute measure is relative to a calibration point - which in digital audio systems is dB FSD (i.e. the ratio of a sample to the maximum), yielding negative values.
The dynamic range in dB of an ideal PCM system is limited by quantisation noise and is related directly to the bit depth (Q) of the sample:
SQNR = 20 * log10(2^Q) ≈ 6.02 * Q dB
State-of-the-art ADCs used in pro-audio equipment typically have 24-bit sample depth, giving an SQNR of 144dB. It's worth noting that in silicon ADCs and DACs, the thermal noise floor of the analogue section limits the real dynamic range to rather less than this, and the LSB might as well be random.
AFAIK, Android is using 16-bit PCM, which has a SQNR of 96dB. This is the same performance as the CD Audio standard. A SNR of 110dB wouldn't be bad for pro-audio equipment.
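A quick sanity check of these figures in Java, as a straight application of the SQNR formula above:

    public class Sqnr {
        public static void main(String[] args) {
            // Ideal quantization SNR: SQNR = 20*log10(2^Q) ≈ 6.02*Q dB
            for (int q : new int[]{8, 16, 24}) {
                System.out.printf("%2d-bit: %.1f dB%n",
                        q, 20.0 * Math.log10(Math.pow(2.0, q)));
            }
            // Prints: 8-bit: 48.2 dB, 16-bit: 96.3 dB, 24-bit: 144.5 dB
        }
    }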
In practice, audio quality is rarely a headline feature of phones and most get nowhere near this. Most users use crappy headphones or the on-board speaker of their phone for voice calls and won't notice the difference. It's an obvious corner to cut from both a cost and power budget point of view for a phone manufacturer.
Additionally, good digital audio design is a black art. Factors such as decoupling of digital signals from ground and the physical proximity of analogue components come into play. In tear-downs of Apple kit you find that they often place the codec right next to the headphone jack, away from the main board of the system. Other, more cost-conscious manufacturers don't do this, and it degrades the dynamic range of the system.
In order to get meaningful measurements from the audio input you will need to disable automatic gain control (AGC) and probably also the high-pass filter (HPF, used to remove DC bias, and often set with Fc > 100Hz for voice calls).
If your intention is to record absolute SPL, you will need to calibrate the audio system of the device against a set point. There is no standardisation of this between manufacturers (or even between devices from any given manufacturer). Unless you fancy doing this for every device on the market (of which there are a lot), you'll never provide universally accurate measurements.
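For the AGC part, something along these lines may work on Android, using AudioRecord and the audiofx AutomaticGainControl effect. Whether preprocessing is actually bypassed is device-dependent, and as far as I know the HPF is not controllable from the public SDK at all:

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;
    import android.media.audiofx.AutomaticGainControl;

    // Sketch: request a capture path with minimal preprocessing.
    // UNPROCESSED (API 24+) asks for a raw path; VOICE_RECOGNITION is a
    // common fallback that disables AGC on many (not all) devices.
    AudioRecord openRawRecorder() {
        int rate = 44100;
        int minBuf = AudioRecord.getMinBufferSize(rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord record = new AudioRecord(
                MediaRecorder.AudioSource.UNPROCESSED, rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
                minBuf);
        // If an AGC effect is attached to this session, try to switch it off.
        if (AutomaticGainControl.isAvailable()) {
            AutomaticGainControl agc =
                    AutomaticGainControl.create(record.getAudioSessionId());
            if (agc != null) {
                agc.setEnabled(false);
            }
        }
        return record;
    }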

What does the content of the short[] array of AudioRecord.read() represent in Android [duplicate]

I am starting out with audio recording using my Android smartphone.
I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.
What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
Why are there negative values? I would think that complete silence would result in values of 0.
If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.
Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, you cause the air around your mouth to vibrate. Vibrations are spring-like and have movement in both directions: back and forth, up and down, in and out. The vibrations in the air cause the microphone surface to vibrate as well, i.e. to move up and down. When it moves down, that might be sampled as a positive value; when it moves up, that might be sampled as a negative value (or it could be the opposite). When you stop talking the surface settles back to the zero position.
What numbers you get from your PCM recording data depend on the gain of the system. With common 16 bit samples, the range is from -32768 to 32767 for the largest possible excursion of a vibration that can be recorded without distortion, clipping or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of distortion.
ADDED:
8-bit PCM audio is often an unsigned data type, with the range from 0..255, with a value of 128 indicating "silence". So you have to add/subtract this bias, as well as scale by about 256 to convert between 8-bit and 16-bit audio PCM waveforms.
The raw numbers are an artefact of the quantization process used to convert an analog audio signal into digital form. It makes more sense to think of an audio signal as a vibration around 0, extending as far as +1 and -1 at the maximum excursion of the signal. Outside that you get clipping, which distorts the harmonics and sounds terrible.
However, computers don't work all that well with fractions, so 65,536 discrete integer levels (from -32768 to +32767) are used to map that range. In most applications like this, +32767 is considered the maximum positive excursion of the microphone's or speaker's diaphragm. There is no correlation between a sample value and a sound pressure level unless you start factoring in the characteristics of the recording (or playback) circuits.
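A sketch of both conversions mentioned above, in Java: the shift by 8 is the "scale by about 256" from the ADDED note, and the divide by 32768 gives the nominal -1.0..+1.0 range:

    // Sketch: convert unsigned 8-bit PCM (0..255, 128 = silence) to signed
    // 16-bit, and normalize a 16-bit sample to the -1.0..+1.0 range.
    public static short pcm8ToPcm16(int sample8) {
        return (short) ((sample8 - 128) << 8);   // remove bias, scale by 256
    }

    public static float pcm16ToFloat(short sample16) {
        return sample16 / 32768f;                // full scale maps to ~±1.0
    }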
(BTW, 16-bit audio is very standard and widely used. It is a good balance of signal-to-noise ratio and dynamic range. 8-bit is noisy unless you do some funky non-standard scaling.)
Lots of good answers here, but they don't directly address your questions in an easy to read way.
What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
The values have no unit. They simply represent a number that has come out of an analog-to-digital converter. The numbers from the A/D converter are a function of the microphone and pre-amplifier characteristics.
If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
I don't understand this question. If you are recording 8-bit audio, your values will be 8-bits. Are you converting 8-bit audio to 16-bit?
Why are there negative values? I would think that complete silence would result in values of 0.
The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work
For more details on how sound is represented digitally, see here.
Why are there negative values? I would think that complete silence would result in values of 0.

The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work
Small clarification: it is the position of the diaphragm that is being recorded. Silence occurs when there is no vibration, i.e. no change in position. The vibration you are seeing is what pushes the air and creates changes in air pressure over time. At the top and bottom peaks of any vibration the air is momentarily not being pushed, so those instants correspond to no movement; the loudest part of the signal is where the position changes fastest, somewhere between the peaks. The speed with which the diaphragm moves from one peak to another determines the amount of pressure generated by the diaphragm. When the top and bottom peaks are reduced to zero (or to any other value they share) there is no vibration and no sound at all. Likewise, as the diaphragm slows down so that there is more time between peaks, less sound pressure is generated or recorded.
I recommend the Yamaha Sound Reinforcement Handbook for more in depth reading. Understanding the idea of calculus would help the understanding of audio and vibration as well.
The 16-bit numbers are the A/D converter values from your microphone (you knew this). Know also that the amplifier between your microphone and the A/D converter has an Automatic Gain Control (AGC). The AGC actively changes the amplification of the microphone signal to prevent too much voltage from hitting the A/D converter (usually < 2 V DC). There is also DC decoupling, which centers the input signal in the middle of the A/D converter's range (say 1 V DC).
So, when there is no sound hitting the microphone, the AGC amplifier sends a flat 1.0 V DC signal to the A/D converter. When sound waves hit the microphone, they create a corresponding AC voltage wave. The AGC amp takes the AC voltage wave, centers it at 1.0 V DC, and sends it to the A/D converter. The A/D samples (measures) the voltage, say 44,100 times per second, and spits out signed 16-bit values: -32768 = 0.0 V DC and +32767 = 2.0 V DC, so one step is about 30.5 µV. A value of +100 is about 1.00305 V DC and -100 about 0.99695 V DC hitting the A/D converter.
Positive values are above 1.0 V DC, negative values below it.
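A sketch of that mapping in Java; the 2 V span and 1 V midpoint are this answer's example figures, not universal values:

    // Sketch: a signed 16-bit sample mapped onto a 0..2 V ADC input
    // centered at 1.0 V DC (example figures from this answer).
    public static double sampleToVolts(short sample) {
        double voltsPerLsb = 2.0 / 65536.0;   // ~30.5 µV per step
        return 1.0 + sample * voltsPerLsb;    // +32767 -> ~2.0 V, -32768 -> 0.0 V
    }
    // sampleToVolts((short) 100)  -> ~1.00305 V
    // sampleToVolts((short) -100) -> ~0.99695 V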
Note that most audio systems apply a logarithmic curve to the audio wave so that the human ear can hear it better. In digital audio systems (with ADCs), digital signal processing applies this curve to the signal. DSP chips are big business; TI has made a fortune using them for all kinds of applications, not just audio processing. DSPs can run very complicated math on a real-time stream of data that would choke an iPhone's ARM7 processor. Say you are sending 2 MHz pulses to an array of 256 ultrasound sensors/receivers; you get the idea.

How to measure sound volume in dB scale Android

We are working on a cross-platform project that requires sound volume sampling on smartphones, analysing the result with as high accuracy as possible. The iPhone developer used iOS's built-in functionality for returning sound power/volume on a dB scale, calculated by the OS itself. As far as I know there is no equivalent functionality in the Android OS.
As of now I am working on Android with the MediaRecorder class provided by the OS, and I use getMaxAmplitude to measure the sound power/volume. I have seen a lot of answers on the net regarding how to convert amplitude to a dB scale; the answer that sounded most reasonable was using the formula:
20 * Math.log10(amplitude / MAX_AMPLITUDE)
but then I must know the MAX_AMPLITUDE that can be returned by getMaxAmplitude, and the thing is that it differs between devices. For example, I tested getMaxAmplitude on an HTC Desire and on a Samsung Galaxy S3:
on the HTC it reached 32767 (which I saw in some answer is the documented maximum), while on the S3 it never went beyond 16383 (half of the HTC's).
Q1:
Is this (the approach discussed above) the correct approach? I have read that the correct way to measure sound power/volume is to calculate the RMS and then convert it to dB; is this how it's done on the iPhone?
Q2:
Whether I use RMS or just the amplitude from getMaxAmplitude, it seems I still need to know the highest amplitude I can get from the recording hardware. Is there a way to know that, or a way to somehow work around it?
90dBspl is an rms value in the acoustic domain.
The digital level of 2500 rms in a 16bit system is the same as approximately -22dB FS rms (actually -22.35), where 0dBFS rms is a full scale square wave. A full scale sinusoidal in such a system is 0dBFS peak and -3dB FS rms (reaching from -32768 to +32767).
A square wave of +/-2500 can be calculated as:
20 * log10(2500 / 32767) = -22.35 dBFS rms
Please note that peaks of sinusoidals are always 3dB higher than the rms level. The only signal that has the same rms and peak level is the square wave.
Now, Android has a requirement of 30dB linearity around 90dBspl, but this linearity shall be +12dB above 90dBspl and -18dB below the same point. Outside this range there can be compression in different ways, depending on which phone model you test.
The guaranteed highest linear level inside an Android phone is -22dBFS +12dB = -10dBFS rms. Above this level it is uncertain. The most common scenario is that the last 7dB of peak headroom are still linear, leading to an acoustic maximum level of 90dBspl + (22-3 dB) = 109dB spl rms for a sinusoidal without clipping (or 112 dB spl peak).
In some phones you will find a peak limiter that reduces the gain above 102dBspl rms. The outcome of this is that you can still record up to the level of saturation for the microphone. This saturation level varies, but it is common to have like 2% distortion at 120dB spl. Above this level the microphone component starts to saturate and clip.
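Putting the calibration point to work: a sketch of how one might estimate dB SPL from a measured dBFS rms level, assuming the device actually meets the CDD point described above. Outside the guaranteed linear range the estimate degrades:

    // Sketch: estimate dB SPL from a measured dBFS rms level, using the
    // CDD calibration point (90 dB SPL rms <-> -22 dBFS rms). Only
    // trustworthy inside the guaranteed linear range (~72..102 dB SPL).
    public static double estimateSpl(double measuredDbfsRms) {
        final double CAL_DBFS = -22.0;  // digital level at the reference point
        final double CAL_SPL = 90.0;    // acoustic level at the reference point
        return CAL_SPL + (measuredDbfsRms - CAL_DBFS);
    }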
Looking at the other end of the scale:
Small phone microphones are in general noisy. The latest microphones can have a noise floor at -63dB below 0dBPa (94dBspl), but most microphones are between -58 and -60dB below 0dBPa.
How can this be converted to dBFS rms?
0dBPa rms is 94dB spl rms. From the statement above we know that a 90dBspl rms acoustic level is recorded at a digital level of -22dBFS rms in Android phones. A noise floor 63dB below 0dBPa is therefore at -22dBFS rms + 4dB - 63dB = -81dBFS rms. The absolute maximum dynamic range of a 16-bit system can be approximated as 96dB (or 93dB, depending on how you see it), so this noise level is at least 12dB above the quantization noise of the digital file.
This is a very important finding for video recording mode. Unfortunately many video applications in Android tend to have too high microphone gain in the recording. This leads to clipping when recording loud music concerts and similar situations. We also know that the microphone itself is good up to at least 120dB. So it would be a good idea for any audio system engineer to make a video recording mode that actually used the whole dynamic range of the microphone. This means that the gain should be set at least 8dB lower. It is always possible to change the rms level afterwards in a video recording if the sound is too soft, but if it is clipped, then you have damaged the recording forever.
So, my message to you programmers is to implement a video recording mode where the acoustic level of 90dB spl rms is recorded at -30dBFSrms or slightly below that. Any maximization can be done afterwards. In this way we could record rock concerts with much better sound. Doing automatic gain control does not help the sound quality. The dynamic range is often too big to be controlled automatically. You get a lot of pumping in the sound. It is better to implement two different video recording modes: Concert mode and speech mode. In speech mode (optimized for a talking person at 1m distance) the recording gain could be even higher than -22dBFSrms for 90dBspl. I would say -12dBFS rms for 90dBspl would be a suitable recording level. (speech at 1m distance has an rms level of approximately 57dB spl and peaks 20-30dB higher).
Björn Gröhn
Audio system engineer at Sony mobile Lund, Sweden

Audio output level in a form that can be converted to decibel

I need to find a way to get the current audio output volume while the phone is playing sound through the headphones; this value will be converted to a decibel level. The Android API does not appear to provide any way of accessing the output level other than a seemingly arbitrary volume setting, and I don't see a way to convert that to a standard decibel or "loudness" measurement. I have seen ways to use the mic for this, but that won't work well with headsets.
Does anyone know a way to measure either the maximum possible decibel (or some standard) output level to compare against, or possibly the voltage being sent to the headset?
Help is welcomed.
Be aware that there are many different meanings of the word 'decibel'. It is a means of representing some quantity (such as intensity/power/loudness) relative to a reference point. For audio signals inside equipment, or in an audio application, there is a peak level of 0dB. When sound is emitted from a speaker, the perceived loudness is measured as a Sound Pressure Level, often written 'dB (SPL)' (or weighted variants such as dBA). When you see tables of values such as rock concerts at 100dB, it is the SPL that is being described. This measurement is itself relative to a reference level.
So what you will have available in the API is the buffer of audio data, from which you can easily obtain the audio level in terms of the raw signal (which has a maximum of 0dB). You can't, however, easily convert this to a physical loudness, because that depends on the hardware. It will differ between one model of phone and the next, and will depend on the headphones too. The only way of doing this is to calibrate the phone by measuring with an SPL meter, but that will give you a result that is only reasonable on that particular phone.
I'm doing it like this:
#include <math.h>              // for M_LN2 and log()
#include <SLES/OpenSLES.h>     // for SLmillibel, SL_MILLIBEL_MAX/MIN

// Map a linear volume in [0..1] to an OpenSL ES attenuation in millibels.
SLmillibel gain_to_attenuation(float volume)
{
    SLmillibel volume_mb;
    if (volume >= 1.0f)
        volume_mb = SL_MILLIBEL_MAX;            // full volume
    else if (volume <= 0.02f)
        volume_mb = SL_MILLIBEL_MIN;            // effectively muted
    else
    {
        // Logarithmic taper between the two extremes.
        volume_mb = M_LN2 / log(1.0f / (1.0f - volume)) * -1000.0f;
        if (volume_mb > 0)
            volume_mb = SL_MILLIBEL_MIN;        // guard against overflow
    }
    return volume_mb;
}

How to best determine volume of a signal?

I want to determine the volume of an audio signal.
I have found two options:
Compute Root Mean Squared of the amplitude
find the maximum amplitude
Are there advantages to using #1 or #2?
Here is what I am trying to do:
I want my Android to analyze audio from the microphone. I want the device to detect a loud noise. The input is a short [].
If you use the maximum amplitude (option 2), your volume level is determined by a single sample, which you might not even be able to hear. When calculating a value that correlates with your impression of the loudness of the sound, such as the Sound Pressure Level or the Sound Power Level, you need to use the RMS (option 1).
Because your ear is not equally sensitive to all frequencies, a better correlate of your perception can be had by applying an A-weighting to the signal: split (filter) the signal into octave bands, calculate the RMS for each band, and apply the A-weighting.
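A sketch of both measures on a microphone buffer, in Java. No weighting or band-splitting here, just the raw comparison:

    // Sketch: peak vs rms of a microphone buffer (short[]), as discussed
    // above. rms tracks loudness far better than a single-sample peak.
    public static void peakVsRms(short[] buf) {
        double peak = 0.0, sumSq = 0.0;
        for (short s : buf) {
            peak = Math.max(peak, Math.abs((double) s));
            sumSq += (double) s * s;
        }
        double rms = Math.sqrt(sumSq / buf.length);
        System.out.printf("peak=%.0f rms=%.1f (%.1f dBFS rms)%n",
                peak, rms, 20.0 * Math.log10(rms / 32767.0));
    }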
If you want to check the volume level, just compute its dB value (assuming the signal is normalized, i.e. 1 == maximum level):
level[n] = 20 * log10(abs(signal[n]));
However, detecting audio noise is not a trivial task. The most common and simple technique is an algorithm called a noise gate, which basically compares the signal level with some dB threshold value: if the signal level is below the threshold, the output is zeroed. On its own this is unusable in practice; there must also be some attack and release time for smooth thresholding, otherwise it will affect the real signal (music, speech) too and produce a kind of clipping. A minimal sketch follows the links below.
Check Google; it will give you a lot of resources about the noise-gate algorithm and noise-removal techniques:
http://en.wikipedia.org/wiki/Noise_gate
http://www.developer.com/java/other/article.php/3599661/Adaptive-Noise-Cancellation-using-Java.htm
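And the promised minimal noise-gate sketch in Java, with one-pole attack/release smoothing. The threshold and coefficients are illustrative, not tuned values:

    // Minimal noise-gate sketch: open the gate when the envelope exceeds
    // the threshold, with separate attack/release smoothing so the gain
    // does not chatter. Expects a signal normalized to -1.0..+1.0.
    public static void noiseGate(float[] signal, float thresholdDb,
                                 float attackCoeff, float releaseCoeff) {
        float thresholdLin = (float) Math.pow(10.0, thresholdDb / 20.0);
        float env = 0f;   // simple envelope follower
        float gain = 0f;  // smoothed gate gain, 0 = closed, 1 = open
        for (int n = 0; n < signal.length; n++) {
            float level = Math.abs(signal[n]);
            env = Math.max(level, env * 0.999f);          // fast rise, slow decay
            float target = (env > thresholdLin) ? 1f : 0f;
            float coeff = (target > gain) ? attackCoeff : releaseCoeff;
            gain += coeff * (target - gain);              // one-pole smoothing
            signal[n] *= gain;
        }
    }
    // e.g. noiseGate(buf, -50f, 0.1f, 0.001f);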
