How to best determine volume of a signal?

How to best determine volume of a signal? - android

I want to determine the volume of an audio signal.
I have found two options:
Compute Root Mean Squared of the amplitude
find the maximum amplitude
Are there advantages to using #1 or #2?
Here is what I am trying to do:
I want my Android to analyze audio from the microphone. I want the device to detect a loud noise. The input is a short [].

If you use the maximum amplitude (2), then your volume level would be determined by a single sample (which you might not even be able to hear). When calculating a value that correlates with your impression of the loudness of the sound such as the Sound Pressure Level or the Sound Power Level you need to use the RMS (1).
Because you ear is not equally sensitive to all frequencies, a better correlate of your perception can be had by using an A-weighting on the signal. Split (filter) the signal in octave bands, calculate the RMS for each band and apply the A-weighting.

If you want to check volume level, just compute its dB Value (I assume the signal is normalized i.e. 1 == maximum level):
level[n] = - 20 x log(1/signal[n]);
However, detecting audio noise is not a trivial task. The most common and simple technique is to use algorithm called NoiseGate which basically compares the signal level with some dB Threshold value - if the signal level is above threshold, then the output is zeroed. But it is unusable in practice; there must be also some Attack and Release times for smooth thresholding otherwise it would affect also a real signal (music, speech) and produce some kind of clipping.
Check Google, it will give you a lot of resources about NoiseGate algorithm and noise removal techniques:
http://en.wikipedia.org/wiki/Noise_gate
http://www.developer.com/java/other/article.php/3599661/Adaptive-Noise-Cancellation-using-Java.htm

Related

Max Amplitude from PCM Buffers - Audio Android

I am trying to find maximum amplitude value from PCM Buffer.
My questions are-
1) I found that to find this value in DB, formula is : amplDB=20log(abs(ampl)/32767). Now given that ampl is in range of -32768 to 32767, the value of log((abs)ampl/32767) would be always negative. So is this formula the correct one? Should I just negate the value of amplDB?
2) My values are coming very high. For normal song also the Maximum amplitude value is 32767, which doesn't seem correct. What are the usual amplitude values for a song?
3) I found another formula amplDb=ampl/2700. What is this 2700 for?
4) Is there any other way I can calculate the amplitude value?
Thanks

The formula you are using is correct. Keep in mind that dB is a perceptual measurement that compares an intensity with a reference level you set. Therefore, it makes sense that it is always negative since your reference level being used at the formula is the maximum PCM level. In other words, your dB will always be lower (negative), than your maximum level (0 dB).
Regarding the values you're obtaining, it is quite normal to obtain the maximum amplitude. If it is a commercial song, a common mastering practice is to boost the signal as much as possible. If it is a recording you made, it could have to do with the microphones sensitivity and the sounds you're recording.
Finally, just to be clear, this has nothing to do with the sound pressure levels at which the sound will happen upon playback, since you're only looking at the differences in amplitude of a recorded sound.

What does the content of the short[] array of AudioRecord.read() represent in Android [duplicate]

I am starting out with audio recording using my Android smartphone.
I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.
What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
Why are there negative values? I would think that complete silence would result in values of 0.
If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.

Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, that causes the air around your mouth to vibrate. Vibrations are spring like, and have movement in both directions, as in back and forth, or up and down, or in and out. The vibrations in the air cause the microphone surface to vibrate as well, as in move up and down. When it moves down, that might be measured or sampled a positive value. When it moves up that might be sampled as a negative value. (Or it could be the opposite.) When you stop talking the surface settles back down to the zero position.
What numbers you get from your PCM recording data depend on the gain of the system. With common 16 bit samples, the range is from -32768 to 32767 for the largest possible excursion of a vibration that can be recorded without distortion, clipping or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of distortion.
ADDED:
8-bit PCM audio is often an unsigned data type, with the range from 0..255, with a value of 128 indicating "silence". So you have to add/subtract this bias, as well as scale by about 256 to convert between 8-bit and 16-bit audio PCM waveforms.

The raw numbers are an artefact of the quantization process used to convert an analog audio signal into digital. It makes more sense to think of an audio signal as a vibration around 0, extending as far as +1 and -1 for maximum excursion of the signal. Outside that, you get clipping, which distorts the harmonics and sounds terrible.
However, computers don't work all that well in terms of fractions, so discrete integers from 0 to 65536 are used to map that range. In most applications like this, a +32767 is considered maximum positive excursion of the microphone's or speaker's diaphragm. There is no correlation between a sample point and a sound pressure level, unless you start factoring in the characteristics of the recording (or playback) circuits.
(BTW, 16-bit audio is very standard and widely used. It is a good balance of signal-to-noise ratio and dynamic range. 8-bit is noisy unless you do some funky non-standard scaling.)

Lots of good answers here, but they don't directly address your questions in an easy to read way.
What exactly are the units for the amplitude values? The values are
signed 16-bit, so they must range from
-32K to +32K. But what do these values represent? Decibels?
The values have no unit. They simply represent a number that has come out of an analog-to-digital converter. The numbers from the A/D converter are a function of the microphone and pre-amplifier characteristics.
If I use 8-bit values, then the values
must range from -128 to +128. How
would that get mapped to the
volume/"loudness" of the 16-bit
values? Would you just use a 16-to-1
quantisation mapping?
I don't understand this question. If you are recording 8-bit audio, your values will be 8-bits. Are you converting 8-bit audio to 16-bit?
Why are there negative values? I would
think that complete silence would
result in values of 0
The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work
For more details on how sound is represented digitally, see here.

Why are there negative values? I would think that complete silence
would result in values of 0
The diaphragm on a microphone vibrates in both directions and as a
result creates positive and negative voltages. A value of 0 is silence
as it indicates that the diaphragm is not moving. See how microphones
work
Small clarification: The position of the diaphragm is being recorded. Silence occurs when there is no vibration, when there is no change in position. So the vibration you are seeing is what is pushing the air and creating changes in air pressure over time. The air is no longer being pushed at the top and bottom peaks of any vibration, so the peaks are when silence occurs. The loudest part of the signal is when the position changes the fastest which is somewhere in the middle of the peaks. The speed with which the diaphragm moves from one peak to another determines the amount of pressure that's generated by the diaphragm. When the top and bottom peaks are reduced to zero (or some other number they share) then there is no vibration and no sound at all. Also as the diaphragm slows down so that there's a greater space of time between peaks, there is less sound pressure being generated or recorded.
I recommend the Yamaha Sound Reinforcement Handbook for more in depth reading. Understanding the idea of calculus would help the understanding of audio and vibration as well.

The 16bit numbers are the A/D convertor values from your microphone (you knew this). Know also that the amplifier between your microphone and the A/D convertor has an Automatic Gain Control (AGC). The AGC will actively change the amplification of the microphone signal to prevent too much voltage from hitting the A/D convertor (usually < 2Volts dc). Also, there is DC voltage de-coupling which sets the input signal in the middle of the A/D convertor's range (say 1Volt dc).
So, when there is no sound hitting the microphone, the AGC amplifier is sending a flat line 1.0 Volt dc signal to the A/D convertor. When sound waves hit the microphone, it creates a corresponding AC voltage wave. The AGC amp takes the AC voltage wave, centers it at 1.0 Vdc, and sends it to the A/D convertor. The A/D samples (measures the DC Voltage at say 44,000 / per second), and spits out the +/-16bit values of the voltage. So -65,536 = 0.0 Vdc and +65,536 = 2.0 Vdc. A value of +100 = 1.00001529 Vdc and -100 = 0.99998474 Vdc hitting the A/D convertor.
+Values are above 1.0 Vdc, -Values are below 1.0 Vdc.
Note, most audio systems use a log formula to curve the audio wave logarithmically, so a human ear can better hear it. In digital audio systems (with ADCs), Digital Signal Processing puts this curve on the signal. DSPs chips are big business, TI has made a fortune using them for all kinds of applications, not just audio processing. DSPs can work the very complicated math onto a real time stream of data that would choke an iPhone's ARM7 processor. Say you are sending 2MHz pulses to an array of 256 ultrasound sensor/receivers--you get the idea.

How to measure sound volume in dB scale Android

We are working on a cross-platform project that requires sound volume sampling on smartphones and analyse the result with as high accuracy as possible, the IPhone developer used iOS implemented functionality of returnning sound power/volume in dB scale calculated by the OS itself. as far as i know there is no equivalent functionality in Android OS.
as of now, I am working on Android with the MediaRecorder class given by the OS, and i use getMaxAmplitude to measure the sound power/volume, i have seen a lot of answers on the net regard how to transfer amplitude to dB scale, the answer that sounded most reasonable was using the formula :
20*Math.log10(amplitude/MAX_AMPLITUDE)
but then i must know what the MAX_AMPLITUDE that can be returned by getMaxAmplitude, thing is that it is diffrent on diffrent devices, for exemple i tested getMaxAmplitude on HTC Desire, and on Samsung Galaxy S3,
on HTC it was reaching 32767 (which i saw in some answer that is the documented max), and on the S3 it was not going beyond 16383 (half of the HTC).
Q1 :
is this(the approach discussed above) the correct approach? its just that I read that the correct way to measure sound power/volume is by calculating RSM and then convert it to dB, is this how its done on IPhone?
Q2 :
no metter if i use RSM or just the Amplitude from getMaxAmplitude, it seems to me that i still need to know whats the highest amplitude i can get from the record hardware, is there a way to know that? or is there a way to somehow go around it?

90dBspl is an rms value in the acoustic domain.
The digital level of 2500 rms in a 16bit system is the same as approximately -22dB FS rms (actually -22.35), where 0dBFS rms is a full scale square wave. A full scale sinusoidal in such a system is 0dBFS peak and -3dB FS rms (reaching from -32768 to +32767).
A square wave of +/-2500 can be calculated as:
20 * log ( 2500/32767) = -22.35 dB FS rms
Please note that peaks of sinusoidals are always 3dB higher than the rms level. The only signal that has the same rms and peak level is the square wave.
Now, Android has a requirement of 30dB linearity around 90dBspl, but this linearity shall be +12dB above 90dBspl and -18dB below the same point. Outside this range there can be compression in different ways, depending on which phone model you test.
The guaranteed highest linear level inside an Android phone is -22dBFS +12dB = -10dBFS rms. Above this level it is uncertain. The most common scenario is that the last 7dB of peak headroom are still linear, leading to an acoustic maximum level of 90dBspl + (22-3 dB) = 109dB spl rms for a sinusoidal without clipping (or 112 dB spl peak).
In some phones you will find a peak limiter that reduces the gain above 102dBspl rms. The outcome of this is that you can still record up to the level of saturation for the microphone. This saturation level varies, but it is common to have like 2% distortion at 120dB spl. Above this level the microphone component starts to saturate and clip.
Looking at the other end of the scale:
Small phone microphones are in general noisy. The latest microphones can have a noise floor at -63dB below 0dBPa (94dBspl), but most microphones are between -58 and -60dB below 0dBPa.
How can this be calculated to dBFS rms ?
0dBPa rms is 94dB spl rms. From the statement above we know that 90dBspl rms acoustic level will be recorded at the digital level of -22dBFS rms in Android phones. -63dB below 90dBspl is the same as -22dBFSrms +4dB -63dB = -81dBFSrms. The absolute maximum range of dynamics in a 16 bit system can be approximated to 96dB (or 93dB depending how you see it), so the noise level is at least 12dB above the quantization noise in the digital file.
This is a very important finding for video recording mode. Unfortunately many video applications in Android tend to have too high microphone gain in the recording. This leads to clipping when recording loud music concerts and similar situations. We also know that the microphone itself is good up to at least 120dB. So it would be a good idea for any audio system engineer to make a video recording mode that actually used the whole dynamic range of the microphone. This means that the gain should be set at least 8dB lower. It is always possible to change the rms level afterwards in a video recording if the sound is too soft, but if it is clipped, then you have damaged the recording forever.
So, my message to you programmers is to implement a video recording mode where the acoustic level of 90dB spl rms is recorded at -30dBFSrms or slightly below that. Any maximization can be done afterwards. In this way we could record rock concerts with much better sound. Doing automatic gain control does not help the sound quality. The dynamic range is often too big to be controlled automatically. You get a lot of pumping in the sound. It is better to implement two different video recording modes: Concert mode and speech mode. In speech mode (optimized for a talking person at 1m distance) the recording gain could be even higher than -22dBFSrms for 90dBspl. I would say -12dBFS rms for 90dBspl would be a suitable recording level. (speech at 1m distance has an rms level of approximately 57dB spl and peaks 20-30dB higher).
Björn Gröhn
Audio system engineer at Sony mobile Lund, Sweden

Audio output level in a form that can be converted to decibel

I need to find a way to get the current audio output volume while the phone is making noise on the headphones, this value will be converted to a decibel level. The android API does not appear to have any way of accessing a constant volume level other than a seemingly arbitrary volume setting level, but I dont see a way to convert that to a standard decibel level or "loudness" measurement. I have seen some ways to use the mic for this, but that wont work with headsets very well.
Does anyone know a way to measure either the maximum possible decibel (or some standard) output level to compare against, or possible the voltage being sent to the headset?
Help is welcomed.

Be aware that there are many different meanings of the word 'deciBel'. It is a means of representing some quantity (such as intensity/power/loudness) relative to a reference point. For audio signals inside equipment, or in an audio application, there is a peak level of 0dB. When sound is emitted from a speaker, the perceived loudness is measured as a Sound Pressure Level, often described as 'dB (SPL)' (or weighted variants such as dBA). When you see the tables of values such as rock concerts at 100dB then this is the SPL that is being described. This measurement is itself relative to a reference level.
So what will have available in the API is the buffer of audio data from which you can easily obtain the audio level in terms of the raw signal (which has a maximum of 0dB). You can't however easily convert this to a physical loudness because this will be dependent on the hardware. It will be different between one model of phone and the next, and will depend on the headphones too. The only way of doing this will be to calibrate the phone by measuring with an SPL meter, but then this will give you a result which will only give reasonable results on this particular phone.

I'm doing it like this:
SLmillibel gain_to_attenuation(float volume)
{
SLmillibel volume_mb;
if(volume>=1.0f) volume_mb=SL_MILLIBEL_MAX;
else if(volume<=0.02f) volume_mb=SL_MILLIBEL_MIN;
else
{
volume_mb=M_LN2/log(1.0f/(1.0f-volume))*-1000.0f;
if(volume_mb>0) volume_mb=SL_MILLIBEL_MIN;
}
return volume_mb;
}

How to calculate microphone audio input power in decibel unit

Please help me calculate decibels from phone microphone. The microphone has a getMaxAmplitude() function. How I can I use it to calculate decibels? I read in some forums that the decibel calculation formula is power_db = 20 * log10(amplitude / reference_amplitude). But I don't understand how to find the reference_amplitude.

In sound, decibel values are referenced to a sound pressure level of 20µPa (20 micro Pascal).
So in your case the reference_amplitude would be the amplitude generated by your microphone in the presence of a sound field with a level of 20µPa.
In practice, to find this level, microphones are often calibrated (using a microphone calibrator) with a signal of some precisely known level (often around 94dB). The amplitude resulting from this calibration signal can then be used to calculate the amplitude for the reference signal (assuming the response of the microphone is linear).

Decibels are a unit widely used to define some quantity relative to something else. There are a number of different types of decibel measurements, depending on what you're trying to describe about the signal you're receiving.
Read this link to get you started, it explains everything you need to know much better than I can!

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.