Max Amplitude from PCM Buffers - Audio Android


I am trying to find the maximum amplitude value in a PCM buffer.
My questions are:
1) I found that the formula for this value in dB is amplDB = 20 * log10(abs(ampl) / 32767). Given that ampl is in the range -32768 to 32767, the value of log10(abs(ampl) / 32767) would always be negative (or zero). So is this formula the correct one? Should I just negate the value of amplDB?
2) My values are coming out very high. Even for an ordinary song the maximum amplitude value is 32767, which doesn't seem correct. What are the usual amplitude values for a song?
3) I found another formula, amplDb = ampl / 2700. What is the 2700 for?
4) Is there any other way I can calculate the amplitude value?
Thanks

The formula you are using is correct. Keep in mind that dB is a relative, logarithmic measure that compares a level with a reference level you set. Therefore, it makes sense that the result is never positive, since the reference level used in the formula is the maximum PCM level. In other words, your dB value will always be at or below your maximum level (0 dB), so there is no need to negate it.
Regarding the values you're obtaining, it is quite normal to reach the maximum amplitude. If it is a commercial song, a common mastering practice is to boost the signal as much as possible. If it is a recording you made, it could have to do with the microphone's sensitivity and the sounds you're recording.
Finally, just to be clear, this has nothing to do with the sound pressure levels at which the sound will happen upon playback, since you're only looking at the differences in amplitude of a recorded sound.
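As a rough illustration of the formula discussed above, here is a minimal Java sketch that scans a 16-bit PCM buffer for its peak and converts it to dB relative to full scale. The class and method names (PcmPeak, peakAmplitude, toDbfs) are made up for this example, and it assumes the buffer holds signed 16-bit samples such as those delivered by AudioRecord.read().

```java
// Hypothetical helper, not part of the Android API.
final class PcmPeak {
    /** Returns the largest absolute sample value in buffer[0..length), in the range 0..32768. */
    static int peakAmplitude(short[] buffer, int length) {
        int peak = 0;
        for (int i = 0; i < length; i++) {
            // The short is widened to int before abs(), so -32768 becomes 32768 without overflow.
            int v = Math.abs((int) buffer[i]);
            if (v > peak) {
                peak = v;
            }
        }
        return peak;
    }

    /** Converts an amplitude to dB with 32767 as the 0 dB reference; silence maps to -Infinity. */
    static double toDbfs(int amplitude) {
        if (amplitude <= 0) {
            return Double.NEGATIVE_INFINITY;
        }
        int clamped = Math.min(amplitude, 32767); // treat -32768 as full scale
        return 20.0 * Math.log10(clamped / 32767.0);
    }
}
```

A full-scale sample gives 0 dB, half of full scale gives roughly -6 dB, and everything else is negative, which matches the explanation above.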

Related

FFT frequency bucket amplitude varies even with constant tone applied

I am trying to use FFT to decode morse code, but I'm finding that when I examine the resulting frequency bin/bucket I'm interested in, the absolute value is varying quite significantly even when a constant tone is presented. This makes it impossible for me to use the rise and fall around a threshold and therefore decode audio morse.
I've even tried the simple example that seems to be copied everywhere, but it also varies...
I can't work out what I'm doing wrong, and my maths is not clever enough to understand all the formulas associated with FFT.
I know it must be possible, but I can't find out how... can anyone help please?

Make sure you are using the magnitude of the FFT result, not just the real or imaginary component of a complex result.
In general, when a longer constant-amplitude sinusoid is fed to a series of shorter FFTs (a windowed STFT), the magnitude result will only be constant if the FFT length spans an exact integer number of periods of the sinusoid, e.g.
f_tone modulo (f_sampling_rate / FFT_length) == 0
If you are only interested in the magnitude of one selected tone frequency, the Goertzel algorithm would serve as a more efficient filter than a full FFT. And, depending on the setup and length restrictions required by your chosen FFT library, it may be easier to vary the length of a Goertzel to match the requirements for your target tone frequency, as well as the time/frequency resolution trade-off needed.
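For reference, a minimal sketch of the Goertzel algorithm mentioned above, in Java. The class name and parameters are assumptions for illustration; it takes signed 16-bit samples and returns the magnitude at a single target frequency over one block.

```java
// Hypothetical single-frequency detector based on the Goertzel recurrence.
final class Goertzel {
    /** Magnitude of the targetHz component in samples[0..n). */
    static double magnitude(short[] samples, int n, double targetHz, double sampleRateHz) {
        double omega = 2.0 * Math.PI * targetHz / sampleRateHz;
        double coeff = 2.0 * Math.cos(omega);
        double s1 = 0.0; // state one sample back
        double s2 = 0.0; // state two samples back
        for (int i = 0; i < n; i++) {
            double s0 = samples[i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        // Squared magnitude from the final two state values.
        double power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
        return Math.sqrt(Math.max(power, 0.0));
    }
}
```

If n is chosen so that targetHz is an integer multiple of sampleRateHz / n (for example 800 Hz at 8000 Hz with n = 200), a steady tone of amplitude A gives a magnitude close to A * n / 2 for every block, which makes thresholding for Morse decoding more practical.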

What does the content of the short[] array of AudioRecord.read() represent in Android [duplicate]

I am starting out with audio recording using my Android smartphone.
I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.
What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
Why are there negative values? I would think that complete silence would result in values of 0.
If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.

Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, that causes the air around your mouth to vibrate. Vibrations are spring-like, and have movement in both directions, as in back and forth, or up and down, or in and out. The vibrations in the air cause the microphone surface to vibrate as well, as in move up and down. When it moves down, that might be measured or sampled as a positive value. When it moves up, that might be sampled as a negative value. (Or it could be the opposite.) When you stop talking, the surface settles back down to the zero position.
What numbers you get from your PCM recording data depend on the gain of the system. With common 16 bit samples, the range is from -32768 to 32767 for the largest possible excursion of a vibration that can be recorded without distortion, clipping or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of distortion.
ADDED:
8-bit PCM audio is often an unsigned data type, with the range from 0..255, with a value of 128 indicating "silence". So you have to add/subtract this bias, as well as scale by about 256 to convert between 8-bit and 16-bit audio PCM waveforms.
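A minimal Java sketch of that bias-and-scale conversion, assuming unsigned 8-bit samples with 128 as silence and signed 16-bit samples. The class and method names are invented for this example.

```java
// Hypothetical converters between unsigned 8-bit and signed 16-bit PCM.
final class PcmConvert {
    /** Unsigned 8-bit (0..255, silence at 128) to signed 16-bit. */
    static short u8ToS16(byte sample) {
        int unsigned = sample & 0xFF;            // Java bytes are signed, so mask to 0..255
        return (short) ((unsigned - 128) << 8);  // remove the bias, then scale by 256
    }

    /** Signed 16-bit to unsigned 8-bit, returned as a byte holding the bits 0..255. */
    static byte s16ToU8(short sample) {
        return (byte) ((sample >> 8) + 128);     // scale down by 256, then add the bias
    }
}
```

Silence (128) maps to 0, and the extremes 0 and 255 map to -32768 and 32512. The low 8 bits are simply discarded on the way down; dithering is out of scope for this sketch.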

The raw numbers are an artefact of the quantization process used to convert an analog audio signal into digital. It makes more sense to think of an audio signal as a vibration around 0, extending as far as +1 and -1 for maximum excursion of the signal. Outside that, you get clipping, which distorts the harmonics and sounds terrible.
However, computers don't work all that well in terms of fractions, so 65,536 discrete integer values are used to map that range (-32768 to +32767 for signed 16-bit samples). In most applications like this, +32767 is considered the maximum positive excursion of the microphone's or speaker's diaphragm. There is no correlation between a sample point and a sound pressure level, unless you start factoring in the characteristics of the recording (or playback) circuits.
(BTW, 16-bit audio is very standard and widely used. It is a good balance of signal-to-noise ratio and dynamic range. 8-bit is noisy unless you do some funky non-standard scaling.)

Lots of good answers here, but they don't directly address your questions in an easy to read way.
"What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?"
The values have no unit. They simply represent a number that has come out of an analog-to-digital converter. The numbers from the A/D converter are a function of the microphone and pre-amplifier characteristics.
"If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?"
I don't understand this question. If you are recording 8-bit audio, your values will be 8-bits. Are you converting 8-bit audio to 16-bit?
"Why are there negative values? I would think that complete silence would result in values of 0"
The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work
For more details on how sound is represented digitally, see here.

"Why are there negative values? I would think that complete silence would result in values of 0"
"The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work"
Small clarification: The position of the diaphragm is being recorded. Silence occurs when there is no vibration, when there is no change in position. So the vibration you are seeing is what is pushing the air and creating changes in air pressure over time. The air is no longer being pushed at the top and bottom peaks of any vibration, so the peaks are when silence occurs. The loudest part of the signal is when the position changes the fastest which is somewhere in the middle of the peaks. The speed with which the diaphragm moves from one peak to another determines the amount of pressure that's generated by the diaphragm. When the top and bottom peaks are reduced to zero (or some other number they share) then there is no vibration and no sound at all. Also as the diaphragm slows down so that there's a greater space of time between peaks, there is less sound pressure being generated or recorded.
I recommend the Yamaha Sound Reinforcement Handbook for more in depth reading. Understanding the idea of calculus would help the understanding of audio and vibration as well.

The 16-bit numbers are the A/D converter values from your microphone (you knew this). Know also that the amplifier between your microphone and the A/D converter has an Automatic Gain Control (AGC). The AGC will actively change the amplification of the microphone signal to prevent too much voltage from hitting the A/D converter (usually < 2 V DC). Also, there is DC voltage de-coupling which sets the input signal in the middle of the A/D converter's range (say 1 V DC).
So, when there is no sound hitting the microphone, the AGC amplifier is sending a flat-line 1.0 V DC signal to the A/D converter. When sound waves hit the microphone, it creates a corresponding AC voltage wave. The AGC amp takes the AC voltage wave, centers it at 1.0 V DC, and sends it to the A/D converter. The A/D samples the signal (measures the voltage, say, 44,100 times per second) and outputs the voltage as signed 16-bit values. So -32,768 = 0.0 V DC and +32,767 ≈ 2.0 V DC. Each step is about 30.5 µV, so a value of +100 ≈ 1.00305 V DC and -100 ≈ 0.99695 V DC at the A/D converter.
+Values are above 1.0 Vdc, -Values are below 1.0 Vdc.
Note, most audio systems use a log formula to curve the audio wave logarithmically, so a human ear can better hear it. In digital audio systems (with ADCs), Digital Signal Processing puts this curve on the signal. DSP chips are big business; TI has made a fortune using them for all kinds of applications, not just audio processing. DSPs can apply very complicated math to a real-time stream of data that would choke an iPhone's ARM7 processor. Say you are sending 2 MHz pulses to an array of 256 ultrasound sensor/receivers: you get the idea.

Audio output level in a form that can be converted to decibel

I need to find a way to get the current audio output volume while the phone is playing sound through the headphones; this value will be converted to a decibel level. The Android API does not appear to have any way of accessing an actual volume level other than a seemingly arbitrary volume setting, and I don't see a way to convert that to a standard decibel level or "loudness" measurement. I have seen some ways to use the mic for this, but that won't work very well with headsets.
Does anyone know a way to measure either the maximum possible decibel (or some standard) output level to compare against, or possibly the voltage being sent to the headset?
Help is welcomed.

Be aware that there are many different meanings of the word 'decibel'. It is a means of representing some quantity (such as intensity/power/loudness) relative to a reference point. For audio signals inside equipment, or in an audio application, there is a peak level of 0 dB. When sound is emitted from a speaker, the loudness is measured as a Sound Pressure Level, often described as 'dB (SPL)' (or weighted variants such as dBA). When you see tables of values such as rock concerts at 100 dB, this is the SPL that is being described. This measurement is itself relative to a reference level.
So what you will have available in the API is the buffer of audio data, from which you can easily obtain the audio level in terms of the raw signal (which has a maximum of 0 dB). You can't, however, easily convert this to a physical loudness because this will be dependent on the hardware. It will be different between one model of phone and the next, and will depend on the headphones too. The only way of doing this is to calibrate the phone by measuring with an SPL meter, but the calibration will then only give reasonable results on that particular phone and headphone combination.

I'm doing it like this:
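#include <math.h>
#include <SLES/OpenSLES.h>

/* Map a linear volume in [0, 1] to an OpenSL ES level in millibels. */
SLmillibel gain_to_attenuation(float volume)
{
    SLmillibel volume_mb;
    if (volume >= 1.0f) volume_mb = SL_MILLIBEL_MAX;
    else if (volume <= 0.02f) volume_mb = SL_MILLIBEL_MIN;
    else
    {
        volume_mb = M_LN2 / log(1.0f / (1.0f - volume)) * -1000.0f;
        /* The formula only yields negative values, so a positive result
           means the 16-bit conversion overflowed; treat it as silence. */
        if (volume_mb > 0) volume_mb = SL_MILLIBEL_MIN;
    }
    return volume_mb;
}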

How to best determine volume of a signal?

I want to determine the volume of an audio signal.
I have found two options:
1. Compute the root mean square (RMS) of the amplitude
2. Find the maximum amplitude
Are there advantages to using #1 or #2?
Here is what I am trying to do:
I want my Android to analyze audio from the microphone. I want the device to detect a loud noise. The input is a short [].

If you use the maximum amplitude (2), then your volume level would be determined by a single sample (which you might not even be able to hear). When calculating a value that correlates with your impression of the loudness of the sound such as the Sound Pressure Level or the Sound Power Level you need to use the RMS (1).
Because your ear is not equally sensitive to all frequencies, a better correlate of your perception can be had by applying an A-weighting to the signal. Split (filter) the signal into octave bands, calculate the RMS for each band, and apply the A-weighting.
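To make the RMS option concrete, here is a minimal Java sketch for a short[] buffer. It does no A-weighting, the names PcmRms, rms and isLoud are hypothetical, and the threshold is an assumption you would tune for your device.

```java
// Hypothetical RMS-based loudness check for signed 16-bit PCM.
final class PcmRms {
    /** RMS of buffer[0..length), with samples normalized to the range -1..1. */
    static double rms(short[] buffer, int length) {
        if (length <= 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < length; i++) {
            double s = buffer[i] / 32768.0;
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / length);
    }

    /** True if the RMS level, in dB relative to full scale, exceeds thresholdDbfs. */
    static boolean isLoud(short[] buffer, int length, double thresholdDbfs) {
        double level = rms(buffer, length);
        if (level <= 0.0) {
            return false; // digital silence
        }
        return 20.0 * Math.log10(level) > thresholdDbfs;
    }
}
```

A single full-scale sample in an otherwise silent block of 1000 samples has a peak of 0 dB but an RMS of only about -30 dB, which is why the peak alone tends to trigger on inaudible spikes.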

If you want to check the volume level, just compute its dB value (I assume the signal is normalized, i.e. 1 == maximum level):
level[n] = -20 * log10(1 / abs(signal[n]));
However, detecting audio noise is not a trivial task. The most common and simple technique is an algorithm called a noise gate, which basically compares the signal level with some dB threshold value: if the signal level is below the threshold, the output is zeroed. On its own this is unusable in practice; there must also be attack and release times to smooth the thresholding, otherwise it would also affect the real signal (music, speech) and produce a kind of clipping.
Check Google, it will give you a lot of resources about NoiseGate algorithm and noise removal techniques:
http://en.wikipedia.org/wiki/Noise_gate
http://www.developer.com/java/other/article.php/3599661/Adaptive-Noise-Cancellation-using-Java.htm
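As a very rough illustration of a gate with attack and release smoothing, here is a minimal Java sketch. It is not taken from the linked articles: the class name, the one-pole smoothing and all parameters are assumptions, and it works on samples normalized to -1..1.

```java
// Hypothetical noise gate: mutes the signal while it stays below a threshold.
final class NoiseGate {
    private final double threshold;    // linear amplitude, derived from the dB threshold
    private final double attackCoeff;  // smoothing factor while the gate opens
    private final double releaseCoeff; // smoothing factor while the gate closes
    private double gain = 0.0;         // current gate gain, 0 (closed) to 1 (open)

    NoiseGate(double thresholdDb, double attackMs, double releaseMs, double sampleRateHz) {
        threshold = Math.pow(10.0, thresholdDb / 20.0);
        attackCoeff = Math.exp(-1.0 / (attackMs * 0.001 * sampleRateHz));
        releaseCoeff = Math.exp(-1.0 / (releaseMs * 0.001 * sampleRateHz));
    }

    /** Applies the gate in place to samples normalized to -1..1. */
    void process(float[] samples) {
        for (int i = 0; i < samples.length; i++) {
            // Open the gate when the sample reaches the threshold, close it otherwise.
            double target = Math.abs(samples[i]) >= threshold ? 1.0 : 0.0;
            double coeff = target > gain ? attackCoeff : releaseCoeff;
            // One-pole smoothing of the gain toward the target avoids abrupt switching.
            gain = target + coeff * (gain - target);
            samples[i] *= (float) gain;
        }
    }
}
```

Comparing each sample against the threshold is a simplification; a more careful version would track a short-term RMS or peak envelope instead, so the gate does not start closing at every zero crossing.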

How to calculate microphone audio input power in decibel unit

Please help me calculate decibels from the phone microphone. The microphone has a getMaxAmplitude() function. How can I use it to calculate decibels? I read in some forums that the decibel calculation formula is power_db = 20 * log10(amplitude / reference_amplitude). But I don't understand how to find the reference_amplitude.

In acoustics, decibel values (dB SPL) are referenced to a sound pressure of 20 µPa (20 micropascals).
So in your case the reference_amplitude would be the amplitude generated by your microphone in the presence of a sound field with a pressure of 20 µPa.
In practice, to find this level, microphones are often calibrated (using a microphone calibrator) with a signal of some precisely known level (often around 94 dB SPL, which corresponds to 1 Pa). The amplitude resulting from this calibration signal can then be used to calculate the amplitude for the reference signal (assuming the response of the microphone is linear).
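The calibration step can be sketched in Java as follows. This assumes a linear microphone response, as stated above, and that you have measured the amplitude (for example the value from getMaxAmplitude()) while a calibrator of known level was playing. The class and method names are hypothetical.

```java
// Hypothetical calibration helper for converting amplitudes to dB SPL.
final class SplCalibration {
    /**
     * Amplitude the microphone would report at 0 dB SPL (20 µPa), derived from the
     * amplitude measured at a known calibration level, assuming a linear response.
     */
    static double referenceAmplitude(double amplitudeAtCal, double calLevelDb) {
        return amplitudeAtCal / Math.pow(10.0, calLevelDb / 20.0);
    }

    /** power_db = 20 * log10(amplitude / reference_amplitude), as in the question. */
    static double toDbSpl(double amplitude, double referenceAmplitude) {
        return 20.0 * Math.log10(amplitude / referenceAmplitude);
    }
}
```

For instance, if the calibrator at 94 dB produced an amplitude of 5000, then an amplitude of 500 on the same device would correspond to about 74 dB. The result is only meaningful for the device that was calibrated, and only if automatic gain control is not changing the gain between measurements.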

Decibels are a unit widely used to define some quantity relative to something else. There are a number of different types of decibel measurements, depending on what you're trying to describe about the signal you're receiving.
Read this link to get you started, it explains everything you need to know much better than I can!
