I am working on an Android experiment and I need some help. I want to play a sound and, as the phone moves, have the pitch of the sound get higher (e.g. move the phone up, the pitch goes up). The sounds will be different depending on the orientation of the device. But how can I make the pitch of the sound get higher or lower depending on the gyro / accelerometer movement of the device?
Thanks for any help :)
You have to divide your task into two separate objectives. One is how to obtain the reading from your phone; another is to manipulate the sound output. The first task is straightforward. Search online and find a library or native API to read the device orientation.
Changing the sound pitch is more interesting. First you'll need to structure your program so that you have full control over the output stream. This allows you to change the waveform on the fly. Again, pushing bytes to the I/O is the mechanical part. But first think about the physics of sound.
When you play a clip at 2x speed, the frequency doubles, so the pitch rises by an octave, but the tempo also doubles. Similarly, slowing down the clip brings the pitch down but makes the clip take longer to play. This may be what you want, but if you want to retain the tempo and only change the pitch, you'll have to perform some wave transformation. You may read more about Pitch Scaling.
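If the coupled pitch/tempo change is acceptable, the simplest route is to vary the playback rate. Below is a minimal sketch, assuming a looping clip in R.raw.tone (a hypothetical resource) and a rough mapping from the accelerometer's z axis to SoundPool's 0.5-2.0 rate range; the mapping constants are guesses, not a tested calibration.

import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.media.AudioManager;
import android.media.SoundPool;
import android.os.Bundle;

public class PitchDemo extends Activity implements SensorEventListener {
    private SoundPool pool;
    private int soundId;
    private int streamId;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        pool = new SoundPool(1, AudioManager.STREAM_MUSIC, 0);
        soundId = pool.load(this, R.raw.tone, 1);   // hypothetical looping clip
        SensorManager sm = (SensorManager) getSystemService(SENSOR_SERVICE);
        sm.registerListener(this, sm.getDefaultSensor(Sensor.TYPE_ACCELEROMETER),
                SensorManager.SENSOR_DELAY_GAME);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        if (streamId == 0) {
            // play() returns 0 until the clip has finished loading, so just retry
            streamId = pool.play(soundId, 1f, 1f, 1, -1 /* loop forever */, 1f);
        }
        if (streamId != 0) {
            // Map the z axis (roughly -10..+10 m/s^2 when tilting) onto SoundPool's
            // allowed playback-rate range of 0.5..2.0; 1.0 is the original pitch.
            float rate = Math.max(0.5f, Math.min(2.0f, 1.0f + event.values[2] / 20f));
            pool.setRate(streamId, rate);
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}

For true pitch scaling at constant tempo you would instead feed a pitch-shifted buffer to AudioTrack, which is considerably more work.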
Goal
Capture images with Android smartphones attached to moving vehicles
frequency: 1 Hz
reference model: Google Pixel 3a
objects of interest: the road/way in front of the vehicle
picture usage: as input for machine learning (like RNN) to identify damages on the road/way surfaces
capture environment: outdoor, only on cloudy days
Current state
Capturing works (currently using JPEG instead of RAW because of the data size)
auto-exposure works
static focus distance works
Challenge
The surfaces of the ways/roads in the pictures are often blurry
The motion blur comes mostly from the shaking vehicle / rigidly mounted phone
To reduce the motion blur we want to use a "Shutter Speed Priority Mode"
i.e. minimize shutter speed => increase ISO (accept increased noise)
there is only one aperture (f/1.8) available
there is no "Shutter Speed Priority Mode" (short: Tv/S-Mode) available in the Camera2 API
the CameraX API does not (yet) offer what we need (static focus, Tv/S Mode)
Steps
Set the shutter speed to the fastest exposure supported (easy)
Automatically adjust the ISO setting for auto-exposure (e.g. this formula; see the Camera2 sketch after this list)
To calculate the ISO the only missing part is the light level (EV)
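For reference, a minimal Camera2 sketch of the two steps above (fixed shutter, computed ISO), assuming chars and builder come from the usual camera setup code; computeIsoForShutter() is a placeholder for whatever EV-based calculation is chosen, manual sensor control requires the MANUAL_SENSOR capability, and imports from android.hardware.camera2 and android.util are omitted:

static void applyManualExposure(CameraCharacteristics chars, CaptureRequest.Builder builder) {
    Range<Long> exposureRange = chars.get(CameraCharacteristics.SENSOR_INFO_EXPOSURE_TIME_RANGE);
    long shutterNs = exposureRange.getLower();        // step 1: fastest supported exposure time

    int iso = computeIsoForShutter(shutterNs);        // step 2: placeholder EV-based calculation
    Range<Integer> isoRange = chars.get(CameraCharacteristics.SENSOR_INFO_SENSITIVITY_RANGE);
    iso = Math.max(isoRange.getLower(), Math.min(isoRange.getUpper(), iso));

    builder.set(CaptureRequest.CONTROL_AE_MODE, CameraMetadata.CONTROL_AE_MODE_OFF);
    builder.set(CaptureRequest.SENSOR_EXPOSURE_TIME, shutterNs);
    builder.set(CaptureRequest.SENSOR_SENSITIVITY, iso);
}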
Question
How can I estimate the EV continuously during capturing to adjust the ISO automatically while using a fixed shutter speed?
Ideas so far:
If I could read out the "recommendations" from the Camera2 auto-exposure (AE) routine without actually enabling AE_MODE_ON, then I could easily calculate the EV. However, I have not found an API for this so far. I guess it's not possible without rooting the device.
If the ambient light sensor provided all the information needed to auto-expose (calculate the EV), this would also be very easy. However, from my understanding it only measures the incident light, not the reflected light, so the measurement does not take the actual objects in the picture into account (how their surfaces reflect light).
If I could get the information from the pixels of the last captures, this would also be doable (if the calculation time fits into the time between two captures). However, from my understanding the pixel "brightness" is heavily dependent on the objects captured, i.e. if the brightness of the captured objects changes (many "black horses" or "white bears" at the side of the road/way) I would calculate bad EV values.
Capture auto-exposed images in between the actual captures and calculate the light levels from the auto-selected settings of those in-between captures, then use them for the actual captures. This would be a relatively "good" way from my understanding, but it's quite hard on the resources end - I am not sure if the time available between two captures is enough for this.
Maybe I don't see a simpler solution. Has anyone done something like this?
Yes, you need to implement your own auto-exposure algorithm.
All the 'real' AE has to go by is the image captured by the sensor as well, so in theory you can build something just as good at guessing the right light level.
In practice, it's unlikely you can match it, both because you have a longer feedback loop (the AE algorithm can cheat a bit on synchronization requirements and update sensor settings faster than an application can), and because the AE algorithm can use hardware statistics units (collect histograms and average values across the scene), which make it more efficient.
But a simple auto-exposure algorithm would be to average the whole scene (or a section of the scene, or every-tenth-pixel of the scene, etc) and if that average is below half max value, increase ISO, and if it's above, reduce. A basic feedback control loop, in other words. With all the issues about stability, convergence, etc, that apply. So a bit of control theory understanding can be quite helpful here. I'd recommend a low-resolution YUV output (640x480 maybe?) to an ImageReader from the camera to use as the source data, and then just look at the Y channel. Not a ton of data to churn through in that case.
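As a rough sketch of that loop (not a drop-in implementation): assume a low-resolution YUV_420_888 ImageReader, that currentIso / minIso / maxIso / backgroundHandler are fields set up elsewhere, and that the new currentIso is applied via SENSOR_SENSITIVITY on the next capture request. The gain step and the dead band around half of the maximum value are guesses to keep the loop stable; imports from android.media, android.graphics and java.nio are omitted.

ImageReader reader = ImageReader.newInstance(640, 480, ImageFormat.YUV_420_888, 2);
reader.setOnImageAvailableListener(r -> {
    Image img = r.acquireLatestImage();
    if (img == null) return;
    ByteBuffer y = img.getPlanes()[0].getBuffer();   // Y (luma) plane only
    long sum = 0;
    int n = 0;
    for (int i = 0; i < y.remaining(); i += 10) {    // every tenth sample is plenty
        sum += y.get(i) & 0xFF;
        n++;
    }
    img.close();
    double mean = (double) sum / n;                  // 0..255; row-stride padding ignored here
    // Crude proportional control toward half of the maximum value, with a dead band.
    if (mean < 118) {
        currentIso = Math.min(maxIso, (int) (currentIso * 1.15));
    } else if (mean > 138) {
        currentIso = Math.max(minIso, (int) (currentIso / 1.15));
    }
    // Re-issue the repeating capture request with the new SENSOR_SENSITIVITY here.
}, backgroundHandler);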
Or as hb0 mentioned, if you have a very limited set of outdoor conditions, you can try to hardcode values for each of them. But the range of outdoor brightness can be quite large, so this would require a decent bit of testing to make sure it'll work, plus manual selection of the right values each time.
When the pictures are only captured in specific light situations like "outdoor, cloudy":
Tabulated values can be used for the exposure value (EV) instead of using light measurements.
Example
EV100 (iso100) for Outdoor cloudy (OC) = 13
EV (dynamic iso) for OC = EV100 + log2(iso/100)
Using this formula together with those formulas, we can calculate the ISO from the following (a worked example follows the list):
aperture (fixed)
shutter speed (manually selected)
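A quick worked example, assuming the standard exposure equation EV = log2(N^2/t) (N = aperture, t = shutter speed in seconds); solving both formulas for the ISO gives:
iso = 100 * N^2 / (t * 2^EV100)
iso = 100 * 1.8^2 / ((1/1000) * 2^13) ≈ 40 for outdoor cloudy (EV100 = 13) at f/1.8 and 1/1000 s
In practice the result would be clamped to the sensitivity range reported by the sensor.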
Additionally, we could add a UI option to choose a "light situation" like:
outdoor, cloudy
outdoor, sunny
etc.
This is probably not the most accurate approach, but for now it is an easy way to continue prototyping.
We are working on a cross-platform project that requires sound volume sampling on smartphones and analysing the result with as high accuracy as possible. The iPhone developer used iOS's built-in functionality for returning sound power/volume on a dB scale, calculated by the OS itself. As far as I know there is no equivalent functionality in the Android OS.
As of now, I am working on Android with the MediaRecorder class provided by the OS, and I use getMaxAmplitude to measure the sound power/volume. I have seen a lot of answers on the net regarding how to convert amplitude to a dB scale; the answer that sounded most reasonable was using the formula:
20*Math.log10(amplitude/MAX_AMPLITUDE)
But then I must know the MAX_AMPLITUDE that can be returned by getMaxAmplitude. The thing is that it is different on different devices; for example, I tested getMaxAmplitude on an HTC Desire and on a Samsung Galaxy S3:
on the HTC it was reaching 32767 (which I saw in some answer is the documented max), and on the S3 it was not going beyond 16383 (half of the HTC).
Q1 :
Is this (the approach discussed above) the correct approach? It's just that I read that the correct way to measure sound power/volume is by calculating the RMS and then converting it to dB. Is this how it's done on iPhone?
Q2 :
No matter whether I use the RMS or just the amplitude from getMaxAmplitude, it seems to me that I still need to know the highest amplitude I can get from the recording hardware. Is there a way to know that? Or is there a way to somehow get around it?
90dBspl is an rms value in the acoustic domain.
The digital level of 2500 rms in a 16bit system is the same as approximately -22dB FS rms (actually -22.35), where 0dBFS rms is a full scale square wave. A full scale sinusoidal in such a system is 0dBFS peak and -3dB FS rms (reaching from -32768 to +32767).
A square wave of +/-2500 can be calculated as:
20 * log ( 2500/32767) = -22.35 dB FS rms
Please note that peaks of sinusoidals are always 3dB higher than the rms level. The only signal that has the same rms and peak level is the square wave.
Now, Android has a requirement of 30dB linearity around 90dBspl, but this linearity shall be +12dB above 90dBspl and -18dB below the same point. Outside this range there can be compression in different ways, depending on which phone model you test.
The guaranteed highest linear level inside an Android phone is -22dBFS +12dB = -10dBFS rms. Above this level it is uncertain. The most common scenario is that the last 7dB of peak headroom are still linear, leading to an acoustic maximum level of 90dBspl + (22-3 dB) = 109dB spl rms for a sinusoidal without clipping (or 112 dB spl peak).
In some phones you will find a peak limiter that reduces the gain above 102dBspl rms. The outcome of this is that you can still record up to the level of saturation for the microphone. This saturation level varies, but it is common to have like 2% distortion at 120dB spl. Above this level the microphone component starts to saturate and clip.
Looking at the other end of the scale:
Small phone microphones are in general noisy. The latest microphones can have a noise floor at -63dB below 0dBPa (94dBspl), but most microphones are between -58 and -60dB below 0dBPa.
How can this be calculated to dBFS rms ?
0dBPa rms is 94dB spl rms. From the statement above we know that 90dBspl rms acoustic level will be recorded at the digital level of -22dBFS rms in Android phones. -63dB below 90dBspl is the same as -22dBFSrms +4dB -63dB = -81dBFSrms. The absolute maximum range of dynamics in a 16 bit system can be approximated to 96dB (or 93dB depending how you see it), so the noise level is at least 12dB above the quantization noise in the digital file.
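To tie these numbers back to the original question, here is a small sketch (assuming 16-bit samples, e.g. from AudioRecord) that converts a measured digital rms level to dBFS rms and then to an estimated dB SPL using the nominal 90 dBspl = -22 dBFS rms point above; the +112 dB offset is nominal only, and real devices vary and compress as described.

static double estimateDbSpl(short[] samples, int count) {
    double sumSq = 0;
    for (int i = 0; i < count; i++) {
        sumSq += (double) samples[i] * samples[i];
    }
    double rms = Math.sqrt(sumSq / count);            // digital rms amplitude
    double dbFsRms = 20 * Math.log10(rms / 32768.0);  // 0 dBFS rms = full-scale square wave
    return dbFsRms + 112.0;                           // nominal mapping: -22 dBFS rms -> 90 dBspl
}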
This is a very important finding for video recording mode. Unfortunately many video applications in Android tend to have too high microphone gain in the recording. This leads to clipping when recording loud music concerts and similar situations. We also know that the microphone itself is good up to at least 120dB. So it would be a good idea for any audio system engineer to make a video recording mode that actually used the whole dynamic range of the microphone. This means that the gain should be set at least 8dB lower. It is always possible to change the rms level afterwards in a video recording if the sound is too soft, but if it is clipped, then you have damaged the recording forever.
So, my message to you programmers is to implement a video recording mode where the acoustic level of 90dB spl rms is recorded at -30dBFSrms or slightly below that. Any maximization can be done afterwards. In this way we could record rock concerts with much better sound. Doing automatic gain control does not help the sound quality. The dynamic range is often too big to be controlled automatically. You get a lot of pumping in the sound. It is better to implement two different video recording modes: Concert mode and speech mode. In speech mode (optimized for a talking person at 1m distance) the recording gain could be even higher than -22dBFSrms for 90dBspl. I would say -12dBFS rms for 90dBspl would be a suitable recording level. (speech at 1m distance has an rms level of approximately 57dB spl and peaks 20-30dB higher).
Björn Gröhn
Audio system engineer at Sony mobile Lund, Sweden
I am working on phone-recording software (Android) which records a conversation between two people on a phone call. The output of each phone call is an audio file which contains the sound from both the caller and the callee.
However, most of the time the voice from the phone that this software runs on is clearer than the other. Users ask me to make the two voices equally clear.
So the problem I have now is: I have a sound file containing voices from two sources at different volumes. What should I do to make the volume of the voices from those two sources equal, given that the noise should not be increased? Since this is a phone call, at any specific time only one person is speaking.
I see at least one straightforward solution: write a program that analyzes the waveform of the sound file, identifies the parts of the file coming from the source with the quieter voice, and amplifies them to a level that seems balanced with the other. However, this will not be easy to implement, and I also hope there is a better solution out there. Do you have any suggestions for me?
Thank you.
Well, the first thing to do is to get rid of all of the noise that you do not care about.
The spectrum that you would want to use is: 300 Hz to 3500 Hz
You can cut all of the other frequencies which would substantially cut your noise. You can then apply an autoequalization gain profile or even tap into the DSP profiles available on several devices.
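A very rough sketch of the band-limiting idea, using two cascaded first-order filters rather than a proper telephony band-pass; the sample rate is a parameter and the 300 Hz / 3500 Hz cutoffs are the ones mentioned above:

static void bandPassInPlace(short[] x, float sampleRate) {
    float dt = 1f / sampleRate;
    float rcHp = 1f / (2f * (float) Math.PI * 300f);   // ~300 Hz high-pass corner
    float rcLp = 1f / (2f * (float) Math.PI * 3500f);  // ~3500 Hz low-pass corner
    float aHp = rcHp / (rcHp + dt);
    float aLp = dt / (rcLp + dt);
    float prevIn = 0, hp = 0, lp = 0;
    for (int i = 0; i < x.length; i++) {
        float in = x[i];
        hp = aHp * (hp + in - prevIn);   // remove rumble below ~300 Hz
        prevIn = in;
        lp = lp + aLp * (hp - lp);       // remove hiss above ~3500 Hz
        x[i] = (short) Math.max(-32768f, Math.min(32767f, lp));
    }
}

A real implementation would use higher-order filters (e.g. biquads) for a steeper roll-off, but the idea is the same.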
I would also take a look at this whitepaper if you have a chance. (IEEE or ACM membership required).
An Auto-Equalization System Based on DirectShow Technology and Its Application in Audio Broadcast System of Radio Station
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5384659&contentType=Conference+Publications&searchWithin%3Dp_Authors%3A.QT.Bai+Xinyue.QT.
This is how I have solved this problem:
1. I decode the audio into a series of integer values, thanks to it being stored in WAV format.
The result is [xi]; 0 < xi < 255
2. Then I have to decide on two custom values:
- The noise threshold: if xi > threshold, it is not noise (pretty naive!)
- How long must a sound last to count as a chunk of human voice?
I chose 5 for the first value and 100 ms for the second.
3. My algorithm then groups [xi] into [Yi], where each Y is an array of x and each Y represents a chunk of human sound.
After that, I apply k-means with k=2 and get two different clusters of Y: one belongs to the person whose voice is louder and the other to the one with the softer voice.
4. What is left is pretty straightforward: I decide on a parameter M, multiply each x belonging to a Y of the softer voice by M, and I get the final result.
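A compact sketch of steps 3-4 (the clustering and scaling), assuming the chunks have already been extracted as float arrays; the iteration count and deriving M from the ratio of the cluster means are my assumptions, not necessarily what was done above:

static void balanceChunks(java.util.List<float[]> chunks) {
    int n = chunks.size();
    if (n == 0) return;
    double[] level = new double[n];
    for (int i = 0; i < n; i++) {
        double sum = 0;
        for (float s : chunks.get(i)) sum += Math.abs(s);
        level[i] = sum / chunks.get(i).length;        // mean absolute level of the chunk
    }
    // 1-D k-means with k = 2, initialised at the extremes
    double cLoud = level[0], cSoft = level[0];
    for (double v : level) {
        cLoud = Math.max(cLoud, v);
        cSoft = Math.min(cSoft, v);
    }
    int[] cluster = new int[n];                        // 1 = louder speaker, 0 = softer
    for (int iter = 0; iter < 20; iter++) {
        double sumLoud = 0, sumSoft = 0;
        int nLoud = 0, nSoft = 0;
        for (int i = 0; i < n; i++) {
            boolean loud = Math.abs(level[i] - cLoud) < Math.abs(level[i] - cSoft);
            cluster[i] = loud ? 1 : 0;
            if (loud) { sumLoud += level[i]; nLoud++; } else { sumSoft += level[i]; nSoft++; }
        }
        if (nLoud > 0) cLoud = sumLoud / nLoud;
        if (nSoft > 0) cSoft = sumSoft / nSoft;
    }
    // Step 4: amplify every chunk in the softer cluster by M (here the ratio of cluster means)
    float m = (float) (cLoud / Math.max(cSoft, 1e-9));
    for (int i = 0; i < n; i++) {
        if (cluster[i] == 0) {
            float[] c = chunks.get(i);
            for (int j = 0; j < c.length; j++) c[j] *= m;
        }
    }
}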
How do I calculate decibels from maxAmplitude? I wrote an Android application that reads maxAmplitude at regular intervals, and I need to show the output to the user in decibels.
Decibels are a relative unit, they express the power of your signal relative to some reference power.
If you are working with amplitudes, then the formula is:
power_db = 20 * log10(amp / amp_ref);
(See http://en.wikipedia.org/wiki/Decibel#Field_quantities).
Note also that maximum amplitude is not usually a very good indicator of loudness (or even of power). More typically, you should measure the RMS power of your signal, and convert that to dB instead.
Regular phone microphones aren't calibrated to measure absolute loudness, so it's not possible without also having a sound meter to initially calibrate the phone. As Oli mentions, you may be able to calculate a relative change in loudness, but I expect you want to replicate a real sound level meter.
I'm trying to build a gadget that detects pistol shots using Android. It's part of a training aid for pistol shooters that tells how the shots are distributed in time, and I use an HTC Tattoo for testing.
I use the MediaRecorder and its getMaxAmplitude method to get the highest amplitude during the last 1/100 s, but it does not work as expected; speech gives me values from getMaxAmplitude in the range from 0 to about 25000, while the pistol shots (or shouting!) only reach about 15000. With a sampling frequency of 8 kHz there should be some samples with a considerably higher level.
Does anyone know how these things work? Are there filters that are applied before registering the max amplitude? If so, is it hardware or software?
Thanks,
/George
It seems there's an AGC (Automatic Gain Control) filter in place. You should also be able to identify the shot by its frequency characteristics. I would expect it to show up across most of the audible spectrum, but get a spectrum analyzer (there are a few on the app market, like SpectralView) and try identifying the event by its frequency "signature" and amplitude. If you clap your hands, what do you get for max amplitude? You could also try covering the phone with something to muffle the sound, like a few layers of cloth.
It seems like the AGC is in the media recorder. When I use AudioRecord I can detect shots using the amplitude, even though it sometimes reacts to sounds other than shots. This is not a problem since the shooter usually doesn't make any other noise while shooting.
But I will do some FFT too to get it perfect :-)
Sounds like you figured out your AGC problem. One further suggestion: I'm not sure the FFT is the right tool for the job. You might get better detection and lower CPU use with a sliding power estimator.
e.g.
signal => square => moving average => peak detection
All of the above can be implemented very efficiently using fixed point math, which fits well with mobile android platforms.
You can find more info by searching for "Parseval's Theorem" and "CIC filter" (cascaded integrator comb).
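A fixed-point sketch of the square => moving average => peak detection chain, assuming 16-bit PCM samples from AudioRecord; the ~10 ms window and the power threshold are guesses that would need tuning per device:

static void detectShots(short[] samples, int sampleRate) {
    final int window = Math.max(1, sampleRate / 100);  // ~10 ms moving-average window
    final long threshold = 3000L * 3000L;              // power threshold; tune per device
    long[] squared = new long[samples.length];
    long sum = 0;
    boolean armed = true;
    for (int i = 0; i < samples.length; i++) {
        squared[i] = (long) samples[i] * samples[i];   // square = instantaneous power
        sum += squared[i];
        if (i >= window) sum -= squared[i - window];   // slide the window
        long avg = sum / Math.min(i + 1, window);      // moving average of the power
        if (armed && avg > threshold) {
            System.out.println("shot candidate at sample " + i);
            armed = false;                             // simple hysteresis: wait for decay
        } else if (!armed && avg < threshold / 2) {
            armed = true;
        }
    }
}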
Sorry for the late response; I didn't see this question until I started searching for a different problem...
I have started an application to do what I think you're attempting. It's an audio-based lap timer (button to start/stop recording, and loud audio noises for lap setting). It's not finished, but it might provide you with a decent base to get started.
Right now, it allows you to monitor the signal volume coming from the mic, and set the ambient noise amount. It's also using the new BSD license, so feel free to check out the code here: http://code.google.com/p/audio-timer/. It's set up to use the 1.5 API to include as many devices as possible.
It's not finished, in that it has two main issues:
The audio capture doesn't currently work for emulated devices because of the unsupported frequency requested
The timer functionality doesn't work yet - was focusing on getting the audio capture first.
I'm looking into the frequency support, but Android doesn't seem to have a way to find out which frequencies are supported without trial and error per-device.
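For what it's worth, the usual trial-and-error check looks something like the sketch below: ask getMinBufferSize for each candidate rate and treat an error code as "unsupported". The candidate list is an assumption.

static int findSupportedSampleRate() {
    int[] candidates = { 44100, 22050, 16000, 11025, 8000 };
    for (int rate : candidates) {
        int size = AudioRecord.getMinBufferSize(rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        if (size > 0) return rate;   // ERROR_BAD_VALUE / ERROR are negative
    }
    return -1;                       // nothing worked on this device
}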
I also have on my local dev machine some extra code to create a layout for the listview items to display "lap" information. Got sidetracked by the frequency problem though. But since the display and audio capture are pretty much done, using the system time to fill in the display values for timing information should be relatively straightforward, and then it shouldn't be too difficult to add the ability to export the data table to a CSV on the SD card.
Let me know if you want to join this project, or if you have any questions.