I have an audio recording app on the Android Market that records in PCM/WAV format.
My app also offers custom gain control ([-20 dB, +20 dB]), so I scale the original audio data by the user-selected gain value.
It works pretty well with the device's built-in mic, but I have a user with an external mic plugged into his device, and the output is too loud and full of distortion (because of the loudness of his external mic). Even when he sets the gain to -20 dB, the output is loud and distorted.
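For reference, the kind of per-sample gain scaling described above looks roughly like the sketch below, assuming 16-bit PCM in a short[] (the method is illustrative, not the app's actual code). Note the clamping step: without it, loud input at high gain wraps around numerically instead of merely clipping, which produces exactly the kind of harsh distortion described.

// Illustrative sketch: apply a gain in dB to 16-bit PCM samples.
// dB to linear factor: factor = 10^(dB/20), so +20 dB = 10x, -20 dB = 0.1x.
static void applyGain(short[] samples, float gainDb) {
    float factor = (float) Math.pow(10.0, gainDb / 20.0);
    for (int i = 0; i < samples.length; i++) {
        int v = (int) (samples[i] * factor);
        // Clamp to the 16-bit range; overflow would wrap around otherwise.
        if (v > Short.MAX_VALUE) v = Short.MAX_VALUE;
        if (v < Short.MIN_VALUE) v = Short.MIN_VALUE;
        samples[i] = (short) v;
    }
}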
I thought I should add AGC (automatic gain control) to the app for cases like this.
Now my question:
Does this AGC apply only when using the device's BUILT-IN mic, or does it also apply when an external mic is plugged into the handheld?
It's quite likely that the real problem is that his microphone is overdriving the input jack. If that is the case, software can't fix the problem: what the A/D converter sees is already hopelessly distorted.
Your client may need to add an attenuator (resistive voltage divider) to the input signal.
Also, if the input signal is asymmetric it may be necessary to couple through a series capacitor to block any DC component.
Doing a recording with no gain and examining the resulting waveform in an audio editor like Audacity would probably be informative.
(Normally I would not post something this speculative as an answer, but I was specifically asked to convert it to one from its original offering as a comment.)
I have a cross-platform (iOS and Android) app in which I record audio clips and send them to a server for some machine-learning operations. In the iOS app I use AVAudioRecorder for recording, and in the Android app I use MediaRecorder. On the device I initially use the m4a format because of size constraints; after a clip reaches the server I convert it to WAV format before using it in the ML operations.
My problem is that on iOS, AVAudioRecorder by default applies some amplification to the raw audio data before the developer gets access to it, whereas on Android, MediaRecorder applies no amplification at all. In other words, on iOS I never get the raw audio stream from the microphone, while on Android I only ever get the raw audio stream from the microphone. The distinction is clearly visible if you record the same audio source side by side on an iPhone and an Android phone and then import both recordings into Audacity for a visual comparison. I have attached a sample screenshot below.
In the image, the first track is the Android recording and the second track is the iOS recording. Listening to both through headphones I can only vaguely distinguish them, but when I visualize the data points the difference is clear. These distinctions are bad for the ML operations.
Clearly the iPhone applies some amplification factor, and I would like to implement the same on Android.
Is anyone aware of what the amplification factor is? Or are there any other possible alternatives?
It's quite possible that the difference is the effect of Automatic Gain Control (AGC).
You can disable this in your app's AVAudioSession by setting its mode to AVAudioSessionModeMeasurement, which you do once in your application, usually at startup. This disables a great deal of input signal processing.
Reading your problem description, you might be better off enabling AGC on Android.
If neither of these yields results, you might want to gain scale both signals so they are just below clipping.
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setMode(AVAudioSessionModeMeasurement)
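If you go the gain-scaling route on the Android side, a minimal sketch in Java, assuming 16-bit PCM in a short[] (the method name and the headroom figure are arbitrary choices, not an established API):

// Sketch: scale a 16-bit PCM buffer so its peak sits just below full scale.
static void normalizeToPeak(short[] samples, float headroom) { // headroom e.g. 0.95f
    int peak = 1; // avoid division by zero on an all-silent buffer
    for (short s : samples) {
        int abs = Math.abs((int) s);
        if (abs > peak) peak = abs;
    }
    float factor = headroom * Short.MAX_VALUE / peak;
    for (int i = 0; i < samples.length; i++) {
        int v = (int) (samples[i] * factor);
        if (v > Short.MAX_VALUE) v = Short.MAX_VALUE;
        if (v < Short.MIN_VALUE) v = Short.MIN_VALUE;
        samples[i] = (short) v;
    }
}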
I'm messing around in my app with a custom model for speech commands. I have recording and processing of input audio from an AudioRecord working fine, and I give feedback to the user through text-to-speech.
One issue I have is that I'd like this to work even when audio is playing, either through my own text-to-speech or through something else playing in the background (music, for instance). I realize this is going to be a non-trivial problem, but if I could get access in some way to the audio output data (what the phone is playing) and match it up with my microphone input data, I think I could at least adjust my model for this and improve my results.
However, based on Android - Can I get the audio data for playback from the audio mixer?, it sounds like that is impossible.
Two questions:
1) Is there any way that I'm missing to get access to the expected audio output/playback data through the Android API, or any options the Android API provides for dealing with this issue (the feedback loop between audio output and input)?
2) Short of stopping all other playback or waiting for other playback to finish, is there any other approach to this problem? I would assume calling apps have some way of dealing with this when the user is on speakerphone; I'm just missing how to do it myself.
Thanks
Answers to 1 & 2: You want AcousticEchoCanceler.
A short lecture on why "deleting the speaker audio from the microphone input" is a non-trivial task that takes substantial signal-processing knowledge: it's more complicated than just time-shifting the speaker audio a little bit and subtracting it from the mic input. The spectrum of the audio changes drastically even as it leaves the speaker (most tiny speakers have a very peaky response centered around 3-4 kHz). The audio may bounce off multiple objects (walls, etc.) before it gets back to the mic (multipath interference). Different frequency components interfere at the microphone in different, impossible-to-predict ways, vastly changing the spectrum of the audio. And by the way, if anything in the room moves, say you put your hand near the phone, everything changes. That is why you don't want to try to write your own echo-cancellation filter. Android has provided one for you, so you can write cool speakerphone apps and such.
I've searched everywhere, including the RootTools source. I can't find anything that manages the microphone, apart from muting it altogether. And there are no hints inside the AudioManager.setMicrophoneMute(boolean) method either...
There are a few posts about this issue, but none of them ever go anywhere (through no fault of the OPs). Is it (legally) possible to override the OS and get at the mic hardware directly, or something?
Thanks,
-tre
You can't directly set the recording volume, but you can change what you do with the byte data you get from AudioRecord (and look at the AudioTrack class for reducing the playback volume of a track).
Edit: I forgot to mention that if you're having trouble with volume spikes, you can look at automatic gain control. Some devices enable it automatically, but you can also enable it manually.
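Enabling it manually looks something like this (a sketch; availability varies by device, hence the checks):

import android.media.audiofx.AutomaticGainControl;

// audioRecord is your existing android.media.AudioRecord instance
if (AutomaticGainControl.isAvailable()) {
    AutomaticGainControl agc =
            AutomaticGainControl.create(audioRecord.getAudioSessionId());
    if (agc != null) {
        agc.setEnabled(true);
    }
}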
On occasion I get a phone or tablet with a poorly chosen microphone input volume. Android 4 had it adjustable, but the newer versions don't, which makes VOIP impossible to use. This isn't just for fancy audio recordings.
Does anyone have experience (using OpenSL ES, ALSA, etc.) with redirecting audio or creating new sound paths in Android? The end goal is to create a virtual microphone to replace the external microphone, so that one can play audio files as if they were being spoken into the microphone. Applications accessing the microphone with AudioSource.MIC should get this alternate stream. It doesn't need to work with voice calls; I believe achieving that sort of functionality is harder, as it's all done within the radio.
Any ideas on where to begin? I've done some research with OpenSL and ALSA, but it looks like I'll need to package new firmware (a ROM) in order to define custom audio paths. I'd like to create an application-level solution if that can be avoided. The phones are 'rooted' (have su binaries). The target device for this is the Samsung Galaxy S4 Google Edition (GT-i9505G); specifically, I'm looking for audio driver configurations / source code or any references for the i9505G.
Thanks in advance!
edit - I've checked out the CyanogenMod 10.2 source tree, along with the jfltexx drivers and kernel. Here are the contents of kernel/samsung/jf/sound: http://pastebin.com/7vK8THcZ. Is this documented anywhere?
I once implemented the functionality you're after on a phone based on Qualcomm's APQ8064 platform (which seems to be nearly the same platform as the one in your target device). Below is a summary of what I can recall from this, as I no longer have access to the code I wrote, or an environment where I can easily do these kinds of modifications. So if this answer reads like a mess of fragmentary memories, that's because that's exactly what it is.
This info may also apply more-or-less to other Qualcomm platforms (like the MSM8960 or MSM8974), but will most likely be completely useless for platforms from other vendors (NVidia Tegra, Samsung Exynos, TI OMAP, etc).
A brief note: with the method I used, the audio the recording application gets will have gone through mixing / volume control in the Android multimedia framework and/or the platform's multimedia DSP. So if you're playing something at 75% volume, recording it, and then playing back the recording at 75% volume, it might end up sounding pretty quiet. If you want to get unprocessed PCM data (after decoding, but before mixing / volume control) you'll have to look into some other approach, e.g. customizing AudioFlinger, but that's not something I've tried or can provide info on.
A few locations of interest:
The platform's audio drivers. Particularly the msm-pcm-routing.c file.
The ALSA UCM (Use-Case Manager) settings file. This is just an example UCM settings file. There are many variants of these files depending on the exact platform used, so yours may have a slightly different name (though it should start with snd_soc_msm_), and its contents will probably also differ slightly from the one I linked to.
NOTE for Kitkat and later: The UCM settings files were used on Jellybean (and possibly ICS). My understanding is that these settings have been moved to a file named mixer_paths.xml on Kitkat. The contents are pretty much the same, just in a different format.
The audio HAL code. The ALSA UCM is present in libalsa-intf, and the AudioHardware / AudioPolicyManager / ALSADevice code is present in audio-alsa. Note that this code is for Jellybean, since that's the latest version that I'm familiar with. The directory structure (and possibly some of the files / classes) differs on Kitkat.
If you open up the UCM settings file and search for "HiFiPROXY Rx" you'll find something like this:
SectionVerb
    Name "HiFiPROXY Rx"

    EnableSequence
        'AFE_PCM_RX Audio Mixer MultiMedia1':1:1
    EndSequence

    DisableSequence
        'AFE_PCM_RX Audio Mixer MultiMedia1':1:0
    EndSequence

    # ALSA PCMs
    CapturePCM 0
    PlaybackPCM 0
EndSection
This defines a verb (essentially the basis of an audio use-case; there are also modifiers that can be applied on top of verbs for stuff like simultaneous playback and recording) with the name "HiFiPROXY Rx" (the HiFi moniker is used for most non-voice-call verbs, PROXY refers to the audio device used, and Rx means output) and specifies which ALSA control(s) to write to, and what to write to them, when the use-case should be enabled / disabled. Finally it lists the ALSA PCM playback / capture devices to use in this use-case. For example, PlaybackPCM 0 means that playback device 0 should be used (the ALSA card is implied to be the one that represents the built-in hardware codec, which typically is card 0). These verbs are selected by the audio HAL based on the use-case (music playback, voice call, recording, ...), what accessories you've got attached, etc.
If you look up "AFE_PCM_RX Audio Mixer" in the msm_qdsp6_widgets table in msm-pcm-routing.c you'll see that it refers to a list of mixer controls named afe_pcm_rx_mixer_controls that looks like this:
static const struct snd_kcontrol_new afe_pcm_rx_mixer_controls[] = {
SOC_SINGLE_EXT("MultiMedia1", MSM_BACKEND_DAI_AFE_PCM_RX,
MSM_FRONTEND_DAI_MULTIMEDIA1, 1, 0, msm_routing_get_audio_mixer,
msm_routing_put_audio_mixer),
SOC_SINGLE_EXT("MultiMedia2", MSM_BACKEND_DAI_AFE_PCM_RX,
... and so on...
This lists the front end DAIs that you are allowed to connect to the back end DAI (AFE_PCM_RX). To get an idea of how these relate to one another, see these diagrams. AFE_PCM_RX and AFE_PCM_TX are a pair of DAIs on some of Qualcomm's platforms that implement a sort of dummy/proxy device. What you do is feed audio into AFE_PCM_RX, which then gets processed by the multimedia DSP (QDSP), and then you can read it back through AFE_PCM_TX. This is used to implement USB and WiFi audio routing, and also A2DP, IIRC.
Back to the AFE_PCM_RX Audio Mixer MultiMedia1 line: This says that you're feeding MultiMedia1 into the AFE_PCM_RX Audio Mixer. MultiMedia1 is used for normal playback/recording, and corresponds to pcmC0D0 (you should be able to list the devices on your phone with adb shell cat /proc/asound/devices). There are other front end DAIs, like MultiMedia3 and MultiMedia5 that are used in special cases like low-latency playback and low-power audio playback.
When you feed MultiMedia1 to the AFE_PCM_RX Audio Mixer everything you write to playback device 0 on card 0 will be fed into the AFE_PCM_RX back end DAI. To read it back you could set up a UCM verb that does something like 'MultiMedia1 Mixer AFE_PCM_TX':1:1, and then you'd read from pcmC0D0c (which should be the default ALSA capture device).
A simple test would be to pull the UCM settings file from your phone (should be located somewhere under /system/etc/) and amend the "HiFi" verb's EnableSequence with something like:
'AFE_PCM_RX Audio Mixer MultiMedia1':1:1
'AFE_PCM_RX Audio Mixer MultiMedia3':1:1
'AFE_PCM_RX Audio Mixer MultiMedia5':1:1
(and similarly in the DisableSequence, but with :1:0 at the end of each line).
Then go to the "Capture Music" modifier (this is the poorly named modifier for normal recording) and change SLIM_0_TX to AFE_PCM_TX.
Copy your modified UCM settings file back to the phone (requires root permission) and reboot the phone. Then start some playback (have a wired headset/headphones attached, and disable touch sounds so that the low-latency verb doesn't get selected), and start a recording from AudioSource.MIC. Afterwards, check the recording and see if you were able to record the playback audio. If not, then perhaps the low-power audio verb was selected, and you'll have to modify the "HiFi Low Power" verb similarly to what you did with the "HiFi" verb. It will help to have all the debug prints enabled in the audio HAL (i.e. uncomment #define LOG_NDEBUG 0 in all the cpp files where you can find it) so that you can see which UCM verbs / modifiers get selected.
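The round trip for editing the settings file is roughly the following (the exact file name varies per device, as noted above; <variant> is a placeholder):

adb root
adb remount
adb pull /system/etc/snd_soc_msm_<variant> .
# edit the file locally, then:
adb push snd_soc_msm_<variant> /system/etc/
adb reboot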
The modification I described above gets a bit tedious since you have to cover all the MultiMedia front end DAIs for all relevant verbs and modifiers.
IIRC, I was able to simplify this into just a single line per verb/modifier:
'AFE_PCM_RX Port Mixer SLIM_0_RX':1:1
If you look at the "HiFi", "HiFi Low Power", "HiFi Lowlatency" verbs you'll see that they all use the SLIMBUS_0_RX back end DAI, so I'm taking advantage of that by using the AFE_PCM_RX Port Mixer which lets me set up a connection from a back end DAI to another back end DAI. If you look at the afe_pcm_rx_port_mixer_controls and intercon tables in msm-pcm-routing.c you'll notice that there's no SLIM_0_RX entry for AFE_PCM_RX Port Mixer, so you'd have to add those yourself (it's just a matter of copy-pasting some of the existing lines and changing the names).
Some of the other changes you'd probably have to make:
In frameworks/base and frameworks/av (e.g. AudioManager, AudioService, AudioSystem) you'd have to add a new AudioSource constant and make sure that it gets recognized in all the necessary places.
In the UCM settings file you'd have to add some new verbs / modifiers to set up the ALSA controls properly when your new AudioSource is used.
In the audio HAL you'd have to make some changes so that your new verbs / modifiers get selected when your new AudioSource is used. Note that there's a base class of AudioPolicyManagerALSA called AudioPolicyManagerBase which you also might have to modify (it's located elsewhere in the source tree).
I wrote an app that records audio, and everything works. However, I'm going to be using this app to record classroom notes. How can I boost the microphone input to better capture all of the sound? I wouldn't mind using root if I must, but I wasn't sure if there was an API to do this.
Thanks all for reading!
If you are asking how to make the microphone itself more sensitive, I'm not sure that can be done. It would involve either operating the microphone at a higher voltage and/or hacking the drivers, neither of which is doable programmatically, AFAIK. However, you could try amplifying the recorded data by multiplying it by some value (say 1.1 for a 10% volume boost). Of course, the more you "amplify" the signal, the more you will saturate it (i.e. distort the audio). There are some signal-processing techniques you can try for removing background noise and isolating the particular audio of interest; however, these are merely processing improvements, not hardware upgrades. You can always try plugging an external microphone into the headphone jack and using that to record the audio.
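As one crude illustration of the signal-processing ideas mentioned above, a simple noise gate just silences quiet samples (a sketch, assuming 16-bit PCM in a short[]; the threshold is arbitrary, and this is no substitute for real noise reduction):

// Sketch: a very crude noise gate. Samples quieter than the threshold are
// zeroed; anything louder passes through unchanged.
static void noiseGate(short[] samples, int threshold) { // threshold e.g. ~500
    for (int i = 0; i < samples.length; i++) {
        if (Math.abs(samples[i]) < threshold) {
            samples[i] = 0;
        }
    }
}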
I know this isn't the answer you were hoping for, but I hope it helps.