I'm messing around in my app with a custom model for speech commands. I have it working fine recording and processing input audio from an AudioRecord, and I give feedback to the user through text-to-speech.
One issue I have is that I'd like this to work even when audio is playing, either through my own text-to-speech or through something else playing in the background (music, for instance). I realize this is going to be a non-trivial problem, but if I could get access in some way to the audio output data (what the phone is playing) and match that up with my microphone input data, I think I can at least adjust my model for this and improve my results.
However, based on "Android - Can I get the audio data for playback from the audio mixer?", it sounds like that is impossible.
Two questions:
1) Is there any way that I'm missing to get access to the expected audio output/playback data through the Android API, or any options the Android API provides for dealing with this issue (the feedback loop between audio output and input)?
2) Outside of stopping all other playback or waiting for other playback to finish, is there any other approach to solve this problem? I would assume some calling apps have a way of dealing with this when the user is on speakerphone; I'm just missing how to do it myself.
Thanks
Answers to 1 & 2: You want AcousticEchoCanceler.
A short lecture on why "deleting the speaker audio from the microphone input" is a non-trivial task that takes substantial signal processing knowledge: It's more complicated than just time-shifting the speaker audio a little bit and subtracting it from the mic input. The fact is, the spectrum of the audio changes drastically even as it leaves the speaker (most tiny speakers have a very peaky response centered around 3-4 kHz). The audio may bounce off multiple objects (walls, etc.) before it gets back to the mic (multipath interference). Different frequency components interfere at the microphone in different, impossible-to-predict ways, vastly changing the spectrum of the audio. And by the way -- if anything in the room moves, say, if you put your hand near the phone -- everything changes. That is why you don't want to try to write your own echo cancellation filter. Android has provided one for you, so you can write cool speakerphone apps and such.
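A minimal sketch of attaching the platform echo canceller to an AudioRecord session. The sample rate, buffer size and the VOICE_COMMUNICATION source are my assumptions here (VOICE_COMMUNICATION is meant to route through the platform's voice-processing path), not anything specific to your setup:

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AcousticEchoCanceler;

// Requires the RECORD_AUDIO permission.
int sampleRate = 16000; // placeholder; use whatever your model expects
int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.VOICE_COMMUNICATION,
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);

// The effect attaches to the recording session, if the device supports it.
if (AcousticEchoCanceler.isAvailable()) {
    AcousticEchoCanceler aec = AcousticEchoCanceler.create(record.getAudioSessionId());
    if (aec != null) {
        aec.setEnabled(true);
    }
}
record.startRecording();

Whether the canceller is implemented in hardware or software (and how well it works) varies by device, so check isAvailable() and the create() result rather than assuming it's there.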
Related
I have a cross-platform (iOS and Android) app where I record audio clips and then send them to the server to do some machine learning operations. In my iOS app, I use AVAudioRecorder for recording the audio. In the Android app, I use MediaRecorder for recording the audio. On the mobile side I initially use the m4a format because of size constraints. After reaching the server I convert it to wav format before using it in the ML operations.
My problem is, in iOS the AVAudioRecorder by OS default applies a factor of amplification to the raw audio data before we developers get access to it. But in Android, the MediaRecorder doesn't apply any sort of default amplification to the raw data. In other words, in iOS I will never get the raw audio stream from the microphone, whereas in Android I will always only get the raw audio stream from the microphone. The distinction is clearly visible if you record the same audio on both an iPhone and an Android phone side by side with a common audio source, then import the recordings into Audacity for visual representation. I have attached a sample representation screenshot below.
In the image, the first track is the Android recording and the second track is from the iOS recording. When I hear both the audio through headphones I can vaguely distinguish them but when I visualize the data points, you can clearly see the difference in the image. These distinctions are bad for ML operations.
Clearly on the iPhone there is a certain amplification factor involved, which I would like to implement in Android as well.
Is anyone aware of the amplification factor? OR are there any other possible alternatives?
It's quite possible that the difference is the effect of Automatic Gain Control.
You can disable this in your app's AVAudioSession by setting its mode to AVAudioSessionModeMeasurement, which you do once in your application, usually at startup. This disables a great deal of input signal processing.
Reading your problem description, you might be better off enabling AGC on Android.
If neither of these yields results, you might want to gain scale both signals so they are just below clipping.
let audioSession = AVAudioSession.sharedInstance()
try? audioSession.setMode(AVAudioSessionModeMeasurement)
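On the Android side, AGC is an audio effect that attaches to a recording session. MediaRecorder doesn't expose a session id for this, so the sketch below assumes you capture raw PCM with AudioRecord instead (the sample rate and buffer size are placeholders):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AutomaticGainControl;

int sampleRate = 44100; // placeholder
int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);

// AGC is not available on every device, so check first.
if (AutomaticGainControl.isAvailable()) {
    AutomaticGainControl agc = AutomaticGainControl.create(record.getAudioSessionId());
    if (agc != null) {
        agc.setEnabled(true);
    }
}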
Sending audio to the speaker for playback on Android is easy, but is it possible to get a copy of the actual final digital signal? Let's say I have 2 apps running "MyApp" and "SomeOtherApp". My app sends audio to the speaker, but so does "SomeOtherApp". "SomeOtherApp" is not my app - it's a 3rd party app. Is it possible to get a copy of the mixed audio signal which is played to the speaker by the OS? That is, the audio signal which is a mixture of the speaker signal from my app and the speaker signal from "SomeOtherApp".
To summarize: I am looking for a way to hook into the low-level audio path (HAL audio stream out - after mixing!) so I can get a copy of the "final" speaker signal (in real-time). Optimally, I would also like to hook into the low-level microphone path, but that's less of a concern right now.
Looks like the short answer is no.
The longer one is kinda, and sorta, but not really, as far as I know. Option 1: it might be a problem with respect to privacy (not really a good option). Option 2: nobody thought it was needed, so it wasn't built into the system. Option 3: the amount of troubleshooting when programmers use the wrong source is just not worth it.
Edit - You can, of course, record the input.
Check this Google example.
Usage:
The app will capture audio from the Android device and play it back on the same device; the playback on the speaker will be captured immediately.
I am new to Android and Java and I need to know if this would be possible.
I want to capture the sound input to the phone's microphone, perform some computations on this signal, and output the modified signal to the earphones.
Is processing the input to the microphone in real time like this possible?
The Android Developers website says that:
Note: The Android Emulator does not have the ability to capture audio,
but actual devices are likely to provide these capabilities.
What does it mean by likely? Is it possible that some phones do not even allow reading from the microphone at all?
It is possible.
When you record the audio you can buffer it, do some processing, and then output it, save it, or whatever.
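For example, a minimal read-process-write loop might look roughly like this (run it on a background thread; the sample rate, the processing step, and the running flag are placeholders, not anything app-specific):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

int sampleRate = 44100; // placeholder
int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize,
        AudioTrack.MODE_STREAM);

short[] buffer = new short[bufferSize / 2];
record.startRecording();
track.play();
while (running) { // 'running' is a flag you flip from elsewhere to stop the loop
    int read = record.read(buffer, 0, buffer.length);
    // ... modify buffer[0..read) here (your computations) ...
    track.write(buffer, 0, read);
}
record.stop();
track.stop();
record.release();
track.release();

Note that the round-trip latency of AudioRecord plus AudioTrack can be quite noticeable; if that matters, the NDK audio paths (OpenSL ES / AAudio) are the usual next step.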
There is an app, for example, that does the following thing when someone calls you:
It combines your voice with some sound you recorded and plays it in sync to the caller, making them think you are at a party, on a bus, etc. This is an example of processing the recorded sound.
Edit 1: Here is a similar topic that should guide you further on how to implement this: Real-time audio processing in Android
I wrote an app that records audio. Everything works. However, I am going to be using this app to record class room notes. How can I boost the input of the microphone to better capture all the noise? I wouldn't mind using root if I must. But wasn't sure if there was an API to do this.
Thanks all for reading!
If you are asking how to make the microphone more sensitive, I'm not sure. That would involve either operating the microphone at a higher voltage and/or hacking the drivers, neither of which are doable programmatically, AFAIK. However, you could try amplifying the output by multiplying the recorded samples by some value (say 1.1 for a 10% volume boost). Of course, the more you "amplify" the output, the more you will saturate the speaker (aka distort the audio). There are some signal processing techniques you can try to remove background noise and to isolate the particular audio of interest; however, these things are merely processing improvements, not hardware upgrades. You can always try plugging an external microphone into the headphone jack and using that to record the audio.
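A rough sketch of that software gain on 16-bit PCM samples (the gain value is just an example; the clamping is there to avoid integer wrap-around, which sounds far worse than plain clipping):

// Multiply each 16-bit PCM sample by a gain factor, clamping to the valid range.
static void applyGain(short[] samples, int count, float gain) {
    for (int i = 0; i < count; i++) {
        int boosted = Math.round(samples[i] * gain);
        if (boosted > Short.MAX_VALUE) boosted = Short.MAX_VALUE;
        if (boosted < Short.MIN_VALUE) boosted = Short.MIN_VALUE;
        samples[i] = (short) boosted;
    }
}

// e.g. applyGain(buffer, samplesRead, 1.1f) after each AudioRecord.read() call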
I know this isn't the answer you were hoping for, but I hope it helps.
The Android audio manager has a large number of different audio streams available, including DTMF, SYSTEM, RING, etc.
Not that I'm saying it's a good idea, but is there any significant disadvantage to playing audio on a stream other than MUSIC? The standard appears to be to play on the MUSIC stream, but if, for example, I want to use the ringer volume, is there any disadvantage to just playing on the RING stream instead?
There are a number of cases where playing on streams other than music offer some advantages in addition to the case I provided above, but I don't want to risk breaking more important functionality if I can help it.
I'd be curious to hear whether anyone has any experience playing on and/or manipulating other streams, and what side effects, if any, they've encountered (or incurred...)
Why do you need it? I think the media stream is enough for any of your purposes.
Look, say you use the ring stream and your program controls its volume. If it gets set to "no volume", the phone goes into "vibrate mode" and the user can't hear incoming rings because of your application.
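For reference, a minimal sketch of picking a stream with the legacy stream-type API ('context' and 'soundUri' are placeholder names); whichever stream you choose, playback volume follows that stream's volume control, which is exactly the coupling described above:

import java.io.IOException;
import android.media.AudioManager;
import android.media.MediaPlayer;

try {
    MediaPlayer mp = new MediaPlayer();
    mp.setAudioStreamType(AudioManager.STREAM_RING); // volume now follows the ringer volume
    mp.setDataSource(context, soundUri);             // 'context' and 'soundUri' are placeholders
    mp.prepare();
    mp.start();
} catch (IOException e) {
    // handle the error
}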