I am new to Android and Java, and I need to know whether this is possible.
I want to capture the sound input from the phone's microphone, perform some computations on the signal, and output the modified signal to the earphones.
Is processing the input to the microphone in real time like this possible?
The Android Developers website says:
Note: The Android Emulator does not have the ability to capture audio,
but actual devices are likely to provide these capabilities.
What does "likely" mean here? Is it possible that some phones do not allow reading from the microphone at all?
It is possible.
When you record the audio, you can buffer it, do some processing on each buffer, and then either play the result back or save it.
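The buffer-and-process step described above could look roughly like the following. This is only a minimal sketch of the processing part in plain Java: the class name PcmGain and the gain computation are illustrative assumptions, not something from the original answer. On a device, the short[] buffer would typically be filled by AudioRecord.read(short[], int, int) and the processed samples handed to AudioTrack.write(short[], int, int); that wiring is left out here so the example runs anywhere.

```java
// Hypothetical helper: scales 16-bit PCM samples in place.
final class PcmGain {
    private PcmGain() {}

    /**
     * Multiplies the first {@code length} samples by {@code gain},
     * clamping the result so it still fits in a signed 16-bit sample.
     */
    static void applyGain(short[] buffer, int length, float gain) {
        for (int i = 0; i < length; i++) {
            int scaled = Math.round(buffer[i] * gain);
            if (scaled > Short.MAX_VALUE) {
                scaled = Short.MAX_VALUE;   // clip positive overflow
            } else if (scaled < Short.MIN_VALUE) {
                scaled = Short.MIN_VALUE;   // clip negative overflow
            }
            buffer[i] = (short) scaled;
        }
    }
}
```

The length parameter is there because a read call may fill only part of the buffer; only the samples actually read should be processed and written out.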
For example, there is an app that does the following when someone calls you:
It combines your voice with a sound you recorded earlier and plays the mix in sync to the caller, making them think you are at a party, on a bus, etc. This is an example of processing the recorded sound.
Edit 1: Here is a similar topic that should guide you further on how to implement this: Real-time audio processing in Android
According to the official documentation:
Android 10 (API level 29) and higher imposes a priority scheme that can switch the input audio stream between apps while they are running. In most cases, if a new app acquires the audio input, the previously capturing app continues to run but receives silence. In some cases, the system can continue to deliver audio to both apps. The various sharing scenarios are explained below.
Other than some special cases, audio is not shared between apps.
But I have seen many apps share the audio input without falling under the special cases above.
For example, when I'm on a Zoom call and start an audio recorder, both apps receive audio, although the Zoom audio drops in volume.
Similarly, Omlet Arcade is able to record mic audio even when mic access has been given to other apps.
How is this possible? According to the documentation, it shouldn't be allowed.
Update:
I was able to achieve it using Oboe, but it is not consistent across devices. It also causes a sync issue in my live-streaming app: the audio is audible with a delay.
This is not possible in Android 5+. You need a rooted phone to do this. In Omlet Arcade, whenever you play a game and switch on the in-game mic, Omlet Arcade stops receiving any audio input. It keeps running, but you have to restart it to get voice input back.
However, because of a recent MIUI bug, people were able to listen to calls on Zoom and in-game mic apps at the same time. In your case, the device may be running a modified Android build such as MIUI or OxygenOS rather than stock Android.
I'm messing around in my app with a custom model for speech commands - I have it working fine recording and processing input audio from an AudioRecord, and I give feedback to the user through text to speech.
One issue I have is that I'd like this to work even when audio is playing, either through my own text to speech or through something else playing in the background (music, for instance). I realize this is going to be a non-trivial problem, but if I could somehow get access to the audio output data (what the phone is playing) and match that up with my microphone input data, I think I could at least adjust my model for this and improve my results.
However, based on Android - Can I get the audio data for playback from the audio mixer? , it sounds like that is impossible.
Two questions:
1) Is there any way that I'm missing to get access to expected audio output/playback data through the android api, or any options the android api provides for dealing with this issue (the feedback loop between audio output and input)?
2) Outside of stopping all other playback or waiting for other playback to finish, is there any other approach to solve this problem? I would assume some calling apps have a way of dealing with this when the user is on speakerphone; I'm just missing how to do it myself.
Thanks
Answers to 1 & 2: You want AcousticEchoCanceler.
A short lecture on why "deleting the speaker audio from the microphone input" is a non-trivial task that takes substantial signal processing knowledge: It's more complicated than just time-shifting the speaker audio a little bit and subtracting it from the mic input. The spectrum of the audio changes drastically even as it leaves the speaker (most tiny speakers have a very peaky response centered around 3-4 kHz). The audio may bounce off multiple objects (walls, etc.) before it gets back to the mic (multipath interference). Different frequency components interfere at the microphone in different, impossible-to-predict ways, vastly changing the spectrum of the audio. And by the way, if anything in the room moves, say, if you put your hand near the phone, everything changes. That is why you don't want to try to write your own echo cancellation filter. Android has provided one for you, so you can write cool speakerphone apps and such.
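To make the "time-shift and subtract" point concrete, here is a toy illustration in plain Java. It is not an echo canceller and not part of the Android API: the class NaiveEchoDemo, the two-tap echo path, and its delays and gains are all made-up assumptions. It simulates a microphone that hears the speaker signal through a direct path plus one reflection, then subtracts a single delayed, scaled copy of the speaker signal.

```java
// Toy model only: real rooms have many, time-varying reflections.
final class NaiveEchoDemo {
    private NaiveEchoDemo() {}

    /** Hypothetical echo path: direct path (delay 2) plus one reflection (delay 5). */
    static double[] echoAtMic(double[] speaker) {
        double[] mic = new double[speaker.length];
        for (int n = 0; n < speaker.length; n++) {
            double direct = n >= 2 ? 0.6 * speaker[n - 2] : 0.0;
            double reflected = n >= 5 ? 0.3 * speaker[n - 5] : 0.0;
            mic[n] = direct + reflected;
        }
        return mic;
    }

    /** Naive cancellation: subtract one delayed, scaled copy of the speaker signal. */
    static double[] subtractDelayed(double[] mic, double[] speaker, int delay, double scale) {
        double[] out = new double[mic.length];
        for (int n = 0; n < mic.length; n++) {
            double estimate = n >= delay ? scale * speaker[n - delay] : 0.0;
            out[n] = mic[n] - estimate;
        }
        return out;
    }

    /** Sum of squared samples, used to measure how much echo is left. */
    static double energy(double[] signal) {
        double sum = 0.0;
        for (double s : signal) {
            sum += s * s;
        }
        return sum;
    }
}
```

Even when the delay (2) and scale (0.6) of the direct path are guessed exactly, the reflection remains in the output, so the residual energy is not zero. A real echo path has many more taps and changes over time, which is the reason the answer points to the platform's AcousticEchoCanceler instead.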
Is it possible to capture the speaker output in Android?
This would be for the purpose of determining, when listening to the microphone input, which sounds were generated from other apps and which originated from the user.
This is essentially to create an acoustic cancellation filter on the output of other apps, so their sounds don't interfere with the microphone input.
As of Android 10 you can use the AudioPlaybackCapture API for audio playback capture.
https://developer.android.com/guide/topics/media/playback-capture
You won't be able to capture the output stream, per this post:
How to capture output stream of audio in Android?
Unless there has been some change since six months ago, when this question was answered.
However, a workaround (which does not look good) is mentioned there.
Sending audio to the speaker for playback on Android is easy, but is it possible to get a copy of the actual final digital signal? Let's say I have 2 apps running "MyApp" and "SomeOtherApp". My app sends audio to the speaker, but so does "SomeOtherApp". "SomeOtherApp" is not my app - it's a 3rd party app. Is it possible to get a copy of the mixed audio signal which is played to the speaker by the OS? That is, the audio signal which is a mixture of the speaker signal from my app and the speaker signal from "SomeOtherApp".
To summarize: I am looking for a way to hook into the low-level audio path (HAL audio stream out - after mixing!) so I can get a copy of the "final" speaker signal (in real-time). Optimally, I would also like to hook into the low-level microphone path, but that's less of a concern right now.
Looks like the short answer is no.
The longer one is kinda. And sorta. But not really, as far as I know. Possible reasons:
Option 1: it might be a problem with respect to privacy (not really a good option).
Option 2: nobody thought it was needed, so it was not built into the system.
Option 3: the amount of troubleshooting when programmers use the wrong source is just not worth it.
edit - You can, of course, record the input.
Check this Google example.
Usage:
App will capture audio from android devices and playback on the same
device; the playback on speaker will be captured immediately
Android's SpeechRecognizer apparently doesn't allow you to record the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or AudioRecord for that matter) or you do Speech Recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.
The question of how to record audio and do speech recognition at the same time in Android has been asked several times, and the most popular "solution" is to record a FLAC file and use Google's unofficial Speech API, which allows you to send a FLAC file via a POST request and obtain a JSON response with the transcription.
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version)
https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition
http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
That works pretty well but has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a possible solution; even forgetting about the difficulties in properly splitting the file at the right positions (not in the middle of a word), many consecutive requests to the abovementioned web service api will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits which clearly restrict bursts of requests).
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcribes what you said into text, and saves both the text and the audio recording (it's not clear where it stores the audio, but you can replay it).
And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT?
I would look at the source code but it doesn't seem to be available, is it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the speech api mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So is there perhaps a way to "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
It uses onPartialResults(Bundle).
This callback returns the text recognized so far while the speech is still being recorded.
It's also available on Xamarin.