A similar but different question was asked here: link1.
In that case they wanted to send audio to a phone call.
I am curious about a different situation.
I want to send audio from an application to Gboard while it is being used with my application, in order to get local speech recognition done. link2
Is it possible to send audio from my app into Gboard such that it uses the local speech recognizer?
Using the Android Pixel audio APIs?
Using some direct APIs to Gboard?
This question is meant to help the hard-of-hearing community, so that they can READ a phone/mobile call because they cannot hear it.
Android 10 (API level 29) provides an API, "AudioPlaybackCaptureConfiguration". This API gives apps the ability to copy the audio being played by other apps.
Google also implemented the same on Pixel phones, as shown here - https://www.youtube.com/watch?v=7hb3p8LZIq8 . But it has a few limitations:
It supports only English. How can support for regional languages be enabled?
The current implementation converts voice to text using a local on-device engine, i.e. the voice is not sent to a Google server (all the processing happens offline on the phone itself), so accuracy is also low.
After seeing a lot of posts here, it seems developers are running into problems when implementing the same thing, that is, capturing the caller's voice and then transcribing it, due to some restriction by Google.
How to record internal audio on Android devices or record MediaPlayer Audio Stream?
Is there any way to capture the caller's voice (https://developer.android.com/guide/topics/media/playback-capture#allowing_playback_capture)? As in the YouTube video I shared above, Google must be capturing the caller's voice, with its offline engine processing that voice and converting it to text. So can we capture the caller's voice in some way and then send it to some server API or to the Google Live Transcribe app (or whatever it is) for better accuracy, so that the converted text is then displayed on the screen (in the user's choice of language)?
I am also a developer, though not a mobile one, so some terminology may be wrong; please excuse it and provide your suggestions.
Can we modify the Android source code itself according to our requirements and remove that limitation, so that we can achieve what we want even if it requires building a custom Android OS?
I am currently working on an app that would require recording the audio within my app and then sending the clip to google for transcription.
Is there any way I can send an audio clip to be processed with speech to text?
Or is there any other way to convert that recording to text?
Google's Voice To Text API is not available publicly at the moment and there's no announcement on when it could become available. On Android you can use the system voice recognition feature, but it will only transcribe what it records by itself and you won't be able to feed it any audio file for processing.
For now, you either need to use other services like AT&T's, IBM's Watson, or Dragon Dictation (all are online), or maybe consider including CMU Sphinx in your app if you absolutely need an offline solution.
Android's SpeechRecognizer apparently doesn't allow recording the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or AudioRecord for that matter) or you do Speech Recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.
The question of how to record audio and do speech recognition at the same time in Android has been asked several times, and the most popular "solution" is to record a FLAC file and use Google's unofficial Speech API, which allows you to send a FLAC file via a POST request and obtain a JSON response with the transcription.
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version)
https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition
http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
That works pretty well but has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a possible solution; even forgetting about the difficulties in properly splitting the file at the right positions (not in the middle of a word), many consecutive requests to the above-mentioned web service API will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits, which clearly restrict bursts of requests).
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcribes what you said into text, and saves both the text and the audio recording (well, it's not clear where it stores it, but you can replay it).
And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT?
I would look at the source code but it doesn't seem to be available, is it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the speech api mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So is there perhaps a way to "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?
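To make the "duplicate the stream" idea concrete, here is a minimal sketch in plain Java of fanning one PCM byte stream out to two sinks. It rests on assumptions that are not in the question: the app reads the raw PCM itself (for instance from AudioRecord), and the second consumer is an engine that accepts raw audio. As the rest of this thread notes, SpeechRecognizer and MediaRecorder do not accept an external stream, so this shows the technique only, not a way to feed those two classes. The names PcmTee and copyToBoth are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies one PCM byte stream to two independent sinks. */
final class PcmTee {
    private PcmTee() {}

    /**
     * Reads from source until end of stream and writes every chunk to both sinks.
     *
     * @param source         raw PCM bytes (assumed to come from the app's own capture code)
     * @param recorderSink   e.g. a file stream that stores the recording
     * @param recognizerSink e.g. a pipe into an engine that accepts raw audio
     * @param chunkSize      read buffer size in bytes, must be positive
     * @return total number of bytes copied to each sink
     */
    static long copyToBoth(InputStream source, OutputStream recorderSink,
                           OutputStream recognizerSink, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.read(buffer)) != -1) {
            // The same bytes go to both consumers, so neither one "owns" the input.
            recorderSink.write(buffer, 0, read);
            recognizerSink.write(buffer, 0, read);
            total += read;
        }
        recorderSink.flush();
        recognizerSink.flush();
        return total;
    }
}
```

Both sinks receive identical bytes in the same order, so the stored recording matches exactly what the recognizer heard. The copy is synchronous, so a slow sink stalls the other; a real app would probably put each consumer behind its own queue or thread.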
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
It uses onPartialResults(Bundle).
This event returns the text recognized so far while the speech is still being recorded.
It's also available on Xamarin
I'm building an app that includes speech recognition - I intend to use the Android speech recognition service or the voice typing functionality.
From what I have read, the speech is mostly processed in the cloud. The question I have is whether anyone knows what format the audio is sent to the cloud in. For example, is it something like WAV or MP3 or PCM, or is it likely to be something else entirely?
I admit this is mostly out of plain curiosity to know a bit more of what is going on behind the scenes. (But partly it also relates to an interest in the impacts of pre and post processing on recognition.)
Well, I've been looking for that info too, and the closest thing I could find was Google's speech recognition API for Chrome, which used the FLAC audio codec. I'm not sure if Android uses it too, but it is the closest thing I could find.
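On the WAV versus PCM part of the question, one thing holds regardless of what Google sends: a plain PCM WAV file in its simplest layout is just the raw PCM samples preceded by a 44-byte RIFF header, so the two are not really separate encodings. A minimal sketch in Java of building that header follows. The class name WavHeader and the method build are made up for illustration, and this says nothing about the wire format Android's recognizer actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds the 44-byte header of a simple PCM WAV file. */
final class WavHeader {
    private WavHeader() {}

    static byte[] build(int sampleRate, int channels, int bitsPerSample, int pcmByteCount) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second of audio

        // All multi-byte numeric fields in a RIFF/WAVE header are little-endian.
        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        header.putInt(36 + pcmByteCount);              // size of everything after this field
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        header.putInt(16);                             // size of the fmt chunk for PCM
        header.putShort((short) 1);                    // audio format 1 = uncompressed PCM
        header.putShort((short) channels);
        header.putInt(sampleRate);
        header.putInt(byteRate);
        header.putShort((short) blockAlign);
        header.putShort((short) bitsPerSample);
        header.put("data".getBytes(StandardCharsets.US_ASCII));
        header.putInt(pcmByteCount);                   // number of PCM bytes that follow
        return header.array();
    }
}
```

For example, WavHeader.build(16000, 1, 16, pcm.length) followed by the pcm bytes themselves gives a file that ordinary players can open, assuming pcm holds 16-bit little-endian mono samples at 16 kHz. FLAC and MP3, by contrast, are compressed encodings and need an actual encoder.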
My app uses the RecognizerIntent to record the user's voice and do some speech recognition.
Now I'd like to compare the results to some open source speech recognition engines. Most of them take an audio file as input. My thought was to capture the sound from the Android microphone and start the RecognizerIntent at the same time. But it seems that access to the microphone is exclusive.
Is it possible to use the RecognizerIntent with a recorded audio stream?
Is it possible to access the microphone simultaneously with two Activities?
I have tried to find a solution to the same problem and have not had success. One other approach we explored was to access the web service that Google uses for recognition. I posted a question at Google's voice search speech recognition service, but it still goes unanswered.
There was a good post at Voice recognition on android with recorded sound clip? that dealt with this question and I believe the answer came from a Google employee.
Unfortunately the answer to both of your questions is no, but there are plans to extend this for Gingerbread and 3.0: http://www.mobiclue.com/android-3-0-gingerbread-features-supported-phones.html
I know for sure that it is possible to use RecognizerIntent and save the audio; the question is how?
You can see the Google Keep Android application doing it once you tap the microphone.