I'm able to integrate Android widgets with Google Assistant, and I want to add some voice-command experience.
For example, with the CREATE_CALL intent: if the user tries to call Alice by saying "call Alice with some app", and there are two contacts named Alice in my app, is it possible for me to respond with a widget showing both Alices, ask the user by voice, and let the user choose which one to actually call, all by voice? Can it be done with the SpeechRecognizer API?
Broadly speaking, App Actions do not have a voice conversation experience. There are some tricks you can pull that might head in that direction, but they are largely outside of the App Action Widget experience itself.
Can I respond with a widget showing that there are multiple matches?
Yes, you can send back a Control Widget that might allow them to choose which user they mean.
Can they speak which user?
Probably not in the way you're thinking. To use your example, they can re-invoke the CREATE_CALL BII using any of its phrases, but you can't prompt them with "Who did you mean, exactly?" and have them just say the name.
Can I use the SpeechRecognizer API?
Not as part of a widget.
Widgets get embedded in the conversation with the Assistant.
In theory (and this is on my list to eventually test and figure out), you should be able to deep link to an Android Intent in cases such as this and open a view. While there, you could use SpeechRecognizer or just open the microphone to send audio somewhere. But this isn't done using the Widget itself.
In this scenario, SpeechRecognizer just does the Speech To Text (STT) or Automatic Speech Recognition (ASR) part of the processing. To actually match this up to phrases to determine an Intent, you would need a Natural Language Understanding (NLU) module such as Dialogflow. (But you may not need the SpeechRecognizer in that particular case, since Dialogflow can also take an audio stream to do the ASR part for you.)
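To illustrate that split, here is a rough sketch of handing the ASR transcript to an NLU service. It assumes Dialogflow's v2 REST detectIntent endpoint; PROJECT_ID, SESSION_ID, and the OAuth access token are placeholders you would supply yourself:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class NluClient {
        static final String PROJECT_ID = "my-agent";   // placeholder
        static final String SESSION_ID = "session-1";  // placeholder
        static final String ACCESS_TOKEN = "...";      // obtained via OAuth elsewhere

        // Sends the recognized text to the NLU and returns the raw JSON
        // response, which contains the matched intent and its parameters.
        static String detectIntent(String transcript) throws Exception {
            URL url = new URL("https://dialogflow.googleapis.com/v2/projects/"
                    + PROJECT_ID + "/agent/sessions/" + SESSION_ID + ":detectIntent");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + ACCESS_TOKEN);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            String body = "{\"queryInput\":{\"text\":{\"text\":\"" + transcript
                    + "\",\"languageCode\":\"en-US\"}}}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder response = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line);
                }
            }
            return response.toString();
        }
    }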
Related
I'm looking to add voice commands to an Android app that will be running on a tablet as a kiosk. I don't want the user to have to push a button: the command could be completed by a single button push, but the user is doing something more important (e.g. driving a car, flying a plane, or performing brain surgery).
I see tutorials describing how to add speech to text and have the user push a button and get the text, but nothing allowing the wake word "Okay, Google" to start the voice recognition (much less a custom wake word).
I looked at using the Google Voice Actions to start with "Okay, Google" and then send something to my app (register an intent), but that has to be trained to one specific user (at least for the tablet I tried it on). I'll have different users every day (maybe more than one a day) and no opportunity for training the device.
I've worked with CMUSphinx and found it to be too unreliable for spotting a wake word.
Is there a way to add "Okay, Google" as a way to start listening to text inside my app?
Got it working using PocketSphinx for offline wake-word recognition; once the wake word is spotted, I hand the microphone over to IBM Watson's Speech to Text service, which works over the internet and comes back with pretty reliable results. A sketch of the wake-word half is below.
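For reference, a rough sketch of the wake-word half, modeled on the pocketsphinx-android demo. The acoustic-model and dictionary asset names, the wake phrase, and the startCloudTranscription() hand-off method are assumptions, not part of the original answer:

    import android.content.Context;
    import edu.cmu.pocketsphinx.Assets;
    import edu.cmu.pocketsphinx.Hypothesis;
    import edu.cmu.pocketsphinx.RecognitionListener;
    import edu.cmu.pocketsphinx.SpeechRecognizer;
    import edu.cmu.pocketsphinx.SpeechRecognizerSetup;
    import java.io.File;

    public class WakeWordListener implements RecognitionListener {
        private static final String WAKE_SEARCH = "wake";
        private static final String WAKE_PHRASE = "ok google"; // assumed wake phrase
        private SpeechRecognizer recognizer;

        public void startListening(Context context) throws Exception {
            File assetDir = new Assets(context).syncAssets();
            recognizer = SpeechRecognizerSetup.defaultSetup()
                    .setAcousticModel(new File(assetDir, "en-us-ptm"))
                    .setDictionary(new File(assetDir, "cmudict-en-us.dict"))
                    .getRecognizer();
            recognizer.addListener(this);
            recognizer.addKeyphraseSearch(WAKE_SEARCH, WAKE_PHRASE);
            recognizer.startListening(WAKE_SEARCH);
        }

        @Override public void onPartialResult(Hypothesis hypothesis) {
            if (hypothesis != null && WAKE_PHRASE.equals(hypothesis.getHypstr())) {
                recognizer.stop();         // free the microphone for the online recognizer
                startCloudTranscription(); // hypothetical hand-off to Watson STT
            }
        }

        private void startCloudTranscription() { /* stream audio to the cloud STT */ }

        @Override public void onResult(Hypothesis hypothesis) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onEndOfSpeech() {}
        @Override public void onError(Exception e) {}
        @Override public void onTimeout() {}
    }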
Unfortunately, what you are trying to achieve is not possible. If I understood your concept correctly, a third-party app would wake the device and act on a set of commands (from a security point of view, this is very bad).
The closest you can get is to follow the Voice Actions API - https://developers.google.com/voice-actions/system/
I don't want my app to control and execute tasks on the user's phone, like setting an alarm or calling someone; I have found many tutorials focused on accomplishing that. Instead, I want to collect data using the speech recognizer API, send it to an online server, and let multiple users request it later.
For example:
"OK Google! find the nearby hospitals for me", I don't want this.
"Facebook was founded by Mark Zuckerberg in 2004", I want this.
The text data (collected from the recognition API) would hardly be 500 characters long, and because speech-to-text technology can make mistakes, I will let the users do the final check and make any necessary changes by typing. Is Android's speech recognition suited for this kind of application? Or should I go for the Google Cloud APIs? I want to go with the former because it's completely free (no?). I want the recognition to be as accurate as possible, and that's why I don't want to leave Google's APIs.
You can invoke the speech recognizer Activity on Android using the Intent
new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
and receive the recognized text as the result.
See details at https://developer.android.com/training/wearables/apps/voice.html
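A minimal sketch of the full round trip, assuming you launch it from an Activity; the class name, request code, and prompt text are arbitrary choices here:

    import android.app.Activity;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognizerIntent;
    import java.util.List;

    public class DictationActivity extends Activity {
        private static final int REQUEST_SPEECH = 1; // arbitrary request code

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now");
            startActivityForResult(intent, REQUEST_SPEECH);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
                // The first entry is the highest-confidence transcription.
                List<String> results =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                String spokenText = results.get(0);
                // Show spokenText in an editable field so the user can fix mistakes.
            }
        }
    }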
I'm making an app that will work with Android Wear, and I want to implement a command in Google's "Ok Google" option.
I saw this page:
http://developer.android.com/training/wearables/apps/voice.html
But it only relates to apps that include Activities on the Android Wear device.
I wanted to ask:
Can I add custom commands? I mean, ones that do not start with the word "Start"?
Can I add commands that do something other than just opening the app, like running a method?
If this isn't the place to ask, can you give me an email/link for Google Developers help/support? Thanks.
For apps that run on the Android Wear Device:
No, the list of system-provided voice actions is fixed (and listed here). You can set your application to be able to respond to them (for example, to take a note), but you cannot add new ones.
Yes. When already inside your app, you can call startActivityForResult() with ACTION_RECOGNIZE_SPEECH to get voice input. You can then use the returned string to execute whatever you want; a sketch follows.
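A quick, hypothetical sketch of that dispatch step, once ACTION_RECOGNIZE_SPEECH has returned a transcript; the command phrases and the methods they call are placeholders:

    import java.util.Locale;

    public class VoiceCommandDispatcher {
        // Maps the string returned by ACTION_RECOGNIZE_SPEECH to app methods.
        public void handleVoiceCommand(String spokenText) {
            switch (spokenText.toLowerCase(Locale.US)) {
                case "take a note": startNoteTaking(); break; // placeholder command
                case "sync":        syncData();        break; // placeholder command
                default:            showUnrecognized(spokenText);
            }
        }

        private void startNoteTaking() { /* app-specific */ }
        private void syncData() { /* app-specific */ }
        private void showUnrecognized(String text) { /* app-specific */ }
    }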
Meanwhile, if you're just displaying notifications from an Android app running on a handheld, then you cannot presently have voice actions at all (at least in a literal sense). What you can have, though, is a notification action that requests voice input. That input is then passed as an extra in the Intent that is delivered to the app on the handheld, as sketched below.
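A rough sketch of that notification pattern, using the support-library classes from that era; the extra key, labels, and icons are placeholders:

    import android.app.Notification;
    import android.app.PendingIntent;
    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.support.v4.app.NotificationCompat;
    import android.support.v4.app.RemoteInput;

    public class VoiceReplyNotification {
        static final String EXTRA_VOICE_REPLY = "extra_voice_reply"; // arbitrary key

        // Builds a notification whose Wear action opens the voice prompt.
        static Notification build(Context context, PendingIntent replyIntent) {
            RemoteInput remoteInput = new RemoteInput.Builder(EXTRA_VOICE_REPLY)
                    .setLabel("Speak your reply")
                    .build();
            NotificationCompat.Action action = new NotificationCompat.Action.Builder(
                    android.R.drawable.ic_btn_speak_now, "Reply", replyIntent)
                    .addRemoteInput(remoteInput)
                    .build();
            return new NotificationCompat.Builder(context)
                    .setSmallIcon(android.R.drawable.ic_dialog_info)
                    .setContentTitle("New message")
                    .extend(new NotificationCompat.WearableExtender().addAction(action))
                    .build();
        }

        // Reads the spoken text back out of the Intent delivered to the handheld app.
        static CharSequence getVoiceReply(Intent intent) {
            Bundle results = RemoteInput.getResultsFromIntent(intent);
            return results != null ? results.getCharSequence(EXTRA_VOICE_REPLY) : null;
        }
    }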
Introduction
Android provides two ways for me to use speech recognition.
The first way is by an Intent, as in this question: Intent example. A new Activity is pushed onto the top of the stack, which listens to the user, hears some speech, attempts to transcribe it (normally via the cloud), then returns the result to my app via an onActivityResult call.
The second is by getting a SpeechRecognizer, like the code here: SpeechRecognizer example. Here, it looks like the speech is recorded and transcribed on some other thread, then callbacks bring me the results. And this is done without leaving my Activity.
I would like to understand the pros and cons of these two ways of doing speech recognition.
What I've got so far
Using the Intent:
is simple to code
avoids reinventing the wheel
gives consistent user experience of speech recognition across the device
but
might be slow due to the creation of a new Activity with its own window
Using the SpeechRecognizer:
lets me retain control of UI in my app
gives me extra possibilities of things to respond to (documentation)
but
can only be called from the main thread
more control requires more error-checking.
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
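To make that concrete, here is a minimal sketch of that hands-free pattern, assuming the RECORD_AUDIO permission is already granted; the class name is arbitrary, and a production version would likely call cancel() before restarting:

    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import java.util.ArrayList;

    public class HandsFreeRecognizer implements RecognitionListener {
        private final SpeechRecognizer recognizer;
        private final Intent intent;

        public HandsFreeRecognizer(Context context) {
            recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(this);
            intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        }

        public void start() {
            recognizer.startListening(intent); // must run on the main thread
        }

        @Override public void onResults(Bundle results) {
            ArrayList<String> matches =
                    results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            // Act on matches.get(0), then listen for the next utterance.
            start();
        }

        @Override public void onError(int error) {
            if (error == SpeechRecognizer.ERROR_NO_MATCH
                    || error == SpeechRecognizer.ERROR_SPEECH_TIMEOUT) {
                start(); // restart quietly instead of beeping or showing a dialog
            }
        }

        // The remaining callbacks are no-ops in this sketch.
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {} // could drive a "listening" light
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onPartialResults(Bundle partialResults) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    }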
My summary is as follows:
SpeechRecognizer
Show a different UI, or no UI at all. Do you really want your app's UI to beep? Do you really want your UI to show a dialog when there is an error and wait for the user to click?
App can do something else while speech recognition is happening
Can recognize speech while running in the background or from a service
Can handle errors better
Can access low-level speech data like the raw audio or the RMS level. Analyze that audio, or use the loudness to make some kind of flashing light that indicates the app is listening
Intent
Consistent, and easy to use UI for users
Easy to program
The main difference is UI. SpeechRecognizer doesn't have any, so you are responsible for creating one.
I once wrote a prototype that had a receiver listening for the headset button, which then activated speech recognition to listen for some commands. The screen was not on, so I had to use SpeechRecognizer (my UI was some prerecorded sounds and Text To Speech).
The second difference is that SpeechRecognizer can listen constantly, whereas the Intent version will always stop listening after some period. For example, SpeechRecognizer is used by the speech recognition "keyboard", so you can dictate an SMS.
In that case you will receive partial results only (in normal mode SpeechRecognizer gives only final results).
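For reference, a small sketch of enabling partial results; these are the two fragments that change relative to the earlier SpeechRecognizer example, whose setup and listener are assumed to exist around them:

    // Ask for partial hypotheses while the user is still speaking.
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
    recognizer.startListening(intent);

    // Partial hypotheses then arrive in the RecognitionListener callback:
    @Override
    public void onPartialResults(Bundle partialResults) {
        ArrayList<String> texts =
                partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        // Update the dictation UI with texts.get(0) as the user talks.
    }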
One thing that the other answers have not mentioned: if multiple speech recognizers are installed on the device, then switching between them works differently depending on whether the Intent or the SpeechRecognizer is used.
In the case of the Intent, the standard Activity selection dialog pops up. The user can choose the recognizer to use, and optionally set it globally as the default recognizer to avoid the dialog in the future.
In the case of SpeechRecognizer, the user can set and configure the default recognizer in the global settings (Language and input -> Voice recognizer on ICS).
So, depending on which interface is used, the documentation about setting the default recognizer and switching between recognizers should differ. (In most cases, though, there is just one recognizer, Google Voice Search, so this might not be a big issue in practice.)
I've got an idea for an Android app: I want to be able to say commands and have the application listen for them and perform some action.
For example, I want my app to sit idle and listen for my voice, when it hears me say "start", the app will start doing something until I say "stop".
The idea is to lay the phone down and not have to physically touch it in order to control my app.
Would this be possible with any current APIs? If so which ones should I look into?
You can take a look at the Google voice commands.
http://www.google.com/mobile/voice-actions/
Alternatively, if you want to customise your application, you can use the Google voice recognition service and write an Activity that invokes it and returns the result to you.
Check out the below link for the sample application.
http://developer.android.com/resources/samples/ApiDemos/src/com/example/android/apis/app/VoiceRecognition.html