I have implemented Speech to Text using RecognizerIntent and it's working perfectly.
But I need to change how it behaves: I want to allow pauses while the user is speaking. In practice a user might stop for a while and then speak again, so I want voice search to keep listening for a few seconds (for example, 5 seconds) and only stop and process the speech if no voice is heard in that time.
I have tried using services, but it's not working as desired. Code examples would be preferred.
[I want to implement something similar to the Speech to Text key on the Xperia Z3 keyboard, which accepts speech until the user taps pause.]
Thanks
The full duplex example provides the feature you need (it handles pauses inline).
This is a different implementation from RecognizerIntent, and it relies on a more complicated setup: it handles the mic's audio stream and manages network connections for processing the streams (audio up, text down).
So if you want streaming AND continuous recognition that goes on until you signal the end of input (like the click event on the mic icon in the example), it can be a lot more involved.
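A minimal, framework-free sketch of the end-of-input rule described above, assuming the app receives a timestamp whenever speech is detected (for example from partial results or a loudness callback). The class `EndOfInputDetector` and its methods are hypothetical; the session ends either when the user signals the end explicitly (like the mic click) or when no speech has been heard for a configurable gap, such as the 5 seconds from the question.

```java
// Hypothetical helper: decides when a continuous recognition session should end.
// Not part of any Android or cloud API; timestamps are plain milliseconds.
final class EndOfInputDetector {
    private final long maxSilenceMillis;
    private long lastSpeechMillis = -1;   // -1 means no speech heard yet
    private boolean stopRequested = false;

    EndOfInputDetector(long maxSilenceMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
    }

    // Call whenever the audio or recognizer indicates the user is speaking.
    void onSpeech(long nowMillis) {
        lastSpeechMillis = nowMillis;
    }

    // Call when the user explicitly ends input (e.g. taps the mic icon).
    void requestStop() {
        stopRequested = true;
    }

    // True once the user asked to stop, or once the pause since the last
    // speech reaches maxSilenceMillis. Shorter pauses are tolerated.
    boolean shouldEnd(long nowMillis) {
        if (stopRequested) {
            return true;
        }
        if (lastSpeechMillis < 0) {
            return false; // still waiting for the user to start speaking
        }
        return nowMillis - lastSpeechMillis >= maxSilenceMillis;
    }
}
```

In an app you would feed `onSpeech` from whatever events your streaming setup exposes and poll `shouldEnd` from a timer. For the plain RecognizerIntent route there are also the `EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS` and `EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS` extras, but the documentation warns that, depending on the recognizer implementation, they may have no effect.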
Background
Google API sample
IBM API sample
They are complicated. Either can be implemented with a good Android HTTP client.
Related
I have heard a lot about call recording being blocked in Android 9. I want to understand whether it is possible to get the audio stream in a background service every time a user makes a call. Can anyone help me with the possible approaches to achieve this, such as showing a notification/asking the user for permission, building a system app, a plugin, or a service? I need the audio stream in order to convert it to text with speech to text.
I found references in the Android documentation for InCallService, building a calling app, and implementing real-time text, but I have no idea how to connect these three together to get the audio stream in a background service.
I'm looking to add voice commands to an Android App that will be running on a tablet as a kiosk. I don't want the user to have to push a button, because the user is doing something more important (e.g. driving a car, flying a plane, or performing brain surgery) and the command could be completed by a single button push.
I see tutorials describing how to add speech to text and have the user push a button and get the text, but nothing allowing the wake word "Okay, Google" to start the voice recognition (much less a custom wake word).
I looked at using the Google Voice Actions to start with "Okay, Google" and then send something to my app (register an intent), but that has to be trained to one specific user (at least for the tablet I tried it on). I'll have different users every day (maybe more than one a day) and no opportunity for training the device.
I've worked with CMUSphinx and found it to be too unreliable for spotting a wake word.
Is there a way to add "Okay, Google" as a way to start listening for speech inside my app?
Got it working using PocketSphinx for offline wake word recognition; I then hand the microphone over to IBM Watson Speech to Text, which works over the internet and comes back with pretty reliable results.
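The hand-off described above can be sketched as a small two-state controller: listen offline for the wake word, release the microphone to the cloud recognizer for one command, then go back. `WakeWordController`, `KeywordSpotter` and `CloudRecognizer` are hypothetical names standing in for your own PocketSphinx and Watson wrappers (their real APIs differ); the sketch assumes the two engines cannot share the microphone, so it only shows the ordering of stop/start calls.

```java
// Hypothetical wrappers; real PocketSphinx/Watson clients have their own APIs.
interface KeywordSpotter {
    void start();   // begin offline wake-word listening (opens the mic)
    void stop();    // stop and release the mic
}

interface CloudRecognizer {
    void start();   // begin streaming audio to the cloud (opens the mic)
    void stop();    // stop and release the mic
}

final class WakeWordController {
    enum State { IDLE, WAITING_FOR_WAKE_WORD, LISTENING_FOR_COMMAND }

    private final KeywordSpotter spotter;
    private final CloudRecognizer cloud;
    private State state = State.IDLE;

    WakeWordController(KeywordSpotter spotter, CloudRecognizer cloud) {
        this.spotter = spotter;
        this.cloud = cloud;
    }

    State state() {
        return state;
    }

    void begin() {
        spotter.start();
        state = State.WAITING_FOR_WAKE_WORD;
    }

    // Called from the spotter's callback when the wake word is detected.
    void onWakeWord() {
        if (state != State.WAITING_FOR_WAKE_WORD) return;
        spotter.stop();     // release the mic first...
        cloud.start();      // ...then hand it to the cloud recognizer
        state = State.LISTENING_FOR_COMMAND;
    }

    // Called when the cloud recognizer returns a final result or an error.
    void onCommandFinished() {
        if (state != State.LISTENING_FOR_COMMAND) return;
        cloud.stop();
        spotter.start();    // go back to waiting for the wake word
        state = State.WAITING_FOR_WAKE_WORD;
    }
}
```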
Unfortunately, what you are trying to achieve is not possible. If I understood your concept correctly, a third-party app would wake the device and act on a set of commands (from a security point of view, this is very bad).
The closest you can do is follow the Voice Actions Api - https://developers.google.com/voice-actions/system/
I've implemented a relatively simple test application for a customer who owns a container repair facility. His aim is to give his operators a tool with (possibly) voice interaction. The app basically works well, using Google's Speech API. The annoying problem is the notification sound played when the recognition intent is launched, and the subsequent timeout notification if the user doesn't speak within 4 seconds. I'm intercepting all the errors, so I can relaunch the recognizer, but it's not pleasant to hear this pair of notifications every 4 seconds, especially while you're waiting for the next container to check. A partial workaround could be a sound trigger like the "Ok Google" feature found, for example, on my Samsung S6, but I haven't been able to find any information about that. The app is written with Xamarin, but it has already been ported to Android Studio in order to test the Nuance library, so if there is no way to implement an "Ok Google"-style trigger under Xamarin, any Java suggestion would also be very welcome. Obviously I don't need "Ok Google" itself but another trigger, like "inizio" or "start check", that is, a user-defined trigger (or set of triggers).
Thanks.
Rodolfo
I am thinking about making an app that I can use to control my Arduino robot (over bluetooth/wifi) using voice commands. But to make the experience fluid, I will need the Android app speech recognition to be continuously running. If I want the robot to stop, I don't want to press a button, wait for the speech recognition dialog to appear, say my command "STOP", release the button, wait for the parser to parse it, and then send the stop command.
I would rather just have Speech to Text in continuous listening mode when I am controlling my robot, and when it hears keywords, it sends them.
Can I do this in Android? I did some googling, and I found the recognizer intent, but all of the examples I found use a button trigger and pretty much followed the scenario I described above.
It can be done. Look at this link. It also has some example code :)
You can make it listen, and when it recognizes speech you get the words, check whether one of them is a keyword, and then make the robot do what you want.
http://viralpatel.net/blogs/android-speech-to-text-api/
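A sketch of the keyword check described in the answer, written as plain Java so it can be tested outside Android. It assumes you already have the recognizer's list of candidate transcriptions (SpeechRecognizer returns such a list in its results bundle) and looks for a known keyword as a whole word in any candidate. The `RobotCommand` enum and the keyword table are made up for the robot example; STOP is checked first so that a safety command wins over anything else heard in the same phrase.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical command set for the robot example.
enum RobotCommand { STOP, FORWARD, BACK, LEFT, RIGHT }

final class RobotCommandParser {
    // Insertion order = priority: STOP is checked before everything else.
    private static final Map<String, RobotCommand> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("stop", RobotCommand.STOP);
        KEYWORDS.put("forward", RobotCommand.FORWARD);
        KEYWORDS.put("back", RobotCommand.BACK);
        KEYWORDS.put("left", RobotCommand.LEFT);
        KEYWORDS.put("right", RobotCommand.RIGHT);
    }

    // candidates: the recognizer's alternative transcriptions, best first.
    static Optional<RobotCommand> parse(List<String> candidates) {
        for (Map.Entry<String, RobotCommand> entry : KEYWORDS.entrySet()) {
            for (String candidate : candidates) {
                // Compare whole words so "stopwatch" does not match "stop".
                for (String word : candidate.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (word.equals(entry.getKey())) {
                        return Optional.of(entry.getValue());
                    }
                }
            }
        }
        return Optional.empty();
    }
}
```

When a command is found you would send it over your Bluetooth/Wi-Fi link and keep listening; when nothing matches, simply ignore the result.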
Introduction
Android provides two ways for me to use speech recognition.
The first way is by an Intent, as in this question: Intent example. A new Activity is pushed onto the top of the stack; it listens to the user, hears some speech, attempts to transcribe it (normally via the cloud), then returns the result to my app via an onActivityResult call.
The second is by getting a SpeechRecognizer, like the code here: SpeechRecognizer example. Here, it looks like the speech is recorded and transcribed on some other thread, then callbacks bring me the results. And this is done without leaving my Activity.
I would like to understand the pros and cons of these two ways of doing speech recognition.
What I've got so far
Using the Intent:
is simple to code
avoids reinventing the wheel
gives consistent user experience of speech recognition across the device
but
might be slow, since it creates a new activity with its own window
Using the SpeechRecognizer:
lets me retain control of UI in my app
gives me extra possibilities of things to respond to (documentation)
but
can only be called from the main thread
more control requires more error-checking.
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
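A sketch of the hands-free restart decision mentioned here, kept free of Android types so it can be tested on its own. The enum constants mirror the names of a subset of `SpeechRecognizer`'s `ERROR_*` codes; in a real `RecognitionListener.onError(int)` you would map the int to this enum and, if the policy says so, start listening again. Which errors count as restartable, and the cap on consecutive transient failures, are assumptions for illustration.

```java
// Names mirror a subset of android.speech.SpeechRecognizer.ERROR_* constants.
enum RecognizerError {
    NO_MATCH, SPEECH_TIMEOUT, NETWORK, NETWORK_TIMEOUT, SERVER,
    AUDIO, CLIENT, RECOGNIZER_BUSY, INSUFFICIENT_PERMISSIONS
}

// Hypothetical policy: silently restart on "nothing useful heard" errors,
// retry possibly transient failures a limited number of times, give up otherwise.
final class RestartPolicy {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    RestartPolicy(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Call after a successful result so failure counting starts over.
    void onResult() {
        consecutiveFailures = 0;
    }

    boolean shouldRestart(RecognizerError error) {
        switch (error) {
            case NO_MATCH:
            case SPEECH_TIMEOUT:
                // The user said nothing recognizable: keep listening.
                return true;
            case NETWORK:
            case NETWORK_TIMEOUT:
            case SERVER:
            case RECOGNIZER_BUSY:
                // Possibly transient: retry, but not forever.
                consecutiveFailures++;
                return consecutiveFailures <= maxConsecutiveFailures;
            default:
                // AUDIO, CLIENT, INSUFFICIENT_PERMISSIONS: treated as fatal here.
                return false;
        }
    }
}
```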
My summary is as follows:
SpeechRecognizer
Show different UI or no UI at all. Do you really want your app's UI to beep? Do you really want your UI to show a dialog when there is an error and wait for user to click?
App can do something else while speech recognition is happening
Can recognize speech while running in the background or from a service
Can handle errors better
Can access low-level speech data like the raw audio or the RMS. Analyze that audio or use the loudness to make some kind of flashing light to indicate the app is listening
Intent
Consistent, and easy to use UI for users
Easy to program
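For the "flashing light" idea in the list above, `RecognitionListener.onRmsChanged(float rmsdB)` reports the sound level while listening. The sketch below maps that value to a 0 to 255 brightness with some smoothing. The API documentation does not pin down the dB range, so the `minDb`/`maxDb` bounds you pass in are an assumption to tune on real devices; the smoothing factor is likewise illustrative, and `ListeningIndicator` is a hypothetical name.

```java
// Hypothetical helper: converts loudness readings into an indicator brightness.
final class ListeningIndicator {
    private final float minDb;       // assumed quietest useful reading
    private final float maxDb;       // assumed loudest useful reading
    private final float smoothing;   // 0 = no smoothing, close to 1 = very smooth
    private float level = 0f;        // smoothed level in [0, 1]

    ListeningIndicator(float minDb, float maxDb, float smoothing) {
        this.minDb = minDb;
        this.maxDb = maxDb;
        this.smoothing = smoothing;
    }

    // Feed each rmsdB value (e.g. from onRmsChanged); returns brightness 0..255.
    int update(float rmsdB) {
        float clamped = Math.max(minDb, Math.min(maxDb, rmsdB));
        float normalized = (clamped - minDb) / (maxDb - minDb);
        // Exponential moving average so the light does not flicker.
        level = smoothing * level + (1f - smoothing) * normalized;
        return Math.round(level * 255f);
    }
}
```

The documentation states there is no guarantee that `onRmsChanged` will be called, so treat such an indicator as best-effort.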
The main difference is UI. SpeechRecognizer doesn't have any, so you are responsible for creating one.
I once wrote a prototype with a receiver that listened for the headset button and then activated speech recognition to listen for some commands. The screen was not turned on, so I had to use SpeechRecognizer (my UI was some prerecorded sounds and Text To Speech).
The second difference is that SpeechRecognizer can listen continuously, while the Intent version always ends execution after some period. For example, SpeechRecognizer is used by the speech recognition "keyboard" so you can dictate an SMS.
In such a case you will receive partial results (in normal mode, SpeechRecognizer gives only final results).
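With `RecognizerIntent.EXTRA_PARTIAL_RESULTS` set to true, `RecognitionListener.onPartialResults(Bundle)` can deliver interim hypotheses before `onResults`. A dictation UI typically shows the committed text plus the current, still-changing hypothesis. `DictationBuffer` is a hypothetical, Android-free sketch of that bookkeeping: each partial replaces the previous one, and a final result commits it.

```java
// Hypothetical helper for a continuous-dictation UI (no Android types).
final class DictationBuffer {
    private final StringBuilder committed = new StringBuilder();
    private String pending = "";

    // Interim hypothesis for the current utterance; replaces the previous one.
    void onPartial(String hypothesis) {
        pending = hypothesis.trim();
    }

    // Final text for the current utterance; appended to the committed text.
    void onFinal(String text) {
        String t = text.trim();
        if (!t.isEmpty()) {
            if (committed.length() > 0) committed.append(' ');
            committed.append(t);
        }
        pending = "";
    }

    // What the UI should display right now.
    String displayText() {
        if (pending.isEmpty()) return committed.toString();
        if (committed.length() == 0) return pending;
        return committed + " " + pending;
    }
}
```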
One thing that the other answers have not mentioned: if multiple speech recognizers are installed on the device, then the way the user switches between them differs depending on whether "Intent" or SpeechRecognizer is used.
In case of "Intent" the standard Activity selection dialog is popped up. The user can choose the recognizer to be used, and optionally set it globally as the default recognizer, to avoid the dialog in the future.
In case of SpeechRecognizer the user can set and configure the default recognizer in the global settings (Language and input -> Voice recognizer on ICS).
So, depending on which interface is used the documentation about setting the default recognizer and switching between recognizers should be different. (In most cases though there is just one recognizer, Google Voice Search, so this might not be a big issue in practice.)