Automatically Timeout RecognizerIntent - android

I am playing around with Android's speech recognition and would like to be able to time out the voice recognition intent. I am creating and starting the speech recognition intent, based on the Android API example code, and it works fine. What I would like is the ability to automatically cancel/time out the speech detection if there is no audio input after N milliseconds. In other words, listen for speech and, if there is none after a short time, return to the activity that started the intent. Is this possible? I looked at the documentation for RecognizerIntent and there were no extra fields for the intent for doing this.

Have you tried adjusting the RecognizerIntent extras, such as
EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS?
These work both when sending the Intent in the standard way and when using the SpeechRecognizer class directly.

One thing that comes to mind is setting up a Handler with a delayed post. The handler waits x seconds and then cancels the RecognizerIntent if time runs out and the recognizer is still up.
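The delayed-cancel idea can be sketched in plain Java. This is a minimal model, not Android code: RecognitionWatchdog is a hypothetical name, and ScheduledExecutorService stands in for the Handler. On Android the same shape would presumably use Handler.postDelayed and Handler.removeCallbacks, with the timeout action calling Activity.finishActivity(requestCode) or SpeechRecognizer.cancel(); that mapping is an assumption and is not tested here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs onTimeout once unless cancel() is called first. */
final class RecognitionWatchdog {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    /** Arms the timer; calling start again replaces the earlier timer. */
    synchronized void start(long timeoutMillis, Runnable onTimeout) {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Call when speech or a result arrives in time. Returns true if the timeout was prevented. */
    synchronized boolean cancel() {
        return pending != null && pending.cancel(false);
    }

    /** Releases the background thread; call when the screen goes away. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The caller would arm the watchdog right before starting recognition and call cancel() from whatever callback signals that speech has begun or a result has arrived.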

Related

Android add pause RecognizerIntent for speech to text

I have implemented Speech to Text using RecognizerIntent and it's working perfectly.
But I need to modify how it works and add a pause tolerance while the user is speaking: in practice a user might stop for a while and then speak again, so I want the voice search to stay open for a few seconds (for example, 5 seconds) and only stop and process the speech if no voice is heard in that time.
I have tried using services but it's not working as desired. Code examples preferred.
[I want something similar to the Speech to Text key on the Xperia Z3 keyboard: when turned on, it accepts speech until the user taps pause.]
Thanks
The full duplex example provides the feature you need (handling pauses inline).
This is a different implementation from RecognizerIntent and relies on a more complicated setup for handling the mic's audio stream and the network connections for processing streams (audio up, text down).
So, if you want streaming AND continuously recognized speech that goes on until you signal the end of input (like a click event on the mic icon in the example), it can be a lot more involved.
background
google API sample
IBM API sample
They are complicated. Either can be implemented on Android with a good HTTP client.

Comparison of Speech Recognition use in Android: by Intent or on-thread?

Introduction
Android provides two ways for me to use speech recognition.
The first way is by an Intent, as in this question: Intent example. A new Activity is pushed onto the top of the stack which listens to the user, hears some speech, attempts to transcribe it (normally via the cloud) then returns the result to my app, via an onActivityResult call.
The second is by getting a SpeechRecognizer, like the code here: SpeechRecognizer example. Here, it looks like the speech is recorded and transcribed on some other thread, then callbacks bring me the results. And this is done without leaving my Activity.
I would like to understand the pros and cons of these two ways of doing speech recognition.
What I've got so far
Using the Intent:
is simple to code
avoids reinventing the wheel
gives consistent user experience of speech recognition across the device
but
might be slow because a new activity with its own window has to be created
Using the SpeechRecognizer:
lets me retain control of UI in my app
gives me extra possibilities of things to respond to (documentation)
but
is limited to be called from the main thread
more control requires more error-checking.
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
My summary is as follows:
SpeechRecognizer
Show different UI or no UI at all. Do you really want your app's UI to beep? Do you really want your UI to show a dialog when there is an error and wait for user to click?
App can do something else while speech recognition is happening
Can recognize speech while running in the background or from a service
Can handle errors better
Can access low level speech stuff like the raw audio or the RMS. Analyze that audio or use the loudness to make some kind of flashing light to indicate the app is listening
Intent
Consistent, and easy to use UI for users
Easy to program
The main difference is UI. SpeechRecognizer doesn't have any so you are responsible for creating one.
I once wrote a prototype with a receiver that listened for the headset button and then activated speech recognition to listen for some commands. The screen was not activated, so I had to use SpeechRecognizer (my UI was some prerecorded sounds and text-to-speech).
The second difference is that SpeechRecognizer is capable of constant listening. The Intent version will always end execution after some period. For example, SpeechRecognizer is used by the speech recognition "keyboard" so you can dictate an SMS.
In such a case you will receive partial results only (in normal mode SpeechRecognizer gives only final results).
One thing that the other answers have not mentioned: if multiple speech recognizers are installed on the device then user switching between them is different depending on if "Intent" or the SpeechRecognizer is used.
In case of "Intent" the standard Activity selection dialog is popped up. The user can choose the recognizer to be used, and optionally set it globally as the default recognizer, to avoid the dialog in the future.
In case of SpeechRecognizer the user can set and configure the default recognizer in the global settings (Language and input -> Voice recognizer on ICS).
So, depending on which interface is used the documentation about setting the default recognizer and switching between recognizers should be different. (In most cases though there is just one recognizer, Google Voice Search, so this might not be a big issue in practice.)

voice recognition based on level of voice (noise) intensity?

I want to build an Android application which will recognize my voice, convert it into text and show what I just spoke in a toast. I am able to do this by using a button which launches the voice recognizer for me. But now I want to make it work on the basis of my voice only.
The application should trigger the voice recognizer and start listening to me only when I start speaking, and should stop listening when it senses silence, just like the Talking Tom application. That app records the voice, but I want to recognize it using the voice recognizer. Something like this:
if (no silence)
    Launch Recognizer
else if (silence)
    Stop Recognizer
    Show toast
The main problem is how to sense whether the user is speaking before launching the voice recognizer. Is there any way to sense noise intensity?
Secondly, is there any way to launch the voice recognizer in the background?
Is it possible to detect an audio signal (someone starts speaking) in a background service, which then immediately launches the voice recognizer to recognize the speech?
Most speech recognizers already have an endpointer to detect the start-of-speech and end-of-speech. Endpointers usually try to read the ambient noise level to determine a baseline for silence and to adapt the signal-to-noise ratio. But if the input noise level changes, it might trigger the start-of-speech of the endpointer. If listening all the time with a sensitive microphone, the endpointer might also pick up someone speaking next to you instead of you.
As such, using a speech button is a good practice to announce when you wish to talk. Trying to get the recognizer to listen all of the time is probably not what you want to do, or should be left up to researchers.
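To make the endpointer idea concrete, here is a rough plain-Java sketch of an energy-based one. It is not how any particular recognizer implements it: SimpleEndpointer, the 0.95/0.05 smoothing weights, and the constructor parameters are all hypothetical and untuned. It tracks a noise baseline while idle, reports start-of-speech when a frame rises well above that baseline, and reports end-of-speech only after a run of quiet frames, which is also one way to tolerate short pauses in the middle of an utterance.

```java
/** Hypothetical energy-based endpointer; all thresholds are illustrative, not tuned. */
final class SimpleEndpointer {
    enum Event { NONE, START_OF_SPEECH, END_OF_SPEECH }

    private final double ratio;       // how far above the noise floor counts as speech
    private final int hangoverFrames; // quiet frames required before declaring end-of-speech
    private double noiseFloor;        // running estimate of the ambient level
    private boolean inSpeech;
    private int quietFrames;

    SimpleEndpointer(double initialNoiseFloor, double ratio, int hangoverFrames) {
        this.noiseFloor = initialNoiseFloor;
        this.ratio = ratio;
        this.hangoverFrames = hangoverFrames;
    }

    /** Feeds one frame level (for example a peak amplitude) and returns what changed. */
    Event accept(double frameLevel) {
        boolean loud = frameLevel > noiseFloor * ratio;
        if (!inSpeech) {
            if (loud) {
                inSpeech = true;
                quietFrames = 0;
                return Event.START_OF_SPEECH;
            }
            // Adapt the baseline slowly to the ambient noise while idle.
            noiseFloor = 0.95 * noiseFloor + 0.05 * frameLevel;
            return Event.NONE;
        }
        if (loud) {
            quietFrames = 0; // speech resumed, so the pause is forgiven
            return Event.NONE;
        }
        quietFrames++;
        if (quietFrames >= hangoverFrames) {
            inSpeech = false;
            quietFrames = 0;
            return Event.END_OF_SPEECH;
        }
        return Event.NONE;
    }
}
```

The weakness described above still applies: a sudden rise in background noise looks like start-of-speech to this kind of detector, which is one reason a push-to-talk button remains the more robust choice.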
OK, I have figured it out. I used the MediaRecorder class for this. When the application launches, I start recording audio using MediaRecorder (or you can provide a button to start and stop the whole process). I check the amplitude of the audio being recorded by the MediaRecorder. If the amplitude passes a predefined threshold, I pause the recording and launch the voice recognition activity. In onActivityResult I resume the recorder.
if (mRecorder != null) {
    int amplitude = mRecorder.getMaxAmplitude(); // Peak amplitude since the previous call
    Log.d("AMPL", String.valueOf(amplitude));
    if (amplitude > 20000) { // Louder than the chosen threshold: treat as speech
        onRecord(false); // Stop recording before launching the recognizer
        Intent intent = new Intent(this, VoiceRecognizer.class); // Launch recognizer activity
        startActivityForResult(intent, 12112);
    }
}
Alternatively, you can use the RecognitionListener interface, as described in this SO post.

Speech Recognition Service in Android

I have an Android application that uses speech recognition in an Activity. The GUI doesn't do anything except contain the speech recognition objects. I would like to port this over to a service so I can talk to the application while it's running in the background.
However, as far as I know, the speech recognition service has to use onActivityResult, which is unavailable for Services. Is there a way to either contain an Activity in a Service such that its GUI is not displayed, or perform speech recognition in a service instead of an activity?
See Google's voice search speech recognition service - it might have some useful links to information. I don't think you can do non-GUI voice recognition, because as far as I know the recognizer is only exposed through the recognizer intent.
I don't think that Google wants people to call this service directly, and it likely violates some terms of service somewhere if you do, but check out http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ to see the service behind Chrome speech recognition which I suspect is similar to Android.
What if you have your service wake up an activity when it detects any incoming audio signal? The activity could act like a widget, taking up only a small part of the screen or even just a single pixel, and you would then call voice recognition from that invisible activity.
Just an idea; I don't remember if a widget can be an activity or if you can make an activity that doesn't take up the screen.

TTS *and* Speech Input simultaneously?

I noticed that as soon as a voice recognition activity starts, text-to-speech output stops.
I understand the rationale: TTS output could be "heard" by the voice recognition engine and interfere with its proper operation.
My question: Is this behavior hard-coded into the system, or can it be modified by a setting or parameter (in the API)?
Must the activity use recognition and TTS simultaneously? If the recognition can wait (functionally speaking), spawn the RecognizerIntent only from onUtteranceCompleted.
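The ordering can be modelled in plain Java as follows. Speaker and PromptThenListen are hypothetical names, and the Speaker interface is only a stand-in for a TTS engine with a completion callback. On Android the callback would be onUtteranceCompleted, or UtteranceProgressListener.onDone on newer API levels where the former is deprecated; wiring that up is assumed here, not shown.

```java
/** Hypothetical stand-in for a TTS engine that reports when an utterance has finished. */
interface Speaker {
    void speak(String text, Runnable onDone);
}

/** Speaks a prompt and starts recognition only after the prompt has finished playing. */
final class PromptThenListen {
    private final Speaker speaker;
    private final Runnable startRecognition;

    PromptThenListen(Speaker speaker, Runnable startRecognition) {
        this.speaker = speaker;
        this.startRecognition = startRecognition;
    }

    void ask(String prompt) {
        // Recognition is deferred to the completion callback, so the two never overlap.
        speaker.speak(prompt, startRecognition);
    }
}
```

Because recognition starts only from the completion callback, the recognizer never opens the microphone while the prompt is still playing, which sidesteps the question of whether the two can run at once.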
This is pure speculation, but there must be some common resource that TTS and recognition can only use one at a time (both APIs come from android.speech.*).
