I want to build an Android application that recognizes my voice, converts it to text, and shows what I just said in a toast. I can already do this with a button that launches the voice recognizer. Now I want it to work on the basis of my voice alone.
The application should trigger the voice recognizer and start listening only when I start speaking, and stop listening when it senses silence, just like the Talking Tom application. That app records the voice, but I want to recognize it using the voice recognizer. Something like this:
if (no silence)
    Launch Recognizer
else if (silence)
    Stop Recognizer
    Show toast
The main problem is how to sense whether the user is speaking before launching the voice recognizer. Is there any way to sense noise intensity?
Secondly, is there any way to launch the voice recognizer in the background?
Is it possible to detect an audio signal (someone starts speaking) in a background service, which then immediately launches the voice recognizer to recognize the speech?
Most speech recognizers already have an endpointer to detect the start-of-speech and end-of-speech. Endpointers usually try to read the ambient noise level to determine a baseline for silence and to adapt the signal-to-noise ratio. But if the input noise level changes, it might trigger the endpointer's start-of-speech. If you listen all the time with a sensitive microphone, the endpointer might also pick up someone speaking next to you instead of you.
As such, using a speech button is a good practice to announce when you wish to talk. Trying to get the recognizer to listen all of the time is probably not what you want to do, or should be left up to researchers.
OK, I have figured it out. I used the MediaRecorder class for this. When the application launches, I start recording audio using MediaRecorder (or you can provide a button to start and stop the whole process). I check the amplitude of the audio being recorded by the MediaRecorder. If the amplitude passes a predefined threshold, I pause the recording and launch the voice recognition activity. In onActivityResult I resume the recorder.
if (mRecorder != null) {
    int amplitude = mRecorder.getMaxAmplitude(); // Peak amplitude since the previous call
    Log.d("AMPL : ", String.valueOf(amplitude));
    if (amplitude > 20000) { // Loud enough to be treated as speech
        onRecord(false); // Stop recording before launching the recognizer
        Intent intent = new Intent(this, VoiceRecognizer.class); // Launch the recognizer activity
        startActivityForResult(intent, 12112);
    }
}
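A single loud reading (a door slam, a tap on the microphone) would also pass a plain threshold check. As a rough sketch, not part of the original answer, the trigger could require several consecutive loud readings before launching the recognizer. The class name SpeechStartDetector and the requiredHits parameter are hypothetical; the readings are assumed to come from polling MediaRecorder.getMaxAmplitude() at a fixed interval.

```java
/** Hypothetical helper: decides whether polled amplitudes look like the start of speech. */
final class SpeechStartDetector {
    private final int threshold;     // e.g. 20000, as in the snippet above
    private final int requiredHits;  // consecutive loud readings needed to trigger
    private int consecutiveHits = 0;

    SpeechStartDetector(int threshold, int requiredHits) {
        if (requiredHits < 1) {
            throw new IllegalArgumentException("requiredHits must be >= 1");
        }
        this.threshold = threshold;
        this.requiredHits = requiredHits;
    }

    /**
     * Feed one amplitude reading. Returns true once enough consecutive
     * readings exceeded the threshold, then resets for the next utterance.
     */
    boolean onAmplitude(int amplitude) {
        if (amplitude > threshold) {
            consecutiveHits++;
        } else {
            consecutiveHits = 0; // a quiet reading breaks the streak
        }
        if (consecutiveHits >= requiredHits) {
            consecutiveHits = 0;
            return true;
        }
        return false;
    }
}
```

In the snippet above, the `if` on the amplitude would then become a call to `detector.onAmplitude(...)`. The right threshold and streak length depend on the device and the polling interval, so both would need tuning.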
Alternatively: You can also use RecognitionListener interface as referred in this SO post.
I use speech recognition and text-to-speech, but I would like to mute the "beep" sound of the speech recognition and then unmute to hear the voice synthesis.
I managed to mute it, but when I set the volume back to its maximum, the change applies to the whole phone and not just to my app.
How can I manage this?
Thanks
There is an answer you can refer to here regarding how to loop the recognition and silence the beep.
Whenever you call setStreamMute() it is for the entire device, not just your application.
The issue here is that the Google Search Application (4.1+) is controlling the beep and the audio, it is not part of the recognition API.
If you open Google Now whilst you have music playing and press the listen button, you'll note that the music stops until the recognition and voice interaction finishes, this is because the app is 'ducking' the audio.
There is nothing as developers we can do about this behaviour (other than use another Speech Recognition Provider) and it's frustrating, as voiced here.
Until we manage to persuade Google to allow us to pass parameters such as 'offline' and 'no audio prompt' in the Recognition Intent, there's nothing we can do but rant.....
Introduction
Android provides two ways for me to use speech recognition.
The first way is by an Intent, as in this question: Intent example. A new Activity is pushed onto the top of the stack which listens to the user, hears some speech, attempts to transcribe it (normally via the cloud) and then returns the result to my app via an onActivityResult call.
The second is by getting a SpeechRecognizer, like the code here: SpeechRecognizer example. Here, it looks like the speech is recorded and transcribed on some other thread, then callbacks bring me the results. And this is done without leaving my Activity.
I would like to understand the pros and cons of these two ways of doing speech recognition.
What I've got so far
Using the Intent:
is simple to code
avoids reinventing the wheel
gives consistent user experience of speech recognition across the device
but
might be slow because a new activity with its own window has to be created
Using the SpeechRecognizer:
lets me retain control of UI in my app
gives me extra possibilities of things to respond to (documentation)
but
is limited to be called from the main thread
more control requires more error-checking.
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
My summary is as follows:
SpeechRecognizer
Show different UI or no UI at all. Do you really want your app's UI to beep? Do you really want your UI to show a dialog when there is an error and wait for user to click?
App can do something else while speech recognition is happening
Can recognize speech while running in the background or from a service
Can handle errors better
Can access low level speech stuff like the raw audio or the RMS. Analyze that audio or use the loudness to make some kind of flashing light to indicate the app is listening
Intent
Consistent, and easy to use UI for users
Easy to program
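To illustrate the "loudness indicator" point in the SpeechRecognizer list, here is a minimal sketch of turning a raw audio buffer into a level value. It assumes the buffer holds 16-bit little-endian mono PCM, which is an assumption and not something the recognition API guarantees; the class name LevelMeter is hypothetical. Where the recognizer already reports a level through RecognitionListener.onRmsChanged(float rmsdB), that value could be used directly instead.

```java
/** Hypothetical helper: loudness of a PCM buffer, for driving a "listening" indicator. */
final class LevelMeter {
    private LevelMeter() {}

    /**
     * Root mean square of the first `length` bytes of `pcm`, interpreted as
     * 16-bit little-endian samples (assumed format), scaled to 0.0..1.0.
     */
    static double rms(byte[] pcm, int length) {
        int samples = length / 2;
        if (samples == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            short sample = (short) ((hi << 8) | lo);
            double v = sample / 32768.0;
            sumSquares += v * v;
        }
        return Math.sqrt(sumSquares / samples);
    }

    /** Converts a 0.0..1.0 RMS value to decibels relative to full scale. */
    static double toDecibels(double rms) {
        return 20.0 * Math.log10(Math.max(rms, 1e-9)); // clamp to avoid log10(0)
    }
}
```

The normalised value can be mapped straight onto a progress bar or the alpha of a flashing icon; the decibel form is closer to perceived loudness.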
The main difference is UI. SpeechRecognizer doesn't have any so you are responsible for creating one.
I once wrote a prototype with a receiver that listened for the headset button and then activated speech recognition to listen for some commands. The screen was not activated, so I had to use SpeechRecognizer (my UI was some prerecorded sounds and text-to-speech).
The second difference is that SpeechRecognizer is capable of constant listening. The Intent version will always end execution after some period. For example, SpeechRecognizer is used by the speech recognition "keyboard", so you can dictate an SMS.
In that case you will receive partial results only (in normal mode SpeechRecognizer gives only final results).
One thing that the other answers have not mentioned: if multiple speech recognizers are installed on the device then user switching between them is different depending on if "Intent" or the SpeechRecognizer is used.
In case of "Intent" the standard Activity selection dialog is popped up. The user can choose the recognizer to be used, and optionally set it globally as the default recognizer, to avoid the dialog in the future.
In case of SpeechRecognizer the user can set and configure the default recognizer in the global settings (Language and input -> Voice recognizer on ICS).
So, depending on which interface is used the documentation about setting the default recognizer and switching between recognizers should be different. (In most cases though there is just one recognizer, Google Voice Search, so this might not be a big issue in practice.)
I am playing around with Android's speech recognition and would like to be able to time out the voice recognition intent. I am creating and starting the speech recognition intent, based on the Android API example code, and it works fine. What I would like is the ability to automatically cancel the speech detection if there is no audio input after N milliseconds. In other words, listen for speech and, if there is none after a short time, return to the activity that started the intent. Is this possible? I looked at the documentation for RecognizerIntent and found no extra fields for doing this.
Have you tried adjusting the parameters in the RecognizerIntent, such as
EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS?
These work both when sending the standard Intent and when using the SpeechRecognizer class directly.
One thing that comes to mind is to set up a Handler with postDelayed. The handler waits x seconds and then cancels the recognition if time runs out and it is still up.
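The delayed-cancel idea can be sketched in plain Java, with a ScheduledExecutorService standing in for Android's Handler.postDelayed. The class name ListenTimeout is hypothetical, and what the onTimeout callback does (for example, finishing the recognition activity or cancelling a SpeechRecognizer) is left to the caller; this is only an outline of the timing logic.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog: runs a callback unless cancelled before the timeout. */
final class ListenTimeout {
    // Daemon thread so a forgotten watchdog never keeps the process alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "listen-timeout");
                t.setDaemon(true);
                return t;
            });
    private ScheduledFuture<?> pending;

    /** Arms the watchdog; any previously armed timeout is replaced. */
    synchronized void start(long timeoutMillis, Runnable onTimeout) {
        cancel();
        pending = scheduler.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Disarms the watchdog, e.g. once speech has been detected or results arrived. */
    synchronized void cancel() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** Releases the background thread when the watchdog is no longer needed. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The watchdog would be armed when recognition starts and disarmed as soon as speech is detected or a result comes back. On Android the equivalent would be Handler.postDelayed to arm and Handler.removeCallbacks to disarm.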
I noticed that as soon as a voice recognition activity starts, text-to-speech output stops.
I understand the rationale: TTS output could be "heard" by the voice recognition engine and interfere with its proper operation.
My question: Is this behavior hard-coded into the system, or can it be modified by a setting or parameter (in the API)?
Must the activity use recognition and TTS simultaneously? If the recognition can wait (functionally speaking), spawn the RecognizerIntent only from onUtteranceCompleted.
This is pure speculation, but there must be some common resource that TTS and recognition can only use one at a time (both APIs come from android.speech.*).
I have an Android application that begins recording from the microphone when the application starts. In my current version, the user must press a STOP button to stop recording.
How do I detect that the user has stopped talking and use that to trigger the recorder to stop?
Similar to what is implemented in the Speech Recognition functionality in Android. The user stops talking and then the speech is translated. I have seen other apps that do it, like Talking Tom type apps.
As a side note, I would also love to show some type of visual indicator that the microphone is receiving sound, something that shows the incoming sound level.
Any help appreciated.
One approach is to record on one thread and analyze the speech power of the recorded bytes on another.
There is sample code for reference here: http://musicg.googlecode.com/files/musicg_android_demo.zip
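A minimal sketch of the analysis step, independent of the linked demo: once some speech has been heard, count consecutive quiet frames and stop when the trailing silence is long enough. The frames are assumed to be 16-bit PCM, for instance read with AudioRecord.read(short[], int, int) on the recording thread. The class name EndOfSpeechDetector and both tuning parameters are hypothetical.

```java
/** Hypothetical helper: signals end of speech after a run of quiet frames. */
final class EndOfSpeechDetector {
    private final int threshold;          // peak amplitude that counts as speech
    private final int silentFramesToStop; // trailing quiet frames required to stop
    private boolean speechSeen = false;
    private int silentFrames = 0;

    EndOfSpeechDetector(int threshold, int silentFramesToStop) {
        this.threshold = threshold;
        this.silentFramesToStop = silentFramesToStop;
    }

    /** Largest absolute sample value in the frame; also usable for a level display. */
    static int peak(short[] frame) {
        int max = 0;
        for (short s : frame) {
            max = Math.max(max, Math.abs((int) s)); // widen first so -32768 is handled
        }
        return max;
    }

    /** Feed one frame; returns true when the user appears to have stopped talking. */
    boolean onFrame(short[] frame) {
        if (peak(frame) >= threshold) {
            speechSeen = true;
            silentFrames = 0;
            return false;
        }
        if (!speechSeen) {
            return false; // initial silence before any speech is ignored
        }
        silentFrames++;
        return silentFrames >= silentFramesToStop;
    }
}
```

With 20 ms frames, a silentFramesToStop of 50 would correspond to about one second of trailing silence. The value returned by peak() can also feed the visual sound-level indicator asked about in the question.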
What are you using to record audio? This may provide some clues:
android.media.MediaRecorder:
the constant MEDIA_RECORDER_INFO_MAX_DURATION_REACHED can be used with an OnInfoListener.
android.speech.SpeechRecognizer:
attach a RecognitionListener and handle its onEndOfSpeech() callback.