I have started programming with SL4A (in QPython) and it is really great. Now I have tried the droid.recognizeSpeech function. It works fine too, but I would like it to listen in the background for a keyword, like Google's 'OK Google'.
I have looked around but cannot find anything, and I don't know how to implement it.
So my question: is it possible to have speech recognition always listening in the background, waiting for a keyword, and if so, how?
I've toyed with the idea of doing this, but never found any useful practical application for it. So here's a summary of my research; I hope it's enough to get you started:
1. The Speech Recognizer facade has multiple parameters. Usually, everyone passes None for all of them except the first. Here's the facade in full:
recognizeSpeech:
Recognizes user's speech and returns the most likely result.
prompt (String) text prompt to show to the user when asking them to speak (optional)
language (String) language override to inform the recognizer that it should expect speech in a language different than the one set in the java.util.Locale.getDefault() (optional)
languageModel (String) informs the recognizer which speech model to prefer (see android.speech.RecognizerIntent) (optional)
returns: (String) An empty string in case the speech cannot be recognized.
So you're looking for the languageModel parameter in this case. That option is restricted to two types: a web-search model and a free-form speech model. You want the free-form speech model here. Here's a little more info on this model from the horse's mouth:
Google on the Free-Form Language Model
Once you've looked at the free-form speech model, what should also help you is Chrome's continuous speech recognition model, which should share a lot of the same characteristics as the free-form language model. Hope this helps set you in the right direction.
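For what it's worth, here is a crude sketch of the idea in SL4A/QPython. One big caveat: recognizeSpeech is blocking and pops up the Google dialog on every pass, so this is repeated foreground polling, not true hands-free listening like 'OK Google'. The wake word is a placeholder, and I'm assuming the facade forwards 'free_form' to the recognizer as the language-model string (it matches the value of RecognizerIntent.LANGUAGE_MODEL_FREE_FORM):

import android

droid = android.Android()
KEYWORD = 'computer'  # placeholder wake word, use your own

while True:
    # Facade signature: recognizeSpeech(prompt, language, languageModel).
    # This blocks and shows the recognition dialog each time around.
    heard = droid.recognizeSpeech('Listening...', None, 'free_form').result
    if heard and KEYWORD in heard.lower():
        droid.makeToast('Keyword detected!')
        break

For a genuinely always-on keyword spotter you need an engine built for that job, since the Google recognizer stops after each utterance; PocketSphinx (mentioned below) is one such engine.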
Related
I have an iOS app and an Android app that do speech-to-text recognition using the Nuance Mobile SDK. Both apps work perfectly when using a Nuance sandbox appID.
However, if I upload a phrase file, the Nuance server always returns zero results, as I can verify in the "didFinishWithResults" methods on both Android and iOS.
This is the phrase file I upload as a custom vocabulary to Nuance:
<phrases>
<phrase>two on to</phrase>
<phrase>bet 1 3 for</phrase>
<phrase>...and some other phrases.</phrase>
</phrases>
My Nuance custom dictionary is set to:
Topic:WebSearch
Grammar recognition mode: YES (<== Apps work perfectly when set to NO)
Vocabulary Weight: 90
Nuance's documentation claims that:
It is important to note that custom vocabularies are different than the constrained speech recognition grammars many developers are familiar with. Using constrained grammars results in high accuracy of words that are in the grammar and low (or no) recognition of words or phrases that are not in the grammar. With custom vocabularies our large language models are still used and the vocabulary simply adjusts the recognition probabilities of that large model. As a result, using a vocabulary will not change your results as much as a conventional grammar would. For example, you can expect to still get word results that are not in the vocabulary or “out of grammar.” The grammar recognition mode feature makes the vocabulary act much more like a traditional recognition grammar but as we are still using the underlying language model even then you may get “out of grammar” results.
So my question is: what am I doing wrong that I always get zero results from Nuance ASR when the custom vocabulary's grammar recognition mode is set to YES?
Nuance's customer support is totally useless; they probably outsource to a bunch of people with just an FAQ in hand who have no idea about anything.
I hope somebody can help me out on this one.
I am trying to implement voice recognition in Android. I have followed various tutorials, which say we need to call RecognizerIntent with RecognizerIntent.ACTION_RECOGNIZE_SPEECH and start the activity for a result. Then, when we speak, we get a set of values back from the Google server via RecognizerIntent.EXTRA_RESULTS. This is working fine. But what I need to do is provide a set of strings to the voice recognition engine, so that when we say something, it matches what we said against the provided set of strings and returns only the one matched string. So I need to give the recognizer engine a set of values from which it should give me the matched word. Can this be done?
This can be done with PocketSphinx; see the keyword matching mode in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialandroid
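To illustrate the mode, here is a sketch using the PocketSphinx Python bindings; on Android you would use pocketsphinx-android as the tutorial describes, but the idea is the same. The keyphrase and threshold are placeholders to tune:

from pocketsphinx import LiveSpeech

# Keyword-spotting (kws) mode: listen continuously and report only the
# keyphrase. A lower kws_threshold is more permissive; tune it per phrase
# to balance missed detections against false alarms.
speech = LiveSpeech(lm=False, keyphrase='oh mighty computer',
                    kws_threshold=1e-20)

for phrase in speech:  # blocks; yields one result per detection
    print('keyphrase detected:', phrase.segments(detailed=True))

To match against a whole set of strings rather than one phrase, PocketSphinx also accepts a keyword list file (one phrase per line, each with its own threshold) in place of the single keyphrase.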
Is it possible to restrict the voice search widget to look for a match close to a given set of words? For example, if I am using it to search over a list of names, the results are not meaningful, as names are often "corrected" to ordinary words.
If you use third-party Android recognition from Nuance (the people behind DragonDictate), it supports a "grammar mode" in which you can somewhat restrict the phrases that will be recognised during recognition.
Importantly, if you add unusual names into a Custom Vocabulary, they SHOULD become recognizable (complex pronunciation issues aside).
You can find information if you dig through:
http://dragonmobile.nuancemobiledeveloper.com ,
looking for 'Custom Vocabularies'. Grammar mode is essentially a special mode of custom vocabularies.
At the time of writing, there was a document here that makes some mention of grammar mode:
http://dragonmobile.nuancemobiledeveloper.com/downloads/custom_vocabulary/Guide_to_Custom_Vocabularies_v1.5.pdf - It only really becomes clear when you try to progress in their provisioning web GUI.
You have to set up an account, and jump through other hoops, but there is a free tier. This is the only potential way I have found to constrain a recognition vocabulary.
Well, short of running up PocketSphinx, but that is still described as "research"-quality "pre-alpha" software.
No, I don't work for Nuance. Not sure anyone does. They may have all been eaten by zombies. You would guess as much reading their support forums. They never reply.
No, unfortunately this is not possible.
You could also look at recognition from AT&T.
They have a very feature-rich web API, including full grammar support. I only found out about it recently!
1,000,000 transactions per month for free. Generous!
Look for 'AT&T API Program'. Weird name.
Link at time of writing:
http://developer.att.com/apis/speech
Unfortunately for me, no Australian accent language models at time of writing. Only US and UK English. Boooo.
EDIT: Some months after I wrote the above, AT&T retired the service mentioned. It seems everyone just wants a "dumbed down" API where you simply call a recognizer and it returns words. Sure, that is the holy grail, but a properly designed, constrained grammar will generally work better. As someone with a speech background, I find the minimalism of today's common speech APIs really frustrating...
I have used the code provided in this link for speech recognition. In the emulator it says the recognizer is not present, so I installed the app on a phone. When I click the speak button it works, but when I speak a name such as "rajesh" it shows some possible verbs and the like, but not the name. I want to use the input to select a contact from the address book in order to make a call, so please tell me how to proceed in this direction. One more thing: every time, I need to develop the code in Eclipse, install it on the phone, and then check the output. Is there an alternative way to edit and test the app code on the phone from Eclipse?
Please provide any possible links. I want to develop a calling app for the blind; if voice recognition does not work, what else could be done to take input from the user?
Names are hard for speech recognition. There are more possible names in the world than there are words in any dictionary, so recognising an arbitrary name is hard, though common names are easier.
Anyway, if you want to recognise a customized list of words or names, you might want to look at Dragon Mobile from Nuance. Here is a copy-and-paste from another similar question I answered:
If you use third-party Android recognition from Nuance (the people behind DragonDictate), it supports a "grammar mode" in which you can somewhat restrict the phrases that will be recognised during recognition.
Importantly, if you add unusual names into a Custom Vocabulary, they SHOULD become recognizable (complex pronunciation issues aside).
You can find information if you dig through:
http://dragonmobile.nuancemobiledeveloper.com ,
looking for 'Custom Vocabularies'. Grammar mode is essentially a special mode of custom vocabularies.
At the time of writing, there was a document here that makes some mention of grammar mode:
http://dragonmobile.nuancemobiledeveloper.com/downloads/custom_vocabulary/Guide_to_Custom_Vocabularies_v1.5.pdf - It only really becomes clear when you try to progress in their provisioning web GUI.
You have to set up an account, and jump through other hoops, but there is a free tier. This is the only potential way I have found to constrain a recognition vocabulary.
Well, short of running up PocketSphinx, but that is still described as "research"-quality "pre-alpha" software.
No, I don't work for Nuance. Not sure anyone does. They may have all been eaten by zombies. You would guess as much reading their support forums. They never reply.
I'm using the Speech Recognizer Intent in Android. Is there a way to add your own customized words or phrases to Android's speech recognition "dictionary"?
No. You can only use the two language models supported.
The built-in speech recognition provided by Google only supports the dictation and search language models. See http://developer.android.com/reference/android/speech/RecognizerIntent.html and LANGUAGE_MODEL_FREE_FORM or LANGUAGE_MODEL_WEB_SEARCH.
http://developer.android.com/resources/articles/speech-input.html says:
You can make sure your users have the best experience possible by requesting the appropriate language model: free_form for dictation, or web_search for shorter, search-like phrases. We developed the "free form" model to improve dictation accuracy for the voice keyboard, while the "web search" model is used when users want to search by voice.
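To make the distinction concrete, here is a sketch from SL4A/QPython (as in the first question above), where the facade's third parameter carries the model name; I'm assuming the strings 'free_form' and 'web_search', which are the values behind RecognizerIntent.LANGUAGE_MODEL_FREE_FORM and LANGUAGE_MODEL_WEB_SEARCH. In a plain Android app you would put the constant into the RecognizerIntent.EXTRA_LANGUAGE_MODEL extra instead:

import android

droid = android.Android()

# Dictation-style recognition (the 'free form' model)...
dictation = droid.recognizeSpeech('Dictate something', None, 'free_form').result

# ...versus short, query-style recognition (the 'web search' model).
query = droid.recognizeSpeech('Say a search query', None, 'web_search').result

print(dictation)
print(query)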
Michael is correct: you cannot change the language model.
However, you can use "sounds like" algorithms to process the results from Android and match words it doesn't know.
See my answer here:
speech recognition reduce possible search results
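To sketch the "sounds like" idea: a simplified Soundex maps a word to a short phonetic key (first letter plus three digits), so similar-sounding words collide, and you then match the recognizer's output against your own word list by key. This is an illustration, not the code from the linked answer, and the names are made up:

def soundex(word):
    # Simplified Soundex; strict Soundex has extra H/W rules, omitted here.
    digits = {}
    for letters, d in (('BFPV', '1'), ('CGJKQSXZ', '2'), ('DT', '3'),
                       ('L', '4'), ('MN', '5'), ('R', '6')):
        for ch in letters:
            digits[ch] = d
    word = word.upper()
    code, prev = [], digits.get(word[0], '')
    for ch in word[1:]:
        d = digits.get(ch, '')  # vowels map to '' and break up runs
        if d and d != prev:
            code.append(d)
        prev = d
    return (word[0] + ''.join(code) + '000')[:4]

def match(heard, known_words):
    # Return the known words whose Soundex key equals the heard word's key.
    key = soundex(heard)
    return [w for w in known_words if soundex(w) == key]

For example, match('Rajesh', ['Rakesh', 'Ramesh', 'Suresh']) returns ['Rakesh'], since both map to the key R220; you would run each string the recognizer hands back in EXTRA_RESULTS through a matcher like this and keep the hits.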