While researching, I found a couple of speech-to-text APIs for Android:
Pocket Sphinx
Android Native API
I have the following requirements:
Must support offline speech recognition (I'm not sure whether the Android API can do this).
Must detect and respond immediately to every word said. I would prefer this to detecting an entire sentence, although I could split the returned sentence into an array and get each word.
Detection needs to run in the background (no popups or anything similar, as the Android API seems to show).
Can someone recommend an API that meets these requirements?
Pocketsphinx meets all your requirements. What you call the "Android Native API" is basically a set of interface definitions and it does not contain the notion of offline/online.
You can also implement these interfaces using Pocketsphinx, since it supports things like partial results, confidence scores, n-best results etc. This way the implementation becomes available to any Android app. Maybe somebody has done it already, but I'm not aware of it.
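To react to each word as it is spoken, one option is to compare every partial hypothesis with the previous one and act only on the words that were appended. Below is a minimal sketch in plain Java, assuming partial results arrive as growing strings (for example from a listener's partial-result callback). The class and method names are hypothetical and are not part of Pocketsphinx or the Android API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns a stream of partial hypotheses into newly heard words. */
public class PartialWordTracker {
    private List<String> previous = new ArrayList<>();

    /** Returns the words that are new compared to the previous partial hypothesis. */
    public List<String> update(String hypothesis) {
        String trimmed = hypothesis == null ? "" : hypothesis.trim();
        List<String> current = trimmed.isEmpty()
                ? new ArrayList<>()
                : Arrays.asList(trimmed.split("\\s+"));

        // Length of the common prefix shared with the previous hypothesis.
        int common = 0;
        while (common < previous.size() && common < current.size()
                && previous.get(common).equals(current.get(common))) {
            common++;
        }

        // Anything after the common prefix is new (or was revised by the decoder).
        List<String> fresh = new ArrayList<>(current.subList(common, current.size()));
        previous = new ArrayList<>(current);
        return fresh;
    }

    /** Call when an utterance ends so the next one starts from scratch. */
    public void reset() {
        previous = new ArrayList<>();
    }
}
```

Because a decoder may revise earlier words as more audio arrives, the sketch reports everything after the first point of disagreement rather than only the last word.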
Related
I am trying to get continuous speech recognition working using Pocketsphinx. I tried following their tutorial (https://cmusphinx.github.io/wiki/tutorialandroid/), but I found it somewhat vague and could not get it to work. I am now trying a different approach: starting from pocketsphinx-android-demo (https://github.com/cmusphinx/pocketsphinx-android-demo) and then removing its limitations, so that it can recognize longer utterances and more words. I have figured out where the recognition output goes, but the demo can still only recognize weather words (I removed the digits and phones demos). I have discovered that the activation phrase can use an unlimited vocabulary, but I can't figure out what is limiting the vocabulary of the actual recognition. Here is the GitHub link to my project, in case it is helpful: https://github.com/Michaelszeng/pocket-sphinx_App_mk2/commits/master
Does anybody know what is limiting the demo recognizer's vocabulary or how I could remove that limitation?
I would like to implement offline voice recognition in my app. But I want it for two purposes:
(1) For a small set of commands (play, stop, previous, next and a couple of others);
(2) For a list of a few hundred bird names.
To implement (1), it seems to me a bad idea (slower and more resource-consuming) to use the full power of Android's voice recognition. In my mind, it would be easier to tell my app to interpret only a few words, that is, to use my own dictionary and tell my app to "use only these 10 words".
Implementing (2) is similar to (1), but with a few hundred words instead of 10.
Does this make sense, and if so, is there an easy way to implement it? Is it worth it?
Thanks!
L.
You can implement your app using CMUSphinx on Android. The CMUSphinx tutorial is here:
http://cmusphinx.sourceforge.net/wiki/tutorial
Language models for recognizing a limited set of words are described here:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
You can use keyword spotting mode to recognize a few commands.
Pocketsphinx on Android is described here:
http://cmusphinx.sourceforge.net/wiki/tutorialandroid
The demo shows how to switch recognition modes, from 10 words to a few hundred words, as you intend.
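For a small, fixed command set, one way to restrict the vocabulary is a JSGF grammar listing the allowed phrases. Below is a minimal sketch that renders a word list as JSGF text. The class name, the method name and the rule name `<item>` are made up for illustration. It assumes every word also exists in the pronunciation dictionary in use, which the sketch does not check.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that renders a list of phrases as a JSGF grammar. */
public class GrammarBuilder {

    /** Builds a grammar with one public rule that accepts any single phrase. */
    public static String toJsgf(String grammarName, List<String> phrases) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");                               // JSGF header line
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <item> = ");
        sb.append(String.join(" | ", phrases));                   // alternatives
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> commands = Arrays.asList("play", "stop", "previous", "next");
        System.out.print(toJsgf("commands", commands));
    }
}
```

The same method could be fed the few hundred bird names, producing a second grammar to switch to once a command has been recognized.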
I am building a speech recognition android app that will act as a virtual personal assistant with tasks such as:
Make appointments/Reminders
Weather Info
General queries to Wolfram|Alpha / Wikipedia (e.g. who directed Ghostbusters, what's the £-$ exchange rate)
My question is whether to use Pocketsphinx or the Google API.
Originally I set this up with "android.speech.RecognitionListener", which worked great. However, I want to implement keyword spotting so that the user doesn't need any interaction other than speaking.
Apparently the Google API doesn't support this, so I looked into using Pocketsphinx for keyword spotting while still using Google for the rest of the app (as I heard Pocketsphinx is not as accurate?).
However, the two don't get along, as they can't both occupy the microphone at the same time.
Is there a nice way to switch between recognizers? (I can't even import both into the same project.)
Should I just go with Pocketsphinx and deal with the lower accuracy?
Suggestions would be helpful
Cheers
For anybody who wants to implement a similar project, I have found a workaround. It's a bit hacky and not entirely clean, but it works.
Use the Android speech recognizer with a toggle on/off switch, as in many examples across the web. When onResults comes back, check the string for the "hotword": if it is not present, discard the string; if it is, process it. Once the query has been processed and the text-to-speech is responding, programmatically re-click the toggle button to ensure constant listening.
Do the same in "onError" as well.
I also had the check in onPartialResults, but it appeared to make the thread crash. I'm not entirely sure why, but once it was removed everything seemed to work nicely.
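The hotword check described above can be kept separate from the Android callbacks. Below is a minimal sketch in plain Java, assuming the candidate strings are the list that onResults provides (on Android, typically read from the results Bundle under SpeechRecognizer.RESULTS_RECOGNITION). The class name, method name and example hotword are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: keeps only recognition results that contain the hotword. */
public class HotwordFilter {
    private final String hotword;

    public HotwordFilter(String hotword) {
        this.hotword = hotword.toLowerCase(Locale.ROOT).trim();
    }

    /**
     * Returns the text following the hotword in the first candidate that
     * contains it as a whole word or phrase, or empty if no candidate does.
     */
    public Optional<String> extractQuery(List<String> candidates) {
        for (String candidate : candidates) {
            // Normalise case and whitespace, and pad so whole-word matching is simple.
            String text = " " + candidate.toLowerCase(Locale.ROOT).trim()
                    .replaceAll("\\s+", " ") + " ";
            int idx = text.indexOf(" " + hotword + " ");
            if (idx >= 0) {
                // Skip the leading space, the hotword and the trailing space.
                return Optional.of(text.substring(idx + hotword.length() + 2).trim());
            }
        }
        return Optional.empty(); // discard: hotword not heard
    }
}
```

With this in place, onResults would only process the query when `extractQuery` returns a value, and restart listening in either case.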
Pocketsphinx is only practical for recognizing a predefined set of commands, because its general accuracy is quite poor (you should prepare your own dictionary and language model). On the plus side, Pocketsphinx can be used offline, which is a big advantage for some projects.
On the other hand, Google's recognizer is very accurate, but it's not free and works only online.
I have tried to code this with Android's included android.speech.SpeechRecognizer class with no success.
Basically, what I am trying to do is making my app constantly listen for one keyword that will fire an intent whenever the keyword is recognized. I know that this will use a lot of battery.
For example - you are talking with a person. Normal conversation. The phone is actively listening and recognizing every single said word and listening for the keyword.
Let's say the keyword is "cheese" in this instance.
Whenever you say "cheese," the application fires an intent that starts up another part of the app.
I have tried to use speech recognition as a service, but things didn't really go as planned. Maybe I made a mistake, I don't know.
I've been trying to accomplish this for 2 days in a row now, for more than 24 hours of work time combined. If I am being too broad or infringing any of SO's rules, I sincerely apologize and ask for my question to be deleted.
My question is: how would this be possible? Of course the speech recognition that is included with Android itself would be preferable, but it will definitely be a hassle because it is not designed to work for extended periods.
From my research, there is no way to do this using the standard Google voice recognition server. The way it works is that once a sound or word is recognized, the recognizer returns a list of what it thinks it heard, with an associated confidence score.
To do what you are asking, you would:
Have to keep re-activating the recognition service every time it fires a recognition event, until it matches the word you want.
Have to keep the recognition service awake. You could do this by creating a service that periodically wakes up your handset and resumes the service/activity.
I would not recommend either of these options, considering that battery life is significantly reduced when the voice recognition service is constantly on.
Unfortunately, I do not think there are any native Android APIs that will fully suit your needs. I would recommend checking out pocketsphinx.
It is a pretty robust speaker-independent speech recognition API from CMU that is more intended for tasks such as this. You can also check out a tutorial for getting started here.
Google has not made the API support for "OK GOOGLE" public and has left it to vendors to change it or pass the support on to consumers.
I think the best bet at this time would be to build the source code yourself and then call the APIs. As an example, the Google library linked below contains the low-level details of implementing a recognizer. I'm not sure why Google has not made it public.
I don't see an easy way to implement and test it.
http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/4.3_r2.1/android/speech/srec/Recognizer.java
I have used the code provided in this link for speech recognition. In the emulator it says the recognizer is not present, so I installed it on a phone. When I click the speak button, it works. But when I speak a name such as "rajesh", it shows some possible verbs and other words, but not the name. I want to use the input to select a contact from the address book in order to make a call, so please tell me how to proceed in this direction. One more thing: every time, I need to develop the code in Eclipse, install it on the phone and then check the output. Is there any alternative for editing and checking the app code on the phone from Eclipse?
Please provide any possible links. I want to develop a calling app for blind users. If voice recognition does not work, what else could be done to take input from the user?
Names are hard for speech recognition. There are more possible names in the world than words in any dictionary, so recognising any arbitrary name is hard, though common names are easier.
Anyway, if you want to recognise a customized list of words/names, you might want to look at Dragon Mobile from Nuance. Here is a copy-and-paste from another similar question I answered:
If you use 3rd party Android recognition from Nuance (The people behind DragonDictate), it supports a "grammar mode" where you can somewhat restrict the phrases that will be recognised during recognition.
Importantly, if you add unusual names into a Custom Vocabulary, they SHOULD become recognizable (Complex pronunciation issues aside).
You can find information if you dig through:
http://dragonmobile.nuancemobiledeveloper.com ,
looking for 'Custom Vocabularies'. Grammar mode is essentially a special mode of custom vocabularies.
At the time of writing, there was a document here that makes some mention of grammar mode:
http://dragonmobile.nuancemobiledeveloper.com/downloads/custom_vocabulary/Guide_to_Custom_Vocabularies_v1.5.pdf - It only really becomes clear when you try to progress in their provisioning web GUI.
You have to set up an account, and jump through other hoops, but there is a free tier. This is the only potential way I have found to constrain a recognition vocabulary.
Well, short of running up PocketSphinx, but that is still described as a 'Research' 'PreAlpha'.
No, I don't work for Nuance. Not sure anyone does. They may have all been eaten by zombies. You would guess as much reading their support forums. They never reply.
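If the recognizer's vocabulary cannot be constrained, a different workaround (not the Nuance approach above, and not something the recognizer itself provides) is to post-process its output: compare each returned candidate against the names in the address book and pick the closest one. Below is a minimal sketch using Levenshtein edit distance. The class name, method names and the distance threshold are assumptions for illustration, and a phonetic comparison would probably work better for names.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: maps recognizer candidates to the closest contact name. */
public class ContactMatcher {

    /** Classic dynamic-programming Levenshtein edit distance. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Returns the contact closest to any candidate, or null if none is within maxDistance. */
    public static String bestMatch(List<String> candidates, List<String> contacts,
                                   int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : candidates) {
            String heard = candidate.toLowerCase(Locale.ROOT).trim();
            for (String contact : contacts) {
                int d = editDistance(heard, contact.toLowerCase(Locale.ROOT));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = contact;
                }
            }
        }
        return best;
    }
}
```

When `bestMatch` returns null, the app could fall back to reading out a short list of contacts for the user to confirm.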