I am trying to get continuous voice input to work in my Android application. I tried using the built-in SpeechRecognizer Intent but it waits for the user to finish speaking before processing the words. This is not sufficient for me. I need the device to process the words while the user is still speaking.
I read that this is supported in Ice Cream Sandwich now. However, I did not find any API that allows me to access this feature. Does anyone know how this works now?
Thanks for your help!
I guess you heard about the new voice typing feature of Android 4.0. Take a look at this article.
You have to use an external library for it. Note, though, that the article says the library is designed for IME developers, and as far as I can see the result of the voice recognition is delivered to a registered IME through InputMethodService. You can also check the source of the library, since it is a project on Google Code.
I am new to speech recognition and Android, and I have a use case where I need to build an Android app that takes commands (a limited set, fewer than 100) from users and executes some logic. I have googled a bit and found the following options:
Use the Google Cloud Speech API
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Can someone please suggest which of the above might best suit my use case?
I have a limited set of commands and speed matters the most to me.
I am really confused, hence this question. Thanks in advance.
Use the Google Cloud Speech API
Very expensive since you have to pay for every request.
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
The inbuilt API is OK to use. It is different from the Cloud API and it is free. It does not work offline transparently for the user, though. The downside is that it is slow and you cannot configure the vocabulary, so it will decode all words instead of one particular set of commands, and in noise it will often confuse the required commands with other words.
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Proper development is always an effort.
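Since the inbuilt recognizer decodes open vocabulary, one possible workaround is to map its free-text output back onto the limited command set after recognition. The following is only a minimal sketch of that idea, written in Python for brevity; the command list, the function name match_command and the 0.6 cutoff are all assumptions made for illustration, not part of any Android or Google API.

```python
import difflib

# Hypothetical command set; a real app would list its own commands here.
COMMANDS = [
    "turn on the lights",
    "turn off the lights",
    "play music",
    "stop music",
]


def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the command most similar to the transcript, or None.

    The cutoff (0.0 to 1.0) is an assumed similarity threshold; anything
    below it is treated as "not a command" instead of being forced onto
    the nearest entry.
    """
    # Normalize case and whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this sketch, a slightly misheard transcript such as "turn on the light" still resolves to "turn on the lights", while an unrelated sentence returns None. The same matching could be done in Java or Kotlin inside the app; it does not make recognition itself any faster.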
Google has recently made great progress with their speech recognition software, which is used in several open source products, e.g. Chromium Web Speech and Android Handsfree texting. I would like to use their speech recognition as part of my server stack, however I can't find much about it.
Is the text recognition software available as a library or package? Or alternatively, can I call chromium from another program to transcribe some audio file to text?
The Web Speech APIs are designed only to be used in the context of either Chrome or Android. There is a lot of work that goes on in the client, so there is no public server-to-server API that would just take an audio file and process it.
If you search GitHub you will find tools such as https://gist.github.com/alotaiba/1730160, but I am pretty certain that this method of access is not supported or endorsed, and there is no guarantee that it will keep working.
The method previously mentioned at https://gist.github.com/alotaiba/1730160 does work for me. I use it on a daily basis in my home automation programs. I use a Python script to capture audio and determine whether it is useful audio or just noise; it then sends the short audio snippet to Google and gets the text back, all in under a second! I have successfully integrated it into my programs, and if you google around you will find even more people who have done so as well.
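The script itself is not shown, so the following is only a guess at how the "useful audio or just noise" decision might look: a minimal energy gate over frames of 16-bit little-endian mono PCM. The function names and the threshold of 500 are assumptions chosen for illustration, and the upload step is deliberately left out.

```python
import math
import struct


def frame_rms(frame_bytes):
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    # Ignore a trailing odd byte, if any, so unpack always gets whole samples.
    samples = struct.unpack("<%dh" % count, frame_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def speech_frames(frames, threshold=500.0):
    """Return the indices of frames whose RMS amplitude exceeds the threshold.

    The threshold is an assumed value; in practice it would be tuned to the
    microphone and the room.
    """
    return [i for i, frame in enumerate(frames) if frame_rms(frame) > threshold]
```

Only the frames that pass the gate would then be joined into a snippet and sent for recognition, which keeps silence and background hum from being uploaded.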
Is there any way I can make an app using Jelly Bean's offline speech recognition API and choose the sensitivity?
Actually, I want to build an app that tests a user's spoken English and scores it.
If anyone can provide a link to an app that currently does the same thing, that would be helpful too, because in that case I wouldn't need to build an app to help myself.
If you want offline speech recognition on Android, use Pocketsphinx:
http://cmusphinx.sourceforge.net/2011/05/building-pocketsphinx-on-android/
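Whichever recognizer is used, the scoring part still has to be built on top of its output. One simple proxy, offered here purely as an assumption and not as something Pocketsphinx provides, is to ask the user to read a known sentence and compute the word error rate between that reference and the recognized text. The function name below is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A rate of 0.0 means the recognizer heard exactly the reference sentence; higher values mean more words were missed or misheard. Note that this measures the recognizer's errors as well as the speaker's, so it is at best a rough indicator of speaking quality.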
Android has a really marvelous voice recognition feature built into Google Translate. As far as I can tell, this is the only app that lets you speak in a foreign language and have the app transcribe what you said (and subsequently translate it to another language).
I'm curious if anyone knows how one might leverage the voice recognition lib and utilize it for things other than translation. Specifically, I want to be able to dictate text for email. I googled around a bit, but was unable to find anything. Curious if this functionality is exposed to the wider developer community (like most everything else under the Google roof).
TIA
I think this is as close an answer as I'm going to find.
http://developer.android.com/reference/android/speech/RecognizerIntent.html#ACTION_GET_LANGUAGE_DETAILS
I want to see the source code for the voice-enabled keyboard feature for Android.
Can someone tell me where to find the code?
I assume you're referring to the speech recognition feature demonstrated on the Nexus One with Android 2.1.
If this application is open sourced as part of Android, it will be posted on the Android Open Source Project website at https://android.googlesource.com.
However, Android 2.1 has not yet been posted; it should hopefully be available soon.
In the meantime, you could take a look at the source to the voice dialler application.
As far as I know this code is not currently planned to be open sourced -- it is owned by Google as part of their voice recognition server technology. The IME is a fork that Google made of the standard platform input method, adding voice search to it, much like other manufacturers make their own proprietary customizations.