I am new to speech recognition and Android, and I have a use case where I need to build an Android app that takes commands (a limited set, fewer than 100) from users and executes some logic. I have googled a bit and found that the following options are available:
Use the Google Cloud Speech API
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Can someone please suggest which of the above might best suit my use case?
I have a limited set of commands, and speed matters the most to me.
I am really confused, which is why I am asking this question. Thanks in advance.
Use the Google Cloud Speech API
Very expensive, since you have to pay for every request.
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
The inbuilt API is fine to use. It is different from the Cloud API, and it is free. However, it does not work offline transparently for the user. On the downside, it is slow and you cannot configure the vocabulary, so it decodes all words rather than a particular set of commands, and in noisy conditions it will often confuse the required commands with other words.
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Proper development always takes effort.
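Because the inbuilt recognizer cannot be restricted to a command vocabulary, one possible workaround is to post-process its transcript and map it to the closest entry in the command list. A minimal sketch of that idea follows, in Python for brevity (the same logic could be ported to Java or Kotlin). The command list, the `match_command` name, and the 0.6 similarity cutoff are illustrative assumptions, not part of any Android or Google API.

```python
import difflib

# Hypothetical command set; a real app would list its own (fewer than 100) commands.
COMMANDS = [
    "turn on the light",
    "turn off the light",
    "play music",
    "stop music",
]


def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the command most similar to the transcript, or None if nothing is close.

    cutoff is an assumed similarity threshold in [0, 1]; tune it on real recordings.
    """
    # Normalise case and whitespace so "Play  Music" and "play music" compare equal.
    text = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(text, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_command("Turn on the lights"))
    print(match_command("play musik"))
```

Returning `None` for distant transcripts lets the app ask the user to repeat instead of executing the wrong command. This does not make recognition faster; it only reduces confusion between commands and unrelated words.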
I am researching ways to implement form filling via voice commands given by the user inside my application. I have looked at two options, but neither seems suitable, and I am a bit confused.
First, I tried integrating Android's speech-to-text library. It gives me text, but it isn't smart enough to hold a conversation the way Google Assistant does.
Second, I tried integrating Google Assistant with api.ai. It provides the user conversation, but it works like adding a command to Google Assistant. It doesn't give me the speech-to-text data that I need to fill the form and carry out further operations.
Please suggest ways to implement this.
You can use the SDK provided by Slang Labs, which allows you to add a custom voice experience inside your app. You can create a "buddy" in their console and configure the kinds of intents/utterances you want to handle. Then integrate the SDK into your app; it takes care of all the voice-related functionality, and you can register callbacks for the intents you configured in the console to handle the app-specific actions.
(disclaimer: I am a co-founder of Slang Labs :-))
For your implementation you wouldn't use Actions on Google through Dialogflow, but rather the Google Assistant SDK, which is meant for devices.
However, in your case it may make sense to use Dialogflow's Android client. You would not need to pull all of the Google Assistant's capabilities and the voice interaction would be limited to your own application.
So I'm looking into building a speech-to-text app for fun. I did some research and found an inbuilt speech-to-text API using RecognizerIntent that is free, but I also found that Google is now offering a Cloud Speech API that they charge for.
My question is: what is the difference between them, and if I use the inbuilt RecognizerIntent, is it free?
For the Google Cloud Speech API, refer to the following link:
https://cloud.google.com/speech/. Here are the highlights:
It supports 80 different languages.
It can recognize audio uploaded in the request.
It returns text results in real-time.
It is accurate in noisy environments.
It works with apps across any device and platform.
It is not free. Refer to the following link for pricing:
https://cloud.google.com/speech/pricing
For the Android Speech-to-Text API (Recognizer Intent), refer to the following link:
http://www.androidhive.info/2014/07/android-speech-to-text-tutorial/. Here are the highlights:
You need to pass the language (locale) to convert speech to text.
Not all devices support offline speech input.
You cannot pass an audio file to be recognized.
The intent returns an array of strings that match the input. We can consider the first one the most accurate.
It only works with Android phones.
It is free.
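The point above about the returned array can be sketched as a small selection helper: take the first usable candidate and treat an empty result as "nothing recognized". This is shown in Python as plain logic, not Android code. The `best_transcript` name is hypothetical, and the assumption that candidates arrive ordered best-first comes from the description above rather than from anything verified here.

```python
from typing import Optional, Sequence


def best_transcript(candidates: Sequence[str]) -> Optional[str]:
    """Return the first non-blank candidate, assuming the list is ordered best-first."""
    for candidate in candidates:
        text = candidate.strip()
        if text:  # skip empty or whitespace-only hypotheses
            return text
    return None  # nothing usable was recognized


if __name__ == "__main__":
    print(best_transcript(["hello world", "hollow word", "hello word"]))
```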
I just read an article saying that Google created a speech recognizer that works offline on Android.
Is there a way to use that in my project, which is not based on Android?
I'm sure that I'm not the first one to think about using it in non-Android projects, but I couldn't find anyone who has done it and published it on the web.
I have done a lot of research around the topic. What I want is simply a custom voice (not default voice on device) for my app. Wherever I have searched people suggest using device default.
The best example is the Jarvis app on the Play Store.
I would like to create a uniform experience on any device with this approach. Can someone suggest any good libraries or a way to achieve this?
There is Tacotron from Google for this purpose.
But I am not sure whether an Android version is available yet.
It is under development, and Google Assistant is probably using it.
But they mostly use the cloud version, in Python.
Google has recently made great progress with their speech recognition software, which is used in several open source products, e.g. Chromium Web Speech and Android Handsfree texting. I would like to use their speech recognition as part of my server stack, however I can't find much about it.
Is the speech recognition software available as a library or package? Or alternatively, can I call Chromium from another program to transcribe an audio file to text?
The Web Speech APIs are designed only to be used in the context of either Chrome or Android. A lot of the work happens in the client, so there is no public server-to-server API that would just take an audio file and process it.
If you search GitHub you will find tools such as https://gist.github.com/alotaiba/1730160, but I am pretty certain that this method of access is not supported or endorsed, and it is not guaranteed to keep working.
The method previously mentioned at https://gist.github.com/alotaiba/1730160 does work for me. I use it on a daily basis in my home automation programs. I use a Python script to capture audio and decide whether it is useful audio or just noise; it then sends the short audio snippet to Google and gets the text back, all in under a second! I have successfully integrated it into my programs, and if you google around you will find even more people who have done so as well.
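The "useful audio or just noise" step is not shown in the answer. One simple way to approximate it is an energy gate: compute the RMS amplitude of each frame and keep only frames above a threshold. The sketch below assumes 16-bit little-endian mono PCM; the function names and the threshold of 500 are hypothetical and would need tuning for a real microphone. It is not the author's actual script.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as useful audio when its RMS reaches the assumed threshold."""
    return rms(frame) >= threshold


def make_tone(amplitude: int, n: int = 160) -> bytes:
    """Generate a synthetic sine frame, used here only to exercise the gate."""
    samples = [int(amplitude * math.sin(2 * math.pi * i / 16)) for i in range(n)]
    return struct.pack("<%dh" % n, *samples)


if __name__ == "__main__":
    print(is_speech(make_tone(8000)))  # loud frame
    print(is_speech(make_tone(50)))    # quiet frame
```

A fixed threshold is the crudest option. It tends to fail when background noise varies, so an adaptive threshold or a dedicated voice activity detector may be a better fit in practice.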