I'm writing a speech-to-text (STT) feature in Android Studio and I have a question about a few lines of code.
intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
The first line creates an intent for capturing the user's speech input, and the last one sets the language we are going to use. But what about the second line?
Even though I've read the public documentation, I can't understand it:
'The extra key used in an intent to the speech recognizer for voice search'
I understand it like this: after getting the speech input from the first line, use that input in the intent (but what kind of intent?) to the speech recognizer for voice search.
But I'm still not sure.
Can you give me an explanation?
Thank you in advance
It's a flag that the voice search API uses to identify the caller of the API (your application), so that the voice search service can handle its callbacks and related behavior based on your package name.
Related
In Android, is there an API I can use to detect when the user speaks into the mic?
I'm expecting there is voice recognition built into Android, or some speech-to-text API I can use to detect someone speaking. Any ideas? Can ACTION_RECOGNIZE_SPEECH help me?
The way it works is you create an Intent with ACTION_RECOGNIZE_SPEECH and call startActivityForResult(). Then in your onActivityResult() override, you can pull the speech-to-text data from the result Intent extras.
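As a rough sketch of that flow (REQ_SPEECH and startSpeechInput are arbitrary names chosen for this example, and the language-model extra is just one common choice):
private static final int REQ_SPEECH = 100;

private void startSpeechInput() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    startActivityForResult(intent, REQ_SPEECH);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQ_SPEECH && resultCode == RESULT_OK && data != null) {
        // EXTRA_RESULTS holds an ArrayList<String> of candidate
        // transcriptions, best match first
        ArrayList<String> matches =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
    }
}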
Here's a neat tutorial to get you started:
Make your next Android app a good listener -- TechRepublic
I am looking at doing speech recognition in Android. The program needs to have continuous speech recognition. The vocabulary only needs to be about 10 words. I have considered using Google's API, but I don't think it will work (I cannot have anything covering the screen). I have been looking into other ways, but nothing seems like it will work. Is it possible to use Java's speech recognition library, or is there any other way of going about this?
In summary
1. Needs continuous speech input
2. 10 words at most
3. Can train if necessary
4. Overview of program: display screen, wait for voice or touch input, update screen, repeat
5. Cannot cover what is being displayed on the screen
Any help would be appreciated.
Thanks in advance
I think you would have to capture audio directly from the phone's microphone and stream it to your own recognition service. The Google recognition APIs are built as an Intent that launches their own Recognition dialog and gives you back results. If you want continuous recognition without a UI, you'll have to build that functionality yourself.
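A minimal sketch of that capture loop with AudioRecord (needs the RECORD_AUDIO permission; isListening and sendToRecognitionService are hypothetical placeholders for your own stop flag and backend call):
int sampleRate = 16000;
int bufSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, bufSize);
rec.startRecording();
short[] buffer = new short[bufSize / 2];
while (isListening) {
    // read raw PCM from the mic and hand it to your own recognizer
    int read = rec.read(buffer, 0, buffer.length);
    if (read > 0) sendToRecognitionService(buffer, read);
}
rec.stop();
rec.release();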
CMUSphinx has recently implemented continuous listening on the Android platform. You can find the demo on the wiki page.
You can configure one or multiple keywords to listen for; the default keyword is "oh mighty computer". You can also configure the detection threshold. Currently supported languages are US English and a few others (French, Spanish, Russian, etc.). You can train your own model for your language.
Listening is simple: you create a recognizer and just add a keyword-spotting search:
recognizer = defaultSetup()
        .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
        .setDictionary(new File(modelsDir, "lm/cmu07a.dic"))
        .setKeywordThreshold(1e-5f)
        .getRecognizer();
recognizer.addListener(this);
// register the keyphrase to spot, then switch to that search
// (switchSearch is a demo helper; a sketch of it follows the listener below)
recognizer.addKeywordSearch(KWS_SEARCH_NAME, KEYPHRASE);
switchSearch(KWS_SEARCH_NAME);
and define a listener:
@Override
public void onPartialResult(Hypothesis hypothesis) {
    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE)) {
        // keyphrase detected - do something
    }
}
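The switchSearch call above is a helper from the CMUSphinx demo, not part of the library API; a minimal version (an assumption based on that demo) just restarts the recognizer on the named search:
private void switchSearch(String searchName) {
    recognizer.stop();
    recognizer.startListening(searchName);
}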
Instead of a single key phrase, you can specify the path to a commands file on the filesystem:
recognizer.addKeywordSearch(KWS_SEARCH,
        new File(assetsDir, "commands.lst").toString());
with the commands file commands.lst containing one command per line:
oh mighty computer
ok google
hello dude
To put this file on the filesystem, you can place it in your assets and run syncAssets on application start.
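A minimal sketch of that sync step, assuming the Assets helper class from the pocketsphinx-android demo:
// copies the files bundled under assets/ to app storage and
// returns the target directory; throws IOException on failure
Assets assets = new Assets(this);      // 'this' is a Context
File assetsDir = assets.syncAssets();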
Here is another way (if you are planning to use Phonegap/Cordova).
https://stackoverflow.com/a/39695412/3603128
1) It listens continuously.
2) It does not display (occupy) anything on the screen.
Use the CMUSphinx library:
It works in offline mode.
You can give it a custom name (keyphrase).
It starts listening when you call its name.
I am studying how the android.speech package works and I noticed that most of the extras used with the intent RecognizerIntent.ACTION_WEB_SEARCH are ignored by the speech recognizer.
If I set a language using the RecognizerIntent.EXTRA_LANGUAGE extra, the specified language is ignored and the device's default language is always used instead.
If I set a text using the RecognizerIntent.EXTRA_PROMPT, this text is not displayed.
If I start the speech recognition activity using startActivityForResult method, then the speech recognizer calls onActivityResult, but the second argument (the resultCode) is always RESULT_CANCELED and the third argument (the data Intent) is always null. This behavior is probably due to the fact that the purpose of this type of intent is to perform a search on the web. For the same reason, if I set the maximum number of results using RecognizerIntent.EXTRA_MAX_RESULTS, the specified value is ignored.
I observed this behavior even though the official documentation says that these options can also be used with the ACTION_WEB_SEARCH intent.
Why does the actual behavior of the voice recognition system differ from what is stated in official documentation?
I think you are using the wrong action. Instead of ACTION_WEB_SEARCH, use ACTION_RECOGNIZE_SPEECH.
If you do, onActivityResult will behave as you expect and your Activity will be in control of interpreting the recognition results.
By the way, when you set ACTION_WEB_SEARCH, you delegate handling of the results to Android. Based on what the user says, Android might start a web browser or start composing an email (if the user says "email"). Because of this it makes sense that your Activity does not receive any useful information, although I think it should still honor RecognizerIntent.EXTRA_LANGUAGE.
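As a quick sketch of that change (REQUEST_CODE is your own arbitrary constant; the extras shown are the ones mentioned in the question):
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
// with ACTION_RECOGNIZE_SPEECH these extras are honored, and onActivityResult
// receives RESULT_OK plus a data Intent containing the recognition results
startActivityForResult(intent, REQUEST_CODE);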
So I've searched far and wide for some sort of solution to removing Google's voice recognition UI dialog when a user wants to perform a voice command, but have been unable to find one. I am trying to implement an app which displays a menu, and the user can either tap the options or say them out loud to open the new pages. So far I've been unable to implement this unless I use Google's RecognizerIntent, but I don't want the dialog box to pop up. Anyone have any ideas? Or has anyone solved this issue or found a workaround? Thanks
EDIT: As a compromise maybe there is a way to move the dialog to the bottom of the screen while still being able to view my menu?
Does "How can I use speech recognition without the annoying dialog in android phones" help?
I'm pretty sure Nuance/Dragon charges for production or commercial applications that use their services. If this is just a demo, you may be fine with the developer account. Android speech services are free for all Android applications.
You know that you can do this with Google's APIs.
You've probably been looking at the documentation for the speech recognition intent. Look instead at the RecognitionListener interface to the speech recognition APIs.
Here's some code to help you:
public class SpeechRecognizerExample extends Activity implements RecognitionListener {

    private SpeechRecognizer recognizer;

    // This would go in your onCreate
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        recognizer = SpeechRecognizer.createSpeechRecognizer(this);
        recognizer.setRecognitionListener(this);
    }

    // Then you'd need to start it when the user clicks or selects a text field or something
    private void startListening() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        //intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "zh");
        intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
        recognizer.startListening(intent);
    }

    // Then you'd need to implement the RecognitionListener methods - they
    // basically work just like a click listener (a sketch follows below)
}
Here's the docs for a RecognitionListener:
http://developer.android.com/reference/android/speech/RecognitionListener.html
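As a rough sketch of what implementing those listener methods looks like (only onResults is fleshed out here; the others are left as no-ops):
@Override
public void onResults(Bundle results) {
    // RESULTS_RECOGNITION holds the candidate transcriptions, best match first
    ArrayList<String> matches =
            results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    // compare matches against your menu options and navigate accordingly
}

@Override public void onReadyForSpeech(Bundle params) {}
@Override public void onBeginningOfSpeech() {}
@Override public void onRmsChanged(float rmsdB) {}
@Override public void onBufferReceived(byte[] buffer) {}
@Override public void onEndOfSpeech() {}
@Override public void onError(int error) {}
@Override public void onPartialResults(Bundle partialResults) {}
@Override public void onEvent(int eventType, Bundle params) {}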
I've used the voice recognition feature on Android and I love it. It's one of my customers' most praised features. However, the format is somewhat restrictive. You have to call the recognizer intent, have it send the recording for transcription to google, and wait for the text back.
Some of my ideas would require recording the audio within my app and then sending the clip to Google for transcription.
Is there any way I can send an audio clip to be processed with speech to text?
I got a solution that works well for speech recognition and audio recording. Here is the link to a simple Android project I created to show the solution working. Also, I put some screenshots inside the project to illustrate the app.
I'm going to try to briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.
The Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:
"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
However, this API needs to receive a FLAC sound file to work properly. That brings us to the second part: FLAC recording.
I implemented FLAC recording in that project by extracting and adapting some pieces of code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.
Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text, and play the sound that was just recorded.
The project I created has the basic principles to make it work and can be improved for specific situations. In order to make it work in a different scenario, it's necessary to get a Google Speech API key, which is obtained by being part of the Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than 2 links in this post.
Unfortunately not at this time. The only interface currently supported by Android's voice recognition service is the RecognizerIntent, which doesn't allow you to provide your own sound data.
If this is something you'd like to see, file a feature request at http://b.android.com. This is also tangentially related to existing issue 4541.
As far as I know there is still no way to directly send an audio clip to Google for transcription. However, Froyo (API level 8) introduced the SpeechRecognizer class, which provides direct access to the speech recognition service. So, for example, you can start playback of an audio clip and have your Activity start the speech recognizer listening in the background, which will return results after completion to a user-defined listener callback method.
The following sample code should be defined within an Activity since SpeechRecognizer's methods must be run in the main application thread. Also you will need to add the RECORD_AUDIO permission to your AndroidManifest.xml.
boolean available = SpeechRecognizer.isRecognitionAvailable(this);
if (available) {
    SpeechRecognizer sr = SpeechRecognizer.createSpeechRecognizer(this);
    sr.setRecognitionListener(new RecognitionListener() {
        @Override
        public void onResults(Bundle results) {
            // process results here
        }
        // define your other overridden listener methods here
    });
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    // the following appears to be a requirement, but can be a "dummy" value
    intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, "com.dummy");
    // define any other intent extras you want

    // start playback of audio clip here

    // this will start the speech recognizer service in the background
    // without starting a separate activity
    sr.startListening(intent);
}
You can also define your own speech recognition service by extending RecognitionService, but that is beyond the scope of this answer :)