I've used the voice recognition feature on Android and I love it. It's one of my customers' most praised features. However, the format is somewhat restrictive: you have to call the recognizer intent, have it send the recording to Google for transcription, and wait for the text back.
Some of my ideas would require recording the audio within my app and then sending the clip to Google for transcription.
Is there any way I can send an audio clip to be processed with speech-to-text?
I found a solution that works well for both speech recognition and audio recording. Here is the link to a simple Android project I created to show the solution working. I also put some screenshots inside the project to illustrate the app.
I'll try to briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.
Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:
"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
However, this API needs to receive a FLAC sound file to work properly. That brings us to the second part: FLAC recording.
I implemented FLAC recording in that project by extracting and adapting some pieces of code and libraries from an open-source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.
Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text back, and play the sound that was just recorded.
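For illustration, here is a rough, untested sketch of the two-connection call in plain Java, based on the description above. The endpoint paths, query parameters, and the YOUR_API_KEY placeholder are assumptions drawn from Mike Pultz's write-up, not verified API documentation, and flacFile stands for the recorded clip:

// Rough sketch of the full-duplex upload/download pair (untested).
// Endpoints and parameters are assumptions based on public write-ups.
String pair = Long.toHexString(Double.doubleToLongBits(Math.random()));
String base = "https://www.google.com/speech-api/full-duplex/v1";

HttpURLConnection down = (HttpURLConnection) new URL(
        base + "/down?pair=" + pair).openConnection();

HttpURLConnection up = (HttpURLConnection) new URL(
        base + "/up?key=YOUR_API_KEY&pair=" + pair
        + "&lang=en-US&output=json").openConnection();
up.setDoOutput(true);
up.setChunkedStreamingMode(0); // stream the clip as a "live" chunked upload
up.setRequestMethod("POST");
up.setRequestProperty("Content-Type", "audio/x-flac; rate=16000");

// Write the FLAC bytes to the POST body.
try (OutputStream out = up.getOutputStream();
     InputStream in = new FileInputStream(flacFile)) {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) > 0) {
        out.write(buf, 0, n);
    }
}
up.getInputStream().close(); // finish the upload leg

// Read the JSON transcription from the down-channel. In a real app the
// down-channel should be read on a separate thread while uploading.
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(down.getInputStream(), "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // parse JSON result lines for the transcript
    }
}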
The project I created has the basic principles to make it work and can be improved for specific situations. In order to make it work in a different scenario, it's necessary to get a Google Speech API key, which is obtained by being part of the Google chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than two links in this post.
Unfortunately not at this time. The only interface currently supported by Android's voice recognition service is the RecognizerIntent, which doesn't allow you to provide your own sound data.
If this is something you'd like to see, file a feature request at http://b.android.com. This is also tangentially related to existing issue 4541.
As far as I know, there is still no way to directly send an audio clip to Google for transcription. However, Froyo (API level 8) introduced the SpeechRecognizer class, which provides direct access to the speech recognition service. So, for example, you can start playback of an audio clip and have your Activity start the speech recognizer listening in the background, which will deliver results to a user-defined listener callback when it completes.
The following sample code should be defined within an Activity, since SpeechRecognizer's methods must be run in the main application thread. You will also need to add the RECORD_AUDIO permission to your AndroidManifest.xml.
boolean available = SpeechRecognizer.isRecognitionAvailable(this);
if (available) {
    SpeechRecognizer sr = SpeechRecognizer.createSpeechRecognizer(this);
    sr.setRecognitionListener(new RecognitionListener() {
        @Override
        public void onResults(Bundle results) {
            // process results here
        }
        // implement the other RecognitionListener methods here
    });
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    // the following appears to be required, but can be a "dummy" value
    intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, "com.dummy");
    // define any other intent extras you want
    // start playback of the audio clip here
    // this starts the speech recognizer service in the background
    // without launching a separate activity
    sr.startListening(intent);
}
You can also define your own speech recognition service by extending RecognitionService, but that is beyond the scope of this answer :)
Related
I am using MediaStore.Audio.Media.RECORD_SOUND_ACTION to open a sound recorder application. I was not able to open one, because no default application was present, so I installed two voice recorder applications, but I still cannot see them in my chooser intent. I am using the following code:
Intent soundRecorderIntent = new Intent(); // create intent
soundRecorderIntent.setAction(MediaStore.Audio.Media.RECORD_SOUND_ACTION); // set action
startActivityForResult(soundRecorderIntent, ACTIVITY_RECORD_SOUND); // start activity
It works well on Marshmallow.
The code is correct, and you've probably found an app that supports the intent by now. But to prevent others from wasting their time like I did, here's a quick note:
Most of the currently top rated voice recording apps in the Play Store do not provide the necessary intent filter.
That seemed so unrealistic to me that I doubted my code when it didn't work after installing the five most popular voice recording apps. But a handy manifest viewer app reveals that their manifests simply do not declare the intent filter. So, as said, the code is correct.
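For reference, this is the intent filter a recorder app's manifest would need to declare for it to show up in the chooser (the action string below is what MediaStore.Audio.Media.RECORD_SOUND_ACTION resolves to; the activity name is a placeholder):

<activity android:name=".RecorderActivity">
    <intent-filter>
        <!-- MediaStore.Audio.Media.RECORD_SOUND_ACTION -->
        <action android:name="android.provider.MediaStore.RECORD_SOUND" />
        <category android:name="android.intent.category.DEFAULT" />
    </intent-filter>
</activity>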
To save you the time of searching for a suitable app, here are two that can be invoked:
Audio Recorder from Sony
Samsung Voice Recorder from Samsung
There is no requirement for a voice recorder app to support this Intent action, and apparently your device does not ship with an app that supports this Intent action either. Your code itself seems fine.
If recording is optional in your app, catch the ActivityNotFoundException, tell the user that they do not have a recorder, and perhaps suggest one that you have tested and know works, if you can find one.
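A minimal sketch of that defensive launch (ACTIVITY_RECORD_SOUND is the request-code constant from the question):

Intent intent = new Intent(MediaStore.Audio.Media.RECORD_SOUND_ACTION);
try {
    startActivityForResult(intent, ACTIVITY_RECORD_SOUND);
} catch (ActivityNotFoundException e) {
    // no app on this device handles RECORD_SOUND_ACTION
    Toast.makeText(this, "No voice recorder app installed",
            Toast.LENGTH_LONG).show();
}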
Otherwise, record the audio yourself, using MediaRecorder.
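For example, a minimal sketch of recording it yourself with MediaRecorder; the output format and method names are arbitrary choices, and the RECORD_AUDIO permission is required:

private MediaRecorder recorder;

private void startRecording(File outputFile) throws IOException {
    recorder = new MediaRecorder();
    recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
    recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
    recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
    recorder.setOutputFile(outputFile.getAbsolutePath());
    recorder.prepare(); // must be called before start()
    recorder.start();
}

private void stopRecording() {
    recorder.stop();
    recorder.release();
    recorder = null;
}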
I am doing speech recognition using a third-party cloud service on Android, and it works well with the Android SpeechRecognizer API. Code below:
Intent recognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH);
// accept partial results if they come
recognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
// need to have a calling package for it to work
if (!recognizerIntent.hasExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE)) {
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
            "com.example.speechrecognition");
}
recognizer = SpeechRecognizer.createSpeechRecognizer(context);
recognizer.setRecognitionListener(this);
recognizer.startListening(recognizerIntent);
At the same time, I want to record the audio with different audio settings, such as sample rate, channels, and audio format, and then continuously analyze the audio buffer. I use AudioRecord for this purpose. It works well, but only if I turn off speech recognition.
If I record audio and run speech recognition at the same time, an error occurs:
E/AudioRecord: start() status -38
How can I implement this kind of feature? I also tried native audio (OpenSL ES's SLRecordItf), but that doesn't work either.
As the comments state, only one mic access is permitted/possible at a time.
For the SpeechRecognizer, the attached RecognitionListener has an onBufferReceived(byte[] buffer) callback, but unfortunately Google's native recognition service does not supply any audio data to it, which is very frustrating.
Your only alternative is to use an external service, which won't be free. Google's new Cloud Speech API has an Android example.
I can't seem to find anything related to finding out what application got audio focus. I can correctly determine from my application what type of focus change it was, but not from any other application. Is there any way to determine what application received focus?
"What am I wanting to do?"
I have managed to record internal sound whether it be music or voice. If I am currently recording audio no matter the source, I want to determine what application took the focus over to determine what my application need's to do next.
Currently I am using the AudioManager.OnAudioFocusChangeListener for my application to stop recording internal sounds once the focus changes, but I want the application's name that gained the focus.
Short Answer: There's no good solution... and Android probably intended it this way.
Explanation:
Looking at the source code, AudioManager has no APIs (not even hidden ones) for checking who has audio focus. AudioManager wraps calls to AudioService, which holds the real audio state. The API that AudioService exposes through its Stub when AudioManager binds to it also has no method for querying the current audio focus holder. Thus, even through reflection or system-level permissions, you won't be able to get the information you want.
If you're curious how the focus changes are kept track of, you can look at MediaFocusControl whose instance is a member variable of AudioService here.
Untested Hacky Heuristic:
You might be able to get some useful information by looking at UsageStats timestamps. Then once you have apps that were used within say ~500ms of you losing AudioFocus you can cross-check them against apps with Audio Permissions. You can follow this post to get permissions for any installed app.
This is clearly a heuristic and could require some tuning. It also requires the user to grant your app permissions to get access to the usage stats. Mileage may vary.
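A hypothetical sketch of that heuristic (the 500 ms window is arbitrary, and the user must grant usage access in Settings):

// Query apps used around the moment audio focus was lost, then
// cross-check candidates against apps requesting audio permissions.
UsageStatsManager usm = (UsageStatsManager)
        context.getSystemService(Context.USAGE_STATS_SERVICE);
long lossTime = System.currentTimeMillis(); // captured in onAudioFocusChange
List<UsageStats> stats = usm.queryUsageStats(
        UsageStatsManager.INTERVAL_DAILY, lossTime - 500, lossTime + 500);
for (UsageStats s : stats) {
    // s.getPackageName() is a candidate focus holder; cross-check it
    // against apps that declare audio permissions, as described above
}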
Look at the MediaController class (new in Lollipop, available in the compatibility library for older versions).
There are these two methods that look interesting:
https://developer.android.com/reference/android/media/session/MediaController.html#getPackageName()
https://developer.android.com/reference/android/media/session/MediaController.html#getSessionActivity()
getPackageName supposedly returns the current session's package name:
http://androidxref.com/5.1.1_r6/xref/frameworks/base/media/java/android/media/session/MediaController.java#397
getSessionActivity gives you a PendingIntent with an activity to start (if one is supplied), where you could get the package as well.
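A hedged sketch of how these could be queried at runtime via MediaSessionManager; note that getActiveSessions requires an enabled NotificationListenerService, and MyNotificationListener below is a placeholder:

MediaSessionManager msm = (MediaSessionManager)
        getSystemService(Context.MEDIA_SESSION_SERVICE);
List<MediaController> controllers = msm.getActiveSessions(
        new ComponentName(this, MyNotificationListener.class));
for (MediaController controller : controllers) {
    String pkg = controller.getPackageName();            // session's package
    PendingIntent pi = controller.getSessionActivity();  // may be null
}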
Used together with your audio listener and a broadcast receiver for phone state to detect if the phone is currently ringing you might be able to use this in order to get a more fine grained detection than you currently have. As Trevor Carothers pointed out above, there is no way to get the general app with audio focus.
You can use dumpsys audio to find out who is holding audio focus. You can also look into the results of dumpsys media_session.
And if you want to find out who is playing music, you can use dumpsys media.audio_flinger. That is the command I switched to myself.
I am looking at doing speech recognition in Android. The program needs to have continuous speech recognition. The vocabulary only needs to be about 10 words. I have considered using Google's API, but I don't think it will work (I cannot have anything covering the screen). I have been looking into other ways, but nothing seems like it will work. Is it possible to use Java's speech recognition library, or is there any other way of going about this?
In summary
1. Need continuous speech input
2. 10 words at max
3. can train if necessary
4. overview of program - display screen, wait for voice or touch input, update screen, repeat
5. cannot cover what is being displayed on the screen
Any help would be appreciated.
Thanks in advance
I think you would have to capture audio directly from the phone's microphone and stream it to your own recognition service. The Google recognition APIs are built as an Intent that launches their own Recognition dialog and gives you back results. If you want continuous recognition without a UI, you'll have to build that functionality yourself.
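As a sketch, capturing raw microphone audio with AudioRecord looks roughly like this (16 kHz mono PCM and the isRecording flag are assumptions; use whatever format your recognition service expects):

int sampleRate = 16000;
int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, bufferSize);
record.startRecording();
byte[] buffer = new byte[bufferSize];
while (isRecording) { // your own stop flag
    int read = record.read(buffer, 0, buffer.length);
    // stream buffer[0..read) to your recognition service here
}
record.stop();
record.release();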
CMUSphinx has recently implemented continuous listening on the Android platform. You can find the demo on the wiki page.
You can configure one or multiple keywords to listen for; the default keyword is "oh mighty computer". You can also configure the detection threshold. Currently supported languages are US English and a few others (French, Spanish, Russian, etc.). You can train your own model for your language.
Listening is simple, you create a recognizer and just add keyword spotting search:
recognizer = defaultSetup()
        .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
        .setDictionary(new File(modelsDir, "lm/cmu07a.dic"))
        .setKeywordThreshold(1e-5f)
        .getRecognizer();
recognizer.addListener(this);
recognizer.addKeywordSearch(KWS_SEARCH_NAME, KEYPHRASE);
switchSearch(KWS_SEARCH_NAME);
and define a listener:
@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;
    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE)) {
        // do something
    }
}
Instead of a single key phrase, you can specify the path to a commands file on the filesystem:
recognizer.addKeywordSearch(KWS_SEARCH,
        new File(assetsDir, "commands.lst").toString());
The commands file commands.lst contains the keyword phrases, one per line:
oh mighty computer
ok google
hello dude
To put this file on the filesystem, you can place it in assets and run syncAssets on application start.
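Roughly, using the Assets helper from the pocketsphinx-android demo (the directory layout is an assumption):

Assets assets = new Assets(context);  // edu.cmu.pocketsphinx.Assets
File assetsDir = assets.syncAssets(); // copies assets to the filesystem
recognizer.addKeywordSearch(KWS_SEARCH,
        new File(assetsDir, "commands.lst").toString());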
Here is another way (if you are planning to use Phonegap/Cordova).
https://stackoverflow.com/a/39695412/3603128
1) It listens continuously.
2) It does not display anything on (occupy) the screen.
Use the CMUSphinx library:
It works in offline mode.
You can give it a wake-up name.
It will start listening when you call its name.
I have been developing an application that uses the RecognizerIntent to get voice input. However, since Jelly Bean launched, I have not been able to get the actual sound file from my voice input.
In the RecognitionListener (http://developer.android.com/reference/android/speech/RecognitionListener.html) there is a method called onBufferReceived. However, there is no guarantee that this method will be called, and when I implemented it, it never was. Is there any way to force this method to execute, or what is the best-practice approach to obtain the sound file that the RecognizerIntent analyzes?
It should be possible, since Google Now can do it with the voice command "note to self", and Google Keep's voice notes do the same.
Thanks
I don't think there is a way to force it. It clearly depends on the recognition service implementation. If Google decides not to call onBufferReceived, then there is no way to get the actual data that is used. Note that the mentioned Google apps don't use the (public) Intent/Service API to access speech recognition, but seem to use a private API within the apps (the speech recognition might be bundled within them).