I am having difficulty figuring out how to resolve this issue. I am not sure whether I am setting up my threads incorrectly, or whether it is even possible to resolve this properly.
This is an Android app that reads certain strings out as TTS (using the native Android TTS) at certain timings. During this TTS reading, the user should be able to barge in with instructions such as "Stop" or "Pause." This recognition is done using the iSpeech API.
Our current solution is to have the TTS running in a Thread that outputs the proper strings. Once the user presses a button to begin the voice recognition (using an Intent), the app does the voice recognition and handles it perfectly, but afterwards the TTS never outputs anything again. Logcat shows the following error:
11-28 02:18:57.072: W/TextToSpeech(16383): speak failed: not bound to TTS engine
I have thought about making the voice recognition a thread of its own that pauses the TTS, but the problem would then be that the timer controlling the TTS would fall out of sync with where it should be.
Any advice or help would be appreciated.
The relevant code for the thread and the intent is below:
Thread
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
//Prevent device from sleeping mid build.
getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
setContentView(R.layout.activity_build_order);
mPlayer = MediaPlayer.create(BuildOrderActivity.this, R.raw.bing);
params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID,"stringId");
tts = new TextToSpeech(BuildOrderActivity.this, new TextToSpeech.OnInitListener() {
@SuppressWarnings("deprecation")
public void onInit(int status) {
if(status != TextToSpeech.ERROR)
{
tts.setLanguage(Locale.US);
tts.setOnUtteranceCompletedListener(new OnUtteranceCompletedListener() {
public void onUtteranceCompleted(String utteranceId) {
mPlayer.start();
}
});
}
}
});
buttonStart = (Button) findViewById(R.id.buttonStartBuild);
buttonStart.setOnClickListener(new View.OnClickListener() {
public void onClick(View v) {
startBuild = new StartBuildRunnable();
Thread t = new Thread(startBuild);
t.start();
}
});
... // code continues: onCreate setup for the view
}
public class StartBuildRunnable implements Runnable {
public void run() {
double delay;
buildActions = parseBuildXMLAction();
buildTimes = parseBuildXMLTime();
say("Build has started");
delayForNextAction((getSeconds(buildTimes.get(0)) * 1000));
say(buildActions.get(0));
for (int i = 1; i < buildActions.size(); i++)
{
delay = calcDelayUntilNextAction(buildTimes.get(i - 1), buildTimes.get(i));
delayForNextAction((long) (delay * 1000));
say(buildActions.get(i));
//listViewBuildItems.setSelection(i);
}
say("Build has completed");
}
}
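For context, say() and delayForNextAction() are not shown in the question; presumably they are thin wrappers along these lines (a hypothetical reconstruction using the tts and params fields from onCreate above, not the asker's actual code):
private void say(String text) {
    // params carries KEY_PARAM_UTTERANCE_ID, so onUtteranceCompleted fires and plays the bing.
    tts.speak(text, TextToSpeech.QUEUE_ADD, params);
}
private void delayForNextAction(long millis) {
    try {
        Thread.sleep(millis); // blocks the build thread, not the UI thread
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}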
Intent
/**
* Fire an intent to start the speech recognition activity.
* @throws InvalidApiKeyException
*/
private void startRecognition() {
setupFreeFormDictation();
try {
recognizer.startRecord(new SpeechRecognizerEvent() {
@Override
public void onRecordingComplete() {
updateInfoMessage("Recording completed.");
}
@Override
public void onRecognitionComplete(SpeechResult result) {
Log.v(TAG, "Recognition complete");
//TODO: Once something is recognized, tie it to an action and continue recognizing.
// currently recognizes something in the grammar and then stops listening until
// the next button press.
if (result != null) {
Log.d(TAG, "Text Result:" + result.getText());
Log.d(TAG, "Text Conf:" + result.getConfidence());
updateInfoMessage("Result: " + result.getText() + "\n\nconfidence: " + result.getConfidence());
} else
Log.d(TAG, "Result is null...");
}
@Override
public void onRecordingCancelled() {
updateInfoMessage("Recording cancelled.");
}
@Override
public void onError(Exception exception) {
updateInfoMessage("ERROR: " + exception.getMessage());
exception.printStackTrace();
}
});
} catch (BusyException e) {
e.printStackTrace();
} catch (NoNetworkException e) {
e.printStackTrace();
}
}
Update: After some digging I managed to find some information in the Logcat. See bottom.
Edit 2:
I have now created a new activity from scratch to narrow down the problem. It still does not work correctly. Here is the code:
public class MainActivity extends AppCompatActivity {
private TextToSpeech textToSpeech;
private boolean isInitialized = false;
private MainActivity mainActivity;
int ctr = 0;
private String words[] = {"ord", "kula", "fotboll"};
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
Toolbar toolbar = findViewById(R.id.toolbar);
setSupportActionBar(toolbar);
mainActivity = this;
textToSpeech = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
@Override
public void onInit(int status) {
if (status == TextToSpeech.SUCCESS){
textToSpeech.setOnUtteranceProgressListener(new UtteranceProgressListener() {
@Override
public void onStart(String utteranceId) {
System.out.println("---onStart");
}
@Override
public void onDone(String utteranceId) {
System.out.println("-----onDone");
}
@Override
public void onError(String utteranceId) {
System.out.println("-----onError");
}
@Override
public void onError(String utteranceId, int errorCode){
onError(utteranceId);
System.out.println("Error with code: " + errorCode);
}
});
isInitialized = true;
Locale locale = new Locale("swe");
textToSpeech.setLanguage(locale);
}
}
});
FloatingActionButton fab = findViewById(R.id.fab);
fab.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
if (isInitialized){
System.out.println(textToSpeech.getLanguage().getDisplayLanguage());
textToSpeech.speak(words[ctr], TextToSpeech.QUEUE_FLUSH, null, "SpeakTest");
ctr++;
ctr %= words.length;
} else {
Snackbar.make(view, "Speaker not ready", Snackbar.LENGTH_LONG)
.setAction("Action", null).show();
}
}
});
}
}
What is extremely surprising is that only the words "ord" and "fotboll" are spoken, but not "kula". If I change words to {"kula", "kula", "kula"} and try long enough, it suddenly starts to work. As I understand the documentation, one should use the ISO language tags here. I have tried se, swe, and sv, all with the same result. Further, System.out.println(textToSpeech.getLanguage().getDisplayLanguage()); prints svenska, which is correct.
If I change the language to en it works every time, and textToSpeech.getLanguage().getDisplayLanguage() then prints engelska, which is again correct.
What on earth is going on?
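For reference, language selection can be made defensive so the engine's answer is verified rather than assumed (a minimal sketch; "sv" is the ISO 639-1 code for Swedish and "sv-SE" the full BCP 47 tag):
Locale swedish = new Locale("sv", "SE"); // or Locale.forLanguageTag("sv-SE") on API 21+
int result = textToSpeech.setLanguage(swedish);
if (result == TextToSpeech.LANG_MISSING_DATA
        || result == TextToSpeech.LANG_NOT_SUPPORTED) {
    // The engine has no usable Swedish voice; speak() would fail here.
    Log.e("TTS", "Swedish not usable, setLanguage returned " + result);
}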
EDIT:
I have added an UtteranceProgressListener. According to the documentation, the method
onError(String id) is deprecated and should be replaced by onError(String id, int errorCode). However, extending my class with UtteranceProgressListener forces me to implement the old onError method, and that is the one that is always called; the newer onError(String id, int errorCode) is never called. So something is wrong, but I do not know what.
I have updated the code.
I have a function that is supposed to speak a word in a specific language when the function is called.
Until a couple of days ago it worked fine on my Sony Compact XZ2, but now it is erratic: sometimes the word is spoken and sometimes not. The command textToSpeech.getEngines() returns com.google.android.tts.
For Swedish, for example, I have tried both "sv" and "sv-SV" when creating the Locale object for setLanguage. That has not helped.
I just noticed that when I press the button that calls playWord(text) enough times (> 40) it eventually works, and sometimes it works immediately. There seems to be some strange delay.
The function speakText is called from this function in my Fragment:
private void playWord(){
if (text2Speech.isReady()) {
text2Speech.checkSpeaking();
text2Speech.setLanguage(getAcronym(mTraining.getCurrentSrc()));
text2Speech.speakText(front);
} else {
Toast.makeText(getContext(),"Speaker not ready yet", Toast.LENGTH_SHORT).show();
}
}
This is the class that handles the speaking. I do not get any error messages; it just seems random whether the speaker works.
public class Text2Speech extends UtteranceProgressListener {
private Context mContext;
private TextToSpeech textToSpeech;
private boolean isReady = false;
public Text2Speech(Context context, final String src){
mContext = context;
System.out.println("text2Speech created");
textToSpeech = new TextToSpeech(mContext, new TextToSpeech.OnInitListener() {
@Override
public void onInit(int status) {
if (status == TextToSpeech.SUCCESS) {
isReady = true;
Locale locale = new Locale(src);
int ttsLang = textToSpeech.setLanguage(locale);
if (ttsLang == TextToSpeech.LANG_MISSING_DATA
|| ttsLang == TextToSpeech.LANG_NOT_SUPPORTED) {
Log.e("TTS", "The Language is not supported!");
} else {
Log.i("TTS", "Language Supported.");
}
Log.i("TTS", "Initialization success.");
} else {
Toast.makeText(mContext, "TTS Initialization failed!", Toast.LENGTH_SHORT).show();
}
}
});
}
public boolean isReady(){
return isReady;
}
public void checkSpeaking(){
if (textToSpeech.isSpeaking()){
textToSpeech.stop();
}
}
public void showMessage(String msg){
Toast.makeText(mContext, msg, Toast.LENGTH_SHORT).show();
}
public void speakText(String text){
int speechStatus = textToSpeech.speak(text, TextToSpeech.QUEUE_FLUSH, null);
switch (speechStatus){
case TextToSpeech.ERROR_INVALID_REQUEST:
showMessage("Invalid Request");
break;
case TextToSpeech.ERROR_NETWORK:
showMessage("Network Error");
break;
case TextToSpeech.ERROR_NETWORK_TIMEOUT:
showMessage("Network Timeout");
break;
case TextToSpeech.ERROR_NOT_INSTALLED_YET:
showMessage("Error Not Yet Downloaded");
break;
case TextToSpeech.ERROR_OUTPUT:
showMessage("Output Error");
break;
case TextToSpeech.ERROR_SERVICE:
showMessage("Error of TTS service");
break;
case TextToSpeech.ERROR_SYNTHESIS:
showMessage("Error synthesizing");
break;
case TextToSpeech.LANG_NOT_SUPPORTED:
showMessage("Language nor supported");
break;
}
if (speechStatus == TextToSpeech.ERROR) {
Log.e("TTS", "Error in converting Text to Speech!");
}
System.out.println("speech status - text " + speechStatus + " - " + text);
}
public void setLanguage(String src){
Locale locale = new Locale(src);
int tts = textToSpeech.setLanguage(locale);
System.out.println(tts + " " + src);
if (tts == TextToSpeech.LANG_MISSING_DATA
|| tts == TextToSpeech.LANG_NOT_SUPPORTED) {
Toast.makeText(mContext, "Language not yet supported.", Toast.LENGTH_LONG).show();
}
}
public void stop(){
textToSpeech.stop();
textToSpeech.shutdown();
}
@Override
public void onStart(String utteranceId) {
Log.e("START", "start speaking");
}
@Override
public void onDone(String utteranceId) {
Log.e("DONE", "done speaking");
}
@Override
public void onError(String utteranceID){
Log.e("Error", "Not infromative");
}
// This is not called!
@Override
public void onError(String utteranceId, int errorCode) {
Log.e("Error", "Error speaking");
}
}
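One observation about the framework API (not something stated in the post): the UtteranceProgressListener callbacks, including the newer onError(String, int), only fire when the listener has been registered with the engine and the utterance carries an utterance ID. A minimal sketch of how that could look in this class:
// Inside onInit(), after the SUCCESS check:
textToSpeech.setOnUtteranceProgressListener(Text2Speech.this);
// In speakText(), the four-argument overload (API 21+) supplies the ID directly:
int speechStatus = textToSpeech.speak(
        text, TextToSpeech.QUEUE_FLUSH, null, "utt-" + text.hashCode());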
Here is the error message in the Logcat:
NetworkSynthesizer: ExecutionException during NetworkFetchTask
java.util.concurrent.ExecutionException: clx: RESOURCE_EXHAUSTED: Quota exceeded for quota metric 's3-sessions' and limit 's3-session-limit' of service 'speechs3proto2-pa.googleapis.com' for consumer 'project_number:529030122437'.
at java.util.concurrent.FutureTask.report(FutureTask.java:123)
at java.util.concurrent.FutureTask.get(FutureTask.java:207)
at avf.a(PG:37)
at avf.a(PG:154)
at com.google.android.tts.service.GoogleTTSService.onSynthesizeText(PG:250)
at android.speech.tts.TextToSpeechService$SynthesisSpeechItem.playImpl(TextToSpeechService.java:1033)
at android.speech.tts.TextToSpeechService$SpeechItem.play(TextToSpeechService.java:819)
at android.speech.tts.TextToSpeechService$SynthHandler$1.run(TextToSpeechService.java:583)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:280)
at android.os.HandlerThread.run(HandlerThread.java:65)
Caused by: clx: RESOURCE_EXHAUSTED: Quota exceeded for quota metric 's3-sessions' and limit 's3-session-limit' of service 'speechs3proto2-pa.googleapis.com' for consumer 'project_number:529030122437'.
at cze.a(PG:58)
at cze.a(PG:29)
at dao.a(PG:21)
at ave.a(PG:36)
at ave.call(PG:80)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764)
2019-03-16 21:35:46.917 1356-5238/? E/ActivityManager: Sending non-protected broadcast com.sonymobile.intent.action.POWER_BACK_OFF_FACTOR_CHANGED from system 2179:com.android.phone/1001 pkg com.android.phone
java.lang.Throwable
at com.android.server.am.ActivityManagerService.checkBroadcastFromSystem(ActivityManagerService.java:21814)
at com.android.server.am.ActivityManagerService.broadcastIntentLocked(ActivityManagerService.java:22423)
at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:22565)
at android.app.IActivityManager$Stub.onTransact$broadcastIntent$(IActivityManager.java:10171)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:167)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3416)
at android.os.Binder.execTransact(Binder.java:731)
2019-03-16 21:35:46.917 12061-13318/? E/TTS.GoogleTTSServiceImp: Synthesis failure with error status code: -4
2019-03-16 21:35:46.918 12061-13318/? W/PlaybackSynthesisRequest: done() was called before start() call
2019-03-16 21:35:46.919 6468-6489/com.erikbylow.tailoreddictfire D/SPEECH: Error
When I turn WiFi on, it works.
Speculation: could it be that the language data was missing and not downloaded while I was not on WiFi, and that turning WiFi on let the languages download?
To me this error, clx: RESOURCE_EXHAUSTED: Quota exceeded for quota metric..., looks like there was always a network request, but after turning WiFi on I could use TextToSpeech in flight mode.
On the other hand, I tried it with Russian in flight mode, and that did not work. I turned mobile data on without WiFi, and then it worked. After turning WiFi on again, Russian worked as well. At least this suggests something needed to be downloaded?
I would like to find out what causes the problem and how to solve it, since it is an app on Google Play. (Although I currently have exactly 0 active users besides me...) :)
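If missing voice data is the suspect, it can be probed and the download triggered explicitly (a sketch using the standard TextToSpeech APIs):
// LANG_MISSING_DATA means the engine knows the language but lacks its data.
int avail = textToSpeech.isLanguageAvailable(new Locale("ru"));
if (avail == TextToSpeech.LANG_MISSING_DATA) {
    // Ask the engine to download the missing voice data.
    Intent installIntent = new Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
    startActivity(installIntent);
}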
Just to give some closure: this was an Android framework bug that was closed with Android 12 (API 31). See also my referenced bug ticket: https://issuetracker.google.com/issues/138321382?pli=1
So, I have a boolean called nuanceWaiting which is initially set to true. I immediately run a runnable loop that checks whether nuanceWaiting is true or false.
protected void onCreate(Bundle savedInstanceState) {
...
nuanceWaiting = true;
...
}
@Override
protected void onStart() {
....
soundMeterLoop();
}
public void soundMeterLoop() {
soundMeterHandler = new Handler();
soundMeterRunnable = new Runnable() {
@Override
public void run() {
if(nuanceWaiting) {
//do my stuff
amplitude = soundMeter.getAmplitude();
if (amplitude > threshold) {
decibelLevelOutput.setTextColor(Color.RED);
startNuance();
} else {
decibelLevelOutput.setTextColor(Color.BLACK);
}
}
soundMeterHandler.postDelayed(this, 100);
}
};
soundMeterHandler.postDelayed(soundMeterRunnable, 100);
}
public void startNuance() {
nuanceWaiting = false;
nuance.toggleReco();
}
public void stopNuance() {
Log.d("SpeechKit", "stopNuance");
nuanceWaiting = true;
Log.d("SpeechKit", "nuanceWaiting " + nuanceWaiting);
}
Now, for some reason, once I set it to false, nuance.toggleReco() goes off to another class and, when it is finished, it calls stopNuance().
nuanceWaiting becomes false (as the second log shows), but when I check a log in the runnable, it still says true and never "stays" false when the runnable runs again. Any idea why it doesn't stick to being false?
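One thing worth checking, purely as an assumption on my part: if the SpeechKit callbacks that call stopNuance() run on a different thread than the Handler loop, a plain boolean field carries no visibility guarantee between those threads. A minimal sketch:
// volatile guarantees a write from one thread is visible to reads from another.
private volatile boolean nuanceWaiting = true;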
Below is what nuance.toggleReco() does:
public void toggleReco() {
Log.d("SpeechKit", "In "+state);
switch (state) {
case IDLE:
recognize();
break;
case LISTENING:
stopRecording();
break;
case PROCESSING:
cancel();
break;
}
}
It's usually in the IDLE state, so I'll follow that path:
private void recognize() {
//Setup our ASR transaction options.
Transaction.Options options = new Transaction.Options();
options.setRecognitionType(RecognitionType.DICTATION);
options.setDetection(DetectionType.Short);
options.setLanguage(new Language("eng-USA"));
options.setEarcons(startEarcon, stopEarcon, errorEarcon, cancelEarcon);
//Start listening
recoTransaction = session.recognize(options, recoListener);
}
private Transaction.Listener recoListener = new Transaction.Listener() {
@Override
public void onStartedRecording(Transaction transaction) {
Log.d("SpeechKit", "onStartedRecording");
//We have started recording the user's voice.
//We should update our state and start polling their volume.
state = State.LISTENING;
startAudioLevelPoll();
}
@Override
public void onFinishedRecording(Transaction transaction) {
Log.d("SpeechKit", "onFinishedRecording");
//We have finished recording the user's voice.
//We should update our state and stop polling their volume.
state = State.PROCESSING;
stopAudioLevelPoll();
avatar.stopNuance();
}
@Override
public void onRecognition(Transaction transaction, Recognition recognition) {
//We have received a transcription of the user's voice from the server.
state = State.IDLE;
Log.d("SpeechKit", "onRecognition: " + recognition.getText());
voiceRecognizeText = recognition.getText();
voiceRecognize = (TextView) activity.findViewById(R.id.voiceRecognize);
voiceRecognize.setText(voiceRecognizeText);
}
@Override
public void onSuccess(Transaction transaction, String s) {
Log.d("SpeechKit", "onSuccess");
//Notification of a successful transaction. Nothing to do here.
}
@Override
public void onError(Transaction transaction, String s, TransactionException e) {
Log.e("SpeechKit", "onError: " + e.getMessage() + ". " + s);
//Something went wrong. Ensure that your credentials are correct.
//The user could also be offline, so be sure to handle this case appropriately.
//We will simply reset to the idle state.
state = State.IDLE;
avatar.stopNuance();
}
};
On Android 5 I am facing a strange problem: the first call to startListening on the SpeechRecognizer results in onError with error code 7 (ERROR_NO_MATCH).
I made a test app with the following code:
if (speechRecognizer == null) {
speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);
speechRecognizer.setRecognitionListener(new RecognitionListener() {
@Override
public void onReadyForSpeech(Bundle bundle) {
Log.d(TAG, "onReadyForSpeech");
}
@Override
public void onBeginningOfSpeech() {
Log.d(TAG, "onBeginningOfSpeech");
}
@Override
public void onRmsChanged(float v) {
Log.d(TAG, "onRmsChanged");
}
@Override
public void onBufferReceived(byte[] bytes) {
Log.d(TAG, "onBufferReceived");
}
@Override
public void onEndOfSpeech() {
Log.d(TAG, "onEndOfSpeech");
}
@Override
public void onError(int i) {
Log.d(TAG, "onError " + i);
}
@Override
public void onResults(Bundle bundle) {
Log.d(TAG, "onResults");
}
@Override
public void onPartialResults(Bundle bundle) {
Log.d(TAG, "onPartialResults");
}
@Override
public void onEvent(int i, Bundle bundle) {
Log.d(TAG, "onEvent");
}
});
}
final Intent sttIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
sttIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
sttIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en");
sttIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "en");
speechRecognizer.startListening(sttIntent);
I get these log messages after the first startListening call:
onError 7
onReadyForSpeech
onBeginningOfSpeech
onEndOfSpeech
onResults
and the following messages after subsequent startListening calls:
onRmsChanged
...
onRmsChanged
onReadyForSpeech
onRmsChanged
...
onRmsChanged
onBeginningOfSpeech
onRmsChanged
...
onRmsChanged
onEndOfSpeech
onRmsChanged
onRmsChanged
onRmsChanged
onResults
So, what is the reason for this error, and how do I fix it?
As soon as you configure the "Okay Google" detection to work from every screen, the error appears.
So this seems to be the reason!
Deactivate that function and the problem should be solved.
I have done a workaround.
This is the regular flow:
onReadyForSpeech --> onBeginningOfSpeech --> onEndOfSpeech --> onResults
But the weird flow is:
onError (no match) --> onReadyForSpeech --> onBeginningOfSpeech --> onEndOfSpeech --> onResults
So set a boolean to true in onEndOfSpeech, and check it in onError to make sure the error was actually thrown after an end of speech:
speech.startListening(recognizerIntent);
isEndOfSpeech = false;
@Override
public void onError(int error) {
if (!isEndOfSpeech)
return;
}
@Override
public void onEndOfSpeech() {
isEndOfSpeech = true;
}
I had the same problem, and since I couldn't find the cause, I ended up just calling return inside onError if the time between startListening and onError is unreasonably short.
protected long mSpeechRecognizerStartListeningTime = 0;
protected synchronized void speechRecognizerStartListening(Intent intent) {
if (mSpeechRecognizer != null) {
this.mSpeechRecognizerStartListeningTime = System.currentTimeMillis();
RLog.d(this, "speechRecognizerStartListening");
this.mSpeechRecognizer.startListening(intent);
}
}
...
@Override
public synchronized void onError(int error) {
RLog.i(this, this.hashCode() + " - onError:" + error);
// Sometimes onError gets called after onResults, so we keep a boolean to ignore the error in that case too
if (mSuccess) {
RLog.w(this, "Already success, ignoring error");
return;
}
long duration = System.currentTimeMillis() - mSpeechRecognizerStartListeningTime;
if (duration < 500 && error == SpeechRecognizer.ERROR_NO_MATCH) {
RLog.w(this, "Doesn't seem like the system tried to listen at all. duration = " + duration + "ms. This might be a bug with onError and startListening methods of SpeechRecognizer");
RLog.w(this, "Going to ignore the error");
return;
}
// -- actual error handling code goes here.
}
I had the same problem on several devices. It seems onError(7) is always called before onReadyForSpeech(), so to avoid ugly timing checks you can do something like:
public void start(){
performingSpeechSetup = true;
speechRecognizer.startListening(intent);
}
and in the RecognitionListener:
public void onReadyForSpeech(Bundle bundle) {
performingSpeechSetup = false;
}
@Override
public void onError(int error) {
if (performingSpeechSetup && error == SpeechRecognizer.ERROR_NO_MATCH) return;
// else handle error
}
Turned out to be very easy in my case. The launch sound of the voice recognition was too loud and triggered the listening process at the very beginning. Turning down the system sound (with the volume key) helps.
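If adjusting the volume by hand is undesirable, the same effect can be had programmatically (a sketch; setStreamMute is deprecated from API 23, where adjustStreamVolume with ADJUST_MUTE is preferred):
// Mute the stream the prompt sound plays on, then restore it once listening has begun.
AudioManager am = (AudioManager) getSystemService(Context.AUDIO_SERVICE);
am.setStreamMute(AudioManager.STREAM_MUSIC, true);
speechRecognizer.startListening(recognizerIntent);
// ... later, e.g. in onReadyForSpeech():
am.setStreamMute(AudioManager.STREAM_MUSIC, false);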
I've installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I can't work out how I would add recognition of multiple words.
All I want is for the code to recognize single words, which I can then switch() within the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.
Any help on this would be appreciated. I have spotted other users having similar problems, but nobody has found the answer so far.
One thing which baffles me is why we need to use the "wakeup" constant at all:
private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "oh mighty computer";
.
.
.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);
What has wakeup got to do with anything?
I have made some progress(?): using addGrammarSearch I am able to use a .gram file to list my words, e.g. up, down, left, right, forwards, backwards, which seems to work well as long as I only say those particular words. However, any other word causes the system to match what was said to the "nearest" word in the grammar. Ideally, I don't want recognition to occur at all if the spoken words are not in the .gram file...
Thanks to Nikolay's tip (see his answer above), I have developed the following code which works fine, and does not recognize words unless they're on the list. You can copy and paste this directly over the main class in the PocketSphinxDemo code:
public class PocketSphinxActivity extends Activity implements RecognitionListener
{
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;
@Override
public void onCreate(Bundle state)
{
super.onCreate(state);
setContentView(R.layout.main);
((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");
try
{
Assets assets = new Assets(PocketSphinxActivity.this);
File assetDir = assets.syncAssets();
setupRecognizer(assetDir);
}
catch (IOException e)
{
// oops
}
((TextView) findViewById(R.id.caption_text)).setText("Say up, down, left, right, forwards, backwards");
reset();
}
@Override
public void onPartialResult(Hypothesis hypothesis)
{
}
@Override
public void onResult(Hypothesis hypothesis)
{
((TextView) findViewById(R.id.result_text)).setText("");
if (hypothesis != null)
{
String text = hypothesis.getHypstr();
makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
}
}
@Override
public void onBeginningOfSpeech()
{
}
@Override
public void onEndOfSpeech()
{
reset();
}
private void setupRecognizer(File assetsDir)
{
File modelsDir = new File(assetsDir, "models");
recognizer = defaultSetup().setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
.setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
.setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
.getRecognizer();
recognizer.addListener(this);
File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}
private void reset()
{
recognizer.stop();
recognizer.startListening(DIGITS_SEARCH);
}
}
Your digits.gram file should be something like:
up /1e-1/
down /1e-1/
left /1e-1/
right /1e-1/
forwards /1e-1/
backwards /1e-1/
You should experiment with the thresholds within the double slashes // for performance, where 1e-1 represents 0.1 (I think). I think the maximum is 1.0.
And it's 5.30pm so I can stop working now. Result.
You can use addKeywordSearch, which takes a file of key phrases, one phrase per line with a threshold for each phrase between slashes, for example:
up /1.0/
down /1.0/
left /1.0/
right /1.0/
forwards /1e-1/
The threshold must be selected to avoid false alarms.
I am working on updating Antinous' amendment to the PocketSphinx demo to allow it to run in Android Studio. This is what I have so far:
//Note: change MainActivity to PocketSphinxActivity for demo use...
public class MainActivity extends Activity implements RecognitionListener {
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;
/* Used to handle permission request */
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;
@Override
public void onCreate(Bundle state) {
super.onCreate(state);
setContentView(R.layout.main);
((TextView) findViewById(R.id.caption_text))
.setText("Preparing the recognizer");
// Check if user has given permission to record audio
int permissionCheck = ContextCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.RECORD_AUDIO);
if (permissionCheck != PackageManager.PERMISSION_GRANTED) {
ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, PERMISSIONS_REQUEST_RECORD_AUDIO);
return;
}
new AsyncTask<Void, Void, Exception>() {
@Override
protected Exception doInBackground(Void... params) {
try {
Assets assets = new Assets(MainActivity.this);
File assetDir = assets.syncAssets();
setupRecognizer(assetDir);
} catch (IOException e) {
return e;
}
return null;
}
@Override
protected void onPostExecute(Exception result) {
if (result != null) {
((TextView) findViewById(R.id.caption_text))
.setText("Failed to init recognizer " + result);
} else {
reset();
}
}
}.execute();
((TextView) findViewById(R.id.caption_text)).setText("Say one, two, three, four, five, six...");
}
/**
* In partial result we get quick updates about current hypothesis. In
* keyword spotting mode we can react here, in other modes we need to wait
* for final result in onResult.
*/
@Override
public void onPartialResult(Hypothesis hypothesis) {
if (hypothesis == null) {
return;
} else { // hypothesis is non-null here
if (recognizer != null) {
//recognizer.rapidSphinxPartialResult(hypothesis.getHypstr());
String text = hypothesis.getHypstr();
if (text.equals(DIGITS_SEARCH)) {
recognizer.cancel();
performAction();
recognizer.startListening(DIGITS_SEARCH);
}else{
//Toast.makeText(getApplicationContext(),"Partial result = " +text,Toast.LENGTH_SHORT).show();
}
}
}
}
@Override
public void onResult(Hypothesis hypothesis) {
((TextView) findViewById(R.id.result_text)).setText("");
if (hypothesis != null) {
String text = hypothesis.getHypstr();
makeText(getApplicationContext(), "Hypothesis" +text, Toast.LENGTH_SHORT).show();
}else if(hypothesis == null){
makeText(getApplicationContext(), "hypothesis = null", Toast.LENGTH_SHORT).show();
}
}
@Override
public void onDestroy() {
super.onDestroy();
recognizer.cancel();
recognizer.shutdown();
}
@Override
public void onBeginningOfSpeech() {
}
@Override
public void onEndOfSpeech() {
reset();
}
@Override
public void onTimeout() {
}
private void setupRecognizer(File assetsDir) throws IOException {
// The recognizer can be configured to perform multiple searches
// of different kind and switch between them
recognizer = defaultSetup()
.setAcousticModel(new File(assetsDir, "en-us-ptm"))
.setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
// .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
.getRecognizer();
recognizer.addListener(this);
File digitsGrammar = new File(assetsDir, "digits.gram");
recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}
private void reset(){
recognizer.stop();
recognizer.startListening(DIGITS_SEARCH);
}
@Override
public void onError(Exception error) {
((TextView) findViewById(R.id.caption_text)).setText(error.getMessage());
}
public void performAction() {
// do here whatever you want
makeText(getApplicationContext(), "performAction done... ", Toast.LENGTH_SHORT).show();
}
}
Caveat emptor: this is a work in progress. Check back later. Suggestions would be appreciated.
I want to create a simple app which runs in the background using a service.
Using SpeechRecognizer, it would listen for application names, and when it finds an existing one it would open it. If it does not find a match, or the result is unclear, it would suggest some options, showing them in a list or via voice.
I already know how to use SpeechRecognizer, but what I need is to keep this service running in the background and to prevent it from being killed. Could this be done?
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
My summary is as follows:
SpeechRecognizer
Shows a different UI, or no UI at all. Do you really want your app's UI to beep? Do you really want it to show a dialog when there is an error and wait for the user to click?
The app can do something else while speech recognition is happening
Can recognize speech while running in the background or from a service
Can handle errors better
Can access low-level speech data such as the raw audio or the RMS. Analyze the audio, or use the loudness to make some kind of flashing light that indicates the app is listening
Intent
Consistent, easy-to-use UI for users
Easy to program
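For reference, the Intent route the summary refers to takes only a few lines (a minimal sketch inside an Activity; the request code is arbitrary):
private static final int SPEECH_REQUEST = 1;
private void startRecognitionViaIntent() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    startActivityForResult(intent, SPEECH_REQUEST); // system UI beeps and shows its dialog
}
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == SPEECH_REQUEST && resultCode == RESULT_OK) {
        // The first entry is the most likely transcription.
        ArrayList<String> results =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
    }
}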
This is a workaround for Android version 4.1.1.
public class MyService extends Service
{
protected AudioManager mAudioManager;
protected SpeechRecognizer mSpeechRecognizer;
protected Intent mSpeechRecognizerIntent;
protected final Messenger mServerMessenger = new Messenger(new IncomingHandler(this));
protected boolean mIsListening;
protected volatile boolean mIsCountDownOn;
private boolean mIsStreamSolo;
static final int MSG_RECOGNIZER_START_LISTENING = 1;
static final int MSG_RECOGNIZER_CANCEL = 2;
@Override
public void onCreate()
{
super.onCreate();
mAudioManager = (AudioManager) getSystemService(Context.AUDIO_SERVICE);
mSpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);
mSpeechRecognizer.setRecognitionListener(new SpeechRecognitionListener());
mSpeechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
mSpeechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
mSpeechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
this.getPackageName());
}
protected static class IncomingHandler extends Handler
{
private WeakReference<MyService> mtarget;
IncomingHandler(MyService target)
{
mtarget = new WeakReference<MyService>(target);
}
@Override
public void handleMessage(Message msg)
{
final MyService target = mtarget.get();
switch (msg.what)
{
case MSG_RECOGNIZER_START_LISTENING:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN)
{
// turn off beep sound
// access the service's fields through the WeakReference target
if (!target.mIsStreamSolo)
{
target.mAudioManager.setStreamSolo(AudioManager.STREAM_VOICE_CALL, true);
target.mIsStreamSolo = true;
}
}
if (!target.mIsListening)
{
target.mSpeechRecognizer.startListening(target.mSpeechRecognizerIntent);
target.mIsListening = true;
//Log.d(TAG, "message start listening"); //$NON-NLS-1$
}
break;
case MSG_RECOGNIZER_CANCEL:
if (target.mIsStreamSolo)
{
target.mAudioManager.setStreamSolo(AudioManager.STREAM_VOICE_CALL, false);
target.mIsStreamSolo = false;
}
target.mSpeechRecognizer.cancel();
target.mIsListening = false;
//Log.d(TAG, "message canceled recognizer"); //$NON-NLS-1$
break;
}
}
}
// Count down timer for Jelly Bean work around
protected CountDownTimer mNoSpeechCountDown = new CountDownTimer(5000, 5000)
{
@Override
public void onTick(long millisUntilFinished)
{
// TODO Auto-generated method stub
}
@Override
public void onFinish()
{
mIsCountDownOn = false;
Message message = Message.obtain(null, MSG_RECOGNIZER_CANCEL);
try
{
mServerMessenger.send(message);
message = Message.obtain(null, MSG_RECOGNIZER_START_LISTENING);
mServerMessenger.send(message);
}
catch (RemoteException e)
{
}
}
};
@Override
public void onDestroy()
{
super.onDestroy();
if (mIsCountDownOn)
{
mNoSpeechCountDown.cancel();
}
if (mSpeechRecognizer != null)
{
mSpeechRecognizer.destroy();
}
}
protected class SpeechRecognitionListener implements RecognitionListener
{
@Override
public void onBeginningOfSpeech()
{
// speech input will be processed, so there is no need for count down anymore
if (mIsCountDownOn)
{
mIsCountDownOn = false;
mNoSpeechCountDown.cancel();
}
//Log.d(TAG, "onBeginingOfSpeech"); //$NON-NLS-1$
}
@Override
public void onBufferReceived(byte[] buffer)
{
}
@Override
public void onEndOfSpeech()
{
//Log.d(TAG, "onEndOfSpeech"); //$NON-NLS-1$
}
@Override
public void onError(int error)
{
if (mIsCountDownOn)
{
mIsCountDownOn = false;
mNoSpeechCountDown.cancel();
}
mIsListening = false;
Message message = Message.obtain(null, MSG_RECOGNIZER_START_LISTENING);
try
{
mServerMessenger.send(message);
}
catch (RemoteException e)
{
}
//Log.d(TAG, "error = " + error); //$NON-NLS-1$
}
@Override
public void onEvent(int eventType, Bundle params)
{
}
@Override
public void onPartialResults(Bundle partialResults)
{
}
@Override
public void onReadyForSpeech(Bundle params)
{
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN)
{
mIsCountDownOn = true;
mNoSpeechCountDown.start();
}
Log.d(TAG, "onReadyForSpeech"); //$NON-NLS-1$
}
@Override
public void onResults(Bundle results)
{
//Log.d(TAG, "onResults"); //$NON-NLS-1$
}
@Override
public void onRmsChanged(float rmsdB)
{
}
}
}
As commented, I don't think you need to use a BroadcastReceiver for what you are trying to do. Instead, you should define a service that is continuously listening for speech. You can find an implementation here:
As for Android killing services: you cannot prevent a service from being killed by the system; even system services can be killed.
Anyway, you can use the Service's startForeground() method:
By default services are background, meaning that if the system needs to kill them to reclaim more memory (such as to display a large page in a web browser), they can be killed without too much harm. You can set this flag if killing your service would be disruptive to the user, such as if your service is performing background music playback, so the user would notice if their music stopped playing.
You can see the implementation here.
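A minimal sketch of promoting such a service to the foreground (the channel ID, notification text, and icon are placeholders; on API 26+ the notification channel must also be created first):
@Override
public int onStartCommand(Intent intent, int flags, int startId) {
    Notification notification = new NotificationCompat.Builder(this, "listener_channel")
            .setContentTitle("Voice commands active")
            .setSmallIcon(R.drawable.ic_mic) // placeholder icon
            .setOngoing(true)
            .build();
    startForeground(1, notification); // any non-zero ID works
    return START_STICKY; // ask the system to restart the service if it is killed
}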