How to add grammar to default VoiceDialer App? - android

I am trying to add a few commands to the default Android VoiceDialer app. It has commands like OPEN, DIAL, CALL, REDIAL, etc., and I want to add, let's say, FIND. I downloaded the source code from here and compiled it in Eclipse. The application sets up a grammar for the arguments of these commands; for example, it stores the names and phone numbers of the people in the contact list so it can generate intents when a name is recognized in a command like CALL JOHN. For the CALL part of that command, it simply compares the first word of the recognized string to "CALL".
I added "FIND" as an extra else if condition in the onRecognitionSuccess() function as shown below:
public class CommandRecognizerEngine extends RecognizerEngine
{
    ............
    protected void onRecognitionSuccess(RecognizerClient recognizerClient) throws InterruptedException
    {
        .....................
        if ("DIAL".equalsIgnoreCase(commands[0]))
        {
            Uri uri = Uri.fromParts("tel", commands[1], null);
            String num = formatNumber(commands[1]);
            if (num != null)
            {
                addCallIntent(intents, uri, literal.split(" ")[0].trim() + " " + num, "", 0);
            }
        }
        ................
        else if ("FIND".equalsIgnoreCase(commands[0]))
        {
            if (Config.LOGD)
                Log.d(TAG, "FIND detected...");
        }
    } // end onRecognitionSuccess
} // end CommandRecognizerEngine
but my app can't recognize it. Does anyone know how the recognizer detects commands like OPEN or CALL, or can you point me to the appropriate documentation?
Thanks.

As it has been over a year, I doubt you need this answer anymore. However, some other people might find this through Google, as I did.
Right now, the best way to apply grammars to speech recognition on Android is to request a higher number of results and then filter those results against your grammar. It is not perfect, as the word you want may not have passed the threshold needed to be included in the list, but it greatly improves the accuracy of any speech recognition application where the set of things you can say is somewhat limited.
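A minimal sketch of that approach (the COMMANDS set, the request code, and the max-results value of 10 are all illustrative assumptions, not anything from the VoiceDialer source):
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.*;

private static final Set<String> COMMANDS = new HashSet<>(
        Arrays.asList("open", "dial", "call", "redial", "find"));
private static final int REQUEST_SPEECH = 1;

private void startListening() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    // ask for more hypotheses than the default so the grammar filter has more to work with
    intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 10);
    startActivityForResult(intent, REQUEST_SPEECH);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
        for (String phrase : data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)) {
            // keep only hypotheses whose first word is in our "grammar"
            String first = phrase.trim().split("\\s+")[0].toLowerCase(Locale.US);
            if (COMMANDS.contains(first)) {
                // handle the matched command, e.g. FIND
                break;
            }
        }
    }
}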

Related

Text to speech pronouncing numbers like "4th", "8ths", or "2nd"

A while back I wrote some code that converts a Double into a String, where the string is formatted as a readable fraction.
For example:
4.75 => "4 and 3 4ths"
1.5 => "1 and 1 half"
1.33 => "1 and 1 3rd"
The majority of numbers are pronounced as intended, with a few notable exceptions. Instead of the text "4ths" being pronounced as "fourths", it is pronounced "four tee ache ess". Here is an example demonstrating this.
//this works
tts.speak("1 and 3 fourths", TextToSpeech.QUEUE_FLUSH, null);
//this works
tts.speak("1 and 1 3rd", TextToSpeech.QUEUE_FLUSH, null);
//this works
tts.speak("1 and 1 4th", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4ths", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4thes", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4th-s", TextToSpeech.QUEUE_FLUSH, null);
The strangest thing is that this worked fine about a year back when I first wrote the code; the "ths" suffix was pronounced as one might expect. Perhaps I am mistaken on that point...
Regardless, the issue seems to be that a number followed by 2 letters is read as a complete word, while a number followed by 3 or more letters is read as a series of characters instead. I could add to the complexity of the algorithm by substituting all the numbers with their word counterparts, but the longer I work at this, the more I begin to think that I am reinventing the wheel. The API does not seem to provide a way of specifying pronunciation for the speak() method. Am I missing something?
This sort of behavior will vary between TextToSpeech engines: the Google TTS engine, for example, will behave differently than, say, the SVOX PICO engine (the emulator default below API 24). How each engine behaves is outside your control, and if an engine offers any pronunciation controls, it is the engine's responsibility to supply them directly to the end user via its settings.
You're probably just testing on a different engine than you were before, or on an update to the same engine.
You could test some major engines like Samsung, Google, and PICO and try to find a common denominator of behavior. I suspect that you're right: spelling out the words is the best option in this case.
You can specify which engine you want to use as the last argument (a String) of the TextToSpeech constructor, and you can see which engines are installed on any particular device by going to (home\settings\language&locale\TTS) or in code like this:
private ArrayList<String> whatEnginesAreInstalled(Context context) {
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = context.getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    ArrayList<String> installedEngineNames = new ArrayList<>();
    for (ResolveInfo r : list) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        installedEngineNames.add(engineName);
        // just logging the version number out of interest
        String version = "null";
        try {
            version = pm.getPackageInfo(engineName,
                    PackageManager.GET_META_DATA).versionName;
        } catch (Exception e) {
            Log.i("XXX", "try catch error");
        }
        Log.i("XXX", "we found an engine: " + engineName);
        Log.i("XXX", "version: " + version);
    }
    return installedEngineNames;
}
As Boober Bunz explained, these features vary from one engine to another, and behavior might change with newer versions of an engine as well. I would suggest the best option is to convert everything to words, like "fourths", to make it consistent across engines. For a quick fix you can try "4th's", as it seems to be treated as more of a valid word than the other variants you mentioned not working.
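For instance, a small pre-processing step along these lines could do the conversion before the string reaches the engine. This is only a sketch: the FRACTION_WORDS map is an illustrative assumption covering a few common denominators, not an exhaustive list.
import java.util.HashMap;
import java.util.Map;

private static final Map<String, String> FRACTION_WORDS = new HashMap<>();
static {
    FRACTION_WORDS.put("2nds", "halves");
    FRACTION_WORDS.put("3rds", "thirds");
    FRACTION_WORDS.put("4ths", "fourths");
    FRACTION_WORDS.put("8ths", "eighths");
    FRACTION_WORDS.put("16ths", "sixteenths");
}

private static String spellOutFractions(String text) {
    // replace ordinal-fraction suffixes with spelled-out words before speaking
    for (Map.Entry<String, String> entry : FRACTION_WORDS.entrySet()) {
        text = text.replace(entry.getKey(), entry.getValue());
    }
    return text;
}
Called as tts.speak(spellOutFractions("1 and 3 4ths"), TextToSpeech.QUEUE_FLUSH, null), this should be spoken as "1 and 3 fourths" regardless of how the engine handles the suffix.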

Turning notifications to text-to-speech when driving

I have an application that, on certain events, converts a normal notification to text-to-speech, since the phone sometimes isn't accessible to the user and it's safer not to handle it.
For example, handling the phone is dangerous when you're driving, so I want to turn the notifications into text-to-speech.
I've looked for a long time for some explanation of how to switch to text-to-speech while driving, but I can't find any reference for it anywhere.
For generating text-to-speech, I have this part, which works fine:
private TextToSpeech mTextToSpeech;

public void sayText(Context context, final String message) {
    mTextToSpeech = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
        @Override
        public void onInit(int status) {
            try {
                if (mTextToSpeech != null && status == TextToSpeech.SUCCESS) {
                    mTextToSpeech.setLanguage(Locale.US);
                    mTextToSpeech.speak(message, TextToSpeech.QUEUE_ADD, null);
                }
            } catch (Exception ex) {
                System.out.print("Error handling TextToSpeech GCM notification " + ex.getMessage());
            }
        }
    });
}
But I don't know how to check whether I'm currently driving or not.
As Ashwin suggested, you can use the Activity Recognition API, but it has a downside: the driving samples you receive carry a 'confidence' field that isn't always accurate, so you'll have to do extra work (such as checking locations to see whether you actually moved) to know for sure that the user is driving. A minimal sketch of the basic check is below.
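This sketch assumes the Google Play Services location library is on the classpath and that activity updates are already being requested and delivered to this receiver; the 75% confidence threshold is an arbitrary assumption to tune for your app.
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import com.google.android.gms.location.ActivityRecognitionResult;
import com.google.android.gms.location.DetectedActivity;

public class DrivingDetectionReceiver extends BroadcastReceiver {
    private static final int MIN_CONFIDENCE = 75; // arbitrary threshold

    @Override
    public void onReceive(Context context, Intent intent) {
        if (!ActivityRecognitionResult.hasResult(intent)) {
            return;
        }
        ActivityRecognitionResult result = ActivityRecognitionResult.extractResult(intent);
        DetectedActivity activity = result.getMostProbableActivity();
        if (activity.getType() == DetectedActivity.IN_VEHICLE
                && activity.getConfidence() >= MIN_CONFIDENCE) {
            // likely driving: route notifications through sayText() here
        }
    }
}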
You can also use Google's FenceApi, which allows you to define a fence of actions such as driving, walking, running, etc. This API launched recently. If you want a sample of using it, you can use this answer.
You can pull this git project (completely free), which does exactly what you want: it adds text-to-speech to the normal notification when you're driving.
In order to know whether you are driving or not, you can use the Activity Recognition API.
Here is a great tutorial that might help you out: Tutorial and Source Code

Check if RS232 response contains malformed chars

I have an Android platform on one end and an Arduino on the other, connected via serial. Everything works fine; however, in some cases the Arduino restarts itself and, while restarting, spews a stream of unknown characters onto the serial line.
Here is a serial log while arduino is rebooting:
�z������"&O�Z&���B
���F ���cd�:{����t�>��+������2�~����. ���r���DD���^��.�.B�.��ڮ2t��Z:��,R��A�ڢr��Ckˡ���.-���N^���b�����^���
Question is, how can I check on android end if the response was malformed?
You should probably add some kind of "framing" to your messages. CR/LF isn't enough.
For example, put a special "preamble" at the front and watch for it on the Android side. Choose something a couple of characters long that will not occur in the body ("payload") of the message and is very unlikely to occur in the random characters that show up on a reboot.
You could also put a CRC at the end; a Fletcher checksum is easy to implement. A sketch combining both ideas is below.
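This is only a sketch under assumptions: a two-byte preamble of 0xAA 0x55 (any values you pick work, as long as both sides agree) and a Fletcher-16 checksum carried in the last two bytes of each frame. The Arduino side would compute the same checksum before sending.
// returns the payload if the frame is intact, or null if it is malformed
static byte[] parseFrame(byte[] buf) {
    // 1) framing: the frame must start with the agreed preamble
    if (buf.length < 4 || (buf[0] & 0xFF) != 0xAA || (buf[1] & 0xFF) != 0x55) {
        return null;
    }
    // 2) integrity: the last two bytes are the Fletcher-16 checksum of the payload
    int payloadLen = buf.length - 4;
    int received = ((buf[buf.length - 2] & 0xFF) << 8) | (buf[buf.length - 1] & 0xFF);
    int sum1 = 0, sum2 = 0;
    for (int i = 2; i < 2 + payloadLen; i++) {
        sum1 = (sum1 + (buf[i] & 0xFF)) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    if (((sum2 << 8) | sum1) != received) {
        return null; // corrupted, e.g. by reboot garbage
    }
    return java.util.Arrays.copyOfRange(buf, 2, 2 + payloadLen);
}
Reboot garbage then fails either the preamble check or the checksum, and the frame is dropped instead of being passed on as data.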
I ended up using a simple solution like this:
private String filterData(String receivedStr) {
    if (receivedStr.contains(RECV_HEADER) && receivedStr.contains(RECV_END)) {
        int headerPos = receivedStr.indexOf(RECV_HEADER);
        int endPos = receivedStr.indexOf(RECV_END);
        if (headerPos < endPos) { // guard against an end marker left over from garbage
            return receivedStr.substring(headerPos, endPos);
        }
    }
    return null;
}
It also extracts the message if it's wrapped in malformed data.

Text to speech returns a different non-existent Locale after setting an existing one

original question
I have a standard TextToSpeech, android.speech.tts.TextToSpeech.
I initialize it and set a language using tts.setLanguage(Locale.getDefault()).
That default Locale is de_DE (for Germany, correct).
Right after setting it, I ask the TTS for its language with tts.getLanguage().
Now it tells me that it is set to "deu_DEU".
There is no Locale with that setting, so I can't even check whether it is set to the right language, because I can't find a Locale object with matching values.
The issue might be related to Android 4.3, but I didn't find any info.
The background is that I need to show values with the same decimal symbol, but the TTS needs the correct symbol or it says "dot" in German, which makes NO sense at all.
Conclusion:
A Locale is a container holding a string composed of a language, a country, and an optional variant. Every text-to-speech engine can return a custom Locale like "eng_USA_texas".
Furthermore, the Locale returned by the TTS engine may only be a "close match" to the requested Locale, e.g. "en_US" instead of "en_UK".
However, Locale has a method called getLanguage() that returns the first part of the above-mentioned string, e.g. "en" or "eng". Those language codes are regulated by ISO, and one can hope that everyone sticks to them (see the link in the accepted answer).
So checking tts.getLanguage().getLanguage().startsWith("en") should always be true if it is some form of English language setting and the ISO standards are followed.
It is important to mention that Locales should not be compared with locale_a == locale_b, as they can be different objects yet have the same content; they are containers of a sort.
Always compare with locale_a.equals(locale_b).
I hope this helps people sort out some problems with TTS and language.
You're right, it's frustrating that the locale codes the TTS object uses differ from those of the device locale. I don't understand why this decision was made.
To add further complication, the TTS engine can supply all kinds of different locales, such as eng_US_sarah or en-US-female. It's up to the TTS engine how these are stored and displayed.
I've had to write additional code to iterate through the returned locales and attempt to match them to a locale the system can use, or vice versa.
To start with, take a look at how the engines you have installed return their locale information. You can then start to collate in your code a list associating 'deu_DEU' with 'de_DE'.
This can often be done simply using split("_") and startsWith(String), but unfortunately not for all locales.
Here's some base code I've used to analyse the installed TTS Engines' locale structure.
private void getEngines() {
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = getActivity().getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    final ArrayList<Intent> intentArray = new ArrayList<Intent>(list.size());
    for (int i = 0; i < list.size(); i++) {
        final Intent getIntent = new Intent();
        getIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
        getIntent.setPackage(list.get(i).activityInfo.applicationInfo.packageName);
        // the available voices come back as an extra in onActivityResult()
        intentArray.add(getIntent);
    }
    for (int i = 0; i < intentArray.size(); i++) {
        startActivityForResult(intentArray.get(i), i);
    }
}

@Override
public void onActivityResult(final int requestCode, final int resultCode, final Intent data) {
    try {
        if (data != null) {
            System.out.print(data.getStringArrayListExtra("availableVoices").toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
From the above ISO-3 codes and the device locale format, you should be able to come up with something for the locales you are concerned with.
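A rough sketch of that matching, assuming the engine reports underscore-joined ISO 639-2 / ISO 3166 alpha-3 codes (as 'deu_DEU' suggests); this scans the platform locales and compares their three-letter equivalents:
import java.util.Locale;
import java.util.MissingResourceException;

static Locale matchEngineLocale(String engineCode) {
    final String[] parts = engineCode.split("_");
    for (Locale candidate : Locale.getAvailableLocales()) {
        try {
            boolean languageMatches = candidate.getISO3Language().equalsIgnoreCase(parts[0]);
            boolean countryMatches = parts.length < 2
                    || candidate.getISO3Country().equalsIgnoreCase(parts[1]);
            if (languageMatches && countryMatches) {
                return candidate; // e.g. "deu_DEU" -> de_DE
            }
        } catch (MissingResourceException ignored) {
            // some locales have no three-letter codes; skip them
        }
    }
    return null; // no match; fall back to split("_")/startsWith() heuristics
}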
I've been intending to submit an enhancement request to AOSP for a while, as all TTS engines need to use constant values, and extras such as gender need to be added to use the TTS engines to their full capabilities.
EDIT: Further to your edit, note the wording regarding setLanguage(): the individual TTS engine will try to match the requested locale as closely as possible, but the applied locale may be completely wrong, depending on how lenient the engine provider is in their code and their response.
After creating a TextToSpeech object, you should configure it (or check its available state/values) in TextToSpeech.OnInitListener's onInit() callback. You will get reliable information about your TextToSpeech object there.
Check my answer here:
https://stackoverflow.com/a/65620221/7835969
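A minimal sketch of that pattern (targeting German, per the question, and assuming a TextToSpeech field named mTextToSpeech): check the return value of setLanguage() inside onInit() instead of comparing Locale objects afterwards.
mTextToSpeech = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
    @Override
    public void onInit(int status) {
        if (status != TextToSpeech.SUCCESS) {
            return; // nothing about the engine is reliable before a successful init
        }
        int result = mTextToSpeech.setLanguage(Locale.GERMANY);
        if (result == TextToSpeech.LANG_MISSING_DATA
                || result == TextToSpeech.LANG_NOT_SUPPORTED) {
            // fall back, or prompt the user to install the voice data
        }
    }
});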

process input from voice recognition

I'm starting my final-year project. I will build an Android application that takes commands from the user and then processes the input in order to show results.
My question is: what approaches can I use to process the input (by input I mean the data or text after converting speech to text)?
I have found some ways to do this, like matching the input against already-stored data (template matching), but I'm looking for something better and smarter than that (and any suggested references would be welcome).
Thanks
I would suggest you start with a very basic and clearly defined set of keyword rules of your own:
@Override
public void onResults(final Bundle results) {
    final ArrayList<String> heardVoice = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    if (heardVoice != null && !heardVoice.isEmpty()) {
        for (String result : heardVoice) {
            if (result.contains("bluetooth")) {
                if (result.contains("on")) {
                    // turn on bluetooth
                    break;
                } else if (result.contains("off")) {
                    // turn off bluetooth
                    break;
                }
            }
        }
    }
}
Once you've understood these basic keyword rules, you can then look into using a Natural Language Processing (NLP) model to go beyond simple keyword matching.
There are many examples out there, but Apache OpenNLP is a good place to start, with comprehensive documentation.
