A while back I wrote some code that would convert a Double into a String, where the string was formatted as a readable fraction.
For example:
4.75 => "4 and 3 4ths"
1.5 => "1 and 1 half"
1.33 => "1 and 1 3rd"
The majority of numbers are pronounced as intended with a few notable exceptions. Instead of the text "4ths" being pronounced as "fourths" it is pronounced "four tee ache ess". Here is an example demonstrating this.
//this works
tts.speak("1 and 3 fourths", TextToSpeech.QUEUE_FLUSH, null);
//this works
tts.speak("1 and 1 3rd", TextToSpeech.QUEUE_FLUSH, null);
//this works
tts.speak("1 and 1 4th", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4ths", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4thes", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4th-s", TextToSpeech.QUEUE_FLUSH, null);
The strangest thing is that this worked fine about a year ago when I first wrote the code: the "ths" suffix was pronounced as one might expect. Perhaps I am mistaken on that point...
Regardless, the issue seems to be that numbers followed by 2 letters are read as a complete word, while numbers followed by 3 or more letters are spelled out character by character instead. I could add to the complexity of the algorithm by substituting word counterparts for all the numbers; however, the longer I work at this, the more I begin to think that I am reinventing the wheel. The API does not seem to offer a way of specifying pronunciation for the speak() method. Am I missing something?
This sort of behavior varies between TextToSpeech engines. The Google TTS engine, for example, will behave differently from the SVOX PICO engine (emulator < API 24), so the differences between engines are not your fault. If there are any pronunciation controls, the engine is responsible for exposing them directly to the end user via its settings.
You're probably just testing on a different engine than you were before, or on an updated version of the same engine.
You could just test some major engines like Samsung, Google, and PICO and try to find a common denominator of behavior. I suspect that you're right: spelling out the words is the best option in this case.
You can specify what engine you want to use as the last argument (String) of the TextToSpeech constructor, and you can see what engines are installed on any particular device by going to (home\settings\language&locale\TTS) or in code like this:
private ArrayList<String> whatEnginesAreInstalled(Context context) {
    final Intent ttsIntent = new Intent();
    // Reconstructed: an action is needed for the query to match TTS engines
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = context.getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    ArrayList<String> installedEngineNames = new ArrayList<>();
    for (ResolveInfo r : list) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        installedEngineNames.add(engineName);
        // just logging the version number out of interest
        String version = "null";
        try {
            // The original line was truncated here; the flags (0) and versionName are assumed
            version = pm.getPackageInfo(engineName, 0).versionName;
        } catch (Exception e) {
            Log.i("XXX", "try catch error");
        }
        Log.i("XXX", "we found an engine: " + engineName);
        Log.i("XXX", "version: " + version);
    }
    return installedEngineNames;
}
As Boober Bunz explained, these features vary from one engine to another, and they may change with newer versions of an engine as well. I would suggest that the best option is to convert everything to words, like "fourths", to make it consistent across engines. For a quick fix you can try "4th's", as it seems to be a more valid word than the others you mentioned as not working.
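Spelling the denominator out in words, as both answers suggest, could be sketched roughly as follows. This is not the asker's original algorithm: the class name `SpokenFraction`, the method `toWords`, the lookup tables, and the decision to only consider denominators from 2 to 8 and non-negative inputs are all assumptions made for illustration.

```java
final class SpokenFraction {
    // Denominator names, indexed by denominator (indices 0 and 1 are unused).
    private static final String[] SINGULAR =
            {"", "", "half", "third", "fourth", "fifth", "sixth", "seventh", "eighth"};
    private static final String[] PLURAL =
            {"", "", "halves", "thirds", "fourths", "fifths", "sixths", "sevenths", "eighths"};

    private SpokenFraction() {}

    /** Formats a non-negative value as e.g. "4 and 3 fourths" using only spelled-out denominators. */
    static String toWords(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value) || value < 0) {
            throw new IllegalArgumentException("value must be finite and non-negative");
        }
        long whole = (long) Math.floor(value);
        double frac = value - whole;

        // Find the closest fraction; on ties the smaller denominator wins
        // because a candidate must be strictly better to replace the current best.
        int bestNum = 0;
        int bestDen = 1;
        double bestErr = frac; // error of dropping the fractional part entirely
        for (int den = 2; den < SINGULAR.length; den++) {
            int num = (int) Math.round(frac * den);
            double err = Math.abs(frac - (double) num / den);
            if (err < bestErr - 1e-9) {
                bestErr = err;
                bestNum = num;
                bestDen = den;
            }
        }

        // A fraction that rounded up to a whole (e.g. 2/2) carries into the whole part.
        if (bestNum == bestDen) {
            whole++;
            bestNum = 0;
        }
        if (bestNum == 0) {
            return Long.toString(whole);
        }
        String name = (bestNum == 1) ? SINGULAR[bestDen] : PLURAL[bestDen];
        String fraction = bestNum + " " + name;
        return (whole == 0) ? fraction : whole + " and " + fraction;
    }
}
```

The resulting string, for example `SpokenFraction.toWords(4.75)`, would then be passed to `tts.speak(...)` in place of the "4ths" form, so the engine never has to guess how to read a digit followed by letters.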
My Java app uses TextToSpeech, with setPitch() and setSpeechRate(). Since the recent upgrade to Android 12 (API 32), these set methods no longer work. This is on some Samsung devices and a Google Pixel 5a. They worked before the upgrade and have no effect now (although they still return 0, i.e. TextToSpeech.SUCCESS).
I'm using the deprecated speak API:
int speak (String text, int queueMode, HashMap<String, String> params)
It looks like setPitch and setSpeechRate have been broken by the upgrade. On one device I tried it immediately before and immediately after upgrading, with no change to the app.
Perhaps the new API still works:
public int speak (CharSequence text, int queueMode, Bundle params, String utteranceId)
I haven't tried it yet, but thought I'd flag this and check for others' experiences straight away. Does anyone else have this problem, or any suggestions?
EDITS: Using the newer API does the same.
Using an emulator on API 32 it all works OK. Using a Galaxy S21 or Galaxy S22 on API 32 we get the problem. Both are using the com.google.android.tts TTS engine.
Curiously, it ignores setSpeechRate() unless it's over 2.0.
I've created an empty app with just TTS. SpeechRates from 0.2 to 2.0 speak as if they're 1.0. A speechRate of 2.1 speaks about double speed, as it should.
My code is below for reference:
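void initTTS() {
    tts = new TextToSpeech(this, status -> {
        Log.d("MyApp", "TTS Init complete. status:" + status);
        // All of these speak at the same rate, as if they are 1.0
        // (calls reconstructed from the rates described above)
        speak(0.2f); speak(1.0f); speak(2.0f);
        // This speaks much faster, sounds like 2.1
        speak(2.1f);
    });
}

void speak(Float speechRate) {
    final Bundle params = new Bundle();
    tts.setSpeechRate(speechRate); // reconstructed: apply the requested rate
    tts.speak("This is my voice", TextToSpeech.QUEUE_ADD, params, "1");
}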
If you'd like to try this and report the result, please do. Don't forget to mention which API level you are on, and which device please.
Our applications had the same problem. The speech rate is now always either 1 or more than 2. This happens on different phones (Samsung, Xiaomi) and on different OS versions (10, 12). It looks like a problem with an update to the voice package.
original question
I have a standard TextToSpeech, android.speech.tts.TextToSpeech.
I initialize it and set a language using tts.setLanguage(Locale.getDefault()).
That default Locale is de_DE (for Germany, which is correct).
Right after setting it, I ask the TTS for its language with tts.getLanguage().
Now it tells me that it is set to "deu_DEU".
There is no Locale with that setting, so I can't even check whether it is set to the right language, because I can't find a Locale object with matching values.
The issue might be related to Android 4.3, but I didn't find any info.
The background is that I need to show values with the same decimal symbol, but the TTS needs the correct symbol or it says "dot" in German, which makes no sense at all.
A Locale is a container for a string composed of a language, a country and an optional variant. Every text-to-speech engine can return a custom Locale like "eng_USA_texas".
Furthermore, the Locale returned by the TTS engine may only be a "close match" to the requested Locale, for example "en_US" instead of "en_GB".
However, Locale has a method called getLanguage() that returns the first part of the above-mentioned string: "en" or "eng". Those language codes are regulated by ISO, and one can hope that everyone sticks to them (see the link in the accepted answer).
So tts.getLanguage().getLanguage().startsWith("en") should always be true if it is some form of English language setting and the ISO standards are followed.
It is important to mention that Locales should not be compared with locale_a == locale_b, as both can be different objects with the same content; they are containers of a sort.
Always compare with locale_a.equals(locale_b).
I hope this helps people sort out some problems with TTS and language.
You're right, it's frustrating that the locale codes the TTS object uses are different from those of the device locale. I don't understand why this decision was made.
To complicate things further, the TTS Engine can supply all kinds of different locales, such as eng_US_sarah or en-US-female. It's down to the TTS Engine how these are stored and displayed.
I've had to write additional code to iterate through the returned locales and attempt to match them to the locale the system can use, or vice versa.
To start with, take a look at how the engines you have installed return their locale information. You can then start to collate a list in your code that associates 'deu_DEU' with 'de_DE'.
This is often as simple as using split("_") and startsWith(String), but unfortunately not for all locales.
Here's some base code I've used to analyse the installed TTS Engines' locale structure.
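private void getEngines() {
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA); // reconstructed
    final PackageManager pm = getActivity().getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    final ArrayList<Intent> intentArray = new ArrayList<Intent>(list.size());
    for (int i = 0; i < list.size(); i++) {
        // One check-data intent per installed engine (loop body reconstructed)
        final Intent getIntent = new Intent();
        getIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
        getIntent.setPackage(list.get(i).activityInfo.applicationInfo.packageName);
        intentArray.add(getIntent);
    }
    for (int i = 0; i < intentArray.size(); i++) {
        startActivityForResult(intentArray.get(i), i);
    }
}

public void onActivityResult(final int requestCode, final int resultCode, final Intent data) {
    try {
        if (data != null) {
            // Reconstructed: log the voices (locale strings) this engine reports
            Log.i("TTS", String.valueOf(data.getStringArrayListExtra(TextToSpeech.Engine.EXTRA_AVAILABLE_VOICES)));
        }
    } catch (NullPointerException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}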
From the above ISO-3 codes and the device locale format, you should be able to come up with something for the locales you are concerned with.
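One way to bridge the two formats, sketched below, is to compare both locales by their ISO-3 codes, which `java.util.Locale` exposes through `getISO3Language()` and `getISO3Country()`. The class and method names (`LocaleMatcher`, `sameLanguageAndCountry`) are hypothetical, and the sketch assumes the engine reports plain three-letter language and country codes such as `deu_DEU`; engine-specific forms like `eng_US_sarah` would still need custom handling.

```java
import java.util.Locale;
import java.util.MissingResourceException;

final class LocaleMatcher {
    private LocaleMatcher() {}

    /** Three-letter language code, falling back to the raw code if no mapping exists. */
    static String iso3Language(Locale locale) {
        try {
            return locale.getISO3Language();
        } catch (MissingResourceException e) {
            return locale.getLanguage();
        }
    }

    /** Three-letter country code, falling back to the raw code (e.g. an engine's "DEU"). */
    static String iso3Country(Locale locale) {
        try {
            return locale.getISO3Country();
        } catch (MissingResourceException e) {
            return locale.getCountry();
        }
    }

    /** True if both locales name the same language and country, in either 2- or 3-letter form. */
    static boolean sameLanguageAndCountry(Locale a, Locale b) {
        return iso3Language(a).equalsIgnoreCase(iso3Language(b))
                && iso3Country(a).equalsIgnoreCase(iso3Country(b));
    }
}
```

With this, a device locale of `de_DE` and an engine locale of `deu_DEU` compare as equal, while `en_US` and `en_GB` do not. The variant part is deliberately ignored, since engines use it inconsistently.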
I've been intending to submit an enhancement request to AOSP for a while, as all TTS Engines need to use constant values and extras such as gender etc need to be added to use the TTS Engines to their full capabilities.
EDIT: Further to your edit, note the wording regarding setLanguage(). The individual TTS Engine will try and match as close as possible to the requested locale, but that applied locale may be completely wrong, depending on how lenient the Engine provider is in their code and their response.
After creating a TextToSpeech object, you should configure it (or check its available state and values) in the onInit() callback of TextToSpeech.OnInitListener. You will get reliable information about your TextToSpeech object there.
I'm starting my final year project. I will build an Android application that takes commands from the user and then processes the input in order to show results.
My question is: what approaches can I use to process the input (by input I mean the text obtained after converting speech to text)?
I have found some ways to do that, such as matching the input against data that is already stored (template matching), but I'm looking for something better and smarter than that (and any suggested references).
I would suggest you start with a very basic and clearly defined set of keyword rules of your own:
public void onResults(final Bundle results) {
    final ArrayList<String> heardVoice = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    if (heardVoice != null && !heardVoice.isEmpty()) {
        for (String result : heardVoice) {
            if (result.contains("on")) { // reconstructed condition
                // turn on bluetooth
            } else if (result.contains("off")) {
                // turn off bluetooth
            }
        }
    }
}
Once you've understood these basic keyword parameters, you can then look at using a Natural Language Processing (NLP) model and at the performance of your code.
There are many examples out there, but the Apache OpenNLP is a good place to start, with comprehensive documentation.
I am learning to write an app that performs TTS on given strings, and have tried an example modified from the web.
The code is as follows:
// setup TTS part 1
mTts = new TextToSpeech(Lesson2_dialog_revision_simple.this, this); // TextToSpeech.OnInitListener
speakBtn.setOnClickListener(new OnClickListener() {
    public void onClick(View v) {
        StringTokenizer loveTokens = new StringTokenizer("他們 one two是 three ", ",.");
        int i = 0;
        loveArray = new String[loveTokens.countTokens()];
        while (loveTokens.hasMoreTokens()) { // reconstructed loop
            loveArray[i++] = loveTokens.nextToken();
        }
        speakText(); // assumed: start speaking once the tokens are collected
    }
});

// setup TTS part 2
public void onUtteranceCompleted(String utteranceId) {
    Log.v(TAG, "Get completed message for the utteranceId " + utteranceId);
    lastUtterance = Integer.parseInt(utteranceId);
}

// setup TTS part 3
public void onInit(int status) {
    if (status == TextToSpeech.SUCCESS) {
        int result = mTts.setLanguage(Locale.CHINESE); // <====== set speech location
        if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
            Toast.makeText(Lesson2_dialog_revision_simple.this, "Language is not supported", Toast.LENGTH_LONG).show();
        }
    }
}

// setup TTS part 4
private void speakText() {
    if (lastUtterance >= loveArray.length) {
        lastUtterance = 0;
    }
    Log.v(TAG, "the begin utterance is " + lastUtterance);
    for (int i = lastUtterance; i < loveArray.length; i++) {
        params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, String.valueOf(i));
        mTts.speak(loveArray[i], TextToSpeech.QUEUE_ADD, params);
    }
}
Everything is OK if the language in part 3 above is set with int result = mTts.setLanguage(Locale.US); the app then reads out "one two three" in English perfectly (in the above example, it skips all the Chinese words and just reads out "one two three").
However, if I change it to read out Chinese by calling setLanguage(Locale.CHINESE), it immediately shows the toast "Language is not supported".
I would like to ask:
1. Does the current TTS still not support Chinese? I would prefer Cantonese over Chinese if possible.
2. The phone is able to recognize Cantonese when I input messages via speech. Is there some other way to perform TTS with Cantonese output?
1 - The Google TTS Engine at its current version does not support Cantonese as output yet. Putonghua works fine.
2 - Ekho is a TTS Engine that supports Cantonese.
You might want to try the TTS app I developed, which works with Ekho and the Google TTS Engine: Voice Out TTS
As far as I know, there is no specific Locale in Java to distinguish between Cantonese and Putonghua, because Cantonese is a Chinese dialect. The Locale in Java refers only to the writing system (Simplified or Traditional).
For example, you can read a string written in Traditional Chinese in either Cantonese or Putonghua.
@Pearmak: you can check which languages are supported on your device:
int i = mTts.isLanguageAvailable(Locale.ENGLISH);
where mTts is a TextToSpeech object.
If the value of i is >= 0, the language is supported on your device; otherwise it is not.
You may also build the Locale from language and country codes:
int i = mTts.isLanguageAvailable(new Locale("zh", "CN")); // for Chinese (Simplified)
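One detail worth noting when building the Locale from a string: the one-argument `Locale` constructor treats its whole argument as the language code, so a combined string such as "zh_CN" has to be split into language and country first. A minimal sketch of such a helper follows; the class name `LocaleStrings` and the method `parse` are hypothetical, and it assumes the parts are separated by "_" or "-".

```java
import java.util.Locale;

final class LocaleStrings {
    private LocaleStrings() {}

    /** Splits a string such as "zh_CN" or "yue-HK" into language, country and variant. */
    static Locale parse(String tag) {
        String[] parts = tag.split("[_-]", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }
}
```

The result can then be passed to `mTts.isLanguageAvailable(...)`, for example `mTts.isLanguageAvailable(LocaleStrings.parse("zh_CN"))`.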
Yue is a tiny Chinese text-to-speech (TTS) synthesis engine for Cantonese and Mandarin, aimed at offline embedded systems. It is extremely small, works offline and independently, and outputs PCM audio with no need for a server or network connection. It produces highly natural synthesised voice for hybrid text input, can synthesise the same text in either Cantonese or Mandarin, and supports Yale, Jyutping and Pinyin romanization. The engine can continuously produce and play voice for long text, with no limit on the text length. It has a built-in intelligent detector that can handle mixed input of traditional Chinese, simplified Chinese, English, numbers, punctuation and symbols. Yue is written in ANSI C with no dependency on third-party libraries, and runs on ARM and AVR embedded systems such as watches, toys and robots, on mobile platforms such as iPhone and Android, and of course on normal desktops, for uses such as ebook readers, newspaper readers and storytellers. Because of its extremely small size, Yue can be loaded into memory and embedded in other programs. Bindings can be provided for a large number of programming languages.
Link: http://www.sevenuc.com/en/tts.html
Google TTS recently added support for Cantonese (and also Mandarin). http://www.androidpolice.com/2015/07/24/google-tts-now-supports-four-new-languages-including-cantonese-and-mandarin/
Some phones have the Cantonese locale, which you can use with TTS:
new Locale("yue", "HK"); //yue for 粤语
Once you have set the system language to Cantonese, then you can use setLanguage(Locale.getDefault()).
I am trying to add a few commands to the default Android voice dialer app. It has commands like open, dial, call, redial etc., and I want to add, let's say, "Find" to it. I have downloaded the source code from here and compiled it in Eclipse. The application sets up a grammar for the arguments of these commands; for example, it stores the names and phone numbers of the people in the contact list so that it can generate intents when their names are recognized, as in the voice command CALL JOHN. For CALL in this command, it just compares the first word of the recognized string to "CALL".
I added "FIND" as an extra else if condition in the onRecognitionSuccess() function, as shown below:
public class CommandRecognizerEngine extends RecognizerEngine {
    protected void onRecognitionSuccess(RecognizerClient recognizerClient) throws InterruptedException {
        // commands, intents and literal are set up earlier in the original method (omitted here)
        if ("DIAL".equalsIgnoreCase(commands[0])) {
            Uri uri = Uri.fromParts("tel", commands[1], null);
            String num = formatNumber(commands[1]);
            if (num != null) {
                addCallIntent(intents, uri, literal.split(" ")[0].trim() + " " + num, "", 0);
            }
        } else if ("FIND".equalsIgnoreCase(commands[0])) {
            if (Config.LOGD) {
                Log.d(TAG, "FIND detected...");
            }
        }
    }//end onRecognitionSuccess
}//end CommandRecognizerEngine
But my app can't recognize it. Does anyone know how the recognizer detects commands like OPEN or CALL, or can anyone refer me to appropriate documentation?
As it has been over a year, I doubt you need this answer anymore. However, some other people might find this through Google, as I did.
Right now, the best way to apply grammars to speech recognition on Android is to set the number of results higher, and then filter the results based on your grammar. It is not perfect, as the word recognized may not have passed a threshold to be included in the list, but it does greatly improve the accuracy of all speech recognition applications where the types of things you can say are somewhat limited.
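The filtering step described above could be sketched like this in plain Java. The class name `GrammarFilter` and its methods are hypothetical, and the "grammar" is reduced to a set of allowed first words, mirroring how the voice dialer compares the first word to "CALL". On Android, the list of hypotheses would typically come from the recognizer results, with a larger result count requested through the RecognizerIntent.EXTRA_MAX_RESULTS extra.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class GrammarFilter {
    private final Set<String> commands = new HashSet<>();

    /** Registers the command words that may start an utterance, case-insensitively. */
    GrammarFilter(String... allowedCommands) {
        for (String command : allowedCommands) {
            commands.add(command.toUpperCase(Locale.ROOT));
        }
    }

    /** Keeps only hypotheses whose first word is a known command, in recognizer order. */
    List<String> filter(List<String> hypotheses) {
        List<String> accepted = new ArrayList<>();
        for (String hypothesis : hypotheses) {
            String trimmed = hypothesis.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String firstWord = trimmed.split("\\s+")[0].toUpperCase(Locale.ROOT);
            if (commands.contains(firstWord)) {
                accepted.add(trimmed);
            }
        }
        return accepted;
    }
}
```

Because the recognizer usually orders its hypotheses from most to least likely, taking the first element of the filtered list gives the best candidate that also fits the grammar. An empty list means nothing matched, and the app can ask the user to repeat the command.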