Best practice for specifying pronunciation for Android TTS engine?

In general, I'm very impressed with Android's default text-to-speech engine (i.e., com.svox.pico). As expected, it mispronounces some words (as do I), so it occasionally needs some pronunciation guidance. I'm wondering about best practices for phonetically spelling out the words that the Pico TTS engine mispronounces.
For example, the correct pronunciation of the bird Chachalaca is CHAH-chah-LAH-kah. Here is what the TTS engine produces:
mTts.speak("Chachalaca", TextToSpeech.QUEUE_ADD, null); // output: chuh-KAL-uh-KUH
mTts.speak("CHAH-chah-LAH-kah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-EL-AY-AYCH-dash-kuh
mTts.speak("CHAHchahLAHkah", TextToSpeech.QUEUE_ADD, null); // output: CHA-chah-LAH-ka
mTts.speak("CHAH chah LOCKah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-LAH-kah
Here are my questions.
1. Is there a standard phonetic spelling recognized by the Android TTS engine?
2. If not, are there some general rules for making custom pronunciation spellings that will make the spellings more likely to be correct in future TTS engines/versions?
3. It appears that the Android TTS engine ignores text case. What is the best way to specify emphasis?
By the way, this is what the TTS engine writes to logcat:
V/TtsService( 294): TTS processing: CHAH chah LOCKah
V/TtsService( 294): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 294): Language already loaded (en-US == en-US)
I/SynthProxy( 294): setting speech rate to 100
I/SynthProxy( 294): setting pitch to 100
[UPDATE]
I tried passing an XML document to TextToSpeech.speak() as follows:
String text = "<?xml version=\"1.0\"?>" +
"<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
"xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
"http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
"xml:lang=\"en-US\">" +
"That is a big car! " +
"That <emphasis>is</emphasis> a big car! " +
"That is a <emphasis>big</emphasis> car! " +
"That is a huge bank account! " +
"That <emphasis level=\"strong\">is</emphasis> a huge bank account! " +
"That is a <emphasis level=\"strong\">huge</emphasis> bank account!" +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
As Android Eve suggested, the TTS engine read only the XML body (i.e., the sentences about the big car and the huge bank account). I didn't realize the TTS engine was capable of parsing XML documents. However, I did not hear any emphasis in the TTS output.
[UPDATE 2]
I simplified the question to whether or not Android TTS supports Speech Synthesis Markup Language (SSML) and posted it here.

JW answered my question at the tts-for-android group:
Hi Greg,
The Pico engine recognizes the <phoneme> tag with the XSAMPA alphabet.
There are no easy rules to derive a certain pronunciation from the orthography, but you can use intuitive spellings and trial and error. Capitalization and hyphens will introduce more problems than they solve. Using different spellings and introducing extra word boundaries (spaces) can work.
The emphasis tag and the exclamation mark will not change the synthesis result. Use , , and commands instead.
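JW's "different spellings plus extra word boundaries" advice can be applied systematically by keeping a small table of respellings and rewriting the text just before calling speak(). The sketch below is a hypothetical helper (PronunciationOverrides, put and apply are illustrative names, not part of any Android API); it uses the respelling from the question that happened to work with Pico, and matches whole words case-insensitively, since the question observes that the engine ignores case anyway.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces known problem words with respellings found by
// trial and error, before the text is passed to TextToSpeech.speak().
public class PronunciationOverrides {
    private final Map<String, String> overrides = new LinkedHashMap<>();

    public PronunciationOverrides put(String word, String respelling) {
        overrides.put(word, respelling);
        return this;
    }

    public String apply(String text) {
        String result = text;
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            // \b on both sides so only whole words are replaced
            // ("Chachalacas" is left alone); case-insensitive match.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            // quoteReplacement keeps '$' or '\' in a respelling literal.
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        PronunciationOverrides o = new PronunciationOverrides()
                .put("Chachalaca", "CHAH chah LOCKah");
        System.out.println(o.apply("A chachalaca called."));
        // On Android: mTts.speak(o.apply(text), TextToSpeech.QUEUE_ADD, null);
    }
}
```

Keeping the table in one place also makes it easy to re-test every respelling when the engine or its version changes.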
Some examples of the proper syntax for specifying the pronunciation using the SSML phoneme tag are in these tests of TextToSpeech.
Even with these simple test SSML documents, warning messages are posted to logcat saying the SSML document is not well-formed. So I opened an issue about these seemingly incorrect logcat messages on the Android issue tracker.
The syntax for specifying an x-SAMPA sequence to SVOX pico is
String text = "<speak xml:lang=\"en-US\"> <phoneme alphabet=\"xsampa\" ph=\"d_ZIn\"/>.</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
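When building these strings by hand, note that X-SAMPA uses characters that are special in XML: the primary-stress mark is `"`, which cannot appear verbatim inside a double-quoted `ph` attribute. A minimal sketch, assuming the same document shape as the one above (PhonemeSsml, escapeAttr and phoneme are hypothetical names), that escapes the attribute values before concatenating:

```java
// Hypothetical helper that builds the minimal SSML document shown above,
// with an x-SAMPA <phoneme> element, escaping attribute values.
public class PhonemeSsml {
    // Escapes characters not allowed verbatim in a double-quoted XML attribute.
    static String escapeAttr(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '"': sb.append("&quot;"); break; // x-SAMPA primary stress
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String phoneme(String lang, String xsampa) {
        return "<speak xml:lang=\"" + escapeAttr(lang) + "\"> "
                + "<phoneme alphabet=\"xsampa\" ph=\"" + escapeAttr(xsampa) + "\"/>."
                + "</speak>";
    }

    public static void main(String[] args) {
        String text = phoneme("en-US", "d_ZIn");
        System.out.println(text);
        // On Android: mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
    }
}
```

For "d_ZIn" this produces exactly the string used in the answer; the escaping only changes anything when the phoneme string contains `&`, `<` or `"`.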
Although more examples would be helpful, a good reference for x-SAMPA is at http://en.wikipedia.org/wiki/Xsampa. If I compile a couple dozen examples, I'll post them to that Wikipedia page.

One answer for all 3 questions: look at the SSML specification: http://www.w3.org/TR/speech-synthesis/
For example, to specify emphasis, you use the emphasis element, e.g.
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
xml:lang="en-US">
That is a <emphasis> big </emphasis> car!
That is a <emphasis level="strong"> huge </emphasis>
bank account!
</speak>

Related

String objects with various Hyperlinks in Textview?

I have String objects that contain hyperlinks, which I'm trying to make clickable in a TextView. Here's an example:
String descrption = "\"Cardano is a decentralised platform" +
" that will allow complex programmable transfers of" +
" value in a secure and scalable fashion. It is one" +
" of the first blockchains to be built in the highly" +
" secure Haskell programming language. Cardano is developing" +
" a smart contract platform" +
" which seeks to deliver more advanced features than any protocol previously developed." +
" It is the first blockchain platform to evolve out of a scientific philosophy and a research-first driven approach." +
" The development team consists of a large global collective of expert engineers and researchers.\\r\\n\\r\\n" +
"The Cardano project is different from other blockchain projects as it openly addresses the need for regulatory oversight whilst maintaining" +
" consumer privacy and protections through an innovative software architecture.";
TextView textView = findViewById(R.id.description_textview);
textView.setText(descrption); // Linkify works on the text already set on the view
Linkify.addLinks(textView, Linkify.WEB_URLS);
But this is how it comes out:
How can I get the hyperlinks to format correctly?
I've managed to get it working with the following code:
descrption = descrption.replaceAll("\\r\\n", "<p>");
Spanned spanned = Html.fromHtml(descrption);
textView.setText(spanned);
textView.setMovementMethod(LinkMovementMethod.getInstance());
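Whether that replaceAll call matches anything depends on what the string really contains. String.replaceAll takes a regular expression, so the Java literal "\\r\\n" there matches a real carriage return followed by a line feed; the question's source literal "\\r\\n\\r\\n", by contrast, contains actual backslash characters, which that regex does not match. A plain-Java sketch (class and method names are hypothetical) contrasting the two cases:

```java
public class ParagraphBreaks {
    // replaceAll: "\\r\\n" is the regex \r\n, i.e. a real CR LF pair.
    static String realCrLfToParagraph(String s) {
        return s.replaceAll("\\r\\n", "<p>");
    }

    // replace: "\\r\\n" is matched literally, i.e. the four characters
    // backslash, 'r', backslash, 'n'.
    static String literalEscapesToParagraph(String s) {
        return s.replace("\\r\\n", "<p>");
    }

    public static void main(String[] args) {
        String real = "First.\r\nSecond.";     // contains CR LF
        String literal = "First.\\r\\nSecond."; // contains backslashes
        System.out.println(realCrLfToParagraph(real));          // First.<p>Second.
        System.out.println(realCrLfToParagraph(literal));       // unchanged
        System.out.println(literalEscapesToParagraph(literal)); // First.<p>Second.
    }
}
```

If the text comes from a source that delivers escaped sequences rather than real line breaks, the literal replace is the one that takes effect before Html.fromHtml.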

System.out.print with // prints the code in blue. What is the meaning of this?

System.out.println(TAG + " //METHOD_STARTED// - //start_firebase_and_get_userID//");
Why, when I write this in my app, does the text after the // appear in blue in the console?
It appears to be mistaking it for a hyperlink.
//METHOD_STARTED// is a valid protocol-relative URL (domain names don't have to have dots if they're on your local network), and it seems that Android Studio/IntelliJ's link detection is falling for it. Of course, it's not valid in this case, because there's no protocol in the log output for it to be relative to, so really this is a bug.
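The parsing claim can be checked with java.net.URI: a reference that begins with "//" has no scheme, and the text up to the next '/' is taken as the authority, exactly as in a protocol-relative URL. A minimal sketch (the class name is hypothetical; this only illustrates generic URI parsing, not how the IDE's link detector is implemented):

```java
import java.net.URI;

public class ProtocolRelative {
    public static void main(String[] args) {
        // No scheme; "METHOD_STARTED" is parsed as the authority.
        URI ref = URI.create("//METHOD_STARTED//");
        System.out.println(ref.getScheme());    // null
        System.out.println(ref.getAuthority()); // METHOD_STARTED
    }
}
```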

Accessibility: Talkback, WebView and user's locale

I have developed an app that includes a WebView. I would like to make my app fully accessible, so for the WebView element I would like TalkBack to read HTML elements such as "Heading", "Banner", "EditText" in a fully accessible way.
I have seen that TTS in WebView is done through JavaScript injection via Chromium's AccessibilityInjector.java class. This injects this script into the page, which only has the messages in English. As a result, when a device is set to another language, TTS still reads these HTML elements in English.
Since I cannot debug or extend the Chromium web client, how can I make TTS read my page according to the user's locale?
EDIT: I am using jQuery Mobile by the way.
Just in case someone stumbles onto this problem: I had to apply an ugly workaround. Whenever I load a page and TalkBack is enabled, I re-inject the JavaScript variables containing the text to be read, replacing them with their localized counterparts. For instance, for Spanish text:
view.loadUrl("javascript:window.setTimeout(function(){" +
"window.console.log(\"Injecting messages.\");" +
"cvox.TestMessages[\"chromevox_input_type_text\"] = {message: \"cuadro de edición\"};" +
"cvox.TestMessages[\"chromevox_input_type_radio\"] = {message: \"botón de opción\"};" +
"cvox.TestMessages[\"chromevox_selected\"] = {message: \"seleccionado\"};" +
"cvox.TestMessages[\"chromevox_unselected\"] = {message: \"no seleccionado\"};" +
"cvox.TestMessages[\"chromevox_radio_selected_state\"] = {message: \"seleccionado\"};" +
"cvox.TestMessages[\"chromevox_radio_unselected_state\"] = {message: \"no seleccionado\"};" +
"cvox.TestMessages[\"chromevox_input_type_submit\"] = {message: \"botón\"};" +
"cvox.TestMessages[\"chromevox_input_type_button\"] = {message: \"botón\"};" +
"cvox.TestMessages[\"chromevox_tag_button\"] = {message: \"botón\"};" +
"}, 2000)");
Note that I wrap the injection in a timeout; this prevents ChromeVox from being injected after my injection, which would make the workaround useless.
I know this is an ugly patch, but I could not find any better solutions without access to the chromium webview classes.
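To support more than one language without duplicating that long string per locale, the same javascript: URL can be generated from a table of translations. A sketch under the assumption that the message keys and the cvox.TestMessages object behave as in the workaround above (ChromeVoxMessages, jsString and buildInjectionUrl are hypothetical names); it also escapes quotes and backslashes so a translation cannot break the script:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the injection URL from a key -> message map.
public class ChromeVoxMessages {
    // Quotes a value as a double-quoted JavaScript string literal.
    static String jsString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"': sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    static String buildInjectionUrl(Map<String, String> messages, int delayMs) {
        StringBuilder js = new StringBuilder("javascript:window.setTimeout(function(){");
        for (Map.Entry<String, String> e : messages.entrySet()) {
            js.append("cvox.TestMessages[").append(jsString(e.getKey()))
              .append("] = {message: ").append(jsString(e.getValue())).append("};");
        }
        // Same delay idea as above: run after ChromeVox has been injected.
        return js.append("}, ").append(delayMs).append(")").toString();
    }

    public static void main(String[] args) {
        Map<String, String> es = new LinkedHashMap<>();
        es.put("chromevox_selected", "seleccionado");
        es.put("chromevox_tag_button", "bot\u00f3n");
        System.out.println(buildInjectionUrl(es, 2000));
        // On Android: view.loadUrl(buildInjectionUrl(es, 2000));
    }
}
```

The per-locale map could then be loaded from string resources, although that part is left out here.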

EXTRA_AVAILABLE_VOICES always returns eng-GBR only. Why?

I am using the following snippet to log all available (and unavailable) voices currently on phone:
ArrayList<String> availableVoices = intent.getStringArrayListExtra(TextToSpeech.Engine.EXTRA_AVAILABLE_VOICES);
String availStr = "";
for (String lang : availableVoices)
availStr += (lang + ", ");
Log.i(String.valueOf(availableVoices.size()) + " available langs: ", availStr);
ArrayList<String> unavailableVoices = intent.getStringArrayListExtra(TextToSpeech.Engine.EXTRA_UNAVAILABLE_VOICES);
String unavailStr = "";
for (String lang : unavailableVoices)
unavailStr += (lang + ", ");
Log.w(String.valueOf(unavailableVoices.size()) + " unavailable langs: ", unavailStr);
The logged result is somewhat bewildering, since I know for certain that I have multiple languages installed and I can even hear the TTS speaking in eng-USA, yet the log shows:
1 available langs: eng-GBR,
30 unavailable langs: ara-XXX, ces-CZE, dan-DNK, deu-DEU, ell-GRC,
eng-AUS, eng-GBR, eng-USA, spa-ESP, spa-MEX, fin-FIN, fra-CAN,
fra-FRA, hun-HUN, ita-ITA, jpn-JPN, kor-KOR, nld-NLD, nor-NOR,
pol-POL, por-BRA, por-PRT, rus-RUS, slk-SVK, swe-SWE, tur-TUR,
zho-HKG, zho-CHN, zho-TWN, tha-THA,
Why is the behavior inconsistent? (Note that eng-GBR appears in both the available and unavailable lists.)
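An inconsistency like this can be detected programmatically by intersecting the two lists returned by getStringArrayListExtra(). A plain-Java sketch (VoiceListCheck and inBoth are hypothetical names) that also guards against a missing extra, for which getStringArrayListExtra() returns null:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class VoiceListCheck {
    // Returns the entries present in both lists, in "available" order.
    static List<String> inBoth(List<String> available, List<String> unavailable) {
        if (available == null || unavailable == null) {
            return new ArrayList<>(); // extra missing: nothing to compare
        }
        Set<String> overlap = new LinkedHashSet<>(available);
        overlap.retainAll(new HashSet<>(unavailable));
        return new ArrayList<>(overlap);
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList("eng-GBR");
        List<String> unavailable = Arrays.asList("eng-AUS", "eng-GBR", "eng-USA");
        System.out.println(inBoth(available, unavailable)); // [eng-GBR]
    }
}
```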
It turns out that, as far as text-to-speech in Android 2.x goes, it's the wild west out there: any installed third-party TTS engine can modify the contents of the EXTRA_AVAILABLE_VOICES extra however it likes, regardless of whether it is checked or unchecked, or selected as the default.
I just tried uninstalling all TTS engines from my phone, leaving only the built-in Pico, and the result matched exactly what I expected:
6 available voices: deu-DEU, eng-GBR, eng-USA, spa-ESP, fra-FRA,
ita-ITA,
0 unavailable voices:
I wouldn't mind if the output of this extra referred dynamically to the currently selected (i.e., default) TTS engine, but once a third-party TTS engine is installed, the output doesn't make any sense, because it ignores any settings.
Also note that the name is misleading: it lists available languages, not voices!
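The entries appear to be a three-letter language code plus a three-letter country code (for example eng-GBR), which is why they describe locales rather than individual voices. A sketch, assuming that "lang-COUNTRY" format, that maps an entry to a java.util.Locale by searching the JDK's ISO code tables (Iso3Locale and fromIso3 are hypothetical names); country codes the JDK does not know, such as "XXX", are simply dropped:

```java
import java.util.Locale;
import java.util.MissingResourceException;

public class Iso3Locale {
    // Returns null if the three-letter language code is not recognized.
    static Locale fromIso3(String entry) {
        String[] parts = entry.split("-");
        String lang2 = null;
        for (String l : Locale.getISOLanguages()) {
            try {
                if (new Locale.Builder().setLanguage(l).build()
                        .getISO3Language().equals(parts[0])) {
                    lang2 = l;
                    break;
                }
            } catch (MissingResourceException ignored) {
                // no three-letter code known for this language
            }
        }
        if (lang2 == null) return null;
        Locale.Builder b = new Locale.Builder().setLanguage(lang2);
        if (parts.length > 1) {
            for (String c : Locale.getISOCountries()) {
                try {
                    if (new Locale.Builder().setRegion(c).build()
                            .getISO3Country().equals(parts[1])) {
                        b.setRegion(c);
                        break;
                    }
                } catch (MissingResourceException ignored) {
                    // no three-letter code known for this country
                }
            }
        }
        return b.build();
    }

    public static void main(String[] args) {
        System.out.println(fromIso3("eng-GBR")); // en_GB
        System.out.println(fromIso3("ara-XXX")); // ar
    }
}
```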
I am posting this answer with the hope that it will help someone save the time & agony of discovering this the hard way.

Does Android TTS support Speech Synthesis Markup Language?

Passing the following SSML (Speech Synthesis Markup Language) document to the com.svox.pico TextToSpeech engine resulted in a reading of the XML body but no control from the phoneme element or the emphasis element. This result (no apparent SSML control) is the same on a Nexus One running Android 2.2 as well as on the emulator running an AVD with SDK level 8.
String text = "<?xml version=\"1.0\"?>" +
"<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
"xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
"http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
"xml:lang=\"en-US\">" +
"tomato " +
"<phoneme alphabet=\"ipa\" ph=\"t&#x259;mei&#x325;&#x27E;ou&#x325;\"> tomato </phoneme> " +
"That is a big car! " +
"That <emphasis> is </emphasis> a big car! " +
"That is a <emphasis> big </emphasis> car! " +
"That is a huge bank account! " +
"That <emphasis level=\"strong\"> is </emphasis> a huge bank account! " +
"That is a <emphasis level=\"strong\"> huge </emphasis> bank account!" +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
Does any Android TTS engine support any of the SSML elements?
I've been experimenting with SSML, and it seems that the TTS engine automatically wraps its input in the root <speak> element, so if you leave it out, it works fine and you don't get a parser error.
Example:
String text = "Testing <phoneme alphabet=\"xsampa\" ph=\"&quot;{k.t#`\"/>.";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
The answer seems to be "sort of". Not all the SSML tags are supported yet, but some test examples of the use of the <phoneme> tag are at https://android.googlesource.com/platform/external/svox/+/89292811b7fe82e5c14fa13942779763627e26db
Though the test examples produce the desired speech output, they also produce XML parser error messages in logcat. I've opened an issue about these seemingly incorrect error messages at the Android issue tracker (issue 11010).
It does appear that android.speech.tts at SDK level 23 supports a subset of SSML. Speech text can be wrapped in <speak> tags, and <say-as> is observed, while <break> is not. There is no documentation regarding SSML support.
