I have a class that uses the Android TTS API to synthesize text to audio. I can control the pitch and speed, but I noticed the engine requires a text string and also a hash object. Some words are pronounced too quickly to be easily recognized, and the inflection seems unnatural. Is there a way I can control these two things, possibly through the HashMap? The following is how I'm using the engine:
mTts = new TextToSpeech(Globals.context, this); // context, listener
}

@Override
public void onInit(int status) {
    HashMap<String, String> myHashRender = new HashMap<>();
    myHashRender.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, speech);
    mTts.setPitch(0.8f);
    mTts.setSpeechRate(0.6f);
    mTts.synthesizeToFile(speech, myHashRender, fileOutPath);
    while (mTts.isSpeaking()) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    mTts.stop();
    mTts.shutdown();
}
Google TTS does not currently support that, but here is what you can do: while parsing your text, you can change parts of it to get the intonation and inflection you want.
For example, if you encounter the word 'Hey', you can rewrite it on the fly to 'Heeeey' before you send it to the TTS engine, to get a different pronunciation.
It is not pretty, but it is a workaround, as sketched below.
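A minimal sketch of that workaround, assuming you maintain your own replacement map (the spellings here are illustrative and tuned by ear; punctuation handling is omitted):

private static final Map<String, String> PROSODY_REWRITES = new HashMap<>();
static {
    PROSODY_REWRITES.put("hey", "heeeey"); // stretch the vowel
    PROSODY_REWRITES.put("no", "nooo");
}

private String rewriteForProsody(String input) {
    StringBuilder out = new StringBuilder();
    for (String word : input.split("\\s+")) {
        String replacement = PROSODY_REWRITES.get(word.toLowerCase(Locale.ROOT));
        out.append(replacement != null ? replacement : word).append(' ');
    }
    return out.toString().trim();
}

// Usage: mTts.speak(rewriteForProsody("hey wait"), TextToSpeech.QUEUE_FLUSH, null);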
Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in SSML. While there are parameters you can set, none of them control inflection or per-word prosody.
There may be other engines that do support these features. eSpeak, for example, supports SSML tags and has an Android port available on the Play Store.
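For illustration, this is the kind of inline prosody markup an SSML-aware engine such as eSpeak may honor. This is only a sketch: whether the tags are respected, ignored, or read aloud as plain text depends entirely on the engine installed.

String ssml = "<speak><prosody rate=\"slow\" pitch=\"low\">Hello there.</prosody></speak>";
mTts.speak(ssml, TextToSpeech.QUEUE_FLUSH, null);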
I have an application that, on certain events, changes a normal notification to text-to-speech, since sometimes the phone isn't available to users and it's safer not to handle it.
For example, when you're driving, handling the phone is dangerous, so I want to turn the notifications into text-to-speech.
I've looked for a long time for an explanation of how to switch to text-to-speech while driving, but I can't find any reference for it anywhere I search.
For generating text-to-speech, I have this part, which works fine:
private TextToSpeech mTextToSpeech;

public void sayText(Context context, final String message) {
    mTextToSpeech = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
        @Override
        public void onInit(int status) {
            try {
                if (mTextToSpeech != null && status == TextToSpeech.SUCCESS) {
                    mTextToSpeech.setLanguage(Locale.US);
                    mTextToSpeech.speak(message, TextToSpeech.QUEUE_ADD, null);
                }
            } catch (Exception ex) {
                System.out.print("Error handling TextToSpeech GCM notification " + ex.getMessage());
            }
        }
    });
}
But I don't know how to check whether I'm currently driving or not.
As Ashwin suggested, you can use the Activity Recognition API, but there's a downside: the driving samples you receive have a 'confidence' field that isn't always accurate, so you'll have to do extra work (such as checking locations to see whether you actually moved) to know for sure that the user moved.
You can use Google's FenceApi, which allows you to define a fence of actions such as driving, walking, running, etc. This API launched recently. If you want a sample of using it, see this answer and the sketch below.
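A rough sketch of the FenceApi approach. It assumes a connected GoogleApiClient built with Awareness.API and the Awareness API key configured in the manifest; the action string and fence key are illustrative.

// Register a fence that fires when the user starts or stops driving.
AwarenessFence drivingFence = DetectedActivityFence.during(DetectedActivityFence.IN_VEHICLE);
PendingIntent fencePendingIntent = PendingIntent.getBroadcast(
        context, 0, new Intent("com.example.DRIVING_FENCE"), PendingIntent.FLAG_UPDATE_CURRENT);
Awareness.FenceApi.updateFences(googleApiClient,
        new FenceUpdateRequest.Builder()
                .addFence("drivingFence", drivingFence, fencePendingIntent)
                .build());

// A receiver registered for "com.example.DRIVING_FENCE" then reacts to state changes:
public class DrivingFenceReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context context, Intent intent) {
        FenceState fenceState = FenceState.extract(intent);
        if ("drivingFence".equals(fenceState.getFenceKey())
                && fenceState.getCurrentState() == FenceState.TRUE) {
            // The user appears to be driving: switch notifications to text-to-speech here.
        }
    }
}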
You can pull this git project (everything free), which does exactly what you want: it adds text-to-speech to the normal notification when you're driving.
In order to know whether you are driving or not, you can use the Activity Recognition API.
Here is a great tutorial that might help you out: Tutorial and Source Code
It looks as though Google has made offline speech recognition available from Google Now for third-party apps. It is being used by the app named Utter.
Has anyone seen any implementations of how to do simple voice commands with this offline speech recognition? Do you just use the regular SpeechRecognizer API, and does it work automatically?
Google did quietly enable offline recognition in that Search update, but there is (as yet) no API or additional parameter available within the SpeechRecognizer class. {See Edit at the bottom of this post} The functionality is available with no additional coding; however, the user's device will need to be configured correctly for it to begin working. This is where the problem lies, and I would imagine why a lot of developers assume they are 'missing something'.
Also, Google has restricted certain Jelly Bean devices from using offline recognition due to hardware constraints. Which devices this applies to is not documented; in fact, nothing is documented, so configuring the capabilities for the user has proved to be a matter of trial and error (for them). It works for some straight away. For those for whom it doesn't, this is the 'guide' I supply them with.
1. Make sure the default Android Voice Recogniser is set to Google, not Samsung/Vlingo.
2. Uninstall any offline recognition files you already have installed from the Google Voice Search settings.
3. Go to your Android application settings and see if you can uninstall the updates for the Google Search and Google Voice Search applications.
4. If you can't do the above, go to the Play Store and see if you have the option there.
5. Reboot (if you achieved 2, 3 or 4).
6. Update Google Search and Google Voice Search from the Play Store (if you achieved 3 or 4, or if an update is available anyway).
7. Reboot (if you achieved 6).
8. Install English UK offline language files.
9. Reboot.
10. Use utter! with a connection.
11. Switch to aeroplane mode and give it a try.
12. Once it is working, offline recognition of other languages, such as English US, should start working too.
EDIT: Temporarily changing the device locale to English UK also seems to kickstart this to work for some.
Some users reported they still had to reboot a number of times before it would begin working, but they all got there eventually, often with no clear indication of what the trigger was. The keys to this lie inside the Google Search APK, so they are not in the public domain or part of AOSP.
From what I can establish, Google tests the availability of a connection prior to deciding whether to use offline or online recognition. If a connection is available initially but is lost prior to the response, Google will supply a connection error; it won't fall back to offline. As a side note, if a request for the network-synthesised voice has been made, no error is supplied if it fails – you get silence.
The Google Search update enabled no additional features in Google Now, and in fact, if you try to use it with no internet connection, it will error. I mention this as I wondered whether the ability would be withdrawn as quietly as it appeared, and therefore whether it should be relied upon in production.
If you intend to start using the SpeechRecognizer class, be warned: there is a pretty major bug associated with it, which requires your own implementation to handle.
Not being able to specifically request offline = true makes controlling this feature impossible without manipulating the data connection. Rubbish. You'll get hundreds of user emails asking you why you haven't enabled something so simple!
EDIT: Since API level 23, a new parameter, EXTRA_PREFER_OFFLINE, has been added, which the Google recognition service does appear to adhere to.
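For example, a minimal sketch (the extra is only meaningful on API 23+ and with the Google recognition service):

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);
speechRecognizer.startListening(intent); // an android.speech.SpeechRecognizer instance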
Hope the above helps.
I would like to improve, with images, the guide from the answer https://stackoverflow.com/a/17674655/2987828. It is the sentence "For those that it doesn't, this is the 'guide' I supply them with." that I want to improve.
The user should click on the four buttons highlighted in blue in these images:
Then the user can select any desired languages. When the download is done, they should disconnect from the network and then click on the keyboard's "microphone" button.
It worked for me (Android 4.1.2); language recognition then worked out of the box, without rebooting. I can now dictate instructions to the shell of Terminal Emulator! And it is twice as fast offline as online, on a PadFone 2 from ASUS.
These images are licensed under CC BY-SA 3.0, with attribution required to stackoverflow.com/a/21329845/2987828; you may hence add these images anywhere along with this attribution.
(This is the standard policy for all images and text at stackoverflow.com.)
A simple and flexible offline recognition on Android is implemented by CMUSphinx, an open-source speech recognition toolkit. It works purely offline, is fast and configurable, and can, for example, listen continuously for a keyword.
You can find the latest code and a tutorial here; a minimal keyword-spotting sketch follows.
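For reference, a hedged sketch of continuous keyword spotting with pocketsphinx-android, following the official demo's asset layout (the model and dictionary file names come from that demo and may differ in your setup; SpeechRecognizer here is edu.cmu.pocketsphinx.SpeechRecognizer, not the Android framework class):

Assets assets = new Assets(context);
File assetsDir = assets.syncAssets(); // copies the bundled model files to storage
SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        .getRecognizer();
recognizer.addListener(recognitionListener); // your edu.cmu.pocketsphinx.RecognitionListener
recognizer.addKeyphraseSearch("wakeup", "ok computer"); // search name, keyphrase
recognizer.startListening("wakeup"); // listens continuously for the keyphrase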
Update in 2019: time moves fast, and CMUSphinx is not that accurate anymore. I recommend trying the Kaldi toolkit instead. The demo is here.
In short, I don't have the implementation, but the explanation.
Google did not make offline speech recognition available to third-party apps. Offline recognition is only accessible via the keyboard. Ben Randall (the developer of utter!) explains his workaround in an article at Android Police:
I had implemented my own keyboard and was switching between Google Voice Typing and the user's default keyboard with an invisible edit text field and transparent Activity to get the input. Dirty hack! This was the only way to do it, as offline Voice Typing could only be triggered by an IME or a system application (that was my root hack). The other type of recognition API … didn't trigger it and just failed with a server error. … A lot of work wasted for me on the workaround! But at least I was ready for the implementation...
From Utter! Claims To Be The First Non-IME App To Utilize Offline Voice Recognition In Jelly Bean
I successfully implemented my speech service with offline capabilities by using onPartialResults when offline and onResults when online, as sketched below.
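A hedged sketch of that pattern with the standard android.speech.SpeechRecognizer: offline sessions sometimes deliver the final text only through onPartialResults, so the last partial is kept as a fallback. handleResult() is a hypothetical handler of your own.

recognizer.setRecognitionListener(new RecognitionListener() {
    private String lastPartial = "";

    @Override
    public void onPartialResults(Bundle partialResults) {
        ArrayList<String> texts =
                partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (texts != null && !texts.isEmpty()) {
            lastPartial = texts.get(0);
        }
    }

    @Override
    public void onResults(Bundle results) {
        ArrayList<String> texts =
                results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        String text = (texts != null && !texts.isEmpty()) ? texts.get(0) : lastPartial;
        handleResult(text); // hypothetical handler
    }

    // Remaining callbacks are not needed for this pattern.
    @Override public void onReadyForSpeech(Bundle params) {}
    @Override public void onBeginningOfSpeech() {}
    @Override public void onRmsChanged(float rmsdB) {}
    @Override public void onBufferReceived(byte[] buffer) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onError(int error) {}
    @Override public void onEvent(int eventType, Bundle params) {}
});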
I was dealing with this and noticed that you need to install the offline package for your language. My language setting was "Español (Estados Unidos)", but there is no offline package for that language, so when I turned off all network connectivity I got an alert from RecognizerIntent saying that it couldn't reach Google. I then changed the language to "English (US)" (since I already had its offline package), launched the RecognizerIntent, and it just worked.
Key: language setting == offline voice recognizer package
It is apparently possible to manually install offline voice recognition by downloading the files directly and installing them in the right locations manually. I guess this is just a way to bypass Google's hardware requirements.
However, personally I didn't have to reboot or anything; simply changing the locale to UK and back again did it.
A working example is given below.
MyService.class
public class MyService extends Service implements SpeechDelegate, Speech.stopDueToDelay {

    public static SpeechDelegate delegate;

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        //TODO do something useful
        try {
            if (VERSION.SDK_INT >= VERSION_CODES.KITKAT) {
                // Mute the system stream so recognizer beeps are silent.
                ((AudioManager) Objects.requireNonNull(
                        getSystemService(Context.AUDIO_SERVICE))).setStreamMute(AudioManager.STREAM_SYSTEM, true);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        Speech.init(this);
        delegate = this;
        Speech.getInstance().setListener(this);

        if (Speech.getInstance().isListening()) {
            Speech.getInstance().stopListening();
        } else {
            System.setProperty("rx.unsafe-disable", "True");
            RxPermissions.getInstance(this).request(permission.RECORD_AUDIO).subscribe(granted -> {
                if (granted) { // Always true pre-M
                    try {
                        Speech.getInstance().stopTextToSpeech();
                        Speech.getInstance().startListening(null, this);
                    } catch (SpeechRecognitionNotAvailable exc) {
                        //showSpeechNotSupportedDialog();
                    } catch (GoogleVoiceTypingDisabledException exc) {
                        //showEnableGoogleVoiceTyping();
                    }
                } else {
                    Toast.makeText(this, R.string.permission_required, Toast.LENGTH_LONG).show();
                }
            });
        }
        return Service.START_STICKY;
    }

    @Override
    public IBinder onBind(Intent intent) {
        //TODO for communication return IBinder implementation
        return null;
    }

    @Override
    public void onStartOfSpeech() {
    }

    @Override
    public void onSpeechRmsChanged(float value) {
    }

    @Override
    public void onSpeechPartialResults(List<String> results) {
        for (String partial : results) {
            Log.d("Result", partial + "");
        }
    }

    @Override
    public void onSpeechResult(String result) {
        Log.d("Result", result + "");
        if (!TextUtils.isEmpty(result)) {
            Toast.makeText(this, result, Toast.LENGTH_SHORT).show();
        }
    }

    @Override
    public void onSpecifiedCommandPronounced(String event) {
        try {
            if (VERSION.SDK_INT >= VERSION_CODES.KITKAT) {
                ((AudioManager) Objects.requireNonNull(
                        getSystemService(Context.AUDIO_SERVICE))).setStreamMute(AudioManager.STREAM_SYSTEM, true);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (Speech.getInstance().isListening()) {
            Speech.getInstance().stopListening();
        } else {
            RxPermissions.getInstance(this).request(permission.RECORD_AUDIO).subscribe(granted -> {
                if (granted) { // Always true pre-M
                    try {
                        Speech.getInstance().stopTextToSpeech();
                        Speech.getInstance().startListening(null, this);
                    } catch (SpeechRecognitionNotAvailable exc) {
                        //showSpeechNotSupportedDialog();
                    } catch (GoogleVoiceTypingDisabledException exc) {
                        //showEnableGoogleVoiceTyping();
                    }
                } else {
                    Toast.makeText(this, R.string.permission_required, Toast.LENGTH_LONG).show();
                }
            });
        }
    }

    @Override
    public void onTaskRemoved(Intent rootIntent) {
        // Restarting the service if it is removed.
        PendingIntent service =
                PendingIntent.getService(getApplicationContext(), new Random().nextInt(),
                        new Intent(getApplicationContext(), MyService.class), PendingIntent.FLAG_ONE_SHOT);
        AlarmManager alarmManager = (AlarmManager) getSystemService(Context.ALARM_SERVICE);
        assert alarmManager != null;
        alarmManager.set(AlarmManager.ELAPSED_REALTIME_WAKEUP, 1000, service);
        super.onTaskRemoved(rootIntent);
    }
}
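To try it, a hypothetical usage sketch: the service must be declared in AndroidManifest.xml and the RECORD_AUDIO permission granted at runtime.

startService(new Intent(this, MyService.class)); // e.g. from an Activity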
For more details, see:
https://github.com/sachinvarma/Speech-Recognizer
Hope this will help someone in the future.
I'm starting my final year project. I will build an Android application that takes commands from the user and then processes the input in order to show results.
My question is: what ways can I use to process the input (by input I mean the data or text after transferring speech to text)?
I have found some ways to do it, like matching the input against already-stored data (template matching), but I'm looking for something better and smarter than that (and any suggested references).
Thanks
I would suggest you start with a very basic and clearly defined set of keyword rules of your own:
@Override
public void onResults(final Bundle results) {
    final ArrayList<String> heardVoice = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    if (heardVoice != null && !heardVoice.isEmpty()) {
        for (String result : heardVoice) {
            if (result.contains("bluetooth")) {
                if (result.contains("on")) {
                    // turn on bluetooth
                    break;
                } else if (result.contains("off")) {
                    // turn off bluetooth
                    break;
                }
            }
        }
    }
}
Once you've understood these basic keyword rules, you can then look at using a Natural Language Processing (NLP) model to improve the robustness of your code.
There are many examples out there, but Apache OpenNLP is a good place to start, with comprehensive documentation; a minimal intent-classification sketch follows.
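For illustration, a hedged sketch using OpenNLP's document categorizer (1.8+ API) to map a recognized utterance to an intent label. It assumes you have already trained a doccat model from labelled utterances; the file name "intents.bin" and the category labels are illustrative.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;

public class IntentClassifier {
    // Returns the best-matching intent label, e.g. "bluetooth_on".
    public static String classify(String utterance) throws Exception {
        try (InputStream in = new FileInputStream("intents.bin")) {
            DoccatModel model = new DoccatModel(in);
            DocumentCategorizerME categorizer = new DocumentCategorizerME(model);
            double[] outcomes = categorizer.categorize(utterance.split("\\s+"));
            return categorizer.getBestCategory(outcomes);
        }
    }
}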
I am learning to write an app that is intended to perform TTS on given strings, and I have tried an example modified from the web.
The code is as follows:
// setup TTS part 1
mTts = new TextToSpeech(Lesson2_dialog_revision_simple.this, this); // TextToSpeech.OnInitListener

speakBtn.setOnClickListener(new OnClickListener() {
    public void onClick(View v) {
        StringTokenizer loveTokens = new StringTokenizer("他們 one two是 three ", ",.");
        int i = 0;
        loveArray = new String[loveTokens.countTokens()];
        while (loveTokens.hasMoreTokens()) {
            loveArray[i++] = loveTokens.nextToken();
        }
        speakText();
    }
});
}

// setup TTS part 2
@Override
public void onUtteranceCompleted(String utteranceId) {
    Log.v(TAG, "Got completed message for the utteranceId " + utteranceId);
    lastUtterance = Integer.parseInt(utteranceId);
}

// setup TTS part 3
@Override
public void onInit(int status) {
    if (status == TextToSpeech.SUCCESS) {
        int result = mTts.setLanguage(Locale.CHINESE); // <====== set speech locale
        if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
            Toast.makeText(Lesson2_dialog_revision_simple.this, "Language is not supported", Toast.LENGTH_LONG).show();
            speakBtn.setEnabled(false);
        } else {
            speakBtn.setEnabled(true);
            mTts.setOnUtteranceCompletedListener(this);
        }
    }
}

// setup TTS part 4
private void speakText() {
    lastUtterance++;
    if (lastUtterance >= loveArray.length) {
        lastUtterance = 0;
    }
    Log.v(TAG, "the begin utterance is " + lastUtterance);
    for (int i = lastUtterance; i < loveArray.length; i++) {
        params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, String.valueOf(i));
        mTts.speak(loveArray[i], TextToSpeech.QUEUE_ADD, params);
    }
}
Questions:
Everything is OK if the language in part 3 above is set with mTts.setLanguage(Locale.US): it reads out "one two three" in English perfectly (in the above example it skips all the Chinese words and just reads out "one two three").
However, if I change the setting to setLanguage(Locale.CHINESE) so that it reads out the Chinese, it immediately toasts "Language is not supported".
I would like to ask:
Does the current TTS still not support Chinese? I would actually prefer Cantonese over Mandarin.
The phone IS able to recognize Cantonese when I input messages via speech (Cantonese). Is there some other way to perform TTS with Cantonese output?
Thanks!!
1 - The Google TTS engine at its current version does not support Cantonese as output yet. Putonghua works fine.
2 - Ekho is a TTS engine that supports Cantonese.
You might want to give the TTS app I developed a try; it works with the Ekho and Google TTS engines: Voice Out TTS
As far as I know, there's no specific Locale in Java to distinguish between Cantonese and Putonghua, because Cantonese is a Chinese dialect; the Locale in Java refers only to the written form (Simplified or Traditional).
For example, you can read a string written in Traditional Chinese in either Cantonese or Putonghua.
@Pearmak: you can check which languages are supported on your device:
int i = mTts.isLanguageAvailable(Locale.ENGLISH);
where mTts is a TextToSpeech object.
If you get a value of i >= 0, then that language is supported on your device; otherwise it is not.
You may also pass a language locale:
int i = mTts.isLanguageAvailable(new Locale("zh", "CN")); // for Chinese simplified
Yue is a tiny Chinese text-to-speech (TTS) synthesis engine for Cantonese and Mandarin, aimed at offline embedded systems. It is extremely small, offline, and independent, with PCM audio output and no need for a server or network connection. It produces natural-sounding synthesised voice for hybrid text input, and can synthesise the same text in either Cantonese or Mandarin, with Yale, Jyutping, and Pinyin romanization. The engine can continuously produce and play voice for long text, with no limit on the text length. It has a built-in intelligent detector that can handle any mixed text input of traditional Chinese, simplified Chinese, English, numbers, punctuation, and symbols. Yue is written in ANSI C with no dependency on third-party libraries, and runs on ARM and AVR embedded systems such as watches, toys, and robots, on iPhone, Android, and other mobile platforms, and of course on normal desktops, e-book readers, newspaper readers, and story tellers. Because of its extremely small size, Yue can be loaded into memory and embedded in other programs; it is well suited to embedded systems and also suitable for desktop operating systems. The engine can have bindings for a large number of programming languages.
The link: http://www.sevenuc.com/en/tts.html
Google TTS recently added support for Cantonese (and also Mandarin). http://www.androidpolice.com/2015/07/24/google-tts-now-supports-four-new-languages-including-cantonese-and-mandarin/
Some phones have the Cantonese locale, which you can use with TTS. Try:
new Locale("yue", "HK"); // "yue" for 粤语 (Cantonese)
Once you have set the system language to Cantonese, you can also use setLanguage(Locale.getDefault()).
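A hedged sketch combining this with an availability check, falling back to the device default when the Cantonese locale is missing:

Locale cantonese = new Locale("yue", "HK");
if (mTts.isLanguageAvailable(cantonese) >= TextToSpeech.LANG_AVAILABLE) {
    mTts.setLanguage(cantonese);
} else {
    mTts.setLanguage(Locale.getDefault()); // relies on the system language being Cantonese
}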
I'm using the default Pico Android TTS engine with IPA characters, like this:
String text3 = "<speak xml:lang=\"fr-FR\"> <phoneme alphabet=\"ipa\" ph=\"+"+words+"\"/>.</speak>";
myTTS.speak(text3, TextToSpeech.QUEUE_ADD, null);
It generally works, but the engine doesn't like some letters, such as "ã" or "ɑ".
So my question is: how can I add these letters/sounds to this TTS engine?
Hey, you can use addEarcon() to add sounds to TextToSpeech (link). This method is used to add earcons: it links a text string to a specific sound file.
You can also find an example of this:
mTts = new TextToSpeech(this, new OnInitListener() {
    @Override
    public void onInit(int status) {
        // Register the sound resource under the text key "[tock]".
        mTts.addEarcon("[tock]", "com.ideal.itemid", R.raw.tock_snd);
        showRecordingView();
    }
});
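Once registered, the earcon is played by the same name; for example:

mTts.playEarcon("[tock]", TextToSpeech.QUEUE_ADD, null);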
There is also a very good explanation of addEarcon in the book Professional Android Sensor Programming by Greg Milette and Adam Stroud, on pages 366 and 367.
You can also find an example at this link.