Android Instant Speech to Text voice recognition - android

I don't have much experience with Android, but was asked by a hearing-impaired friend if there is a way to essentially "stream" voice to text on a mobile device. I've used and looked into the built-in Android API, but it seems that it only sends the speech off for processing after the speech input is completed. I'm looking for something that works continuously (similar to how Dragon works with Microsoft Word).
Perhaps there is already an app that does this. If not, is there a way to implement this with the current Android OS/API?
Any suggestions appreciated.

As you've mentioned, the speech-to-text recognition is sent to Google for processing. This can take enormous computing power, which current devices simply can't handle (yet). Because everything is processed server-side, you won't be able to do immediate speech recognition in real time directly on the phone.
It's possible that somebody has created a 3rd-party library to do this, but I'm not aware of any. Even so, it would probably have some significant limitations or reduced accuracy.

You can use this Extra for the Recognizer Intent:
String EXTRA_PARTIAL_RESULTS: Optional boolean to indicate whether partial results should be returned by the recognizer as the user speaks (default is false).
http://developer.android.com/reference/android/speech/RecognizerIntent.html#EXTRA_PARTIAL_RESULTS

Related

What is the most reliable TTS (Text-to-Speech) engine to use when developing for Android?

Our company is developing a full-featured app for blind people, which means it relies heavily on text-to-speech (TTS). We have noticed that the TTS voice simply stops speaking randomly. It usually works fine and we have no issues with speech, but once in a blue moon we get no voice output and the app doesn't know otherwise so it continues to work like usual, but without any voice. Users can still use the app for the most part, but they no longer hear the speech until the app is restarted and everything is reset.
Is there a reliable way to know if the voice fails to speak something?
I already utilized an utterance complete listener to handle certain scenarios with what it says, but that makes no difference when the TTS simply doesn't output the speech. It's as if the voice "thinks" it said it but we never hear it.
Is there an event we can capture that would be fired when the TTS engine tries to say something but fails?
In my experience, at the time of writing, the most reliable TTS engine is Google's own. This, of course, is a matter of opinion and not something Stack Overflow encourages. Currently, Google TTS is the only one that uses the latest Voice API correctly, whereas other engines crash, fail or simply report incorrectly.
It's unfortunately all too common that a TTS Engine will believe it has spoken the utterance correctly, but hasn't.
To combat this, when I request the speech, I set a Runnable, which checks if onDone() fails to be called within a period relative to the length of the speech. I also check onDone() is not called too quickly, which would suggest that the speech failed, unless the request was deliberately one of silence.
Those two checks enable me to toast to the user if there's an issue - given that you are dealing with the blind, you will obviously have to find another way to communicate the problem! Perhaps a series of multiple small vibrates could generically denote an issue?
Hope that helps.
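The two timing checks described above can be sketched without any Android dependency. This is a minimal illustration, not the answerer's actual code: the class name SpeechWatchdog, the speaking-rate constant and the tolerance factors are all assumptions that would need tuning per engine and voice. On Android, isOverdue() would typically be called from a Runnable posted with Handler.postDelayed(), and onDone() from the utterance listener's onDone() callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks when each utterance was requested and judges
// whether the "done" callback arrived implausibly early or not at all.
class SpeechWatchdog {
    enum Verdict { OK, TOO_FAST, TIMED_OUT, UNKNOWN }

    // Assumed, untuned figures: about 15 characters of text per second.
    static final long MS_PER_CHAR = 65;
    static final long SLACK_MS = 2000;

    private static final class Pending {
        final long startMs, minMs, maxMs;
        Pending(long startMs, long minMs, long maxMs) {
            this.startMs = startMs;
            this.minMs = minMs;
            this.maxMs = maxMs;
        }
    }

    private final Map<String, Pending> pending = new HashMap<>();

    // Call when speech is requested. Pass silence=true for deliberate pauses,
    // which are allowed to finish immediately.
    void onSpeakRequested(String utteranceId, String text, boolean silence, long nowMs) {
        long expected = text.length() * MS_PER_CHAR;
        long min = silence ? 0 : expected / 4;
        long max = expected * 3 + SLACK_MS;
        pending.put(utteranceId, new Pending(nowMs, min, max));
    }

    // Call from the engine's "done" callback.
    Verdict onDone(String utteranceId, long nowMs) {
        Pending p = pending.remove(utteranceId);
        if (p == null) return Verdict.UNKNOWN;
        long elapsed = nowMs - p.startMs;
        if (elapsed > p.maxMs) return Verdict.TIMED_OUT;
        return elapsed < p.minMs ? Verdict.TOO_FAST : Verdict.OK;
    }

    // Call from a delayed task: true if "done" never arrived in time.
    boolean isOverdue(String utteranceId, long nowMs) {
        Pending p = pending.get(utteranceId);
        return p != null && nowMs - p.startMs > p.maxMs;
    }
}
```

Passing the clock in as a parameter keeps the logic testable. A TOO_FAST or overdue result is the point at which the app could fall back to a vibration pattern or reinitialise the engine.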

Real-time call transcription on Android

I am an Android developer living with a hearing impairment, and I am currently exploring the option of making a speech-to-text app with the SpeechRecognizer API in Android. Closed-captioning telephones and Innocaption are not available in my home country. One potential application is captioning during telephone calls.
https://developer.android.com/reference/android/speech/SpeechRecognizer.html
The API is meant for capturing voice commands, not for real-time live transcribing. I am even able to implement it as a service but I constantly need to restart it after it has delivered a result or a partial result, which is not feasible in a conversational setting (words get lost while the service is restarting).
Do note that I don't need a 100% accuracy for this app. Many hearing impaired people find it helpful to have some context of the conversation to help them along. So I don't actually need comments about how this is not going to be accurate.
Is there a way to use SpeechRecognizer in a continuous mode? I can create a TextView that constantly updates itself when new text is returned from the service. If this API is not what I should be looking at, is there any recommendation? I tested CMUSphinx but found it so dependent on blocks of phrases/sentences that it is not likely to work for the kind of application I have in mind.
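The self-updating text view mentioned above could be backed by a small buffer that keeps finished text separate from the hypothesis still in progress, so that restarting the recognizer never erases what is already on screen. The sketch below is plain Java with no Android dependency; the class name CaptionBuffer and its methods are hypothetical, and it assumes the recognizer delivers partial and final results the way RecognitionListener.onPartialResults() and onResults() do.

```java
// Hypothetical caption buffer: committed text only grows, while the
// in-progress hypothesis is replaced every time a new partial arrives.
class CaptionBuffer {
    private final StringBuilder committed = new StringBuilder();
    private String partial = "";

    // Replace the current in-progress hypothesis.
    void onPartial(String hypothesis) {
        partial = hypothesis == null ? "" : hypothesis.trim();
    }

    // Commit a final result and clear the in-progress hypothesis.
    void onFinal(String result) {
        String text = result == null ? "" : result.trim();
        if (!text.isEmpty()) {
            if (committed.length() > 0) committed.append(' ');
            committed.append(text);
        }
        partial = "";
    }

    // Text to show: everything committed so far plus the live hypothesis.
    String display() {
        if (partial.isEmpty()) return committed.toString();
        return committed.length() == 0 ? partial : committed + " " + partial;
    }
}
```

This does not recover words spoken while the recognizer is restarting; it only keeps the display stable across restarts.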
I am a deaf software developer, so I can chime in. I've been monitoring the state of the art of speech-to-text APIs, and the APIs have now become "good enough" to provide operatorless relay/captioning services for CERTAIN kinds of phone conversations with people using a telephone in quiet settings. For example, I get 98% transcription accuracy with my spouse's voice with the Apple Siri realtime transcription (iOS 8).
I was able to jerry-rig phone captioning by routing the sound out of one phone into a second iPhone, on which I pressed the microphone button on the popup keyboard, and successfully captioned a telephone conversation with ~95% accuracy at 250 words per minute (faster than Sprint Captioned Telephone and Hamilton Captioned Telephone), at least until the 1-minute cutoff time.
I therefore declare computer-based voice recognition practical for phone calls with family members (the kind of people you call frequently in quiet environments), where you can at least coach them to move to a quiet place so that captioning works properly (with >95% accuracy). Now that iOS 8 has been released, we REALLY need this, so that we don't have to rely on relay operators or captioning telephones. Sprint Captioned Telephone lags badly during fast speech, while Apple Siri keeps up, so I can conduct more natural telephone conversations with my jerry-rigged two-iOS-device Apple Siri "realtime Captioned Telephone" setup.
Some cellphones transmit audio in a higher-def manner, so it works well between two iPhones (iPhone speaker piped into another iPhone's Siri running in iOS8 continuous mode). That's assuming you're on G.722.2 (AMR-WB), like when running two iPhones on the same carrier that supports the high-def audio telephony standard. It works perfectly when piped through Siri -- roughly as good as doing it in front of the phone, for the same human voice (assuming the other end is speaking into the phone in a quiet environment).
Google and Apple need to open up their speech-to-text APIs to assistive applications, pronto, because operatorless telephone transcription is finally practical, at least when calling family members (good voices, and coached to be in a quiet environment when receiving the call). The continuous recognition time limit also needs to be removed for this use case.
Google is not going to work with telephone-quality audio anyway, so you need to build the captioning service yourself using CMUSphinx.
You probably didn't configure CMUSphinx properly. It should be OK for large-vocabulary transcription. The only thing you need to take care of is to use the 8 kHz telephony acoustic model (not the wideband model) together with a generic language model.
For the best accuracy it's probably worth moving the processing to a server. You can set up a PBX to make the calls and transcribe the audio there instead of hoping to do something on a limited device.
It is true that the SpeechRecognizer API documentation claims that:
"The implementation of this API is likely to stream audio to remote servers to perform speech recognition. As such this API is not intended to be used for continuous recognition, which would consume a significant amount of battery and bandwidth."
This bit of text was added a year ago (https://android.googlesource.com/platform/frameworks/base/+/2921cee3048f7e64ba6645d50a1c1705ef9658f8). However, no changes were made to the API at the time, i.e. the API remained the same. Also, I don't really see anything specific to networking and battery drain in the API documentation. So, go ahead and implement a recognizer (maybe based on CMUSphinx) and make it accessible via this API.

Change voice during phone call android

I want to make an Android application that allows the user to change their voice during a phone call. For example: if you are a man, you can change your voice to a woman's or a robot's when talking over the phone. It is meant as a funny prank.
I have been working with Android's API and googling for some days but still have no idea. Someone told me it is impossible, but I see that some apps on Google Play can do it:
https://play.google.com/store/apps/details?id=com.gridmob.android.funnycall
So I think there are some ways to do that.
I thought about recording and playing back using AudioTrack, but I have two more problems:
1. I cannot mute the original voice in the phone call so that the phone only plays my sound after processing.
2. Recording and processing will introduce a long delay (slower than real time).
Can anyone share a solution for this?
The app you linked isn't changing voices on the phone: it uses SIP (or similar) to place a call through the authors' servers and the voice changing happens there. That's why you only get a small number of free minutes of use before you have to pay them.
Yes, it uses a SIP server to do this processing. There are two reasons you cannot actually create an app that does this on the phone. The first is that sound processing for the phone call is locked. You can't unlock it because it's strictly engineered in hardware, not software. A PC can do this because it uses a standard sound card whose frequencies software can modify. The second is that phone manufacturers are required to design their phones to a standard format. There are laws that force these companies to make it impossible to do any voice morphing. It is against the law to impersonate someone you are not over any telephone network.
Hard way
You get the input voice, use voice recognition to detect the words, then use text-to-speech with your desired voice as output.
Less hard way
Sound processing: Changing frequencies, amplitude etc.
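As a rough illustration of the "less hard way", the sketch below changes pitch by resampling 16-bit PCM samples with linear interpolation and changes amplitude with a clamped gain. It is a naive technique chosen for brevity: it alters the duration along with the pitch, and it does nothing about getting access to call audio, which the answers above say is the real obstacle. The class and method names are hypothetical.

```java
// Hypothetical helper operating on 16-bit mono PCM samples.
class PitchShifter {
    // Resample by `factor`: 2.0 reads the input twice as fast, which raises
    // pitch by an octave (and halves the length) at the same sample rate.
    static short[] shift(short[] in, double factor) {
        if (factor <= 0) throw new IllegalArgumentException("factor must be positive");
        int outLen = (int) (in.length / factor);
        short[] out = new short[outLen];
        for (int i = 0; i < outLen; i++) {
            double pos = i * factor;
            int i0 = Math.min((int) pos, in.length - 1);
            int i1 = Math.min(i0 + 1, in.length - 1);
            double frac = pos - i0;
            // Linear interpolation between the two nearest input samples.
            out[i] = (short) Math.round(in[i0] * (1 - frac) + in[i1] * frac);
        }
        return out;
    }

    // Scale amplitude, clamping to the 16-bit range to avoid wrap-around.
    static short[] gain(short[] in, double g) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) {
            long v = Math.round(in[i] * g);
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
        }
        return out;
    }
}
```

Keeping the original duration while shifting pitch needs a more involved method (for example a phase vocoder), which is part of why real-time processing adds delay.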

speech recognition reduce possible search results

I have started with speech recognition using Android, SL4A and Python and so far, it works fine.
My user is just supposed to input numbers between 0 and 9 by voice. Is there a way to tell Android to search only among those numbers and therefore reduce the recognition time (and probably errors)?
No. You cannot change what Google returns. You can only process the results.
Fortunately, you can process the results to increase the chance of a match.
For example, you could use a phonetic matching algorithm like Soundex.
Using Soundex or something similar, if the recognizer hears something like "too" your code could still recognize it as the number 2.
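A minimal sketch of that post-processing, assuming the standard American Soundex rules and a hypothetical DigitMatcher class: each candidate returned by the recognizer is reduced to its Soundex code and compared with the codes of "zero" to "nine".

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper that maps recognizer candidates onto the digits 0-9.
class DigitMatcher {
    // Soundex digit for each letter a..z; '0' marks letters without a code.
    private static final String CODES = "01230120022455012623010202";
    private static final String[] WORDS = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"
    };

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != prev) out.append(code);
            // H and W do not separate letters that share a code; vowels do.
            if (c != 'H' && c != 'W') prev = code;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // Returns the digit a single candidate sounds like, or -1 if none.
    static int matchDigit(String heard) {
        String h = heard.trim();
        if (h.length() == 1 && h.charAt(0) >= '0' && h.charAt(0) <= '9') {
            return h.charAt(0) - '0';
        }
        String code = soundex(h);
        for (int d = 0; d < WORDS.length; d++) {
            if (soundex(WORDS[d]).equals(code)) return d;
        }
        return -1;
    }

    // Returns the first digit found in the recognizer's candidate list, or -1.
    static int firstDigit(List<String> candidates) {
        for (String c : candidates) {
            int d = matchDigit(c);
            if (d >= 0) return d;
        }
        return -1;
    }
}
```

Plain Soundex is fairly strict about consonants and the first letter: "too" and "for" land on 2 and 4, but "true" codes to T600, the same as "three", and "won" does not match "one" at all. A hand-made table of common mishearings may work better than any general algorithm for a vocabulary this small.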

offline voice recognization for voice calculator

I am building an application, "Voice Calculator", which takes voice input and displays a result based on that input.
I don't want to use Google's servers for voice recognition. Is there any way I can achieve this?
I want to take input such as "two plus three multiply four hundred twenty two minus one hundred", so I would like to record and compare every word,
which can then be converted into text and used to perform the calculation.
Can anyone guide me on how to achieve this? I am done with designing the calculator and its functionality.
I hope I have explained my question clearly. Looking for help, thank you.
I have used the Google API for voice recognition. Although I wanted an offline version, I had to rely on Google's voice recognition.
Have a look at the "Voice Recognization for android" example.
Granted, not many devices support it yet, but Jelly Bean will allow you to download Google's voice recognition to the device for offline use.
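Whichever recognizer produces the text, turning a phrase like the one in the question into a result is a separate step. The sketch below is one possible approach under stated assumptions: the class name SpokenCalculator is hypothetical, only whole numbers below one million are understood, and multiplication and division bind tighter than addition and subtraction (a calculator that evaluates strictly left to right would give a different answer).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser and evaluator for spoken arithmetic in English words.
class SpokenCalculator {
    private static final Map<String, Integer> SMALL = new HashMap<>();
    private static final Map<String, Character> OPS = new HashMap<>();
    static {
        String[] ones = {"zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < ones.length; i++) SMALL.put(ones[i], i);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) SMALL.put(tens[i], 20 + 10 * i);
        OPS.put("plus", '+');
        OPS.put("minus", '-');
        OPS.put("multiply", '*');
        OPS.put("multiplied", '*');
        OPS.put("times", '*');
        OPS.put("divide", '/');
        OPS.put("divided", '/');
    }

    static double evaluate(String phrase) {
        Deque<Double> terms = new ArrayDeque<>(); // additive terms, summed at the end
        char pendingOp = '+';
        long total = 0, chunk = 0; // total: finished thousands; chunk: part below 1000
        boolean inNumber = false;
        for (String w : phrase.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (SMALL.containsKey(w)) {
                chunk += SMALL.get(w);
                inNumber = true;
            } else if (w.equals("hundred")) {
                chunk = Math.max(chunk, 1) * 100;
                inNumber = true;
            } else if (w.equals("thousand")) {
                total += Math.max(chunk, 1) * 1000;
                chunk = 0;
                inNumber = true;
            } else if (OPS.containsKey(w)) {
                if (!inNumber) throw new IllegalArgumentException("Operator without a number: " + w);
                apply(terms, pendingOp, total + chunk);
                pendingOp = OPS.get(w);
                total = 0;
                chunk = 0;
                inNumber = false;
            } else if (!w.equals("and") && !w.equals("by")) {
                throw new IllegalArgumentException("Unknown word: " + w);
            }
        }
        if (!inNumber) throw new IllegalArgumentException("Phrase must end with a number");
        apply(terms, pendingOp, total + chunk);
        double sum = 0;
        for (double t : terms) sum += t;
        return sum;
    }

    // '+' and '-' start a new term; '*' and '/' fold into the latest term.
    private static void apply(Deque<Double> terms, char op, double value) {
        switch (op) {
            case '+': terms.push(value); break;
            case '-': terms.push(-value); break;
            case '*': terms.push(terms.pop() * value); break;
            default:  terms.push(terms.pop() / value); break;
        }
    }
}
```

With these rules the phrase from the question is read as 2 + 3 * 422 - 100. Because the vocabulary is so small, a grammar-restricted offline recognizer is a plausible fit for producing the input text.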
