I googled around and found the regular speech API from Google, but I don't think it is what I need. I need continuous voice recognition and the ability to launch other actions when a specific word is spoken. Is there anything in the Android SDK that I can use?
If not: is it possible to use third-party libraries? (If yes: which ones, and what do I have to keep in mind when integrating a third-party library?)
Edit: I thought about this again. I have to recognize just one 'word' (which probably won't be in Google's speech databases). I have the chance to record it. That means I'm able to continuously match the incoming audio stream against my recording, which should work without a database. But I'm new to Android development. Do you have suggestions for APIs to use for recording and for matching against the recording? Or is there a better way to continuously wait for a specific 'word' to occur and then process any further actions?
By the way, in case that wasn't clearly described: once the reaction is done, the app should continue to record and watch for the word to occur again.
Is there anything in the android sdk that I can use?
No, sorry.
Related
I am building a speech recognition android app that will act as a virtual personal assistant with tasks such as:
Make appointments/Reminders
Weather Info
General queries to Wolfram|Alpha / Wikipedia (e.g. Who directed Ghostbusters, what's the £-$ exchange rate)
My question is whether to use Pocketsphinx or the Google API.
Originally I set this up with "android.speech.RecognitionListener", which worked great. However, I want to implement keyword spotting so the user doesn't need any interaction other than just speaking.
Apparently the Google API doesn't support this, so I looked into using pocketsphinx for keyword spotting while still using Google for the rest of the app (as I heard pocketsphinx is not as accurate?)
However, the two don't get along, as they can't both occupy the microphone at the same time.
Is there a nice way to switch between recognizers? (I can't even import both into the same project)
Should I just go with pocketsphinx and deal with the lower accuracy?
Suggestions would be helpful
Cheers
For anybody who wants to implement a similar project, I have found a workaround. It's a bit hacky and not entirely clean, but it works.
Use the Android speech recognizer with a toggle on/off switch, as in many examples across the web. When onResults comes back, check the string for the "hotword": if it is not present, discard the string; if it is, process it. Once the query has been processed and the text-to-speech is responding, programmatically re-click the toggle button to ensure constant listening.
Do the same in "onError" as well.
I also had it in onPartialResults, but that appeared to make the thread crash. I'm not entirely sure why, but once it was removed everything seemed to work nicely.
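A minimal sketch of the hotword check described above, written as plain Java so it can be tried off-device. `HotwordFilter` and `extractCommand` are hypothetical names, not part of any Android API; the assumption is that the listener hands it the hypotheses obtained from `results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)` in onResults.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: decides whether a recognition result contains the hotword. */
final class HotwordFilter {
    private final String hotword;

    HotwordFilter(String hotword) {
        this.hotword = hotword.toLowerCase(Locale.ROOT).trim();
    }

    /**
     * Returns the text that follows the hotword in the first hypothesis
     * containing it as a whole word or phrase, or empty if none contains it.
     */
    Optional<String> extractCommand(List<String> hypotheses) {
        for (String hypothesis : hypotheses) {
            // Pad with spaces so that only whole-word matches are found.
            String text = " " + hypothesis.toLowerCase(Locale.ROOT).trim() + " ";
            int idx = text.indexOf(" " + hotword + " ");
            if (idx >= 0) {
                // Skip the leading space, the hotword, and the trailing space.
                return Optional.of(text.substring(idx + hotword.length() + 2).trim());
            }
        }
        return Optional.empty();
    }
}
```

If the result is empty, the string would be discarded and listening restarted; otherwise the returned command would be processed first and listening restarted afterwards.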
You can use pocketsphinx only to recognize a predefined set of commands, because its accuracy is really poor otherwise (you should prepare your own dictionary and language model). On the other hand, pocketsphinx can be used offline, which is a big advantage for some projects.
Google, by contrast, is very accurate, but it's not free and works only online.
Scanned through the Android API (reference) documentation, but didn't find specific API that allows one to achieve the following:
Be notified of an incoming call
Automatically answer or reject the incoming call
While a call is in progress, be able to capture the audio
Play a pre-recorded message, after answering the call
The intention behind the questions, as most might have guessed, is to have an automated answering machine type of application. I have seen such applications on Nokia Symbian OS devices.
If such functionality requires rooting the device, I'd still be interested in knowing the API's available once rooted!
As an aside, is there separate API reference documentation for APIs available to rooted devices?
For the latter parts of your question, No.
Imagine for a second there was, and you had an app installed that uses it. It could record your conversations and send them to a 3rd party. The app might not even disclose that it does this.
That sounds like it would be a huge security problem... Don't you agree?
It would appear I am mistaken about the call recording part: several apps available on Google Play (such as this, this, and this) do call recording, at least of the user making the call.
For #1, this is covered by marcin_j's answer
For #2, these SO answers show you can accept or reject a call programmatically.
For #3, I did a bit more detailed search on this, which reveals a related Stackoverflow question and answer, which provides info on recording audio (as per the above linked apps). Please keep in mind there are likely legal requirements around recording calls.
For #4 (playing a message to the caller), the only info I was able to find on this says it is not supported. It's hard to find much more info on this with so much clutter on search coming up with apps that are basically an audio version of caller id.
Most of these answers are on StackOverflow already; hopefully bringing it all together here helps you.
1. You can use the android.intent.action.PHONE_STATE broadcast and check for the TelephonyManager.CALL_STATE_RINGING state. This requires android.permission.READ_PHONE_STATE.
2/3. I don't think you can do this, at least not on non-rooted phones. Maybe someone else will give a better answer.
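For the first point, the state handling could look roughly like the sketch below. It is deliberately Android-free so it runs on a plain JVM: `CallStateTracker` and its `Event` values are hypothetical names, and the assumption is that a BroadcastReceiver registered for android.intent.action.PHONE_STATE passes in the value of `intent.getStringExtra(TelephonyManager.EXTRA_STATE)`.

```java
/** Hypothetical, Android-free sketch of interpreting PHONE_STATE transitions. */
final class CallStateTracker {
    enum Event { NONE, INCOMING_RINGING, INCOMING_ANSWERED, INCOMING_NOT_ANSWERED, CALL_ENDED }

    // To my knowledge these literals match TelephonyManager.EXTRA_STATE_IDLE,
    // EXTRA_STATE_RINGING and EXTRA_STATE_OFFHOOK; on a device, use the constants.
    private static final String IDLE = "IDLE";
    private static final String RINGING = "RINGING";
    private static final String OFFHOOK = "OFFHOOK";

    private String previous = IDLE;

    /** Feed each new state string; returns what the transition most likely means. */
    Event onStateChanged(String state) {
        Event event = Event.NONE;
        if (RINGING.equals(state) && !RINGING.equals(previous)) {
            event = Event.INCOMING_RINGING;        // a new incoming call
        } else if (OFFHOOK.equals(state) && RINGING.equals(previous)) {
            event = Event.INCOMING_ANSWERED;       // the ringing call was picked up
        } else if (IDLE.equals(state) && RINGING.equals(previous)) {
            event = Event.INCOMING_NOT_ANSWERED;   // missed or rejected
        } else if (IDLE.equals(state) && OFFHOOK.equals(previous)) {
            event = Event.CALL_ENDED;              // an active call was hung up
        }
        previous = state;
        return event;
    }
}
```

This only covers being notified of the call; it does not answer, reject, or record anything.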
We are working on an application similar to "Funny Call" on Google Play.
When a user makes a call to another contact, we would intercept the call and add some effects to it, and this modified sound would then reach the recipient.
I've searched for the solution to this problem and found out that many developers say Android does not support this.
Android API for call sound stream manipulation
Can the Android API be leveraged to modify the caller's voice during the call?
But I would still like to know, straight from the horse's mouth, whether it's really not possible. I would also like to know if there is any specific reason behind this.
Is there really no way to achieve this?
Can you please also tell me whether this is likely to become possible in the near future?
Is there really no way to achieve this?
If you read through the comments on the app you cited, it would appear that they are doing VOIP, and that their servers are then actually placing the call, as that is why there are calling rates to different countries. I see no evidence that they are using the on-device telephony capability. You, of course, are welcome to supply such evidence, if you have any.
I have tried to code this with Android's included android.speech.SpeechRecognizer class with no success.
Basically, what I am trying to do is make my app constantly listen for one keyword and fire an intent whenever that keyword is recognized. I know that this will use a lot of battery.
For example - you are talking with a person. Normal conversation. The phone is actively listening and recognizing every single said word and listening for the keyword.
Let's say the keyword is "cheese" in this instance.
Whenever you say "cheese," the application fires an intent that starts up another part of the app.
I have tried to use speech recognition as a service, but things didn't really go as planned. Maybe I made a mistake, I don't know.
I've been trying to accomplish this for 2 days in a row now, for more than 24 hours of work time combined. If I am being too broad or infringing any of SO's rules, I sincerely apologize and ask for my question to be deleted.
My question is - how would this be possible? Of course the SpeechRecognition that is included with android itself would be preferable, but it definitely will be a hassle because it is not even designed to work for extended periods.
From my research, there is no way to do this using the standard Google voice recognition server. The way it works is that once a sound or word is recognized, the recognizer returns a list of what it thinks it heard, with an associated confidence score.
To do what you are asking, you would either:
have to keep re-activating the recognition service every time it fires a recognition event, until it matches the word you want, or
have your app keep the recognition service awake. You could do this by creating a service that periodically wakes up your handset and resumes the service/activity.
I would not recommend either of these options, considering that battery life is really reduced by the voice recognition service being constantly on.
Unfortunately, I do not think there are any native Android APIs that will fully suit your needs. I would recommend checking out pocketsphinx.
It is a pretty robust speaker-independent speech recognition API from CMU that is more intended for tasks such as this. You can also check out a tutorial for getting started here.
Google has not made API support for "OK GOOGLE" public and has left it to vendors to change it or pass the support on to consumers.
I think the best bet at this time would be to build the source code yourself and then call the APIs. As an example, the Google library linked below has the low-level details of implementing a recognizer. I'm not sure why Google has not made it public.
I don't see an easy way to implement and test it.
http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/4.3_r2.1/android/speech/srec/Recognizer.java