Ignore background music when using Google voice recognition - Android

I'm trying to make an alarm clock Android app that could be stopped with voice recognition.
For that, I'm using the Google Speech Recognition API (+ this code to do voice recognition continuously).
It works fine until I play music at the same time; in that case, the voice recognition becomes much less accurate.
This problem makes sense: the music adds noise that makes recognition harder. But since the music being played is known, I was wondering whether it's possible to tell Google to ignore this additional noise. I know signal-processing filters exist for exactly this (such as the Kalman filter or the Wiener filter).
So my question is:
Is it possible to apply a filter with Google voice recognition to ignore a known noise? Or is there another voice recognition library that allows that?
Edit: It's not a duplicate, since the problem is not the same. But it's an interesting suggestion nonetheless.

Google Voice Recognition will already be optimised to detect speech, regardless of any background ambient noise 'type'.
Rather than using Google's native Voice Recognition, supplied via their 'Now/Assistant' application, you can use their Cloud Speech API which offers some enhancements.
The recognizer is designed to ignore background voices and noise
without additional noise-canceling. However, for optimal results,
position the microphone as close to the user as possible, particularly
when background noise is present.
The above is no doubt true generally across their Voice Recognition System.
Use word and phrase hints to add names and terms to the vocabulary and
to boost the accuracy for specific words and phrases.
For short queries or commands, use StreamingRecognize with
single_utterance set to true. This optimizes the recognition for short
utterances and also minimizes latency.
https://cloud.google.com/speech/docs/best-practices
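As a sketch of those two tips together, a streaming request to the Cloud Speech API could carry a config like the following (field names per the v1 REST API; the phrase values are placeholders for an alarm-clock use case):

```json
{
  "streamingConfig": {
    "config": {
      "encoding": "LINEAR16",
      "sampleRateHertz": 16000,
      "languageCode": "en-US",
      "speechContexts": [
        { "phrases": ["stop alarm", "snooze"] }
      ]
    },
    "singleUtterance": true,
    "interimResults": false
  }
}
```

The speechContexts phrases bias the recognizer toward your expected commands, and singleUtterance makes it end recognition as soon as the short command is finished.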

Related

Voice Recognition with an audio file?

I am currently working on an app that would require recording the audio within my app and then sending the clip to google for transcription.
Is there any way I can send an audio clip to be processed with speech to text?
Or is there any other way to convert that recording to text?
Google's Voice To Text API is not available publicly at the moment, and there's no announcement on when it might become available. On Android you can use the system voice recognition feature, but it will only transcribe what it records by itself; you won't be able to feed it an audio file for processing.
As for now, you either need to use other services like AT&T's, IBM's Watson, or Dragon Dictation (all are on-line), or maybe consider including CMU Sphinx in your app if you absolutely require an off-line solution.

How does Google Keep do Speech Recognition while saving the audio recording at the same time?

Android's SpeechRecognizer apparently doesn't let you record the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or AudioRecord for that matter) or you do Speech Recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.
The question of how to record audio and do speech recognition at the same time in Android has been asked several times, and the most popular "solution" is to record a FLAC file and use Google's unofficial Speech API, which allows you to send a FLAC file via a POST request and obtain a JSON response with the transcription.
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version)
https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition
http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
That works pretty well, but it has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a possible solution; even setting aside the difficulty of splitting the file at the right positions (not in the middle of a word), many consecutive requests to the above-mentioned web service API will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits, which clearly restrict bursts of requests).
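For reference, the POST described in the linked write-ups looked roughly like the sketch below. The endpoint and parameters are taken from those articles, are unofficial, and have long since been retired, so this is illustrative only:

```
curl -X POST \
  -H "Content-Type: audio/x-flac; rate=16000" \
  --data-binary @recording.flac \
  "https://www.google.com/speech-api/v1/recognize?lang=en-US&client=chromium"
```

The response was a small JSON object containing the transcription hypotheses and their confidence scores.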
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcribes what you said into text, and saves both the text and the audio recording (well, it's not clear where it stores it, but you can replay it).
And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT?
I would look at the source code but it doesn't seem to be available, is it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the speech api mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So does perhaps a way exist to kind of "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
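A minimal sketch of calling the recognizer the way Keep (probably) does, based on the linked answer. Note this is an Android-framework sketch, the GET_AUDIO extras are undocumented, and they are only honored when Google's own app services the intent:

```java
// Undocumented extras; these only work when Google Voice Search
// handles the intent, and may break in any update.
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra("android.speech.extra.GET_AUDIO_FORMAT", "audio/AMR");
intent.putExtra("android.speech.extra.GET_AUDIO", true);
startActivityForResult(intent, REQUEST_SPEECH);

// In onActivityResult(), the returned intent's data URI points at the audio:
// Uri audioUri = data.getData();
// InputStream audio = getContentResolver().openInputStream(audioUri);
```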
It uses onPartialResults(Bundle)
This event returns text recognized from recorded speech while it's still recording
It's also available on Xamarin
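With SpeechRecognizer, those partial hypotheses arrive through a RecognitionListener callback while audio is still being captured — a sketch (other required callbacks of the interface are omitted here):

```java
recognizer.setRecognitionListener(new RecognitionListener() {
    @Override
    public void onPartialResults(Bundle partialResults) {
        // Current best hypotheses, updated repeatedly as speech continues.
        ArrayList<String> texts = partialResults
                .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    }
    // ... remaining RecognitionListener methods omitted for brevity
});
```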

Voice speech recognition android remove google screen and microphone button

I'm using voice speech recognition from Android API.
I followed successfully this tutorial:
http://code4reference.com/2012/07/tutorial-android-voice-recognition/#comment-335
But I've an unsolved question.
How can I remove the screen that appears when calling RecognizerIntent?
I'm talking about removing the Google logo and microphone button that indicate I'm using speech recognition.
I need to remove this small screen because I need to do other things on screen while it is recognizing my voice.
You need to write a service class that calls createSpeechRecognizer. You can get an idea of how to do it in my answer at Android Speech Recognition as a service on Android 4.1 & 4.2
You'd need to work at a lower level than this example. What this example does is launch an app that does voice recognition for you and sends you the results. That app is drawing the UI, and you can't stop it. What you'd need to do is write an app against the service that actually does the voice recognition (basically, exactly what that intent is doing). You can probably find an example of this in the Android keyboard code, as they provide a custom UI against Google voice.
You cannot do this. The screen is not displayed by your app, but is displayed by the voice recognition API instead, and you cannot control it.
In any case, that screen is a standard for voice recognition on the device, and users are familiar with it. It would be something of an anti-pattern to remove it and conduct voice recognition. With the screen there, users know that voice recognition is active and that the microphone is picking up sound, because it provides that feedback.
Use SpeechRecognizer. The Intent mechanism is similar.
For convenience, use the recognizeDirectly method in this helper class
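A sketch of the SpeechRecognizer route, which shows no UI at all. This is an Android-framework sketch; it requires the RECORD_AUDIO permission and must run on the main thread:

```java
// Headless recognition: no dialog is displayed by this path.
SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
recognizer.setRecognitionListener(listener); // your RecognitionListener

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
        context.getPackageName());

recognizer.startListening(intent);
```

Results and errors arrive through the listener's callbacks (onResults, onError, onPartialResults), so your own activity remains in full control of the screen.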

What format is speech sent to cloud with Android speech recognition?

I'm building an app that includes speech recognition - I intend to use the Android speech recognition service or the voice typing functionality.
From what I have read, the speech is mostly processed in the cloud. The question I have is whether anyone knows what format the audio is sent to the cloud in? For example, is something like WAV or MP3 or PCM, or is it likely to be something else entirely?
I admit this is mostly out of plain curiosity to know a bit more of what is going on behind the scenes. (But partly it also relates to an interest in the impacts of pre and post processing on recognition.)
Well, I've been looking for that info too, and the closest thing I could find was Google's speech recognition API for Chrome, which used the FLAC audio codec. I'm not sure if Android uses it too, but it's the closest thing I've found.

android voice recognition

I am aware of Android's voice recognition capabilities, but I have seen some applications, like vligo for example, that are able to detect a given sentence. How is this possible? As far as I know, you can't set up Android's API to do that.
