I use the Google speech recognizer for Android, which I do not want to replace.
It produces text, which I want to interpret using my own grammar.
I've checked some tools (like Sphinx), but all of them seem to require using their own recognizer in order to decode text, and I don't want to use their recognizer.
Do you know of a tool that can process a given text using a grammar? Or perhaps, how can I use Sphinx without its recognizer?
Thanks
Regular grammars are equivalent to deterministic finite automata, and those are equivalent to regular expressions. So instead of an external framework you can use Java's regular expressions to parse the text:
http://developer.android.com/reference/java/util/regex/Pattern.html
If you want named regexps you can consider
https://code.google.com/p/named-regexp/
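For example, a toy command grammar over the recognized text could be handled like this (the command set and phrasing are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandParser {
    // Hypothetical grammar: "<action> the <object>", e.g. "open the door".
    private static final Pattern COMMAND = Pattern.compile(
            "(open|close|turn on|turn off)\\s+the\\s+(\\w+)",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        Matcher m = COMMAND.matcher("Turn on the lights");
        if (m.matches()) {
            System.out.println("action = " + m.group(1)); // "Turn on"
            System.out.println("object = " + m.group(2)); // "lights"
        }
    }
}
```

On Java versions (or Android API levels) where Pattern lacks named-group support, the named-regexp library linked above lets you look groups up by name instead of positional index.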
If you want more complex parsing with semantics and intent extraction that goes beyond automata capabilities, you can find corresponding packages in OpenNLP.
If you want to recognize grammars directly, you can try CMUSphinx; it's significantly more accurate for constrained recognition.
I know that Android already has its own voice recognition API, but it uses the internet. I would like to have a compiled library with a few voice commands that I could use offline. Is that possible?
Yes, it is possible. You would have to train a model to convert speech to commands.
A simple approach could be to break the speech at intervals where the amplitude is at a minimum; this will give you the words in the speech. Match these words against your model with some probability; if the match probability is higher than some threshold value, your model can then execute the commands related to that word. You will have to ship this model within your application. I would recommend updating the model as you train it further.
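A minimal sketch of the segmentation step might look like this (the frame size and silence threshold are made-up values; real endpointing needs tuning and noise handling):

```java
public class SilenceSegmenter {
    static final int FRAME_SIZE = 320;           // 20 ms at 16 kHz, illustrative
    static final double SILENCE_THRESHOLD = 500; // mean |amplitude|, illustrative

    // Marks each frame of 16-bit PCM audio as silent or voiced;
    // runs of voiced frames between silences are the word candidates.
    public static boolean[] silenceMap(short[] samples) {
        int frames = samples.length / FRAME_SIZE;
        boolean[] silent = new boolean[frames];
        for (int f = 0; f < frames; f++) {
            long sum = 0;
            for (int i = 0; i < FRAME_SIZE; i++) {
                sum += Math.abs(samples[f * FRAME_SIZE + i]);
            }
            silent[f] = (sum / (double) FRAME_SIZE) < SILENCE_THRESHOLD;
        }
        return silent;
    }
}
```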
I want to develop a complex-script IME, but I am not quite sure about the respective responsibilities of the IME and the underlying renderer. I think KitKat is using HarfBuzz-ng. For complex scripts, the mapping isn't linear like in English; the characters need to be rearranged/displayed differently as you type.
Assumptions: the language is displayed properly on the device (e.g. you can read news etc.). Android version: KitKat.
So my questions are:
Is the reordering the job of the IME or the underlying engine?
Is the IME only responsible for feeding Unicode code points to the system, with the renderer then doing the rearrangement?
Please point me to some reading on this topic.
Yes to both of your questions. The IME should just feed Unicode characters in their logical order to the underlying system; the final rendering (a.k.a. shaping) will be done by the text layout engine.
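In practice that means an IME built on InputMethodService just commits code points in logical order and never worries about shaping (a minimal sketch; how key taps map to code points is up to your keyboard view):

```java
import android.inputmethodservice.InputMethodService;
import android.view.inputmethod.InputConnection;

public class ComplexScriptIme extends InputMethodService {
    // Called from your keyboard view with the code point for the tapped key.
    // Commit in logical order; the text layout engine (HarfBuzz on KitKat)
    // handles reordering and shaping when the text is drawn.
    void onKeyTapped(int codePoint) {
        InputConnection ic = getCurrentInputConnection();
        if (ic != null) {
            ic.commitText(new String(Character.toChars(codePoint)), 1);
        }
    }
}
```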
Doing some research, I have found several different speech-to-text APIs for Android:
Pocket Sphinx
Android Native API
I have the following requirements:
Must be able to support offline speech recognition (I'm not sure if the Android API can do this).
Must be able to detect and respond immediately to every word said. I would rather this than detecting an entire sentence; I could split the returned sentence into an array, though, and get each word.
The detection needs to be processed in the background (no popups or anything, as the Android API seems to do).
Can someone recommend an API that meets my requirements?
Pocketsphinx meets all your requirements. What you call the "Android Native API" is basically a set of interface definitions, and it does not contain the notion of offline/online.
You can also implement these interfaces using Pocketsphinx, since it supports things like partial results, confidence scores, n-best results, etc. This way the implementation becomes available to any Android app. Maybe somebody has done it already, but I'm not aware of it.
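As a sketch of what that could look like (modeled on the pocketsphinx-android demo; the model, dictionary, and grammar file names are placeholders, and the files are assumed to be unpacked from the app's assets first):

```java
import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

public class OfflineRecognizer implements RecognitionListener {
    private SpeechRecognizer recognizer;

    // modelDir is assumed to hold the acoustic model, dictionary,
    // and a JSGF grammar unpacked from the app's assets.
    public void start(File modelDir) throws IOException {
        recognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(modelDir, "en-us-ptm"))
                .setDictionary(new File(modelDir, "cmudict-en-us.dict"))
                .getRecognizer();
        recognizer.addListener(this);
        recognizer.addGrammarSearch("commands", new File(modelDir, "commands.gram"));
        recognizer.startListening("commands"); // runs fully offline
    }

    @Override
    public void onPartialResult(Hypothesis hyp) {
        // Fires while the user is still speaking, so words can be
        // handled one by one instead of waiting for a full sentence.
        if (hyp != null) android.util.Log.d("ASR", "partial: " + hyp.getHypstr());
    }

    @Override
    public void onResult(Hypothesis hyp) {
        if (hyp != null) android.util.Log.d("ASR", "final: " + hyp.getHypstr());
    }

    @Override public void onBeginningOfSpeech() { }
    @Override public void onEndOfSpeech() { }
    @Override public void onError(Exception e) { }
    @Override public void onTimeout() { }
}
```

Since no UI is involved, this also satisfies the background requirement.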
I need to use speech input to insert text. How can I detect a keyword while I'm speaking?
Can I do this with Android Speech Input, or do I need an external library?
Any ideas?
Thanks
Keyword detection is a different task from speech recognition. While the latter tries to understand the text being spoken and checks all possible word combinations, keyword spotting usually checks just two hypotheses: the keyword is here, or garbage is here. It's far more efficient to check for keyword presence, but it requires a custom algorithm. You can implement one with an open-source speech recognition toolkit like CMUSphinx.
http://cmusphinx.sourceforge.net
It runs on Android too; you can check
Voice command keyword listener in Android
to see how to integrate it.
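With CMUSphinx on Android, a keyphrase search implements exactly this keyword-vs-garbage decision (the keyphrase and threshold below are illustrative, and modelDir is assumed to hold models unpacked from assets):

```java
import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

public class KeywordSpotter {
    // Builds a recognizer that only decides "keyphrase vs. garbage".
    static SpeechRecognizer buildSpotter(File modelDir) throws IOException {
        SpeechRecognizer spotter = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(modelDir, "en-us-ptm"))
                .setDictionary(new File(modelDir, "cmudict-en-us.dict"))
                // Tune to balance false alarms against misses.
                .setKeywordThreshold(1e-40f)
                .getRecognizer();
        spotter.addKeyphraseSearch("wakeup", "ok computer");
        return spotter;
    }
}
```

After calling spotter.startListening("wakeup"), a detection arrives via the RecognitionListener's onPartialResult callback.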
Absolutely.
See this for some code that detects the "magic word":
Just launch an Intent with ACTION_RECOGNIZE_SPEECH and then check the results for your keyword. Checking for the keyword can be complicated, but this code should get you started.
https://github.com/gmilette/Say-the-Magic-Word-
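For reference, the basic intent flow is short (the request code and keyword here are placeholders):

```java
import java.util.ArrayList;

import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;

public class MagicWordActivity extends Activity {
    private static final int REQ_SPEECH = 1;      // arbitrary request code
    private static final String KEYWORD = "magic"; // placeholder keyword

    void listen() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        startActivityForResult(intent, REQ_SPEECH);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQ_SPEECH && resultCode == RESULT_OK && data != null) {
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            for (String phrase : results) {
                if (phrase.toLowerCase().contains(KEYWORD)) {
                    // Keyword heard; react here.
                }
            }
        }
    }
}
```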
I used the Snowboy library for this task
Website: https://snowboy.kitt.ai
Github: https://github.com/kitt-ai/snowboy
It is a C library, but it can be included in Android code using JNI. The only downside is that you have to train it with audio samples if you want to use a keyword other than the ones that come with the library.
I would like to integrate speech recognition into my Android application.
I am aware that Google provides two language models (free-form for dictation and web search for short phrases).
However, my app will have a finite number of possible words (maybe a few thousand). Is it possible to specify the vocabulary, limiting it to these words, in the hope of achieving more accurate results?
My immediate thoughts would be to use the web search language model and then check the results of this against my vocabulary.
Any thoughts appreciated.
I think your intuition is correct and you've answered your own question.
The built-in speech recognition provided by Google only supports the dictation and search language models. See http://developer.android.com/reference/android/speech/RecognizerIntent.html
You can get back results using these recognizer models and then classify or filter them to find what best matches your limited vocabulary. There are different techniques to do this, ranging from simple parsing to complex statistical models.
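At the simple end, you can snap each returned word to the closest vocabulary entry by edit distance (a sketch; the cutoff of two edits is arbitrary):

```java
import java.util.List;

public class VocabularyMatcher {
    // Returns the vocabulary entry closest to `word`, or null if
    // nothing is within the (arbitrary) cutoff of 2 edits.
    public static String closest(String word, List<String> vocabulary) {
        String best = null;
        int bestDist = 3; // cutoff + 1
        for (String candidate : vocabulary) {
            int d = levenshtein(word.toLowerCase(), candidate.toLowerCase());
            if (d < bestDist) {
                bestDist = d;
                best = candidate;
            }
        }
        return best;
    }

    // Standard two-row Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```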
The only other alternative I've seen is to run speech recognition on a server that can accept your dedicated language model, though this is costly and complex; it's the approach used by commercial speech companies like Vlingo, Dragon, or Microsoft's Bing.
You can use open-source models like VoxForge or cheap ones like LumenVox.
Some have been ported to Android; I forget by whom.
I answered pretty much the same question before - please check here: Building openears compatible language model
and here:
Typically you need very large text corpora to generate useful language models.
If you just have a small amount of training data, your language model will be overfitted, which means it will not generalize.