I am currently working with a Pepper robot (academic version, with the QiSDK and NAOqi 2.9). Because I am using the academic version, SoftBank's cloud-based automatic speech recognition service is not included, so I can't use wildcards or chatbot engines other than QiChat.
Does anybody know how I can implement my own speech recognition service for Pepper? I can't find a way to access the audio stream from Pepper's microphones.
I've read the documentation from Softbank:
https://developer.softbankrobotics.com/pepper-qisdk
and
https://qisdk.softbankrobotics.com/sdk/doc/pepper-sdk/ch4_api/conversation/reference/basechatbot.html
I've also tried creating an Android-based SpeechRecognizer, which works but uses the tablet's microphone rather than Pepper's.
Remote Speech Recognition is a service that you need to buy separately if it was not included in your original Pepper offer!
Regards,
Jonas
I was also curious, so I contacted SoftBank support.
Summary:
With version 2.9 you have no access to the head microphones and can only access the tablet microphone.
Is it possible to integrate an external TTS engine with the Pepper robot?
I want to integrate a third-party speech engine with the Pepper robot. Please advise on how to do this.
You can integrate an external TTS engine with Pepper, either offboard (like the services offered by IBM, MS Azure or Google) or onboard (ideally something in Java or Kotlin for Android Pepper, but anything is possible). If you have a specific technology in mind, please provide more details and we can give you a more precise answer.
Bear in mind that this may introduce latency in speech synthesis compared with the default text-to-speech engine.
Edit - sorry, I missed your Android tag. The APIs mentioned below only work on Pepper 2.5 (Choregraphe Pepper).
Alternatively, there are a number of different voices available on Pepper; perhaps one will suit your needs. Use the NAOqi API function ALTextToSpeech.getAvailableVoices to list the voice options, then ALTextToSpeech.setVoice to switch to one of them.
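To illustrate the two calls mentioned above, here is a minimal Python sketch. It assumes `tts` is an ALTextToSpeech proxy on a NAOqi 2.5 robot (or any object exposing `getAvailableVoices()` and `setVoice()`). The helper name `choose_voice` and the preference list are made up for this example.

```python
def choose_voice(tts, preferred):
    """Switch to the first preferred voice that the robot reports as available.

    `tts` is assumed to be an ALTextToSpeech proxy; `preferred` is a
    hypothetical list of voice names in order of preference.
    Returns the voice that was set, or None if none of them is installed.
    """
    available = list(tts.getAvailableVoices())
    for name in preferred:
        if name in available:
            tts.setVoice(name)
            return name
    return None
```

With the NAOqi Python SDK the proxy would typically be created with `ALProxy("ALTextToSpeech", robot_ip, 9559)`. The voice names you pass in should come from whatever `getAvailableVoices` actually returns on your robot.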
I am new to speech recognition and Android, and I have a use case where I need to build an Android app that takes commands (a limited set, fewer than 100) from users and executes some logic. I have googled a bit and found the following options:
Use the Google Cloud Speech API
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Can someone please suggest which of the above would best suit my use case?
I have a limited set of commands and speed matters the most to me.
I am really confused, hence this question. Thanks in advance.
Use the Google Cloud Speech API
Very expensive, since you have to pay for every request.
Use Android's inbuilt speech-to-text capability (Is it different from the Google Cloud Speech API? If so, how?). Also, what are the pros and cons of using the offline mode of Android speech-to-text?
The inbuilt API is OK to use. It is different from the cloud API and it is free. It does not work offline transparently for the user, though. On the downside, it is slow and you cannot configure the vocabulary, so it decodes all words rather than a particular set of commands, and in noise it will often confuse the required commands with other words.
Use open source speech recognition libraries like Kaldi or CMU Sphinx (it looked like they need a lot of effort in collecting and training the data)
Proper development is always an effort.
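On the point above that the inbuilt recognizer cannot be restricted to a vocabulary: one possible workaround is to post-process the transcript and snap it to the closest entry in your command list. The sketch below uses Python's standard `difflib` only to show the idea. The command list, the function name `match_command` and the 0.6 cutoff are all assumptions for illustration. It does not make recognition itself faster or more accurate.

```python
import difflib

# Hypothetical command set; a real app would have its own (fewer than 100) entries.
COMMANDS = ["lights on", "lights off", "open door", "close door"]

def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the command most similar to the transcript, or None if nothing is close enough."""
    # Normalise case and whitespace before comparing.
    text = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(text, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A higher cutoff rejects more out-of-vocabulary speech at the cost of missing slightly misrecognized commands, so the value would need tuning against real recognizer output.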
I am trying to build a game with Unity that uses Arabic speech recognition on Android devices. I am stuck on choosing the speech recognition tool. Which one is more suitable for Unity? Can I use the Google API directly? If so, can I control the data?
Or is Sphinx more suitable for a game on an Android device? I have read about Sphinx, but an acoustic model for Arabic was not available for it.
I tried building it in MATLAB, but porting it to mobile with MATLAB Coder would cost me a lot, and it also lacks some of the required libraries.
Any help with this?
The official Unity plugin for pocketsphinx is here:
https://github.com/cmusphinx/pocketsphinx-unity-demo
Yes, an Arabic model has to be trained. In the end you could have a good system, but you have to invest enough effort into it.
Google has recently made great progress with their speech recognition software, which is used in several open source products, e.g. Chromium Web Speech and Android Handsfree texting. I would like to use their speech recognition as part of my server stack, however I can't find much about it.
Is the text recognition software available as a library or package? Or alternatively, can I call chromium from another program to transcribe some audio file to text?
The Web Speech APIs are designed to be used only in the context of either Chrome or Android. A lot of work goes on in the client, so there is no public server-to-server API that would just take an audio file and process it.
If you search GitHub you will find tools such as https://gist.github.com/alotaiba/1730160 but I am pretty certain that this method of access is 100% not supported, endorsed or confirmed to keep working.
The method previously mentioned at https://gist.github.com/alotaiba/1730160 does work for me. I use it on a daily basis in my home automation programs. I use a Python script to capture audio and decide whether it is useful audio or just noise; it then sends the short audio snippet to Google and gets the text back, all in under a second! I have successfully integrated it into my programs, and if you google around you will find even more people who have as well!
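The post does not say how the script tells useful audio from noise. One simple way to do that kind of gating is an energy threshold on short PCM frames, sketched below. This is an assumption for illustration, not the poster's actual method: it expects 16-bit signed mono PCM in native byte order, and the function names and the threshold of 500 are hypothetical and untuned.

```python
import array
import math

def rms(frame_bytes):
    """Root-mean-square amplitude of one frame of 16-bit signed mono PCM."""
    samples = array.array("h", frame_bytes)  # "h" = signed 16-bit, native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def speech_frames(frames, threshold=500.0):
    """Keep only the frames whose RMS energy exceeds the (assumed) threshold."""
    return [f for f in frames if rms(f) > threshold]
```

Only the frames that pass the gate would then be joined and sent on for recognition. A real setup would probably calibrate the threshold against the room's background noise.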
I'm writing an Android app and I want to use the Android OS voice recognition. How do I incorporate it into the application?
Have you seen the API Demo, "VoiceRecognition" - http://developer.android.com/resources/samples/ApiDemos/src/com/example/android/apis/app/VoiceRecognition.html
That should give you a head start.