I am working on an Android project for controlling an Arduino robot using speech recognition. I want an offline speech recognition unit that recognises only a few words, so I thought of implementing audio fingerprinting for this purpose. Is there any way I can use it to recognise a few simple words?
What you need to implement is more related to audio recognition/classification. You will not get what you want using audio fingerprinting.
Let's say you have 5 words. You need to record these words (as many times as possible, and pronounced by different people if possible). Then you need to extract audio features (such as MFCCs) from these recordings and train a classifier (such as an SVM) with 5 classes (one for each word).
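A minimal sketch of that pipeline, written in Python for brevity (on Android you would port it to Java or use a DSP/ML library). It makes two loud substitutions: crude log band energies from a naive DFT stand in for MFCCs, and a nearest-centroid classifier stands in for an SVM. The function names, frame length, and band count are illustrative assumptions, and the "recordings" are synthetic tones, not speech.

```python
import cmath
import math

def band_energies(frame, n_bands=8):
    """Log energy in n_bands equal-width bands of one frame's spectrum (naive DFT).
    Assumption: a crude stand-in for MFCCs, not an MFCC implementation."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    width = half // n_bands
    # Small floor avoids log(0) for bands with no energy.
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-9) for b in range(n_bands)]

def features(signal, frame_len=128, n_bands=8):
    """Average per-frame band energies into one fixed-length vector per recording."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    per_frame = [band_energies(f, n_bands) for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

def train(examples):
    """examples: {word: [signal, ...]} -> {word: centroid feature vector}.
    Assumption: nearest-centroid used here instead of the SVM suggested above."""
    model = {}
    for word, signals in examples.items():
        vecs = [features(s) for s in signals]
        model[word] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return model

def classify(model, signal):
    """Return the word whose centroid is nearest (Euclidean) to the signal's features."""
    v = features(signal)
    return min(model, key=lambda word: math.dist(v, model[word]))

def tone(freq, sr=8000, n=1024, amp=1.0):
    """Synthetic stand-in for a recording; real training data would be speech."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]

# Toy data: each "word" is faked as a pure tone in a different frequency band.
examples = {
    "forward": [tone(250), tone(312.5, amp=0.5)],
    "left": [tone(1187.5), tone(1250, amp=0.7)],
    "stop": [tone(2500), tone(2562.5, amp=0.8)],
}
model = train(examples)
print(classify(model, tone(375, amp=0.9)))  # → forward
```

Averaging over frames discards the order of sounds within a word, so this toy only separates "words" with clearly different spectra; richer features such as MFCCs and a stronger classifier such as an SVM are what make the approach usable on real speech.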
This question is meant to help the hard-of-hearing community, so that they can READ a phone call because they cannot hear it.
Android 10 (API level 29) introduced the AudioPlaybackCaptureConfiguration API, which gives apps the ability to copy the audio being played by other apps.
Google has also implemented this on Pixel phones, as shown here: https://www.youtube.com/watch?v=7hb3p8LZIq8 . However, it has a few limitations:
It supports only English. How can support for regional languages be enabled?
The current implementation converts speech to text with an on-device engine, i.e. the audio is not sent to Google's servers (all processing happens offline on the phone itself), so accuracy is also lower.
Judging from many posts here, developers run into problems when trying to capture the caller's voice and transcribe it, because of restrictions imposed by Google.
How to record internal audio on Android devices or record MediaPlayer Audio Stream?
Is there any way to capture the caller's voice (https://developer.android.com/guide/topics/media/playback-capture#allowing_playback_capture)? As in the YouTube video I shared above, Google must be capturing the caller's voice and processing it with its offline engine to convert it to text. So can we capture the caller's voice somehow and send it to a server API, or to Google's Live Transcribe app (or whatever it is), for better accuracy, and then display the converted text on screen in the user's chosen language?
I am a developer too, though not a mobile one, so some of my terminology may be wrong; please excuse that and share your suggestions.
Could we modify the Android source code itself to remove that limitation and achieve what we want, even if it requires building a custom Android OS?
I'm writing an Android app that lets the user record their voice through the microphone, save it to storage, and link it to specific content (like a contact). Later, the user records their voice again, and the app should compare the new recording with the saved audio files and find the one that matches the voice.
I searched a lot and found some libraries that do this online, such as Echoprint, which generates a fingerprint from recorded audio, sends it to an open-source server, and returns the result. But I need to do this offline.
Does anybody know of such a library?
If you are aiming to compare an old recording of a user with a new call as it comes in, audio fingerprinting solutions like Dejavu in Python on a server or Echoprint in C++ won't help you. They are for doing recognition and retrieval on recorded audio segments plus noise.
They cannot deal with the variability of the human voice. See an explanation here.
If that's the case, what you are referring to is speaker recognition, which is much harder and involves quite a bit of machine learning. It would be tough to do this for a large corpus of users (especially offline on a phone), but for distinguishing between a couple of users, it might be doable.
Below is a good library that is easy to use, but you need to convert your audio files to WAV format first.
https://code.google.com/p/musicg/
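If your audio comes from AudioRecord as raw 16-bit PCM, converting it to WAV is mostly a matter of prepending the standard 44-byte RIFF/WAVE header. A minimal sketch in Python (the byte layout is identical if you write it from Java with a little-endian ByteBuffer); the function name and the mono 16-bit defaults are assumptions, and this does not handle compressed formats such as MP3 or AMR, which must be decoded to PCM first.

```python
import io
import struct
import wave

def pcm16_to_wav_bytes(pcm, sample_rate=16000, channels=1):
    """Prepend a canonical 44-byte RIFF/WAVE header to raw little-endian 16-bit PCM."""
    bits = 16
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",                   # RIFF chunk
        b"fmt ", 16, 1, channels, sample_rate, byte_rate,  # fmt chunk, format 1 = PCM
        block_align, bits,
        b"data", len(pcm),                                  # data chunk header
    )
    return header + pcm

# Example: four 16-bit samples at 8 kHz, read back with Python's wave module as a check.
samples = [0, 1000, -1000, 32767]
pcm = struct.pack("<%dh" % len(samples), *samples)
wav = pcm16_to_wav_bytes(pcm, sample_rate=8000)
with wave.open(io.BytesIO(wav), "rb") as w:
    print(w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())  # → 1 2 8000 4
```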
Android's SpeechRecognizer apparently doesn't let you record the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or AudioRecord for that matter) or you do Speech Recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.
The question of how to achieve recording audio and doing speech recognition at the same time in Android has been asked several times, and the most popular "solution" is to record a flac file and use Google's unofficial Speech API which allows you to send a flac file via a POST request and obtain a json response with the transcription.
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version)
https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition
http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
That works pretty well but has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a viable solution. Even forgetting the difficulty of splitting the file at the right positions (not in the middle of a word), many consecutive requests to the above-mentioned web service API will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits, which clearly restrict bursts of requests).
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcribes what you said into text, and saves both the text and the audio recording (it's not clear where it stores it, but you can replay it).
And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT?
I would look at the source code but it doesn't seem to be available, is it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the speech api mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So does perhaps a way exist to kind of "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
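It uses onPartialResults(Bundle), a RecognitionListener callback that delivers the text recognized so far while speech is still being recorded (it has to be requested by setting RecognizerIntent.EXTRA_PARTIAL_RESULTS to true).
It's also available on Xamarin.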
I am working on an Android application that reads various sounds. I know how to record audio on Android. Now I want to analyze a sound and display its frequency on the screen. How can I read the frequency in Hz or kHz?
You will need to perform a discrete Fourier transform on your recorded audio samples. You can write code yourself or use a library for it. Unfortunately I have no idea which FFT libraries exist for Java, but I am sure you can google that. I found two in 2 minutes:
http://www.ee.ucl.ac.uk/~mflanaga/java/FourierTransform.html
https://sites.google.com/site/piotrwendykier/software/jtransforms
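A minimal sketch of the idea in Python (on Android you would port it to Java, or replace the inner loop with one of the FFT libraries above): take the DFT of a buffer of samples and report the frequency of the strongest bin, where bin k corresponds to k · sample_rate / N Hz. The function name is hypothetical, and "strongest bin" is only the simplest definition of a sound's frequency; for voices and instruments the strongest bin is not always the perceived pitch.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest DFT bin, ignoring DC (bin 0).

    Naive O(N^2) DFT for clarity; for real buffer sizes use an FFT library.
    Bin k corresponds to k * sample_rate / N Hz, so resolution is sample_rate / N.
    """
    n = len(samples)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate / n

# Example: 400 samples at 8 kHz (20 Hz per bin) of a 440 Hz tone plus a weaker 1 kHz tone.
sr = 8000
signal = [math.sin(2 * math.pi * 440 * t / sr) + 0.3 * math.sin(2 * math.pi * 1000 * t / sr)
          for t in range(400)]
print(dominant_frequency(signal, sr))  # → 440.0
```

The buffer length N trades frequency resolution (sample_rate / N) against latency: a longer buffer resolves closer frequencies but takes longer to fill.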
I created an Android game using OpenGL and want to add a story mode, which obviously needs a story. I don't want to narrate the story and voice the characters myself, and I don't want to get someone else to do it either. Is there any program that can generate voices, using text-to-speech or something similar? I don't want the robotic voice people usually use for videos. I want to actually design the voice, write text for it to read, and record the result. Is there something like that?
Android does have a built-in TTS engine:
http://developer.android.com/resources/articles/tts.html
However, the results probably won't be up to scratch for narration -- machine TTS is still kind of weak, especially if you're trying to convey any emotion. It isn't likely to work as a substitute for real voice actors.