I'm writing an Android app that lets the user record their voice through the microphone, save it in storage, and link it to a specific piece of content (like a contact). Later, the user speaks again and the app should compare the new recording with the saved audio files and find the one that matches the voice.
I searched a lot and found some libraries that do this online, like Echoprint, which generates a fingerprint from recorded audio, sends it to an open-source server, and returns the result. But I need to do this offline.
Does anybody know of such a library?
If you are aiming to compare an old recording of a user with a new call as it comes in, audio fingerprinting solutions like Dejavu (Python, server-side) or Echoprint (C++) won't help you. They are built for recognition and retrieval of recorded audio segments plus noise.
They cannot deal with the variability of the human voice. See an explanation here.
If that's the case, what you are referring to is speaker recognition, which is much harder and involves quite a bit of machine learning. It would be tough to do this for a large corpus of users (especially offline on a phone), but for distinguishing between a couple of users it might be doable.
Below is a good library that is easy to use, but you need to convert your audio files to WAV format first.
https://code.google.com/p/musicg/
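As a rough sketch of how its fingerprint comparison can be called (the file names are placeholders, and keep in mind the caveat above about fingerprinting versus voice variability):

```java
import com.musicg.fingerprint.FingerprintSimilarity;
import com.musicg.wave.Wave;

public class VoiceMatcher {
    public static void main(String[] args) {
        // musicg only reads WAV, hence the conversion step mentioned above.
        Wave saved = new Wave("saved_voice.wav");
        Wave incoming = new Wave("new_voice.wav");

        // Compare acoustic fingerprints; a higher score means a closer match.
        FingerprintSimilarity similarity = saved.getFingerprintSimilarity(incoming);
        System.out.println("score: " + similarity.getScore());
        System.out.println("similarity: " + similarity.getSimilarity());
    }
}
```

To match a new recording against your stored files, you would run this comparison against each saved WAV and pick the highest score.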
Related
I'm a biomedical engineering student doing my final project on an assistive app for people with sensory issues. Unfortunately, I'm not familiar with Android app development (I have done some C and C++, plus a tiny bit of front-end JS).
Long story short, I'm looking to analyze every piece of audio that will be played to the user (through headphones or peripheral speakers) before it actually plays (it could be a video playing in any app, games, music, etc.) and filter out certain groups of frequencies that could cause overload or overstimulation. (The frequencies will be guessed from questions answered by the user; the goal is to drastically decrease their amplitude.)
I have found open-source code for the frequency analyzer in both Kotlin and Java.
Now my issue is: can I access and manipulate other apps' audio output? (I have found that an audio stream can be paused or prioritized through audio focus, but I didn't find an answer for this specific need.)
Sorry it got long, and I thank everyone in advance!
Android's SpeechRecognizer apparently doesn't allow you to record the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or an AudioRecord, for that matter), or you do speech recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.
The question of how to record audio and do speech recognition at the same time on Android has been asked several times, and the most popular "solution" is to record a FLAC file and use Google's unofficial Speech API, which allows you to send a FLAC file via a POST request and obtain a JSON response with the transcription (a sketch of such a request follows the links below):
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ (outdated Android version)
https://github.com/katchsvartanian/voiceRecognition/tree/master/VoiceRecognition
http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
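For the record, a minimal sketch of what such a request looks like, based on the write-ups linked above. The endpoint, key parameter, and sample rate are taken from those posts; the service is unofficial and heavily rate-limited, so treat this as historical:

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class FlacSpeechPost {
    public static void main(String[] args) throws Exception {
        // Read the recorded FLAC file into memory.
        File f = new File("recording.flac");
        byte[] flac = new byte[(int) f.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readFully(flac);
        }

        // Endpoint and key parameter as described in the linked posts (v2 API).
        URL url = new URL("https://www.google.com/speech-api/v2/recognize"
                + "?output=json&lang=en-US&key=YOUR_API_KEY");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        // The rate must match the sample rate the FLAC was recorded at.
        conn.setRequestProperty("Content-Type", "audio/x-flac; rate=16000");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(flac);
        }

        // The response is JSON containing the transcription.
        try (Scanner s = new Scanner(conn.getInputStream())) {
            while (s.hasNextLine()) {
                System.out.println(s.nextLine());
            }
        }
    }
}
```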
That works pretty well, but it has a huge limitation: it can't be used with files longer than about 10-15 seconds (the exact limit is unclear and may depend on file size or perhaps on the number of words). This makes it unsuitable for my needs.
Also, slicing the audio file into smaller files is NOT a viable solution: even setting aside the difficulty of splitting the file at the right positions (not in the middle of a word), many consecutive requests to the above-mentioned web service API will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the real limits, which clearly throttle bursts of requests).
So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.
HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcribes what you said into text, and saves both the text and the audio recording (well, it's not clear where it stores the audio, but you can replay it).
And it has no length limitation.
So the question is: DOES ANYBODY HAVE AN IDEA OF HOW GOOGLE KEEP DOES IT?
I would look at the source code, but it doesn't seem to be available, does it?
I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the Speech API mentioned above. All the traffic is TLS, and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.
So, does a way perhaps exist to "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, feeding one of them to a SpeechRecognizer and the other to a MediaRecorder?
Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.
See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.
Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.
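To illustrate, here is a sketch along the lines of the code sample in the linked question. The extras are undocumented (their names come from that post) and are only honored when Google Voice Search services the intent, so this can break at any time:

```java
import android.app.Activity;
import android.content.Intent;
import android.net.Uri;
import android.speech.RecognizerIntent;

public class KeepStyleRecognition extends Activity {
    private static final int REQUEST_SPEECH = 1;

    private void startRecognition() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Undocumented extras (names taken from the linked post): ask Google
        // Voice Search to hand back the recorded audio with the transcription.
        intent.putExtra("android.speech.extra.GET_AUDIO_FORMAT", "audio/AMR");
        intent.putExtra("android.speech.extra.GET_AUDIO", true);
        startActivityForResult(intent, REQUEST_SPEECH);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
            // URI of the recorded audio, if the recognizer provided one.
            Uri audioUri = data.getData();
            // The text results are in
            // data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS).
        }
    }
}
```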
It uses onPartialResults(Bundle). This callback returns the text recognized so far from the recorded speech, while recording is still in progress. It's also available on Xamarin.
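A minimal sketch of wiring that up with SpeechRecognizer; note that partial results must be requested explicitly via EXTRA_PARTIAL_RESULTS:

```java
import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import java.util.ArrayList;

public class PartialResultsDemo {
    public static void listen(Context context) {
        SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onPartialResults(Bundle partialResults) {
                // Text recognized so far, delivered while speech is still ongoing.
                ArrayList<String> texts = partialResults
                        .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (texts != null && !texts.isEmpty()) {
                    System.out.println("partial: " + texts.get(0));
                }
            }
            // Remaining callbacks left empty for brevity.
            @Override public void onResults(Bundle results) {}
            @Override public void onReadyForSpeech(Bundle params) {}
            @Override public void onBeginningOfSpeech() {}
            @Override public void onRmsChanged(float rmsdB) {}
            @Override public void onBufferReceived(byte[] buffer) {}
            @Override public void onEndOfSpeech() {}
            @Override public void onError(int error) {}
            @Override public void onEvent(int eventType, Bundle params) {}
        });

        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
        recognizer.startListening(intent);
    }
}
```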
I want to write an Android app to record the snoring sounds of a sleeper and analyze them afterwards (i.e., not in real time) for signs of a medical condition called obstructive sleep apnea.
The Android devices I've experimented with have voice recorders that produce a file format called .3ga. I want to programmatically read in the audio file and look at the amplitude of each individual time-sample, then analyze that for patterns. Would this be easier if I converted the file to a different format, e.g. MP3, and if so, how can I do that programmatically?
I did a Google search, and most of the hits were about audio recording or playback, which are unrelated to what I'm trying to do. I haven't coded anything yet because I don't know how to get started.
You are looking to do sample-based analysis on a raw audio signal, but the formats you mention are compressed. You will need to either work with raw samples directly, or decompress the audio and then analyze it.
Since you said you can do this work after the fact, why not upload the recording to a server and analyze it there?
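If uploading isn't an option, one way to sidestep the compressed .3ga format entirely (my suggestion, not something you mentioned) is to record raw 16-bit PCM yourself with AudioRecord, so every time-sample's amplitude is directly available. This needs the RECORD_AUDIO permission, and the sample rate here is an arbitrary choice:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class RawRecorder {
    private static final int SAMPLE_RATE = 16000; // Hz

    public short[] recordOneSecond() {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuf * 4);

        short[] samples = new short[SAMPLE_RATE]; // one second of mono audio
        recorder.startRecording();
        int read = recorder.read(samples, 0, samples.length);
        recorder.stop();
        recorder.release();

        // Each entry of samples[0..read) is a signed 16-bit amplitude value,
        // ready for pattern analysis.
        return samples;
    }
}
```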
I have created an Android app that generates and stores a single tone, then plays it back using the Android AudioTrack class. Here's the issue: on my phone I can only play tones up to a frequency of about 11 kHz, while on a virtual phone run from my PC (same exact code) I can get frequencies up to about 14 kHz. What could cause this cutoff?
Using a tone generator app from the market, my phone can produce signals up to 20 kHz, so I know it is not a hardware issue.
Thanks.
It might help if you provide some of the code for how you're generating the tone.
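In the meantime, here is a typical way such a tone is generated (an assumption, since the question doesn't include code). One thing worth checking: AudioTrack can only reproduce frequencies below half the sample rate (the Nyquist limit), so a track created at 22050 Hz tops out around 11 kHz, which would match the cutoff you describe:

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

public class SineTonePlayer {
    public static void playTone(double freqHz, int sampleRate, double seconds) {
        // Synthesize a 16-bit PCM sine wave; freqHz must stay below
        // sampleRate / 2 or the tone will alias instead of playing cleanly.
        int numSamples = (int) (seconds * sampleRate);
        short[] buffer = new short[numSamples];
        for (int i = 0; i < numSamples; i++) {
            buffer[i] = (short) (Math.sin(2 * Math.PI * freqHz * i / sampleRate)
                    * Short.MAX_VALUE);
        }

        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                buffer.length * 2, AudioTrack.MODE_STATIC);
        track.write(buffer, 0, buffer.length); // MODE_STATIC: write before play
        track.play();
    }
}
```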
For audio stuff, you should go to http://music.columbia.edu/mailman/listinfo/andraudio, sign up, and ask there. There's a great community of Android developers on that list, all dedicated to audio development.
Also, some self-promotion: I run a forum website (relatively new and in need of updates), and I plan to add an Android audio forum once I get enough interested folks. If you're interested, sign up here.
Is it possible to read the audio stream during a (GSM) phone call? I would like to write an encoding application, and I do not want to go with SIP & VoIP. Thank you.
This will be phone- and OS-dependent. There are several apps that claim to record call audio (Total Recall and Record my call on Android), but they generally seem to record via the microphone, meaning the far-end sound is poor.
I don't believe either the Apple or Android APIs support access to the raw voice stream today.
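For what it's worth, the Android SDK does define MediaRecorder.AudioSource.VOICE_CALL, but it's unsupported or blocked on most handsets, which is presumably why the apps above fall back to the microphone. A sketch of that fallback (it only picks up the far end faintly, through the earpiece):

```java
import android.media.MediaRecorder;
import java.io.IOException;

public class CallRecorderSketch {
    public static MediaRecorder start(String outputPath) throws IOException {
        MediaRecorder recorder = new MediaRecorder();
        // VOICE_CALL exists in the SDK but fails on most devices; MIC is the
        // fallback the apps mentioned above seem to use, hence the poor
        // far-end audio.
        recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
        recorder.setOutputFile(outputPath);
        recorder.prepare();
        recorder.start();
        return recorder; // caller invokes stop() and release() when done
    }
}
```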
Something to be aware of: in many places it is not legal to do this without informing the other party (i.e. the person on the other end of the call whose voice stream you are planning to 'capture'). This may not be relevant to your particular plans, but it's worth mentioning anyway.
If you have the option of doing the work in the network or on a PABX, you can create a basic (if not very efficient) solution by simply setting up a three-way (or conference) call.