I'm trying to see how GSM affects the data on a phone call. Here is what I'm trying to do: one person will be talking on a phone, and I will record his voice from that phone's mic while he speaks. On the other phone I will capture the data coming in over GSM and compare the two. I want to write an Android application to get that data. Is that possible on Android, or can you suggest another way to achieve this?
Some background (you may know this already)...
When you make a GSM call, the analogue signal in the phone microphone corresponding to your speech is converted into a series of digital values and then encoded with a voice codec. This is basically a clever algorithm to capture as much of the speech as possible, in as little data as possible.
The idea is to maintain very good speech quality while saving on the amount of bandwidth needed for a call. Techniques used include not transmitting quiet periods (when you are not speaking) and various compression and predictive encoding algorithms. There have been, and still are, a number of codecs in use in GSM, but the latest and generally preferred codec is called AMR-Narrowband.
Nearly all GSM deployments encrypt speech between the phone and the base station - while there are publicised weaknesses in the various encryption algorithms, I am assuming (hoping...) that decrypting is not what you are looking for.
Your question - 'I want to see that if there will be data loss or corruption when voice reaches over gsm'
Firstly, it is worth noting that speech is 'relatively' tolerant of small amounts of data loss and corruption, at least compared to data. It is quite common to have bursts of packet loss in VoIP networks, and this may cause a temporary degradation in voice quality. Secondly, packet loss in a VoIP network will include delayed packets, which can be confusing - if the packet arrives too late to be included in the 'sound' being played in the receiver's speaker then it is effectively lost from the VoIP point of view, even though other measures may show that it simply arrived late.
To actually measure the loss between the GSM phone and the base station you would need access to the data received at the base station, which you will not usually have unless you are the operator.
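To make the "late counts as lost" point concrete, here is a minimal sketch of how a receiver might classify packets against a playout deadline. The 20 ms frame size, the 60 ms playout delay and the function name are assumptions chosen for illustration, not values from any particular GSM or VoIP stack.

```python
def classify_packets(arrivals_ms, frame_ms=20, playout_delay_ms=60):
    """Label each packet 'played', 'late' or 'lost'.

    arrivals_ms[i] is the arrival time of packet i in milliseconds, measured
    from the moment packet 0 was sent, or None if it never arrived.
    Packet i is due at the speaker at i * frame_ms + playout_delay_ms.
    """
    labels = []
    for seq, arrival in enumerate(arrivals_ms):
        deadline = seq * frame_ms + playout_delay_ms
        if arrival is None:
            labels.append("lost")      # never arrived at all
        elif arrival > deadline:
            labels.append("late")      # arrived, but too late to be played
        else:
            labels.append("played")
    return labels


# Packet 2 never arrives; packet 3 arrives 30 ms after its deadline.
print(classify_packets([30, 55, None, 150, 125]))
# ['played', 'played', 'lost', 'late', 'played']
```

From the listener's point of view the 'late' and 'lost' packets sound the same, which is why a network-level capture and an audio-level comparison can disagree about how much was lost.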
If you do an end to end test, from one GSM to another, your speech path will traverse other network nodes also, so you will not know if any loss or corruption is happening over the GSM air interface or in one or more of the other nodes.
You would also need to be aware of handover from one cell to another and from 2G to 3G (GSM to UMTS), which may affect your tests (even stationary phones can hand over in certain circumstances).
If your interest is purely academic then the easiest thing might be to create your own GSM base station and test on this - there exist several open source GSM 'network in a box' projects which should allow you to do this. I have not used it myself, but this one looks the most actively supported at this time - check out the mailing list under the community tab for a good place to maybe follow up your investigations:
http://openbts.org
I am an Android developer who is living with hearing impairment, and I am currently exploring the option of making a speech-to-text app with the Speech Recognizer API in Android. Closed-captioning telephones and Innocaption are not available in my home country. A potential application might be captioning during telephone calls.
https://developer.android.com/reference/android/speech/SpeechRecognizer.html
The API is meant for capturing voice commands, not for real-time live transcribing. I am even able to implement it as a service but I constantly need to restart it after it has delivered a result or a partial result, which is not feasible in a conversational setting (words get lost while the service is restarting).
Do note that I don't need a 100% accuracy for this app. Many hearing impaired people find it helpful to have some context of the conversation to help them along. So I don't actually need comments about how this is not going to be accurate.
Is there a way to implement Speech Recognizer in a continuous mode? I can create a textview that constantly updates itself when new text is returned from the service. If this API is not what I should be looking at, is there any recommendation? I tested CMUSphinx but found that it is so dependent on blocks of phrases/sentences that it is not likely to work for the kind of application I have in mind.
I am a deaf software developer, so I can chime in. I've been monitoring the state of the art of Speech-To-Text APIs, and the APIs have now become "good enough" to provide operatorless relay/captioning services for CERTAIN kinds of phone conversations with people using the telephone in quiet settings. For example, I get 98% transcription accuracy with my spouse's voice with the Apple Siri realtime transcription (iOS 8).
I was able to jerry-rig phone captioning by routing the sound out of one phone to a 2nd iPhone on which I pressed the microphone button (popup keyboard), and successfully captioned a telephone conversation with ~95% accuracy at 250 words per minute (faster than Sprint Captioned Telephone and Hamilton Captioned Telephone), at least until the 1 minute cutoff time.
Thusly, I declare computer-based voice recognition practical for phone calls with family members (of the type you call frequently in quiet environments), where you can at least coach them to move to a quiet place to allow captioning to work properly (with >95% accuracy). Since iOS 8 got released, we REALLY need this, so we don't need to rely on relay operators or captioning telephones. Sprint Captioned Telephone lags badly during fast speech, while Apple Siri keeps up, so I can conduct more natural telephone conversations with my jerry-rigged two-iOS-device Apple Siri "realtime Captioned Telephone" setup.
Some cellphones transmit audio in a higher-def manner, so it works well between two iPhones (iPhone speaker piped into another iPhone's Siri running in iOS8 continuous mode). That's assuming you're on G.722.2 (AMR-WB), like when running two iPhones on the same carrier that supports the high-def audio telephony standard. It works perfectly when piped through Siri -- roughly as good as doing it in front of the phone, for the same human voice (assuming the other end is speaking into the phone in a quiet environment).
Google and Apple need to open up their speech-to-text APIs to assistive applications, pronto, because operatorless telephone transcription is finally now practical, at least when calling family members (good voices & coached to be in a quiet environment when receiving the call). The continuous recognition time limit also needs to be removed in this situation.
Google is not going to work with telephone-quality audio anyway; you need to work on a captioning service using CMUSphinx yourself.
You probably didn't configure CMUSphinx properly. It should be OK for large vocabulary transcription; the only thing you need to take care of is to use the telephony 8 kHz acoustic model (not the wideband model) together with a generic language model.
For the best accuracy it's probably worth moving the processing to a server. You can set up a PBX to make the calls and transcribe the audio there instead of hoping to do something on a limited device.
It is true that the SpeechRecognizer API documentation claims that:
"The implementation of this API is likely to stream audio to remote servers to perform speech recognition. As such this API is not intended to be used for continuous recognition, which would consume a significant amount of battery and bandwidth."
This bit of text was added a year ago (https://android.googlesource.com/platform/frameworks/base/+/2921cee3048f7e64ba6645d50a1c1705ef9658f8). However, no changes were made to the API at the time, i.e. the API remained the same. Also, I don't really see anything specific to networking and battery drain in the API documentation. So, go ahead and implement a recognizer (maybe based on CMUSphinx) and make it accessible via this API.
I am working on a voice messaging application and need to compare two voices, like this:
1. Register with the app by recording your voice.
2. Send a voice message to another user by recording your voice, but first compare this voice to the recorded voice in the profile.
It's for security purposes: I need to know whether the recorded message is from a specific user or not.
I tried :
Compare two sound in Android
http://www.dreamincode.net/forums/topic/274280-using-fft-to-compare-two-audio-files-and-then-realtime-comparison/
But I'm not getting any idea of how to do the voice comparison.
Please share if anybody knows about this. I didn't find any sample to do it.
Since you indicated it's for security purpose, I'd like to first share a few things on voice biometry :-)
The problem with authenticating someone is that you'd need to be sure he was actually there saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.
Algorithms extracting voice features from a sample and later calculating the distance between a new sample and the first one can easily be fooled by a recording made up by an attacker.
Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But not completely impossible...
There is very good sounding software created for the music industry that will e.g. take some voice audio input and make it sound (intonation and time wise) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. You could further tweak the audio by hand using other voice editing software and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.
Depending on what the attacker wants your user to say, the process complexity could range from an hour to a day provided he has all the recording material he wants...
To fight against this type of attack, you need to detect that the audio sample has been edited. Digital editing will leave unnatural traces, e.g. in the background noise surrounding the voice.
AFAICT, only the best commercial software achieves this level of security check, but I can't tell how far it goes in the detection of such edits.
From a pure security perspective, you'd also need to be sure the device was not compromised. So these voice verification checks should happen server side and not on the phone itself.
Please note these are general considerations and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable but it helps raise the bar so fewer attackers could potentially steal it...
Another thing to consider is that biometry is by definition a statistical process and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you'll be able to lower one of them at the cost of raising the other.
Selecting an appropriate threshold will require you to have a fair amount of test data. Say 1 minute recording of at least 200 speakers to start getting a picture.
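To illustrate the threshold trade-off, here is a small sketch that computes both error rates from test data. It assumes the matcher returns a distance (smaller means more similar) and that a sample is accepted when the distance is at or below the threshold; the scores below are made-up numbers, not results from any real library.

```python
def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (false_reject_rate, false_accept_rate) for a distance threshold.

    genuine_distances:  distances between two samples of the same speaker
    impostor_distances: distances between samples of different speakers
    A sample is accepted when its distance is <= threshold.
    """
    false_rejects = sum(1 for d in genuine_distances if d > threshold)
    false_accepts = sum(1 for d in impostor_distances if d <= threshold)
    return (false_rejects / len(genuine_distances),
            false_accepts / len(impostor_distances))


# Hypothetical test data, for illustration only.
genuine = [0.10, 0.15, 0.22, 0.30, 0.41]
impostor = [0.25, 0.38, 0.52, 0.60, 0.75]

for threshold in (0.2, 0.3, 0.4, 0.5):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: false rejects={frr:.0%}, false accepts={far:.0%}")
```

With these numbers a strict threshold of 0.2 rejects 60% of genuine users but lets no impostor in, while a lax threshold of 0.5 rejects nobody genuine but accepts 40% of impostors. With real data you would sweep the threshold the same way and pick the compromise your use case can live with.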
One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick which in some cases might render the voice unrecognizable. Also the emotional state might play a role: sadness or anger will yield different sounding voices...
And last but not least, the surrounding noise might pose a problem. Say the user enrolled while at home and later records a message while on the go in a busy city environment: the system might have trouble making sure it's actually the same person speaking. The signal to noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance of the microphone to the mouth, the ratio will be quite different. You'll get way better results when the user puts the phone close to their face, like in a regular phone conversation, than when the user looks at the screen while recording the message.
Voice variability and signal to noise ratio are probably the main reasons behind false negative results.
Hopefully, you now have a better understanding of the challenges awaiting you and I can start sharing some pointers for open source and commercial libraries.
AFAIK, there are no open source libraries that include fraudster detection...
You may want to check Nuance Communications for the state of the art. There are plenty of other vendors, just check with Google; I only mentioned Nuance because of its reputation.
There is an OSS library called Alize (written in C++, under LGPL license) which uses an approach based on MFCC (Mel-Frequency Cepstral Coefficients). MFCC is known to bring excellent results. Expect a steep learning curve, as this software is aimed at researchers willing to improve the state of the art on this topic and the vocabulary used is very specific.
I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers, so you should be able to test it in a matter of minutes. The lib is very young and I first focused on its API before improving the algorithms. The algorithm I use for the moment is called Linear Predictive Coding (LPC) and is known to bring good results (and I do have good results, provided recordings yield the same level of quality :-)). I'm currently in the process of releasing a new version including a likelihood coefficient in the match results. MFCC implementation is on the road map.
There is plenty of javadoc and the code should be very straightforward...
https://github.com/amaurycrickx/recognito
Recognito has a dependency on javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
Given many people need something for android, I'll do something about it in the near future instead of saying how one should modify the lib :-)
HTH
We have an app with mobile audio clients written in low-level OpenSL ES to achieve low-latency input from the microphone. Then we send 10 ms frames, each encapsulated in a UDP datagram, to a server.
On the server we do some post-processing which is crucially dependent on the assumption that frames from mobile clients come in at fixed intervals (e.g. 10 ms per frame), so we can align them.
It seems that internal crystal frequencies on mobile phones can vary a lot, and due to this we get perfect alignment at the beginning but poor alignment after a few minutes.
I know that ALSA on Linux can tell you the exact frequency of the crystal, so you can correct your counts based on this. Unfortunately I don't know how to get this information on Android.
Thx for help
The essence of the problem you face is that you have an ADC and a DAC on separate systems with different local oscillators. You're presumably timing your packets against a 3rd (and possibly 4th) CPU clock.
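To get a feel for the scale of the problem, the arithmetic can be sketched like this. The 50 ppm offset is an assumed figure picked for illustration, not a measurement of any particular phone.

```python
def drift_ms(ppm_offset, elapsed_s):
    """Misalignment in milliseconds that builds up after elapsed_s seconds
    between two clocks whose rates differ by ppm_offset parts per million."""
    # ppm * 1e-6 * seconds * 1000 ms/s simplifies to ppm * seconds / 1000
    return ppm_offset * elapsed_s / 1000.0


def frames_of_drift(ppm_offset, elapsed_s, frame_ms=10.0):
    """The same misalignment expressed in audio frames of frame_ms each."""
    return drift_ms(ppm_offset, elapsed_s) / frame_ms


# Assumed 50 ppm mismatch, five minutes into the stream:
print(drift_ms(50, 300))         # 15.0 (milliseconds)
print(frames_of_drift(50, 300))  # 1.5  (10 ms frames)
```

So even a small mismatch between two oscillators is enough to move streams apart by more than a whole 10 ms frame within a few minutes, which matches the symptom described in the question.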
The correct solution to this problem is some kind of clock recovery algorithm. To do this properly you need some means of accurately timestamping (e.g. to bit accuracy) transmitted packets, and then use a PLL to drive the clock-rate of the receiver's sample clock. This is precisely the approach that both IEEE1394 audio and MPEG2 Transport streams use.
Since you probably can't do either of these things, your approach is most likely going to involve dropping or repeating samples (or even entire packets) periodically to keep your receive buffer from under- or over-flowing.
USB Audio has a similar lack of hardware support for clock recovery, and the approaches used there may be applicable to your situation.
Relying on the transmission and reception timing of network packets is a terrible idea. The jitter on delivery times is horrendous - particularly with Wifi or cellular connections. You'd be well advised not to rely on it at all, and instead do as both IEEE1394 audio and MPEG 2 TS do, which is to decouple audio data transport from consumption using a FIFO model in which data is consumed at a constant rate and delivered to it in packets of unreliable timing.
As for ALSA, all it can do (unless it has an accurate external timing reference) is to measure the drift between the sample clock of the audio interface and the CPU's clock. This does not yield 'the exact frequency' of anything, as neither oscillator is likely to be accurate, and both may drift dependent on temperature.
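A minimal sketch of that FIFO model, combined with the drop/repeat fallback mentioned earlier, might look like this. The class name, the depth limit and the policy (drop the oldest frame on overflow, repeat the last frame on underflow) are assumptions for illustration; a real system would likely interpolate or resample rather than repeat whole frames.

```python
from collections import deque


class JitterBuffer:
    """FIFO filled by packets with unreliable timing, drained at a constant rate."""

    def __init__(self, max_depth=6):
        self.frames = deque()
        self.max_depth = max_depth
        self.last_frame = None   # what to replay when the FIFO runs dry
        self.dropped = 0
        self.repeated = 0

    def push(self, frame):
        """Called whenever a packet arrives, at whatever time the network delivers it."""
        if len(self.frames) >= self.max_depth:
            # Sender clock runs fast (or a burst arrived): drop the oldest frame.
            self.frames.popleft()
            self.dropped += 1
        self.frames.append(frame)

    def pop(self):
        """Called by the consumer once per frame period, on the consumer's own clock."""
        if self.frames:
            self.last_frame = self.frames.popleft()
        else:
            # Sender clock runs slow (or packets are delayed): repeat the last frame.
            self.repeated += 1
        return self.last_frame


buf = JitterBuffer(max_depth=2)
for frame in ("f0", "f1", "f2"):          # burst of three into a depth-2 buffer
    buf.push(frame)
print([buf.pop() for _ in range(3)])      # ['f1', 'f2', 'f2']
print(buf.dropped, buf.repeated)          # 1 1
```

The point of the design is that the consumer never looks at packet arrival times at all; the only thing arrival timing affects is how full the FIFO is.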
If someone wants to write an Android application that interacts with a physical device, specifically a card reader using the mobile's audio jack (e.g. like Square Inc. is doing), how is this done?
Are there APIs to interact with the reader and get the card's data?
When a company creates a reader (physical device), does it provide the relevant APIs?
Are the physical details abstracted from the application programmer?
I have found the AudioRecord class, which can record magnetic stripe data from the audio jack, but I can't figure out how to capture the actual card swipe event and extract the meaningful data from the raw data. Can anyone help me with this?
Any input is highly welcome!
The way this usually works is by encoding the data signal sent out by the device, like the card reader, in such a way that it can be decoded on the other end. Sound is a wave; different amplitudes correspond to different loudness, and different frequencies correspond to different pitches. Imagine you have a sine wave that varies between a high and a low frequency that are sufficiently different from each other so as to be easily distinguishable. The device sending out binary data (0's and 1's) can translate this data into an audio signal that varies by frequency (an alternative is varying amplitude). The receiver, in this case the mobile device, decodes the signal back into 0's and 1's. This is called "Frequency-shift-keying" (check out more here: http://en.wikipedia.org/wiki/Frequency-shift_keying).
The simplest way to implement this is to try and find an open library that already does it. The device sending the data will also need to contain some kind of microcontroller that can perform the initial modulation. If you come across any good libraries, let me know, because I'm currently looking.
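As a rough sketch of the idea (not how any particular commercial reader works), here is binary FSK modulation and demodulation in plain Python. The two tone frequencies, the bit length and the sample rate are assumptions chosen so that each bit contains a whole number of cycles; the demodulator uses the Goertzel algorithm to compare the power of the two tones in each bit-sized block.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed)
BIT_SAMPLES = 441     # 10 ms per bit (assumed)
FREQ_ZERO = 1200.0    # tone for a 0 bit, in Hz (assumed)
FREQ_ONE = 2200.0     # tone for a 1 bit, in Hz (assumed)


def modulate(bits):
    """Turn a list of 0/1 bits into audio samples, one tone burst per bit."""
    samples = []
    phase = 0.0
    for bit in bits:
        freq = FREQ_ONE if bit else FREQ_ZERO
        step = 2.0 * math.pi * freq / SAMPLE_RATE
        for _ in range(BIT_SAMPLES):
            samples.append(math.sin(phase))
            phase += step   # keep the phase continuous across bit boundaries
    return samples


def tone_power(block, freq):
    """Goertzel algorithm: power of a single frequency within a block of samples."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def demodulate(samples):
    """Recover bits by asking, for each bit-sized block, which tone is stronger."""
    bits = []
    for start in range(0, len(samples) - BIT_SAMPLES + 1, BIT_SAMPLES):
        block = samples[start:start + BIT_SAMPLES]
        is_one = tone_power(block, FREQ_ONE) > tone_power(block, FREQ_ZERO)
        bits.append(1 if is_one else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
print(demodulate(modulate(message)) == message)   # True
```

This sketch assumes the receiver already knows where each bit starts. On a real device you would be reading samples from the microphone input, so you would also need some way to find the start of the transmission (for example a known preamble) and to cope with noise and level differences.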
To answer your question, companies do not generally provide APIs etc to perform this.
This may seem like a lot of extra work to convert a digital signal into an audio signal and back, and you're right. However, every mobile device has essentially the same headphone jack, whereas the USB port on an Android is drastically different from an iPhone's Lightning connector, or the connector in previous iPhones. Sending wirelessly through a network or Bluetooth is also an option, but they have their disadvantages as well.
Now the mobile device must be using a special headphone jack that supports microphones, otherwise it cannot receive input, it can only output sound. Most smartphones can do this.
Radios work on this principle (FM = Frequency modulation, AM = amplitude modulation).
Old dial up modems used FSK, which is why you heard those weird noises each time it connected.
Hope that helps!
I'm looking into making a pH tester for my Android phone. I've found a pH electrode that will send a millivolt signal which I can then convert into a pH reading (59.2 mV per pH unit at 25 °C). The question I'm having is: would it be possible to connect the electrode to the headphone jack and directly read the millivolt reading, or would I need to convert the analog signal to digital first and then plug it in via USB? I'm not a big electronics guy but I'm doing this project on the side and hoping to learn from it.
I was thinking perhaps getting the mV reading from the headphone jack would be possible with the GetMaxAmplitude function like from this thread here: Range of values for GetMaxAmplitude. Although, from what I understand the lowest reading possible with this function is 0 and there are negative mV values that can be read when testing for pH.
Any help is greatly appreciated, thanks!
This should be asked on the electrical engineering site. But the best way is to use a Bluetooth-to-serial converter ($5 off eBay) and a PIC microcontroller with USART and A/D converter ($1). You could program the PIC quite easily in C with the 'MPLAB' IDE and 'HI-TECH' C compiler. The tools you'll need are a PIC programmer ($20) and something with a serial port if you want to configure the Bluetooth-serial converter, like a desktop PC or a USB-serial converter. You might need an op-amp circuit to amplify the signal so it's readable by the PIC. You'd then use code from Google's BluetoothChat example to get your phone connecting to your Bluetooth system and receiving data from it.
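Once the phone has a millivolt value, whichever way it gets there, the conversion mentioned in the question is simple arithmetic. The sketch below uses the 59.2 mV per pH unit figure from the question and assumes an idealised electrode that reads 0 mV at pH 7 with positive voltages on the acidic side; a real electrode needs calibrating against buffer solutions, and the slope changes with temperature.

```python
def millivolts_to_ph(mv, slope_mv_per_ph=59.2, zero_mv_ph=7.0):
    """Convert an electrode voltage in mV to pH.

    Assumptions (idealised electrode): the electrode reads 0 mV at
    zero_mv_ph, and each pH unit towards the acidic side adds
    slope_mv_per_ph millivolts.
    """
    return zero_mv_ph - mv / slope_mv_per_ph


for mv in (177.6, 0.0, -177.6):
    print(f"{mv:+.1f} mV -> pH {millivolts_to_ph(mv):.2f}")
```

Note that the input is signed: readings on the alkaline side are negative, so whatever carries the value to the phone has to preserve the sign.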
Using a microphone for input would be tricky, for one reason, because it will be filtered to accept only AC. One way to get round that would be to modulate an oscillator's output so its amplitude is proportional to the DC signal you're measuring, then you could measure the magnitude by analysing the data from the microphone.
Interfacing with USB is more difficult than it sounds. It would be harder to build something that interfaces with USB and measures millivolts than to do it with Bluetooth, because the PIC processor you use for analog-to-digital sampling and as USB client would in fact have to act as either USB host or USB OTG on a phone, which is far more complicated than being a USB peripheral.
I think you would have the most consistent operation across a range of android devices if you built a circuit which uses the voltage from the sensor to control the frequency of an audio oscillator, and measures the frequency with software on the phone.
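The software side of that approach could be sketched as follows: estimate the tone frequency from the recorded samples, then map frequency back to voltage. The zero-crossing method is just one simple way to estimate frequency, and the oscillator constants (1000 Hz at 0 mV, 2 Hz per mV) are hypothetical; they would come from whatever circuit you actually build.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Estimate the frequency of a clean tone from its rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return 0.0
    periods = len(crossings) - 1
    return periods * sample_rate / (crossings[-1] - crossings[0])


def frequency_to_millivolts(freq_hz, base_hz=1000.0, hz_per_mv=2.0):
    """Invert a hypothetical linear oscillator: base_hz at 0 mV, hz_per_mv slope."""
    return (freq_hz - base_hz) / hz_per_mv


# Simulate 0.1 s of a 1200 Hz tone as it might arrive from the microphone input.
rate = 44100
tone = [math.sin(2.0 * math.pi * 1200.0 * n / rate + 0.3) for n in range(rate // 10)]
freq = estimate_frequency(tone, rate)
print(round(freq), round(frequency_to_millivolts(freq)))
```

Because the result depends only on where the waveform crosses zero, it is largely unaffected by the input gain, which is the reason this approach should behave more consistently across devices than reading the amplitude. A negative sensor voltage simply shows up as a frequency below the base frequency.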
It's not impossible that a direct connection and reading the amplitude would work, but the two problems are that the signal path may not be good all the way down to DC - there may be a minimum frequency that it can pass making it unsuitable for measuring constant voltages. And second, that the gain of the input channel may not be consistent from device to device or even over time, temperature, etc. There are possible workarounds such as circuits which alternately send the voltage upright and inverted, effectively modulating it to overcome minimum frequency limitations, or even alternate the actual reading with a reference voltage to help model the input gain.
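The arithmetic behind those two workarounds is straightforward. In this sketch the function names and all the numbers are hypothetical; the levels stand for whatever amplitude measure your app gets from the recorded audio.

```python
def chopped_amplitude(upright_level, inverted_level):
    """Half the difference between the upright and inverted phases.

    Any constant offset added by the input path appears in both phases
    and cancels in the subtraction.
    """
    return (upright_level - inverted_level) / 2.0


def calibrated_millivolts(sensor_amplitude, reference_amplitude, reference_mv):
    """Scale the sensor reading by a known reference sent through the same input,
    so the unknown input gain divides out."""
    return sensor_amplitude / reference_amplitude * reference_mv


# Hypothetical levels: the input path adds an offset of 0.05 to both phases.
sensor = chopped_amplitude(0.35, -0.25)      # about 0.3
reference = chopped_amplitude(0.55, -0.45)   # about 0.5, for a known 100 mV reference
print(round(calibrated_millivolts(sensor, reference, 100.0), 1))   # 60.0
```

Measuring the reference and the sensor close together in time matters here, since the point of the reference is to capture whatever the gain happens to be at that moment.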
But I'd probably recommend either the frequency modulation approach, or using a $20 embedded bluetooth module and going wireless. Either way, the sensor system is going to need its own small battery pack.
You can extract some power from the headphone jack by telling android to make some sound (and, I suppose, rectifying the output and storing it in a capacitor) - I've seen a bunch of jack-powered things do this. I wonder if the 2 ideas could go together? What if you modulated some audio out through the headphone jacks, through the sensor, then back into the mic? The pH reading should mess with the received sound in some kind of measurable way I'd expect?