For the Dutch movie "App" (http://www.imdb.com/title/tt2536436), a second screen app was developed. The app shows extra details about certain scenes and some movie fragments from other angles, and it appears to keep itself in sync by listening to the movie's audio.
For a school project we have to develop a similar app, so we want to achieve the same result. Does anybody know of a way to synchronize app content with an external audio source? We know we will have to filter out environmental noise, but we have no idea where to start. It seems that MPEG-2 TS carries some kind of SMPTE timecode, but we don't know how to "listen" for this timecode in our Android app.
Does anybody have any idea? Are there any external libraries we could use?
Here's an article briefly explaining some Automated Content Recognition (ACR) techniques:
Second screen apps use the microphone on your phone or tablet to listen to the TV and identify the channel, show or ad you are watching, and the precise location within it, based on one of two techniques:
Watermarking
Audio watermarking requires a series of inaudible “watermarks” to be encoded into the broadcast TV signal, normally using a hardware encoder in the playout suite or OB truck. Watermarks can be repeated regularly throughout the broadcast, providing a timecode, or used to trigger specific events such as questions or voting windows. The second screen app uses the device’s microphone to listen for each watermark, and decode the “payload” which uniquely identifies the channel and timecode or event.
Fingerprinting
Audio fingerprinting does not require the broadcast content to be modified. Instead, the TV content is analyzed before it is broadcast (or sometimes while it is being broadcast) and broken up into a sequence of “fingerprints” which are as unique as their name suggests. The second screen app uses an API to record a short segment of audio, and generates its own fingerprint, which is then compared to the “target” set of fingerprints, and if a match is found, the channel, show and timecode are identified.
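To make the fingerprinting idea a bit more concrete, here is a toy Java sketch of the "generate a fingerprint and compare it to a target set" step. This is not how any production ACR system works: the frame size, band edges and match threshold are arbitrary choices, and a real matcher would use an FFT and far more robust features.

```java
import java.util.Arrays;

public class ToyFingerprint {

    private static final int FRAME_SIZE = 4096;
    private static final double[] BAND_EDGES_HZ = {300, 600, 1200, 2400, 4800};

    // Energy of one frequency band via a naive DFT (fine for a sketch; use an FFT in practice)
    private static double bandEnergy(short[] frame, double loHz, double hiHz, int sampleRate) {
        int loBin = (int) (loHz * frame.length / sampleRate);
        int hiBin = (int) (hiHz * frame.length / sampleRate);
        double energy = 0;
        for (int bin = loBin; bin < hiBin; bin++) {
            double re = 0, im = 0;
            for (int n = 0; n < frame.length; n++) {
                double angle = 2 * Math.PI * bin * n / frame.length;
                re += frame[n] * Math.cos(angle);
                im -= frame[n] * Math.sin(angle);
            }
            energy += re * re + im * im;
        }
        return energy;
    }

    /** One value per frame: the index of the loudest band. */
    public static int[] fingerprint(short[] pcm, int sampleRate) {
        int frames = pcm.length / FRAME_SIZE;
        int[] fp = new int[frames];
        for (int f = 0; f < frames; f++) {
            short[] frame = Arrays.copyOfRange(pcm, f * FRAME_SIZE, (f + 1) * FRAME_SIZE);
            int best = 0;
            double bestEnergy = -1;
            for (int b = 0; b + 1 < BAND_EDGES_HZ.length; b++) {
                double e = bandEnergy(frame, BAND_EDGES_HZ[b], BAND_EDGES_HZ[b + 1], sampleRate);
                if (e > bestEnergy) { bestEnergy = e; best = b; }
            }
            fp[f] = best;
        }
        return fp;
    }

    /** Slide the query over the target fingerprint; return the best offset, or -1 if no good match. */
    public static int bestOffset(int[] target, int[] query, double minMatchRatio) {
        int bestOffset = -1, bestScore = -1;
        for (int off = 0; off + query.length <= target.length; off++) {
            int score = 0;
            for (int i = 0; i < query.length; i++) {
                if (target[off + i] == query[i]) score++;
            }
            if (score > bestScore) { bestScore = score; bestOffset = off; }
        }
        return bestScore >= minMatchRatio * query.length ? bestOffset : -1;
    }
}
```

The matched offset directly gives you a position within the show, which is exactly the timecode a second screen app needs.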
Thank you for the response. We used a watermarking technique to solve this: we added high-frequency audio watermarks, which the smartphone detects by running an FFT analysis on the audio it picks up. Once we have detected which TV show is being watched, we fetch the show's information over the internet.
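For anyone attempting something similar, below is a rough Android sketch of the kind of detection loop this involves. The 18 kHz tone frequency and the power threshold are placeholder values rather than the exact ones we used, and the app needs the RECORD_AUDIO permission.

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class WatermarkListener {

    private static final int SAMPLE_RATE = 44100;          // must be high enough to capture ~18 kHz content
    private static final double TARGET_FREQ_HZ = 18000.0;  // assumed watermark tone frequency
    private static final double POWER_THRESHOLD = 1e8;     // tune empirically against real recordings

    // Goertzel algorithm: power at a single frequency, much cheaper than a full FFT
    private static double goertzelPower(short[] samples, int n, double freq, int sampleRate) {
        int k = (int) (0.5 + (n * freq) / sampleRate);
        double omega = (2.0 * Math.PI * k) / n;
        double coeff = 2.0 * Math.cos(omega);
        double s0 = 0, s1 = 0, s2 = 0;
        for (int i = 0; i < n; i++) {
            s0 = samples[i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    public void listen() {
        int bufferSize = Math.max(4096,
                AudioRecord.getMinBufferSize(SAMPLE_RATE,
                        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT));
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufferSize);
        short[] buffer = new short[4096];
        recorder.startRecording();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                int read = recorder.read(buffer, 0, buffer.length);
                if (read <= 0) continue;
                double power = goertzelPower(buffer, read, TARGET_FREQ_HZ, SAMPLE_RATE);
                if (power > POWER_THRESHOLD) {
                    onWatermarkDetected(); // e.g. fetch the show metadata over the network
                }
            }
        } finally {
            recorder.stop();
            recorder.release();
        }
    }

    protected void onWatermarkDetected() { /* application specific */ }
}
```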
Related
I am working on a voice messaging application and I need to compare two voices:
1. Register with the app by recording your voice.
2. Send a voice message to another user by recording your voice, but first compare this voice to the voice recorded in the profile.
It's for security purposes: I need to know whether a recorded message comes from a specific user or not.
I tried:
Compare two sound in Android
http://www.dreamincode.net/forums/topic/274280-using-fft-to-compare-two-audio-files-and-then-realtime-comparison/
But I'm still not getting an idea of how to do the voice comparison.
Please share if anybody knows about this; I didn't find any sample showing how to do it.
Since you indicated it's for security purposes, I'd like to first share a few things about voice biometry :-)
The problem with authenticating someone is that you'd need to be sure he was actually there saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.
Algorithms extracting voice features from a sample and later calculating the distance between a new sample and the first one can easily be fooled by a recording made up by an attacker.
Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But not completely impossible...
There is very good sounding software created for the music industry that will, for example, take some voice audio input and make it sound (intonation- and timing-wise) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. You could further tweak the audio by hand using other voice editing software and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.
Depending on what the attacker wants your user to say, the process complexity could range from an hour to a day provided he has all the recording material he wants...
To fight against this type of attack, you need to detect that the audio sample has been edited. Digital editing leaves unnatural traces, e.g. in the background noise surrounding the voice.
AFAICT, only the best commercial software achieves this level of security check, but I can't tell how far it goes in detecting such edits.
From a pure security perspective, you'd also need to be sure the device was not compromised, so these voice verification checks should happen server side and not on the phone itself.
Please note these are general considerations and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable, but it helps raise the bar so fewer attackers could potentially steal the car...
Another thing to consider is that biometry is by definition a statistical process and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you'll be able to lower one of them at the cost of raising the other.
Selecting an appropriate threshold will require a fair amount of test data: say, a one-minute recording from at least 200 speakers to start getting a picture.
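To make that concrete, here is a small sketch of how you could sweep thresholds over a set of labelled trials and pick the point where false acceptances and false rejections balance out. The Trial structure and the "higher score means more similar" convention are just assumptions for the example; adapt it if your matcher returns distances instead.

```java
import java.util.List;

public class ThresholdTuning {

    /** One labelled trial: a similarity score and whether both samples truly came from the same speaker. */
    public static class Trial {
        public final double score;
        public final boolean sameSpeaker;
        public Trial(double score, boolean sameSpeaker) {
            this.score = score;
            this.sameSpeaker = sameSpeaker;
        }
    }

    /** False acceptance rate: fraction of impostor trials accepted at this threshold. */
    public static double far(List<Trial> trials, double threshold) {
        long impostors = trials.stream().filter(t -> !t.sameSpeaker).count();
        long accepted = trials.stream().filter(t -> !t.sameSpeaker && t.score >= threshold).count();
        return impostors == 0 ? 0 : (double) accepted / impostors;
    }

    /** False rejection rate: fraction of genuine trials rejected at this threshold. */
    public static double frr(List<Trial> trials, double threshold) {
        long genuine = trials.stream().filter(t -> t.sameSpeaker).count();
        long rejected = trials.stream().filter(t -> t.sameSpeaker && t.score < threshold).count();
        return genuine == 0 ? 0 : (double) rejected / genuine;
    }

    /** Sweep thresholds and return the one where FAR and FRR are closest (the equal error rate point). */
    public static double equalErrorThreshold(List<Trial> trials, double min, double max, double step) {
        double best = min;
        double bestGap = Double.MAX_VALUE;
        for (double t = min; t <= max; t += step) {
            double gap = Math.abs(far(trials, t) - frr(trials, t));
            if (gap < bestGap) { bestGap = gap; best = t; }
        }
        return best;
    }
}
```

Whether you actually want the equal error point, or a threshold biased toward fewer false acceptances, depends on how costly each type of error is in your application.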
One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick, which in some cases might render the voice unrecognizable. The emotional state might also play a role: sadness or anger will yield different-sounding voices...
And last but not least, the surrounding noise might pose a problem. Say the user enrolled at home and later records a message on the go in a busy city environment; the system might have trouble making sure it's actually the same person speaking. The signal-to-noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance from the microphone to the mouth, the ratio will be quite different. You'll get much better results when the user holds the phone close to their face, as in a regular phone conversation, than when the user looks at the screen while recording the message.
Voice variability and the signal-to-noise ratio are probably the main causes of false negative results.
Hopefully, you now have a better understanding of the challenges awaiting you and I can start sharing some pointers for open source and commercial libraries.
AFAIK, there are no open source libraries that include fraudster detection...
You may want to check Nuance Communications for the state of the art. There are plenty of other vendors; just search with Google. I only mention Nuance because of its reputation.
There is an OSS library called Alize (written in C++, under the LGPL license) which uses MFCC (Mel-Frequency Cepstral Coefficients) features. MFCC is known to bring excellent results. Expect a steep learning curve, as this software is aimed at researchers willing to improve the state of the art on this topic, and the vocabulary used is very specific.
I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers, so you should be able to test it in a matter of minutes. The lib is very young and I first focused on its API before improving the algorithms. The algorithm I use for the moment is Linear Predictive Coding (LPC), which is known to bring good results (and I do get good results, provided the recordings are of comparable quality :-)). I'm currently in the process of releasing a new version that includes a likelihood coefficient in the match results. An MFCC implementation is on the roadmap.
There is plenty of javadoc and the code should be very straightforward...
https://github.com/amaurycrickx/recognito
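From memory, a first test drive looks roughly like the snippet below. Double-check the javadoc for the exact package and method names (they may change between versions), and the file names are of course placeholders.

```java
import java.io.File;
import java.util.List;

import com.bitsinharmony.recognito.MatchResult;
import com.bitsinharmony.recognito.Recognito;

public class RecognitoQuickStart {
    public static void main(String[] args) throws Exception {
        // 16 kHz mono samples are assumed here; use whatever rate your recordings actually have
        Recognito<String> recognito = new Recognito<>(16000.0f);

        // Enrollment: build a voice print per known user from a clean recording
        recognito.createVoicePrint("alice", new File("alice-enrollment.wav"));
        recognito.createVoicePrint("bob", new File("bob-enrollment.wav"));

        // Verification: see which enrolled user an incoming message most resembles
        List<MatchResult<String>> matches = recognito.identify(new File("incoming-message.wav"));
        if (!matches.isEmpty()) {
            MatchResult<String> best = matches.get(0);
            System.out.println("Best match: " + best.getKey());
        }
    }
}
```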
Recognito has a dependency on javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
Given that many people need something for Android, I'll do something about it in the near future rather than just explaining how one should modify the lib :-)
HTH
I built an app similar to Shazam; however, it only works by sending an entire 10-second file of audio.
My question is: in Android, is there anything that can keep matching against the database while the music is still playing, like Shazam does? Or is that Shazam's own proprietary service technology?
Shazam developed that audio fingerprint matching technology itself. It's not available in the default Android SDK.
The Shazam technology is proprietary, but the base algorithm has since been documented by its creator:
The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified.
This is very novel and efficient, but the principles of audio fingerprinting stay the same. Among them is certainly an FFT (fast Fourier transform), at the very least to detect the BPM. It's even possible to convert sound to an image (the simplest being a spectrogram), which can then be processed further by software that knows nothing about audio.
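As a rough illustration of the quoted idea, and not Shazam's actual implementation, the sketch below turns PCM audio into one spectral peak per frame (the "constellation") and shows how two nearby peaks could be combined into a hash. The naive DFT, the frame size and the single-peak-per-frame simplification are choices made purely for readability.

```java
import java.util.Arrays;

public class ConstellationSketch {

    private static final int FRAME_SIZE = 1024;

    /** Magnitude spectrum of one frame (naive DFT; swap in an FFT library for real use). */
    private static double[] magnitudes(short[] frame) {
        double[] mags = new double[frame.length / 2];
        for (int bin = 0; bin < mags.length; bin++) {
            double re = 0, im = 0;
            for (int n = 0; n < frame.length; n++) {
                double angle = 2 * Math.PI * bin * n / frame.length;
                re += frame[n] * Math.cos(angle);
                im -= frame[n] * Math.sin(angle);
            }
            mags[bin] = Math.hypot(re, im);
        }
        return mags;
    }

    /** One (frameIndex, peakBin) landmark per frame: a very sparse "constellation". */
    public static int[][] landmarks(short[] pcm) {
        int frames = pcm.length / FRAME_SIZE;
        int[][] points = new int[frames][2];
        for (int f = 0; f < frames; f++) {
            short[] frame = Arrays.copyOfRange(pcm, f * FRAME_SIZE, (f + 1) * FRAME_SIZE);
            double[] mags = magnitudes(frame);
            int peak = 0;
            for (int bin = 1; bin < mags.length; bin++) {
                if (mags[bin] > mags[peak]) peak = bin;
            }
            points[f][0] = f;
            points[f][1] = peak;
        }
        return points;
    }

    /** Combine a pair of nearby landmarks into a single hash value, the core trick of the paper. */
    public static long hash(int[] anchor, int[] target) {
        int deltaT = target[0] - anchor[0];
        return ((long) anchor[1] << 32) | ((long) target[1] << 16) | (deltaT & 0xFFFF);
    }
}
```

Hashes built this way are compared against a precomputed index of the reference tracks; a run of hashes agreeing on the same time offset indicates a match.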
If you need an audio analysis library written in Java, you could look into MusicG, for example, which is said to be simple to use on Android.
I want to make an Android application that allows the user to change their voice during a phone call. For example: if you are a man, you could change your voice to sound like a woman or a robot while talking on the phone. It's meant as a funny prank.
I have worked with Android's APIs and googled for a few days but still have no idea. Someone told me it is impossible, but I see some apps on Google Play that can do it:
https://play.google.com/store/apps/details?id=com.gridmob.android.funnycall
So I think there must be some way to do it.
I thought about recording and playing back using AudioTrack, but I have two more problems:
1. I cannot mute the original voice in the phone call, so the phone can only play my processed sound in addition to it.
2. Recording and processing introduce a long delay (slower than real time).
Can anyone share a solution for this?
The app you linked isn't changing voices on the phone: it uses SIP (or similar) to place a call through the authors' servers and the voice changing happens there. That's why you only get a small number of free minutes of use before you have to pay them.
Yes, it uses a SIP server to do this processing. The reason you cannot create an app that does this on the phone itself comes down to two things. The first is that sound processing for the call audio path is locked: you can't get at it because it is handled strictly in hardware, not in software you can hook into. A PC can do this because it uses a standard sound card whose frequencies software can modify. The second is that phone manufacturers are required to design their phones to a standard format, and there are laws that force these companies to make voice morphing on the call path impossible; it is against the law to impersonate someone you are not over a telephone network.
Hard way
You take the input voice, use speech recognition (speech-to-text) to detect the words, then use text-to-speech with your desired voice as the output.
Less hard way
Sound processing: changing frequencies, amplitude, etc.
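As a starting point for the less hard way, here is a minimal sketch of a crude pitch effect: record PCM from the microphone and play it back through AudioTrack at a different sample rate. This only changes audio your app recorded itself; as explained above, it won't inject the modified voice into a live call. The 16 kHz recording rate is an arbitrary choice and the app needs the RECORD_AUDIO permission.

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

public class SimpleVoiceChanger {

    private static final int RECORD_RATE = 16000;

    /** Record the given number of seconds of mono 16-bit PCM from the microphone. */
    public short[] record(int seconds) {
        int total = RECORD_RATE * seconds;
        short[] pcm = new short[total];
        int minBuf = AudioRecord.getMinBufferSize(RECORD_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, RECORD_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, Math.max(minBuf, 4096));
        recorder.startRecording();
        int offset = 0;
        while (offset < total) {
            int read = recorder.read(pcm, offset, total - offset);
            if (read <= 0) break;
            offset += read;
        }
        recorder.stop();
        recorder.release();
        return pcm;
    }

    /** pitchFactor > 1.0 sounds higher and faster, < 1.0 lower and slower. */
    public void playBack(short[] pcm, double pitchFactor) {
        int playbackRate = (int) (RECORD_RATE * pitchFactor);
        int minBuf = AudioTrack.getMinBufferSize(playbackRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, playbackRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                Math.max(minBuf, pcm.length * 2), AudioTrack.MODE_STREAM);
        track.play();
        track.write(pcm, 0, pcm.length);
        track.stop();
        track.release();
    }
}
```

Because resampling changes pitch and tempo together, the result sounds like a chipmunk or a slowed-down tape; a proper pitch shifter that preserves tempo needs real DSP (phase vocoder or similar).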
I'm currently working on an app whose end goal is to be roughly analogous to an Android version of AirPlay for iDevices.
Streaming media and all that is easy enough, but I'd like to be able to include games as well. The problem with that is that to do so I'd have to stream the screen.
I've looked around at various things about taking screenshots (this question and the derivatives from it in particular), but I'm concerned about the frequency/latency. When gaming, anything less than 15-20 fps simply isn't going to cut it, and I'm not certain such is possible with the methods I've seen so far.
Does anyone know if such a thing is feasible, and if so, what it would take?
Edit: To make it more clear, I'm basically trying to create a more limited form of "remote desktop" for Android. Essentially, capture what the device is currently doing (movie, game, whatever) and replicate it on another device.
My initial thoughts are to simply grab the audio buffer and the frame buffer and pass them through a socket to the other device, but I'm concerned that the methods I've seen for capturing the frame buffer are too slow for the intended use. I've seen people mention limits of around 3 FPS with some of the more common ways of accessing the frame buffer.
What I'm looking for is a way to get at the buffer without those limitations.
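For reference, the transport half of what I have in mind would be something like the sketch below: JPEG-compressed frames pushed over a plain TCP socket with a length prefix. The capture half, getting each frame fast enough, is exactly the part I don't know how to do, so it's left out, and the 60% JPEG quality is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

import android.graphics.Bitmap;

public class FrameSender {

    private final DataOutputStream out;

    public FrameSender(String host, int port) throws IOException {
        out = new DataOutputStream(new Socket(host, port).getOutputStream());
    }

    /** Compress one captured frame and push it with a simple length-prefixed framing. */
    public void sendFrame(Bitmap frame) throws IOException {
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        // Moderate JPEG quality keeps the per-frame size manageable over Wi-Fi
        frame.compress(Bitmap.CompressFormat.JPEG, 60, jpeg);
        byte[] bytes = jpeg.toByteArray();
        out.writeInt(bytes.length); // length prefix so the receiver knows where each frame ends
        out.write(bytes);
        out.flush();
    }

    public void close() throws IOException {
        out.close();
    }
}
```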
I am not sure what you are trying to accomplish when you refer to "streaming" a video game.
But if you are trying to mimic AirPlay, all you need to do is connect to a device via a Bluetooth or internet connection and allow sound, then save the results or handle them accordingly.
But video games do not "stream" a screen, because the mobile device cannot handle much of the workload. There are other problems too: how will you handle the game if the person loses their internet connection while playing? On top of that, this would require a lot of servers and bandwidth to support the game workload on the backend.
But if you are trying to create an online game, essentially all you need to do is send and receive messages from a server. That is simple. If you want to "stream" to another device, simply connect the mobile device to speakers or a TV. Just about all mobile video games and applications just send simple messages via JSON or something similar. This reduces overhead, keeps the syntax simple, and can be used across multiple platforms.
It sounds like you should take a look at this (repost):
https://stackoverflow.com/questions/2885533/where-to-start-game-programming-for-android
If not, this is more of an open question about how to implement a video game.