In my android application I need to capture the user's speech from the microphone and then pass it to the server. Currently, I use the MediaRecorder class. However, it doesn't satisfy my needs, because I want to make glowing effect, based on the current volume of input sound, so I need an AudioStream, or something like that, I guess. Currently, I use the following:
this.recorder = new MediaRecorder();
this.recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
this.recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
this.recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
this.recorder.setOutputFile(FILENAME);
I am writing using API level 7, so I don't see any other AudioEncoders, but AMR Narrow Band. Maybe that's the reason of awful noise which I hear in my recordings.
The second problem I am facing is poor sound quality, noise, so I want to reduct (cancel, suppress) it, because it is really awful, especially on my noname chinese tablet. This should be server-side, because, as far as I know, requiers a lot of resources, and not all of the modern gadgets (especially noname chinese tablets) can do that as fast as possible. I am free to choose, which platform to use on the server, so it can be ASP.NET, PHP, JSP, or whatever helps me to make the sound better. Speaking about ASP.NET, I have come across a library, called NAudio, may be it can help me in some way. I know, that there is no any noise reduction solution built in the library, but I have found some examples on FFT and auto-corellation using it, so it may help.
To be honest, I have never worked with sound this close before and I have no idea where to start. I have googled a lot about noise reduction techniques, code examples and found nothing. You guys are my last hope.
Thanks in advance.
Have a look at this article.
Long story short, it uses MediaRecorder.AudioSource.VOICE_RECOGNITION instead of AudioSource.MIC, which gave me really good results and noise in the background did reduce very much.
The great thing about this solution is, it can be used with both AudioRecord and MediaRecorder class.
For audio capture you can use the AudioRecord class. This lets you record raw audio, i.e. you are not restricted to "narrow band" and you can also measure the volume.
Many smartphones have two microphones, one is the MIC you are using, the other one is near camera for video shooting, called CAMCORDER. You can get data from both of them to do noise reduction. There are many papers talking about audio noise reduction with multiple microphones.
Ref: http://developer.android.com/reference/android/media/MediaRecorder.AudioSource.html
https://www.google.com/search?q=noise+reduction+algorithm+with+two+mic
Related
I have seen a application called ququer http://xququ.com
It can using sound beep to share message or files between mobile devices and some other devices.
I think the the infomation is encoded in some format into sound,but do not know how it could be done.
Are there some mature solution for this,especially for android?
There's two major ways to encode information in sound. Remember that sound is a wave at a certain frequency. You can either encode it in the volume of the sound (the amplitude of the wave) or in the frequency of the wave. There are called amplitude modulation and frequency modulation, or AM and FM. Just like radio, just in different frequency ranges.
AM wouldn't be too hard. The sender would beep a known frequency sound at either 50% volume or 100% volume, and the receiver would listen on the microphone, use a band pass filter to get that frequency, and measure the volume. FM would be a little harder, but it could use two sound files of slightly different frequencies and do the same thing- since we want binary data it actually isn't that hard.
I wanted to make an app like this, but instead of creating awful modem-y sounds make R2D2-like sounds. Never got around to it. Anyway, to answer your question: Gabe Sechan lists two ways that sound (or any wave) can be used to transmit information. A third is called Phase Modulation.
Together these three techniques (AM, FM and PM) are the staple of how the data from one signal can be imposed on another and transmitted, but they are examples of analog modulation. For this application, you want digital modulation. This is a bit out of my expertise, so I'll refer you to Wikipedia (though maybe someone else can give a more thorough answer here):
http://en.wikipedia.org/wiki/Modulation#Digital_modulation_methods
You might also want to ask on dsp.stackexchange.com for better starting points. There's a lot to know here, but maybe I've given you enough to google some open source librarires or at least ask the right question.
Of course, you can use the techniques Gabe Sechan suggested, and you might find them more intuitive. Indeed, many (most? all?) digital modulation techniques use analog modulation as a starting point. However, your datarates will probably be lower.
I have searched for this online, but am still a bit confused (as I'm sure others will be if they think of something like this). I'd like to preface by saying that this is not for homework and/or profit.
I wanted to create an app that could listen to your microwave as you prepare popcorn. It would work by sounding an alarm when there's a certain time interval between pops (say 5-6 seconds). Again, this is simply a project to keep me occupied - not for a class.
Either way, I'm having trouble trying to figure out how to analyze the audio intake in real-time. That is, I need a way to log the time when a "pop" occurs. So that you guys don't think I didn't do any research into the matter, I've checked out this SO question and have extensively searched the AudioRecord function list.
I'm thinking that I will probably have to do something with one of the versions of read() and then compare the recorded audio every 2 seconds or so to the recorded audio of a "pop" (i.e. if 70% or more of the byte[] audioData array is the same as that of a popping sound, then log the time). Can anyone with Android audio input experience let me know if I'm at least on the right track? This is not a question of me wanting you to code anything for me, but a question as to whether I'm on the correct track, and, if not, which direction I should head instead.
I think I have an easier way.
You could use the MediaRecorder 's getMaxAmplitude method.
Anytime your recorder detects a big jump in amplitude, you have detected a corn pop!
Check out this code (ignore the playback part): Playing back sound coming from microphone in real-time
Basically the idea is that you will have to take the value of each 16-bit sample (which corresponds to the value of the wave at that time). Using the sampling rate, you can calculate the time between peaks in volume. I think that might accomplish what you want.
this may be a bit overkill, but there is a framework from MIT media labs called funf: http://code.google.com/p/funf-open-sensing-framework/
They already created classes for audio input and some analysis (FFT and the like), also saving to files or uploading is implemented as far as I've seen, and they handle most of the sensors available on the phone.
You can also get inspired from the code they wrote, which I think is pretty good.
I was looking at Android's SoundPool as a mechanism to implement sound effects in my generic game development library. It seemed ideal.
But a little bit of research indicates that there all kinds of bugs in SoundPool. Are the bugs in SoundPool still relevant?
Because I'm developing a library, any bugs in SoundPool become bugs in my library, and I want to insulate my users from that.
So my question is basically: what API should I use for audio?
Using AudioTrack and writing my own mixer is not out of the question. But obviously it would be preferable to avoid doing that. And is there any API to provide decoding for me?
I need to be able to play a reasonable number of simultaneous sound effects (at least 16, let's say), and have even more open. Sounds need to start playing with low latency. WAV files need to be supported (MP3/Ogg is unimportant). Sound effects need to support seamless looping and dynamic, individual volume adjustment. The Android app lifecycle needs to be properly supported.
I have heard there is a 1MB limit somewhere for SoundPool, this is probably acceptable for each individual sound effect but not for all buffers/sounds. Can someone tell me exactly what the limit is on?
Finally, I need to be able to play background music as well, in compressed formats, with low CPU load. I assume MediaPlayer is ideal for this. Can it be used in parallel with another API?
I know a few people have been using MediaPlayer to fill in for SoundPool. But does it support the features that I need?
Are there any other audio APIs I've missed?
Just to add some more recent feedback on this issue. I've been using SoundPool for some time in an app with a fairly large user base for key press sounds. Our use case:
Must be played immediately
Up to 3+ sounds in parallel
We make use of the setRate across it's full range [0.5f-2.0f]
I've now experienced two major device specific issue and have decided to cut my losses and switch away from SoundPool
A large number of 4.4 LG devices (mostly the LG G2/G3 line) were having a native crash with their implementation of SoundPool. This was fixed in an update (eventually) but we still have a lot of users with un-upgraded devices
Sony Xperia devices currently have all sorts of issue with SoundPool as reported by others. In my case, I've discovered that if you use setRate with rate > 1.0f the SoundPool with start throwing exceptions until your app quits (and burn through a bunch of battery in the process).
TL;DR; I no longer think it's worth the danger/hassle of debugging SoundPool
Stick with OGG files and SoundPool will do you just fine. It's the nature of the multi-platform beast that is Android that there WILL be hardware configurations that will not work with every significant program, no matter how diligently the programmers try.
If this is a large and well-funded project, add to the funding one of each major phone for testing. It's actually much cheaper than the programmer time spent researching and trying to guess what their performance is.
Sorry. Seems as if this isn't the answer that you were looking for. Good luck!
DISCLAIMER: I have a small amount of experience with MediaPlayer, and no successful experience with the other APIs I mention, and the following information is based on what I've read in the DOCs and what I've read from google searches.
You could use mediaplayer (for the background music) with other audio APIs, since MediaPlayer automatically runs on it's own thread, but I believe it has a high-ish cpu load, and I don't think it would take compressed bits very well, but I'm not too sure.
There's also JetPlayer http://developer.android.com/reference/android/media/JetPlayer.html which seems like a lot of work to use effectively, but it would work very well with playing background music, then playing other sounds as needed in the game. From what I read of the DOCs, it takes a MIDI file (I think?) and you mute and unmute tracks to make it work how you want it to.
I like AudioTrack because it gives you the ability to edit sounds at runtime by changing the frequencies of the sound, and SoundPool can do the same.
Though for your situation, AudioTrack doesn't seem like it would work well, since playing two sounds would require two threads because AudioTrack is blocking (I'm pretty sure).
And with SoundPool, I'm thinking that since you have 16 sounds, maybe take two threads with one SoundPool in each thread and apply 8 sounds to each SoundPool. I don't really know though, as I've never even tried using SoundPool.
And again, my information is not based on experience, just what it appears from what I've read, so I may be completely or maybe just slightly wrong, or heck, who knows.
And I don't really know anything about the SoundPool bugs, since I haven't researched it.
I would like to listen to the mic (I guess using AudioRecord) and perform some action the very moment a person starts to speak. I know I can buffer audio with AudioRecord, but how do I analyze it ?
Well, the difficult part will be getting the phone to recognize that it's voice. You can set the voice recognition system as the input, instead of the mic, which might be able to do that. I don't think so though, because (I actually read all about this yesterday) the phone doesn't actually do the recognizing, it just opens up a live stream (like a phone call) to the Google servers, and they do the recognizing.
Also, the information that I have found so far points to the conclusion that Android does not support analysis of live audio from the mic. All these other apps that seem to be "live" are actually just taking a bunch of small samples and analyzing them really quickly so that they seem live. A 500 millisecond sample every 300 milliseconds seems to be common.
Luckily, on the side of my programming job, I'm also a sound technician, so I can tell you that (if you were willing to put in the work) there is a way to detect actual voice as opposed to just sound. Every voice is split into a few distinct ratios of frequencies which all combine to make the voice we hear, and every voice's ratios remains pretty constant, while each individual voice's ratios are different (which is why voice-based passwords work). So, if you were able to take a sample, break it up into frequencies of about 10hz each, and watch for the amplitude of each, and when you got a frequency/amplitude pattern that looked similar to a voice instead of just "white noise", you'd be in business. DOING that however, doesn't seem like it'd be easy at all. Something similar has been done before with the app called SpectralView, which displays the audio spectrum all broken up.
Also, as you can see by using the Voice Search, a voice also fluctuates a lot in how loud it is. You could look for that, but it wouldn't be as reliable.
In conclusion, how do you analyze it? Well, you would have to look for a pattern in the frequencies that looks like a voice. How do you do that? Well, to be honest, I don't know for sure. Sorry.
I'm trying to build a gadget that detects pistol shots using Android. It's a part of a training aid for pistol shooters that tells how the shots are distributed in time and I use a HTC Tattoo for testing.
I use the MediaRecorder and its getMaxAmplitude method to get the highest amplitude during the last 1/100 s but it does not work as expected; speech gives me values from getMaxAmplitude in the range from 0 to about 25000 while the pistol shots (or shouting!) only reaches about 15000. With a sampling frequency of 8kHz there should be some samples with considerably high level.
Anyone who knows how these things work? Are there filters that are applied before registering the max amplitude. If so, is it hardware or software?
Thanks,
/George
It seems there's an AGC (Automatic Gain Control) filter in place. You should also be able to identify the shot by its frequency characteristics. I would expect it to show up across most of the audible spectrum, but get a spectrum analyzer (there are a few on the app market, like SpectralView) and try identifying the event by its frequency "signature" and amplitude. If you clap your hands what do you get for max amplitude? You could also try covering the phone with something to muffle the sound like a few layers of cloth
It seems like AGC is in the media recorder. When I use AudioRecord I can detect shots using the amplitude even though it sometimes reacts on sounds other than shots. This is not a problem since the shooter usually doesn't make any other noise while shooting.
But I will do some FFT too to get it perfect :-)
Sounds like you figured out your agc problem. One further suggestion: I'm not sure the FFT is the right tool for the job. You might have better detection and lower CPU use with a sliding power estimator.
e.g.
signal => square => moving average => peak detection
All of the above can be implemented very efficiently using fixed point math, which fits well with mobile android platforms.
You can find more info by searching for "Parseval's Theorem" and "CIC filter" (cascaded integrator comb)
Sorry for the late response; I didn't see this question until I started searching for a different problem...
I have started an application to do what I think you're attempting. It's an audio-based lap timer (button to start/stop recording, and loud audio noises for lap setting). It' not finished, but might provide you with a decent base to get started.
Right now, it allows you to monitor the signal volume coming from the mic, and set the ambient noise amount. It's also using the new BSD license, so feel free to check out the code here: http://code.google.com/p/audio-timer/. It's set up to use the 1.5 API to include as many devices as possible.
It's not finished, in that it has two main issues:
The audio capture doesn't currently work for emulated devices because of the unsupported frequency requested
The timer functionality doesn't work yet - was focusing on getting the audio capture first.
I'm looking into the frequency support, but Android doesn't seem to have a way to find out which frequencies are supported without trial and error per-device.
I also have on my local dev machine some extra code to create a layout for the listview items to display "lap" information. Got sidetracked by the frequency problem though. But since the display and audio capture are pretty much done, using the system time to fill in the display values for timing information should be relatively straightforward, and then it shouldn't be too difficult to add the ability to export the data table to a CSV on the SD card.
Let me know if you want to join this project, or if you have any questions.