I'm using Android's MediaCodec class to read raw data from audio files. That works just fine.
The problem is that I don't know whether it's safe to assume that the output data will always be 16-bit.
I can tell, experimentally, that the output is 16-bit, but I don't know how to check that at runtime. The MediaCodec documentation doesn't appear to tell me. The MediaFormat KEY_CHANNEL_MASK could tell me, but MediaCodec doesn't appear to set those flags. It sets the sample rate and the MIME type, but nothing that states the bit depth explicitly.
I suppose that given the difference between presentation times of subsequent blocks, and the sample rate, I should be able to calculate it, but that doesn't seem very satisfactory.
Is there a way to tell, or is it written somewhere that I don't have to?
Currently the output is always 16 bit in stock Android. If that changes in the future we'll add an additional format key that specifies the format. Note that KEY_CHANNEL_MASK would only tell you which channels are included (e.g. left, right, center, etc), not the sample format.
No, it does not. If you give it 24-bit PCM (WAV) files, it will give you 24-bit audio, and there is seemingly no way to determine this. I have asked a follow-up question here: MediaCodec and 24 bit PCM
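For readers who land here later: a format key for this did eventually appear, MediaFormat.KEY_PCM_ENCODING (API 24+). A minimal defensive check, falling back to the historical 16-bit default when the key is absent:

import android.media.AudioFormat;
import android.media.MediaFormat;

// Call with the MediaFormat obtained after INFO_OUTPUT_FORMAT_CHANGED.
static int pcmEncodingOf(MediaFormat outputFormat) {
    if (outputFormat.containsKey(MediaFormat.KEY_PCM_ENCODING)) {
        // e.g. AudioFormat.ENCODING_PCM_16BIT or ENCODING_PCM_FLOAT
        return outputFormat.getInteger(MediaFormat.KEY_PCM_ENCODING);
    }
    return AudioFormat.ENCODING_PCM_16BIT;  // stock decoders' long-standing default
}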
Related
I'm building an app where it's important to have accurate seeking in MP3 files.
Currently, I'm using ExoPlayer in the following manner:
public void playPeriod(long startPositionMs, long endPositionMs) {
    // ClippingMediaSource takes positions in microseconds, hence the * 1000
    MediaSource mediaSource = new ClippingMediaSource(
            new ExtractorMediaSource.Factory(mDataSourceFactory).createMediaSource(mFileUri),
            startPositionMs * 1000,
            endPositionMs * 1000
    );
    mExoPlayer.prepare(mediaSource);
    mExoPlayer.setPlayWhenReady(true);
}
In some cases, this approach results in offsets of 1-3 seconds relative to the expected playback times.
I found this issue on ExoPlayer's GitHub. It looks like this is an intrinsic limitation of ExoPlayer with the MP3 format and it won't be fixed.
I also found this question, which seems to suggest that the same issue exists in Android's native MediaPlayer and MediaExtractor.
Is there a way to perform accurate seeking in local (i.e. on-device) MP3 files on Android? I'm more than willing to take any hack or workaround.
MP3 files are not inherently seekable. They don't contain any timestamps. It's just a series of MPEG frames, one after the other. That makes this tricky. There are two methods for seeking an MP3, each with some tradeoffs.
The most common (and fastest) method is to read the bitrate from the first frame header (or perhaps the average bitrate from the first few frame headers), say 128 kbps. Then take the byte length of the entire file, multiply by 8, and divide by this bitrate to estimate the time length of the file. Then let the user seek into the file. If they seek 1:00 into a 2:00 file, jump to the byte at the 50% mark of the file and "needle drop" into the stream. Read until the sync word of the next frame header comes by, and then begin decoding.
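A sketch of that arithmetic in Java (the names are mine; it assumes a constant bitrate taken from the first frame header):

// Estimate where to "needle drop" for a given seek time in a CBR stream.
static long estimateByteOffset(long fileSizeBytes, int bitrateBitsPerSec, long seekMs) {
    // Estimated duration: bits in the file divided by bits per second.
    long durationMs = fileSizeBytes * 8L * 1000L / bitrateBitsPerSec;
    // Proportional position in the byte stream; scan forward from here
    // to the next frame sync word before decoding.
    return fileSizeBytes * seekMs / durationMs;
}

For a true CBR file this collapses to seekMs * bitrateBitsPerSec / 8000, but the two-step form mirrors the description above.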
As you can imagine, this method isn't accurate. At best you'll land within half a frame of the target on average, and with frames of 1152 samples (576 for MPEG-2) that part is pretty accurate. However, there are problems with calculating the needle-drop point in the first place. The most common issue is that ID3 tags and such add size to the file, throwing off the size calculation. A more severe issue is a variable bitrate (VBR) file: if the music is encoded with VBR and the beginning of the track is silent-ish or otherwise easy to encode, the beginning might be 32 kbps whereas one second in might be 320 kbps. That's a 10x error in calculating the time length of the file!
The second method is to decode the whole file to raw PCM samples. This means you can guarantee sample-accurate seeking, but you must decode at least up to the seek point. If you want a proper time length for the full track, you must decode the whole file. Some 20 years ago, this was painfully slow. Seeking into a track would take almost as long as listening to the track to the point you were seeking to! These days, for short files, you can probably decode them so fast that it doesn't matter so much.
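Mapping a seek time into the decoded buffer is then exact arithmetic; a minimal sketch, assuming 16-bit interleaved PCM:

// Byte offset of a seek position within decoded 16-bit interleaved PCM.
static long pcmByteOffset(long seekMs, int sampleRateHz, int channelCount) {
    long frameIndex = seekMs * sampleRateHz / 1000L; // PCM frames from the start
    return frameIndex * channelCount * 2L;           // 2 bytes per 16-bit sample
}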
TL;DR: if you must have sample-accurate seeking, decode the files before putting them in your player, but understand the performance penalty before committing to that tradeoff.
For those who might come across this issue in the future, I ended up simply converting mp3 to m4a. This was the simplest solution in my specific case.
Constant-bitrate MP3s are better. The system I used was to record the sample offset of each frame header in the MP3 into a list. To seek, I would jump to the closest frame header before the desired sample using the values in the list, and then read from that location up to the desired sample. This works fairly well, but not perfectly, as the rendered waveform is decoded from the reference frame rather than from the values you would get by decoding from the start of the file. If accuracy is required, use libmpg123; it appears to be almost sample-accurate. Note: check the licensing if this is for a commercial app.
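A rough sketch of that frame-index scheme (the names are hypothetical, and the two lists are assumed to have been filled by a prior scan of the frame headers):

import java.util.Collections;
import java.util.List;

// byteOffsets.get(i) is the file position of frame i's header;
// sampleOffsets.get(i) is the first PCM sample that frame decodes to.
static long seekByteOffset(List<Long> byteOffsets, List<Long> sampleOffsets, long targetSample) {
    int i = Collections.binarySearch(sampleOffsets, targetSample);
    if (i < 0) i = -i - 2;      // closest frame at or before the target sample
    if (i < 0) i = 0;
    return byteOffsets.get(i);  // decode from here, then skip ahead to targetSample
}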
I'd like to analyze a piece of a recorded sound sample and find its properties, like pitch and so on.
I have tried to analyze the recorded bytes of the buffer with no success.
How can it be done?
You will have to look into FFT (the fast Fourier transform).
Then do something like this pseudocode indicates:
Complex in[1024];
Complex out[1024];
// copy 1024 samples of your signal into in
FFT(in, out);
// for every member of out, compute the magnitude sqrt(a^2 + b^2)
// to find the frequency with the highest power, scan for the maximum
// magnitude among the first 512 points of out (for real input, the upper half mirrors the lower)
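In Java, one way to turn that pseudocode into working code is with an FFT library; here is a sketch using JTransforms (the library choice is mine, not the original answer's; any FFT implementation will do):

import org.jtransforms.fft.DoubleFFT_1D;

// Returns the index of the strongest frequency bin; convert to Hz with
// bin * sampleRate / samples.length.
static int dominantBin(double[] samples) {
    double[] buf = samples.clone();
    new DoubleFFT_1D(samples.length).realForward(buf);
    int best = 1;
    double bestPower = 0;
    // realForward packs bin k as (buf[2k], buf[2k+1]) = (real, imaginary)
    for (int k = 1; k < samples.length / 2; k++) {
        double re = buf[2 * k];
        double im = buf[2 * k + 1];
        double power = re * re + im * im;
        if (power > bestPower) {
            bestPower = power;
            best = k;
        }
    }
    return best;
}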
Also check out the other user's original post here, since this question is probably a duplicate.
Use a fast Fourier transform; libraries are available for most languages. Raw bytes on their own are no good, since they could be MP3-encoded or WAV/PCM: you need to decode first, then analyze.
On Android 4.1 and above, I am using MediaCodec framework to decode H264 data. I see the codec instance that I'm using (via createDecoderByType) supports multiple color-formats. However, it always gives the output in the 1st-indexed color-format (from its supported list).
Is there a way to force the decoder to give out decoded data in a particular color format from the ColorFormats it supports? I know the developer docs do mention that the key KEY_COLOR_FORMAT can only be set for encoders, but then help me understand: what is the rationale for having multiple supported color formats for decoders?
No, there is currently no way to specify the color format for the decoder output.
This is especially annoying on devices that use undocumented proprietary buffer layouts.
Directing the output to a Surface results in more consistent and portable behavior, but as of API 19 there's still no convenient way to get at the pixel data (ImageReader doesn't work with MediaCodec output formats, glReadPixels() can be slow and works in RGB, etc). If you can do what you need with OpenGL shaders then things work pretty well (see e.g. the effects in "show + capture camera").
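You can at least discover which format the decoder picked rather than forcing one; a minimal sketch (call this after the first INFO_OUTPUT_FORMAT_CHANGED):

import android.media.MediaCodec;
import android.media.MediaFormat;

// Reports the color format the decoder actually chose for its output buffers,
// one of the MediaCodecInfo.CodecCapabilities.COLOR_Format* constants.
static int actualColorFormat(MediaCodec decoder) {
    return decoder.getOutputFormat().getInteger(MediaFormat.KEY_COLOR_FORMAT);
}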
In my Android application I need to capture the user's speech from the microphone and then pass it to the server. Currently, I use the MediaRecorder class. However, it doesn't satisfy my needs, because I want to make a glowing effect based on the current volume of the input sound, so I need an audio stream, or something like that, I guess. Currently, I use the following:
this.recorder = new MediaRecorder();
this.recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
this.recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
this.recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
this.recorder.setOutputFile(FILENAME);
I am targeting API level 7, so I don't see any audio encoders other than AMR Narrow Band. Maybe that's the reason for the awful noise I hear in my recordings.
The second problem I am facing is poor sound quality and noise, which I want to reduce (cancel, suppress), because it is really awful, especially on my no-name Chinese tablet. This should be done server-side because, as far as I know, noise reduction requires a lot of resources, and not all modern gadgets (especially no-name Chinese tablets) can do it quickly. I am free to choose the server platform, so it can be ASP.NET, PHP, JSP, or whatever helps me make the sound better. Speaking of ASP.NET, I have come across a library called NAudio; maybe it can help me in some way. I know there is no noise-reduction solution built into the library, but I have found some examples of FFT and auto-correlation using it, so it may help.
To be honest, I have never worked with sound this close before and I have no idea where to start. I have googled a lot about noise reduction techniques, code examples and found nothing. You guys are my last hope.
Thanks in advance.
Have a look at this article.
Long story short, it uses MediaRecorder.AudioSource.VOICE_RECOGNITION instead of AudioSource.MIC, which gave me really good results: background noise was reduced considerably.
The great thing about this solution is that it can be used with both the AudioRecord and the MediaRecorder class.
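For illustration, the change is a one-liner in each case (the sample rate and buffer setup below are my own example values):

// MediaRecorder:
recorder.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION);

// AudioRecord, e.g. 8 kHz mono 16-bit:
int minBuf = AudioRecord.getMinBufferSize(8000,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord audioRecord = new AudioRecord(
        MediaRecorder.AudioSource.VOICE_RECOGNITION,
        8000, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, minBuf);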
For audio capture you can use the AudioRecord class. This lets you record raw audio, i.e. you are not restricted to "narrow band" and you can also measure the volume.
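A sketch of what that looks like (the buffer size is an arbitrary choice; audioRecord is an already-started AudioRecord):

short[] buffer = new short[1024];
int read = audioRecord.read(buffer, 0, buffer.length);

// Peak amplitude in this buffer (0..32767) -- enough to drive a glow effect.
int peak = 0;
for (int i = 0; i < read; i++) {
    peak = Math.max(peak, Math.abs((int) buffer[i]));
}
float level = peak / 32767f;  // normalized 0..1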
Many smartphones have two microphones: one is the MIC you are using; the other, near the camera for video shooting, is called CAMCORDER. You can get data from both of them to do noise reduction. There are many papers on audio noise reduction with multiple microphones.
Ref: http://developer.android.com/reference/android/media/MediaRecorder.AudioSource.html
https://www.google.com/search?q=noise+reduction+algorithm+with+two+mic
I want to find out the level of noise around the phone. There doesn't seem to be an easy built-in way to do this, so I've found a few examples that say to use AudioRecord to listen for a brief time and then use a formula to get the decibel level. I have a few questions, though, that the documentation doesn't seem to explain, and I was wondering if you kind folk could help me understand.
What's captured in the array? AudioRecord takes several arguments (channel configuration, audio format, and sample rate) and stores what it records into an array. I can see the raw numbers, but what do they actually mean?
Is there a resource that you could point me to, or could you explain to me, how to convert (what I assume to be) the raw audio in byte form into a decibel representation?
The part I don't understand is #1: what is actually put in the array? Any help would be appreciated.
Found a great resource that converts the amplitude of the sound at the mic to a 0-9 range.
SoundMeter is by Google and works well enough for my purpose.
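On question #1: with ENCODING_PCM_16BIT, each array entry is a signed 16-bit PCM amplitude sample. A common way to express that as decibels relative to full scale (my own sketch, not SoundMeter's code):

// buffer holds signed 16-bit PCM samples read from an AudioRecord (assumed setup).
short[] buffer = new short[1024];
int read = audioRecord.read(buffer, 0, buffer.length);

double sumOfSquares = 0;
for (int i = 0; i < read; i++) {
    sumOfSquares += (double) buffer[i] * buffer[i];
}
double rms = Math.sqrt(sumOfSquares / Math.max(read, 1));
// 0 dBFS is the loudest a 16-bit sample can represent; silence tends toward -90 dB.
double dbfs = 20 * Math.log10(Math.max(rms, 1.0) / 32768.0);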