I'd like to analyze a piece of a recorded sound sample and find its properties, such as pitch.
I have tried to analyze the recorded bytes of the buffer with no success.
How can this be done?
You will have to look into the FFT (fast Fourier transform).
Then do something like this pseudocode:
Complex in[1024];
Complex out[1024];
// copy your signal into in
FFT(in, out);
// for every element of out, compute the magnitude sqrt(a^2 + b^2)
// to find the frequency with the highest power, scan for the maximum value in the first 512 points of out
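The steps above can be sketched as runnable code. This is a hypothetical example, not from the original answer: a naive O(n²) DFT stands in for a real FFT (fine for 1024 samples; use a proper FFT library for anything bigger), and a synthetic 440 Hz sine stands in for the recorded buffer.

```java
// Estimate the dominant frequency of a PCM buffer via a magnitude spectrum.
public class PitchSketch {
    // Magnitude spectrum of the first half of the bins (the rest mirror them).
    static double[] magnitudes(double[] signal) {
        int n = signal.length;
        double[] mags = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im -= signal[t] * Math.sin(angle);
            }
            mags[k] = Math.sqrt(re * re + im * im);   // sqrt(a^2 + b^2)
        }
        return mags;
    }

    // Bin with the highest power, converted to Hz.
    static double dominantFrequency(double[] signal, double sampleRate) {
        double[] mags = magnitudes(signal);
        int peak = 1;                                  // skip the DC bin
        for (int k = 2; k < mags.length; k++) {
            if (mags[k] > mags[peak]) peak = k;
        }
        return peak * sampleRate / signal.length;
    }

    public static void main(String[] args) {
        // 440 Hz sine sampled at 8 kHz.
        double sampleRate = 8000;
        double[] signal = new double[1024];
        for (int t = 0; t < signal.length; t++) {
            signal[t] = Math.sin(2 * Math.PI * 440 * t / sampleRate);
        }
        System.out.println(dominantFrequency(signal, sampleRate));
    }
}
```

Note that the resolution is sampleRate / n per bin (about 7.8 Hz here), so the peak bin only approximates the true pitch; interpolating around the peak is a common refinement.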
Also check out the original post linked here, since this question is probably a duplicate.
Use a fast Fourier transform; libraries are available for most languages. Raw bytes are no good on their own: they could be MP3-encoded or WAV/PCM, so you need to determine the format and decode first, then analyze.
DG
I am using noise_meter to read noise in decibels. When I run the app it records almost 120 readings per second. I don't want that many readings; is there any way to specify that I want only one or two readings per second? Thanks in advance. (noise_meter package.)
I am using code from GitHub which is already written using noise_meter: github repo noise_meter example
I tried to calculate the number of samples using the sample rate, which is 44100 in the package, but I couldn't make sense of it.
As you can see in the source code, audio_streamer uses a fixed-size buffer and an audio sample rate of 44100, and includes this comment: "Uses a buffer array of size 512. Whenever the buffer is full, the content is sent to Flutter." So, small audio blocks will arrive at the consumer frequently (as you might expect from a streamer). It doesn't seem possible to adjust this.
The noise_meter package simply takes each block of audio and calculates its noise level, so readings arrive at exactly the same rate as audio blocks from the underlying package.
Given the simplicity of the noise_meter calculation, you could replace it with your own code directly on top of audio_streamer: just collect multiple blocks of audio together before performing the simple decibel calculation.
Alternatively, you could simply discard N out of every N+1 readings.
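The noise_meter code itself is Dart, but the aggregation idea is language-neutral. Here is a hedged sketch in Java with an assumed block size and sample rate, showing one decibel reading computed per second's worth of collected blocks rather than one per 512-sample block.

```java
// Collect many small audio blocks, then compute a single decibel reading.
public class BlockAggregator {
    // dB relative to full scale for 16-bit PCM: 20*log10(rms / 32767).
    static double dbfs(double[] samples) {
        double sumSquares = 0;
        for (double s : samples) sumSquares += s * s;
        double rms = Math.sqrt(sumSquares / samples.length);
        return 20 * Math.log10(rms / 32767.0);
    }

    public static void main(String[] args) {
        int blockSize = 512, sampleRate = 44100;          // assumed values
        int blocksPerReading = sampleRate / blockSize;    // ~1 reading per second
        double[] buffer = new double[blocksPerReading * blockSize];
        int filled = 0;
        // Imagine each incoming 512-sample block being appended here; once the
        // buffer is full, compute one reading and start over.
        for (int b = 0; b < blocksPerReading; b++) {
            for (int i = 0; i < blockSize; i++) {
                buffer[filled++] = 1000 * Math.sin(2 * Math.PI * i / blockSize); // fake audio
            }
        }
        System.out.println(dbfs(buffer));  // one decibel reading per second of audio
    }
}
```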
I'm building an app where it's important to have accurate seeking in MP3 files.
Currently, I'm using ExoPlayer in the following manner:
public void playPeriod(long startPositionMs, long endPositionMs) {
    MediaSource mediaSource = new ClippingMediaSource(
            new ExtractorMediaSource.Factory(mDataSourceFactory).createMediaSource(mFileUri),
            startPositionMs * 1000,
            endPositionMs * 1000
    );
    mExoPlayer.prepare(mediaSource);
    mExoPlayer.setPlayWhenReady(true);
}
In some cases, this approach results in offsets of 1-3 seconds relative to the expected playback times.
I found this issue on ExoPlayer's GitHub. It looks like this is an intrinsic limitation of ExoPlayer with the MP3 format, and it won't be fixed.
I also found this question, which seems to suggest that the same issue exists in Android's native MediaPlayer and MediaExtractor.
Is there a way to perform accurate seeking in local (i.e. on-device) MP3 files on Android? I'm more than willing to take any hack or workaround.
MP3 files are not inherently seekable. They don't contain any timestamps. It's just a series of MPEG frames, one after the other. That makes this tricky. There are two methods for seeking an MP3, each with some tradeoffs.
The most common (and fastest) method is to read the bitrate from the first frame header (or perhaps the average bitrate from the first few frame headers), say 128 kbps. Then take the byte length of the entire file and divide it by this bitrate to estimate the time length of the file. Then let the user seek into the file: if they seek 1:00 into a 2:00 file, jump to the 50% mark of the file's bytes and "needle drop" into the stream. Read the file until a sync word for the next frame header comes by, and then begin decoding.
As you can imagine, this method isn't accurate. At best, you're going to land within half a frame of the target on average. With frame sizes of 576 samples, that part is pretty accurate. However, there are problems with calculating the needle-drop point in the first place. The most common issue is that ID3 tags and the like add size to the file, throwing off the size calculation. A more severe issue is a variable bitrate (VBR) file: if the music is encoded with VBR and the beginning of the track is silent-ish or otherwise easy to encode, the beginning might be 32 kbps whereas one second in might be 320 kbps, a 10x error in calculating the time length of the file!
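Here is a minimal sketch of the needle-drop estimate, under the stated assumptions (constant bitrate, no ID3 tags); the names and the numbers in main are illustrative only.

```java
// Estimate duration and a seek byte offset from file size and bitrate alone.
public class NeedleDrop {
    // Estimated duration in seconds from file size and bitrate (bits/second).
    static double estimatedDurationSec(long fileSizeBytes, int bitrateBps) {
        return fileSizeBytes * 8.0 / bitrateBps;
    }

    // Byte offset to start scanning from for a given seek target; the decoder
    // should then search forward for the next frame sync word before decoding.
    static long seekByteOffset(long fileSizeBytes, int bitrateBps, double targetSec) {
        double duration = estimatedDurationSec(fileSizeBytes, bitrateBps);
        double fraction = Math.min(1.0, targetSec / duration);
        return (long) (fileSizeBytes * fraction);
    }

    public static void main(String[] args) {
        long size = 1_920_000;        // a 2:00 file at 128 kbps is ~1.92 MB
        int bitrate = 128_000;
        System.out.println(estimatedDurationSec(size, bitrate)); // 120.0 seconds
        System.out.println(seekByteOffset(size, bitrate, 60));   // halfway: 960000
    }
}
```

Both of the failure modes described above (tag bytes inflating fileSizeBytes, VBR making the first-frame bitrate unrepresentative) corrupt these two formulas directly.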
The second method is to decode the whole file to raw PCM samples. This means you can guarantee sample-accurate seeking, but you must decode at least up to the seek point. If you want a proper time length for the full track, you must decode the whole file. Some 20 years ago, this was painfully slow. Seeking into a track would take almost as long as listening to the track to the point you were seeking to! These days, for short files, you can probably decode them so fast that it doesn't matter so much.
TL;DR: If you must have sample-accurate seeking, decode the files before putting them in your player, but understand the performance penalty before committing to this tradeoff.
For those who might come across this issue in the future: I ended up simply converting MP3 to M4A. This was the simplest solution in my specific case.
Constant-bitrate MP3s are better. The system I used was to record the sample offset of each frame header in the MP3 into a list. Then, to seek, I would use the values in the list to jump to the closest frame header before the desired sample, and read from that location to my desired sample. This works fairly well but not perfectly, since the rendered waveform is decoded from the reference frame rather than from the values you would get decoding from the start of the file. If accuracy is required, use libmpg123; it appears to be almost sample-accurate. Note: check the licensing if this is for a commercial app.
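A sketch of that frame-index idea, with assumed names and a fixed 1152 samples per frame (the usual MPEG-1 Layer III frame size; real code would derive this from the frame headers while scanning for sync words):

```java
import java.util.ArrayList;
import java.util.List;

// Index frame start positions, then seek to the last frame at or before the target.
public class FrameIndex {
    static final int SAMPLES_PER_FRAME = 1152;

    // Sample offset of each frame header (in a real app, filled while scanning
    // the file for sync words; here generated for a constant-bitrate file).
    static List<Long> buildIndex(int frameCount) {
        List<Long> index = new ArrayList<>();
        for (int i = 0; i < frameCount; i++) index.add((long) i * SAMPLES_PER_FRAME);
        return index;
    }

    // Index of the closest frame starting at or before the target sample.
    static int frameForSample(List<Long> index, long targetSample) {
        int lo = 0, hi = index.size() - 1, best = 0;
        while (lo <= hi) {                       // binary search
            int mid = (lo + hi) / 2;
            if (index.get(mid) <= targetSample) { best = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> index = buildIndex(1000);
        int frame = frameForSample(index, 50_000);
        System.out.println(frame);               // 50000 / 1152 -> frame 43
        System.out.println(index.get(frame));    // decoding starts here: 49536
    }
}
```

After jumping to the indexed frame, you decode forward and discard the first (targetSample - frameStart) samples to land on the exact sample.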
I'm making an application in which I record audio directly from the microphone of a cell phone. I save that recording and need to compare it with some audio already stored on the device.
The audio clips are of engine noise; the idea is that the app tells us which of the saved cases the recording resembles.
That is, I have two cases, a good engine and a damaged engine. When I finish recording, the app must say "this audio belongs to a damaged engine".
Reading around, I find that this can be done with artificial intelligence, which is really complex. I have also read that you can "decompose" the audio into a vector of numbers, or make comparisons via FFT; however, I can't find much information about it, so I'd really appreciate your help.
The saved file type is .wav.
It's a nontrivial task to compare audio signals.
Audio is just a sequence of values (numbers), where the index represents time and the value is the amplitude (loudness) of the sound.
If you compare audio data like two arrays, element by element, iterating through the index, you would need luck to get anything reasonable. Instead, you need some transformation of the array that aggregates information about the sequence as a whole (for example, the signal's spectrum).
There are mathematical tools for this task: for example, the well-known Fourier transform you mentioned, and the statistical tool of correlation (which measures the similarity of sequences of numbers).
The correlation method can be relatively simple: you iterate over the two arrays of data and calculate a correlation value. But you pay for the simplicity with requirements on the initial quality (or preparation/normalization) of the signals: they should have similar durations. The resulting correlation value shows how similar the two sequences are, i.e. 0 means absolutely different and 1 means almost the same.
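A minimal sketch of such a correlation measure (normalized, zero-mean; all names here are illustrative): values near 1 mean the two signals have nearly the same shape, values near 0 mean they are unrelated.

```java
// Normalized correlation between two equal-length signals.
public class Correlation {
    static double normalizedCorrelation(double[] a, double[] b) {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length; meanB /= b.length;
        double num = 0, denA = 0, denB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            num += da * db; denA += da * da; denB += db * db;
        }
        return num / Math.sqrt(denA * denB);
    }

    public static void main(String[] args) {
        double[] x = new double[256], y = new double[256], z = new double[256];
        for (int i = 0; i < 256; i++) {
            x[i] = Math.sin(2 * Math.PI * 4 * i / 256);   // 4 cycles
            y[i] = 0.5 * x[i];                            // same shape, quieter
            z[i] = Math.sin(2 * Math.PI * 5 * i / 256);   // different frequency
        }
        System.out.println(normalizedCorrelation(x, y));  // ~1.0
        System.out.println(normalizedCorrelation(x, z));  // ~0.0
    }
}
```

Note this only works directly when the signals are aligned and of equal length; in practice you'd compare spectra or slide one signal against the other.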
Implementing the Fourier transform (FFT) is not a problem either: you can take a well-described algorithm and implement it yourself in any language without third-party libs. It does the job very well.
The FT gives you the spectrum of the signal, i.e. another set of values: amplitudes per frequency (roughly, with frequency as the array index instead of time, as in the raw input signal). Now you can compare the given spectra almost like two arrays, iterating through the index (frequency), and decide on their similarity: calculate the deltas and see whether they fall within some acceptance interval (or use more rigorous statistical methods, e.g. a correlation function).
As for noisy signals, the noise is usually subtracted from the data set (but for that you need to know the type of noise).
This is all in the signal-processing area, and if you're working on such a project you'll need to learn more about it.
Bonus: a book for example
I am trying to extract SMPTE timecode (wikipedia) from an audio input stream in Android.
As mentioned here https://stackoverflow.com/a/2099226, the first step is to scan the input byte sequence for 0011111111111101 to synchronize. But how do I do this with the AudioRecord class?
That answer isn't really correct. The audio signal you are getting is a modulated carrier wave, and extracting SMPTE bits from it is a multi-step process: the raw data you get through the mic or line-in isn't going to correspond directly to SMPTE timecode. You need to decode the audio, which is not at all simple.
The first step is to convert your audio signal from biphase mark code. I haven't implemented a SMPTE reader myself, but you know the clock rate from the SMPTE standard, so the first thing I would do is filter carefully to get rid of background noise, since it sounds like you are taking the audio in from the mic. A gentle high-pass to remove any DC offset should do, and a gentle low-pass for HF noise should also help. (You could instead use a broad band-pass.)
Then you need to find the start of each clock cycle. You could do something fancy like an autocorrelation or PLL algorithm, but I suspect that knowing the approximate clock rate from the SMPTE standard and being able to adjust a few percent up and down is good enough, maybe better. So, just look for repeating transitions according to the spec. Doing something fancy will help if you suspect your timecode is highly warped, which might be the case if you have a really old tape deck or you want to sync at very high/low speeds, but LTC isn't really designed for this; that's more VITC's domain.
Once you've identified the clock, you need to determine, for each clock cycle, whether a transition occurred in the middle of the cell. In biphase mark code, every bit cell begins with a transition; an additional transition in the middle of the cell indicates a 1 bit, and no mid-cell transition indicates a 0 bit. That's how BMC transmits both clock and data in a single stream, and it lets you build a new stream of the actual SMPTE data.
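A hedged sketch of that decode step, assuming you have already reduced the filtered audio to a list of gaps (in samples) between successive transitions. In standard biphase mark code, every cell boundary carries a transition and an extra mid-cell transition marks a 1 bit, so a full-cell gap decodes to 0 and two consecutive half-cell gaps decode to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Decode biphase mark code from transition-to-transition gap lengths.
public class BmcDecoder {
    static int[] decode(int[] transitionGaps, int samplesPerCell) {
        List<Integer> bits = new ArrayList<>();
        boolean pendingHalf = false;
        for (int gap : transitionGaps) {
            boolean isHalf = gap < 3 * samplesPerCell / 4;  // nearer a half cell than a full cell
            if (isHalf) {
                if (pendingHalf) { bits.add(1); pendingHalf = false; }  // second half-cell gap
                else pendingHalf = true;
            } else {
                bits.add(0);            // a full-cell gap: no mid-cell transition
                pendingHalf = false;    // (a stray half-gap here would be a decode error)
            }
        }
        int[] out = new int[bits.size()];
        for (int i = 0; i < out.length; i++) out[i] = bits.get(i);
        return out;
    }

    public static void main(String[] args) {
        int cell = 20;  // samples per bit cell (assumed)
        // Gaps for the bit sequence 0,1,1,0: full, half+half, half+half, full.
        int[] gaps = {20, 10, 10, 10, 10, 20};
        System.out.println(Arrays.toString(decode(gaps, cell)));  // [0, 1, 1, 0]
    }
}
```

In real input the gap lengths will jitter, so the half/full threshold should track the measured clock rather than a fixed constant.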
Now you've decoded the BMC into a SMPTE stream. The next step is to look for the sync code. Looking at the spec on Wikipedia and from what I remember of SMPTE, I would assert that it is not enough to find a single sync code, which may happen by accident or coincidence elsewhere in the 80-bit block. Instead, you must find several in a row at the right interval. Then you can read your data into 80-bit SMPTE blocks, and, as you read, you must continue to verify the sync codes. If you don't see one where you expected it, start the search from scratch.
Finally, once you've decoded, you'll have to come up with some way to "flywheel" because you will almost certainly not read all data correctly all the time (no checksums!). That is the nature of the beast.
I want to find out the level of noise around the phone. There doesn't seem to be an easy built-in way to do this, so I've found a few examples that say to use AudioRecord to listen for a brief time and then apply a formula to get the decibel level. I have a few questions that the documentation doesn't seem to explain, though, and I was wondering if you kind folk could help me understand.
1. What's captured in the array? AudioRecord takes several arguments for channel type, format, and sample rate, and stores the captured data in an array. I can see the raw numbers, but what do they actually mean?
2. Is there a resource that you could point me to, or could you explain, how to convert (what I assume to be) the raw audio in byte form into a decibel representation?
The part I don't understand is #1: what is actually put in the array. Any help would be appreciated.
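On #1: if you record with ENCODING_PCM_16BIT and read into a short[], each element is one signed 16-bit amplitude sample (-32768 to 32767) in time order; reading raw bytes gives you the same samples split into byte pairs. On #2, a common approach is 20·log10 of the RMS amplitude against a reference. The sketch below (hypothetical names; full scale used as the reference) yields negative dBFS values, not calibrated sound-pressure decibels.

```java
// Convert a buffer of 16-bit PCM samples to a dBFS level.
public class PcmToDb {
    static double db(short[] pcm) {
        double sumSquares = 0;
        for (short s : pcm) sumSquares += (double) s * s;
        double rms = Math.sqrt(sumSquares / pcm.length);   // root mean square amplitude
        return 20 * Math.log10(rms / 32767.0);             // 0 dBFS = loudest possible
    }

    public static void main(String[] args) {
        short[] quiet = new short[1024];
        short[] loud = new short[1024];
        for (int i = 0; i < 1024; i++) {
            quiet[i] = (short) (100 * Math.sin(2 * Math.PI * i / 64));
            loud[i] = (short) (30000 * Math.sin(2 * Math.PI * i / 64));
        }
        System.out.println(db(quiet));  // strongly negative (quiet)
        System.out.println(db(loud));   // close to 0 dBFS (loud)
    }
}
```

To report environmental noise in real SPL decibels you would need to calibrate against a known sound level, since the mic gain varies per device.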
Found a great resource that converts the amplitude of the sound at the mic to a 0-9 range.
SoundMeter is by Google and works well enough for my purpose.