convert WAVE file into vectors in Android - android

I'm searching for a way to convert WAVE files into a list of numbers so that I can cross-correlate the resulting vectors (like the numbers you get when you read a WAVE file in MATLAB):
0.6653
-0.8445
0.9589
-0.9999
0.9643
-0.8547
0.6797
-0.4525
0.1907
0.0858
-0.3557
0.5983
-0.7951
0.9309
-0.9953
0.9835
-0.8962
0.7402
-0.5275
0.2742
Is there a way to do that in Android, or even in C/C++? I really don't know where to start.

The WAVE file format is fairly simple, especially if you're interested in linear PCM encoded data. Descriptions of the format are available from various sources, such as here and here.
Using these, you should be able to decode:
the header ("RIFF" chunk)
the "fmt " chunk, which contains various information such as the number of bits per sample, the number of channels (i.e. mono, stereo, or more), the sampling rate, etc.
and the "data" chunk, which is the main thing that you'll want to look at in order to create a MATLAB-like vector.
If you're dealing with single-channel (i.e. mono) WAVE files, the data is fairly straightforward to decode. The number of samples should then correspond (give or take a few bytes for padding) to the size of the data block divided by the number of bytes per sample (the number of bits per sample is available from the "fmt " chunk). The samples' integer values can be mapped to floating-point values in [-1, 1] by multiplying by a constant (e.g. 1.0/32768 for signed 2-byte samples; 1-byte samples are stored unsigned, so subtract 128 first and then multiply by 1.0/128).
For multi-channel WAVE files, keep in mind that channel data are interleaved (e.g. sample 1 left/right, sample 2 left/right, ...)
Note also that there are a number of tutorial/samples floating around (such as this sample in C or this sample in Java), and various open source sound libraries which you may use as a starting point.
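The decoding steps described above can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: the class and method names (WavToVector, decode) are made up for this example, the whole file is assumed to already be in a byte array, and only 8-bit and 16-bit linear PCM is handled. It returns one vector per channel, de-interleaving the samples as it goes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class WavToVector {

    /** Decodes a linear-PCM WAVE file image into one vector per channel, scaled to [-1, 1). */
    static double[][] decode(byte[] file) {
        // All multi-byte fields in a RIFF/WAVE file are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (!tag(buf).equals("RIFF")) throw new IllegalArgumentException("missing RIFF header");
        buf.getInt(); // overall RIFF size, not needed here
        if (!tag(buf).equals("WAVE")) throw new IllegalArgumentException("not a WAVE file");

        int channels = 0;
        int bits = 0;
        while (buf.remaining() >= 8) {
            String id = tag(buf);
            int size = buf.getInt();
            int start = buf.position();
            if (id.equals("fmt ")) {
                if (buf.getShort(start) != 1) {
                    throw new IllegalArgumentException("only linear PCM (format tag 1) is handled");
                }
                channels = buf.getShort(start + 2);
                bits = buf.getShort(start + 14);
            } else if (id.equals("data")) {
                int available = Math.min(size, buf.remaining());
                return samples(buf, start, available, channels, bits);
            }
            // Skip to the next chunk; chunks are padded to an even length.
            buf.position(Math.min(buf.limit(), start + size + (size & 1)));
        }
        throw new IllegalArgumentException("no data chunk found");
    }

    private static double[][] samples(ByteBuffer buf, int start, int length, int channels, int bits) {
        if (channels < 1 || (bits != 8 && bits != 16)) {
            throw new IllegalArgumentException("unsupported format: " + channels + " ch, " + bits + " bit");
        }
        int bytesPerSample = bits / 8;
        int frames = length / (bytesPerSample * channels);
        double[][] out = new double[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) { // channels are interleaved frame by frame
                int pos = start + (f * channels + c) * bytesPerSample;
                out[c][f] = bits == 8
                        ? ((buf.get(pos) & 0xFF) - 128) / 128.0 // 8-bit samples are unsigned
                        : buf.getShort(pos) / 32768.0;          // 16-bit samples are signed
            }
        }
        return out;
    }

    private static String tag(ByteBuffer buf) {
        byte[] id = new byte[4];
        buf.get(id);
        return new String(id, StandardCharsets.US_ASCII);
    }
}
```

For a mono file, decode(bytes)[0] would then be the vector to cross-correlate; for stereo, index 0 and 1 hold the left and right channels.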

Related

What does forcing key frames mean?

As per the docs:
-force_key_frames[:stream_specifier] expr:expr (output,per-stream)
Force key frames at the specified timestamps, more precisely at the
first frames after each specified time. If the argument is prefixed
with expr:, the string expr is interpreted like an expression and is
evaluated for each frame. A key frame is forced in case the evaluation
is non-zero.
Still, I am not able to understand what forcing key frames at specified timestamps means and what it is used for. I can see this option is used while segmenting video. What is its purpose there?
A typical video codec uses temporal compression i.e. most frames only store the difference with respect to earlier (and in some cases, future) frames. So, in order to decode these frames, earlier frames have to be referenced, in order to generate a full image. In short, keyframes are frames which don't rely on other frames for decoding, and which other frames rely on in order to get decoded.
If a video has to be cut or segmented, without transcoding (recompression), then the segmenting can only occur at keyframes, so that the first frame of a segment is a keyframe. If this were not the case, then the frames of a segment till the next keyframe could not be played.
An encoder like x264 typically generates keyframes only if it detects that a scene change has occurred*. This isn't conducive to segmentation, as the keyframes may be generated at irregular intervals. In order to ensure that segments of identical and predictable lengths can be made, the force_key_frames option can be used to ensure the desired keyframe placement.
-force_key_frames expr:gte(t,n_forced*5) forces a keyframe at t=5,10,15 seconds...
The GOP size option g is another method to ensure keyframe placement, e.g. -g 50 forces a keyframe every 50 frames.
*subject to minimum and maximum keyframe distance parameters.
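To make the expression gte(t,n_forced*5) a little more concrete, here is a small Java simulation of how it would be evaluated frame by frame. It is only an illustration of the arithmetic, not of ffmpeg's internals: the class and method names are invented, a constant frame rate is assumed, and n_forced is modelled as a simple counter of keyframes forced so far.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not part of ffmpeg.
final class ForcedKeyframes {

    /** Returns the indices of the frames for which gte(t, n_forced * interval) is non-zero. */
    static List<Integer> forcedFrames(int frameCount, double fps, double intervalSeconds) {
        List<Integer> forced = new ArrayList<>();
        int nForced = 0; // number of keyframes forced so far
        for (int i = 0; i < frameCount; i++) {
            double t = i / fps; // timestamp of frame i, assuming a constant frame rate
            if (t >= nForced * intervalSeconds) {
                forced.add(i);
                nForced++; // the next keyframe is due one interval later
            }
        }
        return forced;
    }
}
```

With 25 fps input and a 5 second interval, the expression fires on the first frame (which is a keyframe anyway) and then on the first frame at or after each multiple of 5 seconds, so every segment cut at those points starts on a keyframe.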

What's the difference between the AUDIO_FORMAT_PCM_32_BIT and AUDIO_FORMAT_PCM_8_24_BIT in Android Lollipop?

AUDIO_FORMAT_PCM_32_BIT and AUDIO_FORMAT_PCM_8_24_BIT are two high-definition audio formats in Android Lollipop.
Both seem to have a 32-bit depth.
Does anyone know the exact difference between them?
You can find that information in audio.h:
/* Audio format consists of a main format field (upper 8 bits) and a sub
format field (lower 24 bits).
AUDIO_FORMAT_PCM_32_BIT and AUDIO_FORMAT_PCM_8_24_BIT are defined as:
AUDIO_FORMAT_PCM_32_BIT = (AUDIO_FORMAT_PCM |
AUDIO_FORMAT_PCM_SUB_32_BIT),
AUDIO_FORMAT_PCM_8_24_BIT = (AUDIO_FORMAT_PCM |
AUDIO_FORMAT_PCM_SUB_8_24_BIT),
And if we look at the definitions of AUDIO_FORMAT_PCM_SUB_32_BIT and AUDIO_FORMAT_PCM_SUB_8_24_BIT we find some helpful comments:
AUDIO_FORMAT_PCM_SUB_32_BIT = 0x3, /* PCM signed .31 fixed point */
AUDIO_FORMAT_PCM_SUB_8_24_BIT = 0x4, /* PCM signed 7.24 fixed point */
In response to Michael's comment:
signed .31 means 1 bit for sign, 0 bits for the whole part, and 31 bits for the fractional part. signed 7.24 means 1 bit for sign, 7 bits for the whole part, and 24 bits for the fractional part. Read up on fixed-point arithmetic if you want to know more about how it's used.
AUDIO_FORMAT_PCM_8_24_BIT most likely refers to 8 bits of zero padding, as 7.24 fixed point doesn't make sense for PCM data. This is because PCM data ranges over [-1.0 .. 1.0], so the use of a "whole" [number] part does not make sense. (It technically should be 8.23, otherwise 7.24 == 25-bits!)
A single sample of AUDIO_FORMAT_PCM_8_24_BIT will contain 4 bytes, where only 3 bytes will hold any meaningful data and the remaining single byte will be all zeros.
The alternative is AUDIO_FORMAT_PCM_24_BIT_PACKED that only contains 3 bytes per sample and no padding. 24-bit audio has a strange format, and it doesn't fit well in the powers of 2 of digital audio. It is typically easier to handle a 24-bit sample as if it was 32-bit.
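As a rough illustration of the two points above, the following Java sketch converts a fixed-point sample to a floating-point value for a given number of fractional bits, and widens a packed 24-bit sample to a 32-bit int. The names are invented for this example, the number of fractional bits is left as a parameter because the answers above disagree on how to read the 8_24 layout, and the packed bytes are assumed to be little-endian.

```java
// Hypothetical helper, not part of the Android audio API.
final class FixedPointPcm {

    /**
     * Interprets a 32-bit signed fixed-point sample with the given number of
     * fractional bits, e.g. 31 for the ".31" format described in audio.h.
     */
    static double fromFixed(int sample, int fractionalBits) {
        return sample / (double) (1L << fractionalBits);
    }

    /**
     * Widens one packed 24-bit sample (three bytes, least significant first,
     * which is an assumption) to a sign-extended 32-bit int.
     */
    static int unpack24(byte b0, byte b1, byte b2) {
        int shifted = (b2 << 24) | ((b1 & 0xFF) << 16) | ((b0 & 0xFF) << 8);
        return shifted >> 8; // arithmetic shift restores the sign
    }
}
```

Under these assumptions fromFixed(sample, 31) maps the full int range onto [-1, 1), while fromFixed(sample, 24) leaves room for values beyond 1.0, which is the headroom the "whole part" in the header comment would provide.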

how to deal with FFT parameters

I recorded an audio sample, and I want to apply an FFT to it.
I did all the steps needed in order to use FFT in Android, such as getting the JTransforms library and everything else needed.
Within the code, I first defined the FFT:
DoubleFFT_1D fft = new DoubleFFT_1D(1024);
and inside the code, after reading the audio file (stored as PCM), I applied the FFT to it using the following instruction:
fft.complexForward(audio_file_in_double_format);
Here is my question:
First of all, what is the number (1024) passed as the parameter of the FFT definition based on, and what does it mean?
Does it mean that the FFT will be applied to only 1024 samples?
And what will be the output of the FFT function? I know that it will give complex numbers, so will the result be twice the size of the input?
I need help understanding how this FFT function works.
The code is working fine for me, but I need to understand it, because I am passing the whole audio file into the FFT function, which is a lot bigger than 1024 samples. So is it applying the FFT to the first 1024 samples and ignoring the rest, or what?

How to do Visualizer while recording audio in android

I know how to use Visualizer to show a waveform while playing audio with Android's MediaPlayer.
But I want to show a visualizer while recording audio, i.e. a line wave that changes with the user's voice as it is recorded.
Is it possible to do this in Android?
By calling MediaRecorder.getMaxAmplitude() every x milliseconds, you get (from the official documentation):
the maximum absolute amplitude that was sampled since the last call to
this method.
Then you can process this value in real time to draw a graph or change some view properties.
Not perfect, but I hope it helps =)
Edit: just so you know, the retrieved value has the same range on all Android devices: between 0 and 32767. (I have more than 10k user reports giving me this value when they blow into the mic.)
You may need to use the AudioRecord class instead of MediaRecorder.
Check the AudioRecord#read(...) methods, which put the audio data into a byte[] instead of writing it directly to a file.
To show changes on the graph you will have to analyze the data (which is encoded as 8- or 16-bit PCM) and update the graph in real time.
Two important things:
You need to convert the live bytes (from the mic) to numeric values in order to plot them.
Since the graph is real-time, use a SurfaceView to plot the points.
To convert the recorded bytes to numeric values, refer to:
Android: Listener to record sound if any sound occurs, where you will see that the variable "temp" holds the numerical value of your audio.
Plot points
These numeric values are your Y values, plotted against increasing X (time interval) values (0, 1, 2, ...) as a graph, using SurfaceView, e.g.:
Canvas canvas = holder.lockCanvas(); // holder is the SurfaceView's SurfaceHolder
// canvas.drawLine(previousX, previousY, x, y, paint);
canvas.drawPoint(x, y, paint);
holder.unlockCanvasAndPost(canvas);
You need not plot all values; for efficiency you can filter them according to your own conditions and plot only at certain time intervals.
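One possible way to thin out the values before plotting is to keep only the peak of each time slice, so that one point is drawn per horizontal pixel. The sketch below is plain Java and makes a few assumptions: the class and method names are invented, and the input is taken to be 16-bit samples such as those filled in by AudioRecord.read(short[], int, int).

```java
// Hypothetical helper for preparing plot data.
final class WaveformReducer {

    /**
     * Splits the samples into `points` consecutive slices and returns the
     * largest absolute amplitude (0 to 32768) found in each slice.
     */
    static int[] peaks(short[] samples, int points) {
        if (points < 1) throw new IllegalArgumentException("points must be positive");
        int[] out = new int[points];
        for (int p = 0; p < points; p++) {
            // Slice boundaries, computed in long to avoid int overflow.
            int from = (int) ((long) p * samples.length / points);
            int to = (int) ((long) (p + 1) * samples.length / points);
            int peak = 0;
            for (int i = from; i < to; i++) {
                peak = Math.max(peak, Math.abs((int) samples[i]));
            }
            out[p] = peak;
        }
        return out;
    }
}
```

Each returned value can then be scaled to the view height and used as the Y coordinate for one X position.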
Hope this helps :)

Android PCM Bytes

I am using the AudioRecord class to analyze raw PCM bytes as they come in from the mic.
That's working nicely. Now I need to convert the PCM bytes into decibels.
I have a formula that converts sound pressure in Pa into dB:
db = 20 * log10(Pa/ref Pa)
So the question is: what are the bytes I am getting from the AudioRecord buffer? Are they amplitude, sound pressure in pascals, or something else?
I tried putting the value into the formula, but it comes back with a very high dB value, so I do not think it's right.
Thanks
Disclaimer: I know little about Android.
Your device is probably recording in mono at 44,100 samples per second (maybe less) using two bytes per sample. So your first step is to combine pairs of bytes in your original data into two-byte integers (I don't know how this is done in Android).
You can then compute the decibel value (relative to the peak) of each sample by first taking the normalized absolute value of the sample and passing it to your Db function:
double db = 20 * Math.log10(Math.abs(sampleVal) / 32768.0);
A value near the peak (e.g. +32767 or -32768) will have a Db value near 0. A value of 3277 (0.1) will have a Db value of -20; a value of 327 (.01) will have a Db value of -40 etc.
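In Java, the two steps described above (combining byte pairs into 16-bit samples, then converting a sample to decibels relative to full scale) might look like the sketch below. The names are invented for this example, and the bytes are assumed to be 16-bit little-endian PCM, which is an assumption about the recording configuration rather than something stated in the question.

```java
// Hypothetical helper, not part of the Android API.
final class PcmDecibels {

    /** Combines bytes 2*index and 2*index+1 (low byte first) into one signed 16-bit sample. */
    static short sampleAt(byte[] pcm, int index) {
        int lo = pcm[2 * index] & 0xFF; // low byte, treated as unsigned
        int hi = pcm[2 * index + 1];    // high byte carries the sign
        return (short) ((hi << 8) | lo);
    }

    /** Level of one sample in dB relative to full scale; a sample of 0 gives negative infinity. */
    static double toDbfs(int sample) {
        return 20.0 * Math.log10(Math.abs(sample) / 32768.0);
    }
}
```

Because the result is relative to the largest value the format can hold, every level comes out at or below 0 dB; it says nothing about the sound pressure in pascals.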
The problem is likely the definition of the "reference" sound pressure at the mic. I have no idea what it would be or if it's available.
The only audio application I've ever used defined 0 dB as "full volume", when the samples were at + or - max value (in unsigned 16 bits, that'd be 0 and 65535). To get this into dB I'd probably do something like this:
// assume input_sample is in the range 0 to 65535
sample = input_sample - 32767.5 // shift so the midpoint maps to 0
db = 20 * log10(abs(sample) / 32767.5) // abs() avoids log10 of a negative value
I don't know if that's right, but it feels right to the mathematically challenged me. As the input_sample approaches the "middle", it'll look more and more like negative infinity.
Now that I think about it, though, if you want a SPL or something that might require different trickery like doing RMS evaluation between the zero crossings, again something that I could only guess at because I have no idea how it really works.
The reference pressure in Leq (sound pressure level) calculations is 20 micro-Pascal (rms).
To measure absolute Leq levels, you need to calibrate your microphone using a calibrator. Most calibrators fit 1/2" or 1/4" microphone capsules, so I have my doubts about calibrating the microphone on an Android phone. Alternatively you may be able to use the microphone sensitivity (Pa/mV) and then calibrate the voltage level going into the ADC. Even less reliable results could be had from comparing the Android values with the measured sound level of a diffuse stationary sound field using a sound level meter.
Note that in Leq calculations you normally use the RMS values. A single sample's value doesn't mean much.
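Since a level is normally computed from RMS values rather than single samples, a block-based calculation could be sketched as follows. The names are invented for this example, the result is relative to digital full scale, and the calibration offset needed to turn it into an absolute sound pressure level is a hypothetical parameter that would have to come from a calibration like the one described above.

```java
// Hypothetical helper, not part of the Android API.
final class RmsLevel {

    /** RMS level of a block of 16-bit samples, in dB relative to full scale. */
    static double rmsDbfs(short[] block) {
        if (block.length == 0) throw new IllegalArgumentException("empty block");
        double sumSquares = 0.0;
        for (short s : block) {
            double x = s / 32768.0; // normalise to [-1, 1)
            sumSquares += x * x;
        }
        double rms = Math.sqrt(sumSquares / block.length);
        return 20.0 * Math.log10(rms); // an all-zero block gives negative infinity
    }

    /** Adds a device-specific calibration offset (hypothetical, must be measured). */
    static double toSpl(double dbfs, double calibrationOffsetDb) {
        return dbfs + calibrationOffsetDb;
    }
}
```

Without a measured offset only the relative level is meaningful, which is enough for comparing loudness between blocks.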
I held my sound level meter right next to the mic on my Google Ion, went 'Woooooo!', and noted that clipping occurred at about 105 dB SPL. Hope this helps.
The units are whatever units are used for the reference reading. In the formula, the reading is divided by the reference reading, so the units cancel out and no longer matter.
In other words, the decibel is a way of comparing two things; it is not an absolute measurement. When you see it used as if it were absolute, the comparison is with the quietest sound the average human can hear.
In our case, it is a comparison to the highest reading the device handles (thus, every other reading is negative, or less than the maximum).
