Buffer calculation in AudioHardwareALSA::getInputBufferSize(...) - android

I'm looking at the getInputBufferSize(...) function in AudioHardwareALSA.cpp, and it returns a hardcoded value of 320. My question is: how is this value calculated?
I've done some preliminary calculations, but I still have some questions.
sample_rate = 8000
format = S16_LE = 2 bytes/sample
period_time = 10000 us (guessing)
buffer_size = 2 * period_size
period_size = period_time * bytes/sec
buffer_size = 2 * (0.01 * sample_rate * 2) = 320 bytes.
I can't find the period_time in the code, so one question is: where is it defined, or is it just a rough calculation?
I'm also trying to add some more sample rates, e.g. 16000 and 32000 (later maybe more). How do I calculate the right minimum buffer size? Is the delay always 10 ms for all sample rates?
Any help is appreciated.

I believe Google implemented NB-AMR encode to start with. Later they added support for AAC. In the case of NB-AMR, the frame size is 320 bytes.
You may be aware that for NB-AMR:
sampling rate = 8000 samples / sec
frame duration = 20ms
sample size = 2 bytes
channels = mono
So, each frame contains
8000 samples/sec * 0.02 sec * 2 bytes/sample/channel * 1 channel = 320 bytes
For AAC these parameters are different, and hence the frame size differs.
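If it helps, here is the same arithmetic as a small, self-contained Java sketch (my own illustration, not code from AudioHardwareALSA); it also shows what a 20 ms frame comes to at the 16000 and 32000 Hz rates you want to add:

public class FrameSize {
    // bytes per frame = sample rate * frame duration * bytes per sample * channels
    static int frameBytes(int sampleRate, double frameDurationSec, int bytesPerSample, int channels) {
        return (int) Math.round(sampleRate * frameDurationSec * bytesPerSample * channels);
    }

    public static void main(String[] args) {
        System.out.println(frameBytes(8000, 0.020, 2, 1));   // NB-AMR case above: 320 bytes
        System.out.println(frameBytes(16000, 0.020, 2, 1));  // 640 bytes
        System.out.println(frameBytes(32000, 0.020, 2, 1));  // 1280 bytes
    }
}

Whether 20 ms (or 10 ms) is the right frame duration for the other rates depends on the codec and driver you target, so treat these numbers as the arithmetic only, not as the values your HAL must use.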

How to confirm opus encode buffer size?

The opus_encode function needs a frame size as a parameter. The API doc says it is the number of samples per channel in the buffer.
But how do I determine which size I should use?
I use Opus on Android, with a 16 kHz sample rate and a buffer size of 1280. When I set the frame size to 640 for encode and decode, the length of the decoded file is half of the raw PCM. When I set it to 960, the decoded file is 2/3 of the raw PCM. But with 1280, encode returns -1 as an argument error.
When I use Cool Edit to play the decoded file, it plays faster than the raw PCM.
There must be something wrong with my parameters.
Can anyone who uses Opus help me?
Thanks a lot.
Opus encode definition:
opus_int32 opus_encode(OpusEncoder *st,
                       const opus_int16 *pcm,
                       int frame_size,
                       unsigned char *data,
                       opus_int32 max_data_bytes)
When you specify frame_size, set it to the number of samples per channel available in the pcm buffer.
The Opus codec supports mono and stereo signals; the corresponding encoder configuration is the channels parameter that you pass to opus_encoder_create.
You also need to know which frame sizes Opus supports: frames of 2.5, 5, 10, 20, 40 or 60 ms of audio data.
One millisecond of audio at 16 kHz is 16 samples (16000/1000), so for mono you can set frame_size to:
16 * 2.5 = 40 (very rare)
16 * 5 = 80 (rare)
16 * 10 = 160
16 * 20 = 320
16 * 40 = 640
16 * 60 = 960
Opus will not accept other sizes. The best way to deal with a buffer of 1280 samples is to split it into four 20 ms frames or two 40 ms frames.
So you encode two or four packets from each buffer you receive.
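As a rough Java illustration of that splitting (since the question mentions Android), assuming a hypothetical JNI wrapper nativeOpusEncode around opus_encode — the wrapper name and signature are invented for this sketch; only the frame arithmetic matters:

public class OpusChunker {
    static final int FRAME_SIZE = 640;        // 40 ms at 16 kHz, a legal Opus frame size
    static final int MAX_PACKET_BYTES = 4000; // generous upper bound for one Opus packet

    // HYPOTHETICAL JNI binding around opus_encode; adapt to whatever binding you really use.
    static native int nativeOpusEncode(long encoder, short[] pcm, int pcmOffset,
                                       int frameSize, byte[] out, int maxBytes);

    static void encodeBuffer(long encoder, short[] pcm /* 1280 mono samples at 16 kHz */) {
        byte[] packet = new byte[MAX_PACKET_BYTES];
        for (int offset = 0; offset + FRAME_SIZE <= pcm.length; offset += FRAME_SIZE) {
            // Encode one 640-sample frame; a negative return value is an Opus error code.
            int packetLen = nativeOpusEncode(encoder, pcm, offset, FRAME_SIZE, packet, MAX_PACKET_BYTES);
            if (packetLen < 0) {
                throw new IllegalStateException("opus_encode failed: " + packetLen);
            }
            // Store or send packet[0..packetLen) here, typically with its length prefixed,
            // so the decoder can later be fed one packet at a time with the same frame size.
        }
    }
}

If only the first 640 samples of each 1280-sample buffer were encoded, half of the audio would be dropped, which would explain a halved, faster-playing result.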

Android SuperPowered SDK Audio - Frequency Domain example - memset frequency manipulation

I'm trying to understand the Superpowered SDK, but I'm new to both Android and C++, as well as to audio signals. I have the Frequency Domain example from here:
https://github.com/superpoweredSDK/Low-Latency-Android-Audio-iOS-Audio-Engine/tree/master/Examples_Android/FrequencyDomain
running on my Nexus 5X. In the FrequencyDomain.cpp file:
static SuperpoweredFrequencyDomain *frequencyDomain;
static float *magnitudeLeft, *magnitudeRight, *phaseLeft, *phaseRight, *fifoOutput, *inputBufferFloat;
static int fifoOutputFirstSample, fifoOutputLastSample, stepSize, fifoCapacity;
#define FFT_LOG_SIZE 11 // 2^11 = 2048

static bool audioProcessing(void * __unused clientdata, short int *audioInputOutput, int numberOfSamples, int __unused samplerate) {
    SuperpoweredShortIntToFloat(audioInputOutput, inputBufferFloat, (unsigned int)numberOfSamples); // Converting the 16-bit integer samples to 32-bit floating point.
    frequencyDomain->addInput(inputBufferFloat, numberOfSamples); // Input goes to the frequency domain.

    // In the frequency domain we are working with 1024 magnitudes and phases for every channel (left, right), if the fft size is 2048.
    while (frequencyDomain->timeDomainToFrequencyDomain(magnitudeLeft, magnitudeRight, phaseLeft, phaseRight)) {
        // You can work with frequency domain data from this point.
        // This is just a quick example: we remove the magnitude of the first 20 bins, meaning total bass cut between 0-430 Hz.
        memset(magnitudeLeft, 0, 80);
        memset(magnitudeRight, 0, 80);
I understand how the first 20 bins are 0-430 Hz from here:
How do I obtain the frequencies of each value in an FFT?
but I don't understand the value of 80 in memset... being 4*20, is it 4 bytes for a float * 20 bins? Does magnitudeLeft hold data for all the frequencies? How would I then remove, for example, 10 bins of frequencies from the middle or the highest from the end? Thank you!
Every value in magnitudeLeft and magnitudeRight is a float, which is 32 bits, i.e. 4 bytes.
memset takes its size parameter in bytes, so 20 bins * 4 bytes = 80 bytes.
memset clears the first 20 bins this way.
Both magnitudeLeft and magnitudeRight represent the full frequency range with 1024 floats. Their size is always the FFT size divided by two, so 2048 / 2 = 1024 in this example.
Removing from the middle and the top looks something like:
memset(&magnitudeLeft[index_of_first_bin_to_remove], 0, number_of_bins * sizeof(float));
Note that the index in the first parameter is not multiplied by sizeof(float), because magnitudeLeft is a float pointer: indexing it already advances in float-sized steps, so &magnitudeLeft[index_of_first_bin_to_remove] is the correct address.
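If the bytes-versus-bins distinction is easier to see from the Java side, here is the same idea on a plain float[] (my own analogy, not Superpowered code): Arrays.fill works in element indices, while memset counts raw bytes, which is why the C++ call needs the * sizeof(float):

import java.util.Arrays;

public class BinClearSketch {
    public static void main(String[] args) {
        float[] magnitude = new float[1024];   // one channel: FFT size 2048 / 2 bins
        Arrays.fill(magnitude, 1.0f);          // pretend these are real magnitudes

        // Clear the first 20 bins (the 0-430 Hz bass cut from the example).
        Arrays.fill(magnitude, 0, 20, 0f);

        // Clear 10 bins starting somewhere in the middle; indices here are elements,
        // whereas memset in the C++ code counts bytes, hence number_of_bins * sizeof(float).
        int firstBin = 512, numBins = 10;
        Arrays.fill(magnitude, firstBin, firstBin + numBins, 0f);
    }
}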

MediaCodec sub-second video length

I'm using MediaCodec to encode an image into a video. It is a requirement that videos can have sub-second length (e.g. 3.5 seconds long).
My thinking, in order to achieve this, is to determine the video frame rate like so:
int lengthInMillis = 3500;
float seconds = lengthInMillis / 1000f;
int ordinal = (int) seconds; // ordinal == 3
float point = seconds - ordinal;
float numFrames = seconds / point; // numFrames == 7
float fps = seconds / numFrames; // fps == 0.5
this.numFrames = (int) numFrames;
Unfortunately, when attempting to configure the MediaCodec with a KEY_FRAME_RATE of less than 1, an IllegalStateException is thrown, so this method doesn't work. Is it possible to use MediaCodec to create a video with a running time that ends at a fraction of a second?
The length of the video, and the frame rate of the video, are not related.
A 3.5-second video with 7 frames is running at 2 fps, not 0.5 fps. You should be computing "frames per second" as frames / seconds, not seconds / numFrames.
In any event, the frame-rate value is actually deprecated in API 21:
Note: On LOLLIPOP, the format to MediaCodecList.findDecoder/EncoderForFormat must not contain a frame rate. Use format.setString(MediaFormat.KEY_FRAME_RATE, null) to clear any existing frame rate setting in the format.
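A minimal Java sketch of the corrected arithmetic (my own illustration, not the asker's code): in a typical MediaCodec/MediaMuxer pipeline the clip's length ultimately comes from how many frames you feed in and the presentation timestamps you attach to them, not from KEY_FRAME_RATE:

public class SubSecondTiming {
    public static void main(String[] args) {
        int lengthInMillis = 3500;
        int numFrames = 7;

        float fps = numFrames / (lengthInMillis / 1000f);            // 7 / 3.5 = 2.0 fps
        long frameDurationUs = (lengthInMillis * 1000L) / numFrames; // 500,000 us per frame

        for (int i = 0; i < numFrames; i++) {
            long presentationTimeUs = i * frameDurationUs;
            // Pass presentationTimeUs when queueing each input buffer, e.g.
            // codec.queueInputBuffer(inputIndex, 0, size, presentationTimeUs, 0);
            System.out.println("frame " + i + " at " + presentationTimeUs + " us");
        }
    }
}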

getMinBufferSize() when recording more than 1 sec with AudioRecord

I'm a bit confused about using getMinBufferSize() and AudioRecord.read() while recording from the MIC of the phone.
I understand that getMinBufferSize() gives you the minimum amount of bytes required to create the AudioRecord object (in 1 sec?).
bufferSize= AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE,
RECORDER_CHANNELS,
RECORDER_AUDIO_ENCODING);
Then, when they call AudioRecord.read(), they pass bufferSize as the argument for the number of bytes to read.
read = recorder.read(data, 0, bufferSize);
Here are my questions:
1. Why does bufferSize come back as 8192? I guess it's 8*1024, but I would like to know exactly what calculation is being made (I'm using an 8000 Hz sample rate, MONO channel and 16-bit PCM).
2. I suppose that bufferSize is the amount of data that I can store for 1 sec of duration, but what if I want to record more than 1 sec? Should I multiply this value by the number of seconds?
I guess your array size is 8192.
Since you record in 16-bit PCM, that is 16 bits * 8192, which is around 130,000 bits.
The amount of data produced in one second is 128,000 bits (= 8000 samples * 1 channel * 16 bits),
so that roughly matches your minimum buffer size.
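For reference, a small Java sketch of the related arithmetic and of recording longer than one buffer by calling read() in a loop (my own illustration; the exact value getMinBufferSize returns is device-dependent):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.ByteArrayOutputStream;

public class RecordSketch {
    // Requires the RECORD_AUDIO permission.
    static byte[] recordSeconds(int seconds) {
        int sampleRate = 8000;
        int bytesPerSecond = sampleRate * 1 /* channel */ * 2 /* bytes per 16-bit sample */; // 16000 bytes/s

        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT); // e.g. 8192 on some devices

        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);

        byte[] chunk = new byte[minBuf];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        recorder.startRecording();
        while (out.size() < seconds * bytesPerSecond) {   // keep reading until we have enough data
            int read = recorder.read(chunk, 0, chunk.length);
            if (read > 0) out.write(chunk, 0, read);
        }
        recorder.stop();
        recorder.release();
        return out.toByteArray();                          // raw 16-bit PCM, mono, 8000 Hz
    }
}

Whatever the minimum works out to on a given device, you read repeatedly into the same buffer for as long as you want to record, rather than allocating one buffer sized for the whole recording.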

Read and transmit wav files in Android in a way similar to Chirp

I want to read each character from an existing wav file and assign it to a certain frequency.
I specifically want to transfer WAV files over sound from one phone to another, like the "Chirp" Android application.
And for that I need to map all the data to certain frequencies and play the generated tone so that the other phone can decode it and reconstitute the wav file.
Take a look at this: chirp.io/tech
For example, the first line of a WAV file is:
52 49 46 46 E0...
my idea is to do something like:
5 --> 100 Hz
2 --> 200 Hz
4 --> 300 Hz
...
Is there a way to split them without changing the data?
I think I should mention that my WAV file is formatted as:
static int sampleRate=44100;
static int numSample=duration*sampleRate;
long mySubChunk1Size = 16;
static short myBitsPerSample= 16;
int myFormat = 1;
static int myChannels = 1;
long myByteRate = sampleRate * myChannels * myBitsPerSample/8;
int myBlockAlign = myChannels * myBitsPerSample/8;
long myChunk2Size = generatedSnd.length* myChannels * myBitsPerSample/8;
long myChunkSize = 36 + myChunk2Size;
The way you are trying to do it is simply... naive.
I'll tell you why:
0x59 is one byte (decimal: 89), but you want 2 different sounds from it (0x50 and 0x09)?
That doesn't seem like a good idea, since you would need 2 frequencies per byte (one for the high half and one for the low half).
Moreover, you will need to map all 256 byte values you can have in a file,
from 0x00 to 0xFF (decimal: 255).
Plus, assigning 100 Hz to 0x50 (decimal: 80) and 200 Hz to 0x09 (decimal: 9) again doesn't
make much sense to me...
Now, a possible way to do that:
I would implement an algorithm such as (byte * 32) + 440 that gives me:
440 Hz for 0x00, which is (0 * 32) + 440
...
3288 Hz for 0x59, which is (89 * 32) + 440
...
8600 Hz for 0xFF, which is (255 * 32) + 440.
All bytes will be "encoded" into frequencies in the audible spectrum,
in much the same way as the aforementioned "Chirp" method.
And I don't have to tell it that 0 is 440 Hz (nor any other association); the algorithm does it for me.
Moreover, you can send out (and receive) any kind of file. Not bad.
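As a sketch of that mapping in Java (my own illustration, using the asker's 44100 Hz, 16-bit mono format; the 50 ms tone length is an arbitrary choice, and a real Chirp-like scheme also needs synchronisation and a decoder):

public class ByteToTone {
    static final int SAMPLE_RATE = 44100;
    static final double TONE_SECONDS = 0.05;

    // The formula above: 0x00 -> 440 Hz ... 0xFF -> 8600 Hz.
    static double frequencyForByte(int b) {
        return (b & 0xFF) * 32 + 440;
    }

    // One short 16-bit PCM sine tone representing a single byte of the file.
    static short[] toneForByte(int b) {
        double freq = frequencyForByte(b);
        int n = (int) (SAMPLE_RATE * TONE_SECONDS);
        short[] samples = new short[n];
        for (int i = 0; i < n; i++) {
            samples[i] = (short) (Math.sin(2 * Math.PI * freq * i / SAMPLE_RATE) * Short.MAX_VALUE * 0.8);
        }
        return samples;
    }
}

The receiving side would detect the dominant frequency of each tone and invert the formula, (frequency - 440) / 32, to recover the byte.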
[EDIT]
Since the media (acoustic speakers/microphones) are limited to the low-frequency range (audible sounds), you have to use audible sounds.
If you were to use radio transmission, then you could use high frequencies as well.
Since it really depends on build quality, I suggested a "safe" range that every speaker/microphone coupling will be able to deal with.
For reference on audible tones: http://en.wikipedia.org/wiki/Audio_frequency
