How to calculate the elapsed time from AudioRecord in Android?
What I am trying to do is similar to this: figure 5(Second graph). To further explain myself I'm recording sound in real time then graphing out the pitch and calculating the buffered data within a certain time.
Just calculate it, you know the size of your recorded data and you also should know sample rate, and number of channels. So the comment posted by Michael is correct.
I followed Michael's comment and succesfully calculated the audio length in seconds after recording it.
int SAMPLE_RATE_IN_HZ = 44100;
int AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT; // Guaranteed to be supported by devices
int CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO; // Is guaranteed to work on all devices
File audioFile = yourRecordingLogicHere();
long audioLengthInSeconds = audioFile.length() / (2 * SAMPLE_RATE_IN_HZ);
I record in mono that's why there is one *2 multiplication less than in Michael's comment.
Related
I see many resources recommending that AudioTrack.getTimestamp() be used on modern Android versions to calculate audio latency for audio/video sync.
For instance:
https://stackoverflow.com/a/37625791/332798
https://developer.amazon.com/docs/fire-tv/audio-video-synchronization.html#section1-1
https://groups.google.com/forum/#!topic/android-platform/PoHfyNK54ps
However, none of these explain how to use the timestamp to calculate the latency? I'm struggling to figure what to do with the timestamp's framePosition/nanoTime to come up with a latency number.
So prior to this API, you would use AudioTrack.getPlaybackHeadPosition() which was just an approximation. Thus, to account for latency you had to offset that value with a latency value from one of two hidden methods: AudioManager.getOutputLatency() or AudioTrack.getLatency().
With the new AudioTrack.getTimestamp() API, you get a snapshot of the playhead position at a given time, taken directly at the output. As such, it is fully accurate and already accounts for device latency. Thus there's no need to call any other APIs now to add/remove latency.
The caveat is that this timestamp is only a snapshot, and the docs recommend you don't call this new method very often. So the trick to getting the "current" position is to use your last snapshot and linearly interpolate what the current value should be:
playheadPos = timestamp.framePosition +
(System.nanoTime() - timestamp.nanoTime) * samplerate / 1e9;
This position can then be compared against how many frames you've written into the AudioTrack, by maintaining another counter which increments every time AudioTrack.write() completes:
int bytesWritten = track.write(...);
writtenPos += bytesWritten / pcmFrameSize;
If you're working with ENCODING_AC3, the playhead position reported by AudioTrack is still in terms of samples. You will either need to convert it to bytes, or convert the number of bytes you've written in back into samples. Either way, you will need to know the bitrate of your AC3 stream (i.e. 384000bps)
int bytesWritten = track.write(...);
writtenPos += bytesWritten * samplerate / (bitrate / 8);
I am using AudioTrack class to play a stream of raw sound data:
AudioTrack audioTrack;
int sampleRate = 11025;
int channelConfigIn = AudioFormat.CHANNEL_IN_MONO;
int channelConfigOut = AudioFormat.CHANNEL_OUT_STEREO;
int audioFormat = AudioFormat.ENCODING_PCM_16BIT;
.....
int bufferSize = AudioTrack.getMinBufferSize(sampleRate,channelConfigOut,audioFormat);
audioTrack = new AudioTrack(AudioManager.STREAM_VOICE_CALL,sampleRate,channelConfigOut,audioFormat,bufferSize,AudioTrack.MODE_STREAM);
audioTrack.play();
Then on a separate thread:
while(true)
{
short [] buffer = new short[14500];
//fill buffer with sound data
long time = System.currentTimeMillis();
audioTrack.write(buffer,0,14500);
Log.i("time",(System.currentTimeMillis() - time) + "");
}
My problem is that the log always shows that the write method blocks for about 0.6 second, that is the same as the played sound length (14500 samples), moreover the phone is not responding during the playback, the main tread almost can not do anything anyone can help...
You are using the blocking version of the write() method. You can use write(float[],int,int,int) instead, passing WRITE_NON_BLOCKING as the 4th parameter, but that would only write as much data as can fit in the play buffer. I generally prefer the approach you already have (dedicating a thread to writing). You should expect each call to block; after all, the play buffer can only fit so much data, and to make more space, sound must be played (which takes time). 14500 samples is ~1.3 seconds of sound at your chosen sample rate, so I'm guessing it takes you about .7 seconds to fill buffer each time.
I cannot tell, based on the code presented, why your UI thread is not responding.
I'm trying to calculate the exact song duration of an mp3 file in Android.
I tried using the calculation: song duration = filesize / bitrate, which produces extremely close results, but I'd like to calculate it more precisely.
I found a solution in Java here where it suggests this calculation: song duration = filesize / (framesize * framerate) and provides this example code:
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);
AudioFormat format = audioInputStream.getFormat();
long audioFileLength = file.length();
int frameSize = format.getFrameSize();
float frameRate = format.getFrameRate();
float durationInSeconds = (audioFileLength / (frameSize * frameRate));
However, AudioInputStream is no longer supported in Android. So then I tried figuring out how to obtain the frame size and the frame rate with Android, but I can't find any solutions there either. Any ideas for how to calculate this in Android?
As an aside, using MediaMetadataRetriever.METADATA_KEY_DURATION as suggested here doesn't produce the correct song duration for me. The reason is because I'm streaming my mp3 file, and attempting to calculate the song duration strictly from the file's header information. That way, I can correctly update my progress bar very early on in the streaming process.
I am currently writing some code for a sample sequencer in Android. I am using the AudioTrack class. I have been told the only proper way to have accurate timing is to use the timing of the AudioTrack. EG I know that if I write a buffer of X samples to AudioTrack playing at a rate of 44100 samples per second, that the time to write will be (1/44100)X secs.
Then you use that info to know what samples should be written when.
I am trying to implement my first attempt using this approach. I am using only one sample and am writing it as continuous 16th notes at a tempo of 120bpm. But for some reason it is playing at a rate of 240bpm.
First I checked my code to derive the time of a 16th (nanoseconds) note at tempo X. It checks outs.
private void setPeriod()
{
period=(int)((1/(((double)TEMPO)/60))*1000);
period=(period*1000000)/4;
Log.i("test",String.valueOf(period));
}
Then I verified that my code to get the time for my buffer to be played at 44100khz in nanoseconds and it is correct.
long bufferTime=(1000000000/SAMPLE_RATE)*buffSize;
So now I am left thinking that the audio track is playing at a rate that is different from 44100. Maybe 96000khz, which would explain the doubling of speed. But when I instantiate the audioTrack, it was indeed set to 44100khz.
final int SAMPLE_RATE is set to 44100
buffSize = AudioTrack.getMinBufferSize(SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO,
AudioFormat.ENCODING_PCM_16BIT);
track = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE,
AudioFormat.CHANNEL_OUT_MONO,
AudioFormat.ENCODING_PCM_16BIT,
buffSize,
AudioTrack.MODE_STREAM);
So I am confused as to why my tempo is being doubled. I ran a debug to compare time elapsed audioTrack to time elapsed system time, and the it seems that the audiotrack is indeed playing twice as fast as it should be. I am confused.
Just to make sure, this is my play loop.
public void run() {
// TODO Auto-generated method stub
int buffSize=192;
byte[] output = new byte[buffSize];
int pos1=0;//index for output array
int pos2=0;//index for sample array
long bufferTime=(1000000000/SAMPLE_RATE)*buffSize;
long elapsed=0;
int writes=0;
currTrigger=trigger[triggerPointer];
Log.i("test","period="+String.valueOf(period));
Log.i("test","bufferTime="+String.valueOf(bufferTime));
long time=System.nanoTime();
while(play)
{
//fill up the buffer
while(pos1<buffSize)
{
output[pos1]=0;
if(currTrigger&&pos2<sample.length)
{
output[pos1]=sample[pos2];
pos2++;
}
pos1++;
}
track.write(output, 0, buffSize);
elapsed=elapsed+bufferTime;
writes++;
//time passed is more than one 16th note
if(elapsed>=period)
{
Log.i("test",String.valueOf(writes));
Log.i("test","elapsed A.T.="+String.valueOf(elapsed)+" elapsed S.T.="+String.valueOf(System.nanoTime()-time));
time=System.nanoTime();
writes=0;
elapsed=0;
triggerPointer++;
if(triggerPointer==16)
triggerPointer=0;
currTrigger=trigger[triggerPointer];
pos2=0;
}
pos1=0;
}
}
}
edited : rephrased and updated due to initial erroneous assumption that system time was used to synchronize sequenced audio :)
As for audio playing back at twice the speed, this is a bit strange as the "write"-method of the AudioTrack is blocking until the native layer has enqueued the next buffer, are you sure the render loop isn't invoked from two different sources (although I assume from your example you invoke the loop from within a thread).
However, what is certain is that there is a time synchronization issue to address: the problem herein lies with the calculation of the buffer time you use in your example:
(1000000000/SAMPLE_RATE)*buffSize;
Which will always return 4353741 at a buffer size of 192 samples at a sample rate of 44100 Hz, thus disregarding any cues in tempo (for instance this will be the same at 300 BPM or 40 BPM), Now, in your example this doesn't have any consequences for the actual syncing per se, but I'd like to point this out as we'll return to it shortly further on in this text.
Also, nanoseconds are a nicely precise unit, but too much as milliseconds will suffice for audio operations. As such, I will continue the illustration in milliseconds.
Your calculation for the period of a 16th note at 120 BPM indeed checks out at the correct value of 125 ms. The previously mentioned calculation for the period corresponding to each buffer size is 4.3537 ms. This indicates you will iterate the buffer loop 28.7112 times before the time of a single sixteenth note passes. In your example however, you check whether the "offset" for this sixteenth note has passed at the END of the buffer iteration loop (where the period for a single buffer has already been added to the elapsed time!), by using:
elapsed>=period
Which will already lead to drift at the first occasion as at this moment "elapsed" would be at (192 * 29 iterations) 5568 samples (or 126.26 ms), rather than at (192 * 28.7112 iterations) 5512 samples (or 126 ms). This is a difference of 56 samples (or when speaking in time : 1.02 ms). This wouldn't of course lead to samples playing back FASTER than expected (as you stated), but already leads to a irregularity in playback. For the second 16th note (which would occur at the 57.4224th iteration, the drift would be 11136 - 11025 = 111 samples or 2.517 ms (more than half your buffer time!) As such, you must perform this check WITHIN the
while(pos1<buffSize)
loop, where you are incrementing the read pointer up until the size of the buffer has been reached. As such you will need to increase another variable by a fraction of the buffer period PER buffer sample.
I hope the above example illustrates why I'd initially proposed counting time by sample iterations rather than elapsed time (of course the samples DO indicate time, as they are merely translations of a unit of time to an amount of samples in a buffer, but you can use these numbers as the markers, rather than adding a fixed interval to a counter as in your render loop).
First of all, some convenience math to help you with getting these values :
// calculate the amount of samples are necessary for storing the given length of time
// ( in milliSeconds ) at the given sample rate ( in Hz )
int millisecondsToSamples( int milliSeconds, int sampleRate )
{
return ( int ) ( milliSeconds * ( sampleRate / 1000 ));
}
OR : These calculations which are more convenient when thinking in a musical context like you mentioned in your post. Calculate the amount of samples that are present in a single bar of music at the given sample rate ( in Hz ), tempo ( in BPM ) and time signature ( timeSigBeatUnit being the "4" and timeSigBeatAmount being the "3" in a time signature of 3/4 - although most sequencers limit themselves to 4/4 I've added the calculation for explaining the logic).
int samplesPerBeat = ( int ) (( sampleRate * 60 ) / tempo );
int samplesPerBar = samplesPerBeat * timeSigBeatAmount;
int samplesPerSixteenth = ( int ) ( samplesPerBeat / 4 ); // 1/4 of a beat being a 16th
etc.
The way you then write the timed samples into the output buffer is by keeping track of the "playback position' in your buffer callback, i.e. each time you write a buffer, you'll be incrementing the playback position with the length of the buffer. Returning to a musical context: if you were to be "looping a single bar of 120 bpm in 4/4 time", when the playback position would exceed (( sampleRate * 60 ) / 120 * 4 = 88200 samples, you reset it to 0 to "loop" from the beginning.
So let's assume you have two "events" of audio that occur in a sequence of a single bar of 4/4 time at 120 BPM. One event is to play on the 1st beat of a bar and lasts for a quaver (1/8 of a bar) and the other is to play on the 3rd beat of a bar and lasts for another quaver. These two "events" (which you could represent in a value object) would have the following properties, for the first event:
int start = 0; // buffer position 0 is at the 1st beat/start of the bar
int length = 11025; // 1/8 of the full bar size
int end = 11025; // start + length
and the second event:
int start = 44100; // 3rd beat (or half-way through the bar)
int length = 11025;
int end = 55125; // start + length
These value objects could have two additional properties such as "sample" which could be the buffer containing the actual audio and "readPointer" which would hold the last sample-buffer index the sequencer has read from last.
Then in the buffer write loop:
int playbackPosition = 0; // at start of bar
int maximumPlaybackPosition = 88200; // i.e. a single bar of 4/4 at 120 bpm
public void run()
{
// loop through list of "audio events" / samples
for ( CustomValueObject audioEvent : audioEventList )
{
// loop through the buffer length this cycle will write
for ( int i = 0; i < bufferSize; ++i )
{
// calculate "sequence position" from playback position and current iteration
int seqPosition = playbackPosition + i;
// sequence position within start and end range of audio event ?
if ( seqPosition >= audioEvent.start && seqPosition <= audioEvent.end )
{
// YES! write its sample content into the output buffer
output[ i ] += audioEvent.sample[ audioEvent.readPointer ];
// update the sample read pointer to the next slot (but keep in bounds)
if ( ++audioEvent.readPointer == audioEvent.length )
audioEvent.readPointer = 0;
}
}
// update playback position and keep within sequencer range for looping
if ( playbackPosition += bufferSize > maximumPosition )
playbackPosition -= maximumPosition;
}
}
This should give you a perfectly timed approach in writing audio. There's still some magic you have to work out when you're hitting the iteration where the sequence will loop (i.e. read the remaining unprocessed buffer length from the start of the sample for seamless looping) but I hope this gives you a general idea on a working approach.
Here is my code
private static int Fs = 44100;
private byte recorderAudiobuffer[] = new byte [1024];
AudioRecord recorder = new AudioRecord(AudioSource.MIC, Fs, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 4096);
//start recorder
recorder.startRecording();
A value is identified by bin[n] and bin[n+1] of recorderAudiobuffer[].
What unit are these values?
The difference between the recordAudiobuffer[] elements would be 1/44100 second, since your sample rate is 44100 Hz.
As for the value of the bytes they hold, it could mean any sound level. If the sensitivity is low, the maximum value for the bytes could be, I don't know, 12, even if the sound you're recording is loud. On the other hand, with the sensitivity turned up, 255 could be a whisper, and it's railed after that.
This is more an audio question :), but here it goes, to be simple as much as possible, sound data has two dimensions, one is time based and other is sample based, so you have a sound separated on 44100 * (sample rate for example 16bits per amplitude) you have 705600 bits needed for one second or 88200, and in this case if you have a same sampling you will have 11.6 something milliseconds of buffer.
And of course you are asking about amplitude values.
Hope this helps and enjoy your work.