So I've read various posts and blogs on how to get the amplitude from the microphone on Android. Multiple posts suggested using this implementation:
minSize = AudioRecord.getMinBufferSize(
    44100,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT
)
audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC,
    44100,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    minSize
)
audioRecord.startRecording() // recording must be started before read() returns data
val buffer = ShortArray(minSize)
audioRecord.read(buffer, 0, minSize)
var max = 0
for (s in buffer) {
    if (abs(s.toInt()) > max) {
        max = abs(s.toInt())
    }
}
return max.toDouble() // this being the amplitude
First question: What is the unit of the value being returned? With "regular noise" the value is typically something like 381. Is that, for example, in millivolts (mV)?
On iOS you can get averagePower and peakPower from the AVAudioRecorder, which return the average or peak amplitude in dBFS.
Second question: Is it possible to do the same implementation that we have on Android on iOS?
Third question: Is it possible to do the same implementation that we have on iOS on Android?
To provide some context: as part of a research project we are looking for different sound patterns that might link a user to a specific context, and to do so at a larger scale we need to be able to compare the amplitude across Android and iOS.
Fourth question: Regardless of the implementations mentioned above, is there a better way to compare soundwaves from microphones of both iOS and Android devices?
The easiest way to get the max amplitude on Android is a lot less code. Just call MediaRecorder.getMaxAmplitude(). It returns the max amplitude since the previous call. I'm not sure where you got the suggestion to use the code you did; that seems like the hard way.
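For reference, a minimal sketch of that approach, assuming the RECORD_AUDIO permission is already granted; the "/dev/null" output path is a common trick for amplitude-only use and is an assumption (a temp file is the safe fallback):

// Hedged sketch: start a throwaway recording so getMaxAmplitude() has data to report.
MediaRecorder recorder = new MediaRecorder();
recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
recorder.setOutputFile("/dev/null");
try {
    recorder.prepare();
    recorder.start();
} catch (IOException e) {
    // handle failure to access the microphone
}

// getMaxAmplitude() returns the maximum absolute sample value (0..32767)
// seen since the previous call; the very first call after start() returns 0.
int amplitude = recorder.getMaxAmplitude();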
The units aren't specified. It should correspond to the amount of pressure picked up by the mic, but as every model will have different mics, amps, DACs, etc., it isn't going to be the same on all devices. All you can be promised is that it will lie between 0 and 2^(x-1) - 1, where x is the number of bits per sample you picked (so 0 to 32767 for 16-bit PCM, since samples are signed). I'm not sure why you'd think it would be in millivolts; this isn't an electrical measurement. Sound is typically measured in dB or pascals.
For comparing iOS to Android and trying to do matching: what are you trying to do? The code you have just finds the max value. That's kind of uninteresting, unless all you're doing is an applause meter. If you're actually looking to compare sound waves, wouldn't you be better off taking the Fourier transform and comparing in the frequency domain? The time domain is really messy for that.
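To illustrate the frequency-domain idea only (not the answerer's code): compute a magnitude spectrum with a plain DFT and compare spectra instead of raw samples. In practice you would use an FFT library and window the signal; the buffer is assumed to come from the AudioRecord code above.

// Naive DFT magnitude spectrum of a short PCM buffer (illustrative only; O(n^2)).
static double[] magnitudeSpectrum(short[] samples) {
    int n = samples.length;
    double[] magnitudes = new double[n / 2];      // bins up to the Nyquist frequency
    for (int k = 0; k < n / 2; k++) {
        double re = 0, im = 0;
        for (int t = 0; t < n; t++) {
            double angle = 2 * Math.PI * k * t / n;
            re += samples[t] * Math.cos(angle);
            im -= samples[t] * Math.sin(angle);
        }
        magnitudes[k] = Math.sqrt(re * re + im * im);
    }
    return magnitudes;
}

Bin k corresponds to frequency k * sampleRate / n, so spectra from Android and iOS recordings can be compared bin by bin as long as both use the same sample rate and buffer length.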
I'm currently trying to play back audio using AudioTrack. Audio is received over the network; the application continuously reads data and adds it to an internal buffer. A separate thread consumes the data and uses AudioTrack to play it back.
Problems:
Audio playback fluctuates (it feels like audio drops at regular intervals), which makes it unclear.
The playback speed is too high or too low, which makes it sound unrealistic.
To rule out network latency and other factors, I made the application wait until it had read enough data and then play it back at the end.
This makes the audio play really fast. Here is a basic sample of the logic I use:
sampleRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT),
        AudioTrack.MODE_STREAM);
audioTrack.play();

short[] shortBuffer = new short[AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)];
while (!stopRequested) {
    readData(shortBuffer);
    audioTrack.write(shortBuffer, 0, shortBuffer.length, AudioTrack.WRITE_BLOCKING);
}
Is it correct to say that the Android AudioTrack class doesn't have built-in functionality to control audio playback based on environment conditions? If so, are there better libraries available that offer a simpler way to play back audio?
The first issue that I see is the arbitrary sampling rate.
AudioTrack.getNativeOutputSampleRate will return the sampling rate used by the sound system. It may be 44100, 48000, 96000, 192000 or whatever. But it looks like you have audio data from some independent source, which produces data at one exact sampling rate.
Let's say the audio data from the source is sampled at 44100 samples per second. If you start playing it at 96000 it will be sped up and higher pitched.
So use the sampling rate setting, along with the number of channels, sample format, etc., as given by the source, not relying on system defaults.
The second: are you sure the readData procedure will always be fast enough to fill the buffer, however small the buffer is, and return faster than the buffer is played?
You have created the AudioTrack with AudioTrack.getMinBufferSize passed as the bufferSizeInBytes parameter.
The getMinBufferSize function returns the minimum possible buffer size that can be used with these parameters. Let's say it returned a size corresponding to a buffer of 10 ms length.
That means the new data should be prepared within that time interval, i.e. the interval between the previous write returning control and the next write being performed should be less than the duration of audio the buffer holds.
So if the readData function for some reason takes longer than that interval, playback will pause for that time and you'll hear small gaps in the playback.
The reasons why readData may be delayed could be various: if it's reading data from a file, it may be waiting on IO operations; if it allocates Java objects, it may run into garbage collector delays; if it uses some kind of decoder or another kind of audio source with its own buffering, it may periodically stall while refilling that buffer.
But anyway, if you're not creating some kind of real-time synthesizer which should react as soon as possible to user input, always use a reasonably large buffer size, and never less than what getMinBufferSize returns. I.e.:
sampleRate = 44100; // sampling rate of the source
int bufSize = sampleRate * 4; // 1 second of audio; 4 is the frame size: 2 channels * 2 bytes per sample
bufSize = Math.max(bufSize, AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT)); // not less than getMinBufferSize returns
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufSize,
        AudioTrack.MODE_STREAM);
Like user #pskink said,
Most likely your sampleRate (or any other parameter passed to the
AudioTrack constructor) is invalid.
So I would start by checking what value you are actually passing as the sample rate.
For reference, you can also set the speed of an AudioTrack by calling the setPlaybackParams method:
public void setPlaybackParams (PlaybackParams params)
If you check the AudioTrack docs, you will find the PlaybackParams docs, which let you set the speed and pitch of the output audio. This object can then be passed to your AudioTrack to set its playback parameters.
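As a rough sketch (assuming API level 23 or higher, where PlaybackParams is available, and that audioTrack is the track from your code):

// Hedged sketch: adjust playback speed/pitch on an existing AudioTrack (API 23+).
PlaybackParams params = new PlaybackParams();
params.setSpeed(1.0f);   // 1.0 = normal speed; values here are illustrative
params.setPitch(1.0f);   // 1.0 = original pitch
audioTrack.setPlaybackParams(params);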
However, it is unlikely that you will need to use this if your only issue is the original constructor sampleRate (since we cannot see where the variable sampleRate comes from).
I see many resources recommending that AudioTrack.getTimestamp() be used on modern Android versions to calculate audio latency for audio/video sync.
For instance:
https://stackoverflow.com/a/37625791/332798
https://developer.amazon.com/docs/fire-tv/audio-video-synchronization.html#section1-1
https://groups.google.com/forum/#!topic/android-platform/PoHfyNK54ps
However, none of these explain how to use the timestamp to calculate the latency. I'm struggling to figure out what to do with the timestamp's framePosition/nanoTime to come up with a latency number.
So prior to this API, you would use AudioTrack.getPlaybackHeadPosition(), which was only an approximation. To account for latency you had to offset that value with a latency figure from one of two hidden methods: AudioManager.getOutputLatency() or AudioTrack.getLatency().
With the new AudioTrack.getTimestamp() API, you get a snapshot of the playhead position at a given time, taken directly at the output. As such, it is fully accurate and already accounts for device latency. Thus there's no need to call any other APIs now to add/remove latency.
The caveat is that this timestamp is only a snapshot, and the docs recommend you don't call this new method very often. So the trick to getting the "current" position is to use your last snapshot and linearly interpolate what the current value should be:
playheadPos = timestamp.framePosition +
(System.nanoTime() - timestamp.nanoTime) * samplerate / 1e9;
This position can then be compared against how many frames you've written into the AudioTrack, by maintaining another counter which increments every time AudioTrack.write() completes:
int bytesWritten = track.write(...);
writtenPos += bytesWritten / pcmFrameSize;
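Putting those two pieces together, a minimal sketch of the latency calculation for a PCM track (Java; writtenPos is the frame counter maintained above and sampleRate is the track's sample rate, both assumed to be in scope):

// Hedged sketch: estimate output latency from an AudioTimestamp snapshot.
AudioTimestamp ts = new AudioTimestamp();
if (track.getTimestamp(ts)) {
    // Interpolate where the playhead is right now, based on the last snapshot.
    long now = System.nanoTime();
    long playheadFrames = ts.framePosition + (now - ts.nanoTime) * sampleRate / 1_000_000_000L;

    // Frames written but not yet played = pipeline latency, expressed in frames.
    long pendingFrames = writtenPos - playheadFrames;
    double latencyMs = pendingFrames * 1000.0 / sampleRate;
}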
If you're working with ENCODING_AC3, the playhead position reported by AudioTrack is still in terms of samples (frames). You will either need to convert it to bytes, or convert the number of bytes you've written back into samples. Either way, you will need to know the bitrate of your AC3 stream (e.g. 384000 bps):
int bytesWritten = track.write(...);
writtenPos += bytesWritten * samplerate / (bitrate / 8);
In the application which I want to create, I face some technical obstacles. I have two music tracks in the application. For example, a user imports the music background as a first track. The second path is a voice recorded by the user to the rhythm of the first track played by the speaker device (or headphones). At this moment we face latency. After recording and playing back in the app, the user hears the loss of synchronisation between tracks, which occurs because of the microphone and speaker latencies.
Firstly, I try to detect the delay by filtering the input sound. I use android’s AudioRecord class, and the method read(). This method fills my short array with audio data.
I found that the initial values of this array are zeros, so I decided to cut them out before writing the data to the output stream.
So I consider those zeros a "warm-up" latency of the microphone. Is this approach correct? This operation gives some results, but it doesn't resolve the problem; at this stage, I'm still far away from that.
But the worse case is the delay between starting the speakers and the music actually playing. This delay I cannot filter or detect. I tried to create a calibration feature which measures the delay: I play a "beep" sound through the speakers, and the moment I start playing it I also begin to measure time. Then I start recording and listen for this sound arriving at the microphone. When I recognise the sound in the app, I stop measuring time. I repeat this process several times, and the final value is the average of those results. That is how I try to measure the latency of the device. Now, when I have this value, I can simply shift the second track backwards to synchronise both recordings (I will lose some initial milliseconds of the recording, but I'll skip that case for now; there are ways to fix it).
I thought that this approach would resolve the problem, but it turned out this is not as simple as I thought. I found two issues here:
1. Delay while playing two tracks simultaneously
2. Random in device audio latency.
The first: I play two tracks using the AudioTrack class and I call the play() method like this:
val firstTrack = //creating a track
val secondTrack = //creating a track
firstTrack.play()
secondTrack.play()
This code causes delays at the stage of playing the tracks. Now, I don't even have to think about latency while recording; I cannot play two tracks simultaneously without delays. I tested this with an external audio file (not recorded in my app): I start the same audio file twice using the code above, and I can hear a delay. I also tried it with the MediaPlayer class, and I got the same results. In this case, I even tried to start the tracks from the OnPreparedListener callback:
val firstTrack = // AudioPlayer
val secondTrack = // AudioPlayer
secondTrack.setOnPreparedListener {
    firstTrack.start()
    secondTrack.start()
}
And it doesn’t help.
I know that there is one more class provided by Android called SoundPool. According to the documentation, it can be better at playing tracks simultaneously, but I can't use it because it only supports small audio files, and that is a limitation I can't accept.
How can I resolve this problem? How can I start playing two tracks precisely at the same time?
The second: Audio latency is not deterministic - sometimes it is smaller, and sometimes it’s huge, and it’s out of my hands. So measuring device latency can help but again - it cannot resolve the problem.
To sum up: is there any solution which can give me the exact latency per device (or per app session?), or another mechanism to detect the actual delay, so that I can provide the best synchronisation while playing back two tracks at the same time?
Thank you in advance!
Synchronising audio for karaoke apps is tough. The main issue you seem to be facing is variable latency in the output stream.
This is almost certainly caused by "warm up" latency: the time it takes from hitting "play" on your backing track to the first frame of audio data being rendered by the audio device (e.g. headphones). This can have large variance and is difficult to measure.
The first (and easiest) thing to try is to use MODE_STREAM when constructing your AudioTrack and prime it with bufferSizeInBytes of data prior to calling play (more here). This should result in lower, more consistent "warm up" latency.
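A hedged sketch of that priming step (assuming 16-bit stereo PCM and a sampleRate variable in scope; a full buffer of silence is written before play(), or you could write the first bufferSizeInBytes of the backing track instead):

// Hedged sketch: create the track in MODE_STREAM and prime it before play().
int bufferSizeInBytes = AudioTrack.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
        bufferSizeInBytes, AudioTrack.MODE_STREAM);

// Prime the track with one full buffer of audio so playback starts with data queued.
byte[] primer = new byte[bufferSizeInBytes];
track.write(primer, 0, primer.length);
track.play();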
A better way is to use the Android NDK to have a continuously running audio stream which is just outputting silence until the moment you hit play, then start sending audio frames immediately. The only latency you have here is the continuous output latency.
If you decide to go down this route I recommend taking a look at the Oboe library (full disclosure: I am one of the authors).
To answer one of your specific questions...
Is there a way to calculate the latency of the audio output stream programmatically?
Yes. The easiest way to explain this is with a code sample (this is C++ for the AAudio API but the principle is the same using Java AudioTrack):
// Get the index and time that a known audio frame was presented for playing
int64_t existingFrameIndex;
int64_t existingFramePresentationTime;
AAudioStream_getTimestamp(stream, CLOCK_MONOTONIC, &existingFrameIndex, &existingFramePresentationTime);
// Get the write index for the next audio frame
int64_t writeIndex = AAudioStream_getFramesWritten(stream);
// Calculate the number of frames between our known frame and the write index
int64_t frameIndexDelta = writeIndex - existingFrameIndex;
// Calculate the time which the next frame will be presented
int64_t frameTimeDelta = (frameIndexDelta * NANOS_PER_SECOND) / sampleRate_;
int64_t nextFramePresentationTime = existingFramePresentationTime + frameTimeDelta;
// Assume that the next frame will be written into the stream at the current time
int64_t nextFrameWriteTime = get_time_nanoseconds(CLOCK_MONOTONIC);
// Calculate the latency
*latencyMillis = (double) (nextFramePresentationTime - nextFrameWriteTime) / NANOS_PER_MILLISECOND;
A caveat: This method relies on accurate timestamps being reported by the audio hardware. I know this works on Google Pixel devices but have heard reports that it isn't so accurate on other devices so YMMV.
Following the answer of donturner, here's a Java version (that also uses other methods depending on the SDK version)
/** The audio latency has not been estimated yet */
private static long AUDIO_LATENCY_NOT_ESTIMATED = Long.MIN_VALUE + 1;

/** The audio latency default value if we cannot estimate it */
private static long DEFAULT_AUDIO_LATENCY = 100L * 1000L * 1000L; // 100ms

/**
 * Estimate the audio latency
 *
 * Not accurate at all, depends on SDK version, etc. But that's the best
 * we can do.
 */
private static long estimateAudioLatency(AudioTrack track, long audioFramesWritten) {
    long estimatedAudioLatency = AUDIO_LATENCY_NOT_ESTIMATED;

    // First method. SDK >= 19.
    if (Build.VERSION.SDK_INT >= 19 && track != null) {
        AudioTimestamp audioTimestamp = new AudioTimestamp();
        if (track.getTimestamp(audioTimestamp)) {
            // Calculate the number of frames between our known frame and the write index
            long frameIndexDelta = audioFramesWritten - audioTimestamp.framePosition;
            // Calculate the time at which the next frame will be presented
            long frameTimeDelta = _framesToNanoSeconds(frameIndexDelta);
            long nextFramePresentationTime = audioTimestamp.nanoTime + frameTimeDelta;
            // Assume that the next frame will be written at the current time
            long nextFrameWriteTime = System.nanoTime();
            // Calculate the latency
            estimatedAudioLatency = nextFramePresentationTime - nextFrameWriteTime;
        }
    }

    // Second method. SDK >= 18.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED && Build.VERSION.SDK_INT >= 18) {
        Method getLatencyMethod;
        try {
            getLatencyMethod = AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
            estimatedAudioLatency = (Integer) getLatencyMethod.invoke(track, (Object[]) null) * 1000000L;
        } catch (Exception ignored) {}
    }

    // If no method has successfully given us a value, let's try a third method
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
        AudioManager audioManager = (AudioManager) CRT.getInstance().getSystemService(Context.AUDIO_SERVICE);
        try {
            Method getOutputLatencyMethod = audioManager.getClass().getMethod("getOutputLatency", int.class);
            estimatedAudioLatency = (Integer) getOutputLatencyMethod.invoke(audioManager, AudioManager.STREAM_MUSIC) * 1000000L;
        } catch (Exception ignored) {}
    }

    // No method gave us a value. Let's use a default value. Better than nothing.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
        estimatedAudioLatency = DEFAULT_AUDIO_LATENCY;
    }

    return estimatedAudioLatency;
}

private static long _framesToNanoSeconds(long frames) {
    return frames * 1000000000L / SAMPLE_RATE;
}
The Android MediaPlayer class is notoriously slow to begin audio playback. I experienced an issue in an app I was creating where there was a greater than one second delay before an audio clip started playing. I resolved it by switching to ExoPlayer, which resulted in playback starting within 100 ms. I've also read that ffmpeg has an even faster audio startup time than ExoPlayer, but I haven't used it so I can't make any promises.
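As a rough illustration only (not the original answerer's code; assuming a recent ExoPlayer 2.x dependency, where context is your Context and audioUri is a placeholder for your clip):

// Hedged sketch: minimal ExoPlayer setup for a quick audio start.
ExoPlayer player = new ExoPlayer.Builder(context).build();
player.setMediaItem(MediaItem.fromUri(audioUri));
player.prepare();                 // prepares asynchronously
player.setPlayWhenReady(true);    // starts playback as soon as it is ready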
Here is my code
private static int Fs = 44100;
private byte recorderAudiobuffer[] = new byte [1024];
AudioRecord recorder = new AudioRecord(AudioSource.MIC, Fs, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 4096);
//start recorder
recorder.startRecording();
A single value is made up of bin[n] and bin[n+1] of recorderAudiobuffer[] (two bytes per 16-bit sample).
What unit are these values in?
The time difference between consecutive samples in recorderAudiobuffer[] is 1/44100 of a second, since your sample rate is 44100 Hz (and each sample spans two bytes because you record 16-bit PCM).
As for the values themselves, they don't correspond to any fixed physical sound level. If the microphone sensitivity (gain) is low, the maximum sample value could be, say, a few hundred, even if the sound you're recording is loud. On the other hand, with the sensitivity turned up, a whisper could already push the samples near the top of the 16-bit range (around ±32767), and anything louder just clips.
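Since the buffer is raw bytes, here is a hedged sketch of turning each byte pair into a 16-bit sample, assuming you read into recorderAudiobuffer and the little-endian byte order that Android devices use for PCM 16-bit:

// Hedged sketch: interpret the raw byte buffer as signed 16-bit PCM samples.
int read = recorder.read(recorderAudiobuffer, 0, recorderAudiobuffer.length);
short[] samples = new short[read / 2];
ByteBuffer.wrap(recorderAudiobuffer, 0, read)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
        .get(samples);
// samples[i] is now in the range -32768..32767; its time offset is i / 44100.0 seconds.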
This is more of an audio question :), but here it goes. To keep it as simple as possible: sound data has two dimensions, one time-based (the sample rate) and one amplitude-based (the bits per sample). At 44100 samples per second and 16 bits per sample you need 705,600 bits, or 88,200 bytes, for one second of audio. With that sampling, your 1024-byte buffer holds 512 samples, which is roughly 11.6 milliseconds of audio.
And of course you are asking about amplitude values.
Hope this helps and enjoy your work.
I need to implement functionality in my app that reads the intensity of the voice through the microphone, but I don't know how to do it. Can someone help me, please? I just need to read the intensity of the voice, not recognise any words.
I understand that I have to use the AudioRecord class, but I don't understand what steps I need to write in my code. I don't know whether it is really necessary to save a bit of the voice to the SD card, convert it to PCM afterwards, and then read the maximum of that signal.
The AudioRecord class will let you record into a buffer. You can then choose to process the buffer or save it to the SD card, depending on your needs. Which you want to do depends entirely on your application. Do you need the data after you process it? Or is the processed result all you need? Do you intend to play back the recordings?
A simplified example of how to use the AudioRecord class follows:
AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_STEREO,
        AudioFormat.ENCODING_PCM_16BIT, bufferSize);
recorder.startRecording();
short[] buf = new short[bufferSize];
int n = 0;
while (<some condition>) {
    n = recorder.read(buf, 0, bufferSize);
    process(buf);
}
recorder.stop();
recorder.release();
You would obviously want to put the above code in a thread outside of the main UI thread.
You need to make sure that whatever you do in process is quick enough that you can get back around to reading the data before the buffer fills up, or you will drop data. Sample rate and buffer size will depend on how you are processing the data and what your latency requirements are.
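Since the question is specifically about voice intensity, here is a hedged sketch of what process() might compute (peak and RMS amplitude of one buffer; the method and any conversion shown are just placeholders, not a fixed API):

// Hypothetical process(): compute the peak and RMS amplitude of one buffer of samples.
static double process(short[] buf) {
    long sumSquares = 0;
    int peak = 0;
    for (short s : buf) {
        int v = Math.abs((int) s);
        if (v > peak) peak = v;
        sumSquares += (long) s * s;
    }
    double rms = Math.sqrt((double) sumSquares / buf.length);
    // Relative to full scale: 20 * log10(rms / 32767.0) gives dBFS (0 dBFS = loudest possible).
    return rms;   // or return peak, depending on which notion of "intensity" you want
}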
After you get all that working, you may decide that you want to put the phone into 'Speaker Phone' in order to get better gain through the mic:
AudioManager amAudioManager;
amAudioManager = (AudioManager)getSystemService(Context.AUDIO_SERVICE);
amAudioManager.setMode(AudioManager.MODE_IN_CALL);
amAudioManager.setSpeakerphoneOn(true);
Yes, you have to put the phone in IN_CALL in order to enable the speaker phone. Yes, some phones apparently disable the ability to record when IN_CALL.