Beat matching crossfade with ExoPlayer

Beat matching crossfade with ExoPlayer - android

I want to implement a beat matching Crossfade feature using ExoPlayer. Basically I have a concept how it should work, but I find it hard to adapt it to ExoPlayer.
Let me please first write how I want to do this so you can understand the case.
As you probably know Beat Matching Crossfade let to seamlessly switch from one song to another. Additionally it adjusts second song tempo to the first song tempo during the crossfade.
So my plan is as follows:
1. Load song A and B so they both starts to buffer.
2. Decoded samples of song A and B are stored in buffers BF1 and BF2.
3. There would be a class called MUX which is a main buffer and contains both songs buffers, BF1 and BF2. MUX provides audio samples to the Player. Samples provided to the Player are BF1 samples or mixed samples from BF1 and BF2 if there is a crossfade.
4. When buffer reaches the crossfade point then samples are send to Analyser class so it can analyse samples from both buffers and modify them for crossfade. Analyser sends modified samples to MUX which updates it's main buffer.
When crossfade is finished then load a next song from playlist.
My main question is how to mix two songs so I can implement class like MUX.
What I know so far is that I can access decoded samples in MediaCodecRender.processOutputBuffer() method so from that point I could create my BF1 and BF2 buffers.
There was also an idea to create two instances of ExoPlayer and while first song is playing the second one is analysed and it's samples are modified for further crossfade, but I think it may be hard to synchronise two players so the beats would match.
Thanks in advance for any help!

Answering #David question about crossfade implementation it looks more or less like this. You have to listen for active player playback and call this method when you want to start crossfade. It's in RxJava but can be easily migrated to Couroutines Flow.
fun crossfadeObservable(
fadeOutPlayer: Player?,
fadeInPlayer: Player,
crossfadeDurationMs: Long,
crossfadeScheduler: Scheduler,
uiScheduler: Scheduler
): Observable<Unit> {
val fadeInMaxGain = fadeInPlayer.audioTrack?.volume ?: 1f
val fadeOutMaxGain = fadeOutPlayer?.audioTrack?.volume ?: 1f
fadeOutPlayer?.enableAudioFocus(false)
fadeInPlayer.enableAudioFocus(true)
fadeInPlayer.playerVolume = 0f
fadeInPlayer.play()
fadeOutPlayer?.playerVolume = fadeOutMaxGain
fadeOutPlayer?.play()
val iterations: Float = crossfadeDurationMs / CROSSFADE_STEP_MS.toFloat()
return Observable.interval(CROSSFADE_STEP_MS, TimeUnit.MILLISECONDS, crossfadeScheduler)
.take(iterations.toInt())
.map { iteration -> (iteration + 1) / iterations }
.filter { percentOfCrossfade -> percentOfCrossfade <= 1f }
.observeOn(uiScheduler)
.map { percentOfCrossfade ->
fadeInPlayer.playerVolume = percentOfCrossfade.coerceIn(0f, fadeInMaxGain)
fadeOutPlayer?.playerVolume = (fadeOutMaxGain - percentOfCrossfade).coerceIn(0f, fadeOutMaxGain)
}
.last()
.doOnTerminate { fadeOutPlayer?.pause() }
.doOnUnsubscribe { fadeOutPlayer?.pause() }
}
const val CROSSFADE_STEP_MS = 100L

Related

Does ExoPlayer create a windowIndex for each media source in ConcatenatingMediaSource?

I'm working on an app that streams a list of mp3 files, to do this I've used ExoPlayer with a ConcatenatingMediaSource as this:
private fun createMediaSource(
tracks: List<Track>
): MediaSource = ConcatenatingMediaSource(true).apply {
tracks.forEach { track ->
val mediaSource = ProgressiveMediaSource
.Factory(DefaultDataSourceFactory(context))
.createMediaSource(MediaItem.fromUri(track.getFullUri()))
addMediaSource(mediaSource)
}
}
This works great, the files play as list with no errors at all, however what's required from me is to play all these streams as a single stream, where I show the total length of all streams on the seek bar, and the user would seek seamlessly between them.
Of course I'm not using the VideoPlayer provided by ExoPlayer because I need the seekbar to span all media sources, which apparently this is not possible to do with ExoPlayerUi.
So this is the logic I've used when the user tries to seek:
exoPlayer.apply {
var previousTracksLength = 0L
var windowIndex = 0
var currentItemLength = 0L
run loop#{
tracksList.forEachIndexed { index, track ->
currentItemLength = track.getLengthMillis()
previousTracksLength += currentItemLength
if (newPositionMillis < previousTracksLength) {
windowIndex = index
return#loop
}
}
}
val positionForCurrentTrack = (newPositionMillis - (previousTracksLength - currentItemLength))
pause()
if (windowIndex == currentWindowIndex) {
seekTo(positionForCurrentTrack)
} else {
seekTo(windowIndex, positionForCurrentTrack)
}
play()
}
This works amazingly well when the ConcatenatingMediaSource has only 3 or less media sources, but if it's bigger than that, weird behavior starts showing up, I might just want to seek 10 seconds forward the player would move more than 2 minutes instead.
After debugging it was obvious for me that when I call: seekTo(windowIndex, positionForCurrentTrack) exoPlayer is seeking to a window that's not mapped with a specific media source in the ConcatenatingMediaSource !
And here comes my questions:
Does ExoPlayer create a single window for each mediaSource in the ConcatenatingMediaSource or not ?
and If not is there a way to force it to do that ?

This is not really an answer but the explanation to why when I called seekTo(windowIndex, position) the player seemed like it was ignoring the windowIndex and actually seek to a completely unexpected position is because the media type was mp3 !
Apparently many devs have suffered the same issue where the player seek position is out of sync with the real position of the media that's being played when it's an mp3.
More details for anyone having weird issues when playing mp3 using ExoPlayer
https://github.com/google/ExoPlayer/issues/6787#issuecomment-568180969

androidx.media2.player.MediaPlayer combining two sounds

I have two instances of androidx.media2.player.MediaPlayer, one for playing .mp4, one for playing .wav. I want them both to play audio simultaneously.
I am using setAudioAttributes for two players, but as soon as I set the attributes for both player, none of them plays sound, and the video player doesn't even play video.
val soundFile = course.header.directory + soundFileName
val file = File(soundFile)
val fileDescriptor = ParcelFileDescriptor.open(
file,
ParcelFileDescriptor.MODE_READ_ONLY)
val mediaItem = FileMediaItem.Builder(fileDescriptor).build()
audioPlayer.setMediaItem(mediaItem)
audioPlayer.setAudioAttributes(AudioAttributesCompat.Builder()
.setContentType(AudioAttributesCompat.CONTENT_TYPE_SPEECH)
.setUsage(AudioAttributesCompat.USAGE_MEDIA)
.build()
)
audioPlayer.prepare().addListener({
videoPlayer.play()
audioPlayer.play()
}, ContextCompat.getMainExecutor(context))
I set attributes to videoPlayer in a similar way.
I've been trying to set the same session id to both players, but it didn't help.
Blocking the thread for five seconds appears to work around the problem, but I don't know why:
audioPlayer.prepare().addListener({
videoPlayer.play()
audioPlayer.play()
Thread.sleep(5000)
}, ContextCompat.getMainExecutor(context))

I haven't found a solution. I've moved on to ExoPlayer, which doesn't have setAudioAttributes.

WebRTC ScreenCapturerAndroid

I'm trying to create a screen capturer app for android. I already have the webRTC portion setup with a video capturer using Camera2Enumerator library from here. How can I modify this to create a pre-recorded video capturer instead of camera capturer.
Thanks!

Just wanted to give an update that I have solved this. I'm unable to share the entire code but here's a process that might help:
Acquire one frame of your pre-recorded file and store in a byte array(must be in YUV format)
Replace the VideoCapturer() with the following:
fun onGetFrame(p0: ByteArray?) {
var timestampNS = java.util.concurrent.TimeUnit.MILLISECONDS.toNanos(SystemClock.elapsedRealtime())
var buffer:NV21Buffer = NV21Buffer(p0,288,352,null)
var videoFrame:VideoFrame = VideoFrame(buffer,0,timestampNS)
localVideoSource.capturerObserver.onFrameCaptured(videoFrame)
videoFrame.release()
}
where p0 is the byte array with the frame
Call this function in startLocalVideoCapture() using a timer (every few milliseconds...I used 10 nanoseconds) https://developer.android.com/reference/android/os/CountDownTimer
remove this line in startLocalVideoCapture()--->
VideoCapturer.initialize(
surfaceTextureHelper,
localVideoOutput.context,
localVideoSource.capturerObserver)

Video mirroring in android

Recently, I was working with camerax for recording video with the front camera only. But I ran into an issue where the video is being mirrored after saving.
Currently, I am using a library(Mp4Composer-android) to mirror the video after recording which takes up a processing time. So, I noticed that Snapchat and Instagram are giving the output without this processing.
After that happened I also noticed that our native camera application is providing an option to select whether we want to mirror the video or not.
The configuration I have added to camerax,
videoCapture = VideoCapture
.Builder()
.apply {
setBitRate(2000000)
setVideoFrameRate(24)
}
.build()
How can I make my camera not mirror the video?

Temporary Solution:
I used this library as a temporary solution. The issue with this library was that I had to process the video after I record it and it took considerably some time to do it. I used this code:
Add this to gradle:
//Video Composer
implementation 'com.github.MasayukiSuda:Mp4Composer-android:v0.4.1'
Code for flipping:
Mp4Composer(videoPath, video)
.flipHorizontal(true)
.listener(
object : Mp4Composer.Listener {
override fun onProgress(progress: Double) { }
override fun onCurrentWrittenVideoTime(timeUs: Long) { }
override fun onCompleted() { }
override fun onCanceled() { }
override fun onFailed(exception: Exception?) { }
}
)
.start()
Note: This will compress your video too. Look into the library documentation for more details
An answer that was given to me by a senior developer who worked in a video-based NDK for a long time:
Think of the frames that are given out by the camera going through a dedicated highway. There is a way where we can capture all the frames going through that highway.
Capture the frames coming through that highway
Flip the pixels of each frame
Give out the frames through that same highway
He didn't specify how to capture and release the frames.
Why I didn't use that solution(The issue):
If we have to perform this action in real-time, We have to do that with high efficiency. Ranging with the quality of the camera, We have to capture anywhere from 24 frames to 120 frames per second and process and dispatch the frames.
If we need to do that, we need NDK developers and a lot of engineering which most startups can't afford.

In the application which I want to create, I face some technical obstacles. I have two music tracks in the application. For example, a user imports the music background as a first track. The second path is a voice recorded by the user to the rhythm of the first track played by the speaker device (or headphones). At this moment we face latency. After recording and playing back in the app, the user hears the loss of synchronisation between tracks, which occurs because of the microphone and speaker latencies.
Firstly, I try to detect the delay by filtering the input sound. I use android’s AudioRecord class, and the method read(). This method fills my short array with audio data.
I found that the initial values of this array are zeros so I decided to cut them out before I will start to write them into the output stream.
So I consider those zeros as a „warmup” latency of the microphone. Is this approach correct? This operation gives some results, but it doesn’t resolve the problem, and at this stage, I’m far away from that.
But the worse case is with the delay between starting the speakers and playing the music. This delay I cannot filter or detect. I tried to create some calibration feature which counts the delay. I play a „beep” sound through the speakers, and when I start to play it, I also begin to measure time. Then, I start recording and listen for this sound being detected by the microphone. When I recognise this sound in the app, I stop measuring time. I repeat this process several times, and the final value is the average from those results. That is how I try to measure the latency of the device. Now, when I have this value, I can simply shift the second track backwards to achieve synchronisation of both records (I will lose some initial milliseconds of the recording, but I skip this case, for now, there are some possibilities to fix it).
I thought that this approach would resolve the problem, but it turned out this is not as simple as I thought. I found two issues here:
1. Delay while playing two tracks simultaneously
2. Random in device audio latency.
The first: I play two tracks using AudioTrack class and I run method play() like this:
val firstTrack = //creating a track
val secondTrack = //creating a track
firstTrack.play()
secondTrack.play()
This code causes delays at the stage of playing tracks. Now, I don’t even have to think about latency while recording; I cannot play two tracks simultaneously without delays. I tested this with some external audio file (not recorded in my app) - I’m starting the same audio file using the code above, and I can see a delay. I also tried it with MediaPlayer class, and I have the same results. In this case, I even try to play tracks when callback OnPreparedListener invoke:
val firstTrack = //AudioPlayer
val secondTrack = //AudioPlayer
second.setOnPreparedListener {
first.start()
second.start()
}
And it doesn’t help.
I know that there is one more class provided by Android called SoundPool. According to the documentation, it can be better with playing tracks simultaneously, but I can’t use it because it supports only small audio files and that can't limit me.
How can I resolve this problem? How can I start playing two tracks precisely at the same time?
The second: Audio latency is not deterministic - sometimes it is smaller, and sometimes it’s huge, and it’s out of my hands. So measuring device latency can help but again - it cannot resolve the problem.
To sum up: is there any solution, which can give me exact latency per device (or app session?) or other triggers which detect actual delay, to provide the best synchronisation while playback two tracks at the same time?
Thank you in advance!

Synchronising audio for karaoke apps is tough. The main issue you seem to be facing is variable latency in the output stream.
This is almost certainly caused by "warm up" latency: the time it takes from hitting "play" on your backing track to the first frame of audio data being rendered by the audio device (e.g. headphones). This can have large variance and is difficult to measure.
The first (and easiest) thing to try is to use MODE_STREAM when constructing your AudioTrack and prime it with bufferSizeInBytes of data prior to calling play (more here). This should result in lower, more consistent "warm up" latency.
A better way is to use the Android NDK to have a continuously running audio stream which is just outputting silence until the moment you hit play, then start sending audio frames immediately. The only latency you have here is the continuous output latency.
If you decide to go down this route I recommend taking a look at the Oboe library (full disclosure: I am one of the authors).
To answer one of your specific questions...
Is there a way to calculate the latency of the audio output stream programatically?
Yes. The easiest way to explain this is with a code sample (this is C++ for the AAudio API but the principle is the same using Java AudioTrack):
// Get the index and time that a known audio frame was presented for playing
int64_t existingFrameIndex;
int64_t existingFramePresentationTime;
AAudioStream_getTimestamp(stream, CLOCK_MONOTONIC, &existingFrameIndex, &existingFramePresentationTime);
// Get the write index for the next audio frame
int64_t writeIndex = AAudioStream_getFramesWritten(stream);
// Calculate the number of frames between our known frame and the write index
int64_t frameIndexDelta = writeIndex - existingFrameIndex;
// Calculate the time which the next frame will be presented
int64_t frameTimeDelta = (frameIndexDelta * NANOS_PER_SECOND) / sampleRate_;
int64_t nextFramePresentationTime = existingFramePresentationTime + frameTimeDelta;
// Assume that the next frame will be written into the stream at the current time
int64_t nextFrameWriteTime = get_time_nanoseconds(CLOCK_MONOTONIC);
// Calculate the latency
*latencyMillis = (double) (nextFramePresentationTime - nextFrameWriteTime) / NANOS_PER_MILLISECOND;
A caveat: This method relies on accurate timestamps being reported by the audio hardware. I know this works on Google Pixel devices but have heard reports that it isn't so accurate on other devices so YMMV.

Following the answer of donturner, here's a Java version (that also uses other methods depending on the SDK version)
/** The audio latency has not been estimated yet */
private static long AUDIO_LATENCY_NOT_ESTIMATED = Long.MIN_VALUE+1;
/** The audio latency default value if we cannot estimate it */
private static long DEFAULT_AUDIO_LATENCY = 100L * 1000L * 1000L; // 100ms
/**
* Estimate the audio latency
*
* Not accurate at all, depends on SDK version, etc. But that's the best
* we can do.
*/
private static void estimateAudioLatency(AudioTrack track, long audioFramesWritten) {
long estimatedAudioLatency = AUDIO_LATENCY_NOT_ESTIMATED;
// First method. SDK >= 19.
if (Build.VERSION.SDK_INT >= 19 && track != null) {
AudioTimestamp audioTimestamp = new AudioTimestamp();
if (track.getTimestamp(audioTimestamp)) {
// Calculate the number of frames between our known frame and the write index
long frameIndexDelta = audioFramesWritten - audioTimestamp.framePosition;
// Calculate the time which the next frame will be presented
long frameTimeDelta = _framesToNanoSeconds(frameIndexDelta);
long nextFramePresentationTime = audioTimestamp.nanoTime + frameTimeDelta;
// Assume that the next frame will be written at the current time
long nextFrameWriteTime = System.nanoTime();
// Calculate the latency
estimatedAudioLatency = nextFramePresentationTime - nextFrameWriteTime;
}
}
// Second method. SDK >= 18.
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED && Build.VERSION.SDK_INT >= 18) {
Method getLatencyMethod;
try {
getLatencyMethod = AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
estimatedAudioLatency = (Integer) getLatencyMethod.invoke(track, (Object[]) null) * 1000000L;
} catch (Exception ignored) {}
}
// If no method has successfully gave us a value, let's try a third method
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
AudioManager audioManager = (AudioManager) CRT.getInstance().getSystemService(Context.AUDIO_SERVICE);
try {
Method getOutputLatencyMethod = audioManager.getClass().getMethod("getOutputLatency", int.class);
estimatedAudioLatency = (Integer) getOutputLatencyMethod.invoke(audioManager, AudioManager.STREAM_MUSIC) * 1000000L;
} catch (Exception ignored) {}
}
// No method gave us a value. Let's use a default value. Better than nothing.
if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
estimatedAudioLatency = DEFAULT_AUDIO_LATENCY;
}
return estimatedAudioLatency
}
private static long _framesToNanoSeconds(long frames) {
return frames * 1000000000L / SAMPLE_RATE;
}

The android MediaPlayer class is notoriously slow to begin audio playback, I experienced an issue in an app I was creating where there was a greater than one second delay to begin playing an audio clip. I resolved it by switching to ExoPlayer which resulted in the playback starting within 100ms. I've also read that ffmpeg has even faster start audio startup time than ExoPlayer but I haven't used it so I can't make any promises.

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.