I'm building an app where it's important to have accurate seeking in MP3 files.
Currently, I'm using ExoPlayer in the following manner:
public void playPeriod(long startPositionMs, long endPositionMs) {
    // ClippingMediaSource expects microseconds, hence the * 1000.
    MediaSource mediaSource = new ClippingMediaSource(
            new ExtractorMediaSource.Factory(mDataSourceFactory)
                    .createMediaSource(mFileUri),
            startPositionMs * 1000,
            endPositionMs * 1000
    );
    mExoPlayer.prepare(mediaSource);
    mExoPlayer.setPlayWhenReady(true);
}
In some cases, this approach results in offsets of 1-3 seconds relative to the expected playback times.
I found this issue on ExoPlayer's GitHub. It looks like this is an intrinsic limitation of ExoPlayer with the MP3 format and it won't be fixed.
I also found this question, which seems to suggest that the same issue exists in Android's native MediaPlayer and MediaExtractor.
Is there a way to perform an accurate seek in local (i.e., on-device) MP3 files on Android? I'm more than willing to take any hack or workaround.
MP3 files are not inherently seekable. They don't contain any timestamps; a file is just a series of MPEG frames, one after the other. That makes this tricky. There are two methods for seeking an MP3, each with tradeoffs.
The most common (and fastest) method is to read the bitrate from the first frame header (or perhaps the average bitrate from the first few frame headers), say 128 kbps. Then, take the byte length of the entire file and divide it by this bitrate to estimate the duration of the file. Then, let the user seek into the file: if they seek 1:00 into a 2:00 file, jump to the byte at the 50% mark and "needle drop" into the stream. Read the file until the sync word for the next frame header comes by, and then begin decoding.
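As a concrete illustration, here is a minimal sketch of that estimate in Java, assuming a constant-bitrate stream with no ID3 tags (the method and parameter names are invented for this example):

// Estimate where to "needle drop" for a given seek time, assuming CBR.
static long estimateSeekOffset(long fileSizeBytes, int bitrateBps, long seekMs) {
    // Estimated duration of the whole file: bytes * 8 / bitrate.
    long durationMs = fileSizeBytes * 8000L / bitrateBps;
    // Jump proportionally into the file...
    long offset = fileSizeBytes * seekMs / durationMs;
    // ...then scan forward from 'offset' for the next frame sync word
    // (11 set bits, 0xFFE) before handing the stream to the decoder.
    return offset;
}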
As you can imagine, this method isn't accurate. At best, you're going to land within half a frame of the target on average. With frame sizes of 576 samples, that's pretty accurate. However, there are problems with calculating the needle-drop point in the first place. The most common issue is that ID3 tags and the like add size to the file, throwing off the calculation. A more severe issue is a variable-bitrate (VBR) file: if you have music encoded with VBR and the beginning of the track is silent-ish or otherwise easy to encode, the beginning might be 32 kbps whereas one second in might be 320 kbps. A 10x error in calculating the duration of the file!
The second method is to decode the whole file to raw PCM samples. This means you can guarantee sample-accurate seeking, but you must decode at least up to the seek point. If you want a proper duration for the full track, you must decode the whole file. Some 20 years ago, this was painfully slow: seeking into a track would take almost as long as listening up to the point you were seeking to. These days, for short files, you can probably decode them fast enough that it doesn't matter much.
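On Android, one way to do that decode is with MediaExtractor and MediaCodec. A rough sketch, assuming API 21+ and a single audio track, with error handling trimmed for brevity (this is not code from any of the posts above, just one possible shape of it):

import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import java.io.OutputStream;
import java.nio.ByteBuffer;

static void decodeToPcm(String filePath, OutputStream pcmOut) throws Exception {
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(filePath);
    extractor.selectTrack(0);                        // an MP3 has a single track
    MediaFormat format = extractor.getTrackFormat(0);
    MediaCodec codec = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME));
    codec.configure(format, null, null, 0);
    codec.start();

    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    boolean inputDone = false, outputDone = false;
    while (!outputDone) {
        if (!inputDone) {
            int in = codec.dequeueInputBuffer(10_000);
            if (in >= 0) {
                int size = extractor.readSampleData(codec.getInputBuffer(in), 0);
                if (size < 0) {                      // end of stream reached
                    codec.queueInputBuffer(in, 0, 0, 0,
                            MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    inputDone = true;
                } else {
                    codec.queueInputBuffer(in, 0, size, extractor.getSampleTime(), 0);
                    extractor.advance();
                }
            }
        }
        int out = codec.dequeueOutputBuffer(info, 10_000);
        if (out >= 0) {
            ByteBuffer pcm = codec.getOutputBuffer(out);
            pcm.position(info.offset);
            byte[] chunk = new byte[info.size];      // copy out this PCM chunk
            pcm.get(chunk);
            pcmOut.write(chunk);
            codec.releaseOutputBuffer(out, false);
            outputDone = (info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0;
        }
    }
    codec.stop(); codec.release(); extractor.release();
}

A sample-accurate seek is then just an offset into the written PCM data (frame number times bytes per frame).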
TL;DR: If you must have sample-accurate seeking, decode the files before putting them in your player, but understand the performance penalty before deciding on this tradeoff.
For those who might come across this issue in the future: I ended up simply converting the MP3s to M4A. This was the simplest solution in my specific case.
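If you go the same route, the conversion can be a one-line FFmpeg invocation along these lines (file names are placeholders; -c:a aac selects FFmpeg's built-in AAC encoder):

ffmpeg -i input.mp3 -c:a aac output.m4a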
Constant-bitrate MP3s are better. The system I used was to record the byte offset and starting sample of each frame header in the MP3 into a list (see the sketch below). To seek, I would jump to the closest frame header before the desired sample using the values in the list and then read from that location up to the desired sample. This works fairly well, but not perfectly: the rendered waveform is decoded from the reference frame rather than from what you would get by decoding from the start of the file. If accuracy is required, use libmpg123; it appears to be almost sample-accurate. Note: check the licensing if this is for a commercial app.
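This isn't the poster's exact code; a minimal sketch of the frame-index idea, assuming a CBR MPEG-1 Layer III file (1,152 samples per frame; frame length = 144 * bitrate / sampleRate + padding). A VBR file would need the bitrate parsed from each header instead:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Returns {byteOffset, firstSample} for every frame, in file order.
// firstFrameOffset: position of the first frame header (after any ID3v2 tag).
static List<long[]> buildFrameIndex(RandomAccessFile f, long firstFrameOffset,
                                    int bitrateBps, int sampleRateHz) throws IOException {
    List<long[]> index = new ArrayList<>();
    long offset = firstFrameOffset, sample = 0, length = f.length();
    byte[] header = new byte[4];
    while (offset + 4 <= length) {
        f.seek(offset);
        f.readFully(header);
        if ((header[0] & 0xFF) != 0xFF || (header[1] & 0xE0) != 0xE0)
            break;                                   // lost sync; a real scanner would resync
        index.add(new long[] { offset, sample });
        int padding = (header[2] >> 1) & 1;          // padding bit of the frame header
        offset += 144L * bitrateBps / sampleRateHz + padding;
        sample += 1152;                              // samples per MPEG-1 Layer III frame
    }
    return index;
}

Seeking is then a binary search over the list for the last entry whose firstSample is at or below the target sample.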
Related
Use Case
My use case is roughly equivalent to adding a 15-second MP3 clip to a ~1-minute video. All the transcoding and merging will be done by FFmpeg-android, so that's not the concern right now.
The flow is as follows:
The user can select any 15 seconds of an MP3 streamed via ExoPlayer (a 3-minute file at 192 kbps/44.1 kHz is up to 7 MB).
Then download ONLY the 15-second part and add it to the video's audio stream (using FFmpeg).
Use the obtained output.
Tried solutions
Extracting fragment of audio from a url
RANGE_REQUEST - I replicated the exact same algorithm/formula in Kotlin using the exact sample file provided, but the output is not accurate (off by about ±1.5 s × c, where c grows with startTime). A sketch of the underlying byte-window mapping follows this list.
How to crop a mp3 from x to x+n using ffmpeg?
FFMPEG_SS - This works flawlessly with remote URLs as input, but there are two downsides:
as startTime increases, the number of downloaded bytes approaches the full size of the MP3;
ffmpeg-android does not support the network request module (at least the way we compiled it).
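For reference, the range-request idea boils down to something like the following Java sketch, which maps the wanted time window to a byte window (valid only for CBR; names are placeholders). The ±1.5 s error above is exactly this mapping breaking down on real files:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

static void downloadClip(String mp3Url, File clipFile,
                         long startTimeMs, long endTimeMs, int bitrateBps)
        throws IOException {
    // Time-to-byte mapping under a constant-bitrate assumption.
    long startByte = startTimeMs * (long) bitrateBps / 8000L;
    long endByte   = endTimeMs   * (long) bitrateBps / 8000L;
    HttpURLConnection conn = (HttpURLConnection) new URL(mp3Url).openConnection();
    conn.setRequestProperty("Range", "bytes=" + startByte + "-" + endByte);
    try (InputStream in = conn.getInputStream();
         OutputStream out = new FileOutputStream(clipFile)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
    }
}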
So the above two solutions have not been fruitful, and currently I am downloading the whole file and trimming it locally, which is definitely bad UX.
I wonder how Instagram's add-music-to-story feature works, because that's close to what I want to implement.
It is not possible the way you want to do it. MP3 files do not have timestamps. If you just jump to the middle of an MP3 (and look for the frame start marker) and then start decoding, you have no idea what time that frame is for, because frames are variable in size. The only way to know is to count the number of frames before the current position, which means you need the whole file.
I'm having a bit of trouble thinking of an efficient solution. There are a few problems I am foreseeing, the first being...
OOM Prevention
If I wanted the past 30 seconds or even 5 minutes, it's doable, but what if I wanted the past 30 minutes, a full hour, or maybe EVERYTHING? Keeping a byte buffer means storing it in RAM, and storing over a hundred megabytes sounds like virtual-memory suicide.
Okay, so what if we store Y seconds, say 30, of the previously recorded media to disk in some temp file? That could potentially work, and I can use a library like mp4parser to concatenate them all when finished. However...
If we have 30 minutes' worth, that's about 60 thirty-second clips. This seems like a great way to burn through an SD card, and even if that's not a problem, I can't imagine the time needed to concatenate over a hundred files into one.
From what I've been researching, I was thinking of using local sockets to do something like...
MediaRecorder -> setOutputFile(localSocket.getFileDescriptor())
Then in the local socket...
LocalSocket -> FileOutputStream -> write(data, position, bufsiz) -> flush()
Where the background thread handles writing and keeping track of the position, and the buffer.
This is purely pseudocode and I'm not far enough in yet to test it. Am I going in the right direction with this? From what I'm thinking, this keeps only one file, which gets overwritten. Since it only gets written to once every Y seconds, it minimizes I/O overhead and also minimizes the amount of RAM it eats up.
Video Length to Buffer Size
How would I determine the buffer size needed for a requested video length? It's strange, since I see some long videos that are small and short videos that are huge, so I don't know how to determine this accurately. Does anyone know how I can predict this if I know the video length, encoding, etc., which get set up in MediaRecorder?
Examples
Does anyone know of any examples of this? I don't think the idea is entirely original but I don't see a lot of them out there and if it does it is closed source. An example goes a long way.
Thanks in advance
The "continuous capture" Activity in Grafika does this, using MediaCodec rather than MediaRecorder. It keeps the last N seconds of video in a circular buffer in memory, and writes it to disk when requested by the user.
The CircularEncoder constructor estimates the memory requirements based on target bit rate. At a reasonable rate (say 4Mbps) you'd need 1.8GB to store an hour worth of video, so that's not going to fit in RAM on current devices. Five minutes is 150MB, which is pushing the bounds of good manners. Spooling out to a file on disk is probably necessary.
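Those figures fall out of a simple bytes = bitrate / 8 × seconds estimate, e.g.:

// Rough storage estimate for encoded video at a target bit rate.
static long bytesNeeded(long bitrateBps, long seconds) {
    return bitrateBps / 8 * seconds;
}
// bytesNeeded(4_000_000, 3600) -> 1.8 GB for an hour
// bytesNeeded(4_000_000, 300)  -> 150 MB for five minutes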
Passing data through a socket doesn't buy you anything that you don't get from an appropriate java.util.concurrent data structure. You're just involving the OS in a data copy.
One approach would be to create a memory-mapped file, and just treat it the same way CircularEncoder treats its circular buffer. In Grafika, the frame data goes into a single large byte buffer, and the meta-data (which tells you things like where each packet starts and ends) sits in a parallel array. You could store the frame data on disk, and keep the meta-data in memory. Memory mapping would work for the five-minute case, but generally not for the full hour case, as getting a contiguous virtual address range that large can be problematic.
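A minimal sketch of that arrangement, with invented names: a memory-mapped ring for the packet bytes and per-packet metadata in RAM. (A real version, like Grafika's CircularEncoder, must also evict through the next sync frame rather than packet-by-packet, and would use bulk copies instead of a byte loop.)

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayDeque;

class DiskRingBuffer {
    private final MappedByteBuffer ring;
    private final int capacity;
    // Per-packet metadata kept in memory: {startPos, size, ptsUs}.
    private final ArrayDeque<long[]> packets = new ArrayDeque<>();
    private int writePos = 0, used = 0;

    DiskRingBuffer(File file, int capacity) throws IOException {
        this.capacity = capacity;
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            ring = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    void add(byte[] packet, long ptsUs) {
        while (capacity - used < packet.length) {   // evict oldest packets
            used -= packets.removeFirst()[1];
        }
        int start = writePos;
        for (byte b : packet) {                     // copy with wrap-around
            ring.put(writePos, b);
            writePos = (writePos + 1) % capacity;
        }
        used += packet.length;
        packets.addLast(new long[] { start, packet.length, ptsUs });
    }
}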
Without memory-mapped I/O the approach is essentially the same, but you have to seek/read/write with file I/O calls. Again, keep the frame metadata in memory.
An additional buffer stage might be necessary if the disk I/O stalls. When writing video data through MediaMuxer I've seen periodic one-second stalls, which is more buffering than MediaCodec has, leading to dropped frames. You can defer solving that until you're sure you actually have a problem though.
There are some additional details you need to consider, like dropping frames at the start to ensure your video starts on a sync frame, but you can see how Grafika solved those.
I am looking to get the offset in bytes employed by the seekTo() method of the MediaPlayer class.
I was wondering if there is any way to retrieve this information directly, and if not, whether there is a way to calculate it myself. For example:
If the media file has a registered bitrate in its metadata and I wanted to seek 10 seconds in, I could use the following calculation:
offset (bytes) = 10 (secs) * bitrate (bits per second) / 8
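A hedged sketch of that calculation against the real metadata API (METADATA_KEY_BITRATE reports bits per second; whether MediaPlayer itself seeks this way is exactly the open question here):

import android.media.MediaMetadataRetriever;

MediaMetadataRetriever mmr = new MediaMetadataRetriever();
mmr.setDataSource(path);                      // path: your media file
int bitrateBps = Integer.parseInt(
        mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_BITRATE));
long seekMs = 10_000;                         // the 10 seconds from the example
long offsetBytes = seekMs * bitrateBps / 8000L;
mmr.release();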
Can I assume that the MediaPlayer retrieves the bit rate information using the MediaMetadataRetriever class?
I have read the following:
Accuracy of MediaPlayer.seekTo(int msecs)
And I am aware of the issues with variable bitrate, but I am not looking for accuracy in the seekTo() method, rather how to get/calculate the value it uses as the offset to retrieve the new data.
Your objective of implementing seekTo() based on a byte offset is novel, but it comes with multiple challenges. Before going into the seekTo() implementation, some clarification about MediaPlayer and MediaMetadataRetriever: both of these classes employ a MediaExtractor object internally to retrieve metadata, so MediaPlayer does not use a MediaMetadataRetriever.
First, let's consider extracting the bitrate. MediaPlayer is a generic implementation that should support multiple file formats. Hence, for your design, you need to ensure that the bitrate parameter is extracted for all file formats supported by your system, whether audio-visual formats such as MP4, MPEG-2 TS, AVI, and Matroska, or audio-only formats like WAV and MP3. In the latest Android implementation, I found that only MP3Extractor exposes the bitrate, through the kKeyBitrate key.
Next, coming to your algorithm, I find the following challenges attached to a size-based seek.
Audio and video tracks are stored in an interleaved fashion, so time * bitrate (in bytes) will not directly locate the audio data due to the interleaved nature of the input.
The starting offset needs to be considered. There is metadata (or boxes) stored at the start of the file that is specific to the file format; you will have to account for this offset too, and it differs between formats.
If your input has more tracks, such as audio, video, and text, or multiple audio tracks as in a movie, the problem becomes more complex.
Video frames are typically irregular in size. Even when a constant-bitrate model is employed, video frame sizes can vary significantly based on the frame type. Typically, an I-frame / IDR frame in H.264 consumes a large number of bits compared to a P-frame or B-frame; one can easily observe I-frames five times the size of P-frames. This poses practical difficulties for a size-based seekTo() implementation.
There is a definite impact from the variable-bitrate model, which you have already acknowledged, so I am skipping this point.
Given the aforementioned points, and without wanting to discourage you, I feel a size-based implementation will be difficult.
I have been scratching my head for the past week trying to achieve this effect on text: http://www.youtube.com/watch?v=gB2PL33DMFs&feature=related
It would be great if someone could give me some tips, guidance, or a tutorial on how to do this.
Thanks for reading and answering =D
If all you want is to display a movie with video and sound, a MediaPlayer can do that easily.
So I assume that you're actually talking about synchronizing some sort of animated display with a sound file being played separately. We did this using a MediaPlayer and polling getCurrentPosition from within an animation loop. This more or less works, but there are serious problems to overcome. (All of this deals with playing MP3 files; we didn't try any other audio formats.)
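The polling loop itself is straightforward; a rough shape of it, with renderFrameFor() standing in for whatever draws your animation (both it and mediaPlayer are assumed to exist in scope):

import android.os.Handler;
import android.os.Looper;

// Poll the player's position regularly and drive the animation from it.
final Handler handler = new Handler(Looper.getMainLooper());
handler.post(new Runnable() {
    @Override public void run() {
        int posMs = mediaPlayer.getCurrentPosition();
        renderFrameFor(posMs);                 // hypothetical animation step
        if (mediaPlayer.isPlaying()) {
            handler.postDelayed(this, 16);     // ~60 updates per second
        }
    }
});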
First, your MP3 must be recorded at a 44,100 Hz sampling rate. Otherwise the value returned by getCurrentPosition is way off. (We think it's scaled by the ratio of the actual sampling rate to 44,100, but we didn't verify this hypothesis.) A bit rate of 128 kbps seems to work best.
Second, and more serious, is that the values returned by getCurrentPosition seem to drift away over time from the sound coming out of the device. After about 45 seconds, this starts to be quite noticeable. What's worse is that this drift is significantly different (but always present) in different OS levels, and perhaps from device to device. (We tested this in 2.1 and 2.2 on both emulators and real devices, and 3.0 on an emulator.) We suspected some sort of buffering problem, but couldn't really diagnose it. Our work-around was to break up longer mp3 files into short segments and chain their playback. Lots of bookkeeping aggravation. This is still under test, but so far it seems to have worked.
Ted Hopp: the time drift on MP3 files is likely caused by those MP3 files being VBR. I've been developing karaoke apps for a while, and pretty much every toolkit, from Qt Phonon to ffmpeg, had problems reporting the correct audio position for VBR MP3 files. I assume this is because they all try to calculate the current audio position from the number of decoded frames, which makes it unreliable for VBR MP3s. I described it in a user-friendly way in the Karaoke Lyrics Editor FAQ.
Unfortunately, the best solution I found was to re-encode the MP3s to CBR. Another was to ditch the reported position completely and rely only on the system clock. That actually produced a better result for VBR MP3s, but still not as good as re-encoding them to CBR.
I am using the AudioTrack class and have generated my own tones in my Android application. However, I want to be able to control the speed of playback, and I can't figure out how.
I see the setLoopPoints method, but that doesn't seem to do what I want (if anyone has used it and can explain it to me, that would be great; the API documentation doesn't help me much).
What I want to do:
As a point (here, a touch on the screen) gets closer to a target on the screen, I want to increase the speed of the tones I'm generating. For example, far away the tone would play, say, once per second, but very close to the target, 5 times per second. I am struggling to find the best way to do this with Android sounds (generated tones or even .wav files saved in my res/raw).
Any help would be much appreciated!
Shani
You want to use the setPlaybackRate method for this:
http://developer.android.com/reference/android/media/AudioTrack.html
in conjunction with setLoopPoints. However, I believe there is probably a limit to how much you can speed up the file's "natural" playback rate, and the limit is probably 48 kHz (I'm not sure, though, and it may be device-dependent).
So, if you have a file that was recorded at, say, 8000 Hz, to get the effect you want you would set the loop count to 4 (so that it plays 5 times in a row) and set the playback rate to 40,000 (5 * 8000).
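Putting those numbers together, a minimal sketch, assuming pcm holds 16-bit mono samples at 8,000 Hz (setLoopPoints only works on a MODE_STATIC track):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

int sampleRate = 8000;
AudioTrack track = new AudioTrack(
        AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
        pcm.length, AudioTrack.MODE_STATIC);
track.write(pcm, 0, pcm.length);        // pcm: byte[] of 16-bit mono audio
int totalFrames = pcm.length / 2;       // 2 bytes per frame for 16-bit mono
track.setLoopPoints(0, totalFrames, 4); // loop 4 extra times = 5 plays total
track.setPlaybackRate(5 * sampleRate);  // 40,000 Hz: 5x natural speed
track.play();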
Since there is (probably) an upper limit to playback rate, your best approach might be to instead record the original sound at a high frequency, and slow down the playback as necessary to achieve the effect you want.
Update: setLoopPoints lets you specify two arbitrary locations within the file, such that when playback reaches the end loop point, the audio engine wraps back around to the start loop point. To loop the entire file, you would set the start loop point to 0 and the end loop point to the last frame in the file. (The size of each frame depends on the file's format: a stereo file using 2 bytes per sample has a frame size of 4 bytes, so the last frame is just the size of the audio data in bytes divided by 4.)
To get 5 consecutive plays of your file, you would set the loop count to 4 (a loop count of 0 means the file plays once; -1 means it loops forever).
Update 2: just read the docs some more - the upper limit for setPlaybackRate is documented as twice the rate returned by getNativeOutputSampleRate, which for most devices is probably 44,100 or 48,000 Hz. This means that a standard CD-quality WAV file can only be played back at twice its normal speed. A 22,050 Hz file could be played back at up to 4 times its normal speed, etc.