I am using the AudioTrack class and have generated my own tones in my Android application. However, I want to be able to control the speed of playback, and I can't figure out how.
I see the setLoopPoints method, but that doesn't seem to do what I want (if anyone has used it and can explain that method to me, that would be great; the API documentation doesn't help me much).
What I want to do:
As a point (here, a touch on the screen) gets closer to a target on the screen, I want to increase the speed of the tones I'm generating. For example, far away I might have the tone playing 1 time in 1 second, but very close to the target, 5 times in 1 second. I am struggling to work out the best way to do this with Android sounds (generated tones, or even .wav files saved to my res/raw).
Any help would be much appreciated!
Shani
You want to use the setPlaybackRate method for this:
http://developer.android.com/reference/android/media/AudioTrack.html
in conjunction with setLoopPoints. However, I believe there is a limit to how much you can speed up the file's "natural" playback rate - probably 48 kHz (I'm not sure, though, and it may be device-dependent).
So, if you have a file that was recorded at, say, 8000 Hz, to get the effect you want you would set the loop count to 4 (so that it plays 5 times in a row) and set the playback rate to 40,000 (5 * 8000).
Since there is (probably) an upper limit to playback rate, your best approach might be to instead record the original sound at a high frequency, and slow down the playback as necessary to achieve the effect you want.
Update: setLoopPoints lets you specify two arbitrary locations within the file, such that when playback reaches the end loop point the audio engine wraps back around to the start loop point. To loop the entire file, you would set the start loop point to 0 and the end loop point to the last frame in the file. (The size of each frame depends on the file's format: a stereo file using 2 bytes per sample has a frame size of 4, so the last frame is just the size of the audio data in bytes divided by 4.)
To get 5 consecutive plays of your file, you would set the loop count to 4 (a loop count of 0 means the file plays once; -1 means it loops forever).
Update 2: I just read the docs some more - the upper limit for setPlaybackRate is documented as twice the rate returned by getNativeOutputSampleRate, which for most devices is probably 44,100 or 48,000 Hz. This means that a standard CD-quality WAV file can only be played back at up to twice its normal speed; a 22,050 Hz file could be played back at up to 4 times its normal speed, etc.
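Putting the two calls together, here is a minimal sketch of the idea, assuming a static-mode AudioTrack holding a 1-second mono 16-bit tone at 8000 Hz (generateTone is a hypothetical helper standing in for your own tone generation):
int sampleRate = 8000;
short[] toneData = generateTone(sampleRate); // hypothetical 1-second tone
int numFrames = toneData.length;             // mono 16-bit: 1 sample per frame
AudioTrack track = new AudioTrack(
        AudioManager.STREAM_MUSIC,
        sampleRate,
        AudioFormat.CHANNEL_OUT_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        numFrames * 2,            // buffer size in bytes (2 bytes per sample)
        AudioTrack.MODE_STATIC);  // setLoopPoints only works in static mode
track.write(toneData, 0, toneData.length);
// Loop the whole buffer: a loop count of 4 gives 5 plays in total...
track.setLoopPoints(0, numFrames, 4);
// ...and play 5x faster so all 5 repetitions fit in the original 1 second.
// Per the docs, this must stay <= 2 * getNativeOutputSampleRate(...).
track.setPlaybackRate(5 * sampleRate);
track.play();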
I am using the noise_meter package to read noise in decibels. When I run the app, it records almost 120 readings per second. I don't want that many readings. Is there any way to specify that I want only one or two readings per second? Thanks in advance.
I am using the example code from the noise_meter GitHub repo (noise_meter example), which is already written.
I tried to calculate the number of samples using the sample rate, which is 44100 in the package, but I can't make sense of it.
As you can see in the source code, audio_streamer uses a fixed-size buffer and an audio sample rate of 44100, and includes this comment: "Uses a buffer array of size 512. Whenever buffer is full, the content is sent to Flutter." So small audio blocks will arrive at the consumer frequently (as you might expect from a streamer). It doesn't seem possible to adjust this.
The noise_meter package simply takes each block of audio and calculates the noise level, so the rate of arrival of noise readings is exactly the same as the rate of arrival of audio blocks from the underlying package.
Given the simplicity of the noise meter calculation, you could replace it with your own code directly on top of audio streamer. You just need to collect multiple blocks of audio together before performing the simple decibel calculation.
Alternatively you could simply discard N out of each N+1 samples.
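For illustration, here is a rough sketch of that collect-then-calculate idea in Java (the real packages are Dart/Flutter, so all the names here are made up, and the peak-to-dB conversion shown is the generic 20 * log10(amplitude) form, not necessarily the package's exact formula):
import java.util.ArrayList;
import java.util.List;
class NoiseAggregator {
    private final int samplesPerReading; // e.g. 44100 for ~1 reading per second
    private final List<Double> pending = new ArrayList<>();
    NoiseAggregator(int samplesPerReading) {
        this.samplesPerReading = samplesPerReading;
    }
    // Feed each incoming audio block; returns a dB reading once enough
    // samples have accumulated, otherwise null.
    Double onAudioBlock(double[] block) {
        for (double s : block) pending.add(s);
        if (pending.size() < samplesPerReading) return null;
        double maxAmp = 0;
        for (double s : pending) maxAmp = Math.max(maxAmp, Math.abs(s));
        pending.clear();
        return 20 * Math.log10(maxAmp); // naive peak-to-decibel conversion
    }
}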
Use Case
My use case is roughly this: adding a 15-second MP3 file to a ~1 min video. All the transcoding/merging will be done by FFmpeg-android, so that's not the concern right now.
The flow is as follows:
The user can select any 15 seconds (ExoPlayer streaming) of an MP3 (at 192 kbps / 44.1 kHz, 3 mins = up to 7 MB)
Then download ONLY that 15-second part and add it to the video's audio stream (using FFmpeg)
Use the obtained output
Tried solutions
Extracting fragment of audio from a url
RANGE_REQUEST - I have replicated the exact same algorithm/formula in Kotlin, using the exact sample file provided. But the output is not accurate (off by ± 1.5 secs * c, where c is proportional to startTime).
How to crop a mp3 from x to x+n using ffmpeg?
FFMPEG_SS - This works flawlessly with remote URLs as input, but there are two downsides:
as startTime increases, the size of the downloaded bytes gets closer to the actual size of the MP3;
ffmpeg-android does not support the network requests module (at least the way we compiled it).
So the above two solutions have not been fruitful, and currently I am downloading the whole file and trimming it locally, which is definitely bad UX.
I wonder how Instagram's add-music-to-story feature works, because that's close to what I want to implement.
It is not possible the way you want to do it. MP3 files do not have timestamps. If you just jump to the middle of an MP3 (and look for the frame start marker), then start decoding, you have no idea what time that frame corresponds to, because frames are variable in size. The only way to know is to count the number of frames before the current position, which means you need the whole file.
I'm building an app where it's important to have accurate seeking in MP3 files.
Currently, I'm using ExoPlayer in the following manner:
public void playPeriod(long startPositionMs, long endPositionMs) {
    // ClippingMediaSource expects positions in microseconds, hence the * 1000.
    MediaSource mediaSource = new ClippingMediaSource(
            new ExtractorMediaSource.Factory(mDataSourceFactory).createMediaSource(mFileUri),
            startPositionMs * 1000,
            endPositionMs * 1000
    );
    mExoPlayer.prepare(mediaSource);
    mExoPlayer.setPlayWhenReady(true);
}
In some cases, this approach results in offsets of 1-3 seconds relative to the expected playback times.
I found this issue on ExoPlayer's GitHub. It looks like this is an intrinsic limitation of ExoPlayer with the MP3 format and it won't be fixed.
I also found this question, which seems to suggest that the same issue exists in Android's native MediaPlayer and MediaExtractor.
Is there a way to perform accurate seek in local (e.g. on-device) Mp3 files on Android? I'm more than willing to take any hack or workaround.
MP3 files are not inherently seekable. They don't contain any timestamps. It's just a series of MPEG frames, one after the other. That makes this tricky. There are two methods for seeking an MP3, each with some tradeoffs.
The most common (and fastest) method is to read the bitrate from the first frame header (or, maybe, the average bitrate from the first few frame headers), perhaps 128k. Then, take the byte length of the entire file and divide it by this bitrate to estimate the time length of the file. Then, let the user seek into the file. If they seek 1:00 into a 2:00 file, jump to the 50% byte mark of the file and "needle drop" into the stream. Read the file until the sync word for the next frame header comes by, and then begin decoding.
As you can imagine, this method isn't exact. At best, you're going to be within half a frame of the target on average. With frame sizes being 576 samples, this is pretty accurate. However, there are problems with calculating the needle-drop point in the first place. The most common issue is that ID3 tags and the like add size to the file, throwing off the size calculations. A more severe issue is a variable bitrate (VBR) file. If you have music encoded with VBR, and the beginning of the track is silent-ish or otherwise easy to encode, the beginning might be 32 kbps whereas one second in might be 320 kbps - a 10x error in calculating the time length of the file!
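As a rough sketch of that first method (all names are illustrative; it assumes a CBR file whose ID3v2 tag size and frame-header bitrate have already been read):
long estimateNeedleDropOffset(long fileSizeBytes, long id3TagBytes,
                              int bitrateBps, double targetSeconds) {
    // Exclude the tag so metadata doesn't skew the estimate.
    long audioBytes = fileSizeBytes - id3TagBytes;
    // Estimated total length: bytes * 8 / bits-per-second.
    double totalSeconds = audioBytes * 8.0 / bitrateBps;
    double fraction = targetSeconds / totalSeconds;
    // From the returned offset, scan forward for the next frame sync word
    // before starting to decode.
    return id3TagBytes + (long) (audioBytes * fraction);
}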
The second method is to decode the whole file to raw PCM samples. This means you can guarantee sample-accurate seeking, but you must decode at least up to the seek point. If you want a proper time length for the full track, you must decode the whole file. Some 20 years ago, this was painfully slow. Seeking into a track would take almost as long as listening to the track to the point you were seeking to! These days, for short files, you can probably decode them so fast that it doesn't matter so much.
TL;DR: If you must have sample-accurate seeking, decode the files before putting them in your player, but understand the performance penalty before deciding on this tradeoff.
For those who might come across this issue in the future: I ended up simply converting the MP3 to M4A. This was the simplest solution in my specific case.
Constant-bitrate MP3s are better. The system I used was to record the sample offset of each frame header in the MP3 into a list. To seek, I would jump to the closest frame header before the desired sample using the values in the list, and then read from that location to the desired sample. This works fairly well, but not perfectly: the rendered waveform is decoded from the reference frame, not the values you would get decoding from the start of the file. If accuracy is required, use libmpg123; it appears to be almost sample-accurate. Note: check licensing if this is for a commercial app.
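Here is a rough sketch of building that frame-offset list (simplified: it assumes MPEG-1 Layer III at 44.1 kHz, ignores ID3 tags, and does only a minimal sync check, so treat it as an outline rather than a robust parser):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
// MPEG-1 Layer III bitrate table (kbps), indexed by the 4 bitrate bits.
static final int[] BITRATE_KBPS =
        {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0};
static List<Long> indexFrameOffsets(RandomAccessFile file) throws IOException {
    List<Long> offsets = new ArrayList<>(); // entry i starts at sample i * 1152
    long pos = 0, len = file.length();
    while (pos + 4 <= len) {
        file.seek(pos);
        int b1 = file.read(), b2 = file.read(), b3 = file.read();
        // 11-bit frame sync: 0xFF, then the top 3 bits of the next byte set.
        if (b1 == 0xFF && (b2 & 0xE0) == 0xE0) {
            int bitrateIndex = (b3 >> 4) & 0x0F;
            if (bitrateIndex == 0 || bitrateIndex == 15) { pos++; continue; }
            offsets.add(pos);
            int padding = (b3 >> 1) & 0x01;
            // MPEG-1 Layer III frame length: 144 * bitrate / sampleRate + padding
            pos += 144 * (BITRATE_KBPS[bitrateIndex] * 1000) / 44100 + padding;
        } else {
            pos++; // not a sync word; keep scanning byte by byte
        }
    }
    return offsets;
}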
The ExoPlayer sample comes with the following defaults:
mPlayer = ExoPlayer.Factory.newInstance(RENDERER_COUNT, 1000, 5000);
Where 1000 is minBufferMs and 5000 is minRebufferMs. From the documentation:
minBufferMs - A minimum duration of data that must be buffered for playback to start or resume following a user action such as a seek.
minRebufferMs - A minimum duration of data that must be buffered for playback to resume after a player invoked rebuffer (i.e. a rebuffer that occurs due to buffer depletion, and not due to a user action such as starting playback or seeking).
These seem like reasonable defaults, but they are presumably tuned for the average video length the player is built for. In my app an average video is about 24 s; however, there are instances where a video can be 1 second or <6 s. I think these default values are causing me some issues with those edge-case videos (they do not play because their duration is less than minBufferMs or minRebufferMs after the first buffering), so I'm thinking of changing them.
The question is: what are the recommended values, and what is the impact of setting those two values to, say, 500 and 2000?
These values are not really related to the total length of the video - they have to do with the amount you want buffered to try to ensure playback without having to pause while more video is buffered.
If your video is extremely short, as in your 1 second example, then this is probably an edge case where it might be worth experimenting with some different values.
I think any recommended values will be no more than recommendations, though, and the defaults above do not seem unreasonable. You could experiment with changing the values but the problem is that your results will reflect the network, CPU load etc conditions that were present during your test.
Adaptive bit rate videos will also muddy the waters a little, as the player will also switch between bit rates to try to find the highest-quality bit rate for the current network conditions. Again, your 1-second videos are an edge case for which adaptive bit rate probably does not make a lot of sense, unless they are part of a continuous stream of videos and you want to keep the bit rate similar across all of them.
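If you do experiment, the change itself is just those two constructor arguments (here using the 500 and 2000 values from the question):
// 500 ms must be buffered before starting or seeking, 2000 ms before
// resuming after a player-invoked rebuffer.
mPlayer = ExoPlayer.Factory.newInstance(RENDERER_COUNT, 500, 2000);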
I am working on phone-recording software (Android) which records a conversation between two people on a phone call. The output of each phone call is an audio file containing the sound from both the caller and the callee.
However, most of the time the voice from the phone that this software runs on is clearer than the other. Users have asked me to make the two sound equally clear.
So the problem I have now is: I have a sound file containing voices from two sources at different volumes. What should I do to make the volume of the voices from those two sources equal, with the constraint that the noise should not be increased? Given that this is a phone call, at any specific time only one person is speaking.
I see at least one straightforward solution: write a program that analyzes the waveform of the sound file, identifies the parts coming from the source with the softer voice, and amplifies them to a level that seems balanced with the other. However, this will not be easy to implement, and I also hope there is a better solution out there. Do you have any suggestions for me?
Thank you.
Well, the first thing to do is to get rid of all of the noise that you do not care about.
The spectrum that you would want to keep is 300 Hz to 3500 Hz.
You can cut all of the other frequencies, which will substantially reduce your noise. You can then apply an auto-equalization gain profile, or even tap into the DSP profiles available on several devices.
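For instance, here is a minimal sketch of band-limiting to that range with two one-pole filters (a simple RC-style high-pass at 300 Hz followed by a low-pass at 3500 Hz; this is illustrative only, not a production telephony filter):
class VoiceBandFilter {
    private final double hpAlpha, lpAlpha;
    private double hpPrevIn, hpPrevOut, lpPrevOut;
    VoiceBandFilter(double sampleRate) {
        double dt = 1.0 / sampleRate;
        double rcHp = 1.0 / (2 * Math.PI * 300);  // high-pass corner at 300 Hz
        double rcLp = 1.0 / (2 * Math.PI * 3500); // low-pass corner at 3500 Hz
        hpAlpha = rcHp / (rcHp + dt);
        lpAlpha = dt / (rcLp + dt);
    }
    // Process one mono sample; call once per sample, in order.
    double process(double x) {
        // High-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])
        double hp = hpAlpha * (hpPrevOut + x - hpPrevIn);
        hpPrevIn = x;
        hpPrevOut = hp;
        // Low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
        lpPrevOut += lpAlpha * (hp - lpPrevOut);
        return lpPrevOut;
    }
}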
I would also take a look at this whitepaper if you have a chance. (IEEE or ACM membership required).
An Auto-Equalization System Based on DirectShow Technology and Its Application in Audio Broadcast System of Radio Station
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5384659&contentType=Conference+Publications&searchWithin%3Dp_Authors%3A.QT.Bai+Xinyue.QT.
This is how I solved this problem:
1. I decode the audio into a series of integer values, thanks to the WAV storage format.
The result is [xi], with 0 < xi < 255.
2. Then I have to decide two custom values:
- Noise threshold: if xi > threshold, it is not noise (pretty naive!)
- How long must a sound last to count as a chunk of human voice?
I chose 5 for the first value and 100 ms for the second.
3. My algorithm analyzes [xi] into [Yi], where each Y is an array of x values and each Y represents a chunk of human sound.
After that, I apply k-means with k = 2 and get two clusters of Y: one belonging to the person whose voice is louder and the other to the one with the softer voice.
4. What is left is pretty straightforward: I decide on a parameter M, multiply each x belonging to a Y of the softer voice by M, and get the final result (a sketch of steps 3 and 4 follows below).
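For what it's worth, here is a rough Java sketch of steps 3 and 4 (names are illustrative; it assumes the voice chunks from step 2 have already been segmented): 1-D k-means with k = 2 on each chunk's mean amplitude, then a gain of M applied to the quieter cluster.
import java.util.List;
static void boostSofterSpeaker(List<double[]> chunks, double gainM) {
    int n = chunks.size();
    double[] level = new double[n];
    for (int i = 0; i < n; i++) {
        double sum = 0;
        for (double s : chunks.get(i)) sum += Math.abs(s);
        level[i] = sum / chunks.get(i).length; // mean amplitude of chunk i
    }
    // 1-D k-means, k = 2: start the centroids at the min and max levels.
    double c0 = Double.MAX_VALUE, c1 = -Double.MAX_VALUE;
    for (double l : level) { c0 = Math.min(c0, l); c1 = Math.max(c1, l); }
    boolean[] inLoud = new boolean[n];
    for (int iter = 0; iter < 20; iter++) {
        double sum0 = 0, sum1 = 0;
        int n0 = 0, n1 = 0;
        for (int i = 0; i < n; i++) {
            inLoud[i] = Math.abs(level[i] - c1) < Math.abs(level[i] - c0);
            if (inLoud[i]) { sum1 += level[i]; n1++; } else { sum0 += level[i]; n0++; }
        }
        if (n0 > 0) c0 = sum0 / n0; // recompute the quiet centroid
        if (n1 > 0) c1 = sum1 / n1; // recompute the loud centroid
    }
    // Step 4: multiply every chunk in the quieter cluster by M, in place.
    for (int i = 0; i < n; i++) {
        if (!inLoud[i]) {
            double[] chunk = chunks.get(i);
            for (int j = 0; j < chunk.length; j++) chunk[j] *= gainM;
        }
    }
}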