I am trying to stream audio through a server. I have set everything up, and it works fine for recording and playing back static audio, but when I try to stream audio there is a delay on the playback side.
I did a Google search but couldn't find the proper way of doing this. I am using the AudioRecord and AudioTrack Android media APIs for sending and receiving the audio data. Can anybody tell me how to handle this delay?
I have posted my code on a Google Group to give a clearer picture.
I have also tried holding 5 chunks of audio data coming from the server in a buffer, playing them back once all 5 chunks are filled, then collecting the next 5 chunks, and so on; once 1024 bytes of data have accumulated, they are written to the AudioTrack and its play method is called. This too has a delay. Are there any other solutions?
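To make the approach clearer, here is a minimal sketch of the pre-fill idea with AudioTrack in streaming mode (the sizes, the DataInputStream source and the class name are just illustrative, not my actual code):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;
import java.io.DataInputStream;
import java.io.IOException;

public class StreamPlayer {
    private static final int SAMPLE_RATE = 8000;   // must match the recording side
    private static final int CHUNK_SIZE = 1024;    // bytes per network read
    private static final int PREFILL_CHUNKS = 5;   // chunks queued before playback starts

    public void play(DataInputStream in) throws IOException {
        int minBuf = AudioTrack.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        // The track buffer must at least hold the pre-fill; anything beyond that adds latency.
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                Math.max(minBuf, CHUNK_SIZE * PREFILL_CHUNKS), AudioTrack.MODE_STREAM);

        byte[] chunk = new byte[CHUNK_SIZE];
        int read;
        int buffered = 0;
        // Pre-fill: queue a few chunks before starting playback so brief network
        // hiccups don't cause underruns. This pre-fill IS the playback delay,
        // so keep it as small as the network jitter allows.
        while (buffered < PREFILL_CHUNKS && (read = in.read(chunk, 0, chunk.length)) >= 0) {
            track.write(chunk, 0, read);
            buffered++;
        }
        track.play();

        // Steady state: write each chunk to the track as soon as it arrives.
        while ((read = in.read(chunk, 0, chunk.length)) >= 0) {
            track.write(chunk, 0, read);
        }
        track.stop();
        track.release();
    }
}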
If you're really trying to do this unbuffered, make sure whatever playback tool you're using is also trying to play it back without a buffer. You will be hard-pressed to avoid a delay entirely. Nothing on TV, radio, etc. is really 'live'--there is always some kind of delay. With internet streams you're sending a large amount of data constantly; besides the time it takes to travel, all of this data has to be kept in a particular order, and nobody wants choppy playback while the end user's machine struggles to keep up. I've had flash players for major networks keep massive cache files on my computer while handling playback, yet their players don't skip or pause to buffer. (If you load up a stream and notice a few hundred MB of extra memory being used, perhaps even more during playback, that's what that is.)
You might be able to get away with a very small buffer (the standard in the past was 30-60 seconds, and a lot of players still default to this) using VLC. I have been able to set its buffer very low, but only on incredibly low-quality streams/videos. The big problem, I'd guess, is that your playback side is setting the buffer: if the player is set to a 60-second buffer, it doesn't matter what you do server-side; the client will wait until it has that much data and only then begin playback.
Related
I'm messing around in my app with a custom model for speech commands. I have it working fine, recording and processing input audio from an AudioRecord, and I give feedback to the user through text-to-speech.
One issue I have is that I'd like this to work even when audio is playing, either through my own text-to-speech or through something else playing in the background (music, for instance). I realize this is going to be a non-trivial problem, but if I could get access in some way to the audio output data (what the phone is playing) and match it up with my microphone input data, I think I could at least adjust my model for this and improve my results.
However, based on "Android - Can I get the audio data for playback from the audio mixer?", it sounds like that is impossible.
Two questions:
1) Is there any way that I'm missing to get access to the expected audio output/playback data through the Android API, or any options the Android API provides for dealing with this issue (the feedback loop between audio output and input)?
2) Outside of stopping all other playback, or waiting for other playback to finish, is there any other approach to solve this problem? I assume calling apps have some way of dealing with this when the user is on speakerphone; I'm just missing how to do it myself.
Thanks
Answers to 1 & 2: You want AcousticEchoCanceler.
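For reference, hooking it up to an AudioRecord session looks roughly like this. This is only a sketch: the VOICE_COMMUNICATION source, the sample rate and the buffer sizing are my assumptions, not something from your code.

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AcousticEchoCanceler;

public class EchoCancelledRecorder {

    // Requires the RECORD_AUDIO permission.
    public AudioRecord createRecorder() {
        int sampleRate = 16000;
        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

        // VOICE_COMMUNICATION tells the platform this is a two-way audio use case,
        // which on many devices routes the capture through the tuned AEC path.
        AudioRecord recorder = new AudioRecord(
                MediaRecorder.AudioSource.VOICE_COMMUNICATION,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);

        // Attach the echo canceller to this recording session, if the device has one.
        if (AcousticEchoCanceler.isAvailable()) {
            AcousticEchoCanceler aec =
                    AcousticEchoCanceler.create(recorder.getAudioSessionId());
            if (aec != null) {
                aec.setEnabled(true);
            }
        }
        return recorder;
    }
}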
A short lecture on why "deleting the speaker audio from the microphone input" is a non-trivial task that takes substantial signal processing knowledge: it's more complicated than just time-shifting the speaker audio a little bit and subtracting it from the mic input. The fact is, the spectrum of the audio changes drastically even as it leaves the speaker (most tiny speakers have a very peaky response centered around 3-4 kHz). The audio may bounce off multiple objects (walls, etc.) before it gets back to the mic (multipath interference). Different frequency components interfere at the microphone in different, impossible-to-predict ways, vastly changing the spectrum of the audio. And by the way -- if anything in the room moves, say, if you put your hand near the phone -- everything changes. That is why you don't want to try to write your own echo cancellation filter. Android has provided one for you, so you can write cool speakerphone apps and such.
I'm very new to OpenSL ES. I'm currently experimenting with its recording and playback features on Android. Right now I have a recording function that stores data in a buffer queue, and I can then play the buffer queue back. Could anyone explain how to correctly manipulate the data in the buffer queue so that the playback sounds different from the recording?
My current configuration:
sampleFormat.pcmFormat_ = static_cast<uint16_t>(engine.bitsPerSample_);
//the buffer
uint8_t *buf_;
Is there any type of conversion or decoding I need to do to the data in the buffer before manipulating it?
I would really appreciate some help.
Your question is broad; what I can do is tell you how OpenSL ES is supposed to be used and how you could manipulate the audio data you obtain from recording.
1) Once you set up your OpenSL ES engine, recorder and player properly (there are many examples out there), you give OpenSL ES a buffer into which it reads PCM data from the mic, and another buffer from which it reads the data you want to feed to the playback sink, along with two callback functions. When the process of reading data has finished (after some amount of time that depends on your settings, such as sample rate and buffer size), the record callback is called from a thread created by OpenSL ES. Depending on the device and configuration this may be a high-priority thread, usually called a fast track; that means you are not working on your own thread inside the callback but on OpenSL ES's thread, and you have to be careful not to do blocking operations there. If you want to play audio back as fast as possible, do your signal processing inside the callback; if response time is not critical, you can use the callback as a signal for your own thread to start processing the audio data in the buffer however you wish. In both cases, to play the audio back you must enqueue the data (processed or unprocessed) for playback (playback likewise calls the player callback when it finishes).
2) Now, if you want to process the audio, you need to apply filters. There are many kinds of audio signal filters; for real-time playback you should look for dynamic ones (some filters need a lot of data before they can start processing and are a poor fit for real time, while others are optimized to work on small chunks and adapt their output dynamically). You would then chain filters in a certain order to obtain the result you want. The audio world is huge, and you will need to read quite a lot to start understanding audio processing. Audio performance is another matter and depends directly on the device you have (hardware and software).
3) How you manipulate the data in the buffer depends on your processor, for instance on its endianness: some processors work little-endian and some big-endian, and you may receive your data in the other byte order. There is no compression, so the PCM data is ready for processing. (If you want to create a WAV from it, you only need to prepend a WAV header and put the PCM data in the header's data chunk; if you want another format such as MP3, you also need to run the data through the appropriate compression algorithm and add the proper header.)
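To make point 3 concrete, here is a sketch of the simplest possible manipulation, a gain change, assuming 16-bit little-endian PCM (the byte arithmetic is identical in the C/C++ callback on buf_; shown here in Java):

/**
 * Applies a simple gain to a buffer of 16-bit little-endian PCM samples, in place.
 * Assumes mono (or interleaved) samples and a buffer length that is a multiple of 2.
 */
public static void applyGain(byte[] pcm, float gain) {
    for (int i = 0; i + 1 < pcm.length; i += 2) {
        // Reassemble the 16-bit sample from its two little-endian bytes.
        short sample = (short) ((pcm[i] & 0xFF) | (pcm[i + 1] << 8));
        // Scale and clip to the 16-bit range to avoid wrap-around distortion.
        int scaled = Math.round(sample * gain);
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
        // Write the sample back, least significant byte first.
        pcm[i] = (byte) (scaled & 0xFF);
        pcm[i + 1] = (byte) ((scaled >> 8) & 0xFF);
    }
}

More interesting effects (echo, pitch, filtering) follow the same pattern: decode the bytes into samples, run them through your filter chain, and write them back before enqueueing the buffer for playback.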
Also, to play data back through OpenSL ES you need uncompressed audio, so you can't play an MP3 directly; you need to decode it into PCM data first.
This is the basic functioning of OpenSL ES; I hope that answers your question. If something is unclear, let me know.
PS: Android says audio manipulation is easier now with the new AAudio library, which promises to accomplish the same tasks as OpenSL ES with a third of its complexity (there may be some latency issues that some people have encountered, but I bet they are being fixed as you read this).
I want to create an Android app that plays multiple MP3s simultaneously, with precise sync (less than 1/10 of a second off) and independent volume control. Each MP3 could be over 1 MB, with a run time of up to several minutes. My understanding is that MediaPlayer will not do the precise sync, and SoundPool can't handle files over 1 MB or 5 seconds of run time. I am experimenting with Superpowered and may end up using that, but I'm wondering if there's anything simpler, given that I don't need any processing (reverb, flange, etc.), which is Superpowered's focus.
I also ran across the YouTube video on Android high-performance audio from Google I/O 2016. I'm wondering if anyone has any experience with this.
https://www.youtube.com/watch?v=F2ZDp-eNrh4
Superpowered was originally made for my DJ app (DJ Player in the App Store), where precisely syncing multiple tracks is a requirement.
Therefore, syncing multiple MP3s with independent volume control is definitely possible and core to Superpowered. All you need for this is the SuperpoweredAdvancedAudioPlayer class.
The CrossExample project in the SDK has two players playing in sync.
The built-in audio features of Android are highly device- and/or build-dependent, so you can't get a consistent feature set with them. In general, the audio features of Android are not stable. That's why you need a specialized audio library that does everything "inside" your application (that is, one that is not a "wrapper" around Android's audio features).
When you play compressed files (AAC, MP3, etc.) on Android, in most situations they are decoded in hardware to save power, except when the output goes to a USB audio interface. The hardware codec accepts data in big chunks (again, to save power). Since it's not possible to issue a command to start playing multiple streams at once, what often happens is that one stream has already sent a chunk of compressed audio to the hardware codec and starts playing, while the others haven't sent their data yet.
You really need to decode these files in your app and mix the output to produce a single audio stream; then you can guarantee the desired synchronization. The built-in mixing facilities are mostly intended to let multiple apps share the same sound output; they are not designed for multitrack mixing.
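As an illustration of the mixing step (not Superpowered code, just a plain sketch assuming both files have already been decoded to 16-bit PCM at the same sample rate and channel layout):

/**
 * Mixes two decoded 16-bit PCM tracks into one buffer, with an independent
 * volume per track. Both tracks must share the same sample rate and channel layout.
 */
public static short[] mixTracks(short[] a, float volA, short[] b, float volB) {
    int length = Math.max(a.length, b.length);
    short[] out = new short[length];
    for (int i = 0; i < length; i++) {
        float sampleA = i < a.length ? a[i] * volA : 0f;
        float sampleB = i < b.length ? b[i] * volB : 0f;
        // Sum the two tracks and clip to the 16-bit range.
        int mixed = Math.round(sampleA + sampleB);
        if (mixed > Short.MAX_VALUE) mixed = Short.MAX_VALUE;
        if (mixed < Short.MIN_VALUE) mixed = Short.MIN_VALUE;
        out[i] = (short) mixed;
    }
    return out;
}

Because the mix is produced sample by sample in your own code, both tracks start on exactly the same sample, which is what gives you the sub-1/10-second sync; feeding the result to a single output stream also sidesteps the hardware-codec chunking described above.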
Using Android MediaMuxer, what would be a decent way to add my own PCM track as the audio track in the final movie?
In a movie, at a certain time, I slow down, stop, then accelerate and restart the video. For the video part it's easy to adjust the presentation time directly, but audio is processed chunk by chunk, which makes it less intuitive to handle a slow-down, a stop and a restart in the audio track.
Currently, when iterating through the buffers I receive from the source, this is what I do to slow down the whole track:
// Multiply the presentation time by the slow-down ratio (3 here).
audioEncoderOutputBufferInfo.PresentationTimeUs =
    audioEncoderOutputBufferInfo.PresentationTimeUs * ratio;
// Expand the samples by 3. (I just realized I haven't respected the
// sample alignment, but the problem here isn't white noise anyway...)
encoderOutputBuffer = Slowdown(encoderOutputBuffer, 3);
// Then write it to the muxer.
muxer.WriteSampleData(outputAudioTrack, encoderOutputBuffer, audioEncoderOutputBufferInfo);
But this just doesn't play. Of course, if the MediaFormat from the source was copied to the destination, then it will have a 3 times shorter duration than the actual audio data.
Could I just take the whole PCM from an input, edit the byte[] array, and add it as a track to the MediaMuxer?
If you want to slow down your audio samples, you need to do this before you encode them, i.e. before you queue them into the input buffer of your audio codec.
From my experience, the audio presentation timestamps are ignored by most of the players out there (I tried it with VLC and ffplay). If you want to make sure that audio and video stay in sync, you must make sure that you actually have enough audio samples to fill the gap between two pts; otherwise the player will just start playing the following samples regardless of their pts.
Furthermore, you cannot just mux PCM samples using the MediaMuxer; you need to encode them first.
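As a deliberately naive illustration of where the stretching has to happen: the raw PCM is stretched first, and only then queued into the codec's input buffer. Repeating every sample by the ratio makes the audio longer but also drops its pitch by the same factor, so for a pitch-preserving slow-down you would use a real time-stretching algorithm (a phase vocoder, or a library such as Sonic or SoundTouch) in place of this loop.

/**
 * Naively stretches 16-bit PCM by an integer factor by repeating each sample.
 * The result is 'factor' times longer (and correspondingly lower in pitch);
 * feed this to the encoder's input buffer instead of the original samples.
 */
public static short[] stretchPcm(short[] input, int factor) {
    short[] out = new short[input.length * factor];
    for (int i = 0; i < input.length; i++) {
        for (int j = 0; j < factor; j++) {
            out[i * factor + j] = input[i];
        }
    }
    return out;
}

With three times as many samples going into the encoder, the encoder's own presentation timestamps should span three times the duration, and the muxed audio track length should then match the slowed-down video.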
I want to make an app that has a feature of recording in a loop. That means the app will continuously record video, and when the user hits the "end of recording" button, the video will contain only the last minute of the recording. What is the best way to achieve this?
As far as I know, there is no simple way to achieve this. Some rough ideas, though, in order of increasing difficulty:
If you can safely assume that the total recording time will be fairly short (i.e., you won't run out of storage space on the device), you could record the entire video and then perform a post-processing step that trims the video to size.
Record the video in one-minute chunks. When the user stops recording, compute how much of the previous chunk you need to prepend to the current chunk, then stitch the chunks together (a rough sketch of the chunk rotation follows below).
Register as a PreviewCallback and store the video frames in your own file format. Periodically remove the frames that you don't care about because they're too old. You would need to store the audio separately, and then you would need to transcode the custom format into a standard format.
Each of these would probably require some NDK code to do the work efficiently.
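For the second idea, the bookkeeping could look roughly like the sketch below. This is only an outline: the camera/recorder setup is omitted, the names are made up, and stopping/restarting MediaRecorder does leave a small gap between chunks, which is part of why none of these options is simple.

import android.media.MediaRecorder;
import java.io.File;
import java.io.IOException;

public class LoopRecorder implements MediaRecorder.OnInfoListener {
    private static final int CHUNK_MS = 60 * 1000;   // record in one-minute chunks
    private final File dir;
    private MediaRecorder recorder;
    private File currentChunk;
    private File previousChunk;
    private int chunkIndex = 0;

    public LoopRecorder(File dir) { this.dir = dir; }

    /** Starts (or restarts) the recorder into a fresh chunk file. */
    public void startNextChunk() throws IOException {
        // A chunk older than the previous one can no longer contribute to the
        // final minute, so delete it to keep storage bounded.
        if (previousChunk != null) previousChunk.delete();
        previousChunk = currentChunk;
        currentChunk = new File(dir, "chunk_" + (chunkIndex++) + ".mp4");

        recorder = new MediaRecorder();
        // ... setAudioSource / setVideoSource / setOutputFormat / encoder setup here ...
        recorder.setOutputFile(currentChunk.getAbsolutePath());
        recorder.setMaxDuration(CHUNK_MS);        // ask for a callback after one minute
        recorder.setOnInfoListener(this);
        recorder.prepare();
        recorder.start();
    }

    @Override
    public void onInfo(MediaRecorder mr, int what, int extra) {
        if (what == MediaRecorder.MEDIA_RECORDER_INFO_MAX_DURATION_REACHED) {
            mr.stop();
            mr.release();
            try {
                startNextChunk();                 // roll over into the next chunk
            } catch (IOException ignored) {
                // In a real app, surface this to the user instead of swallowing it.
            }
        }
    }

    // When the user hits "end of recording": stop the recorder, work out how much
    // of previousChunk is needed to pad currentChunk out to a full minute, and
    // stitch the two files together (e.g. with MediaExtractor + MediaMuxer).
}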