I'm developing a VoIP application that runs at a sampling rate of 48 kHz. It uses Opus as its codec, which runs at 48 kHz internally, and most current Android hardware natively runs at 48 kHz, so AEC is the only piece of the puzzle I'm missing now. I've already found the WebRTC implementation, but I can't figure out how to make it work. It seems to corrupt memory randomly and crashes the whole thing sooner or later. When it doesn't crash, the sound is choppy, as if it were quieter for half of each frame. Here's my code that processes a 20 ms frame:
webrtc::SplittingFilter* splittingFilter;
webrtc::IFChannelBuffer* bufferIn;
webrtc::IFChannelBuffer* bufferOut;
webrtc::IFChannelBuffer* bufferOut2;
// ...
splittingFilter=new webrtc::SplittingFilter(1, 3, 960);
bufferIn=new webrtc::IFChannelBuffer(960, 1, 1);
bufferOut=new webrtc::IFChannelBuffer(960, 1, 3);
bufferOut2=new webrtc::IFChannelBuffer(960, 1, 3);
// ...
int16_t* samples=(int16_t*)data;
float* fsamples[3];
float* foutput[3];
int i;
float* fbuf=bufferIn->fbuf()->bands(0)[0];
// convert the data from 16-bit PCM into float
for(i=0;i<960;i++){
fbuf[i]=samples[i]/(float)32767;
}
// split it into three "bands" that the AEC needs and for some reason can't do itself
splittingFilter->Analysis(bufferIn, bufferOut);
// split the frame into 6 consecutive 160-sample blocks and perform AEC on them
for(i=0;i<6;i++){
fsamples[0]=&bufferOut->fbuf()->bands(0)[0][160*i];
fsamples[1]=&bufferOut->fbuf()->bands(0)[1][160*i];
fsamples[2]=&bufferOut->fbuf()->bands(0)[2][160*i];
foutput[0]=&bufferOut2->fbuf()->bands(0)[0][160*i];
foutput[1]=&bufferOut2->fbuf()->bands(0)[1][160*i];
foutput[2]=&bufferOut2->fbuf()->bands(0)[2][160*i];
int32_t res=WebRtcAec_Process(aecState, (const float* const*) fsamples, 3, foutput, 160, 20, 0);
}
// put the "bands" back together
splittingFilter->Synthesis(bufferOut2, bufferIn);
// convert the processed data back into 16-bit PCM
for(i=0;i<960;i++){
samples[i]=(int16_t) (CLAMP(fbuf[i], -1, 1)*32767);
}
If I comment out the actual echo cancellation and just do the float conversion and band splitting back and forth, it doesn't corrupt memory, doesn't sound weird and runs indefinitely. (I do pass the farend/speaker signal into the AEC; I just didn't want to clutter the question by including that code.)
I've also tried Android's built-in AEC. While it does work, it upsamples the captured signal from 16 kHz.
Unfortunately, there is no free AEC package that supports 48 kHz. So either move to 32 kHz or use a commercial AEC package at 48 kHz.
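One thing that may be worth checking in the question's code is the indexing into the band buffers. The sketch below is plain Java arithmetic, not WebRTC code, and it rests on an assumption: that IFChannelBuffer(960, 1, 3) stores 960 / 3 = 320 samples per band (my reading of WebRTC's ChannelBuffer, not something confirmed in this thread). Under that assumption only two 160-sample blocks fit in each band, so a loop over six blocks would run past the end of the band. The class and method names are made up for illustration.

```java
// Sketch only: assumes a multi-band buffer holds frames / bands samples per band.
final class BandBlockCheck {
    /** Samples held by each band when `frames` samples are split into `bands` bands. */
    static int samplesPerBand(int frames, int bands) {
        return frames / bands;
    }

    /** Number of whole blocks of `blockSize` samples that fit in one band. */
    static int blocksPerBand(int frames, int bands, int blockSize) {
        return samplesPerBand(frames, bands) / blockSize;
    }

    /** True if block number `blockIndex` (0-based) lies completely inside one band. */
    static boolean blockFits(int frames, int bands, int blockSize, int blockIndex) {
        return (blockIndex + 1) * blockSize <= samplesPerBand(frames, bands);
    }

    public static void main(String[] args) {
        int frames = 960, bands = 3, block = 160;
        System.out.println("samples per band: " + samplesPerBand(frames, bands));
        System.out.println("blocks per band:  " + blocksPerBand(frames, bands, block));
        for (int i = 0; i < 6; i++) {
            System.out.println("block " + i + " fits: " + blockFits(frames, bands, block, i));
        }
    }
}
```

If the assumption holds, it would be consistent with the crashes going away when the WebRtcAec_Process loop is commented out, since that loop is the only place that indexes the bands in 160-sample steps.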
I'm working with Android MediaCodec and use it for real-time H.264 encoding and decoding of frames from the camera. I use MediaCodec in synchronous mode and render the output to the decoder's Surface. Everything works fine except that the output lags 1.5-2 seconds behind real time, and I'm very confused about why.
I measured the total time of the encoding and decoding processes and it stays around 50-65 milliseconds, so I don't think the problem is there.
I tried changing the encoder configuration, but it didn't help. It is currently configured like this:
val formatEncoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
formatEncoder.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
formatEncoder.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5)
formatEncoder.setInteger(MediaFormat.KEY_BIT_RATE, 1920 * 1080)
formatEncoder.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
val encoder = MediaCodec.createEncoderByType("video/avc")
encoder.configure(formatEncoder, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = encoder.createInputSurface() // I use it to send frames from camera to encoder
encoder.start()
Changing the decoder configuration didn't help either. It is currently configured like this:
val formatDecoder = MediaFormat.createVideoFormat("video/avc", 1920, 1080)
val decoder = MediaCodec.createDecoderByType("video/avc")
decoder.configure(formatDecoder , outputSurface, null, 0) // I use outputSurface to render decoded frames into it
decoder.start()
I use the following timeouts when waiting for available encoder/decoder buffers. I tried reducing their values, but it didn't help, so I left them like this:
var TIMEOUT_IN_BUFFER = 10000L // microseconds
var TIMEOUT_OUT_BUFFER = 10000L // microseconds
I also measured how long it takes the inputSurface to consume a frame: 0.03-0.05 milliseconds, so it isn't a bottleneck. In fact I measured every place where a bottleneck could be and didn't find anything, so I think the problem is in the encoder or decoder itself, or in their configuration, or maybe I should use some special routine for sending frames to the encoder/decoder.
I also tried a HW-accelerated codec, and it's the only thing that helped: with it the latency drops to ~500-800 milliseconds, but that is still too much for real-time streaming.
It seems to me that the encoder or decoder buffers several frames before it starts displaying them on the surface, and that this causes the latency. If that is really so, how can I disable the buffering or reduce its duration?
Please help. I've been stuck on this problem for about half a year and have no idea how to reduce the latency. I'm sure it's possible, because popular apps like Telegram, Viber, WhatsApp etc. work fine without such latency, so what's the secret here?
UPD 07.07.2021:
I still haven't found a solution to get rid of the latency. I've tried changing H.264 profiles and increasing and decreasing the I-frame interval, bitrate and framerate, but the result is the same. The only thing that helps a little is downgrading the resolution from 1920x1080 to e.g. 640x480, but this "solution" doesn't suit me because I want to encode/decode real-time video at 1920x1080.
UPD 08.07.2021:
I found out that changing TIMEOUT_IN_BUFFER and TIMEOUT_OUT_BUFFER from 10_000L to 100_000L decreases the latency a bit, but it considerably increases the delay before the first frame is shown after the encoding/decoding process starts.
It's possible your encoder is producing B frames (bidirectionally predicted frames). They increase quality and latency, and are great for movies, but they are no good for low-latency applications.
Key frames = I (intra-coded frames)
Predicted frames = P (difference from previous frames)
Bidirectionally predicted frames = B (interpolated between earlier and later frames)
A sequence of frames including B frames might look like this:
IBBBPBBBPBBBPBBBI
         11111111
12345678901234567
The encoder must encode each P frame, and the decoder must decode it, before the preceding B frames make any sense. So in this example the frames get encoded out of order like this:
1 5 2 3 4 9 6 7 8 13 10 11 12 17 14 15 16
In this example the decoder can't handle frame 2 until the encoder has sent frame 5.
On the other hand, this sequence without B frames allows coding and decoding the frames in order.
IPPPPPPPPPPIPPPPPPPPP
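To make the reordering concrete, here is a small sketch in plain Java (no MediaCodec involved) that turns a display-order string of frame types into the order in which the frames have to be coded. It uses a simplified model that I am assuming for illustration: every run of B frames depends only on the next I or P frame, which is a simplification of what real H.264 encoders may do. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: each run of B frames waits for the next I or P frame.
final class FrameReorder {
    /** Returns 1-based display indices in the order the frames must be coded. */
    static List<Integer> codingOrder(String displayOrder) {
        List<Integer> order = new ArrayList<>();
        List<Integer> pendingB = new ArrayList<>();
        for (int i = 0; i < displayOrder.length(); i++) {
            int frame = i + 1;
            if (displayOrder.charAt(i) == 'B') {
                pendingB.add(frame);      // held back until the next I or P frame
            } else {
                order.add(frame);         // the reference frame is coded first
                order.addAll(pendingB);   // then the B frames displayed before it
                pendingB.clear();
            }
        }
        order.addAll(pendingB);           // trailing B frames are appended as-is
        return order;
    }

    public static void main(String[] args) {
        System.out.println(codingOrder("IBBBPBBBPBBBPBBBI"));
        System.out.println(codingOrder("IPPPPPPPPPPIPPPPPPPPP"));
    }
}
```

With B frames the coded order differs from the display order, so the decoder has to hold frames back. Without them the two orders are identical and each frame can be shown as soon as it is decoded.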
Try using the Constrained Baseline Profile setting. It's designed for low latency and low power use. It suppresses B frames. I think this works.
mediaFormat.setInteger(
    MediaFormat.KEY_PROFILE,
    MediaCodecInfo.CodecProfileLevel.AVCProfileConstrainedBaseline);
I believe the Android H.264 decoder adds latency (at least in most cases I've tried). That is probably why the Android developers added PARAMETER_KEY_LOW_LATENCY in API level 30.
However, I could reduce the delay by a few frames by querying for output a few more times.
Reason: no idea. It's just the result of tedious trial and error.
int inputIndex = m_codec.dequeueInputBuffer(-1); // -1 = block until an input buffer is available
if (inputIndex >= 0) {
    ByteBuffer buffer;
    if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.LOLLIPOP) {
        buffer = m_codec.getInputBuffer(inputIndex);
    } else {
        ByteBuffer[] bbuf = m_codec.getInputBuffers();
        buffer = bbuf[inputIndex];
    }
    buffer.put(frame);
    // tell the decoder to process the frame
    m_codec.queueInputBuffer(inputIndex, 0, frame.length, 0, 0);
}
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
// poll the output three times without waiting (timeout 0) and render whatever is ready
for (int attempt = 0; attempt < 3; attempt++) {
    int outputIndex = m_codec.dequeueOutputBuffer(info, 0);
    if (outputIndex >= 0) {
        m_codec.releaseOutputBuffer(outputIndex, true);
    }
}
You need to configure vendor-specific low-latency parameters (or KEY_LOW_LATENCY where it is supported) for the different chipset vendors. This is a common problem on Android phones.
Check this code https://github.com/moonlight-stream/moonlight-android/blob/master/app/src/main/java/com/limelight/binding/video/MediaCodecHelper.java
Android MediaCodec decode takes a very long time, about 115 to 118 msecs per frame, for H.264 frames. The Android device has a Qualcomm Snapdragon 845 processor, so I assume the Android MediaCodec APIs target the Qualcomm GPU and not the ARM CPU cores.
Has anyone experienced this issue before, and can you provide guidance on how to make this decode go faster?
The code is all native code, no java at all. With no Java, I have no active window, no surface texture... so Grafika examples don't help here. I am using AndroidP(9.0) API 28. NDK 19.2.5x.
Here's how my code is setup:
Step1: I have two codec instances configured on two separate threads as follows:
codecData.codec = AMediaCodec_createDecoderByType("video/avc");
AMediaFormat_setString(codecData.format_eye, AMEDIAFORMAT_KEY_MIME, "video/avc");
AMediaFormat_setInt32(codecData.format_eye, AMEDIAFORMAT_KEY_HEIGHT, 1920);
AMediaFormat_setInt32(codecData.format_eye, AMEDIAFORMAT_KEY_WIDTH, 1080);
AMediaFormat_setFloat(codecData.format_eye, AMEDIAFORMAT_KEY_FRAME_RATE, 60.0f);
Step2: I enqueue the encoded buffer using these calls which take 14 to 17 msecs on 60 FPS input with two separate threads populating the individual codec Qs:
bufIdx = AMediaCodec_dequeueInputBuffer(codecData.codec, -1); //-1 makes it blocking call
auto buf = AMediaCodec_getInputBuffer(codecData.codec, bufIdx, &bufSize);
uint64_t presentTime = presentTimer.getTimeUs();
memcpy(buf, data, size);
AMediaCodec_queueInputBuffer(codecData.codec, bufIdx, 0, size, presentTime, 0);
Step3: I dequeue the decoded buffer as follows. This takes 115 to 118 msecs per frame per codec on 60 FPS output. The dequeue for both codecs is done by one consumer thread, which goes through the codec instances one at a time:
AMediaCodecBufferInfo info_eye;
bufIdx = AMediaCodec_dequeueOutputBuffer(codecData.codec, &info_eye, 1);
auto decodedBuf = AMediaCodec_getOutputBuffer(codecData.codec, bufIdx, &bufSize);
Step4: The decoded buffer is then fed to a NV12toRGBA shader on the render thread that populates the texture which takes about 2 msecs. This texture then gets displayed.
I am expecting 60 FPS but get about 50 FPS due to the delays in Step3 i.e. the 115 to 118 msecs latency is killing me :-(
Any ideas? Appreciate any and all help.
I am trying to save image sequences with fixed framerates (preferably up to 30) on an android device with FULL capability for camera2 (Galaxy S7), but I am unable to a) get a steady framerate, b) reach even 20fps (with jpeg encoding). I already included the suggestions from Android camera2 capture burst is too slow.
The minimum frame duration for JPEG is 33.33 milliseconds (for resolutions below 1920x1080) according to
characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP).getOutputMinFrameDuration(ImageFormat.JPEG, size);
and the stall duration is 0 ms for every size (similar for YUV_420_888).
My capture builder looks as follows:
captureBuilder.set(CaptureRequest.CONTROL_AE_MODE, CONTROL_AE_MODE_OFF);
captureBuilder.set(CaptureRequest.SENSOR_EXPOSURE_TIME, _exp_time);
captureBuilder.set(CaptureRequest.CONTROL_AE_LOCK, true);
captureBuilder.set(CaptureRequest.SENSOR_SENSITIVITY, _iso_value);
captureBuilder.set(CaptureRequest.LENS_FOCUS_DISTANCE, _foc_dist);
captureBuilder.set(CaptureRequest.CONTROL_AF_MODE, CONTROL_AF_MODE_OFF);
captureBuilder.set(CaptureRequest.CONTROL_AWB_MODE, _wb_value);
// https://stackoverflow.com/questions/29265126/android-camera2-capture-burst-is-too-slow
captureBuilder.set(CaptureRequest.EDGE_MODE,CaptureRequest.EDGE_MODE_OFF);
captureBuilder.set(CaptureRequest.COLOR_CORRECTION_ABERRATION_MODE, CaptureRequest.COLOR_CORRECTION_ABERRATION_MODE_OFF);
captureBuilder.set(CaptureRequest.NOISE_REDUCTION_MODE, CaptureRequest.NOISE_REDUCTION_MODE_OFF);
captureBuilder.set(CaptureRequest.CONTROL_AF_TRIGGER, CaptureRequest.CONTROL_AF_TRIGGER_CANCEL);
// Orientation
int rotation = getWindowManager().getDefaultDisplay().getRotation();
captureBuilder.set(CaptureRequest.JPEG_ORIENTATION,ORIENTATIONS.get(rotation));
Focus distance is set to 0.0 (inf), iso is set to 100, exposure-time 5ms. Whitebalance can be set to OFF/AUTO/ANY VALUE, it does not impact the times below.
I start the capture session with the following command:
session.setRepeatingRequest(_capReq.build(), captureListener, mBackgroundHandler);
Note: It does not make a difference if I request RepeatingRequest or RepeatingBurst..
In the preview (only texture surface attached), everything is at 30fps.
However, as soon as I attach an image reader (listener running on a HandlerThread), which I instantiate as follows (without saving, only measuring the time between frames):
reader = ImageReader.newInstance(_img_width, _img_height, ImageFormat.JPEG, 2);
reader.setOnImageAvailableListener(readerListener, mBackgroundHandler);
With time-measuring code:
ImageReader.OnImageAvailableListener readerListener = new ImageReader.OnImageAvailableListener() {
    @Override
    public void onImageAvailable(ImageReader myreader) {
        Image image = myreader.acquireNextImage();
        if (image == null) {
            return;
        }
        long curr = image.getTimestamp(); // nanoseconds
        Log.d("curr- _last_ts", "" + ((curr - last_ts) / 1000000) + " ms");
        last_ts = curr;
        image.close();
    }
};
I get periodically repeating time differences like this:
99 ms - 66 ms - 66 ms - 99 ms - 66 ms - 66 ms ...
I do not understand why these take double or triple the time that the stream configuration map advertised for jpeg? The exposure time is well below the frame duration of 33ms. Is there some other internal processing happening that I am not aware of?
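To put the measured pattern into numbers, here is a small sketch in plain Java (no camera2 calls). It takes the logged intervals, computes the mean interval and the resulting effective frame rate, and expresses each interval as a multiple of the advertised frame duration, which I round down to 33 ms here. The class and method names are hypothetical.

```java
// Sketch: summarize logged inter-frame intervals (milliseconds).
final class FrameIntervalStats {
    /** Mean of the measured inter-frame intervals, in milliseconds. */
    static double meanIntervalMs(long[] intervalsMs) {
        if (intervalsMs.length == 0) {
            throw new IllegalArgumentException("no intervals");
        }
        long sum = 0;
        for (long ms : intervalsMs) {
            sum += ms;
        }
        return (double) sum / intervalsMs.length;
    }

    /** Frames per second implied by the mean interval. */
    static double effectiveFps(long[] intervalsMs) {
        return 1000.0 / meanIntervalMs(intervalsMs);
    }

    /** How many frame periods (rounded to nearest) one interval spans. */
    static long framePeriods(long intervalMs, long periodMs) {
        return Math.round((double) intervalMs / periodMs);
    }

    public static void main(String[] args) {
        long[] pattern = {99, 66, 66};
        System.out.println("mean interval ms: " + meanIntervalMs(pattern));
        System.out.println("effective fps:    " + effectiveFps(pattern));
        for (long ms : pattern) {
            System.out.println(ms + " ms spans " + framePeriods(ms, 33) + " frame periods");
        }
    }
}
```

For the 99 / 66 / 66 pattern the mean interval is 77 ms, i.e. roughly 13 fps, and every interval is a whole multiple of the 33 ms frame period. That would fit frames being skipped rather than each frame simply taking longer, but the log alone does not prove it.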
I tried the same for the YUV_420_888 format, which resulted in constant time-differences of 33ms. The problem I have here is that the cellphone lacks the bandwidth to store the images fast enough (I tried the method described in How to save a YUV_420_888 image?). If you know of any method to compress or encode these images fast enough myself, please let me know.
Edit: From the documentation of getOutputStallDuration: "In other words, using a repeating YUV request would result in a steady frame rate (let's say it's 30 FPS). If a single JPEG request is submitted periodically, the frame rate will stay at 30 FPS (as long as we wait for the previous JPEG to return each time). If we try to submit a repeating YUV + JPEG request, then the frame rate will drop from 30 FPS." Does this imply that I need to periodically request a single capture()?
Edit2: From https://developer.android.com/reference/android/hardware/camera2/CaptureRequest.html: "The necessary information for the application, given the model above, is provided via the android.scaler.streamConfigurationMap field using getOutputMinFrameDuration(int, Size). These are used to determine the maximum frame rate / minimum frame duration that is possible for a given stream configuration.
Specifically, the application can use the following rules to determine the minimum frame duration it can request from the camera device:
Let the set of currently configured input/output streams be called S.
Find the minimum frame durations for each stream in S, by looking it up in android.scaler.streamConfigurationMap using getOutputMinFrameDuration(int, Size) (with its respective size/format). Let this set of frame durations be called F.
For any given request R, the minimum frame duration allowed for R is the maximum out of all values in F. Let the streams used in R be called S_r.
If none of the streams in S_r have a stall time (listed in getOutputStallDuration(int, Size) using its respective size/format), then the frame duration in F determines the steady state frame rate that the application will get if it uses R as a repeating request."
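The rule quoted above boils down to taking a maximum over the configured streams. The sketch below expresses just that step in plain Java. The durations passed in are made-up example values in nanoseconds, standing in for what getOutputMinFrameDuration(int, Size) would return on a real device, and the class and method names are hypothetical.

```java
// Sketch of the quoted rule: the minimum frame duration for a request is the
// largest per-stream minimum frame duration among the configured streams.
final class MinFrameDurationRule {
    static long minFrameDurationNs(long... streamMinDurationsNs) {
        if (streamMinDurationsNs.length == 0) {
            throw new IllegalArgumentException("no streams configured");
        }
        long max = Long.MIN_VALUE;
        for (long d : streamMinDurationsNs) {
            max = Math.max(max, d);
        }
        return max;
    }

    public static void main(String[] args) {
        long previewNs = 33_333_333L; // example value, not read from a device
        long jpegNs = 33_333_333L;    // example value, not read from a device
        long resultNs = minFrameDurationNs(previewNs, jpegNs);
        System.out.println("min frame duration ns: " + resultNs);
        System.out.println("max steady-state fps:  " + 1e9 / resultNs);
    }
}
```

Note that the quoted text only promises this steady-state rate when none of the streams has a stall time, so it is worth re-checking getOutputStallDuration for the exact size and format in use.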
JPEG output is by far not the fastest way to fetch frames. You can do this a lot faster by drawing the frames directly onto a quad using OpenGL.
For burst capture, a faster solution would be capturing the images to RAM without encoding them, then encoding and saving them asynchronously.
On this website you can find a lot of excellent code related to android multimedia in general.
This specific program uses OpenGL to fetch the pixel data from an MPEG video. It's not difficult to use the camera as input instead of a video. You can basically use the texture used in the CodecOutputSurface class from the mentioned program as output texture for your capture request.
A possible solution I found consists of dumping YUV without encoding it as JPEG, in combination with a microSD card that is able to write up to 95 MB per second. (I had the misconception that YUV images would be larger, so with a cellphone that has full support for the camera2 pipeline, the write speed should be the limiting factor.)
With this setup, I was able to achieve the following stable rates:
1920x1080, 15fps (approx. 4 MB * 15 == 60 MB/sec)
960x720, 30fps (approx. 1.5 MB * 30 == 45 MB/sec)
I then encode the images offline from YUV to PNG using a python script.
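As a rough cross-check of the write rates above, the sketch below computes the size of a tightly packed YUV 4:2:0 frame (12 bits per pixel) and the resulting throughput in plain Java. This is a lower bound under the assumption of no row padding. The figures I measured are somewhat higher, and I don't know whether that comes from padding, rounding or something else. The class and method names are hypothetical.

```java
// Sketch: raw throughput of tightly packed YUV 4:2:0 frames (no row padding assumed).
final class YuvThroughput {
    /** Bytes in one frame: full-resolution Y plane plus two quarter-resolution chroma planes. */
    static long yuv420Bytes(int width, int height) {
        return (long) width * height * 3 / 2;
    }

    /** Sustained write rate in megabytes (10^6 bytes) per second. */
    static double megabytesPerSecond(int width, int height, int fps) {
        return yuv420Bytes(width, height) * (double) fps / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println("1920x1080 @ 15 fps: " + megabytesPerSecond(1920, 1080, 15) + " MB/s");
        System.out.println("960x720 @ 30 fps:   " + megabytesPerSecond(960, 720, 30) + " MB/s");
    }
}
```

Either way, both configurations stay below the card's rated 95 MB per second, which matches the rates being stable.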
I have an app that uses OpenSL ES. When I try to use it on a Nexus 9 running Android 6.0.1, I hear noise, as if I had the wrong sampling rate. On other devices everything is OK.
My SLDataFormat_PCM structure:
SLDataFormat_PCM format_pcm = {
SL_DATAFORMAT_PCM,
aChannels,
48000 * 1000,
SL_PCMSAMPLEFORMAT_FIXED_16,
SL_PCMSAMPLEFORMAT_FIXED_16,
aChannels == 2 ? SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT
: SL_SPEAKER_FRONT_CENTER,
SL_BYTEORDER_LITTLEENDIAN
};
When I change the sample rate (+/- 1Hz) in this structure, the output sounds OK, but I receive an AudioTrack debug message:
W/AudioTrack: AUDIO_OUTPUT_FLAG_FAST denied by client; transfer 1, track 47999 Hz, output 48000 Hz
Why do I have a problem in FAST mode, if the Nexus9 has 48000Hz?
I checked it using this method:
jclass clazz = env.getEnv()->FindClass("android/media/AudioSystem");
jmethodID mid = env.getEnv()->GetStaticMethodID(clazz, "getPrimaryOutputSamplingRate", "()I");
int nSampleRate = env.getEnv()->CallStaticIntMethod(clazz, mid);
LOGDEBUG << "Sample Rate: " << nSampleRate;
[ DBG:c894860f] 11:16:14.902: Sample Rate: 48000
Is there a better method to get device's sample rate?
Yes there's a method to find the preferred sample rate for a device though it'll work for API level > 16. You can have a look at my answer here.
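As for your SLDataFormat_PCM structure: you initialized the sample rate as 48000 * 1000. OpenSL ES takes this field in milliHertz, and SL_SAMPLINGRATE_48 is defined as 48000000, so the value itself is the same; still, the named constant states the intent more clearly. To sample your PCM data at 48 kHz, try the code below.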
// configure audio source
SLDataFormat_PCM format_pcm = {
SL_DATAFORMAT_PCM,
aChannels,
SL_SAMPLINGRATE_48,
SL_PCMSAMPLEFORMAT_FIXED_16,
SL_PCMSAMPLEFORMAT_FIXED_16,
aChannels == 2 ? SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT
: SL_SPEAKER_FRONT_CENTER,
SL_BYTEORDER_LITTLEENDIAN
};
I haven't worked with a Nexus 9 before, so I don't know whether it supports a 48k sampling rate. Anyway, you can check whether it does.
The problem was with a mutex in the callback function.
UPD:
OpenSLES Readme
Known issues
At 48000Hz, Galaxy Nexus and Nexus 10 produce glitchy output. At
44100Hz, Galaxy Nexus tends to glitch when switching activities or
bringing up large dialogs. Touch sounds occasionally cause OpenSL to
glitch. It's probably a good idea to disable touch sounds in audio
apps. These problems are not specific to opensl_stream and have been
reproduced in other settings.
I upgraded my Samsung Galaxy S4 from latest KitKat to Lollipop (5.0.1) yesterday and my IR remote control app that I have used for months stopped working.
Since I was using a late copy of KitKat ConsumerIrManager, the transmit( ) function was sending the number of pulses using the code below. It worked very nicely.
private void irSend(int freqHz, int[] pulseTrainInMicroS) {
    int[] pulseCounts = new int[pulseTrainInMicroS.length];
    for (int i = 0; i < pulseTrainInMicroS.length; i++) {
        // widen to long before multiplying so long pulses cannot overflow int
        long iValue = (long) pulseTrainInMicroS[i] * freqHz / 1000000;
        pulseCounts[i] = (int) iValue;
    }
    m_IRService.transmit(freqHz, pulseCounts);
}
When it stopped working yesterday, I began looking closely at it.
I noticed that the transmitted waveform has no relationship to the requested pulse train. Even the code below doesn't work correctly!
private void TestSend() {
int [] pulseCounts = {100, 100, 100};
m_IRService.transmit(38000, pulseCounts);
}
the resulting waveforms had many problems and so are entirely useless.
the waveforms were entirely wrong
the frequency was wrong and the pulse spacing was not regular
they were not repeatable
looking at the demodulated waveform:
if my 100, 100, 100 were correctly rendered, I should have seen two pulses 2.6ms (before 4.4.3(?) 100 us) long. instead I received (see attached) "[demodulated] not repeatable 1.BMP" and "[demodulated] not repeatable 2.BMP". note that the waveform isn't 2 pulses...in fact, it's not even repeatable.
as for the captures below, the signal goes low when the IR is detected.
we should have seen two pulses going low for 2.6 ms and 2.6 ms between them (see green line below).
I had also tried shorter pulses using 50, 50, 50 and have observed that the first pulse isn't correct either (see below).
looking at the modulated waveform:
the frequency was not correct; instead, it was about 18kHz and irregular.
I'm quite experienced with this and have formal education in electronics.
It seems to me there's a bug in ConsumerIrManager.transmit( )...
curiously, the "WatchOn" application that comes with the phone still works.
thank you for any insights you can give.
Test equipment:
Tektronix TDS-2014B, 100 MHz, used in peak-detect mode.
As @IvanTellez says, a change was made in Android with respect to this functionality. Strangely, when I had it output simple IR signals (for troubleshooting purposes), the function behaved as shown above (erratically, wrong carrier frequency, etc.). When I eventually returned to normal types of IR signals, it worked correctly.
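For reference, the Android documentation for ConsumerIrManager.transmit describes the pattern as alternating on/off durations in microseconds, whereas the code in the question sends carrier cycle counts. The sketch below shows the conversion in both directions in plain Java, so it can be checked without a device. Whether a particular Samsung build expects counts or microseconds is something I am assuming has to be decided per OS version, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch: convert between carrier cycle counts and microsecond durations.
final class IrPatternUnits {
    /** Carrier cycle counts -> durations in microseconds (truncating). */
    static int[] countsToMicros(int freqHz, int[] counts) {
        int[] micros = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            micros[i] = (int) (counts[i] * 1_000_000L / freqHz);
        }
        return micros;
    }

    /** Durations in microseconds -> carrier cycle counts (truncating). */
    static int[] microsToCounts(int freqHz, int[] micros) {
        int[] counts = new int[micros.length];
        for (int i = 0; i < micros.length; i++) {
            // long arithmetic avoids int overflow for long pulses
            counts[i] = (int) ((long) micros[i] * freqHz / 1_000_000L);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = {100, 100, 100};
        System.out.println(Arrays.toString(countsToMicros(38000, counts)));
    }
}
```

At 38 kHz, 100 cycles correspond to about 2631 microseconds, which matches the 2.6 ms pulses expected in the question. If the same {100, 100, 100} array is interpreted as microseconds, each pulse is only about four carrier cycles long.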