I am recording the audio stream using the AudioRecord class.
I am reading samples into a buffer of 2,040 samples.
After reading, the samples are processed and reduced to 170 samples.
Even then there are too many samples to draw in real time. I have not managed to configure the SciChart library to show these samples correctly: they appear compressed, whereas I need the chart to be wider.
I am processing an ECG signal, so all of its main components, such as the R peak and the QRS complex, must be preserved.
Are there any techniques that could reduce the sample count without spoiling the signal?
I think techniques like a moving average won't work, since they are used for smoothing the signal, and in my case I don't need smoothing.
I would be grateful for any help.
Thanks.
SciChart Android should be able to draw 2,040 samples in real time (actually, millions). It includes methods to automatically downsample waveforms without visual loss of data, and in performance benchmarks it can draw a million points on Android devices.
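For reference, "downsampling without visual loss" is usually some form of min/max decimation: each bucket of consecutive samples is reduced to its minimum and maximum, so narrow spikes such as the R peak survive, unlike with averaging. A minimal Java sketch of the idea, not SciChart's actual implementation:

    // Peak-preserving downsampling: keep the extremes of each bucket so
    // sharp features (R peaks) are not averaged away. Assumes bucketSize
    // divides input.length evenly, for brevity.
    static float[] minMaxDecimate(float[] input, int bucketSize) {
        int buckets = input.length / bucketSize;
        float[] out = new float[buckets * 2];  // min and max per bucket
        for (int b = 0; b < buckets; b++) {
            float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
            for (int i = b * bucketSize; i < (b + 1) * bucketSize; i++) {
                if (input[i] < min) min = input[i];
                if (input[i] > max) max = input[i];
            }
            out[2 * b] = min;      // emitting min then max keeps both
            out[2 * b + 1] = max;  // extremes in the drawn polyline
        }
        return out;
    }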
Have you got a code sample? Maybe something is wrong in the configuration.
I am developing an app in which I need to get face landmark points from a live camera preview, like a mirror cam or makeup cam. I want it to be available for iOS too. Please guide me to a robust solution.
I have used Dlib and Luxand.
DLIB: https://github.com/tzutalin/dlib-android-app
Luxand: http://www.luxand.com/facesdk/download/
Dlib is slow, with a lag of approximately 2 seconds (please look at the demo video on the GitHub page), and Luxand is OK but it's paid. My priority is to use an open-source solution.
I have also used Google Vision, but it does not offer many face landmark points.
So please suggest a way to make Dlib work faster, or any other option, keeping cross-platform as the priority.
Thanks in advance.
You can make Dlib detect face landmarks in real-time on Android (20-30 fps) if you take a few shortcuts. It's an awesome library.
Initialization
Firstly, you should follow all the recommendations in Evgeniy's answer, especially making sure that you only initialize the frontal_face_detector and shape_predictor objects once, instead of every frame. The frontal_face_detector will initialize faster if you deserialize it from a file instead of using the get_serialized_frontal_faces() function.

The shape_predictor needs to be initialized from a ~100 MB file, which takes several seconds. The serialize and deserialize functions are written to be cross-platform and perform validation on the data, which is robust but makes them quite slow. If you are prepared to make assumptions about endianness, you can write your own deserialization function that will be much faster. The file is mostly made up of matrices of 136 floating-point values (about 120,000 of them, meaning 16,320,000 floats in total). If you quantize these floats down to 8 or 16 bits you can make big space savings: for example, you can store the min value and (max - min) / 255 as floats for each matrix and quantize each one separately. This reduces the file size to about 18 MB, and it loads in a few hundred milliseconds instead of several seconds. The decrease in quality from using quantized values seems negligible to me, but YMMV.
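To make the quantization step concrete, here is a minimal sketch of one way to do it. It is written in Java for illustration (dlib itself is C++), and the exact file layout is an assumption, not dlib's format:

    // Quantize one matrix of floats to 8 bits: store min and a scale of
    // (max - min) / 255 per matrix, then one byte per element.
    static byte[] quantize(float[] m, float[] minAndScaleOut) {
        float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
        for (float v : m) { min = Math.min(min, v); max = Math.max(max, v); }
        float scale = (max - min) / 255f;
        if (scale == 0f) scale = 1f;  // guard against constant matrices
        minAndScaleOut[0] = min;
        minAndScaleOut[1] = scale;
        byte[] q = new byte[m.length];
        for (int i = 0; i < m.length; i++) {
            q[i] = (byte) Math.round((m[i] - min) / scale);  // 0..255
        }
        return q;
    }

    static float[] dequantize(byte[] q, float min, float scale) {
        float[] m = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            m[i] = min + (q[i] & 0xFF) * scale;  // & 0xFF: unsigned byte
        }
        return m;
    }

At 16,320,000 values this is roughly 16 MB of bytes plus two floats per matrix, which lines up with the ~18 MB file size mentioned above.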
Face Detection
You can scale the camera frames down to something small like 240x160 (or whatever, keeping the aspect ratio correct) for faster face detection. It means you can't detect smaller faces, but that might not be a problem depending on your app. Another, more complex approach is to adaptively crop and resize the region you use for face detection: initially check for all faces in a higher-resolution image (e.g. 480x320), then crop the area +/- one face width around the previous location, scaling down if need be. If you fail to detect a face in one frame, revert to detecting over the entire region in the next.
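A rough illustration of the crop-around-the-previous-detection idea (the method and names here are mine, not from dlib or any library):

    import android.graphics.Rect;

    // Expand the last detected face by one face width on each side and
    // clamp to the frame; fall back to the full frame if the face was lost.
    static Rect nextSearchRegion(Rect lastFace, int frameW, int frameH) {
        int margin = lastFace.width();
        Rect r = new Rect(lastFace.left - margin, lastFace.top - margin,
                          lastFace.right + margin, lastFace.bottom + margin);
        if (!r.intersect(0, 0, frameW, frameH)) {
            return new Rect(0, 0, frameW, frameH);  // face lost: search everywhere
        }
        return r;
    }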
Face Tracking
For faster face tracking, you can run face detections continuously in one thread, and in another thread track the detected face(s) and perform face feature detections using the tracked rectangles. In my testing, face detection took between 100 and 400 ms depending on the phone (at about 240x160), and I could do 7 or 8 face feature detections on the intermediate frames in that time. This can get a bit tricky if the face is moving a lot, because when you get a new face detection (which will be from 400 ms ago), you have to decide whether to keep tracking from the newly detected location or from the tracked location of the previous detection.

Dlib includes a correlation_tracker, but unfortunately I wasn't able to get it to run faster than about 250 ms per frame, and scaling down the resolution (even drastically) didn't make much of a difference. Tinkering with its internal parameters increased speed but gave poor tracking. I ended up using a CAMShift tracker based on the chroma UV planes of the preview frames, generating the color histogram from the detected face rectangles; see the sketch below. There is an implementation of CAMShift in OpenCV, but it's also pretty simple to roll your own.
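For reference, here is a rough sketch of seeding CAMShift from a detected face rectangle with OpenCV's Java bindings. The answer above built the histogram from the chroma UV planes of the preview frames; this simplified version uses the hue channel of an HSV frame instead, and the exact binding signatures should be double-checked against your OpenCV version:

    import java.util.Arrays;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.video.Video;

    class CamShiftFaceTracker {
        private final Mat hist = new Mat();
        private Rect window;

        // Seed the color histogram from the detected face region.
        void start(Mat hsvFrame, Rect detectedFace) {
            window = detectedFace;
            Mat roi = new Mat(hsvFrame, detectedFace);
            Imgproc.calcHist(Arrays.asList(roi), new MatOfInt(0), new Mat(),
                    hist, new MatOfInt(30), new MatOfFloat(0f, 180f));
            Core.normalize(hist, hist, 0, 255, Core.NORM_MINMAX);
        }

        // Track on an intermediate frame; CamShift updates 'window' in place.
        Rect track(Mat hsvFrame) {
            Mat backProj = new Mat();
            Imgproc.calcBackProject(Arrays.asList(hsvFrame), new MatOfInt(0),
                    hist, backProj, new MatOfFloat(0f, 180f), 1.0);
            Video.CamShift(backProj, window,
                    new TermCriteria(TermCriteria.EPS | TermCriteria.COUNT, 10, 1));
            return window;
        }
    }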
Hope this helps. It's mostly a matter of picking the low-hanging fruit for optimization first and continuing until you're happy it's fast enough. On a Galaxy Note 5, Dlib does face + feature detection in about 100 ms, which might be good enough for your purposes even without all this extra complication.
Dlib is fast enough for most cases. Most of the processing time is spent detecting the face region in the image, and it's slow because modern smartphones produce high-resolution images (10 MP+).
Yes, face detection can take 2+ seconds on a 3-5 MP image, but that is because it tries to find very small faces, down to 80x80 pixels. I am quite sure you don't need such small faces in high-resolution images, and the main optimization here is to reduce the image size before finding faces.
After the face region is found, the next step, face landmark detection, is extremely fast and takes < 3 ms per face; this time does not depend on resolution.
The dlib-android port is currently not using dlib's detector the right way. Here is a list of recommendations for making the dlib-android port work much faster:
https://github.com/tzutalin/dlib-android/issues/15
It's very simple and you can implement it yourself. I expect a performance gain of about 2x-20x.
Apart from OpenCV and Google Vision, there are widely available web services like Microsoft Cognitive Services. The advantage is that they are completely platform-independent, which you've listed as a major design goal. I haven't personally used them in an implementation yet, but based on playing with their demos for a while they seem quite powerful; they're pretty accurate and can offer quite a few details, depending on what you want to know. (There are similar solutions available from other vendors as well, by the way.)
The two major potential downsides to something like that are added network traffic and API pricing (depending on how heavily you use them).
Pricing-wise, Microsoft currently offers up to 5,000 transactions a month for free, with added transactions beyond that costing some fraction of a penny each (depending on traffic, you can actually get a discount for high volume). But if you're doing, for example, millions of transactions per month, the fees can start adding up surprisingly quickly. This is actually a fairly typical pricing model; before you select a vendor or implement this kind of solution, make sure you understand how they're going to charge you, how much you're likely to end up paying, and how much you could be paying if your user base scales. Depending on your traffic and business model, it could be either very reasonable or cost-prohibitive.
The added network traffic may or may not be a problem, depending on how your app is written and how much data you're sending. If you can do the processing asynchronously and be guaranteed reasonably fast Wi-Fi access, that obviously wouldn't be a problem; unfortunately, you may or may not have that luxury.
I am currently working with the Google Vision API and it seems to be able to detect landmarks out of the box. Check out the FaceTracker here:
google face tracker
This solution detects the face, happiness, and left and right eyes as is. For other landmarks, you can call getLandmarks() on a Face and it should return everything you need (though I have not tried it) according to their documentation: Face reference
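Based on the documented Mobile Vision API, requesting all landmarks looks roughly like this (a sketch; error handling, isOperational() checks, and frame acquisition are omitted):

    import android.content.Context;
    import android.util.SparseArray;
    import com.google.android.gms.vision.Frame;
    import com.google.android.gms.vision.face.Face;
    import com.google.android.gms.vision.face.FaceDetector;
    import com.google.android.gms.vision.face.Landmark;

    void detectLandmarks(Context context, Frame frame) {
        FaceDetector detector = new FaceDetector.Builder(context)
                .setLandmarkType(FaceDetector.ALL_LANDMARKS)  // off by default
                .build();
        SparseArray<Face> faces = detector.detect(frame);
        for (int i = 0; i < faces.size(); i++) {
            Face face = faces.valueAt(i);
            for (Landmark landmark : face.getLandmarks()) {
                // getType() is e.g. Landmark.NOSE_BASE, Landmark.LEFT_EYE, ...
                float x = landmark.getPosition().x;
                float y = landmark.getPosition().y;
            }
        }
        detector.release();
    }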
So I'm trying to get the camera pixel data, monitor any major changes in luminosity, and then save the image. I decided to use OpenGL, as I figured it would be quicker to do the luminosity checks in the fragment shader.
I bind a SurfaceTexture to the camera to get the image into the shader, and am currently using glReadPixels to get the pixels back, which I then put in a bitmap and save. Roughly, that path looks like the sketch below.
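A simplified sketch of the read-back path just described (illustrative, not the asker's exact code):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import android.graphics.Bitmap;
    import android.opengl.GLES20;

    Bitmap readFrame(int width, int height) {
        ByteBuffer buf = ByteBuffer.allocateDirect(width * height * 4)
                .order(ByteOrder.nativeOrder());
        // Blocks until the GPU has finished rendering the frame.
        GLES20.glReadPixels(0, 0, width, height,
                GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, buf);
        Bitmap bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
        buf.rewind();
        bitmap.copyPixelsFromBuffer(buf);  // note: GL rows are bottom-up,
        return bitmap;                     // so the bitmap is vertically flipped
    }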
The bottleneck on glReadPixels is crazy, so I looked into other options and saw that EGL_KHR_image_base was probably my best bet, as I'm using OpenGL ES 2.0.
Unfortunately I have no experience with extensions and don't know where to find exactly what I need. I've downloaded the NDK but am pretty stumped. Could anyone point me to some documentation and help explain it if I don't understand fully?
Copying pixels with glReadPixels() can be slow, though it varies significantly depending on the specific device and pixel format. Some tests using glReadPixels() to save frames from video data (which is also initially YUV) found that 96.5% of the time was spent in PNG compression and file I/O on a Nexus 5.
In some cases, the time required goes up substantially if the source and destination formats don't match. On one particular device I found that copying to RGBA, instead of RGB, reduced the time required.
The EGL calls can work, but they require non-public APIs. And it's a bit tricky; see e.g. this answer. (I think the comment in the edit would allow it to work, but I never got around to trying it, and I'm not in a position to do so now.)
The only solution is to use a Pixel Buffer Object (PBO), where the read is asynchronous. However, to take advantage of the asynchrony, you need two PBOs used as a ping-pong buffer.
I followed http://www.jianshu.com/p/3bc4db687546 and reduced the read time for a 1080p frame from 40 ms to 20 ms.
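Note that PBOs require OpenGL ES 3.0 (the GLES30 bindings), not 2.0. A minimal sketch of the ping-pong scheme, assuming an RGBA framebuffer (the very first mapped frame contains no data yet):

    import java.nio.ByteBuffer;
    import android.opengl.GLES30;

    int[] pbos = new int[2];
    int index = 0;

    void initPbos(int width, int height) {
        GLES30.glGenBuffers(2, pbos, 0);
        for (int pbo : pbos) {
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo);
            GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER,
                    width * height * 4, null, GLES30.GL_STREAM_READ);
        }
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
    }

    void readFrameAsync(int width, int height) {
        int size = width * height * 4;
        // Kick off an asynchronous read into the current PBO; with a buffer
        // bound to GL_PIXEL_PACK_BUFFER, glReadPixels returns immediately.
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbos[index]);
        GLES30.glReadPixels(0, 0, width, height,
                GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);
        // Map the other PBO, which was filled during the previous frame.
        index = 1 - index;
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbos[index]);
        ByteBuffer pixels = (ByteBuffer) GLES30.glMapBufferRange(
                GLES30.GL_PIXEL_PACK_BUFFER, 0, size, GLES30.GL_MAP_READ_BIT);
        // ... consume or copy 'pixels' here; it is only valid until unmap ...
        GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
    }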
I'm thinking of starting an Android project which records audio signals and does some processing to denoise them. My question is: since many (nearly all) denoising algorithms involve the FFT, is it possible for me to make this a real-time program? By real-time I mean that the program records and processes at the same time, so I don't have to wait after I finish recording.
I have made a sample project which applies a Fourier transform to the audio signal and implements a simple algorithm called sub-spectrum. But I found it difficult to run this algorithm in real time: after I press the 'stop' button, it takes a while to finish the processing and save the file. (I'm also wondering how commercial recorder apps manage to record sound and save it at the same time.) I know my FFT may not be the fastest, but I'd like to know whether I could achieve real-time performance if I fully optimized it or used the fastest FFT code. Thanks a lot!
It sounds like you are talking about broadband denoising, so I'll address my answer to that. There are other kinds of denoising, from simple filtering to adaptive filtering to dynamic range expansion, and probably others.
I don't think anyone can answer this question with a simple yes or no. You will have to try it and see what can be done.
First off, there are a variety of FFT implementations of varying speed you could try, including FFTW. Some are faster than others, but at the end of the day they will all deliver comparable results.
This is one place where native C/C++ will outperform Java/Dalvik code, because it can truly take advantage of vector instructions. For that to work, you'll probably need to write some assembly, or find code that is already Android-optimized. I'm not aware of an Android-optimized FFT, but I'm sure one exists.
The real performance win will come from how you structure your overall denoising algorithm. All the denoising I'm familiar with is extremely processor-intensive and probably won't run on a phone in real time, although it might on a tablet. That's just a(n educated) guess, though.
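Whatever the algorithm, the "record and process at the same time" part of the question is structural: read audio in one thread and drain a queue in another. A minimal sketch of that shape, where processBlock() is a placeholder for the actual FFT/denoise/save work:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    final BlockingQueue<short[]> queue = new ArrayBlockingQueue<>(64);

    void startRecording() {
        // Reader thread: does nothing but pull PCM from the microphone.
        new Thread(() -> {
            int minBuf = AudioRecord.getMinBufferSize(44100,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                    44100, AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT, minBuf * 4);
            recorder.startRecording();
            short[] block = new short[2048];
            while (!Thread.currentThread().isInterrupted()) {
                int n = recorder.read(block, 0, block.length);
                if (n > 0) queue.offer(block.clone());  // never block the reader
            }
            recorder.stop();
            recorder.release();
        }).start();

        // Worker thread: processes blocks while recording continues.
        new Thread(() -> {
            try {
                while (true) {
                    processBlock(queue.take());
                }
            } catch (InterruptedException ignored) { }
        }).start();
    }

    void processBlock(short[] block) {
        // FFT, spectral processing, and appending to the output file go here.
    }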
So I'm trying to build an Android app which acts as a real-time audio analyzer, as a precursor to a project that will involve detecting and filtering out certain sounds.
I think I've got the basics of the discrete Fourier transform down; however, I'm not sure what the best parameters are for real-time frequency analysis.
I get the impression that under ideal conditions (unlimited computing power), I would take all the samples from the 44,100 samples/sec PCM stream I'm getting from the AudioRecord class and put them through a 44,100-element FIFO "window" (padded to 2^16 with zeros, and maybe a tapering function?), running an FFT on the window every time a new sample came in. This would (I think) give me the spectrum for 0 to ~22 kHz, updated 44,100 times per second.
It seems like this is not going to happen on a smartphone. The thing is, I'm not sure which parameters of the computation I should reduce to make it tractable on my Galaxy Nexus while still holding on to as much quality as possible. Eventually I would like to use an external microphone with better sensitivity.
I figure it will involve moving the window by more than one sample between FFTs, but I have no idea at what point this becomes more detrimental to accuracy/aliasing than just doing the FFT on a smaller window, or whether there is a third option I'm overlooking.
With the natively implemented KissFFT I'm using from libgdx, I seem to be able to do somewhere between 30 and 42 44,100-element FFTs per 44,100 samples and still have it be responsive (meaning the buffer being filled by the thread doing AudioRecord.read() isn't filling up faster than the thread doing the FFTs can drain it).
So my questions are:
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be doing something stupid, because much faster speeds are possible?
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
I'd be happy to show any of my code if that would help answer my questions, but there's a lot of it so I figured I would do so selectively instead of posting it all.
if there is a third option I'm overlooking
Yes: doing both at the same time, i.e. a reduction of the FFT size as well as a larger step size. In a comment you pointed out that you want to detect "sniffling/chewing with mouth". So what you want to do is similar to the typical task of speech recognition. There, you typically extract a feature vector in steps of 10 ms (with Fs = 44.1 kHz, that means every 441 samples), and the signal window to transform is roughly double the step size, i.e. 20 ms, which leads to a power-of-two FFT size of 1024 samples (make sure you choose an FFT size which is a power of 2, because it is faster).
Any increase in window size or reduction in step size increases the data but mainly adds redundancy.
Additional hints:
@SztupY correctly pointed out that you need to "window" your signal prior to the FFT, typically with a Hamming window. (But this is not "filtering": it is just multiplying each sample value by the corresponding window value, without accumulating the result.)
The raw FFT output is hardly suited to recognizing "sniffling/chewing with mouth"; a classical recognizer consists of HMMs or ANNs which process sequences of MFCCs and their deltas.
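To make the framing and windowing concrete: a minimal sketch with a 441-sample (10 ms) step, 1024-sample frames, and a Hamming window, where fft() is a stub standing in for whatever FFT implementation you use:

    // Slide a 1024-sample frame over the signal in 441-sample steps,
    // multiply by a Hamming window, and hand each frame to the FFT.
    static void analyze(float[] signal) {
        final int step = 441;        // 10 ms at 44.1 kHz
        final int frameSize = 1024;  // power of two, roughly 2x the step size
        float[] window = new float[frameSize];
        for (int i = 0; i < frameSize; i++) {
            window[i] = 0.54f
                    - 0.46f * (float) Math.cos(2.0 * Math.PI * i / (frameSize - 1));
        }
        float[] frame = new float[frameSize];
        for (int start = 0; start + frameSize <= signal.length; start += step) {
            for (int i = 0; i < frameSize; i++) {
                frame[i] = signal[start + i] * window[i];  // windowing, not filtering
            }
            fft(frame);
        }
    }

    static void fft(float[] frame) {
        // Call into your FFT library (e.g. KissFFT) and extract features here.
    }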
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be doing something stupid, because much faster speeds are possible?
It's close to the best, but you are wasting all the CPU power estimating highly redundant data, leaving none for the recognizer.
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
After considering my answer you might re-think your approach.
This question is not about whether ffmpeg code can be used on Android; I know that it can. I'm just asking whether somebody has made real performance progress with it.
I've created the question after several weeks of experiments with the stuff and I've had enough.
I do not want to post in threads where people do not even say what kind of video they decode (resolution, codec) and only talk about some mystical FPS; I just don't understand what they are trying to do. Also, I'm not going to develop an application only for my phone, or only for Android 2.2+ phones that have some extended OpenGL features. I have a quite popular phone, the HTC Desire, and if the application does not work on it, what's next?
Well, what do I have?
1. FFmpeg source from the latest HEAD branch. Actually I could not build it with NDK r5, so I decided to use a "stolen" one.
2. Bambuser's build script (bash) with the appropriate ffmpeg source (http://bambuser.com/r/opensource/ffmpeg-4f7d2fe-android-2011-03-07.tar.gz). It builds well, after some corrections, using NDK r5.
3. RockPlayer's gelded ffmpeg source code with a huge Android.mk in the capacity of a build script (http://www.rockplayer.com/download/rockplayer_ffmpeg_git_20100418.zip). It builds with NDK r3 and NDK r5 after some corrections. RockPlayer is probably the coolest media player for Android, and I supposed I would get some perks from using its build.
4. A suitable video for the project (neither big nor small): 600x360, H.264.

Both builds from items 2 and 3 give the ability to get frames from the video (frame by frame, seeking, etc.). I did not try to extract the audio track, because the project did not need one. I'm not publishing my source here because I think it's standard and easy to find.
Well, what are the results with video?
HTC Desire, Android 2.2
600x360, H.264
decoding and rendering are in different threads
Bambuser (NDK r5 build for armv5te, RGBA8888): 33 ms/frame average.
RockPlayer (NDK r3 build for NEON, RGB565): 27 ms/frame average.
Not bad at first glance, but consider that these are the times just to decode the frames.
If somebody has much better results with decoding time, let me know.
The hardest part for video is rendering. If we have a 600x360 bitmap, we must somehow scale it before painting, because different phones have different screen sizes and we cannot expect the video to be the same size as the screen.
What options do we have to rescale a frame to fit it to screen?
I was able to test the following cases (same phone and video source):
sws_scale() C function in Bambuser's build: 70 ms/frame. Unacceptable.
Stupid bitmap rescaling in Android (Bitmap.createScaledBitmap): 65 ms/frame. Unacceptable.
OpenGL rendering of a textured quad in an orthographic projection. In this case I did not need to scale the frame; I just needed to prepare a 1024x512 texture (RGBA8888 in my case) containing the frame pixels, and then upload it to the GPU (gl.glTexImage2D). Result: ~220 ms/frame to render. Unacceptable. I did not expect glTexImage2D to be this slow on a Snapdragon.
That's all. I know there is a way to use a fragment shader to convert YUV pixels on the GPU, but we would still have the same glTexImage2D and ~200 ms just for the texture upload.
But this is not the end. ...my only friend the end... :) It is not a hopeless condition.
If you try RockPlayer, you will definitely wonder how they do that damn frame scaling so fast. I suppose they have really good experience with the ARM architecture. They most probably use avcodec_decode_video2 and then img_convert (as I did in the RP version), but then apply some tricks (depending on the ARM version) for scaling.
Maybe they also have some "magic" build configuration for ffmpeg that decreases decoding time, but the Android.mk they published is not THE Android.mk they use. I don't know.
So for now it looks like you cannot just build a simple JNI bridge for ffmpeg and have a real media player for the Android platform. You can only do this if you have a suitable video that you do not need to scale.
Any ideas?
I did compile ffmpeg on Android. From this point on, playing video is purely implementation-dependent, so there is no point in measuring latencies on things which can be highly optimized where needed (e.g. by not using the standard swscale). And yes, you can build a simple JNI bridge and use it in the NDK to perform ffmpeg calls, but that would already be player code.
In my experience, YUV-to-RGB conversion has always been a bottleneck. Therefore, using an OpenGL shader for this proved to give a significant boost.
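For reference, the conversion shader itself is short. A sketch using the BT.601 coefficients, embedded as a Java string for GLES20.glShaderSource(); the texture names and channel mapping are assumptions that depend on how you upload the Y and chroma planes:

    static final String YUV_TO_RGB_FRAGMENT_SHADER =
            "precision mediump float;\n" +
            "varying vec2 vTexCoord;\n" +
            "uniform sampler2D yTexture;\n" +   // luminance plane
            "uniform sampler2D uvTexture;\n" +  // interleaved chroma plane
            "void main() {\n" +
            // Assumes uv.x holds U and uv.y holds V; swap for NV21 vs NV12.
            "  float y = texture2D(yTexture, vTexCoord).r;\n" +
            "  vec2 uv = texture2D(uvTexture, vTexCoord).ra - vec2(0.5);\n" +
            "  gl_FragColor = vec4(y + 1.402 * uv.y,\n" +
            "                      y - 0.344 * uv.x - 0.714 * uv.y,\n" +
            "                      y + 1.772 * uv.x,\n" +
            "                      1.0);\n" +
            "}\n";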
I use http://writingminds.github.io/ffmpeg-android-java/ for my project. Some workarounds are needed for complex commands, but for simple commands the wrapper works very well for me.