I am trying to create an application like Snapchat that applies face filters while recording the video and saves it with the filter on.
I know there are packages like ARCore and flutter_camera_ml_vision, but these are not helping me.
I want to provide face filters, apply them to the face during video recording, and save the video with the filter applied.
Not The Easiest Question To Answer, But...
I'll give it a go; let's see how things turn out.
First of all, you should fill in some more details about the statements given in the question, especially what you're trying to say here:
I know there are packages like ARCore and flutter_camera_ml_vision, but these are not helping me.
How did you approach the problem and what makes you say that it didn't help you?
In the Beginning...
To start, let's get some basics out of the way to better understand your current situation and level in the prerequisite areas of knowledge:
Do you have any experience using Computer Vision & Machine Learning frameworks in other languages / in other apps?
Do you have the required math skills needed to use this technology?
As you're using Flutter, my guess is that cross-platform compatibility is a high priority. Have you done much Flutter programming before, and what devices are your main targets?
So, what is required for creating a Snapchat-like filter for use in live video recording?
Well, quite a lot of work happens behind the scenes when you apply a filter to live video using any app that implements this in a decent way.
Snapchat uses in-house software that it has built up over years, combining its own development with technology from several multi-million-dollar acquisitions, often of established companies that specialized in Computer Vision and AR. That software has grown steadily more impressive, particularly over the last 5-6 years.
This isn't something you can throw together by yourself in an all-nighter and expect good results. There are tools available that ease the general learning curve, but these tools also require a firm understanding of the underlying concepts and technologies, and quite a lot of math.
The Technical Detour
OK, I know I may have gone a bit overboard here, but these are fundamental building blocks. Not many people are aware of the actual amount of computation needed for seemingly "basic" functionality, so TL;DR or not, please bear with me: this is fundamental stuff.
To create a good filter for live capture using a camera on something like an iPhone or Android device, you could (and most probably would) use AR, as you mentioned wanting to in the end. But realize that AR is a subset of the broad field of Computer Vision (CV), which uses various algorithms from Artificial Intelligence (AI) and Machine Learning (ML) for these main tasks:
Face Detection
Given frames of video from the live camera, find the area containing a human face (some detectors also work with animals, but let's keep it as simple as possible) and output a bounding rectangle (x, y, width, height) to use as the starting point for the next steps.
The analysis phase alone requires a rather complex combination of algorithms and techniques from different parts of the AI universe. And because this is video rather than a single static image, the result must be updated continuously as the person or camera moves, in close to real time: at 30 frames per second, that is a budget of roughly 33 milliseconds per frame.
I believe different implementations combining HOG (Histogram of Oriented Gradients) from Computer Vision and SVMs (Support Vector Machines / Networks) from Machine Learning are still pretty common.
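To make the HOG part a bit more concrete, here is a minimal Java sketch of HOG's basic building block: a histogram of gradient orientations over one cell, with each pixel voting by its gradient magnitude. It is deliberately simplified (hard binning, no block normalization, no SVM stage), and the class and method names are hypothetical, not taken from any library.

```java
// Minimal sketch of HOG's core step: one cell's orientation histogram.
// Simplifications (assumptions): grayscale input as double[][], centred
// [-1, 0, 1] gradients, unsigned orientations in [0, 180), hard binning
// (no bilinear interpolation) and no block normalization.
final class HogCell {
    static double[] cellHistogram(double[][] gray, int top, int left, int cellSize, int bins) {
        double[] hist = new double[bins];
        double binWidth = 180.0 / bins;
        int h = gray.length, w = gray[0].length;
        for (int y = top; y < top + cellSize; y++) {
            for (int x = left; x < left + cellSize; x++) {
                // Clamp at the borders so edge pixels still get a gradient.
                double gx = gray[y][Math.min(x + 1, w - 1)] - gray[y][Math.max(x - 1, 0)];
                double gy = gray[Math.min(y + 1, h - 1)][x] - gray[Math.max(y - 1, 0)][x];
                double magnitude = Math.hypot(gx, gy);
                // Unsigned orientation: fold (-180, 180] into [0, 180).
                double angle = Math.toDegrees(Math.atan2(gy, gx));
                if (angle < 0) angle += 180.0;
                if (angle >= 180.0) angle -= 180.0;
                int bin = Math.min((int) (angle / binWidth), bins - 1);
                hist[bin] += magnitude; // vote weighted by gradient strength
            }
        }
        return hist;
    }
}
```

A full detector computes such histograms for every cell, normalizes them over overlapping blocks, and feeds the concatenated vector to an SVM trained on face and non-face windows.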
Detection of Facial Landmarks
This step determines how well a given effect or filter adapts to different types of facial features and to accessories like glasses and hats. It is also called "facial keypoint detection", "facial feature detection" and other variants, depending on the literature.
Head Pose Estimation
Once you know a few landmark points, you can also estimate the pose of the head.
This is an important part of effects like "face swap", where one face must be re-aligned with another convincingly. A toolkit like OpenFace (which uses Python, OpenCV, OpenBLAS, dlib and more) offers a lot of useful functionality, including facial landmark detection, head pose estimation, facial action unit recognition and eye-gaze estimation, with pretty decent results.
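Full head pose (yaw, pitch and roll) is usually recovered by fitting a generic 3D face model to the 2D landmarks, for example with OpenCV's solvePnP. The simplest component, roll (in-plane tilt), falls straight out of two eye landmarks, and that alone is often enough to rotate a 2D overlay. A minimal Java sketch, assuming image-pixel coordinates with y pointing down; the class and method names are hypothetical.

```java
// Minimal sketch: head roll (in-plane tilt) and a scale cue from two eye
// landmarks. Assumption: coordinates are image pixels with y pointing down.
// Yaw and pitch are NOT covered; they need a 3D face model and a PnP solve.
final class HeadRoll {
    /** Roll in degrees: 0 when the eyes are level, positive when the eye on the image's right sits lower. */
    static double rollDegrees(double imageLeftEyeX, double imageLeftEyeY,
                              double imageRightEyeX, double imageRightEyeY) {
        return Math.toDegrees(Math.atan2(imageRightEyeY - imageLeftEyeY,
                                         imageRightEyeX - imageLeftEyeX));
    }

    /** Distance between the eyes, handy for scaling an overlay to the face. */
    static double eyeDistance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }
}
```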
The Compositing of Effects into the Video Frames
Once the steps above are done, what remains is applying the target filter (dog ears, rabbit teeth, whatever) to the video frames using compositing techniques.
As this answer is starting to look more like an article, I'll leave the details of this part of the process for you to explore if you want to know more.
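The most basic compositing operation is per-pixel alpha blending, the Porter-Duff "source over" operator: out = overlay * a + frame * (1 - a). A minimal Java sketch, assuming non-premultiplied 8-bit ARGB pixels packed into an int (the layout android.graphics.Color uses) and an opaque camera frame; the class name is made up for illustration.

```java
// Minimal sketch: Porter-Duff "source over" for one pixel.
// Assumptions: non-premultiplied 8-bit ARGB packed into an int, and an opaque
// destination (the camera frame), so the result is opaque as well.
final class Composite {
    static int over(int overlayArgb, int frameArgb) {
        int a = (overlayArgb >>> 24) & 0xFF; // overlay alpha, 0..255
        int r = blend((overlayArgb >> 16) & 0xFF, (frameArgb >> 16) & 0xFF, a);
        int g = blend((overlayArgb >> 8) & 0xFF, (frameArgb >> 8) & 0xFF, a);
        int b = blend(overlayArgb & 0xFF, frameArgb & 0xFF, a);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    // out = src * a + dst * (1 - a) in 8-bit fixed point; +127 rounds to nearest.
    private static int blend(int src, int dst, int alpha) {
        return (src * alpha + dst * (255 - alpha) + 127) / 255;
    }
}
```

A production pipeline would typically do this on the GPU rather than pixel by pixel on the CPU, but the arithmetic is the same.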
Hey, Dude. I asked for AR in Flutter, remember?
Yep.
I know, I can get a bit carried away.
Well, my point is that creating something like what you're asking for takes a lot more than one would usually imagine.
BUT.
My best advice, if Flutter is your tool of choice, would be to learn how to use Google's machine learning services: Firebase Machine Learning from the Firebase suite of tools, and Google's ML Kit.
Add to this some AR-specific plugins, like the ARCore plugin, and I'm sure you'll be able to put the pieces together if you have the right background and attitude, plus a good amount of aptitude for learning.
I hope this didn't digress too far from your core question, but I don't know of any shortcuts that cut more corners than the ones I've already mentioned.
You could absolutely use the flutter_camera_ml_vision plugin and its face detection, which gives you the positions of facial landmarks such as the nose, eyes, etc. Then simply stack the CameraPreview with a CustomPaint widget (using its foregroundPainter parameter), in which you draw your filters using the landmarks as coordinates for, e.g., glasses, beards or whatever you want at the correct position on the face in the camera preview.
Google ML Kit also has face detection that produces landmarks, and you could write your own Flutter plugin for that.
You can capture frames from the live camera preview, reformat them, and then pass them as a byte buffer to ML Kit or ML Vision. I am currently writing a Flutter plugin for ML Kit pose detection with live capture, so if you have any specific questions about that, let me know.
You will then have to merge the two surfaces and save the result to a file in an appropriate format. This is unknown territory for me, so I cannot provide any details about this part.
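What usually trips people up is that landmarks come back in the camera image's coordinate space, not the preview widget's. The arithmetic the painter needs is sketched below in Java to keep all examples on this page in one language; the same math carries over to a Dart CustomPainter. It assumes the preview shows the whole (already rotated) image stretched without cropping and that the front-camera preview is mirrored; widthPerEyeDistance and aspect are hypothetical tuning parameters for your glasses image.

```java
// Minimal sketch of the geometry a foreground painter needs. Assumptions:
// landmarks are in the (already rotated) image's pixel space, the preview
// shows the whole image stretched to previewW x previewH (no cropping), and
// the front-camera preview is mirrored horizontally.
final class FilterGeometry {
    /** Maps an image-space point to preview space; returns {x, y}. */
    static double[] toPreview(double x, double y, double imageW, double imageH,
                              double previewW, double previewH, boolean mirror) {
        double px = x * previewW / imageW;
        double py = y * previewH / imageH;
        if (mirror) px = previewW - px; // flip for a mirrored selfie preview
        return new double[] {px, py};
    }

    /** Rectangle {left, top, width, height} for a glasses image centred between the eyes. */
    static double[] glassesRect(double[] eyeA, double[] eyeB,
                                double widthPerEyeDistance, double aspect) {
        double cx = (eyeA[0] + eyeB[0]) / 2, cy = (eyeA[1] + eyeB[1]) / 2;
        double width = Math.hypot(eyeB[0] - eyeA[0], eyeB[1] - eyeA[1]) * widthPerEyeDistance;
        double height = width / aspect; // aspect = image width / image height
        return new double[] {cx - width / 2, cy - height / 2, width, height};
    }
}
```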
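One common reformatting step on Android is turning the camera's YUV_420_888 frame (three planes, each with its own row and pixel strides) into a single NV21 byte array, one of the formats ML Kit's InputImage.fromByteArray accepts. A minimal Java sketch; plane bytes and strides are passed as plain values so it runs off-device (on a device they would come from Image.getPlanes()), even frame dimensions are assumed, and the class name is hypothetical.

```java
// Minimal sketch: pack YUV_420_888 planes into one NV21 array
// (full-resolution Y plane followed by interleaved V/U samples).
// Assumptions: even width and height; U and V share row/pixel strides.
final class Nv21 {
    static byte[] fromYuv420(byte[] y, int yRowStride,
                             byte[] u, byte[] v, int uvRowStride, int uvPixelStride,
                             int width, int height) {
        byte[] out = new byte[width * height * 3 / 2];
        // Copy luma row by row, skipping any row padding.
        for (int row = 0; row < height; row++) {
            System.arraycopy(y, row * yRowStride, out, row * width, width);
        }
        // One V,U pair per 2x2 block of pixels, in NV21 order (V first).
        int pos = width * height;
        for (int row = 0; row < height / 2; row++) {
            for (int col = 0; col < width / 2; col++) {
                int idx = row * uvRowStride + col * uvPixelStride;
                out[pos++] = v[idx];
                out[pos++] = u[idx];
            }
        }
        return out;
    }
}
```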
I'm thinking of starting an Android project that records audio signals and does some processing to denoise them. My question is: as many (nearly all) denoising algorithms involve an FFT, is it possible for me to write a real-time program? By real-time I mean that the program records and processes at the same time, so I don't have to wait for the processing after I finish recording.
I have made a sample project that applies a Fourier transform to the audio signal and implements a simple algorithm called sub-spectrum. But I found it difficult to implement this algorithm in real time: after I press the 'stop' button, it takes a while to do the processing and save the file. (I'm also wondering how commercial recorder apps record sound and save it at the same time.) I know that my FFT may not be the fastest, but I'd like to know whether I could achieve 'real-time' if I fully optimized it or used the fastest FFT code. Thanks a lot!
It sounds like you are talking about broadband denoising, so I'll address my answer to that. There are other kinds of denoising, from simple filtering to adaptive filtering to dynamic range expansion, and probably others.
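Assuming "sub-spectrum" refers to the common magnitude spectral subtraction method, note that the subtraction itself is cheap and only looks at one frame at a time, which is what makes frame-by-frame (real-time) processing possible in principle. A minimal Java sketch of the per-frame step; the noise estimate, the floor parameter and the names are assumptions, and windowing, the FFT and overlap-add resynthesis would happen around it.

```java
// Minimal sketch of magnitude spectral subtraction on ONE frame's spectrum.
// Assumptions: re/im hold the FFT of a windowed frame, noiseMag holds an
// average noise magnitude per bin (e.g. measured during a silent stretch),
// and floor (a fraction of the original magnitude) limits "musical noise".
// The phase of each bin is left unchanged.
final class SpectralSubtraction {
    static void apply(double[] re, double[] im, double[] noiseMag, double floor) {
        for (int k = 0; k < re.length; k++) {
            double mag = Math.hypot(re[k], im[k]);
            if (mag == 0.0) continue; // nothing to scale
            double cleaned = Math.max(mag - noiseMag[k], floor * mag);
            double gain = cleaned / mag; // scaling re and im keeps the phase
            re[k] *= gain;
            im[k] *= gain;
        }
    }
}
```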
I don't think anyone can answer this question with a simple yes or no. You will have to try it and see what can be done.
First off, there are a variety of FFT implementations of varying speed you could try, including FFTW. Some are faster than others, but at the end of the day they all deliver comparable results.
This is one place where native C/C++ will outperform Java/Dalvik code, because it can truly take advantage of vector (SIMD) instructions. For that to work, you'll probably need to write some assembler, or find some code that is already optimized for Android. I'm not aware of an Android-optimized FFT, but I'm sure one exists.
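For reference, this is roughly what a plain textbook FFT looks like: an iterative radix-2 Cooley-Tukey transform in Java. It is a minimal sketch for illustration (power-of-two sizes only, twiddle factors computed by recurrence) and will be slower than tuned libraries like FFTW; the class name is hypothetical.

```java
// Minimal sketch: in-place iterative radix-2 Cooley-Tukey FFT.
// Assumption: n = re.length (== im.length) is a power of two.
final class Fft {
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages for sub-transform lengths 2, 4, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int start = 0; start < n; start += len) {
                double curRe = 1, curIm = 0; // twiddle factor w^k
                for (int k = 0; k < len / 2; k++) {
                    int a = start + k, b = a + len / 2;
                    double tRe = re[b] * curRe - im[b] * curIm;
                    double tIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - tRe; im[b] = im[a] - tIm;
                    re[a] += tRe; im[a] += tIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
    }
}
```

As a rough feel for the budget: an n-point FFT costs on the order of n log2 n operations (about 10,000 for n = 1024), and at 44.1 kHz a new 1024-sample frame arrives about every 23 ms.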
The real performance win will come from how you structure your overall denoising algorithm. All denoising I'm familiar with is extremely processor intensive and probably won't work on a phone in real-time, although it might on a tablet. That's just a(n educated) guess, though.
I made a test application in Delphi that beeps Morse code using the Windows API Beep function. Then I made an Android application that stores this Morse code in a WAV file. Now I want the Android application to decode the Morse code. Are there any tutorials on sound processing, or can somebody post some simple code (I think there's no simplicity here) as an example? Or maybe the steps I need to take to get it working?
I also downloaded the JTransforms and jfftw libraries but don't really know where to start.
Regards,
evilone
An FFT is overkill for this: you can just use a simple Goertzel filter to isolate the Morse code tone from background noise, then decode the output of that.
I think an older issue of QST magazine had an article on DSP for Morse/CW decoding several years back. You might want to search their archives.
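A Goertzel filter is essentially one DFT bin computed with a second-order recurrence, so it costs about one multiply-add per sample for the single frequency you care about. A minimal Java sketch; the target frequency would be whatever was passed to Beep in the Delphi program, and the class name is hypothetical.

```java
// Minimal sketch of the Goertzel algorithm: squared magnitude of the DFT bin
// nearest targetHz over one block of samples.
final class Goertzel {
    static double power(double[] samples, double targetHz, double sampleRateHz) {
        int n = samples.length;
        // Bin index closest to the target frequency (bin width = fs / n).
        int k = (int) Math.round(n * targetHz / sampleRateHz);
        double coeff = 2 * Math.cos(2 * Math.PI * k / n);
        double s1 = 0, s2 = 0;
        for (double x : samples) {
            double s0 = x + coeff * s1 - s2; // second-order resonator
            s2 = s1;
            s1 = s0;
        }
        // |X[k]|^2 from the last two resonator states.
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }
}
```

Run over short consecutive blocks (for example 10 ms each) and compared against a threshold, this gives a tone-on/tone-off decision per block. Shorter blocks improve timing resolution at the cost of frequency selectivity, since the bin width is the sample rate divided by the block length.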
Basically, you need DSP code to determine whether or not a tone is present at any given point in time, and an estimate of the onset and off-time of each tone. Then scale the duration of each tone and the gap times between the tones for the expected code speed, and compare against a table of timings for each Morse code letter to estimate the probability of each or any letter being present.
In the simplest case, you might have a dot-dash-space decision tree. In severe noise and fading plus highly personalized fist/timing you might need some sophisticated statistical and/or adaptive audio pattern matching techniques for decent results.
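The simplest case can be sketched as a run-length decoder: collapse the per-block tone decisions into runs, take the shortest tone as one time unit, and classify using the standard Morse ratios (dot 1 unit, dash 3, gap within a letter 1, between letters 3, between words 7), with thresholds roughly halfway between them. A minimal Java sketch assuming steady timing, little noise and at least one dot in the input; it only knows A to Z, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a dot-dash-space decoder working on per-block tone
// decisions (e.g. thresholded Goertzel power). Letters A-Z only.
final class MorseDecoder {
    private static final String[] CODES = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
        "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
        "..-", "...-", ".--", "-..-", "-.--", "--.."
    };
    private static final Map<String, Character> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < CODES.length; i++) TABLE.put(CODES[i], (char) ('A' + i));
    }

    static String decode(boolean[] toneOn) {
        // Collapse the block sequence into runs of {state (1 = tone), length}.
        List<int[]> runs = new ArrayList<>();
        for (boolean on : toneOn) {
            int state = on ? 1 : 0;
            if (!runs.isEmpty() && runs.get(runs.size() - 1)[0] == state) runs.get(runs.size() - 1)[1]++;
            else runs.add(new int[] {state, 1});
        }
        // Time unit = shortest tone, assumed to be a dot.
        int unit = Integer.MAX_VALUE;
        for (int[] r : runs) if (r[0] == 1) unit = Math.min(unit, r[1]);
        if (unit == Integer.MAX_VALUE) return "";

        StringBuilder text = new StringBuilder();
        StringBuilder symbol = new StringBuilder();
        for (int[] r : runs) {
            double units = (double) r[1] / unit;
            if (r[0] == 1) {
                symbol.append(units < 2 ? '.' : '-'); // dot = 1 unit, dash = 3
            } else if (units >= 2) {                  // letter gap = 3, word gap = 7
                flush(symbol, text);
                if (units >= 5) text.append(' ');
            }                                         // else: 1-unit gap inside a letter
        }
        flush(symbol, text);
        return text.toString().trim();
    }

    // Look up the collected dots/dashes; unknown patterns become '?'.
    private static void flush(StringBuilder symbol, StringBuilder text) {
        if (symbol.length() == 0) return;
        text.append(TABLE.getOrDefault(symbol.toString(), '?'));
        symbol.setLength(0);
    }
}
```

In heavy noise or with an irregular fist, the fixed thresholds here are exactly what would need to become statistical or adaptive.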
I am developing an android application that provides instruction on various topics. Within my application, I would like to have a "talking head" or even a full-body person that talks with moving lips synchronized (or at least close) to the spoken output. Ideally, I would want the head/body to move while the speech is occurring also, with eyes blinking, arms (if it has a body) moving, etc. I know how to do all the speech parts, but I've never developed animation before. I'm using Eclipse. I really am only looking for advice to get me started down the right path. Is there a framework, add-on purchase, etc. that will make my life easier? There has to be a better method than animating/rotating open/close mouth images during the speech output. I do NOT want jib-jab type of animation! Thank you in advance for any starting advice you can give me!
Xface may be a solution. You need SMIL scripts for the audio.
I noticed that Flash allows you to insert cues into a video file (FLV). Is something like this possible on Android? I have a video that plays locally in my Android app, and I would like to insert cues into it that give me callbacks when a certain portion of the video has been reached. If this is not possible, are there any other ways to do something similar? I need to be pretty precise about where each cue is located.
Thanks
Note:
I just found this same question on Stack Overflow. Can anyone verify that this is still the case, i.e. that it is not possible except by continually polling the video? I did know of that approach, but it's not the most accurate one if you need to be precise and stitch dynamic pieces of video together seamlessly.
Android VideoView - Detect point of time in video
I'm working on this as well, using a kind of cue/action script. For tutorials and instruction videos, I need to keep track of the current position to serve, for example, questions and navigation menus appropriate for that point in time. That's easy when it's sufficient to act in response to user input; otherwise, firing up a thread to poll at some decent interval is the way to go. Accuracy might be acceptable, and it can be calibrated by sensing the actual position.
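The polling approach can be kept fairly precise by firing every cue whose position lies between the previous poll and the current one, so nothing is skipped even if the poll interval jitters. A minimal Java sketch with hypothetical names; on Android the position would come from VideoView.getCurrentPosition() (milliseconds), polled from a Handler or thread at an interval you would have to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal sketch of poll-based video cues. poll(positionMs) returns every cue
// crossed since the previous poll; a backward seek re-arms cues after the new
// position. One cue per millisecond position (TreeMap keys are unique).
final class CueTracker {
    private final TreeMap<Long, String> cues = new TreeMap<>();
    private long lastPositionMs = -1; // so a cue at 0 ms fires on the first poll

    void addCue(long positionMs, String name) {
        cues.put(positionMs, name);
    }

    List<String> poll(long positionMs) {
        List<String> fired = new ArrayList<>();
        if (positionMs > lastPositionMs) {
            // Cues in (lastPositionMs, positionMs] were reached since the last poll.
            fired.addAll(cues.subMap(lastPositionMs, false, positionMs, true).values());
        }
        lastPositionMs = positionMs;
        return fired;
    }
}
```

Cue timing is then accurate to within roughly one polling interval, plus whatever granularity the player reports its position with.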