I've been exploring 3D scanning and reconstruction using Google's project Tango.
So far, some apps I've tried like Project Tango Constructor and Voxxlr do a nice job over short time-spans (I would be happy to get recommendations for other potential scanning apps). The problem is, regardless of the app, if I run it long enough the scans accumulate so much drift that eventually everything is misaligned and ruined.
High chance of drift also occurs whenever I point the device over a featureless space like a blank wall, or when I point the cameras upward to scan ceilings. The device gets disoriented temporarily, thereby destroying the alignment of future scans. Whatever the case, getting the device to know where it is and what it is pointing at is a problem for me.
I know that some of the 3D scanning apps use Area Learning to some extent, since these apps ask me for permission to allow area learning upon startup of the app. I presume that this is to help localize the device and stabilize its pose (please correct me if this is inaccurate).
From the apps I've tried, I have never been given an option to load my own ADF. My understanding is that loading in a carefully learned feature-rich ADF helps to better anchor the device pose. Is there a reason for this dearth of apps that allow users to load in their homemade ADFs? Is it hard/impossible to do? Are current apps already optimally leveraging on area learning to localize, and is it the case that no self-recorded ADF I provide could ever do better?
I would appreciate any pointers/instruction on this topic - the method and efficacy of using ADFs in 3D scanning and reconstruction is not clearly documented. Ultimately, I'm looking for a way to use the Tango to make high quality 3D scans. If ADFs are not needed in the picture, that's fine. If the answer is that I'm endeavoring on an impossible task, I'd like to know as well.
If off-the-shelf solutions are not yet available, I am also willing to try to process the point cloud myself, though I have a feeling its probably much easier said than done.
Unfortunately, Tango doesn't have any application could do this at the moment, you will need to develop you own application for this. Just in case you wonder how to do this in code, here are the steps:
First, the learning mode of the application should be on. When we turn the learning mode on, the system will start to record an ADF, which allows the application to see a existing area that it has been to. For each point cloud we have saved, we should save the timestamp that associated with points as well.
After walking around and collecting the points, we will need to call the TangoService_saveAreaDescription function from the API. This step does some optimization on each key poses saved in the system. After done saving, we need to use the timestamp saved with point cloud to query to optimized pose again, to do that, we use the functionTangoService_getPoseAtTime. After this step, you will see the point cloud set to the right transformation, and the points will be overlapped together.
Just as a recap of the steps:
Turn on learning mode in Tango config.
Walk around, save point cloud along with the timestamp associated with the point cloud.
Call save TangoService_saveAreaDescription function.
After saving is done, call TangoServcie_getPoseAtTime to query the optimized pose based on the timestamp saved with the point cloud.
Related
I am trying to create an application like Snapchat that applies face filters while recording the video and saves it with the filter on.
I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me.
I want to provide face filters and apply them at the time of video recording on the face, and also save the video with the filter on the face.
Not The Easiest Question To Answer, But...
I'll give it a go, let's see how things turn out.
First of all, you should fill in some more details about the statements given in the question, especially what you're trying to say here:
I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me.
How did you approach the problem and what makes you say that it didn't help you?
In the Beginning...
First of all, let's get some needed basics out of the way to better understand your current situation and level in the prerequisite areas of knowledge:
Do you have any experience using Computer Vision & Machine Learning frameworks in other languages / in other apps?
Do you have the required math skills needed to use this technology?
As you're using Flutter, my guess is that cross-platform compatibility is high priority, have you done much Flutter programming before and what devices are your main targets?
So, What is required for creating a Snapchat-like filter for use in live video recording?
Well, quite a lot of work happens behind the scenes when you apply a filter to live video using any app that implements this in a decent way.
Snapchat uses in-house software that they've built up over years, using technology acquired from multiple multi-million dollar company acquisitions, often established companies that specialized in Computer Vision and AR technology, in addition to their own efforts, and has steadily grown to be quite impressive through the last 5-6 years in particular.
This isn't something you can throw together by yourself as an "all night'er" and expect good results. But there are tools available for easing the general learning curve, but these tools also require a firm understanding of the underlying concepts and technologies being used, and quite a lot of math.
The Technical Detour
OK, I know I may have went a bit overboard here, but this is fundamental building blocks, not so many are aware of the actual amount of computation needed for seemingly "basic" functionality, so please, TLDR; or not, this is fundamental stuff.
To create a good filter for live capture using a camera on something like an iPhone or Android device, you could, and most probably would, use AR as you mentioned you wanted to use in the end, but realize that this is a sub-set of the broad field of Computer Vision (CV) that uses various algorithms from Artificial Intelligence (AI) and Machine Learning (ML) for the main tasks of:
Facial Recognition
Given frames of video content from the live camera, define the area containing a human face (some also works with animals, but let's keep it as simple as possible) and output a rectangle suitable for use as a starting point in (x, y, for width & height).
The analysis phase alone will require a rather complex combination of algorithms / techniques from different parts of the AI universe, and this being video, not a single static image file, this must be continuously updated as the person / camera moves, so it must be done in close to real-time, in the millisecond range.
I believe different implementations combining HOG (Histogram of Oriented Gradients) from Computer Vision and SVMs (Support Vector Machines / Networks) from Machine Learning are still pretty common.
Detection of Facial Landmarks
This is what will define how well a certain effect / filter will adapt to different types of facial features and detect accessories like glasses, hats etc. Also called "facial keypoint detection", "facial feature detection" and other variants in different literature on the subject.
Head Pose Estimation
Once you know a few landmark points, you can also estimate the pose of the head.
This is an important part of effects like "face swap" to correctly re-align one face with another in an acceptable manner. A toolkit like OpenFace (Uses Python, OpenCV, OpenBLAS, Dlib ++) contains a lot of useful functionality, capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation, delivering pretty decent results.
The Compositing of Effects into the Video Frames
After the work with the above is done, the rest involves applying the target filter, dog ears, rabbit teeth, whatever to the video frames, using compositing techniques.
As this answer is starting to look more like an article, I'll leave it to you to go figure out if you want to know more of the details in this part of the process.
Hey, Dude. I asked for AR in Flutter, remember?
Yep.
I know, I can get a bit carried away.
Well, my point is that it takes a lot more than one would usually imagine creating something like you ask for.
BUT.
My best advice if Flutter is your tool of choice would be to learn how to use the Cloud-Based ML services from Google's Firebase suite of tools, Firebase Machine Learning and Google's MLKit.
Add to this some AR-specific plugins, like the ARCore Plugin, and I'm sure you'll be able to get the pieces together if you have the right background and attitude, plus a good amount of aptitude for learning.
Hope this wasn't digressing too far from your core question, but there are no shortcuts that I know of that cut more corners than what I've already mentioned.
You could absolutely use the flutter_camera_ml_vision plugin and it's face recognition which will give you positions for landmarks of a face, such as, nose, eyes etc. Then simply stack the CameraPreview with a CustomPaint(foregroundPainter: widget in which you draw your filters using the different landmarks as coordinates for i.e. glasses, beards or whatever you want at the correct position of the face in the camera preview.
Google ML Kit also has face recognition that produces landmarks and you could write your own flutter plugin for that.
You can capture frames from the live camera preview and reformat them and then it as a byte buffer to ML kit or ML vision. I am currently writing a flutter plugin for ML kit pose detection with live capture so if you have any specific question about that let me know.
You will then have to merge the two surfaces and save to file in appropriate format. This is unknown territory for me so I can not provide any details about this part.
I am working on one voice messaging application, I need to compare two voice like,
Register with app by record your voice
Sent voice message to
another user by record voice, but first need to compare this voice
to recorded voice in profile.
Its for security purpose and need to know recorded message is from specific user or not.
I tried :
Compare two sound in Android
http://www.dreamincode.net/forums/topic/274280-using-fft-to-compare-two-audio-files-and-then-realtime-comparison/
But not getting idea about voice Comparison.
Please share if anybody know about the same. Didn't find any sample to do this.
Since you indicated it's for security purpose, I'd like to first share a few things on voice biometry :-)
The problem with authenticating someone is that you'd need to be sure he was actually there saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.
Algorithms extracting voice features from a sample and later calculating the distance between a new sample and the first one can easily be fooled by a recording made up by an attacker.
Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But not completely impossible...
There are very good sounding softwares created for the music industry that will e.g. take some voice audio input and make it sound (intonation and time wise) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. You could further tweak the audio by hand using other voice editing softwares and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.
Depending on what the attacker wants your user to say, the process complexity could range from an hour to a day provided he has all the recording material he wants...
To fight against this type of attack, you need to detect the audio sample has been edited. The digital edition will leave unnatural traces. E.g. in the background noise surrounding the voice.
AFAICT, only the best commercial softwares achieve this level security check, but I can't tell how far they go in the detection of such edits.
From a pure security perspective, you'd also need to be sure the device was not compromised. So these voice verification check should happen server side and not on the phone itself.
Please note these are general considerations and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable but it helps raising the bar so fewer attackers could potentially steal it...
Another thing to consider is that biometry is by definition a statistical process and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you'll be able to lower one of them at the cost of raising the other.
Selecting an appropriate threshold will require you to have a fair amount of test data. Say 1 minute recording of at least 200 speakers to start getting a picture.
One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick which in some cases might render the voice unrecognizable. Also the emotional state might play a role: sadness or anger will yield different sounding voices...
And last but not least, the surrounding noise might pose a problem. Say the user enrolled while at home and later records a message while on the go in a busy city environment, the system might have troubles making sure it's actually the same person speaking. The signal to noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance of the microphone to the mouth, the ratio will be quite different. You'll get way better result when the user puts the phone close to its face like in a regular phone conversation than when the user looks at the screen while recording the message.
Voice variability and signal to noise ratio are probably the main reasons behind false negative results.
Hopefully, you now have a better understanding of the challenges awaiting you and I can start sharing some pointers for open source and commercial libraries.
AFAIK, there are no open source libraries that includes fraudster detection...
You may want to check Nuance Communication for state-of-the-art. There are plenty other vendors, just check with Google, I only mentioned Nuance because of it's reputation.
There is an OSS library called Alize (written in C++, under LGPL license) which uses an algorithm called MFCC (Mel Frequency Cepstrum Coefficients). MFCC is known to bring excellent results. Expect a steep learning curve as this software is aimed at researchers willing to improve the state-of-the-art on this topic and the vocabulary used is very specific.
I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers so you should be able to test it in a matter of minutes. The lib is very young and I first focused on it's API before improving the algorithms. The algorithm I use for the moment is called Linear Predictive Coding (LPC) and is known to bring good results (and I do have good results, provided recordings yield the same level of quality :-)). I'm currently in the process of releasing a new version including a likelihood coefficient in the match results. MFCC implementation is on the road map.
There is plenty of javadoc and the code should be very straightforward...
https://github.com/amaurycrickx/recognito
Recognito has a dependency on javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
Given many people need something for android, I'll do something about it in the near future instead of saying how one should modify the lib :-)
HTH
I have a macbook and I would like to use it to monitor a nest wireless security camera, including an approximately 1 tb archive of continuously updated video history (perhaps of motion detected clips only). This can be done by subscribing to a nest cloud account, but that can get expensive, especially for several cameras, so I'd rather do it myself.
Can anyone point me to open-source code that will handle this? If not, is there another type of camera that will allow me to do this over wifi?
As promised above, I will update the status of this issue.
After a significant amount of work and also significant progress, I was able to connect to the live nest camera feed programatically but was never able to actually record the live stream into short videos, although this was easy for my MacBook webcam. My belief is that Nest has engineered this feed such that camera owners cannot directly access it, leaving no option but to use their "Nest Aware" monthly service. I do not want to do this as I do not want to pay for it and because I want to create options that Nest Aware does not offer.
Searching the web, it appears that this kind of thing might be done by using another software package, "blue iris". I did not want to get this either as I am sure that flexibility would be sacrificed and also the camera would need to be made publicly shared(!)
So I am giving up on Nest, although I like the hardware.
I did find an alternative. I also had an Arlo Q camera and I tried that, using an open source API on GitHub:
https://github.com/jeffreydwalter/arlo
I was able to access the camera and save motion detected videos to my disk within an hour of finding the above link. So, if you want to do this type of thing, I recommend Arlo over Nest.
What is the best way to minimize recalibration time with the google device? Is it best to have landmarks in the environment to point to in the room? What kinds of tricks can I use to make the most optimal ADF file, in order to calibrate (and re-calibrate) with the fastest possible startup time?
In short, recorded an Area Description File (ADF) with more features.
Also, by re-calibrate, I think you meant relocalization, which is the process of getting Tango to find out the origin for the ADF and shift its position.
A detailed ADF will have more visual features for the system to "look at", so that it will be easier for the system to know where it is within the ADF. When you record an ADF, make sure the device observes all the visual features from all direction. For example, if you are recording a hallway, make sure you walk from both direction, so the device see it from both side. Recording an ADF is a tricky process, I always think it like a painting process, like you have a spray to paint the walls, once the "spray" covered all of the walls, then you have recorded all of the features to the ADF..
I am working on one voice messaging application, I need to compare two voice like,
Register with app by record your voice
Sent voice message to
another user by record voice, but first need to compare this voice
to recorded voice in profile.
Its for security purpose and need to know recorded message is from specific user or not.
I tried :
Compare two sound in Android
http://www.dreamincode.net/forums/topic/274280-using-fft-to-compare-two-audio-files-and-then-realtime-comparison/
But not getting idea about voice Comparison.
Please share if anybody know about the same. Didn't find any sample to do this.
Since you indicated it's for security purpose, I'd like to first share a few things on voice biometry :-)
The problem with authenticating someone is that you'd need to be sure he was actually there saying the things that were recorded... and that's a whole different level of complexity than merely comparing voice characteristics.
Algorithms extracting voice features from a sample and later calculating the distance between a new sample and the first one can easily be fooled by a recording made up by an attacker.
Since in your case there's a human recipient, creating a message made up of chopped words or sentences from random conversations is actually quite difficult and time consuming. But not completely impossible...
There are very good sounding softwares created for the music industry that will e.g. take some voice audio input and make it sound (intonation and time wise) like a second audio sample (a guide, made by the fraudster). Vocalign Pro by SynchroArts does this to help get perfect backing vocal tracks. You could further tweak the audio by hand using other voice editing softwares and achieve an acceptable level of quality that wouldn't be immediately detected by the recipient.
Depending on what the attacker wants your user to say, the process complexity could range from an hour to a day provided he has all the recording material he wants...
To fight against this type of attack, you need to detect the audio sample has been edited. The digital edition will leave unnatural traces. E.g. in the background noise surrounding the voice.
AFAICT, only the best commercial softwares achieve this level security check, but I can't tell how far they go in the detection of such edits.
From a pure security perspective, you'd also need to be sure the device was not compromised. So these voice verification check should happen server side and not on the phone itself.
Please note these are general considerations and it all depends on what sort of security measures you actually need for your use case. My car alarm is certainly not unbreakable but it helps raising the bar so fewer attackers could potentially steal it...
Another thing to consider is that biometry is by definition a statistical process and it will yield a certain percentage of false positives and false negatives. By changing the acceptance threshold, you'll be able to lower one of them at the cost of raising the other.
Selecting an appropriate threshold will require you to have a fair amount of test data. Say 1 minute recording of at least 200 speakers to start getting a picture.
One more thing I think you'll need to consider is the inherent variability of the human voice. People may be sick which in some cases might render the voice unrecognizable. Also the emotional state might play a role: sadness or anger will yield different sounding voices...
And last but not least, the surrounding noise might pose a problem. Say the user enrolled while at home and later records a message while on the go in a busy city environment, the system might have troubles making sure it's actually the same person speaking. The signal to noise ratio is definitely going to be one of your main issues. Small tip: depending on the distance of the microphone to the mouth, the ratio will be quite different. You'll get way better result when the user puts the phone close to its face like in a regular phone conversation than when the user looks at the screen while recording the message.
Voice variability and signal to noise ratio are probably the main reasons behind false negative results.
Hopefully, you now have a better understanding of the challenges awaiting you and I can start sharing some pointers for open source and commercial libraries.
AFAIK, there are no open source libraries that includes fraudster detection...
You may want to check Nuance Communication for state-of-the-art. There are plenty other vendors, just check with Google, I only mentioned Nuance because of it's reputation.
There is an OSS library called Alize (written in C++, under LGPL license) which uses an algorithm called MFCC (Mel Frequency Cepstrum Coefficients). MFCC is known to bring excellent results. Expect a steep learning curve as this software is aimed at researchers willing to improve the state-of-the-art on this topic and the vocabulary used is very specific.
I wrote an OSS library named Recognito (Java, Apache 2.0) aimed at regular developers so you should be able to test it in a matter of minutes. The lib is very young and I first focused on it's API before improving the algorithms. The algorithm I use for the moment is called Linear Predictive Coding (LPC) and is known to bring good results (and I do have good results, provided recordings yield the same level of quality :-)). I'm currently in the process of releasing a new version including a likelihood coefficient in the match results. MFCC implementation is on the road map.
There is plenty of javadoc and the code should be very straightforward...
https://github.com/amaurycrickx/recognito
Recognito has a dependency on javax.sound packages for audio file handling. You may want to check this post for what it takes to use it in Android: Voice matching in android
Given many people need something for android, I'll do something about it in the near future instead of saying how one should modify the lib :-)
HTH