I need some help with 3D-CAD model tracking.
I have to develop an Android application that tracks predefined CAD models such as parts of a car. For example, you stand in front of the car with the hood open and the application tells you what to do next (perhaps refill some fluid), with visual support like arrows or circles. The following YouTube video shows what I have in mind: https://www.youtube.com/watch?v=4LE_IocFnL0
I have tried this with the Metaio SDK, but when I convert the CAD models to edge and surface models with the MetaioCreator, no part of the model is recognizable any more. I think this is because my models are very detailed (~400,000 polygons each). For testing I also reduced the models to a much lower polygon count (~7,000 polygons), but when I create the edge and surface models and load them in my test application, my test device (Samsung Galaxy Tab S) lags extremely and it is not possible to track the model.
So I would like to ask whether this is the right approach, because I don't think it is. Perhaps you could advise me on which tracking method I should use.
So far I have used the Metaio SDK hybrid 3D tracking, which is a mix of an edge-based and a feature-based tracking method. Is there another method that is better suited to my goal? I've read about OpenCV (which is available for Android too), but I don't know whether it is a good choice for 3D CAD tracking. Does anyone have experience with this kind of augmented reality?
I have the following requirements:
- the framework / toolkit must run on Android
- the tracking should be robust to changing lighting conditions
- I have to track many different CAD models (the user selects the one that should be tracked)
- the selected CAD model can appear more than once in the current viewport, and every instance must be selectable for further rendering operations
- the performance must be good when running on a wearable device
In addition, when a group of switches is tracked, is it possible to detect when the user has pressed the marked switch? And if I know the exact relative positions of all my CAD models, is it possible to join them together? My intent is that the user tracks model A and, when another trackable is selected, the device knows its approximate position from the position of model A and the relative offset to the new model.
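The last idea (deriving the approximate position of a second model from the tracked pose of model A) comes down to chaining two rigid transforms. Below is a minimal sketch in Python, assuming poses are available as 4x4 homogeneous matrices; the names `camera_from_a`, `a_from_b` and `predict_pose` are made up for illustration and are not part of any tracking SDK.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    """Build a pure-translation 4x4 homogeneous transform."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def predict_pose(camera_from_a, a_from_b):
    """Pose of B in camera space = (pose of A in camera space) * (B relative to A)."""
    return mat_mul(camera_from_a, a_from_b)


# Assumed example values: A is tracked 2 m in front of the camera,
# and the CAD data says B sits 0.5 m to the right of A.
camera_from_a = translation(0.0, 0.0, 2.0)
a_from_b = translation(0.5, 0.0, 0.0)

camera_from_b = predict_pose(camera_from_a, a_from_b)
position_b = [row[3] for row in camera_from_b[:3]]
print(position_b)
```

The predicted pose would only be a starting guess for the tracker of model B; it is exact only as long as the real parts match the relative placement in the CAD data.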
Hope for responses,
lost1994
PS: If something is ambiguous or I didn't explain it clearly, please don't be afraid to ask.
I know my answer is maybe a bit late, but anyway:
In general, 3D edge-based tracking is a good choice here. You are using the hybrid version, which is good if your AR world won't change (meaning your car stays in a fixed position and won't be moved).
The reason for your lag is that you still have 7,000 polygons. That's too much for mobile. Reduce it to 3,000 or fewer (3,000 is fine on an iPhone 6).
Note: Metaio has closed its doors (it was bought by Apple).
Related
I am trying to create an application like Snapchat that applies face filters while recording the video and saves it with the filter on.
I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me.
I want to provide face filters and apply them at the time of video recording on the face, and also save the video with the filter on the face.
Not The Easiest Question To Answer, But...
I'll give it a go, let's see how things turn out.
First of all, you should fill in some more details about the statements given in the question, especially what you're trying to say here:
"I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me."
How did you approach the problem and what makes you say that it didn't help you?
In the Beginning...
First of all, let's get some needed basics out of the way to better understand your current situation and level in the prerequisite areas of knowledge:
Do you have any experience using Computer Vision & Machine Learning frameworks in other languages / in other apps?
Do you have the required math skills needed to use this technology?
As you're using Flutter, my guess is that cross-platform compatibility is a high priority. Have you done much Flutter programming before, and what devices are your main targets?
So, What is required for creating a Snapchat-like filter for use in live video recording?
Well, quite a lot of work happens behind the scenes when you apply a filter to live video using any app that implements this in a decent way.
Snapchat uses in-house software built up over years. It draws on technology from multiple multi-million dollar acquisitions, often of established companies that specialized in Computer Vision and AR, in addition to Snapchat's own efforts, and it has grown to be quite impressive over the last 5-6 years in particular.
This isn't something you can throw together by yourself in an all-nighter and expect good results. There are tools available that ease the general learning curve, but they still require a firm understanding of the underlying concepts and technologies, and quite a lot of math.
The Technical Detour
OK, I know I may have gone a bit overboard here, but these are fundamental building blocks, and not many people are aware of the amount of computation needed for seemingly "basic" functionality. So please, TL;DR or not, this is fundamental stuff.
To create a good filter for live capture using the camera on something like an iPhone or Android device, you could, and most probably would, use AR, as you mentioned. But realize that AR is a subset of the broad field of Computer Vision (CV), which uses various algorithms from Artificial Intelligence (AI) and Machine Learning (ML) for these main tasks:
Face Detection
Given frames of video content from the live camera, find the area containing a human face (some implementations also work with animals, but let's keep it as simple as possible) and output a rectangle suitable for use as a starting point, in the form (x, y, width, height).
The analysis phase alone will require a rather complex combination of algorithms / techniques from different parts of the AI universe, and this being video, not a single static image file, this must be continuously updated as the person / camera moves, so it must be done in close to real-time, in the millisecond range.
I believe different implementations combining HOG (Histogram of Oriented Gradients) from Computer Vision and SVMs (Support Vector Machines / Networks) from Machine Learning are still pretty common.
Detection of Facial Landmarks
This is what will define how well a certain effect / filter will adapt to different types of facial features and detect accessories like glasses, hats etc. Also called "facial keypoint detection", "facial feature detection" and other variants in different literature on the subject.
Head Pose Estimation
Once you know a few landmark points, you can also estimate the pose of the head.
This is an important part of effects like "face swap" to correctly re-align one face with another in an acceptable manner. A toolkit like OpenFace (Uses Python, OpenCV, OpenBLAS, Dlib ++) contains a lot of useful functionality, capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation, delivering pretty decent results.
The Compositing of Effects into the Video Frames
After the work above is done, the rest involves applying the target filter (dog ears, rabbit teeth, whatever) to the video frames using compositing techniques.
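As a rough illustration of the compositing step, here is a minimal sketch of standard "over" alpha blending in plain Python. It assumes frames are rows of (r, g, b) tuples and the overlay is rows of (r, g, b, a) tuples with values 0-255; the function names are made up for this example, and a real app would do this on the GPU or through an image library rather than pixel by pixel.

```python
def blend_pixel(frame_rgb, overlay_rgba):
    """Blend one RGBA overlay pixel over one RGB frame pixel (values 0-255)."""
    r, g, b, a = overlay_rgba
    alpha = a / 255.0
    return tuple(round(o * alpha + f * (1.0 - alpha))
                 for o, f in zip((r, g, b), frame_rgb))


def composite(frame, overlay, left, top):
    """Return a copy of frame with overlay blended in at (left, top).

    Overlay pixels that fall outside the frame are skipped.
    """
    out = [list(row) for row in frame]
    for dy, overlay_row in enumerate(overlay):
        for dx, pixel in enumerate(overlay_row):
            x, y = left + dx, top + dy
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = blend_pixel(out[y][x], pixel)
    return out


# A 2x2 black frame with one opaque red overlay pixel placed at (1, 1).
frame = [[(0, 0, 0), (0, 0, 0)],
         [(0, 0, 0), (0, 0, 0)]]
overlay = [[(255, 0, 0, 255)]]
result = composite(frame, overlay, 1, 1)
```

A fully transparent overlay pixel (alpha 0) leaves the frame untouched, which is what lets a filter image with a transparent background sit on top of the face.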
As this answer is starting to look more like an article, I'll leave it to you to go figure out if you want to know more of the details in this part of the process.
Hey, Dude. I asked for AR in Flutter, remember?
Yep.
I know, I can get a bit carried away.
Well, my point is that it takes a lot more than one would usually imagine to create something like what you're asking for.
BUT.
My best advice if Flutter is your tool of choice would be to learn how to use the Cloud-Based ML services from Google's Firebase suite of tools, Firebase Machine Learning and Google's MLKit.
Add to this some AR-specific plugins, like the ARCore Plugin, and I'm sure you'll be able to get the pieces together if you have the right background and attitude, plus a good amount of aptitude for learning.
Hope this wasn't digressing too far from your core question, but there are no shortcuts that I know of that cut more corners than what I've already mentioned.
You could absolutely use the flutter_camera_ml_vision plugin and its face detection, which gives you the positions of facial landmarks such as the nose, eyes etc. Then simply stack the CameraPreview with a CustomPaint(foregroundPainter: ...) widget in which you draw your filters, using the landmarks as coordinates to place e.g. glasses, beards or whatever you want at the correct position on the face in the camera preview.
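The geometry of placing a filter from landmarks is independent of the framework. Here is a minimal sketch in Python, assuming the two eye landmarks arrive as (x, y) pixel coordinates in preview space; `glasses_transform` and `asset_eye_distance` are hypothetical names, not part of any plugin. In a Flutter painter the same arithmetic would feed the canvas translate, scale and rotate calls.

```python
import math


def glasses_transform(left_eye, right_eye, asset_eye_distance):
    """Compute where and how to draw a glasses image.

    left_eye, right_eye: (x, y) landmark positions in preview pixels.
    asset_eye_distance: distance in pixels between the lens centers
        in the unscaled glasses image (an assumed property of the asset).
    Returns (center, scale, angle_in_radians).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    scale = math.hypot(dx, dy) / asset_eye_distance
    angle = math.atan2(dy, dx)  # in-plane head roll
    return center, scale, angle


center, scale, angle = glasses_transform((100, 200), (200, 200), 50)
print(center, scale, angle)
```

This only handles in-plane rotation; turning the head sideways changes the apparent eye distance, which is where proper head pose estimation would come in.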
Google ML Kit also has face detection that produces landmarks, and you could write your own Flutter plugin for that.
You can capture frames from the live camera preview, reformat them and then pass them as a byte buffer to ML Kit or ML Vision. I am currently writing a Flutter plugin for ML Kit pose detection with live capture, so if you have any specific questions about that, let me know.
You will then have to merge the two surfaces and save the result to a file in an appropriate format. This is unknown territory for me, so I cannot provide any details about this part.
I want to implement 3D Touch on Android, just like 3D Touch on the iPhone 6S and 6S Plus.
I looked around on Google and couldn't find any consistent material.
I could only find an example in Lua, and I am not sure yet whether it's exactly what I am looking for.
So I thought that if there are no libraries out there, I should implement the algorithm from scratch, or maybe create a library for it.
But I don't know where to start. Do you have any clues?
I believe you could implement something similar using MotionEvent. It has a getPressure() method that is supposed to return a value between 0 and 1 representing the amount of pressure on the screen. You could then do something different depending on the amount of pressure detected.
Note that some devices do not support this feature, and some (notably the Samsung Galaxy S3) will return inconsistent values.
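Because the readings can be noisy or inconsistent, a single threshold tends to flicker. Here is a minimal sketch of a "deep press" detector with hysteresis, written in Python to show only the logic. It assumes pressure values in the 0 to 1 range are fed in from the touch events; the class name and both thresholds are assumptions that would need calibration per device.

```python
class DeepPressDetector:
    """Report the start of a deep press, using two thresholds (hysteresis)."""

    def __init__(self, press_at=0.7, release_at=0.5):
        # Assumed thresholds; real devices report pressure very differently.
        self.press_at = press_at
        self.release_at = release_at
        self.active = False

    def update(self, pressure):
        """Feed one pressure reading; return True only when a deep press starts."""
        if not self.active and pressure >= self.press_at:
            self.active = True
            return True
        if self.active and pressure <= self.release_at:
            self.active = False
        return False


detector = DeepPressDetector()
readings = [0.2, 0.75, 0.65, 0.72, 0.4, 0.8]
events = [detector.update(p) for p in readings]
print(events)
```

The gap between the two thresholds means a reading that wobbles around 0.7 triggers once instead of repeatedly.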
I don't think it is possible on currently available Android devices. 3D touch is hardware technology embedded in displays in iPhones. I don't think you can implement this just writing some code in your Android application.
Short answer - no.
You need to wait for Google to actually copy the technology if it proves to be useful. But I doubt it'll happen in the near future, because Android is all about accessibility and these screens will be quite expensive.
Long answer - Android is open source. If you are making something internal then go on, it'll allow you to do that with some modifications. Build a device, put in your modified code, create your own application that takes advantage of the feature and be happy to announce it to the world.
-- Background:
We are working on a device called Run-n-Read, which tracks a user's head movements and translates them into the appropriate text movement on the screen. The purpose is to help a person read while running on a treadmill or riding in a moving vehicle. You can watch a short video at http://weartrons.com.
We have built a small device containing an accelerometer, a micro-controller and Bluetooth, which sends the head location to the tablet in real time every ~17 ms to match the display's 60 fps. We used the Processing IDE to create a basic app with downloaded book pages to test the prototype.
-- PROBLEM:
We would like to run our app in the background and dynamically change the display coordinates of any other app contents on the screen, whether it's an eBook or twitter etc. Basically our algorithms are running on our external device and sending the display coordinates (in pixels to move up-down left-right) at about 60 times per second. We would like the Android display origin to move by that many pixels during every frame rendering.
I am an electronics engineer and it's my first stab at writing any piece of software, so please let me know if I was not clear or the answer is too obvious.
Android as an OS makes sure applications are encapsulated and oblivious to each other. All inter-app communication is done through so-called intents, which are ultimately messages. You have to know exactly which intents the other app declares, and on top of that you have no assurance that every app implements the kind of feature you are requesting.
Therefore I don't think what you want to do (the coordinates change) is possible at all without tinkering with the OS source code and compiling your own version of Android.
Last week I chose my major project. It is a vision-based system to monitor cyclists in time trial events as they pass certain points on the course. It should detect the bright yellow race number on a cyclist's back, extract the number from it, and also record the time.
I have done some research and decided to use the Tesseract Android Tools fork by Robert Theis called Tess Two. To speed up text recognition, I want to use the fact that the number is meant to be extracted from a bright (yellow) rectangle on the cyclist's back, and run the actual OCR only on that region. I have not found any code or ideas on how to detect geometric shapes of a specific color. Thank you for any help, and sorry if I made any mistakes; I am pretty new to this website.
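One common way to find a region of a specific color is to threshold pixels in HSV space and take the bounding box of the matches. Here is a minimal sketch in Python using only the standard library. It assumes the image is available as rows of (r, g, b) tuples, and the hue and saturation limits are guesses that would need tuning on real footage. On Android the same idea is usually implemented with an image library such as OpenCV rather than per-pixel loops.

```python
import colorsys


def is_yellow(rgb, min_sat=0.5, min_val=0.5):
    """Rough test for 'bright yellow' (hue around 60 degrees); limits are assumptions."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return 45.0 <= h * 360.0 <= 70.0 and s >= min_sat and v >= min_val


def yellow_bounding_box(image):
    """Return (left, top, right, bottom), inclusive, of all yellow pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_yellow(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


GREY = (128, 128, 128)
YELLOW = (255, 255, 0)
image = [[GREY, GREY, GREY, GREY],
         [GREY, YELLOW, YELLOW, GREY],
         [GREY, GREY, GREY, GREY]]
box = yellow_bounding_box(image)
print(box)
```

The resulting box could then be cropped and handed to the OCR engine, so that it only processes the number plate instead of the whole frame.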
Where are the images coming from? I ask because I was asked to provide some technical help for the design of a similar application (we were working with footballers' shirts), and I can tell you that you'll have a few problems. Some advice:
- Use a high-quality video feed rather than relying on a couple of digital camera images.
- The number will almost certainly be 'curved' or distorted because of the rider's movement. Using a series of images will sometimes allow you to work out what the number really is from a series of 'false reads'.
- Train for the font you're using, but also apply as much logic as you can (if the numbers are always two digits and never start with '9', use this information to help you get the right number).
If you have the luxury of being able to position the camera (we didn't!), I would have thought your ideal spot would be above the rider and looking slightly forward so you can capture their back with the minimum of distortions.
We found that merging several still-frames from the video into one image gave us the best overall image of the number - however, the technology that was used for this was developed by a third-party and they do not want to release it, I'm afraid :(
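The idea of combining a series of reads with known constraints can be sketched as a simple vote. The Python example below assumes the OCR step returns one string per frame. The plausibility rule (two digits, never starting with '9') is the hypothetical one used as an example above, not a real race rule.

```python
from collections import Counter


def is_plausible(read):
    """Example rule: exactly two digits and never starting with '9'."""
    return len(read) == 2 and read.isdigit() and read[0] != "9"


def vote(reads):
    """Return the most frequent plausible read, or None if nothing qualifies."""
    counts = Counter(r for r in reads if is_plausible(r))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# OCR output from seven consecutive frames of the same rider (made-up data).
reads = ["47", "41", "47", "4", "97", "47", "17"]
print(vote(reads))
```

Reads that break the rules ("4", "97") are discarded before counting, so occasional misreads do not outvote the correct number.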
Good luck!
In an Android app I'm making, I would like to detect when a user who is holding the phone in their hand makes a gesture like throwing a frisbee. I have seen a couple of apps implementing this, but I can't find any example code or tutorial on the web.
It would be great to get some thoughts on how this could be done, and of course it would be even better to get some example code or a link to a tutorial.
The accelerometer provides you with a stream of 3D vectors. When the phone is held still in the hand, the vector points opposite to the pull of Earth's gravity and has the same magnitude (this way you can determine the phone's orientation).
If the user lets it fall, the vector will go to 0 (the same effect as weightlessness on a space station).
If the user makes a gesture without throwing the phone, the direction will shift, and the amplitude will rise, then fall and then rise again (when the user stops the movement). To find out what this looks like, you can do some research by recording accelerometer data while performing the desired gestures.
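The three situations above can be told apart by the magnitude of the vector alone. Here is a minimal sketch in Python, assuming samples are (x, y, z) tuples in m/s² as delivered by a typical accelerometer; the tolerance and free-fall limits are assumed values that should be tuned against recorded data.

```python
import math

GRAVITY = 9.81  # m/s^2, approximate magnitude reported when the phone is at rest


def magnitude(sample):
    """Length of a 3D acceleration vector."""
    return math.sqrt(sum(c * c for c in sample))


def classify_sample(sample, tolerance=1.5, free_fall_max=2.0):
    """Label one accelerometer sample as 'free_fall', 'at_rest' or 'moving'."""
    m = magnitude(sample)
    if m <= free_fall_max:
        return "free_fall"
    if abs(m - GRAVITY) <= tolerance:
        return "at_rest"
    return "moving"


print(classify_sample((0.0, 0.0, 9.81)))   # phone lying flat
print(classify_sample((0.1, 0.2, 0.1)))    # phone dropped
print(classify_sample((12.0, 5.0, 9.0)))   # phone swung in the hand
```

A gesture would then show up as a run of "moving" samples between "at_rest" periods, which gives a natural way to cut the stream into candidate gestures.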
Keep in mind that the accelerometer is pretty noisy. You will have to average over neighboring values to get meaningful results.
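A minimal sketch of such averaging, written in Python as a sliding-window mean over the last few samples. The class name and window size are assumptions; a larger window smooths more but reacts more slowly to a fast gesture.

```python
from collections import deque


class MovingAverage3D:
    """Sliding-window mean of 3D samples (simple noise filter)."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window)

    def add(self, sample):
        """Add one (x, y, z) sample and return the current averaged vector."""
        self.samples.append(sample)
        n = len(self.samples)
        return tuple(sum(s[i] for s in self.samples) / n for i in range(3))


smoother = MovingAverage3D(window=2)
first = smoother.add((0.0, 0.0, 10.0))
second = smoother.add((2.0, 0.0, 8.0))
third = smoother.add((4.0, 0.0, 6.0))
print(first, second, third)
```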
I think one workable approach to matching a gesture would be invariant moments (like the Hu moments used in image recognition). The accelerometer vector over time defines a 4-dimensional space, and you will need a set of scaling / rotation invariant moments. Designing such a set is not easy, but computing it is not complicated.
Once you have your moments, you can use standard techniques for matching vectors to clusters (see the "moments" and "cluster" modules from our javaocr project: http://javaocr.svn.sourceforge.net/viewvc/javaocr/trunk/plugins/ ).
PS: you may get away with just speed over time, which produces a 2-dimensional space and can be analysed with javaocr on the spot.
Not exactly what you are looking for:
Store orientation to an array - and compare
Tracking orientation works well. Perhaps you can do something similar with the accelerometer data (without any integration).
A similar question is Drawing in air with Android phone.
I am curious what other answers you will get.