I have no experience in augmented reality or image processing. I know there are lots of documents on the internet, but to look in the right places I first need to know the basics. I'm planning to code an Android app that will use augmented reality for a virtual fitting room, and I have determined some functionalities for it. My question is how I could implement those functionalities, which topics I should look into, where to start, which key features the app should achieve, and which open-source SDKs you would suggest, so I can do deeper research.
-- Virtualizing clothes, which will be provided by me, and making them usable in the app
-- Which attributes virtualized clothes should have and how to store them
-- Scanning real-life clothes, virtualizing them, and making them usable in the app
-- Tracking the person who will try on those clothes
-- Body sizes vary, so the clothes should be resized to fit each person
-- Clothes should look as realistic as possible
-- Whenever the person moves, the clothes should move with them (if the person bends, the clothes also bend and stay fitted on them), and this should happen as fast as possible.
Have you tried Snapchat's face filters?
It's essentially the same problem. They need to:
Create a model of a face (where are the eyes, nose, mouth, chin, etc)
Create a texture to map onto the model of the face
Extract faces from an image/video and map the 2D coordinates from the image to the model of the face you've defined
Draw/Render the texture on top of the image/video feed
Now you'd have to do the same, but instead you'd do it for a human body.
One issue you'd have to deal with is that only "half" of your body is visible to the camera at any time (the other half faces away from it). Also, your textures would have to map onto a full 3D model, versus the relatively 2D model of a face (facial features lie mostly on a flat plane, which is a good enough approximation).
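To make steps 3-4 concrete, here is a very rough Python/OpenCV sketch that substitutes a plain bounding-box face detector and an alpha-blended 2D texture for the landmark/mesh pipeline a real filter would use; the file name "glasses.png" and the assumption of an RGBA overlay are for illustration only.

```python
# Very rough stand-in for steps 3-4: detect a face with a stock Haar cascade
# and alpha-blend a 2D texture over the bounding box. A real filter would fit
# a landmark/mesh model instead. "glasses.png" (an RGBA image) is an assumption.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
overlay = cv2.imread("glasses.png", cv2.IMREAD_UNCHANGED)  # assumed RGBA texture

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # Scale the texture to the detected face and alpha-blend it in place.
        tex = cv2.resize(overlay, (w, h))
        alpha = tex[:, :, 3:] / 255.0
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = (alpha * tex[:, :, :3] +
                                   (1 - alpha) * roi).astype("uint8")
    cv2.imshow("filter", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```

A body-tracking version would swap the face detector for a pose/segmentation model and deform a clothing mesh instead of pasting a flat texture, but the detect-map-render loop is the same.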
Good luck!
Before I make a giant word dump, my effective question is this:
Can I supply extra information / heuristics to ARCore to refine its idea of what the pose of a detected augmented image is? Or can I use the pose of other trackable objects to refine the pose of a detected augmented image?
For context, here is some background on my workflow:
My AR app revolves around overlaying various 3D CAD models on top of their real-world machine equivalents. The user interaction goes like this:
The user will adhere a QR code (sized .2 meters by .2 meters) to a predetermined location on the associated machine (location is specific to the type of machine).
The user will then load up the app, point the camera at the QR code and the app will pass the camera image to a QR code reading library and use the payload (an id for a specific machine) to retrieve the associated CAD Model & metadata.
Once the QR code is detected I can use the QR code reading library to construct a pristine image of the QR code and pass this image to ARCore so that it can detect it in 3D space from the camera.
Once the QR code is detected in 3D space, I attach an anchor and I use the knowledge of where the QR code should be placed on the given model (also retrieved from my database using the payload info) to determine a basis for my CAD model.
Information can be overlayed using the CAD model to show various operations / interactions.
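To give a sense of what step 2 looks like, here is a Python sketch using OpenCV's QRCodeDetector purely as a stand-in for the QR library (the input file name is made up; the actual app does this on the Android camera feed).

```python
# Sketch of the "read the payload" step using OpenCV's QRCodeDetector as a
# stand-in for the actual QR library; "camera_frame.png" is a made-up input.
import cv2

frame = cv2.imread("camera_frame.png")
detector = cv2.QRCodeDetector()
payload, corners, rectified = detector.detectAndDecode(frame)

if payload:
    print("machine id:", payload)  # payload -> CAD model + QR placement metadata
    # 'rectified' is an upright, undistorted image of the code, i.e. the kind of
    # pristine image that can be registered with ARCore's augmented-image database.
```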
Now, I've got all this working pretty well, but I've run into an issue where the model is never positioned exactly on its real-world equivalent and requires some manual adjustment after the fact to get things just right. I have some ideas for how to resolve this, but I don't know how feasible any of them are:
ArTrackable_acquireNewAnchor allows you to specify multiple anchors per trackable with different poses. I assume this will refine the tracking of the object, but I'm not clear on how to use this API. I'm currently just passing the pose generated from ArAugmentedImage_getCenterPose, so I don't know what other poses I would pass.
If I promote my QR code anchor to a cloud anchor after detection, will that aid in detecting / refining the QR code's pose in the future?
If I try to match other features detected by ARCore (like planes) to known topology in the real environment (like floors / walls), could I better approximate the position of the QR code image, or provide some heuristic to ARCore so that it can?
Instead of using a single QR code image, what if I use a set of images (one QR code and two other static images) that are slightly offset from each other? If we know how far apart these images are in the real world, we can use that information to correct for the error in ARCore's estimates of where they are.
Sorry for the giant word dump, but I figured the more info the better. Any other ideas outside the framing of my question are also appreciated.
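To make idea 4 a bit more concrete, here is a rough numpy sketch of the geometry I have in mind. It is not ARCore API code: poses are plain 4x4 matrices, all the numbers are invented, and a real solution would need to correct rotation as well as translation.

```python
# Geometry sketch for idea 4 (not ARCore API code). Poses are 4x4 matrices in
# world space; OFFSET_A_TO_B is the measured rigid transform from image A to
# image B. All numbers below are invented for illustration.
import numpy as np

def refine_pose_b(pose_a, pose_b, offset_a_to_b, trust_a=0.5):
    """Pull B's estimated translation toward where A's pose says B must be."""
    predicted_b = pose_a @ offset_a_to_b            # where B should be, given A
    residual = predicted_b[:3, 3] - pose_b[:3, 3]   # ARCore's disagreement (metres)
    refined_b = pose_b.copy()
    refined_b[:3, 3] += trust_a * residual          # blend toward the prediction
    return refined_b, float(np.linalg.norm(residual))

# Example: image B sits 0.5 m to the right of image A on the same wall.
OFFSET_A_TO_B = np.eye(4)
OFFSET_A_TO_B[0, 3] = 0.5
pose_a = np.eye(4)                                  # pretend A was detected perfectly
pose_b = np.eye(4)
pose_b[:3, 3] = [0.52, 0.01, -0.01]                 # noisy estimate for B
refined_b, error = refine_pose_b(pose_a, pose_b, OFFSET_A_TO_B)
print(refined_b[:3, 3], error)                      # a full fix would correct rotation too
```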
I am developing an Android app in which I want to track a 2D image / a piece of paper, analyze what the user writes or draws on it, and correctly display different 3D content on it.
I am working on the tracking and the display of simple 3D content, which could actually be achieved using SDKs like Vuforia and Wikitude. However, I am not using them, for several reasons:
There are other analyses to be done on the image, e.g. analyzing the drawings.
The image may not be rich in features, e.g. a paper with only lines or a few figures.
SDKs like Vuforia may not expose underlying functionality like feature detection to developers.
Anyway, right now I only want to achieve the following result.
I have a piece of paper, probably with lines and figures on it. You can think of it as the kind of paper for children to practice writing or drawing on. Example: https://i.pinimg.com/236x/89/3a/80/893a80336adab4120ff197010cd7f6a1--dr-seuss-crafts-notebook-paper.jpg
I point my phone (the camera) at the paper while capturing the video frames.
I want to register the paper, track it and display a simple wire-frame cube on it.
I have been messing around with OpenCV, and have tried the following approaches.
Using homography:
Detect features in the 2D image (ORB, FAST etc.).
Describe the features (ORB).
Do the same in each video frame.
Match the features and find good matches.
Find the homography, use the homography and successfully draw a rectangle around the image in the video frame.
I did not know how to use the homography decomposition (into rotations, translations, and normals) to display a 3D object like a cube.
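For reference, here is a condensed Python/OpenCV sketch of what I am doing in steps 1-5 (file names and parameter values are placeholders; the Android Java bindings expose the same calls). The commented-out decomposeHomographyMat call is where I suspect the missing 3D piece is, given calibrated intrinsics K.

```python
# Condensed sketch of steps 1-5; "target.png" and "frame.png" are assumed inputs
# standing in for the reference image of the paper and the current camera frame.
import cv2
import numpy as np

target = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(1000)
kp_t, des_t = orb.detectAndCompute(target, None)
kp_f, des_f = orb.detectAndCompute(frame, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_t, des_f), key=lambda m: m.distance)[:50]

src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Step 5: outline the target in the frame.
h, w = target.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
outline = cv2.perspectiveTransform(corners, H)

# For the 3D part: with calibrated intrinsics K, the homography can be split
# into candidate rotations / translations / plane normals, one of which gives
# the camera pose relative to the (flat) paper.
# retval, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
```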
Using solvePnP:
1 to 4 are the same as the above.
Convert all 2D good match points in the image to 3D by assuming the image lies on the world's x-y plane, thus all having z = 0.
Use solvePnP with those 3D points and the 2D points in the current frame to retrieve the rotation and translation vectors, convert the rotation vector to a rotation matrix with Rodrigues() in OpenCV, and build the projection matrix from it.
Construct the 3D points of a cube.
Project them into the 2D image using the projection and the camera matrix.
The issue is that the cube jumps around, which I believe is because the feature detection and matching are not stable and accurate, which in turn affects solvePnP.
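Below is a self-contained Python/OpenCV sketch of steps 5-7, with synthetic points standing in for the real matches, plus two common mitigations for the jumping: solvePnPRansac to reject bad matches, and exponential smoothing of the pose between frames. The intrinsics, noise level, and alpha value are all invented.

```python
# Sketch of steps 5-7 with synthetic data in place of real matches, plus
# robust PnP and temporal smoothing to damp the jitter. K, the noise level
# and alpha are assumptions.
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(5)

def draw_cube(img, rvec, tvec, size=0.05):
    """Project a wire-frame cube sitting on the z=0 plane of the paper."""
    s = size
    cube = np.float32([[0, 0, 0], [s, 0, 0], [s, s, 0], [0, s, 0],
                       [0, 0, -s], [s, 0, -s], [s, s, -s], [0, s, -s]])
    pts, _ = cv2.projectPoints(cube, rvec, tvec, K, dist)
    pts = pts.reshape(-1, 2).astype(int)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
             (0, 4), (1, 5), (2, 6), (3, 7)]
    for i, j in edges:
        cv2.line(img, tuple(map(int, pts[i])), tuple(map(int, pts[j])), (0, 255, 0), 2)
    return img

# Matched keypoints lifted to z = 0 (step 5); synthetic values just to run the sketch.
obj_pts = np.random.rand(30, 3).astype(np.float32)
obj_pts[:, 2] = 0
true_rvec = np.float32([0.1, -0.2, 0.05])
true_tvec = np.float32([0.0, 0.0, 0.6])
img_pts, _ = cv2.projectPoints(obj_pts, true_rvec, true_tvec, K, dist)
img_pts = (img_pts.reshape(-1, 2) + np.random.randn(30, 2) * 0.5).astype(np.float32)

prev_rvec, prev_tvec = None, None            # would persist across frames in a video loop
ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, dist)
if prev_rvec is not None:                    # low-pass filter the pose to damp the jumping
    alpha = 0.3                              # lower = smoother but laggier
    rvec = alpha * rvec + (1 - alpha) * prev_rvec
    tvec = alpha * tvec + (1 - alpha) * prev_tvec
prev_rvec, prev_tvec = rvec, tvec

canvas = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.imwrite("cube.png", draw_cube(canvas, rvec, tvec))
```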
Using contours or corners:
I simply grayscale the camera frame, Gaussian-smooth it, dilate or erode it, and try to find the largest 4-sided contour so that I can track it using solvePnP etc. This, unsurprisingly, doesn't give good results, or I'm just doing it wrong.
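This is roughly what I am doing for the contour route, as a Python/OpenCV sketch (the thresholds and input file name are arbitrary):

```python
# Sketch of the contour route: pick the largest convex 4-vertex contour in the
# frame and treat its corners as the paper. "frame.png" and the Canny thresholds
# are arbitrary.
import cv2
import numpy as np

frame = cv2.imread("frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blur, 50, 150)
edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
quad = None
for c in sorted(contours, key=cv2.contourArea, reverse=True):
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.isContourConvex(approx):
        quad = approx.reshape(4, 2)
        break

if quad is not None:
    # With the paper's physical size known, these four corners (consistently
    # ordered) can feed solvePnP directly, which tends to be far more stable
    # than per-frame feature matches on a nearly featureless page.
    cv2.polylines(frame, [quad], True, (0, 0, 255), 2)
```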
So my questions are:
How can I solve the two problems mentioned above (using the homography decomposition to display a 3D object, and the jumping cube)?
More generally, given the type of image target I want to track, what would be the optimal algorithm/solution/technique to track it?
What are the things that I can improve/change in my way of solving the problem?
Thank you very much.
I have a client who works on styling cars. He needs an app that lets the user take several pictures of their car and render a single 3D model he can use to look around the car. Is there any way to do this? I have been searching for methods but can't find a solution.
I'm working on a project to recognize insects from user inputted images. I think that OpenCV is the route I'd like to take since I've worked with it before for facial recognition. I'm not using the camera feed and am instead using images provided by the user. For early development I plan to build in some sample images to ensure the concept is working before moving on to other features.
I would like to use 4-5 template images for each insect and have that be robust enough to detect the insect from the input image. If there are multiple insects I would like for them all to be detected and have their own rectangle drawn around them.
With that brief explanation, I am wondering what the best way to complete this task is. I know that OpenCV has template matching, but the template size matters, and I don't want to force the user to ensure their insect occupies a certain number of pixels in their image. Is there a way to work around this, possibly by rotating the template images or using variously sized templates? Or is there a better approach than template matching for this project?
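One possible workaround for the fixed template size is sketched below in Python/OpenCV: run matchTemplate at several template scales and keep every location above a score threshold. The file names, scale range, and the 0.8 threshold are assumptions, and the hits would still need non-maximum suppression before drawing one rectangle per insect.

```python
# Multi-scale template matching sketch: slide each template over the scene at
# several sizes and keep every location above a score threshold. File names,
# the scale range and the 0.8 threshold are assumptions.
import cv2
import numpy as np

scene = cv2.imread("user_photo.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("beetle_template.jpg", cv2.IMREAD_GRAYSCALE)

boxes = []
for scale in np.linspace(0.5, 2.0, 7):
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > scene.shape[0] or t.shape[1] > scene.shape[1]:
        continue
    score = cv2.matchTemplate(scene, t, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(score >= 0.8)  # keep all good hits, not just the best one
    boxes += [(int(x), int(y), t.shape[1], t.shape[0]) for x, y in zip(xs, ys)]

# 'boxes' still needs non-maximum suppression to merge overlapping hits before
# drawing one rectangle per detected insect; template matching is also quite
# brittle to pose and lighting, so expect to need many templates per species.
```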
Unfortunately, without some form of constraints, you are essentially asking whether computer vision has been solved! You have several unresolved, but very interesting, research problems.
Let's reduce the problem to just classifying a sample insect in a fixed pose under controlled lighting as belonging to one of 100k insect categories; that would be tough.
Let's reduce the problem to recognizing a single insect instance in an arbitrary pose in 3D space; that would be tough.
Let's reduce the problem to recognizing a single insect instance in the same pose under arbitrary lighting conditions, viewed with arbitrary optical sensors; that would be tough.
Successful computer vision in the wild is all about cleverly constraining the operating conditions; otherwise you are in research land. If you are in research land, then a cool thing to do is to try to exploit 3D CAD models to capture the huge variety of poses; here's a nice paper on recognizing chairs:
http://www.di.ens.fr/willow/research/seeing3Dchairs/,
If you are not conducting research and are, say, building an app, then you need to consider how you can guide the user, train the user, or trick the user into providing the best operating conditions for the recognition system.
(This was too big to put in the comments.)
I just need some guidance on how to detect a marker and produce output text. For example: for a marker with an image of a dog, when it is detected, I get the output text "DOG" in a text field. Can someone help me with my idea? Also, which is more effective to use for this, NyARToolkit or AndAR? Thanks!
What you're looking for isn't augmented reality, it's object recognition. AR is chiefly concerned with presenting data overlaid on the real world, so computation is devoted each frame to determining the object's position relative to the camera. If you don't intend to use this data, AR libraries may be inefficient. That said...
AR marker tracking libraries usually find markers by prominent features like corners, and can distinguish markers by binary patterns encoded inside the marker or in the marker's borders. If you're happy with having the "dog" part encoded in the border of a marker, there are libraries you can use like Qualcomm's AR development kit. This library, and Metaio's Unifeye Mobile, can also do natural feature tracking on pre-defined images. If you're happy with being able to recognize one specific image, or images of dogs that you have defined in advance, either of these should be OK. You might have to manipulate your dog images to get good features they can identify and track; natural objects can be problematic.
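As a rough illustration of the "ID encoded in the marker" idea, the sketch below uses OpenCV's ArUco module as a stand-in for NyARToolkit/AndAR-style markers (it is not those libraries' APIs): the marker's binary pattern gives you an ID, and a lookup table turns the ID into the word you want to display. The id-to-label table and the input file name are assumptions, and the ArucoDetector class assumes OpenCV 4.7 or newer.

```python
# ArUco used as a stand-in for a binary-coded marker system (OpenCV >= 4.7 API).
# The id-to-label table and the input file name are assumptions.
import cv2

LABELS = {0: "DOG", 1: "CAT"}   # marker id -> text to show in the text field

frame = cv2.imread("camera_frame.png")
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary)
corners, ids, rejected = detector.detectMarkers(frame)

if ids is not None:
    for marker_id in ids.flatten():
        print(LABELS.get(int(marker_id), "unknown marker"))
```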
General object recognition (being able to recognize a picture of any dog, not known beforehand) is still a research topic. There are approaches, but they're mostly very computationally intensive, and most mobile solutions involve offloading the serious computation to a server. Recognition of simple outline sketches, however, is more tractable; there's a great paper called "Shape recognition and pose estimation for mobile augmented reality" (I can't find a copy online, but the IEEE link is here) that uses contours to identify objects. It is light enough to run on a mobile device (and it's pure genius).