Before I make a giant word dump, my effective question is this:
Can I supply some extra information or heuristics to ARCore to refine its idea of what the pose of a detected augmented image is? Or can I use the poses of other trackable objects to refine the pose of a detected augmented image?
For context, here is some background on my workflow:
My AR app revolves around overlaying various 3D CAD models on top of their real-world machine equivalents. The user interaction goes like this:
The user will adhere a QR code (0.2 meters by 0.2 meters) to a predetermined location on the associated machine (the location is specific to the type of machine).
The user will then load up the app, point the camera at the QR code and the app will pass the camera image to a QR code reading library and use the payload (an id for a specific machine) to retrieve the associated CAD Model & metadata.
Once the QR code has been decoded, I use the QR code reading library to construct a pristine image of the code and pass this image to ARCore so that it can detect it in 3D space from the camera feed.
Once the QR code is detected in 3D space, I attach an anchor and use my knowledge of where the QR code should be placed on the given model (also retrieved from my database using the payload info) to determine a basis for my CAD model.
Information can then be overlaid using the CAD model to show various operations and interactions.
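For context, here is a stripped-down sketch of how I wire this flow up, shown with ARCore's Java API for readability (in the app I use the equivalent NDK calls); identifiers like qrBitmap and modelOffsetPose are placeholders for values I already have:

```java
import android.graphics.Bitmap;
import com.google.ar.core.Anchor;
import com.google.ar.core.AugmentedImage;
import com.google.ar.core.AugmentedImageDatabase;
import com.google.ar.core.Config;
import com.google.ar.core.Frame;
import com.google.ar.core.Pose;
import com.google.ar.core.Session;
import com.google.ar.core.TrackingState;

public class QrPlacement {

    // Register the pristine QR image so ARCore can track it as an augmented image.
    public static void registerQrImage(Session session, Config config, Bitmap qrBitmap) {
        AugmentedImageDatabase db = new AugmentedImageDatabase(session);
        // Supplying the 0.2 m physical width helps ARCore estimate the pose faster.
        db.addImage("machine-qr", qrBitmap, 0.2f);
        config.setAugmentedImageDatabase(db);
        session.configure(config);
    }

    // Once tracked, anchor the CAD model at the QR pose composed with the known
    // QR-to-model offset retrieved from my database for this machine type.
    public static Anchor anchorModel(Frame frame, Session session, Pose modelOffsetPose) {
        for (AugmentedImage image : frame.getUpdatedTrackables(AugmentedImage.class)) {
            if (image.getTrackingState() == TrackingState.TRACKING) {
                Pose modelPose = image.getCenterPose().compose(modelOffsetPose);
                return session.createAnchor(modelPose);
            }
        }
        return null; // QR not tracked in this frame
    }
}
```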
Now I've got all this working pretty well, but I've run into some issues where the model is never positioned quite exactly on its real-world equivalent and requires some manual positional adjustment after the fact to get things just right. I have some ideas for how to resolve this, but I don't know how feasible any of them are:
ArTrackable_acquireNewAnchor allows you to specify multiple anchors per trackable with different poses. I assume this will refine the tracking of the object, but I'm not clear on how to use this API. I'm currently just passing the pose generated from ArAugmentedImage_getCenterPose, so I don't know what other poses I would pass.
If I promote my QR code anchor to a cloud anchor after detection, will that help detect or refine the QR pose in the future?
If I try to match other features detected by ARCore (like planes) to known topology in the real environment (like floors and walls), could I better approximate the position of the QR code image, or provide some heuristic to ARCore so that it can?
Instead of using a single QR code image, what if I use a set of images (one QR code and two other static images) that are slightly offset from each other? If we know how far apart these images are in the real world, we can use this information to correct for the error in ARCore's estimate of where each of them is.
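To make that last idea concrete, this is the kind of correction I have in mind, sketched with ARCore's Java Pose type (the offset table and the naive averaging are my own assumptions, not anything ARCore provides):

```java
import com.google.ar.core.AugmentedImage;
import com.google.ar.core.Pose;
import java.util.List;
import java.util.Map;

public class MultiImageCorrection {

    // Each detected image implies a pose for the QR code:
    //   impliedQrPose = detectedImagePose.compose(offset.inverse())
    // where "offset" is the QR-to-image transform measured in the real world.
    // Averaging the implied translations (keeping the QR's own rotation) smooths
    // out some per-image estimation error. This is a naive average, not a proper
    // pose-graph optimisation.
    public static Pose estimateQrPose(List<AugmentedImage> detectedImages,
                                      Map<String, Pose> qrToImageOffsets,
                                      Pose detectedQrPose) {
        float x = detectedQrPose.tx(), y = detectedQrPose.ty(), z = detectedQrPose.tz();
        int samples = 1;
        for (AugmentedImage image : detectedImages) {
            Pose offset = qrToImageOffsets.get(image.getName());
            if (offset == null) continue;
            Pose impliedQrPose = image.getCenterPose().compose(offset.inverse());
            x += impliedQrPose.tx();
            y += impliedQrPose.ty();
            z += impliedQrPose.tz();
            samples++;
        }
        return new Pose(new float[] {x / samples, y / samples, z / samples},
                        detectedQrPose.getRotationQuaternion());
    }
}
```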
Sorry for the giant word dump, but I figured the more info the better. Any other ideas outside the framing of my question are also appreciated.
I have no experience in augmented reality or image processing, and while I know there are lots of documents on the internet, to look in the right places I need to know the basics first. I'm planning to code an Android app which will use augmented reality for a virtual fitting room, and I have determined some functionalities for the app. My question is: how could I implement those functionalities, which topics should I look into, where should I start, which key functionalities should the app achieve, and which open-source SDK would you suggest, so I can do deeper research?
-- Virtualize clothes that I will provide and make them usable in the app
-- Decide which attributes virtualized clothes should have and how to store them
-- Scan real-life clothes, virtualize them, and make them usable in the app
-- Track the human who will try on those clothes
-- Human body sizes vary, so the clothes should be resized to fit each person
-- Clothes should look as realistic as possible
-- Whenever a person moves, the clothes should move with them (if the person bends, the clothes also bend and stay fitted on that person), and this should happen as quickly as possible.
Have you tried Snapchat's face filters?
It's essentially the same problem. They need to:
Create a model of a face (where are the eyes, nose, mouth, chin, etc)
Create a texture to map onto the model of the face
Extract faces from an image/video and map the 2D coordinates from the image to the model of the face you've defined (a small detection sketch follows this list)
Draw/Render the texture on top of the image/video feed
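As a rough illustration of that extraction step, here's one way to pull 2D facial landmarks out of a single bitmap with Google's Mobile Vision face detector (a sketch, not a full pipeline; mapping the points onto your own model is the part you'd still have to build):

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.PointF;
import android.util.SparseArray;
import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.face.Face;
import com.google.android.gms.vision.face.FaceDetector;
import com.google.android.gms.vision.face.Landmark;

public class FaceLandmarkExtractor {

    // Prints the 2D landmark positions (eyes, nose, mouth, ...) found in the bitmap.
    // These are the points you would map onto your own face/body model before rendering.
    public static void printLandmarks(Context context, Bitmap bitmap) {
        FaceDetector detector = new FaceDetector.Builder(context)
                .setLandmarkType(FaceDetector.ALL_LANDMARKS)
                .setTrackingEnabled(false)
                .build();
        try {
            if (!detector.isOperational()) {
                return; // detector dependencies not downloaded yet
            }
            Frame frame = new Frame.Builder().setBitmap(bitmap).build();
            SparseArray<Face> faces = detector.detect(frame);
            for (int i = 0; i < faces.size(); i++) {
                for (Landmark landmark : faces.valueAt(i).getLandmarks()) {
                    PointF p = landmark.getPosition();
                    System.out.println("landmark " + landmark.getType() + " at " + p.x + "," + p.y);
                }
            }
        } finally {
            detector.release();
        }
    }
}
```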
Now you'd have to do the same, but instead you'd do it for a human body.
One issue you'd have to deal with is the fact that only "half" of your body is visible to the camera at any time (because the other half is facing away from it). Also, your textures would have to map onto a full 3D model, versus the relatively 2D model of a face (facial features mostly lie on a flat plane, which is a good enough approximation).
Good luck!
I am new to image processing. I have a data set of images and I want to perform calibration on those images based on a target image. I have searched a lot on image calibration, but the majority of the results are about camera calibration. I am confused as to whether these are the same thing or different things. Can anybody explain the difference between these two terms?
On reading through one of the results on image calibration, I got to know that there are three steps that I need to perform:
Bias Frame Calibration
Dark Frame Calibration
Flat Field Frame Calibration
Also, I need to perform this in Android. For that, I have figured out that I will need to use OpenCV or JavaCV.
So, I want to know if these 3 steps will be possible using OpenCV/JavaCV or not?
Calibration is a process that exploits some knowledge about the data to reconstruct measurements so that they are more accurate or suit a specific need. As we have no idea what the desired result of your calibration is, it is hard to say more.
In general the difference is as follows:
Camera calibration
You have a camera and want the images it captures to satisfy some condition. This process usually means taking images of predefined objects such as color markers, a geometric checkerboard, LASER sweeps, etc. This way you can obtain the camera parameters needed to reconstruct some specific feature of the image for any other image taken (assuming the important parameters, such as camera position or exposure time, do not change over time).
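For example, the usual checkerboard procedure looks roughly like this with OpenCV's Java bindings (just a sketch; the board dimensions, square size and image loading are placeholder assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.calib3d.Calib3d;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.MatOfPoint3f;
import org.opencv.core.Point3;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;

public class CheckerboardCalibration {

    public static void calibrate(List<String> imagePaths) {
        Size patternSize = new Size(9, 6);  // inner corners of the checkerboard
        double squareSize = 0.025;          // 25 mm squares, measured on the printed board

        // The same 3D corner layout is reused for every view (board lies in the Z = 0 plane).
        MatOfPoint3f boardCorners = new MatOfPoint3f();
        List<Point3> corners3d = new ArrayList<>();
        for (int row = 0; row < patternSize.height; row++) {
            for (int col = 0; col < patternSize.width; col++) {
                corners3d.add(new Point3(col * squareSize, row * squareSize, 0));
            }
        }
        boardCorners.fromList(corners3d);

        List<Mat> objectPoints = new ArrayList<>();
        List<Mat> imagePoints = new ArrayList<>();
        Size imageSize = new Size();

        for (String path : imagePaths) {
            Mat gray = Imgcodecs.imread(path, Imgcodecs.IMREAD_GRAYSCALE);
            imageSize = gray.size();
            MatOfPoint2f corners2d = new MatOfPoint2f();
            if (Calib3d.findChessboardCorners(gray, patternSize, corners2d)) {
                objectPoints.add(boardCorners);
                imagePoints.add(corners2d);
            }
        }

        // Outputs: intrinsic matrix and distortion coefficients for this camera.
        Mat cameraMatrix = new Mat();
        Mat distCoeffs = new Mat();
        List<Mat> rvecs = new ArrayList<>();
        List<Mat> tvecs = new ArrayList<>();
        double rms = Calib3d.calibrateCamera(objectPoints, imagePoints, imageSize,
                cameraMatrix, distCoeffs, rvecs, tvecs);
        System.out.println("re-projection error: " + rms);
    }
}
```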
Image calibration
This is similar, but the input images can come from different sources (different cameras, renders, simulations, etc.) or be taken under different circumstances (exposure, lighting, etc.). In this case we do not have the luxury of a calibration process, so instead we need to find some kind of known feature in the images (for example an object of known size, color, temperature, etc.) and correct the rest of the image from it.
So the difference is: camera calibration is when you have a single imaging device as the source of the images, and image calibration is when you have multiple (often unknown) image sources.
I am not using OpenCV, but as people use this lib for such tasks, it should have support for operations like this.
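For example, the bias/dark/flat steps from the question are basically per-pixel arithmetic. A rough sketch with OpenCV's Java bindings (assuming the calibration frames are already averaged into single Mats; the function and variable names are mine):

```java
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class FrameCalibration {

    // Standard frame calibration:
    //   calibrated = (raw - masterDark) / normalize(masterFlat - masterBias)
    // All inputs are single-channel frames of the same size.
    public static Mat calibrate(Mat raw, Mat masterBias, Mat masterDark, Mat masterFlat) {
        Mat raw32 = new Mat(), bias32 = new Mat(), dark32 = new Mat(), flat32 = new Mat();
        raw.convertTo(raw32, CvType.CV_32F);
        masterBias.convertTo(bias32, CvType.CV_32F);
        masterDark.convertTo(dark32, CvType.CV_32F);
        masterFlat.convertTo(flat32, CvType.CV_32F);

        // Remove thermal/readout signal from the light frame.
        Mat light = new Mat();
        Core.subtract(raw32, dark32, light);

        // Build a flat field normalised to mean 1, so dividing by it only
        // corrects vignetting / pixel sensitivity, not overall brightness.
        Mat flat = new Mat();
        Core.subtract(flat32, bias32, flat);
        double flatMean = Core.mean(flat).val[0];
        Mat flatNorm = new Mat();
        flat.convertTo(flatNorm, CvType.CV_32F, 1.0 / flatMean);

        Mat calibrated = new Mat();
        Core.divide(light, flatNorm, calibrated);
        return calibrated;
    }
}
```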
Here is a small example of such an operation:
OpenCV Birdseye view without loss of data
It seems I've found myself in the deep weeds of the Google Vision API for barcode scanning. Perhaps my mind is a bit fried after looking at all sorts of alternative libraries (ZBar, ZXing, and even some for-cost third party implementations), but I'm having some difficulty finding any information on where I can implement some sort of scan region limiting.
The use case is a pretty simple one: if I'm a user pointing my phone at a box with multiple barcodes of the same type (think shipping labels here), I want to explicitly point some little viewfinder or alignment straight-edge on the screen at exactly the thing I'm trying to capture, without having to worry about anything outside that area of interest giving me some scan results I don't want.
The above case is handled in most other Android libraries I've seen, taking in either a Rect with relative or absolute coordinates, and this is also a part of iOS' AVCapture metadata results system (it uses a relative CGRect, but really the same concept).
I've dug pretty deep into the sample barcode-reader app here, but the implementation is a tad opaque for getting anything beyond the high-level implementation details down.
It seems an ugly patch to, on successful detection of a barcode anywhere within the camera's preview frame, simply no-op on barcodes outside the area of interest, since the device is still working hard to process those frames.
Am I missing something very simple and obvious on this one? Any ideas on a way to implement this cleanly, otherwise?
Many thanks for your time in reading through this!
The API currently does not have an option to limit the detection area. But you could crop the preview image before it gets passed into the barcode detector. See here for an outline of how to wrap a detector with your own class:
Mobile Vision API - concatenate new detector object to continue frame processing
You'd implement the "detect" method to take the frame received from the camera, create a cropped version of the frame, and pass that through to the underlying detector.
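As a rough sketch of that wrapper (assuming bitmap-backed frames; frames coming from a CameraSource are byte buffers, so the crop would have to operate on the NV21 buffer instead, and the class name and region values here are just placeholders):

```java
import android.graphics.Bitmap;
import android.util.SparseArray;
import com.google.android.gms.vision.Detector;
import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.barcode.Barcode;

// Wraps the stock BarcodeDetector and only lets it see a region of interest.
public class RegionLimitedDetector extends Detector<Barcode> {

    private final Detector<Barcode> delegate;
    // Region of interest expressed as fractions of the frame (hypothetical values).
    private final float roiLeft = 0.25f, roiTop = 0.4f, roiWidth = 0.5f, roiHeight = 0.2f;

    public RegionLimitedDetector(Detector<Barcode> delegate) {
        this.delegate = delegate;
    }

    @Override
    public SparseArray<Barcode> detect(Frame frame) {
        Bitmap full = frame.getBitmap();
        if (full == null) {
            // Byte-buffer frame: crop the NV21 buffer here instead (omitted in this sketch).
            return delegate.detect(frame);
        }
        int x = (int) (full.getWidth() * roiLeft);
        int y = (int) (full.getHeight() * roiTop);
        int w = (int) (full.getWidth() * roiWidth);
        int h = (int) (full.getHeight() * roiHeight);
        Bitmap cropped = Bitmap.createBitmap(full, x, y, w, h);
        Frame croppedFrame = new Frame.Builder().setBitmap(cropped).build();
        // Note: returned barcode coordinates are relative to the cropped frame.
        return delegate.detect(croppedFrame);
    }

    @Override
    public boolean isOperational() {
        return delegate.isOperational();
    }
}
```

You would then hand new RegionLimitedDetector(new BarcodeDetector.Builder(context).build()) to your CameraSource where you currently pass the bare detector, and remember that any bounding boxes you draw need to be translated back into full-frame coordinates.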
I want to use a QR code to get the smartphone's location (either UTM or lat/lon). Reading this article, it looks like it is possible to get the position of the smartphone. In addition, I want to render some 3D models on the camera screen. Is that possible? I actually have no clue where to start.
Can anyone help me out regarding this?
Thanks.
If you read that article carefully, all it suggests to get the location of the phone is to simply encode the lat/lon in the QR code itself. This will only work if the locations of the displayed QR codes are fixed (e.g. a sticker on a wall rather than printed on a flyer).
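For example, if the code carries a payload like "40.7580,-73.9855" (the format is entirely up to you; this one is just an assumption), getting the location back out is plain string parsing:

```java
import android.location.Location;

public class QrLocationPayload {

    // Parses a hypothetical "lat,lon" payload such as "40.7580,-73.9855".
    public static Location parse(String qrPayload) {
        String[] parts = qrPayload.split(",");
        Location location = new Location("qr-code");
        location.setLatitude(Double.parseDouble(parts[0].trim()));
        location.setLongitude(Double.parseDouble(parts[1].trim()));
        return location;
    }
}
```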
Is it possible to render 3D models on a camera screen? Sure. It wouldn't be the default camera app, you'd have to make your own. It would involve a fair bit of math if you wanted to position the 3D model relative to the QR code. You'd probably try to build planes based on the sides of the squares.
I just need some guidance on how to detect a marker and produce an output text. For example: a marker with an image of a dog, when detected, should produce the output text "DOG" in a text field. Can someone help me with my idea? Also, which would be more effective for this: NyARToolkit or AndAR? Thanks!
What you're looking for isn't augmented reality, it's object recognition. AR is chiefly concerned with presenting data overlaid on the real world, so computation is devoted each frame to determining the object's position relative to the camera. If you don't intend to use this data, AR libraries may be an inefficient choice. That said...
AR marker tracking libraries usually find markers by prominent features like corners, and can distinguish markers by binary patterns encoded inside the marker or in the marker's borders. If you're happy with having the "dog" part encoded in the border of a marker, there are libraries you can use like Qualcomm's AR development kit. This library, and Metaio's Unifeye Mobile, can also do natural feature tracking on pre-defined images. If you're happy with being able to recognize one specific image, or images of dogs that you have defined in advance, either of these should be fine. You might have to manipulate your dog images to get good features they can identify and track; natural objects can be problematic.
General object recognition (being able to recognize a picture of any dog, not known beforehand) is still a research topic. There are approaches, but they're mostly very computationally intensive, and most mobile solutions involve offloading the serious computation to a server. Recognition of simple outline sketches, however, is more tractable; there's a great paper called "Shape recognition and pose estimation for mobile augmented reality" (I can't find a copy online, but the IEEE link is here) that uses contours to identify objects. This is light enough to run on a mobile device (and it's pure genius).
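That paper's method is beyond a quick example, but just to give a flavour of contour-based matching, OpenCV can compare an extracted outline against a stored template in a few calls (a sketch, assuming binary input images and the OpenCV 4.x Java bindings):

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.imgproc.Imgproc;

public class OutlineMatcher {

    // Returns a dissimilarity score between the largest contour in each binary image
    // (lower means more similar). Hu-moment matching tolerates scale and rotation changes.
    public static double compareOutlines(Mat binaryQuery, Mat binaryTemplate) {
        return Imgproc.matchShapes(largestContour(binaryQuery),
                                   largestContour(binaryTemplate),
                                   Imgproc.CONTOURS_MATCH_I1, 0);
    }

    private static MatOfPoint largestContour(Mat binaryImage) {
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(binaryImage, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        if (contours.isEmpty()) {
            return new MatOfPoint(); // no outline found
        }
        MatOfPoint largest = contours.get(0);
        for (MatOfPoint c : contours) {
            if (Imgproc.contourArea(c) > Imgproc.contourArea(largest)) {
                largest = c;
            }
        }
        return largest;
    }
}
```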