Feature detection and tracking on 2D images for displaying AR contents - android

I am developing an Android app in which I want to track a 2D image/a piece of paper, analyze what the user writes/draws on it, and correctly display different 3D contents on it.
I am working on the tracking and on displaying simple 3D contents, which can actually be achieved using SDKs like Vuforia and Wikitude. However, I am not using them, for several reasons:
There are other analyses to be done on the image, e.g. analyzing the drawings.
The image may not be rich in features, e.g. a paper with only lines or a few figures.
SDKs like Vuforia may not expose underlying functionality like feature detection to developers.
Anyway, right now I only want to achieve the following result.
I have a piece of paper, probably with lines and figures on it. You can think of it as the kind of paper for children to practice writing or drawing on. Example: https://i.pinimg.com/236x/89/3a/80/893a80336adab4120ff197010cd7f6a1--dr-seuss-crafts-notebook-paper.jpg
I point my phone (the camera) at the paper while capturing the video frames.
I want to register the paper, track it and display a simple wire-frame cube on it.
I have been messing around with OpenCV, and have tried the following approaches.
Using homography:
1. Detect features in the 2D image (ORB, FAST, etc.).
2. Describe the features (ORB).
3. Do the same in each video frame.
4. Match the features and find the good matches.
5. Find the homography, and use it to successfully draw a rectangle around the image in the video frame.
Problem: I did not know how to use the homography decomposition (into rotations, translations and normals) to display a 3D object like a cube.
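One standard way to get a full pose out of a planar homography, assuming the tracked image lies on the world's z = 0 plane and the camera intrinsics K are known from calibration: the first two columns of K⁻¹H are the first two columns of the rotation (up to a common scale), and the third column is the translation. A minimal sketch with plain-list matrix helpers (in practice you would use OpenCV or NumPy; the helper names here are my own):

```python
def mat_col(M, c):
    return [M[0][c], M[1][c], M[2][c]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def pose_from_homography(K_inv, H):
    # For a z = 0 model plane, H is proportional to K * [r1 r2 t], so the
    # columns of K^-1 * H give r1, r2 and t up to a common scale factor.
    g1, g2, g3 = (mat_vec(K_inv, mat_col(H, c)) for c in range(3))
    s = norm(g1)
    r1 = [x / s for x in g1]
    r2 = [x / s for x in g2]
    r3 = cross(r1, r2)            # complete the right-handed basis
    t = [x / s for x in g3]
    return (r1, r2, r3), t        # rotation columns and translation
```

With the rotation and translation in hand, the cube's 3D corners can be projected the same way solvePnP output is used below. (Newer OpenCV versions also offer decomposeHomographyMat, which returns the candidate decompositions directly.)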
Using solvePnP:
1.-4. The same as steps 1 to 4 above.
5. Convert all the 2D good-match points in the image to 3D by assuming the image lies on the world's x-y plane, so they all have z = 0.
6. Use solvePnP with those 3D points and the 2D points in the current frame to retrieve the rotation and translation vectors, then convert the rotation vector to a rotation matrix with Rodrigues() in OpenCV and build the projection matrix from [R|t].
7. Construct the 3D points of a cube.
8. Project them into the 2D frame using the projection matrix and the camera matrix.
Problem: the cube jumps around, which I believe is due to the feature detection and matching not being stable and accurate, which in turn affects solvePnP.
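One cheap way to tame the jumping (short of more robust matching or a Kalman filter) is to low-pass filter the pose between frames. A minimal sketch of exponentially smoothing the solvePnP rotation and translation vectors; the class name and the alpha value are illustrative choices, not a fixed recipe:

```python
class PoseSmoother:
    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight of the newest measurement (0..1)
        self.rvec = None        # smoothed rotation vector (3 floats)
        self.tvec = None        # smoothed translation vector (3 floats)

    def update(self, rvec, tvec):
        if self.rvec is None:
            # First frame: adopt the measurement as-is.
            self.rvec, self.tvec = list(rvec), list(tvec)
        else:
            a = self.alpha
            self.rvec = [a * n + (1 - a) * o for n, o in zip(rvec, self.rvec)]
            self.tvec = [a * n + (1 - a) * o for n, o in zip(tvec, self.tvec)]
        return self.rvec, self.tvec
```

A smaller alpha gives a steadier but laggier cube; naively averaging rotation vectors is only safe for small frame-to-frame rotations, which is usually the case at video rate.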
Using contours or corners:
I simply grayscale the camera frame, Gaussian-smooth it, dilate or erode it and try to find the biggest 4-edge contour so that I can track it using solvePnP etc. This, unsurprisingly, doesn't give good results, or I'm just doing it wrong.
So my questions are:
How can I solve the two problems mentioned above?
More generally, given the type of image target I want to track, what would be the optimal algorithm/solution/technique to track it?
What are the things that I can improve/change in my way of solving the problem?
Thank you very much.

Related

Augmented Reality for a Fitting Room: Where to Start?

I have no experience in augmented reality or image processing. I know there are lots of documents on the internet, but to look in the right places I should know the basics first. I'm planning to code an Android app which will use augmented reality for a virtual fitting room, and I have determined some functionalities of the app. My question is: how could I manage to implement those functionalities, which topics should I look into, where should I start, which key functionalities should the app achieve, and which open-source SDK would you suggest? Then I can do deeper research.
-- Virtualizing clothes which will be provided by me and make them usable for app
-- Which attributes should virtualized clothes have and how to store them
-- Scan real-life clothes, virtualize them and make usable for app
-- Tracking human who will try on those clothes
-- Human body sizes vary, so the clothes that fit on them should also be resized for each person
-- Clothes should look as realistic as possible
-- Whenever a person moves, the clothes should also move with that person (the person bends, the clothes also bend and stay fitted on that person). And it should be as quick as possible.
Have you tried Snapchat's face filters?
It's essentially the same problem. They need to:
Create a model of a face (where are the eyes, nose, mouth, chin, etc)
Create a texture to map onto the model of the face
Extract faces from an image/video and map the 2D coordinates from the image to the model of the face you've defined
Draw/Render the texture on top of the image/video feed
Now you'd have to do the same, but instead you'd do it for a human body.
The issue you'd have to deal with is the fact that only "half" of your body is visible to the camera at any time (because the other half faces away from it). Also, your textures would have to map to a genuinely 3D model, vs. the relatively 2D model of a face (facial features lie mostly on a flat plane, which is a good enough estimation).
Good luck!

Extract point clouds WITH colour using the Project Tango; i.e. getting the current camera frame

I am trying to produce a point cloud where each point has a colour. I can get just the point cloud or I can get the camera to take a picture, but I need them to be as simultaneous as possible. If I could look up an RGB image with a timestamp or call a function to get the current frame when onXYZijAvailable() is called I would be done. I could just go over the points, find out where it would intersect with the image plane and get the colour of that pixel.
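The "find where it intersects the image plane" step boils down to a pinhole projection of each 3D point into the RGB image. A minimal sketch, assuming the point is already in the colour camera's coordinate frame (otherwise the depth-to-colour extrinsic transform must be applied first) and K is a 3x3 intrinsic matrix:

```python
def project_point(point, K):
    # Pinhole model: pixel = (fx * X/Z + cx, fy * Y/Z + cy).
    x, y, z = point
    if z <= 0:
        return None               # behind the camera, no pixel
    u = K[0][0] * (x / z) + K[0][2]
    v = K[1][1] * (y / z) + K[1][2]
    return u, v                   # pixel coordinates in the RGB image
```

Rounding (u, v) to the nearest pixel and checking it lies inside the image bounds then gives the colour for that point.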
As it is now I have not found any way to get the pixel info of an image or get coloured points. I have seen AR apps where the camera is connected to the CameraView and then things are rendered on top, but the camera stream is never touched by the application.
According to this post it should be possible to get the data I want and synchronize the point cloud and the image plane by a simple transformation. This post also says something similar. However, I have no idea how to get the RGB data. I can't find any open-source projects or tutorials.
The closest I have gotten is finding out when a frame is ready by using this:
public void onFrameAvailable(final int cameraId) {
    if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
        // Get the new RGB frame somehow.
    }
}
I am working with the Java API and I would very much like to not delve into JNI and the NDK if at all possible. How can I get the frame that most closely matches the timestamp of my current point cloud?
Thank you for your help.
Update:
I implemented a CPU version of it and, even after optimising it a bit, I only managed to get 0.5 FPS on a small point cloud. This is partly because the colours have to be converted from the Android-native NV21 colour space to the GPU-native RGBA colour space. I could have optimised it further, but I am not going to get a real-time effect with this. The CPU on the Android device simply cannot perform well enough. If you want to do this on more than a few thousand points, go for the extra hassle of using the GPU or do it in post.
Tango normally delivers color pixel data directly to an OpenGLES texture. In Java, you create the destination texture and register it with Tango.connectTextureId(), then in the onFrameAvailable() callback you update the texture with Tango.updateTexture(). Once you have the color image in a texture, you can access it using OpenGLES drawing calls and shaders.
If your goal is to color a Tango point cloud, the most efficient way to do this is in the GPU. That is, instead of pulling the color image out of the GPU and accessing it in Java, you instead pass the point data into the GPU and use OpenGLES shaders to transform the 3D points into 2D texture coordinates and look up the colors from the texture. This is rather tricky to get right if you're doing it for the first time but may be required for acceptable performance.
If you really want direct access to pixel data without using the C API,
you need to render the texture into a buffer and then read the color data from the buffer. It's kind of tricky if you aren't used to OpenGL and writing shaders, but there is an Android Studio app that demonstrates that here, and is further described in this answer. This project demonstrates both how to draw the camera texture to the screen, and how to draw to an offscreen buffer and read RGBA pixels.
If you really want direct access to pixel data but decide that the NDK might be less painful than OpenGLES, the C API has TangoService_connectOnFrameAvailable() which gives you pixel data directly, i.e. without going through OpenGLES. Note, however, that the format of the pixel data is NV21, not RGB or RGBA.
I am doing this now by capturing depth with onXYZijAvailable() and images with onFrameAvailable(). I am using native code, but the same should work in Java. For every onFrameAvailable() I get the image data and put it in a preallocated ring buffer. I have 10 slots and a counter/pointer. Each new image increments the counter, which loops back from 9 to 0. The counter is an index into an array of images. I save the image timestamp in a similar ring buffer. When I get a depth image, onXYZijAvailable(), I grab the data and the timestamp. Then I go back through the images, starting with the most recent and moving backwards, until I find the one with the closest timestamp to the depth data. As you mentioned, you know that the image data will not be from the same frame as the depth data because they use the same camera. But, using these two calls (in JNI) I get within +/- 33msec, i.e. the previous or next frame, on a consistent basis.
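The ring-buffer-plus-timestamp-matching scheme described above can be sketched as follows (the class and method names are my own; the answer's version keeps 10 slots and walks backwards from the newest entry, while this sketch simply scans all slots, which is equivalent for such a small buffer):

```python
class FrameRingBuffer:
    def __init__(self, size=10):
        self.size = size
        self.frames = [None] * size       # image payloads
        self.stamps = [None] * size       # matching timestamps
        self.head = 0                     # next slot to overwrite

    def push(self, frame, timestamp):
        # Each new image overwrites the oldest slot.
        self.frames[self.head] = frame
        self.stamps[self.head] = timestamp
        self.head = (self.head + 1) % self.size

    def closest(self, timestamp):
        # Return the buffered frame whose timestamp is nearest.
        best, best_dt = None, float("inf")
        for i in range(self.size):
            if self.stamps[i] is None:
                continue
            dt = abs(self.stamps[i] - timestamp)
            if dt < best_dt:
                best, best_dt = self.frames[i], dt
        return best
```

In the Tango case, push() would be called from onFrameAvailable() and closest() from onXYZijAvailable() with the depth timestamp.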
I have not checked how close it would be to just naively use the most recently updated rgb image frame, but that should be pretty close.
Just make sure to use onXYZijAvailable() to drive the timing, because depth updates more slowly than RGB.
I have found that writing individual images to the file system using OpenCV::imwrite() does not keep up with the real time of the camera. I have not tried streaming to a file using the video codec. That should be much faster. Depending on what you plan to do with the data in the end you will need to be careful how you store your results.

Shape Recognition

I need to implement a simple Android application that allows users to draw a "simple" shape (circle, triangle, etc.) on their phone and then ask a server whether the drawn shape matches one of the shapes in its database, which consists of a low number of shapes (let's say < 100, but it can be more). In order to make this application work, I was thinking of using the following steps (we assume that the input image consists only of black & white pixels):
A. re-size & crop the input image in order to bring it to the same scale as the ones in the DB
B. rotate the input image by a small angle (let's say 15 degrees) x times (24 in this case) and try to match each of these rotations against each shape in the DB.
Questions:
For A, what would be the best approach? I was thinking to implement this step in the Android application, before sending the data to the server.
For B, what would be a decent algorithm of comparing 2 black & white pixel images that contain only a shape?
Is there any better / simpler way of implementing this? A solution that also has an implementation is desirable.
PS: I can see that many people have discussed similar topics around here, but I can't seem to find something that matches my requirements well enough.
Machine learning approach
You choose some features which describe contours, choose a classification method, prepare a training set of tagged contours, train the classifier, and use it in the program.
Contour features. Given a contour (detected in the image or constructed from the user input), you can calculate rotation-invariant moments. The oldest and most well-known is the set of Hu moments.
You can also consider such features of the contour as eccentricity, area, convexity defects, FFT transform of the centroid distance function and many others.
Classifiers. Now you need to train a classifier. Support Vector Machines, neural networks, decision trees and Bayes classifiers are some of the popular methods; there are many to choose from. If you choose SVM, LIBSVM is a free SVM library which also works in Java, including on Android.
Ad-hoc rule approach
You can also approximate contour with a polygonal curve (see Ramer-Douglas-Peucker algorithm, there is a free implementation in OpenCV library, now available on Android). For certain simple forms like triangles or rectangles you can easily invent some ad-hoc heuristic rule which will "recognize" them (for example, if a closed contour can be approximated with just three segments and small error, then it is likely to be a triangle; if the centroid distance function is almost constant and there are zero convexity defects, then it is likely to be a circle).
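A minimal sketch of the Ramer-Douglas-Peucker simplification the ad-hoc rule relies on (OpenCV's approxPolyDP does the same job; this pure-Python version is just to show the idea, and the epsilon tolerance is an assumption to tune):

```python
def perpendicular_distance(p, a, b):
    # Distance from point p to the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def rdp(points, epsilon):
    # Recursively keep the point farthest from the chord if it deviates
    # more than epsilon; otherwise collapse the run to its endpoints.
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

If a closed contour simplifies to three vertices with a small epsilon, the triangle heuristic fires; four vertices suggests a rectangle, and so on.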
Since this is very much related to handwriting recognition, you can use a simple HMM (hidden Markov model) algorithm to compare shapes against a pre-learned database.
But for a much simpler approach, you can detect the corners in the image and then count the corners to identify shapes.
The first approach can handle arbitrarily complicated shapes; the second only suits basic shapes.
You can use a supervised learning approach. For the problem you are trying to solve I think simple classifiers like Naive Bayes, KNN, etc. should give you good results.
You need to extract features from each of the images. For each image you can save them in a vector. Let's call it the feature vector. For the images you have in your database you already know the type of shape, so you can include the id of the type in the feature vector. This will serve as the training set.
Once you have your training set, you can train your classifier and every time you want to classify a new shape you just get its feature vector and use it to query the classifier.
I recommend using scale- and rotation-invariant features, so you will not have to resize each image, and you only need to compare it once instead of trying multiple rotations.
You can do a quick search for scale/rotation-invariant features and try them.

AndEngine VS Android's Canvas VS OpenGLES - For rendering a 2D indoor vector map

This is a big issue that I've been trying to figure out for a long time already.
I'm working on an application that should include a 2D vector indoor map in it.
The map will be drawn out from an .svg file that will specify all the data of the lines, curved lines (path) and rectangles that should be drawn.
My main requirements for the map are:
Support touch events to detect where exactly the finger is touching.
Great image quality, especially when drawing curved and diagonal lines (anti-aliasing).
Optional but very nice to have - Built in ability to zoom, pan and rotate.
So far I tried AndEngine and Android's canvas.
With AndEngine I had trouble implementing anti-aliasing for rendering smooth diagonal lines or drawing curved lines, and as far as I understand, this is not an easy thing to do in AndEngine.
Though I have to mention that AndEngine's ability to zoom in and pan with the camera instead of modifying the objects on the screen was really nice to have.
I also have a little experience with Android's built-in Canvas, mainly with viewing simple bitmaps, but I'm not sure whether it supports all of these things, and especially whether it would provide smooth results.
Last but not least, there's the option of plain OpenGL ES 1 or 2, which, as far as I understand, with enough work should be able to support all the features I require. However, it seems like something that would be hard to implement, and I've never programmed in OpenGL or anything like it, though I'm very willing to learn.
To sum it all up, I need a platform that provides the 3 abilities I mentioned before, but also, very importantly, allows me to implement this as fast as possible.
Any kind of answer or suggestion would be very much welcomed as I'm very eager to solve this problem!
Thanks!

Marker Recognition on Android (recognising Rubik's Cubes)

I'm developing an augmented reality application for Android that uses the phone's camera to recognise the arrangement of the coloured squares on each face of a Rubik's Cube.
One thing I am unsure about is how exactly I would go about detecting and recognising the coloured squares on each face of the cube. If you look at a Rubik's Cube, you can see that each square is one of six possible colours with a thin black border. This led me to think that it should be relatively simple to detect a square, possibly using an existing marker-detection API.
My question is really: has anybody here had any experience with image recognition on Android? Ideally I'd like to be able to use an existing API, but it would be an interesting project to do from scratch if somebody could point me in the right direction to get started.
Many thanks in advance.
Do you want to point the camera at a cube, and have it understand the configuration?
Recognizing objects in photographs is an open AI problem. So you'll need to constrain the problem quite a bit to get any traction on it. I suggest starting with something like:
The cube will be photographed from a distance of exactly 12 inches, with a 100W light source directly behind the camera. The cube will be set diagonally so it presents exactly 3 faces, with a corner in the center. The camera will be positioned so that it focuses directly on the cube corner in the center.
A picture will be taken. Then the cube will be turned 180 degrees vertically and horizontally, so that the other three faces are visible, and a second picture will be taken. Since you know exactly where each face is expected to be, grab a few pixels from each region and assume that is the color of that square. Remember that the cube will usually be scrambled, not uniform as shown in the picture here. So you always have to look at 9*6 = 54 little squares to get the color of each one.
The information in those two pictures defines the cube configuration. Generate an image of the cube in the same configuration, and allow the user to confirm or correct it.
It might be simpler to take 6 pictures - one of each face, and travel around the faces in well-defined order. Remember that the center square of each face does not move, and defines the correct color for that face.
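Classifying each sampled patch against the six sticker colours can be as simple as a nearest-reference-colour lookup. A sketch; the RGB reference values below are illustrative guesses, and under real lighting you would calibrate them (or work in a more lighting-tolerant space such as HSV):

```python
REFERENCE_COLORS = {   # assumed RGB values for the six sticker colours
    "white":  (255, 255, 255),
    "yellow": (255, 213, 0),
    "red":    (196, 30, 58),
    "orange": (255, 88, 0),
    "blue":   (0, 81, 186),
    "green":  (0, 158, 96),
}

def classify_sticker(rgb):
    # Pick the reference colour with the smallest squared RGB distance.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(REFERENCE_COLORS, key=lambda name: dist2(REFERENCE_COLORS[name], rgb))
```

Running this over the 54 sampled patches yields the cube configuration to hand to the renderer.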
Once you have the configuration, you can use OpenGL operations to rotate the cube slices. This will be a program with hundreds of lines of code to define and rotate the cube, plus whatever you do for image recognition.
In addition to what Peter said, it is probably best to overlay guide lines on the picture of the cube as the user takes the pictures. The user then lines up the cube within the guide lines, whether it's a single side (a square guide line) or three sides (three squares in perspective). You also might want to have the user specify the number of colored boxes in each row. In your code, sample the color at what should be the center of each colored box and compare it to the other colored boxes (within some tolerance level) to identify the colors. In addition to providing the recognized results to the user, it would be nice to allow the user to make changes to the recognized colors. It does not seem like fancy image recognition is needed.
Nice idea. I'm planning to use computer vision and marker detectors too, but for another project. I am still looking to see if there is any information available on the web, e.g. on linking OpenCV or ARToolKit to the Android SDK. If you have any additional information about how to link a computer vision API, please let me know.
See you soon and goodluck!
NyARToolkit uses marker detection and is written in Java (as well as managed C# for Windows devices). I don't know how well it works on the Android platform, but I have seen it used on Windows Mobile devices, and it's very well done.
Good luck, and happy programming!
I'd suggest looking at the Android OpenCV library. You probably want to examine the blob detection algorithms. You may also want to consider Hough lines or contours to detect quads.
