I need to implement a simple Android application that allows users to draw a "simple" shape (circle, triangle, etc.) on their phone and then ask a server whether the drawn shape matches one of the shapes in its database, which consists of a low number of shapes (let's say < 100, but it can be more). To make this application work, I was thinking of using the following steps (we assume that the input image consists only of black and white pixels):
A. Resize and crop the input image to bring it to the same scale as the images in the DB.
B. Rotate the input image by a small angle (say 15 degrees) x times (24 in this case, covering a full 360 degrees) and try to match each of these rotations against each shape in the DB.
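For concreteness, here is a rough sketch of steps A and B as described, prototyped in Python/OpenCV (the same calls exist in OpenCV's Android/Java bindings). The fixed 64x64 size, the white-shape-on-black assumption and the IoU overlap score are all assumptions; the answers below also suggest rotation-invariant features that avoid the brute-force rotation in B entirely.

```python
import cv2
import numpy as np

def normalise(binary_img, size=64):
    # Step A: crop to the shape's bounding box and resize to a fixed scale.
    # Assumes the shape is drawn as non-zero (white) pixels on a black background.
    ys, xs = np.nonzero(binary_img)
    crop = binary_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return cv2.resize(crop, (size, size), interpolation=cv2.INTER_NEAREST)

def best_match_score(query, reference, step_deg=15):
    # Step B: rotate the query in 15-degree steps (24 rotations) and keep the
    # best overlap (intersection-over-union) score against one DB shape.
    h, w = query.shape
    best = 0.0
    for k in range(360 // step_deg):
        M = cv2.getRotationMatrix2D((w / 2, h / 2), k * step_deg, 1.0)
        rotated = cv2.warpAffine(query, M, (w, h), flags=cv2.INTER_NEAREST)
        inter = np.count_nonzero(rotated & reference)
        union = np.count_nonzero(rotated | reference)
        best = max(best, inter / union if union else 0.0)
    return best
```

The normalisation could run on the device before upload, and the server would evaluate best_match_score against each DB shape and pick the highest score above some threshold.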
Questions:
For A, what would be the best approach? I was thinking of implementing this step in the Android application, before sending the data to the server.
For B, what would be a decent algorithm for comparing two black-and-white images that each contain only a shape?
Is there any better / simpler way of implementing this? A solution that also has an implementation is desirable.
PS: I can see that many people have discussed similar topics around here, but I can't seem to find something that matches my requirements well enough.
Machine learning approach
You choose some features that describe contours, choose a classification method, prepare a training set of tagged contours, train the classifier, and use it in the program.
Contour features. Given a contour (detected in the image or constructed from the user input), you can calculate rotation-invariant moments. The oldest and most well-known set is the Hu moments.
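For illustration, a minimal sketch (Python/OpenCV for desktop prototyping; the same functions exist in the Android bindings) of extracting Hu moments from a contour; the file name and threshold value are assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)               # hypothetical input drawing
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)                       # largest contour = the drawn shape

hu = cv2.HuMoments(cv2.moments(contour)).flatten()
# Log-scale so the seven moments have comparable magnitudes.
hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu)   # 7 translation-, scale- and rotation-invariant numbers describing the shape
```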
You can also consider such features of the contour as eccentricity, area, convexity defects, FFT transform of the centroid distance function and many others.
Classifiers. Next you need to train a classifier. Support Vector Machines, neural networks, decision trees, and Bayes classifiers are some of the popular methods. If you choose an SVM, LIBSVM is a free SVM library that also works in Java, and it runs on Android too.
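A minimal sketch of the train/predict step, here using OpenCV's built-in SVM rather than LIBSVM (both are usable from Java/Android); the feature vectors and labels below are random placeholders standing in for real Hu-moment features and DB shape ids.

```python
import cv2
import numpy as np

# Placeholder training set: one 7-dimensional Hu-moment vector per labelled contour.
train_features = np.random.rand(60, 7).astype(np.float32)
train_labels = np.random.randint(0, 3, 60).astype(np.int32)        # 0=circle, 1=triangle, 2=rectangle

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_RBF)
svm.setC(1.0)           # assumed hyperparameters; tune on a validation set
svm.setGamma(0.5)
svm.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

query = np.random.rand(1, 7).astype(np.float32)                    # features of the user's drawing
_, predicted = svm.predict(query)
print(int(predicted[0, 0]))                                        # predicted shape id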
Ad-hoc rule approach
You can also approximate the contour with a polygonal curve (see the Ramer-Douglas-Peucker algorithm; there is a free implementation in the OpenCV library, now available on Android). For certain simple forms like triangles or rectangles you can easily invent some ad-hoc heuristic rule that will "recognize" them (for example, if a closed contour can be approximated with just three segments and small error, it is likely to be a triangle; if the centroid distance function is almost constant and there are zero convexity defects, it is likely to be a circle).
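A sketch of such heuristic rules on top of OpenCV's polygon approximation; the 0.02 epsilon factor and the 0.8 circularity cutoff are assumed values that would need tuning.

```python
import cv2
import numpy as np

def classify(contour):
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)           # Ramer-Douglas-Peucker
    area = cv2.contourArea(contour)
    circularity = 4 * np.pi * area / (peri * peri)                  # 1.0 for a perfect circle
    if len(approx) == 3:
        return "triangle"
    if len(approx) == 4:
        return "rectangle"
    if circularity > 0.8:
        return "circle"
    return "unknown"
```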
Since this is closely related to handwriting recognition, you can use a simple HMM (hidden Markov model) algorithm to compare shapes against a pre-learned database.
For a much simpler approach, you can detect the corners in the image and then count them to identify shapes.
The first approach can be used for arbitrarily complicated shapes; the second only suits basic shapes.
You can use a supervised learning approach. For the problem you are trying to solve I think simple classifiers like Naive Bayes, KNN, etc. should give you good results.
You need to extract features from each of the images. For each image you can store them in a vector; let's call it the feature vector. For the images you have in your database you already know the type of shape, so you can include the id of that type in the feature vector. This will serve as the training set.
Once you have your training set, you can train your classifier and every time you want to classify a new shape you just get its feature vector and use it to query the classifier.
I recommend using scale- and rotation-invariant features, so you will not have to resize each image, and you only need to compare it once instead of trying every rotation.
You can do a quick search for Scale/Rotate invariant features and try them.
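A minimal sketch of this classification step using OpenCV's built-in KNN (Python for prototyping; the same ml module is available on Android). The feature vectors here are random placeholders standing in for whatever scale/rotation-invariant features you choose, e.g. the Hu moments mentioned in the other answer.

```python
import cv2
import numpy as np

# Placeholder feature vectors; in practice these would be Hu moments or similar invariants.
train_features = np.random.rand(50, 7).astype(np.float32)
train_labels = np.random.randint(0, 5, (50, 1)).astype(np.int32)    # shape-type ids from the DB

knn = cv2.ml.KNearest_create()
knn.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

query = np.random.rand(1, 7).astype(np.float32)                     # features of the new drawing
_, results, _, _ = knn.findNearest(query, k=3)
print(int(results[0, 0]))                                           # id of the predicted shape type
```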
Related
I am developing an Android app in which I want to track a 2D image/a piece of paper, analyze what the user writes/draws on it, and correctly display different 3D contents on it.
I am working on the tracking and displaying simple 3D contents part, which can actually be achieved using SDKs like Vuforia and Wikitude. However, I am not using them for several reasons.
There are other analyses to be done on the image, e.g. analysis of the drawings.
The image may not be rich in features, e.g. paper with just lines or some figures.
SDKs like Vuforia may not expose some underlying functionalities like feature detection etc. to developers.
Anyway, right now I only want to achieve the following result.
I have a piece of paper, probably with lines and figures on it. You can think of it as the kind of paper for children to practice writing or drawing on. Example: https://i.pinimg.com/236x/89/3a/80/893a80336adab4120ff197010cd7f6a1--dr-seuss-crafts-notebook-paper.jpg
I point my phone (the camera) at the paper while capturing the video frames.
I want to register the paper, track it and display a simple wire-frame cube on it.
I have been messing around with OpenCV, and have tried the following approaches.
Using homography:
1. Detect features in the 2D image (ORB, FAST, etc.).
2. Describe the features (ORB).
3. Do the same in each video frame.
4. Match the features and find good matches.
5. Find the homography, use it, and successfully draw a rectangle around the image in the video frame.
6. Problem: I did not know how to use the homography decomposition (into rotations, translations and normals) to display a 3D object like a cube.
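A rough desktop sketch of steps 1 to 5 in Python/OpenCV (file names and parameters are assumptions; the same calls exist in the Android bindings):

```python
import cv2
import numpy as np

target = cv2.imread("paper_template.png", cv2.IMREAD_GRAYSCALE)     # hypothetical reference image
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)        # hypothetical video frame

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(target, None)
kp2, des2 = orb.detectAndCompute(frame, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

h, w = target.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
outline = cv2.perspectiveTransform(corners, H)       # rectangle around the paper in the frame

# For step 6, cv2.decomposeHomographyMat(H, K) (with K your camera intrinsic matrix)
# splits H into candidate rotations/translations/plane normals; choosing the physically
# valid candidate is the part the question is asking about.
```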
Using solvePnP:
Steps 1 to 4 are the same as above.
5. Convert all 2D good-match points in the image to 3D by assuming the image lies on the world's x-y plane, so they all have z = 0.
6. Use solvePnP with those 3D points and the 2D points in the current frame to retrieve the rotation and translation vectors, and build the projection from them using Rodrigues() in OpenCV (which converts the rotation vector to a rotation matrix).
7. Construct the 3D points of a cube.
8. Project them into the 2D image using the projection and the camera matrix.
Problem: the cube jumps around, which I believe is because the feature detection and matching are not stable and accurate, which in turn affects solvePnP.
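A rough sketch of steps 5 to 8 (Python/OpenCV for prototyping; the intrinsics and point coordinates below are made-up placeholders, in practice they come from calibration and from the feature matches):

```python
import cv2
import numpy as np

# Hypothetical intrinsics; in practice these come from camera calibration.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
dist = np.zeros(5)

# 3D model points: good matches on the paper, assumed to lie in the z = 0 plane.
obj_pts = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]], dtype=np.float64)
img_pts = np.array([[100, 120], [400, 110], [420, 380], [90, 400]], dtype=np.float64)  # matched 2D points

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)

# Cube resting on the paper; negative z points "up" toward the camera in OpenCV's convention.
cube = np.float64([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0],
                   [0, 0, -10], [10, 0, -10], [10, 10, -10], [0, 10, -10]])
# projectPoints applies the Rodrigues rotation, translation and K in one call.
pts2d, _ = cv2.projectPoints(cube, rvec, tvec, K, dist)
# pts2d holds the 8 cube corners in image coordinates; connect them to draw a wire-frame cube.
```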
Using contours or corners:
I simply grayscale the camera frame, Gaussian-smooth it, dilate or erode it and try to find the biggest 4-edge contour so that I can track it using solvePnP etc. This, unsurprisingly, doesn't give good results, or I'm just doing it wrong.
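For reference, this is roughly what that contour pipeline looks like when prototyped in Python/OpenCV (the Canny thresholds and kernel size are assumptions):

```python
import cv2
import numpy as np

frame = cv2.imread("camera_frame.png")               # hypothetical frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blur, 50, 150)
edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
quad = None
for c in sorted(contours, key=cv2.contourArea, reverse=True):
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4:                              # first (largest) 4-edge contour
        quad = approx
        break
```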
So my questions are:
How can I solve the two problems marked above?
More generally, given the type of image target I want to track, what would be the optimal algorithm/solution/technique to track it?
What are the things that I can improve/change in my way of solving the problem?
Thank you very much.
The requirement is to create an Android application running on one specific mobile device that records video of a human eye pupil dilating in response to a bright light (which is physically attached to the mobile device). The video is then post-processed frame by frame on the device to detect & measure the diameter of the pupil AND the iris in each frame. Note the image processing does NOT need doing in real-time. The end result will be a dataset describing the changes in pupil (& iris) size over time. It's expected that the iris size can be used to enhance confidence in the pupil diameter data (eg removing pupil size data that's wildly wrong), but also as a relative measure for how dilated the eye is at any point.
I am familiar with developing Android mobile apps, but my experience with image processing is very limited. I've researched solutions and it seems that the answer may lie with the OpenCV/JavaCV libraries, which should provide shape detection (e.g. http://opencvlover.blogspot.co.uk/2012/07/hough-circle-in-javacv.html), but can anyone provide guidance on these specific questions:
Am I right to think it can detect the two circle shapes within a bitmap, one inside the other? I.e. shapes inside each other are not a problem.
Is it true that JavaCV can detect a circle and return its position and radius/diameter? I.e. it doesn't return a set of vertices that then require further processing to compare with a circle? It seems to have a HoughCircles method, so I think yes.
What processing of each frame is typically used before doing shape detection? For example an algorithm to enhance edges, smooth, or remove colour?
Can I use it not just to detect the presence of circles but also to measure the diameter of the detected circles? (In pixels, which can then easily be converted to real-world measurements because known hardware is being used.) I think yes, but it would be great to hear confirmation from those more familiar.
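For reference, a minimal desktop sketch (Python/OpenCV; JavaCV wraps the same function) showing that HoughCircles returns centre coordinates and a radius in pixels; the parameter values here are guesses that would need tuning on real eye images:

```python
import cv2
import numpy as np

frame = cv2.imread("eye_frame.png", cv2.IMREAD_GRAYSCALE)           # hypothetical video frame
blur = cv2.medianBlur(frame, 5)                                      # smooth before edge-based detection

circles = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
                           param1=100, param2=30, minRadius=5, maxRadius=120)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        # Nested circles (pupil inside iris) each produce their own (x, y, r) entry.
        print(f"centre=({x}, {y}), diameter={2 * r} px")
```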
This project is a non-commercial charitable project, so any help especially appreciated.
I would really suggest using the NDK (i.e. OpenCV's native C++ interface), as it is a bit richer in features. It also allows you to run and test your algorithms on a laptop with images before pushing them to a device, speeding up development.
Pre-processing steps:
Typically one would use thresholding or Canny edge detection, plus morphological operations like erode and dilate.
For detection of the iris/pupil, HoughCircles is not a very good method; feature-detection methods like MSER work better for not-so-well-defined circles. Here is another answer I wrote on the same topic which has code that could help.
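A small sketch of the MSER route (Python/OpenCV; the delta/area parameters and the "most circle-filling region" heuristic are assumptions): detect stable regions, then take the minimum enclosing circle of the best one as the pupil estimate.

```python
import cv2
import numpy as np

gray = cv2.imread("eye_frame.png", cv2.IMREAD_GRAYSCALE)            # hypothetical frame
mser = cv2.MSER_create(5, 200, 20000)                                # delta, min_area, max_area (assumed)
regions, _ = mser.detectRegions(gray)

best = None
for pts in regions:
    (x, y), r = cv2.minEnclosingCircle(pts)
    fill = len(pts) / (np.pi * r * r + 1e-6)                         # how well the region fills its circle
    if best is None or fill > best[0]:
        best = (fill, x, y, r)

if best:
    _, x, y, r = best
    print(f"pupil estimate: centre=({x:.0f}, {y:.0f}), diameter={2 * r:.1f} px")
```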
If you are looking to measure the regions, I would suggest going through this blog. It has a clear explanation on the steps involved for a reasonably accurate measurement.
The purpose of my application is to take a photo of a hand (gesture) and compare it with a picture in the database. The first option I used is background subtraction on images:
http://docs.opencv.org/trunk/doc/tutorials/video/background_subtraction/background_subtraction.html.
The solution works, but sometimes, depending on the first picture, the hand is not cut out properly.
The second option is to detect skin color: http://bytefish.de/blog/opencv/skin_color_thresholding/
Or is it better to use hand detection based on XML cascade files? To compare the images I wanted to use this method: http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html
Let me stress that I'm talking about comparing images that contain gestures. I also read that it is possible to compute the histogram not over the entire image but only over the subject of the photo, which makes the data more reliable, but I don't know how to do that.
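For the masked-histogram part, a minimal sketch (Python/OpenCV; the skin-colour bounds and bin counts are assumptions) of computing the histogram only over the hand pixels and then comparing two such histograms with compareHist, as in the tutorial linked above:

```python
import cv2
import numpy as np

def hand_histogram(path):
    image = cv2.imread(path)                                         # hypothetical photo containing a hand
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # Mask = white where the hand is. Here a crude skin-colour threshold;
    # it could equally come from background subtraction.
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))
    hist = cv2.calcHist([hsv], [0, 1], mask, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)

h1 = hand_histogram("gesture_query.png")                             # hypothetical file names
h2 = hand_histogram("gesture_reference.png")
print(cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL))                   # 1.0 = identical histograms
```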
I want to compare the gesture as a single image, not as a sequence: one picture compared against the baseline. Gesture detection is meant to be triggered by an event, e.g. someone rings, and then I have, for example, 5 seconds to take pictures and compare them with the base image, because I'm not certain whether a hand actually appeared in the lens or not. Unless there is another solution.
Ultimately, it comes down to comparing two images, each of which contains some hand gesture.
If your goal is to perform gesture recognition, you should take into account that gestures are sequences of images.
Thus, if you want to compare gestures, you'll have to find a "smart" way to compare whole sequences rather than single images, because one frame can belong to different gestures.
State-of-the-art approaches for gesture recognition involve extracting the optical flow between two consecutive frames and then computing the histogram of optical flow (HOF); a rough sketch of this step is given after the pointers below. Having computed the histograms for all the frame pairs in the video sequence, you can use different strategies to compare the gestures:
You can concatenate all the HOFs in the sequence and then perform histogram intersection to compare the two sequences.
You can use the Bag of Words paradigm to create a representation of the HOFs.
Here are some pointers to these strategies:
Optical Flow
You can check this article for extracting HOF: "Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions"
Bag of Words
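As a concrete illustration of the HOF step mentioned above, a minimal sketch (Python/OpenCV; the bin count and Farneback parameters are assumptions):

```python
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)            # hypothetical consecutive frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow between the two frames (Farneback).
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Histogram of optical flow: orientation bins weighted by flow magnitude.
hof, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
hof = hof / (hof.sum() + 1e-6)
print(hof)   # one 8-bin HOF per frame pair; concatenate or bag these over the sequence
```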
However, if your application requires just the comparison of two images, I would suggest extracting the Histogram of Oriented Gradients (HOG) for each image and then comparing them with the histogram intersection measure or, again, using the Bag of Words paradigm (better if you're looking for higher-level representations of the images). HOG descriptors are provided within the OpenCV library (link).
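A minimal sketch of that single-image route (Python/OpenCV; file names and the resize to HOG's default 64x128 window are assumptions): compute a HOG descriptor per image and compare with histogram intersection.

```python
import cv2
import numpy as np

def hog_descriptor(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))                  # HOGDescriptor's default window size
    return cv2.HOGDescriptor().compute(img).flatten()

a = hog_descriptor("gesture_query.png")               # hypothetical file names
b = hog_descriptor("gesture_reference.png")

# Histogram intersection: sum of element-wise minima, normalised to [0, 1].
score = np.minimum(a, b).sum() / min(a.sum(), b.sum())
print(score)                                          # closer to 1.0 = more similar gestures
```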
I'm working on a project to recognize insects from user inputted images. I think that OpenCV is the route I'd like to take since I've worked with it before for facial recognition. I'm not using the camera feed and am instead using images provided by the user. For early development I plan to build in some sample images to ensure the concept is working before moving on to other features.
I would like to use 4-5 template images for each insect and have that be robust enough to detect the insect from the input image. If there are multiple insects I would like for them all to be detected and have their own rectangle drawn around them.
With that brief explanation, I am wondering what the best way to complete this task is. I know that OpenCV has template recognition, but the template size matters and I don't want to make the user ensure their insect is a certain amount of pixels in their image. Is there a way to work around this, possibly by rotating the template images or using variously sized templates? Or is there a better approach than template recognition for this project?
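One common workaround for the size problem is to run the template at several scales and keep the best response; a rough sketch is below (Python/OpenCV; file names, the scale range and the acceptance threshold are assumptions). Note this handles scale only, not rotation or deformation.

```python
import cv2
import numpy as np

scene = cv2.imread("user_photo.png", cv2.IMREAD_GRAYSCALE)          # hypothetical input image
template = cv2.imread("insect_template.png", cv2.IMREAD_GRAYSCALE)

best = None
for scale in np.linspace(0.5, 2.0, 16):                             # try the template at several sizes
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > scene.shape[0] or t.shape[1] > scene.shape[1]:
        continue
    result = cv2.matchTemplate(scene, t, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if best is None or max_val > best[0]:
        best = (max_val, max_loc, t.shape)

score, (x, y), (th, tw) = best
if score > 0.7:                                                      # hypothetical acceptance threshold
    cv2.rectangle(scene, (x, y), (x + tw, y + th), 255, 2)           # box around the best match
```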
Unfortunately without some form of constraints, you are essentially asking if computer vision has been solved! You have several unresolved, but very interesting research problems.
Let's reduce the problem to just classifying a sample insect in a fixed pose with controlled lighting as belonging to one of 100k insect categories; that would be tough.
Let's reduce the problem to recognizing a single insect instance in an arbitrary pose in 3D space; that would be tough.
Let's reduce the problem to recognizing a single insect instance in the same pose under arbitrary lighting conditions viewed with arbitrary optical sensors; that would be tough.
Successful computer vision in the wild is all about cleverly constraining the operating conditions; otherwise you are in research land. If you are in research land, then a cool thing to do is to try to exploit 3D CAD models to capture the huge variety in poses; here's a nice paper on recognizing chairs:
http://www.di.ens.fr/willow/research/seeing3Dchairs/
If you are not conducting research and, say, you're building an app, then you need to consider how you can guide the user, train the user, or trick the user into providing the best operating conditions for the recognition system.
(This was too big to put in the comments.)
In my view I have a simple ARGB drawable that takes about 2ms to draw but I can draw the same file as a bitmap in under 0.5ms (just some quick code, I can't really consider it an option). What are the best ways to optimize the drawing speed of a drawable?
It will depend on the number of drawables and how many times each gets drawn. For a small number (the exact cutoff will also depend on the device), I would suggest using Canvas, as it's a nice higher-level approach to drawing.
If you want to crank out a lot of images (think hundreds), I would suggest creating a GLSurfaceView and using OpenGL to render your images with VBOs tailored to your app. I would also recommend using a texture sheet if you go down this route, since you'll get a huge increase in performance at the cost of code complexity.
But this will also depend on the type of app. My background is in game development, so I use OpenGL exclusively for better performance. For a simple app (something along the lines of Androidify), Canvas should be fine. If you want a simple tutorial for OpenGL, I suggest visiting Bergman's series of posts on the topic (Google should give you a link). It is a nice intro to OpenGL.