I'm working on a project to recognize insects from user inputted images. I think that OpenCV is the route I'd like to take since I've worked with it before for facial recognition. I'm not using the camera feed and am instead using images provided by the user. For early development I plan to build in some sample images to ensure the concept is working before moving on to other features.
I would like to use 4-5 template images for each insect and have that be robust enough to detect the insect from the input image. If there are multiple insects I would like for them all to be detected and have their own rectangle drawn around them.
With that brief explanation, I am wondering what the best way to complete this task is. I know that OpenCV has template recognition, but the template size matters and I don't want to make the user ensure their insect is a certain amount of pixels in their image. Is there a way to work around this, possibly by rotating the template images or using variously sized templates? Or is there a better approach than template recognition for this project?
Unfortunately, without some form of constraints, you are essentially asking whether computer vision has been solved! You have several unresolved, but very interesting, research problems.
Let's reduce the problem to just classifying a sample insect in a fixed pose with controlled lighting as belonging to one of 100k insect categories; that would be tough.
Let's reduce the problem to recognizing a single insect instance in an arbitrary pose in 3D space; that would be tough.
Let's reduce the problem to recognizing a single insect instance in the same pose under arbitrary lighting conditions viewed with arbitrary optical sensors; that would be tough.
Successful computer vision in the wild is all about cleverly constraining the operating conditions; otherwise you are in research land. If you are in research land, then a cool thing to do is to try to exploit 3D CAD models to capture the huge variety in poses. Here's a nice one on recognizing chairs:
http://www.di.ens.fr/willow/research/seeing3Dchairs/
If you are not conducting research and, say, you're building an app, then you need to consider how you can guide the user, train the user, or trick the user into providing the best operating conditions for the recognition system.
(This was too big to put in comments.)
Related
I have no experience in augmented reality or image processing. I know there are lots of documents on the internet, but to look in the right places I need to know the basics first. I'm planning to code an Android app that will use augmented reality for a virtual fitting room, and I have determined some of the app's functionalities. My question is: how could I implement those functionalities, which topics should I look into, where should I start, which key functionalities should the app achieve, and which open-source SDK would you suggest, so that I can do deeper research?
-- Virtualizing clothes which will be provided by me and make them usable for app
-- Which attributes should virtualized clothes have and how to store them
-- Scan real-life clothes, virtualize them and make usable for app
-- Tracking human who will try on those clothes
-- Human body size varies, so the clothes should also be resized to fit each person
-- Clothes should look as realistic as possible
-- Whenever a person moves, the clothes should also move with that person (if the person bends, the clothes also bend and still fit that person). And it should be as quick as possible.
Have you tried Snapchat's face filters?
It's essentially the same problem. They need to:
Create a model of a face (where are the eyes, nose, mouth, chin, etc)
Create a texture to map onto the model of the face
Extract faces from an image/video and map the 2D coordinates from the image to the model of the face you've defined
Draw/Render the texture on top of the image/video feed
Now you'd have to do the same, but instead you'd do it for a human body.
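The mapping step in the list above (image coordinates to model coordinates) can be sketched in its simplest 2D form: a similarity transform (scale, rotation, translation) estimated from two matching points. This is only an illustration; the shoulder landmarks and all coordinates below are hypothetical, and a real body tracker would supply many more points and, as noted next, a 3D model.

```python
def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Return a function mapping garment-template (x, y) to image (x, y).

    Points are treated as complex numbers, so the transform is
    z -> m * z + t, where m encodes scale and rotation and t the shift.
    """
    sa, sb = complex(*src_a), complex(*src_b)
    da, db = complex(*dst_a), complex(*dst_b)
    if sa == sb:
        raise ValueError("source points must be distinct")
    m = (db - da) / (sb - sa)
    t = da - m * sa

    def apply(point):
        z = m * complex(*point) + t
        return (z.real, z.imag)

    return apply


# Hypothetical data: shoulder anchors on a shirt texture, and the
# shoulder positions a body tracker reported in the camera frame.
shirt_shoulders = ((0.0, 0.0), (100.0, 0.0))
detected_shoulders = ((200.0, 150.0), (400.0, 150.0))

to_image = similarity_from_two_points(*shirt_shoulders, *detected_shoulders)
hem_in_image = to_image((50.0, 120.0))  # where a point on the hem lands
```

Every pixel of the garment texture would be pushed through the same mapping before being drawn over the frame.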
The issue you'd have to deal with is that only "half" of your body is visible to the camera at any time (because the other half is facing away from the camera). Also, your textures would have to map to a 3D model, rather than the relatively 2D model of a face (facial features lie mostly on a flat plane, which is a good enough approximation).
Good luck!
I have searched a lot for how to develop a 360 camera app like Google Street View, but I still have not found a solution.
I tried panoramagl-android, but it is not what I am looking for.
Can anyone please give me an idea or suggest how to create a spherical camera application?
360 images and videos are generally created with dedicated cameras or groups of regular cameras, and the result then 'stitched' together to produce the 360 representation.
The usual way to represent a 360 image or video at this time is an equirectangular projection, similar to the technique used to depict the spherical globe on flat maps of the world.
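As a rough illustration of that projection: longitude maps linearly to the horizontal pixel axis and latitude to the vertical one. The function names and the 4096x2048 image size below are assumptions chosen for the example, not part of any particular library.

```python
def direction_to_equirect(lon_deg, lat_deg, width, height):
    """Map a viewing direction to pixel coordinates in an equirectangular image.

    lon_deg is in [-180, 180) with 0 at the image centre;
    lat_deg is in [-90, 90] with +90 (straight up) on the top row.
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y


def equirect_to_direction(x, y, width, height):
    """Inverse mapping: pixel coordinates back to (longitude, latitude)."""
    lon_deg = x / width * 360.0 - 180.0
    lat_deg = 90.0 - y / height * 180.0
    return lon_deg, lat_deg


# Looking straight ahead lands in the middle of a 4096x2048 panorama.
centre = direction_to_equirect(0.0, 0.0, 4096, 2048)
```

A viewer app does the inverse lookup for every screen pixel, which is why the format is convenient for playback.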
If you are trying to do this with a regular phone, you face the issue that you only have one camera, so you won't get images from multiple cameras at the same time to stitch together. This is maybe easier to understand visually - this is an example of a setup to capture multiple views:
You then need software to 'stitch' the different videos together. There are quite a few options, many of them proprietary; VideoStitch is probably the best known at this time: http://www.video-stitch.com/.
Note that this is processing intensive, so it is nearly always done on relatively high-powered servers rather than on mobile devices.
The requirement is to create an Android application running on one specific mobile device that records video of a human eye pupil dilating in response to a bright light (which is physically attached to the mobile device). The video is then post-processed frame by frame on the device to detect & measure the diameter of the pupil AND the iris in each frame. Note the image processing does NOT need doing in real-time. The end result will be a dataset describing the changes in pupil (& iris) size over time. It's expected that the iris size can be used to enhance confidence in the pupil diameter data (eg removing pupil size data that's wildly wrong), but also as a relative measure for how dilated the eye is at any point.
I am familiar with developing Android mobile apps, but my experience with image processing is very limited. I've researched solutions and it seems that the answer may lie with the OpenCV/JavaCv libraries, which should provide shape detection (eg http://opencvlover.blogspot.co.uk/2012/07/hough-circle-in-javacv.html) but can anyone provide guidance on these specific questions:
Am I right to think it can detect the two circle shapes within a bitmap, one inside the other? ie shapes inside each other is not a problem.
Is it true that JavaCv can detect a circle, and return a position & radius/diameter? ie it doesn't return a set of vertices that then require further processing to compare with a circle? It seems to have a HoughCircle method, so I think yes.
What processing of each frame is typically used before doing shape detection? For example an algorithm to enhance edges, smooth, or remove colour?
Can I use it to not just detect presence of, but measure the diameter of the detected circles? (in pixels, but then can easily be converted to real-world measurements because known hardware is being used). I think yes, but would be great to hear confirmation from those more familiar.
This project is a non-commercial charitable project, so any help especially appreciated.
I would really suggest using the NDK, as it is a bit richer in features. It also lets you run and test your algorithms on a laptop with images before pushing them to a device, which speeds up development.
Pre-processing steps:
Typically one would use thresholding or Canny edge detection, plus morphological operations such as erode and dilate.
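To make those steps concrete, here is a dependency-free sketch of what thresholding followed by erode and dilate (a morphological "opening") does to a tiny grayscale frame. It is only an illustration on nested lists; in a real app you would call OpenCV's own threshold, erode and dilate functions rather than this code, and the threshold level used here is an arbitrary assumption.

```python
def threshold(gray, level):
    """Binary mask: 1 where a pixel is darker than `level` (the pupil is dark)."""
    return [[1 if value < level else 0 for value in row] for row in gray]


def _morph(binary, combine):
    """Apply `combine` (min or max) over each pixel's 3x3 neighbourhood.

    Windows are clipped at the image border, so pixels outside the
    image are ignored rather than treated as zeros.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(window)
    return out


def erode(binary):
    return _morph(binary, min)   # shrinks blobs, deletes isolated specks


def dilate(binary):
    return _morph(binary, max)   # grows the surviving blobs back


def opening(binary):
    return dilate(erode(binary))


# Hypothetical 7x7 frame: bright background, a dark 3x3 "pupil",
# and one dark noise pixel in the top-right corner.
frame = [[200] * 7 for _ in range(7)]
for row in range(2, 5):
    for col in range(2, 5):
        frame[row][col] = 20
frame[0][6] = 20

mask = opening(threshold(frame, 100))
```

After the opening, the noise pixel is gone and the 3x3 blob is restored to its original size.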
For detecting the iris and pupil, HoughCircles is not a very good method; feature detection methods like MSER work better for not-so-well-defined circles. Here is another answer I wrote on the same topic, which has code that could help.
If you are looking to measure the regions, I would suggest going through this blog. It has a clear explanation on the steps involved for a reasonably accurate measurement.
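Once per-frame diameters are available, the idea from the question (using the iris as a sanity check and as a relative scale) could be sketched as follows. This assumes the iris diameter stays roughly constant across frames while the pupil changes; the function name, the 10% tolerance and the sample numbers are all hypothetical.

```python
from statistics import median


def pupil_iris_ratios(frames, tolerance=0.10):
    """Return the pupil/iris diameter ratio per frame, or None for bad frames.

    frames: list of (pupil_diameter_px, iris_diameter_px) tuples.
    A frame is rejected when its iris diameter strays more than
    `tolerance` (as a fraction) from the median iris diameter, or when
    the pupil is not strictly smaller than the iris.
    """
    iris_reference = median(iris for _, iris in frames)
    ratios = []
    for pupil, iris in frames:
        iris_ok = abs(iris - iris_reference) <= tolerance * iris_reference
        pupil_ok = 0 < pupil < iris
        ratios.append(pupil / iris if iris_ok and pupil_ok else None)
    return ratios


# Hypothetical measurements in pixels; the third and fifth are bad detections.
measurements = [(40, 120), (38, 121), (90, 60), (30, 119), (130, 120)]
ratios = pupil_iris_ratios(measurements)
```

Because the ratio is unitless, it does not depend on how far the eye is from the camera, which makes it a convenient relative measure of dilation.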
I need to implement a simple Android application that allows users to draw a "simple" shape (circle, triangle etc) on their phone and then ask a server if the drawn shape matches one of the shapes in its database, which consists of a low number of shapes (let's say < 100, but can be more). In order to make this application work, I was thinking to use the following steps (we assume that the input image consists only of black & white pixels);
A. re-size & crop the input image in order to bring it to the same scale as the ones in the DB
B. rotate the input image by a small angle (let's say 15 degrees) x times (24 in this case) and try to match each of these rotations against each shape in the DB.
Questions:
For A, what would be the best approach? I was thinking to implement this step in the Android application, before sending the data to the server.
For B, what would be a decent algorithm of comparing 2 black & white pixel images that contain only a shape?
Is there any better / simpler way of implementing this? A solution that also has an implementation is desirable.
PS: I can see that many people have discussed similar topics around here, but I can't seem to find something that matches my requirements well enough.
Machine learning approach
You choose some features which describe contours, choose some classification method, prepare a training set of tagged contours, train the classifier, use it in the program.
Contour features. Given a contour (detected in the image or constructed from the user input), you can calculate rotation-invariant moments. The oldest and best known is the set of Hu moments.
You can also consider contour features such as eccentricity, area, convexity defects, the Fourier transform of the centroid distance function, and many others.
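For a feel of how such features behave, here is a small pure-Python sketch that computes the first two Hu invariants from the foreground pixels of a binary shape. It is an illustration only (OpenCV exposes the full set of seven through its moments and HuMoments functions); the function name and the rectangle test shapes are made up for the example.

```python
def hu_first_two(pixels):
    """First two Hu moment invariants of a binary shape.

    pixels: iterable of (x, y) coordinates of the foreground pixels.
    """
    pts = list(pixels)
    n = len(pts)  # zeroth moment (area) of a binary image
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Central moments: translation invariant.
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    # Normalised central moments of order 2: divide by area squared
    # to remove (approximately, on a pixel grid) the effect of scale.
    eta20, eta02, eta11 = mu20 / n ** 2, mu02 / n ** 2, mu11 / n ** 2
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2


def filled_rectangle(x0, y0, width, height):
    """Pixels of an axis-aligned filled rectangle (test helper)."""
    return [(x, y) for x in range(x0, x0 + width)
            for y in range(y0, y0 + height)]


wide = hu_first_two(filled_rectangle(0, 0, 10, 4))
tall = hu_first_two(filled_rectangle(7, 3, 4, 10))   # rotated 90 degrees and moved
square = hu_first_two(filled_rectangle(0, 0, 6, 6))
```

The wide and tall rectangles produce the same pair of values, while the square differs, so these numbers can serve as entries in a feature vector.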
Classifiers. Now you need to train a classifier. Support Vector Machines, Neural Networks, decision trees, Bayes classifiers are some of the popular methods. There are many methods to choose from. If you choose SVM, LIBSVM is a free SVM library, which works also in Java, and it works on Android too.
Ad-hoc rule approach
You can also approximate contour with a polygonal curve (see Ramer-Douglas-Peucker algorithm, there is a free implementation in OpenCV library, now available on Android). For certain simple forms like triangles or rectangles you can easily invent some ad-hoc heuristic rule which will "recognize" them (for example, if a closed contour can be approximated with just three segments and small error, then it is likely to be a triangle; if the centroid distance function is almost constant and there are zero convexity defects, then it is likely to be a circle).
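A minimal pure-Python sketch of that rule follows: simplify the closed contour with Ramer-Douglas-Peucker and count the remaining vertices. On a device you would more likely use OpenCV's approxPolyDP; this version uses point-to-segment distance, and the helper names, the epsilon of 5 pixels and the synthetic wobbly stroke are all assumptions for illustration.

```python
import math


def _point_segment_distance(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


def count_corners(contour, epsilon):
    """Vertices left after simplifying a closed contour (first point not repeated)."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    # Start at the point farthest from the centroid so that the start is
    # itself a corner (this choice assumes a roughly convex shape).
    s = max(range(len(contour)),
            key=lambda i: math.hypot(contour[i][0] - cx, contour[i][1] - cy))
    pts = contour[s:] + contour[:s]
    far = max(range(len(pts)),
              key=lambda i: math.hypot(pts[i][0] - pts[0][0], pts[i][1] - pts[0][1]))
    first_half = rdp(pts[:far + 1], epsilon)
    second_half = rdp(pts[far:] + [pts[0]], epsilon)
    return len(first_half) + len(second_half) - 2


def wobbly_polygon(vertices, per_edge=10):
    """Hypothetical stand-in for a hand-drawn stroke: edges with small jitter."""
    pts = []
    for k, (ax, ay) in enumerate(vertices):
        bx, by = vertices[(k + 1) % len(vertices)]
        for i in range(per_edge):
            t = i / per_edge
            wobble = 0.5 if i % 2 else -0.5
            pts.append((ax + t * (bx - ax) + wobble, ay + t * (by - ay) + wobble))
    return pts


triangle = wobbly_polygon([(0, 0), (100, 0), (50, 80)])
is_triangle = count_corners(triangle, epsilon=5.0) == 3
```

The epsilon value controls how much hand wobble is tolerated; too small and every jitter becomes a corner, too large and real corners are smoothed away.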
Since this is very closely related to handwriting recognition, you can use a simple HMM (hidden Markov model) algorithm to compare shapes against a pre-learnt DB.
For a much simpler approach, you can detect the corners in the image and then count them to identify the shape.
The first approach can be used for arbitrarily complicated shapes; the second only suits basic shapes.
You can use a supervised learning approach. For the problem you are trying to solve I think simple classifiers like Naive Bayes, KNN, etc. should give you good results.
You need to extract features from each of the images. For each image you can save them in a vector; let's call it the feature vector. For the images in your database you already know the type of shape, so you can include the id of the type in the feature vector. This will serve as the training set.
Once you have your training set, you can train your classifier and every time you want to classify a new shape you just get its feature vector and use it to query the classifier.
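As a sketch of that query step, the simplest possible classifier is 1-nearest-neighbour: return the label of the stored feature vector closest to the query. The two features used below (corner count divided by 10, and circularity 4*pi*area/perimeter^2) and the stored values are assumptions chosen for illustration, not a recommendation of specific features.

```python
import math


def nearest_neighbour(training_set, query):
    """Return the label of the training vector closest to `query`.

    training_set: list of (feature_vector, label) pairs.
    Distance is plain Euclidean distance between feature vectors.
    """
    best_label, best_distance = None, math.inf
    for features, label in training_set:
        distance = math.sqrt(sum((f - q) ** 2 for f, q in zip(features, query)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical training set: (corners / 10, circularity) per known shape.
# Circularity is 1.0 for a circle, about 0.785 for a square and
# about 0.605 for an equilateral triangle.
shapes = [
    ((0.0, 1.000), "circle"),
    ((0.4, 0.785), "square"),
    ((0.3, 0.605), "triangle"),
]

guess = nearest_neighbour(shapes, (0.3, 0.58))  # a slightly squashed triangle
```

With fewer than 100 shapes in the database, a linear scan like this is fast enough; several training vectors per shape would make it more tolerant of sloppy drawings.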
I recommend using scale- and rotation-invariant features, so you will not have to re-size each image, and you only need to compare it once instead of rotating it.
You can do a quick search for scale/rotation-invariant features and try them.
I just need some guidance on how to detect a marker and output text. For example, when a marker with an image of a dog is detected, the text "DOG" is shown in a text field. Can someone help me with my idea? Also, which one is more effective for this, NyARToolkit or AndAR? Thanks :)
What you're looking for isn't augmented reality, it's object recognition. AR is chiefly concerned with presenting data overlaid on the real world, so computation is devoted each frame to determining the position of the object relative to the camera. If you don't intend to use this data, AR libraries may be inefficient. That said...
AR marker tracking libraries usually find markers by prominent features like corners, and can distinguish markers by binary patterns encoded inside the marker or in the marker's borders. If you're happy with having the "dog" part encoded in the border of a marker, there are libraries you can use, like Qualcomm's AR development kit. This library and Metaio's Unifeye Mobile can also do natural feature tracking on pre-defined images. If you're happy with being able to recognize one specific image, or images of dogs that you have defined in advance, either of these should be OK. You might have to manipulate your dog images to get good features they can identify and track. Natural objects can be problematic.
General object recognition (being able to recognize a picture of any dog, not known beforehand) is still a research topic. There are approaches, but they're mostly very computationally intensive, and most mobile solutions involve offloading the serious computation to a server. Recognition of simple outline sketches, however, is more tractable. There's a great paper called "Shape recognition and pose estimation for mobile augmented reality" (I can't find a copy online, but the IEEE link is here) that uses contours to identify objects - this is light enough to run on a mobile (and it's pure genius).