The purpose of my application is to take a photo of a hand (gesture) and compare it with a picture stored in a database. The first option I tried was background subtraction on the images:
http://docs.opencv.org/trunk/doc/tutorials/video/background_subtraction/background_subtraction.html.
The solution works, but sometimes, depending on the first picture, the hand is not cut out properly.
The second option is to detect skin color: http://bytefish.de/blog/opencv/skin_color_thresholding/
Or is it better to use hand detection based on XML cascade files? To compare the images I wanted to use this method: http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html
Let me remind you that I am talking about comparing images that contain gestures. I have also read that it is possible to compute the histogram not over the entire image but only over the subject of the photo, which makes the data more reliable, but I do not know how to do that.
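Based on that histogram comparison tutorial, I imagine the comparison would look roughly like this (just a sketch; the mask part is my guess, since that is exactly what I do not know how to do):

```python
import cv2

def compare_hand_histograms(img1, img2, mask1=None, mask2=None):
    # Convert to HSV, as in the OpenCV histogram comparison tutorial
    hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
    hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)
    # Histogram over hue and saturation only; the optional mask (e.g. from
    # background subtraction or skin thresholding) restricts it to the hand
    h1 = cv2.calcHist([hsv1], [0, 1], mask1, [50, 60], [0, 180, 0, 256])
    h2 = cv2.calcHist([hsv2], [0, 1], mask2, [50, 60], [0, 180, 0, 256])
    cv2.normalize(h1, h1, 0, 1, cv2.NORM_MINMAX)
    cv2.normalize(h2, h2, 0, 1, cv2.NORM_MINMAX)
    # Correlation: 1.0 means identical histograms
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
```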
I want to compare the gesture as a single image; I am not talking about sequences here. One picture is compared with the baseline. Gesture detection is meant to kick in when, for example, someone rings, and then I will have, say, 5 seconds to take pictures and compare them with the base, because I am not certain whether a hand appeared in the lens or not. Unless there is another solution.
Ultimately, it is meant to be a comparison of two images, each of which contains some hand gesture.
If your goal is to perform gesture recognition, you should take into account that gestures are sequences of images.
Thus, if you want to compare gestures, you will have to find a "smart" way to compare whole sequences rather than single images, because one frame can belong to different gestures.
State-of-the-art approaches for gesture recognition involve extracting the optical flow between two consecutive frames and then computing the histogram of optical flow (HOF). Having computed the histograms for all the frame pairs in the video sequence, you can use different strategies to compare gestures:
You can concatenate all the HOFs in the sequence and then perform histogram intersection to compare the two sequences
You can use the Bag of Words paradigm to create a representation of the HOFs
Here are some pointers to these strategies:
Optical Flow
You can check this article for extracting HOF: "Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions"
Bag of Words
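A minimal sketch of the HOF idea in Python/OpenCV (Farneback dense flow; the bin count and parameters are illustrative, not taken from the paper):

```python
import cv2
import numpy as np

def hof(prev_gray, next_gray, bins=9):
    # Dense optical flow between two consecutive grayscale frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Histogram of flow orientations, weighted by flow magnitude
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def sequence_descriptor(frames):
    # Concatenate the per-frame-pair HOFs into one long vector
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    return np.concatenate([hof(a, b) for a, b in zip(grays, grays[1:])])

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()
```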
However, if your application requires just the comparison of two images, I would suggest extracting the Histogram of Oriented Gradients (HOG) for each image and then comparing them with the histogram intersection measure or, again, using the Bag of Words paradigm (which is better if you are looking for higher-level representations of the images). HOG descriptors are provided within the OpenCV libraries (link).
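For the two-image case, a rough sketch with OpenCV's HOGDescriptor (the default 64x128 window is used here, so both images are resized to it; adjust as needed):

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # default 64x128 window, 9 orientation bins

def hog_descriptor(img):
    gray = cv2.cvtColor(cv2.resize(img, (64, 128)), cv2.COLOR_BGR2GRAY)
    d = hog.compute(gray).ravel()
    return d / (d.sum() + 1e-8)

def compare_images(img1, img2):
    # Histogram intersection: higher means more similar
    return np.minimum(hog_descriptor(img1), hog_descriptor(img2)).sum()
```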
Before I make a giant word dump, my effective question is this:
Can I supply some extra information / heuristics to ARCore to refine its idea of what the pose of a detected augmented image is? Or can I use the pose of other trackable objects to refine the pose of a detected augmented image?
For more info, here is some background information on my workflow:
My AR app revolves around overlaying various 3D CAD models on top of their real-world machine equivalents. The user interaction goes like this:
The user will adhere a QR code (sized .2 meters by .2 meters) to a predetermined location on the associated machine (location is specific to the type of machine).
The user will then load up the app, point the camera at the QR code and the app will pass the camera image to a QR code reading library and use the payload (an id for a specific machine) to retrieve the associated CAD Model & metadata.
Once the QR code is detected I can use the QR code reading library to construct a pristine image of the QR code and pass this image to ARCore so that it can detect it in 3D space from the camera.
Once the QR code is detected in 3D space, I attach an anchor and I use the knowledge of where the QR code should be placed on the given model (also retrieved from my database using the payload info) to determine a basis for my CAD model.
Information can be overlaid using the CAD model to show various operations / interactions.
Now I've got all this working pretty well, but I've run into some issues where the model is never positioned exactly at its real-world equivalent and requires some manual positional adjustment after the fact to get things just right. I have some ideas for how to resolve this, but I don't know how feasible any of them are:
ArTrackable_acquireNewAnchor allows you to specify multiple anchors per trackable with different poses. I assume this will refine the tracking of the object, but I'm not clear on how to use this API. I'm currently just passing the pose generated from ArAugmentedImage_getCenterPose, so I don't know what other poses I would pass.
If I promote my QR code anchor to a cloud anchor after detection, will that aid in detecting / refining the QR pose in the future?
If I try to match other features detected by ARCore (like planes) to known topology in the real environment (like floors / walls), could I better approximate the position of the QR code image, or provide some heuristic to ARCore so that it can?
Instead of using a single QR code image, what if I use a set of images (one QR code and two other static images) that are slightly offset from each other? If we know how far apart these images are in the real world, we can use this information to correct for the error in ARCore's estimate of where they are.
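For that last idea, this is roughly the correction I have in mind (just back-of-the-envelope NumPy, not ARCore API code; the marker layout and names are made up):

```python
import numpy as np

# Known physical offsets of each marker from the QR code, in the machine's
# frame (made-up layout: two extra markers 0.3 m and 0.6 m to the right)
KNOWN_OFFSETS = [np.zeros(3),
                 np.array([0.3, 0.0, 0.0]),
                 np.array([0.6, 0.0, 0.0])]

def refined_qr_position(detected_positions, qr_rotation):
    """detected_positions: world-space translations ARCore reported for each
    marker; qr_rotation: 3x3 rotation matrix of the QR code's pose.
    Each marker 'votes' for where the QR origin should be; averaging the
    votes should cancel some of the per-image pose error."""
    votes = [p - qr_rotation @ d
             for p, d in zip(detected_positions, KNOWN_OFFSETS)]
    return np.mean(votes, axis=0)
```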
Sorry for the giant word dump, but I figured the more info the better. Any other ideas outside the framing of my question are also appreciated.
I am new to image processing. I have a data set of images and I want to perform calibration on those images based on a target image. I have searched a lot on image calibration, but the majority of the results concern camera calibration. I am confused as to whether these are the same or different things. Can anybody explain the difference between these two terms?
On reading through one of the results on image calibration, I got to know that there are three steps that I need to perform:
Bias Frame Calibration
Dark Frame Calibration
Flat Field Frame Calibration
Also, I need to perform this in Android. For that, I have figured out that I will need to use OpenCV or JavaCV.
So, I want to know if these 3 steps will be possible using OpenCV/JavaCV or not?
Calibration is a process that exploits some knowledge about the data to reconstruct measurements so that they are more accurate or suit a specific need. As we have no idea what the desired result of your calibration is, it is hard to say more.
In general the difference is as follows:
Camera calibration
You have a camera and want the captured images to satisfy some condition. This process usually means taking images of predefined objects like color markers, a geometric checkerboard, LASER sweeps, etc. This way you can obtain the camera parameters needed to reconstruct some specific feature of the image for any other image taken (assuming the important parameters do not change with time, like camera position or exposure time ...).
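For example, the classic checkerboard calibration in OpenCV looks roughly like this (a sketch; the board size and image file names are placeholders):

```python
import cv2
import numpy as np

PATTERN = (9, 6)  # inner corners of the checkerboard (placeholder size)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in ["calib_01.jpg", "calib_02.jpg"]:  # placeholder image names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Camera matrix and lens distortion coefficients, reusable for later images
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread("new_image.jpg"), K, dist)
```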
Image calibration
This is similar, but the input images can be obtained from different sources (different cameras, renders, simulations, etc.) or under different circumstances (exposure, lighting, etc.). In this case we do not have the luxury of a calibration process, so instead we need to find some kind of known feature in the images and correct the rest of the image (for example, an object of known size, color, temperature, etc.).
So the difference is: camera calibration is when you have a single imaging device as the source of images, and image calibration is when you have multiple image sources (often unknown).
I am not using OpenCV, but as people use this library for such tasks, it should support operations like this.
Here is a small example of such an operation:
OpenCV Birdseye view without loss of data
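As for the three steps from your question (bias / dark / flat), they are just per-pixel arithmetic, so OpenCV, JavaCV, or any array library can do them. A simplified sketch (assuming the dark frames match the light-frame exposure and the master frames are plain averages):

```python
import numpy as np

def master(frames):
    # Average a stack of frames into one master frame
    return np.mean(np.stack(frames).astype(np.float64), axis=0)

def calibrate(light, bias_frames, dark_frames, flat_frames):
    master_bias = master(bias_frames)
    master_dark = master(dark_frames) - master_bias   # thermal signal only
    master_flat = master(flat_frames) - master_bias
    master_flat /= master_flat.mean()                 # normalize to mean 1
    # Remove bias and dark current, then divide out per-pixel sensitivity
    # and vignetting
    return (light.astype(np.float64) - master_bias - master_dark) / master_flat
```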
The requirement is to create an Android application running on one specific mobile device that records video of a human eye pupil dilating in response to a bright light (which is physically attached to the mobile device). The video is then post-processed frame by frame on the device to detect & measure the diameter of the pupil AND the iris in each frame. Note the image processing does NOT need doing in real-time. The end result will be a dataset describing the changes in pupil (& iris) size over time. It's expected that the iris size can be used to enhance confidence in the pupil diameter data (eg removing pupil size data that's wildly wrong), but also as a relative measure for how dilated the eye is at any point.
I am familiar with developing Android mobile apps, but my experience with image processing is very limited. I've researched solutions and it seems that the answer may lie with the OpenCV/JavaCV libraries, which should provide shape detection (eg http://opencvlover.blogspot.co.uk/2012/07/hough-circle-in-javacv.html), but can anyone provide guidance on these specific questions:
Am I right to think it can detect the two circle shapes within a bitmap, one inside the other? ie shapes inside each other is not a problem.
Is it true that JavaCV can detect a circle and return a position & radius/diameter? ie it doesn't return a set of vertices that then require further processing to compare with a circle? It seems to have a HoughCircle method, so I think yes.
What processing of each frame is typically used before doing shape detection? For example an algorithm to enhance edges, smooth, or remove colour?
Can I use it to not just detect presence of, but measure the diameter of the detected circles? (in pixels, but then can easily be converted to real-world measurements because known hardware is being used). I think yes, but would be great to hear confirmation from those more familiar.
This project is a non-commercial charitable project, so any help especially appreciated.
I would really suggest using the NDK, as it is a bit richer in features. It also allows you to run and test your algorithms on a laptop with images before pushing them to a device, which speeds up development.
Pre-processing steps:
Typically one would use thresholding or Canny edge detection, followed by morphological operations like erode and dilate.
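A typical preprocessing sketch in OpenCV terms (the parameters are illustrative, not tuned for your camera):

```python
import cv2

def preprocess(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7, 7), 0)
    # The pupil is usually the darkest region; threshold it out
    _, mask = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY_INV)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.dilate(mask, kernel, iterations=2)
    return mask
```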
For detection of the iris / pupil, HoughCircles is not a very good method; feature detection methods like MSER work better for not-so-well-defined circles. Here is another answer I wrote on the same topic, which has code that could help.
If you are looking to measure the regions, I would suggest going through this blog. It has a clear explanation of the steps involved for a reasonably accurate measurement.
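For the measurement itself, one way (a sketch building on a binary mask like the one above, OpenCV 4.x) is to take the largest contour and fit a circle, which directly gives a diameter in pixels:

```python
import cv2

def pupil_diameter_px(mask):
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (x, y), radius = cv2.minEnclosingCircle(largest)
    return 2 * radius  # pixels; convert using the known optics / geometry
```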
What I am attempting to do is use EMGU to perform an AbsDiff of two images.
Given the following conditions:
User starts their webcam and with the webcam stationary takes a picture.
User moves into the frame and takes another picture (WebCam has NOT moved).
AbsDiff works well, but what I'm finding is that the ISO and white balance adjustments made by certain cameras (even on Android and iPhone) are uncontrollable to a degree.
Therefore, instead of fighting a losing battle, I'd like to attempt some image post-processing to see if I can equalize the two.
I found the following thread but it's not helping me much: How do I equalize contrast & brightness of images using opencv?
Can anyone offer specific details of what functions/methods/approach to take using EMGUCV?
I've tried using things like _EqualizeHist(). This yields very poor results.
Instead of equalizing the histograms for each image individually, I'd like to compare the brightness/contrast values and come up with an average that gets applied to both.
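Roughly what I have in mind, sketched in Python/OpenCV terms since I don't have working EMGU code for it yet (the idea should translate: mean as brightness, standard deviation as contrast, and both images are pushed toward the shared average):

```python
import cv2
import numpy as np

def match_to_average(img_a, img_b):
    a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY).astype(np.float32)
    b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY).astype(np.float32)
    target_mean = (a.mean() + b.mean()) / 2
    target_std = (a.std() + b.std()) / 2

    def remap(x):
        # Shift/scale so the image has the shared mean and std
        y = (x - x.mean()) / (x.std() + 1e-8) * target_std + target_mean
        return np.clip(y, 0, 255).astype(np.uint8)

    return remap(a), remap(b)
```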
I'm not looking for someone to do the work for me (although a code example would CERTAINLY be appreciated). I'm looking for either exact guidance or some way to point the ship in the right direction.
Thanks for your time.
I need to implement a simple Android application that allows users to draw a "simple" shape (circle, triangle, etc.) on their phone and then asks a server whether the drawn shape matches one of the shapes in its database, which consists of a low number of shapes (let's say < 100, but it can be more). In order to make this application work, I was thinking of using the following steps (we assume that the input image consists only of black & white pixels):
A. re-size & crop the input image in order to bring it to the same scale as the ones in the DB
B. rotate the input image by a small angle (let's say 15 degrees) x times (24 in this case) and try to match each of these rotations against each shape in the DB.
Questions:
For A, what would be the best approach? I was thinking to implement this step in the Android application, before sending the data to the server.
For B, what would be a decent algorithm of comparing 2 black & white pixel images that contain only a shape?
Is there any better / simpler way of implementing this? A solution that also has an implementation is desirable.
PS: I can see that many people have discussed similar topics around here, but I can't seem to find something that matches my requirements well enough.
Machine learning approach
You choose some features which describe contours, choose a classification method, prepare a training set of tagged contours, train the classifier, and use it in the program.
Contour features. Given a contour (detected in the image or constructed from the user input), you can calculate rotation-invariant moments. The oldest and most well-known is the set of Hu moments.
You can also consider such features of the contour as eccentricity, area, convexity defects, FFT transform of the centroid distance function and many others.
Classifiers. Now you need to train a classifier. Support Vector Machines, neural networks, decision trees, and Bayes classifiers are some of the popular methods; there are many to choose from. If you choose SVM, LIBSVM is a free SVM library which also works in Java, and it works on Android too.
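A minimal sketch of the contour-feature part with OpenCV's Hu moments and a k-NN classifier (the log transform is the usual trick to compress the moments' dynamic range):

```python
import cv2
import numpy as np

def hu_features(binary_img):
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    hu = cv2.HuMoments(cv2.moments(contour)).ravel()
    # Log-scale the moments so they are comparable in magnitude
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# train_imgs: binary shape images; labels: integer shape ids from the DB
def train_knn(train_imgs, labels):
    samples = np.array([hu_features(im) for im in train_imgs], dtype=np.float32)
    knn = cv2.ml.KNearest_create()
    knn.train(samples, cv2.ml.ROW_SAMPLE, np.array(labels, dtype=np.int32))
    return knn

def classify(knn, img, k=3):
    sample = hu_features(img).astype(np.float32).reshape(1, -1)
    _, result, _, _ = knn.findNearest(sample, k)
    return int(result[0, 0])
```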
Ad-hoc rule approach
You can also approximate the contour with a polygonal curve (see the Ramer-Douglas-Peucker algorithm; there is a free implementation in the OpenCV library, now available on Android). For certain simple forms like triangles or rectangles you can easily invent an ad-hoc heuristic rule that will "recognize" them (for example, if a closed contour can be approximated with just three segments and a small error, it is likely to be a triangle; if the centroid distance function is almost constant and there are zero convexity defects, it is likely to be a circle).
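A sketch of the ad-hoc rule idea using OpenCV's approxPolyDP (the Ramer-Douglas-Peucker implementation mentioned above; the epsilon factor and the vertex-count rules are heuristics to tune):

```python
import math
import cv2

def guess_shape(contour):
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
    n = len(approx)
    if n == 3:
        return "triangle"
    if n == 4:
        return "rectangle"
    # Many vertices and an area close to the enclosing circle's area
    # suggest a circle
    (_, _), r = cv2.minEnclosingCircle(contour)
    if n > 6 and cv2.contourArea(contour) > 0.8 * math.pi * r * r:
        return "circle"
    return "unknown"
```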
Since this is very closely related to handwriting recognition, you can use a simple HMM algorithm to compare shapes against a pre-learnt DB.
But for a much simpler approach, you can detect the corners in the image and then count them to identify shapes.
The first approach can be used for any complicated shape; the second suits only basic shapes.
You can use a supervised learning approach. For the problem you are trying to solve I think simple classifiers like Naive Bayes, KNN, etc. should give you good results.
You need to extract features from each of the images. For each image you can save them in a vector; let's call it the feature vector. For the images you have in your database you already know the type of shape, so you can include the id of the type in the feature vector. This will serve as the training set.
Once you have your training set, you can train your classifier and every time you want to classify a new shape you just get its feature vector and use it to query the classifier.
I recommend you use scale- and rotation-invariant features, so you will not have to re-size each image, and you only need to compare it once instead of rotating it.
You can do a quick search for Scale/Rotate invariant features and try them.