Hello stackoverflow community I would like if someone can guide me a little regarding my next question, I want to make an application that takes a photo when it detects a sheet with 3 marks (black squares in the corners) similar to what a QR would have. I have read a little about opencv that I think could help me more however I am not very clear yet.
Here my example
Once you obtain your binary image, you can find contours and filter using contour approximation and contour area. If the approximated contour has a length of four then it must be a square and if it is within a lower and upper area range then we have detected a mark. We keep a counter of the mark and if there are three marks in the image, we can take the photo. Here's the visualization of the process.
We Otsu's threshold to obtain a binary image with the objects to detect in white.
From here we find contours using cv2.findContours and filter using contour approximation cv2.approxPolyDP in addition to contour area cv2.contourArea.
Detected marks highlighted in teal
I implemented it in Python but you can adapt the same approach
Code
import cv2
# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find contours and filter using contour approximation and contour area
marks = 0
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
area = cv2.contourArea(c)
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.04 * peri, True)
if len(approx) == 4 and area > 250 and area < 400:
x,y,w,h = cv2.boundingRect(c)
cv2.rectangle(image, (x, y), (x + w, y + h), (200,255,12), 2)
marks += 1
# Sheet has 3 marks
if marks == 3:
print('Take photo')
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()
Ok I have a strange problem. I'll try to describe it as best as I can.
I've learned my app to detect a car when looking on it from the side
Imgproc.cvtColor(aInputFrame, grayscaleImage, Imgproc.COLOR_RGBA2RGB);
MatOfRect objects = new MatOfRect();
// Use the classifier to detect cars
if (cascadeClassifier != null) {
cascadeClassifier.detectMultiScale(grayscaleImage, objects, 1.1, 1,
2, new Size(absoluteObjectSize, absoluteObjectSize),
new Size());
}
for (int i = 0; i < dataArray.length; i++) {
Core.rectangle(aInputFrame, dataArray[i].tl(), dataArray[i].br(),
new Scalar(0, 255, 0, 255), 3);
mRenderer.setCameraPosition(-5, 5, 60f);
}
Now, this code works nice. I mean that i detects cars and it marks them with green rectangle. The problem is that the marked rectangle jumps like hell. I mean even when the phone is hold still the rectangle jumps from left to right to middle. There is never one still rectangle. I hope I've described the problem properly. I would like to stabilizy the marking cause I want to draw an overlay based on it and I can't make it to jump like this
See(1) the parameters for detectMultiScale and it expects
image of type CV_8U. You will need to convert to gray scale image
with COLOR_RGBA2GRAY instead of COLOR_RGBA2RGB
In detectMultiScale, increase the number of neighbours parameter to avoid false positives.
Suggestion: If input is a video stream, don't run
detectMultiScale on every frame. It is slow even if you use LBP
cascades. Try detection in one frame, followed by tracking techniques.
I looking for some advices about recognition of three handwritten shapes - circles, diamonds and rectangles. I tried diffrent aproaches but they failed so maybe you could point me in another, better direction.
What I tried:
1) Simple algorithm based on dot product between points of handwritten shape and ideal shape. It works not so bad at recognition of rectangle, but failed on circles and diamonds. The problem is that dot product of the circle and diamond is quite similiar even for ideal shapes.
2) Same aproach but using Dynamic Time Warping as measure of simililarity. Similiar problems.
3) Neural networks. I tried few aproaches - giving points data to neural networks (Feedforward and Kohonen) or giving rasterized image. For Kohonen it allways classified all the data (event the sample used to train) into the same category. Feedforward with points was better (but on the same level as aproach 1 and 2) and with rasterized image it was very slow (I needs at least size^2 input neurons and for small sized of raster circle is indistinguishable even for me ;) ) and also without success. I think is because all of this shapes are closed figures? I am not big specialist of ANN (had 1 semester course of them) so maybe I am using them wrong?
4) Saving the shape as Freeman Chain Code and using some algorithms for computing similarity. I though that in FCC the shapes will be realy diffrent from each other. No success here (but I havent explorer this path very deeply).
I am building app for Android with this but I think the language is irrelevant here.
Here's some working code for a shape classifier. http://jsfiddle.net/R3ns3/ I pulled the threshold numbers (*Threshold variables in the code) out of the ether, so of course they can be tweaked for better results.
I use the bounding box, average point in a sub-section, angle between points, polar angle from bounding box center, and corner recognition. It can classify hand drawn rectangles, diamonds, and circles. The code records points while the mouse button is down and tries to classify when you stop drawing.
HTML
<canvas id="draw" width="300" height="300" style="position:absolute; top:0px; left:0p; margin:0; padding:0; width:300px; height:300px; border:2px solid blue;"></canvas>
JS
var state = {
width: 300,
height: 300,
pointRadius: 2,
cornerThreshold: 125,
circleThreshold: 145,
rectangleThreshold: 45,
diamondThreshold: 135,
canvas: document.getElementById("draw"),
ctx: document.getElementById("draw").getContext("2d"),
drawing: false,
points: [],
getCorners: function(angles, pts) {
var list = pts || this.points;
var corners = [];
for(var i=0; i<angles.length; i++) {
if(angles[i] <= this.cornerThreshold) {
corners.push(list[(i + 1) % list.length]);
}
}
return corners;
},
draw: function(color, pts) {
var list = pts||this.points;
this.ctx.fillStyle = color;
for(var i=0; i<list.length; i++) {
this.ctx.beginPath();
this.ctx.arc(list[i].x, list[i].y, this.pointRadius, 0, Math.PI * 2, false);
this.ctx.fill();
}
},
classify: function() {
// get bounding box
var left = this.width, right = 0,
top = this.height, bottom = 0;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
if(left > pt.x) left = pt.x;
if(right < pt.x) right = pt.x;
if(top > pt.y) top = pt.y;
if(bottom < pt.y) bottom = pt.y;
}
var center = {x: (left+right)/2, y: (top+bottom)/2};
this.draw("#00f", [
{x: left, y: top},
{x: right, y: top},
{x: left, y: bottom},
{x: right, y: bottom},
]);
// find average point in each sector (9 sectors)
var sects = [
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0}
];
var x3 = (right + (1/(right-left)) - left) / 3;
var y3 = (bottom + (1/(bottom-top)) - top) / 3;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
var sx = Math.floor((pt.x - left) / x3);
var sy = Math.floor((pt.y - top) / y3);
var idx = sy * 3 + sx;
sects[idx].x += pt.x;
sects[idx].y += pt.y;
sects[idx].c ++;
if(sx == 1 && sy == 1) {
return "UNKNOWN";
}
}
// get the significant points (clockwise)
var sigPts = [];
var clk = [0, 1, 2, 5, 8, 7, 6, 3]
for(var i=0; i<clk.length; i++) {
var pt = sects[clk[i]];
if(pt.c > 0) {
sigPts.push({x: pt.x / pt.c, y: pt.y / pt.c});
} else {
return "UNKNOWN";
}
}
this.draw("#0f0", sigPts);
// find angle between consecutive 3 points
var angles = [];
for(var i=0; i<sigPts.length; i++) {
var a = sigPts[i],
b = sigPts[(i + 1) % sigPts.length],
c = sigPts[(i + 2) % sigPts.length],
ab = Math.sqrt(Math.pow(b.x-a.x,2)+Math.pow(b.y-a.y,2)),
bc = Math.sqrt(Math.pow(b.x-c.x,2)+ Math.pow(b.y-c.y,2)),
ac = Math.sqrt(Math.pow(c.x-a.x,2)+ Math.pow(c.y-a.y,2)),
deg = Math.floor(Math.acos((bc*bc+ab*ab-ac*ac)/(2*bc*ab)) * 180 / Math.PI);
angles.push(deg);
}
console.log(angles);
var corners = this.getCorners(angles, sigPts);
// get polar angle of corners
for(var i=0; i<corners.length; i++) {
corners[i].t = Math.floor(Math.atan2(corners[i].y - center.y, corners[i].x - center.x) * 180 / Math.PI);
}
console.log(corners);
// whats the shape ?
if(corners.length <= 1) { // circle
return "CIRCLE";
} else if(corners.length == 2) { // circle || diamond
// difference of polar angles
var diff = Math.abs((corners[0].t - corners[1].t + 180) % 360 - 180);
console.log(diff);
if(diff <= this.circleThreshold) {
return "CIRCLE";
} else {
return "DIAMOND";
}
} else if(corners.length == 4) { // rectangle || diamond
// sum of polar angles of corners
var sum = Math.abs(corners[0].t + corners[1].t + corners[2].t + corners[3].t);
console.log(sum);
if(sum <= this.rectangleThreshold) {
return "RECTANGLE";
} else if(sum >= this.diamondThreshold) {
return "DIAMOND";
} else {
return "UNKNOWN";
}
} else {
alert("draw neater please");
return "UNKNOWN";
}
}
};
state.canvas.addEventListener("mousedown", (function(e) {
if(!this.drawing) {
this.ctx.clearRect(0, 0, 300, 300);
this.points = [];
this.drawing = true;
console.log("drawing start");
}
}).bind(state), false);
state.canvas.addEventListener("mouseup", (function(e) {
this.drawing = false;
console.log("drawing stop");
this.draw("#f00");
alert(this.classify());
}).bind(state), false);
state.canvas.addEventListener("mousemove", (function(e) {
if(this.drawing) {
var x = e.pageX, y = e.pageY;
this.points.push({"x": x, "y": y});
this.ctx.fillStyle = "#000";
this.ctx.fillRect(x-2, y-2, 4, 4);
}
}).bind(state), false);
Given the possible variation in handwritten inputs I would suggest that a neural network approach is the way to go; you will find it difficult or impossible to accurately model these classes by hand. LastCoder's attempt works to a degree, but it does not cope with much variation or have promise for high accuracy if worked on further - this kind of hand-engineered approach was abandoned a very long time ago.
State-of-the-art results in handwritten character classification these days is typically achieved with convolutional neural networks (CNNs). Given that you have only 3 classes the problem should be easier than digit or character classification, although from experience with the MNIST handwritten digit dataset, I expect that your circles, squares and diamonds may occasionally end up being difficult for even humans to distinguish.
So, if it were up to me I would use a CNN. I would input binary images taken from the drawing area to the first layer of the network. These may require some preprocessing. If the drawn shapes cover a very small area of the input space you may benefit from bulking them up (i.e. increasing line thickness) so as to make the shapes more invariant to small differences. It may also be beneficial to centre the shape in the image, although the pooling step might alleviate the need for this.
I would also point out that the more training data the better. One is often faced with a trade-off between increasing the size of one's dataset and improving one's model. Synthesising more examples (e.g. by skewing, rotating, shifting, stretching, etc) or spending a few hours drawing shapes may provide more of a benefit than you could get in the same time attempting to improve your model.
Good luck with your app!
A linear Hough transform of the square or the diamond ought to be easy to recognize. They will both produce four point masses. The square's will be in pairs at zero and 90 degrees with the same y-coordinates for both pairs; in other words, a rectangle. The diamond will be at two other angles corresponding to how skinny the diamond is, e.g. 45 and 135 or else 60 and 120.
For the circle you need a circular Hough transform, and it will produce a single bright point cluster in 3d (x,y,r) Hough space.
Both linear and circular Hough transforms are implemented in OpenCV, and it's possible to run OpenCV on Android. These implementations include thresholding to identify lines and circles. See pg. 329 and pg. 331 of the documentation here.
If you are not familiar with Hough transforms, the Wikipedia page is not bad.
Another algorithm you may find interesting and perhaps useful is given in this paper about polygon similarity. I implemented it many years ago, and it's still around here. If you can convert the figures to loops of vectors, this algorithm could compare them against patterns, and the similarity metric would show goodness of match. The algorithm ignores rotational orientation, so if your definition of square and diamond is with respect to the axes of the drawing surface, you will have to modify the algorithm a bit to differentiate these cases.
What you have here is a fairly standard clasification task, in an arguably vision domain.
You could do this several ways, but the best way isn't known, and can sometimes depend on fine details of the problem.
So, this isn't an answer, per se, but there is a website - Kaggle.com that runs competition for classifications. One of the sample/experiemental tasks they list is reading single hand written numeric digits. That is close enough to this problem, that the same methods are almost certainly going to apply fairly well.
I suggest you go to https://www.kaggle.com/c/digit-recognizer and look around.
But if that is too vague, I can tell you from my reading of it, and playing with that problem space, that Random Forests are a better basic starting place than Neural networks.
In this case (your 3 simple objects) you could try RanSaC-fitting for ellipse (getting the circle) and lines (getting the sides of the rectangle or diamond) - on each connected object if there are several objects to classify at the same time. Based on the actual setting (expected size, etc.) the RanSaC-parameters (how close must a point be to count as voter, how many voters you need at minimun) must be tuned. When you have found a line with RanSaC-fitting, remove the points "close" to it and go for the next line. The angles of the lines should make a distinction between diamand and rectangle easy.
A very simple approach optimized for classifying exactly these 3 objects could be the following:
compute the center of gravity of an object to classify
then compute the distances of the center to the object points as a function on the angle (from 0 to 2 pi).
classify the resulting graph based on the smoothness and/or variance and the position and height of the local maxima and minima (maybe after smoothing the graph).
I propose a way to do it in following steps : -
Take convex hull of the image (consider the shapes being convex)
divide into segments using clustering algorithms
Try to fit a curves or straight line to it and measure & threshold using training set which can be used for classifications
For your application try to divide into 4 clusters .
once you classify clusters as line or curves you can use the info to derive whether curve is circle,rectangle or diamond
I think the answers that are already in place are good, but perhaps a better way of thinking about it is that you should try to break the problem into meaningful pieces.
If possible avoid the problem entirely. For instance if you are recognizing gestures, just analyze the gestures in real time. With gestures you can provide feedback to the user as to how your program interpreted their gesture and the user will change what they are doing appropriately.
Clean up the image in question. Before you do anything come up with an algorithm to try to select what the correct thing is you are trying to analyze. Also use an appropriate filter (convolution perhaps) to remove image artifacts before you begin the process.
Once you have figured out what the thing is you are going to analyze then analyze it and return a score, one for circle, one for noise, one for line, and the last for pyramid.
Repeat this step with the next viable candidate until you come up with the best candidate that is not noise.
I suspect you will find that you don't need a complicated algorithm to find circle, line, pyramid but that it is more so about structuring your code appropriately.
If I was you I'll use already available Image Processing libraries like "AForge".
Take A look at this sample article:
http://www.aforgenet.com/articles/shape_checker
I have a jar on github that can help if you are willing to unpack it and obey the apache license. You can try to recreate it in any other language as well.
Its an edge detector. The best step from there could be to:
find the corners (median of 90 degrees)
find mean median and maximum radius
find skew/angle from horizontal
have a decision agent decide what the shape is
Play around with it and find what you want.
My jar is open to the public at this address. It is not yet production ready but can help.
Just thought I could help. If anyone wants to be a part of the project, please do.
I did this recently with identifying circles (bone centers) in medical images.
Note: Steps 1-2 are if you are grabbing from an image.
Psuedo Code Steps
Step 1. Highlight the Edges
edges = edge_map(of the source image) (using edge detector(s))
(laymens: show the lines/edges--make them searchable)
Step 2. Trace each unique edge
I would (use a nearest neighbor search 9x9 or 25x25) to identify / follow / trace each edge, collecting each point into the list (they become neighbors), and taking note of the gradient at each point.
This step produces: a set of edges.
(where one edge/curve/line = list of [point_gradient_data_structure]s
(laymens: Collect a set of points along the edge in the image)
Step 3. Analyze Each Edge('s points and gradient data)
For each edge,
if the gradient similar for a given region/set of neighbors (a run of points along an edge), then we have a straight line.
If the gradient is changing gradually, we have a curve.
Each region/run of points that is a straight line or a curve, has a mean (center) and other gradient statistics.
Step 4. Detect Objects
We can use the summary information from Step 3 to build conclusions about diamonds, circles, or squares. (i.e. 4 straight lines, that have end points near each other with proper gradients is a diamond or square. One (or more) curves with sufficient points/gradients (with a common focal point) makes a complete circle).
Note: Using an image pyramid can improve algorithm performance, both in terms of results and speed.
This technique (Steps 1-4) would get the job done for well defined shapes, and also could detect shapes that are drawn less than perfectly, and could handle slightly disconnected lines (if needed).
Note: With some machine learning techniques (mentioned by other posters), it could be helpful/important to have good "classifiers" to basically break the problem down into smaller parts/components, so then a decider further down the chain could use to better understand/"see" the objects. I think machine learning might be a little heavy-handed for this question, but still could produce reasonable results. PCA(face detection) could potentially work too.
In my game, a body is randomly relocated on the screen after the user does something. However, if the object is relocated on top of another body, then both are pushed slightly (to make room!). I would like to check the location of the randomly generated coordinates first, so that the relocation only takes place if the position is free (within a certain diameter anyway).
Something like.. location.hasBody(). There surely must be a function for this that I haven't found. Thanks!
There is no way to query a world with a point and get the body, but what you can do is query the world with a small box:
// Make a small box.
b2AABB aabb;
b2Vec2 d;
d.Set(0.001f, 0.001f);
aabb.lowerBound = p - d;
aabb.upperBound = p + d;
// Query the world for overlapping shapes.
QueryCallback callback(p);
m_world->QueryAABB(&callback, aabb);
if (callback.m_fixture)
{
//it had found a fixture at that position
}
Solution originally posted here: Cocos2d-iphone forum
Not sure if box2d includes a 'clean' way to do it. I'd just manually iterate over all bodies in the world just before adding a new one, and manually check if their positions + radio/size overlap with the new body shape.
try
b2Vec2 vec = body->GetPosition(); // in meters
or
CGPoint pos = ccp(body->GetPosition().x * PTM_RATIO, body->GetPosition().y * PTM_RATIO); // in pixels
In Android, I have a Path object which I happen to know defines a closed path, and I need to figure out if a given point is contained within the path. What I was hoping for was something along the lines of
path.contains(int x, int y)
but that doesn't seem to exist.
The specific reason I'm looking for this is because I have a collection of shapes on screen defined as paths, and I want to figure out which one the user clicked on. If there is a better way to be approaching this such as using different UI elements rather than doing it "the hard way" myself, I'm open to suggestions.
I'm open to writing an algorithm myself if I have to, but that means different research I guess.
Here is what I did and it seems to work:
RectF rectF = new RectF();
path.computeBounds(rectF, true);
region = new Region();
region.setPath(path, new Region((int) rectF.left, (int) rectF.top, (int) rectF.right, (int) rectF.bottom));
Now you can use the region.contains(x,y) method.
Point point = new Point();
mapView.getProjection().toPixels(geoPoint, point);
if (region.contains(point.x, point.y)) {
// Within the path.
}
** Update on 6/7/2010 **
The region.setPath method will cause my app to crash (no warning message) if the rectF is too large. Here is my solution:
// Get the screen rect. If this intersects with the path's rect
// then lets display this zone. The rectF will become the
// intersection of the two rects. This will decrease the size therefor no more crashes.
Rect drawableRect = new Rect();
mapView.getDrawingRect(drawableRect);
if (rectF.intersects(drawableRect.left, drawableRect.top, drawableRect.right, drawableRect.bottom)) {
// ... Display Zone.
}
The android.graphics.Path class doesn't have such a method. The Canvas class does have a clipping region that can be set to a path, there is no way to test it against a point. You might try Canvas.quickReject, testing against a single point rectangle (or a 1x1 Rect). I don't know if that would really check against the path or just the enclosing rectangle, though.
The Region class clearly only keeps track of the containing rectangle.
You might consider drawing each of your regions into an 8-bit alpha layer Bitmap with each Path filled in it's own 'color' value (make sure anti-aliasing is turned off in your Paint). This creates kind of a mask for each path filled with an index to the path that filled it. Then you could just use the pixel value as an index into your list of paths.
Bitmap lookup = Bitmap.createBitmap(width, height, Bitmap.Config.ALPHA_8);
//do this so that regions outside any path have a default
//path index of 255
lookup.eraseColor(0xFF000000);
Canvas canvas = new Canvas(lookup);
Paint paint = new Paint();
//these are defaults, you only need them if reusing a Paint
paint.setAntiAlias(false);
paint.setStyle(Paint.Style.FILL);
for(int i=0;i<paths.size();i++)
{
paint.setColor(i<<24); // use only alpha value for color 0xXX000000
canvas.drawPath(paths.get(i), paint);
}
Then look up points,
int pathIndex = lookup.getPixel(x, y);
pathIndex >>>= 24;
Be sure to check for 255 (no path) if there are unfilled points.
WebKit's SkiaUtils has a C++ work-around for Randy Findley's bug:
bool SkPathContainsPoint(SkPath* originalPath, const FloatPoint& point, SkPath::FillType ft)
{
SkRegion rgn;
SkRegion clip;
SkPath::FillType originalFillType = originalPath->getFillType();
const SkPath* path = originalPath;
SkPath scaledPath;
int scale = 1;
SkRect bounds = originalPath->getBounds();
// We can immediately return false if the point is outside the bounding rect
if (!bounds.contains(SkFloatToScalar(point.x()), SkFloatToScalar(point.y())))
return false;
originalPath->setFillType(ft);
// Skia has trouble with coordinates close to the max signed 16-bit values
// If we have those, we need to scale.
//
// TODO: remove this code once Skia is patched to work properly with large
// values
const SkScalar kMaxCoordinate = SkIntToScalar(1<<15);
SkScalar biggestCoord = std::max(std::max(std::max(bounds.fRight, bounds.fBottom), -bounds.fLeft), -bounds.fTop);
if (biggestCoord > kMaxCoordinate) {
scale = SkScalarCeil(SkScalarDiv(biggestCoord, kMaxCoordinate));
SkMatrix m;
m.setScale(SkScalarInvert(SkIntToScalar(scale)), SkScalarInvert(SkIntToScalar(scale)));
originalPath->transform(m, &scaledPath);
path = &scaledPath;
}
int x = static_cast<int>(floorf(point.x() / scale));
int y = static_cast<int>(floorf(point.y() / scale));
clip.setRect(x, y, x + 1, y + 1);
bool contains = rgn.setPath(*path, clip);
originalPath->setFillType(originalFillType);
return contains;
}
I know I'm a bit late to the party, but I would solve this problem by thinking about it like determining whether or not a point is in a polygon.
http://en.wikipedia.org/wiki/Point_in_polygon
The math computes more slowly when you're looking at Bezier splines instead of line segments, but drawing a ray from the point still works.
For completeness, I want to make a couple notes here:
As of API 19, there is an intersection operation for Paths. You could create a very small square path around your test point, intersect it with the Path, and see if the result is empty or not.
You can convert Paths to Regions and do a contains() operation. However Regions work in integer coordinates, and I think they use transformed (pixel) coordinates, so you'll have to work with that. I also suspect that the conversion process is computationally intensive.
The edge-crossing algorithm that Hans posted is good and quick, but you have to be very careful for certain corner cases such as when the ray passes directly through a vertex, or intersects a horizontal edge, or when round-off error is a problem, which it always is.
The winding number method is pretty much fool proof, but involves a lot of trig and is computationally expensive.
This paper by Dan Sunday gives a hybrid algorithm that's as accurate as the winding number but as computationally simple as the ray-casting algorithm. It blew me away how elegant it was.
See https://stackoverflow.com/a/33974251/338479 for my code which will do point-in-path calculation for a path consisting of line segments, arcs, and circles.