Take photo when pattern is detected in image with Android OpenCV - android

Hello Stack Overflow community, I would appreciate it if someone could guide me a little regarding my question: I want to make an application that takes a photo when it detects a sheet with 3 marks (black squares in the corners), similar to what a QR code would have. I have read a little about OpenCV, which I think could help me, but I am not very clear on how to use it yet.
Here is my example:

Once you obtain your binary image, you can find contours and filter using contour approximation and contour area. If the approximated contour has four vertices then it is a square candidate, and if its area is within a lower and upper range then we have detected a mark. We keep a counter of the marks, and if there are three marks in the image we can take the photo. Here's a visualization of the process.
We apply Otsu's threshold to obtain a binary image with the objects to detect in white.
From here we find contours using cv2.findContours and filter using contour approximation with cv2.approxPolyDP in addition to contour area with cv2.contourArea.
Detected marks highlighted in teal
I implemented it in Python but you can adapt the same approach
Code
import cv2

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours and filter using contour approximation and contour area
marks = 0
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.04 * peri, True)
    if len(approx) == 4 and area > 250 and area < 400:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (200, 255, 12), 2)
        marks += 1

# Sheet has 3 marks
if marks == 3:
    print('Take photo')

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()

Related

Improving threshold result for Tesseract

I am kind of stuck with this problem, and I know there are many questions about it on Stack Overflow, but in my case nothing gives the expected result.
The Context:
I am using Android OpenCV along with Tesseract so I can read the MRZ area of a passport. When the camera is started I pass the input frame to an AsyncTask, the frame is processed, and the MRZ area is extracted successfully. I pass the extracted MRZ area to a function prepareForOCR(inputImage) that takes the MRZ area as a gray Mat and outputs a bitmap with the thresholded image that I then pass to Tesseract.
The problem:
The problem occurs while thresholding the image. I use adaptive thresholding with blockSize = 13 and C = 15, but the result is not always the same; it depends on the lighting of the image and the general conditions under which the frame is taken.
What I have tried:
First I resize the image to a specific size (871, 108) so the input image is always the same and does not depend on which phone is used.
After resizing, I try different blockSize and C values:
//toOcr contains the extracted MRZ area
Bitmap toOCRBitmap = Bitmap.createBitmap(bitmap);
Mat inputFrame = new Mat();
Mat toOcr = new Mat();
Utils.bitmapToMat(toOCRBitmap, inputFrame);
Imgproc.cvtColor(inputFrame, inputFrame, Imgproc.COLOR_BGR2GRAY);
TesseractResult lastResult = null;
for (int B = 11; B < 70; B++) {
    for (int C = 11; C < 70; C++) {
        if (IsPrime(B) && IsPrime(C)) {
            Imgproc.adaptiveThreshold(inputFrame, toOcr, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, B, C);
            Bitmap toOcrBitmap = OpenCVHelper.getBitmap(toOcr);
            TesseractResult result = TesseractInstance.extractFrame(toOcrBitmap, "ocrba");
            if (result.getMeanConfidence() > 70) {
                if (MrzParser.tryParse(result.getText())) {
                    Log.d("Main2Activity", "Best result with " + B + " : " + C);
                    return result;
                }
            }
        }
    }
}
Using the code above, the thresholded result is a black-on-white image which gives a confidence greater than 70. I can't really post the whole image for privacy reasons, but here's a clipped one and a dummy passport one.
Using the MrzParser.tryParse function, which adds checks for character position and validity within the MRZ, I am able to correct some occurrences (like a name containing an 8 instead of a B) and get a good result, but it takes so much time. That is expected, because I am thresholding almost 255 images in the loop, in addition to the Tesseract call for each of them.
I have already tried collecting the C and B values that occur most often, but the results differ.
The question:
Is there a way to define C and blockSize values so that the thresholding always gives the same result, maybe by adding more OpenCV calls to normalize the input image (increasing contrast and so on)? I have searched the web for 2 weeks now and I can't find a viable solution; this is the only approach that gives accurate results.
You can use a clustering algorithm to cluster the pixels based on color. The characters are dark and there is a good contrast in the MRZ region, so a clustering method will most probably give you a good segmentation if you apply it to the MRZ region.
Here I demonstrate it with MRZ regions obtained from sample images that can be found on the internet.
I use color images, apply some smoothing, convert to the Lab color space, then cluster the a, b channel data using k-means (k=2). The code is in Python but you can easily adapt it to Java. Due to the randomized nature of the k-means algorithm, the segmented characters will have label 0 or 1. You can easily sort this out by inspecting the cluster centers: the cluster center corresponding to the characters should have a dark value in the color space you are using.
I just used the Lab color space here. You can use RGB, HSV or even GRAY and see which one is better for you.
After segmenting like this, I think you can even find good values for B and C of your adaptive-threshold using the properties of the stroke width of the characters (if you think the adaptive-threshold gives a better quality output).
import cv2
import numpy as np
im = cv2.imread('mrz1.png')
# convert to Lab
lab = cv2.cvtColor(cv2.GaussianBlur(im, (3, 3), 1), cv2.COLOR_BGR2Lab)
im32f = np.array(lab[:, :, 1:3], dtype=np.float32)  # cluster the a, b channels
k = 2 # 2 clusters
term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
ret, labels, centers = cv2.kmeans(im32f.reshape([im.shape[0]*im.shape[1], -1]),
k, None, term_crit, 10, 0)
# segmented image
labels = labels.reshape([im.shape[0], im.shape[1]]) * 255
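To decide which of the two k-means labels actually corresponds to the characters, one option (my addition, not part of the original answer) is to compare the mean lightness of each cluster in the L channel and keep the darker one, reusing the im, lab and labels variables from the snippet above:

# Pick the cluster whose pixels are darker on average; assume that one is the text.
L_chan = lab[:, :, 0].reshape(-1)
flat = (labels.reshape(-1) // 255).astype(int)            # back to 0/1 cluster ids
mean_L = [L_chan[flat == i].mean() for i in (0, 1)]
char_label = int(np.argmin(mean_L))                       # darker cluster = characters
char_mask = (flat == char_label).reshape(im.shape[:2]).astype(np.uint8) * 255
cv2.imwrite('mrz_chars_mask.png', char_mask)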
Some results:

How to remove black borders around License Plate using opencv for android app

I want to remove the black borders around a license plate. I am using OpenCV + Android.
Please reply with code with which I can remove the borders.
I have also attached the image (image 1).
You can perform a Difference of Gaussians (DoG) to detect the high-frequency details in your image. By high frequency in an image I mean distinct edges and corners.
Here is the code as requested. The explanations are placed as comments by the side:
import cv2
img = cv2.imread('number_plate.jpg') #---Reading the image---
img1 = img.copy() #----The final contour will be drawn on the copy of the original image---
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #---converting to gray scale---
Before performing DoG, I enhanced the grayscale image by applying adaptive histogram equalization:
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
enhanced = clahe.apply(gray_img)
cv2.imshow('enhanced_gray_img', enhanced)
Now I performed Gaussian blur using two separate kernels and subtracted the resulting images as follows:
blur1 = cv2.GaussianBlur(enhanced, (15, 15), 0)
blur2 = cv2.GaussianBlur(enhanced, (25, 25), 0)
difference = blur2 - blur1
cv2.imshow('Difference_of_Gaussians', difference)
Then I performed binary threshold on the image above and found contours. I drew the contour having the largest area:
ret, th = cv2.threshold(difference, 127, 255, 0)  #---performed binary threshold---
_, contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, 1)  #---Find contours---
cnts = contours
max = 0  #----Variable to keep track of the largest area----
c = 0    #----Variable to store the contour having the largest area---
for i in range(len(contours)):
    if cv2.contourArea(cnts[i]) > max:
        max = cv2.contourArea(cnts[i])
        c = i
rep = cv2.drawContours(img1, contours[c], -1, (0, 255, 0), 3)  #----Draw the contour having the largest area on the image---
cv2.imshow('Final_Image.jpg', rep)
And voila!!! There you go.
Now you can obtain bounding rectangles for the contours you found and feed those coordinates as regions to the OCR to extract the text present.
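For that last step, a minimal sketch (reusing the contours, c and img variables from the snippet above; the OCR call itself is only indicated, since it depends on the engine you use):

# Crop the bounding rectangle of the largest contour and hand the ROI to an OCR engine.
x, y, w, h = cv2.boundingRect(contours[c])
plate_roi = img[y:y + h, x:x + w]
cv2.imwrite('plate_roi.jpg', plate_roi)
# e.g. text = pytesseract.image_to_string(plate_roi)   # hypothetical OCR call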

how to scale a contour's height by a factor?

I am trying to scan a passport page using the phone's camera using OpenCV.
In the above image the contour marked in red is my ROI (I will need a top view of that). By performing segmentation I can detect the MRZ area, and the pages should have a fixed aspect ratio. Is there a way to scale the green contour using the aspect ratio to approximate the red one? I have tried finding the corners of the green rect using approxPolyDP, then scaling that rect and finally doing a perspective warp to get the top view. The problem is that the perspective rotation is not accounted for while doing the rectangular scaling, so the final rect is often wrong.
Often I get an output as marked in the following image
Update: Adding a little more explanation
In regard to the 1st image (assuming the red rect will always have a constant aspect ratio),
My goal: is to crop out the red marked portion and then get a top view
My approach: detect the MRZ/green rect -> assume the bottom edge of the green rect is the same as the red one (close enough) -> so I get the width and two corners of the rect -> calculate the other two corners using the height/aspect ratio
Problem: my above calculation doesn't output the red rect; instead it outputs the green rect in the 2nd image (maybe because those quadrilaterals aren't rectangles: the angles between edges aren't 0 or 90 degrees)
As far as I understand, your main goal is to get the top view of the passport page when its photo is taken from an arbitrary angle.
Also, as I understand it, your approach is the following:
Find MRZ and its wrapping polygon
Extend the MRZ polygon to the top - this would give you the page polygon
Warp perspective to get the top view.
And the main obstacle currently is to extend the polygon.
Please correct me if I understood the goal incorrectly.
Extending a polygon is quite easy from a mathematical perspective. The points on each side of the polygon define a side line; if you extend that line further, you can place a new point on it. Programmatically it may look like this:
new_left_top_x = old_left_bottom_x + (old_left_top_x - old_left_bottom_x) * pass_height_to_MRZ_height_ratio
new_left_top_y = old_left_bottom_y + (old_left_top_y - old_left_bottom_y) * pass_height_to_MRZ_height_ratio
The same can be done for the right part. This approach would also work with rotations up to 45 degrees.
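A minimal sketch of that side-line extension, assuming the MRZ corners are ordered top-left, top-right, bottom-right, bottom-left and the page-height-to-MRZ-height ratio is known in advance:

import numpy as np

def extend_mrz_to_page(mrz_quad, ratio):
    # mrz_quad: 4x2 array ordered top-left, top-right, bottom-right, bottom-left
    # ratio: passport-page-height / MRZ-height (an assumed, pre-measured constant)
    tl, tr, br, bl = np.asarray(mrz_quad, dtype=np.float32)
    new_tl = bl + (tl - bl) * ratio   # walk up the left side line
    new_tr = br + (tr - br) * ratio   # walk up the right side line
    return np.array([new_tl, new_tr, br, bl], dtype=np.float32)

The returned quadrilateral can then be fed to getPerspectiveTransform/warpPerspective to get the top view.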
However, I'm afraid this approach would not give accurate results. I would suggest detecting the passport page itself instead of the MRZ. The reason is that the page itself is quite a noticeable object in the photo and can easily be found by the findContours function.
I wrote some code to illustrate the idea that detecting the MRZ is not really necessary.
import os
import imutils
import numpy as np
import argparse
import cv2

# Thresholds
passport_page_aspect_ratio = 1.44
passport_page_coverage_ratio_threshold = 0.6
morph_size = (4, 4)


def pre_process_image(image):
    # Let's get rid of color first
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Then apply Otsu threshold to reveal important areas
    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # erode white areas to "disconnect" them
    # and dilate back to restore their original shape
    morph_struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
    thresh = cv2.erode(thresh, morph_struct, anchor=(-1, -1), iterations=1)
    thresh = cv2.dilate(thresh, morph_struct, anchor=(-1, -1), iterations=1)
    return thresh


def find_passport_page_polygon(image):
    cnts = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    for cnt in cnts:
        # compute the aspect ratio of the bounding box and the coverage ratio
        # of the bounding box width to the width of the image
        (x, y, w, h) = cv2.boundingRect(cnt)
        ar = w / float(h)
        cr_width = w / float(image.shape[1])
        # check to see if the aspect ratio and coverage width are within thresholds
        if ar > passport_page_aspect_ratio and cr_width > passport_page_coverage_ratio_threshold:
            # approximate the contour with a polygon with 4 points
            epsilon = 0.02 * cv2.arcLength(cnt, True)
            approx = cv2.approxPolyDP(cnt, epsilon, True)
            return approx
    return None


def order_points(pts):
    # initialize a list of coordinates that will be ordered:
    # top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype="float32")
    pts = pts.reshape(4, 2)
    # the top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # now, compute the difference between the points: the
    # top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect


def get_passport_top_view(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # compute the height of the new image, which will be the
    # maximum distance between the top-right and bottom-right
    # y-coordinates or the top-left and bottom-left y-coordinates
    height_a = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    height_b = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    max_height = max(int(height_a), int(height_b))
    # compute the width using the standard passport page aspect ratio
    max_width = int(max_height * passport_page_aspect_ratio)
    # construct the set of destination points to obtain the top view, specifying points
    # in the top-left, top-right, bottom-right, and bottom-left order
    dst = np.array([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]], dtype="float32")
    # compute the perspective transform matrix and apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (max_width, max_height))
    return warped


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True, help="path to images directory")
    args = vars(ap.parse_args())

    in_file = args["image"]
    filename_base = in_file.replace(os.path.splitext(in_file)[1], "")

    img = cv2.imread(in_file)

    pre_processed = pre_process_image(img)
    # Visualizing pre-processed image
    cv2.imwrite(filename_base + ".pre.png", pre_processed)

    page_polygon = find_passport_page_polygon(pre_processed)

    if page_polygon is not None:
        # Visualizing found page polygon
        vis = img.copy()
        cv2.polylines(vis, [page_polygon], True, (0, 255, 0), 2)
        cv2.imwrite(filename_base + ".bounds.png", vis)

        # Visualizing the warped top view of the passport page
        top_view_page = get_passport_top_view(img, page_polygon)
        cv2.imwrite(filename_base + ".top.png", top_view_page)
The results I got:
For better results it would also be good to compensate for the camera aperture distortion.
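Reading that as the usual lens distortion correction, a hedged sketch of the compensation step, assuming cameraMatrix and distCoeffs were obtained beforehand with cv2.calibrateCamera (the file names are placeholders):

import cv2
import numpy as np

# Hypothetical calibration results saved earlier, e.g. from chessboard captures.
cameraMatrix = np.load('camera_matrix.npy')
distCoeffs = np.load('dist_coeffs.npy')

img = cv2.imread('passport_photo.jpg')
# Undistort the frame before running the page detection above.
undistorted = cv2.undistort(img, cameraMatrix, distCoeffs)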

Detect black ink blob on paper - Opencv Android

I'm new to openCV, I've been getting into the samples provided for Android.
My goal is to detect color blobs, so I started with the color-blob-detection sample.
I'm converting color image to grayscale and then thresholding using a binary threshold.
The background is white and the blobs are black. I want to detect those black blobs. Also, I would like to draw their contours in color, but I'm not able to do it because the image is black and white.
I've managed to accomplish this in grayscale, but I don't like how the contours are drawn; it's as if the color tolerance is too high and the contour ends up bigger than the actual blob (maybe the blobs are too small?). I guess this 'tolerance' I am talking about has something to do with setHsvColor, but I don't quite understand that method.
Thanks in advance! Best Regards
UPDATE MORE INFO
The image I want to track is of ink splits. Imagine a white piece of paper with black ink splits. Right now I'm doing it in real-time (camera view). The actual app would take a picture and analyse that picture.
As I said above, I took the color-blob-detection sample (Android) from the OpenCV GitHub repo, and I added this code in the onCameraFrame method (in order to convert the frame to black and white in real time). The conversion is made so that it doesn't matter whether the ink is black, blue or red:
mRgba = inputFrame.rgba();
/**************************************************************************/
/** BLACK AND WHITE **/
// Convert to Grey
Imgproc.cvtColor(inputFrame.gray(), mRgba, Imgproc.COLOR_GRAY2RGBA, 4);
Mat blackAndWhiteMat = new Mat ( H, W, CvType.CV_8U, new Scalar(1));
double umbral = 100.0;
Imgproc.threshold(mRgba, blackAndWhiteMat , umbral, 255, Imgproc.THRESH_BINARY);
// convert back to bitmap for displaying
Bitmap resultBitmap = Bitmap.createBitmap(mRgba.cols(), mRgba.rows(), Bitmap.Config.ARGB_8888);
blackAndWhiteMat.convertTo(blackAndWhiteMat, CvType.CV_8UC1);
Utils.matToBitmap(blackAndWhiteMat, resultBitmap);
/**************************************************************************/
This may not be the best way but it works.
Now I want to detect the black blobs (ink splits). I guess they are detected, because Logcat (the log output of the sample app) shows the number of contours detected, but I'm not able to see them because the image is black and white, and I want the contours to be drawn in red, for example.
Here's an example image:
And here is what I get using RGB (color-blob-detection as is, not black and white image). Notice how small blobs are not detected. (Is it possible to detect them? or are they too small?)
Thanks for your help! If you need more info I would gladly update this question
UPDATE: GitHub repo of color-blob-detection sample (second image)
GitHub Repo of openCV sample for Android
The solution is based on a combination of adaptive image thresholding and the connected-component algorithm.
Assumption - The paper is the most lit area of the image, whereas the ink spots on the paper are the darkest regions.
from random import Random
import numpy as np
import cv2

def random_color(random):
    """
    Return a random color
    """
    icolor = random.randint(0, 0xFFFFFF)
    return [icolor & 0xff, (icolor >> 8) & 0xff, (icolor >> 16) & 0xff]

# Read as Grayscale
img = cv2.imread('1-input.jpg', 0)
cimg = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

# Median blur to remove noisy regions, comment it out to see its effect.
img = cv2.medianBlur(img, 5)

# Find average intensity to distinguish paper region
avgPixelIntensity = cv2.mean(img)
print("Average intensity of image: ", avgPixelIntensity[0])

# Generate mask to distinguish paper region
# 0.8 - used to ignore ill-illuminated region of paper
mask = cv2.inRange(img, avgPixelIntensity[0]*0.8, 255)
mask = 255 - mask
cv2.imwrite('2-maskedImg.jpg', mask)

# Approach 1
# You need to choose 4 or 8 for connectivity type (border pixels)
connectivity = 8
# Perform the operation (the label type must be CV_32S or CV_16U)
output = cv2.connectedComponentsWithStats(mask, connectivity, cv2.CV_32S)
# The first cell is the number of labels
num_labels = output[0]
# The second cell is the label matrix
labels = output[1]
# The third cell is the stat matrix
stats = output[2]
# The fourth cell is the centroid matrix
centroids = output[3]
cv2.imwrite("3-connectedcomponent.jpg", labels)
print("Number of labels", num_labels, labels)

# create the random number generator
random = Random()
for i in range(1, num_labels):
    print(stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP], stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT])
    cv2.rectangle(cimg, (stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]),
                  (stats[i, cv2.CC_STAT_LEFT] + stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_TOP] + stats[i, cv2.CC_STAT_HEIGHT]), random_color(random), 2)

cv2.imwrite("4-OutputImage.jpg", cimg)
The Input Image
Masked Image from thresholding and invert operation.
Use of connected component.
Overlaying output of connected component on input image.
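If the very small splashes also matter (the concern raised in the question), the stats matrix above already carries per-component areas, so components can be kept or dropped explicitly. A short follow-up sketch reusing stats, num_labels and cimg from the code above; the minimum area is an assumption to tune for your resolution:

MIN_AREA = 5   # assumed threshold in pixels
for i in range(1, num_labels):
    if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA:
        x, y = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
        w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
        cv2.rectangle(cimg, (x, y), (x + w, y + h), (0, 0, 255), 1)
cv2.imwrite("5-filtered.jpg", cimg)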

Recognition of handwritten circles, diamonds and rectangles

I am looking for some advice about recognition of three handwritten shapes: circles, diamonds and rectangles. I have tried different approaches but they failed, so maybe you could point me in another, better direction.
What I tried:
1) A simple algorithm based on the dot product between points of the handwritten shape and an ideal shape. It works reasonably well at recognizing rectangles, but fails on circles and diamonds. The problem is that the dot product of the circle and the diamond is quite similar, even for ideal shapes.
2) The same approach but using Dynamic Time Warping as the measure of similarity. Similar problems.
3) Neural networks. I tried a few approaches: giving the point data to neural networks (feedforward and Kohonen) or giving a rasterized image. For Kohonen it always classified all the data (even the samples used for training) into the same category. Feedforward with points was better (but on the same level as approaches 1 and 2), and with a rasterized image it was very slow (I need at least size^2 input neurons, and for small raster sizes the circle is indistinguishable even for me ;) ) and also without success. I think this is because all of these shapes are closed figures? I am not a big specialist in ANNs (I had a one-semester course on them), so maybe I am using them wrong?
4) Saving the shape as a Freeman Chain Code and using some algorithms for computing similarity. I thought that in FCC the shapes would be really different from each other. No success here (but I haven't explored this path very deeply).
Here's some working code for a shape classifier. http://jsfiddle.net/R3ns3/ I pulled the threshold numbers (*Threshold variables in the code) out of the ether, so of course they can be tweaked for better results.
I use the bounding box, average point in a sub-section, angle between points, polar angle from bounding box center, and corner recognition. It can classify hand drawn rectangles, diamonds, and circles. The code records points while the mouse button is down and tries to classify when you stop drawing.
HTML
<canvas id="draw" width="300" height="300" style="position:absolute; top:0px; left:0p; margin:0; padding:0; width:300px; height:300px; border:2px solid blue;"></canvas>
JS
var state = {
    width: 300,
    height: 300,
    pointRadius: 2,
    cornerThreshold: 125,
    circleThreshold: 145,
    rectangleThreshold: 45,
    diamondThreshold: 135,
    canvas: document.getElementById("draw"),
    ctx: document.getElementById("draw").getContext("2d"),
    drawing: false,
    points: [],
    getCorners: function(angles, pts) {
        var list = pts || this.points;
        var corners = [];
        for(var i=0; i<angles.length; i++) {
            if(angles[i] <= this.cornerThreshold) {
                corners.push(list[(i + 1) % list.length]);
            }
        }
        return corners;
    },
    draw: function(color, pts) {
        var list = pts||this.points;
        this.ctx.fillStyle = color;
        for(var i=0; i<list.length; i++) {
            this.ctx.beginPath();
            this.ctx.arc(list[i].x, list[i].y, this.pointRadius, 0, Math.PI * 2, false);
            this.ctx.fill();
        }
    },
    classify: function() {
        // get bounding box
        var left = this.width, right = 0,
            top = this.height, bottom = 0;
        for(var i=0; i<this.points.length; i++) {
            var pt = this.points[i];
            if(left > pt.x) left = pt.x;
            if(right < pt.x) right = pt.x;
            if(top > pt.y) top = pt.y;
            if(bottom < pt.y) bottom = pt.y;
        }
        var center = {x: (left+right)/2, y: (top+bottom)/2};
        this.draw("#00f", [
            {x: left, y: top},
            {x: right, y: top},
            {x: left, y: bottom},
            {x: right, y: bottom},
        ]);
        // find average point in each sector (9 sectors)
        var sects = [
            {x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
            {x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
            {x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0}
        ];
        var x3 = (right + (1/(right-left)) - left) / 3;
        var y3 = (bottom + (1/(bottom-top)) - top) / 3;
        for(var i=0; i<this.points.length; i++) {
            var pt = this.points[i];
            var sx = Math.floor((pt.x - left) / x3);
            var sy = Math.floor((pt.y - top) / y3);
            var idx = sy * 3 + sx;
            sects[idx].x += pt.x;
            sects[idx].y += pt.y;
            sects[idx].c ++;
            if(sx == 1 && sy == 1) {
                return "UNKNOWN";
            }
        }
        // get the significant points (clockwise)
        var sigPts = [];
        var clk = [0, 1, 2, 5, 8, 7, 6, 3];
        for(var i=0; i<clk.length; i++) {
            var pt = sects[clk[i]];
            if(pt.c > 0) {
                sigPts.push({x: pt.x / pt.c, y: pt.y / pt.c});
            } else {
                return "UNKNOWN";
            }
        }
        this.draw("#0f0", sigPts);
        // find angle between consecutive 3 points
        var angles = [];
        for(var i=0; i<sigPts.length; i++) {
            var a = sigPts[i],
                b = sigPts[(i + 1) % sigPts.length],
                c = sigPts[(i + 2) % sigPts.length],
                ab = Math.sqrt(Math.pow(b.x-a.x,2)+Math.pow(b.y-a.y,2)),
                bc = Math.sqrt(Math.pow(b.x-c.x,2)+Math.pow(b.y-c.y,2)),
                ac = Math.sqrt(Math.pow(c.x-a.x,2)+Math.pow(c.y-a.y,2)),
                deg = Math.floor(Math.acos((bc*bc+ab*ab-ac*ac)/(2*bc*ab)) * 180 / Math.PI);
            angles.push(deg);
        }
        console.log(angles);
        var corners = this.getCorners(angles, sigPts);
        // get polar angle of corners
        for(var i=0; i<corners.length; i++) {
            corners[i].t = Math.floor(Math.atan2(corners[i].y - center.y, corners[i].x - center.x) * 180 / Math.PI);
        }
        console.log(corners);
        // what's the shape?
        if(corners.length <= 1) { // circle
            return "CIRCLE";
        } else if(corners.length == 2) { // circle || diamond
            // difference of polar angles
            var diff = Math.abs((corners[0].t - corners[1].t + 180) % 360 - 180);
            console.log(diff);
            if(diff <= this.circleThreshold) {
                return "CIRCLE";
            } else {
                return "DIAMOND";
            }
        } else if(corners.length == 4) { // rectangle || diamond
            // sum of polar angles of corners
            var sum = Math.abs(corners[0].t + corners[1].t + corners[2].t + corners[3].t);
            console.log(sum);
            if(sum <= this.rectangleThreshold) {
                return "RECTANGLE";
            } else if(sum >= this.diamondThreshold) {
                return "DIAMOND";
            } else {
                return "UNKNOWN";
            }
        } else {
            alert("draw neater please");
            return "UNKNOWN";
        }
    }
};

state.canvas.addEventListener("mousedown", (function(e) {
    if(!this.drawing) {
        this.ctx.clearRect(0, 0, 300, 300);
        this.points = [];
        this.drawing = true;
        console.log("drawing start");
    }
}).bind(state), false);

state.canvas.addEventListener("mouseup", (function(e) {
    this.drawing = false;
    console.log("drawing stop");
    this.draw("#f00");
    alert(this.classify());
}).bind(state), false);

state.canvas.addEventListener("mousemove", (function(e) {
    if(this.drawing) {
        var x = e.pageX, y = e.pageY;
        this.points.push({"x": x, "y": y});
        this.ctx.fillStyle = "#000";
        this.ctx.fillRect(x-2, y-2, 4, 4);
    }
}).bind(state), false);
Given the possible variation in handwritten inputs I would suggest that a neural network approach is the way to go; you will find it difficult or impossible to accurately model these classes by hand. LastCoder's attempt works to a degree, but it does not cope with much variation or have promise for high accuracy if worked on further - this kind of hand-engineered approach was abandoned a very long time ago.
State-of-the-art results in handwritten character classification these days is typically achieved with convolutional neural networks (CNNs). Given that you have only 3 classes the problem should be easier than digit or character classification, although from experience with the MNIST handwritten digit dataset, I expect that your circles, squares and diamonds may occasionally end up being difficult for even humans to distinguish.
So, if it were up to me I would use a CNN. I would input binary images taken from the drawing area to the first layer of the network. These may require some preprocessing. If the drawn shapes cover a very small area of the input space you may benefit from bulking them up (i.e. increasing line thickness) so as to make the shapes more invariant to small differences. It may also be beneficial to centre the shape in the image, although the pooling step might alleviate the need for this.
I would also point out that the more training data the better. One is often faced with a trade-off between increasing the size of one's dataset and improving one's model. Synthesising more examples (e.g. by skewing, rotating, shifting, stretching, etc) or spending a few hours drawing shapes may provide more of a benefit than you could get in the same time attempting to improve your model.
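For illustration only, a minimal sketch of the kind of CNN suggested above, assuming 64x64 binary rasters of the drawings and three classes (circle, diamond, rectangle); tf.keras is used purely as an example framework, not something prescribed by the answer:

import tensorflow as tf
from tensorflow.keras import layers, models

# Small 3-class CNN over 64x64 single-channel rasters of the drawn shapes.
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(3, activation='softmax'),   # circle / diamond / rectangle
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)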
Good luck with your app!
The square or the diamond ought to be easy to recognize with a linear Hough transform. They will both produce four point masses. The square's will be in pairs at zero and 90 degrees with the same y-coordinates for both pairs; in other words, a rectangle. The diamond's will be at two other angles corresponding to how skinny the diamond is, e.g. 45 and 135 or else 60 and 120.
For the circle you need a circular Hough transform, and it will produce a single bright point cluster in 3d (x,y,r) Hough space.
Both linear and circular Hough transforms are implemented in OpenCV, and it's possible to run OpenCV on Android. These implementations include thresholding to identify lines and circles. See pg. 329 and pg. 331 of the documentation here.
If you are not familiar with Hough transforms, the Wikipedia page is not bad.
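For concreteness, a hedged sketch of both transforms on a rasterized drawing (the file name and every threshold here are assumptions to tune):

import cv2
import numpy as np

img = cv2.imread('shape.png', 0)           # grayscale raster of the drawing
edges = cv2.Canny(img, 50, 150)

# Linear Hough: a square/diamond yields ~4 strong lines; their theta values
# (near 0/90 degrees vs e.g. 45/135 degrees) separate the square from the diamond.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 60)
if lines is not None:
    print('line angles (deg):', sorted(set(np.round(np.degrees(lines[:, 0, 1])))))

# Circular Hough: a circle yields a single strong (x, y, r) peak.
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                           param1=150, param2=40, minRadius=10, maxRadius=200)
print('circles:', circles)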
Another algorithm you may find interesting and perhaps useful is given in this paper about polygon similarity. I implemented it many years ago, and it's still around here. If you can convert the figures to loops of vectors, this algorithm could compare them against patterns, and the similarity metric would show goodness of match. The algorithm ignores rotational orientation, so if your definition of square and diamond is with respect to the axes of the drawing surface, you will have to modify the algorithm a bit to differentiate these cases.
What you have here is a fairly standard classification task, arguably in the vision domain.
You could do this several ways, but the best way isn't known, and can sometimes depend on fine details of the problem.
So, this isn't an answer per se, but there is a website - Kaggle.com - that runs competitions for classification. One of the sample/experimental tasks they list is reading single handwritten numeric digits. That is close enough to this problem that the same methods are almost certainly going to apply fairly well.
I suggest you go to https://www.kaggle.com/c/digit-recognizer and look around.
But if that is too vague, I can tell you from my reading of it, and playing with that problem space, that Random Forests are a better basic starting place than Neural networks.
In this case (your 3 simple objects) you could try RANSAC fitting: an ellipse fit (getting the circle) and line fits (getting the sides of the rectangle or diamond), applied to each connected object if there are several objects to classify at the same time. Based on the actual setting (expected size, etc.) the RANSAC parameters (how close a point must be to count as a voter, how many voters you need at minimum) must be tuned. When you have found a line with RANSAC fitting, remove the points "close" to it and go for the next line. The angles of the lines should make the distinction between diamond and rectangle easy.
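A bare-bones sketch of the line-fitting part of that idea; the iteration count, inlier tolerance and minimum voter count are exactly the parameters the answer says must be tuned, and the values here are placeholders:

import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=3.0, min_voters=30, seed=0):
    # points: Nx2 array of (x, y) pixel coordinates of one connected object
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), 2, replace=False)
        p1, p2 = points[i], points[j]
        dx, dy = p2 - p1
        norm = np.hypot(dx, dy)
        if norm == 0:
            continue
        # perpendicular distance of every point to the line through p1 and p2
        dist = np.abs(dx * (points[:, 1] - p1[1]) - dy * (points[:, 0] - p1[0])) / norm
        inliers = dist < inlier_tol
        if inliers.sum() >= min_voters and (best is None or inliers.sum() > best.sum()):
            best = inliers
    return best   # boolean mask of the winning line's voters, or None

After a line is found, its inliers would be removed and the fit repeated for the next side, as described above.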
A very simple approach optimized for classifying exactly these 3 objects could be the following:
compute the center of gravity of an object to classify
then compute the distances from the center to the object points as a function of the angle (from 0 to 2 pi).
classify the resulting graph based on its smoothness and/or variance and the position and height of the local maxima and minima (maybe after smoothing the graph); a small sketch of that signature follows below.
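A hedged sketch of that signature (the circle threshold is an assumption; separating rectangles from diamonds would look at the angular positions of the peaks of r, which is left out here):

import numpy as np

def radial_signature(points):
    # Distance from the centroid as a function of angle, normalized for scale.
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    d = pts - center
    angles = np.arctan2(d[:, 1], d[:, 0])
    radii = np.hypot(d[:, 0], d[:, 1])
    order = np.argsort(angles)
    return angles[order], radii[order] / radii.max()

def looks_like_circle(points, max_spread=0.08):   # threshold is a guess to tune
    _, r = radial_signature(points)
    return r.std() < max_spread   # near-constant radius -> circle; pronounced peaks -> corners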
I propose a way to do it in the following steps (a loose sketch follows after the list):
Take the convex hull of the image (consider the shapes to be convex)
divide it into segments using clustering algorithms
Try to fit curves or straight lines to the segments and measure & threshold using a training set, which can then be used for classification
For your application try to divide into 4 clusters.
once you classify clusters as lines or curves you can use that information to derive whether the shape is a circle, rectangle or diamond
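A loose sketch in the spirit of this proposal, assuming the drawing is available as a binary image; it substitutes a polygonal fit of the convex hull for the explicit clustering step, so treat it as an approximation of the idea rather than the exact recipe:

import cv2
import numpy as np

mask = cv2.imread('drawing.png', 0)                       # binary raster of the shape
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnt = max(cnts, key=cv2.contourArea)
hull = cv2.convexHull(cnt)

# Fit a polygon to the hull: ~4 segments suggests rectangle/diamond, many suggests circle.
approx = cv2.approxPolyDP(hull, 0.03 * cv2.arcLength(hull, True), True)
if len(approx) > 6:
    print('CIRCLE')
else:
    # Orientation of the fitted box separates an axis-aligned rectangle from a diamond.
    (cx, cy), (w, h), angle = cv2.minAreaRect(approx)
    tilt = min(abs(angle) % 90, 90 - abs(angle) % 90)
    print('RECTANGLE' if tilt < 20 else 'DIAMOND')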
I think the answers that are already in place are good, but perhaps a better way of thinking about it is that you should try to break the problem into meaningful pieces.
If possible avoid the problem entirely. For instance if you are recognizing gestures, just analyze the gestures in real time. With gestures you can provide feedback to the user as to how your program interpreted their gesture and the user will change what they are doing appropriately.
Clean up the image in question. Before you do anything come up with an algorithm to try to select what the correct thing is you are trying to analyze. Also use an appropriate filter (convolution perhaps) to remove image artifacts before you begin the process.
Once you have figured out what the thing is you are going to analyze then analyze it and return a score, one for circle, one for noise, one for line, and the last for pyramid.
Repeat this step with the next viable candidate until you come up with the best candidate that is not noise.
I suspect you will find that you don't need a complicated algorithm to find circle, line, pyramid but that it is more so about structuring your code appropriately.
If I were you, I'd use an already available image processing library like "AForge".
Take A look at this sample article:
http://www.aforgenet.com/articles/shape_checker
I have a jar on github that can help if you are willing to unpack it and obey the apache license. You can try to recreate it in any other language as well.
It's an edge detector. The best steps from there could be to:
find the corners (median of 90 degrees)
find mean median and maximum radius
find skew/angle from horizontal
have a decision agent decide what the shape is
Play around with it and find what you want.
My jar is open to the public at this address. It is not yet production ready but can help.
Just thought I could help. If anyone wants to be a part of the project, please do.
I did this recently with identifying circles (bone centers) in medical images.
Note: Steps 1-2 are if you are grabbing from an image.
Pseudocode Steps
Step 1. Highlight the Edges
edges = edge_map(of the source image) (using edge detector(s))
(layman's terms: show the lines/edges -- make them searchable)
Step 2. Trace each unique edge
I would (use a nearest neighbor search 9x9 or 25x25) to identify / follow / trace each edge, collecting each point into the list (they become neighbors), and taking note of the gradient at each point.
This step produces: a set of edges.
(where one edge/curve/line = a list of [point_gradient_data_structure]s)
(layman's terms: collect a set of points along the edge in the image)
Step 3. Analyze Each Edge('s points and gradient data)
For each edge,
if the gradient is similar for a given region/set of neighbors (a run of points along an edge), then we have a straight line (see the sketch after Step 4).
If the gradient is changing gradually, we have a curve.
Each region/run of points that is a straight line or a curve, has a mean (center) and other gradient statistics.
Step 4. Detect Objects
We can use the summary information from Step 3 to build conclusions about diamonds, circles, or squares. (i.e. 4 straight lines, that have end points near each other with proper gradients is a diamond or square. One (or more) curves with sufficient points/gradients (with a common focal point) makes a complete circle).
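A loose sketch of the Step 3 test on one traced edge, using the tangent direction along the traced points as a stand-in for the gradient data (the angle thresholds are assumptions to tune):

import numpy as np

def classify_edge(points):
    # points: ordered list of (x, y) samples along one traced edge
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return "LINE"
    seg = np.diff(pts, axis=0)
    angles = np.unwrap(np.arctan2(seg[:, 1], seg[:, 0]))
    if np.abs(np.diff(angles)).max() < np.radians(5):
        return "LINE"      # direction is essentially constant along the run
    if abs(angles[-1] - angles[0]) > np.radians(300):
        return "CIRCLE"    # direction sweeps (almost) a full turn
    return "CURVE"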
Note: Using an image pyramid can improve algorithm performance, both in terms of results and speed.
This technique (Steps 1-4) would get the job done for well defined shapes, and also could detect shapes that are drawn less than perfectly, and could handle slightly disconnected lines (if needed).
Note: With some machine learning techniques (mentioned by other posters), it could be helpful/important to have good "classifiers" that break the problem down into smaller parts/components, so that a decider further down the chain can better understand/"see" the objects. I think machine learning might be a little heavy-handed for this question, but it could still produce reasonable results. PCA (as in face detection) could potentially work too.
