how to scale a contour's height by a factor? - android

I am trying to scan a passport page using the phone's camera using OpenCV.
In the above image the contour marked in red is my ROI (I will need a top view of that). By performing segmentation I can detect the MRZ area, and the pages should have a fixed aspect ratio. Is there a way to scale the green contour using the aspect ratio to approximate the red one? I have tried finding the corners of the green rect using approxPolyDP, then scaling that rect, and finally doing a perspective warp to get the top view. The problem is that the perspective rotation is not accounted for during the rectangular scaling, so the final rect is often wrong.
Often I get an output as marked in the following image
Update: Adding a little more explanation
In regard to the 1st image (assuming the red rect will always have a constant aspect ratio),
My goal: is to crop out the red marked portion and then get a top view
My approach: detect the MRZ/green rect -> now assume the bottom edge of the green rect is the same as the red one (close enough) -> so I have the width and two corners of the rect -> calculate the other two corners using the height/aspect ratio
Problem: my calculation above doesn't output the red rect; instead it outputs the green rect in the 2nd image (maybe because those quadrilaterals aren't true rectangles, the angles between the edges are neither 0 nor 90 degrees)

As far as I understand, your main goal is to get the top view of the passport page when its photo is taken from an arbitrary angle.
Also, as I understand it, your approach is the following:
Find MRZ and its wrapping polygon
Extend the MRZ polygon to the top - this would give you the page polygon
Warp perspective to get the top view.
And the main obstacle currently is to extend the polygon.
Please correct me if I understood the goal incorrectly.
Extending a polygon is quite easy from a mathematical perspective. The points on each side of the polygon define a side line; if you draw that line further you can place a new point on it. Programmatically it may look like this:
new_left_top_x = old_left_bottom_x + (old_left_top_x - old_left_bottom_x) * pass_height_to_MRZ_height_ratio
new_left_top_y = old_left_bottom_y + (old_left_top_y - old_left_bottom_y) * pass_height_to_MRZ_height_ratio
The same can be done for the right part. This approach would also work with rotations up to 45 degrees.
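A minimal sketch of that extension step in Python (assuming the MRZ quad corners are already ordered top-left, top-right, bottom-right, bottom-left, and that the page-height-to-MRZ-height ratio is known for your document; both of these are my assumptions):
import numpy as np

def extend_mrz_to_page(mrz_quad, page_to_mrz_height_ratio):
    # mrz_quad: 4x2 array ordered top-left, top-right, bottom-right, bottom-left
    tl, tr, br, bl = [np.asarray(p, dtype=np.float32) for p in mrz_quad]
    # move each top corner further along its side line, starting from the bottom corner
    new_tl = bl + (tl - bl) * page_to_mrz_height_ratio
    new_tr = br + (tr - br) * page_to_mrz_height_ratio
    # the bottom corners stay where they are
    return np.array([new_tl, new_tr, br, bl], dtype=np.float32)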
However, I'm afraid this approach would not give accurate results. I would suggest detecting the passport page itself instead of the MRZ. The reason is that the page itself is quite a noticeable object in the photo and can easily be found by the findContours function.
I wrote some code to illustrate the idea that detecting MRZ is not really necessary.
import os
import imutils
import numpy as np
import argparse
import cv2

# Thresholds
passport_page_aspect_ratio = 1.44
passport_page_coverage_ratio_threshold = 0.6
morph_size = (4, 4)


def pre_process_image(image):
    # Let's get rid of color first
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Then apply Otsu threshold to reveal important areas
    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # erode white areas to "disconnect" them
    # and dilate back to restore their original shape
    morph_struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
    thresh = cv2.erode(thresh, morph_struct, anchor=(-1, -1), iterations=1)
    thresh = cv2.dilate(thresh, morph_struct, anchor=(-1, -1), iterations=1)
    return thresh


def find_passport_page_polygon(image):
    cnts = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    for cnt in cnts:
        # compute the aspect ratio and coverage ratio of the bounding box
        # width to the width of the image
        (x, y, w, h) = cv2.boundingRect(cnt)
        ar = w / float(h)
        cr_width = w / float(image.shape[1])
        # check to see if the aspect ratio and coverage width are within thresholds
        if ar > passport_page_aspect_ratio and cr_width > passport_page_coverage_ratio_threshold:
            # approximate the contour with a polygon with 4 points
            epsilon = 0.02 * cv2.arcLength(cnt, True)
            approx = cv2.approxPolyDP(cnt, epsilon, True)
            return approx
    return None


def order_points(pts):
    # initialize a list of coordinates that will be ordered in the order:
    # top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype="float32")
    pts = pts.reshape(4, 2)
    # the top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # now, compute the difference between the points, the
    # top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect


def get_passport_top_vew(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # compute the height of the new image, which will be the
    # maximum distance between the top-right and bottom-right
    # y-coordinates or the top-left and bottom-left y-coordinates
    height_a = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    height_b = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    max_height = max(int(height_a), int(height_b))
    # compute the width using standard passport page aspect ratio
    max_width = int(max_height * passport_page_aspect_ratio)
    # construct the set of destination points to obtain the top view, specifying points
    # in the top-left, top-right, bottom-right, and bottom-left order
    dst = np.array([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]], dtype="float32")
    # compute the perspective transform matrix and apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (max_width, max_height))
    return warped


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True, help="path to images directory")
    args = vars(ap.parse_args())

    in_file = args["image"]
    filename_base = in_file.replace(os.path.splitext(in_file)[1], "")

    img = cv2.imread(in_file)

    pre_processed = pre_process_image(img)
    # Visualizing pre-processed image
    cv2.imwrite(filename_base + ".pre.png", pre_processed)

    page_polygon = find_passport_page_polygon(pre_processed)
    if page_polygon is not None:
        # Visualizing found page polygon
        vis = img.copy()
        cv2.polylines(vis, [page_polygon], True, (0, 255, 0), 2)
        cv2.imwrite(filename_base + ".bounds.png", vis)
        # Visualizing the warped top view of the passport page
        top_view_page = get_passport_top_vew(img, page_polygon)
        cv2.imwrite(filename_base + ".top.png", top_view_page)
The results I got:
For better results it would also be good to compensate for the camera's lens distortion.
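A minimal sketch of that compensation step (assuming the camera has already been calibrated, e.g. with cv2.calibrateCamera; the camera_matrix, dist_coeffs values and the file name below are placeholders, not real calibration data):
import cv2
import numpy as np

# Placeholder calibration results - obtain these from cv2.calibrateCamera for your device
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([0.1, -0.05, 0.0, 0.0, 0.0])

img = cv2.imread("passport_photo.jpg")
h, w = img.shape[:2]
# refine the camera matrix for this image size and undistort before running the page detection
new_camera_matrix, roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coeffs, (w, h), 1, (w, h))
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs, None, new_camera_matrix)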

Related

Take photo when pattern is detected in image with Android OpenCV

Hello stackoverflow community, I would appreciate it if someone could guide me a little regarding my question. I want to make an application that takes a photo when it detects a sheet with 3 marks (black squares in the corners), similar to what a QR code would have. I have read a little about opencv, which I think could help me, however I am not very clear on it yet.
Here my example
Once you obtain your binary image, you can find contours and filter using contour approximation and contour area. If the approximated contour has a length of four then it must be a square, and if it is within a lower and upper area range then we have detected a mark. We keep a counter of the marks and if there are three marks in the image, we can take the photo. Here's the visualization of the process.
We apply Otsu's threshold to obtain a binary image with the objects to detect in white.
From here we find contours using cv2.findContours and filter using contour approximation cv2.approxPolyDP in addition to contour area cv2.contourArea.
Detected marks highlighted in teal
I implemented it in Python but you can adapt the same approach
Code
import cv2

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours and filter using contour approximation and contour area
marks = 0
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.04 * peri, True)
    if len(approx) == 4 and area > 250 and area < 400:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (200,255,12), 2)
        marks += 1

# Sheet has 3 marks
if marks == 3:
    print('Take photo')

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()

How to remove black borders around License Plate using opencv for android app

I want to remove black borders around License Plate. I am using opencv + android.
Please reply with code using which I can remove the borders.
I have also attached the image (image 1).
You can perform Difference of Gaussians (DoG) to detect the high-frequency details in your image. By high frequency in an image I mean distinct edges and corners.
Here is the code as requested. The explanations are placed as comments by the side:
import cv2
img = cv2.imread('number_plate.jpg') #---Reading the image---
img1 = img.copy() #----The final contour will be drawn on the copy of the original image---
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #---converting to gray scale---
Before performing DoG, I enhanced the grayscale image by applying adaptive histogram equalization:
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
enhanced = clahe.apply(gray_img)
cv2.imshow('enhanced_gray_img', enhanced)
Now I performed Gaussian blur using two separate kernels and subtracted the resulting images as follows:
blur1 = cv2.GaussianBlur(enhanced, (15, 15), 0)
blur2 = cv2.GaussianBlur(enhanced, (25, 25), 0)
difference = blur2 - blur1
cv2.imshow('Difference_of_Gaussians', difference)
Then I performed binary threshold on the image above and found contours. I drew the contour having the largest area:
ret, th = cv2.threshold(difference, 127,255, 0) #---performed binary threshold ---
_, contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, 1) #---Find contours---
cnts = contours
max = 0 #----Variable to keep track of the largest area----
c = 0 #----Variable to store the contour having largest area---
for i in range(len(contours)):
if (cv2.contourArea(cnts[i]) > max):
max = cv2.contourArea(cnts[i])
c = i
rep = cv2.drawContours(img1, contours[c], -1, (0,255,0), 3) #----Draw the contour having the largest area on the image---
cv2.imshow('Final_Image.jpg', rep)
And voila!!! There you go.
Now you can obtain bounding rectangles for the contours you found and feed those coordinates as regions to the OCR to extract the text present.
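A minimal sketch of that last step, reusing img and the largest-area contour index c from the code above (pytesseract is my assumed OCR backend here, not part of the original answer):
import cv2
import pytesseract  # assumed OCR backend; any OCR that accepts an image region would do

x, y, w, h = cv2.boundingRect(contours[c])      # bounding rectangle of the largest contour
plate_region = img[y:y + h, x:x + w]            # crop that region from the original image
print(pytesseract.image_to_string(plate_region))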

Detect black ink blob on paper - Opencv Android

I'm new to openCV, I've been getting into the samples provided for Android.
My goals is to detect color-blobs so I started with color-blob-detection sample.
I'm converting color image to grayscale and then thresholding using a binary threshold.
The background is white, blobs are black. I want to detect those black blobs. Also, I would like to draw their contour in color but I'm not able to do it because image is black and white.
I've managed to accomplish this in grayscale but I don't like how the contours are drawn; it's as if the color tolerance is too high and the contour is bigger than the actual blob (maybe the blobs are too small?). I guess this 'tolerance' I talk about has something to do with setHsvColor but I don't quite understand that method.
Thanks in advance! Best Regards
UPDATE MORE INFO
The image I want to track is of ink splits. Imagine a white piece of paper with black ink splits. Right now I'm doing it in real-time (camera view). The actual app would take a picture and analyse that picture.
As I said above, I took the color-blob-detection sample (android) from the openCV GitHub repo, and I added this code in the onCameraFrame method (in order to convert it to black and white in real time). The conversion is made so I don't mind if the ink is black, blue, or red:
mRgba = inputFrame.rgba();
/**************************************************************************/
/** BLACK AND WHITE **/
// Convert to Grey
Imgproc.cvtColor(inputFrame.gray(), mRgba, Imgproc.COLOR_GRAY2RGBA, 4);
Mat blackAndWhiteMat = new Mat ( H, W, CvType.CV_8U, new Scalar(1));
double umbral = 100.0;
Imgproc.threshold(mRgba, blackAndWhiteMat , umbral, 255, Imgproc.THRESH_BINARY);
// convert back to bitmap for displaying
Bitmap resultBitmap = Bitmap.createBitmap(mRgba.cols(), mRgba.rows(), Bitmap.Config.ARGB_8888);
blackAndWhiteMat.convertTo(blackAndWhiteMat, CvType.CV_8UC1);
Utils.matToBitmap(blackAndWhiteMat, resultBitmap);
/**************************************************************************/
This may not be the best way but it works.
Now I want to detect black blobs (ink splits). I guess they are detected, because the Logcat (log output of the sample app) prints the number of contours detected, but I'm not able to see them because the image is black and white and I want the contour to be red, for example.
Here's an example image:-
And here is what I get using RGB (color-blob-detection as is, not black and white image). Notice how small blobs are not detected. (Is it possible to detect them? or are they too small?)
Thanks for your help! If you need more info I would gladly update this question
UPDATE: GitHub repo of color-blob-detection sample (second image)
GitHub Repo of openCV sample for Android
The solution is based on a combination of adaptive image thresholding and the connected-component algorithm.
Assumption - The paper is the most lit area of the image whereas the ink spots on the paper are darkest regions.
from random import Random
import numpy as np
import cv2


def random_color(random):
    """
    Return a random color
    """
    icolor = random.randint(0, 0xFFFFFF)
    return [icolor & 0xff, (icolor >> 8) & 0xff, (icolor >> 16) & 0xff]


# Read as Grayscale
img = cv2.imread('1-input.jpg', 0)
cimg = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

# Median blur to remove noisy regions, comment to see its effect.
img = cv2.medianBlur(img, 5)

# Find average intensity to distinguish paper region
avgPixelIntensity = cv2.mean(img)
print("Average intensity of image: ", avgPixelIntensity[0])

# Generate mask to distinguish paper region
# 0.8 - used to ignore ill-illuminated region of paper
mask = cv2.inRange(img, avgPixelIntensity[0] * 0.8, 255)
mask = 255 - mask
cv2.imwrite('2-maskedImg.jpg', mask)

# Approach 1
# You need to choose 4 or 8 for connectivity type (border pixels)
connectivity = 8
# Perform the operation (the label type must be CV_32S or CV_16U)
output = cv2.connectedComponentsWithStats(mask, connectivity, cv2.CV_32S)
# The first cell is the number of labels
num_labels = output[0]
# The second cell is the label matrix
labels = output[1]
# The third cell is the stat matrix
stats = output[2]
# The fourth cell is the centroid matrix
centroids = output[3]
cv2.imwrite("3-connectedcomponent.jpg", labels.astype(np.uint8))  # cast just for visualization
print("Number of labels", num_labels, labels)

# create the random number
random = Random()
for i in range(1, num_labels):
    print(stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP], stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT])
    cv2.rectangle(cimg, (stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]),
                  (stats[i, cv2.CC_STAT_LEFT] + stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_TOP] + stats[i, cv2.CC_STAT_HEIGHT]), random_color(random), 2)
cv2.imwrite("4-OutputImage.jpg", cimg)
The Input Image
Masked Image from thresholding and invert operation.
Use of connected component.
Overlaying output of connected component on input image.

Is there a way to translate a matrix to an alternate scale?

Let's say I start with a Bitmap that's 1000px x 1000px.
I load it into a SurfaceView-resident Canvas that is displayed at some arbitrary (and, depending on the device, different) dimensions.
I can get those dimensions at runtime and measure the scale between them and the original (if I need that information at the end).
Then I allow the user to pinch/zoom/translate the image around the displayed canvas. All the while I have a Matrix which keeps track of, and is used to re-draw the image in its displayed screen region.
Subsequently this Matrix's values all apply to the scaled space (and not the original 1000x1000 graphic).
So far so good - I have all this working.
However, when all is said and done, I'd like to apply this Matrix to the original Bitmap and save it out. However, I'm at a loss as to how to modify all its internal values to apply it back to the unscaled original (1000x1000) size.
Curious if there's some auto-magical way to translate these or if I have to somehow apply each value based on the scale between the two sizes back to a new Matrix.
To invert a matrix { a, b, c, d, e, f }: the inverse is the matrix { ia, ib, ic, id, ie, if } where
var cross = a * d - b * c;
ia = d / cross;
ib = -b / cross;
ic = -c / cross;
id = a / cross;
ie = (c * f - d * e) / cross;
if = -(a * f - b * e) / cross;
This reverses the transform: the original matrix maps image coordinates to screen coordinates, and the inverse matrix maps screen coordinates back to image coordinates.
If you have a transform on the screen and want to know where on the image the top left of the screen is, get the inverse transform and apply it to the screen coordinate (0, 0), the top left.
scrX = ?
scrY = ?
imageX = scrX * ia + scrY * ic + ie;
imageY = scrX * ib + scrY * id + if;
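The same math as a small numpy sketch (the six values a..f are laid out as the 2x3 affine matrix [[a, c, e], [b, d, f]], which matches the inverse formulas above; on Android, android.graphics.Matrix.invert and mapPoints do the equivalent for you, so this is only to illustrate the idea):
import numpy as np

def invert_affine(a, b, c, d, e, f):
    # full 3x3 form of the 2D affine transform { a, b, c, d, e, f }
    m = np.array([[a, c, e],
                  [b, d, f],
                  [0, 0, 1]], dtype=np.float64)
    return np.linalg.inv(m)

def screen_to_image(inv, scr_x, scr_y):
    # apply the inverse transform to a screen point to get the image point
    img_x, img_y, _ = inv @ np.array([scr_x, scr_y, 1.0])
    return img_x, img_y

# example: a display transform that scales by 2 and translates by (100, 50)
inv = invert_affine(2, 0, 0, 2, 100, 50)
print(screen_to_image(inv, 0, 0))  # -> (-50.0, -25.0), the image point under the screen's top-left corner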

How to tell what part of a texture on a 3d cube was touched [duplicate]

I have a renderer using directx and openGL, and a 3d scene. The viewport and the window are of the same dimensions.
How do I implement picking given mouse coordinates x and y in a platform independent way?
If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersect it with your models.
If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
EDIT: If the question is how to construct the ray from the mouse coordinates, you need the following: a projection matrix P and the camera transform C. If the coordinates of the mouse pointer are (x, y) and the size of the viewport is (width, height), one position in clip space along the ray is:
mouse_clip = [
float(x) * 2 / float(width) - 1,
1 - float(y) * 2 / float(height),
0,
1]
(Notice that I flipped the y-axis since the origin of the mouse coordinates is often in the upper left corner)
The following is also true:
mouse_clip = P * C * mouse_worldspace
Which gives:
mouse_worldspace = inverse(C) * inverse(P) * mouse_clip
We now have:
p = C.position(); //origin of camera in worldspace
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace
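Putting that together as a small numpy sketch (my own consolidation of the steps above, assuming P and C are 4x4 matrices with C mapping world space to camera space; note the perspective divide after the inverse transform, which the pseudocode glosses over):
import numpy as np

def mouse_ray(x, y, width, height, P, C):
    # clip-space position of the mouse (z = 0 picks a point inside the frustum)
    mouse_clip = np.array([2.0 * x / width - 1.0,
                           1.0 - 2.0 * y / height,
                           0.0,
                           1.0])
    # undo projection and camera transform, then do the perspective divide
    mouse_world = np.linalg.inv(P @ C) @ mouse_clip
    mouse_world = mouse_world[:3] / mouse_world[3]
    # camera origin in world space is the inverse view transform applied to (0, 0, 0)
    p = (np.linalg.inv(C) @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]
    n = mouse_world - p
    return p, n / np.linalg.norm(n)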
Here's the viewing frustum:
First you need to determine where on the nearplane the mouse click happened:
rescale the window coordinates (0..640,0..480) to [-1,1], with (-1,-1) at the bottom-left corner and (1,1) at the top-right.
'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix.
Then determine where the camera is in worldspace, and draw a ray starting at the camera and passing through the point you found on the nearplane.
The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.
Final pseudocode:
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen
unviewMat = (projectionMat * modelViewMat).inverse()
near_point = unviewMat * Vec(normalised_x, normalised_y, 0, 1)
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = near_point - camera_pos
Well, it's pretty simple; the theory behind this is always the same:
1) Unproject your 2D coordinate onto 3D space twice (each API has its own function, but you can implement your own if you want): once at Min Z, once at Max Z.
2) With these two values, calculate the vector that goes from Min Z and points to Max Z.
3) With the vector and a point, calculate the ray that goes from Min Z to Max Z.
4) Now you have a ray; with this you can do a ray-triangle/ray-plane/ray-something intersection and get your result...
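For step 4, a minimal ray-plane intersection sketch (my own illustration, not from the original answer; the plane is given by a point on it and its normal):
import numpy as np

def ray_plane_intersection(ray_origin, ray_dir, plane_point, plane_normal, eps=1e-9):
    # returns the intersection point, or None if the ray misses the plane
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < eps:
        return None                       # ray is parallel to the plane
    t = np.dot(plane_normal, plane_point - ray_origin) / denom
    if t < 0:
        return None                       # intersection lies behind the ray origin
    return ray_origin + t * ray_dir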
I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnproject call.
Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:
// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
// obtain the Z position (not world coordinates but in range 0 - 1)
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);
// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);
If you don't want to use glu, you can also implement gluUnProject yourself; its functionality is relatively simple and is described at opengl.org.
Ok, this topic is old but it was the best I found on the topic, and it helped me a bit, so I'll post here for those who are following ;-)
This is the way I got it to work without having to compute the inverse of Projection matrix:
void Application::leftButtonPress(u32 x, u32 y){
    GL::Viewport vp = GL::getViewport(); // just a call to glGet GL_VIEWPORT
    vec3f p = vec3f::from(
        ((float)(vp.width - x) / (float)vp.width),
        ((float)y / (float)vp.height),
        1.);
    // alternatively vec3f p = vec3f::from(
    //     ((float)x / (float)vp.width),
    //     ((float)(vp.height - y) / (float)vp.height),
    //     1.);
    p *= vec3f::from(APP_FRUSTUM_WIDTH, APP_FRUSTUM_HEIGHT, 1.);
    p += vec3f::from(APP_FRUSTUM_LEFT, APP_FRUSTUM_BOTTOM, 0.);
    // now p elements are in (-1, 1)
    vec3f near = p * vec3f::from(APP_FRUSTUM_NEAR);
    vec3f far = p * vec3f::from(APP_FRUSTUM_FAR);
    // ray in world coordinates
    Ray ray = { _camera->getPos(), -(_camera->getBasis() * (far - near).normalize()) };
    _ray->set(ray.origin, ray.dir, 10000.); // this is a debugging vertex array to see the Ray on screen
    Node* node = _scene->collide(ray, Transform());
    cout << "node is : " << node << endl;
}
This assumes a perspective projection, but the question never arises for the orthographic one in the first place.
I've got the same situation with ordinary ray picking, but something is wrong. I've performed the unproject operation the proper way, but it just doesn't work. I think I've made some mistake, but I can't figure out where. My matrix multiplication, inverse, and vector-by-matrix multiplications all seem to work fine, I've tested them.
In my code I'm reacting to WM_LBUTTONDOWN. So lParam returns the [Y][X] coordinates as 2 words in a dword. I extract them, then convert to normalized space; I've checked that this part also works fine. When I click the lower left corner I get values close to -1 -1, and good values for all 3 other corners. I'm then using the line_points.vtx array for debugging, and it's not even close to reality.
unsigned int x_coord=lParam&0x0000ffff; //X RAW COORD
unsigned int y_coord=client_area.bottom-(lParam>>16); //Y RAW COORD
double xn=((double)x_coord/client_area.right)*2-1; //X [-1 +1]
double yn=1-((double)y_coord/client_area.bottom)*2;//Y [-1 +1]
_declspec(align(16))gl_vec4 pt_eye(xn,yn,0.0,1.0);
gl_mat4 view_matrix_inversed;
gl_mat4 projection_matrix_inversed;
cam.matrixProjection.inverse(&projection_matrix_inversed);
cam.matrixView.inverse(&view_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&projection_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&view_matrix_inversed);
line_points.vtx[line_points.count*4]=pt_eye.x-cam.pos.x;
line_points.vtx[line_points.count*4+1]=pt_eye.y-cam.pos.y;
line_points.vtx[line_points.count*4+2]=pt_eye.z-cam.pos.z;
line_points.vtx[line_points.count*4+3]=1.0;
