Related
I am kind of stuck with this problem, and I know there are so many questions about it on stack overflow but in my case. Nothing gives the expected result.
The Context:
Am using Android OpenCV along with Tesseract so I can read the MRZ area in the passport. When the camera is started I pass the input frame to an AsyncTask, the frame is processed, the MRZ area is extracted succesfully, I pass the extracted MRZ area to a function prepareForOCR(inputImage) that takes the MRZ area as gray Mat and Will output a bitmap with the thresholded image that I will pass to Tesseract.
The problem:
The problem is while thresholding the Image, I use adaptive thresholding with blockSize = 13 and C = 15, but the result given is not always the same depending on the lighting of the image and the conditions in general from which the frame is taken.
What I have tried:
First I am resizing the image to a specific size (871,108) so the input image is always the same and not dependant on which phone is used.
After resizing, I try with different BlockSize and C values
//toOcr contains the extracted MRZ area
Bitmap toOCRBitmap = Bitmap.createBitmap(bitmap);
Mat inputFrame = new Mat();
Mat toOcr = new Mat();
Utils.bitmapToMat(toOCRBitmap, inputFrame);
Imgproc.cvtColor(inputFrame, inputFrame, Imgproc.COLOR_BGR2GRAY);
TesseractResult lastResult = null;
for (int B = 11; B < 70; B++) {
for (int C = 11; C < 70; C++){
if (IsPrime(B) && IsPrime(C)){
Imgproc.adaptiveThreshold(inputFrame, toOcr, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, B ,C);
Bitmap toOcrBitmap = OpenCVHelper.getBitmap(toOcr);
TesseractResult result = TesseractInstance.extractFrame(toOcrBitmap, "ocrba");
if (result.getMeanConfidence()> 70) {
if (MrzParser.tryParse(result.getText())){
Log.d("Main2Activity", "Best result with " + B + " : " + C);
return result;
}
}
}
}
}
Using the code below, the thresholded result image is a black on white image which gives a confidence greater than 70, I can't really post the whole image for privacy reasons, but here's a clipped one and a dummy password one.
Using the MrzParser.tryParse function which adds checks for the character position and its validity within the MRZ, am able to correct some occurences like a name containing a 8 instead of B, and get a good result but it takes so much time, which is normal because am thresholding almost 255 images in the loop, adding to that the Tesseract call.
I already tried getting a list of C and B values which occurs the most but the results are different.
The question:
Is there a way to define a C and blocksize value so that it s always giving the same result, maybe adding more OpenCV calls so The input image like increasing contrast and so on, I searched the web for 2 weeks now I can't find a viable solution, this is the only one that is giving accurate results
You can use a clustering algorithm to cluster the pixels based on color. The characters are dark and there is a good contrast in the MRZ region, so a clustering method will most probably give you a good segmentation if you apply it to the MRZ region.
Here I demonstrate it with MRZ regions obtained from sample images that can be found on the internet.
I use color images, apply some smoothing, convert to Lab color space, then cluster the a, b channel data using kmeans (k=2). The code is in python but you can easily adapt it to java. Due to the randomized nature of the kmeans algorithm, the segmented characters will have label 0 or 1. You can easily sort it out by inspecting cluster centers. The cluster-center corresponding to characters should have a dark value in the color space you are using.
I just used the Lab color space here. You can use RGB, HSV or even GRAY and see which one is better for you.
After segmenting like this, I think you can even find good values for B and C of your adaptive-threshold using the properties of the stroke width of the characters (if you think the adaptive-threshold gives a better quality output).
import cv2
import numpy as np
im = cv2.imread('mrz1.png')
# convert to Lab
lab = cv2.cvtColor(cv2.GaussianBlur(im, (3, 3), 1), cv2.COLOR_BGR2Lab)
im32f = np.array(im[:, :, 1:3], dtype=np.float32)
k = 2 # 2 clusters
term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
ret, labels, centers = cv2.kmeans(im32f.reshape([im.shape[0]*im.shape[1], -1]),
k, None, term_crit, 10, 0)
# segmented image
labels = labels.reshape([im.shape[0], im.shape[1]]) * 255
Some results:
I'm trying to build a simple leaf recognition app with Android and OpenCV; my database consist in just 3 entries (3 pictures of 3 types of leaves) and I would like to be able to recognise if one of the pictures in the database is inside another picture captured by the smartphone.
I'm using the SURF method for extract keypoints from the database images and then compared them with the extracted keypoints of the captured image looking for a match.
My problem is that the result appears as a "color matching", more than a "feature matching": when I compare a picture from the database and the one captured, the number of matches is equal with all 3 entries and thus I get a wrong matching.
This one of the picture from the database (note that is without backgroud)
And this is the result that I get:
Image on top is the one captured from the smartphone and the image below is the result with matches highlighted.
Here is the code that I implemented:
Mat orig = Highgui.imread(photoPathwithoutFile);
Mat origBW = new Mat();
Imgproc.cvtColor(orig, origBW, Imgproc.COLOR_RGB2GRAY);
MatOfKeyPoint kpOrigin = createSURFdetector(origBW);
Mat descOrig = extractDescription(kpOrigin, origBW);
Leaf result = findMatches(descOrig);
Mat imageOut = orig.clone();
Features2d.drawMatches(orig, kpOrigin, maple, keypointsMaple, resultMaple, imageOut);
public MatOfKeyPoint createSURFdetector (Mat origBW) {
FeatureDetector surf = FeatureDetector.create(FeatureDetector.FAST);
MatOfKeyPoint keypointsOrig = new MatOfKeyPoint();
surf.detect(origBW, keypointsOrig);
return keypointsOrig;
}
public Mat extractDescription (MatOfKeyPoint kpOrig, Mat origBW) {
DescriptorExtractor surfExtractor = DescriptorExtractor.create(FeatureDetector.SURF);
Mat origDesc = new Mat();
surfExtractor.compute(origBW, kpOrig, origDesc);
return origDesc;
}
public Leaf findMatches (Mat descriptors) {
DescriptorMatcher m = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE);
MatOfDMatch max = new MatOfDMatch();
resultMaple = new MatOfDMatch();
resultChestnut = new MatOfDMatch();
resultSwedish = new MatOfDMatch();
Leaf match = null;
m.match(descriptors, mapleDescriptors, resultMaple);
Log.d("Origin", resultMaple.toList().size()+" matches with Maples");
if (resultMaple.toList().size() > max.toList().size()) { max = resultMaple; match = Leaf.MAPLE; }
m.match(descriptors, chestnutDescriptors, resultChestnut);
Log.d("Origin", resultChestnut.toList().size()+" matches with Chestnut");
if (resultChestnut.toList().size() > max.toList().size()) { max = resultChestnut; match = Leaf.CHESTNUT; }
m.match(descriptors, swedishDescriptors, resultSwedish);
Log.d("Origin", resultSwedish.toList().size()+" matches with Swedish");
if (resultSwedish.toList().size() > max.toList().size()) { max = resultSwedish; match = Leaf.SWEDISH; }
//return the match object with more matches
return match;
}
How can I get a more accurate matching not based on colours but on actual singularities of the picture?
Well, SURF is not the best candidate for this task. SURF descriptor basically encodes some gradient statistics in a small neighborhood of a corner. This gives you invariance to lot of transformations, but you lose the 'big picture' when doing this. This descriptor is used to narrow down a range of correspondences between points to be matched, and then some geometric contraints come into play.
In your case it seems that descriptors are not doing a great job at matching points, and since there are a LOT of them each point eventually gets a match (although it is strange that geometric testing didn't prevent that).
I can advice you to try different approach to matching, maybe HOG with descriptors trained to detect leaf types, or even something contour-based, since shape is what is really different between your images. You can for example detect leaf's outline, normalize it's length, find it's center and then in equal intervals calculate distance from each point to the center - that will be your descriptor. Than find the largest length and circularly shift this descriptor to start at the extrema and divide by this value - that will give you some basic invariance to choice of contour starting point, rotation and scale. But that will most likely fail under perspective and affine transformations.
If you would like to experiment further with feature points - try to detect less of them ,but more representative ones (filter by gradient strength, corner score or something). Maybe use SIFT instead of SURF - it should be a bit more precise. Check for amount of inliers after matching - best match should have higher ratio.
But honestly, this seems more like a machine learning task than computer vision.
Edit: I have checked your code and found out that you are not performing geometric checks on matches, hence why you are getting incorrect match. Try performing findHomography after matching and then consider only points that have been set to one in mask output argument. This will make you only consider points that can be warped to each other using homography and may improve matching a lot.
Edit2: added a code snippet (sorry, but I can't test Java at the moment, so it's in Python)
import cv2
import numpy as np
# read input
a = cv2.imread(r'C:\Temp\leaf1.jpg')
b = cv2.imread(r'C:\Temp\leaf2.jpg')
# convert to gray
agray = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
bgray = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
# detect features and compute descriptors
surf = cv2.SURF() # better use SIFT instead
kp1, d1 = surf.detectAndCompute(agray,None)
kp2, d2 = surf.detectAndCompute(bgray,None)
print 'numFeatures1 =', len(kp1)
print 'numFeatures2 =', len(kp2)
# use KNN matcher
bf = cv2.BFMatcher()
matches = bf.knnMatch(d1,d2, k=2)
# Apply Lowe ratio test
good = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good.append(m)
print 'numMatches =', len(matches)
print 'numGoodMatches =', len(good)
# if have enough matches - try to calculare homography to discard matches
# that don't fit perspective transformation model
if len(good)>10:
# convert matches into correct format (python-specific)
src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
print 'numMatches =', sum(mask.ravel().tolist()) # calc number of 1s in mask
else:
print "not enough good matches are found"
It gives me following output for different leaves using SURF
numFeatures1 = 685
numFeatures2 = 1566
numMatches = 685
numGoodMatches = 52
numMatches = 11
You can see that the amount of 'real' matches is very small. But unfortunately numMatches is similar when we match different images of same leaf type. Maybe you can improve the result by tweaking parameters, but I think using keypoints here is just a not very good approach. Maybe it is due to the leaf variation even within a same class.
I am working on an app that is expected to remove image backgrounds using opencv, at first I tried using grabcut but it was too slow and the results were not always accurate, then I tried using threshold, although the results are not yet close th grabcut, its very fast and looks like a better, So my code is first looking at the image hue and analying which portion of it appears more, that portion is taken in as the background, the issue is at times its getting the foreground as background below is my code:
private Bitmap backGrndErase()
{
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.skirt);
Log.d(TAG, "bitmap: " + bitmap.getWidth() + "x" + bitmap.getHeight());
bitmap = ResizeImage.getResizedBitmap(bitmap, calculatePercentage(40, bitmap.getWidth()), calculatePercentage(40, bitmap.getHeight()));
Mat frame = new Mat();
Utils.bitmapToMat(bitmap, frame);
Mat hsvImg = new Mat();
List<Mat> hsvPlanes = new ArrayList<>();
Mat thresholdImg = new Mat();
// int thresh_type = Imgproc.THRESH_BINARY_INV;
//if (this.inverse.isSelected())
int thresh_type = Imgproc.THRESH_BINARY;
// threshold the image with the average hue value
hsvImg.create(frame.size(), CvType.CV_8U);
Imgproc.cvtColor(frame, hsvImg, Imgproc.COLOR_BGR2HSV);
Core.split(hsvImg, hsvPlanes);
// get the average hue value of the image
double threshValue = this.getHistAverage(hsvImg, hsvPlanes.get(0));
Imgproc.threshold(hsvPlanes.get(0), thresholdImg, threshValue, mThresholdValue, thresh_type);
// Imgproc.adaptiveThreshold(hsvPlanes.get(0), thresholdImg, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 11, 2);
Imgproc.blur(thresholdImg, thresholdImg, new Size(5, 5));
// dilate to fill gaps, erode to smooth edges
Imgproc.dilate(thresholdImg, thresholdImg, new Mat(), new Point(-1, -1), 1);
Imgproc.erode(thresholdImg, thresholdImg, new Mat(), new Point(-1, -1), 3);
Imgproc.threshold(thresholdImg, thresholdImg, threshValue, mThresholdValue, Imgproc.THRESH_BINARY);
//Imgproc.adaptiveThreshold(thresholdImg, thresholdImg, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 11, 2);
// create the new image
Mat foreground = new Mat(frame.size(), CvType.CV_8UC3, new Scalar(255, 255, 255));
frame.copyTo(foreground, thresholdImg);
Utils.matToBitmap(foreground,bitmap);
//return foreground;
alreadyRun = true;
return bitmap;
}
the method responsible for Hue:
private double getHistAverage(Mat hsvImg, Mat hueValues)
{
// init
double average = 0.0;
Mat hist_hue = new Mat();
// 0-180: range of Hue values
MatOfInt histSize = new MatOfInt(180);
List<Mat> hue = new ArrayList<>();
hue.add(hueValues);
// compute the histogram
Imgproc.calcHist(hue, new MatOfInt(0), new Mat(), hist_hue, histSize, new MatOfFloat(0, 179));
// get the average Hue value of the image
// (sum(bin(h)*h))/(image-height*image-width)
// -----------------
// equivalent to get the hue of each pixel in the image, add them, and
// divide for the image size (height and width)
for (int h = 0; h < 180; h++)
{
// for each bin, get its value and multiply it for the corresponding
// hue
average += (hist_hue.get(h, 0)[0] * h);
}
// return the average hue of the image
average = average / hsvImg.size().height / hsvImg.size().width;
return average;
}
A sample of the input and output:[
Input Image 2 and Output:
Input Image 3 and Output:
Indeed, as others have said you are unlikely to get good results just with a threshold on hue. You can use something similar to GrabCut, but faster.
Under the hood, GrabCut calculates foreground and background histograms, then calculates the probability of each pixel being FG/BG based on these histograms, and then optimizes the resulting probability map using graph cut to obtain a segmentation.
Last step is most expensive, and it may be ignored depending on the application. Instead, you may apply the threshold to the probability map to obtain a segmentation. It may (and will) be worse than GrabCut, but will be better than your current approach.
There are some points to consider for this approach. The choice of histogram model would be very important here. You can either consider 2 channels in some space like YUV or HSV, consider 3 channels of RGB, or consider 2 channels of normalized RGB. You also have to select an appropriate bin size for those histograms. Too small bins would lead to 'overtraining', while too large will reduce the precision. The tradeoffs between those are a topic for a separate discussion, in brief - I would advice using RGB with 64 bins per channel for start and then see what changes are better for your data.
Also, you can get better results for coarse binning if you use interpolation to get values between bins. In past I have used trilinear interpolation and it was kind of good, compared to no interpolation at all.
But remember that there are no guarantees that your segmentation will be correct without prior knowledge on object shape, either with GrabCut, thresholding or this approach.
I would try again Grabcut, it is one of the best segmentation methods available. This is the result I get
cv::Mat bgModel,fgModel; // the models (internally used)
cv::grabCut(image,// input image
object_mask,// segmentation result
rectang,// rectangle containing foreground
bgModel,fgModel, // models
5,// number of iterations
cv::GC_INIT_WITH_RECT); // use rectangle
// Get the pixels marked as likely foreground
cv::compare(object_mask,cv::GC_PR_FGD,object_mask,cv::CMP_EQ);
cv::threshold(object_mask, object_mask, 0,255, CV_THRESH_BINARY); //ensure the mask is binary
The only problem of Grabcut is that you have to give as an input a rectangle containing the object you want to extract. Apart from that it works pretty well.
Your method of finding average hue is WRONG! As you most probably know, hue is expressed as angle and takes value in [0,360] range. Therefore, a pixel with hue 360 essentially has same colour as a pixel with hue 0 (both are pure red). In the same way, a pixel with hue 350 is actually closer to a pixel with hue 10 than a pixel with hue, say for example, 300.
As for opencv, cvtColor function actually divides calculated hue value by 2 to fit it in 8 bit integer. Thus, in opencv, hue values wrap after 180. Now, consider we have two red(ish) pixels with hues 10 and 170. If we take their average, we will get 90 — hue of pure cyan, the exact opposite of red — which is not our desired value.
Therefore, to correctly find the average hue, you need to first find average pixel value in RGB colour space, then calculate the hue from this RGB value. You can create 1x1 matrix with average RGB pixel and convert it to HSV/HSL.
Following the same reasoning, applying threshold to hue image doesn't work flawlessly. It does not consider wrapping of hue values.
If I understand correctly, you want to find pixels with similar hue as the background. Assuming we know the colour of background, I would do this segmentation in RGB space. I would introduce some tolerance variable. I would use the background pixel value as centre and this tolerance as radius and thus define a sphere in RGB colour space. Now, rest is inspecting each pixel value, if it falls inside this sphere, then classify as background; otherwise, regard it as foreground pixel.
In OpenCV for Android, the function org.opencv.Calib3d.findHomography(..) returns the homogeneous transformation matrix. For example, this only returns the homography:
Mat homography = Calib3d.findHomography(points1, points2, Calib3d.RANSAC, 0.5);
Is there a way to return the points that RANSAC actually uses from the Android OpenCV API?
Update
I am not sure whether it's a new addition to OpenCV or I've just missed it, but the findHomography() function actually can give you the inliers (OpenCV 2.4.2). The last parameter, mask, which is empty by default, will be filled with ones (or 255) at the indexes of the inliers foound by RANSAC.
Mat findHomography(InputArray srcPoints, InputArray dstPoints,
int method=0, double ransacReprojThreshold=3, OutputArray mask=noArray() )
// ^
// |
Old answer
The points used by RANSAC to estimate the homography (called inliers in technical docs) cannot be extracted directly. They are computed internally, but then the list is deleted.
A way to extract them is to modify the findHomography function (and the corresponding RANSAC functions). But this is ugly.
Another, cleaner way is to test what point pairs in the input match th homography:
use the projectPoints(points1, homography, points1_dest) (i hope this is the function name) to apply homography to points1.
The correct function name and input arguments order is:
void perspectiveTransform(InputArray src, OutputArray dst, InputArray m), in this case cv::perspectiveTransform(points1, points1_dest, homography)
OpenCV Perspective Transform
use cv::distance(points1_dest, points2)
The correct function name and input arguments order is:
double norm(InputArray src1, int normType=NORM_L2, InputArray mask=noArray())
possible implementation:
std::array<cv::Point2f, 1> pt1;
pt1[0] = points1_dest;
std::array<cv::Point2f, 1> pt2;
pt2[0] = points2;
distance = cv::norm(pt1, pt2));
OpenCV norm function
Distance between two points can be also calculated using Pythagorean theorem
to see which of them are close enough to their pair in points2. The distance should be smaller or equal to min_distance^2. In your case, 0.5*0.5 = 0.25.
After using OpenCV's PCACompute function as shown below, I have a Mat representing mean and a Mat of eigenvectors.
org.opencv.core.Core.PCACompute(datiOriginali,mean, eigenvectors,0);
datiOriginali is my input Mat, mean is the mean value Mat and eigenvectors is the eigenvectors Mat.
From there, I used PCAProject thus:
org.opencv.core.Core.PCAProject(datiOriginali, mean,eigenvectors , res);
datiOriginali is always the input Mat, mean and eigenvectors are the same as were calculated in PCACompute and res is the output Mat.
How can I perform face recognition using an eigenfaces value? I don't know how to calculate the Euclidean distance between a training image calculated as shown above and a new image.