I want to recognize the digits of an odometer on a mobile device using the Tesseract library.
Source image:
Next step:
Now I need to fill the gaps between the segments of each digit.
Can you help me figure out how to do this?
(The English training data works better for me than https://github.com/arturaugusto/display_ocr.)
Image processing:
func prepareImage(sourceImage: UIImage) -> UIImage {
    let avgLuminanceThresholdFilter = GPUImageAverageLuminanceThresholdFilter()
    avgLuminanceThresholdFilter.thresholdMultiplier = 0.67

    let adaptiveThresholdFilter = GPUImageAdaptiveThresholdFilter()
    adaptiveThresholdFilter.blurRadiusInPixels = 0.67

    let unsharpMaskFilter = GPUImageUnsharpMaskFilter()
    unsharpMaskFilter.blurRadiusInPixels = 4.0

    let stillImageFilter = GPUImageAdaptiveThresholdFilter()
    stillImageFilter.blurRadiusInPixels = 1.0

    let contrastFilter = GPUImageContrastFilter()
    contrastFilter.contrast = 0.75

    let brightnessFilter = GPUImageBrightnessFilter()
    brightnessFilter.brightness = -0.25

    //unsharpen
    var processingImage = unsharpMaskFilter.imageByFilteringImage(sourceImage)
    processingImage = contrastFilter.imageByFilteringImage(processingImage)
    processingImage = brightnessFilter.imageByFilteringImage(processingImage)
    //convert to binary black/white pixels
    processingImage = avgLuminanceThresholdFilter.imageByFilteringImage(processingImage)
    return processingImage
}
OCR:
let tesseract_eng = G8Tesseract()
tesseract_eng.language = "eng"
tesseract_eng.engineMode = .TesseractOnly
tesseract_eng.pageSegmentationMode = .Auto
tesseract_eng.maximumRecognitionTime = 60.0
tesseract_eng.setVariableValue("0123456789", forKey: "tessedit_char_whitelist")
tesseract_eng.image = prepareImage(image)
tesseract_eng.recognize()
OpenCV has some morphology methods which can fill in the gaps between the segments (like THIS or THIS). Pay attention to the morphological opening method; it should be the primary tool for solving this, but don't be afraid to combine it with dilation if opening alone doesn't help. I'm not sure which software you use for image processing; if it has similar methods, try them out. Otherwise I would highly recommend installing OpenCV, which is free and offers many very fast image-processing operations. You could also experiment a bit with the threshold values and find the balance between how many corners it cuts off and how much shadow it removes; combined with the morphological operations, this should solve the issue for you.
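For example, with the OpenCV Java bindings an opening followed by a dilation could look roughly like this (a sketch only; the file names and kernel sizes are placeholders to tune against the gap width, and it assumes the digits come out white on black after thresholding):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class SegmentGapFilling {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // Already-binarized odometer crop (hypothetical file name);
        // assumes the digits are white on a black background.
        Mat binary = Imgcodecs.imread("odometer_binary.png", Imgcodecs.IMREAD_GRAYSCALE);

        // Small opening first to knock out isolated noise pixels.
        Mat kernelSmall = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3));
        Mat cleaned = new Mat();
        Imgproc.morphologyEx(binary, cleaned, Imgproc.MORPH_OPEN, kernelSmall);

        // Then dilate with a kernel roughly as wide as the gaps between segments,
        // so the strokes of each digit merge into one connected shape.
        Mat kernelGap = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5, 5));
        Mat filled = new Mat();
        Imgproc.dilate(cleaned, filled, kernelGap);

        Imgcodecs.imwrite("odometer_filled.png", filled);
    }
}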
I am developing an application in Android which has to apply a Sepia effect to an uploaded image. The application already exists on iOS, and I came across the function in its Swift code that applies the Sepia effect.
func applySepia() -> UIImage? {
    filterValue = 1
    let image = self.processPixels(filterValue: filterValue)
    guard let cgimg = image?.cgImage else {
        print("imageView doesn't have an image!")
        return self
    }
    let value = -filterValue
    print("sliderValue = \(filterValue)")
    print("value = \(value)")
    let openGLContext = EAGLContext(api: .openGLES2)
    let context = CIContext(eaglContext: openGLContext!)
    let coreImage = CIImage(cgImage: cgimg)
    let filter = CIFilter(name: "CISepiaTone")
    filter?.setValue(coreImage, forKey: kCIInputImageKey)
    filter?.setValue(value, forKey: kCIInputIntensityKey)
    if let output = filter?.value(forKey: kCIOutputImageKey) as? CIImage {
        let cgimgresult = context.createCGImage(output, from: output.extent)
        let image = UIImage(cgImage: cgimgresult!)
        return image.applySharpness(filterValue: filterValue)
    }
    return self
}
The above function uses the "CISepiaTone" filter to implement the sepia tone. Here kCIInputIntensityKey is passed as -1, which I am unable to understand. As per the documentation, its value ranges between 0 and 1, so how is a negative value allowed? Due to this intensity value, the generated image looks like this:
In my opinion, after applying Sepia it should look like:
I am able to achieve the second image on Android (which is a true sepia tone) using https://github.com/StevenRudenko/ColorMartix/blob/master/src/com/sample/colormatrix/Main.java
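For reference, the ColorMatrix approach from that sample boils down to something like the following sketch (simplified, not the exact code from the repo; the scale factors are just typical sepia values):

import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.ColorMatrix;
import android.graphics.ColorMatrixColorFilter;
import android.graphics.Paint;

public final class SepiaUtil {

    // Plain sepia tone: desaturate first, then tint towards warm browns.
    public static Bitmap applySepia(Bitmap src) {
        Bitmap out = Bitmap.createBitmap(src.getWidth(), src.getHeight(), Bitmap.Config.ARGB_8888);
        Canvas canvas = new Canvas(out);

        ColorMatrix matrix = new ColorMatrix();
        matrix.setSaturation(0f);                 // grayscale first

        ColorMatrix sepiaTint = new ColorMatrix();
        sepiaTint.setScale(1f, 0.95f, 0.82f, 1f); // scale down green/blue for the warm tone
        matrix.postConcat(sepiaTint);

        Paint paint = new Paint();
        paint.setColorFilter(new ColorMatrixColorFilter(matrix));
        canvas.drawBitmap(src, 0, 0, paint);
        return out;
    }
}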
However, I couldn't find any built-in method or class in Android which can be used to implement a sepia tone with negative intensity, the way it is applied in the iOS Swift code. Here are my questions:
How is iOS allowing a negative value for kCIInputIntensityKey despite the fact that it should range between 0 and 1?
With a negative intensity value, the generated image does not look like a sepia tone.
How can I achieve the same effect in Android?
I am kind of stuck with this problem, and I know there are many questions about it on Stack Overflow, but in my case nothing gives the expected result.
The Context:
I'm using OpenCV on Android along with Tesseract so I can read the MRZ area of a passport. When the camera is started, I pass the input frame to an AsyncTask; the frame is processed and the MRZ area is extracted successfully. I then pass the extracted MRZ area to a function prepareForOCR(inputImage), which takes the MRZ area as a gray Mat and outputs a bitmap of the thresholded image, which I pass to Tesseract.
The problem:
The problem arises while thresholding the image. I use adaptive thresholding with blockSize = 13 and C = 15, but the result is not always the same; it depends on the lighting and the general conditions under which the frame is taken.
What I have tried:
First I resize the image to a specific size (871, 108), so the input image is always the same and not dependent on which phone is used.
After resizing, I try different blockSize and C values:
//toOcr contains the extracted MRZ area
Bitmap toOCRBitmap = Bitmap.createBitmap(bitmap);
Mat inputFrame = new Mat();
Mat toOcr = new Mat();
Utils.bitmapToMat(toOCRBitmap, inputFrame);
Imgproc.cvtColor(inputFrame, inputFrame, Imgproc.COLOR_BGR2GRAY);

TesseractResult lastResult = null;
for (int B = 11; B < 70; B++) {
    for (int C = 11; C < 70; C++) {
        if (IsPrime(B) && IsPrime(C)) {
            Imgproc.adaptiveThreshold(inputFrame, toOcr, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, B, C);
            Bitmap toOcrBitmap = OpenCVHelper.getBitmap(toOcr);
            TesseractResult result = TesseractInstance.extractFrame(toOcrBitmap, "ocrba");
            if (result.getMeanConfidence() > 70) {
                if (MrzParser.tryParse(result.getText())) {
                    Log.d("Main2Activity", "Best result with " + B + " : " + C);
                    return result;
                }
            }
        }
    }
}
Using the code above, the thresholded result is a black-on-white image which gives a confidence greater than 70. I can't really post the whole image for privacy reasons, but here's a clipped one and a dummy passport one.
Using the MrzParser.tryParse function, which adds checks for the character positions and their validity within the MRZ, I am able to correct some occurrences (like a name containing an 8 instead of a B) and get a good result, but it takes a lot of time, which is normal because I am thresholding almost 255 images in the loop, plus a Tesseract call for each.
I already tried collecting the C and B values that occur most often, but the results still vary.
The question:
Is there a way to choose C and blockSize values so that they always give the same result, perhaps by adding more OpenCV calls to condition the input image (increasing contrast and so on)? I have searched the web for two weeks now and can't find a viable solution; this brute-force loop is the only approach that gives accurate results.
You can use a clustering algorithm to cluster the pixels based on color. The characters are dark and there is a good contrast in the MRZ region, so a clustering method will most probably give you a good segmentation if you apply it to the MRZ region.
Here I demonstrate it with MRZ regions obtained from sample images that can be found on the internet.
I use color images, apply some smoothing, convert to the Lab color space, then cluster the a, b channel data using kmeans (k=2). The code is in Python, but you can easily adapt it to Java. Due to the randomized nature of the kmeans algorithm, the segmented characters will have label 0 or 1. You can easily sort this out by inspecting the cluster centers: the cluster center corresponding to the characters should have a dark value in the color space you are using.
I just used the Lab color space here. You can use RGB, HSV or even GRAY and see which one works better for you.
After segmenting like this, I think you can even find good values for B and C of your adaptive-threshold using the properties of the stroke width of the characters (if you think the adaptive-threshold gives a better quality output).
import cv2
import numpy as np
im = cv2.imread('mrz1.png')
# convert to Lab
lab = cv2.cvtColor(cv2.GaussianBlur(im, (3, 3), 1), cv2.COLOR_BGR2Lab)
im32f = np.array(lab[:, :, 1:3], dtype=np.float32)
k = 2 # 2 clusters
term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
ret, labels, centers = cv2.kmeans(im32f.reshape([im.shape[0]*im.shape[1], -1]),
k, None, term_crit, 10, 0)
# segmented image
labels = labels.reshape([im.shape[0], im.shape[1]]) * 255
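Since your pipeline is in Java, roughly the same steps with the OpenCV Java bindings might look like this (an untested sketch; as noted above, you still have to check which cluster center is darker to know which label corresponds to the characters):

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.core.TermCriteria;
import org.opencv.imgproc.Imgproc;

public class MrzKmeans {

    // 'mrz' is the color (BGR) Mat of the extracted MRZ region.
    public static Mat segmentMrz(Mat mrz) {
        // Smooth and convert to Lab, as in the Python snippet.
        Mat blurred = new Mat();
        Imgproc.GaussianBlur(mrz, blurred, new Size(3, 3), 1);
        Mat lab = new Mat();
        Imgproc.cvtColor(blurred, lab, Imgproc.COLOR_BGR2Lab);

        // One row of float samples per pixel, keeping only the a,b channels.
        Mat lab32f = new Mat();
        lab.convertTo(lab32f, CvType.CV_32F);
        Mat samples = lab32f.reshape(1, lab.rows() * lab.cols()).colRange(1, 3).clone();

        // k-means with k = 2 (characters vs. background).
        Mat labels = new Mat();
        Mat centers = new Mat();
        TermCriteria criteria = new TermCriteria(TermCriteria.EPS + TermCriteria.MAX_ITER, 30, 0.1);
        Core.kmeans(samples, 2, labels, criteria, 10, Core.KMEANS_RANDOM_CENTERS, centers);

        // Turn the labels back into an image; which label is "characters"
        // must be decided by checking which cluster center is darker.
        Mat segmented = labels.reshape(1, lab.rows());
        segmented.convertTo(segmented, CvType.CV_8U, 255);
        return segmented;
    }
}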
Some results:
I'm using the nativescript-google-maps-sdk plugin to create a Google map.
Everything works fine, but I've got a problem with my custom marker icons: if you look at these pictures, you can see that the icon size is not preserved on Android, making the markers very, very small, to the point where you can barely even see them. This happens both in the emulators and on a real phone.
On iOS, however, the size is fine, as you can see in the second image. The icon images have a size of 16x16 pixels and are in .png format.
I haven't been able to find any solution to this so this is my last resort, does anyone know why this might be happening?
This is the code I use to create the markers:
getImage(this.getWarningIcon(warning.status)).then((result) => {
    const icon = new Image();
    icon.imageSource = result;

    const marker = new Marker();
    marker.position = warning.centerOfPolygon;
    marker.icon = icon;
    marker.flat = true;
    marker.anchor = [0.5, 0.5];
    marker.visible = warning.isVisible;
    marker.zIndex = zIndexOffset;
    marker.infoWindowTemplate = 'markerTemplate';
    marker.userData = {
        description: warning.description,
        startTime: warning.startTime,
        completionTime: warning.completionTime,
        freeText: warning.freeText
    };

    this.layers.push(marker);
    this.map.addMarker(marker);
});
In that case 16px sounds too low for a high-density device. Increase the size of the image sent from the server, or locally resize the image before passing it to the marker.
You may also consider generating a scaled bitmap natively if you are familiar with the Android APIs. Image processing is always somewhat complicated on Android. Using drawables is recommended, at least when your images are static.
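If you do go native, the scaling itself is only a couple of Android API calls, something along these lines (a sketch; the dp size and the byte-array input are assumptions about how you receive the icon):

import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.util.TypedValue;

public final class MarkerIconScaler {

    // Scales the small marker icon up according to the device's density
    // before handing it to the marker (the dp value is just an example).
    public static Bitmap scaledIcon(Context context, byte[] pngBytes, float sizeDp) {
        Bitmap raw = BitmapFactory.decodeByteArray(pngBytes, 0, pngBytes.length);
        int sizePx = (int) TypedValue.applyDimension(
                TypedValue.COMPLEX_UNIT_DIP, sizeDp, context.getResources().getDisplayMetrics());
        return Bitmap.createScaledBitmap(raw, sizePx, sizePx, true);
    }
}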
After some weeks of waiting I finally have my Project Tango. My idea is to create an app that generates a point cloud of my room and exports this to .xyz data. I'll then use the .xyz file to show the point cloud in a browser! I started off by compiling and adjusting the point cloud example that's on Google's github.
Right now I use onXyzIjAvailable(TangoXyzIjData tangoXyzIjData) to get a frame of x, y and z values, i.e. the points. I then save these frames in a PCLManager in the form of Vector3. After I'm done scanning my room, I simply write all the Vector3 from the PCLManager to a .xyz file using:
OutputStream os = new FileOutputStream(file);
size = pointCloud.size();
for (int i = 0; i < size; i++) {
    String row = String.valueOf(pointCloud.get(i).x) + " "
            + String.valueOf(pointCloud.get(i).y) + " "
            + String.valueOf(pointCloud.get(i).z) + "\r\n";
    os.write(row.getBytes());
}
os.close();
Everything works fine: no compilation errors or crashes. The only thing that seems to be going wrong is the rotation or translation of the points in the cloud. When I view the point cloud, everything is messed up; the area I scanned is not recognizable, though the number of points is the same as recorded.
Could this have something to do with the fact that I don't use PoseData together with the XyzIjData? I'm kind of new to this subject and have a hard time understanding what PoseData actually does. Could someone explain it to me and help me fix my point cloud?
Yes, you have to use TangoPoseData.
I guess you are using TangoXyzIjData correctly; but the data you get this way is relative to where the device is and how the device is tilted when you take the shot.
Here's how I solved this:
I started from java_point_to_point_example. In this example they get the coordinates of two different points in two different coordinate systems and then write those coordinates with respect to the base coordinate frame pair.
First of all you have to set up your extrinsics, so you'll be able to perform all the transformations you'll need. To do that I call mExtrinsics = setupExtrinsics(mTango) at the end of my setTangoListener() function. Here's the code (which you can also find in the example I linked above).
private DeviceExtrinsics setupExtrinsics(Tango mTango) {
    //camera to IMU transform
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair();
    framePair.baseFrame = TangoPoseData.COORDINATE_FRAME_IMU;
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR;
    TangoPoseData imu_T_rgb = mTango.getPoseAtTime(0.0, framePair);
    //IMU to device transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_DEVICE;
    TangoPoseData imu_T_device = mTango.getPoseAtTime(0.0, framePair);
    //IMU to depth transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH;
    TangoPoseData imu_T_depth = mTango.getPoseAtTime(0.0, framePair);
    return new DeviceExtrinsics(imu_T_device, imu_T_rgb, imu_T_depth);
}
Then, when you get the point cloud, you have to "normalize" it. Using your extrinsics, this is pretty simple:
public ArrayList<Vector3> normalize(TangoXyzIjData cloud, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    ArrayList<Vector3> normalizedCloud = new ArrayList<>();
    TangoPoseData camera_T_imu = ScenePoseCalculator.matrixToTangoPose(extrinsics.getDeviceTDepthCamera());

    while (cloud.xyz.hasRemaining()) {
        Vector3 rotatedV = ScenePoseCalculator.getPointInEngineFrame(
                new Vector3(cloud.xyz.get(), cloud.xyz.get(), cloud.xyz.get()),
                camera_T_imu,
                cameraPose
        );
        normalizedCloud.add(rotatedV);
    }

    return normalizedCloud;
}
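For example, in your onXyzIjAvailable callback you could look up the pose at the cloud's timestamp and feed it to normalize, roughly like this (a sketch; it assumes you keep the Tango instance in mTango and the extrinsics from setupExtrinsics in a field called mExtrinsics):

@Override
public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
    // Pose of the device at the exact time the depth frame was captured,
    // expressed with respect to the start-of-service frame.
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_START_OF_SERVICE,
            TangoPoseData.COORDINATE_FRAME_DEVICE);
    TangoPoseData devicePose = mTango.getPoseAtTime(xyzIj.timestamp, framePair);

    // Transform the raw points into the base frame before storing them.
    ArrayList<Vector3> worldPoints = normalize(xyzIj, devicePose, mExtrinsics);
    // ...store worldPoints in your PCLManager here...
}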
This should be enough: now you have a point cloud with respect to your base frame of reference.
If you superimpose two or more of these "normalized" clouds, you get a 3D representation of your room.
There is another way to do this with a rotation matrix, explained here.
My solution is pretty slow (it takes the dev kit around 700 ms to normalize a cloud of ~3000 points), so it is not suitable for a real-time 3D reconstruction application.
At the moment I'm trying to use the Tango 3D Reconstruction Library in C using the NDK and JNI. The library is well documented, but it is very painful to set up your environment and start using JNI (I'm stuck at the moment, in fact).
Drifting
There still is a problem when I turn around with the device. It seems that the point cloud spreads out a lot.
I guess you are experiencing some drifting.
Drifting happens when you use Motion Tracking alone: it consists of a lot of very small errors in estimating your pose which, together, cause a big error in your pose relative to the world. For instance, if you take your Tango device and walk in a circle while tracking your TangoPoseData, and then you draw your trajectory in a spreadsheet or whatever you want, you'll notice that the tablet never returns to its starting point, because it is drifting away.
Solution to that is using Area Learning.
If you have no clear idea about this topic, I suggest watching this talk from Google I/O 2016. It covers lots of points and gives you a nice introduction.
Using area learning is quite simple.
You just have to change your base frame of reference to TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION. This way you tell your Tango to estimate its pose not with respect to where it was when you launched the app, but with respect to a fixed point in the area.
Here's my code:
private static final ArrayList<TangoCoordinateFramePair> FRAME_PAIRS =
        new ArrayList<TangoCoordinateFramePair>();
static {
    FRAME_PAIRS.add(new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
            TangoPoseData.COORDINATE_FRAME_DEVICE
    ));
}
Now you can use this FRAME_PAIRS as usual.
Then you have to modify your TangoConfig to tell Tango to use Area Learning, via the key TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION. Remember that when using TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION you CAN'T use learning mode or load an ADF (area description file).
So you can't use:
TangoConfig.KEY_BOOLEAN_LEARNINGMODE
TangoConfig.KEY_STRING_AREADESCRIPTION
Here's how I initialize TangoConfig in my app:
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
//Turning depth sensor on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
//Turning motiontracking on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING,true);
//If tango gets stuck he tries to autorecover himself.
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY,true);
//Tango tries to store and remember places and rooms,
//this is used to reduce drifting.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION,true);
//Turns the color camera on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
Using this technique you'll get rid of those spreads.
PS
In the talk I linked above, at around 22:35, they show how to port your application to Area Learning. In their example they use TangoConfig.KEY_BOOLEAN_ENABLE_DRIFT_CORRECTION. This key no longer exists (at least in the Java API); use TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION instead.
I'm trying to build a simple leaf-recognition app with Android and OpenCV; my database consists of just 3 entries (3 pictures of 3 types of leaves), and I would like to be able to recognise whether one of the pictures in the database appears inside another picture captured by the smartphone.
I'm using the SURF method to extract keypoints from the database images and then compare them with the keypoints extracted from the captured image, looking for a match.
My problem is that the result looks more like "colour matching" than "feature matching": when I compare a picture from the database with the captured one, the number of matches is the same for all 3 entries, and thus I get a wrong match.
This is one of the pictures from the database (note that it has no background):
And this is the result that I get:
The image on top is the one captured with the smartphone, and the image below is the result with the matches highlighted.
Here is the code that I implemented:
Mat orig = Highgui.imread(photoPathwithoutFile);
Mat origBW = new Mat();
Imgproc.cvtColor(orig, origBW, Imgproc.COLOR_RGB2GRAY);
MatOfKeyPoint kpOrigin = createSURFdetector(origBW);
Mat descOrig = extractDescription(kpOrigin, origBW);
Leaf result = findMatches(descOrig);
Mat imageOut = orig.clone();
Features2d.drawMatches(orig, kpOrigin, maple, keypointsMaple, resultMaple, imageOut);
public MatOfKeyPoint createSURFdetector(Mat origBW) {
    FeatureDetector surf = FeatureDetector.create(FeatureDetector.SURF);
    MatOfKeyPoint keypointsOrig = new MatOfKeyPoint();
    surf.detect(origBW, keypointsOrig);
    return keypointsOrig;
}

public Mat extractDescription(MatOfKeyPoint kpOrig, Mat origBW) {
    DescriptorExtractor surfExtractor = DescriptorExtractor.create(DescriptorExtractor.SURF);
    Mat origDesc = new Mat();
    surfExtractor.compute(origBW, kpOrig, origDesc);
    return origDesc;
}

public Leaf findMatches(Mat descriptors) {
    DescriptorMatcher m = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE);
    MatOfDMatch max = new MatOfDMatch();
    resultMaple = new MatOfDMatch();
    resultChestnut = new MatOfDMatch();
    resultSwedish = new MatOfDMatch();
    Leaf match = null;

    m.match(descriptors, mapleDescriptors, resultMaple);
    Log.d("Origin", resultMaple.toList().size() + " matches with Maples");
    if (resultMaple.toList().size() > max.toList().size()) { max = resultMaple; match = Leaf.MAPLE; }

    m.match(descriptors, chestnutDescriptors, resultChestnut);
    Log.d("Origin", resultChestnut.toList().size() + " matches with Chestnut");
    if (resultChestnut.toList().size() > max.toList().size()) { max = resultChestnut; match = Leaf.CHESTNUT; }

    m.match(descriptors, swedishDescriptors, resultSwedish);
    Log.d("Origin", resultSwedish.toList().size() + " matches with Swedish");
    if (resultSwedish.toList().size() > max.toList().size()) { max = resultSwedish; match = Leaf.SWEDISH; }

    //return the match object with more matches
    return match;
}
How can I get a more accurate match, based not on colours but on the actual distinctive features of the picture?
Well, SURF is not the best candidate for this task. The SURF descriptor basically encodes some gradient statistics in a small neighborhood of a corner. This gives you invariance to a lot of transformations, but you lose the 'big picture' in doing so. The descriptor is used to narrow down the range of correspondences between points to be matched, and then some geometric constraints come into play.
In your case it seems that descriptors are not doing a great job at matching points, and since there are a LOT of them each point eventually gets a match (although it is strange that geometric testing didn't prevent that).
I can advise you to try a different approach to matching, maybe HOG descriptors trained to detect leaf types, or even something contour-based, since shape is what really differs between your images. For example, you can detect the leaf's outline, normalize its length, find its center, and then at equal intervals calculate the distance from each point to the center: that will be your descriptor. Then find the largest distance, circularly shift the descriptor to start at that extremum, and divide by this value; that gives you some basic invariance to the choice of contour starting point, rotation and scale. But it will most likely fail under perspective and affine transformations.
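To make that contour idea concrete, here is a rough, untested sketch with the OpenCV Java bindings (the binary leaf mask and the sample count are assumptions; comparing two descriptors, e.g. by Euclidean distance, is left out):

import java.util.ArrayList;
import java.util.List;

import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Point;
import org.opencv.imgproc.Imgproc;

public class ContourDescriptor {

    // Sample the outline, take distances to its center, then shift/scale so the
    // descriptor starts at the largest distance and is divided by it.
    public static double[] describe(Mat binaryLeafMask, int samples) {
        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Imgproc.findContours(binaryLeafMask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_NONE);

        // Take the largest contour as the leaf outline.
        MatOfPoint outline = contours.get(0);
        for (MatOfPoint c : contours) {
            if (Imgproc.contourArea(c) > Imgproc.contourArea(outline)) outline = c;
        }
        Point[] pts = outline.toArray();

        // Center of the outline (mean of its points).
        double cx = 0, cy = 0;
        for (Point p : pts) { cx += p.x; cy += p.y; }
        cx /= pts.length;
        cy /= pts.length;

        // Distances from evenly spaced outline points to the center.
        double[] dist = new double[samples];
        int maxIdx = 0;
        for (int i = 0; i < samples; i++) {
            Point p = pts[(int) ((long) i * pts.length / samples)];
            dist[i] = Math.hypot(p.x - cx, p.y - cy);
            if (dist[i] > dist[maxIdx]) maxIdx = i;
        }

        // Circular shift to start at the maximum and divide by it, for basic
        // invariance to starting point, rotation and scale.
        double[] desc = new double[samples];
        for (int i = 0; i < samples; i++) {
            desc[i] = dist[(maxIdx + i) % samples] / dist[maxIdx];
        }
        return desc;
    }
}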
If you would like to experiment further with feature points, try to detect fewer of them, but more representative ones (filter by gradient strength, corner score or similar). Maybe use SIFT instead of SURF; it should be a bit more precise. Check the number of inliers after matching; the best match should have a higher ratio.
But honestly, this seems more like a machine learning task than computer vision.
Edit: I have checked your code and found out that you are not performing geometric checks on the matches, hence why you are getting an incorrect match. Try running findHomography after matching and then consider only the points that have been set to one in the mask output argument. This means you only consider points that can be warped onto each other using a homography, which may improve matching a lot.
Edit2: added a code snippet (sorry, but I can't test Java at the moment, so it's in Python)
import cv2
import numpy as np
# read input
a = cv2.imread(r'C:\Temp\leaf1.jpg')
b = cv2.imread(r'C:\Temp\leaf2.jpg')
# convert to gray
agray = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
bgray = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
# detect features and compute descriptors
surf = cv2.SURF() # better use SIFT instead
kp1, d1 = surf.detectAndCompute(agray,None)
kp2, d2 = surf.detectAndCompute(bgray,None)
print 'numFeatures1 =', len(kp1)
print 'numFeatures2 =', len(kp2)
# use KNN matcher
bf = cv2.BFMatcher()
matches = bf.knnMatch(d1,d2, k=2)
# Apply Lowe ratio test
good = []
for m,n in matches:
    if m.distance < 0.75*n.distance:
        good.append(m)
print 'numMatches =', len(matches)
print 'numGoodMatches =', len(good)
# if we have enough matches, try to calculate a homography to discard matches
# that don't fit the perspective transformation model
if len(good) > 10:
    # convert matches into correct format (python-specific)
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    print 'numMatches =', sum(mask.ravel().tolist()) # calc number of 1s in mask
else:
    print "not enough good matches are found"
It gives me the following output for different leaves using SURF:
numFeatures1 = 685
numFeatures2 = 1566
numMatches = 685
numGoodMatches = 52
numMatches = 11
You can see that the number of 'real' matches is very small. But unfortunately numMatches is similar when we match different images of the same leaf type. Maybe you can improve the result by tweaking the parameters, but I think using keypoints here is just not a very good approach. Maybe it is due to the leaf variation even within the same class.
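If you want to stay in Java, the ratio test plus homography check can be sketched roughly like this with the 2.4 Java bindings you are already using (untested; note that DMatch and KeyPoint move to org.opencv.core in OpenCV 3.x, and the findHomography overload that outputs the inlier mask may differ between versions):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

import org.opencv.calib3d.Calib3d;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.features2d.DMatch;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.KeyPoint;

public class GeometricCheck {

    // Ratio test + homography check; returns the number of geometrically
    // consistent ("inlier") matches between a query and a train image.
    public static int countInliers(Mat queryDesc, Mat trainDesc,
                                   List<KeyPoint> queryKp, List<KeyPoint> trainKp) {
        DescriptorMatcher matcher = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE);
        List<MatOfDMatch> knn = new ArrayList<MatOfDMatch>();
        matcher.knnMatch(queryDesc, trainDesc, knn, 2);

        // Lowe's ratio test.
        List<DMatch> good = new LinkedList<DMatch>();
        for (MatOfDMatch pair : knn) {
            DMatch[] m = pair.toArray();
            if (m.length == 2 && m[0].distance < 0.75f * m[1].distance) {
                good.add(m[0]);
            }
        }
        if (good.size() <= 10) return 0;

        // Build point lists for the homography.
        List<Point> src = new ArrayList<Point>();
        List<Point> dst = new ArrayList<Point>();
        for (DMatch m : good) {
            src.add(queryKp.get(m.queryIdx).pt);
            dst.add(trainKp.get(m.trainIdx).pt);
        }
        MatOfPoint2f srcPts = new MatOfPoint2f();
        srcPts.fromList(src);
        MatOfPoint2f dstPts = new MatOfPoint2f();
        dstPts.fromList(dst);

        // RANSAC homography; the mask marks which matches are inliers.
        Mat mask = new Mat();
        Calib3d.findHomography(srcPts, dstPts, Calib3d.RANSAC, 5.0, mask);
        return Core.countNonZero(mask);
    }
}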