I want to detect yellow colour objects and plot a centroid position on the largest detected yellow object.
I perform my steps in this order:
converted the input rgbaframe to hsv using cvtColor() method
perform color segmentation in HSV using inRange() method, bound it only to yellow color ranges & returns a binary threshold mask.
I perform morphology operation (specifically MORPH_CLOSE) to perform dilation then erosion on the mask to remove any noise.
I perform a gaussian blur to smooth the mask.
I apply canny algorithm to perform edge detection to make edges more obvious to prepare for Contour detection in the next step. (i'm starting to wonder if this step is of beneficial use at all?)
I apply findContour() algorithm to find the Contours in the image as well as find the hierarchy.
HERE I intend to use feature2d.FeatureDetection(SIMPLEBLOB)& pass in the blob area for detection as Params, however there seems to be no implementation supporting for Android, hence I had to work around the limitation and find largest blob using Imgproc.contourArea().
is there a way to do so?
I pass in the contours obtained previously from the findContours() method as parameter to Imgproc.moments to compute the Centroid Position of the objects detected.
HOWEVER, I would like to bring to everyone's attention that this current implementation will compute all centroids in EACH contour (yellow objects) detected. *PLS SEE/REFER to pic 1, 2 to see what is output onto the Frame back to the user.
What I would like to achieve is to find a way to use the contour of the largestblob (via largestContourArea) and pass that info on as a parameter into the ImgprocMoments() so that I will ONLY COMPUTE the centroid of that largest Contour(Object) detected, so I should only see 1 centroid Pos plotted on the screen at any particular point in time.
I have tried a few methods such as passing the contour of the Largest object as param into Imgproc.moments() but it didn't work either due to difference in data Type / if it worked, the output is not as desired, w multiple centroid points being plotted within or along the perimeter of the object, rather than 1 single point at the center of the largest contour object.
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
InputFrame = inputFrame.rgba();
Core.transpose(InputFrame,mat1); //transpose mat1(src) to mat2(dst), sorta like a Clone!
Imgproc.resize(mat1,mat2,InputFrame.size(),0,0,0); // params:(Mat src, Mat dst, Size dsize, fx, fy, interpolation) Extract the dimensions of the new Screen Orientation, obtain the new orientation's surface width & height. Try to resize to fit to screen.
Core.flip(mat2,InputFrame,-1); // mat3 now get updated, no longer is the Origi inputFrame.rgba BUT RATHER the transposed, resized, flipped version of inputFrame.rgba().
int rowWidth = InputFrame.rows();
int colWidth = InputFrame.cols();
Imgproc.cvtColor(InputFrame,InputFrame,Imgproc.COLOR_RGBA2RGB);
Imgproc.cvtColor(InputFrame,InputFrame,Imgproc.COLOR_RGB2HSV);
Lower_Yellow = new Scalar(21,150,150); //HSV color scale H to adjust color, S to control color variation, V is indicator of amt of light required to be shine on object to be seen.
Upper_Yellow = new Scalar(31,255,360); //HSV color scale
Core.inRange(InputFrame,Lower_Yellow, Upper_Yellow, maskForYellow);
final Size kernelSize = new Size(5, 5); //must be odd num size & greater than 1.
final Point anchor = new Point(-1, -1); //default (-1,-1) means that the anchor is at the center of the structuring element.
final int iterations = 1; //number of times dilation is applied. https://docs.opencv.org/3.4/d4/d76/tutorial_js_morphological_ops.html
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, kernelSize);
Imgproc.morphologyEx(maskForYellow, yellowMaskMorphed, Imgproc.MORPH_CLOSE, kernel, anchor, iterations); //dilate first to remove then erode. White regions becomes more pronounced, erode away black regions
Mat mIntermediateMat = new Mat();
Imgproc.GaussianBlur(yellowMaskMorphed,mIntermediateMat,new Size(9,9),0,0); //better result than kernel size (3,3, maybe cos reference area wider, bigger, can decide better whether inrange / out of range.
Imgproc.Canny(mIntermediateMat, mIntermediateMat, 5, 120); //try adjust threshold //https://stackoverflow.com/questions/25125670/best-value-for-threshold-in-canny
List<MatOfPoint> contours = new ArrayList<>();
Mat hierarchy = new Mat();
Imgproc.findContours(mIntermediateMat, contours, hierarchy, Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE, new Point(0, 0));
byte[] arr = new byte[100];
//List<double>hierarchyHolder = new ArrayList<>();
int cols = hierarchy.cols();
int rows = hierarchy.rows();
for (int i=0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
//hierarchyHolder.add(hierarchy.get(i,j));
//hierarchy.get(i,j) is a double[] type, not byte.
Log.d("hierarchy"," " + hierarchy.get(i,j).toString());
}
}
double maxArea1 = 0;
int maxAreaIndex1 = 0;
//MatOfPoint max_contours = new MatOfPoint();
Rect r = null;
ArrayList<Rect> rect_array = new ArrayList<Rect>();
for(int i=0; i < contours.size(); i++) {
//if(Imgproc.contourArea(contours.get(i)) > 300) { //Size of Mat contour # that particular point in ArrayList of Points.
double contourArea1 = Imgproc.contourArea(contours.get(i));
//Size of Mat contour # that particular point in ArrayList of Points.
if (maxArea1 < contourArea1){
maxArea1 = contourArea1;
maxAreaIndex1 = i;
}
//maxArea1 = Imgproc.contourArea(contours.get(i)); //assigned but nvr used
//max_contours = contours.get(i);
r = Imgproc.boundingRect(contours.get(maxAreaIndex1));
rect_array.add(r); //will only have 1 r in the array eventually, cos we will only take the one w largestContourArea.
}
Imgproc.cvtColor(InputFrame, InputFrame, Imgproc.COLOR_HSV2RGB);
if (rect_array.size() > 0) { //if got more than 1 rect found in rect_array, draw them out!
Iterator<Rect> it2 = rect_array.iterator(); //only got 1 though, this method much faster than drawContour, wont lag. =D
while (it2.hasNext()) {
Rect obj = it2.next();
//if
Imgproc.rectangle(InputFrame, obj.br(), obj.tl(),
new Scalar(0, 255, 0), 1);
}
}
//========= Compute CENTROID POS! WHAT WE WANT TO SHOW ON SCREEN EVENTUALLY!======================
List<Moments> mu = new ArrayList<>(contours.size()); //HUMoments
for (int i = 0; i < contours.size(); i++) {
mu.add(Imgproc.moments(contours.get(i)));
}
List<Point> mc = new ArrayList<>(contours.size()); //the Circle centre Point!
for (int i = 0; i < contours.size(); i++) {
//add 1e-5 to avoid division by zero
mc.add(new Point(mu.get(i).m10 / (mu.get(i).m00 + 1e-5), mu.get(i).m01 / (mu.get(i).m00 + 1e-5)));
}
for (int i = 0; i < contours.size(); i++) {
Scalar color = new Scalar(150, 150, 150);
Imgproc.circle(InputFrame, mc.get(i), 20, color, -1); //just to plot the small central point as a dot on the detected ImgObject.
}
Picture of output when view on CameraFrame:
1:
2:
Managed to resolve the issue, instead of looping thru the entire array of Contours for yellow objects detected, and pass each contour as parameter to Imgproc.moments, I now only assign the Contour at the particular index which the LargestContour is detected, so only 1 single contour is being processed now by Imgproc.moments to compute Centroid! The code correction is as shown below
//========= Compute CENTROID POS! WHAT WE WANT TO SHOW ON SCREEN EVENTUALLY!======================
List<Moments> mu = new ArrayList<>(contours.size());
mu.add(Imgproc.moments(contours.get(maxAreaContourIndex1))); //Just adding that 1 Single Largest Contour (largest ContourArea) to arryalist to be computed for MOMENTS to compute CENTROID POS!
List<Point> mc = new ArrayList<>(contours.size()); //the Circle centre Point!
//add 1e-5 to avoid division by zero
mc.add(new Point(mu.get(0).m10 / (mu.get(0).m00 + 1e-5), mu.get(0).m01 / (mu.get(0).m00 + 1e-5))); //index 0 cos there shld only be 1 contour now, the largest one only!
//notice that it only adds 1 point, the centroid point. Hence only 1 point in the mc list<Point>, so ltr reference that point w an index 0!
Scalar color = new Scalar(150, 150, 150);
Imgproc.circle(InputFrame, mc.get(0), 15, color, -1); //just to plot the small central point as a dot on the detected ImgObject.
I want to ask about some ideas / study materials connected to binarization. I am trying to create system that detects human emotions. I am able to get areas such as brows, eyes, nose, mouth etc. but then comes another stage -> processing...
My images are taken in various places/time of day/weather conditions. It's problematic during binarization, with the same treshold value one images are fully black, other looks well and provide me informations I want.
What I want to ask you about is:
1) If there is known way how to bring all images to the same level of brightness?
2) How to create dependency between treshold value and brightness on image?
What I have tried for now is normalize the image... but there are no effects, maybe I'm doing something wrong. I'm using OpenCV (for android)
Core.normalize(cleanFaceMatGRAY, cleanFaceMatGRAY,0, 255, Core.NORM_MINMAX, CvType.CV_8U);
EDIT:
I tried adaptive treshold, OTSU - they didnt work for me. I have problems with using CLAHE in Android but I managed to implement Niblack algorithm.
Core.normalize(cleanFaceMatGRAY, cleanFaceMatGRAY,0, 255, Core.NORM_MINMAX, CvType.CV_8U);
nibelBlackTresholding(cleanFaceMatGRAY, -0.2);
private void nibelBlackTresholding(Mat image, double parameter) {
Mat meanPowered = image.clone();
Core.multiply(image, image, meanPowered);
Scalar mean = Core.mean(image);
Scalar stdmean = Core.mean(meanPowered);
double tresholdValue = mean.val[0] + parameter * stdmean.val[0];
int totalRows = image.rows();
int totalCols = image.cols();
for (int cols=0; cols < totalCols; cols++) {
for (int rows=0; rows < totalRows; rows++) {
if (image.get(rows, cols)[0] > tresholdValue) {
image.put(rows, cols, 255);
} else {
image.put(rows, cols, 0);
}
}
}
}
The results are really good, but still not enough for some images. I paste links cuz images are big and I don't want to take too much screen:
For example this one is tresholded really fine:
https://dl.dropboxusercontent.com/u/108321090/a1.png
https://dl.dropboxusercontent.com/u/108321090/a.png
But bad light produce shadows sometimes and this gives this effect:
https://dl.dropboxusercontent.com/u/108321090/b1.png
https://dl.dropboxusercontent.com/u/108321090/b.png
Do you have any idea that could help me to improve treshold of those images with high light difference (shadows)?
EDIT2:
I found that my previous Algorithm is implemented in wrong way. Std was calculated in wrong way. In Niblack Thresholding mean is local value not global. I repaired it according to this reference http://arxiv.org/ftp/arxiv/papers/1201/1201.5227.pdf
private void niblackThresholding2(Mat image, double parameter, int window) {
int totalRows = image.rows();
int totalCols = image.cols();
int offset = (window-1)/2;
double tresholdValue = 0;
double localMean = 0;
double meanDeviation = 0;
for (int y=offset+1; y<totalCols-offset; y++) {
for (int x=offset+1; x<totalRows-offset; x++) {
localMean = calculateLocalMean(x, y, image, window);
meanDeviation = image.get(y, x)[0] - localMean;
tresholdValue = localMean*(1 + parameter * ( (meanDeviation/(1 - meanDeviation)) - 1 ));
Log.d("QWERTY","TRESHOLD " +tresholdValue);
if (image.get(y, x)[0] > tresholdValue) {
image.put(y, x, 255);
} else {
image.put(y, x, 0);
}
}
}
}
private double calculateLocalMean(int x, int y, Mat image, int window) {
int offset = (window-1)/2;
Mat tempMat;
Rect tempRect = new Rect();
Point leftTop, bottomRight;
leftTop = new Point(x - (offset + 1), y - (offset + 1));
bottomRight = new Point(x + offset, y + offset);
tempRect = new Rect(leftTop, bottomRight);
tempMat = new Mat(image, tempRect);
return Core.mean(tempMat).val[0];
}
Results for 7x7 window and proposed in reference k parameter = 0.34: I still can't get rid of shadow on faces.
https://dl.dropboxusercontent.com/u/108321090/b2.png
https://dl.dropboxusercontent.com/u/108321090/b1.png
things to look at:
http://docs.opencv.org/java/org/opencv/imgproc/CLAHE.html
http://docs.opencv.org/java/org/opencv/imgproc/Imgproc.html#adaptiveThreshold(org.opencv.core.Mat,%20org.opencv.core.Mat,%20double,%20int,%20int,%20int,%20double)
http://docs.opencv.org/java/org/opencv/imgproc/Imgproc.html#threshold(org.opencv.core.Mat,%20org.opencv.core.Mat,%20double,%20double,%20int) (THRESH_OTSU)
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.