OpenCV crop function fatal signal 11 - android

Hello I am doing an android app which uses OpenCV to detect rectangles/squares, to detect them I am using functions (modified a bit) from squares.cpp. Points of every square found I am storing in vector> squares, then i pass it to the function which choose the biggest one and store it in vector theBiggestSq. The problem is with the cropping function which code i will paste below (i will post the link to youtube showing the problem too). If the actual square is far enough from the camera it works ok but if i will close it a bit in some point it will hang. I will post the print screen of the problem from LogCat and there are the points printed out (the boundaries points taken from theBiggestSq vector, maybe it will help to find the solution).
void cutAndSave(vector<Point> theBiggestSq, Mat image){
RotatedRect box = minAreaRect(Mat(theBiggestSq));
// Draw bounding box in the original image (debug purposes)
//cv::Point2f vertices[4];
//box.points(vertices);
//for (int i = 0; i < 4; ++i)
//{
//cv::line(img, vertices[i], vertices[(i + 1) % 4], cv::Scalar(0, 255, 0), 1, CV_AA);
//}
//cv::imshow("box", img);
//cv::imwrite("box.png", img);
// Set Region of Interest to the area defined by the box
Rect roi;
roi.x = box.center.x - (box.size.width / 2);
roi.y = box.center.y - (box.size.height / 2);
roi.width = box.size.width;
roi.height = box.size.height;
// Crop the original image to the defined ROI
//bmp=Bitmap.createBitmap(box.size.width / 2, box.size.height / 2, Bitmap.Config.ARGB_8888);
Mat crop = image(roi);
//Mat crop = image(Rect(roi.x, roi.y, roi.width, roi.height)).clone();
//Utils.matToBitmap(crop*.clone()* ,bmp);
imwrite("/sdcard/OpenCVTest/1.png", bmp);
imshow("crop", crop);
}
video of my app and its problems
cords printed respectively are: roi.x roi.y roi.width roi.height
Another problem is that the boundaries drawn should have a green colour but as you see on the video they are distorted (flexed like those boundaries would be made from glass?).
Thank you for any help. I am new in openCV doing it from only one month so please be tolerant.
EDIT:
drawing code:
//draw//
for( size_t i = 0; i < squares.size(); i++ )
{
const Point* p = &squares[i][0];
int n = (int)squares[i].size();
polylines(mBgra, &p, &n, 1, true, Scalar(255,255,0), 5, 10);
//Rect rect = boundingRect(cv::Mat(squares[i]));
//rectangle(mBgra, rect.tl(), rect.br(), cv::Scalar(0,255,0), 2, 8, 0);
}

This error basically tells you the cause - your ROI exceeds the image dimensions. This means that when you are extracting Rect roi from RotatedRect box then either x or y are smaller than zero, or the width/height pushes the dimensions outside the image. You should check this using something like
// Propose rectangle from data
int proposedX = box.center.x - (box.size.width / 2);
int proposedY = box.center.y - (box.size.height / 2);
int proposedW = box.size.width;
int proposedH = box.size.height;
// Ensure top-left edge is within image
roi.x = proposedX < 0 ? 0 : proposedX;
roi.y = proposedY < 0 ? 0 : proposedY;
// Ensure bottom-right edge is within image
roi.width =
(roi.x - 1 + proposedW) > image.cols ? // Will this roi exceed image?
(image.cols - 1 - roi.x) // YES: make roi go to image edge
: proposedW; // NO: continue as proposed
// Similar for height
roi.height = (roi.y - 1 + proposedH) > image.rows ? (image.rows - 1 - roi.y) : proposedH;

Related

Overlay a image over live frame - OpenCV4Android

I'm trying to overlay on the live frame a little image previosuly selected by the user. I already have the image's path from a previous activity. My problem is that I am not able to show the image on the frame.
I'm trying to detect a rectangle on the frame, and over the rectangle display the image selected. I could detect the rectangle, but now i can't display the image on any part of the frame (I don't care about the rectangle right now).
I've been trying to do it with the explanations from Adding Image Overlay OpenCV for Android and add watermark small image to large image opencv4android but it didn't worked for me.
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
Mat gray = inputFrame.gray();
Mat dst = inputFrame.rgba();
Imgproc.pyrDown(gray, dsIMG, new Size(gray.cols() / 2, gray.rows() / 2));
Imgproc.pyrUp(dsIMG, usIMG, gray.size());
Imgproc.Canny(usIMG, bwIMG, 0, threshold);
Imgproc.dilate(bwIMG, bwIMG, new Mat(), new Point(-1, 1), 1);
List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
cIMG = bwIMG.clone();
Imgproc.findContours(cIMG, contours, hovIMG, Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
for (MatOfPoint cnt : contours) {
MatOfPoint2f curve = new MatOfPoint2f(cnt.toArray());
Imgproc.approxPolyDP(curve, approxCurve, 0.02 * Imgproc.arcLength(curve, true), true);
int numberVertices = (int) approxCurve.total();
double contourArea = Imgproc.contourArea(cnt);
if (Math.abs(contourArea) < 100) {
continue;
}
//Rectangle detected
if (numberVertices >= 4 && numberVertices <= 6) {
List<Double> cos = new ArrayList<>();
for (int j = 2; j < numberVertices + 1; j++) {
cos.add(angle(approxCurve.toArray()[j % numberVertices], approxCurve.toArray()[j - 2], approxCurve.toArray()[j - 1]));
}
Collections.sort(cos);
double mincos = cos.get(0);
double maxcos = cos.get(cos.size() - 1);
if (numberVertices == 4 && mincos >= -0.3 && maxcos <= 0.5) {
//Small watermark image
Mat a = imread(img_path);
Mat bSubmat = dst.submat(0, dst.rows() -1 , 0, dst.cols()-1);
a.copyTo(bSubmat);
}
}
}
return dst;
}
NOTE: img_path is the path of the selected image I want to display over the frame. I got it from the previous activity.
By now, I just want to display the image over the frame. Later, I will try to display it on the same position where it found the rectangle.
Please, any suggestion or recommendation is welcome, as I am new with OpenCV. I'm sorry for my english, but feel free to ask me anything I didn't explain correctly. I'll do my best to explain it better.
Thanks a lot!
If you just want to display the image as an overlay, and not save it as part of the video, you may find it easier to simple display it in a separate view above the video view. This will likely use less processing, battery etc also.
If you want to draw onto the camera image bitmap then the following will allow you do that:
Bitmap cameraBitmap = BitmapFactory.decodeByteArray(bytes,0,bytes.length, opt);
Canvas camImgCanvas = new Canvas(cameraBitmap);
Drawable d = ContextCompat.getDrawable(getActivity(), R.drawable.myDrawable);
//Centre the drawing
int bitMapWidthCenter = cameraBitmap.getWidth()/2;
int bitMapheightCenter = cameraBitmap.getHeight()/2;
d.setBounds(bitMapWidthCenter, bitMapheightCenter, bitMapWidthCenter+d.getIntrinsicWidth(),
bitMapheightCenter+d.getIntrinsicHeight());
//And draw it...
d.draw(camImgCanvas);

Hough circle doesn't detect eyes iris

I want to detect eyes irises and their centers using Hough Circle algorithm.
I'm using this code:
private void houghCircle()
{
Bitmap obtainedBitmap = imagesList.getFirst();
/* convert bitmap to mat */
Mat mat = new Mat(obtainedBitmap.getWidth(),obtainedBitmap.getHeight(),
CvType.CV_8UC1);
Mat grayMat = new Mat(obtainedBitmap.getWidth(), obtainedBitmap.getHeight(),
CvType.CV_8UC1);
Utils.bitmapToMat(obtainedBitmap, mat);
/* convert to grayscale */
int colorChannels = (mat.channels() == 3) ? Imgproc.COLOR_BGR2GRAY : ((mat.channels() == 4) ? Imgproc.COLOR_BGRA2GRAY : 1);
Imgproc.cvtColor(mat, grayMat, colorChannels);
/* reduce the noise so we avoid false circle detection */
Imgproc.GaussianBlur(grayMat, grayMat, new Size(9, 9), 2, 2);
// accumulator value
double dp = 1.2d;
// minimum distance between the center coordinates of detected circles in pixels
double minDist = 100;
// min and max radii (set these values as you desire)
int minRadius = 0, maxRadius = 1000;
// param1 = gradient value used to handle edge detection
// param2 = Accumulator threshold value for the
// cv2.CV_HOUGH_GRADIENT method.
// The smaller the threshold is, the more circles will be
// detected (including false circles).
// The larger the threshold is, the more circles will
// potentially be returned.
double param1 = 70, param2 = 72;
/* create a Mat object to store the circles detected */
Mat circles = new Mat(obtainedBitmap.getWidth(), obtainedBitmap.getHeight(), CvType.CV_8UC1);
/* find the circle in the image */
Imgproc.HoughCircles(grayMat, circles, Imgproc.CV_HOUGH_GRADIENT, dp, minDist, param1, param2, minRadius, maxRadius);
/* get the number of circles detected */
int numberOfCircles = (circles.rows() == 0) ? 0 : circles.cols();
/* draw the circles found on the image */
for (int i=0; i<numberOfCircles; i++) {
/* get the circle details, circleCoordinates[0, 1, 2] = (x,y,r)
* (x,y) are the coordinates of the circle's center
*/
double[] circleCoordinates = circles.get(0, i);
int x = (int) circleCoordinates[0], y = (int) circleCoordinates[1];
Point center = new Point(x, y);
int radius = (int) circleCoordinates[2];
/* circle's outline */
Core.circle(mat, center, radius, new Scalar(0,
255, 0), 4);
/* circle's center outline */
Core.rectangle(mat, new Point(x - 5, y - 5),
new Point(x + 5, y + 5),
new Scalar(0, 128, 255), -1);
}
/* convert back to bitmap */
Utils.matToBitmap(mat, obtainedBitmap);
MediaStore.Images.Media.insertImage(getContentResolver(),obtainedBitmap, "testgray", "gray" );
}
But it doesn't detect iris in all images correctly. Specially, if the iris has a dark color like brown. How can I fix this code to detect the irises and their centers correctly?
EDIT: Here are some sample images (which I got from the web) that shows the performance of the algorithm (Please ignore the landmarks which are represented by the red squares):
In these images the algorithm doesn't detect all irises:
This image shows how the algorithm couldn't detect irises at all:
EDIT 2: Here is a code which uses Canny edge detection, but it causes the app to crash:
private void houghCircle()
{
Mat grayMat = new Mat();
Mat cannyEdges = new Mat();
Mat circles = new Mat();
Bitmap obtainedBitmap = imagesList.getFirst();
/* convert bitmap to mat */
Mat originalBitmap = new Mat(obtainedBitmap.getWidth(),obtainedBitmap.getHeight(),
CvType.CV_8UC1);
//Converting the image to grayscale
Imgproc.cvtColor(originalBitmap,grayMat,Imgproc.COLOR_BGR2GRAY);
Imgproc.Canny(grayMat, cannyEdges,10, 100);
Imgproc.HoughCircles(cannyEdges, circles,
Imgproc.CV_HOUGH_GRADIENT,1, cannyEdges.rows() / 15); //now circles is filled with detected circles.
//, grayMat.rows() / 8);
Mat houghCircles = new Mat();
houghCircles.create(cannyEdges.rows(),cannyEdges.cols()
,CvType.CV_8UC1);
//Drawing lines on the image
for(int i = 0 ; i < circles.cols() ; i++)
{
double[] parameters = circles.get(0,i);
double x, y;
int r;
x = parameters[0];
y = parameters[1];
r = (int)parameters[2];
Point center = new Point(x, y);
//Drawing circles on an image
Core.circle(houghCircles,center,r,
new Scalar(255,0,0),1);
}
//Converting Mat back to Bitmap
Utils.matToBitmap(houghCircles, obtainedBitmap);
MediaStore.Images.Media.insertImage(getContentResolver(),obtainedBitmap, "testgray", "gray" );
}
This is the error I get in the log
FATAL EXCEPTION: Thread-28685
CvException [org.opencv.core.CvException: cv::Exception: /hdd2/buildbot/slaves/slave_ardbeg1/50-SDK/opencv/modules/imgproc/src/color.cpp:3739: error: (-215) scn == 3 || scn == 4 in function void cv::cvtColor(cv::InputArray, cv::OutputArray, int, int)
]
at org.opencv.imgproc.Imgproc.cvtColor_1(Native Method)
at org.opencv.imgproc.Imgproc.cvtColor(Imgproc.java:4598)
Which is caused by this line: Imgproc.cvtColor(originalBitmap,grayMat,Imgproc.COLOR_BGR2GRAY);
Can anyone please tell me how this error can solved? Perhaps adding a canny edge detection will improve the results.
Hough circles work better on well defined circles. They are not good with things like iris.
After some thresholding, morphological operations or canny edge detection, feature detection methods like MSER work much better for iris detection.
Here is a similar question with a solution if you are looking for some code.
As you want to detect iris using hough transform (there are others), you had better studying the Canny edge detector and its parameters.
cv::HoughCircles takes the Canny-hysteresis threshold in param1. Investigating Canny alone, you get the impression of good threshold range.
Maybe instead of gaussian blur, you apply a better denoising (non local means with say h=32 and window sizes 5 and 15), and also try to harmonize the image contrast, e.g., using contrast limited adaptive histogram equalization (cv::CLAHE).
Harmonization is to make sure all (highlight and shadow) eyes map to similar intensity range.
I wanted to know if those images are the images you processed or if you like took a cell phone snapshot of your screen to upload them here. Because the irises are bigger than the maximum radius you set in your code. Therefor I don't understand how you could find any iris at all. The irises in the first image have a radius of over 20. So you shouldn't be able to detect them.
You should set the radii to the radius range you expect your irises to be.

Library for android for Google cardboard barrel distortion

I am trying to implement a 3D app for Android that should also support cardboard like viewers. I have seen some of those images and they seem to have some kind of barrel distortion in order to be orthogonal through the cardboard lenses.
So I was looking for algorithms or libraries specifically for Java/Android that would help me achieving this.
I have found this implementation: http://www.helviojunior.com.br/fotografia/barrel-and-pincushion-distortion/
It would be great to have something like this because it has everything I'd need. Unfortunately it's for C# and it has some specific code that I just couldn't easily translate into more generic code.
Then there is a simpler Java implementation here: http://popscan.blogspot.de/2012/04/fisheye-lens-equation-simple-fisheye.html
I have changed it to:
public static Bitmap fisheye(Bitmap srcimage) {
/*
* Fish eye effect
* tejopa, 2012-04-29
* http://popscan.blogspot.com
* http://www.eemeli.de
*/
// get image pixels
double w = srcimage.getWidth();
double h = srcimage.getHeight();
int[] srcpixels = new int[(int)(w*h)];
srcimage.getPixels(srcpixels, 0, (int)w, 0, 0, (int)w, (int)h);
Bitmap resultimage = srcimage.copy(srcimage.getConfig(), true);
// create the result data
int[] dstpixels = new int[(int)(w*h)];
// for each row
for (int y=0;y<h;y++) {
// normalize y coordinate to -1 ... 1
double ny = ((2*y)/h)-1;
// pre calculate ny*ny
double ny2 = ny*ny;
// for each column
for (int x=0;x<w;x++) {
// preset to black
dstpixels[(int)(y*w+x)] = 0;
// normalize x coordinate to -1 ... 1
double nx = ((2*x)/w)-1;
// pre calculate nx*nx
double nx2 = nx*nx;
// calculate distance from center (0,0)
// this will include circle or ellipse shape portion
// of the image, depending on image dimensions
// you can experiment with images with different dimensions
double r = Math.sqrt(nx2+ny2);
// discard pixels outside from circle!
if (0.0<=r&&r<=1.0) {
double nr = Math.sqrt(1.0-r*r);
// new distance is between 0 ... 1
nr = (r + (1.0-nr)) / 2.0;
// discard radius greater than 1.0
if (nr<=1.0) {
// calculate the angle for polar coordinates
double theta = Math.atan2(ny,nx);
// calculate new x position with new distance in same angle
double nxn = nr*Math.cos(theta);
// calculate new y position with new distance in same angle
double nyn = nr*Math.sin(theta);
// map from -1 ... 1 to image coordinates
int x2 = (int)(((nxn+1)*w)/2.0);
// map from -1 ... 1 to image coordinates
int y2 = (int)(((nyn+1)*h)/2.0);
// find (x2,y2) position from source pixels
int srcpos = (int)(y2*w+x2);
// make sure that position stays within arrays
if (srcpos>=0 & srcpos < w*h) {
// get new pixel (x2,y2) and put it to target array at (x,y)
dstpixels[(int)(y*w+x)] = srcpixels[srcpos];
}
}
}
}
}
resultimage.setPixels(dstpixels, 0, (int)w, 0, 0, (int)w, (int)h);
//return result pixels
return resultimage;
}
But it doesn't have this lens factor, so the resulting image is always a full circle/ellipse.
Any chance you could point me to some working Java code or library or (maybe even better) help me to amend this code for the lens factor to be taken into account (0.0 <= factor <= 1.0)?
I managed to get it to work.
Bottom line: I created a Bitmap bigger than the original Bitmap, and then I drew the original Bitmap on the new Bitmap (and centered it there) using
Canvas canvas = new Canvas(newBitmap);
canvas.drawBitmap(originalBitmap, null, new Rect(x, y, r, b), null);
I used the Java algorithm posted in my question to create the effect on the new Bitmap. That worked great.

Coin detection using android opencv

I am trying to detect coin ( circle ) detection using Opencv4Android.
So far I have tried two approaches
1 ) Regular method :
// convert image to grayscale
Imgproc.cvtColor(mRgba, mGray, Imgproc.COLOR_RGBA2GRAY);
// apply Gaussian Blur
Imgproc.GaussianBlur(mGray, mGray, sSize5, 2, 2);
iMinRadius = 20;
iMaxRadius = 400;
iAccumulator = 300;
iCannyUpperThreshold = 100;
//apply houghCircles
Imgproc.HoughCircles(mGray, mIntermediateMat, Imgproc.CV_HOUGH_GRADIENT, 2.0, mGray.rows() / 8,
iCannyUpperThreshold, iAccumulator, iMinRadius, iMaxRadius);
if (mIntermediateMat.cols() > 0)
for (int x = 0; x < Math.min(mIntermediateMat.cols(), 10); x++) {
double vCircle[] = mIntermediateMat.get(0,x);
if (vCircle == null)
break;
pt.x = Math.round(vCircle[0]);
pt.y = Math.round(vCircle[1]);
radius = (int)Math.round(vCircle[2]);
// draw the found circle
Core.circle(mRgba, pt, radius, colorRed, iLineThickness);
}
2 ) Sobel and then Hough Cicles
// apply Gaussian Blur
Imgproc.GaussianBlur(mRgba, mRgba, sSize3, 2, 2,
Imgproc.BORDER_DEFAULT);
// / Convert it to grayscale
Imgproc.cvtColor(mRgba, mGray, Imgproc.COLOR_RGBA2GRAY);
// / Gradient X
Imgproc.Sobel(mGray, grad_x, CvType.CV_16S, 1, 0, 3, scale, delta,
Imgproc.BORDER_DEFAULT);
Core.convertScaleAbs(grad_x, abs_grad_x);
// / Gradient Y
Imgproc.Sobel(mGray, grad_y, CvType.CV_16S, 0, 1, 3, scale, delta,
Imgproc.BORDER_DEFAULT);
Core.convertScaleAbs(grad_y, abs_grad_y);
// / Total Gradient (approximate)
Core.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad);
iCannyUpperThreshold = 100;
Imgproc.HoughCircles(grad, mIntermediateMat,
Imgproc.CV_HOUGH_GRADIENT, 2.0, grad.rows() / 8,
iCannyUpperThreshold, iAccumulator, iMinRadius, iMaxRadius);
if (mIntermediateMat.cols() > 0)
for (int x = 0; x < Math.min(mIntermediateMat.cols(), 10); x++) {
double vCircle[] = mIntermediateMat.get(0, x);
if (vCircle == null)
break;
pt.x = Math.round(vCircle[0]);
pt.y = Math.round(vCircle[1]);
radius = (int) Math.round(vCircle[2]);
// draw the found circle
Core.circle(mRgba, pt, radius, colorRed, iLineThickness);
}
method one gives fair result in case of coin detection and method two gives better result
Out of these two methods second method processing is slow but results are good
Both of these methods are working when camera frmae is caputured using JavaCameraView or NativeCameraView from opencv library .
If I use same procedure on image captured from android naive image capture intent which returns Bitmap , I am unable to get any results at all i.e. no circles are detected at all.
In methods one sometimes I get circle detected when using Bitmap captured using android camera intent.
I also tried changing the captured bitmap as suggested in this Post but still no circle detection.
Can anybody tell me what modifications I have to do.
And also I want to know which algorithm will give better results in coin ( circle ) detection but with less processing.
I have played with various values of houghCircle method and also tried canny edge out put as intput to houghCircles but its not considerably good enough.

How to improve OCR Accuracy which use tesseract? [duplicate]

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.

Categories

Resources