I'm trying to build a classification model with keras and deploy the model to my Android phone. I use the code from this website to deploy my own converted model, which is a .pb file, to my Android phone. I load a image from my phone and everything worked fine, but the prediction result is totally different from the result I got from my PC.
The procedure of testing on my PC are:
load the image with cv2, and convert to np.float32
use the keras resnet50 'preprocess_input' python function to preprocess the image
expand the image dimension for batching (batch size is 1)
forward the image to model and get the result
Relevant code:
img = cv2.imread('./my_test_image.jpg')
x = preprocess_input(img.astype(np.float32))
x = np.expand_dims(x, axis=0)
net = load_model('./my_model.h5')
prediction_result = net.predict(x)
And I noticed that the image preprocessing part of Android is different from the method I used in keras, which mode is caffe(convert the images from RGB to BGR, then zero-center each color channel with respect to the ImageNet dataset). It seems that the original code is for mode tf(will scale pixels between -1 to 1).
So I modified the following code of 'preprocessBitmap' to what I think it should be, and use a 3 channel RGB image with pixel value [127,127,127] to test it. The code predicted the same result as .h5 model did. But when I load a image to classify, the prediction result is different from .h5 model.
Does anyone has any idea? Thank you very much.
I have tried the following:
Load a 3 channel RGB image in my Phone with pixel value [127,127,127], and use the modified code below, and it will give me a prediction result that is same as prediction result using .h5 model on PC.
Test the converted .pb model on PC using tensorflow gfile module with a image, and it give me a correct prediction result (compare to .h5 model). So I think the converted .pb file does not have any problem.
Entire section of preprocessBitmap
// code of 'preprocessBitmap' section in TensorflowImageClassifier.java
TraceCompat.beginSection("preprocessBitmap");
// Preprocess the image data from 0-255 int to normalized float based
// on the provided parameters.
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
// this is a ARGB format, so we need to mask the least significant 8 bits to get blue, and next 8 bits to get green and next 8 bits to get red. Since we have an opaque image, alpha can be ignored.
final int val = intValues[i];
// original
/*
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 2] = ((val & 0xFF) - imageMean) / imageStd;
*/
// what I think it should be to do the same thing in mode caffe when using keras
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - (float)123.68);
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - (float)116.779);
floatValues[i * 3 + 2] = (((val & 0xFF)) - (float)103.939);
}
TraceCompat.endSection();
This question is old, but remains the top Google result for preprocess_input for ResNet50 on Android. I could not find an answer for implementing preprocess_input for Java/Android, so I came up with the following based on the original python/keras code:
/*
Preprocesses RGB bitmap IAW keras/imagenet
Port of https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L169
with data_format='channels_last', mode='caffe'
Convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.
Returns 3D float array
*/
static float[][][] imagenet_preprocess_input_caffe( Bitmap bitmap ) {
// https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L210
final float[] imagenet_means_caffe = new float[]{103.939f, 116.779f, 123.68f};
float[][][] result = new float[bitmap.getHeight()][bitmap.getWidth()][3]; // assuming rgb
for (int y = 0; y < bitmap.getHeight(); y++) {
for (int x = 0; x < bitmap.getWidth(); x++) {
final int px = bitmap.getPixel(x, y);
// rgb-->bgr, then subtract means. no scaling
result[y][x][0] = (Color.blue(px) - imagenet_means_caffe[0] );
result[y][x][1] = (Color.green(px) - imagenet_means_caffe[1] );
result[y][x][2] = (Color.red(px) - imagenet_means_caffe[2] );
}
}
return result;
}
Usage with a 3D tensorflow-lite input with shape (1,224,224,3):
Bitmap bitmap = <your bitmap of size 224x224x3>;
float[][][][] imgValues = new float[1][bitmap.getHeight()][bitmap.getWidth()][3];
imgValues[0]=imagenet_preprocess_input_caffe(bitmap);
... <prep tfInput, tfOutput> ...
tfLite.run(tfInput, tfOutput);
My android app uses an external lib that makes some image treatments. The final output of the treatment chain is a monochrome bitmap but saved has a color bitmap (32bpp).
The image has to be uploaded to a cloud blob, so for bandwidth concerns, i'd like to convert it to 1bpp G4 compression TIFF. I successfully integrated libTIFF in my app via JNI and now i'm writing the conversion routine in C. I'm a little stuck here.
I managed to produce a 32 BPP TIFF, but impossible to reduce to 1bpp, the output image is always unreadable. Did someone succeded to do similar task ?
More speciffically :
What should be the value of SAMPLE_PER_PIXEL and BITS_PER_SAMPLE
parameters ?
How to determine the strip size ?
How to fill each strip ? (i.e. : How to convert 32bpp pixel lines to 1 bpp pixels strips ?)
Many thanks !
UPDATE : The code produced with the precious help of Mohit Jain
int ConvertMonochrome32BppBitmapTo1BppTiff(char* bitmap, int height, int width, int resx, int resy, char const *tifffilename)
{
TIFF *tiff;
if ((tiff = TIFFOpen(tifffilename, "w")) == NULL)
{
return TC_ERROR_OPEN_FAILED;
}
// TIFF Settings
TIFFSetField(tiff, TIFFTAG_RESOLUTIONUNIT, RESUNIT_INCH);
TIFFSetField(tiff, TIFFTAG_XRESOLUTION, resx);
TIFFSetField(tiff, TIFFTAG_YRESOLUTION, resy);
TIFFSetField(tiff, TIFFTAG_COMPRESSION, COMPRESSION_CCITTFAX4); //Group4 compression
TIFFSetField(tiff, TIFFTAG_IMAGEWIDTH, width);
TIFFSetField(tiff, TIFFTAG_IMAGELENGTH, height);
TIFFSetField(tiff, TIFFTAG_ROWSPERSTRIP, 1);
TIFFSetField(tiff, TIFFTAG_SAMPLESPERPIXEL, 1);
TIFFSetField(tiff, TIFFTAG_BITSPERSAMPLE, 1);
TIFFSetField(tiff, TIFFTAG_ORIENTATION, ORIENTATION_TOPLEFT);
TIFFSetField(tiff, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
TIFFSetField(tiff, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_MINISWHITE);
tsize_t tbufsize = (width + 7) / 8; //Tiff ScanLine buffer size for 1bpp pixel row
//Now writing image to the file one row by one
int x, y;
for (y = 0; y < height; y++)
{
char *buffer = malloc(tbufsize);
memset(buffer, 0, tbufsize);
for (x = 0; x < width; x++)
{
//offset of the 1st byte of each pixel in the input image (is enough to determine is black or white in 32 bpp monochrome bitmap)
uint32 bmpoffset = ((y * width) + x) * 4;
if (bitmap[bmpoffset] == 0) //Black pixel ?
{
uint32 tiffoffset = x / 8;
*(buffer + tiffoffset) |= (0b10000000 >> (x % 8));
}
}
if (TIFFWriteScanline(tiff, buffer, y, 0) != 1)
{
return TC_ERROR_WRITING_FAILED;
}
if (buffer)
{
free(buffer);
buffer = NULL;
}
}
TIFFClose(tiff);
tiff = NULL;
return TC_SUCCESSFULL;
}
To convert 32 bpp to 1 bpp, extract RGB and convert it into Y (luminance) and use some threshold to convert to 1 bpp.
Number of samples and bits per pixel should be 1.
I'm a beginner in openCV4android and I would like to get some help if possible
.
I'm trying to detect colored triangles,squares or circles using my Android phone camera but I don't know where to start.
I have been reading OReilly Learning OpenCV book and I got some knowledge about OpenCV.
Here is what I want to make:
1- Get the tracking color (just the color HSV) of the object by touching the screen
- I have already done this by using the color blob example from the OpenCV4android example
2- Find on the camera shapes like triangles, squares or circles based on the color choosed before.
I have just found examples of finding shapes within an image. What I would like to make is finding using the camera on real time.
Any help would be appreciated.
Best regards and have a nice day.
If you plan to implement NDK for your opencv stuff then you can use the same idea they are using in OpenCV tutorial 2-Mixedprocessing.
// on camera frames call your native method
public Mat onCameraFrame(CvCameraViewFrame inputFrame)
{
mRgba = inputFrame.rgba();
Nativecleshpdetect(mRgba.getNativeObjAddr()); // native method call to perform color and object detection
// the method getNativeObjAddr gets the address of the Mat object(camera frame) and passes it to native side as long object so that you dont have to create and destroy Mat object on each frame
}
public native void Nativecleshpdetect(long matAddrRgba);
In Native side
JNIEXPORT void JNICALL Java_org_opencv_samples_tutorial2_Tutorial2Activity_Nativecleshpdetect(JNIEnv*, jobject,jlong addrRgba1)
{
Mat& mRgb1 = *(Mat*)addrRgba1;
// mRgb1 is a mat object which points to the address of the input camera frame, so all the manipulations you do here will reflect on the live camera frame
//once you have your mat object(i.e mRgb1 ) you can implement all the colour and shape detection algorithm you have learnt in opencv book
}
since all manipulations are done using pointers you have to be bit careful handling them. hope this helps
Why dont you make use of JavaCV i think its a better alternative..you dont have to use the NDK at all for this..
try this:
http://code.google.com/p/javacv/
If you check OpenCV's Back Projection tutorial it does what you are looking for (and a bit more).
Back Projection:
"In terms of statistics, the values stored in the BackProjection
matrix represent the probability that a pixel in a image belongs to
the region with the selected color."
I have converted that tutorial to OpenCV4Android (2.4.8) like you were looking for, it does not use Android NDK. You can see all the code here at Github.
You can also check this answer for more details.
Though its a bit late i would like to make a contribution to the question.
1- Get the tracking color (just the color HSV) of the object by
touching the screen - I have already done this by using the color blob
example from the OpenCV4android example
Implement OnTouchListener to your activity
onTouch function
int cols = mRgba.cols();
int rows = mRgba.rows();
int xOffset = (mOpenCvCameraView.getWidth() - cols) / 2;
int yOffset = (mOpenCvCameraView.getHeight() - rows) / 2;
int x = (int) event.getX() - xOffset;
int y = (int) event.getY() - yOffset;
Log.i(TAG, "Touch image coordinates: (" + x + ", " + y + ")");
if ((x < 0) || (y < 0) || (x > cols) || (y > rows)) return false;
Rect touchedRect = new Rect();
touchedRect.x = (x > 4) ? x - 4 : 0;
touchedRect.y = (y > 4) ? y - 4 : 0;
touchedRect.width = (x + 4 < cols) ? x + 4 - touchedRect.x : cols - touchedRect.x;
touchedRect.height = (y + 4 < rows) ? y + 4 - touchedRect.y : rows - touchedRect.y;
Mat touchedRegionRgba = mRgba.submat(touchedRect);
Mat touchedRegionHsv = new Mat();
Imgproc.cvtColor(touchedRegionRgba, touchedRegionHsv, Imgproc.COLOR_RGB2HSV_FULL);
// Calculate average color of touched region
mBlobColorHsv = Core.sumElems(touchedRegionHsv);
int pointCount = touchedRect.width * touchedRect.height;
for (int i = 0; i < mBlobColorHsv.val.length; i++)
mBlobColorHsv.val[i] /= pointCount;
mBlobColorRgba = converScalarHsv2Rgba(mBlobColorHsv);
mColor = mBlobColorRgba.val[0] + ", " + mBlobColorRgba.val[1] + ", " + mBlobColorRgba.val[2] + ", " + mBlobColorRgba.val[3];
Log.i(TAG, "Touched rgba color: (" + mBlobColorRgba.val[0] + ", " + mBlobColorRgba.val[1] +
", " + mBlobColorRgba.val[2] + ", " + mBlobColorRgba.val[3] + ")");
mRGBA is a mat object which was initiated in onCameraViewStarted as
mRgba = new Mat(height, width, CvType.CV_8UC4);
And for the 2nd part:
2- Find on the camera shapes like triangles, squares or circles based
on the color choosed before.
I have tried to find out the selected contours shape using approxPolyDP
MatOfPoint2f contour2f = new MatOfPoint2f(contours.get(0).toArray());
//Processing on mMOP2f1 which is in type MatOfPoint2f
double approxDistance = Imgproc.arcLength(contour2f, true) * 0.02;
Imgproc.approxPolyDP(contour2f, approxCurve, approxDistance, true);
//Convert back to MatOfPoint
MatOfPoint points = new MatOfPoint(approxCurve.toArray());
System.out.println("points length" + points.toArray().length);
if( points.toArray().length == 5)
{
System.out.println("Pentagon");
mShape = "Pentagon";
}
else if(points.toArray().length > 5)
{
System.out.println("Circle");
Imgproc.drawContours(mRgba, contours, 0, new Scalar(255, 255, 0, -1));
mShape = "Circle";
}
else if(points.toArray().length == 4)
{
System.out.println("Square");
mShape = "Square";
}
else if(points.toArray().length == 4)
{
System.out.println("Triangle");
mShape = "Triangle";
}
This was done on onCameraFrame function after i obtained the contour list
For me if the length of point array was more than 5 it was usually a circle. But there is other algorithm to obtain circle and its attributes.
I'm writing an image processing app on android, and I'm trying to speed it up using the NDK. I have the following for-loop:
int x, y, c, idx;
const int pitch3 = pitch * 3;
float adj, result;
...
// px, py, u, u_bar are all float arrays of size nx*ny*3
// theta, tau, denom are float constants
// idx >= pitch3
for(y=1;y<ny;++y)
{
for(x=1;x<nx;++x)
{
for(c=0;c<3;++c)
{
adj = -px[idx] - py[idx] + px[idx - 3] + py[idx - pitch3];
result = ((u[idx] - tau * adj) + tau * f[idx]) * denom;
u_bar[idx] = result + theta * (result - u[idx]);
u[idx] = result;
++idx;
}
}
}
I'm wondering if it is possible to speed up this loop?
I'm thinking that using fixed-point arithmetic wouldn't do much, except on really old android phone (which I'm not going to target). Would writing it in assembly give a big improvement?
EDIT: I know I could use SIMD/NEON instructions, but they are not so common I think ...
Since you're accessing the array as a flat structure, the 3 levels of looping is only increasing the value used for idx. You can loop for (idx = pitch3; idx < nx*ny*3; idx++).
Another option is to move to fixed-point math. Do you really need more than 64 bits of dynamic range?
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.