How to pass image to tflite model in android - android

I have converted a Yolo model to .tflite for use in android. This is how it was used in python -
net = cv2.dnn.readNet("yolov2.weights", "yolov2.cfg")
classes = []
with open("yolov3.txt", "r") as f:
classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))
cap= cv2.VideoCapture(0)
while True:
_,frame= cap.read()
height,width,channel= frame.shape
blob = cv2.dnn.blobFromImage(frame, 0.00392, (320, 320), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
for out in outs:
for detection in out:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.2:
# Object detected
center_x = int(detection[0] * width)
center_y = int(detection[1] * height)
w = int(detection[2] * width)
h = int(detection[3] * height)
# Rectangle coordinates
x = int(center_x - w / 2)
y = int(center_y - h / 2)
I used netron https://github.com/lutzroeder/netron to visualize the model. The input is described as name: inputs,
type: float32[1,416,416,3],
quantization: 0 ≤ q ≤ 255,
location: 399
and the output as
name: output_boxes,
type: float32[1,10647,8],
location: 400.
My problem is regarding using this model in android. I have loaded the model in "Interpreter tflite", I am getting the input frames from the camera in byte[] format. How can I convert it into the required input for tflite.run(input, output)?

You need to resize the input image to match with the input size of TensorFlow-Lite model, and then convert it to RGB format to feed to the model.
By using the ImageProcessor from TensorFlow-Lite Support Library, you can easily do image resizing and conversion.
ImageProcessor imageProcessor =
new ImageProcessor.Builder()
.add(new ResizeWithCropOrPadOp(cropSize, cropSize))
.add(new ResizeOp(imageSizeX, imageSizeY, ResizeMethod.NEAREST_NEIGHBOR))
.add(new Rot90Op(numRoration))
.add(getPreprocessNormalizeOp())
.build();
return imageProcessor.process(inputImageBuffer);
Next to run inference with the interpreter, you feed the preprocessed image to the TensorFlow-Lite interpreter:
tflite.run(inputImageBuffer.getBuffer(), outputProbabilityBuffer.getBuffer().rewind());
Refer this official example for more details, additionally you can refer this example as well.

Related

Android: How to expand dimension of image using tensorflow lite in Android

The question itself is self-explanatory. In Python, its quite simple to do that with tf.expand_dims(image, 0). How can I do the same thing in Android?
I'm getting error on running the tensorflow model I prepared. It says,
Cannot copy to a TensorFlowLite tensor (input_3) with X bytes from
a Java Buffer with Y bytes.
I'm guessing it comes from one less dimension of image. I've run another model which is working fine. So I need to know how to do that.
My code snippet:
val contentArray =
ImageUtils.bitmapToByteBuffer(
scaledBitmap,
imageSize,
imageSize,
IMAGE_MEAN,
IMAGE_STD
)
val tfliteOptions = Interpreter.Options()
tfliteOptions.setNumThreads(4)
val tflite = Interpreter(tfliteModel, tfliteOptions)
tflite.run(contentArray, segmentationMasks)
fun bitmapToByteBuffer(
bitmapIn: Bitmap,
width: Int,
height: Int,
mean: Float = 0.0f,
std: Float = 255.0f
): ByteBuffer {
val bitmap = scaleBitmapAndKeepRatio(bitmapIn, width, height)
val inputImage = ByteBuffer.allocateDirect(1 * width * height * 3 * 4)
inputImage.order(ByteOrder.nativeOrder())
inputImage.rewind()
val intValues = IntArray(width * height)
bitmap.getPixels(intValues, 0, width, 0, 0, width, height)
var pixel = 0
for (y in 0 until height) {
for (x in 0 until width) {
val value = intValues[pixel++]
// Normalize channel values to [-1.0, 1.0]. This requirement varies by
// model. For example, some models might require values to be normalized
// to the range [0.0, 1.0] instead.
inputImage.putFloat(((value shr 16 and 0xFF) - mean) / std)
inputImage.putFloat(((value shr 8 and 0xFF) - mean) / std)
inputImage.putFloat(((value and 0xFF) - mean) / std)
}
}
inputImage.rewind()
return inputImage
}
There is JVM/Android equivalent op in the TensorFlow API: https://www.tensorflow.org/jvm/api_docs/java/org/tensorflow/op/core/ExpandDims.
However, if you are using TfLite Interpreter API to run inference on a pre-trained model, then you will most likely want to deal with the array dimensions when you construct and save the model (i.e. using Python) instead of when you call the interpreter from the Android code.
If what you mean is to expand the dimension by 1 and make the first dimension of size 1, so as to mimic a batch of tensors (the same as when you are training), the ImageProcessor class appears to be taking care of that automatically. So you shouldn't have to do it manually.

Pretrained keras model is returing the same result in android

I have created an image classifier in Keras, later I saved the model in pb format to use it in android.
However, in the python code, it can classify the image properly. But in android whatever image I gave as input the output is always the same .
This is how I have trained my model
rom keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
target_size = (64, 64),
batch_size = 32,
class_mode = 'binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
target_size = (64, 64),
batch_size = 32,
class_mode = 'binary')
classifier.fit_generator(training_set,
steps_per_epoch = 8000,
epochs = 25,
validation_data = test_set,
validation_steps = 2000)
classifier.summary()
classifier.save('saved_model.h5')
Later I convert that keras model(saved_model.h5) to tensorflow model by using this
This is how I have converted my bitmap float array
public static float[] getPixels(Bitmap bitmap) {
final int IMAGE_SIZE = 64;
int[] intValues = new int[IMAGE_SIZE * IMAGE_SIZE];
float[] floatValues = new float[IMAGE_SIZE * IMAGE_SIZE * 3];
if (bitmap.getWidth() != IMAGE_SIZE || bitmap.getHeight() != IMAGE_SIZE) {
// rescale the bitmap if needed
bitmap = ThumbnailUtils.extractThumbnail(bitmap, IMAGE_SIZE, IMAGE_SIZE);
}
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
final int val = intValues[i];
// bitwise shifting - without our image is shaped [1, 64, 64, 1] but we need [1, 168, 168, 3]
floatValues[i * 3 + 2] = Color.red(val) / 255.0f;
floatValues[i * 3 + 1] = Color.green(val) / 255.0f;
floatValues[i * 3] = Color.blue(val) / 255.0f;
}
return floatValues;
}
Later, I tried to classify image using tensorflow in android , like following .
TensorFlowInferenceInterface tensorFlowInferenceInterface;
tensorFlowInferenceInterface = new TensorFlowInferenceInterface(getAssets(),"model.pb");
float[] output = new float[2];
tensorFlowInferenceInterface.feed("conv2d_11_input",
getPixels(bitmap), 1,64,64,3);
tensorFlowInferenceInterface.run(new String[]{"dense_12/Sigmoid"});
tensorFlowInferenceInterface.fetch("dense_12/Sigmoid",output);
Whatever image I gave the value of the output is [1,0]
Is there anything I have missed?
The color components returned by Color.red(int), Color.blue(int) and Color.green(int) are integers in the range [0, 255] (see doc). The same thing holds when reading images using ImageDataGenerator of Keras. However, as I stated in comments section, in prediction phase you need to do the same preprocessing steps as done in training phase. You are scaling the image pixels by 1./255 in training (using rescale = 1./255 in ImageDataGenerator) and therefore, according to the first point I mentioned, this must also be done in prediction:
floatValues[i * 3 + 2] = Color.red(val) / 255.0;
floatValues[i * 3 + 1] = Color.green(val) / 255.0;
floatValues[i * 3] = Color.blue(val) / 255.0;

How to fix the image preprocessing difference between tensorflow and android studio?

I'm trying to build a classification model with keras and deploy the model to my Android phone. I use the code from this website to deploy my own converted model, which is a .pb file, to my Android phone. I load a image from my phone and everything worked fine, but the prediction result is totally different from the result I got from my PC.
The procedure of testing on my PC are:
load the image with cv2, and convert to np.float32
use the keras resnet50 'preprocess_input' python function to preprocess the image
expand the image dimension for batching (batch size is 1)
forward the image to model and get the result
Relevant code:
img = cv2.imread('./my_test_image.jpg')
x = preprocess_input(img.astype(np.float32))
x = np.expand_dims(x, axis=0)
net = load_model('./my_model.h5')
prediction_result = net.predict(x)
And I noticed that the image preprocessing part of Android is different from the method I used in keras, which mode is caffe(convert the images from RGB to BGR, then zero-center each color channel with respect to the ImageNet dataset). It seems that the original code is for mode tf(will scale pixels between -1 to 1).
So I modified the following code of 'preprocessBitmap' to what I think it should be, and use a 3 channel RGB image with pixel value [127,127,127] to test it. The code predicted the same result as .h5 model did. But when I load a image to classify, the prediction result is different from .h5 model.
Does anyone has any idea? Thank you very much.
I have tried the following:
Load a 3 channel RGB image in my Phone with pixel value [127,127,127], and use the modified code below, and it will give me a prediction result that is same as prediction result using .h5 model on PC.
Test the converted .pb model on PC using tensorflow gfile module with a image, and it give me a correct prediction result (compare to .h5 model). So I think the converted .pb file does not have any problem.
Entire section of preprocessBitmap
// code of 'preprocessBitmap' section in TensorflowImageClassifier.java
TraceCompat.beginSection("preprocessBitmap");
// Preprocess the image data from 0-255 int to normalized float based
// on the provided parameters.
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
// this is a ARGB format, so we need to mask the least significant 8 bits to get blue, and next 8 bits to get green and next 8 bits to get red. Since we have an opaque image, alpha can be ignored.
final int val = intValues[i];
// original
/*
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 2] = ((val & 0xFF) - imageMean) / imageStd;
*/
// what I think it should be to do the same thing in mode caffe when using keras
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - (float)123.68);
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - (float)116.779);
floatValues[i * 3 + 2] = (((val & 0xFF)) - (float)103.939);
}
TraceCompat.endSection();
This question is old, but remains the top Google result for preprocess_input for ResNet50 on Android. I could not find an answer for implementing preprocess_input for Java/Android, so I came up with the following based on the original python/keras code:
/*
Preprocesses RGB bitmap IAW keras/imagenet
Port of https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L169
with data_format='channels_last', mode='caffe'
Convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.
Returns 3D float array
*/
static float[][][] imagenet_preprocess_input_caffe( Bitmap bitmap ) {
// https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L210
final float[] imagenet_means_caffe = new float[]{103.939f, 116.779f, 123.68f};
float[][][] result = new float[bitmap.getHeight()][bitmap.getWidth()][3]; // assuming rgb
for (int y = 0; y < bitmap.getHeight(); y++) {
for (int x = 0; x < bitmap.getWidth(); x++) {
final int px = bitmap.getPixel(x, y);
// rgb-->bgr, then subtract means. no scaling
result[y][x][0] = (Color.blue(px) - imagenet_means_caffe[0] );
result[y][x][1] = (Color.green(px) - imagenet_means_caffe[1] );
result[y][x][2] = (Color.red(px) - imagenet_means_caffe[2] );
}
}
return result;
}
Usage with a 3D tensorflow-lite input with shape (1,224,224,3):
Bitmap bitmap = <your bitmap of size 224x224x3>;
float[][][][] imgValues = new float[1][bitmap.getHeight()][bitmap.getWidth()][3];
imgValues[0]=imagenet_preprocess_input_caffe(bitmap);
... <prep tfInput, tfOutput> ...
tfLite.run(tfInput, tfOutput);

Image classification through Tensorflow gives the exact same prediction

Bear with me while I try to elucidate.
I have an Android Application which uses OpenCV to convert a YUV420 image into a bitmap and transfers it to an Interpreter. The problem is, every time I run it, I get the exact same class prediction with the exact same confidence values irrelevant of what I point at.
...
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
...
Now before you mention my model is not trained enough, I've tested the exact same .tflite file in TFLite example provided in the Tensorflow Codelab-2. It works as it should and recognizes all 4 of my classes with 90%+ accuracy. In addition, I used a label_image.py script to test the .pb file using which my .tflite was derived from and it works as it should. I've trained the model on nearly 5000+ images of each class. Since it works on other apps, I'm guessing there's no problem with the model but my implementation. Though I just can't pinpoint it.
Following code is used to Create Mat(s) from the image bytes :
//Retrieve the camera Image from ARCore
val cameraImage = frame.acquireCameraImage()
val cameraPlaneY = cameraImage.planes[0].buffer
val cameraPlaneUV = cameraImage.planes[1].buffer
// Create a new Mat with OpenCV. One for each plane - Y and UV
val y_mat = Mat(cameraImage.height, cameraImage.width, CvType.CV_8UC1, cameraPlaneY)
val uv_mat = Mat(cameraImage.height / 2, cameraImage.width / 2, CvType.CV_8UC2, cameraPlaneUV)
var mat224 = Mat()
var cvFrameRGBA = Mat()
// Retrieve an RGBA frame from the produced YUV
Imgproc.cvtColorTwoPlane(y_mat, uv_mat, cvFrameRGBA, Imgproc.COLOR_YUV2BGRA_NV21)
// I've tried the following in the above line
// Imgproc.COLOR_YUV2RGBA_NV12
// Imgproc.COLOR_YUV2RGBA_NV21
// Imgproc.COLOR_YUV2BGRA_NV12
// Imgproc.COLOR_YUV2BGRA_NV21
Following code is used to add Image data into a ByteBuffer :
// imageFrame is a Mat object created from OpenCV by processing a YUV420 image received from ARCore
override fun setImageFrame(imageFrame: Mat) {
...
// Convert mat224 into a float array that can be sent to Tensorflow
val rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
rgbBytes.order(ByteOrder.nativeOrder())
val frameBitmap = Bitmap.createBitmap(imageFrame.cols(), imageFrame.rows(), Bitmap.Config.ARGB_8888, true)
// convert Mat to Bitmap
Utils.matToBitmap(imageFrame, frameBitmap, true)
frameBitmap.getPixels(intValues, 0, frameBitmap.width, 0, 0, frameBitmap.width, frameBitmap.height)
// Iterate over all pixels and retrieve information of RGB channels
intValues.forEach { packedPixel ->
rgbBytes.putFloat((((packedPixel shr 16) and 0xFF) - 128) / 128.0f)
rgbBytes.putFloat((((packedPixel shr 8) and 0xFF) - 128) / 128.0f)
rgbBytes.putFloat(((packedPixel and 0xFF) - 128) / 128.0f)
}
}
.......
private var labelProb: Array<FloatArray>? = null
.......
// and classify
labelProb?.let { interpreter?.run(rgbBytes, it) }
.......
I checked the bitmap that gets converted from Mat. It shows up quite as best as it possibly can.
Any ideas anyone?
Update One
I changed the implementation of setImageFrame method slightly to match an implementation here. Since it works for him, I hoped it would work for me as well. It still doesn't.
override fun setImageFrame(imageFrame: Mat) {
// Reset the rgb bytes buffer
rgbBytes.rewind()
// Iterate over all pixels and retrieve information of RGB channels only
for(rows in 0 until imageFrame.rows())
for(cols in 0 until imageFrame.cols()) {
val imageData = imageFrame.get(rows, cols)
// Type of Mat is 24
// Channels is 4
// Depth is 0
rgbBytes.putFloat(imageData[0].toFloat())
rgbBytes.putFloat(imageData[1].toFloat())
rgbBytes.putFloat(imageData[2].toFloat())
}
}
Update Two
Suspicious of my float model, I changed it to a pre-built MobileNet Quant model just to eliminate a possibility. The problem persists in this as well.
...
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
...
Okay. so After 4 days, I was able to finally solve this. The Issue was how The ByteBuffer is initiated. I was doing :
private var rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
instead of what I ought to be doing :
private val rgbBytes: ByteBuffer = ByteBuffer.allocateDirect(1 * 4 * 224 * 224 * 3)
I tried to understand what is the difference between ByteBuffer.allocate() and ByteBuffer.allocateDirect() here but to no avail.
I'd be glad if someone can answer two further questions :
Why does Tensorflow need a Direct Byte Buffer rather than a Non Direct buffer?
What is the difference between Direct and Non Direct ByteBuffer in a simplified description?

How to improve OCR Accuracy which use tesseract? [duplicate]

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.

Categories

Resources