How to improve OCR accuracy when using Tesseract? [duplicate] - android
I've been using Tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that highly pixellated text - for example, text generated by fax machines - is especially difficult for Tesseract to process; presumably all those jagged edges on the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and have seen some small improvement, but I'm hoping there is a more specific technique that would yield better results. Say, a filter tuned to black-and-white images that would smooth out irregular edges, followed by a filter that would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix the DPI (if needed); 300 DPI is the minimum
fix the text size (e.g. 12 pt should be OK)
try to fix the text lines (deskew and dewarp the text)
try to fix the illumination of the image (e.g. no dark parts of the image)
binarize and de-noise the image (a rough OpenCV sketch of deskewing and binarization follows below)
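To make the deskewing and binarization items concrete, here is a minimal Python/OpenCV sketch. The file names are placeholders, and the skew estimate via cv2.minAreaRect is just one common heuristic (note that the angle convention of minAreaRect changed around OpenCV 4.5, so check the sign on your version):
import cv2
import numpy as np

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Estimate the skew angle from the minimum-area rectangle around the text pixels
mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
coords = np.column_stack(np.where(mask > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle

# Rotate the page to remove the skew
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

# De-noise, then binarize with Otsu's threshold
binary = cv2.threshold(cv2.medianBlur(deskewed, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.imwrite("scan_clean.png", binary)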
There is no universal command line that fits all cases (sometimes you need to blur and sometimes to sharpen the image). But you can give TEXTCLEANER from Fred's ImageMagick Scripts a try.
If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
I am by no means an OCR expert, but this week I needed to convert a JPG into text.
I started with a color RGB 445x747 pixel JPG.
I immediately tried Tesseract on it, and the program converted almost nothing.
I then went into GIMP and did the following:
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved the result as a new JPG at 100% quality.
Tesseract was then able to extract all the text into a .txt file.
GIMP is your friend.
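If you would rather script that kind of pipeline than repeat it by hand in GIMP, a rough Python/OpenCV equivalent of the steps above (grayscale, upscale, unsharp mask) might look like this; the file names are placeholders and the sharpening parameters are only a starting point, not a calibrated match for the GIMP values:
import cv2

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Upscale roughly 2.7x, similar to going from 445x747 to 1191x2000
big = cv2.resize(gray, None, fx=2.7, fy=2.7, interpolation=cv2.INTER_CUBIC)

# Unsharp mask: original + amount * (original - blurred), here with amount = 2
blurred = cv2.GaussianBlur(big, (0, 0), sigmaX=3)
sharp = cv2.addWeighted(big, 3.0, blurred, -2.0, 0)

cv2.imwrite("preprocessed.png", sharp)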
As a rule of thumb, I usually apply the following image pre-processing techniques using the OpenCV library:
Rescaling the image (recommended if you're working with images that have a DPI of less than 300):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying a blur followed by a threshold, which can be done using one of the following lines (each has its pros and cons; however, median blur and bilateral filtering usually perform better than Gaussian blur). A combined sketch that feeds the result to Tesseract follows after this list:
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
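Putting the pieces above together, a minimal end-to-end sketch might look like the following. It assumes the pytesseract wrapper is installed and that document.png is a placeholder for your input; pick whichever blur/threshold line from the list above works best on your data:
import cv2
import numpy as np
import pytesseract

img = cv2.imread("document.png")
img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Light morphological clean-up
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)

# One of the blur + threshold combinations listed above
img = cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

print(pytesseract.image_to_string(img, lang="eng"))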
I've recently written a pretty simple guide to Tesseract. It should enable you to write your first OCR script and clear up some of the hurdles I ran into when the documentation was less clear than I would have liked.
In case you'd like to check them out, here are the links:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width (multiply the image height and width by 0.5, 1, and 2).
Convert the image to grayscale (black and white).
Remove the noise pixels to make the image clearer (filter the image).
Refer to the code below:
Resize
// Bilinear resize implemented with GetPixel/SetPixel (simple, but slow for large images)
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
// Convert every pixel to grayscale using the standard luma weights
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
// Binarize: push dark pixels to black and light pixels to white (fixed threshold of 162)
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This was a while ago, but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was extremely helpful to me along the way was the source code of the Capture2Text project:
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: kudos to its author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocessing for this utility.
If you run the binaries, you can check the image transformation before/after the process in the Capture2Text\Output\ folder.
P.S. The mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version of Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
// use ints here: Java bytes are signed, so channel values above 127 would overflow
int nRed, nGreen, nBlue;
int bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (int) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (int) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (int) (ny * bp1 + fy * bp2);
// Green
bp1 = (int) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (int) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (int) (ny * bp1 + fy * bp2);
// Red
bp1 = (int) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (int) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (int) (ny * bp1 + fy * bp2);
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
int gray = (int) (.299 * Color.red(c) + .587 * Color.green(c)
        + .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the German language)
Thus, it makes sense to first test how far you get with the new Tesseract LSTM mode before applying custom image pre-processing steps.
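The same applies when calling Tesseract from code. As a minimal sketch (assuming the pytesseract wrapper and Pillow are installed, and page.png is a placeholder), you can select the LSTM engine and also ask Tesseract to dump its internally thresholded image for inspection:
from PIL import Image
import pytesseract

img = Image.open("page.png")
# --oem 1 selects the LSTM engine; tessedit_write_images=true makes Tesseract write
# its internally thresholded image (typically tessinput.tif) so you can inspect it
config = "--oem 1 -c tessedit_write_images=true"
print(pytesseract.image_to_string(img, lang="deu", config=config))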
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
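To see why this matters, it can help to compare a global Otsu threshold with a local adaptive one on an unevenly lit page. A small OpenCV sketch (the file name is a placeholder; the block size 31 and constant 2 follow the examples earlier on this page and usually need tuning):
import cv2

gray = cv2.imread("unevenly_lit_page.png", cv2.IMREAD_GRAYSCALE)

# Global threshold: one cutoff for the whole page, so dark corners often turn solid black
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive threshold: the cutoff is computed per 31x31 neighbourhood, so it follows the lighting
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 2)

cv2.imwrite("otsu.png", otsu)
cv2.imwrite("adaptive.png", adaptive)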
These are the steps I took to get good results from an image whose text is not very small:
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if you are still not getting good results, scale the image to 150% or 200% (a rough OpenCV sketch of these steps follows below).
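Here is one way those four steps could look in Python/OpenCV. The sharpening kernel, block size and scale factor are example values I picked for the sketch, not something prescribed by the answer above:
import cv2
import numpy as np

img = cv2.imread("photo_of_text.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Blur the original image
img = cv2.GaussianBlur(img, (3, 3), 0)

# 2. Adaptive threshold
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 31, 2)

# 3. Sharpening effect with a standard 3x3 sharpening kernel
sharpen_kernel = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]], dtype=np.float32)
img = cv2.filter2D(img, -1, sharpen_kernel)

# 4. If results are still poor, scale to 150-200%
img = cv2.resize(img, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)

cv2.imwrite("preprocessed.png", img)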
Reading text from image documents with any OCR engine involves many issues when it comes to getting good accuracy. There is no fixed solution for all cases, but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements or blobs in the background region. This requires some pre-processing such as noise removal, which can easily be done using a Gaussian filter or standard median filter methods. These are also available in OpenCV.
2) Wrong orientation of the image: because of wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which gives the worst accuracy.
3) Presence of lines: while doing word or line segmentation, the OCR engine sometimes also tries to merge words and lines together, thus processing wrong content and giving wrong results. There are other issues too, but these are the basic ones (a morphology-based sketch for removing such lines follows below).
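For point 3, one common trick is to detect long horizontal and vertical runs with morphological opening and erase them before OCR. This is a sketch only, assuming a scan with dark text on a light background; the kernel length 40 is a placeholder you would tune to your line widths:
import cv2

gray = cv2.imread("form_scan.png", cv2.IMREAD_GRAYSCALE)
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]  # text becomes white

# Keep only long horizontal runs (kernel much wider than any character)
h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)

# Keep only long vertical runs
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)

# Remove the detected lines from the text mask and save an OCR-friendly image
cleaned = cv2.subtract(binary, cv2.add(h_lines, v_lines))
cv2.imwrite("no_lines.png", cv2.bitwise_not(cleaned))  # back to black text on white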
This OCR application post is an example case where some image pre-processing and post-processing of the OCR result can be applied to get better OCR accuracy.
Text recognition depends on a variety of factors to produce good-quality output. OCR output highly depends on the quality of the input image. This is why every OCR engine provides guidelines regarding the quality and size of the input image; these guidelines help the OCR engine to produce accurate results.
I have written a detailed article on image processing in Python. Kindly follow the link below for more explanation. The Python source code to implement those processes is also included.
Please write a comment if you have a suggestion or a better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
You can do noise reduction and then apply thresholding. Beyond that, you can play around with the OCR configuration by changing the --psm and --oem values.
try:
--psm 5
--oem 2
You can also look at the following link for further details:
here
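If you want to compare the effect of those settings quickly, a small sketch using the pytesseract wrapper (an assumption on my part; you could equally loop over the tesseract CLI) can sweep a few page segmentation and engine modes and print a preview of each result:
from PIL import Image
import pytesseract

img = Image.open("page.png")  # placeholder input

# Try a few page segmentation modes with both engine settings;
# note that --oem 2 needs traineddata that includes the legacy engine
for oem in (1, 2):
    for psm in (3, 5, 6, 11):
        config = f"--oem {oem} --psm {psm}"
        text = pytesseract.image_to_string(img, config=config)
        print(f"--- {config} ---")
        print(text[:200])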
So far, I've played a lot with Tesseract 3.x, 4.x and 5.0.0.
Tesseract 4.x and 5.x seem to yield exactly the same accuracy.
Sometimes I get better results with the legacy engine (using --oem 0) and sometimes better results with the LSTM engine (--oem 1).
Generally speaking, I get the best results on upscaled images with the LSTM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from GitHub, since most Linux distros only provide the fast versions.
Trained data that works for both the legacy and LSTM engines can be downloaded from https://github.com/tesseract-ocr/tessdata with a command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/osd.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily be scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my one-liner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use the TIFF format since Tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
resample the image to 300 dpi
use some black magic to remove unwanted colors
try to rotate the page if rotation can be detected
antialias the image
sharpen the text
The resulting image can then be fed to Tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 --psm 1
Btw, some years ago I wrote the 'poor man's OCR server', which checks for changed files in a given directory and launches OCR operations on all files not already OCRed. pmocr is compatible with Tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.
Related
How to fix the image preprocessing difference between tensorflow and android studio?
I'm trying to build a classification model with Keras and deploy the model to my Android phone. I use the code from this website to deploy my own converted model, which is a .pb file, to my Android phone. I load an image from my phone and everything works fine, but the prediction result is totally different from the result I got on my PC.
The procedure for testing on my PC is:
load the image with cv2 and convert it to np.float32
use the Keras ResNet50 'preprocess_input' Python function to preprocess the image
expand the image dimension for batching (batch size is 1)
forward the image to the model and get the result
Relevant code:
img = cv2.imread('./my_test_image.jpg')
x = preprocess_input(img.astype(np.float32))
x = np.expand_dims(x, axis=0)
net = load_model('./my_model.h5')
prediction_result = net.predict(x)
I noticed that the image preprocessing part on Android is different from the method I used in Keras, whose mode is caffe (convert the images from RGB to BGR, then zero-center each color channel with respect to the ImageNet dataset). It seems that the original code is for mode tf (which scales pixels between -1 and 1).
So I modified the following code of 'preprocessBitmap' to what I think it should be, and used a 3-channel RGB image with pixel value [127,127,127] to test it. The code predicted the same result as the .h5 model did. But when I load an image to classify, the prediction result is different from the .h5 model. Does anyone have any idea? Thank you very much.
I have tried the following:
Load a 3-channel RGB image on my phone with pixel value [127,127,127] and use the modified code below; it gives me a prediction result that is the same as the prediction result using the .h5 model on the PC.
Test the converted .pb model on the PC using the tensorflow gfile module with an image; it gives me a correct prediction result (compared to the .h5 model). So I think the converted .pb file does not have any problem.
Entire section of preprocessBitmap:
// code of 'preprocessBitmap' section in TensorflowImageClassifier.java
TraceCompat.beginSection("preprocessBitmap");
// Preprocess the image data from 0-255 int to normalized float based
// on the provided parameters.
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
    // this is an ARGB format, so we need to mask the least significant 8 bits to get blue,
    // the next 8 bits to get green and the next 8 bits to get red. Since we have an
    // opaque image, alpha can be ignored.
    final int val = intValues[i];
    // original
    /*
    floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - imageMean) / imageStd;
    floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - imageMean) / imageStd;
    floatValues[i * 3 + 2] = ((val & 0xFF) - imageMean) / imageStd;
    */
    // what I think it should be to do the same thing in mode caffe when using keras
    floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - (float)123.68);
    floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - (float)116.779);
    floatValues[i * 3 + 2] = (((val & 0xFF)) - (float)103.939);
}
TraceCompat.endSection();
This question is old, but remains the top Google result for preprocess_input for ResNet50 on Android. I could not find an answer for implementing preprocess_input for Java/Android, so I came up with the following based on the original Python/Keras code:
/*
Preprocesses an RGB bitmap IAW keras/imagenet.
Port of https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L169
with data_format='channels_last', mode='caffe'.
Converts the images from RGB to BGR, then zero-centers each color channel with
respect to the ImageNet dataset, without scaling. Returns a 3D float array.
*/
static float[][][] imagenet_preprocess_input_caffe( Bitmap bitmap ) {
    // https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L210
    final float[] imagenet_means_caffe = new float[]{103.939f, 116.779f, 123.68f};
    float[][][] result = new float[bitmap.getHeight()][bitmap.getWidth()][3]; // assuming rgb
    for (int y = 0; y < bitmap.getHeight(); y++) {
        for (int x = 0; x < bitmap.getWidth(); x++) {
            final int px = bitmap.getPixel(x, y);
            // rgb-->bgr, then subtract means. no scaling
            result[y][x][0] = (Color.blue(px) - imagenet_means_caffe[0]);
            result[y][x][1] = (Color.green(px) - imagenet_means_caffe[1]);
            result[y][x][2] = (Color.red(px) - imagenet_means_caffe[2]);
        }
    }
    return result;
}
Usage with a tensorflow-lite input of shape (1,224,224,3):
Bitmap bitmap = <your bitmap of size 224x224x3>;
float[][][][] imgValues = new float[1][bitmap.getHeight()][bitmap.getWidth()][3];
imgValues[0] = imagenet_preprocess_input_caffe(bitmap);
... <prep tfInput, tfOutput> ...
tfLite.run(tfInput, tfOutput);
YUV_420_888 interpretation on Samsung Galaxy S7 (Camera2)
I wrote a conversion from YUV_420_888 to Bitmap, considering the following logic (as I understand it):
To summarize the approach: the kernel's coordinates x and y are congruent both with the x and y of the non-padded part of the Y-plane (2d allocation) and the x and y of the output Bitmap. The U- and V-planes, however, have a different structure than the Y-plane, because they use 1 byte to cover 4 pixels and, in addition, may have a PixelStride that is more than one; they might also have a padding that differs from that of the Y-plane. Therefore, in order to access the U's and V's efficiently from the kernel, I put them into 1-d allocations and created an index "uvIndex" that gives the position of the corresponding U and V within that 1-d allocation, for given (x,y) coordinates in the (non-padded) Y-plane (and, so, the output Bitmap).
In order to keep the rs kernel lean, I excluded the padding area of the Y-plane by capping the x-range via LaunchOptions (this reflects the RowStride of the Y-plane, which thus can be ignored WITHIN the kernel). So we just need to consider the uvPixelStride and uvRowStride within uvIndex, i.e. the index used to access the U- and V-values.
This is my code:
RenderScript kernel, named yuv420888.rs:
#pragma version(1)
#pragma rs java_package_name(com.xxxyyy.testcamera2);
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
uint picWidth, uvPixelStride, uvRowStride;
rs_allocation ypsIn, uIn, vIn;
// The LaunchOptions ensure that the Kernel does not enter the padding zone of Y,
// so yRowStride can be ignored WITHIN the Kernel.
uchar4 __attribute__((kernel)) doConvert(uint32_t x, uint32_t y) {
    // index for accessing the uIn's and vIn's
    uint uvIndex = uvPixelStride * (x/2) + uvRowStride * (y/2);
    // get the y,u,v values
    uchar yps = rsGetElementAt_uchar(ypsIn, x, y);
    uchar u = rsGetElementAt_uchar(uIn, uvIndex);
    uchar v = rsGetElementAt_uchar(vIn, uvIndex);
    // calc argb
    int4 argb;
    argb.r = yps + v * 1436 / 1024 - 179;
    argb.g = yps - u * 46549 / 131072 + 44 - v * 93604 / 131072 + 91;
    argb.b = yps + u * 1814 / 1024 - 227;
    argb.a = 255;
    uchar4 out = convert_uchar4(clamp(argb, 0, 255));
    return out;
}
Java side:
private Bitmap YUV_420_888_toRGB(Image image, int width, int height) {
    // Get the three image planes
    Image.Plane[] planes = image.getPlanes();
    ByteBuffer buffer = planes[0].getBuffer();
    byte[] y = new byte[buffer.remaining()];
    buffer.get(y);
    buffer = planes[1].getBuffer();
    byte[] u = new byte[buffer.remaining()];
    buffer.get(u);
    buffer = planes[2].getBuffer();
    byte[] v = new byte[buffer.remaining()];
    buffer.get(v);
    // get the relevant RowStrides and PixelStrides
    // (we know from documentation that PixelStride is 1 for y)
    int yRowStride = planes[0].getRowStride();
    int uvRowStride = planes[1].getRowStride();    // we know from documentation that RowStride is the same for u and v.
    int uvPixelStride = planes[1].getPixelStride(); // we know from documentation that PixelStride is the same for u and v.
    // rs creation just for demo. Create rs just once in onCreate and use it again.
    RenderScript rs = RenderScript.create(this);
    //RenderScript rs = MainActivity.rs;
    ScriptC_yuv420888 mYuv420 = new ScriptC_yuv420888(rs);
    // Y,U,V are defined as global allocations, the out-Allocation is the Bitmap.
    // Note also that uAlloc and vAlloc are 1-dimensional while yAlloc is 2-dimensional.
    Type.Builder typeUcharY = new Type.Builder(rs, Element.U8(rs));
    // using safe height
    typeUcharY.setX(yRowStride).setY(y.length / yRowStride);
    Allocation yAlloc = Allocation.createTyped(rs, typeUcharY.create());
    yAlloc.copyFrom(y);
    mYuv420.set_ypsIn(yAlloc);
    Type.Builder typeUcharUV = new Type.Builder(rs, Element.U8(rs));
    // note that the size of the u's and v's are as follows:
    //   ( (width/2)*PixelStride + padding ) * (height/2)
    // = ( RowStride                       ) * (height/2)
    // but I noted that on the S7 it is 1 less...
    typeUcharUV.setX(u.length);
    Allocation uAlloc = Allocation.createTyped(rs, typeUcharUV.create());
    uAlloc.copyFrom(u);
    mYuv420.set_uIn(uAlloc);
    Allocation vAlloc = Allocation.createTyped(rs, typeUcharUV.create());
    vAlloc.copyFrom(v);
    mYuv420.set_vIn(vAlloc);
    // handover parameters
    mYuv420.set_picWidth(width);
    mYuv420.set_uvRowStride(uvRowStride);
    mYuv420.set_uvPixelStride(uvPixelStride);
    Bitmap outBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
    Allocation outAlloc = Allocation.createFromBitmap(rs, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
    Script.LaunchOptions lo = new Script.LaunchOptions();
    lo.setX(0, width);  // by this we ignore the y's padding zone, i.e. the right side of x between width and yRowStride
    // using safe height
    lo.setY(0, y.length / yRowStride);
    mYuv420.forEach_doConvert(outAlloc, lo);
    outAlloc.copyTo(outBitmap);
    return outBitmap;
}
Testing on a Nexus 7 (API 22) this returns nice color Bitmaps. This device, however, has trivial PixelStrides (=1) and no padding (i.e. RowStride = width). Testing on the brand-new Samsung S7 (API 23) I get pictures whose colors are not correct - except for the green ones. But the picture does not show a general bias towards green; it just seems that non-green colors are not reproduced correctly. Note that the S7 applies a u/v PixelStride of 2, and no padding.
Since the most crucial code line is the access of the u/v planes within the rs code (uint uvIndex = ...), I think the problem could be there, probably with incorrect consideration of PixelStrides. Does anyone see the solution? Thanks.
UPDATE: I checked everything, and I am pretty sure that the code regarding the access of y, u, v is correct. So the problem must be with the u and v values themselves. Non-green colors have a purple tilt, and looking at the u, v values they seem to be in a rather narrow range of about 110-150. Is it really possible that we need to cope with device-specific YUV -> RGB conversions...?! Did I miss anything?
UPDATE 2: I have corrected the code; it works now, thanks to Eddy's feedback.
Look at floor((float) uvPixelStride*(x)/2), which calculates your U,V row offset (uv_row_offset) from the Y x-coordinate.
If uvPixelStride = 2, then as x increases:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 1
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 3
and this is incorrect. There's no valid U/V pixel value at uv_row_offset = 1 or 3, since uvPixelStride = 2.
You want uvPixelStride * floor(x/2) (assuming you don't trust yourself to remember the critical round-down behavior of integer divide; if you do, then uvPixelStride * (x/2) should be enough).
With that, your mapping becomes:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 0
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 2
See if that fixes the color errors.
In practice, the incorrect addressing here would mean every other color sample would be from the wrong color plane, since it's likely that the underlying YUV data is semiplanar (so the U plane starts at V plane + 1 byte, with the two planes interleaved).
For people who encounter the error android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type: use buffer.capacity() instead of buffer.remaining(), and if you have already made some operations on the image, you'll need to call the rewind() method on the buffer.
Furthermore, for anyone else getting android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type, I fixed it by changing yAlloc.copyFrom(y); to yAlloc.copy1DRangeFrom(0, y.length, y);
Posting a full solution to convert YUV->BGR (it can be adapted for other formats too) and also rotate the image to upright using RenderScript. An Allocation is used as input and a byte array is used as output. It was tested on Android 8+ including Samsung devices.
Java:
/**
 * Renderscript-based process to convert YUV_420_888 to BGR_888 and rotation to upright.
 */
public class ImageProcessor {
    protected final String TAG = this.getClass().getSimpleName();
    private Allocation mInputAllocation;
    private Allocation mOutAllocLand;
    private Allocation mOutAllocPort;
    private Handler mProcessingHandler;
    private ScriptC_yuv_bgr mConvertScript;
    private byte[] frameBGR;
    public ProcessingTask mTask;
    private ImageListener listener;
    private Supplier<Integer> rotation;

    public ImageProcessor(RenderScript rs, Size dimensions, ImageListener listener, Supplier<Integer> rotation) {
        this.listener = listener;
        this.rotation = rotation;
        int w = dimensions.getWidth();
        int h = dimensions.getHeight();

        Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.YUV(rs));
        yuvTypeBuilder.setX(w);
        yuvTypeBuilder.setY(h);
        yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888);
        mInputAllocation = Allocation.createTyped(rs, yuvTypeBuilder.create(),
                Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);

        //keep 2 allocations to handle different image rotations
        mOutAllocLand = createOutBGRAlloc(rs, w, h);
        mOutAllocPort = createOutBGRAlloc(rs, h, w);

        frameBGR = new byte[w*h*3];

        HandlerThread processingThread = new HandlerThread(this.getClass().getSimpleName());
        processingThread.start();
        mProcessingHandler = new Handler(processingThread.getLooper());

        mConvertScript = new ScriptC_yuv_bgr(rs);
        mConvertScript.set_inWidth(w);
        mConvertScript.set_inHeight(h);

        mTask = new ProcessingTask(mInputAllocation);
    }

    private Allocation createOutBGRAlloc(RenderScript rs, int width, int height) {
        //Stored as Vec4, it's impossible to store as Vec3, buffer size will be for Vec4 anyway
        //using RGB_888 as alternative for BGR_888, can be just U8_3 type
        Type.Builder rgbTypeBuilderPort = new Type.Builder(rs, Element.RGB_888(rs));
        rgbTypeBuilderPort.setX(width);
        rgbTypeBuilderPort.setY(height);
        Allocation allocation = Allocation.createTyped(
            rs, rgbTypeBuilderPort.create(),
            Allocation.USAGE_SCRIPT
        );
        //Use auto-padding to be able to copy to x*h*3 bytes array
        allocation.setAutoPadding(true);
        return allocation;
    }

    public Surface getInputSurface() {
        return mInputAllocation.getSurface();
    }

    /**
     * Simple class to keep track of incoming frame count,
     * and to process the newest one in the processing thread
     */
    class ProcessingTask implements Runnable, Allocation.OnBufferAvailableListener {
        private int mPendingFrames = 0;
        private Allocation mInputAllocation;

        public ProcessingTask(Allocation input) {
            mInputAllocation = input;
            mInputAllocation.setOnBufferAvailableListener(this);
        }

        @Override
        public void onBufferAvailable(Allocation a) {
            synchronized(this) {
                mPendingFrames++;
                mProcessingHandler.post(this);
            }
        }

        @Override
        public void run() {
            // Find out how many frames have arrived
            int pendingFrames;
            synchronized(this) {
                pendingFrames = mPendingFrames;
                mPendingFrames = 0;
                // Discard extra messages in case processing is slower than frame rate
                mProcessingHandler.removeCallbacks(this);
            }

            // Get to newest input
            for (int i = 0; i < pendingFrames; i++) {
                mInputAllocation.ioReceive();
            }

            int rot = rotation.get();

            mConvertScript.set_currentYUVFrame(mInputAllocation);
            mConvertScript.set_rotation(rot);

            Allocation allocOut = rot==90 || rot==270 ? mOutAllocPort : mOutAllocLand;

            // Run processing
            // ain allocation isn't really used, global frame param is used to get data from
            mConvertScript.forEach_yuv_bgr(allocOut);

            //Save to byte array, BGR 24bit
            allocOut.copyTo(frameBGR);

            int w = allocOut.getType().getX();
            int h = allocOut.getType().getY();

            if (listener != null) {
                listener.onImageAvailable(frameBGR, w, h);
            }
        }
    }

    public interface ImageListener {
        /**
         * Called when there is an available image; the image is in upright position.
         *
         * @param bgr BGR 24bit bytes
         * @param width image width
         * @param height image height
         */
        void onImageAvailable(byte[] bgr, int width, int height);
    }
}
RS:
#pragma version(1)
#pragma rs java_package_name(com.affectiva.camera)
#pragma rs_fp_relaxed

//Script converts YUV to BGR(uchar3)

//current YUV frame to read pixels from
rs_allocation currentYUVFrame;

//input image rotation: 0,90,180,270 clockwise
uint32_t rotation;
uint32_t inWidth;
uint32_t inHeight;

//method returns uchar3 BGR which will be set to x,y in output allocation
uchar3 __attribute__((kernel)) yuv_bgr(uint32_t x, uint32_t y) {
    // Read in pixel values from latest frame - YUV color space
    uchar3 inPixel;
    uint32_t xRot = x;
    uint32_t yRot = y;

    //Do not rotate if 0
    if (rotation==90) {
        //rotate 270 clockwise
        xRot = y;
        yRot = inHeight - 1 - x;
    } else if (rotation==180) {
        xRot = inWidth - 1 - x;
        yRot = inHeight - 1 - y;
    } else if (rotation==270) {
        //rotate 90 clockwise
        xRot = inWidth - 1 - y;
        yRot = x;
    }

    inPixel.r = rsGetElementAtYuv_uchar_Y(currentYUVFrame, xRot, yRot);
    inPixel.g = rsGetElementAtYuv_uchar_U(currentYUVFrame, xRot, yRot);
    inPixel.b = rsGetElementAtYuv_uchar_V(currentYUVFrame, xRot, yRot);

    // Convert YUV to RGB, JFIF transform with fixed-point math
    // R = Y + 1.402 * (V - 128)
    // G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
    // B = Y + 1.772 * (U - 128)
    int3 bgr;
    //get red pixel and assign to b
    bgr.b = inPixel.r + inPixel.b * 1436 / 1024 - 179;
    bgr.g = inPixel.r - inPixel.g * 46549 / 131072 + 44 - inPixel.b * 93604 / 131072 + 91;
    //get blue pixel and assign to r
    bgr.r = inPixel.r + inPixel.g * 1814 / 1024 - 227;

    // Write out
    return convert_uchar3(clamp(bgr, 0, 255));
}
On a Samsung Galaxy Tab 5 (tablet), Android version 5.1.1 (22), with alleged YUV_420_888 format, the following RenderScript math works well and produces correct colors:
uchar yValue = rsGetElementAt_uchar(gCurrentFrame, x + y * yRowStride);
uchar vValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) );
uchar uValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) + (xSize * ySize) / 4);
I do not understand why the horizontal value (i.e., y) is scaled by a factor of four instead of two, but it works well. I also needed to avoid use of rsGetElementAtYuv_uchar_Y|U|V. I believe the associated allocation stride value is set to zero instead of something proper. Use of rsGetElementAt_uchar() is a reasonable work-around.
On a Samsung Galaxy S5 (smartphone), Android version 5.0 (21), with alleged YUV_420_888 format, I cannot recover the u and v values; they come through as all zeros. This results in a green-looking image. Luminance is OK, but the image is vertically flipped.
This code requires the use of the RenderScript compatibility library (android.support.v8.renderscript.*).
In order to get the compatibility library to work with Android API 23, I updated to gradle-plugin 2.1.0 and Build-Tools 23.0.3 as per Miao Wang's answer at How to create Renderscript scripts on Android Studio, and make them run?
If you follow his answer and the error "Gradle version 2.10 is required" appears, do NOT change
classpath 'com.android.tools.build:gradle:2.1.0'
Instead, update the distributionUrl field of the Project\gradle\wrapper\gradle-wrapper.properties file to
distributionUrl=https\://services.gradle.org/distributions/gradle-2.10-all.zip
and change File > Settings > Builds,Execution,Deployment > Build Tools > Gradle > Gradle to Use default gradle wrapper as per "Gradle Version 2.10 is required." Error.
Re: RSIllegalArgumentException. In my case this was because buffer.remaining() was not a multiple of the stride: the length of the last line was less than the stride (i.e. only up to where the actual data was).
An FYI in case someone else gets this, as I was also getting "android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type" when trying out the code. In my case it turned out that when allocating the buffer for Y I had to rewind the buffer, because it was left at the wrong end and wasn't copying the data. Calling buffer.rewind(); before allocating the new byte array makes it work fine now.
optimizing my inner loop (ARM, android ndk)
I'm writing an image processing app on Android, and I'm trying to speed it up using the NDK. I have the following for-loop:
int x, y, c, idx;
const int pitch3 = pitch * 3;
float adj, result;
...
// px, py, u, u_bar are all float arrays of size nx*ny*3
// theta, tau, denom are float constants
// idx >= pitch3
for(y=1;y<ny;++y) {
    for(x=1;x<nx;++x) {
        for(c=0;c<3;++c) {
            adj = -px[idx] - py[idx] + px[idx - 3] + py[idx - pitch3];
            result = ((u[idx] - tau * adj) + tau * f[idx]) * denom;
            u_bar[idx] = result + theta * (result - u[idx]);
            u[idx] = result;
            ++idx;
        }
    }
}
I'm wondering if it is possible to speed up this loop? I'm thinking that using fixed-point arithmetic wouldn't do much, except on really old Android phones (which I'm not going to target). Would writing it in assembly give a big improvement?
EDIT: I know I could use SIMD/NEON instructions, but they are not so common I think ...
Since you're accessing the array as a flat structure, the 3 levels of looping only increase the value used for idx. You can simply loop for (idx = pitch3; idx < nx*ny*3; idx++). Another option is to move to fixed-point math. Do you really need more than 64 bits of dynamic range?
OpenCV crop function fatal signal 11
Hello, I am writing an Android app which uses OpenCV to detect rectangles/squares. To detect them I am using functions (modified a bit) from squares.cpp. The points of every square found are stored in vector<vector<Point>> squares, which I then pass to a function that chooses the biggest one and stores it in vector<Point> theBiggestSq.
The problem is with the cropping function whose code I will paste below (I will post the link to YouTube showing the problem too). If the actual square is far enough from the camera it works OK, but if I bring it a bit closer, at some point it will hang. I will post the print screen of the problem from LogCat, and there are the points printed out (the boundary points taken from the theBiggestSq vector; maybe it will help to find the solution).
void cutAndSave(vector<Point> theBiggestSq, Mat image){
    RotatedRect box = minAreaRect(Mat(theBiggestSq));
    // Draw bounding box in the original image (debug purposes)
    //cv::Point2f vertices[4];
    //box.points(vertices);
    //for (int i = 0; i < 4; ++i)
    //{
    //    cv::line(img, vertices[i], vertices[(i + 1) % 4], cv::Scalar(0, 255, 0), 1, CV_AA);
    //}
    //cv::imshow("box", img);
    //cv::imwrite("box.png", img);

    // Set Region of Interest to the area defined by the box
    Rect roi;
    roi.x = box.center.x - (box.size.width / 2);
    roi.y = box.center.y - (box.size.height / 2);
    roi.width = box.size.width;
    roi.height = box.size.height;

    // Crop the original image to the defined ROI
    //bmp=Bitmap.createBitmap(box.size.width / 2, box.size.height / 2, Bitmap.Config.ARGB_8888);
    Mat crop = image(roi);
    //Mat crop = image(Rect(roi.x, roi.y, roi.width, roi.height)).clone();
    //Utils.matToBitmap(crop*.clone()* ,bmp);
    imwrite("/sdcard/OpenCVTest/1.png", bmp);
    imshow("crop", crop);
}
video of my app and its problems
The coords printed are, respectively: roi.x, roi.y, roi.width, roi.height.
Another problem is that the boundaries drawn should have a green colour, but as you can see in the video they are distorted (flexed, as if those boundaries were made of glass?). Thank you for any help. I am new to OpenCV (only one month in), so please be tolerant.
EDIT: drawing code:
//draw//
for( size_t i = 0; i < squares.size(); i++ )
{
    const Point* p = &squares[i][0];
    int n = (int)squares[i].size();
    polylines(mBgra, &p, &n, 1, true, Scalar(255,255,0), 5, 10);
    //Rect rect = boundingRect(cv::Mat(squares[i]));
    //rectangle(mBgra, rect.tl(), rect.br(), cv::Scalar(0,255,0), 2, 8, 0);
}
This error basically tells you the cause - your ROI exceeds the image dimensions. This means that when you are extracting Rect roi from RotatedRect box, then either x or y is smaller than zero, or the width/height pushes the dimensions outside the image. You should check this using something like:
// Propose rectangle from data
int proposedX = box.center.x - (box.size.width / 2);
int proposedY = box.center.y - (box.size.height / 2);
int proposedW = box.size.width;
int proposedH = box.size.height;

// Ensure top-left edge is within image
roi.x = proposedX < 0 ? 0 : proposedX;
roi.y = proposedY < 0 ? 0 : proposedY;

// Ensure bottom-right edge is within image
roi.width =
    (roi.x - 1 + proposedW) > image.cols ?   // Will this roi exceed image?
    (image.cols - 1 - roi.x)                 // YES: make roi go to image edge
    : proposedW;                             // NO: continue as proposed

// Similar for height
roi.height =
    (roi.y - 1 + proposedH) > image.rows ?
    (image.rows - 1 - roi.y)
    : proposedH;
Simple Image Sharpening algorithm for Android App
I am looking for a simple image sharpening algorithm to use in my Android application. I have a grayscale image captured from video (mostly used for text) and I would like to sharpen it because my phone does not have auto focus, and the close object distance blurs the text. I don't have any background in image processing. But as a user, I am familiar with unsharp masking and other sharpening tools available in Gimp, Photoshop, etc. I didn't see any support for image processing in the Android API, and hence am looking for a method to implement myself. Thanks.
This is a simple image sharpening algorithm. You should pass to this function the width, height and byte[] array of your grayscale image, and it will sharpen the image in that byte[] array.
void sharpen(int width, int height, byte* yuv) {
    char *mas;
    mas = (char *) malloc(width * height);
    memcpy(mas, yuv, width * height);
    signed int res;
    int ywidth;
    for (int y = 1; y < height - 1; y++) {
        ywidth = y * width;
        for (int x = 1; x < width - 1; x++) {
            res = ( mas[x + ywidth] * 5
                  - mas[x - 1 + ywidth] - mas[x + 1 + ywidth]
                  - mas[x + (ywidth + width)] - mas[x + (ywidth - width)] );
            if (res > 255) { res = 255; };
            if (res < 0) { res = 0; };
            yuv[x + ywidth] = res;
        }
    }
    free(mas);
}
If you have access to pixel information, your most basic option would be a sharpening convolution kernel. Take a look at the following sites; you can learn more about sharpening kernels and how to apply them there: link1 link2
ImageJ has many algorithms in Java and is freely available.