I want to ask about some ideas and study materials related to binarization. I am building a system that detects human emotions. I am able to locate areas such as the brows, eyes, nose, mouth, etc., but then comes the next stage: processing...
My images are taken in various places, at different times of day, and under different weather conditions. This is problematic during binarization: with the same threshold value, some images come out fully black, while others look fine and give me the information I want.
What I want to ask you about is:
1) Is there a known way to bring all images to the same level of brightness?
2) How can I make the threshold value depend on the brightness of the image?
What I have tried so far is normalizing the image... but it has no effect; maybe I'm doing something wrong. I'm using OpenCV for Android.
Core.normalize(cleanFaceMatGRAY, cleanFaceMatGRAY,0, 255, Core.NORM_MINMAX, CvType.CV_8U);
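(A note on why this may show no effect: NORM_MINMAX only stretches the observed min/max range, so a frame that already spans 0-255 is left unchanged. One hedged alternative is histogram equalization, which redistributes the intensities instead; a minimal sketch on the same Mat:

// Flattens the intensity histogram, so differently lit frames end up with
// comparable brightness distributions before thresholding.
Imgproc.equalizeHist(cleanFaceMatGRAY, cleanFaceMatGRAY);
)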
EDIT:
I tried adaptive thresholding and Otsu's method - they didn't work for me. I had problems using CLAHE on Android, but I managed to implement the Niblack algorithm.
Core.normalize(cleanFaceMatGRAY, cleanFaceMatGRAY,0, 255, Core.NORM_MINMAX, CvType.CV_8U);
niblackThresholding(cleanFaceMatGRAY, -0.2);

private void niblackThresholding(Mat image, double parameter) {
    Mat meanPowered = image.clone();
    Core.multiply(image, image, meanPowered);
    Scalar mean = Core.mean(image);
    Scalar stdmean = Core.mean(meanPowered);
    double thresholdValue = mean.val[0] + parameter * stdmean.val[0];
    int totalRows = image.rows();
    int totalCols = image.cols();
    for (int cols = 0; cols < totalCols; cols++) {
        for (int rows = 0; rows < totalRows; rows++) {
            if (image.get(rows, cols)[0] > thresholdValue) {
                image.put(rows, cols, 255);
            } else {
                image.put(rows, cols, 0);
            }
        }
    }
}
The results are really good, but still not good enough for some images. I'm pasting links because the images are big and I don't want to take up too much screen space:
For example, this one is thresholded really well:
https://dl.dropboxusercontent.com/u/108321090/a1.png
https://dl.dropboxusercontent.com/u/108321090/a.png
But bad light sometimes produces shadows, which gives this effect:
https://dl.dropboxusercontent.com/u/108321090/b1.png
https://dl.dropboxusercontent.com/u/108321090/b.png
Do you have any idea that could help me improve the thresholding of images with large lighting differences (shadows)?
EDIT2:
I found that my previous algorithm was implemented incorrectly: the standard deviation was computed the wrong way, and in Niblack thresholding the mean is a local value, not a global one. I fixed it according to this reference: http://arxiv.org/ftp/arxiv/papers/1201/1201.5227.pdf
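(For reference, my understanding of the standard Niblack rule is that it thresholds each pixel against the statistics of its local w x w neighborhood:

T(x, y) = m(x, y) + k * s(x, y)

where m and s are the local mean and local standard deviation, and k is the tuning parameter, typically around -0.2 for dark text on a light background. The paper above evaluates variants of this rule; the code below implements one of them.)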
private void niblackThresholding2(Mat image, double parameter, int window) {
    int totalRows = image.rows();
    int totalCols = image.cols();
    int offset = (window - 1) / 2;
    // Work on a copy, so local means are computed from the original
    // intensities rather than from pixels that were already set to 0/255.
    Mat source = image.clone();
    // (The per-pixel Log.d call was removed: logging every pixel dominates the runtime.)
    for (int row = offset + 1; row < totalRows - offset; row++) {
        for (int col = offset + 1; col < totalCols - offset; col++) {
            double localMean = calculateLocalMean(col, row, source, window);
            double meanDeviation = source.get(row, col)[0] - localMean;
            double thresholdValue = localMean * (1 + parameter * ((meanDeviation / (1 - meanDeviation)) - 1));
            if (source.get(row, col)[0] > thresholdValue) {
                image.put(row, col, 255);
            } else {
                image.put(row, col, 0);
            }
        }
    }
}
private double calculateLocalMean(int x, int y, Mat image, int window) {
    int offset = (window - 1) / 2;
    // Rect takes its corners in (x, y) = (column, row) order.
    Point leftTop = new Point(x - (offset + 1), y - (offset + 1));
    Point bottomRight = new Point(x + offset, y + offset);
    Mat tempMat = new Mat(image, new Rect(leftTop, bottomRight));
    return Core.mean(tempMat).val[0];
}
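(A side note on performance: the submat-plus-Core.mean approach above costs O(window^2) work per pixel. If it turns out to be too slow, one hedged alternative is to precompute every local mean in a single pass with a box filter; a sketch, assuming a CV_8U input:

Mat gray32f = new Mat();
image.convertTo(gray32f, CvType.CV_32F);
Mat localMeans = new Mat();
// Each output pixel becomes the normalized mean of its window x window neighborhood.
Imgproc.boxFilter(gray32f, localMeans, CvType.CV_32F, new Size(window, window));
// localMeans.get(row, col)[0] now holds the local mean centered at (row, col).
)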
Results for a 7x7 window and the k parameter of 0.34 proposed in the reference: I still can't get rid of the shadow on the faces.
https://dl.dropboxusercontent.com/u/108321090/b2.png
https://dl.dropboxusercontent.com/u/108321090/b1.png
Things to look at:
http://docs.opencv.org/java/org/opencv/imgproc/CLAHE.html
http://docs.opencv.org/java/org/opencv/imgproc/Imgproc.html#adaptiveThreshold(org.opencv.core.Mat,%20org.opencv.core.Mat,%20double,%20int,%20int,%20int,%20double)
http://docs.opencv.org/java/org/opencv/imgproc/Imgproc.html#threshold(org.opencv.core.Mat,%20org.opencv.core.Mat,%20double,%20double,%20int) (THRESH_OTSU)
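Since CLAHE was the sticking point, here is a minimal, untested sketch of how the APIs linked above might be combined in OpenCV for Java (assuming a build that exposes Imgproc.createCLAHE; the clip limit, tile grid, and block size are guesses to tune per data set):

Mat equalized = new Mat();
Mat binary = new Mat();
// CLAHE: local histogram equalization, which tends to tame shadows better
// than a single global normalization.
CLAHE clahe = Imgproc.createCLAHE(2.0, new Size(8, 8));
clahe.apply(cleanFaceMatGRAY, equalized);
// Adaptive threshold: each pixel is compared to a Gaussian-weighted mean of
// its own neighborhood, so uneven illumination matters far less.
Imgproc.adaptiveThreshold(equalized, binary, 255,
        Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 15, 5);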
I am new to image processing. I am trying to compute the NDVI index on an image of an agricultural farm in Android, but I am unable to get the desired result.
Original Image:
I am using this code snippet to compute NDVI:
public int[][] applyNDVI(int[][] src1, int[][] src2) {
    // src1 is the image's red channel and src2 is its blue channel;
    // width and height are fields of the enclosing class.
    final int[][] pixels = new int[width][height];
    for (int i = 0; i < width; i++) {
        for (int j = 0; j < height; j++) {
            final int pixel1 = src1[i][j];
            final int pixel2 = src2[i][j];
            final double X = (pixel2 - pixel1);
            final double Y = (pixel2 + pixel1);
            double Z = (X / Y);
            //double Z = ((double) pixel1 / (double) pixel2);
            int NDVI = (int) (Z * 255);
            pixels[i][j] = Color.argb(Color.alpha(pixel1), NDVI, NDVI, NDVI);
        }
    }
    return pixels;
}
This code snippet is giving me this output:
From what I have read about NDVI, vegetation has a higher NDVI value (between 0.2 and 0.9) than barren/dry ground (< 0.2). So plants should be light grey and the ground should be dark grey in the NDVI image, but I am getting exactly the opposite image.
Also, when I use a different formula, i.e. NDVI = Red Channel / Blue Channel:
double Z = ((double) pixel1 / (double) pixel2);
this formula gives me the desired output (almost).
So I am unable to understand this discrepancy. Also, I want to know whether the NDVI formula depends on the camera used and the image taken, or whether it is universal. If it is universal, where am I going wrong, and why am I not able to get the desired output? Any kind of help would be deeply appreciated.
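(One hedged side note on the snippet above, independent of which band formula is correct: Z lies in [-1, 1], so (int) (Z * 255) is negative for negative NDVI values, and negative components passed to Color.argb have only their low byte taken, which scrambles the gray levels. A sketch of a safer mapping for the inner loop, with a guard for the zero denominator:

final double Z = (Y == 0) ? 0 : (X / Y);                  // avoid division by zero
int ndviGray = (int) Math.round((Z + 1.0) / 2.0 * 255.0); // map [-1, 1] onto [0, 255]
ndviGray = Math.max(0, Math.min(255, ndviGray));          // clamp for safety
pixels[i][j] = Color.argb(255, ndviGray, ndviGray, ndviGray);

This keeps high-NDVI vegetation bright and bare ground dark, whatever the sign of Z.)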
I am looking for tips and hints on the best way to approach something. I want to either import or create geometry (initially a cylinder), isolate half of it, move the vertices around, and then export it again as an .obj or .stl. I realise there are libraries that will do this, but I need this to work on Android, and the libraries (as far as I know) don't. I made these images in 3DMax to explain what I mean. I can handle much of the coding, BUT I just cannot get my head around the geometry mathematics.
I have adapted this method for creating a cylinder from an example in the book: Processing 2: Creative Coding Hotshot...
float[][] vertx;
float[][] verty;

void setup() {
  size(800, 600, P3D);
  vertx = new float[36][36]; // 36 triangle strips, 36 vertices
  verty = new float[36][36];
}

void draw() {
  hint(ENABLE_DEPTH_TEST);
  pushMatrix();
  background(125);
  fill(255);
  strokeWeight(0.5);
  translate(width/2, height/2, 200);
  rotateX(radians(-45));
  scale(1);
  translate(0, -50, 0);
  initPoints();
  beginShape(TRIANGLE_STRIP);
  for (int h = 1; h < 36; h++) {
    for (int a = 0; a < 37; a++) {
      int aa = a % 36;
      //normal(vertx[h][aa], 0, verty[h][aa]);
      vertex(vertx[h][aa], h*5.0, verty[h][aa]);
      //normal(vertx[h-1][aa], 0, verty[h-1][aa]);
      vertex(vertx[h-1][aa], (h-1)*5.0, verty[h-1][aa]);
    }
  }
  endShape();
  beginShape(TRIANGLE_FAN); // bottom
  int h = 35;
  vertex(0, h*5, 0);
  for (int a = 0; a < 37; a++) {
    int aa = a % 36;
    vertex(vertx[h][aa], h*5, verty[h][aa]);
  }
  endShape();
  popMatrix();
  hint(DISABLE_DEPTH_TEST);
}

float getR(float a, float h) {
  float r = 50;
  return r;
}

void initPoints() {
  for (int h = 0; h < 36; h++) {
    for (int a = 0; a < 36; a++) {
      float r = getR(a*10.0, h*5.0); // angular step = 10 degrees (360/36)
      vertx[h][a] = cos(radians(a*10.0)) * r;
      verty[h][a] = sin(radians(a*10.0)) * r;
    }
  }
}
...and I am assuming it is possible to isolate/grab certain vertices from the array?
Any other approaches, or any advice on how to develop this? Is the import > transform method even possible? @Spektre - might there be a better approach than this?
This is a very general question, so I'll give you a general answer.
Your first step is to really digest the code you posted: what does each of the calls to vertex() do? Where is each vertex being placed? You should be able to write your own code that draws a cylinder without copy-pasting anything from this code, and you should be able to draw other shapes. Start by drawing only a few vertices to see where they show up, then add a few more, and a few more, until you understand exactly what this code is doing.
There are a ton of tutorials that can help you understand the code. Here is a 2D tutorial to get you started (read (and understand) that one first), and here is a 3D version.
Once you have that working, it should be pretty easy to manipulate the vertices however you want. If you want the user to be able to select a vertex, google something like "3D point picking" for a ton of resources.
Finally, I'm not sure Processing provides an easy way to export a .obj file. But again, Google is your friend: try googling something like "Processing export obj file" for a handful of libraries that seem to help with your goal.
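If you do end up rolling your own export, writing a minimal Wavefront .obj by hand is not much work. Here is a rough, untested sketch that dumps the vertex grid from the question's code (the function name and the triangulation are my own; .obj face indices are 1-based; normals and the end caps are omitted):

void exportObj(String filename) {
  PrintWriter out = createWriter(filename);
  // One "v" line per grid point; y is the strip height, matching draw().
  for (int h = 0; h < 36; h++) {
    for (int a = 0; a < 36; a++) {
      out.println("v " + vertx[h][a] + " " + (h * 5.0) + " " + verty[h][a]);
    }
  }
  // Two triangles per quad between adjacent rings; indices start at 1.
  for (int h = 0; h < 35; h++) {
    for (int a = 0; a < 36; a++) {
      int a2 = (a + 1) % 36;
      int i0 = h * 36 + a + 1;
      int i1 = h * 36 + a2 + 1;
      int i2 = (h + 1) * 36 + a + 1;
      int i3 = (h + 1) * 36 + a2 + 1;
      out.println("f " + i0 + " " + i1 + " " + i2);
      out.println("f " + i1 + " " + i3 + " " + i2);
    }
  }
  out.flush();
  out.close();
}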
If you get stuck on a specific step, post an MCVE and ask a specific question. Good luck!
I want to make an application like CamScanner for cropping a document, but I need the same functionality shown in my two images.
The first image shows the image captured by the camera.
The second image shows the recognized part of the captured image, like this.
I have researched more and more but am not getting any output, so I'm asking here: if anyone has done this, please tell me.
Thanks
I assume your problem is detecting the object to scan.
Object-detection mechanisms like pattern matching or feature detection won't bring you the results you are looking for, since you don't know exactly what object you will be scanning.
Basically, you are searching for a rectangular object in the picture.
A basic approach could be as follows:
Run a Canny edge detector on the image. It could help to blur the image a bit before doing this. The edges of the object should be clearly visible.
Now do a Hough transform to find lines in the picture.
Search for lines at an angle of around 90 degrees to each other. The problem is finding the right ones: maybe it is enough to use the lines closest to the frame of the picture that are reasonably parallel to it.
Find the intersecting points to define the edges of your object.
At least this should give you a hint of where to research further; a rough sketch of the first two steps follows.
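A minimal, untested illustration in OpenCV for Java (image is assumed to be the input color Mat; the blur kernel and the Canny/Hough thresholds are starting points to tune, not known-good values):

Mat gray = new Mat(), edges = new Mat(), lines = new Mat();
Imgproc.cvtColor(image, gray, Imgproc.COLOR_BGR2GRAY);
Imgproc.GaussianBlur(gray, gray, new Size(5, 5), 0);   // blur a bit first
Imgproc.Canny(gray, edges, 50, 150);                   // edge map
// Each row of 'lines' holds (x1, y1, x2, y2) of one detected segment.
Imgproc.HoughLinesP(edges, lines, 1, Math.PI / 180, 80, 50, 10);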
As further steps in such an app, you will have to calculate the projection of the points and do an affine transform of the object.
I hope this helps.
After writing all this, I found this post. It should help you a lot.
As my answer targets OpenCV, you have to use the OpenCV library.
In order to do this, you need to install the Android Native Development Kit (NDK).
There are some good tutorials on how to use OpenCV on Android on the OpenCV for Android page.
One thing to keep in mind is that almost every function of the Java wrapper calls a native method, and that costs a lot of time. So you want to do as much as possible in your native code before returning the results to the Java part.
I know I am too late to answer, but it might be helpful to someone.
Try the following code.
@Override
protected void onDraw(Canvas canvas) {
    super.onDraw(canvas);
    path = new Path();
    path.moveTo(x1, y1); // set the start point
    path.lineTo(x2, y2);
    path.lineTo(x3, y3);
    path.lineTo(x4, y4);
    path.lineTo(x1, y1); // close the quad back at the start point
    canvas.drawPath(path, currentPaint);
}
Pass your image Mat into this method:
void findSquares(Mat image, List<MatOfPoint> squares) {
    int N = 10;
    squares.clear();
    Mat smallerImg = new Mat(new Size(image.width() / 2, image.height() / 2), image.type());
    Mat gray = new Mat(image.size(), image.type());
    Mat gray0 = new Mat(image.size(), CvType.CV_8U);
    // down-scale and upscale the image to filter out noise
    Imgproc.pyrDown(image, smallerImg, smallerImg.size());
    Imgproc.pyrUp(smallerImg, image, image.size());
    // find squares in every color plane of the image
    for (int c = 0; c < 3; c++) {
        Core.extractChannel(image, gray, c);
        // try several threshold levels
        for (int l = 1; l < N; l++) {
            Imgproc.threshold(gray, gray0, (l + 1) * 255 / N, 255, Imgproc.THRESH_BINARY);
            // find contours and store them all as a list
            List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
            Imgproc.findContours(gray0, contours, new Mat(), Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);
            // test each contour
            for (int i = 0; i < contours.size(); i++) {
                MatOfPoint2f contour2f = new MatOfPoint2f(contours.get(i).toArray());
                MatOfPoint2f approx2f = new MatOfPoint2f();
                Imgproc.approxPolyDP(contour2f, approx2f, Imgproc.arcLength(contour2f, true) * 0.02, true);
                MatOfPoint approx = new MatOfPoint(approx2f.toArray());
                // square contours should have 4 vertices after approximation,
                // a relatively large area (to filter out noisy contours),
                // and be convex.
                // Note: the absolute value of the area is used because the
                // area may be positive or negative, in accordance with the
                // contour orientation.
                double area = Imgproc.contourArea(approx);
                if (area > 5000) {
                    if (approx.toArray().length == 4 &&
                            Math.abs(area) > 1000 &&
                            Imgproc.isContourConvex(approx)) {
                        double maxCosine = 0;
                        for (int j = 2; j < 5; j++) {
                            // find the maximum cosine of the angle between joint edges;
                            // angle() is the usual helper from OpenCV's squares sample
                            double cosine = Math.abs(angle(approx.toArray()[j % 4], approx.toArray()[j - 2], approx.toArray()[j - 1]));
                            maxCosine = Math.max(maxCosine, cosine);
                        }
                        // if the cosines of all angles are small
                        // (all angles are ~90 degrees), write the quadrangle's
                        // vertices to the resulting sequence
                        if (maxCosine < 0.3)
                            squares.add(approx);
                    }
                }
            }
        }
    }
}
This method gives you the four corner points of the document; then you can crop the image using the method below:
public Bitmap warpDisplayImage(Mat inputMat) {
    int resultWidth = inputMat.width();
    int resultHeight = inputMat.height();
    // "points" is the List<Point> of the four document corners produced by the
    // previous method; orderRectCorners(...) is the poster's helper that sorts
    // them clockwise.
    Mat startM = Converters.vector_Point2f_to_Mat(orderRectCorners(points));
    Point ocvPOut3 = new Point(0, 0);
    Point ocvPOut4 = new Point(0, resultHeight);
    Point ocvPOut1 = new Point(resultWidth, resultHeight);
    Point ocvPOut2 = new Point(resultWidth, 0);
    Mat outputMat = new Mat(resultWidth, resultHeight, CvType.CV_8UC4);
    List<Point> dest = new ArrayList<Point>();
    dest.add(ocvPOut3);
    dest.add(ocvPOut2);
    dest.add(ocvPOut1);
    dest.add(ocvPOut4);
    Mat endM = Converters.vector_Point2f_to_Mat(dest);
    Mat perspectiveTransform = Imgproc.getPerspectiveTransform(startM, endM);
    Imgproc.warpPerspective(inputMat, outputMat, perspectiveTransform,
            new Size(resultWidth, resultHeight), Imgproc.INTER_CUBIC);
    Bitmap descBitmap = Bitmap.createBitmap(outputMat.cols(), outputMat.rows(), Bitmap.Config.ARGB_8888);
    Utils.matToBitmap(outputMat, descBitmap);
    return descBitmap;
}
I've been using Tesseract to convert documents into text. The quality of the documents varies wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that highly pixellated text - for example, that generated by fax machines - is especially difficult for Tesseract to process; presumably all those jagged edges on the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and have seen some small improvement, but I'm hoping there is a more specific technique that would yield better results - say, a filter tuned to black-and-white images that would smooth out irregular edges, followed by a filter that would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) - 300 DPI is the minimum
fix text size (e.g. 12 pt should be OK)
try to fix text lines (deskew and dewarp the text)
try to fix the illumination of the image (e.g. no dark parts)
binarize and de-noise the image
There is no universal command line that fits all cases (sometimes you need to blur and sharpen the image), but you can try TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
I am by no means an OCR expert, but this week I had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried Tesseract on it, and the program converted almost nothing.
I then went into GIMP and did the following:
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved it as a new jpg at 100% quality.
Tesseract was then able to extract all the text into a .txt file.
GIMP is your friend.
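(For anyone who needs the same pipeline programmatically, a rough, untested OpenCV-for-Java equivalent of those GIMP steps might look like this; src is assumed to be the loaded color Mat, and the unsharp-mask parameter mapping is only approximate:

Mat gray = new Mat(), big = new Mat(), blur = new Mat(), sharp = new Mat();
Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);                         // mode > grayscale
Imgproc.resize(gray, big, new Size(1191, 2000), 0, 0, Imgproc.INTER_CUBIC);  // scale image
Imgproc.GaussianBlur(big, blur, new Size(0, 0), 6.8);                        // radius ~ 6.8
Core.addWeighted(big, 1 + 2.69, blur, -2.69, 0, sharp);                      // unsharp mask, amount ~ 2.69
)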
As a rule of thumb, I usually apply the following image pre-processing techniques using the OpenCV library:
Rescaling the image (recommended if you're working with images that have a DPI of less than 300):

import cv2
import numpy as np

img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done using one of the following lines (each has its pros and cons; however, median blur and bilateral filtering usually perform better than Gaussian blur):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract; it should enable you to write your first OCR script and clear up some of the hurdles I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width (multiply the image height and width by 0.5, 1, and 2).
Convert the image to grayscale (black and white).
Remove the noise pixels to make the image clearer (filter the image).
Refer to the code below:
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This was somewhat long ago, but it still might be useful.
My experience shows that resizing the image in memory before passing it to Tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMELY HELPFUL to me along the way was the source code of the Capture2Text project:
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to its author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocessing for this utility.
If you run the binaries, you can check the image transformations before/after the process in the Capture2Text\Output\ folder.
P.S. The mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version of Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
int nRed, nGreen, nBlue; // int, not byte: Java's byte is signed, so values above 127 would wrap negative
int bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (int) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (int) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (int) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (int) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (int) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (int) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (int) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (int) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (int) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
int gray = (int) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c)); // int, not byte, to keep values 128-255 from going negative
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the German language)
Thus, it makes sense to first test how far you get with the new Tesseract LSTM mode before applying any custom image pre-processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagick is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagick also has the -lat feature for Linear-time Adaptive Thresholding, which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did the following to get good results out of images whose text is not very small:
Apply blur to the original image.
Apply adaptive thresholding.
Apply a sharpening effect.
And if you're still not getting good results, scale the image to 150% or 200%. A rough sketch of these steps follows.
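An untested sketch of those steps with OpenCV for Java (gray is assumed to be the 8-bit grayscale input; the kernel sizes, block size, and sharpening weights are guesses to tune per document):

Mat blurred = new Mat(), thresh = new Mat(), wide = new Mat(), sharp = new Mat(), scaled = new Mat();
Imgproc.GaussianBlur(gray, blurred, new Size(3, 3), 0);                      // 1) blur
Imgproc.adaptiveThreshold(blurred, thresh, 255,
        Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 31, 2);   // 2) adaptive threshold
Imgproc.GaussianBlur(thresh, wide, new Size(0, 0), 3);
Core.addWeighted(thresh, 1.5, wide, -0.5, 0, sharp);                         // 3) unsharp-mask sharpening
Imgproc.resize(sharp, scaled, new Size(), 2.0, 2.0, Imgproc.INTER_CUBIC);    // 4) scale to 200%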
Reading text from image documents with any OCR engine has many issues when it comes to getting good accuracy. There is no fixed solution for all cases, but here are a few things that should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations, such as noise removal, which can easily be done using a Gaussian filter or a normal median filter. These are also available in OpenCV.
2) Wrong orientation of the image: because of wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which gives the worst accuracy.
3) Presence of lines: while doing word or line segmentation, the OCR engine sometimes also tries to merge words and lines together, thus processing the wrong content and giving wrong results. There are other issues too, but these are the basic ones.
This OCR application post is an example case where some image pre-processing and post-processing of the OCR result can be applied to get better accuracy.
Text recognition depends on a variety of factors to produce a good-quality output. OCR output highly depends on the quality of the input image. This is why every OCR engine provides guidelines regarding the quality and size of the input image; these guidelines help the engine produce accurate results.
I have written a detailed article on image processing in Python. Kindly follow the link below for more explanation. I have also added the Python source code to implement the process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
You can do noise reduction and then apply thresholding; beyond that, you can play around with the OCR's configuration by changing the --psm and --oem values. Try:
--psm 5
--oem 2
You can also look at the following link for further details:
here
So far, I've played a lot with Tesseract 3.x, 4.x and 5.0.0.
Tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes I get better results with the legacy engine (using --oem 0), and sometimes with the LSTM engine (--oem 1).
Generally speaking, I get the best results on upscaled images with the LSTM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from GitHub, since most Linux distros only provide the fast versions.
The trained data that works for both the legacy and LSTM engines can be downloaded from https://github.com/tesseract-ocr/tessdata with a command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/osd.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I ended up using ImageMagick as my image preprocessor since it's convenient and can easily be scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use the TIFF format, since Tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
resample the image to 300 dpi
use some black magic to remove unwanted colors
try to rotate the page if rotation can be detected
antialias the image
sharpen the text
The resulting image can then be fed to Tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 --psm 1
Btw, some years ago I wrote the 'poor man's OCR server', which checks for changed files in a given directory and launches OCR operations on all files that haven't already been OCRed. pmocr is compatible with Tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.
I have some static images like the one below:
Now, what I want is: when I touch the face or a hand, the selected color should be filled over that skin portion.
See the result image below:
So how do I get a result like the one above?
Redo and undo functionality should also be there.
I have tried flood fill, but with it I am only able to color one particular portion, as flood fill only fills until a pixel of a different color is encountered; if the pixel color changes, it will not fill past that point.
So using flood fill I got a result like the image below: if I press on the hand, only that hand portion gets filled with color, while instead I want to color the other hand and the face as well.
So please help me with this case.
EDITED
After some replies I got a solution like this one.
But there is still a memory issue. It consumes lots of memory to draw the color. So please, can anyone help me with it?
You can keep a complete image that is colored the intended way, and when the user fills a certain region with a color, you replace all the regions specified by that color in the mask.
In layman's terms:
The user clicks on, say, the hand of the OUTLINE.
That click location is looked up in another image with perfectly color-coded regions; let's call it the MASK for this case. All the skin regions have the same color, the shirt areas another color, and so on.
Wherever the user clicks, the color selected by the user is applied to every pixel that has a similar color in the MASK, but instead of painting directly onto the MASK, you paint onto the pixels of the OUTLINE.
I hope this helps.
Feel free to comment if you want an example, and then I can update the answer with one, but I think you can get it from here.
EDIT:
Basically, start off with a simple image like this. This we can call the OUTLINE.
Then, as the developer, you have to do some work. Here, you color-code the OUTLINE; the result we call a MASK. To make it, color-code the regions with the colors you want. This can be done in Paint or whatever; I used Photoshop to be cool lol :D.
Then there is the ALGORITHM to get it working on the phone. Before you read the code, look at this variable:
int ANTIALIASING_TOLERANCE = 70; // larger = better coloring, reduced sensing
If you zoom in on the image, specifically at the black regions of the border, you can see that the computer sometimes blends the colors a little bit. To account for that, we use this tolerance value.
COLORINGANDROIDACTIVITY.JAVA
package mk.coloring;
import android.app.Activity;
import android.graphics.Bitmap;
import android.graphics.Bitmap.Config;
import android.graphics.BitmapFactory;
import android.graphics.Canvas;
import android.os.Bundle;
import android.view.MotionEvent;
import android.view.View;
import android.widget.ImageView;
import android.view.View.OnTouchListener;
public class ColoringAndroidActivity extends Activity implements OnTouchListener{
/** Called when the activity is first created. */
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
findViewById(R.id.imageView1).setOnTouchListener(this);
}
int ANTIALIASING_TOLERANCE = 70;
public boolean onTouch(View arg0, MotionEvent arg1) {
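// (Note: decoding the MASK and the original with BitmapFactory on every touch
// is expensive; caching them in fields would ease the memory pressure the
// question mentions.)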
Bitmap mask = BitmapFactory.decodeResource(getResources(), R.drawable.mask);
int selectedColor = mask.getPixel((int)arg1.getX(),(int)arg1.getY());
int sG = (selectedColor & 0x0000FF00) >> 8;
int sR = (selectedColor & 0x00FF0000) >> 16;
int sB = (selectedColor & 0x000000FF);
Bitmap original = BitmapFactory.decodeResource(getResources(), R.drawable.empty);
Bitmap colored = Bitmap.createBitmap(mask.getWidth(), mask.getHeight(), Config.ARGB_8888);
Canvas cv = new Canvas(colored);
cv.drawBitmap(original, 0,0, null);
for(int x = 0; x<mask.getWidth();x++){
for(int y = 0; y<mask.getHeight();y++){
int g = (mask.getPixel(x,y) & 0x0000FF00) >> 8;
int r = (mask.getPixel(x,y) & 0x00FF0000) >> 16;
int b = (mask.getPixel(x,y) & 0x000000FF);
if(Math.abs(sR - r) < ANTIALIASING_TOLERANCE && Math.abs(sG - g) < ANTIALIASING_TOLERANCE && Math.abs(sB - b) < ANTIALIASING_TOLERANCE)
colored.setPixel(x, y, (colored.getPixel(x, y) & 0xFF000000) | 0x00458414);
}
}
((ImageView)findViewById(R.id.imageView1)).setImageBitmap(colored);
return true;
}
}
This code doesn't give the user many color choices; instead, if the user touches a region, it looks at the MASK and paints the OUTLINE accordingly. But you can make it really interesting and interactive.
RESULT
When I touched the man's hair, it not only colored the hair, but colored his shirt and hand with the same color. Compare it with the MASK to get a good idea of what happened.
This is just a basic idea. I have created multiple Bitmaps, but there is no real need for that; I used them for testing purposes and they take up unnecessary memory. Also, you don't need to recreate the mask on every click, etc.
I hope this helps you :D
Good luck
Use a flood-fill algorithm. Fill the complete canvas, but keep the bounded fill area as it is, like a circle or rectangle. You can also check this link: Android: How to fill color to the specific part of the Image only?. The general idea: get the x and y coordinates on click.

final Point p1 = new Point();
p1.x = (int) x;
p1.y = (int) y; // x and y are the coordinates where the user clicks on the screen
final int sourceColor = mBitmap.getPixel((int) x, (int) y);
final int targetColor = mPaint.getColor();
new TheTask(mDrawingManager.mDrawingUtilities.mBitmap, p1, sourceColor, targetColor).execute();
// Use an AsyncTask and do the flood fill in doInBackground().

Check the above links for the flood-fill algorithm in Android. This should help you achieve what you want. Android FingerPaint Undo/Redo implementation should help you modify things according to your needs regarding undo and redo.
Edit:
A post on Stack Overflow led me to an efficient way of using the flood-fill algorithm without delays or OOM errors.
Picking from the SO post:
Filling a small closed area works fine with the above flood-fill algorithm. However, for large areas the algorithm is slow and consumes a lot of memory. Recently I came across a post which uses a queue-linear flood fill, which is way faster than the above.
Source :
http://www.codeproject.com/Articles/16405/Queue-Linear-Flood-Fill-A-Fast-Flood-Fill-Algorith
Code :
public class QueueLinearFloodFiller {
protected Bitmap image = null;
protected int[] tolerance = new int[] { 0, 0, 0 };
protected int width = 0;
protected int height = 0;
protected int[] pixels = null;
protected int fillColor = 0;
protected int[] startColor = new int[] { 0, 0, 0 };
protected boolean[] pixelsChecked;
protected Queue<FloodFillRange> ranges;
// Construct with copyImage(img): a copy of the image is made, and the flood
// fill writes into the copy (use getImage() to retrieve it).
// Construct with useImage(img): the flood fill writes directly to the
// provided Bitmap.
public QueueLinearFloodFiller(Bitmap img) {
copyImage(img);
}
public QueueLinearFloodFiller(Bitmap img, int targetColor, int newColor) {
useImage(img);
setFillColor(newColor);
setTargetColor(targetColor);
}
public void setTargetColor(int targetColor) {
startColor[0] = Color.red(targetColor);
startColor[1] = Color.green(targetColor);
startColor[2] = Color.blue(targetColor);
}
public int getFillColor() {
return fillColor;
}
public void setFillColor(int value) {
fillColor = value;
}
public int[] getTolerance() {
return tolerance;
}
public void setTolerance(int[] value) {
tolerance = value;
}
public void setTolerance(int value) {
tolerance = new int[] { value, value, value };
}
public Bitmap getImage() {
return image;
}
public void copyImage(Bitmap img) {
// Copy data from provided Image to a BufferedImage to write flood fill
// to, use getImage to retrieve
// cache data in member variables to decrease overhead of property calls
width = img.getWidth();
height = img.getHeight();
image = Bitmap.createBitmap(width, height, Bitmap.Config.RGB_565);
Canvas canvas = new Canvas(image);
canvas.drawBitmap(img, 0, 0, null);
pixels = new int[width * height];
image.getPixels(pixels, 0, width, 0, 0, width, height); // read the full bitmap so the index (width * y) + x lines up
}
public void useImage(Bitmap img) {
// Use a pre-existing provided BufferedImage and write directly to it
// cache data in member variables to decrease overhead of property calls
width = img.getWidth();
height = img.getHeight();
image = img;
pixels = new int[width * height];
image.getPixels(pixels, 0, width, 0, 0, width, height); // read the full bitmap so the index (width * y) + x lines up
}
protected void prepare() {
// Called before starting flood-fill
pixelsChecked = new boolean[pixels.length];
ranges = new LinkedList<FloodFillRange>();
}
// Fills the specified point on the bitmap with the currently selected fill
// color.
// int x, int y: The starting coords for the fill
public void floodFill(int x, int y) {
// Setup
prepare();
if (startColor[0] == 0) {
// ***Get starting color.
int startPixel = pixels[(width * y) + x];
startColor[0] = (startPixel >> 16) & 0xff;
startColor[1] = (startPixel >> 8) & 0xff;
startColor[2] = startPixel & 0xff;
}
// ***Do first call to floodfill.
LinearFill(x, y);
// ***Call floodfill routine while floodfill ranges still exist on the
// queue
FloodFillRange range;
while (ranges.size() > 0) {
// **Get Next Range Off the Queue
range = ranges.remove();
// **Check Above and Below Each Pixel in the Floodfill Range
int downPxIdx = (width * (range.Y + 1)) + range.startX;
int upPxIdx = (width * (range.Y - 1)) + range.startX;
int upY = range.Y - 1; // y coordinate of the row above
int downY = range.Y + 1;
for (int i = range.startX; i <= range.endX; i++) {
// *Start Fill Upwards
// if we're not above the top of the bitmap and the pixel above
// this one is within the color tolerance
if (range.Y > 0 && (!pixelsChecked[upPxIdx])
&& CheckPixel(upPxIdx))
LinearFill(i, upY);
// *Start Fill Downwards
// if we're not below the bottom of the bitmap and the pixel
// below this one is within the color tolerance
if (range.Y < (height - 1) && (!pixelsChecked[downPxIdx])
&& CheckPixel(downPxIdx))
LinearFill(i, downY);
downPxIdx++;
upPxIdx++;
}
}
image.setPixels(pixels, 0, width, 0, 0, width, height); // write the full bitmap back
}
// Finds the furthermost left and right boundaries of the fill area
// on a given y coordinate, starting from a given x coordinate, filling as
// it goes.
// Adds the resulting horizontal range to the queue of floodfill ranges,
// to be processed in the main loop.
// int x, int y: The starting coords
protected void LinearFill(int x, int y) {
// ***Find Left Edge of Color Area
int lFillLoc = x; // the location to check/fill on the left
int pxIdx = (width * y) + x;
while (true) {
// **fill with the color
pixels[pxIdx] = fillColor;
// **indicate that this pixel has already been checked and filled
pixelsChecked[pxIdx] = true;
// **de-increment
lFillLoc--; // de-increment counter
pxIdx--; // de-increment pixel index
// **exit loop if we're at edge of bitmap or color area
if (lFillLoc < 0 || (pixelsChecked[pxIdx]) || !CheckPixel(pxIdx)) {
break;
}
}
lFillLoc++;
// ***Find Right Edge of Color Area
int rFillLoc = x; // the location to check/fill on the right
pxIdx = (width * y) + x;
while (true) {
// **fill with the color
pixels[pxIdx] = fillColor;
// **indicate that this pixel has already been checked and filled
pixelsChecked[pxIdx] = true;
// **increment
rFillLoc++; // increment counter
pxIdx++; // increment pixel index
// **exit loop if we're at edge of bitmap or color area
if (rFillLoc >= width || pixelsChecked[pxIdx] || !CheckPixel(pxIdx)) {
break;
}
}
rFillLoc--;
// add range to queue
FloodFillRange r = new FloodFillRange(lFillLoc, rFillLoc, y);
ranges.offer(r);
}
// Sees if a pixel is within the color tolerance range.
protected boolean CheckPixel(int px) {
int red = (pixels[px] >>> 16) & 0xff;
int green = (pixels[px] >>> 8) & 0xff;
int blue = pixels[px] & 0xff;
return (red >= (startColor[0] - tolerance[0])
&& red <= (startColor[0] + tolerance[0])
&& green >= (startColor[1] - tolerance[1])
&& green <= (startColor[1] + tolerance[1])
&& blue >= (startColor[2] - tolerance[2]) && blue <= (startColor[2] + tolerance[2]));
}
// Represents a linear range to be filled and branched from.
protected class FloodFillRange {
public int startX;
public int endX;
public int Y;
public FloodFillRange(int startX, int endX, int y) {
this.startX = startX;
this.endX = endX;
this.Y = y;
}
}
}
One basic way would be something like the flood-fill algorithm.
The Wikipedia article describes the algorithm and its variations pretty well.
Here you can find an implementation on SO, but depending on your specific needs it would have to be modified.