I've got a properly generated array of x-y pairs, and some of those values are NaN.
Canvas.drawLines() on ICS, JB 4.1 and LP 5.1 just ignores those values and skips them, drawing the rest of the lines, but for some reason KitKat does not draw anything at all in that case. Is there any way to fix it?
Turning off antialiasing and hardware acceleration does not help.
Run a quick bit of code to remove all the NaN x-y pairs, or replace them with copies of the last valid pair, depending on how your code works; these issues can always be fixed fairly efficiently. If KitKat chokes on them, either set your app's minimum version above KitKat or play ball: the values are fed into the GPU, and there is no documentation on how it is required to handle NaN. It apparently confuses the tessellator, which on later releases only drops that one line segment but on KitKat balks at the whole path.
If you have x-y pairs with NaN, just consider those invalid. Loop over the array and remove them efficiently:
public int nanDrop(int index) {
    // Compacts pointlist (a packed [x0, y0, x1, y1, ...] float array of
    // logical length `count`) in place, dropping every pair that contains
    // NaN, and returns the new pair index for the given pair index.
    int arrayIndex = index << 1;   // pair index -> array offset
    int returnIndex = index;
    int validPosition = 0;
    float mx, my;
    for (int pos = 0, s = count; pos < s; pos += 2) {
        mx = pointlist[pos];
        my = pointlist[pos + 1];
        if (pos == arrayIndex) returnIndex = validPosition >> 1;
        if (!Float.isNaN(mx) && !Float.isNaN(my)) { // keep only fully valid pairs
            pointlist[validPosition] = mx;
            pointlist[validPosition + 1] = my;
            validPosition += 2;
        }
    }
    count = validPosition;   // shrink the logical length
    return returnIndex;
}
From the man page for XFillPolygon:
If shape is Complex, the path may self-intersect. Note that contiguous coincident points in the path are not treated as self-intersection.
If shape is Convex, for every pair of points inside the polygon, the line segment connecting them does not intersect the path. If known by the client, specifying Convex can improve performance. If you specify Convex for a path that is not convex, the graphics results are undefined.
If shape is Nonconvex, the path does not self-intersect, but the shape is not wholly convex. If known by the client, specifying Nonconvex instead of Complex may improve performance. If you specify Nonconvex for a self-intersecting path, the graphics results are undefined.
I am having performance problems with XFillPolygon and, as the man page suggests, the first step I want to take is to specify the correct shape of the polygon. I am currently using Complex to be on the safe side.
Is there an efficient algorithm to determine if a polygon (defined by a series of coordinates) is convex, non-convex or complex?
You can make things a lot easier than the Gift-Wrapping Algorithm... that's a good answer when you have a set of points without any particular boundary and need to find the convex hull.
In contrast, consider the case where the polygon is not self-intersecting, and it consists of a set of points in a list where the consecutive points form the boundary. In this case it is much easier to figure out whether a polygon is convex or not (and you don't have to calculate any angles, either):
For each consecutive pair of edges of the polygon (each triplet of points), compute the z-component of the cross product of the vectors defined by those edges, with the vectors pointing toward the points in increasing order:
given p[k], p[k+1], p[k+2] each with coordinates x, y:
dx1 = x[k+1]-x[k]
dy1 = y[k+1]-y[k]
dx2 = x[k+2]-x[k+1]
dy2 = y[k+2]-y[k+1]
zcrossproduct = dx1*dy2 - dy1*dx2
The polygon is convex if the z-components of the cross products are either all positive or all negative. Otherwise the polygon is nonconvex.
If there are N points, make sure you calculate N cross products, e.g. be sure to use the triplets (p[N-2],p[N-1],p[0]) and (p[N-1],p[0],p[1]).
If the polygon is self-intersecting, then it fails the technical definition of convexity even if its directed angles are all in the same direction, in which case the above approach would not produce the correct result.
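For illustration, here is a minimal Python sketch of this test (an assumption of mine, not code from the original answer), taking polygon as an ordered list of (x, y) tuples and computing all N wrap-around cross products:

def is_convex(polygon):
    # Sign-of-cross-product test; assumes an ordered, non-self-intersecting polygon.
    n = len(polygon)
    if n < 3:
        return False
    signs = set()
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        x3, y3 = polygon[(k + 2) % n]
        z = (x2 - x1) * (y3 - y2) - (y2 - y1) * (x3 - x2)
        if z > 0:
            signs.add(1)
        elif z < 0:
            signs.add(-1)
    return len(signs) <= 1  # all positive or all negative (collinear edges allowed)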
This question is now the first item in either Bing or Google when you search for "determine convex polygon." However, none of the answers are good enough.
The (now deleted) answer by @EugeneYokota works by checking whether an unordered set of points can be made into a convex polygon, but that's not what the OP asked for. He asked for a method to check whether a given polygon is convex or not. (A "polygon" in computer science is usually defined [as in the XFillPolygon documentation] as an ordered array of 2D points, with consecutive points joined with a side as well as the last point to the first.) Also, the gift-wrapping algorithm in this case would have the time-complexity of O(n^2) for n points - which is much larger than actually needed to solve this problem, while the question asks for an efficient algorithm.
@JasonS's answer, along with the other answers that follow his idea, accepts star polygons such as a pentagram or the one in @zenna's comment, but star polygons are not considered to be convex. As @plasmacel notes in a comment, this is a good approach to use if you have prior knowledge that the polygon is not self-intersecting, but it can fail if you do not have that knowledge.
@Sekhat's answer is correct but it also has the time-complexity of O(n^2) and thus is inefficient.
@LorenPechtel's added answer after her edit is the best one here but it is vague.
A correct algorithm with optimal complexity
The algorithm I present here has the time-complexity of O(n), correctly tests whether a polygon is convex or not, and passes all the tests I have thrown at it.
The idea is to traverse the sides of the polygon, noting the direction of each side and the signed change of direction between consecutive sides. "Signed" here means left-ward is positive and right-ward is negative (or the reverse), and straight-ahead is zero. Those angles are normalized to be between minus-pi (exclusive) and pi (inclusive).
Summing all these direction-change angles (a.k.a. the deflection angles) together will result in plus-or-minus one turn (i.e. 360 degrees) for a convex polygon, while a star-like polygon (or a self-intersecting loop) will have a different sum (n * 360 degrees, for n turns overall, for polygons where all the deflection angles are of the same sign). So we must check that the sum of the direction-change angles is plus-or-minus one turn. We also check that the direction-change angles are all positive or all negative and not reversals (pi radians), that all points are actual 2D points, and that no consecutive vertices are identical. (That last point is debatable: you may want to allow repeated vertices, but I prefer to prohibit them.) The combination of those checks catches all convex and non-convex polygons.
Here is code for Python 3 that implements the algorithm and includes some minor efficiencies. The code looks longer than it really is due to the comment lines and the bookkeeping involved in avoiding repeated point accesses.
from math import atan2, pi

TWO_PI = 2 * pi

def is_convex_polygon(polygon):
    """Return True if the polygon defined by the sequence of 2D
    points is 'strictly convex': points are valid, side lengths non-
    zero, interior angles are strictly between zero and a straight
    angle, and the polygon does not intersect itself.

    NOTES:  1.  Algorithm: the signed changes of the direction angles
                from one side to the next side must be all positive or
                all negative, and their sum must equal plus-or-minus
                one full turn (2 pi radians). Also check for too few,
                invalid, or repeated points.
            2.  No check is explicitly done for zero internal angles
                (180 degree direction-change angle) as this is covered
                in other ways, including the `n < 3` check.
    """
    try:  # needed for any bad points or direction changes
        # Check for too few points
        if len(polygon) < 3:
            return False
        # Get starting information
        old_x, old_y = polygon[-2]
        new_x, new_y = polygon[-1]
        new_direction = atan2(new_y - old_y, new_x - old_x)
        angle_sum = 0.0
        # Check each point (the side ending there, its angle) and accum. angles
        for ndx, newpoint in enumerate(polygon):
            # Update point coordinates and side directions, check side length
            old_x, old_y, old_direction = new_x, new_y, new_direction
            new_x, new_y = newpoint
            new_direction = atan2(new_y - old_y, new_x - old_x)
            if old_x == new_x and old_y == new_y:
                return False  # repeated consecutive points
            # Calculate & check the normalized direction-change angle
            angle = new_direction - old_direction
            if angle <= -pi:
                angle += TWO_PI  # make it in half-open interval (-Pi, Pi]
            elif angle > pi:
                angle -= TWO_PI
            if ndx == 0:  # if first time through loop, initialize orientation
                if angle == 0.0:
                    return False
                orientation = 1.0 if angle > 0.0 else -1.0
            else:  # if other time through loop, check orientation is stable
                if orientation * angle <= 0.0:  # not both pos. or both neg.
                    return False
            # Accumulate the direction-change angle
            angle_sum += angle
        # Check that the total number of full turns is plus-or-minus 1
        return abs(round(angle_sum / TWO_PI)) == 1
    except (ArithmeticError, TypeError, ValueError):
        return False  # any exception means not a proper convex polygon
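A quick usage sketch (the test shapes are mine, not from the original answer):

assert is_convex_polygon([(0, 0), (1, 0), (1, 1), (0, 1)])              # square: convex
assert not is_convex_polygon([(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)])  # dented polygon: not convex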
The following Java method is an implementation of the cross-product algorithm described above.
public boolean isConvex()
{
    if (_vertices.size() < 4)
        return true;

    boolean sign = false;
    int n = _vertices.size();
    for (int i = 0; i < n; i++)
    {
        double dx1 = _vertices.get((i + 2) % n).X - _vertices.get((i + 1) % n).X;
        double dy1 = _vertices.get((i + 2) % n).Y - _vertices.get((i + 1) % n).Y;
        double dx2 = _vertices.get(i).X - _vertices.get((i + 1) % n).X;
        double dy2 = _vertices.get(i).Y - _vertices.get((i + 1) % n).Y;
        double zcrossproduct = dx1 * dy2 - dy1 * dx2;
        if (i == 0)
            sign = zcrossproduct > 0;
        else if (sign != (zcrossproduct > 0))
            return false;
    }
    return true;
}
The algorithm is guaranteed to work as long as the vertices are ordered (either clockwise or counter-clockwise), and you don't have self-intersecting edges (i.e. it only works for simple polygons).
Here's a test to check if a polygon is convex.
Consider each set of three points along the polygon--a vertex, the vertex before, the vertex after. If every angle is 180 degrees or less you have a convex polygon. When you figure out each angle, also keep a running total of (180 - angle). For a convex polygon, this will total 360.
This test runs in O(n) time.
Note, also, that in most cases this calculation is something you can do once and save — most of the time you have a set of polygons to work with that don't go changing all the time.
To test whether a polygon is convex, every vertex of the polygon should lie on or behind the line through each edge (taken with a consistent orientation). The original answer illustrated this with an example picture.
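A minimal Python sketch of that test (my own illustration; O(n^2), assuming an ordered list of (x, y) tuples):

def is_convex_half_plane(points):
    # Every vertex must lie on the same side of (or on) the line through each edge.
    n = len(points)
    if n < 3:
        return False
    for i in range(n):
        ax, ay = points[i]
        bx, by = points[(i + 1) % n]
        signs = set()
        for px, py in points:
            cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
            if cross > 0:
                signs.add(1)
            elif cross < 0:
                signs.add(-1)
        if len(signs) > 1:  # vertices on both sides of the edge line
            return False
    return True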
The answer by @RoryDaulton seems the best to me, but what if one of the angles is exactly 0?
Some may want such an edge case to return True, in which case, change "<=" to "<" in the line:
if orientation * angle < 0.0:  # not both pos. or both neg.
Here are my test cases which highlight the issue:
# A square
assert is_convex_polygon( ((0,0), (1,0), (1,1), (0,1)) )
# This LOOKS like a square, but it has an extra point on one of the edges.
assert is_convex_polygon( ((0,0), (0.5,0), (1,0), (1,1), (0,1)) )
The 2nd assert fails in the original answer. Should it?
For my use case, I would prefer it didn't.
This method works on simple polygons (no self-intersecting edges), assuming that the vertices are ordered (either clockwise or counterclockwise).
For an array of vertices:
vertices = [(0,0),(1,0),(1,1),(0,1)]
The following Python implementation checks whether the z-components of all the cross products have the same sign:
def zCrossProduct(a, b, c):
    return (a[0]-b[0])*(b[1]-c[1]) - (a[1]-b[1])*(b[0]-c[0])

def isConvex(vertices):
    if len(vertices) < 4:
        return True
    # Close the loop so the two wrap-around triplets are included,
    # giving all N cross products as noted earlier.
    extended = vertices + vertices[:2]
    signs = [zCrossProduct(a, b, c) > 0
             for a, b, c in zip(extended[2:], extended[1:], extended)]
    return all(signs) or not any(signs)
I implemented both algorithms in Java: the one posted by @UriGoren (with a small improvement: only integer math) and the one from @RoryDaulton. I had some problems because my polygon is closed, so both algorithms were considering the second polygon concave when it was convex. So I changed them to prevent that situation. My methods also use a base index (which may or may not be 0).
These are my test vertices:
// concave
int []x = {0,100,200,200,100,0,0};
int []y = {50,0,50,200,50,200,50};
// convex
int []x = {0,100,200,100,0,0};
int []y = {50,0,50,200,200,50};
And now the algorithms:
private boolean isConvex1(int[] x, int[] y, int base, int n) // Rory Daulton
{
    final double TWO_PI = 2 * Math.PI;

    // points is 'strictly convex': points are valid, side lengths non-zero,
    // interior angles are strictly between zero and a straight angle, and
    // the polygon does not intersect itself.
    // NOTES: 1. Algorithm: the signed changes of the direction angles from
    //           one side to the next side must be all positive or all
    //           negative, and their sum must equal plus-or-minus one full
    //           turn (2 pi radians). Also check for too few, invalid, or
    //           repeated points.
    //        2. No check is explicitly done for zero internal angles (180
    //           degree direction-change angle) as this is covered in other
    //           ways, including the `n < 3` check.

    // Check for too few points
    if (n <= 3) return true;
    if (x[base] == x[n-1] && y[base] == y[n-1]) // if it's a closed polygon, ignore the last vertex
        n--;

    // Get starting information
    int old_x = x[n-2], old_y = y[n-2];
    int new_x = x[n-1], new_y = y[n-1];
    double new_direction = Math.atan2(new_y - old_y, new_x - old_x), old_direction;
    double angle_sum = 0.0, orientation = 0;

    // Check each point (the side ending there, its angle) and accumulate angles
    for (int i = 0; i < n; i++)
    {
        // Update point coordinates and side directions, check side length
        old_x = new_x; old_y = new_y; old_direction = new_direction;
        int p = base++;
        new_x = x[p]; new_y = y[p];
        new_direction = Math.atan2(new_y - old_y, new_x - old_x);
        if (old_x == new_x && old_y == new_y)
            return false; // repeated consecutive points

        // Calculate & check the normalized direction-change angle
        double angle = new_direction - old_direction;
        if (angle <= -Math.PI)
            angle += TWO_PI; // make it in half-open interval (-Pi, Pi]
        else if (angle > Math.PI)
            angle -= TWO_PI;
        if (i == 0) // if first time through loop, initialize orientation
        {
            if (angle == 0.0) return false;
            orientation = angle > 0 ? 1 : -1;
        }
        else // if other time through loop, check orientation is stable
            if (orientation * angle <= 0) // not both pos. or both neg.
                return false;

        // Accumulate the direction-change angle
        angle_sum += angle;
    }
    // Check that the total number of full turns is plus-or-minus 1
    return Math.abs(Math.round(angle_sum / TWO_PI)) == 1;
}
And now from Uri Goren
private boolean isConvex2(int[] x, int[] y, int base, int n)
{
    if (n < 4)
        return true;

    boolean sign = false;
    if (x[base] == x[n-1] && y[base] == y[n-1]) // if it's a closed polygon, ignore the last vertex
        n--;
    for (int p = 0; p < n; p++)
    {
        int i = base++;
        int i1 = i + 1; if (i1 >= n) i1 = base + i1 - n;
        int i2 = i + 2; if (i2 >= n) i2 = base + i2 - n;
        int dx1 = x[i1] - x[i];
        int dy1 = y[i1] - y[i];
        int dx2 = x[i2] - x[i1];
        int dy2 = y[i2] - y[i1];
        int crossproduct = dx1 * dy2 - dy1 * dx2;
        if (p == 0) // first iteration: initialize the reference sign
            sign = crossproduct > 0;
        else if (sign != (crossproduct > 0))
            return false;
    }
    return true;
}
For a non-complex (non-self-intersecting) polygon to be convex, the vector frames obtained from any two connected, linearly independent lines a, b must be point-convex; otherwise the polygon is concave.
For example, in the original answer's figures, the lines a, b are convex to the point p when p lies inside a, b, and concave to it when p lies outside a, b.
Similarly, for each polygon in those figures, if each line pair making up a sharp edge is point-convex to the centroid c, then the polygon is convex; otherwise it is concave. Blunt edges (marked green in the figures) are to be ignored.
N.B. This approach requires you to compute the centroid of your polygon beforehand, since it employs vector algebra/transformations rather than angles.
Adapted Uri's code into MATLAB. Hope this may help.
Be aware that Uri's algorithm only works for simple polygons! So be sure to test whether the polygon is simple first!
% M = [ x1 x2 x3 ...
%       y1 y2 y3 ... ]
% test if a polygon is convex
function ret = isConvex(M)
    N = size(M, 2);
    if (N < 4)
        ret = 1;
        return;
    end

    % vertex coordinates, shifted by 0, 1 and 2 positions (wrapping around)
    x0 = M(1, 1:end);
    x1 = [x0(2:end), x0(1)];
    x2 = [x0(3:end), x0(1:2)];
    y0 = M(2, 1:end);
    y1 = [y0(2:end), y0(1)];
    y2 = [y0(3:end), y0(1:2)];

    dx1 = x2 - x1;
    dy1 = y2 - y1;
    dx2 = x0 - x1;
    dy2 = y0 - y1;
    zcrossproduct = dx1 .* dy2 - dy1 .* dx2;

    % equality allows two consecutive edges to be parallel
    t1 = sum(zcrossproduct >= 0);
    t2 = sum(zcrossproduct <= 0);
    ret = t1 == N || t2 == N;
end
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix the DPI (if needed); 300 DPI is the minimum
fix the text size (e.g. 12 pt should be OK)
try to fix the text lines (deskew and dewarp the text; see the sketch below)
try to fix the illumination of the image (e.g. no dark parts)
binarize and de-noise the image
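For the deskew step specifically, here is a minimal OpenCV sketch (my own illustration, not from the original answer; note that the angle convention of cv2.minAreaRect differs between OpenCV versions, so treat it as a starting point):

import cv2
import numpy as np

def deskew(img):
    # img: binarized grayscale image, dark text on light background
    coords = np.column_stack(np.where(img < 128)).astype(np.float32)  # "ink" pixels
    angle = cv2.minAreaRect(coords)[-1]
    # Map the rectangle angle to a small skew correction; verify the
    # sign convention on your OpenCV version.
    if angle > 45:
        angle -= 90
    h, w = img.shape
    m = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)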
There is no universal command line that fits all cases (sometimes you need to blur and sharpen the image), but you can give TEXTCLEANER from Fred's ImageMagick Scripts a try.
If you are not a fan of the command line, maybe you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
I am by no means an OCR expert, but this week I needed to convert text out of a JPG.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract was then able to extract all the text into a .txt file.
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (recommended if you're working with images that have a DPI of less than 300):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting the image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done using one of the following lines (each of which has its pros and cons; however, median blur and bilateral filter usually perform better than Gaussian blur):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
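Putting the steps above together, a minimal end-to-end sketch (the file name and the final pytesseract call are my assumptions, not part of the recipe above):

import cv2
import numpy as np
import pytesseract

img = cv2.imread("document.jpg")  # hypothetical input file
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
img = cv2.threshold(cv2.medianBlur(img, 3), 0, 255,
                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
text = pytesseract.image_to_string(img)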
I've recently written a pretty simple guide to Tesseract; it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width (multiply the image height and width by 0.5, 1 and 2).
Convert the image to grayscale (black and white).
Remove the noise pixels to make it clearer (filter the image).
Refer to the code below:
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
    Bitmap temp = (Bitmap)bmp;
    Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);

    double nWidthFactor = (double)temp.Width / (double)newWidth;
    double nHeightFactor = (double)temp.Height / (double)newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    Color color1 = new Color();
    Color color2 = new Color();
    Color color3 = new Color();
    Color color4 = new Color();
    byte nRed, nGreen, nBlue;
    byte bp1, bp2;

    for (int x = 0; x < bmap.Width; ++x)
    {
        for (int y = 0; y < bmap.Height; ++y)
        {
            fr_x = (int)Math.Floor(x * nWidthFactor);
            fr_y = (int)Math.Floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= temp.Width) cx = fr_x;
            cy = fr_y + 1;
            if (cy >= temp.Height) cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = temp.GetPixel(fr_x, fr_y);
            color2 = temp.GetPixel(cx, fr_y);
            color3 = temp.GetPixel(fr_x, cy);
            color4 = temp.GetPixel(cx, cy);

            // Blue
            bp1 = (byte)(nx * color1.B + fx * color2.B);
            bp2 = (byte)(nx * color3.B + fx * color4.B);
            nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Green
            bp1 = (byte)(nx * color1.G + fx * color2.G);
            bp2 = (byte)(nx * color3.G + fx * color4.G);
            nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Red
            bp1 = (byte)(nx * color1.R + fx * color2.R);
            bp2 = (byte)(nx * color3.R + fx * color4.R);
            nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            bmap.SetPixel(x, y, System.Drawing.Color.FromArgb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = SetGrayscale(bmap);
    bmap = RemoveNoise(bmap);
    return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
    Bitmap temp = (Bitmap)img;
    Bitmap bmap = (Bitmap)temp.Clone();
    Color c;
    for (int i = 0; i < bmap.Width; i++)
    {
        for (int j = 0; j < bmap.Height; j++)
        {
            c = bmap.GetPixel(i, j);
            // Standard luma weights for RGB-to-gray conversion
            byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
            bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
        }
    }
    return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
    // Simple global threshold: push dark pixels to black and light pixels
    // to white (pixels at exactly the 162 cutoff are left unchanged).
    for (var x = 0; x < bmap.Width; x++)
    {
        for (var y = 0; y < bmap.Height; y++)
        {
            var pixel = bmap.GetPixel(x, y);
            if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
                bmap.SetPixel(x, y, Color.Black);
            else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
                bmap.SetPixel(x, y, Color.White);
        }
    }
    return bmap;
}
(The original answer included input and output example images.)
This was a while ago, but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMELY HELPFUL to me along the way was the source code of the Capture2Text project:
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to its author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of the image preprocessing for this utility.
If you run the binaries, you can check the image transformation before/after the process in the Capture2Text\Output\ folder.
P.S. The mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
    Bitmap bmap = img.copy(img.getConfig(), true);

    double nWidthFactor = (double) img.getWidth() / (double) newWidth;
    double nHeightFactor = (double) img.getHeight() / (double) newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    int color1;
    int color2;
    int color3;
    int color4;
    byte nRed, nGreen, nBlue;
    byte bp1, bp2;

    for (int x = 0; x < bmap.getWidth(); ++x) {
        for (int y = 0; y < bmap.getHeight(); ++y) {
            fr_x = (int) Math.floor(x * nWidthFactor);
            fr_y = (int) Math.floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= img.getWidth())
                cx = fr_x;
            cy = fr_y + 1;
            if (cy >= img.getHeight())
                cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = img.getPixel(fr_x, fr_y);
            color2 = img.getPixel(cx, fr_y);
            color3 = img.getPixel(fr_x, cy);
            color4 = img.getPixel(cx, cy);

            // Blue
            bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
            bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
            nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Green
            bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
            bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
            nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Red
            bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
            bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
            nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = setGrayscale(bmap);
    bmap = removeNoise(bmap);
    return bmap;
}

// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
    Bitmap bmap = img.copy(img.getConfig(), true);
    int c;
    for (int i = 0; i < bmap.getWidth(); i++) {
        for (int j = 0; j < bmap.getHeight(); j++) {
            c = bmap.getPixel(i, j);
            byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
                    + .114 * Color.blue(c));
            bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
        }
    }
    return bmap;
}

// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
                bmap.setPixel(x, y, Color.BLACK);
            }
        }
    }
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
                bmap.setPixel(x, y, Color.WHITE);
            }
        }
    }
    return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the German language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these steps to get good results out of an image whose text is not very small (a rough sketch follows the list):
Apply blur to the original image.
Apply an adaptive threshold.
Apply a sharpening effect.
And if you're still not getting good results, scale the image to 150% or 200%.
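A rough OpenCV sketch of those steps (my own illustration; the file name and parameter values are assumptions to tune for your images):

import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)      # hypothetical input file
img = cv2.GaussianBlur(img, (3, 3), 0)                  # 1. blur
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 31, 2)   # 2. adaptive threshold
kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
img = cv2.filter2D(img, -1, kernel)                     # 3. sharpen
img = cv2.resize(img, None, fx=2, fy=2,
                 interpolation=cv2.INTER_CUBIC)         # 4. scale to 200%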
Reading text from image documents with any OCR engine has many issues that limit accuracy. There is no fixed solution for all cases, but here are a few things to consider for improving OCR results.
1) Presence of noise due to poor image quality / unwanted elements or blobs in the background region. This requires some pre-processing, like noise removal, which can easily be done using a Gaussian filter or a normal median filter. These are also available in OpenCV.
2) Wrong orientation of the image: because of wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which gives the worst accuracy.
3) Presence of lines: while doing word or line segmentation, the OCR engine sometimes tries to merge words and lines together, processing the wrong content and giving wrong results. There are other issues too, but these are the basic ones.
This post (an OCR application) is an example case where some image pre-processing and post-processing of the OCR result can be applied to get better accuracy.
Text recognition depends on a variety of factors to produce good-quality output. OCR output depends highly on the quality of the input image. This is why every OCR engine provides guidelines regarding the quality and size of the input image; these guidelines help the OCR engine produce accurate results.
I have written a detailed article on image processing in Python. Kindly follow the link below for more explanation; it also includes the Python source code implementing those processes.
Please write a comment if you have a suggestion or a better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
You can do noise reduction and then apply thresholding. Beyond that, you can play around with the OCR configuration by changing the --psm and --oem values.
try:
--psm 5
--oem 2
You can also look at the following link for further details:
here
So far, I've played a lot with Tesseract 3.x, 4.x and 5.0.0.
Tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes I get better results with the legacy engine (using --oem 0) and sometimes I get better results with the LSTM engine (--oem 1).
Generally speaking, I get the best results on upscaled images with the LSTM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from GitHub, since most Linux distros only provide the fast versions.
The trained data that works for both the legacy and LSTM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with commands like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/osd.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can then be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server', which checks for changed files in a given directory and launches OCR operations on all files not already OCRed. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.
I have been working on an Android project for a while that displays the fundamental frequency of an input signal (to act as a tuner). I have successfully implemented the AudioRecord class and am getting data from it. However, I am having a hard time performing an FFT on this data to get the fundamental frequency of the input signal. I have been looking at the post here, and am using FFT in Java and the Complex class to go with it.
I have successfully used the FFT function found in FFT in Java, but I am not sure if I am obtaining the correct results. For the magnitude of the FFT (sqrt[re*re + im*im]) I am getting values that start high, around 15000 Hz, and then slowly diminish to about 300 Hz. Doesn't seem right.
Also, as far as the raw data from the mic goes, the data seems fine, except that the first 50 values or so are always the number 3, unless I hit the tuning button again while still in the application and then I only get about 15. Is that normal?
Here is a bit of my code.
First of all, I convert the short data (obtained from the microphone) to a double using the following code which is from the post I have been looking at. This snippet of code I do not completely understand, but I think it works.
// Conversion from short to double
double[] micBufferData = new double[bufferSizeInBytes]; // size may need to change
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 1.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bufferSizeInBytes - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
    double sample = 0;
    for (int b = 0; b < bytesPerSample; b++) {
        int v = audioData[index + b];
        if (b < bytesPerSample - 1 || bytesPerSample == 1) {
            v &= 0xFF;
        }
        sample += v << (b * 8);
    }
    double sample32 = amplification * (sample / 32768.0);
    micBufferData[floatIndex] = sample32;
}
The code then continues as follows:
// Create Complex array for use in FFT
Complex[] fftTempArray = new Complex[bufferSizeInBytes];
for (int i = 0; i < bufferSizeInBytes; i++)
{
    fftTempArray[i] = new Complex(micBufferData[i], 0);
}

// Obtain array of FFT data
final Complex[] fftArray = FFT.fft(fftTempArray);
final Complex[] fftInverse = FFT.ifft(fftTempArray);

// Create an array of magnitude of fftArray
double[] magnitude = new double[fftArray.length];
for (int i = 0; i < fftArray.length; i++) {
    magnitude[i] = fftArray[i].abs();
}

fft.setTextColor(Color.GREEN);
fft.setText("fftArray is " + fftArray[500] + " and fftTempArray is " + fftTempArray[500]
        + " and fftInverse is " + fftInverse[500] + " and audioData is " + audioData[500]
        + " and magnitude is " + magnitude[1] + ", " + magnitude[500] + ", " + magnitude[1000]
        + " Good job!");
for (int i = 2; i < samples; i++) {
    fft.append(" " + magnitude[i] + " Hz");
}
That last bit is just to check what values I am getting (and to keep me sane!). In the post referred to above, it talks about needing the sampling frequency and gives this code:
private double ComputeFrequency(int arrayIndex) {
    return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
How do I implement this code? I don't really understand where fftOutWindowSize and arrayIndex come from.
Any help is greatly appreciated!
Dustin
Recently I worked on a project which required almost the same thing. Probably you don't need any help anymore, but I will give my thoughts anyway; maybe someone will need this in the future.
I'm not sure whether the short-to-double function works; I don't understand that snippet of code either. It was written for byte-to-double conversion.
In the line "double[] micBufferData = new double[bufferSizeInBytes];", I think the size of micBufferData should be "bufferSizeInBytes / 2", since every sample takes two bytes and the size of micBufferData should be the sample count.
FFT algorithms do require an FFT window size, and it has to be a power of 2. However, many implementations can receive an arbitrary number of samples as input and do the rest themselves; the documentation of those algorithms should state the input requirements. In your case, the size of the Complex array can be the input of the FFT algorithm. I don't really know the details of your FFT algorithm, but I think the inverse one is not needed.
To use the code you gave at the end, you should first find the peak index in the FFT output array. I used a double array as input instead of Complex, so in my case it is something like:

double maxVal = -1;
int maxIndex = -1;
for (int j = 0; j < mFftSize / 2; ++j) {
    // squared magnitude of bin j (real and imaginary parts are interleaved)
    double v = fftResult[2*j] * fftResult[2*j] + fftResult[2*j+1] * fftResult[2*j+1];
    if (v > maxVal) {
        maxVal = v;
        maxIndex = j;
    }
}
2*j is the real part and 2*j+1 is the imaginary part. maxIndex is the index of the peak magnitude you want (more detail here); use it as input to the ComputeFrequency function. The return value is the frequency of your sample array.
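If it helps, here is a minimal NumPy sketch of the same peak-pick and bin-to-frequency conversion (my own illustration; the 440 Hz test tone and window size are assumptions):

import numpy as np

sample_rate = 44100
n = 4096                                  # FFT window size (a power of two)
t = np.arange(n) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)    # 440 Hz test tone

spectrum = np.fft.rfft(signal * np.hanning(n))
peak_index = int(np.argmax(np.abs(spectrum)))
frequency = peak_index * sample_rate / n  # same formula as ComputeFrequency
print(frequency)                          # ~441 Hz; bin resolution is sample_rate / n

Note that the bin resolution here is about 10.8 Hz, which is why tuners usually refine the raw peak (interpolation, zero-padding, autocorrelation, etc.).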
Hopefully it can help someone.
You should pick an FFT window size depending on your time versus frequency resolution requirements, and not just use the audio buffer size when creating your FFT temp array.
The array index is your int i, as used in your magnitude[i] print statement.
The fundamental pitch frequency for music is often different from FFT peak magnitude, so you may want to research some pitch estimation algorithms.
I suspect that the strange results you're getting are because you might need to unpack the FFT. How this is done will depend on the library that you're using (see here for docs on how it's packed in GSL, for example). The packing may mean that the real and imaginary components are not in the positions in the array that you expect.
For your other questions about window size and resolution, if you're creating a tuner then I'd suggest trying a window size of about 20ms (eg 1024 samples at 44.1kHz). For a tuner you need quite high resolution, so you could try zero-padding by a factor of 8 or 16 which will give you a resolution of 3-6Hz.
I am looking for a simple image sharpening algorithm to use in my Android application. I have a grayscale image captured from video (mostly used for text) and I would like to sharpen it because my phone does not have auto focus, and the close object distance blurs the text. I don't have any background in image processing. But as a user, I am familiar with unsharp masking and other sharpening tools available in Gimp, Photoshop, etc. I didn't see any support for image processing in the Android API, and hence am looking for a method to implement myself. Thanks.
This is a simple image sharpening algorithm.
Pass this function the width, height and byte array of your grayscale image, and it will sharpen the image in that array in place.
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

void sharpen(int width, int height, byte* yuv) {
    // Work on an unsigned copy so pixel values above 127 don't go
    // negative in the arithmetic below.
    unsigned char *mas = (unsigned char *) malloc(width * height);
    memcpy(mas, yuv, width * height);
    signed int res;
    int ywidth;
    for (int y = 1; y < height - 1; y++) {
        ywidth = y * width;
        for (int x = 1; x < width - 1; x++) {
            // 3x3 sharpening kernel: 5 * center minus the four
            // horizontal/vertical neighbours.
            res = (
                mas[x + ywidth] * 5
                - mas[x - 1 + ywidth]
                - mas[x + 1 + ywidth]
                - mas[x + (ywidth + width)]
                - mas[x + (ywidth - width)]
            );
            if (res > 255) {
                res = 255;
            }
            if (res < 0) {
                res = 0;
            }
            yuv[x + ywidth] = res;
        }
    }
    free(mas);
}
If you have access to pixel information, your most basic option would be a sharpening convolution kernel. Take a look at the following sites, where you can learn more about sharpening kernels and how to apply them:
link1
link2
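As an illustration, here is a minimal Python/SciPy sketch of applying such a kernel (my own example; the 5/-1 weights match the C code above):

import numpy as np
from scipy.ndimage import convolve

kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])

def sharpen(gray):
    # gray: 2D uint8 array; returns the sharpened image
    out = convolve(gray.astype(np.int32), kernel, mode='nearest')
    return np.clip(out, 0, 255).astype(np.uint8)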
ImageJ has many algorithms in Java and is freely available.