Dealing with Android's texture size limit

I have a requirement to display somewhat big images on an Android app.
Right now I'm using an ImageView with a source Bitmap.
I understand OpenGL has a certain device-dependent limitation on
how big the image dimensions can be for it to process them.
Is there ANY way to display these images (with fixed width, without cropping) regardless of this limit,
other than splitting the image into multiple ImageView elements?
Thank you.
UPDATE 01 Apr 2013
Still no luck. So far all suggestions were to reduce image quality. One suggested it might be possible to bypass this limitation by using the CPU to do the processing instead of the GPU (though it might take more time to process).
I don't understand: is there really no way to display long images with a fixed width without reducing image quality? I bet there is; I'd love it if anyone would at least point me in the right direction.
Thanks everyone.

You can use BitmapRegionDecoder to break apart larger bitmaps (requires API level 10). I've written a method that will use this class and return a single Drawable that can be placed inside an ImageView:
private static final int MAX_SIZE = 1024;

private Drawable createLargeDrawable(int resId) throws IOException {
    InputStream is = getResources().openRawResource(resId);
    BitmapRegionDecoder brd = BitmapRegionDecoder.newInstance(is, true);
    try {
        if (brd.getWidth() <= MAX_SIZE && brd.getHeight() <= MAX_SIZE) {
            // Small enough to decode in one piece. Decode through the region decoder
            // rather than reusing the InputStream, which it has already consumed.
            Bitmap whole = brd.decodeRegion(new Rect(0, 0, brd.getWidth(), brd.getHeight()), null);
            return new BitmapDrawable(getResources(), whole);
        }
        int rowCount = (int) Math.ceil((float) brd.getHeight() / (float) MAX_SIZE);
        int colCount = (int) Math.ceil((float) brd.getWidth() / (float) MAX_SIZE);
        BitmapDrawable[] drawables = new BitmapDrawable[rowCount * colCount];
        // Decode the image in MAX_SIZE x MAX_SIZE tiles; edge tiles may be smaller.
        for (int i = 0; i < rowCount; i++) {
            int top = MAX_SIZE * i;
            int bottom = i == rowCount - 1 ? brd.getHeight() : top + MAX_SIZE;
            for (int j = 0; j < colCount; j++) {
                int left = MAX_SIZE * j;
                int right = j == colCount - 1 ? brd.getWidth() : left + MAX_SIZE;
                Bitmap b = brd.decodeRegion(new Rect(left, top, right, bottom), null);
                BitmapDrawable bd = new BitmapDrawable(getResources(), b);
                bd.setGravity(Gravity.TOP | Gravity.LEFT);
                drawables[i * colCount + j] = bd;
            }
        }
        // Reassemble the tiles into a single LayerDrawable, insetting each layer
        // to its position in the original image.
        LayerDrawable ld = new LayerDrawable(drawables);
        for (int i = 0; i < rowCount; i++) {
            for (int j = 0; j < colCount; j++) {
                ld.setLayerInset(i * colCount + j, MAX_SIZE * j, MAX_SIZE * i, 0, 0);
            }
        }
        return ld;
    } finally {
        brd.recycle();
    }
}
The method will check to see if the drawable resource is smaller than MAX_SIZE (1024) in both axes. If it is, it just returns the drawable. If it's not, it will break the image apart and decode chunks of the image and place them in a LayerDrawable.
I chose 1024 because I believe most available phones will support images at least that large. If you want to find the actual texture size limit for a phone, you have to do some funky stuff through OpenGL, and it's not something I wanted to dive into.
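For reference, the query itself is only a few lines; the catch is that it returns a real value only on a thread with a current GL context (for example, inside a GLSurfaceView.Renderer callback). A sketch:
// Must run on the GL thread, e.g. in GLSurfaceView.Renderer.onSurfaceCreated().
int[] maxTextureSize = new int[1];
GLES20.glGetIntegerv(GLES20.GL_MAX_TEXTURE_SIZE, maxTextureSize, 0);
Log.d("Textures", "GL_MAX_TEXTURE_SIZE = " + maxTextureSize[0]);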
I wasn't sure how you were accessing your images, so I assumed they were in your drawable folder. If that's not the case, it should be fairly easy to refactor the method to take in whatever parameter you need.

You can use BitmapFactory.Options to reduce the size of the picture. You can use something like this:
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 3; // subsample the image; note the decoder rounds this down to the nearest power of two
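For completeness, a hedged sketch of how that option is typically passed to a decode call (R.drawable.big_image and imageView are placeholders, not from the question):
// Decode a subsampled bitmap straight from a resource.
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 4; // decode at roughly 1/4 of the original width and height
Bitmap scaled = BitmapFactory.decodeResource(getResources(), R.drawable.big_image, options);
imageView.setImageBitmap(scaled);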

Have you seen how your maps app works? I once made a renderer for maps; you can use the same trick to display your image.
Divide your image into square tiles (e.g. 128x128 pixels). Create a custom ImageView that supports rendering from tiles. Your ImageView knows which part of the bitmap it should show and displays only the required tiles, loading them from your SD card. Using such a tile map you can display endless images; a rough sketch follows.
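A rough, untested sketch of the idea (TiledImageView and tileAt() are my own names; a real implementation would cache decoded tiles, e.g. in an LruCache, and handle scroll input):
// Assumes android.content.Context, android.graphics.*, android.view.View.
public class TiledImageView extends View {
    private static final int TILE_SIZE = 128;
    private int scrollX; // current horizontal offset into the full image

    public TiledImageView(Context context) {
        super(context);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        // Draw only the tiles that intersect the visible window.
        int firstCol = scrollX / TILE_SIZE;
        int lastCol = (scrollX + getWidth()) / TILE_SIZE;
        int rowCount = (getHeight() + TILE_SIZE - 1) / TILE_SIZE;
        for (int col = firstCol; col <= lastCol; col++) {
            for (int row = 0; row < rowCount; row++) {
                Bitmap tile = tileAt(col, row);
                if (tile != null) {
                    canvas.drawBitmap(tile, col * TILE_SIZE - scrollX, row * TILE_SIZE, null);
                }
            }
        }
    }

    private Bitmap tileAt(int col, int row) {
        // Hypothetical loader: decode "tile_<col>_<row>.png" from storage, ideally cached.
        return null;
    }
}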

It would help if you gave us the dimensions of your bitmap.
Please understand that OpenGL runs against natural mathematical limits.
For instance, there is a very good reason a texture in OpenGL must be a power of two (2^x). This is really the only way the math of repeated downscaling can be done cleanly without any remainder: 2048 halves to 1024, 512, 256, and so on all the way down to 1.
So if you give us the exact dimensions of the smallest bitmap that's giving you trouble, some of us may be able to tell you what kind of actual limit you're running up against.

Related

Android YUV to grayscale performance optimization

I'm trying to convert a YUV image to grayscale, so basically I just need the Y values.
To do so I wrote this little piece of code (with frame being the YUV image):
imageConversionTime = System.currentTimeMillis();
size = frame.getSize();
byte nv21ByteArray[] = frame.getImage();
int lol;
for (int i = 0; i < size.width; i++) {
    for (int j = 0; j < size.height; j++) {
        lol = size.width * j + i;
        yMatrix.put(j, i, nv21ByteArray[lol]);
    }
}
bitmap = Bitmap.createBitmap(size.width, size.height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(yMatrix, bitmap);
imageConversionTime = System.currentTimeMillis() - imageConversionTime;
However, this takes about 13500 ms. I need it to be A LOT faster (on my computer it takes 8.5 ms in Python). I work on a Motorola Moto E 4G 2nd generation; not super powerful, but it should be enough for converting images, right?
Any suggestions?
Thanks in advance.
First of all, I would assign size.width and size.height to local variables; I don't think the compiler will optimize the repeated field access by default, but I am not sure about this.
Furthermore, create an int[] representing the result instead of using a Matrix.
Then you could do something like this:
int[] grayScalePixels = new int[size.width * size.height];
int cntPixels = 0;
In your inner loop set
int y = nv21ByteArray[lol] & 0xFF; // mask so the byte is read as unsigned
grayScalePixels[cntPixels] = Color.argb(255, y, y, y); // expand Y into an opaque gray ARGB pixel
cntPixels++;
To get your final image do the following:
Bitmap grayScaleBitmap = Bitmap.createBitmap(grayScalePixels, size.width, size.height, Bitmap.Config.ARGB_8888);
Hope it works properly (I have not tested it, but at least the underlying principle should be applicable: rely on a plain array instead of a Matrix).
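If you keep OpenCV's Mat instead, a single bulk copy also avoids the per-pixel JNI overhead. A sketch, assuming an NV21 frame whose Y plane is the first width*height bytes and is already the grayscale channel:
// Assumes org.opencv.core.Mat/CvType and java.util.Arrays; size and frame as in the question.
byte[] nv21 = frame.getImage();
Mat yMatrix = new Mat(size.height, size.width, CvType.CV_8UC1); // rows, cols
yMatrix.put(0, 0, Arrays.copyOfRange(nv21, 0, size.width * size.height)); // one bulk copy instead of width*height put() calls
Bitmap bitmap = Bitmap.createBitmap(size.width, size.height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(yMatrix, bitmap);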
Probably 2 years too late but anyways ;)
To convert to grayscale, all you need to do is set the U/V values to 128 and leave the Y values as-is. Note that this code is for the YUY2 format. You can refer to this document for other formats.
private void convertToBW(byte[] ptrIn, String filePath) {
    // change all u and v values to 127 (cause 128 will cause byte overflow)
    byte[] ptrOut = Arrays.copyOf(ptrIn, ptrIn.length);
    for (int i = 0, ptrInLength = ptrOut.length; i < ptrInLength; i++) {
        if (i % 2 != 0) {
            ptrOut[i] = (byte) 127;
        }
    }
    convertToJpeg(ptrOut, filePath);
}
For NV21/NV12, the chroma plane comes after the Y plane, i.e. it is the last third of the buffer, so I think the loop would change to:
for (int i = ptrOut.length * 2 / 3, ptrInLength = ptrOut.length; i < ptrInLength; i++) {}
Note: I didn't try this myself. (Also, in NV21/NV12 every byte in that range is a chroma sample, so the parity check is no longer needed.)
Also, I would suggest profiling your Utils method and the createBitmap call separately.

LibGDX texture bleeding issue

I'm new to LibGDX and was trying to implement a parallax background.
Everything went well until I hit this issue: I get some stripes when scrolling the background. You can see them in the attached image:
So I looked deeper into the issue and figured out that this is some sort of texture bleeding. But the thing is that my textures already have the [Linear, Nearest] filter set, and TexturePacker uses duplicatePadding. I don't know any other methods to solve this issue. Please help!
Here's some of my code:
TexturePacker
TexturePacker.Settings settings = new TexturePacker.Settings();
settings.minWidth = 256;
settings.minHeight = 256;
settings.duplicatePadding = true;
TexturePacker.process(settings, "../../design", "./", "textures");
AssetLoader
textureAtlas = new TextureAtlas(Gdx.files.internal("textures.atlas"));
for (int i = 0; i < 2; i++) {
Background.skies.add(textureAtlas.findRegion("background/sky", i));
Background.skies.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
for (int i = 0; i < 2; i++) {
Background.clouds.add(textureAtlas.findRegion("background/cloud", i));
Background.clouds.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
for (int i = 0; i < 8; i++) {
Background.cities.add(textureAtlas.findRegion("background/city", i));
Background.cities.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
Background.moon = textureAtlas.findRegion("background/moon");
Background.forest = textureAtlas.findRegion("background/forest");
Background.road = textureAtlas.findRegion("background/road");
Background.moon.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
Background.forest.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
Background.road.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
BackgroundDrawer
private void drawParallaxTextureList(Batch batch, List<TextureAtlas.AtlasRegion> list,
        float moveX, float posY) {
    for (int i = 0; i < list.size(); i++) {
        boolean needDraw = false;
        float shift = GameScreen.VIEWPORT_WIDTH * i;
        float drawX = 0.0f;
        if (shift - moveX <= -(GameScreen.VIEWPORT_WIDTH)) { // If it's behind the screen
            if (i == 0) { // If it's the first element
                if (moveX >= GameScreen.VIEWPORT_WIDTH * (list.size() - 1)) { // We need to show the first after the last
                    needDraw = true;
                    drawX = (GameScreen.VIEWPORT_WIDTH) - (moveX - ((GameScreen.VIEWPORT_WIDTH) * (list.size() - 1)));
                }
            }
        } else if (shift - moveX < (GameScreen.VIEWPORT_WIDTH - 1)) {
            needDraw = true;
            drawX = shift - moveX;
        }
        if (needDraw) {
            batch.draw(list.get(i), (int) drawX, (int) posY);
        }
    }
}
NOTE: I don't use any camera for drawing right now. I only use a FitViewport with a size of 1920x1280. Also, bleeding sometimes appears even at FullHD resolution.
UPDATE: Setting both minification and magnification filters to Nearest, increasing paddingX, and disabling antialiasing solved the issue, but the final image became too ugly! Is there a way to avoid disabling antialiasing? Because without it, downscaling looks awful.
Try to set both min and mag filters to Nearest:
.setFilter(Texture.TextureFilter.Nearest, Texture.TextureFilter.Nearest);
In the GUI TexturePacker there is an option to extrude graphics, which means repeating every border pixel of the texture. Then you can set both filters to Linear:
.setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Linear);
Unfortunately, I cannot see this option in the TexturePacker.Settings object you are using. You can try setting Linear for both, but I'm pretty sure it won't work (the Linear filter samples the nearest 4 texels to generate one, so it will probably still produce artifacts).
So maybe try the GUI TexturePacker with the extrude option.
A few possible reasons for this artifact:
Maybe the padding is not big enough when the sprite resolution is shrunk down. Try changing your texture packer's filterMin to MipMapLinearNearest. Also try increasing the size of paddingX and paddingY.
Maybe you're seeing dim or brightened pixels at the edge of your sprite because you're not using premultiplied alpha and your texture's background color (where its alpha is zero) is white. Try setting premultiplyAlpha: true. If you do this, you also need to change the SpriteBatch's blend function to (GL20.GL_ONE, GL20.GL_ONE_MINUS_SRC_ALPHA) to render properly (see the sketch after this list).
You seem to be rounding your sprite positions and sizes to integers when you draw them. This would work in a pixel perfect game, where you're sure the sprites are being rendered exactly at 1:1 resolution to the screen. But once the screen size does not match exactly, your rounding might produce gaps that are less than 1 pixel wide, which will look like semi-transparent pixels.
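For the premultiplied-alpha point above, a minimal sketch of the two settings that have to change together (packer side and render side):
// Packer side: bake premultiplied alpha into the atlas.
TexturePacker.Settings settings = new TexturePacker.Settings();
settings.premultiplyAlpha = true;

// Render side: switch the batch to the matching blend function.
SpriteBatch batch = new SpriteBatch();
batch.setBlendFunction(GL20.GL_ONE, GL20.GL_ONE_MINUS_SRC_ALPHA);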

How can I improve OCR accuracy with Tesseract? [duplicate]

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed): 300 DPI is the minimum
fix text size (e.g. 12 pt should be OK)
try to fix text lines (deskew and dewarp the text)
try to fix the illumination of the image (e.g. no dark parts)
binarize and de-noise the image (a minimal binarization sketch follows this list)
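As a minimal illustration of just the binarize step, a sketch using a fixed global threshold on an Android Bitmap (128 is an arbitrary cutoff; adaptive thresholding usually does better on unevenly lit documents):
// Assumes android.graphics.Bitmap and android.graphics.Color.
static Bitmap binarize(Bitmap src) {
    Bitmap out = src.copy(Bitmap.Config.ARGB_8888, true);
    int[] px = new int[out.getWidth() * out.getHeight()];
    out.getPixels(px, 0, out.getWidth(), 0, 0, out.getWidth(), out.getHeight());
    for (int i = 0; i < px.length; i++) {
        // Luminance, then threshold to pure black or white.
        int c = px[i];
        int lum = (int) (0.299 * Color.red(c) + 0.587 * Color.green(c) + 0.114 * Color.blue(c));
        px[i] = lum < 128 ? Color.BLACK : Color.WHITE;
    }
    out.setPixels(px, 0, out.getWidth(), 0, 0, out.getWidth(), out.getHeight());
    return out;
}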
There is no universal command line that fits all cases (sometimes you need to blur and sharpen an image). But you can give TEXTCLEANER from Fred's ImageMagick Scripts a try.
If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
I am by no means an OCR expert, but this week I needed to convert text out of a JPG.
I started with a colorized, RGB, 445x747 pixel JPG.
I immediately tried Tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract was then able to extract all the text into a .txt file.
GIMP is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (recommended if you're working with images that have a DPI of less than 300):
# the snippets in this answer assume these imports
import cv2
import numpy as np

img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each has its pros and cons; however, median blur and bilateral filtering usually perform better than Gaussian blur):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width (multiply the image height and width by 0.5, 1, and 2).
Convert the image to grayscale (black and white).
Remove the noise pixels to make the image clearer (filter the image).
Refer to the code below:
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
    Bitmap temp = (Bitmap)bmp;
    Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);

    double nWidthFactor = (double)temp.Width / (double)newWidth;
    double nHeightFactor = (double)temp.Height / (double)newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    Color color1 = new Color();
    Color color2 = new Color();
    Color color3 = new Color();
    Color color4 = new Color();
    byte nRed, nGreen, nBlue;
    byte bp1, bp2;

    for (int x = 0; x < bmap.Width; ++x)
    {
        for (int y = 0; y < bmap.Height; ++y)
        {
            fr_x = (int)Math.Floor(x * nWidthFactor);
            fr_y = (int)Math.Floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= temp.Width) cx = fr_x;
            cy = fr_y + 1;
            if (cy >= temp.Height) cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = temp.GetPixel(fr_x, fr_y);
            color2 = temp.GetPixel(cx, fr_y);
            color3 = temp.GetPixel(fr_x, cy);
            color4 = temp.GetPixel(cx, cy);

            // Blue
            bp1 = (byte)(nx * color1.B + fx * color2.B);
            bp2 = (byte)(nx * color3.B + fx * color4.B);
            nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Green
            bp1 = (byte)(nx * color1.G + fx * color2.G);
            bp2 = (byte)(nx * color3.G + fx * color4.G);
            nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Red
            bp1 = (byte)(nx * color1.R + fx * color2.R);
            bp2 = (byte)(nx * color3.R + fx * color4.R);
            nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            bmap.SetPixel(x, y, System.Drawing.Color.FromArgb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = SetGrayscale(bmap);
    bmap = RemoveNoise(bmap);
    return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
    Bitmap temp = (Bitmap)img;
    Bitmap bmap = (Bitmap)temp.Clone();
    Color c;
    for (int i = 0; i < bmap.Width; i++)
    {
        for (int j = 0; j < bmap.Height; j++)
        {
            c = bmap.GetPixel(i, j);
            byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
            bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
        }
    }
    return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
    for (var x = 0; x < bmap.Width; x++)
    {
        for (var y = 0; y < bmap.Height; y++)
        {
            var pixel = bmap.GetPixel(x, y);
            if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
                bmap.SetPixel(x, y, Color.Black);
            else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
                bmap.SetPixel(x, y, Color.White);
        }
    }
    return bmap;
}
(Input and output example images omitted.)
This was a while ago, but it still might be useful.
My experience shows that resizing the image in memory before passing it to Tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMELY HELPFUL to me along the way were the sources of the Capture2Text project:
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to its author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocessing for this utility.
If you run the binaries, you can check the image transformation before/after the process in the Capture2Text\Output\ folder.
P.S.: the mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
    Bitmap bmap = img.copy(img.getConfig(), true);

    double nWidthFactor = (double) img.getWidth() / (double) newWidth;
    double nHeightFactor = (double) img.getHeight() / (double) newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    int color1, color2, color3, color4;
    byte nRed, nGreen, nBlue;
    byte bp1, bp2;

    for (int x = 0; x < bmap.getWidth(); ++x) {
        for (int y = 0; y < bmap.getHeight(); ++y) {
            fr_x = (int) Math.floor(x * nWidthFactor);
            fr_y = (int) Math.floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= img.getWidth())
                cx = fr_x;
            cy = fr_y + 1;
            if (cy >= img.getHeight())
                cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = img.getPixel(fr_x, fr_y);
            color2 = img.getPixel(cx, fr_y);
            color3 = img.getPixel(fr_x, cy);
            color4 = img.getPixel(cx, cy);

            // Blue
            bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
            bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
            nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Green
            bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
            bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
            nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Red
            bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
            bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
            nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = setGrayscale(bmap);
    bmap = removeNoise(bmap);
    return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
    Bitmap bmap = img.copy(img.getConfig(), true);
    int c;
    for (int i = 0; i < bmap.getWidth(); i++) {
        for (int j = 0; j < bmap.getHeight(); j++) {
            c = bmap.getPixel(i, j);
            byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
                    + .114 * Color.blue(c));
            bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
        }
    }
    return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
                bmap.setPixel(x, y, Color.BLACK);
            }
        }
    }
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
                bmap.setPixel(x, y, Color.WHITE);
            }
        }
    }
    return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the German language)
Thus, it makes sense to first test how far you get with the new Tesseract LSTM mode before applying custom image pre-processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of images whose text is not very small (the steps are sketched below with OpenCV's Java bindings):
Apply blur to the original image.
Apply an adaptive threshold.
Apply a sharpening effect.
And if you're still not getting good results, scale the image to 150% or 200%.
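A sketch of those steps with OpenCV's Java bindings (parameter and kernel values are illustrative only, and the org.opencv classes are assumed to be on the classpath):
// Assumes org.opencv.core.*, org.opencv.imgproc.Imgproc, org.opencv.imgcodecs.Imgcodecs.
Mat img = Imgcodecs.imread("page.png", Imgcodecs.IMREAD_GRAYSCALE);
// 1. Blur.
Imgproc.GaussianBlur(img, img, new Size(3, 3), 0);
// 2. Adaptive threshold.
Imgproc.adaptiveThreshold(img, img, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 31, 2);
// 3. Sharpen with a classic 3x3 kernel.
Mat kernel = new Mat(3, 3, CvType.CV_32F);
kernel.put(0, 0, 0, -1, 0, -1, 5, -1, 0, -1, 0);
Imgproc.filter2D(img, img, -1, kernel);
Imgcodecs.imwrite("page_clean.png", img);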
Reading text from image documents with any OCR engine involves many issues when trying to get good accuracy. There is no fixed solution for all cases, but here are a few things to consider to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal, which can easily be done using a Gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of the image: because of wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which gives the worst accuracy.
3) Presence of lines: while doing word or line segmentation, the OCR engine sometimes also tries to merge words and lines together, thus processing wrong content and hence giving wrong results. There are other issues too, but these are the basic ones.
This post on an OCR application is an example case where some image pre-processing and post-processing of the OCR result can be applied to get better OCR accuracy.
Text recognition depends on a variety of factors to produce good quality output. OCR output highly depends on the quality of the input image. This is why every OCR engine provides guidelines regarding the quality of the input image and its size. These guidelines help the OCR engine produce accurate results.
I have written a detailed article on image processing in Python. Kindly follow the link below for more explanation. I have also added the Python source code to implement the process.
Please write a comment if you have a suggestion or a better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
You can do noise reduction and then apply thresholding. Beyond that, you can play around with the OCR configuration by changing the --psm and --oem values.
Try:
--psm 5
--oem 2
You can also look at the following link for further details:
here
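If you happen to be driving Tesseract from Java, the same knobs are exposed by wrappers such as Tess4J; a sketch under that assumption (paths and file names are placeholders):
import java.io.File;
import net.sourceforge.tess4j.Tesseract;

// psm/oem map to setPageSegMode/setOcrEngineMode in the Tess4J wrapper.
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("/usr/share/tesseract/tessdata"); // adjust to your install
tesseract.setLanguage("eng");
tesseract.setPageSegMode(5);   // equivalent of --psm 5
tesseract.setOcrEngineMode(2); // equivalent of --oem 2
String text = tesseract.doOCR(new File("page.png")); // throws TesseractException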
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with the legacy engine (using --oem 0) and sometimes I get better results with the LSTM engine (--oem 1).
Generally speaking, I get the best results on upscaled images with the LSTM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from GitHub, since most Linux distros will only provide the fast versions.
The trained data that will work for both the legacy and LSTM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with commands like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/osd.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can then be fed to Tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 --psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.

Creating a 1bpp (bit per pixel) Bitmap in Android

OK, so I've been racking my brain on this one all day. I'm trying to figure out how I can convert a Bitmap from a canvas to a 1bpp (bit per pixel) Bitmap file in Android and physically save it as such.
So far I've iterated through the bitmap and created an int[] of the resulting pixel values as 1s or 0s. However, my next question is: what do I do with that?
What I tried to do was something like
int[] bits = // populated earlier
byte[] bmp = new byte[bits.length / 8];
int byteindex = 0;
int bitindex = 0;
for (int i = 0; i < bits.length; i++) {
    if (bits[i] == 1)
        bmp[byteindex] |= 1 << (7 - bitindex); // set the bit; the leftmost pixel lives in the high bit
    // a 0 bit needs no write, since the array starts zeroed
    if (++bitindex == 8) { // pre-increment: the original post-increment test against 8 packed 9 bits per byte
        bitindex = 0;
        byteindex++;
    }
}
OutputStream out = new FileOutputStream("/mnt/sdcard/dynbmp.bmp");
out.write(bmp);
out.close();
I get a file out of it, but it's obviously not a valid BMP file. Who knows what it is. You'll have to forgive my lack of bit/byte and imaging knowledge, but where am I going wrong? Do I have the idea completely wrong? Am I missing some header info or something?
Yes, you are missing several things. It's a little bit more complicated... Look here:
http://en.wikipedia.org/wiki/BMP_file_format
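To make those missing pieces concrete, here is a rough, untested sketch of a minimal 1bpp BMP writer. It assumes the pixel bits are already packed bottom-up with each row padded to a 4-byte boundary, which is what the format expects:
// Assumes java.io.*, java.nio.ByteBuffer, java.nio.ByteOrder. All BMP fields are little-endian.
static void writeMonoBmp(OutputStream out, byte[] pixels, int width, int height) throws IOException {
    int rowBytes = ((width + 31) / 32) * 4;      // 1bpp rows padded to 4 bytes
    int dataSize = rowBytes * height;
    int offset = 14 + 40 + 8;                    // file header + info header + 2-color palette
    ByteBuffer b = ByteBuffer.allocate(offset + dataSize).order(ByteOrder.LITTLE_ENDIAN);
    b.put((byte) 'B').put((byte) 'M');           // signature
    b.putInt(offset + dataSize);                 // total file size
    b.putInt(0);                                 // reserved
    b.putInt(offset);                            // offset to pixel data
    b.putInt(40);                                // BITMAPINFOHEADER size
    b.putInt(width).putInt(height);              // positive height = bottom-up rows
    b.putShort((short) 1);                       // planes
    b.putShort((short) 1);                       // bits per pixel
    b.putInt(0);                                 // BI_RGB, no compression
    b.putInt(dataSize);                          // image data size
    b.putInt(2835).putInt(2835);                 // ~72 DPI in pixels per meter
    b.putInt(2).putInt(2);                       // 2 palette colors used, all important
    b.putInt(0x00000000);                        // palette entry 0: black
    b.putInt(0x00FFFFFF);                        // palette entry 1: white
    b.put(pixels);                               // packed, row-padded bits
    out.write(b.array());
}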

ALPHA_8 bitmaps and getPixel

I am trying to load a movement map from a PNG image. In order to save memory,
after I load the bitmap I do something like this:
`Bitmap mapBmp = tempBmp.copy(Bitmap.Config.ALPHA_8, false);`
If I draw the mapBmp I can see the map, but when I use getPixel() I always
get 0 (zero).
Is there a way to retrieve ALPHA information from a bitmap other than
with getPixel()?
Seems to be an Android bug in handling ALPHA_8. I also tried copyPixelsToBuffer, to no avail. Simplest workaround is to waste lots of memory and use ARGB_8888.
Issue 25690
I found this question from Google and I was able to extract the pixels using the copyPixelsToBuffer() method that Mitrescu Catalin ended up using. This is what my code looks like in case anyone else finds this as well:
public byte[] getPixels(Bitmap b) {
    int bytes = b.getRowBytes() * b.getHeight();
    ByteBuffer buffer = ByteBuffer.allocate(bytes);
    b.copyPixelsToBuffer(buffer);
    return buffer.array();
}
If you are coding for API level 12 or higher you could use getByteCount() instead to get the total number of bytes to allocate. However if you are coding for API level 19 (KitKat) you should probably use getAllocationByteCount() instead.
I was able to find a nice and fairly clean way to create boundary maps. I create an ALPHA_8 bitmap from the start and paint my boundary map with paths. Then I use copyPixelsToBuffer() to transfer the bytes into a ByteBuffer, and use the buffer to "getPixels" from.
I think it is a good solution since you can scale the Path up or down and draw the boundary map at the desired screen resolution, with no I/O + decode operations. A rough sketch follows.
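In code, that approach looks roughly like this (the names, sizes, and coordinates are mine, not from the answer, and untested):
// Draw the boundary map into an ALPHA_8 mask, then read pixels from a buffer.
Bitmap map = Bitmap.createBitmap(width, height, Bitmap.Config.ALPHA_8);
Canvas canvas = new Canvas(map);
Paint paint = new Paint();
paint.setStyle(Paint.Style.FILL);
canvas.drawPath(boundaryPath, paint); // boundaryPath: your scaled boundary Path

byte[] pixels = new byte[map.getRowBytes() * map.getHeight()];
ByteBuffer buffer = ByteBuffer.wrap(pixels);
map.copyPixelsToBuffer(buffer);
// The ALPHA_8 replacement for getPixel(x, y):
int alpha = pixels[y * map.getRowBytes() + x] & 0xFF;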
Bitmap.getPixel() is useless for ALPHA_8 bitmaps; it always returns 0.
I developed a solution with the PNGJ library, reading the image from assets and then creating a Bitmap with Config.ALPHA_8:
import ar.com.hjg.pngj.IImageLine;
import ar.com.hjg.pngj.ImageLineHelper;
import ar.com.hjg.pngj.PngReader;
public Bitmap getAlpha8BitmapFromAssets(String file) {
    Bitmap result = null;
    try {
        PngReader pngr = new PngReader(getAssets().open(file));
        int channels = pngr.imgInfo.channels;
        if (channels < 3 || pngr.imgInfo.bitDepth != 8)
            throw new RuntimeException("This method is for RGB8/RGBA8 images");
        int bytes = pngr.imgInfo.cols * pngr.imgInfo.rows;
        ByteBuffer buffer = ByteBuffer.allocate(bytes);
        for (int row = 0; row < pngr.imgInfo.rows; row++) {
            IImageLine l1 = pngr.readRow();
            for (int j = 0; j < pngr.imgInfo.cols; j++) {
                int original_color = ImageLineHelper.getPixelARGB8(l1, j);
                byte x = (byte) Color.alpha(original_color);
                buffer.put(row * pngr.imgInfo.cols + j, x ^= 0xff); // invert alpha
            }
        }
        pngr.end();
        result = Bitmap.createBitmap(pngr.imgInfo.cols, pngr.imgInfo.rows, Bitmap.Config.ALPHA_8);
        result.copyPixelsFromBuffer(buffer);
    } catch (IOException e) {
        Log.e(LOG_TAG, e.getMessage());
    }
    return result;
}
I also invert alpha values, because of my particular needs. This code is only tested for API 21.
