YUV_420_888 interpretation on Samsung Galaxy S7 (Camera2) - android

I wrote a conversion from YUV_420_888 to Bitmap, considering the following logic (as I understand it):
To summarize the approach: the kernel’s coordinates x and y are congruent both with the x and y of the non-padded part of the Y-Plane (2d-allocation) and the x and y of the output-Bitmap. The U- and V-Planes, however, have a different structure than the Y-Plane, because they use 1 byte for coverage of 4 pixels, and, in addition, may have a PixelStride that is more than one, in addition they might also have a padding that can be different from that of the Y-Plane. Therefore, in order to access the U’s and V’s efficiently by the kernel I put them into 1-d allocations and created an index “uvIndex” that gives the position of the corresponding U- and V within that 1-d allocation, for given (x,y) coordinates in the (non-padded) Y-plane (and, so, the output Bitmap).
In order to keep the rs-Kernel lean, I excluded the padding area in the yPlane by capping the x-range via LaunchOptions (this reflects the RowStride of the y-plane which thus can be ignored WITHIN the kernel). So we just need to consider the uvPixelStride and uvRowStride within the uvIndex, i.e. the index used in order to access to the u- and v-values.
This is my code:
Renderscript Kernel, named yuv420888.rs
#pragma version(1)
#pragma rs java_package_name(com.xxxyyy.testcamera2);
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
uint picWidth, uvPixelStride, uvRowStride ;
rs_allocation ypsIn,uIn,vIn;
// The LaunchOptions ensure that the Kernel does not enter the padding zone of Y, so yRowStride can be ignored WITHIN the Kernel.
uchar4 __attribute__((kernel)) doConvert(uint32_t x, uint32_t y) {
// index for accessing the uIn's and vIn's
uint uvIndex= uvPixelStride * (x/2) + uvRowStride*(y/2);
// get the y,u,v values
uchar yps= rsGetElementAt_uchar(ypsIn, x, y);
uchar u= rsGetElementAt_uchar(uIn, uvIndex);
uchar v= rsGetElementAt_uchar(vIn, uvIndex);
// calc argb
int4 argb;
argb.r = yps + v * 1436 / 1024 - 179;
argb.g = yps -u * 46549 / 131072 + 44 -v * 93604 / 131072 + 91;
argb.b = yps +u * 1814 / 1024 - 227;
argb.a = 255;
uchar4 out = convert_uchar4(clamp(argb, 0, 255));
return out;
}
Java side:
private Bitmap YUV_420_888_toRGB(Image image, int width, int height){
// Get the three image planes
Image.Plane[] planes = image.getPlanes();
ByteBuffer buffer = planes[0].getBuffer();
byte[] y = new byte[buffer.remaining()];
buffer.get(y);
buffer = planes[1].getBuffer();
byte[] u = new byte[buffer.remaining()];
buffer.get(u);
buffer = planes[2].getBuffer();
byte[] v = new byte[buffer.remaining()];
buffer.get(v);
// get the relevant RowStrides and PixelStrides
// (we know from documentation that PixelStride is 1 for y)
int yRowStride= planes[0].getRowStride();
int uvRowStride= planes[1].getRowStride(); // we know from documentation that RowStride is the same for u and v.
int uvPixelStride= planes[1].getPixelStride(); // we know from documentation that PixelStride is the same for u and v.
// rs creation just for demo. Create rs just once in onCreate and use it again.
RenderScript rs = RenderScript.create(this);
//RenderScript rs = MainActivity.rs;
ScriptC_yuv420888 mYuv420=new ScriptC_yuv420888 (rs);
// Y,U,V are defined as global allocations, the out-Allocation is the Bitmap.
// Note also that uAlloc and vAlloc are 1-dimensional while yAlloc is 2-dimensional.
Type.Builder typeUcharY = new Type.Builder(rs, Element.U8(rs));
//using safe height
typeUcharY.setX(yRowStride).setY(y.length / yRowStride);
Allocation yAlloc = Allocation.createTyped(rs, typeUcharY.create());
yAlloc.copyFrom(y);
mYuv420.set_ypsIn(yAlloc);
Type.Builder typeUcharUV = new Type.Builder(rs, Element.U8(rs));
// note that the size of the u's and v's are as follows:
// ( (width/2)*PixelStride + padding ) * (height/2)
// = (RowStride ) * (height/2)
// but I noted that on the S7 it is 1 less...
typeUcharUV.setX(u.length);
Allocation uAlloc = Allocation.createTyped(rs, typeUcharUV.create());
uAlloc.copyFrom(u);
mYuv420.set_uIn(uAlloc);
Allocation vAlloc = Allocation.createTyped(rs, typeUcharUV.create());
vAlloc.copyFrom(v);
mYuv420.set_vIn(vAlloc);
// handover parameters
mYuv420.set_picWidth(width);
mYuv420.set_uvRowStride (uvRowStride);
mYuv420.set_uvPixelStride (uvPixelStride);
Bitmap outBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Allocation outAlloc = Allocation.createFromBitmap(rs, outBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(0, width); // by this we ignore the y’s padding zone, i.e. the right side of x between width and yRowStride
//using safe height
lo.setY(0, y.length / yRowStride);
mYuv420.forEach_doConvert(outAlloc,lo);
outAlloc.copyTo(outBitmap);
return outBitmap;
}
Testing on Nexus 7 (API 22) this returns nice color Bitmaps. This device, however, has trivial pixelstrides (=1) and no padding (i.e. rowstride=width). Testing on the brandnew Samsung S7 (API 23) I get pictures whose colors are not correct - except of the green ones. But the Picture does not show a general bias towards green, it just seems that non-green colors are not reproduced correctly. Note, that the S7 applies an u/v pixelstride of 2, and no padding.
Since the most crucial code line is within the rs-code the Access of the u/v planes uint uvIndex= (...) I think, there could be the problem, probably with incorrect consideration of pixelstrides here. Does anyone see the solution? Thanks.
UPDATE: I checked everything, and I am pretty sure that the code regarding the access of y,u,v is correct. So the problem must be with the u and v values themselves. Non green colors have a purple tilt, and looking at the u,v values they seem to be in a rather narrow range of about 110-150. Is it really possible that we need to cope with device specific YUV -> RBG conversions...?! Did I miss anything?
UPDATE 2: have corrected code, it works now, thanks to Eddy's Feedback.

Look at
floor((float) uvPixelStride*(x)/2)
which calculates your U,V row offset (uv_row_offset) from the Y x-coordinate.
if uvPixelStride = 2, then as x increases:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 1
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 3
and this is incorrect. There's no valid U/V pixel value at uv_row_offset = 1 or 3, since uvPixelStride = 2.
You want
uvPixelStride * floor(x/2)
(assuming you don't trust yourself to remember the critical round-down behavior of integer divide, if you do then):
uvPixelStride * (x/2)
should be enough
With that, your mapping becomes:
x = 0, uv_row_offset = 0
x = 1, uv_row_offset = 0
x = 2, uv_row_offset = 2
x = 3, uv_row_offset = 2
See if that fixes the color errors. In practice, the incorrect addressing here would mean every other color sample would be from the wrong color plane, since it's likely that the underlying YUV data is semiplanar (so the U plane starts at V plane + 1 byte, with the two planes interleaved)

For people who encounter error
android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type
use buffer.capacity() instead of buffer.remaining()
and if you already made some operations on the image, you'll need to call rewind() method on the buffer.

Furthermore for anyone else getting
android.support.v8.renderscript.RSIllegalArgumentException: Array too
small for allocation type
I fixed it by changing yAlloc.copyFrom(y); to yAlloc.copy1DRangeFrom(0, y.length, y);

Posting full solution to convert YUV->BGR (can be adopted for other formats too) and also rotate image to upright using renderscript. Allocation is used as input and byte array is used as output. It was tested on Android 8+ including Samsung devices too.
Java
/**
* Renderscript-based process to convert YUV_420_888 to BGR_888 and rotation to upright.
*/
public class ImageProcessor {
protected final String TAG = this.getClass().getSimpleName();
private Allocation mInputAllocation;
private Allocation mOutAllocLand;
private Allocation mOutAllocPort;
private Handler mProcessingHandler;
private ScriptC_yuv_bgr mConvertScript;
private byte[] frameBGR;
public ProcessingTask mTask;
private ImageListener listener;
private Supplier<Integer> rotation;
public ImageProcessor(RenderScript rs, Size dimensions, ImageListener listener, Supplier<Integer> rotation) {
this.listener = listener;
this.rotation = rotation;
int w = dimensions.getWidth();
int h = dimensions.getHeight();
Type.Builder yuvTypeBuilder = new Type.Builder(rs, Element.YUV(rs));
yuvTypeBuilder.setX(w);
yuvTypeBuilder.setY(h);
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888);
mInputAllocation = Allocation.createTyped(rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT | Allocation.USAGE_SCRIPT);
//keep 2 allocations to handle different image rotations
mOutAllocLand = createOutBGRAlloc(rs, w, h);
mOutAllocPort = createOutBGRAlloc(rs, h, w);
frameBGR = new byte[w*h*3];
HandlerThread processingThread = new HandlerThread(this.getClass().getSimpleName());
processingThread.start();
mProcessingHandler = new Handler(processingThread.getLooper());
mConvertScript = new ScriptC_yuv_bgr(rs);
mConvertScript.set_inWidth(w);
mConvertScript.set_inHeight(h);
mTask = new ProcessingTask(mInputAllocation);
}
private Allocation createOutBGRAlloc(RenderScript rs, int width, int height) {
//Stored as Vec4, it's impossible to store as Vec3, buffer size will be for Vec4 anyway
//using RGB_888 as alternative for BGR_888, can be just U8_3 type
Type.Builder rgbTypeBuilderPort = new Type.Builder(rs, Element.RGB_888(rs));
rgbTypeBuilderPort.setX(width);
rgbTypeBuilderPort.setY(height);
Allocation allocation = Allocation.createTyped(
rs, rgbTypeBuilderPort.create(), Allocation.USAGE_SCRIPT
);
//Use auto-padding to be able to copy to x*h*3 bytes array
allocation.setAutoPadding(true);
return allocation;
}
public Surface getInputSurface() {
return mInputAllocation.getSurface();
}
/**
* Simple class to keep track of incoming frame count,
* and to process the newest one in the processing thread
*/
class ProcessingTask implements Runnable, Allocation.OnBufferAvailableListener {
private int mPendingFrames = 0;
private Allocation mInputAllocation;
public ProcessingTask(Allocation input) {
mInputAllocation = input;
mInputAllocation.setOnBufferAvailableListener(this);
}
#Override
public void onBufferAvailable(Allocation a) {
synchronized(this) {
mPendingFrames++;
mProcessingHandler.post(this);
}
}
#Override
public void run() {
// Find out how many frames have arrived
int pendingFrames;
synchronized(this) {
pendingFrames = mPendingFrames;
mPendingFrames = 0;
// Discard extra messages in case processing is slower than frame rate
mProcessingHandler.removeCallbacks(this);
}
// Get to newest input
for (int i = 0; i < pendingFrames; i++) {
mInputAllocation.ioReceive();
}
int rot = rotation.get();
mConvertScript.set_currentYUVFrame(mInputAllocation);
mConvertScript.set_rotation(rot);
Allocation allocOut = rot==90 || rot== 270 ? mOutAllocPort : mOutAllocLand;
// Run processing
// ain allocation isn't really used, global frame param is used to get data from
mConvertScript.forEach_yuv_bgr(allocOut);
//Save to byte array, BGR 24bit
allocOut.copyTo(frameBGR);
int w = allocOut.getType().getX();
int h = allocOut.getType().getY();
if (listener != null) {
listener.onImageAvailable(frameBGR, w, h);
}
}
}
public interface ImageListener {
/**
* Called when there is available image, image is in upright position.
*
* #param bgr BGR 24bit bytes
* #param width image width
* #param height image height
*/
void onImageAvailable(byte[] bgr, int width, int height);
}
}
RS
#pragma version(1)
#pragma rs java_package_name(com.affectiva.camera)
#pragma rs_fp_relaxed
//Script convers YUV to BGR(uchar3)
//current YUV frame to read pixels from
rs_allocation currentYUVFrame;
//input image rotation: 0,90,180,270 clockwise
uint32_t rotation;
uint32_t inWidth;
uint32_t inHeight;
//method returns uchar3 BGR which will be set to x,y in output allocation
uchar3 __attribute__((kernel)) yuv_bgr(uint32_t x, uint32_t y) {
// Read in pixel values from latest frame - YUV color space
uchar3 inPixel;
uint32_t xRot = x;
uint32_t yRot = y;
//Do not rotate if 0
if (rotation==90) {
//rotate 270 clockwise
xRot = y;
yRot = inHeight - 1 - x;
} else if (rotation==180) {
xRot = inWidth - 1 - x;
yRot = inHeight - 1 - y;
} else if (rotation==270) {
//rotate 90 clockwise
xRot = inWidth - 1 - y;
yRot = x;
}
inPixel.r = rsGetElementAtYuv_uchar_Y(currentYUVFrame, xRot, yRot);
inPixel.g = rsGetElementAtYuv_uchar_U(currentYUVFrame, xRot, yRot);
inPixel.b = rsGetElementAtYuv_uchar_V(currentYUVFrame, xRot, yRot);
// Convert YUV to RGB, JFIF transform with fixed-point math
// R = Y + 1.402 * (V - 128)
// G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
// B = Y + 1.772 * (U - 128)
int3 bgr;
//get red pixel and assing to b
bgr.b = inPixel.r +
inPixel.b * 1436 / 1024 - 179;
bgr.g = inPixel.r -
inPixel.g * 46549 / 131072 + 44 -
inPixel.b * 93604 / 131072 + 91;
//get blue pixel and assign to red
bgr.r = inPixel.r +
inPixel.g * 1814 / 1024 - 227;
// Write out
return convert_uchar3(clamp(bgr, 0, 255));
}

On a Samsung Galaxy Tab 5 (Tablet), android version 5.1.1 (22), with alleged YUV_420_888 format, the following renderscript math works well and produces correct colors:
uchar yValue = rsGetElementAt_uchar(gCurrentFrame, x + y * yRowStride);
uchar vValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) );
uchar uValue = rsGetElementAt_uchar(gCurrentFrame, ( (x/2) + (y/4) * yRowStride ) + (xSize * ySize) + (xSize * ySize) / 4);
I do not understand why the horizontal value (i.e., y) is scaled by a factor of four instead of two, but it works well. I also needed to avoid use of rsGetElementAtYuv_uchar_Y|U|V. I believe the associated allocation stride value is set to zero instead of something proper. Use of rsGetElementAt_uchar() is a reasonable work-around.
On a Samsung Galaxy S5 (Smart Phone), android version 5.0 (21), with alleged YUV_420_888 format, I cannot recover the u and v values, they come through as all zeros. This results in a green looking image. Luminous is OK, but image is vertically flipped.

This code requires the use of the RenderScript compatibility library (android.support.v8.renderscript.*).
In order to get the compatibility library to work with Android API 23, I updated to gradle-plugin 2.1.0 and Build-Tools 23.0.3 as per Miao Wang's answer at How to create Renderscript scripts on Android Studio, and make them run?
If you follow his answer and get an error "Gradle version 2.10 is required" appears, do NOT change
classpath 'com.android.tools.build:gradle:2.1.0'
Instead, update the distributionUrl field of the Project\gradle\wrapper\gradle-wrapper.properties file to
distributionUrl=https\://services.gradle.org/distributions/gradle-2.10-all.zip
and change File > Settings > Builds,Execution,Deployment > Build Tools > Gradle >Gradle to Use default gradle wrapper as per "Gradle Version 2.10 is required." Error.

Re: RSIllegalArgumentException
In my case this was the case that buffer.remaining() was not multiple of stride:
The length of last line was less than stride (i.e. only up to where actual data was.)

An FYI in case someone else gets this as I was also getting "android.support.v8.renderscript.RSIllegalArgumentException: Array too small for allocation type" when trying out the code. In my case it turns out that the when allocating the buffer for Y i had to rewind the buffer because it was being left at the wrong end and wasn't copying the data. By doing buffer.rewind(); before allocation the new bytes array makes it work fine now.

Related

Scaling YUV420 Image using libyuv produces weird output

Possible duplicate of This question with major parts picked from here. I've tried whatever solutions were provided there, they don't work for me.
Background
I'm capturing an image in YUV_420_888 image format returned from ARCore's frame.acquireCameraImage() method. Since I've set the camera configuration at 1920*1080 resolution, I need to scale it down to 224*224 to pass it to my tensorflow-lite implementation. I do that by using LibYuv library through the Android NDK.
Implementation
Prepare the image frames
//Figure out the source image dimensions
int y_size = srcWidth * srcHeight;
//Get dimensions of the desired output image
int out_size = destWidth * destHeight;
//Generate input frame
i420_input_frame.width = srcWidth;
i420_input_frame.height = srcHeight;
i420_input_frame.data = (uint8_t*) yuvArray;
i420_input_frame.y = i420_input_frame.data;
i420_input_frame.u = i420_input_frame.y + y_size;
i420_input_frame.v = i420_input_frame.u + (y_size / 4);
//Generate output frame
free(i420_output_frame.data);
i420_output_frame.width = destWidth;
i420_output_frame.height = destHeight;
i420_output_frame.data = new unsigned char[out_size * 3 / 2];
i420_output_frame.y = i420_output_frame.data;
i420_output_frame.u = i420_output_frame.y + out_size;
i420_output_frame.v = i420_output_frame.u + (out_size / 4);
I scale my image using Libyuv's I420Scale method
libyuv::FilterMode mode = libyuv::FilterModeEnum::kFilterBox;
jint result = libyuv::I420Scale(i420_input_frame.y, i420_input_frame.width,
i420_input_frame.u, i420_input_frame.width / 2,
i420_input_frame.v, i420_input_frame.width / 2,
i420_input_frame.width, i420_input_frame.height,
i420_output_frame.y, i420_output_frame.width,
i420_output_frame.u, i420_output_frame.width / 2,
i420_output_frame.v, i420_output_frame.width / 2,
i420_output_frame.width, i420_output_frame.height,
mode);
and return it to java
//Create a new byte array to return to the caller in Java
jbyteArray outputArray = env -> NewByteArray(out_size * 3 / 2);
env -> SetByteArrayRegion(outputArray, 0, out_size, (jbyte*) i420_output_frame.y);
env -> SetByteArrayRegion(outputArray, out_size, out_size / 4, (jbyte*) i420_output_frame.u);
env -> SetByteArrayRegion(outputArray, out_size + (out_size / 4), out_size / 4, (jbyte*) i420_output_frame.v);
Actual image :
What it looks like post scaling :
What it looks like if I create an Image from the i420_input_frame without scaling :
Since the scaling messes up the colors big time, tensorflow fails to recognize objects properly. (It recognizes properly in their sample application) What am I doing wrong to mess up the colors big time?
Either I was doing something wrong (Which I couldn't fix) or LibYuv does not handle colors properly while dealing with YUV images from Android.
Refer official bug posted on Libyuv library : https://bugs.chromium.org/p/libyuv/issues/detail?id=815&can=1&q=&sort=-id
They suggested I use a method Android420ToI420() first and then I apply whatever transformations I need. I ended up using Android420ToI420() first, then Scaling, then transformation to RGB. In the end, the output was slightly better than the cup image posted above but the distorted colors were still present. I ended up using OpenCV to shrink the image and convert it to RGBA or RGB formats.
// The camera image received is in YUV YCbCr Format at preview dimensions
// so we will scale it down to 224x224 size using OpenCV
// Y plane (0) non-interleaved => stride == 1; U/V plane interleaved => stride == 2
// Refer : https://developer.android.com/reference/android/graphics/ImageFormat.html#YUV_420_888
val cameraPlaneY = cameraImage.planes[0].buffer
val cameraPlaneUV = cameraImage.planes[1].buffer
// Create a new Mat with OpenCV. One for each plane - Y and UV
val y_mat = Mat(cameraImage.height, cameraImage.width, CvType.CV_8UC1, cameraPlaneY)
val uv_mat = Mat(cameraImage.height / 2, cameraImage.width / 2, CvType.CV_8UC2, cameraPlaneUV)
var mat224 = Mat()
var cvFrameRGBA = Mat()
// Retrieve an RGBA frame from the produced YUV
Imgproc.cvtColorTwoPlane(y_mat, uv_mat, cvFrameRGBA, Imgproc.COLOR_YUV2BGRA_NV21)
//Then use this frame to retrieve all RGB channel data
//Iterate over all pixels and retrieve information of RGB channels
for(rows in 1 until cvFrameRGBA.rows())
for(cols in 1 until cvFrameRGBA.cols()) {
val imageData = cvFrameRGBA.get(rows, cols)
// Type of Mat is 24
// Channels is 4
// Depth is 0
rgbBytes.put(imageData[0].toByte())
rgbBytes.put(imageData[1].toByte())
rgbBytes.put(imageData[2].toByte())
}
The color problem is caused because you are working with a different YUV format. The YUV format that camera frameworks use is YUV NV21. This format (NV21) is the standard picture format on Android camera preview. YUV 4:2:0 planar image, with 8 bit Y samples, followed by interleaved V/U plane with 8bit 2x2 subsampled chroma samples.
If your colors are inversed, it means that:
You are working with a YUV NV12 (plane U is V and V is U).
One of your color planes is doing something weird.
To work properly with libyuv I suggest you to convert your camera output to a YUV I420 using transformI420 method and sending the format by parameter:
return libyuv::ConvertToI420(src, srcSize, //src data
dstY, dstWidth, //dst planes
dstU, dstWidth / 2,
dstV, dstWidth / 2,
cropLeft, cropTop, //crop start
srcWidth, srcHeight, //src dimensions
cropRight - cropLeft, cropBottom - cropTop, //dst dimensions
rotationMode,
libyuv::FOURCC_NV21); //libyuv::FOURCC_NV12
After do this conversion you will be able to properly work with libyuv using all the I420scale, I420rotate... and so on. Your scale method should look like:
JNIEXPORT jint JNICALL
Java_com_aa_project_images_yuv_myJNIcl_scaleI420(JNIEnv *env, jclass type,
jobject srcBufferY,
jobject srcBufferU,
jobject srcBufferV,
jint srcWidth, jint srcHeight,
jobject dstBufferY,
jobject dstBufferU,
jobject dstBufferV,
jint dstWidth, jint dstHeight,
jint filterMode) {
const uint8_t *srcY = static_cast<uint8_t *>(env->GetDirectBufferAddress(srcBufferY));
const uint8_t *srcU = static_cast<uint8_t *>(env->GetDirectBufferAddress(srcBufferU));
const uint8_t *srcV = static_cast<uint8_t *>(env->GetDirectBufferAddress(srcBufferV));
uint8_t *dstY = static_cast<uint8_t *>(env->GetDirectBufferAddress(dstBufferY));
uint8_t *dstU = static_cast<uint8_t *>(env->GetDirectBufferAddress(dstBufferU));
uint8_t *dstV = static_cast<uint8_t *>(env->GetDirectBufferAddress(dstBufferV));
return libyuv::I420Scale(srcY, srcWidth,
srcU, srcWidth / 2,
srcV, srcWidth / 2,
srcWidth, srcHeight,
dstY, dstWidth,
dstU, dstWidth / 2,
dstV, dstWidth / 2,
dstWidth, dstHeight,
static_cast<libyuv::FilterMode>(filterMode));
}
If you want to convert this image to a JPEG after all the process. You can use I420toNV21 method and after that use the android native conversion from YUV to JPEG. Also you can use libJpegTurbo which is a complementary library for this kind of situations.

Crop image data using Renderscript Launch Options

I have an byte[] array of a greyvalue image dataIn with dimension width * height. Now, I want to crop the image by applying a translation (dx, dy) and cut-off the out-of-border regions, so that the dataOut has dimension (width-abs(dx))*(height-abs(dy)).
In RenderScript I would use a 2-d uchar-Allocation both for the input and the output. In order to efficiently apply the crop operation, it was thinking of using the LaunchOptions with (for example) setX(dx,width) and setY(0, height-dy) and apply a trivial kernel that just takes values from a subset of the original dimensions.
However, when using the Launch Options, the out-Allocation still has the original size width*height, i.e. the cropped parts will just be shown as zeros - but, I actually want them to be removed, i.e. out-Allocation be of reduced dimension.
Question: is there a solution in RS in order to perform this cropping job more elegantly? Thanks for your feedback.
UPDATE: I think, I found the solution. It is by defining the out-Allocation as a script global at the reduced dimensions from the outset, passing dx and dy as well as globals and then apply rsSetElementAt_uchar to set the values of the out-Allocation. Will give an udpate later.
So, here is my quick rs-crop tool, taking 5ms for cropping a 500k pixel image. It uses LaunchOptions and reduced dimensions for the cropped output. Should you need to crop a Bitmap, just use element type U8_4 and allocation uchar_4 instead of U8 and uchar, respectively.
The crop.rs file:
#pragma version(1)
#pragma rs java_package_name(com.xxx.yyy)
#pragma rs_fp_relaxed
int32_t width;
int32_t height;
rs_allocation croppedImg;
uint xStart, yStart;
void __attribute__((kernel)) doCrop(uchar in,uint32_t x, uint32_t y) {
rsSetElementAt_uchar(croppedImg,in, x-xStart, y-yStart);
}
The Java part:
// data1 is the byte[] array with (grayvalue) data of size
// width*height you want to crop.
// define crop shift (dx, dy) here
int dx=0; // (-width < dx < width);
int dy=250; // (- height < dy < height);
int xStart=0, xEnd=0;
int yStart=0, yEnd=0;
// x direction
if (dx<0) {
xStart= Math.abs(dx);
xEnd=width;
}
else {
xStart = 0;
xEnd = width - Math.abs(dx);
}
// same for y direction
if (dy<0) {
yStart= Math.abs(dy);
yEnd=height;
}
else {
yStart = 0;
yEnd = height - Math.abs(dy);
}
// initiate rs and crop script
RenderScript rs = RenderScript.create(this);
ScriptC_crop mcropScr=new ScriptC_crop (rs);
// define allocations. Note the reduced size of cropAlloc
Type.Builder typeUCHAR = new Type.Builder(rs, Element.U8(rs));
typeUCHAR.setX(width).setY(height);
inAlloc = Allocation.createTyped(rs, typeUCHAR.create());
inAlloc.copyFrom(data1);
Type.Builder TypeUCHARCropped = new Type.Builder(rs, Element.U8(rs));
TypeUCHARCropped.setX(xEnd-xStart).setY(yEnd-yStart);
Allocation cropAlloc = Allocation.createTyped(rs, TypeUCHARCropped.create());
mcropScr.set_croppedImg(cropAlloc);
mcropScr.set_xStart(xStart);
mcropScr.set_yStart(yStart);
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(xStart, xEnd);
lo.setY(yStart, yEnd);
mcropScr.forEach_doCrop(inAlloc, lo);
byte[] data1_cropped =new byte[(xEnd-xStart)*(yEnd-yStart)];
cropAlloc.copyTo(data1_cropped);
Similar idea to the other answer, but this matches the style of Google's API for Intrinsics:
#pragma version(1)
#pragma rs java_package_name(com.sicariusnoctis.collaborativeintelligence)
#pragma rs_fp_relaxed
rs_allocation input;
uint32_t xStart, yStart;
uchar4 RS_KERNEL crop(uint32_t x, uint32_t y) {
return rsGetElementAt_uchar4(input, x + xStart, y + yStart);
}
To set up:
fun setup(
width: Int, height: Int,
new_width: Int, new_height: Int,
xStart: Int, yStart: Int
) {
val inputType = Type.createXY(rs, Element.RGBA_8888(rs), width, height)
val outputType = Type.createXY(rs, Element.RGBA_8888(rs), new_width, new_height)
inputAllocation = Allocation.createTyped(rs, inputType, Allocation.USAGE_SCRIPT)
outputAllocation = Allocation.createTyped(rs, outputType, Allocation.USAGE_SCRIPT)
crop = ScriptC_crop(rs)
crop._xStart = xStart.toLong()
crop._yStart = yStart.toLong()
crop._input = inputAllocation
}
And to execute:
fun execute(inputArray: ByteArray): ByteArray {
inputAllocation.copyFrom(inputArray)
crop.forEach_crop(outputAllocation)
val outputArray = ByteArray(outputAllocation.bytesSize)
outputAllocation.copyTo(outputArray)
return outputArray
}

Memory allocation for YUV buffer to convert into RGB

I'm facing issue on few android devices while copying the data from DecodeFrame2()
This is my code:
uint8_t* m_yuvData[3];
SBufferInfo yuvDataInfo;
memset(&yuvDataInfo, 0, sizeof(SBufferInfo));
m_yuvData[0] = NULL;
m_yuvData[1] = NULL;
m_yuvData[2] = NULL;
DECODING_STATE decodingState = m_decoder->DecodeFrame2(bufferData, bufferDataSize, m_yuvData, &yuvDataInfo);
if(yuvDataInfo.iBufferStatus == 1)
{
int yStride = yuvDataInfo->UsrData.sSystemBuffer.iStride[0];
int uvStride = yuvDataInfo->UsrData.sSystemBuffer.iStride[1];
uint32_t width = yuvDataInfo->UsrData.sSystemBuffer.iWidth;
uint32_t height = yuvDataInfo->UsrData.sSystemBuffer.iHeight;
size_t yDataSize = (width * height) + (height * yStride);
size_t uvDataSize = (((width * height) / 4) + (height * uvStride));
size_t requiredSize = yDataSize + (2 * uvDataSize);
uint8_t* yuvBufferedData = (uint8_t*)malloc(requiredSize);
// when i move yuvData[0] to another location i am getting crash.
memcpy(yuvBufferedData, yuvData[0], yDataSize);
memcpy(yuvBufferedData + yDataSize, yuvData[1], uvDataSize);
memcpy(yuvBufferedData + yDataSize + uvDataSize, yuvData[2], uvDataSize);
}
The above code snippet is working on high end android devices. but on few android devices after processing first frame, second frame onwards i am getting crash in first memcpy() statement.
What is wrong in this code? and how to calculate the buffer size from the output of DecodeFrame2().
If i process alternative frames(instead of 30, just 15 frames alternative ones),
it is copying fine.
Please help me to fix this?
yDataSize and uvDataSize is very huge based on the above formula,
This issue has been fixed by modifying the size.

MediaCodec different colours on genymotion and huddle 2

My aim:
Use filters (cropping, Black and white, Edge detection) on a MP4 video from the SD card using render script.
Attempted Solutions
-Use MediaCodec to output to a surface directly.
The rendered colour were correct but I could not find a way to process each frame at a time to apply filters using renderscript.
-Copy the decoded buffer from render script and convert to RGB using ScriptIntrinsicYuvToRGB
I can not using ScriptIntrinsicYuvToRGB because it assumes the incoming YUV data is formatted in a different way from the incoming data. (How do I correctly convert YUV colours to RGB in Android?)
-Copy the decoded buffer from render script and convert to RGB using custom code
My current solution converts the YUV to RGB using code very similar to https://stackoverflow.com/a/12702836/601147 (Thanks #Derzu)
/**
* Converts YUV420 NV21 to RGB8888
*
* #param data byte array on YUV420 NV21 format.
* #param width pixels width
* #param height pixels height
* #return a RGB8888 pixels int array. Where each int is a pixels ARGB.
*/
public int[] convertYUV420_NV21toRGB8888(byte [] data, int width, int height) {
int size = width*height;
int offset = size;
int[] pixels = new int[size];
int u, v, y1, y2, y3, y4;
// i percorre os Y and the final pixels
// k percorre os pixles U e V
for(int i=0, k=0; i < size; i+=2, k+=2) {
y1 = data[i]&0xff;
y2 = data[i+1]&0xff;
y3 = data[width+i ]&0xff;
y4 = data[width+i+1]&0xff;
u = data[offset+k ]&0xff;
v = data[offset+k+1]&0xff;
u = u-128;
v = v-128;
// pixels[i] = convertYUVtoRGB(y1, v,u);
// pixels[i+1] = convertYUVtoRGB(y2, v,u);
// pixels[width+i ] = convertYUVtoRGB(y3, v,u);
// pixels[width+i+1] = convertYUVtoRGB(y4, v,u);
pixels[i] = convertYUVtoRGB(y1, u, v);
pixels[i+1] = convertYUVtoRGB(y2, u, v);
pixels[width+i ] = convertYUVtoRGB(y3, u, v);
pixels[width+i+1] = convertYUVtoRGB(y4, u, v);
if (i!=0 && (i+2)%width==0)
i+=width;
}
return pixels;
}
private int convertYUVtoRGB(int y, int u, int v) {
int r,g,b;
r = y + (int)1.402f*v;
g = y - (int)(0.344f*u +0.714f*v);
b = y + (int)1.772f*u;
r = r>255? 255 : r<0 ? 0 : r;
g = g>255? 255 : g<0 ? 0 : g;
b = b>255? 255 : b<0 ? 0 : b;
return 0xff000000 | (b<<16) | (g<<8) | r;
}
The only difference I made was to flip the UV values from
pixels[i] = convertYUVtoRGB(y1, v,u);
to
pixels[i] = convertYUVtoRGB(y1, u, v);
Because the latter one works better.
My problem
The colours look ok-ish on my Huddle 2, but they are totally wrong on the Genymotion JellyBean emulator.
Huddle2
Genymotion
Please can you offer me help and suggestions?
Thanks you very much
You need to check which color format it actually uses - this assumes that the output is NV12, but it looks like your output is I420 or something similar (planar, not semiplanar). Have a look at http://bigflake.com/mediacodec/, in particular https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java (the checkFrame method).
The key element is this:
int colorFormat = format.getInteger(MediaFormat.KEY_COLOR_FORMAT);
The checkFrame method also shows you how to deal with it if it is planar instead of semiplanar, which should work for your case.
Do keep in mind that devices are also allowed to output in proprietary formats, that you can't interpret as simple as this - you need to be ready to handle that case in some way (e.g. telling the user that your app can't handle it, fall back to other SW based implementations, etc).

How to improve OCR Accuracy which use tesseract? [duplicate]

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
fix DPI (if needed) 300 DPI is minimum
fix text size (e.g. 12 pt should be ok)
try to fix text lines (deskew and dewarp text)
try to fix illumination of image (e.g. no dark part of image)
binarize and de-noise image
There is no universal command line that would fit to all cases (sometimes you need to blur and sharpen image). But you can give a try to TEXTCLEANER from Fred's ImageMagick Scripts.
If you are not fan of command line, maybe you can try to use opensource scantailor.sourceforge.net or commercial bookrestorer.
I am by no means an OCR expert. But I this week had the need to convert text out of a jpg.
I started with a colorized, RGB 445x747 pixel jpg.
I immediately tried tesseract on this, and the program converted almost nothing.
I then went into GIMP and did the following.
image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0
I then saved as a new jpg at 100% quality.
Tesseract then was able to extract all the text into a .txt file
Gimp is your friend.
As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:
Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
Converting image to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):
cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.
In case you'd like to check them out, here I'm sharing the links with you:
Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing
Three points to improve the readability of the image:
Resize the image with variable height and width(multiply 0.5 and 1 and 2 with image height and width).
Convert the image to Gray scale format(Black and white).
Remove the noise pixels and make more clear(Filter the image).
Refer below code :
Resize
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
Bitmap temp = (Bitmap)bmp;
Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
double nWidthFactor = (double)temp.Width / (double)newWidth;
double nHeightFactor = (double)temp.Height / (double)newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
Color color1 = new Color();
Color color2 = new Color();
Color color3 = new Color();
Color color4 = new Color();
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.Width; ++x)
{
for (int y = 0; y < bmap.Height; ++y)
{
fr_x = (int)Math.Floor(x * nWidthFactor);
fr_y = (int)Math.Floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= temp.Width) cx = fr_x;
cy = fr_y + 1;
if (cy >= temp.Height) cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = temp.GetPixel(fr_x, fr_y);
color2 = temp.GetPixel(cx, fr_y);
color3 = temp.GetPixel(fr_x, cy);
color4 = temp.GetPixel(cx, cy);
// Blue
bp1 = (byte)(nx * color1.B + fx * color2.B);
bp2 = (byte)(nx * color3.B + fx * color4.B);
nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Green
bp1 = (byte)(nx * color1.G + fx * color2.G);
bp2 = (byte)(nx * color3.G + fx * color4.G);
nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
// Red
bp1 = (byte)(nx * color1.R + fx * color2.R);
bp2 = (byte)(nx * color3.R + fx * color4.R);
nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));
bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
(255, nRed, nGreen, nBlue));
}
}
bmap = SetGrayscale(bmap);
bmap = RemoveNoise(bmap);
return bmap;
}
SetGrayscale
public Bitmap SetGrayscale(Bitmap img)
{
Bitmap temp = (Bitmap)img;
Bitmap bmap = (Bitmap)temp.Clone();
Color c;
for (int i = 0; i < bmap.Width; i++)
{
for (int j = 0; j < bmap.Height; j++)
{
c = bmap.GetPixel(i, j);
byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
}
}
return (Bitmap)bmap.Clone();
}
RemoveNoise
public Bitmap RemoveNoise(Bitmap bmap)
{
for (var x = 0; x < bmap.Width; x++)
{
for (var y = 0; y < bmap.Height; y++)
{
var pixel = bmap.GetPixel(x, y);
if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
bmap.SetPixel(x, y, Color.Black);
else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
bmap.SetPixel(x, y, Color.White);
}
}
return bmap;
}
INPUT IMAGE
OUTPUT IMAGE
This is somewhat ago but it still might be useful.
My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.
Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.
What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project.
http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.
Java version for Sathyaraj's code above:
// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
Bitmap bmap = img.copy(img.getConfig(), true);
double nWidthFactor = (double) img.getWidth() / (double) newWidth;
double nHeightFactor = (double) img.getHeight() / (double) newHeight;
double fx, fy, nx, ny;
int cx, cy, fr_x, fr_y;
int color1;
int color2;
int color3;
int color4;
byte nRed, nGreen, nBlue;
byte bp1, bp2;
for (int x = 0; x < bmap.getWidth(); ++x) {
for (int y = 0; y < bmap.getHeight(); ++y) {
fr_x = (int) Math.floor(x * nWidthFactor);
fr_y = (int) Math.floor(y * nHeightFactor);
cx = fr_x + 1;
if (cx >= img.getWidth())
cx = fr_x;
cy = fr_y + 1;
if (cy >= img.getHeight())
cy = fr_y;
fx = x * nWidthFactor - fr_x;
fy = y * nHeightFactor - fr_y;
nx = 1.0 - fx;
ny = 1.0 - fy;
color1 = img.getPixel(fr_x, fr_y);
color2 = img.getPixel(cx, fr_y);
color3 = img.getPixel(fr_x, cy);
color4 = img.getPixel(cx, cy);
// Blue
bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Green
bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
// Red
bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));
bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
}
}
bmap = setGrayscale(bmap);
bmap = removeNoise(bmap);
return bmap;
}
// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
Bitmap bmap = img.copy(img.getConfig(), true);
int c;
for (int i = 0; i < bmap.getWidth(); i++) {
for (int j = 0; j < bmap.getHeight(); j++) {
c = bmap.getPixel(i, j);
byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
+ .114 * Color.blue(c));
bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
}
}
return bmap;
}
// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
bmap.setPixel(x, y, Color.BLACK);
}
}
}
for (int x = 0; x < bmap.getWidth(); x++) {
for (int y = 0; y < bmap.getHeight(); y++) {
int pixel = bmap.getPixel(x, y);
if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
bmap.setPixel(x, y, Color.WHITE);
}
}
}
return bmap;
}
The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.
To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).
More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:
$ tesseract --oem 1 -l deu page.png result pdf
(this example selects the german language)
Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.
Adaptive thresholding is important if the lighting is uneven across the image.
My preprocessing using GraphicsMagic is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.
Another method of thresholding using OpenCV is described here:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
I did these to get good results out of an image which has not very small text.
Apply blur to the original image.
Apply Adaptive Threshold.
Apply Sharpening effect.
And if the still not getting good results, scale the image to 150% or 200%.
Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.
1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.
2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.
3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.
This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.
Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.
I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.
Please write a comment if you have a suggestion or better idea on this topic to improve it.
https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
you can do noise reduction and then apply thresholding, but that you can you can play around with the configuration of the OCR by changing the --psm and --oem values
try:
--psm 5
--oem 2
you can also look at the following link for further details
here
So far, I've played a lot with tesseract 3.x, 4.x and 5.0.0.
tesseract 4.x and 5.x seem to yield the exact same accuracy.
Sometimes, I get better results with legacy engine (using --oem 0) and sometimes I get better results with LTSM engine --oem 1.
Generally speaking, I get the best results on upscaled images with LTSM engine. The latter is on par with my earlier engine (ABBYY CLI OCR 11 for Linux).
Of course, the traineddata needs to be downloaded from github, since most linux distros will only provide the fast versions.
The trained data that will work for both legacy and LTSM engines can be downloaded at https://github.com/tesseract-ocr/tessdata with some command like the following. Don't forget to download the OSD trained data too.
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/eng.traineddata
curl -L https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata?raw=true -o /usr/share/tesseract/tessdata/osd.traineddata
I've ended up using ImageMagick as my image preprocessor since it's convenient and can easily run scripted. You can install it with yum install ImageMagick or apt install imagemagick depending on your distro flavor.
So here's my oneliner preprocessor that fits most of the stuff I feed to my OCR:
convert my_document.jpg -units PixelsPerInch -respect-parenthesis \( -compress LZW -resample 300 -bordercolor black -border 1 -trim +repage -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \) \( -bordercolor black -border 2 -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 -deskew 40 +repage \) -antialias -sharpen 0x3 preprocessed_my_document.tiff
Basically we:
use TIFF format since tesseract likes it more than JPG (decompressor related, who knows)
use lossless LZW TIFF compression
Resample the image to 300dpi
Use some black magic to remove unwanted colors
Try to rotate the page if rotation can be detected
Antialias the image
Sharpen text
The latter image can than be fed to tesseract with:
tesseract -l eng preprocessed_my_document.tiff - --oem 1 -psm 1
Btw, some years ago I wrote the 'poor man's OCR server' which checks for changed files in a given directory and launches OCR operations on all not already OCRed files. pmocr is compatible with tesseract 3.x-5.x and abbyyocr11.
See the pmocr project on github.

Categories

Resources