I'm developing an Android app which uses Tesseract OCR to recognize text. The problem is that on different smartphones the image gets rotated differently: on one it comes out in landscape mode right away, on another in portrait mode. I want to rotate the image intelligently so that Tesseract can recognize the text; recognition is only possible in one of the two orientations, but the picture might arrive in either, depending on how the user takes it. I don't want to force the user to take the picture in the same format every time; I want to rotate it so it fits, if possible without too much of a performance loss.
The Tesseract library's auto-rotate does not seem to work for me here.
Does anybody have an idea how to solve this problem?
Thanks
If this question is still relevant for you: maybe you can extract the EXIF data of the image to get its orientation?
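For example, something along these lines (a minimal sketch using Android's ExifInterface; note that not every camera app actually writes the orientation tag, so check it on your target devices):

import java.io.IOException;
import android.graphics.Bitmap;
import android.graphics.Matrix;
import android.media.ExifInterface;

// Read the EXIF orientation tag and rotate the bitmap to match before OCR.
public static Bitmap rotateByExif(String imagePath, Bitmap bitmap) throws IOException {
    ExifInterface exif = new ExifInterface(imagePath);
    int orientation = exif.getAttributeInt(
            ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);

    int degrees = 0;
    switch (orientation) {
        case ExifInterface.ORIENTATION_ROTATE_90:  degrees = 90;  break;
        case ExifInterface.ORIENTATION_ROTATE_180: degrees = 180; break;
        case ExifInterface.ORIENTATION_ROTATE_270: degrees = 270; break;
    }
    if (degrees == 0) return bitmap; // tag missing or already upright

    Matrix matrix = new Matrix();
    matrix.postRotate(degrees);
    return Bitmap.createBitmap(bitmap, 0, 0,
            bitmap.getWidth(), bitmap.getHeight(), matrix, true);
}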
Otherwise, this paper may help you: Combined Orientation and Script Detection using the Tesseract OCR Engine.
If you don't mind rolling your sleeves up, http://www.leptonica.org/ is probably a good option for evaluating the glyphs (raw Pix that has not been detected as text yet) and determining orientation. I've seen references to Android bindings for Leptonica.
Link to image. Please look at the image to understand the question.
I have a bigger problem to solve, but I have to take a small step first.
Bigger problem statement: In the linked image there is a series of LED lights, each with a digit next to it. I have to read the digit (not the F and R letters) next to whichever LED is currently glowing, and this should happen when you open your Android phone's camera and hold it in front of the device. I understand that this is a much bigger problem and needs a lot of ML and OCR algorithms to solve.
Reduced problem statement: But by myself I have figured out a simple, slightly tricky way of solving the above problem (if possible, please help). What I want to do is process this image using the OpenCV Java library, somehow read those switched-on or switched-off LED lights, and store their brightness values in an array, sequentially from top to bottom, so that I can figure out later whether each LED was on or off. Using this simple logic I can solve the problem for now without going through all those ML, DL, and OCR algorithms, because I have to deliver it within 3 days.
So now the issue is that I am new to OpenCV. So far I have figured out that, using the callback below,
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {}
I can get a frame from the live Android camera, and then I can do some processing over this frame/image. I have also tried many ad-hoc approaches that I found on Stack Overflow, but nowhere could I figure out how to solve my problem.
Is there any way to detect the brightness of all those switched-on/off LEDs and store it in an array, in sequence from top to bottom?
Any help or approach will be appreciated. Thanks in advance.
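A minimal sketch of the reduced approach with OpenCV's Java API (the threshold value and the blob-size filter are assumptions to tune per device; plain thresholding only finds the lit LEDs, so the off ones would have to be inferred from gaps in the known layout):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Find bright blobs (candidate lit LEDs) in a grayscale frame, sort them
// top to bottom, and record each blob's mean brightness.
public static List<double[]> ledBrightnessTopToBottom(Mat gray) {
    Mat bw = new Mat();
    Imgproc.threshold(gray, bw, 200, 255, Imgproc.THRESH_BINARY); // assumed cutoff

    List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
    Imgproc.findContours(bw, contours, new Mat(),
            Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

    // Each entry: { center y, mean brightness of the blob's bounding box }
    List<double[]> leds = new ArrayList<double[]>();
    for (MatOfPoint c : contours) {
        Rect r = Imgproc.boundingRect(c);
        if (r.area() < 20) continue; // skip tiny noise blobs (assumed size)
        double brightness = Core.mean(gray.submat(r)).val[0];
        leds.add(new double[] { r.y + r.height / 2.0, brightness });
    }
    // Sequential from top to bottom, as the question requires.
    Collections.sort(leds, new Comparator<double[]>() {
        public int compare(double[] a, double[] b) {
            return Double.compare(a[0], b[0]);
        }
    });
    return leds;
}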
I'm building an Android app that has to identify, in real time, a mark/pattern which will be on the four corners of a visiting card. I'm using the preview stream of the phone's rear camera as input.
I want to overlay a small circle on the screen where the mark is present. This is similar to how reference dots will be shown on screen by a QR reader at the corner points of the QR code preview.
I'm aware of how to get the frames from the camera using the native Android SDK, but I have no clue about the processing that needs to be done, or about optimizing it for real-time detection. I tried messing around with OpenCV, and there seems to be a bit of lag in its preview frames.
So I'm trying to write a native algorithm using raw pixel values from the frame. Is this advisable? The mark/pattern will always be the same in my case. Please guide me on the algorithm to use to find the pattern.
The image below shows my pattern, along with some details (ratios) about it (the same as the one used in QR codes, but I'm having it at 4 corners instead of 3).
I think one approach is to find black and white pixels in the ratio mentioned below to detect the mark and find the coordinates of its center, but I have no idea how to code it on Android. I'm looking forward to an optimized approach for real-time recognition and display.
Any help is much appreciated! Thanks
Detecting patterns on four corners of a visiting card:
Assuming the background is white, you can simply try this method.
Processing which needs to be done and optimization for real-time detection:
Yes, you need OpenCV.
Here is an example of real-time marker detection on Google Glass using OpenCV.
In this example, the image shown on the tablet is delayed (Bluetooth); the Google Glass preview is much faster than the tablet's, but there is still some lag.
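For the ratio-based idea mentioned in the question, here is a minimal sketch of the run-length test (the same style of check QR finder-pattern detectors use; the 1:1:3:1:1 proportion and the half-module tolerance are assumptions based on the QR-style mark described):

// Check whether five consecutive run lengths (black, white, black, white,
// black) along a scan row are roughly in 1:1:3:1:1 proportion.
public static boolean matchesFinderRatio(int[] runs) {
    if (runs.length != 5) return false;
    int total = 0;
    for (int r : runs) {
        if (r == 0) return false; // a missing run can never match
        total += r;
    }
    double module = total / 7.0;  // 1 + 1 + 3 + 1 + 1 modules
    double tol = module / 2.0;    // assumed tolerance of half a module
    return Math.abs(runs[0] - module) < tol
        && Math.abs(runs[1] - module) < tol
        && Math.abs(runs[2] - 3 * module) < 3 * tol
        && Math.abs(runs[3] - module) < tol
        && Math.abs(runs[4] - module) < tol;
}

A detector would slide this check along rows of the binarized frame, confirm candidates along the column through their center, and take the middle of the center run as the mark's coordinate.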
What I am attempting to do is use EMGU to perform an AbsDiff of two images.
Given the following conditions:
The user starts their webcam and, with the webcam stationary, takes a picture.
The user moves into the frame and takes another picture (the webcam has NOT moved).
AbsDiff works well, but what I'm finding is that the ISO and white-balance adjustments made by certain cameras (even on Android and iPhone) are uncontrollable to a degree.
Therefore, instead of fighting a losing battle, I'd like to attempt some image post-processing to see if I can equalize the two.
I found the following thread but it's not helping me much: How do I equalize contrast & brightness of images using opencv?
Can anyone offer specific details of what functions/methods/approach to take using EMGUCV?
I've tried things like _EqualizeHist(), but this yields very poor results.
Instead of equalizing the histogram of each image individually, I'd like to compare the brightness/contrast values and come up with an average that gets applied to both.
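To make that concrete, here is the kind of normalization I mean, sketched against OpenCV's Java API (the EMGU calls should map directly; matchBrightnessContrast is just my own name for it):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDouble;

// Measure mean/stddev of both grayscale images, then linearly remap each
// one toward the shared average statistics.
public static void matchBrightnessContrast(Mat a, Mat b) {
    MatOfDouble meanA = new MatOfDouble(), stdA = new MatOfDouble();
    MatOfDouble meanB = new MatOfDouble(), stdB = new MatOfDouble();
    Core.meanStdDev(a, meanA, stdA);
    Core.meanStdDev(b, meanB, stdB);

    double targetMean = (meanA.toArray()[0] + meanB.toArray()[0]) / 2.0;
    double targetStd  = (stdA.toArray()[0]  + stdB.toArray()[0])  / 2.0;

    remap(a, meanA.toArray()[0], stdA.toArray()[0], targetMean, targetStd);
    remap(b, meanB.toArray()[0], stdB.toArray()[0], targetMean, targetStd);
}

// dst = (src - mean) * (targetStd / std) + targetMean, applied in place
private static void remap(Mat img, double mean, double std,
                          double targetMean, double targetStd) {
    double alpha = std > 0 ? targetStd / std : 1.0;
    double beta  = targetMean - mean * alpha;
    img.convertTo(img, -1, alpha, beta);
}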
I'm not looking for someone to do the work for me (although a code example would CERTAINLY be appreciated). I'm looking for either exact guidance or some way to point the ship in the right direction.
Thanks for your time.
I'm trying to develop an app for Android, and I need to get uncompressed pictures from the camera at as high a resolution as possible. I tried takePicture's rawCallback and postviewCallback, but they are not working.
Right now I'm trying with OpenCV (version 2.4) using VideoCapture, but I'm stuck at the default 960x720, which is too low for what I need; my phone, a Samsung Galaxy S3, can theoretically provide up to 8 Mpx (3,264×2,448 for pictures and 1,920×1,080 for video, according to Wikipedia). VideoCapture.set(Highgui.CV_CAP_PROP_FRAME_WIDTH/HEIGHT, some number) makes the camera return a black image, as far as I've found.
Is there any way to obtain a higher resolution, either through OpenCV or with the Android API, without compressing?
I'm really sorry if this has been asked before; I have been looking for days and I have found nothing.
Thank you for your time!
EDIT: Although it is not exactly what I was asking, I found that there is a way to do something very similar: if you set a PreviewCallback for the Camera using setPreviewCallback, you do get the raw picture data from the camera (at least on the S3 I'm working with). I leave it here in case somebody finds it useful in the future.
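A minimal sketch of that approach with the android.hardware.Camera API of that era (deprecated since API 21; handleFrame is a hypothetical handler):

import java.io.IOException;
import android.hardware.Camera;
import android.view.SurfaceHolder;

// Preview frames arrive uncompressed, in NV21 format by default.
void startRawPreview(SurfaceHolder holder) throws IOException {
    Camera camera = Camera.open();
    camera.setPreviewDisplay(holder); // most devices need a live preview surface
    camera.setPreviewCallback(new Camera.PreviewCallback() {
        public void onPreviewFrame(byte[] data, Camera cam) {
            // 'data' is the raw NV21 buffer of one frame, sized by the
            // current preview resolution.
            handleFrame(data); // hypothetical handler
        }
    });
    camera.startPreview();
}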
EDIT: A partial solution is explained in an answer below. To sum up,
vc.set(Highgui.CV_CAP_PROP_FRAME_WIDTH, desiredFrameWidth);
vc.set(Highgui.CV_CAP_PROP_FRAME_HEIGHT, desiredFrameHeight);
works under some conditions; please see below for further detail.
You have to get the supported camera preview resolutions by calling getSupportedPreviewSizes.
After this you can set any of those resolutions with the setPreviewSize method. And don't forget to call setParameters at the end. Actually, many OpenCV Android examples contain this information (look at sample3).
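Roughly like this (a sketch of the steps above; picking the largest size is just an example policy):

import android.hardware.Camera;
import java.util.List;

// Query the supported preview sizes, pick one (here simply the largest),
// and commit it with setParameters.
void selectLargestPreviewSize(Camera camera) {
    Camera.Parameters params = camera.getParameters();
    List<Camera.Size> sizes = params.getSupportedPreviewSizes();
    Camera.Size best = sizes.get(0);
    for (Camera.Size s : sizes) {
        if (s.width * s.height > best.width * best.height) {
            best = s;
        }
    }
    params.setPreviewSize(best.width, best.height);
    camera.setParameters(params); // without this call nothing takes effect
}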
In case anybody ever finds this useful, I found a (partial) solution: If your VideoCapture variable is called vc, this should work:
vc.set(Highgui.CV_CAP_PROP_FRAME_WIDTH, desiredFrameWidth);
vc.set(Highgui.CV_CAP_PROP_FRAME_HEIGHT, desiredFrameHeight);
Mind that the combination of width and height must be one of the supported picture sizes for your camera, otherwise you will just get a black image. You can get those through Camera.Parameters.getSupportedPictureSizes().
However, setting a high resolution appears to exceed the YUV conversion buffer's capacity, so I'm still struggling with that. I'm going to open a separate question for that, to keep everything clearer: new thread
setPreviewSize does not set picture resolution. setPictureSize does.
I was hoping someone could tell me why my Tesseract has trouble recognizing some images with digits, and whether there is something I can do about it.
Everything works according to my tests, and since it is only digits I need, I thought I could manage with the English training data, until I had to start on the seven-segment display as well.
However, I am having a lot of trouble with the appended images. I'd like to know whether I should start working on my own recognition algorithms, or whether I could make my own datasets for Tesseract so that it would work. Does anyone know where the limitations of Tesseract lie?
Things tried:
I tried setting psm to one_line, one_word, and one_char (chopping up the picture).
With one_line and one_word there was no significant change.
With one_char it did recognize a bit better, but sometimes, due to the big spacing, it attached an extra number, which then screwed up the result; with the attached image it resulted in '04'.
I have also tried doing the binarization myself; this resulted in poorer recognition and was very resource-consuming.
I have tried inverting the pictures; this makes no difference at all to Tesseract.
I have attached the pictures I need, among others, to be processed.
Explanation of the images:
is an image that Tesseract has no trouble recognizing, though it was made in Word for the convenience of building an app around a working image.
is a real-life image matching image_seven, but Tesseract cannot recognize this one.
is another image I'd like it to recognize; and yes, I know it can't be skewed, and I did deskew it (I think 'skew' is the term here, i.e. 'straightening') when testing.
I know of some options that might help you:
Add extra space between the image border and the text. Tesseract works poorly if the text in the image is positioned right at the edge (see the sketch after this list).
Duplicate your image. For example, if you're performing OCR on the word 'foobar', clone the image and send 'foobar foobar foobar foobar foobar' to Tesseract; the results will be better.
Google for font training and image binarization for Tesseract.
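A minimal sketch of the first two options with OpenCV's Java API (the margin size and the repeat count are assumptions to tune):

import java.util.Arrays;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;

// Pad the word image with a white margin, then tile it horizontally so
// Tesseract sees the same word several times.
public static Mat padAndRepeat(Mat word) {
    Mat padded = new Mat();
    int m = 20; // assumed margin in pixels
    Core.copyMakeBorder(word, padded, m, m, m, m,
            Core.BORDER_CONSTANT, new Scalar(255, 255, 255));

    Mat tiled = new Mat();
    Core.hconcat(Arrays.asList(padded, padded, padded), tiled);
    return tiled;
}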
Keep in mind that the built-in cameras in mobile devices mostly produce low-quality images (blurred, noisy, skewed, etc.). OCR itself is a resource-consuming process, and if you add worthy image preprocessing on top of that, low-end and mid-range mobile devices (which are likely to run Android) could face unexpectedly slow performance or even run out of resources. That's OK for free/study projects, but if you're planning a commercial app, consider using a better SDK.
Have a look at this question for details: OCR for android
Tesseract doesn't do segmentation for you. Tesseract applies a thresholding step to the image prior to the actual recognition algorithm. After thresholding, some edges and artefacts may remain in the image.
Try manually converting your images to black and white and see what Tesseract returns as output.
Try thresholding your images (automatically) and see what Tesseract returns as output. The output of the thresholding may be too poor, causing Tesseract to give bad output.
Your 4th image will probably fail due to thresholding (you have 3 colors: black background, greyish background, and white letters), as the threshold may fall between the black background and the greyish background.
Generally Tesseract wants nice black and white images. Preprocessing of your images may be needed for better results.
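For example, a minimal sketch of automatic thresholding with OpenCV's Java API (Otsu picks the threshold from the histogram itself; whether it separates the greyish background correctly is exactly what you'd want to inspect):

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

// Do the thresholding yourself before handing the image to Tesseract, so
// you can inspect and control the black-and-white result.
public static Mat binarizeForOcr(Mat gray) {
    Mat bw = new Mat();
    Imgproc.threshold(gray, bw, 0, 255,
            Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
    return bw;
}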
For your first image (with the result '04'), look at the box result (each recognized char plus the coordinates of the box that contains it). The '0' may be a small artefact, such as a 4-by-4 blob of pixels.
You may give javaocr a try (http://sourceforge.net/projects/javaocr/; yes, I'm a developer).
There is no official release though, and you will have to build from sources (good news: there is a working Android sample, including sampler, offline trainer, and recognizer applications).
If you only have one font, you can get pretty good results with it (I reached recognition rates of up to 99.96% on digits of the same font).
PS: it is pure Java and uses invariant moments to perform matching (so there are no problems with scaling and rotation). There is also a pretty effective binarisation.
See it in action:
https://play.google.com/store/apps/details?id=de.pribluda.android.ocrcall&feature=search_result#?t=W251bGwsMSwxLDEsImRlLnByaWJsdWRhLmFuZHJvaWQub2NyY2FsbCJd