I have successfully integrated Tesseract into my Android app, and it reads whatever image I capture, but with very low accuracy. Most of the time I do not get the correct text after capturing, because some text around the region of interest also gets captured.
All I want is to read all of the text inside a rectangular area, accurately, without capturing the edges of the rectangle. I have done some research and posted about this on Stack Overflow twice, but still have not gotten a satisfactory result!
Following are the 2 posts that I made:
https://stackoverflow.com/questions/16663504/extract-text-from-a-captured-image?noredirect=1#comment23973954_16663504
Extracting information from captured image in android
I am not sure whether to go ahead with Tesseract or use OpenCV.
Adding to the many links and answers from others, I think it's good to take a step back and note that there are actually two fundamental steps to optical character recognition (OCR):
Text Detection: This is the title and focus of your question, and it is concerned with localizing regions in an image that contain text.
Text Recognition: This is where the actual recognition happens, where the localized image regions from detection get segmented character-by-character and classified. This is also where tools like Tesseract come into play.
Now, there are also two general settings in which OCR is applied:
Controlled: These are images taken from a scanner or similar device, where the target is a document and things like perspective, scale, font, orientation, and background consistency are fairly docile.
Uncontrolled/Scene: These are the more natural and in-the-wild photos, e.g. those taken from a camera, where you are trying to recognize a street sign, shop name, etc.
Tesseract as-is is most applicable to the "controlled" setting. And in general, but for scene OCR especially, "re-training" Tesseract will not directly improve detection, but may improve recognition.
If you are looking to improve scene text detection, see this work; and if you are looking at improving scene text recognition, see this work. Since you asked about detection, the detection reference uses maximally stable extremal regions (MSER), which has a plethora of implementation resources, e.g. see here.
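For a rough idea of what an MSER-based detector looks like, here is a minimal Python/OpenCV sketch; the file name is a placeholder and real pipelines filter the candidates much more aggressively before OCR:

    import cv2

    # Minimal MSER text-candidate detection sketch; "scene.jpg" is a placeholder file name.
    img = cv2.imread("scene.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    mser = cv2.MSER_create()                      # maximally stable extremal regions
    regions, _ = mser.detectRegions(gray)

    # Draw a bounding box around each candidate region; a real pipeline would then
    # filter these (aspect ratio, stroke width, grouping into lines) before OCR.
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

    cv2.imwrite("mser_candidates.jpg", img)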
There's also a text detection project specifically for Android:
https://github.com/dreamdragon/text-detection
As many have noted, keep in mind that recognition is still an open research challenge.
The solution to improving the OCR output is to
either use more training data to train it better
or filter its input using some linear filtering (grayscaling, contrast enhancement, blurring); see the sketch after this list.
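A minimal preprocessing sketch in Python/OpenCV, assuming a hypothetical input file, might look like this (grayscale, light blur, then Otsu thresholding for contrast):

    import cv2

    # Hedged preprocessing sketch before OCR; "page.jpg" is a placeholder file name.
    img = cv2.imread("page.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)             # grayscaling
    gray = cv2.GaussianBlur(gray, (3, 3), 0)                 # light blur to suppress noise
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # high contrast
    cv2.imwrite("preprocessed.png", binary)                  # feed this image to Tesseract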
In the chat we posted a number of links describing filtering techniques used in OCRing, but sample code wasn't posted.
Some of the links posted were
Improving input for OCR
How to train Tesseract
Text enhancement using asymmetric filters <-- this paper is easily found on Google and is worth reading in full, as it clearly illustrates and demonstrates the steps needed before OCR-processing an image.
OCR Classification
Related
I have been working on an application that involves font recognition based on a user's free-hand drawn characters on an Android Canvas.
In this application the user is asked to enter some predefined characters in a predefined order (A, a, B, c). Based on this, is there any way to show the font that most closely matches the user's handwriting?
I have researched this topic and found some papers and articles, but most of them recognize the font from a captured image. In that case they run into a lot of problems segmenting paragraphs, individual letters, and so on. In my scenario, however, I already know which letter the user is drawing.
I have some knowledge of OpenCV and machine learning, and need help on how to proceed with this problem.
It is not exactly clear to me what you want to accomplish with your application, but I assume you are trying to output the font, from a database of fonts, that matches a user's handwriting most closely.
In machine learning this would be a classification problem. The number of classes will be equal to the number of different fonts in your database.
You could solve this with the help of a convolutional neural network (CNN), which is widely used for image- and video-recognition tasks. If you've never implemented a CNN before, I would suggest that you look at these resources to learn about Torch, which is an easy toolkit to get started with for implementing CNNs. (Of course there are other frameworks such as TensorFlow, Caffe, Lasagne, ...) A minimal sketch follows the links below.
Torch Homepage
Deep learning with Torch: 60 minutes blitz
Torch Cheatsheet
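For orientation only, here is a minimal CNN classifier sketch; it uses Keras rather than Torch purely for brevity, and NUM_FONTS is a made-up placeholder for the number of fonts in your database:

    from tensorflow.keras import layers, models

    NUM_FONTS = 10  # placeholder: number of candidate fonts in the database

    # A small CNN that maps a 64x64 grayscale character image to a font class.
    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_FONTS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])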
The main obstacle you will face is that neural networks need many thousands of images (>100,000) to train properly and achieve satisfying results. Furthermore, you need not only the images but also a correct label for each image. That is, you would need a training image such as a handwritten character, with the font from your database that it matches most closely as its label.
I would suggest that you read about so-called transfer learning, which can give you an initial boost, as you do not need to set up a CNN model completely by yourself. In addition, people have pre-trained such models on related tasks, so you save extra time because you do not need to train for many hours on a GPU (see CUDA).
A great resource to start with is the paper: How transferable are features in deep neural networks?, which could be helpful for the stated reasons.
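As a rough illustration of the transfer-learning idea (again in Keras for brevity, with a pre-trained MobileNetV2 backbone chosen only as an example), you freeze the pre-trained features and train just a small new head:

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import MobileNetV2

    NUM_FONTS = 10  # placeholder, as above

    # Reuse a network pre-trained on ImageNet and only train the classification head.
    base = MobileNetV2(input_shape=(96, 96, 3), include_top=False, weights="imagenet")
    base.trainable = False  # freeze the pre-trained feature extractor

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_FONTS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])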
To get tons of training and testing data you can look up the following open datasets that provide all types of characters that can be helpful for your task:
Artificial Characters Data Set
UJI Pen Characters Data Set
The Chars74K dataset
Hand written - Datasets
A New Benchmark Dataset for Handwritten Character Recognition
For access to a lot of fonts and maybe even the possibility to create further datasets on your own you can have a look at Google Fonts.
You might find this article very interesting: https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks/
Seems like a pretty straightforward deep learning supervised learning problem.
Generate a ton of randomly deformed samples for letters of each target font type, and train a convnet on that set?
The ideal would be to have a huge set of labeled, handwriting to font data, but that feels unlikely.
You could also use the generation code to take a bunch of handwritten samples and progressively transform them to look more like the font of your choice, as a dataset.
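A rough sketch of the "randomly deformed samples" idea: render a glyph with a given font file and jitter it with a random affine transform (the font path and parameter ranges are assumptions):

    import numpy as np
    import cv2
    from PIL import Image, ImageDraw, ImageFont

    def render_char(ch, font_path="SomeFont.ttf", size=64):
        # Render a single character on a white canvas; font_path is a placeholder.
        font = ImageFont.truetype(font_path, size)
        canvas = Image.new("L", (96, 96), color=255)
        ImageDraw.Draw(canvas).text((16, 16), ch, fill=0, font=font)
        return np.array(canvas)

    def random_deform(img, max_shift=5, max_angle=10):
        # Apply a small random rotation and translation to fake handwriting variation.
        h, w = img.shape
        angle = np.random.uniform(-max_angle, max_angle)
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        m[:, 2] += np.random.uniform(-max_shift, max_shift, size=2)
        return cv2.warpAffine(img, m, (w, h), borderValue=255)

    sample = random_deform(render_char("A"))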
This is a good place to start: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
Digit recognition with convnets.
This is quite a bit of work though if you haven't worked with that stuff before.
I would suggest using the OCR library Tesseract. It is very well developed and mature. It also has support for training with other languages, which you can use to train over a set of fonts.
Approach
Training:
Take images of all 26 letters for each of the n fonts. Train Tesseract over all of the A's, then all of the B's, and so on.
Testing:
Take a sentence and separate all characters.
For each character, find the certainty score (supported in the library) from Tesseract. Note: for the character 'a', use the model trained on all the 'a's from the different fonts.
Across all characters, find the best font using some metric (average, median, etc.). For example, you can sum the certainty score each font received over all characters and use the font with the maximum total.
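A hedged sketch of that voting scheme using pytesseract on the desktop; the per-font language names ("font1", "font2", ...) are hypothetical traineddata files you would have produced in the training step:

    import pytesseract
    from pytesseract import Output

    FONT_MODELS = ["font1", "font2", "font3"]   # placeholder traineddata names

    def best_font(char_images):
        # char_images: one cropped image per segmented character.
        scores = {f: 0.0 for f in FONT_MODELS}
        for img in char_images:
            for font in FONT_MODELS:
                data = pytesseract.image_to_data(img, lang=font, config="--psm 10",
                                                 output_type=Output.DICT)
                confs = [float(c) for c in data["conf"] if float(c) >= 0]
                if confs:
                    scores[font] += max(confs)  # accumulate certainty per font
        # The font with the largest summed certainty wins.
        return max(scores, key=scores.get)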
I would like to make an Android Application that captures an image and searches it for coins and paper notes and then determines the value of the money in the image.
Additionally, the output of the system will be such that it can be understood by a blind person.
What functions and techniques in openCV would suit these tasks?
What limitations and development hurdles can I expect?
Assuming you already know how to program Android apps, you need to do the following:
Download the OpenCV SDK and set it up with the IDE.
Recognising shapes will be a huge part of your project; see the contour detection example that ships with the SDK samples. Your primary goal will be detecting a circle (see the sketch after this list). Later you will need to adapt your algorithm to the particular currency. This will be of particular interest to you.
Learn the different image processing techniques, like thresholding, for more accurate results. Understanding what a Mat object is and how it can be manipulated is important.
Finally, improve the accuracy of your algorithm; sometimes lighting conditions make the difference between a good review and a dissatisfied user.
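For the circle-detection step mentioned above, a minimal Python/OpenCV sketch with a Hough transform could look like this (the file name and all parameters are rough assumptions that need tuning for real images):

    import cv2

    img = cv2.imread("coins.jpg")                     # placeholder file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                    # blurring reduces false circles

    # Detect coin-like circles; parameters must be tuned to image size and lighting.
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=100, param2=50, minRadius=20, maxRadius=120)
    if circles is not None:
        for x, y, r in circles[0].astype(int):
            cv2.circle(img, (x, y), r, (0, 255, 0), 2)
    cv2.imwrite("coins_detected.jpg", img)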
Right now I'm making an OCR app using rmtheis's android-ocr (https://github.com/rmtheis/android-ocr) as a scaffold.
However, I'm far from satisfied.
My main issues:
It returns only a fraction of the characters correctly when scanning ultra-tiny text in a few different fonts, with irrelevant clutter such as images and barcodes included (from a product manual in this case).
The autofocus loops in such a way that if you press the camera button when the image is at its sharpest, you might be 0.02 seconds too late and scan a blurry image. So I would prefer an auto-capture when text appears in the selected view.
Are there any high-quality OCR solutions for Android that can capture tricky, small text and get it all correct almost every time?
Just to clarify: I already use Tesseract (tess-two) through the android-ocr project.
On another note: it needs to return close to 100% correct results almost every time. No language support is required; I'm only going to use it to catch codes such as 842EAB842EAB842EAB84?2EAB842EAB842EAB with irrelevant English text beside them. Therefore, I need no language support at all.
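Since the target is a fixed alphabet, restricting Tesseract to just those characters usually helps a lot for codes like this; here is a minimal sketch with pytesseract on the desktop (the crop file name is a placeholder, tess-two exposes an equivalent variable-setting call on Android, and the whitelist is honored most reliably by Tesseract's legacy engine):

    import pytesseract
    from PIL import Image

    # Restrict recognition to the code's character set; "code_crop.png" is a placeholder.
    config = ("--psm 7 "                                      # treat the image as one text line
              "-c tessedit_char_whitelist=0123456789ABCDEF")  # only the code's characters
    text = pytesseract.image_to_string(Image.open("code_crop.png"), config=config)
    print(text.strip())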
Edit: This seems to be what I'm looking for:
http://www.abbyy.com/mobileocr/features/
Is there any solution with even higher quality output than Abbyy?
I've also been researching high-quality, free OCR solutions for Android, and I finally chose the tess-two fork in one of my projects because the other options had more disadvantages than advantages. As @realkarim says, it's not 100% accurate, but the results are good enough.
Link for an OCR example using tess-two
Try it and let us know your experience ;)
Well, a year ago I was planning to create an Android application in which I needed OCR. First of all, I'm sorry to say it, but you won't find a free "high quality OCR solution for Android" :/ I used tess-two, which is the best free OCR available for Android, but it still wasn't 100% accurate; if I had had more time I probably could have added some image processing to enhance the output.
link for the OCR:
https://github.com/rmtheis/tess-two
an example of a running application using it:
http://www.youtube.com/watch?v=Ho5DyK1hKdw
my application:
http://www.youtube.com/watch?v=2PRQo7EWYd8
try it, and add some image processing to the image before using the OCR if you can :)
I am developing a currency identification system for blind people. I need to check whether the full currency note has been captured, so I used square detection for that. It currently works when the background is pure black or white, but not when the background is more complex. What techniques can I use to solve this problem?
I am using OpenCV as my image processing framework. Can I use convolution? How?
I need to enhance the square detection.
Result image of my code:
I am not sure whether rectangle detection is the best solution for what you want to do.
It will only work reliably if the picture is taken straight on from above the money and, as you say, it will not be robust to cluttered backgrounds.
Is there a particular reason for not going with a direct pattern-recognition system?
I'd start with a picture of my currency and try to perform object recognition with it.
You will find loads of tutorials that can help you on the web, like for bottles or for bowls.
You might have many classes to handle, due to the number of currency denominations, but at least you know it is a finite number.
I want to develop an app which will recognize an object (like a monument or something) in front of the camera using OpenCV, and then show information about it.
So the question is: how do I recognize an object's shape (like a monument or something), or compare it to other images, with OpenCV?
And what is the best method for doing this?
It would be good if there were some samples or tutorials for object detection and comparison.
Thank you.
The best method for what you ask is using local feature detectors such as OpenCV's SIFT, SURF, and ORB.
You need at least one picture of the object you want to detect. Afterwards, those algorithms can compare that image with other images to see if they are similar enough.
Here is the Documentation for the algorithms.
ORB and others:
http://docs.opencv.org/modules/features2d/doc/feature_detection_and_description.html
SURF and SIFT ('nonfree'):
http://docs.opencv.org/modules/nonfree/doc/feature_detection.html
The way these algorithms work for that task is by selecting interesting points for each image, and compare them to see if they match. If several matches are found, it is most likely the images have the same object.
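As a rough illustration of that matching idea, here is a minimal ORB sketch in Python/OpenCV (file names and the distance threshold are assumptions):

    import cv2

    # One reference photo of the object and one query photo; both names are placeholders.
    ref = cv2.imread("monument_ref.jpg", cv2.IMREAD_GRAYSCALE)
    query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(ref, None)
    kp2, des2 = orb.detectAndCompute(query, None)

    # Brute-force Hamming matching with cross-checking, then keep only close matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    good = [m for m in matches if m.distance < 40]   # threshold is an assumption

    # Many good matches suggest the query image contains the reference object.
    print(len(good), "good matches")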
Tutorials (from Feature Detection and below):
http://docs.opencv.org/doc/tutorials/features2d/table_of_content_features2d/table_of_content_features2d.html
You can also find C++ samples related to this topic here (samples are also within OpenCV download package):
eg. "matching_to_many_images.cpp"
"video_homography.cpp"
http://code.opencv.org/projects/opencv/repository/revisions/master/show/samples/cpp
And Android Java samples here (unrelated but also helpful):
http://code.opencv.org/projects/opencv/repository/revisions/master/show/samples/android
Or Python samples which are actually the more updated ones for this topic (at the time this post was written):
http://code.opencv.org/projects/opencv/repository/revisions/master/show/samples/python2
As a final note, as @BDFun said in the comments, this is not trivial to do.
One more thing: if you want an overview of OpenCV feature detection and description, check this post.