I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:
I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:
Tesseract is large, contains lots of functionality I don't need (i.e. full text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to only look for digits using tesseract.setVariable("tessedit_char_whitelist", "123456789");
Tesseract often misinterprets a single digit as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:
I have three questions:
Is there any way I can overcome the limitations of Tesseract?
If not, what is a useful, accurate method to detect individual digits (not k-nearest neighbours) that would be feasible to implement on Android? This could be a free library or a DIY solution.
How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...
1. Get some hand-labeled training data.
2. Run Histogram of Oriented Gradients (HOG) on the training data, and produce one long, concatenated feature vector per image.
3. Feed each image's HOG features and its label into an SVM.
4. For test data (digits on a Sudoku puzzle), run HOG on the digits, then ask the SVM to classify the HOG features from the Sudoku puzzle.
OpenCV has a HOGDescriptor object, which computes HOG features. Look at this paper for advice on how to tune your HOG feature parameters. Any SVM library should do the job; the CvSVM stuff that comes with OpenCV should be fine.
For training data, I recommend using the MNIST handwritten digit database, which has thousands of pictures of digits with ground-truth data.
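As a concrete illustration, here is a minimal sketch of that pipeline using OpenCV's Java bindings (the 3.x+ ml.SVM API, which replaces the older CvSVM). The 20x20 digit size and the HOG window/block/cell geometry are assumed values you would tune per the paper above:

    import java.util.List;
    import org.opencv.core.CvType;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfFloat;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.ml.Ml;
    import org.opencv.ml.SVM;
    import org.opencv.objdetect.HOGDescriptor;

    public class HogSvmDigits {
        // 20x20 window, 10x10 blocks, 5x5 stride, 10x10 cells, 9 bins: assumed values
        static final HOGDescriptor HOG = new HOGDescriptor(
                new Size(20, 20), new Size(10, 10), new Size(5, 5), new Size(10, 10), 9);

        // One long, concatenated feature vector per image (step 2 above);
        // grayDigit is an 8-bit grayscale cell image
        static Mat hogFeatures(Mat grayDigit) {
            Mat resized = new Mat();
            Imgproc.resize(grayDigit, resized, new Size(20, 20));
            MatOfFloat descriptor = new MatOfFloat();
            HOG.compute(resized, descriptor);
            return descriptor.reshape(1, 1); // one row per image
        }

        // Step 3: feed each image's HOG features and its label into an SVM
        static SVM train(List<Mat> digits, List<Integer> labels) {
            Mat samples = new Mat();
            Mat responses = new Mat(digits.size(), 1, CvType.CV_32SC1);
            for (int i = 0; i < digits.size(); i++) {
                samples.push_back(hogFeatures(digits.get(i)));
                responses.put(i, 0, labels.get(i));
            }
            SVM svm = SVM.create();
            svm.setType(SVM.C_SVC);
            svm.setKernel(SVM.LINEAR);
            svm.train(samples, Ml.ROW_SAMPLE, responses);
            return svm;
        }

        // Step 4: classify a cell cut out of the Sudoku grid
        static int classify(SVM svm, Mat grayDigit) {
            return (int) svm.predict(hogFeatures(grayDigit));
        }
    }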
A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)
The easiest thing is to use normalized central moments for digit recognition.
If you have one font (or very similar fonts), it works well.
See this solution: https://github.com/grzesiu/Sudoku-GUI
The core contains the parts responsible for digit extraction, moment computation, and training.
The first time the application is run, the operator must indicate which number is shown. The moments of each image (the extracted square ROI) are then assigned to that number (the operator's input). The application is based on comparing moments.
The first YouTube video here shows how the application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/
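A rough sketch of that idea with OpenCV's Java API. Note an assumption: this uses Hu's seven invariants (which are built from the normalized central moments) and a nearest-template comparison, rather than a transcription of that repo's exact feature set:

    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.imgproc.Moments;

    public class MomentMatcher {
        // Hu's seven invariants, derived from the normalized central moments,
        // computed for a binarized digit image
        static double[] huMoments(Mat binaryDigit) {
            Moments m = Imgproc.moments(binaryDigit, true);
            Mat hu = new Mat(); // 7x1, CV_64F
            Imgproc.HuMoments(m, hu);
            double[] values = new double[7];
            hu.get(0, 0, values);
            return values;
        }

        // Return the label of the operator-trained template whose stored moments
        // are closest (squared L2 distance) to the query digit's moments
        static int classify(double[][] templates, int[] labels, Mat digit) {
            double[] h = huMoments(digit);
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < templates.length; i++) {
                double d = 0;
                for (int j = 0; j < 7; j++) {
                    double diff = h[j] - templates[i][j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = labels[i]; }
            }
            return best;
        }
    }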
Related
As part of my project, I need to plot 2D and 3D functions in Android using Android Studio. I know how to plot 2D functions, but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work; I just need to know what I need to learn to be able to do it myself.
Thanks in advance.
Since you already understand 2D and want to advance to 3D there's a simple and non-optimal method:
Decide on how much z depth you desire:
E.g.: currently your limits for x and y in your drawing functions are 1920 and 1080 (or even 4096x4096). If you want to save memory and keep things a bit low-resolution, use a size of 1920x1080x1000. That's going to use 1000x more memory and has the potential to increase the drawing time of some calls by 1000 times, but that's the cost of 3D.
A more practical limit is a matrix of 8192x8192x16384, but be aware that video games at that resolution need 4-6 GB graphics cards to work half decently (more being a bit better), so you'll be chewing up some main memory starting at that size.
It's easy enough to implement a smaller drawing space and simply increase your allocation and limit variables later. Not only does that test that future increases will go smoothly, but it allows everything to run faster while you're ironing out the bugs.
Add a 3rd dimension to the functions:
E.g.: instead of a function that is simply line_draw(x,y), change it to line_draw(x,y,z), and use the global variable z_limit (or whatever you decide to name it) to test that z doesn't exceed the limit.
You'll need to decide whether objects at the maximum distance are a dot or not visible at all. While testing, it is useful to clamp all z values past the limit to the limit value (thus making them a visible dot). For the finished product, once something goes past the limit you are implementing, it's best that it isn't visible.
Start by allocating the drawing buffer and implementing a single function first; there's no point (and possibly great frustration) in changing everything and hoping it's just going to work. It should, but if it doesn't you'll have a lot on your plate if there's a common fault in every function.
Once you have this 3D buffer filled with an image (start with just a few 3D lines, such as a full-screen-sized "X" and "+"), you draw to your 2D screen (x, y) by starting at the largest value of z first (e.g. z=1000). Once you finish that layer, decrement z and continue, drawing each layer until you reach zero, the objects closest to you.
That's the simplest (and slowest) way to make certain that closest objects obscure the furthest objects.
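In code, that back-to-front pass is only a few loops. A minimal sketch, assuming a voxel buffer vox[z][y][x] of ARGB colours (0 meaning empty) and a 2D buffer screen[y][x]; all names are illustrative, not from the text above:

    static void composite(int[][][] vox, int[][] screen, int zLimit) {
        for (int z = zLimit - 1; z >= 0; z--) {       // farthest layer first...
            for (int y = 0; y < screen.length; y++) {
                for (int x = 0; x < screen[0].length; x++) {
                    int c = vox[z][y][x];
                    if (c != 0) screen[y][x] = c;     // ...so nearer layers overwrite
                }
            }
        }
    }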
Now, does it look OK? You don't want distant objects to be the same size as (or larger than) the closest objects; you want to make certain that you scale them with distance.
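A hedged example of that scaling: a simple perspective projection shrinks offsets from the screen centre by a factor based on z (the focal constant is an assumption you would tune to taste):

    // cx, cy = screen centre; a larger 'focal' gives weaker perspective
    static int projectX(int x, int z, int cx, int focal) {
        return cx + (x - cx) * focal / (focal + z);
    }
    static int projectY(int y, int z, int cy, int focal) {
        return cy + (y - cy) * focal / (focal + z);
    }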
The reason to choose numbers such as 8192 is that after writing your functions in C (or whichever language you choose) you'll want to optimize them with several versions each, written in assembly language and optimized for specific CPU and GPU architectures. Without specifically optimized versions, everything will be extremely slow.
I understand that you don't want to use a library, but looking at several should give you an idea of the work involved and how you might implement your own. No need to copy; improve instead.
There are similar questions and answers that might fill in the blanks:
Reddit - "I want to create a 3D engine from scratch. Where do I start?"
Davrous - "Tutorial series: learning how to write a 3D soft engine from scratch in C#, TypeScript or JavaScript"
GameDev.StackExchange - "How to write my own 3-D graphics library for Windows? [closed]"
I want to programmatically read numbers on a page using a mobile phone's camera, just like barcode scanning, instead of from an image.
I know that we can read or scan barcodes, but is there any way to read numbers using the same strategy? Another thing: I also know that we can read text or numbers from an image using OCR, but I don't want to take a photo and then process it; I only want to scan and get the result.
You mean to say that you don't want to click a picture and process it, instead you want to scan text by just hovering the camera, am I right?
It could be accomplished using a technology called Optical Character Recognition (OCR), which I think is what you meant. What it does is find patterns in images to detect text in printed documents.
As far as I know, existing tools process still images, so you will have to work around that to make it scan moving images.
Character recognition demands a significant amount of resources, so instead of processing moving pictures I would recommend writing a program that takes images less frequently from the hovering camera and processes them. Once text, or numbers in your case, is detected, you could use a less expensive pattern-matching algorithm to track the motion of the numbers.
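For example, a minimal sketch of that throttling with the classic android.hardware.Camera preview callback. The one-second interval and the recognizeDigits() stub are assumptions; camera setup and the NV21-to-bitmap conversion are omitted:

    import android.hardware.Camera;

    // A throttled preview callback: run OCR on at most one frame per second
    class ThrottledOcrCallback implements Camera.PreviewCallback {
        private static final long INTERVAL_MS = 1000;
        private long lastRun = 0;

        @Override
        public void onPreviewFrame(byte[] data, Camera camera) {
            long now = System.currentTimeMillis();
            if (now - lastRun < INTERVAL_MS) return; // drop most frames cheaply
            lastRun = now;
            Camera.Size size = camera.getParameters().getPreviewSize();
            recognizeDigits(data, size.width, size.height);
        }

        private void recognizeDigits(byte[] nv21Frame, int width, int height) {
            // decode the NV21 frame and hand it to the OCR engine (e.g. Tesseract)
        }
    }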
To date, the most powerful and popular software is Tesseract-OCR. You will find it on GitHub. You can use it to develop your mobile application.
Last week I chose my major project. It is a vision-based system to monitor cyclists passing certain points on the course in time-trial events. It should detect the bright yellow race number on a cyclist's back, extract the number from it, and also record the time.
I did some research and decided to use the Tesseract Android Tools by Robert Theis, called Tess-Two. To speed up the recognition, I want to use the fact that the number is meant to be extracted from a bright (yellow) rectangle on the cyclist's back, and to focus the actual OCR only on that region. I have not found any piece of code or any ideas on how to detect geometric figures of a specific colour. Thank you for any help, and sorry if I made any mistakes; I am pretty new on this website.
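(For reference, the colour-segmentation step described above might look like the following OpenCV sketch; the HSV range for "bright yellow" and the largest-blob heuristic are assumptions to tune, not code from Tess-Two.)

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Rect;
    import org.opencv.core.Scalar;
    import org.opencv.imgproc.Imgproc;

    // Return the bounding box of the largest bright-yellow blob, so OCR can be
    // restricted to that region; returns null if nothing yellow is found
    static Rect findYellowPlate(Mat bgrFrame) {
        Mat hsv = new Mat();
        Imgproc.cvtColor(bgrFrame, hsv, Imgproc.COLOR_BGR2HSV);
        Mat mask = new Mat();
        // Yellow hues sit roughly around H = 20-35 in OpenCV's 0-179 hue range
        Core.inRange(hsv, new Scalar(20, 100, 100), new Scalar(35, 255, 255), mask);
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        Rect best = null;
        double bestArea = 0;
        for (MatOfPoint c : contours) {
            double area = Imgproc.contourArea(c);
            if (area > bestArea) {
                bestArea = area;
                best = Imgproc.boundingRect(c);
            }
        }
        return best;
    }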
Where are the images coming from? I ask because I was asked to provide some technical help for the design of a similar application (we were working with footballers' shirts), and I can tell you that you'll have a few problems:
Use a high-quality video feed rather than relying on a couple of digital camera images.
The number will almost certainly be 'curved' or distorted because of the movement of the rider; being able to use a series of images will sometimes allow you to work out what number it really is from a series of 'false reads'.
Train for the font you're using, but also apply as much logic as you can (if the numbers are always two digits and never start with '9', use this information to help you get the right number).
If you have the luxury of being able to position the camera (we didn't!), I would have thought your ideal spot would be above the rider and looking slightly forward so you can capture their back with the minimum of distortions.
We found that merging several still-frames from the video into one image gave us the best overall image of the number - however, the technology that was used for this was developed by a third-party and they do not want to release it, I'm afraid :(
Good luck!
We are working on an Android application that involves freehand character recognition.
The application requires the student to draw a freehand image of an alphabet letter on the Android screen; the application then processes the drawn image and returns the accuracy of the letter written.
We are considering two options
a. Using Tesseract.
b. Using our own algorithm on which we are still working
Problems
a. Tesseract is not at all helpful in recognizing freehand characters. Any pointers on how to train Tesseract for this will be highly appreciated.
b. None of our algorithms is working to our expectations.
Tesseract is actually the wrong approach for recognizing characters written to the screen, because it needlessly discards some very valuable information: how the image was originally drawn - how it breaks down into strokes, the order and direction of each stroke, and so on. Tesseract has to spend a tremendous amount of time trying to figure out the underlying structure of each character when your software already knows it.
What you want is an actual "handwriting recognition" library, not just an OCR library like Tesseract; you specifically want an "online" handwriting recognition library, "online" meaning that you can capture the actual sequence of points that the user drew to the screen. There are a number of open-source libraries available for this, for example Unipen and LipiTk.
You might want to check out the built-in support for gesture recognition and training tools:
http://developer.android.com/reference/android/gesture/package-summary.html
They offer a pluggable overlay to detect and recognize gestures using a "dictionary" of predefined shapes. It's not automatic like OCR and requires training, but at this point (it's been available since Android 1.6) there might be free or commercial handwriting gesture dictionaries available.
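A rough sketch of how that fits together, assuming the letter templates were recorded with the SDK's Gestures Builder and shipped as res/raw/alphabet (the resource name and the score threshold are assumptions):

    import android.gesture.Gesture;
    import android.gesture.GestureLibraries;
    import android.gesture.GestureLibrary;
    import android.gesture.GestureOverlayView;
    import android.gesture.Prediction;
    import java.util.ArrayList;

    // Inside your Activity, with a GestureOverlayView named 'overlay' in the layout:
    final GestureLibrary library = GestureLibraries.fromRawResource(this, R.raw.alphabet);
    library.load();

    overlay.addOnGesturePerformedListener(new GestureOverlayView.OnGesturePerformedListener() {
        @Override
        public void onGesturePerformed(GestureOverlayView view, Gesture gesture) {
            ArrayList<Prediction> predictions = library.recognize(gesture);
            // predictions come back sorted best-first; score is a confidence measure
            if (!predictions.isEmpty() && predictions.get(0).score > 2.0) {
                String letter = predictions.get(0).name;
                // compare 'letter' against the expected alphabet and report accuracy
            }
        }
    });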
I am wondering how to do as I said in the title:
I want to count objects by reading an image from the camera of a portable device (such as an iPhone or Android phone).
I need only two specific functions.
Recognize and count the number of objects.
Recognize the colour of each object (so I can count how many of each colour I have).
A very simple example.
I have a stack of LEGO pieces, all of the same dimensions. I know they will always be aligned horizontally; sometimes they are not vertically aligned. I need to count how many of each colour I have.
I know that the pieces all have the same dimensions; only the colour changes.
I have, I think, only 10 colours available.
I can process the image (blur it and so on), but I don't know how to count how many pieces I have.
Can you give me some ideas on how to do this (and what kind of libraries to use for both iOS and Android - Android first), or maybe some publications (free PDFs or books, even paid ones) teaching how to read data from images?
The program should act like this:
I start the program; when the program recognizes (using the integrated camera) that it is looking at some specific objects, it takes a picture and processes it, telling me how many of each colour I have.
Thanks in advance; ANY kind of help will be appreciated.
I'll admit it has been 10 years since I last dabbled in computer vision, but back then I used the OpenCV libraries, and these still seem to be going strong, with support on Android:
http://opencv.willowgarage.com/wiki/Android
and iOS:
http://www.eosgarden.com/en/opensource/opencv-ios/overview/
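To give a flavour of what the OpenCV approach might look like for the counting part: a hedged sketch that counts the bricks of one colour via an HSV threshold and blob counting. The HSV ranges and minimum blob area are assumptions you would calibrate once per colour:

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Scalar;
    import org.opencv.imgproc.Imgproc;

    // Count pieces of one colour: threshold in HSV, then count blobs that are
    // big enough to be a brick. Call once per colour with its own HSV range.
    static int countPieces(Mat bgrImage, Scalar hsvLow, Scalar hsvHigh, double minArea) {
        Mat hsv = new Mat();
        Imgproc.cvtColor(bgrImage, hsv, Imgproc.COLOR_BGR2HSV);
        Mat mask = new Mat();
        Core.inRange(hsv, hsvLow, hsvHigh, mask);
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        int count = 0;
        for (MatOfPoint c : contours) {
            if (Imgproc.contourArea(c) >= minArea) count++;
        }
        return count;
    }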