I'm trying to detect the numbers on prepaid cards such as Vodafone's. I'm looking for a fast and easy way to do this. I plan to use FastCV because it is optimized for ARM with NEON support, and I'd also like to do a simple NCC of the characters against predefined images. I don't need to detect anything in real time; I just want to capture an image and then run the analysis. I would like to know the process for extracting just the characters from the card. Is it simply a threshold, then findContours?
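Something along these lines is what I have in mind; a minimal OpenCV-Java sketch (FastCV exposes equivalent primitives, and the size filter and digit templates here are placeholders I would still have to tune):

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;
    import java.util.ArrayList;
    import java.util.List;

    public class CardDigitSketch {
        // Segment dark characters on a light card: Otsu threshold, then external contours.
        static List<Rect> segmentCharacters(Mat gray) {
            Mat bin = new Mat();
            Imgproc.threshold(gray, bin, 0, 255, Imgproc.THRESH_BINARY_INV | Imgproc.THRESH_OTSU);
            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(bin, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
            List<Rect> boxes = new ArrayList<>();
            for (MatOfPoint c : contours) {
                Rect r = Imgproc.boundingRect(c);
                if (r.height > 10 && r.width > 4) boxes.add(r); // crude size filter, tune per card
            }
            return boxes;
        }

        // NCC of one cropped character against predefined digit templates; returns the best index.
        static int bestTemplateMatch(Mat characterRoi, List<Mat> templates) {
            int bestIndex = -1;
            double bestScore = -1.0;
            for (int i = 0; i < templates.size(); i++) {
                Mat tpl = templates.get(i);
                Mat resized = new Mat();
                Imgproc.resize(characterRoi, resized, tpl.size());
                Mat result = new Mat();
                Imgproc.matchTemplate(resized, tpl, result, Imgproc.TM_CCOEFF_NORMED);
                double score = Core.minMaxLoc(result).maxVal;
                if (score > bestScore) { bestScore = score; bestIndex = i; }
            }
            return bestIndex;
        }
    }

The boxes from segmentCharacters would then be sorted left to right, cropped with gray.submat(r), and fed to bestTemplateMatch one by one.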
I want to programmatically read numbers on a page using the mobile's camera rather than from a captured image, just like barcode scanning.
I know that we can read or scan a barcode, but is there any way to read numbers using the same strategy? I also know that we can read text or numbers from an image using OCR, but I don't want to take a photo/image and then process it; I only want to scan and get the result.
You mean that you don't want to take a picture and process it; instead you want to scan the text just by hovering the camera over it, am I right?
This could be accomplished using a technology called Optical Character Recognition. (You mentioned something about OSR; I think this is what you meant.) What it does is find patterns in images to detect text in printed documents.
As far as I know, existing tools process still images, so you will have to work around that to make them scan moving images.
Character recognition demands a significant amount of resources, so instead of processing a live video stream I would recommend writing a program that grabs frames from the hovering camera at a lower rate and processes those. Once the text, or numbers in your case, is detected, you could use a cheaper pattern-matching algorithm to track the motion of the numbers.
To date, the most powerful and popular software for this is Tesseract-OCR. You will find it on GitHub. You can use it to develop your mobile application.
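If you go the Tesseract route on Android, the tess-two wrapper exposes it as TessBaseAPI. A minimal sketch for one sampled frame (the dataPath argument and the decision to whitelist digits are assumptions for your use case):

    import android.graphics.Bitmap;
    import com.googlecode.tesseract.android.TessBaseAPI;

    public class FrameOcr {
        // dataPath must contain a tessdata/ folder with eng.traineddata already on the device.
        public static String readDigits(Bitmap frame, String dataPath) {
            TessBaseAPI tess = new TessBaseAPI();
            tess.init(dataPath, "eng");
            tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789"); // numbers only
            tess.setImage(frame);      // one still frame sampled from the hovering camera
            String text = tess.getUTF8Text();
            tess.end();
            return text;
        }
    }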
I'm trying to detect certain objects with the camera of an Android device. I've tried the OpenCV people-detection sample using the HOG descriptor, but that seems to be pretty slow. Then I tried Haar cascades, which gave a better average frame rate of 10 fps and better accuracy as well.
After reading a bit more, another viable option seems to be the ORB feature detector. As I understand it, I need to save the feature vectors of the images against which I want to compare the current frame (roughly as in the sketch after these questions). So:
What would be the best way to store these vectors on an Android device?
How big a database of images would I need for decent accuracy (say, for people detection), and what would be the overhead of comparing against a large database?
Also, what limitations does ORB have regarding color differences of objects and distance from the camera?
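For context, this is roughly how I compute and match the descriptors (a sketch assuming the OpenCV 3.x+ Java bindings; the distance cutoff is arbitrary):

    import org.opencv.core.*;
    import org.opencv.features2d.DescriptorMatcher;
    import org.opencv.features2d.ORB;

    public class OrbMatchSketch {
        private final ORB orb = ORB.create();
        private final DescriptorMatcher matcher =
                DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);

        // ORB descriptors for one grayscale image; each row is a 32-byte binary descriptor.
        public Mat describe(Mat gray) {
            MatOfKeyPoint keypoints = new MatOfKeyPoint();
            Mat descriptors = new Mat();
            orb.detectAndCompute(gray, new Mat(), keypoints, descriptors);
            return descriptors;
        }

        // Count "good" matches between the current frame and one stored reference descriptor set.
        public int goodMatches(Mat frameDescriptors, Mat storedDescriptors, double maxDistance) {
            MatOfDMatch matches = new MatOfDMatch();
            matcher.match(frameDescriptors, storedDescriptors, matches);
            int good = 0;
            for (DMatch m : matches.toList()) {
                if (m.distance < maxDistance) good++;
            }
            return good;
        }
    }

For storage, each descriptor Mat is CV_8U, so it can be dumped to a byte[] with descriptors.get(0, 0, buffer), kept in a file or an SQLite BLOB, and rebuilt later with put().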
I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:
I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:
Tesseract is large, contains lots of functionality I don't need (i.e. full text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to look only for digits using tesseract.setVariable("tessedit_char_whitelist", "123456789");
Tesseract often misinterprets a single digit as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:
I have three questions:
Is there any way I can overcome the limitations of Tesseract?
If not, what is a useful, accurate method to detect individual digits (not k-nearest neighbours) that would be feasible to implement on Android? This could be a free library or a DIY solution.
How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...
1. Get some hand-labeled training data.
2. Run Histogram of Oriented Gradients (HOG) on the training data, and produce one long, concatenated feature vector per image.
3. Feed each image's HOG features and its label into an SVM.
4. For test data (digits on a Sudoku puzzle), run HOG on the digits, then ask the SVM to classify the HOG features from the Sudoku puzzle.
OpenCV has a HOGDescriptor object, which computes HOG features. Look at this paper for advice on how to tune your HOG feature parameters. Any SVM library should do the job...the CvSVM stuff that comes with OpenCV should be fine.
For training data, I recommend using the MNIST handwritten digit database, which has thousands of pictures of digits with ground-truth data.
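To make that concrete, here is a rough sketch of the HOG+SVM plumbing with the OpenCV Java bindings (in OpenCV 3.x the SVM class replaces CvSVM; the window/block/cell sizes below are illustrative values for 28x28 MNIST digits, not tuned):

    import org.opencv.core.*;
    import org.opencv.ml.Ml;
    import org.opencv.ml.SVM;
    import org.opencv.objdetect.HOGDescriptor;

    public class HogSvmDigits {
        // 28x28 window (MNIST size), 14x14 blocks, 7x7 block stride and cells, 9 orientation bins.
        private final HOGDescriptor hog = new HOGDescriptor(
                new Size(28, 28), new Size(14, 14), new Size(7, 7), new Size(7, 7), 9);
        private final SVM svm = SVM.create();

        // One HOG feature row per 28x28 grayscale digit image.
        public Mat hogFeatures(Mat digit28x28) {
            MatOfFloat descriptor = new MatOfFloat();
            hog.compute(digit28x28, descriptor);
            return descriptor.reshape(1, 1); // 1 x N row vector
        }

        // trainSamples: one HOG row per training image (CV_32F); labels: one int per row (CV_32S).
        public void train(Mat trainSamples, Mat labels) {
            svm.setType(SVM.C_SVC);
            svm.setKernel(SVM.LINEAR);
            svm.train(trainSamples, Ml.ROW_SAMPLE, labels);
        }

        public int classify(Mat digit28x28) {
            return (int) svm.predict(hogFeatures(digit28x28));
        }
    }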
A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)
The easiest thing is to use normalized central moments for digit recognition.
If you have one font (or very similar fonts), it works well.
See this solution: https://github.com/grzesiu/Sudoku-GUI
In the core package are the parts responsible for digit recognition, extraction, and moment training.
The first time the application is run, the operator must tell it which number is being shown. The moments of the image (the extracted square ROI) are then assigned to that number (the operator's input). The application works by comparing moments.
The first YouTube video here shows how the application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/
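For reference, a minimal sketch of the moment comparison (assuming the OpenCV 3.x Java bindings, where Moments exposes the normalized central moments nu20...nu03 as fields; the trained vectors would be whatever the operator labelled):

    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.imgproc.Moments;

    public class MomentDigits {
        // Normalized central moments of a binarized digit ROI (scale-invariant by construction).
        static double[] momentVector(Mat binaryRoi) {
            Moments m = Imgproc.moments(binaryRoi, true); // treat the image as binary
            return new double[] { m.nu20, m.nu11, m.nu02, m.nu30, m.nu21, m.nu12, m.nu03 };
        }

        // Squared Euclidean distance between two moment vectors; the closest trained digit wins.
        static double distance(double[] a, double[] b) {
            double d = 0;
            for (int i = 0; i < a.length; i++) {
                double diff = a[i] - b[i];
                d += diff * diff;
            }
            return d;
        }
    }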
Last week I chose my major project. It is a vision-based system to monitor cyclists passing certain points on the course in time-trial events. It should detect the bright yellow race number on a cyclist's back, extract the number from it, and also record the time.
I did some research and decided to use the Tesseract Android Tools by Robert Theis, called Tess-Two. To speed up text recognition, I want to use the fact that the number is meant to be extracted from a bright (yellow) rectangle on the cyclist's back, and to focus the actual OCR only on that region. I have not found any code or ideas on how to detect geometric figures of a specific color. Thank you for any help, and sorry if I made any mistakes; I am pretty new on this website.
Where are the images coming from? I ask because I was asked to provide some technical help for the design of a similar application (we were working with footballers' shirts), and I can tell you that you'll have a few problems:
Use a high quality video feed rather than rely on a couple of digital camera images.
The number will almost certainly be 'curved' or distorted because of the movement of the rider, and being able to use a series of images will sometimes allow you to work out what number it really is based on a series of 'false reads'.
Train for the font you're using, but also apply as much logic as you can (if the numbers are always two digits and never start with '9', use this information to help you get the right number).
If you have the luxury of being able to position the camera (we didn't!), I would have thought your ideal spot would be above the rider and looking slightly forward so you can capture their back with the minimum of distortions.
We found that merging several still-frames from the video into one image gave us the best overall image of the number - however, the technology that was used for this was developed by a third-party and they do not want to release it, I'm afraid :(
Good luck!
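As for isolating the bright yellow plate itself (the part the question asks about), a rough HSV-thresholding sketch in OpenCV-Java might look like this; the color bounds are guesses to tune against real footage, and the frame is assumed to be BGR (camera frames in RGBA need a different conversion code):

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;
    import java.util.ArrayList;
    import java.util.List;

    public class YellowPlateFinder {
        // Returns the bounding box of the largest bright-yellow blob, or null if none is found.
        static Rect findYellowRegion(Mat bgrFrame) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(bgrFrame, hsv, Imgproc.COLOR_BGR2HSV);
            Mat mask = new Mat();
            // Rough yellow range on OpenCV's 0-180 hue scale.
            Core.inRange(hsv, new Scalar(20, 100, 100), new Scalar(35, 255, 255), mask);
            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(mask, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
            Rect best = null;
            double bestArea = 0;
            for (MatOfPoint c : contours) {
                double area = Imgproc.contourArea(c);
                if (area > bestArea) { bestArea = area; best = Imgproc.boundingRect(c); }
            }
            return best; // crop the frame to this Rect before handing it to Tess-Two
        }
    }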
I'm building an application for Android devices that requires it to recognize, from accelerometer data, the difference between walking noise and double-tapping the device. I'm trying to solve this problem using neural networks.
At the start it went pretty well: I taught it to distinguish taps from noise such as standing up/sitting down and walking around at a slower pace. But when it came to normal walking, it never seemed to learn, even though I fed it a large proportion of noise data.
My question: Are there any serious flaws in my approach? Is the problem based on lack of data?
The network
I've chosen a 25-input, 1-output multi-layer perceptron, which I am training with backpropagation. The inputs are the changes in acceleration every 20 ms, and the output ranges from -1 (no tap) to 1 (tap). I've tried pretty much every constellation of hidden neurons there is, but had the most luck with 3 to 10.
I'm using Neuroph's easyNeurons for the training and exporting to Java.
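For what it's worth, the training setup boils down to something like this in code (a sketch assuming a recent Neuroph release, where DataSet/DataSetRow replaced the older TrainingSet classes; the hidden-layer size and file name are just examples):

    import org.neuroph.core.data.DataSet;
    import org.neuroph.core.data.DataSetRow;
    import org.neuroph.nnet.MultiLayerPerceptron;
    import org.neuroph.util.TransferFunctionType;

    public class TapNetSketch {
        public static void main(String[] args) {
            // 25 inputs (acceleration changes), one hidden layer of 6 units, 1 output in [-1, 1].
            MultiLayerPerceptron net =
                    new MultiLayerPerceptron(TransferFunctionType.TANH, 25, 6, 1);

            DataSet trainingSet = new DataSet(25, 1);
            double[] tapWindow = new double[25];    // fill with a real 0.5 s window of deltas
            double[] noiseWindow = new double[25];  // likewise for walking noise
            trainingSet.addRow(new DataSetRow(tapWindow, new double[] { 1.0 }));    // double tap
            trainingSet.addRow(new DataSetRow(noiseWindow, new double[] { -1.0 })); // noise

            net.learn(trainingSet);          // backpropagation-based learning rule
            net.save("doubletap.nnet");      // load this file from the Android app later
        }
    }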
The data
My total training data is about 50 double-tap samples and about 3k noise samples. I've also tried training it with noise in proportion to the double taps.
The data looks like this (ranges from +10 to -10):
Sitting double taps:
Fast walking:
So to reiterate my questions: Are there any serious flaws in my approach here? Do I need more data for it to recognize the difference between walking and double tapping? Any other tips?
Update
OK, so after much adjusting we've boiled the essential problem down to being able to recognize double taps while taking a brisk walk. Sitting and regular (indoor) walking we can handle pretty well.
Brisk walk
So this is some test data of me first walking then stopping, standing still, then walking and doing 5 double taps while I'm walking.
If anyone is interested in the raw data, I've linked the latest (brisk walk) data here
Do you insist on using a neural network? If not, here is an idea:
Take a window of 0.5 seconds and consider the area under the curve (or, since your signal is discrete, the sum of the absolute values of each sensor reading -- the red area in the attached image). You will probably find that this sum is high when the user is walking and much, much lower when they are sitting and/or tapping. You can set a threshold above which you consider a given window to have been taken while the user is walking. Alternatively, since you have labelled data, you can train any binary classifier to differentiate between walking and not walking.
You can probably improve your system by considering other features of the signal, such as how jagged the line is. If the phone is sitting on a table, the line will be almost flat. If the user is typing, the line will be kind of flat, and you will see a spike every now and then. If they are walking, you will see something like a sine wave.
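A sketch of the windowed-sum idea (25 samples at 20 ms is roughly the 0.5 s window; the threshold is something you would read off your labelled data):

    public class WalkDetector {
        // Sum of absolute acceleration changes over one window.
        static double windowEnergy(double[] samples, int start, int length) {
            double sum = 0;
            for (int i = start; i < start + length && i < samples.length; i++) {
                sum += Math.abs(samples[i]);
            }
            return sum;
        }

        // Windows above the threshold are treated as "walking"; below it, look for taps.
        static boolean isWalking(double[] samples, int start, double threshold) {
            return windowEnergy(samples, start, 25) > threshold;
        }
    }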
Have you considered that the "fast walking" and "fast walking + double tapping" signals might be too similar to differentiate using only accelerometer data? It may simply not be possible to achieve accuracy above a certain amount.
Otherwise, neural networks are probably a good choice for your data, and it still may be possible to get better performance out of them.
This very useful paper (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) recommends that you whiten your dataset so that it has zero mean and unit covariance.
Also, since your problem is a classification problem, you should make sure you are training your network with a cross-entropy criterion (http://arxiv.org/pdf/1103.0398v1.pdf) rather than RMSE. (I have no idea whether Neuroph supports cross-entropy or not.)
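A minimal sketch of the preprocessing step: per-feature standardization to zero mean and unit variance (full whitening would additionally decorrelate the inputs, e.g. via PCA, but this is the usual first step):

    public class Standardize {
        // Subtract each feature's mean and divide by its standard deviation, in place.
        static void standardizeInPlace(double[][] data) {
            int features = data[0].length;
            for (int j = 0; j < features; j++) {
                double mean = 0, var = 0;
                for (double[] row : data) mean += row[j];
                mean /= data.length;
                for (double[] row : data) var += (row[j] - mean) * (row[j] - mean);
                double std = Math.sqrt(var / data.length) + 1e-12; // guard against division by zero
                for (double[] row : data) row[j] = (row[j] - mean) / std;
            }
        }
    }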
Another relatively simple thing you could try, as other posters suggested, is transforming your data. Using an FFT or DCT to transform your data to the frequency domain is relatively standard for time-series classification.
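If you don't want to pull in an FFT library for 25-sample windows, even a naive O(n^2) DFT is cheap enough; a sketch:

    public class SpectrumFeatures {
        // Magnitude spectrum of one accelerometer window (real input, so keep the first half).
        static double[] magnitudeSpectrum(double[] window) {
            int n = window.length;
            double[] mag = new double[n / 2];
            for (int k = 0; k < mag.length; k++) {
                double re = 0, im = 0;
                for (int t = 0; t < n; t++) {
                    double angle = 2 * Math.PI * k * t / n;
                    re += window[t] * Math.cos(angle);
                    im -= window[t] * Math.sin(angle);
                }
                mag[k] = Math.sqrt(re * re + im * im);
            }
            return mag; // feed these bins to the classifier instead of the raw samples
        }
    }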
You could also try training networks on different sized windows and averaging the results.
If you want to try some more difficult NN architectures, you could look at the Time-Delay-Neural-Network (just google this for the paper), which takes multiple windows into account in its structure. It should be relatively straightforward to use one of the Torch libraries (http://www.torch.ch/) to implement this, but it might be hard to export the network to an Android environment.
Finally, another way of getting better classification performance on time-series data is to consider the relationships between adjacent labels. Conditional Neural Fields (http://code.google.com/p/cnf/ - note: I have never used this code) do this by integrating neural networks into conditional random fields and, depending on the patterns of behavior in your actual data, may do a better job.
What would probably work is to transform the data with a Fourier transform first. Walking has a sinusoid-like signal; your double taps would stand out in the transform result as a different frequency. I guess a neural network can then determine whether the data contains your double taps because it has that extra frequency (the double-tap frequency). Some questions remain:
How long does the data sample need to be?
Can your phone do all the work it needs to do? Does it have enough processing power?
You might even want to consider using the GPU for this.
Another option is to use the Fourier output and some good old Fuzzy Logic.
This sounds like fun...