I want to programmatically read numbers on a page using a mobile camera instead of from a captured image, just like barcode scanning.
I know that we can read or scan barcodes, but is there any way to read numbers using the same strategy? I also know that we can read text or numbers from an image using OCR, but I don't want to take a photo and then process it; I want to just scan and get the result.
You mean to say that you don't want to take a picture and process it; instead you want to scan text just by hovering the camera over it, am I right?
It could be accomplished using a technology called Optical Character Recognition (you mentioned OCR, so I think this is what you meant). What it does is find patterns in images to detect text in printed documents.
As far as I know, existing tools process still images, so you will have to work around that to make them handle a live camera feed.
Character recognition demands a significant amount of resources, so instead of processing every frame I would recommend writing a program that captures images from the hovering camera less frequently and processes those. Once text, or numbers in your case, is detected, you could use a cheaper pattern-matching algorithm to track the motion of the numbers between captures.
To date, the most powerful and popular software is Tesseract-OCR; you will find it on GitHub. You can use it to develop your mobile application.
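For example, a minimal sketch of this capture-then-recognize approach using the tess-two Android wrapper around Tesseract (the TessBaseAPI calls are tess-two's real API; the setup is simplified and assumes eng.traineddata has already been copied to <datapath>/tessdata/):

import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

public class FrameOcr {
    private final TessBaseAPI tess = new TessBaseAPI();

    public FrameOcr(String datapath) {
        // Expects <datapath>/tessdata/eng.traineddata to exist already.
        tess.init(datapath, "eng");
        // Restrict recognition to digits, since that is all we need here.
        tess.setVariable("tessedit_char_whitelist", "0123456789");
    }

    // Call this on an occasional camera frame rather than on every frame;
    // full OCR is far too expensive to run at 30-60 fps.
    public String readDigits(Bitmap frame) {
        tess.setImage(frame);
        return tess.getUTF8Text();
    }

    public void release() {
        tess.end();
    }
}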
-- Background:
We are working on a device called Run-n-Read which tracks a user's head movements and translates them into corresponding text movement on the screen. The purpose is to help a person read while running on a treadmill or riding in a moving vehicle. You can check out a short video at http://weartrons.com.
We have created a small device containing an accelerometer, a micro-controller, and Bluetooth that sends the head location to the tablet in real time, every ~17 ms, to match the display's 60 fps. We used the Processing IDE to create a basic app with downloaded book pages to test the prototype.
-- PROBLEM:
We would like to run our app in the background and dynamically change the display coordinates of any other app's content on the screen, whether it's an eBook, Twitter, etc. Basically, our algorithms run on the external device and send display coordinates (pixels to move up-down and left-right) about 60 times per second. We would like the Android display origin to move by that many pixels during every frame rendering.
I am an electronics engineer and this is my first stab at writing any piece of software, so please let me know if I was not clear or if the answer is too obvious.
Android as an OS makes sure applications are encapsulated and oblivious to each other. All inter-app communication is done through what are called Intents, which are in the end just messages. You have to know exactly which Intents another app declares, and on top of that you have no assurance that any given app implements the kind of feature you are requesting.
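To make that concrete, pushing your per-frame offsets to another app would look roughly like the sketch below, but every name in it is hypothetical: the action string, package, and extras only do something if the target app declares a matching intent-filter and implements the scrolling itself, which is exactly the problem.

import android.content.Context;
import android.content.Intent;

public final class HeadTrackerBridge {
    // Hypothetical action; a real reader app would have to define and handle it.
    private static final String ACTION = "com.example.reader.ACTION_SCROLL_BY";

    public static void sendOffset(Context context, int dxPixels, int dyPixels) {
        Intent scroll = new Intent(ACTION);
        scroll.setPackage("com.example.reader"); // explicit, hypothetical target
        scroll.putExtra("dx", dxPixels);         // horizontal offset from the device
        scroll.putExtra("dy", dyPixels);         // vertical offset
        // Delivered only if that app registered a BroadcastReceiver for ACTION.
        context.sendBroadcast(scroll);
    }
}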
Therefore I don't think what you want to do (the coordinate change) is possible at all without tinkering with the OS source code and compiling your own version of Android.
I'm trying to detect the numbers on prepaid cards such as Vodafone's. I'm looking for a fast and easy way to do that, so I plan to use FastCV, because it is optimized for ARM with NEON support, and to match the characters against predefined images with simple NCC. I don't need real-time detection; I just want to capture an image and then run the analysis. I would like to know the process for detecting just the characters on the card. Is it simply a threshold, then find contours?
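For reference, the threshold-then-contours pipeline sketched against OpenCV's Java bindings rather than FastCV (the steps are the same; the size-filter values are guesses to be tuned on real card images):

import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class CardDigits {
    // gray: a single-channel image of the card's number area.
    public static List<Rect> characterBoxes(Mat gray) {
        Mat bw = new Mat();
        // Otsu picks the threshold automatically; INV makes dark digits white.
        Imgproc.threshold(gray, bw, 0, 255,
                Imgproc.THRESH_BINARY_INV + Imgproc.THRESH_OTSU);

        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Imgproc.findContours(bw, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        List<Rect> boxes = new ArrayList<Rect>();
        for (MatOfPoint c : contours) {
            Rect r = Imgproc.boundingRect(c);
            if (r.height > 10 && r.width > 5) { // size filter: tune to your images
                boxes.add(r);
            }
        }
        return boxes; // crop each ROI and NCC-match it against your templates
    }
}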
I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:
I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:
Tesseract is large, contains lots of functionality I don't need (i.e. full-text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to look only for digits using tesseract.setVariable("tessedit_char_whitelist", "123456789");
Tesseract often misinterprets a single digit as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:
I have three questions:
Is there any way I can overcome the limitations of Tesseract?
If not, what is a useful, accurate method to detect individual digits (not k-nearest neighbours) that would be feasible to implement on Android? This could be a free library or a DIY solution.
How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...
1. Get some hand-labeled training data.
2. Run Histogram of Oriented Gradients (HOG) on the training data, producing one long, concatenated feature vector per image.
3. Feed each image's HOG features and its label into an SVM.
4. For test data (digits on a Sudoku puzzle), run HOG on the digits, then ask the SVM to classify the HOG features from the Sudoku puzzle.
OpenCV has a HOGDescriptor object, which computes HOG features. Look at this paper for advice on how to tune the HOG feature parameters. Any SVM library should do the job; the CvSVM implementation that comes with OpenCV should be fine.
For training data, I recommend the MNIST handwritten digit database, which has tens of thousands of digit images with ground-truth labels.
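Putting those pieces together, a rough sketch against the OpenCV 2.4-era Java bindings might look like this (window/cell sizes are illustrative, not tuned; see the paper mentioned above for that):

import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.Size;
import org.opencv.ml.CvSVM;
import org.opencv.objdetect.HOGDescriptor;

public class DigitClassifier {
    // 20x20 window, 10x10 blocks sliding by 5, 5x5 cells, 9 orientation bins.
    private final HOGDescriptor hog = new HOGDescriptor(
            new Size(20, 20), new Size(10, 10),
            new Size(5, 5), new Size(5, 5), 9);
    private final CvSVM svm = new CvSVM();

    // One row of HOG features per grayscale digit image resized to 20x20.
    private Mat hogRow(Mat digit20x20) {
        MatOfFloat descriptors = new MatOfFloat();
        hog.compute(digit20x20, descriptors);
        return descriptors.reshape(1, 1); // 1 x N feature row
    }

    // trainImages: pre-sized grayscale digits (e.g. from MNIST);
    // labels: the digit each image shows.
    public void train(Mat[] trainImages, int[] labels) {
        Mat samples = new Mat();
        Mat responses = new Mat(trainImages.length, 1, CvType.CV_32FC1);
        for (int i = 0; i < trainImages.length; i++) {
            samples.push_back(hogRow(trainImages[i]));
            responses.put(i, 0, (float) labels[i]);
        }
        svm.train(samples, responses);
    }

    public int classify(Mat digit20x20) {
        return (int) svm.predict(hogRow(digit20x20));
    }
}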
A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)
The easiest approach is to use normalized central moments for digit recognition.
If you have one font (or very similar fonts), it works well.
See this solution: https://github.com/grzesiu/Sudoku-GUI
The core contains the code responsible for digit recognition, extraction, and moment training.
The first time the application is run, the operator must indicate which number is being seen. The moments of the image (the extracted square ROI) are then assigned to that number (the operator's input). Afterwards, the application works by comparing moments.
The first YouTube video here shows how the application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/
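The general idea, sketched with OpenCV's 2.4-era Java bindings (this is not the code from that repository; nearest-template matching by Euclidean distance is just one simple way to do the comparison):

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import org.opencv.imgproc.Moments;

public class MomentMatcher {
    // Normalized central moments are invariant to translation and scale,
    // which is why they work well for a single known font.
    public static double[] momentVector(Mat binaryDigitRoi) {
        Moments m = Imgproc.moments(binaryDigitRoi, true); // treat input as binary
        return new double[] {
            m.get_nu20(), m.get_nu11(), m.get_nu02(),
            m.get_nu30(), m.get_nu21(), m.get_nu12(), m.get_nu03()
        };
    }

    // templates[i] is the moment vector the operator assigned to digit i;
    // the nearest stored template (Euclidean distance) wins.
    public static int classify(double[] v, double[][] templates) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < templates.length; i++) {
            double d = 0;
            for (int j = 0; j < v.length; j++) {
                double diff = v[j] - templates[i][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
}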
Last week I chose my major project. It is a vision-based system to monitor cyclists in time-trial events as they pass certain points on the course. It should detect the bright yellow race number on a cyclist's back, extract the number from it, and also record the time.
I have done some research and decided to use the Tesseract Android Tools by Robert Theis, called Tess-Two. To speed up recognition I want to use the fact that the number is meant to be extracted from a bright (yellow) rectangle on the cyclist's back, and focus the actual OCR on that region only. I have not found any code or ideas on how to detect geometric figures of a specific color. Thank you for any help, and sorry if I made any mistakes; I am pretty new on this website.
Where are the images coming from? I ask because I was asked to provide technical help for the design of a similar application (we were working with footballers' shirts) and I can tell you that you'll face a few problems:
Use a high quality video feed rather than rely on a couple of digital camera images.
The number will almost certainly be 'curved' or distorted because of the movement of the rider; being able to use a series of images will sometimes allow you to work out what number it really is based on a series of 'false reads'.
Train for the font you're using, but also apply as much logic as you can (if the numbers are always two digits and never start with '9', use this information to help you get the right number).
If you have the luxury of being able to position the camera (we didn't!), I would have thought your ideal spot would be above the rider and looking slightly forward so you can capture their back with the minimum of distortions.
We found that merging several still-frames from the video into one image gave us the best overall image of the number - however, the technology that was used for this was developed by a third-party and they do not want to release it, I'm afraid :(
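As for locating the yellow plate before running the OCR, which the question asked about, one simple approach is HSV color thresholding followed by contour extraction. A sketch with OpenCV's Java bindings; the HSV bounds are guesses you would need to tune on real footage:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class PlateFinder {
    // Returns the bounding box of the largest yellow blob, or null if none.
    public static Rect findYellowPlate(Mat bgrFrame) {
        Mat hsv = new Mat();
        Imgproc.cvtColor(bgrFrame, hsv, Imgproc.COLOR_BGR2HSV);

        // Rough yellow range in OpenCV's HSV (H runs 0..180): tune these.
        Mat mask = new Mat();
        Core.inRange(hsv, new Scalar(20, 100, 100), new Scalar(35, 255, 255), mask);

        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        Rect best = null;
        double bestArea = 0;
        for (MatOfPoint c : contours) {
            double area = Imgproc.contourArea(c);
            if (area > bestArea) {
                bestArea = area;
                best = Imgproc.boundingRect(c);
            }
        }
        return best; // crop this ROI from the frame and hand it to Tess-Two
    }
}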
Good luck!
For my next project I started analyzing apps that measure your pulse via the camera (you press a finger against the camera and get your pulse info).
I concluded that the apps receive data from the camera with the help of the light. How do they achieve this? Can you direct me to any area I should investigate?
If anyone is in the mood to help, could you explain how pulse-measuring apps work? I cannot find ANY documentation on this topic.
Thanks in advance
To complement Robert's answer from a non-programming perspective (and since you asked about it), pulse-measuring apps are based on pulse oximetry.
The idea is to measure the absorbance of red light, which varies as oxygenated blood passes through your fingertip. Each pass produces a peak in absorbance; you only have to count the number of peaks registered and divide by the corresponding time frame to compute the cardiac frequency.
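As a toy illustration of that counting step (real apps filter and smooth the signal first; this sketch just counts upward mean-crossings in a brightness series sampled at a known frame rate):

public class PulseEstimator {
    // samples: average frame brightness over time; fps: camera frame rate.
    public static double beatsPerMinute(double[] samples, double fps) {
        // Use the mean as a crude threshold between peak and valley.
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= samples.length;

        int peaks = 0;
        boolean above = false;
        for (double s : samples) {
            if (!above && s > mean) { peaks++; above = true; } // rising crossing
            else if (s < mean) above = false;
        }

        double seconds = samples.length / fps;
        return peaks / seconds * 60.0; // peaks per second -> beats per minute
    }
}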
IMHO, performing this on a mobile device is not very reliable, since proper pulse oximetry requires good lighting conditions and infra-red light, and several factors make this task very difficult:
1) Some phones may not have the flash LED light right near the camera
2) Some phones may not have a flash light at all
3) You don't have access to infra-red data.
4) The phone has to be absolutely still, or the image will be constantly changing, making the brightness measurement unreliable.
AFAIK such apps use the camera's preview mode.
Using the method setPreviewCallback(..) (and of course startPreview()) you can register your own listener that continuously receives calls from the camera containing the currently seen picture:
onPreviewFrame(byte[] data, Camera camera)
The image data is contained in the data byte array. The format of the data can be set via setPreviewFormat(). Using this data you can, for example, process the image and derive its brightness at a certain point. Over time, the image brightness should show pulses.
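A minimal sketch of that idea using the android.hardware.Camera API described above. NV21, the default preview format, stores one luminance byte per pixel at the start of the buffer, so averaging those bytes gives a whole-frame brightness value:

import android.hardware.Camera;

public class BrightnessMeter implements Camera.PreviewCallback {
    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        Camera.Size size = camera.getParameters().getPreviewSize();
        int pixels = size.width * size.height;
        long sum = 0;
        // NV21: the first width*height bytes are the Y (luminance) plane.
        for (int i = 0; i < pixels; i++) {
            sum += data[i] & 0xFF;
        }
        double brightness = (double) sum / pixels;
        // Append 'brightness' to a time series; the pulses mentioned above
        // show up as periodic peaks in that series.
    }
}
// Usage: camera.setPreviewCallback(new BrightnessMeter());
//        camera.startPreview();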
I don't think the necessary image-processing algorithms are available by default in the Android runtime, so you will have to develop your own or look for third-party libraries that work on Android.