We're currently working on an Android OCR app using OpenCV. The pre-processing, segmentation, and feature extraction steps are done; classification is the remaining step and we're stuck. We're using a DB table filled with each letter's features. At first we had only one feature per letter and used Euclidean distance, but the results weren't accurate, so we extracted more features. The problem now is that we have 7 features per letter and no idea how to classify the input based on them. Some have recommended using kNN, but we can't figure out how, and the OpenCV documentation on that part isn't clear, so any help would be great.
Thanks in advance
Briefly, and without going into the details: a vector space comes in handy here. You need to build a feature vector
<feature1, feature2, feature3, ..., featureN> for each of the instances in your training set.
From each of these images you extract the features that you think, or have read in research articles, are important for image classification. For example you can use the centroid, Gaussian blur responses, histograms, etc.
Once you have these values, a classification algorithm comes into play: kNN, SVM, naive Bayes, etc., which you run on your training set; that is, you build your model.
Once the model is ready, you run it on your test set.
Use cross validation for more comprehensive results.
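For concreteness, here is a minimal sketch of that workflow with OpenCV's built-in kNN (shown in Python; the Java/Android bindings expose the same calls). The dummy arrays are placeholders for the 7-feature vectors and letter labels from your DB table.

```python
import cv2
import numpy as np

# Placeholder training data: replace with the 7 features per letter
# exported from your DB table. Shapes: (N, 7) float32 and (N, 1) int32.
train_features = np.random.rand(260, 7).astype(np.float32)
train_labels = np.repeat(np.arange(26), 10).reshape(-1, 1).astype(np.int32)

knn = cv2.ml.KNearest_create()
knn.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

# Classify one segmented input letter (a single 7-element feature vector).
sample = np.random.rand(1, 7).astype(np.float32)
ret, results, neighbours, dist = knn.findNearest(sample, k=3)
predicted_label = int(results[0, 0])
print("predicted letter id:", predicted_label)
```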
For more details check the course notes:
http://www.inf.ed.ac.uk/teaching/courses/iaml/slides/knn-2x2.pdf
or
http://www.inf.ed.ac.uk/teaching/courses/inf2b/lectureSchedule.html
I would like to add that OpenCV may not have the sort of classifiers you might prefer.
There are several libraries out there, though you may have to see which works best on a mobile platform. Could you give some details on the features you are using?
The simplest KNN (k-nearest neighbors) measure would be to find the Euclidean distance in n dimensions (for an n-dimensional feature vector) between the input sample's features and each of the vectors in your DB table. Also explore Mahalanobis distance (used to measure distance between a point and a dataset/class) if you have multiple classes and the input image is to be classified as one such 'type' or 'class' of image.
As #matcheek mentioned, more sophistication is possible using machine learning techniques such as SVM, neural nets, etc. However, you might first consider something simpler like kNN, since a mobile platform may limit the computational complexity you can afford.
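As a rough illustration of both distance measures (not tied to any particular library), here is a plain NumPy sketch, with dummy data standing in for your stored feature vectors:

```python
import numpy as np

# Dummy data: replace db_vectors/db_labels with your stored letter features.
db_vectors = np.random.rand(100, 7)         # one row per stored letter
db_labels = np.random.randint(0, 5, 100)    # class id for each row
query = np.random.rand(7)                   # features of the input sample

# Euclidean nearest neighbour in 7 dimensions
dists = np.linalg.norm(db_vectors - query, axis=1)
nearest_label = db_labels[np.argmin(dists)]

# Mahalanobis distance of the query to one class (e.g. class 0)
cls = db_vectors[db_labels == 0]
cov_inv = np.linalg.pinv(np.cov(cls, rowvar=False))
diff = query - cls.mean(axis=0)
mahalanobis = np.sqrt(diff @ cov_inv @ diff)
```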
I'm developing an Android application to match two images using ORB feature detection.
The processing and matching logic is called in java using JNI functions.
The problem is that the feature detection works well for some images, but fails for others in certain cases.
Here is an example of an image that fails under some unknown conditions.
After some thought and discussion, I figured out that the problem is a lack of features; that's why the program fails. Someone in the OpenCV community tried this image and got 60 keypoints, none of which survive the RobustMatcher tests.
So I need to enhance the features in this image in order to make the matching work.
In addition to equalizeHist, what can I do ?
I hope you can help me with some suggestions and maybe some examples guys.
One way is to enhance the edges of the image: apply a Laplacian filter, for example, and multiply the result with the original image. This makes the features (edges) more salient. Of course, before everything convert the image to a float type, and at the end normalize your image.
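A minimal OpenCV sketch of that recipe (the file names are placeholders). The answer above multiplies the Laplacian with the original; the sketch below uses the closely related classic variant of subtracting it, which also makes edges more salient.

```python
import cv2
import numpy as np

# Convert to float, compute the Laplacian, combine it with the original,
# then normalize back to 8-bit before feature detection.
img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=3)

# Subtracting the Laplacian sharpens edges (classic Laplacian sharpening).
sharpened = img - lap
sharpened = cv2.normalize(sharpened, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite('enhanced.png', sharpened)
```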
I'm new to the Computer Vision field, so I'm learning from scratch how to generate point clouds from multiple image captures. I'm not implementing any of this in code yet; first I want to learn how the whole process should be done, and then I'll code it.
So far I've learned about feature detection algorithms, mostly SIFT and the remarkably more accurate A-KAZE, which detects many more features in each image and thus generates denser clouds.
Then come the keypoint matching algorithms, mainly Brute Force (BF) and FLANN.
Finally, there should be a process in which you:
- first: recover all the camera orientations
- finally: generate the sparse point cloud.
But until now I've only found examples in OpenCV in which only two images are matched and their matched features are drawn. I'm not able to find any example in which more images are matched and, more importantly, I'm not able to find out how to recover the cameras' orientations and generate point clouds with OpenCV. Please, I need some help on those last stages. If you find any example of multiple image matching or point cloud generation it would be very helpful. Thanks in advance!
OpenMVG has a nice structure-from-motion pipeline example to reconstruct sparse 3D point clouds from SIFT and AKAZE features. It even works without being given any camera intrinsics (focal length, principal point).
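If you want to prototype the last two stages in OpenCV itself before moving to OpenMVG, here is a hedged two-view sketch: recover the relative pose from matched AKAZE features and triangulate a sparse cloud. The intrinsics matrix K and the image paths are made-up placeholders; a real pipeline chains many such pairs and bundle-adjusts them.

```python
import cv2
import numpy as np

img1 = cv2.imread('view1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('view2.jpg', cv2.IMREAD_GRAYSCALE)
# Placeholder intrinsics: focal length 1000 px, principal point (640, 360).
K = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]], dtype=np.float64)

# Detect and match AKAZE features (binary descriptors -> Hamming norm).
akaze = cv2.AKAZE_create()
kp1, des1 = akaze.detectAndCompute(img1, None)
kp2, des2 = akaze.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
pts1 = np.float64([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float64([kp2[m.trainIdx].pt for m in matches])

# Relative camera orientation (R, t) from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate matched points into a sparse 3D cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T   # one 3D point per match
```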
I have been working on an application that involves font recognition based on a user's free-hand drawing of characters on an Android Canvas.
In this application the user is asked to enter some predefined characters in a predefined order (A, a, B, c). Based on this, is there any way to show the font that most closely matches the user's handwriting?
I have researched this topic and found some papers and articles, but most of them recognize the font from a captured image. In that case they run into a lot of problems segmenting paragraphs, individual letters and so on. But in my scenario I know which letter the user is drawing.
I have some knowledge in OpenCV and Machine Learning. Need help on how to proceed with this problem.
It is not exactly clear to me what you want to accomplish with your application, but I assume that you are trying to output the font from a database of fonts that matches a user's handwriting the most.
In Machine Learning this would be a classification problem. The number of classes will be equal to the number of different fonts in your database.
You could solve this with the help of a Convolutional Neural Network (CNN), which is widely used for image- and video-recognition tasks. If you've never implemented a CNN before, I would suggest that you look up these resources to learn about Torch, which is an easy-to-start-with toolkit for implementing CNNs. (Of course there are more frameworks, such as TensorFlow, Caffe, Lasagne, ...)
Torch Homepage
Deep learning with Torch: 60 minutes blitz
Torch Cheatsheet
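For orientation only, here is a minimal CNN sketch in Keras (rather than Torch, but the structure carries over): it maps a grayscale character image to one of num_fonts classes. The image size, layer sizes and num_fonts are illustrative assumptions, not recommendations.

```python
import numpy as np
from tensorflow.keras import layers, models

# num_fonts is a placeholder for the number of fonts in your database.
num_fonts = 50

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),            # grayscale character image
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_fonts, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# x_train: (N, 64, 64, 1) character images, y_train: (N,) font labels.
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)
```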
The main obstacle you will face is that neural networks need a lot of images (often >100,000) to train properly and achieve satisfying results. Furthermore, you do not only need the images but also a correct label for each image. That is, you would need a training image such as a handwritten character and, as its label, the font from your database that it matches most closely.
I would suggest that you read about so-called transfer learning, which can give you an initial boost, as you do not need to set up a CNN model completely by yourself. In addition, people have pre-trained such models on related tasks, so you save extra time because you do not need to train for many hours on a GPU. (see CUDA)
A great resource to start with is the paper: How transferable are features in deep neural networks?, which could be helpful for the stated reasons.
To get tons of training and testing data you can look up the following open datasets that provide all types of characters that can be helpful for your task:
Artificial Characters Data Set
UJI Pen Characters Data Set
The Chars74K dataset
Hand written - Datasets
A New Benchmark Dataset for Handwritten Character Recognition
For access to a lot of fonts and maybe even the possibility to create further datasets on your own you can have a look at Google Fonts.
You might find this article very interesting : https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks/
Seems like a pretty straightforward deep learning supervised learning problem.
Generate a ton of randomly deformed samples for letters of each target font type, and train a convnet on that set?
The ideal would be to have a huge set of labeled handwriting-to-font data, but that feels unlikely.
You could also take a bunch of handwritten samples and progressively transform them to look more like the font of your choice, to use as a dataset.
This is good place to start : https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
Digit/letter recognition with convnets.
This is quite a bit of work though if you haven't worked with that stuff before.
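To make the "randomly deformed samples" idea concrete, here is a small sketch that renders a letter in a candidate font and warps it with a random affine transform; the font path and deformation ranges are placeholder assumptions.

```python
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def deformed_sample(letter, font_path, size=64):
    # Render the letter in the given font on a white canvas.
    font = ImageFont.truetype(font_path, int(size * 0.8))
    canvas = Image.new('L', (size, size), color=255)
    ImageDraw.Draw(canvas).text((size // 8, 0), letter, font=font, fill=0)
    img = np.array(canvas, dtype=np.uint8)

    # Small random rotation, scale and shift to imitate handwriting jitter.
    angle = np.random.uniform(-15, 15)
    scale = np.random.uniform(0.9, 1.1)
    M = cv2.getRotationMatrix2D((size / 2, size / 2), angle, scale)
    M[:, 2] += np.random.uniform(-3, 3, size=2)
    return cv2.warpAffine(img, M, (size, size), borderValue=255)

# Placeholder font path; repeat over letters and fonts to build a dataset.
sample = deformed_sample('A', '/usr/share/fonts/some_font.ttf')
```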
I would suggest using the OCR library Tesseract. It is very well developed and mature. It also has support for training on other languages, which you can use to train on a set of fonts.
Approach
Training:
Take all 26 letter images for each of the n fonts. Train Tesseract over all the A's, then all the B's, and so on.
Testing:
Take a sentence and separate all characters.
For each character, find the certainty score (supported in the library) from Tesseract. Note: for the character 'a', use the model trained on all the 'a's from the different fonts.
Across all characters, find the best font using some metric (average, median, etc.). For example, you can sum the certainty scores each font received over all characters and use the font with the maximum total.
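A rough sketch of that voting scheme using pytesseract, assuming you have already trained one Tesseract model per font (the model names below are hypothetical) and segmented the sentence into per-character images:

```python
import cv2
import pytesseract
from pytesseract import Output

# Hypothetical per-font traineddata names and placeholder character images.
fonts = ['font_arial', 'font_times', 'font_courier']
char_imgs = [cv2.imread('char_0.png'), cv2.imread('char_1.png')]

scores = {f: 0.0 for f in fonts}
for img in char_imgs:
    for f in fonts:
        # --psm 10 treats the image as a single character.
        data = pytesseract.image_to_data(img, lang=f, config='--psm 10',
                                         output_type=Output.DICT)
        confs = [float(c) for c in data['conf'] if float(c) >= 0]
        if confs:
            scores[f] += max(confs)   # certainty score for this character

best_font = max(scores, key=scores.get)
print("best matching font model:", best_font)
```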
I would like to make an Android Application that captures an image and searches it for coins and paper notes and then determines the value of the money in the image.
Additionally, the output of the system will be such that it can be understood by a blind person.
What functions and techniques in openCV would suit these tasks?
What limitations and development hurdles can I expect?
Assuming you already know how to program Android apps, you need to do the following:
Download the OpenCV SDK and set it up with the IDE.
Recognising shapes will be a huge part of your project; see the contour detection example that ships with the SDK samples. Your primary goal will be detecting a circle (a minimal circle-detection sketch follows after these steps). Later you will need to adapt your algorithm according to the currency. This will be of particular interest to you.
Learn the different image processing techniques, like thresholding, for more accurate results. Understanding what a Mat object is and how it can be manipulated is important.
Finally improve the accuracy of your algorithm, as sometimes lighting conditions make the difference between a good review and a dissatisfied user.
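As a starting point for the circle-detection step mentioned above, here is a minimal HoughCircles sketch (the file name and all parameter values are rough placeholders that will need tuning on real photos):

```python
import cv2
import numpy as np

img = cv2.imread('money.jpg')                      # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)                     # reduce noise first

# Detect circular coin candidates; all parameters are rough guesses.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                           param1=100, param2=40, minRadius=15, maxRadius=120)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (int(x), int(y)), int(r), (0, 255, 0), 2)
cv2.imwrite('coins_detected.png', img)
```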
I am thinking of a project for my university; the teachers liked it, but I am not sure if it's even possible.
I am trying to make an Android app.
What I want to do is take a picture of a hand-drawn logic circuit (with AND, OR, NOT, ... gates), recognize the gates, build the circuit on the mobile device, and run it on all possible inputs.
Example of a logic circuit (assume it is hand drawn)
For this I will have to make a simulator on the mobile, which I don't think is the hard part. The problem is how to recognize the gates from a picture.
I found out that there's an edge detection plugin in Java, but I don't think that is enough to recognize the gates. Please share any algorithm, technique, or tools that I can use to build this.
This is actually for my FYP; I can't find any good ideas and have to present this on Thursday.
You will need to do some kind of object recognition. The easiest way (conceptually) to identify gates is to simply do a correlation between the image and a bank of gate templates, or an "alphabet". You run each gate template over the entire image and look for the highest correlation; a high correlation means the region matches the template closely, so you have likely found your gate of interest. Here are a few interesting SO posts:
Simple text reader (OCR) in Matlab
MATLAB Optical character recognition - need help
On its own this could be a daunting task, but you can simplify the problem by adding constraints.
For instance, the user must draw on graph paper and can only have one gate per grid square. This ensures you won't have to check a large variety of sizes for each gate.
If you use graph paper with colored lines (like blue) and the user is only allowed to use a non-blue pen/pencil, you may be able to easily remove the grid when processing the image by filtering out the blue channel, and still have a clean image to work with.
Of course there are more advanced methods than correlation, but as I said before, conceptually this model is very easy to understand. Hope that helps.
Edit: I just realized both my examples are in MATLAB; the important point here is the logic/process used, not the exact code.
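For reference, here is the same correlation idea in OpenCV via template matching; the image paths and the 0.7 threshold are placeholder assumptions, and in practice you would match several templates per gate at a few scales.

```python
import cv2
import numpy as np

# Placeholder paths: the photographed drawing and one gate template.
drawing = cv2.imread('circuit.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('and_gate_template.png', cv2.IMREAD_GRAYSCALE)

# Slide the template over the drawing and keep strongly correlated spots.
result = cv2.matchTemplate(drawing, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(result >= 0.7)
h, w = template.shape
for x, y in zip(xs, ys):
    cv2.rectangle(drawing, (int(x), int(y)), (int(x) + w, int(y) + h), 0, 2)
cv2.imwrite('gates_found.png', drawing)
```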