I'm working on an application for handwriting recognition, i.e., the user draws a character on the screen, and the corresponding letter is then generated automatically.
The approach I'm taking (after reading from here) is to first train the system. During training, I store the Path (android.graphics.Path) values. Thanks to this, this and this, I know how to serialize the Path values and store them in a database during the training phase.
Now, when comparing the stored values with what the user has drawn as a character, I can increase the accuracy by calculating the centroid and the number of strokes of every character, to differentiate between cases like b and d.
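Roughly, what I mean by centroid and stroke count is something like this (a Python-style sketch with names of my own choosing; it assumes I also keep the raw touch points for each stroke alongside the Path, each character being a list of strokes and each stroke a list of (x, y) points):

    def character_features(strokes):
        # strokes: list of strokes, each stroke a list of (x, y) touch points
        points = [p for stroke in strokes for p in stroke]
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        # Normalise the centroid by the bounding box so it is comparable
        # across characters drawn at different sizes and positions
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        w = max(xs) - min(xs)
        h = max(ys) - min(ys)
        rel_cx = (cx - min(xs)) / w if w else 0.5
        rel_cy = (cy - min(ys)) / h if h else 0.5
        return {"centroid": (rel_cx, rel_cy), "stroke_count": len(strokes)}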
The problem I'm facing is this:
The three a's written on the screen are all different, and will each generate a different Path value. How do I compare such characters?
Now, I know there are many questions on SO about handwriting recognition. But since I have already settled on an approach, I don't think those questions are relevant to me. That said, if anyone has a better or easier solution for handwriting recognition, feel free to suggest it. :-)
Related
I have a question about harakah in Arabic: the result I get from ML Kit doesn't include Arabic with harakah in its text, only Arabic without it.
I'm building a feature for my Flutter app that needs to recognize Arabic words from gestures the user paints, so I'm using google_mlkit_digital_ink_recognition. The example works great, and I downloaded the Arabic model, but the problem is that it cannot read the Arabic words with harakah that I input (paint) on the screen. I tried reading the documentation and searching for this, but no dice.
The result I get is only Arabic without harakah, and the feature I plan to build needs it. The backend will send a text (1 to 4 Arabic letters, with or without harakah), the user has to paint it on the screen, ML Kit recognizes the strokes the user made, and then the app decides whether what the user painted is correct or not; that is what I plan to build. Is there a way for ML Kit digital ink to input and output Arabic with harakah using this plugin? Is there an alternative or a better way to handle this? Do I need to build and train a custom model for this? Or am I missing something that needs to be done first?
There are some apps on Google Play that are a bit similar. I think I could achieve a similar result to those apps with a clipper and Path, but they don't have harakah either, my app needs to know whether the input is correct or not, and the text I need to match comes from the backend, so it may vary and be more complicated. Another alternative I thought of: instead of text, the backend sends me an image, I use Scribble to generate an image of what the user paints, then match it against the backend image with image_compare. But I'm not sure about that, since I haven't tried it yet; the chance of matching them correctly might be low, and it would burden the other team, since they would need to make the images one by one instead of just sending text. Right now, the fastest way I can think of is using ML Kit, since I need to work on this feature late this month or early next month. I hope you guys can help. Thanks.
I have been working on an application that involves font recognition based on a user's freehand drawing of characters on an Android Canvas.
In this application the user is asked to enter some predefined characters in a predefined order (A, a, B, c). Based on this, is there any way to show the font that most closely matches the user's handwriting?
I have researched this topic and found some papers and articles, but most of them recognize the font from a captured image. In that case they run into a lot of problems segmenting paragraphs, individual letters and so on. But in my scenario I know which letter the user is drawing.
I have some knowledge of OpenCV and machine learning. I need help on how to proceed with this problem.
It is not exactly clear to me what you want to accomplish with your application, but I assume that you are trying to output the font from a database of fonts that matches a user's handwriting most closely.
In machine learning this would be a classification problem, where the number of classes is equal to the number of different fonts in your database.
You could solve this with the help of a convolutional neural network (CNN), which is widely used for image- and video-recognition tasks. If you've never implemented a CNN before, I would suggest you look at these resources to learn about Torch, an easy-to-start-with toolkit for implementing CNNs. (Of course there are more frameworks, such as TensorFlow, Caffe, Lasagne, ...)
Torch Homepage
Deep learning with Torch: 60 minutes blitz
Torch Cheatsheet
The main obstacle you will face is that neural networks need a lot of images (>100,000) to train properly and achieve satisfying results. Furthermore, you need not only the images but also a correct label for each one. That is, you would need a training image, such as a handwritten character, labelled with the font from your database that it matches most closely.
I would suggest that you read about so-called transfer learning, which can give you an initial boost, since you do not need to set up a CNN model completely by yourself. In addition, people have pre-trained such models for related tasks, so you save extra time because you do not need to train them for many hours on a GPU (see CUDA).
A great resource to start with is the paper "How transferable are features in deep neural networks?", which could be helpful for the reasons stated above.
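To make the transfer-learning idea concrete, here is a minimal sketch in Python with Keras (my choice here; the links above are for Torch). It assumes character images resized to 96x96 RGB and one label per image giving the closest-matching font; n_fonts and the training arrays are placeholders:

    from tensorflow import keras

    n_fonts = 50  # placeholder: number of fonts in your database

    # Pre-trained convolutional base, used as a fixed feature extractor
    base = keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), include_top=False, weights="imagenet", pooling="avg")
    base.trainable = False

    # Small classification head trained on (character image, font label) pairs
    model = keras.Sequential([
        base,
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(n_fonts, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_font_labels, validation_split=0.1, epochs=10)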
To get plenty of training and testing data, you can look at the following open datasets, which provide all kinds of characters that could be helpful for your task:
Artificial Characters Data Set
UJI Pen Characters Data Set
The Chars74K dataset
Hand written - Datasets
A New Benchmark Dataset for Handwritten Character Recognition
For access to a lot of fonts, and maybe even the possibility to create further datasets on your own, you can have a look at Google Fonts.
You might find this article very interesting : https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks/
This seems like a pretty straightforward supervised deep-learning problem.
Generate a ton of randomly deformed samples for letters of each target font type, and train a convnet on that set?
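For example, something like this could generate the deformed samples (Python with Pillow; the font path, sizes and deformation ranges are just placeholders):

    import random
    from PIL import Image, ImageDraw, ImageFont

    def deformed_sample(char, font_path, size=64):
        # Render the character in the given font on a blank greyscale canvas
        img = Image.new("L", (size, size), color=0)
        font = ImageFont.truetype(font_path, int(size * 0.75))
        ImageDraw.Draw(img).text((size // 8, size // 8), char, fill=255, font=font)
        # Apply a small random rotation and translation to imitate handwriting variation
        img = img.rotate(random.uniform(-15, 15),
                         translate=(random.randint(-4, 4), random.randint(-4, 4)))
        return img

    # samples = [deformed_sample("A", "fonts/SomeFont.ttf") for _ in range(1000)]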
The ideal would be to have a huge set of labelled handwriting-to-font data, but that feels unlikely to exist.
You could also go the other way: take a bunch of handwritten samples, progressively transform them to look more like the font of your choice, and use that as a dataset.
This is a good place to start: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
Digit recognition with convnets.
It is quite a bit of work, though, if you haven't worked with this stuff before.
I would suggest using the OCR library Tesseract. It is very well developed and mature. It also has support for training with other languages, which you can use to train it over a set of fonts.
Approach
Training:
Take images of all 26 letters for each of your n fonts. Train Tesseract over all the A's, then all the B's, and so on.
Testing:
Take a sentence and separate it into individual characters.
For each character, get the certainty score (supported by the library) from Tesseract. Note: for the character 'a', use the model trained on the 'a's from the different fonts.
Across all characters, find the best font using some metric (average, median, etc.). For example, you can sum the certainty scores each font received over all the characters and pick the font with the highest total, as sketched below.
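A rough sketch of that scoring step (Python with pytesseract; the per-font model names are hypothetical and assume you have trained one Tesseract model per font as described above):

    import pytesseract

    fonts = ["font_arial", "font_times"]  # hypothetical names of per-font trained models

    def best_font(char_images):
        # char_images: list of PIL images, one per segmented character
        totals = {f: 0.0 for f in fonts}
        for img in char_images:
            for f in fonts:
                data = pytesseract.image_to_data(img, lang=f, config="--psm 10",
                                                 output_type=pytesseract.Output.DICT)
                confs = [float(c) for c in data["conf"] if float(c) >= 0]
                if confs:
                    totals[f] += max(confs)  # certainty for this character under font f
        return max(totals, key=totals.get)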
I'm developing an Android application that recognizes accelerometer gestures. For now I'm just using dynamic time warping (DTW) to get the smallest distance between the input gesture and about 200 unique gestures stored in a database. My application loops through the data and compares the input gesture with the gesture data in the database one by one. It finds the smallest distance and recognizes the gesture in about 5 seconds on average. The problem is: can I speed up the recognition time to maybe half a second or less? Do I have to use a classification method like kNN and combine it with the DTW method? An example or references would be appreciated.
What you are currently doing is 1-NN. In other words, you are already running the simplest possible kNN method, with K=1. Changing K won't speed anything up; it can only change the quality of the result. To speed up the process you can think about two approaches:
Using indexing methods, which reduce the computational complexity of your distance-based search. This problem is called nearest neighbour search (NNS), and even Wikipedia provides quite a lot of information regarding its speed-ups;
Using a completely different classification method, which builds a much simpler model (possibly an SVM or even some decision tree; it depends on your actual data).
My intuition is that locality-sensitive hashing (LSH) could be applied quite easily. For instance, you could design the hashes by picking K points randomly and checking whether the time series is not too "far" away from them.
I would go into more detail on that idea, but instead I found this paper: http://dtai.cs.kuleuven.be/events/MLSA13/papers/mlsa13_submission_13.pdf , and it seems to use a much simpler LSH function.
So this is one way out; I hope it works for you. You can also implement a simple classifier and accept its answer when it is very certain about the gesture (I would recommend an SVM here, as in the answer above), and fall back to looking for the closest neighbour when the sample is close to the decision boundary.
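For reference, the plain 1-NN + DTW loop you are running looks roughly like this (a minimal Python/NumPy sketch, not optimised; the speed-up ideas above essentially amount to avoiding the dtw_distance call for most of the 200 templates):

    import numpy as np

    def dtw_distance(a, b):
        # a, b: arrays of accelerometer samples (1-D, or (n, 3) for x/y/z)
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    def recognize(query, templates):
        # templates: list of (label, samples) pairs; brute-force 1-NN over DTW distance
        best_label, best_dist = None, np.inf
        for label, t in templates:
            d = dtw_distance(query, t)
            if d < best_dist:
                best_label, best_dist = label, d
        return best_label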
You can do DTW at 10,000 Hz, even on a phone; see this video:
http://www.youtube.com/watch?v=d_qLzMMuVQg
eamonn
We're currently working on an Android OCR app using OpenCV. The pre-processing, segmentation and feature-extraction steps are done. Classification is the remaining step, and we're stuck. We're using a DB table filled with each letter's features. At first we had only 1 feature per letter and used Euclidean distance, but the results weren't accurate and more features were needed, so we obtained them. The problem now is that we have 7 features per letter and absolutely no idea how to classify input based on them. Some have recommended using kNN, but we can't figure out how, and the OpenCV documentation for that part isn't clear. If anybody can help it would be great.
Thanks in advance
Briefly, and without going into the details: vector spaces come in handy here. You need to build a feature vector
<feature1, feature2, feature3, ..., featureN> for each of the instances in your training set.
From each of these images you extract the features that you think, or have read in research articles, are important for image classification. For example, you can use the centroid, Gaussian blur, histograms, etc.
Once you have these values, linear algebra comes into play with some classification algorithm (kNN, SVM, naive Bayes, etc.) that you run on your training set; that is, you build your model.
Once the model is ready, you run it on your test set.
Use cross validation for more comprehensive results.
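A minimal sketch of that pipeline with scikit-learn (my choice of library; X would hold your 7-feature vectors, y the letter labels, and the file names are placeholders):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    # X: (n_samples, 7) array of feature vectors, y: (n_samples,) letter labels
    X = np.load("letter_features.npy")   # placeholder file names
    y = np.load("letter_labels.npy")

    clf = KNeighborsClassifier(n_neighbors=3)
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross validation
    print("mean accuracy:", scores.mean())

    clf.fit(X, y)                        # build the model on the full training set
    # predicted = clf.predict(test_features)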
For more details check the course notes:
http://www.inf.ed.ac.uk/teaching/courses/iaml/slides/knn-2x2.pdf
or
http://www.inf.ed.ac.uk/teaching/courses/inf2b/lectureSchedule.html
I would like to add that OpenCV may not have the sort of classifiers you might prefer.
There are several libraries out there, though you may have to see which works best when on a mobile platform. Could you give some details on the features you are using?
The simplest KNN (k-nearest neighbors) measure would be to find the Euclidean distance in n dimensions (for an n-dimensional feature vector) between the input sample's features and each of the vectors in your DB table. Also explore Mahalanobis distance (used to measure distance between a point and a dataset/class) if you have multiple classes and the input image is to be classified as one such 'type' or 'class' of image.
As @matcheek mentioned, more sophistication is possible using machine learning techniques such as SVMs, neural nets, etc. However, you might first consider something simpler like kNN, given that it's a mobile platform, which limits how much computation you can afford.
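To illustrate the two distance measures on a 7-dimensional feature vector (Python with NumPy/SciPy; the arrays here are random stand-ins for your actual features):

    import numpy as np
    from scipy.spatial.distance import euclidean, mahalanobis

    sample = np.random.rand(7)              # features of the input letter
    class_vectors = np.random.rand(50, 7)   # stored feature vectors for one letter class

    # Euclidean distance to each stored vector (what plain kNN would use)
    d_euclid = np.array([euclidean(sample, v) for v in class_vectors])

    # Mahalanobis distance from the sample to the class as a whole
    inv_cov = np.linalg.inv(np.cov(class_vectors, rowvar=False))
    d_mahal = mahalanobis(sample, class_vectors.mean(axis=0), inv_cov)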
So I am starting to learn how to develop Android applications. I have experience with Java and C# from school, and I would say that while I am not a pro, I definitely have a fairly good handle on object-oriented programming.
One thing I don't understand about Android is resources. For example, let's say I have a TextView in my GUI. Why do I have to define a string resource called "hello" and then set that resource's value to "Hello"? I don't understand why the SDK doesn't just let you set the TextView's text to "Hello" directly and be done with it. What is the purpose of storing numbers, strings and other values in resources? I know there must be a solid explanation for this, but I just don't know what it is.
I am also experimenting with an addition program (where I prompt the user with a randomized math problem, they input their answer, and my program checks whether it is right or wrong and restarts). So I have a TextView for the problem (e.g., 1 + 1). When I created the TextView, I had to create a problemString in the resources and then assign the problem TextView to the problemString. However, in my program, when the user has gotten the math problem right or wrong, I write over the problem with a new one by simply changing the text of the TextView. I never interact with the problemString from the resources, and this works. So again, my question is: what is the purpose of having application resources, and what role do they play in an application?
Also, how do I access, overwrite, and generally work with application resources?
Sorry that this is a really long question, but I really think Android dev. is really cool, and I am very eager to learn. Any help is APPRECIATED! xD
Thanks!
Imagine your application with a thousand different strings to display to a user. If you need to change 30 of them, do you want to dig through all your code, or one file?
Localization is another reason for having different sets of string resources, as well as other resources, specific to a locale. Take the above scenario: a thousand different strings, AND three different languages. How would you handle that? Three different versions of the program? No.