As part of my final project for a one-year software development course, I am required to implement a k-NN project that predicts the outcome of football matches in an Android app.
I built a MySQL database on an online hosting site (byethost) using these predictors:
difference in average goals per game
difference in average points per game
difference in average goals conceded per game
outcome
My tutor recommended that k should equal 100 and that a significant data set be constructed (over 1000 results).
From here on I am lost as to how I should approach the problem. Can anyone give further guidance on how to tackle it?
All advice is welcomed.
This is too long for a comment.
K-NN could be used as part of the solution. However, it is an undirected data mining algorithm, meaning specifically that it doesn't generate expected outcomes. That means that K-NN is not appropriate as the only technique used for a final project that depends on prediction.
I am concerned about any year-long course that makes such a recommendation. The difference between directed and undirected algorithms is pretty fundamental.
Related
I'm currently doing my final year project. The end product is an Android application that an individual can use to track the amount of carbon gas emitted through daily activities and consumption. I'm having trouble finding and understanding the method used for the calculation. Can anyone explain?
You will most likely need to get the user to input their daily food waste, etc., and use a formula to calculate it. A quick search on Google brought me to https://www3.epa.gov/carbon-footprint-calculator/, which has an Excel sheet with the various formulas that you can download from that page.
As for how the formula or its constants are derived, I have no clue myself.
I am working on an Android mobile app where I need functionality to find matches according to interest and location. Many dating apps already do something similar; for example, Tinder matches based on location, gender, age, etc.
I do not want to reinvent the wheel if this has been done already. I searched on Google, and some people suggested using a clustering algorithm for this: "Algorithm for clustering people with similar interests" and "User similarities algorithm".
Let's say I have data in this JSON format for users:
User1: {location: "Delhi, India", interests: ["Jogging", "Travelling", "Praying"] }
User2: {location: "Noida, India", interests: ["Running", "Eating", "Praying"] }
User3: {location: "Bangalore, India", interests: ["Exercise", "Visiting new places", "Chanting"] }
I am writing a matching algorithm that meets the criteria below:
If user1 has an interest in "Jogging" and user2 has an interest in "Running", the two profiles should match, since jogging and running are both kinds of exercise. Matches should also be ordered by location, with the nearest users on top.
The algorithm, when running at scale, should be fairly performant. This means I'd like to avoid comparing each user individually to each other user. For N users this is an O(N^2) operation. Ideally, I'd like to develop some sort of "score" that I can generate for each user in isolation since this involves looping through all users only once. Then I can find other users with similar scores and determine the best match based off that.
Can anyone suggest an implementation of how I can achieve this with the help of firebase-cloud-function and firebase-database?
I think hard-coding similarity is the wrong approach. FYI, none of the major search engines rely on such mappings.
A better approach is to be more data driven. Create an ad hoc methodology to start with and once you have sufficient data build machine learning models to rank matches. This way you do not have to assume anything.
For the location, have some kind of a radius (preferably this can be set by the user) and match people within the radius.
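The radius idea above can be sketched as follows. This is only a minimal illustration, assuming each user record stores latitude and longitude (the question's JSON only has place names, so the `lat`/`lon` fields and the approximate coordinates below are assumptions), and using the haversine formula for great-circle distance. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(user, candidates, radius_km):
    """Names of candidates within radius_km of user, nearest first."""
    nearby = []
    for other in candidates:
        d = haversine_km(user["lat"], user["lon"], other["lat"], other["lon"])
        if d <= radius_km:
            nearby.append((d, other["name"]))
    nearby.sort()  # smallest distance first
    return [name for _, name in nearby]

# Approximate, illustrative coordinates for the cities in the question.
user1 = {"name": "User1", "lat": 28.61, "lon": 77.21}  # Delhi
user2 = {"name": "User2", "lat": 28.54, "lon": 77.39}  # Noida
user3 = {"name": "User3", "lat": 12.97, "lon": 77.59}  # Bangalore

matches = within_radius(user1, [user2, user3], radius_km=50)
```

Note that this still scans every candidate for one user; at scale you would probably want to narrow the candidate set first (for example by region) before computing distances.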
First of all, I would say get rid of the redundant features in your dataset: jogging and running could be one feature instead of two. After that, you can use the k-means algorithm to group the data in an unsupervised way.
To learn more about k-means, you can go to this link:
https://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
Also, as you're building an online system, it has to improve itself every day.
You can watch this to learn a bit more about online learning:
https://www.coursera.org/learn/machine-learning/lecture/ABO2q/online-learning
Stochastic gradient descent will also be helpful to know: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent
These are conceptual videos; do not implement anything yourself, since you can always use a library like TensorFlow: https://www.tensorflow.org/
I know this looks a bit hard to understand but you'll need this knowledge in order to build your own custom recommendation system.
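To make the "merge redundant features" step at the start of this answer concrete, here is a rough sketch. The synonym table and the canonical feature names are invented for illustration (they are assumptions, not something from the question); the resulting 0/1 vectors are the kind of numeric input a clustering algorithm such as k-means works on.

```python
# Hypothetical synonym table: raw interest -> merged feature name.
CANONICAL = {
    "jogging": "exercise",
    "running": "exercise",
    "exercise": "exercise",
    "travelling": "travel",
    "visiting new places": "travel",
    "praying": "spiritual",
    "chanting": "spiritual",
    "eating": "food",
}

# Fixed feature order so every user gets a vector of the same shape.
FEATURES = sorted(set(CANONICAL.values()))

def to_vector(interests):
    """Map raw interests to merged features and return a 0/1 vector.

    Interests missing from the table fall through unchanged and are
    therefore ignored, because they match no known feature.
    """
    merged = {CANONICAL.get(i.lower(), i.lower()) for i in interests}
    return [1 if f in merged else 0 for f in FEATURES]

user1_vec = to_vector(["Jogging", "Travelling", "Praying"])
user2_vec = to_vector(["Running", "Eating", "Praying"])
user3_vec = to_vector(["Exercise", "Visiting new places", "Chanting"])
```

With this table, User1 and User3 end up with identical vectors even though they share no literal interest string, which is the effect the question asks for.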
I'm developing an Android application that recognizes accelerometer gestures. For now I'm just using dynamic time warping (DTW) to find the smallest distance between the input gesture and about 200 unique gestures stored in a database. My application loops through the data and compares the input gesture with each stored gesture one by one. It finds the smallest distance and recognizes the gesture in about 5 seconds on average. Can I speed up recognition to half a second or less? Do I have to use a classification method like KNN and combine it with DTW? An example or references would be appreciated.
What you are currently doing is 1-NN. In other words, you are already running the simplest possible KNN method, with K=1. Changing K won't speed anything up; it can only change the quality of the result. To speed up the process, you can consider two approaches:
Using an indexing method, which reduces the computational complexity of your distance-based search. This problem is called Nearest Neighbor Search (NNS), and even Wikipedia provides quite a lot of information about how to speed it up;
Using a completely different classification method, which builds a much simpler model (possibly an SVM or even a decision tree; it depends on your actual data).
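To illustrate why the current approach is a 1-NN search, here is a minimal sketch of it: a textbook DTW distance plus a linear scan over the stored gestures. It assumes one-dimensional sequences for brevity (real accelerometer data has three axes), and the template labels and values are made up.

```python
import math

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def nearest_gesture(query, templates):
    """1-NN: return (label, distance) of the template closest to query."""
    best_label, best_dist = None, math.inf
    for label, sequence in templates:
        d = dtw_distance(query, sequence)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

# Made-up templates standing in for the ~200 stored gestures.
templates = [
    ("shake", [0, 1, 0, 1, 0]),
    ("lift", [0, 1, 2, 3, 4]),
]
result = nearest_gesture([0, 0, 1, 2, 3, 4], templates)
```

The cost is one full DTW computation per stored gesture, which is exactly the part an index or a simpler model would try to avoid.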
My intuition is that Locality-Sensitive Hashing (LSH) could be applied quite easily. For instance, you could design the hash functions by picking K points randomly and checking whether the time series is not too "far" away from them.
I would go into more detail on that idea, but instead I found this paper: http://dtai.cs.kuleuven.be/events/MLSA13/papers/mlsa13_submission_13.pdf , and it seems to use a much simpler LSH function.
So this is one way out; I hope it works. You can also implement an easy classifier and accept its answer if it is very certain about the gesture (I would recommend an SVM here, as in the answer above), and if the input is close to the decision boundary, look for the closest neighbour.
You can do DTW at 10,000 Hz, even on a phone; see this video:
http://www.youtube.com/watch?v=d_qLzMMuVQg
eamonn
We're currently working on an Android OCR app using OpenCV. The pre-processing, segmentation, and feature extraction steps are done. Classification is the remaining step, and we're stuck. We're using a DB table that is filled with the features of each letter. At first we had only one feature per letter and used Euclidean distance, but the results weren't accurate and more features were needed, so we added them. The problem now is that we have 7 features per letter and absolutely no idea how to classify the input based on them. Some have recommended using KNN, but we can't figure out how, and the OpenCV documentation on that part isn't clear. If anybody can help, it would be great.
Thanks in advance
Briefly, and without discussing the details: a vector space comes in handy here. You need to build a feature vector
<feature1, feature2, feature3, ..., featureN> for each of the instances in your training set.
From each of these images you extract the features that you think are important for image classification, or that research articles say are important. For example, you can use the centroid, Gaussian blur, histograms, etc.
Once you have these values, linear algebra comes into play with some classification algorithm (KNN, SVM, naive Bayes, etc.) that you run on your training set; that is, you build your model.
Once the model is ready, you run it on your test set.
Use cross validation for more comprehensive results.
For more details check the course notes:
http://www.inf.ed.ac.uk/teaching/courses/iaml/slides/knn-2x2.pdf
or
http://www.inf.ed.ac.uk/teaching/courses/inf2b/lectureSchedule.html
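The train/test/cross-validation workflow described above can be sketched roughly like this. It is only an illustration: the classifier is a bare 1-nearest-neighbour rule, the two-feature data set is invented, and the function names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_label(train, x):
    """1-NN: label of the training vector closest to x (Euclidean)."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

def k_fold_accuracy(data, folds):
    """Pooled accuracy over `folds` splits; every sample is tested once."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th sample, starting at index f
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(nearest_label(train, x) == y for x, y in test)
    return correct / len(data)

# Invented (feature_vector, label) pairs standing in for letter features.
data = [
    ([0.0, 0.1], "A"), ([5.0, 5.1], "B"),
    ([0.2, 0.0], "A"), ([5.2, 4.9], "B"),
    ([0.1, 0.3], "A"), ([4.8, 5.0], "B"),
]
accuracy = k_fold_accuracy(data, folds=3)
```

The point of the folds is that each sample is classified by a model that never saw it during training, which gives a more honest estimate than a single train/test split.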
I would like to add that OpenCV may not have the sort of classifiers you might prefer.
There are several libraries out there, though you may have to see which works best when on a mobile platform. Could you give some details on the features you are using?
The simplest KNN (k-nearest neighbors) measure would be to find the Euclidean distance in n dimensions (for an n-dimensional feature vector) between the input sample's features and each of the vectors in your DB table. Also explore Mahalanobis distance (used to measure distance between a point and a dataset/class) if you have multiple classes and the input image is to be classified as one such 'type' or 'class' of image.
As @matcheek mentioned, more sophistication is possible using machine learning techniques such as SVMs, neural nets, etc. However, you might first consider something simpler like kNN, considering it's a mobile platform, which may limit the computational complexity you can afford.
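As a rough sketch of the Euclidean-distance kNN described above, assuming the DB table has been loaded into memory as (label, 7-feature vector) rows: the feature values below are made up, and the function name is hypothetical. This is plain Python rather than the OpenCV API, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(rows, sample, k=3):
    """Majority vote among the k rows nearest to sample (Euclidean).

    rows: list of (label, feature_vector) pairs.
    Ties go to the label that appears first among the nearest rows.
    """
    nearest = sorted(rows, key=lambda row: math.dist(row[1], sample))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Made-up 7-feature rows standing in for the letter table.
rows = [
    ("A", [0.90, 0.10, 0.20, 0.80, 0.50, 0.30, 0.70]),
    ("A", [0.80, 0.20, 0.10, 0.90, 0.40, 0.30, 0.60]),
    ("A", [0.85, 0.15, 0.20, 0.85, 0.50, 0.25, 0.65]),
    ("B", [0.10, 0.90, 0.80, 0.20, 0.50, 0.70, 0.30]),
    ("B", [0.20, 0.80, 0.90, 0.10, 0.60, 0.80, 0.20]),
]

letter = knn_classify(rows, [0.88, 0.12, 0.18, 0.82, 0.45, 0.30, 0.68])
```

If the seven features are on very different scales, the largest one will dominate the distance, so some normalisation before comparing is usually worth considering.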
I am developing an Android application for a company. They want me to create a digital newspaper that would:
Display a list of headlines for each category/subcategory
Have 4-6 categories and 4-8 subcategories for each category
Display articles with text and images
Play podcasts
Save downloaded articles/headlines in a DB
They already have the web service almost adapted for this app.
This app will be quite similar to TechnologyReview or CNET News (but the articles will be longer).
I have estimated this project at 160 hours of development. That doesn't include design but does include design implementation.
I would love to hear your opinion on this estimate. Do you think 160h is too short or too long? Roughly how much should I charge for this project, or per hour of development? I live in London, UK.
I need this estimation by the end of today so I will be really grateful for fast replies.
If you have a similar project, or some similar code files you can bunch together into a folder, you can simply run ProjectCodeMeter over it and get a ballpark cost estimation that will point you in the right direction.
Good luck!
The way to do estimates is to break down the project into each component and estimate each of those. Trying to ballpark one number is inevitably worse than ballparking a breakdown. This also helps you get better on each estimate in the future because you know which estimates made sense and which were wrong.