Finding data that has high similarity with current data? - android

I have an Android app, and I want to find all data that has a high similarity with a selected item. Example:
I have data with values like this:
No   Name          Distance   Rating   Price
1.   Coffee Shop   1.3 km     4.6      40
And I want to display all data that is similar to the row above (assuming each field has a weight that counts towards a 'similarity score').
What kind of algorithm is most suitable and easy to implement for my case?
From what I have been looking at, I found several algorithms that I think could work:
- K-Means Clustering
- K-Nearest Neighbor
- ElasticSearch
- Cosine Similarity
My current inclination is still K-Means, because it is the only one of these algorithms I have learnt before.

If you use K-Means you will get groups of data clustered together. But here I think k-Nearest Neighbors would suit your case better, since from what I understand you will receive a query item and you are trying to find data similar to it. With k-Nearest Neighbors you can simply adjust how many results you want to include, say the nearest 5 or 50 neighbors. So I would go with kNN in this case.
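To make the kNN idea concrete, here is a minimal sketch in Java. The Item class, the feature weights, and the normalization constants are my own assumptions, not anything from your question; the idea is just to compute a weighted distance between the selected row and every other row, then keep the k smallest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Item {
    String name;
    double distanceKm, rating, price;
    Item(String name, double distanceKm, double rating, double price) {
        this.name = name; this.distanceKm = distanceKm; this.rating = rating; this.price = price;
    }
}

public class SimilarItems {
    // Weighted Euclidean distance on roughly normalized features.
    // Normalization constants are guesses; tune them to your actual data ranges.
    static double score(Item a, Item b) {
        double d1 = (a.distanceKm - b.distanceKm) / 10.0;   // assume distances of ~0-10 km
        double d2 = (a.rating - b.rating) / 5.0;            // ratings 0-5
        double d3 = (a.price - b.price) / 100.0;            // assume prices of ~0-100
        double wDist = 1.0, wRating = 1.0, wPrice = 1.0;    // adjust the weights as needed
        return Math.sqrt(wDist * d1 * d1 + wRating * d2 * d2 + wPrice * d3 * d3);
    }

    // Return the k items most similar to the query (smallest weighted distance).
    static List<Item> kNearest(Item query, List<Item> all, int k) {
        List<Item> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingDouble(i -> score(query, i)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

You would call kNearest(selected, allItems, 10) and display the result; changing the weights changes which attribute counts most toward the 'similarity score'.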

Use a database such as MySQL. SQL gives you joins and ORDER BY, so the database itself can rank rows by how close they are to the selected one.
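If you do keep everything in a database, you can let the query do the ranking. A rough sketch with Android's bundled SQLite (the places table, the column names, and the weights are assumptions; the same idea works in MySQL):

```java
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

public class SimilarPlacesQuery {
    // Order rows by a weighted sum of absolute differences from the selected row.
    public static Cursor findSimilar(SQLiteDatabase db,
                                     double distance, double rating, double price) {
        String sql =
            "SELECT name, distance, rating, price, " +
            "       (ABS(distance - ?) * 1.0 + ABS(rating - ?) * 1.0 + ABS(price - ?) * 0.1) AS score " +
            "FROM places " +
            "ORDER BY score ASC " +
            "LIMIT 20";
        return db.rawQuery(sql, new String[] {
            String.valueOf(distance), String.valueOf(rating), String.valueOf(price)
        });
    }
}
```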

Related

rss fingerprint database

I am developing an indoor positioning app based on the fingerprinting approach. I am stuck at the point where I should store the Wi-Fi RSS values in the database during the training phase. Since RSS values vary significantly, will storing the absolute RSS values lead to large errors in localization?
I have read many articles, and http://www.csd.uoc.gr/~hy439/papers/WILL-pre.pdf says that the absolute RSS value of each AP varies but the difference relationship between them is maintained. The author introduces a concept called RSS Stacking Difference, which is the cumulative difference between one AP and all other APs. Can I store this RSS Stacking Difference in the database rather than the absolute values?
Thanks in advance.
Why don't you try collecting several RSS readings from each reference node for each cell or position of interest (depending on how you segmented the map)? That will mitigate the fluctuation of the RSS values. Then, by taking the mean value for each reference node, you will have several mean values for each position or segment. You then determine the position based on the minimum difference between the data sets in the database and the values collected in online mode.
For example, let the position at point (x=100, y=120) be associated with the following fingerprint:
{mac1=xx:xx:xx:xx:xx:xx, rssAverage=-47.54}; {mac2=xx:xx:xx:xx:xx:xx, rssAverage=-60.1}; ...
The values collected in online mode will be structured in the same way and compared accordingly.
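A minimal sketch of that idea in Java, assuming each fingerprint is keyed by AP MAC address (class and field names are mine, not from the paper):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FingerprintMatcher {

    // One stored fingerprint: a position plus the mean RSS per AP (keyed by MAC address).
    public static class Fingerprint {
        double x, y;
        Map<String, Double> meanRssByMac = new HashMap<>();
    }

    // Average several RSS readings per AP collected at one position during training.
    public static Map<String, Double> averageReadings(List<Map<String, Double>> scans) {
        Map<String, Double> sum = new HashMap<>();
        Map<String, Integer> count = new HashMap<>();
        for (Map<String, Double> scan : scans) {
            for (Map.Entry<String, Double> e : scan.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
                count.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<String, Double> mean = new HashMap<>();
        for (String mac : sum.keySet()) {
            mean.put(mac, sum.get(mac) / count.get(mac));
        }
        return mean;
    }

    // Pick the stored fingerprint whose mean values differ least from the online scan.
    public static Fingerprint nearest(Map<String, Double> online, List<Fingerprint> stored) {
        Fingerprint best = null;
        double bestDiff = Double.MAX_VALUE;
        for (Fingerprint fp : stored) {
            double diff = 0;
            int shared = 0;
            for (Map.Entry<String, Double> e : online.entrySet()) {
                Double ref = fp.meanRssByMac.get(e.getKey());
                if (ref != null) {
                    diff += Math.abs(e.getValue() - ref);
                    shared++;
                }
            }
            if (shared > 0 && diff / shared < bestDiff) {
                bestDiff = diff / shared;
                best = fp;
            }
        }
        return best;
    }
}
```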
I hope that helps, good luck.

Tracking GPS points and finding their nearest neighbours?

I have a list of 1 million (slowly) moving points on the globe (stored as latitude and longitude). Every now and then, each point requests a list of the 100 nearest other points (with a configurable max range, if that helps).
Unfortunately, SELECT * ORDER BY compute_geodetic_distance() LIMIT 100 is too slow to run for each point over and over again. So my question: how should I handle this efficiently? Are there better algorithms or data structures known for this? Or is this the only way, and should I look into distributing the server load?
(Note: this is for an Android app and the points are users, so in case I'm missing an android-specific solution, feel free to say so!)
Geospatial databases were invented for exactly this task.
There is Oracle Spatial (expensive) and PostgreSQL with PostGIS (free).
These databases store your million points in a geographical index, e.g. a quadtree (Oracle).
Such a query then takes nearly no time.
Some people, like me, prefer to leave the database out and build up the quadtree themselves.
The operations search and insert are easy to implement; update/delete can be more complex. (The cheapest option in terms of implementation effort is to rebuild the quadtree every minute.)
Using a quadtree you can perform hundreds or thousands of such nearest-100-points queries within a second.
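If you do build it yourself, a bare-bones point quadtree with insert and bounding-box search might look like the sketch below (the bucket size and the double[]{lat, lon} representation are arbitrary choices, and it doesn't guard against many identical points). For a nearest-100 query you would search a box around the query point, widen it until it holds at least 100 candidates, then sort those by true distance.

```java
import java.util.ArrayList;
import java.util.List;

public class QuadTree {
    private static final int CAPACITY = 16;       // max points per leaf before splitting

    private final double minLat, minLon, maxLat, maxLon;
    private final List<double[]> points = new ArrayList<>();  // each entry is {lat, lon}
    private QuadTree[] children;                   // null while this node is a leaf

    public QuadTree(double minLat, double minLon, double maxLat, double maxLon) {
        this.minLat = minLat; this.minLon = minLon;
        this.maxLat = maxLat; this.maxLon = maxLon;
    }

    public void insert(double lat, double lon) {
        if (children != null) {
            child(lat, lon).insert(lat, lon);
            return;
        }
        points.add(new double[] { lat, lon });
        if (points.size() > CAPACITY) split();
    }

    private void split() {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        children = new QuadTree[] {
            new QuadTree(minLat, minLon, midLat, midLon),
            new QuadTree(minLat, midLon, midLat, maxLon),
            new QuadTree(midLat, minLon, maxLat, midLon),
            new QuadTree(midLat, midLon, maxLat, maxLon)
        };
        for (double[] p : points) child(p[0], p[1]).insert(p[0], p[1]);
        points.clear();
    }

    private QuadTree child(double lat, double lon) {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        int idx = (lat < midLat ? 0 : 2) + (lon < midLon ? 0 : 1);
        return children[idx];
    }

    // Collect all points inside a lat/lon bounding box into 'out'.
    public void query(double lat0, double lon0, double lat1, double lon1, List<double[]> out) {
        if (lat1 < minLat || lat0 > maxLat || lon1 < minLon || lon0 > maxLon) return;
        if (children != null) {
            for (QuadTree c : children) c.query(lat0, lon0, lat1, lon1, out);
            return;
        }
        for (double[] p : points) {
            if (p[0] >= lat0 && p[0] <= lat1 && p[1] >= lon0 && p[1] <= lon1) out.add(p);
        }
    }
}
```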
Architecturally I would arrange for each "point" to phone home to a server with its location when it changes by more than a certain amount. On the server you can do the heavy lifting of calculating the distance between the point that moved and each of the other points, and, for each of the other points, updating its list of the 100 closest points if required. You can then push changes to a point's closest-100 list as they happen (trivial if you are using App Engine; Android push is supported).
This reduces the amount of work involved to an absolute minimum:
Only report a location change when a point moves far enough
Only recalculate distances when a report is received
Don't rebuild the closest 100 list for a point every time, build the list once, then work out if a point that has moved is going to be added or removed from every other point's list.
Only notify a point of changes to its top 100 list to preserve bandwidth.
There are algorithms that you can use to make this super-efficient, and the problem has a fork/join feel to it as well, allowing you to throw horsepower at the problem.
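As a small illustration of the first bullet (only report a location change when a point moves far enough), the client side could look like this sketch; the threshold and the reportToServer() call are placeholders:

```java
import android.location.Location;

public class MovementReporter {
    private static final float REPORT_THRESHOLD_METERS = 200f;  // arbitrary choice
    private Location lastReported;

    public void onLocationChanged(Location current) {
        // Only phone home when we have moved far enough since the last report.
        if (lastReported == null
                || lastReported.distanceTo(current) > REPORT_THRESHOLD_METERS) {
            lastReported = current;
            reportToServer(current);   // hypothetical upload to the backend
        }
    }

    private void reportToServer(Location location) {
        // send latitude/longitude to the server, e.g. over HTTP
    }
}
```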
You have to divide the earth into zones and then use an interior point algorithm to figure out what zones the phone is in. Each possible subset of zones will uniquely determine the 100 closest nodes to a fair approximation. You can get an exact set of 100 nodes by checking distance one by one against the candidate nodes, which (once again) are determined by the subset of zones.
Instead of an R-tree or a quadtree (i.e. a spatial index) you can also use a quadkey with a space-filling ("monster") curve. Such a curve reduces the two dimensions to one and completely fills the space. You can download my PHP class hilbert curve from phpclasses.org. You can use a simple varchar column for the quadkey and search the levels from left to right. A good explanation is on the Microsoft Bing Maps quadkey page.
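For reference, a Bing Maps quadkey interleaves the bits of the tile X/Y coordinates, one digit per zoom level, so nearby tiles share a common prefix that you can match from left to right (e.g. with LIKE 'prefix%' on a varchar column). A small Java sketch; the lat/lon-to-tile conversion uses the standard Web Mercator tiling formulas and does no clamping at the poles:

```java
public class QuadKeys {

    // Convert latitude/longitude to tile X/Y at the given zoom level.
    static int[] latLonToTileXY(double lat, double lon, int zoom) {
        int n = 1 << zoom;
        int x = (int) Math.floor((lon + 180.0) / 360.0 * n);
        double latRad = Math.toRadians(lat);
        int y = (int) Math.floor(
                (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * n);
        return new int[] { x, y };
    }

    // Interleave the bits of tileX and tileY into a quadkey string, one digit per level.
    static String tileXYToQuadKey(int tileX, int tileY, int levelOfDetail) {
        StringBuilder quadKey = new StringBuilder();
        for (int i = levelOfDetail; i > 0; i--) {
            char digit = '0';
            int mask = 1 << (i - 1);
            if ((tileX & mask) != 0) digit++;
            if ((tileY & mask) != 0) digit += 2;
            quadKey.append(digit);
        }
        return quadKey.toString();
    }
}
```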

getting multiple marker coordinates in google map android

Hi to all the members of this great community!
This is my first question, so forgive me for possible mistakes. I hope that from this day on I can be helpful to some of you, as hopefully you will be for me. Getting to the question:
I am building an Android app whose purpose is to search for the nearest fuel points and car-repair centers. I am very new to Android and, thanks to the numerous posts about Android here, I have managed to reach the point where I have built the map and animate it to my current location while updating my location.
Now I have to add the markers for the points of interest. Since there are at least 10 (I will add them only for demonstration purposes), I think it's not wise to add them through 10+ repetitive calls to itemizedOverlay.addOverlayItem(). My idea was to save them in a file in the format ("latitude", "longitude", simple_description_title, other info) and then somehow import the first two fields for the GeoPoint and the third for the title.
I will then use the fourth later for some kind of tooltip text (for example a telephone number).
Do you think this is a good approach? And how can I implement the file reading (if so) in the code that extends ItemizedOverlay?
I didn't post the code so far since it's irrelevant.
Welcome to SO, let's jump right into your problem/question.
1.) Since you are only adding 10 points of interest, it won't matter if you just call itemizedOverlay.addOverlayItem() for all 10. The trick is to call itemizedOverlay.populate() only once, after you have added all the overlay items with itemizedOverlay.addOverlayItem(); this way you don't compromise on performance (see the sketch after this list).
2.) Now, once again, since you are only doing a demonstration, I would advise you to simply hard-code all 10 overlays with their respective geolocations into the Android code itself. This way you WON'T have to worry about reading data. Also, using a txt file to store data isn't the best option, both performance- and convenience-wise; this is what databases exist for.
3.) If and when you do need dynamic data to populate your markers in the future, then I'd STRONGLY advise you to use either:
SQLite: the embedded database that Android offers. It's great for storing small bits of information required by your application, such as the description title, other info, and the latitude and longitude. However, if you have some sort of connection-based application where you need to update globally accessible data every once in a while, I'd advise you to use the next option,
MySQL: This is an online database that you have to interface with using a server and PHP. The advantage of using an online database is that you can now share information between different users (friends, contacts, followers etc.) by reading and writing to and from the database.
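Regarding point 1, here is a sketch of the usual ItemizedOverlay pattern with the old Maps v1 API the question refers to (the class name and the done() helper are mine): add all items first, then call the protected populate() once.

```java
import android.graphics.drawable.Drawable;
import com.google.android.maps.GeoPoint;
import com.google.android.maps.ItemizedOverlay;
import com.google.android.maps.OverlayItem;
import java.util.ArrayList;
import java.util.List;

public class PoiOverlay extends ItemizedOverlay<OverlayItem> {
    private final List<OverlayItem> items = new ArrayList<OverlayItem>();

    public PoiOverlay(Drawable marker) {
        super(boundCenterBottom(marker));
    }

    // Add one point of interest; call as many times as needed.
    public void addOverlayItem(GeoPoint point, String title, String snippet) {
        items.add(new OverlayItem(point, title, snippet));
    }

    // Call this exactly once, after every marker has been added.
    public void done() {
        populate();
    }

    @Override
    protected OverlayItem createItem(int i) {
        return items.get(i);
    }

    @Override
    public int size() {
        return items.size();
    }
}
```

Usage would be roughly: create the overlay, call addOverlayItem() for each point of interest (GeoPoint takes microdegrees, i.e. (int)(lat * 1E6)), call done(), then add the overlay to mapView.getOverlays().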

Android line simplification

I'm looking for some best practice advice.
I have created an app (much like MyTracks) which collects GPS measurements and displays them on a map. I want to be able to record GPS data for, ideally, 24 hours at a 10-second interval. This is a lot of data, so I am not keeping it in memory; I'm storing it in an SQLite DB as it arrives. Inside the draw() function I am selecting everything and drawing it as a Path object.
My approach above works great until I have more than about 4 hours' worth of data. Then the draw function takes forever to execute, which makes the application seem very slow.
I think what I need to do is draw a simplified trajectory onto the map. My question is what is the best way of doing this.
i) Processor heavy: In draw() select everything from the SQLiteDB, construct the simplified trajectory, draw it on the map.
ii) Memory heavy: Maintain a simplified trajectory in memory, update it as new data arrives, in draw() simply draw it to the map.
iii) Magic: Use some special Overlay that I don't know about which handles line simplification for you.
Kind regards,
Cathal
My initial semi-random thoughts:
You don't say that you're actually doing so, but don't store one sample per database table row. 24 hours of samples at 10-second intervals is 8640 samples. Each sample is 2 doubles, i.e. 16 bytes, so a day's worth of data is about 135 KB, a sum which can easily fit entirely in memory. Your database strategy should probably be to let one table row correspond to one sampling period, whose maximum length is one day. Needless to say, the sample data should be in a BLOB field.
Drawing the path: this depends on the current map zoom and what part of the sample set is visible. The first thing you do is iterate over your sample collection (max. 8640) and determine the subset which is visible at the current zoom. That should be a pretty quick operation. Let's say, for the sake of example, that 5000 are visible. You then select some maximum number of samples for the path based on hardware assumptions... picking a number out of thin air, let's say no more than 500 samples are used for the path (i.e. the device won't struggle to draw a path with 500 points). You therefore build the path using every 10th sample (5000/500 = 10), and make sure to include the first and last sample of the visible set.
Note that you don't do all this work every frame. You only need to recalculate the path when the user finishes panning or zooming the map. The rest of the time you just draw the path you already calculated.
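A rough sketch of that decimation step, assuming the visible samples have already been projected to screen coordinates (the 500-point cap and the float[][] layout are placeholders):

```java
import android.graphics.Path;

public class TrackPathBuilder {
    private static final int MAX_PATH_POINTS = 500;   // assumed per-device limit

    // samples: the visible samples as {x, y} screen coordinates, already projected.
    public static Path build(float[][] samples) {
        Path path = new Path();
        if (samples.length == 0) return path;

        // Take every Nth sample so the path never exceeds MAX_PATH_POINTS points.
        int stride = Math.max(1, samples.length / MAX_PATH_POINTS);
        path.moveTo(samples[0][0], samples[0][1]);
        for (int i = stride; i < samples.length; i += stride) {
            path.lineTo(samples[i][0], samples[i][1]);
        }
        // Always end on the last visible sample.
        path.lineTo(samples[samples.length - 1][0], samples[samples.length - 1][1]);
        return path;
    }
}
```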
Oddly enough, I was just looking at code I wrote to do something similar, about 5 years ago.
Here are the steps I went through:
Simplify the dataset to only the visible detail. I designed it with several pluggable simplification strategies behind a common interface for tolerances and for feeding points in and getting the points to render out. More on how to design it below.
Cache a compact, fast-to-access version of the simplified point list. For big data sets, it's helpful to use primitives as much as possible, in preference to Point objects. With double precision locations, you need 128 bytes per point, or ~1.3 MB of memory for 10,000.
Render efficiently, and without creating garbage. Iterating through int/float/double arrays of x and y coordinates is best, with a tight rendering loop that does as much as possible outside the loop. You'll be AMAZED how many points you can render at once if it's just "plot this path."
Notify the Simplifier when new points are added, and run new points through this before adding them to the cached point list. Update it as needed, but try to just process the latest.
Simplification Interface:
There's a bunch of ways to implement this. The best one (for huge point sets) is to feed it an Iterator<Point> and let the simplification algorithm go to work with that. This way, the points don't all have to be in memory, and you can feed it from a DB query. For example, the Iterator can wrap a JDBC ResultSet. Simplifiers should also have a "tolerance" value to determine how close points are before they get ignored.
How to simplify pointsets/polygonal lines:
There are a bunch of algorithms.
The simplest is to remove points that are less than $tolerance away from the last included point. This is an O(n) implementation (a sketch follows after this list).
The Douglas-Peucker algorithm gives an excellent polygon simplification on a large pointset. The weakness is that you need to operate on points in memory; use it on batches of, say, 10,000 points at a time. Runs in O(n log n) average, O(n^2) worst case
Fancy 2D hashing: you can use a 2D hashing algorithm, with one entry possible per pixel. Points that map to an occupied slot aren't rendered. You'll need to run through points twice for this, to find points that lead back to the same spots, if rendering lines and not scatterplots.
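Here is a sketch of the interface idea plus the simplest O(n) strategy from the first bullet, using double[]{x, y} instead of a Point object in the spirit of the "use primitives" advice (names are mine):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

interface Simplifier {
    // Consume points lazily (e.g. streamed from a DB cursor) and return only the ones to render.
    List<double[]> simplify(Iterator<double[]> points, double tolerance);
}

class RadialDistanceSimplifier implements Simplifier {
    @Override
    public List<double[]> simplify(Iterator<double[]> points, double tolerance) {
        List<double[]> kept = new ArrayList<>();
        double[] last = null;
        while (points.hasNext()) {
            double[] p = points.next();
            // Keep a point only if it is at least 'tolerance' away from the last kept point.
            if (last == null || distance(last, p) >= tolerance) {
                kept.add(p);
                last = p;
            }
        }
        return kept;
    }

    private static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```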
Extra tips:
You can boost performance by creating a wrapper that maps your cached, simplified point list to something your graphics primitives can handle easily. I was using Swing, so I created something that acted like a mutable Path2D, which the rendering context could handle directly.
Use your primitives. Seriously, I can't say this enough. If you use objects to store points, the 128 bytes/point can double, AND memory use goes up, AND the code can't be compiled as optimally.
Using these strategies, it is possible to render millions of points at once (in a reduced form). If I recall correctly, I could run the simplification routine in real-time, operating on 10k+ points at a time, using good code. It might have been 100k+ points. If you can store a hashtable with one slot per pixel, the hashing implementation is ridiculously fast (I know, it was my fastest solution)

Android: Best practice to get data from SQLite and show in a listview order by distance from phone

I have an SQLite database of places with a latitude and longitude, and I want to get the data out of SQLite and, in some way, either in the SQL query or in a custom adapter or whatever the best practice is, sort out all places that are within 5 miles of the phone's location and show them in a ListView.
I have one idea I figured out, but I wonder if someone has a better one.
I can get all the data out of SQLite and put it in a ListView with a custom adapter that gets the phone location from a LocationManager and then somehow removes the items whose distanceTo is longer than 5 miles. I hope that with the custom adapter I can do the sorting as well, but I don't have all the pieces yet.
I think my solution is going to work, but is there any better solution?
I can also say that it's around 400 places, so maybe it's not a big deal to do this with a custom adapter.
All solutions are welcome; I am relatively new to Android but learning.
I think your approach is good considering that the SQLite database has no spatial functionality. Your custom adapter should then be able to limit the data to those within a specified distance from the phone's current location.
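A sketch of that filtering and sorting step, using android.location.Location's static distanceBetween(); loading all ~400 rows first is fine at this scale. The Place class and the hard-coded 5-mile radius in metres are placeholders:

```java
import android.location.Location;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NearbyPlaces {
    private static final float FIVE_MILES_IN_METERS = 8046.7f;

    public static class Place {
        public String name;
        public double latitude, longitude;
        public float distanceMeters;   // filled in below
    }

    // Keep only places within 5 miles of the phone, sorted nearest-first,
    // ready to hand to the custom adapter.
    public static List<Place> within5Miles(List<Place> all, Location phone) {
        List<Place> nearby = new ArrayList<>();
        for (Place p : all) {
            float[] result = new float[1];
            Location.distanceBetween(phone.getLatitude(), phone.getLongitude(),
                                     p.latitude, p.longitude, result);
            p.distanceMeters = result[0];
            if (p.distanceMeters <= FIVE_MILES_IN_METERS) {
                nearby.add(p);
            }
        }
        Collections.sort(nearby, new Comparator<Place>() {
            @Override
            public int compare(Place a, Place b) {
                return Float.compare(a.distanceMeters, b.distanceMeters);
            }
        });
        return nearby;
    }
}
```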
