I'm looking for some best practice advice.
I have created an app (very much like MyTracks) which collects GPS measurements and displays them on a map. I want to be able to record GPS data for ideally 24 hours at a 10-second interval. This is a lot of data, so I am not keeping it in memory; I'm storing it in an SQLite database as it arrives. Inside the draw() function I select everything and draw it as a Path object.
This approach works great until I have more than about 4 hours' worth of data. Then the draw function takes forever to execute, which makes the application seem very slow.
I think what I need to do is draw a simplified trajectory onto the map. My question is: what is the best way of doing this?
i) Processor heavy: In draw(), select everything from the SQLite database, construct the simplified trajectory, and draw it on the map.
ii) Memory heavy: Maintain a simplified trajectory in memory, update it as new data arrives, and in draw() simply draw it to the map.
iii) Magic: Use some special Overlay that I don't know about which handles line simplification for you.
Kind regards,
Cathal
My initial semi-random thoughts:
You don't say that you're actually doing so, but don't store one sample per database table row. 24 hours of samples at 10-second intervals is 8,640 samples. Each sample is 2 doubles, i.e. 16 bytes, so a day's worth of data is about 135 KB, which can easily fit entirely in memory. Your database strategy should probably be to let one table row correspond to one sampling period, whose maximum length is one day. Needless to say, the sample data should be in a BLOB field.
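For illustration, a minimal sketch of that packing step (the method name and layout are mine, not from the question): the lat/lon doubles go into a single byte[] via ByteBuffer, which you can then bind as the BLOB column of the session row.

    import java.nio.ByteBuffer;

    // Pack interleaved lat/lon pairs into one byte[] for a BLOB column.
    // Bind the result with e.g. SQLiteStatement.bindBlob(...).
    static byte[] packSamples(double[] latitudes, double[] longitudes) {
        ByteBuffer buffer = ByteBuffer.allocate(latitudes.length * 2 * Double.BYTES);
        for (int i = 0; i < latitudes.length; i++) {
            buffer.putDouble(latitudes[i]);
            buffer.putDouble(longitudes[i]);
        }
        return buffer.array();
    }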
Drawing the path: this depends on the current map zoom and what part of the sample set is visible. The first thing you do is iterate your sample collection (max. 8,640) and determine the subset that is visible at the current zoom. That should be a pretty quick operation. Let's say, for the sake of example, that 5,000 are visible. You then pick some maximum number of samples for the path based on hardware assumptions; plucking a number out of thin air, let's say no more than 500 samples are used for the path (i.e. the device won't struggle to draw a path with 500 points). You therefore build the path using every 10th sample (5000/500 = 10), and make sure to include the first and last sample of the visible set.
Note that you don't do all this work every frame. You only need to recalculate the path when the user finishes panning or zooming the map. The rest of the time you just draw the path you already calculated.
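A rough sketch of that subsampling, assuming the samples have already been projected to screen-space float arrays and you have found the visible index range (the 500-point budget is the thin-air number from above):

    import android.graphics.Path;

    // Build a Path from every Nth visible sample, always keeping the first and last.
    static Path buildSimplifiedPath(float[] x, float[] y, int firstVisible, int lastVisible) {
        final int MAX_PATH_POINTS = 500;                          // hardware budget, picked out of thin air
        int visibleCount = lastVisible - firstVisible + 1;
        int stride = Math.max(1, visibleCount / MAX_PATH_POINTS); // e.g. 5000 / 500 = 10

        Path path = new Path();
        path.moveTo(x[firstVisible], y[firstVisible]);
        for (int i = firstVisible + stride; i < lastVisible; i += stride) {
            path.lineTo(x[i], y[i]);
        }
        path.lineTo(x[lastVisible], y[lastVisible]);              // always include the last visible sample
        return path;
    }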
Oddly enough, I was just looking at code I wrote to do something similar, about 5 years ago.
Here are the steps I went through:
Simplify the dataset to only the visible detail. I designed it with several pluggable simplification strategies behind a common interface for setting tolerances and feeding points in / getting renderable points out. More on how to design it below.
Cache a compact, fast-to-access version of the simplified point list. For big data sets, it's helpful to use primitives as much as possible, in preference to Point objects. With double-precision locations stored as primitives, you need 16 bytes per point, or ~160 KB of memory for 10,000 points.
Render efficiently, and without creating garbage. Iterating through int/float/double arrays of x and y coordinates is best, with a tight rendering loop that does as much as possible outside the loop. You'll be AMAZED how many points you can render at once if it's just "plot this path."
Notify the simplifier when new points are added, and run them through it before adding them to the cached point list. Update the cache as needed, but try to process only the latest points.
Simplification Interface:
There are a bunch of ways to implement this. The best one (for huge point sets) is to feed it an Iterator<Point> and let the simplification algorithm go to work with that. This way, the points don't all have to be in memory, and you can feed it from a DB query. For example, the Iterator can wrap a JDBC ResultSet. Simplifiers should also have a "tolerance" value to determine how close points may be before they get ignored.
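A sketch of what that interface might look like (the names here are illustrative, not from any library):

    import java.util.Iterator;

    // Minimal point type used by the sketches below.
    class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Pluggable simplification strategy fed by a stream of points, so the whole
    // set never has to sit in memory (the Iterator can wrap a database cursor).
    interface PointSimplifier {
        /** How close two points may be before the later one is dropped. */
        void setTolerance(double tolerance);

        /** Consume the input stream and return the reduced set of points to render. */
        Iterator<Point> simplify(Iterator<Point> input);
    }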
How to simplify pointsets/polygonal lines:
There are a bunch of algorithms.
The simplest is to remove points that are less than the tolerance away from the last included point. This is an O(n) implementation (a sketch appears after this list).
The Douglas-Peucker algorithm gives an excellent polygon simplification on a large point set. Its weakness is that you need to operate on points in memory; use it on batches of, say, 10,000 points at a time. It runs in O(n log n) on average and O(n^2) in the worst case.
Fancy 2D hashing: you can use a 2D hashing scheme with one possible entry per pixel. Points that map to an occupied slot aren't rendered. If you're rendering lines rather than scatterplots, you'll need to run through the points twice to find segments that lead back to the same spots (see the sketch at the end of this answer).
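A minimal sketch of the first (O(n)) strategy, reusing the Point type from the interface sketch above:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Keep a point only if it is at least `tolerance` away from the last kept point.
    class RadialDistanceSimplifier {
        private final double tolerance;

        RadialDistanceSimplifier(double tolerance) { this.tolerance = tolerance; }

        List<Point> simplify(Iterator<Point> input) {
            List<Point> kept = new ArrayList<>();
            Point last = null;
            while (input.hasNext()) {
                Point p = input.next();
                if (last == null || distance(last, p) >= tolerance) {
                    kept.add(p);
                    last = p;
                }
            }
            return kept;
        }

        private static double distance(Point a, Point b) {
            double dx = a.x - b.x, dy = a.y - b.y;
            return Math.sqrt(dx * dx + dy * dy);
        }
    }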
Extra tips:
You can boost performance by creating a wrapper that maps your cached, simplified point list to something your graphics primitives can handle easily. I was using Swing, so I created something that acted like a mutable Path2D, which the rendering context could handle directly.
Use your primitives. Seriously, I can't say this enough. If you use objects to store points, the per-point cost can easily double or more, which increases memory use AND means the rendering loop can't be compiled to such optimal code.
Using these strategies, it is possible to render millions of points at once (in a reduced form). If I recall correctly, I could run the simplification routine in real time on 10k+ points at a time with well-written code; it might even have been 100k+ points. If you can store a hashtable with one slot per pixel, the hashing implementation is ridiculously fast (I know, because it was my fastest solution).
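And a minimal sketch of the one-slot-per-pixel idea from the algorithm list above, assuming the points have already been projected to screen coordinates:

    import java.util.Arrays;

    // One boolean per pixel; a point is kept only if its pixel hasn't been hit yet.
    static int[] keepOnePointPerPixel(float[] xs, float[] ys, int width, int height) {
        boolean[] occupied = new boolean[width * height];
        int[] keptIndices = new int[xs.length];
        int kept = 0;
        for (int i = 0; i < xs.length; i++) {
            int px = (int) xs[i], py = (int) ys[i];
            if (px < 0 || px >= width || py < 0 || py >= height) continue; // off-screen
            int slot = py * width + px;
            if (!occupied[slot]) {
                occupied[slot] = true;
                keptIndices[kept++] = i;          // index of the surviving original point
            }
        }
        return Arrays.copyOf(keptIndices, kept);
    }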
So I've been reading more thoroughly into LiveData and my curiosity was piqued. LiveData can hold a list of anything, but how does that compare to a regular List in terms of memory allocation? I know it generally depends on what is being stored: a List<int> can be larger than a List<float> if there's only 1 float and 100 ints.
But, for example, let's say I had a LiveData<List<int>> of 10 phone numbers, and a List<int> of the same phone numbers. Would the LiveData take up more memory?
I know LiveData has a specific purpose in keeping the UI updated (I'm probably not articulating that accurately) and needs an Observer, whereas a plain List is easier to work with but can't meet the need LiveData fulfills.
Or do most devices these days simply have enough memory that it's basically a moot point?
Would the LiveData take up more memory?
For the same underlying list, yes. It's always going to occupy some space beyond the data it contains. Look at the source code to get a sense of what it actually adds on top of the contained data object.
You're probably overthinking it. If you have a problem with running out of memory, it's almost certainly not because of any LiveData. Your list itself is likely the bigger issue, and other parts of the core Android runtime are going to dwarf the size of your LiveData and its contents.
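For a concrete sense of what's being compared (Java, so List<Integer> rather than List<int>; this assumes the AndroidX MutableLiveData(T) constructor):

    import androidx.lifecycle.MutableLiveData;
    import java.util.Arrays;
    import java.util.List;

    // The LiveData does not copy the list; it wraps a reference to it plus its own
    // observer bookkeeping, so the overhead is a small constant regardless of list size.
    List<Integer> phoneNumbers = Arrays.asList(5551234, 5555678, 5559012);
    MutableLiveData<List<Integer>> livePhoneNumbers = new MutableLiveData<>(phoneNumbers);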
I have a list of 1 million (slowly) moving points on the globe (stored as latitude and longitude). Every now and then, each point requests a list of the 100 nearest other points (with a configurable max range, if that helps).
Unfortunately, SELECT * ORDER BY compute_geodetic_distance() LIMIT 100 is too slow to be run by each point over and over again. So my question: how should I handle this efficiently? Are there better algorithms/data structures/... known for this? Or is this the only way, and should I look into distributing the server load?
(Note: this is for an Android app and the points are users, so in case I'm missing an android-specific solution, feel free to say so!)
Geospatial databases were invented for exactly this task.
There is Oracle Spatial (expensive) and PostgreSQL with PostGIS (free).
These databases store your million points in a spatial index, e.g. a quadtree (Oracle).
Such a query then takes nearly no time.
Some people, like me, prefer to leave the database out and build up the quadtree themselves.
The search and insert operations are easy to implement; update/delete can be more complex. (Cheapest in terms of implementation effort is to rebuild the quadtree every minute.)
Using a quadtree you can perform hundreds or thousands of such nearest-100 queries within a second.
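A minimal, hand-rolled quadtree sketch along those lines (illustrative only; the capacity and class names are mine). Finding the nearest 100 is then a matter of querying a bounding box around the user, growing it until enough candidates come back, and sorting those candidates by geodetic distance:

    import java.util.ArrayList;
    import java.util.List;

    class QuadTree {
        private static final int CAPACITY = 16;                  // max points per leaf before splitting
        private final double minX, minY, maxX, maxY;             // this node's bounds
        private final List<double[]> points = new ArrayList<>(); // {x, y} pairs stored in this leaf
        private QuadTree[] children;                              // null while this node is a leaf

        QuadTree(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean insert(double x, double y) {
            if (x < minX || x > maxX || y < minY || y > maxY) return false;
            if (children == null) {
                if (points.size() < CAPACITY) { points.add(new double[] {x, y}); return true; }
                split();
            }
            for (QuadTree child : children) if (child.insert(x, y)) return true;
            return false;
        }

        // Collect every point inside the query rectangle.
        void queryRange(double qMinX, double qMinY, double qMaxX, double qMaxY, List<double[]> out) {
            if (qMaxX < minX || qMinX > maxX || qMaxY < minY || qMinY > maxY) return;
            for (double[] p : points) {
                if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) out.add(p);
            }
            if (children != null) {
                for (QuadTree child : children) child.queryRange(qMinX, qMinY, qMaxX, qMaxY, out);
            }
        }

        private void split() {
            double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
            children = new QuadTree[] {
                new QuadTree(minX, minY, midX, midY), new QuadTree(midX, minY, maxX, midY),
                new QuadTree(minX, midY, midX, maxY), new QuadTree(midX, midY, maxX, maxY)
            };
            for (double[] p : points) {
                for (QuadTree child : children) if (child.insert(p[0], p[1])) break;
            }
            points.clear();
        }
    }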
Architecturally, I would arrange for each "point" to phone home to a server with its location whenever it changes by more than a certain amount. On the server you can do the heavy lifting of calculating the distance between the point that moved and each of the other points, and, for each of the other points, updating their list of the 100 closest points if required. You can then push changes to a point's closest-100 list as they happen (trivial if you are using App Engine; Android push is supported).
This reduces the amount of work involved to an absolute minimum:
Only report a location change when a point moves far enough
Only recalculate distances when a report is received
Don't rebuild the closest-100 list for a point every time; build the list once, then work out whether a point that has moved should be added to or removed from every other point's list.
Only notify a point of changes to its top 100 list to preserve bandwidth.
There are algorithms that you can use to make this super-efficient, and the problem has a fork/join feel to it as well, allowing you to throw horsepower at the problem.
You have to divide the earth into zones and then use an interior point algorithm to figure out what zones the phone is in. Each possible subset of zones will uniquely determine the 100 closest nodes to a fair approximation. You can get an exact set of 100 nodes by checking distance one by one against the candidate nodes, which (once again) are determined by the subset of zones.
Instead of an R-tree or a quadtree (i.e. a spatial index), you can also use a quadkey and a monster curve (a space-filling curve such as the Hilbert curve). Such a curve reduces the dimension and completely fills the space. You can download my PHP Hilbert curve class from phpclasses.org. You can use a simple varchar column for the quadkey and search the levels from left to right. A good explanation is on the Microsoft Bing Maps quadkey page.
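For reference, a sketch of the quadkey computation described on the Bing Maps page (Web-Mercator tile coordinates, with the tile X/Y bits interleaved into a base-4 string). Nearby points share quadkey prefixes, so a varchar column searched by prefix from left to right can stand in for a spatial index:

    // Latitude is assumed to be within the Web-Mercator limits (about +/-85.05 degrees).
    static String toQuadKey(double latitude, double longitude, int level) {
        double sinLat = Math.sin(Math.toRadians(latitude));
        double x = (longitude + 180.0) / 360.0;
        double y = 0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI);

        int tiles = 1 << level;
        int tileX = (int) Math.min(tiles - 1, Math.max(0, Math.floor(x * tiles)));
        int tileY = (int) Math.min(tiles - 1, Math.max(0, Math.floor(y * tiles)));

        StringBuilder quadKey = new StringBuilder();
        for (int i = level; i > 0; i--) {
            int digit = 0;
            int mask = 1 << (i - 1);
            if ((tileX & mask) != 0) digit += 1;
            if ((tileY & mask) != 0) digit += 2;
            quadKey.append(digit);
        }
        return quadKey.toString();
    }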
Hope you are all well.
I am at somewhat of a crossroads in my current project: I need to extract grayscale pixel values that will be sorted as per the discussion in my previous post (which was very kindly and thoroughly answered).
The two main methods that I am aware of are:
Extract the grayscale from the YUV preview.
Take the photo, and convert the RGB values to grayscale.
One of my main aims is simplicity (the project as a whole needs it), so my question is: which of these two (or another method I am not aware of) would be the most reliable/stable, while being less taxing on battery and processing time?
Please note, I am not after any code samples, but am looking for what people may have experienced, may have read (in articles etc.) or have an intuitive hunch about.
Thank you for taking the time to read this.
I'm currently working on a project which also uses pixel values to do some calculations, and I noticed that it's better to use the values directly from the YUV preview if you only need the grayscale, or need to use the entire preview for your calculation.
If you want to use the RGB values, or only need to calculate something based on a certain part of the preview, it's better to convert just the area you need to a Bitmap, for instance, and use that.
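To make the first option concrete, here is a minimal sketch assuming the legacy android.hardware.Camera API and its default NV21 preview format, where the Y (luminance) plane is simply the first width*height bytes of each buffer:

    import android.hardware.Camera;

    // Read grayscale values straight from the Y plane of each preview frame;
    // no RGB conversion is needed. `camera` is an already-opened Camera instance.
    void readGrayscaleFromPreview(Camera camera) {
        camera.setPreviewCallback(new Camera.PreviewCallback() {
            @Override
            public void onPreviewFrame(byte[] data, Camera cam) {
                Camera.Size size = cam.getParameters().getPreviewSize();
                int pixelCount = size.width * size.height;
                long sum = 0;
                for (int i = 0; i < pixelCount; i++) {
                    sum += data[i] & 0xFF;        // each Y byte is one grayscale pixel
                }
                int averageLuminance = (int) (sum / pixelCount);
                // ... feed averageLuminance (or the raw Y plane) into your calculation
            }
        });
    }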
However, it all depends on what you're trying to achieve, since no two projects are alike. If you have the time, why not (roughly) implement both methods and do a quick test to see which works better in terms of CPU usage and total processing time? That's how I found the best method for my particular problem.
I want to use hashes to uniquely identify photos from an Android phone, to answer queries of "does the server have xyz?" and "fetch the image which hashes to xyz". I face the following:
Hashing the whole image is likely to be slow, hence I want to hash only the first few units (bytes) of the image file, not the whole file.
The first few bytes are insufficient due to composition: e.g. a user takes a photo of a scene, and then takes a second photo of the same scene after adding a paper clip at the bottom of the frame.
The first few bytes are insufficient to avoid hash collisions, i.e. they may cause mix-ups between users.
How many bytes must I hash from the image file to keep the chance of a mishap low? Is there a better indexing scheme?
As soon as you leave any bytes out of the hash, you give someone the opportunity to create (either deliberately or accidentally) a file that differs only at those bytes, and hence hashes the same.
How different this image actually looks from the original depends to some extent on how many bytes you leave out of the hash, and where. But you first have to decide what hash collisions you can tolerate (deliberate/accidental and major/minor); then you can think about how fast a hash function you can use, and how much data you need to include in it.
Unless you're willing to tolerate a "largeish block" of data changing, you need to include bytes from every "largeish block" in the hash. From the point of view of I/O performance this means you need to access pretty much the whole file, since reading even one byte will cause the hardware to read the whole block that contains it.
Probably the thing to do is start with "definitely good enough", such as an SHA-256 hash of the whole file. See how much too slow that is, then think about how to improve performance by the required percentage. For example if it's only 50% too slow you could probably solve the problem with a faster (less secure) hash but still including all the data.
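As a baseline to measure against, a streamed SHA-256 of the whole file might look like this (standard java.security.MessageDigest, nothing exotic), so the image never has to fit in memory:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Hash the entire file in 8 KB chunks and return the 32-byte digest.
    static byte[] sha256OfFile(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new FileInputStream(path)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }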
You can work out the limit of how fast you can go with a less secure hash by implementing some completely trivial hash (e.g. XOR of all the 4-byte words in the file), and see how fast that runs. If that's still too slow then you need to give up on accuracy and only hash part of the file (assuming you've already done your best to optimize the I/O).
If you're willing to tolerate collisions, then for most (all?) image formats, there's enough information in the header alone to uniquely identify "normal" photos. This won't protect you against deliberate collisions or against the results of image processing, but barring malice the timestamp, image size, camera model etc, together with even a small amount of image data will in practice uniquely identify every instance of "someone taking a photo of something". So on that basis, you could hash just the first 64-128k of the file (or less, I'm being generous to include the max size of an EXIF header plus some) and have a hash that works for most practical purposes but can be beaten if someone wants to.
Btw, unless done deliberately by a seriously competent photographer (or unless the image is post-processed deliberately to achieve this), taking two photos of the same scene with a small difference in the bottom right corner will not result in identical bytes at the beginning of the image data. Not even close, if you're in an environment where you can't control the light. Try it and see. It certainly won't result in an identical file when done with a typical camera that timestamps the image. So the problem is much easier if you're only trying to defend against accidents, than it is if you're trying to defend against deception.
I think the most efficient approach is to pick random byte positions (selected in advance and kept static throughout) and calculate an XOR or some other simple hash over them; that should be good enough.
I am working on a text-based RPG for Android (just built around the default views and buttons) to get a handle on some things before I launch into a more graphically intensive game. There is a Player who moves around Locations, and each Location has a set of possible Actions. The Locations and Actions have Strings for name and description, which are displayed by TextViews.
My question is how to store the multiple Locations and Actions for the game? In its current state, I'm just calling new Location() multiple times in onCreate(), but with the 50 or so I'm planning on, the code would be massive, and I'm sure there's a better way to do it. I've thought about subclassing Location for each specific location, but that would be just as bad. I've also looked at JSON, and using an SQLite Database, and I'm sure there are other valid approaches as well.
Does anyone have any links or suggestions for storing these "plot"-related items?
If I understand correctly, at this time you have the Location object initialization code, with the required data hard-coded, in the onCreate method. This is a fairly good solution for a prototype, but if you want more, you have to move the data out of the code and lazy-initialize the Location objects when required.
If you are not planning on modifying the Locations, then I would suggest JSON, or even easier: your own text-based protocol that is easy to parse, stored in files in the assets folder. For example, like this:
LOCATION 14 Kitchen
LOCATION_DESCRIPTION Entering the kitchen you smell the freshly cut tomatoes on the table...
ACCESSIBLE_LOCATIONS 13 10 24 54
AVAILABLE_ITEMS 56 23 12 8
...
And you can parse the file line by line with a BufferedReader, building your object.
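A rough sketch of such a parser, assuming a locations.txt file in assets that uses the keywords from the example above; the Location class and its setters are placeholders for whatever you already have:

    import android.content.Context;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    // Read the asset line by line and map each keyword to a Location field.
    Location parseLocation(Context context) throws IOException {
        Location location = new Location();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(context.getAssets().open("locations.txt")))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(" ", 2);                 // keyword, then the rest of the line
                switch (parts[0]) {
                    case "LOCATION":
                        String[] idAndName = parts[1].split(" ", 2); // e.g. "14 Kitchen"
                        location.setId(Integer.parseInt(idAndName[0]));
                        location.setName(idAndName[1]);
                        break;
                    case "LOCATION_DESCRIPTION":
                        location.setDescription(parts[1]);
                        break;
                    case "ACCESSIBLE_LOCATIONS":
                        location.setAccessibleLocationIds(parts[1].split(" "));
                        break;
                    // ... handle AVAILABLE_ITEMS and any other keywords the same way
                }
            }
        }
        return location;
    }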
If Locations contain information that may be modified and the modifications should be stored persistently, then you have to use a database; there are lots of tutorials for that. This way you can save the modifications.