(Moved from https://softwareengineering.stackexchange.com/questions/406813/how-to-implement-fnv-1a-in-sqlite)
I'm trying to modify a SQLite query (in Android) to return its results in pseudorandom order. As in this question, the order needs to be stable over repeated queries (e.g. due to paging, screen rotation, etc.), so I can't just use ORDER BY RANDOM(). Instead I want to use a hash function that depends on a couple of input values that provide stability and sufficient uniqueness. (One of these values is a unique ID column of the table, which is a set of integers fairly close together; the other value is more like a session ID, also an integer, that remains invariant within this query.)
According to this well-researched answer, FNV-1 and FNV-1a are simple hash functions with few collisions and good distribution. But as simple as they are, FNV-1 and FNV-1a both involve XOR operations, as well as looping over the bytes of input.
Looping within each row of a query is pretty awkward. One could fake it by unrolling the loop, especially if only a few bytes are involved. I could make do with two bytes, combining LSBs from the two input values (val1 & 255 and val2 & 255).
XOR isn't supported directly in SQLite. I understand A ^ B can be implemented as (A | B) - (A & B). But the repetition of values, combined with the unrolling of the loop, starts to get unwieldy. Could I just use + (ignoring overflow) instead of XOR? I don't need very high quality randomness. The order just needs to look random to a casual observer over small-integer scales.
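As a quick sanity check that the identity holds (the AND bits are a subset of the OR bits, so the subtraction never borrows), here is a tiny Kotlin check over a few sampled pairs:

// Verifies A xor B == (A or B) - (A and B) for a few sample pairs.
fun main() {
    val pairs = listOf(0x35L to 0xA7L, 12345L to 987L, 255L to 255L)
    for ((a, b) in pairs) {
        check((a xor b) == (a or b) - (a and b)) { "identity failed for $a, $b" }
    }
    println("A xor B == (A or B) - (A and B) held for all sampled pairs")
}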
So I'm wondering if anyone has already implemented such a thing. Given how widely used this hash function is, it seems like there would likely already be an implementation for this situation.
Here's my attempt at implementing FNV-1a:
SELECT ..... ORDER BY (((fnvbasis + val1 & 255) * fnvprime) + val2 & 255) * fnvprime % range;
I'm ignoring the fact that in FNV, the XOR operation (which I've replaced with +) is only supposed to affect the lowest 8 bits of the hash value. I'm also ignoring any overflow (which I hope just means the upper bits, which I don't care about, are lost).
For fnvbasis I'll use 2166136261, and for fnvprime I'll use 16777619. These are the values specified for a 32-bit hash, since I don't see values specified for a 16-bit hash. For range I'll use a prime number that's greater than the expected number of rows returned by this query.
So is this a reasonable way to approximate FNV-1a in a SQLite query? Is there a better, existing implementation? I.e. will it actually produce an ordering that looks pretty random to a casual user, despite my mutilating the operations of the real FNV-1a?
Inspired by comments from rwong and GrandmasterB on the previous attempt at this question before I moved it, I decided I could precompute the first iteration of FNV-1a's loop, i.e. the hash based on the unique ID column of the table. The precomputed column, fnv1a_step1, is set to
(fnvbasis ^ (ID & 0xFF)) * fnvprime
Because this value is precomputed on each row of the table separately, it can be supplied by the app and doesn't need to be expressed in SQLite; hence the use of ^ (XOR) above. Also, if ID is a string, we can compute an 8-bit hash value from it in Java or Kotlin as well. But we could even use
(fnvbasis + (RANDOM() & 0xFF)) * fnvprime
(back to using + if doing this in SQLite) because the value is only computed once, and therefore is stable even when computed from RANDOM().
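As a concrete illustration, here is a minimal Kotlin sketch of that precomputation; 2166136261 and 16777619 are the standard 32-bit FNV offset basis and prime, and the String overload is just one cheap way of reducing a string ID to 8 bits:

// Standard 32-bit FNV-1a parameters.
const val FNV32_BASIS = 2166136261L
const val FNV32_PRIME = 16777619L

// First FNV-1a iteration over the low 8 bits of the row's unique ID;
// masking to 32 bits mimics the unsigned 32-bit overflow of the real algorithm.
fun fnv1aStep1(id: Long): Long =
    ((FNV32_BASIS xor (id and 0xFF)) * FNV32_PRIME) and 0xFFFFFFFFL

// If the ID is a string, reduce it to 8 bits first (any cheap reduction will do here).
fun fnv1aStep1(id: String): Long =
    fnv1aStep1(id.toByteArray().fold(0L) { acc, b -> (acc + b) and 0xFF })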
The second iteration of the FNV-1a loop can be computed pretty simply in the ORDER BY clause of the query, using the current session ID, so it produces a different-but-stable ordering for each session:
ORDER BY (fnv1a_step1 + sessionId & 0xFF) * fnvprime % range;
I've implemented this in my app, and it seems to work, to my requirements. The order is stable within a session, but is different in each session.
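For reference, a minimal sketch of issuing the per-session query from the app; the items table name is a placeholder, 2003 stands in for the range prime, and the 8-bit mask is applied in Kotlin so the parentheses stay explicit (in SQLite, & binds more loosely than +):

import android.database.Cursor
import android.database.sqlite.SQLiteDatabase

// Placeholders: an "items" table with the precomputed fnv1a_step1 column,
// 16777619 as the FNV prime, and 2003 as a prime larger than the expected row count.
fun queryInStableRandomOrder(db: SQLiteDatabase, sessionId: Long): Cursor =
    db.rawQuery(
        "SELECT * FROM items " +
        "ORDER BY ((fnv1a_step1 + ${sessionId and 0xFF}) * 16777619) % 2003",
        null
    )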
Related
I am quite new to all things Android and Kotlin. I am currently working with an Android app from Punch Through:
(Blog: https://punchthrough.com/android-ble-guide/)
(GitHub: https://github.com/PunchThrough/ble-starter-android)
The app connects with a BLE peripheral and allows the user to enter text to send to the peripheral via UART.
I am struggling to interpret what the following code means / does:
with(hexField.text.toString()) {
    if (isNotBlank() && isNotEmpty()) {
        val bytes = hexToBytes()
        ConnectionManager.writeCharacteristic(device, characteristic, bytes)
    }
}
Where hexField.text.toString() is the text entered in the EditText field by the user,
and
where hexToBytes() is defined as:
private fun String.hexToBytes() =
    this.chunked(2).map { it.toUpperCase(Locale.US).toInt(16).toByte() }.toByteArray()
I have tried this a few times, always entering “111”, and am using Timber to output the result of bytes. This result varies every time, for example:
[B#2acf801
[B#476814a
[B#e9a70e5
[B#10172a0
So, I assume that only the first three characters are relevant, and somehow there is no end of line / string information.
So perhaps I am only interested in: [B#.......
B# = 0x 5B 42 40
Hex: 5B4240
Dec: 5980736
Bin: 10110110100001001000000
So then I try (and fail) to interpret / breakdown what this code might be doing.
The first thing I struggle with is understanding the order of operation.
Here's my guess....
Given EditText entry, in this case I entered "111"
First:
this.chunked(2)
would produce something like:
"11 and "01"
Second, for each of the two items ("11 and "01"):
it.toUpperCase(Locale.US).toInt(16).toByte()
would produce byte values:
17 and 1
Third:
.map .toByteArray()
Would produce something like:
[1,7,1]
or
[0x01, 0x07, 0x1]
or
[0x31, 0x37, 0x31]
So, as you can see, I am getting lost in this!
Can anyone help me deconstruct this code?
Thanks in advance
Garrett
I have tried this a few times, always entering “111”, and am using Timber to output the result of bytes. This result varies every time
The output when you try to print a ByteArray (or any array on the JVM) doesn't show the contents of the array, but its type and identity hash code (which looks like a memory address). This is why you don't get the same result every time.
In order to print an array's contents, use theArray.contentToString() (instead of plain interpolation or .toString()).
Regarding the interpretation of the code, you almost got it right, but there are a few mistakes here and there.
this.chunked(2) on the string "111" would return a list of 2 strings: ["11", "1"] - there is no padding here, just the plain strings with max size of 2.
Then, map takes each of those elements individually and applies the transformation it.toUpperCase(Locale.US).toInt(16).toByte(). This makes the string uppercase (which doesn't change anything for the 1s), then converts the string into an integer by interpreting it in base 16, and then truncates this integer to a single byte. This part you got right: it transforms "11" into 17 and "1" into 1, but the map {...} operation transforms the list ["11", "1"] into [17, 1]; it doesn't take the digits of 17 individually.
Now toByteArray() just converts the List ([17, 1]) into a byte array of the same values, so it's still [17, 1].
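To see the difference for yourself, here is a small self-contained sketch that reuses the extension from the question and prints both the default toString() and the contentToString() output (the exact [B@... value will differ on every run):

import java.util.Locale

private fun String.hexToBytes() =
    this.chunked(2).map { it.toUpperCase(Locale.US).toInt(16).toByte() }.toByteArray()

fun main() {
    val input = "111"
    println(input.chunked(2))           // [11, 1]
    val bytes = input.hexToBytes()
    println(bytes.toString())           // e.g. [B@2acf801 -- type + identity hash, not the contents
    println(bytes.contentToString())    // [17, 1]
}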
I am using Room Database in my Android app. One of the columns represents the volume of a drink in milliliters [ml]. I wonder what the proper way is to give the user the option to choose/change units to [cl] or others.
Should I make a settings option to choose unit, and then convert all values in my database?
Should I store for example [ml] and then convert values depending on units selected by user?
What is the most efficient way? What will consume fewer resources?
Do you guys have some good open source examples/tutorials/code snippets?
It really depends on the volume and complexity of your conversions.
If you only have a single value (ML) and need to convert it to another value (CL, OZ...), then you can just store a base unit in the DB and convert the value in real time just before displaying it to the user.
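For instance, a minimal sketch of that option in Kotlin; the enum values and conversion factors are illustrative, not from the app in question:

// Volume is stored in the database in ml only; conversion happens at display time.
enum class VolumeUnit(val mlPerUnit: Double) {
    ML(1.0),
    CL(10.0),
    OZ(29.5735)   // US fluid ounce
}

fun formatVolume(volumeMl: Int, target: VolumeUnit): String =
    String.format("%.1f %s", volumeMl / target.mlPerUnit, target.name.lowercase())

Here formatVolume(250, VolumeUnit.CL) renders as "25.0 cl" while the stored value stays 250 ml.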
If you feel this will get out of hand because you'll have many value types and trouble keeping track of all of them, you can do the conversion in the DB instead, which will add some overhead to the select queries.
for example, you have this table:
ID unit val
--- --- ---
1 ML 960
2 ML 4112
3 KG 70
4 KG 35
5 C 37
You'll always keep the base units in the DB the same, meaning you will not mix them with F, OZ and LBS.
You can convert the units in your select query:
SELECT id, unit, val,
       case when unit = 'ML' then val * 0.033        -- ml to fl oz
            when unit = 'KG' then val * 2.205        -- kg to lbs
            when unit = 'C'  then val * 9.0 / 5 + 32 -- Celsius to Fahrenheit
       end as result
FROM tbl
In both cases, since this is an Android app the work will be done via SQLite or the app itself.
Personally, I would not mix units on the same table because it would add a layer of complexity to later sort, compare and retrieve them.
Hope that helped.
Background
Suppose I have a RecyclerView whose items can only be unique if you look at 2 IDs they have, but not just one of them.
The first Id is the primary one. Usually there aren't 2 items that have the same primary ID, but sometimes it might occur, which is why there is a secondary ID.
In my
The problem
The RecyclerView adapter needs getItemId() to return a "long":
https://developer.android.com/reference/android/support/v7/widget/RecyclerView.Adapter.html#getItemId(int)
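For context, a rough sketch of where such a combined id plugs into an adapter; the item type, adapter name, and stableIdFor() helper are hypothetical placeholders, not from the question:

import androidx.recyclerview.widget.RecyclerView

// Placeholder item type: just enough to show the two ids.
data class MyItem(val primaryId: Long, val secondaryId: Long)

// Sketch of where a combined stable id is returned; view-holder code is omitted.
abstract class StableIdAdapter(private val items: List<MyItem>) :
    RecyclerView.Adapter<RecyclerView.ViewHolder>() {

    init {
        setHasStableIds(true)   // tells RecyclerView that getItemId() values are meaningful
    }

    override fun getItemCount() = items.size

    override fun getItemId(position: Int): Long {
        val item = items[position]
        // stableIdFor() stands in for whichever scheme is chosen below
        // (map + counter, bit packing, or hashing).
        return stableIdFor(item.primaryId, item.secondaryId)
    }

    protected abstract fun stableIdFor(primary: Long, secondary: Long): Long
}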
What I've tried
The easy way to overcome this is to have a HashMap and a counter.
The HashMap will contain the combined keys, and the value will be the id that should be returned. The counter is used to generate the next id in case of a new combined key. The combined key can be a "Pair" class in this case.
Suppose each item in the RecyclerView data has 2 long-type keys:
HashMap<Pair<Long, Long>, Long> keyToIdMap = new HashMap<>();
long idGenerator = 0;
This is what to do in getItemId():
Pair<Long, Long> combinedKey = new Pair<>(item.getPrimaryId(), item.getSecondary());
Long uniqueId = keyToIdMap.get(combinedKey);
if (uniqueId == null)
    keyToIdMap.put(combinedKey, uniqueId = idGenerator++);
return uniqueId;
This has the drawback of taking more and more memory. Not much though, and it's very small and proportional to the data you already have, but still...
However, this has the advantage of being able to handle all types of IDs, and you can use even more IDs as you wish (just need something similar to Pair).
Another advantage is that it will use all IDs starting from 0.
The question
Is there perhaps a better way to achieve this?
Maybe a mathematical way? I remember learning in the past about using prime numbers for similar tasks. Will it work here somehow?
Do the existing primary and secondary ids use the entire 64-bit range of longs? If not, then it's possible to compute a unique 64-bit long from their values with, e.g., bit slicing.
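For example, a minimal sketch of that packing, assuming both ids are known to fit in 32 unsigned bits:

// Packs two 32-bit ids into one 64-bit long; collision-free as long as the
// 32-bit assumption holds (the require() makes that assumption explicit).
fun combineIds(primaryId: Long, secondaryId: Long): Long {
    require(primaryId in 0..0xFFFFFFFFL && secondaryId in 0..0xFFFFFFFFL) {
        "both ids must fit in 32 unsigned bits"
    }
    return (primaryId shl 32) or secondaryId
}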
Another approach would be to hash the two together with a hash with very low collisions (a crypto hash like SHA-2, for example) and use the first 64 bits of the result. Having a range of 64 bits means you can comfortably have millions of items before a collision becomes likely: the chance of a collision only reaches 50% when you add about sqrt(2**64) = 2**32 items, which is more than 4 billion.
Finally, having a unique independent mapping is very versatile, and assuming the map is always accessible, it's fine (it gets tricky when you try to synchronize that new id across machines, etc.). In Java you can attempt to increase performance by avoiding the boxed Longs and a separate Pair instance using a custom map implementation, but that's micro-optimizing.
Example using SHA1:
With Guava - the usage is clean and obvious.
HashFunction hf = Hashing.sha1();
long hashedId = hf.newHasher()
        .putLong(primary)
        .putLong(secondary)
        .hash()
        .asLong();
With just the standard JDK, it's pretty horrible and can probably be made more efficient; it should look something like this (I'm ignoring checked exceptions):
static void updateDigestWithLong(MessageDigest md, long l) {
    // feed all 8 bytes of the long to the digest, least significant byte first
    for (int i = 0; i < 8; i++) {
        md.update((byte) (l >> (i * 8)));
    }
}
// this is from the Guava sources, can reimplement if you prefer
static long padToLong(byte[] bytes) {
    long retVal = (bytes[0] & 0xFF);
    for (int i = 1; i < Math.min(bytes.length, 8); i++) {
        retVal |= (bytes[i] & 0xFFL) << (i * 8);
    }
    return retVal;
}
static long hashLongsToLong(long primary, long secondary) {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    updateDigestWithLong(md, primary);
    updateDigestWithLong(md, secondary);
    return padToLong(md.digest());
}
I think my original idea is the best one I can think of.
It should cover all possible ids with the fewest possible collisions.
I'm actually using Math.sin() in my Android app to calculate the sine of a given angle (using Math.toRadians(angle_in_degrees)). For example, when I want to get the cosine of 90 degrees, which should be 0, the result is 6.123233...E-17. Thank you.
For floating point numbers, the system can often only approximate their values. For instance, the system would return something like 0.333333 for the expression (1.0 / 3). The number of 3s after the decimal point will be different depending on whether you're using floats or doubles, but it will still be limited to some finite length.
If you're just displaying the value, then you can limit the number of digits using something like String.format("%.2f", value) or by rounding it using one of the rounding functions such as Math.round().
The tricky part comes when you need to compare the value to something. You can't just use if (value == some_constant) or even if (value == some_variable). At minimum, you usually have to use something like if (Math.abs(value - some_constant) < 0.001). The actual value of the '0.001' depends on the needs of your particular application and is customarily defined as a named constant.
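A short self-contained illustration; the 1e-9 tolerance is just an example value and, as noted above, should be chosen per application:

import kotlin.math.abs
import kotlin.math.cos

const val EPSILON = 1e-9   // example tolerance; pick one that suits your application

fun approximatelyEqual(a: Double, b: Double) = abs(a - b) < EPSILON

fun main() {
    val c = cos(Math.toRadians(90.0))     // 6.123233995736766E-17, not exactly 0.0
    println(c == 0.0)                     // false
    println(approximatelyEqual(c, 0.0))   // true
}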
For more complicated needs, you can implement the algorithm in the Floating-Point Guide.
You're getting back an approximation from Math.cos(Math.toRadians(90)) which is
6.123233... E-17 == 0.00000000000000006123233... which is basically 0
The following link should help clear things up as far as the precision of doubles/floats in programming.
http://www.java67.com/2015/09/float-and-double-value-comparison-in-java-use-relational.html
I am wondering how I would be able to run a SQLite ORDER BY in this manner:
select * from contacts order by jarowinkler(contacts.name,'john smith');
I know Android's SQLite has poor support for user-defined functions; do I have an alternative?
Step #1: Do the query minus the ORDER BY portion
Step #2: Create a CursorWrapper that wraps your Cursor, calculates the Jaro-Winkler distance for each position, sorts the positions, then uses the sorted positions when overriding all methods that require a position (e.g., moveToPosition(), moveToNext()).
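A rough sketch of what step #2 could look like; the jaroWinkler parameter and the name column are assumptions standing in for your own distance implementation and schema:

import android.database.Cursor
import android.database.CursorWrapper

// Wraps an unordered cursor and exposes its rows sorted by descending similarity
// to the query string. Only the basic movement methods are overridden here.
class SortedByDistanceCursor(
    cursor: Cursor,
    query: String,
    jaroWinkler: (String, String) -> Double   // your distance implementation
) : CursorWrapper(cursor) {

    private val order: IntArray   // caller position -> underlying cursor position
    private var position = -1

    init {
        val nameIdx = cursor.getColumnIndexOrThrow("name")
        val scores = DoubleArray(cursor.count) { i ->
            cursor.moveToPosition(i)
            jaroWinkler(cursor.getString(nameIdx), query)
        }
        order = (0 until cursor.count).sortedByDescending { scores[it] }.toIntArray()
    }

    override fun getPosition() = position

    override fun moveToPosition(newPosition: Int): Boolean {
        if (newPosition < -1 || newPosition > count) return false
        position = newPosition
        return newPosition in 0 until count && super.moveToPosition(order[newPosition])
    }

    override fun moveToFirst() = moveToPosition(0)
    override fun moveToLast() = moveToPosition(count - 1)
    override fun moveToNext() = moveToPosition(position + 1)
    override fun moveToPrevious() = moveToPosition(position - 1)
    override fun move(offset: Int) = moveToPosition(position + offset)
}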
Pre-calculate string lengths and add them to a separate column, then sort the entire table by that length. Add indexes (if you can). Then add extra filters: for example, you don't want to compare "Srivastava Brahmaputra" to "John Smith"; the lengths are too far out of whack, so exclude those kinds of comparisons by length as a percentage of the total length. So if your word is 10 characters, compare it only to words with 10±2 or 10±3 characters.
This way you will significantly reduce the number of times this algorithm needs to run.
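A sketch of that pre-filter; the name_length column is a hypothetical precomputed (and indexable) column, and jaroWinkler() stands in for whatever distance implementation the app already uses:

import android.database.sqlite.SQLiteDatabase

// Only rows whose name length is within roughly +-20% (at least +-2) of the
// query length are handed to the expensive distance function.
fun closeMatches(
    db: SQLiteDatabase,
    query: String,
    jaroWinkler: (String, String) -> Double
): List<Pair<String, Double>> {
    val slack = maxOf(2, query.length / 5)
    val results = mutableListOf<Pair<String, Double>>()
    db.rawQuery(
        "SELECT name FROM contacts WHERE name_length BETWEEN ? AND ?",
        arrayOf((query.length - slack).toString(), (query.length + slack).toString())
    ).use { cursor ->
        while (cursor.moveToNext()) {
            val name = cursor.getString(0)
            results += name to jaroWinkler(name, query)
        }
    }
    return results.sortedByDescending { it.second }
}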
Typically, in a vocabulary of 100,000 entries such filters reduce the number of comparisons to about 300. Unless you are doing full-blown record linkage, and then I would wonder why use Android for that: you would still need to apply probabilistic methods and calculate scores, and that is not a job for Android (at least not for now).
Also, in MS SQL Server the Jaro-Winkler string distance wrapped in a CLR function performs much better, since SQL Server doesn't support arrays natively and much of the processing revolves around arrays. An implementation in T-SQL adds too much overhead, but SQL-CLR works extremely fast.