Doing order by using the Jaro-Winkler distance algorithm? - android

I am wondering how I would be able to run a SQLite ORDER BY in this manner:
select * from contacts order by jarowinkler(contacts.name,'john smith');
I know Android has limitations around user-defined functions; do I have an alternative?

Step #1: Do the query minus the ORDER BY portion
Step #2: Create a CursorWrapper that wraps your Cursor, calculates the Jaro-Winkler distance for each position, sorts the positions, then uses the sorted positions when overriding all methods that require a position (e.g., moveToPosition(), moveToNext()).
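A rough sketch of what that wrapper might look like (the jaroWinkler() helper here is a placeholder for whatever implementation you plug in; it is not part of the Android SDK):

import android.database.Cursor;
import android.database.CursorWrapper;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedCursor extends CursorWrapper {
    private final List<Integer> order = new ArrayList<>(); // logical position -> physical position
    private int position = -1;

    public SortedCursor(Cursor cursor, String target, int nameColumn) {
        super(cursor);
        // Score every row once, then remember the physical positions in score order.
        final double[] scores = new double[cursor.getCount()];
        for (int i = 0; i < scores.length; i++) {
            cursor.moveToPosition(i);
            scores[i] = jaroWinkler(cursor.getString(nameColumn), target);
            order.add(i);
        }
        Collections.sort(order, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return Double.compare(scores[b], scores[a]); // best match first
            }
        });
    }

    @Override
    public boolean moveToPosition(int pos) {
        position = pos;
        if (pos < 0 || pos >= order.size()) return false;
        return super.moveToPosition(order.get(pos)); // jump to the underlying row
    }

    @Override public boolean moveToFirst()    { return moveToPosition(0); }
    @Override public boolean moveToLast()     { return moveToPosition(order.size() - 1); }
    @Override public boolean moveToNext()     { return moveToPosition(position + 1); }
    @Override public boolean moveToPrevious() { return moveToPosition(position - 1); }
    @Override public int getPosition()        { return position; }

    // Placeholder: supply any Jaro-Winkler implementation here.
    private static double jaroWinkler(String a, String b) {
        throw new UnsupportedOperationException("plug in a real implementation");
    }
}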

Pre-calculate the string lengths and store them in a separate column, then sort the entire table by that length and add indexes (if you can). Then add extra filters: for example, you don't want to compare "Srivastava Brahmaputra" to "John Smith", because the lengths are far too different, so exclude those comparisons based on length as a percentage of the total length. If your word is 10 characters, compare it only to words of 10 ± 2 or 10 ± 3 characters.
This way you will significantly reduce the number of times the algorithm needs to run.
Typically, in a vocabulary of 100,000 entries, such filters reduce the number of comparisons to about 300. Unless you are doing full-blown record linkage, in which case I would wonder why you are using Android for it: you would still need to apply probabilistic methods and calculate scores, and that is not a job for Android (at least not for now).
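Here is a rough sketch of how that length filter might look on Android (the name_length column and the jaroWinkler() helper are assumptions, not part of your existing schema):

import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.util.Pair;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NameMatcher {
    public static List<Pair<Long, Double>> rankByName(SQLiteDatabase db, String target) {
        int tolerance = 2; // only compare names within +/- 2 characters of the target
        Cursor c = db.rawQuery(
                "SELECT _id, name FROM contacts WHERE name_length BETWEEN ? AND ?",
                new String[] {
                        String.valueOf(target.length() - tolerance),
                        String.valueOf(target.length() + tolerance)
                });

        List<Pair<Long, Double>> scored = new ArrayList<>();
        while (c.moveToNext()) {
            // The expensive distance calculation now runs only on the survivors.
            scored.add(new Pair<>(c.getLong(0), jaroWinkler(c.getString(1), target)));
        }
        c.close();

        Collections.sort(scored, new Comparator<Pair<Long, Double>>() {
            @Override
            public int compare(Pair<Long, Double> a, Pair<Long, Double> b) {
                return Double.compare(b.second, a.second); // best match first
            }
        });
        return scored;
    }

    // Placeholder: supply any Jaro-Winkler implementation here.
    private static double jaroWinkler(String a, String b) {
        throw new UnsupportedOperationException("plug in a real implementation");
    }
}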
Also, in MS SQL Server, a Jaro-Winkler string distance wrapped in a CLR function performs much better, since SQL Server doesn't support arrays natively and much of the processing revolves around arrays. A T-SQL implementation adds too much overhead, but SQL-CLR works extremely fast.

Related

How to implement FNV-1(a) in SQLite?

(Moved from https://softwareengineering.stackexchange.com/questions/406813/how-to-implement-fnv-1a-in-sqlite)
I'm trying to modify a SQLite query (in Android) to return its results in pseudorandom order. As in this question, the order needs to be stable over repeated queries (e.g. due to paging, screen rotation, etc.), so I can't just use ORDER BY RANDOM(). Instead I want to use a hash function that depends on a couple of input values that provide stability and sufficient uniqueness. (One of these values is a unique ID column of the table, which is a set of integers fairly close together; the other value is more like a session ID, also an integer, that remains invariant within this query.)
According to this well-researched answer, FNV-1 and FNV-1a are simple hash functions with few collisions and good distribution. But as simple as they are, FNV-1 and FNV-1a both involve XOR operations, as well as looping over the bytes of input.
Looping within each row of a query is pretty awkward. One could fake it by unrolling the loop, especially if only a few bytes are involved. I could make do with two bytes, combining LSBs from the two input values (val1 & 255 and val2 & 255).
XOR isn't supported directly in SQLite. I understand A ^ B can be implemented as (A | B) - (A & B). But the repetition of values, combined with the unrolling of the loop, starts to get unwieldy. Could I just use + (ignoring overflow) instead of XOR? I don't need very high quality randomness. The order just needs to look random to a casual observer over small-integer scales.
So I'm wondering if anyone has already implemented such a thing. Given how widely used this hash function is, it seems like there would likely already be an implementation for this situation.
Here's my attempt at implementing FNV-1a:
SELECT ..... ORDER BY (((fnvbasis + val1 & 255) * fnvprime) + val2 & 255) * fnvprime % range;
I'm ignoring the fact that in FNV, the XOR operation (which I've replaced with +) is only supposed to affect the lowest 8 bits of the hash value. I'm also ignoring any overflow (which I hope just means the upper bits, which I don't care about, are lost).
For fnvbasis I'll use 2166136261, and for fnvprime I'll use 16777619. These are the specified values for 32-bit input, since I don't see a specified value for 16-bit input. For range I'll use a prime number that's greater than the expected number of rows returned by this query.
So is this a reasonable way to approximate FNV-1a in a SQLite query? Is there a better, existing implementation? I.e. will it actually produce an ordering that looks pretty random to a casual user, despite my mutilating the operations of the real FNV-1a?
Inspired by comments from rwong and GrandmasterB on the previous attempt at this question before I moved it, I decided I could precompute the first iteration of FNV-1a's loop, i.e. the hash based on the unique ID column of the table. The precomputed column, fnv1a_step1, is set to
(fnvbasis ^ (ID & 0xFF)) * fnvprime
Because this value is precomputed on each row of the table separately, it can be supplied by the app and doesn't need to be expressed in SQLite; hence the use of ^ (XOR) above. Also, if ID is a string, we can compute an 8-bit hash value from it in Java or Kotlin as well. But we could even use
(fnvbasis + (RANDOM() & 0xFF)) * fnvprime
(back to using + if doing this in SQLite) because the value is only computed once, and therefore is stable even when computed from RANDOM().
The second iteration of the FNV-1a loop can be computed pretty simply in the ORDER BY clause of the query, using the current session ID, so it produces a different-but-stable ordering for each session:
ORDER BY (fnv1a_step1 + sessionId & 0xFF) * fnvprime % range;
I've implemented this in my app, and it seems to work for my requirements. The order is stable within a session, but different in each session.
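For reference, here is a sketch of how the two steps could fit together on Android (the table and column names items, id and fnv1a_step1 are placeholders for my actual schema, and the constants are the standard 32-bit FNV values):

import android.content.ContentValues;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

public class StableShuffle {
    static final long FNV_BASIS = 2166136261L; // 32-bit FNV offset basis
    static final long FNV_PRIME = 16777619L;   // 32-bit FNV prime
    static final long RANGE = 10007;           // a prime larger than the expected row count

    // Step 1: done once per row, in Java, when the row is inserted.
    static long fnv1aStep1(long id) {
        return (FNV_BASIS ^ (id & 0xFF)) * FNV_PRIME;
    }

    static void insertRow(SQLiteDatabase db, long id, String name) {
        ContentValues values = new ContentValues();
        values.put("id", id);
        values.put("name", name);
        values.put("fnv1a_step1", fnv1aStep1(id));
        db.insert("items", null, values);
    }

    // Step 2: done in the ORDER BY, mixing in the current session id
    // (masked in Java to avoid SQLite operator-precedence surprises).
    static Cursor queryInStableRandomOrder(SQLiteDatabase db, long sessionId) {
        String orderBy = "(fnv1a_step1 + " + (sessionId & 0xFF) + ") * "
                + FNV_PRIME + " % " + RANGE;
        return db.query("items", null, null, null, null, null, orderBy);
    }
}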

Renderscript, image processing, assigning pixel values from a precomputed array

I have an array of precomputed intensities (computed using a fuzzy logic inference system on a desktop machine). Now I want to use this array as a lookup table for a contrast enhancement application on Android, using RenderScript.
What I want to do, at a high level, is to process every pixel in an image and, using the lookup table, create a new image where the pixel at the corresponding position has the value looked up in the array. Before I start looking at how to implement this, is it even feasible?
Yes, it is feasible and this is something RS can handle with no problems. You'll need to provide your RS "kernel" with the pre-computed array data as either a separate Allocation or just a data array.
This talk will help get you started: https://youtu.be/3ynA92x8WQo
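As a very rough sketch of the Allocation route: this assumes a RenderScript file lut.rs that declares an rs_allocation global named gLut and a kernel named applyLut, and ScriptC_lut is the class the build tools would generate from that hypothetical file.

import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.RenderScript;

public class LutFilter {
    public Bitmap apply(RenderScript rs, Bitmap src, int[] precomputedLut) {
        Bitmap out = Bitmap.createBitmap(src.getWidth(), src.getHeight(), src.getConfig());

        // Wrap the input image, the output image and the lookup table in Allocations.
        Allocation in = Allocation.createFromBitmap(rs, src);
        Allocation outAlloc = Allocation.createFromBitmap(rs, out);
        Allocation lut = Allocation.createSized(rs, Element.I32(rs), precomputedLut.length);
        lut.copyFrom(precomputedLut);

        // ScriptC_lut would be generated by the build tools from the hypothetical lut.rs.
        ScriptC_lut script = new ScriptC_lut(rs);
        script.set_gLut(lut);                  // hand the table to the kernel
        script.forEach_applyLut(in, outAlloc); // run applyLut() over every pixel

        outAlloc.copyTo(out);
        return out;
    }
}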

Handling integers bigger than Long with Firebase and android

I am developing an app with Firebase. It is a game in which certain scores are shared between players and can grow without limit. I tried storing them as a String, but then I could not order them with orderByChild for the leaderboard. How can I handle this problem?
You could store the number as a linked list, with each node in the list representing a digit. Be careful with the order: putting the last digit (the ones digit) first in the list makes arithmetic on the linked list easier, while the other direction makes it easier to turn the list back into a number for display.
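A minimal sketch of that digit-list idea, least-significant digit first (the class and method names are purely illustrative):

import java.util.LinkedList;
import java.util.List;

public class DigitNumber {
    // digits.get(0) is the ones digit, digits.get(1) the tens digit, and so on
    final List<Integer> digits = new LinkedList<>();

    public static DigitNumber fromString(String s) {
        DigitNumber n = new DigitNumber();
        for (int i = s.length() - 1; i >= 0; i--) {
            n.digits.add(s.charAt(i) - '0');
        }
        return n;
    }

    public DigitNumber add(DigitNumber other) {
        DigitNumber result = new DigitNumber();
        int carry = 0;
        int len = Math.max(digits.size(), other.digits.size());
        for (int i = 0; i < len; i++) {
            int a = i < digits.size() ? digits.get(i) : 0;
            int b = i < other.digits.size() ? other.digits.get(i) : 0;
            int sum = a + b + carry;
            result.digits.add(sum % 10);
            carry = sum / 10;
        }
        if (carry > 0) result.digits.add(carry);
        return result;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = digits.size() - 1; i >= 0; i--) sb.append(digits.get(i));
        return sb.length() == 0 ? "0" : sb.toString();
    }
}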
To store integers that are too big for a long, Java provides the BigInteger class to handle those numbers. I suggest you use that: first read up on the concept, then work out exactly what you need!
Check this one: BigInteger
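A small sketch of basic BigInteger usage (the values are made up for illustration):

import java.math.BigInteger;

public class ScoreMath {
    public static void main(String[] args) {
        BigInteger score = new BigInteger("123456789012345678901234567890");
        BigInteger bonus = BigInteger.valueOf(5000);

        BigInteger total = score.add(bonus);   // arbitrary-precision addition
        String forStorage = total.toString();  // serialize before writing it anywhere
        BigInteger restored = new BigInteger(forStorage);

        System.out.println(restored.compareTo(score) > 0); // true: restored > score
    }
}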

What is the appropriate process for converting amplitude to dB on an Android device?

I want to know the appropriate process for converting amplitude to dB. I am using a double as below:
db = (20 * Math.log10(mediaRecorder.getMaxAmplitude()));
But there are suggestions to use a double as below:
db = (20 * Math.log10(x2 / REFERENCE));
I don't know what reference to use in which scenario.
The decibel is a much misused unit. It is defined as 10·log10(P1/P2), where P1 is the measured power and P2 is the reference power. That is, it is always relative to some reference power. A common reference power is one milliwatt, and this is the definition of dBm. 0 dBm is one milliwatt; +30 dBm is one watt. Don't be misled by the oft-quoted "0 dBm = one milliwatt in 600 ohms". This is an artifact of when voltage-measuring devices were used to display dBm. Because they measured voltage rather than power, an impedance at which they read correctly needed to be specified, and it was nearly always 600 ohms.
Over the years dB usage has been stretched to cover situations where having a logarithmic unit is really useful. For instance the voltage gain of an amplifier may be quoted in dB, using the formula 20log(Vout/Vin). In this situation, the input and output impedances (and hence powers) are often vastly different, so the usage is technically wrong. In practice it is a convenient unit to work with, and has been given some legitimacy by labeling it dBv.
The first formula you are using will return dB referenced to 1 volt in whatever impedance your circuit exhibits. This is fine, but it won't be dBm. Often this does not matter, as you just need to graph gain in dB against an arbitrary reference. If you need it to be dBm, just find the circuit impedance and use Ohm's law to work out what voltage represents one milliwatt in that impedance.
The second formula is a bit strange. What is x2? I would expect the formula to be 20log(Vmeasured/Vreference).
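For the Android case specifically, here is a sketch of a relative (full-scale referenced) conversion. Treating 32767 as the reference is an assumption based on getMaxAmplitude() returning 16-bit sample values; calibrate against a known source if you need absolute levels.

import android.media.MediaRecorder;

public class SoundLevel {
    // Assumption: full scale for 16-bit samples; not an absolute acoustic reference.
    private static final double REFERENCE = 32767.0;

    // Returns a level in dB relative to full scale: 0 dB at full scale, negative below it.
    public static double amplitudeToDb(MediaRecorder recorder) {
        int amplitude = recorder.getMaxAmplitude(); // maximum amplitude since the last call
        if (amplitude <= 0) {
            return Double.NEGATIVE_INFINITY; // silence, or recording has not started
        }
        return 20.0 * Math.log10(amplitude / REFERENCE);
    }
}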

Partial comparison of 2 strings

I'm looking for a way to compare 2 strings partially. Let me clarify this with an example.
The base string is "equality".
The string I need to check is spelled wrong: "equallaty". I want to confirm that this is partially correct, so the input, even though it isn't spelled correctly, is treated as the same as the base string.
Now I can of course parse the string into a char array and check every single character, but if I check the first 4 characters they will be right and the rest will be wrong, even though there are only 2 mistakes. So the check I want to use is that a minimum of 70 percent of the characters should match.
Is anyone able to help me get on the right track?
Compare the strings with an edit-distance metric like the Levenshtein distance. Such a metric basically counts the number of changes needed to make the strings equal. If the number of changes is small relative to the total size of the string, then you can consider the strings similar.
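A minimal sketch of a Levenshtein distance plus a percentage-style threshold check:

public class FuzzyMatch {
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + cost);       // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    // True when at least minPercent of the longer string matches.
    static boolean isSimilar(String a, String b, double minPercent) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return true;
        double similarity = 1.0 - (double) levenshtein(a, b) / longest;
        return similarity >= minPercent;
    }

    public static void main(String[] args) {
        // "equality" vs "equallaty": 2 edits over 9 characters, about 78% similar
        System.out.println(isSimilar("equality", "equallaty", 0.70)); // true
    }
}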
