String hashCode(): always the same result? - android

This question's answer explains the situation with Java well. I would like to know what the situation is with Android. Specifically:
Question 1: For a given string, will the hash code always be the same? (Even more specifically, I need a hashcode of a given string to be the same on a user's phone each time the app is opened).
I googled for the source of android's String and found this, but I'm playing with fire because I don't know the first thing about Android source, if/when it's modified by manufacturers etc.
Question 2: If the answer to 1 is no, then would it be sensible for me to use the hashCode() code in the source quoted above in my own hashCode() function?

The same String should always have the same hashCode(), based on the definition of hashCode().
If you look at Android's hashCode() in the String class, you will see that the hash is calculated from the char array (the same), the char count (the same), and an offset field. The offset seems to always be zero (0); it is set in the String constructor, and I don't know why Google adds it. Oracle's String.hashCode() is calculated from the char array and char count only.
You can build your own hashCode() function like Oracle's String.hashCode(). That implementation is based only on the char array and char count, so the same String always has the same hash code.
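For reference, the String.hashCode() Javadoc documents the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic. A minimal sketch of a hand-rolled version is below; the class and method names are made up for illustration.

```java
final class StableStringHash {
    // Same polynomial as the one documented for String.hashCode():
    // h = 31 * h + c for each char, with int overflow wrapping around.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

Because the function depends only on the characters of the string, it should give the same value on every device and every run.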

As the hash code algorithm is actually specified in the interface contract, and the Javadoc is also used as part of the Android SDK headers, I suppose you can count on it as being "stable".
But depending on your use case you might be better off using a cryptographically strong hash function like SHA1 or SHA256, as they are also a lot less likely to produce collisions (the Java hashCode() has only a 32-bit value range!).
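If you go the SHA-256 route, a sketch using the standard MessageDigest API could look like this. The class and method names are hypothetical, and encoding the string as UTF-8 before hashing is my assumption; any fixed encoding works as long as you always use the same one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Hash {
    // Returns the SHA-256 digest of the string's UTF-8 bytes as lowercase hex.
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a standard algorithm name, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```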

Related

How to implement FNV-1(a) in SQLite?

(Moved from https://softwareengineering.stackexchange.com/questions/406813/how-to-implement-fnv-1a-in-sqlite)
I'm trying to modify a SQLite query (in Android) to return its results in pseudorandom order. As in this question, the order needs to be stable over repeated queries (e.g. due to paging, screen rotation, etc.), so I can't just use ORDER BY RANDOM(). Instead I want to use a hash function that depends on a couple of input values that provide stability and sufficient uniqueness. (One of these values is a unique ID column of the table, which is a set of integers fairly close together; the other value is more like a session ID, also an integer, that remains invariant within this query.)
According to this well-researched answer, FNV-1 and FNV-1a are simple hash functions with few collisions and good distribution. But as simple as they are, FNV-1 and FNV-1a both involve XOR operations, as well as looping over the bytes of input.
Looping within each row of a query is pretty awkward. One could fake it by unrolling the loop, especially if only a few bytes are involved. I could make do with two bytes, combining LSBs from the two input values (val1 & 255 and val2 & 255).
XOR isn't supported directly in SQLite. I understand A ^ B can be implemented as (A | B) - (A & B). But the repetition of values, combined with the unrolling of the loop, starts to get unwieldy. Could I just use + (ignoring overflow) instead of XOR? I don't need very high quality randomness. The order just needs to look random to a casual observer over small-integer scales.
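The XOR identity mentioned above is easy to sanity-check outside the database. A small sketch in Java (the helper name is made up):

```java
final class XorIdentity {
    // XOR expressed with only OR, AND and subtraction:
    // the bits set in exactly one operand are (a | b) minus the shared bits (a & b).
    static long xorViaOrAnd(long a, long b) {
        return (a | b) - (a & b);
    }
}
```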
So I'm wondering if anyone has already implemented such a thing. Given how widely used this hash function is, it seems like there would likely already be an implementation for this situation.
Here's my attempt at implementing FNV-1a:
SELECT ..... ORDER BY ((fnvbasis + (val1 & 255)) * fnvprime + (val2 & 255)) * fnvprime % range;
I'm ignoring the fact that in FNV, the XOR operation (which I've replaced with +) is only supposed to affect the lowest 8 bits of the hash value. I'm also ignoring any overflow (which I hope just means the upper bits, which I don't care about, are lost).
For fnvbasis I'll use 2166136261, and for fnvprime I'll use 16777619. These are the specified values for the 32-bit hash, since I don't see specified values for a 16-bit one. For range I'll use a prime number that's greater than the expected number of rows returned by this query.
So is this a reasonable way to approximate FNV-1a in a SQLite query? Is there a better, existing implementation? I.e. will it actually produce an ordering that looks pretty random to a casual user, despite my mutilating the operations of the real FNV-1a?
Inspired by comments from rwong and GrandmasterB on the previous attempt at this question before I moved it, I decided I could precompute the first iteration of FNV-1a's loop, i.e. the hash based on the unique ID column of the table. The precomputed column, fnv1a_step1, is set to
(fnvbasis ^ (ID & 0xFF)) * fnvprime
Because this value is precomputed on each row of the table separately, it can be supplied by the app and doesn't need to be expressed in SQLite; hence the use of ^ (XOR) above. Also, if ID is a string, we can compute an 8-bit hash value from it in Java or Kotlin as well. But we could even use
(fnvbasis + (RANDOM() & 0xFF)) * fnvprime
(back to using + if doing this in SQLite) because the value is only computed once, and therefore is stable even when computed from RANDOM().
The second iteration of the FNV-1a loop can be computed pretty simply in the ORDER BY clause of the query, using the current session ID, so it produces a different-but-stable ordering for each session:
ORDER BY (fnv1a_step1 + (sessionId & 0xFF)) * fnvprime % range;
I've implemented this in my app, and it seems to work, to my requirements. The order is stable within a session, but is different in each session.
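For anyone who wants to do the precomputation in Java, here is a rough sketch. It shows a standard 32-bit FNV-1a over a byte array, the precomputed first step for the fnv1a_step1 column, and a Java mirror of the ORDER BY expression for testing. The class, method, and constant names are hypothetical. Masking step 1 to 32 bits is my own assumption: as far as I know, SQLite integers are 64-bit and an overflowing product turns into a floating-point value rather than wrapping, so keeping the stored value small keeps the second multiplication within range.

```java
final class Fnv1aSketch {
    static final long FNV_OFFSET_BASIS = 2166136261L; // 0x811C9DC5
    static final long FNV_PRIME = 16777619L;          // 0x01000193
    static final long MASK_32 = 0xFFFFFFFFL;

    // Standard 32-bit FNV-1a: XOR in each byte, then multiply by the prime.
    static int fnv1a32(byte[] data) {
        int h = (int) FNV_OFFSET_BASIS;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= (int) FNV_PRIME; // int overflow wraps, i.e. arithmetic mod 2^32
        }
        return h;
    }

    // First loop iteration only, using the low byte of the row ID.
    // Kept as an unsigned 32-bit value inside a long.
    static long step1(long id) {
        return ((FNV_OFFSET_BASIS ^ (id & 0xFF)) * FNV_PRIME) & MASK_32;
    }

    // Mirror of the ORDER BY expression, with + standing in for XOR.
    static long orderKey(long step1, long sessionId, long range) {
        return ((step1 + (sessionId & 0xFF)) * FNV_PRIME) % range;
    }
}
```

With step 1 below 2^32 and the prime below 2^25, the product in orderKey stays below 2^57, so it fits in a signed 64-bit integer.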

Use enum with number

I know there are many alternatives for achieving what I want, but I want this solution because it is the most comfortable for me. I wish to use an enum whose constants start with a number, like so:
public enum Quality {
1080p,
720p,
BlueRay //this one OK
}
And then use it like so when converting to string:
Quality.1080p.name();
Why is this not possible?
Because the Java language doesn't allow identifiers (variable names, enum constants and so on) to start with a digit; the first character must be a letter, an underscore, or a dollar sign. Any character after the first may be a digit. The main reason for this is to make parsing easier, and to prevent situations where the parser can't tell whether a symbol is a number or a variable name.
For example, if numbers were valid at the start of a variable I could do the following:
String 1 = "string";
System.out.println(1);
Does this print 1 or "string"? They avoid the problem by not allowing it. Many (most?) languages have that restriction.
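One common workaround, sketched below, is to give each constant a legal name and keep the display string in a field. The constant names (P1080, P720, BLUE_RAY) and the label()/fromLabel() helpers are invented for illustration.

```java
enum Quality {
    P1080("1080p"),
    P720("720p"),
    BLUE_RAY("BlueRay");

    private final String label;

    Quality(String label) {
        this.label = label;
    }

    // The human-readable form, which may start with a digit.
    String label() {
        return label;
    }

    // Reverse lookup from the display string back to the constant.
    static Quality fromLabel(String label) {
        for (Quality q : values()) {
            if (q.label.equals(label)) {
                return q;
            }
        }
        throw new IllegalArgumentException("Unknown quality: " + label);
    }
}
```

Usage would then be Quality.P1080.label() instead of Quality.1080p.name().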

How to revert back from currency format, Android?

I'm developing an Android app which uses this method:
public static String currencyFormat(BigDecimal n) {
return NumberFormat.getCurrencyInstance().format(n);
}
which formats a number as currency based on the default Locale.
How can I revert from the currency format, e.g. from $35 back to 35? Note that I cannot just remove the first character, because different locales have currency names of different lengths.
You must store the currency unit in a separate field, encapsulated in some higher-order abstraction such as a final value class (with appropriate equals() and hashCode() defined), e.g. called CurrencyAmount. (If you use Scala, you basically want a case class.) Any other solution will require you to reverse engineer the unit portion from the amount portion, and depending on the complexity of your spec of allowable values, that might be reliable only to varying degrees. I would just encode the two portions in their own fields and solve this for all cases.
You might try to cut out all non numeric characters from a String with a regex.
Try this.
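Another option worth trying is to let the same formatter parse the string back, since NumberFormat has a parse() method. The sketch below assumes the string was produced by the currency format of the same locale; the class name, the currencyParse helper, and the explicit Locale parameter are my additions for illustration.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class CurrencyRoundTrip {
    static String currencyFormat(BigDecimal n, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(n);
    }

    // Parses text produced by currencyFormat() for the same locale.
    static BigDecimal currencyParse(String text, Locale locale) throws ParseException {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        if (format instanceof DecimalFormat) {
            // Ask for a BigDecimal result instead of a Long or Double.
            ((DecimalFormat) format).setParseBigDecimal(true);
        }
        Number parsed = format.parse(text);
        return (parsed instanceof BigDecimal)
                ? (BigDecimal) parsed
                : new BigDecimal(parsed.toString());
    }
}
```

A string formatted for one locale generally will not parse with another locale's format, so the locale has to be known or stored alongside the text.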

Correct Value(s) to store in TAG_GPS_PROCESSING_METHOD

Thanks for reading this question. I am sure the experts on this site will be able to provide the help I need.
I am trying to write an app which allows users to edit the exif information of the photos on their Android Phone.
As part of an improved user experience, I want to apply data validation wherever possible.
For the Exif Tag - TAG_GPS_PROCESSING_METHOD I am not able to apply the validation correctly.
Here is the part of code that I have applied :
String strGPSProc = etGPSProc.getText().toString();
if (strGPSProc.equalsIgnoreCase("GPS") || strGPSProc.equalsIgnoreCase("CELLID") || strGPSProc.equalsIgnoreCase("WLAN") || strGPSProc.equalsIgnoreCase("MANUAL")) {
    returnValue = true;
} else {
    returnValue = false;
    showToast("Incorrect value for GPS Processing Method. Correct value options are GPS, CELLID, WLAN or MANUAL.");
    etGPSProc.requestFocus();
}
This code checks whether the value entered in the EditText meant for GPSProcessingMethod is one of the four prescribed values described in the EXIF documentation.
But when I try to save this using the setAttribute() and saveAttributes() functions, an uncatchable exception appears in logcat:
Unsupported encoding for GPSProcessingMethod
I understand from the Exif documentation that values for GPSProcessingMethod need to be stored with some header information.
I need some expert advice on how to implement this correctly, without using any third-party classes.
According to the Exif specification:
GPSProcessingMethod
A character string recording the name of the method used for location finding. The first byte indicates the character code used (Table 6, Table 7), and this is followed by the name of the method. Since the Type is not ASCII, NULL termination is not necessary.
Actually, Table 6 lists the character codes as 8-byte sequences, so the above should probably read "The first bytes indicate...". Anyway, the character code designation for ASCII is defined as 41.H, 53.H, 43.H, 49.H, 49.H, 00.H, 00.H, 00.H, and for Unicode it is (unsurprisingly) 55.H, 4E.H, 49.H, 43.H, 4F.H, 44.H, 45.H, 00.H. I guess these should be all you need.
Hope that helps.
EDIT:
Just discovered that ExifInterface.setAttribute() only supports String values... You could try encoding the value at the beginning of your string, but I doubt that would work. Sounds like the encoding should be handled by the setAttribute() or saveAttributes() method. Could it be a bug in the API? I had a look at the source code, but the actual writing of values is done by native code so I stopped digging further.
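To make the byte layout from the specification concrete, here is a sketch that builds the 8-byte ASCII character code followed by the method name. The class and method names are hypothetical, and this only shows the layout; as noted above, it is unclear whether ExifInterface gives you any way to write these raw bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class GpsProcessingMethod {
    // Character code designation for ASCII: "ASCII" followed by three NUL bytes.
    private static final byte[] ASCII_CODE = {0x41, 0x53, 0x43, 0x49, 0x49, 0x00, 0x00, 0x00};

    // Returns the character code header followed by the method name, no NUL terminator.
    static byte[] encodeAscii(String method) {
        byte[] name = method.getBytes(StandardCharsets.US_ASCII);
        byte[] out = Arrays.copyOf(ASCII_CODE, ASCII_CODE.length + name.length);
        System.arraycopy(name, 0, out, ASCII_CODE.length, name.length);
        return out;
    }
}
```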

BufferedWriter#write(int) javadoc query

The Javadoc for this says:
Only the lower two bytes of the integer oneChar are written.
What effect, if any, does this have on writing non-utf8 encoded chars which have been cast to an int?
Update:
The code in question receives data from a socket and writes it to a file. (A lot of things happen between receiving and writing, so I can't just use the string I get from BufferedReader#readLine().) I was using Writer#write(char[]), but this meant I had to create a new char array each time. To avoid creating an array every time, I keep a single char array which is filled with -1 (cast to a char).
I then use TextUtils#getChars to fill it, expanding the array if necessary. For writing, I loop through the array, writing each char to the Writer until char[i] == (char) -1.
Internally, write(int) will just cast its parameter to char, so write(i) is equivalent to write((char)i).
Now in Java, internally char is just an integer type, with the range 0-65535 (i.e. 16 bit). The cast int -> char is a "narrowing primitive conversion" (Java Language spec, 5.1.3), and int is a signed integer, hence:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
That's why the Javadoc says that only the lower two bytes are written.
Now, what this means in terms of characters depends on how you want to interpret the int values. A char in Java is a 16-bit UTF-16 code unit; for characters in the BMP, that number is simply the Unicode code point. So if each of your int values is a code point that fits in 16 bits, you're fine. Characters in the supplementary planes are encoded as two chars, so a single write(int) cannot write them. If the value is anything else (a code point with more than 16 bits, a negative number, or something else entirely), you'll get garbage.
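A quick way to see the truncation is to write a few int values and look at what comes out. The sketch below uses a StringWriter instead of a BufferedWriter so the result can be inspected in memory; the class and helper names are made up.

```java
import java.io.StringWriter;

final class WriteIntDemo {
    // Writes each int with Writer#write(int) and returns the resulting chars.
    static String writeInts(int... values) {
        StringWriter out = new StringWriter();
        for (int v : values) {
            out.write(v); // only the low 16 bits of v end up in the output
        }
        return out.toString();
    }

    // Writes a full code point, which may need two chars (a surrogate pair).
    static String writeCodePoint(int codePoint) {
        StringWriter out = new StringWriter();
        out.write(new String(Character.toChars(codePoint)));
        return out.toString();
    }
}
```

Here writeInts(0x10041) gives the same output as writeInts(0x41), because the upper 16 bits are dropped, and writeInts(-1) produces the single char '\uFFFF'.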
What effect, if any, does this have on writing non-utf8 chars which have been cast to an int?
There is no such thing as a "non-utf8 char". UTF-8 is an encoding, that is a way to represent a Unicode code point, so the question as posed is meaningless. Maybe you could explain what your code does?
