Non-ASCII SSIDs in Android

Apparently SSIDs can contain UTF-8 characters, and also control characters etc. If I understand correctly, for an SSID to contain UTF-8 characters the SSIDEncoding field must be specified. Until now I was under the impression that I could only get ASCII bytes.
How should I handle this situation in Android? Namely, how can I check the SSIDEncoding field from the ScanResult? Do I need to? Also, what does ScanResult.SSID contain in these cases (including the case where an SSID includes non-printable characters)?
Related
Why can't I detect a wifi SSID with unicode characters on Android?

The short answer is that you can't detect it, but you don't need to.
Android only returns a String in ScanResult and WifiConfiguration (there is no documented encoding field). Java Strings can be created from bytes in many different encodings, but internally they store Unicode (How to check the charset of string in Java?), so the original encoding is lost in translation. But if all you need is a String, and the APIs and storage mechanisms you use accept a Java String, then they should already handle encoding that String into whatever format they require, and you probably have nothing to worry about.
I can't speak to what Android does under the covers when giving you that String, but the link you provided offers some ideas. Whether or not Android supports different encodings for the SSID, there's nothing you can do about it.
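At the byte level, the question of what an SSID "contains" is easy to answer, which illustrates what is lost once only a String is available. The C sketch below assumes you somehow hold the raw SSID octets (at most 32 bytes in 802.11); that is an assumption, since ScanResult.SSID as described above only hands you a String, and ssid_classify is a hypothetical helper, not an Android API. It reports whether the bytes are valid UTF-8 and whether they include ASCII control characters.

```c
#include <stddef.h>

/* Hypothetical classification of raw SSID bytes (not an Android API). */
typedef enum { SSID_UTF8_PRINTABLE, SSID_UTF8_WITH_CONTROL, SSID_NOT_UTF8 } ssid_kind;

/* Length of the UTF-8 sequence starting at p, or 0 if it is invalid. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail) {
    unsigned char c = p[0];
    size_t n;
    unsigned int cp;
    if (c < 0x80) return 1;
    else if ((c & 0xE0) == 0xC0) { n = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 4; cp = c & 0x07; }
    else return 0;                              /* stray continuation or invalid lead byte */
    if (n > avail) return 0;                    /* sequence cut off by the end of the SSID */
    for (size_t i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;    /* continuation bytes must be 10xxxxxx */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* Reject overlong encodings, UTF-16 surrogates and values above U+10FFFF. */
    if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000)) return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    return n;
}

ssid_kind ssid_classify(const unsigned char *ssid, size_t len) {
    int has_control = 0;
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(ssid + i, len - i);
        if (n == 0) return SSID_NOT_UTF8;
        /* Only single-byte C0 controls and DEL are flagged; C1 controls are ignored. */
        if (n == 1 && (ssid[i] < 0x20 || ssid[i] == 0x7F)) has_control = 1;
        i += n;
    }
    return has_control ? SSID_UTF8_WITH_CONTROL : SSID_UTF8_PRINTABLE;
}
```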

Related

Removing Unicode Replacement Characters from String

I'm working on an Android (Java) app that reads values from a BLE device and stores them in a database. This works pretty well, except that for some characteristics that I read and get the String value of, the String includes random replacement characters (�). If I ignore them the String is normal, but these characters cause problems when working with my database.
How can I remove any replacement characters like that from my Strings?
EDIT: I tried using a regex to replace any replacement characters, and another replaceAll to replace anything that wasn't a standard character, but neither seemed to work. Yet when I output the strings in a TextView, these characters are gone for some reason. Why?
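No answer to this question is included here. As a hedged sketch: U+FFFD REPLACEMENT CHARACTER is encoded in UTF-8 as the three bytes EF BF BD, so if the value is held as UTF-8 bytes, stripping it is a byte-sequence removal like the C function below (the name strip_replacement_chars is hypothetical; on the Java side the analogous call would be s.replace("\uFFFD", "")). Keep in mind that U+FFFD is what decoders typically substitute for bytes they could not decode, so removing it hides the symptom rather than recovering the original characteristic value.

```c
#include <stddef.h>
#include <string.h>

/* Removes every U+FFFD REPLACEMENT CHARACTER (UTF-8 bytes EF BF BD) from a
 * NUL-terminated UTF-8 string, in place. Returns the new length in bytes. */
size_t strip_replacement_chars(char *s) {
    static const char fffd[] = "\xEF\xBF\xBD";
    char *src = s, *dst = s;
    while (*src != '\0') {
        if (strncmp(src, fffd, 3) == 0) {
            src += 3;            /* skip the three bytes of U+FFFD */
        } else {
            *dst++ = *src++;     /* copy every other byte unchanged */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```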

sprintf() handling of %s extended ASCII (ISO 8859-1) on some runtimes?

I'm using ISO 8859-1 (Latin extended ASCII char set) in my C application. When I strcpy/strcat the portions of the string together, it works fine. But when I use sprintf("%s %s"), on some runtimes (particularly certain versions of Android), the string will truncate when an extended ASCII character (specifically é, although I haven't tried others) is hit.
I thought %s was just supposed to copy the bytes until '\0' was hit. I suspect that strcpy/strcat work because they do just that, without any formatting. What could possibly be going on here?
I should note that I'm not viewing the text using printf(), rather my own text rendering engine which handles ISO-8859-1 just fine.
UPDATE:
To clarify, I have an NDK app that keeps the string in C and passes it to my OpenGL-based text rendering engine. If I pass the full string as a char* literal, it displays fine. If I sprintf() the portions together, it gets truncated at the é character.
For example:
char buffer[1024];
strcpy(buffer, "This is ");
strcat(buffer, "the string I want to diésplay.");
That shows up fine. But this:
sprintf(buffer, "%s%s", "This is ", "the string I want to diésplay.");
Prints as:
This is the string I want to di
The behavior of s[n]printf() is specified differently than the behavior of string-manipulation functions such as strcpy() and strcat(). The printf-family functions are all required to produce the same byte sequences when presented identical formats and print items. The only difference is in where those bytes are sent. Thus, if your C library were built such that it performed a transformation on string data (maybe a transcoding) when printing to the standard streams via printf(), then it would perform that same transformation when printing to a string via sprintf().
The "f" in "printf" is for "formatted". The standard neither says nor implies that formatting a string must mean dumping its bytes to the output verbatim, so a transcoding or other transformation such as I hypothesized above is not out of the question. In fact, the docs for some versions of these functions indicate locale-dependence ("Note that the length of the strings produced is locale-dependent and difficult to predict"), so transcoding in particular is a real possibility.
Any specific explanation of the third-party observations you describe would necessarily be speculative, as you have not presented nearly enough code or data to make a confident diagnosis. I am inclined to suspect an issue revolving around running the program in a locale that uses a character encoding differing from the one used internally by the program. If so, then you may be able to reproduce the problem locally by varying the locale in which you run, and you may be able to address it by ensuring one way or another that your program always runs in a suitable locale. Among other things, you might use the getlocale() and setlocale() functions to help here, especially if you want to limit the scope in which you exercise locale control.
Since ultimately you are relying on printf-family functions only for string manipulation, however, I think it would be better to use the workaround presented in the question: as much as possible, use C's dedicated string-manipulation functions, such as strcpy() and strncat(), to perform your string building. Since you are not relying on the stdio functions for your actual output, this should be fine.
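A minimal sketch of that workaround, assuming the pieces fit in a fixed-size buffer: the hypothetical helper join2 builds the string with strncat, which copies bytes verbatim and does not depend on the locale, and it truncates instead of overflowing when the buffer is too small.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins two byte strings into dst (capacity cap bytes,
 * including the terminating NUL) without going through the printf family.
 * Bytes are copied verbatim, so ISO-8859-1 text such as 0xE9 survives.
 * Returns 1 if everything fit, 0 if the result had to be truncated. */
int join2(char *dst, size_t cap, const char *a, const char *b) {
    if (cap == 0) return 0;
    dst[0] = '\0';
    /* strncat appends at most n bytes and always writes a terminating NUL,
     * so n must leave room for that NUL. */
    strncat(dst, a, cap - 1);
    size_t used = strlen(dst);
    strncat(dst, b, cap - 1 - used);
    return strlen(a) + strlen(b) < cap;
}
```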

Identify type of regex pattern

In my application I am adding EditText fields based on the response provided by the server. For each EditText the server also provides a regex pattern to match. I am able to match the pattern and do validations successfully, but I want to identify the type of the regex pattern so that I can open a keyboard suited to the values the EditText should accept.
For example,
if an EditText should accept an email address, a keyboard with the @ sign should open, and if it accepts numeric values, the numeric keypad should open.
Is there any library that can return a type such as "Email" or "Number" from a regex pattern, given that there can be many different kinds of regex patterns?
EDIT: I know how to set the input type for an EditText, but I need to find out the type from the regex pattern. I am not able to make changes on the server; I have to handle this on the client side.
There most probably isn't one, because there is no way to tell for sure. Anything you can come up with will be a heuristic.
Heuristic one:
If the pattern looks for something containing a dot, followed by an @ sign, followed by something containing a dot, it's an email validation.
If the pattern contains only \d, number ranges ([1-5]) or single digits (7), plus repetition metacharacters (?, *, +, {4,12}), it's a number validation.
If the pattern contains \w and no @ sign, it's regular text.
Continue in the same spirit.
+ high control: you can always add new guesses when you see that your results aren't accurate in some case
- requires more code to implement
- requires very good knowledge of regexes
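Heuristic one might be sketched in C like this. It is deliberately crude and every name in it (input_guess, guess_from_pattern, looks_numeric) is hypothetical: it only checks for an @ later followed by a dot, a pattern built solely from digits, \d, ranges and quantifiers, or a \w without an @, and it will misclassify many real patterns.

```c
#include <string.h>

/* Hypothetical input-type guesses; map these to your own keyboard choices. */
typedef enum { GUESS_TEXT, GUESS_EMAIL, GUESS_NUMBER, GUESS_UNKNOWN } input_guess;

/* Returns 1 if every character is one that typically appears in a purely
 * numeric pattern: digits, \d, ranges, quantifiers, groups and anchors. */
static int looks_numeric(const char *p) {
    if (*p == '\0') return 0;
    for (; *p != '\0'; p++) {
        if (*p == '\\') {
            if (p[1] != 'd') return 0;   /* \d is the only escape accepted */
            p++;                         /* skip the 'd' */
        } else if (strchr("0123456789[]-?*+{},^$()", *p) == NULL) {
            return 0;
        }
    }
    return 1;
}

/* Heuristic one: inspect the text of the pattern itself. */
input_guess guess_from_pattern(const char *pattern) {
    const char *at = strchr(pattern, '@');
    if (at != NULL && strchr(at, '.') != NULL) return GUESS_EMAIL; /* '@' later followed by '.' */
    if (looks_numeric(pattern)) return GUESS_NUMBER;
    if (at == NULL && strstr(pattern, "\\w") != NULL) return GUESS_TEXT; /* \w and no '@' */
    return GUESS_UNKNOWN;
}
```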
Heuristic two:
Use a list of sample strings whose type you know, and try to match them against the regex. For example, for emails try example@gmail.com.
+ easy to implement, with little chance of problematic logic
- least amount of control: if the server gives you email patterns for different domains, you can't guess that it is an email pattern unless you know all possible domains
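Heuristic two can be sketched with the POSIX regex.h API. This assumes the server's patterns are valid POSIX extended regular expressions, which Java-style patterns often are not (POSIX ERE has no \d, for example), so on Android you would more naturally use java.util.regex. The probe list and function names are hypothetical. Each pattern is wrapped in ^( and )$ so that the whole probe string must match, similar to Java's Matcher.matches().

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the whole of text matches the POSIX extended regex pattern,
 * 0 if it does not, and -1 if the pattern does not compile. */
static int full_match(const char *pattern, const char *text) {
    size_t len = strlen(pattern);
    char *anchored = malloc(len + 5);           /* "^(" + pattern + ")$" + NUL */
    if (anchored == NULL) return -1;
    memcpy(anchored, "^(", 2);
    memcpy(anchored + 2, pattern, len);
    memcpy(anchored + 2 + len, ")$", 3);        /* also copies the NUL */
    regex_t re;
    int rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    free(anchored);
    if (rc != 0) return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical probe strings whose type we already know. */
typedef struct { const char *type; const char *sample; } probe;

static const probe PROBES[] = {
    { "Email",  "example@gmail.com" },
    { "Number", "12345" },
};

/* Heuristic two: the type of the first probe the pattern accepts, or NULL. */
const char *guess_by_probing(const char *pattern) {
    for (size_t i = 0; i < sizeof PROBES / sizeof PROBES[0]; i++) {
        if (full_match(pattern, PROBES[i].sample) == 1) return PROBES[i].type;
    }
    return NULL;
}
```

A permissive pattern such as .* accepts every probe, so it would be reported as "Email" simply because that probe is listed first; that is the lack of control mentioned above, in concrete form.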
Heuristic three:
Use a library that can generate example strings from regex and match them with your own regexes to determine the type. Here is one for Java and another one for JavaScript.
+ gives a good combination of high control and easy implementation
- you still have to write your own regexes (not as trivial as the 2nd heuristic)
- people sometimes write regexes that allow some false positives. Therefore, generated strings might not be in the perfect format (not as much control as the 1st heuristic)
Are the regexes static?
If yes, you should make a mapping and use that.
If no, use a heuristic like one of the above and improve it over time as you gain more statistics about how the generated regexes usually look.
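For the static case, the mapping can be a plain lookup table keyed on the exact pattern text. The entries below are placeholders, not patterns from any real server, and lookup_pattern_type is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the exact pattern strings the server is known to send,
 * each paired with the input type you want. The entries are placeholders. */
typedef struct { const char *pattern; const char *type; } pattern_type;

static const pattern_type KNOWN[] = {
    { "^[0-9]{4,12}$",             "Number" },
    { "^[^@]+@[^@]+\\.[a-z]{2,}$", "Email"  },
};

/* Returns the mapped type for an exact pattern match, or NULL if unknown. */
const char *lookup_pattern_type(const char *pattern) {
    for (size_t i = 0; i < sizeof KNOWN / sizeof KNOWN[0]; i++) {
        if (strcmp(KNOWN[i].pattern, pattern) == 0) return KNOWN[i].type;
    }
    return NULL;
}
```

A NULL result marks a pattern the table has never seen, which is the point at which one of the heuristics could take over.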

What is the purpose of "[Developer] Accented English" (zz-ZZ) in Android?

In Android KitKat, if I choose Settings > Language & Input > Language, the first choice I am offered is [Developer] Accented English. This replaces each Roman letter with an accented version. You can find a list of all the character mappings here. (It helps if you can read French.)
What is the purpose of this setting? Is it just to show how characters can be mapped to other characters? Or can it be used productively, for example to create specific phonemes in text-to-speech output?
It's a technique called 'Pseudolocalization', and it's used to help test that an app is handling aspects of localization correctly.
The idea is that instead of waiting for an app's string resources to be translated into other languages (which could take some time), a "fake" pseudo-language is used instead. If the app behaves well against this fake translation, then chances are it will perform well with actual translations. There are different variations of pseudolocalization out there, but most tend to do some of the following:
Add brackets [ ... ] or other delimiters around the string: this makes it easier to ensure that strings are not getting clipped at either end.
Replace regular characters with accented characters: if you see a string without accented characters, then that's a sign that it might be hardcoded instead of being treated as a localizable resource. (In the past, this was also used to ensure that apps could handle non-ASCII characters correctly and didn't lose data in code page translation, though this is less of an issue now that modern platforms support Unicode.)
Add padding to the string: this simulates languages such as German, which often have longer translations than the corresponding English string. If the padded string gets truncated instead of wrapping or flowing, then the German string will likely be truncated too.
Add known-to-be-tricky characters to act as 'canaries': on some platforms, symbols from specific parts of the Unicode range may be added to ensure that they are handled or supported properly. For example, a Chinese character might be added to ensure that Chinese fonts are supported: if it ends up showing as an empty square, then that indicates a problem. Other common 'canary' characters include code points from outside the BMP and combining characters.
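The first three techniques in that list (delimiters, accented replacements, padding) can be combined in a few lines of C. This is a hypothetical illustration, not Android's actual zz-ZZ implementation: only nine letters are mapped, the input is assumed to be ASCII, the output is UTF-8, and padding of one underscore per five input characters is an arbitrary choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accent table (UTF-8 encodings); not Android's zz-ZZ mapping. */
static const char *accent_for(char c) {
    switch (c) {
        case 'a': return "\xC3\xA0"; /* à */
        case 'c': return "\xC3\xA7"; /* ç */
        case 'e': return "\xC3\xA9"; /* é */
        case 'i': return "\xC3\xAE"; /* î */
        case 'n': return "\xC3\xB1"; /* ñ */
        case 'o': return "\xC3\xB4"; /* ô */
        case 'u': return "\xC3\xBB"; /* û */
        case 'A': return "\xC3\x80"; /* À */
        case 'E': return "\xC3\x89"; /* É */
        default:  return NULL;       /* everything else passes through */
    }
}

/* Writes "[" + accented text + padding + "]" into dst (capacity cap).
 * Padding is len/5 underscores. Returns 0 on success, -1 if dst is too small. */
int pseudolocalize(const char *src, char *dst, size_t cap) {
    size_t len = strlen(src), pad = len / 5, out = 0;
    /* Worst case: every char becomes 2 bytes, plus brackets, padding and NUL. */
    if (cap < 2 * len + pad + 3) return -1;
    dst[out++] = '[';
    for (size_t i = 0; i < len; i++) {
        const char *acc = accent_for(src[i]);
        if (acc != NULL) { memcpy(dst + out, acc, 2); out += 2; }
        else dst[out++] = src[i];
    }
    memset(dst + out, '_', pad);     /* simulate a longer translation */
    out += pad;
    dst[out++] = ']';
    dst[out] = '\0';
    return 0;
}
```

With this table, "Account Settings" becomes "[Àççôûñt Séttîñgs___]"; letters without an entry, such as t, S and g, pass through unchanged, which is why it looks less thoroughly accented than the zz-ZZ rendering quoted in this answer.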
One advantage of using pseudolocalization over actual translation is that the testing can be performed by someone who does not understand the target language: "[Àççôûñţ Šéţţîñĝš___]" still visually appears similar to the original English text "Account Settings". If you try using it with a screen reader such as TalkBack, or otherwise send pseudolocalized text to text-to-speech, you'll likely get nonsense, since it will try to treat the accented characters as actual accented characters.

Best Character for splitting Android

I started an Android project, something like a chat program.
Data downloaded from my server looks like this:
1~my name~my username~message
So my question is: is there any character compatible with Android that I can use instead of the delimiter (~) above? I'm afraid that if a user someday types the character ~, the program will crash.
I tried the character ÷, but my Android app can't read it; it turns into '?'.
Has anyone had the same problem?
First of all, it is almost always a bad idea to create your own format for client-server communication; my best advice is to give JSON or XML a shot. There are lots of libraries available on both the client side and the server side to build and parse them, so all you have to do is use your back-end language to return one of those formats.
For Python: http://docs.python.org/library/json.html
For PHP: http://php.net/manual/en/book.json.php
For Android: http://developer.android.com/reference/org/json/JSONObject.html
You can easily find other languages with a simple search.
If you're also using Java on the server side, you could define an object like ChatMessage and just send it to the server over a Socket through an object stream.
As Burak noted, your way is the wrong way... but there are several other ways; IMHO an object stream might be the easiest solution for you.
If you use a delimiter that can also occur in the data you are delimiting, you will have a problem.
To prevent that, you need to prevent the character from occurring in a way that could be misinterpreted.
At the input side, detect occurrences and replace them either with a special code or with an escaping prefix character, or quote the contents (though then you have to handle literal occurrences of the quote characters).
If you use an escaping character, your splitting code must ignore any delimiter following an escape character or within a quoted sequence.
At the output side you should replace the codes or escape sequences with a literal instance of the encoded character or remove any quoting characters.
As others have mentioned, there are a number of standard schemes and functions for handling them.
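The escaping approach described above can be sketched in C as follows. The backslash escape, the "\~" convention and the function names are assumptions for illustration, not a standard format; JSON, as suggested in the first answer, gives you the same guarantees without writing this yourself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wire format: fields separated by '~'; a literal '~' or '\'
 * inside a field is written as "\~" or "\\". */

/* Writes field into dst with '~' and '\' escaped. Returns the bytes written
 * (excluding the NUL), or (size_t)-1 if the result does not fit in cap. */
size_t escape_field(const char *field, char *dst, size_t cap) {
    size_t out = 0;
    for (; *field != '\0'; field++) {
        int needs_escape = (*field == '~' || *field == '\\');
        if (out + needs_escape + 2 > cap) return (size_t)-1; /* char (+escape) + NUL */
        if (needs_escape) dst[out++] = '\\';
        dst[out++] = *field;
    }
    if (out + 1 > cap) return (size_t)-1;
    dst[out] = '\0';
    return out;
}

/* Splits an escaped line in place: turns "\x" into "x" and cuts at every
 * unescaped '~'. Stores at most max field pointers; returns how many it stored. */
size_t split_fields(char *line, char **fields, size_t max) {
    size_t count = 0;
    char *src = line, *dst = line;
    if (max > 0) fields[count++] = dst;
    while (*src != '\0') {
        if (*src == '\\' && src[1] != '\0') {
            *dst++ = src[1];               /* escaped char: keep it literally */
            src += 2;
        } else if (*src == '~') {
            *dst++ = '\0';                 /* unescaped delimiter ends a field */
            src++;
            if (count < max) fields[count++] = dst;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return count;
}
```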
