PatternSyntaxException in non-latin locales

I've got a regex that was working perfectly fine until I switched my locale to 'fa' (Persian). I suspect this would happen with Hebrew and Arabic too (not yet sure if it's the characters or the RTL direction that makes it break).
The line of code causing the exception is:
public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));
The syntax is fine (it works in English and Spanish), but when the app tries to compile the regex in the 'incompatible' locales, I get the following:
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:605)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.util.regex.PatternSyntaxException: Syntax error U_REGEX_BAD_INTERVAL near index 8:
^[\w ]{٢,٢٤}$
^
at java.util.regex.Pattern.compileImpl(Native Method)
at java.util.regex.Pattern.compile(Pattern.java:400)
at java.util.regex.Pattern.<init>(Pattern.java:383)
at java.util.regex.Pattern.compile(Pattern.java:374)
at com.airg.hookt.config.airGConstant.<clinit>(airGConstant.java:131)
Any help would be appreciated.
Thanks

Looks like you're trying to specify the interval using Arabic-Indic digits (U+0660..U+0669); I would have been very surprised if that had worked. I've never heard of a regex flavor that accepts anything but ASCII digits as part of the regex itself.
Are you also expecting \w to match letters/digits from Persian, Hebrew, and Arabic scripts? That won't work either, but this time it's because of a shortcoming in Java's regex flavor. If you want to match characters from any writing system, you need to use Unicode properties like \p{L} and \p{N}.

ANSWER
So ... the problem was indeed the String.format
Changing
public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));
to
public static final Pattern NAME_REGEX = Pattern.compile("^[\\w ]{" + 2 + "," + 24 + "}$");
fixed the crash. Thanks to everyone for their contribution.
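The crash comes from String.format(String, ...) formatting %d with the default locale, which is how the interval ended up written with Arabic-Indic digits in the stack trace above. As an alternative to concatenation, here is a minimal sketch that keeps String.format but passes Locale.ROOT explicitly. The class name and the MIN_LEN/MAX_LEN constants are made up for illustration; they are not from the original code.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class NameRegex {
    // Hypothetical constant names; the original code used the literals 2 and 24.
    static final int MIN_LEN = 2;
    static final int MAX_LEN = 24;

    // Locale.ROOT makes %d produce ASCII digits regardless of the default locale.
    static String buildRegex() {
        return String.format(Locale.ROOT, "^[\\w ]{%d,%d}$", MIN_LEN, MAX_LEN);
    }

    static final Pattern NAME_REGEX = Pattern.compile(buildRegex());

    public static void main(String[] args) {
        // Simulate a device whose default locale is Persian.
        Locale.setDefault(Locale.forLanguageTag("fa"));
        System.out.println(buildRegex());
        System.out.println(NAME_REGEX.matcher("John Smith").matches());
    }
}
```

The same applies to any string that is meant for a machine rather than a person (regexes, SQL, file names, protocol messages): pass an explicit locale or avoid locale-sensitive formatting.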

Related

URL encoding is getting failed for special character. #Android

I'm working on a solution where I need to URL-encode a string as UTF-8. The string is just the device name that I read using BluetoothAdapter.getDefaultAdapter().name.
For one sample I got a string like ABC-＆, and encoding it returned ABC-%EF%BC%86 instead of ABC-%26. It seemed weird until further debugging showed that & and ＆ are different characters. The second one is some other character, so it is not encoded the way I expected.
For encoding I tried both URLEncoder.encode(input, "utf-8") and Uri.encode(input, "utf-8"), but neither worked.
This is just one example; there might be other characters that look the same as the usual character but encode differently. My questions are:
Why is there this difference? After all, the data is read from the device using a standard SDK API.
How can this be fixed? Find-and-replace with the actual character could be an approach, but its scope is limited; there might be other unknown characters.
Any suggestions?

One solution would be to define your allowed character scope. Then either replace or remove the characters that fall outside of this scope.
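Given the following character class of allowed characters (the hyphen is placed last so that it is a literal hyphen rather than part of a range):
[a-zA-Z0-9 +&#-]
You could then replace everything outside it, using the negated class:
input.replaceAll("[^a-zA-Z0-9 +&#-]", "_");
...or, if you don't mind possibly empty results, remove those characters:
input.replaceAll("[^a-zA-Z0-9 +&#-]", "");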
The first approach would give you a length-consistent representation of the original Bluetooth device name.
Either way, this approach has worked wonders for me and my colleagues. Hope this could be of any help 😊.
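A different angle on the same problem, offered as a sketch and not as the approach from the answer above: the ＆ in the question is U+FF06 (FULLWIDTH AMPERSAND), which Unicode compatibility normalization (NFKC) folds to the ASCII &. Normalizing before encoding therefore gives ABC-%26 for that sample. The class and method names here are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

class DeviceNameEncoder {
    // Fold compatibility characters (such as fullwidth forms) to their
    // plain equivalents, then URL-encode the result as UTF-8.
    static String encodeName(String input) {
        String folded = Normalizer.normalize(input, Normalizer.Form.NFKC);
        try {
            return URLEncoder.encode(folded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\uFF06" is the fullwidth ampersand from the question.
        System.out.println(encodeName("ABC-\uFF06"));
    }
}
```

Note that NFKC also rewrites other compatibility characters (ligatures, superscripts, and so on), so it changes the name in more cases than just this one. Whether that is acceptable depends on what the encoded name is used for.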

Create regular expression using android pattern

I have a sample message. I need to create a regular expression to validate it using Android's Pattern class.
Sample message:
ERR|any digit|any digit;
Validation checks:
1. Starting fixed characters: ERR
2. Separator character: |
3. Digit after the | character
4. Message termination: ;
I have tried it this way: ^{ERR}+{|}+\d+{|}+\d+{;}$
Am I right? Please help me solve my problem.

The corrected version of the regex you gave would be ^(ERR)+(\\|)+\\d+(\\|)+\\d+;$. Parentheses are used for grouping, not braces. Also, in regex, + means "one or more of the previous expression". So (ERR)+ means "one or more occurrences of the string 'ERR'", and strings like "ERRERR|123|456;" would be matched (the same goes for the pipe characters). I assume this is not what you are trying to do.
Having said that, try this: "^ERR\\|\\d+\\|\\d+;$"
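A small sketch of how the suggested pattern could be used for validation. The class and method names are hypothetical; only the regex string comes from the answer above.

```java
import java.util.regex.Pattern;

class ErrMessageValidator {
    // The pipe is escaped because an unescaped | means alternation in a regex.
    private static final Pattern ERR_MESSAGE =
            Pattern.compile("^ERR\\|\\d+\\|\\d+;$");

    static boolean isValid(String message) {
        return message != null && ERR_MESSAGE.matcher(message).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ERR|12|345;"));
        System.out.println(isValid("ERRERR|123|456;"));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.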

Changing text input to also treat "u" as "ü", "ss" as "ß", etc. for word suggestions

I have the following idea:
In German we have four extra letters (ä, ö, ü, ß). I don't know any other language that has exactly these letters, but I think French speakers know the same problem from their accents. We have a lot of apps in the Google Play store for cities, bus stations, trains and other stuff like that. It is really tedious to always have to type these letters when we are on the go. It would be much easier to write Munchen (= München [de] = Munich [en]), Osterreich (= Österreich [de] = Austria [en]) or something like Uberwasserstrasse (= Überwasserstraße [de] = Over-Water-Street [en]). So my question is now:
A lot of apps show suggestions for our just typed word. I think in the code it is something like this:
String current = editText.getText().toString();
db.lookUp(current); // Of course SQL statement
Can we hook this so that Android thinks that we have typed an ä, ö, ü, ß if we write an a, o, u, ss and the system looks for words with one of these vowels and suggests both? Here I do not want to ask for code - I want to discuss if we are able to write a hack or hook for the Android system. Also, root-rights can be assumed with the solution. I'm looking forward to your ideas.

You could do this the other way around, by "normalizing" typed characters into their related non-diacritical versions. You can use the java.text.Normalizer class for this. A good snippet can be found in this article:
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose accented letters (NFD), then strip the combining marks.
public static String removeAccents(String text) {
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
When applied to "München", this returns "Munchen". That way, you can do a simple string comparison on the normalized versions of both the typed text and the stored names.
This wouldn't work for "ß" though. If that's the only special case, you could handle it separately.
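One way to handle the "ß" special case, as a minimal sketch: expand it to "ss" before stripping the accents. The class and method names are hypothetical, and the sketch assumes both the typed text and the stored names are run through the same function before they are compared.

```java
import java.text.Normalizer;

class SearchKey {
    // Builds an accent-free key: "ß" becomes "ss", then accented letters
    // are decomposed (NFD) and their combining marks are removed.
    static String toSearchKey(String text) {
        if (text == null) {
            return null;
        }
        String expanded = text.replace("\u00DF", "ss"); // \u00DF is ß
        return Normalizer.normalize(expanded, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00DC is Ü; the input is "Überwasserstraße".
        System.out.println(toSearchKey("\u00DCberwasserstra\u00DFe"));
    }
}
```

With this, typing "Uberwasserstrasse" and a stored "Überwasserstraße" produce the same key, so they compare as equal.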

What you are looking for is called an accent-insensitive collating sequence. SQLite's COLLATE operator can be used to do such searches, but I learned from another post that there might be bugs you'll need to look out for.

What are my options for displaying characters that Android can't?

I discovered today that Android can't display a small handful of Japanese characters that I'm using in my Japanese-English dictionary app.
The problem comes when I attempt to display the character via TextView.setText(). All of the characters below show up as blank when I attempt to display them in a TextView. It doesn't appear to be an issue with encoding, though - I'm storing the characters in a SQLite database and have verified that Android can understand the characters. Casting the characters to (int) retrieves proper Unicode decimal escapes for all but one of the characters:
String component = cursor.getString(cursor.getColumnIndex("component"));
Log.i("CursorAdapterGridComponents", "Character Code: " + (int) component.charAt(0) + "(" + component + ")");
I had to use Character.codePointAt() to get the decimal escape for the one problematic character:
int codePoint = Character.codePointAt(component, 0);
I don't think I'm doing anything wrong, and as Strings are UTF-16 encoded by default, there should be nothing preventing them from displaying the characters.
Below are all of the decimal escapes for the seven problematic characters:
⺅ Character Code: 11909(⺅)
⺌ Character Code: 11916(⺌)
⺾ Character Code: 11966(⺾)
⻏ Character Code: 11983(⻏)
⻖ Character Code: 11990(⻖)
⺹ Character Code: 11961(⺹)
𠆢 Character Code: 131490(𠆢)
Plugging the first six values into http://unicode-table.com/en/ revealed their corresponding Unicode numbers, so I have no doubt that they're valid Unicode characters.
The seventh character could only be retrieved from a table of UTF-16 characters: http://www.fileformat.info/info/unicode/char/201a2/browsertest.htm. I could not use its 5-character Unicode number in setText() (as in "\u201a2") because, as I discovered earlier today, Android has no support for Unicode strings past 0xFFFF. As a result, the string was evaluated as "\u201a" + "2". That still doesn't explain why the first six characters won't show up.
What are my options at this point? My first instinct is to just make graphics out of the problematic characters, but Android's highly variable DPI environment makes this a challenging proposition. Is using another font in my app an option? Aside from that, I really have no idea how to proceed.

Is using another font in my app an option?
Sure. Find a font that you are licensed to distribute with your app and has these characters. Package the font in your assets/ directory. Create a Typeface object for that font face. Apply that font to necessary widgets using setTypeface() on TextView.
Here is a sample application demonstrating applying a custom font to a TextView.
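On the seventh character from the question: a \u escape in Java source takes exactly four hex digits, which is why "\u201a2" is read as U+201A followed by "2". A code point above 0xFFFF can still be put into a String, either from its numeric value or as a surrogate pair. The sketch below shows both; the class and method names are hypothetical. It only builds the string, and whether the glyph is actually drawn still depends on the font, as described above.

```java
class SupplementaryChar {
    // U+201A2 (decimal 131490), the seventh character from the question.
    static final int CODE_POINT = 0x201A2;

    // Character.toChars returns the UTF-16 surrogate pair for code points
    // above 0xFFFF, or a single char otherwise.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(CODE_POINT);
        // The same string written as an explicit surrogate pair.
        String literal = "\uD840\uDDA2";
        System.out.println(s.equals(literal));
        System.out.println(s.codePointAt(0));
    }
}
```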

PhoneNumberFormattingTextWatcher in Android 4.x

I have the following code that was working fine in Android 2.2 to format phone numbers as 555-555-5555, but in 4.x it formats them as 555555-555.
inputPhoneNumber = (EditText) findViewById(R.id.inputPhoneNumber);
inputPhoneNumber.addTextChangedListener(new PhoneNumberFormattingTextWatcher());
Any suggestions on how to fix it?

There is another Android class specifically for formatting phone numbers: PhoneNumberUtils.
Some methods you can use:
formatNumber(String source): breaks the given number down and formats it according to the rules for the country the number is from.
formatNumber(Editable text, int defaultFormattingType): formats a phone number in place.
Check it out.

The comment by #learningslowly helped, but I found it to be still incomplete. The full allowed digits string needed for correct & 'normal' format is:
android:digits="0123456789()-+ "
I was previously missing the minus, plus, and space.
