How to remove hyphen from TextUtils.split(line, "-")? - android

I found a dictionary sample on GitHub that I am currently experimenting with. The sample database uses a hyphen between the searched word and the word's meaning, something like this:
abbey - n. a monastery ruled by an abbot
I looked into the dictionary database java file and found the following code:
String[] strings = TextUtils.split(line, "-");
I have my own database that translates Korean words to English. However, I didn't use a hyphen when creating it. Is there a way to split on plain spaces instead of a hyphen or any other symbol? This is part of an Android app.
Edit- An example of my own dictionary would be something like
abbey a monastery ruled by an abbot
Edit-
The problem here is that the old code only tells the word and its meaning apart if they are separated by a hyphen. How do I make it work with spaces alone?

To remove a character in a String use String.replace
String newString = line.replace("-","");
To replace with a space simply use
String newString = line.replace("-"," ");

String myString = myString1.replace("-", " ");
Pass " " as the second argument if you want a space in place of the hyphen.

As I understand it, you want to split your String to get the output like
abbey - n. a monastery ruled by an abbot
[abbey][n. a monastery ruled by an abbot]
You can use String.split(String, int) to limit the number of splits.
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times
Let's use it like :
String[] array = s.split(" ", 2);
This will split your String on the regex " " but will limit the size of the output to 2 cells. So it will only split once, put the left part on the first cell and the right part on the second cell.
Without this limit argument, the method would keep splitting the right part and return a bigger array.
Note: this will be a problem if the word on the left is itself made up of several words, because the split happens at the first space.
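A minimal sketch of the same idea in plain Java, in case it helps. The class and method names (DictionaryLineSplitter, splitEntry) are made up for illustration and are not part of the GitHub sample. It splits on the first run of whitespace rather than a single space, so a tab or a double space between word and meaning is handled as well.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the dictionary sample.
class DictionaryLineSplitter {

    // Splits "word meaning..." into {word, meaning} at the first run of whitespace.
    // Returns null when the line contains no meaning part.
    static String[] splitEntry(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        String[] entry = splitEntry("abbey a monastery ruled by an abbot");
        // Prints: [abbey, a monastery ruled by an abbot]
        System.out.println(Arrays.toString(entry));
    }
}
```

In the sample code this would take the place of the TextUtils.split(line, "-") call, assuming every line starts with a single-word entry.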

Related

Custom Regular Expression in Java

I have to implement a function that checks whether a string complies with a regular expression. I have written a method that parses a list of filenames, and for each filename I need to check whether it matches the regexp.
The filename is composed as follows (just an example):
verbale.pdf.001.001
image.jpg.002.001
The string always consists of:
extension (only jpg or pdf) "." a group of three digits "." a group of three digits
With this regexp I need to check whether the input string ends as described above. I have currently implemented this:
Pattern rexExp = Pattern.compile("((\\.jpg)|(\\.pdf))\\.[0-9]{3}\\.[0-9]{3}");
But it does not work properly. Is it a good idea to use a regexp to check whether a filename ends with a certain pattern?
Less greedy than the other answer, think it suits you:
\\w+\\.(jpg|pdf)(\\.\\d{3}){2}
file name, only composed of letters, numbers and _
dot
jpg or pdf formats
another dot
three digits
the dot and the three digits repeated
This should work :
.*\\w{3}\\.\\d{3}\\.\\d{3}
.* = any Characters (like "verbale123")
\\w{3} = any 3 word characters (letters, digits or underscore)
\\. = a dot
\\d{3} = any three numeric characters
To check if a string ends with pdf or jpg and two sequences of . and 3 digits, you may use
(?i)(?:jpg|pdf)(?:\.[0-9]{3}){2}$
See the regex demo
Details
(?i) - case insensitive flag
(?:jpg|pdf) - either jpg or pdf
(?:\.[0-9]{3}){2} - 2 repetitions of a . and 3 digits
$ - end of string.
Use with Matcher#find() (as matches() anchors the match at the start and end of the string, while a partial match is required when using this pattern), example demo:
String s = "verbale.pdf.001.001";
Matcher matcher = Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$").matcher(s);
if (matcher.find()){
System.out.println("Valid!");
}
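Since the question mentions checking a whole list of filenames, here is a small sketch that compiles the pattern once and reuses it for each name. The class and method names (FilenameSuffixCheck, hasValidSuffix) are only illustrative; the pattern is the one from this answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class FilenameSuffixCheck {

    // Compiled once, because Pattern.compile is relatively expensive.
    private static final Pattern SUFFIX =
            Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$");

    // find() is used because the pattern only describes the end of the name.
    static boolean hasValidSuffix(String filename) {
        return SUFFIX.matcher(filename).find();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "verbale.pdf.001.001", "image.jpg.002.001", "notes.txt.001.001");
        for (String name : names) {
            System.out.println(name + " -> " + hasValidSuffix(name));
        }
    }
}
```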

Converting a text file into String array with regular expression

I have a .txt file which contains above 1000 words
sample city names below
Razvilka
Moscow
Firozpur Jhirka
Kathmandu
Kiev
Pokhara
Merida
Delhi
Reshetnikovo
Ciudad Bolivar
Marfino
Zhukovskiy
Reutov
Kurovskoye
etc
I would like to have these words in this format below
"Razvilka","Moscow","etc","etc"
enclosed in double quotation marks and with a comma at the end. I am using Notepad++. Could you tell me how to do it and which software I should use?
If you're using Notepad++, make a Search and Replace replacing
\b(\w+)\b
with
"$1",
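It'll find all words and replace them with themselves, surrounded by quotes and followed by a comma. You'll have to manually remove the last , if that's unwanted. Note that \w+ matches one word at a time, so a two-word name such as Firozpur Jhirka becomes two quoted entries; to quote each line as a whole, search for ^(.+)$ instead (Regular expression mode, with ". matches newline" unchecked).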
Regards
I am not sure whether this question is about programming, but you tagged it android, regex and android studio, so I guess it is. If so, you can simply split the string like this:
String[] parts = yourString.split("\\s+");
In that case you are splitting the string on whitespace (this regex also covers runs of more than one whitespace character), which is how your string seems to be delimited. Be aware that this also splits two-word names such as Firozpur Jhirka. If you have more than one delimiter, you can combine them with the OR operator |
String[] parts = yourString.split("-|\\.");
In that example you are splitting the string on - and . (minus and dot). The delimiter is the character at which the string is split.
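If the goal is the exact "Razvilka","Moscow",... output from the question, a plain-Java sketch could split on line breaks instead of whitespace, so that two-word city names stay together. The names QuotedCityList and toQuotedList are invented for this example, and it assumes the file content has already been read into a String.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration.
class QuotedCityList {

    // Turns one-name-per-line text into "a","b","c" (no trailing comma).
    static String toQuotedList(String text) {
        return Arrays.stream(text.split("\\R"))   // \R matches any line break
                .map(String::trim)
                .filter(line -> !line.isEmpty())  // skip blank lines
                .map(line -> "\"" + line + "\"")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String text = "Razvilka\nMoscow\nFirozpur Jhirka\nKathmandu\n";
        // Prints: "Razvilka","Moscow","Firozpur Jhirka","Kathmandu"
        System.out.println(toQuotedList(text));
    }
}
```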

SQLite unicode slavic accented words Android

I'm trying to filter out accented words if user searches for them in local database. But I have problems, namely with slavic letters ČŠŽ. In my SQLite database I have a field "title" with value: "Želodček"
If I try to select LOWER(title) I always get back the same value "Želodček" whilst other words are correctly lower cased. Only if the word begins with ČŽŠ then it doesn't get lower cased. This only persists with words which have leading accented letters.
Database records
Stomach
Želodček
Uppercase with UPPER()
STOMACH
ŽELODčEK
Lowercase with LOWER()
stomach
Želodček
I've already tried setting localization with setLocale() with no luck. I also tried different collation like NOCASE, UNICODE, LOCALIZED but nothing worked. I'm wondering why when lower cased the first letter is not lower cased and when upper cased other accented words are lowercase.
I've solved the problem with LIKE searches where I replace accented words with their lower cased counterpart. But I have problem with full text(FTS3) searching because I can't use the same trick with MATCH.
-- works but it's a hack
SELECT title FROM articles WHERE REPLACE(LOWER(title),'Ž','ž') LIKE '%želodček%'
-- can't seem to get it work
SELECT title FROM articles WHERE title MATCH 'želodček' COLLATE NOCASE
Is there any solution to this or is there a bigger problem?
Update:
No optimal solution yet.
Un-optimal solution 1:
I decided to deal with the problem directly by changing data in the select query. While this doesn't work for all cases (and I would have to cover all accents) it suits my case for now. So I'm posting it:
-- LIKE query
SELECT title FROM articles WHERE (REPLACE(REPLACE(REPLACE(LOWER(title),'Č','č'),'Š','š'),'Ž','ž') LIKE ? COLLATE NOCASE)
-- MATCH query (FTS)
-- In this case I programmatically replace searched word with 2 word variation (one that starts with lowercase and one that starts with uppercase) ie: title='želodček OR Želodček'
SELECT title FROM articles WHERE title MATCH ? COLLATE UNICODE
Un-optimal solution 2:
As suggested by user CL., insert the title in normalized form (this didn't work for me because the normalized form was basically the original Unicode form). I took it further and inserted the title stripped of accents (basically the ASCII form). As a general solution this is probably better than solution 1, since there I only cover some accents.
But there are downsides:
data doubles (one unicode title and one ASCII title). Which can be a problem if you have a lot of data.
some characters are not supported (like chinese characters will be gone after normalization and stripping)
ambiguity which you get by stripping accents (ie. two words "zelo" and "želo" have different meanings but will both turn up when searching).
Here's the Java code for it:
// Gets you the ASCII version of unicode title which you insert into different column
String titleAsciiName = Normalizer.normalize(title, Normalizer.Form.NFD)
.replaceAll("[^\\p{ASCII}]", "");
LIKE never uses a custom collation.
FTS can use a custom tokenizer, but you have to check whether unicode61 is available in all Android versions you want to support.
The Android database API does not allow you to create custom implementations of LIKE or of an FTS tokenizer.
You might want to store a normalized version of your strings in the database.
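One possible way to build such a normalized column, sketched in plain Java. This is an assumption about how it could be done, not something from the answer: it decomposes the text, removes only the combining marks (instead of everything non-ASCII) and lower-cases the result in Java, so SQLite's LOWER() is not needed. SearchKey and normalizeForSearch are made-up names.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration.
class SearchKey {

    // Builds a lower-case, accent-free key for storing and for querying.
    static String normalizeForSearch(String text) {
        // NFD splits an accented letter into base letter + combining mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop only the combining marks; other non-ASCII letters are kept.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Lower-case in Java, independent of the device locale.
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "\u017Delod\u010Dek" is the title from the question, written with escapes.
        System.out.println(normalizeForSearch("\u017Delod\u010Dek")); // zelodcek
    }
}
```

The same function would have to be applied to the search term before it is bound to the LIKE or MATCH query. The ambiguity mentioned in the question (zelo vs. želo) remains, and letters that have no decomposition are left unchanged.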

What are my options for displaying characters that Android can't?

I discovered today that Android can't display a small handful of Japanese characters that I'm using in my Japanese-English dictionary app.
The problem comes when I attempt to display the character via TextView.setText(). All of the characters below show up as blank when I attempt to display them in a TextView. It doesn't appear to be an issue with encoding, though - I'm storing the characters in a SQLite database and have verified that Android can understand the characters. Casting the characters to (int) retrieves proper Unicode decimal escapes for all but one of the characters:
String component = cursor.getString(cursor.getColumnIndex("component"));
Log.i("CursorAdapterGridComponents", "Character Code: " + (int) component.charAt(0) + "(" + component + ")");
I had to use Character.codePointAt() to get the decimal escape for the one problematic character:
int codePoint = Character.codePointAt(component, 0);
I don't think I'm doing anything wrong, and as Strings are by default UTF-16 encoded, there should be nothing preventing them from displaying the characters.
Below are all of the decimal escapes for the seven problematic characters:
⺅ Character Code: 11909(⺅)
⺌ Character Code: 11916(⺌)
⺾ Character Code: 11966(⺾)
⻏ Character Code: 11983(⻏)
⻖ Character Code: 11990(⻖)
⺹ Character Code: 11961(⺹)
𠆢 Character Code: 131490(𠆢)
Plugging the first six values into http://unicode-table.com/en/ revealed their corresponding Unicode numbers, so I have no doubt that they're valid Unicode characters.
The seventh character could only be retrieved from a table of UTF-16 characters: http://www.fileformat.info/info/unicode/char/201a2/browsertest.htm. I could not use its 5-character Unicode number in setText() (as in "\u201a2") because, as I discovered earlier today, Android has no support for Unicode strings past 0xFFFF. As a result, the string was evaluated as "\u201a" + "2". That still doesn't explain why the first six characters won't show up.
What are my options at this point? My first instinct is to just make graphics out of the problematic characters, but Android's highly variable DPI environment makes this a challenging proposition. Is using another font in my app an option? Aside from that, I really have no idea how to proceed.
Is using another font in my app an option?
Sure. Find a font that you are licensed to distribute with your app and has these characters. Package the font in your assets/ directory. Create a Typeface object for that font face. Apply that font to necessary widgets using setTypeface() on TextView.
Here is a sample application demonstrating applying a custom font to a TextView.
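A side note on the seventh character, as a plain-Java sketch. A \u escape in Java source takes exactly four hex digits, which is why "\u201a2" is read as U+201A followed by "2". A code point above 0xFFFF can instead be built with Character.toChars (or written as a surrogate pair). The class and method names below are made up for illustration; whether the glyph then shows up still depends on the font, as described above.

```java
// Hypothetical helper for illustration.
class SupplementaryCharacter {

    // Builds a String from any Unicode code point, including those above 0xFFFF.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x201A2);
        System.out.println(s.length());                    // 2 (two UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (one character)
        System.out.println(s.codePointAt(0));              // 131490
        System.out.println(s.equals("\uD840\uDDA2"));      // true (same surrogate pair)
    }
}
```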

New Line character \n not displaying properly in textView Android

I know that if you do something like
myTextView.setText("This is on first line \n This is on second line");
Then it will display properly like this:
This is on first line
This is on second line
When I store that string in a database and then set it to the view it displays as such:
This is on first line \n This is on second line
Here is the line of code I use to extract the string from the database:
factView.setText(factsCursor.getString(MyDBAdapter.FACT_COLUMN));
I simply populate the database from a text file where each line is a new entry into the table so a line would look like this "This is on first line \n This is on second line" and it is stored as text.
Is there a reason that it isn't displaying the \n characters properly? It must be something to do with the string being in the database. Any suggestions?
I found this question Austyn Mahoney's answer is correct but here's a little help:
private String unescape(String description) {
// Replace the two stored characters '\' and 'n' with a real newline character.
return description.replace("\\n", "\n");
}
description being the string coming out of your SQLite DB
As Falmarri said in his comment, your string is being escaped when it is put into the database. You could try and unescape the string by calling String s = unescape(stringFromDatabase) before you place it in your TextView.
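If the text file can contain other escapes besides \n, a slightly more general version of the unescape helper might look like the sketch below. Which sequences to support (\n, \t and \\ here) is an assumption on my part, and the class name BackslashUnescaper is made up.

```java
// Hypothetical helper for illustration.
class BackslashUnescaper {

    // Converts the two-character sequences \n, \t and \\ into the
    // single characters they stand for; anything else is copied as-is.
    static String unescape(String stored) {
        StringBuilder out = new StringBuilder(stored.length());
        for (int i = 0; i < stored.length(); i++) {
            char c = stored.charAt(i);
            if (c == '\\' && i + 1 < stored.length()) {
                char next = stored.charAt(i + 1);
                if (next == 'n')  { out.append('\n'); i++; continue; }
                if (next == 't')  { out.append('\t'); i++; continue; }
                if (next == '\\') { out.append('\\'); i++; continue; }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below contains a backslash followed by 'n', like the database value.
        String fromDb = "This is on first line \\n This is on second line";
        System.out.println(unescape(fromDb)); // printed on two lines
    }
}
```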
As a side note, make sure you are using DatabaseUtils.sqlEscapeString() on any kind of data that is from the user or an unknown changeable source when inserting data into the database. This will protect you from errors and SQL Injection.
Try \\n instead of \n. If that throws an exception, then use an actual newline character in place of \n. A newline is one character, ASCII 10; it is often entered in a string literal, and it will serve your purpose:
"This is on first line"||x'0A'||"This is on second line"
The || concatenates strings and the x'0A' is an unescaped newline.
If you're inserting records you'll have to replace every newline with "||x'0A'||" (if your string is double quoted). This may seem clumsy compared to the other answers. However, if your lines are in separate columns this also works in a select:
SELECT firstline||x'0A'||secondline FROM wherever;
I found this while having the same problem you are: http://www.mail-archive.com/sqlite-users@sqlite.org/msg43557.html
A text area can be in multi line or single line mode. When it is in single line mode newline characters '\n' will be treated as spaces. When in doubt, to switch multi line mode on you can use the following code:
setInputType(getInputType() | InputType.TYPE_TEXT_FLAG_MULTI_LINE);
I had the problem that the same code did not work on honeycomb and on froyo, which seem to have different defaults. I am now also excluding the flag when I want to force a field to be single lined.
From the Android doc:
public static final int TYPE_TEXT_FLAG_MULTI_LINE Added in API level 3
Flag for TYPE_CLASS_TEXT: multiple lines of text can be entered into
the field. If this flag is not set, the text field will be
constrained to a single line. Constant Value: 131072 (0x00020000)
http://developer.android.com/reference/android/text/InputType.html#TYPE_TEXT_FLAG_MULTI_LINE
You have to set the flag before you populate the field.
