I want to check if www page is in text. For example i have page address: www.taktik.com/trow and want check if text for www is in text.
I use Matcher mW = Pattern.compile("[a-zA-Z0-9_.+-]+.[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+\\/[a-zA-Z0-9-.]+").matcher(question); but I don't get any results. How can I check if text xxx.xxxx.xxx/xxx is in my String?
How can I check if text xxx.xxxx.xxx/xxx is in my String?
Fixing your regex, the pattern may look like
[a-zA-Z0-9_.+-]+\\.[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+/[a-zA-Z0-9.-]+
Mind I escaped thr first dot and placed the hyphen at the end of the last two character classes (in yours, you have 9-. that creates a range that matches more than you'd want).
I tried to shorten the pattern a bit, but it's difficult since \w also matches Unicode characters in Android. Here is a possible regex:
(?i)[A-Z0-9_+-]+(?:\\.[A-Z0-9-]+){2}/[A-Z0-9-]+
Related
I've been trying to find a good way to be able to keep only emojis and letters in a given text, but every article I found, I didn't have success with .
I've tried to use regex, but seems that I can not make it work.
I've tried to use emoji4j but it seems that this library is working with emojis in this form ":)", which don't help me, because my emojis are groups of unicode characters.
The result I want is the following :
"This is. a text ๐จโ๐ฉโ๐งโ๐ฆ,,1234" => "This is a text ๐จโ๐ฉโ๐งโ๐ฆ"
"๐จโ๐ฉโ๐งโ๐ฆ" => "๐จโ๐ฉโ๐งโ๐ฆ"
"๐จโ๐ฉโ๐งโ๐ฆ๐123abc๐จโ๐ฉโ๐งโ๐ฆ" => "๐จโ๐ฉโ๐งโ๐ฆ๐abc๐จโ๐ฉโ๐งโ๐ฆ"
Here's the emoji regex : ?:[\u2700-\u27bf]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?)*|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|[\ud83c\udd70-\ud83c\udd71]|[\ud83c\udd7e-\ud83c\udd7f]|\ud83c\udd8e|[\ud83c\udd91-\ud83c\udd9a]|[\ud83c\udde6-\ud83c\uddff]|[\ud83c\ude01-\ud83c\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c\ude32-\ud83c\ude3a]|[\ud83c\ude50-\ud83c\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff] .
If I try something like :
val regex = "the_whole_regex_above | [^a-zA-Z]".toRegex()
myText.replace(regex,""), it won't replace anything, basically every character will pass
Basically I want to achieve pretty much the same thing as in this question, but using Kotlin.
You want to remove all punctuation, symbols (other than those used to form emojis) and digits.
To do that, you may use
myText = myText.replace("""[\p{N}\p{P}\p{S}&&[^\p{So}]]+""".toRegex(), "")
See the online Kotlin demo.
Details
[ - start of a character class that matches:
\p{N} - any Unicode digit
\p{P} - any Unicode punctuation proper
\p{S} - any Unicode symbol
&&[^\p{So}] - BUT the Unicode symbols belonging to Symbol, other Unicode category that are mostly used to form emojis
]+ - 1 or more occurrences.
I have to implement a function that check if a string is compliant to a regular expression, I have wrote a method that parse a list of filename, for each file name I need to check if respect the regexp.
The filename is composed like as follow (just an example):
verbale.pdf.001.001
image.jpg.002.001
The string is always composed by:
extension (only jpg or pdf) "." a group of three number "." a group of three number
With this regexp I need to check if the string in input end as described above, I have currently implemented this:
Pattern rexExp = Pattern.compile("((\\.jpg)|(\\.pdf))\\.[0-9]{3}\\.[0-9]{3}");
But not work properly, is it a good idea implement a regExp to check if a filename end with a certain path ?
Less greedy than the other answer, think it suits you:
\\w+\\.(jpg|pdf)(\\.\\d{3}){2}
file name, only composed of letters, numbers and _
dot
jpg or pdf formats
another dot
three digits
the dot and the three digits repeated
This should work :
.*\\w{3}\\.\\d{3}\\.\\d{3}
.* = any Characters (like "verbale123")
\\w{3} = any 3 alphabetic\numeric characters
\\. = a dot
\\d{3} = any three numeric characters
To check if a string ends with pdf or jpg and two sequences of . and 3 digits, you may use
(?i)(?:jpg|pdf)(?:\.[0-9]{3}){2}$
See the regex demo
Details
(?i) - case insensitive flag
(?:jpg|pdf) - either jpg or pdf
(?:\.[0-9]{3}){2} - 2 repetitions of a . and 3 digits
$ - end of string.
Use with Matcher#find() (as matches() anchors the match at the start and end of the string, while a partial match is required when using this pattern), example demo:
String s = "verbale.pdf.001.001";
Matcher matcher = Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$").matcher(s);
if (matcher.find()){
System.out.println("Valid!");
}
Hmm, I can't find the man page for 'replace' in Googles App scripts, I only see 'replaceText'. Anyway, from what I gather from the SO posts, the below should work, hopefully someone can spot it easily.
The String in the Cell is "[pro] all, everybody" and I want to remove the bracketed word '[pro]' so the result is 'all, everybody'.
It does work just fine with:
Cell = Cell.toString().replace("\[pro\]","");
but when I try to make it generic, it fails with all these (not sure what the pattern matching rules are, thus the question for the man page):
Cell = Cell.toString().replace("\[pr.\]","");
Cell = Cell.toString().replace("\[pr.*\]","");
Cell = Cell.toString().replace("\[.*\]","");
they should work, no ? What am I missing ?
Also, how would I use 'replaceText', I can't seem to apply it directly to the 'Cell' object.
The String#replace is a JavaScript function where you need to use a regex with a regex literal notation or with new RegExp("pattern", "modifiers") constructor notation:
Cell = Cell.toString().replace(/\[pr[^\]]*]/,"");
When using a regex literal, backslashes are treated as literal backslashes, and /\d/ matches a digit. The constructor notation equivalent is new RegExp("\\d").
The /\[pr[^\]]*]/ regex matches the first instance of:
\[pr - literal substring [pr
[^\]]* - 0+ chars other than ]
] - a literal ] symbol.
And replaces with an empty string.
In my app I have a Textview with some text. I'm trying to get an input from the user, and then highlight words in the Textview according to that input.
For instance if the text is
Hello stackoverflow
and the input for the user is
hello
I want to replace the text with:
<font color='red'>Hello</font>` stackoverflow
This is my code:
String input = //GETTING INPUT FROM THE USER
text= text.replaceAll(input,"<font color='red'>"+input+"</font>");
Textview.setText(Html.fromHtml(text));
And the replacement is working, but the problem is that my current code changes the original word cases, for example :
Text: HeLLo stackoverflow
Input: hello
What i get: <font color='red'>hello</font> stackoverflow
What i want: <font color='red'>HeLLo</font> stackoverflow
You have to think about regular expressions.
replaceAll allows you to use regular expressions, and so, you can replace the text for the exact occurrence that was found.
For instance if Hello was found, it replaces it for <font color='red'>Hello</font>.
If HeLLo is found, it replaces it for <font color='red'>HeLLo</font>
Your code should be somehing as easy as:
String highlighted = text.replaceAll("(?i)("+input+")","<font color='red'>$1</font>");
This means:
(?i) : i want to search for something, case insensitive
"("+input+")" : input is betwen ( and ) because we are creating a group, so this group can be refered later
"<font color='red'>$1</font>" : instead of replacing by input, that would change the case, we replace it by `$1, that is the reference to the first matched group. This means that we want to replace it using the exact word that was found.
But please, try it and keep playing since regular expressions are tricky.
Other reads
It is easier and more clear if you use the Patternclass.
You can read more here:
http://developer.android.com/reference/java/util/regex/Pattern.html
Also, you can take a look at how to do it:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
public String replaceAll(String regex, String replacement)
.
Replaces each substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
Parameters:
regex - the regular expression to which this string is to be matched
replacement - the string to be substituted for each match
Returns:
The resulting String
UPDATE
You can test your regular expressions in this page:
http://www.regexplanet.com/advanced/java/index.html
Using this link I was able to create a filter using the regex (?!dalvikvm\b)\b\w+ to filter out messages with the tag dalvikvm, but I have tried several variations of the regex such as (?!dalvikvm-heap\b)\b\w+, (?!dalvikvm\\-heap\b)\b\w+, (?!dalvikvm[-]heap\b)\b\w+, and many others and I can't seem to get rid of the dalvikvm-heap messages. Ideally I would like to filter them both, but I haven't figured that part out yet either.
Any help would be appreciated.
Use ^(?!dalvikvm) in the tag field instead. That will show only messages whose tag doesn't start with "dalvikvm".
The below is notes about how this works; you can skip them if you're not interested. To start, you have to remember that the question, "Does this string match the regex?" really means, "Is there any position in this string where the regex matches?"
The tricky thing about (?!x) negative assertions is that they match wherever the next part of the string doesn't match x: but that's true of every place in the string "dalvikvm" except the start. The blog post you linked to adds a \b at the end so that the expression matches only at a place that's not just before "dalvikvm" and is a word boundary. But this would still match, because the end of the string is a word boundary, and it doesn't have "dalvikvm" after it. So the blog post adds the \w+ after it, to say that after the word boundary there have to be more word characters.
It works for exactly that case, but it's a bit of an odd way of making a regex, and it's relatively expensive to evaluate. And as you've noticed, you can't adapt it to (?!dalvikvm-heap\b)\b\w+. "-" is a non-word character, so immediately after it, there is a word boundary, followed by word characters, and not followed by "dalvikvm-heap", so the regex matches at that point.
Instead, I use ^, which only matches at the start of the string, along with the negative assertion. Overall, the regex only matches at the start of the string, and only then if the start of the string isn't followed by "dalvikvm". That means it won't match "dalvikvm" or "dalvikvm-heap". It's also cheaper to evaluate, because the regex engine knows it can only possibly match at the start.
Making the regex this way, you can filter out multiple tags by just putting them together. For example, ^(?!dalvikvm)(?!IInputConnectionWrapper) will filter out tags that start with "dalvikvm" or "IInputConnectionWrapper", because the start of the string has to not be followed by the first and not be followed by the second.
BTW, thanks for your link. I didn't realise that you could use the logcat filters that way, so I wouldn't have come up with my answer without it.