I have to implement a function that checks whether a string complies with a regular expression. I have written a method that parses a list of file names, and for each file name I need to check whether it matches the regexp.
The file names look like this (just an example):
verbale.pdf.001.001
image.jpg.002.001
The string always ends with:
the extension (only jpg or pdf), ".", a group of three digits, ".", a group of three digits
With this regexp I need to check whether the input string ends as described above. I have currently implemented this:
Pattern rexExp = Pattern.compile("((\\.jpg)|(\\.pdf))\\.[0-9]{3}\\.[0-9]{3}");
But it does not work properly. Is it a good idea to use a regexp to check whether a file name ends with a certain pattern?
Less greedy than the other answer; I think it suits you:
\\w+\\.(jpg|pdf)(\\.\\d{3}){2}
file name, only composed of letters, numbers and _
dot
jpg or pdf formats
another dot
three digits
the dot and the three digits repeated
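As a rough illustration of the pattern above, here is a minimal sketch that applies it to whole file names with matches(). The class name FileNameCheck, the method isValid and the third sample name are made up for the example and are not from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class FileNameCheck {
    // Pattern from the answer above: name, dot, jpg or pdf, then two ".ddd" groups.
    private static final Pattern FILE_NAME =
            Pattern.compile("\\w+\\.(jpg|pdf)(\\.\\d{3}){2}");

    // matches() succeeds only if the entire string fits the pattern.
    static boolean isValid(String fileName) {
        return FILE_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "verbale.pdf.001.001", "image.jpg.002.001", "notes.txt.001.001");
        for (String name : names) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

Because \w does not include characters such as a hyphen or a space, a name like my-file.pdf.001.001 is rejected by this whole-string check.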
This should work:
.*\\w{3}\\.\\d{3}\\.\\d{3}
.* = any characters (like "verbale123")
\\w{3} = any 3 letters, digits or underscores (so any three-character extension is accepted, not only jpg or pdf)
\\. = a dot
\\d{3} = any three digits
To check if a string ends with pdf or jpg and two sequences of . and 3 digits, you may use
(?i)(?:jpg|pdf)(?:\.[0-9]{3}){2}$
See the regex demo
Details
(?i) - case insensitive flag
(?:jpg|pdf) - either jpg or pdf
(?:\.[0-9]{3}){2} - 2 repetitions of a . and 3 digits
$ - end of string.
Use it with Matcher#find(): matches() anchors the match at both the start and the end of the string, while this pattern needs a partial match. Example:
String s = "verbale.pdf.001.001";
Matcher matcher = Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$").matcher(s);
if (matcher.find()) {
    System.out.println("Valid!");
}
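To make the find() versus matches() distinction concrete, here is a small sketch using the same pattern. The class and method names are hypothetical and exist only for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class SuffixCheck {
    // Same pattern as above: jpg or pdf, then two ".ddd" groups, anchored at the end.
    private static final Pattern SUFFIX =
            Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$");

    // find() looks for the pattern anywhere; the trailing $ pins it to the end.
    static boolean endsWithSuffix(String s) {
        return SUFFIX.matcher(s).find();
    }

    // matches() demands that the whole string is the suffix and nothing else.
    static boolean wholeStringIsSuffix(String s) {
        return SUFFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "verbale.pdf.001.001";
        System.out.println(endsWithSuffix(s));
        System.out.println(wholeStringIsSuffix(s));
    }
}
```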
I found a dictionary sample on GitHub that I am currently experimenting with. The sample database uses a hyphen between the searched word and the word's meaning, like this:
abbey - n. a monastery ruled by an abbot
I looked into the dictionary database java file and found the following code:
String[] strings = TextUtils.split(line, "-");
I have my own database that translates Korean words to English. However, I didn't use a hyphen when creating it. Is there a way to use plain spaces instead of a hyphen or any other symbol? This is part of an Android app.
Edit: An example entry from my own dictionary would be
abbey a monastery ruled by an abbot
Edit:
The problem is that the old code only recognizes the word and its meaning if they are separated by a hyphen. How do I make it work with spaces alone?
To remove a character from a String, use String.replace:
String newString = line.replace("-","");
To replace it with a space, simply use
String newString = line.replace("-"," ");
String mystring = mystring1.replace("_", " ");
If you want a space, pass a space as the second argument.
As I understand it, you want to split your String to get the output like
abbey - n. a monastery ruled by an abbot
[abbey][n. a monastery ruled by an abbot]
You can use String.split(String, int) to limit the number of splits.
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times
Let's use it like this:
String[] array = s.split(" ", 2);
This will split your String on the regex " " but will limit the size of the output to 2 cells. So it will only split once, putting the left part in the first cell and the right part in the second cell.
Without this limit argument, the method would keep splitting the right part, producing a larger array.
Note: this will be a problem if the word on the left is itself a phrase containing spaces.
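A minimal sketch of the limited split applied to a dictionary line. The class DictionaryLine and the method splitEntry are hypothetical names, and using \\s+ with trim() instead of a single space is an assumption made here so that tabs or repeated spaces do not produce empty cells.

```java
// Hypothetical helper class, named only for this sketch.
class DictionaryLine {
    // Split once on the first run of whitespace: [word, meaning].
    // A line without any whitespace yields a one-element array.
    static String[] splitEntry(String line) {
        return line.trim().split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitEntry("abbey a monastery ruled by an abbot");
        System.out.println("[" + parts[0] + "][" + parts[1] + "]");
    }
}
```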
I have a .txt file which contains over 1000 words.
Sample city names are below:
Razvilka
Moscow
Firozpur Jhirka
Kathmandu
Kiev
Pokhara
Merida
Delhi
Reshetnikovo
Ciudad Bolivar
Marfino
Zhukovskiy
Reutov
Kurovskoye
etc
I would like to have these words in the format below:
"Razvilka","Moscow","etc","etc"
Each word is enclosed in double quotation marks and followed by a comma. I am using Notepad++. Could you explain how to do it and which software I should use?
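If you're using Notepad++, do a Search and Replace in Regular expression mode, replacing
\b(\w+)\b
with
"$1",
It'll find every word and replace it with itself, surrounded by quotes and followed by a comma. Because \w+ stops at a space, a two-word name such as Firozpur Jhirka becomes two separate quoted entries. You'll have to manually remove the last , if that's unwanted.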
I'm not sure whether this question is about programming, but you tagged it android, regex and android-studio, so I guess it is. If so, you can simply split a string this way:
String[] splitted = yourString.split("\\s+");
Here you are splitting the string on whitespace (the + also covers runs of more than one whitespace character), which is what your input seems to use. If you have more than one delimiter, you can combine them with the OR operator |
String[] splitted = yourString.split("-|\\.");
In that example, you are splitting the String on - and . (hyphen and dot). The delimiter is the character sequence at which the String is split.
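For the city list in the question, splitting on line breaks rather than on all whitespace keeps two-word names together. The following is only a sketch under the assumption that the file holds one name per line and has already been read into a String; the class QuoteLines and the method quoteAll are hypothetical names.

```java
// Hypothetical helper class, named only for this sketch.
class QuoteLines {
    // Wrap every non-empty line in double quotes and join the results with commas.
    static String quoteAll(String text) {
        StringBuilder out = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            String name = line.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines
            }
            if (out.length() > 0) {
                out.append(',');
            }
            out.append('"').append(name).append('"');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteAll("Razvilka\nMoscow\nFirozpur Jhirka\n"));
    }
}
```

Building the string with a separator check also avoids the trailing comma that would otherwise have to be removed by hand.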
I want to check whether a web page address appears in a text. For example, I have the address www.taktik.com/trow and want to check whether the text contains an address of that form.
I use Matcher mW = Pattern.compile("[a-zA-Z0-9_.+-]+.[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+\\/[a-zA-Z0-9-.]+").matcher(question); but I don't get any results. How can I check if text xxx.xxxx.xxx/xxx is in my String?
How can I check if text xxx.xxxx.xxx/xxx is in my String?
Fixing your regex, the pattern may look like
[a-zA-Z0-9_.+-]+\\.[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+/[a-zA-Z0-9.-]+
Mind that I escaped the first dot and placed the hyphen at the end of the last two character classes (in yours, you have 9-. that creates a range that matches more than you'd want).
I tried to shorten the pattern a bit, but it's difficult since \w also matches Unicode characters in Android. Here is a possible regex:
(?i)[A-Z0-9_+-]+(?:\\.[A-Z0-9-]+){2}/[A-Z0-9-]+
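A short sketch of how the first corrected pattern could be used with find() to pull an address out of a longer text. The class LinkFinder, the method firstLink and the sample sentence are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class LinkFinder {
    // First corrected pattern from the answer above.
    private static final Pattern LINK = Pattern.compile(
            "[a-zA-Z0-9_.+-]+\\.[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+/[a-zA-Z0-9.-]+");

    // Returns the first xxx.xxxx.xxx/xxx style substring, or null if there is none.
    static String firstLink(String text) {
        Matcher m = LINK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLink("see www.taktik.com/trow for info"));
    }
}
```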
I am new to regular expressions and have been using tutorials, but the regular expression I have works only some of the time. I am getting my numbers out of the contact list on my Android phone. I am trying to get rid of all spaces, '(', ')', and '-'.
For example:
1. (555) 867-5309 -> 5558675309
2. 1555-555-5555 -> 15555555555
3. 555-555-5555 -> 5555555555
This is the line I am using
String formatphone = contactPhone.replaceAll("\\s()-","");
For some numbers it returns only the digits, and for others it doesn't change the format at all.
Is it correct? Do I need to format something because I am taking it out of the phone's contact list?
Put the desired characters in a character class:
String formatphone = contactPhone.replaceAll("[ ()-]","");
Ensure that you put the hyphen - at either end.
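A minimal sketch of the character-class approach, run against the three sample numbers from the question. The class PhoneCleaner and the method strip are hypothetical names; the sketch assumes the separators are plain ASCII spaces, parentheses and hyphens.

```java
// Hypothetical helper class, named only for this sketch.
class PhoneCleaner {
    // Remove every space, '(' , ')' and '-' ; the hyphen sits last so it is literal.
    static String strip(String phone) {
        return phone.replaceAll("[ ()-]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("(555) 867-5309"));
        System.out.println(strip("1555-555-5555"));
        System.out.println(strip("555-555-5555"));
    }
}
```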
Try using this:
String formatphone = contactPhone.replaceAll("[\\s\\(\\)-]", "");
With [] you define a character set in the regular expression. In that set you include every character you want replaced. ( and ) have a special meaning outside a set, so they are escaped here (inside a set the escape is optional but harmless). As - is the special character used to define ranges, it has to be the last character of the set; with nothing after it, it is not a range but just that character (you could escape it too).
I want to parse an HLS master m3u8 file and get the bandwidth, resolution and file name from it. Currently I am using String parsing: I search the string for some patterns and take substrings to get the values.
Example File:
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8
But I found that it can be parsed with regular expressions, as mentioned in this question:
Problem matching regex pattern in Android
I don't have any experience with regular expressions, so can someone please guide me in parsing this with one?
Or can someone help me write a regexp for extracting the BANDWIDTH and RESOLUTION values from the string below?
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
You could try something like this:
final Pattern pattern = Pattern.compile("^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");
Matcher matcher = pattern.matcher("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234");
String bandwidth = "";
String resolution = "";
if (matcher.find()) {
    bandwidth = matcher.group(1);
    resolution = matcher.group(2);
}
This would set bandwidth and resolution to the correct (String) values.
I haven't tried this on an Android device or emulator, but judging from the link you sent and the Android API it should work the same as the plain old Java above.
The regex matches strings that start with #EXT-X-STREAM-INF: and contain BANDWIDTH and RESOLUTION followed by values in the correct format. The values are captured in groups 1 and 2 so we can extract them.
Edit:
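If RESOLUTION isn't always present then you can make that portion optional as such:
"^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?.*"
The .* sits inside the optional group on purpose: a greedy .* placed before the group would swallow the rest of the line, and the group would then always match nothing. The resolution string would be null in cases where only BANDWIDTH is present.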
Edit2:
? makes things optional, and (?:___) is a passive (non-capturing) group, as opposed to a capturing group (___). So it's basically an optional passive group, and yes, anything inside it will be optional.
A . matches a single character, and a * means the preceding item is repeated zero or more times. So .* will match zero or more characters. The reason we need this is to consume anything between what we are matching, e.g. anything between #EXT-X-STREAM-INF: and BANDWIDTH. There are many ways of doing this but .* is the most generic/broad one.
\d is basically a set of characters that represent digits (0-9), but since we define the string as a Java string, we need the double \\, otherwise the Java compiler will fail because it does not recognize the escape sequence \d (in Java). Instead it will parse \\ into \ so that we get \d in the final string passed to Pattern.compile.
[\dx]+ means one or more characters (+) out of the characters 0-9 and x. [\dx\d] would be a single character (no +) out of the same set of characters.
If you are interested in regex you could check out regular-expressions.info and/or regexone.com; there you will find much more in-depth answers to all your questions.
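Since the question also asks for the file name, here is a sketch that walks the playlist line by line and pairs each #EXT-X-STREAM-INF tag with the line after it. It assumes the URI line directly follows its tag line, as in the sample file, and that BANDWIDTH appears before RESOLUTION. The class MasterPlaylistParser, the Variant holder and the method parse are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser class, named only for this sketch.
class MasterPlaylistParser {
    // Hypothetical value holder for one variant stream.
    static final class Variant {
        final long bandwidth;
        final String resolution; // null when the tag has no RESOLUTION attribute
        final String uri;

        Variant(long bandwidth, String resolution, String uri) {
            this.bandwidth = bandwidth;
            this.resolution = resolution;
            this.uri = uri;
        }
    }

    // BANDWIDTH is required; RESOLUTION is captured only when present after it.
    private static final Pattern STREAM_INF = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?.*");

    static List<Variant> parse(String playlist) {
        List<Variant> variants = new ArrayList<>();
        String[] lines = playlist.split("\\r?\\n");
        for (int i = 0; i < lines.length - 1; i++) {
            Matcher m = STREAM_INF.matcher(lines[i].trim());
            if (m.find()) {
                // Assumption: the URI is on the very next line.
                String uri = lines[i + 1].trim();
                variants.add(new Variant(Long.parseLong(m.group(1)), m.group(2), uri));
                i++; // the URI line has been consumed
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        String playlist = "#EXTM3U\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234\n"
                + "Stream1/index.m3u8\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270\n"
                + "Stream2/index.m3u8\n";
        for (Variant v : parse(playlist)) {
            System.out.println(v.uri + " " + v.resolution + " " + v.bandwidth);
        }
    }
}
```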
You could just split the string. Here's what I mean, in Python:
fu = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234"
for chunk in fu.split(':')[1].split(','):
    if chunk.startswith('BANDWIDTH'):
        bandwidth = int(chunk.split('=')[1])
    if chunk.startswith('RESOLUTION'):
        resolution = chunk.split('=')[1]
for Jorr-el
>>>> fu = '#EXT-X-STREAM-INF:BANDWIDTH=5857392,RESOLUTION=1980x1080,CODECS="avc1.42c02a,mp4a.40.2"'
>>>> for chunk in fu.split(':')[1].split(','):
....     if chunk.startswith('BANDWIDTH'):
....         bandwidth = int(chunk.split('=')[1])
....     if chunk.startswith('RESOLUTION'):
....         resolution = chunk.split('=')[1]
....
>>>> bandwidth
5857392
>>>> resolution
'1980x1080'
>>>>
I found this one, which might help:
http://sourceforge.net/projects/m3u8parser/
(License: LGPLv3)
You can also use: Python m3u8 parser.
Example below:
import m3u8
playlist = """
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8
"""
_playlist = m3u8.loads(playlist).playlists
for item in _playlist:
    item_uri = item.uri
    resolution = item.stream_info.resolution
    bandwidth = item.stream_info.bandwidth
    print(item_uri, resolution, bandwidth)
The result will be:
Stream1/index.m3u8 (416, 234) 476416
Stream2/index.m3u8 (480, 270) 763319
Stream3/index.m3u8 (640, 360) 1050224
Stream4/index.m3u8 (640, 360) 1910937
Stream5/index.m3u8 (1280, 720) 3775816