Parsing HLS m3u8 file using regular expressions

I want to parse an HLS master m3u8 file and get the bandwidth, resolution and file name of each stream from it. Currently I am using plain string operations: I search the string for known patterns and take substrings to get the values.
Example file:
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8
But I found that it can be parsed using regular expressions, as mentioned in this question:
Problem matching regex pattern in Android
I don't have any experience with regular expressions, so can someone please guide me in parsing this with a regular expression?
Or can someone help me write a regexp that extracts the BANDWIDTH and RESOLUTION values from the string below?
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234

You could try something like this:
final Pattern pattern = Pattern.compile("^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");
Matcher matcher = pattern.matcher("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234");
String bandwidth = "";
String resolution = "";
if (matcher.find()) {
    bandwidth = matcher.group(1);
    resolution = matcher.group(2);
}
This sets bandwidth and resolution to the correct (String) values.
I haven't tried this on an Android device or emulator, but judging from the link you sent and the Android API, it should work the same as the plain old Java above.
The regex matches strings that start with #EXT-X-STREAM-INF: and contain BANDWIDTH and RESOLUTION followed by values in the expected format. The two values are captured in groups 1 and 2 so we can extract them.
Edit:
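If RESOLUTION isn't always present, you can make that portion optional. The .* that precedes RESOLUTION has to go inside the optional group; if it is left outside, the greedy .* consumes the rest of the line and the optional group always matches nothing:
"^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?.*"
The resolution string is null in cases where only BANDWIDTH is present.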
Edit2:
? makes the preceding element optional, and (?:___) is a non-capturing group (as opposed to a capturing group, (___)). So (?:___)? is an optional non-capturing group, and anything inside it is optional.
A . matches a single character, and a * means the preceding element is repeated zero or more times. So .* matches zero or more characters. We need it to consume whatever lies between the parts we are matching, e.g. anything between #EXT-X-STREAM-INF: and BANDWIDTH. There are many ways of doing this, but .* is the most generic/broad one.
\d is a character class that matches a digit (0-9). Since we write the pattern as a Java string literal, we need the double \\; otherwise the Java compiler rejects the literal, because \d is not a valid Java escape sequence. The compiler turns \\ into \, so the string passed to Pattern.compile contains \d.
[\dx]+ means one or more characters (+) out of the characters 0-9 and x. [\dx] without the + would match a single character out of the same set.
If you are interested in regex, you could check out regular-expressions.info and regexone.com; there you will find much more in-depth answers to all your questions.
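To tie the pieces above together, here is a minimal sketch that walks a whole master playlist and pairs each #EXT-X-STREAM-INF tag with the URI on the following line. The class and field names (MasterPlaylistParser, Variant) are made up for illustration, and the pattern assumes that RESOLUTION, when present, comes after BANDWIDTH, as in the example file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class MasterPlaylistParser {

    // Group 1: BANDWIDTH (required). Group 2: RESOLUTION (optional, null when absent).
    // Assumes RESOLUTION follows BANDWIDTH on the tag line.
    private static final Pattern STREAM_INF = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=(\\d+x\\d+))?");

    static final class Variant {
        final long bandwidth;
        final String resolution; // null when the tag has no RESOLUTION
        final String uri;

        Variant(long bandwidth, String resolution, String uri) {
            this.bandwidth = bandwidth;
            this.resolution = resolution;
            this.uri = uri;
        }
    }

    static List<Variant> parse(String playlist) {
        List<Variant> variants = new ArrayList<>();
        String[] lines = playlist.split("\\r?\\n");
        // Stop one line early: every tag needs a following URI line.
        for (int i = 0; i < lines.length - 1; i++) {
            Matcher m = STREAM_INF.matcher(lines[i].trim());
            if (m.find()) {
                variants.add(new Variant(
                        Long.parseLong(m.group(1)), m.group(2), lines[i + 1].trim()));
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        String playlist = "#EXTM3U\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234\n"
                + "Stream1/index.m3u8\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319\n"
                + "Stream2/index.m3u8\n";
        for (Variant v : parse(playlist)) {
            System.out.println(v.uri + " " + v.bandwidth + " " + v.resolution);
        }
    }
}
```

For the second entry, which has no RESOLUTION attribute, the resolution field is null, so check for that before using it.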

You could just split the string. Here's what I mean, in Python:
fu = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234"
for chunk in fu.split(':')[1].split(','):
    if chunk.startswith('BANDWIDTH'):
        bandwidth = int(chunk.split('=')[1])
    if chunk.startswith('RESOLUTION'):
        resolution = chunk.split('=')[1]
For Jorr-el:
>>>> fu = '#EXT-X-STREAM-INF:BANDWIDTH=5857392,RESOLUTION=1980x1080,CODECS="avc1.42c02a,mp4a.40.2"'
>>>> for chunk in fu.split(':')[1].split(','):
....     if chunk.startswith('BANDWIDTH'):
....         bandwidth = int(chunk.split('=')[1])
....     if chunk.startswith('RESOLUTION'):
....         resolution = chunk.split('=')[1]
....
>>>> bandwidth
5857392
>>>> resolution
'1980x1080'
>>>>

I found this one, which might help:
http://sourceforge.net/projects/m3u8parser/
(License: LGPLv3)

You can also use: Python m3u8 parser.
Example below:
import m3u8
playlist = """
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8
"""
_playlist = m3u8.loads(playlist).playlists
for item in _playlist:
    item_uri = item.uri
    resolution = item.stream_info.resolution
    bandwidth = item.stream_info.bandwidth
    print(item_uri, resolution, bandwidth)
The result will be:
Stream1/index.m3u8 (416, 234) 476416
Stream2/index.m3u8 (480, 270) 763319
Stream3/index.m3u8 (640, 360) 1050224
Stream4/index.m3u8 (640, 360) 1910937
Stream5/index.m3u8 (1280, 720) 3775816

Related

Kotlin regex not working for polish char ("ł") which I get at runtime

I've declared a regex like this:
"(^\\d{1,}\\,\\d{2}|^0) zł$"
Unfortunately, it doesn't match the value below (but it should):
508,00 zł
NOTE1: I've discovered that the problem is probably with the ł character.
NOTE2: I am getting this String from an API and checking it at runtime (it has exactly the value I described).
NOTE3: I've also tried to match my pattern manually in the debugger's expression evaluation (typing "508,00 zł" by hand), and it matched. Unfortunately, the string I actually get doesn't match at runtime. What could the problem be?
Code:
val value = getFromApi() // 508,00 zł
val regex = "(^\\d{1,}\\,\\d{2}|^0) zł$".toRegex()
regex.matches(value) // returns false

The letter ł is not the culprit here, since there is only one Unicode representation for it.
The most common issue is the whitespace: it can be any Unicode whitespace character, and you cannot tell which one just by looking at the string.
To match any ASCII whitespace, you may use \s. Here, you had this kind of whitespace, so my top comment below the question worked for you.
To match any Unicode separator character (such as a no-break space), you may use \p{Z} to match exactly one, or \p{Z}* to match zero or more of them:
val value = "508,00 zł"
val regex = """^(\d+,\d{2}|0)\p{Z}zł$""".toRegex()
// val regex = """^(\d+,\d{2}|0)\p{Z}*zł$""".toRegex()
println(regex.matches(value)) // => true
See Kotlin demo
Also, note the use of the raw string literals (delimited with triple double quotation marks), they enable the use of a single backslash as the regex escape char.
Note {1,} is the same as + quantifier that matches 1 or more repetitions.
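The same effect can be reproduced in plain Java. The sketch below assumes, purely for illustration, that the API value contains a no-break space (U+00A0) between the amount and "zł"; the class name is made up. It shows the literal-space pattern failing and the \p{Z} pattern matching, and prints the code points so the invisible character can be identified.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the no-break space in the input is an assumption.
class ZlotyAmountCheck {

    // Literal ASCII space between the amount and "zł" (\u0142 is ł).
    static final Pattern ASCII_SPACE =
            Pattern.compile("^(\\d+,\\d{2}|0) z\u0142$");

    // \p{Z} accepts any Unicode separator, including U+00A0.
    static final Pattern ANY_SEPARATOR =
            Pattern.compile("^(\\d+,\\d{2}|0)\\p{Z}z\u0142$");

    public static void main(String[] args) {
        String fromApi = "508,00\u00A0z\u0142"; // assumed value with a no-break space

        System.out.println(ASCII_SPACE.matcher(fromApi).matches());   // false
        System.out.println(ANY_SEPARATOR.matcher(fromApi).matches()); // true

        // Dump the code points to see which whitespace character is really there.
        fromApi.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

Dumping the code points of the real API value in the same way is a quick way to confirm which character you are actually dealing with.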

Custom Regular Expression in Java

I have to implement a function that checks whether a string complies with a regular expression. I have written a method that parses a list of file names, and for each file name I need to check whether it matches the regexp.
The file name is composed as follows (just an example):
verbale.pdf.001.001
image.jpg.002.001
The string is always composed of:
extension (only jpg or pdf) "." a group of three digits "." a group of three digits
With this regexp I need to check whether the input string ends as described above. I have currently implemented this:
Pattern rexExp = Pattern.compile("((\\.jpg)|(\\.pdf))\\.[0-9]{3}\\.[0-9]{3}");
But it does not work properly. Is it a good idea to use a regexp to check whether a file name ends with a certain pattern?

This is less permissive than the other answer, and I think it suits you:
\\w+\\.(jpg|pdf)(\\.\\d{3}){2}
file name, only composed of letters, numbers and _
dot
jpg or pdf formats
another dot
three digits
the dot and the three digits repeated

This should work:
.*\\w{3}\\.\\d{3}\\.\\d{3}
.* = any characters (like "verbale123")
\\w{3} = any three word characters (letters, digits or underscore), so this does not restrict the extension to jpg or pdf
\\. = a dot
\\d{3} = any three digits

To check if a string ends with pdf or jpg and two sequences of . and 3 digits, you may use
(?i)(?:jpg|pdf)(?:\.[0-9]{3}){2}$
See the regex demo
Details
(?i) - case insensitive flag
(?:jpg|pdf) - either jpg or pdf
(?:\.[0-9]{3}){2} - 2 repetitions of a . and 3 digits
$ - end of string.
Use with Matcher#find() (as matches() anchors the match at the start and end of the string, while a partial match is required when using this pattern), example demo:
String s = "verbale.pdf.001.001";
Matcher matcher = Pattern.compile("(?i)(?:jpg|pdf)(?:\\.[0-9]{3}){2}$").matcher(s);
if (matcher.find()) {
    System.out.println("Valid!");
}
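As a small sketch of how this could be wrapped for a list of file names: the pattern is compiled once and find() is used for the end-anchored partial match. The class and method names are made up, and the leading \\. is an addition of mine so that the extension has to follow a dot.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class SuffixCheck {

    // Compiled once; the leading "\\." (an assumption beyond the pattern above)
    // requires a dot before the extension.
    private static final Pattern SUFFIX =
            Pattern.compile("(?i)\\.(?:jpg|pdf)(?:\\.[0-9]{3}){2}$");

    static boolean hasValidSuffix(String fileName) {
        // find() looks for the pattern anywhere; "$" pins it to the end.
        return SUFFIX.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] names = {
            "verbale.pdf.001.001", "image.jpg.002.001",
            "notes.txt.001.001", "verbale.pdf.001"
        };
        for (String name : names) {
            System.out.println(name + " -> " + hasValidSuffix(name));
        }
    }
}
```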

Create regex pattern for a specified String

I want to check whether a String has a specified structure. I think regex would be the best way to test the String, but I have never used regex before and sadly have no clue how it works. I read some explanations on Stack Overflow, but I couldn't find a good explanation of how a regex pattern is built.
My String is returned from a DataMatrix scanner. For example:
String contained = "~ak4,0000D"
Now I want to test whether this String matches the pattern from the regex.
The String always starts with "~".
After this, two lowercase characters follow, in this example "ak".
After this, there follows a six-character value, "4,0000". This is the main problem: the comma can sit anywhere in this value, but the value must contain it. For example, it can be ",16000" or "150,00" or "2,8000".
The last position must contain one of the uppercase characters A B C D E F G H J K L M.
I hope some of you can help me.

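The regex would be ~[a-z]{2}(?=[\d,]{6}[A-HJ-M]$)\d*,\d*[A-HJ-M]$ (the lookahead pins the value to exactly six characters, and the last class is written [A-HJ-M] because a | inside square brackets is a literal character, not an alternation). You can create and test expressions here
import java.util.regex.Matcher;
import java.util.regex.Pattern;

boolean isMatch(String STRING_YOU_WANT_TO_MATCH)
{
    Pattern patt = Pattern.compile(YOUR_REGEX_PATTERN);
    Matcher matcher = patt.matcher(STRING_YOU_WANT_TO_MATCH);
    return matcher.matches();
}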

You need to use a positive lookahead based regex like below.
System.out.println("~ak4,0000D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~fk,10000D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~jk400,00D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~ak4,0000D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~fk10000,D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~jk400,00I".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~ak40000,Z".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~fky,10000D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
System.out.println("~,jk40000D".matches("~[a-z]{2}(?=\\d*,\\d*.$)[\\d,]{6}[A-HJ-M]"));
Output:
true
true
true
true
true
false
false
false
false

One thing you need to know about regular expressions is that they are a family of things, not one specific thing. There are rather a lot of distinct but similar regular expression languages, and the facilities supporting them vary from programming language to programming language.
Here is a regex pattern that will work in most regex languages to match your strings:
"^~[a-z][a-z]((,[0-9][0-9][0-9][0-9][0-9])|([0-9],[0-9][0-9][0-9][0-9])|([0-9][0-9],[0-9][0-9][0-9])|([0-9][0-9][0-9],[0-9][0-9])|([0-9][0-9][0-9][0-9],[0-9])|([0-9][0-9][0-9][0-9][0-9],))[A-HJ-M]$"
The '^' anchors the pattern to the beginning of the string, and the '$' anchors it to the end, so that the pattern must match the whole string as opposed to a substring. Characters enclosed in square brackets represent "character classes" matching exactly one character from among a set, with two characters separated by a '-' representing a range of characters. The '|' separates alternatives, and parentheses serve to group subpatterns. For some regex engines, the parentheses and '|' symbols need to be escaped via a preceding '\' character to have these special meanings instead of representing themselves.
A more featureful regex language can allow that to be greatly simplified; for example:
"^~[a-z]{2}[0-9,]{6}(?<=[a-z][0-9]{0,5},[0-9]{0,5})[A-HJ-M]$"
The quantifiers "{2}" and "{6}" designate that the preceding subpattern must match exactly the specified number of times (instead of once), and the quantifier "{0,5}" designates that the preceding subpattern may match between zero and five times. Additionally, the "(?<= ...)" is a zero-length look-behind assertion, which tests whether the previous characters of the input match the given sub-pattern (in addition to having already matched the preceding sub-pattern); here it checks that exactly one comma sits among the six characters after the two letters. The '.' metacharacter and '*' quantifier are supported in pretty much all regex languages, but assertions and curly-brace quantifiers are less widely supported. Java's regular expression language understands this pattern because the look-behind has a bounded maximum length, which is why it uses "{0,5}" rather than "*"; engines that only allow fixed-length look-behind will reject it, and a lookahead can be used there instead.

~[a-z]{2}[\d,]{6}[A-HJ-M]
This checks the length and the allowed characters, but not that the value actually contains a comma. I'm no pro at regex, though; I use this site every time I build a pattern:
RegExr
Use it like this in your code:
Pattern pattern = Pattern.compile(yourPatternAsAString);
Matcher matcher = pattern.matcher(yourInputToMatch);
if (matcher.matches()) {
    // gogogo
}

Android Regular Expression - Replace all spaces '(' ')' '-'

I am new to regular expressions and have been using tutorials, but the regular expression I have works only some of the time. I am getting my numbers out of the contact list on my Android phone. I am trying to get rid of all spaces, '(', ')', and '-'.
For example:
1. (555) 867-5309 -> 5558675309
2. 1555-555-5555 -> 15555555555
3. 555-555-5555 -> 5555555555
This is the line I am using
String formatphone = contactPhone.replaceAll("\\s()-","");
For some numbers it returns only the digits, and for others it doesn't change the format at all.
Is it correct? Do I need to format something because I am taking the numbers out of the phone's contact list?

Put the desired characters in a character class:
String formatphone = contactPhone.replaceAll("[ ()-]","");
Ensure that you put the hyphen - at either end of the class, so that it is not read as a range.

Try using this:
String formatphone = contactPhone.replaceAll("[\\s\\(\\)-]", "");
In a regular expression, [] defines a set of characters; put every character you want replaced inside it. ( and ) have a special meaning outside a set, so they are escaped here (inside a set the escape is optional). The - is a special character used to define ranges, so it has to be the last character of your set: with nothing after it, it's not a range but just that character (you could escape it too, though).
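For reference, here is a minimal sketch that runs the character-class approach against the three sample numbers from the question. The class and method names are made up. It also hints at why the original "\\s()-" behaves erratically: without the brackets it is a sequence (one whitespace character, an empty group, then a hyphen), so it only removes a space that is directly followed by a hyphen.

```java
// Hypothetical helper for illustration.
class PhoneCleaner {

    static String stripFormatting(String phone) {
        // One character class: whitespace, '(', ')' or '-' (hyphen last, so no range).
        return phone.replaceAll("[\\s()-]", "");
    }

    public static void main(String[] args) {
        String[] samples = { "(555) 867-5309", "1555-555-5555", "555-555-5555" };
        for (String s : samples) {
            System.out.println(s + " -> " + stripFormatting(s));
        }
        // The original pattern is a sequence, not a set, so nothing here is removed.
        System.out.println("(555) 867-5309".replaceAll("\\s()-", ""));
    }
}
```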

android java URLDecoder problem

I have a string displayed in a WebView as "Siwy & Para Wino".
When I fetch it from the URL, I get the string "Siwy%2B%2526%2BPara%2BWino". (corrected)
Now I'm trying to use URLDecoder to solve this problem:
String decoded_result = URLDecoder.decode(url); // the url is "Siwy+%26+Para+Wino"
When I print it out, I still see "Siwy+%26+Para+Wino".
Could anyone tell me why?

From the documentation (of URLDecoder):
This class is used to decode a string which is encoded in the application/x-www-form-urlencoded MIME content type.
We can look at the specification to see what a form-urlencoded MIME type is:
The form field names and values are escaped: space characters are replaced by '+', and then reserved characters are escaped as per [URL]; that is, non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks, as in multi-line text field values, are represented as CR LF pairs, i.e. '%0D%0A'.
Since the specification calls for a percent sign followed by two hexadecimal digits for the ASCII code, the first call to decode(String s) converts each such triplet into a single character and leaves whatever follows it intact. In %2526, the %25 translates to % and the trailing 26 is left alone, so the result after the first decoding is %26. Running decode one more time translates %26 into &. In other words, the string was encoded twice, so it has to be decoded twice.
String decoded_result = URLDecoder.decode(URLDecoder.decode(url));
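You can also use the Uri class if you have UTF-8-encoded strings (as far as I know it only decodes '%' escapes, so any '+' is left as is and would still need to be replaced with a space):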
Decodes '%'-escaped octets in the given string using the UTF-8 scheme.
Then use:
String decoded_result = Uri.decode(Uri.decode(url));

Thanks for all the answers, I finally solved it.
Solution:
After I used URLDecoder.decode twice (oh my god), I got what I wanted.
String temp = URLDecoder.decode( url); // url = "Siwy%2B%2526%2BPara%2BWino"
String result = URLDecoder.decode( temp ); // temp = "Siwy+%26+Para+Wino"
// result = "Siwy & Para Wino"
But I still don't know why. Could someone tell me?
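To make the "why" visible, here is a small sketch that prints the string after each decoding pass. It uses the two-argument URLDecoder.decode(String, String) with "UTF-8" rather than the one-argument form; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class for illustration.
class DoubleDecodeDemo {

    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        String url = "Siwy%2B%2526%2BPara%2BWino";
        String once = decodeOnce(url);   // %2B -> '+', %25 -> '%'
        String twice = decodeOnce(once); // '+' -> ' ', %26 -> '&'
        System.out.println(once);        // Siwy+%26+Para+Wino
        System.out.println(twice);       // Siwy & Para Wino
    }
}
```

The first pass only undoes the outer layer of encoding, which is why the intermediate value still looks encoded.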
