I am trying to use Regular Expressions to decode some HTML I retrieve from a webpage.
I want to transform some <iframe> tags into links.
As far as I and some test programs can tell, the code I'm using should work; however, when I run it on my Android device it does not find any matches (whereas in the test programs it does).
The regular expression I am using is as follows (keep in mind I'm coding in Java, so I need to escape the escape character as well):
String regularExpression = "<iframe.+?src=\\\\?(\\S+).+?(><\\\\?/iframe>|\\\\?/>)";
String replacement = "<a href=$1>Youtube</a>";
input.replaceAll(regularExpression, replacement);
From what I can gather from this it should replace all <iframe> tags that have a src attribute to hyperlinks with that source. However when I feed the following input to it, it does nothing with it:
<iframe src=\"http:\/\/www.youtube.com\/embed\/s6b33PTbGxk\" frameborder=\"0\" width=\"500\" height=\"284\"><\/iframe>
The response is simply the exact same text, only with the escape-characters removed:
<iframe src="http://www.youtube.com/embed/s6b33PTbGxk" frameborder="0" width="500" height="284"></iframe>
Can someone help me and explain what I'm doing wrong? I only started learning Regular Expressions yesterday, but I just can't for the life of me figure out why this doesn't work.
The method String.replaceAll doesn't modify the string. It can't because strings are immutable. Instead it returns a new string with the result. You need to assign this result to something:
String result = input.replaceAll(regularExpression, replacement);
Also, don't use regular expressions to parse HTML.
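To illustrate the immutability point above, here is a minimal sketch. The class name, the helper toLinks and the simplified pattern are assumptions made for this example (it is not the regex from the question): it expects unescaped HTML, a double-quoted src attribute and an explicit closing tag.

```java
class IframeToLinkDemo {
    // Simplified, hypothetical pattern: double-quoted src, explicit </iframe>.
    static final String IFRAME = "<iframe[^>]*?\\ssrc=\"([^\"]+)\"[^>]*>\\s*</iframe>";

    static String toLinks(String html) {
        // replaceAll never changes `html`; it returns a new String.
        return html.replaceAll(IFRAME, "<a href=\"$1\">Youtube</a>");
    }

    public static void main(String[] args) {
        String input = "<iframe src=\"http://www.youtube.com/embed/s6b33PTbGxk\" frameborder=\"0\"></iframe>";
        input.replaceAll(IFRAME, "x");   // result discarded, input is unchanged
        String result = toLinks(input);  // result kept in a new variable
        System.out.println(input);       // still the <iframe> markup
        System.out.println(result);      // the generated <a> link
    }
}
```

The first call shows the mistake from the question: the return value is thrown away, so nothing appears to happen.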
String resultString = subjectString.replaceAll("(?=<(iframe)\\s+src\\s*=\\s*(['\"])(.*?)\\2[^>]*>).*?</\\1>", "<a href=$3>Youtube</a>");
This should work. In addition to @Mark Byers' note, your regex does not seem to match your input, even with the (double) backslashes removed.
I'm having an issue when using EvaluateJavascript on both Android and iOS (I'm using Xamarin).
The issue appears when I pass a string argument to a JavaScript function: if that string contains a special character, the JavaScript engine cannot parse the call.
For example, this fails:
EvaluateJavascript("updateHtml('Some Html \n Some Html')")
But this works:
EvaluateJavascript("updateHtml('Some Html Some Html')")
So the question is: how can I pass an arbitrary string as an argument to a JavaScript function through EvaluateJavascript?
Thanks in advance :)
I was able to solve my issue.
I URL-encoded my string like this:
EvaluateJavascript($"updateHtml('{Uri.EscapeUriString("Some Html \n Some Html")}')");
Then in my JavaScript I just need to decode it:
document.getElementById("body").innerHTML = decodeURI(html);
How about using "\\n" instead of "\n" ?
I'm getting a JSON response string similar to this:
<strong>B.<\/strong> Because there is no indication of Miss Manette’s feelings
The text that I'm receiving is full of tags like <strong> and <em>, and characters like ’, “, ” etc. How can I parse it to a plain String with the same features?
The only way I could think of is replacing such characters and using Html.fromHtml() method. Is there a built-in parser available? How could I parse such HTML text?
Just use Html.fromHtml. It will parse most of the tags supplied and give you the formatted output. Note that this method does not support every HTML tag. Check out this link for more information about which tags are supported. Also check this, though it's a bit old.
If you know what text you'll be parsing, and you have tags that aren't parsed by fromHtml, your best bet would be to replace them with empty string and then use this method.
I know that I can use the default Html.fromHtml(string) but it highlights links by default. Is there a way to prevent that behavior?
P.S.
I'm trying to feed it straight to TextView using .setText() without saving it to String. I want to keep all the formatting except links.
If you only need text you can fetch it with:
String text = Html.fromHtml(html).toString();
Edit:
Since you want to remove only the links, you can use String.replaceAll before parsing the html:
// Remove <a href*>
html = html.replaceAll("<a href.*?>", "");
// Remove </a>
html = html.replaceAll("</a>", "");
textView.setText(Html.fromHtml(html));
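The two replacements above can also be folded into one. The following is only a sketch in plain Java (no Android classes); the class name, the helper stripLinks and the combined pattern are assumptions for illustration, and like any regex over HTML it will miss unusual markup.

```java
class LinkStripDemo {
    // Hypothetical helper: removes opening and closing <a> tags, keeps their text.
    static String stripLinks(String html) {
        // (?i) ignores case; \b stops the pattern from matching tags like <abbr>.
        return html.replaceAll("(?i)</?a\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "See <a href=\"http://example.com\">this</a> <b>now</b>";
        // The <b> formatting survives; only the anchor tags are dropped.
        System.out.println(stripLinks(html));
    }
}
```

The cleaned string can then be passed to Html.fromHtml as in the answer.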
In my app I have a TextView with some text. I'm trying to get input from the user, and then highlight words in the TextView according to that input.
For instance if the text is
Hello stackoverflow
and the input from the user is
hello
I want to replace the text with:
<font color='red'>Hello</font> stackoverflow
This is my code:
String input = //GETTING INPUT FROM THE USER
text= text.replaceAll(input,"<font color='red'>"+input+"</font>");
Textview.setText(Html.fromHtml(text));
The replacement works, but the problem is that my current code changes the case of the original word, for example:
Text: HeLLo stackoverflow
Input: hello
What I get: <font color='red'>hello</font> stackoverflow
What I want: <font color='red'>HeLLo</font> stackoverflow
You have to think about regular expressions.
replaceAll accepts a regular expression, so you can replace the text with the exact occurrence that was found.
For instance, if Hello is found, it is replaced with <font color='red'>Hello</font>.
If HeLLo is found, it is replaced with <font color='red'>HeLLo</font>.
Your code should be something as simple as:
String highlighted = text.replaceAll("(?i)("+input+")","<font color='red'>$1</font>");
This means:
(?i) : search case-insensitively
"("+input+")" : input is between ( and ) because we are creating a group, so this group can be referred to later
"<font color='red'>$1</font>" : instead of replacing with input, which would change the case, we replace with $1, the reference to the first matched group. This means that the replacement uses the exact word that was found.
But please, try it and keep playing since regular expressions are tricky.
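One caveat with the line above: the user's input is spliced into the regex, so characters such as . or ( would be treated as regex syntax. A minimal sketch that guards against this with Pattern.quote follows; the class and method names are assumptions for the example.

```java
import java.util.regex.Pattern;

class HighlightDemo {
    // Hypothetical helper: wraps every case-insensitive occurrence of `input`.
    static String highlight(String text, String input) {
        // Pattern.quote makes the user input match literally.
        String regex = "(?i)(" + Pattern.quote(input) + ")";
        // $1 re-inserts the text as it was found, so the original case is kept.
        return text.replaceAll(regex, "<font color='red'>$1</font>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("HeLLo stackoverflow", "hello"));
        // prints: <font color='red'>HeLLo</font> stackoverflow
    }
}
```

The returned string can be passed to Html.fromHtml exactly as in the question.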
Other reads
It is easier and clearer if you use the Pattern class.
You can read more here:
http://developer.android.com/reference/java/util/regex/Pattern.html
Also, you can take a look at how to do it:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
Parameters:
regex - the regular expression to which this string is to be matched
replacement - the string to be substituted for each match
Returns:
The resulting String
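The note about backslashes and dollar signs quoted above can be sketched as follows. The class name, the helper fill and the PRICE placeholder are assumptions for this example only.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // Hypothetical helper: substitutes `value` for the literal token PRICE.
    static String fill(String template, String value) {
        // Without quoteReplacement, "$5" would be read as a reference to group 5.
        return template.replaceAll("PRICE", Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: PRICE", "$5")); // prints: Total: $5
    }
}
```

This matters whenever the replacement text comes from user input or other data rather than from a constant you wrote yourself.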
UPDATE
You can test your regular expressions in this page:
http://www.regexplanet.com/advanced/java/index.html
I want to grab the img tag from text returned in JSON data like this:
#رصد| #انقلاب_3يوليو| اليوم ... مبني المركبات العسكري في صلاح سالم<br /> <br /> تصوير المواطن الصحفي : عبدالرحمن النحاس<br/><br/><img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg" alt="" />
I want to grab this:
<img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg" alt="" />
What regular expression must I use on Android to match it?
I used this code, but it is not working:
String content = e.getString("content");
String img = "";
Pattern p = Pattern
.compile("<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");
Matcher m = p.matcher(content);
if (m.matches()) {
Log.d("true", m.group(0).toString());
img = m.group(0).toString();
}
Log.d("image", "image : " + content);
Using regular expressions to parse HTML is a very bad idea.
Better to use a true HTML parser and walk the DOM tree to get what you want.
You also need to be careful about proper encoding, since you want Arabic text.
Well... you know you can get the JSON object and parse that without regex? that is probably the best approach. Then you can just strip out the content without worrying about parsing anything from a string because it automatically puts it into variables for you.
How to parse JSON
It can become very messy to mess around with regex for the reasons @duffymo posted above me.
edit:
I see what you are trying to do: parse the image out of the content section, correct? Two things need to be involved here: regular expressions and JSON parsing. You need to grab all the content fields with the JSON parser, then use regex on those to extract the images. That's what you are trying to do, correct?
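For the regex step, one likely reason the code in the question finds nothing is that Matcher.matches() only succeeds when the entire input matches the pattern, whereas Matcher.find() searches for the pattern anywhere inside it. A minimal sketch, reusing the pattern from the question and assuming the content string has already been read from the JSON; the class and method names are made up for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgTagDemo {
    // Same pattern as in the question; group 1 captures the src value.
    private static final Pattern IMG =
            Pattern.compile("<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");

    // Returns the first <img ...> tag in `content`, or "" if there is none.
    static String firstImgTag(String content) {
        Matcher m = IMG.matcher(content);
        return m.find() ? m.group(0) : "";
    }

    // Returns the src of the first <img> tag, or "" if there is none.
    static String firstImgSrc(String content) {
        Matcher m = IMG.matcher(content);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String content = "caption<br/><img class=\"img\" src=\"https://example.com/a.jpg\" alt=\"\" />";
        System.out.println(firstImgTag(content));
        System.out.println(firstImgSrc(content));
    }
}
```

As the answers point out, an HTML parser is more robust than this if the markup varies.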