How can I use regex to match this string in Android? - android

I want to grab the img tag from text returned in JSON data, like this:
‫#رصد| #انقلاب_3يوليو| اليوم ... مبني المركبات العسكري في صلاح سالم<br /> <br /> تصوير المواطن الصحفي : عبدالرحمن النحاس‬<br/><br/><img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg" alt="" />
I want to grab this:
<img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg" alt="" />
What regular expression must I use in Android to match it?
I used this code, but it is not working:
String content = e.getString("content");
String img = "";
Pattern p = Pattern
.compile("<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");
Matcher m = p.matcher(content);
if (m.matches()) {
Log.d("true", m.group(0).toString());
img = m.group(0).toString();
}
Log.d("image", "image : " + content);

Using regular expressions to parse HTML is a very bad idea.
Better to use a true HTML parser and walk the DOM tree to get what you want.
You also need to be careful about proper encoding, since you want Arabic text.

Well... you know you can get the JSON object and parse that without regex? That is probably the best approach. Then you can just pull out the content without parsing anything from a raw string yourself, because the JSON parser puts the fields into variables for you.
How to parse JSON
It can become very messy to mess around with regex, for the reasons @duffymo posted above me.
Edit:
I see what you are trying to do: parse the image out of the content section, correct? Two things need to be involved here: JSON parsing and regular expressions. You need to grab all the content fields with the JSON parser, then use regex on those to extract the images. That's what you are trying to do, correct?
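One likely reason the code in the question finds nothing is that Matcher.matches() only succeeds when the entire input matches the pattern, whereas Matcher.find() scans for a matching substring. The sketch below reuses the question's pattern with find(); the class and method names are made up for illustration, and the sample URL is a placeholder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class ImgTagFinder {
    // Same pattern as in the question; group 1 captures the src value.
    private static final Pattern IMG = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");

    /** Returns the first <img ...> tag found in content, or "" if there is none. */
    static String firstImgTag(String content) {
        Matcher m = IMG.matcher(content);
        // find() looks for a matching substring; matches() would require
        // the whole string to be a single <img> tag.
        return m.find() ? m.group(0) : "";
    }

    /** Returns the src value of the first <img> tag, or "" if there is none. */
    static String firstImgSrc(String content) {
        Matcher m = IMG.matcher(content);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String content = "Some caption<br/><br/>"
                + "<img class=\"img\" src=\"https://example.com/photo_s.jpg\" alt=\"\" />";
        System.out.println(firstImgTag(content));
        System.out.println(firstImgSrc(content));
    }
}
```

In the question's code, content would come from e.getString("content"), and group(0) already returns a String, so the extra toString() call is unnecessary.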

Related

How to parse HTML-formatted String to plain string?

I'm getting a JSON response string similar to this:
<strong>B.<\/strong> Because there is no indication of Miss Manette’s feelings
The string text that I'm receiving is full of tags like <strong>, <em> and entities like &rsquo;, &ldquo;, &rdquo; etc. How can I parse it to a plain String with same features?
The only way I could think of is replacing such characters and using Html.fromHtml() method. Is there a built-in parser available? How could I parse such HTML text?
Just use Html.fromHtml. It will parse most of the tags supplied and give you the formatted output. Note that this method does not support all HTML tags. Check out this link for more information about which tags are supported. Also check this, though it's a bit old.
If you know what text you'll be parsing, and you have tags that aren't parsed by fromHtml, your best bet would be to replace them with empty string and then use this method.

How do I pick selective html content in my webview in android?

I am currently trying to import a selected headline from HTML content in my WebView. I am looking at a wide variety of options, like JSON parsing; any hack will do. I was wondering if anyone has had experience with this or has a brief idea of how to go about it?
Here's my example:
This is my html file content:
<div><h1><span class = "headline"> Some depressing title </span> <span class = "source" > ABCD </span> </h1> <br/> <span class = "body"> crappy body content which I do not need </span></div>
I just want to retrieve "headline" and "source" from this html in my webview, nothing else(not the body ). How do I go about defining a parameter to retrieve these? Any clues on how to do it?
Thanks!
Step 1: Get the HTML source from your WebView - see this question. You basically create a JS interface that extracts your HTML source to a Java String.
Step 2: Use an HTML parser (for example jsoup) to parse the Java String into a format that you can handle easily.
Step 3: Use the parser to extract your relevant information. Here, you could use getElementsByTag("span") to get all your spans, then filter by class; or you could directly use getElementsByClass("headline") and getElementsByClass("source").
In general, you can retrieve the HTML source and parse the DOM in all cases.
Edit: if you don't want to use a parser, you can extract your information by searching the HTML source string (finding the correct classes, then finding the indexes of the '<' and '>' characters to pick out the information). This way is harder, less efficient, and less flexible, but it can be done.
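The parser-free string search mentioned in the edit could look roughly like the sketch below. It is only an illustration under strong assumptions: the class name appears in double quotes, and the wanted text sits directly inside the tag with no nested tags. SpanTextExtractor and textOfClass are hypothetical names.

```java
// Hypothetical helper illustrating the indexOf-based approach.
class SpanTextExtractor {
    /**
     * Returns the trimmed text that follows the first double-quoted occurrence
     * of className, up to the next '<', or null if nothing suitable is found.
     */
    static String textOfClass(String html, String className) {
        // Locate the quoted class name, e.g. "headline".
        int cls = html.indexOf("\"" + className + "\"");
        if (cls < 0) return null;
        // The opening tag ends at the next '>'.
        int open = html.indexOf('>', cls);
        if (open < 0) return null;
        // The text runs until the next '<' (start of the closing tag).
        int close = html.indexOf('<', open);
        if (close < 0) return null;
        return html.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        String html = "<div><h1><span class = \"headline\"> Some depressing title </span> "
                + "<span class = \"source\" > ABCD </span> </h1> <br/> "
                + "<span class = \"body\"> body content </span></div>";
        System.out.println(textOfClass(html, "headline"));
        System.out.println(textOfClass(html, "source"));
    }
}
```

Because this matches any quoted string equal to the class name, it breaks easily when the markup changes, which is why the parser route is usually preferable.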

Is there a way to convert Html to String without highlighting hyperlinks?

I know that I can use the default Html.fromHtml(string) but it highlights links by default. Is there a way to prevent that behavior?
P.S.
I'm trying to feed it straight to TextView using .setText() without saving it to String. I want to keep all the formatting except links.
If you only need text you can fetch it with:
String text = Html.fromHtml(html).toString();
Edit:
Since you want to remove only the links, you can use String.replaceAll before parsing the html:
// Remove <a href*>
html = html.replaceAll("<a href.*?>", "");
// Remove </a>
html = html.replaceAll("</a>", "");
textView.setText(Html.fromHtml(html));
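As a variation on the same idea, both the opening and closing anchor tags can be removed in one pass. This is a sketch, not a general HTML sanitizer; LinkStripper and stripAnchors are hypothetical names, and the pattern assumes no attribute value contains a '>' character.

```java
// Hypothetical helper: removes <a ...> and </a> tags but keeps their inner content.
class LinkStripper {
    static String stripAnchors(String html) {
        // (?i)  case-insensitive, so <A HREF=...> is handled too
        // </?a  an opening or closing anchor tag
        // \b    stops the match from hitting other tags such as <abbr>
        return html.replaceAll("(?i)</?a\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "Visit <a href=\"http://example.com\">our <b>site</b></a> now";
        System.out.println(stripAnchors(html));
    }
}
```

The result can then be passed to Html.fromHtml and set on the TextView as in the answer above, so the remaining formatting tags are still rendered.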

Json parsing converts html tags to escape sequence

I am fetching some HTML content from my server, for which I am using JSON parsing. But this converts my HTML content to escaped entity values.
For example: <p>Spend minimum $10 (in a single same-day receipt) at any outlet<\/p> is getting converted to,
&lt;p&gt;Spend minimum $10 (in a single same-day receipt) at any outlet &lt;/p&gt;
Now if I try to set this to my WebView it displays with HTML tags itself. If I try to encode the data using TextUtils.encode it displays the text with unicode values.
Can anyone help me with this.
How should I fetch a HTML content and display it in WebView?
I don't fully understand your question, but if you want to load HTML in a WebView you can use
webView.loadDataWithBaseURL(null, html, "text/html", "UTF-8", null);
and if you want to convert notation like &lt; and &gt; you can use the jsoup library.
Guys thanks for your help. But I have solved this issue myself. I have elaborated my way of solving the issue.
What I did is,
1)convert the unicode value to Spanned like this,
Spanned ss = Html.fromHtml("&lt;p&gt;Spend minimum $10 (in a single same-day receipt) at any outlet &lt;/p&gt;");
2)Now convert this Spanned to String like this,
String tempString=ss.toString();
3)And now set this to WebView which solved the problem,
webView.loadData(tempString, "text/html","UTF-8");
Actually, it isn't the JSON encoder that converts the data to HTML entities; some other layer does that before the data is passed to the JSON encoder.
JSON has nothing to do with HTML tags. Usually only quotes are escaped, and Unicode is supported by most parsers.
You probably need to either change the way the server returns the data, so that it stops encoding the angle brackets of HTML tags as entities, or decode the entities back in your app.
Update:
To decode HTML entities used in HTML tags (and others too) you may use StringEscapeUtils.unescapeHtml() from Apache Commons Lang (unescapeHtml4() in Commons Lang 3).
Why do you need JSON to show an HTML page inside the WebView? Create a WebView in the XML layout and write the code below inside the Activity to see the HTML page.
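If adding a library is not an option, the decoding step can be approximated by hand for the handful of entities involved here. This is a minimal sketch that covers only five common entities, not the full HTML entity table; BasicEntityDecoder and unescapeBasic are hypothetical names.

```java
// Hypothetical helper: decodes a few common HTML entities only.
class BasicEntityDecoder {
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                // &amp; is decoded last so that "&amp;lt;" becomes "&lt;"
                // and is not decoded a second time into "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;p&gt;Spend minimum $10 (in a single same-day receipt) at any outlet &lt;/p&gt;";
        System.out.println(unescapeBasic(escaped));
    }
}
```

The decoded string still contains the tags, so it can be handed to webView.loadDataWithBaseURL as markup.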
webView = (WebView)findViewById(R.id.webView);
FrameLayout mContentView = (FrameLayout) getWindow().
getDecorView().findViewById(android.R.id.content);
final View zoom = this.webView.getZoomControls();
mContentView.addView(zoom, ZOOM_PARAMS);
zoom.setVisibility(View.GONE);
webView.loadUrl("http://www.google.co.in/");

Android RegEx doesn't find matches

I am trying to use Regular Expressions to decode some HTML I retrieve from a webpage.
I want to transform some <iframe> tags into links.
As far as I and some test programs can tell, the code I'm using should work fine; however, when I run it on my Android device it does not find any matches (whereas in the test programs it does).
The regular expression I am using is as follows (keep in mind I'm coding in Java, so I need to escape the escape character as well):
String regularExpression = "<iframe.+?src=\\\\?(\\S+).+?(><\\\\?/iframe>|\\\\?/>)";
String replacement = "<a href=$1>Youtube</a>";
input.replaceAll(regularExpression, replacement);
From what I can gather from this it should replace all <iframe> tags that have a src attribute to hyperlinks with that source. However when I feed the following input to it, it does nothing with it:
<iframe src=\"http:\/\/www.youtube.com\/embed\/s6b33PTbGxk\" frameborder=\"0\" width=\"500\" height=\"284\"><\/iframe>
The response is simply the exact same text, only with the escape-characters removed:
<iframe src="http://www.youtube.com/embed/s6b33PTbGxk" frameborder="0" width="500" height="284"></iframe>
Can someone help me and explain what I'm doing wrong? I only started learning Regular Expressions yesterday, but I just can't for the life of me figure out why this doesn't work.
The method String.replaceAll doesn't modify the string. It can't because strings are immutable. Instead it returns a new string with the result. You need to assign this result to something:
String result = input.replaceAll(regularExpression, replacement);
Also, don't use regular expressions to parse HTML.
String resultString = subjectString.replaceAll("(?=<(iframe)\\s+src\\s*=\\s*(['\"])(.*?)\\2[^>]*>).*?</\\1>", "<a href=$3>Youtube</a>");
This should work. In addition to @Mark Byers's note, your regex does not seem to match your input, even with the (double) backslashes removed.
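Both points from the answers (assign the result of replaceAll, and use a pattern that matches the unescaped input) can be combined as in the sketch below. It assumes the JSON escapes have already been removed by the JSON parser, that src is quoted, and that the iframe has no content; IframeToLink and iframesToLinks are hypothetical names, and the pattern is an illustration rather than the one from either answer.

```java
// Hypothetical helper: turns empty <iframe src="..."></iframe> tags into links.
class IframeToLink {
    static String iframesToLinks(String html) {
        // Group 1 is the quote character, group 2 the src value.
        String regex = "<iframe\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1[^>]*>\\s*</iframe>";
        // replaceAll returns a new String; html itself is never modified.
        return html.replaceAll(regex, "<a href=\"$2\">Youtube</a>");
    }

    public static void main(String[] args) {
        String input = "<iframe src=\"http://www.youtube.com/embed/s6b33PTbGxk\" "
                + "frameborder=\"0\" width=\"500\" height=\"284\"></iframe>";
        String result = iframesToLinks(input); // keep the returned value
        System.out.println(result);
    }
}
```

Unlike the replacement string in the answer above, this one quotes the href value so the generated link is valid even if the URL contains characters that would end an unquoted attribute.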
