Jsoup - extract html from element - android

I want to extract HTML code from a div element using jsoup HTML parser library.
HTML code:
<div class="entry-content">
<div class="entry-body">
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519" /></a><br /></strong></p>
<p><em>Text 2</em> </p>
</div>
</div>
Extract part:
String content = ... the content of the HTML from above
Document doc = Jsoup.parse(content);
Element el = doc.select("div.entry-body").first();
I want the result of el.html() to be the whole HTML inside the entry-body div:
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519" /></a><br /></strong></p>
<p><em>Text 2</em> </p>
but I get only the first <p> tag:
<p><strong>Text 1</strong></p>

Try this:
Elements el = doc.select("div.entry-body");
instead of this:
Element el = doc.select("div.entry-body").first();
and then:
for (Element e : el) {
    // html() only returns the markup; print it (or store it) to use the result
    System.out.println(e.html());
}
EDIT
Maybe you'll get your result this way; I tried it and it gave the correct result:
Elements el = doc.select("a.asset-img-link");

As mentioned in the comments to the OP, I don't get it. Here is my reproduction of the problem, and it does exactly what you want:
String html = ""
+"<div class=\"entry-content\">"
+" <div class=\"entry-body\">"
+" <p><strong>Text 1</strong></p>"
+" <p><strong> <a class=\"asset-img-link\" href=\"http://example.com\" style=\"display: inline;\"><img alt=\"IMG_7519\" class=\"asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive\" src=\"http://example.com\" style=\"width: 500px;\" title=\"IMG_7519\" /></a><br /></strong></p>"
+" <p><em>Text 2</em> </p>"
+" </div>"
+"</div>"
;
Document doc = Jsoup.parse(html);
Element el = doc.select("div.entry-body").first();
System.out.println(el.html());
This results in the following output:
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519"></a><br></strong></p>
<p><em>Text 2</em> </p>

In your case, you'd use
doc.select("div[class=entry-body]") (the div carries a class attribute, not a name attribute) to select that specific <div>,
according to the jsoup cookbook.

Related

How to parse ul , li tags in android studio using jsoup without div and display it in recycler view?

I am trying to retrieve data from a web page; the sample HTML is shown below. I want to retrieve the data and show it in a list view.
<html>
<head>
<title>Index of /abc/xyz/Female/pqr</title>
</head>
<body>
<h1>Index of /abc/xyz/Female/pqr</h1>
<ul>
<li> Parent Directory</li>
<li> 2016060500004.png</li>
<li> 2016060500011.png</li>
<li> 2016060500012.png</li>
</ul>
</body>
</html>
To select the ul, use this:
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements ul = doc.select("ul"); // select ul
For details, refer to the jsoup docs.
Example:
String html = "<html>"+
"<head>"+
"<title>Index of /abc/xyz/Female/pqr</title>"+
"</head>"+
"<body>"+
"<h1>Index of /abc/xyz/Female/pqr</h1>"+
"<ul>" +
"<li> Parent Directory</li>"+
"<li> 2016060500004.png</li>"+
"<li> 2016060500011.png</li>"+
"<li> 2016060500012.png</li>"+
"</ul>" +
"</body>"+
"</html>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("ul"); // select ul
for(Element b : links){
System.out.println(b.text());
}

Android - Extract html with tag and make count on result

I get a JSON object whose value is a long string of HTML content.
Is there any way I can extract certain text from this HTML string and put it into a TextView? What I want is the content under <h1> and <p>; everything else can be thrown away.
\r\n<div class=\"page-title-wrap sx-hide\">\r\n
<div class=\"page-title clearfix\">\r\n
<div class=\"col-lg-12\">\r\n <h1>Latest Deals</h1>\r\n </div>\r\n </div>\r\n
</div>\r\n\r\n<div class=\"breadcrumb-wrapper\">\r\n
<ul class=\"breadcrumb\">\r\n
<li>Home</li>\r\n
<li>Deals</li>\r\n
<li class=\"active\">Great promotion! Is now RM 95 only!
</li>\r\n
</ul>\r\n
</div>\r\n\r\n
<div class=\"article outer clearfix\">\r\n
<div class=\"col-sm-12\">\r\n
<img alt=\"" title=\"Great promotion! Is now RM 95 only!\" src=\"">\r\n
<h1>Great promotion! Is now RM 95 only!</h1>\r\n
<p class=\"date\">March 28th, 2017</p>\r\n
<p><strong class=\"text-red\"></strong></p>\r\n
<p>This is the paragraht that shows the description of the promotion deals. You can write anything here.\r\n </p>\r\n
<p>The buses offered by Alisan Golden Coach are in single deck or double deck. All of the buses are equip with air-conditioning and comfortable seats to ensure passengers are comfortable while travelling on the long journeys.</p>\r\n\r\n
<p>Book your bus ticket before too late and enjoy the great saving.</p>\r\n\r\n\r\n\r\n\r\n\r\n\r\n
<div class=\"m-top30 m-bottom20\">\r\n
Home\r\n\r\n \r\n\r\n\r\n</div>\r\n\r\n\r\n
<div id=\"fb-root\"></div>\r\n<script>\r\n (function(d, s, id) {\r\n var js, fjs = d.getElementsByTagName(s)[0];\r\n if (d.getElementById(id)) return;\r\n js = d.createElement(s); js.id = id;\r\n js.async = true;\r\n js.src = '';\r\n fjs.parentNode.insertBefore(js, fjs);\r\n }(document, 'script', 'facebook-jssdk'));</script>\r\n\r\n
<div class=\"fb-share-button\" data-href=\"http://google.com/\" data-layout=\"button_count\" data-size=\"large\" data-mobile-iframe=\"true\">\r\n
<a target=\"_blank\" href=\"" class=\"fb-xfbml-parse-ignore\">Share</a>\r\n</div>\r\n </div>\r\n</div>
Basically, the <h1> content will be assigned to the "Title" TextView, and I was thinking of taking everything from the first <p> to the last </p> (the last </p> before the div with a class) and putting it into a WebView, because I want to keep the paragraph layout.
==================== EDIT ====================
I have successfully extracted the content using Jsoup, but I realized that .text() collects the text of all elements with the tag and puts it on a single line. However, I want the result to keep the <p> formatting when it is loaded into the HTML view. Any idea how I can do that?
Document doc = Jsoup.parse(content);
Elements eTitle = doc.getElementsByTag("h1");
Elements eBody = doc.getElementsByTag("p");
binding.fragmentWebview.loadData("<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">" + eBody.text(), "text/html; charset=utf-8","UTF-8");

Jsoup get text from div class

This is my HTML:
<div class="open-statuses">
<div class="open-status" id="lifts-status-scripted">
<h3>Lifts</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="open-number">04</div>
<div class="total-number">4</div>
</div>
Details
</div>
<div class="open-status" id="trails-status-scripted">
<h3>Trails</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="open-number">12</div>
<div class="total-number">169</div>
</div>
Details
</div>
<div class="open-status open" id="road-status-scripted">
<h3>Road</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="status-message">Open</div>
</div>
Road Conditions
</div>
</div>
I need the text from the div with class="open-status" and id="trails-status-scripted", but I can't get it. I use the code below for the first div with no problems, but I can't make it work for the second one.
Elements div1=document.select("#mountain-report-page");
Elements div2=div1.select(".open-statuses-holder");
Elements div3=div2.select(".open-statuses");
Jliftbig = div3.select("div.open-number").first().ownText();
Any clue?
Simplify in this way:
Elements div = doc.select("div[id=mountain-report-page] div[class=open-statuses-holder] div[class=open-statuses] div[class=open-status]");
for (Element e : div){
if (e.id().equals("trails-status-scripted")){
Element ele = e.select("div[class=status-graph] div[class=open-number]").first();
String str = ele.text();
}
}
Done. I resolved with this code
Element div = document.select("div[id=mountain-report-page] div[class=open-statuses-holder] div[class=open-statuses] div[class=open-status] ").get(2);
String Jtrails = div.select("div.open-number").first().ownText();
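Since IDs must be unique within an HTML document, you can simply use this selector:
Element div = document.select("#trails-status-scripted .open-number").first();
Notes:
#foo is equivalent to *[id=foo]
.foo matches any element whose class list contains foo; *[class=foo] is equivalent only when foo is the element's sole class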
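One caveat on the class shorthand: the road block in the question's HTML has class="open-status open", so an exact attribute match and a class-token match can disagree on it. The sketch below is a plain-Java model of the two rules, not jsoup's own implementation; the class and method names are hypothetical, and it ignores any case-insensitive matching that a real selector engine may apply.

```java
import java.util.Arrays;

public class ClassSelectorModel {

    // Models ".name": true if the whitespace-separated class list contains the token.
    static boolean matchesClassSelector(String classAttr, String name) {
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    // Models "[class=name]": true only if the whole attribute value equals the name.
    static boolean matchesAttributeEquals(String classAttr, String name) {
        return classAttr.trim().equals(name);
    }

    public static void main(String[] args) {
        String trails = "open-status";      // class of the trails block
        String road = "open-status open";   // class of the road block

        System.out.println(matchesClassSelector(trails, "open-status"));
        System.out.println(matchesAttributeEquals(trails, "open-status"));
        System.out.println(matchesClassSelector(road, "open-status"));
        System.out.println(matchesAttributeEquals(road, "open-status"));
    }
}
```

Under this model, a selector written as div[class=open-status] would skip the road block, while div.open-status would include it, which changes the indexes you get back from the result list.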

if statement is not working the way i want it to using jsoup

I am trying to use an if statement, but it is not doing what I want it to.
I am trying to get all the images extracted from the HTML source using jsoup. Some items in the HTML don't have images, so they contain no img tags. Here is the if statement I use:
Elements imagess = doc.select("img[src$=.jpg]");
//Elements imagess = doc.select("img");
for (Element table : doc.select("div[class=listing-content]")) {
// Identify all the table row's(tr)
for (Element row : table.select("div:gt(0)")) {
HashMap<String, String> map = new HashMap<String, String>();
String[] imgg = new String[imagess.size()];
ArrayList products = new ArrayList();
for (int i = 0; i < imagess.size(); i++)
if (imagess.toString().contains("https://ssli")) {
imgg[i] = imagess.get(i).attr("src");
} else {
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
}
So while looping, if https://ssli is found in the current element, I want to extract its src:
imgg[i] = imagess.get(i).attr("src");
otherwise, I want to use a placeholder image URL:
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
Here is part of the HTML extracted from the page; it contains listings both with and without img tags:
<div class="listing-content">
<h2 class="listing-title" itemprop="name">
Faulty Xbox 36
</h2>
<p class="listing-description
hide-fully-to-m"
itemprop="description">
Turns on but tray broken so can't load games .
Sold as seen
</p>
<ul class="listing-attributes inline-list hide-fully-to-m">
</ul>
<div class="listing-location" itemscope itemtype="http://schema.org/Place">
<span class="truncate-line" itemprop="name">
Sunbury-on-Thames, Surrey
</span>
</div>
<strong class="listing-price txt-emphasis"
itemprop="price">£20</strong>
<strong class="listing-posted-date txt-normal truncate-line" itemprop="adAge">
<span class="hide-visually">Ad posted </span>
11 mins ago
</strong>
</div>
</a>
<span class="save-ad listing-save-ad"
data-savead="channel:savead-1131358978">
<span class="hide-visually">Save this ad</span>
<span class="icn-star iconu-m txt-quaternary" aria-hidden="true"></span>
</span>
</article>
</li>
<li>
<article class="listing-maxi" itemscope itemtype="http://schema.org/Product" data-q=ad-1131358703>
<a class="listing-link" href="/p/video-games/xbox-360-cod-/1131358703" itemprop="url">
<div class="listing-side">
<div class="listing-thumbnail ">
<img src="" data-lazy="https://ssli.ebayimg.com/00/s/ODAwWDYwMA==/z/uFgAAOSwMmBV4eSL/$_26.JPG"
alt="" itemprop="image"
class="hide-fully-no-js"/>
<noscript>
<img src="https://ssli.ebayimg.com/00/s/ODAwWDYwMA==/z/uFgAAOSwMmBV4eSL/$_26.JPG" alt="" itemprop="image"/>
</noscript>
</div>
<div class="listing-meta">
<ul class="inline-list txt-center">
<li>1<span class="hide-visually"> images</span>
<span class="icn-camera txt-quaternary" aria-hidden="true"></span>
</li>
</ul>
</div>
</div>
<div class="listing-content">
<h2 class="listing-title" itemprop="name">
Xbox 360 cod
</h2>
<p class="listing-description truncate-paragraph
hide-fully-to-m"
itemprop="description">
Call of duty advanced warfare £12
Call of duty modern warfare 3 £5
Black ops 2 SOLD
Both for £15
No offers
</p>
<ul class="listing-attributes inline-list hide-fully-to-m">
</ul>
<div class="listing-location" itemscope itemtype="http://schema.org/Place">
<span class="truncate-line" itemprop="name">
Norwich, Norfolk
</span>
</div>
<strong class="listing-price txt-emphasis"
itemprop="price">£1</strong>
<strong class="listing-posted-date txt-normal truncate-line" itemprop="adAge">
<span class="hide-visually">Ad posted </span>
13 mins ago
</strong>
</div>
</a>
<span class="save-ad listing-save-ad"
data-savead="channel:savead-1131358703">
<span class="hide-visually">Save this ad</span>
<span class="icn-star iconu-m txt-quaternary" aria-hidden="true"></span>
</span>
</article>
</li>
<li>
<article class="listing-maxi" itemscope itemtype="http://schema.org/Product" data-q=ad-1131358320>
<a class="listing-link" href="/p/xbox-one/xbox-one-w-kinect-5-games-forza-horizon-2-incl.-blu-ray-2-controllers-2-charger-cables-1-mic/1131358320" itemprop="url">
<div class="listing-side">
<div class="listing-thumbnail ">
<img src="" data-lazy="https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG"
alt="" itemprop="image"
class="hide-fully-no-js"/>
<noscript>
<img src="https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG" alt="" itemprop="image"/>
</noscript>
</div>
<div class="listing-meta">
<ul class="inline-list txt-center">
<li>8<span class="hide-visually"> images</span>
<span class="icn-camera txt-quaternary" aria-hidden="true"></span>
</li>
</ul>
</div>
</div>
As you can see, listings with images have a listing-thumbnail div; the ones without images don't have it.
Also, it seems that with
if (imagess.get(i).toString().contains("https://ssli")){
imgg[i] = imagess.get(i).attr("src");
} else {
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
}
the code in the else branch doesn't fire, and I'm not sure why. This is how I print the results, both when an image is found and when it is not:
for (int j = 0; j < hrefElements.size(); j++) {
System.out.println("title: " + titlee[j]);
System.out.println("description: " + description[j]);
System.out.println("distance: " + distance[j]);
System.out.println("posted: " + posted[j]);
System.out.println("price: " + pricee[j]);
System.out.println("meta: " + listingmeta[j]);
System.out.println("link: " + linkss[j]);
System.out.println("img-link: " + imgg[j]);
}
return products;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
When an image is found, this prints something like:
System.out.println("img-link: " + "https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG");
and when it is not found:
System.out.println("img-link: " + "");
Instead of a blank value, I want it to use my custom link from the else branch.
imagess appears to be a collection of some kind. Instead of stringifying the collection:
if (imagess.toString().contains("https://ssli"))
you probably wanted to examine an element of the collection:
if (imagess.get(i).toString().contains("https://ssli"))
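To illustrate the difference, here is a minimal sketch that uses plain strings as stand-ins for the src values of the jsoup elements. ImageFallback and its method names are hypothetical and not part of jsoup; the placeholder URL is the one from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImageFallback {

    static final String PLACEHOLDER =
            "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";

    // Collection-level check: true as soon as ANY entry contains the marker,
    // so it gives the same answer on every loop iteration.
    static boolean collectionLevelCheck(List<String> sources) {
        return sources.toString().contains("https://ssli");
    }

    // Element-level check: each entry is tested on its own, and entries
    // without the marker fall back to the placeholder.
    static List<String> resolve(List<String> sources) {
        List<String> result = new ArrayList<>();
        for (String src : sources) {
            result.add(src.contains("https://ssli") ? src : PLACEHOLDER);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList(
                "https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG",
                ""); // a listing without an image
        System.out.println(collectionLevelCheck(sources));
        for (String link : resolve(sources)) {
            System.out.println("img-link: " + link);
        }
    }
}
```

Because the collection-level check never changes inside the loop, the else branch cannot fire for individual entries once a single matching image exists; testing each entry separately is what lets the placeholder be used.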

phonegap Android: How to clear datas displayed in label according to selection in dropdown list?

Hi, I am new to PhoneGap mobile app development. I am developing for Android, using jQuery Mobile and jQuery.
I display data in the app by parsing XML, and the parsing itself works. My problem is with a dropdown list that contains the days of the week, Sunday to Saturday. When I select a day, the app parses some data from the XML and displays it. Example: if I select Sunday, it displays Chicken Biriyani. If I then change the day to Monday, it should parse the XML and display only Mutton Biriyani, but it displays the data for both Sunday and Monday. Whenever I change the day in the dropdown, the old data is retained, and I don't know how to solve this. Here is the code I have tried:
<!DOCTYPE HTML>
<html>
<head>
<link rel="stylesheet" href="css/jquery.mobile-1.3.2.min.css"/>
<script src="js/jquery-1.9.1.min.js"></script>
<script src="js/jquery.mobile-1.3.2.min.js"></script>
</head>
<body>
<script type="text/javascript">
function report(period)
{
$(function() {
var eSelect = document.getElementById('yourSelectID');
var strUser = eSelect.options[eSelect.selectedIndex].value;
var order="http://journalonline.in/CRM/orderbooking/day?day=" + strUser;
$.ajax({
type: "POST",
url: order,
data: "{}",
cache: false,
dataType: "xml",
success: onSuccess
});
$("#resultLog").ajaxError(function(event, request, settings, exception) {
$("#resultLog").html("Error Calling: " + settings.url + "<br />HTTP Code: " + request.status);
});
function onSuccess(data)
{
$("#resultLog").append("<hr>");
$(data).find("Daydetails").each(function () {
var a=$(this).find("Productname").text();
var b=$(this).find("Quatity").text();
var c=$(this).find("Daystatus").text();
if(a!="")
{
$("#resultLog").append("<br><b> Productname</b>: " + $(this).find("Productname").text());
}
$("#resultLog").append("<br><b> </b> " + $(this).find("").text());
if(b!="")
{
$("#resultLog").append("<br> <b>Quantity</b>: " + $(this).find("Quatity").text());
}
$("#resultLog").append("<br><b> </b> " + $(this).find("").text());
if(c!=="")
{
$("#resultLog").append("<br><b> Status</b>: " + $(this).find("Daystatus").text());
}
$("#resultLog").append("<br><hr>");
});
}
});
}
</script>
<div data-role="header" data-position="fixed">
Back
<h1>Bird on Tree</h1>
Logout
</div>
<br>
<center><font color="#857240" size="5"><h3>Daywise Reports</h3></font></center>
<br>
<table>
<tr>
<td style="color:#857240">Day :</td>
<td>
<select id="yourSelectID" onchange="report(this.value)">
<option value="Select">Select</option>
<option value="Monday">Monday</option>
<option value="Tuesday">Tuesday</option>
<option value="Wednesday">Wednesday</option>
<option value="Thursday">Thursday</option>
<option value="Friday">Friday</option>
<option value="Saturday">Saturday</option>
<option value="Sunday">Sunday</option>
</select>
</td>
</tr>
</table>
<p id="resultLog"></p>
<br>
<br>
<br>
<br>
<div data-role="footer" data-position="fixed">
<center> <a style="text-decoration:none; color:#FFF;" href="http://www.jtechindia.com">J-Tech © 2014</a> </center>
</div>
</div>
</body>
</html>
Please help me solve this. Thanks in advance.
You are using the jQuery append() function to display your results. This always adds to any existing DOM elements. If you want to get rid of the previous content before adding the new content, call the empty() function:
function onSuccess(data)
{
$("#resultLog").empty();
$("#resultLog").append("<hr>");
$(data).find("Daydetails").each(function () {
...
API Doc for empty: http://api.jquery.com/empty/
