<div class="row">
<div id="content">
<div class="textData">
</div>
<div class="textData">
</div>
</div>
</div>
I want the text from the second div with class=textData. I have already parsed the div with id=content.
Here is my doInBackground:
try {
Document document = Jsoup.connect(url).get();
Elements myin = null;
myin = document.select("div.horoscopeText:eq(1)");
desc = myin.text().toString();
} catch (IOException e) {
e.printStackTrace();
}
Try this:
div#textData:eq(1)
In jsoup, :eq(n) matches elements whose zero-based sibling index is n, so :eq(1) picks an element that is the second child of its parent. By the way, you shouldn't have multiple elements with the same id; use a class for that. Check out the selector syntax documentation for more examples.
EDIT
For a class instead of an id, use div.textData:eq(1)
Related
I am trying to retrieve data from a web page; sample HTML is shown below. I want to retrieve the data and show it in a list view.
<html>
<head>
<title>Index of /abc/xyz/Female/pqr</title>
</head>
<body>
<h1>Index of /abc/xyz/Female/pqr</h1>
<ul>
<li> Parent Directory</li>
<li> 2016060500004.png</li>
<li> 2016060500011.png</li>
<li> 2016060500012.png</li>
</ul>
</body>
</html>
To select the ul, use this:
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements ul = doc.select("ul"); // select ul
Refer to the jsoup docs for details.
Example:
String html = "<html>"+
"<head>"+
"<title>Index of /abc/xyz/Female/pqr</title>"+
"</head>"+
"<body>"+
"<h1>Index of /abc/xyz/Female/pqr</h1>"+
"<ul>" +
"<li> Parent Directory</li>"+
"<li> 2016060500004.png</li>"+
"<li> 2016060500011.png</li>"+
"<li> 2016060500012.png</li>"+
"</ul>" +
"</body>"+
"</html>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("ul"); // select ul
for(Element b : links){
System.out.println(b.text());
}
I want to extract HTML code from a div element using jsoup HTML parser library.
HTML code:
<div class="entry-content">
<div class="entry-body">
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519" /></a><br /></strong></p>
<p><em>Text 2</em> </p>
</div>
</div>
Extract part:
String content = ... the content of the HTML from above
Document doc = Jsoup.parse(content);
Element el = doc.select("div.entry-body").first();
I want the result of el.html() to be the whole HTML inside the entry-body div:
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519" /></a><br /></strong></p>
<p><em>Text 2</em> </p>
but I get only the first <p> tag:
<p><strong>Text 1</strong></p>
Try this:
Elements el = doc.select("div.entry-body");
instead of this:
Element el = doc.select("div.entry-body").first();
and then:
for (Element e : el) {
    // html() only returns the string, so print it (or store it) to see the result
    System.out.println(e.html());
}
EDIT
Maybe you'll get your result if you do it this way.
I tried it and it gave the correct result.
Elements el = doc.select("a.asset-img-link");
As mentioned in the comments to the OP, I don't get it. Here is my reproduction of the problem, and it does exactly what you want:
String html = ""
+"<div class=\"entry-content\">"
+" <div class=\"entry-body\">"
+" <p><strong>Text 1</strong></p>"
+" <p><strong> <a class=\"asset-img-link\" href=\"http://example.com\" style=\"display: inline;\"><img alt=\"IMG_7519\" class=\"asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive\" src=\"http://example.com\" style=\"width: 500px;\" title=\"IMG_7519\" /></a><br /></strong></p>"
+" <p><em>Text 2</em> </p>"
+" </div>"
+"</div>"
;
Document doc = Jsoup.parse(html);
Element el = doc.select("div.entry-body").first();
System.out.println(el.html());
This results in the following output:
<p><strong>Text 1</strong></p>
<p><strong> <a class="asset-img-link" href="http://example.com" style="display: inline;"><img alt="IMG_7519" class="asset asset-image at-xid-6a00d8341c648253ef01b7c8114e72970b img-responsive" src="http://example.com" style="width: 500px;" title="IMG_7519"></a><br></strong></p>
<p><em>Text 2</em> </p>
In your case, you'd use
doc.select("div[class=entry-body]") to select that specific <div>,
according to the jsoup cookbook. Note that entry-body is the value of the class attribute, not of a name attribute.
This is my HTML:
<div class="open-statuses">
<div class="open-status" id="lifts-status-scripted">
<h3>Lifts</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="open-number">04</div>
<div class="total-number">4</div>
</div>
Details
</div>
<div class="open-status" id="trails-status-scripted">
<h3>Trails</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="open-number">12</div>
<div class="total-number">169</div>
</div>
Details
</div>
<div class="open-status open" id="road-status-scripted">
<h3>Road</h3>
<div class="status-graph">
<canvas width="177" height="177"></canvas>
<div class="status-message">Open</div>
</div>
Road Conditions
</div>
</div>
I need the text from the div with class="open-status" and id="trails-status-scripted", but I can't get it. I use the code below for the first div with no problems, but I can't do the same for the second one.
Elements div1=document.select("#mountain-report-page");
Elements div2=div1.select(".open-statuses-holder");
Elements div3=div2.select(".open-statuses");
Jliftbig = div3.select("div.open-number").first().ownText();
Any clue?
Simplify it this way:
Elements div = doc.select("div[id=mountain-report-page] div[class=open-statuses-holder] div[class=open-statuses] div[class=open-status]");
for (Element e : div){
if (e.id().equals("trails-status-scripted")){
Element ele = e.select("div[class=status-graph] div[class=open-number]").first();
String str = ele.text();
}
}
Done. I resolved it with this code:
Element div = document.select("div[id=mountain-report-page] div[class=open-statuses-holder] div[class=open-statuses] div[class=open-status] ").get(2);
String Jtrails = div.select("div.open-number").first().ownText();
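Since all IDs in an HTML document must be unique, you can simply use this selector:
Element div = document.select("#trails-status-scripted .open-number").first();
Notes:
#foo is equivalent to *[id=foo]
.foo matches any element whose class list contains foo; *[class=foo] is stricter and matches only when the whole class attribute is exactly foo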
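The difference between a class selector and an exact attribute match matters for the HTML in the question, where the road div has class="open-status open". Below is a minimal sketch of the two matching rules in plain Java. It is only a rough model of the semantics, not jsoup's implementation, and the names ClassMatch, hasClass and classEquals are made up for illustration.

```java
import java.util.Arrays;

class ClassMatch {
    // Rough model of ".foo": true when name is one of the
    // whitespace-separated entries of the class attribute.
    static boolean hasClass(String classAttr, String name) {
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    // Rough model of "[class=foo]": true only when the whole
    // attribute value is exactly the given value.
    static boolean classEquals(String classAttr, String value) {
        return classAttr.equals(value);
    }

    public static void main(String[] args) {
        // Class attribute of the road div in the question
        String road = "open-status open";
        System.out.println(hasClass(road, "open-status"));    // true
        System.out.println(classEquals(road, "open-status")); // false
    }
}
```

So a selector such as div.open-status would cover all three status divs, while div[class=open-status] would skip the road one.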
I am trying to use an if statement, but it is not doing what I want.
I am trying to get all the images from an HTML source using jsoup. Some items in the HTML don't have images, so they contain no img tags. Here is the if statement I use:
Elements imagess = doc.select("img[src$=.jpg]");
//Elements imagess = doc.select("img");
for (Element table : doc.select("div[class=listing-content]")) {
// Identify all the table row's(tr)
for (Element row : table.select("div:gt(0)")) {
HashMap<String, String> map = new HashMap<String, String>();
String[] imgg = new String[imagess.size()];
ArrayList products = new ArrayList();
for (int i = 0; i < imagess.size(); i++)
if (imagess.toString().contains("https://ssli")) {
imgg[i] = imagess.get(i).attr("src");
} else {
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
}
So while looping, if https://ssli is found, I want to extract the current image:
imgg[i] = imagess.get(i).attr("src");
Otherwise, I want to add a placeholder image URL:
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
Here is part of the HTML extracted from the page; it has items both with and without img tags:
<div class="listing-content">
<h2 class="listing-title" itemprop="name">
Faulty Xbox 36
</h2>
<p class="listing-description
hide-fully-to-m"
itemprop="description">
Turns on but tray broken so can't load games .
Sold as seen
</p>
<ul class="listing-attributes inline-list hide-fully-to-m">
</ul>
<div class="listing-location" itemscope itemtype="http://schema.org/Place">
<span class="truncate-line" itemprop="name">
Sunbury-on-Thames, Surrey
</span>
</div>
<strong class="listing-price txt-emphasis"
itemprop="price">£20</strong>
<strong class="listing-posted-date txt-normal truncate-line" itemprop="adAge">
<span class="hide-visually">Ad posted </span>
11 mins ago
</strong>
</div>
</a>
<span class="save-ad listing-save-ad"
data-savead="channel:savead-1131358978">
<span class="hide-visually">Save this ad</span>
<span class="icn-star iconu-m txt-quaternary" aria-hidden="true"></span>
</span>
</article>
</li>
<li>
<article class="listing-maxi" itemscope itemtype="http://schema.org/Product" data-q=ad-1131358703>
<a class="listing-link" href="/p/video-games/xbox-360-cod-/1131358703" itemprop="url">
<div class="listing-side">
<div class="listing-thumbnail ">
<img src="" data-lazy="https://ssli.ebayimg.com/00/s/ODAwWDYwMA==/z/uFgAAOSwMmBV4eSL/$_26.JPG"
alt="" itemprop="image"
class="hide-fully-no-js"/>
<noscript>
<img src="https://ssli.ebayimg.com/00/s/ODAwWDYwMA==/z/uFgAAOSwMmBV4eSL/$_26.JPG" alt="" itemprop="image"/>
</noscript>
</div>
<div class="listing-meta">
<ul class="inline-list txt-center">
<li>1<span class="hide-visually"> images</span>
<span class="icn-camera txt-quaternary" aria-hidden="true"></span>
</li>
</ul>
</div>
</div>
<div class="listing-content">
<h2 class="listing-title" itemprop="name">
Xbox 360 cod
</h2>
<p class="listing-description truncate-paragraph
hide-fully-to-m"
itemprop="description">
Call of duty advanced warfare £12
Call of duty modern warfare 3 £5
Black ops 2 SOLD
Both for £15
No offers
</p>
<ul class="listing-attributes inline-list hide-fully-to-m">
</ul>
<div class="listing-location" itemscope itemtype="http://schema.org/Place">
<span class="truncate-line" itemprop="name">
Norwich, Norfolk
</span>
</div>
<strong class="listing-price txt-emphasis"
itemprop="price">£1</strong>
<strong class="listing-posted-date txt-normal truncate-line" itemprop="adAge">
<span class="hide-visually">Ad posted </span>
13 mins ago
</strong>
</div>
</a>
<span class="save-ad listing-save-ad"
data-savead="channel:savead-1131358703">
<span class="hide-visually">Save this ad</span>
<span class="icn-star iconu-m txt-quaternary" aria-hidden="true"></span>
</span>
</article>
</li>
<li>
<article class="listing-maxi" itemscope itemtype="http://schema.org/Product" data-q=ad-1131358320>
<a class="listing-link" href="/p/xbox-one/xbox-one-w-kinect-5-games-forza-horizon-2-incl.-blu-ray-2-controllers-2-charger-cables-1-mic/1131358320" itemprop="url">
<div class="listing-side">
<div class="listing-thumbnail ">
<img src="" data-lazy="https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG"
alt="" itemprop="image"
class="hide-fully-no-js"/>
<noscript>
<img src="https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG" alt="" itemprop="image"/>
</noscript>
</div>
<div class="listing-meta">
<ul class="inline-list txt-center">
<li>8<span class="hide-visually"> images</span>
<span class="icn-camera txt-quaternary" aria-hidden="true"></span>
</li>
</ul>
</div>
</div>
As you can see, items with images have a listing-thumbnail div, and the ones without images don't.
Also, it seems that with
if (imagess.get(i).toString().contains("https://ssli")){
imgg[i] = imagess.get(i).attr("src");
} else {
imgg[i] = "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";
}
the code in the else branch doesn't fire, and I'm not sure why. It prints the results like this, both when an image is found and when it is not:
for (int j = 0; j < hrefElements.size(); j++) {
System.out.println("title: " + titlee[j]);
System.out.println("description: " + description[j]);
System.out.println("distance: " + distance[j]);
System.out.println("posted: " + posted[j]);
System.out.println("price: " + pricee[j]);
System.out.println("meta: " + listingmeta[j]);
System.out.println("link: " + linkss[j]);
System.out.println("img-link: " + imgg[j]);
}
return products;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
When an image is found, the result looks like this:
System.out.println("img-link: " + "https://ssli.ebayimg.com/00/s/OTYwWDk2MA==/z/n4AAAOSwLVZV4eQ1/$_26.JPG");
and when it is not found:
System.out.println("img-link: " + "");
The value is blank. Instead of it being blank, I want it to be the custom link from the else branch.
imagess appears to be a collection of some kind. Instead of stringifying the collection:
if (imagess.toString().contains("https://ssli"))
you probably wanted to examine an element of the collection:
if (imagess.get(i).toString().contains("https://ssli"))
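To make the difference concrete, here is a minimal sketch in plain Java, with a List<String> standing in for the src values that jsoup would return. The class name ImageFallback, the method pickImages and the sample URLs are assumptions made up for illustration; only the placeholder URL comes from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImageFallback {
    // Placeholder URL taken from the question.
    static final String NO_IMAGE =
        "https://afs.googleusercontent.com/gumtree-com/noimage_thumbnail_120x92_v2.png";

    // Checks each entry on its own instead of stringifying the whole list.
    static List<String> pickImages(List<String> srcs) {
        List<String> result = new ArrayList<>();
        for (String src : srcs) {
            if (src != null && src.contains("https://ssli")) {
                result.add(src);
            } else {
                result.add(NO_IMAGE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> srcs = Arrays.asList(
            "https://ssli.ebayimg.com/example/a.JPG", // hypothetical URL
            "",                                        // entry with an empty src
            "https://ssli.ebayimg.com/example/b.JPG"); // hypothetical URL
        // The whole-list check is true as soon as any entry matches,
        // so it cannot tell the individual entries apart.
        System.out.println(srcs.toString().contains("https://ssli"));
        // The per-entry check replaces only the empty entry.
        System.out.println(pickImages(srcs));
    }
}
```

With the whole-collection check, the condition has the same value on every iteration, which is why the else branch never runs once a single matching image exists.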
How can I extract the full names from this sample HTML code?
I only want to get the following.
Full name1
Full name2
Full name3
<div class="readerP">
<p><a href="link1_english.html" title="Complete" >Full name1</a><br>[ other info ]</br> </p>
</div>
<div class="readerP">
<p><a href="link2_english.html" title="Complete" >Full name2</a><br>[ other info ]</br> </p>
</div>
<div class="readerP">
<p><a href="link1_english.html" title="Complete" >Full name3</a><br>[ other info ]</br> </p>
</div>
I am using this code, but it looks at all the 'a' tags in the page, so I get extra info like:
Home Page
About
Contact
Full name1
Full name2
Full name3
and so on ...
try {
doc = Jsoup.connect("http://www.somesite.com").get();
Elements links = doc.getElementsByTag("a");
for (Element el : links) {
linkText = el.ownText();
arr_linkText.add(linkText);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
How can I look at the 'div' tag and if class="readerP" look at the 'a' tags inside the 'div'?
How can I look at the 'div' tag and if class="readerP" look at the 'a'
tags inside the 'div'?
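Use the appropriate selector instead of just searching by tag. Note that there is no space between div and .readerP; with a space, the selector would instead look for a .readerP element nested inside some other div.
Elements links = doc.select("div.readerP a");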
Read more about selectors in the Jsoup documentation.