I tried the approach from the question "jSoup to check if a span class exists", but I want to get the image URL from the following HTML through Jsoup.
<ul class="a-unordered-list a-nostyle a-button-list a-vertical a-spacing-top-micro">
<li class="a-spacing-small item"><span class="a-list-item">
<span class="a-declarative" data-action="thumb-action" data-thumb-action="{&quot;variant&quot;:&quot;MAIN&quot;,&quot;index&quot;:&quot;0&quot;}">
<span class="a-button a-button-selected a-button-thumbnail a-button-toggle a-button-focus"><span class="a-button-inner"><input class="a-button-input" type="submit"><span class="a-button-text" aria-hidden="true">
<img alt="" src="https://images-na.ssl-images-amazon.com/images/I/31XcCgGBePL._SS40_.jpg">
</span></span></span>
</span>
</span></li>
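Try this, then read the src attribute of the matched image:
Elements images = doc.select("li span img");
String imageUrl = images.first().attr("src"); // src of the first matching <img>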
Using Jsoup
I want to print the text 기숙사 ("dormitory") from the span with class txt.
<ul class="">
<li class="top">
<span class="num">01</span>
<span class="txt">
<button onclick="frm_search.searchKeyword.value=$(this).text();frm_search.submit();">기숙사</button></span>
</li>
<li class="top">
<span class="num">02</span><span class="txt">
<button onclick="frm_search.searchKeyword.value=$(this).text();frm_search.submit();">졸업증명서</button></span>
</li>
</ul>
What should go inside the double quotes in doc.select("")?
I guess that it should be: span:containsOwn(01) + span button
Element button = document.selectFirst("span:containsOwn(01) + span button");
String text = button.text();
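See this try.jsoup example to test the selector and experiment with other options.
This selector finds the span whose own text contains "01", then the span immediately following it as a sibling (the + combinator), and then the button element inside that span (the space is a descendant combinator, so the button does not have to be a direct child).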
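Jsoup is a third-party library, so as a dependency-free illustration of the same navigation, here is a rough XPath translation of span:containsOwn(01) + span button, run with the JDK's javax.xml.xpath. This is a swapped-in technique, not what the answer uses, and it assumes the markup is well-formed XML. The HTML string is an abridged copy of the question's list with the onclick handlers dropped, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical demo class: an XPath analogue of the Jsoup selector "span:containsOwn(01) + span button".
final class SiblingXPathDemo {
    // Abridged copy of the question's list (onclick handlers dropped, Korean text written as escapes).
    static final String HTML =
              "<ul class=\"\">"
            + "<li class=\"top\"><span class=\"num\">01</span>"
            + "<span class=\"txt\"><button>\uAE30\uC219\uC0AC</button></span></li>"
            + "<li class=\"top\"><span class=\"num\">02</span>"
            + "<span class=\"txt\"><button>\uC878\uC5C5\uC99D\uBA85\uC11C</button></span></li>"
            + "</ul>";

    /** Text of the button inside the span right after the span whose own text contains {@code number}, or null. */
    static String buttonAfter(String number) {
        // span[contains(text(), ...)]          ~ span:containsOwn(...) (checks the span's first own text node)
        // /following-sibling::*[1][self::span] ~ "+ span" (the very next element sibling must be a span)
        // //button                             ~ " button" (any descendant, not only a direct child)
        String expr = "//span[contains(text(), '" + number + "')]"
                + "/following-sibling::*[1][self::span]//button";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(HTML.getBytes(StandardCharsets.UTF_8)));
            Node button = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
            return button == null ? null : button.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the snippet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buttonAfter("01")); // the dormitory entry
    }
}
```

Unlike Jsoup, the JDK parser rejects markup that is not well-formed, which is why this only works on a tidy snippet like the one above.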
<div class="mdl-layout--large-screen-only mdl-layout__tab-bar mdl-js-ripple-effect mdl-color--primary-dark ">
Aboutus
Technology
Outsourcing
Training
Techblog
Careers
Contactus
<button class="mdl-button mdl-js-button mdl-button--fab mdl-js-ripple-effect mdl-button--colored mdl-shadow--4dp mdl-color--accent" id="add">
<i class="material-icons" role="presentation">add</i>
<span class="visuallyhidden">Add</span>
</button>
</div>
</header>
<div class="mdl-layout__drawer">
<span class="mdl-layout-title">MaterialDesignLite</span>
<nav class="mdl-navigation">
Overview
Features
Details
Technology
FAQ
</nav>
</div>
Question: the mdl-layout__tab-bar tabs are linked to section ids (for example #Aboutus, #Technology), whereas the mdl-layout__drawer seems to need a separate .html file for each page; it does not work with section ids (#Aboutus, #Technology). Is there any way to make the drawer work with section ids instead of creating a separate HTML file for each section?
<div class="mdl-layout__tab-panel mdl-layout__drawer" id="Techblog">
<section>
<div>
<div>
<div class="techblogimage">
<img class="article-image" src="images/techblog/3.png" border="0" alt="">
</div>
</div>
</div>
</section>
</div>
Finally I found a solution: add the mdl-layout__drawer class to the section block, as in the markup above.
Hopefully this helps others who get stuck on the same problem.
I have the following HTML text:
<html>
<head></head>
<body>
<div id="carousel-generic" class="banner-erbj carousel slide" data-ride="carousel"> <ul class="carousel-indicators"> <li class="active" data-target="#carousel-generic" data-slide-to="0">0</li> <li data-target="#carousel-generic" data-slide-to="1">0</li> </ul> <div class="carousel-inner"> <div class="item active"><img src="imagesrcpath" alt="" /></div> <div class="item"><img src="imagesrcpath" alt="" /></div> </div> </div>
</body>
</html>
I want to extract the image link from it and display it in a WebView.
I have tried Jsoup and some solutions from other Stack Overflow questions, but I have not been able to find one that works. Please help.
After a lot more digging, I found the solution:
// Html.fromHtml() parses the HTML string into a Spanned; toString() returns its text content
Spanned parsed = Html.fromHtml(text);
// Wrap the result in a minimal HTML page for the WebView
String finalstr = ("<html><body>").concat(parsed.toString()).concat("</body></html>");
mWebView.setInitialScale(30);
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.getSettings().setLoadWithOverviewMode(true);
/*mWebView.getSettings().setLayoutAlgorithm(WebSettings.LayoutAlgorithm.SINGLE_COLUMN);*/
mWebView.getSettings().setUseWideViewPort(true);
mWebView.loadData(finalstr, "text/html", "UTF-8");
This whole set of statements did what I needed.
Use a regex:
<img.+?/>
will find image tags written in self-closing form (<img ... />).
Learn XPath (http://www.w3schools.com/xml/xpath_intro.asp) and apply it with HtmlCleaner (http://htmlcleaner.sourceforge.net/) to extract whatever you need from the HTML.
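The XPath suggestion can be sketched without HtmlCleaner by using the JDK's built-in XML parser and javax.xml.xpath, under one big assumption: the input must already be well-formed XML. The carousel markup from the question happens to be, but most real-world HTML is not, which is exactly the gap a cleaner such as HtmlCleaner fills. This does not show HtmlCleaner's own API, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: collect every <img> src with the XPath expression //img/@src.
final class ImageSrcXPath {
    /** Returns the src value of every img element in a well-formed (X)HTML string. */
    static List<String> imageSources(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//img/@src", doc, XPathConstants.NODESET);
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                sources.add(attrs.item(i).getNodeValue()); // the attribute node's value is the URL
            }
            return sources;
        } catch (Exception e) {
            // The DOM parser throws (SAXException) on markup that is not well-formed XML
            throw new IllegalArgumentException("could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        // Abridged carousel markup from the question (indicator list omitted)
        String html = "<div id=\"carousel-generic\" class=\"banner-erbj carousel slide\" data-ride=\"carousel\">"
                + "<div class=\"carousel-inner\">"
                + "<div class=\"item active\"><img src=\"imagesrcpath\" alt=\"\" /></div>"
                + "<div class=\"item\"><img src=\"imagesrcpath\" alt=\"\" /></div>"
                + "</div></div>";
        System.out.println(imageSources(html)); // [imagesrcpath, imagesrcpath]
    }
}
```

Feeding it ordinary HTML with an unclosed <img> makes the parser fail, so in practice you would run the page through a cleaner first and query the cleaned result.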
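You can use a regular expression here.
Pattern pattern = Pattern.compile("src=\"(.*?)\""); // Java patterns take no /.../ delimiters
Matcher matcher = pattern.matcher(YOURHTML);
while (matcher.find()) { // find() must succeed before group() can be read
    String src = matcher.group(1); // lazy (.*?) stops at the first closing quote
}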
Disclaimer: the regex above is not tested.
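As a sketch of a slightly more robust version of the regex approach: it uses no /.../ delimiters, restricts the match to the inside of an <img ...> tag, captures the value with a negated class instead of a greedy (.*) that would run to the last quote on the line, and calls find() in a loop, because group() throws IllegalStateException until a match has been found. It still only handles double-quoted src attributes and would also pick up attributes such as data-src, so it suits quick extraction from known markup rather than general HTML parsing. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull src values out of <img> tags with a regular expression.
final class ImgSrcRegex {
    // <img\b     : the tag name, case-insensitive
    // [^>]*?     : other attributes, never crossing the end of the tag
    // \bsrc\s*=\s*"([^"]*)" : the double-quoted src value, captured as group 1
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {            // group() is only valid after a successful find()
            sources.add(matcher.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<li><img alt=\"\" src=\"https://images-na.ssl-images-amazon.com/images/I/31XcCgGBePL._SS40_.jpg\"></li>";
        System.out.println(imageSources(html));
    }
}
```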
I have the following element in a webpage:
<div id="pnNij" class="post" data-tag1="" data-tag2="">
<a class="image-list-link" href="http://imgur.com/gallery/pnNij" data-page="0">
<img alt="" src="./Imgur_ The most awesome images on the Internet_files/H7fZCNgb.jpg">
<div class="point-info gradient-transparent-black transition">
<div class="relative">
<div class="pa-bottom">
<div class="arrows">
<div title="like" class="pointer arrow-up icon-upvote-outline" data="pnNij" type="image" data-up="4212"></div>
<div title="dislike" class="pointer arrow-down icon-downvote-outline" data="pnNij" type="image" data-downs="502"></div>
<div class="clear"></div>
</div>
<div class="point-info-points" title="points">
<span class="points-pnNij">3,710</span>
<span class="points-text-pnNij">points</span>
</div>
</div>
</div>
</div>
</a>
<div class="hover">
<p>Seems like 2017 has it all...</p>
<div class="post-info">
album · 69,542 views
</div>
</div>
</div>
Notice that the href is http://imgur.com/gallery/pnNij.
However, when I use JSoup to extract elements from the page like this:
Document docImgur = Jsoup.connect("http://imgur.com/").get();
Elements links = docImgur.getElementsByClass("post");
The element is extracted almost correctly, except that the href attribute is only /gallery/pnNij/ instead of the full URL.
Why does the href attribute not contain the full URL?
When you check the page source, you'll find
<a class="image-list-link" href="/gallery/WRzti" data-page="0">
...
</a>
So the href attribute is relative, not absolute, which explains the result you are getting: /gallery/WRzti
Solution
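Use the abs: attribute prefix. It tells Jsoup to resolve the attribute value against the document's base URI (for a page fetched with Jsoup.connect(), the URL it was fetched from) and return an absolute URL.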
Example
Document docImgur = Jsoup.connect("http://imgur.com/").get();
Elements links = docImgur.select("a[href].image-list-link");
for (Element element : links) {
System.out.println(element.attr("abs:href"));
}
Output
http://imgur.com/gallery/WRzti
http://imgur.com/gallery/tCnDJ
http://imgur.com/gallery/JIHYh
...
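The abs: prefix works by resolving the relative href against the page's URL. As a JDK-only sketch of that kind of resolution (not Jsoup's own code; the helper name is hypothetical), java.net.URI.resolve applies the standard relative-reference rules: a root-relative path replaces the whole path, ../ segments are normalized away, and an already absolute URL is returned unchanged.

```java
import java.net.URI;

// Hypothetical helper: turn a possibly relative href into an absolute URL, similar in spirit to abs:href.
final class AbsoluteHref {
    static String absolute(String pageUrl, String href) {
        // resolve() merges the href with the base URL's path and normalizes the result
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(absolute("http://imgur.com/", "/gallery/WRzti")); // http://imgur.com/gallery/WRzti
    }
}
```

One caveat of this JDK approach: keep a trailing slash on a bare host URL (http://imgur.com/ rather than http://imgur.com), since java.net.URI resolves relative paths against an empty base path without inserting the slash.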
<div class="item all clearfix">
<div class="pic all">
<a href="/terrace.jpg"/></a>
</div>
<div class="text all">
<a href="/link/test" class="title">
Some random title
</a>
<strong>STRONG TEXT</strong>
Some random subtitle…
</div>
</div>
I'm trying to get the text from the strong (bold) tag only, but whenever I parse it, I get the complete text of the block. I also tried removing the a element, but the text below it, "Some random subtitle…", was still left over.
I tried:
Element strong_title = el.select("strong").first();
and then reading strong_title.text().
Never mind, it was a mistake elsewhere in my project's code.
The following code works just fine:
document.select("strong");