I have searched the forum but I still don't understand where I am going wrong. Basically, I am trying to scrape the text "Sun, 04 Feb 2018". Any help on which concept I have misunderstood would be appreciated; my method keeps returning null.
HTML from the .aspx page:
<div class="divLatestDraws slider" data-min-width="282">
<div class="slide-wrapper four-d">
<ul class="slide-container ulDraws" style="width: 1808px; margin-left: 0px;">
<li style="width: 301.33px;"><div class="tables-wrap">
<table class="table table-striped orange-header">
<thead>
<tr>
<th class="drawDate">Sun, 04 Feb 2018</th>
My jsoup code:
try {
// Connect to the web site
Document document = Jsoup.connect(url).get();
// Using Elements to get the Meta data
Elements xxx = document.select("div[class=divLatestDraws slider]");
Elements zzz = xxx.select("th[class=drawDate]");
desc=zzz.body().text();
} catch (IOException e) {
e.printStackTrace();
}
return null;
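One possible reading of the snippet above, offered as a sketch rather than a confirmed diagnosis: `Elements` has no `body()` method (that method belongs to `Document`), and the method ends with an unconditional `return null`, so the extracted text is never handed back. The example below parses a trimmed copy of the posted markup from a string instead of fetching the page. The class name `DrawDateExample` and the method `extractDrawDate` are hypothetical. If the live page fills the table in with JavaScript, jsoup will not see it and the method returns null.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DrawDateExample {

    // Returns the text of the first th.drawDate inside div.divLatestDraws,
    // or null when the parsed HTML contains no such element.
    static String extractDrawDate(String html) {
        Document document = Jsoup.parse(html);
        // Class selectors test each class name on its own, so extra classes
        // on the div (such as "slider") do not affect the match.
        Element th = document.select("div.divLatestDraws th.drawDate").first();
        return th == null ? null : th.text();
    }

    public static void main(String[] args) {
        // Trimmed copy of the markup from the question (assumption: the
        // live page delivers the same structure to jsoup).
        String html = "<div class=\"divLatestDraws slider\">"
                + "<table class=\"table table-striped orange-header\"><thead><tr>"
                + "<th class=\"drawDate\">Sun, 04 Feb 2018</th>"
                + "</tr></thead></table></div>";
        System.out.println(extractDrawDate(html));
    }
}
```

Returning the extracted string (instead of `null`) is the part that matters for the caller; the null check only guards against the element being absent.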
Related
I am trying to extract the value of "Prev. Close" from finance.yahoo.com/q?s=[Symbol].
Here is what the HTML looks like:
<div class="yui-u first yfi-start-content">
<div class="yfi_quote_summary">
<div id="yfi_quote_summary_data" class="rtq_table">
<table id="table1">
<tbody>
<tr>
<th scope="row" width="48%">Prev Close:</th>
<td class="yfnc_tabledata1">208.25</td>
</tr>
<tr>
<th scope="row" width="48%">Open:</th>
<td class="yfnc_tabledata1">211.00</td>
</tr>
<tr>
<th scope="row" width="48%">Bid:</th>
<td class="yfnc_tabledata1">N/A</td>
</tr>
</tbody>
</table>
</div>
</div>
Here's how I tried to extract the required data.
Document doc = Jsoup.connect("http://finance.yahoo.com/q?s=goog").get();
Elements e = doc.select("td.yfnc_tabledata1");
String close = e.get(0).text();
However, this gives an IndexOutOfBoundsException saying that the size of the ArrayList is 0 and hence e can't return an element.
What am I doing wrong?
Before accessing the Elements, ensure the collection is not empty. This way, you can avoid the IndexOutOfBoundsException. Also, as #Hasanaga mentioned, you should set the userAgent and referrer headers.
Document doc = Jsoup
.connect("http://finance.yahoo.com/q?s=goog") //
.userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6") //
.referrer("http://finance.yahoo.com") //
.get();
Elements e = doc.select("td.yfnc_tabledata1");
if (e.isEmpty()) {
throw new RuntimeException("Unable to locate table cell.");
}
String close = e.get(0).text();
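As a complement to the emptiness check, the sketch below looks the value up by its row label rather than by position, using the markup quoted in the question. It parses a string so it runs without network access. `PrevCloseExample` and `findValue` are hypothetical names, and whether the live page still serves this markup to jsoup is not something the thread establishes.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class PrevCloseExample {

    // Finds the <td> that directly follows a <th> whose text contains the
    // given label. Returns Optional.empty() instead of throwing when
    // nothing matches.
    static Optional<String> findValue(String html, String label) {
        Document doc = Jsoup.parse(html);
        Elements cells = doc.select("th:contains(" + label + ") + td");
        if (cells.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(cells.first().text());
    }

    public static void main(String[] args) {
        // Shortened copy of the table from the question.
        String html = "<table id=\"table1\"><tbody>"
                + "<tr><th scope=\"row\">Prev Close:</th>"
                + "<td class=\"yfnc_tabledata1\">208.25</td></tr>"
                + "<tr><th scope=\"row\">Open:</th>"
                + "<td class=\"yfnc_tabledata1\">211.00</td></tr>"
                + "</tbody></table>";
        System.out.println(findValue(html, "Prev Close").orElse("not found"));
        System.out.println(findValue(html, "Volume").orElse("not found"));
    }
}
```

Matching on the label keeps the lookup working if rows are reordered, at the cost of depending on the label text.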
I'm trying to learn jsoup for Android and I'm having a hard time learning the selectors. I've already set up the application with simple buttons and TextViews that can retrieve basic info, e.g. the title. Now I'm trying to get the text marked "TEXT I NEED TO PARSE" below. I've tried multiple times and cannot get the syntax right.
<li class="info info info">
<script>clicked = false</script>
<div class="simple">
<p class="name">TEXT I NEED TO PARSE </p>
<ul class="Type">
<li>Normal</li> </ul>
<p class="address">120 Hollywood Blvd.</p>
</div>
<div class="sortables">
<p class="inches"></p>
</div>
<div class="action_links">
</div>
Document doc = null;
try {
doc = Jsoup.connect("http://example.com/index.html").get();
} catch (IOException e) {
// TODO Throws exception
}
Element simple = doc.getElementsByClass("simple").first();
Element p = simple.getElementsByClass("name").first();
Element a = p.select("a").first();
String text = a.text();
System.out.println(text);
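In the fragment as posted, `p.name` holds the text directly and contains no `<a>`, so `p.select("a").first()` returns null and `a.text()` then throws a NullPointerException. A minimal sketch that reads the paragraph itself, assuming the live page matches the fragment; `NameTextExample` and `extractName` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NameTextExample {

    // Returns the text of the first <p class="name"> that is a direct
    // child of <div class="simple">, or null if there is none.
    static String extractName(String html) {
        Document doc = Jsoup.parse(html);
        Element p = doc.select("div.simple > p.name").first();
        return p == null ? null : p.text();
    }

    public static void main(String[] args) {
        // Shortened copy of the fragment from the question.
        String html = "<li class=\"info\"><div class=\"simple\">"
                + "<p class=\"name\">TEXT I NEED TO PARSE </p>"
                + "<p class=\"address\">120 Hollywood Blvd.</p>"
                + "</div></li>";
        System.out.println(extractName(html));
    }
}
```

`Element.text()` trims surrounding whitespace, so the trailing space inside the paragraph does not appear in the result.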
<div class='ym-gbox adds-header'>
<a href='javascript:(void);' >
<a href="http://epaper.thedailystar.net/" target="_blank">
<img src="http://epaper.thedailystar.net/images/edailystar.png" alt="edailystar" style="float: left; width: 100px; margin-top: 15px;">
</a>
<a href="http://www.banglalink.com.bd/celebrating10years" target="_blank" style="display:block;float: right;">
<img width="490" height="60" src="http://bd.thedailystar.net/upload/ads/2015/02/12/BD-News_490x60.gif" alt="banglalink" >
</a>
</a>
</div>
This is the relevant HTML. From it I want to extract the source of the image tag with src="http://epaper.thedailystar.net/images/edailystar.png" using jsoup on Android, but my attempt failed. I would be thankful for an answer.
Here is my code:
Document document = Jsoup.connect(url).get();
Elements img = document.select("div[class=ym-gbox adds-header]").first().select("a[href=http://epaper.thedailystar.net/] > img[src]");
String imgSrc = img.attr("src");
Since you didn't mention the URL, I assume it is http://epaper.thedailystar.net/index.php
Document doc = Jsoup.connect("http://epaper.thedailystar.net/index.php").timeout(10*1000).get();
Elements div = doc.select("div.logo");
Elements get = div.select("img");
System.out.println(get.attr("abs:src"));
Output:
http://epaper.thedailystar.net/images/edailystar.png
You have to iterate through elements to choose the element that suits your needs. Like so:
Elements elements = document.getElementsByTag("img");
for (Element element : elements) {
if (element.attr("src").endsWith("png")) {
System.out.println(element.attr("src"));
}
}
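The loop above can also be expressed in the selector itself: jsoup supports `[attr$=suffix]` for attribute values that end with a given string. The sketch below runs against a simplified copy of the markup from the question (the outer wrapping `<a>` is dropped); `PngImageExample` and `firstPngSrc` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PngImageExample {

    // Returns the absolute src of the first <img> whose src ends in ".png",
    // or null when the document has no such image.
    static String firstPngSrc(String html, String baseUri) {
        // The base URI lets "abs:src" resolve relative paths.
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.select("img[src$=.png]").first();
        return img == null ? null : img.attr("abs:src");
    }

    public static void main(String[] args) {
        String html = "<div class='ym-gbox adds-header'>"
                + "<a href=\"http://epaper.thedailystar.net/\" target=\"_blank\">"
                + "<img src=\"http://epaper.thedailystar.net/images/edailystar.png\" alt=\"edailystar\">"
                + "</a>"
                + "<a href=\"http://www.banglalink.com.bd/celebrating10years\" target=\"_blank\">"
                + "<img src=\"http://bd.thedailystar.net/upload/ads/2015/02/12/BD-News_490x60.gif\" alt=\"banglalink\">"
                + "</a></div>";
        System.out.println(firstPngSrc(html, "http://epaper.thedailystar.net/"));
    }
}
```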
I have this HTML:
<ul class="programList">
<li class="showing">
<div class="filterTime"></div>
<div class="filterGenre"></div>
<div class="outerSmallPoster">
<span class="posterBanner"></span>
<a href="/filmdatabase/06-juni/22-jump-street/">
<img src="/fileshare/filarkivroot/AuroraKino/Filmer/06%20-%20juni/22%20Jump%20Street/kynoefo11.jpg?width=160" class="smallPoster" /></a>
</div>
<div class="movieDescr">
<a class="movieTitle" href="/filmdatabase/06-juni/22-jump-street/">22 Jump Street</a>
<p class="movieDescription">Channing Tatum og Jonah Hill er tilbake i rollene som radarparet Jenko og Schmidt i oppfølgeren til...</p>
</div>
<div class="outerMovieDescription">
<div class="outerProgramTicketSale">
<button type="button" data-frames="underholdning" data-hour="21" href="http://91.207.226.164/ticketweb.php?sign=2&UserCenterID=100007&PaymentType=000&ShowID=484797&PaymentTypeSelection=&ErrorCode=0" target="_blank" onclick="openTicket(100007,484797);return false;" data-usercenterid="100007" data-showid="484797" class="programTime">
21:15
</button>
<span class="theater">Sal 6</span>
</div>
</div>
</li>
</ul>
This HTML is served at a URL which I am trying to parse from an Android app. To do so, I have written the following code:
protected Hashtable<String, Elements> doInBackground(Void... params) {
// Get all the movies from aurorakino and return a list of them
// to the postexecute method.
MainActivity.out("Bakgrunn");
Hashtable<String, Elements> map = new Hashtable<String, Elements>();
Document doc;
try {
doc = (Document) Jsoup.connect("http://fokus.aurorakino.no/billetter-og-program/").get();
Elements title = doc.select("a.movieTitle");
Elements desc = doc.select("p.movieDescription");
Elements image = doc.select("img.smallPoster");
MainActivity.out(title.size());
MainActivity.out("Vi er inne i try");
map.put("title", title);
map.put("desc", desc);
map.put("image", image);
return map;
}
catch (IOException e) {
// TODO Auto-generated catch block
MainActivity.out("Noe gikk galt");
}
return null;
}
This code is in an AsyncTask, and the task is running. I print some debug info at each stage of the AsyncTask so that I can see whether it is actually running, and it is.
My problem is that even though the HTML page has several links like this:
<a class="movieTitle" ...>movie title</a>
the code doesn't manage to find any of them. I print the size of the Elements and it is zero.
When I remove the class from the selector, it finds 73 links.
What am I doing wrong?
Many thanks
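Parsed locally, the fragment in the question does match `a.movieTitle`. That suggests (as an assumption, not something the thread confirms) that the HTML jsoup downloads differs from what the browser shows, for example because class attributes are added by JavaScript or the server varies its response. Below is a small diagnostic sketch for comparing selector counts on whatever HTML was actually received; `SelectorCountExample` and `countMatches` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorCountExample {

    // Counts how many elements in the given HTML match the CSS query.
    static int countMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Shortened copy of the list item from the question; in the app,
        // pass the HTML string that Jsoup.connect(...) actually returned.
        String html = "<ul class=\"programList\"><li class=\"showing\">"
                + "<div class=\"outerSmallPoster\">"
                + "<a href=\"/filmdatabase/06-juni/22-jump-street/\">"
                + "<img src=\"kynoefo11.jpg\" class=\"smallPoster\" /></a></div>"
                + "<div class=\"movieDescr\">"
                + "<a class=\"movieTitle\" href=\"/filmdatabase/06-juni/22-jump-street/\">22 Jump Street</a>"
                + "<p class=\"movieDescription\">Channing Tatum og Jonah Hill ...</p>"
                + "</div></li></ul>";
        // Compare a class-based query with one that only relies on the href.
        System.out.println(countMatches(html, "a.movieTitle"));
        System.out.println(countMatches(html, "a[href^=/filmdatabase/]"));
    }
}
```

If the class-based count is zero on the downloaded HTML while the href-based count is not, logging `doc.html()` and checking how the anchors are marked up there is a reasonable next step.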
I want to parse the values for <dt>Seeders:</dt> and <dt>Leechers:</dt> from an HTML page using jsoup.
The full HTML is below.
<div id="details">
<dl class="col1">
<dt>Type:</dt>
<dd>Audio > Music</dd>
<dt>Files:</dt>
<dd><a href="/torrent/8682317/" title="Files" onclick="
if (filelist < 1) {
new Ajax.Updater('filelistContainer', '/ajax_details_filelist.php', {method: 'get', parameters: 'id=8682317'});
filelist=1;
}; toggleFilelist(); return false;">28</a></dd>
<dt>Size:</dt>
<dd>222.65 MiB (233468815 Bytes)</dd>
<br />
<dt>Tag(s):</dt>
<dd>markus schulz dakota things trance armada 2011 inspiron </dd>
<br />
<dt>Uploaded:</dt>
<dd>2013-07-13 15:30:25 GMT</dd>
<dt>By:</dt>
<dd>
-inspiron- <img src="/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border='0' /></dd>
<br />
<dt>Seeders:</dt>
<dd>16</dd>
<dt>Leechers:</dt>
<dd>1</dd>
<dt>Comments</dt>
<dd><span id="NumComments">0</span>
</dd>
<br />
<dt>Info Hash:</dt><dd> </dd>
01DD6B7325C3DB5F0DF5BBE510FD3FD9738D1C88 </dl>
<div class="torpicture">
<img src="//image.bayimg.com/345b5b11734bb9973863359cc52929f3ddc45205.jpg" title="picture" alt="picture" />
</div>
<dl class="col2">
</dl>
<div id="CommentDiv" style="display:none;">
<form method="post" id="commentsform" name="commentsform" onsubmit="new Ajax.Updater('NumComments', '/ajax_post_comment.php', {evalScripts:true, asynchronous:true, parameters:Form.serialize(this)}); return false;" action="/ajax_post_comment.php">
<p class="info">
<textarea name="add_comment" id="add_comment" rows="8" cols="50"></textarea><br/>
<input type="hidden" name="id" value="8682317"/>
<input type="submit" value="Submit" /><input type="button" value="Hide" onclick="document.getElementById('CommentDiv').style.display = 'none'" />
</p>
</form>
</div>
<br/>
<br/>
<div id="social">
</div>
<iframe src="http://cdn1.adexprt.com/dl/dl.php?b=bar&r=75&n=Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29&m=magnet%3A%3Fxt%3Durn%3Abtih%3A01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88%26dn%3DMarkus%2BSchulz%2B-%2BGlobal%2BDJ%2BBroadcast%2B%25282013-07-11%2529%2B%2528Inspiron%2529%26tr%3Dudp%253A%252F%252Ftracker.openbittorrent.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.publicbt.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.istole.it%253A6969%26tr%3Dudp%253A%252F%252Ftracker.ccc.de%253A80%26tr%3Dudp%253A%252F%252Fopen.demonii.com%253A1337" width="622" height="51" frameborder="0" scrolling="no"></iframe>
<br /><br /> <div class="download">
<a style='background-image: url("/static/img/icons/icon-magnet.gif");' href="magnet:?xt=urn:btih:01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88&dn=Markus+Schulz+-+Global+DJ+Broadcast+%282013-07-11%29+%28Inspiron%29&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337" title="Get this torrent"> Get this torrent</a>
<a style='background-image: url("/static/img/icon-https.gif");' href="http://adexprt.me/get/Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29?tag=bal" title="Anonymous Download"> Anonymous Download</a>
</div>
<div>(Problems with magnets links are fixed by upgrading your torrent client!)</div>
<div class="nfo">
<pre>=======================================================
Site: http://www.inspirontrance.com/
=======================================================
=======================================================
F B Page: Inspiron Trance
=======================================================
=======================================================
TWITTER : inspiron22
=======================================================
Markus Schulz
01. Mobil - One Morning (Aleksey Sladkov Remix)
02. Store N Forward - Nuts
03. Alter Future vs. Holbrook & SkyKeeper - Megapolis
04. Danilo Ercole - Cruzer
05. Aaron Camz - Emission
06. Markus Schulz Featuring Sarah Howells - Tempted
07. M.I.K.E. Presents Caromax - Inner Thoughts
08. Ruffault - Progressive Dream
09. Styller - What We Left Behind
10. Meridian - Exit
11. Lange - A Different Shade of Crazy
12. Tucandeo Featuring Natalie Gioia - Disappear (Xtigma Remix)
13. Sebastian Weikum - Sky is the Limit
14. Markus Schulz - Don't Leave Until the Sunrise
Guy J
01. Roger Martinez & Secret Cinema - Menthol Raga (Guy J Remix)
02. Ambassador - The Fade (Guy J Remix)
03. Guy J - Seven
04. Echomen – Perpetual (Guy J Remix)
Back with Markus Schulz
15. Mauro Picotto & Riccardo Ferri - New Time, New Place (New World Punx Remix)
16. Grube & Hovsepian - Trickster
17. Nifra - Waves
18. Markus Schulz featuring Dauby - Perfect (Digital X Remix) [Global Selection]
19. Basil O'Glue - Gilgamesh
20. Skytech - The Other Side
21. ID
Enjoy
(Inspiron) </pre>
</div>
I've used this code, which returns the text of the whole details block instead of just the 'seeders' and 'leechers' values:
try {
document = Jsoup.connect(BLOG_URL).get();
title = document.title();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// selector query
Elements nodeBlogStats = document.select("div#details");
// check results
if (nodeBlogStats.size() > 0) {
// get value
result = nodeBlogStats.get(0).text();
}
According to http://jsoup.org/apidocs/org/jsoup/select/Selector.html, you are looking for
E ~ F: an F element preceded by sibling E
and
:contains(text): elements that contain the specified text.
I would try:
Element seeders = document.select("dt:contains(Seeders) ~ dd").get(0);
Element leechers = document.select("dt:contains(Leechers) ~ dd").get(0);
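A runnable variation of the selectors above, offered as a sketch. `E ~ F` matches every later `dd` sibling, so `.get(0)` picks the nearest one and throws if nothing matches. The version below uses the adjacent-sibling combinator `+` and returns null instead. `DetailValueExample` and `valueAfter` are hypothetical names, and the HTML is a shortened copy of the question's markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DetailValueExample {

    // Returns the text of the <dd> immediately following the <dt> whose
    // text contains the label, or null when there is no such pair.
    static String valueAfter(String html, String label) {
        Document doc = Jsoup.parse(html);
        Element dd = doc.select("dt:contains(" + label + ") + dd").first();
        return dd == null ? null : dd.text();
    }

    public static void main(String[] args) {
        String html = "<div id=\"details\"><dl class=\"col1\">"
                + "<dt>Uploaded:</dt><dd>2013-07-13 15:30:25 GMT</dd>"
                + "<dt>Seeders:</dt><dd>16</dd>"
                + "<dt>Leechers:</dt><dd>1</dd>"
                + "<dt>Comments</dt><dd><span id=\"NumComments\">0</span></dd>"
                + "</dl></div>";
        System.out.println("Seeders: " + valueAfter(html, "Seeders"));
        System.out.println("Leechers: " + valueAfter(html, "Leechers"));
    }
}
```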