How can I extract something specific using Jsoup? - android

How can I extract the full names from this sample HTML code?
I only want to get the following.
Full name1
Full name2
Full name3
<div class="readerP">
<p><a href="link1_english.html" title="Complete" >Full name1</a><br>[ other info ]</br> </p>
</div>
<div class="readerP">
<p><a href="link2_english.html" title="Complete" >Full name2</a><br>[ other info ]</br> </p>
</div>
<div class="readerP">
<p><a href="link1_english.html" title="Complete" >Full name3</a><br>[ other info ]</br> </p>
</div>
I am using this code, but it looks at all the 'a' tags in the page, so I get extra text such as:
Home Page
About
Contact
Full name1
Full name2
Full name3
and so on ...
try {
doc = Jsoup.connect("http://www.somesite.com").get();
Elements links = doc.getElementsByTag("a");
for (Element el : links) {
linkText = el.ownText();
arr_linkText.add(linkText);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
How can I look at the 'div' tag and if class="readerP" look at the 'a' tags inside the 'div'?

How can I look at the 'div' tag and if class="readerP" look at the 'a'
tags inside the 'div'?
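Use an appropriate selector instead of just searching by tag name.
Elements links = doc.select("div.readerP a");
Here div.readerP matches a 'div' whose class is readerP, and the trailing a matches every 'a' nested inside it. Note that there is no space between div and .readerP: with a space, the selector would look for a .readerP element nested inside some other 'div'.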
Read more about selectors in the Jsoup documentation.
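As a minimal sketch of this selector in action, the code below parses a shortened copy of the question's HTML from a string (rather than fetching a page) and collects the link text. The class name ReaderNames and the extra "Home Page" link are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ReaderNames {
    // Returns the text of every <a> nested inside a <div class="readerP">.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element a : doc.select("div.readerP a")) {
            names.add(a.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical page: one navigation link plus two readerP blocks.
        String html =
              "<a href=\"index.html\">Home Page</a>"
            + "<div class=\"readerP\"><p><a href=\"link1_english.html\" title=\"Complete\">Full name1</a><br>[ other info ]</p></div>"
            + "<div class=\"readerP\"><p><a href=\"link2_english.html\" title=\"Complete\">Full name2</a><br>[ other info ]</p></div>";
        System.out.println(extract(html)); // prints [Full name1, Full name2]
    }
}
```

The navigation link is skipped because it is not inside a div with class readerP.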

Related

Jsoup : How to escape from this simple img selection nightmare?

I am using the jsoup library.
I want to get the image from this website, so I used select, and my code looks like this:
Elements news2 = document.select("div.contentcolumn");
// Elements title2 = news2.select("div.catitems");
Log.d("MainActivity", "This is news = " + title);
for (Element el : news2) {
news_object = new item();
news_object.setTitle(el.select("h1").text());
news_object.setauther(el.select("a").attr("abs:href"));
news_object.setimg(el.select("ImageArea").attr("abs:src"));
Log.d("newsdetail", "humam" + news_object.getimg());
Here is the relevant HTML source of the website:
<div id="ImageArea">
<a href="/filestorage/contentfiles/2016/04_16/090416102307_140_1.jpg"
target="_blank">
<img src="/filestorage/contentfiles/2016/04_16/090416102307_140_1.jpg"
alt="المالكي: الإصلاح محاولة لإفشال المشروع الإسلامي وضرب المتدينين"
style="max-width:620px;">
</a>
</div>
I want to select the img so I can show it in an image view, and also select the text.
The problem lies in this line of code: el.select("ImageArea").attr("abs:src").
Let's see what happens:
el.select("ImageArea") // Here we select elements with the tag name ImageArea
.attr("abs:src")
However, ImageArea is not a tag name: it is the id of a div that contains an anchor, which in turn contains the image you want. An id is selected with a leading #.
Try this instead:
news_object.setimg(el.select("#ImageArea img").attr("abs:src"));
If the ImageArea element may not have an img, use the code below:
Element img = el.select("#ImageArea img").first();
if ( img != null ) {
news_object.setimg(img.attr("abs:src"));
} else {
// ...
}
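A self-contained sketch of the same idea is shown below. It parses a trimmed copy of the markup from a string, so a base URI must be passed to Jsoup.parse for abs:src to resolve. The class name ImageAreaDemo, the example.com base URI, and the shortened file path are placeholders, not values from the real site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageAreaDemo {
    // Returns the absolute URL of the image inside #ImageArea, or null if there is none.
    static String imageUrl(String html, String baseUri) {
        // The base URI lets Jsoup turn relative src values into absolute URLs.
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.select("#ImageArea img").first();
        return img == null ? null : img.attr("abs:src");
    }

    public static void main(String[] args) {
        // Hypothetical markup modelled on the question's snippet.
        String html =
              "<div id=\"ImageArea\">"
            + "<a href=\"/files/pic.jpg\" target=\"_blank\"><img src=\"/files/pic.jpg\"></a>"
            + "</div>";
        System.out.println(imageUrl(html, "http://example.com/news/"));
        // prints http://example.com/files/pic.jpg
    }
}
```

When the document comes from Jsoup.connect(url).get(), the base URI is already set from the request URL, so no extra argument is needed.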

Jsoup selector syntax for div having same class

<div class="row">
<div id="content">
<div class="textData">
</div>
<div class="textData">
</div>
</div>
</div>
I want the text from the second div with class=textData. I have already parsed div id=content.
Here is my doInBackground:
try {
Document document = Jsoup.connect(url).get();
Elements myin = null;
myin = document.select("div.horoscopeText:eq(1)");
desc = myin.text().toString();
} catch (IOException e) {
e.printStackTrace();
}
Try this
div#textData:eq(1)
eq(n) matches elements whose zero-based index among their siblings equals n. By the way, you shouldn't have multiple elements with the same id; use a class for that. Check out the selector syntax documentation for more examples.
EDIT
For class instead of id, use div.textData:eq(1)
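One caveat worth illustrating: in Jsoup, :eq(n) compares against the element's position among its siblings, not its position in the list of matches. If another element precedes the textData divs inside the parent, :eq(1) no longer points at the second one. The sketch below contrasts the two approaches; the class name SecondTextData, the added h2, and the sample text are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class SecondTextData {
    // Text of the second div.textData in document order, or null if there are fewer than two.
    static String secondByPosition(String html) {
        Elements blocks = Jsoup.parse(html).select("div.textData");
        return blocks.size() > 1 ? blocks.get(1).text() : null;
    }

    // Text of the div.textData elements whose sibling index is 1.
    static String bySiblingIndex(String html) {
        return Jsoup.parse(html).select("div.textData:eq(1)").text();
    }

    public static void main(String[] args) {
        // Hypothetical markup: a heading comes before the two textData divs.
        String html =
              "<div id=\"content\">"
            + "<h2>Title</h2>"
            + "<div class=\"textData\">first</div>"
            + "<div class=\"textData\">second</div>"
            + "</div>";
        System.out.println(secondByPosition(html)); // prints second
        System.out.println(bySiblingIndex(html));   // prints first
    }
}
```

For the markup in the question, where the two divs are the only children of #content, both approaches return the second div.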

Remove extra "a href" tag from html string

I have a html string like this:
<a class="favourite" href="LixWQfueLU"><font color="#009a49">Rohit Lalwani</font></a>
I want to make the html string:
<a class="favourite" href="LixWQfueLU"><font color="#009a49">Rohit Lalwani</font></a>
How can I solve the above issue? Should I use JSOUP or Regex? What will be the solution?
This code using JSoup will do the trick in a more generic way:
String html ="<a class=\"favourite\" href=\"LixWQfueLU\"><font color=\"#009a49\">Rohit Lalwani</font></a>";
Document doc = Jsoup.parse(html);
Element afav = doc.select(".favourite").first();
Element select = doc.select("font").first();
afav.remove();
afav.appendChild(select);
System.out.println(afav);
Output:
<a class="favourite" href="LixWQfueLU"><font color="#009a49">Rohit Lalwani</font></a>
Alternatively, try building the required string using substring:
String beforeString = "<p dir=\"ltr\"> <a class=\"favourite\" href=\"LixWQfueLU\"><font color=\"#009a49\">Rohit Lalwani</font></a></p>";
String afterString = beforeString.substring(0,beforeString.indexOf("<a href")+1)+beforeString.substring(beforeString.indexOf("<font"),beforeString.indexOf("</a>"))+beforeString.substring(beforeString.indexOf("</a>")+4,beforeString.length());
Value of afterString :
<p dir="ltr"> <a class="favourite" href="LixWQfueLU"><<font color="#009a49">Rohit Lalwani</font></a></p>

How to get all rows of a column by providing its name using JSOUP in android?

I am using the Jsoup library to parse HTML in my Android app. This is the column in the HTML page that I want to parse:
<TD width="9%" ROWSPAN = 2>Days</TD>
Now I want to get all the rows of this column. I am using the following code to achieve this, but without success:
StringBuilder s = new StringBuilder(100);
Document doc = Jsoup.parse(htmlPage);
Elements links = doc.select("table tr.Day");
for (Element link : links)
{
String linkHref = link.attr("href");
System.out.println(linkHref);
s.append(linkHref);
String linkText = link.text();
System.out.println(linkText);
s.append(linkText);
}
I searched a lot, but to no avail. Please help me. Thanks in advance.
Assuming your html looks similar to this:
<table>
<tbody>
<tr>
<td width="9%" rowspan="2">Days</td>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
</tr>
</tbody>
</table>
If I understand correctly, you want every cell that follows the td tag containing "Days" in the same row.
Document doc = Jsoup.parse(htmlPage);
for( Element element : doc.select("td:contains(days) ~ *") ) // Select every sibling that follows the 'td' tag containing the text 'days' (the match is case-insensitive)
{
System.out.println(element); // do something with the element
}
Using the HTML above, you get this output:
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
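If the values of the column sit in the rows below the header cell rather than next to it, a different approach is to find the header's position in the first row and read the cell at that position from each following row. This is only a sketch under strong assumptions: a single simple table, no rowspan or colspan, and the header in the first row. The class name ColumnReader and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ColumnReader {
    // Returns the text of every cell below the header cell named `header`.
    static List<String> column(String html, String header) {
        List<String> values = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        Elements rows = doc.select("table tr");
        if (rows.isEmpty()) {
            return values;
        }
        // Locate the header cell in the first row.
        Elements headerCells = rows.get(0).select("td, th");
        int col = -1;
        for (int i = 0; i < headerCells.size(); i++) {
            if (headerCells.get(i).text().equalsIgnoreCase(header)) {
                col = i;
                break;
            }
        }
        if (col < 0) {
            return values;
        }
        // Read the cell at the same position from every following row.
        for (int r = 1; r < rows.size(); r++) {
            Elements cells = rows.get(r).select("td, th");
            if (col < cells.size()) {
                values.add(cells.get(col).text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical table; the real page layout is not shown in the question.
        String html =
              "<table>"
            + "<tr><td>Days</td><td>Hours</td></tr>"
            + "<tr><td>Mon</td><td>8</td></tr>"
            + "<tr><td>Tue</td><td>6</td></tr>"
            + "</table>";
        System.out.println(column(html, "Days")); // prints [Mon, Tue]
    }
}
```

Cells that span several rows or columns shift the positions in later rows, so a table using ROWSPAN (as the header in the question does) would need extra bookkeeping.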

Android Jsoup in service - get text of span

I'm pretty new to jsoup. For days I have been trying to read a simple number from a span, without any success.
I hope to find help here. My HTML:
<div class="navi">
<div class="tab mail">
<a href="/comm.php/indexNew/" accesskey="8" title="Messages">
<span class="tabCount">1 </span>
<img src="/b2/message.png" alt="Messages" class="moIcon i24" />
</a>
</div>
The class tabCount exists 3 times in the whole document, though, and I am interested in the first span with this class.
Now, in onCreate() of a service, I am trying to create a thread with:
Thread downloadThread = new Thread() {
public void run() {
Document doc;
try {
doc = Jsoup.connect("https://www.bla.com").get();
String count = doc.select("div.navi").select("div.tab.mail").select("a[href]").first().select("tabCount").text();
Log.d("SOMETHING", "test"+(count));
} catch (IOException e) {
e.printStackTrace();
}
}
};
downloadThread.start();
This causes my app to crash. The same happens if I change text() to ownText(). If I remove text(), the app starts but gives me null.
What am I doing wrong? By the way, besides the service, a WebView is loading the same URL. Might that be a problem?
You only need to select the element you're interested in; you don't need to select every enclosing element first. In your example you could try:
String count = doc.select("span.tabCount").text();
Here "span" is the element type and ".tabCount" is the class name.
For an example that might help you, look at this link
Edit:
Try this code instead, this will get the value of the first span.
Elements elements = doc.select("span.tabCount");
String count = elements.first().text();
And if you want to print all elements, you could do it like this:
Elements elements = doc.select("span.tabCount");
for (Element e : elements) {
Log.d("Something", e.text());
}
Didn't you mean .select(".tabCount")?
BTW, on Android AsyncTasks are more convenient than Threads. Also, empty catch blocks are a bad practice.
Your select statement is wrong. You can put the whole selector in a single string. Furthermore, you have to prefix "tabCount" with a dot because it is a class.
String count = doc.select("div.navi div.tab.mail a").first().select(".tabCount").text();
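Note that first() returns null when nothing matches, so chaining another call onto it can throw a NullPointerException. A defensive variant, sketched below against a copy of the question's HTML parsed from a string, checks for null before reading the text. The class name TabCountReader is made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TabCountReader {
    // Returns the text of the first span.tabCount inside the mail tab, or null if it is absent.
    static String firstTabCount(String html) {
        Document doc = Jsoup.parse(html);
        Element span = doc.select("div.navi div.tab.mail span.tabCount").first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        String html =
              "<div class=\"navi\">"
            + "<div class=\"tab mail\">"
            + "<a href=\"/comm.php/indexNew/\" accesskey=\"8\" title=\"Messages\">"
            + "<span class=\"tabCount\">1 </span>"
            + "<img src=\"/b2/message.png\" alt=\"Messages\" class=\"moIcon i24\" />"
            + "</a>"
            + "</div>"
            + "</div>";
        System.out.println(firstTabCount(html)); // prints 1
    }
}
```

text() trims surrounding whitespace, which is why the trailing space in the span does not appear in the result.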
