strange jsoup behavior when getting first element - android

I'm new to jsoup, so I don't understand the following behavior:
...
Document doc = Jsoup.connect("http://4pda.ru").get();
Elements articleElems = doc.select("article.post");
for(Element article:articleElems)
{
Element desc = article.select("div.description").first();
Elements posts = desc.select("h1.list-post-title");
Log.d(TAG,"size is "+posts.size()); // it's ok, size is 1
...
}
Since the size is 1, I want to get the first Element, so I changed the code as follows:
for(Element article:articleElems)
{
Element desc = article.select("div.description").first();
Element post = desc.select("h1.list-post-title").first();
Log.d(TAG,"post is "+post.toString()); // NullPointerException is thrown here
...
}
I can't understand this.

Some of the articles you select don't contain an h1.list-post-title. For those, first() returns null, so post.toString() throws.
You can use :has(). From the official documentation:
:has(selector): find elements that contain elements matching the selector
Here is the solution using :has():
Document doc = Jsoup.connect("http://4pda.ru").get();
Elements articleElems = doc.select("article.post:has(h1.list-post-title)");
for (Element article : articleElems) {
Element desc = article.select("div.description").first();
Element post = desc.select("h1.list-post-title").first();
System.out.println(post);
}
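An alternative to filtering with :has() is to keep the original selector and check for null before using the element. The following is a minimal sketch of that idea, not a definitive implementation. The class name, the method name, and the inline HTML are made up for illustration; only the selectors come from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class NullSafeTitles {
    // Collects the title text of every article that actually has a title heading.
    static List<String> titles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element article : doc.select("article.post")) {
            // first() returns null when the selection is empty
            Element post = article.select("div.description h1.list-post-title").first();
            if (post == null) {
                continue; // no title in this article, so skip it
            }
            result.add(post.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the second article has no h1.list-post-title.
        String html = "<article class='post'><div class='description'>"
                + "<h1 class='list-post-title'>First</h1></div></article>"
                + "<article class='post'><div class='description'><p>Ad</p></div></article>";
        System.out.println(titles(html)); // prints [First]
    }
}
```

The combined selector also covers articles that have no div.description at all, which would otherwise fail one step earlier.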

How can I extract table with JSOUP

I'm writing an Android app and trying to figure out how should I construct my call to get table data from this webpage: http://uk.soccerway.com/teams/scotland/saint-mirren-fc/1916/squad/
I've read the cookbook from JSOUP website but because I haven't used this library before I am bit stuck. I came up with something like this:
doc = Jsoup.connect("http://uk.soccerway.com/teams/scotland/saint-mirren-fc/1916/squad/").get();
Element squad = doc.select("div.squad-container").first();
Elements table = squad.select("table squad sortable");
As you can see, I'm nowhere near getting the player statistics yet. I think the next step should be to point a new Element object at the "tbody" tag inside the "table squad sortable"?
I know I will have to use a for loop once I manage to read the table, and then read each row inside the loop.
Unfortunately, the table structure is a bit complex for someone with no experience, so I would really appreciate some advice!
Basically each row has the following selector -
#page_team_1_block_team_squad_3-table > tbody:nth-child(2) > tr:nth-child(X) where X is the row's number (starting at 1).
One way is to iterate over the rows and extract the info:
String url = "http://uk.soccerway.com/teams/scotland/saint-mirren-fc/1916/squad/";
String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0";
Document doc = null;
try {
doc = Jsoup.connect(url)
.userAgent(userAgent)
.get();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
int i = 1;
Elements row;
do {
// select() never returns null; it returns an empty list once no row matches
row = doc.select("#page_team_1_block_team_squad_3-table > tbody:nth-child(2) > tr:nth-child(" + i + ")");
for (Element el : row) {
System.out.print(el.select(".shirtnumber").text() + " ");
System.out.println(el.select(".name").text());
}
i++;
} while (!row.isEmpty());
This will print the number and name of each player. Since I don't want to count the number of rows (and want to keep the program flexible for changes), I prefer to use a do...while loop: it iterates as long as the row exists (that is, the selection is not empty).
The output I get:
1 J. Langfield
21 B. O'Brien
28 R. Willison
2 S. Demetriou
3 G. Irvine
4 A. Webster
...
Use your browser's developer tools to get the class names of the other columns, and use them to extract all the info you need.
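Another way to avoid counting rows is to select all rows with one query and iterate over the result. This is only a sketch: the class name and the inline HTML are invented, and the table.squad.sortable selector assumes the table carries the classes "squad" and "sortable" as the question suggests. The .shirtnumber and .name classes are the ones used above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class SquadRows {
    // Returns one "number name" string per body row of the squad table.
    static List<String> players(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // A single query returns every row; no counter or nth-child is needed.
        for (Element row : doc.select("table.squad.sortable > tbody > tr")) {
            String number = row.select(".shirtnumber").text();
            String name = row.select(".name").text();
            result.add(number + " " + name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified version of the page's table.
        String html = "<table class='squad sortable'><tbody>"
                + "<tr><td class='shirtnumber'>1</td><td class='name'>J. Langfield</td></tr>"
                + "<tr><td class='shirtnumber'>2</td><td class='name'>S. Demetriou</td></tr>"
                + "</tbody></table>";
        for (String player : players(html)) {
            System.out.println(player);
        }
    }
}
```

For the live page, the html string would be replaced by the Document fetched with Jsoup.connect(url).userAgent(userAgent).get().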

Get Text of Certain Tag after Tag with Certain Text HTML in Android with JSoup

I want to get the text of the tag that follows a tag with certain text, using jsoup, from HTML like this:
<td>AAA</td>
<td>1111</td>
<td>BBB</td>
<td>2222</td>
I want to print 1111 if I select AAA, or 2222 if I select BBB.
I have tried this, but nothing is printed in the text field:
@Override
protected Void doInBackground(Void... params) {
try {
// Connect to the web site
Document document = Jsoup.connect(url).get();
// Using Elements to get the Meta data
Elements description = document
.select("td [value=AAA] td");
// Locate the content attribute
desc= description.text();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void result) {
// Set description into TextView
TextView txtdesc = (TextView) findViewById(R.id.desctxt);
txtdesc.setText(desc);
mProgressDialog.dismiss();
}
Can anybody help?
Your selector would select all tds within a td that has an attribute named "value" with the value "AAA".
Instead, you must match the content of td tags. There is a pseudo selector matchesOwn available for that purpose. (own means, only match the direct text of the element, not that of any children).
Then, advance to the next sibling element (or, if there can be another element following before the td of interest, query again using "td" as the selector).
Try
Element description = document.select("td:matchesOwn(AAA)").first().nextElementSibling();
Beware, the parameter of the matchesOwn pseudo selector is interpreted as a regex.
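To see what the regex interpretation means in practice, here is a small sketch that uses only java.util.regex. It assumes the pattern is applied with find-style (substring) matching against the cell text. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesOwnRegex {
    // Applies a regex to cell text with find(), i.e. a match anywhere in the text counts.
    static boolean found(String regex, String cellText) {
        return Pattern.compile(regex).matcher(cellText).find();
    }

    public static void main(String[] args) {
        // A bare literal also matches longer cells that merely contain it.
        System.out.println(found("AAA", "AAAB"));                // true
        // Anchors restrict the match to the whole text.
        System.out.println(found("^AAA$", "AAAB"));              // false
        // '.' is a metacharacter and matches any character.
        System.out.println(found("1.5", "125"));                 // true
        // Pattern.quote turns the text into a literal pattern.
        System.out.println(found(Pattern.quote("1.5"), "125"));  // false
    }
}
```

Under that assumption, a selector such as td:matchesOwn(^AAA$) would be the stricter choice when other cells may contain "AAA" as part of a longer text.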

Android Remove First and Last <div> tag from html text using Jsoup

I want to remove the first and last div tags from an HTML string. I use the jsoup library to parse the HTML, and I tried a few things, shown in the code below. The HTML may contain several div tags or none, but I want to remove only the first and last ones, if present. Please help. Thanks in advance.
public String divremove(String html) {
Document doc = Jsoup.parse(html);
for (Element e : doc.select("div")){
if (e != null) {
Log.e("LOG","link >> " + e.text());
}
}
/* Element link = doc.removeClass("div");
if (link != null)
{
}
Integer in = doc.select("div").first().elementSiblingIndex();*/
Element link = doc.select("div").first();
Log.e("LOG","link >> " + link);
Element link2 = doc.select("div").last();
Log.e("LOG","link2 >> " + link2.text());
return html;//formatted
}
Here's an example:
final String html = "<div>A</div><div>B</div><div>C</div><div>D</div>";
Document doc = Jsoup.parse(html);
// (1) - Remove from html
doc.select("div").first().remove();
doc.select("div").last().remove();
System.out.println(doc.body());
// (2) - Remove from list
Elements divs = doc.select("div");
divs.remove(0);
divs.remove(divs.size()-1);
System.out.println(divs);
(1) removes the first and last div from the HTML, so doc won't contain them anymore. If you just want to remove them from your selected divs, use (2) instead. This keeps them in your HTML (= doc), but removes them from divs.
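If the goal is to drop only the two div tags while keeping their contents, unwrap() may fit better than remove(). The sketch below assumes that reading of the question. The class and method names are hypothetical, and pretty-printing is turned off only to keep the output compact.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class UnwrapOuterDivs {
    // Strips the first and last div tags but keeps whatever they contained.
    static String unwrapFirstAndLast(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(false);
        Elements divs = doc.select("div");
        if (!divs.isEmpty()) {
            Element first = divs.first();
            Element last = divs.last();
            first.unwrap(); // replaces the div with its child nodes
            if (last != first) {
                last.unwrap(); // skipped when the document has only one div
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(unwrapFirstAndLast("<div>A</div><p>B</p><div>C</div>"));
        // prints A<p>B</p>C
    }
}
```

The isEmpty() check also covers input without any div, where first() and last() would return null.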

Replace elements using Jsoup, Android

Does anyone know how to replace elements using Jsoup? I'm trying to replace table elements and their content with buttons, but I'm having no success. My code attempt is below. This is for an Android project.
Elements elements = doc.select("table");
if (elements != null) {
for (Element element : elements) {
Element button = Jsoup.parse("<button type='button'>Click Me!</button>");
element.replaceWith(button);
}
}
I went about this in a somewhat hacky way, but it seems to work. The replaceWith(button) call didn't do anything. I do want to replace the whole table with a button, but I also want to append that button, along with the other results, to a string.
for (int i = 0; i < elements.size(); i++) {
Element sibling = siblings.get(i);
if ("table".equals(sibling.tagName())) {
siblings.remove(i);
Element button = Jsoup.parse("<button type='button'>Click Me!</button>");
sibling = button;
sb.append(sibling.toString());
}
else {
sb.append(sibling.toString());
}
}
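For comparison, here is a minimal sketch of replacing the tables in place. It rests on two assumptions about the original attempt: that the problem is related to Jsoup.parse returning a whole Document rather than a single button element, and that the result has to be read back from the modified doc rather than from the original string. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TablesToButtons {
    // Replaces every table in the fragment with a button and returns the new HTML.
    static String replaceTables(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(false);
        for (Element table : doc.select("table")) {
            // Build a standalone button element instead of parsing a new document.
            Element button = doc.createElement("button")
                    .attr("type", "button")
                    .text("Click Me!");
            table.replaceWith(button);
        }
        // Serialize the modified document; the input string itself never changes.
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>x</p><table><tr><td>1</td></tr></table>";
        System.out.println(replaceTables(html));
        // prints <p>x</p><button type="button">Click Me!</button>
    }
}
```

The returned string already contains the buttons together with the surrounding content, so no separate StringBuilder pass over the siblings is needed.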

Scraping site with jsoup issue

When I scrape a site using jsoup, I get extra values that I do not want to receive.
I only want the player's name, not his team and position. Currently it also scrapes the position and team.
Page Source:
<td class="playertableData">5</td><td class="playertablePlayerName" id="playername_515" style="">Derrick Rose, Chi PG<a href="" class="flexpop" content="tabs#ppc"
My Code:
while (tdIter.hasNext()) {
int tdCount = 1;
Element tdEl = tdIter.next();
name = tdEl.getElementsByClass("playertablePlayerName")
.text();
Elements tdsEls = tdEl.select("td.playertableData");
Iterator<Element> columnIt = tdsEls.iterator();
namelist.add(name);
OUTPUT:
name: Derrick Rose, Chi PG
You are doing it wrong. With the line
name = tdEl.getElementsByClass("playertablePlayerName").text();
you get the complete text of the td with class="playertablePlayerName", which includes an anchor tag and plain text outside any tag. That means you will get
Derrick Rose, Chi PG
which is your output. To solve this, you must also account for the anchor tag. Try the lines below as a replacement.
doc = Jsoup_Connect.doHttpGet();
Elements tdsEls = doc.getElementsByClass("playertablePlayerName");
name = tdsEls.get(0).child(0).text();
You can traverse the children of the td you have already got. When you reach the correct tag, call text() on it.
Feel free to ask if you have any doubts.
You can probably hack up this code to get what you want:
Document doc = Jsoup.connect("http://games.espn.go.com/fba/playerrater?&slotCategoryId=0").get();
for (Element e : doc.select(".playertablePlayerName")) {
//this assumes the name is in the first anchor tag
//which it seems to be according to the url in your pastebin
System.out.println(e.select("a").first().text());
}
To translate to your code, I think this will work...
name = tdEl.select("a").first().text();
Let me know if this works for you.
Other solutions:
1.- First Name
String url = "http://games.espn.go.com/fba/playerrater?&slotCategoryId=0";
//First Name
try {
Document doc = Jsoup.connect(url).get();
Element e = doc.select("td.playertablePlayerName > a").first();
String name = e.text();
System.out.println(name);
}
catch (IOException e) {
}
2.- All the names
//All Names
try {
Document doc = Jsoup.connect(url).get();
Elements names = doc.select("td.playertablePlayerName > a");
for( Element e : names ) {
String name = e.text();
System.out.println(name);
}
}
catch (IOException e) {
}
