Can't scrape elements by class name using Jsoup - android

This code returns nothing when I try to scrape data from Airbnb:
try {
    doc = Jsoup.connect("https://www.airbnb.com")
            .header("Accept", "text/html")
            .header("Accept-Encoding", "gzip,deflate")
            .header("Accept-Language", "it-IT,en;q=0.8,en-US;q=0.6,de;q=0.4,it;q=0.2,es;q=0.2")
            .header("Connection", "keep-alive")
            .userAgent("Mozilla")
            .get();
} catch (IOException e) {
    e.printStackTrace();
}
Elements els = doc.getElementsByClass("cy5jw6o dir dir-ltr");
System.out.println(els);
I tried the code above and also this:
Elements els = doc.getElementsByClass("div.cy5jw6o.dir.dir-ltr");
How can I get all elements with this class name, and also access the links or other divs nested under them?
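One part of this can be checked offline. `getElementsByClass` is meant for a single class name and does not parse CSS selector syntax such as `div.a.b`, whereas `select` accepts a CSS query that can combine several classes. The sketch below runs against an inline HTML string; the markup and the `hrefsUnder` helper are made up for illustration. It does not show whether the HTML that Airbnb's server returns contains these elements at all, so printing `doc.html()` is still worth doing, since content rendered by JavaScript would not be present in the fetched document.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ClassSelectorSketch {

    // Hypothetical helper: collects the href of every link nested under
    // the elements that match the given CSS query.
    static List<String> hrefsUnder(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element container : doc.select(cssQuery)) {
            for (Element link : container.select("a[href]")) {
                hrefs.add(link.attr("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page.
        String html = "<div class=\"cy5jw6o dir dir-ltr\"><a href=\"/rooms/1\">One</a></div>"
                + "<div class=\"dir\"><a href=\"/other\">Other</a></div>";
        // Only the first div carries all three classes.
        System.out.println(hrefsUnder(html, "div.cy5jw6o.dir.dir-ltr"));
    }
}
```

Other divs under a matched element can be reached the same way, for example with `container.select("div")`.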

Related

How to get Google search headings with Jsoup

I am trying to get the headings of Google search with Jsoup.
Here is my code:
String request = "https://www.google.com/search?q=" + query + "&num=5";
try {
    Document doc = Jsoup
            .connect(request)
            .userAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
            .timeout(5000)
            .get();
    Elements headings = doc.select("h3");
    // headings is empty here
} catch (IOException e) {
    e.printStackTrace();
}
I get no results from doc.select("h3"). What am I doing wrong?

Check the content of your Document (for example, print doc.html()). Perhaps the request didn't go through properly, or the response differs from what your browser shows.

Parsing an XML string into a kXML Element

I'm writing an Android app that connects to a SOAP webservice using kSOAP2, and I have a kXML element where I would like to inject a child based on an XML string I got from elsewhere (a REST API). I have the following code:
Element samlHeader = new Element().createElement("http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd", "Security");
samlHeader.setPrefix("wsse", "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd");
samlHeader.setPrefix("wsu", "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd");
String samlTokenString = ...; //I got this from elsewhere
Element samlTokenElement = ...; //I don't know how to build this
samlHeader.addChild(Node.ELEMENT, samlTokenElement);
So I'm trying to figure out how to build my Element based on the XML string I'm getting from elsewhere.

This is the solution that we ended up implementing:
try {
    KXmlParser parser = new KXmlParser();
    parser.setInput(new StringReader(samlTokenString));
    parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true);
    Document samlTokenDocument = new Document();
    samlTokenDocument.parse(parser);
    samlHeader.addChild(Node.ELEMENT, samlTokenDocument.getRootElement());
} catch (XmlPullParserException e) {
    Log.e(TAG, "Could not parse SAML assertion", e);
} catch (IOException e) {
    Log.e(TAG, "Could not parse SAML assertion", e);
}
We're still validating whether it produces the right result, but it seems to work.

Jsoup doesn't find the specified element

I'm trying to write a little Android program (don't mind the messiness) that shows the last "Did you know?" from Wikipedia. But for some reason Jsoup doesn't find it.
What is the problem?
Part of the code:
Document document = null;
try {
    document = Jsoup.connect("https://en.wikipedia.org/wiki/Portal:Mathematics/Did_you_know/1").get();
} catch (IOException e) {
    e.printStackTrace();
}
//Document document = Jsoup.parse("test.html");
if (document != null) {
    Element element = document.select("div#mw-content-text").first();
    if (element == null) {
        message = "empty";
    } else {
        message = element.html();
    }
}
Part of the Wikipedia source code:
<div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr"><p>...that outstanding mathematician Grigori Perelman was offered a Fields Medal in 2006, in part for his proof of the Poincaré conjecture, which he declined?</p>
https://en.wikipedia.org/wiki/Portal:Mathematics/Did_you_know/1

Your code works fine on a desktop. Check that your Android app is allowed to access the internet (the INTERNET permission in the manifest). It's also a good idea to find out where the real problem is.
Some hints:
Replace e.printStackTrace(); with a logger.
Write the value of the message variable to a logger too.
Are you using an AsyncTask?
Are there any errors, exceptions, or something similar?
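The hints about logging and background execution can be sketched in plain Java. This is only an illustration under assumptions: `BackgroundFetchSketch` and `fetchOrFallback` are hypothetical names, `java.util.logging` stands in for Android's logger, and the `Callable` stands in for the Jsoup call. The blocking `get()` is there only so the sketch runs from a plain `main`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

final class BackgroundFetchSketch {
    private static final Logger LOG = Logger.getLogger(BackgroundFetchSketch.class.getName());

    // Hypothetical helper: runs 'fetch' on a worker thread, logs the outcome,
    // and returns 'fallback' if the fetch fails.
    static String fetchOrFallback(Callable<String> fetch, String fallback) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<String> pending = worker.submit(fetch);
            // Blocking here keeps the sketch runnable from main(); a real
            // Android app would hand the result back to the UI thread instead.
            String message = pending.get();
            LOG.info("Fetched message: " + message);
            return message;
        } catch (ExecutionException e) {
            // The fetch itself threw, for example an IOException from the network call.
            LOG.log(Level.WARNING, "Fetch failed", e.getCause());
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return fallback;
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Jsoup call; replace with the real fetch.
        System.out.println(fetchOrFallback(() -> "<p>sample</p>", "empty"));
    }
}
```

With the failure logged instead of only printed, the log output shows whether the request failed or the selector simply matched nothing.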

How to organize extracted values when working with jsoup?

How do you store values extracted with Jsoup so that they can be read back easily? Suppose you have HTML like the following:
<td width="200">country1 </td>
<td width="200">country2 </td>
<td width="200">country3 </td>
I want to save the countries and the href link for each one, and later be able to read them easily. The way I do it, I have two ListViews, one for the countries and one for the href links. If the user selects, for example, country2, I find its index and then use it to get the href link from the other ListView. I feel this method is not good. How do you do it?
This is my Jsoup code, by the way, in case it needs improvement too.
try {
    doc = Jsoup.connect("http://somesite.com").get();
    // Get the text inside each <a> tag
    Elements links = doc.select("a");
    for (Element el : links) {
        String linkName = el.ownText();
        // Save the link text into the links list
        array_link.add(linkName);
    }
    // Get the text inside each <td> tag
    Elements linktwo = doc.select("td");
    for (Element eltwo : linktwo) {
        String countryName = eltwo.ownText();
        // Save the country name into the countries list
        array_countries.add(countryName);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Thank you!

Is this what you want?
try {
    Document doc = Jsoup.connect("http://somesite.com").get();
    Elements links = doc.select("a");
    Elements linktwo = doc.select("td");
    // Walk both selections in step; stop at the shorter one so get(i) stays in range
    int count = Math.min(links.size(), linktwo.size());
    for (int i = 0; i < count; i++) {
        // Save the link text and the matching country name at the same index
        array_link.add(links.get(i).text());
        array_countries.add(linktwo.get(i).text());
    }
} catch (IOException e) {
    e.printStackTrace();
}
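An alternative to two parallel lists is to keep each country name and its href together in one object, so the selected position gives both values at once. The following is a minimal sketch using only the standard library. `CountryLink`, `CountryLinkSketch`, and the sample values are hypothetical; in the real app the pairs would come from the Jsoup loop, for example from `el.text()` and `el.attr("href")`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical value class: one entry per country.
final class CountryLink {
    final String name;
    final String href;

    CountryLink(String name, String href) {
        this.name = name;
        this.href = href;
    }

    // Returning the name makes the object display sensibly wherever
    // toString() is used as the label.
    @Override
    public String toString() {
        return name;
    }
}

final class CountryLinkSketch {

    // Pairs names and hrefs by position, stopping at the shorter list.
    static List<CountryLink> pair(List<String> names, List<String> hrefs) {
        int count = Math.min(names.size(), hrefs.size());
        List<CountryLink> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new CountryLink(names.get(i).trim(), hrefs.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<CountryLink> items = pair(
                Arrays.asList("country1 ", "country2 ", "country3 "),
                Arrays.asList("/c1", "/c2", "/c3"));
        // The selected index yields both the label and the link.
        CountryLink selected = items.get(1);
        System.out.println(selected.name + " -> " + selected.href);
    }
}
```

A single list of such objects can back one ListView, which removes the need to look up the index in a second list.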

Android: Need help, trying to parse a HTML page using JSoup parser

Here is the code I have so far, but it throws an error:
URL url = null;
try {
    url = new URL("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)");
} catch (MalformedURLException e) {
    e.printStackTrace();
}
System.out.println("1");
Document doc = null;
try {
    System.out.println("2");
    doc = Jsoup.parse(url, 3000);
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("3");
Element table = doc.select("table[title=Avgångar:]").first();
System.out.println("4");
Iterator<Element> it = table.select("td").iterator();
// We know the third td element is where we want to start, so we call next() twice
it.next();
it.next();
while (it.hasNext()) {
    // Do whatever you want with the td element here
    System.out.println(it.next());
    // Advance three times to get to the next td we want, checking after the
    // first step to make sure we're not at the end of the table
    it.next();
    if (!it.hasNext()) {
        break;
    }
    it.next();
    it.next();
}
It prints "3" and then stops at this line:
Element table = doc.select("table[title=Avgångar:]").first();
How can I solve this problem?
Thanks

It looks like the page you're trying to parse has an error and doesn't contain any tables. That is what causes the NullPointerException: doc.select("table[title=Avgångar:]").first() doesn't return an element, and then you try to call a method on the result. To prevent this error, you could do something like this:
Elements foundTables = doc.select("table[title=Avgångar:]");
Element table = null;
if (!foundTables.isEmpty()) {
    table = foundTables.first();
}
Now, if any table was found, the table variable won't be null. You'll just have to adapt the rest of the code to handle the case where no table is found.

You're not checking the result of doc.select(...).first() before using it. In Jsoup, select() returns an empty Elements collection rather than null when nothing matches, and first() on an empty collection returns null, so the next call on table throws a NullPointerException. There is no table tag with the title you specified in the document from your example, so the result is not surprising. Note also that doc itself stays null if Jsoup.parse() throws an IOException, so check the stack trace printed by the catch block as well.
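The null check can be tried offline with a small sketch. It assumes Jsoup's behavior that `select` yields an empty `Elements` when nothing matches and that `first()` on it returns null. The `cellCount` helper, the sample markup, and the `Departures` title are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final class FirstOrNullSketch {

    // Hypothetical helper: counts the <td> cells of the first element that
    // matches the query, or returns -1 when nothing matches.
    static int cellCount(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements found = doc.select(cssQuery); // empty when nothing matches
        Element table = found.first();         // null when 'found' is empty
        if (table == null) {
            return -1;
        }
        return table.select("td").size();
    }

    public static void main(String[] args) {
        String withTable = "<table title=Departures><tr><td>10:15</td><td>Centrum</td></tr></table>";
        String withoutTable = "<p>Service unavailable</p>";
        System.out.println(cellCount(withTable, "table[title=Departures]"));
        System.out.println(cellCount(withoutTable, "table[title=Departures]"));
    }
}
```

In the original code the same guard would sit between the `first()` call and the `table.select("td")` call.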
