Extracting data using JSoup

Extracting data using JSoup - android

I am trying to extract the product name from Google Shopping (http://www.google.co.uk/m/products?q=5010459007289, the mobile site).
The product name always appears inside a span with class "owb63p", for example:
"<span class="owb63p">Highland Spring Sports Bottle 750 Ml</span>"
I am new to JSoup. I can connect to the URL and get the whole document, but I need help setting it up so that I only get the piece of information I need.

In JSoup it would look like this (the URL needs the http:// scheme, otherwise jsoup rejects it as malformed):
Document doc = Jsoup.connect("http://www.google.co.uk/m/products?q=5010459007289").get();
Element title = doc.select("span.owb63p").first();
System.out.println(title.text());

I don't like JSoup that much, but with the Jericho HTML Parser it would look like this:
Source source = new Source(new URL(sourceUrlString));
String content = source.getFirstElementByClass("owb63p").getContent().toString();

It looks like the JSoup examples have what you are looking for.

You could try
doc.select("span").get(0).text(); // text() returns the element's text; data() only covers script/style content
or you can simply iterate over multiple span tags...
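As a rough sketch of the "iterate over multiple span tags" idea, the snippet below parses an inline HTML string instead of fetching the live page, so it runs offline. It assumes jsoup is on the classpath; the class name ProductNames and the method extractNames are made up for illustration, and the sample markup is only modelled on the span from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ProductNames {
    // Hypothetical helper: collects the text of every <span class="owb63p"> in the HTML.
    static List<String> extractNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element span : doc.select("span.owb63p")) {
            names.add(span.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question; the second span must not match.
        String html = "<div><span class=\"owb63p\">Highland Spring Sports Bottle 750 Ml</span>"
                + "<span class=\"other\">ignored</span></div>";
        for (String name : extractNames(html)) {
            System.out.println(name);
        }
    }
}
```

With a live page you would swap Jsoup.parse(html) for Jsoup.connect(url).get() and keep the loop unchanged.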

Related

JSoup Screen Scraping With Many Divs

I have a page I want to scrape with Android, and the contents I want are located like this:
body
div#wrapper
div#mainContentArea
div#scheduleModule
div#scheduleDayView
div#scheduleDayViewScroll
div#scheduleItemContainer
div#eventContainer
div#SSPP_o090570*A*
div.eventInfo
p.eventText
span.eventInfoDefault
How can I access the span using jsoup?

If you don't want to be taken out in the streets and whipped for your transgressions, you will split up that block of text there.
Anyway, you want to find the span whose class is eventInfoDefault? Well:
Document site = Jsoup.connect("http://www.example.com").get();
Element span = site.select("span.eventInfoDefault").first();
//Proceed to do whatever you want with that below.
Source: http://jsoup.org/cookbook/extracting-data/selector-syntax
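If several spans on the page share that class, a descendant selector can narrow the match to the part of the hierarchy you care about. This is only a sketch: it assumes jsoup is on the classpath, that each element in the question's list is nested inside the one above it, and the class EventSpan, the method firstEventInfo and the sample markup are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EventSpan {
    // Hypothetical helper: returns the text of the first matching span, or null if none matches.
    static String firstEventInfo(String html) {
        Document doc = Jsoup.parse(html);
        // A space means "any descendant", '>' means "direct child".
        Element span = doc
                .select("div#eventContainer p.eventText > span.eventInfoDefault")
                .first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        // Cut-down stand-in for the page structure described in the question.
        String html = "<div id=\"eventContainer\"><div class=\"eventInfo\">"
                + "<p class=\"eventText\"><span class=\"eventInfoDefault\">Team practice</span></p>"
                + "</div></div>";
        System.out.println(firstEventInfo(html));
    }
}
```

Since first() returns null when nothing matches, the null check is worth keeping before calling text().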

iPhone/iPad/Android: Is there some api for testing web application accessibility?

Instead of going through VoiceOver or similar software, I want a function which can take an element-id as parameter and return the alt text or label so that I can validate whether the text is correct.
Any other suggestions welcome.

You could use HttpClient to fetch the HTML code from the web, use the jsoup library to parse it, and then read the attributes of the selected element. Download the jsoup jar and put it into the lib directory of your project.
Document doc = Jsoup.parse("..."); // ... is the string of HTML code
Element inputElement = doc.select("#...").first(); // ... is the id of your element; first() returns a single Element
String alt = inputElement.attr("alt"); // read the "alt" attribute
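To get closer to the "function which takes an element-id" from the question, something along these lines might work. It is a sketch under assumptions: jsoup is on the classpath, ids are simple tokens without spaces or quotes, and the fallback to a label with a matching for attribute is my guess at what "label" means here. The names AltText and altOrLabel are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AltText {
    // Hypothetical helper: returns the alt text of the element with the given id,
    // else the text of a <label for="id">, else null.
    static String altOrLabel(String html, String elementId) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(elementId);
        if (el == null) {
            return null; // no such element on the page
        }
        if (el.hasAttr("alt")) {
            return el.attr("alt");
        }
        // Assumes the id is a simple token that is safe to embed in a selector.
        Element label = doc.select("label[for=" + elementId + "]").first();
        return label == null ? null : label.text();
    }

    public static void main(String[] args) {
        String html = "<img id=\"logo\" alt=\"Company logo\">"
                + "<label for=\"email\">Email address</label><input id=\"email\">";
        System.out.println(altOrLabel(html, "logo"));
        System.out.println(altOrLabel(html, "email"));
    }
}
```

A test can then compare the returned string with the expected wording for each id.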

How to parse the XML which contains HTML contents

I am new to Android. Could you help me parse this XML, which contains HTML content like this:
<title>Jeff Mayweather: Floyd Sr showed a Sign of finally letting go of his Son, Passing Torch to Roger</title>
<summary type="html">
<p>By Shawn Craddick</p><p></p>
<p>Boxingsocialist had a chance to catch up with Floyd Mayweather's other uncle Jeff Mayweather. While Jeff stays busy at the gym he gave us some updates on his fighters as well as his thoughts on Brandon Rios, Gamboa, Floyd Mayweather Sr and Floyd Jr. meeting back together. Also he talked to us about a surprise boxing veteran he might be working with. Check out the interview below.</p>
<p><br></br> <span style="color: #ff6600;">BoxingSocialist</span>- What did you…</p> </summary>
I can parse the title field. To parse the summary field I check localname.equals("summary") in the RSS handler, but I cannot parse the content inside the summary field. Can anyone help me with this?

You can use jsoup to parse the HTML content in Java.
tutorial Link example
Cheers

try this one
android.text.Html.fromHtml(text).toString();

I once had a feed like this, with HTML data inside the tags. My solution was to ask the data provider to wrap the HTML in CDATA. So, if you have a say in how the XML is generated, consider this option.
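To show why the CDATA wrapping helps, here is a small sketch that reads such a summary back as one string. Note that it uses the standard DOM parser (javax.xml.parsers) rather than the SAX-style handler from the question, purely to keep the example short. The class SummaryReader, the method readSummary and the sample XML are made up for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class SummaryReader {
    // Hypothetical helper: returns the raw content of the first <summary> element.
    static String readSummary(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node summary = doc.getElementsByTagName("summary").item(0);
            // With CDATA the parser treats the HTML as plain character data,
            // so the markup comes back untouched as a single string.
            return summary == null ? null : summary.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry><title>Example title</title>"
                + "<summary type=\"html\"><![CDATA[<p>By Shawn Craddick</p>]]></summary></entry>";
        System.out.println(readSummary(xml));
    }
}
```

The returned string still contains the HTML tags, so it can be handed to an HTML parser or to Html.fromHtml afterwards.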

RSS feed parse from html file

I have the URL of an RSS feed (Click here). On that page you can see a title and a description. I need to parse it on Android. I have searched, but all the help I found is about the XML format, whereas here I want something like an HTML parser that lets me pull out a particular news description. Is there a way to do this? If so, please help me.
One more thing: while searching I found a link that may be useful, and it points me towards "Apache Feedparser". Is that the right way to go?

My advice is to use JSoup to parse the HTML.
It is very simple to use and well documented, you should not have too much trouble parsing your page.
EDIT, to point you in the right direction for parsing your page:
You should take a look at this documentation page.
You should be able to parse the title and article content with something like this:
Document doc = Jsoup.connect("http://.....").get();
String title = doc.select("h3").first().text(); //text in <h3> tag
String articleContent = doc.select("div.articleLeft p").text(); //text of the <p> elements nested in the <div class="articleLeft">
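One thing to watch in snippets like this: on an Elements result, toString() gives the HTML markup of the matched elements, while text() gives only their text. A minimal sketch of the difference, assuming jsoup is on the classpath and using an invented ArticleText class with inline sample markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ArticleText {
    // Hypothetical helper: combined text of all <p> elements inside div.articleLeft.
    static String paragraphText(String html) {
        Document doc = Jsoup.parse(html);
        Elements paragraphs = doc.select("div.articleLeft p");
        return paragraphs.text(); // text of each match, joined with spaces
    }

    public static void main(String[] args) {
        String html = "<div class=\"articleLeft\"><p>First paragraph.</p><p>Second paragraph.</p></div>"
                + "<p>Outside the article.</p>";
        // Text only, without tags.
        System.out.println(paragraphText(html));
        // Markup of the matched elements, tags included.
        System.out.println(Jsoup.parse(html).select("div.articleLeft p").toString());
    }
}
```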

Android JSoup Example

I was just wondering whether anyone has a sample Eclipse project with a working implementation of JSoup. I'm trying to use it to pull information from websites and have gone all over Google trying to get it to work, but can't. If anyone could help I'd really appreciate it.

JSoup is really easy to use; take a look at the examples in the JSoup cookbook.
First, you have to connect to the webpage you want to parse using:
Document doc = Jsoup.connect("http://example.com/").get();
Then, you can select page elements using the JSoup selector syntax.
For instance, say you want to select all the content of the div tags with the id attribute set to test, you just have to use:
Elements divs = doc.select("div#test");
to retrieve the divs, then you can iterate on them using:
for (Element div : divs) {
    System.out.println(div.text());
}
