Parsing an RSS feed from an HTML page - Android

I have the URL of an RSS feed (Click here). On that page you can see a title and a description. I need to parse it in Android. I have searched, but all the help I found is for the XML format. Here I want something like an HTML parser, so that I can extract the description of a particular news item. Is there any way to do this? If so, please help me.
One more thing: in my search I found that this link may be useful, and it suggests using "Apache Feedparser". Is that the right way?

My advice is to use JSoup to parse the HTML.
It is very simple to use and well documented, you should not have too much trouble parsing your page.
EDIT, to point you in the right direction for parsing your page:
You should take a look at this documentation page.
You should be able to parse the title and article content with something like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
Document doc = Jsoup.connect("http://.....").get();
String title = doc.select("h3").first().text(); // text of the first <h3> tag
String articleContent = doc.select("div.articleLeft p").text(); // text of the <p> elements nested in <div class="articleLeft">

Related

How do I pick selective html content in my webview in android?

I am trying to pull selected headlines out of the HTML content shown in my WebView. I am open to a wide variety of options, such as JSON parsing, and any hack will do. Does anyone have experience with this, or a rough idea of how to go about it?
Here's my example:
This is my html file content:
<div><h1><span class = "headline"> Some depressing title </span> <span class = "source" > ABCD </span> </h1> <br/> <span class = "body"> crappy body content which I do not need </span></div>
I just want to retrieve "headline" and "source" from this HTML in my WebView and nothing else (not the body). How do I define a parameter to retrieve these? Any clues on how to do it?
Thanks!
Step 1: get the HTML source from your WebView - see this question. You basically create a JS interface that extracts your HTML source to a Java String.
Step 2: Use an HTML parser (for example jsoup) to parse the Java String into a format that you can handle easily.
Step 3: Use the parser to extract the relevant information. Here, you could use getElementsByTag("span") to get all your spans, then filter by class; or you could directly use getElementsByClass("headline") and getElementsByClass("source").
In general, you can retrieve the HTML source and parse the DOM in all cases.
Edit: if you don't want to use a parser, you can extract your information by searching the HTML source string (finding the correct classes, then finding the indexes of the '<' and '>' characters to parse out the information). This way is harder, less efficient, and less flexible, but it can be done.
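To make the string-search fallback concrete, here is a minimal sketch in plain Java. The class and method names (SpanTextFinder, textOfSpan) are made up for this example, and it assumes the span holds plain text only, with the class value in double quotes. It is as fragile as described above and is not a substitute for a real parser.

```java
// Sketch of the string-search approach; SpanTextFinder and textOfSpan are
// hypothetical names, not part of any library.
class SpanTextFinder {

    /**
     * Returns the trimmed text that follows the first tag carrying the quoted
     * class value, or null if it cannot be found.
     * Assumes the element contains plain text only (no nested tags).
     */
    static String textOfSpan(String html, String className) {
        int attr = html.indexOf("\"" + className + "\""); // the quoted class value
        if (attr < 0) {
            return null;
        }
        int open = html.indexOf('>', attr);               // end of the opening tag
        if (open < 0) {
            return null;
        }
        int close = html.indexOf('<', open);              // start of the next tag
        if (close < 0) {
            return null;
        }
        return html.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        String html = "<div><h1><span class = \"headline\"> Some depressing title </span> "
                + "<span class = \"source\" > ABCD </span> </h1> <br/> "
                + "<span class = \"body\"> crappy body content which I do not need </span></div>";
        System.out.println(textOfSpan(html, "headline"));
        System.out.println(textOfSpan(html, "source"));
    }
}
```

Searching for the quoted value rather than for class="headline" sidesteps the spaces around the equals sign in the question's markup, but it will also match the same quoted word anywhere else in the page.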

Parsing content which contains html tags using XMLPullParser

I am building an app in android using XmlPullParser.
How can I get the content from an html formatted like this?
<div class="content">
"Some text is here."
<br>
"some more text "<a class="link" href="adress">continues here</a>
<br>
</div>
I want to parse all the content like this:
"Some text is here.
some more text continues here"
"continues here" part should also be hyperlinked.
ADDITION after some comments: The HTML is first put into Yahoo YQL, and YQL generates XML from it. I use the generated XML file in the code. The part I want to parse, shown above, is from the generated XML.
HTML and XML are different, although they share common syntax in some cases. I think using an XmlPullParser for that purpose is not a good idea. I recommend using one of the several Java HTML parsers instead.
XmlPullParser is meant to deal with XML. It's really rare to encounter well-structured XHTML pages on the web. An XML parser expects well-formed data and is not supposed to be fault tolerant. HTML, on the other hand, is usually loosely organized.
So, no, it's not a good idea. You should prefer other libraries like tagsoup or geronimo.
PS: the best way to ask a Stack Overflow question is to try something yourself first and then ask if you get blocked, not the other way around.

Getting content from Rss link tag for Android app

As I'm new to developing Android apps, I'm here for some help with creating one for a specific website. The task is to make a news app: there is supposed to be a menu with categories like Business, Fun, Sport etc., each with a list of titles and pubDates (a ListView), and when you click on an item the whole news article opens (title, date, images, content or similar).
Website has RSS with structure like this:
<?xml version="1.0" encoding="utf-8"?><rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<item>
<guid isPermaLink="true">http://www.balkanmagazin.net/novosti-i-politika/cid128-57413/putin-i-kiril-cvrsto-zajedno</guid>
<link>http://www.balkanmagazin.net/novosti-i-politika/cid128-57413/putin-i-kiril-cvrsto-zajedno</link>
<category>Novosti i politika</category>
<title>Putin i Kiril čvrsto zajedno</title>
<description><img alt="" style="" src="/Storage/Global/DynamicImage/cid-57413- 455-253-Kiril-Putin-ilu61cc0f32-1ea0-4049-9455-259d70fda69d.jpg" />
<p>Važnije od ljudskih prava su vera, moral, svetinje i otadžbina, smatra Ruska pravoslavna crkva, ušančena u rovu odbrane Rusije od Zapada zajedno s Kremljom<br/>(foto: patrijarh Kiril i predsednik Putin)</p>
</description>
<pubDate>Tue, 05 Feb 2013 01:12:12 +0100</pubDate>
</item>
...
</channel>
My question is: are these tags enough to provide the content and images of the articles? Is there any way to get them from the description or link tag, or does the RSS feed need to have a content tag for that?
Any help would be appreciated.
You want to familiarise yourself with a SAX parser to read in the data.
A good tutorial I have personally used for SAX parsing is here.
With SAX parsing you will easily be able to extract the different fields from the RSS feed and add them to a custom object.
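As a rough illustration of that idea, here is a minimal sketch using the SAX classes that ship with Java (javax.xml.parsers and org.xml.sax). RssSaxSketch and NewsItem are names invented for this example, and it assumes the HTML inside description is escaped or wrapped in CDATA so that the feed is well-formed XML. It matches on qName; depending on how namespace handling is configured in your parser, the element name may arrive in localName instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// RssSaxSketch and NewsItem are hypothetical names used only for this example.
class RssSaxSketch {

    /** Plain holder for the fields of one <item>. */
    static class NewsItem {
        String title;
        String link;
        String description;
        String pubDate;
    }

    /** Collects every <item> of the feed into a NewsItem. */
    static List<NewsItem> parse(String xml) {
        final List<NewsItem> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private NewsItem current;                              // null while outside <item>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("item".equals(qName)) {
                    current = new NewsItem();
                }
                text.setLength(0);                                 // collect text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);                    // may be called several times
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current == null) {
                    return;                                        // channel-level element, ignored
                }
                String value = text.toString().trim();
                switch (qName) {
                    case "title":       current.title = value; break;
                    case "link":        current.link = value; break;
                    case "description": current.description = value; break;
                    case "pubDate":     current.pubDate = value; break;
                    case "item":        items.add(current); current = null; break;
                    default:            break;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the feed", e);
        }
        return items;
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\"><channel><item>"
                + "<title>Example title</title>"
                + "<link>http://example.com/article</link>"
                + "<description><![CDATA[<p>Example body</p>]]></description>"
                + "<pubDate>Tue, 05 Feb 2013 01:12:12 +0100</pubDate>"
                + "</item></channel></rss>";
        for (NewsItem item : parse(xml)) {
            System.out.println(item.title + " | " + item.pubDate);
        }
    }
}
```

The list of NewsItem objects can then back the ListView, with title and pubDate shown per row and the link kept for the detail screen.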
You can get hold of the link tag and send an HTTP request to download the actual content at that URL, and then display or parse it separately in a similar way. (HTML parsing can be trickier than RSS/XML, but it is still possible.)
Extracting images etc. from the description is trickier, but you could, for example, search the String you extracted for the description for the "src=" attribute and take the text after it with String.substring() until you reach the closing quote of the attribute value.
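A minimal sketch of that src search, in plain Java. ImageSrcFinder, firstImageSrc and resolve are hypothetical names, and the code assumes the attribute value is wrapped in double quotes. Because the src in the feed above is a site-relative path, the sketch also resolves it against a base URL with java.net.URI.

```java
import java.net.URI;

// ImageSrcFinder is a hypothetical helper for this example only.
class ImageSrcFinder {

    /** Returns the value of the first src="..." attribute, or null if there is none. */
    static String firstImageSrc(String descriptionHtml) {
        String marker = "src=\"";
        int attr = descriptionHtml.indexOf(marker);
        if (attr < 0) {
            return null;
        }
        int start = attr + marker.length();               // first character of the value
        int end = descriptionHtml.indexOf('"', start);     // closing quote of the value
        if (end < 0) {
            return null;
        }
        return descriptionHtml.substring(start, end);
    }

    /** Turns a relative path such as /Storage/x.jpg into an absolute URL. */
    static String resolve(String baseUrl, String src) {
        return URI.create(baseUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String description = "<img alt=\"\" style=\"\" src=\"/Storage/Global/photo.jpg\" />"
                + "<p>Some text</p>";
        String src = firstImageSrc(description);
        System.out.println(resolve("http://www.balkanmagazin.net/", src));
    }
}
```

Note that this plain search would also match attributes whose names merely end in src, and URI.create rejects values containing spaces, so real feed data may need cleaning first.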
Hope this helps set you on your way.

How to parse the XML which contains HTML contents

I am new to Android. Could you help me parse this XML, which contains HTML content like this:
<title>Jeff Mayweather: Floyd Sr showed a Sign of finally letting go of his Son, Passing Torch to Roger</title>
<summary type="html">
<p>By Shawn Craddick</p><p></p>
<p>Boxingsocialist had a chance to catch up with Floyd Mayweather's other uncle Jeff Mayweather. While Jeff stays busy at the gym he gave us some updates on his fighters as well as his thoughts on Brandon Rios, Gamboa, Floyd Mayweather Sr and Floyd Jr. meeting back together. Also he talked to us about a surprise boxing veteran he might be working with. Check out the interview below.</p>
<p><br></br> <span style="color: #ff6600;">BoxingSocialist</span>- What did you…</p> </summary>
I can parse the title field. To parse the summary field I check localname.equals("summary") in the RSS handler, but I cannot get the content of the summary field. Can anyone help me with this?
You can use jsoup to parse the HTML content in Java.
tutorial Link example
Cheers
Try this:
android.text.Html.fromHtml(text).toString();
I once had a feed like this, with HTML data inside the tags. My solution was to ask the data provider to wrap the HTML in CDATA. So, if you have any control over how the XML is produced, consider this option.
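To show what the CDATA wrapping changes, here is a small sketch using the DOM parser bundled with Java. CdataSummarySketch and summaryHtml are hypothetical names, and the entry markup is a shortened stand-in for the feed in the question. With CDATA the parser hands back the markup as one string. Without it, the p tags are parsed as child elements, which is likely why a handler that only looks at summary itself sees none of the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// CdataSummarySketch is a hypothetical name used only for this example.
class CdataSummarySketch {

    /** Returns the text content of the first <summary> element, or null if absent. */
    static String summaryHtml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node summary = doc.getElementsByTagName("summary").item(0);
            return summary == null ? null : summary.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the entry", e);
        }
    }

    public static void main(String[] args) {
        // HTML wrapped in CDATA: the markup survives as one string.
        String wrapped = "<entry><summary type=\"html\">"
                + "<![CDATA[<p>By Shawn Craddick</p>]]></summary></entry>";
        // HTML left as child elements: only the text nodes are returned.
        String nested = "<entry><summary type=\"html\">"
                + "<p>By Shawn Craddick</p></summary></entry>";
        System.out.println(summaryHtml(wrapped));
        System.out.println(summaryHtml(nested));
    }
}
```

The string from the CDATA case can then be passed to something like android.text.Html.fromHtml or jsoup for display or further parsing.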

Extracting data using JSoup

I am trying to extract product name information from Google Shopping (http://www.google.co.uk/m/products?q=5010459007289, phone website).
The product name always appears inside the span with class "owb63p", for example
"<span class="owb63p">Highland Spring Sports Bottle 750 Ml</span>"
I am new with JSoup, I can connect with the URL and get the whole document, but I just need help setting it up so that I only get the piece of information I need.
In JSoup it will be like:
Document doc = Jsoup.connect("http://www.google.co.uk/m/products?q=5010459007289").get();
Element title = doc.select("span.owb63p").first();
System.out.println(title.text());
I don't like JSoup that much, but with the Jericho HTML Parser it would look like:
Source source=new Source(new URL(sourceUrlString));
String content=source.getFirstElementByClass( "owb63p" ).getContent().toString();
It looks like the JSoup examples have what you are looking for.
You could try
doc.select("span").get(0).text();
or you can simply iterate over multiple span tags...
