Android rss feed parsing

I am new to Android. In my application I have to parse data and display it on screen, but I can't parse the data in one particular tag because it also contains some special characters. My code is below.
My parser function:
protected ArrayList<String> doInBackground(Context... params)
{
    // context = params[0];
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    items = new ArrayList<String>();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new java.net.URL("input URL_confidential").openConnection().getInputStream());
        //Document document = builder.parse(new URL("http://www.gamestar.de/rss/gamestar.rss").openConnection().getInputStream());
        Element root = document.getDocumentElement();
        NodeList docItems = root.getElementsByTagName("item");
        Node nodeItem;
        for(int i = 0;i<docItems.getLength();i++)
        {
            nodeItem = docItems.item(i);
            if(nodeItem.getNodeType() == Node.ELEMENT_NODE)
            {
                NodeList element = nodeItem.getChildNodes();
                // child 0 is expected to be <title>, child 2 <description>
                name=(element.item(0).getFirstChild().getNodeValue());
                // System.out.println("description = "+element.item(2).getFirstChild().getNodeValue().replaceAll("<div><p>"," "));
                System.out.println("Description"+Jsoup.clean(org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(element.item(2).getFirstChild().getNodeValue()), new Whitelist()));
                items.add(name);
            }
        }
    }
    catch (ParserConfigurationException e)
    {
        e.printStackTrace();
    }
    catch (MalformedURLException e)
    {
        e.printStackTrace();
    }
    catch (SAXException e)
    {
        e.printStackTrace();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return items;
}
Input:
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>my application</title>
<link>http:// some link</link>
<atom:link href="http:// XXXXXXXX" rel="self"></atom:link>
<language>en-us</language>
<lastBuildDate>Thu, 20 Dec 2012</lastBuildDate>
<item>
<title>lllegal settlements</title>
<link>http://XXXXXXXXXXXXXXXX</link>
<description> &lt;div&gt;&lt;p&gt;
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israelâ€™s announcement of new construction activity in Palestinian territories and demand immediate dismantling of the â€œillegalâ€ settlements.
&lt;/p&gt;
&lt;p&gt;
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel â€œgravely threatens efforts to establish a viable Palestinian state.â€
&lt;/p&gt;
&lt;p&gt;
</description>
</item>
</channel>
</rss>
Output:
lllegal settlements ----> title tag text
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel announcement of new construction activity in Palestinian territories and demand immediate dismantling of the illegal settlements. -----> description tag text
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel gravely threatens efforts to establish a viable Palestinian state. ----> description tag text.
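A side note on the loop in the question: nodeItem.getChildNodes().item(0) is only the title element when the feed has no whitespace between tags. In a pretty-printed feed, child 0 is usually a newline text node, and getFirstChild() on that returns null. A minimal JDK-only sketch that looks each child up by tag name instead; RssItems and its methods are illustrative names, and the sample feed is a trimmed version of the input above with the description markup entity-escaped:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RssItems {
    // Text of the first <tag> element below parent, or "" when the tag is missing.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
    }

    // One {title, description} pair per <item>, found by tag name rather than
    // by child position, so whitespace text nodes between tags do not matter.
    static List<String[]> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // getTextContent() has already decoded &lt; and &gt;, so the
                // description is now raw HTML markup such as "<div><p>...".
                result.add(new String[] {
                        childText(item, "title").trim(),
                        childText(item, "description").trim()});
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<rss version=\"2.0\"><channel>\n<title>my application</title>\n"
                + "<item>\n<title>lllegal settlements</title>\n"
                + "<description> &lt;div&gt;&lt;p&gt;Hello&lt;/p&gt;</description>\n"
                + "</item>\n</channel></rss>";
        for (String[] entry : parse(xml)) {
            System.out.println(entry[0] + " -> " + entry[1]);
        }
    }
}
```

The description still contains HTML tags after this step, which is what the answers below deal with.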

Your text node contains both escaped HTML entities (&gt; is >, greater than) and garbage characters (â€œgrosslyâ€). You should first fix the encoding to match your input source; then you can unescape the HTML with Apache Commons Lang StringEscapeUtils.unescapeHtml4(String).
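The garbage characters look like UTF-8 bytes that were decoded as Windows-1252 somewhere upstream: the right single quote (U+2019) is E2 80 99 in UTF-8, and those three bytes read as Windows-1252 are â, € and ™. A minimal sketch of undoing that, assuming the text was mis-decoded exactly once as Windows-1252 and that the runtime provides that charset; fixing the encoding where the feed is produced is still the better option. The closing quote (E2 80 9D) usually cannot be recovered this way, because byte 0x9D has no Windows-1252 character and tends to be lost, which would explain the bare â€ in the feed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeFix {
    // Assumption: the text is UTF-8 that was wrongly decoded once as Windows-1252.
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Turns the garbled characters back into their original bytes,
    // then decodes those bytes the way they were meant to be read: as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(WINDOWS_1252);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Israel" + the three mojibake characters for U+2019 + "s announcement"
        String garbled = "Israel\u00e2\u20ac\u2122s announcement";
        String fixed = repair(garbled);
        System.out.println(fixed.equals("Israel\u2019s announcement"));
    }
}
```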
This method (hopefully) returns an XML string which you can query (for example with XPath) to extract the text node you want, or you can pass the whole string to Jsoup or to Android's Html class:
// Jsoup: "html" is the unescaped string; returns the plain text
Jsoup.parse(html).text();
// Android
android.text.Html.fromHtml(html).toString();
Test program (Jsoup and Commons Lang required):
package stackoverflow;

import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class EmbeddedHTML {
    public static void main(String[] args) {
        // The raw <description> element: HTML escaped as entities, plus mis-encoded quotes
        String src = "<description> &lt;div&gt;&lt;p&gt; An independent" +
            " inquiry into the September 11 attack on the US Consulate" +
            " in Benghazi that killed the US ambassador to Libya and" +
            " three other Americans has found that systematic failures" +
            " at the State Department led to â€œgrosslyâ€ inadequate" +
            " security at the mission. &lt;/p&gt;</description>";
        // &lt;div&gt; becomes <div>, and so on
        String unescaped = StringEscapeUtils.unescapeHtml4(src);
        // An empty Whitelist allows no tags, so only the text remains
        // (newer jsoup versions name this class Safelist)
        System.out.println(Jsoup.clean(unescaped, new Whitelist()));
    }
}
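The answer also suggests querying the unescaped markup with XPath. That only works when the fragment is well-formed XML; the description in the question ends with an unclosed p tag, so a lenient HTML parser such as Jsoup is the safer choice there. A minimal JDK-only sketch, assuming a well-formed fragment, that wraps it in a hypothetical root element and collects the text of every p:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParagraphText {
    // Returns the trimmed text of every <p> in a well-formed (X)HTML fragment.
    static List<String> paragraphs(String fragment) {
        try {
            // Wrapping in <root> lets fragments with several top-level elements parse.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            NodeList ps = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//p", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < ps.getLength(); i++) {
                texts.add(ps.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            // Malformed markup, such as an unclosed <p>, ends up here.
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<div><p>\nFirst paragraph.\n</p>\n<p>Second paragraph.</p></div>";
        for (String text : paragraphs(html)) {
            System.out.println(text);
        }
    }
}
```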

Is there anything wrong with simply replacing the offending characters?
string = string.replaceAll("<", "");
string = string.replaceAll("div>", "");
string = string.replaceAll("p>", "");
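One catch with these three replacements is that they only know about div and p, and closing tags leave their slash behind: "<div><p>Hi</p></div>" comes out as "Hi//". A minimal sketch of a more general variant, which swaps in a single regular expression that removes anything shaped like a tag. It is still a rough heuristic (it does not decode entities and will mangle text that legitimately contains "<"), so a real HTML parser remains the more robust route:

```java
class StripTags {
    // The three replacements from the answer, kept for comparison.
    static String answerVersion(String s) {
        s = s.replaceAll("<", "");
        s = s.replaceAll("div>", "");
        s = s.replaceAll("p>", "");
        return s;
    }

    // Replaces every "<...>" run with a space (so "</p><p>" does not glue
    // paragraphs together), then collapses runs of whitespace.
    static String regexVersion(String s) {
        return s.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<div><p>Hi</p></div>";
        System.out.println(answerVersion(html)); // Hi//
        System.out.println(regexVersion(html));  // Hi
    }
}
```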

Run the node value through Html.fromHtml() two or three times and it will be fine.
EXPLANATION: The built-in Html.fromHtml() method converts wild and broken HTML into usable content. For example:
String sHTML = node.getNodeValue();
sHTML = Html.fromHtml(sHTML).toString();
sHTML = Html.fromHtml(sHTML).toString();
sHTML = Html.fromHtml(sHTML).toString();
By the third or fourth pass, unreadable content will become readable again. You can then display it in a TextView, or load it into a WebView with loadData().
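Repeated passes only help when the text was escaped more than once (for example "&amp;lt;p&amp;gt;"): each pass removes one layer. A minimal JDK-only sketch of that idea, using a hypothetical unescapeOnce helper that knows just five basic entities as a stand-in for Html.fromHtml (which handles far more, and also strips tags), and looping until the text stops changing instead of guessing the number of passes:

```java
class RepeatedUnescape {
    // Hypothetical minimal decoder for five entities. &amp; is handled last so that
    // "&amp;lt;" becomes "&lt;" in one pass rather than jumping straight to "<".
    static String unescapeOnce(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    // Applies unescapeOnce until a pass changes nothing (capped at 10 passes).
    static String unescapeFully(String s) {
        for (int pass = 0; pass < 10; pass++) {
            String next = unescapeOnce(s);
            if (next.equals(s)) {
                return next;
            }
            s = next;
        }
        return s;
    }

    public static void main(String[] args) {
        String doubleEscaped = "&amp;lt;p&amp;gt;Hello&amp;lt;/p&amp;gt;";
        System.out.println(unescapeOnce(doubleEscaped));  // &lt;p&gt;Hello&lt;/p&gt;
        System.out.println(unescapeFully(doubleEscaped)); // <p>Hello</p>
    }
}
```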

Related

How to get Google search headings with Jsoup

I am trying to get the headings of Google search results with Jsoup.
Here is my code:
String request = "https://www.google.com/search?q=" + query + "&num=5";
try {
Document doc = Jsoup
.connect(request)
.userAgent(
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
.timeout(5000).get();
Elements headings = doc.select("h3");
//headings array is empty
} catch (IOException e) {
e.printStackTrace();
}
I get no results from doc.select("h3"). What am I doing wrong?

Check your Document's content (for example, print doc.html()): perhaps the request didn't go through properly, or the page you get back is different from what your browser shows.
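One way to check what actually came back is to look at the HTTP status and the start of the raw body before selecting anything. With Jsoup, Connection.execute() returns a Connection.Response whose statusCode() and body() give that information. A minimal JDK-only sketch of the same check; InspectResponse and its methods are illustrative names, the User-Agent is the one from the question, and decoding the body as UTF-8 is an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class InspectResponse {
    // First maxChars characters of the body, enough to see what kind of page arrived.
    static String preview(String body, int maxChars) {
        return body.length() <= maxChars ? body : body.substring(0, maxChars) + "...";
    }

    // Fetches url with the given User-Agent, prints the status code, returns the body.
    static String fetch(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        int status = conn.getResponseCode();
        System.out.println("HTTP status: " + status);
        // Error responses (4xx/5xx) come from the error stream, which may be null.
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream stream = in) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        // Assumption: the page is UTF-8; check the Content-Type header if it is not.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java InspectResponse <url>");
            return;
        }
        String body = fetch(args[0],
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
        System.out.println("contains <h3: " + body.contains("<h3"));
        System.out.println(preview(body, 500));
    }
}
```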

Android Jsoup find specific element

I want to parse an IMDb page to get the movie rating. (Please don't suggest APIs.) This is my code so far:
private static class getData extends AsyncTask<String, String, Void>
{
String url = "https://www.imdb.com/title/tt0437086/";
@Override
protected Void doInBackground(String... strings) {
try {
Document document = Jsoup.connect(url).get();
Elements img = document.select("span");
}
catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
I get all the span elements, but do I need to loop through all of them to find the rating?
What I need is the rating from this line (specifically the rating itself):
<span itemprop="ratingValue">7.5</span>
How can I get the rating without looping through all the elements?

You can select a specific span
Elements img = document.select("span[itemprop=ratingValue]");
Log.e("TEST", "Result: " + img.text());
I tested it here and it correctly prints 7.5.
You can find more info about the selector syntax here
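The selector span[itemprop=ratingValue] picks elements by attribute value. For comparison, a minimal JDK-only sketch of the same lookup with XPath; it only works on well-formed markup (real HTML pages usually are not), so in an app Jsoup's selector is the practical choice. The class name is illustrative and the sample snippet is built around the line quoted in the question:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RatingLookup {
    // Text of the first <span itemprop="ratingValue"> in document order, or "" if absent.
    static String rating(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // XPath counterpart of the CSS selector span[itemprop=ratingValue];
            // string() of a node-set uses the first node in document order.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("string(//span[@itemprop='ratingValue'])", doc)
                    .trim();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<div><span itemprop=\"name\">Movie</span>"
                + "<span itemprop=\"ratingValue\">7.5</span></div>";
        System.out.println(rating(page)); // 7.5
    }
}
```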

Retrieving information from google page

I'm thinking about making my first Android app. It would be about movies, and I found an excellent data source, "http://www.google.com/movies?", but I want to know how I could extract this information and put it in my app.
I've searched, but I don't know the best way to do this. Does Google have an API for this? Is that what I want? Is it better to work with the page source? What could I read or watch to learn how to do this?
Thanks a lot, guys. This is also my first time programming something that retrieves information from the cloud.
Cheers

Yup. Here is one way to do it.
First, you need to find the right YQL (Yahoo Query Language) query. The YQL Console on the Yahoo Developer Network is a great place to look for this sort of stuff. It has EVERYTHING. The way these resources work is that you have a long link, like this....
developer.yahoo.com/blah/this . . . &q=KEYWORD_HERE+blah/ . . .
To access the information you are looking for, you put the correct keyword where "KEYWORD_HERE" is, and the link will give you the info in XML format. I'll use a stocks app as the example.
First you create an Activity and define both halves of your link as strings. It'll look a bit like this:
public class InfoActivity extends Activity {
    private static final String TAG = "InfoActivity"; // used by the Log calls below
    String firstHalf = "http://query.yahooapis.com/v1/public/blahblahblah&q=";
    String secondHalf = "+blah/blah&blah . . . ";
Then, in your onCreate(), start an AsyncTask to do the actual pulling and parsing:
    @Override
    protected void onCreate(Bundle bundle) {
        super.onCreate(bundle);
        setContentView(R.layout.layout_name);
        // KEYWORD_HERE stands for whatever you are searching for
        final String yqlURL = firstHalf + KEYWORD_HERE + secondHalf;
        new MyAsyncTask().execute(yqlURL);
    }
Then define MyAsyncTask:
    private class MyAsyncTask extends AsyncTask<String, String, String> {
        @Override
        protected String doInBackground(String... args) {
            try {
                URL url = new URL(args[0]);
                HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
                int responseCode = httpConnection.getResponseCode();
                // 200 (HTTP_OK) means a good connection
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    InputStream in = httpConnection.getInputStream();
                    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                    DocumentBuilder db = dbf.newDocumentBuilder();
                    Document dom = db.parse(in);
                    Element docEle = dom.getDocumentElement();
                    NodeList nl = docEle.getElementsByTagName("nodeName1");
                    for (int i = 0; i < nl.getLength(); i++) {
                        Element item = (Element) nl.item(i);
                        // Parse each node here with getTextValue(item, "Name of element")
                        // ex: String movieName = getTextValue(item, "MovieName");
                    }
                }
            } catch (MalformedURLException e) {
                Log.d(TAG, "MalformedURLException", e);
            } catch (IOException e) {
                Log.d(TAG, "IOException", e);
            } catch (ParserConfigurationException e) {
                Log.d(TAG, "Parser Configuration Exception", e);
            } catch (SAXException e) {
                Log.d(TAG, "SAX Exception", e);
            }
            return null;
        }
    }
} // end of InfoActivity
I hope that gives you some idea of how to do this sort of thing. I'll go see if I can quickly find a good resource on the Yahoo APIs for getting movie times at a given location.
Good luck :) Let me know if you need anything clarified.
EDIT:
Looks like this is EXACTLY what you need (resource wise):
https://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys#h=select+*+from+google.igoogle.movies+where+movies%3D'68105'%3B
Check that out. Using that, your two halves of the link would be:
String firstHalf = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20google.igoogle.movies%20where%20movies%3D'";
String secondHalf = "'%3B&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
And then to get your final link, you would just do
String yqlURL = firstHalf + "ZIP CODE OF YOUR LOCATION" + secondHalf;
And you would have all of the movies playing near you returned!
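The loop in MyAsyncTask calls a getTextValue helper that the answer never defines. A minimal sketch of what such a helper could look like: the name comes from the answer's comment, but the body is an assumption, and nodeName1 and MovieName are the answer's own placeholder tag names. The collect method runs the same loop on an XML string so it can be tried outside Android:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class TextValues {
    // Hypothetical helper: text of the first <tagName> inside element, or null if absent.
    static String getTextValue(Element element, String tagName) {
        NodeList nodes = element.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            return null;
        }
        return nodes.item(0).getTextContent();
    }

    // Same loop as in doInBackground: one value of <field> per <itemTag> element.
    static List<String> collect(String xml, String itemTag, String field) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nl = dom.getDocumentElement().getElementsByTagName(itemTag);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                String value = getTextValue((Element) nl.item(i), field);
                if (value != null) {
                    values.add(value);
                }
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<results><nodeName1><MovieName>First</MovieName></nodeName1>"
                + "<nodeName1><MovieName>Second</MovieName></nodeName1></results>";
        System.out.println(collect(xml, "nodeName1", "MovieName")); // [First, Second]
    }
}
```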

Make your life a lot easier and choose the API that is right for you. Choose one of these:
http://www.programmableweb.com/news/52-movies-apis-rovi-rotten-tomatoes-and-internet-video-archive/2013/01/22
Make your decision based not only on the content, but also on ease of use and documentation. Documentation is a big one.
Good luck!

Well, I would rather advise you to use the TheMovieDB.com API; it is simple and provides all kinds of movie information.

Get random words from android dictionary

I am learning Android, and I would like to know if there is a way to get random 3-letter words, 4-letter words, or some other specific type of word from the Android UserDictionary class. Since Android has an autocorrect feature, I'm guessing it also has a dictionary built in, so how do I use it, and where can I find a proper tutorial?
I have no idea about the code and have searched around a lot. Please help me with the code, and an explanation if possible :)

I don't know how to access the Android dictionary, but you can ship a "custom" dictionary as a txt file in the app's assets folder. This link has several word lists ranging from around 20,000 to 200,000 words. You could find more lists with Google.
Afterwards, you can read the txt file and add each word to an ArrayList if it matches the word length. A random word can then be selected from that list. The following code creates the dictionary and selects a random word from it.
private ArrayList<String> dictionary;
private int wordLength; //Set elsewhere
private void createDictionary(){
    dictionary = new ArrayList<String>();
    BufferedReader dict = null; //Holds the dictionary file
    AssetManager am = this.getAssets();
    try {
        //dictionary.txt should be in the assets folder.
        dict = new BufferedReader(new InputStreamReader(am.open("dictionary.txt")));
        String word;
        while((word = dict.readLine()) != null){
            if(word.length() == wordLength){
                dictionary.add(word);
            }
        }
    } catch (IOException e) {
        // also covers FileNotFoundException when the asset is missing
        e.printStackTrace();
    } finally {
        // dict is still null if the asset could not be opened
        if (dict != null) {
            try {
                dict.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
//Precondition: the dictionary has been created and is not empty.
private String getRandomWord(){
    return dictionary.get((int)(Math.random() * dictionary.size()));
}
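The same idea without Android's AssetManager, as a minimal sketch that reads one word per line from any Reader (on Android that would be the InputStreamReader around am.open("dictionary.txt")), keeps the words of the requested length, and picks one with java.util.Random. WordPicker and its methods are illustrative names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class WordPicker {
    // Reads one word per line and keeps the words with exactly wordLength characters.
    static List<String> wordsOfLength(Reader source, int wordLength) {
        List<String> words = new ArrayList<>();
        // try-with-resources closes the reader even when reading fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (word.length() == wordLength) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Picks a uniformly random element; fails clearly instead of on an empty list.
    static String randomWord(List<String> words, Random random) {
        if (words.isEmpty()) {
            throw new IllegalStateException("No words of the requested length");
        }
        return words.get(random.nextInt(words.size()));
    }

    public static void main(String[] args) {
        List<String> three = wordsOfLength(new StringReader("cat\nhorse\ndog\nbird\n"), 3);
        System.out.println(three);                           // [cat, dog]
        System.out.println(randomWord(three, new Random())); // cat or dog
    }
}
```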

How to organize extracted values when working with jsoup?

How do you store the values extracted with jsoup so that they are easy to read later? Say you have HTML code like the one below.
<td width="200">country1 </td>
<td width="200">country2 </td>
<td width="200">country3 </td>
I want to save the countries and the href link for each one, and later be able to read them easily. Currently I have two ListViews, one for the countries and one for the href links. If the user selects, for example, country2, I find its index and use it to get the href link from the other ListView. I feel this method is not good; how do you do it?
This is my jsoup code, in case it needs improvement too:
try {
doc = Jsoup.connect("http://somesite.com").get();
// Here to get the names inside tag a
Elements links = doc.select("a");
for (Element el : links) {
String linkName = el.ownText();
//Save all the links into String Array.
array_link.add(linkName);
}
//Here to get the names inside tag td
Elements linktwo = doc.select("td");
for (Element eltwo : linktwo) {
linkText = eltwo.ownText();
//Save the countries to String Array
array_countries.add(linkText);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Thank you!

Is this what you want?
try {
Document doc = Jsoup.connect("http://somesite.com").get();
// Here to get the names inside tag a
Elements links = doc.select("a");
Elements linktwo = doc.select("td");
String eltwo = null;
int i = 0;
for (Element el : links) {
eltwo = linktwo.get(i).text();
//Save all the links into String Array.
array_link.add(el.text());
array_countries.add(eltwo);
i++;
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
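Instead of two parallel lists, one option is to keep each country together with its link in a small value object and back a single ListView with that list, so the clicked position gives you both values at once. A minimal sketch; CountryLink is a hypothetical class, and pair() stops at the shorter list, whereas the index-based loop above throws IndexOutOfBoundsException when the page has fewer td cells than links. If each link sits inside its cell, you could instead fill it straight from one a element with Jsoup's el.text() and el.attr("href").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CountryLinks {
    // Hypothetical value object holding one country and its link together.
    static final class CountryLink {
        final String name;
        final String href;

        CountryLink(String name, String href) {
            this.name = name;
            this.href = href;
        }

        // ArrayAdapter shows toString() by default, so the list displays the name.
        @Override
        public String toString() {
            return name;
        }
    }

    // Pairs names[i] with hrefs[i]; stops at the shorter of the two lists.
    static List<CountryLink> pair(List<String> names, List<String> hrefs) {
        int n = Math.min(names.size(), hrefs.size());
        List<CountryLink> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(new CountryLink(names.get(i).trim(), hrefs.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<CountryLink> rows = pair(
                Arrays.asList("country1 ", "country2 "),
                Arrays.asList("/c1", "/c2"));
        CountryLink selected = rows.get(1); // e.g. the position passed to onItemClick
        System.out.println(selected.name + " -> " + selected.href); // country2 -> /c2
    }
}
```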
