I want to parse an IMDb page to get a movie's rating. (Please do not suggest APIs.) This is my code so far:
private static class getData extends AsyncTask<String, String, Void>
{
    String url = "https://www.imdb.com/title/tt0437086/";

    @Override
    protected Void doInBackground(String... strings) {
        try {
            Document document = Jsoup.connect(url).get();
            // Selects every <span> element on the page
            Elements img = document.select("span");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}
I get all the spans, but do I need to loop over all of them to find the rating?
What I need is the rating from this line (specifically the rating itself):
<span itemprop="ratingValue">7.5</span>
How can I get the rating without cycling through all the elements?
You can select that specific span with an attribute selector:
Elements img = document.select("span[itemprop=ratingValue]");
Log.e("TEST", "Result: " + img.text());
I tested here and it is properly printing 7.5
You can find more information in the Jsoup selector syntax documentation.
I know there are a lot of questions on this, but none of the answers helped me.
I am trying to parse football news from a well-known Ukrainian portal and put it into my ListView.
I parsed the "news-feed" class:
class ParseTitle extends AsyncTask<Void, Void, HashMap<String, String>> {
    @Override
    protected HashMap<String, String> doInBackground(Void... params) {
        HashMap<String, String> hashMap = new HashMap<>();
        try {
            Document document = Jsoup.connect("http://football.ua/england.html").get();
            Elements elements = document.select(".news-feed");
            for (Element element : elements) {
                Element element1 = element.select("a[href]").first();
                // "abs:href" resolves the link against the page URL
                hashMap.put(element.text(), element1.attr("abs:href"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return hashMap;
    }
}
Use
Elements elements = document.select("article.news-feed");
Instead of
Elements elements = document.select(".news-feed");
EDIT: Comparing my code to yours, I see some notable differences. First, and I think most important, you accumulate the values you read in a HashMap, whereas I use a StringBuffer. I then connect and proceed like this:
try {
    doc = Jsoup.connect("http://football.ua/england.html")
            .userAgent("yourPersonalizedUA")
            .timeout(0) // zero is treated as an infinite timeout
            .ignoreHttpErrors(true)
            .get();
    topicList = doc.select("article.news-feed");
    for (Element topic : topicList) {
        myString += topic.html();
    }
} catch (IOException e) {
    System.out.println("io - " + e);
}
buffer.append(myString);
Then, if everything worked:
return buffer.toString();
This presumes you have already declared the following at the beginning:
private Document doc;
private String myString = "";
private StringBuffer buffer = new StringBuffer();
private Elements topicList;
I'm not sure if this helps, but it may offer a new perspective. Have you succeeded in parsing another page with your code?
I'm thinking about making my first Android app. It would be about movies, and I found an excellent data source, "http://www.google.com/movies?", but I want to know how I could extract this information and put it in my app.
I've searched, but I don't know the best way to do this. Does Google have an API for this? Is that what I want? Is it better to work with the page source? What could I read or watch to learn how to do this?
Thanks a lot, guys. This is also my first time writing code that retrieves information from the cloud.
Cheers
Yup. Here is one way to do it.
First, you need to find a source for the data. The Yahoo Developer Console is a great place to look for this sort of thing; it exposes a very wide range of tables that you query with YQL, an SQL-like language. The way these resources work is that you have a long link, like this:
developer.yahoo.com/blah/this . . . &q=KEYWORD_HERE+blah/ . . .
To access the information you are looking for, you put the correct keyword where "KEYWORD_HERE" is, and the link returns the results as XML, which is what the parsing code below expects. I'll be doing the example as a stocks app.
First you create an Activity and define both sides of your link as strings. It'll look a bit like this:
public class InfoActivity extends Activity {
String firstHalf = "http://query.yahooapis.com/v1/public/blahblahblah&q=";
String secondHalf = "+blah/blah&blah . . . ";
Then, in your onCreate, you'll need to start an AsyncTask to do the actual pulling and parsing:
protected void onCreate(Bundle bundle) {
    super.onCreate(bundle);
    // setContentView expects a layout resource (R.layout), not a view id (R.id)
    setContentView(R.layout.layout_name);
    final String yqlURL = firstHalf + KEYWORD_HERE + secondHalf;
    new MyAsyncTask().execute(yqlURL);
}
Then we define MyAsyncTask:
private class MyAsyncTask extends AsyncTask<String, String, String>{
protected String doInBackground(String... args) {
try {
URL url = new URL(args[0]);
URLConnection connection;
connection = url.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection)connection;
int responseCode = httpConnection.getResponseCode();
// Tests if responseCode == 200 Good Connection
if (responseCode == HttpURLConnection.HTTP_OK) {
InputStream in = httpConnection.getInputStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(in);
Element docEle = dom.getDocumentElement();
NodeList nl = docEle.getElementsByTagName("nodeName1");
if (nl != null && nl.getLength() > 0) {
for (int i = 0 ; i < nl.getLength(); i++) {
// Parse the node here with getTextValue((Element) nl.item(i), "Name of element")
// ex: String movieName = getTextValue((Element) nl.item(i), "MovieName");
}
}
}
} catch (MalformedURLException e) {
Log.d(TAG, "MalformedURLException", e);
} catch (IOException e) {
Log.d(TAG, "IOException", e);
} catch (ParserConfigurationException e) {
Log.d(TAG, "Parser Configuration Exception", e);
} catch (SAXException e) {
Log.d(TAG, "SAX Exception", e);
}
return null;
}
}
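The comments in the loop above mention a getTextValue helper that is not shown anywhere in this answer. A minimal sketch of what such a helper might look like, using only the JDK's DOM classes, follows. The class name, the sample XML and the "movie" / "MovieName" tag names are made up for illustration; the real response will use different element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MovieXmlSketch {

    // Hypothetical helper: text of the first <tagName> descendant of parent, or null if absent.
    static String getTextValue(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return null;
        }
        return matches.item(0).getTextContent().trim();
    }

    // Parses an XML string and collects the <MovieName> text of every <movie> element.
    static List<String> movieNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));
            NodeList movies = dom.getDocumentElement().getElementsByTagName("movie");
            for (int i = 0; i < movies.getLength(); i++) {
                names.add(getTextValue((Element) movies.item(i), "MovieName"));
            }
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle parser exceptions in this sketch
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample document standing in for the real response
        String xml = "<results><movie><MovieName>First</MovieName></movie>"
                + "<movie><MovieName>Second</MovieName></movie></results>";
        System.out.println(movieNames(xml));
    }
}
```

In the AsyncTask above, the same helper could be called on each (Element) nl.item(i) inside the for loop.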
I hope that gives you some idea of how to do this sort of thing. I'll see if I can quickly find a good resource in the Yahoo APIs for getting movie times at a certain location.
Good luck :) Let me know if you need anything clarified.
EDIT:
Looks like this is EXACTLY what you need (resource wise):
https://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys#h=select+*+from+google.igoogle.movies+where+movies%3D'68105'%3B
Check that out. Using that, your two halves of the link would be:
String firstHalf = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20google.igoogle.movies%20where%20movies%3D'";
String secondHalf = "'%3B&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
And then to get your final link, you would just do
String yqlURL = firstHalf + "ZIP CODE OF YOUR LOCATION" + secondHalf;
And you would have all of the movies playing near you returned!
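Because the keyword is pasted straight into a URL, it is probably worth percent-encoding it first, in case it contains spaces or quotes. A small sketch, assuming the two halves shown above; the class and method names are only illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class YqlUrlSketch {
    static final String FIRST_HALF =
            "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20google.igoogle.movies%20where%20movies%3D'";
    static final String SECOND_HALF =
            "'%3B&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";

    // Builds the full query URL, encoding the keyword so it is safe inside a query string.
    static String buildUrl(String keyword) {
        try {
            return FIRST_HALF + URLEncoder.encode(keyword, "UTF-8") + SECOND_HALF;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("68105"));
    }
}
```

A plain ZIP code passes through unchanged; the encoding only matters for keywords that contain characters with a special meaning in URLs.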
Make your life a lot easier and choose the api that is right for you. Choose one of these:
http://www.programmableweb.com/news/52-movies-apis-rovi-rotten-tomatoes-and-internet-video-archive/2013/01/22
Make your decision not only based on the content, but also ease of use and documentation. Documentation is a biggy.
Good luck!
Well, I would rather advise you to use the TheMovieDB.com API; it is simple and provides all kinds of movie information.
I have the code below, which should fetch the information I need to generate a string for each line of text in a TXT document, but I am unsure where to go from here, or whether my app is even getting the information. The strings will go into a TextView, in case that matters. I have no knowledge of Jsoup; I have been told that it is the easiest way to get the lines of text into strings, and I need help getting there. The code being used is below:
public void updatebutton(View v){
new SyncTask().execute();
Toast.makeText(MainActivity.this, "Updating", Toast.LENGTH_LONG).show();
}
private class SyncTask extends AsyncTask<String, Void, String>{
@Override
protected String doInBackground(String... params) {
Document doc = null;
String returnValue ="";
String baseWebPage = "http://nowactivity.webs.com/teststring.txt";
for(int i = 0; i< params.length; i++){
try {
doc = Jsoup.connect(
baseWebPage)
.get();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Log.i("DOC", "The line " + doc.toString());
}
return returnValue;
}
}
I'm assuming that when you run this code you get every line of teststring.txt printed in your logcat?
If that's the case, then you could change the return type of SyncTask from String to something like ArrayList or even a String array, then add each doc.toString(); to that list using your for loop, then return it.
From there you can access that array of strings and populate your TextViews accordingly.
You could also use Jsoup to cache the text file locally on the user's SD card, then you could populate your TextViews from there without having to reconnect every time.
Edit: Instead of using Jsoup, try using this:
private class SyncTask extends AsyncTask<String, Void, String> {
@Override
protected String doInBackground(String... params) {
String returnValue = "";
String u = "http://nowactivity.webs.com/teststring.txt";
URL url;
InputStream stream;
InputStreamReader streamReader;
BufferedReader reader;
String str;
try {
System.out.println("Reading URL: " + u);
url = new URL(u);
stream = url.openStream();
streamReader = new InputStreamReader(stream);
reader = new BufferedReader(streamReader);
do {
str = reader.readLine();
if (str != null)
System.out.println(str);
} while (str != null);
} catch (MalformedURLException e) {
System.out.println("Invalid URL");
} catch (IOException e) {
System.out.println("Can not connect");
}
return returnValue;
}
}
That would also work inside an AsyncTask, with that you could create the same ArrayList and return it, instead of trying to use Jsoup.
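As a rough sketch of the "return a list" idea, the reading loop can be pulled into a method that collects each line instead of printing it. The class and method names here are made up, and a StringReader stands in for the network stream so the example runs on its own; in the AsyncTask you would pass new InputStreamReader(url.openStream()) instead and change the task's result type to match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class LineReaderSketch {

    // Reads every line from the given reader into a list, one entry per line.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Wrapped so the sketch stays short; real code may prefer to report the error
            throw new IllegalStateException("Could not read text", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the remote text file
        System.out.println(readLines(new StringReader("first line\nsecond line")));
    }
}
```

Each entry of the returned list can then be assigned to its own TextView in onPostExecute.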
I am new to Android. In my application I have to parse some data and display it on screen. However, I can't parse the data in one particular tag, because some special characters also appear inside that tag. My code is shown below.
My parser function:
protected ArrayList<String> doInBackground(Context... params)
{
// context = params[0];
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
test = new ArrayList<String>();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new java.net.URL("input URL_confidential").openConnection().getInputStream());
//Document document = builder.parse(new URL("http://www.gamestar.de/rss/gamestar.rss").openConnection().getInputStream());
Element root = document.getDocumentElement();
NodeList docItems = root.getElementsByTagName("item");
Node nodeItem;
for(int i = 0;i<docItems.getLength();i++)
{
nodeItem = docItems.item(i);
if(nodeItem.getNodeType() == Node.ELEMENT_NODE)
{
NodeList element = nodeItem.getChildNodes();
Element entry = (Element) docItems.item(i);
name=(element.item(0).getFirstChild().getNodeValue());
// System.out.println("description = "+element.item(2).getFirstChild().getNodeValue().replaceAll("<div><p>"," "));
System.out.println("Description"+Jsoup.clean(org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(element.item(2).getFirstChild().getNodeValue()), new Whitelist()));
items.add(name);
}
}
}
catch (ParserConfigurationException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
catch (MalformedURLException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
catch (SAXException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
catch (IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
return items;
}
Input:
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>my application</title>
<link>http:// some link</link>
<atom:link href="http:// XXXXXXXX" rel="self"></atom:link>
<language>en-us</language>
<lastBuildDate>Thu, 20 Dec 2012</lastBuildDate>
<item>
<title>lllegal settlements</title>
<link>http://XXXXXXXXXXXXXXXX</link>
<description> <div><p>
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel’s announcement of new construction activity in Palestinian territories and demand immediate dismantling of the “illegal†settlements.
</p>
<p>
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel “gravely threatens efforts to establish a viable Palestinian state.â€
</p>
<p>
</description>
</item>
</channel>
Output:
lllegal settlements ----> title tag text
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel announcement of new construction activity in Palestinian territories and demand immediate dismantling of the illegal settlements. -----> description tag text
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel gravely threatens efforts to establish a viable Palestinian state. ----> description tag text.
Your text node contains both escaped HTML entities (&gt; is >, greater than) and garbage characters (“grosslyâ€). You should first adjust the encoding to match your input source; then you can unescape the HTML with Apache Commons Lang StringEscapeUtils.unescapeHtml4(String).
This method (hopefully) returns XML that you can query (for example with XPath) to extract the wanted text node, or you can give the whole string to JSOUP or to the Android Html class:
// JSOUP, "html" is the unescaped string. Returns a string
Jsoup.parse(html).text();
// Android, using the same "html" string
android.text.Html.fromHtml(html).toString();
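The "adjust the encoding" step can be illustrated without any libraries. Garbage of this kind typically appears when UTF-8 bytes are decoded with a single-byte charset. The sketch below assumes that is what happened here; it uses ISO-8859-1 as the wrong charset purely as an example (the charset actually involved in your setup may differ), and the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    // Decodes the whole stream using the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (InputStreamReader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not decode stream", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Curly quotes take three bytes each in UTF-8
        String original = "\u201Cgrossly\u201D inadequate";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String wrong = decode(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);
        String right = decode(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);

        System.out.println("Wrong charset matches original: " + wrong.equals(original));
        System.out.println("UTF-8 matches original: " + right.equals(original));
    }
}
```

If the garbage only shows up in the log or console and not in the parsed strings themselves, the output encoding rather than the input encoding may be the part that needs adjusting.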
Test program (JSOUP and Commons-Lang required)
package stackoverflow;
import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
public class EmbeddedHTML {
public static void main(String[] args) {
String src = "<description> &lt;div&gt;&lt;p&gt; An independent" +
" inquiry into the September 11 attack on the US Consulate" +
" in Benghazi that killed the US ambassador to Libya and" +
" three other Americans has found that systematic failures" +
" at the State Department led to “grossly†inadequate" +
" security at the mission. &lt;/p&gt;</description>";
String unescaped = StringEscapeUtils.unescapeHtml4(src);
System.out.println(Jsoup.clean(unescaped, new Whitelist()));
}
}
Is there anything wrong with simply replacing the offending characters?
string = string.replaceAll("<", "");
string = string.replaceAll("div>", "");
string = string.replaceAll("p>", "");
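One thing to watch with piecewise replacement is that it can leave fragments behind, for example the "/" of a closing tag such as </p>. If simple removal is good enough, a single regular expression that matches whole tags is a little more robust. This is only a sketch for simple markup like the sample in the question (the class and method names are made up); a real HTML parser such as Jsoup is the safer choice for arbitrary input.

```java
final class TagStripSketch {

    // Replaces anything that looks like an opening or closing tag with a space,
    // then collapses runs of whitespace and trims the ends.
    static String stripTags(String html) {
        return html.replaceAll("</?[a-zA-Z][^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(
                stripTags("<div><p>First paragraph.</p><p>Second paragraph.</p></div>"));
    }
}
```

The input is expected to be already unescaped, so that the tags appear as real angle brackets rather than as entities.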
Run the node value through Html.fromHtml() two or three times and it will be fine.
EXPLANATION: The built-in Html.fromHtml() method will convert wild and broken HTML into usable content. Pseudocode here:
sHTML = node.getNodeValue()
sHTML = Html.fromHtml(sHTML).toString()
sHTML = Html.fromHtml(sHTML).toString()
sHTML = Html.fromHtml(sHTML).toString()
By the third or fourth pass, unreadable content will become readable again. You can display it in a TextView or load it into a WebView with loadData().
I want to extract information from the web and show that value in my Android app. When I run the following code, nothing is set on my TextView and I can't see the data I wanted. Can you please tell me what's wrong?
EDIT: Android is now not even going past the line:
Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();
When I run it in the emulator, the app just exits. Why is this happening?
Here is my code:
public class Search extends Activity {
private static final String TAG = "TVGuide";
String outputtext;
Parser parser;
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.search);
TextView outputTextView = (TextView) findViewById(R.id.outputTextView);
String id = "main-article-content";
try {
Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();
Elements elementsHtml = doc.getElementsByAttributeValue("id", "main-article-content");
for (Element element : elementsHtml) {
Log.i("PARSED ELEMENTS:", URLDecoder.decode(element.text(), HTTP.UTF_8));
outputTextView.setText(element.text());
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
I think you imported the class org.w3c.dom.Document instead of the required one, org.jsoup.nodes.Document by mistake.
Regarding your edit ("Android is now not even going past the line"): can Jsoup not connect to the site? Try adding a timeout to the connection:
Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").timeout(10000).get();