When I parse the document from an HTML file, it works fine.
But when I parse the same document from a URL, I get the following error:
java.lang.IndexOutOfBoundsException: Invalid index 3, size is 2
I am sure the content of the file is the same as the content at the URL, and I also tried using threads.
Here is the website:
http://pucminas.br/relatorio_atividades_2014/arquivos/ensino_graduacao.htm
Here is the code:
class MyTask extends AsyncTask<Void, Void, String> {
@Override
protected String doInBackground(Void... params) {
String title ="";
try {
URL url = new URL(getString(R.string.url));
Document doc = Jsoup.parse(url, 3000);
Element table = doc.select("table").get(3);
} catch (IOException e) {
e.printStackTrace();
}
return title;
}
}
Jsoup applies a maximum body size and a request timeout by default, so on a large or slow page the document can be cut short and not every table gets parsed.
Fortunately, you can change both limits when you connect to the site and build your Document object.
Solution
Document doc = Jsoup.connect(url).maxBodySize(0)
.timeout(0)
.followRedirects(true)
.get();
JSoup APIDocs
Connection#maxBodySize(int bytes)
Update the maximum body size, in bytes. A value of 0 is treated as unlimited.
Connection#timeout(int millis)
Update the request timeout. A value of 0 is treated as an infinite timeout.
I am trying to debug an issue I am having. I am using the following code to try to get the link to an image off of a page.
private class DownloadWebpageTask extends AsyncTask<String, Void, String> {
@Override
protected String doInBackground(String... args) {
String urls = args[0];
Document doc = null;
try {
doc = Jsoup.connect(urls).ignoreContentType(true).get();
image = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]").last();
theurlstring = "test " + image.attr("src"); // I put test here to make sure it is being executed
} catch (IOException e) {
e.printStackTrace();
}
return urls;
}
}
Whichever way I try to get the link from the Element "image", I usually get an error. It says:
Attempt to invoke virtual method 'java.lang.String org.jsoup.nodes.Element.attr(java.lang.String)' on a null object reference
So with that error, I am now thinking that image is not getting selected properly. Does anyone see anything that looks wrong? Or how could I pinpoint the problem better?
Your query is not working, see http://try.jsoup.org/~I4Y0POaloHUtrNTMJO7IAiAUIRY
You could use:
image = doc.select("img[src$=.png],img[src$=.gif],img[src$=.jpg],img[src$=.jpeg]").last();
It is not as compact, but it does select the images (see http://try.jsoup.org/~kjnlfvCzrxiqaGQqwcszLZswSNg).
If the error persists, load your source URL in try.jsoup.org to verify that the expected elements are present in the received HTML; this rules out issues with JavaScript-generated content.
I am trying to convert an iOS application to Android, but I only started learning Java a few days ago. I'm trying to get a value from a tag inside some HTML.
Here is my Swift code:
if let url = NSURL(string: "http://www.example.com/") {
let htmlData: NSData = NSData(contentsOfURL: url)!
let htmlParser = TFHpple(HTMLData: htmlData)
//the value which i want to parse
let nPrice = htmlParser.searchWithXPathQuery("//div[@class='round-border']/div[1]/div[2]") as NSArray
let rPrice = NSMutableString()
//Appending
for element in nPrice {
rPrice.appendString("\n\(element.raw)")
}
let raw = String(NSString(string: rPrice))
//the value without trimming
let stringPrice = raw.stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil)
//result
let trimPrice = stringPrice.stringByReplacingOccurrencesOfString("^\\n*", withString: "", options: .RegularExpressionSearch)
}
Here is my Java code using Jsoup
public class Quote extends Activity {
TextView price;
String tmp;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_quote);
price = (TextView) findViewById(R.id.textView3);
try {
doc = Jsoup.connect("http://example.com/").get();
Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
} catch (IOException e) {
//e.printStackTrace();
}
}
}
My problems are as follows:
I get a NetworkOnMainThreadException whenever I try any code.
I'm not sure that using getElementsByTag with this structure is correct.
Please help,
Thanks.
I get a NetworkOnMainThreadException whenever I try any code.
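Android throws this exception when a network request runs on the main (UI) thread. You should use Volley instead of Jsoup for the request; it runs requests on background threads and will be a faster and more efficient alternative. See this answer for some sample code.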
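For reference, the exception only means that the blocking call ran on the main thread. Below is a minimal sketch of moving such a call onto a worker thread with plain java.util.concurrent. It is not Volley or AsyncTask, the class and method names are made up for illustration, it assumes Java 8 APIs are available, and the Jsoup call is only indicated in a comment.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper, not part of Jsoup, Volley, or the Android SDK.
class BackgroundFetch {

    // Starts the blocking work on a worker thread and returns immediately.
    public static CompletableFuture<String> fetchAsync(Supplier<String> blockingWork) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<String> future = CompletableFuture.supplyAsync(blockingWork, executor);
        // Work that was already submitted still runs; the worker thread exits afterwards.
        executor.shutdown();
        return future;
    }

    public static void main(String[] args) {
        // In the app, the supplier body would be the Jsoup.connect(...).get() call
        // plus the select(); a placeholder string stands in for it here.
        CompletableFuture<String> price = fetchAsync(() -> "placeholder price");
        // thenAccept runs once the work is done; join() only keeps this demo alive until then.
        price.thenAccept(text -> System.out.println("Fetched: " + text)).join();
    }
}
```

On Android the result would still have to be handed back to the UI thread (for example with runOnUiThread) before touching the TextView.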
I'm not sure that using getElementsByTag with this structure is correct.
Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
getElementsByTag() expects a plain tag name, not an XPath expression. Jsoup queries are written as CSS selectors instead.
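The above line of code can be corrected like this (nth-of-type counts only div siblings, which mirrors the positional div[1]/div[2] steps in the XPath):
Elements divs = doc.select("div.round-border > div:nth-of-type(1) > div:nth-of-type(2)");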
for(Element div : divs) {
// Process each div here...
}
I'm making a music player for Android, and I want to give users a way to get the album art of a song from Last.fm.
I've got my API key too. Just need help for retrieving the image from Last.fm.
Any help in getting the image url would also be appreciated.
Thanks in advance.
P.S : For more info about my music player, check the link below
https://plus.google.com/u/0/communities/115046175816530349000
I found a solution; see below.
Add the following AsyncTask:
public class RetrieveFeedTask extends AsyncTask<String, Void, String> {
protected String doInBackground(String... urls) {
String albumArtUrl = null;
try {
XMLParser parser = new XMLParser();
String xml = parser.getXmlFromUrl(urls[0]); // getting XML from URL
Document doc = parser.getDomElement(xml);
NodeList nl = doc.getElementsByTagName("image");
for (int i = 0; i < nl.getLength(); i++) {
Element e = (Element) nl.item(i);
Log.d(LOG_TAG,"Size = " + e.getAttribute("size") + " = " + parser.getElementValue(e));
if(e.getAttribute("size").contentEquals("medium")){
albumArtUrl = parser.getElementValue(e);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return albumArtUrl;
}
}
Call it as follows:
StringBuilder stringBuilder = new StringBuilder("http://ws.audioscrobbler.com/2.0/");
stringBuilder.append("?method=album.getinfo");
stringBuilder.append("&api_key=");
stringBuilder.append("YOUR_LAST_FM_API_KEY");
stringBuilder.append("&artist=" + URLEncoder.encode("ARTIST_NAME_HERE", "UTF-8"));
stringBuilder.append("&album=" + URLEncoder.encode("ALBUM_NAME_HERE", "UTF-8"));
url = new RetrieveFeedTask().execute(stringBuilder.toString()).get();
You need two classes:
1. XMLParser
2. DocElement
Both are available at the link below.
Xml parsing tutorial
Please see Last.fm Web Services docs for album.getInfo: http://www.last.fm/api/show/album.getInfo
Here is a sample response, from which you can easily see how to get cover art image url:
<album>
<name>Believe</name>
<artist>Cher</artist>
<id>2026126</id>
<mbid>61bf0388-b8a9-48f4-81d1-7eb02706dfb0</mbid>
<url>http://www.last.fm/music/Cher/Believe</url>
<releasedate>6 Apr 1999, 00:00</releasedate>
<image size="small">...</image>
<image size="medium">...</image>
<image size="large">...</image>
<listeners>47602</listeners>
<playcount>212991</playcount>
<toptags>
<tag>
<name>pop</name>
<url>http://www.last.fm/tag/pop</url>
</tag>
...
</toptags>
<tracks>
<track rank="1">
<name>Believe</name>
<duration>239</duration>
<mbid/>
<url>http://www.last.fm/music/Cher/_/Believe</url>
<streamable fulltrack="0">1</streamable>
<artist>
<name>Cher</name>
<mbid>bfcc6d75-a6a5-4bc6-8282-47aec8531818</mbid>
<url>http://www.last.fm/music/Cher</url>
</artist>
</track>
...
</tracks>
</album>
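Here is a rough sketch of pulling one image URL out of a response shaped like the sample above, using only the DOM parser that ships with Java rather than a separate helper class. The class name, method name, and the example.com URLs are placeholders; a real response carries actual image URLs where the sample shows "...".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class AlbumArtExtractor {

    // Returns the text of the first <image> element whose size attribute
    // equals the requested size, or null if no such element exists.
    public static String imageUrl(String xml, String size) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList images = doc.getElementsByTagName("image");
            for (int i = 0; i < images.getLength(); i++) {
                Element image = (Element) images.item(i);
                if (size.equals(image.getAttribute("size"))) {
                    return image.getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse album XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response fragment; the URLs are placeholders, not Last.fm data.
        String xml = "<album><name>Believe</name>"
                + "<image size=\"small\">http://example.com/small.png</image>"
                + "<image size=\"medium\">http://example.com/medium.png</image>"
                + "</album>";
        System.out.println(imageUrl(xml, "medium"));
    }
}
```

Returning null for a missing size keeps the caller in charge of choosing a fallback, such as a default cover image.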
I am trying to parse the following link:
http://rate-exchange.appspot.com/currency?from=USD&to=EUR&q=1
As you can see it is a very simple page and I am just trying to extract the text on the page with JSoup. My current implementation returns the wrong HTML and I am not sure why. Here is my code:
public class RetreiveCurrencies extends AsyncTask<String, Void, String>{
@Override
protected String doInBackground(String... arg0) {
Document html = null;
try {
Log.i("wbbug",arg0[0]);
html = Jsoup.parse((arg0[0]));
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Log.i("wbbug",html.toString());
return null;
}
}
Which is called with:
AsyncTask<String, Void, String> rc = new RetreiveCurrencies().execute("http://rate-exchange.appspot.com/currency?from=USD&to=EUR&q=1");
However, instead of returning the correct HTML with the text you see when clicking the link, my Log.i returns:
<html>
<head></head>
<body>
http://rate-exchange.appspot.com/currency?from=USD&to=EUR&q=1
</body>
</html>
What am I doing wrong and how can I extract the text you see when clicking the link?
Jsoup.parse(String) treats its argument as HTML source, so your code is currently parsing the text of the URL as if it were a string of HTML.
To parse a Document from a remote URL you should use Jsoup.connect(), for example:
Document doc = Jsoup.connect("URL").get();
For your specific example (which appears to be returning JSON, not HTML):
Document doc = Jsoup.connect("http://rate-exchange.appspot.com/currency?from=USD&to=EUR&q=1").ignoreContentType(true).get();
System.out.println(doc.text());
Will output:
{"to": "EUR", "rate": 0.73757499999999998, "from": "USD", "v": 0.73757499999999998}
The reason I had to add ignoreContentType(true) is because otherwise it throws an UnsupportedMimeTypeException.
I had a PHP API which showed a JSON Array, which I then read into an Android Application.
I have since moved servers, and the Android application broke.
I assumed it was the authentication and thought I would rebuild the Android application (it was my first application, and I thought a rewrite could make things better).
For some reason I am now getting this exception.
I read somewhere that I need to pass JSON_FORCE_OBJECT to PHP's json_encode:
json_encode($arrMyData, JSON_FORCE_OBJECT);
But I am running PHP 5.2 (Options parameter came out in PHP 5.3)
My code for you to rip into
private void displayAllStories(){
String line;
int intNumStories = 0;
JSONObject arrAllStories;
LinearLayout storiesLayout = (LinearLayout) findViewById(R.id.lyoutStoriesMain);
storiesLayout.removeAllViewsInLayout();
try {
while((line = this.jsonResult.readLine()) != null){
JSONObject arrStories;
arrStories = new JSONObject(line.trim());
intNumStories = Integer.parseInt(arrStories.optString("NumStories"));
arrAllStories = arrStories.getJSONObject("StoryData");
this.strDebug += "We have "+intNumStories+"\n";
}
} catch (IOException e) {
this.strDebug += "Error (3) "+e.getLocalizedMessage()+"\n";
} catch (JSONException e) {
this.strDebug += "Error (4) "+e.getLocalizedMessage()+"\n";
}
}
And the encoded data from the website
{
"NumStories":1,
"StoryData":{
"Story0":{
"ID":"1020",
"OWERNAME":"Alicia",
"STORYMAIN":"Good evening my son was born with bilateral club feet. When he was a week old we started serial casting once a week for 3 months and then he was placed in braces for the next 6 months for a 23 hour period and then for the next 3 months just durning the night. This last visit the doctor said that he needs to have his tendons lengthened and he will go back into cast. After reading all of these articles I am a little scared on what will be best for him. It sounds like the risk of having the surgery are just as heavily weighed as just keeping him in AFO\\'s till he can make his own decision. I would like all advice whether it be positive or negative. Thank you in advance for your help.",
"STORYBRIEF":"Need reassurance that tendon lengthening is the best decision.",
"ADDEDDATE":"2011-12-12 00:51:16",
"CURRENTSTATUS":"n"
}
}
}
Sorry, I should add that the code before this, which produces jsonResult, is as follows:
try{
URL url = null;
URLConnection urlConn = null;
InputStreamReader jsonIsr = null;
BufferedReader jsonBr = null;
//this.strDebug += "URL is "+this.strURL+"\n";
url = new URL(this.strURL);
urlConn = url.openConnection();
jsonIsr = new InputStreamReader(urlConn.getInputStream());
jsonBr = new BufferedReader(jsonIsr, 8192);
this.jsonResult = jsonBr;
return true;
}catch(MalformedURLException e){
this.strDebug += "JSON Error (1) "+e.getLocalizedMessage()+"\n";
}catch(IOException e){
this.strDebug += "JSON Error (2) "+e.getLocalizedMessage()+"\n";
}
}else{
strDebug = "NO URL Passed to JSON\n";
}
// EDIT 2
For those who are asking:
The error is as the title says:
Error (4) A JSONObject text must begin with '{' at character 1 of {"NumStories":1, "StoryData":........
Your code assumes that the whole JSON document arrives on one line: it iterates with readLine() but creates a new JSON object for every line.
You are reading the data line by line and trying to convert each line into a JSON object. That won't work because a single line just contains a fragment of a complete JSON object.
I don't know what type jsonResult has. But you'll probably want to read the whole thing at once.
Your old web application probably produced JSON data without line breaks, so a single line would contain a full JSON object.
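Reading the whole thing at once could look roughly like the sketch below. It uses only java.io, the class and method names are made up for illustration, and the final step (passing the text to new JSONObject(...) on Android) is only indicated in a comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class JsonBodyReader {

    // Reads every remaining line from the reader and joins them into one string,
    // so a JSON document spread over many lines stays in one piece.
    public static String readAll(BufferedReader reader) {
        StringBuilder body = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the network stream: a small JSON document spread over three lines.
        BufferedReader reader = new BufferedReader(new StringReader("{\n\"NumStories\":1\n}"));
        String json = readAll(reader);
        // On Android, this complete text is what would be passed to new JSONObject(json).
        System.out.println(json);
    }
}
```

The parser then sees the opening and closing braces together, whether or not the server inserts line breaks.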
I think you are reading the JSON line by line and passing each line to the JSONObject. Instead, pass the whole response as one string for parsing (jsonResult below stands for that complete string); only then will you get the JSON:
JSONObject arrStories = new JSONObject(jsonResult);
Now read the value like this:
intNumStories = Integer.parseInt(arrStories.getString("NumStories"));
This code is going to break if the object takes more than one line (apparently it does). Your choices are:
Collect all the lines into a StringBuilder, then parse from that string ( http://developer.android.com/reference/org/json/JSONTokener.html )
Take GSON or my databinding layer ( https://github.com/ko5tik/jsonserializer ) and just parse the stream into an object.