I am trying to view the source code of a webpage, given its URL, so that I can parse the text for a certain string which represents an image URL.
I found this post, which is pretty much what I am trying to do, but I can't get it working:
Post
This is my code below.
public String fetchImage() throws ClientProtocolException, IOException {
    HttpClient client = new DefaultHttpClient();
    HttpGet request = new HttpGet("www.google.co.uk/images?q=songbird+oasis");
    HttpResponse response = client.execute(request);
    String html = "";
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    html = str.toString();
    return html;
}
but for some reason it just does not work. It forces me to use a try/catch statement when calling the method. Once this works, I think it will be simple from here: use a regex to find the string "href="/imgres?imgurl=........jpg" and so get the URL of a JPG image, which is then shown in an ImageView.
Please tell me if I'm going about this all wrong.
First, Google has a search API, which will be a better solution than the scraping you are going through, since the API will be reliable, and your solution will not be.
Second, use the BasicResponseHandler pattern for string responses, as it is much simpler.
Third, saying something "just does not work" is a pretty useless description for a support site like this one. If it crashes, as kgiannakakis pointed out, you will have an exception. Use adb logcat, DDMS, or the DDMS perspective in Eclipse to examine the stack trace and find out what the exception is. That will give you some clues for how to solve whatever problem you have.
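One candidate worth ruling out when reading the stack trace: the string passed to HttpGet in the question has no "http://" scheme. The sketch below uses only java.net.URI to show that such a string parses as a relative reference with no scheme and no host. The class name UrlSchemeCheck and the helper hasSchemeAndHost are made up for illustration, and whether this is the actual cause of the failure is an assumption that the stack trace would have to confirm.

```java
import java.net.URI;

class UrlSchemeCheck {
    // Hypothetical helper: true when the string parses as an absolute URI
    // with a host, which an HTTP client needs in order to pick a target server.
    static boolean hasSchemeAndHost(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() != null && uri.getHost() != null;
    }

    public static void main(String[] args) {
        // The URL exactly as written in the question: no scheme, so no host either.
        System.out.println(hasSchemeAndHost("www.google.co.uk/images?q=songbird+oasis"));        // prints false
        // The same URL with an explicit scheme.
        System.out.println(hasSchemeAndHost("http://www.google.co.uk/images?q=songbird+oasis")); // prints true
    }
}
```

If the check fails for a URL, prefixing it with "http://" before constructing the request is the usual remedy.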
Related
I'm pretty new/bad with regex-patterns, but this is what I want:
I've got a webpage with html, and somewhere on that page I have: <input name="__RequestVerificationToken" type="hidden" value="the_value_I_want" />
So, my question is: How can I get the value (the_value_I_want) of the hidden text field in Android?
I did make the HttpGet already (see code below), I just need to know the correct Pattern for this.
Code:
// Method to get the hidden-input value of the Token
private String getToken() {
    String url = "http://myhost/Account/Login";
    String hidden_token = "";
    String response = "";
    HttpGet get = new HttpGet(url);
    try {
        // Send the GET-request
        HttpResponse execute = MainActivity.HttpClient.execute(get);
        // Get the response of the GET-request
        InputStream content = execute.getEntity().getContent();
        BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
        String s = "";
        while ((s = buffer.readLine()) != null)
            response += s;
    }
    catch (Exception ex) {
        ex.printStackTrace();
    }
    // Get the value of the hidden input-field with the name __RequestVerificationToken
    Pattern pattern = Pattern.compile("<input name=\"" + TOKEN + "\" type=\"hidden\" value=\".\" />", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(response);
    while (matcher.find())
        hidden_token = matcher.group();
    return hidden_token;
}
So, what should I replace the following line with?
Pattern pattern = Pattern.compile("<input name=\"" + TOKEN + "\" type=\"hidden\" value=\".\" />", Pattern.DOTALL);
Or should I also change something else?
Thanks in advance for the responses.
PS: For those wondering: I need this token to be able to log in using a Google account with a POST request, combined with the token I got from a cookie.
Edit 1:
After reading the answer to this Stack Overflow question, I think it's better not to use a regex pattern for the HTML page. Does anyone know a better solution? (I would appreciate it if this better solution came with a code sample.)
Edit 2:
I tried using Illegal Argument's answer and added the Jsoup library. I did indeed manage to get the token by making the following changes to my code above:
Replace everything in the try { ... } with:
// Get the value of the hidden input-field with the name __RequestVerificationToken
Document doc = Jsoup.connect(url).get();
org.jsoup.nodes.Element el = doc.select("input[name*=" + TOKEN + "]").first();
hidden_token = el.attr("value");
This does indeed get me the token of the hidden field, but now I have an entirely new problem: the token changed, because Jsoup opens a new session. So basically I can't use Jsoup for the request and am "forced" to use the already open DefaultHttpClient that I also use for the POST.
I will make a new question for this though, since my original question was poorly asked (I did not provide all the details), and so I accept Illegal Argument's answer as the correct one (though it didn't solve my current problem, it might help others).
Try using the Jsoup library. It is an HTML parser built for this purpose.
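For the pattern the question literally asks about, here is a minimal sketch using only java.util.regex. The pattern in the question matches exactly one character for the value (the single "."), and matcher.group() returns the whole matched tag rather than the value. The sketch captures the value in group 1 instead. The class name HiddenTokenRegex and the method extractToken are hypothetical, and the pattern assumes the attributes appear in exactly the order shown in the question (name, type, value), which is why an HTML parser remains the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiddenTokenRegex {
    static final String TOKEN = "__RequestVerificationToken";

    // Group 1 captures everything between the quotes of value="...".
    // Assumes the attribute order name, type, value used in the question's markup.
    private static final Pattern TOKEN_PATTERN = Pattern.compile(
            "<input name=\"" + Pattern.quote(TOKEN) + "\" type=\"hidden\" value=\"([^\"]*)\"");

    // Hypothetical helper: returns the token value, or "" when the field is absent.
    static String extractToken(String html) {
        Matcher m = TOKEN_PATTERN.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String html = "<form><input name=\"__RequestVerificationToken\" "
                + "type=\"hidden\" value=\"the_value_I_want\" /></form>";
        System.out.println(extractToken(html)); // prints the_value_I_want
    }
}
```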
I already asked a similar question before, but wasn't providing enough details. So here is the "same" question, but this time more in-depth.
I've got a webpage with html, and somewhere on that page I have: <input name="__RequestVerificationToken" type="hidden" value="the_value_I_want" />
So, my question is: How can I get the value (the_value_I_want) of the hidden text field in Android, using HttpGet of an already open DefaultHttpClient connection?
My current code:
// Method to get the hidden-input value of the Token
private String getToken() {
    String url = "http://myhost/Account/Login";
    String hidden_token = "";
    String response = "";
    HttpGet get = new HttpGet(url);
    try {
        // Send the GET-request
        HttpResponse execute = MainActivity.HttpClient.execute(get);
        // Get the response of the GET-request
        InputStream content = execute.getEntity().getContent();
        BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
        String s = "";
        while ((s = buffer.readLine()) != null)
            response += s;
        content.close();
        buffer.close();
        // Get the value of the hidden input-field with the name __RequestVerificationToken
        // TODO
    }
    catch (Exception ex) {
        ex.printStackTrace();
    }
    return hidden_token;
}
So, what should I add on the TODO-line?
Because the Token and Cookie only remain as long as the Session stays open, I can't use the Jsoup library for finding the hidden field (which I did by using the code below). Instead I need to use the already open DefaultHttpClient.
Jsoup code:
Document doc = Jsoup.connect(url).get(); // <- this opens a new session
org.jsoup.nodes.Element el = doc.select("input[name*=" + TOKEN + "]").first();
hidden_token = el.attr("value");
Thanks in advance for the responses.
PS: For those wondering: I need this token to be able to log in using a Google account with a POST request, combined with the token I got from a cookie.
Ok, lucky for me this was very simple. I just replaced Document doc = Jsoup.connect(url).get(); with Document doc = Jsoup.parse(html);
So in the code of the main post I replaced //TODO with:
Document doc = Jsoup.parse(response);
org.jsoup.nodes.Element el = doc.select("input[name*=" + TOKEN + "]").first();
hidden_token = el.attr("value");
Edit 1:
I thought this did the trick, but it doesn't.. It still thinks there are two different Sessions opened.. :S
Does Jsoup.parse(...) open a new Jsoup.get-session behind the scenes?
Edit 2:
It's even worse.. Every time another page is opened on the website, another session is created and therefore another token is needed.. So I need to discuss some things with the creator of the website/web api hybrid and figure some things out.. Perhaps create a different log-in just for the Web API..
All in all I'm kinda frustrated right now, even though all the problems I've encountered are "solved"..
This question already has answers here:
How can I fix 'android.os.NetworkOnMainThreadException'?
I am trying to get the HTML source from a URL in my Android app using the code below. However, it crashes at the line HttpResponse response = client.execute(request);. How can I fix this?
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
in.close();
html = str.toString();
UPDATE! I am too new of a user to post my own official answer apparently, so here it is:
Thanks to both of you! Hichris' comments & links pointed me in the right direction. Here are the 3 hurdles that were causing the problem:
1. The AsyncTask should be placed in the same class file, but as a sub-class (I might not have my terminology right there).
2. I was filling a text box with the resulting HTML code. However, I was doing that within the doInBackground() section of the AsyncTask when I should have been doing it in onPostExecute().
3. The URLs I was passing to the AsyncTask were not properly converted to URI. This caused the program to crash for some URLs but not others.
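The third hurdle, converting a raw URL string into a URI, can be sketched with the standard library alone. This is one common approach, not necessarily what the poster ended up doing: parse the string leniently with java.net.URL, then rebuild it through the multi-argument java.net.URI constructor, which percent-encodes illegal characters such as spaces. The class name UrlToUri and the method toUri are hypothetical, and example.com is a placeholder host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlToUri {
    // Hypothetical helper: splits the raw string into components with URL,
    // then lets the URI constructor quote any illegal characters.
    static URI toUri(String raw) {
        try {
            URL url = new URL(raw);
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // A URL with spaces, which URI.create() would reject outright.
        System.out.println(toUri("http://example.com/my songs?q=songbird oasis"));
        // prints http://example.com/my%20songs?q=songbird%20oasis
    }
}
```

One caveat: these URI constructors always quote the percent character, so a string that is already percent-encoded would be encoded a second time. The sketch is therefore only suitable for raw, unencoded input.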
The best way to make network calls is the Volley library; see "https://developers.google.com/events/io/sessions/325304728". It is simple and easy to use. For the problems with using HttpClient and other third-party libraries for network calls, see "https://www.youtube.com/watch?v=MIc4kl3yXw0&list=LLckOGLeNzdRsRG5CvBux_dg&index=8".
I'm writing an app that downloads YouTube videos, and I've got a little problem.
As you might know, each video page on YouTube contains direct links to the video.
When the page uses the Flash player (and not HTML5 or so), the links are stored in the Flash object (in its flashvars attribute).
My app parses that Flash object and extracts those direct links from it (one link for each available video quality).
I get the Flash object's HTML code by downloading the video page's HTML code (e.g. http://www.youtube.com/watch?v=VIDEOID) and parsing it.
I use an AsyncTask to download the HTML code (the non-mobile version), and here is my downloading code:
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1)");
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
in.close();
html = str.toString();
return html;
Now I got a little problem:
The code above doesn't download the whole HTML code. The downloaded HTML string gets cut off somewhere in the middle, and I don't get the Flash object part. This function works fine with other sites!
Am I doing something wrong?
Thanks :)
Check out BasicResponseHandler.
String html = client.execute(request, new BasicResponseHandler());
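If the manual loop is kept instead, two properties of the posted code are worth knowing: readLine() strips line terminators, so the resulting string has all lines run together, and InputStreamReader without a charset argument decodes with the platform default. Neither is claimed here to be the cause of the truncation. The sketch below, using only the standard library, reads a stream in fixed-size chunks with an explicit charset. The class name StreamToString and the method readAll are hypothetical, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamToString {
    // Hypothetical helper: reads the whole stream, keeping line breaks intact,
    // and closes it when done.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for response.getEntity().getContent().
        byte[] page = "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        String html = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(html.length()); // prints 30
    }
}
```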
Okay, so I was trying to send HTTP POST requests to a site. I sniffed the request sent by the browser with Wireshark to get the text data of the POST body. I used this in a stock Java application, and it worked perfectly fine: I could use the POST method with no problem whatsoever, and it would return the appropriate page.
Then I tried doing this on Android. Instead of returning the HTML produced by the POST request, it returns the regular page HTML, untouched. It DOES send a POST request (I sniffed with Wireshark again); it just doesn't seem to get the appropriate response. I took the exact same method from another one of my projects, where it worked perfectly fine, and pasted it into my new project. I added the INTERNET permission in Android, so there's nothing wrong with that.
The only visible difference is that I used NameValuePairs in the other project (the one that worked), and in this one I'm putting the string directly into a StringEntity without encoding (using UTF-8 encoding screws up the string, though). I used this exact same line of text in regular Java, like I said, and it worked fine with no encoding. So what could be the problem? This is the code:
public static String sendNamePostRequest(String urlString) {
    HttpClient client = new DefaultHttpClient();
    HttpPost post = new HttpPost(urlString);
    StringBuffer sb = new StringBuffer();
    try {
        post.setEntity(new StringEntity(
                "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwULLTE3NDM5MzMwMzRkZA%3D%3D&__EVENTVALIDATION=%2FwEWBAL%2B%2B4CfBgK52%2BLYCQK1gpH7BAL0w%2FPHAQ%3D%3D&_nameTextBox=John&_zoekButton=Zoek&numberOfLettersField=3"));
        HttpResponse response = client.execute(post);
        HttpEntity entity = response.getEntity();
        BufferedReader br = new BufferedReader(new InputStreamReader(
                entity.getContent()));
        String in = "";
        while ((in = br.readLine()) != null) {
            sb.append(in + "\n");
        }
        br.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return sb.toString();
}
Can you see what's wrong here?
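One difference worth checking, offered as a possibility rather than a confirmed diagnosis: the NameValuePair route normally goes through UrlEncodedFormEntity, which both percent-encodes the fields and declares the body as application/x-www-form-urlencoded, whereas a bare StringEntity, as far as I know, declares a plain-text content type by default. A server expecting form data may then ignore the fields. The sketch below uses only the standard library to show what the form-encoded body and its content type look like. The class name FormBody and the method encode are hypothetical, and the two fields are taken from the request body in the question, in decoded form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBody {
    // The content type a form POST is expected to declare.
    static final String CONTENT_TYPE = "application/x-www-form-urlencoded";

    // Hypothetical helper: joins percent-encoded key=value pairs with '&',
    // preserving the iteration order of the map.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__VIEWSTATE", "/wEPDwULLTE3NDM5MzMwMzRkZA=="); // decoded value from the question
        fields.put("_nameTextBox", "John");
        System.out.println(CONTENT_TYPE);
        System.out.println(encode(fields));
        // prints __VIEWSTATE=%2FwEPDwULLTE3NDM5MzMwMzRkZA%3D%3D&_nameTextBox=John
    }
}
```

Since the string in the question is already percent-encoded, it should not be encoded again. Declaring the content type on the existing entity (for example with setContentType on the StringEntity) may be enough, assuming the content type is in fact the issue.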