I'm writing an app that downloads YouTube videos and I've run into a little problem.
As you might know, each YouTube video page contains direct links to the video.
When the page uses the Flash player (and not HTML5 or similar), they are stored in the Flash object (in its flashvars attribute).
My app parses that Flash object and extracts those direct links from it (one link for each available video quality).
I get the Flash object's HTML by downloading the video page's HTML (e.g. http://www.youtube.com/watch?v=VIDEOID) and parsing it.
I use an AsyncTask to download the HTML (the non-mobile version), and here is my download code:
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1)");
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
return html;
Here is the problem:
The code above doesn't download the whole HTML. The downloaded HTML string gets cut off somewhere in the middle, so I never get the Flash object part. This function works fine with other sites!
Am I doing something wrong?
Thanks :)
Check out BasicResponseHandler.
String html = client.execute(request, new BasicResponseHandler());
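BasicResponseHandler returns the whole response body as a String for a successful response. If you would rather keep reading the stream yourself, the following is a minimal stdlib-only sketch of draining an InputStream completely. The class name StreamText, the method readAll, and the choice of UTF-8 are assumptions made for illustration, and this is not claimed to be the cause of the truncation described above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HttpClient.
final class StreamText {

    // Reads the stream to the end and decodes it as UTF-8
    // (an assumption about the page encoding).
    static String readAll(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        // try-with-resources closes the reader (and the stream) even on error
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            // read() returns -1 only at end of stream, so nothing is skipped
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>\n<body>ok</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page)));
    }
}
```

Unlike a readLine() loop that appends each line without its terminator, this keeps line breaks intact, which can matter if you later search the HTML with patterns that expect them.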
I'm trying to parse some HTML in my Android app and I need to get the text:
Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!
in
Is there any easy way to get only the text and remove all html tags?
The behavior that I need is exactly the one shown in this PHP code http://php.net/manual/es/function.strip-tags.php
Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p= content.getElementsByTag("p");
String pConcatenated="";
for (Element x: p) {
pConcatenated+= x.text();
}
System.out.println(pConcatenated);//sometext another p tag
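If adding Jsoup is not an option, a rough approximation of strip_tags is possible with a regular expression. This is a different and much cruder technique than the parser above: it does not decode entities such as &amp;, and it can break on a ">" inside an attribute value or on script content. The class name StripTags and the sample markup are assumptions for illustration only.

```java
// Hypothetical helper for illustration; a regex is not an HTML parser.
final class StripTags {

    // Removes anything that looks like <...> and trims the result.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<p>Pan Artesano <b>Elaborado</b> por Panadería La Constancia.</p>";
        System.out.println(stripTags(html));
    }
}
```

For anything beyond simple, well-formed fragments, a real parser such as Jsoup is the safer choice.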
If you just want to display it, a WebView will do: load the string into the WebView and you're done.
If you need to use the text elsewhere, I don't have a solution for that :D.
String data = "your html here";
WebView webview= (WebView)this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");
You can also load a web URL directly: webview.loadUrl("url");
First, get the HTML with
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
Then I recommend creating a custom tag in the HTML, such as <toAndroid></toAndroid>, and then you can get the text with
String result = html.substring(html.indexOf("<toAndroid>") + "<toAndroid>".length(), html.indexOf("</toAndroid>"));
For example, HTML containing
<toAndroid>Hello world!</toAndroid>
will result in
Hello world!
Note that you can place <p> inside the <toAndroid> tags and then remove it from the result in Java.
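The custom-tag extraction can be sketched as a small helper that also handles a missing tag. The class name TagExtractor and the method between are hypothetical names chosen for this example; the <toAndroid> tag is the one suggested above.

```java
// Hypothetical helper for illustration.
final class TagExtractor {

    // Returns the text between the first `open` marker and the next `close`
    // marker, or null if either marker is missing.
    static String between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();              // skip past the opening tag itself
        int end = html.indexOf(close, start); // search only after the opening tag
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html><body><toAndroid>Hello world!</toAndroid></body></html>";
        System.out.println(between(html, "<toAndroid>", "</toAndroid>"));
    }
}
```

Searching for the closing tag only after the opening one avoids a negative-length substring if a stray closing tag appears earlier in the page.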
I am reading html source code of a public website using the following code:
Code:
@Override
protected Void doInBackground(Void... params)
{
try
{
URL url = new URL(""+URL);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
PageCode = "";
OriginalPageCode = "";
while ((inputLine = in.readLine()) != null)
{
PageCode += inputLine;
}
OriginalPageCode = PageCode;
try
{
extract_website_and_save(); // extracting data from PageCode
}
catch (Exception e1)
{
}
in.close();
}
Background:
The above code can sometimes fetch the latest version of the website properly, but occasionally it gets an outdated version and so cannot obtain the latest information.
I am curious why this happens. Is it related to reading from a cache instead of the actual updated website?
I then used Chrome to browse the same link and discovered that Chrome also fetched the outdated page.
I have tried restarting the device, but the problem continues.
After 30 minutes to an hour, I had the app fetch again and it could then extract the latest information. When I browsed the website in Chrome at the same time, Chrome also got the latest version.
Question:
The BufferedReader above should have nothing to do with Chrome, should it? Or do they follow the same logic and therefore both read from a cache instead of the latest website?
I strongly suspect the endpoint is being cached by URL.
Try something like this:
urlSrt = urlSrt + "?x=" + new Random().nextInt(100000);
// If your URL already passes parameters, e.g. example.com?x=1&p=pass, then modify
// the urlSrt line to use an "&" and not "?"
// e.g. urlSrt = urlSrt + "&x=" + new Random().nextInt(100000);
URL url = new URL(urlSrt);
URLConnection con = url.openConnection();
con.setUseCaches(false); //This will stop caching!
So modify your code to something like this:
URLConnection con = url.openConnection();
con.setUseCaches(false);
BufferedReader in = new BufferedReader(new InputStreamReader(
con.getInputStream()));
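Choosing between "?" and "&" by hand is easy to get wrong, so here is a minimal sketch of a helper that picks the separator for you. The class name CacheBuster, the method withCacheBuster, and the parameter name x are assumptions for illustration; the sketch also assumes the URL has no #fragment.

```java
// Hypothetical helper for illustration.
final class CacheBuster {

    // Appends a throwaway query parameter so each request uses a distinct URL.
    // Assumes the URL contains no #fragment.
    static String withCacheBuster(String url, long token) {
        // Use "&" if the URL already has a query string, "?" otherwise
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "x=" + token;
    }

    public static void main(String[] args) {
        // The current time is one simple source of a changing value
        long token = System.currentTimeMillis();
        System.out.println(withCacheBuster("http://example.com/page", token));
        System.out.println(withCacheBuster("http://example.com/page?p=pass", token));
    }
}
```

A timestamp is used here instead of a small random number because it does not repeat between requests; either works as long as the value changes on every fetch.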
I want to download videos from YouTube programmatically in Android. So far I am only able to stream these YouTube videos. I searched the Internet but found no solution that works for me. Please suggest possible solutions. Thanks
Check this out. I use this function to extract the direct download links from a YouTube video (it returns an array with links). All you need to do is get the HTML of the video page (not the mobile version!) using this:
// url = youtube link (e.g. http://www.youtube.com/watch?v=fJ9rUzIMcZQ)
public String DownloadText(String url) throws IOException{
String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1)";
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.USER_AGENT, userAgent);
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
return html;
}
I need to download an HTML page programmatically and then get its HTML. I am mainly concerned with the downloading of the page. If I download the page, where will I put it?
Will I have to keep it in a String variable? If yes, then how?
This site provides a good explanation on how to download a file, and also how to set the location to where it should be stored. You do not have to, and should not, keep it in a string variable. If you are to manipulate the data I would suggest you use an XML parser.
You can run this code in the doInBackground method of an AsyncTask:
String html = "";
String url = "ENTER URL TO DOWNLOAD";
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
I am trying to view the source code of a webpage, given its URL, in order to parse the text for a certain string which represents an image URL.
I found this post, which is pretty much what I am trying to do, but I can't get it working:
Post
This is my code below.
public String fetchImage() throws ClientProtocolException, IOException {
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet("www.google.co.uk/images?q=songbird+oasis");
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
return html;
}
But for some reason it just does not work. It forces me to use a try/catch statement when calling the method. Once this works, I think it will be simple from here: use a regex to find the string "href="/imgres?imgurl=........jpg" and so get the URL of a JPG image, which is then shown in an ImageView.
Please tell me if I'm going about this all wrong.
First, Google has a search API, which will be a better solution than the scraping you are going through, since the API will be reliable, and your solution will not be.
Second, use the BasicResponseHandler pattern for string responses, as it is much simpler.
Third, saying something "just does not work" is a pretty useless description for a support site like this one. If it crashes, as kgiannakakis pointed out, you will have an exception. Use adb logcat, DDMS, or the DDMS perspective in Eclipse to examine the stack trace and find out what the exception is. That will give you some clues for how to solve whatever problem you have.