I'm making an Android app for my board community. The board provider gives me RSS feeds for the general categories but doesn't generate feeds for topics. So I retrieve topic URLs from these feeds, and I want to parse the HTML with Jsoup and pass it to a WebView.
It works fine except for the select() function, which returns nothing.
The "HTML RETREIVED" log gives me: <html><head><title>The topic title</title></head><body></body></html>
The h1 tags are in the code for testing purposes: they display fine in the WebView, and so does the title of the parsed page.
I also put the log line right after the select() line. It logs nothing either.
I've tried parsing with Jsoup alone in a pure Java project, and it works fine.
So I assumed something is wrong on the Android side.
PS: the Internet permission is declared in the manifest.
Did I miss something?
Here is the code:
String html;
Bundle param = this.getIntent().getExtras();
String url = param.getString("url");
try {
    Document doc = Jsoup.connect(url).get();
    doc.select(".topic .clear").remove();
    String title = doc.title().toString();
    html = doc.select(".username strong, .entry-content").toString();
    html = "<html><head><title>"+title+"</title></head><body><h1>"+title+"</h1>"+html+"</body></html>";
    WebView webview = new WebView(this);
    getWindow().requestFeature(Window.FEATURE_PROGRESS);
    setContentView(webview);
    webview.getSettings().setJavaScriptEnabled(true);
    final Activity activity = this;
    webview.setWebChromeClient(new WebChromeClient() {
        public void onProgressChanged(WebView view, int progress) {
            activity.setProgress(progress * 1000);
            Log.d("LOADING",""+ progress);
        }
    });
    webview.loadData(html, "text/html", "UTF-8");
    //webview.loadUrl(url);
    Log.i("HTML RETREIVED", ""+html);
} catch (IOException e) {
    Log.e("ERROR", "Error while generate topic");
}
OK, I've found out something interesting.
The class I wanted to select was not there because I was getting the mobile version of the page. It appears an Android app sends a mobile user agent, which is quite normal but not stated anywhere.
Anyway, now I know what to look into.
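One way to get the desktop markup instead is to send a desktop browser's User-Agent header yourself. Here is a minimal sketch with plain HttpURLConnection; the class name DesktopFetch and the DESKTOP_UA string are illustrative assumptions, not values the board is known to require. With Jsoup, Jsoup.connect(url).userAgent(...) sets the same header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: requests a page while identifying as a desktop browser,
// so the server is less likely to serve its mobile layout.
class DesktopFetch {
    // Assumed example value; any current desktop browser UA string would do.
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Opens the connection and sets the header. Nothing is sent over the
    // network until the caller reads the response.
    static HttpURLConnection open(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Downloads the body as text, assuming the page is UTF-8.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = open(url, DESKTOP_UA);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned string can then go through Jsoup.parse(html, url) as before. On Android, fetch() does network I/O, so it should run off the main thread.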
I want to programmatically save an entire web page on Android, including its .css and .js files.
So far I have tried a plain HTTP GET, Jsoup, and the WebView content, but none of them saves the whole page with its CSS and JS; they only save the HTML part of the page. Once the whole page is saved, I want to be able to open it offline.
Thanks in advance.
You have to take the HTML, parse it, extract the URLs of the resources, and then request those URLs too.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Stack {
    // Fill these in: the user agent to send and the page to start from
    private static final String USER_AGENT = "";
    private static final String INITIAL_URL = "";
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup
                .connect(INITIAL_URL)
                .userAgent(USER_AGENT)
                .get();
        Elements scripts = doc.getElementsByTag("script");
        Elements css = doc.getElementsByTag("link");
        // External scripts: <script src="...">; absUrl() resolves relative paths
        for (Element s : scripts) {
            String url = s.absUrl("src");
            if (!url.isEmpty()) {
                printResource(url);
            }
        }
        // Stylesheets: <link rel="stylesheet" href="...">
        for (Element c : css) {
            String url = c.absUrl("href");
            // attr() returns "" (never null) when the attribute is missing
            String rel = c.attr("rel");
            if (!url.isEmpty() && rel.equalsIgnoreCase("stylesheet")) {
                printResource(url);
            }
        }
    }
    // Downloads one resource and prints its raw text. execute().body() is used
    // instead of get(), because get() would parse the JS/CSS as HTML, wrap it in
    // <html><body> tags and escape characters such as < and &.
    private static void printResource(String url) throws Exception {
        System.out.println(url);
        String body = Jsoup
                .connect(url)
                .userAgent(USER_AGENT)
                .ignoreContentType(true)
                .execute()
                .body();
        System.out.println(body);
        System.out.println("--------------------------------------------");
    }
}
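For the offline part, each downloaded file also needs a local name, so that the saved HTML can point at it instead of the original URL. Here is a rough sketch with java.net.URI, assuming plain, already-escaped URLs (URI.create() rejects things like raw spaces); OfflineNames and its methods are hypothetical names. resolve() does the same job as Jsoup's absUrl() in the code above.

```java
import java.net.URI;

// Hypothetical helpers for saving a page for offline use.
class OfflineNames {
    // Resolves a possibly relative src/href against the page URL,
    // e.g. "../css/site.css" on /blog/post.html -> /css/site.css.
    static String resolve(String pageUrl, String ref) {
        return URI.create(pageUrl).resolve(ref).toString();
    }

    // Uses the last path segment as the local file name, ignoring any query
    // string; falls back to "index" when the path is empty or ends with "/".
    static String localFileName(String absoluteUrl) {
        String path = URI.create(absoluteUrl).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "index";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Two resources with the same file name in different folders would collide, so a real implementation needs some de-duplication. The src/href attributes in the saved page could then be rewritten to these names, for example with Jsoup's element.attr("href", name).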
I have a similar problem.
Using this code we can get images, .css and .js files. However, some HTML content is still missing.
For instance, when we save a web page in Chrome, there are 2 options:
Complete html
html only
Apart from the .css, .js, .php... files, "Complete html" contains more elements than "html only". The requirement is to download the complete HTML, as Chrome does with the first option.
I have a hidden div that JavaScript fills with JSON text. I need to find this div and read the JSON from it. How can this be done?
<html>
<div id="hiddenJSON">
{
"id":"1234",
"Name":"Jonas",
"Address":"Test Road 5",
"Phone":"1234-1234-1234"
}
</div>
</html>
Try the code below:
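import java.util.regex.Matcher;
import java.util.regex.Pattern;
// text holds the page HTML; DOTALL lets (.*?) span the line breaks inside the div
Pattern p = Pattern.compile(Pattern.quote("<div id=\"hiddenJSON\">") + "(.*?)" + Pattern.quote("</div>"), Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group(1));
}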
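The JSON in the question spans several lines, and by default . in a Java regex does not match line breaks, so the pattern needs the Pattern.DOTALL flag to find anything. This small, self-contained check (HiddenJson is a hypothetical name) runs the same pattern with and without the flag. It also only works if the JSON is already in the HTML the server sends; if JavaScript fills the div in the browser, a plain download contains an empty div.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Same pattern as the answer, with the DOTALL flag made switchable.
class HiddenJson {
    static List<String> extract(String html, boolean dotAll) {
        Pattern p = Pattern.compile(
                Pattern.quote("<div id=\"hiddenJSON\">") + "(.*?)" + Pattern.quote("</div>"),
                dotAll ? Pattern.DOTALL : 0);
        Matcher m = p.matcher(html);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // trim() drops the newlines around the JSON object
            found.add(m.group(1).trim());
        }
        return found;
    }
}
```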
But a better solution is to receive the data without the HTML tags, so talk to your back-end developer.
It would be best to use a library such as JSoup for this. Check out this question about parsing HTML code.
Here is how I solved this:
result is the value the page passes to the @JavascriptInterface method.
In the WebView fragment:
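WebView wv = ...
// javascript: URLs and the bridge only work with JavaScript enabled
wv.getSettings().setJavaScriptEnabled(true);
wv.addJavascriptInterface( this, "android" );
// Run this once the page has loaded (e.g. in onPageFinished), after its own JS has filled the div
wv.loadUrl( "javascript:android.showHTML(document.getElementById('hiddenJSON').innerHTML);" );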
The interface method in my WebView fragment:
@JavascriptInterface
public void showHTML( String result ) {
    // handle JSON (result); this is called on a background thread, not the UI thread
}
Problem:
The hidden div is only filled with JSON when the page's JavaScript runs, so I had to get the result from my WebView, where that JavaScript actually executes.
I want to make an app that loads content from a web page into a WebView, but shows only one particular part of the page, not its whole content.
For example, if I use http://us.m.yahoo.com/w/search%3B_ylt=A2KL8xs0vUBQMg0AwAkp89w4?submit=oneSearch&.intl=us&.lang=en&.tsrc=yahoo&.sep=fp&p=digital+cameras&x=0&y=0 as the URL for the WebView, it loads all the contents of the page. But I want to remove the page's banner before showing it in my app's WebView.
I have tried an ad-blocker approach with CSS selectors, but it doesn't help. Please give me some ideas for overcoming this problem.
Thanks.
Thank you for the answer, Zyber. I have solved it by injecting JavaScript into the WebView in Android.
final WebView webview = (WebView) findViewById(R.id.browser);
webview.getSettings().setJavaScriptEnabled(true);
webview.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        // Hide the first <header> element once the page has loaded
        webview.loadUrl("javascript:(function() { " +
                "document.getElementsByTagName('header')[0].style.display='none'; " +
                "})()");
    }
});
webview.loadUrl("http://code.google.com/android");
This served my purpose, and it is easy to use too.
I solved it by adding this:
view.getSettings().setJavaScriptEnabled(true);
view.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        view.loadUrl("javascript:(function() { " +
                "var head = document.getElementsByClassName('header')[0].style.display='none'; " +
                "var head = document.getElementsByClassName('blog-sidebar')[0].style.display='none'; " +
                "var head = document.getElementsByClassName('footer-container')[0].style.display='none'; " +
                "})()");
    }
});
view.loadUrl("your url");
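This hides the elements with those classes in the WebView. (The "var head =" prefix is not actually needed; the style.display='none' assignment is what does the hiding.)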
I hope this will be helpful for someone.
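If one of those classes is missing on a page, getElementsByClassName(...)[0] is undefined, the script throws, and the remaining statements are skipped. Here is a sketch of a hypothetical helper that builds the same kind of javascript: URL, but guards each lookup and hides every element with the class rather than only the first (a deliberate change from the code above). It assumes plain class names without quotes.

```java
// Hypothetical helper that builds a "javascript:" URL hiding elements by class.
class HideScript {
    static String forClasses(String... classNames) {
        StringBuilder js = new StringBuilder("javascript:(function() { var els, i; ");
        for (String name : classNames) {
            // An empty result simply skips the loop instead of throwing,
            // so a missing class does not stop the remaining statements.
            js.append("els = document.getElementsByClassName('")
              .append(name)
              .append("'); for (i = 0; i < els.length; i++) { els[i].style.display = 'none'; } ");
        }
        return js.append("})()").toString();
    }
}
```

Inside onPageFinished() this would be used as view.loadUrl(HideScript.forClasses("header", "blog-sidebar", "footer-container"));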
Check out Jsoup: it is a library that provides an easy way of extracting HTML elements from a web page.
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet(url.toURI());
HttpResponse resp = client.execute(get);
String content = EntityUtils.toString(resp.getEntity());
Document doc = Jsoup.parse(content);
Elements ele = doc.select("div.classname");
This example executes an HTTP GET and then extracts the div elements with the class "classname", which you can then load into your WebView.
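To actually load the extracted div, it can be wrapped in a minimal page. A small sketch under these assumptions: the fragment comes from something like ele.outerHtml(), and FragmentPage is a hypothetical helper. The title is HTML-escaped, since concatenating raw text containing characters like < or & can break the markup.

```java
// Hypothetical helper: wraps an extracted HTML fragment in a minimal page.
class FragmentPage {
    // Escapes the characters that are special in HTML text and attributes.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    // The fragment is inserted as-is: it is already HTML.
    static String wrap(String title, String fragmentHtml) {
        return "<html><head><meta charset=\"utf-8\"><title>" + escape(title)
                + "</title></head><body>" + fragmentHtml + "</body></html>";
    }
}
```

On Android, the result could be passed to webView.loadDataWithBaseURL(url.toString(), page, "text/html", "UTF-8", null); using the page URL as the base URL lets relative links and images in the fragment resolve.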
I don't know if you have already tried the Google IO application, but it has a cool feature that displays all the tweets containing the Google IO hashtag.
I really would like to offer the same feature to my users.
I could do something similar using the API, but I would have to create a custom ListView and parse XML/JSON feeds, which is quite overcomplicated! And of course that list would not update automatically like a live feed.
In the application, I have just noticed (by turning off Wi-Fi) that this is indeed a WebView with this URL:
http://www.google.com/search?%20tbs=mbl%3A1&hl=en&source=hp&biw=1170&bih=668&q=%23io2011&btnG=Search
Here is a screenshot of the app and the same URL in the browser:
High resolution picture: http://cl.ly/3q1r0c2J3H163E3G2p2X
But loading this URL in a WebView displays only a Google search and does not offer the same feature.
I know this app will certainly be open-sourced, but I am quite skeptical about the "within the next days" that Google promised.
We are still waiting for the Twitter app source code!
If you wait 'til after the conference is over, you'll find the source code for the app here. You'll also find last year's application's source code there.
Update:
I just looked at the source code, and you're almost right. It's a WebView with this URL: http://www.google.com/search?tbs=mbl%3A1&hl=en&source=hp&biw=1170&bih=668&q=%23io2011&btnG=Search so it seems you may have put the %20 in there by accident.
Code:
public static final String EXTRA_QUERY = "com.google.android.iosched.extra.QUERY";
public static final String CONFERENCE_HASHTAG = "#io2011";
private String mSearchString;
//onCreate()
final Intent intent = BaseActivity.fragmentArgumentsToIntent(getArguments());
mSearchString = intent.getStringExtra(EXTRA_QUERY);
if (TextUtils.isEmpty(mSearchString)) {
    mSearchString = CONFERENCE_HASHTAG;
}
if (!mSearchString.startsWith("#")) {
    mSearchString = "#" + mSearchString;
}
//onCreateView
mWebView = (WebView) root.findViewById(R.id.webview);
mWebView.post(new Runnable() {
    public void run() {
        mWebView.getSettings().setJavaScriptEnabled(true);
        mWebView.getSettings().setJavaScriptCanOpenWindowsAutomatically(false);
        try {
            mWebView.loadUrl(
                    "http://www.google.com/search?tbs="
                    + "mbl%3A1&hl=en&source=hp&biw=1170&bih=668&q="
                    + URLEncoder.encode(mSearchString, "UTF-8")
                    + "&btnG=Search");
        } catch (UnsupportedEncodingException e) {
            Log.e(TAG, "Could not construct the realtime search URL", e);
        }
    }
});
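The query handling in that snippet can be checked in plain Java. This sketch (RealtimeSearchUrl is a hypothetical name) repeats the same steps: fall back to the conference hashtag, force a leading #, and URL-encode the result, which turns # into %23. It only illustrates the URL construction; it says nothing about whether Google still serves that realtime view.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Plain-Java version of the query handling from the I/O app snippet.
class RealtimeSearchUrl {
    static final String CONFERENCE_HASHTAG = "#io2011";

    static String build(String query) {
        // Empty or missing query: use the conference hashtag.
        String q = (query == null || query.isEmpty()) ? CONFERENCE_HASHTAG : query;
        if (!q.startsWith("#")) {
            q = "#" + q;
        }
        try {
            return "http://www.google.com/search?tbs="
                    + "mbl%3A1&hl=en&source=hp&biw=1170&bih=668&q="
                    + URLEncoder.encode(q, "UTF-8")
                    + "&btnG=Search";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```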
It's probably implemented with the Loaders API, with throttling.
Impatiently waiting for source code as well.
I am trying to parse HTML from a web page in Android, and since the page is not well-formed, I get a SAXException.
Is there a way to parse HTML in Android?
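The SAXException itself comes from using an XML parser: XML parsers reject anything that is not well-formed, and ordinary HTML (an unclosed <br>, for instance) usually isn't. Here is a small illustration with the JDK's DocumentBuilder; StrictParse is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Shows why a strict XML parser fails on typical HTML.
class StrictParse {
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // This is the exception from the question.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

isWellFormedXml("<p>one<br>two</p>") is false, while the same markup written as <p>one<br/>two</p> parses, which is why a lenient HTML parser is needed for real-world pages.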
I just encountered this problem. I tried a few things, but settled on using JSoup. The jar is about 132k, which is a bit big, but if you download the source and take out some of the methods you will not be using, then it is not as big.
The good thing about it is that it handles badly formed HTML.
Here's a good example from their site.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
//http://jsoup.org/cookbook/input/load-document-from-url
//Document doc = Jsoup.connect("http://example.com/").get();
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href");
    String linkText = link.text();
}
Have you tried using Html.fromHtml(source)?
I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags though, but it does come with a handler you can implement to react on tags it doesn't understand.
String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();
We all know that programming has endless possibilities. There are many solutions to a single problem, so all of the above may be helpful to someone, but this one saved my day.
The code goes like this:
private void getWebsite() {
    // Network access must not run on the main thread, so fetch in a worker thread
    new Thread(new Runnable() {
        @Override
        public void run() {
            final StringBuilder builder = new StringBuilder();
            try {
                Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
                String title = doc.title();
                Elements links = doc.select("a[href]");
                builder.append(title).append("\n");
                for (Element link : links) {
                    builder.append("\n").append("Link : ").append(link.attr("href"))
                            .append("\n").append("Text : ").append(link.text());
                }
            } catch (IOException e) {
                builder.append("Error : ").append(e.getMessage()).append("\n");
            }
            // Views may only be touched on the UI thread; result is assumed to be a TextView field
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    result.setText(builder.toString());
                }
            });
        }
    }).start();
}
You just have to call the above method in the onCreate() method of your MainActivity.
I hope this one is also helpful for you guys.
Also read the original blog at Medium
Maybe you can use WebView, but as the documentation says, by default a WebView does not enable JavaScript and provides no browser-like widgets.
http://developer.android.com/reference/android/webkit/WebView.html
You can enable JavaScript if you need it.