I have an SDK (Android). Is there any way to get an entire web page (the HTML and JavaScript, the whole page) by making an API call?
There are many ways to get the URL of a web page. I don't want the URL; I want the full HTML of the page directly so that I can use it in a mobile app.
Is it possible? If so, how? If not, why not?
Any help is appreciated.
If you have the URL of the page, the API you need is a plain HTTP GET request for that URL. The response contains all the HTML of the page, including the tags that reference external JavaScript, CSS, images, etc. Those external resources are not part of the response; each one has to be requested separately.
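A minimal sketch of such a GET in plain Java, assuming the page is served as UTF-8 text and that the call runs on a background thread (Android does not allow network access on the main thread). The class name PageFetcher and its method names are illustrative and do not come from any SDK. The result is the HTML as delivered by the server, before any JavaScript has run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the class and method names are illustrative only.
final class PageFetcher {

    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Issues an HTTP GET and returns the response body as text.
    // Assumes UTF-8; a real page may declare a different charset.
    // getInputStream() throws an IOException for error responses (4xx/5xx).
    static String fetchHtml(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling PageFetcher.fetchHtml with the page URL would then return the page source as a String, which the app can parse or display.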
Related
I know how to fetch the source code of a website with URLConnection and BufferedReader, but sometimes a website loads its data from elsewhere and then displays it on the page.
For example, I am currently working on this page:
http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en
The names of the 10 branches and the other details shown in the table on that page are not in the page's source code.
Question:
Instead of extracting data from the source code, is there any way to extract the text as it is finally displayed on the page? If yes, how can it be done?
Thanks a lot.
Yes, there is a way to extract the information from the website even if it performs client-side operations, such as loading data from an external source before displaying it. It will be a very tricky solution, though. If you have the opportunity to reach an agreement with the website's owner and ask them to provide an API for your application, I'd choose that option.
Otherwise, you can try using Android's WebView to render the website first, and then get the HTML content using one of the methods described here. The trickiest part is making it user friendly: you have to cover the WebView with a progress bar while your app waits for the onPageFinished callback from the WebView. I'm not sure the WebView behaves properly in that case, but it's worth a try.
Short answer: You can't.
Reason: The HTML is rendered on the client side, e.g. by a browser such as Chrome, Firefox, or Internet Explorer. Since you don't have an interpreter for the markup language, you cannot fetch only the content of particular tags. Even browsers download the entire document, because that is how HTTP works.
Workaround: Since you mentioned that some branches are not in the page source, I assume they are loaded on the client side by some JavaScript. You can check which requests the client executes and perform the same requests in your code, since in your case the client is the app.
Also see: Jsoup
You cannot extract only the information you want without downloading the HTML source. After you have downloaded the source, you can use jsoup to navigate to just the information you want.
Add this to the dependencies block of your app-level build.gradle file (older Gradle versions used compile instead of implementation):
implementation 'org.jsoup:jsoup:1.9.2'
Then you can download and parse the source code.
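// Needs java.io.InputStream, java.net.URL, org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements.
// Run this off the main thread; it throws IOException on network failure.
String url = "http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en";
InputStream input = new URL(url).openStream();
// Passing null as the charset lets jsoup detect it from the page's meta tag, falling back to UTF-8.
Document doc = Jsoup.parse(input, null, url);
input.close();
// Example selectors; replace them with ones that match the target page.
Elements sectionElements = doc.select("div#general-info-panel");
Elements imageElements = sectionElements.select("img[src]");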
You need to adapt the selectors in the code above to the source of your own HTML page. You can find examples of how to use jsoup online.
PhantomJS (http://phantomjs.org/) can be used to extract a website's content after JavaScript execution. I'm not sure whether there is an Android build.
I've created a simple Android browser. I use an EditText for the URL and a WebView for loading web pages. The browser works fine when I enter the complete URL, but I need functionality that gives predictions/auto-completion for partially entered URLs. Please let me know whether the following options are valid:
Using an AutoCompleteTextView instead of the EditText, backed by a database of all the websites available on the internet. (That is more than 1 billion websites! Where can I get such a dynamically updated database?)
Saving the URLs the user visits frequently and using them as predictions in an AutoCompleteTextView. (How can I add these URLs to a dynamic AutoCompleteTextView/database?)
Currently I am using Google search as a workaround. Please share your views on how this can be achieved.
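The second option can be sketched in plain Java as an in-memory store that counts visits and returns the most frequently visited URLs containing the typed text. This is only an illustration under stated assumptions: the class UrlHistory and its methods are hypothetical names, matching is a simple case-insensitive substring test, limit is assumed to be non-negative, and persistence is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory history store; names are illustrative only.
final class UrlHistory {
    private final Map<String, Integer> visits = new HashMap<>();

    // Call this each time the user loads a URL.
    void recordVisit(String url) {
        Integer count = visits.get(url);
        visits.put(url, count == null ? 1 : count + 1);
    }

    // Returns up to `limit` visited URLs that contain the typed text,
    // most frequently visited first (ties broken alphabetically).
    List<String> suggest(String typed, int limit) {
        String needle = typed.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String url : visits.keySet()) {
            if (url.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(url);
            }
        }
        Collections.sort(matches, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byCount = Integer.compare(visits.get(b), visits.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        if (matches.size() > limit) {
            return new ArrayList<>(matches.subList(0, limit));
        }
        return matches;
    }
}
```

In an app, the visit counts would need to be persisted (for example in SQLite) and the suggestions supplied to the AutoCompleteTextView through an adapter; that wiring is left out here.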
I have an app that has a web-view which has a basic web-form that has a few fields and a submit button. I would like to figure out in my app if the form has any input in any of the fields. I cannot change the form from the server side, and I can't be certain much about the fields (ids / names in the html).
In iOS we accomplish this with an interesting process of pulling all the html out when loading the form, and comparing it to the html at any given point, if they don't match, the user must have entered something into a field. I believe we were able to get the html by injecting and running some javascript into the web-view. I'm not sure exactly how to approach the problem on android, or if android has any better tools to get whether a form has been edited.
Anybody have any ideas / pseudo-code how I can tell if a form has had input in any of the fields in a webview in android?
Unfortunately, there are no special form-related tools in Android WebView either. You can use the same approach as you have described for iOS.
A couple of links to get you started:
Read HTML content of webview widgets
Android Web-View : Inject local Javascript file to Remote Webpage
I have implemented jsoup in Android. Jsoup.connect() fetches the HTML content of the site "http://karnatakatourism.org/" correctly, but it fetches nothing for the URL "http://karnatakatourism.org/Bidar/en/". I want to fetch the data from the links that are present in the HTML page of www.karnatakatourism.org. Can anyone help?
It seems that most of the content is loaded by some AJAX magic. You can try analyzing the network traffic to find the URLs you are really interested in. These might then be retrievable via Jsoup.connect().
Another approach could be to use other tools such as Selenium, but I don't know how far you can get with this on the Android platform. Selendroid could probably provide your answer.
I pulled a website to a WebView via HTTP GET. The problem is that the website isn't formatted for mobile. I found that if I edit the HTML, I can comment out the scripting that makes the left pane on the site.
Method:
Download the page to a string, find the first occurrence of the substring <link and replace it with <!--, write the result to a file, and load that file into the WebView.
That works great until it comes to a link. Clicking on it causes the WebView to attempt to load file:///index.php/Whatever_the_page_was.
What I want to do is capture that link request and change the file:/// part to www.wurmpedia.com, and then run it through my parser to remove the script like the first, and repeat the process on any other link click that follows.
I could not find any other way to pull this off and this is what I made up. Any help would be appreciated, either through URL modification or with a more efficient method.
How about intercepting the link request using
WebViewClient.shouldInterceptRequest
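Whichever WebViewClient callback ends up intercepting the click, the file:/// to www.wurmpedia.com rewrite described in the question is plain string handling. A minimal sketch follows, with the assumptions stated up front: the class LinkRewriter is a hypothetical name, and the http:// scheme is a guess, since the question names only the host.

```java
// Hypothetical helper; the "http://" scheme is an assumption, since the
// question names only the host www.wurmpedia.com.
final class LinkRewriter {
    private static final String LOCAL_PREFIX = "file:///";
    private static final String REMOTE_PREFIX = "http://www.wurmpedia.com/";

    // Maps file:///index.php/Page to http://www.wurmpedia.com/index.php/Page.
    // URLs that do not start with file:/// are returned unchanged.
    static String toRemoteUrl(String requestedUrl) {
        if (requestedUrl.startsWith(LOCAL_PREFIX)) {
            return REMOTE_PREFIX + requestedUrl.substring(LOCAL_PREFIX.length());
        }
        return requestedUrl;
    }
}
```

The rewritten URL can then be downloaded and passed through the same parser as the first page before it is loaded into the WebView.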