My goal is to interact with a website (not mine) from my Android app, written in Kotlin: getting data from it and posting data to it. The interaction is to be done in the background, as the result is to be shown in a RecyclerView in my app.
The website in question uses Knockout.js; its responsiveness and dynamically changing data make it impossible to use libraries such as Jsoup for my goal.
I am an aspiring app developer (n00b), and I have a question for the more senior devs here:
Is my project impossible? I have read it is "complex" to interact with a website that is dynamic, and I have also heard it is impossible. Is it? If not, could you guide me to the libraries I should be using? It is ok if these are in Java, I could probably look at adapting these to Kotlin.
If the site you need to extract data from produces a predictable result when you make a request to a URL, then it would be easy to extract the data you need from it using a library like Jsoup, which you've mentioned. Based on the Jsoup docs, that would be something like:
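import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Jsoup.connect(...).get() throws IOException; on Android, call it off the main thread.
Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
System.out.println(doc.title());
Elements newsHeadlines = doc.select("#mp-itn b a");
for (Element headline : newsHeadlines) {
    System.out.printf("%s%n\t%s%n",
            headline.attr("title"), headline.absUrl("href"));
}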
Here doc.select takes a CSS selector: #mp-itn b a matches the links (a) inside bold (b) elements within the element whose id is mp-itn, i.e. the part of the page whose contents you're looking to extract.
Whether the site uses Knockout or another JS library to render content matters only if the data you want is produced by that script. Jsoup just parses the string contents of the response, basically what you see when you view source in your browser. It does not execute JavaScript, so anything Knockout fills in on the client (for example, data loaded through an AJAX call) will not be in the HTML you parse. If the data is already in the served HTML, the approach above works regardless of the framework.
But doing all of this is rather irregular, as @Gushan indicates. Unless you're doing some sort of scraping (which would be odd for an Android app), a site that wants to give you data will normally provide an API (usually some sort of REST API) that documents how to get that data. But I guess things aren't always like that. :)
Related
I know how to get a website's source code using URLConnection and BufferedReader. However, sometimes a website itself gets its data from elsewhere and displays it on the page.
For example, I am currently working on this page:
http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en
The names of the 10 branches and the other details in the page's table are not in the page's source code.
Question:
Instead of extracting data from the source code, is there a way to extract the text as it finally appears on the page? If so, how could it be done?
Thanks a lot.
Yes, there is a way to extract the information from the website even if it performs client-side operations, such as loading data from an external website before displaying it. It is a very tricky solution, though. If you have the opportunity to make an agreement with the website's owner and ask them to provide an API for your application, I'd choose that option.
OK, based on your question, you can try using Android's WebView to render the website first, then get the HTML content using one of the methods described here. The trickiest part is making it user-friendly: you have to cover the WebView with a progress bar while your app waits for WebView's onPageFinished callback. I'm not sure WebView behaves properly in that case, but it's worth a try.
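Here is a minimal sketch of the last step of that approach, assuming API level 19+, where WebView.evaluateJavascript is available. Inside WebViewClient.onPageFinished you can call view.evaluateJavascript("document.documentElement.outerHTML", callback). The callback receives the result JSON-encoded, so the HTML arrives wrapped in quotes with characters escaped (a < often shows up as a \u003C escape). The plain-Java helper below (the class name is hypothetical) turns that string back into HTML. It assumes the script returned a single string.

```java
/**
 * Hypothetical helper: decodes the JSON-encoded value that WebView.evaluateJavascript
 * hands to its callback when the script returns a string.
 */
final class JsResultDecoder {
    private JsResultDecoder() {}

    static String decode(String json) {
        // A script that returns nothing (or null) yields the literal text "null".
        if (json == null || json.equals("null")) {
            return null;
        }
        if (json.length() < 2 || json.charAt(0) != '"' || json.charAt(json.length() - 1) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal");
        }
        StringBuilder out = new StringBuilder(json.length());
        // Walk the characters between the surrounding quotes, undoing JSON escapes.
        for (int i = 1; i < json.length() - 1; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char esc = json.charAt(++i);
            switch (esc) {
                case '"': out.append('"'); break;
                case '\\': out.append('\\'); break;
                case '/': out.append('/'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Four hex digits follow, e.g. 003C for '<'.
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Bad escape: " + esc);
            }
        }
        return out.toString();
    }
}
```

In the callback you would then pass the decoded HTML to Jsoup or your own parsing code. Keep in mind that onPageFinished can fire before scripts such as Knockout have finished loading their data, so you may need to wait or retry.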
Short Answer: You can't.
Reason: HTML is rendered on the client side, e.g. by browsers such as Chrome, Firefox, Internet Explorer, etc. Since you don't have an interpreter for the markup language, you can't get just the rendered tag content. Even browsers download all of the content; that is how HTTP works.
Workaround: Since you mentioned that some branches are not in the page, I assume they are produced on the client side by some JavaScript. What you can do is check what the client is executing and perform the same steps in your own code, since your client is the app.
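One way to do that is to open the page in a desktop browser and watch the developer tools' Network tab for the request that actually returns the branch data. Then call that URL directly. Below is a minimal sketch using HttpURLConnection, which is available on both the JVM and Android. The endpoint you pass in and the header values are assumptions; copy whatever the real page's request uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class BackgroundRequest {
    private BackgroundRequest() {}

    /** Prepares (but does not yet send) a GET that mimics a browser's background request. */
    static HttpURLConnection prepare(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        // Assumed headers: many sites check these on AJAX calls; copy what the browser sent.
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setRequestProperty("Accept", "application/json, text/html");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    /** Reads a whole response body as text, assuming UTF-8. */
    static String readBody(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends the request and returns the body; throws on non-2xx responses. */
    static String fetch(String endpoint) throws IOException {
        HttpURLConnection conn = prepare(endpoint);
        try {
            int status = conn.getResponseCode();
            if (status / 100 != 2) {
                throw new IOException("HTTP " + status + " from " + endpoint);
            }
            return readBody(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

Because fetch blocks, call it from a background thread on Android. If the endpoint returns JSON, you can parse that directly instead of scraping any HTML.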
Also see: Jsoup
You cannot extract only the information you want without downloading the source HTML. After you have downloaded the source, you can use Jsoup to narrow it down to only the information you want.
Add this to your app-level build.gradle file (older Gradle versions used compile instead of implementation):
implementation 'org.jsoup:jsoup:1.9.2'
Then you can download and parse the source code:
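import java.io.InputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String url = "http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en";
InputStream input = new URL(url).openStream();
Document doc = Jsoup.parse(input, null, url);
Elements sectionElements = doc.select("div#general-info-panel");
Elements imageElements = sectionElements.select("img[src]");
Passing null as the charset lets Jsoup detect the page's encoding instead of forcing ISO-8859-9 (a Turkish code page). The selectors above are only an example; you need to adapt them to your page's HTML source. You can find examples of how to use Jsoup in its documentation.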
PhantomJS (http://phantomjs.org/) can be used to extract a website's content after the JavaScript has executed. I'm not sure whether it has an Android build.
This question is not only for Android programmers, but also for anyone interested in web page design.
I would like to make an Android app that renders only some parts of specific web pages (not the whole page).
I have heard about the jsoup library as a tool that does this.
My main problem is:
How can I choose the correct link in a web page's source that renders a particular part of the page?
For example, let us take the famous website FORBES.
How can I render the list of the richest men, with their Rank, Name, Net Worth, Change, Age, Source, and Country of Citizenship as they appear there, excluding the other parts of the web page?
Here is a good example of an application that accomplishes this kind of task.
You may have a good suggestion.
You need to screen-scrape the HTML. I'm not sure about any Android libraries for doing this, but I would build a RESTful service to return the data I needed. The service would then do the heavy lifting of scraping the web page and converting the data to JSON to be sent back to the device.
On the server side I would use a library like Beautiful Soup to do the scraping. It is easy enough to use once you have it installed. You create a BeautifulSoup object from the HTML and use expressions like soup.title.string to get the title of the page. You can use the tags in the HTML to drill down to the elements you want and build up a JSON object from there. Here is an image of the elements you are interested in for that list. Note the #ids on the right for that list item.
http://i.imgur.com/TMjhYvY.jpg
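The answer suggests Beautiful Soup (Python) on the server. To stay with Java, the language of the other examples here, this sketch shows only the final step: turning scraped rows into the JSON the app would download. The Entry fields are a hypothetical subset of the list's columns. The hand-written escaping only keeps the example dependency-free; a JSON library would normally do this.

```java
import java.util.List;
import java.util.StringJoiner;

final class RichListJson {
    private RichListJson() {}

    /** One scraped row; the field set is a hypothetical subset of the list's columns. */
    static final class Entry {
        final int rank;
        final String name;
        final String netWorth;
        final String country;

        Entry(int rank, String name, String netWorth, String country) {
            this.rank = rank;
            this.name = name;
            this.netWorth = netWorth;
            this.country = country;
        }
    }

    /** Wraps a string in quotes, escaping characters JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Other control characters are written as four-digit hex escapes.
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    /** Serialises the scraped rows as a JSON array for the Android client. */
    static String toJson(List<Entry> entries) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (Entry e : entries) {
            array.add("{\"rank\":" + e.rank
                    + ",\"name\":" + quote(e.name)
                    + ",\"netWorth\":" + quote(e.netWorth)
                    + ",\"country\":" + quote(e.country) + "}");
        }
        return array.toString();
    }
}
```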
I have a database of content, the majority of which is HTML pages that are displayed in an app.
We are looking to build out a search feature but I have some concerns over false positives appearing due to the results including HTML code.
E.g. searching for "title" will return any content page that has a title HTML tag.
We are currently using NSPredicates to perform the query on a Core Data database.
Are there any easy/efficient ways to prevent these results being returned?
I have the same problem on Windows and Android as well!
One idea for iOS is to store a separate plain-text version alongside the HTML version. You could then use very simple (even if not very efficient) predicates like
[NSPredicate predicateWithFormat:@"text CONTAINS[cd] %@", searchText];
A more performant way would be to strip out the words and store them in lowercase in an indexed attribute of another entity.
In both cases, the parsing should be done beforehand via one of the available libraries (see e.g. link in the comment).
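For the Android/Windows side of the question, here is a plain-Java sketch of the second idea: strip the markup once, when content is saved, and index the lowercase words of the visible text. The regex-based tag stripping is a naive stand-in for the parsing libraries mentioned above. It does not decode entities such as &amp;. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class SearchIndex {
    // Naive stand-in for a real HTML parser: drop <script>/<style> bodies, then any tag.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    /** Lowercase word -> ids of the pages whose visible text contains it. */
    private final Map<String, Set<Integer>> index = new TreeMap<>();

    static String toPlainText(String html) {
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        return TAG.matcher(noScripts).replaceAll(" ");
    }

    /** Indexes one page; call this when the content is stored, not at query time. */
    SearchIndex add(int pageId, String html) {
        for (String word : NON_WORD.split(toPlainText(html).toLowerCase(Locale.ROOT))) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(pageId);
            }
        }
        return this;
    }

    /** Case-insensitive lookup of a single word; markup such as "title" tags never matches. */
    Set<Integer> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```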
I want to write a program that gets the match dates from this link: http://www.goal.com/en/teams/germany/148/fc-bayern-munich-news
and use them in my program. I just want the dates and the matches. How can I do this in Android?
I'd write an Activity to display the data, which calls an AsyncTask to connect to the site and download the HTML. I'd then use some kind of parser to grab the data I want and save it to a database.
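AsyncTask has since been deprecated (API level 30). Below is a minimal sketch of the same shape using an Executor and CompletableFuture (available on Android from API level 24). It downloads and parses on a background executor, then hands the result over on another executor. The download, parse, and onResult hooks are hypothetical placeholders you would supply. On Android, the result executor could be ContextCompat.getMainExecutor(context).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class MatchLoader {
    private MatchLoader() {}

    /**
     * Runs download and parse on the background executor, then delivers the parsed
     * list to onResult on resultExecutor (e.g. the main thread on Android).
     */
    static <T> CompletableFuture<Void> load(
            Executor background,
            Executor resultExecutor,
            Supplier<String> download,
            Function<String, List<T>> parse,
            Consumer<List<T>> onResult) {
        return CompletableFuture
                .supplyAsync(download, background)          // e.g. fetch the page's HTML
                .thenApplyAsync(parse, background)          // extract dates/matches off the UI thread
                .thenAcceptAsync(onResult, resultExecutor); // update the UI or save to a database
    }
}
```

Exceptions thrown by download or parse complete the returned future exceptionally, so handle them (for example with exceptionally or whenComplete) before showing anything to the user.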
Have you written Java before? If not I'd start out by learning the language. Download Eclipse and write a simple program that can connect to the site and grab the HTML. Then add the parser.
Once you are that far, do the Hello World tutorial, then work your way through the other tutorials. Also learn about the Android Application Lifecycle. At that point you can start thinking about moving your code over to the Android framework.
EDIT
Here are some links to information about potential parsers & parsing approaches.
Tag Soup
What HTML parsing libraries do you recommend in Java
Two HTML parsing links
You could also consider using (hushed voice) regex/pattern matching.
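If you do go the regex route, apply it to text you have already pulled out of the page (for example with one of the parsers above), not to raw HTML. Here is a minimal sketch, assuming each fixture ends up on its own line in a form like "14/09/2013 Home FC vs Away FC". That format is hypothetical, so check what the page actually shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchExtractor {
    private MatchExtractor() {}

    // Hypothetical line format: "<d/m/yyyy> <home team> vs <away team>".
    private static final Pattern LINE = Pattern.compile(
            "(\\d{1,2}/\\d{1,2}/\\d{4})\\s+(.+?)\\s+vs\\.?\\s+(.+)");

    /** Returns "date: home - away" for every line that looks like a fixture. */
    static List<String> extract(String text) {
        List<String> matches = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                matches.add(m.group(1) + ": " + m.group(2) + " - " + m.group(3));
            }
        }
        return matches;
    }
}
```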
I am looking into developing an Android app that will convert a website into more readable data. I am at university, and we have an online notice board that can be viewed on the web. If possible, I would like to bring it into an Android app to make it easier to read on mobile devices.
What I am thinking is that the app would go to the website where the notice board is held and read in the HTML code to display each notice in a list adapter view. Each notice is within its own div, so I assume I could use that to split each notice into its own button in the list adapter view. Is this possible, and if so, how can I go about doing it? I have searched Google for an answer but have not yet found a solution to this problem.
Thanks for your help
It seems overly complicated to me. I wouldn't handle all that using Android. I'd crawl the data on a machine (server) and then I'd convert all needed data to JSON and have the Android (client) fetch the data using a simple JSON parser.
In my opinion, that would be the easiest solution if you don't have access to the server the website is hosted on, so that you could have it generate a JSON feed for you directly.
EDIT: In answer to your comment, Boardy:
Here is the official website of the JSON project, to get an understanding of what it is. Then, if you have access to the web server providing that page (I assume it is a PHP-based site) and want to modify it or add the functionality of providing a JSON feed, you should also take a look at the PHP JSON documentation.
To parse JSON on Android check out this SO question and also don't forget to take a look at the official Android documentation on their JSON implementation.