I know how to get data from a URL, but my question is this: when a user pastes a URL into my EditText, I want to expand it to show the description and main image from the page. When you paste a URL into a Facebook or Google+ text field, it reads and expands the URL, and I want the same behavior. In web development we can get this data from the HTML <meta> tags, but on Android how can I fetch just the <meta> data rather than the whole HTML of the page?
Check out this question on SO: How to extract meta tags from website on android?
You can pull the meta tags from websites using that library.
Edit:
Due to the lack of documentation on that other library, I did some research and found something better. There is a library called jsoup, which ships as a .jar file and can be imported into the libs folder of your Android Studio project.
jsoup lets you pull tags from websites using a method known as scraping. Disclaimer: make sure you have permission to scrape a website, depending on what you're doing with the results. Here's a great tutorial on how to use jsoup to pull tags into Android. It has a really well laid out example.
Hopefully this works a little better.
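As a rough illustration of the parsing step, here is a minimal sketch that pulls the description and image out of <meta> tags once the page's HTML is already in a string. It uses only java.util.regex as a stand-in so that it runs without any dependency; jsoup would be the sturdier choice for real pages. The class name MetaTagSketch, the method extractMeta, and the sample HTML are made up for this example, and the regex assumes quoted attribute values with no ">" inside them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of jsoup or the Android SDK.
class MetaTagSketch {
    // Matches a whole <meta ...> tag, case-insensitively.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches name="value" or name='value' pairs inside a tag.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*([\"'])(.*?)\\2");

    /** Maps each meta tag's property/name attribute to its content attribute. */
    static Map<String, String> extractMeta(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String key = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String name = attr.group(1).toLowerCase(Locale.ROOT);
                if (name.equals("property") || name.equals("name")) {
                    key = attr.group(3);
                } else if (name.equals("content")) {
                    content = attr.group(3);
                }
            }
            // Skip tags such as <meta charset="..."> that lack a key or content.
            if (key != null && content != null) {
                result.put(key, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample HTML invented for the example.
        String html = "<html><head>"
                + "<meta property=\"og:description\" content=\"A sample page\">"
                + "<meta property=\"og:image\" content=\"http://example.com/a.png\">"
                + "</head><body></body></html>";
        Map<String, String> meta = extractMeta(html);
        System.out.println(meta.get("og:description"));
        System.out.println(meta.get("og:image"));
    }
}
```

Note that the whole page still has to be downloaded before this runs; the sketch only narrows the result down to the tags a link preview needs.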
While I know how to extract contents of a website by URLConnection and BufferedReader and get its source code, sometimes a website is itself getting data from elsewhere and showing onto the page.
e.g. I am now working on this page
http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en
and the names of the 10 branches and the other details in the table on that page are not in the page's source code.
Question:
Instead of extracting data from the source code, is there any way to extract the text as it is finally displayed on the page? If yes, how could it be done?
Thanks a lot.
Yes, there is a way to extract the information from the website even if it performs client-side operations, such as loading data from an external source before displaying it. It is a tricky solution, though. If you have the opportunity to reach an agreement with the website's owner and ask them to provide an API for your application, I'd choose that option.
That said, you can try using Android's WebView to render the website first, and then get the HTML content using one of the methods described here. The trickiest part is making it user-friendly: you have to cover the WebView with a progress bar while your app waits for the onPageFinished callback from the WebView. I'm not sure that WebView behaves properly in that case, but it's worth a try.
Short Answer: You can't.
Reason: HTML is rendered on the client side, e.g. by browsers such as Chrome, Firefox, or Internet Explorer. Since you don't have an interpreter for the markup language, you can't fetch only the content of a tag. Even browsers download the whole content; that is how HTTP works.
Workaround: Since you mentioned that some branches are not in the page source, I assume they are loaded on the client side by some JavaScript. What you can do is check which requests the client executes and perform them in your own code, since your app is the client.
Also see: Jsoup
You cannot extract only the information you want without downloading the source HTML. After you have downloaded the source, you can use jsoup to narrow it down to just the information you want.
Add this to your app-level build.gradle file (on current Gradle versions the configuration is called implementation; older versions used compile):
implementation 'org.jsoup:jsoup:1.9.2'
Then you can download and parse the source code.
String url = "http://bet.hkjc.com/marksix/userinfo.aspx?file=lucky_ocbs.asp&lang=en";
InputStream input = new URL(url).openStream();
Document doc = Jsoup.parse(input, "ISO-8859-9", url);
Elements sectionElements = doc.select("div#general-info-panel");
Elements imageElements = sectionElements.select("img[src]");
You need to adapt the selectors in the code above to the HTML source of your page. You can find examples of how to use jsoup online.
http://phantomjs.org/ can be used to extract a website's content after JavaScript execution. I'm not sure whether it has an Android build.
I want to add a tab to my Android App that pulls information from the web. The first tab should be a list of the most popular TV Shows on IMDB (http://www.imdb.com/search/title?at=0&count=100&sort=moviemeter&title_type=tv_series,mini_series) for example.
What would be my first steps? How can I parse this data and then reuse it (the title, for example) in my app? I am not really familiar with APIs and parsing data, so I need some guidance in the right direction.
You can try jsoup for parsing HTML data.
Include jsoup in your app by configuring the build path.
jsoup makes it easy to fetch and parse data.
The jsoup website itself is very helpful for learning how to use it.
To make parsing easier, first understand the source of the site, then try your selectors in the online jsoup parser.
I have implemented jsoup in Android. Jsoup.connect() fetches the HTML content of the site "http://karnatakatourism.org/" correctly, but it doesn't fetch anything for the URL "http://karnatakatourism.org/Bidar/en/". I want to fetch the data from the links that are present in the HTML page of www.karnatakatourism.org. Can anyone help me?
It seems that most of the content is loaded by some AJAX magic. You can try to analyze the network traffic to find the URLs that you are really interested in. These might then be fetchable via Jsoup.connect().
Another approach could be to use other tools like Selenium, but I don't know how far you can get with this on the Android platform. Selendroid could probably provide your answer.
I pulled a website to a WebView via HTTP GET. The problem is that the website isn't formatted for mobile. I found that if I edit the HTML, I can comment out the scripting that makes the left pane on the site.
Method:
Download the page to a string, find the first occurrence of the substring <link and replace it with <!--, write the result to a file, and load the file into the WebView.
That works great until it comes to a link. Clicking on it causes the WebView to attempt to load file:///index.php/Whatever_the_page_was.
What I want to do is capture that link request and change the file:/// part to www.wurmpedia.com, and then run it through my parser to remove the script like the first, and repeat the process on any other link click that follows.
I could not find any other way to pull this off and this is what I made up. Any help would be appreciated, either through URL modification or with a more efficient method.
How about intercepting the link request using
WebViewClient.shouldInterceptRequest
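Whichever callback ends up capturing the click (WebViewClient.shouldOverrideUrlLoading is another common place for it), the rewriting itself is plain string handling. Here is a minimal sketch of the two steps described in the question. The class and method names are invented for this example, and the http:// scheme in front of www.wurmpedia.com is an assumption, since the question gives only the host.

```java
// Hypothetical helper: names are made up for this sketch.
class LinkRewriteSketch {
    // Host taken from the question; the http scheme is an assumption.
    static final String BASE = "http://www.wurmpedia.com/";
    static final String FILE_PREFIX = "file:///";
    static final String LINK_OPEN = "<link";

    /** Maps file:///index.php/Foo to http://www.wurmpedia.com/index.php/Foo. */
    static String toRemoteUrl(String url) {
        if (url.startsWith(FILE_PREFIX)) {
            return BASE + url.substring(FILE_PREFIX.length());
        }
        return url; // already a remote URL, leave it alone
    }

    /** Replaces only the first "<link" with "<!--", as described in the question. */
    static String commentOutFirstLink(String html) {
        int index = html.indexOf(LINK_OPEN);
        if (index < 0) {
            return html; // nothing to replace
        }
        return html.substring(0, index) + "<!--"
                + html.substring(index + LINK_OPEN.length());
    }

    public static void main(String[] args) {
        System.out.println(toRemoteUrl("file:///index.php/Main_Page"));
        System.out.println(commentOutFirstLink("<head><link rel=\"x\"><link rel=\"y\"></head>"));
    }
}
```

Inside the callback you would pass the clicked URL through toRemoteUrl, download that page, run commentOutFirstLink on it, and load the result, which repeats the original process for every link click.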
I have read the example for RSS parsing from the IBM site (http://www.ibm.com/developerworks/opensource/library/x-android/).
In this example, the RSS items are shown in a ListView, and if you press one announcement you can see it in the device's web browser. How could I show them inside the app, without using the device browser?
Thanks a lot
Create a layout with a WebView then load the URL from each "announcement" using WebView.loadUrl.
I'm a little confused but you seem to have answered your own question.
You say you don't want to use the web browser on the device but the example in your question doesn't use the browser. It does exactly what you're asking for.
The idea is that you download the html from the website and then use the parser to break it up into separate "announcements" and store them in list view items in your program.
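A minimal sketch of that split-into-announcements step, assuming the feed is plain RSS with <item>, <title>, and <link> elements. It uses the DOM parser from javax.xml.parsers rather than the parser from the IBM article; the class name RssItemsSketch and the sample feed are made up for the example, and the download step is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: names are made up for this sketch.
class RssItemsSketch {
    /** Returns one "title | link" string per <item> in the feed. */
    static List<String> parseItems(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(text(item, "title") + " | " + text(item, "link"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // Text of the first child element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Sample feed invented for the example.
        String feed = "<rss><channel>"
                + "<item><title>First</title><link>http://example.com/1</link></item>"
                + "<item><title>Second</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        for (String line : parseItems(feed)) {
            System.out.println(line);
        }
    }
}
```

Each returned entry could back one row of the list, and the link part is what you would hand to WebView.loadUrl when a row is tapped.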
I have done a bit of this type of thing myself on Android. I used the jsoup Java library, which makes breaking the HTML into the bits you want to display really easy.
If you want some more help, I can give you an example of an app I made that pulls movie times from google.com/movies. Here are links to the classes where I did the HTML download and parse:
ScreenScraper.java
HtmlParser.java