How to extract text from an HTML website in Android - android

I want to develop an Android application that takes an input in the app and submits it to a website, which then returns a result. I want to display that result in the Android app. I tried XML parsing, but the website is not XML-based. The website is http://www.fastvturesults.com/ and the input is a roll number (USN). Can anyone guide me on how to fetch the result from this site and parse it in the app?

Use jsoup, which fetches a web page and lets you extract data from it to display in your app.
Document document = Jsoup.connect(url).get();
// Get the HTML document title
String title = document.title();
// Select the description meta tag
Elements description = document.select("meta[name=description]");
// Read its content attribute
String desc = description.attr("content");
You can select other elements in the same way.
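To pass the roll number to the site, the request URL has to carry the input. Here is a minimal sketch, assuming the site accepts the roll number as a query parameter. The parameter name "usn" and the sample roll number are hypothetical, so check the site's form to find the real field name and HTTP method. The resulting string can then be passed to Jsoup.connect(url) as in the answer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ResultUrlBuilder {
    /**
     * Appends one URL-encoded query parameter to a base URL.
     * Assumes the base URL does not already contain a query string.
     */
    static String buildResultUrl(String baseUrl, String paramName, String rollNumber) {
        try {
            return baseUrl + "?" + URLEncoder.encode(paramName, "UTF-8")
                    + "=" + URLEncoder.encode(rollNumber, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "usn" is a hypothetical parameter name, not taken from the real site
        String url = buildResultUrl("http://www.fastvturesults.com/", "usn", "1AB 12CS001");
        System.out.println(url); // prints http://www.fastvturesults.com/?usn=1AB+12CS001
    }
}
```

If the site expects a form POST rather than a query string, the same encoding step still applies to the form body.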

Related

All WebView content printed on one page

Case: The user should be able to view and print a PDF.
My solution: I am opening the PDF inside a WebView with the help of docs.google.com/gview. Below is my code.
Set up WebView
string url = "http://www.africau.edu/images/default/sample.pdf";
string gview = $"https://docs.google.com/gview?embedded=true&url={url}";
mWebView.LoadUrl(gview);
Print PDF
var printMgr = (PrintManager)GetSystemService(PrintService);
printMgr.Print("print", mWebView.CreatePrintDocumentAdapter("print"), null);
Below is the screenshot. As you can see, the PDF loads just fine.
Problem
When I print the PDF, all of its pages are printed on a single sheet of paper, as you can see below.
I would appreciate any suggestion, including a different library for displaying or printing PDFs. Suggestions in Java or Kotlin are fine; I can convert them to C#.
I would not print the web page but print the PDF directly. When you print the web page, the print system just sees one long web page and knows nothing about the PDF's pages.
Use a custom print adapter; rather than drawing a new PDF to print, you can hand over the PDF you already have.
See https://developer.android.com/training/printing/custom-docs.html for details.

Why does the img tag have no src value after parsing with jsoup?

I want to get the src value from an HTML img tag.
In Chrome's Inspect Element I can see the value of src, but when I parse the page with the jsoup library, src has no value. Here's my code:
document = Jsoup.connect("http://estelam.rahvar120.ir/index.jsp?pageid=2371666&p=1")
        .userAgent(USERAGENT)
        .method(Connection.Method.GET)
        .execute()
        .parse();
Element element = document.select("img[id=capimg]").first(); // img tag element
String absoluteUrl = element.absUrl("src"); // absoluteUrl = ""
String srcValue = element.attr("src"); // srcValue = ""
The website isn't reachable from other countries, but the HTML I want to parse is:
<img id="capimg" alt="Enter Captcha :"
src="" width="200" height="60">
The problem is that jsoup gets the HTML content before JavaScript sets the src value. What should I do?
Welcome to SO!
The problem you are facing cannot be solved with Jsoup alone, because Jsoup is an HTML parser, not a browser. Since it does not execute JavaScript, any content that JavaScript adds to the page will be missing from the document Jsoup parses.
What you need is another tool that simulates a web browser, such as Selenium.
There are multiple ways to do this:
1. Use Selenium to handle both page retrieval and scraping.
2. Use Selenium to get the dynamic pages and use Jsoup to parse and scrape the content.
I personally recommend the 2nd approach because I am more comfortable using Jsoup to scrape.

How do I add a captcha from a website to an app

I am trying to build an app that lets you look up your train reservation details. The website uses a captcha, and I need help adding that captcha to the app.
Basically I'm trying to turn this page into an app: http://www.indianrail.gov.in/pnr_Enq.html
Try this:
As far as I can see, the part you want is this:
<img src="captcha_code_file.php?rand=<?php echo rand(); ?>" id="captchaimg">
Now that you have the id of the element, use Jsoup:
Document doc = Jsoup.connect("http://www.indianrail.gov.in/pnr_Enq.html").get();
Elements imgCaptcha = doc.select("#captchaimg");
String imgSrc = imgCaptcha.attr("src");
Log.d(Tag, "image source = " + imgSrc);
Now open a stream and download the image from imgSrc.
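The download step can be sketched with plain java.net and java.io classes. This is a minimal sketch under a few assumptions: the src value is relative to the page URL (as in the img tag above), the rand value 12345 is a made-up example, and the class and method names are hypothetical. The network call is left commented out in main.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;

class CaptchaDownloader {
    /** Resolves a possibly relative src value against the URL of the page it came from. */
    static String resolveImageUrl(String pageUrl, String imgSrc) {
        return URI.create(pageUrl).resolve(imgSrc).toString();
    }

    /** Reads the whole stream into memory; the caller is responsible for closing it. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Downloads the image bytes; on Android this must run off the main thread. */
    static byte[] download(String imageUrl) throws IOException {
        try (InputStream in = new URL(imageUrl).openStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.indianrail.gov.in/pnr_Enq.html";
        String src = "captcha_code_file.php?rand=12345"; // example value of imgSrc
        String imageUrl = resolveImageUrl(page, src);
        // prints http://www.indianrail.gov.in/captcha_code_file.php?rand=12345
        System.out.println(imageUrl);
        // byte[] image = download(imageUrl); // network call, left commented out
    }
}
```

On Android, the returned bytes could be turned into a bitmap with BitmapFactory.decodeByteArray. A captcha is usually tied to the server session, so the request that submits the form would likely need to send the same cookies as the request that fetched the image.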

How to retrieve the URL of an image on a website

I am developing an application that shows an image from a site on a button click.
I learned how to do that and got it working. The problem now is that the image on that site changes every day, and I want to display the new image.
So how can I get the URL of the new image?
The site has only one image, which I hope makes this easier; I could not find an answer to this on the internet.
You can visit the site to be clear. If anything is unclear, please ask in the comments, and please don't mark this as off topic, since it is related to programming.
Parse it from the HTML page source. You can use Jsoup:
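String webUrl = "http://www.yoursite.com/";
Document doc = Jsoup.connect(webUrl).get();
// The page has a single image, so take the first img element
Element image = doc.select("img").first();
// absUrl resolves a relative src against the page URL
String imageUrl = image.absUrl("src");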
You should parse an XML RSS feed. To parse an XML RSS feed with Jsoup, you need to use its 'xmlParser'.
You want to get the image URLs. The image URLs are in the 'enclosure' tag, in its 'url' attribute. There is no 'img' tag in the RSS feed, so you need to read the 'enclosure' tag and not the 'img' tag.
I am attaching the code below to pull the image URLs. I have tested this code. Let me know if you run into any issues.
String url = "http://www.uefa.com/rssfeed/news/rss.xml";
Document doc = Jsoup.connect(url).parser(Parser.xmlParser()).ignoreContentType(true).get();
for (Element x : doc.getElementsByTag("enclosure")) {
    System.out.println(x.attr("url"));
}
You should parse the RSS feed from this URL: http://apod.nasa.gov/apod.rss
To parse RSS you can use a SAX parser.
See this link for how to use the SAX parser: http://samir-mangroliya.blogspot.com/p/android-sax-parser.html
Hope this helps.
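The SAX approach can be sketched with the parser classes that ship with both Java and Android. This is a minimal sketch, assuming the image URLs sit in the url attribute of enclosure elements as described in the earlier answer. The class name, method name, and sample feed are made up for illustration and are not taken from the linked tutorial or from the real feed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EnclosureSaxExample {
    /** Collects the url attribute of every enclosure element in the feed text. */
    static List<String> enclosureUrls(String rssXml) {
        final List<String> urls = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Depending on namespace settings, the tag name arrives in localName or qName
                String name = (localName == null || localName.isEmpty()) ? qName : localName;
                if ("enclosure".equals(name)) {
                    String url = attrs.getValue("url");
                    if (url != null) {
                        urls.add(url);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(rssXml)), handler);
        } catch (Exception e) {
            // Malformed XML or parser setup failure
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample feed; a real app would download the feed text first
        String sample = "<rss><channel><item><title>Day 1</title>"
                + "<enclosure url=\"http://example.com/a.jpg\" type=\"image/jpeg\"/>"
                + "</item></channel></rss>";
        System.out.println(enclosureUrls(sample)); // prints [http://example.com/a.jpg]
    }
}
```

Because SAX streams through the document and only reports events, it keeps memory use low, which is one reason it is often suggested for feeds on Android.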

How to display a PDF from a URL in a different language in Android

I am making an app in which I display a PDF file from a URL in a WebView by prepending the Google Docs viewer URL, i.e.:
String pdf_url = "my pdf url";
webView.loadUrl("https://docs.google.com/gview?embedded=true&url="+pdf_url);
It displays perfectly, but my app is in Swedish, and it is a requirement that every word be displayed in Swedish. The problem is that when the PDF file is shown, the copyright notice and some other words from Google appear in English. Is there a way to show these words in Swedish? Maybe the Google Docs URL (https://docs.google.com/gview?embedded=true&url=) has an option to set the language. I can't figure this out and am stuck. Any help would be appreciated.
The current output is shown in the picture below for better understanding.
Can you try adding "se" or "SE" just before the extension at the end of the URL? For example:
www.myUrl.html ===> www.myUrl.se.html
www.myUrl.pdf ===> www.myUrl.se.pdf
