Getting info from a website using Jsoup - android

I'm making a small app for myself that uses Jsoup to show the earnings for my apps. The code I have works; it's extracting the text that I'm having trouble with. I looked at the page source, and the text I want to extract is in a div with the class "subheading".
<div class="subheading">
Total revenue: $1.17
Reports
</div>
This is what the div looks like. I want to extract the part that says "Total revenue: $1.17", so in my code I put
Elements elements = document.select("div.subheading");
When I run the app, it doesn't crash; it just shows up blank. I know the rest of my code works, because when I pass "body" to document.select(), the entire body shows up. Does anyone know why nothing shows up when I use "div.subheading"? Thanks for your help!

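Try this:
// Parse the HTML string you have already downloaded
Document doc = Jsoup.parse(html);
// Select every div with the class "subheading"
Elements elements = doc.select("div.subheading");
// text() returns the combined text of all matched elements
String data = elements.text();
Log.i("Jsoup", data);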
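Note that text() returns all of the div's text, which for the snippet in the question would be "Total revenue: $1.17 Reports", so one more step is needed to isolate the revenue part. Here is a minimal sketch in plain Java, assuming the text always contains "Total revenue:" followed by a dollar amount. The class name, method name, and amount format are assumptions made for illustration, not part of Jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RevenueExtractor {
    // Matches e.g. "Total revenue: $1.17" or "Total revenue: $1,234.50".
    // The amount format is an assumption based on the sample in the question.
    private static final Pattern REVENUE =
            Pattern.compile("Total revenue:\\s*\\$[\\d,]+(?:\\.\\d+)?");

    /** Returns the "Total revenue: $..." part of the div text, or null if absent. */
    public static String extractRevenue(String divText) {
        if (divText == null) {
            return null;
        }
        Matcher m = REVENUE.matcher(divText);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by elements.text()
        String data = "Total revenue: $1.17 Reports";
        System.out.println(extractRevenue(data));
    }
}
```

In the app you would pass the data string from the snippet above to extractRevenue() and show the result, handling the null case when the div is missing.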

Related

Android WebView .loadData Trims the text

I load a page's HTML into a WebView, but the HTML gets cut off.
Both elements are at the bottom
String html = readFile("index.html");
webWiev.loadData(html, "text/plane; charset=utf-8", "utf-8");
T.setText(html); // EditText T = findViewById(R.id.editTextTextMultiLine);
Both elements get their content from the same html variable.
The page rendered incorrectly, so I displayed it as plain text ("text/plane") instead, and that's how I discovered the problem.
The file contains all the necessary scripts and styles: 3,496 lines and 96,501 characters in total.
What could be the problem? I couldn't find anything about it on the Internet. Maybe there is some WebView limit?
The Android application must download the web application from the server or, if there is no connection, load it from a file. The page is displayed incorrectly when loaded from the file, which contains the entire web application, styles and scripts included. Everything I did above was aimed at isolating the problem.
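One possible cause, assuming the HTML contains characters such as '#' or '%': loadData() loads its input through a data: scheme URL, so unencoded characters like these can cut the page short. The Android documentation for loadData() describes passing "base64" as the encoding argument when the data is Base64-encoded. Below is a minimal sketch of that encoding step in plain Java. The class and method names are made up for illustration, and on Android versions before API 26 you would use android.util.Base64 rather than java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WebViewHtmlEncoder {
    /** Encodes HTML as Base64 so characters like '#' and '%' cannot break the data URL. */
    public static String toBase64(String html) {
        return Base64.getEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Sample page containing the characters that typically cause trouble
        String html = "<html><body style=\"color:#333\">100% done</body></html>";
        String encoded = toBase64(html);
        System.out.println(encoded);
        // On Android (not runnable here) the encoded string would be loaded with:
        // webView.loadData(encoded, "text/html", "base64");
    }
}
```

If the page still comes out truncated with Base64 input, the cause is something else and this guess does not apply.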

All webview content printed in one page

Case: User should be able to view and print pdf
My solution: I am opening PDF inside Webview with the help of docs.google.com/gview. Below is my code
Set up Webview
string url = "http://www.africau.edu/images/default/sample.pdf";
string gview = $"https://docs.google.com/gview?embedded=true&url={url}";
mWebView.LoadUrl(gview);
Print PDF
var printMgr = (PrintManager)GetSystemService(PrintService);
printMgr.Print("print", mWebView.CreatePrintDocumentAdapter("print"), null);
Below is the screenshot. As you can see PDF loads just fine
Problem
When I print the PDF, all of its pages are printed on a single sheet of paper, as you can see below.
I would appreciate any suggestion, including different library for displaying/printing pdf or suggestion in Java or Kotlin, I can convert them to C#.
I would print the PDF directly rather than printing the web page. When you print the WebView, the print framework just sees one long web page and knows nothing about the PDF's pages.
Use a custom print adapter instead; rather than drawing a new PDF to print, it can simply hand over the PDF you already have.
For details, see https://developer.android.com/training/printing/custom-docs.html
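The core of such an adapter is usually just a byte copy: in onWrite(), the existing PDF is streamed into the destination file descriptor that the print framework supplies. Here is a minimal sketch of that copy step in plain Java (easy to port to C#). The class and method names are made up for illustration, and the Android wiring appears only in comments because it cannot run outside a device.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PdfCopy {
    /** Copies every byte from in to out and returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in the app this would be a stream over the downloaded PDF file.
        byte[] fakePdf = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(fakePdf), out) + " bytes copied");
        // Inside PrintDocumentAdapter.onWrite(pages, destination, signal, callback), roughly:
        //   copy(new FileInputStream(pdfFile),
        //        new FileOutputStream(destination.getFileDescriptor()));
        //   callback.onWriteFinished(new PageRange[] { PageRange.ALL_PAGES });
    }
}
```

This assumes the PDF has first been downloaded to local storage, since the adapter needs the file's bytes rather than the Google viewer page.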

Http.get loads and parse website partially before getting HTML in Flutter

I am trying to perform web parsing in flutter. I want to grab all episode links and numbers from a certain website https://www2.9anime.to/watch/black-clover-dub.2y44/0wql03
This is my code to parse the html:
var url = 'https://www2.9anime.to/watch/black-clover-dub.2y44/0wql03';
http.Response response = await http.get((url));
dom.Document document = parse(response.body);
List<dom.Element> rapidvideoepisodelinks = document.getElementsByTagName('#servers-container');
List<Map<String, dynamic>> rapidvideoepisodelinkMap = [];
for (var link in rapidvideoepisodelinks) {
  rapidvideoepisodelinkMap.add({
    /////////////////////some logic////////////////////
  });
}
var rapidvideoepisodejson = json.encode(rapidvideoepisodelinkMap);
rapidvideoepisodelist = (json.decode(rapidvideoepisodejson) as List)
    .map((data) => new Rapidvideoepisodelist.fromJson(data))
    .toList();
setState(() {
  isLoading = false;
});
But the episode list takes a few seconds to load, and http.get seems to fetch the page before that part has loaded. Because of this, I can't parse it completely: the area containing the episodes isn't in the HTML I get back. Everything else works fine except for areas like this that take additional time to load.
Is there a way to solve this, such as parsing the website only after it has completely loaded?
Any help is really appreciated.
That's not quite what is happening. The reason you can't parse it is NOT a partial load. http.get fetches the HTML file, and that's all; you got the complete file. What you see in your browser is not just that HTML file. The browser first gets the HTML file, then works out what else it needs to load (JPG files, CSS files, JS scripts, etc.) and loads those too.
The content you are trying to parse is produced by JavaScript running inside the browser, so you cannot get it with http.get alone. I am not sure how to achieve what you want in Flutter. You may need some kind of headless browser for Dart, if one exists, to load the URL and then parse the resulting HTML. http.get does give you the HTML file, but that file is not the one you are actually looking for.

How to clear html page before showing into a webview in Android?

I have the URL of a web page that I want to display in a WebView in my Android app. Before showing the page, I want to strip some tags from its HTML (such as the header, footer, etc.) so that only some of the information is shown. How can I do this? I tried to solve it with JSoup, but I can't work out how to create the "new page" and pass it to the WebView. Can anybody help me?
EDIT
I removed the unwanted HTML using the jsoup library. Then, again with jsoup, I get the head and body content and finally show the "cleaned" web page with these lines:
headURL = doc.select("head").outerHtml();
bodyURL = doc.select("body").outerHtml();
webview.loadData( "<html>"+headURL+bodyURL+"</html>" , "text/html", "charset=UTF-8");
webview.setWebViewClient(new DisPlayWebPageActivityClient());
The view shows the new page but does not load the CSS files specified in the head (which I haven't touched). Can anyone tell me why?
You can fetch the web page you want to display as a string, parse it, remove whatever you don't want, and then load the resulting string as data in your WebView.
Something like:
String webContent = fetchPage(url);
String cleanedWebContent = cleanUp(webContent);
webView.loadData(cleanedWebContent, "text/html", "UTF-8");
Of course, you will need to implement fetchPage and cleanUp yourself, as they are not Android methods.
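A likely reason the CSS files from the EDIT do not load is that a page loaded with loadData() has no base URL, so relative links such as css/style.css have nothing to resolve against. WebView also offers loadDataWithBaseURL(), which takes the original page URL for exactly this purpose. The sketch below uses plain Java to show the resolution a browser performs against a base URL. The URL and file names are hypothetical, and the Android call appears only as a comment.

```java
import java.net.URI;

public class BaseUrlDemo {
    /** Resolves a link found in the page against the URL the page was fetched from. */
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://example.com/articles/page.html"; // hypothetical page URL

        // Relative to the page's directory:
        System.out.println(resolve(pageUrl, "css/style.css"));
        // prints https://example.com/articles/css/style.css

        // Relative to the site root:
        System.out.println(resolve(pageUrl, "/css/site.css"));
        // prints https://example.com/css/site.css

        // On Android (not runnable here), passing the page URL as the base lets
        // the WebView do the same resolution for the cleaned HTML:
        // webView.loadDataWithBaseURL(pageUrl, cleanedWebContent, "text/html", "UTF-8", null);
    }
}
```

Without a base URL the WebView has no way to turn css/style.css into a full address, which would explain why the page shows up unstyled.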

Selecting paragraph tagged with itemprop=recipeInstructions using jsoup on android

I've tried my query in http://try.jsoup.org/ and it works fine. However, when I try it on android (4.2.2) it is returning a zero sized array.
The query I want is [itemprop=recipeInstructions].
The website I'm testing on is http://www.foodnetwork.co.uk/recipes/real-meatballs-and-spaghetti-674.html
My android code looks like
Document doc = Jsoup.connect("http://www.foodnetwork.co.uk/recipes/real-meatballs-and-spaghetti-674.html").get();
Elements recipe = doc.select("[itemprop=recipeInstructions]");
// recipe is a zero sized array :(
I'm linking against jsoup-1.7.3.jar
My android code works fine on the website http://www.foodnetwork.com/recipes/ina-garten/broccoli-and-bow-ties-recipe.html so I suspect it's a bug in the html or how jsoup parses the html of the first site.
Try setting a user agent:
Document doc = Jsoup.connect(url).userAgent("Mozilla/4.0").get();
The server may return a different page depending on the browser identification it receives.
Try something like this:
Elements recipe = doc.select("p[itemprop = recipeInstructions]");
