Convert html parser with multiple divs from swift to android using Jsoup

I am trying to convert an iOS application to Android, but I only started learning Java a few days ago. I'm trying to get a value from a tag inside an HTML page.
Here is my Swift code:
if let url = NSURL(string: "http://www.example.com/") {
    let htmlData: NSData = NSData(contentsOfURL: url)!
    let htmlParser = TFHpple(HTMLData: htmlData)
    // the value which I want to parse
    let nPrice = htmlParser.searchWithXPathQuery("//div[@class='round-border']/div[1]/div[2]") as NSArray
    let rPrice = NSMutableString()
    // Appending
    for element in nPrice {
        rPrice.appendString("\n\(element.raw)")
    }
    let raw = String(NSString(string: rPrice))
    // the value without trimming
    let stringPrice = raw.stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil)
    // result
    let trimPrice = stringPrice.stringByReplacingOccurrencesOfString("^\\n*", withString: "", options: .RegularExpressionSearch)
}
Here is my Java code using Jsoup:
public class Quote extends Activity {
    TextView price;
    String tmp;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_quote);
        price = (TextView) findViewById(R.id.textView3);
        try {
            doc = Jsoup.connect("http://example.com/").get();
            Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
        } catch (IOException e) {
            //e.printStackTrace();
        }
    }
}
My problems are as follows:
I get a NetworkOnMainThreadException whenever I try any code.
I'm not sure that using getElementsByTag with this structure is correct.
Please help.
Thanks.

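"I get a NetworkOnMainThreadException whenever I try any code."
Android throws this exception when a network request runs on the main (UI) thread, so the download has to happen in the background. One option is to use Volley for the download instead of Jsoup.connect(): it performs requests asynchronously, and you can still hand the downloaded HTML to Jsoup for parsing. See this answer for some sample code.
"I'm not sure that using getElementsByTag with this structure is correct."
Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
getElementsByTag() expects a plain tag name such as "div", not an XPath expression, and it returns Elements rather than a single Element. Jsoup's select() works with CSS selectors instead.
The line above can be corrected like this:
Elements divs = doc.select("div.round-border > div:nth-child(1) > div:nth-child(2)");
for (Element div : divs) {
    // Process each div here...
}
Note that XPath div[1] counts only div siblings, while :nth-child(1) counts all element siblings. If the parent elements contain children other than divs, :nth-of-type(n) is the closer match.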
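As a rough illustration of keeping a blocking call off the calling thread, here is a minimal sketch in plain Java. BackgroundFetch and fetchAsync are hypothetical names, not part of Jsoup or Android, and the supplier merely stands in for a blocking download such as Jsoup.connect(url).get() (whose IOException would have to be wrapped, since a Supplier cannot throw checked exceptions).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Minimal sketch: run a blocking call on a worker thread instead of the caller's thread.
// BackgroundFetch and fetchAsync are hypothetical names used only for illustration.
final class BackgroundFetch {
    // A single named daemon worker thread handles all blocking calls.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "fetch-worker");
        t.setDaemon(true); // do not keep the JVM alive just for this sketch
        return t;
    });

    // blockingCall stands in for a network download; it executes on the worker thread.
    static CompletableFuture<String> fetchAsync(Supplier<String> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, WORKER);
    }

    public static void main(String[] args) {
        // Report which thread actually executed the supplier.
        String threadName = fetchAsync(() -> Thread.currentThread().getName()).join();
        System.out.println("blocking call ran on: " + threadName);
    }
}
```

In an Activity you would not call join() on the main thread; instead, hand the result back to the UI, for example with runOnUiThread() once the future completes.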

Related

Is there a transformItems equivalent in the Android Java Libraries for Algolia?

I have a use case where I would like to render the image associated with each hit returned from an Algolia search, using the Algolia Java library for Android. I am currently developing on Android Pie. Here is what I am doing:
I use com.algolia.instantsearch.core.helpers.Searcher.
I bind the results to a fragment whose layout uses the Algolia attributes for images:
<ImageView
    algolia:attribute='@{"image_url"}'
>
The trouble is that the response JSON only stores the file name of the JPG image to display. I need to dynamically prepend the base site URL and some more path segments. I tried something like this:
algolia:attribute='https://somedomain.com/somepath1/ProductImages/@{"BaseProductId"}/thumbnails/@{"image_url"}'
But that was not accepted.
I am looking for a way to transform the results so that I can build the complete URL, place it in image_url, and then use it in the layout as in the first code fragment.
Is there any way to do it?

I solved it by registering a result listener and updating the hits object, as shown below.
searcher.registerResultListener(new AlgoliaResultsListener() {
    @Override
    public void onResults(@NonNull SearchResults results, boolean isLoadingMore) {
        for (int i = 0; i < results.hits.length(); i++) {
            try {
                JSONObject obj = results.hits.getJSONObject(i);
                String image_url_file = obj.getString("image_url");
                String base_product_id = obj.getString("BaseProductId");
                // Build the complete URL from the base path, the product id and the file name.
                String full_image_path = "https://somedomain.com/somPath/ProductImages/" + base_product_id + "/Original/" + image_url_file;
                // Overwrite image_url so the layout binding picks up the full URL.
                obj.put("image_url", full_image_path);
            } catch (JSONException e) {
                // Skip hits that lack the expected fields instead of failing silently.
                Log.w("Search", "Could not build image URL for hit " + i, e);
            }
        }
    }
});

How to save all web page including .css .js?

I want to save a complete web page, including its .css and .js files, programmatically on Android.
So far I have tried an HTTP GET request, jsoup, and reading the WebView content, but none of them saves the whole page with its CSS and JS. These methods only save the HTML part of the page. After saving the whole page, I want to be able to open it offline.
Thanks in advance.

You have to take the HTML, parse it, get the URLs of the resources, and then make requests for those URLs too.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Stack {
    private static final String USER_AGENT = "";  // set a browser-like user agent here
    private static final String INITIAL_URL = ""; // set the URL of the page to download here

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup
                .connect(INITIAL_URL)
                .userAgent(USER_AGENT)
                .get();

        // External scripts: <script src="...">
        for (Element s : doc.getElementsByTag("script")) {
            printResource(s.absUrl("src"));
        }

        // Stylesheets: <link rel="stylesheet" href="...">
        for (Element c : doc.getElementsByTag("link")) {
            // attr() returns an empty string, never null, when the attribute is missing.
            if (c.attr("rel").equalsIgnoreCase("stylesheet")) {
                printResource(c.absUrl("href"));
            }
        }
    }

    // Fetches a resource as raw text; parsing CSS or JS as an HTML Document would alter it.
    private static void printResource(String url) throws IOException {
        if (url.isEmpty()) {
            return; // inline script, or a URL that could not be resolved
        }
        System.out.println(url);
        String body = Jsoup
                .connect(url)
                .userAgent(USER_AGENT)
                .ignoreContentType(true)
                .execute()
                .body();
        System.out.println(body);
        System.out.println("--------------------------------------------");
    }
}
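The code above only prints each resource. To open the page offline, the downloaded text also has to be written to disk. Below is a minimal sketch of that step using only the Java standard library; ResourceSaver, localNameFor and save are hypothetical names introduced for illustration, and deriving the file name from the last path segment is an assumption that breaks when two resources share a name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: store a downloaded resource under a local directory.
// ResourceSaver, localNameFor and save are hypothetical names used only for illustration.
final class ResourceSaver {
    // Derives a local file name from the last path segment of the resource URL.
    // The query string is ignored; URLs without a file name map to "index.html".
    static String localNameFor(String resourceUrl) {
        String path = URI.create(resourceUrl).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "index.html";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Writes the resource body as UTF-8 text into dir and returns the written file.
    static Path save(Path dir, String resourceUrl, String body) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(localNameFor(resourceUrl));
            return Files.write(target, body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the saved page to work offline, the src and href attributes in the saved HTML would also have to be rewritten to point at the local files; that step is not shown here.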

I have a similar problem.
Using this code we can get images, .css and .js files. However, some HTML content is still missing.
For instance, when we save a web page via Chrome, there are two options:
Complete html
html only
Apart from the .css, .js, .php... files, "Complete html" contains more elements than "html only". The requirement is to download the complete page, as Chrome does with the first option.

android set TextView to html file

I need to get information from this website: http://rowans.diekantankys.nl/bonnen/index.php?id=4 (it's in Dutch).
From line 36 onward there is a table showing the debts of people on this website:
<td>Marc</td>
<td>16.75</td>
</tr> <tr>
<td>Marlieke</td>
<td>7.27</td>
</tr> <tr>
<td>Anne Ruth</td>
<td>4.70</td>
But all the functions and methods I found that should download an HTML file from a website or web server into a string or array somehow fail. Can anyone suggest a method, so that I can post a full error report for it?
My apologies if this is considered "not a real question"; I don't know how to formulate it better.
Thanks in advance.

I would recommend using JSoup (http://jsoup.org) to download and parse the HTML from the URL.
You can get the HTML as a Document and read the text of its elements:
Document doc = Jsoup.connect("http://rowans.diekantankys.nl/bonnen/index.php?id=4").get();
In your case, on this website, you need the text in the table body, i.e. tbody:
String table = doc.body().getElementsByTag("tbody").text();
So you need to download the content first in a background thread and then update the TextView on the UI thread.
new AsyncTask<Void, Integer, Long>() {
    @Override
    protected Long doInBackground(Void... params) {
        try {
            // The network request runs on the background thread.
            final Document doc = Jsoup.connect("http://rowans.diekantankys.nl/bonnen/index.php?id=4").get();
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    String tableContent = doc.body().getElementsByTag("tbody").text();
                    // You can split the text and read it, as required.
                    textView.setText(tableContent);
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}.execute();
Hope it helps. Let me know if you run into any issues.

You have to download all the HTML content, put it into a string, and parse it manually. If you are interested in the values in that table, I would suggest searching the downloaded HTML for a unique piece of text that identifies the beginning of the table (e.g. the '<table>' tag) and putting the table's HTML into a separate variable. Then you parse that string manually, searching for <tr> and <td> and extracting the values in a loop.
Another way would be to use regular expressions, but that becomes more difficult if you're not familiar with them. It's not a directly relevant resource, but in this Android app I did exactly what I just explained: https://github.com/rexromae/mytotem_android/blob/master/com.torvergata.mytotem/src/com/torvergata/mytotem/student/StudentLogin.java
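The regular-expression route mentioned above could look roughly like the following sketch, which uses only the Java standard library. DebtTableParser and its pattern are hypothetical, and the pattern assumes each row is a name cell directly followed by a numeric cell, as in the excerpt from the question; real-world HTML is often less regular, which is why a parser such as Jsoup is usually the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: extract (name, amount) pairs from consecutive <td> cells.
// DebtTableParser is a hypothetical name used only for illustration.
final class DebtTableParser {
    // A text cell immediately followed by a cell holding a decimal number.
    private static final Pattern ROW = Pattern.compile(
            "<td>\\s*([^<]+?)\\s*</td>\\s*<td>\\s*([0-9]+(?:\\.[0-9]+)?)\\s*</td>");

    // Returns the pairs in document order; a later duplicate name overwrites an earlier one.
    static Map<String, Double> parse(String html) {
        Map<String, Double> debts = new LinkedHashMap<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            debts.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return debts;
    }

    public static void main(String[] args) {
        String html = "<td>Marc</td>\n<td>16.75</td>\n</tr> <tr>\n"
                + "<td>Anne Ruth</td>\n<td>4.70</td>";
        // Print one line per person found in the sample rows.
        parse(html).forEach((name, amount) -> System.out.println(name + ": " + amount));
    }
}
```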

Getting Text out of URL into String Android

I want to get text from a URL into a String in my Android app.
Website:
<html>
<body>
Text I don't want to get.
<div id="editorText" class="answer" itemprop="text">Text I want to get</div>
</body>
Text I don't want to get.
</html>
Android:
I want the result to look like this:
String text = "Text I want to get";

Use the jsoup library to parse the HTML string. For more details, check this link.

You can try to get all the content and extract what you want (for example with substring) based on patterns. In this case it would be something like:
String urlContent = ...; // content downloaded from the URL
// The page uses double quotes around the attribute value.
String beginPattern = "itemprop=\"text\">";
String endPattern = "</div>";
int begin = urlContent.indexOf(beginPattern) + beginPattern.length();
// Look for the closing tag only after the start of the wanted text.
int end = urlContent.indexOf(endPattern, begin);
String contentNeeded = urlContent.substring(begin, end);
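The substring approach above can be sketched as a small self-contained helper that also copes with a missing pattern. TextBetween and extract are hypothetical names, and returning null when a pattern is not found is just one possible design choice. Note that the begin pattern has to match the page's markup exactly, including the double quotes around the attribute value.

```java
// Minimal sketch: return the text between two markers in a larger string.
// TextBetween and extract are hypothetical names used only for illustration.
final class TextBetween {
    // Returns the text between the first beginPattern and the next endPattern
    // that follows it, or null if either marker is missing.
    static String extract(String content, String beginPattern, String endPattern) {
        int start = content.indexOf(beginPattern);
        if (start < 0) {
            return null;
        }
        start += beginPattern.length();
        // Search for the end marker only after the begin marker.
        int end = content.indexOf(endPattern, start);
        if (end < 0) {
            return null;
        }
        return content.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html><body>Text I don't want to get."
                + "<div id=\"editorText\" class=\"answer\" itemprop=\"text\">Text I want to get</div>"
                + "</body></html>";
        System.out.println(extract(html, "itemprop=\"text\">", "</div>"));
    }
}
```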

You can use the jsoup library, as mentioned by Maulik, and write this code:
try {
    Document doc = Jsoup.connect("url of the page").get();
    // Select the div by its id and read only its text.
    Element div = doc.getElementById("editorText");
    if (div != null) {
        String text = div.text();
        // Use the text here, e.g. show it in a TextView.
    }
} catch (IOException e) {
    Log.e("MyTag", "Could not load the page", e);
}

Parse HTML in Android

I am trying to parse HTML from a web page in Android, and since the page is not well formed, I get a SAXException.
Is there a way to parse HTML in Android?

I just encountered this problem. I tried a few things, but settled on using JSoup. The jar is about 132k, which is a bit big, but if you download the source and take out some of the methods you will not be using, then it is not as big.
The good thing about it is that it handles badly formed HTML.
Here's a good example from their site:
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
// http://jsoup.org/cookbook/input/load-document-from-url
// Document doc = Jsoup.connect("http://example.com/").get();
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href");
    String linkText = link.text();
}

Have you tried using Html.fromHtml(source)?
I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags, but it does come with a handler you can implement to react to tags it doesn't understand.

String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();

Programming offers endless possibilities, and there are usually several solutions to a single problem. The solutions above may all be helpful for someone, but this is the one that saved my day.
The code goes like this:
private void getWebsite() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            final StringBuilder builder = new StringBuilder();
            try {
                // The network request runs on this background thread.
                Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
                String title = doc.title();
                Elements links = doc.select("a[href]");
                builder.append(title).append("\n");
                for (Element link : links) {
                    builder.append("\n").append("Link : ").append(link.attr("href"))
                            .append("\n").append("Text : ").append(link.text());
                }
            } catch (IOException e) {
                builder.append("Error : ").append(e.getMessage()).append("\n");
            }
            // Views may only be touched from the UI thread.
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    result.setText(builder.toString());
                }
            });
        }
    }).start();
}
You just have to call the above function in the onCreate method of your MainActivity.
I hope this is helpful for you as well.
Also read the original blog post on Medium.

Maybe you can use WebView, but as you can see in the documentation, WebView doesn't enable JavaScript and other features such as widgets by default.
http://developer.android.com/reference/android/webkit/WebView.html
You can enable JavaScript with webView.getSettings().setJavaScriptEnabled(true) if you need it.
