Jsoup parsing with Android

I'm trying to parse HTML with the Jsoup library, but I'm not getting the result I want.
I want to display the entire text of the <pre> tag on the screen of a mobile device.
How do I get the text from the web page, and what do I need to fix in my code?
Web site: http://devanswers.ru/text.php
package com.example.devanswers;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.Menu;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.ImageView;
import android.widget.TextView;
public class MainActivity extends Activity {
TextView DevMainText;
ImageView DevMainImage;
MyTask DevMain;
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
DevMainText = (TextView) findViewById(R.id.DevMainText);
DevMainImage = (ImageView) findViewById(R.id.DevMainImage);
OnClickListener onClick = new OnClickListener() {
public void onClick(View v) {
DevMain = new MyTask();
DevMain.execute();
}
};
DevMainImage.setOnClickListener(onClick);
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.main, menu);
return true;
}
class MyTask extends AsyncTask<Void, Void, Void> {
@Override
protected void onPreExecute() {
super.onPreExecute();
DevMainImage.setEnabled(false);
}
@Override
protected Void doInBackground(Void... params) {
Document doc;
try {
doc = Jsoup.connect("http://devanswers.ru/text.php").get();
Elements links = doc.getElementsByTag("pre");
for (Element link : links) {
DevMainText.setText((link.text()));
}
} catch (IOException e) {
// TODO Auto-generated catch block
DevMainText.setText("Error");
}
return null;
}
@Override
protected void onPostExecute(Void result) {
super.onPostExecute(result);
DevMainImage.setEnabled(true);
}
}
}

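I have never used Jsoup before, but from your code I can see that you overwrite the contents of DevMainText on every loop iteration, so only the text of the last element survives.
Accumulate the text first and call setText once, like this:
String maintext = "";
for (Element link : links) {
    maintext += link.text() + "\n";
}
DevMainText.setText(maintext);
Since views may only be touched from the UI thread, return maintext from doInBackground and make the setText call in onPostExecute.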
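As a plain-Java illustration of the accumulation step (no Android or Jsoup needed), the sketch below joins a list of strings with a StringBuilder. The names TextJoiner and joinLines are hypothetical, and the sample strings stand in for the values link.text() would return.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: collects several text fragments into one string
// so the result can be shown with a single setText call.
class TextJoiner {
    // Appends every fragment followed by a newline, mirroring the loop above.
    static String joinLines(List<String> fragments) {
        StringBuilder builder = new StringBuilder();
        for (String fragment : fragments) {
            builder.append(fragment).append("\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("first block", "second block");
        System.out.print(joinLines(texts));
    }
}
```

A StringBuilder avoids creating a new String on every iteration, which matters once the page contains many matching elements.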

In fact, the response is not wrapped in a <pre> tag. It is the browser that wraps the raw response in <pre> when you view the source.
Instead of doc.getElementsByTag("pre"), try doc.getElementsByTag("body").

You could also try the Android WebView component if you want to display the entire page.
http://developer.android.com/reference/android/webkit/WebView.html
You can build the address with the Uri class, Uri uri = Uri.parse("http://devanswers.ru/text.php");, and then display it in a WebView by passing uri.toString() to loadUrl. Note that Uri.parse only parses the address string, not the page itself.

Related

Scraping website bullet points

I am extremely new to coding (I started yesterday) and have no background experience.
I have been given the task of creating an Android app for a friend that pulls the data from his website into a quick app, so that he and his colleagues don't have to open a browser every time they want to view it.
For reasons, I cannot share the website address.
The data on the website is in a bullet point list.
I have been reading some tutorials and watching some videos. I can pull the website up using WebView, and I have now started reading up on Jsoup. I have copied a few lines of code from some tutorials and can now pull the website up as raw HTML or plain text, but I would like to remove the unnecessary data and keep just the text next to the bullet points.
What I would like to know is whether I can scrape the text between two tags in the page source, for example between <div class="scheduler"> and <ul id="myUL">.
That would give me all of the information I need.
Thank you.
Here is my code from MainActivity.java:
package spcentral.jsouptut;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class MainActivity extends AppCompatActivity {
private Button getBtn;
private TextView result;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
result = (TextView) findViewById(R.id.result);
getBtn = (Button) findViewById(R.id.getBtn);
getBtn.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
getWebsite();
}
});
}
private void getWebsite() {
new Thread(new Runnable() {
@Override
public void run() {
final StringBuilder builder = new StringBuilder();
try {
Document doc = Jsoup.connect("linkremoved").get();
String text = doc.text();
Elements links = doc.select("a[href]");
builder.append(text).append("\n");
for (Element link : links) {
builder.append("\n").append("Link : ").append(link.attr("href"))
.append("\n").append("Text : ").append(link.text());
}
} catch (IOException e) {
builder.append("Error : ").append(e.getMessage()).append("\n");
}
runOnUiThread(new Runnable() {
@Override
public void run() {
result.setText(builder.toString());
}
});
}
}).start();
}
}

How to parse data from a website using Jsoup in android

I am developing an application in which I want to display some data from a website in an activity. I'm using Jsoup to parse the data, but I get an error at:
org.jsoup.nodes.Document document = Jsoup.connect(url).get();
Here is my complete code. I have no idea what I'm doing wrong.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import android.os.AsyncTask;
import android.os.Bundle;
import android.provider.DocumentsContract.Document;
import android.app.Activity;
import android.view.Menu;
import android.webkit.WebSettings;
import android.webkit.WebView;
import android.webkit.WebViewClient;
import android.widget.TextView;
public class Events extends Activity
{
//WebView web1;
TextView t1;
String url="https://sites.google.com/site/holyfamilychurchpestomsagar/notices-for-the-week";
@Override
protected void onCreate(Bundle savedInstanceState)
{
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_events);
t1=(TextView)findViewById(R.id.textView1);
Title t2=new Title();
t2.execute();
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.events, menu);
return true;
}
private class Title extends AsyncTask<Void, Void, Void> {
Elements title;
String desc;
@Override
protected void onPreExecute()
{
super.onPreExecute();
}
@Override
protected Void doInBackground(Void... params) {
try {
// Connect to the web site
org.jsoup.nodes.Document document = Jsoup.connect(url).get();
// Get the html document title
title = document.select("meta[name=title]");
desc = title.attr("content");
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void result)
{
// Set title into TextView
t1.setText(desc);
}
}
}
Is the device online? OK, I'm sure you thought of that ;-) Are you
sure it is exactly this line? Do you end up in your catch clause, or
does the app crash?
You should always add a null check when working with Jsoup,
because a lookup that matches nothing can leave you with null: select()
itself returns an empty Elements, but Elements.first() returns null when the selection is empty.
Can you post your stack trace?
OK, if the app crashes, then I'm pretty sure it is not the connection itself but one of the lines below it. Add a null check or an additional catch clause.

Android: Jsoup: GET put all items on ListView

If doc isn't null, I want to put all the data from doc into a ListView. How do I do that?
If I write Elements elements = doc.select("someSelector"); then I can't put it into the ListView.
Sorry for my English (I'm Russian).
Code:
package com.example.phpfunctions;
import java.io.IOException;
import java.util.Locale;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.Menu;
import android.widget.AutoCompleteTextView;
import android.widget.ListView;
import android.widget.Toast;
public class MainActivity extends Activity {
private final String lang = Locale.getDefault().getLanguage();
private final String functions_list = "someURL";
private final ListView lv = (ListView) findViewById(R.id.listView1);
Document doc = null;
AutoCompleteTextView input;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
new getData().execute(functions_list);
if(doc != null)
{
//--Write code here--//
}
else
Toast.makeText(this, "error", Toast.LENGTH_LONG).show();
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.main, menu);
return true;
}
class getData extends AsyncTask<String, Void, Document> {
protected Document doInBackground(String... urls) {
try {
Document data = Jsoup.connect(urls[0]).get();
return data;
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return null;
}
}
protected void onPreExecute() {
}
protected void onPostExecute(Document result) {
doc = result;
}
}
}
You can select all of the elements on a page by passing * as a wildcard to doc.select(). To show them in a ListView, collect each element into a list such as an ArrayList and hand that list to an ArrayAdapter.
For example:
ArrayList<String> htmlElements = new ArrayList<String>();
if (doc != null) {
    Elements elements = doc.select("*"); // select all elements from that page
    for (Element e : elements) {
        htmlElements.add(e.html()); // or e.text(), depending on what you require
    }
    ArrayAdapter<String> adapter = new ArrayAdapter<String>(MainActivity.this, android.R.layout.simple_list_item_1, htmlElements);
    lv.setAdapter(adapter);
}
If you only want to list the elements from the body of the document, call doc.body().select("*") instead. The Jsoup documentation is worth a read for some other tricks.
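One thing to watch in the question's code: execute() returns immediately, so doc is still null when onCreate checks it. The list should be filled once the result arrives, which on Android means inside onPostExecute. The plain-Java sketch below imitates that flow with CompletableFuture as a stand-in for AsyncTask; AsyncListDemo, fetchTitles and the hard-coded data are hypothetical placeholders for the Jsoup download.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Stand-in for the AsyncTask flow: work runs in the background and the
// result is consumed in a callback, not right after the task is started.
class AsyncListDemo {
    // Hypothetical placeholder for the network call; returns fixed data.
    static List<String> fetchTitles() {
        return Arrays.asList("strlen", "strpos", "substr");
    }

    // Starts the background work and fills 'target' when it completes,
    // playing the role of onPostExecute plus the adapter setup.
    static CompletableFuture<Void> loadInto(List<String> target) {
        return CompletableFuture
                .supplyAsync(AsyncListDemo::fetchTitles)
                .thenAccept(target::addAll);
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        CompletableFuture<Void> pending = loadInto(items);
        pending.join(); // wait only for the demo; an Android UI thread should not block
        System.out.println(items);
    }
}
```

In the activity itself, the equivalent step would be to move the ArrayAdapter code into onPostExecute(Document result), where the downloaded document is available.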

Connecting to the network

I'm starting out with Android, and my question is about this official tutorial:
http://developer.android.com/training/basics/network-ops/connecting.html
In the section "Perform Network Operations on a Separate Thread", I have exactly the same code in Eclipse, and I get the following error:
The type MainActivity.DownloadWebpageText must implement the inherited abstract method AsyncTask.doInBackground(Object...)
I understand that to override doInBackground() it must take an Object as its parameter, while I am expecting a String...
How do I solve that?
I'm pretty confused, because this code is in the main Android training section.
Thank you very much and merry Christmas!
EDIT: Here's my code. It is the same code as in the guide I linked:
package com.example.com.example.networkoperations;
import java.io.IOException;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.os.AsyncTask;
import android.os.Bundle;
import android.app.Activity;
import android.content.Context;
import android.util.Log;
import android.view.Menu;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.TextView;
public class MainActivity extends Activity implements OnClickListener {
final String LOG_TAG = "Connectivity tests (chux)";
Button btn;
TextView tv;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
init();
}
private void init() {
btn = (Button) findViewById(R.id.button);
btn.setOnClickListener(this);
tv = (TextView) findViewById(R.id.textView1);
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.activity_main, menu);
return true;
}
@Override
public void onClick(View arg0) {
tvText("Clicado!");
ConnectivityManager connMgr = (ConnectivityManager)getSystemService(Context.CONNECTIVITY_SERVICE);
NetworkInfo networkInfo = connMgr.getActiveNetworkInfo();
if (networkInfo != null && networkInfo.isConnected()){
new DownloadWebpageText().execute("http://mydomain.com");
}
else
tvText("No hay conexión a internet");
}
private void tvText(String text){
String oldText = tv.getText().toString() + "\n";
tv.setText(oldText + text);
}
private class DownloadWebpageText extends AsyncTask{
@Override
protected String doInBackground(String... urls) {
// params comes from the execute() call: params[0] is the url.
try {
return downloadUrl(urls[0]);
} catch (IOException e) {
return "Unable to retrieve web page. URL may be invalid.";
}
}
// onPostExecute displays the results of the AsyncTask.
@Override
protected void onPostExecute(String result) {
tv.setText(result);
}
}
}
Change the declaration of your download class from
private class DownloadWebpageText extends AsyncTask {
}
to
private class DownloadWebpageText extends AsyncTask<String, Void, String> {
}
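The reason is that a raw AsyncTask drops its type parameters, so the compiler expects doInBackground(Object...). The plain-Java sketch below reproduces the shape with a hypothetical MiniTask class (a simplified imitation, not the Android API): once the subclass supplies <String, Void, String>, doInBackground(String...) is a valid override.

```java
// Hypothetical, simplified imitation of AsyncTask's generic signature.
abstract class MiniTask<Params, Progress, Result> {
    protected abstract Result doInBackground(Params... params);

    // Runs the work synchronously; the real AsyncTask uses a worker thread.
    @SafeVarargs
    final Result run(Params... params) {
        return doInBackground(params);
    }
}

// With the type arguments given, String... matches Params...
class DownloadTask extends MiniTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // Placeholder for the download; just echoes the first argument.
        return "Requested: " + urls[0];
    }
}

class MiniTaskDemo {
    public static void main(String[] args) {
        System.out.println(new DownloadTask().run("http://mydomain.com"));
    }
}
```

The three type arguments correspond to the parameter type of doInBackground, the progress type, and the result type handed to onPostExecute.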

HTMLcleaner in AsyncTask

I am trying to get HtmlCleaner to parse the info from a website and then use XPath to find the data I'm looking for. I have the HtmlCleaner code in a separate class that is used from an AsyncTask, and the app runs on my phone. However, when I push the button nothing happens. Here are my main activity class and my helper class.
package ru.habrahabr.stackparser;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.htmlcleaner.TagNode;
import android.app.Activity;
import android.app.ProgressDialog;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.*;
public class stackParser extends Activity {
/** Called when the activity is first created. */
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
Button button = (Button) findViewById(R.id.parse);
button.setOnClickListener(myListener);
}
private OnClickListener myListener = new OnClickListener() {
public void onClick(View v) {
new parseSite().execute("http://xjaphx.wordpress.com/");
}
};
private class parseSite extends AsyncTask<String, Void, String> {
protected String doInBackground(String... arg) {
String output = new String();
try {
htmlHelper hh = new htmlHelper();
} finally {
}
return output;
}
protected void onPostExecute(String output) {
TextView view = (TextView) findViewById(R.id.tv1);
view.setText((CharSequence) output);
}
}
}
And here's my referenced class. I'd really appreciate it if someone could look at this and tell me what's wrong. I tried to follow a working example and put my own URL and XPath in, but it's not working.
package ru.habrahabr.stackparser;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;
public class htmlHelper {
TagNode rootNode;
String stats;
static final String XPATH_STATS = "//div[@id='blog-stats']/ul/li";
public String htmlHelper(URL htmlPage) throws IOException, XPatherException {
HtmlCleaner cleaner = new HtmlCleaner();
rootNode = cleaner.clean(htmlPage);
// query XPath
Object[] statsNode = rootNode.evaluateXPath(XPATH_STATS);
// process data if found any node
if (statsNode.length > 0) {
TagNode resultNode = (TagNode) statsNode[0];
stats = resultNode.getText().toString();
}
return stats;
}
}
Change the AsyncTask's doInBackground method to:
@Override
protected String doInBackground(String... arg) {
    String output = "";
    try {
        htmlHelper hh = new htmlHelper();
        output = hh.htmlHelper(new URL(arg[0])); // call the htmlHelper method here
    } catch (Exception e) {
        e.printStackTrace(); // IOException, XPatherException or a malformed URL
    }
    return output;
}
because you are currently only creating an instance of the htmlHelper class, not calling its htmlHelper method, which is what actually fetches the data from the URL.
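A detail that makes this easy to miss: htmlHelper(URL) in the question declares a return type, so Java treats it as an ordinary method that happens to share the class name, not as a constructor. The stand-alone sketch below shows the difference with a hypothetical PageHelper class; no network access or HtmlCleaner is involved.

```java
// Hypothetical class illustrating a method that shares its class's name.
class PageHelper {
    String stats = "not loaded";

    // Has a return type, so this is a regular method, NOT a constructor.
    public String PageHelper(String page) {
        stats = "stats for " + page;
        return stats;
    }
}

class PageHelperDemo {
    public static void main(String[] args) {
        PageHelper helper = new PageHelper();   // runs the default constructor only
        System.out.println(helper.stats);        // prints "not loaded"
        String result = helper.PageHelper("http://xjaphx.wordpress.com/");
        System.out.println(result);
    }
}
```

Giving the method a distinct name, such as a hypothetical fetchStats, would make the missing call much easier to spot.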
I will not check here whether the data is downloaded; I will assume it has been and is held in the variable output. You just need to place this code inside onPostExecute:
runOnUiThread(new Runnable() {
    public void run() {
        // update data into TextView.
        TextView view = (TextView) findViewById(R.id.tv1);
        view.setText((CharSequence) output);
    }
});
Sometimes the data has been downloaded but still has to be assigned to the TextView (or any other view) on the UI thread. Note that onPostExecute already runs on the UI thread, so the wrapper is only strictly needed when you update views from a background thread. Hope this works for you.
