I am developing an Android app that has to extract data from a website and display the extracted data in a TextView.
I have tried every approach I could find through Google and Stack Overflow, but I am still unable to process the data. Can anyone who has done this share how?
Details
Website: https://www.amrita.edu/campus/bengaluru
On this website I want to extract the data in the Latest News block and the Upcoming Events block.
Here's the code. I have used jsoup for the extraction:
package out.in;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document; // Jsoup.connect(URL).get() returns the jsoup Document, not the W3C DOM one
import android.app.Activity;
import android.os.Bundle;
import android.sax.Element;
import android.widget.TextView;
import android.widget.Toast;
public class HtmlExtracterActivity extends Activity {
/** Called when the activity is first created. */
// url
static final String URL = "https://www.amrita.edu/campus/bengaluru";
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
try {
((TextView)findViewById(R.id.tv)).setText(getdata());
}
catch (Exception ex) {
((TextView)findViewById(R.id.tv)).setText("Error");
}
}
protected String getdata() throws Exception {
String result = "";
// get html document structure
Document document = (Document) Jsoup.connect(URL).get();
// selector query
*********Need help
// check results
*********Need help
return result;
}
}
I have added the INTERNET permission in the manifest file, and the layout XML file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
android:orientation="vertical"
android:layout_width="fill_parent"
android:layout_height="fill_parent"
>
<TextView android:text=" "
android:id="@+id/tv" android:layout_width="wrap_content"
android:layout_height="wrap_content"></TextView>
</LinearLayout>
I would sincerely appreciate any help. Thanks in advance.
You haven't mentioned the exact problem you are facing. Did you check what is returned by this line:
Document document = (Document) Jsoup.connect(URL).get();
I assume the problem might be caused by a missing User-Agent in the code above. Please try this and let us know if you still get the error:
Response response= Jsoup.connect(location)
.ignoreContentType(true)
.userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
.referrer("http://www.google.com")
.timeout(12000)
.followRedirects(true)
.execute();
Document doc = response.parse();
User Agent
Use a recent user agent string. Here's the complete list:
http://www.useragentstring.com/pages/Firefox/.
Timeout
Also don't forget to set a timeout, since downloading the page sometimes takes longer than the default timeout.
Referrer
Set the referrer to Google.
Follow redirects
Follow redirects to get to the page.
execute() instead of get()
Use execute() to get the Response object, which lets you check the content type and status code in case of an error.
Source: https://stackoverflow.com/a/20284953/1262177
I'm trying to parse an HTML page and I need to get the full div:
void printing() async {
http.Response response = await http.get('https://stackoverflow.com/');// example
Document document = parser.parse(response.body);
var elent = document.getElementById('content') ;
print(elent);
}
The result is:
I/flutter ( 2336): <html div>
How can I print all the HTML elements inside the div?
Thank you.
The https://pub.dartlang.org/packages/html package allows you to query elements similar to what you can do in the browser (for example querySelectorAll()).
import 'package:html/parser.dart' show parse;
import 'package:html/dom.dart';
main() {
var document = parse(
'<body>Hello world! <a href="www.html5rocks.com">HTML5 rocks!');
print(document.outerHtml);
}
Gunter and I were writing our answers at the same time ^^
As Gunter pointed out, you can use the Dart html package.
https://github.com/dart-lang/html
https://pub.dartlang.org/packages/html#-installing-tab-
In your pubspec.yaml you should add it as a dependency:
html: ^0.13.3+3
If you get errors because Text is defined in both dom.dart and widgets.dart, the imports should look like this:
import 'package:html/parser.dart' show parse;
import 'package:html/dom.dart' hide Text;
Then you can give it a try like this:
void _printing() async {
http.Response response =
await http.get('https://stackoverflow.com/'); // example
Document document = parse(response.body);
var element = document.getElementById('content');
debugPrint(element.querySelectorAll('div').toString());
}
With querySelectorAll you get all the div elements nested inside the element, and you can then loop through them:
element.querySelectorAll('div').forEach((value) {
debugPrint(value.outerHtml);
});
Is there any lightweight library for Android that acts like JAXB on the desktop?
Given an XML schema, it should generate code to parse, validate, manipulate and then write the XML again.
The files already exist, and since it's a finance application, nothing that isn't modified may be touched, including whitespace, ordering and character encoding.
(I have been doing this for years with JAXB and it works fine, but I can't port that code to Android because JAXB is missing there and because of its footprint.)
You can try Simple Framework or Castor.
The following worked for me for validation:
Create a validation utility.
Get both the xml and xsd into file on the android OS and use the validation utility against it.
Use Xerces-For-Android to do the validation.
Android does support some packages that we can use. I created my XML validation utility based on: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
My initial sandbox testing went pretty smoothly with Java, then I tried to port it over to Dalvik and found that my code did not work. Some things just aren't supported the same way on Dalvik, so I made some modifications.
I found a reference to Xerces for Android, so I modified my sandbox test, shown here (this version doesn't work on Android; the example after it does):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
* Base code/example from: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// parse an XML document into a DOM tree
DocumentBuilder parser = null;
Document document;
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
// validate the DOM tree
parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = parser.parse(new File(xmlFilePath));
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (Exception e) {
// Catches: SAXException, ParserConfigurationException, and IOException.
return false;
}
return true;
}
}
The above code had to be modified somewhat to work with Xerces for Android (http://gc.codehum.com/p/xerces-for-android/). You need SVN to get the project; the following are some crib notes:
download xerces-for-android
download Slik SVN (for Windows users) from http://www.sliksvn.com/en/download
install Slik SVN (I did a complete install)
Once the install is complete, you should have svn in your system path.
Test by typing "svn" from the command line.
I went to my desktop, then downloaded the Xerces project with:
svn checkout http://xerces-for-android.googlecode.com/svn/trunk/ xerces-for-android-read-only
You should then have a new folder on your desktop called xerces-for-android-read-only
With the above source (eventually I'll make it into a jar; for quick testing I just copied it directly into my source tree. If you wish to build a jar, you can do so quickly with Ant (http://ant.apache.org/manual/using.html)), I was able to get the following to work for my XML validation:
import java.io.File;
import java.io.IOException;
import mf.javax.xml.transform.Source;
import mf.javax.xml.transform.stream.StreamSource;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
import org.xml.sax.SAXException;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse or exception/error during parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
SchemaFactory factory = new XMLSchemaFactory();
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Source xmlSource = new StreamSource(new File(xmlFilePath));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlSource);
} catch (SAXException e) {
return false;
} catch (IOException e) {
return false;
} catch (Exception e) {
// Catches everything beyond: SAXException, and IOException.
e.printStackTrace();
return false;
} catch (Error e) {
// Needed this for debugging when I was having issues with my 1st set of code.
e.printStackTrace();
return false;
}
return true;
}
}
Some Side Notes:
For creating the files, I made a simple file utility that writes a string to a file:
public static void createFileFromString(String fileText, String fileName) {
try {
File file = new File(fileName);
BufferedWriter output = new BufferedWriter(new FileWriter(file));
output.write(fileText);
output.close();
} catch ( IOException e ) {
e.printStackTrace();
}
}
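Since the question requires that the character encoding is left untouched, it may be worth noting that FileWriter writes in the platform default charset. Below is a minimal sketch of the same helper with an explicit charset. It assumes UTF-8 is the encoding you want and that try-with-resources is available (Java 7+, Android API 19+). The class name FileUtil is made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class FileUtil {
    /**
     * Writes fileText to file as UTF-8 (an assumption; use the charset your files need).
     * The writer is closed even if write() throws.
     */
    static void createFileFromString(String fileText, File file) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(fileText);
        }
    }
}
```

Unlike the helper above, this version lets the IOException propagate, so the caller can tell that the file was not written.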
I also needed to write to an area that I had access to, so I made use of:
String path = this.getActivity().getPackageManager().getPackageInfo(getPackageName(), 0).applicationInfo.dataDir;
It's a little hackish, but it works. I'm sure there is a more succinct way of doing this; however, I figured I'd share my success, as I couldn't find any good examples.
I have a signed XML document on the web (signed in pure Java, with RSA and X509 tags) and I have implemented an XML pull parser. Before I parse some information from the XML document at a specific URL, I need to verify that the document is the right one. Do you know how to check an XML signature?
Thanks
Edit:
My XML file looks as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<application id="1">
<appversion>1.0</appversion>
<obligatory>yes</obligatory>
<update>http://www.xyz.....</update>
<check>http://www.xyz.....</check>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments" />
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1" />
<ds:Reference URI="#1">
<ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" />
<ds:DigestValue>fuv...</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>PC0+4uO4...</ds:SignatureValue>
<ds:KeyInfo>
<ds:KeyValue>
<ds:RSAKeyValue>
<ds:Modulus>gEs/Fn2Gd5evwhlUgoS3...</ds:Modulus>
<ds:Exponent>AQAB</ds:Exponent>
</ds:RSAKeyValue>
</ds:KeyValue>
<ds:X509Data>
<ds:X509IssuerSerial>
<ds:X509IssuerName>CN=abc abc,OU=abcs,O=abc,L=abc,ST=abc,C=abc</ds:X509IssuerName>
<ds:X509SerialNumber>123456...</ds:X509SerialNumber>
</ds:X509IssuerSerial>
<ds:X509SubjectName>CN=abc abc,OU=abcs,O=abc,L=abc,ST=abc,C=abc</ds:X509SubjectName>
<ds:X509Certificate>MIIDhzCCAm+gAwIBAgI...</ds:X509Certificate>
</ds:X509Data>
</ds:KeyInfo>
</ds:Signature>
In J2EE Java you would use javax.xml.crypto, as detailed here:
http://java.sun.com/developer/technicalArticles/xml/dig_signature_api/
However, these classes are not part of the standard Android packages.
It may be a manageable amount of work to build your own package from the bits of the source you need:
http://google.com/codesearch/p?hl=en#-WpwJU0UKqQ/src/share/classes/javax/xml/crypto/dom/DOMCryptoContext.java&d=5
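For reference, on desktop Java the javax.xml.crypto.dsig route can look roughly like the sketch below. It is not usable on Android as-is, since those classes are missing there. The names XmlSignatureCheck, parse, isValid and signForDemo are made up for this example, and signForDemo exists only to make the sketch self-contained. Validating against a key you already trust, rather than whatever key is embedded in KeyInfo, is my assumption about what "the right document" should mean.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlSignatureCheck {
    private static final XMLSignatureFactory FACTORY = XMLSignatureFactory.getInstance("DOM");
    // RSA with SHA-256, spelled out as a URI so the sketch also compiles on Java 8.
    private static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    /** Parses XML text into a namespace-aware DOM (required for signature handling). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Returns true only if a ds:Signature is present and validates against trustedKey. */
    static boolean isValid(Document doc, PublicKey trustedKey) throws Exception {
        NodeList nodes = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature");
        if (nodes.getLength() == 0) {
            return false; // an unsigned document is treated as invalid
        }
        DOMValidateContext context = new DOMValidateContext(trustedKey, nodes.item(0));
        XMLSignature signature = FACTORY.unmarshalXMLSignature(context);
        return signature.validate(context);
    }

    /** Demo helper: appends an enveloped signature over the whole document to its root. */
    static void signForDemo(Document doc, PrivateKey key) throws Exception {
        Reference wholeDocument = FACTORY.newReference("",
                FACTORY.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                        FACTORY.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
        SignedInfo signedInfo = FACTORY.newSignedInfo(
                FACTORY.newCanonicalizationMethod(
                        CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
                FACTORY.newSignatureMethod(RSA_SHA256, null),
                Collections.singletonList(wholeDocument));
        FACTORY.newXMLSignature(signedInfo, null)
                .sign(new DOMSignContext(key, doc.getDocumentElement()));
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();
        Document doc = parse("<application><appversion>1.0</appversion></application>");
        signForDemo(doc, pair.getPrivate());
        System.out.println("valid: " + isValid(doc, pair.getPublic()));
    }
}
```

In a real client the trusted public key would ship with the app or come from a certificate you have already checked, not from a freshly generated key pair.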
You can use Apache Santuario but you need to strip it down. Look at https://web.archive.org/web/20140902223147/http://www.xinotes.net/notes/note/1302/ for more details.
You can add the following stripped version of Apache Santuario with the XML security features: http://mvnrepository.com/artifact/org.apache.santuario/xmlsec/2.0.2
Then you just need to create some verifier class, for example:
import java.io.*;
import javax.xml.parsers.*;
import java.security.PublicKey;
import java.security.cert.X509Certificate;
import org.w3c.dom.*;
import org.apache.xml.security.keys.KeyInfo;
import org.apache.xml.security.signature.XMLSignature;
import org.apache.xml.security.utils.Constants;
import org.apache.xml.security.utils.XMLUtils;
public class Whatever {
boolean verifySignature() {
boolean valid = false;
try {
// parse the XML
InputStream in = obtainInputStreamToXMLSomehow();
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setNamespaceAware(true);
Document doc = f.newDocumentBuilder().parse(in);
in.close();
// verify signature
NodeList nodes = doc.getElementsByTagNameNS(Constants.SignatureSpecNS, "Signature");
if (nodes.getLength() == 0) {
throw new Exception("Signature NOT found!");
}
Element sigElement = (Element) nodes.item(0);
XMLSignature signature = new XMLSignature(sigElement, "");
KeyInfo ki = signature.getKeyInfo();
if (ki == null) {
throw new Exception("Did not find KeyInfo");
}
X509Certificate cert = signature.getKeyInfo().getX509Certificate();
if (cert == null) {
PublicKey pk = signature.getKeyInfo().getPublicKey();
if (pk == null) {
throw new Exception("Did not find Certificate or Public Key");
}
valid = signature.checkSignatureValue(pk);
}
else {
valid = signature.checkSignatureValue(cert);
}
}
catch (Exception e) {
e.printStackTrace();
}
return valid;
}
// This is important!
static {
org.apache.xml.security.Init.init();
}
}
Edit
It seems that including XML parsing libraries in Android is trickier than expected. First you need to generate the jar file and then rename some of the library's package namespaces (using the JarJar tool). After that, add it to the project's library directory (/lib or /libs).
I switched from signed XML to Signed JSON (RFC 7515)
Maybe you can use Apache Santuario. http://santuario.apache.org/
I want to make a dynamic web page that displays the rows of my SQLite database in tabular format. How can I create a dynamic web page in Android?
I made a static HTML page and put it in the assets folder. It works, but now I want a dynamic web page. Please help:
package com.Htmlview;
import java.io.IOException;
import java.io.InputStream;
import android.app.Activity;
import android.os.Bundle;
import android.webkit.WebView;
public class Htmlview extends Activity {
/** Called when the activity is first created. */
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
//setContentView(R.layout.main);
WebView webview = new WebView(this);
setContentView(webview);
try {
InputStream fin = getAssets().open("index3.html");
byte[] buffer = new byte[fin.available()];
fin.read(buffer);
fin.close();
webview.loadData(new String(buffer), "text/html", "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
}
}
This page works ...
Help to make it dynamic through code.
gaurav gupta
Use java.text.MessageFormat:
Put "{0}" markers in your html file.
Read your html file into a string
Create an arguments array from the database record
Call MessageFormat.format(htmlString, dbArgs)
Load the resulting string into the webview.
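The steps above can be sketched as follows. The template string and the values are hypothetical. In the app the template would come from the asset file and the arguments from a database record.

```java
import java.text.MessageFormat;

final class HtmlTemplateDemo {
    // Hypothetical template; in the app this would be the content of the HTML asset.
    static final String TEMPLATE =
            "<html><body><h1>{0}</h1><p>Balance: {1}</p></body></html>";

    /** Replaces the numbered {n} markers with the values of one database record. */
    static String render(String template, Object... dbArgs) {
        return MessageFormat.format(template, dbArgs);
    }

    public static void main(String[] args) {
        // Values are passed as strings so that no locale-specific number formatting is applied.
        System.out.println(render(TEMPLATE, "Alice", "42.50"));
    }
}
```

One thing to watch for: MessageFormat treats single quotes and curly braces as special characters, so a template that contains inline CSS or JavaScript needs them escaped (write '' for a literal quote and '{' or '}' for a literal brace).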
You can't modify the assets folder at runtime, because it is compiled into the APK at build time. So you have two options:
Use the content from the database as it is. Just read the bytes and pass them to the WebView without storing them anywhere.
Store the content in a file in internal storage and load it the way you did with assets. But there is no sense in that, as you already have your data in the database.
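If the content comes straight from the database, the table markup can be generated in code. Here is a rough sketch, assuming the rows have already been read from the database into lists of strings (the class and method names are made up for illustration):

```java
import java.util.List;

final class HtmlTableBuilder {
    /** Escapes the characters that would otherwise be parsed as markup. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Builds a complete HTML page with one header row and one table row per record. */
    static String toHtmlTable(List<String> headers, List<List<String>> rows) {
        StringBuilder html = new StringBuilder("<html><body><table border=\"1\">");
        html.append("<tr>");
        for (String header : headers) {
            html.append("<th>").append(escape(header)).append("</th>");
        }
        html.append("</tr>");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table></body></html>").toString();
    }
}
```

On the Android side the resulting string could then be shown with webview.loadDataWithBaseURL(null, html, "text/html", "UTF-8", null).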
With the Android SDK, the following code in a plain empty Activity fails:
@Override
protected void onStart() {
super.onStart();
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
}
The 2.2 emulator logcat shows this exception:
06-28 05:38:06.107: WARN/dalvikvm(495): threadid=1: thread exiting with uncaught exception (group=0x4001d800)
06-28 05:38:06.128: ERROR/AndroidRuntime(495): FATAL EXCEPTION: main
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example/com.example.HelloWorldActivity}: java.lang.IllegalArgumentException: http://www.w3.org/2001/XMLSchema
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2663)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2679)
at android.app.ActivityThread.access$2300(ActivityThread.java:125)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2033)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:123)
at android.app.ActivityThread.main(ActivityThread.java:4627)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:521)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.lang.IllegalArgumentException: http://www.w3.org/2001/XMLSchema
at javax.xml.validation.SchemaFactory.newInstance(SchemaFactory.java:194)
at com.example.HelloWorldActivity.onStart(HelloWorldActivity.java:26)
at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1129)
at android.app.Activity.performStart(Activity.java:3781)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2636)
... 11 more
The Javadoc of SchemaFactory mentions "Platform default SchemaFactory is located in a implementation specific way. There must be a platform default SchemaFactory for W3C XML Schema."
I had the same problem and read many posts before I found an answer that worked for me. Looking the factory up through the constant doesn't work on Dalvik, which is most likely what you are running into. I found I had to modify my code to work with the Xerces-for-Android project, and then I was able to get the XML validation I was after. The following are details of the setup and some example code showing how to get the validation to work on Android.
The setup steps, both XmlUtil listings and the side notes are identical to the Xerces-for-Android answer given earlier for the JAXB question (the one that begins "The following worked for me for validation:").
You might have some luck re-packaging xerces with jarjar and then passing
"org.apache.xerces.jaxp.validation.XMLSchemaFactory"
to
SchemaFactory.newInstance(String schemaLanguage, String factoryClassName, ClassLoader classLoader)
if you're using API >=9 or directly instantiating
org.apache.xerces.jaxp.validation.XMLSchemaFactory
if you're using API 8. It might not work at all using an older API than that.
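Here is a small sketch of the API >= 9 route, assuming the repackaged class name mentioned above. SchemaFactories and load are hypothetical names. The three-argument newInstance throws IllegalArgumentException when the named class cannot be loaded, so the helper tries each candidate in turn.

```java
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaFactories {
    /**
     * Returns the first W3C XML Schema factory that can be loaded from the given
     * class names, or throws IllegalStateException if none of them is usable.
     */
    static SchemaFactory load(String... candidateClassNames) {
        for (String className : candidateClassNames) {
            try {
                // A null class loader means the thread's context class loader is used.
                return SchemaFactory.newInstance(
                        XMLConstants.W3C_XML_SCHEMA_NS_URI, className, null);
            } catch (IllegalArgumentException e) {
                // Class missing or not a schema factory for this language: try the next one.
            }
        }
        throw new IllegalStateException(
                "No XML Schema factory could be loaded from: "
                        + Arrays.toString(candidateClassNames));
    }
}
```

A call would then look like SchemaFactories.load("org.apache.xerces.jaxp.validation.XMLSchemaFactory"), with the repackaged Xerces jar on the classpath.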