XMLParser encoding problems - android

public XMLParser(InputStream is) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
node = doc.getDocumentElement();
} catch (Exception e) {
DebugLog.log(e);
}
}
The inputStream contains content like: "Hey there this is a ü character."
The character 'ü' is sent as '&uuml;'.
When reading the node's content with System.out.println(node.getTextContent()), I receive "hey there this is a character." The ü is cut off.

Well, is this a valid document? Does it have an encoding specified? See http://www.w3schools.com/XML/xml_encoding.asp
Those might help:
Howto let the SAX parser determine the encoding from the xml declaration?
http://www.coderanch.com/t/127052/XML/XML-parsers-encoding-byte-order

The problem was the difference between XML entities and HTML entities.
I request a web page that returns data containing HTML entities.
I had to convert the HTML entities to XML entities, and then it worked!
Check this answer for some code
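The conversion described above might look roughly like the following. This is only a sketch, not the code from the linked answer: the class HtmlEntityFixer, its method names, and the small entity table are hypothetical and cover just a handful of HTML named entities. The idea is to replace each HTML named entity with the equivalent numeric character reference, which any XML parser understands without a DTD.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any Android or Java API.
final class HtmlEntityFixer {
    // Illustrative subset of HTML named entities mapped to numeric references.
    private static final Map<String, String> NAMED_TO_NUMERIC = new LinkedHashMap<>();
    static {
        NAMED_TO_NUMERIC.put("&uuml;", "&#252;");
        NAMED_TO_NUMERIC.put("&Uuml;", "&#220;");
        NAMED_TO_NUMERIC.put("&auml;", "&#228;");
        NAMED_TO_NUMERIC.put("&ouml;", "&#246;");
        NAMED_TO_NUMERIC.put("&szlig;", "&#223;");
        NAMED_TO_NUMERIC.put("&nbsp;", "&#160;");
    }

    // Replaces the known HTML named entities; the XML built-ins (&amp;, &lt;, ...) are left alone.
    static String toNumericEntities(String xml) {
        String result = xml;
        for (Map.Entry<String, String> entry : NAMED_TO_NUMERIC.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    // Parses the XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling HtmlEntityFixer.rootText(HtmlEntityFixer.toNumericEntities(xml)) on the response text before parsing should then keep the ü in the node content.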

Related

how to parse the string of xml data in android

I am new to Android programming. My requirement is to invoke web services. I successfully got the response from the web service. How do I parse the response in Android? Please give me a solution.
This is the code for getting the response:
HttpResponse response = httpclient.execute(httppost);
String str=response.getStatusLine().toString();
System.out.println("========URL STATUS========"+str);
HttpEntity r_entity = response.getEntity();
if( r_entity != null ) {
result = new byte[(int) r_entity.getContentLength()];
if(r_entity.isStreaming()) {
is = new DataInputStream(r_entity.getContent());
is.readFully(result);
}
}
httpclient.getConnectionManager().shutdown();
String responsedata = new String(result);
The sample below uses the DOM parser.
DocumentBuilderFactory dbf =DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
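// Use the decoded response string; calling toString() on the byte array would not return its contents.
StringReader sr = new StringReader(responsedata);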
is.setCharacterStream(sr);
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("your root tag");
//get your other tag elements.
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ (example of a DOM parser)
http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/ (example of a SAX parser)
DOM is the W3C-based parser. DOM is slower than SAX because it builds a node tree that has to be held in memory, so parsing large data with a DOM parser is not recommended.
SAX, on the other hand, is faster than DOM and is recommended for large XML data.
The links above give examples of both. Use either parser to parse and get values from the XML tags.
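For comparison with the DOM sample, a SAX-based reader could be sketched as below. The class SaxTagReader and the method readTag are made-up names for illustration; the handler only collects the text of elements with a given tag name and never builds a tree in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class SaxTagReader {
    // Returns the text of every element named tagName, in document order.
    static List<String> readTag(String xml, final String tagName) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append instead of assigning.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && qName.equals(tagName)) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return values;
    }
}
```

With the response string from the question, SaxTagReader.readTag(responsedata, "your root tag") would return the text of each matching element.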
You can try to use SimpleXmlParser. It's a native android class. It's simpler than DOM xml parser.
What you need is simply an Android XML parsing library. There are plenty of XML parsers for Android.
official tutorial
There is also an article, "comparing methods for xml parsing in android".

Android dom parser - illegal characters exception

I need to parse an XML document in my Android application and I'm using the DOM parser. The encoding in my XML file is set to UTF-8. The code I'm using for parsing is as follows:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream inStream = getAssets().open("words.xml");
InputSource inSource = new InputSource(inStream);
inSource.setEncoding("UTF-8");
Document doc = db.parse(inSource);
But the problem is that I get an illegal character exception. The node which is problematic has the following structure:
<obriši>
<item>obriši</item>
<item>ukloni</item>
</obriši>
What could be the problem?
Try with
inSource.setEncoding("windows-1251");

Dom xml Parsing in android

I am new to Android development and don't know how to parse data from XML, so please help.
This is the XML I have to parse:
<MediaFeedRoot>
<MediaTitle>hiiii</MediaTitle>
<MediaDescription>hellooooo.</MediaDescription>
<FeedPath>how r u</FeedPath>
</MediaFeedRoot>
Thanks in advance.
I don't understand why people ask questions here without searching the net properly. Please remember to search the net before asking anything here.
Below is a link to a very good tutorial on XML parsing:
http://www.androidpeople.com/android-xml-parsing-tutorial-%E2%80%93-using-domparser
My suggestion is to start with the basic steps:
Think about where your XML file comes from: a URL or a local file?
Create a DocumentBuilderFactory and a builder:
DocumentBuilder dBuilder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
OR
URLConnection conn = new URL(url).openConnection();
InputStream inputXml = conn.getInputStream();
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
Document xmlDoc = docBuilder.parse(inputXml);
Parsing XML file:
Document xmlDom = dBuilder.parse(xmlFile);
After that, the XML file has been turned into a DOM (tree) structure, and you have to traverse it node by node.
In your case, you need to get the content. Here is an example:
String getContent(Document doc, String tagName){
String result = "";
NodeList nList = doc.getElementsByTagName(tagName);
if(nList.getLength()>0){
Element eElement = (Element)nList.item(0);
String ranking = eElement.getTextContent();
if(!"".equals(ranking)){
result = String.valueOf(ranking);
}
}
return result;
}
The call getContent(xmlDom, "MediaTitle") returns "hiiii".
Good luck!
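Putting those steps together for the XML in the question, a self-contained sketch might look like this. MediaFeedReader and readChildren are hypothetical names, and the XML is parsed from a String here just to keep the example short; a URL or file stream would work the same way.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class MediaFeedReader {
    // Maps each child element of the root to its text content, in document order.
    static Map<String, String> readChildren(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(child.getNodeName(), child.getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return values;
    }
}
```

For the MediaFeedRoot document, readChildren(xml).get("MediaTitle") would give the title and the map would contain one entry per child tag.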

Problem with parsing of UTF-8 encoded xml files in Android 3.1 sdk

The XML parsing API throws a SAXParseException if I try to parse an XML file that has attributes on the root node.
One thing I have noticed is that this happens if there is a UTF-8 BOM character at the start of the string; if I remove the BOM character, things work fine. This code works fine on the 3.0 SDK and below; I saw this problem only in 3.1.
I am using the following parser:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = docFactory.newDocumentBuilder();
Document doc = null;
StringReader sr = new StringReader(xmlString);
InputSource is = new InputSource(sr);
doc = builder.parse(is);
Try this:
public Document parse(String xml) throws ParsingFailedException {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// encode the XML string as UTF-8 bytes so the parser reads a byte stream
ByteArrayInputStream encXML = new ByteArrayInputStream(xml.getBytes("UTF-8"));
Document doc = builder.parse(encXML);
log.error("XML parsing OK");
return doc;
} catch (Exception e) {
log.error("Parser Error:" + e.getMessage());
throw new ParsingFailedException("Failed to parse XML : Document not well formed", e);
}
}
Thanks evilone,
I have opened an issue with Google, and they will be fixing this in their branch.
http://code.google.com/p/android/issues/detail?id=16892
Comment from a Google developer:
"I've prepared a fix for the root problem in our internal Honeycomb tree. But you don't need the fix for your code. Your parseXml method should just take an InputStream rather than a String. You can pass that directly to the InputSource constructor."
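Following that advice, a minimal sketch could hand the raw bytes to the parser as an InputStream so that the parser itself deals with the BOM. If only a String is available, stripping a leading U+FEFF before parsing is a common workaround. The class BomSafeParser and both method names are hypothetical, and the byte-stream behaviour shown here is that of the standard JAXP parser, not necessarily of every Android release.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class BomSafeParser {
    // Removes a leading byte order mark (U+FEFF) from an already decoded string.
    static String stripBom(String xml) {
        if (xml != null && !xml.isEmpty() && xml.charAt(0) == '\uFEFF') {
            return xml.substring(1);
        }
        return xml;
    }

    // Parses raw bytes through an InputStream and returns the root element name.
    static String rootElementName(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputStream in = new ByteArrayInputStream(xmlBytes);
            // No Reader is involved, so the parser detects the encoding (and BOM) itself.
            Document doc = builder.parse(new InputSource(in));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```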

android DOM parsing with entities in tags

I would like to parse the following XML, which contains entities.
<node>
<text><title>foo fo <BR>bar bar </title></text>
</node>
The parsing works, but I do not receive any output after the entities. Using CDATA is not possible at that position.
I'm using the following code:
InputStream in = urlConnection.getInputStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse(in);
Does anyone have an idea?
Thx in advance!
CDATA usage is never necessary, so that's neither the problem nor a solution. But how do you actually tell there is no text? It is quite possible that you just have multiple adjacent text nodes: underlying parsers often return multiple text segments (especially when there are entities). You can use a DOM method to "normalize" the text content that an Element contains, which replaces adjacent Text nodes with a single one. Without this you should never assume all text is within the first (and only) Text node.
If there are no nodes, it is possible that the parser Android bundles is buggy. I think they include an old version of xpp or something, and it might have issues (compared to more polished parsers like Xerces or Woodstox). But I would first make sure it's not just a case of "hidden" nodes.
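To check for such "hidden" nodes, one option is to normalize the element and read its whole text instead of only the first Text child. The following is a sketch with hypothetical names (TextNodeReader, fullText), tried against the standard JAXP parser rather than the parser bundled with a particular Android version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class TextNodeReader {
    // Returns the complete text of the first element named tagName, or "" if there is none.
    static String fullText(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            if (matches.getLength() == 0) {
                return "";
            }
            Element element = (Element) matches.item(0);
            // Merge adjacent Text nodes that the parser may have split around entities.
            element.normalize();
            // getTextContent() concatenates the text of all descendants, not just the first child.
            return element.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```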
http://code.google.com/p/android/issues/detail?id=2607
I found out that others have a similar problem. There is a bug in Android 2.0.1 and 2.1. I solved this problem by using the SAX parser.
My name is Divy Dhiman; I am a senior Android developer.
I did the XML parsing the same way. You can do it by entity, by node, or by fetching literal control; that depends on you.
private class MyAsyncTask extends AsyncTask<String, Void, String>
{
@Override
protected String doInBackground(String... abc) {
try {
URL url = new URL(jksbvlds);
URLConnection connection;
connection = url.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;
int responseCode = httpConnection.getResponseCode();
if(responseCode == HttpURLConnection.HTTP_OK)
{
InputStream in = httpConnection.getInputStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(in);
Element docEle = dom.getDocumentElement();
NodeList nl = docEle.getElementsByTagName("quote");
if(nl != null && nl.getLength()>0)
{
for(int i = 0; i<nl.getLength();i++ )
{
StockInfo theStock = getStockInformation(docEle);
name = theStock.getName();
yearLow = theStock.getYearLow();
yearHigh = theStock.getYearHigh();
daysLow = theStock.getDaysLow();
daysHigh = theStock.getDaysHigh();
lastTradePriceonly = theStock.getLastTradePriceonly();
change = theStock.getChange();
daysRange = theStock.getDaysRange();
}
}
}
}
catch(MalformedURLException e)
{
Log.d(TAG,"MalformedURLException",e);
}
catch(IOException e)
{
Log.d(TAG,"IOException",e);
}
catch (ParserConfigurationException e) {
Log.d(TAG,"ParserConfigurationException", e);
}
catch (SAXException e) {
Log.d(TAG,"SAXException",e);
}
finally{}
return null;
}
}
