I have a problem parsing Arabic letters with the DOM parser: I get weird characters. I've tried switching to different encodings, but nothing worked.
the full code is on this link: http://test11.host56.com/parser.java
public Document getDomElement(String xml) {
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
Reader reader = new InputStreamReader(new ByteArrayInputStream(
xml.getBytes("UTF-8")));
InputSource is = new InputSource(reader);
DocumentBuilder db = dbf.newDocumentBuilder();
//InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xml));
doc = db.parse(is);
return doc;
}
}
My XML file:
<?xml version="1.0" encoding="UTF-8"?>
<music>
<song>
<id>1</id>
<title>اهلا وسهلا</title>
<artist>بكم</artist>
<duration>4:47</duration>
<thumb_url>http://wtever.png</thumb_url>
</song>
</music>
You already have the xml as String, so unless that string already contains the odd characters (that is, it has been read in with the wrong encoding), you can avoid encoding madness here by using a StringReader instead; e.g. instead of:
Reader reader = new InputStreamReader(new ByteArrayInputStream(
xml.getBytes("UTF-8")));
use:
Reader reader = new StringReader(xml);
Edit: now that I see more of the code, it seems the encoding issue already happened before the XML is parsed, because that part contains:
HttpResponse httpResponse = httpClient.execute(httpPost);
HttpEntity httpEntity = httpResponse.getEntity();
xml = EntityUtils.toString(httpEntity);
The javadoc for the EntityUtils.toString says:
The content is converted using the character set from the entity (if any), failing that, "ISO-8859-1" is used.
It seems the server does not send the proper encoding information with the entity, so EntityUtils falls back to its default, which is not UTF-8.
Fix: use the variant that takes an explicit default encoding:
xml = EntityUtils.toString(httpEntity, "utf-8");
Here I assume the server sends UTF-8. If the server uses a different encoding, that one should be set instead of UTF-8. (However as the XML also declares encoding="UTF-8" I thought this is the case.) If the encoding the server uses is not known, then you can only resort to wild guessing and are out of luck, sorry.
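To see why the fallback charset matters, here is a minimal sketch that uses only the JDK (no HttpClient): it decodes the same UTF-8 bytes once as ISO-8859-1 and once as UTF-8. The class and method names are made up for illustration, and the sample text is the artist name from the question's XML.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes raw response bytes with the given charset, as an HTTP client would.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // The artist name from the question, written with Unicode escapes
        // so that the source file encoding does not matter.
        String original = "\u0628\u0643\u0645";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.length());         // 6: one char per UTF-8 byte
        System.out.println(right.equals(original)); // true
    }
}
```

Once the bytes have been decoded with the wrong charset, the resulting String already contains the garbage, and no setting on the XML parser can repair it.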
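If Unicode characters such as Arabic or Persian letters still come out garbled when the XML goes through a String and a StringReader, skip the String altogether: pass the InputStream straight to DocumentBuilder.parse() so the parser can pick up the encoding from the XML declaration.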
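A minimal sketch of handing raw bytes to the parser, assuming the bytes really are in the encoding named in the XML declaration. The class name, the helper firstText, and the in-memory stream are illustrative only; in the question's code the stream would come from httpEntity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class StreamParseDemo {
    // Parses the bytes as-is; the parser reads the encoding from the XML declaration.
    static String firstText(InputStream in, String tagName) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(in);
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<music><song><artist>\u0628\u0643\u0645</artist></song></music>";
        // Stand-in for the HTTP response body.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstText(in, "artist").equals("\u0628\u0643\u0645")); // true
    }
}
```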
Related
I am new to Android programming. I need to invoke a web service, and I am successfully getting a response from it. How do I parse that response on Android?
This is the code for getting the response:
HttpResponse response = httpclient.execute(httppost);
String str=response.getStatusLine().toString();
System.out.println("========URL STATUS========"+str);
HttpEntity r_entity = response.getEntity();
if( r_entity != null ) {
result = new byte[(int) r_entity.getContentLength()];
if(r_entity.isStreaming()) {
is = new DataInputStream(r_entity.getContent());
is.readFully(result);
}
}
httpclient.getConnectionManager().shutdown();
String responsedata= (new String(result).toString());
The sample below uses a DOM parser.
DocumentBuilderFactory dbf =DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
// Use the decoded String; calling toString() on the byte[] would only give its identity hash.
StringReader sr = new StringReader(responsedata);
is.setCharacterStream(sr);
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("your root tag");
//get your other tag elements.
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/. Example of dom parser.
http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/. Example of sax parser.
DOM is the W3C-based parser. It is slower than SAX because it builds a tree of nodes that has to be held in memory, so parsing large data with a DOM parser is not recommended.
SAX, on the other hand, is faster than DOM and is recommended for large XML data.
The links above give examples of both. Use either parser to read the values from the XML tags.
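For comparison with the DOM sample, here is a minimal SAX sketch that collects the text of every element with a given name. The class name, the helper textsOf, and the sample tag are made up for illustration. Note that characters() may be called several times for one element, so the text is accumulated in a buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTextDemo {
    // Returns the text content of every element whose name equals tagName.
    static List<String> textsOf(String xml, final String tagName) {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside tagName

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (current != null && qName.equals(tagName)) {
                    found.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return found;
    }
}
```

Unlike DOM, nothing is kept in memory except the values that are explicitly collected, which is why this style suits large documents.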
You can try to use SimpleXmlParser. It's a native android class. It's simpler than DOM xml parser.
What you need is simply an XML parsing library for Android, and there are plenty of them.
See the official tutorial.
There is also an article, "comparing methods for xml parsing in android".
public XMLParser(InputStream is) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
node = doc.getDocumentElement();
} catch (Exception e) {
DebugLog.log(e);
}
}
The inputStream contains content like: "Hey there this is a ü character."
In the input, the character 'ü' arrives as the HTML entity '&uuml;'.
When reading the node's content with System.out.println(node.getTextContent()), I receive "hey there this is a character." The ü is cut off.
Well, is this a valid document? Does it have an encoding specified? See http://www.w3schools.com/XML/xml_encoding.asp
Those might help:
Howto let the SAX parser determine the encoding from the xml declaration?
http://www.coderanch.com/t/127052/XML/XML-parsers-encoding-byte-order
The Problem was the XML Entities and HTML Entities.
I request a webpage which returns data with HTML Entities.
I had to convert the HTML Entities to XML Entities and it worked!
Check this answer for some code
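A minimal sketch of such a conversion, not the code from the linked answer: it rewrites a few named HTML entities as numeric character references, which any XML parser understands. The class name and the small entity table are illustrative assumptions; a real table would need many more entries. Names that are not in the table, including XML's own &amp; and &lt;, are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlEntityDemo {
    // Illustrative subset of HTML named entities and their code points.
    private static final Map<String, Integer> HTML_ENTITIES = new HashMap<>();
    static {
        HTML_ENTITIES.put("uuml", 252);
        HTML_ENTITIES.put("auml", 228);
        HTML_ENTITIES.put("ouml", 246);
        HTML_ENTITIES.put("szlig", 223);
        HTML_ENTITIES.put("nbsp", 160);
    }

    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known HTML entities with numeric references such as &#252;.
    static String toNumericEntities(String input) {
        Matcher m = NAMED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The conversion has to run on the raw text before it is handed to the XML parser.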
I am sending a SOAP POST that returns some xml. I've been testing on a newer device (Galaxy Nexus with Android 4.1) and it's been working fine. However, I just tried running it on an older device (HTC Desire HD running Android 2.2), and I am getting a ParseException: At line 1, column 0: unclosed token. Here is the relevant code:
String xml = null;
Document doc = null;
String SOAPRequest = "[SOAP REQUEST HERE]";
HttpPost httppost = new HttpPost("[WEBSITE HERE]");
InputStream soapResponse = null;
try {
StringEntity postEntity = new StringEntity(SOAPRequest, HTTP.UTF_8);
postEntity.setContentType("text/xml");
httppost.setHeader("Content-Type", "application/soap+xml;charset=UTF-8");
httppost.setEntity(postEntity);
HttpClient httpclient = new DefaultHttpClient();
BasicHttpResponse httpResponse = (BasicHttpResponse) httpclient.execute(httppost);
// Convert HttpResponse to InputStream
HttpEntity responseEntity = httpResponse.getEntity();
soapResponse = responseEntity.getContent();
//// printing response here gives me ...<Result><blahblahblah><Result>...
// Get the SearchResult xml
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder db = factory.newDocumentBuilder();
doc = db.parse(soapResponse);
} catch ...
NodeList soapNodeList = doc.getElementsByTagName("Result");
xml = soapNodeList.item(0).getFirstChild().getNodeValue();
//// printing xml here gives me "<"
return xml;
Taking a look at the httpResponse, the part that I am interested in looks like this: <Result>&lt;blahblahblah&gt;</Result>.
When I try to get this xml using the NodeList, &lt;blahblahblah&gt; turns into just the character <.
Why is this a problem, and how do I fix it?
This could be relevant:
android DOM parsing with entities in tags
...which leads to this:
http://code.google.com/p/android/issues/detail?id=2607
...which seems to indicate that on earlier Android versions the DOM parser doesn't deal with entity references properly. The bug report there discusses something about how entities are treated as a separate child rather than merged into the adjacent text node(s) which sounds oddly like your situation.
If this is the problem you're having, then try switching to the SAX parser. In my opinion it is also an easier XML parser to work with.
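If you would rather stay with DOM, one possible workaround is to concatenate all text children of the element instead of reading only getFirstChild(). This is a sketch under the assumption that the older parser splits the text into several child nodes, as the bug report describes; the class and helper names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodesDemo {
    // Joins the values of all text-like children, in document order.
    static String textOf(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    // An unexpanded entity reference keeps its text in its own children.
                    sb.append(textOf(child));
                    break;
                default:
                    break; // nested elements, comments, etc. are skipped
            }
        }
        return sb.toString();
    }

    // Parses the XML and returns the full text of the first <Result> element.
    static String resultText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return textOf(doc.getElementsByTagName("Result").item(0));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With this helper, the last lines of the question's code would become xml = textOf(soapNodeList.item(0)).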
I am new to Android development and I don't know how to parse data from XML, so please help.
This is the XML that I have to parse:
<MediaFeedRoot>
<MediaTitle>hiiii</MediaTitle>
<MediaDescription>hellooooo.</MediaDescription>
<FeedPath>how r u</FeedPath>
</MediaFeedRoot>
Thanx in advance.
I don't understand why people ask questions here without first searching the net properly. Please remember to search before asking anything here.
Below is a link to a very good tutorial about XML parsing:
http://www.androidpeople.com/android-xml-parsing-tutorial-%E2%80%93-using-domparser
My suggestion is to start with the basic steps:
First, think about where your XML comes from: a URL or a local file?
Then create a DocumentBuilderFactory and a builder:
DocumentBuilder dBuilder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
OR
URLConnection conn = new URL(url).openConnection();
InputStream inputXml = conn.getInputStream();
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
Document xmlDoc = docBuilder.parse(inputXml);
Parsing XML file:
Document xmlDom = dBuilder.parse(xmlFile);
After that, the XML file has been turned into a DOM, that is, a tree structure, and you have to walk it node by node.
In your case, you need to get the text content of an element. Here is an example:
String getContent(Document doc, String tagName){
String result = "";
NodeList nList = doc.getElementsByTagName(tagName);
if(nList.getLength()>0){
Element eElement = (Element)nList.item(0);
String ranking = eElement.getTextContent();
if(!"".equals(ranking)){
result = String.valueOf(ranking);
}
}
return result;
}
For the XML above, getContent(xmlDom, "MediaTitle") returns "hiiii".
Good luck!
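To illustrate walking the tree node by node, here is a minimal sketch that collects the name and text of every child element of the root into a map. The class name and the helper childTexts are made up for illustration; the XML is parsed from a String only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElementsDemo {
    // Maps each child element of the root to its text content, in document order.
    static Map<String, String> childTexts(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(child.getNodeName(), child.getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return values;
    }
}
```

For the MediaFeedRoot document from the question, the map would hold the keys MediaTitle, MediaDescription and FeedPath.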
The XML parsing API throws a SAXParseException when I try to parse an XML file that has attributes on the root node.
One thing I have noticed is that this happens only if there is a UTF-8 BOM character at the start of the string; if I remove the BOM character, things work fine. This code works fine on the 3.0 SDK and below; I saw this problem only in 3.1.
I am using the following parser:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = docFactory.newDocumentBuilder();
Document doc = null;
StringReader sr = new StringReader(xmlString);
InputSource is = new InputSource(sr);
doc = builder.parse(is);
Try this:
public Document parse(String xml) throws ParsingFailedException {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
//encode the xml to UTF -8
ByteArrayInputStream encXML = new ByteArrayInputStream(xml.getBytes("UTF8"));
Document doc = builder.parse(encXML);
log.error("XML parsing OK");
return doc;
} catch (Exception e) {
log.error("Parser Error:" + e.getMessage());
throw new ParsingFailedException("Failed to parse XML : Document not well formed", e);
}
}
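A self-contained sketch of the two ways around the BOM: strip a leading U+FEFF from the String before parsing it, or hand the parser the raw bytes so that it can detect and skip the BOM itself. The class and method names are illustrative, and whether a particular Android release skips the BOM in a byte stream is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BomDemo {
    // Removes a single leading byte order mark (U+FEFF), if present.
    static String stripBom(String xml) {
        return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
    }

    // Parses a byte stream and returns the name of the root element.
    static String rootName(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withBom = "\uFEFF<root a=\"1\"/>";
        // Encoding the String as UTF-8 turns U+FEFF into the bytes EF BB BF.
        byte[] bytes = withBom.getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(new ByteArrayInputStream(bytes)));
        System.out.println(stripBom(withBom));
    }
}
```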
Thanks evilone,
I have opened an issue with Google, and they will be fixing this in their branch.
http://code.google.com/p/android/issues/detail?id=16892
Comments from google developer:
"I've prepared a fix for the root problem in our internal Honeycomb tree. But you don't need the fix for your code. Your parseXml method should just take an InputStream rather than a String. You can pass that directly to the InputSource constructor."