I'm working on an app in German. I receive the data as XML and use a SAX parser to parse it and display it in a TextView. Everything works fine except for a problem with special characters that appears after parsing.
This is the XML I get from the URL. It uses UTF-8 encoding, and all the characters in the XML file are correct.
<?xml version="1.0" encoding="utf-8"?>
<posts>
<page id="001">
<title><![CDATA[Sie kaufen bei uns ausschließlich Holzkunst- und Volkskunst-Produkte ]]></title>
<detial><![CDATA[Durch enge Beziehungen mit unseren Lieferanten können wir attraktive rückläufig
Preise und schnelle Lieferungen gewährleisten. Caroline Féry and Laura Herbst Universität Potsdam Mein
Flugzeug hatte zwölf Stunden VERSPÄTUNG </p>]]></detial>
</page>
</posts>
I use a SAX parser to parse this XML and display the parsed data in the TextView:
public class GermanParseActivity extends Activity {
/** Called when the activity is first created. */
static final String URL = "http://www.xyz.com/id=1";
ItemList itemList;
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
XMLParser parser = new XMLParser();
String XML = parser.getXmlFromUrl(URL);
System.out.println("This XML is ========>"+XML);
try
{
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
/** Create handler to handle XML Tags ( extends DefaultHandler ) */
MyXMLHandler myXMLHandler = new MyXMLHandler();
xr.setContentHandler(myXMLHandler);
ByteArrayInputStream is = new ByteArrayInputStream(XML.getBytes());
xr.parse(new InputSource(is));
}
catch(Exception e)
{
}
itemList = MyXMLHandler.itemList;
ArrayList<String> listItem= itemList.getTitle();
ListView lview = (ListView) findViewById(R.id.listview1);
myAdapter adapter = new myAdapter(this, listItem);
lview.setAdapter(adapter);
}
}
But after parsing, I get strange characters that are not in the XML file; they only appear once the file has been parsed.
For example:
before parsing ---> after parsing
können ---> kÃ¶nnen
rückläufig ---> rÃ¼cklÃ¤ufig
gewährleisten ---> gewÃ¤hrleisten
Can anyone please suggest the proper way to fix this issue?
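You need to re-encode your input. The problem is that the text is UTF-8 but is interpreted as ISO-8859-1. That looks like a bug in SAX, although the wrong decoding may also happen earlier, when the response is turned into a String before parsing.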
String output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
That line turns the wrongly decoded string back into its original bytes (ISO-8859-1 maps each character to exactly one byte) and then decodes those bytes correctly as UTF-8.
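A minimal sketch of why that round trip works, assuming the text really was decoded as ISO-8859-1. garble() and repair() are illustrative names, not library methods, and StandardCharsets needs Java 7 / Android API 19 (on older versions the charset names "ISO-8859-1" and "UTF-8" behave the same). The repair is lossless only for ISO-8859-1, because that charset maps all 256 byte values one-to-one; if the text was decoded with another charset, some bytes may already be lost.

```java
import java.nio.charset.StandardCharsets;

final class Reencode {
    // Reproduces the bug: UTF-8 bytes decoded as though they were ISO-8859-1.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The repair from the answer: every char produced by garble() is in U+0000..U+00FF,
    // so ISO-8859-1 maps it back to exactly one byte, recovering the original UTF-8 bytes.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = garble("gewährleisten"); // holds "gewÃ¤hrleisten"
        String fixed = repair(broken);           // holds "gewährleisten" again
        System.out.println(broken + " -> " + fixed);
    }
}
```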
I got my answer from here.
They suggest that the XML declaration should be:
<?xml version="1.0" encoding="ISO-8859-1"?>
instead of
<?xml version="1.0" encoding="utf-8"?>
Hope that is the answer. Edit: I just saw that you don't have control over the XML,
so this will not help; rekire's answer is then an option.
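A related option, assuming you can change how the response is read (the question's getXmlFromUrl() is not shown): skip the intermediate String and hand SAX the raw bytes, so the parser decodes them itself according to encoding="utf-8". In this sketch ParseBytes and TitleHandler are hypothetical names, and the ByteArrayInputStream only stands in for the HTTP response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ParseBytes {
    // Collects the text of every <title> element (CDATA content arrives via characters()).
    static final class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString().trim());
                current = null;
            }
        }
    }

    // Parses a byte stream directly: no String in between, so SAX applies the declared encoding.
    static List<String> titles(InputStream in) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
            return handler.titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<posts><page id=\"001\"><title><![CDATA[Sie kaufen bei uns ausschließlich]]></title></page></posts>";
        InputStream body = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(titles(body));
    }
}
```

On Android the stream would come from the HTTP entity (for example entity.getContent()) rather than from a byte array.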
Related
I am learning Android and wanted to parse XML using SAXParser. My issue is that when I run my code, I get the following error:
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 5: not well-formed (invalid token)
I have gone through similar cases reported on stackoverflow.com, but none of the solutions provided has fixed my issue.
Here is my code:
public List<RssItem> getItems() throws Exception {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
//Creates a new RssHandler which will do all the parsing.
RssHandler handler = new RssHandler();
InputSource source = new InputSource(new StringReader(rssUrl));
source.setEncoding("UTF-8");
saxParser.parse(source, handler);
return handler.getRssItemList();
}
The XML feed I am trying to parse is: http://t.arabi21.com/rss
Kindly help me to overcome this issue.
Thanks a lot.
You simply need to add an XML declaration like
<?xml version='1.0' encoding='UTF-8'?>
at the beginning, before your
<rss version="2.0">
element.
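Separately from the declaration, the question's code is worth checking: assuming rssUrl holds the feed address, new StringReader(rssUrl) makes SAX parse the characters "http://t.arabi21.com/rss" themselves rather than the downloaded feed, which would fail right at the start of line 1. A sketch contrasting the two (FeedSource and its methods are illustrative names, not part of any library):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class FeedSource {
    // Returns true if the source parses as well-formed XML (DefaultHandler ignores all events).
    static boolean isWellFormed(InputSource source) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the question's code does: the text of the URL itself is treated as the XML document.
    static InputSource fromUrlText(String rssUrl) {
        return new InputSource(new StringReader(rssUrl));
    }

    // What was probably intended: open the URL and let SAX read the feed's bytes.
    // The caller should close the stream after parsing.
    static InputSource fromUrl(String rssUrl) throws IOException {
        InputStream in = URI.create(rssUrl).toURL().openStream();
        return new InputSource(in);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(fromUrlText("http://t.arabi21.com/rss"))); // false
    }
}
```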
I have a question about parsing XML inside a CDATA section inside another XML document. I have searched, but I didn't find my exact problem. Here is the example:
<?xml version="1.0" encoding="UTF-8"?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns2:CSResponse xmlns:ns2="http://webservices....../">
<RespuestaVentaPrepagoTituloCS1>
<ICallId>0</ICallId>
<IResultCode>1</IResultCode>
<SResulXML>
<![CDATA[null<?xml version="1.0" encoding="UTF-8"?>
<SS_prepagoCS version="0.1" fecha="2013-11-02T06:24:42" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation=".\SS_Pasdsd">
<TTPSearchResult value="1" desc="OK">
<TTPData xsi:nil="false">
<SerialNumber>56676543243234</SerialNumber>
....
But the response I actually receive is this:
<?xml version='1.0' encoding='UTF-8'?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns2:CS1Response xmlns:ns2="http://webservices..../">
<RespuestaVentaPrepagoTituloCS1>
<ICallId>0</ICallId>
<IResultCode>1</IResultCode>
<SResulXML>null<?xml version="1.0" encoding="UTF-8"?><SS_prepagoCS version="0.1" fecha="2013-11-13T10:12:20" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation=".\SS_PrepagoCS_v0.1.xsd"> <TTPSearchResult value="1" desc="OK"> <TTPData xsi:nil="false">
And I can't parse the content of the CDATA section, because the SAX parser can't find the XML tags. Can somebody help me?
Thanks
EDIT: added more code
I make the request like this:
String data = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> " +
"<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:web=\"http://webservices.sayp.bit.crtm/\"> " +
" <soapenv:Header/> " +
" <soapenv:Body> " +
".... ";
String action = "";
HttpResponse res = sendRequest(serverV2 + method, data, action);
if (res.getStatusLine().getStatusCode() == 200) {
HttpEntity entity = res.getEntity();
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
CSHandler handler = new CSHandler();
parser.parse(entity.getContent(), handler);//entity.getContent
return handler.getResponse();
}
Somewhere I read that replacing entity.getContent() with the following code should work, but the result is the same.
HttpEntity entity = res.getEntity();
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
CSHandler handler = new CSHandler();
String xml=EntityUtils.toString(entity);
if (BuildConfig.DEBUG)
Log.i("XML", xml);
parser.parse(new InputSource(new StringReader(xml)), handler); // a Reader is needed: new InputSource(String) treats its argument as a system ID (URI), not as XML content
return handler.getResponse();
}
And CSHandler is a normal SAX handler that searches for the tags, but it can't find them because the XML is converted to HTML...
The text in the CDATA section or the escaped text in the other form is likely the same, but cannot be recognized as XML by the parser because of the escaping or wrapping.
What you'll have to do is use your main parser to get all this text into a string, and then start a separate parse for the XML within that string after stripping off at least the "null" at the beginning of it.
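A minimal sketch of that two-stage approach, assuming the wrapper element is SResulXML as in the question and that the embedded document starts with the literal prefix "null". NestedXml, OuterHandler and InnerHandler are hypothetical names, and reading the desc attribute of TTPSearchResult is only an example of stage-two handling. SAX delivers CDATA content and entity-escaped text the same way, through characters(), so the first stage does not need to tell them apart.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NestedXml {
    // Stage 1: collect all text inside <SResulXML>.
    static final class OuterHandler extends DefaultHandler {
        final StringBuilder inner = new StringBuilder();
        private boolean inResult;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("SResulXML".equals(qName)) inResult = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inResult) inner.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("SResulXML".equals(qName)) inResult = false;
        }
    }

    // Stage 2: handle the embedded document; here it only reads one attribute.
    static final class InnerHandler extends DefaultHandler {
        String searchResultDesc;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("TTPSearchResult".equals(qName)) searchResultDesc = atts.getValue("desc");
        }
    }

    static String parseNested(String soapResponse) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            OuterHandler outer = new OuterHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(soapResponse)), outer);

            String inner = outer.inner.toString().trim();
            // Strip the stray "null" so the embedded <?xml ...?> declaration comes first.
            if (inner.startsWith("null")) inner = inner.substring("null".length()).trim();

            InnerHandler handler = new InnerHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(inner)), handler);
            return handler.searchResultDesc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```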
If you post Java code showing how you're currently parsing the rest of the xml, we may be able to provide more detailed guidance on handling this.
I am able to do XML parsing for valid characters, but when I pass an invalid character in my URL string, no result is found, while opening the same web service URL in my browser does return results. So I think the problem is with invalid characters when doing SAX XML parsing. How do I deal with an invalid character in a URL such as http://www.arteonline.mobi/iphone/output.php?st=&ct=&type=Clínicas%20y%20Talleres&neigh= ?
For the type parameter I pass type=Clínicas, where the 3rd character is not an English letter but a Spanish one. How do I deal with this Spanish
character? My code is below:
@Override
protected Boolean doInBackground(String... args) {
try{
try {
String temp = "http://www.arteonline.mobi/iphone/output.php?st="+filter.stateselected+"&ct="+filter.cityselected+"&type="+filter.typeselected+"&neigh="+filter.neighbourselected+"";
//String temp = "http://www.arteonline.mobi/iphone/output.php?st=&ct=&type=Clínicas%20y%20Talleres&neigh=";
temp = temp.replaceAll(" " ,"%20");
// temp= temp.replaceAll("í" ,"í");
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
Log.i("temp url..",temp.trim().toString());
URL sourceUrl = new URL(temp.trim());
XMLHandlerfiltersearch myXMLHandler = new XMLHandlerfiltersearch();
xr.setContentHandler(myXMLHandler);
xr.parse(new InputSource(sourceUrl.openStream()));
}
catch (Exception e) {
System.out.println("XML Pasing Excpetion = " + e);
}
I replace spaces in my URL, if any, using temp = temp.replaceAll(" " ,"%20");
but I cannot deal with the Spanish character in the type parameter of my web service URL. Please help.
Also check type=Galerías when passed in the web service URL; here the 6th character is not valid.
You should use URLEncoder:
String stateselected= URLEncoder.encode(filter.stateselected, "UTF-8");
String cityselected = URLEncoder.encode(filter.cityselected, "UTF-8");
String typeselected= URLEncoder.encode(filter.typeselected, "UTF-8");
String neighbourselected= URLEncoder.encode(filter.neighbourselected, "UTF-8");
String temp = "http://www.arteonline.mobi/iphone/output.php?st="+stateselected+"&ct="+cityselected+"&type="+typeselected+"&neigh="+neighbourselected+"";
//String temp = "http://www.arteonline.mobi/iphone/output.php?st=&ct=&type=Clínicas%20y%20Talleres&neigh=";
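To see what URLEncoder produces for these values, a small sketch (QueryEncoding is an illustrative name). It assumes the server decodes query parameters as UTF-8, which is what browsers typically send for a typed URL. URLEncoder turns spaces into "+", so the manual replaceAll(" ", "%20") is no longer needed, and only the parameter values should be encoded, never the whole URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryEncoding {
    // Percent-encodes one query parameter value as UTF-8; spaces become "+".
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String type = encode("Clínicas y Talleres");
        System.out.println("http://www.arteonline.mobi/iphone/output.php?st=&ct=&type=" + type + "&neigh=");
    }
}
```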
If you have problems with the character encoding when parsing the XML, you can set the encoding used by the parser:
InputSource is = new InputSource(sourceUrl.openStream());
is.setEncoding("ISO-8859-1");
xr.parse(is);
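A sketch of what setEncoding() does, assuming the server really sends ISO-8859-1 bytes; if it sends UTF-8, forcing ISO-8859-1 would garble the accented characters instead. ForcedEncoding and allText() are illustrative names, the byte array stands in for sourceUrl.openStream(), and the sample has no XML declaration, so the only encoding hint is the one set on the InputSource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ForcedEncoding {
    // Parses raw bytes with an explicitly chosen charset and returns all character data.
    static String allText(byte[] body, String encoding) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            InputSource is = new InputSource(new ByteArrayInputStream(body));
            is.setEncoding(encoding); // tells the parser how to decode the byte stream
            SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
            return text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<type>Clínicas</type>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(allText(latin1, "ISO-8859-1"));
    }
}
```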
I am trying to parse an XML file with SAXParser on Android.
This is my xml file:
<?xml version="1.0" encoding="UTF-8"?>
<cars>
<car model="CitroenC3">
<maintenances>
<xm:maintenance xmlns:xm="it.a.b.android.c.car.m" distance="" price="">
<xm:type></xm:type>
</xm:maintenance>
</maintenances>
<chargings>
<xc:charging xmlns:xc="it.a.b.c.fuelconsumption.car.m" quantity="18" price="20" distance="400" consumption="14"/>
</chargings>
</car>
</cars>
And this is the code:
// Handling XML
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
XmlResourceParser parser = getResources().getXml(R.xml.data);
// Create handler to handle XML Tags ( extends DefaultHandler )
DataSaxHandler myXMLHandler = new DataSaxHandler();
xr.setContentHandler(myXMLHandler);
InputStream is= getResources().openRawResource(R.xml.data);
xr.parse(new InputSource(is));
After xr.parse I get this exception:
03-22 15:24:04.248: INFO/System.out(415): XML Pasing Excpetion =
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: not well-formed (invalid token)
What may be wrong?
Thanks a lot.
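As far as I remember, any XML file under the res/ folder is compiled into a binary format before it's placed in the .apk, so openRawResource() hands SAX binary data rather than XML text.
Try moving your XML file to the assets/ folder and loading it from there: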
xr.parse(new InputSource(getAssets().open("data.xml")));
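Once the file is read from assets/ as plain text, the prefixed elements (xm:maintenance, xc:charging) are easiest to match by local name with a namespace-aware parser. The question doesn't show DataSaxHandler, so CarHandler below is a hypothetical sketch; the InputStream it takes could be the one returned by getAssets().open("data.xml").

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CarHandler extends DefaultHandler {
    final List<String> models = new ArrayList<>();
    final List<String> chargingQuantities = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With namespace awareness on, localName is "charging" whatever prefix (xc:) is used.
        if ("car".equals(localName)) {
            models.add(atts.getValue("model"));
        } else if ("charging".equals(localName)) {
            chargingQuantities.add(atts.getValue("quantity"));
        }
    }

    static CarHandler parse(InputStream in) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            CarHandler handler = new CarHandler();
            spf.newSAXParser().parse(new InputSource(in), handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```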
Respected All,
I have to read an XML file. For that I use SAXParser and DefaultHandler with the method characters(char[] ch, int start, int length), but the output contains some extra characters, such as [] in place of '#13' (a carriage return). Someone told me that reading the string in UTF-8 format would remove all these extra characters. Is it true that I have to read it in UTF-8 format? If yes, how can I read it?
Thank You
(Vikram Kadam)
I use this to parse with the SAXParser:
URL url = new URL(urlToParse);
SAXParserFactory spf = SAXParserFactory.newInstance();
// here we get our SAX parser
SAXParser sp = spf.newSAXParser();
// we fuse it to a XML reader
XMLReader xr = sp.getXMLReader();
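ContactHandler handlerContact = new ContactHandler(); // hypothetical name: your own DefaultHandler subclass that collects the entries (plain DefaultHandler has no getEntries())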
// we give it a handler to manage the various events
xr.setContentHandler(handlerContact);
// and finally we open the stream to the url
InputStream oS = url.openStream();
// and parse it
xr.parse(new InputSource(new InputStreamReader(oS, Charset.forName("utf-8"))));
// to retrieve the list of contacts created by the handler
result = handlerContact.getEntries();
// don't forget to close the resource
oS.close();
I never had any trouble as long as the file you are parsing is properly encoded in UTF-8. Check that it is, because with the default configuration of some computers the default encoding is not UTF-8 but ANSI (e.g. Windows-1252) or ISO-8859-1.
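If the [] box stands for a carriage return (&#13;), reading the file as UTF-8 will not remove it, because a carriage return is the same character in every encoding; it has to be filtered out of the text. A minimal sketch of a handler that does that (TextCollector is a hypothetical name). It also accumulates characters() output in a buffer, since SAX may deliver one text node in several calls, and it is suited to leaf elements only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start fresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().replace("\r", "").trim(); // drop carriage returns
        if (!text.isEmpty()) values.add(text);
        buffer.setLength(0);
    }

    static List<String> collect(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```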