I'm parsing fairly small XML files (about 200 KB) in my application. The parser fails consistently on some of them.
Symptoms:
I use two types of XML parsers, SAX (XMLReader) and Pull (XmlPullParser), and both fail at the same place in the file (the 3182nd byte).
I use InputStream as the input source for parsers.
I tried wrapping the FileInputStream in a BufferedInputStream, but nothing changed.
I don't know whether the problem is in the parser or in the InputStream.
Please help to fix the problem or advise a workaround.
Related
I have a large set of Bible data in XML format, and I am building an Android Bible application. I am concerned that the data is too large.
From what I have read, an XML parser reads through the whole file until it reaches the tag it needs. Does anyone know an easier and faster way to parse all the data?
SAX parsing may be appropriate when the data extraction logic is relatively simple and forward-only. If you want the ease of traversing the hierarchical structure or using XPath, you are out of luck.
JDOM and DOM have serious memory usage issues.
VTD-XML is a library that covers the use cases that are too complicated for SAX or StAX and too memory-intensive for DOM or JDOM.
While VTD-XML loads everything in memory, the memory footprint is a modest 1.3x~1.5x the size of the XML document, which is 3~5x more efficient than DOM.
It also exports a DOM-like cursor API and supports XPath 1.0.
Can SAX Parsers use XPath in Java?
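You should use a SAX parser; it is well suited to parsing large XML files. For instance, you can do this:
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
File inputFile = new File("input.txt");
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
// UserHandler is your own subclass of org.xml.sax.helpers.DefaultHandler
UserHandler userhandler = new UserHandler();
saxParser.parse(inputFile, userhandler);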
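The snippet above leaves UserHandler undefined. A minimal sketch of what such a handler might look like is shown here; the element name "verse", the class SaxSketch and the method parseVerses are assumptions made up for illustration, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {

    /** Hypothetical handler: collects the text of every <verse> element. */
    static class UserHandler extends DefaultHandler {
        final List<String> verses = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <verse>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("verse".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("verse".equals(qName) && text != null) {
                verses.add(text.toString());
                text = null;
            }
        }
    }

    /** Streams the document through the handler without building a tree. */
    static List<String> parseVerses(String xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        UserHandler handler = new UserHandler();
        saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.verses;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><verse>In the beginning</verse><verse>And the earth</verse></book>";
        System.out.println(parseVerses(xml));
    }
}
```

Because the handler keeps only the text it cares about, memory use stays roughly proportional to the extracted data rather than to the size of the file.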
I want to pass my XML files from R.xml to the Simple library for deserialization.
The library accepts Readers, InputStreams and the like.
But from R.xml I get an XmlPullParser. How can I get something like a Reader or InputStream instead?
If you save your XML file inside res/raw instead of res/xml, you should be able to use openRawResource:
getResources().openRawResource(R.raw.xml_name)
which returns an InputStream.
I need to know the best way to parse an XML file on Android. I know there are three parsers (XmlPullParser, DOM parser and SAX parser), so what is the difference between them, and is there any sample code?
SAX parser: Simple API for XML. Parses node by node with top-down traversal and does not store the XML in memory, so it is faster than DOM. Manipulating nodes (insertion or deletion) is NOT allowed. Needs SAXParserFactory.
DOM parser: Document Object Model. Loads the entire XML into memory before processing and can be traversed in any direction. Manipulating nodes (insertion or deletion) is allowed. Needs DocumentBuilderFactory.
Pull parser: gives more control than the two above, since the application pulls events on demand instead of receiving callbacks, and is generally fast.
The Android training guide recommends XmlPullParser:
http://developer.android.com/training/basics/network-ops/xml.html
We recommend XmlPullParser, which is an efficient and maintainable way to parse XML on Android.
They also give some code examples.
When parsing an XML file on Android, I do this:
try
{
InputStream is = ...
MyContentHandler ch = new MyContentHandler();
Xml.parse(is, Encoding.UTF_8, ch);
}
catch ...
The problem is that sometimes the file I'm trying to parse is not well-formed.
In my case, undeclared namespaces may be present.
The data I'm interested in is not inside those tags, so I could simply ignore them. However, the unbound prefix exception is thrown by the parser itself rather than inside the content handler, so when it occurs the entire parsing process is aborted.
Is there a way to make the SAX parser ignore this kind of error (or ignore namespaces altogether)?
P.S. I want to avoid loading the whole file into memory as a string to strip the namespaces out, or having to rewrite the file.
I found the solution in another thread.
Instead of using Xml.parse, you need to instantiate a SAX parser manually through SAXParserFactory and get a reader from it.
You can then set features on the reader.
Among the available features, one disables namespace processing, and that does the trick.
Reference -> LINK
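The approach described above can be sketched roughly as follows, assuming a standard JAXP/SAX2 parser. The feature URI is the standard SAX2 namespaces feature; the class name LenientNamespaceSketch, the method elementNames and the sample document are made up for illustration. Whether a particular Android release honors the feature in exactly this way is an assumption worth verifying on the target device.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LenientNamespaceSketch {

    /** Returns the qualified names of all elements, tolerating undeclared prefixes. */
    static List<String> elementNames(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Standard SAX2 feature: with namespace processing off, a prefix such
        // as "foo:" is treated as part of the element name and is not resolved.
        reader.setFeature("http://xml.org/sax/features/namespaces", false);

        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        });

        // The stream is parsed incrementally; the file is never loaded as a string.
        reader.parse(new InputSource(in));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // "foo" is never declared with xmlns:foo, which a namespace-aware parser rejects.
        String xml = "<root><foo:item>x</foo:item><title>t</title></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(elementNames(in));
    }
}
```

Since the handler receives qualified names such as foo:item unchanged, it can simply skip the elements it does not care about.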
I'm building an RSS reader app, and I've been told to use the XmlPullParser interface.
Here is the block of code I'm working with:
XmlResourceParser parser = context.getResources().getXml(resource);
'resource' is an integer holding the R.id. value of the XML file. However, the feed is not an internal XML file, so I don't know how to work around this.
Any ideas? Is XmlResourceParser the wrong approach for this project? I've seen XMLReaders used with content handlers as well. Can these technologies be combined?
Thank you
What is the type of your XML source?
XmlPullParser can be used to parse any XML source.
In my opinion it is the way to do this. The only problem you may encounter is when the RSS feed contains empty lines: in that case the Android XmlPullParser (API level 14) jumps to /channel. When implementing the parser, try to use an AsyncTask to read the RSS feed off the main thread.
Good luck with the implementation.