Big Data, How to parse a huge xml file faster?

I have a large set of Bible data in XML format, and I am building an Android Bible application. I am worried that the data is too large.
From my research, I understand that an XML parser reads through the whole file until it reaches the tag it needs. Does anyone know an easier and faster way to parse all of this data?

SAX parsing may be appropriate when the data extraction logic is relatively simple and forward-only. If you want the ease and comfort of traversing the hierarchical structure or using XPath, then you are out of luck.
JDOM and DOM have serious memory usage issues.
VTD-XML is a library that covers the use cases that are too complicated for SAX or StAX and too memory-intensive for DOM or JDOM.
While VTD-XML loads everything into memory, its memory footprint is a modest 1.3x~1.5x the size of the XML document, which is 3~5x more efficient than DOM.
It also exports a DOM-like cursor API and supports XPath 1.0.
See also: Can SAX Parsers use XPath in Java?

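You should use a SAX parser; it is the best way to parse large XML files. For instance, you can do this:
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
File inputFile = new File("input.txt");
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
// UserHandler is your own subclass of org.xml.sax.helpers.DefaultHandler.
UserHandler userhandler = new UserHandler();
saxParser.parse(inputFile, userhandler);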
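The answer above does not show what a handler looks like, so here is a minimal sketch of one. It assumes the Bible file wraps each verse in a <verse> element; that element name, the class name VerseHandler and the helper parse(String) are all made up for illustration, since the question does not show the actual file layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <verse> element.
// The element name "verse" is an assumption, not taken from the question.
class VerseHandler extends DefaultHandler {
    final List<String> verses = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <verse>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("verse".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times for one text node, so append.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("verse".equals(qName) && current != null) {
            verses.add(current.toString().trim());
            current = null;
        }
    }

    // Parses an XML string and returns the verse texts in document order.
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            VerseHandler handler = new VerseHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.verses;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }
}
```

Only the verse currently being read is buffered, which is why this style copes with large files. For a real file you would pass a File or InputStream to saxParser.parse instead of a string.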

Related

XML Parser in android

I need to know the best way to parse an XML file in Android. I know there are three parsers (XmlPullParser, DOM parser and SAX parser), so what is the difference between them, and is there any example code?

SAX parser (Simple API for XML): parses node by node with top-down traversal, without storing the XML in memory. Faster than DOM. Manipulating nodes (insertion or deletion) is NOT possible. Needs SAXParserFactory.
DOM parser (Document Object Model): stores the entire XML in memory before processing and can traverse in any direction. Manipulating nodes (insertion or deletion) is allowed. Needs DocumentBuilderFactory.
Pull parser: provides more control and speed than the two above, because your code asks for the next event instead of receiving callbacks.

The Android training guide recommends XmlPullParser:
http://developer.android.com/training/basics/network-ops/xml.html
"We recommend XmlPullParser, which is an efficient and maintainable way to parse XML on Android."
It also gives some code examples.
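To give a feel for the pull style, here is a rough sketch of a pull loop. Note the swap: it uses StAX (javax.xml.stream), the pull API that ships with desktop Java, so that it runs on a plain JVM. StAX is not part of the Android SDK; on Android the same loop would be written against XmlPullParser, using next(), getName() and nextText() in place of the StAX calls. The class name PullSketch and the <title> element are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; uses StAX as a stand-in for Android's XmlPullParser.
class PullSketch {
    // Returns the text of every <title> element, in document order.
    static List<String> titles(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The application drives the parser: it asks for each event in turn.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the cursor on its end tag.
                    out.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("pull parsing failed", e);
        }
        return out;
    }
}
```

The point of the pull style is that the loop lives in your code, so you can stop early or skip whole subtrees without writing callback state machines.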

The difference among SAX Parser, XPath, DOM and XMLPullParser

I want to know the difference between the four types above (SAX parser, XPath, DOM, XMLPullParser) and when we should use each one.

SAX parsing is generally a better choice than DOM for large documents. The differences between the two are as follows:
DOM
The nodes are held in a tree structure.
Memory: it occupies more memory, so DOM is only preferred for small XML documents.
Slower at runtime.
The document is stored as objects.
Programmatically easy to implement.
Easy to navigate and use.
SAX
The document is delivered as a sequence of events.
It uses very little memory, so it is preferred for large documents.
Faster at runtime, because of the above-mentioned point.
Objects have to be created by your code.
You need to write the code that creates those objects.
Backward navigation is not possible, because SAX processes the document sequentially.
So if you have very large files, you should use a SAX parser: it fires events and then releases them, so the document is not kept in memory. With SAX you cannot access elements randomly, and there is no going back. DOM lets you access any part of the XML file, since it keeps the whole document in memory.
See this article; the Summary section has what you need.
Also check this link to compare the performance of different XML parsers.

Please check the links below:
http://steveliles.github.com/comparing_methods_of_xml_parsing_in_android.html
http://xjaphx.wordpress.com/2011/11/01/android-xml-adventure-compare-xml-parsers/
http://www.ibm.com/developerworks/opensource/library/x-android/index.html
http://www.developer.com/ws/android/development-tools/Android-XML-Parser-Performance-3824221-2.htm
http://www.geekinterview.com/question_details/12797
(As per the article above)
Both SAX and DOM are used to parse XML documents. Both have advantages and disadvantages, and either can be used depending on the situation.
SAX:
Parses node by node
Doesn't store the XML in memory
We can't insert or delete a node
Top-to-bottom traversal
DOM:
Stores the entire XML document in memory before processing
Occupies more memory
We can insert or delete nodes
Traverses in any direction
If we only need to find a node and do not need to insert or delete anything, we can use SAX; otherwise use DOM, provided we have enough memory.
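As an illustration of the "insert or delete nodes" point, here is a small DOM sketch. The element names <draft> and <note> and the class name DomEditSketch are invented for the example; the DOM calls themselves are the standard org.w3c.dom ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class showing in-memory editing with DOM.
class DomEditSketch {
    // Loads the XML into memory, removes every <draft> element,
    // and appends a <note> element to the root.
    static Document edit(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // The NodeList is live, so keep removing the first match until none remain.
            NodeList drafts = doc.getElementsByTagName("draft");
            while (drafts.getLength() > 0) {
                Node draft = drafts.item(0);
                draft.getParentNode().removeChild(draft);
            }

            // Insert a new node; this is the kind of edit SAX cannot do.
            Element note = doc.createElement("note");
            note.setTextContent("added");
            doc.getDocumentElement().appendChild(note);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("DOM editing failed", e);
        }
    }
}
```

The whole tree sits in memory while this runs, which is the cost the list above refers to.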

DOM
The nodes are held in a tree structure.
Memory: it occupies more memory, because the entire XML document is stored in memory before processing, so DOM is only preferred for small XML documents.
Slower at runtime.
The document is stored as objects.
Programmatically easy to implement.
Easy to navigate and use; can traverse in any direction.
We can insert, delete or alter nodes.
SAX: use it when you want to read XML (not alter it).
The document is delivered as a sequence of events.
It uses very little memory, because it doesn't store the XML in memory before processing, so it is preferred for large documents.
Faster at runtime, because of the above-mentioned point.
Objects have to be created by your code.
You need to write the code that creates those objects.
Backward navigation is not possible, because SAX processes the document sequentially from top to bottom.
We can't insert or delete a node.
XPath: useful when you only need a couple of values from the XML document and you know where to find them (you know the path of the data, e.g. /root/item/challange/text).
XMLPullParser:
Fast and requires less memory than DOM.
Source:
http://www.time2ask.com/
http://www.time2ask.com/Android/The-difference-among-SAX-ParserXPathDOMXMLPullParser/_2361836
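The XPath case mentioned in this answer (one or two values at a known path) can be sketched with the standard javax.xml.xpath API. The class name XPathSketch and the helper valueAt are made up for illustration; the path is the one quoted in the answer, spelling included.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class: fetch a single value by its path.
class XPathSketch {
    // Returns the string value of the first node matching the path,
    // or an empty string when nothing matches.
    static String valueAt(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(path, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

For example, XPathSketch.valueAt(xml, "/root/item/challange/text") returns the text of that element if the document has it.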

How to Pull an RSS Feed using XmlPullParser in Android

I'm building an RSS reader app, and I've been told to use the XmlPullParser interface.
Here is the block of code I'm working with:
XmlResourceParser parser = context.getResources().getXml(resource);
'resource' is an integer holding the R.id. integer of the XML file. This is not an internal XML file, so I don't know how to work around this.
Any ideas? Is XmlResourceParser the wrong approach for this project? I've seen XMLReaders used with content handlers as well. Can these technologies be combined?
Thank you

What is the type of your XML source?
XmlPullParser can be used to parse any XML source.

In my opinion this is the way to do it. The only problem you may encounter is when the RSS feed contains empty lines: in that case the Android XML pull parser (API level 14) jumps to the /channel. When implementing the parser, try to use an AsyncTask to start reading the RSS feed.
Good luck with the implementation.

DOM or SAX Parsing Example

I've tried every tutorial I could find on Google to parse my XML file (you can view it here).
All I want to do with the file in the link above is parse it in an Android app and extract each string's name and value.
Could anyone help out here? I've been through at least seven or so tutorials and I'm losing all hope right now.
Thank you in advance.

You can use XmlPullParser to parse XML.
For example, refer to http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html
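Since the question title asks for a DOM or SAX example, here is a short DOM sketch of the "name and value" extraction. The linked file is not shown, so the layout is a guess: it assumes elements of the form <string name="...">value</string>. The class name StringsSketch is also made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical example class; the <string name="..."> layout is an assumption.
class StringsSketch {
    // Returns a name -> text map for every <string> element, in document order.
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("string");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                result.put(el.getAttribute("name"), el.getTextContent());
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("DOM parsing failed", ex);
        }
    }
}
```

DOM is fine here if the file is small; for a large file the same extraction is better done with SAX or a pull parser.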

JAXP:
JAXP stands for Java API for XML Processing.
It is an API from Sun (now Oracle), not a W3C specification; the DOM model it exposes is what the W3C specifies.
Using the JAXP API, we can process an XML document in two ways.
DOM:
Stores the entire XML document in memory before processing.
It occupies more memory.
It can traverse in any direction.
Tree data structure.
Steps to work with DOM:
1. Create a DocumentBuilderFactory:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
2. Create a DocumentBuilder:
DocumentBuilder builder = factory.newDocumentBuilder();
3. Get an input stream:
ClassLoader cls = DomReader.class.getClassLoader();
InputStream is = cls.getResourceAsStream("xml file");
4. Parse the XML file and get a Document object by calling the parse method on the DocumentBuilder object:
Document document = builder.parse(is);
5. Traverse the DOM tree using the Document object.
SAX:
Simple XML parsing.
It parses node by node.
Traversal is from top to bottom.
Low memory usage.
Backward navigation is not possible with SAX.
// Implement the required handlers
public class SaxParse extends DefaultHandler { }
// New instance of SAXParserFactory
SAXParserFactory factory = SAXParserFactory.newInstance();
// New instance of SAXParser
SAXParser saxparser = factory.newSAXParser();
// Parse the XML document; parse is an instance method, so call it on saxparser
saxparser.parse(new File("file to be parsed"), new SaxParse());

Android XML parsing problem with both SAX and Pull parser

I'm parsing fairly small XML files, about 200 KB, in my application. The parser fails systematically on some files.
Symptoms:
I use two types of XML parsers, SAX (XMLReader) and Pull (XmlPullParser), and both of them fail at the same place in the file (the 3182nd byte).
I use an InputStream as the input source for the parsers.
I tried wrapping the FileInputStream in a BufferedInputStream and nothing changed.
I don't know whether the problem is in the parser or in the InputStream.
Please help me fix the problem or suggest a workaround.
