Android DOM parser issue

I have an RSS feed to parse that contains several tags. I am able to retrieve the value (child element) for all of them except the description node. The relevant part of the feed is below:
<fflag>0</fflag>
<tflag>0</tflag>
<ens1:org>C Opera Production</ens1:org>
<description>
<p>Opera to be announced</p>
<p>$15 adults/$12 seniors/$10 for college students<span style="white-space: pre;"> </span></p>
</description>
The code that I am using for this is:
StringBuffer descriptionAccumulator = new StringBuffer();
else if (property.getNodeName().equals("description")){
try{
String desc = (property.getFirstChild().getNodeValue());
if(property.getNodeName().equals("p")){
descriptionAccumulator.append(property.getFirstChild().getNodeValue());
}
}
catch(Exception e){
Log.i(tag, "No desc");
}
}
else if (property.getNodeName().equals("ens1:org")){
try{
event.setOrganization(property.getFirstChild().getNodeValue());
Log.i(tag,"org"+(property.getFirstChild().getNodeValue()));
}
catch(Exception e){
}
}
else if (property.getNodeName().equals("area")||property.getNodeName().equals("fflag") || property.getNodeName().equals("tflag") || property.getNodeName().equals("guid")){
try{
//event.setOrganization(property.getFirstChild().getNodeValue());
Log.i(tag,"org"+(property.getFirstChild().getNodeValue()));
}
catch(Exception e){
}
}
else if(property.getNodeName().equals("p") || property.getNodeName().equals("em") || property.getNodeName().equals("br") || property.getNodeName().startsWith("em") || property.getNodeName().startsWith("span") || property.getNodeName().startsWith("a") || property.getNodeName().startsWith("div") || property.getNodeName().equals("div") || property.getNodeName().startsWith("p")){
descriptionAccumulator.append(property.getFirstChild().getNodeValue());
descriptionAccumulator.append(".");
System.out.println("description added:"+descriptionAccumulator);
Log.i("Description",descriptionAccumulator+property.getFirstChild().getNodeValue());
}
I tried capturing the value of the <description> tag, but that didn't work, so I tried matching all the usual HTML formatting tags, still with no luck. Using a different parser is not an option. Could somebody please help me out with this? Thanks.

I believe something is wrong with the RSS XML. For instance, check what XML is returned by the StackOverflow RSS feed. Specifically, pay attention to what the <summary type="html"> node content looks like: it has no child XML nodes inside, only pure XML-escaped text. So if it is acceptable in your case, spend the effort on generating proper RSS XML rather than on fixing the consequences.

You are parsing this as XML, so the description tag doesn't have a string value; it has multiple children. You might try getting the description node and pretty-printing its children. See LSSerializer for printing to XML.
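A minimal sketch of the point made above, assuming the feed is parsed with the standard DOM API: because <description> contains child elements rather than a single text node, the text has to be collected from its children. The class and method names here (DescriptionTextDemo, descriptionText) are made up for illustration and are not from the original post. Node.getTextContent() is part of DOM Level 3; on very old Android versions it may not be available, in which case the text nodes would have to be walked by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescriptionTextDemo {
    /** Joins the text of every child element of the first <description>, one per line. */
    static String descriptionText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node description = doc.getElementsByTagName("description").item(0);
            if (description == null) {
                return "";
            }
            StringBuilder text = new StringBuilder();
            NodeList children = description.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace-only text nodes between the <p> elements.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                // getTextContent() also includes text of nested tags such as <span>.
                String childText = child.getTextContent().trim();
                if (!childText.isEmpty()) {
                    if (text.length() > 0) {
                        text.append('\n');
                    }
                    text.append(childText);
                }
            }
            return text.toString();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse feed fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<item><description><p>Opera to be announced</p>"
                + "<p>$15 adults/$12 seniors/$10 for college students</p></description></item>";
        System.out.println(descriptionText(xml));
    }
}
```

In the asker's loop, the same idea would replace property.getFirstChild().getNodeValue() in the description branch, since the first child there is only a whitespace text node.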

Related

Android why xPath.String return empty string?

I have this XML:
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink">
<description>
<title-info>
<genre>love_contemporary</genre>
<author>
<first-name>Sylvain</first-name>
<last-name>Reynard</last-name>
</author>
<book-title>Gabriel's Inferno</book-title>
<annotation>
<p>Enigmatic and sexy, Professor Gabriel Emerson is a well respected Dante specialist by day, but by night he devotes himself to an uninhibited life of pleasure. He uses his notorious good looks and sophisticated charm to gratify his every whim, but is secretly tortured by his dark past and consumed by the profound belief that he is beyond all hope of redemption. When the sweet and innocent Julia Mitchell enrolls as his graduate student, his attraction and mysterious connection to her not only jeopardizes his career, but sends him on a journey in which his past and his present collide. An intriguing and sinful exploration of seduction, forbidden love and redemption, Gabriel's Inferno is a captivating and wildly passionate tale of one man's escape from his own personal hell as he tries to earn the impossible…forgiveness and love.</p>
</annotation>
<date/>
<coverpage>
<image l:href="#_0.jpg"/>
</coverpage>
<lang>en</lang>
<src-lang>en</src-lang>
<sequence name="Gabriel's Inferno" number="1"/>
</title-info>
<document-info>
<author>
<first-name/>
<last-name/>
</author>
<date/>
<id>2aec7273-a8a4-4edc-803a-820c4d76bc3f</id>
<version>1.0</version>
</document-info>
<publish-info>
<book-name>Gabriel's Inferno</book-name>
<year>2011</year>
</publish-info>
</description>
</FictionBook>
My expression to get the value of the attribute:
string(//coverpage/image/@l:href)
Code in the Android program:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
String attrValue;
expression = "string(//coverpage/image/@l:href)";
try {
attrValue = xpath.compile(expression).evaluate(obj,
XPathConstants.STRING).toString();
System.out.println("VAL XML:"+attrValue);
} catch (XPathExpressionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
But on the console I get only:
VAL XML:
Why? What am I doing wrong?
I tried http://www.freeformatter.com/xpath-tester.html#ad-output for online testing and everything works fine: I get the string #_0.jpg
Your problem is that the attribute you're trying to select uses an XML namespace, and the XPath object isn't aware of it. I see two solutions for this:
Without defining the namespace
Avoid the issue by using local-name() to ignore namespaces altogether.
//*[local-name() = 'coverpage']/*[local-name() = 'image']/@*[local-name() = 'href']
(//coverpage/image/@*[local-name() = 'href'] might work as well)
Defining the namespace
Make the XPath object aware of the different namespaces so that it knows which one to use.
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
...
xpath.setNamespaceContext(new MyNamespaceContext());
attrValue = xpath.compile(expression).evaluate(obj,
XPathConstants.STRING).toString();
...
private static class MyNamespaceContext implements NamespaceContext {
public String getNamespaceURI(String prefix) {
if("l".equals(prefix)) {
return "http://www.w3.org/1999/xlink";
}
return null;
}
public String getPrefix(String namespaceURI) {
return null;
}
public Iterator getPrefixes(String namespaceURI) {
return null;
}
}
(possible duplicate: How to use XPath on xml docs having default namespace)
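The namespace-aware variant can be sketched end to end as follows. This is only an illustration under stated assumptions: the document is parsed with a namespace-aware DocumentBuilderFactory, and because <coverpage> and <image> then live in the FictionBook default namespace, a prefix is bound for it as well. The prefix fb and the names CoverHrefDemo and coverHref are arbitrary choices for this sketch, not part of the original answer.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CoverHrefDemo {
    static String coverHref(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixes are not resolved
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    if ("fb".equals(prefix)) { // hypothetical prefix for the default namespace
                        return "http://www.gribuser.ru/xml/fictionbook/2.0";
                    }
                    if ("l".equals(prefix)) {
                        return "http://www.w3.org/1999/xlink";
                    }
                    return XMLConstants.NULL_NS_URI;
                }
                // The two reverse lookups are not needed for evaluating this expression.
                public String getPrefix(String namespaceURI) {
                    return null;
                }
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null;
                }
            });
            return xpath.evaluate("string(//fb:coverpage/fb:image/@l:href)", doc);
        } catch (Exception e) {
            throw new RuntimeException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FictionBook xmlns=\"http://www.gribuser.ru/xml/fictionbook/2.0\""
                + " xmlns:l=\"http://www.w3.org/1999/xlink\"><description><title-info>"
                + "<coverpage><image l:href=\"#_0.jpg\"/></coverpage>"
                + "</title-info></description></FictionBook>";
        System.out.println("VAL XML:" + coverHref(xml));
    }
}
```

The prefixes used in the expression only have to match the NamespaceContext; they do not need to match the prefixes used in the document itself.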

Parsing local gpx file in Android

I followed this example to parse a local GPX file in Android:
http://android-coding.blogspot.pt/2013/01/get-latitude-and-longitude-from-gpx-file.html
Everything works fine for accessing "lat" and "long", but I also need to get the "ele" value, and all my attempts were unsuccessful.
Can anyone give me some hints on how to do that?
Thanks in advance!
Best regards,
NR.
I will add my library for GPX parsing to these answers: https://github.com/ticofab/android-gpx-parser. It provides two ways to parse your GPX file. Once you obtain or create a GPXParser object (mParser in the examples below), you can either parse a local GPX file directly:
Gpx parsedGpx = null;
try {
InputStream in = getAssets().open("test.gpx");
parsedGpx = mParser.parse(in);
} catch (IOException | XmlPullParserException e) {
e.printStackTrace();
}
if (parsedGpx == null) {
// error parsing track
} else {
// do something with the parsed track
}
or you can parse a remote file:
mParser.parse("http://myserver.com/track.gpx", new GpxFetchedAndParsed() {
@Override
public void onGpxFetchedAndParsed(Gpx gpx) {
if (gpx == null) {
// error parsing track
} else {
// do something with the parsed track
}
}
});
Contributions are welcome.
You have the "Node node = nodelist_trkpt.item(i);" in your first loop.
Get the child elements from this node and run through these child elements.
E.g.:
NodeList nList = node.getChildNodes();
for(int j=0; j<nList.getLength(); j++) {
Node el = nList.item(j);
if(el.getNodeName().equals("ele")) {
System.out.println(el.getTextContent());
}
}
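Put together, the loop over the children of each trkpt node might look like the sketch below. It assumes a plain DOM parse like the one in the linked tutorial; the class name GpxEleDemo, the method readTrackPoints, and the "lat,lon,ele" output format are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GpxEleDemo {
    /** Returns one "lat,lon,ele" string per <trkpt>; ele is empty when the element is absent. */
    static List<String> readTrackPoints(String gpx) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(gpx)));
            NodeList trkpts = doc.getElementsByTagName("trkpt");
            List<String> points = new ArrayList<>();
            for (int i = 0; i < trkpts.getLength(); i++) {
                Element trkpt = (Element) trkpts.item(i);
                String ele = "";
                // lat and lon are attributes, but ele is a child element of trkpt.
                NodeList children = trkpt.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "ele".equals(child.getNodeName())) {
                        ele = child.getTextContent().trim();
                    }
                }
                points.add(trkpt.getAttribute("lat") + "," + trkpt.getAttribute("lon") + "," + ele);
            }
            return points;
        } catch (Exception e) {
            throw new RuntimeException("Could not parse GPX", e);
        }
    }

    public static void main(String[] args) {
        String gpx = "<gpx><trk><trkseg>"
                + "<trkpt lat=\"38.7\" lon=\"-9.1\"><ele>12.5</ele></trkpt>"
                + "</trkseg></trk></gpx>";
        System.out.println(readTrackPoints(gpx));
    }
}
```

On Android the input would come from a file or asset stream instead of a String; only the source passed to parse() changes.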
Update: I've added parsing of the "ele" element as well, so this code could match your requirements.
I will propose a different approach: https://gist.github.com/kamituel/6465125.
In my approach I don't create an ArrayList of all track points (this is done in the example you posted). Such a list can consume quite a lot of memory, which can be an issue on Android.
I've even given up on using regex parsing to avoid allocating too many objects (which causes the garbage collector to run).
As a result, when running Java with a 16 MB heap and parsing a GPX file with over 600 points, the garbage collector runs only 12 times. I'm sure one could go lower, but I haven't optimized it heavily yet.
Usage:
GpxParser parser = new GpxParser(new FileInputStream(file));
TrkPt point = null;
while ((point = parser.nextTrkPt()) != null) {
// point.getLat()
// point.getLon()
}
I've successfully used this code to parse around 100 MB of GPX files on Android. Sorry it's not in a regular repo; I didn't plan to share it just yet.
I've ported the library GPXParser by ghitabot to Android.
https://github.com/urizev/j4gpx

Android Html.fromHtml() loses the HTML if it starts with <p> tag

I call a web service that returns some HTML enclosed in an XML envelope, something like:
<xml version="1.0" cache="false">
<text color="white">
<p> Some text <br /> <p>
</text>
</xml>
I use XmlPullParser to parse this XML/HTML. To get the text in the <text> element, I do the following:
case XmlPullParser.START_TAG:
xmlNodeName = parser.getName();
if (xmlNodeName.equalsIgnoreCase("text")) {
String color = parser.getAttributeValue(null, "color");
String text = parser.nextText();
if (color.equalsIgnoreCase("white")) {
detail.setDetail(Html.fromHtml(text).toString());
}
}
break;
This works well and gets the text or HTML in the <text> element even if it contains some HTML tags.
The issue arises when the element's data starts with a <p> tag, as in the example above. In this case the data is lost and text is empty.
How can I resolve this?
EDIT
Thanks to Nik and rajesh for pointing out that my service's response is actually not valid XML and that the element is not closed properly. But I have no control over the service, so I cannot edit what is returned. I wonder if there is something like HTML Agility that can parse any type of malformed HTML, or can at least get what is inside the HTML tags, like inside <text> ... </text> in my case? That would also be good.
Or anything else that I can use to parse what I get from the service would be good, as long as it is reasonably easy to implement.
Excuse my bad English.
You are seeing that behavior because what you have inside the <text>...</text> tags is not a text node but an XML element. You should enclose the contents in a CDATA section.
Edit: Providing the code segment for my suggestion in the comment. It does indeed work with the sample XML you gave.
StringBuffer html = new StringBuffer();
boolean isText = false; // true while the parser is inside <text>...</text>
int eventType = parser.getEventType();
while (eventType != XmlPullParser.END_DOCUMENT) {
if(eventType == XmlPullParser.START_TAG) {
String name = parser.getName();
if(name.equalsIgnoreCase("text")){
isText = true;
}else if(isText){
html.append("<");
html.append(name);
html.append(">");
}
} else if(eventType == XmlPullParser.END_TAG) {
String name = parser.getName();
if(name.equalsIgnoreCase("text")){
isText = false;
}else if(isText){
html.append("</");
html.append(name);
html.append(">");
}
} else if(eventType == XmlPullParser.TEXT) {
if(isText){
html.append(parser.getText());
}
}
eventType = parser.next();
}
Because in the above XML you don't close the <p> tag with "</p>".
<p> Some text <br /> </p>
Use this line instead.
Solution
Inspired by Martin's approach of first converting the received data to a string, I solved my problem with a kind of mixed approach.
Convert the received InputStream's content to a string and replace the erroneous tag with "" (or whatever you wish), as follows:
InputStreamReader isr = new InputStreamReader(serviceReturnedStream);
BufferedReader br = new BufferedReader(isr);
StringBuilder xmlAsString = new StringBuilder(512);
String line;
try {
while ((line = br.readLine()) != null) {
xmlAsString.append(line.replace("<p>", "").replace("</p>", ""));
}
} catch (IOException e) {
e.printStackTrace();
}
Now I have a string which contains correct XML data (for my case), so I just use the normal XmlPullParser to parse it instead of parsing it manually myself:
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(false);
XmlPullParser parser = factory.newPullParser();
parser.setInput(new StringReader(xmlAsString.toString()));
Hope this helps someone!
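The plain replace("<p>", "") used above only catches the exact lowercase tag. As a hedged variation, a regular expression can also remove <P>, </p>, and <p class="..."> while leaving tags such as <pre> alone. The names PTagStripper and stripParagraphTags are hypothetical, and this is only a workaround for this particular malformed response, not a general HTML sanitizer.

```java
class PTagStripper {
    /**
     * Removes opening and closing paragraph tags, with or without attributes,
     * in any letter case. Tags that merely start with "p" (e.g. <pre>) are kept.
     */
    static String stripParagraphTags(String markup) {
        return markup.replaceAll("(?i)</?p(\\s[^>]*)?>", "");
    }

    public static void main(String[] args) {
        String broken = "<text color=\"white\"><p> Some text <br /> <p></text>";
        // The unclosed <p> tags are gone, so the fragment can be parsed as XML again.
        System.out.println(stripParagraphTags(broken));
    }
}
```

The cleaned string can then be handed to XmlPullParser through a StringReader exactly as in the snippet above.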

JSONObject text must begin with '{' at character 1 of

I had a PHP API which returned a JSON array, which I then read into an Android application.
I have since moved servers, and the Android application broke.
I assumed it was the authentication and thought I would rebuild the Android application (it was my first application, and I thought a rewrite could make things better).
For some reason I am now getting this exception.
I read somewhere that I need to pass JSON_FORCE_OBJECT to PHP's json_encode:
json_encode($arrMyData, JSON_FORCE_OBJECT);
But I am running PHP 5.2 (the options parameter was added in PHP 5.3).
My code for you to rip into:
private void displayAllStories(){
String line;
int intNumStories = 0;
JSONObject arrAllStories;
LinearLayout storiesLayout = (LinearLayout) findViewById(R.id.lyoutStoriesMain);
storiesLayout.removeAllViewsInLayout();
try {
while((line = this.jsonResult.readLine()) != null){
JSONObject arrStories;
arrStories = new JSONObject(line.trim());
intNumStories = Integer.parseInt(arrStories.optString("NumStories"));
arrAllStories = arrStories.getJSONObject("StoryData");
this.strDebug += "We have "+intNumStories+"\n";
}
} catch (IOException e) {
this.strDebug += "Error (3) "+e.getLocalizedMessage()+"\n";
} catch (JSONException e) {
this.strDebug += "Error (4) "+e.getLocalizedMessage()+"\n";
}
}
And the encoded data from the website:
{
"NumStories":1,
"StoryData":{
"Story0":{
"ID":"1020",
"OWERNAME":"Alicia",
"STORYMAIN":"Good evening my son was born with bilateral club feet. When he was a week old we started serial casting once a week for 3 months and then he was placed in braces for the next 6 months for a 23 hour period and then for the next 3 months just durning the night. This last visit the doctor said that he needs to have his tendons lengthened and he will go back into cast. After reading all of these articles I am a little scared on what will be best for him. It sounds like the risk of having the surgery are just as heavily weighed as just keeping him in AFO\\'s till he can make his own decision. I would like all advice whether it be positive or negative. Thank you in advance for your help.",
"STORYBRIEF":"Need reassurance that tendon lengthening is the best decision.",
"ADDEDDATE":"2011-12-12 00:51:16",
"CURRENTSTATUS":"n"
}
}
}
Sorry, I should add that the code before this, which produces jsonResult, is as follows:
try{
URL url = null;
URLConnection urlConn = null;
InputStreamReader jsonIsr = null;
BufferedReader jsonBr = null;
//this.strDebug += "URL is "+this.strURL+"\n";
url = new URL(this.strURL);
urlConn = url.openConnection();
jsonIsr = new InputStreamReader(urlConn.getInputStream());
jsonBr = new BufferedReader(jsonIsr, 8192);
this.jsonResult = jsonBr;
return true;
}catch(MalformedURLException e){
this.strDebug += "JSON Error (1) "+e.getLocalizedMessage()+"\n";
}catch(IOException e){
this.strDebug += "JSON Error (2) "+e.getLocalizedMessage()+"\n";
}
}else{
strDebug = "NO URL Passed to JSON\n";
}
// EDIT 2
For those who are asking:
The error is as the title says:
Error (4) A JSONObject text must begin with '{' at character 1 of {"NumStories":1, "StoryData":........
Your code assumes that the whole JSON document arrives on one line: it iterates with readLine() but creates a new JSON object for every line.
You are reading the data line by line and trying to convert each line into a JSON object. That won't work because a single line just contains a fragment of a complete JSON object.
I don't know what type jsonResult has. But you'll probably want to read the whole thing at once.
Your old web application probably produced JSON data without line breaks, so a single line would contain a full JSON object.
I think you are reading the JSON line by line and passing each line to the JSONObject. Instead, pass the whole string to the JSONObject for parsing; only then will you get the JSON:
JSONObject arrStories = new JSONObject(wholeJson); // wholeJson (hypothetical name): the complete response read into one String
Now get the value like this:
intNumStories = Integer.parseInt(arrStories.getString("NumStories"));
This code is going to break if the object takes more than one line (apparently it does). Your choices are:
Collect all the lines into a StringBuilder, then parse from this string ( http://developer.android.com/reference/org/json/JSONTokener.html )
Take GSON or my databinding layer ( https://github.com/ko5tik/jsonserializer ) and just parse the stream into an object.
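The first option, collecting every line before parsing, could look roughly like this. It is a sketch using only java.io; the class and method names (JsonBodyReader, readAll) are made up for the example, and the final new JSONObject(...) call is shown only as a comment because org.json ships with Android rather than with the plain JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class JsonBodyReader {
    /** Reads the whole response into one String instead of handling it line by line. */
    static String readAll(Reader source) {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toString().trim();
    }

    public static void main(String[] args) {
        String multiLine = "{\n\"NumStories\":1,\n\"StoryData\":{}\n}";
        String json = readAll(new StringReader(multiLine));
        // On Android, parse once, after the loop:
        // JSONObject arrStories = new JSONObject(json);
        System.out.println(json.length() + " characters read");
    }
}
```

In the asker's code, this.jsonResult (already a BufferedReader) could be passed to such a method, and the JSONObject would be constructed from the returned String outside the read loop.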

Android: Character encoding raw resource files

I'm in the process of translating one of my apps to Spanish, and I'm having a character encoding problem with a raw HTML file I'm sticking into a WebView. I have the Spanish translation of the file in my raw-es folder, and I'm reading it in with the following function:
private CharSequence getHtmlText(Activity activity) {
BufferedReader in = null;
try {
in = new BufferedReader(new InputStreamReader(getResources().openRawResource(R.raw.help), "utf-8"));
String line;
StringBuilder buffer = new StringBuilder();
while ((line = in.readLine()) != null) buffer.append(line).append('\n');
return buffer;
} catch (IOException e) {
return "";
} finally {
closeStream(in);
}
}
But everywhere there is a Spanish character in the file, a diamond with a question mark inside it appears when I run the app and look at the activity that displays the HTML. I'm using the following to load the text into the WebView:
mWebView.loadData(text, "text/html", "utf-8");
I originally created the file in Microsoft Word, so I'm sure there is some sort of character encoding issue going on, but I'm not really sure how to fix it, and a Google search isn't helping. Any ideas?
Don't use loadData. Use loadDataWithBaseURL instead. You would say:
mWebView.loadDataWithBaseURL( null, text, "text/html", "utf-8", null );
I had a similar issue with a French translation where diamond symbols with question marks were appearing in place of certain characters, including those which I had escaped. I got around it by opening file properties in Eclipse and changing the encoding to "ISO-8859-1". Don't know if this would work for Spanish though.
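The diamond with a question mark is the Unicode replacement character (U+FFFD), which a decoder emits when the bytes are not valid in the charset it was told to use. The sketch below illustrates this under the assumption that the file was saved in a single-byte encoding such as ISO-8859-1 (which may or may not be what Word produced here) and then read as UTF-8, as in the question's InputStreamReader. The class name EncodingDemo and the sample word are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    /** Decodes raw file bytes with the given charset, as InputStreamReader would. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "a\u00f1o" is the Spanish word for "year"; here it is stored as ISO-8859-1 bytes.
        byte[] savedAsLatin1 = "a\u00f1o".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: the lone 0xF1 byte is not valid UTF-8 and becomes U+FFFD.
        String wrong = decode(savedAsLatin1, StandardCharsets.UTF_8);
        // Matching charset: the text round-trips intact.
        String right = decode(savedAsLatin1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("\uFFFD"));
        System.out.println(right.equals("a\u00f1o"));
    }
}
```

Either re-saving the raw file as UTF-8 or passing the file's real charset name to InputStreamReader should make the bytes and the declared encoding agree.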
