Is there a method in Jsoup that distinguishes between multiple elements with the same class? For clarification, consider the following HTML fragment. I need to retrieve the elements with the class name "description", but I need to tell one piece of information apart from the others.
<div class="related-box gray-text no-margin">
  <h3 class="epsilon">Awards</h3>
  <p class="description">
    <strong>Sena - 6</strong><br>
    There was no
  </p>
  <p class="description">
    <strong>Quina - 5</strong><br>
    124 winning bets, R $ 43,174.39
  </p>
  <p class="description">
    <strong>Quadra - 4</strong><br>
    8817 winning bets,
  </p>
</div>
Thank You!
Assuming you want to grab the second <p> tag in the posted div, you have several options. The following examples use the nth-of-type CSS selector, the get method of jsoup's Elements (inherited from ArrayList), and iteration over the elements, which is useful if you need a more detailed comparison of the elements' contents. There are many more options and, as mentioned in my comment above, tools such as the Chrome developer tools let you select an element and get a matching selector, which can be a good starting point for generalization.
Example Code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String source = "<div class='related-box gray-text no-margin'>"
        + "<h3 class='epsilon'>Awards</h3>"
        + "<p class='description'><strong>Sena - 6</strong><br>There was no</p>"
        + "<p class='description'><strong>Quina - 5</strong><br>124 winning bets, R $ 43,174.39</p>"
        + "<p class='description'><strong>Quadra - 4</strong><br>8817 winning bets,</p></div>";
// The second argument of Jsoup.parse(String, String) is a base URI, not a charset, so it is omitted
Document doc = Jsoup.parse(source);

// nth-of-type(n) CSS selector (1-based, counts <p> siblings only)
Element quina = doc.select(".description:nth-of-type(2)").first();
System.out.println(quina.text());

// Elements.get(n) jsoup method (0-based index into the matches)
quina = doc.select(".description").get(1);
System.out.println(quina.text());

// iterate over Elements and compare the contents
Elements descriptions = doc.select(".description");
for (Element element : descriptions) {
    if (element.text().contains("Quina")) {
        quina = element;
    }
}
System.out.println(quina.text());
Output
Quina - 5 124 winning bets, R $ 43,174.39
Quina - 5 124 winning bets, R $ 43,174.39
Quina - 5 124 winning bets, R $ 43,174.39
Related
Consider this HTML:
<ul>
<li>Some bullet point text.</li>
<li>Another bullet point:</li>
<ul>
<li>A Sub Bullet point of the above.</li>
<ul>
<li>A Sub sub bullet point.</li>
</ul>
<li>The second sub bullet point.</li>
</ul>
</ul>
I'm loading this into a TextView in MainActivity:
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    val someText = findViewById<TextView>(R.id.someTextView)
    val url = "<ul>\n" +
        " <li>Some bullet point text.</li>\n" +
        " <li>Another bullet point:</li>\n" +
        " <ul>\n" +
        " <li>A Sub Bullet point of the above.</li>\n" +
        " <ul>\n" +
        " <li>A Sub sub bullet point.</li>\n" +
        " </ul>\n" +
        " <li>The second sub bullet point.</li>\n" +
        " </ul>\n" +
        "</ul>\n"
    someText.text = Html.fromHtml(url, Html.TO_HTML_PARAGRAPH_LINES_CONSECUTIVE)
}
Nothing special with the activity_main.xml:
<TextView
    android:id="@+id/someTextView"
    android:layout_width="match_parent"
    android:layout_height="wrap_content" />
Expected (Sub, Sub-sub bullet points):
Actual (no sub and sub-sub bullets AND indenting is wrong):
I've tried this library but it has the same problems.
Another solution would be to convert the tags to "\u25CF" and sub-sub bullets to "\u25CB" (or BulletSpan) and use JSoup for the ordering. But this feels like a lot of work for a standard HTML text.
I've tried MyTagHandler, but it has the same problem.
Any ideas?
Thanks for the help guys!
The ul and li tags are not supported by the Html.fromHtml() method. There is an open issue in the Google Issue Tracker about this missing tag support and about the supported tags not being listed in the documentation of Html.fromHtml().
But, I think BulletSpan might be helpful.
I am trying to read and check all the text in an Android string.xml file.
The file contains the following content:
<resources xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">
<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>
</resources>
I tried:
import xml.etree.ElementTree as ET
tree = ET.parse(filepath)
root = tree.getroot()
print(len(root[0]))
# it prints 1
print(root[0].text)
# it prints Finishing
print(root[0][0].text)
# it prints %d
How can I print the third piece of text, "percent.", when the <xliff:g> tag is getting in my way?
Is there a proper way to do it? Thanks.
In ElementTree, your target text "percent." is modeled as the tail of the xliff:g element:
>>> ns = {"xliff": "urn:oasis:names:tc:xliff:document:1.2"}
>>> g = root.find("string/xliff:g", namespaces=ns)
>>> print(g.tail)
percent.
Or, using something closer to your attempted code:
>>> print(root[0][0].tail)
percent.
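If the goal is to check the whole string rather than one fragment, a minimal sketch along these lines might help. It assumes the XML from the question is available as a string, and `full_text` is a hypothetical helper name, not part of ElementTree. It relies on `Element.itertext()`, which walks the text and tail of every nested element in document order.

```python
import xml.etree.ElementTree as ET

# Sample taken from the question; in practice this would come from ET.parse(filepath)
XML = """<resources xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">
<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>
</resources>"""


def full_text(element):
    """Join the element's own text with the text and tail of all descendants."""
    return "".join(element.itertext())


root = ET.fromstring(XML)
status = root.find("string")  # <string> itself is not in the xliff namespace

# The three pieces ElementTree stores separately: text, child text, child tail
pieces = [status.text, status[0].text, status[0].tail]
print(pieces)
print(full_text(status))
```

Note that the tail keeps the leading space that follows the closing </xliff:g> tag, so strip it if you only want the word.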
I am displaying my HTML code in a TextView using Html.fromHtml(). However, when it displays an h1 tag, there is always an empty line after the h1. I tried to fix it by adding . But I understand that Html.fromHtml() does not support class. Is there any way to remove the empty line after h1 in Html.fromHtml()? The following is my sample code:
title="<h1>Human Resource</h1>";
textView.setText(Html.fromHtml(title));
There will always be a blank space after <h1> tags, just as with <p> tags.
You could replace the <h1> tag programmatically:
title = "<h1>Human Resource</h1>";
title = title.replace("<h1>", "<b>");
title = title.replace("</h1>", "</b><br>");
textView.setText(Html.fromHtml(title));
I have a tag whose whole description is difficult to get with SimpleXML. What property should I set to get the whole description?
SimpleXML is parsing the p tag as well.
<description>
<p>
Bar In Inconel</p> <ul> <li>600</li> <li>601</li> <li>617</li> <li>625</li> <li>718/718 NACE MRO</li> <li>725</li> <li>800</li> <li>825</li> <li>925</li> <li>K500</li> <li>MP35N</li> <li>X750</li> <li>C276</li></ul> <p>Aluminum Tubing, Aluminum Pipe, Aluminum Sheet, Aluminum Bar, Aluminum Plate In Aluminum</p> <ul> <li>2219</li> <li>2024</li> <li>2124</li> <li>3003</li> <li>5052</li> <li>6061</li> <li>7075</li> <li>7050</li></ul> <p>Stainless Steel In Tube, Pipe, Plate, Sheet &
</p>
</description>
When you get the final response, wrap the content of the element in a CDATA section and set data=true for that element:
str = str.replace("<description>", "<description><![CDATA[");
str = str.replace("</description>", "]]></description>");
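To see why the CDATA wrapping helps, here is a rough sketch of the same idea in Python with the standard-library ElementTree parser rather than SimpleXML. The `<item>` wrapper, the shortened description, and the `wrap_in_cdata` helper are all assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the response from the question
raw = ("<item><description><p>Bar In Inconel</p>"
       "<ul><li>600</li><li>601</li></ul></description></item>")


def wrap_in_cdata(xml, tag):
    """Wrap the content of every <tag> element in a CDATA section (plain string replace)."""
    return (xml.replace("<%s>" % tag, "<%s><![CDATA[" % tag)
               .replace("</%s>" % tag, "]]></%s>" % tag))


before = ET.fromstring(raw).find("description")
after = ET.fromstring(wrap_in_cdata(raw, "description")).find("description")

# Without CDATA the parser turns <p> and <ul> into child elements
print(len(before))
# With CDATA the markup arrives as one plain text value and there are no children
print(len(after))
print(after.text)
```

This only works as long as the description itself never contains the sequence "]]>", which would end the CDATA section early.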
The XML response contains one tag whose text looks like this:
<Description>
<center><strong><span>Warehouse / Building Maintenance</span></strong></center><br />
<br />
<strong><span>I</span></strong><span>mmediate openings available in the local Perris area for warehouse/building building maintenance positions. <br />
<br />
<strong>Job Description:</strong><br />
</span>
<ul>
<li><span>Associates will be responsible to define pieces of equipment that will paralyze operations if they fail, and plan whatever level of preventative maintenance necessary. <br />
</span></li>
<li><span>
......
similar text
......
</Description>
I am not able to parse it properly.
I tried Jsoup.parse(nodeValue),
Html.fromHtml(String) and also URLEncoder.encode(String),
but each returns only a single & symbol and nothing else.
How can I parse this type of response?
A temporary solution could be to apply replaceAll("&lt;", "<").replaceAll("&gt;", ">") if the API methods aren't working, but aren't you supposed to use Html.toHtml(string) when you have encoded content and want it to become real HTML?
Actually, it was returning only the & symbol because each NodeList item holds only a single piece of the value.
After collecting all the child nodes like this:
NodeList fstNm = fstNmElmnt.getChildNodes();
System.out.println("nodefstnm" + fstNm.getLength());
for (int j = 0; j < fstNm.getLength(); j++) {
    String val = ((Node) fstNm.item(j)).getNodeValue();
    nodeValue = nodeValue.concat(val);
}
and then parsing the result with Jsoup, I get what I want.
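The same idea, concatenating every child node instead of reading only the first one, can be sketched in Python with the standard-library `xml.dom.minidom`. The sample document and the `collect_text` helper are assumptions for illustration. Python's parser may merge adjacent text nodes where the Android parser splits them, so the loop is shown for the technique rather than to reproduce the exact node count.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical, shortened response with entity-encoded markup inside <Description>
XML = ("<job><Description>&lt;strong&gt;Job &amp; Duties&lt;/strong&gt;"
       "</Description></job>")


def collect_text(element):
    """Concatenate the data of all text and CDATA child nodes of an element."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


doc = parseString(XML)
description = doc.getElementsByTagName("Description")[0]
markup = collect_text(description)
print(markup)
```

The resulting string is ordinary HTML markup, which can then be handed to an HTML parser in the same way the Java code hands it to Jsoup.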