How to get Text from Parent div only using Jsoup in Android

I am building an Android application that uses Jsoup to extract text from a website. In the HTML below, I want the text that belongs to the parent div only: the date and time that sits directly inside the div with class "fr".
<div class="fr">
<div id="newssource">
<a href="http://nhl.com" class="newssourcelink" target="_blank">
Philadelphia Flyers
</a>
</div>
April 15, 2014, 11:13 a.m.
</div>
What I have tried:
for(Element detailsDate:document.getElementsByClass("fr")){
newsDate.add(detailsDate.clone().children().remove().last().text().trim());
}
It only gets the text of the child div, i.e. "Philadelphia Flyers" inside the "a" tag, but I want to display only the date and time.

Use the jQuery code below to get only the date and time from the "fr" div:
<script>
jQuery('.fr').each(function(){
var textValue = jQuery(this).clone() //clone the element
.children() //select all the children
.remove() //remove all the children
.end() //again go back to selected element
.text();
alert(textValue);
});
</script>

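I found the answer myself and am posting it here for anyone else with the same issue.
for (Element detailsDate : document.getElementsByClass("fr")) {
    // The date is a text node, so it is read with nextSibling(); nextElementSibling() only returns elements.
    newsDate.add(detailsDate.getElementById("newssource").nextSibling().toString().trim());
}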
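For what it's worth, Jsoup's Element.ownText() returns only the text that belongs directly to an element, which is what the question asks for. The underlying idea is to keep the direct text-node children and skip the child elements. Below is a minimal sketch of that idea using the JDK's built-in DOM parser rather than Jsoup, so it runs without extra dependencies. It assumes the markup is well-formed XML, and the names OwnTextSketch and ownTextOfRoot are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextSketch {

    /** Concatenates only the text nodes that are direct children of el. */
    static String ownText(Element el) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = el.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Parses well-formed markup and returns the own text of its root element. */
    static String ownTextOfRoot(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return ownText(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"fr\">"
                + "<div id=\"newssource\">"
                + "<a href=\"http://nhl.com\" class=\"newssourcelink\" target=\"_blank\">"
                + "Philadelphia Flyers</a>"
                + "</div>"
                + "April 15, 2014, 11:13 a.m."
                + "</div>";
        System.out.println(ownTextOfRoot(html)); // prints: April 15, 2014, 11:13 a.m.
    }
}
```

With Jsoup itself, the loop body would presumably reduce to newsDate.add(detailsDate.ownText()).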

Related

JSOUP - Accessing elements within a div class / stop when reaching a specific div class

I'm trying to parse data from HTML. I need to extract specific content, but the order of the elements in the HTML may vary.
<h1>Latest Deals</h1>\r\n </div>\r\n </div>\r\n</div>\r\n\r\n
<div class=\"breadcrumb-wrapper\">\r\n
<ul class=\"breadcrumb\">\r\n
<li>Home</li>\r\n
<li>Deals</li>\r\n
<li class=\"active\">Mau Mudik Hemat? Nikmati Diskon Hingga 20%</li>\r\n
</ul>\r\n</div>\r\n\r\n
<div class=\"article outer clearfix\">\r\n
<div class=\"col-sm-12\">\r\n
<img alt=\"Mau Mudik Hemat? Nikmati Diskon Hingga 20%\" title=\"Mau Mudik Hemat? Nikmati Diskon Hingga 20%\" src=\"/images/slider/id/special-raya-offer-id-v2.jpg\">\r\n
<h1>Mau Mudik Hemat? Nikmati Diskon Hingga 20%</h1>\r\n
<p class=\"date\">May 18th, 2018</p>\r\n
<p><strong class=\"text-red\"></strong></p>\r\n\r\n
<p>This is the first paragraph</p>\r\n\r\n
<p>This is the second paragraph.</p>\r\n\r\n
<p>This is the third paragraph</p>\r\n\r\n
<p>Below is the point form start:</p>\r\n\r\n
<ol>\r\n
<li>Point form A</li>\r\n
<li>Point form B</li>\r\n
<li>Point form C</li>\r\n
<li>Point form D</li>\r\n
</ol>\r\n\r\n\r\n\r\n
<div class=\"m-top30 m-bottom20\">\r\n
Home\r\n\r\n \r\n\r\n\r\n</div>\r\n\r\n\r\n
Previously I got the content I wanted with:
Document doc = Jsoup.parse(content);
Element eTitle = doc.getElementsByTag("h1").get(1);
Elements eBody = doc.getElementsByTag("p");
for (Element body : eBody) {
    detailContent += "<p>" + body.html() + "</p>";
}
The code above gets the article's "h1" and every "p" element from my long HTML. However, in some cases there may now be "ol" elements between those "p" elements. For example:
<div class=\"col-sm-12\">\r\n <img alt=\"abc\" title=\"abcd\" src=\"/images/slider/id/abcd.jpg\">\r\n
<h1>This is the header</h1>\r\n
<p class=\"date\">November 4th, 2015</p>\r\n
<p><strong class=\"text-red\">Sorry, this promotion has expired.</strong></p>\r\n
<p> Paragraph 1 </p>\r\n
<p> Paragraph 2 </p>\r\n
<ol>\r\n
<li> Point 1 </li>\r\n
<li> Point 2 </li>\r\n
</ol>\r\n
<p> Paragraph 3 </p>\r\n
<p> Paragraph 4 </p>\r\n
<ol>\r\n
<li> Point 1 </li>\r\n
<li> Point 2 </li>\r\n
</ol>\r\n
<div class=\"m-top30 m-bottom20\">
How should I write my code to get all of these items?
P.S. All I want to do is:
1) Get the elements inside the "col-sm-12" div (that is, everything up to the last element before "m-top30 m-bottom20")
2) Ignore certain elements contained in "col-sm-12"

Switching to CSS selectors and filtering on 'p' under the first div should help. However, it is not clear from the HTML above whether the first div ends before the second div starts. If you share more of the HTML, we may be able to refine the selectors. I have stated my assumptions in the code comments.
String eTitle = doc.select("div.col-sm-12 > h1").text(); // I'm assuming you are trying to fetch the title text.
// Each comma-separated part needs the div prefix; a bare "ol" would match every list in the document.
Elements eBody = doc.select("div.col-sm-12 > p, div.col-sm-12 > ol"); // limits 'p' and 'ol' to direct children of this div
for (Element body : eBody) {
    // work with the 'body' element here.
}
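The two requirements in the question (stop at the "m-top30 m-bottom20" div, skip certain elements) can also be handled by walking the direct children of the wrapper div in order. The sketch below shows that walk with the JDK's DOM parser instead of Jsoup so that it runs on its own. It assumes well-formed markup whose root element is the "col-sm-12" div, treats p.date as the element to ignore, and uses made-up names (ArticleBodySketch, bodyParts).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ArticleBodySketch {

    /** Collects "tag: text" entries for the p and ol children of the root, in document order. */
    static List<String> bodyParts(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> parts = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and other non-element nodes
                }
                Element e = (Element) n;
                String tag = e.getTagName();
                String cls = e.getAttribute("class"); // empty string when absent
                if (tag.equals("div") && cls.contains("m-top30")) {
                    break; // stop marker: nothing after this div is wanted
                }
                if (tag.equals("p") && cls.equals("date")) {
                    continue; // example of an element to ignore
                }
                if (tag.equals("p") || tag.equals("ol")) {
                    String text = e.getTextContent().trim().replaceAll("\\s+", " ");
                    parts.add(tag + ": " + text);
                }
            }
            return parts;
        } catch (Exception ex) {
            throw new IllegalArgumentException("input is not well-formed", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<div class=\"col-sm-12\">"
                + "<h1>This is the header</h1>"
                + "<p class=\"date\">November 4th, 2015</p>"
                + "<p> Paragraph 1 </p>"
                + "<ol><li> Point 1 </li><li> Point 2 </li></ol>"
                + "<p> Paragraph 3 </p>"
                + "<div class=\"m-top30 m-bottom20\"><p>footer</p></div>"
                + "</div>";
        // prints: p: Paragraph 1 / ol: Point 1 Point 2 / p: Paragraph 3 (one per line)
        bodyParts(xml).forEach(System.out::println);
    }
}
```

The same loop shape should carry over to Jsoup by iterating the wrapper element's children() and checking tagName() and hasClass().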

WebView extract and edit html

I am creating a very simple WebView application on Android. However, I want to edit the HTML file before displaying it in the WebView.
For example, if the original html source looked like :
<html>
<body>
<h1> abc </h1>
<h2> abc </h2>
......
<h6> abc </h6>
</body>
</html>
And I want to change it to:
<html>
<body>
<h1> cba </h1>
<h2> cba </h2>
......
<h6> cba </h6>
</body>
</html>
(all "abc" become "cba")
Then I want to display that new code in my WebView. How can I do this? Thanks.

I am not sure why you need this or what kind of app requires it, but if you have to do it, try the following code:
$(function() {
for(var i =0;i<101;i++) {
if(jQuery('h'+i).length)
jQuery('h'+i).html(jQuery('h'+i).html().split("").reverse().join(""));
}
});

First, a note on your header tags: <h100> is a common misconception for newcomers. <h_> tags are simply an organizational device for your page, and they only go up to <h6>. You can have multiple <h1> tags on the same page, which are just headings for that section of content (with <h2> implying a subsection of <h1>, etc).
From there, when you say "original source", I assume you mean this is your own code, correct? Not a WebView sourced from another site? If this is the case, and you are only looking to change a specific instance of a specific string in your own code, a Find and Replace should be sufficient via any text or code editor you are using.
But if this is the case, you might want to look into first learning HTML and being able to render it in a basic web browser before moving on to also trying to learn Android.
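If the HTML is available as a string on the Java side, another option is to rewrite it before handing it to the WebView. The sketch below is one way to do that under some assumptions: only the contents of h1 to h6 are changed (as in the question's example), the headings are not nested, and a regular expression is good enough for such simple markup. HtmlRewriteSketch and rewriteHeadings are hypothetical names. If every occurrence on the page should change, a plain String.replace over the whole source would do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlRewriteSketch {

    // Group 1: opening tag, group 2: heading level, group 3: contents, group 4: closing tag.
    private static final Pattern HEADING = Pattern.compile(
            "(<h([1-6])\\b[^>]*>)(.*?)(</h\\2>)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Replaces every occurrence of from with to, but only inside h1..h6 elements. */
    static String rewriteHeadings(String html, String from, String to) {
        Matcher m = HEADING.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String inner = m.group(3).replace(from, to);
            // quoteReplacement keeps '$' and '\' in the page from being treated as group references
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + inner + m.group(4)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "<html><body>"
                + "<h1> abc </h1><h2> abc </h2><h6> abc </h6>"
                + "</body></html>";
        String edited = rewriteHeadings(original, "abc", "cba");
        System.out.println(edited);
        // On Android, the edited string could then be shown with
        // webView.loadDataWithBaseURL(baseUrl, edited, "text/html", "UTF-8", null);
    }
}
```

Passing the original page URL as baseUrl keeps relative links and images resolving against the original site.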

elements in HTML parsed twice using JSON

My website contains 149 of these blocks:
<!-- Begin Module Image -->
<div class="module-img">
<a href="http://prodigy.co.id/news/events/youtube-viewer-event/" >
<img src="http://prodigy.co.id/wp-content/uploads/Prodigy_Sticky_YoutubeViewer.png" width="280" height="150" alt="Youtube Viewer Event!" />
<span></span>
</a>
<div class="lightboxLink">
<a class="popLink boxLink" href="http://prodigy.co.id/wp-content/uploads/Prodigy_Sticky_YoutubeViewer.png" data-rel="prettyPhoto[Youtube Viewer Event!]" title="Youtube Viewer Event!"></a>
</div>
<div class="thumbLink">
<a class="popLink" href="http://prodigy.co.id/news/events/youtube-viewer-event/" title="Full Post"></a>
</div>
</div>
<!-- End Module Image -->
Here's my parser:
Document document = Jsoup.connect(Server.EXPLORE_LINK).timeout(10 * 1000).get();
Elements divs = document.select("div[class=module-img] a[href]");
for (Element div : divs) {
try {
href = div.attr("href");
Elements a = document.select("a[href=" + href + "] img[src]");
src = a.attr("src");
if (!src.startsWith("http://"))
src = src.substring(src.indexOf("http://"));
hrefs.add(href);
srcs.add(src);
} catch (Exception any) {
any.printStackTrace();
}
}
I want href to be http://prodigy.co.id/news/events/youtube-viewer-event/ and src to be http://prodigy.co.id/wp-content/uploads/Prodigy_Sticky_YoutubeViewer.png for each of the 149 blocks. I am confused because the size of divs is 444, not 149 as I expected.
Forgive my laziness, but I'm new to this JSON thing and I've been googling for hours looking for answers.

Are you sure the size is 444? It would make sense if it is 447.
Your selector matches all three links in your HTML code. A space (the descendant combinator) allows any number of elements in between. If you want to select direct child nodes only, you have to use '>' in between:
Elements divs = document.select("div[class=module-img] > a[href]");
PS: you could use
.classname
instead of
div[class=classname]

I've never used the jsoup API, but looking at the selector you used, it seems that you're querying for all <a> tags that are descendants of <div class="module-img">. Note that there are 3 <a> elements inside each module. This would explain the number 444, as 148x3=444. (You said there are 149, but perhaps the first or last occurrence is not being counted.)
Anyway, try this:
Elements divs = document.select("div[class=module-img] > a[href]");
It should list only the <a> elements that are direct children of the given <div>.
Here's more about selectors and combinators.
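The arithmetic in both answers can be checked with a small experiment. The sketch below does not use Jsoup. It uses the JDK's XPath engine, where "//" plays the role of the CSS descendant combinator (a space) and "/" plays the role of the child combinator (">"). The module markup is a shortened, well-formed stand-in for the block in the question, with hypothetical href values, and CombinatorCountSketch and count are names invented here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class CombinatorCountSketch {

    // One module: a direct <a> child plus two <a> elements nested in inner divs.
    static final String MODULE = "<div class=\"module-img\">"
            + "<a href=\"/post\"><img src=\"/thumb.png\"/><span></span></a>"
            + "<div class=\"lightboxLink\"><a class=\"popLink boxLink\" href=\"/thumb.png\"></a></div>"
            + "<div class=\"thumbLink\"><a class=\"popLink\" href=\"/post\"></a></div>"
            + "</div>";

    /** Counts the nodes selected by expr in a page made of the given number of modules. */
    static int count(String expr, int modules) {
        StringBuilder page = new StringBuilder("<body>");
        for (int i = 0; i < modules; i++) {
            page.append(MODULE);
        }
        page.append("</body>");
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expr + ")",
                    new InputSource(new StringReader(page.toString())),
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String descendant = "//div[@class='module-img']//a[@href]"; // like "div a[href]"
        String child = "//div[@class='module-img']/a[@href]";       // like "div > a[href]"
        System.out.println(count(descendant, 148)); // prints 444
        System.out.println(count(descendant, 149)); // prints 447
        System.out.println(count(child, 149));      // prints 149
    }
}
```

So a result of 444 is consistent with 148 modules being matched by the descendant form, and the child form brings the count back to one link per module.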

Passing value from html to javascript in phonegap

I am trying to develop a shopping cart system using Kendo UI Mobile and PhoneGap. First I list all the items in a list view. Each list view item has a plus button, a minus button and a label, which together select the quantity of the item. Clicking plus should take the label from 0 to 1 (0+1 => 1), and clicking minus should take it back (1-1 => 0). To change the label when a button is clicked, I pass the id of the label so that the matching label is updated. But I am not able to pass the id from HTML to JavaScript the way I would in web development. Here is my code.
My list view item template:
<script type="text/x-kendo-tmpl" id="endless-scrolling-template">
<div class="product">
<img src="images/image.jpg" alt="#=ProductName# image" class="pullImage"/>
<h3>#:ProductName#</h3>
<p>$#:kendo.toString(UnitPrice, "c")#</p>
<a id="minus" data-role="button" data-click="minus(#:ProductID#)" >-</a>
<label id=#:ProductID#>0</label>
<a id="plus" data-role="button" data-click="plus(#:ProductID#)" data-name="plus">+</a>
<a id="loginButton" data-role="button" data-click="login">Add to Cart</a>
<div class="console"></div>
</div>
</script>
and my JavaScript functions:
<script>
function plus(itemid) {
var quantity=document.getElementById(itemid).innerHTML;
document.getElementById(itemid).textContent = parseInt(quantity)+1;
}
function minus(itemid) {
var quantity=document.getElementById(itemid).innerHTML;
document.getElementById(itemid).textContent = parseInt(quantity)-1;
}
</script>
Can anyone please tell me what I am doing wrong here, or suggest an alternative solution?

When using Kendo, you can use Kendo MVVM. When JS objects are wired up to views through Kendo MVVM, changes to the values of input elements are reflected in the JS objects automatically. So what you need to do is create a JS model for your view and set it as the view's model using data-model="yourModel". See this link: http://docs.kendoui.com/getting-started/mobile/mvvm
In your scenario here I think this link will help you: http://demos.kendoui.com/web/mvvm/source.html
This behavior is explained in the Kendo mobile book I wrote and you can see the checkout screen of the sample application built for the book here: http://movies.kendomobilebook.com/

document.evaluate does not returns proper TextNodes XPath

I am creating a "Highlighter" for Android in a WebView.
I get the XPath expression for the selected Range in the HTML through a function, as follows:
/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]
Now I evaluate the above XPath expression with this JavaScript:
var resNode = document.evaluate('/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]',document,null,XPathResult.FIRST_ORDERED_NODE_TYPE ,null);
var startNode = resNode.singleNodeValue;
but startNode is null.
Here is the interesting point: if I evaluate the XPath expression '/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]' with the same function, it returns the proper node, i.e. a 'div'.
The difference between the two XPaths is that the first one ends in a text node and the second only in a div.
The same code works fine in desktop browsers.
Edited
Sample HTML
<html>
<head>
<script></script>
</head>
<body>
<div id="mainpage" class="highlighter-context">
<div> Some text here also....... </div>
<div> Some text here also.........</div>
<div>
<h1 class="heading"></h1>
<div class="left_side">
<ol></ol>
<h1></h1>
<div class="text_bio">
In human beings, height, colour of eyes, complexion, chin, etc. are
some recognisable features. A feature that can be recognised is known as
character or trait. Human beings reproduce through sexual reproduction. In this
process, two individuals one male and another female are involved. Male produces
male gamete or sperm and female produces female gamete or ovum. These gametes fuse
to form zygote which develops into a new young one which resembles to their parent.
During the process of sexual reproduction
</div>
</div>
<div class="righ_side">
Some text here also.........
</div>
<div class="clr">
Some text here also.......
</div>
</div>
</div>
</body>
</html>
getting XPath:
var selection = window.getSelection();
var range = selection.getRangeAt(0);
var xpJson = '{startXPath :"'+makeXPath(range.startContainer)+
'",startOffset:"'+range.startOffset+
'",endXPath:"'+makeXPath(range.endContainer)+
'",endOffset:"'+range.endOffset+'"}';
function to make XPath:
function makeXPath(node, currentPath) {
    currentPath = currentPath || '';
    switch (node.nodeType) {
        case 3: // text node
        case 4: // CDATA section
            return makeXPath(node.parentNode, 'text()[' + (document.evaluate('preceding-sibling::text()', node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']');
        case 1: // element
            return makeXPath(node.parentNode, node.nodeName + '[' + (document.evaluate('preceding-sibling::' + node.nodeName, node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']' + (currentPath ? '/' + currentPath : ''));
        case 9: // document
            return '/' + currentPath;
        default:
            return '';
    }
}
I am working with HTML in a WebView, not XML.
I also tried Rangy: its "serialize" works properly, but "deserialize" does not.
Any ideas what is going wrong?
UPDATE
Finally found the root cause of the problem (no solution yet):
Somehow, the Android WebView changes the DOM structure of the loaded HTML page. Even though the DIV does not contain multiple text nodes in the source, when I select text inside it I get a separate text node for every line in that DIV. For example, for the same HTML page and the same text selection, the XPath obtained in the WebView is entirely different from the one obtained in a desktop browser.
XPath from Desktop Browser:
startXPath /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
startOffset: 184
endXPath: /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
endOffset: 342
Xpath from webview:
startXPath :/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[3]
startOffset:0
endXPath:/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[4]
endOffset:151

Well in your sample the path /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5] selects the fifth text child node of the div element
<div class="text_bio">
In human beings, height, colour of eyes, complexion, chin, etc. are
some recognisable features. A feature that can be recognised is known as
character or trait. Human beings reproduce through sexual reproduction. In this
process, two individuals one male and another female are involved. Male produces
male gamete or sperm and female produces female gamete or ovum. These gametes fuse
to form zygote which develops into a new young one which resembles to their parent.
During the process of sexual reproduction
</div>
That div has a single text child node so I don't see why text()[5] should select anything.
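To make that concrete outside the browser, here is a small sketch using the JDK's XPath engine rather than document.evaluate. It uses a cut-down, lowercase version of the sample page (XML paths are case-sensitive, so the expressions are lowercase too), and TextNodeXPathSketch and textAt are illustrative names. Line breaks inside the text do not create extra text nodes, so text()[1] matches and text()[5] does not.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeXPathSketch {

    // The inner div holds one run of text that happens to span two lines.
    static final String PAGE = "<html><body><div><div class=\"text_bio\">"
            + "In human beings, height, colour of eyes, complexion, chin, etc. are\n"
            + "some recognisable features."
            + "</div></div></body></html>";

    /** Returns the value of the node selected by expr, or null when nothing matches. */
    static String textAt(String expr) {
        try {
            Node n = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    expr, new InputSource(new StringReader(PAGE)), XPathConstants.NODE);
            return n == null ? null : n.getNodeValue();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String base = "/html[1]/body[1]/div[1]/div[1]";
        System.out.println(textAt(base + "/text()[1]") != null); // prints true
        System.out.println(textAt(base + "/text()[5]") == null); // prints true
    }
}
```

If the WebView really does split the selection into several text nodes at runtime, a path recorded in that state will only resolve while the DOM is still split the same way, which would fit the null result described in the question.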
