I'm trying to learn jsoup for Android and I'm having a hard time learning the selectors. I've already set up the application with simple buttons and TextViews that can retrieve basic info such as the page title. Now I'm trying to get the text that I've marked below (TEXT I NEED TO PARSE). I've tried multiple times and cannot get the correct syntax down.
<li class="info info info">
<script>clicked = false</script>
<div class="simple">
<p class="name">TEXT I NEED TO PARSE </p>
<ul class="Type">
<li>Normal</li> </ul>
<p class="address">120 Hollywood Blvd.</p>
</div>
<div class="sortables">
<p class="inches"></p>
</div>
<div class="action_links">
</div>
Document doc = null;
try {
doc = Jsoup.connect("http://example.com/index.html").get();
} catch (IOException e) {
// TODO Throws exception
}
Element simple = doc.getElementsByClass("simple").first();
Element p = simple.getElementsByClass("name").first();
Element a = p.select("a").first();
String text = a.text();
System.out.println(text);
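A possible fix, sketched under the assumption that the fetched page contains the markup shown above: the `<p class="name">` element has no `<a>` child, so `p.select("a").first()` returns null and `a.text()` throws a NullPointerException. Selecting the paragraph itself gives the text. The class name `NameSelectorSketch` and the inline HTML string are illustrative only and stand in for the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NameSelectorSketch {

    // Returns the text of the first <p class="name"> that is a direct child
    // of a <div class="simple">, or null when no such element exists.
    static String extractName(String html) {
        Document doc = Jsoup.parse(html);
        Element name = doc.select("div.simple > p.name").first();
        return name == null ? null : name.text();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the downloaded page.
        String html = "<ul><li class=\"info\">"
                + "<div class=\"simple\">"
                + "<p class=\"name\">TEXT I NEED TO PARSE </p>"
                + "<p class=\"address\">120 Hollywood Blvd.</p>"
                + "</div></li></ul>";
        System.out.println(extractName(html));
    }
}
```

With a live page, the same selector can be run on the `Document` returned by `Jsoup.connect(...).get()`, after checking that the request did not fail and leave `doc` as null.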
Hi, I was trying to obtain data from a page, but I don't know how to get these two values: the text "Chapter 120" and the URL link.
This is the code from the page (I simplified it):
<div class="row">
<div class="col-12">
<div class="card chapters" id="chapters">
<ul class="list-group list-group-flush">
<li class="list-group-item p-0 bg-light upload-link" data-index="0">
<h4 class="px-2 py-3 m-0">
<div class="row">
<div class="col-10 text-truncate">
<a style="display: block;" class="btn-collapse" onclick="collapseChapter('collapsible490362')" role="button"> Capítulo 120.00</a>
</div>
</div>
</h4>
<div style="display: block;" id="collapsible490362">
<div class="card chapter-list-element">
<ul class="list-group list-group-flush chapter-list">
<li class="list-group-item">
<div class="row">
<div class="col-2 col-sm-1 text-right">
<a href="https://lectortmo.com/view_uploads/599487" class="btn btn-default btn-sm">
<span class="fas fa-play fa-2x" style="color:#2957ba"></span>
</a>
</div>
</div>
</li>
</ul>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
This line contains the text (Chapter 120) that I need to show in a TextView, but I don't know how to obtain it:
<a style="display: block;" class="btn-collapse" onclick="collapseChapter('collapsible490362')" role="button"> Chapter 120</a>
And this line contains the URL that I need:
<a href="https://lectortmo.com/view_uploads/599487" class="btn btn-default btn-sm">
This is my method for parsing the data:
@Override
protected ArrayList<TMODatosSeleccion> doInBackground(Void... voids) {
String url = getIntent().getStringExtra("valor");
tmoDatosSeleccions.clear();
try {
Document doc = Jsoup.connect(url).get();
Elements data = doc.select("div.row>.col-10");
int size = data.size();
Log.d("doc", "doc: "+doc);
Log.d("data", "data: "+data);
Log.d("size", ""+size);
for (Element e : data) {
String numeroCap = e.select("a").attr("none");
String urlManga = e.select("div.row>.col-2").select("a").addClass("btn").attr("href").trim();
tmoDatosSeleccions.add(new TMODatosSeleccion(numeroCap, urlManga));
}
} catch (IOException e) {
e.printStackTrace();
}
return tmoDatosSeleccions;
}
Can someone help me?
You could get the two links you are trying to find using:
Elements data = doc.select("div.row a");
for (Element e : data)
{
// process the link
}
Or you could get them individually using:
Elements data = doc.select("div.row>.col-10 a");
if (data.size() == 1)
{
Element e = data.get(0);
// process col-10 link
}
data = doc.select("div.row>.col-2 a");
if (data.size() == 1)
{
Element e = data.get(0);
// process col-2 link
}
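The main problem you were having was that the col-2 element is not nested inside the col-10 element, so the col-2 selector inside your loop could never match anything. Also note that attr("none") reads an attribute named "none", which does not exist on the link; use text() to get the link text.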
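Since the chapter title and its link sit in different branches of the same `li.upload-link` item, one way to keep them paired is to loop over those items and run both selectors relative to each item. This is a minimal sketch, assuming the simplified markup from the question; `ChapterPairSketch` and the `String[]` pairs are placeholders for the asker's `TMODatosSeleccion` class, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ChapterPairSketch {

    // Returns one {title, url} pair per <li class="... upload-link"> item.
    static List<String[]> extractChapters(String html) {
        List<String[]> result = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        for (Element item : doc.select("li.upload-link")) {
            // Both lookups are scoped to the current item.
            Element title = item.select("a.btn-collapse").first();
            Element link = item.select("ul.chapter-list a[href]").first();
            if (title == null || link == null) {
                continue; // skip items that do not match the assumed layout
            }
            result.add(new String[] { title.text(), link.attr("href") });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the markup in the question.
        String html = "<ul><li class=\"list-group-item upload-link\">"
                + "<h4><a class=\"btn-collapse\"> Chapter 120</a></h4>"
                + "<ul class=\"chapter-list\"><li>"
                + "<a href=\"https://lectortmo.com/view_uploads/599487\"></a>"
                + "</li></ul></li></ul>";
        for (String[] pair : extractChapters(html)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Scoping the selectors to each item avoids the index bookkeeping that two separate document-wide selections would need.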
I am trying to write a regex pattern that matches an HTML tag and the data between it and its closing tag, so that I can replace that data.
I want to use the pattern to locate the element in the HTML and then replace the content between the tags.
Does anybody know how to create a regex pattern for the HTML below?
My HTML file is like this:
<div id="frame">
<div class="content">
<div class="messages">
<ul>
<li class="sent">
<img src="http://emilcarlsson.se/assets/mikeross.png" alt="" />
<p>####data</p>
</li>
<li class="replies">
<img src="http://emilcarlsson.se/assets/harveyspecter.png" alt="" />
<p>####data</p>
</li>
</ul>
</div>
</div>
</div>
What I have done:
public void readWritedatatFromHtml(){
InputStream input;
try {
input = getResources().openRawResource(R.raw.view);
int size = input.available();
byte[] buffer = new byte[size];
input.read(buffer);
input.close();
String text = new String(buffer);
// Pattern tags = Pattern.compile ("<div class=\"content\">+<div class=\"messages\">+<ul>");
// Pattern tags = Pattern.compile ("<div class=\"content\">\n<div class=\"messages\">");
// Pattern tags = Pattern.compile ("<div class=\"content\">(.*?)<ul>");
Pattern tags = Pattern.compile ("<div class=\"messages\">.? </div>");
Matcher m = tags.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, " <ul> <li class=\"sent1\">\n" +
" <img src=\"http://emilcarlsson.se/assets/mikeross.png\" alt=\"\" />\n" +
" <p>####data</p>\n" +
" </li>");
}
m.appendTail(sb);
Log.i("sb",sb.toString());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Do not under any circumstances try to parse HTML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library; see this page for some ways to do it.
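As an illustration of the parser-based approach, here is a minimal jsoup sketch that replaces the contents of `<div class="messages">` without any regex. It assumes jsoup is on the classpath and that the file has already been read into a string; `MessagesReplaceSketch` and `replaceMessages` are made-up names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class MessagesReplaceSketch {

    // Replaces the inner HTML of the first <div class="messages"> and
    // returns the serialized contents of <body>. Returns the input
    // unchanged when no such div is found.
    static String replaceMessages(String html, String newInnerHtml) {
        Document doc = Jsoup.parse(html);
        Element messages = doc.select("div.messages").first();
        if (messages == null) {
            return html;
        }
        messages.html(newInnerHtml); // swaps the children, keeps the div itself
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the file in the question.
        String html = "<div id=\"frame\"><div class=\"content\">"
                + "<div class=\"messages\"><ul>"
                + "<li class=\"replies\"><p>####data</p></li>"
                + "</ul></div></div></div>";
        String replacement = "<ul><li class=\"sent1\"><p>####data</p></li></ul>";
        System.out.println(replaceMessages(html, replacement));
    }
}
```

Because the parser tracks nesting, this keeps working if further `<div>` elements are later added inside the messages block.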
Well, after trying some patterns I found one that works perfectly for me:
Pattern tags = Pattern.compile ("<div\\s+class=\"messages\">[\\S\\s]*?<\\/div>");
As @JGNI suggested, we should avoid this, but right now it meets my requirement. If anybody has a better option, please guide me so it can be helpful to others as well.
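For reference, a short sketch of why this pattern matches where the earlier attempt did not. In `.? </div>`, the `.?` matches at most one character, and `.` does not match line terminators by default. `[\S\s]*?` matches any character, newlines included, and stops lazily at the first `</div>`. That only works here because the messages div contains no nested `<div>`; with one, the match would end too early. The class and method names below are made up, and the pattern is the same one without the unneeded escape before `/`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessagesRegexSketch {

    // Lazily matches from the opening tag to the FIRST closing </div>,
    // across line breaks.
    private static final Pattern MESSAGES =
            Pattern.compile("<div\\s+class=\"messages\">[\\S\\s]*?</div>");

    // Replaces the first matched div (tags included) with the given text.
    static String replaceMessagesDiv(String html, String replacement) {
        Matcher m = MESSAGES.matcher(html);
        // quoteReplacement keeps '$' and '\' in the replacement literal
        // instead of treating them as group references or escapes.
        return m.replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">\n"
                + "<div class=\"messages\">\n"
                + "<ul><li class=\"sent\"><p>####data</p></li></ul>\n"
                + "</div>\n"
                + "</div>";
        System.out.println(replaceMessagesDiv(html, "<ul><li class=\"sent1\"></li></ul>"));
    }
}
```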
I want to fetch SomeTitle and somelink from the HTML code below for my Android app.
Please help me :(
<div class="proper-list list-group page-cat-wrap">
<figure class="col-md-12 thumb-vertical">
<div class="col-xs-4 thumb-image">
<a href="/somelink.html" class="image-hover">
<img alt="SomeTag" src="/storage/images/100/2382.jpg">
</a>
</div>
<figcaption class="col-xs-8">
<h3>
<a href="/somelink.html">
SomeTitle
</a>
</h3>
<p>
<a href="/secondlink.html">
SomeText
</a>
</p>
</figcaption>
<div class="clearfix"></div>
<div class="mobile-only icon-right">
<a href="/somelink.html">
<i class="fa fa-chevron-right" aria-hidden="true"></i>
</a>
</div>
I have heard of jsoup, but I wasn't able to get the links with it.
jsoup is a well-suited library for parsing any HTML content or document.
Here is the link and an example:
http://jsoup.org/
Example
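import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

private void parsehtmlPage() throws IOException {
    // Parse a local file; the third argument is the base URI for relative links.
    File input = new File("/yourFolder/home.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
    // getElementById returns null when no element has this id.
    Element elementId = doc.getElementById("elementId");
    Elements anchorLinks = elementId.getElementsByTag("a");
    for (Element link : anchorLinks) {
        String linkHref = link.attr("href"); // raw value of the href attribute
        String linkText = link.text();       // visible text of the link
    }
}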
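Applied to the markup in the question, which has no id to start from, a CSS selector may be simpler. This is a minimal sketch, assuming the title link is always the `<a>` directly inside the `<h3>` of the `<figcaption>`; the class name `TitleLinkSketch` and the base URI `http://example.com/` are placeholders, not values from the real site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TitleLinkSketch {

    // Returns {title, absoluteUrl} for the first "figcaption h3 > a" link,
    // or null when the page has no such link.
    static String[] extractTitleLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("figcaption h3 > a[href]").first();
        if (link == null) {
            return null;
        }
        // absUrl resolves the relative href against the base URI.
        return new String[] { link.text(), link.absUrl("href") };
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the markup in the question.
        String html = "<figure class=\"col-md-12 thumb-vertical\">"
                + "<figcaption class=\"col-xs-8\">"
                + "<h3><a href=\"/somelink.html\"> SomeTitle </a></h3>"
                + "<p><a href=\"/secondlink.html\"> SomeText </a></p>"
                + "</figcaption></figure>";
        String[] result = extractTitleLink(html, "http://example.com/");
        System.out.println(result[0] + " -> " + result[1]);
    }
}
```

When the document comes from `Jsoup.connect(url).get()`, the base URI is set from the request URL, so `absUrl("href")` works without passing one explicitly.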
I am trying to load only one div (selected by class) into my WebView. I don't know anything about PHP or CSS, so I can't work out how to select it by class name. I want to take the
<div class="container_wrap container_wrap_first main_color fullsize">
part here, but it is so complicated that I really don't know what to write in doc.select("div.HERE"). Thanks in advance.
Divs I Must Parse:
<div id="wrap_all">
<div class="mobil-logo">
<div id="main" data-scroll-offset="88">
<!--- header icerik sonu--->
<div class="container_wrap container_wrap_first main_color fullsize">
<div class="container">
And this is what I tried in Main.java:
// webview settings here
loadJsoup();
public void loadJsoup(){
try {
doc = Jsoup.connect("http://isvecehliyet.se/mobil").timeout(10000).get();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Element ele = doc.select("div.entry-content-wrapper").first();
String html = ele.toString();
String mime = "text/html";
String encoding = "utf-8";
mWebview.loadData(html, mime, encoding);
}
This works for me:
String url = "http://isvecehliyet.se/mobil/";
Document doc = Jsoup.connect(url).get();
Elements e = doc.select("div.container").first().parents();
System.out.println(e);
Part of output:
<div class="container_wrap container_wrap_first main_color fullsize">
<div class="container">
<main class="template-page content av-content-full alpha units" role="main" itemprop="mainContentOfPage">
<article class="post-entry post-entry [...]
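To target that wrapper directly rather than walking up from `div.container`, the class names can be chained in one selector: each `.name` adds a required class, with no spaces in between. This is a minimal sketch, assuming the wrapper keeps the `container_wrap` and `container_wrap_first` classes; `WrapperSelectSketch` is a made-up name and the HTML string stands in for the fetched page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class WrapperSelectSketch {

    // Returns the outer HTML of the first div that carries both classes,
    // or null when the page has no such div.
    static String selectWrapperHtml(String html) {
        Document doc = Jsoup.parse(html);
        Element wrap = doc.select("div.container_wrap.container_wrap_first").first();
        return wrap == null ? null : wrap.outerHtml();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the page structure.
        String html = "<div id=\"wrap_all\"><div id=\"main\">"
                + "<div class=\"container_wrap container_wrap_first main_color fullsize\">"
                + "<div class=\"container\">content</div>"
                + "</div></div></div>";
        System.out.println(selectWrapperHtml(html));
    }
}
```

The returned string could then be passed to `mWebview.loadData(html, mime, encoding)` as in the question, after a null check.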
I have this HTML:
<ul class="programList">
<li class="showing">
<div class="filterTime"></div>
<div class="filterGenre"></div>
<div class="outerSmallPoster">
<span class="posterBanner"></span>
<a href="/filmdatabase/06-juni/22-jump-street/">
<img src="/fileshare/filarkivroot/AuroraKino/Filmer/06%20-%20juni/22%20Jump%20Street/kynoefo11.jpg?width=160" class="smallPoster" /></a>
</div>
<div class="movieDescr">
<a class="movieTitle" href="/filmdatabase/06-juni/22-jump-street/">22 Jump Street</a>
<p class="movieDescription">Channing Tatum og Jonah Hill er tilbake i rollene som radarparet Jenko og Schmidt i oppfølgeren til...</p>
</div>
<div class="outerMovieDescription">
<div class="outerProgramTicketSale">
<button type="button" data-frames="underholdning" data-hour="21" href="http://91.207.226.164/ticketweb.php?sign=2&UserCenterID=100007&PaymentType=000&ShowID=484797&PaymentTypeSelection=&ErrorCode=0" target="_blank" onclick="openTicket(100007,484797);return false;" data-usercenterid="100007" data-showid="484797" class="programTime">
21:15
</button>
<span class="theater">Sal 6</span>
</div>
</div>
</li>
</ul>
This HTML is served at a URL that I am trying to parse from an Android app. To do so, I have written the following code:
protected Hashtable<String, Elements> doInBackground(Void... params) {
// Get all the movies from aurorakino and return a list of them
// to the postexecute method.
MainActivity.out("Bakgrunn");
Hashtable<String, Elements> map = new Hashtable<String, Elements>();
Document doc;
try {
doc = (Document) Jsoup.connect("http://fokus.aurorakino.no/billetter-og-program/").get();
Elements title = doc.select("a.movieTitle");
Elements desc = doc.select("p.movieDescription");
Elements image = doc.select("img.smallPoster");
MainActivity.out(title.size());
MainActivity.out("Vi er inne i try");
map.put("title", title);
map.put("desc", desc);
map.put("image", image);
return map;
}
catch (IOException e) {
// TODO Auto-generated catch block
MainActivity.out("Noe gikk galt");
}
return null;
}
This code is in an AsyncTask, and the task is running. I print some debug info at each stage of the AsyncTask so that I can see whether it is actually running, and it is.
My problem is that even though the HTML page has a few links like this:
<a class="movieTitle" href="...">movie title</a>
the code doesn't manage to find any of them. I print the element count and it says zero.
When I remove the class specification it finds 73 titles.
What am I doing wrong?
Many thanks
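One way to narrow this down is to count matches for progressively looser selectors against the HTML that jsoup actually received. This sketch rests on an unconfirmed assumption: that the fetched markup differs from what the browser shows, for example because the class is added by JavaScript or the server varies its output by user agent. `SelectorCountSketch` is a made-up name, and the `/filmdatabase/` prefix is taken from the HTML in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorCountSketch {

    // Number of elements in the given HTML that match the CSS selector.
    static int countMatches(String html, String selector) {
        Document doc = Jsoup.parse(html);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for doc.html() of the fetched page.
        String html = "<a class=\"movieTitle\" href=\"/filmdatabase/a/\">A</a>"
                + "<a href=\"/filmdatabase/b/\">B</a>"
                + "<a href=\"/other/\">C</a>";
        String[] selectors = {
            "a.movieTitle",            // class-based, as in the question
            "a[href^=/filmdatabase/]", // falls back to the link target
            "a"                        // every link
        };
        for (String selector : selectors) {
            System.out.println(selector + ": " + countMatches(html, selector));
        }
    }
}
```

If the class-based count is zero on the real page while the href-based count is not, logging `doc.html()` and searching it for `movieTitle` would show whether the class is present in the response at all.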