OK, what I want to achieve is to put each result JSoup fetches into a separate String. Is this possible? I can get the first and last with a function, but then the rest is lost.
Right now I have this in my doInBackground:
// Connect to the web site
Document document = Jsoup.connect(url).get();
// Using Elements to get the Meta data
Elements titleElement = document.select("h2[property=schema:name]");
// Locate the content attribute
date1 = titleElement.toString();
Log.e("Date", String.valueOf(Html.fromHtml(date1)));
With this I get a list of results, which is nice, but I'd like to have every result in a separate String.
Thanks in advance; if you need anything more, please ask :)
I read through the documentation carefully again and found this:
titleElement.eq(n).text()
eq(n) picks the result at position n (counting from 0), and text() strips all the HTML and returns readable text.
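To get every result into its own String rather than one at a time, a loop over the Elements works. A minimal sketch, assuming jsoup's Elements API (eq, text(), iteration); the class and method names are made up for illustration, and in the app the Document would still come from Jsoup.connect(url).get() inside doInBackground.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class TitleExtractor {

    // Returns the text of every matched h2, one String per result.
    static List<String> extractTitles(Document document) {
        Elements titleElements = document.select("h2[property=schema:name]");
        List<String> titles = new ArrayList<>();
        for (Element title : titleElements) {
            titles.add(title.text()); // text() strips the HTML tags
        }
        return titles;
    }

    // Returns only the n-th (zero-based) result, or null if there is none.
    static String titleAt(Document document, int n) {
        Elements titleElements = document.select("h2[property=schema:name]");
        if (n < 0 || n >= titleElements.size()) {
            return null;
        }
        return titleElements.eq(n).text();
    }
}
```

For offline testing, Jsoup.parse(htmlString) gives a Document from an inline string instead of a network fetch.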
I want to fetch some data from my website using Jsoup in my WebView. The website is still in development, so I can't post any code, but here's what I want to achieve:
The user visits the website, where all the data I need in the app is loaded onto one page. I want to fetch all that data as separate strings and use them to fill my TableLayout. On the website, each string sits in a p tag with a unique id.
How can I achieve this? I already have Jsoup installed, but I can't get my head around how to use it.
Generally, if you want to extract an element that has an id, use select("element_name#id_name").
To extract the text the element contains, use .text().
Also, show us the part of the HTML you want to extract the text from.
So try this code:
try {
Document doc = Jsoup.connect("your url").get();
System.out.println(doc.select("p#id_name").text());
} catch (Exception e) {
e.printStackTrace();
}
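Since every value sits in a p tag with a unique id, one option is to read them all in a single pass instead of one select per id. A minimal sketch, assuming the page has already been loaded into a jsoup Document; the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

class ParagraphsById {

    // Collects the text of every <p> that has an id attribute, keyed by that id.
    // LinkedHashMap keeps document order, which helps when filling table rows.
    static Map<String, String> collect(Document doc) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (Element p : doc.select("p[id]")) {
            byId.put(p.id(), p.text());
        }
        return byId;
    }
}
```

Each map entry can then be written into its own TextView in the TableLayout.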
I am making an Android app that displays stored HTML data using a WebView. The problem I am trying to overcome is how to ignore HTML/CSS tags and elements when searching for a user-input string. My DB is already 110MB, and I think adding another field with only the text and no HTML would just make the DB bigger. Regex would be expensive too, and may not be reliable.
Is there any other way to do it?
Maybe you can do additional filtering in your program on the queried records. You can use an HTML parser like Jsoup to strip the HTML tags and then search the remaining text. A simple Java example with Jsoup:
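import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.util.ArrayList;
import java.util.List;

class RecordFilter {
    // records: your queried records (potential results)
    static List<String> filterRecords(List<String> records, String searchTerm) {
        List<String> results = new ArrayList<String>();
        for (String r : records) {
            Document d = Jsoup.parse(r); // parse HTML
            String text = d.text(); // extract text
            if (text.contains(searchTerm)) { // or do your search here
                results.add(r);
            }
        }
        return results; // you got real results here
    }
}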
It may not be the best solution, but it is an option. It's probably expensive too, but more reliable than regular expressions (which you are trying to avoid).
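To make the reliability point concrete: a plain contains() on the stored HTML also hits tag names, attribute values and entity spellings, while jsoup's text() only sees what the reader sees. A small sketch contrasting the two checks; the class and method names are made up for illustration.

```java
import org.jsoup.Jsoup;

class VisibleTextMatch {

    // Naive check: also matches inside tag names, attributes and entities.
    static boolean rawContains(String html, String term) {
        return html.contains(term);
    }

    // jsoup check: tags are stripped and entities such as &amp; are decoded
    // before searching, so only the visible text is considered.
    static boolean visibleContains(String html, String term) {
        return Jsoup.parse(html).text().contains(term);
    }
}
```

For example, searching "note" in <span class="note">Opening hours</span> is a false hit for the raw check but not for the jsoup one.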
Update: the regex way
I think the only way to strip HTML tags while fetching is to use a regex in SQLite. For example, the following pattern should match a search term outside HTML tags:
(^|>)[^<]*(searchterm)[^<]*(<|$)
In the following example text it will match only the 1st, 3rd and 4th searchterm and not the 2nd:
searchterm <tag searchterm> searchterm </tag> searchterm
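The claim about the example text can be checked outside SQLite with java.util.regex. A minimal sketch; the search term is wrapped in Pattern.quote (an addition, not part of the original pattern) so that characters like '.' or '(' in user input are matched literally. Because each match runs up to the next '<', it reports at most one hit per run of text between tags, which is enough for a yes/no filter. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideTagsRegex {

    // The pattern from the answer, with the term quoted literally.
    static Pattern patternFor(String term) {
        return Pattern.compile("(^|>)[^<]*(" + Pattern.quote(term) + ")[^<]*(<|$)");
    }

    // Start offset of each occurrence the pattern accepts (group 2 is the term).
    static List<Integer> matchOffsets(String text, String term) {
        Matcher m = patternFor(term).matcher(text);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(m.start(2));
        }
        return offsets;
    }
}
```

On the example text this yields offsets 0, 28 and 46: the 1st, 3rd and 4th occurrence, skipping the one inside <tag ...>.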
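In SQLite you can use regular expressions this way:
WHERE column-name REGEXP 'regular-expression'
Note that SQLite only provides the REGEXP syntax: no regexp() function is defined by default, so the application has to register one, otherwise the query fails with an error.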
I am writing an Android app that gets data from a server in JSON format. Right now I get this value as a string, but in my application it must look like:
Route:
1)first point
2)second point
3).....
n) n point
I read that in Android a TextView can do this if the string contains HTML tags, but I don't think that is the best option. After Android I must do the same on iPhone, and I don't know how to do it there. Sending the routes as an array is not a good option either. Can you tell me the best way to solve this problem?
You will have to find the right regex pattern to split the string.
Once you have the separate strings, just use a ListView with an ArrayAdapter.
I am not very good with regex, but I think it should be something like: [1-9][0-9]) [[a-f][0-9]]+
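As written, that pattern would not compile in Java: the ) after [0-9] is unescaped, so Pattern.compile rejects it as an unmatched closing parenthesis. A minimal sketch of a working pattern, assuming each numbered point arrives on its own line as in the example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoutePoints {

    // "1)first point" -> group 2 = "first point".
    // \d+ also accepts multi-digit numbers such as "10)";
    // [^\n]+ stops at the end of the line.
    private static final Pattern ITEM = Pattern.compile("(\\d+)\\)[ \\t]*([^\\n]+)");

    static List<String> extract(String route) {
        List<String> points = new ArrayList<>();
        Matcher m = ITEM.matcher(route);
        while (m.find()) {
            points.add(m.group(2).trim()); // trim drops a trailing '\r', if any
        }
        return points;
    }
}
```

The resulting list can be passed straight to an ArrayAdapter for the ListView.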
I couldn't comment because of my reputation, sorry. Could you provide an example of the returned JSON string? I think the JSON format can be parsed easily.
If this is the case, you can parse it in a loop (or some other way; I'm not that good at it):
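// Splits "Route: 1)first point 2)second point ..." into the individual points.
// \d+ rather than \d, so that "10)" is split correctly too.
String[] parseIt(String json) {
    String[] list = json.split("\\d+\\)");
    if (list.length <= 1) {
        return new String[0]; // no numbered points found
    }
    // list[0] is the text before "1)" (e.g. "Route:"), so skip it
    String[] rlist = new String[list.length - 1];
    for (int i = 0; i < list.length - 1; i++) {
        rlist[i] = list[i + 1].trim(); // remove surrounding whitespace
    }
    return rlist;
}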
This might do the trick, but check the result; I haven't tested it yet.
Edit: I edited the code. It now returns just the addresses, with the surrounding whitespace removed by String's trim() method, e.g.
list[1].trim();
done in the loop, skipping the first element (index 0).
Edit 2: Now it should work.
I'd like to preface my question with an apology, which will make this a two-part question... double apology.
I am struggling with JSoup (again). First apology: for repeatedly asking and not learning well enough yet. Can anyone suggest some reading, beyond the usual searches, that will help me understand how to decipher the DOM each time I try this?
If you are still inclined to help: this time, within the returned doc I have:
<span id="priceProductQA1" class="productPrice">$29.99</span>
and I want to grab the href and the price "29.99".
I've tried
doc = Jsoup.connect(srchStr).get();
for (Element choices : doc.select("a:has(.productPrice)")) {
    absHref = choices.attr("abs:href");
    String pricetxt = choices.text();
}
and about 10 other ways, to no avail. Any better ideas?
Here's another solution:
for( Element element : doc.select("span.productPrice") ) // Select all 'span' tags with a 'productPrice' class and iterate over them
{
final String price = element.text(); // save the price (= text of element)
final String link = element.parent().absUrl("href"); // Get the 'a' tag (= parent of 'span' tag) and its absolute (!) URL
// ...
}
Explanation:
1. Select the span tag, because you can easily tell whether it's the one you need (it has a class, while the a tag has none).
2. Iterate over each element found in step 1.
3. Get the price from the element's text.
4. Select the parent of the span tag, since it holds the required URL.
5. Get the absolute URL; if you want the relative one, use attr("href") instead.
By the way, if you are 100% sure there's only one such element on the page, you can replace the for loop with Element element = doc.select("span.productPrice").first();, followed by the other two lines of code.
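A self-contained version of the loop above, assuming (as the answer does) that the price span sits directly inside the a tag; the question only shows the span, so that structure is an assumption. It also strips the "$" to get "29.99". The class and field names are made up, and a link is only read when the parent really is an a tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ProductPrices {

    // One (link, price) pair per span.productPrice on the page.
    static final class Offer {
        final String link;
        final String price;

        Offer(String link, String price) {
            this.link = link;
            this.price = price;
        }
    }

    static List<Offer> extract(Document doc) {
        List<Offer> offers = new ArrayList<>();
        for (Element span : doc.select("span.productPrice")) {
            String price = span.text().replace("$", "").trim(); // "$29.99" -> "29.99"
            Element parent = span.parent();
            // absUrl needs a base URI: Jsoup.connect(...).get() takes it from the
            // request URL, Jsoup.parse(html, baseUri) sets it explicitly.
            String link = (parent != null && parent.tagName().equals("a"))
                    ? parent.absUrl("href")
                    : "";
            offers.add(new Offer(link, price));
        }
        return offers;
    }
}
```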
I have a question:
I have a link: http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)
and I want to take only some specific data from this page and show it in a TextView in Android.
Is this possible in Android? Is there any way to do it, by parsing or otherwise? Any suggestions are welcome.
For example, I just want to take the column Nästa tur (min) from that site.
Regards
JSoup is pretty nice and getting popular. Here's how you could just parse the whole table:
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page from the question (3000 ms timeout)
URL url = new URL("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)");
Document doc = Jsoup.parse(url, 3000);
Element table = doc.select("table[title=Avgångar:]").first();
if (table != null) {
    Elements tds = table.select("td");
    // we know the third td element is where we want to start, and the
    // wanted column then repeats every three td elements
    for (int i = 2; i < tds.size(); i += 3) {
        Element td = tds.get(i);
        // do whatever you want with the td element here, e.g. td.text()
    }
}
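Counting td elements breaks as soon as the table's column count changes. A more defensive sketch locates the Nästa tur (min) column by its header text and reads that cell from each row. It assumes the header row uses th cells, which has not been verified against the live page; the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class ColumnByHeader {

    // Returns the text of every cell in the column whose header text equals
    // `header`, or an empty list if the table or the header is missing.
    static List<String> column(Document doc, String tableSelector, String header) {
        List<String> values = new ArrayList<>();
        Element table = doc.select(tableSelector).first();
        if (table == null) {
            return values;
        }
        int index = -1;
        for (Element row : table.select("tr")) {
            Elements headers = row.select("th");
            if (index < 0 && !headers.isEmpty()) {
                // header row: remember which column holds the wanted header
                for (int i = 0; i < headers.size(); i++) {
                    if (headers.get(i).text().equals(header)) {
                        index = i;
                    }
                }
                continue;
            }
            Elements cells = row.select("td");
            if (index >= 0 && index < cells.size()) {
                values.add(cells.get(index).text());
            }
        }
        return values;
    }
}
```

Usage against the real page would be ColumnByHeader.column(doc, "table[title=Avgångar:]", "Nästa tur (min)").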
If the parsing seems simple enough, you could also use regular expressions to find the right part of the HTML; regular expressions will be useful to know at some point anyway. Using an XML/HTML parsing library (XMLReader, for example) is the more flexible way to do it.