Hi, I have searched the internet and Stack Overflow all day without finding a solution! :(
My problem is:
I need to show some schedules in an app, and the data is stored on a web page. Normally, on a PC, you visit the page, enter your ID, and it shows the schedule.
But how do I get to the schedule without interacting with a WebView or something similar? I need to save some specific HTML data after logging in.
I have tried with Jsoup, but after login the URL changes and I don't know how to get it. So I tried with a WebView, but that didn't work either.
Please help :)
public class getHTML extends AsyncTask<Void, Void, Void> {
String words;
@Override
protected Void doInBackground(Void... params) {
try {
final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
final String FORM_URL = "http://timetable.scitech.au.dk/apps/skema/VaelgElevskema.asp?webnavn=skema";
final String SCHEDULE_URL = "http://timetable.scitech.au.dk/apps/skema/ElevSkema.asp";
final String USERID = "201506426";
// # Go to search page
Connection.Response loginFormResponse = Jsoup.connect(FORM_URL)
.method(Connection.Method.GET)
.userAgent(USER_AGENT)
.execute();
// # Fill the search form
FormElement loginForm = (FormElement)loginFormResponse.parse()
.select("form").first();
// ## ... then "type" the id ...
Element loginField = loginForm.select("input[name=aarskort]").first();
loginField.val(USERID);
// # Now send the form
Connection.Response loginActionResponse = loginForm.submit()
.cookies(loginFormResponse.cookies())
.userAgent(USER_AGENT)
.execute();
// # go to the schedule
Connection.Response someResponse = Jsoup.connect(SCHEDULE_URL)
.method(Connection.Method.GET)
.userAgent(USER_AGENT)
.execute();
// # print out the body
Element el = someResponse.parse()
.select("body").first();
words = el.text();
System.out.println(loginActionResponse.parse().html());
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void aVoid) {
super.onPostExecute(aVoid);
tv.setText(words);
Toast.makeText(MainActivity.this, "IT WORKS!", Toast.LENGTH_SHORT).show();
}
}
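One possible cause: the third request (to SCHEDULE_URL) is sent without any cookies, so if the site remembers the entered ID in a session cookie (an assumption; the question does not confirm it), the server no longer knows whose schedule to show. A minimal sketch that merges the cookies from the first two responses and sends them with the schedule request. It also prints the submit response's url(), which is the address Jsoup ended up at after following redirects, in case the schedule is already in the form-submission response. The helper names are illustrative, not from the original code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class ScheduleFetcher {

    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
    static final String FORM_URL = "http://timetable.scitech.au.dk/apps/skema/VaelgElevskema.asp?webnavn=skema";
    static final String SCHEDULE_URL = "http://timetable.scitech.au.dk/apps/skema/ElevSkema.asp";

    /** Combines two cookie maps; cookies in the second map win, as in a browser. */
    static Map<String, String> mergeCookies(Map<String, String> first, Map<String, String> second) {
        Map<String, String> merged = new HashMap<>(first);
        merged.putAll(second);
        return merged;
    }

    /** Same three requests as the question, but the cookies travel along. Run off the UI thread. */
    static String fetchSchedule(String userId) throws IOException {
        Connection.Response formResponse = Jsoup.connect(FORM_URL)
                .userAgent(USER_AGENT)
                .execute();

        FormElement form = (FormElement) formResponse.parse().select("form").first();
        form.select("input[name=aarskort]").first().val(userId);

        Connection.Response submitResponse = form.submit()
                .cookies(formResponse.cookies())
                .userAgent(USER_AGENT)
                .execute();
        // Final URL after redirects; this page may already contain the schedule.
        System.out.println("Landed on: " + submitResponse.url());

        Map<String, String> cookies = mergeCookies(formResponse.cookies(), submitResponse.cookies());
        Document schedule = Jsoup.connect(SCHEDULE_URL)
                .cookies(cookies)
                .userAgent(USER_AGENT)
                .get();
        return schedule.body().text();
    }
}
```

If the printed URL already shows the schedule, parsing submitResponse directly may be enough and the third request can be dropped.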
I've been using Jsoup to fetch certain words from a Google search, but as far as I can tell it fails in the Jsoup query step.
It successfully enters the doInBackground method, but it won't print the title and body of each search result.
My guess is that the list I get from doc.select (links) is empty,
which points to a problem with the query syntax.
value is the search keyword; in my case it's a barcode, and the search itself works. Here's the link
Here is the async call from another class:
String url = "https://www.google.com/search?q=";
if (!value.isEmpty())
{
url = url + value + " price" + "&num10";
Scrape_Asynctasks task = new Scrape_Asynctasks();
task.execute(url);
}
And here is the async task itself:
public class Scrape_Asynctasks extends AsyncTask<String, Integer, String>
{
@Override
protected void onPreExecute() {
super.onPreExecute();
}
@Override
protected String doInBackground(String... strings) {
try
{
Log.i("IN", "ASYNC");
final Document doc = Jsoup
.connect(strings[0])
.userAgent("Jsoup client")
.timeout(5000).get();
Elements links = doc.select("li[class=g]");
for (Element link : links)
{
Elements titles = link.select("h3[class=r]");
String title = titles.text();
Elements bodies = link.select("span[class=st]");
String body = bodies.text();
Log.i("Title: ", title + "\n");
Log.i("Body: ", body);
}
}
catch (IOException e)
{
Log.i("ERROR", "ASYNC");
}
return "finished";
}
@Override
protected void onProgressUpdate(Integer... values) {
super.onProgressUpdate(values);
}
@Override
protected void onPostExecute(String s) {
super.onPostExecute(s);
}
}
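Separately from the selector problem, the URL built in the async call contains a raw space (" price") and "&num10", which is probably meant to be the num=10 results parameter. A minimal sketch of building the same query with the search terms URL-encoded, assuming that "barcode + price, 10 results" is the intent. It uses the URLEncoder overload that takes a charset name, which is available on every Android version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrl {

    static final String BASE = "https://www.google.com/search?q=";

    /** Builds the search URL for "<value> price" with the terms URL-encoded (spaces become '+'). */
    static String build(String value) {
        try {
            String query = URLEncoder.encode(value + " price", "UTF-8");
            return BASE + query + "&num=10";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("123")); // https://www.google.com/search?q=123+price&num=10
    }
}
```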
Don't use "Jsoup client" as your user-agent string. Use the same string as your browser, e.g. "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0". Some sites (including Google) may block unfamiliar user agents or serve them different content.
Your first selector should be .g: Elements links = doc.select(".g");
The site uses JavaScript, so you will not get all the results you see in your browser.
You can disable JS in your browser and see the difference.
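Why .g finds more than li[class=g]: the attribute selector matches only when the whole class attribute equals "g", while .g matches any element that has g among its classes. A small offline sketch on hypothetical markup (not Google's real HTML, which changes often and is partly built by JavaScript):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ClassSelectorDemo {

    // Hypothetical markup: one result with only class "g", one with an extra class.
    static final String HTML =
            "<div id='search'>"
          + "<li class='g'><h3>First</h3></li>"
          + "<div class='g tF2Cxc'><h3>Second</h3></div>"
          + "</div>";

    /** Elements whose class attribute is exactly "g". */
    static int countExact(Document doc) {
        return doc.select("li[class=g]").size();
    }

    /** Elements that have "g" as one of their classes, whatever the tag. */
    static int countByClass(Document doc) {
        return doc.select(".g").size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(countExact(doc) + " vs " + countByClass(doc)); // 1 vs 2
    }
}
```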
I have a JPG image on a site (I suppose the page is HTML, but I am not sure). When I open the page source, I see the image I need written this way.
If I open that link, it shows me the image.
But I don't know how to get this link from the page URL. It doesn't look like it's written in JSON format.
How can I get it?
Thanks
Bar.
After some experimenting, I arrived at this:
The meta tags are the elements, and property (with the value og:image) and content are their attributes.
So I do the following to get the image URL string:
String imageLink=null;
try {
Log.d(TAG, "Connecting to [" + strings[0] + "]");
Document doc = Jsoup.connect(strings[0]).get(); // put all the HTML page in Document
// Get meta info
Elements metaElems = doc.select("meta");
for (Element metaElem : metaElems) {
String property = metaElem.attr("property");
if(property.equals("og:image"))// if find the line with the image
{
imageLink = metaElem.attr("content");
Log.d(TAG, "Image URL" + imageLink );
}
}
} catch (Exception e) {
e.printStackTrace();
exception =e;
return null;
}
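The loop above can also be written as a single attribute selector. A minimal sketch that parses a hypothetical HTML string instead of fetching a page; it returns null when there is no og:image tag. Note that select() returns an empty list rather than null, so the null check belongs on first().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OgImage {

    /** Returns the content of <meta property="og:image">, or null if the page has none. */
    static String ogImage(Document doc) {
        Element meta = doc.select("meta[property=og:image]").first();
        return meta == null ? null : meta.attr("content");
    }

    public static void main(String[] args) {
        // Hypothetical page head; for a live page use Jsoup.connect(url).get() off the UI thread.
        Document doc = Jsoup.parse("<html><head>"
                + "<meta property='og:title' content='Example'>"
                + "<meta property='og:image' content='https://example.com/photo.jpg'>"
                + "</head><body></body></html>");
        System.out.println(ogImage(doc)); // https://example.com/photo.jpg
    }
}
```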
Here is a small code snippet for integrating this kind of functionality; I hope it helps.
Step 1: Add the following Gradle dependency:
implementation 'org.jsoup:jsoup:1.10.2'
Step 2:
Use the following AsyncTask to get all the meta information from any URL.
public class MainActivity extends AppCompatActivity {
private ImageView imgOgImage;
private TextView text;
String URL = "https://www.youtube.com/watch?v=ufaK_Hd6BpI";
String UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
text = (TextView) findViewById(R.id.text);
imgOgImage = (ImageView) findViewById(R.id.imgOgImage);
new FetchMetadataFromURL().execute();
}
private class FetchMetadataFromURL extends AsyncTask<Void, Void, Void> {
String websiteTitle, websiteDescription, imgurl;
@Override
protected void onPreExecute() {
super.onPreExecute();
}
@Override
protected Void doInBackground(Void... params) {
try {
// Connect to website
Document document = Jsoup.connect(URL).get();
// Get the html document title
websiteTitle = document.title();
//Here It's just print whole property of URL
Elements metaElems = document.select("meta");
for (Element metaElem : metaElems) {
String property = metaElem.attr("property");
Log.e("Property", "Property =" + property + " \n Value =" + metaElem.attr("content"));
}
// Locate the page description (meta name="description")
websiteDescription = document.select("meta[name=description]").attr("content");
Elements metaOgImage = document.select("meta[property=og:image]");
// select() never returns null, so check for an empty result before calling first()
if (!metaOgImage.isEmpty()) {
imgurl = metaOgImage.first().attr("content");
System.out.println("src :<<<------>>> " + imgurl);
}
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void result) {
text.setText("Title : " + websiteTitle + "\n\nImage Url :: " + imgurl);
//t2.setText(websiteDescription);
Picasso.with(getApplicationContext()).load(imgurl).into(imgOgImage);
}
}
}
Note: This is just a rough demo that doesn't follow any coding standards, so please take care when you integrate this code into your application. It is meant for learning purposes only.
I used a YouTube URL here to display its metadata; you can use any URL you need.
I hope the logic is clear.
Good luck!
Well, I have been working on an app that displays news headlines and content from the site http://www.myagdikali.com
I am able to extract the data from 'myagdikali.com/category/news/national-news/', but there are only 10 posts on that page, plus links to other pages numbered 1, 2, 3, ..., such as myagdikali.com/category/news/national-news/page/2.
All I need to know is: how do I extract news from every page under /national-news? Is it even possible with Jsoup?
So far, my code to extract data from a single page is:
public View onCreateView(LayoutInflater inflater, ViewGroup container,
Bundle savedInstanceState) {
View rootView = inflater.inflate(R.layout.fragment_all, container, false);
int i = getArguments().getInt(NEWS);
String topics = getResources().getStringArray(R.array.topics)[i];
switch (i) {
case 0:
url = "http://myagdikali.com/category/news/national-news";
new NewsExtractor().execute();
break;
.....
[EDIT]
private class NewsExtractor extends AsyncTask<Void, Void, Void> {
String title;
@Override
protected Void doInBackground(Void... params) {
while (status == OK) {
currentURL = url + String.valueOf(page);
try {
response = Jsoup.connect(currentURL).execute();
status = response.statusCode();
if (status == OK) {
Document doc = response.parse();
Elements urlLists = doc.select("a[rel=bookmark]");
for (org.jsoup.nodes.Element urlList : urlLists) {
String src = urlList.text();
myLinks.add(src);
}
title = doc.title();
}
} catch (IOException e) {
e.printStackTrace();
}
page++;
}
return null;
}
EDIT:
When I extract data from a single page without the loop, it works. But after adding the while loop, I get an error saying "No adapter attached".
I am loading the extracted data into a RecyclerView, and onPostExecute looks like this:
@Override
protected void onPostExecute(Void aVoid) {
layoutManager = new LinearLayoutManager(getActivity());
recyclerView.setLayoutManager(layoutManager);
myRecyclerViewAdapter = new MyRecyclerViewAdapter(getActivity(),myLinks);
recyclerView.setAdapter(myRecyclerViewAdapter);
}
Since you know the URL pattern of the pages you need, http://myagdikali.com/category/news/national-news/page/X (where X is the page number, currently between 2 and 446), you can loop through the URLs. You'll also need to check Jsoup's response to make sure the page exists (the last page number is not fixed; I believe it increases as new posts are added).
The code should be something like this:
final String URL = "http://myagdikali.com/category/news/national-news/page/";
final int OK = 200;
String currentURL;
int page = 2;
int status = OK;
Connection.Response response = null;
Document doc = null;
while (status == OK) {
currentURL = URL + String.valueOf(page); //add the page number to the url
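response = Jsoup.connect(currentURL)
.userAgent("Mozilla/5.0")
.ignoreHttpErrors(true) //return 4xx/5xx responses instead of throwing, so the status check below can end the loop
.execute(); //you may also add a timeout etc.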
status = response.statusCode();
if (status == OK) {
doc = response.parse();
//extract the info. you need
}
page++;
}
This is of course not fully working code: you'll have to add try-catch blocks, but the compiler will point out where.
Hope this helps you.
EDIT:
1. I've edited the code: I had to send a userAgent string to get a response from the server.
2. The code runs on my machine; it prints lots of ???? because I don't have the proper fonts installed.
3. The error you're getting comes from the Android part, something to do with your views. You haven't posted that piece of code.
4. Try adding the userAgent; it might solve it.
5. Please add the error and the code you're running to the original question by editing it; it's much more readable there.
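One detail affects both loops: by default, Jsoup's execute() throws HttpStatusException (an IOException) on a 4xx/5xx response unless ignoreHttpErrors(true) is set, so statusCode() never gets to report the 404 that should end the loop. In the question's loop the exception is caught inside the while, status is never updated, and page keeps increasing, so the loop may never finish; that could be why onPostExecute (and setAdapter) never runs, though this is a guess. The url set in onCreateView also has no /page/ segment, so url + page produces addresses like .../national-news2. Below is a sketch of the loop with an explicit page cap, written against a hypothetical PageSource interface so the stopping logic can be checked without a network; a Jsoup-backed PageSource is shown in a comment.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PagedCrawler {

    static final int OK = 200;

    /** One fetched page: its HTTP status and the headlines found on it. */
    static final class Page {
        final int status;
        final List<String> headlines;

        Page(int status, List<String> headlines) {
            this.status = status;
            this.headlines = headlines;
        }
    }

    /** Hypothetical seam around the HTTP request, so the loop can run without a network. */
    interface PageSource {
        Page fetch(String url) throws IOException;
    }

    /**
     * Fetches base (page 1), then base + "page/2", base + "page/3", ... and stops at the
     * first non-200 status, the first IOException, or after maxPages pages.
     */
    static List<String> crawl(String base, PageSource source, int maxPages) {
        List<String> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String url = page == 1 ? base : base + "page/" + page;
            Page result;
            try {
                result = source.fetch(url);
            } catch (IOException e) {
                break; // treat a network error like the end of the listing
            }
            if (result.status != OK) {
                break;
            }
            all.addAll(result.headlines);
        }
        return all;
    }

    // With Jsoup, a real PageSource might look like this:
    //
    //   PageSource jsoupSource = url -> {
    //       Connection.Response r = Jsoup.connect(url)
    //               .userAgent("Mozilla/5.0")
    //               .ignoreHttpErrors(true)
    //               .execute();
    //       List<String> titles = new ArrayList<>();
    //       if (r.statusCode() == OK) {
    //           for (Element a : r.parse().select("a[rel=bookmark]")) {
    //               titles.add(a.text());
    //           }
    //       }
    //       return new Page(r.statusCode(), titles);
    //   };
}
```

Calling crawl("http://myagdikali.com/category/news/national-news/", jsoupSource, 500) from doInBackground and passing the result to the adapter in onPostExecute keeps the network work off the UI thread; the cap of 500 is an arbitrary safety limit.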
I want to display a ski resort's snowfall in the past 24 hours in a TextView. I used the CSS path and tried other approaches, but nothing happens: the TextView doesn't display anything.
The web page: http://www.arizonasnowbowl.com/resort/snow_report.php
The CSS path: #container > div.right > table.interior > tbody > tr:nth-child(2) > td.infoalt
private class Description extends AsyncTask<Void, Void, Void> {
String desc;
@Override
protected void onPreExecute() {
super.onPreExecute();
mProgressDialog = new ProgressDialog(Snowreport.this);
mProgressDialog.setTitle("Snow Report");
mProgressDialog.setMessage("Loading...");
mProgressDialog.setIndeterminate(false);
mProgressDialog.show();
}
@Override
protected Void doInBackground(Void... params) {
try {
// Connect to the web site
Document document = Jsoup.connect(url).get();
Elements elms = document.select("td.infoalt");
for(Element e:elms)
if(e.className().trim().equals("infoalt"))
//^^^ trim() is required, as
// there can be leading and trailing spaces
{
TextView txtdesc = (TextView) findViewById(R.id.snowp24);
txtdesc.setText((CharSequence) e);
}
mProgressDialog.dismiss();
} catch (IOException e1) {
e1.printStackTrace();
}
return null;
}
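Two things in the code above are likely to fail at runtime: views (setText, mProgressDialog.dismiss()) are touched inside doInBackground, which does not run on the UI thread, and (CharSequence) e casts a Jsoup Element, which does not implement CharSequence, so the cast would throw ClassCastException. A minimal sketch of the usual split, assuming the CSS path from the question still matches the page: a plain parsing method returns a String, doInBackground calls it, and onPostExecute sets the TextView. The HTML in main is hypothetical markup shaped like that path.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SnowReport {

    /**
     * Returns the text of the td.infoalt cell in the second row of table.interior,
     * or null if it is missing. Pure parsing, so it is safe to call from doInBackground();
     * pass the result to the TextView in onPostExecute().
     */
    static String snowLast24h(Document doc) {
        Element cell = doc.select("table.interior tr:nth-child(2) td.infoalt").first();
        return cell == null ? null : cell.text().trim();
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page layout may differ.
        Document doc = Jsoup.parse("<div id='container'><div class='right'>"
                + "<table class='interior'>"
                + "<tr><td class='info'>Base depth</td><td class='infoalt'>40\"</td></tr>"
                + "<tr><td class='info'>Past 24 hours</td><td class='infoalt'> 6\" </td></tr>"
                + "</table></div></div>");
        System.out.println(snowLast24h(doc)); // 6"
    }
}
```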
The code:
Element div = doc.getElementById("contentinterior");
Elements tables = div.getElementsByTag("table");
Element table = tables.get(1);
String mSnow = table.getElementsByTag("tr").get(1).getElementsByTag("td").get(1).text();
You may be passing the wrong selector string. You can find the right selector for Document.select() by inspecting the element on the web page, which is most easily done by right-clicking it in Chrome and choosing Inspect.
The following code may produce a better result for you:
final Elements tableElements = response.parse()
.getElementsByClass("info")
.select("td");
for (Element element : tableElements) {
String string = element.getElementsByClass("infoalt").text().trim();
Log.d("Jsoup", string);
}
Good luck and happy coding!
I'm trying to parse HTML with the Jsoup library. Everything works perfectly, except that one thing doesn't display.
Code:
protected ArrayList<Order> doInBackground(String... urls) {
listItems.clear();
myAdapterDouble.notifyDataSetChanged();
String url = null;
try {
Document doc = Jsoup.connect(URL).timeout(0).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6").get();
Elements days = doc.select("div.day_now");
for (Element day : days) {
dd = day.select("div.tooltip");
for (Element d : dd) {
title = d.select("td.tooltip_title h4").text();
time = d.select("td.tooltip_info h4").text();
img = d.select("td.tooltip_desc img[src]");
Order o = new Order();
o.setLink(URL + img.attr("src"));
o.setTextName(title);
o.setTextTime(time
.replace("on", getResources().getString(R.string.on))
.replace("at", getResources().getString(R.string.at))
.replace("Ep:", getResources().getString(R.string.episode))
.replace("Final", getResources().getString(R.string.final_ep)));
o.setDetailsUrl(URL + url); //set urls text in list
listItems.add(o);
}
Elements links = day.select("h3");
for (Element link : links) {
url = link.select("a").attr("href"); // parse page urls
System.out.println(url); //display urls in LogCat
}
}
} catch (IOException e) {
e.printStackTrace();
}
return listItems;
}
In LogCat I see the URLs that I parse in the code above:
01-20 12:13:17.671: I/System.out(23390): /show/678/AKB0048_next_stage
01-20 12:13:17.671: I/System.out(23390): /show/668/Battle_Spirits%3A_Sword_Eyes
01-20 12:13:17.671: I/System.out(23390): /show/694/Beast_Saga
01-20 12:13:17.671: I/System.out(23390): /show/660/Cross_Fight_B-Daman_eS
But these links are not displayed on the screen; instead I get null.
What am I doing wrong?
Thanks.
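Currently you are not adding the URL to listItems: each Order is built in the tooltip loop before url is assigned in the h3 loop, so setDetailsUrl(URL + url) runs with url still null (or with the previous day's last link). Change your code as follows to get the URL:
ArrayList<Order> newarraylist = new ArrayList<Order>(); // declare this before the for (Element day : days) loop so it can be returned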
Elements links = day.select("h3");
int urlcount=0;
for (Element link : links) {
url = link.select("a").attr("href"); // parse page urls
System.out.println(url); //display urls in LogCat
if(urlcount < listItems.size()){
Order o = (Order)listItems.get(urlcount);
o.setDetailsUrl(URL + url); //set urls text in list
newarraylist.add(o);
}
urlcount++;
}
Now return newarraylist from doInBackground instead of listItems.
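The index-based pairing above assumes the i-th h3 link belongs to the i-th Order. Because urlcount restarts at 0 for every day while listItems holds the orders from all days, the two can drift apart once there is more than one day. A sketch that pairs them within each day instead, assuming (not verified against the real site) that inside each div.day_now the h3 links and div.tooltip blocks appear in the same order. Item is a hypothetical stand-in for the question's Order class.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DayParser {

    /** Minimal stand-in for the question's Order class (hypothetical fields). */
    static final class Item {
        final String title;
        final String detailsUrl;

        Item(String title, String detailsUrl) {
            this.title = title;
            this.detailsUrl = detailsUrl;
        }
    }

    /**
     * Pairs the i-th div.tooltip with the i-th h3 link inside the same div.day_now,
     * so each item gets its own URL instead of whatever url held when it was created.
     */
    static List<Item> parse(Document doc, String baseUrl) {
        List<Item> items = new ArrayList<>();
        for (Element day : doc.select("div.day_now")) {
            Elements tooltips = day.select("div.tooltip");
            Elements links = day.select("h3 a[href]");
            int n = Math.min(tooltips.size(), links.size()); // ignore unmatched leftovers
            for (int i = 0; i < n; i++) {
                String title = tooltips.get(i).select("td.tooltip_title h4").text();
                String href = links.get(i).attr("href");
                items.add(new Item(title, baseUrl + href));
            }
        }
        return items;
    }
}
```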