I would like to parse out some text from a page.
Is there an easy way to save the product info into a string, for example? Example URL: http://upcdata.info/upc/7310870008741
Thanks
Jsoup is excellent at parsing simple HTML from Android applications:
http://jsoup.org/
To get the page, just do this:
URL url = new URL("http://upcdata.info/upc/7310870008741");
Document document = Jsoup.parse(url, 5000);
Then you can parse out whatever you need from the Document. Check out this link for a brief description of how to extract parts of the page:
http://jsoup.org/cookbook/extracting-data/dom-navigation
If you want to read from a URL into a String:
StringBuilder myString = new StringBuilder();
try {
    URL u = new URL("http://www.google.com");
    // DataInputStream.readLine() is deprecated, so read through a BufferedReader instead.
    BufferedReader reader = new BufferedReader(new InputStreamReader(u.openStream(), "UTF-8"));
    String thisLine;
    while ((thisLine = reader.readLine()) != null) {
        myString.append(thisLine).append('\n');
    }
    reader.close();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
// Call toString() on myString to get the contents of the page your URL is
// pointing to.
This will give you a plain old string, HTML markup and all. If you only want the text, Android's Html.fromHtml() can strip the markup:
String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();
I have this URL and I want to fetch all the data behind it into an Android ListView. I only know how to retrieve data from a JSON object, and here I don't even know what format the data is in.
The format of the data is:
tvg-logo = URL of the channel logo
group-title = category in which the channel should be displayed (only for movies, not for TV)
After the "," you have the name of the channel
And after the name you have the URL of the video
How can I parse the data from the URL so that I can build a list view from it?
I think you need to split the text on special characters and keep the pieces in an array. For example, the special character might be a space, "," or "#".
I hope this helps.
This function fetches the data from the URL; you can then split it as required and populate the UI.
void fetchDataFromUrl() {
try {
URL oracle = new URL("http://cinecosta.com/api_tv.php?pass=yojeju123");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The result actually seems easy to parse. Just look at the pattern:
#SOMETHING tvg-logo="logo" tvg-categorie="something"
Use a regex to split on the pattern you want.
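As a rough sketch of the regex idea, the snippet below matches one entry at a time and pulls out the logo, category, name and stream URL. The class and field names are made up for illustration, and the pattern assumes every entry looks like the sample data in this thread (tvg-logo first, then tvg-categorie, then a comma, the name, and the URL); adjust it if the real feed differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaylistRegexSketch {

    /** One playlist entry; the field names are illustrative. */
    static final class Channel {
        final String logo, category, name, url;

        Channel(String logo, String category, String name, String url) {
            this.logo = logo;
            this.category = category;
            this.name = name;
            this.url = url;
        }
    }

    // Assumed entry shape: #EXTINF:-1 tvg-logo="..." tvg-categorie="...",NAME URL
    private static final Pattern ENTRY = Pattern.compile(
            "#EXTINF:-?\\d+\\s+tvg-logo=\"([^\"]*)\"\\s+tvg-categorie=\"([^\"]*)\",(.*?)\\s+(https?://\\S+)");

    /** Returns every entry found in the playlist text, in order of appearance. */
    static List<Channel> parse(String playlist) {
        List<Channel> channels = new ArrayList<>();
        Matcher m = ENTRY.matcher(playlist);
        while (m.find()) {
            channels.add(new Channel(m.group(1), m.group(2), m.group(3).trim(), m.group(4)));
        }
        return channels;
    }

    public static void main(String[] args) {
        String sample = "#EXTM3U #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/france2.png\" "
                + "tvg-categorie=\"TV\",France 2 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/315.ts";
        for (Channel c : parse(sample)) {
            System.out.println(c.name + " [" + c.category + "] " + c.url);
        }
    }
}
```

The resulting list of Channel objects can then back a ListView adapter. The name and the URL may be separated by a space or a line break; the pattern accepts either.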
If you are using Retrofit as your network library, you can declare ResponseBody as the response type in the API interface. In the success callback you then get the body and can read it with the following code.
Interface class:
Call<ResponseBody> yourFunctionName();
In the success callback:
ResponseBody data = (ResponseBody) model.body();
String json = getStringData(data.byteStream());
The helper function is:
public String getStringData(InputStream inputStream) {
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
StringBuilder total = new StringBuilder();
String line;
try {
while ((line = r.readLine()) != null) {
total.append(line).append('\n');
}
} catch (IOException e) {
e.printStackTrace();
}
return total.toString();
}
Maybe this will be helpful for you.
Try the code below. Here I extract the fields from a sample of the API response:
String strData = "#EXTM3U #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/tf1-tv.png\" tvg-categorie=\"TV\",TF1 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/314.ts #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/france2.png\" tvg-categorie=\"TV\",France 2 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/315.ts #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/france3.png\" tvg-categorie=\"TV\",France 3 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/316.ts #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/france4.png\" tvg-categorie=\"TV\",France 4 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/317.ts #EXTINF:-1 tvg-logo=\"http://www.cinecosta.com/image-appletv/tv/france5.png\" tvg-categorie=\"TV\",France 5 http://217.182.164.103:25461/live/YnAmpNBQUX/YUCgme6CXS/318.ts";
private void convertDataToArray() {
String[] splitArray = strData.split("#EXTINF:-");
ArrayList<String> arrstrUrl = new ArrayList<String>();
ArrayList<String> arrstrMainUrl = new ArrayList<String>();
ArrayList<String> arrstrCategory = new ArrayList<String>();
ArrayList<String> arrstrName = new ArrayList<String>();
for (int i = 1; i < splitArray.length; i++) {
System.out.println("Final=>" + splitArray[i]);
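// Logo URL: the first token after tvg-logo=, with the surrounding quotes removed.
arrstrUrl.add(splitArray[i].split("1 tvg-logo=")[1].split(" ")[0].replace("\"", ""));
arrstrMainUrl.add(("http" + splitArray[i].split("1 tvg-logo=")[1].split("tvg-categorie=")[1].split("http")[1]).trim());
// The quoted text before the comma is the category; the text between the comma and "http" is the channel name.
arrstrCategory.add(splitArray[i].split("1 tvg-logo=")[1].split("tvg-categorie=")[1].split(",")[0].replace("\"", ""));
arrstrName.add(splitArray[i].split("1 tvg-logo=")[1].split("tvg-categorie=")[1].split(",")[1].split("http")[0].trim());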
}
System.out.println("Final Image=>" + arrstrUrl.toString());
System.out.println("Final Main=>" + arrstrMainUrl.toString());
System.out.println("Final Name=>" + arrstrName.toString());
System.out.println("Final Category=>" + arrstrCategory.toString());
}
This way you can parse your data and update your ListView.
Note: you need to write your own parsing logic after checking the pattern of the data.
There are two possible solutions:
Scrape the data with a Python library such as Scrapy or Beautiful Soup, convert it to JSON, and read that from the Android app.
Parse the HTML using the jsoup library (https://jsoup.org/) and model the data in the format that you want.
We need to fetch the photo from National Geographic's Photo of the Day for an Android project. We are using jsoup, and it works for the other sites and photos we tried, but not for this one.
http://www.nationalgeographic.com/photography/photo-of-the-day
This is the link to the photo that we need to get. If you inspect the page, you will see that the element which contains the link, has multiple links of the photo in different sizes. So we suspect that this is the problem. Here is the element with all the links:
<source srcset="http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6FOW3O0jR-t4LlattRw52wBmvg/ 240w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6cKxp_v-TRYywK8kMonNsWFMiA/ 320w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU76IwFM89MgsU2CsVpABa94yrwg/ 500w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU7Lx-mjq8_Dk9iI7H4kcoPo-SmA/ 640w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU4kJMUl3WmTvlAFqfo4wIlDssvw/ 800w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6-HA9n31rVvmbG5touqPt59wY3s/ 1024w, http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6-dIS7lLTB0CSOM4O0wlvLx9pDnb/ 1600w, 
http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6FcgiBNz-Nj7_J7e61F6_8oUXwoV/ 2048w" sizes="730px" data-reactid=".5.0.1.0.0.$http=2//www=1nationalgeographic=1com/photography/photo-of-the-day/2017/01/boy-buffalo-thailand.0.0.0.0.0.0.0.0">
As you can see, there are multiple links, so we also tried to split the code and get just one of them, but jsoup doesn't seem to get any of the code in the first place.
Here is the code:
Document doc = Jsoup.connect("http://www.nationalgeographic.com/photography/photo-of-the-day").get();
Elements img = doc.select("div.modules-images__placeholder source[srcset]");
imgSrc = img.attr("srcset"); //srcset
String[] splitStr = imgSrc.split("\\s+");
int n = splitStr.length;
imgSrc = splitStr[n-2];
//Download Image from URL
InputStream input = new java.net.URL(imgSrc).openStream();
//Decode Bitmap
bitmap = BitmapFactory.decodeStream(input);
myWallpaperManager.setBitmap(bitmap);
I've got the solution. Take the first entry of srcset and strip the width descriptor. If you want a different size, just change the 0 to another index and it will work:
String imgSrc = img.attr("srcset").split(",")[0].replaceAll(" \\d+w", "");
This is returning http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDq-TNIRQ3qELJppd8ZLNRvnhakVub3VQlC2V5_yAGtyNoIAtaUObf5sBn_PGVEIlVVcerfj6l1ovYy2W4h7lMAkEVLdiCZKr9S9wuwge1myLnbvmEvxjeQ-HOfdmgprhGjqn4pNtAwmKvwU6FOW3O0jR-t4LlattRw52wBmvg/ at the moment.
EDIT:
It works if you can get the srcset attribute correctly. For some reason, Jsoup isn't getting it.
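Instead of picking an entry by position, the srcset value could also be parsed into a map from width to URL, which makes it easy to ask for the largest image or the smallest one above some width. This is only a sketch: the class and method names are made up, the URLs in main are shortened placeholders, and it assumes the URLs themselves contain no commas and that every candidate uses a width descriptor such as "240w".

```java
import java.util.TreeMap;

class SrcsetSketch {

    /** Parses "url 240w, url 320w, ..." into a map from width to URL, sorted by width. */
    static TreeMap<Integer, String> parseSrcset(String srcset) {
        TreeMap<Integer, String> byWidth = new TreeMap<>();
        for (String candidate : srcset.split(",")) {
            String[] parts = candidate.trim().split("\\s+");
            // Only "URL <number>w" candidates are handled; anything else is skipped.
            if (parts.length == 2 && parts[1].matches("\\d+w")) {
                int width = Integer.parseInt(parts[1].substring(0, parts[1].length() - 1));
                byWidth.put(width, parts[0]);
            }
        }
        return byWidth;
    }

    public static void main(String[] args) {
        // Shortened, made-up URLs standing in for the real srcset value.
        TreeMap<Integer, String> sizes = parseSrcset(
                "http://example.com/a/ 240w, http://example.com/b/ 320w, http://example.com/c/ 2048w");
        System.out.println(sizes.lastEntry().getValue());       // largest available image
        System.out.println(sizes.ceilingEntry(300).getValue()); // smallest image at least 300 wide
    }
}
```

With jsoup, the input would be the string returned by img.attr("srcset"), provided the attribute is actually present in the fetched page.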
My other answer didn't work because Jsoup wasn't getting the entire page. I've found the JSON URL of the article, so I wrote some simple code to get the image URL:
try {
JSONObject jobject = readJsonFromUrl(
"http://www.nationalgeographic.com/photography/photo-of-the-day/_jcr_content/.gallery.json");
JSONObject article = jobject.getJSONArray("items").getJSONObject(0);
String url = article.getString("url") + article.getString("originalUrl");
System.out.println(url);
} catch (Exception e) {
e.printStackTrace();
}
You'll need to add these methods to any class:
private static String readAll(Reader rd) throws IOException {
StringBuilder sb = new StringBuilder();
int cp;
while ((cp = rd.read()) != -1) {
sb.append((char) cp);
}
return sb.toString();
}
public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
InputStream is = new URL(url).openStream();
try {
BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
String jsonText = readAll(rd);
JSONObject json = new JSONObject(jsonText);
return json;
} finally {
is.close();
}
}
Just apply the URL to your current code and it should work.
The methods readAll and readJsonFromUrl are from this answer.
EDIT:
To get another size, use:
String url = article.getString("url") + article.getJSONObject("sizes").getString("2048");
2048 can be replaced with any of 240, 320, 500, 640, 800, 1024 or 1600.
Not sure if the sizes change each day, but if so, check the JSON to see which ones are available.
I want to load an HTML file from the raw folder into a WebView. This works fine:
url = "file:///android_res/raw/a1.html";
webView.loadUrl(url);
But I want to pass a value into the URL like this:
String s = "1";
url = "file:///android_res/raw/a"+s+".html";
This does not work. How can I achieve this?
First, you cannot be sure that url = "file:///android_res/raw/a"+s+".html"; is a valid file path, so this approach may not work as you planned.
You can use
webview.loadUrl("javascript:xxxx");
to pass a parameter to the HTML,
or use url = "file:///android_res/raw/a.html?action=go";
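This is done in the same way on Android as in Java SE.
Pass the string through URLEncoder. Note that this escapes every reserved character (including ':', '/' and '&'), so it is normally applied to individual parameter values, or to a URL that is itself being passed as a parameter: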
try {
String url = "http://www.example.com/?id=123&art=abc";
String encodedurl = URLEncoder.encode(url,"UTF-8");
Log.d("TEST", encodedurl);
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
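To show the difference, here is a small sketch that encodes each parameter name and value separately and only then joins them onto the base URL. The class name, the buildUrl helper and the example parameters are made up for illustration; only URLEncoder.encode itself is the standard API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryEncodingSketch {

    /** Appends the parameters to baseUrl, encoding each name and value separately. */
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(p.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(p.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("id", "123");
        params.put("art", "a b&c");
        // prints http://www.example.com/?id=123&art=a+b%26c
        System.out.println(buildUrl("http://www.example.com/", params));
    }
}
```

The '?' and '&' that structure the query stay intact, while the '&' and the space inside the value are escaped.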
I am calling an HTML page via a web service. I need to get the whole source code of the HTML page.
My problem is that when I convert the HTTP response to a string, I get only part of the HTML page. How can I get the whole HTML page? Please help.
//paramString1 = url, paramString2 = header, paramList = parameters
public String a(String paramString1, String paramString2, List paramList)
{
String str1 = null;
HttpPost localHttpPost = new HttpPost(paramString1);
localHttpPost.addHeader("Accept-Encoding", "gzip");
InputStream localInputStream = null;
try
{
localHttpPost.setEntity(new UrlEncodedFormEntity(paramList));
localHttpPost.setHeader("Referer", paramString2);
HttpResponse localHttpResponse = this.c.execute(localHttpPost);
int i = localHttpResponse.getStatusLine().getStatusCode();
localInputStream = localHttpResponse.getEntity().getContent();
Header localHeader = localHttpResponse.getFirstHeader("Content-Encoding");
if ((localHeader != null) && (localHeader.getValue().equalsIgnoreCase("gzip")))
{
GZIPInputStream localObject = null;
localObject = new GZIPInputStream(localInputStream);
Log.d("API", "GZIP Response decoded!");
BufferedReader localBufferedReader = new BufferedReader(new InputStreamReader((InputStream)localObject, "UTF-8"));
StringBuilder localStringBuilder = new StringBuilder();
while(true){
String str2 = localBufferedReader.readLine();
if (str2 == null)
break;
localHttpResponse.getEntity().consumeContent();
str1 = localStringBuilder.toString();
localStringBuilder.append(str2);
continue;
}
}
}
catch (IOException localIOException)
{
localHttpPost.abort();
}
catch (Exception localException)
{
localHttpPost.abort();
}
Object localObject = localInputStream;
return (String)str1;
}
Are you receiving the HTML in the variable paramString1? If so, are you encoding the string somehow, or is it just plain HTML?
Maybe the HTML special characters are breaking your response. Try encoding the string with URL-safe Base64 on the server side and decoding it on the client side.
You can use the Base64 class from Apache Commons Codec.
Server Side:
Base64 encoder = new Base64(true);
encoder.encode(yourBytes);
Client side:
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(paramString1);
HttpPost localHttpPost = new HttpPost(new String(decodedBytes));
You may not be getting the complete source code in your StringBuilder. A StringBuilder keeps everything in memory, so for a very large page you could instead write the InputStream (which contains the HTML source) directly to a file. You then have the complete source in that file and can perform whatever file operations you require. See if this helps.
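It may also be worth looking at the read loop in the question itself. There, str1 is assigned from the StringBuilder before the current line is appended, so the last line at least never reaches the result. consumeContent() is also called on every iteration, and a response that is not gzipped is never read at all. A minimal sketch of a reader that avoids those problems is shown below; the class and method names are made up, and it works on a plain InputStream so it does not depend on any particular HTTP client.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

class ResponseReaderSketch {

    /**
     * Reads the whole stream into a String. Pass gzip = true when the response
     * carries a "Content-Encoding: gzip" header.
     */
    static String readBody(InputStream raw, boolean gzip) throws IOException {
        InputStream in = gzip ? new GZIPInputStream(raw) : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // append every line, including the last one
            }
        }
        return sb.toString(); // build the result once, after the loop has finished
    }

    public static void main(String[] args) throws IOException {
        InputStream plain = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes("UTF-8"));
        System.out.print(readBody(plain, false));
    }
}
```

In the question's code this would be called with localInputStream and the result of the Content-Encoding check, and any cleanup of the entity would happen after the call returns.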
I am trying to create a simple Android app that will have the possibility to fetch the source code of a website. Anyways, I have written the following:
WebView webView = (WebView) findViewById(R.id.webView);
try {
webView.setWebViewClient(new WebViewClient());
InputStream input = (InputStream) new URL(url.toString()).getContent();
webView.loadDataWithBaseURL("", "<html><body><p>"+input.toString()+"</p></body></html>", "text/html", Encoding.UTF_8.toString(),"");
setContentView(webView);
} catch (Exception e) {
Alert alert = new Alert(getApplicationContext(),
"Error fetching data", e.getMessage());
}
I've tried to change the 3rd line several times to other methods that will fetch the source code, but they all redirect me to the alert (error with no message, only the title).
What am I doing wrong?
Is there a particular reason why you can't just use this to load the webpage?
webView.loadUrl("http://www.example.com");
If you really want to grab the source code into a string so you can manipulate it and display it as you are trying to do, open a stream to the content and use standard Java methods to read the data into a String, which you can then do whatever you want with. On Android the network read has to run off the main thread, otherwise a NetworkOnMainThreadException is thrown:
// The URL needs a scheme, otherwise new URL(...) throws MalformedURLException.
InputStream in = new URL("http://www.example.com").openStream();
InputStreamReader isr = new InputStreamReader(in, "UTF-8");
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(isr);
String read = br.readLine();
while (read != null) {
    sb.append(read).append('\n');
    read = br.readLine();
}
br.close();
String sourceCodeString = sb.toString();
webView.loadDataWithBaseURL("http://www.example.com/", "<html><body><p>"+sourceCodeString+"</p></body></html>", "text/html", Encoding.UTF_8.toString(),"about:blank");
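One thing to keep in mind: if the goal is to show the source code rather than the rendered page, the markup has to be escaped before it is wrapped in HTML, otherwise the WebView will simply render it. Here is a small sketch of that step in plain Java. The class and method names are made up for illustration; on Android, TextUtils.htmlEncode() could be used instead of the hand-written escape.

```java
class HtmlEscapeSketch {

    /** Escapes the characters that would otherwise be interpreted as markup. */
    static String escapeHtml(String source) {
        StringBuilder sb = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps the escaped source in a pre element so line breaks are preserved. */
    static String toSourceViewPage(String source) {
        return "<html><body><pre>" + escapeHtml(source) + "</pre></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(toSourceViewPage("<p class=\"x\">Tom & Jerry</p>"));
    }
}
```

The string returned by toSourceViewPage(sourceCodeString) would then be passed as the data argument of loadDataWithBaseURL.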