Get HTML code from url in android


I was wondering if there is any way to get the HTML code from a URL and save it as a String in code.
I have a method:
private String getHtmlData(Context context, String data){
String head = "<head><style>@font-face {font-family: 'verdana';src: url('file://"+ context.getFilesDir().getAbsolutePath()+ "/verdana.ttf');}body {font-family: 'verdana';}</style></head>";
String htmlData= "<html>"+head+"<body>"+data+"</body></html>" ;
return htmlData;
}
and I want to get this "data" from a URL. How can I do that?

Try this (written from memory, untested):
URL google = new URL("http://www.google.com/");
BufferedReader in = new BufferedReader(new InputStreamReader(google.openStream()));
String input;
// StringBuilder is enough here; no synchronization is needed
StringBuilder stringBuilder = new StringBuilder();
while ((input = in.readLine()) != null)
{
// readLine() strips the line break, so put it back
stringBuilder.append(input).append('\n');
}
in.close();
String htmlData = stringBuilder.toString();
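
A variant of the loop above, as a minimal sketch: it names the charset explicitly instead of relying on the platform default and keeps the line breaks. The class and method names (HtmlFetcher, readAll, fetch) are made up for illustration, and UTF-8 is an assumption about the page. On Android the network call has to run off the main thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HtmlFetcher {
    // Reads the whole stream as text, writing a '\n' after every line.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Assumption: the page is UTF-8; check the Content-Type header if unsure.
    static String fetch(String address) throws IOException {
        return readAll(new URL(address).openStream(), StandardCharsets.UTF_8);
    }
}
```

The result of HtmlFetcher.fetch(url) could then be passed as the data argument of getHtmlData from the question.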

Sure you can. That's actually the response body. You can get it like this:
HttpResponse response = client.execute(post);
String htmlPage = EntityUtils.toString(response.getEntity(), "ISO-8859-1");

Take a look at this, please. Any other parser will work too, or you can even write your own by checking the strings and retrieving just the part you want.

Related

Parse HTML text in Android

I'm trying to parse some HTML in my Android app and I need to get the text:
Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!
in
Is there any easy way to get only the text and remove all html tags?
The behavior that I need is exactly the one shown in this PHP code http://php.net/manual/es/function.strip-tags.php

Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p= content.getElementsByTag("p");
String pConcatenated="";
for (Element x: p) {
pConcatenated+= x.text();
}
System.out.println(pConcatenated);//sometext another p tag

If you just want to display it, a WebView will do: load the string into the WebView and you are done.
If you need to use the text elsewhere, I don't know how to do that.
String data = "your html here";
WebView webview = (WebView) this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");
You can also load a web URL directly: webview.loadUrl("url");

First, get the HTML code with:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
Then I recommend adding a custom tag to the HTML, such as <toAndroid></toAndroid>, so that you can get the text with:
String result = html.substring(html.indexOf("<toAndroid>") + "<toAndroid>".length(), html.indexOf("</toAndroid>"));
For example, if your HTML is
<toAndroid>Hello world!</toAndroid>
the result will be
Hello world!
Note that you can place <p> tags inside the <toAndroid> tags and then remove them from the result in Java.
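
The extraction step can be wrapped in a small helper that also copes with a missing tag, as a rough sketch. TagExtractor and between are hypothetical names, not part of any library.

```java
final class TagExtractor {
    // Returns the text between the first openTag and the next closeTag,
    // or null if either tag is missing.
    static String between(String html, String openTag, String closeTag) {
        int start = html.indexOf(openTag);
        if (start < 0) {
            return null;
        }
        start += openTag.length();               // skip past the opening tag itself
        int end = html.indexOf(closeTag, start); // search only after the opening tag
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }
}
```

Calling TagExtractor.between(html, "<toAndroid>", "</toAndroid>") on the example above gives the text inside the tag; a null return signals that the page did not contain the marker.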

Converting String to Android JSONObject loses utf-8

I am trying to get a (JSON-formatted) String from a URL and consume it as a JSON object. I lose the UTF-8 encoding when I convert the String to a JSONObject.
This is the function I use to connect to the URL and get the string:
private static String getUrlContents(String theUrl) {
StringBuilder content = new StringBuilder();
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
String line;
while ((line = bufferedReader.readLine()) != null) {
content.append(line + "\n");
}
bufferedReader.close();
} catch(Exception e) {
e.printStackTrace();
}
return content.toString();
}
When I get data from server, the following code displays correct characters:
String output = getUrlContents(url);
Log.i("message1", output);
But when I convert the output string to a JSONObject, the Persian characters become question marks like this: ??????. (messages is the name of the array in the JSON.)
JSONObject reader = new JSONObject(output);
String messages = new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8");
Log.i("message2", messages);

You're telling Java to convert the string (with key messages) to bytes using ISO-8859-1 and then to create a new String from these bytes, interpreted as UTF-8.
new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8");
You could simply use:
String messages = reader.getString("messages");
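
Why the result is question marks can be seen in isolation. ISO-8859-1 has no Persian letters, so getBytes replaces each one with the byte for '?', and no later decoding can bring the original back. The sketch below reproduces that round trip; RoundTripDemo and roundTrip are illustrative names only.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Reproduces the lossy conversion from the question.
    static String roundTrip(String s) {
        // Characters that ISO-8859-1 cannot represent are replaced with '?' here.
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding afterwards cannot restore what was already thrown away.
        return new String(latin1, StandardCharsets.UTF_8);
    }
}
```

Plain ASCII text survives the round trip unchanged, which is why the problem only shows up with non-ASCII content.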

You can update your code as the following:
private static String getUrlContents(String theUrl) {
StringBuilder content = new StringBuilder();
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "utf-8"));
String line;
while ((line = bufferedReader.readLine()) != null) {
content.append(line).append("\n");
}
bufferedReader.close();
} catch(Exception e) {
e.printStackTrace();
}
return content.toString().trim();
}

You've got two encoding issues:
The server sends text encoded in a character set. When you set up your InputStreamReader, you need to pass the encoding the server used so it can be decoded properly. The character encoding is usually given in the Content-Type HTTP response header, in the charset field. JSON is typically UTF-8 encoded, but can also legally be UTF-16 or UTF-32, so you need to check. Without a specified encoding, your system's default charset will be used when converting bytes to Strings, and vice versa. Basically, you should always specify the charset.
String messages = new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8"); is obviously going to cause issues (if you have non-ASCII characters): it encodes the string to ISO-8859-1 and then tries to decode it as UTF-8.
A simple regex pattern can be used to extract the charset value from the Content-Type header before reading the InputStream. I've also included a neat InputStream -> String converter.
private static String getUrlContents(String theUrl) {
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
InputStream is = urlConnection.getInputStream();
// Get charset field from Content-Type header
String contentType = urlConnection.getContentType();
// matches value in key / value pair
Pattern encodingPattern = Pattern.compile(".*charset\\s*=\\s*([\\w-]+).*");
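// getContentType() returns null when the header is missing
Matcher encodingMatcher = encodingPattern.matcher(contentType == null ? "" : contentType);
// set charsetString to match value if charset is given, else default to UTF-8
String charsetString = encodingMatcher.matches() ? encodingMatcher.group(1) : "UTF-8";
// Quick way to read from InputStream.
// \A is a boundary match for beginning of the input
Scanner scanner = new Scanner(is, charsetString).useDelimiter("\\A");
// an empty body has no token, so avoid NoSuchElementException
return scanner.hasNext() ? scanner.next() : "";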
} catch(Exception e) {
e.printStackTrace();
}
return null;
}
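
The header parsing can also be pulled out into its own helper so that it can be checked without a network connection. This is only a sketch: ContentTypeCharset and parse are hypothetical names, and falling back to UTF-8 for a missing, unknown, or malformed charset is an assumption that suits JSON but may not suit every server.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentTypeCharset {
    // Matches charset=value, with or without quotes around the value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([\\w.:-]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type header value, or UTF-8 as a fallback.
    static Charset parse(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8; // header missing
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return StandardCharsets.UTF_8; // no charset parameter
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // illegal or unsupported charset name
            return StandardCharsets.UTF_8;
        }
    }
}
```

The returned Charset can be handed straight to an InputStreamReader or a Scanner, for example with ContentTypeCharset.parse(urlConnection.getContentType()).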

Not sure if this will help, but if output holds the raw response bytes (a byte[], not the String from the question), you might be able to do something like this:
JSONObject result = null;
String str = null;
try
{
str = new String(output, "UTF-8");
result = (JSONObject) new JSONTokener(str).nextValue();
}
catch (Exception e) {}
String messages = result.getString("messages");

How to parse HTML full page in android

I am calling an HTML page via a web service. I need to get the whole source code of the HTML page.
My problem is that when I convert the HTTP response to a string, I get only part of the HTML page. How can I get the whole HTML page? Please help me.
//paramString1 = url, paramString2 = header, paramList = parameters
public String a(String paramString1, String paramString2, List paramList)
{
String str1 = null;
HttpPost localHttpPost = new HttpPost(paramString1);
localHttpPost.addHeader("Accept-Encoding", "gzip");
InputStream localInputStream = null;
try
{
localHttpPost.setEntity(new UrlEncodedFormEntity(paramList));
localHttpPost.setHeader("Referer", paramString2);
HttpResponse localHttpResponse = this.c.execute(localHttpPost);
int i = localHttpResponse.getStatusLine().getStatusCode();
localInputStream = localHttpResponse.getEntity().getContent();
Header localHeader = localHttpResponse.getFirstHeader("Content-Encoding");
if ((localHeader != null) && (localHeader.getValue().equalsIgnoreCase("gzip")))
{
GZIPInputStream localObject = null;
localObject = new GZIPInputStream(localInputStream);
Log.d("API", "GZIP Response decoded!");
BufferedReader localBufferedReader = new BufferedReader(new InputStreamReader((InputStream)localObject, "UTF-8"));
StringBuilder localStringBuilder = new StringBuilder();
while(true){
String str2 = localBufferedReader.readLine();
if (str2 == null)
break;
localHttpResponse.getEntity().consumeContent();
str1 = localStringBuilder.toString();
localStringBuilder.append(str2);
continue;
}
}
}
catch (IOException localIOException)
{
localHttpPost.abort();
}
catch (Exception localException)
{
localHttpPost.abort();
}
Object localObject = localInputStream;
return str1;
}

Are you receiving the HTML in the variable paramString1? In that case, are you encoding the String somehow, or is it just plain HTML?
Maybe the HTML special characters are breaking your response. Try encoding the String with URL-safe Base64 on the server side and decoding it on the client side.
You can use the Base64 class from Apache Commons Codec.
Server Side:
Base64 encoder = new Base64(true);
encoder.encode(yourBytes);
Client side:
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(paramString1);
HttpPost localHttpPost = new HttpPost(new String(decodedBytes));

You may not be getting the complete source in your StringBuilder; it may be exceeding the memory available to it, since a StringBuilder keeps everything in an in-memory array. If you need the whole source, you could try this: write the InputStream (which contains the HTML source) directly to a file. You will then have the complete source in that file and can perform whatever file operations you require. See if this helps.
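
Another thing worth checking is the read loop in the question itself: str1 is built before the current line is appended, consumeContent() is called on every iteration, and nothing is read at all when the response is not gzipped. Any of these could explain a truncated or missing page. A minimal sketch of a reader that avoids all three follows; BodyReader and read are made-up names, and UTF-8 is taken from the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class BodyReader {
    // gzip should be true when the response carried "Content-Encoding: gzip".
    static String read(InputStream raw, boolean gzip) throws IOException {
        InputStream in = gzip ? new GZIPInputStream(raw) : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // append every line first ...
            }
        }
        return sb.toString(); // ... and build the String once, after the loop
    }
}
```

With the question's HttpClient code, the stream would come from localHttpResponse.getEntity().getContent() and the flag from the Content-Encoding header check that is already there.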

Android - Parse text from website

I have a webpage with this simple text, which changes over time.
<html><head><style type="text/css"></style></head><body>69766</body></html>
I need to parse only the number 69766 and save it to a variable as a String or int. Is it possible to parse this number without adding libraries?
Thanks for your answers!

You can do it like this:
URL url = new URL("http://url for your webpage");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
yc.getInputStream()));
String inputLine;
StringBuilder builder = new StringBuilder();
while ((inputLine = in.readLine()) != null)
builder.append(inputLine.trim());
in.close();
String htmlPage = builder.toString();
String yourNumber = htmlPage.replaceAll("\\<.*?>","");
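
The tag-stripping step plus the conversion to int can be sketched as a helper that needs no extra libraries. NumberFromHtml, stripTags, and parseNumber are illustrative names; the approach assumes the page really contains nothing but the number outside its tags, as in the question.

```java
final class NumberFromHtml {
    // Removes everything that looks like a tag and trims surrounding whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    // Throws NumberFormatException if anything other than digits is left over.
    static int parseNumber(String html) {
        return Integer.parseInt(stripTags(html));
    }
}
```

If the page ever gains visible text besides the number, parseNumber fails loudly instead of returning a wrong value.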

For a basic need like this, you should take a look at the Html class.

This link shows how to parse XML with the SAX parser. It's pretty straightforward.
http://www.codeproject.com/Articles/334859/Parsing-XML-in-Android-with-SAX

How to programmatically download an HTML page in Android and get its HTML?

I need to download an HTML page programmatically and then get its HTML. I am mainly concerned with downloading the page. If I download the page, where do I put it?
Will I have to keep it in a String variable? If yes, then how?

This site provides a good explanation on how to download a file, and also how to set the location to where it should be stored. You do not have to, and should not, keep it in a string variable. If you are to manipulate the data I would suggest you use an XML parser.
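
Storing the download in a file rather than a String could look roughly like this. StreamToFile and save are hypothetical names, and where the target file lives is left to the caller (on Android, something under the app's files directory would be a typical choice).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamToFile {
    // Copies the stream into target in 8 KB chunks and returns the number of bytes written.
    static long save(InputStream in, File target) throws IOException {
        long total = 0;
        try (InputStream src = in; OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = src.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Because only one buffer is held in memory at a time, the size of the page does not matter.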

You can run this code in doInBackground of an AsyncTask:
String html = "";
String url = "ENTER URL TO DOWNLOAD";
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
