I'm trying to parse some HTML in my Android app and I need to get the text:
Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!
in
Is there an easy way to get only the text and remove all HTML tags?
The behavior I need is exactly that of PHP's strip_tags function: http://php.net/manual/es/function.strip-tags.php
Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p = content.getElementsByTag("p");
String pConcatenated = "";
for (Element x : p) {
    pConcatenated += x.text();
}
System.out.println(pConcatenated); // sometext another p tag
If you only want to display it, a WebView will do: just load the string into the WebView.
If you need to use the text elsewhere, I don't know how to do that.
String data = "your html here";
WebView webview= (WebView)this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");
You can also load a web URL directly with webview.loadUrl("url");
First, get the HTML code with:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();
Then I recommend adding a custom tag to the HTML, such as <toAndroid></toAndroid>, so that you can get the text with:
String result = html.substring(html.indexOf("<toAndroid>") + "<toAndroid>".length(), html.indexOf("</toAndroid>"));
For example, this HTML
<toAndroid>Hello world!</toAndroid>
will result in
Hello world!
Note that you can place <p> tags inside the <toAndroid> tags and then remove them from the result in Java.
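The substring approach above can be wrapped in a small helper that also handles a missing marker. This is only a sketch: the class name TagText and the method between are made up for illustration, and it assumes the markers appear literally in the HTML, without attributes or different casing.

```java
final class TagText {
    /**
     * Returns the text between the first openTag and the next closeTag,
     * or null when either marker is missing.
     */
    static String between(String html, String openTag, String closeTag) {
        int start = html.indexOf(openTag);
        if (start < 0) {
            return null;
        }
        start += openTag.length(); // skip past the opening marker itself
        int end = html.indexOf(closeTag, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html><body><toAndroid>Hello world!</toAndroid></body></html>";
        System.out.println(between(html, "<toAndroid>", "</toAndroid>")); // prints "Hello world!"
    }
}
```

Searching for the closing marker only after the opening one avoids a negative-length substring if the page happens to contain a stray closing tag earlier.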
Is there any way to get the HTML code from a URL and store it as a String in my code?
I have this method:
private String getHtmlData(Context context, String data){
String head = "<head><style>@font-face {font-family: 'verdana';src: url('file://"+ context.getFilesDir().getAbsolutePath()+ "/verdana.ttf');}body {font-family: 'verdana';}</style></head>";
String htmlData= "<html>"+head+"<body>"+data+"</body></html>" ;
return htmlData;
}
and I want to get this "data" from a URL. How can I do that?
Try this (written from memory, untested):
URL google = new URL("http://www.google.com/");
BufferedReader in = new BufferedReader(new InputStreamReader(google.openStream()));
String input;
StringBuffer stringBuffer = new StringBuffer();
while ((input = in.readLine()) != null)
{
stringBuffer.append(input);
}
in.close();
String htmlData = stringBuffer.toString();
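The loop above drops the line breaks and decodes the bytes with the platform default charset. A variant that keeps line breaks, takes an explicit charset, and closes the stream automatically might look like the sketch below. The class name StreamText and the method readAll are hypothetical, and the demo reads from an in-memory stream instead of the network; in the answer's setting the stream would come from google.openStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamText {
    /** Reads the whole stream as text, ending every line with '\n'. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the stream) even on failure
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // rethrown unchecked so callers are not forced to declare IOException
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for a downloaded page (assumption for the demo)
        byte[] page = "<html>\n<body>Hello</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

On Android the network call itself must still run off the main thread.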
Sure you can. That's actually the response body. You can get it like this:
HttpResponse response = client.execute(post);
String htmlPage = EntityUtils.toString(response.getEntity(), "ISO-8859-1");
Take a look at this, please. Any other parser will work too, or you can write your own by checking the strings and retrieving just the part you want.
I have a web page with this simple text, which can change:
<html><head><style type="text/css"></style></head><body>69766</body></html>
I need to parse only the number 69766 and save it to a variable as a String or an int. Is it possible to parse this number without adding libraries?
Thanks for your answers!
You can do it like this:
URL url = new URL("http://url for your webpage");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
yc.getInputStream()));
String inputLine;
StringBuilder builder = new StringBuilder();
while ((inputLine = in.readLine()) != null)
builder.append(inputLine.trim());
in.close();
String htmlPage = builder.toString();
String yourNumber = htmlPage.replaceAll("\\<.*?>","");
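To get the int the question asks for, the tag-stripping step can be followed by Integer.parseInt. The sketch below uses hypothetical names (NumberFromHtml, parse) and assumes that, as in the sample page, the only text left after removing the tags is the number itself. A page with a non-empty style or script element would leave extra text behind and make parseInt throw a NumberFormatException.

```java
final class NumberFromHtml {
    /** Strips every <...> tag and parses the remaining text as an int. */
    static int parse(String html) {
        // "<[^>]*>" matches one tag at a time, from '<' to the nearest '>'
        String text = html.replaceAll("<[^>]*>", "").trim();
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        String page = "<html><head><style type=\"text/css\"></style></head>"
                + "<body>69766</body></html>";
        System.out.println(parse(page)); // prints 69766
    }
}
```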
For such a basic need, you should take a look at the Html class.
This link shows how to parse XML with the SAX parser. It's pretty straightforward.
http://www.codeproject.com/Articles/334859/Parsing-XML-in-Android-with-SAX
I am trying to create a simple Android app that will have the possibility to fetch the source code of a website. Anyways, I have written the following:
WebView webView = (WebView) findViewById(R.id.webView);
try {
webView.setWebViewClient(new WebViewClient());
InputStream input = (InputStream) new URL(url.toString()).getContent();
webView.loadDataWithBaseURL("", "<html><body><p>"+input.toString()+"</p></body></html>", "text/html", Encoding.UTF_8.toString(),"");
setContentView(webView);
} catch (Exception e) {
Alert alert = new Alert(getApplicationContext(),
"Error fetching data", e.getMessage());
}
I've tried to change the 3rd line several times to other methods that will fetch the source code, but they all redirect me to the alert (error with no message, only the title).
What am I doing wrong?
Is there a particular reason why you can't just use this to load the web page?
webView.loadUrl("http://www.example.com");
If you really want the source code in a String so that you can manipulate and display it, open a stream to the content and read it into a String with standard Java I/O. You can then do whatever you want with it:
// The URL needs a scheme, otherwise new URL(...) throws MalformedURLException
InputStream in = new URL("http://www.example.com").openStream();
InputStreamReader isr = new InputStreamReader(in);
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(isr);
String read = br.readLine();
while (read != null) {
    sb.append(read);
    read = br.readLine();
}
br.close(); // also closes the underlying stream
String sourceCodeString = sb.toString();
webView.loadDataWithBaseURL("http://www.example.com/", "<html><body><p>"+sourceCodeString+"</p></body></html>", "text/html", Encoding.UTF_8.toString(),"about:blank");
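One caveat: because sourceCodeString is placed inside the page as-is, the WebView renders it as HTML rather than showing the markup. If the goal is to display the source itself, the special characters have to be escaped first. The sketch below is one way to do that in plain Java; the class name HtmlEscape and the method escape are made up for illustration. On Android, TextUtils.htmlEncode should serve the same purpose.

```java
final class HtmlEscape {
    /** Replaces the characters that HTML treats specially with entities. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<p>a & b</p>")); // prints "&lt;p&gt;a &amp; b&lt;/p&gt;"
    }
}
```

The escaped string can then be wrapped in a pre element instead of p so that the original line breaks survive, provided they were kept while reading.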
I have fetched the HTML of a web page, but I want only the text, without the HTML tags.
I have tried this:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(urlText.getText().toString());
// Get the response
HttpResponse response = client.execute(request);
BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
StringBuilder sb = new StringBuilder();
String line = "";
while ((line = rd.readLine()) != null)
{
textView.append(line);
sb.append(line+"\n");
}
This gives me the whole HTML. How can I get only the text?
Have you tried Html.fromHtml(source)? Alternatively, use any Java HTML parser that works on Android.
Here, source is the complete HTML string you downloaded.
EDIT:
while ((line = rd.readLine()) != null)
{
sb.append(line+"\n");
}
String source = sb.toString();
textView.setText(Html.fromHtml(source));
Look at this example Android Parsing HTML Content Containing Links.
I need to download an HTML page programmatically and then get its HTML. I am mainly concerned with downloading the page. Once I download it, where do I put it?
Do I have to keep it in a String variable? If so, how?
This site provides a good explanation of how to download a file and how to set the location where it should be stored. You do not have to, and should not, keep it in a String variable. If you need to manipulate the data, I would suggest using an XML parser.
You can call this method in doInBackground of an AsyncTask:
String html = "";
String url = "ENTER URL TO DOWNLOAD";
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
str.append(line);
}
in.close();
html = str.toString();