I am fetching HTML text from a website. The site returns characters that do not display correctly. I checked the page for its character set and found <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
After I set the text on a TextView, the output on the device is garbled.
I tried the following code, but it has no effect on the text:
final Charset windowsCharset = Charset.forName("windows-1252");
final Charset utfCharset = Charset.forName("UTF-8");
final CharBuffer windowsEncoded = windowsCharset.decode(ByteBuffer
.wrap(ne.scrape_detail_article_text.getBytes()));
final byte[] utfEncoded = utfCharset.encode(windowsEncoded).array();
// System.out.println(new String(utfEncoded, utfCharset.displayName()));
String s = "" ;
try {
// String s = new String(utfEncoded, utfCharset.displayName());
//String s = new String(texttoencoding.getBytes("windows-1252"),"UTF-8");
s = URLEncoder.encode(texttoencoding, "windows-1252");
Log.e("LOG", "Encoded >> " + s);
} catch (UnsupportedEncodingException e) {
Log.e("utf8", "conversion", e);
}
TextviewToset.setText(Html.fromHtml(texttoencoding));
TextviewToset.setMovementMethod(LinkMovementMethod.getInstance());
Please help: how can I convert this text to UTF-8 and display it in the TextView?
Thanks in advance.
It looks like you are dealing with HTML entities here, so you have to decode them via:
String text = Html.fromHtml(yourText).toString();
This will give you the correctly decoded Unicode characters. See the documentation for Html.fromHtml().
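If the garbling comes from the charset rather than from HTML entities, it may help to decode the raw response bytes with the charset the page declares, instead of calling getBytes() on a String that has already been decoded. A minimal sketch in plain Java, assuming the raw bytes of the page are available; the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;

public class DeclaredCharsetDecoder {
    // Charset named in the page's <meta> tag.
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Decodes raw response bytes using the charset the page declares. */
    public static String decode(byte[] rawBytes) {
        return new String(rawBytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are the curly double quotes in windows-1252.
        byte[] raw = {(byte) 0x93, 'h', 'i', (byte) 0x94};
        String text = decode(raw);
        System.out.println(text.equals("\u201Chi\u201D"));
    }
}
```

The decoded String can then be passed to Html.fromHtml() as in the answer above. A Java String has no encoding of its own, so no further "conversion to UTF-8" should be needed before calling setText().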
Related
I'm writing an app that uses PDFbox library to fill fields in a PDF file.
In one of those fields, I'm setting the text to be written in Hebrew letters.
When I run the code on my Android device, I get the following log:
java.lang.IllegalArgumentException: This font type only supports 8-bit code points
at com.tom_roush.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:317)
at com.tom_roush.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:264)
at com.tom_roush.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:293)
at com.tom_roush.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:183)
at com.tom_roush.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:360)
at com.tom_roush.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:224)
at com.tom_roush.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:128)
at com.tom_roush.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:247)
at com.tom_roush.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:221)
at com.tom_roush.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:202)
at com.package.app.MainActivity.lambda$checkPdf$4$MainActivity(MainActivity.java:128)
at com.package.app.MainActivity$$Lambda$2.run(Unknown Source:18)
at java.lang.Thread.run(Thread.java:764)
I've tried to find some information about it all over Stack Overflow, but none of the answers I found is related to filling forms. It's all related to PDPageContentStream.
This is how I fill the form in my code:
try {
PDDocument document = PDDocument.load(getAssets().open("file.pdf"));
PDDocumentCatalog docCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
// Fill the text field
((PDTextField) acroForm.getField("name")).setValue("בדיקה");
File root = android.os.Environment.getExternalStorageDirectory();
String path = root.getAbsolutePath() + "/test.pdf";
document.save(path);
document.close();
} catch (IOException e) {
Log.e("e", e.getMessage());
}
Can you please help me solve this error and fill Hebrew letters in a form using PDFbox?
I used this answer to change the font of the field's text.
The only remaining problem was that the text was rendered in the wrong direction, so I reversed the string:
try {
PDDocument document = PDDocument.load(getAssets().open("file.pdf"));
PDDocumentCatalog docCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDResources dr = acroForm.getDefaultResources();
PDFont liberationSans = PDType0Font.load(document, getAssets().open("com/tom_roush/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));
COSName fontName = dr.add(liberationSans);
Iterator<PDField> it = acroForm.getFields().iterator();
while (it.hasNext()) {
PDField field = it.next();
if (field instanceof PDTextField) {
PDTextField textField = (PDTextField) field;
String da = textField.getDefaultAppearance();
// replace font name in default appearance string
Pattern pattern = Pattern.compile("\\/(\\w+)\\s.*");
Matcher matcher = pattern.matcher(da);
// find() must succeed before group() can be called
if (matcher.find()) {
String oldFontName = matcher.group(1);
da = da.replaceFirst(oldFontName, fontName.getName());
textField.setDefaultAppearance(da);
}
}
}
// Fill the text field
((PDTextField) acroForm.getField("name")).setValue(new StringBuilder("בדיקה").reverse().toString());
File root = android.os.Environment.getExternalStorageDirectory();
String path = root.getAbsolutePath() + "/test.pdf";
document.save(path);
document.close();
} catch (IOException e) {
Log.e("e", e.getMessage());
}
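The exception message suggests that the field's original font can only encode code points up to 0xFF, while Hebrew letters start at U+05D0. A minimal sketch in plain Java (no PDFBox calls; both helper names are hypothetical) that checks whether a value needs a Unicode-capable font and reverses it the same way as the code above:

```java
public class HebrewFieldText {
    /** True if any code point is above 0xFF, i.e. outside the 8-bit range. */
    public static boolean needsUnicodeFont(String value) {
        return value.codePoints().anyMatch(cp -> cp > 0xFF);
    }

    /** Reverses the characters, as done before setValue() above. */
    public static String reverseForDisplay(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    public static void main(String[] args) {
        String hebrew = "\u05D1\u05D3\u05D9\u05E7\u05D4"; // the word used in the form
        System.out.println(needsUnicodeFont("test"));
        System.out.println(needsUnicodeFont(hebrew));
        System.out.println(reverseForDisplay(hebrew));
    }
}
```

Reversing the whole string is only a workaround for purely right-to-left values; text that mixes Hebrew with Latin letters or digits would probably come out in the wrong order.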
I am calling a server and getting some data in string format.
The data contains some special characters such as - and ' , but when I set that string on a TextView, these special characters are converted into ? .
How can I avoid this issue? Please help.
First try:
String t = "<![CDATA["+title+"]]>";
mTitle.setText(Html.fromHtml(t));
Second try:
String base64 = Base64.encodeToString(getTitle().getBytes(), Base64.DEFAULT);
byte[] data = Base64.decode(base64, Base64.DEFAULT);
String text = new String(data, StandardCharsets.UTF_8);
mTitle.setText(text);
Try this:
tv.setText(news_item.getTitle().replaceAll("\u2019", "'"));
Refer to this link for Unicode encoding.
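The same idea can be extended to the other typographic characters that servers often send. A minimal sketch in plain Java, assuming the characters arrive intact in the String; the class name and the choice of characters are illustrative:

```java
public class PunctuationNormalizer {
    /** Replaces common typographic quotes and dashes with ASCII equivalents. */
    public static String toAscii(String text) {
        return text
                .replace('\u2018', '\'')  // left single quote
                .replace('\u2019', '\'')  // right single quote, apostrophe
                .replace('\u201C', '"')   // left double quote
                .replace('\u201D', '"')   // right double quote
                .replace('\u2013', '-')   // en dash
                .replace('\u2014', '-');  // em dash
    }

    public static void main(String[] args) {
        System.out.println(toAscii("It\u2019s \u201Cok\u201D \u2013 yes"));
    }
}
```

If the String already contains a literal ? at this point, the character was lost earlier, most likely when the response bytes were decoded with the wrong charset, and no replacement can bring it back.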
I want to get text from a URL into a String in my Android app.
Website:
<html>
<body>
Text I don't want to get.
<div id="editorText" class="answer" itemprop="text">Text I want to get</div>
</body>
Text I don't want to get.
</html>
Android:
I want the result to be:
String text = "Text I want to get";
Use the jsoup library to parse the HTML string. For more details, check this link.
You can fetch the whole content and extract what you want with substring, based on patterns. In this case it would be something like:
String urlContent = ...//getContent from URL
// the attribute uses double quotes in the page, so the pattern must too
String beginPattern = "itemprop=\"text\">";
String endPattern = "</div>";
int begin = urlContent.indexOf(beginPattern) + beginPattern.length();
// search for the closing tag only after the start of the wanted text
int end = urlContent.indexOf(endPattern, begin);
String contentNeeded = urlContent.substring(begin, end);
You can use the jsoup library, as mentioned by Maulik, and write this code:
try {
Document doc = Jsoup.connect("url of the page").get();
// doc is only in scope inside the try block, so use it here
Elements elements = doc.getElementsByTag("body");
for (Element ele : elements) {
String text = ele.ownText();
// Now here you need to add some logic
}
} catch (IOException e) {
Log.e("MyTag", e.getMessage());
}
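For a page as simple as the one in the question, the pattern-based approach can also be sketched with java.util.regex instead of manual index arithmetic. This is only an illustration: the class name is made up, the id editorText is taken from the sample page, and a regular expression will break on nested div elements, which is where an HTML parser such as jsoup is the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivTextExtractor {
    // Matches <div ... id="editorText" ...>content</div>; DOTALL lets content span lines.
    private static final Pattern EDITOR_TEXT = Pattern.compile(
            "<div[^>]*\\bid=\"editorText\"[^>]*>(.*?)</div>", Pattern.DOTALL);

    /** Returns the div's inner text, or null if the div is not found. */
    public static String extract(String html) {
        Matcher matcher = EDITOR_TEXT.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><body>Text I don't want to get."
                + "<div id=\"editorText\" class=\"answer\" itemprop=\"text\">Text I want to get</div>"
                + "</body></html>";
        System.out.println(extract(html));
    }
}
```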
String strArr[]={"सांखà¥à¤¯à¤¯à¥‹à¤—",
"'करà¥à¤®à¤¯à¥‹à¤—",
"जà¥à¤žà¤¾à¤¨à¤•à¤°à¥à¤®à¤¸à¤‚नà¥à¤¯à¤¾à¤¸à¤¯à¥‹à¤—",
"करà¥à¤®à¤¸à¤‚नà¥à¤¯à¤¾à¤¸à¤¯à¥‹à¤—",
"आतà¥à¤®à¤¸à¤‚यमयोग",
"जà¥à¤žà¤¾à¤¨à¤µà¤¿à¤œà¥à¤žà¤¾à¤¨à¤¯à¥‹à¤—"};
I have UTF-8 text like this, but when I convert it into a string I get:
सा�?�?्यय�?�?'�?र्मय�?�?�?्�?ान�?र्मस�?न्यासय�?�?�?र्मस�?न्यासय�?�?�?त्मस�?यमय�?�?
�?्�?ानवि�?्�?ानय�?�?सा�?�?्यय�?�?'�?र्मय�?�?�?्�?ान�?र्मस�?न्यासय�?�?�?र्मस�?
न्यासय�?�?�?त्मस�?यमय�?�?�?्�?ानवि�?्�?ानय�?�?सा�?�?्यय�?�?'�?र्मय�?�?�?्�?ान�?
र्मस�?न्यासय�?�?�?र्मस�?न्यासय�?�?�?त्मस�?यमय�?�?�?्�?ानवि�?्�?ानय�?�?सा�?�?्यय�?
�?'�?र्मय�?�?�?्�?ान�?र्मस�?न्यासय�?�?�?र्मस�?न्यासय�?�?�?त्मस�?यमय�?�?�?्�?ानवि�?्
�?ानय�?�?सा�?�?्यय�?�?'�?र्मय�?�?�?्�?ान�?र्मस�?न्यासय�?�?�?र्मस�?न्यासय�?�?�?
त्मस�?यमय�?�?�?्�?ानवि�?्�?ानय�?�?सा�?�?्यय�?�?'�?र्मय�?�?�?्�?ान�?र्मस�?न्यासय�?
�?�?र्मस�?न्यासय�?�?�?त्मस�?यमय�?�?�?्�?ानवि�?्�?ानय�?�?सा�?�?्यय�?�?'�?र्मय�?�?
�?्�?ान�?र्मस�?न्यासय�?�?�?र्मस�?न्यासय�?�?�?त्मस�?यमय�?�?�?्�?ानवि�?्�?ानय�?�?
Please, can anyone help me get the proper string value?
[EDIT]
Code:
public static String convertFromUTF8(String s) {
String out = null;
try {
out = new String(s.getBytes("ISO-8859-1"), "UTF-8");
} catch (java.io.UnsupportedEncodingException e) {
return null;
}
return out;
}
Assuming you are using Eclipse:
Right-click your project > select Properties
Select "Resource" on the left
Change the text file encoding to UTF-8
Try this site: http://www.string-functions.com/encodedecode.aspx
Try to find the source charset
and the target charset of your string,
then replace this line in your code:
out = new String(s.getBytes("CODE OF source charset"), "CODE OF end charset");
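The garbled literals in the question contain characters such as “ and ‹, which exist in windows-1252 but not in ISO-8859-1. That suggests (this is an assumption, not something the question confirms) that the UTF-8 bytes were read as windows-1252, in which case getBytes("ISO-8859-1") would turn those characters into ? . A minimal sketch in plain Java with a hypothetical repair helper that takes the wrongly used charset as a parameter:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Re-encodes the garbled text to its original bytes, then decodes them as UTF-8. */
    public static String repair(String garbled, Charset wronglyUsed) {
        return new String(garbled.getBytes(wronglyUsed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the damage: UTF-8 bytes of two Devanagari letters read as windows-1252.
        String original = "\u0915\u0930";
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), WINDOWS_1252);
        System.out.println(repair(garbled, WINDOWS_1252).equals(original));
        System.out.println(repair(garbled, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

Some byte values have no character in windows-1252, so text that has already passed through that charset may have lost characters that cannot be repaired this way. Where possible, it is more reliable to keep the source file or the server response in UTF-8 from the start.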
I have a very large HTML string in my app.
When I use it in the code, everything is fine, but when I try to declare it in strings.xml, I get some errors. Is there a way to simply copy the string into strings.xml? Thank you.
HTML and XML use very similar tag syntax, so raw HTML placed in strings.xml is parsed as XML markup, which is likely where the errors come from. Why not save the HTML page and package it with the application?
Save the page as an HTML file in res > raw and then call this method:
String html = Utils.readRawTextFile(ctx, R.raw.rawhtml);
public static String readRawTextFile(Context ctx, int resId)
{
InputStream inputStream = ctx.getResources().openRawResource(resId);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int i;
try {
i = inputStream.read();
while (i != -1)
{
byteArrayOutputStream.write(i);
i = inputStream.read();
}
inputStream.close();
} catch (IOException e) {
return null;
}
return byteArrayOutputStream.toString();
}
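The method above reads one byte at a time and decodes with the platform default charset. A minimal variant in plain Java, assuming the raw file is saved as UTF-8, that reads in chunks and names the charset explicitly; the class and method names are made up, and any InputStream (such as the one returned by openRawResource) could be passed in:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamText {
    /** Reads the whole stream in 4 KB chunks, closes it, and decodes the bytes as UTF-8. */
    public static String readUtf8(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream input = in) {
            int n;
            while ((n = input.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "<p>caf\u00E9</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(data)));
    }
}
```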
The error may come from special characters such as double quotes and single quotes. To overcome it, prefix each of them with a backslash (\) and the error should be resolved.
If you assign the same string programmatically, you will find the same issue with double quotes:
String mString = "your huge string with " error";
Here, too, you have to overcome it by prefixing a backslash:
String mString = "your huge string with \" error";
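Escaping a huge string by hand is error-prone, so the escaping could be done by a small helper before pasting the result into strings.xml. A minimal sketch in plain Java with a hypothetical class name; it covers only backslashes, quotes, & and <, and is not a complete implementation of the Android string resource escaping rules:

```java
public class StringsXmlEscaper {
    /** Escapes the characters that most often break a <string> entry. */
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;  // literal backslash
                case '\'': sb.append("\\'"); break;   // apostrophe
                case '"':  sb.append("\\\""); break;  // double quote
                case '&':  sb.append("&amp;"); break; // XML entity start
                case '<':  sb.append("&lt;"); break;  // XML tag start
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Tom's \"A&B\"</b>"));
    }
}
```

With the tags escaped this way, the string read back through getString() should contain the original HTML text, which can then be handed to Html.fromHtml() or a WebView.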