My XML response is:
<name> DPD futár </name>
My XML parsing code:
SAXParserFactory _saxFactory = SAXParserFactory.newInstance();
SAXParser _saxParser = _saxFactory.newSAXParser();
XMLReader _xmlReader = _saxParser.getXMLReader();
_shippingMethodList = new ArrayList<ShippingMethodItem>();
ShippingMethodParser _cartLoginParser = new ShippingMethodParser(_shippingMethodList);
_xmlReader.setContentHandler(_cartLoginParser);
InputSource is = new InputSource();
is.setEncoding("ISO-8859-1");
is.setCharacterStream(new StringReader(response));
_xmlReader.parse(is);
But I got the following string in my name variable:
DPD futÃ¡r
I also tried with:
is.setEncoding("UTF-8");
But without success. Can anybody please help me with this?
Try this one:
InputSource in = new InputSource(new InputStreamReader(url.openStream(),"ISO-8859-1"));
You should not use ISO-8859-1 (Latin-1) encoding for Hungarian data, since it does not contain national characters like ő. The letter á is also part of the Portuguese and Spanish alphabets, for example, and both are supported by Latin-1, but for Hungarian only Latin-2 (ISO-8859-2) will work. However, you should always use a Unicode encoding whenever possible.
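A quick standalone sketch of the difference (using \u escapes so the snippet itself is encoding-proof):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HungarianLettersDemo {
    public static void main(String[] args) {
        String oDoubleAcute = "\u0151"; // ő
        String aAcute = "\u00e1";       // á
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin2 = Charset.forName("ISO-8859-2");
        // ő has no Latin-1 code point, so it degrades to '?'; á survives; Latin-2 keeps both.
        System.out.println(new String(oDoubleAcute.getBytes(latin1), latin1)); // ?
        System.out.println(new String(aAcute.getBytes(latin1), latin1));       // á
        System.out.println(new String(oDoubleAcute.getBytes(latin2), latin2)); // ő
    }
}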
First, I'd check whether the received data is properly encoded, since the error may also happen when you encode your string into Unicode. The example you provide is a typical encoding error. Maybe you're encoding text that is already UTF-8 into UTF-8 again, which is wrong.
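As a rough sketch, and assuming the server actually sends UTF-8, you could hand the parser the raw byte stream wrapped in a correctly configured reader instead of a pre-decoded String (the url variable stands in for however you open the connection):

SAXParserFactory factory = SAXParserFactory.newInstance();
XMLReader xmlReader = factory.newSAXParser().getXMLReader();
xmlReader.setContentHandler(new ShippingMethodParser(_shippingMethodList));

// Decode the bytes exactly once, with the charset the server really uses.
InputSource source = new InputSource(new InputStreamReader(url.openStream(), "UTF-8"));
xmlReader.parse(source);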
Related
I need your help displaying Arabic content and making the writing start from right to left in the PDF sample that I am trying to create. Here is the sample code:
public static void main(String[] args) throws IOException {
    try {
        BaseFont ArialBase = BaseFont.createFont("C:\\Users\\dell\\Desktop\\arialbd.ttf", BaseFont.IDENTITY_H, true);
        Font ArialFont = new Font(ArialBase, 20);
        Document document = new Document(PageSize.LETTER);
        PdfWriter.getInstance(document, new FileOutputStream("C:\\Users\\dell\\Desktop\\HelloWorld.pdf"));
        document.setMargins(72f, 72f, 72f, 0f);
        document.open();
        document.add(new Paragraph("الموقع الإلكتروني,", ArialFont));
        document.close();
        System.out.println("PDF Completed");
    } catch (DocumentException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
With the above code, the Arabic text will be shown as below:
الموقع الإلكتروني,
which is unreadable, and the text runs from left to right. So how can I solve this?
Wrong encoding:
It is bad programming practice to use non-ASCII characters in your source code. For instance, you have "الموقع الإلكتروني". This String should be interpreted as double-byte Unicode characters. However, when you save the source code file using an encoding different from Unicode, when you compile that code using a different encoding, or when your JVM uses a different encoding, each double-byte character risks being corrupted, resulting in gibberish such as "Ø§Ù„Ù…ÙˆÙ‚Ø¹ Ø§Ù„Ø¥Ù„ÙƒØªØ±ÙˆÙ†ÙŠ".
How to solve this? Use the UNICODE notation: "\u0627\u0644\u0645\u0648\u0642\u0639 \u0627\u0644\u0625\u0644\u0643\u062a\u0631\u0648\u0646\u064a"
Please consult the official documentation and the free ebook The Best iText Questions on StackOverflow, where you will discover that this problem has already been described, for instance here: Can't get Czech characters while generating a PDF
Wrong font:
If you read this book carefully, you'll discover that your example might not work because you may be using the wrong font. This is explained in my answer to this question: Arabic characters from html content to pdf using iText
You are assuming that arialbd.ttf can produce Arabic glyphs. As far as I know only arialuni.ttf supports Arabic.
Wrong approach:
Furthermore, you are overlooking the fact that you can only use Arabic in the context of the ColumnText and the PdfPCell object. This is explained here: how to create persian content in pdf using eclipse
For instance:
BaseFont bf = BaseFont.createFont(
"c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(bf, 20);
ColumnText column = new ColumnText(writer.getDirectContent());
column.setSimpleColumn(36, 730, 569, 36);
column.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
column.addElement(new Paragraph(
"\u0627\u0644\u0645\u0648\u0642\u0639 \u0627\u0644\u0625\u0644\u0643\u062a\u0631\u0648\u0646\u064a", font));
column.go();
Note that I am using the Identity-H encoding because UNICODE is involved.
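For completeness, the writer variable in that snippet would come from the usual iText 5 document setup, roughly like this (the output path is just a placeholder):

Document document = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("arabic.pdf"));
document.open();
// ... build the ColumnText shown above and call column.go() here ...
document.close();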
Going with a PdfPTable will print Arabic text in the PDF. The following is the code:
Font f = FontFactory.getFont("assets/NotoNaskhArabic-Regular.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
f.setStyle(Font.BOLD);
f.setColor(new BaseColor(context.getResources().getColor(R.color.colorPrimary)));

PdfPTable pdfTable = new PdfPTable(1);
pdfTable.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);

PdfPCell pdfPCell = new PdfPCell();
pdfPCell.setBorder(Rectangle.NO_BORDER);

Paragraph paragraph = new Paragraph(string, f);
paragraph.setAlignment(PdfPCell.ALIGN_LEFT);
pdfPCell.addElement(paragraph);
pdfTable.addCell(pdfPCell);

document.add(pdfTable);
This code will add Arabic text to your PDF. pdfPCell.setBorder(Rectangle.NO_BORDER); gives the cell the look of a plain paragraph instead of a bordered table.
I need to write a PDF file, and I use this sample (http://www.vogella.com/tutorials/JavaPDF/article.html) with version "itextpdf-5.4.1.jar".
This creates the PDF file, but when a word contains "você" it writes "vocÃª".
I found this code, but it did not work:
Document document;
...
...
document.addLanguage("pt-BR");
How do I set the encoding or language to Brazilian Portuguese?
Thanks!
Take a look at my answer to Divide page in 2 parts so we can fill each with different source (this is yet another question answered in The Best iText Questions on StackOverflow). In this example, we read a series of text files that are stored in UTF-8. To achieve this, we use this method:
public Phrase createPhrase(String path) throws IOException {
    Phrase p = new Phrase();
    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(path), "UTF8"));
    String str;
    while ((str = in.readLine()) != null) {
        p.add(str);
    }
    in.close();
    return p;
}
If you remove the "UTF8" and read that text as if it were ASCII, you'd get the same behavior you are describing in your question: each byte would be treated as a single character, whereas you have characters that require two bytes.
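A minimal, self-contained sketch of that effect:

import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    public static void main(String[] args) {
        byte[] utf8 = "voc\u00ea".getBytes(StandardCharsets.UTF_8); // "você"
        // Decoding the two UTF-8 bytes of 'ê' as two single-byte characters mangles the word.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1)); // vocÃª
        System.out.println(new String(utf8, StandardCharsets.UTF_8));      // você
    }
}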
This is not really an iText question. This is a pure encoding question.
I'm encountering an odd situation whereby strings that I load from my resource XML file that have Spanish characters in them display correctly in my TextViews, but strings that I'm fetching from a JSON file loaded via HTTP at runtime display the missing-character [] boxes.
ESPAÑOL, for example, works fine when embedded in my XML strings, but when pulled from my JSON it is rendered as SPAÃ[]OL, so the Ñ is transformed into a Ã and a missing character!
I'm not sure at what point I need to intercept these strings and set the correct encoding on them. The JSON text file itself is generated on the server via Node, so I'm not entirely sure if that's the point at which I should be encoding it, or if I should be encoding the file reader on the Android side, or perhaps setting the TextView itself to some special encoding type (I'm unaware that this is an option; I'm just sort of throwing my hands in the air, really).
[EDIT]
As per ianhanniballake's suggestion, I am logging and seeing that the garbled characters are actually showing up in the log as well. However, when I look at the JSON file with a text viewer on the Android file system (it's sitting on the SD card), it appears correct.
So, it turned out that the text file was, indeed, encoded correctly and the issue was that I wasn't setting UTF-8 as my encoding on the FileInputStream...
The solution is to read the file thusly:
static String readInput() {
    StringBuffer buffer = new StringBuffer();
    try {
        FileInputStream fis = new FileInputStream("myfile.json");
        InputStreamReader isr = new InputStreamReader(fis, "UTF8");
        Reader in = new BufferedReader(isr);
        int ch;
        while ((ch = in.read()) > -1) {
            buffer.append((char) ch);
        }
        in.close();
        return buffer.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
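From there, a possible usage, assuming the JSON object has a field you want to display (the field name and the textView variable are made up for the example):

String json = readInput();
if (json != null) {
    try {
        JSONObject obj = new JSONObject(json); // org.json ships with Android
        textView.setText(obj.optString("title")); // "title" is a placeholder field name
    } catch (JSONException e) {
        e.printStackTrace();
    }
}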
I have an XML file which starts with <?xml version="1.0" encoding="iso-8859-2"?>. I read it the following way:
SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
where is is an InputStream and handler is some arbitrary handler.
Then I get this exception:
org.apache.harmony.xml.ExpatParser$ParseException: At line 41152, column 17: not well-formed (invalid token)
Actually there is a degree sign at that position, enclosed in a CDATA like this:
<![CDATA[something °]]>
Using the charset iso-8859-2, the parser should accept almost any character, including this one. This seems not to be the case. What am I doing wrong?
EDIT
I'm doing all this on Android.
Weird: it seems that the parser completely ignores the encoding attribute. I converted the file to UTF-8 while leaving the header as it was, and now my program can read it without error. Why is that?
(I'm making the InputStream like this: new BufferedInputStream(new FileInputStream(filename)), i.e. without a reader, so that cannot be the error.)
I worked around the error by detecting the encoding manually. I peeked at the XML header and looked for the encoding attribute (if available), extracted it as a String, created a Java Charset object from it via Charset.forName(), then made a Reader with the given encoding and an InputSource over that Reader, like this:
String encoding;
Charset charset;
[...]
Reader reader = new BufferedReader(new InputStreamReader(inputStream, charset));
InputSource inputSource = new InputSource(reader);
inputSource.setEncoding(encoding);
SAXParserFactory.newInstance().newSAXParser().parse(inputSource, myHandler);
Unfortunately I still don't know why the encoding could not be recognized automatically by the parser.
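The detection step elided with [...] could look roughly like the sketch below (the helper name and the regular expression are illustrative, not from the original code); the caller still has to re-open or reset the stream before handing it to the parser:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: look at the first bytes of the file for the encoding
// pseudo-attribute of the XML declaration; fall back to UTF-8 if it is absent.
static Charset sniffXmlEncoding(byte[] head) {
    String prolog = new String(head, StandardCharsets.US_ASCII);
    Matcher m = Pattern.compile("encoding=[\"']([^\"']+)[\"']").matcher(prolog);
    return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
}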
I have the following text in my strings.xml file:
\n\nSVG Service Verlags GmbH & Co. KG \n
Schwertfegerstra?e 1-3\n
D-23556 L?beck\n
This is German text.
I need to decode this using UTF-8 and then set it as the text of a TextView.
How do I go about this?
Thank you in advance.
EDIT:
I have tried the following:
String decodedstring = URLDecoder.decode(nodevalue, "UTF-8");
This also does not work. Why does it not work?
Some things to check:
1. Make sure your XML is tagged with the right encoding.
2. Make sure your XML file is saved with the right encoding. Looking at the text you pasted (from a browser?), it looks like the file is already mangled: Schwertfegerstra?e should be Schwertfegerstraße.
3. When you open the file, you need to use an InputStreamReader with the encoding set.
See this page for an example:
http://www.mkyong.com/java/how-to-read-utf-8-encoded-data-from-a-file-java/
The key bit is:
BufferedReader in = new BufferedReader(
new InputStreamReader(
new FileInputStream(fileDir), "UTF8"));
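Putting it together for the TextView case, a rough sketch (textView is assumed to exist; fileDir is the path from the example above):

StringBuilder sb = new StringBuilder();
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));
String line;
while ((line = in.readLine()) != null) {
    sb.append(line).append('\n');
}
in.close();
textView.setText(sb.toString()); // ß and ü should now survive intact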