I need to write a PDF file, and I am using this sample (http://www.vogella.com/tutorials/JavaPDF/article.html) with version "itextpdf-5.4.1.jar".
This creates the PDF file, but when the text contains "você", it is written as "vocÃª".
I found this code, but it does not work:
Document document;
...
...
document.addLanguage("pt-BR");
How do I set the encoding or language to Brazilian Portuguese?
Thanks!
Take a look at my answer to Divide page in 2 parts so we can fill each with different source (this is yet another question answered in The Best iText Questions on StackOverflow). In that example, we read a series of text files that are stored in UTF-8. To achieve this, we use this method:
public Phrase createPhrase(String path) throws IOException {
    Phrase p = new Phrase();
    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(path), "UTF8"));
    String str;
    while ((str = in.readLine()) != null) {
        p.add(str);
    }
    in.close();
    return p;
}
If you remove the "UTF8" parameter and read that text as if it were ASCII (or another single-byte encoding), you'd get the same behavior you are describing in your question: each byte would be treated as a single character, whereas you have characters that require two bytes.
This is not really an iText question. This is a pure encoding question.
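To illustrate the point, here is a minimal standalone sketch (my own addition, not part of the original answer) that reproduces the double-encoding effect in plain Java:

import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // Encode "você" as UTF-8: the "ê" becomes the two bytes C3 AA.
        byte[] utf8Bytes = "você".getBytes(StandardCharsets.UTF_8);
        // Decoding those bytes with a single-byte charset splits "ê" into two characters.
        String wrong = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        // Decoding with UTF-8 restores the original text.
        String right = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(wrong); // prints "vocÃª"
        System.out.println(right); // prints "você"
    }
}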
I know this is an old question, but I am not satisfied with the answers given before.
My question is how to display French letters (like é, à, etc.) using UTF-8 or any other technique.
Currently they are displayed as "?".
My code is:
InputStream is = null;
try {
    if (CommonUtilities.prefs.getString("KEY_LOCALE", "fr").equalsIgnoreCase("fr")) {
        is = getAssets().open("terms_txt_fr.txt");
    } else {
        is = getAssets().open("terms_txt_en.txt");
    }
    String term_tx = convertStreamToString(is);
    String valueUTF8 = new String(term_tx.getBytes(), "UTF-8");
    terms_tx.setText(valueUTF8);
} catch (IOException e) {
    e.printStackTrace();
}
Your code is fine, but you need to make sure your input is UTF-8 encoded.
There are many ways to do this, each specific to your environment (for example, your operating system). A good summary is given here:
Best way to convert text files between character sets?
On Linux (I guess it's the same on OS X) you do:
iconv -f ISO-8859-15 -t UTF-8 in.txt > out.txt
Most good text editors have options for how to save files. You might want to check that too.
In modern operating systems UTF-8 should be the default.
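As a side note, here is a hedged sketch (my own addition, reusing the asset name and the terms_tx view from the question) that reads the asset through an InputStreamReader with an explicit UTF-8 charset, so the platform default charset is never involved:

try {
    InputStream is = getAssets().open("terms_txt_fr.txt");
    // Decode the stream explicitly as UTF-8 instead of round-tripping through getBytes().
    BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n');
    }
    reader.close();
    terms_tx.setText(sb.toString());
} catch (IOException e) {
    e.printStackTrace();
}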
I'm having a problem with iText. Other people say that iText is for PDF creation only and that it cannot read or extract text from a PDF. Is that true?
If it is true, then what other options can I choose to extract text from a PDF file and save it in a variable or display it on an Android device?
If iText is capable of extracting text from a PDF, then how?
iText can extract text from PDFs. While it is true that it originated as a tool to create new and manipulate existing PDFs, in recent years it has also become better and better at extracting text. This obviously implies that you should use a current iText version (5.3.x) for text extraction.
The book "iText in Action, second edition" by the main iText developer, Bruno Lowagie, explains basic iText text extraction in chapter 15, and the samples from that chapter are available in the iText Sourceforge SVN repository, cf. Samples for chapter 15. A good starting point is ExtractPageContentSorted2 which extracts the text of a whole page.
If you have special requirements, you may use ExtractPageContentSorted1 as a starting point, which explicitly defines a text extraction strategy; depending on your requirements, you will need your own strategy. If you want the text from a specific region only, look at ExtractPageContentArea.
To really fine-tune the text extraction capabilities of iText, you should have a look at the itext-questions mailing list archive (e.g. at nabble.com), as the iText text extraction API was recently extended to serve additional use cases.
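For orientation, here is a minimal sketch (my own, not taken from the book samples) that extracts the text of every page of a document with the iText 5 parser classes:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class SimpleExtraction {
    public static String extractAll(String path) throws java.io.IOException {
        PdfReader reader = new PdfReader(path);
        StringBuilder sb = new StringBuilder();
        // Page numbers in iText are 1-based.
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            sb.append(PdfTextExtractor.getTextFromPage(reader, page)).append('\n');
        }
        reader.close();
        return sb.toString();
    }
}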
Use the code below to extract text from a PDF:

// f is the file path of the PDF file picked by the user
String pat = data.getData().getPath();
File f = new File(pat);

PdfReader read = new PdfReader(new FileInputStream(f));
PdfReaderContentParser parser = new PdfReaderContentParser(read);
StringWriter strw = new StringWriter();

int j = 1; // number of the page to extract (pages are 1-based)
SimpleTextExtractionStrategy strategy =
        parser.processContent(j, new SimpleTextExtractionStrategy());
strw.write(strategy.getResultantText());
String da = strw.toString();

// set the extracted text from the PDF file to the EditText
edt1.setText(da);
Hello, I am trying to read a UTF-8 encoded txt file with Hebrew characters in my Android application, and now that I have managed to do it, for some reason the 'a' character is always appended at the beginning of the String I read, and I wonder why.
Here is my code:
void Read() {
    try {
        File fileDir = new File("/sdcard/test.txt");
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileDir), "UTF8"));
        String str;
        while ((str = in.readLine()) != null) {
            Log.i("TEST", str);
        }
        in.close();
    } catch (UnsupportedEncodingException e) {
        System.out.println(e.getMessage());
    } catch (IOException e) {
        System.out.println(e.getMessage());
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
}
This is the result I get:
05-15 01:53:25.269: INFO/TEST(16236): אבגדהוזחטיכלמנסעפצקרשתa
In order to get a better answer, I need two questions answered:
What is the exact code point of the character in question (your "a")?
What is the exact byte sequence in your file, around the questionable area?
I'm going to take a guess here: you say the character is the first thing in the file ("appended at the beginning of the String") and that the code point you got back is in the Arabic Presentation Forms-B block. The last character of Arabic Presentation Forms-B, which oddly has nothing to do with Arabic, is U+FEFF, the byte order mark (BOM). It usually appears at the beginning of UTF-16 or UTF-32 encoded files and identifies the "endianness" of the encoding (whether the file is UTF-16LE or UTF-16BE encoded, and likewise for UTF-32). It typically does not appear, however, in UTF-8 data, as UTF-8 has no notion of "byte order". That said, some brain-dead Windows programs will stick it there, and then have an additional option of "UTF-8 without BOM". (The BOM is then used to identify a file as likely being encoded in UTF-8.) My guess is you have a BOM in your data, and your program is reading it and passing it on to you.
If this is your problem, and your file is genuinely encoded in UTF-8, you should be able to find the following byte sequence near the beginning of the file: EF BB BF, which is the UTF-8 representation of U+FEFF.
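If that turns out to be the case, one possible fix (my own sketch, reusing the file path and logging from the question) is to drop a leading U+FEFF before using the first line:

BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream("/sdcard/test.txt"), "UTF-8"));
String str;
boolean firstLine = true;
while ((str = in.readLine()) != null) {
    // A BOM, if present, is decoded as a single U+FEFF character at the start of the first line.
    if (firstLine && str.length() > 0 && str.charAt(0) == '\uFEFF') {
        str = str.substring(1);
    }
    firstLine = false;
    Log.i("TEST", str);
}
in.close();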
Is there any way to create PDF files from an Android application?
If anyone wants to generate PDFs on an Android device, here is how to do it:
http://sourceforge.net/projects/itext/ (library)
http://www.vogella.de/articles/JavaPDF/article.html (tutorial)
http://tutorials.jenkov.com/java-itext/image.html (images tutorial)
If you are developing for devices with API level 19 or higher, you can use the built-in PrintedPdfDocument: http://developer.android.com/reference/android/print/pdf/PrintedPdfDocument.html
// open a new document
PrintedPdfDocument document = new PrintedPdfDocument(context, printAttributes);

// start a page
Page page = document.startPage(0);

// draw something on the page
View content = getContentView();
content.draw(page.getCanvas());

// finish the page
document.finishPage(page);
. . .
// add more pages
. . .
// write the document content
document.writeTo(getOutputStream());

// close the document
document.close();
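The snippet above assumes a printAttributes object. As an illustration only, a minimal set of attributes could be built roughly like this (the media size, resolution, margins, and color mode below are my own placeholder choices, not part of the original answer):

PrintAttributes printAttributes = new PrintAttributes.Builder()
        .setMediaSize(PrintAttributes.MediaSize.ISO_A4)                       // A4 pages
        .setResolution(new PrintAttributes.Resolution("pdf", "pdf", 300, 300)) // 300 dpi
        .setMinMargins(PrintAttributes.Margins.NO_MARGINS)                    // no margins
        .setColorMode(PrintAttributes.COLOR_MODE_COLOR)
        .build();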
A trick to make a PDF with complex features is to make a dummy activity with the desired XML layout. You can then open this dummy activity, take a screenshot programmatically, and convert that image to PDF using this library. Of course there are limitations, such as not being able to scroll and not having more than one page, but for a limited application this is quick and easy. Hope this helps someone!
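For the image-to-PDF step, here is a rough sketch using the platform's android.graphics.pdf.PdfDocument (API 19+) instead of the library linked above; the view lookup and output file name are placeholders, and exception handling is omitted:

// Render the dummy activity's root view into a bitmap (the "screenshot").
View root = findViewById(android.R.id.content);
Bitmap bitmap = Bitmap.createBitmap(root.getWidth(), root.getHeight(), Bitmap.Config.ARGB_8888);
root.draw(new Canvas(bitmap));

// Put the bitmap on a single PDF page of the same size.
PdfDocument pdf = new PdfDocument();
PdfDocument.PageInfo pageInfo =
        new PdfDocument.PageInfo.Builder(bitmap.getWidth(), bitmap.getHeight(), 1).create();
PdfDocument.Page page = pdf.startPage(pageInfo);
page.getCanvas().drawBitmap(bitmap, 0, 0, null);
pdf.finishPage(page);

// Write the result to a file and release the document.
FileOutputStream out = new FileOutputStream(new File(getExternalFilesDir(null), "screenshot.pdf"));
pdf.writeTo(out);
pdf.close();
out.close();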
It's not easy to find a complete solution to the problem of converting arbitrary HTML to PDF with non-English letters on Android. I tested it with Russian Unicode letters.
We use three libraries:
(1) Jsoup (jsoup-1.7.3.jar) for converting HTML to XHTML,
(2) iTextPDF (itextpdf-5.5.0.jar),
(3) XMLWorker (xmlworker-5.5.1.jar).
public boolean createPDF(String rawHTML, String fileName, ContextWrapper context) {
    final String APPLICATION_PACKAGE_NAME = context.getBaseContext().getPackageName();
    File path = new File(Environment.getExternalStorageDirectory(), APPLICATION_PACKAGE_NAME);
    if (!path.exists()) { path.mkdir(); }
    File file = new File(path, fileName);
    try {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        document.open();

        // Prepare the HTML
        String htmlText = Jsoup.clean(rawHTML, Whitelist.relaxed());
        InputStream inputStream = new ByteArrayInputStream(htmlText.getBytes());

        // Write the PDF document
        XMLWorkerHelper.getInstance().parseXHtml(writer, document,
                inputStream, null, Charset.defaultCharset(), new MyFont());

        document.close();
        return true;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return false;
    } catch (DocumentException e) {
        e.printStackTrace();
        return false;
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }
}
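As a usage illustration (my own; the HTML string, file name, and activity reference are placeholders), the method above could be called from an activity like this:

boolean ok = createPDF("<html><body><p>Привет, мир!</p></body></html>",
        "test.pdf", MyActivity.this);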
The difficult part is displaying Russian letters in the PDF when using the iTextPDF XMLWorker library. For this we should create our own implementation of the FontProvider interface:
public class MyFont implements FontProvider {
    private static final String FONT_PATH = "/system/fonts/DroidSans.ttf";
    private static final String FONT_ALIAS = "my_font";

    public MyFont() { FontFactory.register(FONT_PATH, FONT_ALIAS); }

    @Override
    public Font getFont(String fontname, String encoding, boolean embedded,
            float size, int style, BaseColor color) {
        return FontFactory.getFont(FONT_ALIAS, BaseFont.IDENTITY_H,
                BaseFont.EMBEDDED, size, style, color);
    }

    @Override
    public boolean isRegistered(String name) { return name.equals(FONT_ALIAS); }
}
Here we use the standard Android font Droid Sans, which is located in the system folder:
private static final String FONT_PATH = "/system/fonts/DroidSans.ttf";
A bit late, and I have not tested it myself yet, but another library that is under the BSD license is Android PDF Writer.
Update: I have tried the library myself. It works OK for simple PDF generation (it provides methods for adding text, lines, rectangles, bitmaps, and fonts). The only problem is that the generated PDF is stored in a String in memory, which may cause memory issues with large documents.
PDFJet offers an open-source version of their library that should be able to handle any basic PDF generation task. It's a purely Java-based solution and it is stated to be compatible with Android. There is a commercial version with some additional features that does not appear to be too expensive.
Late, but relevant to the request and hopefully helpful. If using an external service (as suggested in the reply by CommonsWare), then Docmosis has a cloud service that might help by offloading the heavy processing to a cloud service. That approach is ideal in some circumstances, but of course relies on being net-connected.
You can also use the PoDoFo library. Its main advantage is that it is published under the LGPL. Since it is written in C++, you have to cross-compile it using the NDK and write the C side and a Java wrapper. Some of the required third-party libraries can be taken from the OpenCV project. Also, in the OpenCV project you can find the android.toolchain.cmake file, which will help you generate the Makefile.