I'm having a problem with iText. Other people say that iText is for PDF creation only and that it cannot read or extract text from a PDF. Is that true?
If it is true, what other options do I have to extract text from a PDF file and save it in a variable or display it on an Android device?
If iText is capable of extracting text from a PDF, then how?
iText can extract text from PDFs. While it is true that it originated as a tool to create new PDFs and manipulate existing ones, in recent years it has also become better and better at extracting text. This implies that you should use a current iText version (5.3.x) for text extraction.
The book "iText in Action, second edition" by the main iText developer, Bruno Lowagie, explains basic iText text extraction in chapter 15, and the samples from that chapter are available in the iText Sourceforge SVN repository, cf. Samples for chapter 15. A good starting point is ExtractPageContentSorted2 which extracts the text of a whole page.
If you have special requirements, you may use ExtractPageContentSorted1 as a starting point; it explicitly defines a text extraction strategy, and depending on your requirements you will need your own strategy. If you want the text from a specific region only, look at ExtractPageContentArea.
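To make the idea of a custom strategy more concrete, here is a minimal stand-alone sketch of what a location-based, region-restricted strategy does conceptually. It deliberately does not use the iText API: the Chunk class, the extract method and the coordinates are hypothetical names made up for illustration, and the real iText strategies are more elaborate (they also handle word spacing, rotated text and so on).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Stand-alone model of a location-based extraction strategy; NOT iText API. */
final class RegionTextSketch {

    /** Hypothetical text chunk: a string plus the start point of its baseline. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Keeps the chunks that start inside the rectangle [llx, urx] x [lly, ury],
     * orders them top to bottom and left to right, and joins chunks on the same
     * baseline with a space and chunks on different baselines with a newline.
     */
    static String extract(List<Chunk> chunks, float llx, float lly, float urx, float ury) {
        List<Chunk> inside = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury) {
                inside.add(c);
            }
        }
        // In default PDF user space y grows upward, so larger y values come first.
        inside.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed()
                .thenComparingDouble(c -> c.x));

        StringBuilder out = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : inside) {
            if (previous != null) {
                out.append(previous.y == c.y ? ' ' : '\n');
            }
            out.append(c.text);
            previous = c;
        }
        return out.toString();
    }
}
```

With iText itself, the chunks would come from the render callbacks of your strategy rather than from a hand-built list.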
To really fine tune the text extraction capabilities of iText, you should have a look at the itext-question mailing list archive (e.g. at nabble.com) as recently the iText text extraction API was extended to serve additional use cases.
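Use the code below to extract text from a PDF:
// iText 5 classes: com.itextpdf.text.pdf.PdfReader and com.itextpdf.text.pdf.parser.*
// data is assumed to be the Intent returned by a file picker; f is the selected PDF file
String pat = data.getData().getPath();
File f = new File(pat);
PdfReader read = new PdfReader(new FileInputStream(f));
PdfReaderContentParser parser = new PdfReaderContentParser(read);
StringWriter strw = new StringWriter();
// j was not defined in the original snippet; looping over all pages is an assumption.
// iText numbers pages starting at 1.
for (int j = 1; j <= read.getNumberOfPages(); j++) {
    TextExtractionStrategy strategy = parser.processContent(j, new SimpleTextExtractionStrategy());
    strw.write(strategy.getResultantText());
}
read.close();
String da = strw.toString();
// show the extracted text in the EditText
edt1.setText(da);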
Context
I'm using a third-party library called PsPdfKit to edit PDF files, adding annotations - like PNG images, and text decorations - on top of the PDF document. But there's a limitation of this library where I'm not able to use this annotation feature on password-protected PDF files. I can open and see the documents, but I'm unable to actually drop these annotations in.
What I'm trying to do now is figure out whether it is possible to create an editable copy of the PDF file.
Question
Is there any way on Android to create an editable copy of a password protected PDF file? Again, these password protected PDF files are only preventing writing any changes on top of the PDF, you don't actually need a password to see the PDF content.
My idea is to create an editable copy of the PDF file and then pass that copy to the PsPdfKit library.
Okay, so I've finally figured out a workaround for this issue. What I did was to instantiate the password protected PDF file via PsPdfKit and then create a writeable copy of the file using the page bitmaps of the password protected file. A bit of a hacky solution, but it did allow me to use the annotation feature.
// We open the password-protected file.
val readOnlyPdfDocument = PdfDocumentLoader.openDocument(applicationContext, readOnlyFile.toUri())
// Use the PsPdfKit API to create a copy of the pages of this document
val task = PdfProcessorTask.newPage(NewPage.fromPage(readOnlyPdfDocument, 0).backgroundColor(Color.RED).build())
for (i in 1 until readOnlyPdfDocument.pageCount) {
task.addNewPage(NewPage.fromPage(readOnlyPdfDocument, i).backgroundColor(Color.RED).build(), i)
}
// Finally store the writeable PDF document in a new file
val writeableFile = File(applicationContext.cacheDir, "Writeable.pdf")
PdfProcessor.processDocument(task, writeableFile)
I have yet another hurdle to climb with my GOOGLE DRIVE SDK Android App. I am uploading scanned images with tightly controlled index fields - user defined 'tags' from local dictionary. For instance XXX.JPG has index words "car" + "insurance". Here is a simplified code snippet:
...
body.setTitle("XXX.JPG");
body.setDescription("car, insurance");
body.setIndexableText(new IndexableText().setText("car insurance"));
body.setMimeType("image/jpeg");
body.setParents(Arrays.asList(new ParentReference().setId(...)));
FileContent cont = new FileContent("image/jpeg", new java.io.File(fullPath("xxx.jpg")));
File gooFl = _svc.files().insert(body, cont).execute();
...
Again, everything works great, except when I start a search, I get results that apparently come from some OCR post process, thus rendering my system's DICTIONARY unusable. I assume I can use a custom MIME type, but then the JPEG images become invisible for users who use standard GOOGLE DRIVE application (local, browser-based ... ). So the question is: Can I upload MIME "image/jpeg" files with custom indexes (either Indexable, or Description fields) but stop GOOGLE from OCR-ing my files and adding indexes I did not intend to have?
Just to be more specific: I search for "car insurance" and, instead of the 3 files I indexed this way, I get an unmanageable pile of other results (JPEG scanned documents) that had "car" and "insurance" somewhere in them. Not what my app wants.
Thank you in advance, sean
...
Based on Burcu's advice below, I modified my code to something that looks like this (stripped to bare bones):
// define meta-data
File body = new File();
body.setTitle("xxx.jpg");
body.setDescription(tags);
body.setIndexableText(new IndexableText().setText(tags));
body.setMimeType("image/jpeg");
body.setParents(Arrays.asList(new ParentReference().setId(_ymID)));
body.setModifiedDate(DateTime.parseRfc3339(ymdGOO));
FileContent cont =
new FileContent("image/jpeg",new java.io.File(fullPath("xxx.jpg")));
String sID = findOnGOO(driveSvc, body.getTitle());
// file not found on gooDrive, upload and fix the date
if (sID == null) {
File gooFl = driveSvc.files().insert(body, cont).setOcr(false).execute();
driveSvc.files().patch(gooFl.getId(), body).setOcr(false).setSetModifiedDate(true).execute();
// file found on gooDrive - modify metadata and/or body
} else {
// modify content + metadata
if (contentModified) {
driveSvc.files().update(sID, body, cont).setOcr(false).setSetModifiedDate(true).execute();
// only metadata (tags,...)
} else {
driveSvc.files().patch(sID, body).setOcr(false).setSetModifiedDate(true).execute();
}
}
...
It is a block that uploads or modifies a Google Drive file. The two non-standard operations are:
1/ resetting the file's 'modified' date in order to force the date of file creation - tested, works OK
2/ stopping the OCR process that interferes with my app's indexing scheme - will test shortly and update here
For the sake of simplicity, I did not include the implementation of the "findOnGOO()" method. It is a quite simple 2-liner and I can supply it upon request.
sean
On insertion, set the ocr parameter to false:
service.files().insert(body, content).setOcr(false).execute();
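Independently of the OCR setting, the app could also double-check search hits on the client before showing them. The following is only a sketch under the assumption that the description field holds nothing but the app's own comma-separated tags, as in the question's snippet; TagFields and its method names are hypothetical helpers, not part of the Drive SDK.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for the tag strings used in the question; not Drive SDK code. */
final class TagFields {

    /** Description form used in the question: "car, insurance". */
    static String toDescription(List<String> tags) {
        return String.join(", ", tags);
    }

    /** Indexable-text form used in the question: "car insurance". */
    static String toIndexableText(List<String> tags) {
        return String.join(" ", tags);
    }

    /** Splits a stored description back into a set of lower-cased tags. */
    static Set<String> parseDescription(String description) {
        Set<String> tags = new LinkedHashSet<>();
        if (description == null) {
            return tags;
        }
        for (String part : description.split(",")) {
            String tag = part.trim().toLowerCase(Locale.ROOT);
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    /** True only if every wanted tag is one of the file's own tags. */
    static boolean hasAllTags(String description, List<String> wanted) {
        Set<String> tags = parseDescription(description);
        for (String w : wanted) {
            if (!tags.contains(w.trim().toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

A result whose description is "car, insurance" passes hasAllTags for the tags car and insurance, while a file that merely mentions those words in free text does not, because its description does not split into those exact tags.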
I'm interested in writing a visualization program for the road data in the 2009 Tiger/Line Shapefiles. I'd like to draw the line data to display all the roads for my county.
The ESRI Shapefile or simply a shapefile is a popular geospatial vector data format for geographic information systems software. It is developed and regulated by ESRI as a (mostly) open specification for data interoperability among ESRI and other software products. A "shapefile" commonly refers to a collection of files with ".shp", ".shx", ".dbf", and other extensions on a common prefix name (e.g., "lakes.*"). The actual shapefile relates specifically to files with the ".shp" extension, however this file alone is incomplete for distribution, as the other supporting files are required.
Does anyone know of existing libraries for parsing and reading in the line data for Shapefiles?
GeoTools will do it. There are a ton of jars and you don't need most of them. However, reading the shapefile is just a few lines.
File file = new File("mayshapefile.shp");
try {
    Map<String, String> connect = new HashMap<>();
    connect.put("url", file.toURI().toString());
    DataStore dataStore = DataStoreFinder.getDataStore(connect);
    String[] typeNames = dataStore.getTypeNames();
    String typeName = typeNames[0];
    System.out.println("Reading content " + typeName);
    FeatureSource featureSource = dataStore.getFeatureSource(typeName);
    FeatureCollection collection = featureSource.getFeatures();
    FeatureIterator iterator = collection.features();
    try {
        while (iterator.hasNext()) {
            Feature feature = iterator.next();
            // the default geometry property holds the line data to draw
            GeometryAttribute sourceGeometry = feature.getDefaultGeometryProperty();
        }
    } finally {
        iterator.close();
    }
} catch (Exception e) {
    // report the failure instead of silently swallowing it
    e.printStackTrace();
}
Openmap has a Java API that provides read and write access to ESRI files.
There is GeoTools, or more exactly this class ShapefileDataStore.
You could try the Java ESRI Shape File Reader library. It's small, easy to install and has a very simple API.
The only drawback is that it does not read other mandatory and optional files (.shx, .dbf, etc.) that are usually shipped with a shape file.
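For a feel of what such a reader has to do with the main .shp file, here is a minimal sketch that decodes only the fixed 100-byte file header using the standard library. The byte offsets and byte orders follow the published ESRI shapefile layout; the class name ShpHeader and its fields are names chosen for this example, and the sketch does not decode the records that follow the header.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Fixed 100-byte header at the start of a .shp file (illustrative sketch). */
final class ShpHeader {
    static final int HEADER_BYTES = 100;
    static final int FILE_CODE = 9994;
    static final int POLYLINE = 3; // shape type used for line data

    final int version;
    final int shapeType;
    final long fileLengthBytes;
    final double minX, minY, maxX, maxY;

    private ShpHeader(int version, int shapeType, long fileLengthBytes,
                      double minX, double minY, double maxX, double maxY) {
        this.version = version;
        this.shapeType = shapeType;
        this.fileLengthBytes = fileLengthBytes;
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** Decodes the header; throws IllegalArgumentException if it is not a .shp header. */
    static ShpHeader parse(byte[] header) {
        if (header.length < HEADER_BYTES) {
            throw new IllegalArgumentException("Header needs " + HEADER_BYTES + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header);

        // File code (offset 0) and file length (offset 24) are big-endian.
        buf.order(ByteOrder.BIG_ENDIAN);
        int fileCode = buf.getInt(0);
        if (fileCode != FILE_CODE) {
            throw new IllegalArgumentException("Unexpected file code " + fileCode);
        }
        long lengthBytes = 2L * buf.getInt(24); // stored as a count of 16-bit words

        // Version, shape type and bounding box are little-endian.
        buf.order(ByteOrder.LITTLE_ENDIAN);
        return new ShpHeader(buf.getInt(28), buf.getInt(32), lengthBytes,
                buf.getDouble(36), buf.getDouble(44), buf.getDouble(52), buf.getDouble(60));
    }

    /** Reads exactly the header bytes from the stream and decodes them. */
    static ShpHeader read(InputStream in) throws IOException {
        byte[] header = new byte[HEADER_BYTES];
        new DataInputStream(in).readFully(header);
        return parse(header);
    }
}
```

Calling ShpHeader.read on a FileInputStream for the .shp file gives the bounding box, which is handy for scaling a county's roads to the drawing area before reading any geometry.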
You can use GUI GIS tools directly, so that there is no need to change the source code of GeoTools.
I use QGIS, which supports all the operations GeoTools does (and more).
Quantum GIS - An open source Geographic Information System for editing, merging and simplifying shapefile maps. See also: creating maps with multiple layers using Quantum GIS.
I am trying to create a PDF file by inserting some text into it with a proper structure in Android.
Document doc = new Document();
PdfWriter.getInstance(doc, new FileOutputStream("urgentz.pdf"));
doc.open();
Image image = Image.getInstance ("urgentzImageahslkdhaosd.jpg");
doc.add(new Paragraph("Your text blah bleh"));
doc.add(image);
doc.close();
The above code does not work.
Without seeing a stack trace I can only guess what's wrong. You might find these questions useful:
How to create PDFs in an Android app?
how to Generate Pdf File with Image in android?
Some ideas:
Check file paths. You have specified only file names for the PDF and the image, without any paths. Use something like /mnt/sdcard/<YourFolderHere>/somefile.pdf and make sure you have declared android.permission.WRITE_EXTERNAL_STORAGE in AndroidManifest.xml.
Use droidText instead of iText
Try another PDF Writer library http://coderesearchlabs.com/androidpdfwriter/
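The file path idea can be sketched with plain java.io. The helper below is a hypothetical example (OutputPaths and resolve are names made up here): it resolves a file name against a base directory, creates the directory if needed, and returns an absolute File. On Android the base directory would have to come from the platform, for example from Context.getExternalFilesDir(null); that choice is an assumption, not something taken from the question.

```java
import java.io.File;

/** Hypothetical helper that turns a bare file name into an absolute, writable location. */
final class OutputPaths {

    /**
     * Resolves fileName inside baseDir, creating baseDir if it does not exist yet.
     * Throws IllegalStateException if the directory cannot be created.
     */
    static File resolve(File baseDir, String fileName) {
        if (!baseDir.isDirectory() && !baseDir.mkdirs()) {
            throw new IllegalStateException("Cannot create directory " + baseDir);
        }
        return new File(baseDir, fileName).getAbsoluteFile();
    }
}
```

The result can be passed straight to the writer, e.g. new FileOutputStream(OutputPaths.resolve(dir, "urgentz.pdf")), and its getAbsolutePath() is also what Image.getInstance would need for the image file.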
I have a .pdf file that contains multiple forms.
I want to open my .pdf file, fill in the forms and save it from an Android app.
Is there any API for Android rendering?
I found iText, but I only managed to create a new PDF and then fill in a form; that is, only a .pdf file that I created myself gets filled out. I need to fill in the form in my own existing .pdf.
Thanks in advance; any help will be appreciated.
DynamicPDF Merger for Java allows you to do just that. You can take an existing PDF document, fill out the form field values and then output that newly filled PDF.
There was a recent blog post on dynamicpdf.com on setting up DynamicPDF for Java in an Android application and creating a simple PDF from it, http://www.dynamicpdf.com/Blog/post/2012/06/15/Generating-PDFs-Dynamically-on-Android.aspx.
You can easily take that example one step further and use it to accomplish your task of form filling. The following (untested) code is an example of what it would take to form fill an existing PDF on an Android device using DynamicPDF Merger for Java:
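InputStream inputStream = this.getAssets().open("PDFToFill.pdf");
// available() is only an estimate and a single read() may return early,
// so copy the stream in a loop until it is exhausted
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int count;
while ((count = inputStream.read(chunk)) != -1) {
buffer.write(chunk, 0, count);
}
inputStream.close();
byte[] samplePDF = buffer.toByteArray();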
PdfDocument objPDF = new PdfDocument(samplePDF);
MergeDocument document = new MergeDocument(objPDF);
document.getForm().getFields().getFormField("FormField1").setValue("My Text");
document.draw("[PhysicalPath]/FilledPDF.pdf");
The native PDF support on current Android platforms (including Android P) doesn't expose any controls for filling forms. 3rd-party PDF SDKs such as PSPDFKit fill this gap and allow programmatic PDF form filling:
List<FormField> formFields = document.getFormProvider().getFormFields();
for (FormField formField : formFields) {
if (formField.getType() == FormType.TEXT) {
TextFormElement textFormElement = (TextFormElement) formField.getFormElement();
textFormElement.setText("Test " + textFormElement.getName());
} else if (formField.getType() == FormType.CHECKBOX) {
CheckBoxFormElement checkBoxFormElement = (CheckBoxFormElement)formField.getFormElement();
checkBoxFormElement.toggleSelection();
}
}
(If you click on the link above, there's also a Kotlin PDF form filling example.)
Note that most SDKs on the market focus on PDF AcroForms, and not the XFA specification, which has been deprecated in the PDF 2.0 spec.