How to get Page/Sheet Count of Word/Excel documents? - android

In my project I need to show the number of pages in Word documents (.doc, .docx) and the number of sheets in Excel documents (.xls, .xlsx). I tried reading the .docx file with Docx4j, but the performance is very poor when all I need is the page count. I then tried Apache POI, but I get an error like:
"trouble writing output: Too many methods: 94086; max is 65536. By package:"
Is there any paid or open source library available for Android that can do this?

There is simply no way to show the exact number of pages in an MS Word file, because it differs from user to user. The exact number depends on printer settings, paper settings, fonts, available images, etc.
Still, you can do the following for binary files:
1) Open the file using POIFSFileSystem or NPOIFSFileSystem.
2) Extract only the FileInformationBlock, as is done in the constructor of HWPFDocumentCore.
3) Create DocumentProperties from the information in the FileInformationBlock, as is done in the constructor of HWPFDocument.
4) Get the value of the cPg property of the DOP: DocumentProperties::getCPg()
The description of this field is: "A signed integer value that specifies the last calculated or estimated count of pages in the main document, depending on the values of fExactCWords and fIncludeSubdocsInStats."
For DOCX/XLSX documents you will need to access the same (I assume) property but using SAX or StAX methods.
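For the DOCX/XLSX case, here is a minimal sketch using only java.util.zip and SAX, with no POI or Docx4j. It assumes the package stores a cached page count in the Pages element of docProps/app.xml (Word writes this, other producers may omit it) and that the workbook part is at the conventional path xl/workbook.xml. The class and method names (OoxmlCounts, docxPageCount, xlsxSheetCount) are made up for this example. Like cPg, the Pages value is only the last calculated or estimated count.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OoxmlCounts {

    /** Returns the cached Pages value from docProps/app.xml, or -1 if it is absent. */
    public static int docxPageCount(InputStream docx) throws Exception {
        final StringBuilder text = new StringBuilder();
        final int[] pages = {-1};
        parseEntry(docx, "docProps/app.xml", new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0); // start collecting the text of the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("Pages".equals(local)) {
                    pages[0] = Integer.parseInt(text.toString().trim());
                }
            }
        });
        return pages[0];
    }

    /** Counts the sheet elements in xl/workbook.xml (assumed location of the workbook part). */
    public static int xlsxSheetCount(InputStream xlsx) throws Exception {
        final int[] sheets = {0};
        parseEntry(xlsx, "xl/workbook.xml", new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("sheet".equals(local)) {
                    sheets[0]++;
                }
            }
        });
        return sheets[0];
    }

    /** Streams through the ZIP and SAX-parses only the entry with the given name. */
    private static void parseEntry(InputStream in, String name, DefaultHandler handler)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that the local name is reported
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (name.equals(e.getName())) {
                    factory.newSAXParser().parse(zip, handler);
                    return;
                }
            }
        }
    }
}
```

Because only one small XML part is parsed, this avoids loading the whole document model, which is probably where the Docx4j slowness and the POI method-count problem come from.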

Related

Snapdragon neural processing- SNPE

I am quite new to this platform, so please be kind if my question is stupid. I am currently trying to integrate a deep learning model with SNPE to detect human pose. The architecture of the model is as follows:
Input -> CNN layers -> split into two different sets of CNN layers -> 2 different output layers
So, basically, my network starts from one input and then generates two different outputs (output1 and output2), but when I execute the network in SNPE, it seems to only have information about the output2 layer. Does anyone have an idea about this situation, and is it possible for me to get the output of output1? Thank you all in advance!
I assume you have successfully converted the model to DLC and are trying to run the network with snpe-net-run tool. For getting multiple outputs, while running snpe-net-run you need to specify the output layers (in addition to input) in the file that is given to --input_list argument.
Let's assume outputlayer1 and outputlayer2 are the names of 2 output layers and ~/test/example_input.raw is the path of the input, then the input list file format for the same is as follows:
#outputlayer1 outputlayer2
~/test/example_input.raw
In the first line, # is followed by the output layer names, separated by whitespace. The next line contains the path to the input (single-input case). You can also add multiple input files, one line per iteration. If there is more than one input per iteration, whitespace should be used as the delimiter.
The general format of the input list file is as follows:
#<output_name>[<space><output_name>]
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
…
You can refer to snpe-net-run documentation for more information.
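If you generate the input list from code, a small helper can be sketched like this. It only builds the text in the format described above (a # header with the output layer names, then one line per iteration). The class name SnpeInputList and its build method are made up for this example and are not part of SNPE.

```java
import java.util.Arrays;
import java.util.List;

public class SnpeInputList {

    /** Builds the input list text: "#" plus the output layer names, then one line per iteration. */
    public static String build(List<String> outputLayers, List<String> inputLines) {
        StringBuilder sb = new StringBuilder();
        sb.append('#').append(String.join(" ", outputLayers)).append('\n');
        for (String line : inputLines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the two-line example from the answer above.
        System.out.print(build(
                Arrays.asList("outputlayer1", "outputlayer2"),
                Arrays.asList("~/test/example_input.raw")));
    }
}
```

Write the returned string to a file and pass that file to --input_list.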

Why is PDFBox overwriting multiple Fields, even when they don't match the fullyQualifiedName? (Kotlin Android)

I'm using com.tom_roush:pdfbox-android:1.8.10.1 version of PDFBox.
I have the following code.
val skillList = listOf<String>("Athletics","Acrobatics","Sleight of Hand", "Stealth","Acrana", "History","Investigation","Nature", "Religion", "Animal Handling", "Insight", "Medicine", "Perception", "Survival", "Deception", "Intimidation", "Performance", "Persuasion")
private fun getField(acroForm: PDAcroForm, name: String): PDTextField {
    return acroForm.getField(name) as PDTextField
}
var temp = 0
skillList.forEach {
    // `it` is already the field name (a String)
    val field = getField(acroForm, it)
    temp += 1
    field.value = temp.toString()
}
Here is a link to the PDF.
PDF in question
My problem is that in my final PDF (where all fields have unique names that match the list above), many of the fields end up with the value from the 17th of the 18 passes. What am I doing wrong?
This is a bug in PDFBox (1.8.x and 2.x) when filling PDF forms which only occurs if in the original form multiple fields share the same XObject as appearance stream.
In detail
Your original document contains many empty text fields. Several subsets of them share the same appearance stream, e.g. "Athletics" and "Religion":
As you can see they both share the XObject in PDF object 479.
When PDFBox fills in the form values, it first sets the value of "Athletics" to "1" and also updates the appearance XObject to show "1", and later it sets the value of "Religion" to "9" and updates the appearance XObject to show "9". The end result: In a viewer both "Athletics" and "Religion" show "9" as value.
The issue is that PDFBox assumes it can simply update an existing appearance stream when setting the value of a form field. Actually it must replace it, probably also the AP dictionary if it happens to be indirect as it might also be shared.
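To make the aliasing concrete, here is a toy model in plain Java. It does not use any PDFBox classes: Appearance and Field are made-up stand-ins for an appearance XObject and a text field, and the two setter methods only imitate "update in place" versus "replace".

```java
public class SharedAppearanceDemo {

    /** Stand-in for an appearance stream: a mutable object holding what a viewer draws. */
    public static class Appearance {
        public String shown = "";
    }

    /** Stand-in for a text field that points at an appearance, which may be shared. */
    public static class Field {
        public String value = "";
        public Appearance appearance;

        public Field(Appearance appearance) {
            this.appearance = appearance;
        }
    }

    /** Imitates the buggy behaviour: the existing appearance is modified in place. */
    public static void setValueInPlace(Field field, String value) {
        field.value = value;
        field.appearance.shown = value;
    }

    /** Imitates the fix: the field gets a fresh appearance of its own. */
    public static void setValueWithReplace(Field field, String value) {
        field.value = value;
        field.appearance = new Appearance();
        field.appearance.shown = value;
    }

    public static void main(String[] args) {
        Appearance shared = new Appearance();
        Field athletics = new Field(shared);
        Field religion = new Field(shared);

        setValueInPlace(athletics, "1");
        setValueInPlace(religion, "9");
        // The value is still "1", but the shared appearance now shows "9".
        System.out.println(athletics.value + " shown as " + athletics.appearance.shown);

        setValueWithReplace(athletics, "1");
        setValueWithReplace(religion, "9");
        System.out.println(athletics.value + " shown as " + athletics.appearance.shown);
    }
}
```

The first line printed is "1 shown as 9", which is the symptom in the filled form; after replacing instead of updating, the second line is "1 shown as 1".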
A work-around
A work-around in your case is to drop the existing empty appearances before setting the field:
field.getDictionary().removeItem(COSName.AP)
field.value = temp.toString()
(Probably that line can be shortened in Kotlin to field.dictionary.removeItem(COSName.AP) but I know next to nothing about Kotlin...)
Backgrounds
One might wonder whether a construction as found in the source PDF here (i.e. appearance streams shared by multiple text fields) is valid at all. But indeed I could not find anything forbidding this in the PDF specification, on the contrary the following section about annotations in general (form field widgets are special annotations) can be taken to explicitly allow it:
A given annotation dictionary shall be referenced from the Annots array of only one page. This requirement applies only to the annotation dictionary itself, not to subsidiary objects, which may be shared among multiple annotations.
(both ISO 32000-1 and ISO 32000-2, section 12.5.2 "Annotation Dictionaries")

How to extract Annotations in PDFTron in Android and save to db?

I'm trying to get all the annotations (e.g. ink) and save them to the db from Android.
I have looked through the PDFTron examples, particularly ElementReaderAdvTest. I can follow where it processes Element.e_path and prints out the path.
https://www.pdftron.com/documentation/samples/kt/ElementReaderAdvTest?platforms=android
How do I save each path's data? Later on I want to convert the path data to SVG.
The PDF ISO standard defines an annotation data interchange format called FDF. FDF is a PDF file with no pages, and just annotations and/or form field values.
To just extract the annotations use the following
pdfviewctrl.docLockRead();
FDFDoc fdf = pdfviewctrl.getDoc().fdfExtract(PDFDoc.e_annots_only);
pdfviewctrl.docUnlockRead();
You can then save the FDF file as binary/pdf data in whatever storage you want. You do not have to save the annotated PDF; you can merge the annotations back at any later time:
doc.fdfMerge(fdf);
You do not need to go into the ElementReader sample code; it is too low level for what you want. FDFMerge and FDFExtract are probably all you need.

How to add some data to .jpg or .mp4 file in android

Business Purpose :
1) Want to add large string(data) of length 1200 to the .jpg / .mp4 file in android mobile
2) Later the file can be uploaded to server from mobile
3) In server we retrieve the added data from the file
What I have tried with a .jpg file:
I used the code below to add the data:
ExifInterface exif = new ExifInterface(photoPath);
exif.setAttribute("UserComment", "String having length of 1000");
exif.saveAttributes();
This code works. After I set the attribute, I am able to read it back with:
String userComment=exif.getAttribute("UserComment");
On a low-end phone, saving the attribute failed with the error "stack corruption detected: aborted". I later found that it accepted only up to 663 characters.
On a high-end phone, a string of up to 1999 characters was saved by saveAttributes().
Is there any other way to add a tag, metadata, or a string to .jpg, .mp4 and .mp3 files so that the added data can be retrieved later?
Please share your views. Is it possible?
It sounds as if it certainly is possible with your approach, but you're running into various implementation limits on how long an attribute value may be.
One solution to at least investigate is of course to split your 1200-byte string into multiple shorter strings, say four 300-byte ones, and add those as UserComment0, UserComment1 and so on. That should be trivial to extract and concatenate to get back your original longer string, and might work around the limitations.
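The split and re-join part of that idea can be sketched in plain Java as below. CommentChunks is a made-up helper, not an Android API. Whether ExifInterface actually persists tag names such as UserComment0 is not verified here, so treat the tag names as an assumption to test on your devices.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentChunks {

    /** Splits data into pieces of at most chunkSize characters, in order. */
    public static List<String> split(String data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < data.length(); i += chunkSize) {
            chunks.add(data.substring(i, Math.min(data.length(), i + chunkSize)));
        }
        return chunks;
    }

    /** Concatenates the pieces back into the original string. */
    public static String join(List<String> chunks) {
        return String.join("", chunks);
    }
}
```

On the phone you would store chunk i under the i-th attribute name, and on the server read the attributes in the same order and pass them to join.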
Praveen,
take a look at Steganography project
https://github.com/johnkil/Steganography
Thanks,
Jey.

android SearchableDictionary sample - suggesting words

I just used the SearchableDictionary sample for Android, which is available in the samples folder of the Android SDK for Windows (API level 9).
I filled in the definitions.txt file using the same format, like
key - value
When I type a word in the search area, the app tries to suggest words, but it doesn't find the exact word. Here is an example: I searched for the word test, and this is definitions.txt:
acceptance test - meaning
acid test - meaning
alpha test - meaning
benchmark tests - meaning
....
flight test - meaning
load test - meaning
....
test - meaning
It finds the first 15 words of this list (fortunately it doesn't match words like attestation), but it doesn't show the exact word test!
I read DictionaryProvider and DictionaryDatabase but I could not find the root of the problem.
The question is: how can I get the exact word test suggested at the top of the list?
Check Dictionary.java; you will see the loadWords function.
That function opens the definitions.txt file, iterates over all lines, splits every line on "-" and adds the result to a HashMap dictionary.
In the same file there is the getMatches function, which takes your search keyword and gets the results from the dictionary.
I am not sure, but when you search for a suggestion, probably only the first few matches are displayed.
Because the keyword "test" occurs on so many lines and "test - meaning" is your last definition, you don't see it in the suggestion list.
You could try moving the "test - meaning" line to the very first line of definitions.txt.
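If reordering the file is not an option, another possibility is to rank the matches before truncating them. Below is a minimal sketch of that idea in plain Java. It is not the sample's actual code: SuggestionRanker and suggest are made-up names, and the matching rule (some word of the key starts with the query) is only an assumption based on the behaviour described in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SuggestionRanker {

    /** Returns up to limit keys matching the query, with exact matches listed first. */
    public static List<String> suggest(List<String> keys, String query, int limit) {
        String q = query.trim().toLowerCase(Locale.ROOT);
        List<String> exact = new ArrayList<>();
        List<String> partial = new ArrayList<>();
        for (String key : keys) {
            String k = key.trim().toLowerCase(Locale.ROOT);
            if (k.equals(q)) {
                exact.add(key);
            } else if (anyWordStartsWith(k, q)) {
                partial.add(key);
            }
        }
        exact.addAll(partial); // exact matches stay in front, file order is kept otherwise
        return new ArrayList<>(exact.subList(0, Math.min(limit, exact.size())));
    }

    /** True if any whitespace-separated word of key starts with the query. */
    private static boolean anyWordStartsWith(String key, String query) {
        for (String word : key.split("\\s+")) {
            if (word.startsWith(query)) {
                return true;
            }
        }
        return false;
    }
}
```

With this ordering the entry test is suggested first even when it is the last line of definitions.txt, and a key such as attestation is still not matched.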
