I am trying to create a PDF in my Android application using the Android PDF Writer. This is a very basic library that lets you create simple PDF files. It works quite well, but there is one thing I do not understand:
When I look at the generated PDF source code, I can see that the file starts with the following lines:
%PDF-1.4
%©»ªµ
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
...
What does the second line mean? I searched through a lot of different PDF syntax documentation, but I have found no hint as to what that line could mean. In all the examples I found, the %PDF-VersionXY line is directly followed by the first object / the catalog.
I am not sure whether this is valid PDF code at all, or whether it is an error caused by a charset/encoding problem in the library's source code.
Any idea what this could be about? What information could be included at this place, and is %©»ªµ valid PDF or an encoding error?
Looking at the PDF 1.4 reference here (or the current 1.7 reference here), section 3.4.1 says:
Note: If a PDF file contains binary data, as most do (see Section 3.1, “Lexical Conventions”),
it is recommended that the header line be immediately followed by a
comment line containing at least four binary characters—that is, characters whose
codes are 128 or greater. This will ensure proper behavior of file transfer applications
that inspect data near the beginning of a file to determine whether to treat the file’s
contents as text or as binary.
So your generator seems to include this additional comment line by default, even if no binary data follows. What's in it doesn't matter as long as each byte value is 128 or greater (that is, outside the ASCII range). In your case the bytes are hex A9 BB AA B5, so everything is fine and you don't have to worry about this line.
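A quick way to confirm this on your own output is to inspect the raw bytes. The following Java sketch (the class and method names are made up for illustration; it is not part of Android PDF Writer) checks whether the line after the %PDF- header is a comment containing at least four bytes with a value of 128 or greater, as the note above recommends:

```java
import java.nio.charset.StandardCharsets;

public class PdfHeaderCheck {

    /**
     * Returns true if the second line of the file is a comment ("%...")
     * containing at least four bytes with values of 128 or greater.
     */
    public static boolean hasBinaryMarkerComment(byte[] pdf) {
        // Find the end of the first line (the "%PDF-x.y" header).
        int i = 0;
        while (i < pdf.length && pdf[i] != '\n' && pdf[i] != '\r') {
            i++;
        }
        // Skip the end-of-line marker (CR, LF or CR LF).
        while (i < pdf.length && (pdf[i] == '\n' || pdf[i] == '\r')) {
            i++;
        }
        // The second line must start with '%' to be a comment.
        if (i >= pdf.length || pdf[i] != '%') {
            return false;
        }
        int highBytes = 0;
        for (i = i + 1; i < pdf.length && pdf[i] != '\n' && pdf[i] != '\r'; i++) {
            // Java bytes are signed, so mask to get the 0..255 value.
            if ((pdf[i] & 0xFF) >= 128) {
                highBytes++;
            }
        }
        return highBytes >= 4;
    }

    public static void main(String[] args) {
        // Header as in the question: "%PDF-1.4", then '%' followed by A9 BB AA B5.
        byte[] withMarker = {
            '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
            '%', (byte) 0xA9, (byte) 0xBB, (byte) 0xAA, (byte) 0xB5, '\n'
        };
        byte[] withoutMarker = "%PDF-1.4\n1 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hasBinaryMarkerComment(withMarker));    // true
        System.out.println(hasBinaryMarkerComment(withoutMarker)); // false
    }
}
```

Reading bytes rather than a decoded String matters here: once the file is decoded with some charset, the four marker bytes may turn into different characters (which is also why they look like ©»ªµ in a Latin-1 viewer).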
While reading some Android samples, I often see comments like
// BEGIN_INCLUDE (something)
// END_INCLUDE (something)
However, my current IDE (Android Studio 1.1) cannot recognise them (or maybe I am doing something wrong). I guess they serve as some kind of code region marks (like
//<editor-fold desc="Region name">
// some code
//</editor-fold>
in Android Studio/IntelliJ IDEA), but the syntax looks a lot like C++ preprocessor directives. So my question is: is there anything important I should know about these comments (besides the obvious commenting function) that could improve my code in any way?
It's for documentation purposes: the markers identify snippets to include in the target documentation. They are not really useful when editing the code; they help avoid repetition by generating documentation from the actual code.
{@sample} and {@include}
These tags copy sample text from an arbitrary file into the output javadoc HTML.
The @include tag copies the text verbatim from the given file.
The @sample tag
copies the text from the given file and strips leading and trailing whitespace
reduces the indent level of the text to the indent level of the first non-whitespace line
escapes all <, > and & characters for HTML
drops all lines containing either BEGIN_INCLUDE or END_INCLUDE so sample code can be nested
Both tags accept either a filename and an id or just a filename. If no id is provided, the entire file is copied. If an id is provided, the lines in the given file between the first two lines containing BEGIN_INCLUDE(id) and END_INCLUDE(id), for the given id, are copied. The id may contain only letters, numbers and underscores (_).
Four examples:
{@include samples/SampleCode/src/com/google/app/Notification1.java}
{@sample samples/SampleCode/src/com/google/app/Notification1.java}
{@include samples/SampleCode/src/com/google/app/Notification1.java Bleh}
{@sample samples/SampleCode/src/com/google/app/Notification1.java Bleh}
https://code.google.com/p/doclava/wiki/JavadocTags
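To make the id-based extraction concrete, here is a rough Java re-implementation of the rules quoted above: copy the lines between the first BEGIN_INCLUDE(id) and END_INCLUDE(id), then, as the sample tag does, drop nested marker lines, trim blank edges and dedent to the first non-blank line. It is an illustration only, not doclava's actual source; the class and method names are hypothetical, and accepting an optional space before the parenthesis is an assumption made to match the "// BEGIN_INCLUDE (something)" spelling seen in the samples. HTML escaping is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SnippetExtractor {

    // Matches "KIND(id)" as well as "KIND (id)".
    private static Pattern marker(String kind, String id) {
        return Pattern.compile(kind + "\\s*\\(\\s*" + Pattern.quote(id) + "\\s*\\)");
    }

    /** Lines between the first BEGIN_INCLUDE(id) and the next END_INCLUDE(id); null if not found. */
    public static List<String> extract(List<String> fileLines, String id) {
        Pattern begin = marker("BEGIN_INCLUDE", id);
        Pattern end = marker("END_INCLUDE", id);
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : fileLines) {
            if (!inside) {
                inside = begin.matcher(line).find();
            } else if (end.matcher(line).find()) {
                return out;
            } else {
                out.add(line);
            }
        }
        return null; // markers missing or never closed
    }

    /** Sample-style cleanup: drop nested markers, trim blank edges, dedent. */
    public static String toSample(List<String> snippet) {
        List<String> kept = new ArrayList<>();
        for (String line : snippet) {
            if (!line.contains("BEGIN_INCLUDE") && !line.contains("END_INCLUDE")) {
                kept.add(line);
            }
        }
        while (!kept.isEmpty() && kept.get(0).isBlank()) {
            kept.remove(0);
        }
        while (!kept.isEmpty() && kept.get(kept.size() - 1).isBlank()) {
            kept.remove(kept.size() - 1);
        }
        if (kept.isEmpty()) {
            return "";
        }
        // Indent level of the first non-whitespace line.
        String first = kept.get(0);
        int indent = first.length() - first.stripLeading().length();
        StringBuilder sb = new StringBuilder();
        for (String line : kept) {
            int n = 0;
            while (n < indent && n < line.length() && Character.isWhitespace(line.charAt(n))) {
                n++;
            }
            sb.append(line.substring(n)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "class Demo {",
            "    // BEGIN_INCLUDE (notify)",
            "    void notifyUser() {",
            "        show();",
            "    }",
            "    // END_INCLUDE (notify)",
            "}");
        System.out.print(toSample(extract(file, "notify")));
    }
}
```

Running main prints the three lines of notifyUser() with the outer four-space indent removed, which is roughly what ends up in the generated javadoc.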
I have to maintain an app translated into more than 10 different languages. Whenever a new version is developed, new strings are added to the source values.xml. The translation editor gives me an overview of which strings are missing in other languages, but at the moment there seems to be no option to get a diff XML containing just the new strings for each language. Since we use translation services, we have to pay per translated word. Therefore I always have to create the files with the missing translations manually, which is very time-consuming.
I can't imagine I'm the only one needing this particular feature. Is there a workaround / script / plugin which does solve this problem?
Back in the steam age I faced a similar problem while trying to keep about 14 translations in sync, so I created a small PHP script to help me with this.
As I said, it's pretty dated (2010 :) but it should still work. I just made it available on GitHub: https://github.com/MarcinOrlowski/android-strings-check
Basically, it diffs two translation XMLs and generates a human-readable report:
./strings-check.php values/strings.xml values-pl/strings.xml
It will give you output like this:
Missing in LANG (You need to add these)
File: values-pl/strings.xml
------------------------------------------------------
show_full_header_action
hide_full_header_action
recreating_account
Not present in BASE (remove it from your LANG file)
File: values/strings.xml
------------------------------------------------------------------
provider_note_yahoo
Summary
----------------
BASE file: 'values/strings.xml'
LANG file: 'values-pl/strings.xml'
3 missing strings
1 orphaned strings
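If you would rather stay in Java, the same comparison can be sketched with the JDK's built-in DOM parser. This is not a port of the PHP script above, only an illustration of the idea: collect the name of every <string> element in both files and compare the two sets. It assumes plain <string name="..."> entries (plurals and string-arrays are ignored) and skips strings marked translatable="false". Keeping the base text for each missing name gives you exactly the list to send to a translation service.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class StringsCheck {

    /** Collects name -> text for every translatable <string> element. */
    public static Map<String, String> readStrings(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("string");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // Strings marked translatable="false" never need a translation.
            if ("false".equals(e.getAttribute("translatable"))) {
                continue;
            }
            result.put(e.getAttribute("name"), e.getTextContent());
        }
        return result;
    }

    /** In BASE but not in LANG: these need translating (base text kept for the translator). */
    public static Map<String, String> missing(Map<String, String> base, Map<String, String> lang) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : base.entrySet()) {
            if (!lang.containsKey(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** In LANG but not in BASE: orphans that can be removed from the LANG file. */
    public static Set<String> orphaned(Map<String, String> base, Map<String, String> lang) {
        Set<String> out = new TreeSet<>(lang.keySet());
        out.removeAll(base.keySet());
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Example: java StringsCheck values/strings.xml values-pl/strings.xml
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Map<String, String> base = readStrings(builder.parse(new File(args[0])));
        Map<String, String> lang = readStrings(builder.parse(new File(args[1])));
        Map<String, String> toTranslate = missing(base, lang);
        Set<String> orphans = orphaned(base, lang);
        toTranslate.forEach((name, text) -> System.out.println("missing: " + name + " = " + text));
        orphans.forEach(name -> System.out.println("orphaned: " + name));
        System.out.println(toTranslate.size() + " missing strings, " + orphans.size() + " orphaned strings");
    }
}
```

Writing the toTranslate map back out as a <resources> file would give a per-language XML containing only the new strings.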
OK, I think I found a solution to my problem: a Python script called android-localization-helper:
https://github.com/jordanjoz1/android-localization-helper
I get no error on Android 3.0+, but on Android 2.2 and 2.3.3, when I try to parse a small XML file via XmlPullParser, the app breaks with an error:
org.xmlpull.v1.XmlPullParserException: PI must not start with xml
(position:unknown @1:5 in java.io.InputStreamReader@40568770)
What is the PI mentioned in the error?
I found out that this may be caused by the first line of the XML file (<?xml version="1.0" encoding="utf-8"?>), but I could not find out why this happens only on older Android versions.
If the first line of the XML file is the cause of the error, how can I fix this?
Should I:
a) ask the web server admin to change the XML? If so, what should they change?
b) strip the beginning of the InputStream using a BufferedReader?
Somehow I think the second approach will cause extra delays on weak Android phones.
EDIT
I pulled the XML content from the debugger and saw that the first line ends with \r\n, followed by the next characters. Does this tell you anything?
And this is what the XML file looks like. It's a small one, and there is no visible reason why the app is crashing.
<?xml version="1.0" encoding="utf-8"?>
<song>
<artist>Pink Floyd</artist>
<title>Shine On You Crazy Diamond</title>
<picture>http://www.xxyz.com/images/image.jpg</picture>
<time>Fri, 23 Nov 2012 11:22:31 GMT</time>
</song>
This is what the InputStream taken from this XML looks like (starting chars only).
Please advise!
I had the same problem with a data file used by an app. I was getting this exception on Android 2.3.*, and the cause was the UTF-8 BOM. The easiest solution I found, on Windows, was the Notepad++ tool: it lets you convert the file encoding to UTF-8 without BOM, and that's it.
After checking the XML parser source, it seems that the issue is caused by a byte order mark at the beginning of the response, in my case '77u/' (in base64).
If you don't convert the stream to a String but parse it directly, the parser correctly throws this away.
For example instead of
String xml = new String(xmlData, "UTF-8");
KXmlParser parser = new KXmlParser();
parser.setInput(new StringReader(xml));
use
KXmlParser parser = new KXmlParser();
parser.setInput(new ByteArrayInputStream(xmlData), null);
As an alternative, you could also take the substring starting at the first '<'.
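Both workarounds come down to making sure nothing precedes <?xml. The XML declaration is only allowed at the very start of a document; if a BOM decoded as the character U+FEFF, or a blank line, comes first, the parser treats <?xml ...?> as an ordinary processing instruction (the "PI" in the error), and a processing instruction named xml is not allowed. A minimal JDK-only sketch of the two fixes follows; the helper names are hypothetical, and only the 3-byte UTF-8 BOM (EF BB BF, which is '77u/' in base64) is handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class XmlPrologFix {

    /** Removes a leading UTF-8 byte order mark (EF BB BF) from raw bytes, if present. */
    public static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    /** Drops everything before the first '<' (BOM character, blank lines, stray whitespace). */
    public static String trimToFirstTag(String xml) {
        int start = xml.indexOf('<');
        return start > 0 ? xml.substring(start) : xml;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        // new String(...) keeps the BOM as the character U+FEFF, which is what trips the parser.
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(decoded.charAt(0) == '\uFEFF'); // true
        System.out.println(new String(stripUtf8Bom(withBom), StandardCharsets.UTF_8)); // <a/>
        System.out.println(trimToFirstTag(decoded)); // <a/>
    }
}
```

On the device, the cleaned bytes would then go to the parser through the stream-based call, e.g. parser.setInput(new ByteArrayInputStream(stripUtf8Bom(xmlData)), "UTF-8"). Working on the bytes avoids building an intermediate String just to trim it.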
I encountered the same problem.
Replacing all '\n' characters with spaces ' ' made it work for me.
PS:
As @Ahmad said, a blank line before the XML declaration could also cause this problem, so it is better to check that the first character of the XML string is '<'.
I am currently working on an Android project using Tesseract OCR. I was hoping to fine-tune the results given to the user by adding a dictionary. According to the Tesseract OCR wiki, the best way to go about this would be to
Replace tessdata/eng.user-words with your own word list, in the same
format - UTF8 text, one word per line.
However, there is no eng.user-words file in the tessdata folder, so I assume that if I just create a text file with my dictionary in it, it will never be used...
Has anybody had a similar experience and knows what to do?
If you're using Tesseract 3 (which I assume you are), you'll have to rebuild your eng.traineddata file.
I intended to replace the word-dawg file completely to try to get better results (i.e. the words I'm detecting are always the same).
You'll need the combine_tessdata and wordlist2dawg executables, which are built in the training directory when you compile Tesseract.
Unpack everything (I did this just to back up my eng.word-dawg; you'll also need the unicharset later):
./combine_tessdata -u eng.traineddata
Create a text file of your word list (wordlistfile).
Create an eng.word-dawg:
./wordlist2dawg wordlistfile eng.word-dawg traineddat_backup/.unicharset
Replace the word-dawg file:
./combine_tessdata -o eng.traineddata eng.word-dawg
That should be it.
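For the word-list step, the wiki describes the format as UTF-8 text with one word per line. A small Java sketch for producing such a file from your own collection of words is below; trimming, de-duplicating and sorting are choices made in this sketch, not documented requirements of wordlist2dawg, and the output name wordlistfile just matches the commands above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class WordListWriter {

    /** Writes the words as UTF-8 text, one per line (LF endings), dropping blanks and duplicates. */
    public static Path write(Collection<String> words, Path target) throws IOException {
        TreeSet<String> unique = new TreeSet<>();
        for (String word : words) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        String content = unique.isEmpty() ? "" : String.join("\n", unique) + "\n";
        return Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Produces ./wordlistfile, the input file for the wordlist2dawg step.
        Path out = write(List.of("Total", "Invoice", " Invoice ", ""), Paths.get("wordlistfile"));
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8)); // [Invoice, Total]
    }
}
```

Writing "\n" explicitly (instead of the platform line separator) keeps the file identical whether it is generated on Windows or Linux.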
I have some reference data in a text file (~5MB) that I want to use with my Android application.
The file is of the format:
1|a|This is line 1a
1|b|This is line 1b
2|a|This is line 2a
2|b|This is line 2b
2|c|This is line 2c
What I want to know is the most efficient way (less memory, fast, size etc.) to use this file within my application.
a.) Should I save the file as a raw resource and open and read the whole file whenever I need a certain line?
b.) Should I convert the file to XML and use XPath to query the file whenever I need to look up a value?
<!--sample XML -->
<data>
<line number="1">
<entry name="a">This is line 1 a</entry>
</line>
</data>
c.) Should I just copy & paste the whole file as a static string array in the application and use that?
... any other suggestions are welcome.
[EDIT]
I will also need to search this file and jump to arbitrary keywords, e.g. "line 1a".
XML will always take longer to read than simple text or CSV files. What XML gives you in the tradeoff is a highly structured and reliable way of storing and retrieving data. XML files are, as you can see in the examples above, a good 2-3x larger than the data they actually contain.
If you're sure that you're never going to run into the "delimiter" character in your simple text file, then that would probably work just fine, purely from a file speed perspective.
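As a rough illustration of the plain-text route, the file format from the question can be loaded into a map keyed by the first two fields. Splitting with a limit of 3 means only the first two '|' characters act as delimiters, so a '|' inside the text field does not break the parse, which softens the delimiter concern above. The class name and the "1|a" key format are assumptions of this sketch; on Android the Reader would typically wrap getResources().openRawResource(...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class ReferenceData {

    /** Parses "number|letter|text" lines into a map keyed by "number|letter". */
    public static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> byKey = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Limit 3: the text field keeps any further '|' characters.
                String[] parts = line.split("\\|", 3);
                if (parts.length == 3) {
                    byKey.put(parts[0] + "|" + parts[1], parts[2]);
                }
                // Lines without two delimiters are silently skipped.
            }
        }
        return byKey;
    }

    public static void main(String[] args) throws IOException {
        String sample = "1|a|This is line 1a\n1|b|This is line 1b\n2|c|This is line 2c | with a pipe\n";
        Map<String, String> data = load(new StringReader(sample));
        System.out.println(data.get("1|b")); // This is line 1b
        System.out.println(data.get("2|c")); // This is line 2c | with a pipe
    }
}
```

Note that this keeps the whole file in memory, and a HashMap of Strings for ~5MB of text will take noticeably more than 5MB of heap; keyword search over the values would also be a linear scan, which is part of why the answers below lean towards SQLite.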
You have not provided enough information to answer this question. However, if I were a betting man, the answer is probably "none of the above".
I will also need to search this file
What does this mean? You are searching by some string key? By some regular expression? By a SQL-style query string where certain portions of a line are interpreted as integers versus strings versus something else? By a Google search-style string?
Each of those answers probably dictates a different technology for storing this information.
I will also need to...jump to arbitrary lines.
Why? How are you determining which "arbitrary lines" you are "jump"ing to: key? line number? byte offset? search results? something else?
And, of course, there are other questions, like:
How often is this data updated?
How is this data updated: new version of the app? download the whole file? download deltas/diffs? something else?
Is the data ASCII? UTF-8? Something else?
and so on.
Something that size that must be searched upon suggests "use a SQLite database", but some of the other answers might steer away from that solution.
If you are talking about very small amounts of data, the Android XML compiler can produce very efficient binary representations for you that you can access just like XML. On the other hand, if the data is at all large and you need arbitrary queries, I would expect SQLite to win out on performance (as well as flexibility). A small benchmark should be easy to write and would give you a good idea of the basic tradeoffs involved.
Flat-files would be a last option, imo, but could work if the file isn't very large.
If you define efficiency as (less memory, fast, size etc.), a flat or delimited file will be faster to load and save.
However, people use XML because they are willing to trade some of that speed for XML's greater flexibility and ease of use.