mXparser result rounding - android

I am trying out mXparser in an Android app and I almost have it working. But if I parse the expression "10/3", it returns 3.33333333335. Why this rounding at the end? And how do I tell mXparser to return 3.33333333333 instead?
I am writing the app in Kotlin and have added mXparser through Maven.
Alternatively, do you know of a better/more used/more maintained math parser library for Android?

The reason is that computers calculate in base 2, not base 10. The number 10/3 has an infinite expansion in both base 2 and base 10, so it must be truncated. The decimal expansion of 10/3 is 3.333..., which when you cut it off still looks like a bunch of 3's; the binary expansion is 11.010101010101..., and when you cut that off and convert back to decimal, it's entirely plausible that you get a 5 at the end.
I'm not sure you can get around that when using a computer since computers have to use binary and they also have to truncate the binary expansion.
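A minimal Java demonstration of that truncation (Double.toString is what produces the printed digits):

public class TruncationDemo {
    public static void main(String[] args) {
        // The nearest double to 10/3 is slightly above the true value;
        // Double.toString prints the shortest decimal string that
        // round-trips, which is where the trailing 5 comes from.
        System.out.println(10.0 / 3); // prints 3.3333333333333335
    }
}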

Any system based around IEEE 754 double precision will give the same answer. That includes all major programming languages. This is a very frequent SO question. See for example Is floating point math broken?
The solution is to never use the default Double.toString() method for your output. Use an output with a specific number of decimal places and the problem goes away.
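For example, with mXparser (a minimal sketch; Expression and calculate() are mXparser's documented API, and String.format does the fixed-precision rounding):

import org.mariuszgromada.math.mxparser.Expression;

public class MxparserDemo {
    public static void main(String[] args) {
        Expression e = new Expression("10/3");
        double raw = e.calculate(); // the full double, 3.3333333333333335
        // Render with a fixed number of decimal places instead of using
        // Double.toString(); the binary artifact disappears.
        System.out.println(String.format("%.11f", raw)); // 3.33333333333
    }
}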
A more complex solution is to use a rational representation of your numbers, so the result of 10/3 is stored internally as the rational number {numerator: 10, denominator: 3}. This works for basic arithmetic but can't work with functions like cos(x) or sqrt(x). The Jep parsing/evaluation library does have options to allow rational numbers. (Disclaimer: I'm one of the authors of Jep.)
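To illustrate the idea only (a hypothetical sketch, not Jep's actual API):

// Hypothetical rational type for illustration; Jep's real rational
// support has its own classes and options.
record Rational(long num, long den) {
    Rational divide(Rational o) {
        // (a/b) / (c/d) = (a*d) / (b*c); gcd normalization omitted
        return new Rational(num * o.den, den * o.num);
    }
    @Override public String toString() { return num + "/" + den; }
}
// new Rational(10, 1).divide(new Rational(3, 1)) is stored exactly as 10/3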

Related

How do the Unicode control characters work?

What I'm trying to do is display a phone number correctly under a right-to-left layout. I want +111111111, but it currently appears as 111111111+. I found a solution using the LRM (left-to-right mark), which is the Unicode control character '\u200E'.
There may be several formats for phone numbers in different parts of the world, like XXX-XXX-XXXX. To prevent further bugs, I have to understand how those control characters work, especially the ones that change the direction of strings.
In my understanding, for common characters:
strings are stored as bytes in memory.
the editor/textview loads the bytes and looks them up in Unicode.
the editor/textview shows those Unicode characters in the form of fonts.
So at which step do control characters like LRM take effect? And how can I make sure that using them does not cause further bugs?
I hope I have made this clear.
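For what it's worth, here is a sketch of the LRM fix described above, plus Android's higher-level BidiFormatter alternative (the helper method itself is illustrative):

import android.text.BidiFormatter;
import android.widget.TextView;

class PhoneDisplay {
    // U+200E (LRM) is an invisible character with strong left-to-right
    // directionality; placing it before the number gives the bidi
    // algorithm an LTR anchor, so the '+' stays on the left even when
    // the surrounding layout is right-to-left.
    static void showPhoneNumber(TextView view, String phone) {
        view.setText("\u200E" + phone);
        // Higher-level alternative: let BidiFormatter insert the
        // appropriate marks automatically:
        // view.setText(BidiFormatter.getInstance().unicodeWrap(phone));
    }
}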

sprintf() handling of %s extended ASCII (ISO 8859-1) on some runtimes?

I'm using ISO 8859-1 (the Latin-1 extended ASCII character set) in my C application. When I strcpy/strcat portions of the string together, it works fine. But when I use sprintf("%s %s", ...), on some runtimes (particularly certain versions of Android) the string is truncated when an extended ASCII character (specifically é, although I haven't tried others) is hit.
I thought %s was just supposed to copy the bytes until '\0' was hit. I suspect that strcpy/strcat works because it does do just that, without any formatting. What could possibly be going on here?
I should note that I'm not viewing the text using printf(), rather my own text rendering engine which handles ISO-8859-1 just fine.
UPDATE:
To clarify, I have an NDK app, which is keeping the string in C, and passing it to my OpenGL based text rendering engine. If I pass the full string as a char* literal, it displays fine. If I sprintf() the portions together, it gets truncated at the é character.
For example:
char buffer[1024];
strcpy(buffer, "This is ");
strcat(buffer, "the string I want to diésplay.");
That shows up fine. But this:
sprintf(buffer, "%s%s", "This is ", "the string I want to diésplay.");
Prints as:
This is the string I want to di
The behavior of s[n]printf() is specified differently than the behavior of string-manipulation functions such as strcpy() and strcat(). The printf-family functions are all required to produce the same byte sequences when presented identical formats and print items. The only difference is in where those bytes are sent. Thus, if your C library were built such that it performed a transformation on string data (maybe a transcoding) when printing to the standard streams via printf(), then it would perform that same transformation when printing to a string via sprintf().
The "f" in "printf" is for "formatted". The standard neither says nor implies that formatting a string must mean dumping its bytes to the output verbatim, so a transcoding or other transformation such as I hypothesized above is not out of the question. In fact, the docs for some versions of these functions indicate locale-dependence ("Note that the length of the strings produced is locale-dependent and difficult to predict"), so transcoding in particular is a real possibility.
Any specific explanation of the third-party observations you describe would necessarily be speculative, as you have not presented nearly enough code or data to make a confident diagnosis. I am inclined to suspect an issue revolving around running the program in a locale that uses a character encoding differing from the one used internally by the program. If so, then you may be able to reproduce the problem locally by varying the locale in which you run, and you may be able to address it by ensuring one way or another that your program always runs in a suitable locale. Among other things, you might use the setlocale() function to help here (it can both query and set the current locale), especially if you want to limit the scope in which you exercise locale control.
Since ultimately you are relying on printf-family functions only for string manipulation, however, I think it would be better to use the workaround presented in the question: as much as possible, use C's dedicated string-manipulation functions, such as strcpy() and strncat(), to perform your string building. Since you are not relying on the stdio functions for your actual output, this should be fine.

How to speed up searching alphabetized word list for leading wildcard matches

I'm a word-puzzle junkie in my spare time, so I've spent a LOT of other spare time working on a helper program that allows wildcards in search patterns. It works great. On my Dell laptop (i5, 8 GB RAM), searching a 140,000-word "dictionary" for wildcard matches has an almost imperceptible and definitely acceptable delay, and only when tens of thousands of words are returned. Java rules. So does its implementation of regex and matches().
I was hoping to port it to Android. I worked all day getting a more-or-less equivalent app to compile. With the given code architecture, there's no chance of acceptable performance.
The problem is that leading wildcard characters can (must) be allowed. E.g., ???ENE returns 15 matches--from achENE to xylENE--and *RAT returns 22 matches--from aristocRAT through zikuRAT--i.e., all 140,000 words must (?) be searched, which is going to take aaaaaaaaawhiiiiiiiiile on most (all?) Android devices. (Each took less than a second on my laptop.) (It takes my PC 3 seconds to return all 140,000 words and a little longer to eyeball them all.)
Since some word puzzles allow variable numbers of letters in words, disallowing leading wildcards cuts the heart out of the app for such puzzles. But if the search pattern had to start with a letter it would be easy enough to then do a binary search (or something quicker). (And it still might be unacceptably slow.)
Anyway, I was wondering if anybody might know some algorithm or can think of some approach that might be applied to speed up searches with leading wildcard characters.
I believe that the optimized version of what you are trying to do is widely known as the Unix/Linux utility "grep", which, if I remember correctly, uses the Boyer-Moore search algorithm.
Under the covers, Java's Pattern class uses Boyer-Moore. And it supports regex, so if you can write something to turn your wildcard search patterns into regular expressions, you can use Pattern.
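For instance, a minimal sketch of that translation, using the question's syntax (? = exactly one letter, * = any run of letters; the method name is illustrative):

import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WildcardSearch {
    static List<String> matches(List<String> words, String wildcard) {
        StringBuilder rx = new StringBuilder("(?i)"); // case-insensitive
        for (char c : wildcard.toCharArray()) {
            if (c == '?') rx.append('.');       // exactly one character
            else if (c == '*') rx.append(".*"); // any run, possibly empty
            else rx.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern p = Pattern.compile(rx.toString());
        return words.stream()
                .filter(w -> p.matcher(w).matches()) // whole-word match
                .collect(Collectors.toList());
    }
}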
There's an interesting Java implementation of grep at http://www.java2s.com/Code/Java/Regular-Expressions/AnotherGrep.htm
It uses memory-mapped files. I'm guessing that you won't be able to fit your entire word list into memory, but you could split it up into a bunch of smaller files - the implementation above memory-maps one file at a time. You'd have to do some testing to find the optimal size of a file.
I just Googled and found that keeping a second, reverse-alphabetized list might be a way to turn a leading wildcard into a trailing one, opening the door to a binary search on the start of the pattern. Interesting. But *a???ene* is also a legal search pattern in the program. What then? (Yeah. How often would you need such a search.)
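A sketch of that reversed-list trick (illustrative code; *RAT becomes a binary prefix search for TAR in a reverse-sorted copy of the dictionary):

import java.util.List;

class ReversedIndex {
    // The dictionary is stored a second time with every word reversed,
    // then sorted. A pattern like "*RAT" becomes a prefix search for
    // "TAR", which a lower-bound binary search finds in O(log n).
    static int firstWithPrefix(List<String> sortedReversedWords, String prefix) {
        int lo = 0, hi = sortedReversedWords.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedReversedWords.get(mid).compareTo(prefix) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo; // scan forward while entries start with 'prefix'
    }
}

Patterns with wildcards at both ends, like *a???ene*, defeat both the forward and the reversed index and still need a full scan.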
I just found this about Apache Lucene:
Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern.

Tesseract - OCR issues with typewriter style fonts

We are using Tesseract.NET (and the Android version too) to recognize and extract document data. It worked really well with Arial and Cambria fonts, but now we have to recognize documents set in a typewriter-style font like the one shown.
Tesseract cannot recognize it. Absolutely nothing (except the big serial number in the upper right corner).
We tried to train it, but - maybe it's our fault - it's still unstable.
What can we do?
(Btw the font is used by national offices; we cannot get it as TrueType or in any other font format.)
In the current form it is very hard for an OCR tool to recognize any letters.
Serif fonts are hard to OCR.
Letters are very close together. Some are joined.
A dictionary is not of any help.
You might be able to improve the result with the following:
As this looks like a vehicle registration certificate, you should be able to predict the positions of the text strings of interest and then OCR them separately.
When doing that, use the -psm 7 or 8 option (assume a single line or a single word); see the sketch after this list.
As some strings seem to be numbers only, you can help Tesseract by using the digits argument.
For the alphanumeric strings it might help to reduce the dictionary pruning (or completely remove the dawg files).
If strings like 'ETZ' or 'MZ' are abbreviations, you could also build a dictionary with those.
Reducing the yellow and green color is also an (easy) option you could test.
Use the barcode instead of trying to OCR the string.
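Putting a couple of these suggestions together, a rough sketch using the tess-two Android wrapper (assuming tess-two; the method, the data path, and the digits-only field are illustrative):

import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

class FieldOcr {
    // Crop the predicted field region first, then OCR it in isolation.
    static String ocrDigitsField(Bitmap field, String dataPath) {
        TessBaseAPI tess = new TessBaseAPI();
        tess.init(dataPath, "eng"); // dataPath must contain tessdata/
        // PSM 7: treat the image as a single text line.
        tess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
        // Digits-only whitelist for purely numeric fields.
        tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
        tess.setImage(field);
        String text = tess.getUTF8Text();
        tess.end();
        return text;
    }
}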
For tesseract questions it always helps if you specify the version used and, if you do image preprocessing, provide a sample image of the processed input.

max file name length in Android

I am trying to give a name to a file that I am creating. I just want to know:
what is the max file name length in Android?
Is there a specification for file names? Can I use characters like - or >?
It is apparently unsafe to use labels over 127 bytes on Android. AFAIK, the 255-byte limit is a goal, but it is a work in progress. I trashed my Galaxy Tab 10.1's sdcard file system last week when music sync software generated some filenames of around 160 characters. Limiting the filenames to 127 bytes solved the issue. Be safe: unless you are sure of your particular release, stick to a limit of 127.
About the characters: Reading here, looks like - is not a reserved character, so it may be used. > however, is reserved therefore may not be used.
About the maximum length: since I couldn't find anything specific to Android, and since Java does not restrict the length of a file name it works with (as you can see here), I'd say the maximum length is the most widely used limit, which is 255 bytes.
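If you want to be defensive in code, a conservative sanitizer along the lines discussed above might look like this (the 127-byte cap and the reserved-character set are assumptions taken from these answers):

import java.nio.charset.StandardCharsets;

class FileNames {
    static final int MAX_BYTES = 127; // conservative limit from above

    static String sanitize(String name) {
        // Replace characters reserved on common filesystems (vfat etc.).
        String cleaned = name.replaceAll("[\\\\/:*?\"<>|]", "_");
        if (cleaned.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES)
            return cleaned;
        // Truncate on a code-point boundary so multi-byte characters
        // are never cut in half.
        StringBuilder sb = new StringBuilder();
        int bytes = 0;
        for (int cp : cleaned.codePoints().toArray()) {
            int len = new StringBuilder().appendCodePoint(cp).toString()
                    .getBytes(StandardCharsets.UTF_8).length;
            if (bytes + len > MAX_BYTES) break;
            sb.appendCodePoint(cp);
            bytes += len;
        }
        return sb.toString();
    }
}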
In the specific case of resource names, like images, I've found that the max length is 100 characters, extension included. I've checked this in Android Studio 1.2 beta. I'm sure there must be something about this in the Android documentation.
Special characters such as '-' are frequently used in file names to create readability. The period ('.'), however, is the constant separator between the file name and the file type, regardless of the OS; that convention goes back to the earliest days of computing. What younger users may not realize is how rigid the naming rules were in those early years, whereas modern systems are far more permissive.
The max file name length in Android is 255.
