I am comparing two APKs to determine why one of them is so much larger than the other.
I am using APK Analyzer, and I think I might have something wrong in my brain, because I cannot make any sense of what I am looking at.
I am posting the following diff in two parts since my monitor isn't large enough to capture the entire list.
If you look at the total (the first line, highlighted in blue), the diff is a substantial 45MB.
But if you look carefully at the contents, the numbers do not seem to add up. Just glancing at them, it actually looks like the "larger" APK should be the smaller one.
I am sure I am making some sort of stupid mistake or I am just not using this tool correctly.
Thanks for your input.
I am trying out mXparser in an Android app and I almost have it working. But if I parse the expression "10/3", it returns 3.33333333335. Why this rounding at the end? And how do I tell mXparser to return 3.33333333333 instead?
I am writing the app in Kotlin and have added mXparser through Maven.
Alternatively, do you know of a better/more used/more maintained math parser library for Android?
The reason is that computers calculate in base 2, not base 10. The number 10/3 has an infinite expansion in both base 2 and base 10, so it must be truncated. The decimal expansion of 10/3 is 3.333..., which when cut off still ends in a run of 3's; the binary expansion is 11.010101010101..., and when you cut that off and convert back to decimal, it is entirely plausible that you get a 5 at the end.
I'm not sure you can get around that when using a computer: the hardware works in binary, and it has to truncate the binary expansion somewhere.
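You can actually see the full decimal expansion of the truncated binary value from Java, since the `BigDecimal(double)` constructor converts the exact stored bits without any decimal rounding:

```java
import java.math.BigDecimal;

public class ExactValue {
    public static void main(String[] args) {
        // BigDecimal(double) preserves the exact value of the stored
        // binary fraction, with no decimal rounding on output.
        System.out.println(new BigDecimal(10.0 / 3.0));
        // Prints 3.333333333333333... followed by the extra digits that
        // the binary truncation introduces (not a clean run of 3's).
    }
}
```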
Any system based around IEEE 754 double precision will give the same answer. That includes all major programming languages. This is a very frequent SO question. See for example Is floating point math broken?
The solution is to never use the default Double.toString() for your output. Format the output with a specific number of decimal places and the problem goes away.
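For example, in plain Java (the same applies to the `double` that mXparser returns):

```java
import java.util.Locale;

public class FormatDemo {
    public static void main(String[] args) {
        double v = 10.0 / 3.0;

        // Default Double.toString() exposes the binary rounding artifact.
        System.out.println(v);                                  // 3.3333333333333335

        // Formatting to a fixed number of decimal places hides it.
        // Locale.US pins the decimal separator to '.'.
        System.out.println(String.format(Locale.US, "%.10f", v)); // 3.3333333333
    }
}
```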
A more complex solution is to use a rational representation of your numbers, so the result of 10/3 is stored internally as the rational number {numerator: 10, denominator: 3}. This works for basic arithmetic but not for functions like cos(x) or sqrt(x). The Jep parsing/evaluation library does have an option for rational numbers. (Disclaimer: I'm one of the authors of Jep.)
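The idea can be sketched in a few lines (a hypothetical `Rational` class for illustration, not Jep's actual API):

```java
// Minimal exact-rational sketch; a real implementation would also
// guard against a zero denominator and overflow.
public class Rational {
    final long num, den;

    Rational(long n, long d) {
        if (d < 0) { n = -n; d = -d; }          // keep denominator positive
        long g = gcd(Math.abs(n), Math.abs(d));
        num = n / g;
        den = d / g;
    }

    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    // a/b divided by c/d is (a*d)/(b*c) -- stays exact, no truncation.
    Rational divide(Rational o) { return new Rational(num * o.den, den * o.num); }

    @Override public String toString() { return num + "/" + den; }
}
```

So 10 divided by 3 stays `10/3` internally, and any decimal rendering is deferred until output time.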
When I run the gradle task "assembleDebug" just to get a debug build to put on my phone, it also generates another APK: MyApp-debug-unaligned.apk.
I think I understand what "alignment" of a zip means: optimized placement of file boundaries for easy unzipping (correct me if I'm wrong). It's just an optimization and really doesn't have much to do with Android specifically.
So, since Android keeps all apps as apks and only seems to unzip them at run time, it would benefit to only install the aligned, optimized apks. It also takes a seemingly trivial amount of time to zip-align the package, but maybe that's just due to the size of my particular apps.
When would an unaligned zip be beneficial over its aligned alternative? Or is it just that you have to have an unaligned version in order to align it, and the process doesn't clean up the unaligned file when it's done?
You would never use an unaligned APK.
It's an intermediate product that isn't cleaned up. In my opinion it should be.
How it works:
What aligning does is put images and other large blocks of uncompressed data on a 4-byte boundary. This increases the file size slightly, but it ties each piece of data to a predictable position within a page. It avoids having to pick up multiple pages from the APK for a single image (that is, it minimizes the number of pages read), and since the image begins on a 4-byte boundary, there is a better chance we will not pick up junk data belonging to other entries.
In the end this wastes less RAM and runs faster by picking up fewer pages. A trivial but good optimization.
As for the time it takes: it is relatively trivial, so it is worth it. Obviously, the more uncompressed data you have, the longer it takes, but it is never very significant. IMHO the compiler should throw away the unaligned file, but I guess someone wanted to keep it.
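The boundary rule itself is just modular arithmetic. A minimal sketch of the padding computation (illustrative only, not the actual zipalign source):

```java
public class AlignDemo {
    // Bytes of padding needed so that data at `offset` starts on an
    // `alignment`-byte boundary (zipalign uses 4 for uncompressed entries).
    static int padding(long offset, int alignment) {
        return (int) ((alignment - (offset % alignment)) % alignment);
    }

    public static void main(String[] args) {
        System.out.println(padding(13, 4)); // 3 -> data starts at offset 16
        System.out.println(padding(16, 4)); // 0 -> already aligned
    }
}
```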
Resources:
Announcement of ZipAlign
http://android-developers.blogspot.com/2009/09/zipalign-easy-optimization.html
Current ZipAlign Docs
http://developer.android.com/tools/help/zipalign.html
About Data Structure Alignment (read about padding)
https://en.wikipedia.org/wiki/Data_structure_alignment
I'm a word-puzzle junkie in my spare time, so I've spent a LOT of other spare time working on a helper program that allows wildcards in search patterns. It works great. On my Dell laptop (i5, 8GB RAM), a search of a 140,000-word "dictionary" for wildcard matches has an almost imperceptible and definitely acceptable delay that occurs only if tens of thousands of words are returned. Java rules. So does its implementation of regex and match().
I was hoping to port it to Android. I worked all day getting a more-or-less equivalent app to compile. No chance with the given code architecture.
The problem is that leading wildcard characters can (must) be allowed. E.g., ???ENE returns 15 matches--from achENE to xylENE--and *RAT returns 22 matches--from aristocRAT through zikuRAT--i.e., all 140,000 words must (?) be searched, which is going to take aaaaaaaaawhiiiiiiiiile on most (all?) Android devices. (Each took less than a second on my laptop.) (It takes my PC 3 seconds to return all 140,000 words and a little longer to eyeball them all.)
Since some word puzzles allow variable numbers of letters in words, disallowing leading wildcards cuts the heart out of the app for such puzzles. But if the search pattern had to start with a letter it would be easy enough to then do a binary search (or something quicker). (And it still might be unacceptably slow.)
Anyway, I was wondering if anybody might know some algorithm or can think of some approach that might be applied to speed up searches with leading wildcard characters.
I believe that the optimized version of what you are trying to do is widely known as the Unix/Linux utility "grep", which, if I remember correctly, uses the Boyer-Moore search algorithm.
Under the covers, Java's Pattern class uses a Boyer-Moore variant when matching literal substrings. And it supports regex, so if you can write something to turn your wildcard search patterns into regular expressions, you can use Pattern.
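Converting the puzzle wildcards into a regex is only a couple of `replace` calls. A sketch, assuming `?` stands for exactly one letter and `*` for any run of letters:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WildcardSearch {
    // Turn a puzzle pattern like "???ENE" or "*RAT" into an anchored regex.
    static Pattern toRegex(String wildcard) {
        String regex = wildcard
                .replace("*", "[a-z]*")   // * -> any number of letters
                .replace("?", "[a-z]");   // ? -> exactly one letter
        return Pattern.compile("^" + regex + "$", Pattern.CASE_INSENSITIVE);
    }

    static List<String> search(List<String> words, String wildcard) {
        Pattern p = toRegex(wildcard);
        return words.stream()
                .filter(w -> p.matcher(w).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dict = List.of("achene", "xylene", "aristocrat", "zikurat");
        System.out.println(search(dict, "???ENE")); // [achene, xylene]
        System.out.println(search(dict, "*RAT"));   // [aristocrat, zikurat]
    }
}
```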
There's an interesting Java implementation of grep at http://www.java2s.com/Code/Java/Regular-Expressions/AnotherGrep.htm
It uses memory-mapped files. I'm guessing that you won't be able to fit your entire word list into memory, but you could split it up into a bunch of smaller files - the implementation above memory-maps one file at a time. You'd have to do some testing to find the optimal size of a file.
I just Googled and found that keeping a second copy of the list with every word reversed might be a way to turn a leading wildcard into a trailing one, opening the door to a binary search on the start of the pattern. Interesting. But *a???ene* is also a legal search pattern in the program. What then? (Yeah, how often would you need such a search?)
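For what it's worth, the reversed-list trick can be sketched like this (hypothetical helpers, assuming the pattern is a lone leading `*` plus a literal suffix, like `*RAT`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReverseTrick {
    static String reverse(String s) { return new StringBuilder(s).reverse().toString(); }

    // Find all words ending with `suffix` via binary search over a sorted
    // array of reversed words ("*RAT" becomes a prefix search for "tar").
    static List<String> endingWith(String[] reversedSorted, String suffix) {
        String prefix = reverse(suffix.toLowerCase());
        int lo = Arrays.binarySearch(reversedSorted, prefix);
        if (lo < 0) lo = -lo - 1;                 // insertion point = first candidate
        List<String> out = new ArrayList<>();
        for (int i = lo; i < reversedSorted.length
                && reversedSorted[i].startsWith(prefix); i++) {
            out.add(reverse(reversedSorted[i]));  // un-reverse for display
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = { "aristocrat", "zikurat", "achene" };
        String[] rev = Arrays.stream(words).map(ReverseTrick::reverse)
                             .sorted().toArray(String[]::new);
        System.out.println(endingWith(rev, "RAT")); // [aristocrat, zikurat]
    }
}
```

Building the reversed array is a one-time cost at startup; each search after that is O(log n) plus the size of the result.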
I just found this about Apache Lucene:
Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern.
So, I have a lot of strings (I reckon more than 100, easily.)...
I have 6-7 XML files across which these strings are spread out.
Now during development, it was obviously easier to put some similar strings into one file for convenience.
But now that I'm going to put my app on the market, I want it to be as efficient and fast as possible, even if by mere milliseconds.
Here's what I did :
I just made a copy of my project, exactly the same, except I made one new XML file, copy-pasted the strings from all the other files into it, and deleted the other files.
Now I have 1 XML file.
The problem is that my development device is new and has a 1GHz CPU, so I didn't really notice a difference. I want to know whether having just one XML file would be better for low-end (and also high-end) devices...
To be honest, this falls into the category of problem where readability is more important than performance. All the files under res/values are merged into a single resource table at build time anyway, so the difference between reading 6-7 files or just 1 would be barely anything, definitely nothing noticeable to the user.
I'd go with whichever makes your code cleanest, as the readability will be worth more in the long run than saving a nanosecond of performance, if it means losing hours of dev time when you revisit it and it's just a mess of XML.
I want to transform/instrument Dex files. The goals of transformation include measuring code coverage. Note that the source files are not available. So instrumenting Dex is the only option.
I am wondering if there are any existing code base that I could look at as examples to write a tool to achieve my goal.
I know about the Smali project and a host of other projects that build on Smali. However, none of these projects are good examples for my purpose.
I am looking for code that automatically transforms smali code or the dexlib representation from which smali is generated. The latter option is preferred for my purpose because the overhead of generating smali can be avoided.
It's a lot of code, but dx's DexMerger is an example program that transforms dex files. It's made quite complicated by the fact that it needs to guess the size of the output in order to make forward references work.
You'd also need to create infrastructure to rewrite dalvik instructions. DexMerger's InstructionTransformer does a shallow rewrite: it adjusts offsets from one mapping to another. To measure code coverage your instruction rewriting would probably need to be much more sophisticated.
Another option that has become available recently is Dexpler. It is an extension of Soot, which is a framework for analysis and instrumentation of Java programs. Dexpler reads in .apk files and converts them to the Jimple intermediate format. Jimple code can then be arbitrarily instrumented and eventually dumped into a new apk.
(For the record, I am answering my own question here)
Eventually I did not find any tool that fit my requirements. So I ended up building my own tool, called Ella, based on DexLib. Out of the box, it does a few things such as measuring code coverage, recording method traces, etc. But it can be easily extended to do other types of transformations.
In some cases smali itself does a small amount of instruction rewriting while re-assembling a dex file. Things like replacing a const-string with a const-string/jumbo, or a goto instruction with a "larger" one, if the target is out of range. This involves replacing instructions in the instruction list with potentially larger ones, and the corresponding fixing up of offsets.
CodeItem.fixInstructions is the method responsible for this.
Additionally, there is the asmdex library. I'm not all that familiar with it, but it sounds like it might be relevant to what you're wanting to do.
I know it's a bit late but just in case you're still interested or perhaps for some other readers. ASMDEX has been mentioned already. And I think that's your best bet for the moment for what you're trying to achieve.
As for adding new registers, take a look at the org.ow2.asmdex.util.RegisterShiftMethodAdapter class. It's not perfect! As a matter of fact, as it stands it handles existing 4-bit instructions badly: adding a register can push some register past 0xF, so it no longer fits in 4 bits.
But it should be a good start.