remove unused classes with proguard for Android


History/Context
I have a project[1] where size really matters. I recently moved some code to a shared lib[2] and assumed ProGuard would remove the unused classes, because my existing configuration was already reducing the size drastically. With the lib, however, the app went over the magic 100 KB mark, so I investigated: classes that I definitely do not use end up in the resulting dex file, and with their full names (not shortened to a single character). For example, I see SquareView in the dex, although the app never uses it.
Question
Surprisingly I found in the proguard documentation the following:
The library jars themselves always remain unchanged.
Can I somehow tell or trick ProGuard into processing them? I find this really strange, especially because I expect more code to be removable in the lib than in the app itself.
[1] https://github.com/ligi/FAST
[2] https://github.com/ligi/AndroidHelper

The Eclipse/Ant/Gradle build processes in the Android SDK automatically specify your code (from bin/classes) and its libraries (from libs) with the option -injars. This means that the complete application is compacted, optimized, and obfuscated (in release builds, assuming ProGuard is enabled).
The build processes only specify the Android runtime android.jar with the option -libraryjars. It is necessary to process the code, but it should not end up in the processed apk, since it is already present on the device.
So it should all work out automatically. You may still see entire libraries with their original names in processed apks, if your configuration proguard-project.txt contains lines like -keep class org.mylibrary.** { *; }. Such configuration is typically a conservative solution to account for reflection. With some research and experimentation, you can often refine the configuration and get better results. You can figure out why classes are being kept with the option -whyareyoukeeping.

I believe you have to use -injars:
-injars class_path
Specifies the input jars (or wars, ears, zips, or directories) of the application to be processed. The class files in these jars will be processed and written to the output jars. By default, any non-class files will be copied without changes. Please be aware of any temporary files (e.g. created by IDEs), especially if you are reading your input files straight from directories. The entries in the class path can be filtered, as explained in the filters section. For better readability, class path entries can be specified using multiple -injars options.
Source: http://proguard.sourceforge.net/index.html#manual/usage.html

Related

What are these files in analyse apk?

These files weren't in my project but were automatically included in the Analyze APK view.
What are these a and b packages that take up so much space?

I believe those are obfuscated classes.
While obfuscation does not remove code from your app, significant size savings can be seen in apps with DEX files that index many classes, methods, and fields. However, as obfuscation renames different parts of your code, certain tasks, such as inspecting stack traces, require additional tools

Android : Explanation of Proguard Integration

I've been doing Android development for a little while, and I've reached a point in one of my projects where I would like to use ProGuard to shrink my APK and help with the dex limit. Unfortunately, I am getting a few errors, and the answers on Stack Overflow seem to be aimed at people with more experience.
My question is: what is the relationship between proguard-android.txt and proguard-rules.pro? Why are there two separate files, and why are they in separate formats? When are the statements in these files called, and in what order? I am just looking for an explanation of the overall context of using ProGuard in a development environment.
Thank you in advance.

ProGuard manipulates Java bytecode the way you tell it to with your configuration files and the rules they contain. ProGuard can do many things, and it can completely break your app, so you have to make sure to add the correct rules.
I assume you use Gradle based builds for your apps. Then you've probably encountered this snippet that enables ProGuard for release builds of your app (or Android library):
android {
    buildTypes {
        release {
            minifyEnabled true
            proguardFiles getDefaultProguardFile('proguard-android.txt'),
                    'proguard-rules.pro'
        }
    }
    ...
}
In this config, the proguardFiles list tells the build which files containing ProGuard rules it has to use. This list can contain any number of files.
Why are the files (proguard-android.txt and proguard-rules.pro) defined differently?
The magical getDefaultProguardFile('proguard-android.txt') loads the file named proguard-android.txt from the standard location in the Android SDK (${ANDROID_SDK}/tools/proguard/).
Other config files are resolved locally, so the file proguard-rules.pro is expected to be at the root of the current Gradle module.
Why are there two separate files? And what is the relationship between proguard-android.txt and proguard-rules.pro?
ProGuard configuration is additive. You can define some rules in one file and others in other files. The rules are internally concatenated into a single list of rules.
The file loaded by getDefaultProguardFile('proguard-android.txt') contains several general rules for all Android apps (check them yourself in the file in your SDK). The local proguard-rules.pro is expected to contain rules specific to your own app. For example, you may want to make sure that a class is not stripped away when you use it only through reflection (I'll get to that later).
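The reflection case mentioned above could look roughly like the following. This is only a minimal sketch: the names ReflectionSketch, Exporter and CsvExporter are invented for illustration, and the keep rule in the comment is an example of what might go into proguard-rules.pro, not something taken from the answer.

```java
// Sketch: a class that is only ever reached through reflection.
class ReflectionSketch {

    interface Exporter {
        String export();
    }

    // No other code refers to CsvExporter by type; it is only looked up by name.
    // A hypothetical rule to protect it from shrinking and renaming:
    //   -keep class ReflectionSketch$CsvExporter { <init>(); }
    static class CsvExporter implements Exporter {
        @Override
        public String export() {
            return "csv";
        }
    }

    static String export(String format) {
        // The class name is assembled at run time, so a static analysis of the
        // bytecode has no direct reference to CsvExporter to follow.
        String className = ReflectionSketch.class.getName() + "$" + format + "Exporter";
        try {
            Exporter exporter = (Exporter) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            return exporter.export();
        } catch (ReflectiveOperationException e) {
            // This is the kind of failure a missing keep rule would cause.
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(export("Csv")); // prints "csv"
    }
}
```

Without a matching keep rule, a shrinker would be free to remove or rename CsvExporter, and the lookup by name would then fail at run time.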
Note that having multiple local files is very useful. For example you can use two local config files for debug builds - one with the release rules for your app and the second containing rules disabling obfuscation.
Also note that the additive behaviour of the configurations can be a bit troubling. If you add a rule in one config file, you cannot remove it in another. So be careful with very general rules (e.g. imagine adding -keep class ** { *; }).
When are the statements in these files called and in what order?
You can define them in any order, there's no difference. And you can define the same rule in multiple files, it doesn't matter. The order of the specified files doesn't matter either.
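One way to picture the additive, order-independent behaviour described here is as a plain set union of the rules from all files. The sketch below is a toy model under that assumption, not how ProGuard is implemented; RuleMergeSketch and the com.example.Model rule are made up for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model: the rules of all config files end up in one duplicate-free set.
class RuleMergeSketch {

    static Set<String> merge(List<List<String>> configFiles) {
        Set<String> allRules = new TreeSet<>();
        for (List<String> rules : configFiles) {
            allRules.addAll(rules); // rules are only ever added, never removed
        }
        return allRules;
    }

    public static void main(String[] args) {
        List<String> sdkDefaults = List.of(
                "-dontusemixedcaseclassnames",
                "-keepattributes *Annotation*");
        List<String> appRules = List.of(
                "-keepattributes *Annotation*",          // repeated on purpose
                "-keep class com.example.Model { *; }"); // hypothetical app rule

        Set<String> oneOrder = merge(List.of(sdkDefaults, appRules));
        Set<String> otherOrder = merge(List.of(appRules, sdkDefaults));
        System.out.println(oneOrder.equals(otherOrder)); // prints "true"
        System.out.println(oneOrder.size());             // prints "3"
    }
}
```

In this model there is no operation that takes a rule out again, which mirrors the point that a rule added in one file cannot be removed in another.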
ProGuard itself is run as a single job within the Android build (single Gradle task to be precise). The task is provided all the inputs:
classes to manipulate
library classes to use but not manipulate
output path for generated processed jar
ProGuard rules specifying the manipulation
output paths for various output information (what was removed, mapping, …)
And then it processes the files and generates an output which is further processed by the Gradle build.
How does ProGuard actually work? And why do I need the rules?
ProGuard traverses the whole call graph of classes/methods/fields/…. It starts with the classes/methods/… defined by the provided rules, then traverses the call graph, marks the classes/methods/fields/… it reaches as necessary, and keeps them for the output. So if you call it with no matching keep rules, it will generate an empty output (or maybe it will throw an error and tell you to define some, I don't remember now). ProGuard doesn't recognize calls done via reflection, so you have to add some rules to handle that. There are many other cases that require you to add some rules; check the documentation for that.
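The marking step can be sketched as a plain graph traversal. This is a simplified model working on class names only, not ProGuard's actual code; ShrinkSketch and the class names in main are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of shrinking: only what is reachable from the kept entry points survives.
class ShrinkSketch {

    // references: for each class name, the classes it refers to directly.
    // roots: the class names matched by keep rules.
    static Set<String> reachable(Map<String, List<String>> references, Set<String> roots) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!marked.add(current)) {
                continue; // already marked
            }
            for (String target : references.getOrDefault(current, List.of())) {
                pending.push(target);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references = Map.of(
                "MainActivity", List.of("Helper"),
                "Helper", List.of(),
                "SquareView", List.of("Helper"));

        // SquareView is never referenced from the kept root, so it is dropped.
        System.out.println(reachable(references, Set.of("MainActivity"))); // prints "[MainActivity, Helper]"

        // Without any keep rule there are no roots, and nothing is marked.
        System.out.println(reachable(references, Set.of())); // prints "[]"
    }
}
```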
Final notes
If you check the ProGuard documentation you can find various rules you can use. But not all of the rules are good for Android (ProGuard is a general Java tool).
Some rules are generated by the Android build itself, so you don't have to define them yourself. There are 2 types of such rules:
General config rules like -injars, -libraryjars, …
Rules generated from AndroidManifest.xml and resources (layouts). The Android build (aapt tool) generates rules to keep classes mentioned in the manifest (activities, services, receivers, …) and custom views used in layouts. You can check these generated rules in build/intermediates/proguard-rules/${PRODUCT_FLAVOR}/${BUILD_TYPE}/aapt_rules.txt
Some rules can come from aar libraries. The libraries can contain ProGuard config necessary for the library to work (there can be proguard.txt file inside).
When writing Android libraries yourself be extremely careful with the rules you want to add to the aar. Because of the additive nature of the rules, it can cause problems for the app that bundles the library.

Please explain couple of proguard keywords

Would any of you be so kind as to rephrase (in your own words) the explanations for some of the ProGuard keywords that are written in the manual? I have a hard time fully understanding what some of them mean, and what changes if they are not in the .cfg file.
The keywords I'm interested in are:
1) -dontskipnonpubliclibraryclasses and -dontskipnonpubliclibraryclassmembers
The second is explained as:
Specifies not to ignore package visible library class members (fields and methods). By default, ProGuard skips these class members while parsing library classes, as program classes will generally not refer to them. Sometimes however, program classes reside in the same packages as library classes, and they do refer to their package visible class members. In those cases, it can be useful to actually read the class members, in order to make sure the processed code remains consistent.
First of all, does it refer only to external jars? Second, what is the difference between having those flags in the .cfg file and not having them?
2) -libraryjars: I'm lost on that one. What is the purpose of this keyword? The ProGuard manual page reads:
Specifies the library jars (or aars, wars, ears, zips, apks, or directories) of the application to be processed.
So does it mean that if I don't use this flag, those jars are not put through the whole obfuscation process? But if that's the case, then why are there a lot of warnings for classes in those jars in the ProGuard output when I don't use this keyword?
Next it says:
The files in these jars will not be included in the output jars.
What does that mean exactly? Does it mean that if this flag is set, all files other than .class files will not be included in the parent application's jar?

After hours of reading I think I have my answers. What helped me most was reading the many answers by the creator of ProGuard here on Stack Overflow.
Let me start with the jars. -libraryjars is usually used for the platform jar that the application is built against, so android.jar is a good example. This jar will not be processed, and its classes will not end up in the output apk, because they are already on the client's device. They will not be obfuscated or shrunk, because a) even if they were, they would not be copied into the output apk anyway, and b) if they were obfuscated, the application would crash: Activity, say, would be renamed to "a" during obfuscation, while the Android API on the client's device is unchanged.
So -libraryjars is used for all the jars that ProGuard needs when processing our app, but whose class files will not be included in the final apk.
-injars, on the other hand, specifies all the jars that we want to be shrunk, obfuscated, etc. (unless we use keep* keywords).
The reason I had so much difficulty is that there was conflicting information about those keywords all over the place. Some people said to use -injars, some said to use -libraryjars, and some said neither. What I found out later is that the last answer is correct. No -libraryjars or -injars keyword is needed, because ADT does all this for the developer, and it uses the -injars keyword for all the jars residing in the /libs folder.
That is also why I found many people using -keep keywords with the packages of one of the jars to exclude it from obfuscation and shrinking. Because ADT uses the -injars keyword for those jars by default (and not -libraryjars, which would essentially do the same in this context), those jars are marked to be processed (obfuscated/shrunk). To negate this effect, people use -keep keywords for the packages of those jars.
As for question 1:
First of all, does it refer only to external jars? The answer is no. It refers to all libraries, even those referenced inside the attached jars.
Second, what is the difference between having those flags in the .cfg file and not having them? From what I found out, they help ProGuard process those libraries.

Why is -dontusemixedcaseclassnames included in the default proguard-android.txt file?

According to the documentation, -dontusemixedcaseclassnames turns off the feature that causes files to self-destruct if extracted on Windows. Surely that feature is a good thing when trying to hide your code. Why is the option enabled by default, and is there a downside to not using it?
-dontusemixedcaseclassnames
Specifies not to generate mixed-case class names while obfuscating. By default, obfuscated class names can contain a mix of upper-case characters and lower-case characters. This creates perfectly acceptable and usable jars. Only if a jar is unpacked on a platform with a case-insensitive filing system (say, Windows), the unpacking tool may let similarly named class files overwrite each other. Code that self-destructs when it's unpacked! Developers who really want to unpack their jars on Windows can use this option to switch off this behavior. Obfuscated jars will become slightly larger as a result. Only applicable when obfuscating.

Dalvik bytecode works fine with similar mixed-case class names. I suspect the configuration in the Android SDK contains the option to avoid confusion for developers who inspect their own compiled code.
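The overwrite problem from the quoted manual text can be made concrete with a small check for entry names that differ only in letter case. This is an illustrative sketch, not part of ProGuard; CaseCollisionSketch and the sample entry names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Finds archive entries that would map to the same file on a case-insensitive file system.
class CaseCollisionSketch {

    static List<List<String>> collisions(List<String> entryNames) {
        Map<String, List<String>> byFoldedName = new LinkedHashMap<>();
        for (String name : entryNames) {
            String folded = name.toLowerCase(Locale.ROOT);
            byFoldedName.computeIfAbsent(folded, key -> new ArrayList<>()).add(name);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> group : byFoldedName.values()) {
            if (group.size() > 1) {
                result.add(group); // these entries would overwrite each other when unpacked
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mixed-case obfuscated names: a.class and A.class are distinct inside the jar.
        System.out.println(collisions(List.of("a.class", "A.class", "b.class"))); // prints "[[a.class, A.class]]"
    }
}
```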

Does Proguard guarantee to provide the same mapping if no source has changed?

Suppose I:
build a project
clean up all binaries
build it again (no sources, resources, etc. have changed).
Does ProGuard guarantee to produce the same mapping.txt file?

ProGuard is deterministic: for the same input, it will generate the same output.
There is one subtlety though: if the operating system lists input files in a directory (notably class files that are not inside an archive) in a different order, then they may be processed in a different order, and the output can be different.
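The ordering subtlety is easy to reproduce in plain Java: Files.list makes no promise about the order in which it returns directory entries, so a tool or script that needs a stable order has to sort them itself. The sketch below shows that idea only; it is not something ProGuard is known to do, and StableListingSketch is a made-up name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Lists a directory in a stable order, independent of what the operating system returns.
class StableListingSketch {

    static List<Path> sortedEntries(Path directory) {
        try (Stream<Path> entries = Files.list(directory)) {
            // Path is Comparable, so sorted() yields the same order on every run.
            return entries.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path entry : sortedEntries(directory)) {
            System.out.println(entry.getFileName());
        }
    }
}
```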

It might actually happen, but I don't think ProGuard guarantees that.
I found this in the ProGuard documentation; it lets you reuse your mapping.txt to avoid changes to the mappings:
-applymapping filename
Specifies to reuse the given name mapping that was printed out in a previous obfuscation run of ProGuard. Classes and class members that are listed in the mapping file receive the names specified along with them. Classes and class members that are not mentioned receive new names. The mapping may refer to input classes as well as library classes. This option can be useful for incremental obfuscation, i.e. processing add-ons or small patches to an existing piece of code. If the structure of the code changes fundamentally, ProGuard may print out warnings that applying a mapping is causing conflicts. You may be able to reduce this risk by specifying the option -useuniqueclassmembernames in both obfuscation runs. Only a single mapping file is allowed. Only applicable when obfuscating.

If you want a guarantee, you have to use the mapping file as input to the obfuscation process. But then you have to carefully check all warnings about conflicts relating to that mapping file. If you ignore them, you may get subtle errors when working with reflection.
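Whether two builds really produced the same names can be checked by comparing their mapping files. The sketch below compares class-level entries only and assumes the usual mapping.txt layout, where a class entry starts at the beginning of the line as "original -> obfuscated:" and member entries are indented. MappingCompareSketch and the sample entries are invented; real input would be read from the two mapping.txt files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Extracts class renamings from mapping.txt lines so two builds can be compared.
class MappingCompareSketch {

    static Map<String, String> classMappings(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip blank lines, comments, and indented member entries.
            if (line.isEmpty() || line.startsWith("#") || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            int arrow = line.indexOf(" -> ");
            if (arrow < 0 || !line.endsWith(":")) {
                continue; // not a class entry in the assumed format
            }
            String original = line.substring(0, arrow);
            String obfuscated = line.substring(arrow + 4, line.length() - 1);
            result.put(original, obfuscated);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample content standing in for two builds' mapping.txt files.
        List<String> firstBuild = List.of(
                "com.example.MainActivity -> com.example.a:",
                "    int count -> a",
                "com.example.Helper -> com.example.b:");
        List<String> secondBuild = List.of(
                "com.example.MainActivity -> com.example.b:",
                "    int count -> a",
                "com.example.Helper -> com.example.a:");

        System.out.println(classMappings(firstBuild).equals(classMappings(firstBuild)));  // prints "true"
        System.out.println(classMappings(firstBuild).equals(classMappings(secondBuild))); // prints "false"
    }
}
```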
