Linker stripping unused classes


I am working on a cross-platform project in C++, under iOS and Android, and I am in the following situation:
I am writing a library used to load scene graphs directly from the XML files that describe them. The library has a base tree node class that implements all the functionality needed to make a class constructible by its name. Additional tree nodes are then implemented, all deriving from this base node class. This works excellently, but there is one problem: the linker 'thinks' that some of my classes are not going to be used and strips them out of the library. I have a nasty workaround right now: a file that includes all existing node headers, in which one instance of every node is created and altered to indicate to the compiler/linker that the class really is being used.
Does anybody know a good design pattern that can be used to automatically generate the required instances of all classes?
I have tried to create macros, placed in each class's .cpp file, that create a static instance of the given class, but the linker still detects that those static instances are never going to be referenced.
Or is there a linker flag that can be used to tell the linker not to strip any unused classes? As already mentioned, I am working on Android (NDK 6.0) and iOS (Xcode 4.2).
This problem is not going to be a showstopper for my project but it would really be nice to find an acceptable solution here.

The C++11 standard ([basic.start.init]) says:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized.
Therefore there is no standard way to guarantee the construction of those objects other than to list them all in one specific place.
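A minimal sketch of "list them all in one specific place": a name-to-factory registry where a single function names every node class. All names here (Node, NodeRegistry, TransformNode, MeshNode, registerAllNodes) are hypothetical. The assumption is that calling registerAllNodes() from the library's initialization code references each class directly, so the linker has a reason to keep it, unlike the per-file static instances the question describes.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class for all scene-graph nodes.
class Node {
public:
    virtual ~Node() = default;
    virtual std::string typeName() const = 0;
};

// Hypothetical registry mapping an XML element name to a factory function.
class NodeRegistry {
public:
    using Factory = std::function<std::unique_ptr<Node>()>;

    static NodeRegistry& instance() {
        static NodeRegistry registry;  // constructed on first use
        return registry;
    }

    template <typename T>
    void add(const std::string& name) {
        factories_[name] = []() -> std::unique_ptr<Node> { return std::make_unique<T>(); };
    }

    // Returns nullptr when no factory is registered under `name`.
    std::unique_ptr<Node> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

class TransformNode : public Node {
public:
    std::string typeName() const override { return "TransformNode"; }
};

class MeshNode : public Node {
public:
    std::string typeName() const override { return "MeshNode"; }
};

// The one place where every node type is listed. Adding a new node class
// means adding one line here; nothing relies on static initializers.
void registerAllNodes() {
    NodeRegistry& registry = NodeRegistry::instance();
    registry.add<TransformNode>("TransformNode");
    registry.add<MeshNode>("MeshNode");
}
```

The trade-off is the same one the answer describes: you still maintain a central list, but it is a single function rather than a file of dummy instances.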


How to best divide big app into modules?

Building the app I am working on takes a lot of time. It's the biggest one I have worked on. I tried tweaking the Gradle settings, which helps, but the build is still quite slow.
Since the app was built without modules in mind, it's just a whole lot of packages, and now I wonder how I could "extract" some of them into separate modules. AFAIK the modules should not depend on the app module, so I wondered if there is a tool or technique that would let me analyse the code and help me find the right packages to extract, since it's a lot of code.
How would you approach my issue?

This is primarily a design problem. Since there is already a large amount of code in the project, one approach is to analyse a UML diagram of the entire project structure. The goal is to identify regions of the architecture where a few classes interact in a closely coupled way; groups may also be formed based on which classes share the same external dependencies.
With this approach, you reduce the complexity of the large project, decoupling classes from external dependencies that they do not use. The individual modules you split the project into will have faster build times, and they can then be referenced in the main project as dependencies. An additional benefit is that only the modified modules are rebuilt each time you make changes.
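One way to make "classes that share the same external dependencies" measurable is the Jaccard similarity of two classes' dependency sets: the size of the intersection divided by the size of the union. This is a minimal sketch, assuming the dependency names have already been extracted by a UML or static-analysis tool; dependencyOverlap is a hypothetical helper, and the Jaccard measure is one common choice, not something the answer prescribes.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Jaccard similarity of two external-dependency sets:
// |intersection| / |union|. 1.0 = identical dependencies, 0.0 = none shared.
double dependencyOverlap(const std::set<std::string>& a,
                         const std::set<std::string>& b) {
    if (a.empty() && b.empty()) return 0.0;  // no dependencies: treat as unrelated
    std::size_t shared = 0;
    for (const std::string& dep : a) shared += b.count(dep);
    const std::size_t unionSize = a.size() + b.size() - shared;
    return static_cast<double>(shared) / static_cast<double>(unionSize);
}
```

For example, a class using {"retrofit", "gson"} and one using {"gson", "glide"} share one of three distinct dependencies, giving 1/3. Pairs with high scores are candidates for the same module.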
This Stack Overflow post discusses many UML diagram generator plugins for Android Studio. Code Iris is a good option that you can install via the Android Studio plugin menu. As an example, here is the output from Code Iris on a sample FaceTracker Android application:
The diagram shows the grouping of packages and projects. Different projects are split into separate green boxes; within these boxes are boxes for the packages and, finally, the classes and their interactions. By analysing the UML, you can first identify how best to group your classes and create individual projects. Once you have split the main project into modules, you can use Code Iris again to visualise the interactions after the structure has changed.

Your question is about source code modularization in software engineering. It is a relatively new subject, and there are few references about it. Source code modularization recasts clustering concepts onto source code.
From reference 1:
The aim of the software modularization process is to partition a software system into subsystems to provide an abstract view of the architecture of the software system, where a subsystem is made up of a set of software artifacts which collaborate with each other to implement a high-level attribute or provide a high-level service for the rest of the software system.
However, for large and complex software systems, the software modularization cannot be done manually, owing to the large number of interactions between different artifacts, and the large size of the source code. Hence, a fully automated or semiautomated tool is needed to perform software modularization.
There are many techniques (algorithms) for source code modularization (see reference 1):
Hierarchical Techniques:
Single Linkage, Complete Linkage, Average Linkage
Ward Method, Median Method, Centroid Method
Combined and Weighted Combined Methods
Search-Based Techniques:
Hill Climbing, Multiple Hill Climbing (HC)
Simulated Annealing (SA)
Genetic Algorithm (GA)
Note that you will find general clustering techniques with these same names, but modularization is a little different: here they are recast for source code modularization.
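To make the hierarchical family concrete, here is a minimal sketch of single-linkage agglomerative clustering. It assumes the static-analysis step has already produced a symmetric similarity matrix between artifacts (for example, how strongly two classes are coupled). The function name singleLinkage and the threshold stopping rule are illustrative choices, not taken from the reference.

```cpp
#include <cstddef>
#include <vector>

// Single-linkage agglomerative clustering over a symmetric similarity
// matrix sim[i][j]. Starts with one cluster per artifact and repeatedly
// merges the two clusters whose most similar pair of members is the most
// similar overall, stopping when no pair of clusters reaches `threshold`.
std::vector<std::vector<std::size_t>> singleLinkage(
        const std::vector<std::vector<double>>& sim, double threshold) {
    std::vector<std::vector<std::size_t>> clusters;
    for (std::size_t i = 0; i < sim.size(); ++i) clusters.push_back({i});

    while (clusters.size() > 1) {
        double best = -1.0;
        std::size_t bi = 0, bj = 0;
        // Single linkage: cluster similarity = max similarity of any member pair.
        for (std::size_t a = 0; a < clusters.size(); ++a)
            for (std::size_t b = a + 1; b < clusters.size(); ++b)
                for (std::size_t x : clusters[a])
                    for (std::size_t y : clusters[b])
                        if (sim[x][y] > best) { best = sim[x][y]; bi = a; bj = b; }
        if (best < threshold) break;  // nothing left that is coupled enough
        // Merge cluster bj into bi (bj > bi, so erasing bj keeps bi valid).
        clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));
    }
    return clusters;
}
```

With four classes where 0-1 and 2-3 are strongly coupled and everything else is weakly coupled, a threshold of 0.5 yields the two candidate modules {0, 1} and {2, 3}. The naive loop is cubic or worse, which is fine for a sketch but not for tens of thousands of classes.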
The overall source code modularization process is shown below:
There are many tools you can use in the modularization process:
Static source code analysis tools (to get the ADG format, etc.); see the reference here (e.g., Understand, NDepend)
Visualization tools (graph visualization); see the list here (e.g., Tom Sawyer Visualization)
For example, for a small project, if your project structure (generated from the source code by a static analysis tool) looks like this:
the result might look like this (after applying the modularization process):

I would divide my application into four layers:
Layer for Objects: in this layer you define all the objects you need, with their get and set methods. For example:
class person
{
    #region private
    private int _PersonID;
    #endregion
    #region public
    public int PersonID { get { return _PersonID; } set { _PersonID = value; } }
    #endregion
}
Layer for Data Access: this layer handles connecting to your database and everything related to stored procedures, triggers and functions (this section must be truly protected).
Do not write any SQL queries inside your code; build all your queries into your database as stored procedures and call those procedures by name from your code. For example:
using System.Collections.Generic;
class personDAO
{
    private List<person> _GetPersons()
    {
        // call the stored procedure here and map the rows to person objects
        return new List<person>();
    }
    public List<person> GetPersons() { return _GetPersons(); }
    // the delegate's return type must match the method it is bound to
    public delegate List<person> del_GetPersons();
    private del_GetPersons _del_GetPersons;
    public del_GetPersons Del_GetPersons
    {
        get { return _del_GetPersons; }
        set { _del_GetPersons = value; }
    }
    public personDAO()
    {
        // constructor: bind the public method to its delegate
        _del_GetPersons = GetPersons;
    }
}
Layer for Business Objects: this layer uses an instance of the data access class, wraps its methods and adds exception handling. "We use delegates to hide our method names, by assigning each method to its delegate in the constructor of the data access class."
For example:
using System.Collections.Generic;
class personBO
{
    private readonly personDAO _dao = new personDAO();   // instance of personDAO
    public delegate List<person> del_GetPersons();        // a separate delegate for personBO
    public del_GetPersons Del_GetPersons { get; private set; }
    private List<person> _GetPersons()
    {
        try { return _dao.Del_GetPersons(); }             // call through the DAO's delegate
        catch (System.Exception) { throw; }               // log or translate errors here
    }
    public List<person> GetPersons() { return _GetPersons(); }
    public personBO() { Del_GetPersons = GetPersons; }   // bind the public method to the delegate
}
Layer for the User Interface: finally, this is the layer the user interacts with. It consists of multiple connected forms that are handled by front-end handlers and hidden back-end handlers (which are also called through delegates).
This structure may take longer to build than others, but it is fast (since delegates make it faster) and protected (since it is divided into many layers, and you deal with hidden methods that call an instance of an object, not the object itself).

Can't remove an empty folder in C++

I want to remove an empty folder using remove() in C++ on Windows 7, but I can't. When I tried rmdir() instead of remove(), the folder was removed.
However, I can't use rmdir() because of Android: in a library project for Android I can't include the "direct.h" header, so rmdir() isn't available there. Unlike on Windows, remove() works fine on Android. I don't understand why.
Does anybody know why this is happening?
Or is there another function that works on both Windows and Android?

This is a pretty common problem when writing cross-platform programs.
Sometimes, a library can provide the abstraction you need. For example, Boost has a filesystem library that can enumerate files, manipulate directories, etc., on multiple platforms using the exact same code.
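C++17 standardized a filesystem library modeled on Boost.Filesystem as std::filesystem, and std::filesystem::remove deletes either a file or an empty directory. A minimal sketch, assuming a toolchain with C++17 <filesystem> support (older Android NDK releases may not provide it); remove_empty_dir is a hypothetical helper name.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Removes `dir` only if it is an existing, empty directory.
// Uses the error_code overloads so failures are reported as `false`
// instead of throwing fs::filesystem_error.
bool remove_empty_dir(const fs::path& dir) {
    std::error_code ec;
    if (!fs::is_directory(dir, ec) || !fs::is_empty(dir, ec)) return false;
    return fs::remove(dir, ec);  // true if the directory was deleted
}
```

The same call compiles unchanged on Windows and on POSIX systems, so no preprocessor switch is needed when the library is available.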
Also, there are usually symbols defined that allow you to detect which compiler is currently building your code. Even if there isn't one that does what you want, you can define your own.
Let's say you need to build your software for two different fictional operating systems named FooOS and BarOS. I'm going to invent two symbols, FOO_OS and BAR_OS. In my code, I can do something like this:
#if defined(FOO_OS)
#include <foo_stuff.h>
#elif defined(BAR_OS)
#include <bar_stuff.h>
#endif
void do_something()
{
#if defined(FOO_OS)
    do_it_this_way();
#elif defined(BAR_OS)
    do_it_that_way();
#endif
}
Now, we just need to either define FOO_OS or BAR_OS. This can be done through an IDE's project configuration or on the command line of the compiler. Use Google to find out about your particular situation, since you didn't include those details in your post.
There is a preprocessing step when you compile your code that makes a pass through the source, and applies these conditional statements. A following pass actually compiles the code. Here is some documentation about Visual Studio's preprocessor, for example.
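Applied to the original question: POSIX specifies that remove() on a directory behaves like rmdir(), which is why it works on Android, whereas the Microsoft C runtime's remove() deletes only files, and directories need _rmdir() from <direct.h>. This sketch uses the predefined _WIN32 symbol instead of an invented FOO_OS; portable_mkdir and portable_rmdir are hypothetical wrapper names.

```cpp
#ifdef _WIN32
#include <direct.h>     // _mkdir, _rmdir (Microsoft C runtime)
#else
#include <sys/stat.h>   // mkdir
#include <sys/types.h>  // mode_t
#include <unistd.h>     // rmdir
#endif

// Creates a directory; returns true on success.
bool portable_mkdir(const char* path) {
#ifdef _WIN32
    return _mkdir(path) == 0;
#else
    return mkdir(path, 0755) == 0;
#endif
}

// Removes an empty directory; returns true on success.
bool portable_rmdir(const char* path) {
#ifdef _WIN32
    return _rmdir(path) == 0;
#else
    return rmdir(path) == 0;
#endif
}
```

Both underlying calls fail (return -1) if the directory is not empty or does not exist, so the wrappers report that as `false`.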

How can I examine the whole source tree with an annotation processor?

I have a lot of handler classes that handle specific message types. To register all these handlers, I need to know which ones exist. Currently, they're all annotated with a specific annotation, and I use a Java 6 annotation processor to get all of them, and make a Register class that holds an instance of each of the annotated types.
This works great if the whole tree is being built at once, but if just one of the annotated classes is being built (when I save the file in Eclipse, for example), the processor only sees that type, and builds an incomplete Register. How can I examine the other types in this scenario?

I've solved this well enough for now. What I did is a little hacky, but basically, for every annotated class I see, I add its name to a HashSet. Then I use Filer.getResource() to open a file where I've recorded all the previously seen annotated classes, and add those to the HashSet too. Then I generate the register class and write the whole HashSet out to the same resource with Filer.createResource(). This will cause problems if I delete an annotated type, since it will still be recorded in that file, but I can just clean the project or delete that file to solve it.
EDIT: Also, I believe that passing the appropriate "originating elements" to Filer.createSourceFile() should allow Eclipse to track those dependencies properly, but it doesn't. Perhaps that's an Eclipse bug.
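The bookkeeping described above is language-neutral, so here it is sketched in C++ rather than in the Java of the real processor. A plain file named by recordPath stands in for the resource that the answer reads with Filer.getResource() and writes with Filer.createResource(); mergeSeenHandlers is a hypothetical name.

```cpp
#include <fstream>
#include <set>
#include <string>

// Merges the handler class names seen in this (possibly incremental)
// compilation round with the names recorded by earlier rounds, then
// persists the union so the next round sees the full set.
std::set<std::string> mergeSeenHandlers(const std::string& recordPath,
                                        const std::set<std::string>& seenNow) {
    std::set<std::string> all = seenNow;
    {
        std::ifstream in(recordPath);  // a missing file simply yields no lines
        for (std::string line; std::getline(in, line);) {
            if (!line.empty()) all.insert(line);
        }
    }  // close the reader before rewriting the same file
    std::ofstream out(recordPath, std::ios::trunc);
    for (const std::string& name : all) out << name << '\n';
    return all;  // generate the Register class from this set
}
```

As the answer notes, names are only ever added, so a deleted handler stays in the record until the file is removed.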

Unsurprisingly, compile-time annotation processors only process the files being compiled. Eclipse uses incremental compilation to save time, so the short answer is that you cannot expect your annotation processor to see all types in one pass.
One solution is to change your architecture to support incremental compilation. For example, for each annotated HandlerClass, generate a RegisterHandlerClass class that registers that handler class.
That said, it sounds like what you are doing would be better done at runtime, perhaps with the help of a tool like Reflections.

Combining C-code files into one C-code file

I'm converting libx264 to RenderScript as an exercise in how much work it is to port a somewhat larger project to RenderScript. One of the pains with RenderScript is that everything must be declared static to avoid automatically getting a Java interface. Also, this automatic Java interface can't handle pointers, multi-dimensional arrays, etc. Hence I need to declare all functions and global variables in libx264 as static, apart from a few invocation functions to control it.
My problem, then, is that since everything is declared static, I need to have all the code in one file scope. I started by simply including all the C files into one and compiling that. This would have worked quite easily if libx264 itself did not also include C files with different preprocessor macro definitions, so some functions exist twice with different content and some are redeclared identically. I could of course handle this manually, but it would be easier with a tool.
I'm asking if anyone knows of a tool that can take a C project and pre-process/merge that into one C-file, managing re-declarations, conflicting declarations, etc.
And I thought the heap allocations would be the difficult problem...

I have found a tool that does this, CIL.
http://sourceforge.net/projects/cil
http://kerneis.github.com/cil/doc/html/cil/merger.html
/Harald

How to remove strings from a compiled binary (.so)

How do I remove strings from / obfuscate a compiled binary? The goal is to avoid having people read the names of the functions/methods inside.
It is a dynamic library (.so) compiled from C++ code for Android with the NDK tools (includes GCC)
I compile with -O3 and already use arm-eabi-strip -g mylib.so to remove debugging symbols, but when I do strings mylib.so all the names of the functions/methods are still readable.

These strings are in the dynamic symbol table, which is used when the library is loaded at runtime. readelf -p .dynstr mylib.so will show these entries.
strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which functions are private to your library (and so don't need entries in the dynamic symbol table), so it just creates dynamic symbol table entries for all non-static functions.
There are two main ways you can inform the compiler which functions are private.
Mark the private functions static. Obviously, this only works for functions only needed within a single compilation unit, though for some libraries this technique might be sufficient.
Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.
If you have a function:
int foo(int a, int b);
then the syntax for marking it hidden is:
int foo(int a, int b) __attribute__((visibility("hidden")));
and the syntax for marking it visible is:
int foo(int a, int b) __attribute__((visibility("default")));
For further details, see this document, which is an excellent source of information on this subject.
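A minimal sketch of the -fvisibility=hidden approach, assuming GCC or Clang; MYLIB_API, mylib_compute, helper and private_func are hypothetical names. The anonymous namespace is the C++ equivalent of marking a function static, and extern "C" keeps the exported name unmangled.

```cpp
// Assumed build: g++ -shared -fPIC -fvisibility=hidden mylib.cpp -o libmylib.so
// Only symbols marked MYLIB_API get default (exported) visibility.
#define MYLIB_API __attribute__((visibility("default")))

namespace {
// Internal linkage (like `static`): never enters the dynamic symbol table.
int helper(int a, int b) { return a * b; }
}  // namespace

// External linkage, but hidden by -fvisibility=hidden, so other objects
// inside the library can call it while it stays out of .dynsym.
int private_func(int a) { return a + 1; }

// The public API: the only function that should be exported.
extern "C" MYLIB_API int mylib_compute(int a, int b) {
    return helper(a, b) + private_func(a);
}
```

After building with the flags above, `nm -D libmylib.so` should list mylib_compute but not private_func or helper.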

There are some commercial obfuscators which accomplish this. Basically, they rewrite all of the symbols on the fly. Something like this:
void foo()
becomes
void EEhj_y33() // usually much, much longer and clobbered
Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).
Most of them work by scanning your code base, establishing a dictionary then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library / program works. Additionally, you aren't going to be able to do anything about someone who traces system calls. Really, what's the point? Some argue that it helps keep the 'casual observer' at bay; I argue that someone running ltrace, strace and strings is typically anything but casual.
Unless you mean string literals, not symbols? There's nothing you can do about them, unless you store the literals in an encrypted format that your code has to decrypt before using them. That is not just a waste, but an egregious waste that provides no benefit whatsoever.

Assuming you are correctly specifying hidden visibility to g++ for all of your source files (as other posters have recommended), there's a chance you might be running into this GCC bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643
Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.
Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.

They are unavoidable. Those strings are the means by which the loader links shared libraries at runtime.
