How to remove strings from a compiled binary (.so)

How do I remove strings from / obfuscate a compiled binary? The goal is to avoid having people read the names of the functions/methods inside.
It is a dynamic library (.so) compiled from C++ code for Android with the NDK tools (includes GCC)
I compile with -O3 and already use arm-eabi-strip -g mylib.so to remove debugging symbols, but when I do strings mylib.so all the names of the functions/methods are still readable.

These strings are in the dynamic symbol table, which is used when the library is loaded at runtime. readelf -p .dynstr mylib.so will show these entries.
strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which functions are private to your library (and so don't need entries in the dynamic symbol table), so it just creates dynamic symbol table entries for all non-static functions.
There are two main ways you can inform the compiler which functions are private.
Mark the private functions static. Obviously, this only works for functions only needed within a single compilation unit, though for some libraries this technique might be sufficient.
Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.
If you have a function:
int foo(int a, int b);
then the syntax for marking it hidden is:
int foo(int a, int b) __attribute__((visibility("hidden")));
and the syntax for marking it visible is:
int foo(int a, int b) __attribute__((visibility("default")));
For further details, see this document, which is an excellent source of information on this subject.
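As a minimal sketch of the second approach, assuming the library is compiled with -fvisibility=hidden, a small macro can mark the few public entry points. The names MYLIB_API, scale, helper_sum and mylib_compute are made up for illustration and do not come from the question.

```cpp
// Assumed build line (adapt to your NDK toolchain):
//   g++ -O3 -shared -fPIC -fvisibility=hidden mylib.cpp -o mylib.so

// Hypothetical macro name; it expands to the GCC attribute shown above.
#define MYLIB_API __attribute__((visibility("default")))

namespace {
// Internal linkage (same effect as 'static'): no dynamic symbol is created.
int scale(int x) { return x * 2; }
}

// External linkage, so other source files of the library can call it,
// but hidden from the dynamic symbol table under -fvisibility=hidden.
int helper_sum(int a, int b) { return a + b; }

// Public API: explicitly marked visible. extern "C" additionally keeps
// the exported name unmangled.
extern "C" MYLIB_API int mylib_compute(int a, int b) {
    return scale(helper_sum(a, b));
}
```

After building this way, nm -D --defined-only mylib.so should list mylib_compute but neither helper_sum nor scale.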

There are some commercial obfuscators which accomplish this. Basically, they re-write all of the symbols on the go. Something like this:
void foo()
becomes
void EEhj_y33() // usually much, much longer and clobbered
Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).
Most of them work by scanning your code base, establishing a dictionary then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library / program works. Additionally, you aren't going to be able to do anything about someone who traces system calls. Really, what's the point? Some argue that it helps keep the 'casual observer' at bay; I argue that someone running ltrace, strace and strings is typically anything but casual.
Unless you mean string literals, not symbols? There's nothing you can do about them, unless you store the literals in an encrypted format that your code has to decrypt before using. That is not just a waste, but an egregious waste that provides no benefit whatsoever.
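For readers wondering what "storing literals in an encrypted format" would look like, here is a minimal sketch using a single-byte XOR. The key, the encoded bytes and the function names are assumptions chosen for illustration; this only hides the text from strings and offers no real protection, which is consistent with the point made above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-byte key; anyone reading the binary can recover it.
const unsigned char kKey = 0x5A;

// The bytes of "hello", each XORed with kKey, computed ahead of time so
// the plain text itself never appears in the compiled file.
const unsigned char kEncoded[] = {0x32, 0x3F, 0x36, 0x36, 0x35};

// XOR is its own inverse, so the same loop both encodes and decodes.
std::string xor_decode(const unsigned char* data, std::size_t len,
                       unsigned char key) {
    std::string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(static_cast<char>(data[i] ^ key));
    }
    return out;
}

// Decodes the literal at run time, right before it is needed.
std::string greeting() {
    return xor_decode(kEncoded, sizeof(kEncoded), kKey);
}
```

Every use of the literal now pays for a decode and a heap allocation, which is the cost the answer is warning about.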

Assuming you are correctly specifying hidden visibility to g++ for all of your source files (as other posters have recommended), there's a chance you might be running into this GCC bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643
Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.
Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.

They are unavoidable. Those strings are the means by which the loader links shared libraries at runtime.

Related

Android: How can I intercept native function calls?

I'm a student in computer science. As part of my master's project, I'm trying to intercept calls to functions in native libraries on the Android platform. The goal is to decide whether to allow the call or deny it in order to improve security.
Following the approach of a research paper 1, I want to modify the Procedure Linkage Table (PLT) and the Global Offset Table (GOT) of the ELF file. The idea is that I want to make all the function calls point to my own intercepting function, which decides whether to block the call or pass it through to the original target function.
The ELF specification 2 says (in Book III, Chapter 2 Program Loading and Dynamic Linking, page 2-13, Sections "Global Offset Table" and "Procedure Linkage Table") that the actual contents and form of the PLT and the GOT depend upon the processor. However, in the documentation "ELF for the ARM Architecture" 3, I was unable to see the exact specification of either of those tables. I am concentrating on ARM and not considering other architectures at the moment.
I have 3 questions:
How can I map a symbol to a GOT or PLT entry?
Where do I find the precise specification of the GOT and PLT for ARM processors?
Since the PLT contains machine code, will I have to parse that code in order to modify the target address, or do all PLT entries look identical, so that I could just modify the memory at a constant offset for each PLT entry?
Thanks,
Manuel

You need to parse the ELF headers and look up the symbol's index by its string name in the dynamic symbol table (the SHT_DYNSYM section). Then iterate over the PLT relocation entries (the section is called ".rel.plt" on 32-bit ARM and ".rela.plt" on x86_64) and find the entry with the matching symbol index; its r_offset field is the address of the corresponding GOT slot.
I don't know about the formal spec, but you can always study the Android linker source and disassemble some binaries to notice the patterns.
Usually the PLT is just common code and you don't need to modify it. It's actually designed this way because if the linker had to modify it, you would end up with RWX memory, which is undesirable. So you just need to rewrite the entry in the GOT. By default the GOT entries point to the resolver routine that will find the needed function and write its address to the GOT. That's on Linux. On Android the addresses are already resolved.
I did something similar for x86_64 Linux:
https://github.com/astarasikov/sxge/blob/vaapi_recorder/apps/src/sxge/apps/demo1_cube/hook-elf.c
And also there's a blog about doing what you want on Android
https://www.google.de/amp/shunix.com/android-got-hook/amp/
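The symbol-to-GOT mapping described in this answer can be sketched as follows for 32-bit ARM, where PLT relocations are Elf32_Rel records. This is an illustration only: find_got_slot is a hypothetical helper, and it assumes you have already located the relocation table, the dynamic symbol table and the dynamic string table in memory (for example through the DT_JMPREL, DT_SYMTAB and DT_STRTAB entries of the dynamic section).

```cpp
#include <elf.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the r_offset of the PLT relocation whose symbol is 'name',
// or 0 if no relocation refers to that symbol. For a shared library the
// offset is relative to the load address of the library.
Elf32_Addr find_got_slot(const Elf32_Rel* rels, std::size_t count,
                         const Elf32_Sym* dynsym, const char* dynstr,
                         const char* name) {
    assert(rels && dynsym && dynstr && name);
    for (std::size_t i = 0; i < count; ++i) {
        // The upper bits of r_info hold the index into the symbol table.
        const Elf32_Sym& sym = dynsym[ELF32_R_SYM(rels[i].r_info)];
        // st_name is an offset into the dynamic string table.
        if (std::strcmp(dynstr + sym.st_name, name) == 0) {
            return rels[i].r_offset;
        }
    }
    return 0;
}
```

To actually hook a call you would add the library's load address to the returned offset, make that page writable (for instance with mprotect) and store the address of your interceptor there, keeping the old value so the original function can still be called.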

Can't remove an empty folder in C++

I want to remove an empty folder using remove() in C++ on Windows 7, but I can't. When I tried rmdir() instead of remove(), the folder was removed!
Nevertheless, the reason I don't use rmdir() is Android. In a library project for Android I can't include the "direct.h" header, so I can't use rmdir() either. Unlike on Windows, the function remove() works well on Android. I don't understand why.
Does anybody know why this is happening?
Or are there any other functions that will work on both Windows and Android?

This is a pretty common problem when writing cross-platform programs.
Sometimes, a library can provide the abstraction you need. For example, Boost has a filesystem library that can enumerate files, manipulate directories, etc., on multiple platforms using the exact same code.
Also, there are usually symbols defined that allow you to detect which compiler is currently building your code. Even if there isn't one that does what you want, you can define your own.
Let's say you need to build your software for two different fictional operating systems named FooOS and BarOS. I'm going to invent two symbols, FOO_OS and BAR_OS. In my code, I can do something like this:
#ifdef FOO_OS
#include <foo_stuff.h>
#elif defined(BAR_OS)
#include <bar_stuff.h>
#endif
void do_something()
{
#ifdef FOO_OS
    do_it_this_way();
#elif defined(BAR_OS)
    do_it_that_way();
#endif
}
Now, we just need to either define FOO_OS or BAR_OS. This can be done through an IDE's project configuration or on the command line of the compiler. Use Google to find out about your particular situation, since you didn't include those details in your post.
There is a preprocessing step when you compile your code that makes a pass through the source, and applies these conditional statements. A following pass actually compiles the code. Here is some documentation about Visual Studio's preprocessor, for example.
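Applied to the question, the same technique could look like the sketch below. It relies on _WIN32, which Windows compilers predefine, and on the POSIX headers available in the Android NDK; the wrapper names make_dir and remove_empty_dir are made up for this example. As far as I know, the difference the asker saw comes from POSIX specifying that remove() on a directory behaves like rmdir(), while the Windows C runtime's remove() only deletes files.

```cpp
#ifdef _WIN32
#include <direct.h>    // _mkdir, _rmdir
#else
#include <sys/stat.h>  // mkdir
#include <unistd.h>    // rmdir
#endif

// Creates a directory; returns true on success.
bool make_dir(const char* path) {
#ifdef _WIN32
    return _mkdir(path) == 0;
#else
    return mkdir(path, 0755) == 0;
#endif
}

// Removes an empty directory; returns true on success.
bool remove_empty_dir(const char* path) {
#ifdef _WIN32
    return _rmdir(path) == 0;
#else
    return rmdir(path) == 0;
#endif
}
```

The rest of the code base then calls remove_empty_dir() and never needs to know which platform header was used.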

Combining C-code files into one C-code file

I'm converting libx264 to RenderScript as an exercise to see how much work it is to port a somewhat larger project to RenderScript. One of the pains with RenderScript is that everything needs to be declared static to avoid automatically getting a Java interface. Also, this automatic Java interface can't handle pointers, multi-dimensional arrays, etc. Hence I need to declare all functions and global variables in libx264 as static, apart from a few invocation functions to control it.
My problem then is that, since everything is declared static, I need to have all the code in one file scope. I started by just including all the C files into one file and compiling that. That would have worked quite easily if libx264 itself had not also included C files with different preprocessor macro definitions; as a result, some functions exist twice with different content and some are redeclared identically. I could of course handle this manually, but it would be easier with a tool.
I'm asking if anyone knows of a tool that can take a C project and pre-process/merge that into one C-file, managing re-declarations, conflicting declarations, etc.
And I thought the heap allocations would be the difficult problem...

I have found a tool that does this, CIL.
http://sourceforge.net/projects/cil
http://kerneis.github.com/cil/doc/html/cil/merger.html
/Harald

Linker stripping unused classes

I am working on a cross-platform project in C++, targeting iOS and Android, and I have the following situation:
I am writing a library used to load scene graphs directly from XML files describing them. The library has a base tree node class that implements all the functionality needed to make a class constructible by its name. Additional tree nodes are then implemented, all deriving from this base node class. This works very well, but there is one problem. The linker 'thinks' that some of my classes are not going to be used and strips them out of the library. I have a nasty workaround right now: a file that includes all existing node headers, in which one instance of every node is created and altered to indicate to the compiler/linker that this class is really being used.
Does anybody know a good design pattern that can be used to automatically generate the required instances of all classes?
I have tried to create macros that are placed into the classes cpp file that creates a static instance of the given class, but the linker still detects that those static instances are never going to be referenced.
Or is there a linker flag that can be used to tell the linker not to strip any unused classes out? As already mentioned, I am working on Android (NDK 6.0) and on iOS (Xcode 4.2).
This problem is not going to be a showstopper for my project but it would really be nice to find an acceptable solution here.

The C++11 standard (3.6.2 [basic.start.init]) says:
"It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized."
Therefore there's no standard way to guarantee the construction of those objects but to list them all in one specific place.
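A minimal sketch of "listing them all in one specific place", assuming C++11 and a name-based factory similar to the one described in the question. Node, MeshNode, LightNode and all function names are hypothetical. Because register_all_nodes() refers to each class explicitly, the linker has a reason to keep their object files.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Node {
    virtual ~Node() = default;
    virtual std::string type() const = 0;
};
struct MeshNode : Node {
    std::string type() const override { return "MeshNode"; }
};
struct LightNode : Node {
    std::string type() const override { return "LightNode"; }
};

using Factory = std::function<std::unique_ptr<Node>()>;

// Function-local static: constructed on first use, so it does not depend
// on the initialization order of other translation units.
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;
    return instance;
}

template <class T>
void register_node(const std::string& name) {
    registry()[name] = [] { return std::unique_ptr<Node>(new T()); };
}

// The single place that names every node type; call it once at startup.
void register_all_nodes() {
    register_node<MeshNode>("MeshNode");
    register_node<LightNode>("LightNode");
}

// Returns a new node of the named type, or nullptr if the name is unknown.
std::unique_ptr<Node> create_node(const std::string& name) {
    auto it = registry().find(name);
    if (it == registry().end()) return nullptr;
    return it->second();
}
```

The cost is that every new node class needs one extra line in register_all_nodes(), but unlike static instances hidden in each .cpp file, this does not rely on implementation-defined initialization.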

determining if resources such as those in strings.xml are no longer used

I have done some significant re-coding on one of my Android programs and now I am unsure if certain xml strings are used anymore. In addition I have a few translations which makes the task even more difficult. Is there a tool to test this? This would be useful for drawables also.
I am using the eclipse plugin.

This question has been discussed in the IRC channel before. There is no tool to test it, but I agree it would be useful. Note that resources can be referenced in XML, but they can also be referenced from code. Furthermore, resources can also be looked up by their identifier, and such a lookup may only be determined at runtime.
So actually you cannot determine 100% whether a resource is used or not anymore, but you can probably determine which resources are referenced in a static way (in xml or code). Depending on your code/app which you know best yourself, such approach might be sufficient in many cases.
The approach would be to write a tool that parses xml and java source files and also take the import statements into consideration. With that information you should be able to determine which resources you can get rid of.
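The core of such a tool could be sketched like this, assuming the resource names and the contents of the source files have already been read into memory. It only looks for the static forms R.string.name (Java) and @string/name (XML); the function names are made up, and lookups built at runtime are not detected, as noted above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if 'token' occurs in 'text' and is not followed by an identifier
// character, so "R.string.app" does not match inside "R.string.app_name".
bool references(const std::string& text, const std::string& token) {
    std::size_t pos = text.find(token);
    while (pos != std::string::npos) {
        std::size_t end = pos + token.size();
        if (end == text.size()) return true;
        unsigned char next = static_cast<unsigned char>(text[end]);
        if (!std::isalnum(next) && next != '_') return true;
        pos = text.find(token, pos + 1);
    }
    return false;
}

// Returns the string resource names that no file refers to statically.
std::vector<std::string> unused_strings(const std::vector<std::string>& names,
                                        const std::vector<std::string>& files) {
    std::vector<std::string> unused;
    for (const std::string& name : names) {
        bool used = false;
        for (const std::string& text : files) {
            if (references(text, "R.string." + name) ||
                references(text, "@string/" + name)) {
                used = true;
                break;
            }
        }
        if (!used) unused.push_back(name);
    }
    return unused;
}
```

The same idea extends to drawables by searching for R.drawable.name and @drawable/name instead.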

The easiest way is to remove them all, attempt to compile, and re-add those the compiler says are lacking. It's a little tiresome, but it's certainly tractable.
Note, as Mathias already pointed out, that it's technically possible to access resources by name with a string at runtime, and the way I suggest here would remove such resources though they are, in fact, needed. However, this pattern should be really rarely seen in any application, and if you are the one who wrote it, you already know if/where you do such treatment.

Use grep to extract a list of resources to a file by way of sort
Use recursive grep through sort and uniq to create a list of those mentioned in any source file (make a copy of the project without unused files, or run grep on a list of used ones; commented-out code will of course be an issue)
Use diff on the two lists
