Android: How can I intercept native function calls?

I'm a student in computer science. As part of my master's project, I'm trying to intercept calls to functions in native libraries on the Android platform. The goal is to decide whether to allow the call or deny it in order to improve security.
Following the approach of a research paper [1], I want to modify the Procedure Linkage Table (PLT) and the Global Offset Table (GOT) of the ELF file. The idea is to make all function calls point to my own intercepting function, which decides whether to block the call or pass it through to the original target function.
The ELF specification [2] says (in Book III, Chapter 2 "Program Loading and Dynamic Linking", page 2-13, sections "Global Offset Table" and "Procedure Linkage Table") that the actual contents and form of the PLT and the GOT depend upon the processor. However, in the document "ELF for the ARM Architecture" [3], I was unable to find the exact specification of either of those tables. I am concentrating on ARM and not considering other architectures at the moment.
I have 3 questions:
How can I map a symbol to a GOT or PLT entry?
Where do I find the precise specification of the GOT and PLT for ARM processors?
As the PLT contains machine code, will I have to parse that code in order to modify the target address, or do all PLT entries look identical, so that I could just modify the memory at a constant offset for each PLT entry?
Thanks,
Manuel

You need to parse the ELF headers and look up the symbol index by its string name in SHT_DYNSYM. Then iterate over the PLT relocation entries (the section is called ".rel.plt" on ARM, ".rela.plt" on RELA-based targets) and find the entry whose symbol index matches; its r_offset points at the corresponding GOT slot.
I don't know about a formal spec, but you can always study the Android linker source and disassemble some binaries to spot the patterns.
Usually the PLT is just common code and you don't need to modify it. It's actually designed this way: if the linker had to modify it, you would end up with RWX memory, which is undesirable. So you just need to rewrite the entry in the GOT. By default the GOT entries point to the resolver routine, which finds the needed function and writes its address into the GOT. That's on Linux; on Android the addresses are already resolved.
I did something similar for x86_64 Linux:
https://github.com/astarasikov/sxge/blob/vaapi_recorder/apps/src/sxge/apps/demo1_cube/hook-elf.c
And there is also a blog post about doing what you want on Android:
https://www.google.de/amp/shunix.com/android-got-hook/amp/
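To make the GOT-rewriting step concrete, here is a rough sketch for 32-bit ARM Android. It is only an illustration, not the code from the links above: it assumes dl_iterate_phdr is available (newer API levels; on older ones you would get the load base by parsing /proc/self/maps instead), and the names hook_args, patch_got, my_open and libtarget.so are made up for this example.

#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct hook_args { const char *lib; const char *sym; ElfW(Addr) hook; };

/* hypothetical replacement function: deny the call by returning an error */
static int my_open(const char *path, int flags, ...) { return -1; }

static int patch_got(struct dl_phdr_info *info, size_t size, void *data)
{
    struct hook_args *a = (struct hook_args *)data;
    if (!info->dlpi_name || !strstr(info->dlpi_name, a->lib))
        return 0; /* not the library we are interested in */

    /* locate the PT_DYNAMIC segment of this object */
    ElfW(Dyn) *dyn = NULL;
    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
            dyn = (ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
    if (!dyn)
        return 0;

    /* pull out the PLT relocations (.rel.plt), .dynsym and .dynstr; on bionic
       the d_ptr values are file addresses, so add the load bias ourselves */
    ElfW(Rel) *rel = NULL; ElfW(Sym) *symtab = NULL; const char *strtab = NULL;
    size_t relsz = 0;
    for (ElfW(Dyn) *d = dyn; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_JMPREL)   rel    = (ElfW(Rel) *)(info->dlpi_addr + d->d_un.d_ptr);
        if (d->d_tag == DT_PLTRELSZ) relsz  = d->d_un.d_val;
        if (d->d_tag == DT_SYMTAB)   symtab = (ElfW(Sym) *)(info->dlpi_addr + d->d_un.d_ptr);
        if (d->d_tag == DT_STRTAB)   strtab = (const char *)(info->dlpi_addr + d->d_un.d_ptr);
    }
    if (!rel || !symtab || !strtab)
        return 0;

    for (size_t i = 0; i < relsz / sizeof(ElfW(Rel)); i++) {
        ElfW(Sym) *s = &symtab[ELF32_R_SYM(rel[i].r_info)];
        if (strcmp(strtab + s->st_name, a->sym) != 0)
            continue;
        /* r_offset is the GOT slot that the PLT stub jumps through */
        ElfW(Addr) *slot = (ElfW(Addr) *)(info->dlpi_addr + rel[i].r_offset);
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        mprotect((void *)((uintptr_t)slot & ~(page - 1)), page, PROT_READ | PROT_WRITE);
        *slot = a->hook; /* redirect every call made through this PLT entry */
        return 1;        /* non-zero return stops the iteration */
    }
    return 0;
}

/* usage: redirect libtarget.so's calls to open() into my_open() */
static void install_hook(void)
{
    struct hook_args a = { "libtarget.so", "open", (ElfW(Addr))my_open };
    dl_iterate_phdr(patch_got, &a);
}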

Related

How to obtain thread name in android ndk

In my Android project, I'm using std::thread.
I use the same C++ code also in some Linux and OSX projects.
For debugging purposes, I want to assign human-readable thread names, and I do that by calling pthread_setname_np() (because of the lack of a std::thread::set_name()).
For later debug output, I try to obtain the current thread name by calling pthread_getname_np(), and this works e.g. on the Linux target.
But to my surprise, there is no pthread_getname_np() in the Android NDK's pthread.h, neither in e.g. ndk-bundle/platforms/android-19/arch-arm/usr/include/pthread.h nor in ndk-bundle/platforms/android-21/arch-arm/usr/include/pthread.h.
A naive attempt with a forward declaration like:
extern "C" int pthread_getname_np(pthread_t, char*, size_t);
fails with a linker error (as expected).
Any idea how to obtain the human readable name of the current thread in Android from C/C++ code?
You can see how Dalvik sets them in dalvik/vm/Thread.cpp. It uses pthread_setname_np() if available, prctl(PR_SET_NAME) if not. So if pthread_getname_np() isn't available -- and bear in mind that "np" means "non-portable" -- you can use prctl(PR_GET_NAME) to get a 16-byte null-terminated string under Linux.
You can find other bits by fishing around in /proc entries.
If you have specific requirements for the size and format of the name then you may want to define a pthread key and tuck it into thread-local storage. It's more work, but it's consistent and portable.
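For illustration, a minimal sketch of the prctl() fallback mentioned above (the helper name is made up; PR_GET_NAME needs a buffer of at least 16 bytes):

#include <string.h>
#include <sys/prctl.h>

static void get_current_thread_name(char *out, size_t out_len)
{
    char name[16] = { 0 }; /* PR_GET_NAME writes at most 16 bytes, '\0' included */
    prctl(PR_GET_NAME, (unsigned long)name, 0, 0, 0);
    strncpy(out, name, out_len - 1);
    out[out_len - 1] = '\0';
}

Note that this only reports the name of the calling thread; for other threads you would have to read the corresponding /proc/self/task/<tid>/comm entry instead.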

Load OpenCV's ML (SVM) from string

I'm currently developing an algorithm for texture classification based on Machine Learning, primarily Support Vector Machines (SVM). I was able to gain some very good results on my test data and now want to use the SVM in productive environment.
Productive in my case means it is going to run on multiple desktop and mobile platforms (i.e. Android, iOS), always somewhere deep down in native threads. For reasons of software structure and the platforms' access policies, I'm not able to access the file system from where I use the SVM. However, my framework supports reading files in an environment where file-system access is granted and channeling the file's content as a std::string to the SVM part of my application.
The standard way to configure an SVM is by passing filenames, and OpenCV reads directly from the file:
cv::SVM _svm;
_svm.load("/home/<usrname>/DEV/TrainSoftware/trained.cfg", "<trainSetName>");
What I want instead is basically to read the file somewhere else and pass its content as a string to the SVM:
cv::SVM _svm;
std::string trainedCfgContentStr="<get the content here>";
_svm.loadFromString(trainedCfgContentStr, "<trainSetName>") // This method is desired
I couldn't find anything in OpenCV's docs or source indicating that this is possible, but it wouldn't be the first OpenCV feature that exists without being documented or widely known. Of course, I could hack the OpenCV source and cross-compile it for each of my target platforms, but I'd like to avoid that since it is a hell of a lot of work; besides, I'm pretty convinced I'm not the first one with this problem.
All ideas (also unconventional) and/or hints are highly appreciated!
As long as you stick with the C++ API it's quite easy; FileStorage can read from memory:
string data_string; //containing xml/yml data
FileStorage fs( data_string, FileStorage::READ | FileStorage::MEMORY);
svm.read(fs.getFirstTopLevelNode()); // or the node with your trainset
(unfortunately this is not exposed to Java)

OpenGL ES: using Tegra specific extensions (GL_EXT_texture_array)

I'm currently developing an OpenGL ES application for Android using the NDK.
The application would greatly benefit from the following OpenGL extension:
GL_EXT_texture_array
(details here: GL_EXT_texture_array)
The extension is supported by my Tegra 3 device (Asus Eee Pad Transformer Prime TF201).
The issue I'm now facing is that I have no clue how to make the extension available to my application, as it is not included in the OpenGL ES API registry.
(see "Extension Specifications": http://www.khronos.org/registry/gles/)
However, I noticed an extension called "GL_NV_texture_array" which seems to serve the same purpose, but is not supported by my Tegra 3 device.
I'm aware of the possibility of including extensions via function pointers, but I thought there might be a more comfortable way.
I have also found a header file (gl2ext_nv.h), which contains the necessary extension.
But when you search for it through Google, the file is always part of particular projects, not something official.
I have also downloaded the Tegra Android Development Pack (2.0), in which neither this header file nor the desired extension is included.
Can anyone explain this to me, please?
How can I use OpenGL ES extensions that are supported by my Tegra 3 device
but seemingly not covered by any of the official OpenGL ES headers (in the NDK)?
Thanks in advance!
When you say that your Tegra 3 device supports GL_EXT_texture_array but not GL_NV_texture_array, I'm assuming that you determined that through a call to glGetString(GL_EXTENSIONS).
GL_NV_texture_array is very similar to GL_EXT_texture_array, just limited to 2d texture arrays. Not surprisingly, it uses many of the same constants as GL_EXT_texture_array, just with different names.
GL_NV_texture_array:
TEXTURE_2D_ARRAY_NV 0x8C1A
TEXTURE_BINDING_2D_ARRAY_NV 0x8C1D
MAX_ARRAY_TEXTURE_LAYERS_NV 0x88FF
FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER_NV 0x8CD4
SAMPLER_2D_ARRAY_NV 0x8DC1
GL_EXT_texture_array:
TEXTURE_2D_ARRAY_EXT 0x8C1A
TEXTURE_BINDING_2D_ARRAY_EXT 0x8C1D
MAX_ARRAY_TEXTURE_LAYERS_EXT 0x88FF
FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER_EXT 0x8CD4
SAMPLER_2D_ARRAY_EXT 0x8DC1
This version of gl2ext_nv.h defines the constants for GL_EXT_texture_array but not for GL_NV_texture_array, so perhaps nVidia is using the old name now. If you can't find a more recent version of the header, just include this one.
To gain access to functions offered by GL extensions, use eglGetProcAddress to assign the function to a function pointer.
// The function pointer, declared in a header.
// You can put this in a class instance or at global scope.
// If the latter, declare it with "extern", and define the actual function
// pointer without "extern" in a single source file.
PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC glFramebufferTextureLayerEXT;
In your function that checks for the presence of the GL_EXT_texture_array extension, if it's found, get the address of the function and store it in your function pointer. With OpenGL-ES, that means asking EGL:
glFramebufferTextureLayerEXT = (PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC) eglGetProcAddress("glFramebufferTextureLayerEXT");
Now you can use the function just as if it were part of regular OpenGL.
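Putting the two steps together, a small hypothetical helper could look like this (has_gl_extension and init_texture_array_extension are made-up names, and the PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC typedef is assumed to come from the gl2ext_nv.h header discussed above):

#include <string.h>
#include <EGL/egl.h>
#include <GLES2/gl2.h>
#include "gl2ext_nv.h" /* assumed: provides PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC */

/* the function pointer from the answer above */
PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC glFramebufferTextureLayerEXT;

/* crude substring check against the space-separated extension string */
static int has_gl_extension(const char *name)
{
    const char *exts = (const char *)glGetString(GL_EXTENSIONS);
    return exts != NULL && strstr(exts, name) != NULL;
}

/* call once a GL context is current */
static void init_texture_array_extension(void)
{
    if (has_gl_extension("GL_EXT_texture_array")) {
        glFramebufferTextureLayerEXT = (PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC)
            eglGetProcAddress("glFramebufferTextureLayerEXT");
    }
}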

Good way to identify the code section in /proc/*/maps

I'm trying to find the address range that my native Android library's code occupies in the process's address space. I'm reading and parsing /proc/self/maps. There are two sections for the library: one is code, the other one is data, I presume. I need to tell them apart. However, the difference between them is, well, kind of circumstantial. I'm testing on Android 2.3.3.
The code section's permissions are r-xp, the data section's are rwxp - both are executable. I feel uneasy basing the decision on writability - what if, on some flavors of Android, there's a read-only data section?
The other difference is the offset of the mapped section relative to the file - the code section has offset 0. Again, what if some iteration of the linker places data before code?
Tools like GDB and Android's stack walker have no problem telling me which module a code address belongs to and what its offset within the library is. Just sayin'.
EDIT: on Android 4.0, the sections are different: there's r-xp, r--p, rw-p. That lets me identify the executable section fairly easily - but what about earlier Androids?
Found a workaround. Fortunately, those are my libraries, so I can get an address that's guaranteed to be within the code block by having a function that returns its own address. Then I match it against the section boundaries.
void *TestAddress()
{
    return (void *)&TestAddress; // That's within the code block, that's for sure.
}
This won't work for third-party libraries, because the address of an imported function would correspond to the import thunk, not to the real function's body.
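For reference, a rough sketch of the matching step (the function name is made up; it just scans the "start-end" fields at the beginning of each /proc/self/maps line):

#include <stdio.h>
#include <stdint.h>

/* returns 1 and fills [*lo, *hi) if some mapping contains addr */
static int find_containing_mapping(uintptr_t addr, uintptr_t *lo, uintptr_t *hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            addr >= start && addr < end) {
            *lo = start;
            *hi = end;
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

The range containing (uintptr_t)TestAddress() is then the library's code mapping.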

How to remove strings from a compiled binary (.so)

How do I remove strings from / obfuscate a compiled binary? The goal is to keep people from reading the names of the functions/methods inside.
It is a dynamic library (.so) compiled from C++ code for Android with the NDK tools (which include GCC).
I compile with -O3 and already use arm-eabi-strip -g mylib.so to remove debugging symbols, but when I run strings mylib.so, all the names of the functions/methods are still readable.
These strings are in the dynamic symbol table, which is used when the library is loaded at runtime. readelf -p .dynstr mylib.so will show these entries.
strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which are private to your library (and so don't), so it just creates dynamic symbol table entries for all non-static functions.
There are two main ways you can inform the compiler which functions are private.
Mark the private functions static. Obviously, this only works for functions only needed within a single compilation unit, though for some libraries this technique might be sufficient.
Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.
If you have a function:
int foo(int a, int b);
then the syntax for marking it hidden is:
int foo(int a, int b) __attribute__((visibility("hidden")));
and the syntax for marking it visible is:
int foo(int a, int b) __attribute__((visibility("default")));
For further details, see this document, which is an excellent source of information on this subject.
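As a concrete sketch of the second option (the macro and function names here are made up, not part of any toolchain):

/* e.g. in Android.mk: LOCAL_CFLAGS += -fvisibility=hidden */
#define PUBLIC_API __attribute__((visibility("default")))

PUBLIC_API int library_entry_point(int a, int b); /* stays in the dynamic symbol table */
int internal_helper(int a, int b);                /* hidden by the compiler flag       */
static int file_local_helper(int a, int b);       /* internal to this file anyway      */

After rebuilding, the hidden names should no longer appear in the output of the readelf -p .dynstr command mentioned above.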
There are some commercial obfuscators which accomplish this. Basically, they rewrite all of the symbols during the build. Something like this:
void foo()
becomes
void EEhj_y33() // usually much, much longer and clobbered
Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).
Most of them work by scanning your code base, establishing a dictionary then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library/program works. Additionally, you aren't going to be able to do anything about someone who traces system calls. Really, what's the point? Some argue that it helps keep the 'casual observer' at bay; I argue that someone running ltrace, strace and strings is typically anything but casual.
Unless you mean string literals, not symbols? There's nothing you can do about them, unless you store the literals in an encrypted form that your code has to decrypt before use. That is not just a waste, but an egregious waste that provides no benefit whatsoever.
Assuming you are correctly specifying hidden visibility to g++ for all of your source files (as other posters have recommended), there's a chance you might be running into this GCC bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643
Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.
Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.
They are unavoidable. Those strings are the means by which the loader links shared libraries at runtime.
