Junk byte injection in Android - android

After reading this interesting article about code obfuscation in Android, I'm trying to reproduce the technique for research purposes, but after applying it to a classes.dex file I get a crash.
This is the code I'm trying to run after applying the technique:
0006e8: |[0006e8] com.example.root.bji.MainActivity.paintGUI:()V
0006f8: 1202 |0000: const/4 v2, #int 0 // #0
0006fa: 1a01 0000 |0001: const-string v1, "" // string#0000
0006fe: 1200 |0003: const/4 v0, #int 0 // #0
000700: 1303 1400 |0004: const/16 v3, #int 20 // #14
000704: 3244 0900 |0006: if-eq v4, v4, 000f // +0009
000708: 2600 0300 0000 |0008: fill-array-data v0, 0000000b // +00000003
00070e: 0003 0100 1600 0000 1212 0000 0000 ... |000b: array-data (15 units)
00072c: 0000 |001a: nop // spacer
00072e: 0000 |001b: nop // spacer
... more NOPs ...
000742: 0000 |0025: nop // spacer
000744: 0000 |0026: nop // spacer
000746: 1503 087f |0027: const/high16 v3, #int 2131230720 // #7f08
...
To give you some context: I want to keep some assignments clearly visible, such as the 0 stored into the v2 register at 0x6f8 ("const/4 v2, 0" => 12 02), which is shown in the GUI at the end of this method (at 0x746 and beyond). Using this obfuscation technique, I want to "hide" a second assignment that sets v2 to 1 at 0x716 ("const/4 v2, 1" => 12 12).
If you follow the code at 0x704, the branch goes to 0x716, where the "const/4 v2, 1" resides, inside the fill-array-data payload.
The problem I'm facing is a crash when I run the code (I've tried it on Android 4.3 through 5.1). This is what logcat shows when the crash happens:
W/dalvikvm(13874): VFY: invalid branch target 9 (-> 0xf) at 0x6
W/dalvikvm(13874): VFY: rejected Lcom/example/root/bji/MainActivity;.paintGUI ()V
W/dalvikvm(13874): VFY: rejecting opcode 0x32 at 0x0006
W/dalvikvm(13874): VFY: rejected Lcom/example/root/bji/MainActivity;.paintGUI ()V
W/dalvikvm(13874): Verifier rejected class Lcom/example/root/bji/MainActivity;
W/dalvikvm(13874): Class init failed in newInstance call (Lcom/example/root/bji/MainActivity;)
D/AndroidRuntime(13874): Shutting down VM
From what I understand of the logs, the OS is rejecting the "if-eq" jump because of the offset it points to (I've tried other branch instructions but the result is the same). The only way the code works is if I point to an offset outside the fill-array-data payload, but then no obfuscation is applied :P.
Has anyone tried something similar to this technique, or had to fight this branch verification rejection?

This is not expected to work. The bytecode verifier explicitly checks all branches for validity. The question of whether or not an address is an instruction or data is determined by a linear walk through the method. Data chunks are essentially very large instructions, so they get stepped over.
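The linear walk described above can be illustrated with a minimal Python sketch. It is not the Dalvik verifier's actual code: the names `OPCODE_WIDTH` and `instruction_starts` are made up for this example, the width table covers only the opcodes that appear in the question's dump, and the payload data that the dump elides with "..." is assumed to be zeros. The code units are the 16-bit little-endian values transcribed from the dump.

```python
# Widths, in 16-bit code units, of the few opcodes used in the question's dump.
OPCODE_WIDTH = {
    0x00: 1,  # nop
    0x12: 1,  # const/4
    0x13: 2,  # const/16
    0x15: 2,  # const/high16
    0x1A: 2,  # const-string
    0x26: 3,  # fill-array-data
    0x32: 2,  # if-eq
}
ARRAY_PAYLOAD_IDENT = 0x0300  # first code unit of a fill-array-data payload


def instruction_starts(units):
    """Walk the method linearly and collect the offsets where instructions begin."""
    starts = set()
    pc = 0
    while pc < len(units):
        starts.add(pc)
        if units[pc] == ARRAY_PAYLOAD_IDENT:
            # The payload is stepped over as one very large pseudo-instruction.
            element_width = units[pc + 1]
            size = units[pc + 2] | (units[pc + 3] << 16)
            pc += (size * element_width + 1) // 2 + 4
        else:
            pc += OPCODE_WIDTH[units[pc] & 0xFF]
    return starts


# Code units of paintGUI up to const/high16 (elided payload bytes assumed zero).
units = (
    [0x0212, 0x011A, 0x0000, 0x0012, 0x0313, 0x0014,  # 0x00-0x05
     0x4432, 0x0009,                                  # 0x06: if-eq v4, v4, +9
     0x0026, 0x0003, 0x0000]                          # 0x08: fill-array-data
    + [0x0300, 0x0001, 0x0016, 0x0000, 0x1212] + [0] * 10  # 0x0b: payload, 15 units
    + [0x0000] * 13                                   # 0x1a-0x26: nops
    + [0x0315, 0x7F08]                                # 0x27: const/high16
)

starts = instruction_starts(units)
branch_pc = 0x06
branch_target = branch_pc + units[branch_pc + 1]  # offset is positive here
target_is_valid = branch_target in starts
```

In this sketch the branch target is code unit 0x0f, which falls inside the payload that starts at 0x0b, so it is never recorded as an instruction start and the branch is treated as invalid, consistent with the "invalid branch target" message in the log.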
You can make this work if you modify the .odex output, and set the "pre-verified" flag on the class so the verifier doesn't examine it again -- but you can't distribute an APK that way.

This "obfuscation" technique worked due to an issue in Dalvik. The issue was fixed somewhere around the 4.3 timeframe, although I'm not sure which released version first contained the fix. Lollipop uses ART, which never had this issue.
Here is the change that fixed this issue: https://android-review.googlesource.com/#/c/57985/

Related

Dalvik Verifier: copy1 v16<-v22 type=2 cat=1

The following smali code is not accepted by Dalvik:
.method getOrCompute(Ljava/lang/Object;ILcom/google/inject/internal/guava/base/$Function;)Ljava/lang/Object;
.registers 24
.param p2, "hash" # I
.annotation system Ldalvik/annotation/Signature;
value = {
"(TK;I",
"Lcom/google/inject/internal/guava/base/$Function",
"<-TK;+TV;>;)TV;"
}
.end annotation
.annotation system Ldalvik/annotation/Throws;
value = {
Ljava/util/concurrent/ExecutionException;
}
.end annotation
##0
.prologue
.line 12
:cond_0
:try_start_0
move-object/16 v17, p3
##3
move/16 v16, p2
Verifier Error:
dalvikvm: VFY: copy1 v16<-v22 type=2 cat=1
dalvikvm: VFY: rejecting opcode 0x03 at 0x0003
dalvikvm: VFY: rejected Lcom/google/inject/internal/guava/collect/$ComputingConcurrentHashMap$ComputingSegment;.getOrCompute (Ljava/lang/Object;ILcom/google/inject/internal/guava/base/$Function;)Ljava/lang/Object;
dalvikvm: Verifier rejected class Lcom/google/inject/internal/guava/collect/$ComputingConcurrentHashMap$ComputingSegment;
I don't really understand the issue. v16 and v22 (p2) are both within the 16-bit register range that move/16 accepts, so all should be good.
From the error message, the type of p2 at that point is "2", which is kRegTypeConflict. A conflicted type means that there are multiple code paths that merge together, and each code path has an incompatible incoming type in that register.
If you look at the beginning of the method, you'll see a ":cond_0" label, which means that there is some conditional elsewhere in the method that can jump there. The value of p2 at that conditional is not an integer, so we have 1 code path (from the beginning of the method) where p2 is an integer, and another code path (from the conditional jump) where it is something else, so the verifier marks the register as conflicted.
A register with a conflicted type can't be read from. You can basically treat it as an uninitialized register at that point.
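A toy Python sketch of the merge rule described above. It is a deliberate simplification: the real verifier has a much richer type table, and the type names and the choice of a reference as the "something else" arriving from the conditional jump are assumptions made for illustration only.

```python
# Simplified register types; the real verifier distinguishes many more.
UNINIT = "uninit"
CONFLICT = "conflict"
INTEGER = "integer"
REFERENCE = "reference"


def merge(type_a, type_b):
    """Merge the types one register holds on two incoming code paths."""
    if type_a == type_b:
        return type_a
    # Incompatible incoming types leave the register conflicted.
    return CONFLICT


def can_read(reg_type):
    """A conflicted register is as unreadable as an uninitialized one."""
    return reg_type not in (UNINIT, CONFLICT)


# Path 1: method entry, where p2 is the int parameter.
# Path 2: the jump to :cond_0, where p2 is assumed to hold a reference.
p2_at_cond_0 = merge(INTEGER, REFERENCE)
move_is_allowed = can_read(p2_at_cond_0)
```

Under these assumptions `p2_at_cond_0` is conflicted, so the `move/16 v16, p2` that follows the label cannot be accepted.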
If you want to see more info about how the register types are merged in this case, you can use baksmali's --register-info option with the FULLMERGE flag: --register-info=ARGS,DEST,FULLMERGE. Or, if you want to see every register before and after every instruction, you can use --register-info="ALL,FULLMERGE".

Dalvik Verifier: register1 v25 type 0, wanted ref

I have the following Smali code:
.method private k(I)V
.registers 27 (original) 29 (after)
...
##68a
invoke-direct/range {v24 .. v25}, Landroid/widget/LinearLayout;-><init>(Landroid/content/Context;)V
...
This is rejected by the Dalvik verifier. 0x76 is invoke-direct/range.
dalvikvm: VFY: register1 v25 type 0, wanted ref
dalvikvm: VFY: bad arg 1 (into Landroid/content/Context;)
dalvikvm: VFY: rejecting call to Landroid/widget/LinearLayout;.<init> (Landroid/content/Context;)V
dalvikvm: VFY: rejecting opcode 0x76 at 0x068a
dalvikvm: VFY: rejected Lcom/pocketwood/myav/MyAV;.k (I)V
dalvikvm: Verifier rejected class Lcom/pocketwood/myav/MyAV;
dalvikvm: Class init failed in newInstance call (Lcom/pocketwood/myav/MyAV;)
Interestingly v25 is not used in any instruction above 68a! The original APK runs fine, but repacked with smali the verifier rejects class MyAV.
I suspect you have the wrong code location. If you look at the error message, it mentions opcode 0x76, which is invoke-direct/range. The code snippet you provided does not have an invoke-direct/range instruction, so, unless something really screwy is going on, that can't be the code that's causing the issue.
Also, take a look at the name of the method in the error message: Lcom/pocketwood/myav/MyAV;.k (I)V. There is what looks like a space after the k. The space character itself isn't a valid character in a method name, but maybe it's actually some other space-like unicode character?
Nevermind. That space appears to be baked into the error message.
Finally, the offset mentioned in the error message (at 0x068a) should be the code offset of the instruction within the containing method. You can use baksmali's --offsets option when disassembling the dex file, and baksmali will add a comment with the code offset before each instruction. Although, I'm not sure offhand if the offset is in bytes or code units, which are 16 bits, so it may be off by a factor of 2.
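The solution is: in the original method (.registers 27), v25 is p0 and v26 is p1. The modification raised the register count to 29, which shifts the parameter registers to the new top of the frame, so v25 is no longer p0 and does not hold the reference the invoke expects.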
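The register arithmetic behind this can be sketched in a few lines of Python. The helper name `param_to_v` is hypothetical. It relies on the smali convention that parameter registers occupy the last registers of the frame, and it assumes 2 incoming registers for the private instance method k(I)V (the implicit `this` plus the int argument).

```python
def param_to_v(registers, ins, p_index):
    """Return the v-register number that parameter register p<p_index> aliases."""
    if not 0 <= p_index < ins:
        raise ValueError("no such parameter register")
    # Parameters live in the last `ins` registers of the frame.
    return registers - ins + p_index


INS = 2  # `this` + one int argument for the non-static k(I)V

original_p0 = param_to_v(27, INS, 0)  # .registers 27
original_p1 = param_to_v(27, INS, 1)
modified_p0 = param_to_v(29, INS, 0)  # .registers 29
modified_p1 = param_to_v(29, INS, 1)
```

With 27 registers p0 and p1 land on v25 and v26; with 29 they land on v27 and v28. Writing such instructions with p-names instead of hard-coded v-numbers avoids the problem when the register count changes.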

Redundant opcodes in Android dex

I'm looking into some Android performance issues at the moment and noticing some sub-optimal patterns in the dex code. I'm just wondering if anyone knows if this is to be expected, and what the rationale behind it might be.
For example, consider the following Java code:
m_testField += i;
doSomething(m_testField);
When this is built and then run through baksmali it looks like the following:
iget v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I
add-int/2addr v1, v0
iput v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I
iget v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I
invoke-direct {p0, v1}, Lcom/example/MainActivity$FieldTest;->doSomething(I)V
The part that's concerning me is the iget opcode to read the value of the instance field into register v1. The same field was written from the very same v1 register in the preceding opcode, so the opcode would appear to be completely redundant.
The only thing I can think of is that this is done to make this more thread-safe. But surely that should be the programmer's responsibility (by using sync blocks) instead of the compiler's responsibility. Although I'm not 100% certain, I think the above behaviour is quite different to what most C/C++ compilers would do.
I should say that essentially the same dex is produced when ProGuard is used. I should also probably mention that I'm using the very latest Android tools and a late model JDK.
Every access to a field is independent. To get the behavior you describe, you need to add an extra local variable:
int local = m_testField; // iget
local = local + i;
m_testField = local; // iput
doSomething(local);
That said, some combination of the interpreter, just-in-time compiler and ahead-of-time compiler may end up making these optimizations for you at runtime anyway.
On a hunch, I've done some further research and I think I'm in a position to answer my own question...
The sub-optimal dex seems to be a by-product of the fact that it is generated from standard Java bytecode which is stack-based rather than register-based. I disassembled the .class file corresponding to the sample code in my question. The relevant section looks like this:
5: aload_0
6: dup
7: getfield #22 // Field m_testField:I
10: iload_1
11: iadd
12: putfield #22 // Field m_testField:I
15: aload_0
16: aload_0
17: getfield #22 // Field m_testField:I
20: invokespecial #33 // Method doSomething:(I)V
After the iadd opcode on line 11 is executed, the value of m_testField is at the top of the stack and the 'this' reference is second from the top. The problem is that the putfield opcode on line 12 removes these from the stack. This means that the field value has to be re-pushed to the stack on line 17.
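The stack behaviour described here can be traced with a small Python simulation. It is only a sketch of the six opcodes at offsets 5 through 12, not a JVM. The starting field value 10 and the argument value 5 are arbitrary, and the function name `run` is made up for the example.

```python
def run(ops, i=5, field_value=10):
    """Simulate the operand stack for a handful of JVM opcodes."""
    fields = {"m_testField": field_value}
    stack = []
    trace = []
    for op in ops:
        if op == "aload_0":
            stack.append("this")
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "getfield":
            stack.pop()  # consumes the object reference
            stack.append(fields["m_testField"])
        elif op == "iload_1":
            stack.append(i)
        elif op == "iadd":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "putfield":
            value = stack.pop()  # consumes the new value...
            stack.pop()          # ...and the object reference
            fields["m_testField"] = value
        else:
            raise ValueError("unsupported opcode: " + op)
        trace.append((op, list(stack)))
    return trace, fields


ops = ["aload_0", "dup", "getfield", "iload_1", "iadd", "putfield"]
trace, fields = run(ops)
stack_after_iadd = trace[4][1]
stack_after_putfield = trace[5][1]
```

After `iadd` the stack holds the `this` reference and the sum. After `putfield` it is empty, which is why the following instructions have to load `this` and the field again before the call.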
I must say I'm pretty surprised by this inefficiency. I'd have thought that the dx tool that converts bytecode to dex would be clever enough to remove this redundancy. I'm just hoping that ART is clever enough to do this at runtime instead.

what is meaning of .prologue in a smali file?

I disassembled a simple Android app using apktool and it generated some smali code. Most of it is understandable, but I don't get the meaning of .prologue in the smali code. Please help me.
Other directives and instructions such as .locals and invoke are self-explanatory, but what does .prologue do?
# direct methods
.method public constructor <init>()V
.locals 0
.prologue # What does this do?
.line 17
invoke-direct {p0}, Landroid/app/Activity;-><init>()V
return-void
.end method
This is equivalent to the DBG_SET_PROLOGUE_END debug opcode in the dex file, as documented here.
sets the prologue_end state machine register, indicating that the next
position entry that is added should be considered the end of a method
prologue (an appropriate place for a method breakpoint). The
prologue_end register is cleared by any special (>= 0x0a) opcode.

smali: String Constants

Is there anything else that must be done to load a String constant into a register and then use it in a method invocation, besides doing:
const-string v6, "TEST CONSTANT"
invoke-static {v6, p1}, Landroid/util/Log;->wtf(Ljava/lang/String;Ljava/lang/String;)I
?
The following block of instructions
iget-object v4, p0, Lcom/mypackage/MyClass;->myList:Ljava/util/List;
invoke-interface {v4, p1}, Ljava/util/List;->contains(Ljava/lang/Object;)Z
move-result v5
if-eqz v5, :cond_not_met_0
const-string v6, "TEST CONSTANT"
invoke-static {v6, p1}, Landroid/util/Log;->wtf(Ljava/lang/String;Ljava/lang/String;)I
:cond_not_met_0
invoke-interface {v4, p2}, Ljava/util/List;->contains(Ljava/lang/Object;)Z
move-result v5
if-eqz v5, :cond_not_met_1
invoke-static {v6, p2}, Landroid/util/Log;->wtf(Ljava/lang/String;Ljava/lang/String;)I
:cond_not_met_1
gave me the following logcat error messages:
10-29 23:37:37.191: W/dalvikvm(515): VFY: register1 v6 type 2, wanted ref
10-29 23:37:37.241: W/dalvikvm(515): VFY: bad arg 0 (into Ljava/lang/String;)
10-29 23:37:37.241: W/dalvikvm(515): VFY: rejecting call to Landroid/util/Log;.wtf (Ljava/lang/String;Ljava/lang/String;)I
10-29 23:37:37.241: W/dalvikvm(515): VFY: rejecting opcode 0x71 at 0x0028
Your suspicions in this case are correct. The problem is because at the second Log->wtf instance, v6 was not necessarily set.
As to why this happens, it's important to note that very little verification is done when re-assembling the bytecode. In many cases, the assembler simply does not have enough information to do this level of verification - this would require the assembler have knowledge of the full set of classes that will be present when the application runs, similar to the case of deodexing (-o) or generating register info (-r).
These types of problems are caught by dalvik when it verifies the bytecode, which is exactly what the error you mentioned is from.
Additionally, you mention "the values i used for p1 and p2 would have failed both if-eqz tests". This does not matter to dalvik's bytecode verifier. The verifier makes sure that all code paths are valid. It can't know or assume any particular value for the parameters that are passed in to the method.
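A rough Python sketch of why the verifier's answer does not depend on the actual values of p1 and p2. It is not how dalvik is implemented: the block names and the function `assigned_on_entry` are invented for the example. For each basic block of the snippet it computes the registers that are assigned on every path reaching that block, and compares the original code with the version where the const-string is moved to the top.

```python
def assigned_on_entry(defs, preds, order):
    """For each block, compute the registers assigned on every path reaching it."""
    entry_sets, exit_sets = {}, {}
    for name in order:  # blocks are listed in topological order (no loops here)
        incoming = [exit_sets[p] for p in preds[name]]
        entry_sets[name] = set.intersection(*incoming) if incoming else set()
        exit_sets[name] = entry_sets[name] | defs[name]
    return entry_sets


# Control flow of the snippet: each if-eqz either falls into its if-block
# or jumps straight to the following label.
preds = {
    "entry": [],
    "if_block_1": ["entry"],
    "cond_not_met_0": ["entry", "if_block_1"],
    "if_block_2": ["cond_not_met_0"],
}
order = ["entry", "if_block_1", "cond_not_met_0", "if_block_2"]

# Registers written in each block.
original = {
    "entry": {"v4", "v5"},
    "if_block_1": {"v6"},      # const-string v6 only inside the first if-block
    "cond_not_met_0": {"v5"},
    "if_block_2": set(),       # reads v6
}
fixed = {
    "entry": {"v4", "v5", "v6"},  # const-string v6 moved to the top
    "if_block_1": set(),
    "cond_not_met_0": {"v5"},
    "if_block_2": set(),
}

v6_ok_original = "v6" in assigned_on_entry(original, preds, order)["if_block_2"]
v6_ok_fixed = "v6" in assigned_on_entry(fixed, preds, order)["if_block_2"]
```

In the original layout there is a path (entry, then directly to cond_not_met_0) on which v6 is never written, so the second use fails this check no matter which path a particular run takes. With the const-string at the top, every path defines v6.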
If you want to see some additional information related to how the register types are propagated throughout the method, you can try the -r option for baksmali.
# grab the full framework directory from your device
adb pull /system/framework framework
# run baksmali with the -r command
baksmali -r ARGS,DEST,FULLMERGE -d framework <apk_or_dex_file>
This will add comments before/after every instruction with detailed information about the types of registers at that position.
I changed some things around and got it to work, but I'm not sure why...
The change I made was to move the constant loading (const-string v6, "TEST CONSTANT") outside of the "if-block", resulting in
const-string v6, "TEST CONSTANT"
iget-object v4, p0, Lcom/mypackage/MyClass;->myList:Ljava/util/List;
invoke-interface {v4, p1}, Ljava/util/List;->contains(Ljava/lang/Object;)Z
move-result v5
if-eqz v5, :cond_not_met_0
invoke-static {v6, p1}, Landroid/util/Log;->wtf(Ljava/lang/String;Ljava/lang/String;)I
:cond_not_met_0
invoke-interface {v4, p2}, Ljava/util/List;->contains(Ljava/lang/Object;)Z
move-result v5
if-eqz v5, :cond_not_met_1
invoke-static {v6, p2}, Landroid/util/Log;->wtf(Ljava/lang/String;Ljava/lang/String;)I
:cond_not_met_1
I suspect the reason for the initial problem is that if the flow bypassed the first "if-block" but entered the second "if-block", then register v6 would not yet have been loaded before it was used.
The reason I'm not so sure is that, when I ran the reassembled programme, the values I used for p1 and p2 would have failed both if-eqz tests (i.e. the flow would enter both "if-blocks").
So it seemed that:
1. The check that register v6 is loaded before it is used was done preemptively, before the actual control flow?
2. I thought such checking is only done at compile time?
I'm posting this as an answer as I needed more space to explain what I did. However, I'm still curious as to why such a change got it to work, so if anyone could give an explanation I'll mark that as the answer, thanks!
