I am going to learn a little bit about Dalvik VM, dex and Smali.
I have read about smali, but I still cannot clearly understand its place in the chain of compilers, or what its purpose is.
Here are some questions:
As far as I know, Dalvik, like other virtual machines, runs bytecode; in the case of Android it is dex bytecode.
What is smali? Does the Android OS or the Dalvik VM work with it directly, or is it just the same dex bytecode in a more human-readable form?
Is it something like a disassembler for Windows (like OllyDbg)? There, a program executable consists of machine code bytes (D3, 5F for example), and each has a corresponding assembly instruction. The Dalvik VM is itself software, though, so is smali just a readable representation of its bytecodes?
There is also the new ART environment. Does it still use bytecode, or does it execute native code directly?
Thank you in advance.
When you build an application, the apk file contains a .dex file, which contains binary Dalvik bytecode. This is the format that the platform actually understands. However, it's not easy to read or modify binary code, so there are tools to convert to and from a human-readable representation. The most common human-readable format is known as Smali. It plays essentially the same role as the disassembler you mentioned.
For example, say you have Java code that does something like
int x = 42;
Assuming this is the first variable, then the dex code for the method will most likely contain the hexadecimal sequence
13 00 2A 00
If you run baksmali on it, you'd get a text file containing the line
const/16 v0, 42
This is obviously a lot more readable than the binary code. But the platform doesn't know anything about smali; it's just a tool to make it easier to work with the bytecode.
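To make the mapping between the bytes 13 00 2A 00 and the smali line concrete, here is a minimal sketch of how a disassembler might decode that one instruction. It assumes the documented layout of const/16 (opcode 0x13, then an 8-bit register number, then a signed 16-bit little-endian literal). The function name disassembleConst16 is made up for this illustration and is not part of baksmali.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: disassemble a single Dalvik `const/16 vAA, #+BBBB`.
// Assumed layout: byte 0 = opcode 0x13, byte 1 = register AA,
// bytes 2-3 = signed 16-bit literal BBBB, stored little-endian.
// Returns an empty string for any other input; a real tool such as
// baksmali handles the whole instruction set.
std::string disassembleConst16(const std::vector<std::uint8_t>& code) {
    const std::uint8_t kOpConst16 = 0x13;
    if (code.size() < 4 || code[0] != kOpConst16) {
        return std::string();
    }
    const unsigned reg = code[1];
    // Reassemble the two literal bytes, low byte first.
    const std::uint16_t raw =
        static_cast<std::uint16_t>(code[2] | (code[3] << 8));
    // Reinterpret as signed so that FF FF reads as -1.
    const std::int16_t literal = static_cast<std::int16_t>(raw);
    return "const/16 v" + std::to_string(reg) + ", " + std::to_string(literal);
}
```

Called with the four bytes from the example, this returns the same text as the baksmali line: const/16 v0, 42. The point is only that smali is a one-to-one textual rendering of what is already in the .dex file.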
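Dalvik and ART both take .dex files containing Dalvik bytecode. The difference is completely transparent to the application developer; it lies in what happens behind the scenes when the application is installed and run. Dalvik interprets the bytecode (with a JIT compiler for frequently executed code), whereas ART compiles it to native code, originally ahead of time when the application is installed.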
High-level programming languages include extra tools to make programming easier and save the programmer time. Once a program has been compiled, decompiling it back to the original source code requires a lot of code analysis to determine the structure and flow of the program, most likely over more than one pass. The decompiler would then have to structure the source based on the features of the compiler that produced the code, the version of the compiler, and the operating system it was compiled on, e.g. if any OS-specific features, frameworks, parsers or external libraries were involved (such as .NET or some.dll), along with their versions.
The next best result would be to output the whole program flow as if the source code had been written in one large file, i.e. with no separate objects, libraries, dependencies, inheritance, classes or APIs. This is where the decompiler would spit out code that fails to compile, since it has no access to the source code and structure of the other files and dependencies. See example here.
The third and best option would be to follow what the operating system is doing based on the programmed instructions, which would be machine code, or dex (in the case of Android). Unless you're sitting in the Nebuchadnezzar captained by Morpheus, with no time to decode every opcode in the instruction set of the architecture your processor is running, you'd want something more readable than unicode characters scrolling on the screen as you monitor the program's flow and execution.
This is where assembly code makes the difference: it's almost a direct translation of machine code into a human-readable format. I say "almost" direct because microprocessors have helpers like microcode, multithreading for pipelining and hardware accelerators to give a better user experience.
If you have the source code, you'd be editing in the language the code is written in. Similarly, if you don't have the source code and you're editing the compiled app, you'd still be editing in the language the code is written in; in this case, that's machine code, or the next best thing: smali.
Here's a diagram to illustrate "Dalvik VM, dex and Smali" and "its place in chain of compilers".
For the past six months as my final university project, I've been writing a PlayStation 1 emulator in Java to prove it can be performant - part of my strategy involves writing a custom class loader that imports bytecode I have just generated from an array into a new class - in effect a Java bytecode dynarec core which speeds up the emulated CPU by orders of magnitude (in theory). All quite possible on an Oracle JVM, and done before by others.
My question is this: aside from the fact that I would need to generate Dalvik bytecode rather than Java bytecode, there doesn't seem to be any way to dynamically load classes into a running Android app that doesn't involve loading them from a dex file on flash somewhere. I know similar things have been asked before, but as I would eventually like to port this emulator (and have it be quicker than its currently unplayable speed), is there any way around this? I don't want to be continually writing to flash whenever a new section of MIPS code is converted to bytecode, as it could wear the flash out and probably isn't very fast either.
One thought I had was to mount a tmpfs using a small JNI lib and store the class files there to be loaded, in effect keeping them in RAM as before. Is this even possible for an unprivileged app to do, though? I'd appreciate people's input and thoughts.
No, that might be possible on a jailbroken device but it's not possible in a sandboxed app.
I tried several ways to load dynamic code on Android but the only feasible way is via the DexClassLoader where the dex file must be stored in a privileged region.
You can have a look at my project Byte Buddy where I implemented such class loading: https://github.com/raphw/byte-buddy/blob/master/byte-buddy-android/src/main/java/net/bytebuddy/android/AndroidClassLoadingStrategy.java
In my Android project, I'm using std::thread.
I use the same C++ code also in some Linux and OSX projects.
For debugging purposes, I want to assign human-readable thread names, and I do that by calling pthread_setname_np() (because there is no std::thread::set_name()).
For later debug output, I try to obtain the current thread name by calling pthread_getname_np(), and this works e.g. on a Linux target.
But to my surprise, there is no pthread_getname_np() in the Android NDK pthread.h, neither in ndk-bundle/platforms/android-19/arch-arm/usr/include/pthread.h nor in ndk-bundle/platforms/android-21/arch-arm/usr/include/pthread.h.
A naive attempt with a forward declaration like:
extern "C" int pthread_getname_np(pthread_t, char*, size_t);
fails with a linker error (as expected).
Any idea how to obtain the human readable name of the current thread in Android from C/C++ code?
You can see how Dalvik sets them in dalvik/vm/Thread.cpp. It uses pthread_setname_np() if available, prctl(PR_SET_NAME) if not. So if pthread_getname_np() isn't available -- and bear in mind that "np" means "non-portable" -- you can use prctl(PR_GET_NAME) to get a 16-byte null-terminated string under Linux.
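As a rough sketch of the prctl() route, the two helpers below set and read the name of the calling thread. The function names are invented for this example; the only real API used is prctl() with PR_SET_NAME and PR_GET_NAME from <sys/prctl.h>, so this assumes a Linux or Android target. Note that the kernel keeps at most 15 characters plus the terminating null, so longer names are silently truncated.

```cpp
#include <string>
#include <sys/prctl.h>

// Hypothetical helper: set the kernel-visible name of the calling thread.
// Names longer than 15 characters are truncated by the kernel.
bool setCurrentThreadName(const std::string& name) {
    return prctl(PR_SET_NAME,
                 reinterpret_cast<unsigned long>(name.c_str()), 0, 0, 0) == 0;
}

// Hypothetical helper: read the name of the calling thread.
// PR_GET_NAME fills a buffer of up to 16 bytes, including the null.
std::string currentThreadName() {
    char buffer[16] = {0};
    if (prctl(PR_GET_NAME,
              reinterpret_cast<unsigned long>(buffer), 0, 0, 0) != 0) {
        return std::string();  // empty string signals failure
    }
    return std::string(buffer);
}
```

Both calls act on the calling thread only, so each std::thread would call setCurrentThreadName() at the start of its own thread function, and currentThreadName() can then be used from the logging code running on that thread.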
You can find other bits by fishing around in /proc entries.
If you have specific requirements for the size and format of the name then you may want to define a pthread key and tuck it into thread-local storage. It's more work, but it's consistent and portable.
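A minimal sketch of the thread-local approach follows. It swaps in the C++11 thread_local keyword for a hand-written pthread key, which should give the same per-thread storage with less code, assuming your toolchain supports thread_local. The names setThreadLabel and threadLabel, and the "<unnamed>" fallback, are made up for this illustration.

```cpp
#include <string>

namespace {
// Each thread gets its own independent copy of this variable.
thread_local std::string t_threadLabel;
}

// Hypothetical helper: store a label of any length for the calling thread.
void setThreadLabel(const std::string& label) {
    t_threadLabel = label;
}

// Hypothetical helper: return the calling thread's label, or a
// placeholder if this thread never set one.
std::string threadLabel() {
    return t_threadLabel.empty() ? std::string("<unnamed>") : t_threadLabel;
}
```

Unlike the kernel-level name, this label is not limited to 15 characters, but it is only visible to your own code; debuggers and /proc will not show it.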
Unfortunately, I deleted my Android source code. I tried to recover it from my .apk file, using dex2jar and baksmali to get a jar and then jd-gui to get Java source files. I did get the files, but the problem is that in many places in the Java files the code is shown as bytecode. I need to get that into a readable format to move forward.
Decompiling is not a perfect science, and you rarely get back the exact Java code you typed.
When you compile your code, a bunch of optimizations are done on it, which make decompiling more difficult if you're aiming to get the original code.
At best, you'll get a lot of decently decompiled code, along with some bytecode. You should be able to figure out what Java code to substitute for that bytecode based on where in the program it is, seeing as you wrote the original code.
For most simple apps, it is easier to rewrite from scratch than it is to decompile and try to fix that decompiled code.
tl;dr: Don't forget to back up your code. Ever.
Generally if your goal is to get readable source back, then something like JD-Gui is your best bet. But for cases where it fails, you could try Krakatau, a decompiler I've written.
Krakatau is designed to be able to decompile classfiles, even if they're obfuscated or not compiled from Java. However, the result is less readable than something like JD-Gui because it doesn't take advantage of the patterns left by the Java compiler. It's not perfect, but I think it's definitely worth a try.
P.S. Krakatau only supports JVM bytecode. You'll need a way to convert the Dalvik bytecode in your apk back into Java bytecode (for example with dex2jar) before you can decompile it.
I'm interested in doing some tinkering on compiled Class files before they're converted to dex files by dx. I've looked a bit at the official Dalvik documentation and also at comparisons between the DEX format and Class format. I can't find much information regarding the actual conversion process, class->dex. Does dx first verify the Class files before the conversion? Does it simply go field by field and method by method, merging groups of instructions into more compact groupings? Any insight would be appreciated.
Thanks.
The way that dx is run, it doesn't typically have sufficient information to do all possible verification, nor is it written to do so. In particular, part of verification has to do with how the code in one class refers to code in other classes, and when dx is run, the code for the "other classes" in question might not actually be available. For example, you could compile some code against Android API level 6, producing a .dex file. Later, when a device running API level 29 comes out, you could try to run that .dex file. It's only when the file is on a system and getting ready to run that the system has all the info needed to perform verification. At that point, it can check the references in the .dex file against what's available on the system and either accept (pass verification of) or reject (fail verification of) that file.
As a brief example, maybe the .dex file refers to a class or method that existed in API level 6 but was removed as of API level 29.
But to be clear, as @JesusFreke said, dx needs to be able to parse .class files enough to be able to do its job of translation. If it runs into a problem at that layer, it will report that as a failure to translate, which, in context, is about equivalent to a verification error, though it's not generally phrased as such.
Even disregarding the possibility of evolution of the API, it is possible to take a .class that wouldn't verify, succeed in translating it into a (part of a) .dex file, and then observe that the .dex file fails to verify.
I hope this helps!
I'm not as familiar with dx itself and the conversion process as with dalvik bytecode, but I don't recall seeing any verification of the original java bytecode, although obviously it has to be well-formed enough to be parsed/understood by dx.
There is no documentation on the conversion process that I am aware of. It involves converting the bytecode into a couple of intermediate formats (ROP, SSA), and includes some logic for efficient register allocation and some optimizations on the intermediate forms (I think).
For more information on the conversion process, your best bet is to look at the dx source itself (/dalvik/dx)
I am very new to Android application development. I just started on a Hello World Android application yesterday.
I was wondering whether an Android application has a defined control flow, the way frameworks like Struts MVC, Spring MVC, etc. do.
I am working on enhancing an Android application, so I thought that knowing the flow of control would be a good start.
The following three materials will be very good for you if you'd like to know the control flow of an Android application:
Application Fundamentals
http://developer.android.com/guide/topics/fundamentals.html
Activity
http://developer.android.com/guide/topics/fundamentals/activities.html
Task and Back Stack
http://developer.android.com/guide/topics/fundamentals/tasks-and-back-stack.html
1. All resource files are combined by AAPT (Android Asset Packaging Tool).
Resource files are things like audio, video, images and other asset-related files.
2. Java files are converted into .class files by the Java compiler (javac), not by the JVM. These .class files are too heavyweight to run on Android as they are, so one more processing step takes place.
3. The .class files are given as input to the dx tool, which converts them into .dex (Dalvik executable) files. Those files can be executed on the DVM (Dalvik Virtual Machine).
4. The .dex files are then packed, together with the resources, by the APK builder, which does the application packaging. The resulting package is installed on the device and executed by the DVM.