I am studying the Android Dalvik VM and ran into a question while reading the mterp code in vm/mterp/out/InterpC-portable.cpp. This is the main interpreter loop of the Dalvik VM, which interprets the bytecode in a dex file. If I were writing this file, I would use a switch-case structure like this:
while (hasMoreIns()) {
int ins = getNextIns();
switch(ins) {
case MOV:
//interpret this instruction
...
break;
case ADD:
...
break;
...
default: break;
}
}
However, what mterp does is very different from what I expected. It uses code that looks like magic to me:
FINISH(0);
HANDLE_OPCODE(OP_NOP)
FINISH(1);
OP_END
HANDLE_OPCODE(OP_MOVE)
...
OP_END
...
I googled it and found that this seems to be a modified "threaded" style of execution, which differs from the switch-case style and performs better because it removes the branch operation in the while loop. But I still can't understand this code or why it performs better. How does it find the next instruction to interpret?
As a brief bit of guidance, the out directory is filled with preprocessed files and is not what I'd call a great thing to read, if you're trying to figure out the code. The source (per se) that corresponds to InterpC-portable.cpp is the contents of the portable and c directories.
In terms of how the code does opcode dispatch, you'll want to look at the definition of the FINISH macro, in portable/stubdefs.cpp:
# define FINISH(_offset) { \
ADJUST_PC(_offset); \
inst = FETCH(0); \
if (self->interpBreak.ctl.subMode) { \
dvmCheckBefore(pc, fp, self); \
} \
goto *handlerTable[INST_INST(inst)]; \
}
This macro is used at the end of each opcode definition and serves as the equivalent of a switch (opcode) statement. Briefly, this reads the instruction code unit pointed at by the PC — inst = FETCH(0) — grabs the opcode out of it — INST_INST(inst) — and uses that opcode as an index into the table of addresses of all the opcodes. The address is directly branched to with the goto statement.
The goto is a "computed goto," which is a non-standard GCC extension. You can read about it in the GCC manual, and you can also find a bit about the topic in the presentation I gave on Dalvik internals at Google IO back in 2008. (Find it at https://sites.google.com/site/io/dalvik-vm-internals.)
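To make the dispatch pattern concrete, here is a minimal sketch of a threaded interpreter using the same GCC "labels as values" extension. It is not Dalvik code: the three opcodes, the DISPATCH macro, and the run_threaded function are hypothetical names invented for illustration, and the sketch does no stack bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode bytecode, invented for this sketch (not Dalvik's). */
enum { OP_PUSH = 0, OP_ADD = 1, OP_HALT = 2 };

/* Runs `code` on a small operand stack and returns the top of the stack
 * when OP_HALT is reached. Requires GCC or Clang: `&&label` and
 * `goto *ptr` are the "labels as values" extension, not standard C. */
int run_threaded(const uint8_t *code)
{
    /* One handler address per opcode, indexed by the opcode value. */
    static void *const handler_table[] = { &&op_push, &&op_add, &&op_halt };
    int stack[16];
    int sp = 0;                     /* number of values on the stack */
    const uint8_t *pc = code;

    assert(code != NULL);

/* Plays the role of FINISH: advance the PC, fetch the opcode, and jump
 * straight to its handler without returning to a central loop. */
#define DISPATCH(offset) do { pc += (offset); goto *handler_table[*pc]; } while (0)

    DISPATCH(0);                    /* enter the first handler */

op_push:                            /* PUSH imm8: push the next byte */
    stack[sp++] = pc[1];
    DISPATCH(2);

op_add:                             /* ADD: pop two values, push their sum */
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH(1);

op_halt:                            /* HALT: return the top of the stack */
    return sp > 0 ? stack[sp - 1] : 0;
#undef DISPATCH
}
```

Running the program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT} through run_threaded returns 5. Each handler ends with its own indirect jump, so there is no shared loop header or switch to branch back to.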
My talk also touches on the topic of the performance characteristics of this technique. Briefly, it saves some amount of branching and plays relatively nice with branch prediction. However, there are better ways to write an interpreter (as I cover in the talk, and as the CPU-specific Dalvik interpreters in fact work).
And for just a bit more of the larger context, compiling bytecode to native CPU instructions is in general going to result in faster execution than even the most well-tuned interpreter, assuming you have sufficient RAM to hold the compiled result. The trace-based Dalvik JIT that was introduced in Froyo was meant to make a tradeoff wherein modest amounts of extra RAM were used to achieve reasonably-fruitful performance gains.
I am trying to write an Android ARM kernel module in which I need a virt_to_phys translation of a memory buffer allocated using _kmalloc.
I know that I can use the virt_to_phys macro for this. However, I don't have the exact full kernel source, and because virt_to_phys is a macro
I can't get a function address for it from kallsyms to use in my module, so I would like to find another way to do this.
I've been trying to do it with the MMU (registers ATS1Cxx and PAR) to perform V=>P, as I am working on an ARMv7 processor, but I couldn't make it work.
That's my test code...
int hello_init_module(void) {
printk("Virtual MEM:0x%X \n", allocated_buf);
//Trying to get the physc mem
asm("\t mcr p15, 0, %[value], c7, c8, 2\n"
"\t isb\n"
"\t mrc p15, 0, %[result], c7, c4, 0\n" : [result]"=r" (pa) : [value]"r" (allocated_buf));
printk("Physical using MMU : %x\n", pa );
//This shows the right address, but I want to do it without calling the macro.
printk("Physical using virt_2_physc: 0x%X",virt_to_phys((int *) allocated_buf));
return 0;
}
What I am actually doing is developing a module intended to work on two devices with the same 3.4.10 kernel but different memory architectures.
I can make the module work on both because they have the same VER_MAGIC and function CRCs, so the module loads perfectly on both devices.
The main problem is that, because of the differences in their architecture, PAGE_OFFSET and PHYS_OFFSET differ between them.
So I've been wondering whether there is a way to do the translation without defining these values as constants in my module. That's why I tried using the MMU to perform V=>P, but it hasn't worked in my case: it always returns 0x1F.
According to cat /proc/cpuinfo, I am working with:
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x51
CPU architecture: 7
If it's not possible to use the MMU as an alternative to virt_to_phys,
does anybody know another way to do it?
I am writing Android RenderScript code that requires back-to-back kernel calls (sometimes the output of one kernel becomes the input of another). I also have some global pointers bound to memory from the Java layer. Each kernel updates those global pointers and outputs something. I have to make sure that kernel1 has finished executing before kernel2 starts.
I looked at the Android RenderScript docs, but couldn't understand syncAll(Usage) and finish() well. Can anyone clarify how to achieve this behaviour?
Thanks
mScript.forEach_kernel1(mColorImageAllocation, tempAlloc);
// make sure kernel1 finishes, from android rs doc, copyTo should block
tempAlloc.copyTo(testOutputBitmap);
for (short i = 0; i < NUM_DIST; i++) {
mScript.set_gCurrentDistanceIndex(i);
mScript.forEach_kernel2(tempAlloc);
mRS.finish(); // wait till kernel2 finishes
}
In the above example, the same kernel2 is called with different global parameters on kernel1's output.
For this code you don't need either. RS is a pipeline model so any work which could impact the result of a later command must be finished first by the driver.
syncAll() is used to sync memory spaces, not execution; for example, to propagate changes from script memory to graphics memory.
Following the answer from this StackOverflow question, how do I create the proper integer for the mask?
I did some googling, and everything I found uses the CPU_SET macro from sched.h, but it operates on cpu_set_t structures, which are undefined when using the NDK. When I try using CPU_SET, the linker gives me an undefined reference error (even though I link against pthread).
Well, in the end I found a version taken directly from sched.h. I'm posting it here in case anyone has the same problem and doesn't want to spend the time searching for it. This is quite useful.
#include <string.h> /* memset, used by CPU_ZERO */
#define CPU_SETSIZE 1024
#define __NCPUBITS (8 * sizeof (unsigned long))
typedef struct
{
unsigned long __bits[CPU_SETSIZE / __NCPUBITS];
} cpu_set_t;
#define CPU_SET(cpu, cpusetp) \
((cpusetp)->__bits[(cpu)/__NCPUBITS] |= (1UL << ((cpu) % __NCPUBITS)))
#define CPU_ZERO(cpusetp) \
memset((cpusetp), 0, sizeof(cpu_set_t))
This works well when the parameter type in the original setCurrentThreadAffinityMask (from the post mentioned in the question) is simply replaced with cpu_set_t.
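If a plain integer mask is all that is needed (as the question asks), it can be built with the same shift-and-or that CPU_SET performs on each word. The following is a small sketch under that assumption; cpu_mask_from_indices is a hypothetical helper name, not part of sched.h or the NDK.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns a bitmask with bit N set for every CPU index N in cpus[0..count).
 * Indices that are negative or do not fit in an unsigned long are skipped. */
unsigned long cpu_mask_from_indices(const int *cpus, size_t count)
{
    const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 0;
    size_t i;

    assert(cpus != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (cpus[i] >= 0 && cpus[i] < bits)
            mask |= 1UL << cpus[i];   /* same operation CPU_SET does per word */
    }
    return mask;
}
```

For example, the index list {0, 2} gives the mask 0x5, meaning CPUs 0 and 2.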
Please note that the function from the link in the first post doesn't set the thread CPU affinity; it sets the process CPU affinity. Of course, if your application has only one thread it works well, but it is wrong for multiple threads. See the sched_setaffinity() description, for example at http://linux.die.net/man/2/sched_setaffinity
Try adding this before your #include <sched.h>:
#define _GNU_SOURCE
I've been porting a cross platform C++ engine to Android, and noticed that it will inexplicably (and inconsistently) block when calling pthread_mutex_lock. This engine has already been working for many years on several platforms, and the problematic code hasn't changed in years, so I doubt it's a deadlock or otherwise buggy code. It must be my port to Android..
So far there are several places in the code that block on pthread_mutex_lock. It isn't entirely reproducible either. When it hangs, there's no suspicious output in LogCat.
I modified the mutex code like this (edited for brevity... real code checks all return values):
void MutexCreate( Mutex* m )
{
#ifdef WINDOWS
InitializeCriticalSection( m );
#else /* ANDROID */
pthread_mutex_init( m, NULL );
#endif
}
void MutexDestroy( Mutex* m )
{
#ifdef WINDOWS
DeleteCriticalSection( m );
#else /* ANDROID */
pthread_mutex_destroy( m, NULL );
#endif
}
void MutexLock( Mutex* m )
{
#ifdef WINDOWS
EnterCriticalSection( m );
#else /* ANDROID */
pthread_mutex_lock( m );
#endif
}
void MutexUnlock( Mutex* m )
{
#ifdef WINDOWS
LeaveCriticalSection( m );
#else /* ANDROID */
pthread_mutex_unlock( m );
#endif
}
I tried modifying MutexCreate to make error-checking and recursive mutexes, but it didn't matter. I wasn't even getting errors or log output either, so either that means my mutex code is just fine, or the errors/logs weren't being shown. How exactly does the OS notify you of bad mutex usage?
The engine makes heavy use of static variables, including mutexes. I can't see how, but is that a problem? I doubt it because I modified lots of mutexes to be allocated on the heap instead, and the same behavior occurred. But that may be because I missed some static mutexes. I'm probably grasping at straws here.
I read several references including:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
http://www.embedded-linux.co.uk/tutorial/mutex_mutandis
http://linux.die.net/man/3/pthread_mutex_init
Android NDK Mutex
Android NDK problem pthread_mutex_unlock issue
The "errorcheck" mutexes will check a couple of things (like attempts to use a non-recursive mutex recursively) but nothing spectacular.
You said "real code checks all return values", so presumably your code explodes if any pthread call returns a nonzero value. (Not sure why your pthread_mutex_destroy takes two args; assuming copy & paste error.)
The pthread code is widely used within Android and has no known hangups, so the issue is not likely in the pthread implementation itself.
The current implementation of mutexes fits in 32 bits, so if you print the contents of the pthread_mutex_t as an integer you should be able to figure out what state it's in (technically, what state it was in at some point in the past). The definition in bionic/libc/bionic/pthread.c is:
/* a mutex is implemented as a 32-bit integer holding the following fields
*
* bits: name description
* 31-16 tid owner thread's kernel id (recursive and errorcheck only)
* 15-14 type mutex type
* 13 shared process-shared flag
* 12-2 counter counter of recursive mutexes
* 1-0 state lock state (0, 1 or 2)
*/
"Fast" mutexes have a type of 0, and don't set the tid field. In fact, a generic mutex will have a value of 0 (not held), 1 (held), or 2 (held, with contention). If you ever see a fast mutex whose value is not one of those, chances are something came along and stomped on it.
It also means that, if you configure your program to use recursive mutexes, you can see which thread holds the mutex by pulling the bits out (either by printing the mutex value when trylock indicates you're about to stall, or dumping state with gdb on a hung process). That, plus the output of ps -t, will let you know if the thread that locked the mutex still exists.
I was looking at the strcpy.S file in the Android platform at libc/arch-arm/bionic. This file contains many ARM instructions that I am not able to understand, even with the ARM System Developer's Guide as a reference.
Apart from "tst" and "tstne", I cannot find any reference for the others in any book or ARM reference manual.
tst r2, #0xff00
iteet ne
strneh r2, [ip], #2
lsreq r2, r2, #8
r2, [ip]
tstne r2, #0xff
It's not only these instructions; there are many others in different files as well.
Does anyone have any idea what these instructions are?
The first instruction is the IT instruction from the Thumb instruction set.
iteet ne
This instruction marks the instructions that follow it as conditionally executed. The trailing characters of the mnemonic form a pattern consisting of t (then) and e (else). The operand 'ne' specifies the condition to be evaluated.
The other three instructions are ordinary ARM instructions with conditionals:
strneh r2, [ip], #2 ; store halfword if not equal
lsreq r2, r2, #8 ; logical shift right if equal
tstne r2, #0xff ; test if not equal
These are the three instructions affected by the it-instruction. They come with ne/eq conditional flags as well.
As you can see, the conditions of the IT instruction and the conditions of the other three instructions conflict with each other. This is a bug in the code. Most likely it hasn't been discovered before because the code snippet is from the ARM big-endian code, and I know of no Android phone that uses ARM in big-endian mode.
By the way, it's worthwhile to know why the conditions are given both in the IT instruction and in the instructions themselves. This is part of the unified ARM assembly standard. On ARM you have two modes of operation: Thumb mode (uses the IT instruction, less powerful) and ARM mode (more powerful, uses condition flags in the instructions themselves).
If you limit yourself to the capabilities of Thumb mode, it is possible to write code that assembles in both Thumb and ARM mode. This is done here.
If you assemble for Thumb mode, the IT instruction is used to control the conditions of the next three instructions, and the conditions within the instructions are ignored. If you assemble for the ARM instruction set, the IT instruction is ignored and the conditions from the instructions themselves become active.
This works well as long as the IT instruction and the conditions in the ARM instructions match. As I said before, this is not the case here, so it will not work in Thumb mode, ARM mode, or both :-)
strneh is a store command with some conditional execution/size specifier suffixes. The ARM docs are a good place to start: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/Chdehgih.html
If you google "arm conditional execution", you'll find a number of blogs/articles that may also help: http://blogs.arm.com/software-enablement/258-condition-codes-2-conditional-execution/
As for your "strneh" instruction:
str = store
ne = execute if not equal (Z flag clear)
h = perform a half-word operation