How do the instructions in a .dex file get executed on the processor, and how can I access the instructions that were executed on an ARM processor? I heard that instructions in Android are executed in blocks. After that, how can I calculate the power per instruction executed on ARM?
The answer to your question depends on whether you are measuring power on real hardware or in a simulator. If you are measuring on real hardware, you can write a microbenchmark consisting of many copies of the instruction you want to profile. For example:
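/* GCC/Clang inline assembly (AArch64 syntax assumed): repeat the instruction under test */
__asm__ volatile(
    "add x0, x0, x1\n\t"
    "add x0, x0, x1\n\t"
    "add x0, x0, x1\n\t"
    /* ... many more copies of the same instruction ... */
    "add x0, x0, x1\n\t"
    "add x0, x0, x1\n\t"
    : /* no outputs */
    : /* no inputs */
    : "x0");  /* x0 is overwritten */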
Then divide the total energy consumed by the number of instructions executed to get the energy per instruction.
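Here is a sketch of that arithmetic. It assumes you can read average power (for example from an external power meter) and that you subtract an idle baseline measured over the same interval. The baseline subtraction and the name energy_per_instruction_nj are assumptions of this sketch, not part of the answer above.

```c
#include <assert.h>

/* Energy per instruction, in nanojoules.
 * avg_power_w  : average power while the microbenchmark runs (watts)
 * idle_power_w : average power over the same period with the benchmark
 *                not running (watts); subtracting it is an assumption of
 *                this sketch, meant to isolate the instructions' share
 * seconds      : duration of the benchmark run
 * n_instr      : number of instructions under test that were executed
 *                (copies per iteration x iterations) */
double energy_per_instruction_nj(double avg_power_w, double idle_power_w,
                                 double seconds, unsigned long long n_instr)
{
    assert(n_instr > 0 && seconds > 0.0);
    double energy_j = (avg_power_w - idle_power_w) * seconds; /* E = P * t */
    return energy_j / (double)n_instr * 1e9;                  /* J -> nJ   */
}
```

With hypothetical numbers, energy_per_instruction_nj(1.5, 0.5, 2.0, 1000000000ULL) returns about 2.0. That is 2 J spread over 10^9 instructions.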
If you are measuring this in a simulator, it should be much easier, but the answer would depend on the simulator that you are using.
So I'm building TWRP. I see that some use the command make when building the recovery image, e.g. make -j5 recoveryimage, but others use mka recoveryimage.
What is the difference between the two commands make and mka? I couldn't find a comprehensive answer on my own.
Thanks!
mka is intended to be a faster make. Here's a nice description (source):
This little gem of a command is basically equivalent to a super-charged version of make. Make is the program that gets called to build our source, choosing the correct compiler for each part of the Android OS that we are making. Problem is, make is SLOW in its default configuration. It can take hours longer depending on your hardware. So what did they do? They mated make with a cheetah, and took their child mka. mka improves upon make by using the program sched_tool to make full use of all the threads available on your machine (For AMD, this is equivalent to the number of cores your processor has; For Intel, this is usually equivalent to twice the number of cores your processor has, due to HyperThreading). What this means is that ALL of your processor is working, not just one small part of it.
I hope this question is on topic for SO. I posted it on Android forums such as XDA Developers but got no answers. My question is quite specific, and it is hard to reach people who might know something about it.
To give some context for my question, I need to describe my job a bit. I currently work at a company that needs to turn a phone into an IoT device that runs our C/C++ library (like JanOS). The idea is to modify a phone by removing the screen and keeping only its board with all the features we need (camera, Wi-Fi, SD card, battery, USB connector...).
I am a C/C++ developer and I optimize algorithms. My job is basically to make everything faster, using OpenMP and NEON when I work on an Android platform. I am currently working on two rooted phones:
Huawei Honor 5C (4x2.0 GHz Cortex-A53 & 4x1.7 GHz Cortex-A53)
Huawei Honor 8 (4x2.3 GHz Cortex-A72 & 4x1.8 GHz Cortex-A53)
I want to benchmark the library my company is developing on these phones. The problem is that I can never reach 100% CPU usage, whether from C code or a bash script. CPU usage never goes above 50%, which makes me think only 4 cores are being used. So I tried to understand the platform better. I think each phone is based on an ARM big.LITTLE platform designed to save battery (I have doubts about the Honor 5C, as it only contains two groups of identical cores running at different speeds). The idea is to build 4 groups of 2 cores (1 big and 1 LITTLE) so the scheduler can pick the right CPU depending on the task's needs. But my aim is to get around that and use ALL the cores, like a classic multicore platform.
OpenMP can see 8 cores on my platform. But it seems the phone does not let threads migrate from one group of cores to another. My guess is that, since the scheduler sees the cores as pairs, only one core of each pair can work at a time, even though OpenMP sees 8 cores. I think the Global Task Scheduler is disabled and, if I understood it correctly, enabling it could overcome this. But I'm not an expert in Android development, so I might be wrong here.
Does anyone have any idea how I could enable the Global Task Scheduler? How can I even tell whether it is available on my platform and whether it is really disabled? Or how could I set up my OpenMP pragmas to use all 8 cores at once?
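One way to check how many cores your threads actually land on is a small OpenMP probe. Each thread does some busy work and then records which CPU it is running on. This is a sketch that assumes GCC or Clang with -fopenmp. Without that flag the pragmas are ignored and everything runs on one thread. count_distinct_cpus is a hypothetical helper name, and sched_getcpu() only reports the CPU at the moment of the call.

```c
#define _GNU_SOURCE            /* for sched_getcpu() */
#include <assert.h>
#include <sched.h>

#define MAX_CPUS 1024

/* Run `nthreads` OpenMP threads, spread across the available places, each
 * spinning for a while, and return how many distinct CPUs were observed. */
int count_distinct_cpus(int nthreads)
{
    assert(nthreads > 0);
    int seen[MAX_CPUS] = {0};  /* shared: seen[c] == 1 if some thread ran on CPU c */

    #pragma omp parallel num_threads(nthreads) proc_bind(spread)
    {
        volatile unsigned long sink = 0;            /* keep the loop from being optimized away */
        for (unsigned long i = 0; i < 50000000UL; ++i)
            sink += i;
        int cpu = sched_getcpu();                   /* CPU this thread is on right now */
        if (cpu >= 0 && cpu < MAX_CPUS) {
            #pragma omp atomic write
            seen[cpu] = 1;
        }
    }

    int distinct = 0;
    for (int c = 0; c < MAX_CPUS; ++c)
        distinct += seen[c];
    return distinct;
}
```

On the phones in question, a result of 4 from count_distinct_cpus(8) would match the 50% observation. Reading /sys/devices/system/cpu/online shows whether the remaining cores are online at all, because Android kernels can hot-unplug cores.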
I tried enabling "Performance mode" directly in the phones' settings, but it didn't work. That would have been too simple.
[EDIT]
I recently came across some interesting information. I will share it here, since someone might be interested in it and might help me with building the kernel.
I found this Git repository in the OpenKirin project, which I think helps with building a custom Android kernel. At least I can see arm64 among the architectures that can be configured.
I also saw a guard called CONFIG_SCHED_HMP, which seems to enable Heterogeneous Multi-Processing scheduling. I think that is the subject of my question.
Whether CONFIG_SCHED_HMP is set should depend on the kernel configuration. I think the Kconfig file under arch/arm64/ is the one I should modify, and it seems the SCHED_HMP configuration isn't enabled.
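To check whether a running kernel was actually built with the option, a small C sketch can scan the kernel configuration text for a CONFIG_SCHED_HMP=y line. It assumes you already have the config as plain text, for example by decompressing /proc/config.gz (present only when the kernel was built with CONFIG_IKCONFIG_PROC) or by running extract-ikconfig on the kernel image. config_enables is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `config_text` (contents of a kernel .config) contains the line
 * "<option>=y", else 0. "# <option> is not set" and options that merely share
 * the prefix (e.g. <option>_SOMETHING=y) do not count. */
int config_enables(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);
    size_t n = strlen(option);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, option, n) == 0 && line[n] == '=' && line[n + 1] == 'y'
            && (line[n + 2] == '\n' || line[n + 2] == '\0'))
            return 1;
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            ++line;
    }
    return 0;
}
```

For example, config_enables(text, "CONFIG_SCHED_HMP") returns 1 only if the option is set to y. A kernel whose config lacks the option was not built with HMP scheduling, whatever its Kconfig offers.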
The oft-quoted advice is to add the following lines to your local or global gradle.properties:
org.gradle.daemon=true
org.gradle.parallel=true
Although that speeds up the process considerably, it still takes a good 10 to 20 seconds to see the results of a code change. This is far from the instant feedback that you get in web development.
It seems that installing the app is the biggest bottleneck, but are there any other ways to speed up the edit-build-upload-install-run cycle? Or is this as good as it gets?
If you are testing on an ARM-based emulator, turn on virtualization support (e.g. Intel VT-x) on your machine, use an Intel-based system image, and use the Intel-based emulator.
Or better yet, use an actual hardware device to test on.
I just felt this needed to be said (in case you weren't already testing on an actual device).
Measure where the bottleneck is before you try to optimize the process.
For Gradle itself, install some of your dependencies locally so that they don't need to be fetched from the web.
Make your APK smaller during the development phase than the final build needs to be.
Tell the Eclipse/Android Studio launcher to launch the Activity you're currently testing (to minimize the number of activities you have to tap through to reach it).
Upgrade your internet speed
On Linux on Intel processors, we have access to a large number of hardware performance counters. For example, I previously used a user-space tool called perfmon2 to get cache miss rates, CPU stall cycles with specific causes (e.g. L1 cache misses), and so on.
My question is: do we have the same in Android? Since it's based on ARM, I don't think performance monitoring counter support is as strong as on x86, right?
ARM11 and Cortex-A/R do have hardware performance counters. You can check that on the official ARM website on this page: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4237.html
Since Android also runs on top of Linux, you can access performance counters from user-space software via the "Perf Events" subsystem. However, you will need to write native C code that includes "perf_event.h" and build it with the Android NDK. Before you start writing code, I suggest you take a look at this project: platform/external/linux-tools-perf (https://android.googlesource.com/platform/external/linux-tools-perf/), which may be exactly what you are looking for. I don't know of any alternatives that allow doing this directly from Java code.
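Here is a minimal sketch of the perf_event_open(2) route mentioned above. It counts the user-space instructions retired by one function on the calling thread, following the pattern in the perf_event_open man page. count_user_instructions and demo_workload are hypothetical names. The sketch assumes the kernel was built with perf events and that the current perf_event_paranoid setting (or root) permits access. The C library usually provides no wrapper for this system call, so the sketch calls it through syscall(2). It should also build with the NDK's clang, but I have not verified it on a device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper: invoke the perf_event_open system call directly. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                                int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space instructions retired while running work().
 * Returns -1 if the counter cannot be opened (no PMU support, access denied
 * by perf_event_paranoid, running in a VM or emulator, ...). */
long long count_user_instructions(void (*work)(void))
{
    assert(work != NULL);
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start stopped; enabled right before work() */
    attr.exclude_kernel = 1;  /* count user space only */
    attr.exclude_hv = 1;

    /* pid 0 = calling thread, cpu -1 = whichever CPU it runs on */
    int fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = 0;
    ssize_t n = read(fd, &count, sizeof count);  /* default read format: one u64 */
    close(fd);
    return n == (ssize_t)sizeof count ? count : -1;
}

/* Example workload to measure. */
void demo_workload(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; ++i)
        sink += i;
}
```

Calling count_user_instructions(demo_workload) prints nothing. It returns a count in the millions when counters are available, or -1 when they are not. Other events such as PERF_COUNT_HW_CPU_CYCLES or PERF_COUNT_HW_CACHE_MISSES can be selected through attr.config.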
In addition to Benny's answer, I'll add the following:
The linux-tools-perf project is integrated with the AOSP (I'm using 4.2). It is under $AOSP/external/linux-tools-perf. The ./linux-tools-perf/Documentation contains a lot of info on how to run everything. The perf tool is built in $PRODUCT_OUT/system/bin/perf as part of the AOSP but not packaged; you need to explicitly push it to the target.
Check that the Android (Linux) kernel was built with performance monitoring support (PERF_EVENTS). There are several ways to get the kernel's .config, but the easiest is running "extract-ikconfig zImage".
One needs to have root access on the target and then run something like "./perf stat -- testprog1" to see results. There are a lot of arguments to the command to record/retrieve different PM counters.
Remember that most ARM implementations have multiple cores and A LOT of pipelining, and the Cortex-A9 has out-of-order execution. So the numbers can be a little freaky, almost meaningless sometimes.
I use the ARM PMCCNTR register, the ARM PM Cycle Counter, occasionally to test code performance. PMUSERENR.EN and PMINTENCLR.C must be set at PL1+ level and then PMCCNTR can be managed at PL0 level. See the ARM ARM for what all this means and the perf subsystem for example usage!
Note: All the shell vars are set up in $AOSP/build/envsetup.sh
I think you can try to use ARM® HWCPipe Exporter.
ARM® HWCPipe Exporter is a Prometheus exporter written in Java and C++ that retrieves metrics from Android devices running on ARM® Hardware components and exports them to the Prometheus monitoring system.
Disclaimer: I am the author of ARM HWCPipe Exporter.
The same code executes 2 times faster on Android (1 GHz Snapdragon) than on my PC (as a desktop application) with a 3.3 GHz Core 2 Duo (the class from the PC was copied into the Android project). Tested on Win7 and Debian. Time was measured with System.currentTimeMillis() around only the one (main) calculation method. Why does this happen, and what can I do to fix it?
UPD1. The first application runs on a real Android device, the second in the JRE.
UPD2. The part of the applications I am comparing uses only simple math and BigDecimal operations (multiply, sqrt, divide and so on). The idea is to calculate pi with the Gauss-Legendre algorithm.
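For reference, here is the Gauss-Legendre recurrence that the benchmark computes, as a double-precision sketch in C. The question's version uses BigDecimal in Java, so this only illustrates the iteration, not the benchmark itself. newton_sqrt is a hypothetical helper included so the snippet needs no libm.

```c
#include <assert.h>

/* Square root by Newton's method for x > 0; starts at or above sqrt(x)
 * and converges from above. */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 64; ++i)
        r = 0.5 * (r + x / r);
    return r;
}

/* Gauss-Legendre iteration:
 * a' = (a + b) / 2, b' = sqrt(a * b), t' = t - p (a - a')^2, p' = 2p,
 * pi ~= (a + b)^2 / (4 t). Each step roughly doubles the correct digits. */
double gauss_legendre_pi(int iterations)
{
    assert(iterations >= 0);
    double a = 1.0, b = 1.0 / newton_sqrt(2.0), t = 0.25, p = 1.0;
    for (int i = 0; i < iterations; ++i) {
        double a_next = (a + b) / 2.0;
        b = newton_sqrt(a * b);               /* uses the old a and b */
        t -= p * (a - a_next) * (a - a_next);
        p *= 2.0;
        a = a_next;
    }
    return (a + b) * (a + b) / (4.0 * t);
}
```

One iteration gives about 3.1406. Three iterations already exhaust double precision, which is why a BigDecimal version spends nearly all of its time in the arbitrary-precision multiply, divide and sqrt operations.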
You're going to need to be more specific about what you're doing to monitor this. There are a large number of factors at play that could influence this. If you're running on the emulator, forget it: it's incredibly slow, and there's really no comparison there. However, I get the feeling you're talking about one application running in the JVM as a standard Java application and another running on Dalvik, and there you really can't compare them either. Different frameworks have different libraries and different calls that are implemented in different ways. Not to mention that Dalvik is optimized differently than the standard JVM, and so on.
You'll need to give us more information in order for us to attempt to give you an explanation, but I suspect you're trying to compare two things that really can't be compared.
I think it is because the Android device has a different processor architecture than your PC, so the CPU on your PC needs to emulate Android, which means it does much more processing.