On Linux with an Intel processor, we have a large number of hardware performance counters available. Previously, using a user-space tool called perfmon2, I could read values such as the cache miss rate, CPU stall cycles attributed to a particular cause (e.g., L1 cache misses), and so on.
My question is: do we have the same thing on Android? Since it's based on ARM, I don't think performance monitor counter support is as strong as on x86, right?
ARM11 and Cortex-A/R do have hardware performance counters. You can check that on the official ARM website on this page: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4237.html
Since Android is also running on top of Linux, you can access performance counters through user-space software via the "Perf Events" subsystem. However, you will need to write native C code, which includes "perf_event.h" and build it using the Android NDK. Before you start writing code, I suggest you take a look at this project: platform/external/linux-tools-perf (https://android.googlesource.com/platform/external/linux-tools-perf/), which may be exactly what you are looking for. I don't know about any alternatives which allow doing this directly from Java Code.
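To give an idea of what the native code looks like, here is a minimal sketch that counts one hardware event around a piece of work using perf_event_open(2). The function names (init_hw_counter_attr, open_hw_counter, count_events) are mine, not part of any library, and I'm assuming the kernel was built with perf events support and lets your process open the counter.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in the attributes for one generic hardware counter.
 * The counter starts disabled and only counts user-space activity. */
void init_hw_counter_attr(struct perf_event_attr *attr, uint64_t config)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = config;        /* e.g. PERF_COUNT_HW_CACHE_MISSES */
    attr->disabled = 1;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/* Open the counter for the calling thread on any CPU.
 * The call is made through syscall(2), as the man page describes.
 * Returns a file descriptor, or -1 on failure. */
int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    init_hw_counter_attr(&attr, config);
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Run work() with the counter enabled and store the count in *out.
 * Returns 0 on success, -1 if the counter could not be opened or read. */
int count_events(uint64_t config, void (*work)(void), uint64_t *out)
{
    int fd = open_hw_counter(config);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, out, sizeof(*out));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

You would call it as count_events(PERF_COUNT_HW_CACHE_MISSES, my_workload, &misses), where my_workload is whatever function you want to measure. Expect -1 on devices where the kernel lacks perf support or restricts access for unprivileged processes.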
In addition to Benny's answer, I'll add the following:
The linux-tools-perf project is integrated with the AOSP (I'm using 4.2). It is under $AOSP/external/linux-tools-perf. The ./linux-tools-perf/Documentation contains a lot of info on how to run everything. The perf tool is built in $PRODUCT_OUT/system/bin/perf as part of the AOSP but not packaged; you need to explicitly push it to the target.
Check that the Android (Linux) kernel was built with performance monitoring support (PERF_EVENTS). You can search for how to get the kernel's .config, but the easiest way is to run "extract-ikconfig zImage".
One needs to have root access on the target and then run something like "./perf stat -- testprog1" to see results. There are a lot of arguments to the command to record/retrieve different PM counters.
Remember most ARM implementations have multiple cores and A LOT of pipelining. Cortex-A9 has out-of-order execution. So the numbers can be a little freaky - almost meaningless sometimes.
I use the ARM PMCCNTR register, the ARM PM Cycle Counter, occasionally to test code performance. PMUSERENR.EN and PMINTENCLR.C must be set at PL1+ level and then PMCCNTR can be managed at PL0 level. See the ARM ARM for what all this means and the perf subsystem for example usage!
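For what it's worth, reading the cycle counter from user code looks roughly like the sketch below. It assumes a 32-bit ARMv7 build with GCC/Clang inline assembly, and that privileged code has already enabled the counter and user access as described above; without that, the instruction traps. The function names are mine, and on other architectures the reader function is just a stub that returns 0.

```c
#include <stdint.h>

/* Read PMCCNTR (CP15 c9, c13, 0) on ARMv7.
 * Assumption: user-mode access was enabled beforehand at PL1 or higher.
 * On non-ARM builds this is a placeholder that returns 0. */
uint32_t read_cycle_counter(void)
{
#if defined(__arm__)
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
#else
    return 0;
#endif
}

/* Cycles between two readings. The counter is 32 bits wide here, so
 * unsigned subtraction gives the right answer across a single wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Typical use is to take one reading before and one after the code under test and pass both to cycles_elapsed(). Keep the measured region short, since the thread can be moved to another core, which has its own counter.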
Note: All the shell vars are set up in $AOSP/build/envsetup.sh
I think you can try to use ARM® HWCPipe Exporter.
ARM® HWCPipe Exporter is a Prometheus exporter written in Java and C++ that retrieves metrics from Android devices running on ARM® Hardware components and exports them to the Prometheus monitoring system.
Disclaimer: I am the author of ARM HWCPipe Exporter.
So I'm building TWRP. I see that some use the command make when building the recovery image, e.g. make -j5 recoveryimage, but others use mka recoveryimage.
What is the difference between the two commands make and mka? I couldn't find a comprehensive answer on my own.
Thanks!
mka is intended to be a faster make. Here's a nice description (source):
This little gem of a command is basically equivalent to a super-charged version of make. Make is the program that gets called to build our source, choosing the correct compiler for each part of the Android OS that we are making. Problem is, make is SLOW in its default configuration. It can take hours longer depending on your hardware. So what did they do? They mated make with a cheetah, and took their child mka. mka improves upon make by using the program sched_tool to make full use of all the threads available on your machine (For AMD, this is equivalent to the number of cores your processor has; For Intel, this is usually equivalent to twice the number of cores your processor has, due to HyperThreading). What this means is that ALL of your processor is working, not just one small part of it.
I hope this question is relevant on SO. I posted it on Android forums like XDA Developers but didn't get any answers, and I believe my question is specific enough that it is hard to reach people who might know something about it.
To introduce the context of my question, I think I need to describe my job a bit. I am currently working at a company that needs to turn a phone into an IoT device that runs our C/C++ library (like JanOS). The idea is to modify a phone by removing the screen and keeping only its board with all the features we need (camera, Wi-Fi, SD card, battery, USB connector, ...).
I am myself a C/C++ developer and I optimize algorithms. My work is basically to make everything faster using libraries like OpenMP and NEON if I work on an Android platform. I am currently working on two phones that are rooted:
Huawei Honor 5C (4x2.0 GHz Cortex-A53 & 4x1.7 GHz Cortex-A53)
Huawei Honor 8 (4x2.3 GHz Cortex-A72 & 4x1.8 GHz Cortex-A53)
I want to benchmark the library we are developing on these phones. The problem is that I can never reach 100% CPU usage with C code or a bash script. CPU usage never goes above 50%, which makes me think only 4 cores are being used. So I tried to understand the platform better. I think each phone is based on a big.LITTLE ARM platform to save battery (I have doubts about the Honor 5C, as it contains two sets of identical cores that differ only in clock speed). As I understand it, the idea is to build 4 groups of 2 cores (1 big and 1 LITTLE) so the scheduler can pick the right CPU depending on the task's needs. But my aim is to get around that and use ALL the cores, as on a classic multicore platform.
OpenMP can see 8 cores on my platform. But it seems the phone does not let a thread migrate from one group of cores to another. My guess is that, because the scheduler sees the cores as pairs, only one core of each pair can work at a time even though OpenMP sees 8 cores. I think the Global Task Scheduler is disabled and that enabling it could solve this, if I understood it correctly. But I'm not an expert in Android development, so I might be wrong here.
Does anyone have any idea how I could enable the Global Task Scheduler? How can I even tell whether it is available on my platform and whether it is really disabled? Or how could I set up my OpenMP pragmas to use 8 cores at once?
I tried activating "Performance mode" directly in the phone's settings, but it didn't work. That would have been too simple.
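One thing that can be checked from native code is how many CPUs the kernel reports and how many the process is actually allowed to run on. The sketch below is only a diagnostic under my own assumptions (the function names are made up, and it relies on the GNU/bionic affinity macros); it does not change any scheduler setting.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Number of CPUs the calling thread may be scheduled on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Number of CPUs the system is configured with. */
long configured_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Number of CPUs that are online right now (cores can be hot-unplugged). */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

If the configured count is 8 but the online or allowed count is 4, the limit comes from CPU hotplug or the affinity mask rather than from the OpenMP pragmas.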
[EDIT]
I came across some interesting information lately. I'll share it here, as someone might be interested and might be able to help me with building the kernel.
I found this git repository in the OpenKirin project, which I think helps with building a custom Android kernel. At least I can see arm64 among the architectures that can be configured.
I also saw a configuration guard called CONFIG_SCHED_HMP, which seems to enable Heterogeneous Multi-Processing scheduling, which I think is the subject of my question.
Whether CONFIG_SCHED_HMP is set should depend on the kernel configuration. I think the file called Kconfig under arch/arm64/ is the one I should modify, and it seems the SCHED_HMP configuration isn't activated.
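To check whether an option such as CONFIG_SCHED_HMP is switched on in a given kernel .config, a small helper like the following could be used. It is a sketch with a made-up function name, and it assumes the usual .config text format, where an enabled option appears as "CONFIG_X=y" at the start of a line and a disabled one as "# CONFIG_X is not set".

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `option` appears as "<option>=y" at the start of a line
 * in `config_text` (the contents of a kernel .config file). */
bool kconfig_is_enabled(const char *config_text, const char *option)
{
    size_t len = strlen(option);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, option, len) == 0 &&
            line[len] == '=' && line[len + 1] == 'y' &&
            (line[len + 2] == '\n' || line[len + 2] == '\r' ||
             line[len + 2] == '\0'))
            return true;

        line = strchr(line, '\n');
        if (line != NULL)
            line++;   /* step past the newline to the next line */
    }
    return false;
}
```

Options built as modules ("=m") are deliberately not counted as enabled here.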
I am using Marmalade and C/C++ to write a game for Android.
Now I want to write some important parts in assembler to improve performance.
But I am wondering whether this app could then run on most Android devices (about 90%).
In general, assembler code depends on the processor, and different Android phones may have different processors, for example Intel or ARM. So would I have to write these parts in a different assembly language for each processor?
Yes, of course you will have to write the assembly code for every processor ABI.
The Android NDK has specific support for different ABIs.
Keep in mind that, while there are currently only three processor families supported (Intel x86/x86-64, ARM and MIPS), you have to target the individual ABIs, not the processor families themselves.
You can safely drop MIPS devices; they are very rare.
Intel devices are mostly tablets, but there are some phones too.
The vast majority of devices out there are ARM.
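As a rough illustration of what "one version per ABI" means in practice, here is a sketch of a single function with per-architecture inline assembly and a plain C fallback. It assumes GCC/Clang extended asm syntax and the usual predefined architecture macros; add_ints is a made-up name, and a real hot loop would of course do more than add two numbers.

```c
/* Add two ints, using inline assembly where a variant has been written
 * and plain C everywhere else. */
int add_ints(int a, int b)
{
    int result;
#if defined(__x86_64__) || defined(__i386__)
    /* result starts out holding a, then b is added to it */
    __asm__("addl %2, %0" : "=r"(result) : "0"(a), "r"(b) : "cc");
#elif defined(__aarch64__)
    /* %w selects the 32-bit view of the registers */
    __asm__("add %w0, %w1, %w2" : "=r"(result) : "r"(a), "r"(b));
#elif defined(__arm__)
    __asm__("add %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
#else
    /* No hand-written variant for this architecture (e.g. MIPS) */
    result = a + b;
#endif
    return result;
}
```

Keeping the C fallback means the code still builds for an ABI you have not written assembly for, which is also a convenient reference when testing the assembly variants.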
If you look at an official optimization guide for ARMv8 or at a very useful optimization guide for Intel, you will see that writing good assembly code takes time; it's not just about making something work (which you should already be able to do easily).
Hint
First write the critical parts in C++, then look at the disassembly and see if you can do better or if you can recognize some sub-optimal patterns.
Only then rewrite the code in assembly.
Also before doing such micro-optimizations, try to use better data structures, better algorithms and better resource handling.
Is there a way to run machine code instead of the Android OS on Android devices?
I want to remove the Android OS and work with the CPU and other devices directly.
Which compiler can I use?
MASM is an x86 assembler, so it would not be suitable for most Android devices as the vast majority use ARM-based processors.
That said, Android phones are computers just like any other and can be programmed in assembly. The first thing you'll need to do is select a device running a well documented CPU and chipset.
Since you'll be removing Android and plan on programming in assembly, you'll need to write your own routines for nearly everything. You'll need an understanding of the CPU, power management and some form of I/O (you can avoid having to write complex display code if you plan to interface with the phone through serial communication, for example).
Unfortunately, much of the information required for successfully writing your own OS for an Android device is unavailable, so you'll need some hardware analysis tools to assist in reverse engineering some of this information. A logic analyzer may be useful for sniffing some of the protocols used between chips, although much of a modern phone's functionality is integrated into a single SoC, so you'll need to experiment heavily and compile information from a wide variety of online sources.
Aside from that, it's smooth sailing. Programming an OS in assembly for Android is pretty much the same as programming an OS in assembly for any other computer and you'll find it to be rather familiar territory.
Just seen that they've ported Ice Cream Sandwich to the Nexus One.
'They've' done this using the SDK to create a ROM.
Could someone give an overview of how this works (how you use the SDK to create a ROM)?
Why does it allow some parts to work while other bits (like wifi) don't?
The SDK includes a system.img which contains the bulk of the phone's firmware. Beyond that, the important parts of firmware are the boot loader (which is hardware specific, and not Android-release specific), and the kernel (which is configured for the hardware, though there could be some Android version specifics in it).
Most likely, they just took pieces from the system.img and were able to get an acceptable boot. This generally results in some parts (like wifi, as you mentioned) not working, due to differences in the requirements between the older kernel and the newer system image.
A probable workflow is:
Get the SDK and install it and make sure that it is able to compile programs
Porting means "rewriting system-specific modules to get the whole system working on the new platform". In a well-designed system, there is a clear separation between what is a system-dependent module (e.g., writing bytes to mass storage) and what is a higher-level module (e.g., writing to a file). The level of separation and abstraction the underlying system must provide is decided by the needs of the operating system (in this case, Android).
The ROM itself is probably a binary image which gets loaded in RAM by the bootloader (and this is why the bootloader is hardware specific). The bootloader then transfers the control to the RAM image which is compiled and built by the SDK in such a way that it contains binary code that the particular processor of Nexus One understands.