I'm running a multi-threaded C++ application (freesoft.org's tablebase generator 'hoffman') on a dual core ARM7 running Android 4.1.2 with a Linux 3.0.8 kernel. Compilation was with gcc 4.6.3 on Ubuntu 12.04 using linuxonandroid.
I've been experimenting with using pthread_setaffinity_np() to force the different threads onto different cores. I don't want to do this, but it seems like I must.
Over the course of an hour, trying different combinations of settings, I measured the following run times for a dual-threaded benchmark:
with pthread_setaffinity_np(): 10.818, 10.803, 10.814, 11.077, 11.013, 10.952 seconds
without pthread_setaffinity_np(): 20.366, 19.263, 19.539, 16.764, 19.365, 19.661, 19.330 seconds
It seems that the non-standard GNU pthread_setaffinity_np() is absolutely required to get my C++11 program to actually put its threads onto different cores. 'top' confirms that one of my two cores is sitting idle otherwise.
Is this right? Can anybody offer a better solution?
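A minimal sketch of the kind of pinning described above, assuming glibc (as in the Ubuntu chroot) where pthread_setaffinity_np() is available. The helper names allowed_cpus, pin_current_thread_to_cpu and run_pinned are hypothetical and not taken from hoffman. Each worker pins itself before doing any work, which avoids racing against a thread that has already finished.

```cpp
#include <pthread.h>
#include <sched.h>

#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Returns the CPU ids this process is currently allowed to run on.
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    }
    return cpus;
}

// Pins the calling thread to a single CPU. Returns true on success.
bool pin_current_thread_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Starts up to max_threads workers, one per allowed CPU. Each worker pins
// itself and then calls work(index). Returns how many workers were pinned.
int run_pinned(int max_threads, const std::function<void(int)>& work) {
    const std::vector<int> cpus = allowed_cpus();
    const int n = std::max(0, std::min(max_threads, static_cast<int>(cpus.size())));
    std::vector<int> pinned(n, 0);  // one slot per worker, so no sharing
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            pinned[i] = pin_current_thread_to_cpu(cpus[i]) ? 1 : 0;
            work(i);
        });
    }
    for (auto& t : threads) t.join();
    return static_cast<int>(std::count(pinned.begin(), pinned.end(), 1));
}
```

Calling run_pinned(2, worker) on the dual-core device would start two workers on two different cores, provided both cores are in the allowed set at that moment.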
I hope this question is on topic for SO. I posted it on Android forums such as XDA Developers but got no answers, and I believe my question is specific enough that it is hard to reach people who might know something about it.
To give some context, I need to describe my job a bit. I currently work at a company that needs to turn a phone into an IoT device that runs our C/C++ library (like JanOS). The idea is to modify a phone by removing the screen and keeping only its board, with all the features we need (camera, wifi, sdcard, battery, usb connector ...).
I am a C/C++ developer and I optimize algorithms. My work is basically to make everything faster, using tools like OpenMP and NEON when I work on an Android platform. I am currently working on two phones that are rooted:
Huawei Honor 5C (4x2.0 GHz Cortex-A53 & 4x1.7 GHz Cortex-A53)
Huawei Honor 8 (4x2.3 GHz Cortex-A72 & 4x1.8 GHz Cortex A53)
I want to benchmark the library we are developing at my company on these phones. The problem is that I can never reach 100% CPU usage, whether from C code or from a bash script. CPU usage never goes above 50%, which makes me think only 4 cores are used. So I tried to understand the platform better. I think each phone is based on a big.LITTLE ARM platform to save battery (I have a doubt about the Honor 5C, as it contains two sets of identical cores that differ only in clock speed). The idea is to build 4 groups of 2 cores (1 big and 1 LITTLE) so the scheduler can pick the right CPU depending on the task's needs. But my aim is to get around that and use ALL the cores like a classic multicore platform.
OpenMP can see 8 cores on my platform. But it seems the phone will not let a thread migrate from one group of cores to another. My guess is that, because the scheduler sees the cores as pairs, only one core of each pair can work at a time, even though OpenMP sees 8 cores. I think the Global Task Scheduler is disabled and that enabling it could solve this, if I understand it correctly. But I'm not an expert in Android development, so I might be wrong here.
Does anyone have any idea how I could enable the Global Task Scheduler? How can I even find out whether it is available on my platform and whether it is really disabled? Or how could I set up my OpenMP pragmas to use 8 cores at once?
I tried activating the "Performance mode" directly in the phone's settings, but it didn't work. That would have been too simple.
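One thing that may be worth checking before touching the scheduler is whether all 8 cores are actually online while the benchmark runs, since power management on some phones can take cores offline. The sketch below parses the CPU list format used by the Linux sysfs file /sys/devices/system/cpu/online (for example "0-3,6"). The function names are hypothetical, and whether that file is readable on a given phone is an assumption.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parses a kernel-style CPU list such as "0-3,6" into {0, 1, 2, 3, 6}.
// Expects well-formed input; std::stoi throws on non-numeric items.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        // Drop trailing whitespace, including the newline sysfs appends.
        item.erase(item.find_last_not_of(" \t\r\n") + 1);
        if (item.empty()) continue;
        const std::string::size_type dash = item.find('-');
        const int first = std::stoi(item.substr(0, dash));
        const int last =
            dash == std::string::npos ? first : std::stoi(item.substr(dash + 1));
        for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Reads the list of online CPUs from sysfs; empty if the file is unavailable.
std::vector<int> online_cpus() {
    std::ifstream in("/sys/devices/system/cpu/online");
    std::string line;
    if (!in || !std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

If online_cpus() reports fewer than 8 entries during the run, the 50% ceiling may come from offline cores rather than from thread placement.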
[EDIT]
I came across some interesting information lately. I will share it here, as someone might be interested in it and might be able to help me with the kernel build.
I found this git repository in the OpenKirin project, which I think helps with building a custom Android kernel. At least I can see arm64 among the architectures that can be configured.
I also saw there is a config guard called CONFIG_SCHED_HMP, which seems to enable Heterogeneous Multi-Processing scheduling, which I think is the subject of my question.
CONFIG_SCHED_HMP should depend on the configuration. I think the file called Kconfig under arch/arm64/ is the one I should modify, and it seems the SCHED_HMP configuration isn't activated.
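To check whether an option such as CONFIG_SCHED_HMP is switched on in a given kernel build, one approach is to scan the kernel's .config text for the "OPTION=y" line. The sketch below assumes you already have that text as a string; kconfig_enabled is a hypothetical helper, not part of any kernel tooling.

```cpp
#include <sstream>
#include <string>

// Returns true if the kernel config text contains a line "<option>=y".
// Lines such as "# CONFIG_FOO is not set" or "CONFIG_FOO=m" do not count.
bool kconfig_enabled(const std::string& config_text, const std::string& option) {
    std::istringstream lines(config_text);
    std::string line;
    const std::string wanted = option + "=y";
    while (std::getline(lines, line)) {
        // Tolerate Windows line endings in copied config files.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line == wanted) return true;
    }
    return false;
}
```

For example, kconfig_enabled(text, "CONFIG_SCHED_HMP") returns false for a config that only contains "# CONFIG_SCHED_HMP is not set".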
Does a single-threaded application use all 4 cores in a quad-core phone?
I searched a lot and found some articles that say yes and some that say no. Some articles even say that the Android OS doesn't use all 4 cores.
Is Android capable of using all 4 cores in a quad-core processor?
Does a single-threaded application use multiple cores?
The answer is YES.
Android is built on the Linux kernel, which does use multiple cores.
As far as a single-threaded application is concerned, remember that a thread cannot be executed in parts on different cores simultaneously. So although your single thread can be executed by different cores at different points in time, it cannot be subdivided and executed by different cores at the same time.
Having said that, please be aware that chipset manufacturers like Qualcomm are developing intelligent processors capable of sub-dividing your single-threaded app code (if and only if there are mutually exclusive parts) into multiple threads and have it run on different cores. Here again, the basic principle remains same - in order to utilize multi-core, the single thread was sub-divided into multiple threads.
To get the most out of your multi-core chip, you should create a multi-threaded app with as many asynchronous threads as is practical, so as to make the best use of the available cores. Hope this clears things up.
EDIT:
This also translates to - An app that does not make use of multiple asynchronous threads (or any other parallelism construct) will NOT use more than one core.
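As a rough illustration of the point above, here is a sketch that splits one computation (summing a vector) across several threads so that the scheduler is free to run them on different cores. The function name parallel_sum and the chunking scheme are illustrative choices, not anything Android-specific.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums data using num_threads workers, each handling one contiguous chunk.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<long long> partial(num_threads, 0);  // one slot per worker
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[t] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A reasonable default for num_threads is std::thread::hardware_concurrency(). With a single thread the same function still works, but only one core can be busy at a time.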
Yes. Android 3.0 is the first version of the platform designed to run on either single or multicore processor architectures.
Even a single-threaded application can benefit from parallel processing on different cores.
For example, if your application uses a media server, then the media processing and your UI rendering application logic can run on different cores at the same time. Also, the garbage collector can run on a different core.
Say you're using graphics. Your app can use multiple cores for rendering. You can check this at the links below.
https://youtu.be/vQZFaec9NpA?t=459 (Graphics and performance)
http://android-developers.blogspot.in/2010/07/multithreading-for-performance.html
Check this PDF and scroll down to slide 22. It might be useful.
http://elinux.org/images/1/11/Application-Parallelization-Android-KlaasVanGend.pdf
On Linux with Intel processors, we have a large number of hardware performance counters available. Previously, using a user-space tool called perfmon2, I could get values such as the cache miss rate, CPU stall cycles due to a particular cause (e.g., an L1 cache miss), and so on.
My question is: do we have these things in Android? Since it's based on ARM, I don't think we have performance monitor counter support as strong as on x86, right?
ARM11 and Cortex-A/R do have hardware performance counters. You can check that on the official ARM website on this page: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4237.html
Since Android is also running on top of Linux, you can access performance counters through user-space software via the "Perf Events" subsystem. However, you will need to write native C code, which includes "perf_event.h" and build it using the Android NDK. Before you start writing code, I suggest you take a look at this project: platform/external/linux-tools-perf (https://android.googlesource.com/platform/external/linux-tools-perf/), which may be exactly what you are looking for. I don't know about any alternatives which allow doing this directly from Java Code.
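A minimal sketch of what such native code could look like, using the perf_event_open system call from <linux/perf_event.h> to count user-space CPU cycles around a piece of work. The function name count_cpu_cycles is hypothetical. It returns -1 when the counter cannot be opened, which can happen if the kernel lacks perf events support or restricts access to them.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Counts user-space CPU cycles spent in work() on the calling thread.
// Returns -1 if the hardware counter is unavailable or cannot be read.
long long count_cpu_cycles(void (*work)()) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        // start stopped, enable explicitly below
    attr.exclude_kernel = 1;  // count user-space cycles only
    attr.exclude_hv = 1;

    // pid = 0 and cpu = -1: measure this thread on whichever CPU it runs.
    const int fd = static_cast<int>(
        syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL));
    if (fd < 0) return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(count))) return -1;
    return static_cast<long long>(count);
}
```

Other events (for example PERF_COUNT_HW_CACHE_MISSES or PERF_COUNT_HW_INSTRUCTIONS) can be selected through attr.config in the same way, subject to what the particular core's PMU exposes.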
In addition to Benny's answer, I'll add the following:
The linux-tools-perf project is integrated with the AOSP (I'm using 4.2). It is under $AOSP/external/linux-tools-perf. The ./linux-tools-perf/Documentation contains a lot of info on how to run everything. The perf tool is built in $PRODUCT_OUT/system/bin/perf as part of the AOSP but not packaged; you need to explicitly push it to the target.
Check the Android (Linux) kernel config to confirm it supports PM: PERF_EVENTS. You can search for ways to get the .config, but the easiest is to run "extract-ikconfig zImage".
One needs to have root access on the target and then run something like "./perf stat -- testprog1" to see results. There are a lot of arguments to the command to record/retrieve different PM counters.
Remember most ARM implementations have multiple cores and A LOT of pipelining. Cortex-A9 has out-of-order execution. So the numbers can be a little freaky - almost meaningless sometimes.
I use the ARM PMCCNTR register, the ARM PM Cycle Counter, occasionally to test code performance. PMUSERENR.EN and PMINTENCLR.C must be set at PL1+ level and then PMCCNTR can be managed at PL0 level. See the ARM ARM for what all this means and the perf subsystem for example usage!
Note: All the shell vars are set up in $AOSP/build/envsetup.sh
I think you can try to use ARM® HWCPipe Exporter.
ARM® HWCPipe Exporter is a Prometheus exporter written in Java and C++ that retrieves metrics from Android devices running on ARM® Hardware components and exports them to the Prometheus monitoring system.
Disclaimer: I am the author of ARM HWCPipe Exporter.
The same code on Android (1 GHz Snapdragon) executes 2 times faster than on my PC (as a desktop application) with a 3.3 GHz Core 2 Duo (the class from the PC was copied to the Android project). Tested with Win7 and Debian. Time was measured with System.currentTimeMillis() for only one (main) calculation method. Why does this happen and what can I do to fix it?
UPD1. The first application runs on a real Android device, the second in a JRE.
UPD2. The part of the applications that I am trying to compare uses only simple math and operations with BigDecimal (multiply, sqrt, divide and so on). The idea is to calculate pi with the Gauss-Legendre algorithm.
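For readers unfamiliar with the algorithm mentioned in UPD2, here is a sketch of the Gauss-Legendre iteration. It is written in C++ with plain double precision rather than BigDecimal, so it only illustrates the recurrence and is not the code being benchmarked. The function name gauss_legendre_pi is hypothetical.

```cpp
#include <cmath>

// Approximates pi with the Gauss-Legendre iteration in double precision.
// The number of correct digits roughly doubles with each iteration.
double gauss_legendre_pi(int iterations) {
    double a = 1.0;
    double b = 1.0 / std::sqrt(2.0);
    double t = 0.25;
    double p = 1.0;
    for (int i = 0; i < iterations; ++i) {
        const double a_next = (a + b) / 2.0;  // arithmetic mean
        b = std::sqrt(a * b);                 // geometric mean
        const double d = a - a_next;
        t -= p * d * d;
        p *= 2.0;
        a = a_next;
    }
    return (a + b) * (a + b) / (4.0 * t);
}
```

With doubles, three iterations already reach the limit of the type's precision. An arbitrary-precision version does far more work per iteration, so its timing depends heavily on how the big-number arithmetic is implemented on each platform.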
You're going to need to be more specific about what you're doing to monitor this. There are a large number of factors at play that could influence this. If you're running on the emulator, forget it -- it's incredibly slow, there's really no comparison there. However, I get the feeling you're talking about one application running in the JVM as a standard Java application and another application running on Dalvik, but there, you really can't compare either. Different frameworks have different libraries and different calls that are implemented in different ways. Not to mention Dalvik is optimized differently than the standard JVM and so on.
You'll need to give us more information in order for us to attempt to give you an explanation, but I suspect you're trying to compare two things that really can't be compared.
I think it's because the Android device has a different processor architecture than your PC, so the CPU on your PC needs to emulate the Android device and therefore does much more processing.
Being a Java/Linux advocate, and having programmed my first Android app a while ago, I'm now building a similar one, this time in WPF (it's likely to be run on WP7). I have to admit that so far Eclipse/ADT seems to be no match for Visual Studio 9 in terms of development speed. The latter excels in the build/run cycle (unsurprisingly, Eclipse being a Java-based app). My development hardware is an i3 laptop with 4 GB RAM and Win7 x64, and my questions are:
Could I get a development speed in the Android environment similar to what I have today in VS9 if I had a state-of-the-art processor (i7?)
Would I get a performance boost if I worked in a Linux partition on my laptop?
Any additional hints are welcome.
Thanks
The primary reason ADT feels bulky and slow is because it is constantly rebuilding in the background. This proves useful when showing compile errors and warnings as you type, but when working on larger projects—especially when making changes to the manifest, XML files (including layouts, drawables and strings) and resources—the IDE will regularly lock up with a build progress bar, as some changes require a rebuild to complete.
You can disable automatic building via the Project menu. I use this regularly when making changes to layouts, but enable it again when writing code.
As for hardware: it appears that aapt runs only on one core, and from my experience, it runs noticeably faster on faster cores vs. more cores. (This is anecdotal, since the comparison here is a PC with two 2.11 GHz cores vs. a laptop with two 1.8 GHz cores.)
A colleague uses Linux for development; I'll inquire if he has noticed any speed increase since switching from Windows.