What factors do I need to look at when benchmarking the performance of an Android device with swap enabled? And are there any applications you would recommend for this?
Enabling swap requires the phone to be rooted and its kernel to support swap. "a-swapper" is one of the applications I use for enabling swap; basically, it runs the commands that enable swap. The swap file or swap partition is located on the external SD card.
Link to "a-swapper" on Google Code:
http://code.google.com/p/a-swapper/
Following is a report of my paging tests on a Raspberry Pi (ARM CPU, 512 MB RAM, SD drive). A test program writes and reads increasing volumes of data, checking for correct results and measuring speed in MB/second. Data sizes reported are 350, 400, 420 and 600 MB. At 420 MB, speed fell to about one tenth of the maximum, and at 600 MB it was roughly three times slower again. Links are included to obtain the benchmarks and C source code (FREE for anyone to play with and no Ads on any pages). As with my other benchmarks, this can be converted for Android.
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Stress%20Tests.htm#anchor18
The report also provides vmstat monitoring of memory used, swapped, cache size, drive I/O and CPU utilisation. At least on my Android tablet, I can run vmstat via a Terminal Emulator at the same time as executing benchmarks.
For Windows and Linux, I have an image processing benchmark that increasingly enlarges images, with writing and reading to a drive, rotating and scrolling (You can find details by Googling for bmpspeed results.htm and Linux SDL Image Processing Benchmarks). If there is a suitable photo editor for Android, you can do the same with that using manual timing, and possibly monitor with vmstat.
Paging Test Results
StressInt uses normal memory writing and reading functions. Part 1 writes then reads the specified space with six passes using different data patterns. Reading is at high speed using AND and OR to produce a sumcheck. Part 2 writes the patterns (not timed) and reads them for at least a minimum time, in this case there is only one read pass for each pattern. The four paging tests specified 350, 400, 420 and 600 MB on a Raspberry Pi that has 512 MB RAM, with the main drive being an SD card. Vmstat was run at the same time.
At 350 MB, there is no swapping, but cache and buffer sizes are reduced, slowing down the first write pass. At 400 MB, there is swapping in and out at the start, then full speed once that has sorted itself out. At 420 MB, there is chaos: continuous data transfer to and from the drive, with the CPU waiting for I/O.
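To make the write/read pass idea concrete, here is a minimal Java sketch of one pass per data pattern with an AND/OR sumcheck and an MB/second figure. It is not the StressInt C source: the class name, the six pattern values and the default size are my own assumptions, and a JVM needs a large enough heap (for example via -Xmx) before it will touch this much memory.

```java
import java.util.Arrays;

/** Rough Java analogue of a write/read pass; NOT the author's StressInt C code. */
final class PagingPassSketch {

    /** Fills the array with one data pattern (the "write" half of a pass). */
    static void writePattern(int[] data, int pattern) {
        Arrays.fill(data, pattern);
    }

    /** Reads every word back, folding with AND and OR so that any changed bit shows up. */
    static boolean readAndCheck(int[] data, int pattern) {
        int andAcc = 0xFFFFFFFF;
        int orAcc = 0;
        for (int word : data) {
            andAcc &= word;
            orAcc |= word;
        }
        // Both folds equal the pattern only if every word equals the pattern.
        return andAcc == pattern && orAcc == pattern;
    }

    /** Converts bytes and elapsed nanoseconds to MB/second (1 MB = 1,000,000 bytes here). */
    static double mbPerSecond(long bytes, long nanos) {
        return (bytes / 1e6) / (nanos / 1e9);
    }

    public static void main(String[] args) {
        // Data size in MB; 64 is an arbitrary default chosen for this sketch.
        int megabytes = args.length > 0 ? Integer.parseInt(args[0]) : 64;
        int[] data = new int[megabytes * 250_000]; // 4 bytes per int
        // Hypothetical patterns; the real benchmark's patterns are not given in the post.
        int[] patterns = {0x00000000, 0xFFFFFFFF, 0xA5A5A5A5, 0x55555555, 0x33333333, 0xF0F0F0F0};
        for (int pattern : patterns) {
            long start = System.nanoTime();
            writePattern(data, pattern);
            boolean ok = readAndCheck(data, pattern);
            long elapsed = System.nanoTime() - start;
            long bytesMoved = 2L * data.length * 4; // written once, read once
            System.out.printf("pattern %08X  %s  %.0f MB/s%n",
                    pattern, ok ? "OK" : "ERROR", mbPerSecond(bytesMoved, elapsed));
        }
    }
}
```

Running it with increasing sizes while vmstat runs in another terminal should show the same kind of cliff once the data no longer fits in RAM, although a garbage-collected runtime will not behave exactly like the C program.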
1. Commands Example
lxterminal -e ./stressInt KB 600000
vmstat 10 > vmburn4.txt
2. Results
MBytes Per Second At MB Data Size
MB                   350     400     420     600
Write/Read No.
1                    139      24      15      14
2                    209     181      16       8
3                    206     203      24       8
4                    206     204      26       8
5                    202     205      18       8
6                    206     205      20       8
Write/Rd secs       19.6    48.4   204.9   460.7
Read No.
1                    158     159      20       9
2                    158     159      14       9
3                    159     159      39       8
4                    160     155       9       9
5                    159     160      25       9
6                    160     159      10       9
Total secs            85     125    1082    3085
vmstat columns: si, so = KB swapped in and out; bi, bo = KB I/O in and out; wa = waiting for I/O
350 MB vmstat 10 second samples
KBytes KB KB/sec Per sec %
procs ----------memory---------- ---swap-- -----io---- -system-- ----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 314260 12340 56724 0 0 70 3 1123 232 19 5 76 0
1 1 4 8920 48 21844 0 0 37 10 1141 298 42 16 42 0
1 0 8 12392 64 18404 0 0 2 9 1161 89 99 1 0 0
1 0 8 12144 80 18704 0 0 30 6 1167 82 99 1 0 0
1 0 8 11896 88 18868 0 0 16 2 1157 71 99 1 0 0
1 0 8 11764 96 18972 0 0 10 7 1163 71 99 1 0 0
1 0 8 11772 104 18972 0 0 0 3 1152 61 100 0 0 0
1 0 8 11772 112 18972 0 0 0 3 1153 65 100 0 0 0
1 0 8 11772 120 18972 0 0 0 4 1154 68 100 0 0 0
1 0 8 11772 128 18972 0 0 0 3 1153 64 100 0 0 0
0 0 8 362344 136 21384 0 0 239 5 1194 294 22 4 73 1
400 MB
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 8 355220 924 26480 0 0 63 3 1125 236 24 4 72 0
1 5 92368 8968 60 5464 10 9236 338 9245 1739 587 31 20 28 21
0 2 52492 9108 44 5092 4775 3802 6938 3807 3429 1169 10 22 0 68
1 2 71168 11236 44 4920 4654 8936 4929 8936 2428 1036 6 18 0 77
1 1 42216 9224 44 4788 4477 5600 5059 5602 3313 992 37 19 0 45
1 1 40948 11008 44 4932 143 0 591 3 1391 163 98 2 0 0
1 0 40924 12248 60 5032 15 0 33 6 1170 87 98 2 0 0
1 0 40912 12116 60 5228 2 0 21 0 1155 66 99 1 0 0
1 0 40912 12000 68 5228 0 0 0 3 1152 58 100 1 0 0
1 0 40912 12000 76 5260 3 0 6 3 1154 60 100 1 0 0
1 0 40892 12000 84 5260 0 0 0 3 1153 63 99 1 0 0
1 0 40704 11628 92 5260 34 0 34 3 1167 69 100 1 0 0
1 0 40700 11628 100 5260 0 0 0 3 1153 61 100 0 0 0
0 0 37956 401996 236 12804 474 0 1208 0 1626 229 89 5 3 3
0 0 36900 400392 244 13372 103 0 160 7 1125 180 6 2 91 1
420 MB Sample
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 3 59316 8820 48 4212 4238 4269 5132 4272 3592 939 20 16 0 65
0 1 68268 11732 44 3400 4281 5112 4736 5114 3337 938 6 19 0 75
1 3 60804 8820 76 4428 4715 3860 5877 3864 3518 1007 13 17 0 70
1 1 56408 9948 44 2976 4710 4164 6948 4168 4389 1186 5 19 0 75
2 2 70864 11704 44 2068 3975 6458 4908 6461 3854 1021 7 14 0 79
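If you capture vmstat output to a file during a benchmark, a few lines of code can summarise it. The following is only a sketch: it assumes the 16-column layout shown above (newer vmstat versions may print extra columns), and the class and method names are made up for illustration. It averages the si, so and wa columns over the 420 MB sample rows.

```java
import java.util.Arrays;
import java.util.List;

/** Summarises vmstat rows in the 16-column layout shown in the tables above. */
final class VmstatSummary {
    static final int COLUMNS = 16;
    static final int SI = 6;   // swapped in
    static final int SO = 7;   // swapped out
    static final int WA = 15;  // CPU waiting for I/O

    /** Returns the numeric fields of a data row, or null for header and label lines. */
    static long[] parseRow(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != COLUMNS) {
            return null;
        }
        long[] row = new long[COLUMNS];
        try {
            for (int i = 0; i < COLUMNS; i++) {
                row[i] = Long.parseLong(parts[i]);
            }
        } catch (NumberFormatException e) {
            return null; // e.g. the "r b swpd free ..." header line
        }
        return row;
    }

    /** Averages one column over all data rows; NaN if no row could be parsed. */
    static double average(List<String> lines, int column) {
        long sum = 0;
        int rows = 0;
        for (String line : lines) {
            long[] row = parseRow(line);
            if (row != null) {
                sum += row[column];
                rows++;
            }
        }
        return rows == 0 ? Double.NaN : (double) sum / rows;
    }

    public static void main(String[] args) {
        // The 420 MB sample rows, copied from the table above.
        List<String> sample = Arrays.asList(
                "r b swpd free buff cache si so bi bo in cs us sy id wa",
                "0 3 59316 8820 48 4212 4238 4269 5132 4272 3592 939 20 16 0 65",
                "0 1 68268 11732 44 3400 4281 5112 4736 5114 3337 938 6 19 0 75",
                "1 3 60804 8820 76 4428 4715 3860 5877 3864 3518 1007 13 17 0 70",
                "1 1 56408 9948 44 2976 4710 4164 6948 4168 4389 1186 5 19 0 75",
                "2 2 70864 11704 44 2068 3975 6458 4908 6461 3854 1021 7 14 0 79");
        System.out.printf("avg si=%.1f so=%.1f wa=%.1f%n",
                average(sample, SI), average(sample, SO), average(sample, WA));
    }
}
```

A sustained high wa together with non-zero si and so is the signature of the "chaos" case; in the 350 MB run both swap columns stay at zero.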
Following are results on 64 bit Windows systems, essentially from the same C code as on the Raspberry Pi but using one write/read pass. For these tests the benchmark was run with increasing data demands, up to 5, 8 and 14 GB on the three systems.
          64 Bit IntBurn64        64 Bit IntBurn64        64 Bit IntBurn64
CPU       Athlon 64               Core 2 Duo              Phenom II
MHz       2210                    2400                    3000
RAM MB    1024                    4096                    8192
Windows   XP x64                  64-Bit Vista            64-Bit Windows 7
Disk W/R
MB/sec    55                      55                      92
                KB  Secs MB/sec         KB  Secs MB/sec         KB  Secs MB/sec
            100000         2041     100000         3393     100000         5146
            800000     1   1976    2500000     2   2868    2000000     1   4900
            850000    23     77    3000000     2   2878    3000000     1   4658
            900000    58     32    3100000     2   2847    3500000     2   4651
            920000    61     31    3200000     2   2899    4000000     2   4488
            930000    91     21    3300000     3   2698    4500000     2   4489
            940000    96     20    3400000     3   2610    5000000     2   4477
            950000    93     21    3500000     7   1075    5500000     3   4166
            960000    89     22    3600000    10    750    6000000     3   4051
            970000   142     14    3700000    17    459    6500000     3   4036
            980000   125     16    3800000   107     73    7000000     4   4078
            990000   119     17    3900000   210     38    7500000    72    214
           1000000   128     16    4000000   146     56    7600000   170     91
           1100000   188     12                            7700000   168     94
           1200000   205     12    5000000  1024     10    7800000   230     69
           1300000   266     10    7000000   652     22    7900000   239     68
           1400000   358      8    7900000   770     21    8000000   227     72
                                   8000000   N/A           9000000   697     26
           2000000   683      6                           10000000  1231     17
           2100000                                        14000000  2742     10
           5000000  1707      6                           15000000   N/A
The BMPSpeed benchmark generates BMP files up to 512 MB. It measures the speed of saving, loading, scrolling, rotating and editing/enlarging files of 0.5, 1, 2, 4 etc. MB upwards. Memory used is up to 2.5 times the image size. The original had to be modified for Windows XP, as 1.25 GB of sequential memory space could not be allocated. The first example below reflects paging at 256 MB, but some memory would be cleared for a rerun. A second problem arises on later systems with more graphics RAM, where fast BitBlt copying can be used at larger image sizes, and this requires far more space than the slower StretchDIBits method.
I might produce a new 64 bit version to see if I can bust my new benchmarking toy with 32 GB RAM.
BMPSpeed Results
2.08 GHz CPU, 512 MB RAM, fast disk, slow GeForce graphics
Input Enlarge Save Load Scroll Scroll Rotate Use
Image Display Display /Repeat Overall 90 deg Fast
Mbytes Secs Secs Secs msecs MB/Sec Secs BitBlt
0.5 0.05 0.01 0.03 0.7 992.8 0.04 3
1.0 0.06 0.02 0.05 1.3 1013.2 0.06 3
2.0 0.08 0.03 0.12 2.3 1019.8 0.09 3
4.0 0.11 0.06 0.17 2.9 1032.4 0.15 3
8.0 0.15 0.14 0.43 11.4 262.7 0.25 3
16.0 0.24 0.29 0.51 11.4 262.7 0.81 3
32.0 0.45 0.61 0.88 11.4 262.5 1.10 3
64.0 0.55 1.31 1.49 41.4 72.2 2.79 0
128.0 0.97 2.50 2.83 53.9 55.5 6.21 0
256.0 73.02 88.77 14.84 109.7 27.3 86.60 0
512.0 82.93 20.70 89.05 842.4 3.5 67.98 0
2.4 GHz Core 2 Duo with 4 GB RAM and 64 Bit Vista, fast GeForce
Input Enlarge Save Load Scroll Scroll Rotate Use
Image Display Display /Repeat Overall 90 deg Fast
Mbytes Secs Secs Secs msecs MB/Sec Secs BitBlt
0.5 0.05 0.01 0.05 0.1 4748.4 0.02 3
1.0 0.05 0.02 0.08 0.3 4463.6 0.03 3
2.0 0.07 0.02 0.11 1.1 2475.2 0.04 3
4.0 0.09 0.03 0.19 2.4 1866.0 0.06 3
8.0 0.13 0.08 0.31 2.9 1765.0 0.10 3
16.0 0.20 0.24 0.48 2.7 1832.5 0.17 3
32.0 0.26 0.52 0.78 2.9 1741.2 0.28 3
64.0 0.39 1.08 1.38 2.9 1760.0 0.52 3
128.0 0.68 2.37 2.63 2.9 1740.3 1.03 3
256.0 1.35 4.62 5.38 3.1 1645.6 4.39 3
512.0 27.91 13.05 10.59 3.2 1595.6 57.11 3
Related
Recently, my app has been ending up with a large amount of native memory after a monkey test. I cannot reproduce this manually, so I can only analyse it through Android Profiler. The memory summary below shows 255264 K of the native heap in swap, but I cannot see that memory in Android Profiler, so I looked at the smaps file, where I can see one large malloc region with about 129 MB in swap.
Pss Private Private SwapPss Heap Heap Heap
Total Dirty Clean Dirty Size Alloc Free
------ ------ ------ ------ ------ ------ ------
Native Heap 75741 75496 220 255264 374656 331568 43087
Dalvik Heap 6606 5536 1032 0 18020 5748 12272
Dalvik Other 6844 6844 0 36
Stack 52 52 0 24
Ashmem 14 12 0 0
Gfx dev 17488 16652 836 0
Other dev 21 4 16 0
.so mmap 19382 224 14992 1063
.jar mmap 8 8 0 0
.apk mmap 19678 32 17360 20
.ttf mmap 7134 0 5752 0
.dex mmap 13703 0 8668 8
.oat mmap 2786 0 712 0
.art mmap 4541 3844 344 108
Other mmap 1997 4 1688 0
EGL mtrack 29388 29388 0 0
GL mtrack 19036 19036 0 0
Unknown 2227 2144 76 549
TOTAL 483718 159276 51696 257072 392676 337316 55359
9c800000-a4c00000 rw-p 00000000 00:00 0 [anon:libc_malloc]
Name: [anon:libc_malloc]
Size: 135168 kB
Rss: 5296 kB
Pss: 5296 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 5296 kB
Referenced: 4788 kB
Anonymous: 5296 kB
AnonHugePages: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 129872 kB
SwapPss: 129872 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
Now, my questions are:
1. How can I get more information about this memory?
2. Was this 129 MB allocated by a single malloc? How can an application get so much memory at one time?
3. Why does this malloc'ed memory go directly to swap?
To get detailed information about memory consumption, you can use Android Profiler. It gives a breakdown of the various components of the application that may be using or holding memory.
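For the smaps side of the question, one way to see which mappings account for the swapped memory is to scan the smaps text and list every region whose Swap field is non-zero. This is just a sketch under assumptions: the class and method names are invented, it expects the usual "range perms offset dev inode [name]" header line followed by "Key: value kB" lines, and reading another process's smaps generally needs root or a debuggable app. Note that a region labelled [anon:libc_malloc] is memory managed by the allocator, so it need not correspond to a single malloc call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Lists smaps mappings that have pages in swap (illustrative helper, not an Android API). */
final class SmapsSwapSketch {
    // Header line: address range, perms, offset, device, inode, then an optional name.
    private static final Pattern HEADER =
            Pattern.compile("^([0-9a-f]+-[0-9a-f]+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s*(.*)$");
    // Matches "Swap:" only, not "SwapPss:".
    private static final Pattern SWAP = Pattern.compile("^Swap:\\s+(\\d+)\\s+kB$");

    /** Maps "address-range name" to swapped-out kB for every mapping with Swap > 0. */
    static Map<String, Long> swappedRegions(String smapsText) {
        Map<String, Long> result = new LinkedHashMap<>();
        String current = null;
        for (String raw : smapsText.split("\\R")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = (header.group(1) + " " + header.group(2)).trim();
                continue;
            }
            Matcher swap = SWAP.matcher(line);
            if (current != null && swap.matches()) {
                long kb = Long.parseLong(swap.group(1));
                if (kb > 0) {
                    result.put(current, kb);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a saved smaps file, e.g. one pulled from the device.
        String path = args.length > 0 ? args[0] : "/proc/self/smaps";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        swappedRegions(text).forEach((region, kb) -> System.out.println(kb + " kB  " + region));
    }
}
```

Sorting the result by size and comparing it between a fresh start and a post-monkey run should at least show whether the growth sits in allocator-managed regions or somewhere else.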
I have used a shape drawable to get rounded corners, but adding a semicircle in between seems tricky.
You can use Vector Asset Studio to draw a custom shape and use it as the background for your layout:
https://developer.android.com/studio/write/vector-asset-studio.html
You can use a vector drawable to achieve your end result. I used potrace to convert your image into SVG format, which is included at the bottom.
Use Android Studio to create a vector drawable from this SVG file.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="271.000000pt" height="263.000000pt" viewBox="0 0 271.000000 263.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.13, written by Peter Selinger 2001-2015
</metadata>
<g transform="translate(0.000000,263.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M225 2546 c-40 -17 -84 -63 -101 -103 -12 -27 -14 -228 -14 -1140 0
-1093 0 -1107 20 -1149 12 -24 39 -54 62 -70 l41 -29 456 -3 c251 -2 471 0
489 3 23 4 32 11 32 25 0 11 7 20 15 20 8 0 19 16 25 39 14 54 79 124 142 152
74 33 191 33 265 0 61 -27 125 -92 142 -144 7 -21 17 -37 22 -37 5 0 9 -18 9
-40 0 -40 7 -49 25 -31 6 6 151 11 345 13 l335 3 36 24 c20 14 47 41 60 60
l24 35 3 1105 c1 608 0 1121 -3 1139 -8 42 -59 100 -110 123 -36 17 -73 19
-392 19 l-353 0 0 -35 c0 -24 -5 -35 -15 -35 -8 0 -24 -20 -36 -45 -25 -53
-74 -99 -138 -128 -36 -16 -66 -21 -136 -21 -79 -1 -97 3 -147 27 -90 44 -148
127 -148 211 l0 26 -462 0 c-361 -1 -470 -4 -493 -14z m920 -93 c13 -44 28
-67 70 -109 182 -177 521 -113 592 112 l17 54 335 0 c375 0 383 -1 428 -69
l23 -34 -2 -1113 -3 -1112 -25 -27 c-51 -55 -54 -55 -399 -55 l-319 0 -7 32
c-19 86 -84 160 -180 205 -52 24 -73 28 -150 28 -82 0 -97 -3 -157 -33 -84
-41 -143 -105 -164 -179 l-15 -53 -460 0 c-497 0 -487 -1 -539 55 l-25 27 -3
1111 c-2 1096 -2 1112 18 1145 11 18 34 41 52 52 32 19 52 20 465 20 l432 0
16 -57z"/>
</g>
</svg>
I am trying to reduce the memory usage of my app. I started by fixing memory leaks in my code. This reduced the Dalvik heap space considerably, but made no difference to the native heap space allocated.
Is there a way I can reduce my native heap space consumption? If so, how should I go about doing it?
Here's what my app's heap dump looks like:
Pss Private Private Swapped Heap Heap Heap
Total Dirty Clean Dirty Size Alloc Free
------ ------ ------ ------ ------ ------ ------
Native Heap 27240 27208 0 0 57344 24872 32471
Dalvik Heap 27138 26804 0 0 49430 41737 7693
Dalvik Other 624 624 0 0
Stack 924 924 0 0
Gfx dev 8738 5788 0 0
Other dev 16 0 16 0
.so mmap 833 332 124 0
.apk mmap 634 0 176 0
.ttf mmap 32 0 0 0
.dex mmap 12484 0 12480 0
.oat mmap 1146 0 124 0
.art mmap 1395 1212 4 0
Other mmap 44 4 0 0
Unknown 160 160 0 0
TOTAL 81408 63056 12924 0 106774 66609 40164
Objects
Views: 919 ViewRootImpl: 1
AppContexts: 3 Activities: 1
Assets: 214 AssetManagers: 214
Local Binders: 85 Proxy Binders: 22
Parcel memory: 24 Parcel count: 96
Death Recipients: 0 OpenSSL Sockets: 3
SQL
MEMORY_USED: 0
PAGECACHE_OVERFLOW: 0 MALLOC_SIZE: 0
It seems that BitmapRegionDecoder has a memory leak. If I run the code below, I can see native memory usage increase on the device. Eventually the application crashes, because the Android OS kills it for lack of free memory:
public void doClick(View v) {
    String bitmapFileName = "/mnt/sdcard/Wallpaper Images/-398300536.jpg";
    BitmapRegionDecoder dec;
    try {
        for (int i = 0; i < 100; i++) {
            FileInputStream is = new FileInputStream(bitmapFileName);
            dec = BitmapRegionDecoder.newInstance(is, false);
            // I am not even doing anything with bitmap region decoder!
            is.close();
            dec.recycle();
            System.gc();
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Running adb shell dumpsys meminfo reports the following prior to executing the code above (see the 3273 kB of native allocations):
Applications Memory Usage (kB):
Uptime: 63276459 Realtime: 469577132
** MEMINFO in pid 27844 [com.example.test] **
native dalvik other total
size: 3284 5379 N/A 8663
allocated: 3273 2831 N/A 6104
free: 10 2548 N/A 2558
(Pss): 600 779 2248 3627
(shared dirty): 56 1256 5164 6476
(priv dirty): 540 44 1036 1620
Objects
Views: 0 ViewRoots: 0
AppContexts: 0 Activities: 0
Assets: 2 AssetManagers: 2
Local Binders: 5 Proxy Binders: 10
Death Recipients: 0
OpenSSL Sockets: 0
SQL
heap: 0 MEMORY_USED: 0
PAGECACHE_OVERFLOW: 0 MALLOC_SIZE: 0
Running adb shell dumpsys meminfo after executing the code above reports 12678 kB of allocated native memory!
Applications Memory Usage (kB):
Uptime: 63281361 Realtime: 469582034
** MEMINFO in pid 27844 [com.example.test] **
native dalvik other total
size: 12972 5379 N/A 18351
allocated: 12678 2792 N/A 15470
free: 33 2587 N/A 2620
(Pss): 665 871 12411 13947
(shared dirty): 56 1256 5560 6872
(priv dirty): 612 48 10820 11480
Objects
Views: 0 ViewRoots: 0
AppContexts: 0 Activities: 0
Assets: 2 AssetManagers: 2
Local Binders: 5 Proxy Binders: 11
Death Recipients: 1
OpenSSL Sockets: 0
SQL
heap: 0 MEMORY_USED: 0
PAGECACHE_OVERFLOW: 0 MALLOC_SIZE: 0
The problem seems to be always reproducible on Android 2.3.5 and 4.1.2 (physical devices). It is also always reproducible on the Android emulator (I tried Android 2.3.3). I haven't tried other versions of Android, but my guess is that the problem is the same on other versions too.
Am I doing something wrong, or does the recycle() method of BitmapRegionDecoder simply not work?
How could I avoid this problem?
In Java, a buffer size of 32 KB is often cited as optimal, supposedly based solely on the CPU architecture being used. On Android phones, does the Dalvik VM dynamically know the proper CPU cache size, so that I can get the largest suitable buffer size independent of the many different phones out there? If so, how would I figure that out at runtime?
Say I want to optimize an audio recording activity by making the buffer as large as it can be while also being the fastest. I know you can get the minimum size for it, but what about the optimal size?
It probably depends on the device you have in mind.
However, experimentally, a buffer size between 8K and 32K works well, and below 8K each increase in buffer size brings a significant performance improvement. Somewhat interestingly, some of the runs with a buffer larger than 64K showed poorer performance than runs with a buffer under 64K.
(I tested on several Android devices, reading a 20 MB binary file with various buffer sizes.)
Here are the experimental results; paste them into a spreadsheet if you want them in a prettier form. The column headers are buffer sizes and the values are in milliseconds.
graph: http://fb.com/photo.php?fbid=468345876512381
Device          128   256   512    1K    2K    4K    8K   16K   32K   64K  128K  256K  512K    1M    2M    4M    8M   16M
Galaxy S       4047  3060   269   155   100    65    64    52    51    45    47    50    49    43    44    46    45    58
Optimus LTE    1178   617   322   172   101    65    47    42    41    35    36    39    44    61    56    51    72    60
HTC EVO        3971  1884   941   480   251   141    95    69    56    50    48    55    50    49    48    48    48    47
Galaxy S2       750   383   210   123    74    50    41    37    35    34    34    37    39    44    46    44    45    44
Galaxy Nexus   2272  1216   659   341   187   108    70    52    41    38    38    45    44    54    56    66    68    58
Galaxy Note    1549   799   404   220   127    75    58    54    52    56    52    45    44    62    43    39    44    46
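byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE (assumed name): the buffer size under test
int readCount = 0;
InputStream in = openFileInput(FILE_NAME); // Context method, so this runs inside an Activity
long startTime = System.currentTimeMillis();
try {
    while (in.read(buffer) != -1) { // read() returns -1 at end of stream
        readCount++;
    }
} finally {
    in.close();
}
long elapsedTime = System.currentTimeMillis() - startTime;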
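If you want to repeat the experiment on your own hardware, the sweep over buffer sizes could look roughly like the sketch below. It is plain Java for a desktop JVM rather than the Android code used for the table: the class name is made up, the 20 MB file of zeros is a stand-in for the real test file, and on Android you would swap the temp file for openFileInput(). Expect the OS file cache to flatter the later runs.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Times sequential reads of one file with doubling buffer sizes (illustrative sketch). */
final class BufferSizeSweep {

    /** Reads the whole file with the given buffer size; returns {bytesRead, elapsedMillis}. */
    static long[] timeRead(Path file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] {total, elapsedMillis};
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sweep", ".bin");
        try {
            // 20 MB of zeros as a stand-in for the 20 MB binary file in the post.
            Files.write(file, new byte[20 * 1024 * 1024]);
            // Same range as the table above: 128 bytes up to 16 MB, doubling each time.
            for (int size = 128; size <= 16 * 1024 * 1024; size *= 2) {
                long[] result = timeRead(file, size);
                System.out.printf("%9d bytes: %5d ms%n", size, result[1]);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Averaging several runs per size and discarding the first would give steadier numbers than a single pass.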