Very slow vkEndCommandBuffer on Android

I did some profiling and found that vkEndCommandBuffer does a lot of work on Android (25% of render thread time in my application). The strangest part is that it performs a huge memcpy (20% of render thread time).
I tried different flags for the command buffer and the command buffer pool, and I tried resetting the command buffer both implicitly and explicitly, but nothing helped. I found that vkEndCommandBuffer runs faster if I disable MSAA, which doesn't make sense to me.
Stack:
qglinternal::vkEndCommandBuffer
QglCommandBuffer::End
A5xCommandBuffer::HwProcessWorkload
memcpy
Why is it so slow? And what is it doing? Any ideas?
UPD: I tested my app on 5 devices. The app issues 200-400 draw calls per frame:
Xiaomi MiA1. Android 7.1. Adreno 506. Driver version: 55.277.3829 (according to Vulkan Caps Viewer). 30 fps. No memcpy in vkEndCommandBuffer, but it still takes 5% of render thread time.
Xiaomi MiA2 Lite. Android 9.0. Adreno 506. Driver version: 512.331.0. 20-25 fps. Huge memcpy, vkEndCommandBuffer takes 25.6% of render thread time.
Yandex YNDX-000SB. Android 8.1. Adreno 508. Driver version: 14.307.2170. 30 fps. Huge memcpy, vkEndCommandBuffer takes 22.7% of render thread time.
Google Pixel 2 XL. Android 10.0. Adreno 540. Driver version: 512.385.0. 60 fps (but it's an expensive device). Huge memcpy, but vkEndCommandBuffer takes only 5.8% of render thread time.
Samsung Galaxy J7. Android 8.1. Mali-T830. 45-50 fps. vkEndCommandBuffer takes only 0.2% of render thread time.
I also noticed the following:
all devices with Adreno do a lot of work in DrawIndexed: they call a huge memset (wtf?);
none of the devices with Adreno reuse command buffers. They always free all resources of the command buffer (even if I use implicit reset). This takes about 2-5% of render thread time, depending on the device.
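To compare the devices in absolute terms, the percentages above can be turned into rough milliseconds per frame. This is a minimal Python sketch, assuming the render thread is busy for the whole frame (the post does not say this, so treat the results as upper bounds). `ms_per_frame_in_call` is a hypothetical helper name, and for fps ranges the lower bound is used.

```python
def ms_per_frame_in_call(fps: float, share_percent: float) -> float:
    """Approximate time per frame spent in one call.

    Assumes the render thread is busy for the entire frame, which the
    profile does not confirm; the result is an upper-bound estimate.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_ms = 1000.0 / fps
    return frame_ms * share_percent / 100.0


# Figures quoted in the post: (device, fps, % of render thread time
# spent in vkEndCommandBuffer). For fps ranges the lower bound is used.
reported = [
    ("Xiaomi MiA1, Adreno 506", 30, 5.0),
    ("Xiaomi MiA2 Lite, Adreno 506", 20, 25.6),
    ("Yandex YNDX-000SB, Adreno 508", 30, 22.7),
    ("Google Pixel 2 XL, Adreno 540", 60, 5.8),
    ("Samsung Galaxy J7, Mali-T830", 45, 0.2),
]

for device, fps, share in reported:
    print(f"{device}: ~{ms_per_frame_in_call(fps, share):.2f} ms per frame")
```

Under that assumption, a similar 5-6% share costs less time per frame on the 60 fps Pixel 2 XL than on the 30 fps MiA1, so the percentages alone understate the gap between devices.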

Related

Inexplicable performance difference in Unity Android

Scenario: I have a simple scene with around 15k tris (depending on where you look), using Legacy Diffuse shaders.
I run the game on a Galaxy S5 Android 6.0.1. From GSM Arena:
Released 2014, April
Chipset Qualcomm MSM8974AC Snapdragon 801 (28 nm)
CPU Quad-core 2.5 GHz Krait 400
GPU Adreno 330
Getting 50 FPS average
Profiler screenshots
I run the game on a Lenovo P2 Android 7.0. GSM Arena:
Released 2016, November
Chipset Qualcomm MSM8953 Snapdragon 625 (14 nm)
CPU Octa-core 2.0 GHz Cortex-A53
GPU Adreno 506
Getting 38 FPS average
Profile screenshots
What I've tried so far:
replace the legacy diffuse shader with mobile diffuse -> both phones gain 10 fps, same performance gap
disable auto graphics api and force GLES2 - no difference
force GLES3 - no difference
tried using the Snapdragon profiler to make a snapshot capture but it keeps crashing when I try to take the snapshot
forcing the same resolution on both devices (1080), no change (I was thinking that Unity might do some magic and downscale on the S5), also note that both phones are 1080p
Unity 2018.3.0f2
Same game, same graphics profiles (preferences), same everything. How is this possible?
This is just a stripped down main menu of the game. The main scene of the game (a city with 80k tris on average and many Standard shaders) runs at over 40 FPS on the GS5 and at 28-30 on the P2 :/. I also tried the city scene on an Xperia XA2 (slightly better specs than the P2), and it runs at around 56-60 fps.
PS: Notice in the screenshots that, because it is looking at a different area, the S5 renders more things on screen yet still gets far more FPS than the P2.
Edit: Updated the thread with the missing info from the comments below.

OpenGLRenderer: GL error: Out of memory

I have written an OpenGL ES 3.1 application for Android. It runs fine on Nexus 5X (Adreno 418 GPU), but on Samsung Galaxy S7 (Mali T880) it dies with
E/OpenGLRenderer: Error:glFinish::execution failed
E/OpenGLRenderer: GL error: Out of memory!
in logcat.
Now, I really don't think this is because the application really uses much memory (it only has 1 texture, 2 FBOs the size of the screen, no fonts, nothing else I can think of that would use much memory). At least that's what I think the application uploads to GPU.
I would like to debug this without relying on my limited understanding of OpenGL, i.e. it would be best if there was a tool which would show me basic things like
1) the amount of memory available on GPU
2) amount of memory taken
3) taken by what
EDIT: I have solved this, with some help from the Mali Graphics Debugger suggested in the comments. It turned out to be a runaway loop in a fragment shader, which under certain circumstances (curiously not arising on Adreno) could keep on looping forever. By 'OUT_OF_MEMORY' I think it meant 'execution of fragment shader takes too effing long'.

How many waves (threads) are scheduled and executed on the GPU using OpenCL?

I'm developing an Android app using OpenCL on an Adreno GPU. I read the 'Snapdragon OpenCL General Programming and optimization' guide, but I can't find any information about the size of a wave (threads) or warp on Adreno devices.
On desktop GPUs, AMD has a wavefront size of 64 threads and Nvidia has a warp size of 32. This information is very important for choosing the best workgroup size and for optimizing code.
I wonder how many waves are scheduled and executed on the GPU.
Can someone provide this information?
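OpenCL itself exposes a per-kernel hint that may help here: `clGetKernelWorkGroupInfo` with `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` (OpenCL 1.1 and later). On many GPUs that value corresponds to the wave size, but the specification only describes it as a performance hint, so that mapping is an assumption. The Python sketch below shows only the arithmetic that follows once such a value is known. The value 64 is a placeholder, not a measured Adreno figure, and both function names are hypothetical.

```python
def candidate_local_sizes(preferred_multiple: int, max_work_group_size: int) -> list:
    """Work-group sizes that are whole multiples of the preferred multiple."""
    if preferred_multiple <= 0 or max_work_group_size <= 0:
        raise ValueError("sizes must be positive")
    return list(range(preferred_multiple, max_work_group_size + 1, preferred_multiple))


def padded_global_size(work_items: int, local_size: int) -> int:
    """Round the global size up to a multiple of the local size.

    OpenCL 1.x requires the global size to be evenly divisible by the
    local size, so the kernel must bounds-check the padded work-items.
    """
    return ((work_items + local_size - 1) // local_size) * local_size


# Placeholder values; query the real ones from the device and kernel.
PREFERRED_MULTIPLE = 64     # assumption, NOT a documented Adreno wave size
MAX_WORK_GROUP_SIZE = 256   # assumption, stands in for CL_KERNEL_WORK_GROUP_SIZE

sizes = candidate_local_sizes(PREFERRED_MULTIPLE, MAX_WORK_GROUP_SIZE)
print(sizes)
print(padded_global_size(1000, sizes[0]))
```

Benchmarking each candidate size on the target device is still the only reliable way to pick one, since the hint says nothing about how many waves are resident at once.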

andengine (android opengl) low cpu usage

I have an andengine game, which has:
45-50-55 fps on a normal Galaxy S3, the phone temperature is warm.
stable 60 fps on a rooted CM11 Galaxy S3 in performance mode (maximum CPU frequency = 1400 MHz). With root you can modify the CPU frequency. The phone temperature is almost hot.
40-45fps on my Nexus 6 (without root), but this phone is faster than galaxy s3! The phone temperature is almost cold.
The resolutions of the game are the same!
The main question is: why is my game's FPS roughly the same on both devices? On the Nexus 6 it should be faster!
The game is: https://play.google.com/store/apps/details?id=com.hattedskull.gangsters
When a CPU is faster but shows less performance than other CPUs, it could be that the game uses only 1 of its 2, 4, 8, or 12 cores. With 4 cores, that's just 25% usage, so the CPU stays cold. A single-core CPU will always run at 100% and get warm. Multithreading is the solution: it will "force" the CPU to go to 100%, and the game will run faster.
I am answering my own question because the solution was not obvious (to me):
Performance mode: after rooting the phone, I changed the CPU scaling governor to performance; changing the minimum frequency was not necessary. Now the phone is hot, not warm!
Mobile ads hurt performance: I switched off all internet connections (Wi-Fi, mobile), so the mobile ad disappeared from the game. I use AdMob in my game, and it does not perform well.
These caused the FPS drops in my game(s)!

V4L2 FPS drops on android with more cores

I have three Android devices: single-core, dual-core, and quad-core. I was able to make an app that uses V4L2 to grab pictures.
In standby mode all three devices give me 30 FPS (as advertised by the camera hardware provider). But as soon as I start processing the image and drawing on a canvas, the FPS on the dual-core and quad-core devices drops drastically. The single-core device reaches 28 FPS, which is acceptable, but the dual-core drops to 18 (on average) and the quad-core to 12.
I used the CPU performance governor on all three devices. Without performance mode the FPS is even lower.
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
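Note that this command only writes the governor for cpu0. On multi-core devices the other cores may belong to separate cpufreq policies, so it can be worth checking what each core actually reports. This is a minimal Python sketch, assuming the standard Linux sysfs layout (`cpuN/cpufreq/scaling_governor`) and that a Python interpreter is available on the device. `read_governors` is a hypothetical helper.

```python
import re
from pathlib import Path


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Map each cpuN directory to its current scaling governor.

    Cores without a readable cpufreq entry (for example, offline cores)
    are reported as None.
    """
    governors = {}
    for cpu_dir in Path(cpu_root).glob("cpu[0-9]*"):
        if not re.fullmatch(r"cpu\d+", cpu_dir.name):
            continue  # skip anything that is not exactly cpu<number>
        gov_file = cpu_dir / "cpufreq" / "scaling_governor"
        try:
            governors[cpu_dir.name] = gov_file.read_text().strip()
        except OSError:
            governors[cpu_dir.name] = None
    return governors


if __name__ == "__main__":
    # Print cores in numeric order: cpu0, cpu1, ..., cpu10, ...
    for cpu, governor in sorted(read_governors().items(), key=lambda kv: int(kv[0][3:])):
        print(cpu, governor)
```

If some cores report a different governor, the same `echo performance` write would need to be repeated for their `cpuN` paths.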
Initially I thought it had something to do with the customized Linux kernel in Android, so I flashed PicUntu on the quad-core device, but I still saw a similar FPS drop. On desktop with Windows and Ubuntu there is no such issue with the camera. I guess it has something to do with ARM.
Has anyone faced a similar problem with a UVC camera and the V4L2 driver on Android (ARM devices)? Is there any way to increase the FPS?
EDIT: Adding some more information
On the KitKat device, if I bind my USB task to a single core (taskset), the FPS improves by a small factor. On KitKat, overall performance is also better than on Jelly Bean.
I am not sure why USB speed suffers when the application goes to the background, even when overall CPU usage is small.
