Are there any Android devices where renderscript executes on the GPU instead of the CPU, or is this something not yet implemented anywhere?
As of Jelly Bean (Android 4.2) there is direct GPU integration for Renderscript.
I cannot confirm this with any official documentation from Google, but I work with RenderScript all day, and every time I run it I see logcat report loading drivers for the graphics chips in my devices, most notably Tegra 2. Google has really lagged in documenting RenderScript, and I would not at all be surprised if they simply haven't corrected this omission in their documentation.
Currently the compute side of Renderscript will only run on the CPU:
For now, compute Renderscripts can only take advantage of CPU cores, but in the future, they can potentially run on other types of processors such as GPUs and DSPs.
Taken from the Renderscript dev guide.
The graphics side of Renderscript sits on top of OpenGL ES so the shaders will run on the GPU.
ARM's Mali-T604 GPU will provide a target for the compute side of Renderscript (in a future Android release?) (see ARM Blog entry).
RenderScript was designed so that it can run on the GPU; this was the main purpose of adding the new language. I assume there are devices where it falls back to the CPU due to lack of driver support, but on most devices it runs on the GPU.
I think this may depend on whether you're doing graphics or compute operations. The graphics operations will likely get executed on the GPU but the compute operations won't as far as I understand.
When you use the forEach construct, the computation will run in multiple threads on the CPU, not the GPU (you can see this in the ICS source code). This may change in future releases (see https://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_llvm_liao.pdf), but I haven't seen any announcements.
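For concreteness, host-side code using the forEach construct looks roughly like the sketch below. The names ScriptC_invert and R.raw.invert are what the build tools would generate for a hypothetical invert.rs kernel with a root() function; they are illustrative, not from the original post.

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.renderscript.Allocation;
    import android.renderscript.RenderScript;

    public class InvertFilter {
        // Runs the kernel over every element of the input allocation.
        // On ICS this is dispatched across CPU worker threads, not the GPU.
        public static Bitmap invert(Context ctx, Bitmap src) {
            RenderScript rs = RenderScript.create(ctx);
            Allocation in = Allocation.createFromBitmap(rs, src);
            Allocation out = Allocation.createTyped(rs, in.getType());

            // ScriptC_invert is generated from invert.rs by the build tools
            ScriptC_invert script =
                    new ScriptC_invert(rs, ctx.getResources(), R.raw.invert);
            script.forEach_root(in, out); // the forEach construct

            Bitmap result = Bitmap.createBitmap(
                    src.getWidth(), src.getHeight(), src.getConfig());
            out.copyTo(result);
            rs.destroy();
            return result;
        }
    }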
Currently, only the Nexus 10 seems to support Renderscript GPU compute.
Related
I'm wondering if anybody has developed a Renderscript program that runs on the GPU. I've tried some simple implementations, like doing IntrinsicBlur via RS, but it turned out that it runs on the CPU rather than the GPU.
Intrinsics will always run on the processor that will do them the fastest. If an intrinsic is running on the CPU, that means the GPU is not suitable for running it quickly. One reason might be that the GPU is usually busy drawing the screen (which takes a lot of effort too), so there is no spare compute bandwidth there.
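For reference, typical use of the blur intrinsic looks like the sketch below (the helper class name is made up). Note that the API gives you no control over the processor choice; the RS driver decides:

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.renderscript.Allocation;
    import android.renderscript.Element;
    import android.renderscript.RenderScript;
    import android.renderscript.ScriptIntrinsicBlur;

    public class BlurHelper {
        // Applies a Gaussian blur via the built-in intrinsic; the RS
        // driver picks the processor (CPU with NEON, or GPU where supported).
        public static Bitmap blur(Context ctx, Bitmap src, float radius) {
            RenderScript rs = RenderScript.create(ctx);
            Allocation in = Allocation.createFromBitmap(rs, src);
            Allocation out = Allocation.createTyped(rs, in.getType());

            ScriptIntrinsicBlur blur =
                    ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
            blur.setRadius(radius); // valid range is (0, 25]
            blur.setInput(in);
            blur.forEach(out);

            Bitmap result = Bitmap.createBitmap(
                    src.getWidth(), src.getHeight(), src.getConfig());
            out.copyTo(result);
            rs.destroy();
            return result;
        }
    }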
Is RenderScript the only device-independent way to run GPGPU code on Android ?
I don't count Tegra, as only a few phones have it.
RenderScript is the official Android compute platform. As a result it will be on all Android devices. It was designed specifically to address the problem of running one code base across many different devices.
Well, using RenderScript doesn't necessarily mean that your code will run on the GPU. It might also use the CPU and (hopefully) parallelize tasks across several CPU cores and use CPU vector instructions. But as far as I know, you can never be sure about that, and the decision process is something of a black box.
If you want to make sure that your code runs on the GPU, you can "simulate" some GPGPU functions with OpenGL ES 2.0 shaders. This will run on all devices that support OpenGL ES 2.0. It depends on what you want to do, but for example many image processing functions can be implemented very efficiently this way. There is a library called ogles_gpgpu that provides an architecture for GPGPU on Android and iOS systems: https://github.com/internaut/ogles_gpgpu
OpenGL ES 3.1 also supports compute shaders, but few devices support it yet.
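To make the shader approach concrete, here is a minimal sketch of a GPGPU-style fragment shader: rendering a full-screen quad with it computes a grayscale value for every pixel entirely on the GPU. The class and helper names are illustrative, and the compile helper must be called on a thread with a current EGL context:

    import android.opengl.GLES20;

    public class GrayscaleShader {
        // Fragment shader that does the per-pixel "computation";
        // drawing a full-screen quad with it processes the whole image.
        static final String FRAGMENT_SRC =
                "precision mediump float;\n" +
                "uniform sampler2D uTexture;\n" +
                "varying vec2 vTexCoord;\n" +
                "void main() {\n" +
                "  vec4 c = texture2D(uTexture, vTexCoord);\n" +
                "  float y = dot(c.rgb, vec3(0.299, 0.587, 0.114));\n" +
                "  gl_FragColor = vec4(y, y, y, c.a);\n" +
                "}\n";

        // Compiles a shader of the given type and reports compile errors.
        static int compileShader(int type, String source) {
            int shader = GLES20.glCreateShader(type);
            GLES20.glShaderSource(shader, source);
            GLES20.glCompileShader(shader);
            int[] status = new int[1];
            GLES20.glGetShaderiv(shader, GLES20.GL_COMPILE_STATUS, status, 0);
            if (status[0] == 0) {
                String log = GLES20.glGetShaderInfoLog(shader);
                GLES20.glDeleteShader(shader);
                throw new RuntimeException("Shader compile failed: " + log);
            }
            return shader;
        }
    }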
My dev env is as follows:
Device: Nexus 5
Android: 4.4.2
SDK Tools: 22.6.1
Platform Tools: 19.0.1
Build tools: 19.0.3
Build Target: level 19
Min Target: level 19
I'm working on an image processing application. Basically, I need to run a preprocessing step on the image and then filter it with a 5x5 convolution. In the preprocessing step I successfully made the script run on the GPU and achieved good performance. Since Renderscript offers a 5x5 convolution intrinsic, I'd like to use it to make the whole pipeline as fast as possible. However, I found that using the 5x5 convolution intrinsic after the preprocessing step is very slow. In contrast, if I use the adb tool to force all the scripts to run on the CPU, the 5x5 convolution intrinsic is a lot faster. In both cases the time consumed by the preprocessing step is basically the same, so it is the performance of the intrinsic that makes the difference.
Also, in the code I use
Allocation.USAGE_SHARED
in creating all the Allocations, hoping the shared memory would facilitate memory access between CPU and GPU.
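For reference, the setup described above would look roughly like this (a sketch with illustrative names, not the asker's actual code):

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.renderscript.Allocation;
    import android.renderscript.Element;
    import android.renderscript.RenderScript;
    import android.renderscript.ScriptIntrinsicConvolve5x5;

    public class Convolve5x5Step {
        // Runs the 5x5 convolution intrinsic on bitmap-backed allocations
        // created with USAGE_SHARED so CPU and GPU share the backing store.
        public static void convolve(Context ctx, Bitmap src, Bitmap dst,
                                    float[] coefficients) {
            RenderScript rs = RenderScript.create(ctx);
            Allocation in = Allocation.createFromBitmap(rs, src,
                    Allocation.MipmapControl.MIPMAP_NONE,
                    Allocation.USAGE_SHARED | Allocation.USAGE_SCRIPT);
            Allocation out = Allocation.createFromBitmap(rs, dst,
                    Allocation.MipmapControl.MIPMAP_NONE,
                    Allocation.USAGE_SHARED | Allocation.USAGE_SCRIPT);

            ScriptIntrinsicConvolve5x5 conv =
                    ScriptIntrinsicConvolve5x5.create(rs, Element.U8_4(rs));
            conv.setCoefficients(coefficients); // 25 filter taps, row-major
            conv.setInput(in);
            conv.forEach(out);

            out.copyTo(dst);
            rs.destroy();
        }
    }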
Since I understand that intrinsics run on the CPU, is this behavior expected? Or did I miss something? Is there a way to make mixed GPU-script/CPU-intrinsic code fast? Thanks a lot!
The 5x5 convolve intrinsic (in the default Android RS driver for the CPU) uses NEON. This is extremely fast, and my measurements showed the same. Incidentally, I did not find any RS API that does a 5x5 convolve of two 5x5 matrices; this is a problem, as it prevents one from writing more complex kernels.
Given the performance differences you are noticing, it is quite possible that the GPU driver on your device supports a 5x5 convolve intrinsic that runs slower than the NEON-based CPU version. So forcing CPU usage for Renderscript gives better performance.
I am programming on Android and I wonder whether we can use GPGPU on Android now. I once heard that Renderscript may be able to execute on the GPU in the future, but is it possible to program for the GPGPU today? If so, where can I find tutorials or sample programs? Thank you for your help and suggestions.
So far I know that the OpenGL ES library is GPU-accelerated, but I want to use the GPU for general-purpose computing. What I want to do is accelerate computation, so I hope to use some API such as OpenCL.
2021-April Update
Google has announced deprecation of the RenderScript API in favor of Vulkan with Android 12.
The option for manufacturers to include the Vulkan API was made available in Android 7.0 Compatibility Definition Document - 3.3.1.1. Graphic Libraries.
Original Answer
Actually, Renderscript Compute doesn't use the GPU at this time, but it is designed for it.
From Romain Guy, who works on the Android platform:
Renderscript Compute is currently CPU bound but with the for_each construct it will take advantage of multiple cores immediately
Renderscript Compute was designed to run on the GPU and/or the CPU
Renderscript Compute avoids having to write JNI code and gives you architecture independent, high performance results
Renderscript Compute can, as of Android 4.1, benefit from SIMD optimizations (NEON on ARM)
https://groups.google.com/d/msg/android-developers/m194NFf_ZqA/Whq4qWisv5MJ
Yes, it is possible.
You can use either Renderscript or OpenGL ES 2.0.
Renderscript is available on Android 3.0 and above, and OpenGL ES 2.0 is available on about 95% of devices.
As of Android 4.2, Renderscript can involve GPU in computations (in certain cases).
More information here: http://android-developers.blogspot.com/2013/01/evolution-of-renderscript-performance.html
As I understand it, ScriptIntrinsic subclasses are well optimized to run on the GPU on compatible hardware (for example, a Nexus 10 with its Mali-T604). Documentation:
http://developer.android.com/reference/android/renderscript/ScriptIntrinsic.html
Of course you can decide to use OpenCL, but Renderscript is guaranteed (by Google, being part of Android itself) to run even on hardware which doesn't support GPGPU computation, and it will use whatever other acceleration means the hardware it runs on supports.
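As an illustration, here is a minimal sketch using one such intrinsic, ScriptIntrinsicColorMatrix (the helper class name is made up); on hardware with a GPU-enabled RS driver this work may be scheduled on the GPU:

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.renderscript.Allocation;
    import android.renderscript.Element;
    import android.renderscript.RenderScript;
    import android.renderscript.ScriptIntrinsicColorMatrix;

    public class GreyscaleIntrinsic {
        // Converts a bitmap to greyscale with the color-matrix intrinsic;
        // the RS driver decides which processor actually runs it.
        public static void toGreyscale(Context ctx, Bitmap src, Bitmap dst) {
            RenderScript rs = RenderScript.create(ctx);
            Allocation in = Allocation.createFromBitmap(rs, src);
            Allocation out = Allocation.createFromBitmap(rs, dst);

            ScriptIntrinsicColorMatrix cm =
                    ScriptIntrinsicColorMatrix.create(rs, Element.U8_4(rs));
            cm.setGreyscale(); // standard luminance weights
            cm.forEach(in, out);

            out.copyTo(dst);
            rs.destroy();
        }
    }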
There are several options: You can use OpenGL ES 2.0, which is supported by almost all devices but has limited functionality for GPGPU. You can use OpenGL ES 3.0, with which you can do much more in terms of GPU processing. Or you can use RenderScript, but this is platform-specific and furthermore does not give you any influence on whether your algorithms run on the GPU or the CPU. A summary about this topic can be found in this master's thesis: Parallel Computing for Digital Signal Processing on Mobile Device GPUs.
You should also check out ogles_gpgpu, which allows GPGPU via OpenGL ES 2.0 on Android and iOS.
I'm trying to decide on whether to primarily use floats or ints for all 3D-related elements in my app (which is C++ for the most part). I understand that most ARM-based devices have no hardware floating point support, so I figure that any heavy lifting with floats would be noticeably slower.
However, I'm planning to prep all data for the most part (i.e. have vertex buffers where applicable and transform using matrices that don't change a lot), so I'm just stuffing data down OpenGL's throat. Can I assume that this goes more or less straight to the GPU and will as such be reasonably fast? (Btw, the minimum requirement is OpenGL ES 2.0, so that presumably excludes older 1.x-based phones.)
Also - how big is the penalty when I mix and match ints and floats? Assuming that all my geometry is just pre-built float buffers, but I use ints for matrices since those do require expensive operations like matrix multiplication, how much wrath will I incur here?
By the way, I know that I should keep my expectations low (sounds like even asking for floats on the CPU is asking for too much), but is there anything remotely like 128-bit VMX registers?
(And I'm secretly hoping that fadden is reading this question and has an awesome answer.)
Older Android devices like the G1 and MyTouch have ARMv6 CPUs without floating point support. Most newer devices, like the Droid, Nexus One, and Incredible, use ARMv7-A CPUs that do have FP hardware. If your game is really 3D-intensive, it might demand more from the 3D implementation than the older devices can provide anyway, so you need to decide what level of hardware you want to support.
If you code exclusively in Java, your app will take advantage of the FP hardware when available. If you write native code with the NDK, and select the armv5te architecture, you won't get hardware FP at all. If you select the armv7-a architecture, you will, but your app won't be available on pre-ARMv7-A devices.
OpenGL from Java should be sitting on top of "direct" byte buffers now, which are currently slow to access from Java but very fast from the native side. (I don't know much about the GL implementation though, so I can't offer much more than that.)
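For example, vertex data is typically staged in a direct, native-order FloatBuffer before being handed to GL; a minimal sketch (helper names are made up):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.FloatBuffer;
    import android.opengl.GLES20;

    public class VertexUpload {
        // Wraps float vertex data in a direct, native-order buffer so
        // the GL driver can read it without an extra copy on the Java side.
        static FloatBuffer toDirectBuffer(float[] vertices) {
            FloatBuffer fb = ByteBuffer
                    .allocateDirect(vertices.length * 4) // 4 bytes per float
                    .order(ByteOrder.nativeOrder())
                    .asFloatBuffer();
            fb.put(vertices).position(0);
            return fb;
        }

        // Points an attribute at the buffer; 3 floats (x, y, z) per vertex.
        static void bindPositions(int attribLocation, FloatBuffer fb) {
            GLES20.glVertexAttribPointer(attribLocation, 3, GLES20.GL_FLOAT,
                    false, 3 * 4, fb);
            GLES20.glEnableVertexAttribArray(attribLocation);
        }
    }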
Some devices additionally support the NEON "Advanced SIMD" extension, which provides some fancy features beyond what the basic VFP support has. However, you must test for this at runtime if you want to use it (looks like there's sample code for this now -- see the NDK page for NDK r4b).
An earlier answer has some info about the gcc flags used by the NDK for "hard" fp.
Ultimately, the answer to "fixed or float" comes down to what class of devices you want your app to run on. It's certainly easier to code for armv7-a, but you cut yourself off from a piece of the market.
In my opinion you should stick with fixed-point as much as possible.
It's not only old phones that lack floating-point support; some new ones, such as the HTC Wildfire, lack it too.
Also, if you choose to require ARMv7, please note that, for example, the Motorola Milestone (the European Droid) does have an ARMv7 CPU, but because of the way Android 2.1 was built for this device, it will not use your armeabi-v7a libs (and the Market might hide your app).
I personally worked around this by detecting ARMv7 support with the new cpufeatures library provided with NDK r4b, and loading an armeabi-v7a lib on demand with dlopen().
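A minimal Java-side sketch of that fallback idea (the library names are hypothetical; the actual ARMv7/NEON check with cpufeatures lives in native code):

    public class NativeLoader {
        // Tries the ARMv7 build first and falls back to the generic build.
        // The ARMv7 .so can then decide internally (via the cpufeatures
        // library) whether NEON code paths are safe to use.
        public static void loadBestLibrary() {
            try {
                System.loadLibrary("myfilters_v7"); // hypothetical armeabi-v7a build
            } catch (UnsatisfiedLinkError e) {
                System.loadLibrary("myfilters");    // hypothetical armeabi fallback
            }
        }
    }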