Are there any tools, like OpenGL ES fragment shaders, for drawing on an Android canvas/bitmap? I need to calculate the color of every pixel depending on its position, but working with the bitmap as an array is very slow. I can't use OpenGL ES, because the result I need is a bitmap.
Thanks.
It looks like you want to offload some heavy pixel manipulations to the GPU.
On Android, you have two major options:
OpenCL, but OpenCL support on Android is cumbersome
RenderScript
But don't underestimate the power that a CPU has, when you use vectorization instructions well. This might require hand-coding the vectorized loop using NEON intrinsics, but it will be worth it. Note that the performance issues mentioned in this last link are all resolved.
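As a baseline before reaching for NEON intrinsics: even a plain, branch-free per-pixel loop over an int-array ARGB buffer is already far faster than per-pixel getPixel/setPixel calls, and a loop shaped like the sketch below (the function and the gradient formula are my own illustration, assuming width and height are at least 2) is a good candidate for compiler auto-vectorization with `-O3`:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical position-dependent fill: each pixel's color is computed
 * purely from its (x, y) coordinate. The flat, branch-free inner loop
 * is the shape auto-vectorizers (and hand-written NEON) like. */
static void fill_gradient(uint32_t *pixels, int width, int height) {
    for (int y = 0; y < height; y++) {
        uint32_t *row = pixels + (size_t)y * width;
        for (int x = 0; x < width; x++) {
            uint8_t r = (uint8_t)(x * 255 / (width - 1));
            uint8_t g = (uint8_t)(y * 255 / (height - 1));
            uint8_t b = (uint8_t)((x ^ y) & 0xFF);
            row[x] = 0xFF000000u | (r << 16) | (g << 8) | b; /* ARGB */
        }
    }
}
```

On Android you would then hand the finished buffer to the bitmap in one call (`Bitmap.setPixels()` or `copyPixelsFromBuffer()`), rather than touching the bitmap per pixel.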
Related
I'm developing an app that renders the camera preview straight to a custom GLSurfaceView I have created.
It's pretty basic for someone who uses OpenGL on a regular basis.
The problem I'm experiencing is low fps on some devices, and I came up with a solution: choose which shader to apply at runtime. Now, I don't want to load an OpenGL program, measure the fps, and then change the program to a lighter shader, because that would cause noticeable lag.
What I would like to do is somehow determine the GPU's strength before linking the GL program (right after creating the OpenGL context).
After some hours of investigation I pretty much understood that it isn't going to be easy, mostly because rendering time depends on hidden device details: GPU memory, and an OpenGL pipeline that may be implemented differently on different devices.
As I see it I have only one or two options: render a texture off-screen and measure its rendering time; if it takes longer than 16 ms (the recommended frame time, per Romain Guy in this post), I'll use the lighter shader.
Or check the OS version and available RAM (though that is really inaccurate).
I really hope for a more elegant and accurate solution.
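The off-screen timing fallback can be sketched as below (the names and the averaging window are my own; the 16 ms figure is just the 60 fps frame budget). Timing a single frame is noisy, so averaging a few warm-up frames, with `glFinish()` after each draw so the GPU work is actually included in the measurement, is advisable:

```c
#define FRAME_BUDGET_MS 16.0   /* 60 fps frame budget */

typedef enum { SHADER_HEAVY, SHADER_LIGHT } shader_choice;

/* frame_ms holds n measured off-screen render times for the heavy
 * shader; fall back to the light one if the average blows the budget. */
static shader_choice pick_shader(const double *frame_ms, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += frame_ms[i];
    return (sum / n > FRAME_BUDGET_MS) ? SHADER_LIGHT : SHADER_HEAVY;
}
```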
My dev env is as follows:
Device: Nexus 5
Android: 4.4.2
SDK Tools: 22.6.1
Platform Tools: 19.0.1
Build tools: 19.0.3
Build Target: level 19
Min Target: level 19
I'm doing some image processing application. Basically I need to run a preprocessing step on the image and then use a 5x5 convolution to filter it. In the preprocessing step, I successfully made the script run on the GPU and achieved good performance. Since RenderScript offers a 5x5 convolution intrinsic, I'd like to use it to make the whole pipeline as fast as possible. However, I found that using the 5x5 convolution intrinsic after the preprocessing step is very slow. In contrast, if I use the adb tool to force all the scripts to run on the CPU, the 5x5 convolution intrinsic is a lot faster. In both cases, the time consumed by the preprocessing step is basically the same, so it was the performance of the intrinsic that made the difference.
Also, in the code I use
Allocation.USAGE_SHARED
in creating all the Allocations, hoping the shared memory would facilitate memory access between CPU and GPU.
Since I understand that intrinsics runs on CPU, is this behavior expected? Or did I miss anything? Is there a way to make the GPU script/CPU intrinsics mixed code fast? Thanks a lot!
The 5x5 convolve intrinsic (in the default Android RenderScript CPU driver) uses NEON. It is extremely fast, and my measurements confirmed this. In general, I did not find any RenderScript API that performs a 5x5 convolve on two 5x5 matrices. This is a problem, as it prevents one from writing more complex kernels.
Given the performance differences you are noticing, it is quite possible that the GPU driver on your device supports a 5x5 convolve intrinsic that runs slower than the NEON-based CPU 5x5 convolve intrinsic. So forcing CPU usage for RenderScript gives better performance.
Is there any other free vector library optimized for NEON besides math-neon?
I would like to take advantage of NEON in my code. I have a lot of objects and I am doing a lot of simple vector physics math: adding vectors, multiplying them, dotting them. These are 3D vectors, but if it made things a lot faster, 2D would be okay too. The question is: is it worth using NEON? For example, take 100,000 points; I need to calculate their movement, collisions, etc. I am currently using my own math, based on inline functions. Let's say I would also like to use my hypothetical NEON library with matrices; currently I am using glm for that, and it's doing fine, but could it be faster? The speed advantage between the armeabi and armeabi-v7a ABIs in the NDK is about 30 percent in my case. Can NEON be faster, or is my code already translated to NEON at compile time?
You can check Eigen. It has specialized code paths that are activated when NEON instruction support is enabled.
Like someone else mentioned, you should look into Eigen, it is probably good enough for you. But if you want full performance (much better than 30% gain, more like 300% gain), you should use NEON code yourself and make sure your entire inner loop is written completely with NEON (not any CPU or VFP code).
If you just NEON optimize part of your loop instead of the entire loop, you get major penalties and so the NEON code is perhaps just 30% faster or perhaps even slower than regular C code. But a full NEON loop can often give you 300% - 2000% speedup!
If you are developing for an ARM Cortex-A9 then NEON C intrinsics should be good enough, but for ARM Cortex-A8 devices you usually need NEON assembly code to get full performance. I give some more info on how to NEON optimize your whole loop at http://www.shervinemami.info/armAssembly.html
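To illustrate the "entire loop in NEON" point, here is a hedged sketch (my own example, not from the linked page): a saturating add over two byte buffers in which the hot loop stays entirely in NEON registers, and scalar code handles only the tail and non-NEON builds:

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Saturating add of two byte buffers: 16 pixels per iteration, no
 * per-element round trips between NEON and the integer pipeline. */
static void add_sat_u8(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb)); /* saturating add */
    }
#endif
    for (; i < n; i++) {                      /* scalar tail / fallback */
        unsigned s = (unsigned)a[i] + b[i];
        out[i] = s > 255 ? 255 : (uint8_t)s;
    }
}
```

The same function compiles and runs everywhere; only on armeabi-v7a NEON builds does the fast path kick in, which is what makes keeping the whole inner loop vectorized practical.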
Code is compiled for NEON if the target architecture supports it, namely, if it is compiled for armeabi-v7a. To do this, simply add armeabi-v7a to the list of targets in your app's Application.mk file.
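Concretely, for an ndk-build project that might look like the fragment below (`APP_ABI` and `LOCAL_ARM_NEON` are standard ndk-build variables; `LOCAL_ARM_NEON` additionally builds all sources in that module with NEON support and is only valid for the armeabi-v7a ABI):

```
# Application.mk
APP_ABI := armeabi armeabi-v7a

# Android.mk (optional, per module)
LOCAL_ARM_NEON := true
```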
Background
I'm writing a graphing library for an android application (yes yes, I know there are plenty out there but none that offer the customizability we need).
I want the graphs to be zoomable and pannable.
Problem
I want the experience to be smooth and to leave a small CPU footprint.
Solutions
Use View.onDraw(Canvas)
Use high resolution Bitmap
Use OpenGL
View.onDraw():
Benefits
Somewhat easy to implement
Drawbacks
Bad performance? (unless it uses OpenGL, does it?)
Bitmap:
Benefits
Really easy to implement
Great performance
Drawbacks
Have to use scaling, which is ugly
OpenGL:
Benefits
Probably good performance depending on my implementation
Drawbacks
More work to implement
Final words
OpenGL would probably be the professional solution and would definitely offer more flexibility but it would require more work (how much is unclear).
One thing that is definitely easier in OpenGL is panning/zooming, since I can just manipulate the matrix to get it right; I think the rest would be harder, though.
I'm not afraid to get my hands dirty but I want to know I'm heading in the right direction before I start digging.
Have I missed any solutions? Are all my solutions sane?
Additional notes:
I can add that when a graph changes I want to animate the change; this will perhaps be the most demanding task of all.
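On the pan/zoom point from the question: whichever backend is chosen, the transform itself is just a scale about a focal point plus a translation. A minimal sketch of that math (the types and names are my own):

```c
typedef struct { double x, y; } vec2;

/* Zoom by factor s about a focal point (e.g. the pinch center),
 * then pan by a fixed offset. The focal point maps to itself + pan,
 * which is exactly the behavior users expect from pinch-zoom. */
static vec2 pan_zoom(vec2 p, double s, vec2 focus, vec2 pan) {
    vec2 r;
    r.x = (p.x - focus.x) * s + focus.x + pan.x;
    r.y = (p.y - focus.y) * s + focus.y + pan.y;
    return r;
}
```

In OpenGL this is the usual translate-scale-translate applied to the model-view matrix; with Canvas, `Canvas.scale(sx, sy, px, py)` plus `Canvas.translate()` express the same thing, so the choice of backend doesn't change the math.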
The problem with using Views is that you inherit the overhead of the UI toolkit itself. While the toolkit is pretty well optimized, what it does is not necessarily what you want. The biggest drawback when you want to control your drawing is the invalidate/draw loop.
You can work around this "issue" by using a SurfaceView. A SurfaceView lets you render onto a window using your own rendering thread, thus bypassing the UI toolkit's overhead. And you can still use the Canvas 2D rendering API.
Canvas, however, uses a software rendering pipeline. Your performance will mostly depend on the speed of the CPU and the available bandwidth. In practice, it's rarely as fast as OpenGL. Android 3.0 offers a new hardware pipeline for Canvas, but only when rendering through Views. You cannot at this time use the hardware pipeline to render directly onto a SurfaceView.
I would recommend you give SurfaceView a try first. If you write your code correctly (don't draw more than you need to, redraw only what has changed, etc.) you should be able to achieve the performance you seek. If that doesn't work, go with OpenGL.
What is the best choice for rendering video frames obtained from a decoder bundled into my app (FFmpeg, etc.)?
I would naturally tend to choose OpenGL as mentioned in Android Video Player Using NDK, OpenGL ES, and FFmpeg.
But in OpenGL in Android for video display, a comment notes that OpenGL isn't the best method for rendering video.
What then? The jnigraphics native library? And a non-GL SurfaceView?
Please note that I would like to use a native API for rendering the frames, such as OpenGL or jnigraphics. But Java code for setting up a SurfaceView and such is ok.
PS: MediaPlayer is irrelevant here, I'm talking about decoding and displaying the frames by myself. I can't rely on the default Android codecs.
I'm going to attempt to elaborate on and consolidate the answers here based on my own experiences.
Why openGL
When people think of rendering video with openGL, most are attempting to exploit the GPU to do color space conversion and alpha blending.
For instance converting YV12 video frames to RGB. Color space conversions like YV12 -> RGB require that you calculate the value of each pixel individually. Imagine for a frame of 1280 x 720 pixels how many operations this ends up being.
What I've just described is really what SIMD was made for - performing the same operation on multiple pieces of data in parallel. The GPU is a natural fit for color space conversion.
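For reference, the per-pixel math in question looks like the sketch below, using the common BT.601 fixed-point approximation (an illustration only; real converters walk the subsampled U/V planes in 2x2 blocks rather than per pixel):

```c
#include <stdint.h>

static uint8_t clamp_u8(int v) {
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* One pixel of a YV12 -> RGB conversion (BT.601, fixed point).
 * A 1280x720 frame runs this 921,600 times; on the GPU the same
 * arithmetic sits in a fragment shader and runs in parallel. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b) {
    int c = 298 * ((int)y - 16);
    int d = (int)u - 128;
    int e = (int)v - 128;
    *r = clamp_u8((c + 409 * e + 128) >> 8);
    *g = clamp_u8((c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp_u8((c + 516 * d + 128) >> 8);
}
```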
Why !openGL
The downside is the process by which you get texture data into the GPU. Consider that for each frame you have to Load the texture data into memory (CPU operation) and then you have to Copy this texture data into the GPU (CPU operation). It is this Load/Copy that can make using openGL slower than alternatives.
If you are playing low resolution videos then I suppose it's possible you won't see the speed difference because your CPU won't bottleneck. However, if you try with HD you will more than likely hit this bottleneck and notice a significant performance hit.
The way this bottleneck has been traditionally worked around is by using Pixel Buffer Objects (allocating GPU memory to store texture Loads). Unfortunately GLES2 does not have Pixel Buffer Objects.
Other Options
For the above reasons, many have chosen to use software-decoding combined with available CPU extensions like NEON for color space conversion. An implementation of YUV 2 RGB for NEON exists here. The means by which you draw the frames, SDL vs openGL should not matter for RGB since you are copying the same number of pixels in both cases.
You can determine if your target device supports NEON enhancements by running cat /proc/cpuinfo from adb shell and looking for NEON in the features output.
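The same check can be done at runtime from native code by scanning the Features line of /proc/cpuinfo, as in this sketch (in a real app, the NDK's cpufeatures helper, `android_getCpuFeatures()`, is the more robust route):

```c
#include <string.h>

/* Return non-zero if the "Features" line of the given /proc/cpuinfo
 * text contains the "neon" flag. The search is confined to that line
 * so stray "neon" substrings elsewhere don't cause false positives. */
static int cpuinfo_has_neon(const char *cpuinfo_text) {
    const char *feat = strstr(cpuinfo_text, "Features");
    if (!feat)
        return 0;
    const char *eol = strchr(feat, '\n');  /* end of the Features line */
    const char *hit = strstr(feat, "neon");
    return hit != NULL && (eol == NULL || hit < eol);
}
```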
I have gone down the FFmpeg/OpenGLES path before, and it's not very fun.
You might try porting ffplay.c from the FFmpeg project, which has been done before using an Android port of the SDL. That way you aren't building your decoder from scratch, and you won't have to deal with the idiosyncrasies of AudioTrack, which is an audio API unique to Android.
In any case, it's a good idea to do as little NDK development as possible and rely on porting, since the ndk-gdb debugging experience is pretty lousy right now in my opinion.
That being said, I think OpenGLES performance is the least of your worries. I found the performance to be fine, although I admit I only tested on a few devices. The decoding itself is fairly intensive, and I wasn't able to do very aggressive buffering (from the SD card) while playing the video.
Actually I have deployed a custom video player system, and almost all of my work was done on the NDK side. We are getting full-frame video at 720p and above, including our custom DRM system. OpenGL is not your answer, as pixel buffers are not supported on Android, so you are basically re-uploading your textures every frame, and that defeats OpenGL ES's caching system. You frankly need to shove the video frames through the natively supported Bitmap on Froyo and above. Before Froyo, you're hosed. I also wrote a lot of NEON intrinsics for color conversion, rescaling, etc. to increase throughput. I can push 50-60 frames through this model on HD video.