I am using a plasma shader in my Android (libGDX) app, which I found from here:
http://www.bidouille.org/prog/plasma
Here is my shader (slightly modified):
#define LOWP lowp
precision mediump float;
#else
define LOWP
#endif
#define PI 3.1415926535897932384626433832795
uniform float time;
uniform float alpha;
uniform vec2 scale;
void main() {
float v = 0.0;
vec2 c = gl_FragCoord.xy * scale - scale/2.0;
v += sin((c.x+time));
v += sin((c.y+time)/2.0);
v += sin((c.x+c.y+time)/2.0);
c += scale/2.0 * vec2(sin(time/3.0), cos(time/2.0));
v += sin(sqrt(c.x*c.x+c.y*c.y+1.0)+time);
v = v/2.0;
vec3 col = vec3(1, sin(PI*v), cos(PI*v));
gl_FragColor = vec4(col *.5 + .5, alpha);
}
I am rendering it via an ImmediateModeRendererGL20, as a quad.
However, it simply appears to be way too slow for my needs. I am trying to fill almost the whole screen on my Nexus 7 (first gen) with the shader, and I cannot get even close to 60 FPS.
This is really my first real trip onto the GLSL world, and I have no idea how these things usually should perform!
I am wondering how one could optimize this shader? I really don't care about the image quality, I can sacrifice it. I came to a conclusion that somekind of lookup table might be what I am after and/or dropping the resolution of the shader..? But I am not quite sure where to begin. I am still very new to GLSL and "low-level" programming has never really been my cup of tea, but I am eager to at least try!
The ultimate way to speed this up is to pre-calculate computation-heavy stuff (sin() ans cos() stuff) and bake it into texture(s), then get ready values from them. This will make it super-fast because your shader won't make any heavy computations but consume pre-calculated values.
As was stated in comments, you can optimize performance by moving certain calculations to vertex shader. For example, you can move this line
vec2 c = gl_FragCoord.xy * scale - scale/2.0;
to vertex shader without sacrificing any quality because this is a linear function so interpolation won't distort it.
Do it like this:
// vertex
...
uniform float scale;
varying mediump vec2 c;
const float TWO = 2.0;
...
void main() {
...
gl_FragCoord = ...
c = gl_FragCoord.xy * scale - scale / TWO;
...
}
// fragment
...
varying mediump vec2 c;
...
void main() {
...
// just use `c` as usual
...
}
Also, please use constants instead of literals - literals use your uniform space. While this may not affect performance it is still bad (if you use too much of them you can run out of max uniforms on certain GPUs). More on this: Declaring constants instead of literals in vertex shader. Standard practice, or needless rigor?
Related
As a continuation from my previous question (GLSL : Accessing an array in a for-loop hinders performance), I have encountered an entirely new and annoying problem.
So, I have a shader that performs a black hole effect.
The shader works perfectly on my computer, the android emulator, and ShaderToy – but for some reason, even though the code is exactly the same, does not work on my Android device.
The problem occurs when I zoom in too far. For whatever reason, when my zoom reaches a certain point – the whole background zooms in and then zooms out and goes crazy. Like this :
When it should look like this :
However, it does work on my device if I change this :
#ifdef GL_ES
precision mediump float;
#endif
to this :
#ifdef GL_ES
precision highp float;
#endif
The problem with this is that it also decreases my FPS from 60 down to ~40.
I believe the problem is that my Android device's OpenGL version is "OpenGL ES 3.0" according to Gdx.gl.glGetString(GL20.GL_VERSION).
But I cannot figure out how to set my version to OpenGL 2.0 since the AndroidApplicationConfiguration class is giving me little to no options.
I've tried putting <uses-feature android:glEsVersion="0x00020000" android:required="true" /> in the manifest, but it still prints "OpenGL ES 3.0".
And I still don't even know if this is actually the cause of the problem or not, so that's why I'm asking here. Thank you for taking the time to read/answer my question :).
P.S. Here's the Shader code:
#ifdef GL_ES
precision mediump float;
#endif
const int MAX_HOLES = 4;
uniform sampler2D u_sampler2D;
varying vec2 vTexCoord0;
struct BlackHole {
vec2 position;
float radius;
float deformRadius;
};
uniform vec2 screenSize;
uniform vec2 cameraPos;
uniform float cameraZoom;
uniform BlackHole blackHole[MAX_HOLES];
void main() {
vec2 pos = vTexCoord0;
float black = 0.0;
for (int i = 0; i < MAX_HOLES; i++) {
BlackHole hole = blackHole[i];
vec2 position = (hole.position - cameraPos.xy) / cameraZoom + screenSize*0.5;
float radius = hole.radius / cameraZoom;
float deformRadius = hole.deformRadius / cameraZoom;
vec2 deltaPos = vec2(position.x - gl_FragCoord.x, position.y - gl_FragCoord.y);
float dist = length(deltaPos);
float distToEdge = max(deformRadius - dist, 0.0);
float dltR = max(sign(radius - dist), 0.0);
black = min(black+dltR, 1.0);
pos += (distToEdge * normalize(deltaPos) / screenSize);
}
gl_FragColor = (1.0 - black) * texture2D(u_sampler2D, pos) + black * vec4(0, 0, 0, 1);
}
As you have found the issue is down to a lack of precision in fp16 (mediump), which is fixed by using fp32 (highp). Most maths units will have double the throughput for fp16 vs fp32, which also explains the drop in performance.
Querying the driver GLES version will return maximum supported version, not the version of the current EGL context, so what you are seeing is expected.
Also please note that "highp" is optional in OpenGL ES 2.0 fragment shaders, so there is no guarantee that your shader will work on some GPUs in an OpenGL ES 2.0 context. The Mali-4xx series only support fp16 fragment shaders, for example (I think also some of the OpenGL ES 2.0 Vivante GPUs based on past experience).
In OpenGL ES 3.0 highp is mandatory in fragment shaders, so it would be guaranteed to work there.
I'm trying to build a shader that allows you to combine two images by applying a gradient opacity mask to the one on top, like you do in photoshop. I've gotten to the point where I can overlay the masked image over the other but as a newbie am confused about a few things.
It seems that the images inside the shader are sometimes skewed to fit the canvas size, and they always start at position 0,0. I have played around with a few snippets I have found to try and scale the textures, but always end up with unsatisfactory results.
I am curious if there is a standard way to size, skew, and translate textures within a view, or if images in GLSL are necessarily limited in some way that will stop me from accomplishing my goal.
I'm also unsure of how I am applying the gradient/mask and if it is the right way to do it, because I do not have a lot of control over the shape of the gradient at the moment.
Here's what I have so far:
precision highp float;
varying vec2 uv;
uniform sampler2D originalImage;
uniform sampler2D image;
uniform vec2 resolution;
void main(){
float mask;
vec4 result;
vec2 position = gl_FragCoord.xy / ((resolution.x + resolution.y) * 2.0 );
mask = vec4(1.0,1.0,1.0,1.0 - position.x).a;
vec4 B = texture2D(image,uv);
vec4 A = texture2D(originalImage,uv) * (mask);
result = mix(B, A, A.a);
gl_FragColor = result;
}
Which produces an image like this:
What I would like to be able to do is change the positions of the images independently and also make sure that they conform to their proper dimensions.
I have tried naively shifting positions like this:
vec2 pos = uv;
pos.y = pos.y + 0.25;
texture2D(image, pos)
Which does shift the texture, but leads to a bunch of strange lines dragging:
I tried to get rid of them like this:
gl_FragColor = uv.y < 0.25 ? vec4(0.0,0.0,0.0,0.0) : result;
but it does nothing
You really need to decide what you want to happen when images are not the same size. What you probably want is for it to appear there's no image so check your UV coordinates and use 0,0,0,0 when outside of the image
//vec4 B = texture2D(image,uv);
vec4 getImage(sampler2D img, vec2 uv) {
if (uv.x < 0.0 || uv.x > 1.0 || uv.y < 0.0 || uv.y > 1.0) {
return vec4(0);
}
return texture2D(img, uv);
}
vec4 B = getImage(image, uv);
As for a standard way to size/skew/translate images use a matrix
uniform mat4 u_imageMatrix;
...
vec2 newUv = u_imageMatrix * vec4(uv, 0, 1).xy;
An example of implementing canvas 2d's drawImage using a texture matrix.
In general though I don't think most image manipulations programs/library would try to do everything in the shader. Rather they'd build up the image with very very simple primitives. My best guess would be they use a shader that's just A * MASK then draw B followed by A * MASK with blending on.
To put it another way, if you have 30 layers in photoshop they wouldn't generate a single shader that computes the final image in one giant shader taking in all 30 layers at once. Instead each layer would be applied on its own with simpler shaders.
I also would expect them to create an texture for the mask instead of using math in the shader. That way the mask can be arbitrarily complex, not just a 2 stop ramp.
Note I'm not saying you're doing it wrong. You're free to do whatever you want. I'm only saying I suspect that if you want to build a generic image manipulation library you'll have more success with smaller building blocks you combine rather than trying to do more complex things in shaders.
ps: I think getImage can be simplified to
vec4 getImage(sampler2D img, vec2 uv) {
return texture2D(img, uv) * step(0.0, uv) * step(-1.0, -uv);
}
I would like to be able to pass more per-vertex-data to my own custom shaders in kivy than the usual vertex coords + texture coords. Specifically, I would like to pass a value that says which animation frame should be used in selecting the texture coords.
I found an example (http://shadowmint.blogspot.com/2013/10/kivy-textured-quad-easy-right-no.html), and succeeded in changing the format of the vertices passed to a mesh using an argument to the constructor of the Mesh, like this:
Mesh(mode = 'triangles', fmt=[('v_pos', 2, 'float'),
('v_tex0', 2, 'float'),
('v_frame_i', 1, 'float')]
I can then set the vertices to be drawn to something like this:
vertices = [x-r,y-r, uvpos[0],uvpos[1],animationFrame,
x-r,y+r, uvpos[0],uvpos[1]+uvsize[1],animationFrame,
x+r,y-r, uvpos[0]+uvsize[0],uvpos[1],animationFrame,
x+r,y+r, uvpos[0]+uvsize[0],uvpos[1]+uvsize[1],animationFrame,
x+r,y-r, uvpos[0]+uvsize[0],uvpos[1],animationFrame,
x-r,y+r, uvpos[0],uvpos[1]+uvsize[1],animationFrame,
]
..this works well when I run in Ubuntu, but when I run on my android device the drawn texture either doesn't draw, or it looks like the vertex or texture coordinate data is corrupt / not aligned or something.
Here is my shader code in case that is relevant. Again, this all behaves as I want it to when I run in ubuntu, but not when I run on android device.
---VERTEX SHADER---
#ifdef GL_ES
precision highp float;
#endif
/* vertex attributes */
attribute vec2 v_pos;
attribute vec2 v_tex0;
attribute float v_frame_i; // for animation
/* uniform variables */
uniform mat4 modelview_mat;
uniform mat4 projection_mat;
uniform vec4 color;
uniform float opacity;
uniform float sqrtNumFrames; // the width/height of the sprite-sheet
uniform float frameWidth;
/* Outputs to the fragment shader */
varying vec4 frag_color;
varying vec2 tc;
void main() {
frag_color = color * vec4(1.0, 1.0, 1.0, opacity);
gl_Position = projection_mat * modelview_mat * vec4(v_pos.xy, 0.0, 1.0);
float f = round(v_frame_i);
tc = v_tex0;
float w = (1.0/sqrtNumFrames);
tc *= w;
tc.x += w*mod(f,sqrtNumFrames); //////////// I think that the problem might
tc.y += w*round(f / sqrtNumFrames); ///////////// be related to this code, here?
}
---FRAGMENT SHADER---
#ifdef GL_ES
precision highp float;
#endif
/* Outputs from the vertex shader */
varying vec4 frag_color;
varying vec2 tc;
/* uniform texture samplers */
uniform sampler2D texture0;
uniform vec2 player_pos;
uniform vec2 window_size; // in pixels
void main (void){
gl_FragColor = frag_color * texture2D(texture0, tc);
}
I wonder if it may have to do with a version of GLSL and int / float math (in particular in identifying which image from the sprite sheet to draw, see the comments in the glsl code. One version is running on my desktop and another on the device?
Any suggestions for things to experiment with would be much appreciated!
After looking at the log from the running version on the android device (a Moto X phone), I saw that the custom shader was not linking. This appeared to be due to the use of the function round(x), which I replaced with floor(x+0.5) in both cases, and the shader now works on the phone and my desktop properly.
I think the problem is that the version of GLSL supported on the phone and on my PC are different..but I am not 100% certain about this.
I have modified a working grey_scale fragment shader to change all of the non-transparent pixels to purple. For some reason it works great on iOS but on Android the transparent parts of the image are visible (although mostly transparent). Can anybody see what I am doing wrong?
The working grey_scale shader contains the lines that are commented out. I added the last line.
#ifdef GL_ES
precision mediump float;
#endif
varying vec4 v_fragmentColor;
varying vec2 v_texCoord;
void main(void)
{
vec4 c = texture2D(CC_Texture0, v_texCoord);
//gl_FragColor.xyz = vec3(0.2126*c.r + 0.7152*c.g + 0.0722*c.b);
//gl_FragColor.w = c.w;
gl_FragColor = vec4(0.5, 0.0, 0.4, c.w);
}
It turns out that I need to apply the alpha to all of the colors:
gl_FragColor = vec4(0.5*c.w, 0.0*c.w, 0.4*c.w, c.w);
I am sure why the old method didn't work. The image uses premultiplied alpha, so I guess the cocos render assumes it (or was somehow told to use it by TexturePacker). So the shader needs to re-multiply the color values by the alpha in order to behave the same way?
A screenshot will be worth a thousands of words.
Are you sure GL_ES is defined? If not, you will have unspecified precision for float (according to specs, it is unspecified for fragment shaders) which can lead even to compilation errors on certain OpenGL ES drivers. I'd play around with that float precision first and see the difference on Android.
I'm not sure about vec3() w/ single parameter. Does the following notation work:
float a = 0.2126*c.r + 0.7152*c.g + 0.0722*c.b;
gl_FragColor.xyz = vec3(a, a, a);
Or this, with a single assignment of gl_FragColor:
float a = 0.2126*c.r + 0.7152*c.g + 0.0722*c.b;
gl_FragColor.xyz = vec4(a, a, a, c.w);
On a side note, you may want to declare numeric literals as constants in order not to consume uniform space. More info: Declaring constants instead of literals in vertex shader. Standard practice, or needless rigor?
Like this:
const float COEFF_R = 0.2126;
const float COEFF_G = 0.7152;
const float COEFF_B = 0.0722;
float a = COEFF_R*c.r + COEFF_G*c.g + COEFF_B*c.b;
gl_FragColor.xyz = vec4(a, a, a, c.w);
I have the following code in the Fragment Shader:
precision lowp float;
varying vec2 v_texCoord;
uniform sampler2D s_texture;
uniform bool color_tint;
uniform float color_tint_amount;
uniform vec4 color_tint_color;
void main(){
float gradDistance;
vec4 texColor, gradColor;
texColor = texture2D(s_texture, v_texCoord);
if (color_tint){
gradColor = color_tint_color;
gradColor.a = texColor.a;
texColor = gradColor * color_tint_amount + texColor * (1.0 - color_tint_amount);
}
gl_FragColor = texColor;
}
The code works fine, but it is interesting that even all color_tint I passed in is false, the above code still cause serious drag in performance. When comparing to:
void main(){
float gradDistance;
vec4 texColor, gradColor;
texColor = texture2D(s_texture, v_texCoord);
if (false){
gradColor = color_tint_color;
gradColor.a = texColor.a;
texColor = gradColor * color_tint_amount + texColor * (1.0 - color_tint_amount);
}
gl_FragColor = texColor;
}
Which the later one can achieve 40+ fps while the first one is about 18 fps. I double checked and all color_tint passed in the first one are false so the block should never executed.
BTW, I am programming the above in Android 2.2 using GLES20.
Could any expert know what's wrong with the shader?
I am not an expert in fragment shaders, but I assume the second one would be faster because the entire if statement could be removed at compile time because it is never true. In the first one it can't tell that color_tint is always false until runtime so will need to check that and branch every time. Branches can be expensive, especially on graphics hardware that is often designed for predictable serial programming.
I suggest you try rewriting it to be branchless - Darren's answer has some good suggestions in that direction.
Branches are very slow on fragment shaders avoid them if possible. Use color_tint_amount of 0 for no tint. Premultiply the color_tint_color and save a multiply per pixel. Make color_tint_amount = 1.0 - color_tint_amount. (so now 1.0 means no gradColor) These shaders and run millions upon millions of times a second, you have to save every cycle you can.