I have to simulate the motion of some objects, so I have created a SurfaceView on which I draw them with a dedicated Thread. Every loop I call canvas.drawColor() to clear all the previous objects' positions and then draw the new states. Everything works fine and the frame rate is decent.
The problem is: what if I want to draw the trails of the objects' trajectories? In that case I have to store the positions of every object and, at every loop, redraw all the past positions, which amount to hundreds of points. This task keeps the frame rate low, and it seems absurd to me that the only way is to redraw the same points every time! Is there a way to keep the points painted on the canvas instead of erasing them with canvas.drawColor() at every loop (which is necessary for other tasks)?
Sort of.
The SurfaceView's Surface uses multiple buffers. If it's double-buffered, and you don't clear the screen every frame, then you'll have the rendering from all the odd-numbered frames in one buffer, and all the even-numbered frames in the other. Every time you draw a new frame, it'll flip to the other buffer, and half of your positions will disappear (looks like everything is vibrating).
You could, on each frame, draw each object at its current position and its previous position. That way both frames would get every object position.
The practical problem with this idea is that you don't know how many buffers the Surface is using. If it's triple-buffered (which is very possible) then you would need to draw the current, previous, and previous-previous positions to ensure that each buffer had every position. Higher numbers of buffers are theoretically possible but unlikely.
Having said all this, you don't want to pursue this approach for a simple reason: when you lock the canvas, you are agreeing to modify every pixel in the dirty area. If you don't, the results are unpredictable, and your app could break weirdly in a future version of the operating system.
The best way to do what you want is to draw onto an off-screen Bitmap and then blit the entire thing onto the Surface. It's a huge waste at first, since you're copying a screen-sized bitmap for just a couple of objects, but as the trail points accumulate, the reduced number of draw calls quickly starts to win.
Create a Bitmap that's the same size as the Surface, then create a Canvas using the constructor that takes a Bitmap. Do all your drawing through this Canvas. When you want to update the screen, use a drawBitmap() method on the SurfaceView's Canvas.
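For concreteness, here is a minimal sketch of that setup, assuming it lives in the SurfaceView's drawing thread; holder, surfaceWidth, surfaceHeight, paint and the object coordinates are placeholder names, not anything from the question:

    // Done once, when the surface is created: a bitmap the size of the
    // Surface plus a Canvas that draws into it.
    Bitmap offscreen = Bitmap.createBitmap(surfaceWidth, surfaceHeight,
            Bitmap.Config.ARGB_8888);
    Canvas offscreenCanvas = new Canvas(offscreen);

    // Each loop: draw only the new state. Old trail points persist because
    // the off-screen bitmap is never cleared with drawColor().
    offscreenCanvas.drawCircle(objectX, objectY, 4f, paint);

    // Blit the whole off-screen bitmap onto the Surface in one call.
    Canvas c = holder.lockCanvas();
    if (c != null) {
        c.drawBitmap(offscreen, 0, 0, null);
        holder.unlockCanvasAndPost(c);
    }

This way every pixel of the Surface is still touched each frame, as the locking contract requires, but the per-frame cost is one bitmap copy rather than hundreds of point draws.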
I recommend against using software scaling due to the performance cost -- make sure you're doing a 1:1 copy. You can use the setFixedSize() call on the SurfaceView surface to make it a specific size if that's helpful -- for devices with higher pixel densities it can improve your frame rates and reduce battery usage.
I created a voxel world using OpenGL ES 2.0, using a VBO to store a basic cube and a different position matrix for each cube. I am able to get 30 fps on my Galaxy S3 when 500-600 cubes are being rendered, but with anything more than 1500 cubes it won't run faster than 8 fps. This is unacceptable, because the voxel world should be able to handle more than 5,000 voxels rendered at a stable 30 fps. I have played other mobile games on my phone that run at good frame rates and render far more than 5,000 blocks at a time. What kinds of techniques would be best for getting good performance?
Here is what I have set up in more detail:
There is one VBO containing vertex information for a basic cube.
Each block has its own matrix that is translated to the block's position in world space (This matrix is calculated only once when the block is created). The block calls glDrawArrays to draw the cube using its position matrix. Unfortunately this means there are thousands of calls to glDrawArrays in each frame.
Is there a better technique for this? I don't know how to group all the blocks into one single call to glDrawArrays, because that would require a huge VBO allocation to hold the vertex data for every single cube, and it is impossible to know how much space the VBO will need before drawing. What I was thinking is to allocate a VBO for every 500 or so blocks, so that whenever more space is needed a new VBO can be created. This way not too much extra space is wasted, since each VBO only holds enough for 500 blocks, and with 5,000 blocks in the world there would be only 10 calls to glDrawArrays instead of thousands.
Another idea I have is that instead of a VBO for the cube, I could make a VBO for a quad and use a transformation matrix on each quad. This would require even more calls to glDrawArrays, since I would have to call it for each face of the cube, but the plus side is that I could remove the faces that already have a block next to them. At floor level, each block has 4 blocks surrounding it, so those 4 faces don't actually need to be drawn. That would save drawing 4 quads per block, but it would more than double the number of glDrawArrays calls. To reduce the number of calls I could create a new VBO for every 500 or so quads and add/remove quads to the current VBOs whenever necessary. This would reduce the glDrawArrays calls, but it would mean grouping each quad by its texture, which is another issue: if I have to create a VBO for each texture, I would be allocating a lot of unnecessary extra space, because there might be just one block using a certain texture and I would still end up allocating space for 500 blocks for that texture.
These are my thoughts on some of the methods I can think of to optimise the rendering, but I don't think any of these techniques will drastically improve the fps of the game, because every method comes with its own issues. Is there anything that I have not thought of that could be a better solution?
EDIT: I switched to rendering quads instead of cubes, because this way I can skip the faces that are not visible. After that I also added frustum culling, so that only blocks inside the frustum are drawn. This increased performance enough that I can render a decent-sized world at 30 fps now. But I think there is still a lot of room for improvement, because there are currently 23,000 calls to glDrawArrays(GL_TRIANGLES) (one for each quad rendered on screen). Would switching to glDrawArrays(GL_TRIANGLE_STRIP) make any real difference? Also, creating VBOs that hold 1,000 quads each instead of just one quad is a possibility, but that would mean allocating a lot more space in the VBOs. (Right now there is only one quad stored in the VBO, transformed by a matrix to its position/rotation.)
If using octrees (which is definitely THE WAY) does not suit you, you can optimize the code that issues the VBO draw calls.
In my work I started with a scene rendering at a 3 fps rate; just by optimizing the OpenGL calls and context switches, it now runs at 53 fps (which is quite fine considering the starting point).
So, try not to change any register inside the GPU between calls:
Order all the objects that share a shader so you render them together with a single glUseProgram call (see the sketch after this list).
Order objects by transparency, so you draw translucent objects only at the end.
Draw objects in such a fashion that fragments are shaded only once: if one object is behind another, draw the front object first, because the depth test is faster than fragment calculation.
Use shaders without "discard;", which is costly for the GPU to process.
Use reversed loops to get a little bit of CPU speed.
Don't select a texture if it is already the one selected in the GPU (a CPU 'if' is less costly than a GPU register change).
Try not to update the shader attributes if there is no need to (again, the CPU 'if' is less costly).
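Here is a rough sketch of the sorting-and-caching pattern behind the first and last few points, in Java with GLES20; Renderable and its fields are made-up names, not from the question:

    import java.util.Comparator;
    import java.util.List;
    import android.opengl.GLES20;

    // 'Renderable' is a hypothetical object holding a GL program id, a
    // texture id and a draw() method.
    void drawAll(List<Renderable> items) {
        // Group identical programs and textures together so state changes
        // happen once per group instead of once per object.
        items.sort(Comparator.comparingInt((Renderable r) -> r.program)
                             .thenComparingInt(r -> r.texture));
        int currentProgram = -1;
        int currentTexture = -1;
        for (Renderable r : items) {
            if (r.program != currentProgram) {   // CPU 'if' instead of a
                GLES20.glUseProgram(r.program);  // redundant GPU change
                currentProgram = r.program;
            }
            if (r.texture != currentTexture) {
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, r.texture);
                currentTexture = r.texture;
            }
            r.draw();
        }
    }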
If you post some pieces of code, I can help you better.
I am currently implementing a voxel world in Java on a normal PC with OpenGL 4.x.
At the beginning I had the same issue, but then I followed a very basic tutorial: https://sites.google.com/site/letsmakeavoxelengine/
With one render call per chunk there is no problem rendering 10 chunks of 32*32*32 blocks (FPS > 30). You should load the chunk and add only those faces which are not occluded by other faces (i.e. the ones actually visible to the player) to an array, which is then uploaded to a VBO (a sketch follows below). That way you have one render call per chunk with the minimum amount of faces.
In 2D it looks like this:
_ _ _
|B B B|
|B B |
|B B B|
- - -
There is no need to draw the faces between the outer faces. In addition you can use frustum culling: How to check if an object lies outside the clipping volume in OpenGL?
So you only need to make a render call for those chunks which are actually inside your frustum. Do not render chunks behind the camera: OpenGL would run a lot of calculations for all the vertices of the chunk, only for the chunk to turn out not visible, so why render it in the first place? This check can happen in your Java code.
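As a hedged illustration of the face-skipping idea (not the tutorial's actual code), the meshing step could look roughly like this in Java, where Chunk.isSolid() is assumed to return false for coordinates outside the chunk and addFace() appends one quad's vertices:

    import java.util.ArrayList;
    import java.util.List;

    // Emit a face only when the neighbouring cell is empty, so interior
    // faces never reach the VBO.
    List<Float> buildChunkMesh(Chunk chunk) {
        List<Float> vertices = new ArrayList<>();
        for (int x = 0; x < Chunk.SIZE; x++)
            for (int y = 0; y < Chunk.SIZE; y++)
                for (int z = 0; z < Chunk.SIZE; z++) {
                    if (!chunk.isSolid(x, y, z)) continue;
                    if (!chunk.isSolid(x + 1, y, z)) addFace(vertices, x, y, z, Face.RIGHT);
                    if (!chunk.isSolid(x - 1, y, z)) addFace(vertices, x, y, z, Face.LEFT);
                    if (!chunk.isSolid(x, y + 1, z)) addFace(vertices, x, y, z, Face.TOP);
                    if (!chunk.isSolid(x, y - 1, z)) addFace(vertices, x, y, z, Face.BOTTOM);
                    if (!chunk.isSolid(x, y, z + 1)) addFace(vertices, x, y, z, Face.FRONT);
                    if (!chunk.isSolid(x, y, z - 1)) addFace(vertices, x, y, z, Face.BACK);
                }
        // Upload 'vertices' to one VBO: the whole chunk then renders with a
        // single glDrawArrays call instead of thousands.
        return vertices;
    }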
A third optimization could be deferred shading: http://en.wikipedia.org/wiki/Deferred_shading
As far as I know, fragments can be shaded before the depth test throws away the triangles/faces occluded by others, so you can speed up your shader with deferred shading, since you only shade the fragments that actually pass the depth test.
There are a lot more ways to optimize voxel rendering, but for me these are the most basic operations. The tutorial behind the first link isn't finished yet, but it shows a lot of ideas for optimizing voxel rendering.
Edit:
If you want to use textures, with a different texture for each cube, I recommend placing all the textures into one big texture atlas so you never need to swap textures: a simple texture lookup is much faster than swapping a texture (glBindTexture(..)), doing a lookup, and then swapping back to the first texture. Use one big texture and apply the right UV coordinates to your vertices.
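A small sketch of the atlas lookup, assuming a square atlas divided into a 16x16 grid of equally sized tile images (all names here are illustrative):

    static final int ATLAS_TILES = 16;            // 16x16 grid of tile images
    static final float TILE_UV = 1.0f / ATLAS_TILES;

    // Returns {u0, v0, u1, v1} for the given tile index; apply these UVs to
    // the quad's vertices instead of binding a different texture.
    float[] atlasUv(int tileIndex) {
        float u0 = (tileIndex % ATLAS_TILES) * TILE_UV;
        float v0 = (tileIndex / ATLAS_TILES) * TILE_UV;
        return new float[] { u0, v0, u0 + TILE_UV, v0 + TILE_UV };
    }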
You should use BSP octrees to discard big blocks of off-screen cubes.
You divide the world into eight "space cubes", one for each octant.
Then you check whether the camera can see anything inside each cube; if it can't, you discard all the blocks in that section (which can give up to an 8x speed-up). Then, inside each visible cube, you divide again into 8 sections and check their visibility again, and so on, speeding up both the checks and the renders.
http://en.wikipedia.org/wiki/Octree
http://i.ytimg.com/vi/S-oIeUiw2UY/hqdefault.jpg
Octrees can be accelerated using "portals" (and I don't mean GLaDOS ;) ), which discard voxels and octree nodes depending on what is visible through doors and windows, but that is only good for interiors.
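The recursive visibility check could look roughly like this in Java; OctreeNode, Frustum.intersects() and renderBlocks() are hypothetical names standing in for your own scene structures:

    // Skip whole subtrees of blocks as soon as their bounding cube falls
    // outside the view frustum.
    void renderVisible(OctreeNode node, Frustum frustum) {
        if (!frustum.intersects(node.bounds)) {
            return;                  // whole subtree is off screen: discard it
        }
        if (node.isLeaf()) {
            node.renderBlocks();     // only visible leaves issue draw calls
        } else {
            for (OctreeNode child : node.children) {   // the eight sub-cubes
                renderVisible(child, frustum);
            }
        }
    }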
I am making a 2D graphical app that will display planets. I say 2D because the majority of the app will be 2D. However, I want to render some 3D objects offscreen into dynamic sprites (to a texture), with transparent (possibly translucent) areas, and subsequently render those textures to the active screen as 2D textured quads. Rendering directly to the screen as 3D objects is not optimal in this case, because it would require me to implement some sort of 3D picking, and I am not that advanced in math yet. Note also that the main screen render will be orthographic, while the offscreen render will be perspective.
How can I accomplish this (the general idea, no need for specifics), and what would be the most efficient way to do it? Would this reduce support for a wide variety of devices? Also, if the 3D sprite renderings were refreshed every frame (say, rotated by fine amounts), would the continuous unloading/reloading of textures to memory kill frame rates? I suppose some scenes could have as many as 10 of these 3D offscreen sprites.
Thanks for the help
If you really must use offscreen rendering, just search for FBO (framebuffer object) and attach a texture to it, then use the texture in your main view as 2D. It is quite a straightforward procedure, but it might decrease speed. You will probably not be able to do any multithreading on it, so you should create just one FBO. Its dimensions will probably have to be a power of 2, so the resolution might differ from what you want. This procedure does not continually load/unload anything; the data is allocated when the texture is created, and GL draws to and reads from it directly. The largest drawback here will be memory: you would create as many as 10 of these textures just to draw on them and present them once.
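For reference, the GLES 2.0 setup is roughly the following sketch, where 'size' is an assumed power-of-two dimension:

    import android.opengl.GLES20;

    int[] ids = new int[1];

    // Create the texture that will receive the offscreen rendering.
    GLES20.glGenTextures(1, ids, 0);
    int textureId = ids[0];
    GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureId);
    GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA, size, size,
            0, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null);
    GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
            GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
    GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
            GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);

    // Create the FBO and attach the texture as its color buffer.
    GLES20.glGenFramebuffers(1, ids, 0);
    int fboId = ids[0];
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fboId);
    GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER,
            GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D, textureId, 0);

    // Render the 3D planet here (perspective projection), then switch back
    // and draw textureId on a 2D quad in the orthographic main pass.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);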
It might be very easy to place these objects at a specific spot in your main buffer, though: do all the logic as if you wanted to draw a full-screen planet, but use the "viewport" method to place it on a specific part of the screen.
If those planet images will be updated only on user request (you don't want to draw them every frame), then I suggest you try a combination of both: create one FBO with a texture the same size as or larger than the main view, and draw all the planets to this single texture using the "viewport" method. Then you can update any one of them; just don't clear the whole buffer, rather draw a clear rect over the specific part of the buffer/texture. And keep drawing the whole texture to the main buffer.
Is that possible? I need to draw a photo background with moving objects on top of it, at at least 35 fps. It must take a lot of resources to redraw that whole background every frame, even for a short time? (This is for a live wallpaper.)
I tried to redraw the background only at each moving object's Rect, but that only makes those parts of the screen flicker.
OK, it seems there is no big gain anyway in doing these optimizations.
The flicker is of course because of double buffering: there are two buffers to erase, hence the flickering.
I'm trying to determine the "best" way to scroll a background composed of tiled Bitmaps on an Android SurfaceView. I've actually been successful in doing so, but wanted to determine whether there is a more efficient technique, or whether my technique might not work on all Android phones.
Basically, I create a new, mutable Bitmap slightly larger than the dimensions of my SurfaceView. Specifically, my Bitmap accommodates an extra line of tiles on the top, bottom, left, and right. I create a Canvas around my new Bitmap and draw my bitmap tiles to it. Then I can scroll up to one tile length in any direction simply by drawing a "SurfaceView-sized" subset of my background Bitmap to the SurfaceHolder's canvas.
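In code, the technique I'm describing looks roughly like this (viewWidth, viewHeight, tileSize and the scroll offsets are my field names):

    import android.graphics.Bitmap;
    import android.graphics.Canvas;
    import android.graphics.Rect;

    // Built once: a background one tile larger than the view on every side.
    Bitmap background = Bitmap.createBitmap(viewWidth + 2 * tileSize,
            viewHeight + 2 * tileSize, Bitmap.Config.RGB_565);
    Canvas backgroundCanvas = new Canvas(background);
    // ...draw the tiles into backgroundCanvas here...

    // Each frame: copy a SurfaceView-sized window of the big bitmap.
    Rect src = new Rect(scrollX, scrollY,
            scrollX + viewWidth, scrollY + viewHeight);
    Rect dst = new Rect(0, 0, viewWidth, viewHeight);
    surfaceCanvas.drawBitmap(background, src, dst, null);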
My questions are:
Is there a better bit blit technique than drawing a background bitmap to the canvas of my SurfaceHolder?
What is the best course of action when I scroll to the edge of my background bitmap, and wish to shift the map one tile length?
As I see it, my options are to:
a. Redraw all the tiles in my background individually, shifted a tile length in one direction. (This strikes me as being inefficient, as it would entail many small Bitmap draws).
b. Simply make the background bitmap so large that it will encompass the entire scrolling world. (This could require an extremely large bitmap, yet it would only need to be created once.)
c. Copy the background bitmap, draw it onto itself but shifted a tile length in the direction we are scrolling, and draw the newly revealed row or column of tiles with a few individual bitmap draws. (Here I am making the assumption that one large bitmap draw is more efficient than multiple small ones covering the same expanse.)
Thank you for reading all this, and I would be most grateful for any advice.
I originally used a technique similar to yours in my 'Box Fox' platformer game and RTS, but found it caused quite noticeable delays if you scroll enough that the bitmap needs to be redrawn.
My current method in these games is similar to your Option C. I draw my tiled map layers onto a grid of big bitmaps (about 7x7) covering an area larger than the screen. When the user scrolls onto the edge of this grid, I shift all the bitmaps in the grid over (moving the end bitmaps to the front), change the offset of the grid, and then just redraw the new edge.
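A rough sketch of that shift for a scroll to the right, with made-up field names (grid, gridOffsetX, bitmapWidth):

    // Each row's leftmost bitmap is recycled as the new rightmost one; only
    // the new edge column then needs redrawing.
    void shiftGridLeft(Bitmap[][] grid) {
        for (Bitmap[] row : grid) {
            Bitmap recycled = row[0];
            System.arraycopy(row, 1, row, 0, row.length - 1);
            row[row.length - 1] = recycled;
        }
        gridOffsetX += bitmapWidth;
        // ...redraw the tiles that fall into the new rightmost column...
    }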
I'm not quite sure which is faster with software rendering (your Option C or my current method). I think my method may be faster if you ever switch to OpenGL rendering, as you wouldn't have to upload as much texture data to the graphics card while the user scrolls.
I wouldn't recommend Option A because, as you suggest, the hundreds of small bitmap draws for a tiled map kill performance, and it gets pretty bad on larger screens. Option B may not even be possible on many devices, as it's quite easy to get a 'bitmap size exceeds VM budget' error, since the heap limit is set quite low on many phones.
Also, if you don't need transparency on your map/background, try to use RGB_565 bitmaps, as they are quite a lot faster to draw in software and use less memory.
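For example (tileSize is just a placeholder):

    // An opaque background needs no alpha channel; RGB_565 uses 2 bytes per
    // pixel instead of ARGB_8888's 4.
    Bitmap tile = Bitmap.createBitmap(tileSize, tileSize,
            Bitmap.Config.RGB_565);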
By the way, I am capped at 60 fps on both my phone and my 10" tablet in my RTS with the method above, rendered in software, and I can scroll across the map smoothly. So you can definitely get decent speed out of the Android software renderer. I have a 2D OpenGL wrapper built for my game but haven't yet needed to switch to it.
My solution in a mapping app relies on a two-level cache. First, tile objects are created with a bitmap and a position; these are stored either on disk or in a Vector (synchronization is important for me, with multithreaded HTTP comms all over the place).
When I need to draw the background, I detect the visible area and get a list of all the tiles I need (this is heavily optimised as it gets called so often), then either pull the tiles from memory or load them from disk. I get very reasonable performance even on slightly older phones, and nice smooth scrolling with no hiccups.
As a caveat, I allow tiles not to be ready and swap in a loading image until they are; I don't know whether this would work for you, but if you have all the tiles bundled in the APK you should be fine.
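A hedged sketch of that two-level lookup, using android.util.LruCache for the memory level; loadFromDisk(), requestDownload() and loadingBitmap are hypothetical helpers:

    import android.graphics.Bitmap;
    import android.util.LruCache;

    // Level 1 is memory, level 2 is disk; anything missing from both is
    // fetched over HTTP while a placeholder is shown.
    final LruCache<String, Bitmap> memoryCache = new LruCache<>(64);

    synchronized Bitmap getTile(String key) {
        Bitmap tile = memoryCache.get(key);
        if (tile != null) return tile;       // level 1: memory
        tile = loadFromDisk(key);            // level 2: disk
        if (tile != null) {
            memoryCache.put(key, tile);
            return tile;
        }
        requestDownload(key);                // background HTTP fetch
        return loadingBitmap;                // placeholder until it arrives
    }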
I think one efficient way to do this would be to use canvas.translate.
On the first draw the entire canvas has to be filled with tiles. Newer Android phones can do this easily and quickly.
When the background is scrolled I would call canvas.translate(scrollX, scrollY) and then draw the tiles one by one to fill the gaps, BUT I would use
canvas.drawBitmap(tileImage[i], fromRect, toRect, null), which draws only the parts of the tiles that need to be shown, by setting fromRect and toRect to correspond to scrollX and scrollY.
So everything would be done with mathematics, and no new bitmaps would be created for the background, saving some memory.
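A small sketch of that partial draw, for a tile entering from the right edge; visibleWidth, tileSize, screenWidth and tileImage are illustrative:

    import android.graphics.Rect;

    // fromRect selects the sliver of the tile image that has scrolled into
    // view; toRect is where that sliver lands on screen.
    Rect fromRect = new Rect(0, 0, visibleWidth, tileSize);
    Rect toRect = new Rect(screenWidth - visibleWidth, 0,
            screenWidth, tileSize);
    canvas.drawBitmap(tileImage[i], fromRect, toRect, null);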
EDIT:
However, there is a problem using canvas.translate with SurfaceView: it is double-buffered, and canvas.translate will translate only one buffer at a time, not both, so this alternation of buffers has to be taken into account when you depend on the SurfaceView to preserve the drawn image.
I am using your original method to draw a perspective-scrolling background. I came up with this idea entirely by accident a few days ago while messing around with an easy technique for a perspective-scrolling star field simulation. The app can be found here: Aurora2D.apk
Just tilt your device or shake it to make the background scroll (excuse the two bouncing sprites; they are there to help me work out an efficient method to display trails). Please let me know if you find a better way to do it, since I have coded several different methods over the years and this one seems to be superior. Simply mail me if you want to compare code.
I want to draw a graph that updates in real time (it grows from the right). The most efficient way I can think of to do that would be to copy everything in x[0 .. width-2] left by one pixel, then draw the new value at x[width-1].
I have little experience with Android, but from what I can tell, Canvas doesn't operate on its contents at all. Do I need to repaint the entire screen each time? This involves scaling and smoothing, so I'm worried it will be slow.
Should I draw into a byte[][] and then use that to paint to the screen (shifting the contents of my buffer left each time)?
If your graph is bounded, try rendering all of it once to an Image, and then blit the relevant parts from that Image to your Canvas. Try to avoid actually "moving" pixels in the buffer, as that may introduce dependencies between your reads and writes and can really kill performance. It may actually be better to copy from one buffer to another and alternate which one gets blitted to the screen. Finally, if you end up having to work on pixels manually, make sure you process the image in lines rather than columns, starting from the beginning of each line, to help with caching.
Regarding performance, without profiling we cannot say.
It may be that line drawing is hardware accelerated on your target phone, and you should draw the graph from scratch using line-drawing primitives each frame.
On the other hand, the straightforward pixel manipulation of an image buffer would be:
Create an image that is the right size and clear it to a "background_color". This image needs to have setpixel() functionality.
Have an array of values that records the y for each x (time), so for any column you know where you last plotted your graph.
Treat this "chart_image" and "chart_array" as a circular buffer. For each time step:
    Y = ...;                                   // new sample value
    X = time_since_start % chart_width;        // column to plot into
    chart_image.setpixel(X, chart_array[X], background_color); // erase this column's old point
    chart_array[X] = Y;
    chart_image.setpixel(X, chart_array[X], foreground_color); // plot the new point
And now you need to blit it. You need to blit the image twice:
    X = time_since_start % chart_width;

    // The newest data (columns 0..X) sits on the left of chart_image but
    // must land on the right side of the output:
    blit(out_x + chart_width - X, out_y,  // destination coordinates
         chart_image,
         0, 0,                            // top left of the part to copy
         X, chart_height);                // bottom right of the part to copy

    // The oldest data (columns X..chart_width-1) sits on the right of
    // chart_image but must land on the left side of the output:
    blit(out_x, out_y,
         chart_image,
         X, 0,
         chart_width, chart_height);
Things get trickier if you want to use lines rather than individual pixels, but a drawline() instead of a setpixel() can make that work with this approach too.
(Apologies for not knowing the Android APIs; but the approach is generic.)
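For what it's worth, here is a hedged Android translation of the two blits, using Canvas.drawBitmap with source and destination Rects and mirroring the pseudocode names (chartBitmap, chartWidth, chartHeight, outX, outY, and x = timeSinceStart % chartWidth):

    import android.graphics.Canvas;
    import android.graphics.Rect;

    int oldWidth = chartWidth - x;  // columns x..chartWidth-1 hold the oldest data

    // Oldest data: right side of chartBitmap -> left side of the output.
    canvas.drawBitmap(chartBitmap,
            new Rect(x, 0, chartWidth, chartHeight),
            new Rect(outX, outY, outX + oldWidth, outY + chartHeight),
            null);

    // Newest data: left side of chartBitmap -> right side of the output.
    canvas.drawBitmap(chartBitmap,
            new Rect(0, 0, x, chartHeight),
            new Rect(outX + oldWidth, outY, outX + chartWidth, outY + chartHeight),
            null);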
Just a thought, which you may have considered already, but I wouldn't shift the contents of the buffer; I'd just try using it as a circular buffer. Keep an index to the current column, and once you've wrapped around to the left-most column again, you can draw to the destination in two segments: first what is to the right of the current column, then what is to the left, including the most recently filled column. This way you never have to shift anything around, and each screen refresh is just two blits (bitmap copies), one per segment. If that is too slow, you could still paint the segments into a second off-screen buffer before blitting the whole thing to the screen in one go. Surely one large blit to the screen is fairly quick, no?
Since I take it for granted that you are storing the graph data in memory, redrawing it shouldn't be a problem. It's not intensive at all to redraw a set of points every frame. Shifting memory would be intensive: it moves everything, versus painting only what you need.
Worst case, since it's a function of time, there is only one value per column of the display, approximately 800 pixels/values in landscape, that the system has to draw. That's trivial.
Have you profiled this?
EDIT: Remember, the system doesn't draw each point straight to video memory; it only draws into memory and then uses its primitives. Don't think of it as iterating: draw a point, dump it to video memory, then again.