I am writing an Android application and I use RenderScript for a complex calculation (I am simulating a magnetic pendulum) that is performed on each pixel of a bitmap (using script.forEach_root(...)). Depending on the input parameters, this calculation can take anywhere from a tenth of a second to about 10 seconds or even more.
I want to keep the application responsive and allow users to change parameters without waiting. Therefore I would like to interrupt a running calculation based on user input on the Java side of the program. Hence, can I interrupt a forEach_root call?
I already tried some solutions but they either do not work or do not fully satisfy me:
Add a variable containing a cancel flag to the RenderScript and check its status in root: this does not work, because I cannot change variables using set while forEach_root is running (they are synchronized - I guess for good reasons).
Split the image up into multiple tiles: this is a possible solution and currently the one I favor most, yet it is only a workaround, because calculating a single tile might also take several seconds.
Since I am new to RenderScript, I am wondering whether there are other solutions I am not aware of.
Unfortunately, there is no simple way to cancel a running kernel in RenderScript. I think that your tiling approach is probably the best solution today; you should be able to implement it using http://developer.android.com/reference/android/renderscript/Script.LaunchOptions.html when you begin kernel execution.
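For illustration, here is a minimal sketch of tile-by-tile launching with LaunchOptions. ScriptC_pendulum, the allocation names, and the 64-pixel tile size are assumptions, not code from the question:

    import android.renderscript.Allocation;
    import android.renderscript.Script;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Launch the kernel one tile at a time so a cancel flag set from the UI
    // thread can be honored between tiles. ScriptC_pendulum, the allocations
    // and the 64-pixel tile size are illustrative placeholders.
    void renderTiled(ScriptC_pendulum script, Allocation in, Allocation out,
                     int width, int height, AtomicBoolean cancelled) {
        final int tile = 64;
        for (int y = 0; y < height && !cancelled.get(); y += tile) {
            for (int x = 0; x < width && !cancelled.get(); x += tile) {
                Script.LaunchOptions opts = new Script.LaunchOptions();
                opts.setX(x, Math.min(x + tile, width));   // half-open range [xstart, xend)
                opts.setY(y, Math.min(y + tile, height));
                script.forEach_root(in, out, opts);        // runs root() on this tile only
            }
        }
    }

Each forEach_root call still runs to completion, but the gaps between tile launches give you points where a user-initiated cancellation can take effect.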
Well, I have read a lot of answers to similar questions (even old ones from around 2013-2014), and I understand that it is not possible to know this exactly, since Android does not count hardware usage as usage of the app, besides other possible problems such as services etc.
At the moment I am trying to test the performance of an app using one protocol to reach a goal against the performance of the same app using another protocol (not well known by everyone) to reach the same goal. The default Android battery analyzer is good enough for me, since both cases are about 90% the same and I know how the protocols work.
My problem is that I am not sure which tool is best for measuring the mAh consumed by my app. I know that there are some external apps that show it, but I would prefer to use the default one. I believe this is important not only for me but also for other people who might have to compare different protocols.
I know that I can measure it programmatically, and I have done that too: I save the battery percentage when the app is opened and how much has been consumed by the time it is closed. But it is not an exact measure, since while my app is open other apps can do heavy work and add noise to what I am measuring, so I would prefer to use Android's battery analyzer.
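As a side note, on API 21+ the BatteryManager charge counter gives a finer-grained in-app reading than the percentage, though it still includes whatever else the device does in the meantime. A minimal sketch:

    import android.content.Context;
    import android.os.BatteryManager;

    // Read the battery's remaining charge in microampere-hours (API 21+).
    // Sample it when the app opens and again when it closes; the delta is
    // still only an estimate, since other apps drain the battery too.
    long readChargeUah(Context context) {
        BatteryManager bm =
                (BatteryManager) context.getSystemService(Context.BATTERY_SERVICE);
        return bm.getLongProperty(BatteryManager.BATTERY_PROPERTY_CHARGE_COUNTER);
    }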
Get a spare device. Charge it fully, then run the first protocol until shutdown without any other interaction (no YouTube or anything) and note how long the device lasted. Repeat with the other protocol. In my opinion that is a fair way to compare. Note that every device behaves differently, and it may or may not be possible to transfer this result to other devices, e.g. with different network chips, processors or even firmware versions.
For a fairer comparison, I think you should also compare how the protocols work, i.e. the number of interactions, payload size etc., because the power consumption can only ever be an estimate.
I use the SimpleBlobDetector of OpenCV to find a specific set of little features in images. I work in C++ native (JNI) on Android. On my newer faster phone, it works nicely.
However, on an older, slower phone it is far too slow. I have discovered that the slowest part is the thresholding. Simply modifying the three threshold parameters to speed things up makes the algorithm stop working.
I found a version of the source code on some web page and started modifying it.
I tried to use adaptive thresholding instead, followed by some erode and dilate for good measure, but I did not manage to get any reasonable results. Perhaps the parameters are way off?
adaptiveThreshold(mGr, mBin, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY_INV, 25, 30); // threshold = local 25x25 mean minus 30; INV turns pixels darker than that white
Mat kernel = getStructuringElement(MORPH_CROSS, Size(3,3), Point(1,1)); // 3x3 cross-shaped structuring element
erode(mBin, mBin, kernel); // erode once to remove isolated noise pixels
dilate(mBin, mBin, kernel, Point(-1,-1), 5); // dilate 5 times to grow the surviving blobs back
I get confused when there are so many parameters to fiddle with. I am also concerned that the image conditions will vary and then other parameters will be needed. I would want an "adaptive adaptive" thresholding, if you know what I mean.
What can I do to make it work, and what other ways are there to get higher speed?
Assuming you are dealing with video, rather than a random set of images, one technique to reduce the load on your device when doing this type of detection is to not do it on every frame.
For example, you might do it on every 10th frame rather than every frame.
You can experiment with different intervals to see if you can find one that reduces the load while still detecting quickly enough for your chosen use cases.
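A minimal sketch of that interval idea, assuming a per-frame callback; DETECT_EVERY_N, detectBlobs() and render() are placeholders for your actual JNI call and drawing code:

    import android.graphics.Bitmap;
    import android.graphics.PointF;
    import java.util.Collections;
    import java.util.List;

    // Run the expensive detection only every Nth frame and reuse the last
    // result in between. DETECT_EVERY_N and detectBlobs() are placeholders.
    private static final int DETECT_EVERY_N = 10;
    private int frameCount = 0;
    private List<PointF> lastBlobs = Collections.emptyList();

    void onFrame(Bitmap frame) {
        if (frameCount++ % DETECT_EVERY_N == 0) {
            lastBlobs = detectBlobs(frame);  // expensive native call
        }
        render(frame, lastBlobs);            // cheap: reuse cached detections
    }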
I'm always worried about optimization when it comes to game design and need to ask more experienced Kivy users about some concerns.
Which one is truly faster?
Let's say you store your graphics instructions in class attributes. If you're going to have a number of graphics updating on the screen every frame, but you're not adding anything to or removing anything from the canvas, ask_update seems to be the qualified choice.
Let's say you do add and remove graphics pieces often enough. Would it be better to just clear the canvas and canvas.add the stored instructions back?
or
Would it be better to call clear after every removal or addition? That would seem like a pain in the tail versus just clearing once and canvas.add-ing the graphics back.
Vectors....
How optimized are Vectors? Is the function/method a slow process? I'm just wondering, because I've used 3D engines in the past that had some slow calls, and it's usually the mathematical ones.
What is considered a good frame rate for a game app running on a hand-held device?
I also wonder about deleting instances. Does Kivy have a special call for deleting an instance, or would the usual del call (after running a cleanup function) plus Python's garbage collection be enough?
I'm researching now because I don't want to develop something only to realize I wasn't aware of the Kivy dos and don'ts.
Clearing the canvas is inefficient; don't do that unless you actually want to remove everything.
You don't need to call ask_update in general.
Kivy's Vectors aren't particularly optimised, they're just wrappers around lists, but this probably isn't actually a problem for you.
A good framerate target is 60fps.
You can look at KivEnt for a game engine with particularly good performance with Kivy.
I am trying to speed up my app start-up time (currently ~5 seconds due to slow Guice binding), and when I run traceview I'm seeing pretty big variations (as high as 30%) in measurements from executions of the same code.
I would assume this is from garbage collection differences, but the time spent in startGC according to traceview is completely insignificant.
This is particularly aggravating because it's very difficult to determine what the effects were of my optimizations when the measurements are so variable.
Why does this happen? Is there any way to make the measurements more consistent?
I suppose you are starting profiling from the code rather than turning it on manually? In any case, even if you use Debug.startMethodTracing and Debug.stopMethodTracing from a specific point in your code, you will get different measurements.
You can see here that Traceview disables the JIT, and I believe some other optimizations as well, so during profiling your code executes more slowly than without it. Your code's performance also depends on the overall system load: if another app is doing heavy work in the background, your code will take longer to execute. So you should definitely expect results that differ slightly between runs, and the start-up time cannot be constant.
Generally it is not so important how long your method executes in absolute terms, but how much CPU time it consumes compared to other methods.
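For what it's worth, scoping the trace to just the slow path keeps runs comparable. A minimal sketch, where initGuice() is a placeholder for the slow startup work:

    import android.app.Application;
    import android.os.Debug;

    public class MyApplication extends Application {
        @Override
        public void onCreate() {
            super.onCreate();
            Debug.startMethodTracing("startup"); // writes startup.trace to app storage
            initGuice();                         // placeholder for the slow Guice setup
            Debug.stopMethodTracing();
        }
    }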
Sounds like measurement is not your ultimate goal. Your ultimate goal is to make it faster.
The way to do that is by finding what activities are accounting for a large fraction of time, so you can find a better way to do them.
I said "finding", not "measuring", and I said "activities", not "routines".
To do this, it is only necessary to sample the program's state.
Many profilers collect a large number of samples of the program's state, but then they all fall into the same logic: they summarize, on the theory that all you want is measurements, and you don't really care what they are measurements of.
In fact, if rather than getting summaries you could examine some of the samples in detail, it would tell you a great deal more about how the program is spending its time.
What's more, if on as few as two (2) samples you could see the program pursuing some goal, and it was something you could improve significantly, you would see a significant speedup.
This process can be repeated several times, and that's how you can really optimize it.
The process is explained in more detail here, and there's a use case here.
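For illustration, here is one crude way to collect such samples by hand on the JVM; the interval and sample count are arbitrary choices:

    import java.util.Map;

    // Dump all thread stacks a few times while the app is busy, then read
    // the raw samples yourself instead of relying on summarized output.
    void sampleStacks(int samples, long intervalMs) throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMs);
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                System.out.println("--- " + e.getKey().getName());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }

Two or three samples that all land in the same call path are usually enough to point at the dominant activity.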
If you are doing any network-related activity on startup, then this tool can help you understand what is happening and how you might be able to optimize connections and caching: http://developer.att.com/developer/legalAgreementPage.jsp?passedItemId=9700312
My question is about game loop design in opengl.
I believe that a game loop can be split into 3 main steps:
accept input - it may come from a user's click on the screen, a system alert, or anything else
do logic
update view
To make it simple let's just focus on step 2 and 3.
I think that it would be wrong to run them in parallel or to mix them into one step.
Let me give you an example.
Let's say you are creating a war game and you need to draw 100 soldiers fighting on the screen. The logic is to update their positions and the background area, and then you need to draw the result. You can't start drawing one soldier before you have updated the position of another soldier.
So according to these simple steps, it is clear that steps 2 and 3 need to be synchronized somehow, and step 2 must be done before step 3.
Now tell me, how is it possible to run a game loop on more than one thread, or more than one process? Does OpenGL use multiple cores? How?
Edit: one way to use multithreading is to precalculate the game logic, or in other words to use vectors. But there are two big disadvantages to vectors that make them almost not worth recommending:
Sometimes you want to change your vector, in which case lots of the calculations you did will never be used.
In most cases you are trying to reach 60+ FPS, which means 16 milliseconds per game loop iteration. Switching threads requires some kind of synchronization, and any synchronization is bad for performance. From what I saw, even a simple Handler.post() in Android (just adding a task to a queue to run it on another thread) may take up to 3 milliseconds (18% of your frame time), so unless your calculation takes longer than that, don't do it! So far I have not found anything that takes that much time.
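For reference, a sketch of how a number like that can be measured; the results will vary heavily with device and current load:

    import android.os.Handler;
    import android.os.Looper;
    import android.os.SystemClock;
    import android.util.Log;

    // Time how long a posted task waits in the main-thread queue before it
    // actually runs.
    void measurePostLatency() {
        final long posted = SystemClock.uptimeMillis();
        new Handler(Looper.getMainLooper()).post(new Runnable() {
            @Override public void run() {
                long waited = SystemClock.uptimeMillis() - posted;
                Log.d("LoopTiming", "Handler.post latency: " + waited + " ms");
            }
        });
    }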
Now tell me how it is possible to run a game loop on more than one thread
The idea of multicore computing is parallelization, i.e. splitting up computational intensive tasks into independent working sets that can be processed in parallel. Games have surprisingly little space for parallelization, as you found out yourself.
The usual way to use multiple cores in games is to parallelize I/O with logic operations, i.e. doing all the networking and user interaction in one thread, and AI and scene management in another. Sound is usually parallelized away, too.
Does OpenGL use multiple cores? How?
The OpenGL specification doesn't specify this, but some implementations may choose to utilize multiple cores, though it's quite unlikely. Why? Because it creates unnecessary cache management overhead.
I will try to explain graphically why two threads, one for rendering and one for logic, are much better and do not result in inconsistencies.
The single-threaded design you proposed would run as follows:
logic    |-----|       |-----|
graphics        |-----|       |-----|

and so forth.
But as you have multiple CPUs, you should utilize them. That means that after the logic has finished for the first time, the graphics can render that game state; meanwhile, the next logic step can be calculated:
logic    |-----||-----||-----|
graphics        |-----||-----|

and so forth.
As you see, you can almost double your performance this way: of course there is some overhead, but it will be faster than the single-threaded version anyway.
Also, you can assume that the game logic will take less time to calculate than the rendering. This makes the threading quite easy, because you can wait in the logic thread for a callback from the render thread, give the render thread new instructions, and then calculate the next logic step.
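A minimal sketch of that handoff, assuming placeholder GameState, simulate() and draw(); a SynchronousQueue keeps the logic thread at most one step ahead of the renderer, matching the diagram above:

    import java.util.concurrent.SynchronousQueue;

    class GamePipeline {
        private final SynchronousQueue<GameState> handoff = new SynchronousQueue<>();
        private volatile boolean running = true;

        void start() {
            new Thread(() -> {
                try {
                    GameState state = GameState.initial();  // placeholder type
                    while (running) {
                        state = simulate(state);  // compute the next state
                        handoff.put(state);       // blocks until the renderer takes it
                    }
                } catch (InterruptedException ignored) { }
            }, "logic").start();

            new Thread(() -> {
                try {
                    while (running) {
                        draw(handoff.take());     // render while logic computes the next one
                    }
                } catch (InterruptedException ignored) { }
            }, "render").start();
        }
    }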