I'm writing a small piece of Renderscript to dynamically take an image and sort the pixels into 'buckets' based on each pixel's RGB values. The number of buckets could vary, so my instinct would be to create an ArrayList. This isn't possible within Renderscript, obviously, so I was wondering what the approach would be to creating a dynamic list of structs within the script. Any help greatly appreciated.
There's no clear answer to this. The problem is that dynamic memory management is anathema to platforms like RenderScript--it's slow, implies a lot of things about page tables and TLBs that may not be easy to guarantee from a given processor at an arbitrary time, and is almost never an efficient way to do what you want to do.
What the right alternative is depends entirely on what you're doing with the buckets after they're created. Do you just need everything categorized, without physically gathering the pixels into buckets? Then create a per-pixel mask (or use the alpha channel) and store the category alongside the pixel data. Do you have some upper bound on the size of each bucket? Then allocate every bucket at that size.
Sorry that this is open-ended, but memory management is one of those things that brings high-performance code to a screeching halt. Workarounds are necessary, but the right workaround varies in every case.
I'll try to answer your goal question of classifying pixel values, and not your title question of creating a dynamically-sized list of structs.
Without knowing much about your algorithm, I will frame my answer around two algorithms:
RGB Joint Histogram
Does not use neighboring pixel values.
Connected Component
Requires neighboring pixel values.
Requires a supporting data structure called "Disjoint set".
Common advice.
Both algorithms require a lot of memory per worker thread. Also, both algorithms are poorly adapted to GPU because they require some kind of random memory access (Note). Therefore, it is likely that both algorithms will end up being executed on the CPU. It is therefore a good idea to reduce the number of "threads" to avoid multiplying the memory requirement.
Note: Non-coalesced (non-sequential) memory access - reads, writes, or both.
RGB Joint Histogram
The best way is to compute a joint color histogram using Renderscript, and then run your classification algorithm on the histogram instead (presumably on the CPU). After that, you can perform a final step of pixel-wise label assignment back in Renderscript.
The whole process is almost exactly the same as the one Tim Murray walks through in his Renderscript presentation at Google I/O 2013.
Link to recorded session (video)
Link to slides (PDF)
The joint color histogram will have to have a hard-coded size. For example, a 32x32x32 RGB joint histogram uses 32768 histogram bins. This allows 32 levels of shades for each channel; each bin then covers 8 of the original 256 levels, so the quantization error per channel is at most +/- 4 levels out of 256.
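For concreteness, here is what the binning amounts to, written as plain C++ rather than a Renderscript kernel (the layout and names are just an illustration; a kernel would write into an rs_allocation of ints instead):

#include <cstdint>
#include <vector>

// Build a 32x32x32 joint RGB histogram from 8-bit RGBA pixels.
std::vector<uint32_t> jointHistogram(const uint8_t* rgba, int width, int height)
{
    const int bins = 32;                      // 32 levels per channel
    std::vector<uint32_t> hist(bins * bins * bins, 0);
    for (int i = 0; i < width * height; ++i) {
        int r = rgba[4 * i + 0] >> 3;         // 256 levels -> 32 bins
        int g = rgba[4 * i + 1] >> 3;
        int b = rgba[4 * i + 2] >> 3;
        ++hist[(r * bins + g) * bins + b];
    }
    return hist;                              // 32768 bins in total
}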
Connected Component
I have successfully implemented multi-threaded connected component labeling on Renderscript. Note that my implementation is limited to execution on CPU; it is not possible to execute my implementation on the GPU.
Prerequisites.
Understand the Union-Find algorithm (and its various theoretical parts, such as path-compression and ranking) and how connected-component labeling benefits from it.
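For reference, a minimal union-find in the shape described (path compression, no ranking); the array layout and names are illustrative, not my exact code:

#include <vector>

// parent[i] == i means i is a root.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path compression (halving)
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra != rb) parent[ra] = rb;       // no ranking: just link the roots
    }
};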
Some design choices.
I use a 32-bit integer array, same size as the image, to store the "links".
Linking occurs in the same way as Union-Find, except that I do not have the benefit of ranking. This means the tree may become highly unbalanced, and therefore the path length may become long.
On the other hand, I perform path-compression at various steps of the algorithm, which counteracts the risk of suboptimal tree merging by shortening the paths (depths).
One small but important implementation detail.
The value stored in the integer array is essentially an encoding of "(x, y)" coordinates: either (i) the pixel's own coordinates, if the pixel is its own root, or (ii) the coordinates of a different pixel which has the same label as the current pixel.
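One possible encoding, purely as an illustration (the 16-bit packing here is an assumption; y * width + x works just as well):

#include <cstdint>

static inline int32_t encodeXY(int x, int y) { return (y << 16) | (x & 0xFFFF); }
static inline int decodeX(int32_t v)         { return v & 0xFFFF; }
static inline int decodeY(int32_t v)         { return (v >> 16) & 0xFFFF; }

// links[y * width + x] == encodeXY(x, y) means the pixel is its own root;
// otherwise it points at another pixel carrying the same label.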
Steps.
The multi-threaded stage.
Divide the image into small tiles.
Inside each tile, compute the connected components, using label values local to that tile.
Perform path compression inside each tile.
Convert the label values into global coordinates and copy the tile's labels into the main result matrix.
The single-threaded stage.
Horizontal stitching.
Vertical stitching.
A global round of path-compression.
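To make the stitching concrete, here is a rough sketch of the horizontal pass. It is written against the DisjointSet sketched earlier rather than the packed links array I actually use, purely to show the shape of the step; 'mask' stands in for whatever per-pixel class information drives connectivity:

#include <cstdint>

// For every vertical tile boundary, union the labels of horizontally adjacent
// pixels that belong to the same class. Reuses the DisjointSet sketched above.
void stitchHorizontally(DisjointSet& ds, const uint8_t* mask,
                        int width, int height, int tileW)
{
    for (int bx = tileW; bx < width; bx += tileW) {       // boundary columns
        for (int y = 0; y < height; ++y) {
            int left  = y * width + (bx - 1);             // last column of left tile
            int right = y * width + bx;                   // first column of right tile
            if (mask[left] == mask[right])                // same class?
                ds.unite(left, right);
        }
    }
}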
Related
As part of my project, I need to plot 2D and 3D functions in Android using Android Studio. I know how to plot 2D functions but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work, I just need to know what I need to learn to be able to do it myself.
Thanks in advance.
Since you already understand 2D and want to advance to 3D, there's a simple but non-optimal method:
Decide on how much z depth you desire:
EG: Currently your limits for x and y in your drawing functions are 1920 and 1080 (or even 4096x4096). If you want to save memory and keep things a bit low-resolution, use a size of 1920x1080x1000. That's going to use 1000x more memory and has the potential to increase the drawing time of some calls by 1000 times, but that's the cost of 3D.
A more practical limit is matrices of 8192x8192x16384, but be aware that video games at that resolution need 4-6 GB graphics cards to work half decently (more being a bit better), so you'll be chewing up some main memory starting at that size.
It's easy enough to implement a smaller drawing space and simply increase your allocation and limit variables later. Not only does that confirm that future increases will go smoothly, it also lets everything run faster while you're ironing the bugs out.
Add a 3rd dimension to the functions:
EG: Instead of a function that is simply line_draw(x,y), change it to line_draw(x,y,z), and use the global variable z_limit (or whatever you decide to name it) to check that z doesn't exceed the limit.
You'll need to decide whether objects at the maximum distance are drawn as a dot or not drawn at all. While testing, it's useful to clamp any z past the limit to the limit value (thus keeping those objects visible as a dot). For the finished product, anything past the limit you've chosen is best left invisible.
Start by allocating the drawing buffer and implementing a single function first; there's no point (and possibly great frustration) in changing everything and hoping it's just going to work. It should, but if it doesn't you'll have a lot on your plate if there's a common fault in every function.
Once you have this 3D buffer filled with an image (start with just a few 3D lines, such as a full-screen-sized "X" and "+"), you draw to your 2D screen (X,Y) by starting at the largest value of Z first (EG: z=1000). Once you finish that layer, decrement z and continue, drawing each layer until you reach zero, the objects closest to you.
That's the simplest (and slowest) way to make certain that closest objects obscure the furthest objects.
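A minimal sketch of that back-to-front pass, assuming the 3D buffer is one flat array of packed colours with 0 meaning "nothing drawn at this voxel" (the layout is just an assumption for illustration):

#include <cstdint>
#include <vector>

// Composite a 3D drawing buffer into a 2D screen, farthest layer first,
// so that nearer layers overwrite (obscure) farther ones.
void compositeBackToFront(const std::vector<uint32_t>& volume,
                          std::vector<uint32_t>& screen,
                          int w, int h, int depth)
{
    for (int z = depth - 1; z >= 0; --z)                 // farthest layer first
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                uint32_t c = volume[(static_cast<size_t>(z) * h + y) * w + x];
                if (c != 0)                              // nearer layers overwrite
                    screen[static_cast<size_t>(y) * w + x] = c;
            }
}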
Now, does it look OK? You don't want distant objects to be the same size as (or larger than) the closest objects; you want to make certain you scale them with distance.
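One simple way to do that is a perspective divide; the 'focal' constant below is just a tuning knob for how quickly things shrink with depth, not anything standard:

// Scale a point toward the screen centre (cx, cy) based on its depth z.
// z = 0 gives full size; larger z gives a smaller, more centred point.
void projectToScreen(float x, float y, float z, float focal,
                     float cx, float cy, float& sx, float& sy)
{
    float s = focal / (focal + z);
    sx = cx + (x - cx) * s;
    sy = cy + (y - cy) * s;
}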
The reason to choose numbers such as 8192 is that, after writing your functions in C (or whichever language you choose), you'll want to optimize them with several versions each, written in assembly language and tuned for specific CPU and GPU architectures. Without specifically optimized versions everything will be extremely slow.
I understand that you don't want to use a library but looking at several should give you an idea of the work involved and how you might implement your own. No need to copy, improve instead.
There are similar questions and answers that might fill in the blanks:
Reddit - "I want to create a 3D engine from scratch. Where do I start?"
Davrous - "Tutorial series: learning how to write a 3D soft engine from scratch in C#, TypeScript or JavaScript"
GameDev.StackExchange - "How to write my own 3-D graphics library for Windows? [closed]"
I'm working on an Android app built in Unity3D that needs to create new textures at runtime every so often, based on different images' pixel data.
Since Unity for Android uses OpenGL ES and my app is a graphical one that ideally needs to run at a solid 60 frames per second, I've created a C++ plugin that makes raw OpenGL calls instead of relying on Unity's slow Texture2D texture construction. The plugin lets me upload the pixel data to a new OpenGL texture, then tell Unity about it through Texture2D's CreateExternalTexture() function.
Since the OpenGL ES rendering in this setup is unfortunately single-threaded, to keep things within frame I call glTexImage2D() in the first frame with an already-generated texture ID but with null data, and then call glTexSubImage2D() with a section of my buffer of pixel data over multiple subsequent frames to fill out the whole texture; essentially I create the texture synchronously but chunk the operation up over multiple frames!
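In rough outline, the per-frame calls look like this (the chunk height and the RGBA format here are just illustrative):

#include <GLES2/gl2.h>
#include <algorithm>
#include <cstdint>

// Frame 0: allocate storage only, no pixel data.
void allocateTexture(GLuint texId, int width, int height)
{
    glBindTexture(GL_TEXTURE_2D, texId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
}

// Subsequent frames: upload one horizontal strip per frame until done.
void uploadChunk(GLuint texId, const uint8_t* pixels,
                 int width, int height, int frameIndex, int rowsPerFrame)
{
    int y = frameIndex * rowsPerFrame;
    if (y >= height) return;                       // already finished
    int rows = std::min(rowsPerFrame, height - y);
    glBindTexture(GL_TEXTURE_2D, texId);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, y, width, rows,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels + static_cast<size_t>(y) * width * 4);
}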
Now, the problem I'm having is that every time I create a new texture with large dimensions, that very first glTexImage2D() call will still lead to a frame-out, even though I'm putting null data into it. I'm guessing that the reason for this is that there is still a pretty large memory allocation going on in the background with that first glTexImage2D() call, even though I'm not filling in the image until later.
Unfortunately, these images that I'm creating textures for are of varying sizes that I don't know beforehand, so I can't just create a bunch of textures up front on load; I need to specify a new width and height for each new texture every time. =(
Is there any way I can avoid this memory allocation, maybe by allocating a huge block of memory at the start and using it as a pool for new textures? I've read around and people seem to suggest using FBOs instead? I may have misunderstood, but it seemed to me like you still need a glTexImage2D() call to allocate the texture before attaching it to the FBO?
Any and all advice is welcome, thanks in advance! =)
PS: I don't come from a Graphics background, so I'm not aware of best practices with OpenGL or other graphics libraries, I'm just trying to create new textures at runtime without framing out!
I haven't dealt with the specific problem you've faced, but I've found texture pools to be immensely useful in OpenGL in terms of getting efficient results without having to put much thought into it.
In my case the problem was that I can't use the same texture for an input to a deferred shader as the texture used to output the results. Yet I often wanted to do just that:
// Make the texture blurry.
blur(texture);
Yet instead I was having to create 11 different textures with varying resolutions and having to swap between them as inputs and outputs for horizontal/vertical blur shaders with FBOs to get a decent-looking blur. I never liked GPU programming very much because some of the most complex state management I've ever encountered was often there. It felt incredibly wrong that I needed to go to the drawing board just to figure out how to minimize the number of textures allocated due to this fundamental requirement that texture inputs for shaders cannot also be used as texture outputs.
So I created a texture pool and OMG, it simplified things so much! It made it so I could just create temporary texture objects left and right and not worry about it, because destroying the texture object doesn't actually call glDeleteTextures; it simply returns them to the pool. So I was finally able to just do:
blur(texture);
... as I wanted all along. And for some funny reason, when I started using the pool more and more, it sped up frame rates. I guess even with all the thought I put into minimizing the number of textures being allocated, I was still allocating more than I needed in ways the pool eliminated (note that the actual real-world example does a whole lot more than blurs including DOF, bloom, hipass, lowpass, CMAA, etc, and the GLSL code is actually generated on the fly based on a visual programming language the users can use to create new shaders on the fly).
So I really recommend starting with exploring that idea. It sounds like it would be helpful for your problem. In my case I used this:
struct GlTextureDesc
{
...
};
... and it's a pretty hefty structure given how many texture parameters we can specify (pixel format, number of color components, LOD level, width, height, etc. etc.).
Yet the structure is comparable and hashable and ends up being used as a key in a hash table (like unordered_multimap) along with the actual texture handle as the value associated.
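Purely for illustration (the real structure's fields are elided above), a minimal key of this kind might look like the following; the field list and the hash are assumptions, not my actual code:

#include <cstddef>
#include <functional>

// Hypothetical, trimmed-down key. What matters is that it is
// equality-comparable and hashable so it can index a hash table.
struct GlTextureDescExample
{
    int width = 0, height = 0;
    unsigned internalFormat = 0, pixelFormat = 0, pixelType = 0;

    bool operator==(const GlTextureDescExample& o) const {
        return width == o.width && height == o.height &&
               internalFormat == o.internalFormat &&
               pixelFormat == o.pixelFormat && pixelType == o.pixelType;
    }
};

struct GlTextureDescHash
{
    std::size_t operator()(const GlTextureDescExample& d) const {
        std::size_t h = std::hash<int>()(d.width);
        h = h * 31 + std::hash<int>()(d.height);
        h = h * 31 + std::hash<unsigned>()(d.internalFormat);
        h = h * 31 + std::hash<unsigned>()(d.pixelFormat);
        h = h * 31 + std::hash<unsigned>()(d.pixelType);
        return h;
    }
};
// e.g. std::unordered_multimap<GlTextureDescExample, GLuint, GlTextureDescHash>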
That allows us to then do this:
// Provides a pool of textures. This allows us to conveniently rapidly create,
// and destroy texture objects without allocating and freeing an excessive number
// of textures.
class GlTexturePool
{
public:
// Creates an empty pool.
GlTexturePool();
// Cleans up any textures which haven't been accessed in a while.
void cleanup();
// Allocates a texture with the specified properties, retrieving an existing
// one from the pool if available. The function returns a handle to the
// allocated texture.
GLuint allocate(const GlTextureDesc& desc);
// Returns the texture with the specified key and handle to the pool.
void free(const GlTextureDesc& desc, GLuint texture);
private:
...
};
At which point we can create temporary texture objects left and right without worrying about excessive calls to glTexImage2D and glDeleteTextures. I found it enormously helpful.
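Allocating and releasing a temporary in, say, a blur pass then looks roughly like this (the desc fields and the surrounding FBO work are only sketched, not my exact code):

GlTextureDesc desc;                 // fill in the size / format of the temporary you need
GLuint tmp = pool.allocate(desc);   // comes out of the pool if a matching texture is idle
// ... attach 'tmp' to an FBO, render the horizontal blur into it,
//     then sample it as input for the vertical blur pass ...
pool.free(desc, tmp);               // hands it back to the pool; no glDeleteTextures here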
Finally of note is that cleanup function above. When I store textures in the hash table, I put a time stamp on them (using system real time). Periodically I call this cleanup function which then scans through the textures in the hash table and checks the time stamp. If a certain period of time has passed while they're just sitting there idling in the pool (say, 8 seconds), I call glDeleteTextures and remove them from the pool. I use a separate thread along with a condition variable to build up a list of textures to remove the next time a valid context is available by periodically scanning the hash table, but if your application is all single-threaded, you might just invoke this cleanup function every few seconds in your main loop.
That said, I work in VFX which doesn't have quite as tight realtime requirements as, say, AAA games. There's more of a focus on offline rendering in my field and I'm far from a GPU wizard. There might be better ways to tackle this problem. However, I found it enormously helpful to start with this texture pool and I think it might be helpful in your case as well. And it's fairly trivial to implement (just took me half an hour or so).
This could still end up allocating and deleting lots and lots of textures if the texture sizes and formats and parameters you request to allocate/free are all over the place. There it might help to unify things a bit, like at least using POT (power of two) sizes and so forth and deciding on a minimum number of pixel formats to use. In my case that wasn't that much of a problem since I only use one pixel format and the majority of the texture temporaries I wanted to create are exactly the size of a viewport scaled up to the ceiling POT.
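If it helps, rounding a requested dimension up to the ceiling POT is a one-liner; this is just the standard bit trick, nothing specific to the pool:

#include <cstdint>

// Round a requested texture dimension up to the next power of two so that
// many slightly-different requests map onto the same pool entry.
uint32_t nextPowerOfTwo(uint32_t v)
{
    if (v == 0) return 1;
    --v;
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16;
    return v + 1;
}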
As for FBOs, I'm not sure how they help your immediate problem with excessive texture allocation/freeing either. I use them primarily for deferred shading to do post-processing for effects like DOF after rendering geometry in multiple passes in a compositing-style way applied to the 2D textures that result. I use FBOs for that naturally but I can't think of how FBOs immediately reduce the number of textures you have to allocate/deallocate, unless you can just use one big texture with an FBO and render multiple input textures to it to an offscreen output texture. In that case it wouldn't be the FBO helping directly so much as just being able to create one huge texture whose sections you can use as input/output instead of many smaller ones.
I'm very new to neural networks, but for a project of mine they seem to fit. The application should run on an Android phone in the end. My idea is to use TensorFlow, but I'm not sure if it's a fit.
I have the following situation: my input is a set of images (the order of them should not have any impact on the output). The set size is not fixed, but in most cases it is lower than 10. My output for the whole set is just a binary categorisation (pass/fail).
I will have a convolutional neural network which calculates two outputs, a weight and a pass/fail value. Each image is supplied separately to this CNN, and the resulting values are then aggregated into a final pass/fail value using a weighted arithmetic mean.
My question is: can I train such a network with TensorFlow?
I do not have the values for the CNN outputs in my training data, only the outputs after the aggregation. Is this possible in general with a gradient-oriented framework, or do I have to use a genetic-algorithm approach for that?
You can definitely do this with tensorflow. After you've done the intro tutorials, you should look at the CNN tutorial to learn how to implement a convolutional neural network in tensorflow.
All of the heavy lifting is already taken care of. All you have to do is use the tf.nn.conv2d() method to make the convolutional layer, and then use one of the pooling and normalization ops and so on.
If you're unfamiliar with what this means, you should read up on it, but in a nutshell, there are three unique components to a CNN. The convolutional layer scans a window across the image looking for certain patterns, and its activations are recorded in the output as a grid. It's important for learning to lower the dimensionality of the data, and that's what the pooling layer does; it takes the output of the convolutional layer and reduces its size. The normalization layer then normalizes this, because having normalized data tends to improve learning.
If you only have the aggregate outputs, then you need to think of a way of coming up with reasonable proxy outputs for individual images. One thing you could do is just use the aggregate output as labels for each individual image in a set and use gradient descent to train.
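One extra note on why plain gradient descent can also work directly through the aggregation: assuming the CNN produces a pass score p_i and a positive weight w_i for image i (my notation, not yours), the set-level prediction is a weighted arithmetic mean, which is differentiable in both outputs:

\hat{y} = \frac{\sum_i w_i p_i}{\sum_i w_i},
\qquad \frac{\partial \hat{y}}{\partial p_i} = \frac{w_i}{\sum_j w_j},
\qquad \frac{\partial \hat{y}}{\partial w_i} = \frac{p_i - \hat{y}}{\sum_j w_j}.

So the gradient of a loss on the aggregate label flows back into the shared CNN for every image in the set; nothing about the aggregation forces you toward genetic algorithms.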
I understand why I need to use window functions in FFT. I recorded a sine wave (16-bit PCM format) that I would like to analyze: I recorded the mic audio into a byte array and transformed it back into a sample array representing the sine wave, with values in [-1, 1] (the values divided by 32768). Do I need to apply the window to the array with the values in [-1, 1] (the divided one), or do I need to apply it to the sample array without dividing it by 32768? I looked for answers on SO and Google and couldn't find any explanation of which way is right.
One of the properties of linear-time-invariant systems is that the result of a cascade of multiple linear-time-invariant systems is the same regardless of the order in which the operations were done (at least in theory; in practice, filters and such can have small non-linearities which can make the result slightly different depending on the order).
From a theoretical perspective, applying a constant scaling factor to all samples can be seen as such a linear-time-invariant system. For a specific computer implementation, the scaling can also be considered approximately linear-time-invariant, provided the scaling does not introduce significant losses of precision (e.g. by scaling the numbers to values near the smallest representable floating-point value), nor distortions resulting from scaling values outside the supported range. In your case, simply dividing by 32768 is most likely not going to introduce significant distortions, and as such can be considered an (approximately) linear-time-invariant system.
Similarly, applying a window, which multiplies each sample by a different window value, can also be seen as another linear-time-invariant system.
Having established that you have such a cascade of linear-time-invariant systems, you can perform the scaling by 32768 either before or after applying the window.
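A tiny numerical check of that claim (the sample value and the Hann coefficient below are made up):

#include <cmath>
#include <cstdio>

int main()
{
    const double pi = 3.14159265358979323846;
    short raw = 12345;                                        // one 16-bit PCM sample
    double w  = 0.5 * (1.0 - std::cos(2.0 * pi * 3 / 1023.0)); // one Hann window value

    double a = (raw / 32768.0) * w;   // scale first, then window
    double b = (raw * w) / 32768.0;   // window first, then scale
    std::printf("%.12f %.12f\n", a, b);   // same value, up to floating-point rounding
    return 0;
}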
P.S.: as Paul mentioned in comments, you'd probably want to perform the conversion from 16-bit words to floating point (whether scaled or not) first if you are going to work with floating point values afterward. Trying to perform scaling in fixed-point arithmetic might prove more complex than necessary, and may be subject to loss of precision I alluded to above if not done carefully.
I have an application where I want to track 2 objects at a time that are rather small in the picture.
This application should be running on Android and iPhone, so the algorithm should be efficient.
For my customer it is perfectly fine if we deliver, along with the software, some patterns that get attached to the objects to be tracked, so that there is a well-recognizable target.
This means that I can make up a pattern on my own.
As I am not that much into image processing yet, I don't know which objects are easiest to recognize in a picture even if they are rather small.
Color is also possible, although processing several planes separately is not desired because of the generated overhead.
Thank you for any advice!!
Best,
guitarflow
If I get this straight, your object should:
Be printable on an A4
Be recognizable from up to 4 meters away
Rotational invariance is not so important (I'm making the assumption that the user will hold the phone +/- upright)
I recommend printing a large checkerboard and using a combination of color-matching and corner detection. Try different combinations to see what's faster and more robust at different distances.
Color: if you only want to work on one channel, you can print in red/green/blue*, and then work only on that respective channel. This will already filter a lot and increase contrast "for free".
Otherwise, a histogram backprojection is in my experience quite fast. See here.
Also, let's say you have only 4 squares with RGB+black (see image); it would be easy to get all red contours, then check if each has the correct neighbouring colors: a patch of blue to its right and a patch of green below it, both of roughly the same area. This alone might be robust enough, and is equivalent to working on one channel, since at each step you're only accessing one specific channel (search for contours in red, check right in blue, check below in green).
If you're getting a lot of false-positives, you can then use corners to filter your hits. In the example image, you have 9 corners already, in fact even more if you separate channels, and if it isn't enough you can make a true checkerboard with several squares in order to have more corners. It will probably be sufficient to check how many corners are detected in the ROI in order to reject false-positives, otherwise you can also check that the spacing between detected corners in x and y direction is uniform (i.e. form a grid).
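As a sketch of that colour-then-corners idea in OpenCV (all thresholds and sizes here are guesses you would tune for your print and lighting):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Find strongly red regions, then count corners inside each candidate
// region to reject false positives.
std::vector<cv::Rect> findCandidates(const cv::Mat& bgr)
{
    // 1. Keep only "very red" pixels (OpenCV channel order is B, G, R).
    cv::Mat redMask;
    cv::inRange(bgr, cv::Scalar(0, 0, 200), cv::Scalar(100, 100, 255), redMask);

    // 2. Contours of the red blobs are the candidate regions.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(redMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // 3. Count corners inside each candidate region.
    cv::Mat gray;
    cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);

    std::vector<cv::Rect> hits;
    for (const auto& c : contours) {
        cv::Rect roi = cv::boundingRect(c);
        if (roi.area() < 100) continue;                 // too small to be the target

        std::vector<cv::Point2f> corners;
        cv::goodFeaturesToTrack(gray(roi), corners, /*maxCorners=*/16,
                                /*qualityLevel=*/0.05, /*minDistance=*/5);
        if (corners.size() >= 4)                        // enough corners -> keep it
            hits.push_back(roi);
    }
    return hits;
}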
Corners: Detecting corners has been greatly explored and there are several methods here. I don't know how efficient each one is, but they are fast enough, and after you've reduced the ROIs based on color, this should not be an issue.
Perhaps the simplest is to erode/dilate with a cross-shaped kernel to find corners. See here.
You'll want to first threshold the image to create a binary map, probably based on color as mentioned above.
Other corner detectors, such as the Harris detector, are well documented.
Oh and I don't recommend using Haar-classifiers. Seems unnecessarily complicated and not so fast (though very robust for complex objects: i.e. if you can't use your own pattern), not to mention the huge amount of work for training.
Haar training is your friend mate.
This tutorial should get you started: http://note.sonots.com/SciSoftware/haartraining.html
Basically you train something called a classifier based on sample images (2000 or so of the object you want to track). OpenCV already has the tools required to build these classifiers and functions in the library to detect objects.
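Once a cascade has been trained, detection itself is only a couple of OpenCV calls per frame; the XML path and the detectMultiScale parameters below are placeholders you would tune for your object:

#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <vector>

// Detect instances of the trained object in one frame.
std::vector<cv::Rect> detect(const cv::Mat& frameBgr)
{
    static cv::CascadeClassifier cascade("my_object_cascade.xml"); // hypothetical file name

    cv::Mat gray;
    cv::cvtColor(frameBgr, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> objects;
    cascade.detectMultiScale(gray, objects, /*scaleFactor=*/1.1,
                             /*minNeighbors=*/3, /*flags=*/0,
                             /*minSize=*/cv::Size(24, 24));
    return objects;
}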