I have an application where I want to track 2 objects at a time that are rather small in the picture.
This application should be running on Android and iPhone, so the algorithm should be efficient.
For my customer it is perfectly fine if we deliver, along with the software, some patterns that are attached to the objects to be tracked, so that there is a well-recognizable target.
This means that I can make up a pattern on my own.
As I am not that much into image processing yet, I don't know which kinds of objects are easiest to recognize in a picture, even when they are rather small.
Color is also possible, although processing several planes separately is not desired because of the generated overhead.
Thank you for any advice!!
Best,
guitarflow
If I get this straight, your object should:
Be printable on an A4
Be recognizable up to 4 meters
Rotational invariance is not so important (I'm making the assumption that the user will hold the phone +/- upright)
I recommend printing a large checkerboard and using a combination of color matching and corner detection. Try different combinations to see what's faster and more robust at different distances.
Color: if you only want to work on one channel, you can print in pure red, green or blue, and then work only on that respective channel. This will already filter a lot and increase contrast "for free".
Otherwise, a histogram backprojection is in my experience quite fast. See here.
Also, let's say you have only 4 squares with RGB+black (see image). It would be easy to get all red contours, then check whether each has the correct neighbouring colors: a patch of blue to its right and a patch of green below it, both of roughly the same area. This alone might be robust enough, and it is equivalent to working on one channel, since at each step you're only accessing one specific channel (search for contours in red, check right in blue, check below in green).
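The colour-neighbour check can be sketched with plain NumPy. The synthetic image, thresholds and patch layout below are made up for illustration; a real implementation would use OpenCV's contour functions rather than a simple bounding box.

```python
import numpy as np

def find_red_patch(img, thresh=200):
    """Return (row, col) bounding slices of the bright-red region.

    img is an H x W x 3 uint8 array. A pixel counts as 'red' if its
    R channel is high and its G/B channels are low.
    """
    red = (img[:, :, 0] > thresh) & (img[:, :, 1] < 80) & (img[:, :, 2] < 80)
    ys, xs = np.nonzero(red)
    if len(ys) == 0:
        return None
    return (slice(ys.min(), ys.max() + 1), slice(xs.min(), xs.max() + 1))

def check_neighbours(img, red_box, thresh=200):
    """Check for a blue patch to the right of the red patch and a
    green patch below it, each of roughly the same size."""
    rs, cs = red_box
    h, w = rs.stop - rs.start, cs.stop - cs.start
    right = img[rs, cs.stop:cs.stop + w]    # only the blue channel matters
    below = img[rs.stop:rs.stop + h, cs]    # only the green channel matters
    blue_ok = np.mean(right[:, :, 2] > thresh) > 0.5
    green_ok = np.mean(below[:, :, 1] > thresh) > 0.5
    return bool(blue_ok) and bool(green_ok)

# Synthetic 2x2 target: red | blue on top, green | black below.
img = np.zeros((40, 40, 3), dtype=np.uint8)
img[:20, :20, 0] = 255   # red
img[:20, 20:, 2] = 255   # blue
img[20:, :20, 1] = 255   # green

box = find_red_patch(img)
print(check_neighbours(img, box))   # True for this synthetic target
```

Note that each step touches exactly one channel, which is the point of the pattern.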
If you're getting a lot of false positives, you can then use corners to filter your hits. In the example image you have 9 corners already, in fact even more if you separate channels, and if that isn't enough you can make a true checkerboard with several squares in order to have more corners. It will probably be sufficient to check how many corners are detected in the ROI in order to reject false positives; otherwise you can also check that the spacing between detected corners in the x and y directions is uniform (i.e. that they form a grid).
Corners: Detecting corners has been greatly explored and there are several methods here. I don't know how efficient each one is, but they are fast enough, and after you've reduced the ROIs based on color, this should not be an issue.
Perhaps the simplest is to erode/dilate with a cross-shaped kernel to find corners. See here.
You'll want to first threshold the image to create a binary map, probably based on color as mentioned above.
Other corner detectors, such as the Harris detector, are well documented.
Oh, and I don't recommend using Haar classifiers. They seem unnecessarily complicated and not so fast (though very robust for complex objects, i.e. when you can't use your own pattern), not to mention the huge amount of work required for training.
Haar training is your friend mate.
This tutorial should get you started: http://note.sonots.com/SciSoftware/haartraining.html
Basically you train something called a classifier based on sample images (2000 or so of the object you want to track). OpenCV already has the tools required to build these classifiers, and functions in the library to detect objects.
I'm trying to work out how to go about creating an on-the-fly simplification of incoming RGB values.
I'm trying to write an Android app that uses a live camera view to sample colors. I've worked out how to detect and save individual color values, but my aim is to simplify these incoming values using clear ranges.
Example: when we detect Firebrick Red (178,34,34), the app would recognize that value as falling within a predefined range called Red, and it would be converted to a simple 255,0,0 upon saving the color.
The app is being put together in Unity. If anyone has read a guide that goes over the process, that would be ideal, so I can learn what is going on and how it is achieved. I'm stumped.
Thanks in advance for any help.
So the problem is that it's hard to define what "red" is. It's not just that different people have different definitions; different cultures also affect what we think colors are (some cultures don't consider red and yellow to be different colors, and at least one tribal culture still present today has no words for colors at all; see https://www.sapiens.org/language/color-perception/). So doing this is always a best-effort type of deal.
One simple thing you could do is a least-difference algorithm: have a set of reference colors, find the one with the smallest delta from the color you're looking up, and treat it as that reference color. That will work, kinda, as long as your set has enough colors that nothing is too far from its nearest reference.
That will only kind of work, though: the RGB channels aren't equally distinct to the human eye, and some differences matter more than others; it's non-linear. A difference of 10 in green is more important than a difference of 10 in red, and a difference of 10 in the range [0,20] may be more or less stark than a difference of 10 in the range [100,120]. If you need this to work really well, you may need to talk to someone who has studied color and how the human eye works to come up with a custom algorithm. Having worked on printers once upon a time, I can tell you we had teams of experts figuring out the mapping from digital colors to ink. It's much the same here.
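The least-difference lookup is only a few lines. The sketch below uses a tiny made-up palette, and weights borrowed from the common luma coefficients to crudely favour green over blue; both the palette and the weights are illustrative choices, not standards.

```python
REFERENCE_COLORS = {
    "red":   (255, 0, 0),
    "green": (0, 255, 0),
    "blue":  (0, 0, 255),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

# Weight green highest and blue lowest, roughly matching perceived brightness.
WEIGHTS = (0.299, 0.587, 0.114)

def snap_color(rgb):
    """Return (name, reference_rgb) of the nearest palette entry."""
    def dist(ref):
        return sum(w * (a - b) ** 2 for w, a, b in zip(WEIGHTS, rgb, ref))
    name = min(REFERENCE_COLORS, key=lambda n: dist(REFERENCE_COLORS[n]))
    return name, REFERENCE_COLORS[name]

print(snap_color((178, 34, 34)))   # Firebrick snaps to ('red', (255, 0, 0))
```

With a denser palette the same loop still works; only the reference set grows.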
As part of my project, I need to plot 2D and 3D functions in Android using Android Studio. I know how to plot 2D functions, but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work, I just need to know what I need to learn to be able to do it myself.
Thanks in advance.
Since you already understand 2D and want to advance to 3D there's a simple and non-optimal method:
Decide on how much z depth you desire:
EG: Currently your limits for x and y in your drawing functions are 1920 and 1080 (or even 4096x4096). If you want to save memory and keep things a bit low-resolution, use a size of 1920x1080x1000 - that's going to use 1000x more memory and has the potential to increase the drawing time of some calls by 1000 times - but that's the cost of 3D.
A more practical limit is matrices of 8192x8192x16384, but be aware that video games at that resolution need 4-6GB graphics cards to work half decently (more being a bit better) - so you'll be chewing up some main memory starting at that size.
It's easy enough to implement a smaller drawing space and simply increase your allocation and limit variables later. Not only does that test that future increases will go smoothly, it also allows everything to run faster while you're ironing out the bugs.
Add a 3rd dimension to the functions:
EG: Instead of a function that is simply line_draw(x,y), change it to line_draw(x,y,z), and use the global variable z_limit (or whatever you decide to name it) to check that z doesn't exceed the limit.
You'll need to decide whether objects at the maximum distance become a dot or are simply not visible. While testing, clamping all z's past the limit to the limit value (thus making them a visible dot) is useful. For the finished product, it's best that anything past the limit isn't visible.
Start by allocating the drawing buffer and implementing a single function first; there's no point (and possibly great frustration) in changing everything and hoping it's just going to work - it should, but if it doesn't you'll have a lot on your plate if there's a common fault in every function.
Once you have this 3D buffer filled with an image (start with just a few 3D lines, such as a full screen sized "X" and "+") you draw to your 2D screen X,Y by starting at the largest value of Z first (EG: z=1000). Once you finish that layer decrement z and continue, drawing each layer until you reach zero, the objects closest to you.
That's the simplest (and slowest) way to make certain that closest objects obscure the furthest objects.
Now, does it look OK? You don't want distant objects the same size as (or larger than) the closest objects; you want to make certain that you scale them.
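To make the back-to-front layering and the scaling concrete, here is a toy renderer for single-pixel dots. The screen size, focal length and scale formula are arbitrary illustrative choices (focal / (focal + z) is one common perspective factor, not the only one), and a real implementation would draw into your actual frame buffer.

```python
# The screen is a small 2D grid of characters; points3d is a list of
# (x, y, z) dots. Farther layers are drawn first so nearer dots
# overwrite them - the painter's algorithm described above.
W, H, Z_LIMIT = 40, 20, 100
FOCAL = 50.0

def project(x, y, z):
    """Shrink coordinates toward the screen centre as z grows."""
    s = FOCAL / (FOCAL + z)
    sx = int(W / 2 + (x - W / 2) * s)
    sy = int(H / 2 + (y - H / 2) * s)
    return sx, sy

def render(points3d):
    screen = [[' '] * W for _ in range(H)]
    # Sort by descending z: farthest first, closest last.
    for x, y, z in sorted(points3d, key=lambda p: -p[2]):
        if z > Z_LIMIT:          # past the far plane: not visible
            continue
        sx, sy = project(x, y, z)
        if 0 <= sx < W and 0 <= sy < H:
            screen[sy][sx] = '#'
    return screen

# Two visible dots (one near, one far) and one clipped dot.
screen = render([(10, 10, 0), (10, 10, 90), (39, 19, 200)])
print(sum(row.count('#') for row in screen))   # 2
```

The same structure carries over once the "screen" is a real pixel buffer and the dots are whole draw calls per layer.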
The reason to choose numbers such as 8192 is because after writing your functions in C (or whichever language you choose) you'll want to optimize them with several versions each, written in assembly language, optimized for specific CPUs and GPU architectures. Without specifically optimized versions everything will be extremely slow.
I understand that you don't want to use a library but looking at several should give you an idea of the work involved and how you might implement your own. No need to copy, improve instead.
There are similar questions and answers that might fill in the blanks:
Reddit - "I want to create a 3D engine from scratch. Where do I start?"
Davrous - "Tutorial series: learning how to write a 3D soft engine from scratch in C#, TypeScript or JavaScript"
GameDev.StackExchange - "How to write my own 3-D graphics library for Windows? [closed]"
I am using the GrabCut algorithm of OpenCV for background subtraction of an image in Android. The algorithm runs fine, but the result it gives is not accurate.
E.g. My input image is:
The output image looks like:
So how can I increase the accuracy of the GrabCut algorithm?
P.S: Apology for not uploading example images due to low reputation :(
I have been battling with the same problem for quite some time now, and I have a few tips and tricks for it:
1> Improve your seeds. Considering that GrabCut is basically a black box to which you give seeds, expecting the segmented image as output, the seeds are all you can control, and it becomes imperative to select good ones. There are a number of things you can do in this regard if you have some expectations about the image you want to segment. For a few cases, consider these:
a> Will your image have humans? Use a face detector to find the face and mark those pixels as probable/definite foreground, as you deem fit. You could also use skin colour models within some region of interest to further refine your seeds.
b> If you have some data on what kind of foreground you expect after segmentation, you can train colour models and use them to mark even more pixels.
The list goes on. You need to come up creatively with different ways to add more accurate seeds.
2> Post-processing: Try simple post-processing techniques like the opening and closing operations to smooth your fgmask. They will help you get rid of a lot of noise in the final output.
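As a concrete sketch of point 2, here is a self-contained closing-then-opening pass on a binary mask using only NumPy (in OpenCV you would use cv2.morphologyEx instead). The 3x3 square kernel, the order of the two operations, and the synthetic mask are all illustrative choices.

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 square: a pixel becomes 1 if any
    pixel in its 3x3 neighbourhood (including itself) is 1."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask):
    """Binary erosion as the complement of dilating the complement."""
    return 1 - dilate(1 - mask)

def clean_mask(mask):
    """Closing (dilate then erode) fills small holes; opening
    (erode then dilate) then removes isolated speckles."""
    closed = erode(dilate(mask))
    return dilate(erode(closed))

# A foreground blob with one pinhole inside it and one isolated speckle.
m = np.zeros((9, 9), dtype=np.int64)
m[2:7, 2:7] = 1
m[4, 4] = 0       # pinhole inside the blob
m[0, 8] = 1       # isolated speckle
clean = clean_mask(m)
print(clean[4, 4], clean[0, 8])   # 1 0 -> hole filled, speckle gone
```

On a real fgmask you would tune the kernel size to the scale of the noise rather than hard-code 3x3.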
In general, graph cut (and hence GrabCut) tends to snap quickly to edges, so if you have strong edges close to your foreground boundary, you can expect inaccuracies in the result.
I need to detect objects in a scene (on an iPhone and Android). The environment is constrained in a way that should make the problem easier and more accurate:
the environment is small and known... users are exploring a single room or small outdoor area that I can take pictures of ahead of time to "train" or constrain the algorithm
the user's location within the space is often limited... even when the space is large, the user might be confined to specific paths within the space
the objects being detected are relatively static... they are part of the environment and don't move
BUT, making the problem harder:
I can't modify the environment by placing markers on objects, so I need to recognize the objects themselves
The objects are pretty similar looking, so we might have to use the surrounding scene as input, not just the individual items
For instance, imagine walking through a historic cemetery along a path (you're not allowed to walk on the grass). When a user points their phone at a headstone, I'd like to be able to identify the headstone and estimate where the user is relative to the headstone (so I can estimate the user's location on the path). Many of the headstones are pretty similar looking if you're looking at just the headstone. Ahead of time I can walk that path and take multiple pictures of the objects from a variety of angles.
Is there an algorithm or library suited to this type of object detection problem?
This is something you might be looking for: http://3dar.us/
They have their own library with which you can get something close to what you want (locations of objects, your location, the distance, etc.). The only caveat is that it's only for iPhone right now. Good luck in your search!
If the surrounding scenes are sufficiently different from each other you might be able to differentiate between scenes using a simple fast technique like histogram matching. This could be used to determine which scene you are in, and narrow the search set. If you can distinguish the scenes, you can then switch to an object-detection mode that looks for a specific object expected in a specific scene. I imagine if the object is static and well documented you might be able to search against a pre-compiled set of the most recognisable feature descriptors, determine relative pose from them etc. The approach of PTAMM is broadly analogous to this (determine the scene, load the scene's feature points, track in the current scene).
If your example (matching headstones) is what you're actually attempting, the problem becomes a lot more difficult (I assume, at least superficially most headstones and backgrounds will be very similar in things like geometry, colour, etc). The path constraint means you may be able to narrow your search set according to bearing (unless all the headstones are facing the same direction). After that you'll have to do the best you can with the remaining outstanding features (text?).
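A minimal sketch of the histogram-matching idea: fingerprint each known scene with a colour histogram, then pick the reference whose histogram overlaps the query most (histogram intersection). The bin count, the synthetic "scenes" and the partial-occlusion query below are made up for illustration.

```python
import numpy as np

def rgb_hist(img, bins=8):
    """Concatenated per-channel histogram, normalised to sum to 1."""
    hs = [np.histogram(img[:, :, c], bins=bins, range=(0, 256))[0]
          for c in range(3)]
    h = np.concatenate(hs).astype(float)
    return h / h.sum()

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions."""
    return np.minimum(h1, h2).sum()

def best_scene(query, references):
    """Pick the reference scene whose histogram overlaps the query most."""
    scores = {name: intersection(rgb_hist(query), h)
              for name, h in references.items()}
    return max(scores, key=scores.get)

# Two synthetic reference "scenes", one dominated by red, one by blue.
rng = np.random.default_rng(0)
reddish = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
reddish[:, :, 0] = 220
bluish = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
bluish[:, :, 2] = 220
refs = {"red_scene": rgb_hist(reddish), "blue_scene": rgb_hist(bluish)}

# A partially changed view of the red scene still matches it.
query = reddish.copy()
query[:8, :8] = 0
print(best_scene(query, refs))   # red_scene
```

Once the scene is identified, you can load that scene's pre-computed feature descriptors and switch to the object-detection mode described above.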
I'm trying to put a particle system together in Android, using OpenGL. I want a few thousand particles, most of which will probably be offscreen at any given time. They're fairly simple particles visually, and my world is 2D, but they will be moving, changing colour (not size - they're 2x2), and I need to be able to add and remove them.
I currently have an array which I iterate through, handling velocity changes, managing lifecycling (killing old ones, adding new ones), and plotting them using glDrawArrays. What OpenGL is pointing at for this call, though, is a single vertex; I glTranslatex it to the relevant co-ords for each particle I want to plot, one at a time, set the colour with glColor4x, then glDrawArrays it. It works, but it's a bit slow and only works for a few hundred particles. I'm handling the clipping myself.
I've written a system to support static particles, which I have loaded into a vertex/colour array and plot using glDrawArrays, but this approach only seems suitable for particles which never change relative location (i.e. I move all of them together using glTranslate), never change colour, and where I don't need to add/remove particles. A few tests on my phone (HTC Desire) suggest that trying to alter the contents of those arrays (which are ByteBuffers, pointed to by OpenGL) is extremely slow.
Perhaps there's some way of writing to the screen manually myself with the CPU. If I'm just plotting 1x1/2x2 dots on the screen, and I'm purely interested in writing and not doing any blending/antialiasing, is this an option? Would it be quicker than whatever OpenGL is doing?
(200 or so particles on a 1 GHz machine with megabytes of RAM. This is way slower than I was getting 20 years ago on a 7 MHz machine with <500k of RAM! I appreciate I'm using Java here, but surely there must be a better solution. Do I have to use the NDK to get the power of C++, or is what I'm after possible?)
I've been hoping somebody might answer this definitively, as I'll be needing particles on Android myself. (I'm working in C++, though -- Currently using glDrawArrays(), but haven't pushed particles to the limit yet.)
I found this thread on gamedev.stackexchange.com (not Android-specific), and nobody can agree on the best approach there, but you might want to try a few things out and see for yourself.
I was going to suggest glDrawArrays(GL_POINTS, ...) with glPointSize(), but the guy asking the question there seemed unhappy with it.
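For what it's worth, the usual fix for the per-particle glTranslate/glColor4x/glDrawArrays pattern is to batch: keep all particle state in flat arrays, update it in bulk each frame, and issue one glDrawArrays(GL_POINTS, ...) over the packed data. Here is a sketch of the bookkeeping in NumPy; the array sizes and the spawn/fade rules are made up, and the GL calls are shown only as comments since exact bindings depend on your wrapper.

```python
import numpy as np

MAX_PARTICLES = 4096
pos = np.zeros((MAX_PARTICLES, 2), dtype=np.float32)   # x, y per particle
vel = np.zeros((MAX_PARTICLES, 2), dtype=np.float32)
col = np.ones((MAX_PARTICLES, 4), dtype=np.float32)    # r, g, b, a
life = np.zeros(MAX_PARTICLES, dtype=np.float32)       # <= 0 means dead

def spawn(n, origin):
    """Recycle the first n dead slots as new particles at `origin`."""
    rng = np.random.default_rng()
    dead = np.flatnonzero(life <= 0)[:n]
    pos[dead] = origin
    vel[dead] = rng.uniform(-1, 1, (len(dead), 2))
    life[dead] = 1.0
    return len(dead)

def step(dt):
    """Advance every live particle in one vectorised pass, then return
    the packed arrays that would feed a single draw call."""
    alive = life > 0
    pos[alive] += vel[alive] * dt
    life[alive] -= dt
    col[alive, 3] = life[alive]          # fade out via alpha
    # glVertexPointer(2, GL_FLOAT, 0, packed_positions)
    # glColorPointer(4, GL_FLOAT, 0, packed_colors)
    # glDrawArrays(GL_POINTS, 0, alive.sum())
    return pos[alive], col[alive]

spawn(100, (50.0, 50.0))
p, c = step(0.1)
print(len(p))   # 100 particles, one draw call
```

The key point is that the per-frame cost is a handful of array operations plus one draw call, instead of hundreds of state changes; the same idea carries straight over to Java arrays copied into a single ByteBuffer, or to C++ via the NDK.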
Let us know if you find a good solution!