I am implementing TensorFlow object detection in one of my Android apps. I followed the 'Tensorflow-for-poets' demo and tutorial and successfully created a model with it.
I need some help with this:
My requirement is to detect traffic signals. I have a dataset, I have created a model for it, and in the general case it works great.
What I want is to detect which color of the traffic signal is lit, i.e. is it green or red?
I have added a dataset with two types of images, green-lit and red-lit traffic signals, but the model just detects a traffic signal.
Can anyone help me with this or guide me on how I can achieve this?
Why would you like to detect the color of the traffic light?
IMHO it would be more robust to determine which of the lights is shining, e.g. create two classes "red traffic light" and "green traffic light" and train your model on them.
The answer given in the post referenced in the comments by sladomic is not invariant against noise. Say you have an image in the late afternoon, when the sun sets: you will likely have a reddish-lit environment. So determining the amount of red pixels within the bounding box of your detected traffic light may fail, because the amount of red pixels caused by the environment is larger than the amount of green pixels coming from the lit lamp.
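To illustrate why position matters, here is a minimal sketch of a heuristic that only counts red pixels near the top of the detected box and green pixels near the bottom (the class name, the HSV thresholds and the vertical-lamp assumption are all mine, and this is still less robust than training explicit "red traffic light" / "green traffic light" classes as suggested above):

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class TrafficLightState {
        /** Guesses whether a detected (vertical) traffic light is showing red or green. */
        public static String lampState(Mat bgrFrame, Rect lightBox) {
            Mat light = bgrFrame.submat(lightBox);
            Mat hsv = new Mat();
            Imgproc.cvtColor(light, hsv, Imgproc.COLOR_BGR2HSV);

            // Red wraps around hue 0 in OpenCV's 0-180 hue scale, so combine two ranges.
            Mat red1 = new Mat(), red2 = new Mat(), redMask = new Mat(), greenMask = new Mat();
            Core.inRange(hsv, new Scalar(0, 120, 120), new Scalar(10, 255, 255), red1);
            Core.inRange(hsv, new Scalar(170, 120, 120), new Scalar(180, 255, 255), red2);
            Core.bitwise_or(red1, red2, redMask);
            Core.inRange(hsv, new Scalar(45, 100, 120), new Scalar(90, 255, 255), greenMask);

            // Only count red pixels in the top third (where the red lamp sits) and
            // green pixels in the bottom third, which suppresses reddish ambient light.
            int h = lightBox.height;
            int redOn = Core.countNonZero(redMask.submat(new Rect(0, 0, lightBox.width, h / 3)));
            int greenOn = Core.countNonZero(greenMask.submat(new Rect(0, 2 * h / 3, lightBox.width, h - 2 * h / 3)));

            return redOn > greenOn ? "RED" : "GREEN";
        }
    }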
I'm trying to work out how to go about creating an on-the-fly simplification of incoming RGB values.
I'm trying to write an Android app that uses a live camera view and samples colors. I've worked out how to detect and save individual color values, but my aim is to simplify these incoming values using clear ranges.
Example: when we detect Firebrick Red (178,34,34), the app would recognize that value as falling within a predefined range defined as Red, and it would be converted to a simple 255,0,0 when the color is saved.
The app is being put together in unity. If anyone has read a guide that goes over the process that would be ideal, so I can learn what is going on and how it is achieved. I'm stumped.
Thanks in advance for any help.
So the problem is that it's hard to define what "red" is. It's not just that different people have different definitions; different cultures also have an effect on what we think colors are (some cultures don't consider red and yellow to be different colors, and at least one tribal culture still present today has no words for colors at all; see https://www.sapiens.org/language/color-perception/). So doing this is always a best-effort type of deal.
One simple thing you could do is a least-difference algorithm. Have a set of reference colors, and see which one has the smallest delta from the color you're looking up. Then treat the sample as that reference color. That will work, kind of, provided you have enough colors in your set that no sample is too far from its nearest reference.
That will only kind of work though: the RGB channels aren't equally distinct to the human eye, and some differences matter more than others; it's non-linear. A difference of 10 in green is more important than a difference of 10 in red, and a difference of 10 in the range [0,20] may be more or less stark than a difference of 10 in the range [100,120]. If you need this to work really well you may need to talk to someone who's studied color and how the human eye works to come up with a custom algorithm. Having worked on printers once upon a time, we had teams of experts figuring out how digital colors map to ink. It's much the same here.
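For the simple least-difference approach, here is a minimal sketch (the class name and reference palette are made up; plain Euclidean distance in RGB, ignoring the perceptual caveats above). The question's app is built in Unity, but the same logic ports directly to C#:

    public class ColorSnapper {
        // Hypothetical reference palette; extend with as many anchors as you need.
        private static final int[][] REFERENCES = {
                {255, 0, 0},    // red
                {0, 255, 0},    // green
                {0, 0, 255},    // blue
                {255, 255, 0},  // yellow
                {0, 0, 0},      // black
                {255, 255, 255} // white
        };

        /** Snaps an incoming RGB sample to the nearest reference color. */
        public static int[] snap(int r, int g, int b) {
            int[] best = REFERENCES[0];
            double bestDist = Double.MAX_VALUE;
            for (int[] ref : REFERENCES) {
                // Squared distance is enough for comparison; weighting the channels
                // differently would roughly approximate perceptual sensitivity, as noted above.
                double d = Math.pow(r - ref[0], 2) + Math.pow(g - ref[1], 2) + Math.pow(b - ref[2], 2);
                if (d < bestDist) { bestDist = d; best = ref; }
            }
            return best;
        }

        public static void main(String[] args) {
            int[] snapped = snap(178, 34, 34);  // Firebrick, from the question
            System.out.println(snapped[0] + "," + snapped[1] + "," + snapped[2]);  // prints 255,0,0
        }
    }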
I am using the GrabCut algorithm from OpenCV for background subtraction of an image on Android. The algorithm runs fine, but the result it gives is not accurate.
E.g. my input image is:
The output image looks like:
So how can we increase the accuracy of the GrabCut algorithm?
P.S.: Apologies for not uploading the example images due to low reputation :(
I have been battling with the same problem for quite some time now. I have a few tips and tricks for this:
1> Improve your seeds. Considering that GrabCut is basically a black box, to which you give seeds and expect the segmented image as output, the seeds are all you can control, so it becomes imperative to select good seeds. There are a number of things you can do in this regard if you have some expectation for the image you want to segment. For a few cases consider these:
a> Will your image have humans? Use a face detector to find the face and mark those pixels as Probable/definite foreground, as you deem fit. You could also use skin colour models within some region of interest to further refine your seeds
b> If you have some data on what kind of foreground you expect after segmentation, you can train colour models and use them as well to mark even more pixels
The list can go on. You need to creatively come up with different ways to add more accurate seeds.
2> Post-processing: Try simple post-processing techniques like the opening and closing operations to smooth your fgmask (see the sketch after this answer). They will help you get rid of a lot of noise in the final output.
In general, graph cut (and hence GrabCut) tends to snap to edges, so if you have strong edges close to your foreground boundary, you can expect inaccuracies in the result.
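A minimal sketch of rectangle-seeded GrabCut followed by the morphological post-processing from point 2 (the file names, seed rectangle and kernel size are placeholders; on Android you would feed camera frames instead of reading from disk):

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class GrabCutPostProcess {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            Mat img = Imgcodecs.imread("input.jpg");  // placeholder input path
            Mat mask = new Mat();                     // GrabCut writes its labels here
            Mat bgdModel = new Mat();
            Mat fgdModel = new Mat();

            // Seed with a rectangle that loosely surrounds the expected foreground.
            Rect rect = new Rect(50, 50, img.cols() - 100, img.rows() - 100);
            Imgproc.grabCut(img, mask, rect, bgdModel, fgdModel, 5, Imgproc.GC_INIT_WITH_RECT);

            // Keep definite + probable foreground pixels as the fgmask.
            Mat fgMask = new Mat();
            Core.inRange(mask, new Scalar(Imgproc.GC_PR_FGD), new Scalar(Imgproc.GC_PR_FGD), fgMask);
            Mat fgMaskDef = new Mat();
            Core.inRange(mask, new Scalar(Imgproc.GC_FGD), new Scalar(Imgproc.GC_FGD), fgMaskDef);
            Core.bitwise_or(fgMask, fgMaskDef, fgMask);

            // Post-process: opening removes speckle noise, closing fills small holes.
            Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5, 5));
            Imgproc.morphologyEx(fgMask, fgMask, Imgproc.MORPH_OPEN, kernel);
            Imgproc.morphologyEx(fgMask, fgMask, Imgproc.MORPH_CLOSE, kernel);

            Mat result = new Mat();
            img.copyTo(result, fgMask);
            Imgcodecs.imwrite("segmented.jpg", result);
        }
    }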
Last week I chose my major project. It is a vision-based system to monitor cyclists passing certain points on the course in time-trial events. It should detect the bright yellow race number on a cyclist's back, extract the number from it, and also record the time.
I have done some research about it and decided to use the Tesseract Android Tools by Robert Theis, called Tess Two. To speed up the process of recognizing the text, I want to use the fact that the number is meant to be extracted from a bright (yellow) rectangle on the cyclist's back, and focus the actual OCR only on it. I have not found any piece of code or any ideas on how to detect geometric figures with a specific color. Thank you for any help. And sorry if I made any mistakes; I am pretty new on this website.
Where are the images coming from? I ask because I was asked to provide some technical help for the design of a similar application (we were working with footballers' shirts) and I can tell you that you'll have a few problems:
Use a high quality video feed rather than rely on a couple of digital camera images.
The number will almost certainly be 'curved' or distorted because of the movement of the rider and being able to use a series of images will sometimes allow you to work out what number it really is based on a series of 'false reads'
Train for the font you're using but also apply as much logic as you can (if the numbers are always two digits and never start with '9', use this information to help you get the right number).
If you have the luxury of being able to position the camera (we didn't!), I would have thought your ideal spot would be above the rider and looking slightly forward so you can capture their back with the minimum of distortions.
We found that merging several still-frames from the video into one image gave us the best overall image of the number - however, the technology that was used for this was developed by a third-party and they do not want to release it, I'm afraid :(
Good luck!
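For the colour-gating step the question asks about (isolating the bright yellow plate before handing a crop to Tess Two), a minimal OpenCV sketch might look like this (the HSV thresholds and the assumption that the plate is the largest yellow blob are mine; tune both on real footage):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class YellowPlateFinder {
        /** Returns a crop of the largest bright-yellow region, or null if none is found. */
        public static Mat cropYellowPlate(Mat bgrFrame) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(bgrFrame, hsv, Imgproc.COLOR_BGR2HSV);

            // Rough yellow range in OpenCV's H in [0,180], S and V in [0,255].
            Mat yellowMask = new Mat();
            Core.inRange(hsv, new Scalar(20, 100, 100), new Scalar(35, 255, 255), yellowMask);

            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(yellowMask, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            MatOfPoint largest = null;
            double largestArea = 0;
            for (MatOfPoint c : contours) {
                double area = Imgproc.contourArea(c);
                if (area > largestArea) { largestArea = area; largest = c; }
            }
            if (largest == null) return null;

            // The bounding box of the biggest yellow blob is the candidate number plate.
            Rect box = Imgproc.boundingRect(largest);
            return bgrFrame.submat(box);  // pass this crop to Tess Two instead of the whole frame
        }
    }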
I have an application where I want to track 2 objects at a time that are rather small in the picture.
This application should be running on Android and iPhone, so the algorithm should be efficient.
For my customer it is perfectly fine if we deliver some patterns along with the software that are attached to the objects to be tracked to have a well-recognizable target.
This means that I can make up a pattern on my own.
As I am not that much into image processing yet, I don't know which objects are easiest to recognize in a picture even if they are rather small.
Color is also possible, although processing several planes separately is not desired because of the generated overhead.
Thank you for any advice!!
Best,
guitarflow
If I get this straight, your object should:
Be printable on an A4
Be recognizable up to 4 meters
Rotational invariance is not so important (I'm making the assumption that the user will hold the phone +/- upright)
I recommend printing a large checkerboard and using a combination of color matching and corner detection. Try different combinations to see what's faster and more robust at different distances.
Color: if you only want to work on one channel, you can print in red/green/blue*, and then work only on that respective channel. This will already filter a lot and increase contrast "for free".
Otherwise, a histogram backprojection is in my experience quite fast. See here.
Also, let's say you have only 4 squares with RGB+black (see image). It would be easy to get all red contours, then check whether each one has the correct neighbouring colors: a patch of blue to its right and a patch of green below it, both of roughly the same area. This alone might be robust enough, and it is equivalent to working on one channel since at each step you're only accessing one specific channel (search for contours in red, check right in blue, check below in green).
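A minimal sketch of that neighbour-colour check (the class name, HSV thresholds and the simple channel-dominance test for the neighbouring patches are all mine):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class ColorPatchMarker {
        /** Returns bounding boxes of red patches that have blue to the right and green below. */
        public static List<Rect> findMarkers(Mat bgr) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV);

            // Red wraps around hue 0, so combine two ranges; thresholds are illustrative only.
            Mat red1 = new Mat(), red2 = new Mat(), red = new Mat();
            Core.inRange(hsv, new Scalar(0, 120, 80), new Scalar(10, 255, 255), red1);
            Core.inRange(hsv, new Scalar(170, 120, 80), new Scalar(180, 255, 255), red2);
            Core.bitwise_or(red1, red2, red);

            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(red, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            List<Rect> hits = new ArrayList<>();
            for (MatOfPoint c : contours) {
                Rect r = Imgproc.boundingRect(c);
                Rect right = new Rect(r.x + r.width, r.y, r.width, r.height);
                Rect below = new Rect(r.x, r.y + r.height, r.width, r.height);
                if (!contains(bgr, right) || !contains(bgr, below)) continue;

                // Check which channel dominates the mean colour of each neighbouring patch (BGR order).
                Scalar rightMean = Core.mean(bgr.submat(right));
                Scalar belowMean = Core.mean(bgr.submat(below));
                boolean blueRight = rightMean.val[0] > rightMean.val[1] && rightMean.val[0] > rightMean.val[2];
                boolean greenBelow = belowMean.val[1] > belowMean.val[0] && belowMean.val[1] > belowMean.val[2];
                if (blueRight && greenBelow) hits.add(r);
            }
            return hits;
        }

        private static boolean contains(Mat img, Rect r) {
            return r.x >= 0 && r.y >= 0 && r.x + r.width <= img.cols() && r.y + r.height <= img.rows();
        }
    }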
If you're getting a lot of false-positives, you can then use corners to filter your hits. In the example image, you have 9 corners already, in fact even more if you separate channels, and if it isn't enough you can make a true checkerboard with several squares in order to have more corners. It will probably be sufficient to check how many corners are detected in the ROI in order to reject false-positives, otherwise you can also check that the spacing between detected corners in x and y direction is uniform (i.e. form a grid).
Corners: Detecting corners has been greatly explored and there are several methods here. I don't know how efficient each one is, but they are fast enough, and after you've reduced the ROIs based on color, this should not be an issue.
Perhaps the simplest is to erode/dilate with a cross-shaped kernel to find corners. See here.
You'll want to first threshold the image to create a binary map, probably based on color as mentioned above.
Other corner detectors such as the Harris detector are well documented.
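For the corner-filtering step, here is a minimal sketch using the Harris detector to count corner responses inside a candidate ROI (the class name and parameter values are illustrative):

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class CornerFilter {
        /** Counts Harris corner responses inside a candidate ROI to reject false positives. */
        public static int countCorners(Mat bgrRoi) {
            Mat gray = new Mat();
            Imgproc.cvtColor(bgrRoi, gray, Imgproc.COLOR_BGR2GRAY);
            gray.convertTo(gray, CvType.CV_32F);

            // blockSize, apertureSize and k are the usual Harris parameters; tune for your pattern.
            Mat response = new Mat();
            Imgproc.cornerHarris(gray, response, 2, 3, 0.04);

            Core.MinMaxLocResult mm = Core.minMaxLoc(response);
            Mat strong = new Mat();
            // Keep only responses above 1% of the maximum and count the surviving pixels.
            Imgproc.threshold(response, strong, 0.01 * mm.maxVal, 255, Imgproc.THRESH_BINARY);
            strong.convertTo(strong, CvType.CV_8U);
            return Core.countNonZero(strong);
        }
    }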
Oh and I don't recommend using Haar-classifiers. Seems unnecessarily complicated and not so fast (though very robust for complex objects: i.e. if you can't use your own pattern), not to mention the huge amount of work for training.
Haar training is your friend mate.
This tutorial should get you started: http://note.sonots.com/SciSoftware/haartraining.html
Basically you train something called a classifier based on sample images (2000 or so of the object you want to track). OpenCV already has the tools required to build these classifiers and functions in the library to detect objects.
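Once a cascade has been trained, using it from OpenCV's Java bindings is only a few lines; a minimal sketch (the XML file name and image path are placeholders for whatever your training run and camera pipeline produce):

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.objdetect.CascadeClassifier;

    public class CascadeDetectDemo {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            // "my_pattern_cascade.xml" stands in for whatever opencv_traincascade produced.
            CascadeClassifier detector = new CascadeClassifier("my_pattern_cascade.xml");
            Mat frame = Imgcodecs.imread("frame.jpg", Imgcodecs.IMREAD_GRAYSCALE);

            MatOfRect detections = new MatOfRect();
            detector.detectMultiScale(frame, detections);

            for (Rect r : detections.toArray()) {
                System.out.println("Hit at " + r.x + "," + r.y + " size " + r.width + "x" + r.height);
            }
        }
    }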
Is the technology there for a smartphone camera to detect a flashing light and decode it as Morse code at a maximum distance of 100 m?
There's already at least one app in the iPhone App store that does this for some unknown distance. And the camera can detect luminance at a much greater distance, given enough contrast of the exposure between on and off light levels, a slow enough dot rate to not alias against the frame rate (remember about Nyquist sampling), and maybe a tripod to keep the light centered on some small set of pixels. So the answer is probably yes.
I think it's possible in ideal conditions: clear air and no other "light noise", like on a dark night in the mountains. The problem is that users would try to use it in the city, at discos, etc., where it would obviously fail.
If you can record a video of the light and easily visually decode it upon watching, then there's a fair chance you may be able to do so programmatically with enough work.
The first challenge would be finding the light in the background, especially if it's small and/or there's any movement of the camera or source. You might actually be able to leverage some kinds of video compression technology to help filter out the movement.
The second question is if the phone has enough horsepower and your algorithm enough efficiency to decode it in real time. For a slow enough signaling rate, the answer would be yes.
Finally there might be things you could do to make it easier. For example, if you could get the source to flash at exactly half the camera frame rate when it is on instead of being steady on, it might be easier to identify since it would be in every other frame. You can't synchronize that exactly (unless both devices make good use of GPS time), but might get close enough to be of help.
Yes, the technology is definitely there. I wrote an Android application for my "Advanced Internet Technology" class which does exactly what you describe.
The application still has problems with bright noise (when other light sources leave or enter the camera view while recording). The approach that I'm using just uses the overall brightness changes to extract the Morse signal.
There are some more or less complicated algorithms in place to correct the auto exposure problem (the image darkens shortly after the light is "turned on") and to detect the thresholds for the Morse signal strength and speed.
Overall performance of the application is good. I tested it during the night in the mountains, and as long as the sending signal is strong enough, there is no problem. In the library (with different light sources around), it was less accurate. I had to be careful not to have additional light sources at the "edge" of the camera screen. The application required the length of a "short" Morse signal to be at least 300 ms.
The better approach would be to "search" the screen for the actual light source. For my project it turned out to be too much work, but with it you should get good detection in noisy environments.
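A minimal sketch of the overall-brightness approach described above: record the mean luminance of each frame, threshold at the midpoint between the darkest and brightest frames, and turn the result into on/off run lengths (the class and method names are made up, and a real app would also handle the auto-exposure drift mentioned earlier):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public class MorseBrightnessDecoder {
        private final List<Double> frameBrightness = new ArrayList<>();

        /** Call once per camera frame: records the mean luminance of the whole image. */
        public void onFrame(Mat bgrFrame) {
            Mat gray = new Mat();
            Imgproc.cvtColor(bgrFrame, gray, Imgproc.COLOR_BGR2GRAY);
            frameBrightness.add(Core.mean(gray).val[0]);
        }

        /** Turns the brightness trace into on/off run lengths, measured in frames. */
        public List<int[]> runs() {
            // Threshold halfway between the darkest and brightest frames seen so far.
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double b : frameBrightness) { min = Math.min(min, b); max = Math.max(max, b); }
            double threshold = (min + max) / 2.0;

            List<int[]> runs = new ArrayList<>();   // each entry: {state (1=on, 0=off), length}
            int state = -1, length = 0;
            for (double b : frameBrightness) {
                int s = b > threshold ? 1 : 0;
                if (s == state) {
                    length++;
                } else {
                    if (state != -1) runs.add(new int[]{state, length});
                    state = s;
                    length = 1;
                }
            }
            if (state != -1) runs.add(new int[]{state, length});
            return runs;   // short "on" runs = dots, long "on" runs = dashes, long "off" runs = gaps
        }
    }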