I would like to detect some nutrition facts on food packages with an Android application, using OpenCV.
So far I have managed to do it with one image of a nutrition table, but of course it only works with that one.
The goal is to detect and retrieve the values of Energy, Proteines (proteins), and Glucides (carbohydrates), for 100g of product. This information is present in almost every table, which is why I focus only on it for the moment.
So I was wondering: is there a good method to do this? For the moment, I try to detect each block of text, recognise it with Tesseract, and if it matches a word I'm looking for, I get the corresponding column and line in the picture to finally get the value I want.
Is there any way to detect the words directly, and get the value that fits best in the image (in terms of alignment with the "100g" column)?
Typical image : hpics.li/4231f79
Sorry if my problem is not well explained; just ask if something is not clear or if you want me to explain more about what I've done so far. Also, sorry for my English.
Cheers
Just a few ideas:
1. Convert the image to HSV color space and look only for black and white regions (using the inRange function; see the sketch after this list). Blobs which contain only those two colors will probably be your information (but unfortunately some other things too - the barcode, maybe some drawing or a logo).
2. Your regions should be rectangles, so if a blob is not a rectangle - discard it.
3. If the rectangle you found is rotated, use an affine transform (warpAffine) to align it vertically - here I've explained how to do it. Note that the rectangle's width and height should stay the same.
4. After the affine transform your rectangle might still be rotated by 90, 180 or 270 degrees. In the example you provided there is a black region on the top - if that's true for all your images, then finding the top is quite easy: just find the black rectangle within your region. Otherwise finding the top might be harder. A quick idea, which might be worth testing, is to look for black pixels in each white rectangle: in most cases they are aligned to the center (not an interesting case for us) or to the left - if you find the left side of the rectangle, finding the top is obvious :) Alternatively you may look for characters which are always on the right side - %, g and mg.
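A minimal sketch of step 1, using the OpenCV Java bindings (all threshold values here are guesses that you will need to tune on real photos):

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Scalar;
    import org.opencv.imgproc.Imgproc;

    public class BlackWhiteMask {
        // Returns a binary mask of pixels that are near-black or near-white.
        static Mat mask(Mat bgr) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV);
            Mat black = new Mat(), white = new Mat();
            Core.inRange(hsv, new Scalar(0, 0, 0), new Scalar(180, 255, 60), black);   // low value = black
            Core.inRange(hsv, new Scalar(0, 0, 200), new Scalar(180, 40, 255), white); // low saturation, high value = white
            Mat mask = new Mat();
            Core.bitwise_or(black, white, mask);
            return mask; // blobs containing only these two colors are table candidates
        }
    }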
If you have any problems, give us more examples and describe what you have done already - right now it's hard to say anything more.
Let's say for example I have a bitmap image of a tree, and I want to position other images (such as bitmaps of apples) on the tree leaves. Is there a way that I could put markers on the leaves... red dots for instance... and then programmatically place apple images centered on those dots?
As a very basic test, I have an image with a white background and one red pixel in the center. I'd like to calculate the coordinates of this red point, and then set an ImageView to be placed at those coordinates.
How might I go about this?
It depends where your 'red point' marker is. If it's in the center or at any specific point (like 2/3 of the width, 1/3 of the height), you can just divide the layout width and height to get the right coordinates.
In other cases it would be better to set a white background and draw the markers manually in an overridden dispatchDraw method. That way you simply know the coordinates of the marker.
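If you go the dispatchDraw route, here is a minimal sketch (the class name and the 2/3 - 1/3 position are just illustrative):

    import android.content.Context;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.widget.FrameLayout;

    // A container that draws its own marker, so the marker's coordinates
    // are known by construction instead of being rediscovered in the bitmap.
    public class MarkerLayout extends FrameLayout {
        private final Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);

        public MarkerLayout(Context context) {
            super(context);
            paint.setColor(Color.RED);
        }

        @Override
        protected void dispatchDraw(Canvas canvas) {
            super.dispatchDraw(canvas); // draw children first
            // Marker at 2/3 of the width and 1/3 of the height.
            canvas.drawCircle(getWidth() * 2f / 3f, getHeight() / 3f, 10f, paint);
        }
    }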
You want to position an image over the red dot, right?
I'm thinking of two different ways:
A -> You could make the red dot an ImageView itself and center it using gravity; placing the apple is then just a matter of swapping it for another kind of image.
Or...
B -> Make a container that uses the white background with the red dot as its background resource. Then center it using gravity too, and finally position your image at the center of the container, so it will be over the red dot.
No calculation is needed, if you think this could help.
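For approach B, a rough sketch in code (the drawable names are hypothetical, and it assumes the red dot sits at the exact center of the background image):

    import android.content.Context;
    import android.view.Gravity;
    import android.widget.FrameLayout;
    import android.widget.ImageView;

    public class AppleOverlay {
        static FrameLayout build(Context context) {
            // Container whose background is the white image with the centered red dot.
            FrameLayout container = new FrameLayout(context);
            container.setBackgroundResource(R.drawable.background_with_dot); // hypothetical resource

            // The apple image, centered by gravity so it lands over the dot.
            ImageView apple = new ImageView(context);
            apple.setImageResource(R.drawable.apple); // hypothetical resource
            container.addView(apple, new FrameLayout.LayoutParams(
                    FrameLayout.LayoutParams.WRAP_CONTENT,
                    FrameLayout.LayoutParams.WRAP_CONTENT,
                    Gravity.CENTER));
            return container;
        }
    }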
It sounds like you are the one putting the markers onto your bitmaps.
If that is the case, is there a really good reason why you would want to embed the markers as data in the bitmap itself? That leads you to the problem of having to rediscover their locations. This could be a fuzzy task... what if there is a red barn next to the tree? Are you going to put an apple image on every red pixel making up the barn?
What you might actually want is to define a format which has a bitmap with no markers on it, and then a separate list of coordinates for where you want the apples to go. That doesn't require discovery of any kind...you just ship the image along with the list and you are done.
There are some cases where there is no "place on the side" that you can put information, and you actually need it to go into the bitmap file. If so, consider also that there are some hidden places you can put data in bitmaps... metadata like Exif:
http://en.wikipedia.org/wiki/Exchangeable_image_file_format
So that's a middle-ground, where you can manage to get the list of points to "stow away" into the file containing the image without actually requiring the modification of the pixels.
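As a sketch of that middle ground, assuming the android.media.ExifInterface API (TAG_USER_COMMENT requires API 24+) and an illustrative "x,y;x,y" encoding for the point list:

    import android.media.ExifInterface;
    import java.io.IOException;

    public class MarkerMetadata {
        // Stow the apple coordinates in the JPEG's Exif data instead of its pixels.
        static void writePoints(String jpegPath, String points) throws IOException {
            ExifInterface exif = new ExifInterface(jpegPath);
            exif.setAttribute(ExifInterface.TAG_USER_COMMENT, points); // e.g. "120,340;200,310"
            exif.saveAttributes();
        }

        static String readPoints(String jpegPath) throws IOException {
            return new ExifInterface(jpegPath).getAttribute(ExifInterface.TAG_USER_COMMENT);
        }
    }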
If you find you are really stuck in a situation where you must put these coordinate specifications into the image data, then something a little bit more unique than a red dot would be easier to detect with certainty. Maybe there's something you know about your images... for instance, that they are PNG files and do not have any transparency. You could make transparent dots indicate substitution points.
The larger and weirder the pattern, the rarer it is... so if you know your objects being pasted are always going to be bigger than 3x3, you could come up with a very unusual 3x3 pixel imprint for your markers that would be unlikely to occur in nature. Uncompressed in 24-bit color, a sufficiently random pattern would only happen by accident with probability 1/(2^24)^9 = 1/2^216 - a vanishingly small number, although compression would create more gray areas.
But greater point being: if you don't have a good reason to turn a simple problem into a complex image-recognition exercise, don't. Just keep the list of points on the side somewhere so you don't have to hunt for them in the image.
Okay so I have these images:
Basically what I'm trying to do is to create a "mosaic" of about 5 to 12 hexagons, with most of them roughly centralised, and where all of the lines meet up.
For example:
I'm aware that I could probably just brute-force it, but as I'm developing for Android I need a faster, more efficient and less processor-intensive way of doing it.
Can anybody provide me with a solution, or even just point me in the right direction?
A random idea that I had is to go with what Deepak said about defining a class that tracks the state of each of its six edges (say, an int[] neighbor in which neighbor[0] states whether the top edge has a neighbor, neighbor[1] states whether the top-right edge has a neighbor, and so on going clockwise).
Then for each hexagon on screen, convert its array to an integer by reading it as binary. Based on that integer, use a lookup table to determine which hexagon image to use and how it should be oriented/flipped, then assign that image to the hexagon object.
For instance, let's take the central hexagon with four neighbors in your first screenshot. Its array would be [1, 0, 1, 1, 0, 1] based on the scheme mentioned above. Take neighbor[0] to be the least-significant bit (2^0) and neighbor[5] to be the most-significant bit (2^5), and we have [1, 0, 1, 1, 0, 1] --> 45. Somewhere in a lookup table we would have already defined 45 to mean the 5th hexagon image, flipped horizontally*, among the seven base hexagon icons you've posted.
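A minimal sketch of that encoding (the names are illustrative):

    public class HexLookup {
        // neighbor[i] == 1 if edge i touches a neighbor; neighbor[0] is the least-significant bit.
        static int edgeMask(int[] neighbor) {
            int mask = 0;
            for (int i = 0; i < 6; i++) {
                if (neighbor[i] == 1) mask |= 1 << i;
            }
            return mask;
        }

        public static void main(String[] args) {
            // The central hexagon from the example: [1, 0, 1, 1, 0, 1] -> 0b101101 = 45.
            // A 64-entry lookup table then maps each mask to an (image, rotation/flip) pair.
            System.out.println(edgeMask(new int[] {1, 0, 1, 1, 0, 1})); // prints 45
        }
    }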
Yes, brute-force is involved, but it's a "smarter" brute-force since you're not rotating to see if a hexagon will fit. Rather, it involves a more efficient look-up table.
*or rotated 120 degrees clockwise if you prefer ;)
Nice and tricky question. What you can start with is defining an object for each image, with attributes that specify which edges have a line attached to them. Then, while adding the images to the layout, you can rotate each one in such a way that an edge with a line in one image lies adjacent to the other image's edge with a line. It may be a little complicated, but I hope you can at least start with something like this.
I want to recognize shapes like circles, triangles and rectangles drawn on the screen. My main aim is that a user draws a shape on screen, and I need code to recognize this shape. How should I approach this problem?
What you are trying to achieve can be quite tricky, but I happened to implement something similar a while ago, and here is the approach that I used:
stick to black & white drawings
have a smallish database of (black & white) drawings (50 or so) with a fixed resolution, let's say 256x256 (you can store them in SQLite as binary blobs if you wish). Make sure that you use decently thick lines for these drawings (10 px should be OK, or something about twice as thick as the user's input drawing). Also, the drawings should be normalized, meaning that at least one of their dimensions must be as large as the image itself.
extract the shape drawn by the user and process it:
a) if it has an aspect ratio close to a square, then simply crop the white space around it and enlarge it such that it has the same size as your database images
b) Otherwise, it will most likely have one dimension about two times larger than the other, in which case you crop the white space, rotate it to have the height as its biggest dimension, enlarge it to 256x128, and then add 64 px of white space on both sides.
you'll have to compare your drawing with each of your database images pixel by pixel and determine the number of black pixels which overlap for each database image (see the sketch after these steps). Then you sort these numbers and you'll get the best match. Even if the best match has less than 20% overlapping pixels, the results are usually good.
Because some shapes can be considered the same, even if they are rotated (imagine various ways to place a triangle in an image: one tip pointing up, or down, or towards one side etc), you'll probably want to rotate your input drawing around 12 - 24 times (by 15 - 30 degrees at each step) and compare each rotation to every image in your database. Given that this step will most likely require a lot of processing power, you might consider storing all the rotations of your initial database drawings in the database, as different pictures, thus making the database bigger, but saving you the effort of rotating the input image, which is costly.
Given that the above algorithm is a bit of a resource hog, you might consider having a server somewhere, which can do the actual comparisons, especially if you want to add many images to your database. Since I already implemented this algorithm for a demo application, I can already tell you that you're going to have to do a lot of pixel operations. Also, rotating images with the Android SDK can be annoying, because it changes the image dimensions...
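The core comparison itself is simple; here is a rough sketch of the overlap score, assuming both bitmaps were already normalized to the same size and binarized (black shape on white background):

    import android.graphics.Bitmap;

    public class ShapeMatcher {
        static int overlapScore(Bitmap drawing, Bitmap template) {
            int score = 0;
            for (int y = 0; y < drawing.getHeight(); y++) {
                for (int x = 0; x < drawing.getWidth(); x++) {
                    boolean a = (drawing.getPixel(x, y) & 0xFF) < 128; // blue channel ~ "is black"
                    boolean b = (template.getPixel(x, y) & 0xFF) < 128;
                    if (a && b) score++; // count overlapping black pixels
                }
            }
            return score; // run against every database image and keep the highest
        }
    }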
If you are feeling adventurous, here are a couple of papers describing state of the art algorithms for tackling this problem: "Shape contexts enable efficient retrieval of similar shapes" by Greg Mori, Serge Belongie and Jitendra Malik (2001) and "Shape Matching: Similarity Measures and Algorithms" by Remco C. Veltkamp (2001). The maths might be a bit heavy, though.
You should look into GestureOverlayView.
A good tutorial is: http://www.vogella.com/articles/AndroidGestures/article.html
I'm making an android app that takes an image of a billiards game in progress and detects the positions of the various balls. The image is taken from someone's phone, so of course I don't have a perfect overhead view of the table. Right now I'm using houghcircles to find the balls, and it's doing an ok job, but it seems to miss a few balls here and there, and then there are the false positives.
My biggest problem right now is: how do I cut down on the false positives found outside the table? I'm using an ROI to cut off the top portion of the image because it's mostly wasted space, but I can't make it any smaller or I risk cutting off portions of the table, since it's a trapezoidal shape. My current idea is to overlay the guide that the user sees when taking the picture on top of the image, but the problem with that is that I don't know what the resolution of their cameras would be, so the overlay might cover up the wrong spots. Ideally I think I would want to use the Hough line transform, but when I tried it my app crashed from what I believe was a lack of memory. Any ideas?
Here is a link to the results I'm getting:
http://graphiquest.com/cvhoughcircles.html
Here is my code:
    // Load the photo and convert it to grayscale.
    IplImage img = cvLoadImage("/sdcard/DCIM/test/picture" + i + ".jpg", 1);
    IplImage gray = opencv_core.cvCreateImage(opencv_core.cvSize(img.width(), img.height()), opencv_core.IPL_DEPTH_8U, 1);
    cvCvtColor(img, gray, opencv_imgproc.CV_RGB2GRAY);

    // Crop off the top 15% of the image (mostly wasted space above the table),
    // keeping a region that ends 5% above the bottom edge.
    cvSetImageROI(gray, cvRect(0, (int) (img.height() * .15), img.width(), (int) (img.height() - (img.height() * .20))));
    cvSmooth(gray, gray, opencv_imgproc.CV_GAUSSIAN, 9, 9, 2, 2);

    // Find circles with the Hough transform and draw them onto the (grayscale) image.
    CvMemStorage circles = CvMemStorage.create();
    CvSeq seq = cvHoughCircles(gray, circles, CV_HOUGH_GRADIENT, 2.5d, (double) gray.height() / 30, 70d, 100d, 0, 80);
    for (int j = 0; j < seq.total(); j++) {
        CvPoint3D32f point = new CvPoint3D32f(cvGetSeqElem(seq, j));
        CvPoint center = new CvPoint(Math.round(point.x()), Math.round(point.y()));
        int radius = Math.round(point.z());
        cvCircle(gray, center, 3, CvScalar.GREEN, -1, 8, 0);    // center dot
        cvCircle(gray, center, radius, CvScalar.BLUE, 3, 8, 0); // detected circle
    }

    // Replace any previous output file and save the annotated image.
    String path = "/sdcard/DCIM/test/";
    File photo = new File(path, "picture" + i + "_2.jpg");
    if (photo.exists()) {
        photo.delete();
    }
    cvSaveImage(path + "picture" + i + "_2.jpg", gray);
There are some very helpful constraints you could apply. In addition to doing a rectangular region of interest, you should mask your results with the actual trapezoidal shape of the pool table. Use the color information of the image to find the pool table region. You know that the pool table is a solid color. It doesn't have to be green - you can use some histogram techniques in HSV color space to find the most prevalent color in the image, perhaps favoring pixels toward the center. It's very likely to detect the color of the pool table. Select pixels matching this color, perform morphological operations to remove noise, and then you can treat the mask as a contour, and find its convexHull. Fill the hull to remove the holes created by the pool balls.
What I've said so far should suggest a different approach than Hough circles. Hough circles is probably not working too well since the billiard balls are not evenly illuminated. So, another way to find billiard balls is to subtract the pool table color mask from its convexHull. You'll be left with the areas of the table that are obscured by balls.
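A hedged sketch of that color-mask idea, using the OpenCV Java bindings (your code uses JavaCV, but the calls map closely; all thresholds and sizes here are illustrative):

    import java.util.Arrays;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class TableMask {
        static Mat tableMask(Mat bgr) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV);

            // Histogram the hue channel over the central quarter of the image,
            // where the table bed is most likely to be.
            Mat center = new Mat(hsv, new Rect(bgr.cols() / 4, bgr.rows() / 4,
                                               bgr.cols() / 2, bgr.rows() / 2));
            Mat hist = new Mat();
            Imgproc.calcHist(Arrays.asList(center), new MatOfInt(0), new Mat(),
                    hist, new MatOfInt(180), new MatOfFloat(0, 180));
            int hue = (int) Core.minMaxLoc(hist).maxLoc.y; // most prevalent hue

            // Select pixels near that hue, then clean up the mask with an opening.
            Mat mask = new Mat();
            Core.inRange(hsv, new Scalar(Math.max(hue - 10, 0), 60, 40),
                         new Scalar(Math.min(hue + 10, 180), 255, 255), mask);
            Imgproc.morphologyEx(mask, mask, Imgproc.MORPH_OPEN,
                    Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(9, 9)));

            // Next: find the largest contour, take its convexHull, and fill it so
            // the balls no longer punch holes in the table region.
            return mask;
        }
    }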
I've thought about working on this problem, too, since I play pool and snooker.
A few points:
1. Judging from the Hough circle fits, it looks like you're not filtering the edge points, or your threshold for edge strength isn't high enough. Are you simply using a binary indicator for edge points, or are you selecting edge points based on edge strength?
2. Can you work in RGB space? That'd help with detecting the table bed, the rails, and also in identifying the balls. A blue blob on the table bed could be the 2-ball, the 10-ball, or maybe a hunk of chalk.
3. In your parameter space, you should be able to limit the search to circles within a very limited radius range. This would be helped in part if...
4. ...you detect the table surface and the rails. A Stroke Width Transform could help you find the rails, especially if you search in a color plane (green) in which the rails will have high contrast. You can also use the six pockets (or at least three pockets) to help identify the pose (position and orientation) of the table.
5. Once the rails are detected, you can use a perspective transform to correct for the perspective distortion. You'll need to do this anyway to place the balls with any sort of accuracy, especially if you want the ball placement to satisfy a serious pool player such as someone who plays One Pocket or Straight Pool. Once you have that transform, you can set fairly tight tolerances for radius in your Hough parameter space.
6. Once you've detected the table bed, you could perform an initial segmentation (that is, region labeling or blob finding) and search only for blobs of a certain area and roundness.
7. A strong, even, diffuse overhead light could help eliminate shadows.
8. You can help filter edge points by accepting (or at least favoring) edge points that have gradients pointed towards other edge points with parallel gradients. If a local collection of edge point pairs "point" at each other via their edge gradients, then they are good candidates for detection (see the sketch after this list).
9. Once you've detected a candidate ball, perform further processing to accept or reject it. A ball should have a relatively uniform hue (cue ball, 1 - 8, or a stripe viewed from the proper angle), or it should have a detectable color stripe and white. The ball surface will not be highly textured like the wood grain of the table.
10. Have an option for the user to take two pictures from slightly different angles. You then have two chances to find balls, and could conceivably solve the correspondence problem of matching the tables and balls in the two images to help locate the balls in the 2D space of the table bed.
11. Consider having a second algorithm such as normalized cross-correlation (simple template matching) to help identify balls or at least likely ball locations.
12. Insist that the center point of the image be located somewhere within the table bed. This can help you identify the positions of the rails, since you can then search radially outward for their edges, and once four (or even just three) rails are found you can reject edge points at radial distances beyond them.
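A very rough sketch of the gradient-pair filtering from point 8 (OpenCV Java bindings; the edge-strength thresholds and the maximum radius are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.CvType;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public class GradientPairs {
        static List<int[]> findPairs(Mat gray, int maxRadius) {
            List<int[]> votes = new ArrayList<>(); // each entry: {centerX, centerY, radius}
            Mat gx = new Mat(), gy = new Mat();
            Imgproc.Sobel(gray, gx, CvType.CV_32F, 1, 0);
            Imgproc.Sobel(gray, gy, CvType.CV_32F, 0, 1);
            float[] ax = new float[1], ay = new float[1], bx = new float[1], by = new float[1];
            for (int y = 0; y < gray.rows(); y++) {
                for (int x = 0; x < gray.cols(); x++) {
                    gx.get(y, x, ax); gy.get(y, x, ay);
                    double mag = Math.hypot(ax[0], ay[0]);
                    if (mag < 200) continue;                    // keep strong edges only
                    double ux = ax[0] / mag, uy = ay[0] / mag;  // unit gradient direction
                    // Walk along the gradient for up to one ball diameter.
                    for (int r = 4; r <= 2 * maxRadius; r++) {
                        int px = (int) Math.round(x + r * ux), py = (int) Math.round(y + r * uy);
                        if (px < 0 || py < 0 || px >= gray.cols() || py >= gray.rows()) break;
                        gx.get(py, px, bx); gy.get(py, px, by);
                        // A strongly anti-parallel partner gradient makes this pair a
                        // candidate diameter: its midpoint votes for a circle center.
                        if (bx[0] * ux + by[0] * uy < -200) {
                            votes.add(new int[] {(x + px) / 2, (y + py) / 2, r / 2});
                        }
                    }
                }
            }
            return votes; // cluster these votes to get circle candidates
        }
    }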
Good luck! It's a fun problem.
EDIT:
I was reading another StackOverflow post and came across this paper, which will give you a much more thorough introduction to the technique I suggested for filtering edge points (item 8):
"Fast Circle Detection Using Gradient Pair Vectors" by Rad, Faez, and Qaragozlou
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.9956
I haven't implemented their algorithm myself yet, but it looks promising. Here's the post where the paper was mentioned:
Three Dimensional Hough Space
So what I want to do is write an application that, at least in the future, could be ported to mobile platforms (such as Android), and that can scan an image of a protein gel and return data such as the number of bands (i.e. weights) in a column, the relative concentration (thickness of the band), and the weight of each band in each column.
For those who aren't familiar: mixtures of denatured proteins (basically, molecules made completely straight) are loaded into each column, and with the use of electricity the proteins are pulled through a gel (because the proteins are polar molecules). The end columns on each side of this image http://i52.tinypic.com/205cyrl.gif are where you place a mixture of proteins of known weights (so if you have 4 different weights, the band on top is the largest weight, and the weight/size of the protein decreases the further it travels down).
Is something like this possible to analyze using OpenCV? The given image is a really clean-looking gel; they can often get really messy (see Google Images). I figured if I allowed a user to enter the number of columns, specify which columns contain the known weight markers and their actual weights, and provide an adjustable rectangle to size around the edges of the gel, then maybe it would be possible to scan and extract data from images of these gels? I skimmed through a textbook on OpenCV but I didn't see any obvious and reliable way to approach this. Any ideas? Maybe a different library would be better suited?
I believe you can do this using OpenCV.
My approach would be a color-based separation, and then counting the separate components.
In big steps, your app would do the following:
Load the image, and rotate and scale it manually through the GUI of your app to match your needs.
Create a second grayscale image in which each pixel contains a value in [0, 255] that represents how well the color of the original pixel matches the target color (in the case of this image, the shade of blue).
In one of my experiments I used the concept of fuzzy sets and alpha cuts to extract objects of a certain color. The triangular membership function gave me pretty good results (see the sketch after these steps). This simply meant that I defined triangular functions for all three color channels, RGB, and summed their results for each color given as input. If the values of the color were close to the centers of the triangles, then I had a strong similarity. Plus, by controlling the width of the triangles you can define the tolerance of the matches (another option would be to use trapezoidal membership functions).
At this point you have a grayscale image where the background (gel) is black and the proteins are gray/white. If you wish to clear up some noise, use the morphological operators (page 127) erode and dilate (cvErode and cvDilate in OpenCV).
After that, you can use this great OpenCV-based blob extraction library to extract the bounding boxes of the remaining gray areas - representing the proteins.
Having all the coordinates of the bounding boxes, you can apply your own algorithms to extract whatever data you wish.
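As an illustration of the triangular membership step, a small sketch (the target color and the triangle width would be the parameters you expose in the GUI):

    public class FuzzyColorMatch {
        // Similarity in [0, 255]: how closely (r, g, b) matches the target (tr, tg, tb).
        static int similarity(int r, int g, int b, int tr, int tg, int tb, int width) {
            double fr = Math.max(0.0, 1.0 - Math.abs(r - tr) / (double) width); // 1 at the center,
            double fg = Math.max(0.0, 1.0 - Math.abs(g - tg) / (double) width); // 0 at +/- width
            double fb = Math.max(0.0, 1.0 - Math.abs(b - tb) / (double) width);
            // Average the three memberships and scale to an 8-bit gray value.
            return (int) Math.round(255 * (fr + fg + fb) / 3.0);
        }
    }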
In my opinion OpenCV gives you all the necessary tools. However, a fully automated solution might be hard to obtain. But I'm sure you can easily build a GUI where you can set the parameters of the operators you apply during the steps described above.
As for Android: I haven't developed for mobile platforms, but I know that you can create C++ apps for these devices - I have read several questions regarding iPhone & OpenCV - so I think your app would be portable, or at least the image processing part of it (the GUI might be too platform-specific).