Sorry for my bad English. I have the following problem:
Let's say the camera of my mobile device is showing this picture.
In the picture you can see 4 different positions. Every position is known to me (longitude, latitude).
Now I want to know where in the picture a specific position is. For example, I want to draw a rectangle 20 meters in front of me and 5 meters to my left. I only know the latitude/longitude of that point, but I don't know where to place it inside the picture (x, y). For example, POS3 is at (0, 400) in my view, POS4 is at (600, 400), and so on.
Where do I have to put the new point, which is 20 meters in front of me and 5 meters to my left? (So my input is (LatXY, LonXY) and my result should be (x, y) on the screen.)
I also have the height of the camera and the camera's rotation angles around the x, y, and z axes.
Can I use simple mathematical operations to solve this problem?
Thank you very much!
The answer you want will depend on the accuracy you need. As danaid pointed out, nonlinearity in the image sensor and other factors, such as atmospheric distortion, may induce errors, but these would be difficult problems to solve across different cameras on different devices. So let's start with a reasonable approximation that can be tweaked as more accuracy is needed.
First, you may be able to ignore the directional information from the device, if you choose. If you have the five locations (POS1 - POS4 and the camera) in a consistent set of coordinates, you have all you need. In fact, you don't even need all of those points.
A note on consistent coordinates: at this scale, once you convert the latitude and longitude to meters, using cos(lat) as the longitude scaling factor, you should be able to treat everything from a "flat earth" perspective. You then just need to remember that the camera's x-y plane is roughly the global x-z plane.
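A minimal sketch of that "flat earth" conversion in Java, assuming an equirectangular approximation around a reference point (all names here are illustrative, not from the original post):

    static final double EARTH_RADIUS_M = 6371000.0;

    static double[] toLocalMeters(double lat, double lon, double refLat, double refLon) {
        double x = Math.toRadians(lon - refLon) * EARTH_RADIUS_M * Math.cos(Math.toRadians(refLat));
        double y = Math.toRadians(lat - refLat) * EARTH_RADIUS_M;
        return new double[] { x, y }; // x = east, y = north, in meters
    }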
Conceptual Background
The diagram below lays out the projection of the points onto the image plane. The dz used for perspective can be derived directly using the proportion of the distance in view between far points and near points, vs. their physical distance. In the simple case where the line POS1 to POS2 is parallel to the line POS3 to POS4, the perspective factor is just the ratio of the scaling of the two lines:
Scale (POS1, POS2) = pixel distance (pos1, pos2) / Physical distance (POS1, POS2)
Scale (POS3, POS4) = pixel distance (pos3, pos4) / Physical distance (POS3, POS4)
Perspective factor = Scale (POS3, POS4) / Scale (POS1, POS2)
So the perspective factor to apply to a vertex of your rect would be the proportion of the distance to the vertex between the lines. Simplifying:
Factor(rect) ~= [(Rect.z - (POS3, POS4).z) / ((POS1, POS2).z - (POS3, POS4).z)] * Perspective factor
Answer
A perspective transformation is linear with respect to the distance from the focal point in the direction of view. The diagram below is drawn with the X axis parallel to the image plane, and the Y axis pointing in the direction of view. In this coordinate system, for any point P and an image plane any distance from the origin, the projected point p has an X coordinate p.x which is proportional to P.x/P.y. These values can be linearly interpolated.
In the diagram, tp is the desired projection of the target point. To get tp.x, interpolate between, for example, pos1.x and pos3.x, adjusting for distance as follows:
tp.x = pos1.x + (pos3.x - pos1.x) * ((TP.x/TP.y) - (POS1.x/POS1.y)) / ((POS3.x/POS3.y) - (POS1.x/POS1.y))
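As a rough Java sketch of that interpolation (upper-case points are camera-space coordinates in meters with y along the view direction, lower-case values are pixel coordinates; the method name and array layout are just illustrative):

    static double interpolateScreenX(double pos1x, double pos3x,
                                     double[] POS1, double[] POS3, double[] TP) {
        double r1 = POS1[0] / POS1[1];   // POS1.x / POS1.y
        double r3 = POS3[0] / POS3[1];   // POS3.x / POS3.y
        double rt = TP[0] / TP[1];       // TP.x  / TP.y
        return pos1x + (pos3x - pos1x) * (rt - r1) / (r3 - r1);
    }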
The advantage of this approach is that it does not require any prior knowledge of the angle viewed by each pixel, and it will be relatively robust against reasonable errors in the location and orientation of the camera.
Further refinement
Using more data means being able to compensate for more errors. With multiple points in view, the camera location and orientation can be calibrated using the Tienstra method. A concise proof of this approach (using barycentric coordinates) can be found here.
Since the transformations required are all linear in homogeneous coordinates, you could apply barycentric coordinates to interpolate based on any three or more points, given their X,Y,Z,W coordinates in homogeneous 3-space and their (x,y) coordinates in image space. The closer the points are to the destination point, the less significant the nonlinearities are likely to be, so in your example you would use POS1 and POS3, since the rect is on the left, and POS2 or POS4 depending on the relative distance.
(Barycentric coordinates are likely most familiar as the method used to interpolate colors on a triangle (fragment) in 3D graphics.)
Edit: Barycentric coordinates still require the W homogeneous coordinate factor, which is another way of expressing the perspective correction for the distance from the focal point. See this article on GameDev for more details.
Two related SO questions: perspective correction of texture coordinates in 3d and Barycentric coordinates texture mapping.
I see a couple of problems.
The only real mistake is that you're scaling your projection up by _canvasWidth/2 etc. instead of translating that far from the principal point. Add those values to the projected result; multiplying is like "zooming" that far into the projection.
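For example, the projection step might look roughly like this (a sketch only; focalLengthPx, camX, camY, camZ and _canvasHeight are assumed names standing in for your own variables):

    double screenX = focalLengthPx * camX / camZ + _canvasWidth  / 2.0;
    double screenY = focalLengthPx * camY / camZ + _canvasHeight / 2.0;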
Second, dealing in a global Cartesian coordinate space is a bad idea. With the formulae you're using, the difference between (60.1234, 20.122) and (60.1235, 20.122) (i.e. a small latitude difference) causes changes of similar magnitude in all three axes, which doesn't feel right.
It's more straightforward to take the same approach as computer graphics: set your camera as the origin of your "camera space", and convert between world objects and camera space by getting the haversine distance (or similar) between your camera location and the location of the object. See here: http://www.movable-type.co.uk/scripts/latlong.html
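A minimal sketch of that conversion, assuming haversineDistance() and initialBearing() are helper methods implementing the formulas from that page, and camHeadingRad is the camera's compass heading in radians (all names here are assumptions):

    static double[] toCameraSpace(double camLat, double camLon, double camHeadingRad,
                                  double objLat, double objLon) {
        double d = haversineDistance(camLat, camLon, objLat, objLon);     // meters
        double bearing = initialBearing(camLat, camLon, objLat, objLon);  // radians
        double relative = bearing - camHeadingRad;
        return new double[] { d * Math.sin(relative),   // x: right of the camera
                              d * Math.cos(relative) }; // z: in front of the camera
    }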
Third, your perspective projection calculations are for an ideal pinhole camera, which you probably do not have. It will only be a small correction, but to be accurate you need to figure out how to additionally apply the projection that corresponds to the intrinsic parameters of your camera. There are two ways to accomplish this: you can do it as a post-multiplication of the scheme you already have, or you can change from multiplying by a 3x3 matrix to using a full 4x4 camera matrix: http://en.wikipedia.org/wiki/Camera_matrix, with the parameters in there.
Using this approach, the perspective projection is symmetric about the origin: if you don't check for z depth, you'll project points behind you onto your screen as if they were the same z distance in front of you.
Lastly, I'm not sure about the Android APIs, but make sure you're getting a true north bearing and not a magnetic north bearing. Some platforms return either, depending on an argument or configuration. (And check that your angles are in radians if that's what the APIs want, etc. Silly things, but I've lost hours debugging less :) ).
If you know the points in the camera frame and the real world coordinates, some simple linear algebra will suffice. A package like OpenCV will have this type of functionality, or alternatively you can create the projection matrices yourself:
http://en.wikipedia.org/wiki/3D_projection
Once you have a set of corresponding points it is as simple as filling in a few vectors and solving the system of equations. This will give you a projection matrix (with only 4 points you have to assume they are planar). You can then multiply any 3D coordinate by it to find the corresponding 2D image plane coordinate.
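For example, once a 3x4 projection matrix P has been recovered, applying it is just this (a sketch with plain arrays; P itself would come from the calibration or equation-solving step above):

    static double[] project(double[][] P, double X, double Y, double Z) {
        double[] in = { X, Y, Z, 1.0 };   // homogeneous 3D point
        double[] out = new double[3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 4; c++)
                out[r] += P[r][c] * in[c];
        return new double[] { out[0] / out[2], out[1] / out[2] }; // (u, v) in pixels
    }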
Related
I have two 4x4 rotation matrices M and N. M is describing my current object attitude in space, and N is a desired object attitude. Now I would like to rotate M matrix towards N, so the object will slowly rotate towards desired position in following iterations. Any idea how to approach this?
If these matrices are not degenerate, which should be the case for proper rotation matrices, you can do this by interpolating their basis vectors in a polar representation.
To do this, convert the top-left 3x3 part of each matrix into 3 basis vectors defined by angles and lengths. Once this is done, linearly interpolate the angles and lengths for that top-left 3x3 part, while the rest of the matrix gets a direct Cartesian interpolation. From the angles and lengths you can then convert back to Cartesian coordinates.
Naturally there is still some work to do internally, like choosing which way to rotate (take the closer direction) and checking for edge cases where one basis vector rotates in a different direction than another...
I managed to do this successfully in a 2D system, which is a bit easier, but it should be no different in 3D.
Note that a plain Cartesian interpolation works reasonably well as long as the angles are relatively small (under roughly 10 degrees, at a guess), which is most likely not your case at all.
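For reference, a sketch of the 2D case mentioned above (interpolate the rotation angle along the shorter direction, then rebuild the matrix); extending it to 3D means doing the same for each basis vector:

    static double[][] interpolateRotation2D(double[][] M, double[][] N, double t) {
        double a = Math.atan2(M[1][0], M[0][0]);                    // angle of M
        double b = Math.atan2(N[1][0], N[0][0]);                    // angle of N
        double diff = Math.atan2(Math.sin(b - a), Math.cos(b - a)); // shortest way around
        double angle = a + t * diff;
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[][] { { c, -s }, { s, c } };
    }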
I'm working on an android application which can track movements. My problem is that I need to convert gps coordinates to 2-d coordinate on a plane.
The coordinates I obtain now are in WGS84 format. I need them because I have to compute the distance between a point and a line in order to tell whether I'm getting too far from a pre-defined path. The area I'm working in is small with respect to the whole earth, so I think it's okay not to care about the z axis.
I have no map, so I just need to compute these coordinates in background. Thanks!
This:
I need to compute the distance between a point and a line in order to understand if I'm getting too far from a pre-defined path
seems to be the key question you are posing. So that is what I will answer.
In a strict sense, converting from spherical to Euclidean coordinates is not possible without distortion, which is why people invented the UTM system, among many others. The problem with using UTM for your purpose is that a given lat/lon pair may fall in different zones, so the (x, y) pairs will not correlate usefully.
Assuming you have three points A, B and P, where A & B define the start and end of your line, and P is the point you want to know about, then my best suggestion is to:
calculate the great circle distance (d) from P to B
calculate the bearing from P to B (theta)
calculate the bearing from A to B (alpha)
If the points are actually very close to each other then the space you are working in is locally Euclidean, so the distance from the line through A and B is:
d sin(alpha - theta)
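A self-contained sketch of those three steps in Java (WGS84 treated as a sphere, which is fine at this scale; lat/lon in degrees; method names are mine):

    static final double EARTH_RADIUS = 6371000.0; // mean earth radius in meters

    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    static double bearing(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dLon = Math.toRadians(lon2 - lon1);
        double y = Math.sin(dLon) * Math.cos(p2);
        double x = Math.cos(p1) * Math.sin(p2) - Math.sin(p1) * Math.cos(p2) * Math.cos(dLon);
        return Math.atan2(y, x); // radians, clockwise from north
    }

    // signed distance of P from the line through A and B (all points are {lat, lon})
    static double distanceFromPath(double[] A, double[] B, double[] P) {
        double d     = haversine(P[0], P[1], B[0], B[1]); // P to B
        double theta = bearing(P[0], P[1], B[0], B[1]);   // P to B
        double alpha = bearing(A[0], A[1], B[0], B[1]);   // A to B
        return d * Math.sin(alpha - theta);
    }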
There are numerous online references on calculating great circle distance and bearing in Java for WGS84, e.g.
http://openmap.bbn.com/svn/openmap/trunk/src/openmap/com/bbn/openmap/proj/GreatCircle.java
...is just one of many.
In OpenCV I use the camera to capture a scene containing two squares a and b, both at the same distance from the camera, whose known real sizes are, say, 10cm and 30cm respectively. I find the pixel widths of each square, which let's say are 25 and 40 pixels (to get the 'pixel-width' OpenCV detects the squares as cv::Rect objects and I read their width field).
Now I remove square a from the scene and change the distance from the camera to square b. The program gets the width of square b now, which let's say is 80. Is there an equation, using the configuration of the camera (resolution, dpi?) which I can use to work out what the corresponding pixel width of square a would be if it were placed back in the scene at the same distance as square b?
The math you need for your problem can be found in chapter 9 of "Multiple View Geometry in Computer Vision", which happens to be freely available online: https://www.robots.ox.ac.uk/~vgg/hzbook/hzbook2/HZepipolar.pdf.
The short answer to your problem is:
No, not in this exact form. Since you are working in a 3D world, you have one degree of freedom left. As a result you need more information to eliminate this degree of freedom (e.g. by knowing the depth and/or the relation of the two squares with respect to each other, the movement of the camera, ...). This mainly depends on your specific situation. In any case, reading and understanding chapter 9 of the book should help you out here.
PS: to me it seems like your problem fits into the broader category of "baseline matching" problems. Reading around about this, in addition to epipolar geometry and the fundamental matrix, might help you out.
Since you write of "squares" with just a "width" in the image (as opposed to "trapezoids" with some wonky vertex coordinates) I assume that you are considering an ideal pinhole camera and ignoring any perspective distortion/foreshortening - i.e. there is no lens distortion and your planar objects are exactly parallel to the image/sensor plane.
Then it is a very simple 2D projective geometry problem, and no separate knowledge of the camera geometry is needed. Just write down the projection equations in the first situation: you have 4 unknowns (the camera focal length, the common depth of the squares, and the horizontal positions of their left sides, say) and 4 equations (the projections of the left and right sides of each square). Solve the system and keep the focal length and the relative distance between the squares. Do the same in the second image, but now with a known focal length, and compute the new depth and horizontal location of square b. Then add the previously computed relative distance to find where square a would be.
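Under that same ideal-pinhole assumption, an object of real width W at depth Z projects to roughly f * W / Z pixels, so the ratio of the two squares' pixel widths is fixed whenever they share a depth. A hedged sketch of the resulting shortcut (method name is mine):

    // widthA1, widthB1: pixel widths of squares a and b in the first scene (same depth);
    // widthB2: pixel width of b in the second scene, with a assumed at b's new depth.
    static double predictedWidthA(double widthA1, double widthB1, double widthB2) {
        return widthA1 * (widthB2 / widthB1); // e.g. 25 * (80 / 40) = 50 pixels
    }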
In order to understand the transformations performed by the camera to project the 3D world in the 2D image you need to know its calibration parameters. These are basically divided into two sets:
Intrinsic parameters: these are fixed parameters specific to each camera. They are normally represented by a matrix called K.
Extrinsic parameters: these depend on the camera's position in the 3D world. Normally they are represented by two matrices, R and T, where the first represents the rotation and the second the translation.
In order to calibrate a camera you need some pattern (basically a set of 3D points whose coordinates are known). There are several examples of this in the OpenCV library, which provides support for performing camera calibration:
http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
Once you have your camera calibrated you can transform from 3D to 2D easily by the following equation:
Pimage = K · [R | T] · P3D
So it will not only depend on the position of the camera, but on all the calibration parameters. The following presentation goes through the camera calibration details and the different steps and equations that are used during the 3D <-> image transformations.
https://www.cs.umd.edu/class/fall2013/cmsc426/lectures/camera-calibration.pdf
With this in mind you can project any 3D point onto the image and get its coordinates there. The reverse transformation is not unique, since going back from 2D to 3D gives you a line instead of a unique point.
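A minimal sketch of that forward projection with plain arrays (K is the 3x3 intrinsic matrix, R the 3x3 rotation and T the 3-element translation, all assumed to come from a prior calibration):

    static double[] projectPoint(double[][] K, double[][] R, double[] T, double[] P3D) {
        double[] pc = new double[3];            // camera coordinates: R * P3D + T
        for (int i = 0; i < 3; i++) {
            pc[i] = T[i];
            for (int j = 0; j < 3; j++) pc[i] += R[i][j] * P3D[j];
        }
        double[] p = new double[3];             // image coordinates: K * pc
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) p[i] += K[i][j] * pc[j];
        return new double[] { p[0] / p[2], p[1] / p[2] }; // pixel (u, v)
    }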
I have a drawing program where a user can trace with their finger, and in a manner similar to the FingerPaint program, a series of Paths is drawn to represent the lines.
Now, I am doing some collision detection to allow the user to enter an 'erase' mode and delete selected lines, and am trying to determine how to track the individual pixels of the Path. Essentially, I am tracking the RectF that encompasses the Path, and if the RectF is intersected when in erase mode, I'd like to then do pixel-by-pixel intersection tests. So, I need to create some structure for storing the pixels, likely a two-dimensional array where each element will be a 1 or 0, based on whether or not the underlying pixel is occupied by the drawn Path.
It is this last part that I am struggling with. While the user is drawing the line, I am feeding the passed X/Y values in as control points for a quadratic Bezier curve via Path.quadTo(). The problem is that while Path uses these points to represent a continuous line, I am only being fed discontinuous X/Y points from the touch device. Essentially, I need a way to duplicate what the Path object itself is doing, and take the passed X/Y points and interpolate them into a continuous curve, but as a set of X/Y coordinates rather than a Path object...
Any pointers to get started on this?
Thanks
EDIT/MORE:
Ok, so as I mentioned, each Path is created (roughly) using the method found in FingerPaint, which means that it is a series of segments, where each segment is a quadratic Bezier curve. Given that I know P0, P1 and P2 when I add these curved segments to the larger Path, I can determine the X/Y coordinates along the curve with:
B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2, for t from 0 to 1
So, my only problem now is determining a 'continuous' set of adjacent X/Y coordinates, such that there are no gaps in this set that a user's finger might pass through without hitting one. This would mean determining each X/Y point at roughly 1-pixel intervals. As the above formula would yield points at an infinite number of intervals, given values of t ranging from 0 through 1, any idea how to programmatically determine the right values of t that will yield points at 1-pixel intervals?
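One hedged way to pick those t values: overestimate the segment's length from its control polygon and take that many steps, so consecutive samples end up at most about one pixel apart (PointF is android.graphics.PointF; the method name is mine):

    // imports: android.graphics.PointF, java.util.ArrayList, java.util.List
    static List<PointF> flattenQuad(PointF p0, PointF p1, PointF p2) {
        // the control-polygon length is an upper bound on the curve length
        double approxLen = Math.hypot(p1.x - p0.x, p1.y - p0.y)
                         + Math.hypot(p2.x - p1.x, p2.y - p1.y);
        int steps = Math.max(1, (int) Math.ceil(approxLen)); // ~1 px per step
        List<PointF> points = new ArrayList<>(steps + 1);
        for (int i = 0; i <= steps; i++) {
            float t = (float) i / steps, u = 1 - t;
            float x = u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x;
            float y = u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y;
            points.add(new PointF(x, y));
        }
        return points;
    }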
I would do all collision detection using not curves or pixels, but just lines; it's much easier to find intersecting lines. That is, intersect the segments between sequential x/y coordinates of the user's swipe with the segments of the existing lines.
When I listen to the orientation event in an Android app, I get a SensorEvent which contains 3 floats: azimuth, pitch, and roll in relation to the real-world axes.
Now say I am building an app like Labyrinth, but I don't want to force the user to be over the phone and hold it such that the x-y plane is parallel to the ground. Instead I want to allow the user to hold the phone however they wish, lying down or, perhaps, sitting and holding the phone at an angle. In other words, I need to calibrate the phone in accordance with the user's preference.
How can I do that?
Also note that I believe that my answer has to do with getRotationMatrix and getOrientation, but I am not sure how!
Please help! I've been stuck at this for hours.
For a Labyrinth-style app, you probably care more about the acceleration (gravity) vector than the axis orientation. This vector, in the phone's coordinate system, is given by the combination of the three accelerometer measurements, rather than by the rotation angles. Specifically, only the x and y readings should affect the ball's motion.
If you do actually need the orientation, then the 3 angular readings represent the 3 Euler angles. However, I suspect you probably don't really need the angles themselves, but rather the rotation matrix R, which is returned by the getRotationMatrix() API. Once you have this matrix, it is basically the calibration you are looking for. When you want to transform a vector in world coordinates to device coordinates, multiply it by the inverse of this matrix (where, in this special case, inv(R) = transpose(R)).
So, following the example I found in the documentation, if you want to transform the world gravity vector g ([0 0 g]) to the device coordinates, multiply it by inv(R):
g = inv(R) * g
(note that this should give you the same result as reading the accelerometers)
Possible APIs to use here: the invertM() and multiplyMV() methods of the android.opengl.Matrix class.
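A minimal sketch tying those together (accelValues and magnetValues are assumed to hold the latest accelerometer and magnetometer readings; 16-element arrays are used so the result works with android.opengl.Matrix; import android.hardware.SensorManager):

    float[] R = new float[16], Rinv = new float[16];
    SensorManager.getRotationMatrix(R, null, accelValues, magnetValues);
    android.opengl.Matrix.invertM(Rinv, 0, R, 0);

    float[] gWorld  = { 0f, 0f, SensorManager.GRAVITY_EARTH, 0f }; // homogeneous vector
    float[] gDevice = new float[4];
    android.opengl.Matrix.multiplyMV(gDevice, 0, Rinv, 0, gWorld, 0);
    // gDevice should roughly match what the accelerometers report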
I don't know of any Android-specific APIs for this, but all you want to do is decrease the azimuth by a certain amount, right? So you move the "origin" from (0,0,0) to whatever they want. In pseudocode:
myGetRotationMatrix:
return getRotationMatrix() - origin