I was inspired by Pokemon GO and wanted to make a simple prototype for learning purposes. I am a total beginner in image processing.
I did a little research on the subject. Here is what I came up with. In order to place any 3D model in the real world, I must know the orientation. Say if I am placing a cube on a table:
1) I need to know the angles $\theta$, $\phi$ and $\alpha$ where $\theta$ is the rotation along the global UP vector, $\phi$ is the rotation along the camera's FORWARD vector and $\alpha$ is rotation along camera's RIGHT vector.
2) Then I have to multiply these three rotation matrices with the object's transform using these Euler angles.
3) The object's position would be at the center point of the surface for protyping.
4) I can find the distance of the surface using android's camera's inbuilt distance estimation using focal length. Then I can scale the object accordingly.
Is there any more straight forward way to do this using OpenCV or am I going in the right track?
Related
I have two 4x4 rotation matrices M and N. M is describing my current object attitude in space, and N is a desired object attitude. Now I would like to rotate M matrix towards N, so the object will slowly rotate towards desired position in following iterations. Any idea how to approach this?
If these matrices are not strange which should be the case describing "rotation matrices" you should do this by interpolating their base vectors in polar system.
To examine we want to convert top-left 3x3 matrix to 3 vectors defined by angles and distance. Once this is done you should do a linear interpolation on angles and distances for that top-left 3x3 part while the rest should have a direct cartesian interpolation. From angles and distances you can then convert back to cartesian coordinates.
Naturally there is still work internally like choosing which way to rotate (using closest) and checking there are no edge cases where one base vector rotates into different direction then the other...
I managed to successfully do this in 2D system which is a bit easier but should be no different in 3D.
To note a cartesian interpolation works fairly fine as long as angles are relatively small (<10 degrees to guess) which is most likely not your case at all.
I wonder how to solve this problem using opengl on android. I have camera with quaternion, which describes its rotation. I want to have an object f.e. Cube will be in the same screen relative position. So, i want to rotate it and translate around camera position using this quaternion. It is something similiar to crosshair in games, but I dont want to use this for overlay, but for drag drop feature for 3D objects.
You are trying to find a point in the direction represented by a quaternion at a given distance. For this, you cannot use quaternion directly like you do for vectors.
You can convert quaternion to matrix and then do the same logic as below.
glm::mat4 rotate = glm::mat4_cast(rotationQuat);
glm::vec4 newPosition = rotate * vector;
I am working on an AR app that needs to move an image depending on device's position and orientation.
It seems that Game Rotation Vector should provide the necessary data to achieve this.
However I cant seem to understand what the values that I get from GRV sensor show. For instance in order to reach the same value on the Z axis I have to rotate the device 720 degrees. This seems odd.
If I could somehow convert these numbers to angles from the reference frame of the device towards the x,y,z coordinates my problem would be solved.
I have googled this issue for days and didn't find any sensible information on the meaning of GRV coordinates, and how to use them.
TL:DR What do the numbers of the GRV sensor show? And how to convert them to angles?
As the docs state, the GRV sensor gives back a 3D rotation vector. This is represented as three component numbers which make this up, given by:
x axis (x * sin(θ/2))
y axis (y * sin(θ/2))
z axis (z * sin(θ/2))
This is confusing however. Each component is a rotation around that axis, so each angle (θ which is pronounced theta) is actually a different angle, which isn't clear at all.
Note also that when working with angles, especially in 3D, we generally use radians, not degrees, so theta is in radians. This looks like a good introductory explanation.
But the reason why it's given to us in the format is that it can easily be used in matrix rotations, especially as a quaternion. In fact, these are the first three components of a quaternion, the components which specify rotation. The 4th component specifies magnitude, i.e. how far away from the origin (0, 0) a point it. So a quaternion turns general rotation information into an actual point in space.
These are directly usable in OpenGL which is the Android (and the rest of the world's) 3D library of choice. Check this tutorial out for some OpenGL rotations info, this one for some general quaternion theory as applied to 3D programming in general, and this example by Google for Android which shows exactly how to use this information directly.
If you read the articles, you can see why you get it in this form and why it's called Game Rotation Vector - it's what's been used by 3D programmers for games for decades at this point.
TLDR; This example is excellent.
Edit - How to use this to show a 2D image which is rotated by this vector in 3D space.
In the example above, SensorManage.getRotationMatrixFromVector converts the Game Rotation Vector into a rotation matrix which can be applied to rotate anything in 3D. To apply this rotation a 2D image, you have to think of the image in 3D, so it's actually a segment of a plane, like a sheet of paper. So you'd map your image, which in the jargon is called a texture, onto this plane segment.
Here is a tutorial on texturing cubes in OpenGL for Android with example code and an in depth discussion. From cubes it's a short step to a plane segment - it's just one face of a cube! In fact that's a good resource for getting to grips with OpenGL on Android, I'd recommend reading the previous and subsequent tutorial steps too.
As you mentioned translation also. Look at the onDrawFrame method in the Google code example. Note that there is a translation using gl.glTranslatef and then a rotation using gl.glMultMatrixf. This is how you translate and rotate.
It matters the order in which these operations are applied. Here's a fun way to experiment with that, check out Livecodelab, a live 3D sketch coding environment which runs inside your browser. In particular this tutorial encourages reflection on the ordering of operations. Obviously the command move is a translation.
In OpenCV I use the camera to capture a scene containing two squares a and b, both at the same distance from the camera, whose known real sizes are, say, 10cm and 30cm respectively. I find the pixel widths of each square, which let's say are 25 and 40 pixels (to get the 'pixel-width' OpenCV detects the squares as cv::Rect objects and I read their width field).
Now I remove square a from the scene and change the distance from the camera to square b. The program gets the width of square b now, which let's say is 80. Is there an equation, using the configuration of the camera (resolution, dpi?) which I can use to work out what the corresponding pixel width of square a would be if it were placed back in the scene at the same distance as square b?
The math you need for your problem can be found in chapter 9 of "Multiple View Geometry in Computer Vision", which happens to be freely available online: https://www.robots.ox.ac.uk/~vgg/hzbook/hzbook2/HZepipolar.pdf.
The short answer to your problem is:
No not in this exact format. Given you are working in a 3D world, you have one degree of freedom left. As a result you need to get more information in order to eliminate this degree of freedom (e.g. by knowing the depth and/or the relation of the two squares with respect to each other, the movement of the camera...). This mainly depends on your specific situation. Anyhow, reading and understanding chapter 9 of the book should help you out here.
PS: to me it seems like your problem fits into the broader category of "baseline matching" problems. Reading around about this, in addition to epipolar geometry and the fundamental matrix, might help you out.
Since you write of "squares" with just a "width" in the image (as opposed to "trapezoids" with some wonky vertex coordinates) I assume that you are considering an ideal pinhole camera and ignoring any perspective distortion/foreshortening - i.e. there is no lens distortion and your planar objects are exactly parallel to the image/sensor plane.
Then it is a very simple 2D projective geometry problem, and no separate knowledge of the camera geometry is needed. Just write down the projection equations in the first situation: you have 4 unknowns (the camera focal length, the common depth of the squares, the horizontal positions of their left sides (say), and 4 equations (the projections of each of the left and right sides of the squares). Solve the system and keep the focal length and the relative distance between the squares. Do the same in the second image, but now with known focal length, and compute the new depth and horizontal location of square b. Then add the previously computed relative distance to find where square a would be.
In order to understand the transformations performed by the camera to project the 3D world in the 2D image you need to know its calibration parameters. These are basically divided into two sets:
Intrensic parameters: These are fixed parameters that are specific for each camera. They are normally represented by a Matrix called k.
Extrensic parameters: These depend on the camera position in the 3D world. Normally they are represented by two matrices: R and T where the first one represents the rotation and the second one represents the translation
In order to calibrate a camera your need some pattern (basically a set of 3D points which coordinates are known). There are several examples for this in OpenCV library which provides support to perform the camera calibration:
http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
Once you have your camera calibrated you can transform from 3D to 2D easily by the following equation:
Pimage = K · R · T · P3D
So it will not only depend on the position of the camera but it depends on all the calibration parameters. The following presentation go through the camera calibration details and the different steps and equations that are used during the 3D <-> Image transformations.
https://www.cs.umd.edu/class/fall2013/cmsc426/lectures/camera-calibration.pdf
With this in mind you can project whatever 3D point to the image and get its coordinate on it. The reverse transformation is not unique since going back from 2D to 3D will give you a line instead of a unique point.
I am having trouble rotating my 3D objects in Open GL. I start each draw frame by loading the identity (glLoadIdentity()) and then I push and pop on the stack according to what I need (for the camera, etc). I then want 3D objects to be able to roll, pitch and yaw and then have them displayed correctly.
Here is the catch... I want to be able to do incremental rotations as if I was flying an airplane. So every time the up button is pushed the object rotates around it's own x axis. But then if the object is pitched down and chooses to yaw, the rotation should then be around the object's up vector and not the Y axis.
I've tried doing the following:
glRotatef(pitchTotal, 1,0,0);
glRotatef(yawTotal, 0,1,0);
glRotate(rollTotal, 0,0,1);
and those don't seem to work. (Keeping in mind that the vectors are being computed correctly)I've also tried...
glRotatef(pitchTotal, 1,0,0);
glRotatef(yawTotal, 0,1,0);
glRotate(rollTotal, 0,0,1);
and I still get weird rotations.
Long story short... What is the proper way to rotate a 3D object in Open GL using the object's look, right and up vector?
You need to do the yaw rotation around (around Y) before you do the pitch one. Otherwise, the pitch will be off.
E.g. you have a 45 degrees downward pitch and a 180 degrees yaw. By doing the pitch first, and then rotate the yaw around the airplane's Y vector, the airplane would end up pointing up and backwards despite the pitch being downwards. By doing the yaw first, the plane points backwards, then the pitch around the plane's X vector will make it point downwards correctly.
The same logic applies for roll, which needs to be applied last.
So your code should be :
glRotatef(yawTotal, 0,1,0);
glRotatef(pitchTotal, 1,0,0);
glRotatef(rollTotal, 0,0,1);
Cumulative rotations will suffer from gimbal lock. Look at it this way: suppose you are in an aeroplane, flying level. You apply a yaw of 90 degrees anticlockwise. You then apply a roll of 90 degrees clockwise. You then apply a yaw of 90 degrees clockwise.
Your plane is now pointing straight downward — the total effect is a pitch of 90 degrees clockwise. But if you just tried to add up the different rotations then you'd end up with a roll of 90 degrees, and no pitch whatsoever because you at no point applied pitch to the plane.
Trying to store and update rotation as three separate angles doesn't work.
Common cited solutions are to use a quaternion or to store the object orientation directly as a matrix. The matrix solution is easier to build because you can prototype it with OpenGL's built-in matrix stacks. Most people also seem to find matrices easier to understand than quaternions.
So, assuming you want to go matrix, your prototype might do something like (please forgive my lack of decent Java knowledge; I'm going to write C essentially):
GLfloat myOrientation[16];
// to draw the object:
glMultMatrixf(myOrientation);
/* drawing here */
// to apply roll, assuming the modelview stack is active:
glPushMatrix(); // backup what's already on the stack
glLoadIdentity(); // start with the identity
glRotatef(angle, 0, 0, 1);
glMultMatrixf(myOrientation); // premultiply the current orientation by the roll
// update our record of orientation
glGetFloatv(GL_MODELVIEW_MATRIX, myOrientation);
glPopMatrix();
You possibly don't want to use the OpenGL stack in shipping code because it's not really built for this sort of use and so performance may be iffy. But you can prototype and profile rather than making an assumption. You also need to consider floating point precision problems — really you should be applying a step that ensures myOrientation is still orthonormal after it has been adjusted.
It's probably easiest to check Google for that, but briefly speaking you'll use the dot product to remove erroneous crosstalk from two of the axes to the third, then to remove from one of the first two axes from the second, then renormalise all three.
Thanks for the responses. The first response pointed me in the right direction, the second response helped a little too, but ultimately it boiled down to a combination of both. Initially, your 3D object should have a member variable which is a float array size 16. [0-15]. You then have to initialize it to the identity matrix. Then the member methods of your 3D object like "yawObject(float amount)" just know that you are yawing the object from "the objects point of view" and not the world, which would allow the incremental rotation. Inside the yawObject method (or pitch,roll ojbect) you need to call the Matrix.rotateM(myfloatarray,0,angle,0,1,0). That will store the new rotation matrix (as describe in the first response). You can then when you are about to draw your object, multiply the model matrix by the myfloatarray matrix using gl.glMultMatrix.
Good luck and let me know if you need more information than that.