I want to do some Structure from Motion using OpenCV. This should happen on Android.
Currently I have the cameraMatrix (intrinsic parameters) and the distortion coefficients from the camera calibration.
The user should now take two images of a building, and the app should generate a point cloud.
Note: the user may also rotate the smartphone camera a little as he moves along one side of the building.
At the current point, I have the following information:
the undistorted left image
the undistorted right image
a list of good matches using SIFT
the homography matrix
the fundamental matrix
I've searched the internet and now I am very confused how I should proceed...
Some say I need to use stereoRectify to get Q and then use Q with reprojectImageTo3D() to get the point cloud.
Others say that I need to use stereoRectifyUncalibrated and use H1 and H2 from this method to fill in all the parameters of triangulatePoints.
But triangulatePoints needs the projection matrix of each camera/image, so from my understanding this approach seems definitely wrong.
So for me there are some problems:
How do I get R and T (rotation and translation) from all the information I already have?
If I use stereoRectify, the first 4 parameters are cameraMatrix1, distortionCoeff1, cameraMatrix2 and distortionCoeff2. If I do not have a stereo camera like the Kinect, are cameraMatrix1 and cameraMatrix2 equal for my setup (a mono camera on a smartphone)?
How can I obtain Q? (I guess if I have R and T I can get it from stereoRectify.)
Is there another way of getting the projection matrices for each camera so I can use the triangulation method provided by OpenCV?
I know these are a lot of questions, but googling confused me, so I need to get this straight. I hope someone can help me with my problems.
Thanks
PS: as these are more theoretical questions, I did not post any code. If you want or need to see code or the values of my camera calibration, just ask and I will add them to my post.
I wrote something about using Farneback's optical flow for Structure from Motion before. You can read the details here.
But here's the code snippet; it's a somewhat working, but not great, implementation. I hope you can use it as a reference.
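/* Try to find the essential matrix from the points.
   left_points/right_points are matched pixel coordinates,
   cam_matrix is the 3x3 intrinsic matrix K (CV_64F) */
Mat fundamental = findFundamentalMat( left_points, right_points, FM_RANSAC, 0.2, 0.99 );
Mat essential = cam_matrix.t() * fundamental * cam_matrix;
/* Decompose E = U * diag(1, 1, 0) * Vt into rotation and translation candidates */
SVD svd( essential );
static const Mat W = (Mat_<double>(3, 3) <<
0, -1, 0,
1, 0, 0,
0, 0, 1);
static const Mat W_inv = W.inv();
Mat_<double> R1 = svd.u * W * svd.vt;
Mat_<double> T1 = svd.u.col( 2 );
Mat_<double> R2 = svd.u * W_inv * svd.vt;
Mat_<double> T2 = -svd.u.col( 2 );
/* E only determines the pose up to four combinations: (R1,T1), (R1,T2), (R2,T1), (R2,T2).
   If determinant(R1) < 0, use -E instead (equivalently, negate R1 and R2).
   Only one combination puts the triangulated points in front of both cameras;
   this snippet simply takes (R1, T1). T has unit length, so the result is up to scale. */
/* The points are in pixels, so the projection matrices must include K:
   P1 = K [I | 0], P2 = K [R | T] */
Mat P1 = cam_matrix * Mat::eye(3, 4, CV_64FC1 );
Mat Rt = ( Mat_<double>(3, 4) <<
R1(0, 0), R1(0, 1), R1(0, 2), T1(0),
R1(1, 0), R1(1, 1), R1(1, 2), T1(1),
R1(2, 0), R1(2, 1), R1(2, 2), T1(2));
Mat P2 = cam_matrix * Rt;
/* Triangulate the points to find the 3D homogeneous points in world space
   (the coordinate frame of the first camera).
   Note that each column of the 'out' matrix corresponds to one 3D homogeneous point
*/
Mat out;
triangulatePoints( P1, P2, left_points, right_points, out );
/* Since it's a homogeneous (x, y, z, w) coordinate, divide by w to get (x, y, z) */
vector<Mat> splitted = {
out.row(0) / out.row(3),
out.row(1) / out.row(3),
out.row(2) / out.row(3)
};
merge( splitted, out );
return out;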
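For readers wondering what triangulatePoints does with P1 and P2: every correspondence gives two linear equations per view. Below is a minimal, OpenCV-free C++ sketch of a closely related linear least-squares variant (it fixes the homogeneous w to 1 instead of using an SVD, so it is not OpenCV's exact implementation). The types Mat34/Mat3/Vec3 and the functions triangulate and depthInCamera are made-up names for illustration; depthInCamera is the usual cheirality check for choosing among the four (R, T) candidates of the essential matrix.

```cpp
#include <array>
#include <cmath>

using Mat34 = std::array<std::array<double, 4>, 3>; // 3x4 projection matrix, row-major
using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

static double det3(const Mat3& A) {
    return A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
         - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
         + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]);
}

// Solve M * x = r by Cramer's rule; false if M is (nearly) singular, e.g. zero baseline.
static bool solve3x3(const Mat3& M, const Vec3& r, Vec3& x) {
    double d = det3(M);
    if (std::fabs(d) < 1e-12) return false;
    for (int k = 0; k < 3; ++k) {
        Mat3 Mk = M;
        for (int i = 0; i < 3; ++i) Mk[i][k] = r[i];  // replace column k by r
        x[k] = det3(Mk) / d;
    }
    return true;
}

// Linear least-squares triangulation of one match (u1, v1) <-> (u2, v2).
// The image points must be in the coordinates P1/P2 project into:
// pixels for P = K[R|T], normalized coordinates for P = [R|T].
// Each view gives (u * P.row(2) - P.row(0)) . X = 0 and (v * P.row(2) - P.row(1)) . X = 0;
// fixing X = (x, y, z, 1) turns these 4 equations into A * [x y z]^T = b,
// solved here through the normal equations A^T A x = A^T b.
bool triangulate(const Mat34& P1, const Mat34& P2,
                 double u1, double v1, double u2, double v2, Vec3& X) {
    const Mat34* P[2] = {&P1, &P2};
    const double uv[2][2] = {{u1, v1}, {u2, v2}};
    double A[4][3], b[4];
    for (int view = 0; view < 2; ++view) {
        const Mat34& Pv = *P[view];
        for (int c = 0; c < 2; ++c) {               // c = 0: u equation, c = 1: v equation
            int row = 2 * view + c;
            for (int j = 0; j < 3; ++j) A[row][j] = uv[view][c] * Pv[2][j] - Pv[c][j];
            b[row] = -(uv[view][c] * Pv[2][3] - Pv[c][3]);
        }
    }
    Mat3 AtA{};
    Vec3 Atb{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            for (int r = 0; r < 4; ++r) AtA[i][j] += A[r][i] * A[r][j];
        for (int r = 0; r < 4; ++r) Atb[i] += A[r][i] * b[r];
    }
    return solve3x3(AtA, Atb, X);
}

// Depth of X in front of camera P (third row of P * [X; 1]). For the correct (R, T)
// candidate it is positive in both views (the "cheirality" check).
double depthInCamera(const Mat34& P, const Vec3& X) {
    return P[2][0] * X[0] + P[2][1] * X[1] + P[2][2] * X[2] + P[2][3];
}
```

To pick the pose, one would triangulate a handful of matches with each candidate P2 built from (R1 or R2, +T or -T) and keep the candidate for which most points have positive depth in both cameras. Because the third row of K is (0, 0, 1), the depth test gives the same value whether P is [R|T] or K[R|T]. OpenCV 3.0 and later also provide findEssentialMat and recoverPose, which perform this selection internally, if your version has them.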
This isn't OpenCV, but here is an example of exactly what you are asking for:
http://boofcv.org/index.php?title=Example_Stereo_Single_Camera
There is an Android demonstration application which includes that code here:
https://play.google.com/store/apps/details?id=org.boofcv.android
Is there a way to check whether I touched an object on the screen? As I understand it, the HitResult class allows me to check whether I touched the recognized and mapped surface. But I want to check whether I touched the object that is placed on that surface.
ARCore doesn't really have a concept of an object, so we can't directly provide that. I suggest looking at ray-sphere tests for a starting point.
However, I can help with getting the ray itself (to be added to HelloArActivity):
/**
* Returns a world coordinate frame ray for a screen point. The ray is
* defined using a 6-element float array containing the head location
* followed by a normalized direction vector.
*/
float[] screenPointToWorldRay(float xPx, float yPx, Frame frame) {
float[] points = new float[12]; // {clip query, camera query, camera origin}
// Set up the clip-space coordinates of our query point
// +x is right:
points[0] = 2.0f * xPx / mSurfaceView.getMeasuredWidth() - 1.0f;
// +y is up (android UI Y is down):
points[1] = 1.0f - 2.0f * yPx / mSurfaceView.getMeasuredHeight();
points[2] = 1.0f; // +z is forwards (remember clip, not camera)
points[3] = 1.0f; // w (homogeneous coordinates)
float[] matrices = new float[32]; // {proj, inverse proj}
// If you'll be calling this several times per frame factor out
// the next two lines to run when Frame.isDisplayRotationChanged().
mSession.getProjectionMatrix(matrices, 0, 1.0f, 100.0f);
Matrix.invertM(matrices, 16, matrices, 0);
// Transform clip-space point to camera-space.
Matrix.multiplyMV(points, 4, matrices, 16, points, 0);
// points[4,5,6] is now a camera-space vector. Transform to world space to get a point
// along the ray.
float[] out = new float[6];
frame.getPose().transformPoint(points, 4, out, 3);
// use points[8,9,10] as a zero vector to get the ray head position in world space.
frame.getPose().transformPoint(points, 8, out, 0);
// normalize the direction vector:
float dx = out[3] - out[0];
float dy = out[4] - out[1];
float dz = out[5] - out[2];
float scale = 1.0f / (float) Math.sqrt(dx*dx + dy*dy + dz*dz);
out[3] = dx * scale;
out[4] = dy * scale;
out[5] = dz * scale;
return out;
}
If you're calling this several times per frame see the comment about the getProjectionMatrix and invertM calls.
Apart from mouse picking with ray casting (cf. Ian's answer), the other commonly used technique is a picking buffer, explained in detail (with C++ code) here:
The trick behind 3D picking is very simple. We will attach a running
index to each triangle and have the FS output the index of the
triangle that the pixel belongs to. The end result is that we get a
"color" buffer that doesn't really contain colors. Instead, for each
pixel which is covered by some primitive we get the index of this
primitive. When the mouse is clicked on the window we will read back
that index (according to the location of the mouse) and render the
selected triangle red. By combining a depth buffer in the process we
guarantee that when several primitives are overlapping the same pixel
we get the index of the top-most primitive (closest to the camera).
So in a nutshell:
Every object's draw method needs a running index and a boolean for whether this draw renders into the picking buffer or not.
The render method converts the index into a grayscale color, and the scene is rendered.
After the whole rendering is done, retrieve the pixel color at the touch position with GL11.glReadPixels(x, y, ...), where x and y are the pixel you want the color of. Then translate the color back to an index and the index back to an object. Voilà, you have your clicked object.
To be fair, for a mobile use case you should probably read a 10x10 rectangle, iterate through it and pick the first non-background color you find, because touches are never that precise.
This approach works independently of the complexity of your objects.
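The index-to-color mapping and the small-rectangle scan can be sketched in plain C++ as follows. This is an illustration under assumptions: instead of a grayscale color (which limits you to 255 objects), it packs the index into 24 bits of RGB; index 0 is reserved for the clear color; and the pixel block is assumed to come back from the read-back as tightly packed RGBA bytes. The names Rgb, indexToColor, colorToIndex and firstHitInBlock are made up.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers for an ID ("picking") buffer. Index 0 is reserved for the
// background, so the clear color must be black.
struct Rgb { std::uint8_t r, g, b; };

// Pack a 24-bit object index into an RGB color (red holds the lowest byte).
Rgb indexToColor(std::uint32_t index) {
    return { static_cast<std::uint8_t>(index & 0xFF),
             static_cast<std::uint8_t>((index >> 8) & 0xFF),
             static_cast<std::uint8_t>((index >> 16) & 0xFF) };
}

// Inverse of indexToColor.
std::uint32_t colorToIndex(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16);
}

// Scan a w x h block of tightly packed RGBA bytes (read back around the touch point)
// row by row and return the first non-background index, or 0 if nothing was hit.
std::uint32_t firstHitInBlock(const std::vector<std::uint8_t>& rgba, int w, int h) {
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::uint8_t* p = &rgba[4 * (y * w + x)];
            std::uint32_t index = colorToIndex(p[0], p[1], p[2]);
            if (index != 0) return index;
        }
    }
    return 0;
}
```

When rendering the ID pass, lighting, blending, dithering and multisampling should be off, otherwise the GPU may alter the colors and the decoded index will be wrong. Scanning from the center of the block outwards instead of row by row would prefer the object closest to the finger; the sketch keeps the simpler first-found rule described above.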
Not sure if this is the right way to ask, but please help. I have an image of a dented car. I have to process it and highlight the dents and return the number of dents. I was able to do it reasonably well with the following result:
The MATLAB code is:
img2=rgb2gray(i1);
imshow(img2);
img3=imtophat(img2,strel('disk',15));
img4=imadjust(img3);
layer=img4(:,:,1);
img5=layer>100 & layer<250;
img6=imfill(img5,'holes');
img7=bwareaopen(img6,5);
[L,numDents]=bwlabeln(img7); % numDents = number of connected regions (dents)
imshow(img7);
I=imread(i1);
Ians=CarDentIdentification(I);
However, when I try to do this using OpenCV, I get this:
With the following code:
Imgproc.cvtColor(source, middle, Imgproc.COLOR_RGB2GRAY);
Imgproc.equalizeHist(middle, middle);
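// With THRESH_OTSU the threshold value (150) is ignored; Otsu's method picks it automatically
Imgproc.threshold(middle, middle, 150, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);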
Please tell me how I can obtain better results in OpenCV, and also how to count the dents. I tried findContours(), but it gives a very large number. I tried other images as well, but I'm not getting proper results.
Please help.
According to the MATLAB documentation, imtophat performs top-hat filtering: it computes the morphological opening of the image (using imopen) and then subtracts the result from the original image.
You could do this in OpenCV with the following steps:
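Step 1: Get the disk structuring element
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))  # strel('disk', 15) has radius 15, so (31, 31) is the closer size match
Step 2: Compute the opening of the image and then subtract the result from the original image (this is what MORPH_TOPHAT does)
tophat = cv2.morphologyEx(v, cv2.MORPH_TOPHAT, kernel)  # v is the grayscale (single-channel) input image
This gives the following result:
Step 3: Now you could just manually threshold it or use Otsu:
ret, thresh = cv2.threshold(tophat, 17, 255, cv2.THRESH_BINARY)  # Otsu: cv2.threshold(tophat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
which gives you the following image: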
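To make clear what imtophat and MORPH_TOPHAT compute, here is a deliberately naive C++ sketch (no OpenCV; the Gray type and the morph/topHat names are made up): erode with a disk, dilate with the same disk (together, the opening), then subtract the opening from the image. It runs in O(width · height · radius²) and simply ignores pixels outside the image, so borders will not match OpenCV's default border handling exactly.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grayscale image, row-major, 8 bits per pixel.
struct Gray {
    int w, h;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const { return px[y * w + x]; }
};

// Minimum (erode) or maximum (dilate) over a disk of the given radius.
// Pixels outside the image are ignored.
static Gray morph(const Gray& in, int radius, bool erode) {
    Gray out{in.w, in.h, std::vector<std::uint8_t>(in.px.size())};
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            int best = erode ? 255 : 0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    if (dx * dx + dy * dy > radius * radius) continue; // outside the disk
                    int xx = x + dx, yy = y + dy;
                    if (xx < 0 || yy < 0 || xx >= in.w || yy >= in.h) continue;
                    int v = in.at(xx, yy);
                    best = erode ? std::min(best, v) : std::max(best, v);
                }
            out.px[y * in.w + x] = static_cast<std::uint8_t>(best);
        }
    return out;
}

// White top-hat: image minus its opening (erode, then dilate). Bright details smaller
// than the disk survive; larger bright regions and the smooth background are removed.
// The opening never exceeds the image, so the subtraction cannot underflow.
Gray topHat(const Gray& img, int radius) {
    Gray opened = morph(morph(img, radius, true), radius, false);
    Gray out = img;
    for (std::size_t i = 0; i < out.px.size(); ++i)
        out.px[i] = static_cast<std::uint8_t>(img.px[i] - opened.px[i]);
    return out;
}
```

A radius of 15 corresponds to MATLAB's strel('disk', 15), a 31 x 31 neighborhood; MATLAB may approximate large disks by default, so results can differ slightly.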
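Since the OP wants the code in Java, here is the likely Java equivalent:
// imports at the top of the file
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

private Mat topHat(Mat image)
{
// Elliptical (disk-like) 15x15 structuring element, anchored at its center by default
Mat element = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(15, 15));
Mat dst = new Mat();
// Top-hat = image minus its morphological opening
Imgproc.morphologyEx(image, dst, Imgproc.MORPH_TOPHAT, element);
return dst;
}
Make sure you do this on a grayscale image (CvType.CV_8UC1), and then you can threshold suitably.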
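For the counting part of the question: findContours on the raw threshold counts every speck of noise, which is probably why the number is so large. The MATLAB version removes blobs smaller than 5 pixels (bwareaopen) before labeling (bwlabeln). The following C++ sketch does the same thing; the function name is made up, and the mask is assumed to be a row-major byte array where non-zero means foreground. It labels 8-connected blobs by flood fill and counts only those with at least minArea pixels.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Count 8-connected blobs of non-zero pixels in a w x h mask,
// ignoring blobs with fewer than minArea pixels (like bwareaopen before bwlabeln).
int countBlobs(const std::vector<std::uint8_t>& mask, int w, int h, int minArea) {
    std::vector<bool> seen(mask.size(), false);
    std::vector<std::pair<int, int>> stack;
    int count = 0;
    for (int y0 = 0; y0 < h; ++y0)
        for (int x0 = 0; x0 < w; ++x0) {
            if (!mask[y0 * w + x0] || seen[y0 * w + x0]) continue;
            // Flood-fill this blob with an explicit stack and measure its area.
            int area = 0;
            stack.push_back({x0, y0});
            seen[y0 * w + x0] = true;
            while (!stack.empty()) {
                std::pair<int, int> cur = stack.back();
                stack.pop_back();
                int x = cur.first, y = cur.second;
                ++area;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int nx = x + dx, ny = y + dy;
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                        int i = ny * w + nx;
                        if (mask[i] && !seen[i]) {
                            seen[i] = true;
                            stack.push_back({nx, ny});
                        }
                    }
            }
            if (area >= minArea) ++count;
        }
    return count;
}
```

In OpenCV 3.0 and later, Imgproc.connectedComponentsWithStats returns the label image together with each component's area, so the same filtering can be done without writing the flood fill yourself; note that its returned label count includes the background label.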
I have a renderer using DirectX and OpenGL, and a 3D scene. The viewport and the window have the same dimensions.
How do I implement picking given mouse coordinates x and y in a platform independent way?
If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersect it with your models.
If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
EDIT: If the question is how to construct the ray from the mouse coordinates, you need the following: a projection matrix P and the camera transform C. If the coordinates of the mouse pointer are (x, y) and the size of the viewport is (width, height), one position in clip space along the ray is:
mouse_clip = [
float(x) * 2 / float(width) - 1,
1 - float(y) * 2 / float(height),
0,
1]
(Notice that I flipped the y-axis, since the origin of mouse coordinates is often in the upper left corner.)
The following is also true:
mouse_clip = P * C * mouse_worldspace
Which gives:
mouse_worldspace = inverse(C) * inverse(P) * mouse_clip
mouse_worldspace = mouse_worldspace / mouse_worldspace.w  // perspective divide back to a 3D point
We now have:
p = C.position(); //origin of camera in worldspace
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace
First you need to determine where on the near plane the mouse click happened:
rescale the window coordinates (0..640, 0..480) to [-1, 1], with (-1, -1) at the bottom-left corner and (1, 1) at the top-right.
'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix, then divide the result by its w component.
Then determine where the camera is in world space, and draw a ray starting at the camera and passing through the point you found on the near plane.
The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.
Final pseudocode:
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen
unviewMat = (projectionMat * modelViewMat).inverse()
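near_point = unviewMat * Vec(normalised_x, normalised_y, -1, 1)  // z = -1 is the near plane in NDC
near_point = near_point / near_point.w                           // perspective divide
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = normalize(near_point - camera_pos)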
Well, it's pretty simple; the theory behind this is always the same:
1) Unproject your 2D coordinate into 3D space twice (each API has its own function, but you can implement your own if you want): once at min Z and once at max Z.
2) With these two points, calculate the vector that goes from the min-Z point to the max-Z point.
3) With that vector and the min-Z point, you have the ray that goes from min Z to max Z.
4) Now that you have a ray, you can do a ray-triangle/ray-plane/ray-something intersection and get your result.
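Step 4 leaves the intersection test open. For triangle meshes, a widely used choice is the Möller–Trumbore ray/triangle test; here is a minimal C++ sketch (the V3 alias and helper names are made up for illustration). It reports the ray parameter t, so the closest hit over all triangles is simply the one with the smallest t.

```cpp
#include <array>
#include <cmath>

using V3 = std::array<double, 3>;

static V3 sub(const V3& a, const V3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static V3 cross(const V3& a, const V3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
static double dot(const V3& a, const V3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Möller–Trumbore ray/triangle test. On a hit, stores the ray parameter in t
// (hit point = orig + t * dir) and returns true. Misses, hits behind the origin and
// rays parallel to the triangle's plane return false. Both triangle sides count as hits.
bool rayTriangle(const V3& orig, const V3& dir, const V3& v0, const V3& v1, const V3& v2,
                 double& t) {
    const double eps = 1e-12;
    V3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    V3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray parallel to the triangle's plane
    double inv = 1.0 / det;
    V3 s = sub(orig, v0);
    double u = dot(s, p) * inv;               // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    V3 q = cross(s, e1);
    double v = dot(dir, q) * inv;             // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    double hitT = dot(e2, q) * inv;
    if (hitT < 0.0) return false;             // triangle lies behind the ray origin
    t = hitT;
    return true;
}
```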
I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnProject call.
Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:
// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
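// obtain the Z position (not world coordinates but in range 0 - 1)
// note: OpenGL window coordinates start at the bottom-left, so for a mouse y measured from the top use y_cursor = viewport[3] - mouse_y - 1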
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);
// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);
If you don't want to use GLU, you can implement gluUnProject yourself; its functionality is relatively simple and is described at opengl.org.
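For reference, here is a minimal C++ sketch of the math gluUnProject performs, written without GLU. All names are made up; matrices are assumed to be column-major arrays as returned by glGetDoublev, and the viewport as returned by glGetIntegerv(GL_VIEWPORT). It multiplies projection by modelview, inverts the product, maps window coordinates to normalized device coordinates in [-1, 1], transforms them, and divides by w.

```cpp
#include <array>
#include <cmath>
#include <utility>

using M4 = std::array<double, 16>; // column-major, element (row r, col c) at [c * 4 + r]

// Invert a 4x4 column-major matrix by Gauss-Jordan elimination with partial pivoting.
// Returns false if the matrix is singular.
static bool invert4(const M4& m, M4& inv) {
    double a[4][8];                          // augmented matrix [m | I]
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            a[r][c] = m[c * 4 + r];
            a[r][c + 4] = (r == c) ? 1.0 : 0.0;
        }
    for (int col = 0; col < 4; ++col) {
        int piv = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        if (std::fabs(a[piv][col]) < 1e-15) return false;
        for (int c = 0; c < 8; ++c) std::swap(a[col][c], a[piv][c]);
        double d = a[col][col];
        for (int c = 0; c < 8; ++c) a[col][c] /= d;
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            double f = a[r][col];
            for (int c = 0; c < 8; ++c) a[r][c] -= f * a[col][c];
        }
    }
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) inv[c * 4 + r] = a[r][c + 4];
    return true;
}

// Same contract as gluUnProject: window coordinates (winZ in [0, 1]) -> object coordinates.
bool unProject(double winX, double winY, double winZ, const M4& modelview,
               const M4& projection, const std::array<int, 4>& viewport,
               double& objX, double& objY, double& objZ) {
    M4 A{};                                  // A = projection * modelview
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                A[c * 4 + r] += projection[k * 4 + r] * modelview[c * 4 + k];
    M4 inv;
    if (!invert4(A, inv)) return false;
    // Window -> normalized device coordinates in [-1, 1]
    double ndc[4] = {2.0 * (winX - viewport[0]) / viewport[2] - 1.0,
                     2.0 * (winY - viewport[1]) / viewport[3] - 1.0,
                     2.0 * winZ - 1.0, 1.0};
    double out[4] = {0.0, 0.0, 0.0, 0.0};
    for (int r = 0; r < 4; ++r)
        for (int k = 0; k < 4; ++k) out[r] += inv[k * 4 + r] * ndc[k];
    if (out[3] == 0.0) return false;
    objX = out[0] / out[3];                  // perspective divide
    objY = out[1] / out[3];
    objZ = out[2] / out[3];
    return true;
}
```

Calling it with winZ = 0 and winZ = 1 gives points on the near and far planes, whose difference is the picking ray direction; passing the depth value read back by glReadPixels as winZ returns the clicked surface point instead.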
OK, this topic is old, but it was the best I found on the subject and it helped me a bit, so I'll post here for those who are following ;-)
This is the way I got it to work without having to compute the inverse of Projection matrix:
void Application::leftButtonPress(u32 x, u32 y){
GL::Viewport vp = GL::getViewport(); // just a call to glGet GL_VIEWPORT
vec3f p = vec3f::from(
((float)(vp.width - x) / (float)vp.width),
((float)y / (float)vp.height),
1.);
// alternatively vec3f p = vec3f::from(
// ((float)x / (float)vp.width),
// ((float)(vp.height - y) / (float)vp.height),
// 1.);
p *= vec3f::from(APP_FRUSTUM_WIDTH, APP_FRUSTUM_HEIGHT, 1.);
p += vec3f::from(APP_FRUSTUM_LEFT, APP_FRUSTUM_BOTTOM, 0.);
// now p elements are in (-1, 1)
vec3f near = p * vec3f::from(APP_FRUSTUM_NEAR);
vec3f far = p * vec3f::from(APP_FRUSTUM_FAR);
// ray in world coordinates
Ray ray = { _camera->getPos(), -(_camera->getBasis() * (far - near).normalize()) };
_ray->set(ray.origin, ray.dir, 10000.); // this is a debugging vertex array to see the Ray on screen
Node* node = _scene->collide(ray, Transform());
cout << "node is : " << node << endl;
}
This assumes a perspective projection, but the question never arises for the orthographic one in the first place.
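The idea of skipping the projection inverse can be written in closed form for the common symmetric case. Assuming a gluPerspective-style projection (vertical field of view fovy, aspect = width / height) and a camera looking down -Z with +Y up, the view-space ray through a pixel follows directly from tan(fovy / 2); rotating it by the camera's orientation gives the world-space direction, and the camera position is the ray origin. The function names and the column layout of the rotation are assumptions for illustration.

```cpp
#include <array>
#include <cmath>

using Vec3d = std::array<double, 3>;

// Unit view-space direction of the ray through pixel (px, py) for a symmetric perspective
// projection (vertical field of view fovyRad, aspect = width / height).
// Pixel origin at the top-left, as in most windowing systems.
Vec3d viewSpaceRayDir(double px, double py, int width, int height, double fovyRad) {
    double ndcX = 2.0 * px / width - 1.0;
    double ndcY = 1.0 - 2.0 * py / height;          // flip: window y grows downwards
    double tanHalf = std::tan(fovyRad / 2.0);
    double aspect = static_cast<double>(width) / height;
    Vec3d d{ndcX * tanHalf * aspect, ndcY * tanHalf, -1.0}; // point on the z = -1 plane
    double len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    for (double& c : d) c /= len;
    return d;
}

// Rotate a view-space direction into world space. camToWorldCols holds the camera's
// right, up and backward axes (in world coordinates) as its three columns, i.e. the
// rotation part of inverse(view matrix).
Vec3d toWorld(const std::array<Vec3d, 3>& camToWorldCols, const Vec3d& d) {
    Vec3d w{};
    for (int i = 0; i < 3; ++i)
        for (int c = 0; c < 3; ++c) w[i] += camToWorldCols[c][i] * d[c];
    return w;
}
```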
I've got the same situation with ordinary ray picking, but something is wrong. I've performed the unproject operation the proper way, but it just doesn't work. I think I've made a mistake somewhere, but I can't figure out where. My matrix multiplication, inverse and vector-by-matrix multiplication all seem to work fine; I've tested them.
In my code I'm reacting to WM_LBUTTONDOWN, so lParam holds the [Y][X] coordinates as 2 words in a dword. I extract them, then convert them to normalized space; I've checked that this part also works fine. When I click the lower left corner I get values close to (-1, -1), and good values for all 3 other corners. I'm then using the line_points.vtx array for debugging, and it's not even close to reality.
unsigned int x_coord=lParam&0x0000ffff; //X RAW COORD
unsigned int y_coord=client_area.bottom-(lParam>>16); //Y RAW COORD
double xn=((double)x_coord/client_area.right)*2-1; //X [-1 +1]
double yn=1-((double)y_coord/client_area.bottom)*2;//Y [-1 +1]
_declspec(align(16))gl_vec4 pt_eye(xn,yn,0.0,1.0);
gl_mat4 view_matrix_inversed;
gl_mat4 projection_matrix_inversed;
cam.matrixProjection.inverse(&projection_matrix_inversed);
cam.matrixView.inverse(&view_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&projection_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&view_matrix_inversed);
line_points.vtx[line_points.count*4]=pt_eye.x-cam.pos.x;
line_points.vtx[line_points.count*4+1]=pt_eye.y-cam.pos.y;
line_points.vtx[line_points.count*4+2]=pt_eye.z-cam.pos.z;
line_points.vtx[line_points.count*4+3]=1.0;
I am trying to create a 2D game. Because I am using OpenGL ES I have to plot everything in 3D, but I just fix the z coordinate, which is fine. Now I want to calculate the angle between the two vectors CP and CT (C = player center, P = point just above the player, T = touch point) so that I can make the player face that direction. I know how to get the angle between 2 vectors, but my problem is getting all the points onto the same plane (by translating T).
I know that T exists on a plane where (0,0) is the upper left and UP is actually DOWN (visually). I also know that for C and P, UP is actually UP, and that their X and Y lie on a completely different 3D plane from T's. I need to either get C and P onto T's plane (which I have tried below) or get T onto C and P's plane. Can anyone help me? I am using the standard OpenGL projection model, and I am zoomed out to (0,0,-4) in the frustum (looking directly at (0,0,0)). My 2D objects all sit on the plane z = 1.
private float getRotation(float touch_x, float touch_y)
{
//center_x = this.getWidth() / 2;
//center_y = this.getHeight() / 2;
float cx, cy, tx, ty, ux, uy;
cx = (player.x * _renderer.centerx);
cy = (player.y * -_renderer.centery);
ux = cx;
uy = cy+1.0f;
tx = (touch_x - _renderer.centerx);
ty = (touch_y - _renderer.centery);
Log.d(TAG, "center x: "+cx+"y:"+cy);
Log.d(TAG, "up x: "+ux+"y:"+uy);
Log.d(TAG, "touched x: "+tx+"y:"+ty);
float P12 = length(cx,cy,tx,ty);
float P13 = length(cx,cy,ux,uy);
float P23 = length(tx,ty,ux,uy);
// law of cosines: cos(angle) = (P12^2 + P13^2 - P23^2) / (2 * P12 * P13)
return (float)Math.toDegrees(Math.acos((P12*P12 + P13*P13 - P23*P23) / (2.0 * P12 * P13)));
}
Basically I want to know if there is a way I can translate (tx, ty, -4) to (x, y, 1) using the standard view frustum.
I have tried some other things now. In my touch event I am trying to do this:
float[] coords = new float[4];
GLU.gluUnProject(touch_x, touch_y, -4.0f, renderer.model, 0, renderer.project, 0, renderer.view, 0, coords, 0);
This throws an exception. I am setting up the model, projection and view matrices in onSurfaceChanged of the Renderer object:
GL11 gl11 = (GL11)gl;
model = new float[16];
project = new float[16];
int[] view = new int[4];
gl11.glGetFloatv(GL10.GL_MODELVIEW, model, 0);
gl11.glGetFloatv(GL10.GL_PROJECTION, project, 0);
gl11.glGetIntegerv(GL11.GL_VIEWPORT, view, 0);
I have several textbooks on openGL and after dusting one off I found that the term for what I want to do is called picking. Once I knew what I was asking, I found a lot of good web sites and references:
http://www.lighthouse3d.com/opengl/picking/
OpenGL ES (iPhone) Touch Picking
Coordinate Picking with OpenGL ES 2.0
Android OpenGL 3D picking
converting 2D mouse coordinates to 3D space in OpenGL ES
Ray-picking in OpenGL ES 2.0
Android: GLES20: Called unimplemented OpenGL ES API
...
The list is almost innumerable. There are 700 ways to do this, and none of them worked for me. Ultimately I have decided to go back to basics and do a thorough OpenGL ES learning stint, to which effect I have bought the book here: http://www.amazon.com/Graphics-Programming-Android-Programmer-ebook/dp/B0070D83W2/ref=sr_1_2?s=digital-text&ie=UTF8&qid=1362250733&sr=1-2&keywords=opengl+es+2.0+android
One thing I have already learnt is that I was most definitely using the wrong type of projection. I should not use full 3D for a 2D game. In order to do picking in a full 3D environment I would have to cast a ray from the screen point onto the surface of the 3D plane where the game was taking place. In addition to being a horrendous waste of resources (raycasting per click), there were other tell-tales. I would render my player with a circle encompassing her, and as I moved her, the circle would go off center of the player. This is due to the full 3D environment rendered on a 2D plane. It just will not produce a professional result. I need to use an orthographic projection.
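With an orthographic projection, the touch-to-world mapping becomes linear, so no ray casting is needed. The sketch below is in C++ (Ortho, touchToWorld and facingAngleDeg are made-up names, and the ortho box is assumed to cover the whole view). It maps a touch to world coordinates, flipping y because screen y grows downwards, and then uses atan2 to get a signed facing angle. atan2 also sidesteps the law-of-cosines route, which only yields an unsigned angle.

```cpp
#include <cmath>

// Orthographic box as passed to glOrtho(left, right, bottom, top, near, far).
struct Ortho { double left, right, bottom, top; };

// Map a touch position (pixels, origin top-left, y down) to world x/y,
// assuming the ortho box fills the whole width x height view.
void touchToWorld(double tx, double ty, int width, int height, const Ortho& o,
                  double& wx, double& wy) {
    wx = o.left + (tx / width) * (o.right - o.left);
    wy = o.top - (ty / height) * (o.top - o.bottom); // screen y grows down, world y up
}

// Signed angle in degrees from the player's "up" (+Y) to the direction
// player -> target, positive counter-clockwise.
double facingAngleDeg(double px, double py, double targetX, double targetY) {
    const double kPi = 3.14159265358979323846;
    double dx = targetX - px, dy = targetY - py;
    return std::atan2(-dx, dy) * 180.0 / kPi;
}
```

On Android the same arithmetic ports directly to Java with Math.atan2 and Math.toDegrees.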
I think you're trying to do too much all at once. I can understand each sentence of your question separately; but strung all together, it's very confusing.
For the exceptions, you probably need to pass identity matrices instead of zero matrices to get a basic 1-to-1 projection.
Then I'd suggest that you scale the y dimension by -1 so all the UPs and DOWNs match at least.
I hope this helps, because I'm not 100% sure what you're trying to do. Particularly, " translate (tx, ty, -4) to (x, y, 1) using the standard view frustum" doesn't make sense to me. You can translate with a translation matrix. You can clip to a view frustum, or project an object from the frustum to a plane (usually the view plane). But if all your Zs are constant, you can just discard them right? So, assuming x=tx and y=ty, then tz += 5?
I am developing an application which uses OpenGL for rendering images.
Now I just want to detect a touch event on the OpenGL sphere object I have drawn.
Here I draw 4 objects on the screen. How can I find out which object has been touched? I have used the onTouchEvent() method, but it only gives me x and y coordinates, while my objects are drawn in 3D.
Please help, since I am new to OpenGL.
At Google I/O there was a session on how OpenGL was used for Google Body on Android. Body parts were selected by rendering each of them with a solid color into a hidden buffer; the color at the touch x,y then identified the corresponding object. For performance, only a small cropped area of 20x20 pixels around the touch point was rendered that way.
Both approaches (1. hidden color buffer and 2. intersection test) have their own merits.
1. Hidden color buffer: reading pixels back from the GPU is a very slow operation. It is certainly overkill for a simple ray-sphere intersection test.
2. Ray-sphere intersection test: this is not that difficult.
Here is a simplified version of an implementation in Ogre3D.
std::pair<bool, m_real> Ray::intersects(const Sphere& sphere) const
{
const Ray& ray=*this;
const vector3& raydir = ray.direction();
// Adjust ray origin relative to sphere center
const vector3 rayorig = ray.origin() - sphere.center;
m_real radius = sphere.radius;
// Mmm, quadratics
// Build coeffs which can be used with std quadratic solver
// ie t = (-b +/- sqrt(b*b - 4ac)) / 2a
// (here '%' is this codebase's dot product of two vector3s)
m_real a = raydir%raydir;
m_real b = 2 * (rayorig%raydir);
m_real c = rayorig%rayorig - radius*radius;
// Calc discriminant
m_real d = (b*b) - (4 * a * c);
if (d < 0)
{
// No intersection
return std::pair<bool, m_real>(false, 0);
}
else
{
// BTW, if d=0 there is one intersection, if d > 0 there are 2
// But we only want the closest one, so that's ok, just use the
// '-' version of the solver
m_real sqrt_d = sqrt(d);
m_real t = ( -b - sqrt_d ) / (2 * a);
if (t < 0)
t = ( -b + sqrt_d ) / (2 * a); // ray origin is inside the sphere
if (t < 0)
return std::pair<bool, m_real>(false, 0); // sphere is entirely behind the ray origin
return std::pair<bool, m_real>(true, t);
}
}
You will probably also need to calculate the ray that corresponds to the cursor position. Again, you can refer to Ogre3D's source code: search for getCameraToViewportRay. Basically, you need the view and projection matrices to calculate a ray (a 3D position and a 3D direction) from a 2D position.
In my project, the solution I chose was:
Unproject your 2D screen coordinates to a virtual 3D line going through your scene.
Detect possible intersections of that line and your scene objects.
This is quite a complex task.
I have only done this in Direct3D rather than OpenGL ES, but these are the steps:
Find your modelview and projection matrices. It seems that OpenGL ES has removed the ability to retrieve the matrices set by gluPerspective(), gluLookAt() etc. But you can use the static methods of android.opengl.Matrix (such as Matrix.frustumM() and Matrix.setLookAtM()) to create these matrices instead, then set them with glLoadMatrixf().
Call gluUnProject() twice, once with winZ=0 and then with winZ=1, passing the matrices you calculated earlier.
This will output a 3D position from each call. This pair of positions defines a ray in OpenGL "world space".
Perform a ray-sphere intersection test on each of your spheres in order (closest to the camera first; otherwise you may select a sphere that is hidden behind another). If you detect an intersection, you've touched the sphere.
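The picking steps above can be sketched in C++ as follows (no OpenGL calls; Sphere, raySphere and pickSphere are made-up names). nearPt and farPt are assumed to be the two gluUnProject results for winZ = 0 and winZ = 1. Instead of sorting the spheres by distance beforehand, the sketch keeps the hit with the smallest ray parameter t, which gives the front-most sphere directly.

```cpp
#include <array>
#include <cmath>
#include <vector>

struct Sphere { std::array<double, 3> c; double r; };

// Smallest t >= 0 where o + t * d hits the sphere, or -1 if there is no hit.
// d does not need to be normalized (but must not be zero).
double raySphere(const std::array<double, 3>& o, const std::array<double, 3>& d,
                 const Sphere& s) {
    double oc[3] = {o[0] - s.c[0], o[1] - s.c[1], o[2] - s.c[2]};
    double a = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    double b = 2.0 * (oc[0] * d[0] + oc[1] * d[1] + oc[2] * d[2]);
    double c = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - s.r * s.r;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return -1.0;            // ray misses the sphere
    double sq = std::sqrt(disc);
    double t = (-b - sq) / (2.0 * a);       // nearer root
    if (t < 0.0) t = (-b + sq) / (2.0 * a); // origin inside the sphere
    return t >= 0.0 ? t : -1.0;             // both roots behind the origin -> miss
}

// Index of the sphere hit first along the ray nearPt -> farPt, or -1 if none.
int pickSphere(const std::array<double, 3>& nearPt, const std::array<double, 3>& farPt,
               const std::vector<Sphere>& spheres) {
    std::array<double, 3> dir{farPt[0] - nearPt[0], farPt[1] - nearPt[1],
                              farPt[2] - nearPt[2]};
    int best = -1;
    double bestT = 0.0;
    for (int i = 0; i < static_cast<int>(spheres.size()); ++i) {
        double t = raySphere(nearPt, dir, spheres[i]);
        if (t >= 0.0 && (best < 0 || t < bestT)) {
            best = i;
            bestT = t;
        }
    }
    return best;
}
```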
To find out whether a touch point is inside a circle:
public boolean checkInsideCircle(float x, float y, float centerX, float centerY, float radius)
{
// Compare squared distances to avoid a square root
float dx = x - centerX;
float dy = y - centerY;
return dx * dx + dy * dy < radius * radius;
}
where
1) centerX, centerY is the center point of the circle in screen coordinates (for a sphere drawn in 3D, its center and radius must first be projected to screen space),
2) radius is the radius of the circle,
3) x, y is the touch point.