How to check ray intersection with an object in ARCore

Is there a way to check whether I touched an object on the screen? As I understand it, the HitResult class lets me check whether I touched a recognized and mapped surface. But I want to check whether I touched the object that is placed on that surface.

ARCore doesn't really have a concept of an object, so we can't directly provide that. I suggest looking at ray-sphere tests for a starting point.
However, I can help with getting the ray itself (to be added to HelloArActivity):
/**
 * Returns a world coordinate frame ray for a screen point. The ray is
 * defined using a 6-element float array containing the head location
 * followed by a normalized direction vector.
 */
float[] screenPointToWorldRay(float xPx, float yPx, Frame frame) {
    float[] points = new float[12]; // {clip query, camera query, camera origin}
    // Set up the clip-space coordinates of our query point
    // +x is right:
    points[0] = 2.0f * xPx / mSurfaceView.getMeasuredWidth() - 1.0f;
    // +y is up (android UI Y is down):
    points[1] = 1.0f - 2.0f * yPx / mSurfaceView.getMeasuredHeight();
    points[2] = 1.0f; // +z is forwards (remember clip, not camera)
    points[3] = 1.0f; // w (homogeneous coordinates)

    float[] matrices = new float[32]; // {proj, inverse proj}
    // If you'll be calling this several times per frame, factor out
    // the next two lines to run when Frame.isDisplayRotationChanged().
    // (These are preview-SDK calls; in ARCore 1.x the projection matrix and
    // camera pose come from frame.getCamera() instead.)
    mSession.getProjectionMatrix(matrices, 0, 1.0f, 100.0f);
    Matrix.invertM(matrices, 16, matrices, 0);

    // Transform clip-space point to camera-space.
    Matrix.multiplyMV(points, 4, matrices, 16, points, 0);
    // Perspective divide, so points[4,5,6] is an actual camera-space point.
    for (int i = 4; i < 7; i++) {
        points[i] /= points[7];
    }
    // Transform to world space to get a point along the ray.
    float[] out = new float[6];
    frame.getPose().transformPoint(points, 4, out, 3);
    // points[8,9,10] is still all zeros (the camera origin); transforming it
    // gives the ray head position in world space.
    frame.getPose().transformPoint(points, 8, out, 0);

    // normalize the direction vector:
    float dx = out[3] - out[0];
    float dy = out[4] - out[1];
    float dz = out[5] - out[2];
    float scale = 1.0f / (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
    out[3] = dx * scale;
    out[4] = dy * scale;
    out[5] = dz * scale;
    return out;
}
If you're calling this several times per frame see the comment about the getProjectionMatrix and invertM calls.
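As a starting point for the ray-sphere test suggested above, here is a minimal Java sketch that consumes the 6-element array returned by screenPointToWorldRay. It assumes each placed object is approximated by a bounding sphere whose center you take from its pose (for example an Anchor's pose via tx(), ty(), tz()) and whose radius you pick yourself; RaySphere and intersect are hypothetical names, not ARCore API. Because the direction is unit length, the quadratic's a term is 1 and drops out.

```java
/** Hypothetical helper: ray vs. bounding-sphere test for a ray from screenPointToWorldRay. */
final class RaySphere {
    /**
     * Returns the distance along the ray to the nearest hit, or -1 if the sphere is missed.
     * ray = {ox, oy, oz, dx, dy, dz} with a unit-length direction.
     */
    static float intersect(float[] ray, float cx, float cy, float cz, float radius) {
        // Ray origin relative to the sphere center.
        float lx = ray[0] - cx, ly = ray[1] - cy, lz = ray[2] - cz;
        // With |dir| == 1 the quadratic is t^2 + 2*b*t + c = 0.
        float b = lx * ray[3] + ly * ray[4] + lz * ray[5];
        float c = lx * lx + ly * ly + lz * lz - radius * radius;
        float disc = b * b - c;
        if (disc < 0) return -1f;            // ray misses the sphere
        float sq = (float) Math.sqrt(disc);
        float t = -b - sq;                   // nearer root
        if (t < 0) t = -b + sq;              // ray origin is inside the sphere
        return t >= 0 ? t : -1f;             // sphere lies entirely behind the origin
    }
}
```

To pick among several objects, test each one and keep the object with the smallest non-negative distance.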

Apart from mouse picking with ray casting (cf. Ian's answer), the other commonly used technique is a picking buffer, explained in detail (with C++ code) here:
The trick behind 3D picking is very simple. We will attach a running index to each triangle and have the FS (fragment shader) output the index of the triangle that the pixel belongs to. The end result is that we get a "color" buffer that doesn't really contain colors. Instead, for each pixel which is covered by some primitive we get the index of this primitive. When the mouse is clicked on the window we will read back that index (according to the location of the mouse) and render the selected triangle red. By combining a depth buffer in the process we guarantee that when several primitives are overlapping the same pixel we get the index of the top-most primitive (closest to the camera).
So in a nutshell:
Every object's draw method needs a running index and a boolean for whether this draw renders into the picking buffer or not.
The render method converts the index into a grayscale color, and the scene is rendered with those colors.
After the whole rendering is done, retrieve the pixel color at the touch position with glReadPixels(x, y, ...), passing the x and y of the pixel you want the color of (GL's window origin is the bottom-left corner, while Android touch coordinates start at the top-left, so flip the y). Then translate the color back to an index and the index back to an object. Voilà, you have your clicked object.
In practice, for a mobile use case you should probably read a 10x10 rectangle, iterate through it and pick the first non-background color found, because touches are never that precise.
This approach works independently of the complexity of your objects.
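A sketch of the index-to-color bookkeeping in plain Java. Instead of grayscale (at most 255 objects with an 8-bit channel), this variant spreads the index over the R, G and B bytes, and it encodes index + 1 so that a black clear color means "no object". ColorPicking and its methods are hypothetical names. For the colors to come back exactly, the picking pass should be drawn with flat colors and without blending or dithering (for example GLES20.glDisable(GLES20.GL_DITHER)).

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers for an ID/color picking buffer (24-bit RGB instead of grayscale). */
final class ColorPicking {
    /** Object index -> RGB color in [0, 1]; index + 1 is encoded so black stays "background". */
    static float[] indexToColor(int index) {
        int id = index + 1;
        return new float[] {
            ((id >> 16) & 0xFF) / 255f,
            ((id >> 8) & 0xFF) / 255f,
            (id & 0xFF) / 255f
        };
    }

    /** 8-bit channel values read back from the framebuffer -> object index, or -1 for background. */
    static int colorToIndex(int r, int g, int b) {
        int id = ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
        return id - 1;
    }

    /** Scans a w x h RGBA patch (GL_RGBA / GL_UNSIGNED_BYTE layout) for the first non-background pixel. */
    static int firstHit(ByteBuffer rgba, int w, int h) {
        for (int i = 0; i < w * h; i++) {
            int index = colorToIndex(rgba.get(i * 4), rgba.get(i * 4 + 1), rgba.get(i * 4 + 2));
            if (index >= 0) return index;
        }
        return -1;
    }
}
```

On Android the patch would come from something like GLES20.glReadPixels(x - 5, viewportHeight - y - 5, 10, 10, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, buf) with buf = ByteBuffer.allocateDirect(10 * 10 * 4), clamped to the viewport edges.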

Related

Using gluUnProject to map touches to x,y coords on z=0 plane in Android OpenGL ES 2.0

I've drawn a grid at z=0 in OpenGL ES 2.0, and just want to convert touch inputs to x/y coordinates on that plane. It seems like this is best done through ray casting, which involves running gluUnProject on 0 and 1, creating a ray, and then solving that ray for z=0?
I found this code, but it is OpenGL ES 1.0: i-schuetz / Android_OpenGL_Picking
My code on Github, only 4 files. The unproject function I'm trying to write is in MyGLRenderer.java:
public float[] unproject(float rx, float ry) {
    float rz = 1;
    float[] xyzw = {0, 0, 0, 0};
    int[] viewport = {0, 0, mDisplayWidth, mDisplayHeight};
    android.opengl.GLU.gluUnProject(
            rx, ry, rz, // window coordinates
            mViewMatrix, 0,
            mProjectionMatrix, 0,
            viewport, 0,
            xyzw, 0);
    xyzw[0] /= xyzw[3];
    xyzw[1] /= xyzw[3];
    xyzw[2] /= xyzw[3];
    xyzw[3] = 1;
    return xyzw;
}
I would like this function to take an rx and ry for the screen, and return an rx and ry for the z=0 plane.

There is nothing particularly special about what gluUnProject (...) does. If you have all of the matrices and the viewport dimensions (x,y and width,height) I can walk you through the process of implementing it yourself.
NOTE: I tend to call each coordinate space by a different name than you might be used to. Screen space is another name for window space, view space is another name for eye space, and object space is another name for model space.
Step 1: Screen space to NDC space (undo the viewport transform)
$$\mathrm{NDC}_x = \frac{2\,(\mathrm{Screen}_x - \mathrm{Viewport}_x)}{\mathrm{Viewport}_w} - 1$$
$$\mathrm{NDC}_y = \frac{2\,(\mathrm{Screen}_y - \mathrm{Viewport}_y)}{\mathrm{Viewport}_h} - 1$$
Screen space to NDC space (undo the depth range mapping)
Typically in screen space, the depth range maps z = 0 to the near plane and z = 1 to the far plane:
$$\mathrm{NDC}_z = 2\,\mathrm{Screen}_z - 1$$
Step 2†: NDC space to object space (undo the projection, view and model transforms), with P the projection matrix and M the modelview matrix:
$$P^{-1}\,(\mathrm{NDC}_x, \mathrm{NDC}_y, \mathrm{NDC}_z, 1)^T = \mathrm{View}_{xyzw}$$
$$M^{-1}\,\mathrm{View}_{xyzw} = \mathrm{Object}_{xyzw}$$
† This can actually be combined into a single step:
$$\mathrm{Object}_{xyzw} = (P\,M)^{-1}\,(\mathrm{NDC}_x, \mathrm{NDC}_y, \mathrm{NDC}_z, 1)^T$$
The math also produces a W component for the object-space result. Do not simply drop it: with a perspective projection it is generally not 1, so divide X, Y and Z by W (the reference GLU implementation of gluUnProject does this, and so do the xyzw[0] /= xyzw[3] lines in the question). The resulting X, Y and Z are your rX, rY and rZ.
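A minimal pure-Java sketch of Steps 1 and 2 plus the W divide, assuming column-major 4x4 float[16] matrices (the layout android.opengl.Matrix uses) and that the caller passes the already-multiplied projection x modelview matrix. The class and method names are hypothetical; on Android, android.opengl.Matrix.multiplyMM and invertM could replace the hand-written helpers. For a touch event, flip y first (window y = viewport height - touch y), because Android's touch origin is the top-left while GL's window origin is the bottom-left. Calling it with winZ = 0 and winZ = 1 gives the near and far points of the pick ray, which can then be solved for z = 0.

```java
/** Hypothetical helper: a hand-rolled gluUnProject for column-major 4x4 matrices. */
final class Unproject {

    /** Returns {x, y, z} in object space, or null if the matrix is singular or w == 0. */
    static float[] unProject(float winX, float winY, float winZ,
                             float[] projTimesModelView, int[] viewport) {
        float[] inv = invert(projTimesModelView);
        if (inv == null) return null;
        // Step 1: screen space -> NDC (undo the viewport and depth range mapping).
        float[] ndc = {
            2f * (winX - viewport[0]) / viewport[2] - 1f,
            2f * (winY - viewport[1]) / viewport[3] - 1f,
            2f * winZ - 1f,
            1f
        };
        // Step 2: NDC -> object space with (P * M)^-1.
        float[] obj = multiplyMV(inv, ndc);
        if (obj[3] == 0f) return null;
        // Divide by the W component.
        return new float[] { obj[0] / obj[3], obj[1] / obj[3], obj[2] / obj[3] };
    }

    /** Column-major matrix * vector, same layout as android.opengl.Matrix. */
    static float[] multiplyMV(float[] m, float[] v) {
        float[] out = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                out[row] += m[col * 4 + row] * v[col];
            }
        }
        return out;
    }

    /**
     * Gauss-Jordan inverse with partial pivoting. Treating the array as row-major is fine:
     * inverse(transpose(A)) == transpose(inverse(A)), so the result comes back column-major.
     */
    static float[] invert(float[] m) {
        double[][] a = new double[4][8];
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < 4; c++) a[r][c] = m[r * 4 + c];
            a[r][4 + r] = 1.0;
        }
        for (int col = 0; col < 4; col++) {
            int pivot = col;
            for (int r = col + 1; r < 4; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) return null; // singular matrix
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            double p = a[col][col];
            for (int c = 0; c < 8; c++) a[col][c] /= p;
            for (int r = 0; r < 4; r++) {
                if (r == col) continue;
                double f = a[r][col];
                for (int c = 0; c < 8; c++) a[r][c] -= f * a[col][c];
            }
        }
        float[] inv = new float[16];
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < 4; c++) inv[r * 4 + c] = (float) a[r][4 + c];
        }
        return inv;
    }
}
```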

How to tell what part of a texture on a 3d cube was touched [duplicate]

I have a renderer using DirectX and OpenGL, and a 3D scene. The viewport and the window have the same dimensions.
How do I implement picking given mouse coordinates x and y in a platform independent way?

If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersect it with your models.
If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
EDIT: If the question is how to construct the ray from the mouse coordinates, you need the following: a projection matrix P and the camera transform C. If the coordinates of the mouse pointer are (x, y) and the size of the viewport is (width, height), one position in clip space along the ray is:
mouse_clip = [
float(x) * 2 / float(width) - 1,
1 - float(y) * 2 / float(height),
0,
1]
(Notice that I flipped the y-axis since often the origin of the mouse coordinates are in the upper left corner)
The following is also true:
mouse_clip = P * C * mouse_worldspace
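Which gives (the inverse projection yields homogeneous coordinates, so divide by the resulting w):
mouse_worldspace = inverse(C) * inverse(P) * mouse_clip
mouse_worldspace = mouse_worldspace / mouse_worldspace.w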
We now have:
p = C.position(); //origin of camera in worldspace
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace

First you need to determine where on the nearplane the mouse click happened:
rescale the window coordinates (0..640,0..480) to [-1,1], with (-1,-1) at the bottom-left corner and (1,1) at the top-right.
'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix.
Then determine where the camera is in worldspace, and draw a ray starting at the camera and passing through the point you found on the nearplane.
The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.
Final pseudocode:
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen
unviewMat = (projectionMat * modelViewMat).inverse()
near_point = unviewMat * Vec(normalised_x, normalised_y, 0, 1)
near_point = near_point / near_point.w   // undo the perspective divide
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = near_point - camera_pos

Well, it's pretty simple; the theory behind this is always the same:
1) Unproject your 2D coordinate into 3D space twice: once at min Z and once at max Z. (Each API has its own function for this, but you can implement your own if you want.)
2) With these two points, calculate the vector that goes from the min-Z point to the max-Z point.
3) With that vector and the min-Z point, you have the ray that goes from min Z to max Z.
4) Now that you have a ray, you can do a ray-triangle, ray-plane or ray-whatever intersection test and get your result.
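Step 4 with a plane as the target, as a small Java sketch (RayPlane and intersect are hypothetical names). With origin = the min-Z point, dir = max-Z point - min-Z point, and the plane z = 0 (normal (0, 0, 1), d = 0), this is exactly the "solve the ray for z = 0" case from the earlier grid question.

```java
/** Hypothetical helper for step 4: intersect a ray with the plane dot(normal, p) = d. */
final class RayPlane {
    /** Returns the hit point, or null if the ray is parallel to the plane or points away from it. */
    static float[] intersect(float[] origin, float[] dir, float[] normal, float d) {
        float denom = dot(normal, dir);
        if (Math.abs(denom) < 1e-6f) return null;        // ray parallel to the plane
        float t = (d - dot(normal, origin)) / denom;     // solve dot(n, o + t * dir) = d
        if (t < 0) return null;                          // plane is behind the ray origin
        return new float[] {
            origin[0] + t * dir[0],
            origin[1] + t * dir[1],
            origin[2] + t * dir[2]
        };
    }

    static float dot(float[] a, float[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }
}
```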

I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnProject call.
Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:
// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
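// obtain the Z position (not world coordinates but in range 0 - 1)
// x_cursor, y_cursor are window coordinates with the origin at the bottom-left,
// so a mouse y measured from the top must be flipped first: viewport[3] - mouse_y - 1
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);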
// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);
If you don't want to use GLU, you can also implement gluUnProject yourself; its functionality is relatively simple and is described at opengl.org.

OK, this topic is old, but it was the best I found on the subject and it helped me a bit, so I'll post here for those who are following ;-)
This is the way I got it to work without having to compute the inverse of the projection matrix:
void Application::leftButtonPress(u32 x, u32 y){
    GL::Viewport vp = GL::getViewport(); // just a call to glGet GL_VIEWPORT
    vec3f p = vec3f::from(
        ((float)(vp.width - x) / (float)vp.width),
        ((float)y / (float)vp.height),
        1.);
    // alternatively vec3f p = vec3f::from(
    //     ((float)x / (float)vp.width),
    //     ((float)(vp.height - y) / (float)vp.height),
    //     1.);
    p *= vec3f::from(APP_FRUSTUM_WIDTH, APP_FRUSTUM_HEIGHT, 1.);
    p += vec3f::from(APP_FRUSTUM_LEFT, APP_FRUSTUM_BOTTOM, 0.);
    // now p elements are in (-1, 1)
    vec3f near = p * vec3f::from(APP_FRUSTUM_NEAR);
    vec3f far = p * vec3f::from(APP_FRUSTUM_FAR);
    // ray in world coordinates
    Ray ray = { _camera->getPos(), -(_camera->getBasis() * (far - near).normalize()) };
    _ray->set(ray.origin, ray.dir, 10000.); // this is a debugging vertex array to see the Ray on screen
    Node* node = _scene->collide(ray, Transform());
    cout << "node is : " << node << endl;
}
This assumes a perspective projection, but the question never arises for the orthographic one in the first place.

I've got the same situation with ordinary ray picking, but something is wrong. I've performed the unproject operation the proper way, but it just doesn't work. I think I've made a mistake somewhere, but I can't figure out where. My matrix multiplication, inverse and vector-by-matrix multiplication all seem to work fine; I've tested them.
In my code I'm reacting to WM_LBUTTONDOWN, so lParam holds the [Y][X] coordinates as 2 words in a DWORD. I extract them, then convert them to normalized space; I've checked that this part also works fine. When I click the lower-left corner I get values close to -1 -1, and good values for all 3 other corners. I'm then using the line_points.vtx array for debugging, and the result is not even close to reality.
unsigned int x_coord=lParam&0x0000ffff; //X RAW COORD
unsigned int y_coord=client_area.bottom-(lParam>>16); //Y RAW COORD
double xn=((double)x_coord/client_area.right)*2-1; //X [-1 +1]
double yn=1-((double)y_coord/client_area.bottom)*2;//Y [-1 +1]
_declspec(align(16))gl_vec4 pt_eye(xn,yn,0.0,1.0);
gl_mat4 view_matrix_inversed;
gl_mat4 projection_matrix_inversed;
cam.matrixProjection.inverse(&projection_matrix_inversed);
cam.matrixView.inverse(&view_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&projection_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&view_matrix_inversed);
line_points.vtx[line_points.count*4]=pt_eye.x-cam.pos.x;
line_points.vtx[line_points.count*4+1]=pt_eye.y-cam.pos.y;
line_points.vtx[line_points.count*4+2]=pt_eye.z-cam.pos.z;
line_points.vtx[line_points.count*4+3]=1.0;

Translate between 'Touch Plane' and 'Game Plane'

I am trying to create a 2D game. Because I am using OpenGL ES I have to plot everything in 3D, but I just fix the z coordinate, which is fine. Now what I want to do is calculate the angle between two vectors, CP and CT (C = player center, P = point just above the player, T = touch point), so that I can make the player face that direction. I know how to get the angle between 2 vectors, but my problem is getting all the points onto the same plane (by translating T).
I know that T lives on a plane where (0,0) is the upper left and UP is actually DOWN (visually). I also know that for C and P, UP really is UP, and that their X and Y live on a completely different, three-dimensional plane from T. I need to either get C and P onto T's plane (which I have tried below) or get T onto C and P's plane. Can anyone help me? I am using the standard OpenGL projection model and I am zoomed out to (0,0,-4) from the frustum (I am looking directly at (0,0,0)). My 2D objects all sit on the plane (0,0,1).
private float getRotation(float touch_x, float touch_y)
{
    //center_x = this.getWidth() / 2;
    //center_y = this.getHeight() / 2;
    float cx, cy, tx, ty, ux, uy;
    cx = (player.x * _renderer.centerx);
    cy = (player.y * -_renderer.centery);
    ux = cx;
    uy = cy + 1.0f;
    tx = (touch_x - _renderer.centerx);
    ty = (touch_y - _renderer.centery);
    Log.d(TAG, "center x: "+cx+"y:"+cy);
    Log.d(TAG, "up x: "+ux+"y:"+uy);
    Log.d(TAG, "touched x: "+tx+"y:"+ty);
    float P12 = length(cx, cy, tx, ty);
    float P13 = length(cx, cy, ux, uy);
    float P23 = length(tx, ty, ux, uy);
    // Law of cosines: the denominator must be parenthesized as (2 * P12 * P13).
    return (float) Math.toDegrees(Math.acos((P12 * P12 + P13 * P13 - P23 * P23) / (2.0 * P12 * P13)));
}
Basically I want to know if there is a way I can translate (tx, ty, -4) to (x, y, 1) using the standard view frustum.
I have tried some other things now. In my touch event I am trying to do this:
float[] coords = new float[4];
GLU.gluUnProject(touch_x, touch_y, -4.0f, renderer.model, 0, renderer.project, 0, renderer.view, 0, coords, 0);
Which is throwing an exception I am setting up the model, projection and view in the OnSurfaceChanged of the Renderer object:
GL11 gl11 = (GL11)gl;
model = new float[16];
project = new float[16];
int[] view = new int[4];
gl11.glGetFloatv(GL10.GL_MODELVIEW, model, 0);
gl11.glGetFloatv(GL10.GL_PROJECTION, project, 0);
gl11.glGetIntegerv(GL11.GL_VIEWPORT, view, 0);

I have several textbooks on OpenGL, and after dusting one off I found that the term for what I want to do is picking. Once I knew what I was asking, I found a lot of good websites and references:
http://www.lighthouse3d.com/opengl/picking/
OpenGL ES (iPhone) Touch Picking
Coordinate Picking with OpenGL ES 2.0
Android OpenGL 3D picking
converting 2D mouse coordinates to 3D space in OpenGL ES
Ray-picking in OpenGL ES 2.0
Android: GLES20: Called unimplemented OpenGL ES API
...
The list is almost innumerable. There are 700 ways to do this, and none of them worked for me. Ultimately I have decided to go back to basics and do a thorough OpenGL|ES learning stint, to which effect I have bought the book here: http://www.amazon.com/Graphics-Programming-Android-Programmer-ebook/dp/B0070D83W2/ref=sr_1_2?s=digital-text&ie=UTF8&qid=1362250733&sr=1-2&keywords=opengl+es+2.0+android
One thing I have already learnt is that I was most definitely using the wrong type of projection. I should not use full 3D for a 2D game. In order to do picking in a full 3D environment I would have to cast a ray from the screen point onto the surface of the 3D plane where the game was taking place. In addition to being a horrendous waste of resources (raycasting per click), there were other tell-tales. I would render my player with a circle encompassing her, and as I moved her, the circle would go off center of the player. This is due to the full 3D environment rendered on a 2D plane. It just will not produce a professional result. I need to use an orthographic projection.

I think you're trying to do too much all at once. I can understand each sentence of your question separately; but strung all together, it's very confusing.
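For the exceptions: in the snippet shown, int[] view = new int[4]; declares a new local variable, so renderer.view is probably never assigned. Also, GL_MODELVIEW and GL_PROJECTION are matrix-mode names rather than glGet parameters (GL11.GL_MODELVIEW_MATRIX and GL11.GL_PROJECTION_MATRIX are), so model and project likely stay all zeros. You probably need to pass identity matrices instead of zero matrices to get a basic 1-to-1 projection.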
Then I'd suggest that you scale the y dimension by -1 so all the UPs and DOWNs match at least.
I hope this helps, because I'm not 100% sure what you're trying to do. Particularly, " translate (tx, ty, -4) to (x, y, 1) using the standard view frustum" doesn't make sense to me. You can translate with a translation matrix. You can clip to a view frustum, or project an object from the frustum to a plane (usually the view plane). But if all your Zs are constant, you can just discard them right? So, assuming x=tx and y=ty, then tz += 5?
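Once T has been converted into the same y-up coordinates as C (for example with an orthographic projection, as the first answer concludes, plus the y flip suggested above), the facing angle can be computed with Math.atan2 instead of the law of cosines; unlike acos, atan2 also tells you which way to turn. A minimal sketch; Facing and angleFromUp are hypothetical names.

```java
/** Hypothetical helper: signed facing angle for a top-down 2D sprite. */
final class Facing {
    /**
     * Angle in degrees from the "up" direction (0, 1) to the vector C -> T,
     * positive counter-clockwise. Assumes C and T share the same y-up coordinates.
     */
    static float angleFromUp(float cx, float cy, float tx, float ty) {
        double dx = tx - cx, dy = ty - cy;
        // atan2(cross(up, v), dot(up, v)) with up = (0, 1): cross = -dx, dot = dy.
        return (float) Math.toDegrees(Math.atan2(-dx, dy));
    }
}
```

The result could then be used as the angle of a rotation about the z axis, for example Matrix.rotateM(m, 0, angle, 0f, 0f, 1f) in android.opengl.Matrix.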

Strange Matrix transformation for SVG rotate

I have Java code for SVG drawing. It processes transforms, including rotate, and does this very well, as far as I can see from numerous test pictures compared against their rendering in Chrome. Next, I need to get the actual object location, which in many images is declared via transforms. So I decided to just read X and Y from the Matrix used for drawing. Unfortunately, I get incorrect values for the rotate transform; that is, they do not correspond to the real object location in the image.
The stripped down code looks like this:
Matrix matrix = new Matrix();
float cx = 1000; // suppose this is an object X coordinate
float cy = 300; // this is its Y coordinate
float angle = -90; // rotate counterclockwise, got from "rotate(-90, 1000, 300)"
// shift to -X,-Y, so object is in the center
matrix.postTranslate(-cx, -cy);
// rotate actually
matrix.postRotate(angle);
// shift back
matrix.postTranslate(cx, cy);
// debug goes here
float[] values = new float[9];
matrix.getValues(values);
Log.v("HELLO", values[Matrix.MTRANS_X] + " " + values[Matrix.MTRANS_Y]);
The log outputs the values 700 and 1300 respectively. I'd expect 0 and 0, because I see the object rotated in place in my image (that is, there is no movement), and the postTranslate calls should compensate for each other. Of course, I can see how these values are formed from 1000 and 300, but I don't understand why. Once again, the matrix with these strange values is used for the actual drawing of the object, and it looks correct. Could someone explain what happens here? Am I missing something? So far I have only one solution to my problem: don't try to obtain the position from rotate, and do it only for explicit matrix and translate transforms. But this approach lacks generality, and I thought the matrix should have reasonable values (including offsets) for any transformation type.

The answer is that the matrix is an operator for space transformation, and should not be used for direct extraction of object position. Instead, one should get initial object coordinates, as specified in x and y attributes of an SVG tag, and apply the matrix on them:
float[] src = new float[2];
src[0] = cx;
src[1] = cy;
matrix.mapPoints(src);
After this, src[0] and src[1] hold the proper location values.
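A short derivation of why the log shows 700 and 1300, using the rotation convention of android.graphics.Matrix (cos, -sin in the first row), with c = (1000, 300) and θ = -90°. The three post* calls build M = T(c) R(θ) T(-c):

$$M = T(c)\,R(\theta)\,T(-c) = \begin{pmatrix} R(\theta) & c - R(\theta)\,c \\ 0 & 1 \end{pmatrix}, \qquad R(-90^\circ) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
$$c - R\,c = \begin{pmatrix} 1000 \\ 300 \end{pmatrix} - \begin{pmatrix} 300 \\ -1000 \end{pmatrix} = \begin{pmatrix} 700 \\ 1300 \end{pmatrix}$$
$$M \begin{pmatrix} c \\ 1 \end{pmatrix} = R\,c + (c - R\,c) = c$$

So the translation entries are 700 and 1300 even though the pivot point c maps to itself, which is why mapPoints, not MTRANS_X/MTRANS_Y, gives the object's position.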

Detect touch on OpenGL object?

I am developing an application that uses OpenGL to render images.
Now I want to detect touch events on the OpenGL sphere objects I have drawn.
I draw 4 objects on the screen. How can I find out which object has been touched? I have used the onTouchEvent() method, but it only gives me x and y coordinates, while my objects are drawn in 3D.
Please help, since I am new to OpenGL.
Best Regards,
~Anup

At Google I/O there was a session on how OpenGL was used for Google Body on Android. Body parts were selected by rendering each of them with a solid color into a hidden buffer; the color at the touch x,y then identified the corresponding object. For performance, only a small cropped area of 20x20 pixels around the touch point was rendered that way.

Both approaches (1. hidden color buffer and 2. intersection test) have their own merits.
1. Hidden color buffer: pixel read-out is a very slow operation, and it is certainly overkill for a simple ray-sphere intersection test.
2. Ray-sphere intersection test: this is not that difficult.
Here is a simplified version of an implementation in Ogre3D.
std::pair<bool, m_real> Ray::intersects(const Sphere& sphere) const
{
    const Ray& ray = *this;
    const vector3& raydir = ray.direction();
    // Adjust ray origin relative to sphere center
    const vector3 rayorig = ray.origin() - sphere.center;
    m_real radius = sphere.radius;

    // Mmm, quadratics
    // Build coeffs which can be used with std quadratic solver
    // ie t = (-b +/- sqrt(b*b - 4ac)) / 2a
    // (operator% is used as the dot product in this vector library)
    m_real a = raydir % raydir;
    m_real b = 2 * (rayorig % raydir);
    m_real c = rayorig % rayorig - radius * radius;

    // Calc discriminant
    m_real d = (b * b) - (4 * a * c);
    if (d < 0)
    {
        // No intersection
        return std::pair<bool, m_real>(false, 0);
    }
    // BTW, if d=0 there is one intersection, if d > 0 there are 2
    // But we only want the closest one, so try the '-' version of the solver first
    m_real t = (-b - sqrt(d)) / (2 * a);
    if (t < 0)
        t = (-b + sqrt(d)) / (2 * a); // ray origin is inside the sphere
    if (t < 0)
        return std::pair<bool, m_real>(false, 0); // sphere is behind the ray
    return std::pair<bool, m_real>(true, t);
}
A ray that corresponds to the cursor position also needs to be calculated. Again, you can refer to Ogre3D's source code: search for getCameraToViewportRay. Basically, you need the view and projection matrices to calculate a ray (a 3D position and a 3D direction) from a 2D position.

In my project, the solution I chose was:
Unproject your 2D screen coordinates to a virtual 3D line going through your scene.
Detect possible intersections of that line with your scene objects.
This is quite a complex task.

I have only done this in Direct3D rather than OpenGL ES, but these are the steps:
Find your modelview and projection matrices. It seems that OpenGL ES has removed the ability to read back the matrices set by gluPerspective(), gluLookAt() etc. But you can use the static methods of android.opengl.Matrix to build these matrices yourself, then set them with glLoadMatrixf().
Call gluUnProject() twice, once with winZ=0 and once with winZ=1, passing the matrices you calculated earlier.
Each call outputs a 3D position. This pair of positions defines a ray in OpenGL "world space".
Perform a ray-sphere intersection test on each of your spheres in order (closest to the camera first; otherwise you may select a sphere that is hidden behind another). If you detect an intersection, you've touched that sphere.
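A minimal Java sketch of the last two steps, assuming near and far are the two gluUnProject outputs (winZ = 0 and winZ = 1) and each sphere is given as {cx, cy, cz, radius} in the same space; SpherePicker and pickClosest are hypothetical names. Rather than sorting the spheres by distance first, this variant tests every sphere and keeps the hit with the smallest ray parameter t, which selects the same front-most sphere.

```java
/** Hypothetical helper: pick the front-most sphere along the near -> far ray. */
final class SpherePicker {
    /** Returns the index of the first sphere hit, or -1 if none is hit. */
    static int pickClosest(float[] near, float[] far, float[][] spheres) {
        // Unnormalized direction, so t = 0 is the near point and t = 1 the far point.
        float dx = far[0] - near[0], dy = far[1] - near[1], dz = far[2] - near[2];
        float a = dx * dx + dy * dy + dz * dz;
        if (a == 0f) return -1;                 // degenerate ray
        int best = -1;
        float bestT = Float.POSITIVE_INFINITY;
        for (int i = 0; i < spheres.length; i++) {
            float[] s = spheres[i];
            float lx = near[0] - s[0], ly = near[1] - s[1], lz = near[2] - s[2];
            float b = 2 * (lx * dx + ly * dy + lz * dz);
            float c = lx * lx + ly * ly + lz * lz - s[3] * s[3];
            float disc = b * b - 4 * a * c;
            if (disc < 0) continue;             // ray misses this sphere
            float sq = (float) Math.sqrt(disc);
            float t = (-b - sq) / (2 * a);      // nearer root
            if (t < 0) t = (-b + sq) / (2 * a); // ray starts inside the sphere
            if (t >= 0 && t < bestT) {          // keep the closest hit in front of the ray start
                bestT = t;
                best = i;
            }
        }
        return best;
    }
}
```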

To find out whether a touch point is inside a circle (this works in 2D screen coordinates, so a sphere's center and radius must first be projected to screen space):
public boolean checkInsideCircle(float x, float y, float centerX, float centerY, float radius)
{
    // Compare squared distances to avoid a square root.
    float dx = x - centerX;
    float dy = y - centerY;
    return dx * dx + dy * dy < radius * radius;
}
where
1) centerX, centerY are the center point of the circle,
2) radius is the radius of the circle,
3) x, y is the touch point.

Develop Reference
