I am developing an application which uses OpenGL for rendering of the images.
Now I just want to determine the touch event on the opengl sphere object which I have drwn.
Here i draw 4 object on the screen. now how should I come to know that which object has been
touched. I have used onTouchEvent() method. But It gives me only x & y co-ordinates but my
object is drawn in 3D.
please help since I am new to OpenGL.
Best Regards,
~Anup
t Google IO there was a session on how OpenGL was used for Google Body on Android. The selecting of body parts was done by rendering each of them with a solid color into a hidden buffer, then based on the color that was on the touch x,y the corresponding object could be found. For performance purposes, only a small cropped area of 20x20 pixels around the touch point was rendered that way.
Both approach (1. hidden color buffer and 2. intersection test) has its own merit.
1. Hidden color buffer: pixel read-out is a very slow operation.
Certainly an overkill for a simple ray-sphere intersection test.
Ray-sphere intersection test: this is not that difficult.
Here is a simplified version of an implementation in Ogre3d.
std::pair<bool, m_real> Ray::intersects(const Sphere& sphere) const
{
const Ray& ray=*this;
const vector3& raydir = ray.direction();
// Adjust ray origin relative to sphere center
const vector3& rayorig = ray.origin() - sphere.center;
m_real radius = sphere.radius;
// Mmm, quadratics
// Build coeffs which can be used with std quadratic solver
// ie t = (-b +/- sqrt(b*b + 4ac)) / 2a
m_real a = raydir%raydir;
m_real b = 2 * rayorig%raydir;
m_real c = rayorig%rayorig - radius*radius;
// Calc determinant
m_real d = (b*b) - (4 * a * c);
if (d < 0)
{
// No intersection
return std::pair<bool, m_real>(false, 0);
}
else
{
// BTW, if d=0 there is one intersection, if d > 0 there are 2
// But we only want the closest one, so that's ok, just use the
// '-' version of the solver
m_real t = ( -b - sqrt(d) ) / (2 * a);
if (t < 0)
t = ( -b + sqrt(d) ) / (2 * a);
return std::pair<bool, m_real>(true, t);
}
}
Probably, a ray that corresponds to cursor position also needs to be calculated. Again you can refer to Ogre3d's source code: search for getCameraToViewportRay. Basically, you need the view and projection matrix to calculate a Ray (a 3D position and a 3D direction) from 2D position.
In my project, the solution I chose was:
Unproject your 2D screen coordinates to a virtual 3D line going through your scene.
Detect possible intersections of that line and your scene objects.
This is quite a complex tast.
I have only done this in Direct3D rather than OpenGL ES, but these are the steps:
Find your modelview and projection matrices. It seems that OpenGL ES has removed the ability to retrieve the matrices set by gluProject() etc. But you can use android.opengl.Matrix member functions to create these matrices instead, then set with glLoadMatrix().
Call gluUnproject() twice, once with winZ=0, then with winZ=1. Pass the matrices you calculated earlier.
This will output a 3d position from each call. This pair of positions define a ray in OpenGL "world space".
Perform a ray - sphere intersection test on each of your spheres in order. (Closest to camera first, otherwise you may select a sphere that is hidden behind another.) If you detect an intersection, you've touched the sphere.
for find touch point is inside circle or not..
public boolean checkInsideCircle(float x,float y, float centerX,float centerY, float Radius)
{
if(((x - centerX)*(x - centerX))+((y - centerY)*(y - centerY)) < (Radius*Radius))
return true;
else
return false;
}
where
1) centerX,centerY are center point of circle.
2) Radius is radius of circle.
3) x,y point of touch..
Related
Is there a way to check if I touched the object on the screen ? As I understand the HitResult class allows me to check if I touched the recognized and maped surface. But I want to check this I touched the object that is set on that surface.
ARCore doesn't really have a concept of an object, so we can't directly provide that. I suggest looking at ray-sphere tests for a starting point.
However, I can help with getting the ray itself (to be added to HelloArActivity):
/**
* Returns a world coordinate frame ray for a screen point. The ray is
* defined using a 6-element float array containing the head location
* followed by a normalized direction vector.
*/
float[] screenPointToWorldRay(float xPx, float yPx, Frame frame) {
float[] points = new float[12]; // {clip query, camera query, camera origin}
// Set up the clip-space coordinates of our query point
// +x is right:
points[0] = 2.0f * xPx / mSurfaceView.getMeasuredWidth() - 1.0f;
// +y is up (android UI Y is down):
points[1] = 1.0f - 2.0f * yPx / mSurfaceView.getMeasuredHeight();
points[2] = 1.0f; // +z is forwards (remember clip, not camera)
points[3] = 1.0f; // w (homogenous coordinates)
float[] matrices = new float[32]; // {proj, inverse proj}
// If you'll be calling this several times per frame factor out
// the next two lines to run when Frame.isDisplayRotationChanged().
mSession.getProjectionMatrix(matrices, 0, 1.0f, 100.0f);
Matrix.invertM(matrices, 16, matrices, 0);
// Transform clip-space point to camera-space.
Matrix.multiplyMV(points, 4, matrices, 16, points, 0);
// points[4,5,6] is now a camera-space vector. Transform to world space to get a point
// along the ray.
float[] out = new float[6];
frame.getPose().transformPoint(points, 4, out, 3);
// use points[8,9,10] as a zero vector to get the ray head position in world space.
frame.getPose().transformPoint(points, 8, out, 0);
// normalize the direction vector:
float dx = out[3] - out[0];
float dy = out[4] - out[1];
float dz = out[5] - out[2];
float scale = 1.0f / (float) Math.sqrt(dx*dx + dy*dy + dz*dz);
out[3] = dx * scale;
out[4] = dy * scale;
out[5] = dz * scale;
return out;
}
If you're calling this several times per frame see the comment about the getProjectionMatrix and invertM calls.
Apart from Mouse Picking with Ray Casting, cf. Ian's answer, the other commonly used technique is a picking buffer, explained in detail (with C++ code) here
The trick behind 3D picking is very simple. We will attach a running
index to each triangle and have the FS output the index of the
triangle that the pixel belongs to. The end result is that we get a
"color" buffer that doesn't really contain colors. Instead, for each
pixel which is covered by some primitive we get the index of this
primitive. When the mouse is clicked on the window we will read back
that index (according to the location of the mouse) and render the
select triangle red. By combining a depth buffer in the process we
guarantee that when several primitives are overlapping the same pixel
we get the index of the top-most primitive (closest to the camera).
So in a nutshell:
Every object's draw method needs an ongoing index and a boolean for whether this draw renders the pixel buffer or not.
The render method converts the index into a grayscale color and the scene is rendered
After the whole rendering is done, retrieve the pixel color at the touch position GL11.glReadPixels(x, y, /*the x and y of the pixel you want the colour of*/). Then translate the color back to an index and the index back to an object. VoilĂ , you have your clicked object.
To be fair, for a mobile usecase you should probably read a 10x10 rectangle, iterate trough it and pick the first found non-background color - because touches are never that precise.
This approach works independently of the complexity of your objects
I looking for some advices about recognition of three handwritten shapes - circles, diamonds and rectangles. I tried diffrent aproaches but they failed so maybe you could point me in another, better direction.
What I tried:
1) Simple algorithm based on dot product between points of handwritten shape and ideal shape. It works not so bad at recognition of rectangle, but failed on circles and diamonds. The problem is that dot product of the circle and diamond is quite similiar even for ideal shapes.
2) Same aproach but using Dynamic Time Warping as measure of simililarity. Similiar problems.
3) Neural networks. I tried few aproaches - giving points data to neural networks (Feedforward and Kohonen) or giving rasterized image. For Kohonen it allways classified all the data (event the sample used to train) into the same category. Feedforward with points was better (but on the same level as aproach 1 and 2) and with rasterized image it was very slow (I needs at least size^2 input neurons and for small sized of raster circle is indistinguishable even for me ;) ) and also without success. I think is because all of this shapes are closed figures? I am not big specialist of ANN (had 1 semester course of them) so maybe I am using them wrong?
4) Saving the shape as Freeman Chain Code and using some algorithms for computing similarity. I though that in FCC the shapes will be realy diffrent from each other. No success here (but I havent explorer this path very deeply).
I am building app for Android with this but I think the language is irrelevant here.
Here's some working code for a shape classifier. http://jsfiddle.net/R3ns3/ I pulled the threshold numbers (*Threshold variables in the code) out of the ether, so of course they can be tweaked for better results.
I use the bounding box, average point in a sub-section, angle between points, polar angle from bounding box center, and corner recognition. It can classify hand drawn rectangles, diamonds, and circles. The code records points while the mouse button is down and tries to classify when you stop drawing.
HTML
<canvas id="draw" width="300" height="300" style="position:absolute; top:0px; left:0p; margin:0; padding:0; width:300px; height:300px; border:2px solid blue;"></canvas>
JS
var state = {
width: 300,
height: 300,
pointRadius: 2,
cornerThreshold: 125,
circleThreshold: 145,
rectangleThreshold: 45,
diamondThreshold: 135,
canvas: document.getElementById("draw"),
ctx: document.getElementById("draw").getContext("2d"),
drawing: false,
points: [],
getCorners: function(angles, pts) {
var list = pts || this.points;
var corners = [];
for(var i=0; i<angles.length; i++) {
if(angles[i] <= this.cornerThreshold) {
corners.push(list[(i + 1) % list.length]);
}
}
return corners;
},
draw: function(color, pts) {
var list = pts||this.points;
this.ctx.fillStyle = color;
for(var i=0; i<list.length; i++) {
this.ctx.beginPath();
this.ctx.arc(list[i].x, list[i].y, this.pointRadius, 0, Math.PI * 2, false);
this.ctx.fill();
}
},
classify: function() {
// get bounding box
var left = this.width, right = 0,
top = this.height, bottom = 0;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
if(left > pt.x) left = pt.x;
if(right < pt.x) right = pt.x;
if(top > pt.y) top = pt.y;
if(bottom < pt.y) bottom = pt.y;
}
var center = {x: (left+right)/2, y: (top+bottom)/2};
this.draw("#00f", [
{x: left, y: top},
{x: right, y: top},
{x: left, y: bottom},
{x: right, y: bottom},
]);
// find average point in each sector (9 sectors)
var sects = [
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0}
];
var x3 = (right + (1/(right-left)) - left) / 3;
var y3 = (bottom + (1/(bottom-top)) - top) / 3;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
var sx = Math.floor((pt.x - left) / x3);
var sy = Math.floor((pt.y - top) / y3);
var idx = sy * 3 + sx;
sects[idx].x += pt.x;
sects[idx].y += pt.y;
sects[idx].c ++;
if(sx == 1 && sy == 1) {
return "UNKNOWN";
}
}
// get the significant points (clockwise)
var sigPts = [];
var clk = [0, 1, 2, 5, 8, 7, 6, 3]
for(var i=0; i<clk.length; i++) {
var pt = sects[clk[i]];
if(pt.c > 0) {
sigPts.push({x: pt.x / pt.c, y: pt.y / pt.c});
} else {
return "UNKNOWN";
}
}
this.draw("#0f0", sigPts);
// find angle between consecutive 3 points
var angles = [];
for(var i=0; i<sigPts.length; i++) {
var a = sigPts[i],
b = sigPts[(i + 1) % sigPts.length],
c = sigPts[(i + 2) % sigPts.length],
ab = Math.sqrt(Math.pow(b.x-a.x,2)+Math.pow(b.y-a.y,2)),
bc = Math.sqrt(Math.pow(b.x-c.x,2)+ Math.pow(b.y-c.y,2)),
ac = Math.sqrt(Math.pow(c.x-a.x,2)+ Math.pow(c.y-a.y,2)),
deg = Math.floor(Math.acos((bc*bc+ab*ab-ac*ac)/(2*bc*ab)) * 180 / Math.PI);
angles.push(deg);
}
console.log(angles);
var corners = this.getCorners(angles, sigPts);
// get polar angle of corners
for(var i=0; i<corners.length; i++) {
corners[i].t = Math.floor(Math.atan2(corners[i].y - center.y, corners[i].x - center.x) * 180 / Math.PI);
}
console.log(corners);
// whats the shape ?
if(corners.length <= 1) { // circle
return "CIRCLE";
} else if(corners.length == 2) { // circle || diamond
// difference of polar angles
var diff = Math.abs((corners[0].t - corners[1].t + 180) % 360 - 180);
console.log(diff);
if(diff <= this.circleThreshold) {
return "CIRCLE";
} else {
return "DIAMOND";
}
} else if(corners.length == 4) { // rectangle || diamond
// sum of polar angles of corners
var sum = Math.abs(corners[0].t + corners[1].t + corners[2].t + corners[3].t);
console.log(sum);
if(sum <= this.rectangleThreshold) {
return "RECTANGLE";
} else if(sum >= this.diamondThreshold) {
return "DIAMOND";
} else {
return "UNKNOWN";
}
} else {
alert("draw neater please");
return "UNKNOWN";
}
}
};
state.canvas.addEventListener("mousedown", (function(e) {
if(!this.drawing) {
this.ctx.clearRect(0, 0, 300, 300);
this.points = [];
this.drawing = true;
console.log("drawing start");
}
}).bind(state), false);
state.canvas.addEventListener("mouseup", (function(e) {
this.drawing = false;
console.log("drawing stop");
this.draw("#f00");
alert(this.classify());
}).bind(state), false);
state.canvas.addEventListener("mousemove", (function(e) {
if(this.drawing) {
var x = e.pageX, y = e.pageY;
this.points.push({"x": x, "y": y});
this.ctx.fillStyle = "#000";
this.ctx.fillRect(x-2, y-2, 4, 4);
}
}).bind(state), false);
Given the possible variation in handwritten inputs I would suggest that a neural network approach is the way to go; you will find it difficult or impossible to accurately model these classes by hand. LastCoder's attempt works to a degree, but it does not cope with much variation or have promise for high accuracy if worked on further - this kind of hand-engineered approach was abandoned a very long time ago.
State-of-the-art results in handwritten character classification these days is typically achieved with convolutional neural networks (CNNs). Given that you have only 3 classes the problem should be easier than digit or character classification, although from experience with the MNIST handwritten digit dataset, I expect that your circles, squares and diamonds may occasionally end up being difficult for even humans to distinguish.
So, if it were up to me I would use a CNN. I would input binary images taken from the drawing area to the first layer of the network. These may require some preprocessing. If the drawn shapes cover a very small area of the input space you may benefit from bulking them up (i.e. increasing line thickness) so as to make the shapes more invariant to small differences. It may also be beneficial to centre the shape in the image, although the pooling step might alleviate the need for this.
I would also point out that the more training data the better. One is often faced with a trade-off between increasing the size of one's dataset and improving one's model. Synthesising more examples (e.g. by skewing, rotating, shifting, stretching, etc) or spending a few hours drawing shapes may provide more of a benefit than you could get in the same time attempting to improve your model.
Good luck with your app!
A linear Hough transform of the square or the diamond ought to be easy to recognize. They will both produce four point masses. The square's will be in pairs at zero and 90 degrees with the same y-coordinates for both pairs; in other words, a rectangle. The diamond will be at two other angles corresponding to how skinny the diamond is, e.g. 45 and 135 or else 60 and 120.
For the circle you need a circular Hough transform, and it will produce a single bright point cluster in 3d (x,y,r) Hough space.
Both linear and circular Hough transforms are implemented in OpenCV, and it's possible to run OpenCV on Android. These implementations include thresholding to identify lines and circles. See pg. 329 and pg. 331 of the documentation here.
If you are not familiar with Hough transforms, the Wikipedia page is not bad.
Another algorithm you may find interesting and perhaps useful is given in this paper about polygon similarity. I implemented it many years ago, and it's still around here. If you can convert the figures to loops of vectors, this algorithm could compare them against patterns, and the similarity metric would show goodness of match. The algorithm ignores rotational orientation, so if your definition of square and diamond is with respect to the axes of the drawing surface, you will have to modify the algorithm a bit to differentiate these cases.
What you have here is a fairly standard clasification task, in an arguably vision domain.
You could do this several ways, but the best way isn't known, and can sometimes depend on fine details of the problem.
So, this isn't an answer, per se, but there is a website - Kaggle.com that runs competition for classifications. One of the sample/experiemental tasks they list is reading single hand written numeric digits. That is close enough to this problem, that the same methods are almost certainly going to apply fairly well.
I suggest you go to https://www.kaggle.com/c/digit-recognizer and look around.
But if that is too vague, I can tell you from my reading of it, and playing with that problem space, that Random Forests are a better basic starting place than Neural networks.
In this case (your 3 simple objects) you could try RanSaC-fitting for ellipse (getting the circle) and lines (getting the sides of the rectangle or diamond) - on each connected object if there are several objects to classify at the same time. Based on the actual setting (expected size, etc.) the RanSaC-parameters (how close must a point be to count as voter, how many voters you need at minimun) must be tuned. When you have found a line with RanSaC-fitting, remove the points "close" to it and go for the next line. The angles of the lines should make a distinction between diamand and rectangle easy.
A very simple approach optimized for classifying exactly these 3 objects could be the following:
compute the center of gravity of an object to classify
then compute the distances of the center to the object points as a function on the angle (from 0 to 2 pi).
classify the resulting graph based on the smoothness and/or variance and the position and height of the local maxima and minima (maybe after smoothing the graph).
I propose a way to do it in following steps : -
Take convex hull of the image (consider the shapes being convex)
divide into segments using clustering algorithms
Try to fit a curves or straight line to it and measure & threshold using training set which can be used for classifications
For your application try to divide into 4 clusters .
once you classify clusters as line or curves you can use the info to derive whether curve is circle,rectangle or diamond
I think the answers that are already in place are good, but perhaps a better way of thinking about it is that you should try to break the problem into meaningful pieces.
If possible avoid the problem entirely. For instance if you are recognizing gestures, just analyze the gestures in real time. With gestures you can provide feedback to the user as to how your program interpreted their gesture and the user will change what they are doing appropriately.
Clean up the image in question. Before you do anything come up with an algorithm to try to select what the correct thing is you are trying to analyze. Also use an appropriate filter (convolution perhaps) to remove image artifacts before you begin the process.
Once you have figured out what the thing is you are going to analyze then analyze it and return a score, one for circle, one for noise, one for line, and the last for pyramid.
Repeat this step with the next viable candidate until you come up with the best candidate that is not noise.
I suspect you will find that you don't need a complicated algorithm to find circle, line, pyramid but that it is more so about structuring your code appropriately.
If I was you I'll use already available Image Processing libraries like "AForge".
Take A look at this sample article:
http://www.aforgenet.com/articles/shape_checker
I have a jar on github that can help if you are willing to unpack it and obey the apache license. You can try to recreate it in any other language as well.
Its an edge detector. The best step from there could be to:
find the corners (median of 90 degrees)
find mean median and maximum radius
find skew/angle from horizontal
have a decision agent decide what the shape is
Play around with it and find what you want.
My jar is open to the public at this address. It is not yet production ready but can help.
Just thought I could help. If anyone wants to be a part of the project, please do.
I did this recently with identifying circles (bone centers) in medical images.
Note: Steps 1-2 are if you are grabbing from an image.
Psuedo Code Steps
Step 1. Highlight the Edges
edges = edge_map(of the source image) (using edge detector(s))
(laymens: show the lines/edges--make them searchable)
Step 2. Trace each unique edge
I would (use a nearest neighbor search 9x9 or 25x25) to identify / follow / trace each edge, collecting each point into the list (they become neighbors), and taking note of the gradient at each point.
This step produces: a set of edges.
(where one edge/curve/line = list of [point_gradient_data_structure]s
(laymens: Collect a set of points along the edge in the image)
Step 3. Analyze Each Edge('s points and gradient data)
For each edge,
if the gradient similar for a given region/set of neighbors (a run of points along an edge), then we have a straight line.
If the gradient is changing gradually, we have a curve.
Each region/run of points that is a straight line or a curve, has a mean (center) and other gradient statistics.
Step 4. Detect Objects
We can use the summary information from Step 3 to build conclusions about diamonds, circles, or squares. (i.e. 4 straight lines, that have end points near each other with proper gradients is a diamond or square. One (or more) curves with sufficient points/gradients (with a common focal point) makes a complete circle).
Note: Using an image pyramid can improve algorithm performance, both in terms of results and speed.
This technique (Steps 1-4) would get the job done for well defined shapes, and also could detect shapes that are drawn less than perfectly, and could handle slightly disconnected lines (if needed).
Note: With some machine learning techniques (mentioned by other posters), it could be helpful/important to have good "classifiers" to basically break the problem down into smaller parts/components, so then a decider further down the chain could use to better understand/"see" the objects. I think machine learning might be a little heavy-handed for this question, but still could produce reasonable results. PCA(face detection) could potentially work too.
I have a renderer using directx and openGL, and a 3d scene. The viewport and the window are of the same dimensions.
How do I implement picking given mouse coordinates x and y in a platform independent way?
If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersect it with your models.
If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
EDIT: If the question is how to construct the ray from the mouse coordinates you need the following: a projection matrix P and the camera transform C. If the coordinates of the mouse pointer is (x, y) and the size of the viewport is (width, height) one position in clip space along the ray is:
mouse_clip = [
float(x) * 2 / float(width) - 1,
1 - float(y) * 2 / float(height),
0,
1]
(Notice that I flipped the y-axis since often the origin of the mouse coordinates are in the upper left corner)
The following is also true:
mouse_clip = P * C * mouse_worldspace
Which gives:
mouse_worldspace = inverse(C) * inverse(P) * mouse_clip
We now have:
p = C.position(); //origin of camera in worldspace
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace
Here's the viewing frustum:
First you need to determine where on the nearplane the mouse click happened:
rescale the window coordinates (0..640,0..480) to [-1,1], with (-1,-1) at the bottom-left corner and (1,1) at the top-right.
'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix.
Then determine where the camera is in worldspace, and draw a ray starting at the camera and passing through the point you found on the nearplane.
The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.
Final pseudocode:
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen
unviewMat = (projectionMat * modelViewMat).inverse()
near_point = unviewMat * Vec(normalised_x, normalised_y, 0, 1)
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = near_point - camera_pos
Well, pretty simple, the theory behind this is always the same
1) Unproject two times your 2D coordinate onto the 3D space. (each API has its own function, but you can implement your own if you want). One at Min Z, one at Max Z.
2) With these two values calculate the vector that goes from Min Z and point to Max Z.
3) With the vector and a point calculate the ray that goes from Min Z to MaxZ
4) Now you have a ray, with this you can do a ray-triangle/ray-plane/ray-something intersection and get your result...
I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnproject call.
Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:
// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
// obtain the Z position (not world coordinates but in range 0 - 1)
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);
// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);
if you don't want to use glu you can also implement the gluUnProject you could also implement it yourself, it's functionality is relatively simple and is described at opengl.org
Ok, this topic is old but it was the best I found on the topic, and it helped me a bit, so I'll post here for those who are are following ;-)
This is the way I got it to work without having to compute the inverse of Projection matrix:
void Application::leftButtonPress(u32 x, u32 y){
GL::Viewport vp = GL::getViewport(); // just a call to glGet GL_VIEWPORT
vec3f p = vec3f::from(
((float)(vp.width - x) / (float)vp.width),
((float)y / (float)vp.height),
1.);
// alternatively vec3f p = vec3f::from(
// ((float)x / (float)vp.width),
// ((float)(vp.height - y) / (float)vp.height),
// 1.);
p *= vec3f::from(APP_FRUSTUM_WIDTH, APP_FRUSTUM_HEIGHT, 1.);
p += vec3f::from(APP_FRUSTUM_LEFT, APP_FRUSTUM_BOTTOM, 0.);
// now p elements are in (-1, 1)
vec3f near = p * vec3f::from(APP_FRUSTUM_NEAR);
vec3f far = p * vec3f::from(APP_FRUSTUM_FAR);
// ray in world coordinates
Ray ray = { _camera->getPos(), -(_camera->getBasis() * (far - near).normalize()) };
_ray->set(ray.origin, ray.dir, 10000.); // this is a debugging vertex array to see the Ray on screen
Node* node = _scene->collide(ray, Transform());
cout << "node is : " << node << endl;
}
This assumes a perspective projection, but the question never arises for the orthographic one in the first place.
I've got the same situation with ordinary ray picking, but something is wrong. I've performed the unproject operation the proper way, but it just doesn't work. I think, I've made some mistake, but can't figure out where. My matix multiplication , inverse and vector by matix multiplications all seen to work fine, I've tested them.
In my code I'm reacting on WM_LBUTTONDOWN. So lParam returns [Y][X] coordinates as 2 words in a dword. I extract them, then convert to normalized space, I've checked this part also works fine. When I click the lower left corner - I'm getting close values to -1 -1 and good values for all 3 other corners. I'm then using linepoins.vtx array for debug and It's not even close to reality.
unsigned int x_coord=lParam&0x0000ffff; //X RAW COORD
unsigned int y_coord=client_area.bottom-(lParam>>16); //Y RAW COORD
double xn=((double)x_coord/client_area.right)*2-1; //X [-1 +1]
double yn=1-((double)y_coord/client_area.bottom)*2;//Y [-1 +1]
_declspec(align(16))gl_vec4 pt_eye(xn,yn,0.0,1.0);
gl_mat4 view_matrix_inversed;
gl_mat4 projection_matrix_inversed;
cam.matrixProjection.inverse(&projection_matrix_inversed);
cam.matrixView.inverse(&view_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&projection_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&view_matrix_inversed);
line_points.vtx[line_points.count*4]=pt_eye.x-cam.pos.x;
line_points.vtx[line_points.count*4+1]=pt_eye.y-cam.pos.y;
line_points.vtx[line_points.count*4+2]=pt_eye.z-cam.pos.z;
line_points.vtx[line_points.count*4+3]=1.0;
I am trying to create a 2D game. Because I am using OpenGL ES I have to plot everything in 3D, but I just fix the z coordinate, which is fine. Now what I want to do is calculate the angle between two vectors (C = player center, P = point just above player, T = touch point) CP and CT so that I can make the player face that direction. I know how to get the angle between 2 vectors, but my problem is getting all the points to exist on the same plane (by translating the T).
I know that T exists on a plane where (0,0) is upper left and UP is actually DOWN (visually). I also know that C and P's UP is actually UP and that any their X and Y is on a completely 3 dimensional different plane to T. I need to get either C and P onto T's plane (which I have tried below) or get T onto C and P's plane. Can anyone help me? I am using the standard OpenGL projection model and I am 0,0,-4 zoomed out of the frustrum (I am looking directly at (0,0,0)). My 2D objects all sit on the plane (0,0,1);
private float getRotation(float touch_x, float touch_y)
{
//center_x = this.getWidth() / 2;
//center_y = this.getHeight() / 2;
float cx, cy, tx, ty, ux, uy;
cx = (player.x * _renderer.centerx);
cy = (player.y * -_renderer.centery);
ux = cx;
uy = cy+1.0f;
tx = (touch_x - _renderer.centerx);
ty = (touch_y - _renderer.centery);
Log.d(TAG, "center x: "+cx+"y:"+cy);
Log.d(TAG, "up x: "+ux+"y:"+uy);
Log.d(TAG, "touched x: "+tx+"y:"+ty);
float P12 = length(cx,cy,tx,ty);
float P13 = length(cx,cy,ux,uy);
float P23 = length(tx,ty,ux,uy);
return (float)Math.toDegrees(Math.acos((P12*P12 + P13*P13 - P23*P23)/2.0 * P12 * P13));
}
Basically I want to know if there is a way I can translate (tx, ty, -4) to (x, y, 1) using the standard view frustum.
I have tried some other things now. In my touch event I am trying to do this:
float[] coords = new float[4];
GLU.gluUnProject(touch_x, touch_y, -4.0f, renderer.model, 0, renderer.project, 0, renderer.view, 0, coords, 0);
Which is throwing an exception I am setting up the model, projection and view in the OnSurfaceChanged of the Renderer object:
GL11 gl11 = (GL11)gl;
model = new float[16];
project = new float[16];
int[] view = new int[4];
gl11.glGetFloatv(GL10.GL_MODELVIEW, model, 0);
gl11.glGetFloatv(GL10.GL_PROJECTION, project, 0);
gl11.glGetIntegerv(GL11.GL_VIEWPORT, view, 0);
I have several textbooks on openGL and after dusting one off I found that the term for what I want to do is called picking. Once I knew what I was asking, I found a lot of good web sites and references:
http://www.lighthouse3d.com/opengl/picking/
OpenGL ES (iPhone) Touch Picking
Coordinate Picking with OpenGL ES 2.0
Android OpenGL 3D picking
converting 2D mouse coordinates to 3D space in OpenGL ES
Coordinate Picking with OpenGL ES 2.0
Ray-picking in OpenGL ES 2.0
Android: GLES20: Called unimplemented OpenGL ES API
...
The list is almost innumerable. There are 700 ways to do this, and none of them worked for me. Ultimately I have decided to go back to basics and do a thorough OpenGL|ES learning stint, to which effect I have bought the book here: http://www.amazon.com/Graphics-Programming-Android-Programmer-ebook/dp/B0070D83W2/ref=sr_1_2?s=digital-text&ie=UTF8&qid=1362250733&sr=1-2&keywords=opengl+es+2.0+android
One thing I have already learnt is that I was most definitely using the wrong type of projection. I should not use full 3D for a 2D game. In order to do picking in a full 3D environment I would have to cast a ray from the screen point onto the surface of the 3D plane where the game was taking place. In addition to being a horrendous waste of resources (raycasting per click), there were other tell-tales. I would render my player with a circle encompassing her, and as I moved her, the circle would go off center of the player. This is due to the full 3D environment rendered on a 2D plane. It just will not produce a professional result. I need to use an orthographic projection.
I think you're trying to do too much all at once. I can understand each sentence of your question separately; but strung all together, it's very confusing.
For the exceptions, you probably need to pass identity matrices instead of zero matrices to get a basic 1-to-1 projection.
Then I'd suggest that you scale the y dimension by -1 so all the UPs and DOWNs match at least.
I hope this helps, because I'm not 100% sure what you're trying to do. Particularly, " translate (tx, ty, -4) to (x, y, 1) using the standard view frustum" doesn't make sense to me. You can translate with a translation matrix. You can clip to a view frustum, or project an object from the frustum to a plane (usually the view plane). But if all your Zs are constant, you can just discard them right? So, assuming x=tx and y=ty, then tz += 5?
I am porting some code to Android from Visual C++. The VC++ ArcTo function takes the bounding rectangle and the start and end points as parameters to define the arc. The android.graphics.Path function arcTo takes the bounding rectangle and the "start angle" and "sweep angle" as parameters.
I am not clear how to convert from the VC set of coordinates to the Android set, or what these two angles are. The arc also has direction (CW or ACW) - I am not clear how to incorporate these in a single Path, or how to switch between one and the other.
One oddity I came across is that in the Android function, angles are expressed in degrees, rather than radians which is what most calculations would use and what one would expect.
I hope my question makes some sort of sense and that someone can help!
Edit: following on from the help I got from Dr Dredel, and with much drawing of diagrams, here's how I eventually translated the VC++ call to Android:
else if (coord.isArc())
{
ptCentre = getPoint(new Coord(coord.getArcLat(), coord.getArcLong()));
nRadius = getPixels(coord.getArcRadius());
rect = new RectF(ptCentre.x - nRadius, ptCentre.y - nRadius,
ptCentre.x + nRadius, ptCentre.y + nRadius);
if (coord.isClockwise())
{
alpha = Math.atan2(ptCentre.y - ptStart.y, ptCentre.x - ptStart.x) *
Constants.k_d180Pi;
beta = Math.atan2(ptCentre.y - ptEnd.y, ptEnd.x - ptCentre.x) *
Constants.k_d180Pi;
path.arcTo(rect, (float)(alpha + 180), (float)(180 - beta - alpha));
}
else
{
}
As you can see, I haven't done the anti-clockwise arc yet, but it should be similar. My calculation wasn't perfect, as I originally had (360 - beta - alpha) instead of (180 - beta - alpha), and the original version gave some very funny results!
(Wow! this formatting mechanism is the other side of weird!)