I've been getting my butt kicked trying to get a 3D model (GLB format) placed properly on a vertical surface.
Just to be clear, I am not referring to the difficulty of identifying vertical surfaces; that is a whole other problem in itself.
I've removed the common setup boilerplate to keep this post short.
I am using a fragment that extends ArFragment.
class SceneFormARFragment: ArFragment() {
Then of course I have supplied the config with a few tweaks.
override fun getSessionConfiguration(session: Session?): Config {
val config = super.getSessionConfiguration(session)
// By default we are not tracking and tracking is driven by startTracking()
config.planeFindingMode = Config.PlaneFindingMode.DISABLED
config.focusMode = Config.FocusMode.AUTO
return config
}
And to start and stop my AR experience I wrote a couple of methods inside the fragment as follows.
private fun startTracking() = viewScope.launchWhenResumed {
try {
arSceneView.session?.apply {
val changedConfig = config
changedConfig.planeFindingMode = Config.PlaneFindingMode.HORIZONTAL_AND_VERTICAL
configure(changedConfig)
}
logv("startTracking")
planeDiscoveryController.show()
arSceneView.planeRenderer.isVisible = true
arSceneView.cameraStreamRenderPriority = 7
} catch (ex: Exception) {
loge("error starting ar session: ${ex.message}")
}
}
private fun stopTracking() = viewScope.launchWhenResumed {
try {
arSceneView.session?.apply {
val changedConfig = config
changedConfig.planeFindingMode = Config.PlaneFindingMode.DISABLED
configure(changedConfig)
}
logv("stopTracking")
planeDiscoveryController.hide()
arSceneView.planeRenderer.isVisible = false
arSceneView.cameraStreamRenderPriority = 0
} catch (ex: Exception) {
loge("error stopping ar session: ${ex.message}")
}
}
In case you are wondering the reason for "starting and stopping" the AR experience is to maximize the GPU cycles for other UX interactions that are heavy on this overlaid screen, so we wait to start or stop based on current live data state of other things that are happening.
Ok moving on.
Let's review the HitResult handling:
In this method I do a few things:
Load two variations of TV 3d models from the cloud (wall mount and stand mount)
I remove any active models if they have tapped a new area
Create an anchor node from the hitresult and assign it a name to remove it later
Add a TVTransformableNode to it and assign it a name to retrieve and manipulate it later
Determine the look direction of the horizontal stand-mount TV model and set the worldRotation of the anchorNode to the new lookRotation. (NOTE: I feel like the rotation should be applied to the TV node, but it only seems to work when I apply it to the AnchorNode, for whatever reason.) This camera position math also seems to help the vertical wall-mount TV face outwards and anchor correctly. (I have reviewed the GLB models, and I know they are properly anchored from the back on the wall model and from the bottom on the floor model.)
I then limit the plane movement of the node to its own respective plane type so that a floor model doesn't slide up to a wall and a wall model doesn't slide down to the floor.
That's about it. The horizontal placement works great, but the vertical placement is always randomized.
OnTapArPlane Code below:
private fun onARSurfaceTapped() {
setOnTapArPlaneListener { hitResult, plane, _ ->
var isHorizontal = false
val renderable = when (plane.type) {
Plane.Type.HORIZONTAL_UPWARD_FACING -> {
isHorizontal = true
standmountTVRenderable
}
Plane.Type.VERTICAL -> wallmountTVRenderable
else -> {
activity?.toast("Do you want it to fall on your head really?")
return@setOnTapArPlaneListener
}
}
lastSelectedPlaneOrientation = plane.type
removeActive3DTVModel()
val anchorNode = AnchorNode(hitResult.createAnchor())
anchorNode.name = TV_ANCHOR_NAME
anchorNode.setParent(arSceneView.scene)
val tvNode = TransformableNode(this.transformationSystem)
tvNode.scaleController.isEnabled = false
tvNode.setParent(anchorNode)
tvNode.name = TV_NODE_NAME
tvNode.select()
// Set orientation towards camera
// Ref: https://github.com/google-ar/sceneform-android-sdk/issues/379
val cameraPosition = arSceneView.scene.camera.worldPosition
val tvPosition = anchorNode.worldPosition
val direction = Vector3.subtract(cameraPosition, tvPosition)
if(isHorizontal) {
tvNode.translationController.allowedPlaneTypes.clear()
tvNode.translationController.allowedPlaneTypes.add(Plane.Type.HORIZONTAL_UPWARD_FACING)
} else {
tvNode.translationController.allowedPlaneTypes.clear()
tvNode.translationController.allowedPlaneTypes.add(Plane.Type.VERTICAL)
}
val lookRotation = Quaternion.lookRotation(direction, Vector3.up())
anchorNode.worldRotation = lookRotation
tvNode.renderable = renderable
addVideoTo3DModel(renderable)
}
}
Ignore the addVideoTo3DModel call, as that works fine; I commented it out just to ensure it doesn't play a role.
Things I've tried.
Extracting translation without rotation, as described here. Interestingly enough, it does cause the TV to appear level with the floor each time, but then the TV is always mounted as if the anchor is at the base instead of the center back. So it's bad.
I've tried reviewing various posts and translating Unity or ARCore stuff directly into Sceneform, but failed to get anything to affect the outcome. example
I've tried creating the anchor from the plane and the pose as indicated in this answer with no luck
I've reviewed this link but never found anything useful
I've tracked this issue and tried solutions recommended by people in the thread, but no luck
The last thing I tried is a bit embarrassing, lol. I opened all 256 questions tagged "SceneForm" on Stack Overflow and reviewed EVERY SINGLE one of them for anything that would help.
So I've exhausted the internet. All I have left is to ask the community and, of course, to ask the Sceneform team at Android for help, which I'm also going to do.
My best guess is that I need to use Quaternion.axisAngle(Vector3, Float), but nothing I have guessed at or tried by trial and error has worked. I assume I need to set the localRotation using the worldPosition x, y, z values of the phone, maybe to help identify gravity. I really just don't know anymore, lol.
I know Sceneform is pretty new and the documentation is HORRIBLE and may as well not exist with the lack of content or doc headers on it. The developers must really not want people to use it yet I'm guessing :(.
Last thing I'll say, is everything is working perfectly in my current implementation with the exception of the rotated vertical placement. Just to avoid rabbit trails on this discussion, I'm not having any other issues.
Oh and one last clue that I've noticed.
The TV almost seems to pivot around the center of the vertical plane, based on where I tap, the bottom almost seems to point towards the arbitrary center of the plane, if that helps anyone figure it out.
Oh and yes, I know my textures are missing from the GLBs, I packaged them incorrectly and intend to fix it later.
Screenshots attached.
Well, I finally got it. It took a while and some serious trial and error, rotating every node, axis, and angle, before I finally got it to place nicely. So I'll share my results in case anyone else needs this as well.
The end result looked like this:
Of course, it depends somewhat on how you hold the phone and on its understanding of the surroundings, but it's always pretty darn close to level now, without fail, in both the landscape and portrait testing that I have done.
So here's what I've learned.
Setting the worldRotation on the anchorNode will help keep the 3DModel facing towards the cameraview using a little subtraction.
val cameraPosition = arSceneView.scene.camera.worldPosition
val tvPosition = anchorNode.worldPosition
val direction = Vector3.subtract(cameraPosition, tvPosition)
val lookRotation = Quaternion.lookRotation(direction, Vector3.up())
anchorNode.worldRotation = lookRotation
However, this did not fix the orientation issue on the vertical placement. I found that if I applied an X rotation of 90 degrees in addition to the look rotation, it worked every time. It may differ based on your 3D model; my anchor is center middle back, so I'm not sure how it determined which way was up. However, I noticed that whenever I set a worldRotation on the tvNode, it would place the TV level, but leaning forward 90 degrees. So after playing with the various rotations, I finally got the answer.
val tvRotation = Quaternion.axisAngle(Vector3(1f, 0f, 0f), 90f)
tvNode.worldRotation = tvRotation
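To see what that 90-degree rotation about X does on its own, here is a small sketch in plain JavaScript. It is not the Sceneform API: the `axisAngle`, `multiply` and `rotate` helpers are hand-written stand-ins for illustration, using standard quaternion math with the angle given in degrees.

```javascript
// Minimal quaternion helpers (illustrative only, not Sceneform classes).
// A quaternion is stored as {x, y, z, w}.

// Build a rotation of `degrees` around `axis` (the axis is normalised here).
function axisAngle(axis, degrees) {
  const len = Math.hypot(axis.x, axis.y, axis.z);
  const half = (degrees * Math.PI / 180) / 2;
  const s = Math.sin(half) / len;
  return { x: axis.x * s, y: axis.y * s, z: axis.z * s, w: Math.cos(half) };
}

// Hamilton product a * b.
function multiply(a, b) {
  return {
    w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
    x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
  };
}

// Rotate vector v by unit quaternion q: v' = q * v * conjugate(q).
function rotate(q, v) {
  const p = { x: v.x, y: v.y, z: v.z, w: 0 };
  const conj = { x: -q.x, y: -q.y, z: -q.z, w: q.w };
  const r = multiply(multiply(q, p), conj);
  return { x: r.x, y: r.y, z: r.z };
}

const tvRotation = axisAngle({ x: 1, y: 0, z: 0 }, 90);
const up = rotate(tvRotation, { x: 0, y: 1, z: 0 });      // approximately (0, 0, 1)
const forward = rotate(tvRotation, { x: 0, y: 0, z: 1 }); // approximately (0, -1, 0)
```

The model's local +Y axis ends up along +Z, and its +Z axis ends up along -Y, which is the kind of quarter-turn correction for a model that is "leaning forward 90 degrees". How it combines with the anchor's look rotation still depends on the model's pivot and on the order in which the node rotations are set.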
That fixed my problem. The end result of the surface tap handling and placement was this:
setOnTapArPlaneListener { hitResult, plane, _ ->
var isHorizontal = false
val renderable = when (plane.type) {
Plane.Type.HORIZONTAL_UPWARD_FACING -> {
isHorizontal = true
standmountTVRenderable
}
Plane.Type.VERTICAL -> wallmountTVRenderable
else -> {
activity?.toast("Do you want it to fall on your head really?")
return@setOnTapArPlaneListener
}
}
lastSelectedPlaneOrientation = plane.type
removeActive3DTVModel()
val anchorNode = AnchorNode(hitResult.createAnchor())
anchorNode.name = TV_ANCHOR_NAME
anchorNode.setParent(arSceneView.scene)
val tvNode = TransformableNode(this.transformationSystem)
tvNode.scaleController.isEnabled = false //disable scaling
tvNode.setParent(anchorNode)
tvNode.name = TV_NODE_NAME
tvNode.select()
val cameraPosition = arSceneView.scene.camera.worldPosition
val tvPosition = anchorNode.worldPosition
val direction = Vector3.subtract(cameraPosition, tvPosition)
//restrict moving node to active surface orientation
if (isHorizontal) {
tvNode.translationController.allowedPlaneTypes.clear()
tvNode.translationController.allowedPlaneTypes.add(Plane.Type.HORIZONTAL_UPWARD_FACING)
} else {
tvNode.translationController.allowedPlaneTypes.clear()
tvNode.translationController.allowedPlaneTypes.add(Plane.Type.VERTICAL)
//x 90 degree rotation to flat mount TV vertical with gravity
val tvRotation = Quaternion.axisAngle(Vector3(1f, 0f, 0f), 90f)
tvNode.worldRotation = tvRotation
}
//set anchor nodes world rotation to face the camera view and up
val lookRotation = Quaternion.lookRotation(direction, Vector3.up())
anchorNode.worldRotation = lookRotation
tvNode.renderable = renderable
viewModel.updateStateTo(AriaMainViewModel.ARFlowState.REPOSITIONING)
}
This has been tested pretty thoroughly without issues so far in portrait and landscape. I still have other issues with Sceneform, such as the dots only showing up about half the time even when there is a valid surface, and of course vertical detection on a single-color wall is not possible with the current SDK without a picture on the wall or something else to distinguish it.
Also, taking screenshots doesn't work well, as they don't include the 3D model. That required custom PixelCopy work, and my screenshots are a bit slow, but at least they work, no thanks to the SDK.
So they have a long way to go, and it's frustrating to blaze the trail with their product given the lack of documentation and the lack of responsiveness from customer service and on GitHub issues. But hey, at least I got it, and I hope this helps someone else.
Happy Coding!
I'm developing an app using ARCore. In this app I need:
1) to place an object always staying at the same pose in world space. Following the "Working with Anchors" article recommendations (https://developers.google.com/ar/develop/developer-guides/anchors) I'm attaching an anchor to the ARCore Session. That is, I'm not using Trackables at all.
2) as a secondary requisite the object must be placed automatically, that is, without tapping on the screen.
I've managed to solve the two requisites, having now the object "floating" in front of me, this way (very common code):
private void onSceneUpdate(FrameTime frameTime) {
...
if (_renderable!=null && _anchorNode==null) {
float[] position = {0f,0f,-10f};
float[] rotation = {0,0,0,1};
//
Anchor anchor=_arFragment.getArSceneView().getSession().createAnchor(new Pose(position,rotation));
//
_anchorNode = new AnchorNode(anchor);
_anchorNode.setRenderable(_renderable);
_anchorNode.setParent(_arFragment.getArSceneView().getScene());
_anchorNode.setLocalScale(new Vector3(0.01f,0.01f,0.01f)); //cm -> m
...
}
As I want the object to be on the floor, I need to find out the height of my physical (device) camera above the floor, in order to subtract that number from the current object's Y coordinate:
float[] position = {0f,HERE_THE_VALUE_TO_SUBTRACT_FROM_CAMERA_HEIGHT,-10f};
Certainly, it's an easy implementation when plane Trackables are used but here I have the requisites above-named.
I tried different camera/device pose retrieval APIs, namely frame.getAndroidSensorPose(), frame.getCamera().getPose() and frame.getCamera().getDisplayOrientedPose(), but none of them return valid values.
Thanks for your advice.
P.S.:Certainly, it's an easy implementation when plane Trackables are used but here I have other requisites, as above-named.
EDIT after Michael Dougan comments.
Well I think we have then two ways to achieve the requisites:
1) leave the code without changes and keep using the Session anchor, asking the user to launch the app and then follow a "calibration process" with the device on the floor. As this is a professional-use app, and not a consumer one, we think this is perfectly suitable;
2) go ahead with the good old Trackables, using the usual floor plane as an anchor and including the pose of that anchor in the calculation of the position of the 3D model.
I'm having a hard time panning the view of a gameObject in Unity3D. I'm new to scripting, and I'm trying to develop an AR (Augmented Reality) application for Android.
I need to have a gameObject (e.g. a model of a floor) rendered not in the normal top-down view but in a "pseudo" isometric view, inclined at 45 degrees. As the gameObject is inclined, I need a panning function on its view, using four (4) buttons (for left, right, forward (or up) and backward (or down)).
The problem is that I cannot use any of the known panning script snippets around the forum, as the AR camera has to be static in the scene.
I should mention that I need the panning function to be active only in the isometric view (which I already compute with another script), not in the top-down view. So there should be no problem with the inclination of the axes of the gameObject, right?
Below are two mockup images of the states in which the gameObject (model floor) is rendered, and the script code (from the Unity reference) that I'm currently using, which doesn't really work for my needs.
Here is the code snippet for the left movement of the gameObject. I use the same code with changed -/+ speed values for the other movements, but I can only get it to move up and down, not forward and backward:
#pragma strict
// The target gameObject.
var target: Transform;
// Speed in units per sec.
var speedLeft: float = -10;
private static var isPanLeft = false;
function FixedUpdate()
{
if(isPanLeft == true)
{
// The step size is equal to speed times frame time.
var step = speedLeft * Time.deltaTime;
// Move model position a step closer to the target.
transform.position = Vector3.MoveTowards(transform.position, target.position, step);
}
}
static function doPanLeft()
{
isPanLeft = !isPanLeft;
}
It would be great if someone would be kind enough to take a look at this post and suggest the easiest way to code this functionality, as I'm a newbie.
Furthermore, sample code or a tutorial would be appreciated, as I could learn a lot from it. Thank you all in advance for your time and answers.
If I understand correctly, you have a camera with a fixed rotation and position, and you have an object you want to move up/down/left/right from the camera's perspective.
To rotate an object to a set of angles, you simply do
transform.rotation = Quaternion.Euler(45, 45, 45);
Then, to move it, you use the camera's up/right/forward vectors in world space. For example, to move it up and left:
transform.position += camera.transform.up;
transform.position -= camera.transform.right;
If you only have one camera in your scene, you can access its transform via Camera.main.transform.
An example of how to move it when someone presses the left arrow:
if(Input.GetKeyDown(KeyCode.LeftArrow))
{
transform.position -= camera.transform.right;
}
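The four pan buttons from the question map onto this idea directly: each button adds or subtracts a step along one of the camera's axes. Below is a minimal sketch of that mapping in plain JavaScript, using {x, y, z} objects rather than Unity types. The `panObject` helper, the direction names and the sample camera basis are assumptions made up for illustration; in Unity the basis vectors would come from `Camera.main.transform.right` and `Camera.main.transform.up`.

```javascript
// Camera-relative panning with plain {x, y, z} vectors.
// `basis.right` and `basis.up` stand in for the camera's world-space axes.
function panObject(position, basis, direction, step) {
  const moves = {
    left:  { axis: basis.right, sign: -1 },
    right: { axis: basis.right, sign: 1 },
    up:    { axis: basis.up,    sign: 1 },
    down:  { axis: basis.up,    sign: -1 },
  };
  const move = moves[direction];
  if (!move) throw new Error("unknown direction: " + direction);
  return {
    x: position.x + move.axis.x * move.sign * step,
    y: position.y + move.axis.y * move.sign * step,
    z: position.z + move.axis.z * move.sign * step,
  };
}

// Example basis: a camera pitched 45 degrees downwards while looking along +Z.
const c = Math.SQRT1_2;
const cameraBasis = {
  right: { x: 1, y: 0, z: 0 },
  up:    { x: 0, y: c, z: c },
};

const start = { x: 0, y: 0, z: 0 };
const movedLeft = panObject(start, cameraBasis, "left", 2); // 2 units along -right
const movedUp = panObject(start, cameraBasis, "up", 1);     // 1 unit along the camera's up
```

Because the step is taken along the camera's axes rather than the object's own (inclined) axes, "up" on the button always looks like "up" on screen, regardless of how the model is rotated.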
I am attempting to translate an object depending on the touch position of the user.
The problem with it is, when I test it out, the object disappears as soon as I drag my finger on my phone screen. I am not entirely sure what's going on with it?
If somebody can guide me that would be great :)
Thanks
This is the Code:
#pragma strict
function Update () {
for (var touch : Touch in Input.touches)
{
if (touch.phase == TouchPhase.Moved) {
transform.Translate(0, touch.position.y, 0);
}
}
}
The problem is that you're moving the object by touch.position.y. This isn't a point in the world, it's a point on the touch screen. What you probably want is Camera.main.ScreenToWorldPoint(touch.position).y, which gives you the world position of wherever you've touched.
Of course, Translate takes a vector indicating distance, not final destination, so simply sticking the above in it still won't work as you're intending.
Instead maybe try this:
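// ScreenToWorldPoint needs a z depth (world units from the camera); reuse the object's current depth.
float depth = Camera.main.WorldToScreenPoint(transform.position).z;
Vector3 EndPos = Camera.main.ScreenToWorldPoint(new Vector3(touch.position.x, touch.position.y, depth));
float speed = 1f;
transform.position = Vector3.Lerp(transform.position, EndPos, speed * Time.deltaTime);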
which should move the object towards your finger while at the same time keeping its movements smooth looking.
You'll want to ask this question at Unity's dedicated Questions/Answers site: http://answers.unity3d.com/index.html
Very few people come to Stack Overflow for Unity-specific questions, unless they relate to Android/iOS-specific features.
As for the cause of your problem, touch.position.y is defined in screen space (pixels), whereas transform.Translate expects world units (meters). You can convert between the two using the Camera.ScreenToWorldPoint() method, and then create a vector from the camera position to that world point. With this vector you can then either intersect some geometry in the scene or simply use it as a point in front of the camera.
http://docs.unity3d.com/Documentation/ScriptReference/Camera.ScreenToWorldPoint.html
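Both answers come down to the same unit conversion: a touch position is in screen pixels, while Translate and transform.position work in world units. The sketch below shows that conversion for an orthographic camera in plain JavaScript. The `screenToWorld` function and the shape of the `display` and `camera` objects are assumptions for illustration only; in Unity, `Camera.ScreenToWorldPoint` does this for you and also covers perspective cameras.

```javascript
// Convert a screen-space pixel position to world units for an orthographic camera.
// camera.orthographicSize is half of the visible world height (as in Unity),
// and the screen origin is the bottom-left corner.
function screenToWorld(pixel, display, camera) {
  const worldHeight = 2 * camera.orthographicSize;
  const worldWidth = worldHeight * (display.width / display.height);
  return {
    x: camera.x + (pixel.x / display.width - 0.5) * worldWidth,
    y: camera.y + (pixel.y / display.height - 0.5) * worldHeight,
  };
}

const display = { width: 1080, height: 1920 };
const camera = { x: 0, y: 0, orthographicSize: 5 };

const centre = screenToWorld({ x: 540, y: 960 }, display, camera);   // { x: 0, y: 0 }
const topEdge = screenToWorld({ x: 540, y: 1920 }, display, camera); // y is 5
```

With these sample numbers, a touch in the middle of the screen has a pixel y of 960 but a world y of 0. Passing the raw 960 to Translate moves the object hundreds of world units in a single frame, which is why it vanishes from view.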
I'm looking for some advice about recognizing three handwritten shapes: circles, diamonds and rectangles. I tried different approaches, but they failed, so maybe you could point me in another, better direction.
What I tried:
1) A simple algorithm based on the dot product between the points of the handwritten shape and the ideal shape. It works reasonably well at recognizing rectangles, but fails on circles and diamonds. The problem is that the dot products of the circle and the diamond are quite similar even for ideal shapes.
2) The same approach, but using Dynamic Time Warping as the measure of similarity. Similar problems.
3) Neural networks. I tried a few approaches: giving point data to neural networks (feedforward and Kohonen) or giving a rasterized image. The Kohonen network always classified all the data (even the samples used for training) into the same category. Feedforward with points was better (but on the same level as approaches 1 and 2), and with a rasterized image it was very slow (it needs at least size^2 input neurons, and for small raster sizes a circle is indistinguishable even for me ;) ) and also without success. I think this is because all of these shapes are closed figures? I am no great specialist in ANNs (I had a one-semester course on them), so maybe I am using them wrong?
4) Saving the shape as a Freeman Chain Code and using some algorithms for computing similarity. I thought that in FCC the shapes would be really different from each other. No success here (but I haven't explored this path very deeply).
I am building an app for Android with this, but I think the language is irrelevant here.
Here's some working code for a shape classifier. http://jsfiddle.net/R3ns3/ I pulled the threshold numbers (*Threshold variables in the code) out of the ether, so of course they can be tweaked for better results.
I use the bounding box, average point in a sub-section, angle between points, polar angle from bounding box center, and corner recognition. It can classify hand drawn rectangles, diamonds, and circles. The code records points while the mouse button is down and tries to classify when you stop drawing.
HTML
<canvas id="draw" width="300" height="300" style="position:absolute; top:0px; left:0px; margin:0; padding:0; width:300px; height:300px; border:2px solid blue;"></canvas>
JS
var state = {
width: 300,
height: 300,
pointRadius: 2,
cornerThreshold: 125,
circleThreshold: 145,
rectangleThreshold: 45,
diamondThreshold: 135,
canvas: document.getElementById("draw"),
ctx: document.getElementById("draw").getContext("2d"),
drawing: false,
points: [],
getCorners: function(angles, pts) {
var list = pts || this.points;
var corners = [];
for(var i=0; i<angles.length; i++) {
if(angles[i] <= this.cornerThreshold) {
corners.push(list[(i + 1) % list.length]);
}
}
return corners;
},
draw: function(color, pts) {
var list = pts||this.points;
this.ctx.fillStyle = color;
for(var i=0; i<list.length; i++) {
this.ctx.beginPath();
this.ctx.arc(list[i].x, list[i].y, this.pointRadius, 0, Math.PI * 2, false);
this.ctx.fill();
}
},
classify: function() {
// get bounding box
var left = this.width, right = 0,
top = this.height, bottom = 0;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
if(left > pt.x) left = pt.x;
if(right < pt.x) right = pt.x;
if(top > pt.y) top = pt.y;
if(bottom < pt.y) bottom = pt.y;
}
var center = {x: (left+right)/2, y: (top+bottom)/2};
this.draw("#00f", [
{x: left, y: top},
{x: right, y: top},
{x: left, y: bottom},
{x: right, y: bottom},
]);
// find average point in each sector (9 sectors)
var sects = [
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0},
{x:0,y:0,c:0},{x:0,y:0,c:0},{x:0,y:0,c:0}
];
var x3 = (right + (1/(right-left)) - left) / 3;
var y3 = (bottom + (1/(bottom-top)) - top) / 3;
for(var i=0; i<this.points.length; i++) {
var pt = this.points[i];
var sx = Math.floor((pt.x - left) / x3);
var sy = Math.floor((pt.y - top) / y3);
var idx = sy * 3 + sx;
sects[idx].x += pt.x;
sects[idx].y += pt.y;
sects[idx].c ++;
if(sx == 1 && sy == 1) {
return "UNKNOWN";
}
}
// get the significant points (clockwise)
var sigPts = [];
var clk = [0, 1, 2, 5, 8, 7, 6, 3]
for(var i=0; i<clk.length; i++) {
var pt = sects[clk[i]];
if(pt.c > 0) {
sigPts.push({x: pt.x / pt.c, y: pt.y / pt.c});
} else {
return "UNKNOWN";
}
}
this.draw("#0f0", sigPts);
// find angle between consecutive 3 points
var angles = [];
for(var i=0; i<sigPts.length; i++) {
var a = sigPts[i],
b = sigPts[(i + 1) % sigPts.length],
c = sigPts[(i + 2) % sigPts.length],
ab = Math.sqrt(Math.pow(b.x-a.x,2)+Math.pow(b.y-a.y,2)),
bc = Math.sqrt(Math.pow(b.x-c.x,2)+ Math.pow(b.y-c.y,2)),
ac = Math.sqrt(Math.pow(c.x-a.x,2)+ Math.pow(c.y-a.y,2)),
deg = Math.floor(Math.acos((bc*bc+ab*ab-ac*ac)/(2*bc*ab)) * 180 / Math.PI);
angles.push(deg);
}
console.log(angles);
var corners = this.getCorners(angles, sigPts);
// get polar angle of corners
for(var i=0; i<corners.length; i++) {
corners[i].t = Math.floor(Math.atan2(corners[i].y - center.y, corners[i].x - center.x) * 180 / Math.PI);
}
console.log(corners);
// whats the shape ?
if(corners.length <= 1) { // circle
return "CIRCLE";
} else if(corners.length == 2) { // circle || diamond
// difference of polar angles
var diff = Math.abs((corners[0].t - corners[1].t + 180) % 360 - 180);
console.log(diff);
if(diff <= this.circleThreshold) {
return "CIRCLE";
} else {
return "DIAMOND";
}
} else if(corners.length == 4) { // rectangle || diamond
// sum of polar angles of corners
var sum = Math.abs(corners[0].t + corners[1].t + corners[2].t + corners[3].t);
console.log(sum);
if(sum <= this.rectangleThreshold) {
return "RECTANGLE";
} else if(sum >= this.diamondThreshold) {
return "DIAMOND";
} else {
return "UNKNOWN";
}
} else {
alert("draw neater please");
return "UNKNOWN";
}
}
};
state.canvas.addEventListener("mousedown", (function(e) {
if(!this.drawing) {
this.ctx.clearRect(0, 0, 300, 300);
this.points = [];
this.drawing = true;
console.log("drawing start");
}
}).bind(state), false);
state.canvas.addEventListener("mouseup", (function(e) {
this.drawing = false;
console.log("drawing stop");
this.draw("#f00");
alert(this.classify());
}).bind(state), false);
state.canvas.addEventListener("mousemove", (function(e) {
if(this.drawing) {
var x = e.pageX, y = e.pageY;
this.points.push({"x": x, "y": y});
this.ctx.fillStyle = "#000";
this.ctx.fillRect(x-2, y-2, 4, 4);
}
}).bind(state), false);
Given the possible variation in handwritten inputs I would suggest that a neural network approach is the way to go; you will find it difficult or impossible to accurately model these classes by hand. LastCoder's attempt works to a degree, but it does not cope with much variation or have promise for high accuracy if worked on further - this kind of hand-engineered approach was abandoned a very long time ago.
State-of-the-art results in handwritten character classification these days are typically achieved with convolutional neural networks (CNNs). Given that you have only 3 classes, the problem should be easier than digit or character classification, although from experience with the MNIST handwritten digit dataset, I expect that your circles, squares and diamonds may occasionally end up being difficult for even humans to distinguish.
So, if it were up to me I would use a CNN. I would input binary images taken from the drawing area to the first layer of the network. These may require some preprocessing. If the drawn shapes cover a very small area of the input space you may benefit from bulking them up (i.e. increasing line thickness) so as to make the shapes more invariant to small differences. It may also be beneficial to centre the shape in the image, although the pooling step might alleviate the need for this.
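As a rough illustration of that preprocessing, the sketch below turns a list of recorded stroke points into a small, centred binary grid with thickened lines. It is plain JavaScript with a hypothetical `rasterize` helper; the grid size and thickness defaults are arbitrary assumptions, not values recommended by any particular CNN library.

```javascript
// Turn a list of {x, y} stroke points into a size x size binary grid.
// The shape is scaled to fit (keeping its aspect ratio), centred, and
// each point is thickened by `thickness` cells in every direction.
function rasterize(points, size = 16, thickness = 1) {
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
  }
  const w = maxX - minX, h = maxY - minY;
  const inner = size - 1 - 2 * thickness;      // margin so thickening stays in bounds
  const scale = inner / (Math.max(w, h) || 1); // avoid dividing by zero for a single dot
  const offX = thickness + (inner - w * scale) / 2;
  const offY = thickness + (inner - h * scale) / 2;

  const grid = Array.from({ length: size }, () => new Array(size).fill(0));
  for (const p of points) {
    const gx = Math.round((p.x - minX) * scale + offX);
    const gy = Math.round((p.y - minY) * scale + offY);
    for (let dy = -thickness; dy <= thickness; dy++) {
      for (let dx = -thickness; dx <= thickness; dx++) {
        const x = gx + dx, y = gy + dy;
        if (x >= 0 && x < size && y >= 0 && y < size) grid[y][x] = 1;
      }
    }
  }
  return grid;
}

// Two opposite corners of a stroke, rasterized to an 8x8 grid.
const grid = rasterize([{ x: 0, y: 0 }, { x: 100, y: 100 }], 8);
const input = grid.flat(); // 64 values, one per input neuron
```

Sparse point samples leave gaps between consecutive points, so in practice you would probably also draw line segments between neighbouring samples before thickening.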
I would also point out that the more training data the better. One is often faced with a trade-off between increasing the size of one's dataset and improving one's model. Synthesising more examples (e.g. by skewing, rotating, shifting, stretching, etc) or spending a few hours drawing shapes may provide more of a benefit than you could get in the same time attempting to improve your model.
Good luck with your app!
A linear Hough transform of the square or the diamond ought to be easy to recognize. They will both produce four point masses. The square's will be in pairs at zero and 90 degrees with the same y-coordinates for both pairs; in other words, a rectangle. The diamond will be at two other angles corresponding to how skinny the diamond is, e.g. 45 and 135 or else 60 and 120.
For the circle you need a circular Hough transform, and it will produce a single bright point cluster in 3d (x,y,r) Hough space.
Both linear and circular Hough transforms are implemented in OpenCV, and it's possible to run OpenCV on Android. These implementations include thresholding to identify lines and circles. See pg. 329 and pg. 331 of the documentation here.
If you are not familiar with Hough transforms, the Wikipedia page is not bad.
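To make the "peaks at 0 and 90 degrees versus 45 and 135 degrees" idea concrete, here is a deliberately simplified linear Hough transform in plain JavaScript. It is a sketch, not OpenCV's implementation: the `houghPeakAngles` name, the 5-degree angle step, the 1-pixel rho bins and the choice of keeping the four strongest bins are all assumptions picked for illustration.

```javascript
// Vote in (theta, rho) space: every point votes for each line
//   x * cos(theta) + y * sin(theta) = rho
// that could pass through it. Returns the angles of the strongest bins.
function houghPeakAngles(points, thetaStepDeg = 5, peakCount = 4) {
  const votes = new Map(); // key "thetaDeg:rho" -> vote count
  for (const p of points) {
    for (let deg = 0; deg < 180; deg += thetaStepDeg) {
      const t = deg * Math.PI / 180;
      const rho = Math.round(p.x * Math.cos(t) + p.y * Math.sin(t));
      const key = deg + ":" + rho;
      votes.set(key, (votes.get(key) || 0) + 1);
    }
  }
  // Keep the strongest bins and report which angles they belong to.
  const strongest = [...votes.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, peakCount);
  const angles = new Set(strongest.map(([key]) => Number(key.split(":")[0])));
  return [...angles].sort((a, b) => a - b);
}

// Helper for the example: sample n + 1 points along each side of a polygon.
function outline(vertices, n) {
  const pts = [];
  vertices.forEach((a, i) => {
    const b = vertices[(i + 1) % vertices.length];
    for (let k = 0; k <= n; k++) {
      pts.push({ x: a.x + (b.x - a.x) * k / n, y: a.y + (b.y - a.y) * k / n });
    }
  });
  return pts;
}

const square = outline([{ x: 50, y: 50 }, { x: 150, y: 50 }, { x: 150, y: 150 }, { x: 50, y: 150 }], 50);
const diamond = outline([{ x: 100, y: 0 }, { x: 200, y: 100 }, { x: 100, y: 200 }, { x: 0, y: 100 }], 50);

const squareAngles = houghPeakAngles(square);   // [0, 90]
const diamondAngles = houghPeakAngles(diamond); // [45, 135]
```

For hand-drawn input the peaks will be smeared over neighbouring bins, so a real classifier would need tolerance around these angles; that thresholding is the part the OpenCV implementations handle for you.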
Another algorithm you may find interesting and perhaps useful is given in this paper about polygon similarity. I implemented it many years ago, and it's still around here. If you can convert the figures to loops of vectors, this algorithm could compare them against patterns, and the similarity metric would show goodness of match. The algorithm ignores rotational orientation, so if your definition of square and diamond is with respect to the axes of the drawing surface, you will have to modify the algorithm a bit to differentiate these cases.
What you have here is a fairly standard classification task, arguably in the vision domain.
You could do this several ways, but the best way isn't known, and can sometimes depend on fine details of the problem.
So, this isn't an answer per se, but there is a website, Kaggle.com, that runs classification competitions. One of the sample/experimental tasks they list is reading single handwritten numeric digits. That is close enough to this problem that the same methods are almost certainly going to apply fairly well.
I suggest you go to https://www.kaggle.com/c/digit-recognizer and look around.
But if that is too vague, I can tell you from my reading of it, and playing with that problem space, that Random Forests are a better basic starting place than Neural networks.
In this case (your 3 simple objects) you could try RANSAC fitting for an ellipse (to get the circle) and for lines (to get the sides of the rectangle or diamond), on each connected object if there are several objects to classify at the same time. Based on the actual setting (expected size, etc.), the RANSAC parameters (how close a point must be to count as a voter, and how many voters you need at minimum) must be tuned. When you have found a line with RANSAC fitting, remove the points "close" to it and go for the next line. The angles of the lines should make it easy to distinguish between a diamond and a rectangle.
A very simple approach optimized for classifying exactly these 3 objects could be the following:
1) Compute the center of gravity of the object to classify.
2) Then compute the distances from the center to the object points as a function of the angle (from 0 to 2 pi).
3) Classify the resulting graph based on its smoothness and/or variance and on the position and height of its local maxima and minima (maybe after smoothing the graph).
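The three steps above can be sketched roughly as follows, in plain JavaScript. This is one possible reading of the idea, not a tuned classifier: the `classifyByRadialProfile` name, the bounding-box normalisation, the 0.07 variation threshold and the "angle of the farthest point" rule are all assumptions added for illustration.

```javascript
// Classify a closed stroke of {x, y} points as CIRCLE, RECTANGLE or DIAMOND
// from its centre-to-point distance profile.
function classifyByRadialProfile(points, circleCv = 0.07) {
  // Normalise to a unit box so an elongated rectangle behaves like a square.
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
  }
  const w = (maxX - minX) || 1, h = (maxY - minY) || 1;
  const norm = points.map(p => ({ x: (p.x - minX) / w, y: (p.y - minY) / h }));

  // Step 1: centre of gravity.
  const cx = norm.reduce((s, p) => s + p.x, 0) / norm.length;
  const cy = norm.reduce((s, p) => s + p.y, 0) / norm.length;

  // Step 2: distance and angle of every point relative to the centre.
  const profile = norm.map(p => ({
    r: Math.hypot(p.x - cx, p.y - cy),
    a: Math.atan2(p.y - cy, p.x - cx),
  }));

  // Step 3a: a circle has an almost constant radius (low coefficient of variation).
  const meanR = profile.reduce((s, q) => s + q.r, 0) / profile.length;
  const variance = profile.reduce((s, q) => s + (q.r - meanR) ** 2, 0) / profile.length;
  if (Math.sqrt(variance) / meanR < circleCv) return "CIRCLE";

  // Step 3b: the global maximum of the profile is a corner. After normalisation,
  // rectangle corners sit near 45 degrees (mod 90), diamond corners near 0 (mod 90).
  const far = profile.reduce((best, q) => (q.r > best.r ? q : best));
  const folded = (((far.a * 180 / Math.PI) % 90) + 90) % 90; // fold into [0, 90)
  return Math.abs(folded - 45) < 22.5 ? "RECTANGLE" : "DIAMOND";
}
```

For ideal outlines the radius variation is close to 0 for a circle and about 0.11 for a rectangle or diamond, so the threshold has little headroom; smoothing the stroke first, as suggested above, would likely matter for real hand-drawn input.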
I propose doing it in the following steps:
1) Take the convex hull of the image (assuming the shapes are convex).
2) Divide it into segments using clustering algorithms.
3) Try to fit a curve or a straight line to each segment, then measure and threshold the fit using a training set, which can be used for classification.
4) For your application, try dividing into 4 clusters.
5) Once you have classified the clusters as lines or curves, you can use that information to decide whether the shape is a circle, rectangle or diamond.
I think the answers that are already in place are good, but perhaps a better way of thinking about it is that you should try to break the problem into meaningful pieces.
If possible avoid the problem entirely. For instance if you are recognizing gestures, just analyze the gestures in real time. With gestures you can provide feedback to the user as to how your program interpreted their gesture and the user will change what they are doing appropriately.
Clean up the image in question. Before you do anything come up with an algorithm to try to select what the correct thing is you are trying to analyze. Also use an appropriate filter (convolution perhaps) to remove image artifacts before you begin the process.
Once you have figured out what the thing is you are going to analyze then analyze it and return a score, one for circle, one for noise, one for line, and the last for pyramid.
Repeat this step with the next viable candidate until you come up with the best candidate that is not noise.
I suspect you will find that you don't need a complicated algorithm to find circle, line, pyramid but that it is more so about structuring your code appropriately.
If I were you, I'd use an already available image processing library such as "AForge".
Take a look at this sample article:
http://www.aforgenet.com/articles/shape_checker
I have a jar on github that can help if you are willing to unpack it and obey the apache license. You can try to recreate it in any other language as well.
It's an edge detector. The best next steps from there could be to:
find the corners (median of 90 degrees)
find mean median and maximum radius
find skew/angle from horizontal
have a decision agent decide what the shape is
Play around with it and find what you want.
My jar is open to the public at this address. It is not yet production ready but can help.
Just thought I could help. If anyone wants to be a part of the project, please do.
I did this recently with identifying circles (bone centers) in medical images.
Note: Steps 1-2 are if you are grabbing from an image.
Pseudocode Steps
Step 1. Highlight the Edges
edges = edge_map(of the source image) (using edge detector(s))
(In layman's terms: show the lines/edges and make them searchable.)
Step 2. Trace each unique edge
I would use a nearest-neighbor search (9x9 or 25x25) to identify, follow and trace each edge, collecting each point into a list (they become neighbors) and taking note of the gradient at each point.
This step produces: a set of edges.
(where one edge/curve/line = a list of [point_gradient_data_structure]s)
(In layman's terms: collect a set of points along each edge in the image.)
Step 3. Analyze Each Edge('s points and gradient data)
For each edge,
if the gradient is similar for a given region/set of neighbors (a run of points along an edge), then we have a straight line.
If the gradient is changing gradually, we have a curve.
Each region/run of points that is a straight line or a curve, has a mean (center) and other gradient statistics.
Step 4. Detect Objects
We can use the summary information from Step 3 to build conclusions about diamonds, circles, or squares. (i.e. 4 straight lines, that have end points near each other with proper gradients is a diamond or square. One (or more) curves with sufficient points/gradients (with a common focal point) makes a complete circle).
Note: Using an image pyramid can improve algorithm performance, both in terms of results and speed.
This technique (Steps 1-4) would get the job done for well defined shapes, and also could detect shapes that are drawn less than perfectly, and could handle slightly disconnected lines (if needed).
Note: With some machine learning techniques (mentioned by other posters), it could be helpful/important to have good "classifiers" to basically break the problem down into smaller parts/components, so then a decider further down the chain could use to better understand/"see" the objects. I think machine learning might be a little heavy-handed for this question, but still could produce reasonable results. PCA(face detection) could potentially work too.