Project Tango: Converting between coordinate systems and merging point clouds - android

I am trying to convert point clouds sampled and stored in XYZij data (which, according to the documentation, stores data in camera space) into a world coordinate system so that they can be merged. The frame pair I use for the Tango listener has COORDINATE_FRAME_START_OF_SERVICE as the base frame and COORDINATE_FRAME_DEVICE as the target frame.
This is the way I implement the transformation:
Retrieve the rotation quaternion from TangoPoseData.getRotationAsFloats() as q_r, and the point position from XYZij as p.
Apply the following rotation, where q_mult is a helper method computing the Hamilton product of two quaternions (I have verified this method against another math library):
p_transformed = q_mult(q_mult(q_r, p), q_r_conjugated);
Add the translation retrieved from TangoPoseData.getTranslationAsFloats() to p_transformed.
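For reference, here is a minimal Java sketch of the per-point transform described above (the {x, y, z, w} array layout and the transformPoint/q_mult helpers are my own assumptions for illustration, not Tango API code):

// Sketch of the transform described above: p' = q_r * p * conj(q_r) + t.
// Quaternions are stored as {x, y, z, w}; adjust if your q_mult assumes
// a different component order (see the answer on notation below).
static float[] transformPoint(float[] q_r, float[] t, float[] p) {
    float[] pQuat = { p[0], p[1], p[2], 0f };               // pure quaternion from the point
    float[] qConj = { -q_r[0], -q_r[1], -q_r[2], q_r[3] };  // conjugate of q_r
    float[] rotated = q_mult(q_mult(q_r, pQuat), qConj);
    return new float[] { rotated[0] + t[0], rotated[1] + t[1], rotated[2] + t[2] };
}

// Hamilton product for {x, y, z, w} quaternions.
static float[] q_mult(float[] a, float[] b) {
    float ax = a[0], ay = a[1], az = a[2], aw = a[3];
    float bx = b[0], by = b[1], bz = b[2], bw = b[3];
    return new float[] {
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz
    };
}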
But the points at p_transformed always seem to end up as a clutter of partly overlapping point clouds instead of an aligned, merged point cloud.
Am I missing anything here? Is there a conceptual mistake in the transformation?
Thanks in advance.

Ken & Vincenzo, thanks for the replies.
I somehow get better results by performing ICP registration in CloudCompare on the individual point clouds after they are transformed into world coordinates using the pose data alone. Below is a sample result from ~30 scans of a computer desk. Points on the farther end are still a bit off, but with carefully tuned parameters this might be improved. CloudCompare's command-line interface also makes it suitable for batch processing.
Besides the inevitable integration error that needs to be corrected, a mistake I made earlier was wrongly assuming the camera frame (the camera on the device), which is described here in the documentation, to be the same as the OpenGL camera frame, which in turn equals the device frame described here. They are not the same.
Moving the camera slowly, so that adjacent frames overlap more, also helps registration, and good visible lighting of the scene is important: besides the motion sensors, Tango relies on the fisheye camera on its back for motion tracking.
Hope these tips also work for cases more general than mine.

There are two different "standard" orderings of quaternion components. One puts the scalar (rotation-angle) part first, i.e. w x y z, and one puts it last, i.e. x y z w. The Tango API docs list TangoPoseData::orientation as x y z w, while the Wikipedia page on quaternions writes them scalar-first (w + xi + yj + zk). You might want to check which ordering your product method assumes.
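For example, if your math library expects scalar-first quaternions, the Tango components need reordering when you build one (the Quaternion constructor below is a hypothetical scalar-first API, shown only to illustrate the reordering):

float[] o = poseData.getRotationAsFloats();             // Tango order: x, y, z, w
Quaternion q = new Quaternion(o[3], o[0], o[1], o[2]);  // hypothetical (w, x, y, z) constructor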

Where is your pose data coming from? Are you getting the most recent pose once you are inside the callback for the point cloud data, or are you asking for the pose that corresponds to the timestamp in the XYZij struct? You should be asking for the pose at the time given by the "timestamp" field of the XYZij struct.
I tried it, but it does not work. I tried queuing the poses and picking the one nearest to the XYZij timestamp. (Screenshots: the misaligned blue wall vs. the real wall.)
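For what it's worth, querying the pose that matches the depth timestamp might look roughly like this (a sketch against the Tango Java API; exact names can differ between SDK releases):

// Inside onXyzIjAvailable(TangoXyzIjData xyzIj):
TangoCoordinateFramePair framePair = new TangoCoordinateFramePair(
        TangoPoseData.COORDINATE_FRAME_START_OF_SERVICE,
        TangoPoseData.COORDINATE_FRAME_DEVICE);
// Ask for the pose at the depth frame's timestamp instead of the latest pose.
TangoPoseData pose = mTango.getPoseAtTime(xyzIj.timestamp, framePair);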

We at roomplan.de created an open-source sample showing how to use PCL in Project Tango apps. It records point clouds and transforms them into a common coordinate frame (the start-of-service frame). You can find the sample code here: https://github.com/roomplan/tango-examples-java/tree/master/PointCloudJava_with_PCL The specific function is in jni/jni_part.cpp, function Java_com_tangoproject_experiments_javapointcloud_PointCloudActivity_saveRotatedPointCloud.
If you want the sample to compile, you need to clone the complete folder and integrate PCL into your project. A description of how this can be done can be found on our website.
Sample pictures can be viewed in the demo app on the Play Store (can't post them here yet): https://play.google.com/store/apps/details?id=com.tangoproject.experiments.javapointcloud&hl=en

Related

What's the difference between an Anchor and a Pose in arcore? [duplicate]

I'm trying to read and make sense of Google ARCore's domain model, particularly the Android SDK packages. Currently this SDK is in "preview" mode and so there are no tutorials, blogs, articles, etc. available on understanding how to use this API. Even Google itself suggests just reading the source code, source code comments and Javadocs to understand how to use the API. Problem is: if you're not already a computer vision expert, the domain model will feel a little alien & unfamiliar to you.
Specifically I'm interested in understanding the fundamental differences between, and proper usages of, the following classes:
Frame
Anchor
Pose
PointCloud
According to Anchor's javadoc:
"Describes a fixed location and orientation in the real world. To stay at a fixed location in physical space, the numerical description of this position will update as ARCore's understanding of the space improves. Use getPose() to get the current numerical location of this anchor. This location may change any time update() is called, but will never spontaneously change."
So Anchors have a Pose. It sounds like you "drop an Anchor" onto something that's visible in the camera, and then ARCore tracks that Anchor and constantly updates its Pose to reflect its onscreen coordinates, maybe?
And from Pose's javadoc:
"Represents an immutable rigid transformation from one coordinate frame to another. As provided from all ARCore APIs, Poses always describe the transformation from object's local coordinate frame to the world coordinate frame (see below)...These changes mean that every frame should be considered to be in a completely unique world coordinate frame."
So it sounds like a Pose is something that is only unique to the "current frame" of the camera and that each time the frame is updated, all poses for all anchors are recalculated maybe? If not, then what's the relationship between an Anchor, its Pose, the current frame and the world coordinate frame? And what's a Pose really, anyways? Is a "Pose" just a way of storing matrix/point data so that you can convert an Anchor from the current frame to the world frame? Or something else?
Finally, I see a strong correlation between Frames, Poses and Anchors, but then there's PointCloud. The only class I can see inside com.google.ar.core that uses these is the Frame. PointClouds appear to be (x,y,z)-coordinates with a 4th property representing ARCore's "confidence" that the x/y/z components are actually correct. So if an Anchor has a Pose, I would have imagined that a Pose would also have a PointCloud representing the Anchor's coordinates & confidence in those coordinates. But Pose does not have a PointCloud, and so I must be completely misunderstanding the concepts that these two classes model.
The question
I've posed several different questions above, but they all boil down to a single, concise, answerable question:
What is the difference in the concepts behind Frame, Anchor, Pose and PointCloud and when do you use each of them (and for what purposes)?
A Pose is a structured transformation. It is a fixed numerical transformation from one coordinate system (typically object local) to another (typically world).
An Anchor represents a physically fixed location in the world. Its getPose() will update as the understanding of the world changes. For example, imagine you have a building with a hallway around the outside. If you walk all the way around that hallway, sensor drift results in you not winding up at the same coordinates you started at. However, ARCore can detect (using visual features) that it is in the same space it started in. When this happens, it distorts the world so that your current location and original location line up. As part of this distortion, the location of anchors will be adjusted as well so that they stay in the same physical place.
Because of this distortion, a Pose relative to the world should be considered valid only for the duration of the frame during which it was returned. As soon as you call update() the next time, the world may have reshaped and that pose could be useless. If you need to keep a location longer than a frame, create an Anchor. Just make sure to removeAnchors() anchors that you're no longer using, as there is an ongoing cost for each live anchor.
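As an illustration, here is a sketch against the later public ARCore Java API (where the cleanup call is anchor.detach() rather than the preview SDK's removeAnchors(); tapEvent and anchors are assumed to exist in your app):

Frame frame = session.update();
for (HitResult hit : frame.hitTest(tapEvent)) {
    // A Pose from this frame is only trustworthy until the next update(),
    // so wrap it in an Anchor if the location needs to outlive the frame.
    Anchor anchor = session.createAnchor(hit.getHitPose());
    anchors.add(anchor);
    break;
}
// Later, when the location is no longer needed, release the anchor to
// avoid the ongoing tracking cost: anchor.detach();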
A Frame captures the current state at an instant and changes between two calls to update().
PointClouds are sets of 3D visual feature points detected in the world. They are in their own local coordinate system, which can be accessed from Frame.getPointCloudPose(). Developers looking to have better spatial understanding than the plane detection provides can try using the point clouds to learn more about the structure of the 3D world.
Does that help?
Using the following link you can find an answer about Frame, Anchor and Pose:
ARCore – Session, Frame, Camera and Pose.
Additionally, here's some info on what a Point Cloud is:
A Point Cloud is a visual cloud of points (usually rendered in yellow) in world space which represent reliable positions of feature points used for 3D tracking of real-world objects.
And here's what Google says about Point Cloud:
PointCloud contains a set of observed 3D points and confidence values. This class implements Closeable and usually should be used in a Java try-with-resources or Kotlin use block, for example:
To get a PointCloud use the following code:
Frame frame = session.update();
try (PointCloud pointCloud = frame.acquirePointCloud()) {
    // Access the point cloud data here, e.g. via pointCloud.getPoints().
}

Two Dimensional Vector from Accelerometer

I'm trying to make an Android application that uses a smartphone moved along on a flat surface (e.g. a desk) as a mouse. Since I want to emulate a mouse, I ignore the z-axis, and figure that the best way to utilize the accelerometer data would be to construct a two dimensional vector that I could then scale to the size of the screen.
I've read other answers on SO and I see that the integration method has a large error as t increases, but I'm not sure if this error is a factor considering the short duration and position change of mouse movements (How long is the average mouse movement? I'd assume less than 2 sec.).
How would I go about designing an algorithm that meets my needs? Is an integration-based algorithm sufficient?
Yes, accelerometer data have high error, and trying to get absolute coordinates out of them would produce large errors. But a mouse needs no absolute coordinates; relative ones are entirely sufficient. Use your integration, no doubt about it.
"The integration method has a large error as t increases" - correct, but the user is really only interested in the most recent movement. So it will work as a mouse and it will feel like a normal mouse. How good a mouse it will be depends on the specific device and the task. I am not at all sure about serious gaming, for example; you will have to do your own testing. But it would make a really bad tablet/pen simulator.
Be careful about ignoring the Z axis. Notice that even for placing a point on a map, GPS uses all three coordinates, for better precision. Real movements will often have a non-zero Z change, and simply discarding one of the coordinates, instead of projecting all three onto the two you really need, will cause greater errors. I am not sure you can afford that, and you don't need to: it is NOT a heavy algorithm that devours time and battery. For the user, the ability to move the device in the air could also be a real convenience - not everybody wants to scratch their device against a table. So COMPUTE two coordinates from the three source ones, rather than simply taking two of them and ignoring the third.
The problem will be elsewhere. When you use a mouse and error accumulates, you can lift the mouse, move it to another spot, and start again from there. You should implement something similar, since your device will accumulate error over time as well.
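For completeness, a minimal sketch of the relative double integration (SCALE, moveCursorBy(), and the choice of TYPE_LINEAR_ACCELERATION are assumptions; real code would also need the filtering and periodic re-zeroing described above):

import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;

public class MouseFromAccel implements SensorEventListener {
    private static final float SCALE = 1000f; // assumed pixels-per-metre factor
    private float vx, vy;                     // integrated velocity (m/s)
    private long lastNs;

    @Override
    public void onSensorChanged(SensorEvent event) {
        if (event.sensor.getType() != Sensor.TYPE_LINEAR_ACCELERATION) return;
        if (lastNs != 0) {
            float dt = (event.timestamp - lastNs) * 1e-9f;
            vx += event.values[0] * dt;                      // acceleration -> velocity
            vy += event.values[1] * dt;
            moveCursorBy(vx * dt * SCALE, vy * dt * SCALE);  // velocity -> relative displacement
        }
        lastNs = event.timestamp;
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }

    private void moveCursorBy(float dx, float dy) { /* app-specific pointer update */ }
}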

Object selection in opengl es

I'm new to 3D programming and have been playing around with OpenGL ES for Android for a little while now. I've seen a few options mentioned for this problem, namely ray tracing/casting, object picking, and something about using the pixels to select 3D objects. I'm trying to make something like a paint program with OpenGL ES for Android, where I can select a line of a cube and delete it, or select objects to be deleted or modified. I'm unsure where to start learning this; I've tried Google and didn't really find anything helpful. A video tutorial, a website that explains this better, or any pointer in the right direction would be greatly appreciated. Thank you so much in advance.
Yes, I know this is a possible duplicate question.
I'm an iOS dev myself, but I recently implemented ray casting for my game, so I'll try to answer this in a platform agnostic way.
There are two steps to the ray-casting operation: first, you need to get the ray from the user's tap, and second, you need to test the triangles defining your model for intersections. Note that this requires you to still have them in memory or be able to recover them -- you can't just keep them in a VBO on the graphics card.
First, the conversion to world coordinates. Since you are no doubt using a projection matrix to get a 3D perspective for your models, you need to unproject the point to get it in world coordinates. Many libraries already implement this, such as GLU's gluUnProject, which I believe is available on Android. Mathematically this amounts to applying the inverse of all the transformations currently acting on your models. Regardless, there are many implementations publicly available online that you can copy from.
At this point, you are going to need a Z coordinate for the point you are trying to unproject. You actually want to unproject twice, once with a Z coordinate of 0 and once with a Z coordinate of 1. The vector that results from the Z coordinate of 0 is the origin of the ray, and by subtracting it from the Z-coordinate-of-1 vector you get the ray's direction. Now you are ready to test for intersections with your model's polygons.
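A sketch of building that ray with android.opengl.Matrix (equivalent to calling gluUnProject twice; projMatrix, viewMatrix, and the touch/viewport parameters are placeholders for your own values):

import android.opengl.Matrix;

// Build a pick ray from a touch point by unprojecting the near and far planes.
static float[][] pickRay(float touchX, float touchY, int width, int height,
                         float[] projMatrix, float[] viewMatrix) {
    float[] pv = new float[16];
    float[] invPV = new float[16];
    Matrix.multiplyMM(pv, 0, projMatrix, 0, viewMatrix, 0);
    Matrix.invertM(invPV, 0, pv, 0);

    float ndcX = 2f * touchX / width - 1f;
    float ndcY = 1f - 2f * touchY / height;            // screen Y is flipped

    float[] near = unproject(invPV, ndcX, ndcY, -1f);  // window Z = 0 (near plane)
    float[] far  = unproject(invPV, ndcX, ndcY,  1f);  // window Z = 1 (far plane)

    float[] dir = { far[0] - near[0], far[1] - near[1], far[2] - near[2] };
    float len = (float) Math.sqrt(dir[0] * dir[0] + dir[1] * dir[1] + dir[2] * dir[2]);
    dir[0] /= len; dir[1] /= len; dir[2] /= len;
    return new float[][] { near, dir };                // ray origin and direction
}

static float[] unproject(float[] invPV, float x, float y, float z) {
    float[] in = { x, y, z, 1f };
    float[] out = new float[4];
    Matrix.multiplyMV(out, 0, invPV, 0, in, 0);
    return new float[] { out[0] / out[3], out[1] / out[3], out[2] / out[3] };
}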
I have had success with the method presented in this paper (http://www.cs.virginia.edu/~gfx/Courses/2003/ImageSynthesis/papers/Acceleration/Fast%20MinimumStorage%20RayTriangle%20Intersection.pdf) for doing the actual intersection test. The algorithm is implemented in C at the end, but you should be able to convert it to Java with little trouble.
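A Java port of that ray/triangle test (the Möller-Trumbore algorithm from the linked paper) might look like the sketch below; adapt it to your own vector types:

// Returns the distance t along the ray to the intersection, or -1 if there is none.
static float rayTriangle(float[] orig, float[] dir, float[] v0, float[] v1, float[] v2) {
    final float EPS = 1e-6f;
    float[] e1 = sub(v1, v0);
    float[] e2 = sub(v2, v0);
    float[] p = cross(dir, e2);
    float det = dot(e1, p);
    if (det > -EPS && det < EPS) return -1f;  // ray is parallel to the triangle
    float invDet = 1f / det;
    float[] s = sub(orig, v0);
    float u = dot(s, p) * invDet;
    if (u < 0f || u > 1f) return -1f;
    float[] q = cross(s, e1);
    float v = dot(dir, q) * invDet;
    if (v < 0f || u + v > 1f) return -1f;
    float t = dot(e2, q) * invDet;
    return t > EPS ? t : -1f;
}

static float[] sub(float[] a, float[] b) { return new float[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] }; }
static float dot(float[] a, float[] b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
static float[] cross(float[] a, float[] b) {
    return new float[] { a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0] };
}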
The selection features of OpenGL are not available with OpenGL ES, so you would have to build that yourself.
I recommend starting with OpenGL ES sample programs that are specific to Android. I think this article will help:
http://software.intel.com/en-us/articles/porting-opengl-games-to-android-on-intel-atom-processors-part-1

Drawing in 2D space using Accelerometer (gyroscope?)

I am trying to create an application that will track movement of the device in 2D space. After doing research online, all I could find is that one way to do it is to integrate the linear acceleration twice, but the error is horrible.
Are there any solutions to this problem? I would like to be able to move my phone up, which would cause a vertical line to be drawn on the screen, to scale of how far the phone was moved. Then if I move the phone to the left, horizontal line would be drawn - effectively allowing me to draw on the screen using movements of the phone.
Can this be done at all? If so, what direction should I take in the development? I don't know where to start...
EDIT: More about the project:
I am trying to make an exercise app that will track the movement of the leg/arm: for example, when you are doing stomach crunches and the phone is attached with an armstrap to your ankle.
The app would track repeated movements of the leg.
Unfortunately, the accelerometers in these phones are nowhere near accurate enough to implement an inertial measurement unit. The big problem is that, since you are integrating twice, each integration comes with an unknown constant (∫ x dx = x²/2 + C), and this constant is what makes it difficult. To make things worse, you get it twice: once when integrating to get velocity and once more to get position.
One method of fixing this that I have seen in commercial inertial measurement units is called a zero velocity null (also known as a zero-velocity update): you use some other source of data to tell the system when the motion of the device has stopped, so you can zero out the velocity. For example, I saw a project that put an inertial measurement unit on a shoe and zeroed the velocity whenever it detected the shoe being put on the ground, which vastly improved the accuracy. It's possible that you could use a camera or something else to determine this, though I have not seen it done. If you'd like to start experimenting with this, you are an awesome person and I would love to hear how it turns out.
Edit: I should clarify that the constant I mention above is where the error accumulates. If you can apply the zero velocity null, you periodically drop the accumulated error from your stored current velocity. The error in position will still accumulate, but this keeps it from drifting while the device is held relatively still, which may make it passable for drawing.
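A rough sketch of that zero-velocity reset (THRESHOLD and REST_SAMPLES are made-up values that would need tuning for a real device):

private static final float THRESHOLD = 0.1f; // m/s^2, assumed "at rest" threshold
private static final int REST_SAMPLES = 20;  // consecutive quiet samples before reset
private int quietCount;
private float vx, vy;

void onLinearAcceleration(float ax, float ay, float dt) {
    vx += ax * dt;
    vy += ay * dt;
    if (Math.sqrt(ax * ax + ay * ay) < THRESHOLD) {
        if (++quietCount >= REST_SAMPLES) {
            vx = 0f;  // zero-velocity update: discard the accumulated drift
            vy = 0f;
        }
    } else {
        quietCount = 0;
    }
}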
I know of no way other than integrating the acceleration twice.
Moreover, I think it is not possible without knowing about the other sensors that might be in your device (for example, one of my devices has 7 (seven) sensors related to various physical signals the device might be receiving).
Other than that, remember that the sensor data is noisy and almost always must be pre-filtered. For example, you can use the geometric mean of the last 10 samples. That should lower your error by providing smoother input data to the integrating function.
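As an illustration of the pre-filtering idea, a simple sliding-window smoother (an arithmetic mean is used here for brevity; the geometric mean suggested above would be a drop-in change):

import java.util.ArrayDeque;
import java.util.Deque;

class WindowSmoother {
    private final int size;
    private final Deque<float[]> window = new ArrayDeque<>();

    WindowSmoother(int size) { this.size = size; }

    // Returns the mean of the last `size` samples, component by component.
    float[] smooth(float[] sample) {
        window.addLast(sample.clone());
        if (window.size() > size) window.removeFirst();
        float[] avg = new float[sample.length];
        for (float[] s : window) {
            for (int i = 0; i < avg.length; i++) avg[i] += s[i] / window.size();
        }
        return avg;
    }
}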

Detect physical gesture with accelerometer

In an Android app I'm making, I would like to detect when a user who is holding the phone in their hand makes a gesture like throwing a frisbee. I have seen a couple of apps implementing this, but I can't find any example code or tutorial on the web.
It would be great to get some thoughts on how this could be done, and of course it would be even better with some example code or a link to a tutorial.
The accelerometer provides you with a stream of 3D vectors. While the phone is held still in the hand, the vector points opposite to Earth's gravitational pull and has the same magnitude (this way you can determine the phone's orientation).
If the user lets the phone fall, the vector's magnitude drops to 0 (the same effect as weightlessness on a space station).
If the user makes a gesture without throwing the phone, the direction will shift and the amplitude will rise, then fall, then rise again (when the user stops the movement). To find out what this looks like, record accelerometer data while performing the desired gestures.
Keep in mind that the accelerometer is pretty noisy - you will have to do some averaging over nearby values to get meaningful results.
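As a starting point for the recording suggestion above, a small sketch that averages the acceleration magnitude over a short window and flags a throw-like spike (THRESHOLD is an assumed value to be read off your own recordings):

private static final float THRESHOLD = 15f;  // m/s^2, assumed; tune from recorded data
private final float[] window = new float[8]; // small averaging window
private int index;

boolean onAccelerometerSample(float x, float y, float z) {
    window[index++ % window.length] = (float) Math.sqrt(x * x + y * y + z * z);
    float avg = 0f;
    for (float m : window) avg += m / window.length;
    return avg > THRESHOLD;  // candidate "throw" motion detected
}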
For actually matching gestures, I think one workable approach would be invariant moments (like the Hu moments used in image recognition) - the accelerometer vector over time defines a 4-dimensional space, and you will need a set of scaling/rotation invariant moments. Designing such a set is not easy, but computing them is not complicated.
Once you have your moments, you can use standard techniques for matching vectors to clusters (see the "moments" and "cluster" modules from our javaocr project: http://javaocr.svn.sourceforge.net/viewvc/javaocr/trunk/plugins/ ).
PS: you may get away with just speed over time, which produces a 2-dimensional space and can be analysed with javaocr on the spot.
Not exactly what you are looking for:
Store orientation to an array - and compare
Tracking orientation works well. Perhaps you can do something similar with the accelerometer data (without any integration).
A similar question is Drawing in air with Android phone.
I am curious what other answers you will get.
