Generate and export point cloud from Project Tango - android

After some weeks of waiting I finally have my Project Tango. My idea is to create an app that generates a point cloud of my room and exports it as .xyz data. I'll then use the .xyz file to show the point cloud in a browser! I started off by compiling and adjusting the point cloud example from Google's GitHub.
Right now I use onXyzIjAvailable(TangoXyzIjData tangoXyzIjData) to get a frame of x, y and z values, i.e. the points. I then save these frames in a PCLManager as Vector3s. After I'm done scanning my room, I simply write all the Vector3s from the PCLManager to a .xyz file using:
OutputStream os = new FileOutputStream(file);
size = pointCloud.size();
for (int i = 0; i < size; i++) {
    String row = String.valueOf(pointCloud.get(i).x) + " "
            + String.valueOf(pointCloud.get(i).y) + " "
            + String.valueOf(pointCloud.get(i).z) + "\r\n";
    os.write(row.getBytes());
}
os.close();
Everything works fine, no compilation errors or crashes. The only thing that seems to be going wrong is the rotation or translation of the points in the cloud. When I view the point cloud everything is messed up; the area I scanned is not recognizable, though the number of points matches what was recorded.
Could this have something to do with the fact that I don't use PoseData together with the XyzIjData? I'm kind of new to this subject and have a hard time understanding what the PoseData actually does. Could someone explain it to me and help me fix my point cloud?

Yes, you have to use TangoPoseData.
I guess you are using TangoXyzIjData correctly, but the data you get this way is relative to where the device is and how it is tilted when you take the shot.
Here's how I solved this:
I started from the java_point_to_point_example. In that example they get the coordinates of two different points in two different coordinate systems and then express those coordinates with respect to the base coordinate frame pair.
First of all you have to set up your extrinsics, so you'll be able to perform all the transformations you'll need. To do that I call mExtrinsics = setupExtrinsics(mTango) at the end of my setTangoListener() function. Here's the code (which you can also find in the example I linked above):
private DeviceExtrinsics setupExtrinsics(Tango mTango) {
    // IMU to color camera transform
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair();
    framePair.baseFrame = TangoPoseData.COORDINATE_FRAME_IMU;
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR;
    TangoPoseData imu_T_rgb = mTango.getPoseAtTime(0.0, framePair);
    // IMU to device transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_DEVICE;
    TangoPoseData imu_T_device = mTango.getPoseAtTime(0.0, framePair);
    // IMU to depth camera transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH;
    TangoPoseData imu_T_depth = mTango.getPoseAtTime(0.0, framePair);
    return new DeviceExtrinsics(imu_T_device, imu_T_rgb, imu_T_depth);
}
Then, when you get the point cloud, you have to "normalize" it. Using your extrinsics, this is pretty simple:
public ArrayList<Vector3> normalize(TangoXyzIjData cloud, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    ArrayList<Vector3> normalizedCloud = new ArrayList<>();
    TangoPoseData camera_T_imu = ScenePoseCalculator.matrixToTangoPose(extrinsics.getDeviceTDepthCamera());

    while (cloud.xyz.hasRemaining()) {
        Vector3 rotatedV = ScenePoseCalculator.getPointInEngineFrame(
                new Vector3(cloud.xyz.get(), cloud.xyz.get(), cloud.xyz.get()),
                camera_T_imu,
                cameraPose
        );
        normalizedCloud.add(rotatedV);
    }

    return normalizedCloud;
}
This should be enough: now you have a point cloud with respect to your base frame of reference.
If you superimpose two or more of these "normalized" clouds you can get a 3D representation of your room.
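For reference, here is a minimal sketch of that last step (my own, assuming the normalize(...) helper above and Rajawali's Vector3 with its public x, y, z fields; imports come from java.io and java.util plus the Tango and Rajawali classes already used above): accumulate every normalized frame into one list, then export it as .xyz just like in the question, only now in the base frame of reference.
private final List<Vector3> mMergedCloud = new ArrayList<>();

// Call this from onXyzIjAvailable(...) once you have fetched the matching camera pose.
private void accumulate(TangoXyzIjData xyzIj, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    mMergedCloud.addAll(normalize(xyzIj, cameraPose, extrinsics));
}

// Call this once scanning is done.
private void exportXyz(File file) throws IOException {
    Writer out = new BufferedWriter(new FileWriter(file));
    try {
        for (Vector3 p : mMergedCloud) {
            out.write(p.x + " " + p.y + " " + p.z + "\r\n");
        }
    } finally {
        out.close();
    }
}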
There is another way to do this with a rotation matrix, explained here.
My solution is pretty slow (it takes the dev kit around 700 ms to normalize a cloud of ~3000 points), so it is not suitable for real-time 3D reconstruction.
At the moment I'm trying to use the Tango 3D Reconstruction Library in C using the NDK and JNI. The library is well documented, but it is very painful to set up your environment and start using JNI (in fact, that's where I'm stuck at the moment).
Drifting
There is still a problem when I turn around with the device. It seems that the point cloud spreads out a lot.
I guess you are experiencing some drifting.
Drifting happens when you use Motion Tracking alone: it consists of many very small errors in the pose estimate that add up to a big error in your pose relative to the world. For instance, if you take your Tango device and walk in a circle while tracking your TangoPoseData, and then draw the trajectory in a spreadsheet or whatever you want, you'll notice that the tablet never returns to its starting point, because it is drifting away.
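If you want to see this for yourself, here is a rough sketch (my own, not from the samples) that dumps every valid pose translation to a CSV file you can plot afterwards; mCsvWriter is a hypothetical java.io.Writer that you open and close elsewhere:
@Override
public void onPoseAvailable(TangoPoseData pose) {
    if (pose.statusCode != TangoPoseData.POSE_VALID) {
        return;
    }
    // pose.translation holds the device position (x, y, z) in meters wrt the base frame.
    try {
        mCsvWriter.write(pose.timestamp + ","
                + pose.translation[0] + ","
                + pose.translation[1] + ","
                + pose.translation[2] + "\n");
    } catch (IOException e) {
        Log.e("DriftTest", "Could not log pose", e);
    }
}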
The solution to that is using Area Learning.
If you have no clear idea about this topic, I suggest watching this talk from Google I/O 2016. It covers lots of points and gives you a nice introduction.
Using Area Learning is quite simple.
You just have to change your base frame of reference to TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION. This way you tell your Tango to estimate its pose not with respect to where it was when you launched the app, but with respect to some fixed point in the area.
Here's my code:
private static final ArrayList<TangoCoordinateFramePair> FRAME_PAIRS =
        new ArrayList<TangoCoordinateFramePair>();

static {
    FRAME_PAIRS.add(new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
            TangoPoseData.COORDINATE_FRAME_DEVICE
    ));
}
Now you can use FRAME_PAIRS as usual.
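For example, this is roughly how I register my listener with it (a sketch; depending on your SDK version the Tango.OnTangoUpdateListener interface may have extra callbacks to override, such as onPointCloudAvailable):
mTango.connectListener(FRAME_PAIRS, new Tango.OnTangoUpdateListener() {
    @Override
    public void onPoseAvailable(TangoPoseData pose) {
        // The pose is now AREA_DESCRIPTION -> DEVICE, i.e. drift-corrected.
    }

    @Override
    public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
        // Depth data, unchanged.
    }

    @Override
    public void onFrameAvailable(int cameraId) {
        // Not used here.
    }

    @Override
    public void onTangoEvent(TangoEvent event) {
        // Not used here.
    }
});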
Then you have to modify your TangoConfig to tell Tango to use Area Learning, using the key TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION. Remember that when using TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION you CAN'T use learning mode or load an ADF (area description file).
So you can't use:
TangoConfig.KEY_BOOLEAN_LEARNINGMODE
TangoConfig.KEY_STRING_AREADESCRIPTION
Here's how I initialize TangoConfig in my app:
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
// Turn the depth sensor on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
// Turn motion tracking on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING, true);
// If Tango gets stuck, it tries to recover automatically.
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY, true);
// Tango tries to store and remember places and rooms;
// this is used to reduce drifting.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION, true);
// Turn the color camera on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
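Then, once the Tango service is ready, I connect with this config (a minimal sketch; TAG is a hypothetical log tag and error handling is trimmed):
try {
    mTango.connect(config);
} catch (TangoOutOfDateException e) {
    Log.e(TAG, "Tango service out of date, please update", e);
} catch (TangoErrorException e) {
    Log.e(TAG, "Could not connect to the Tango service", e);
}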
Using this technique you'll get rid of that spreading.
PS
In the talk I linked above, at around 22:35, they show how to port your application to Area Learning. In their example they use TangoConfig.KEY_BOOLEAN_ENABLE_DRIFT_CORRECTION. This key does not exist anymore (at least in the Java API); use TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION instead.

Related

Google Tango - Getting pose data with IMU as base frame

I am working with Google Project Tango and I tried a basic example with getting pose data:
TangoCoordinateFramePair pair;
pair.base = TANGO_COORDINATE_FRAME_START_OF_SERVICE;
pair.target = TANGO_COORDINATE_FRAME_CAMERA_COLOR;
base = TANGO_SUPPORT_ENGINE_OPENGL;
target = TANGO_SUPPORT_ENGINE_OPENGL;
error = TangoSupport_getPoseAtTime(poseTimestamp, pair.base, pair.target, base, target, ROTATION_0, &pose);
This gives TANGO_SUCCESS.
However, if I only change base to this
pair.base = TANGO_COORDINATE_FRAME_IMU;
...I keep getting TANGO_INVALID.
I tried using both the C API and the Unity SDK, and both give the same invalid result.
Why is that? Why can't I use TANGO_COORDINATE_FRAME_IMU?
I am trying to fix Camera offset as mentioned here:
Camera-Offset | Project Tango
but without any success...
TangoSupport_getPoseAtTime only works for getting a pose between a fixed coordinate frame and a moving coordinate frame. The TANGO_INVALID error results from the fact that TANGO_COORDINATE_FRAME_CAMERA_COLOR and TANGO_COORDINATE_FRAME_IMU are both moving coordinate frames.
In order to find the offset between TANGO_COORDINATE_FRAME_IMU and TANGO_COORDINATE_FRAME_CAMERA_COLOR (or between any pair of moving coordinate frames), you need to use TangoService_getPoseAtTime instead.
This code snippet should give you the transform you're looking for:
TangoCoordinateFramePair pair;
pair.base = TANGO_COORDINATE_FRAME_IMU;
pair.target = TANGO_COORDINATE_FRAME_CAMERA_COLOR;
TangoPoseData pose;
TangoErrorType result = TangoService_getPoseAtTime(0.0, pair, &pose);
Note also that since both of these coordinate frames move with the device (i.e. they are in a fixed position with respect to the device, and to each other), the pose resulting from this call will not change as the device moves.

Getting pose data for OpenGL Coordinate System using Google Tango with Unity

I am using Unity with Tango and I am having problems getting pose data.
A Unity application built with the Tango Unity SDK runs on the Android device; the device gets pose data and sends it to a computer where additional processing is done using OpenGL.
My question is: in which coordinate system is the pose data returned, since I can't define the engine like I can with the C API?
Unity handles getting pose data like this, and nothing additional can be passed in:
#if UNITY_EDITOR
    GetEmulatedPoseAtTime(poseData, timeStamp, framePair);
#else // ANDROID
    int returnValue = API.TangoService_getPoseAtTime(timeStamp, framePair, poseData);
    if (returnValue != Common.ErrorType.TANGO_SUCCESS)
    {
        Debug.Log(CLASS_NAME + ".GetPoseAtTime() Could not get pose at time : " + timeStamp);
    }
#endif
Just to prove that my application with OpenGL works as it should, I created a Tango project using the C API with the same idea (get pose data and send it):
TangoCoordinateFramePair pair;
pair.base = TANGO_COORDINATE_FRAME_START_OF_SERVICE;
pair.target = TANGO_COORDINATE_FRAME_DEVICE;
base = TANGO_SUPPORT_ENGINE_OPENGL;
target = TANGO_SUPPORT_ENGINE_OPENGL;
error = TangoSupport_getPoseAtTime(poseTimestamp, pair.base, pair.target, base, target, ROTATION_0, &pose);
... and this works.
I thought the data might be in the Tango coordinate system, so I tried to convert the pose data with the C# equivalents of QuatTangoToGl and Vec3GlToTango from here, but it is still not correct.
So, in which coordinate system is the pose data in the Unity SDK, and is it possible to somehow define which engine I want?
I realized that I could expose the function TangoSupport_getPoseAtTime in TangoSupport and add the enums EngineType and RotationType (with values matched to the C API).
So I added this in TangoSupport.cs under TangoSupportAPI:
[DllImport(TANGO_SUPPORT_UNITY_DLL)]
public static extern int TangoSupport_getPoseAtTime(
double timestamp, TangoEnums.TangoCoordinateFrameType baseFrame, TangoEnums.TangoCoordinateFrameType targetFrame,
Common.EngineType baseEngine, Common.EngineType targetEngine, Common.RotationType rotation, [In, Out] TangoPoseData pose);
and added a corresponding wrapper function to the TangoSupport class.
Now I get poses that are set correctly in the OpenGL project.
Without defining the engine types, the returned poseData is in the Tango coordinate system.

Plane fitting with data from previous Tango session

I'm facing the following problem while developing a Tango app and I'm not sure whether I'm on the right track.
What I'm trying to achieve:
The user takes a picture. In the background the app saves the current point cloud and pose to persistent storage.
The server receives that image, does some magic processing behind the scenes, and sends an (x, y) coordinate back to the app (asynchronously and unrelated to the current Tango session).
Restart the app, start a new Tango session and show a 3D object at (x, y) using the persisted copy of the point cloud and pose.
I expect to be able to use these parameters - (x, y), the point cloud and the pose - in the following algorithm and get a Pose, a Rajawali object that the RajawaliRenderer knows how to render.
Tango is initialized with the following coordinate frame pair:
TANGO_WORLD_BASE_COORDINATE_FRAME = new TangoCoordinateFramePair(
        TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
        TangoPoseData.COORDINATE_FRAME_DEVICE
);
Plane fit using the intersection point:
private void convertByIntersectionPoint(float x, float y,
        TangoPointCloudData tangoPointCloudData, TangoPoseData devicePose,
        TangoPoseData colorTdepthPose) {
    if (tangoPointCloudData != null) {
        TangoSupport.IntersectionPointPlaneModelPair intersectionPointPlaneModelPair =
                TangoSupport.fitPlaneModelNearPoint(tangoPointCloudData,
                        colorTdepthPose, x, y);
        if (devicePose.statusCode == TangoPoseData.POSE_VALID) {
            mRenderer.updateObjectPose(
                    intersectionPointPlaneModelPair.intersectionPoint,
                    intersectionPointPlaneModelPair.planeModel,
                    devicePose);
        }
    }
}
It throws a TangoErrorException on TangoSupport.fitPlaneModelNearPoint.
To my understanding, the fitPlaneModelNearPoint method should be a pure algorithm that doesn't rely on the current Tango session, but I cannot be sure because I don't have its implementation.
Any help would be much appreciated.
Okay it was totally my mistake.
There was a bug in serializing the point cloud.
The Gson library does not know how to deserialize into a subclass and always constructs the parent class, which in this case produced corrupted data.
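In case it helps anyone, one way to sidestep the problem (my own sketch, not the exact fix I ended up with) is to avoid serializing the Tango classes directly and persist a plain DTO instead, so Gson can round-trip it cleanly; the layout assumes TangoPointCloudData packs x, y, z, confidence per point, as documented:
// Plain DTO that Gson can serialize and deserialize without knowing about Tango subclasses.
class PersistedScan {
    double timestamp;
    float[] points;        // flattened x, y, z, confidence values
    double[] translation;  // from TangoPoseData
    double[] rotation;

    static PersistedScan from(TangoPointCloudData cloud, TangoPoseData pose) {
        PersistedScan dto = new PersistedScan();
        dto.timestamp = cloud.timestamp;
        dto.points = new float[cloud.numPoints * 4];
        cloud.points.rewind();
        cloud.points.get(dto.points);
        dto.translation = pose.translation;
        dto.rotation = pose.rotation;
        return dto;
    }
}

// Usage:
// String json = new Gson().toJson(PersistedScan.from(cloud, pose));
// PersistedScan restored = new Gson().fromJson(json, PersistedScan.class);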

play-services-vision: How do I sync the Face Detection speed with the Camera Preview speed?

I have some code that allows me to detect faces in a live camera preview and draw a few GIFs over their landmarks, using the play-services-vision library provided by Google.
It works well enough when the face is static, but when the face moves at a moderate speed, the face detector takes longer than the camera's frame rate to detect the landmarks at the face's new position. I know it might have something to do with the bitmap drawing speed, but I've taken steps to minimize the lag there.
(Basically I get complaints that the GIFs' repositioning isn't 'smooth enough'.)
EDIT: I did try getting the coordinate detection code...
List<Landmark> landmarksList = face.getLandmarks();
for (int i = 0; i < landmarksList.size(); i++)
{
    Landmark current = landmarksList.get(i);
    //canvas.drawCircle(translateX(current.getPosition().x), translateY(current.getPosition().y), FACE_POSITION_RADIUS, mFacePositionPaint);
    //canvas.drawCircle(current.getPosition().x, current.getPosition().y, FACE_POSITION_RADIUS, mFacePositionPaint);
    if (current.getType() == Landmark.LEFT_EYE)
    {
        //Log.i("current_landmark", "l_eye");
        leftEyeX = translateX(current.getPosition().x);
        leftEyeY = translateY(current.getPosition().y);
    }
    if (current.getType() == Landmark.RIGHT_EYE)
    {
        //Log.i("current_landmark", "r_eye");
        rightEyeX = translateX(current.getPosition().x);
        rightEyeY = translateY(current.getPosition().y);
    }
    if (current.getType() == Landmark.NOSE_BASE)
    {
        //Log.i("current_landmark", "n_base");
        noseBaseY = translateY(current.getPosition().y);
        noseBaseX = translateX(current.getPosition().x);
    }
    if (current.getType() == Landmark.BOTTOM_MOUTH) {
        botMouthY = translateY(current.getPosition().y);
        botMouthX = translateX(current.getPosition().x);
        //Log.i("current_landmark", "b_mouth "+translateX(current.getPosition().x)+" "+translateY(current.getPosition().y));
    }
    if (current.getType() == Landmark.LEFT_MOUTH) {
        leftMouthY = translateY(current.getPosition().y);
        leftMouthX = translateX(current.getPosition().x);
        //Log.i("current_landmark", "l_mouth "+translateX(current.getPosition().x)+" "+translateY(current.getPosition().y));
    }
    if (current.getType() == Landmark.RIGHT_MOUTH) {
        rightMouthY = translateY(current.getPosition().y);
        rightMouthX = translateX(current.getPosition().x);
        //Log.i("current_landmark", "r_mouth "+translateX(current.getPosition().x)+" "+translateY(current.getPosition().y));
    }
}
eyeDistance = (float) Math.sqrt(Math.pow((double) Math.abs(rightEyeX - leftEyeX), 2) + Math.pow(Math.abs(rightEyeY - leftEyeY), 2));
eyeCenterX = (rightEyeX + leftEyeX) / 2;
eyeCenterY = (rightEyeY + leftEyeY) / 2;
noseToMouthDist = (float) Math.sqrt(Math.pow((double) Math.abs(leftMouthX - noseBaseX), 2) + Math.pow(Math.abs(leftMouthY - noseBaseY), 2));
...in a separate thread within the View draw method, but it just nets me a SIGSEGV error.
My questions:
Is syncing the Face Detector's processing speed with the Camera Preview framerate the right thing to do in this case, or is it the other way around, or is it some other way?
As the Face Detector finds the faces in a camera preview frame, should I drop the frames that the preview feeds before the FD finishes? If so, how can I do it?
Should I just use setClassificationMode(NO_CLASSIFICATIONS) and setTrackingEnabled(false) in a camera preview just to make the detection faster?
Does the play-services-vision library use OpenCV, and which is actually better?
EDIT 2:
I read a research paper claiming that face detection and the other functions available in OpenCV run faster on Android thanks to its higher processing performance. I was wondering whether I can leverage that to speed up the face detection.
There is no way you can guarantee that face detection will be fast enough to show no visible delay, even when the head motion is moderate. Even if you manage to optimize the hell out of it on your development device, you will surely find another model among the thousands out there that is too slow.
Your code should be resilient to such situations. You can predict the face position a second ahead, assuming that it moves smoothly. If the users decide to twitch their head or the device, no algorithm can help.
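As an illustration of the "predict ahead" idea, here is a rough sketch (all names are mine, not from the library) that linearly extrapolates a landmark position from its last two observations:
class LandmarkPredictor {
    private float prevX, prevY, lastX, lastY;
    private long prevT, lastT;  // timestamps in milliseconds

    void update(float x, float y, long timeMs) {
        prevX = lastX; prevY = lastY; prevT = lastT;
        lastX = x; lastY = y; lastT = timeMs;
    }

    // Position extrapolated aheadMs milliseconds into the future.
    float[] predict(long aheadMs) {
        long dt = lastT - prevT;
        if (dt <= 0) {
            return new float[] { lastX, lastY };
        }
        float vx = (lastX - prevX) / dt;  // pixels per millisecond
        float vy = (lastY - prevY) / dt;
        return new float[] { lastX + vx * aheadMs, lastY + vy * aheadMs };
    }
}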
If you use the deprecated Camera API, you should pre-allocate a buffer and use setPreviewCallbackWithBuffer(). This way you can guarantee that the frames arrive at your image processor one at a time. You should also not forget to open the Camera on a background thread, so that the [onPreviewFrame()](http://developer.android.com/reference/android/hardware/Camera.PreviewCallback.html#onPreviewFrame(byte[], android.hardware.Camera)) callback, where your heavy image processing takes place, does not block the UI thread.
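A minimal sketch of that setup with the deprecated android.hardware.Camera API (the buffer size follows the standard NV21 formula from the docs; call this from the background thread that opened the camera):
private void startPreviewWithBuffer(final Camera camera) {
    Camera.Parameters params = camera.getParameters();
    Camera.Size size = params.getPreviewSize();
    int bufferSize = size.width * size.height
            * ImageFormat.getBitsPerPixel(params.getPreviewFormat()) / 8;

    camera.addCallbackBuffer(new byte[bufferSize]);
    camera.setPreviewCallbackWithBuffer(new Camera.PreviewCallback() {
        @Override
        public void onPreviewFrame(byte[] data, Camera camera) {
            // Heavy image processing goes here; no new frame is delivered
            // until the buffer is handed back below.
            camera.addCallbackBuffer(data);
        }
    });
    camera.startPreview();
}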
Yes, OpenCV face detection may be faster in some cases, but more importantly it is more robust than the Google face detector.
Yes, it's better to turn the classifier off if you don't care about smiles and open eyes. The performance gain may vary.
I believe that turning tracking off will only slow the Google face detector down, but you should make your own measurements, and choose the best strategy.
The most significant gain can be achieved by turning setProminentFaceOnly() on, but again I cannot predict the actual effect of this setting for your device.
There's always going to be some lag, since any face detector takes some amount of time to run. By the time you draw the result, you will usually be drawing it over a future frame in which the face may have moved a bit.
Here are some suggestions for minimizing lag:
The CameraSource implementation provided by Google's vision library automatically handles dropping preview frames when needed so that it can keep up the best that it can. See the open source version of this code if you'd like to incorporate a similar approach into your app: https://github.com/googlesamples/android-vision/blob/master/visionSamples/barcode-reader/app/src/main/java/com/google/android/gms/samples/vision/barcodereader/ui/camera/CameraSource.java#L1144
Using a lower camera preview resolution, such as 320x240, will make face detection faster.
If you're only tracking one face, using the setProminentFaceOnly() option will make face detection faster. Using this and LargestFaceFocusingProcessor as well will make this even faster.
To use LargestFaceFocusingProcessor, set it as the processor of the face detector. For example:
Tracker<Face> tracker = *your face tracker implementation*
detector.setProcessor(
new LargestFaceFocusingProcessor.Builder(detector, tracker).build());
Your tracker implementation will receive face updates for only the largest face that it initially finds. In addition, it will signal back to the detector that it only needs to track that face for as long as it is visible.
If you don't need to detect smaller faces, setting setMinFaceSize() larger will make face detection faster. It's faster to detect only larger faces, since the detector doesn't need to spend time looking for smaller ones.
You can turn off classification if you don't need the eyes-open or smile indication. However, this only gives you a small speed advantage.
Using the tracking option will make this faster as well, but at some expense in accuracy. Tracking uses a predictive algorithm for some intermediate frames, to avoid the expense of running full face detection on every frame.
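Putting several of these suggestions together, a detector tuned for speed might look roughly like this (a sketch; GifOverlayTracker stands in for your own Tracker<Face> implementation, and the concrete values are only starting points):
FaceDetector detector = new FaceDetector.Builder(context)
        .setProminentFaceOnly(true)                              // only the single large face
        .setMinFaceSize(0.15f)                                   // skip small faces
        .setClassificationType(FaceDetector.NO_CLASSIFICATIONS)  // no smile / eyes-open classification
        .setLandmarkType(FaceDetector.ALL_LANDMARKS)             // still need eye/mouth points for the GIFs
        .setTrackingEnabled(true)                                // interpolate between frames
        .setMode(FaceDetector.FAST_MODE)
        .build();

detector.setProcessor(
        new LargestFaceFocusingProcessor.Builder(detector, new GifOverlayTracker()).build());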

Is this a Digital Compass or Unity limitation?

I'm interested in AR applications on mobile devices, and naturally I would like to make better use of the compass.
The main issue I've been working against isn't how twitchy the compass is (Angular Smoothing seems to solve that just fine); it's that when the device is held vertically the compass values start freaking out, causing an on-screen compass to flip about all over the place. I don't have a lot of experience with mobile application development, so I'm not sure what is causing this: is it a Unity issue, or just a limitation of the digital compass? I know other apps do seem to be able to use the compass fine in any orientation, but this is all stupidly new to me.
I've definitely tried moving the phone in a figure of 8. The device I have to play around with is a Nexus 4.
using UnityEngine;
using System.Collections;

public class Compass : MonoBehaviour {

    // Use this for initialization
    void Start () {
        Input.location.Start ();
        Input.compass.enabled = true;
    }

    // Update is called once per frame
    void Update ()
    {
        var heading = Input.compass.trueHeading;
        transform.eulerAngles = new Vector3 (0, 0, heading);
    }
}
Preamble :)
First off, I'm not an expert (unfortunately) in the subjects I'm about to talk about, but I've decided to share my thoughts anyway.
Theory
The problem can be generalized in the following way. You want some continuous function that takes a 3D vector (the device orientation, in your case) and returns another vector that is orthogonal to the original. Theory says (see the hairy ball theorem) that for some inputs such a function will return the zero vector. When that function is a compass, the zero vector shows up when the device is held vertically (and this feels quite natural if you have ever used an ordinary compass).
Practice
Sometimes you want your app to tell which direction the back of the phone (the rear camera) is pointing.
Or maybe you even want a combined approach:
If the phone is lying flat, show what the phone's top is pointing at.
If the phone is held vertically, show what the phone's back is pointing at.
In both cases you need to use the gyroscope in addition to the compass.
