OpenCV solvePnP without camera and distortion matrices - Android

I am trying to make an AR app for Android in which the user points the camera at a square marker and a 3D model is shown on top of the marker.
I am using OpenCV to get the rotation and translation of the marker, but...
To get these two matrices, I use solvePnP, for which I have to provide a camera matrix and a
distortion matrix which (from my understanding) are specific to each camera.
Unfortunately, this is a huge drawback for me, since the app should be supported by most Android devices and I also can't ask the user to run the camera calibration procedure (functions provided by OpenCV).
So the question: is there a way to eliminate the camera and distortion matrices? Or is there another way to calculate the marker's 3D position relative to the device?
I tried QCAR and Unity AR, but (since the 3D models are downloaded from a server and change constantly) I was forced to go with OpenCV.
Any help will be really appreciated.
Thanks.

Bad news for you: the answer is no.
You can't know anything about your image if you don't know anything about your camera. The camera is described by two matrices: the camera matrix and the distortion matrix.
But along with the bad news there is some good news: you may be able to do something.
The distortion matrix can be ignored in many AR applications; you just don't care about the small distortions.
And the camera matrix can be constructed if you know the camera's field of view.
Camera matrix (3x3):
fx  0   cx
0   fy  cy
0   0   1
where fx = w / (2 * tan(fovX / 2)), fy = h / (2 * tan(fovY / 2)), cx = w / 2, cy = h / 2, and w, h are the image width and height in pixels.
On Android there's an API for retrieving the field of view: Camera.Parameters.getHorizontalViewAngle() and getVerticalViewAngle().
However, on many phones the returned values aren't correct, because the manufacturers didn't bother to set them properly. They may return bogus angles like 10 or 180 degrees. There's no good way to fix this apart from testing individual phone models.
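For a concrete picture, here is a rough sketch with the OpenCV Java API, assuming camera is an open android.hardware.Camera and objectPoints / imagePoints (MatOfPoint3f / MatOfPoint2f) hold the marker corner correspondences you detect yourself:

// Classes from android.hardware.Camera, org.opencv.core.*, org.opencv.calib3d.Calib3d
Camera.Parameters params = camera.getParameters();
double fovX = Math.toRadians(params.getHorizontalViewAngle());
double fovY = Math.toRadians(params.getVerticalViewAngle());
int w = params.getPreviewSize().width;
int h = params.getPreviewSize().height;

Mat cameraMatrix = Mat.eye(3, 3, CvType.CV_64F);
cameraMatrix.put(0, 0, w / (2.0 * Math.tan(fovX / 2.0))); // fx
cameraMatrix.put(1, 1, h / (2.0 * Math.tan(fovY / 2.0))); // fy
cameraMatrix.put(0, 2, w / 2.0);                          // cx
cameraMatrix.put(1, 2, h / 2.0);                          // cy

MatOfDouble distCoeffs = new MatOfDouble(0, 0, 0, 0);     // ignore distortion
Mat rvec = new Mat();
Mat tvec = new Mat();
Calib3d.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);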
Good luck!

Related

Spatial offset of a virtual object with OpenGL ES perspective

Context: I'm currently working on an Augmented Reality (AR) application using OpenGL ES 2.0 and some AR glasses running on Android. My goal is to display a virtual cursor at the tip of a real object: a screwdriver. Both the glasses' and the screwdriver's locations are tracked by a fixed external camera. The left image just below can give you an idea of the setup.
Things that are working: So far, I'm able to display a virtual 3D object (for example a cube) at a given location in space. For example, I can position it at (more or less 1 cm from) the tip of a tracked screwdriver. When I just rotate my head, the virtual cube gives the impression of "staying at the same place" in the real world, which is nice. This is the behavior I expected, consistent with its real-world anchor.
Issue: However, when I translate my head (and thus translate the OpenGL camera), the cube shows a strange spatial offset, as if it were shifted from the object's tip (case 2 in the drawing above). This shift can be quite significant (up to 5 or 6 cm) and is inconsistent with the real world. But if I align the object exactly with any of the camera axes, the cube seems well placed at the tip of the object, which confuses me.
Question: Is it just a strange visual perspective effect? How can it work with head rotations but not head translations? Did I miss something about perspective projection in OpenGL ES?
Implementation details: The fixed external camera is the origin of the world coordinates. It is very precise and gives me both the world-space position and rotation of each object (including the glasses and the screwdriver). To be precise, it continuously sends this data via Bluetooth to my Android program to make sure what the user sees is up to date.
In case 1, this works like a charm: the camera correctly detects that the screwdriver is at position (0, 0, 1 meter), with whatever rotation; I display a cube centered at that position, and it appears correctly placed. But after a head translation (case 2), the screwdriver is still detected at the correct position (it didn't move, after all), yet the cube is shifted in a way that does not make sense to me.
If it were a small offset, I would put it down to an accumulation of small errors, but here it seems too big to be the only explanation. Depending on the head translation I make, the cube gains a different offset and overall gives the impression of not having a single fixed position in the world.
I am using a perspective projection with the FOV and aspect ratio of the AR glasses. The position of the OpenGL camera is set to the position of the AR glasses, and the look-at values are computed from the direction the head is currently facing.
If I modify the FOV, I lose the expected behavior for head rotations and correct positioning. Finally, I am using the glasses as a stereo display.
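For illustration, a simplified sketch of this kind of per-frame setup with android.opengl.Matrix (a sketch of the approach, not the actual code; fovY, aspect, glassesPos, forward, up and modelMatrix stand in for the tracked values):

float[] projectionMatrix = new float[16];
float[] viewMatrix = new float[16];
Matrix.perspectiveM(projectionMatrix, 0, fovY, aspect, 0.1f, 100f);
Matrix.setLookAtM(viewMatrix, 0,
        glassesPos[0], glassesPos[1], glassesPos[2],               // eye = glasses position
        glassesPos[0] + forward[0], glassesPos[1] + forward[1],
        glassesPos[2] + forward[2],                                 // center = a point straight ahead of the head
        up[0], up[1], up[2]);                                       // up vector of the head

// modelMatrix places the cube at the tracked tip of the screwdriver
float[] viewModel = new float[16];
float[] mvp = new float[16];
Matrix.multiplyMM(viewModel, 0, viewMatrix, 0, modelMatrix, 0);
Matrix.multiplyMM(mvp, 0, projectionMatrix, 0, viewModel, 0);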

How to use the numbers from Game Rotation Vector in Android?

I am working on an AR app that needs to move an image depending on the device's position and orientation.
It seems that the Game Rotation Vector should provide the necessary data to achieve this.
However, I can't seem to understand what the values I get from the GRV sensor represent. For instance, in order to reach the same value on the Z axis I have to rotate the device 720 degrees. This seems odd.
If I could somehow convert these numbers to angles of the device relative to the x, y, z axes, my problem would be solved.
I have googled this issue for days and didn't find any sensible information on the meaning of the GRV coordinates or how to use them.
TL;DR: What do the numbers of the GRV sensor represent? And how do I convert them to angles?
As the docs state, the GRV sensor gives back a 3D rotation vector, represented by the three components that make it up:
x axis: x * sin(θ/2)
y axis: y * sin(θ/2)
z axis: z * sin(θ/2)
This is confusing at first. θ (pronounced theta) is the single angle of the rotation, and (x, y, z) is the unit axis the device has been rotated around; each component of the vector is the corresponding axis component scaled by sin(θ/2), which isn't obvious at all from the raw numbers.
Note also that when working with angles, especially in 3D, we generally use radians, not degrees, so theta is in radians. This looks like a good introductory explanation.
But the reason it's given to us in this format is that it can be used directly in matrix rotations, especially as a quaternion. In fact, these are the vector components of a unit quaternion; the remaining component is the scalar part, cos(θ/2), and together the four numbers completely describe the 3D rotation.
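If you want the whole quaternion rather than just its vector part, the platform will compute it for you. A one-line sketch, assuming event is the SensorEvent delivered for the Game Rotation Vector sensor:

float[] q = new float[4];
// Fills q as [w, x, y, z], i.e. [cos(θ/2), x*sin(θ/2), y*sin(θ/2), z*sin(θ/2)]
SensorManager.getQuaternionFromVector(q, event.values);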
These are directly usable in OpenGL, which is the 3D library of choice on Android (and in the rest of the world). Check out this tutorial for some OpenGL rotation info, this one for some general quaternion theory as applied to 3D programming, and this example by Google for Android, which shows exactly how to use this information directly.
If you read the articles, you can see why it's given to you in this form and why it's called the Game Rotation Vector: it's what 3D game programmers have been using for decades at this point.
TL;DR: This example is excellent.
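To answer the "how do I convert them to angles?" part directly, here is a small sketch (illustrative class and field names, not taken from the linked example) that turns the Game Rotation Vector into a rotation matrix and then into azimuth/pitch/roll angles:

import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

public class GrvAngles implements SensorEventListener {
    private final float[] rotationMatrix = new float[9];
    private final float[] orientation = new float[3]; // azimuth, pitch, roll (radians)

    @Override
    public void onSensorChanged(SensorEvent event) {
        if (event.sensor.getType() == Sensor.TYPE_GAME_ROTATION_VECTOR) {
            // rotation vector -> 3x3 rotation matrix
            SensorManager.getRotationMatrixFromVector(rotationMatrix, event.values);
            // rotation matrix -> angles around the device axes
            SensorManager.getOrientation(rotationMatrix, orientation);
            double azimuthDeg = Math.toDegrees(orientation[0]);
            double pitchDeg = Math.toDegrees(orientation[1]);
            double rollDeg = Math.toDegrees(orientation[2]);
            // use the angles to position/rotate your image
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}

Register an instance with SensorManager.registerListener for Sensor.TYPE_GAME_ROTATION_VECTOR and the angles update on every sensor event.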
Edit - How to use this to show a 2D image which is rotated by this vector in 3D space.
In the example above, SensorManager.getRotationMatrixFromVector converts the Game Rotation Vector into a rotation matrix which can be applied to rotate anything in 3D. To apply this rotation to a 2D image, you have to think of the image in 3D: it's actually a segment of a plane, like a sheet of paper. You map your image, which in the jargon is called a texture, onto this plane segment.
Here is a tutorial on texturing cubes in OpenGL for Android, with example code and an in-depth discussion. From cubes it's a short step to a plane segment: it's just one face of a cube! In fact, that's a good resource for getting to grips with OpenGL on Android; I'd recommend reading the previous and subsequent tutorial steps too.
You also mentioned translation. Look at the onDrawFrame method in the Google code example: there is a translation using gl.glTranslatef and then a rotation using gl.glMultMatrixf. This is how you translate and rotate.
The order in which these operations are applied matters. Here's a fun way to experiment with that: check out Livecodelab, a live 3D sketch-coding environment which runs inside your browser. In particular, this tutorial encourages reflection on the ordering of operations. Obviously the command move is a translation.
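As a rough sketch of that onDrawFrame ordering (fixed-function GL10 pipeline; rotationMatrix is assumed to be a float[16] filled elsewhere from the rotation vector):

public void onDrawFrame(GL10 gl) {
    gl.glClear(GL10.GL_COLOR_BUFFER_BIT | GL10.GL_DEPTH_BUFFER_BIT);
    gl.glMatrixMode(GL10.GL_MODELVIEW);
    gl.glLoadIdentity();
    gl.glTranslatef(0f, 0f, -3f);        // place the object a few units in front of the viewer
    gl.glMultMatrixf(rotationMatrix, 0); // rotate it about its own centre (applied to the vertices before the translation)
    // draw the textured plane segment here
}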

Dynamic Environment mapping from camera in Augmented Reality setting

I am trying to implement something like the technique described in this (old) paper to use the phone camera's video frames to create an illusion of environment mapping in an AR app.
I want to take the camera frame, divide it into sub-areas and then use those as faces of the cube map. The division of the camera frame would look something like this:
Now the X area is easy; I can use glCopyTexImage2D to copy that square area to my cubemap texture. But I need help with the trapezoid-shaped areas around X (forget about the triangles for now).
How can I take those trapezoidal areas and distort them into square textures? I think I need the opposite of the perspective projection that happens later on, so that the two cancel each other out in the final render when I render the cubemap as a skybox around my camera (does that explain what I want?).
Before doing this I tried the simpler step of putting the square X area on every side of the cubemap, just to see if glCopyTexImage2D can even be used for this. It can, but the results are not rotated correctly: some faces are "upside down" when I render the cubemap as a skybox. The related question is: how can I rotate them before using them as textures?
I also thought about attacking the problem from the other side and modifying the texture coordinates to make the necessary adjustments, but that also does not seem easy, since the lookup in the fragment shader with textureCube is more complicated than a normal texture lookup.
Any ideas?
I'm trying to do this in my AR app on Android with OpenGL ES 2.0 but I guess more general OpenGL advice might also be useful.
Update
I have come to the conclusion that this is not worth pursuing anymore. The paper makes it look nice, but my experiments with a phone camera revealed a major contradiction: if you want to reflect the environment in an object rendered in AR, the camera's view is very limited. When the camera is far from the tracked object, you have enough environment information for a good reflection, but you will barely see it because the camera is far away. When you bring the camera closer to see the reflection in detail, the tracked object fills most of the camera's field of view and there is barely any environment left to reflect. Either way you lose, and the result is not worth the effort.
It seems that you need to create a mesh with the UV mapping described in the article, render it with the camera texture into another texture, and then use that as the cubemap.
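A rough GLES20 sketch of that idea (not a complete implementation; cubemapTextureId, faceSize and copyProgram are placeholders): render the camera texture through a quad whose UVs pick out the trapezoidal sub-area, directly into one cubemap face via a framebuffer object.

int[] fbo = new int[1];
GLES20.glGenFramebuffers(1, fbo, 0);
GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fbo[0]);
GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER, GLES20.GL_COLOR_ATTACHMENT0,
        GLES20.GL_TEXTURE_CUBE_MAP_POSITIVE_Z, cubemapTextureId, 0);
GLES20.glViewport(0, 0, faceSize, faceSize);

// Draw a quad that fills the face, sampling the camera texture; the per-vertex
// texture coordinates are the corners of the trapezoid in the camera frame, so
// rasterisation stretches that region onto the square face (and you can flip or
// rotate it here by reordering the coordinates).
GLES20.glUseProgram(copyProgram);
// ... bind the camera texture and the position / UV attributes ...
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);

GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); // back to the default framebuffer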

Android camera calibration without chessboard

I know there are some posts about this topic, but I could not find my answer.
I want to calibrate my Android camera without a chessboard for 3D reconstruction, so I need the intrinsic and extrinsic parameters.
My first goal is to extract the real 3D coordinate system so I can place a 3D model on screen.
My steps:
From a picture of a building, extract 4 points that represent the real 3D system
/!\ this step requires camera calibration /!\
Convert them to 3D points (solvePnP, for example)
Then, from my 3D axes, create an OpenGL projection and modelview matrix
My main problem is that I want to avoid a calibration step, so how can I calibrate without a chessboard? I have some data from Android, such as the focal length, and I can guess that the projection center is the center of my camera picture.
Any idea or advice? Or another way to do it?
Here is the no-chessboard calibration from qtcalib:
This scheme is recommended when you need to obtain a camera calibration
from an image that doesn't contain a calibration chessboard. In this case, you
can approximate the camera calibration if you know 4 points in the
image forming a flat rectangle in the real world. It is important to remark
that the approximated calibration depends on the 4 selected points and
on the values you set for the dimensions of the rectangle.
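A hedged sketch of how that approximation might look with the OpenCV Java API; camera is an open android.hardware.Camera, rectW and rectH are the real-world dimensions you choose for the rectangle, and imagePoints is a MatOfPoint2f holding the 4 selected corners in the same order:

// Classes from org.opencv.core.*, org.opencv.calib3d.Calib3d
Camera.Parameters params = camera.getParameters();
double fovX = Math.toRadians(params.getHorizontalViewAngle());
int w = params.getPreviewSize().width;
int h = params.getPreviewSize().height;
double fx = w / (2.0 * Math.tan(fovX / 2.0));

Mat cameraMatrix = Mat.eye(3, 3, CvType.CV_64F);
cameraMatrix.put(0, 0, fx);
cameraMatrix.put(1, 1, fx);       // assume square pixels
cameraMatrix.put(0, 2, w / 2.0);  // principal point assumed at the image centre
cameraMatrix.put(1, 2, h / 2.0);

MatOfPoint3f objectPoints = new MatOfPoint3f(
        new Point3(0, 0, 0), new Point3(rectW, 0, 0),
        new Point3(rectW, rectH, 0), new Point3(0, rectH, 0));
Mat rvec = new Mat();
Mat tvec = new Mat();
Calib3d.solvePnP(objectPoints, imagePoints, cameraMatrix,
        new MatOfDouble(0, 0, 0, 0), rvec, tvec);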

Difference between Camera.translate and Matrix.preTranslate or Matrix.postTranslate?

We use Camera to do 3D transformations on a Canvas. We usually rotate the camera, get its Matrix, and then translate it. But Camera also has a translate method, and the two approaches give different results.
My question is : What is difference between Camera.translate and Matrix.preTranslate or Matrix.postTranslate?
The reason there are both is that matrix multiplications must be done in a certain order to achieve the proper result (as you may already know).
The sequence of translations/rotations/scales is applied in the reverse of the order in which you write them.
So if you do something like this:
Camera.rotate(15, 0, 0);
Camera.scale(.5f, .5f, .5f);
Camera.translate(70, 70, 70);
You're first translating by 70, 70, 70, then scaling by 50% in all directions, then rotating 15 degrees about the X axis.
So Matrix has pre- and post-translate (well, pre and post everything), because maybe you actually want to rotate first by 15 degrees, then translate, and only then scale.
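A tiny sketch of that difference with android.graphics.Matrix directly (arbitrary values):

Matrix m1 = new Matrix();
m1.setRotate(15f);
m1.preTranslate(70f, 70f);   // points are translated first, then rotated
Matrix m2 = new Matrix();
m2.setRotate(15f);
m2.postTranslate(70f, 70f);  // points are rotated first, then translated

float[] p1 = {100f, 0f};
float[] p2 = {100f, 0f};
m1.mapPoints(p1);
m2.mapPoints(p2);            // p1 and p2 end up in different places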
That answers the pre- and post-translates. Now, the reason Camera has a plain rotate and translate is for people who already know how this works (like me!), so I never use Matrix, or Camera for that matter, because I can do my rotations and translations directly on the Canvas. You can too, as long as you know that translations, scales, and rotations are applied in reverse order.
Also, knowing this gives you more power. You can do a sequence of 10 transformations without wrapping each one in its own Matrix object (for example, a swing motion that swings outward AND rotates about the center to simulate centrifugal force). That would otherwise need multiple rotates and translates wrapped in multiple Matrix objects passed into one another, but if you know how each operation composes, you can simply do a series of .translate(), .rotate(), and .scale() calls.
This information is especially useful if you ever do 3D graphics, because that's when these matrices give people headaches.
I hope this helps!
The result would be visually the same if you, for example, leave the canvas untouched but rotate the camera by 90 degrees, or keep the camera still and rotate the canvas it looks at by -90 degrees.
