Our company wants to develop an Android app that analyses a 2D camera's video frames. By integrating the OpenNI SDK, the app can get human body keypoints; from those keypoints it finds the hand keypoints and the hand's location in the frame. After doing some research, though, we have found some problems:
OpenNI's repository on GitHub has not been maintained for 6 years.
Can we use OpenNI to get human body keypoints from a 2D picture or frame? If so, how?
If OpenNI can't do this, is there any other way to solve this on Android?
I am developing an Android app in which I want to track a 2D image (a piece of paper), analyze what the user writes/draws on it, and correctly display different 3D content on it.
I am working on the part that tracks the paper and displays simple 3D content, which can actually be achieved using SDKs like Vuforia and Wikitude. However, I am not using them, for several reasons:
There is other analysis to be done on the image, e.g. analysis of the drawings.
The image may not be rich in features, e.g. a paper with just lines or a few figures.
SDKs like Vuforia may not expose some underlying functionality, like feature detection, to developers.
Anyway, right now I only want to achieve the following result.
I have a piece of paper, probably with lines and figures on it. You can think of it as the kind of paper for children to practice writing or drawing on. Example: https://i.pinimg.com/236x/89/3a/80/893a80336adab4120ff197010cd7f6a1--dr-seuss-crafts-notebook-paper.jpg
I point my phone (the camera) at the paper while capturing the video frames.
I want to register the paper, track it and display a simple wire-frame cube on it.
I have been messing around with OpenCV, and have tried the following approaches.
Using homography:
1. Detect features in the 2D image (ORB, FAST, etc.).
2. Describe the features (ORB).
3. Do the same in each video frame.
4. Match the features and find the good matches.
5. Find the homography, and use it to successfully draw a rectangle around the image in the video frame.
Problem: I did not know how to use the homography decomposition (into rotations, translations and normals) to display a 3D object like a cube.
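Condensed, and slightly simplified, that attempt looks roughly like this in OpenCV (3.x API; refGray is the reference picture of the paper, frameGray the current frame, K my camera matrix, and homographyStep is just a name for this sketch):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Condensed sketch of steps 1-5 plus the decomposition I am stuck on.
// refGray is the reference picture, frameGray the current camera frame,
// K the 3x3 camera intrinsic matrix (all assumed to exist already).
void homographyStep(const cv::Mat& refGray, const cv::Mat& frameGray, const cv::Mat& K)
{
    // 1-4: detect, describe and match ORB features (cross-check as a cheap filter).
    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
    std::vector<cv::KeyPoint> kpRef, kpFrame;
    cv::Mat descRef, descFrame;
    orb->detectAndCompute(refGray, cv::noArray(), kpRef, descRef);
    orb->detectAndCompute(frameGray, cv::noArray(), kpFrame, descFrame);

    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(descRef, descFrame, matches);

    std::vector<cv::Point2f> ptsRef, ptsFrame;
    for (const cv::DMatch& m : matches) {
        ptsRef.push_back(kpRef[m.queryIdx].pt);
        ptsFrame.push_back(kpFrame[m.trainIdx].pt);
    }
    if (ptsRef.size() < 4) return;

    // 5: robust homography from the reference image to the frame
    // (this is what I use to draw the rectangle around the paper).
    cv::Mat H = cv::findHomography(ptsRef, ptsFrame, cv::RANSAC, 3.0);
    if (H.empty()) return;

    // This is where I am stuck: decomposeHomographyMat() gives up to 4 candidate
    // {R, t, n} solutions, and I don't know how to pick one and use it to draw a cube.
    std::vector<cv::Mat> rotations, translations, normals;
    cv::decomposeHomographyMat(H, K, rotations, translations, normals);
}
```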
Using solvePnP:
Steps 1 to 4 are the same as above.
5. Convert all the good-match 2D points in the image to 3D by assuming the image lies on the world's x-y plane, so they all have z = 0.
6. Use solvePnP with those 3D points and the corresponding 2D points in the current frame to retrieve the rotation and translation vectors, convert the rotation vector to a rotation matrix with Rodrigues(), and build the projection matrix from it.
7. Construct the 3D points of a cube.
8. Project them into the 2D image using the projection matrix and the camera matrix.
Problem: the cube jumps around, which I believe is because the feature detection and matching are not stable or accurate enough, which in turn throws off solvePnP.
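A condensed sketch of this variant (ptsRef/ptsFrame are the good matches from above, K/distCoeffs my camera intrinsics; I have written it here with solvePnPRansac() instead of plain solvePnP(), since rejecting outlier matches seems like an obvious thing to try against the jitter):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Condensed sketch of the solvePnP variant. ptsRef/ptsFrame are the good matches
// from the previous step, K and distCoeffs the camera intrinsics/distortion, and
// frame the current BGR camera image.
void solvePnPStep(const std::vector<cv::Point2f>& ptsRef,
                  const std::vector<cv::Point2f>& ptsFrame,
                  const cv::Mat& K, const cv::Mat& distCoeffs,
                  cv::Mat& frame)
{
    // 5: lift the reference points to 3D on the world x-y plane (z = 0).
    std::vector<cv::Point3f> objectPts;
    for (const cv::Point2f& p : ptsRef)
        objectPts.emplace_back(p.x, p.y, 0.0f);

    // 6: rotation/translation vectors. The RANSAC variant rejects outlier matches.
    cv::Mat rvec, tvec;
    cv::solvePnPRansac(objectPts, ptsFrame, K, distCoeffs, rvec, tvec);
    cv::Mat R;
    cv::Rodrigues(rvec, R);  // rotation vector -> 3x3 rotation matrix, if [R|t] is needed explicitly

    // 7: a cube standing on the paper (edge length is arbitrary here).
    const float s = 100.0f;
    std::vector<cv::Point3f> cube = {
        {0, 0, 0}, {s, 0, 0}, {s, s, 0}, {0, s, 0},
        {0, 0, -s}, {s, 0, -s}, {s, s, -s}, {0, s, -s}};

    // 8: project the cube corners into the frame and draw the wire-frame edges.
    std::vector<cv::Point2f> img;
    cv::projectPoints(cube, rvec, tvec, K, distCoeffs, img);
    for (int i = 0; i < 4; ++i) {
        cv::line(frame, img[i], img[(i + 1) % 4], cv::Scalar(0, 255, 0), 2);          // bottom face
        cv::line(frame, img[i + 4], img[(i + 1) % 4 + 4], cv::Scalar(0, 255, 0), 2);  // top face
        cv::line(frame, img[i], img[i + 4], cv::Scalar(0, 255, 0), 2);                // vertical edges
    }
}
```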
Using contours or corners:
I simply grayscale the camera frame, Gaussian-smooth it, dilate or erode it, and try to find the biggest four-sided contour so that I can track it using solvePnP etc. This, unsurprisingly, doesn't give good results, or I'm just doing it wrong.
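Roughly, that attempt looks like this (the helper name is mine and the thresholds are arbitrary, not tuned):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Rough sketch of the contour attempt: find the largest convex four-sided contour
// in the frame and return its corners (which could then feed solvePnP).
bool findPaperQuad(const cv::Mat& frameBgr, std::vector<cv::Point2f>& corners)
{
    cv::Mat gray, edges;
    cv::cvtColor(frameBgr, gray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(5, 5), 0);
    cv::Canny(gray, edges, 50, 150);
    cv::dilate(edges, edges, cv::Mat());  // close small gaps along the paper's border

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    double bestArea = 0.0;
    for (const std::vector<cv::Point>& c : contours) {
        std::vector<cv::Point> approx;
        cv::approxPolyDP(c, approx, 0.02 * cv::arcLength(c, true), true);
        double area = cv::contourArea(approx);
        if (approx.size() == 4 && cv::isContourConvex(approx) && area > bestArea) {
            bestArea = area;
            corners.clear();
            for (const cv::Point& p : approx)
                corners.push_back(cv::Point2f((float)p.x, (float)p.y));
        }
    }
    return bestArea > 0.0;
}
```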
So my questions are:
How can I solve the two problems highlighted above (using the homography decomposition to display a cube, and the jumping cube in the solvePnP approach)?
More generally, given the type of image target I want to track, what would be the optimal algorithm/solution/technique to track it?
What are the things that I can improve/change in my way of solving the problem?
Thank you very much.
I'm trying to create a simple AR simulation in Unity, and I want to speed up re-localization from the ADF (Area Description File) after I lose tracking in-game. For example, is it better to have landmarks that are unchanging 3D shapes in the environment, or is it better to have landmarks that are 2D markings?
If it has to be one of these two, I would say 2D markings (visual features) would be preferred. First, Tango does not use the depth sensor for relocalization or pose estimation, so 3D geometry does not necessarily help tracking. As an extreme case, if the device is in a pure-white environment (with no shadows) full of boxes, it will still lose tracking eventually, because there are no visual features to track.
On the other hand, consider an empty room with lots of posters in it. Even though it is not that "interesting" geometrically, it is good for tracking because it has enough visual features to track.
Tango's motion tracking API uses a MonoSLAM algorithm. It uses the wide-angle camera and the motion sensors to estimate the device's pose; it does not take depth information into account when estimating the device's pose vector.
In general, SLAM algorithms use feature detectors such as Harris corner detection or FAST to detect and track features. So it is better to put up 2D markers that are rich in features, say a random pattern or a painting. This will help feature tracking in MonoSLAM and produce a rich ADF. Putting up 2D patterns in different places and at different heights will further improve Project Tango's tracking.
Regards, community.
I just want to build an app similar to this one, with my own content of course. My questions:
1. How do I capture 360-degree video (cameras, format, ingest, audio)?
2. Implementation:
2.1 Which Cardboard SDK works best for my purposes: Android or Unity?
2.2 Do you know of any blogs, websites, tutorials, or samples I can lean on?
Thank you
MovieTextures are a great way to do this in Unity, but unfortunately they are not implemented on Android (maybe this will change in Unity 5). See the docs here:
For a simple wrap-a-texture-on-a-sphere app, the Cardboard Java SDK should work. But if you would rather use Unity due to other aspects of the app, the next best way to do this would be to allocate a RenderTexture and then grab the GL id and pass it to a native plugin that you would write.
This native code would decode the video stream and, for each frame, fill the texture. Unity can then handle the rest of the rendering, as detailed in the previous answer.
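To give an idea of the shape of such a plugin, here is a minimal sketch under the assumption that the decoded frames arrive as RGBA buffers; the exported function names are made up, and the video decoding itself is not shown:

```cpp
#include <GLES2/gl2.h>
#include <cstdint>

// Hypothetical plugin entry points: Unity allocates the texture, passes its native
// GL id down once, and the plugin copies each decoded RGBA video frame into it.
// The decoder that calls UploadFrame() is not shown.
static GLuint g_textureId = 0;

extern "C" void SetTargetTexture(uint32_t texId)
{
    // Id obtained on the Unity side via Texture.GetNativeTexturePtr().
    g_textureId = (GLuint)texId;
}

extern "C" void UploadFrame(const uint8_t* rgba, int width, int height)
{
    if (g_textureId == 0 || rgba == nullptr)
        return;

    // Must run on Unity's rendering thread (e.g. triggered via GL.IssuePluginEvent),
    // because it touches the GL context that Unity owns.
    glBindTexture(GL_TEXTURE_2D, g_textureId);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}
```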
First of all, you need content, and to record stereo 360 video, you'll need a rig of at least 12 cameras. Such rigs can be purchased for GoPro cams. That's gonna be expensive.
The recently released Unity 5 is a great option and I strongly suggest using it. The most basic way of doing 360 stereo video in Unity is to create two spheres with MovieTextures showing your 360 video. Then you turn them "inside out" so that they display their back faces instead of their front faces. This can be done with a simple shader that turns on front-face culling and removes the mirror effect. Then you place your cameras inside the spheres. If you are using the Google Cardboard SDK camera rig, put the spheres on different culling layers and make each camera see only the appropriate sphere. Remember to position the spheres properly relative to the cameras.
There may be other ways to do this, leading to better results, but they won't be as simple. You can also look for some paid scripts/plugins/assets to do 360 video in VR.
I have implemented an augmented reality program using Qualcomm's Vuforia library. Now I want to add an optical character recognition feature so that I can translate text from one language to another in real time. I am planning to use the Tesseract OCR library, but my question is: how do I integrate Tesseract with QCAR?
Can somebody suggest the proper way to do it?
What you need is access to the camera frames so you can send them to Tesseract. The Vuforia SDK offers a way to access the frames via the QCAR::UpdateCallback interface (documentation here).
Create a class that implements this interface and register it with the Vuforia SDK using QCAR::registerCallback() (see here); from then on you will be notified each time the Vuforia SDK has processed a frame.
The callback is given a QCAR::State object, from which you can access the camera frame (see the documentation for QCAR::State::getFrame() here) and send it to the Tesseract SDK.
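In outline, and going from memory of the QCAR headers (so double-check against your SDK version), the callback could look like this; sendToTesseract() stands for whatever bridge to Tesseract you write yourself:

```cpp
#include <QCAR/QCAR.h>
#include <QCAR/UpdateCallback.h>
#include <QCAR/State.h>
#include <QCAR/Frame.h>
#include <QCAR/Image.h>

// Assumed helper, not part of either SDK: hands the grayscale pixels to Tesseract.
void sendToTesseract(const unsigned char* pixels, int width, int height, int stride);

// The update callback: called once per processed frame by the Vuforia SDK.
class OcrUpdateCallback : public QCAR::UpdateCallback
{
public:
    virtual void QCAR_onUpdate(QCAR::State& state)
    {
        const QCAR::Frame frame = state.getFrame();
        for (int i = 0; i < (int) frame.getNumImages(); ++i) {
            const QCAR::Image* image = frame.getImage(i);
            if (image->getFormat() == QCAR::GRAYSCALE) {
                // 8-bit grayscale pixels, one row every getStride() bytes.
                sendToTesseract(static_cast<const unsigned char*>(image->getPixels()),
                                image->getWidth(), image->getHeight(), image->getStride());
            }
        }
    }
};

// During initialisation, after the SDK has been set up:
//   QCAR::setFrameFormat(QCAR::GRAYSCALE, true);  // ask the SDK to deliver grayscale frames
//   static OcrUpdateCallback callback;
//   QCAR::registerCallback(&callback);
```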
But be aware of the fact that the Vuforia SDK works with frames in a rather low resolution (on many phones I tested, it returns frames in the 360x240 to 720x480 range, and more often the former than the latter), which may not be accurate enough for Tesseract to detect text.
As complementary information to mbrenon's answer: Tesseract only does text recognition and does not support ROI text extraction, so you will need to add that to your system after capturing the image.
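For illustration, the Tesseract side could look roughly like this, assuming you have already located a text region in an 8-bit grayscale frame yourself; the tessdata path and the language are placeholders:

```cpp
#include <tesseract/baseapi.h>

// Minimal sketch: run Tesseract only on a text region (left, top, width, height)
// that you detected yourself in a grayscale camera frame.
char* recognizeRegion(const unsigned char* gray, int frameWidth, int frameHeight, int stride,
                      int left, int top, int width, int height)
{
    tesseract::TessBaseAPI api;
    if (api.Init("/sdcard/tesseract/", "eng") != 0)  // directory containing tessdata (placeholder)
        return nullptr;

    // 1 byte per pixel, `stride` bytes per row.
    api.SetImage(gray, frameWidth, frameHeight, 1, stride);
    api.SetRectangle(left, top, width, height);      // restrict recognition to the detected ROI

    char* text = api.GetUTF8Text();                  // caller must delete[] the returned string
    api.End();
    return text;
}
```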
You can read these academic papers, which report on the additional steps needed to use Tesseract on mobile phones and provide some performance evaluations:
TranslatAR: Petter, M.; Fragoso, V.; Turk, M.; Baur, C., "Automatic text detection for mobile augmented reality translation," 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 48-55, Nov. 2011.
Mobile Camera Based Detection and Translation
I am developing a simple AR application which renders a 3D image on top of the camera view. I implemented it successfully on Windows 7, using OpenCV's native pose estimation functions (which internally use the POSIT algorithm) to obtain the translation and rotation matrices that are applied to the 3D model.
I want to implement the same functionality in an Android application. The problem I am facing is that the pose estimation function takes the camera's intrinsic and distortion parameters as an argument, and I am not able to find these out.
I tried studying various AR platforms (AndAR, ARToolKit, etc.), but even after reverse engineering their sources I could not reach any conclusion about how they handle pose estimation.
Please suggest an appropriate method for pose estimation (and, if it involves camera distortion parameters, how I would determine them) and hence for rendering a 3D object over the camera view in an Android application.
OpenCV is a great library which gives you the methods to perform camera calibration. If you want to figure out what your camera's intrinsic parameters are, see the camera calibration documentation. Much of that functionality is also implemented in the Android OpenCV library.
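As a rough illustration of what that calibration looks like with OpenCV (a sketch, not a drop-in implementation; board size, square size and the list of image files are placeholders to adapt to your own chessboard snapshots):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>

// Chessboard calibration sketch: collect corner detections from several photos,
// then recover the camera matrix and distortion coefficients.
void calibrateFromChessboards(const std::vector<std::string>& imageFiles)
{
    const cv::Size boardSize(9, 6);   // inner corners of the printed chessboard
    const float squareSize = 25.0f;   // edge length of one square, e.g. in millimetres

    // The same z = 0 corner layout is reused for every view.
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            corners3d.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.0f));

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    for (const std::string& file : imageFiles) {
        cv::Mat gray = cv::imread(file, cv::IMREAD_GRAYSCALE);
        if (gray.empty()) continue;
        imageSize = gray.size();

        std::vector<cv::Point2f> corners2d;
        if (cv::findChessboardCorners(gray, boardSize, corners2d)) {
            cv::cornerSubPix(gray, corners2d, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
            objectPoints.push_back(corners3d);
            imagePoints.push_back(corners2d);
        }
    }

    // Intrinsic matrix, distortion coefficients and per-view extrinsics.
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);

    std::cout << "RMS reprojection error: " << rms << "\n"
              << "Camera matrix:\n" << cameraMatrix << "\n"
              << "Distortion coefficients:\n" << distCoeffs << std::endl;
}
```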
Finally you can also use the internal sensors Android ships with to obtain Gravity, Linear Acceleration and Orientation. See this inspiring talk by David Sachs on how to use Sensor Fusion on Android.
I was able to perform camera calibration for the mobile device: I took chessboard snapshots with the mobile camera from various angles and then used them in the C++ camera calibration code.
Then, using these intrinsic and extrinsic camera parameters, I worked on the pose estimation module.
I parsed the XML files and built the matrices.
Then, using this OpenCV tutorial (OpenCV POSIT), I was able to perform pose estimation.
However, I have yet to verify it in an actual program. Theoretically it is achievable, and the code does not give any build or compilation errors.