I am new to image processing. I have a data set of images and I want to calibrate them based on a target image. I have searched a lot for image calibration, but most of the results are about camera calibration. I am confused as to whether these are the same or different things. Can anybody explain the difference between these two terms?
While reading one of the results on image calibration, I learned that there are three steps I need to perform:
Bias Frame Calibration
Dark Frame Calibration
Flat Field Frame Calibration
Also, I need to do this on Android. For that, I have figured out that I will need to use OpenCV or JavaCV.
So, I want to know whether these 3 steps are possible using OpenCV/JavaCV.
Calibration is a process that exploits some knowledge about the data to reconstruct measurements so that they are more accurate or suit a specific need. As we have no idea what the desired result of your calibration is, it is hard to say more.
In general the difference is as follows:
Camera calibration
You have a camera and want the captured images to satisfy some condition. This process usually means taking images of predefined objects such as color markers, a geometric checkerboard, laser sweeps, etc. This way you obtain the camera parameters needed to reconstruct some specific feature for any other image taken with that camera (assuming the important parameters, like camera position or exposure time, do not change over time).
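As an illustration of what such "camera parameters" can look like: geometric calibration with a checkerboard typically yields intrinsics plus lens distortion coefficients. The sketch below assumes a simple two-term radial model with hypothetical coefficients k1 and k2 (obtained from a prior calibration run) and normalized image coordinates, and shows how those parameters are then applied to any point of any later image.

```java
/** Sketch: two-term radial lens distortion applied to normalized image coordinates. */
final class RadialDistortion {
    private RadialDistortion() {}

    /**
     * Maps an ideal (undistorted) normalized point (x, y) to where the lens
     * actually images it: x_d = x * (1 + k1*r^2 + k2*r^4), likewise for y.
     * k1 and k2 are assumed to come from a previous calibration.
     */
    static double[] distort(double x, double y, double k1, double k2) {
        double r2 = x * x + y * y;                 // squared distance from the optical center
        double factor = 1.0 + k1 * r2 + k2 * r2 * r2;
        return new double[] {x * factor, y * factor};
    }
}
```

For example, distort(0.5, 0.0, -0.2, 0.0) models mild barrel distortion and moves the point inward to x = 0.475. Undistorting an image means inverting this mapping, which calibration libraries such as OpenCV handle for you.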
Image calibration
This is similar, but the input images can come from different sources (different cameras, renders, simulations, etc.) or be taken under different circumstances (exposure, lighting, etc.). In this case we do not have the luxury of a calibration process, so instead we need to find some known feature in the images (for example an object of known size, color, temperature, etc.) and use it to correct the rest of the image.
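A simple instance of correcting an image from a known feature: if every image contains a patch known to be neutral (white or gray), per-channel gains can be derived from that patch and applied to the whole image. This is a minimal sketch, assuming pixels are packed 0xAARRGGBB ints and that the caller already knows where the reference patch is (the patch rectangle and the target level are hypothetical parameters).

```java
/** Sketch: color calibration from a reference patch known to be neutral. */
final class WhitePatchCalibration {
    private WhitePatchCalibration() {}

    /** Averages R, G, B over a rectangular patch of a packed ARGB image. */
    static double[] patchMean(int[] argb, int width, int x0, int y0, int w, int h) {
        double r = 0, g = 0, b = 0;
        for (int y = y0; y < y0 + h; y++) {
            for (int x = x0; x < x0 + w; x++) {
                int p = argb[y * width + x];
                r += (p >> 16) & 0xFF;
                g += (p >> 8) & 0xFF;
                b += p & 0xFF;
            }
        }
        double n = (double) w * h;
        return new double[] {r / n, g / n, b / n};
    }

    /** Scales each channel so that the reference patch maps to the neutral level 'target'. */
    static int[] correct(int[] argb, double[] patchMean, double target) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = clamp(((p >> 16) & 0xFF) * target / patchMean[0]);
            int g = clamp(((p >> 8) & 0xFF) * target / patchMean[1]);
            int b = clamp((p & 0xFF) * target / patchMean[2]);
            out[i] = (p & 0xFF000000) | (r << 16) | (g << 8) | b; // keep original alpha
        }
        return out;
    }

    private static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }
}
```

The patch means must be non-zero; a real implementation would also reject saturated patches, since clipped channels give wrong gains.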
So the difference is that camera calibration applies when you have a single imaging device as the image source, while image calibration applies when you have multiple (often unknown) image sources.
I am not using OpenCV, but since people use this library for such tasks, it should support operations like this.
Here is a small example of such an operation:
OpenCV Birdseye view without loss of data
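Regarding the three steps from the question: bias, dark and flat-field correction boil down to per-pixel arithmetic, which OpenCV can express with element-wise Mat operations such as Core.subtract and Core.divide. The plain-Java sketch below shows the arithmetic itself, assuming single-channel float frames of equal size, a dark frame that is already bias-subtracted and taken at the same exposure as the light frame, and a bias-subtracted flat. Conventions vary between tools, so treat this as one common variant rather than the definitive recipe.

```java
/** Sketch of bias / dark / flat-field correction on row-major float images. */
final class FrameCalibration {
    private FrameCalibration() {}

    /** Pixel-wise mean of several frames, e.g. to build a master bias, dark or flat. */
    static float[] master(float[][] frames) {
        float[] out = new float[frames[0].length];
        for (float[] f : frames) {
            for (int i = 0; i < out.length; i++) out[i] += f[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= frames.length;
        return out;
    }

    /**
     * calibrated = (light - bias - dark) / (flat / mean(flat)).
     * Assumes dark is bias-subtracted at the light frame's exposure and flat is bias-subtracted.
     */
    static float[] calibrate(float[] light, float[] bias, float[] dark, float[] flat) {
        double flatMean = 0;
        for (float v : flat) flatMean += v;
        flatMean /= flat.length;
        float[] out = new float[light.length];
        for (int i = 0; i < light.length; i++) {
            double sensitivity = flat[i] / flatMean;       // relative pixel response
            double signal = light[i] - bias[i] - dark[i];  // remove offset and thermal signal
            out[i] = (float) (sensitivity > 0 ? signal / sensitivity : 0); // guard dead pixels
        }
        return out;
    }
}
```

In practice each master frame is averaged from many exposures with master(...) to reduce noise before calibrate(...) is applied to every light frame.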
I am modifying the TF Lite sample app for object detection (in Java). It has a live video feed that shows boxes around common objects, and it takes in ImageReader frames at 640x480.
I want to use these bounds to crop the items, but I want to crop them from a high-quality image. I think the 5T is capable of 4K.
So, is it possible to run 2 instances of ImageReader: one for the low-quality video feed (used by TF Lite) and one for capturing full-quality still images? I also can't attach the 2nd one to any Surface for user preview; the picture has to be captured in the background.
In this medium article (https://link.medium.com/2oaIYoY58db) it says "Due to hardware constraints, only a single configuration can be active in the camera sensor at any given time; this is called the active configuration."
I'm new to Android, so I couldn't make much sense of this.
Thanks for your time!
PS: as far as I know, this isn't possible with CameraX yet.
As the cited article explains, you can use a lower-resolution preview stream and periodically capture higher-resolution still images. Depending on the hardware, this 'switch' may take time or be really quick.
In your case, I would run a preview capture session at maximum resolution, and shrink (resize) the frames to feed into TFLite when necessary.
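Shrinking full-resolution frames for TFLite can be as simple as a nearest-neighbour resample. This is a minimal sketch, assuming the frame has already been converted to packed ARGB ints in a row-major array; on Android, Bitmap.createScaledBitmap does the same job (optionally with filtering) if you already have a Bitmap.

```java
/** Sketch: nearest-neighbour downscaling of a packed ARGB frame. */
final class FrameDownscaler {
    private FrameDownscaler() {}

    static int[] resize(int[] src, int srcW, int srcH, int dstW, int dstH) {
        int[] dst = new int[dstW * dstH];
        for (int y = 0; y < dstH; y++) {
            int sy = (int) ((long) y * srcH / dstH);     // source row for this output row
            for (int x = 0; x < dstW; x++) {
                int sx = (int) ((long) x * srcW / dstW); // source column for this output column
                dst[y * dstW + x] = src[sy * srcW + sx];
            }
        }
        return dst;
    }
}
```

Because the detector's boxes are in the small frame's coordinates, scale them by srcW/dstW and srcH/dstH before cropping the full-resolution frame.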
If I use the camera2 API to capture an image, I get the "final" image after image processing, i.e. after noise reduction, color correction, some vendor algorithms, etc.
I should also be able to get the raw camera image by following this.
The question is: can I get the intermediate stages of the image as well? For example, let's say the raw image is stage 0, noise reduction is stage 1, color correction is stage 2, etc. I would like to get all of those stages and present them to the user in an app.
In general, no. The actual hardware processing pipelines vary a great deal between different chip manufacturers and chip versions even from the same manufacturer. Plus each Android device maker then adds their own software on top of that.
And often, it's not possible to dump outputs from every step of the process, only some of them.
So making a consistent API for fetching this isn't very feasible, and the camera2 API doesn't have support for it.
You can somewhat simulate it by turning things like noise reduction entirely off (if supported by the device) and capturing multiple images, but that of course isn't as good as multiple versions of a single capture.
I am trying to produce a point cloud where each point has a colour. I can get just the point cloud, or I can get the camera to take a picture, but I need the two to be as simultaneous as possible. If I could look up an RGB image by timestamp, or call a function to get the current frame when onXYZijAvailable() is called, I would be done: I could just go over the points, find out where each one intersects the image plane and get the colour of that pixel.
So far I have not found any way to get the pixel data of an image or to get coloured points. I have seen AR apps where the camera is connected to the CameraView and things are rendered on top, but the camera stream is never touched by the application.
According to this post, it should be possible to get the data I want and synchronize the point cloud and the image plane by a simple transformation. This post also says something similar. However, I have no idea how to get the RGB data. I can't find any open source projects or tutorials.
The closest I have gotten is finding out when a frame is ready by using this:
public void onFrameAvailable(final int cameraId) {
    if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
        // Get the new RGB frame somehow.
    }
}
I am working with the Java API and I would very much like to not delve into JNI and the NDK if at all possible. How can I get the frame that most closely matches the timestamp of my current point cloud?
Thank you for your help.
Update:
I implemented a CPU version of it, and even after optimising it a bit I only managed to get 0.5 FPS on a small point cloud. This is also because the colours have to be converted from the Android-native NV21 colour space to the GPU-native RGBA colour space. I could have optimised it further, but I am not going to get a real-time effect this way; the CPU on the Android device simply cannot perform well enough. If you want to do this on more than a few thousand points, go through the extra hassle of using the GPU, or do it in post-processing.
Tango normally delivers color pixel data directly to an OpenGLES texture. In Java, you create the destination texture and register it with Tango.connectTextureId(), then in the onFrameAvailable() callback you update the texture with Tango.updateTexture(). Once you have the color image in a texture, you can access it using OpenGLES drawing calls and shaders.
If your goal is to color a Tango point cloud, the most efficient way to do this is in the GPU. That is, instead of pulling the color image out of the GPU and accessing it in Java, you instead pass the point data into the GPU and use OpenGLES shaders to transform the 3D points into 2D texture coordinates and look up the colors from the texture. This is rather tricky to get right if you're doing it for the first time but may be required for acceptable performance.
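The per-point math such a shader performs is a pinhole projection followed by a texture lookup. As a CPU-side illustration only (it will be slow for large clouds), the sketch below assumes the points are already expressed in the color camera's coordinate frame, ignores lens distortion, and uses fx, fy, cx, cy as placeholders for the color camera intrinsics; the image is assumed to be packed ARGB ints.

```java
/** Sketch: project a 3D point into the color image and fetch its pixel. */
final class PointProjector {
    private PointProjector() {}

    /**
     * Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
     * Returns null for points at or behind the camera.
     */
    static double[] project(double x, double y, double z,
                            double fx, double fy, double cx, double cy) {
        if (z <= 0) return null;
        return new double[] {fx * x / z + cx, fy * y / z + cy};
    }

    /** Returns the ARGB pixel nearest to (u, v), or 0 if the point is outside the image. */
    static int colorAt(int[] argb, int width, int height, double[] uv) {
        if (uv == null) return 0;
        int u = (int) Math.round(uv[0]);
        int v = (int) Math.round(uv[1]);
        if (u < 0 || v < 0 || u >= width || v >= height) return 0;
        return argb[v * width + u];
    }
}
```

In the GPU version, the same two formulas run in the vertex shader, and (u, v) divided by the image size becomes the texture coordinate used to sample the camera texture.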
If you really want direct access to pixel data without using the C API, you need to render the texture into a buffer and then read the color data from the buffer. It's kind of tricky if you aren't used to OpenGL and writing shaders, but there is an Android Studio app that demonstrates it here, and it is further described in this answer. This project demonstrates both how to draw the camera texture to the screen and how to draw to an offscreen buffer and read RGBA pixels.
If you really want direct access to pixel data but decide that the NDK might be less painful than OpenGLES, the C API has TangoService_connectOnFrameAvailable() which gives you pixel data directly, i.e. without going through OpenGLES. Note, however, that the format of the pixel data is NV21, not RGB or RGBA.
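If you go the NV21 route, you have to convert to RGB yourself. Below is a sketch of the widely used fixed-point BT.601 conversion, assuming even width and height and a complete NV21 buffer: the Y plane (width*height bytes) followed by interleaved V/U byte pairs at half resolution in both directions. As the update above notes, doing this per frame on the CPU is slow.

```java
/** Sketch: NV21 to packed, opaque 0xAARRGGBB pixels (BT.601, fixed point). */
final class Nv21Converter {
    private Nv21Converter() {}

    static int[] toArgb(byte[] nv21, int width, int height) {
        int frameSize = width * height;
        int[] out = new int[frameSize];
        for (int j = 0; j < height; j++) {
            int uvRow = frameSize + (j >> 1) * width;       // VU row shared by rows j and j^1
            for (int i = 0; i < width; i++) {
                int y = Math.max(0, (nv21[j * width + i] & 0xFF) - 16);
                int uvIndex = uvRow + (i & ~1);             // one VU pair per 2x2 pixel block
                int v = (nv21[uvIndex] & 0xFF) - 128;       // NV21 stores V first...
                int u = (nv21[uvIndex + 1] & 0xFF) - 128;   // ...then U
                // BT.601 coefficients scaled by 1024 (1.164, 1.596, 0.813, 0.391, 2.018).
                int y1192 = 1192 * y;
                int r = clamp(y1192 + 1634 * v);
                int g = clamp(y1192 - 833 * v - 400 * u);
                int b = clamp(y1192 + 2066 * u);
                out[j * width + i] = 0xFF000000
                        | ((r << 6) & 0xFF0000)
                        | ((g >> 2) & 0xFF00)
                        | ((b >> 10) & 0xFF);
            }
        }
        return out;
    }

    private static int clamp(int v) {
        return Math.max(0, Math.min(262143, v));            // keep within 18 bits (8 bits << 10)
    }
}
```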
I am doing this now by capturing depth with onXYZijAvailable() and images with onFrameAvailable(). I am using native code, but the same should work in Java. For every onFrameAvailable() I get the image data and put it in a preallocated ring buffer with 10 slots and a counter/pointer. Each new image increments the counter, which wraps from 9 back to 0; the counter is an index into an array of images. I save the image timestamps in a similar ring buffer. When I get a depth image in onXYZijAvailable(), I grab the data and the timestamp. Then I go back through the images, starting with the most recent and moving backwards, until I find the one whose timestamp is closest to that of the depth data. As you mentioned, the image data will not be from the same frame as the depth data, because they use the same camera. But using these two calls (in JNI) I consistently get within +/- 33 ms, i.e. the previous or next frame.
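The ring-buffer scheme described above translates directly to Java. This is a sketch under a few assumptions: timestamps are doubles on a common clock, the stored frame is a copy (callbacks may reuse their buffers), and the class name and generic frame type are hypothetical. A capacity of 10 matches the answer.

```java
/** Sketch: keep the last N color frames and find the one closest in time to a depth frame. */
final class FrameRing<T> {
    private final Object[] frames;
    private final double[] timestamps;
    private int next = 0;   // slot the next frame will be written to
    private int count = 0;  // number of valid slots

    FrameRing(int capacity) {
        frames = new Object[capacity];
        timestamps = new double[capacity];
    }

    /** Call from the color-frame callback; overwrites the oldest slot when full. */
    synchronized void put(T frame, double timestamp) {
        frames[next] = frame;
        timestamps[next] = timestamp;
        next = (next + 1) % frames.length;
        if (count < frames.length) count++;
    }

    /** Call from the depth callback: returns the frame closest in time, or null if empty. */
    @SuppressWarnings("unchecked")
    synchronized T closest(double timestamp) {
        T best = null;
        double bestDiff = Double.POSITIVE_INFINITY;
        // Walk backwards from the most recent frame.
        for (int k = 1; k <= count; k++) {
            int idx = (next - k + frames.length) % frames.length;
            double diff = Math.abs(timestamps[idx] - timestamp);
            if (diff < bestDiff) {
                bestDiff = diff;
                best = (T) frames[idx];
            }
        }
        return best;
    }
}
```

The methods are synchronized because the color and depth callbacks may arrive on different threads.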
I have not checked how close it would be to naively use the most recently updated RGB frame, but that should be pretty close.
Just make sure to use onXYZijAvailable() to drive the timing, because depth updates more slowly than RGB.
I have found that writing individual images to the file system using cv::imwrite() does not keep up with the camera in real time. I have not tried streaming to a file using a video codec, which should be much faster. Depending on what you plan to do with the data in the end, you will need to be careful about how you store your results.
The purpose of my application is to take a photo of a hand (gesture) and compare it with a picture in the database. The first option I used is background subtraction on images:
http://docs.opencv.org/trunk/doc/tutorials/video/background_subtraction/background_subtraction.html.
The solution works, but sometimes, depending on the first picture, the hand is not cut out properly.
The second option is to detect skin color: http://bytefish.de/blog/opencv/skin_color_thresholding/
Or is it better to use hand detection based on XML files? To compare the images, I wanted to use this method: http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html
Let me remind you that I'm talking about comparing images of gestures. I also read that the histogram can be computed only over the subject of the photo instead of the entire image, which makes the data more reliable, but I do not know how to do it.
I want to compare the gesture as a single image, not as a sequence: 1 picture compared with the baseline. The gesture detection is meant to work like this: e.g. something rings, and then I have, for example, 5 seconds to take pictures and compare them with the baseline, because I'm not certain whether the hand appeared in front of the lens or not. Unless there is another solution.
Ultimately, it should be a comparison of two images, each containing some hand gesture.
If your goal is to perform gesture recognition, you should take into account that gestures are sequences of images.
Thus, if you want to compare gestures, you'll have to find a "smart" way to compare whole sequences rather than single images, because one frame can belong to different gestures.
State-of-the-art approaches to gesture recognition involve extracting the optical flow between two consecutive frames and then computing the histogram of optical flow (HOF). Having computed the histograms for all frame pairs in the video sequence, you can use different strategies to compare gestures:
You can concatenate all the HOF in the sequence and then perform histogram intersection to compare the two sequences
You can use the Bag of Words paradigm to create a representation of the HOF
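The first strategy can be sketched in a few lines, assuming a dense optical flow field (u, v components per pixel) has already been computed for each frame pair, for example with OpenCV. Note that this uses plain magnitude-weighted orientation binning, a simplification of the HOOF descriptor from the article cited below, and that concatenation only makes sense when both sequences have the same number of frame pairs.

```java
/** Sketch: simple histograms of optical flow, concatenation, and histogram intersection. */
final class FlowHistograms {
    private FlowHistograms() {}

    /** Magnitude-weighted orientation histogram of one flow field, L1-normalized. */
    static double[] hof(float[] u, float[] v, int bins) {
        double[] h = new double[bins];
        double total = 0;
        for (int i = 0; i < u.length; i++) {
            double mag = Math.hypot(u[i], v[i]);
            if (mag == 0) continue;                       // no motion, no vote
            double angle = Math.atan2(v[i], u[i]);        // (-pi, pi]
            if (angle < 0) angle += 2 * Math.PI;          // [0, 2pi)
            int bin = (int) (angle / (2 * Math.PI) * bins) % bins;
            h[bin] += mag;
            total += mag;
        }
        if (total > 0) for (int b = 0; b < bins; b++) h[b] /= total;
        return h;
    }

    /** Concatenates per-frame-pair histograms into one sequence descriptor. */
    static double[] concat(double[]... hs) {
        int n = 0;
        for (double[] h : hs) n += h.length;
        double[] out = new double[n];
        int pos = 0;
        for (double[] h : hs) {
            System.arraycopy(h, 0, out, pos, h.length);
            pos += h.length;
        }
        return out;
    }

    /** Histogram intersection: sum of bin-wise minima; higher means more similar. */
    static double intersection(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) s += Math.min(a[i], b[i]);
        return s;
    }
}
```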
Here are some pointers for these strategies:
Optical Flow
You can check this article for extracting HOF: "Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions"
Bag of Words
However, if your application requires just the comparison between two images, I would suggest extracting the Histogram of Oriented Gradients (HOG) for each image and then comparing them with the histogram intersection measure or, again, using the Bag of Words paradigm (which is better if you're looking for higher-level representations of the images). HOG is provided by the OpenCV library.
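The core ingredient of HOG is a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes that for one grayscale patch (central differences, unsigned orientations in [0, 180) degrees) and compares two patches with histogram intersection. It deliberately omits HOG's cell and block structure and block normalization, so it is not a full HOG; for real use, prefer OpenCV's implementation.

```java
/** Sketch: gradient orientation histogram of a grayscale patch (a simplified HOG cell). */
final class GradientHistogram {
    private GradientHistogram() {}

    static double[] compute(float[] gray, int width, int height, int bins) {
        double[] h = new double[bins];
        double total = 0;
        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                // Central differences, as in the original HOG formulation.
                double gx = gray[y * width + x + 1] - gray[y * width + x - 1];
                double gy = gray[(y + 1) * width + x] - gray[(y - 1) * width + x];
                double mag = Math.hypot(gx, gy);
                if (mag == 0) continue;
                double angle = Math.toDegrees(Math.atan2(gy, gx)); // (-180, 180]
                if (angle < 0) angle += 180;                        // fold to unsigned
                if (angle >= 180) angle -= 180;                     // now in [0, 180)
                int bin = Math.min(bins - 1, (int) (angle / 180.0 * bins));
                h[bin] += mag;
                total += mag;
            }
        }
        if (total > 0) for (int b = 0; b < bins; b++) h[b] /= total; // L1-normalize
        return h;
    }

    /** Histogram intersection: sum of bin-wise minima; 1.0 means identical normalized histograms. */
    static double intersection(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) s += Math.min(a[i], b[i]);
        return s;
    }
}
```

A plain orientation histogram discards where the edges are, so two different hand shapes can still score similarly; the spatial cells of a full HOG address exactly this.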
I am developing an android camera app. The camera pictures are later processed by OCR, so the picture must be as sharp as possible.
If you shake the camera, it looks as if the digital camera overlays multiple images, to create the effect of motion blur:
Example 1: http://i.stack.imgur.com/nqrmd.jpg
Example 2: http://i.stack.imgur.com/ZBx6F.jpg
If you examine the pictures closely, the motion blur seems to consist of 2 or 3 images taken in quick succession and blended together to simulate light exposure. I understand that this is simply how digital cameras work.
But I'd prefer having a single crisp image rather than a properly exposed one. The app can use histogram corrections to make the text readable again for OCR. The image does not have to appeal to the human eye.
Is there a way to better control the camera to get these sort of raw image snapshots?
I had some limited success using the "Action" scene mode on the camera. Not much, but it's as far as you can get.