It seems I've found myself in the deep weeds of the Google Vision API for barcode scanning. Perhaps my mind is a bit fried after looking at all sorts of alternative libraries (ZBar, ZXing, and even some for-cost third party implementations), but I'm having some difficulty finding any information on where I can implement some sort of scan region limiting.
The use case is a pretty simple one: if I'm a user pointing my phone at a box with multiple barcodes of the same type (think shipping labels here), I want to explicitly point some little viewfinder or alignment straight-edge on the screen at exactly the thing I'm trying to capture, without having to worry about anything outside that area of interest giving me some scan results I don't want.
The above case is handled in most other Android libraries I've seen, taking in either a Rect with relative or absolute coordinates, and this is also a part of iOS' AVCapture metadata results system (it uses a relative CGRect, but really the same concept).
I've dug pretty deep into the sample app for the barcode-reader here, but the implementation is a tad opaque, so it's hard to get anything more than the high-level implementation details down.
It seems like an ugly patch to detect barcodes anywhere within the camera's preview frame and then simply no-op on the ones outside the area of interest, since the device is still working hard to process those full frames.
Am I missing something very simple and obvious on this one? Any ideas on a way to implement this cleanly, otherwise?
Many thanks for your time in reading through this!
The API currently does not have an option to limit the detection area. But you could crop the preview image before it gets passed into the barcode detector. See here for an outline of how to wrap a detector with your own class:
Mobile Vision API - concatenate new detector object to continue frame processing
You'd implement the "detect" method to take the frame received from the camera, create a cropped version of the frame, and pass that through to the underlying detector.
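A minimal sketch of such a wrapper, assuming the frames carry NV21 data from a CameraSource; the class name BoxDetector, the fixed centered crop box, and its dimensions are illustrative rather than part of the API:

    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import android.graphics.ImageFormat;
    import android.graphics.Rect;
    import android.graphics.YuvImage;
    import android.util.SparseArray;
    import com.google.android.gms.vision.Detector;
    import com.google.android.gms.vision.Frame;
    import com.google.android.gms.vision.barcode.Barcode;
    import java.io.ByteArrayOutputStream;

    // Hypothetical wrapper that crops each frame to a region of interest before
    // delegating to the real BarcodeDetector.
    public class BoxDetector extends Detector<Barcode> {
        private final Detector<Barcode> delegate;
        private final int boxWidth, boxHeight; // size of the region of interest

        public BoxDetector(Detector<Barcode> delegate, int boxWidth, int boxHeight) {
            this.delegate = delegate;
            this.boxWidth = boxWidth;
            this.boxHeight = boxHeight;
        }

        @Override
        public SparseArray<Barcode> detect(Frame frame) {
            int width = frame.getMetadata().getWidth();
            int height = frame.getMetadata().getHeight();
            // Fixed centered box; assumes it fits inside the preview size, and a real
            // app would also account for the frame rotation relative to the viewfinder.
            int left = (width - boxWidth) / 2;
            int top = (height - boxHeight) / 2;

            // Crop the NV21 preview data via an intermediate JPEG, then rebuild a Frame.
            YuvImage yuv = new YuvImage(frame.getGrayscaleImageData().array(),
                    ImageFormat.NV21, width, height, null);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            yuv.compressToJpeg(new Rect(left, top, left + boxWidth, top + boxHeight), 100, out);
            byte[] jpeg = out.toByteArray();
            Bitmap cropped = BitmapFactory.decodeByteArray(jpeg, 0, jpeg.length);

            Frame croppedFrame = new Frame.Builder()
                    .setBitmap(cropped)
                    .setRotation(frame.getMetadata().getRotation())
                    .build();

            return delegate.detect(croppedFrame);
        }

        @Override
        public boolean isOperational() {
            return delegate.isOperational();
        }

        @Override
        public boolean setFocus(int id) {
            return delegate.setFocus(id);
        }
    }

You would then pass the wrapper, rather than the raw BarcodeDetector, into CameraSource.Builder. Note that any barcode coordinates reported back are relative to the cropped frame, not the full preview.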
Related
This question is about using Google's Mobile Vision Face API on Android.
The Story (Background) and what I want to do
I am trying to implement a function that detects faces in a camera view and overlays images on those faces.
Now, I have already successfully implemented such a feature by using the Mobile Vision API's Face Detection. Its mechanism is like this (a rough code sketch follows the list):
A CameraView (I am using Fotoapparat here) that passes each camera frame to a callback
I turn that frame into a Bitmap
The bitmap is passed to Mobile Vision API for face detection
When a face is detected, I get its position and size
Using that position information, I draw something on another custom View.
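Roughly, the pipeline above looks like this (a sketch with hypothetical helper names; the NV21-to-Bitmap conversion goes through an intermediate JPEG, which is the usual YuvImage route):

    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import android.graphics.ImageFormat;
    import android.graphics.Rect;
    import android.graphics.YuvImage;
    import android.util.SparseArray;
    import com.google.android.gms.vision.Frame;
    import com.google.android.gms.vision.face.Face;
    import com.google.android.gms.vision.face.FaceDetector;
    import java.io.ByteArrayOutputStream;

    public class FacePipelineSketch {
        // Step 1 happens elsewhere: Fotoapparat delivers the NV21 preview bytes to a
        // callback, which calls this method. The FaceDetector is built once, e.g.
        // new FaceDetector.Builder(context).setTrackingEnabled(false).build().
        public static SparseArray<Face> detectFaces(FaceDetector detector,
                                                    byte[] nv21, int width, int height) {
            // Step 2: convert the NV21 frame into a Bitmap (via an intermediate JPEG).
            YuvImage yuv = new YuvImage(nv21, ImageFormat.NV21, width, height, null);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            yuv.compressToJpeg(new Rect(0, 0, width, height), 90, out);
            byte[] jpeg = out.toByteArray();
            Bitmap bitmap = BitmapFactory.decodeByteArray(jpeg, 0, jpeg.length);

            // Step 3: run Mobile Vision face detection on the Bitmap.
            Frame frame = new Frame.Builder().setBitmap(bitmap).build();
            return detector.detect(frame);

            // Step 4: each Face exposes getPosition()/getWidth()/getHeight() for the overlay.
        }
    }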
The problem is that the above process obviously takes too much time. I can only update the overlaid image position 3-5 times per second, and even less frequently on slower devices.
Looking at the profiling, the heaviest method is, surprisingly, step 3 (face detection). It takes an average of 100ms per call.
The second heaviest is converting the NV21 frame into a Bitmap object, which takes around 90ms.
Summing everything up, I get an update rate of 3~5 FPS.
But other than that, everything works perfectly - images can be captured in high quality, with auto-focus and pinch zooming.
How about Face Tracking?
On the other hand, Mobile Vision API provides another API - Face Tracking.
In Google's sample app, the tracking is very fast; it follows the faces in the camera preview almost instantaneously. As stated in the documentation, this is because the mechanism is completely different - instead of running detection on every frame, once a face is detected its position simply follows its movement without performing full face detection each time.
But in fact such a mechanism is good enough for my use case!
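For reference, this is roughly how the tracking pipeline is wired up in the sample (a trimmed-down sketch using the Mobile Vision Tracker/MultiProcessor API from com.google.android.gms.vision; the empty onUpdate body is where the overlay would be moved):

    // Sketch: per-face Tracker callbacks instead of one-shot detect() calls.
    FaceDetector buildTrackingDetector(Context context) {
        FaceDetector detector = new FaceDetector.Builder(context)
                .setTrackingEnabled(true)
                .build();

        detector.setProcessor(new MultiProcessor.Builder<>(new MultiProcessor.Factory<Face>() {
            @Override
            public Tracker<Face> create(Face face) {
                return new Tracker<Face>() {
                    @Override
                    public void onUpdate(Detector.Detections<Face> detections, Face face) {
                        // Position/size updates arrive here on (almost) every frame,
                        // which is what makes the sample feel instantaneous.
                    }
                };
            }
        }).build());

        return detector;
    }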
However, looking at the sample code, it seems I have to use its built-in implementation of CameraSource. This can be seen in the code below:
mCameraSource = new CameraSource.Builder(context, detector)
.setRequestedPreviewSize(640, 480)
.setFacing(CameraSource.CAMERA_FACING_BACK)
.setRequestedFps(30.0f)
.build();
detector is the key object here, and it is only used when it is passed to CameraSource. So it looks like I have to stick with this CameraSource.
However, although this camera source has a takePicture() method, I cannot find any way to implement auto-focus and zooming.
Finally, the question
My ultimate objective is to implement the feature I have mentioned in the beginning, with the below requirements:
High quality image captured
Auto focus
Zoom
Fast face position updates (about 10 times per second is good enough)
Requirements 1-3 can be met using Face Detection, but not 4;
while 4 can be met using the Face Tracker, but not 1-3.
Is there a way to accomplish all 1-4? I welcome any suggestion even if it is to use another library instead of Mobile Vision.
Thanks for reading such a long question till the end!
CameraSource.java is available on GitHub under a permissive Apache license. Feel free to add auto-focus and zoom.
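For example, since the class is just source you copy into your project, methods along these lines could be added (a sketch against the open-source version; the mCamera and mCameraLock field names are taken from that source and may differ in your copy, and Camera here is android.hardware.Camera, which CameraSource.java already imports):

    // Added inside the copied CameraSource.java -- not part of the shipped library.
    public void autoFocus(Camera.AutoFocusCallback callback) {
        synchronized (mCameraLock) {
            if (mCamera != null) {
                // Delegate straight to the underlying android.hardware.Camera.
                mCamera.autoFocus(callback);
            }
        }
    }

    public void setZoom(int zoomLevel) {
        synchronized (mCameraLock) {
            if (mCamera != null) {
                Camera.Parameters params = mCamera.getParameters();
                if (params.isZoomSupported()) {
                    params.setZoom(Math.min(zoomLevel, params.getMaxZoom()));
                    mCamera.setParameters(params);
                }
            }
        }
    }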
For my application I have been looking into using BoofCV to detect if I am on a pathway or not. The pathway is just gravel so it is the color of a standard roadway. I'm not sure exactly what image processing technique to use. The BoofCV demo app has a lot of features, but I would like to know which one is appropriate for what I'm trying to do.
Ultimately I'd like to have a toast appear on the screen when I am on a pathway.
From your question, I'm guessing that you're using a regular camera for real-time input from a moving object. In that case you may need to:
Calibrate and stabilize your input frames (since your pathway is made of gravel); BoofCV provides libraries for this.
Adjust exposure, contrast or brightness (for night/low-light cameras or low-contrast frames).
Use BoofCV's Binary Image Ops according to your app's needs (image thresholding, binary labeling, etc.); a rough sketch follows this list.
Use a classifier for 2 classes ("inside pathway", "outside pathway").
Process your output and feed the results back to your "decision operator" to make a choice and guide your moving object.
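A rough illustration of step 3 under some assumptions: the frame has already been converted to BoofCV's GrayU8 (e.g. with the helpers in the boofcv-android module), and the bright-pixel heuristic at the end is only a placeholder for the real 2-class classifier from step 4:

    import boofcv.alg.filter.binary.BinaryImageOps;
    import boofcv.alg.filter.binary.GThresholdImageOps;
    import boofcv.alg.filter.binary.ThresholdImageOps;
    import boofcv.struct.image.GrayU8;

    public class PathwayCheck {

        // 'gray' is a grayscale camera frame already converted to GrayU8.
        public static boolean looksLikePathway(GrayU8 gray) {
            GrayU8 binary = new GrayU8(gray.width, gray.height);

            // Global Otsu threshold, then clean up the mask with erode/dilate.
            double threshold = GThresholdImageOps.computeOtsu(gray, 0, 255);
            ThresholdImageOps.threshold(gray, binary, (int) threshold, false); // bright pixels -> 1
            GrayU8 cleaned = BinaryImageOps.erode8(binary, 1, null);
            cleaned = BinaryImageOps.dilate8(cleaned, 1, null);

            // Placeholder decision: fraction of "gravel-bright" pixels in the lower
            // half of the frame, where the pathway would normally appear.
            int hits = 0, total = 0;
            for (int y = gray.height / 2; y < gray.height; y++) {
                for (int x = 0; x < gray.width; x++) {
                    if (cleaned.get(x, y) == 1) hits++;
                    total++;
                }
            }
            return total > 0 && hits / (double) total > 0.5; // tune for your footage
        }
    }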
More details about your project may help for a better answer.
I'm building an Android app that has to identify, in realtime, a mark/pattern which will be on the four corners of a visiting card. I'm using a preview stream of the rear camera of the phone as input.
I want to overlay a small circle on the screen where the mark is present. This is similar to how reference dots will be shown on screen by a QR reader at the corner points of the QR code preview.
I'm aware about how to get the frames from camera using native Android SDK, but I have no clue about the processing which needs to be done and optimization for real time detection. I tried messing around with OpenCV and there seems to be a bit of lag in its preview frames.
So I'm trying to write a native algorithm using raw pixel values from the frame. Is this advisable? The mark/pattern will always be the same in my case. Please guide me towards an algorithm I can use to find the pattern.
The image below shows my pattern along with some details (ratios) about it (it's the same as the finder pattern used in QR codes, but I'm placing it at 4 corners instead of 3).
I think one approach is to find black and white pixels in the ratio mentioned below to detect the mark and find the coordinates of its center, but I have no idea how to code it in Android. I'm looking for an optimized approach for real-time recognition and display.
Any help is much appreciated! Thanks
Detecting patterns on four corners of a visiting card:
Assuming the background is white, you can simply try this method.
Processing that needs to be done, and optimization for real-time detection:
Yes, you need OpenCV.
Here is an example of real-time marker detection on Google Glass using OpenCV
In this example, the image shown on the tablet lags (Bluetooth), and the Google Glass preview is much faster than the tablet's, but there is still some lag.
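As a rough sketch of the pixel-ratio idea from the question, assuming the standard QR finder-pattern run ratio of 1:1:3:1:1 (black:white:black:white:black) along a scan line and a frame that has already been binarized; a full detector would also confirm the same ratio vertically and diagonally through each candidate centre:

    import java.util.ArrayList;
    import java.util.List;

    public class FinderPatternRow {

        // Returns the x coordinate of a pattern centre in this row, or -1 if none found.
        // rowIsBlack is one row of the binarized frame (true = black pixel).
        public static int findPatternCenter(boolean[] rowIsBlack) {
            // Collect run lengths of alternating black/white pixels: {startX, length, isBlack?1:0}.
            List<int[]> runs = new ArrayList<>();
            int start = 0;
            for (int x = 1; x <= rowIsBlack.length; x++) {
                if (x == rowIsBlack.length || rowIsBlack[x] != rowIsBlack[start]) {
                    runs.add(new int[]{start, x - start, rowIsBlack[start] ? 1 : 0});
                    start = x;
                }
            }
            // Slide a window of five consecutive runs and test it against 1:1:3:1:1.
            for (int i = 0; i + 5 <= runs.size(); i++) {
                if (runs.get(i)[2] != 1) continue; // window must start on a black run
                double unit = (runs.get(i)[1] + runs.get(i + 1)[1] + runs.get(i + 2)[1]
                        + runs.get(i + 3)[1] + runs.get(i + 4)[1]) / 7.0;
                double tol = unit * 0.5; // tolerance; tune for your camera and binarization
                if (Math.abs(runs.get(i)[1] - unit) < tol
                        && Math.abs(runs.get(i + 1)[1] - unit) < tol
                        && Math.abs(runs.get(i + 2)[1] - 3 * unit) < 3 * tol
                        && Math.abs(runs.get(i + 3)[1] - unit) < tol
                        && Math.abs(runs.get(i + 4)[1] - unit) < tol) {
                    // Centre of the middle (3-unit) black run.
                    return runs.get(i + 2)[0] + runs.get(i + 2)[1] / 2;
                }
            }
            return -1;
        }
    }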
The requirement is to create an Android application running on one specific mobile device that records video of a human eye pupil dilating in response to a bright light (which is physically attached to the mobile device). The video is then post-processed frame by frame on the device to detect & measure the diameter of the pupil AND the iris in each frame. Note the image processing does NOT need doing in real-time. The end result will be a dataset describing the changes in pupil (& iris) size over time. It's expected that the iris size can be used to enhance confidence in the pupil diameter data (eg removing pupil size data that's wildly wrong), but also as a relative measure for how dilated the eye is at any point.
I am familiar with developing Android mobile apps, but my experience with image processing is very limited. I've researched solutions and it seems that the answer may lie with the OpenCV/JavaCV libraries, which should provide shape detection (e.g. http://opencvlover.blogspot.co.uk/2012/07/hough-circle-in-javacv.html), but can anyone provide guidance on these specific questions:
Am I right to think it can detect the two circle shapes within a bitmap, one inside the other? ie shapes inside each other is not a problem.
Is it true that JavaCV can detect a circle, and return a position & radius/diameter? i.e. it doesn't return a set of vertices that then require further processing to compare with a circle? It seems to have a HoughCircle method, so I think yes.
What processing of each frame is typically used before doing shape detection? For example an algorithm to enhance edges, smooth, or remove colour?
Can I use it to not just detect presence of, but measure the diameter of the detected circles? (in pixels, but then can easily be converted to real-world measurements because known hardware is being used). I think yes, but would be great to hear confirmation from those more familiar.
This project is a non-commercial charitable project, so any help especially appreciated.
I would really suggest using the NDK as it is a bit richer in features. It also allows you to run and test your algorithms on a laptop with still images before pushing them to a device, speeding up development.
Pre-processing steps:
Typically one would use thresholding or Canny edge detection, plus morphological operations like erode/dilate.
For detection of the iris/pupil, HoughCircles is not a very good method; feature detection methods like MSER work better for not-so-well-defined circles. Here is another answer I wrote on the same topic which has code that could help.
If you are looking to measure the regions, I would suggest going through this blog. It has a clear explanation of the steps involved for a reasonably accurate measurement.
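Along those lines, a minimal sketch of the idea using OpenCV's Java bindings: smooth the frame, let MSER find stable regions, and read a crude pixel diameter from the bounding boxes. The parameter values and the circularity filter are guesses that would need tuning for real eye footage:

    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.MatOfRect;
    import org.opencv.core.Rect;
    import org.opencv.core.Size;
    import org.opencv.features2d.MSER;
    import org.opencv.imgproc.Imgproc;
    import java.util.ArrayList;
    import java.util.List;

    public class PupilMeasure {

        // grayFrame: one grayscale video frame as an OpenCV Mat.
        public static List<Double> approximateDiameters(Mat grayFrame) {
            Mat smoothed = new Mat();
            Imgproc.GaussianBlur(grayFrame, smoothed, new Size(9, 9), 2);

            MSER mser = MSER.create();
            List<MatOfPoint> regions = new ArrayList<>();
            MatOfRect boundingBoxes = new MatOfRect();
            mser.detectRegions(smoothed, regions, boundingBoxes);

            List<Double> diameters = new ArrayList<>();
            for (Rect box : boundingBoxes.toArray()) {
                double w = box.width, h = box.height;
                // Keep only roughly circular regions; the average of width and height
                // is a crude pixel diameter (separating iris from pupil still needs
                // size/position filtering on top of this).
                if (w > 0 && h > 0 && Math.max(w, h) / Math.min(w, h) < 1.3) {
                    diameters.add((w + h) / 2.0);
                }
            }
            return diameters;
        }
    }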
I need to scan a special object within my android application.
I thought about using OpenCV, but it scans all objects inside the view of the camera. I only need the camera to recognize a rectangular piece of paper.
How can I do that?
My first thought was: how do barcode scanners work? They are able to recognize the barcode area and automatically take a picture when the barcode is inside a predefined area of the screen and when it's sharp. I guess it must be possible to transfer that to my problem (tell me if I'm wrong).
So step by step:
Open custom camera application
Scan objects inside the view of the camera
Recognize the rectangular piece of paper
If paper is inside a predefined area and sharp -> take a picture
I would combine this with audio. If the camera recognizes the paper, it makes some noise like a beep, and the better the object fits the predefined area, the faster the beep is played. That would make taking pictures possible for blind people.
Hope someone has ideas on that.
OpenCV is an image processing framework/library. It does not "scan all objects inside the view of the camera". By itself it does nothing, yet it gives you a number of useful functions, many of which could be used for your specified application.
If the image is not cluttered and nothing is on the paper, I would look into using edge detection (e.g. Canny or similar) or even colour blobs (even though colour is never a good idea, if your application is always for white uncovered paper, it should work robustly).
OpenCV does add some overhead, but it would allow you to quickly use functions for a simple solution.
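A minimal sketch of the edge-detection route with OpenCV's Java bindings: blur, Canny, find contours, and keep the largest contour that approximates to a quadrilateral as the candidate sheet of paper. The thresholds are guesses, and the "is it sharp?" check (e.g. the variance of a Laplacian) would still need to be added before triggering the picture or the beep:

    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.MatOfPoint2f;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;
    import java.util.ArrayList;
    import java.util.List;

    public class PaperFinder {

        // grayFrame: a grayscale camera frame as an OpenCV Mat. Returns the paper
        // outline (4 points) or null if nothing paper-like was found.
        public static MatOfPoint2f findPaper(Mat grayFrame) {
            Mat blurred = new Mat();
            Imgproc.GaussianBlur(grayFrame, blurred, new Size(5, 5), 0);

            Mat edges = new Mat();
            Imgproc.Canny(blurred, edges, 75, 200);

            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(edges, contours, new Mat(), Imgproc.RETR_EXTERNAL,
                    Imgproc.CHAIN_APPROX_SIMPLE);

            MatOfPoint2f best = null;
            double bestArea = 0;
            for (MatOfPoint contour : contours) {
                MatOfPoint2f curve = new MatOfPoint2f(contour.toArray());
                double perimeter = Imgproc.arcLength(curve, true);
                MatOfPoint2f approx = new MatOfPoint2f();
                Imgproc.approxPolyDP(curve, approx, 0.02 * perimeter, true);

                double area = Imgproc.contourArea(approx);
                // Four corners plus a big enough area => candidate sheet of paper.
                if (approx.total() == 4 && area > bestArea
                        && area > 0.1 * grayFrame.size().area()) {
                    best = approx;
                    bestArea = area;
                }
            }
            return best;
        }
    }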