Are any of the current text capture APIs (e.g. Google's Text API) fast enough to capture text from a phone's video feed, and draw a box that stays on the text even as the camera moves?
I don't need it to be fast enough to do full OCR per frame (though that would be amazing!). I'm just looking for something fast enough to recognize blocks of text and keep the bounding box displayed in sync with the live image.
There are two major options for good results. Both are C++ libraries, but wrappers exist. I've personally played with OpenCV for face recognition and the results were promising. Below are links with small tutorials and demos.
OpenCV
Tesseract by Google
Firebase
onDeviceTextRecognizer is simple and works for me.
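For reference, here's a minimal sketch of that setup, assuming the (now legacy) Firebase ML Kit dependency (firebase-ml-vision); the helper class name is mine:

    import android.graphics.Bitmap;
    import android.graphics.Rect;
    import android.util.Log;

    import com.google.firebase.ml.vision.FirebaseVision;
    import com.google.firebase.ml.vision.common.FirebaseVisionImage;
    import com.google.firebase.ml.vision.text.FirebaseVisionText;
    import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer;

    public class TextOverlayHelper {
        private static final String TAG = "TextOverlay";

        // Runs on-device recognition on one camera frame and logs a
        // bounding box per detected block of text.
        public static void recognize(Bitmap frame) {
            FirebaseVisionTextRecognizer recognizer =
                    FirebaseVision.getInstance().getOnDeviceTextRecognizer();
            FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(frame);
            recognizer.processImage(image)
                    .addOnSuccessListener(result -> {
                        for (FirebaseVisionText.TextBlock block : result.getTextBlocks()) {
                            Rect box = block.getBoundingBox();
                            Log.d(TAG, "Block '" + block.getText() + "' at " + box);
                            // Hand the box to an overlay view here.
                        }
                    })
                    .addOnFailureListener(e -> Log.e(TAG, "Recognition failed", e));
        }
    }

processImage() is asynchronous, so the overlay should be redrawn from the success listener rather than inline with the camera callback.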
Related
I have integrated Firebase ML Kit in my Android application. I'm using the on-device TextRecognizer API to detect text on the live camera feed. It detects the text, but it takes a long time to process each image (from 300 ms up to 1000 ms). Because of the large latency, the overlay is not smooth the way it is in the Google Lens app.
What can I do so that the detected-text overlay transitions smoothly between frames even though each frame is processed with high latency?
Also, I noticed that the Google Lens app detects text as whole sentences instead of showing blocks of text. How is the Google Lens app able to detect text as sentences/paragraphs?
I'm assuming you have seen the performance tips in the API docs. One thing that is not mentioned there is that the amount of text in an image has a big impact on the latency. A single line of text, even with the same image resolution, takes much less time to process than the page of a book.
If you don't need to recognize all the text in the camera view, only run recognition on a small section of the screen. It may help to take a look at the ML Kit Translate Demo with Material Design, which makes use of this "trick" to get great performance.
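A minimal sketch of that cropping trick, assuming the text of interest sits under a fixed viewfinder box in the middle of the screen (the helper name is mine):

    import android.graphics.Bitmap;

    public final class FrameCropper {

        // Crops a horizontal strip from the middle of the frame; only this
        // strip is handed to the text recognizer.
        public static Bitmap centerStrip(Bitmap frame, float heightFraction) {
            int stripHeight = (int) (frame.getHeight() * heightFraction);
            int top = (frame.getHeight() - stripHeight) / 2;
            return Bitmap.createBitmap(frame, 0, top, frame.getWidth(), stripHeight);
        }
    }

Feeding only FrameCropper.centerStrip(frame, 0.2f) to the recognizer cuts the amount of text per call, which, per the above, is what drives the latency.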
To your second point, Google Lens uses updated text recognition models that do a better job of grouping blocks of text into paragraphs. We hope to adopt these new models in ML Kit soon. In addition, we are looking at hardware acceleration to ensure that real-time experiences with large amounts of text can be achieved.
I am working on an app that detects the user's eye blinks. I have been searching the web for 2 days but still don't have a clear picture of how this can be done.
As far as I know, the system supports face detection, i.e., detecting whether there is a face in the picture and locating it.
But this works only with still images and detects only faces, which is not what I need. I need to open a camera activity, detect the user's face directly, locate the eyes and other facial parts, and wait until the user blinks, like when you long-press the screen in Snapchat.
I have seen a lot about OpenCV but am still not sure what it is, how to use it, or whether it suits my goals.
Note: Snapchat has released no API for the technology it uses, and it doesn't let anyone talk to the engineers behind it.
I know that OpenCV can do image processing on the device's camera feed (as opposed to only being able to process still images).
Here is an introductory tutorial on eye detection using OpenCV:
http://romanhosek.cz/android-eye-detection-and-tracking-with-opencv/
If you can't find eye-blink detection tutorials in a Google search, I think you'll have to write the eye-blink detection code on your own, but OpenCV will be a helpful tool in doing so. There are lots of beginner OpenCV tutorials to help you get started.
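As a starting point, here's a rough sketch of one possible blink heuristic built on OpenCV's Java bindings: detect eyes in every frame with a stock Haar cascade, and treat "eyes visible, then briefly gone, then visible again" as a blink. The class, the frame thresholds, and the heuristic itself are my own assumptions, not an established recipe:

    import org.opencv.core.Mat;
    import org.opencv.core.MatOfRect;
    import org.opencv.objdetect.CascadeClassifier;

    public class BlinkDetector {
        private final CascadeClassifier eyeCascade;
        private int framesWithoutEyes = 0;
        private boolean eyesSeenBefore = false;

        // cascadePath points at haarcascade_eye.xml (shipped with the
        // OpenCV distribution) copied to app storage.
        public BlinkDetector(String cascadePath) {
            eyeCascade = new CascadeClassifier(cascadePath);
        }

        // Call once per grayscale camera frame; returns true when a blink
        // is inferred: eyes were visible, vanished briefly, reappeared.
        public boolean onFrame(Mat grayFrame) {
            MatOfRect eyes = new MatOfRect();
            eyeCascade.detectMultiScale(grayFrame, eyes);
            boolean eyesVisible = !eyes.empty();

            boolean blink = false;
            if (eyesVisible) {
                // 1-5 missing frames reads as a blink; a longer gap more
                // likely means the user looked away.
                blink = eyesSeenBefore
                        && framesWithoutEyes >= 1 && framesWithoutEyes <= 5;
                framesWithoutEyes = 0;
                eyesSeenBefore = true;
            } else if (eyesSeenBefore) {
                framesWithoutEyes++;
            }
            return blink;
        }
    }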
Is there any way to get the text present in a .jpeg or .png file captured by the camera?
For example: if I capture a debit card with the device camera, how do I get the card number or the card holder's name from the captured photo?
Basically, as already suggested above, you have to plunge into the science of optical recognition. These are quite complex algorithms that analyze the pixels of an image and try to 'see' text or, say, faces in it. This task, obvious to the human eye and mind, is quite hard for a machine, especially considering that the image may have been shot under particular lighting (back light or side light), with the right or wrong white balance, etc.
Despite all the complexity, there is good news: Google provides a library that does exactly that: it recognizes text, barcodes, and faces. It is called Mobile Vision.
Even without knowing the recognition algorithms, you basically initialize the library and then feed your images to the Face API, Barcode API, or Text API. After the library finishes processing, you are given whatever those algorithms found. It's a kind of magic :)
Useful links here:
Tutorial with Text API
The code sample of the app using Text API
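To make the Text API part concrete, here's a minimal sketch using the Mobile Vision classes (com.google.android.gms.vision.text); the wrapper class is hypothetical:

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.util.SparseArray;

    import com.google.android.gms.vision.Frame;
    import com.google.android.gms.vision.text.TextBlock;
    import com.google.android.gms.vision.text.TextRecognizer;

    public class CardTextReader {

        // Runs the Mobile Vision Text API over a captured photo and
        // returns the text of every detected block, one block per line.
        public static String readText(Context context, Bitmap photo) {
            TextRecognizer recognizer = new TextRecognizer.Builder(context).build();
            try {
                if (!recognizer.isOperational()) {
                    return ""; // native libraries not downloaded yet
                }
                Frame frame = new Frame.Builder().setBitmap(photo).build();
                SparseArray<TextBlock> blocks = recognizer.detect(frame);
                StringBuilder text = new StringBuilder();
                for (int i = 0; i < blocks.size(); i++) {
                    text.append(blocks.valueAt(i).getValue()).append('\n');
                }
                return text.toString();
            } finally {
                recognizer.release();
            }
        }
    }

From there you could pull a card number out of the returned text with a regular expression, though be careful about storing or transmitting card data at all.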
I'm building an Android app that has to identify, in real time, a mark/pattern that will be on the four corners of a visiting card. I'm using the preview stream of the phone's rear camera as input.
I want to overlay a small circle on the screen where the mark is present. This is similar to how reference dots are shown on screen by a QR reader at the corner points of the QR code preview.
I know how to get frames from the camera using the native Android SDK, but I have no clue about the processing that needs to be done, or how to optimize for real-time detection. I tried messing around with OpenCV, and there seems to be a bit of lag in its preview frames.
So I'm trying to write a native algorithm using raw pixel values from the frame. Is this advisable? The mark/pattern will always be the same in my case. Please guide me toward an algorithm for finding the pattern.
The image below shows my pattern along with some details (ratios) about it (it's the same as the one used in QR codes, but I have it at 4 corners instead of 3).
I think one approach is to look for black and white pixels in the ratio mentioned below to detect the mark and find the coordinates of its center, but I have no idea how to code this on Android. I'm looking for an optimized approach for real-time recognition and display.
Any help is much appreciated! Thanks
Detecting patterns on four corners of a visiting card:
Assuming the background is white, you can simply try this method.
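Building on the ratio idea from the question: since the ratio figure isn't reproduced here, the sketch below assumes the standard QR finder-pattern run ratio of 1:1:3:1:1 (the question says the mark matches the QR one). It scans one row of grayscale pixels for dark/light run lengths close to that ratio, roughly the way QR scanners locate finder patterns; the threshold and tolerance values are assumptions:

    public final class MarkerScan {
        private static final int BLACK_THRESHOLD = 128; // assumed fixed threshold

        // Scans one row of 8-bit grayscale pixels for five runs of
        // dark/light/dark/light/dark pixels close to 1:1:3:1:1. Returns
        // the x coordinate of the pattern centre, or -1 if no match.
        public static int findPatternInRow(byte[] gray, int rowStart, int width) {
            int[] runs = new int[5];
            int state = 0; // even states count dark runs, odd states light runs
            for (int x = 0; x < width; x++) {
                boolean dark = (gray[rowStart + x] & 0xFF) < BLACK_THRESHOLD;
                if (dark == (state % 2 == 0)) {
                    runs[state]++;           // pixel continues the current run
                } else if (++state == 5) {   // five runs complete: evaluate
                    if (ratiosMatch(runs)) {
                        // centre of the middle (3-unit) run
                        return x - runs[4] - runs[3] - runs[2] / 2;
                    }
                    // no match: slide the window by two runs and keep going
                    System.arraycopy(runs, 2, runs, 0, 3);
                    runs[3] = 1;
                    runs[4] = 0;
                    state = 3;
                } else {
                    runs[state] = 1;         // current pixel starts the next run
                }
            }
            return -1;
        }

        private static boolean ratiosMatch(int[] runs) {
            int unit = (runs[0] + runs[1] + runs[2] + runs[3] + runs[4]) / 7;
            if (unit == 0) return false;
            int tol = unit / 2; // 50% tolerance, an arbitrary choice
            return Math.abs(runs[0] - unit) <= tol
                    && Math.abs(runs[1] - unit) <= tol
                    && Math.abs(runs[2] - 3 * unit) <= 3 * tol
                    && Math.abs(runs[3] - unit) <= tol
                    && Math.abs(runs[4] - unit) <= tol;
        }
    }

Running this over every row (and the same logic over columns to confirm) gives candidate centres for the four corner marks.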
Processing needed and optimization for real-time detection:
Yes, you need OpenCV.
Here is an example of real-time marker detection on Google Glass using OpenCV
In this example, the image shown on the tablet is delayed (Bluetooth); the Google Glass preview is much faster than the tablet's, but there is still some lag.
Is it possible to detect numbers or text while capturing video on Android, using OpenCV or any other image and video processing APIs?
Yes, you can. OpenCV 3.0.0 includes a text detection tool based on Extremal Region (ER) detection and classification. You can use it on a video stream. Notice that this is detection only; the results can then be used as the input of any OCR engine (such as Tesseract). But remember:
Image -> Text Detection -> Preprocessing (binarization, etc.) -> OCR -> Recognized Text
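As a rough illustration of that pipeline on Android, here is a sketch that uses OpenCV's Java bindings for the preprocessing step and the tess-two wrapper for the Tesseract OCR step (the ER-based detection stage is left out, since its Java bindings are less commonly used). The class name and the traineddata path convention are assumptions:

    import android.graphics.Bitmap;

    import org.opencv.android.Utils;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    import com.googlecode.tesseract.android.TessBaseAPI;

    public class FramePipeline {

        // Image -> preprocessing (grayscale + Otsu binarization) -> OCR.
        // dataPath must contain tessdata/eng.traineddata (tess-two layout).
        public static String recognize(Bitmap frame, String dataPath) {
            // Preprocess with OpenCV: grayscale, then binarize.
            Mat rgba = new Mat();
            Utils.bitmapToMat(frame, rgba);
            Mat gray = new Mat();
            Imgproc.cvtColor(rgba, gray, Imgproc.COLOR_RGBA2GRAY);
            Mat binary = new Mat();
            Imgproc.threshold(gray, binary, 0, 255,
                    Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

            Bitmap cleaned = Bitmap.createBitmap(frame.getWidth(),
                    frame.getHeight(), Bitmap.Config.ARGB_8888);
            Utils.matToBitmap(binary, cleaned);

            // OCR the cleaned image with Tesseract.
            TessBaseAPI tess = new TessBaseAPI();
            tess.init(dataPath, "eng");
            tess.setImage(cleaned);
            String text = tess.getUTF8Text();
            tess.end();
            return text;
        }
    }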
Hope that it helps!
You will need a capable OCR engine that tries to detect characters by any number of means. Tesseract is a good open-source engine that will try to 'parse' your image for characters by masking.
However, there are several steps you need to take before you feed your image to the OCR engine (Tesseract). To get more accurate results, you need to 'clean' the image using binarization along with other conventional methods such as Canny edge detection. This is where OpenCV can help.
Also, you should isolate/detect the characters in the image. This can be done with powerful algorithms such as the Stroke Width Transform.
Regarding detection on a video stream on Android, you can run your captured frames through the cleaning and OCR steps as they are received via:
onPreviewFrame(byte[] data, Camera camera)
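For instance, here's a sketch of converting each NV21 preview frame into a grayscale OpenCV Mat, ready for the cleaning steps above (assuming the OpenCV Android bindings are loaded):

    import android.hardware.Camera;

    import org.opencv.core.CvType;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public class PreviewProcessor implements Camera.PreviewCallback {

        @Override
        public void onPreviewFrame(byte[] data, Camera camera) {
            Camera.Size size = camera.getParameters().getPreviewSize();

            // NV21 stores the full-resolution luma plane first, followed by
            // interleaved chroma at half resolution: height * 1.5 rows total.
            Mat yuv = new Mat(size.height + size.height / 2, size.width, CvType.CV_8UC1);
            yuv.put(0, 0, data);

            Mat gray = new Mat();
            Imgproc.cvtColor(yuv, gray, Imgproc.COLOR_YUV2GRAY_NV21);

            // 'gray' can now go through binarization / Canny / OCR as above.
        }
    }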
Also, check out this app, which does OCR in 'continuous preview' mode.
Hope this helps