I downloaded the ML Kit sample code for Android and I'm having a hard time setting a limited detection area for the live camera detection (I'm only interested in text recognition, so I got rid of everything else).
I need to limit the text recognition to only a part of the screen (say, a tiny rectangle or square in the center of the screen). Has anyone done such a workaround with ML Kit?
Please take a look at the ML Kit Translate Showcase App, which shows how to limit text recognition to a specific section of the screen.
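If it helps, here is a minimal sketch of the idea: crop each camera frame to a centered rectangle before handing it to the recognizer, so only that region is ever processed. This assumes you can get the frame as a Bitmap and that you are using the Firebase on-device text recognizer; the box size and the function name are just placeholders.

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Run text recognition only on a centered crop of the frame.
// The 40% x 20% box size is an arbitrary example value.
fun recognizeCenterRegion(frame: Bitmap, onResult: (String) -> Unit) {
    val boxWidth = (frame.width * 0.4f).toInt()
    val boxHeight = (frame.height * 0.2f).toInt()
    val left = (frame.width - boxWidth) / 2
    val top = (frame.height - boxHeight) / 2

    // Crop first so the recognizer never sees the rest of the frame.
    val cropped = Bitmap.createBitmap(frame, left, top, boxWidth, boxHeight)

    FirebaseVision.getInstance().onDeviceTextRecognizer
        .processImage(FirebaseVisionImage.fromBitmap(cropped))
        .addOnSuccessListener { result -> onResult(result.text) }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```

If you draw results back onto the preview, remember to offset any bounding boxes by the crop's left/top to map them into screen coordinates.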
I am using Android's ML Kit to detect faces, which works really well; however, I want to detect a person's face (specifically their mouth) when their eyes are not visible in the image.
On the left is what ML Kit usually detects and on the right is the image I will be providing (only the nose and mouth are visible):
Currently when I provide an image which only shows the nose and mouth it really struggles to detect the face.
Note that if there is an alternative library (even cloud based) that does this then I am interested.
I have integrated Firebase ML Kit in my Android application. I'm using the on-device TextRecognizer API to detect text on the live camera feed. It detects the text, but it takes a long time to process each image (from 300 milliseconds up to 1000 milliseconds). Due to the large latency, the overlay is not smooth like in the Google Lens app.
What can I do so that the detected text overlay transitions smoothly between frames even though each frame takes a long time to process?
Also, I noticed that the Google Lens app detects text as whole sentences instead of showing blocks of text. How is the Google Lens app able to detect text as sentences/paragraphs?
I'm assuming you have seen the performance tips in the API docs. One thing that is not mentioned there is that the amount of text in an image has a big impact on latency: a single line of text, even at the same image resolution, takes much less time to process than a full page of a book.
If you don't need to recognize all the text in the camera view, only run recognition on a small section of the screen. It may help to take a look at the ML Kit Translate Demo with Material Design, which makes use of this "trick" to get great performance.
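Another pattern that helps keep the overlay responsive (my own sketch, not something taken from the demo): drop incoming camera frames while a recognition call is still in flight, so you always process a recent frame instead of building up a backlog. The class and callback names below are illustrative.

```kotlin
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.text.FirebaseVisionText
import java.util.concurrent.atomic.AtomicBoolean

// Drop camera frames while a recognition call is still in flight,
// so the overlay always reflects a recent frame instead of a backlog.
class ThrottledTextRecognizer(private val onText: (FirebaseVisionText) -> Unit) {

    private val recognizer = FirebaseVision.getInstance().onDeviceTextRecognizer
    private val busy = AtomicBoolean(false)

    fun onFrame(image: FirebaseVisionImage) {
        // Skip this frame if the previous one is still being processed.
        if (!busy.compareAndSet(false, true)) return

        recognizer.processImage(image)
            .addOnSuccessListener { result -> onText(result) }
            .addOnCompleteListener { busy.set(false) }
    }
}
```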
To your second point, Google Lens uses updated text recognition models that do a better job of grouping blocks of text into paragraphs. We hope to adopt these new models in ML Kit soon. In addition, we are looking at hardware acceleration to ensure real-time experiences with large amounts of text can be achieved.
I am wondering how I can use Google's text recognition (OCR) with ARCore.
When I use the OCR sample to put some text above the detected text, the overlay jumps around as new images come in. I would like to anchor it to the detected text so that when the camera moves, the overlay stays attached to it, the way ARCore anchors content.
I couldn't find a way to do that, but Google Lens does it.
Any help or pointer is appreciated
Thank you
Are any of the current text capture APIs (e.g. Google's Text API) fast enough to capture text from a phone's video feed, and draw a box that stays on the text even as the camera moves?
I don't need it to be fast enough to do full OCR per frame (though that would be amazing!). I'm just looking for something fast enough to recognize blocks of text and keep the bounding box displayed in sync with the live image.
There are two major options for good results. They are both C++, but there are wrappers. I've personally played with OpenCV for face recognition and the results were promising. Below are links with small tutorials and demos.
OpenCV
Tesseract by Google
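If you go the Tesseract route on Android, you would typically use a wrapper such as tess-two (my assumption, not something the links above prescribe). A rough sketch; the data path and language file are things you have to provide yourself:

```kotlin
import android.graphics.Bitmap
import com.googlecode.tesseract.android.TessBaseAPI

// Rough sketch of OCR through the tess-two wrapper around Tesseract.
// `dataPath` must point to a folder containing tessdata/eng.traineddata.
fun runTesseract(bitmap: Bitmap, dataPath: String): String {
    val tess = TessBaseAPI()
    tess.init(dataPath, "eng")   // "eng" is just an example language
    tess.setImage(bitmap)
    val text = tess.getUTF8Text()
    tess.end()                   // release the native engine
    return text
}
```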
Firebase's onDeviceTextRecognizer is simple and works well for me.
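For reference, here is a minimal sketch of what that looks like, assuming you already have the frame as a Bitmap; the block iteration just shows where the bounding boxes for an overlay would come from:

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Minimal on-device recognition: feed a Bitmap, then read back the
// recognized blocks and their bounding boxes for drawing an overlay.
fun detectText(bitmap: Bitmap) {
    val image = FirebaseVisionImage.fromBitmap(bitmap)

    FirebaseVision.getInstance().onDeviceTextRecognizer
        .processImage(image)
        .addOnSuccessListener { result ->
            for (block in result.textBlocks) {
                val box = block.boundingBox   // android.graphics.Rect, may be null
                println("Found \"${block.text}\" at $box")
            }
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```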
I'm building an Android app that has to identify, in realtime, a mark/pattern which will be on the four corners of a visiting card. I'm using a preview stream of the rear camera of the phone as input.
I want to overlay a small circle on the screen where the mark is present. This is similar to how reference dots will be shown on screen by a QR reader at the corner points of the QR code preview.
I'm aware of how to get frames from the camera using the native Android SDK, but I have no clue about the processing which needs to be done, or how to optimize it for real-time detection. I tried messing around with OpenCV and there seems to be a bit of lag in its preview frames.
So I'm trying to write a native algorithm using raw pixel values from the frame. Is this advisable? The mark/pattern will always be the same in my case. Please guide me on which algorithm to use to find the pattern.
The image below shows my pattern along with some details (ratios) about it (it is the same as the one used in QR codes, but I'm putting it at 4 corners instead of 3).
I think one approach is to find black and white pixels in the ratio mentioned below to detect the mark and find the coordinates of its center, but I have no idea how to code it on Android. I'm looking for an optimized approach for real-time recognition and display.
Any help is much appreciated! Thanks
Detecting patterns on four corners of a visiting card:
Assuming the background is white, you can simply try this method.
The processing which needs to be done and optimization for real-time detection:
Yes, you need OpenCV.
Here is an example of real-time marker detection on Google Glass using OpenCV.
In this example, the image shown on the tablet is delayed (Bluetooth); the Google Glass preview is much faster than the tablet's, but there is still some lag.
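If you want a concrete starting point with OpenCV, one simple option for a fixed, always-identical mark is normalized template matching. This is only a sketch under a few assumptions: the OpenCV Android SDK is initialized, you have a small Bitmap of the mark to use as the template, and the mark's scale and rotation stay roughly constant (plain template matching is neither scale- nor rotation-invariant). For four corners you could run it per quadrant or collect several strong maxima instead of just the best one.

```kotlin
import android.graphics.Bitmap
import org.opencv.android.Utils
import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.core.Point
import org.opencv.imgproc.Imgproc

// Locate one occurrence of a fixed mark in a frame with normalized
// template matching. Returns the center of the best match, or null
// if the match score is below an (arbitrary) threshold.
fun findMark(frame: Bitmap, template: Bitmap, threshold: Double = 0.8): Point? {
    val frameMat = Mat().also { Utils.bitmapToMat(frame, it) }
    val templMat = Mat().also { Utils.bitmapToMat(template, it) }

    // Work in grayscale; the mark is black and white anyway.
    Imgproc.cvtColor(frameMat, frameMat, Imgproc.COLOR_RGBA2GRAY)
    Imgproc.cvtColor(templMat, templMat, Imgproc.COLOR_RGBA2GRAY)

    val result = Mat()
    Imgproc.matchTemplate(frameMat, templMat, result, Imgproc.TM_CCOEFF_NORMED)

    val best = Core.minMaxLoc(result)
    if (best.maxVal < threshold) return null

    // maxLoc is the top-left corner of the best match; shift to its center.
    return Point(best.maxLoc.x + templMat.cols() / 2.0,
                 best.maxLoc.y + templMat.rows() / 2.0)
}
```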