Is it possible to detect numbers or text while capturing video on Android, using OpenCV or any other good image and video processing APIs?
Yes, you can. OpenCV 3.0.0 includes a text detection tool based on Extremal Region detection and classification. You can use this on a video stream. Note that this is detection only; the results can then be used as the input of any OCR engine (such as Tesseract). But remember the pipeline:
Image -> Text Detection -> Preprocessing (binarization, etc.) -> OCR -> Recognized Text
Hope that helps!
You will need a capable OCR engine that tries to detect characters by any number of means. Tesseract is a good open-source engine that will try to 'parse' your image for characters by masking.
However, there are several steps you need to take before you feed your image to the OCR engine (Tesseract). To ensure more accurate results, you need to 'clean' your image using binarization along with a number of other conventional methods such as Canny edge detection. This is where OpenCV can help.
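For example, here is a minimal preprocessing sketch with OpenCV's Java bindings (the threshold values are assumptions you would tune for your own images):

    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public final class Preprocess {
        /** Cleans a camera frame for OCR: grayscale, binarize, find edges. */
        public static Mat clean(Mat rgbaFrame) {
            Mat gray = new Mat();
            Imgproc.cvtColor(rgbaFrame, gray, Imgproc.COLOR_RGBA2GRAY);

            // Binarization: Otsu picks the threshold automatically
            Mat binary = new Mat();
            Imgproc.threshold(gray, binary, 0, 255,
                    Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

            // Canny edge detection; the edge map can feed a text detector,
            // while the binarized image goes to the OCR engine
            Mat edges = new Mat();
            Imgproc.Canny(gray, edges, 50, 150);

            return binary;
        }
    }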
You should also isolate/detect the characters in the image. This can be done with powerful algorithms such as the Stroke Width Transform.
Regarding detection on a video stream on Android, you can run your captured frames through the cleanup and OCR steps as they are received through:
onPreviewFrame(byte[] data, Camera camera)
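For example, a sketch using the old Camera API (FrameSink is a hypothetical helper that queues frames for a background thread; don't run OCR inside the callback itself):

    import android.hardware.Camera;

    public class FrameGrabber {
        /** Hooks the Camera preview so each frame reaches the OCR pipeline. */
        public static void attach(Camera camera, final FrameSink sink) {
            camera.setPreviewCallback(new Camera.PreviewCallback() {
                @Override
                public void onPreviewFrame(byte[] data, Camera camera) {
                    // data is in NV21 (YUV) format by default; hand it off to a
                    // background thread so cleanup + OCR never block the preview.
                    sink.submit(data, camera.getParameters().getPreviewSize());
                }
            });
        }

        /** Hypothetical consumer that queues frames for preprocessing + OCR. */
        public interface FrameSink {
            void submit(byte[] nv21, Camera.Size size);
        }
    }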
Also, check out this app which allows OCR in 'continuous preview' mode.
Hope this helps
Are any of the current text capture APIs (e.g. Google's Text API) fast enough to capture text from a phone's video feed, and draw a box that stays on the text even as the camera moves?
I don't need it to be fast enough for full OCR per frame (though that would be amazing!). I'm just looking for something fast enough to recognize blocks of text and keep the bounding box displayed in sync with the live image.
There are two major options for good results. They are both C++, but wrappers exist. I've personally played with OpenCV for face recognition and the results were promising. Below are links with small tutorials and demos.
OpenCV
Tesseract by Google
Firebase
Its onDeviceTextRecognizer is simple and works for me.
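For reference, a minimal sketch against the firebase-ml-vision API (since superseded by ML Kit), assuming you already have a Bitmap of the current frame:

    import android.graphics.Bitmap;
    import com.google.android.gms.tasks.OnSuccessListener;
    import com.google.firebase.ml.vision.FirebaseVision;
    import com.google.firebase.ml.vision.common.FirebaseVisionImage;
    import com.google.firebase.ml.vision.text.FirebaseVisionText;
    import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer;

    public class FirebaseOcr {
        public static void recognize(Bitmap frame) {
            FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(frame);
            FirebaseVisionTextRecognizer detector =
                    FirebaseVision.getInstance().getOnDeviceTextRecognizer();
            detector.processImage(image)
                    .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                        @Override public void onSuccess(FirebaseVisionText result) {
                            for (FirebaseVisionText.TextBlock block : result.getTextBlocks()) {
                                // block.getBoundingBox() gives the on-screen rect,
                                // block.getText() the recognized string.
                            }
                        }
                    });
        }
    }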
I have been working on an app that uses the Tesseract API to support OCR. This is done using a SurfaceView, which shows the camera output (Camera2 API), and an ImageReader instance, which is used to get the images from the camera. The camera is set up with setRepeatingRequest, so new images are available very frequently. When I call the getUTF8Text() method to get the readable text from an image, the camera preview shown on the SurfaceView lags.
Are there any settings in the Tesseract API that can speed up the getUTF8Text() call, or anything else I can do so that the preview SurfaceView does not lag?
Any help or guidance is appreciated!
Most of the things that you can do to speed up performance occur separately from the Tesseract API itself:
Run the OCR on a separate, non-UI thread (see the sketch after this list)
Grab a new image to start OCR on after OCR has finished on the last image. Try capture instead of setRepeatingRequest.
Downsample the image before OCR, so that it's smaller
Experiment with different Tesseract page segmentation modes to see what's the fastest on your data
Re-train the Tesseract trained data file to use fewer characters and a smaller dictionary, depending on what your app is used for
Modify Tesseract to perform only recognition pass #1
Don't forget to consider OpenCV or other approaches altogether
You didn't say what Tesseract API settings you're using now, and you didn't describe what your app does in a general sense, so it's hard to tell you where to start, but these points should get you started.
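As an illustration of the first two points, here is a minimal sketch using the tess-two wrapper (the class and callback names are placeholders):

    import android.graphics.Bitmap;
    import com.googlecode.tesseract.android.TessBaseAPI;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class OcrWorker {
        private final TessBaseAPI tess = new TessBaseAPI();
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private final AtomicBoolean busy = new AtomicBoolean(false);

        public OcrWorker(String dataPath) {
            // dataPath must contain a "tessdata" folder with eng.traineddata
            tess.init(dataPath, "eng");
            // A simpler page segmentation mode is often faster
            tess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_BLOCK);
        }

        /** Drops the frame if OCR on the previous one hasn't finished yet. */
        public void recognize(final Bitmap frame, final OcrCallback callback) {
            if (!busy.compareAndSet(false, true)) return; // keep the preview smooth
            executor.execute(new Runnable() {
                @Override public void run() {
                    tess.setImage(frame);
                    String text = tess.getUTF8Text(); // runs off the UI thread
                    tess.clear();
                    busy.set(false);
                    callback.onText(text);
                }
            });
        }

        public interface OcrCallback { void onText(String text); }
    }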
There are a few other things which you can try.
Init Tesseract with OEM_TESSERACT_ONLY
Instead of using full-blown training data, use a faster alternative from https://github.com/tesseract-ocr/tessdata_fast.
Move the recognition to a computation thread.
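For the first point, with tess-two this is just a different init overload (a sketch; dataPath is assumed to contain a tessdata folder with the traineddata file):

    // OEM_TESSERACT_ONLY restricts Tesseract to its legacy engine,
    // which skips the slower additional recognition passes.
    TessBaseAPI tess = new TessBaseAPI();
    boolean ok = tess.init(dataPath, "eng", TessBaseAPI.OEM_TESSERACT_ONLY);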
I am going to create an Android app which can capture a paper document with the OpenCV camera in order to do OCR. I am looking for the best way to tune the OpenCV camera so it produces OCR-optimized images (focused on the document text, with the text made clearer) before the OCR step.
I know there are image pre-processing methods that are applied before OCR, but I want to get OCR-optimized input sources before processing them, for example by using CvCameraViewListener2 to process each frame.
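Something like this listener skeleton is what I have in mind (the adaptive-threshold step is just my guess at what "making text clearer" could look like):

    import org.opencv.android.CameraBridgeViewBase;
    import org.opencv.android.CameraBridgeViewBase.CvCameraViewFrame;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public class OcrFrameListener implements CameraBridgeViewBase.CvCameraViewListener2 {
        @Override public void onCameraViewStarted(int width, int height) {}
        @Override public void onCameraViewStopped() {}

        @Override
        public Mat onCameraFrame(CvCameraViewFrame inputFrame) {
            Mat gray = inputFrame.gray();
            Mat binary = new Mat();
            // Adaptive threshold copes better with uneven document lighting
            Imgproc.adaptiveThreshold(gray, binary, 255,
                    Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
                    Imgproc.THRESH_BINARY, 31, 15);
            // 'binary' could now be handed to the OCR engine;
            // returning it also shows the cleaned frame on screen.
            return binary;
        }
    }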
Are there any suggestions on how to do this?
I have implemented an augmented reality program using Qualcomm's Vuforia library. Now I want to add an optical character recognition feature to my program so that I can translate text from one language to another in real time. I am planning to use the Tesseract OCR library, but my question is: how do I integrate Tesseract with QCAR?
Can somebody suggest the proper way to do it?
What you need is an access to the camera frames, so you can send them to Tesseract. The Vuforia SDK offers a way to access the frames using the QCAR::UpdateCallback interface (documentation here).
What you need to do is create a class that implements this interface, register it with the Vuforia SDK using QCAR::registerCallback() (see here), and from there you'll get notified each time the Vuforia SDK has processed a frame.
This callback will be provided a QCAR::State object, from which you can get access to the camera frame (see the doc for QCAR::State::getFrame() here), and send it to the Tesseract SDK.
But be aware of the fact that the Vuforia SDK works with frames in a rather low resolution (on many phones I tested, it returns frames in the 360x240 to 720x480 range, and more often the former than the latter), which may not be accurate enough for Tesseract to detect text.
As complementary information to @mbrenon's answer: Tesseract only does text recognition and doesn't support ROI text extraction, so you will need to add that to your system after capturing your image.
You can read these academic papers which report on the additional steps for using Tesseract on mobile phones and provide some evaluation performances:
TranslatAR: Petter, M., Fragoso, V., Turk, M., and Baur, C., "Automatic text detection for mobile augmented reality translation," IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 48-55, Nov. 2011.
Mobile Camera Based Detection and Translation
Using an Android (2.3.3) phone, I can use the camera to retrieve a preview with the onPreviewFrame(byte[] data, Camera camera) method to get the YUV image.
For some image processing, I need to convert this data to an RGB image and show it on the device. Using the basic Java / Android method, this runs at a horrible rate of less than 5 fps...
Now, using the NDK, I want to speed things up. The problem is: How do I convert the YUV array to an RGB array in C? And is there a way to display it (using OpenGL perhaps?) in the native code? Real-time should be possible (the Qualcomm AR demos showed us that).
I cannot use setTargetDisplay and put an overlay on it!
I know Java, recently started with the Android SDK, and have zero experience in C.
Have you considered using OpenCV's Android port? It can do a lot more than just color conversion, and it's quite fast.
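For the color conversion specifically, OpenCV's Java bindings can do it in one call; a small sketch (assuming the default NV21 preview format):

    import org.opencv.core.CvType;
    import org.opencv.core.Mat;
    import org.opencv.imgproc.Imgproc;

    public final class YuvToRgb {
        /** Converts an NV21 preview buffer (as delivered by onPreviewFrame) to RGBA. */
        public static Mat nv21ToRgba(byte[] nv21, int width, int height) {
            // NV21 stores Y (width*height) followed by interleaved VU (width*height/2)
            Mat yuv = new Mat(height + height / 2, width, CvType.CV_8UC1);
            yuv.put(0, 0, nv21);
            Mat rgba = new Mat();
            Imgproc.cvtColor(yuv, rgba, Imgproc.COLOR_YUV2RGBA_NV21);
            return rgba;
        }
    }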
A Google search returned this page for a C implementation of YUV->RGB565. The author even included the JNI wrapper for it.
You can also succeed by staying with Java. I did this for the image detection in the androangelo app.
I used the sample code which you find here by searching "decodeYUV".
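For reference, this is the widely circulated decodeYUV420SP routine (reproduced here from memory of the sample code; double-check against the original before relying on it):

    public final class YuvDecoder {
        /** Classic decodeYUV420SP: converts an NV21 buffer to ARGB_8888 pixels. */
        public static void decodeYUV420SP(int[] rgb, byte[] yuv420sp, int width, int height) {
            final int frameSize = width * height;
            for (int j = 0, yp = 0; j < height; j++) {
                int uvp = frameSize + (j >> 1) * width, u = 0, v = 0;
                for (int i = 0; i < width; i++, yp++) {
                    int y = (0xff & yuv420sp[yp]) - 16;
                    if (y < 0) y = 0;
                    if ((i & 1) == 0) { // chroma is shared by each 2x2 pixel block
                        v = (0xff & yuv420sp[uvp++]) - 128;
                        u = (0xff & yuv420sp[uvp++]) - 128;
                    }
                    int y1192 = 1192 * y;
                    int r = y1192 + 1634 * v;
                    int g = y1192 - 833 * v - 400 * u;
                    int b = y1192 + 2066 * u;
                    // clamp to the 18-bit intermediate range before packing
                    if (r < 0) r = 0; else if (r > 262143) r = 262143;
                    if (g < 0) g = 0; else if (g > 262143) g = 262143;
                    if (b < 0) b = 0; else if (b > 262143) b = 262143;
                    rgb[yp] = 0xff000000 | ((r << 6) & 0xff0000)
                            | ((g >> 2) & 0xff00) | ((b >> 10) & 0xff);
                }
            }
        }
    }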
For processing the frames, the essential thing to consider is the image size. Depending on the device you may get quite large images; for the Galaxy S2, for example, the smallest supported preview size is 640*480, which is a lot of pixels. What I did is use only every second row and every second column after the YUV-to-RGB decoding. Processing a 320*240 image works quite well and allowed me to get frame rates of 20 fps (including some noise reduction, a color conversion from RGB to HSV, and a circle detection).
In addition, you should carefully check the size of the image buffer provided to the setPreview function. If it is too small, garbage collection will spoil everything.
For the result, you can check the calibration screen of the androangelo app. There I had an overlay of the detected image over the camera preview.