Does anyone have a code snippet showing how to do OCR using LeadTools in real time? I want the OCR to be for a specific region within the camera preview. I am doing the development on Android.
You can add an OcrZone to the IOcrPage zone collection prior to calling Recognize.
If there are no zones present in the IOcrZoneCollection prior to calling Recognize, the AutoZone method is called internally and the image is segmented automatically. If there is a zone present, then only that zone is used in the recognition.
Here is some sample code you can use in your Android application:
mOcrEngine = OcrEngineManager.createEngine(OcrEngineType.Advantage);
mOcrEngine.startup(codecsForOCR, "", OCR_RUNTIME_DIRECTORY, sharedLibsPath);
document = mOcrEngine.getDocumentManager().createDocument();
image = imgViewer.getImage();
ocrPage = document.getPages().addPage(image, null);
//the left/top/width/height are from your camera settings
LeadRect rect = new LeadRect(left, top, width, height);
OcrZone zone = new OcrZone();
zone.setBounds(rect);
zone.setZoneType(OcrZoneType.TEXT.getValue());
ocrPage.getZones().addZone(zone);
ocrPage.recognize(null);
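After recognize returns, the results are stored on the page. As a minimal sketch of reading them back (the getText accessor below is an assumption based on the .NET IOcrPage.GetText counterpart; check the LEADTOOLS Java documentation for the exact method name in your version):
// Read back the recognized text for the zone we added (index 0).
// NOTE: getText(int zoneIndex) is assumed here from the .NET IOcrPage.GetText
// counterpart; verify the exact name in your LEADTOOLS version.
String recognizedText = ocrPage.getText(0);
Log.d("LEADTOOLS-OCR", "Recognized text: " + recognizedText);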
If you need any additional assistance with this, feel free to contact our free technical support for the SDK here:
https://www.leadtools.com/support/chat
I am using this piece of code for detection, but whatever I do it can't detect the Machine Readable Zone (MRZ).
String recognizedText = firebaseVisionCloudText.getText();
for (FirebaseVisionCloudText.Page page : firebaseVisionCloudText.getPages()) {
    List<FirebaseVisionCloudText.DetectedLanguage> languages =
            page.getTextProperty().getDetectedLanguages();
    int height = page.getHeight();
    int width = page.getWidth();
    float confidence = page.getConfidence();

    for (FirebaseVisionCloudText.Block block : page.getBlocks()) {
        Rect boundingBox = block.getBoundingBox();
        List<FirebaseVisionCloudText.DetectedLanguage> blockLanguages =
                block.getTextProperty().getDetectedLanguages();
        float blockConfidence = block.getConfidence();
    }
}
Is it possible that it can't read the font? If so, is there an option to add a font?
Also, is it possible to combine ML Kit with Tesseract?
Is it possible that it can't read the font?
It is possible. That said, things like driver licenses work for text recognition with ML Kit. Have you tried running the quickstart app or the codelab on your use case?
Is there an option to add a font?
You cannot add it directly. We will have to update the model with that font. If your use case does not work out, please feel free to reach out to Firebase Support and we will be happy to understand your use case and update the model.
Also, is it possible to combine ML Kit with Tesseract?
Definitely. You will have to do it yourself, though, outside of the ML Kit API call.
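For example, one possible pattern (just a sketch, not an official ML Kit integration) is to take the bounding boxes ML Kit returns, crop the original bitmap to each block, and hand the crops to Tesseract through the tess-two Android wrapper. The class below is hypothetical; the tessdata path and language (e.g. an OCR-B traineddata file for MRZ lines) are assumptions you would adapt to your project:
import android.graphics.Bitmap;
import android.graphics.Rect;
import com.googlecode.tesseract.android.TessBaseAPI;

// Hypothetical helper: runs Tesseract on the regions ML Kit found.
public class MrzFallbackOcr {
    private final TessBaseAPI tess = new TessBaseAPI();

    // dataPath must contain a "tessdata" folder with the traineddata file
    // (for MRZ you would typically use an OCR-B model); both the path and
    // the language are assumptions here.
    public MrzFallbackOcr(String dataPath, String language) {
        tess.init(dataPath, language);
    }

    // Crop the block that ML Kit located and let Tesseract recognize it.
    public String recognizeBlock(Bitmap source, Rect boundingBox) {
        Bitmap crop = Bitmap.createBitmap(source,
                boundingBox.left, boundingBox.top,
                boundingBox.width(), boundingBox.height());
        tess.setImage(crop);
        return tess.getUTF8Text();
    }

    public void release() {
        tess.end();
    }
}
You would call recognizeBlock(originalBitmap, block.getBoundingBox()) inside the block loop from your snippet above.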
I am working with Google Project Tango and I tried a basic example with getting pose data:
TangoCoordinateFramePair pair;
pair.base = TANGO_COORDINATE_FRAME_START_OF_SERVICE;
pair.target = TANGO_COORDINATE_FRAME_CAMERA_COLOR;
base = TANGO_SUPPORT_ENGINE_OPENGL;
target = TANGO_SUPPORT_ENGINE_OPENGL;
error = TangoSupport_getPoseAtTime(poseTimestamp, pair.base, pair.target, base, target, ROTATION_0, &pose);
This gives TANGO_SUCCESS.
However, if I only change base to this
pair.base = TANGO_COORDINATE_FRAME_IMU;
...I keep getting TANGO_INVALID.
I tried using the C API and the Unity SDK, and both give the same invalid result.
Why is that? Why can't I use TANGO_COORDINATE_FRAME_IMU?
I am trying to fix the camera offset as mentioned here:
Camera-Offset | Project Tango
but without any success...
TangoSupport_getPoseAtTime only works for getting a pose between a fixed coordinate frame and a moving coordinate frame. The TANGO_INVALID error results from the fact that TANGO_COORDINATE_FRAME_CAMERA_COLOR and TANGO_COORDINATE_FRAME_IMU are both moving coordinate frames.
In order to find the offset between TANGO_COORDINATE_FRAME_IMU and TANGO_COORDINATE_FRAME_CAMERA_COLOR (or between any pair of moving coordinate frames), you need to use TangoService_getPoseAtTime instead.
This code snippet should give you the transform you're looking for:
TangoCoordinateFramePair pair;
pair.base = TANGO_COORDINATE_FRAME_IMU;
pair.target = TANGO_COORDINATE_FRAME_CAMERA_COLOR;
TangoPoseData pose;
TangoErrorType result = TangoService_getPoseAtTime(0.0, pair, &pose);
Note also that since both of these coordinate frames move with the device (i.e., they are in a fixed position with respect to the device and to each other), the pose resulting from this call will not change as the device moves.
After some weeks of waiting I finally have my Project Tango. My idea is to create an app that generates a point cloud of my room and exports it to .xyz data. I'll then use the .xyz file to show the point cloud in a browser! I started off by compiling and adjusting the point cloud example that's on Google's GitHub.
Right now I use onXyzIjAvailable(TangoXyzIjData tangoXyzIjData) to get a frame of x, y, and z values, i.e. the points. I then save these frames in a PCLManager in the form of Vector3. After I'm done scanning my room, I simply write all the Vector3s from the PCLManager to a .xyz file using:
OutputStream os = new FileOutputStream(file);
int size = pointCloud.size();
for (int i = 0; i < size; i++) {
    String row = String.valueOf(pointCloud.get(i).x) + " "
            + String.valueOf(pointCloud.get(i).y) + " "
            + String.valueOf(pointCloud.get(i).z) + "\r\n";
    os.write(row.getBytes());
}
os.close();
Everything works fine, no compilation errors or crashes. The only thing that seems to be going wrong is the rotation or translation of the points in the cloud. When I view the point cloud everything is messed up; the area I scanned is not recognizable, though the number of points is the same as recorded.
Could this have something to do with the fact that I don't use PoseData together with the XyzIjData? I'm kind of new to this subject and have a hard time understanding what the PoseData exactly does. Could someone explain it to me and help me fix my point cloud?
Yes, you have to use TangoPoseData.
I guess you are using TangoXyzIjData correctly, but the data you get this way is relative to where the device is and how it is tilted when you take the shot.
Here's how I solved this:
I started from java_point_to_point_example. In this example they get the coordinates of two different points in two different coordinate systems and then express those coordinates with respect to the base coordinate frame pair.
First of all you have to set up your extrinsics, so you'll be able to perform all the transformations you'll need. To do that I call mExtrinsics = setupExtrinsics(mTango) at the end of my setTangoListener() function. Here's the code (which you can also find in the example linked above):
private DeviceExtrinsics setupExtrinsics(Tango mTango) {
    // Camera-to-IMU transform
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair();
    framePair.baseFrame = TangoPoseData.COORDINATE_FRAME_IMU;
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR;
    TangoPoseData imu_T_rgb = mTango.getPoseAtTime(0.0, framePair);
    // IMU-to-device transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_DEVICE;
    TangoPoseData imu_T_device = mTango.getPoseAtTime(0.0, framePair);
    // IMU-to-depth transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH;
    TangoPoseData imu_T_depth = mTango.getPoseAtTime(0.0, framePair);
    return new DeviceExtrinsics(imu_T_device, imu_T_rgb, imu_T_depth);
}
Then, when you get the point cloud, you have to "normalize" it. Using your extrinsics this is pretty simple:
public ArrayList<Vector3> normalize(TangoXyzIjData cloud, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    ArrayList<Vector3> normalizedCloud = new ArrayList<>();
    TangoPoseData camera_T_imu = ScenePoseCalculator.matrixToTangoPose(extrinsics.getDeviceTDepthCamera());

    while (cloud.xyz.hasRemaining()) {
        Vector3 rotatedV = ScenePoseCalculator.getPointInEngineFrame(
                new Vector3(cloud.xyz.get(), cloud.xyz.get(), cloud.xyz.get()),
                camera_T_imu,
                cameraPose
        );
        normalizedCloud.add(rotatedV);
    }

    return normalizedCloud;
}
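For example, this is roughly how you could call it from onXyzIjAvailable (a sketch: the frame pair choice is based on the point-to-point example, and mTango, mExtrinsics and mPclManager.addAll are placeholders for whatever your app uses; the pose is queried at the cloud's timestamp):
@Override
public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
    // Pose of the device in the start-of-service frame at the moment the
    // depth frame was captured.
    TangoCoordinateFramePair pair = new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_START_OF_SERVICE,
            TangoPoseData.COORDINATE_FRAME_DEVICE);
    TangoPoseData devicePose = mTango.getPoseAtTime(xyzIj.timestamp, pair);

    // Transform the raw cloud into the base frame and accumulate it.
    ArrayList<Vector3> worldCloud = normalize(xyzIj, devicePose, mExtrinsics);
    mPclManager.addAll(worldCloud);
}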
This should be enough: now you have a point cloud with respect to your base frame of reference.
If you superimpose two or more of these "normalized" clouds you can get a 3D representation of your room.
There is another way to do this with rotation matrices, explained here.
My solution is pretty slow (it takes the dev kit around 700 ms to normalize a cloud of ~3000 points), so it is not suitable for real-time 3D reconstruction.
At the moment I'm trying to use the Tango 3D Reconstruction Library in C using the NDK and JNI. The library is well documented, but it is very painful to set up your environment and start using JNI (in fact, I'm stuck at the moment).
Drifting
There still is a problem when I turn around with the device. It seems that the point cloud spreads out a lot.
I guess you are experiencing some drifting.
Drifting happens when you use Motion Tracking alone: it consists of lots of very small errors in estimating your pose that together cause a big error in your pose relative to the world. For instance, if you take your Tango device, walk in a circle while tracking your TangoPoseData, and then plot your trajectory in a spreadsheet (or whatever you want), you'll notice that the tablet never returns to its starting point, because it is drifting away.
The solution to that is using Area Learning.
If this topic is not clear to you, I suggest watching this talk from Google I/O 2016. It covers lots of points and gives you a nice introduction.
Using Area Learning is quite simple.
You just have to change your base frame of reference to TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION. In this way you tell Tango to estimate its pose not with respect to where it was when you launched the app, but with respect to some fixed point in the area.
Here's my code:
private static final ArrayList<TangoCoordinateFramePair> FRAME_PAIRS =
        new ArrayList<TangoCoordinateFramePair>();
static {
    FRAME_PAIRS.add(new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
            TangoPoseData.COORDINATE_FRAME_DEVICE
    ));
}
Now you can use FRAME_PAIRS as usual.
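For instance, a minimal sketch of passing FRAME_PAIRS when connecting the update listener (assuming the standard Tango Java listener API; depending on your SDK version you may have to override additional callbacks such as onPointCloudAvailable):
mTango.connectListener(FRAME_PAIRS, new Tango.OnTangoUpdateListener() {
    @Override
    public void onPoseAvailable(TangoPoseData pose) {
        // pose is now area-description -> device instead of
        // start-of-service -> device, so it benefits from drift correction.
    }

    @Override
    public void onXyzIjAvailable(TangoXyzIjData xyzIj) { }

    @Override
    public void onFrameAvailable(int cameraId) { }

    @Override
    public void onTangoEvent(TangoEvent event) { }
});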
Then you have to modify your TangoConfig in order to tell Tango to use Area Learning, using the key TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION. Remember that when using TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION you CAN'T use learning mode or load an ADF (area description file).
So you can't use:
TangoConfig.KEY_BOOLEAN_LEARNINGMODE
TangoConfig.KEY_STRING_AREADESCRIPTION
Here's how I initialize TangoConfig in my app:
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
// Turn the depth sensor on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
// Turn motion tracking on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING, true);
// If Tango gets stuck, it tries to recover automatically.
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY, true);
// Tango tries to store and remember places and rooms;
// this is used to reduce drifting.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION, true);
// Turn the color camera on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
Using this technique you'll get rid of that spreading.
PS
In the talk I linked above, at around 22:35, they show you how to port your application to Area Learning. In their example they use TangoConfig.KEY_BOOLEAN_ENABLE_DRIFT_CORRECTION. This key does not exist anymore (at least in the Java API); use TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION instead.
Is there a known API or way to SCAN the text from a card without actually manually saving (and uploading) the picture? (iOS and Android)
I would also need to know whether that API can determine the marquee within the camera preview that should be scanned.
I want behaviour similar to that of QR scanners or Augmented Reality apps, where the user just points the camera and the action occurs.
I have printed cards with a redeem code as text, and adding a QR code would require changing the current card production.
The text is inside a white box, which may make it easier to recognise.
On iOS, you would use CIDetector with an AVCaptureSession. It can process capture session output buffers as they come in from the camera, without having to take a picture, and provide text scanning.
For text detection, using CIDetector with CIDetectorTypeText will return areas that are likely to have text in them, but you would have to perform additional processing for Optical Character Recognition.
You could also use OpenCV, though that is not an out-of-the-box solution.
You can try this: https://github.com/gali8/Tesseract-OCR-iOS
Usage:
// Specify the image Tesseract should recognize on
tesseract.image = [[UIImage imageNamed:@"image_sample.jpg"] g8_blackAndWhite];
// Optional: Limit the area of the image Tesseract should recognize on to a rectangle
tesseract.rect = CGRectMake(20, 20, 100, 100);
// Optional: Limit recognition time to a few seconds
tesseract.maximumRecognitionTime = 2.0;
// Start the recognition
[tesseract recognize];
Is it possible to make an app that takes a picture from a user's phone gallery and converts it to an Android Wear watch face?
I've been reading up on these Android articles:
https://developer.android.com/training/wearables/watch-faces/index.html
https://developer.android.com/training/wearables/watch-faces/drawing.html
and it seems that if I can get a user to select a picture from the gallery and convert it to a bitmap, it would then be plausible to set that as the watch face. I'm definitely a beginner when it comes to Android programming and APKs.
Confirmation from a more advanced Android developer would be great.
Now, where I'm getting confused is whether the picking of the picture happens on the user's phone, which then sends it to the Android Wear app, or whether the wearable app can access the phone's gallery and select the picture directly. Does anyone know if wearable apps can access the gallery of a user's phone?
Assuming I already have a reference to the selected image, it would be something like this? Correct me if I'm wrong. (Taken from the second article, under "Initialize watch face elements".)
@Override
public void onCreate(SurfaceHolder holder) {
    super.onCreate(holder);
    // configure the system UI (see next section)
    ...

    // load the background image
    Resources resources = AnalogWatchFaceService.this.getResources();
    // at this point the user should have already picked the picture they want,
    // so set "backgroundDrawable" to the image the user picked
    int idOfUserSelectPicture = GetIdOfUserSelectedPictureSomehow();
    Drawable backgroundDrawable = resources.getDrawable(idOfUserSelectPicture, null);
    // original implementation from the article:
    // Drawable backgroundDrawable = resources.getDrawable(R.drawable.bg, null);
    mBackgroundBitmap = ((BitmapDrawable) backgroundDrawable).getBitmap();

    // create graphic styles
    mHourPaint = new Paint();
    mHourPaint.setARGB(255, 200, 200, 200);
    mHourPaint.setStrokeWidth(5.0f);
    mHourPaint.setAntiAlias(true);
    mHourPaint.setStrokeCap(Paint.Cap.ROUND);
    ...

    // allocate a Calendar to calculate local time using the UTC time and time zone
    mCalendar = Calendar.getInstance();
}
Thank you for any and all help.
The way to implement this would be to create a configuration Activity that runs on the phone and picks an image on your device. You can then send this image as an Asset via the Data Layer (http://developer.android.com/training/wearables/data-layer/index.html); it will be received on the watch side, and you can then make it the background of the watch face.
It is not possible for an Android Wear device to see the photo collection on your phone; they are totally separate devices, and nothing is shared by default unless you write an application that does this.
The Data Layer sample shows how to take a photo on the phone, and then send it to the wearable: https://github.com/googlesamples/android-DataLayer
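If it helps, here is a rough sketch of the phone side (assuming the GoogleApiClient-based Wearable Data Layer API used by that sample; the path and key names are made up for illustration):
// Convert the Bitmap the user picked into an Asset and push it to the watch.
private static Asset toAsset(Bitmap bitmap) {
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream);
    return Asset.createFromBytes(stream.toByteArray());
}

private void sendBackground(GoogleApiClient client, Bitmap pickedImage) {
    PutDataMapRequest request = PutDataMapRequest.create("/watchface/background");
    request.getDataMap().putAsset("backgroundImage", toAsset(pickedImage));
    Wearable.DataApi.putDataItem(client, request.asPutDataRequest());
}
On the watch side you would listen for the data item and open the Asset with Wearable.DataApi.getFdForAsset() to decode it back into the Bitmap you set as mBackgroundBitmap.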