Detect x y coordinates of specific text - android

I'm trying to write an automation for my games on Android with Tasker and the AutoTools plugin. Some things work at this point, but I need to capture a screenshot and interpret it for my needs.
That's exactly what I need:
Certain texts are important in the games and I want to click on them wherever they appear on the screen. So I think I need OCR for this task. I followed some solutions but failed or got stuck every time. Let me explain which solutions I tried.
Solution 1:
I tried the AutoInput (Tasker plugin) UI Query method, but it failed. I think UI Query in AutoInput only works on Android UI elements, so it can't get any information from a 3D app like a game.
Solution 2:
I searched for an OCR solution and found AutoTools (another Tasker plugin).
I created a task that takes a screenshot and interprets it with the AutoTools OCR action. That works: AutoTools OCR successfully reads text from the image file.
But I'm stuck again, because although I can read the text from the image file, I don't know the x/y coordinates of the important text.
What do you suggest at this point?
Should I learn Android development and write my own app?

You should check out the ocr-reader Google sample. It's quick to run and not too difficult to get what you're looking for. What you would need to do is modify the OcrDetectorProcessor that comes with the sample to break the text down into individual words; then you can easily calculate the boundaries and center point of each word. Here's some code to get you started:
@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    mGraphicOverlay.clear();
    // Get all detected items.
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        // Get the individual lines in each item.
        List<Line> lines = (List<Line>) item.getComponents();
        for (Line line : lines) {
            // Get the individual "words" in each line.
            List<Element> elements = (List<Element>) line.getComponents();
            for (Element e : elements) {
                // Now get the position of each element.
                Rect rect = e.getBoundingBox();
                Point[] points = e.getCornerPoints();
                int centerX = (points[0].x + points[2].x) / 2;
                int centerY = (points[0].y + points[2].y) / 2;
                // DO STUFF
            }
        }
    }
}
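If you are looking for one specific word, a minimal sketch of the "DO STUFF" part could compare each element's text against your target and act on its center point. This goes inside the inner loop above; TARGET_WORD and tapAt() are hypothetical names for your own target string and click helper, not part of the sample:
// Hypothetical: react to the first occurrence of a target word.
String TARGET_WORD = "Attack";          // assumption: whatever text you want to click
String value = e.getValue();            // the recognized text of this element
if (TARGET_WORD.equalsIgnoreCase(value)) {
    // centerX/centerY were computed above from the corner points
    tapAt(centerX, centerY);            // hypothetical helper that performs the click
}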

I contacted the developer who wrote the AutoTools Tasker plugin.
They added a function to the plugin that solves this.
The plugin now interprets the given image with OCR and returns the words together with the x/y center position of each word.
If anyone is looking for this kind of functionality for Android and the Tasker app, please visit this forum topic link. It's very useful.

Generate and export point cloud from Project Tango

After some weeks of waiting I finally have my Project Tango. My idea is to create an app that generates a point cloud of my room and exports it to .xyz data. I'll then use the .xyz file to show the point cloud in a browser! I started off by compiling and adjusting the point cloud example that's on Google's GitHub.
Right now I use onXyzIjAvailable(TangoXyzIjData tangoXyzIjData) to get a frame of x, y and z values: the points. I then save these frames in a PCLManager in the form of Vector3. After I'm done scanning my room, I simply write all the Vector3 from the PCLManager to a .xyz file using:
OutputStream os = new FileOutputStream(file);
size = pointCloud.size();
for (int i = 0; i < size; i++) {
    String row = String.valueOf(pointCloud.get(i).x) + " "
            + String.valueOf(pointCloud.get(i).y) + " "
            + String.valueOf(pointCloud.get(i).z) + "\r\n";
    os.write(row.getBytes());
}
os.close();
Everything works fine: no compilation errors or crashes. The only thing that seems to be going wrong is the rotation or translation of the points in the cloud. When I view the point cloud everything is messed up; the area I scanned is not recognizable, though the number of points is the same as recorded.
Could this have something to do with the fact that I don't use PoseData together with the XyzIjData? I'm kind of new to this subject and have a hard time understanding what the PoseData actually does. Could someone explain it to me and help me fix my point cloud?
Yes, you have to use TangoPoseData.
I guess you are using TangoXyzIjData correctly, but the data you get that way is relative to where the device is and how it is tilted when you take the shot.
Here's how I solved this:
I started from the java_point_to_point_example. In this example they get the coordinates of two different points in two different coordinate systems and then express those coordinates with respect to the base coordinate frame pair.
First of all you have to set up your extrinsics, so you'll be able to perform all the transformations you'll need. To do that I call mExtrinsics = setupExtrinsics(mTango) at the end of my setTangoListener() function. Here's the code (which you can also find in the example I linked above).
private DeviceExtrinsics setupExtrinsics(Tango mTango) {
    // IMU to color camera transform
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair();
    framePair.baseFrame = TangoPoseData.COORDINATE_FRAME_IMU;
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR;
    TangoPoseData imu_T_rgb = mTango.getPoseAtTime(0.0, framePair);
    // IMU to device transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_DEVICE;
    TangoPoseData imu_T_device = mTango.getPoseAtTime(0.0, framePair);
    // IMU to depth camera transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH;
    TangoPoseData imu_T_depth = mTango.getPoseAtTime(0.0, framePair);
    return new DeviceExtrinsics(imu_T_device, imu_T_rgb, imu_T_depth);
}
Then, when you get the point cloud, you have to "normalize" it. Using your extrinsics this is pretty simple:
public ArrayList<Vector3> normalize(TangoXyzIjData cloud, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    ArrayList<Vector3> normalizedCloud = new ArrayList<>();
    TangoPoseData camera_T_imu = ScenePoseCalculator.matrixToTangoPose(extrinsics.getDeviceTDepthCamera());
    while (cloud.xyz.hasRemaining()) {
        Vector3 rotatedV = ScenePoseCalculator.getPointInEngineFrame(
                new Vector3(cloud.xyz.get(), cloud.xyz.get(), cloud.xyz.get()),
                camera_T_imu,
                cameraPose
        );
        normalizedCloud.add(rotatedV);
    }
    return normalizedCloud;
}
This should be enough: now you have a point cloud with respect to your base frame of reference.
If you superimpose two or more of these "normalized" clouds you can get the 3D representation of your room.
There is another way to do this with a rotation matrix, explained here.
My solution is pretty slow (it takes the dev kit around 700 ms to normalize a cloud of ~3000 points), so it is not suitable for a real-time 3D reconstruction application.
At the moment I'm trying to use the Tango 3D Reconstruction Library in C using the NDK and JNI. The library is well documented, but it is very painful to set up your environment and start using JNI. (In fact, I'm stuck there at the moment.)
Drifting
There still is a problem when I turn around with the device. It seems that the point cloud spreads out a lot.
I guess you are experiencing some drifting.
Drifting happens when you use Motion Tracking alone: it consists of a lot of very small errors in estimating your pose that together add up to a big error in your pose relative to the world. For instance, if you take your Tango device and walk in a circle while tracking your TangoPoseData, and then draw your trajectory in a spreadsheet or whatever you want, you'll notice that the tablet never returns to its starting point, because it is drifting away.
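As a rough illustration of that circle test, here is a minimal sketch of logging the trajectory to a CSV file with a plain java.io.FileWriter, assuming the standard Tango Java callbacks and that TangoPoseData exposes its translation as a double[] as in the Google samples (mTrajectoryLog is a hypothetical field you open elsewhere):
@Override
public void onPoseAvailable(TangoPoseData pose) {
    if (pose.statusCode != TangoPoseData.POSE_VALID) {
        return; // skip poses that are initializing or invalid
    }
    try {
        // translation[0] = x, translation[1] = y in the base frame;
        // plot these two columns in a spreadsheet to see the drift.
        mTrajectoryLog.write(pose.translation[0] + "," + pose.translation[1] + "\n");
    } catch (IOException e) {
        e.printStackTrace();
    }
}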
The solution to that is Area Learning.
If you have no clear idea about this topic, I suggest watching this talk from Google I/O 2016. It covers lots of points and gives you a nice introduction.
Using Area Learning is quite simple.
You just have to change your base frame of reference to TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION. In this way you tell your Tango to estimate its pose not with respect to where it was when you launched the app, but with respect to some fixed point in the area.
Here's my code:
private static final ArrayList<TangoCoordinateFramePair> FRAME_PAIRS =
        new ArrayList<TangoCoordinateFramePair>();
static {
    FRAME_PAIRS.add(new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
            TangoPoseData.COORDINATE_FRAME_DEVICE
    ));
}
Now you can use FRAME_PAIRS as usual.
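For example, "as usual" would look roughly like this. This is a minimal sketch based on the standard Tango Java listener API; the exact callback set depends on your SDK version, and mTango is assumed to be your connected Tango instance:
// Receive poses relative to the area description frame.
mTango.connectListener(FRAME_PAIRS, new Tango.OnTangoUpdateListener() {
    @Override
    public void onPoseAvailable(TangoPoseData pose) {
        // pose is now expressed wrt COORDINATE_FRAME_AREA_DESCRIPTION,
        // so it is corrected against the learned area instead of the start-of-service frame.
    }

    @Override
    public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
        // point clouds keep arriving here as before
    }

    @Override
    public void onFrameAvailable(int cameraId) { }

    @Override
    public void onTangoEvent(TangoEvent event) { }
});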
Then you have to modify your TangoConfig to tell Tango to use Area Learning, using the key TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION. Remember that when using TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION you CAN'T use learning mode or load an ADF (area description file).
So you can't use:
TangoConfig.KEY_BOOLEAN_LEARNINGMODE
TangoConfig.KEY_STRING_AREADESCRIPTION
Here's how I initialize TangoConfig in my app:
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
// Turn the depth sensor on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
// Turn motion tracking on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING, true);
// If Tango gets stuck it tries to recover automatically.
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY, true);
// Tango tries to store and remember places and rooms;
// this is used to reduce drifting.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION, true);
// Turn the color camera on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
Using this technique you'll get rid of those spreads.
PS
In the talk I linked above, at around 22:35, they show you how to port your application to Area Learning. In their example they use TangoConfig.KEY_BOOLEAN_ENABLE_DRIFT_CORRECTION. This key does not exist any more (at least in the Java API); use TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION instead.

Create a Manga App

I've asked this before but apparently I was too broad in my description, so I'll give it a try again. I'm using the Flandmark library for facial landmark detection: figuring out where a person's eyes, nose and mouth are. After that, what I want to do is generate a manga image of the person. I'm not sure how to do this. The first way I thought of was using a large database of manga images of specific areas such as the eyes, and mapping them onto the original image. The question is: is there a way I can make the image look like a manga image in terms of background, colours, etc.?
The first thing I thought would be useful is to get the size of the eyes and the width of the mouth. This is done using this part of the Flandmark code:
flandmark_detect(input, bbox, model, landmarks);
// display landmarks
cvRectangle(orig, cvPoint(bbox[0], bbox[1]), cvPoint(bbox[2], bbox[3]), CV_RGB(255, 0, 0));
cvRectangle(orig, cvPoint(model->bb[0], model->bb[1]), cvPoint(model->bb[2], model->bb[3]), CV_RGB(0, 0, 255));
cvCircle(orig, cvPoint((int)landmarks[0], (int)landmarks[1]), 3, CV_RGB(0, 0, 255), CV_FILLED);
for (int i = 2; i < 2 * model->data.options.M; i += 2)
{
    cvCircle(orig, cvPoint(int(landmarks[i]), int(landmarks[i + 1])), 3, CV_RGB(255, 0, 0), CV_FILLED);
}
Any help would be appreciated as I don't know the best way to do this and I'm really stuck. Thanks.

Box2d - Is there a way to check whether there is a body at a specific location?

In my game, a body is randomly relocated on the screen after the user does something. However, if the object is relocated on top of another body, then both are pushed slightly (to make room!). I would like to check the location of the randomly generated coordinates first, so that the relocation only takes place if the position is free (within a certain diameter anyway).
Something like.. location.hasBody(). There surely must be a function for this that I haven't found. Thanks!
There is no way to query a world with a point and get the body, but what you can do is query the world with a small box:
// Make a small box around the point p you want to test.
b2AABB aabb;
b2Vec2 d;
d.Set(0.001f, 0.001f);
aabb.lowerBound = p - d;
aabb.upperBound = p + d;

// Query the world for overlapping shapes.
QueryCallback callback(p);
m_world->QueryAABB(&callback, aabb);

if (callback.m_fixture)
{
    // A fixture was found at that position.
}
Solution originally posted here: Cocos2d-iphone forum
I'm not sure whether Box2D includes a 'clean' way to do it. I'd just manually iterate over all bodies in the world just before adding a new one, and check whether their positions plus radius/size overlap with the new body's shape; see the sketch below.
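A minimal sketch of that manual check, written against the jbox2d Java port (getBodyList()/getNext()/getPosition() are that port's mirror of the C++ API; the 'clearance' radius is just an assumed per-body spacing, not something Box2D provides):
import org.jbox2d.common.Vec2;
import org.jbox2d.dynamics.Body;
import org.jbox2d.dynamics.World;

public final class SpawnCheck {
    /** Returns true if no existing body's center lies within 'clearance' meters of 'candidate'. */
    public static boolean isPositionFree(World world, Vec2 candidate, float clearance) {
        for (Body body = world.getBodyList(); body != null; body = body.getNext()) {
            Vec2 pos = body.getPosition(); // in meters, like b2Body::GetPosition()
            float dx = pos.x - candidate.x;
            float dy = pos.y - candidate.y;
            if (dx * dx + dy * dy < clearance * clearance) {
                return false; // something is already (roughly) there
            }
        }
        return true;
    }
}
Only relocate the body when isPositionFree(...) returns true; otherwise roll new coordinates and test again.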
try
b2Vec2 vec = body->GetPosition(); // in meters
or
CGPoint pos = ccp(body->GetPosition().x * PTM_RATIO, body->GetPosition().y * PTM_RATIO); // in pixels

How to merge images and impose them on each other

Suppose I'm loading two or more pictures into a FrameLayout. Here I'm loading three pictures of the same person in three different positions. What image-processing libraries (Android, Java, or native) are available to do something like what is shown in the picture?
I would like to superimpose multiple pictures on each other.
Something like this:
One idea is to:
Do some layering over all those pictures, find the mismatching areas, and merge them.
How can one merge multiple pictures with each other? By checking the dissimilarity and merging them?
Are there any third-party APIs or some Photoshop-style service that can help me do this kind of image processing?
In this case you are not just trying to combine the images; you really want to combine a scene containing the same object in different positions.
Therefore it is not just a simple combination or an alpha composite, where the colour of a given pixel in the output image is the sum of that pixel's value in each image divided by the number of images.
In this case, you might do the following:
1. Determine the scene background by analysing the pixels that do not change across the multiple images.
2. Start with the output image being just the background.
3. For each image, remove the background to get the desired object and combine it with the output image.
There is a Marvin plug-in to perform this task, called mergePhotos. The program below uses that plug-in to combine a set of parkour photos.
import java.util.ArrayList;
import java.util.List;

import marvin.image.MarvinImage;
import marvin.io.MarvinImageIO;
import marvin.plugin.MarvinImagePlugin;
import marvin.util.MarvinPluginLoader;

public class MergePhotosApp {

    public MergePhotosApp() {
        // 1. Load images 01.jpg, 02.jpg, ..., 05.jpg into a list.
        List<MarvinImage> images = new ArrayList<MarvinImage>();
        for (int i = 1; i <= 5; i++) {
            images.add(MarvinImageIO.loadImage("./res/0" + i + ".jpg"));
        }

        // 2. Load the merge plug-in.
        MarvinImagePlugin merge = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.combine.mergePhotos");
        merge.setAttribute("threshold", 38);

        // 3. Process the image list and save the output.
        MarvinImage output = images.get(0).clone();
        merge.process(images, output);
        MarvinImageIO.saveImage(output, "./res/merge_output.jpg");
    }

    public static void main(String[] args) {
        new MergePhotosApp();
    }
}
The input images and the output image are shown below.
I don't know if this will qualify under your definition of "natives", but there is the following .NET library that could help: http://dynamicimage.apphb.com/
If the library itself can give you what you want, then depending on your architecture you could set up a small ASP.NET site to do the image manipulation on the server.
Check the accepted answer here.
The above link shows how to merge two images using the OpenCV SDK.
If you don't want to use OpenCV and just want to try it yourself, then you will have to play a little with a FrameLayout and three ImageViews. Give the user options to select a specific part of the image to show for each of the three images, so that the selected part of the selected image is shown. That way you will get a result like the one you described; a rough sketch of this layering idea follows below.
I hope you got my point. If not, then let me know.
Enjoy coding... :)
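Purely as an illustration of that FrameLayout idea (not the original answerer's code): stack three ImageViews inside a FrameLayout and fade the upper layers so all three photos stay visible. This sketch is meant to run inside an Activity's onCreate(); the layout id R.id.container and the three drawables are assumptions:
// Layer three ImageViews inside a FrameLayout and blend them with alpha.
FrameLayout container = (FrameLayout) findViewById(R.id.container); // assumed FrameLayout in your layout XML

int[] photos = {R.drawable.pose1, R.drawable.pose2, R.drawable.pose3}; // assumed drawables
for (int i = 0; i < photos.length; i++) {
    ImageView layer = new ImageView(this);
    layer.setImageResource(photos[i]);
    layer.setScaleType(ImageView.ScaleType.CENTER_CROP);
    // Keep the bottom layer opaque and make the upper layers translucent
    // so the person shows through in all three positions.
    layer.setAlpha(i == 0 ? 1.0f : 0.5f);
    container.addView(layer, new FrameLayout.LayoutParams(
            ViewGroup.LayoutParams.MATCH_PARENT,
            ViewGroup.LayoutParams.MATCH_PARENT));
}
For the "selected part of each image" behaviour you would additionally crop or mask each layer to the user's selection instead of blending whole images.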
You can overlay the images using OpenCV; see OpenCV and here or here:
// Read the main background image
cv::Mat image = cv::imread("Background.png");
// Read the character image to be placed on top
cv::Mat character = cv::imread("character.png");
// Define where you want to place the image
cv::Mat newImage;
// The 10,10 are the top-left coordinates in pixels
newImage = image(cv::Rect(10, 10, character.cols, character.rows));
// Add it to the background; the 1s are the alpha values
cv::addWeighted(newImage, 1, character, 1, 0, newImage);
// Show the result
cv::namedWindow("with character");
cv::imshow("with character", image);
// Write the image
cv::imwrite("output.png", newImage);
Or you can create it as a watermark effect.
Or you can try it in Java, merging two images like this:
Try using this class:
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class MergeImages {

    public static void main(String[] args) {
        File inner = new File("Inner.png");
        File outter = new File("Outter.png");
        try {
            BufferedImage biInner = ImageIO.read(inner);
            BufferedImage biOutter = ImageIO.read(outter);
            System.out.println(biInner);
            System.out.println(biOutter);

            Graphics2D g = biOutter.createGraphics();
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.8f));

            // Center the inner image over the outer one.
            int x = (biOutter.getWidth() - biInner.getWidth()) / 2;
            int y = (biOutter.getHeight() - biInner.getHeight()) / 2;
            System.out.println(x + "x" + y);

            g.drawImage(biInner, x, y, null);
            g.dispose();

            ImageIO.write(biOutter, "PNG", new File("Outter.png"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

How does FBReader do the pagination of HTML files in epub?

I'm trying to make an epub reader.
I want to do pagination the way FBReader does it.
I have the source code of FBReader, but I don't know where it implements pagination.
I have my own implementation of the other features; all I need from FBReader is the pagination.
Has anyone done a similar thing?
Thanks for taking the time to read this question.
PS: by pagination I mean splitting an HTML file into pages, depending on the screen size and the font size (language is also a consideration); when the font size changes, the page count also changes. The content of an epub file is HTML.
It is fascinating code. I would love to see a translation of the original student project (but I presume the original document is in Russian). As this is a port of a C++ project, it has an interesting style of coding in places.
The app keeps track of where you are in the book by using paragraph cursors (ZLTextParagraphCursor). This situation is comparable to database cursors and record pagination. The class responsible for serving up the current page and calculating the number of pages is ZLTextView.
As epubs are reflowable documents, not page-oriented ones, there isn't really a concrete definition of a page: it just depends on where in the document you happen to be looking (paragraph, word, character) and with what display settings.
As McLaren says, FBReader doesn't implement pagination itself: it uses ZLibrary, which is available from the same website as FBReader.
The original C++ code uses this to calculate the current page number:
size_t ZLTextView::pageNumber() const {
    if (textArea().isEmpty()) {
        return 0;
    }
    std::vector<size_t>::const_iterator i = nextBreakIterator();
    const size_t startIndex = (i != myTextBreaks.begin()) ? *(i - 1) : 0;
    const size_t endIndex = (i != myTextBreaks.end()) ? *i : textArea().model()->paragraphsNumber();
    return (myTextSize[endIndex] - myTextSize[startIndex]) / 2048 + 1;
}
The Java version uses this function to compute the page number:
private synchronized int computeTextPageNumber(int textSize) {
    if (myModel == null || myModel.getParagraphsNumber() == 0) {
        return 1;
    }
    final float factor = 1.0f / computeCharsPerPage();
    final float pages = textSize * factor;
    return Math.max((int)(pages + 1.0f - 0.5f * factor), 1);
}
This is located in org.geometerplus.zlibrary.text.view.TextView
It's so simplistic, though, that you might as well implement your own.
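For reference, a rough sketch of "your own" version of the same idea: estimate an average characters-per-page figure from the font metrics and view size, then divide the character count by it. All names and numbers below are assumptions, not FBReader's:
// Hypothetical stand-alone page-count estimate, in the spirit of computeTextPageNumber().
public final class SimplePaginator {

    /** Very rough estimate of how many characters fit on one page. */
    public static int charsPerPage(int pageWidthPx, int pageHeightPx, float charWidthPx, float lineHeightPx) {
        int charsPerLine = Math.max(1, (int) (pageWidthPx / charWidthPx));
        int linesPerPage = Math.max(1, (int) (pageHeightPx / lineHeightPx));
        return charsPerLine * linesPerPage;
    }

    /** Page count for a chapter of 'textLength' characters; changes when the font size changes. */
    public static int pageCount(int textLength, int charsPerPage) {
        return Math.max(1, (textLength + charsPerPage - 1) / charsPerPage);
    }
}
Re-running pageCount() after a font-size change (which changes charsPerPage()) reproduces the behaviour described in the question: the page numbers shift.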
As I understand it, FBReader uses three bitmaps: previous, current and next. The text is written onto these three bitmaps and then read back from them. On top of that, it calculates paragraph data about how long the text is, which drives the scroll indicator you see in the other examples. You can start reverse engineering at the android.view package, class BitmapManager. That should explain how they do their paging.
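A tiny sketch of that previous/current/next idea, assuming plain Android Bitmap/Canvas (none of these names come from FBReader): keep three page bitmaps and rotate them on a page turn, so only one new page ever has to be rendered.
import android.graphics.Bitmap;
import android.graphics.Canvas;

// Hypothetical three-slot page cache in the spirit of the previous/current/next bitmaps.
public final class PageBitmapCache {
    private Bitmap previous;
    private Bitmap current;
    private Bitmap next;
    private final int width;
    private final int height;

    public PageBitmapCache(int width, int height) {
        this.width = width;
        this.height = height;
    }

    /** Called when the user turns to the next page: shift the slots and render only the new "next". */
    public void turnForward(PageRenderer renderer, int newNextPageIndex) {
        previous = current;
        current = next;
        next = renderPage(renderer, newNextPageIndex);
    }

    private Bitmap renderPage(PageRenderer renderer, int pageIndex) {
        Bitmap bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
        renderer.drawPage(new Canvas(bitmap), pageIndex); // hypothetical callback that lays out and draws one page of text
        return bitmap;
    }

    /** Hypothetical interface: whatever actually lays out and draws a page of text. */
    public interface PageRenderer {
        void drawPage(Canvas canvas, int pageIndex);
    }

    public Bitmap getCurrent() {
        return current;
    }
}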
