Using OCR mobile vision to anchor image to detected text - android

I am using the Text Recognition API (Mobile Vision/ML) by Google to detect text in the camera feed. Once I detect text and confirm that it equals "HERE WE GO", I draw a heart shape beside the detected text using the boundaries passed back by the detector.
The problem I am facing is that the shape jumps around and lags behind the text. I want it to look anchored to the detected text. Is there something I can do to improve that?
I have heard about the ARCore library, but it seems to rely on existing images to determine the anchor, whereas in my case the anchor can be any text that matches "HERE WE GO".
Any suggestions?

I believe you are trying to overlay text on the camera preview in real time. There will be a small delay between the camera input and the detection. Since the API is asynchronous, by the time the output returns you are already showing another frame.
To alleviate that, you can either make the processing step synchronous by using a lock/mutex, or overlay another image that only refreshes after the processing is done.
We have some examples here: https://github.com/firebase/quickstart-android/tree/master/mlkit
I also fixed a similar problem on iOS by using DispatchGroup: https://github.com/googlecodelabs/mlkit-ios/blob/master/translate/TranslateDemo/CameraViewController.swift#L245
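As a rough illustration of the "only refresh after processing is done" idea, here is a minimal plain-Java sketch. The class name FrameGate and its methods are hypothetical and not part of ML Kit or Mobile Vision. It assumes you call tryAcquire() for every camera frame, start a detection only when it returns true, and call release() from the detector's completion callback.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Drops incoming frames while a detection is still in flight (hypothetical helper). */
final class FrameGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    // Latest finished result as {left, top, right, bottom}; null until a detection completes.
    private final AtomicReference<float[]> latestBox = new AtomicReference<>();

    /** Call for every camera frame; returns true only if the caller may start a detection. */
    boolean tryAcquire() {
        return busy.compareAndSet(false, true);
    }

    /** Call from the detector's completion callback; pass null if nothing was detected. */
    void release(float[] boxOrNull) {
        if (boxOrNull != null) {
            latestBox.set(boxOrNull.clone());
        }
        busy.set(false);
    }

    /** Call from the overlay's draw pass; returns a copy of the last finished box, or null. */
    float[] latestBox() {
        float[] box = latestBox.get();
        return box == null ? null : box.clone();
    }
}
```

With this arrangement the overlay is redrawn only from completed results, and frames that arrive while the detector is busy are skipped instead of queuing up. It does not remove the detection latency itself.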

Option 1: Refer to the TensorFlow Android sample here:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
especially these classes:
1. Object tracker: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/tracking/ObjectTracker.java
2. Overlay: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/OverlayView.java
3. Camera Activity and Camera Fragment: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java
Option 2: Sample code can be found in the codelab below. They are doing something similar for barcodes.
https://codelabs.developers.google.com/codelabs/barcodes/index.html?index=..%2F..index#0

Related

Android ARCore - how to render a UI element without ArSceneView

I am a beginner in ARCore and I need to display an AR object that can be tapped and can respond with an action (e.g. displaying another activity).
I have tried to do it using examples such as this one - https://creativetech.blog/home/ui-elements-for-arcore-renderable - which use Sceneform to display UI elements. But Sceneform has some disadvantages for my application, and I also do not need plane detection. My questions are:
Can I display a 'tappable' object, a UI element such as a button or a TextView, but with GLSurfaceView instead of Sceneform?
If UI elements cannot be displayed this way, is it possible to react to a tap on an object displayed on a GLSurfaceView?
Sceneform has now been 'open sourced and archived' - see the note at (https://developers.google.com/sceneform/develop).
The main example for ARCore, at this time, is OpenGL based and will allow you to display an AR object as I think you want.
Have a look here for the overview: https://developers.google.com/ar/develop/java/quickstart
Some of the links to the code appear to be broken at the moment, but the code is available here (look at the 'hello_ar-java' sample): https://github.com/google-ar/arcore-android-sdk/tree/master/samples
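On the second question (reacting to a tap on something drawn with OpenGL), one common approach is to project the object's position to screen coordinates and compare it with the touch position. The sketch below is plain Java and is only an illustration. TapHitTest is a hypothetical helper, not an ARCore API. It assumes you already have the object's model-view-projection matrix as a column-major float[16] and that the tap coordinates come from a touch listener on the GLSurfaceView.

```java
/** Screen-space tap test for an object drawn with OpenGL (hypothetical helper). */
final class TapHitTest {
    /**
     * Projects a point with a 4x4 model-view-projection matrix in column-major
     * order (the OpenGL convention). Returns {xPixels, yPixels}, or null if the
     * point is behind the camera.
     */
    static float[] projectToScreen(float[] mvp, float x, float y, float z,
                                   int viewWidth, int viewHeight) {
        float clipX = mvp[0] * x + mvp[4] * y + mvp[8] * z + mvp[12];
        float clipY = mvp[1] * x + mvp[5] * y + mvp[9] * z + mvp[13];
        float clipW = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
        if (clipW <= 0f) {
            return null;
        }
        float ndcX = clipX / clipW;
        float ndcY = clipY / clipW;
        float screenX = (ndcX * 0.5f + 0.5f) * viewWidth;
        // Screen y grows downward, while normalized device y grows upward.
        float screenY = (1f - (ndcY * 0.5f + 0.5f)) * viewHeight;
        return new float[] {screenX, screenY};
    }

    /** True if the tap lies within radiusPx of the projected object center. */
    static boolean isTapOnObject(float[] mvp, float[] center, int viewWidth, int viewHeight,
                                 float tapX, float tapY, float radiusPx) {
        float[] p = projectToScreen(mvp, center[0], center[1], center[2], viewWidth, viewHeight);
        if (p == null) {
            return false;
        }
        float dx = tapX - p[0];
        float dy = tapY - p[1];
        return dx * dx + dy * dy <= radiusPx * radiusPx;
    }
}
```

A fixed pixel radius is a simplification. For objects whose on-screen size changes a lot with distance, you would probably want to derive the radius from the projected size of the object instead.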

Empty view when displaying WebView (WebGL and video) inside 3D app

I inserted a WebView in my 3D game. I can retrieve a URL, display it on a canvas, and use it as a texture in my 3D rendering.
Everything works fine except for video, WebGL, and three.js rendering.
In these three cases I cannot see anything except the two canvases from three.js (the frame information canvas and the checkbox options).
In all three cases, the view where the WebGL content or the video should be rendered is empty.
For video, I found that I can see the image only when I touch the seek bar under the play and audio buttons.
So it looks like the rendering view is hidden, not visible, or something similar.
So I tried to get the view rendered by the WebView to check its status. But a WebView exposes only one View, and it is not possible to get the views of the elements inside it to check them.
So I am not sure whether the problem comes from the view status or from a conflict with the 3D environment.
I would like to know if someone has an idea about the problem, or if someone could tell me how to retrieve the detailed views from a WebView, if that is possible.
NEW INFORMATION
I think the problem could come from the following: running WebGL (OpenGL 1.4) from a WebView inside a GvrActivity (OpenGL ES 2) may cause a conflict when both render with Android OpenGL at the same time.
Concerning media and audio: I am also running voice recognition (Context.AUDIO_SERVICE) and a MediaPlayer to watch the loader video.
LAST TESTING:
If I run the WebView in another activity, everything is fine. The problem is that I would like to get access to that activity so that the WebView's view can be displayed in the main activity layout.
Is that possible?
LAST TESTING:
Changing the Context/Activity that starts the WebView does not resolve the problem.
I found that if I attach the WebView with getWindow().addContentView there is no problem, but if I add the view to MyGLSurfaceView (which extends GvrView) I get the problem. I think it is because I already use the MediaPlayer to render video onto a 3D mesh and OpenGL to draw the 3D scene, but I am not sure of that.
LAST TESTING (I am struggling; "je rame", as we say in French):
I have tried everything, or at least I think I have. After three weeks I am starting to run out of resources.
Concerning the audio from the WebView, I am sure I get a conflict with the voice recognition, mAudioManager = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE); which I think is used by the WebView, because I can use it when displaying video inside a canvas in 3D OpenGL ES.
Concerning the video from the WebView, I need to touch the progress bar to see anything. But since the WebView is a black box, there is no way to get the progress bar view and act on it.
Concerning WebGL and three.js: I can display web text and images, but nothing related to OpenGL. I get a white display, not a transparent one (I set transparency to check). They can only be displayed outside of the OpenGL environment and outside of Surface and SurfaceTexture.
Concerning the WebView: there is only one view output. So all the view rendering seems to be done internally, using a tag parser for position and canvas construction, so no individual views can be used. But suppose the output view of the WebView is a complete bitmap of all the tag canvases (that is my own interpretation: a tag tree gives fast rendering access and an easily structured design). In that case I wonder why WebGL and three.js cannot be copied using SurfaceTexture.updateTexImage(), if the WebView output is a bitmap canvas, because everything is a 2D canvas once it is displayed inside a view.
So, three weeks trying to find an answer. I hope that someone will find it.
I was planning to build an art gallery in VR where anyone could watch video or 3D art. I could make it work for video and 360 video, but not for WebGL and three.js.
3D inside 3D is the top of art technology. Imagine that you could go into any 3D shop, or open a web URL to any artist's video, or whatever. The latter is possible, but not WebGL and three.js. Watching YouTube 360 VR is easy, I can do it, and it works very well with voice commands; it just takes a hack to get the download address. Why block things, when the best approach is to be open and let imagination create tomorrow's tools?
So I am giving up and going back to my OpenCL recognition application, which is nearly finished. Shape recognition could be a great new tool.
By the way, do not hesitate to ask me for my APK if you want to check for yourself. Stack Overflow is allowed to contact me, as they have my email address.
LAST TEST (07/07/2020)
I think I now understand why the video and WebGL displays are not accessible. For video, the original intention was surely to prevent copying of the video, along with optimizing the rendering. For WebGL there is no need to prevent copying, but the rendering needs to be optimized to avoid redrawing the whole WebView picture when only one part is modified. So it looks like video and WebGL are rendered in another thread, independently of the other HTML tags.
Some say that because it is done in another thread it is not possible to access it. That is false: the canvas where the rendering is done can be made accessible if that is wanted. It is just a bitmap for video and a frame buffer (viewport) for WebGL. I agree that it is done by the GPU and not in software, but the resulting canvas (rectangle) could be accessible, because in the end it is displayed. The problem is that it is no longer part of the WebView's final view; only the position where it must be displayed is sent to the GPU, as viewport coordinates.
So when I call SurfaceTexture.updateTexImage() I get only part of the web page, with no video and no WebGL.
Here is an example to help understand:
Load this URL, "https://webglfundamentals.org/webgl/webgl-2d-triangle-with-position-for-color.html", and look at the code in your browser.
I can see and manipulate the slider without problem, but the GL result is not visible. This is because the canvas is an OpenGL context: it is processed by the GPU and copied directly to the final frame buffer at the right place, but it is not part of the WebView's final view, which is unique. For video, if I can get the slider I can get some pictures, but I have to get it and check its modification. That is hard work, and I am not really interested in video since I found another way to watch it in OpenGL.
So I think Android could make an effort to give access to the canvas used for WebGL rendering, and find a way to expose it inside WebView. 3D inside 3D is already four dimensions. It could be good. It is not available at the moment, but it will be.
I will look at it, just out of curiosity, but I think there is no chance.
Here is the only answer I have found:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/3wrULcul8lw
Today (24/09/2020) I read this:
DRM-protected video can be presented only on an overlay plane. Video players that support protected content must be implemented with SurfaceView. Software running on unprotected hardware can't read or write the buffer; hardware-protected paths must appear on the Hardware Composer overlay (that is, protected videos disappear from the display if Hardware Composer switches to OpenGL ES composition).
30/09/2020:
I am able to see and manipulate the WebGL display using Surface.lockHardwareCanvas, but I lose focus during manipulation and sometimes it does not work. A small problem ;))
But when I use lockHardwareCanvas to display a YouTube web page, with or without video, everything freezes: I can neither update the view nor scroll it.
But it is progressing ;))
Here is my solution.
WebGL and three.js can be displayed using Surface.lockHardwareCanvas.
The problem is using Surface.lockHardwareCanvas with HTML other than an accelerated view. In that case there is no frame to listen for, so the view is not refreshed, and refreshing must be done another way.
Concerning OpenGL, WebGL, and three.js: it depends on the application. Some are not displayed well, for several reasons (no GL_TEXTURE_EXTERNAL_OES, canvas size not a factor of 2, or a bad view size). And since there is no new onFrameAvailable callback, another way to refresh the view is needed.
It took quite a long time to understand, but it is working, even in GvrActivity ;))
Regards.

How to render in Unity without the editor/scene/player being visible?

In my Android-Unity app I have some data, such as text, text size, and text coordinates, that I will use to create the output and store it as a screenshot that I will use later in my app.
But I need to make this happen while the user isn't seeing the Unity player/scene. So my question is: is it possible to render the contents and then take a screenshot without the user seeing the Unity editor/player/scene, whatever one may call it? Is there a way to do it in the background? Thanks!
I'm not sure exactly what you're trying to accomplish, but I can clarify the use of a culling mask here. Here is what the scene looks like. I have embedded the view from Camera 2 into the bottom left of the game. You can see that Camera 2 is not displaying the floor. This is because I set its culling mask to only respond to objects tagged with a custom layer here, whereas the Main Camera's culling mask is set to everything. Now, anything I tag with the "Custom" layer will be visible in the second camera and everything else will not be. I'm assuming what you want to do from here is to tag the things you want visible when you take a screenshot with a specific layer, then set the Culling Mask of your "Screenshot Camera" to that layer and take a screenshot with that camera. That way you can specify which objects/UI you want visible in that camera.

Android add real time overlay to camera feed

I am trying to add a real-time overlay to video captured from the camera feed. Which API should I use, and how?
The idea is as follows:
1. Get the camera feed (left)
2. Generate an overlay from the feed; I'm using deep models here (middle)
3. Add the overlay on top of the original video feed in real time (right)
OpenCV (https://opencv.org) will allow you to take your video feed and, frame by frame:
1. load the frame
2. analyse it and generate your overlay
3. add your overlay to the frame (or replace the frame with your merged overlay)
4. display and/or save the frame
Depending on the amount of processing you need to do, the platform or device you are running on, and whether you need it in real time, you may find it hard to complete this on every frame at high frame rates. One solution, if it is acceptable for your problem domain, is to do the processing only every nth frame.
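The "every nth frame" idea can be sketched in plain Java as below. This is only an illustration under assumptions: NthFrameOverlay is a hypothetical helper, the frame and overlay types are left generic, and the analyzer function stands in for whatever model or OpenCV processing you actually run.

```java
import java.util.function.Function;

/** Runs an expensive analysis on every nth frame and reuses the last overlay in between. */
final class NthFrameOverlay<F, O> {
    private final int n;
    private final Function<F, O> analyzer;
    private long frameIndex = 0;
    private O lastOverlay; // null until the first analysis has run

    NthFrameOverlay(int n, Function<F, O> analyzer) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
        this.analyzer = analyzer;
    }

    /** Returns the overlay to draw on this frame, recomputing it only on frames 0, n, 2n, ... */
    O overlayFor(F frame) {
        if (frameIndex % n == 0) {
            lastOverlay = analyzer.apply(frame);
        }
        frameIndex++;
        return lastOverlay;
    }
}
```

The trade-off is that the overlay is up to n - 1 frames stale, so on fast-moving content a smaller n (or some form of tracking between analyses) may be needed.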
I have used something similar with the GraphicOverlay for text.
Also, ViewOverlay may be something to look into.

Android: How to detect these objects in images? (Image included). Tried OpenCV and metaioSDK, but neither works well enough

I have recently been working on object detection/recognition in images captured from an Android device camera.
The objects I am trying to detect are all kinds of buttons that look like this:
Picture of buttons
So far I have tried OpenCV and the metaio SDK. Results:
OpenCV always detected something, but gave lots of false hits. It is also too much work to collect all the pictures for what I have in mind. I have tried three approaches with OpenCV:
1. Feature detection (SURF, ORB and so on) -> was way too slow, and there were not enough features on my objects.
2. Template matching -> seems to work only when the template is exactly a part cut out of the scene image.
3. Training classifiers -> this has worked best so far, but is too much work for my goal and still gives too many false detections.
The metaio SDK worked OK when I took my reference images (the icon part of each button) out of a picture like the one shown above, then printed the full image and pointed my Android device camera at the printed picture. But when I tried with the real buttons (not a picture of them), almost nothing was detected anymore. The metaio documentation says that reference images need to have lots of features and color differences, and should not consist only of white text. Well, as you can see, my reference images are exactly the opposite of what they should be. But that's just how the buttons look ;)
So my question is: does anyone have a suggestion about what else I could try in order to detect and recognize each of those buttons when I point my Android camera at them?
As a suggestion, can you try the following approach:
Class-Specific Hough Forest for Object Detection
They provide a C code implementation. Compile and run it and see the results, then replace the positive and negative training images with your own according to the following rules:
In a car you will need to define the following 3 areas:
1. target region (the image you provided is a good representation of a target region)
2. nearby working area (this area holds information about your target's relative location); I would recommend an area 3-5 times the size of the target region, around the target
3. everything outside the above, which can be used for negative images
Then:
Use "many" positive images (100-1000) at different viewing angles (-30 to +30 degrees) and various distances.
You will have to make assumptions about the viewing angles and distances at which your users will use the application. The stricter they are, the better the performance you will get. A simple "hint" camera overlay can give people a good idea of what you expect the working area to be.
Use a negative image set a few times (3-5) larger, made up of pictures of things that might be in front of the camera but should not contribute any target position information.
Do not use big images; somewhere around 100-300px in width should be enough.
Assemble the database, and modify the configuration file that the code comes with. Run the program and see if the performance is OK for your needs.
The program will return a voting map cloud for the object you are looking for. Apply a Gaussian blur to it, then a threshold (you will have to make another assumption for this threshold value).
The extracted mask will define the area you are looking for. The size of the masked region can give you a good estimate of the object scale. Given this information, it will be much easier to select the proper template and perform template matching.
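To make the blur, threshold, and mask-size steps concrete, here is a small plain-Java sketch. It is only an illustration and not part of the Hough Forest code. The class name VotingMapMask is hypothetical, the voting map is assumed to be a float[row][column] array, and a separable 1-2-1 binomial kernel stands in for a real Gaussian blur (in practice you would likely use OpenCV's blur and threshold functions).

```java
/** Turns a voting map into a binary mask and a bounding box (illustrative, pure Java). */
final class VotingMapMask {
    /** Separable 1-2-1 binomial blur, a small approximation of a Gaussian; edges are clamped. */
    static float[][] blur(float[][] src) {
        int h = src.length;
        int w = src[0].length;
        float[][] tmp = new float[h][w];
        float[][] out = new float[h][w];
        for (int y = 0; y < h; y++) {          // horizontal pass
            for (int x = 0; x < w; x++) {
                float left = src[y][Math.max(x - 1, 0)];
                float right = src[y][Math.min(x + 1, w - 1)];
                tmp[y][x] = (left + 2f * src[y][x] + right) / 4f;
            }
        }
        for (int y = 0; y < h; y++) {          // vertical pass
            for (int x = 0; x < w; x++) {
                float up = tmp[Math.max(y - 1, 0)][x];
                float down = tmp[Math.min(y + 1, h - 1)][x];
                out[y][x] = (up + 2f * tmp[y][x] + down) / 4f;
            }
        }
        return out;
    }

    /** Marks every cell whose value is at least the threshold. */
    static boolean[][] threshold(float[][] map, float minValue) {
        boolean[][] mask = new boolean[map.length][map[0].length];
        for (int y = 0; y < map.length; y++) {
            for (int x = 0; x < map[0].length; x++) {
                mask[y][x] = map[y][x] >= minValue;
            }
        }
        return mask;
    }

    /** Returns {minX, minY, maxX, maxY} (inclusive) of the marked cells, or null if none. */
    static int[] boundingBox(boolean[][] mask) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < mask.length; y++) {
            for (int x = 0; x < mask[0].length; x++) {
                if (mask[y][x]) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        return maxX < 0 ? null : new int[] {minX, minY, maxX, maxY};
    }
}
```

The width and height of the returned box give a rough object scale, which can then be used to pick or resize the template before template matching.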
(Also some thoughts) You can also try a small trick: use the goodFeaturesToTrack function with the mask you got to obtain a set of locations, and compare them with the corresponding locations on a template. Construct a sum of squared differences (SSD) and solve it for the rotation, scale and translation parameters by minimizing the alignment error (but I am not sure whether this approach will work).
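For the alignment idea, if you already have matched point pairs (template location i corresponds to scene location i), the sum of squared differences over a scale + rotation + translation model has a closed-form minimizer. The sketch below is plain Java and purely illustrative: SimilarityFit is a hypothetical name, and establishing the correspondences (the hard part) is assumed to have been done elsewhere.

```java
/** Least-squares 2D similarity transform (scale, rotation, translation) between matched points. */
final class SimilarityFit {
    /**
     * src and dst hold {x, y} pairs with matching indices.
     * Returns {scale, angleRadians, tx, ty} minimizing
     * sum_i || scale * R(angle) * src_i + t - dst_i ||^2.
     */
    static double[] fit(double[][] src, double[][] dst) {
        int n = src.length;
        if (n < 2 || dst.length != n) {
            throw new IllegalArgumentException("need at least 2 matched point pairs");
        }
        double meanSx = 0, meanSy = 0, meanDx = 0, meanDy = 0;
        for (int i = 0; i < n; i++) {
            meanSx += src[i][0];
            meanSy += src[i][1];
            meanDx += dst[i][0];
            meanDy += dst[i][1];
        }
        meanSx /= n; meanSy /= n; meanDx /= n; meanDy /= n;

        // With the transform written as [a -b; b a], the centered SSD is quadratic in a and b.
        double a = 0, b = 0, norm = 0;
        for (int i = 0; i < n; i++) {
            double px = src[i][0] - meanSx, py = src[i][1] - meanSy;
            double qx = dst[i][0] - meanDx, qy = dst[i][1] - meanDy;
            a += px * qx + py * qy;
            b += px * qy - py * qx;
            norm += px * px + py * py;
        }
        if (norm == 0) {
            throw new IllegalArgumentException("source points must not all coincide");
        }
        a /= norm;
        b /= norm;
        // The translation maps the transformed source centroid onto the destination centroid.
        double tx = meanDx - (a * meanSx - b * meanSy);
        double ty = meanDy - (b * meanSx + a * meanSy);
        return new double[] {Math.hypot(a, b), Math.atan2(b, a), tx, ty};
    }
}
```

Plain least squares like this is sensitive to wrong matches, so with noisy correspondences you would probably want to wrap it in an outlier-rejection scheme such as RANSAC.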
