I'm working on an Android app that basically detects and recognizes text (OCR) in pictures taken by users.
I'm using OpenCV v4.3 and Tesseract v4, and because most of the OpenCV docs are in C++ and Python, I try to test things in Python before implementing them in Java on Android.
For EAST, it takes precisely 1.6 seconds to execute in Python, but in the Android app it takes a whole lot more (I haven't measured it yet).
I have been thinking of using either multithreading or an AsyncTask to process the bounding boxes in parallel (about 1 second of the execution time in Python), but since I'm new to mobile app dev and computer vision, I wanted to do some research/testing first and take advice from the SOF community.
Thanks.
Code used in Python : https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/
Code used in Java : https://gist.github.com/berak/788da80d1dd5bade3f878210f45d6742
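For context, the part I'm timing in Python is essentially the forward pass of the EAST network through OpenCV's dnn module, roughly like this (a trimmed-down sketch based on the linked tutorial; the .pb file is the pretrained EAST detector it uses, and the image path is just a placeholder):

    import time
    import cv2

    # Load the pretrained EAST detector (frozen TensorFlow graph).
    net = cv2.dnn.readNet("frozen_east_text_detection.pb")

    # EAST wants input dimensions that are multiples of 32; blobFromImage
    # resizes and subtracts the ImageNet channel means used in the tutorial.
    image = cv2.imread("receipt.jpg")  # placeholder image path
    blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                                 (123.68, 116.78, 103.94),
                                 swapRB=True, crop=False)

    # The two EAST output layers: confidence scores and box geometry.
    layer_names = ["feature_fusion/Conv_7/Sigmoid",
                   "feature_fusion/concat_3"]

    net.setInput(blob)
    start = time.time()
    scores, geometry = net.forward(layer_names)
    print("EAST forward pass took {:.2f}s".format(time.time() - start))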
After a long search, I finally discovered TFLite, and then I thought about converting the EAST detector model to the TFLite format. It might be a little bit slower, but it's progress; refer to: Converting EAST to TFLite.
Now I'm beginning to consider training my own model for text detection. I'll keep this thread updated for others in the future.
Related
I am trying to create an application like Snapchat that applies face filters while recording the video and saves it with the filter on.
I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me.
I want to provide face filters and apply them at the time of video recording on the face, and also save the video with the filter on the face.
Not The Easiest Question To Answer, But...
I'll give it a go, let's see how things turn out.
First of all, you should fill in some more details about the statements given in the question, especially what you're trying to say here:
I know there are packages like AR core and flutter_camera_ml_vision but these are not helping me.
How did you approach the problem and what makes you say that it didn't help you?
In the Beginning...
Next, let's get some needed basics out of the way to better understand your current situation and your level in the prerequisite areas of knowledge:
Do you have any experience using Computer Vision & Machine Learning frameworks in other languages / in other apps?
Do you have the required math skills needed to use this technology?
As you're using Flutter, my guess is that cross-platform compatibility is high priority, have you done much Flutter programming before and what devices are your main targets?
So, What is required for creating a Snapchat-like filter for use in live video recording?
Well, quite a lot of work happens behind the scenes when you apply a filter to live video using any app that implements this in a decent way.
Snapchat uses in-house software that they've built up over years, partly through their own efforts and partly through technology gained from multiple multi-million dollar acquisitions of established companies specializing in Computer Vision and AR, and it has steadily grown to be quite impressive, particularly over the last 5-6 years.
This isn't something you can throw together by yourself in an all-nighter and expect good results. There are tools available that ease the general learning curve, but those tools still require a firm understanding of the underlying concepts and technologies, and quite a lot of math.
The Technical Detour
OK, I know I may have gone a bit overboard here, but these are fundamental building blocks. Not many people are aware of the actual amount of computation needed for seemingly "basic" functionality, so please, TL;DR or not, this is fundamental stuff.
To create a good filter for live capture using the camera on something like an iPhone or Android device, you could, and most probably would, use AR, as you mentioned you wanted to in the end. But realize that this is a sub-set of the broad field of Computer Vision (CV), which uses various algorithms from Artificial Intelligence (AI) and Machine Learning (ML) for the main tasks of:
Facial Recognition
Given frames of video content from the live camera, define the area containing a human face (some detectors also work with animals, but let's keep it as simple as possible) and output a rectangle suitable for use as a starting point (x, y, width, height).
The analysis phase alone requires a rather complex combination of algorithms / techniques from different parts of the AI universe, and since this is video, not a single static image file, it must be continuously updated as the person / camera moves, so it has to run close to real time, in the millisecond range.
I believe different implementations combining HOG (Histogram of Oriented Gradients) from Computer Vision and SVMs (Support Vector Machines / Networks) from Machine Learning are still pretty common.
Detection of Facial Landmarks
This is what will define how well a certain effect / filter will adapt to different types of facial features and detect accessories like glasses, hats etc. Also called "facial keypoint detection", "facial feature detection" and other variants in different literature on the subject.
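To make these first two steps a bit more concrete, here is a minimal Python sketch using dlib, which ships exactly the kind of HOG-based detector mentioned above together with a pretrained 68-point landmark model (the .dat model file is a separate download, and the image path is just a placeholder):

    import cv2
    import dlib

    # HOG + linear SVM frontal face detector.
    detector = dlib.get_frontal_face_detector()
    # Pretrained 68-point facial landmark model (downloaded separately).
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    frame = cv2.imread("frame.jpg")  # placeholder: one video frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    for rect in detector(gray, 1):          # face bounding boxes
        shape = predictor(gray, rect)       # 68 landmark points per face
        for i in range(68):
            p = shape.part(i)
            cv2.circle(frame, (p.x, p.y), 2, (0, 255, 0), -1)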
Head Pose Estimation
Once you know a few landmark points, you can also estimate the pose of the head.
This is an important part of effects like "face swap" to correctly re-align one face with another in an acceptable manner. A toolkit like OpenFace (Uses Python, OpenCV, OpenBLAS, Dlib ++) contains a lot of useful functionality, capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation, delivering pretty decent results.
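As a rough sketch of the idea (not OpenFace itself): match a handful of the detected 2D landmarks against approximate 3D coordinates of a generic face model and solve a PnP problem with OpenCV. The 3D values below are commonly used approximations, and the 2D points would come from the landmark detector above:

    import numpy as np
    import cv2

    # Approximate 3D coordinates of a generic face model (nose tip, chin,
    # eye corners, mouth corners) -- commonly used reference values.
    model_points = np.array([
        (0.0, 0.0, 0.0),           # nose tip
        (0.0, -330.0, -65.0),      # chin
        (-225.0, 170.0, -135.0),   # left eye, left corner
        (225.0, 170.0, -135.0),    # right eye, right corner
        (-150.0, -150.0, -125.0),  # left mouth corner
        (150.0, -150.0, -125.0),   # right mouth corner
    ])

    def estimate_pose(image_points, frame_size):
        """image_points: matching 2D landmarks as a (6, 2) float array."""
        h, w = frame_size
        focal = w  # crude approximation of the focal length
        camera_matrix = np.array([[focal, 0, w / 2],
                                  [0, focal, h / 2],
                                  [0, 0, 1]], dtype="double")
        dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
        ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                                      camera_matrix, dist_coeffs)
        return rvec, tvec  # rotation / translation of the head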
The Compositing of Effects into the Video Frames
After the work above is done, the rest involves applying the target filter (dog ears, rabbit teeth, whatever) to the video frames using compositing techniques.
As this answer is starting to look more like an article, I'll leave it to you to go figure out if you want to know more of the details in this part of the process.
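Just to give a flavour of it, though: the simplest possible version is alpha-blending a transparent PNG onto the frame at a position derived from the landmarks. The file names and coordinates here are placeholders:

    import cv2

    def overlay_rgba(frame, overlay, x, y):
        """Alpha-blend a BGRA overlay onto a BGR frame at (x, y).
        Assumes the overlay fits entirely inside the frame."""
        h, w = overlay.shape[:2]
        roi = frame[y:y + h, x:x + w]
        alpha = overlay[:, :, 3:4] / 255.0   # per-pixel opacity, 0..1
        roi[:] = alpha * overlay[:, :, :3] + (1 - alpha) * roi
        return frame

    frame = cv2.imread("frame.jpg")                            # placeholder
    glasses = cv2.imread("glasses.png", cv2.IMREAD_UNCHANGED)  # keeps alpha
    # x, y would normally be computed from the eye landmarks.
    frame = overlay_rgba(frame, glasses, x=120, y=80)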
Hey, Dude. I asked for AR in Flutter, remember?
Yep.
I know, I can get a bit carried away.
Well, my point is that it takes a lot more than one would usually imagine creating something like you ask for.
BUT.
My best advice if Flutter is your tool of choice would be to learn how to use the Cloud-Based ML services from Google's Firebase suite of tools, Firebase Machine Learning and Google's MLKit.
Add to this some AR-specific plugins, like the ARCore Plugin, and I'm sure you'll be able to get the pieces together if you have the right background and attitude, plus a good amount of aptitude for learning.
Hope this wasn't digressing too far from your core question, but there are no shortcuts that I know of that cut more corners than what I've already mentioned.
You could absolutely use the flutter_camera_ml_vision plugin and its face recognition, which will give you positions for the landmarks of a face, such as the nose and eyes. Then simply stack the CameraPreview with a CustomPaint (using its foregroundPainter) in which you draw your filters, using the different landmarks as coordinates for e.g. glasses, beards or whatever you want at the correct position of the face in the camera preview.
Google ML Kit also has face recognition that produces landmarks and you could write your own flutter plugin for that.
You can capture frames from the live camera preview, reformat them, and then send them as a byte buffer to ML Kit or ML Vision. I am currently writing a Flutter plugin for ML Kit pose detection with live capture, so if you have any specific questions about that, let me know.
You will then have to merge the two surfaces and save to file in appropriate format. This is unknown territory for me so I can not provide any details about this part.
I've trained a TensorFlow model which takes my RTX2080 several seconds per action (in addition to 20-30 seconds to initialise the model).
I've been looking into turning this into an iOS/Android app running on TensorFlow Lite, but apart from the technical challenge of converting the model into a TensorFlow Lite model and everything else,
I am wondering about the feasibility of this running on phone hardware: even on a reasonably modern phone with an inbuilt GPU, would this likely be too slow for practical purposes?
Can anyone who has built an iOS/Android app with tensorflow lite where the phone is responsible for computation comment on performance and other practical considerations?
The only other option of having requests served by my own server(s) on AWS, for example, would turn into a major expense if the app had significant use.
I have created an image recognition app in Swift using TensorFlow Lite and did not find any performance issues with it. The prediction took anywhere from 3 to 5 seconds consistently, which I personally think is not that bad. So I would suggest going ahead with your app using the TF model. Thanks!
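If the conversion step itself is the part you're unsure about, and assuming your trained model can be exported as a SavedModel, the basic path is the TFLiteConverter; post-training quantization usually helps with on-device latency and model size, roughly like this:

    import tensorflow as tf

    # Assumes the trained model has been exported as a SavedModel directory.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

    # Post-training quantization: smaller model, usually faster on phones.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)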
I'd like my app to be able to detect when the user carrying the phone falls, using only accelerometer data (because it's the only sensor available on all smartphones).
I first tried to implement an algorithm to detect free fall (total acceleration nearing zero, followed by the high acceleration of hitting the ground, and then a short period of motionlessness to ditch false positives when the user is just walking downstairs quickly), but there are a lot of ways to fall, and with my algorithm implementation I can always find a case where a fall is not detected, or where a fall is wrongly detected.
I think Machine Learning can help me solve this issue, by learning from a lot of sensor values coming from different devices, with different sampling rates, what is a fall and what is not.
TensorFlow seems to be what I need for this, as it can run on Android, but while I could find tutorials on using it for offline image classification (here, for example), I didn't find any help with building a model that learns patterns from motion sensor values.
I tried to learn how to use TensorFlow using the Getting Started page, but failed, probably because I'm not fluent in Python and don't have a machine learning background (I'm fluent in Java and Kotlin, and used to the Android APIs).
I'm looking for help from the community to help me use Tensorflow (or something else in machine learning) to train my app to recognize falls and other motion sensors patterns.
As a reminder, Android reports motion sensor values at a variable rate, but provides a timestamp in nanoseconds for each sensor event, which can be used to infer the time elapsed since the previous event, and the sensor readings are provided as a 32-bit float for each axis (x, y, z).
If you have your data well organized, then you might be able to use the Java-based Weka machine learning environment:
http://www.cs.waikato.ac.nz/ml/weka/
You can use Weka to play around with all the different algorithms on your data. Weka uses an ARFF file for the data; it's pretty easy to create one if you have your data in JSON or CSV.
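For example, a small Python sketch that turns a CSV of numeric feature columns plus a final label column into an ARFF file Weka can load (the falling/not_falling labels just match this question; adapt the attribute names to whatever features you extract):

    import csv

    def csv_to_arff(csv_path, arff_path, relation="falls"):
        with open(csv_path) as f:
            rows = list(csv.reader(f))
        header, data = rows[0], rows[1:]
        with open(arff_path, "w") as out:
            out.write("@RELATION {}\n\n".format(relation))
            # All columns except the last are numeric features.
            for name in header[:-1]:
                out.write("@ATTRIBUTE {} NUMERIC\n".format(name))
            out.write("@ATTRIBUTE class {falling,not_falling}\n\n@DATA\n")
            for row in data:
                out.write(",".join(row) + "\n")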
Once you find an algorithm/model that works, you can easily put it into your Android app:
http://weka.wikispaces.com/Use+Weka+in+your+Java+code
You really don't need TensorFlow if you don't require deep learning algorithms, which I don't think you do. If you did need a deep learning algorithm, then DeepLearning4J is a Java-based open-source solution for Android:
https://deeplearning4j.org/android
STEP 1)
Create a training database.
You need samples of accelerometer data labelled 'falling' and 'not falling'.
So you will basically record the acceleration in different situations and label the recordings. To give an order of magnitude of the quantity of data: 1,000 to 100,000 periods of 0.5 to 5 seconds.
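A minimal sketch of how one recording could be turned into such labelled periods (the 50 Hz rate and 2-second windows are arbitrary choices for illustration; Android delivers events at a variable rate, so the nanosecond timestamps are used to resample first):

    import numpy as np

    def resample_axis(timestamps_ns, values, rate_hz=50):
        """Resample one irregularly sampled axis to a fixed rate."""
        t = (np.asarray(timestamps_ns) - timestamps_ns[0]) / 1e9  # seconds
        t_uniform = np.arange(0.0, t[-1], 1.0 / rate_hz)
        return np.interp(t_uniform, t, values)

    def make_windows(xyz, label, rate_hz=50, window_s=2.0):
        """Cut a resampled (N, 3) recording into flattened labelled windows."""
        size = int(rate_hz * window_s)
        X, y = [], []
        for start in range(0, len(xyz) - size + 1, size):
            X.append(np.asarray(xyz[start:start + size]).ravel())
            y.append(label)  # 'falling' or 'not_falling'
        return X, y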
STEP 2)
Use scikit-learn with Python. Try different models to classify your data.
X is your matrix of feature vectors, each containing a sample of the 3 acceleration axes.
Y is your target (falling / not falling).
You will create a classifier that can classify X to Y.
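A minimal sketch of that step (the RandomForest is just one model to try first; X and y would be the labelled windows from step 1, replaced here by random placeholders so the snippet runs):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Placeholders standing in for the labelled windows of step 1:
    # 200 windows of 100 samples x 3 axes, flattened to 300 features.
    X = np.random.randn(200, 300)
    y = np.random.choice(["falling", "not_falling"], size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))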
STEP 3)
Make your classifier compatible with Android.
sklearn-porter will port your classifier to the programming language that you like.
https://github.com/nok/sklearn-porter
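The export step would look roughly like this (check the project's README, but as far as I know this is the API; clf is the trained classifier from step 2):

    from sklearn_porter import Porter

    # clf is the trained scikit-learn classifier from step 2.
    porter = Porter(clf, language='java')
    output = porter.export(embed_data=True)

    # Save the generated Java source; name the file after the class it contains.
    with open('RandomForestClassifier.java', 'w') as f:
        f.write(output)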
STEP 4)
Implement this ported classifier in your app. Feed it with data.
I have a project that is an image-processing app for Android devices. For working with images I chose the OpenCV Android framework. The whole project consists of some general parts, such as splitting the input image into blocks, computing the DCT of each block, sorting the results, comparing the features obtained from each block, and finally showing some results.
I wrote this project, but it contains a lot of heavy computation like the DCT, sorting, etc., so I can't even run it on my emulator because it takes a long time and my laptop shuts down in the middle of processing. I decided to optimize the processing using parallel computing and GPU programming (it is obvious that some parts, like computing the DCT of the blocks, can be made parallel, but I am not sure about other parts like the sorting); however, I can't find any straightforward tutorial for doing this.
So here is the question: is there any way to do that or not? I need it to work on most Android devices, not just one particular device.
Or, besides GPU programming and parallel computing, is there any other way to speed the processing up? (Maybe there are other libraries better than OpenCV!)
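For clarity, the block/DCT stage I mean is essentially this kind of loop (sketched in Python with OpenCV since it's shorter than my Java code; the 8x8 block size and file name are just examples), and each block is independent, which is why I think it can be parallelized:

    import cv2
    import numpy as np

    def block_dct(gray, block=8):
        """Compute the DCT of each block of a grayscale image independently."""
        h, w = gray.shape
        h, w = h - h % block, w - w % block   # crop to a multiple of the block size
        img = np.float32(gray[:h, :w])
        coeffs = []
        for y in range(0, h, block):
            for x in range(0, w, block):
                coeffs.append(cv2.dct(img[y:y + block, x:x + block]))
        return coeffs

    gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
    blocks = block_dct(gray)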
I would like to know how are the performances of Processing sketches in Android. Here is the link for more info about Processing-Android : http://wiki.processing.org/w/Android#Instructions
I don't really know at what level Processing sits in Android or how it is implemented. That's why I would like to know how a Processing sketch embedded in an Android app performs in comparison with a normal Canvas from the Android API.
Processing lets us create programs relatively easily, and if the performance were good, I'm sure we could save a lot of time drawing certain things in our apps with Processing (at least for a beginner like me, Processing's language seems much easier than the Java used in Android, as we can easily call drawing functions, etc.).
So I would like to have your opinion on whether Processing sketches can be as efficient (in terms of performance/optimization) as using the Android Java API directly.
Thanks
I've done some tests with the examples given with Processing and I thought it could be useful to some person... So here are the results :
Device : Samsung Galaxy S II : Android 2.3.6, 1GB RAM, Dual-core 1.2 GHz Cortex-A9.
Tests : (on Processing 2.0a4)
No = too much lag to do anything (around 5 FPS)
Soso = we can see what the sketch is doing, but still a lot of lag (around 10-15 FPS)
OK = working (around 25 FPS or more)
Basics:
Pointillism=OK
Sprite=OK
... most of the basic examples are working correctly
Topics:
Interaction:
Follow examples =OK
Animation:
Sequential=OK
Effects :
Unlimited Sprites=OK
Motion:
Brownian=OK
Bouncy Bubbles=OK
Simulate :
Fluid=Soso
Flocking = OK (yet sometimes the FPS gets a bit lower, but acceptable)
Simple Particle System=OK
Smoke Particle System=OK
Spring=OK
Multiple Particle Systems=OK
Chain=OK
OpenGL:
Birds: without PShape3D=Soso, with PShape3D=OK
Earth=OK
Rocket=OK
Extrusion=NO
Electric=OK
CameraLight=OK
YellowTail=OK
Planets=OK
Contributed Libraries:
Fisica:
Bubbles=Soso
Droppings=Soso
Joints=OK
Buttons=OK
Polygons=OK
Restitutions=OK
PBox2D: couldn't get it working
Some Sketches from OpenProcessing.org
http://www.openprocessing.org/visuals/?visualID=3330 = OK
http://www.openprocessing.org/visuals/?visualID=1247 = OK
http://www.openprocessing.org/visuals/?visualID=8168 = OK
http://www.openprocessing.org/visuals/?visualID=5671 = OK
http://www.openprocessing.org/visuals/?visualID=10109 = NO
http://www.openprocessing.org/visuals/?visualID=7631 =NO
http://www.openprocessing.org/visuals/?visualID=7327 = NO
Note: I ran all sketches at their original size; I didn't rescale them to fit my SGS II (which has a resolution of 480 x 800), so I guess the performance may vary with the size of the sketch.
Conclusion: Processing is really interesting as a graphics library for Android. Most of the examples that come with Processing work very well and smoothly on my phone (including the OpenGL examples). Yet it is not as optimized as on PC; indeed, simulations like Smoke or Vortex, where many particles are involved, are really laggy.
The Fisica library works well on Android, which is a really good point.
Voila :)