I want to build an Android app for gaze tracking, and I would like to ask which of the following tools would give the best results:
Google Cloud Vision API
OpenCV (e.g. a Haar cascade classifier)
Firebase ML Kit with facial landmarks
I don't know whether you plan to build a commercial application or whether this is for research; the things to consider differ a bit between the two scenarios.
For object tracking I'd probably go with Google's ML Kit. It has some ready-to-use models that also work offline, and it simplifies the hard work of using pure TensorFlow (even on iOS) if you want to use your own custom models. Your hard work will then be creating an efficient model, not running it.
I haven't used the Google Cloud Vision API yet. I have only used GCP machines to train a NN, and they came in handy for that.
OpenCV is a good option, but it might be hard to implement and maintain afterwards, and your app size will increase considerably. I used HaarCascade in my final paper 2 years ago; the work was hard and the result not that accurate. Today I'd check OpenCV's DNN module and go with YOLO, like here. To summarize, I'd only recommend OpenCV if you have some specific image-processing demand, but first check Android's ColorFilter or ImageFilterView. If you choose OpenCV, I'd recommend compiling it yourself with CMake, as described here, with only the modules you need, so your app size won't increase that much.
There are also other options like Dlib or PyTorch. I worked with dlib's SVM with a custom model last year; its results were good, but it was slow to run, about 3~4 seconds, compared to a NN with TensorFlow that runs in 50~60 milliseconds (even faster with quantized models). I don't have experience with PyTorch or other frameworks to share.
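For readers wondering what "quantized" means in the answer above, here is a minimal sketch of 8-bit affine quantization, the general idea behind such models: floats are stored as small integers plus a scale and a zero point. The class name, method names, and the example scale are illustrative assumptions, not part of any framework's API.

```java
// Sketch of 8-bit affine quantization; names and values here are illustrative.
final class QuantizeSketch {
    // Maps a float to int8 using q = round(x / scale) + zeroPoint, clamped to [-128, 127].
    static byte quantize(float x, float scale, int zeroPoint) {
        int q = Math.round(x / scale) + zeroPoint;
        return (byte) Math.max(-128, Math.min(127, q));
    }

    // Recovers an approximation of the original float: x ~ (q - zeroPoint) * scale.
    static float dequantize(byte q, float scale, int zeroPoint) {
        return (q - zeroPoint) * scale;
    }

    public static void main(String[] args) {
        float scale = 0.5f;   // assumed scale, normally derived from the value range
        int zeroPoint = 0;    // assumed zero point
        byte q = quantize(1.0f, scale, zeroPoint);
        System.out.println("q=" + q + " back=" + dequantize(q, scale, zeroPoint));
    }
}
```

Integer arithmetic on 8-bit values is typically cheaper than float arithmetic on mobile hardware, which is one reason quantized models can run faster, at the cost of some precision.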
Related
What I have: A trained recurrent neural network in TensorFlow.
What I want: A mobile application that can run this network as fast as possible (inference mode only, no training).
I believe there are multiple ways to accomplish my goal, but I would like your feedback, corrections, and additions because I have never done this before.
TensorFlow Lite. Pro: Straightforward, available on Android and iOS. Contra: Probably not the fastest method, right?
TensorRT. Pro: Very fast + I can write custom C code to make it faster. Contra: Used for Nvidia devices so no easy way to run on Android and iOS, right?
Custom Code + Libraries like OpenBLAS. Pro: Probably very fast, and it can be linked on both Android and iOS (if I am not mistaken). Contra: Is there much use for recurrent neural networks? Does it really work well on Android + iOS?
Re-implement Everything. I could also rewrite the whole computation in C/C++ which shouldn't be too hard with recurrent neural networks. Pro: Probably the fastest method because I can optimize everything. Contra: Will take a long time and if the network changes I have to update my code as well (although I am willing to do it this way if it really is the fastest). Also, how fast can I make calls to libraries (C/C++) on Android? Am I limited by the Java interfaces?
Some details about the mobile application. The application will take a sound recording of the user, do some processing (like Speech2Text) and output the text. I am not looking for a solution that is "fast enough" but for the fastest option, because this will run over very large sound files, so almost every speed improvement counts. Do you have any advice on how I should approach this problem?
Last question: If I try to hire somebody to help me out, should I look for an Android/iOS-, Embedded- or Tensorflow- type of person?
1. TensorFlow Lite
Pro: it uses GPU optimizations on Android; it is fairly easy to incorporate into a Swift/Objective-C app and very easy into Java/Android (just add one line to build.gradle); you can also convert a TF model to CoreML.
Cons: if you use a C++ library, you will have some issues adding TFLite as a library to your Android/Java-JNI project (there is no native way to build such a library without JNI); there is no GPU support on iOS (though the community is working on MPS integration).
Also, here is a reference to the TFLite speech-to-text demo app; it could be useful.
2. TensorRT
TensorRT uses cuDNN, which uses the CUDA library. There is CUDA for Android, but I'm not sure whether it supports the full functionality.
3. Custom code + Libraries
I would recommend using Android's Neural Networks API (NNAPI) and CoreML; if you need to go deeper, you can use the Eigen library for linear algebra. However, writing your own custom code is not beneficial in the long term: you would need to support, test, and improve it, which is a huge deal and more important than performance.
4. Re-implement Everything
This option is very similar to the previous one. Implementing your own RNN (LSTM) should be fine as long as you know what you are doing; just use one of the linear algebra libraries (e.g. Eigen).
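To give a sense of how much code a hand-written LSTM involves, here is a minimal sketch of a single LSTM cell forward step using plain arrays. It uses the standard gate equations; the class name, the weight layout, and the zero-initialized weights are assumptions for illustration. A real port would load trained weights and delegate the inner products to a linear algebra library such as Eigen.

```java
// Minimal LSTM cell forward step with plain arrays; a sketch, not a tuned implementation.
final class LstmCellSketch {
    final int inputSize;
    final int hiddenSize;
    // Gate order along the first axis: 0 = input, 1 = forget, 2 = candidate, 3 = output.
    final float[][][] w; // input weights,     shape [4][hiddenSize][inputSize]
    final float[][][] u; // recurrent weights, shape [4][hiddenSize][hiddenSize]
    final float[][] b;   // biases,            shape [4][hiddenSize]

    LstmCellSketch(int inputSize, int hiddenSize) {
        this.inputSize = inputSize;
        this.hiddenSize = hiddenSize;
        w = new float[4][hiddenSize][inputSize];
        u = new float[4][hiddenSize][hiddenSize];
        b = new float[4][hiddenSize];
    }

    static float sigmoid(float z) {
        return (float) (1.0 / (1.0 + Math.exp(-z)));
    }

    // Pre-activation of unit j of one gate: w[gate][j] . x + u[gate][j] . h + b[gate][j]
    private float preActivation(int gate, int j, float[] x, float[] h) {
        float sum = b[gate][j];
        for (int k = 0; k < inputSize; k++) sum += w[gate][j][k] * x[k];
        for (int k = 0; k < hiddenSize; k++) sum += u[gate][j][k] * h[k];
        return sum;
    }

    // Advances one time step. Updates the cell state c in place and returns the new hidden state.
    float[] step(float[] x, float[] h, float[] c) {
        float[] hNext = new float[hiddenSize];
        for (int j = 0; j < hiddenSize; j++) {
            float i = sigmoid(preActivation(0, j, x, h));
            float f = sigmoid(preActivation(1, j, x, h));
            float g = (float) Math.tanh(preActivation(2, j, x, h));
            float o = sigmoid(preActivation(3, j, x, h));
            c[j] = f * c[j] + i * g;                // new cell state
            hNext[j] = o * (float) Math.tanh(c[j]); // new hidden state
        }
        return hNext;
    }

    public static void main(String[] args) {
        LstmCellSketch cell = new LstmCellSketch(3, 2); // weights are all zero here
        float[] h = new float[2];
        float[] c = {1.0f, -1.0f};
        h = cell.step(new float[] {0.1f, 0.2f, 0.3f}, h, c);
        System.out.println(java.util.Arrays.toString(h));
    }
}
```

The loops in preActivation are where nearly all the time goes, so that is the part worth handing to an optimized matrix-vector routine.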
The overall recommendation would be to:
1. try to do it server side: use some lossy compression and server-side speech2text;
2. try using TensorFlow Lite; measure performance, find bottlenecks, and try to optimize;
3. if some parts of TFLite are too slow, reimplement them as custom operations (and make a PR to TensorFlow);
4. if the bottlenecks are at the hardware level, go to the 1st suggestion.
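For the "measure performance" step, a small timing harness is often enough to compare candidates before reaching for a profiler. The sketch below is plain Java; the class name, the method name, and the placeholder workload are assumptions, and the Runnable stands in for whatever single inference call is being measured.

```java
// Small latency harness; the Runnable stands in for one inference call.
final class LatencyBench {
    // Runs `warmup` untimed calls, then `runs` timed calls, and returns the middle
    // sample in milliseconds (the upper middle when `runs` is even).
    static double medianMillis(Runnable work, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) work.run();
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.run();
            nanos[i] = System.nanoTime() - start;
        }
        java.util.Arrays.sort(nanos);
        return nanos[runs / 2] / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real inference call.
        Runnable fakeInference = () -> {
            double acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
            if (acc < 0) throw new IllegalStateException("unreachable");
        };
        System.out.printf("median: %.3f ms%n", medianMillis(fakeInference, 5, 21));
    }
}
```

The median is used rather than the mean so that a few slow outliers (garbage collection, thermal throttling) do not dominate the result. The warm-up calls give the runtime a chance to compile hot code before timing starts.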
Maybe you should try this lib; it can run on Android and iOS devices.
https://github.com/Tencent/TNN
I have successfully implemented face detection in my app using Android's Camera.FaceDetectionListener (following the Android Developers guide), but unfortunately some devices do not support this feature. Is there another way to achieve the same result?
I usually work with OpenCV to make image processing algorithms.
http://opencv.org/platforms/android.html
Its algorithms are much better than Android face detection; besides, if you download the SDK, you get a face detection example.
Here are the downloads:
http://opencv.org/downloads.html
The SDK handles the Camera 2 API, which works at 30 fps, and provides a wrapper if you want to process video frames. There are also samples where you can mix Java OpenCV code with JNI code to make your algorithm much faster.
Unfortunately, these examples are Eclipse projects, but they are not difficult to merge into an Android Studio project.
I hope these references are useful.
Cheers.
I'm new to developing Android apps in general.
I'm trying to create an application that, given an image, detects faces and gives me the eye locations and other info.
I've done some research and found a few things, such as the Android FaceDetector API and OpenCV.
Could anyone give me some advice on how to make an app like this, or send me a link with any related info? All help would be great!
Thanks, Daniel.
I have worked with face recognition for a while.
If you want to use OpenCV, a more thorough search on SO will turn up things like this one.
The best one for me is the SDK provided by Lockheed Martin, but it's too expensive for a single person.
Edited
"Face detection and face recognition are different things ;) Face detection tells you where is the face and face recognition tells you who's the owner of the face"
If you choose OpenCV, you can find the full documentation on the official page.
Here is an overview:
You can use OpenCV in your app via "OpenCV Manager" or via "Static Initialization on OpenCV Android".
About the first one:
OpenCV Manager is an Android service targeted to manage OpenCV library binaries on end users devices. It allows sharing the OpenCV dynamic libraries between applications on the same device. The Manager provides the following benefits:
Less memory usage. All apps use the same binaries from service and do not keep native libs inside themselves;
Hardware specific optimizations for all supported platforms;
Trusted OpenCV library source. All packages with OpenCV are published on Google Play market;
Regular updates and bug fixes;
About the second one:
A complete tutorial using Eclipse.
You might try the new Android face API. See the tutorial here about how to detect faces and facial landmarks:
https://developers.google.com/vision/detect-faces-tutorial
I explain how to do it in this article. I used TensorFlow Lite with a MobileFaceNet implementation, achieving very accurate results at surprisingly high speed.
You'll find the source code and an APK in this repo.
My Situation
I want to build an application that can recognize an image and show a corresponding model.
For example, I point the camera at a printed image on a card that I designed myself (say, the Apple logo), and the app shows a 3D model (.md2), also designed by me, on the screen.
I have googled many frameworks that work on both Android & iOS, but the documentation is very limited and the trial versions do not let me test this.
for example,
http://www.metaio.com/sdk/
But their demo is not comprehensive enough to suit my situation.
My Question
1. Can anyone share their experience of developing with an AR framework (not the AR core) on Android & iOS?
2. Is there any framework that lets me register an image as a key and map it to my model with just a couple of lines of code?
3. If Q2 is not possible, is there an approach in some framework that can achieve the same goal, even if it is more complex?
// Logic flow (pseudocode; `sdk` and its methods are hypothetical)
String key = "APPLE";
sdk.putKeyImage(key, "apple.png");
...
if (sdk.identifiedAs(key)) {
    // Do something, for example:
    sdk.showApple3DModel();
    play("showSnakeEatApple.mp4");
}
I'm an Android app developer. In my experience on Android, you can go for
Vuforia (https://developer.vuforia.com/resources/sdk/android)
Wikitude (http://www.wikitude.com/developer/documentation/android)
These two are good enough to implement an AR app on Android.
Vuforia is well documented and has more libraries, most of which are free, while Wikitude is best for building apps faster. I recommend either of them for AR app development on Android.
I guess you should compare several AR frameworks to find a proper solution for your problem. Many different AR tools come to mind, for example Vuforia, LayAR, and Kudan AR. For a deeper look at the options, read this comparison of the most popular frameworks: http://cases.azoft.com/top-5-tools-creating-augmented-reality-apps/
Yes, I have worked on this a little. I did some research on augmented reality and found a well-known cross-platform SDK:
VUFORIA SDK
You will find various SDKs on Google, but in my experience Vuforia is the best and most flexible, and it offers lots of options.
To use it you need to install the NDK in Eclipse, because its core is C++ code that is compiled by the NDK.
video tutorial click here
sdk and documentation click here
steps click here
Hope this is helpful. Thank you and happy coding.
I am trying to create an OCR app for Android. I want to run the OCR on the device itself rather than sending the image to a server and waiting for the results. Is there a library available for this? I would consider buying one as well.
Thanks.
While I know zilch about Android development, what I can offer is the suggestion to explore porting 'tesseract'. I have used it a little, along with a few other FOSS OCR tools (some actually based on tesseract), and have found it to be the best of the FOSS lot. AFAIR, it does only basic recognition, and you need to massage the document before feeding it to tesseract. Where commercial s/w surpasses FOSS tools is in the paraphernalia of preprocessing/massaging it can do, and even some postprocessing: guessing tables, fitting text and unrecognized graphics back into positionally accurate locations on the page, and converting into other document formats.
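As one concrete example of the "massaging" mentioned above, a common first preprocessing step is to binarize the image so that text becomes pure black on pure white. The sketch below works on packed ARGB pixels in plain Java; the class name and the fixed threshold are assumptions, and a real pipeline would more likely use an adaptive threshold. On Android such an array could come from Bitmap.getPixels, but that wiring is not shown here.

```java
// Fixed-threshold binarization on packed ARGB pixels; a sketch of one preprocessing step.
final class BinarizeSketch {
    // Returns a new array where each pixel is opaque white if its luma is at least
    // `threshold` (0..255) and opaque black otherwise.
    static int[] binarize(int[] argb, int threshold) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            // Integer approximation of the BT.601 luma weights (0.299, 0.587, 0.114).
            int luma = (299 * r + 587 * g + 114 * b) / 1000;
            out[i] = luma >= threshold ? 0xFFFFFFFF : 0xFF000000;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFF101010, 0xFFF0F0F0, 0xFF808080}; // dark, light, mid gray
        for (int p : binarize(pixels, 128)) {
            System.out.println(Integer.toHexString(p));
        }
    }
}
```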
Tesseract can be tried on a regular Linux desktop on your favourite distro. IIRC, there is a MS-Windows version as well, if you care.