I am using TensorFlow on Android. I installed the available TFClassify APK and ran it; it runs swiftly, with an inference time of no more than 400 ms. However, when I replaced the bundled trained model with my own model, it takes around 2000 ms of computation before displaying the result. Why is there such a difference, and how can I optimize my retrained_graph.pb?
Did you convert the retrained model to an optimized and quantized graph?
If not, try:
tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=retrained_graph.pb \
--output=optimized_graph.pb \
--input_names=Mul \
--output_names=final_result
tensorflow/bazel-bin/tensorflow/tools/quantization/quantize_graph \
--input=optimized_graph.pb \
--output=rounded_graph.pb \
--output_node_names=final_result \
--mode=weights_rounded
FYI, you have to build these tools first.
I'm working on an Android app, and I have to convert WebM files to MP3.
I really want to make a custom FFmpeg build, because it reduces the ffmpeg executable size to only 2 MB.
My library works absolutely fine when running on my PC, but I'm struggling to build it for Android. It seems the NDK's structure has changed and the tutorials are outdated, and I can't find a proper, recent guide for compiling for Android.
I would also like to target all architectures (aarch64, armv7, i686, and x86_64).
I've been on this for hours and fixed many errors, but still nothing has worked.
Please help me!
PS. I'm compiling on Linux; here is my configuration script:
#!/bin/bash
API=31 # target android api
OUTPUT=/home/romain/dev/android/ffmpeg_build
NDK=/home/romain/android-sdk/ndk/23.0.7599858
TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/linux-x86_64
SYSROOT=$TOOLCHAIN/sysroot
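ARCH=aarch64 # assumed: matches the aarch64 toolchain prefix below
CPU=armv8-a # assumed: generic 64-bit ARM target
TOOL_PREFIX="$TOOLCHAIN/bin/aarch64-linux-android"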
CC="$TOOL_PREFIX$API-clang"
CXX="$TOOL_PREFIX$API-clang++"
./configure \
--prefix=$OUTPUT \
--target-os=android \
--arch=$ARCH \
--cpu=$CPU \
--disable-everything \
--disable-network \
--disable-autodetect \
--enable-small \
--enable-decoder=opus,vorbis \
--enable-demuxer=matroska \
--enable-muxer=mp3 \
--enable-protocol=file \
--enable-filter=aresample \
--enable-libshine \
--enable-encoder=libshine \
--cc=$CC \
--cxx=$CXX \
--sysroot=$SYSROOT \
--extra-cflags="-Os -fpic"
make
make install
--prefix should point to $SYSROOT/usr/; you have misunderstood what --prefix means. It is not the output directory. Other than that, I don't see anything problematic. If it still fails, please provide ffbuild/config.log.
The repository pointed to by the previous answer is no longer being maintained.
Here is an updated one.
This is the android branch: https://github.com/arthenica/ffmpeg-kit/tree/main/android
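For the "all architectures" part of the question, a minimal sketch of how the compiler could be chosen per ABI is shown below. It assumes the clang wrapper naming used by recent NDKs (a target triple followed by the API level), reuses the NDK path and API level from the question as defaults, and introduces a hypothetical helper, ndk_cc. The arch names on the left are the ones I would pass to --arch; treat them as assumptions and check them against your FFmpeg version. The script only prints the compiler paths, it does not run configure.

```shell
#!/bin/bash
# Sketch only: map an FFmpeg --arch value to the NDK clang wrapper for that ABI.
# NDK and API default to the values from the question; ndk_cc is a hypothetical helper.
NDK=${NDK:-/home/romain/android-sdk/ndk/23.0.7599858}
API=${API:-31}
TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/linux-x86_64

ndk_cc() {
    local triple
    case "$1" in
        aarch64) triple=aarch64-linux-android ;;
        arm)     triple=armv7a-linux-androideabi ;;
        x86)     triple=i686-linux-android ;;
        x86_64)  triple=x86_64-linux-android ;;
        *) echo "unknown arch: $1" >&2; return 1 ;;
    esac
    # The wrapper name is <triple><api>-clang, e.g. aarch64-linux-android31-clang
    echo "$TOOLCHAIN/bin/${triple}${API}-clang"
}

# Print the compiler that would be passed to --cc for each target.
for arch in aarch64 arm x86 x86_64; do
    echo "$arch -> $(ndk_cc "$arch")"
done
```

Each configure run would then use its own --arch, --cc and --prefix, with a "make clean" between targets so that object files from one ABI are not reused for another.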
I followed both TensorFlow for Poets tutorials:
TensorFlow for Poets 1 and TensorFlow for Poets 2.
My retrained model gives accurate results in a test on my laptop, but after converting it to a .tflite file and classifying the same image on my Android device, the accuracy drops below 1%.
I used the following commands to retrain and convert:
python retrain.py \
--bottleneck_dir=tf_files/bottlenecks \
--how_many_training_steps=500 \
--model_dir=tf_files/models/ \
--summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
--output_graph=tf_files/retrained_graph.pb \
--output_labels=tf_files/retrained_labels.txt \
--architecture="${ARCHITECTURE}" \
--image_dir=tf_files/flower_photos
toco \
--input_file=tf_files/retrained_graph.pb \
--output_file=tf_files/optimized_graph.lite \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--input_shape=1,224,224,3 \
--input_array=Placeholder \
--output_array=final_result \
--inference_type=FLOAT \
--input_data_type=FLOAT
Strangely, the optimized file is almost as large as the original (both around 80 MB).
I'm using TensorFlow 1.9.0 and Python 3.6.6.
Any help or tip is appreciated!
Well, I figured it out. Apparently the ARCHITECTURE variable was not set to the right value. So if anyone encounters the same problem, first of all check that ARCHITECTURE is set correctly.
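One way to catch an unset variable before retraining is a small guard in the shell that runs the commands. This is only a sketch: check_architecture is a hypothetical helper, and the example value is the architecture string that appears elsewhere on this page, not necessarily the one you need. It only detects an empty or unset variable, not a wrong value.

```shell
# Sketch only: fail early if ARCHITECTURE is empty or unset.
# check_architecture is a hypothetical helper, not part of the tutorial scripts.
check_architecture() {
    if [ -z "${ARCHITECTURE:-}" ]; then
        echo "ARCHITECTURE is not set" >&2
        return 1
    fi
    echo "ARCHITECTURE=$ARCHITECTURE"
}

ARCHITECTURE="mobilenet_1.0_224"   # example value, an assumption
check_architecture
```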
I'm trying to convert a quantized TensorFlow .pb file to .lite using toco. retrain.py is here and here. The command for creating the .pb file is:
python retrain.py \
--bottleneck_dir=/mobilenet_q/bottlenecks \
--how_many_training_steps=4000 \
--output_graph=/mobilenet_q/retrained_graph_mobilenet_q_1_224.pb \
--output_labels=/mobilenet_q/retrained_labels_mobilenet_q_1_224.txt \
--image_dir=/data \
--architecture=mobilenet_1.0_224_quantized
I then tried to convert the .pb file to .tflite with this toco command:
bazel run --config=opt //tensorflow/contrib/lite/toco:toco \
-- --input_file=retrained_graph_mobilenet_q_1_224.pb \
--output_file=retrained_graph_mobilenet_q_1_224.lite \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--input_shape=1,224,224,3 \
--input_array=input \
--output_array=final_result \
--inference_type=FLOAT \
--input_data_type=FLOAT
I'm getting the error:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.contrib.lite.toco_convert(). Here is a list of operators for which you will need custom implementations: Dequantize.
I've searched GitHub and Stack Overflow but haven't come across a satisfactory answer.
The discussion and the solution are here.
I have an ASL (American Sign Language) dataset with 3,000 images per letter, and I am going to train my model following the TensorFlow codelabs, using this script:
python -m scripts.retrain \
--bottleneck_dir=tf_files/bottlenecks \
--how_many_training_steps=? \
--model_dir=tf_files/models/ \
--summaries_dir=tf_files/training_summaries/"mobilenet_1.0_224" \
--output_graph=tf_files/retrained_graph.pb \
--output_labels=tf_files/retrained_labels.txt \
--architecture="mobilenet_1.0_224" \
--image_dir=tf_files/dataset
Can anyone tell me how many steps I should choose for accurate predictions?
I am new to deep learning and still in the learning phase, so suggestions would be helpful.
If you have about 3,000 images per letter and 26 letters, that gives you about 78,000 images per epoch. If your batch size is b, that gives you 78,000/b training steps per epoch. I'd suggest training for 10 epochs first and seeing what happens.
This is experimental science. Print the accuracy after each epoch and see whether the network is still improving. Stop training when it stops improving significantly.
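The arithmetic above can be sketched in the shell to get a starting value for --how_many_training_steps. The batch size of 100 is an assumption (I believe it is the default of retrain.py's --train_batch_size, but check your copy), and steps_per_epoch is a hypothetical helper name.

```shell
# Sketch only: training steps for a given dataset size, batch size and epoch count.
# steps_per_epoch IMAGES BATCH prints ceil(IMAGES / BATCH) using integer arithmetic.
steps_per_epoch() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

images=$((3000 * 26))   # about 78,000 images, as in the answer
batch=100               # assumed batch size
epochs=10               # first experiment suggested in the answer

per_epoch=$(steps_per_epoch "$images" "$batch")
echo "steps per epoch: $per_epoch"
echo "steps for $epochs epochs: $(( per_epoch * epochs ))"
```

With these numbers the script prints 780 steps per epoch and 7800 steps for 10 epochs, so 7800 would be a reasonable first value to try before checking whether accuracy is still improving.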
I'm building some common GNU/Linux console utilities for my Android phone, but so far I have only been able to build them statically, with quite a size penalty. Can someone walk me through the steps for dynamic builds using shared libraries?
Here are the scripts I'm using for configuration:
./configure --host=arm-none-linux-gnueabi \
CC="arm-none-linux-gnueabi-gcc" \
CROSS_COMPILE="arm-none-linux-gnueabi-" \
CFLAGS=" -static $_XXFLAGS"
For shared:
./configure --host=arm-none-linux-gnueabi \
CC="arm-none-linux-gnueabi-gcc" \
CROSS_COMPILE="arm-none-linux-gnueabi-" \
--enable-shared=yes --enable-static=no
Do I need to make the libs on my Android phone available to my cross-compiler? Google isn't helping me here.
You would have to provide the location of the shared libraries that you want to link against. Please post the error that you're getting for a better answer, but take a look at my answer to "install 64-bit glib2 on 32-bit system for cross-compiling".
You should just need to add the right -L and -Wl,-rpath-link to the CFLAGS variable when you're running configure.
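As a rough sketch of what that could look like for the shared configure call in the question: the library directory below is a hypothetical path where a copy of the target's shared libraries lives on the build machine, and link_flags is a hypothetical helper. Note one deliberate difference from the answer: the flags are put in LDFLAGS, the conventional autoconf variable for linker flags, rather than CFLAGS. The script only prints the resulting configure command so it can be inspected first.

```shell
# Sketch only: build the linker flags for one directory of target shared libraries.
# link_flags is a hypothetical helper; TARGET_LIBS is a hypothetical path.
link_flags() {
    printf '%s' "-L$1 -Wl,-rpath-link,$1"
}

TARGET_LIBS=${TARGET_LIBS:-/opt/target-rootfs/usr/lib}

# Print the command instead of running it.
echo ./configure --host=arm-none-linux-gnueabi \
    CC=arm-none-linux-gnueabi-gcc \
    --enable-shared=yes --enable-static=no \
    "LDFLAGS=$(link_flags "$TARGET_LIBS")"
```

-L tells the linker where to find the libraries named with -l, while -rpath-link tells it where to look for the shared libraries those libraries depend on in turn.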