I am developing an Android application that requires an ML model integration, and I am using TensorFlow Lite for deployment. The model is a custom Siamese network whose output shape is [1, 128]. When I run inference with the TFLite model in Python on Google Colab, the [1, 128] output vector is different from the one produced on my Android device. The input image is the same for both inferences, and so are the input and output shapes, yet I still get different output vectors from my Android phone and the Python TFLite model. I am using Firebase Machine Learning.
Android Code
val interpreter = Interpreter(model)
// Decode the image from the Uri and resize it to the model's 256x256 input
val imageBitmap = Bitmap.createScaledBitmap(
    BitmapFactory.decodeFileDescriptor(
        contentResolver.openFileDescriptor(fileUri, "r")?.fileDescriptor
    ),
    256, 256, true
)
// Fill a float32 buffer in RGB order (one float per channel, no normalization applied)
val inputImage = ByteBuffer.allocateDirect(256 * 256 * 3 * 4).order(ByteOrder.nativeOrder())
for (ycord in 0 until 256) {
    for (xcord in 0 until 256) {
        val pixel = imageBitmap.getPixel(xcord, ycord)
        inputImage.putFloat(Color.red(pixel) / 1.0f)
        inputImage.putFloat(Color.green(pixel) / 1.0f)
        inputImage.putFloat(Color.blue(pixel) / 1.0f)
    }
}
imageBitmap.recycle()
val modelOutput = ByteBuffer.allocateDirect(outputSize).order(ByteOrder.nativeOrder())
interpreter.run(inputImage, modelOutput)
modelOutput.rewind()
val probs = modelOutput.asFloatBuffer()
success(ImageProcessResult.Success(probs))
Kindly help me, I need it soon. Any help is appreciated.
You are resizing the bitmap to [256, 256] on the Android side.
Even the slightest change in the input vector will change the output vector, and resizing the bitmap changes the input vector: Android's scaling will not produce exactly the same pixels as whatever resize you use in Colab. However, if the model is general enough, the final result, which would be the argmax of the output vector (in classification), would still be the same.
In the case of a Siamese network, I believe it won't affect the final result (the similarity score) in a meaningful way, as long as the model is not overfitted.
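If you want to check how much the resize actually matters, compare the two [1, 128] embeddings with a similarity metric rather than element by element. A small sketch (copy each output into a FloatArray first):

import kotlin.math.sqrt

// Cosine similarity between the Colab embedding and the on-device embedding;
// values close to 1.0 mean the resize difference is not changing the model's behaviour much.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "embeddings must have the same length" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}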
Related
I'm trying to implement handwriting text recognition in my Android app. I found TensorFlow to be a doable solution, so I've tried to create a .tflite model from the Handwriting Recognition Model from Keras.
The tutorial states that it is fully compatible with TF Lite
I managed to create the .tflite model and then, in Android, initialize the Interpreter with the model. I then ran the Interpreter with a ByteBuffer of a bitmap, and the output has shape [1,32,81], which is an array of floats. As far as I know, the output should just be a String: the predicted text for the given input. How can I get/decode the output to the String I need?
I had a few problems converting the model to .tflite, but I managed to do it using the following flags:
converter = tf.lite.TFLiteConverter.from_keras_model(prediction_model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
converter._experimental_lower_tensor_list_ops = False
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tf_lite_model = converter.convert()
open('textRecognitionModel.tflite', 'wb').write(tf_lite_model)
According to the TF Lite docs, you have to use the following dependencies:
implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly-SNAPSHOT'
// This dependency adds the necessary TF op support.
implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly-SNAPSHOT'
After finally creating a .tflite model file, I added it to the assets directory of my Android app and tried importing it. However, it crashed with no error message, apparently a memory failure. I updated the libraries to the latest versions:
"org.tensorflow:tensorflow-lite:2.11.0"
"org.tensorflow:tensorflow-lite-select-tf-ops:2.11.0"
And read my model into a ByteBuffer as follows (I'm not sure if I'm doing it right regarding the native-order logic):
// filename is the name of the model file in the assets dir
val inputStream = assetManager.open(filename)
val output = ByteArrayOutputStream()
inputStream.copyTo(output, 1024)
val file = output.toByteArray()
val bb = ByteBuffer.allocateDirect(file.size)
bb.order(ByteOrder.nativeOrder())
bb.put(file)
return bb
And finally, the initialization of the Interpreter API is working.
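As an aside, a common alternative to copying the asset through a ByteArrayOutputStream is to memory-map the model file directly. A rough sketch, assuming the .tflite asset is stored uncompressed (e.g. via aaptOptions' noCompress):

import android.content.res.AssetManager
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Memory-map the model from assets; the Interpreter accepts a MappedByteBuffer directly.
fun loadModelFile(assetManager: AssetManager, filename: String): MappedByteBuffer {
    val fd = assetManager.openFd(filename)
    FileInputStream(fd.fileDescriptor).channel.use { channel ->
        return channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
    }
}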
I then run the interpreter on a ByteBuffer of a Bitmap. So I'm expecting the model to read the input and give the predicted text (a String) as output. However, the output has shape [1,32,81], so I created an array to read the output and ran the Interpreter on it:
val output = Array(1) {
    Array(32) {
        FloatArray(81)
    }
}
// byteBuffer: ByteBuffer of bitmap
interpreter.run(byteBuffer, output)
And the output is an array of floats, and I don't understand what it means. Shouldn't it just be a String? I've attached a screenshot of the output array.
Can someone please help me?
I would highly appreciate any tips or solutions :)
Before converting the prediction_model to TFLite format, you need to add a custom decoding layer at the end, and then convert the combined model to TFLite.
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)  # This line is present in the handwriting_recognition notebook.

def CTCDecoder():
    def decode_batch_predictions(pred):
        input_len = np.ones(pred.shape[0]) * pred.shape[1]
        # Use greedy search. For complex tasks, you can use beam search.
        results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][:, :max_length]
        # Iterate over the results and get back the text
        output_text = []
        for res in results:
            # print(res)
            res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
            output_text.append(res)
        return output_text

    return tf.keras.layers.Lambda(decode_batch_predictions, name='decode')

decoded_pred_model = keras.models.Model(prediction_model.input, outputs=CTCDecoder()(prediction_model.output))
Now you can convert decoded_pred_model to your TFLite format and use it. CTCDecoder is the custom layer added on top of prediction_model.output to decode the [1,32,81] predictions into text.
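If the Lambda layer gives you trouble during conversion, you can also decode the raw [1,32,81] output directly on the Android side with a greedy CTC decode: take the argmax at each of the 32 time steps, collapse repeats, and drop the blank class. A rough sketch in Kotlin, assuming (as ctc_decode does) that the last class index, 80, is the CTC blank, and that charset is your own index-to-character table:

// Greedy CTC decoding of a [1, 32, 81] output.
// charset: your model's index -> character mapping (an assumption here).
fun greedyCtcDecode(output: Array<Array<FloatArray>>, charset: List<Char>, blankIndex: Int = 80): String {
    val sb = StringBuilder()
    var previous = -1
    for (timestep in output[0]) {                       // 32 time steps
        // argmax over the 81 class scores
        var best = 0
        for (c in timestep.indices) if (timestep[c] > timestep[best]) best = c
        // collapse repeats and skip the blank class
        if (best != previous && best != blankIndex) sb.append(charset[best])
        previous = best
    }
    return sb.toString()
}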
I've adapted TensorFlow Lite's Salad Detector Colab and am able to train my own models and get them working on Android, but I'm trying to count objects and I need more than the default limit of 25 detections.
The model spec has a setting for increasing detections, so in the above Colab I inserted the following code:
spec = model_spec.get('efficientdet_lite4')
spec.tflite_max_detections=50
And on the Android side of things
val options = ObjectDetector.ObjectDetectorOptions.builder()
    .setMaxResults(50)
    .setScoreThreshold(10)
    .build()
The models are training fine but I'm still only able to detect 25 Objects in a single image.
Is there a problem with my models? Or are there any other settings I can change in my Android code that will increase the number of detections?
Solved this myself: after Googling a different SO question on efficientdet_lite4, I stumbled on an aha moment.
My problem was here:
spec = model_spec.get('efficientdet_lite4')
spec.tflite_max_detections=50
I needed to change the whole spec of the model:
spec = object_detector.EfficientDetLite4Spec(
    model_name='efficientdet-lite4',
    uri='https://tfhub.dev/tensorflow/efficientdet/lite4/feature-vector/2',
    hparams='',
    model_dir=None,
    epochs=50,
    batch_size=64,
    steps_per_execution=1,
    moving_average_decay=0,
    var_freeze_expr='(efficientnet|fpn_cells|resample_p6)',
    tflite_max_detections=50,  # the setting that actually took effect
    strategy=None,
    tpu=None,
    gcp_project=None,
    tpu_zone=None,
    use_xla=False,
    profile=False,
    debug=False,
    tf_random_seed=111111,
    verbose=0
)
From there I was able to train the model and things worked on the Android side of things.
This has been bugging me for a few weeks!
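For completeness, on the Android side the options shown in the question are then handed to the detector when it is created. A minimal sketch using the Task Vision library (context, bitmap, and the "model.tflite" file name are assumptions):

import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// "model.tflite" is a placeholder for your trained EfficientDet-Lite4 file in assets.
val detector = ObjectDetector.createFromFileAndOptions(context, "model.tflite", options)
// results can now contain up to the 50 detections allowed by setMaxResults(50)
val results = detector.detect(TensorImage.fromBitmap(bitmap))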
I am trying to port my TorchScript model from Python to Android (Java). Currently, I have run into a problem trying to squeeze/unsqueeze my input and output tensors in Android. In Python, here's how I did it:
tensor = torch.Tensor(image_n.transpose(2, 0, 1).astype('float32')).unsqueeze(0)
tensor = tensor.to(device)
output tensor:
with torch.no_grad():
    prob = model.forward(tensor)
prediction = prob.squeeze().numpy().astype('uint8')
In Android, I managed to input and output my tensors following the Pytorch tutorial as such:
final Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(mBitmap,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB, TensorImageUtils.TORCHVISION_NORM_STD_RGB);
final float[] inputs = inputTensor.getDataAsFloatArray();
and the output tensor:
Map<String, IValue> outTensors = mModule.forward(IValue.from(inputTensor)).toDictStringKey();
The problem is that, without squeeze and unsqueeze, although the code manages to run, the dimensions are wrong and I didn't manage to get the correct output.
Does anyone know if there is actually a squeeze/unsqueeze function for PyTorch on Android?
Edit: just to add on, my input tensor has a size of (3, 224, 416) (an RGB image), and my output tensor has a size of (1, 1, 224, 416) (a grayscale image).
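One way to emulate squeeze/unsqueeze with the PyTorch Android API is to rebuild the tensor with an explicit shape via Tensor.fromBlob, since the underlying float data is contiguous. A rough, untested sketch in Kotlin (the same calls exist from Java):

import org.pytorch.Tensor

// unsqueeze(0): wrap a (3, 224, 416) tensor into (1, 3, 224, 416)
fun unsqueeze0(t: Tensor): Tensor =
    Tensor.fromBlob(t.dataAsFloatArray, longArrayOf(1) + t.shape())

// squeeze(): drop all size-1 dimensions, e.g. (1, 1, 224, 416) -> (224, 416)
fun squeezeAll(t: Tensor): Tensor =
    Tensor.fromBlob(t.dataAsFloatArray, t.shape().filter { it != 1L }.toLongArray())

As far as I know, bitmapToFloat32Tensor already returns a tensor with a leading batch dimension ([1, 3, H, W]), so it may only be the output side that needs reshaping.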
I created a Tensorflow model which takes a single 700x700 48-dimension "image" as an input (input shape is {1, 700, 700, 48}).
To do so, I used NumPy's numpy.concatenate([array_of_images], -1), where array_of_images is an array of 16 700x700 JPEG images.
I converted the model to Tensorflow Lite and I'm running it on Android.
No conversion errors or anything - all ops are valid and supported.
My question is - where in Android (or how) can I create an N-dimensional object (or container) and use it as an input to the model?
I think you have 16 RGB images. On Android, you load each bitmap into an image tensor like this:
var bitmap1 = ...                           // load the bitmap from anywhere
var tImage1 = TensorImage(DataType.FLOAT32)
tImage1.load(bitmap1)
for each image, and then
val inputs = arrayOf(tImage1.buffer, tImage2.buffer, /* ... */ tImage16.buffer)
interpreter.runForMultipleInputsOutputs(inputs, output)    // output is a Map<Int, Any> from output index to its buffer
I'm not sure, but this can give you an idea.
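If the converted model actually exposes a single input tensor of shape [1, 700, 700, 48] (the 16 RGB images concatenated along the channel axis), another option is to pack everything into one direct ByteBuffer yourself and pass that to interpreter.run(). A rough sketch, assuming float32 input and that the per-pixel channel order matches the numpy.concatenate(..., -1) layout used at training time:

import android.graphics.Bitmap
import android.graphics.Color
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Pack 16 bitmaps (each 700x700 RGB) into one [1, 700, 700, 48] float32 buffer.
// Per-pixel channel order assumed: image0 R,G,B, image1 R,G,B, ..., image15 R,G,B.
fun packImages(bitmaps: List<Bitmap>): ByteBuffer {
    val buffer = ByteBuffer.allocateDirect(1 * 700 * 700 * 48 * 4).order(ByteOrder.nativeOrder())
    for (y in 0 until 700) {
        for (x in 0 until 700) {
            for (bitmap in bitmaps) {                          // 16 images -> 48 channel values per pixel
                val pixel = bitmap.getPixel(x, y)
                buffer.putFloat(Color.red(pixel).toFloat())    // apply your own scaling here if the model expects it
                buffer.putFloat(Color.green(pixel).toFloat())
                buffer.putFloat(Color.blue(pixel).toFloat())
            }
        }
    }
    buffer.rewind()
    return buffer
}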
I have a basic question about how to determine the image pre-processing parameters, like "IMAGE_MEAN" and "IMAGE_STD", for various TensorFlow pre-trained models. The Android sample applications for TensorFlow provide these parameters for a certain inception_v3 model in ClassifierActivity.java (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/ClassifierActivity.java), as shown below:
"If you want to use a model that's been produced from the TensorFlow for Poets codelab, you'll need to set IMAGE_SIZE = 299, IMAGE_MEAN = 128, IMAGE_STD = 128"
How do I determine these parameters for other TF models?
Also, while converting the TF model to a Core ML model to be used on iOS, there are additional image pre-processing parameters that need to be specified (like red_bias, green_bias, blue_bias and image_scale), as shown in the code segment below. The parameters below are for the inception_v1_2016.pb model. If I want to use another pre-trained model, like ResNet50 or MobileNet, how do I determine these parameters?
tf_converter.convert(tf_model_path = 'inception_v1_2016_08_28_frozen.pb',
mlmodel_path = 'InceptionV1.mlmodel',
output_feature_names = ['InceptionV1/Logits/Predictions/Softmax:0'],
image_input_names = 'input:0',
class_labels = 'imagenet_slim_labels.txt',
red_bias = -1,
green_bias = -1,
blue_bias = -1,
image_scale = 2.0/255.0
)
Any help will be greatly appreciated
Unfortunately, the preprocessing requirements of various ImageNet models are still under-documented. ResNet and VGG models both use the same preprocessing parameters. You can find the biases for each of the color channels here:
https://github.com/fchollet/deep-learning-models/blob/master/imagenet_utils.py#L11
The preprocessing for Inception_V3, MobileNet, and other models can be found in the individual model files of this repo: https://github.com/fchollet/deep-learning-models
When converting to Core ML you always need to specify preprocessing biases on a per channel basis. So in the case of a VGG-type preprocessing, you can just copy each channel's biases directly from the code linked to above. It's super important to note that the biases are applied (added) BEFORE scaling. You can read more about setting the proper values here: http://machinethink.net/blog/help-core-ml-gives-wrong-output/
The conversion code you posted looks good for MobileNet or Inception_V3 models, but would not work for VGG or ResNet. For those you'd need:
tf_converter.convert(...
    red_bias = -123.68,
    green_bias = -116.78,
    blue_bias = -103.94
)
No scaling is required.
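For reference, on the Android side IMAGE_MEAN and IMAGE_STD are applied per channel when filling the model's input buffer. A minimal sketch of the pattern used by the TensorFlow Android demo (pass the IMAGE_MEAN = 128 and IMAGE_STD = 128 values quoted above, or whatever your model was trained with):

import android.graphics.Bitmap
import android.graphics.Color
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Normalizes each channel as: normalized = (pixel - imageMean) / imageStd
fun bitmapToNormalizedBuffer(bitmap: Bitmap, imageSize: Int, imageMean: Float, imageStd: Float): ByteBuffer {
    val buffer = ByteBuffer.allocateDirect(imageSize * imageSize * 3 * 4).order(ByteOrder.nativeOrder())
    for (y in 0 until imageSize) {
        for (x in 0 until imageSize) {
            val pixel = bitmap.getPixel(x, y)
            buffer.putFloat((Color.red(pixel) - imageMean) / imageStd)
            buffer.putFloat((Color.green(pixel) - imageMean) / imageStd)
            buffer.putFloat((Color.blue(pixel) - imageMean) / imageStd)
        }
    }
    buffer.rewind()
    return buffer
}

// e.g. bitmapToNormalizedBuffer(scaledBitmap, 299, 128f, 128f) for the TF-for-Poets values quoted above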