Bear with me while I try to explain.
I have an Android application which uses OpenCV to convert a YUV420 image into a bitmap and transfers it to an Interpreter. The problem is, every time I run it, I get the exact same class prediction with the exact same confidence values, regardless of what I point the camera at.
...
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
...
Now before you mention that my model isn't trained enough: I've tested the exact same .tflite file in the TFLite example provided in the TensorFlow Codelab-2. It works as it should and recognizes all 4 of my classes with 90%+ accuracy. In addition, I used the label_image.py script to test the .pb file from which my .tflite was derived, and it works as it should. I've trained the model on nearly 5000 images of each class. Since it works in other apps, I'm guessing there's no problem with the model, only with my implementation, though I just can't pinpoint it.
The following code is used to create Mat(s) from the image bytes:
//Retrieve the camera Image from ARCore
val cameraImage = frame.acquireCameraImage()
val cameraPlaneY = cameraImage.planes[0].buffer
val cameraPlaneUV = cameraImage.planes[1].buffer
// Create a new Mat with OpenCV. One for each plane - Y and UV
val y_mat = Mat(cameraImage.height, cameraImage.width, CvType.CV_8UC1, cameraPlaneY)
val uv_mat = Mat(cameraImage.height / 2, cameraImage.width / 2, CvType.CV_8UC2, cameraPlaneUV)
var mat224 = Mat()
var cvFrameRGBA = Mat()
// Convert the two YUV planes into a single RGBA/BGRA frame
Imgproc.cvtColorTwoPlane(y_mat, uv_mat, cvFrameRGBA, Imgproc.COLOR_YUV2BGRA_NV21)
// I've tried the following in the above line
// Imgproc.COLOR_YUV2RGBA_NV12
// Imgproc.COLOR_YUV2RGBA_NV21
// Imgproc.COLOR_YUV2BGRA_NV12
// Imgproc.COLOR_YUV2BGRA_NV21
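One way to pick between the NV12 and NV21 codes, instead of guessing, is to inspect the chroma planes: ARCore delivers YUV_420_888, where planes[1] is U and planes[2] is V, and a pixelStride of 2 on those planes means the chroma is interleaved. A rough diagnostic sketch, reusing cameraImage from above:
// Diagnostic sketch: determine the chroma layout before choosing a conversion code.
val uPlane = cameraImage.planes[1] // U plane in YUV_420_888
val vPlane = cameraImage.planes[2] // V plane in YUV_420_888
if (uPlane.pixelStride == 2) {
    // Chroma is interleaved: plane[1] starts with U (NV12 order),
    // plane[2] starts with V (NV21 order); pass the matching plane
    // and COLOR_YUV2RGBA_NV12 / COLOR_YUV2RGBA_NV21 to cvtColorTwoPlane.
}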
The following code is used to add image data into a ByteBuffer:
// imageFrame is a Mat object created from OpenCV by processing a YUV420 image received from ARCore
override fun setImageFrame(imageFrame: Mat) {
...
// Convert mat224 into a float array that can be sent to Tensorflow
val rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
rgbBytes.order(ByteOrder.nativeOrder())
val frameBitmap = Bitmap.createBitmap(imageFrame.cols(), imageFrame.rows(), Bitmap.Config.ARGB_8888, true)
// convert Mat to Bitmap
Utils.matToBitmap(imageFrame, frameBitmap, true)
frameBitmap.getPixels(intValues, 0, frameBitmap.width, 0, 0, frameBitmap.width, frameBitmap.height)
// Iterate over all pixels and retrieve information of RGB channels
intValues.forEach { packedPixel ->
rgbBytes.putFloat((((packedPixel shr 16) and 0xFF) - 128) / 128.0f)
rgbBytes.putFloat((((packedPixel shr 8) and 0xFF) - 128) / 128.0f)
rgbBytes.putFloat(((packedPixel and 0xFF) - 128) / 128.0f)
}
}
.......
private var labelProb: Array<FloatArray>? = null
.......
// and classify
labelProb?.let { interpreter?.run(rgbBytes, it) }
.......
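(The initialization of labelProb is elided above; for a four-class float model the output shape is presumably (1, 4), so a hypothetical sketch of it would be:)
// Hypothetical: one batch, four classes, matching the model's output tensor.
labelProb = Array(1) { FloatArray(4) }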
I checked the bitmap that gets converted from the Mat. It looks as good as it possibly can.
Any ideas anyone?
Update One
I changed the implementation of the setImageFrame method slightly to match an implementation here. Since it works for him, I hoped it would work for me as well. It still doesn't.
override fun setImageFrame(imageFrame: Mat) {
// Reset the rgb bytes buffer
rgbBytes.rewind()
// Iterate over all pixels and retrieve information of RGB channels only
for(rows in 0 until imageFrame.rows())
for(cols in 0 until imageFrame.cols()) {
val imageData = imageFrame.get(rows, cols)
// The Mat type is 24, i.e. CV_8UC4: 4 channels, depth 0 (8-bit unsigned)
rgbBytes.putFloat(imageData[0].toFloat())
rgbBytes.putFloat(imageData[1].toFloat())
rgbBytes.putFloat(imageData[2].toFloat())
}
}
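Since the Mat is CV_8UC4, the values pushed here are raw 0..255 channel values, unlike the (x - 128) / 128 normalization in the first version. If the float model expects [-1, 1], a sketch of the normalized equivalent of the three putFloat lines:
// Same inner loop, normalized to [-1, 1] like the first version:
rgbBytes.putFloat((imageData[0].toFloat() - 128f) / 128f)
rgbBytes.putFloat((imageData[1].toFloat() - 128f) / 128f)
rgbBytes.putFloat((imageData[2].toFloat() - 128f) / 128f)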
Update Two
Suspicious of my float model, I changed it to a pre-built MobileNet Quant model just to eliminate one possibility. The problem persists with this as well.
...
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
...
Okay, so after 4 days I was finally able to solve this. The issue was how the ByteBuffer is initialized. I was doing:
private var rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
instead of what I ought to have been doing:
private val rgbBytes: ByteBuffer = ByteBuffer.allocateDirect(1 * 4 * 224 * 224 * 3)
I tried to understand the difference between ByteBuffer.allocate() and ByteBuffer.allocateDirect() here, but to no avail.
I'd be glad if someone can answer two further questions :
Why does TensorFlow need a direct ByteBuffer rather than a non-direct one?
What is the difference between a direct and a non-direct ByteBuffer, in simple terms?
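As a small self-contained illustration of the second question (standard java.nio behavior, not TFLite-specific):
import java.nio.ByteBuffer

fun main() {
    val heap = ByteBuffer.allocate(16)         // backed by a byte[] on the managed heap
    val direct = ByteBuffer.allocateDirect(16) // backed by native memory outside the heap
    println(heap.isDirect)   // false; heap.hasArray() is true
    println(direct.isDirect) // true; native code such as TFLite's C++ core
                             // can read it in place via a stable pointer
}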
Related
The question itself is self-explanatory. In Python, it's quite simple to do that with tf.expand_dims(image, 0). How can I do the same thing in Android?
I'm getting an error when running the TensorFlow model I prepared. It says:
Cannot copy to a TensorFlowLite tensor (input_3) with X bytes from
a Java Buffer with Y bytes.
I'm guessing it comes from the image having one dimension too few. I've run another model which works fine, so I need to know how to do that.
My code snippet:
val contentArray =
ImageUtils.bitmapToByteBuffer(
scaledBitmap,
imageSize,
imageSize,
IMAGE_MEAN,
IMAGE_STD
)
val tfliteOptions = Interpreter.Options()
tfliteOptions.setNumThreads(4)
val tflite = Interpreter(tfliteModel, tfliteOptions)
tflite.run(contentArray, segmentationMasks)
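One way to debug the X-vs-Y byte mismatch is to compare the input tensor's declared size with the buffer actually passed in; a sketch using the Interpreter API and the names from the snippet above:
// The model's expected byte count must equal the buffer's capacity.
val inputTensor = tflite.getInputTensor(0)
Log.d("TFLite", "model expects ${inputTensor.numBytes()} bytes " +
        "(shape ${inputTensor.shape().contentToString()}), " +
        "buffer has ${contentArray.capacity()} bytes")
The bitmapToByteBuffer helper used above: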
fun bitmapToByteBuffer(
bitmapIn: Bitmap,
width: Int,
height: Int,
mean: Float = 0.0f,
std: Float = 255.0f
): ByteBuffer {
val bitmap = scaleBitmapAndKeepRatio(bitmapIn, width, height)
val inputImage = ByteBuffer.allocateDirect(1 * width * height * 3 * 4)
inputImage.order(ByteOrder.nativeOrder())
inputImage.rewind()
val intValues = IntArray(width * height)
bitmap.getPixels(intValues, 0, width, 0, 0, width, height)
var pixel = 0
for (y in 0 until height) {
for (x in 0 until width) {
val value = intValues[pixel++]
// Normalize channel values to [-1.0, 1.0]. This requirement varies by
// model. For example, some models might require values to be normalized
// to the range [0.0, 1.0] instead.
inputImage.putFloat(((value shr 16 and 0xFF) - mean) / std)
inputImage.putFloat(((value shr 8 and 0xFF) - mean) / std)
inputImage.putFloat(((value and 0xFF) - mean) / std)
}
}
inputImage.rewind()
return inputImage
}
There is a JVM/Android equivalent op in the TensorFlow API: https://www.tensorflow.org/jvm/api_docs/java/org/tensorflow/op/core/ExpandDims.
However, if you are using the TfLite Interpreter API to run inference on a pre-trained model, then you will most likely want to deal with the array dimensions when you construct and save the model (i.e. using Python), rather than when you call the interpreter from the Android code.
If what you mean is to expand the dimensions by 1 and make the first dimension of size 1, so as to mimic a batch of tensors (the same as when you are training), the ImageProcessor class appears to take care of that automatically, so you shouldn't have to do it manually.
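In other words, the batch dimension is just the leading 1 in the tensor shape; sizing a direct buffer for shape (1, H, W, 3), as bitmapToByteBuffer above already does, mimics tf.expand_dims with no extra op. A sketch (shape values assumed from the question):
// The leading 1 is the batch dimension; no explicit expand_dims is needed.
val shape = intArrayOf(1, 224, 224, 3)
val input = ByteBuffer.allocateDirect(shape.reduce(Int::times) * 4) // 4 bytes per float
input.order(ByteOrder.nativeOrder())
// Fill exactly 224 * 224 * 3 floats, exactly as bitmapToByteBuffer does above.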
When I add a TensorFlow Lite model to my Android app, it suggests the following auto-generated code:
val model = Model.newInstance(context)
// Creates inputs for reference.
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 50), DataType.FLOAT32)
inputFeature0.loadBuffer(byteBuffer)
// Runs model inference and gets result.
val outputs = model.process(inputFeature0)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
// Releases model resources if no longer used.
model.close()
Now let's assume my input shape in Python is an int array of 50 numbers [1, 2, 3, ...] and the model outputs a single float value.
In what ways do I have to change the code?
You'll use a ByteBuffer to store the input in little-endian format (as TFLite works in this format only).
Replace (1 x 4) with (50 x 4), and for the input use a loop of length 50 to add the data. Similarly, read the output using a loop.
Sample code:
// Read a single float from the EditText.
EditText inputEditText = findViewById(R.id.editTextNumberDecimal);
float data = Float.parseFloat(inputEditText.getText().toString());

// One float input: 1 * 4 bytes, in native (little-endian) order.
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1 * 4);
byteBuffer.order(ByteOrder.nativeOrder());
byteBuffer.putFloat(data);

Model1 model = Model1.newInstance(getApplicationContext());

// Creates inputs for reference.
TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 1}, DataType.FLOAT32);
inputFeature0.loadBuffer(byteBuffer);

// Runs model inference and gets the result.
Model1.Outputs outputs = model.process(inputFeature0);
TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();

// Show the single output value.
TextView tv = findViewById(R.id.textView);
float[] data1 = outputFeature0.getFloatArray();
tv.setText(String.valueOf(data1[0]));

// Releases model resources if no longer used.
model.close();
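A sketch of the 50-input variant described above, in Kotlin (Model1 is the auto-generated class from the question; the input values here are placeholders):
// 50 floats, 4 bytes each, in native (little-endian) order.
val inputData = FloatArray(50) { it.toFloat() } // placeholder data
val byteBuffer = ByteBuffer.allocateDirect(50 * 4)
byteBuffer.order(ByteOrder.nativeOrder())
inputData.forEach { byteBuffer.putFloat(it) } // loop of length 50

val model = Model1.newInstance(applicationContext)
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 50), DataType.FLOAT32)
inputFeature0.loadBuffer(byteBuffer)
val result = model.process(inputFeature0).outputFeature0AsTensorBuffer.floatArray[0]
model.close()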
After detecting a face with CameraX and ML Kit, I need to pass the image to a custom TFLite model (I'm using this one), which detects a face mask. The model accepts images of 224x224 pixels, so I need to take the part of ImageProxy#getImage() corresponding to Face#getBoundingBox() and resize it accordingly.
I've seen this answer, which could have been fine, but ThumbnailUtils.extractThumbnail() can't work with a Rect of 4 coordinates, and it's relative to the center of the image, while the face's bounding box might be elsewhere.
The TFLite model accepts inputs like this:
val inputFeature0 = TensorBuffer
.createFixedSize(intArrayOf(1, 224, 224, 3), DataType.FLOAT32)
.loadBuffer(/* the resized image as ByteBuffer */)
Note that the ByteBuffer will have a size of 224 * 224 * 3 * 4 bytes (where 4 is DataType.FLOAT32.byteSize()).
Edit: I've cleaned up some of the old text because it was getting overwhelming. The code suggested below actually works: I had just forgotten to delete a piece of my own code which was already converting the same ImageProxy to a Bitmap, and it must have caused some internal buffer to be read to the end, so it was either necessary to rewind it manually or to delete that useless code altogether.
However, even if the cropRect is applied to the ImageProxy and the underlying Image, the resulting bitmap is still full size, so there must be something else to do. The model is still returning NaN values, so I'm going to experiment with the raw output for a while.
fun hasMask(imageProxy: ImageProxy, boundingBox: Rect): Boolean {
val model = MaskDetector.newInstance(context)
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 224, 224, 3), DataType.FLOAT32)
// now the cropRect is set correctly but the image itself isn't
// cropped before being converted to Bitmap
imageProxy.setCropRect(boundingBox)
imageProxy.image?.cropRect = boundingBox
val bitmap = BitmapUtils.getBitmap(imageProxy) ?: return false
val resized = Bitmap.createScaledBitmap(bitmap, 224, 224, false)
// input for the model
val buffer = ByteBuffer.allocate(224 * 224 * 3 * DataType.FLOAT32.byteSize())
resized.copyPixelsToBuffer(buffer)
// use the model and get the result as 2 Floats
val outputFeature0 = model.process(inputFeature0).outputFeature0AsTensorBuffer
val maskProbability = outputFeature0.floatArray[0]
val noMaskProbability = outputFeature0.floatArray[1]
model.close()
return maskProbability > noMaskProbability
}
We will provide a better way to handle the image processing when working with ML Kit.
For now, you could try this method: https://github.com/googlesamples/mlkit/blob/master/android/vision-quickstart/app/src/main/java/com/google/mlkit/vision/demo/BitmapUtils.java#L74
It will convert the ImageProxy to Bitmap, and rotate it to upright. The bounding box from the face detection should be applied to the bitmap directly, which means you should be able to crop the bitmap with the Rect bounding box.
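A minimal sketch of that crop-and-resize step (assuming the bounding box already lies within the upright bitmap; clamping is omitted):
import android.graphics.Bitmap
import android.graphics.Rect

fun cropAndResize(source: Bitmap, box: Rect, size: Int = 224): Bitmap {
    // Cut the face region out of the upright bitmap...
    val cropped = Bitmap.createBitmap(source, box.left, box.top, box.width(), box.height())
    // ...then scale it to the model's 224x224 input size.
    return Bitmap.createScaledBitmap(cropped, size, size, true)
}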
I'm trying to build a classification model with Keras and deploy it to my Android phone. I use the code from this website to deploy my own converted model, a .pb file, to my Android phone. I load an image from my phone and everything works fine, but the prediction result is totally different from the result I get on my PC.
The procedure for testing on my PC is:
load the image with cv2 and convert it to np.float32
use the Keras ResNet50 preprocess_input Python function to preprocess the image
expand the image dimensions for batching (batch size is 1)
forward the image to the model and get the result
Relevant code:
img = cv2.imread('./my_test_image.jpg')
x = preprocess_input(img.astype(np.float32))
x = np.expand_dims(x, axis=0)
net = load_model('./my_model.h5')
prediction_result = net.predict(x)
And I noticed that the image preprocessing part on Android is different from the method I used in Keras, which is mode caffe (convert the images from RGB to BGR, then zero-center each color channel with respect to the ImageNet dataset). It seems that the original code is for mode tf (which scales pixels between -1 and 1).
So I modified the following code in preprocessBitmap to what I think it should be, and used a 3-channel RGB image with pixel values [127,127,127] to test it. The code predicted the same result as the .h5 model did. But when I load an image to classify, the prediction result is different from the .h5 model.
Does anyone have any idea? Thank you very much.
I have tried the following:
Load a 3-channel RGB image on my phone with pixel values [127,127,127] and use the modified code below; it gives me a prediction result that is the same as the prediction result using the .h5 model on my PC.
Test the converted .pb model on my PC using the TensorFlow gfile module with an image; it gives me a correct prediction result (compared to the .h5 model). So I think the converted .pb file does not have any problem.
Entire section of preprocessBitmap
// code of 'preprocessBitmap' section in TensorflowImageClassifier.java
TraceCompat.beginSection("preprocessBitmap");
// Preprocess the image data from 0-255 int to normalized float based
// on the provided parameters.
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
// This is an ARGB format, so mask the least significant 8 bits to get blue, the next 8 bits to get green, and the next 8 bits to get red. Since we have an opaque image, alpha can be ignored.
final int val = intValues[i];
// original
/*
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - imageMean) / imageStd;
floatValues[i * 3 + 2] = ((val & 0xFF) - imageMean) / imageStd;
*/
// what I think it should be to do the same thing in mode caffe when using keras
floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - (float)123.68);
floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - (float)116.779);
floatValues[i * 3 + 2] = (((val & 0xFF)) - (float)103.939);
}
TraceCompat.endSection();
This question is old, but remains the top Google result for preprocess_input for ResNet50 on Android. I could not find an answer for implementing preprocess_input for Java/Android, so I came up with the following based on the original python/keras code:
/*
Preprocesses an RGB bitmap in accordance with keras/imagenet.
Port of https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L169
with data_format='channels_last', mode='caffe'.
Converts the images from RGB to BGR, then zero-centers each color channel with respect to the ImageNet dataset, without scaling.
Returns a 3D float array.
*/
static float[][][] imagenet_preprocess_input_caffe( Bitmap bitmap ) {
// https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/applications/imagenet_utils.py#L210
final float[] imagenet_means_caffe = new float[]{103.939f, 116.779f, 123.68f};
float[][][] result = new float[bitmap.getHeight()][bitmap.getWidth()][3]; // assuming rgb
for (int y = 0; y < bitmap.getHeight(); y++) {
for (int x = 0; x < bitmap.getWidth(); x++) {
final int px = bitmap.getPixel(x, y);
// rgb-->bgr, then subtract means. no scaling
result[y][x][0] = (Color.blue(px) - imagenet_means_caffe[0] );
result[y][x][1] = (Color.green(px) - imagenet_means_caffe[1] );
result[y][x][2] = (Color.red(px) - imagenet_means_caffe[2] );
}
}
return result;
}
Usage with a 4D tensorflow-lite input of shape (1, 224, 224, 3):
Bitmap bitmap = <your bitmap of size 224x224x3>;
float[][][][] imgValues = new float[1][bitmap.getHeight()][bitmap.getWidth()][3];
imgValues[0]=imagenet_preprocess_input_caffe(bitmap);
... <prep tfInput, tfOutput> ...
tfLite.run(tfInput, tfOutput);
I have an Android project with OpenCV 4.0.1 and TFLite installed, and I want to run inference with a pretrained MobileNetV2 on a cv::Mat which I extracted and cropped from a CameraBridgeViewBase (Android style).
But it's kind of difficult.
I followed this example.
It does the inference on a ByteBuffer variable called "imgData" (line 71, class: org.tensorflow.lite.examples.classification.tflite.Classifier).
That imgData appears to be filled in the method called "convertBitmapToByteBuffer" from the same class (line 185), adding pixel by pixel from a bitmap that appears to be cropped a little earlier.
private int[] intValues = new int[224 * 224];
Mat _croppedFace = new Mat(); // Cropped image from the CvCameraViewFrame.rgba() method.
float[][] outputVal = new float[1][1]; // Output value from my trained MobileNetV2 model (I changed the output during training; tested in Python).
// Following: https://stackoverflow.com/questions/13134682/convert-mat-to-bitmap-opencv-for-android
Bitmap bitmap = Bitmap.createBitmap(_croppedFace.cols(), _croppedFace.rows(), Bitmap.Config.ARGB_8888);
Utils.matToBitmap(_croppedFace, bitmap);
convertBitmapToByteBuffer(bitmap); // This call should be used the same way as in the example.
// runInference();
_tflite.run(imgData, outputVal);
But it looks like the input shape of my NN is not correct, even though I'm following the MobileNet example, because my NN is a MobileNetV2.
I've solved the error, but I'm sure this isn't the best way to do it.
The Keras MobileNetV2 input_shape is (nBatches, 224, 224, nChannels).
I just want to predict a single image, so nBatches == 1, and I'm working in RGB mode, so nChannels == 3.
// Nasty, but works. nBatches == 2? -- _croppedFace.shape() == (224, 224), 3 channels.
float[][][][] _inputValue = new float[2][_croppedFace.cols()][_croppedFace.rows()][3];
// Fill _inputValue
for (int i = 0; i < _croppedFace.cols(); ++i)
    for (int j = 0; j < _croppedFace.rows(); ++j)
        for (int z = 0; z < 3; ++z)
            _inputValue[0][i][j][z] = (float) _croppedFace.get(i, j)[z] / 255; // DL works better with 0..1 values.
/*
The output has this shape, but I don't really know why.
I'm sure one of those 2's is for nClasses (I'm working with 2 classes),
but I don't really know why it uses the other one.
*/
float[][] outputVal = new float[2][2];
// Tensorflow lite interpreter
_tflite.run(_inputValue , outputVal);
In Python it has the same shape:
Python prediction:
[[XXXXXX, YYYYY]] <- as expected for the last layer that I made; this is just a prototype NN.
I hope this helps someone, and also that someone can improve this answer, because it is not very optimized.
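In that spirit, a possible cleanup in Kotlin that sticks to a batch of one (a sketch built from the snippets above, not tested against the exact model; note that Mat.get takes (row, col)):
// Shape (1, 224, 224, 3), values normalized to [0, 1].
val inputValue = Array(1) { Array(_croppedFace.rows()) { Array(_croppedFace.cols()) { FloatArray(3) } } }
for (row in 0 until _croppedFace.rows())
    for (col in 0 until _croppedFace.cols()) {
        val px = _croppedFace.get(row, col) // double[] of 4 channels (RGBA)
        for (c in 0 until 3)
            inputValue[0][row][col][c] = px[c].toFloat() / 255f
    }
val outputVal = Array(1) { FloatArray(2) } // one batch, two classes
_tflite.run(inputValue, outputVal)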