I'm trying to use a retrained MobileNet model to predict dog breeds, but when I run the model through Firebase ML Kit it fails to predict the breed correctly. Both the desktop model and the tflite model (run on desktop) classify the same image of a pug correctly with 87.8% confidence, whereas ML Kit reports a confidence of 1.47×10⁻²%.
I suspect the issue is in my preprocessing of the image in the app code. The docs show how to scale the pixel values to the range [-1.0, 1.0], which, according to the code of the Keras image preprocessing function, is what the model requires.
Here is my infer(iStream) function, where I think the error may lie. Any help is greatly appreciated; this is driving me crazy.
private fun infer(iStream: InputStream?) {
Log.d("ML_TAG", "infer")
val bmp = Bitmap.createScaledBitmap(BitmapFactory.decodeStream(iStream), 224, 224, true)
i.setImageBitmap(bmp)
val bNum = 0
val input = Array(1) { Array(224) { Array(224) { FloatArray(3) } } }
for (x in 0..223) {
for (y in 0..223) {
val px = bmp.getPixel(x, y)
input[bNum][x][y][0] = (Color.red(px) - 127) / 255.0f
input[bNum][x][y][1] = (Color.green(px) - 127) / 255.0f
input[bNum][x][y][2] = (Color.blue(px) - 127) / 255.0f
}
}
val inputs = FirebaseModelInputs.Builder()
.add(input)
.build()
interpreter.run(inputs, ioOpts).addOnSuccessListener { res ->
val o = res.getOutput<kotlin.Array<FloatArray>>(0)
val prob = o[0]
val r = BufferedReader(InputStreamReader(assets.open("retrained_labels.txt")))
val arrToSort = arrayListOf<Pair<String, Float>>()
val rArr = r.readLines()
for (i in prob.indices) {
val p = Pair(rArr[i], prob[i])
arrToSort.add(p)
}
val sortedList = arrToSort.sortedWith(compareByDescending {it.second})
val topFive = sortedList.slice(0..4)
arrToSort.forEach {
if (it.first == "pug") {
Log.i("ML_TAG", "Pug: ${it.second}")
}
}
sortedList.forEach {
if(it.first == "pug") {
Log.i("ML_TAG", "Pug: ${it.second}")
}
}
topFive.forEach {
Log.i("ML_TAG", "${it.first}: ${it.second}")
}
}
.addOnFailureListener { res ->
Log.e("ML_TAG", res.message)
}
}
I think (Color.red(px) - 127) / 255.0f scales the values to roughly [-0.5, 0.5]. Would (Color.red(px) - 127) / 128.0f produce better results?
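For reference, here is a minimal sketch of the [-1.0, 1.0] scaling the docs describe, assuming the model really was exported with the standard MobileNet preprocessing (x / 127.5 - 1). These lines are a hypothetical replacement for the three assignments inside the loop above:
// Maps 0 to -1.0 and 255 to +1.0, matching Keras' MobileNet preprocess_input.
input[bNum][x][y][0] = Color.red(px) / 127.5f - 1.0f
input[bNum][x][y][1] = Color.green(px) / 127.5f - 1.0f
input[bNum][x][y][2] = Color.blue(px) / 127.5f - 1.0f
Subtracting 127 and dividing by 128 gives almost the same range, so either form should behave far better than dividing by 255.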
I have been trying to convert world coordinates into a 2D screen position, and since Sceneform is deprecated, I need to do it by hand.
My test case is:
val centerPose = frame.hitTest(centerX, centerY)[0].hitPose
val newFrame = session.update()
val shouldBeCenter = newFrame.worldToScreen(centerPose)
As mentioned in this Java SO question and in the Sceneform source code, I tried replicating the behavior, without success.
I also came across this recent SO answer; even though I wasn't convinced by the calculation, I had nothing to lose.
And of course, after some testing, worldToScreen does not work at all (it returns a point on the screen, but nowhere near the center).
What did I miss?
fun Frame.worldToScreen(pose: Pose): PointF {
val screenSize = Size(
this.camera.imageIntrinsics.imageDimensions[0],
this.camera.imageIntrinsics.imageDimensions[1]
)
val viewMatrix = this.viewMatrix()
val projMatrix = this.projectionMatrix()
val poseMatrix = pose.matrix()
// from sceneform
val m = projMatrix.multiply(viewMatrix)
val newVector = (m.multiply(poseMatrix)).multiply(v4Origin)
newVector.x = (((newVector.x / newVector.w) + 1f) / 2f) * screenSize.width
newVector.y = screenSize.height - (((newVector.y / newVector.w) + 1f) / 2f) * screenSize.height
//https://stackoverflow.com/questions/73372680/arcore-3d-coordinate-system-to-2d-screen-coordinates
val thisRes = (projMatrix.multiply(poseMatrix)).multiply(v4Origin)
val thisX = (((thisRes.x / thisRes.w) + 1f) / 2f) * screenSize.width
val thisY = screenSize.height - (((thisRes.y / thisRes.w) + 1f) / 2f) * screenSize.height
return PointF(newVector.x, newVector.y)
}
// additional code for matrices
data class M4(val floatArray: FloatArray)
fun M4.multiply(m: M4): M4 = FloatArray(16)
.also { Matrix.multiplyMM(it, 0, floatArray, 0, m.floatArray, 0) }
.let { M4(it) }
fun Frame.projectionMatrix(): M4 = FloatArray(16)
.apply { camera.getProjectionMatrix(this, 0, 0.1f, 100f) }
.let { M4(it) }
fun Frame.viewMatrix(): M4 = FloatArray(16)
.apply { camera.getViewMatrix(this, 0) }
.let { M4(it) }
fun Pose.matrix(): M4 = FloatArray(16)
.also { toMatrix(it, 0) }
.let { M4(it) }
// V4 is just a floatArray of 4 elements with x,y,z,w accessors
val v4Origin: V4 = v4(0f, 0f, 0f, 1f)
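One thing I would double-check (an assumption on my part, not something confirmed above): camera.imageIntrinsics.imageDimensions is the size of the CPU image in the sensor's own orientation, which usually differs from the size and orientation of the view the tap coordinates come from, so using it as the viewport can push the projected point far from where you expect. A minimal variation of the same math that takes the view size as an explicit, assumed parameter would be:
fun Frame.worldToScreen(pose: Pose, viewWidth: Int, viewHeight: Int): PointF {
    // Same clip-space projection as above, but normalized with the view's own size
    // (viewWidth / viewHeight are assumed to be the dimensions of the surface the
    // hit-test coordinates were measured in).
    val m = this.projectionMatrix().multiply(this.viewMatrix())
    val clip = m.multiply(pose.matrix()).multiply(v4Origin)
    val x = ((clip.x / clip.w) + 1f) / 2f * viewWidth
    val y = viewHeight - ((clip.y / clip.w) + 1f) / 2f * viewHeight
    return PointF(x, y)
}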
I am able to run my custom tflite model on Android, but the output is totally wrong. I suspect this is because my model needs input shape [1, 3, 640, 640] (channel-first), while the code fills a channel-last ByteBuffer. I have created the tensor buffer with TensorBuffer.createFixedSize(intArrayOf(1, 3, 640, 640), DataType.FLOAT32), but I still suspect that inside the for loop the channels are not laid out correctly in the flat input (ByteBuffer).
I copied this code from an example where the required model shape was [1, 32, 32, 3] (channel-last), which is the reason for my doubt.
Below is my code:
val model = YoloxPlate.newInstance(applicationContext)
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 3, 640, 640), DataType.FLOAT32)
val input = ByteBuffer.allocateDirect(640*640*3*4).order(ByteOrder.nativeOrder())
for (y in 0 until 640) {
for (x in 0 until 640) {
val px = bitmap.getPixel(x, y)
// Get channel values from the pixel value.
val r = Color.red(px)
val g = Color.green(px)
val b = Color.blue(px)
// Normalize channel values to [-1.0, 1.0]. This requirement depends on the model.
// For example, some models might require values to be normalized to the range
// [0.0, 1.0] instead.
val rf = r/ 1f
val gf = g/ 1f
val bf = b/ 1f
input.putFloat(bf)
input.putFloat(gf)
input.putFloat(rf)
}
}
inputFeature0.loadBuffer(input)
val outputs = model.process(inputFeature0)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
val flvals = outputFeature0.floatArray
After working it out on a whiteboard and setting the matrix dimensions manually, I figured it out.
The model also requires BGR channel order instead of RGB.
It's working perfectly now; here is the code (the multiple loops still need to be optimized):
val model = YoloxPlate.newInstance(applicationContext)
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 3, 640, 640), DataType.FLOAT32)
val input = ByteBuffer.allocateDirect(640*640*3*4).order(ByteOrder.nativeOrder())
for (y in 0 until 640) {
for (x in 0 until 640) {
val px = bitmap.getPixel(x, y)
val b = Color.blue(px)
val bf = b/ 1f
input.putFloat(bf)
}
}
for (y in 0 until 640) {
for (x in 0 until 640) {
val px = bitmap.getPixel(x, y)
val g = Color.green(px)
val gf = g/ 1f
input.putFloat(gf)
}
}
for (y in 0 until 640) {
for (x in 0 until 640) {
val px = bitmap.getPixel(x, y)
val r = Color.red(px)
val rf = r/ 1f
input.putFloat(rf)
}
}
inputFeature0.loadBuffer(input)
val outputs = model.process(inputFeature0)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
val flvals = outputFeature0.floatArray
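Since the multiple loops are flagged for optimization, here is a possible single-pass variant (a sketch, not tested against the model): it reads the pixels once and writes each channel plane at its byte offset, keeping the same channel-first BGR layout as above.
val planeSize = 640 * 640
val input = ByteBuffer.allocateDirect(planeSize * 3 * 4).order(ByteOrder.nativeOrder())
val pixels = IntArray(planeSize)
bitmap.getPixels(pixels, 0, 640, 0, 0, 640, 640)
for (i in 0 until planeSize) {
    val px = pixels[i]
    // Absolute putFloat(index, value): blue plane first, then green, then red.
    input.putFloat(i * 4, Color.blue(px).toFloat())
    input.putFloat((planeSize + i) * 4, Color.green(px).toFloat())
    input.putFloat((2 * planeSize + i) * 4, Color.red(px).toFloat())
}
inputFeature0.loadBuffer(input)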
I'm trying to run a YoloV4 model on Android that's been converted to .tflite. My input shape seems to be fine ([1, 224, 224, 4]), but the app crashes on my output shape. I'm using code from a Udacity course on tflite.
I get a shape-mismatch error (quoted below) when I run the following code:
class TFLiteObjectDetectionAPIModel private constructor() : Classifier {
override val statString: String
get() = TODO("not implemented") //To change initializer of created properties use File | Settings | File Templates.
private var isModelQuantized: Boolean = false
// Config values.
private var inputSize: Int = 0
// Pre-allocated buffers.
private val labels = Vector<String>()
private var intValues: IntArray? = null
// outputLocations: array of shape [Batchsize, NUM_DETECTIONS,4]
// contains the location of detected boxes
private var outputLocations: Array<Array<FloatArray>>? = null
// outputClasses: array of shape [Batchsize, NUM_DETECTIONS]
// contains the classes of detected boxes
private var outputClasses: Array<FloatArray>? = null
// outputScores: array of shape [Batchsize, NUM_DETECTIONS]
// contains the scores of detected boxes
private var outputScores: Array<FloatArray>? = null
// numDetections: array of shape [Batchsize]
// contains the number of detected boxes
private var numDetections: FloatArray? = null
private var imgData: ByteBuffer? = null
private var tfLite: Interpreter? = null
override fun recognizeImage(bitmap: Bitmap): List<Classifier.Recognition> {
// Log this method so that it can be analyzed with systrace.
Trace.beginSection("recognizeImage")
Trace.beginSection("preprocessBitmap")
// Preprocess the image data from 0-255 int to normalized float based
// on the provided parameters.
bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)
imgData!!.rewind()
for (i in 0 until inputSize) {
for (j in 0 until inputSize) {
val pixelValue = intValues!![i * inputSize + j]
if (isModelQuantized) {
// Quantized model
imgData!!.put((pixelValue shr 16 and 0xFF).toByte())
imgData!!.put((pixelValue shr 8 and 0xFF).toByte())
imgData!!.put((pixelValue and 0xFF).toByte())
} else { // Float model
imgData!!.putFloat(((pixelValue shr 16 and 0xFF) - IMAGE_MEAN) / IMAGE_STD)
imgData!!.putFloat(((pixelValue shr 8 and 0xFF) - IMAGE_MEAN) / IMAGE_STD)
imgData!!.putFloat(((pixelValue and 0xFF) - IMAGE_MEAN) / IMAGE_STD)
}
}
}
Trace.endSection() // preprocessBitmap
// Copy the input data into TensorFlow.
Trace.beginSection("feed")
outputLocations = Array(1) { Array(NUM_DETECTIONS) { FloatArray(4) } }
outputClasses = Array(1) { FloatArray(NUM_DETECTIONS) }
outputScores = Array(1) { FloatArray(NUM_DETECTIONS) }
numDetections = FloatArray(1)
val inputArray = arrayOf<Any>(imgData!!)
val outputMap = ArrayMap<Int, Any>()
outputMap[0] = outputLocations!!
outputMap[1] = outputClasses!!
outputMap[2] = outputScores!!
outputMap[3] = numDetections!!
Trace.endSection()
// Run the inference call.
Trace.beginSection("run")
tfLite!!.runForMultipleInputsOutputs(inputArray, outputMap)
Trace.endSection()
// Show the best detections.
// after scaling them back to the input size.
val recognitions = ArrayList<Classifier.Recognition>(NUM_DETECTIONS)
for (i in 0 until NUM_DETECTIONS) {
val detection = RectF(
outputLocations!![0][i][1] * inputSize,
outputLocations!![0][i][0] * inputSize,
outputLocations!![0][i][3] * inputSize,
outputLocations!![0][i][2] * inputSize)
// SSD Mobilenet V1 Model assumes class 0 is background class
// in label file and class labels start from 1 to number_of_classes+1,
// while outputClasses correspond to class index from 0 to number_of_classes
val labelOffset = 1
recognitions.add(
Classifier.Recognition(
"" + i,
labels[outputClasses!![0][i].toInt() + labelOffset],
outputScores!![0][i],
detection))
}
Trace.endSection() // "recognizeImage"
return recognitions
}
override fun enableStatLogging(debug: Boolean) {
//Not implemented
}
override fun close() {
//Not needed.
}
override fun setNumThreads(numThreads: Int) {
if (tfLite != null) tfLite!!.setNumThreads(numThreads)
}
override fun setUseNNAPI(isChecked: Boolean) {
if (tfLite != null) tfLite!!.setUseNNAPI(isChecked)
}
companion object {
// Only return this many results.
private const val NUM_DETECTIONS = 3087
// Float model
private const val IMAGE_MEAN = 128.0f
private const val IMAGE_STD = 128.0f
/** Memory-map the model file in Assets. */
@Throws(IOException::class)
private fun loadModelFile(assets: AssetManager, modelFilename: String): MappedByteBuffer {
val fileDescriptor = assets.openFd(modelFilename)
val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
val fileChannel = inputStream.channel
val startOffset = fileDescriptor.startOffset
val declaredLength = fileDescriptor.declaredLength
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
}
/**
* Initializes a native TensorFlow session for classifying images.
*
* @param assetManager The asset manager to be used to load assets.
* @param modelFilename The file path of the model GraphDef protocol buffer.
* @param labelFilename The file path of the label file for the classes.
* @param inputSize The size of the image input.
* @param isQuantized Whether the model is quantized or not.
*/
@Throws(IOException::class)
fun create(
assetManager: AssetManager,
modelFilename: String,
labelFilename: String,
inputSize: Int,
isQuantized: Boolean): Classifier {
val d = TFLiteObjectDetectionAPIModel()
val labelsInput: InputStream?
val actualFilename = labelFilename.split("file:///android_asset/".toRegex())
.dropLastWhile { it.isEmpty() }.toTypedArray()[1]
labelsInput = assetManager.open(actualFilename)
val br: BufferedReader?
br = BufferedReader(InputStreamReader(labelsInput!!))
while (br.readLine()?.let { d.labels.add(it) } != null);
br.close()
d.inputSize = inputSize
try {
val options = Interpreter.Options()
options.setNumThreads(4)
d.tfLite = Interpreter(loadModelFile(assetManager, modelFilename), options)
} catch (e: Exception) {
throw RuntimeException(e)
}
d.isModelQuantized = isQuantized
// Pre-allocate buffers.
val numBytesPerChannel: Int = if (isQuantized) {
1 // Quantized
} else {
4 // Floating point
}
d.imgData = ByteBuffer.allocateDirect(1 * d.inputSize * d.inputSize * 3 * numBytesPerChannel)
d.imgData!!.order(ByteOrder.nativeOrder())
d.intValues = IntArray(d.inputSize * d.inputSize)
d.outputLocations = Array(1) { Array(NUM_DETECTIONS) { FloatArray(2) } }
d.outputClasses = Array(1) { FloatArray(NUM_DETECTIONS) }
d.outputScores = Array(1) { FloatArray(NUM_DETECTIONS) }
d.numDetections = FloatArray(1)
return d
}
}
}
When I change outputLocations to
outputLocations = Array(1) { Array(NUM_DETECTIONS) { FloatArray(2) } }
I get the following error: Cannot copy from a TensorFlowLite tensor (Identity) with shape [1, 3087, 4] to a Java object with shape [1, 3087, 2].
What are Identity and Identity_1? I've looked at my model in Netron and can see both, but I'm not sure how to interpret the model.
Can anyone help? Is there anything else I can change, or is my model just not suitable for mobile platforms?
I have encountered a similar problem and haven't found a solution yet:
Cannot copy from a TensorFlowLite tensor (Identity) with shape [1, 25200, 8] to a Java object with shape [1, 80, 80, 255].
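Both shape-mismatch errors above come down to the Java output arrays not matching the model's actual output tensors; Identity and Identity_1 are simply the auto-generated names TensorFlow gave the graph's output tensors during conversion, which is why Netron shows them. Rather than guessing the shapes, you can ask the interpreter directly and allocate matching arrays. A small sketch using the plain org.tensorflow.lite.Interpreter API (the model file name here is a placeholder):
val interpreter = Interpreter(loadModelFile(assetManager, "model.tflite"))
for (i in 0 until interpreter.outputTensorCount) {
    val t = interpreter.getOutputTensor(i)
    // Prints e.g. "output 0: Identity, shape [1, 3087, 4], FLOAT32".
    Log.i("TFLITE", "output $i: ${t.name()}, shape ${t.shape().contentToString()}, ${t.dataType()}")
}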
I created a .tflite model with Keras.
I tested the model in Python with different inputs (28x28x3) and it works fine: it shows different predictions depending on the input photo.
When I use it on Android, it shows the same prediction value for all input photos.
Does anybody know how to resolve this issue?
Python script
image = cv2.resize(image, (28, 28))
image = image.astype("float") / 255.0
image = img_to_array(image)
image = np.expand_dims(image, axis=0)
# load the trained convolutional neural network
print("[INFO] loading network...")
model = load_model(args["model"])
Android
private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
val inputSize = 28
val byteBuffer = ByteBuffer.allocateDirect(2352 * 4)
byteBuffer.order(ByteOrder.nativeOrder())
val intValues = IntArray(inputSize * inputSize)
bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)
var pixel = 0
for (i in 0 until inputSize) {
for (j in 0 until inputSize) {
val value = intValues[pixel++]
byteBuffer.put(((value shr 16 and 0xFF) / 255.0f).toByte())
byteBuffer.put(((value shr 8 and 0xFF) / 255.0f).toByte())
byteBuffer.put(((value and 0xFF) / 255.0f).toByte())
}
}
return byteBuffer
}
private fun getPrediction(bitmap: Bitmap) {
tensorFlowModel?.let { tensorFlowModel ->
val tflite = Interpreter(tensorFlowModel, Interpreter.Options())
val inputData = convertBitmapToByteBuffer(bitmap)
val labelProbArray: Array<FloatArray> = Array(1) { FloatArray(2) }
tflite.run(inputData, labelProbArray)
val prediction = (labelProbArray[0][0])
}
}
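One likely culprit (an assumption, since the model's input type isn't shown): (value / 255.0f).toByte() truncates every normalized float to 0, so the interpreter sees essentially the same all-zero input for every photo. If the model takes FLOAT32 input in [0, 1], as in the Python script, the buffer should be filled with putFloat, roughly like this:
private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
    val inputSize = 28
    // 28 * 28 * 3 floats, 4 bytes each.
    val byteBuffer = ByteBuffer.allocateDirect(inputSize * inputSize * 3 * 4)
    byteBuffer.order(ByteOrder.nativeOrder())
    val intValues = IntArray(inputSize * inputSize)
    bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)
    var pixel = 0
    for (i in 0 until inputSize) {
        for (j in 0 until inputSize) {
            val value = intValues[pixel++]
            // putFloat keeps the [0, 1] normalization instead of truncating it to a byte.
            byteBuffer.putFloat((value shr 16 and 0xFF) / 255.0f)
            byteBuffer.putFloat((value shr 8 and 0xFF) / 255.0f)
            byteBuffer.putFloat((value and 0xFF) / 255.0f)
        }
    }
    return byteBuffer
}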
I am using ML Kit to load a custom TensorFlow model. While running the model I get the following error:
java.lang.IllegalArgumentException: Cannot convert between a TensorFlowLite tensor with type UINT8 and a Java object of type [[[[F (which is compatible with the TensorFlowLite type FLOAT32).
I am using the code below for object detection with the .tflite file:
private fun bitmapToInputArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
var bitmap = bitmap
bitmap = Bitmap.createScaledBitmap(bitmap, 224, 224, true)
val batchNum = 0
val input = Array(1) { Array(224) { Array(224) { FloatArray(3) } } }
for (x in 0..223) {
for (y in 0..223) {
val pixel = bitmap.getPixel(x, y)
// Normalize channel values to [-1.0, 1.0]. This requirement varies by
// model. For example, some models might require values to be normalized
// to the range [0.0, 1.0] instead.
input[batchNum][x][y][0] = (Color.red(pixel) - 127) / 128.0f
input[batchNum][x][y][1] = (Color.green(pixel) - 127) / 128.0f
input[batchNum][x][y][2] = (Color.blue(pixel) - 127) / 128.0f
}
}
return input
}
private fun setImageData(input: Array<Array<Array<FloatArray>>>) {
var inputs: FirebaseModelInputs? = null
try {
inputs = FirebaseModelInputs.Builder()
.add(input) // add() as many input arrays as your model requires
.build()
} catch (e: FirebaseMLException) {
e.printStackTrace()
}
firebaseInterpreter!!.run(inputs!!, inputOutputOptions!!)
.addOnSuccessListener(
OnSuccessListener<FirebaseModelOutputs> {
// ...
Log.d("Final",it.toString());
})
.addOnFailureListener(
object : OnFailureListener {
override fun onFailure(p0: Exception) {
// Task failed with an exception
// ..
}
})
}
Your model expects a quantized (UINT8) image. You can prepare it like this:
val batchNum = 0
val input = Array(1) { Array(224) { Array(224) { ByteArray(3) } } }
for (x in 0..223) {
    for (y in 0..223) {
        val pixel = bitmap.getPixel(x, y)
        // Color.red/green/blue return Ints in [0, 255]; convert them to Byte for the UINT8 tensor.
        input[batchNum][x][y][0] = Color.red(pixel).toByte()
        input[batchNum][x][y][1] = Color.green(pixel).toByte()
        input[batchNum][x][y][2] = Color.blue(pixel).toByte()
    }
}
Note that it's often easier to pass a ByteBuffer to tflite, instead of a multidimensional array.
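For example, a minimal ByteBuffer version of the same quantized input (assuming a 1x224x224x3 UINT8 tensor, as the error message suggests) could look like this:
val buffer = ByteBuffer.allocateDirect(224 * 224 * 3).order(ByteOrder.nativeOrder())
for (y in 0 until 224) {
    for (x in 0 until 224) {
        // Row-major (height, then width) order to match the [1, 224, 224, 3] layout;
        // one unsigned byte per channel, no normalization for a UINT8 model.
        val pixel = bitmap.getPixel(x, y)
        buffer.put(Color.red(pixel).toByte())
        buffer.put(Color.green(pixel).toByte())
        buffer.put(Color.blue(pixel).toByte())
    }
}
The buffer can then be passed to FirebaseModelInputs.Builder().add(...) in place of the nested array.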