How to interpret MobileNetV2 segmentation result output on Android?

I trained a quantized semantic segmentation model with my own dataset using the Python scripts available on DeepLab's official GitHub page. I used the mobilenetv2_coco_voc_trainaug backbone. I checked the resulting model in Netron and this is how the input and output look:
As you can see, the output is an array of int64 with a size of 257x257. From my understanding, this array should contain the index of the label with the highest probability at every position, or am I missing something?
But when I try to read this in Android, I get just zeros and ones, regardless of what is in the picture: people, cows, etc.
for (y in 0 until imageHeight) {
    for (x in 0 until imageWidth) {
        // resultBuffer is a ByteBuffer of size imageSize * imageSize * 8
        val value = resultBuffer.getLong((y * imageWidth + x) * 8)
    }
}
The result is not that accurate either, since I'm getting segmentation values where I shouldn't.
Any help would be appreciated!

Can't comment yet, so let's try a guess.
You are trying to use a quantized model with an int64 output. The output should be an 8-bit type.
And yes, accuracy will drop with a quantized model.
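If that guess is right, a minimal sketch of reading an 8-bit class map could look like the following (plain Java using the TensorFlow Lite Interpreter; tflite, inputBuffer and the 257x257 size are assumptions taken from the question, not verified against the actual model):
// Assumed: the quantized model outputs one unsigned byte (a class index) per pixel.
int imageSize = 257;
ByteBuffer outputBuffer = ByteBuffer.allocateDirect(imageSize * imageSize); // 1 byte per pixel
outputBuffer.order(ByteOrder.nativeOrder());

tflite.run(inputBuffer, outputBuffer); // tflite is an org.tensorflow.lite.Interpreter

outputBuffer.rewind();
for (int y = 0; y < imageSize; y++) {
    for (int x = 0; x < imageSize; x++) {
        int label = outputBuffer.get(y * imageSize + x) & 0xFF; // class index for pixel (x, y)
        // ... map `label` to a color or mask value here ...
    }
}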

Related

TensorFlow object detection fails on Xamarin Android with a reshape issue

I am following this blog post and GitHub almost exactly:
Blog
Github
But when I run, take a picture and call these lines:
var outputs = new float[tfLabels.Count];
tfInterface.Feed("Placeholder", floatValues, 1, 227, 227, 3);
tfInterface.Run(new[] { "loss" });
tfInterface.Fetch("loss", outputs);
The app crashes on the .Run line and generates the error below in the output window:
04-04 17:39:12.575 E/TensorFlowInferenceInterface( 8017): Failed to
run TensorFlow inference with inputs:[Placeholder], outputs:[loss]
Unhandled Exception:
Java.Lang.IllegalArgumentException: Input to reshape is a tensor with
97556 values, but the requested shape requires a multiple of 90944
[[Node: block0_0_reshape0 = Reshape[T=DT_FLOAT, Tshape=DT_INT32,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](block0_0_concat,
block0_0_reshape0/shape)]]
According to the posts I found while searching on this error, I sort of understand this is due to the image not fitting the expected size exactly. But in the example I am following, the image is resized to 227x227 every time and converted to float like in these lines:
var resizedBitmap = Bitmap.CreateScaledBitmap(bitmap, 227, 227, false).Copy(Bitmap.Config.Argb8888, false);
var floatValues = new float[227 * 227 * 3];
var intValues = new int[227 * 227];
resizedBitmap.GetPixels(intValues, 0, 227, 0, 0, 227, 227);
for (int i = 0; i < intValues.Length; i++)
{
    var val = intValues[i];
    floatValues[i * 3 + 0] = ((val & 0xFF) - 104);
    floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - 117);
    floatValues[i * 3 + 2] = (((val >> 16) & 0xFF) - 123);
}
So, I don't understand what is causing this or how to fix it. Please help!
UPDATE: I found out the issue is with my model or my labels. I found this out by simply swapping in the model and label file from the sample/GitHub above while leaving all my code the same. When I did this, I no longer get the error. HOWEVER, this still doesn't tell me much: the error is not explanatory enough to point me in a direction of what could be wrong with my model. I assume it is the model, because the labels file is just a text file with one label per line. I used the Custom Vision Service on Azure to create my model. It trained fine and tests just fine on the web portal. I then exported it as TensorFlow. So, I am not sure what I could have done wrong or how to fix it.
Thanks!
After no answers here and several days of searching and trial and error, I have found the issue. In general, I guess you can get this reshape error if you are feeding the model an image size other than what it is expecting or set up to receive.
The issue is that everything I have read says you must typically feed the model a 227 x 227 x 3 image. Then I started noticing that the size varies across posts: some people say 225 x 225 x 3, others say 250 x 250 x 3, and so on. I had tried those sizes as well with no luck.
As you can see in my edit in the question, I did have a clue. When using somebody else's pretrained model, my code works fine. However, when I use my custom model which I created on the Microsoft Azure CustomVision.ai site, I was getting this error.
So, I decided I would try to inspect the models to see what was different. I followed this post: Inspect a pre trained model
When I inspected the model that works using TensorBoard, I see that the input is 227 x 227 x 3 which is what I expected. However, when I viewed my model, I noticed that it was 224 x 224 x 3! I changed my code to resize the image to that size and it works! Problem went away.
So, to summarize: for some reason the Microsoft Custom Vision service generated a model that expects an image size of 224 x 224 x 3. I didn't see any documentation or setting for this, and I don't know whether that number will change with each model. If you get a similar shape error, the first place I would check is the size of the image you are feeding your model versus what it expects as an input. The good news is that you can check your model, even if pre-trained, using TensorBoard and the post I linked above. Look at the input section; it should look something like this:
Hope this helps!
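For reference, a rough sketch of the fix in plain Android Java (the code in the question is Xamarin C#; inferenceInterface and labels are assumed names, and 224 is the size my Custom Vision export expected):
int inputSize = 224; // read this off the model's input node, e.g. via TensorBoard
Bitmap resized = Bitmap.createScaledBitmap(bitmap, inputSize, inputSize, false);

int[] intValues = new int[inputSize * inputSize];
float[] floatValues = new float[inputSize * inputSize * 3];
resized.getPixels(intValues, 0, inputSize, 0, 0, inputSize, inputSize);
for (int i = 0; i < intValues.length; i++) {
    int val = intValues[i];
    floatValues[i * 3 + 0] = (val & 0xFF) - 104;          // mean-subtracted blue
    floatValues[i * 3 + 1] = ((val >> 8) & 0xFF) - 117;   // mean-subtracted green
    floatValues[i * 3 + 2] = ((val >> 16) & 0xFF) - 123;  // mean-subtracted red
}

inferenceInterface.feed("Placeholder", floatValues, 1, inputSize, inputSize, 3);
inferenceInterface.run(new String[] {"loss"});
float[] outputs = new float[labels.size()];
inferenceInterface.fetch("loss", outputs);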

Improving threshold result for Tesseract

I am kind of stuck with this problem, and I know there are many questions about it on Stack Overflow, but in my case nothing gives the expected result.
The Context:
I'm using OpenCV on Android along with Tesseract so I can read the MRZ area of a passport. When the camera is started, I pass the input frame to an AsyncTask, the frame is processed, and the MRZ area is extracted successfully. I then pass the extracted MRZ area to a function prepareForOCR(inputImage) that takes the MRZ area as a gray Mat and outputs a bitmap with the thresholded image that I pass to Tesseract.
The problem:
While thresholding the image, I use adaptive thresholding with blockSize = 13 and C = 15, but the result is not always the same; it depends on the lighting of the image and the general conditions under which the frame was taken.
What I have tried:
First, I am resizing the image to a specific size (871, 108) so the input image is always the same and not dependent on which phone is used.
After resizing, I try different blockSize and C values:
// toOcr contains the extracted MRZ area
Bitmap toOCRBitmap = Bitmap.createBitmap(bitmap);
Mat inputFrame = new Mat();
Mat toOcr = new Mat();
Utils.bitmapToMat(toOCRBitmap, inputFrame);
Imgproc.cvtColor(inputFrame, inputFrame, Imgproc.COLOR_BGR2GRAY);

TesseractResult lastResult = null;
for (int B = 11; B < 70; B++) {
    for (int C = 11; C < 70; C++) {
        if (IsPrime(B) && IsPrime(C)) {
            Imgproc.adaptiveThreshold(inputFrame, toOcr, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, B, C);
            Bitmap toOcrBitmap = OpenCVHelper.getBitmap(toOcr);
            TesseractResult result = TesseractInstance.extractFrame(toOcrBitmap, "ocrba");
            if (result.getMeanConfidence() > 70) {
                if (MrzParser.tryParse(result.getText())) {
                    Log.d("Main2Activity", "Best result with " + B + " : " + C);
                    return result;
                }
            }
        }
    }
}
Using the code above, the thresholded result is a black-on-white image that gives a confidence greater than 70. I can't really post the whole image for privacy reasons, but here's a clipped one and a dummy passport one.
Using the MrzParser.tryParse function, which adds checks for each character's position and validity within the MRZ, I'm able to correct some occurrences, like a name containing an 8 instead of a B, and get a good result. But it takes a lot of time, which is normal, because I'm thresholding almost 255 images in the loop, on top of the Tesseract call.
I already tried getting a list of the C and B values that occur most often, but the results vary.
The question:
Is there a way to define C and blockSize values that always give the same result, maybe by adding more OpenCV calls to normalize the input image, like increasing contrast and so on? I have searched the web for 2 weeks now and can't find a viable solution; this is the only approach that gives accurate results.
You can use a clustering algorithm to cluster the pixels based on color. The characters are dark and there is a good contrast in the MRZ region, so a clustering method will most probably give you a good segmentation if you apply it to the MRZ region.
Here I demonstrate it with MRZ regions obtained from sample images that can be found on the internet.
I use color images, apply some smoothing, convert to the Lab color space, then cluster the a, b channel data using kmeans (k=2). The code is in Python, but you can easily adapt it to Java (see the sketch after the results below). Due to the randomized nature of the kmeans algorithm, the segmented characters will have label 0 or 1. You can easily sort this out by inspecting the cluster centers: the cluster center corresponding to the characters should have a dark value in the color space you are using.
I just used the Lab color space here. You can use RGB, HSV or even GRAY and see which one is better for you.
After segmenting like this, I think you can even find good values for B and C of your adaptive-threshold using the properties of the stroke width of the characters (if you think the adaptive-threshold gives a better quality output).
import cv2
import numpy as np

im = cv2.imread('mrz1.png')
# smooth, then convert to Lab
lab = cv2.cvtColor(cv2.GaussianBlur(im, (3, 3), 1), cv2.COLOR_BGR2Lab)
# cluster the a, b channels (channels 1 and 2 of Lab)
im32f = np.array(lab[:, :, 1:3], dtype=np.float32)
k = 2  # 2 clusters
term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
ret, labels, centers = cv2.kmeans(im32f.reshape([im.shape[0]*im.shape[1], -1]),
                                  k, None, term_crit, 10, 0)
# segmented image
labels = labels.reshape([im.shape[0], im.shape[1]]) * 255
Some results:
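Since the question is about Android, here is a rough adaptation sketch of the same idea using the OpenCV Java bindings (my adaptation, not tested on the poster's images; the input Mat is assumed to be the BGR crop of the MRZ region, e.g. obtained via Utils.bitmapToMat):
// mrz: BGR Mat of the cropped MRZ region
static Mat clusterMrz(Mat mrz) {
    Mat blurred = new Mat();
    Imgproc.GaussianBlur(mrz, blurred, new Size(3, 3), 1);

    Mat lab = new Mat();
    Imgproc.cvtColor(blurred, lab, Imgproc.COLOR_BGR2Lab);

    // keep only the a and b channels, flattened to an N x 2 float matrix
    List<Mat> channels = new ArrayList<>();
    Core.split(lab, channels);
    Mat ab = new Mat();
    Core.merge(Arrays.asList(channels.get(1), channels.get(2)), ab);
    Mat samples = new Mat();
    ab.reshape(1, ab.rows() * ab.cols()).convertTo(samples, CvType.CV_32F);

    // k-means with k = 2: background vs. characters
    Mat labels = new Mat();
    Mat centers = new Mat();
    TermCriteria crit = new TermCriteria(TermCriteria.EPS, 30, 0.1);
    Core.kmeans(samples, 2, labels, crit, 10, Core.KMEANS_PP_CENTERS, centers);

    // reshape the labels back to image size and scale to 0/255 for display;
    // inspect `centers` to decide which label is the dark character cluster
    Mat segmented = labels.reshape(1, mrz.rows());
    segmented.convertTo(segmented, CvType.CV_8U, 255);
    return segmented;
}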

Saving off raw with Camera2 on a camera with rowstride > width

What I have developed thus far is the capability to write out various devices' raw information using the standard DngCreator scheme, as per below.
On one device that I am encountering, however (the HTC 10), the Image class contains plane information whose row stride is larger than the width. I understand so far that this can happen with images, but I can't find out how to correct for it with the SDK available to us.
ByteBuffer byteBuffer = ByteBuffer.wrap(cameraImageF.getRawBytes());
byteBuffer.rewind();
dngCreator.writeByteBuffer(new FileOutputStream(rawLoggerFileF),
        new Size(cameraImageF.getRawImageSize().getWidth(), cameraImageF.getRawImageSize().getHeight()),
        byteBuffer, 0);
I have held onto the bytes from the original Image class and do some substantial calculations in between when I received them and when they were taken (this is the point of the application). So, I need to let go of the Image so that I can keep getting additional frames from the camera.
Now, this approach works fine for various devices (Samsung S7, Nexus 5, Nexus 6p, etc.). However on the HTC 10 the stride is 16 bytes longer per row and it seems as though I have no way of letting the DngCreator know that.
Underneath, in the source code, writeByteBuffer defaults to an internal rowStride = width * pixelStride. I do not have the ability to pass in a different stride as a parameter, and the actual rowStride does not equal that default.
dngCreator.saveImage(OutputStream, Image) uses the Image's internal stride when it writes out to a buffer. However, I can't hold on to an Image from the camera because it needs to be released, and it is not a cloneable object.
I am a bit lost and trying to understand how to write out a valid .dng for a photograph that has rowStride > width.
You'll have to remove the extra bytes manually - that is, copy the raw image to a new ByteBuffer, and remove the extra bytes at the end of each row. So something like:
byte[] rawBytes = cameraImageF.getRawBytes();
ByteBuffer dst = ByteBuffer.allocate(cameraImageF.getRawImageSize().getWidth() * cameraImageF.getRawImageSize().getHeight() * 2);
for (int row = 0; row < cameraImageF.getRawImageSize().getHeight(); row++) {
    dst.put(rawBytes,
            row * cameraImageF.getRawImageRowStride(),
            cameraImageF.getRawImageSize().getWidth() * 2);
}
dst.rewind();
dngCreator.writeByteBuffer(new FileOutputStream(rawLoggerFileF),
        new Size(cameraImageF.getRawImageSize().getWidth(),
                cameraImageF.getRawImageSize().getHeight()),
        dst, 0);
That's of course not lovely for performance, but since DngCreator won't let you specify a row stride with the ByteBuffer interface, it's your only option.
Is there a reason you can't just increase your RAW ImageReader's maxImages to a higher value, so that you can hold on to the Image until you're done processing it?
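For illustration, a minimal sketch of that suggestion (the rawWidth/rawHeight values, backgroundHandler and the choice of 4 buffers are placeholders, not from the question):
// Allocate a few extra RAW buffers so each Image can be held while it is processed.
ImageReader rawReader = ImageReader.newInstance(
        rawWidth, rawHeight, ImageFormat.RAW_SENSOR, /* maxImages= */ 4);

rawReader.setOnImageAvailableListener(reader -> {
    Image image = reader.acquireNextImage();
    // ... long-running processing, then e.g. dngCreator.saveImage(outputStream, image) ...
    image.close(); // must be closed eventually, or the reader stalls once maxImages are in flight
}, backgroundHandler);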

Nexus 9 Camera2 API - YUV_420_888 vs. getOutputSizes()

I'm implementing the Camera2 API with the YUV_420_888 format on a Nexus 9. I checked the output sizes and wanted to use the largest (8MP, 3280 x 2460) size to save. However, it just appears as static lines, similar to how old TVs looked without a signal. I would like to stick with YUV_420_888 since my end goal is to save grayscale data (the Y component).
I originally thought it was a camera bandwidth issue, but the same thing happened at some of the small sizes (320 x 240). None of the problems went away even when I increased frame duration and decreased the size of the preview to save on bandwidth. Some of the other sizes DID work (2048 x 1536, 1280 x 720) but I did not check all of them.
I'm starting to think getOutputSizes() may not necessarily be accurate. It gave me the same results for all other formats except RAW_SENSOR (JPEG, YUV_420_888, YV12). Has anyone encountered this or determined a solution?
Figured out the issue. I was not taking into account the rowStride of the returned pixels. So I had to run a for-loop to extract the non-padded data before saving it:
myRowStride = mImage.getPlanes()[0].getRowStride();
int iSkippedBytes = 0;
for (int i = 0; i < mStillSize.getWidth() * mStillSize.getHeight(); i++) {
    if (i % mStillSize.getWidth() == 0 && i != 0)
        iSkippedBytes = iSkippedBytes + (myRowStride - mStillSize.getWidth());
    imageBytes[i] = bytes[i + iSkippedBytes];
}
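The per-pixel loop works, but rows can also be copied in bulk. A sketch of the same stride handling done row by row (reusing mImage and mStillSize from the snippet above; packed is the destination array):
Image.Plane yPlane = mImage.getPlanes()[0];
ByteBuffer yBuffer = yPlane.getBuffer();
int rowStride = yPlane.getRowStride();
int width = mStillSize.getWidth();
int height = mStillSize.getHeight();

byte[] packed = new byte[width * height];
for (int row = 0; row < height; row++) {
    // jump to the start of each padded row, then copy only the `width` valid bytes
    yBuffer.position(row * rowStride);
    yBuffer.get(packed, row * width, width);
}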

Android compute hash of a bitmap

I want to compute a SHA-1 hash of different bitmaps (SHA isn't required).
The problem is that there are some bitmaps (captchas) which are basically the same, but whose name changes often.
I've found this:
Compute SHA256 Hash in Android/Java and C#
But it isn't the solution I wanted.
Bitmap.hashCode() only generates an Integer, and if I'm right:
Returns an integer hash code for this object. By contract, any two objects for which equals(Object) returns true must return the same hash code value. This means that subclasses of Object usually override both methods or neither method.
I don't want a hash code of the object, I want a hash of the bitmap content.
Thanks!
In Android 3.1 or later (API Level 12) there is a method on Bitmap called sameAs() which will compare the pixels and return whether the two Bitmaps represent the same image. It does this in native code, so it is relatively fast.
If you must target a lower API level, you must write a method that iterates over each pixel of the two objects and see if they match. This will be a very intensive process if done in Java code, so you may consider writing a small routine using the NDK that you can call from your application to do the comparison in native code (there are Bitmap APIs in the NDK so you can easily get at the pixel buffers).
If you opt to do so in Java, getPixels() will assist you in obtaining arrays of the pixel data that you can compare between the two images.
HTH
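For the Java route, a minimal sketch of such a pixel-by-pixel comparison with getPixels() (my example, not from the original answer):
// Compares two bitmaps by content; a fallback for API levels without Bitmap.sameAs().
public static boolean samePixels(Bitmap a, Bitmap b) {
    if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
        return false;
    }
    int w = a.getWidth();
    int h = a.getHeight();
    int[] pixelsA = new int[w * h];
    int[] pixelsB = new int[w * h];
    a.getPixels(pixelsA, 0, w, 0, 0, w, h);
    b.getPixels(pixelsB, 0, w, 0, 0, w, h);
    return java.util.Arrays.equals(pixelsA, pixelsB);
}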
Here is a more direct way of computing a Bitmap hash, using Arrays.hashCode and Bitmap.getPixels:
int hash(Bitmap bitmap) {
    int[] buffer = new int[bitmap.getWidth() * bitmap.getHeight()];
    bitmap.getPixels(buffer, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
    return Arrays.hashCode(buffer);
}
The fastest solution I have found so far, in Kotlin:
fun Bitmap.hash(): Int {
    val buffer: ByteBuffer = ByteBuffer.allocate(this.height * this.rowBytes)
    this.copyPixelsToBuffer(buffer)
    buffer.rewind() // hashCode() only covers the bytes from position() to limit(), so reset the position first
    return buffer.hashCode()
}
nearly 100x faster than the accepted answer
You could try to write your own function using only the pixels from the Bitmap:
public long hashBitmap(Bitmap bmp) {
    long hash = 31; // or a higher prime of your choice
    for (int x = 0; x < bmp.getWidth(); x++) {
        for (int y = 0; y < bmp.getHeight(); y++) {
            hash *= (bmp.getPixel(x, y) + 31);
        }
    }
    return hash;
}
If it's only about comparing two images, you could optimize this routine to hash just every second or every x-th pixel.
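For example, a sketch of that sampling idea (step is a hypothetical parameter; larger values are cheaper but coarser):
// Hashes only every `step`-th pixel in each direction.
public long sparseHashBitmap(Bitmap bmp, int step) {
    long hash = 31;
    for (int x = 0; x < bmp.getWidth(); x += step) {
        for (int y = 0; y < bmp.getHeight(); y += step) {
            hash *= (bmp.getPixel(x, y) + 31);
        }
    }
    return hash;
}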
I had a similar problem, and this worked for me (it solved the problem of getting a unique name for a specific bitmap, so I could check whether it was already stored):
fun getUniqueBitmapFileName(bitmap: Bitmap): String {
    val buffer = ByteBuffer.allocate(bitmap.byteCount)
    bitmap.copyPixelsToBuffer(buffer)
    return Arrays.hashCode(buffer.array()).toString()
}
