How to draw bounding boxes around classified objects using TensorFlow Lite on Android?

I would like to know whether it is possible to draw bounding boxes using TensorFlow Lite. I have been able to draw them using tensorflow-android 1.12, but I have no example for drawing bounding boxes with TensorFlow Lite.
The code below shows how I obtain the outputLocations with tensorflow-android 1.12, which works.
inferenceInterface.run(outputNames, logStats);
LOGGER.d("End Section run " + System.currentTimeMillis());
Trace.endSection();
// Copy the output Tensor back into the output array.
Trace.beginSection("fetch");
LOGGER.d("Begin Section fetch " + System.currentTimeMillis());
outputLocations = new float[MAX_RESULTS * 4];
outputScores = new float[MAX_RESULTS];
outputClasses = new float[MAX_RESULTS];
outputNumDetections = new float[1];
inferenceInterface.fetch(outputNames[0], outputLocations);
It would be great if you could tell me how to get the outputLocations using runInference() from TensorFlow Lite instead.

If you use object detection models such as the following: http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18.tar.gz
The output tensors of that model already contain the box locations, scores, classes, and number of detections.
You can follow an approach similar to the Android Java sample app:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java
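For reference, here is roughly what the fetch step looks like with the TensorFlow Lite Interpreter, following that sample. This is only a sketch: tfLite, imgData and MAX_RESULTS are names reused from the question and the sample, and the four-output layout assumes the standard SSD postprocessed model linked above.
// Sketch based on TFLiteObjectDetectionAPIModel: instead of fetch()-ing named
// tensors, pre-allocate one array per output index and run them all at once.
float[][][] outputLocations = new float[1][MAX_RESULTS][4]; // [ymin, xmin, ymax, xmax], normalized 0..1
float[][] outputClasses = new float[1][MAX_RESULTS];
float[][] outputScores = new float[1][MAX_RESULTS];
float[] numDetections = new float[1];

Object[] inputArray = {imgData}; // ByteBuffer holding the preprocessed input image

Map<Integer, Object> outputMap = new HashMap<>();
outputMap.put(0, outputLocations);
outputMap.put(1, outputClasses);
outputMap.put(2, outputScores);
outputMap.put(3, numDetections);

tfLite.runForMultipleInputsOutputs(inputArray, outputMap);

// outputLocations[0][i] now holds the i-th box in normalized coordinates;
// scale it to the preview size and draw it, e.g. with Canvas.drawRect(), as the sample does.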

Related

squeeze and unsqueeze tensor in Android (Java)

I am trying to port my torchscript over from Python to Android (Java). Currently, I have run into a problem trying to squeeze/unsqueeze my input and output tensors in Android. In Python, here's how I did it:
tensor = torch.Tensor(image_n.transpose(2, 0, 1).astype('float32')).unsqueeze(0)
tensor = tensor.to(device)
output tensor:
with torch.no_grad():
    prob = model.forward(tensor)
    prediction = prob.squeeze().numpy().astype('uint8')
In Android, I managed to create my input and output tensors following the PyTorch tutorial, as follows:
final Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(mBitmap,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB, TensorImageUtils.TORCHVISION_NORM_STD_RGB);
final float[] inputs = inputTensor.getDataAsFloatArray();
and the output tensor:
Map<String, IValue> outTensors = mModule.forward(IValue.from(inputTensor)).toDictStringKey();
The problem is that without squeeze and unsqueeze, although the code runs, the dimensions are wrong and I didn't manage to get the correct output.
Does anyone know if there is actually a squeeze/unsqueeze function for PyTorch on Android?
Edit: just to add on, my input tensor has the size (3, 224, 416) (an RGB image), and my output tensor has the size (1, 1, 224, 416) (a grayscale image).
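The org.pytorch Java API does not expose squeeze()/unsqueeze() directly, but Tensor.fromBlob() takes an explicit shape, so one workaround is to rebuild the tensor from its raw data with the extra dimension added or removed. This is only a sketch (it assumes float32 tensors and uses a hypothetical reshape helper; it has not been verified against the model above):
// Hypothetical helper: re-wrap a float tensor's data with a new shape.
// Adding or removing size-1 dimensions this way behaves like unsqueeze/squeeze.
static Tensor reshape(Tensor t, long... newShape) {
    return Tensor.fromBlob(t.getDataAsFloatArray(), newShape);
}

// e.g. add a batch dimension to the (3, 224, 416) input ...
Tensor batched = reshape(inputTensor, 1, 3, 224, 416);
// ... or drop the singleton dimensions from a (1, 1, 224, 416) output
// before interpreting its flat float array as a 224 x 416 image.
Tensor squeezed = reshape(outputTensor, 224, 416);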

TensorFlow model outputs have different values

I am developing an Android application which requires an ML model integration. For this I am using TensorFlow Lite for deployment. I am using a custom Siamese network model, and its output shape is [1 128]. When I run the TF Lite model in Python on Google Colab, the output [1 128] numbers are different from the ones produced on my Android device. The input image is the same for both inferences, and so are the input and output shapes, but I still get different output vectors on my Android phone and from the Python TFLite model. I am using Firebase Machine Learning.
Android Code
val interpreter = Interpreter(model)
val imageBitmap = Bitmap.createScaledBitmap(
    BitmapFactory.decodeFileDescriptor(contentResolver.openFileDescriptor(fileUri, "r")?.fileDescriptor),
    256, 256, true)
val inputImage = ByteBuffer.allocateDirect(256 * 256 * 3 * 4).order(ByteOrder.nativeOrder())
for (ycord in 0 until 256) {
    for (xcord in 0 until 256) {
        val pixel = imageBitmap.getPixel(xcord, ycord)
        inputImage.putFloat(Color.red(pixel) / 1.0f)
        inputImage.putFloat(Color.green(pixel) / 1.0f)
        inputImage.putFloat(Color.blue(pixel) / 1.0f)
    }
}
imageBitmap.recycle()
val modelOutput = ByteBuffer.allocateDirect(outputSize).order(ByteOrder.nativeOrder())
interpreter.run(inputImage, modelOutput)
modelOutput.rewind()
val probs = modelOutput.asFloatBuffer()
success(ImageProcessResult.Success(probs))
Kindly help me. I need it soon. Any help is appreciated.
You are resizing the bitmap to [256, 256] on the Android side.
Even the slightest change in the input vector will change the output vector. When you resize the bitmap, you change the input vector. However, if the model is general enough, the final result, which would be the argmax of the output vector (in classification), should be the same.
In the case of a Siamese network, I believe it won't affect the final result (the similarity score) in a meaningful way if the model is not overfitted.
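To sanity-check the two runs, comparing the 128 raw numbers for exact equality is too strict. A small sketch (plain Java, my own addition) that compares the Colab embedding with the device embedding by cosine similarity instead:
// Compare the [1, 128] embedding from Colab with the one from the device.
// Small preprocessing differences (e.g. a slightly different resize) should
// still give a cosine similarity close to 1.0 if the model generalizes.
static float cosineSimilarity(float[] a, float[] b) {
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (float) (Math.sqrt(normA) * Math.sqrt(normB));
}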

Image pre-processing parameters for TensorFlow models

I have a basic question about how to determine the image pre-processing parameters, like IMAGE_MEAN and IMAGE_STD, for various pre-trained TensorFlow models. The Android sample applications for TensorFlow provide these parameters for a certain inception_v3 model in ClassifierActivity.java (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/ClassifierActivity.java), as quoted below:
"If you want to use a model that's been produced from the TensorFlow for Poets codelab, you'll need to set IMAGE_SIZE = 299, IMAGE_MEAN = 128, IMAGE_STD = 128"
How do I determine these parameters for other TF models?
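For context, IMAGE_MEAN and IMAGE_STD are simply the normalization constants the model was trained with, applied per channel when filling the input buffer, so determining them for another model means finding out what normalization that model's training pipeline used. A sketch of how the Android classifier sample applies them (intValues and floatValues are the sample's pixel and input arrays):
// (value - IMAGE_MEAN) / IMAGE_STD, applied to each R, G, B channel.
// With IMAGE_MEAN = 128 and IMAGE_STD = 128f this maps 0..255 to roughly -1..1.
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
for (int i = 0; i < intValues.length; ++i) {
    final int val = intValues[i];
    floatValues[i * 3 + 0] = (((val >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
    floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
    floatValues[i * 3 + 2] = ((val & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
}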
Also, while converting a TF model to a Core ML model for use on iOS, there are additional image pre-processing parameters that need to be specified (such as red_bias, green_bias, blue_bias and image_scale), as shown in the code segment below. The parameters below are for the inception_v1_2016.pb model. If I want to use another pre-trained model such as ResNet50 or MobileNet, how do I determine these parameters?
tf_converter.convert(tf_model_path = 'inception_v1_2016_08_28_frozen.pb',
                     mlmodel_path = 'InceptionV1.mlmodel',
                     output_feature_names = ['InceptionV1/Logits/Predictions/Softmax:0'],
                     image_input_names = 'input:0',
                     class_labels = 'imagenet_slim_labels.txt',
                     red_bias = -1,
                     green_bias = -1,
                     blue_bias = -1,
                     image_scale = 2.0/255.0
                     )
Any help will be greatly appreciated
Unfortunately, the preprocessing requirements of the various ImageNet models are still under-documented. The ResNet and VGG models both use the same preprocessing parameters. You can find the biases for each of the color channels here:
https://github.com/fchollet/deep-learning-models/blob/master/imagenet_utils.py#L11
The preprocessing for Inception_V3, MobileNet, and other models can be found in the individual model files of this repo: https://github.com/fchollet/deep-learning-models
When converting to Core ML you always need to specify the preprocessing biases on a per-channel basis. So in the case of VGG-type preprocessing, you can just copy each channel's bias directly from the code linked above. It's important to note that the bias is added AFTER the image_scale is applied (scale first, then bias), which is why the Inception-style parameters use scale = 2/255 with bias = -1. You can read more about setting the proper values here: http://machinethink.net/blog/help-core-ml-gives-wrong-output/
The conversion code you posted looks good for MobileNet or Inception_V3 models, but would not work for VGG or ResNet. For those you'd need:
tf_converter.convert(...
                     red_bias = -123.68,
                     green_bias = -116.78,
                     blue_bias = -103.94
                     )
No scaling is required.
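To make the two parameter sets concrete, here is the per-channel arithmetic they correspond to, written out as plain Java (my own illustration of the formulas above, not part of the converter):
// Inception/MobileNet-style preprocessing: maps 0..255 to -1..1.
// This is what image_scale = 2/255 with red/green/blue_bias = -1 achieves.
static float inceptionStyle(float channel) {
    return channel * (2.0f / 255.0f) - 1.0f;
}

// VGG/ResNet-style preprocessing: subtract the per-channel ImageNet mean, no scaling.
// This is what red_bias = -123.68, green_bias = -116.78, blue_bias = -103.94 achieve
// (means listed in R, G, B order).
static float vggStyle(float channel, float channelMean) {
    return channel - channelMean; // e.g. channelMean = 123.68f for the red channel
}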

Generate and export point cloud from Project Tango

After some weeks of waiting I finally have my Project Tango. My idea is to create an app that generates a point cloud of my room and exports this to .xyz data. I'll then use the .xyz file to show the point cloud in a browser! I started off by compiling and adjusting the point cloud example that's on Google's github.
Right now I use onXyzIjAvailable(TangoXyzIjData tangoXyzIjData) to get a frame of x, y and z values; the points. I then save these frames in a PCLManager in the form of Vector3. After I'm done scanning my room, I simply write all the Vector3 from the PCLManager to a .xyz file using:
OutputStream os = new FileOutputStream(file);
size = pointCloud.size();
for (int i = 0; i < size; i++) {
    String row = String.valueOf(pointCloud.get(i).x) + " "
            + String.valueOf(pointCloud.get(i).y) + " "
            + String.valueOf(pointCloud.get(i).z) + "\r\n";
    os.write(row.getBytes());
}
os.close();
Everything works fine, with no compilation errors or crashes. The only thing that seems to be going wrong is the rotation or translation of the points in the cloud. When I view the point cloud, everything is messed up: the area I scanned is not recognizable, though the number of points is the same as recorded.
Could this have something to do with the fact that I don't use PoseData together with the XyzIjData? I'm kind of new to this subject and have a hard time understanding what the PoseData actually does. Could someone explain it to me and help me fix my point cloud?
Yes, you have to use TangoPoseData.
I guess you are using TangoXyzIjData correctly, but the data you get this way is relative to where the device is and how it is tilted when you take the shot.
Here's how I solved this:
I started from the java_point_to_point_example. In that example they get the coordinates of two different points in two different coordinate systems and then write those coordinates wrt the base coordinate frame pair.
First of all you have to set up your extrinsics, so you'll be able to perform all the transformations you'll need. To do that, I call mExtrinsics = setupExtrinsics(mTango) at the end of my setTangoListener() function. Here's the code (which you can also find in the example linked above).
private DeviceExtrinsics setupExtrinsics(Tango mTango) {
    // IMU to color camera transform
    TangoCoordinateFramePair framePair = new TangoCoordinateFramePair();
    framePair.baseFrame = TangoPoseData.COORDINATE_FRAME_IMU;
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR;
    TangoPoseData imu_T_rgb = mTango.getPoseAtTime(0.0, framePair);
    // IMU to device transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_DEVICE;
    TangoPoseData imu_T_device = mTango.getPoseAtTime(0.0, framePair);
    // IMU to depth camera transform
    framePair.targetFrame = TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH;
    TangoPoseData imu_T_depth = mTango.getPoseAtTime(0.0, framePair);
    return new DeviceExtrinsics(imu_T_device, imu_T_rgb, imu_T_depth);
}
Then, when you get the point cloud, you have to "normalize" it. Using your extrinsics this is pretty simple:
public ArrayList<Vector3> normalize(TangoXyzIjData cloud, TangoPoseData cameraPose, DeviceExtrinsics extrinsics) {
    ArrayList<Vector3> normalizedCloud = new ArrayList<>();
    TangoPoseData camera_T_imu = ScenePoseCalculator.matrixToTangoPose(extrinsics.getDeviceTDepthCamera());
    while (cloud.xyz.hasRemaining()) {
        Vector3 rotatedV = ScenePoseCalculator.getPointInEngineFrame(
                new Vector3(cloud.xyz.get(), cloud.xyz.get(), cloud.xyz.get()),
                camera_T_imu,
                cameraPose
        );
        normalizedCloud.add(rotatedV);
    }
    return normalizedCloud;
}
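Not part of the original answer, but roughly how you would call normalize() from onXyzIjAvailable(): fetch the device pose at the depth frame's timestamp and pass it in. The frame pair and the pclManager accumulator are my own assumptions, following the question's setup:
// Inside onXyzIjAvailable(TangoXyzIjData xyzIj), roughly:
TangoCoordinateFramePair devicePair = new TangoCoordinateFramePair(
        TangoPoseData.COORDINATE_FRAME_START_OF_SERVICE,
        TangoPoseData.COORDINATE_FRAME_DEVICE);
// Ask for the pose at the exact time the depth frame was captured.
TangoPoseData cameraPose = mTango.getPoseAtTime(xyzIj.timestamp, devicePair);
if (cameraPose.statusCode == TangoPoseData.POSE_VALID) {
    ArrayList<Vector3> worldPoints = normalize(xyzIj, cameraPose, mExtrinsics);
    pclManager.addAll(worldPoints); // hypothetical accumulator, like the question's PCLManager
}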
This should be enough: now you have a point cloud wrt your base frame of reference.
If you superimpose two or more of these "normalized" clouds you can get a 3D representation of your room.
There is another way to do this with a rotation matrix, explained here.
My solution is pretty slow (it takes the dev kit around 700 ms to normalize a cloud of ~3000 points), so it is not suitable for real-time 3D reconstruction.
At the moment I'm trying to use the Tango 3D Reconstruction Library in C using the NDK and JNI. The library is well documented, but it is very painful to set up your environment and start using JNI (I'm stuck there at the moment, in fact).
Drifting
There still is a problem when I turn around with the device. It seems that the point cloud spreads out a lot.
I guess you are experiencing some drifting.
Drifting happens when you use Motion Tracking alone: it consists of many very small errors in estimating your pose that together cause a big error in your pose relative to the world. For instance, if you take your Tango device, walk in a circle while tracking your TangoPoseData, and then plot your trajectory in a spreadsheet or whatever you like, you'll notice that the tablet never returns to its starting point, because it is drifting away.
The solution to that is using Area Learning.
If you have no clear idea about this topic, I'd suggest watching this talk from Google I/O 2016. It covers a lot of ground and gives you a nice introduction.
Using Area Learning is quite simple.
You just have to change your base frame of reference to TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION. This way you tell your Tango to estimate its pose not wrt where it was when you launched the app, but wrt some fixed point in the area.
Here's my code:
private static final ArrayList<TangoCoordinateFramePair> FRAME_PAIRS =
        new ArrayList<TangoCoordinateFramePair>();
{
    FRAME_PAIRS.add(new TangoCoordinateFramePair(
            TangoPoseData.COORDINATE_FRAME_AREA_DESCRIPTION,
            TangoPoseData.COORDINATE_FRAME_DEVICE
    ));
}
Now you can use this FRAME_PAIRS as usual.
Then you have to modify your TangoConfig to tell Tango to use Area Learning, via the key TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION. Remember that when using TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION you CAN'T use learning mode or load an ADF (area description file).
So you can't use:
TangoConfig.KEY_BOOLEAN_LEARNINGMODE
TangoConfig.KEY_STRING_AREADESCRIPTION
Here's how I initialize TangoConfig in my app:
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
// Turn the depth sensor on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
// Turn motion tracking on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING, true);
// If Tango gets stuck it tries to recover by itself.
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY, true);
// Tango tries to store and remember places and rooms;
// this is used to reduce drifting.
config.putBoolean(TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION, true);
// Turn the color camera on.
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
Using this technique you'll get rid of that spreading.
PS
In the talk I linked above, at around 22:35, they show how to port your application to Area Learning. In their example they use TangoConfig.KEY_BOOLEAN_ENABLE_DRIFT_CORRECTION. This key does not exist anymore (at least in the Java API); use TangoConfig.KEY_BOOLEAN_DRIFT_CORRECTION instead.

Graph/Plots on Android

I am new to Android and trying to learn how to create or plot graphs in Android. I've come across two libraries:
GraphView
AndroidPlot.
My intent is to receive a sound file and plot it on a graph. For this purpose, which library would be better? I also want to know where I can see the complete implementation or definitions of these libraries, i.e. the structure and code of the APIs they use.
I have also tried some of the sample code available on the net, but I'm looking for a more sophisticated example which I could develop in Eclipse ADT and hence learn something from.
My intent is to receive a sound file and plot it on a graph
Neither library does this by default. The libraries are used to plot a graph given a set of data points. Getting the data points from the sound file is up to you.
For this purpose, which library would be better?
Either library should be fine once you get the data points.
I also want to know where I can see the complete implementation or definitions of these libraries, i.e. the structure and code of the APIs they use
Check out the sources for GraphView and AndroidPlot.
I have used AChartEngine a few times and it works great. I modified it without difficulty.
If you are drawing a simple line graph, you can also just use a Canvas to draw the graph yourself.
Use AndroidPlot. The code below draws the contents of a Vector (in this case filled with zeros). You only have to pass the data from the wav file into the vector. You can check this thread for the reading part:
Android: Reading wav file and displaying its values
plot = (XYPlot) findViewById(R.id.Grafica);
plot.setDomainStep(XYStepMode.INCREMENT_BY_VAL, 0.5);
plot.setRangeStep(XYStepMode.INCREMENT_BY_VAL, 1);
plot.getGraphWidget().getGridBackgroundPaint().setColor(Color.rgb(255, 255, 255));
plot.getGraphWidget().getDomainGridLinePaint().setColor(Color.rgb(255, 255, 255));
plot.getGraphWidget().getRangeGridLinePaint().setColor(Color.rgb(255, 255, 255));
plot.getGraphWidget().setDomainLabelPaint(null);
plot.getGraphWidget().setRangeLabelPaint(null);
plot.getGraphWidget().setDomainOriginLabelPaint(null);
plot.getGraphWidget().setRangeOriginLabelPaint(null);
plot.setBorderStyle(BorderStyle.NONE, null, null);
plot.getLayoutManager().remove(plot.getLegendWidget());
plot.getLayoutManager().remove(plot.getDomainLabelWidget());
plot.getLayoutManager().remove(plot.getRangeLabelWidget());
plot.getLayoutManager().remove(plot.getTitleWidget());
//plot.getBackgroundPaint().setColor(Color.WHITE);
//plot.getGraphWidget().getBackgroundPaint().setColor(Color.WHITE);
// Check these boundaries; they weren't chosen for audio files.
plot.setRangeBoundaries(-25, 25, BoundaryMode.FIXED);

InicializarLasVariables();
for (int i = 0; i < min / 11; i++) {
    DatoY = 0;
    Vector.add(DatoY);
}
XYSeries series = new SimpleXYSeries(Vector, SimpleXYSeries.ArrayFormat.Y_VALS_ONLY, "");
LineAndPointFormatter seriesFormat = new LineAndPointFormatter(Color.rgb(0, 0, 0), 0x000000, 0x000000, null);
plot.clear();
plot.addSeries(series, seriesFormat);
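The loop above only fills the Vector with zeros; as the linked thread describes, you would replace it with the decoded audio samples. A rough sketch for a canonical WAV file (the 44-byte header and 16-bit little-endian mono PCM data are assumptions; adjust for your files):
// Read 16-bit PCM samples from a .wav file into the same Vector plotted above.
DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(wavFile)));
in.skipBytes(44); // skip the canonical RIFF/fmt header
while (in.available() >= 2) {
    int lo = in.read();
    int hi = in.read();
    short sample = (short) ((hi << 8) | (lo & 0xFF)); // little-endian 16-bit sample
    Vector.add(sample / 32768f); // normalize to -1..1; adjust setRangeBoundaries to match
}
in.close();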
