Reusing a Neural Network - Android

I'm very new to neural networks, but they seem to fit a project of mine. The application should eventually run on an Android phone. My idea is to use TensorFlow, but I'm not sure whether it's a fit.
My situation is the following: my input is a set of images (their order should not have any impact on the output). The set size is not fixed, but in most cases lower than 10. My output for the whole set is just a binary categorisation (pass/fail).
I will have a convolutional neural network which calculates two outputs, a weight and a pass/fail value. Each image is supplied separately to this CNN, and the resulting values are then aggregated into a final pass/fail value using a weighted arithmetic mean.
My question is: can I train such a network with TensorFlow?
I do not have the values for the CNN outputs in my training data, only the outputs after the aggregation. Is this possible in general with a gradient-oriented framework, or do I have to use a genetic algorithm approach?

You can definitely do this with TensorFlow. After you've done the intro tutorials, you should look at the CNN tutorial to learn how to implement a convolutional neural network in TensorFlow.
All of the heavy lifting is already taken care of. All you have to do is use the tf.nn.conv2d() method to make the convolutional layer, and then use one of the pooling and normalization ops, and so on.
If you're unfamiliar with what this means, you should read up on it, but in a nutshell there are three unique components to a CNN. The convolutional layer scans a window across the image looking for certain patterns, and its activations are recorded in the output as a grid. It's important for learning to lower the dimensionality of the data, and that's what the pooling layer does: it takes the output of the convolutional layer and reduces its size. The normalization layer then normalizes this, because having normalized data tends to improve learning.
If you only have the aggregate outputs, then you need to think of a way of coming up with reasonable proxy outputs for the individual images. One thing you could do is use the aggregate output as the label for each individual image in its set and train with gradient descent.
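Another option, since the weighted arithmetic mean in the question is differentiable, is to train end to end: apply one shared CNN to every image in a set, aggregate, and backpropagate the set-level loss through the aggregation. A minimal sketch (assuming, purely for illustration, 64x64 grayscale inputs and a softmax normalization of the per-image weights - neither is specified in the question):

```python
import tensorflow as tf

# Shared CNN applied to every image in a set. The two outputs per image are
# a weight logit and a pass/fail logit, as described in the question.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])

def set_prediction(images):
    out = cnn(images)                       # shape (N, 2) for a set of N images
    weights = tf.nn.softmax(out[:, 0])      # normalized per-image weights
    scores = tf.sigmoid(out[:, 1])          # per-image pass probability
    return tf.reduce_sum(weights * scores)  # weighted arithmetic mean

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()

def train_step(images, label):
    # The set-level loss backpropagates through the aggregation into the CNN.
    with tf.GradientTape() as tape:
        pred = set_prediction(images)
        loss = loss_fn([label], [pred])
    grads = tape.gradient(loss, cnn.trainable_variables)
    optimizer.apply_gradients(zip(grads, cnn.trainable_variables))
    return loss
```

Each call to train_step takes one set as a batch of N images, so the varying set size is handled naturally.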

Related

Plotting 3D Math Functions in Android

As part of my project, I need to plot 2D and 3D functions in Android using Android Studio. I know how to plot 2D functions, but I'm struggling with 3D functions.
What is the best way to plot 3D functions? What do I need and where do I start?
I'm not looking for code or external libraries that do all the work, I just need to know what I need to learn to be able to do it myself.
Thanks in advance.
Since you already understand 2D and want to advance to 3D, there's a simple, non-optimal method:
Decide on how much z depth you desire:
EG: Currently your limits for x and y in your drawing functions are 1920 and 1080 (or even 4096x4096). If you want to save memory and keep things a bit low-resolution, use a size of 1920x1080x1000 - that's going to use 1000x more memory and has the potential to increase the drawing time of some calls by 1000 times - but that's the cost of 3D.
A more practical limit is a matrix of 8192x8192x16384, but be aware that video games at that resolution need 4-6GB graphics cards to work half decently (more being a bit better) - so you'll be chewing up some main memory starting at that size.
It's easy enough to implement a smaller drawing space and simply increase your allocation and limit variables later; not only does that test that future increases will go smoothly, it also lets everything run faster while you're ironing out the bugs.
Add a 3rd dimension to the functions:
EG: Instead of a function that is simply line_draw(x,y), change it to line_draw(x,y,z), and use the global variable z_limit (or whatever you decide to name it) to test that z doesn't exceed the limit.
You'll need to decide whether objects at the maximum distance are drawn as a dot or not visible at all. While testing, it is useful to clamp every z past the limit to the limit value (thus making those objects a visible dot). For the finished product, it's best that anything past the limit simply isn't visible.
Start by allocating the drawing buffer and implementing a single function first; there's no point (and possibly great frustration) in changing everything and hoping it's just going to work - it should, but if it doesn't you'll have a lot on your plate if there's a common fault in every function.
Once you have this 3D buffer filled with an image (start with just a few 3D lines, such as a full-screen-sized "X" and "+"), you draw to your 2D screen (X,Y) by starting at the largest value of z first (EG: z=1000). Once you finish that layer, decrement z and continue, drawing each layer until you reach zero, the objects closest to you.
That's the simplest (and slowest) way to make certain that the closest objects obscure the furthest ones.
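The far-to-near layering described above can be sketched with a small voxel buffer (NumPy here purely for illustration; the sizes are tiny stand-ins for the limits discussed):

```python
import numpy as np

depth, height, width = 100, 64, 64   # small illustrative buffer sizes
buf3d = np.zeros((depth, height, width), dtype=np.uint8)  # 0 = empty voxel

# Two test shapes at different depths.
buf3d[80, 10:50, 10:50] = 1   # far square
buf3d[20, 25:40, 25:40] = 2   # near square (should obscure the far one)

# Painter's algorithm: composite layers starting at the largest z (farthest)
# and ending at z = 0 (nearest), so near voxels overwrite far ones.
screen = np.zeros((height, width), dtype=np.uint8)
for z in range(depth - 1, -1, -1):
    layer = buf3d[z]
    mask = layer != 0
    screen[mask] = layer[mask]

# Where the two squares overlap, the near square's value wins.
```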
Now, does it look OK? You don't want distant objects the same size as (or larger than) the closest objects; make certain that you scale them.
The reason to choose numbers such as 8192 is that after writing your functions in C (or whichever language you choose), you'll want to optimize them with several versions each, written in assembly language and tuned for specific CPU and GPU architectures. Without specifically optimized versions, everything will be extremely slow.
I understand that you don't want to use a library but looking at several should give you an idea of the work involved and how you might implement your own. No need to copy, improve instead.
There are similar questions and answers that might fill in the blanks:
Reddit - "I want to create a 3D engine from scratch. Where do I start?"
Davrous - "Tutorial series: learning how to write a 3D soft engine from scratch in C#, TypeScript or JavaScript"
GameDev.StackExchange - "How to write my own 3-D graphics library for Windows? [closed]"

What is a good approach to implementing a real time raster plot on Android?

What is a suggested implementation approach for a real time scrolling raster plot on Android?
I'm not looking for a full source code dump or anything, just some implementation guidance or an outline on the "what" and "how".
what: Should I use built in Android components for drawing or go straight to OpenGL ES2? Or maybe something else I haven't heard of. This is my first bout with graphics of any sort, but I'm not afraid to get a little dirty with OpenGL.
how: Given a certain set of drawing components how would I approach implementation? I feel like the plot is basically a texture that needs updating and translating.
Background
I need to design an Android application that, as part of its functionality, displays a real-time scrolling raster plot (i.e. a spectrogram or waterfall plot). The data will first be coming out of libUSB and passing through native C++, where the signal processing will happen. Then, I assume, the plotting can happen either in C++ or Kotlin, depending on what is easier and on whether passing the data over the JNI is a big enough bottleneck.
My main concern is drawing the base raster itself in real time, not extras such as zooming, axes, or other added functionality. I'm trying to start simple.
Constraints
I'm limited to free software.
Platform: Android version 7.0+ on modern device
GPU hardware acceleration is preferred, as the CPU will be doing a good amount of number crunching to bring streaming data to the plot.
Thanks in advance!

Renderscript: Create a vector of structs

I'm writing a small piece of RenderScript to dynamically take an image and sort the pixels into 'buckets' based on each pixel's RGB values. The number of buckets can vary, so my instinct would be to create an ArrayList. That obviously isn't possible within RenderScript, so I was wondering what the approach would be to creating a dynamic list of structs within the script. Any help greatly appreciated.
There's no clear answer to this. The problem is that dynamic memory management is anathema to platforms like RenderScript--it's slow, implies a lot of things about page tables and TLBs that may not be easy to guarantee from a given processor at an arbitrary time, and is almost never an efficient way to do what you want to do.
What the right alternative is depends entirely on what you're doing with the buckets after they're created. Do you need everything categorized, without actually gathering the pixels into buckets? Then just create a per-pixel mask (or use the alpha channel) and store the category alongside the pixel data. Do you have some upper bound on the size of each bucket? Then allocate every bucket at that size.
Sorry that this is open-ended, but memory management is one of those things that brings high-performance code to a screeching halt. Workarounds are necessary, but the right workaround varies in every case.
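Both workarounds (a per-pixel category mask, and buckets allocated at a worst-case fixed size) can be sketched as follows - NumPy for illustration only, with four brightness buckets as a made-up example:

```python
import numpy as np

# Instead of a variable-sized list per bucket, store each pixel's bucket
# index in a mask the same shape as the image (4 brightness buckets here).
img = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
mask = img // 64          # bucket id 0..3 per pixel, stored alongside the data

# Fixed upper bound per bucket: allocate the worst case (all pixels) up front,
# so no dynamic memory management is ever needed.
buckets = np.zeros((4, img.size), dtype=np.uint8)
counts = np.zeros(4, dtype=np.int64)
for v, b in zip(img.ravel(), mask.ravel()):
    buckets[b, counts[b]] = v
    counts[b] += 1
```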
I'll try to answer your goal question of classifying pixel values, and not your title question of creating a dynamically-sized list of structs.
Without knowing much about your algorithm, I will frame my answer around one of two algorithms:
RGB Joint Histogram
Does not use neighboring pixel values.
Connected Component
Requires neighboring pixel values.
Requires a supporting data structure called "Disjoint set".
Common advice.
Both algorithms require a lot of memory per worker thread. Also, both algorithms are poorly suited to the GPU because they require some kind of random memory access (note). It is therefore likely that both algorithms will end up being executed on the CPU, so it is a good idea to reduce the number of "threads" to avoid multiplying the memory requirement.
Note: Non-coalesced (non-sequential) memory access - reads, writes, or both.
RGB Joint Histogram
The best way is to compute a joint color histogram using Renderscript, and then run your classification algorithm on the histogram instead (presumably on the CPU). After that, you can perform a final step of pixel-wise label assignment back in Renderscript.
The whole process is almost exactly the same as Tim Murray's RenderScript presentation at Google I/O 2013.
Link to recorded session (video)
Link to slides (PDF)
The joint color histogram will have to have a hard-coded size. For example, a 32x32x32 RGB joint histogram uses 32768 histogram bins. This allows 32 levels of shade for each channel; since each bin then covers 8 of the original 256 levels, the quantization error per channel is about +/- 4 levels.
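The binning itself is straightforward; a NumPy sketch of the 32x32x32 joint histogram (the real version would be a RenderScript kernel, but the bin arithmetic is the same):

```python
import numpy as np

def joint_histogram(rgb, bins=32):
    """Quantize each 8-bit channel to `bins` levels and count (r, g, b) triples."""
    q = rgb // (256 // bins)                      # 256/32 = 8 levels per bin
    flat = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    return np.bincount(flat.ravel(), minlength=bins ** 3)  # 32**3 = 32768 bins

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
hist = joint_histogram(img)
# hist.size is 32768; hist.sum() equals the number of pixels
```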
Connected Component
I have successfully implemented multi-threaded connected-component labeling in RenderScript. Note that my implementation is limited to execution on the CPU; it is not possible to execute it on the GPU.
Prerequisites.
Understand the Union-Find algorithm (and its various theoretical parts, such as path-compression and ranking) and how connected-component labeling benefits from it.
Some design choices.
I use a 32-bit integer array, same size as the image, to store the "links".
Linking occurs in the same way as in union-find, except that I do not have the benefit of ranking. This means the tree may become highly unbalanced, and therefore the paths may become long.
On the other hand, I perform path-compression at various steps of the algorithm, which counteracts the risk of suboptimal tree merging by shortening the paths (depths).
One small but important implementation detail.
The value stored in the integer array is essentially an encoding of the (x, y) coordinates of either (i) the pixel itself, if the pixel is its own root, or (ii) a different pixel which has the same label as the current pixel.
Steps.
The multi-threaded stage.
Divide the image into small tiles.
Inside each tile, compute the connected components, using label values local to that tile.
Perform path compression inside each tile.
Convert the label values into global coordinates and copy the tile's labels into the main result matrix.
The single-threaded stage.
Horizontal stitching.
Vertical stitching.
A global round of path-compression.
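The union-find core that the steps above rely on can be sketched in Python (illustrative only; the actual implementation stores the links in the flat 32-bit integer array described above):

```python
def find(parent, i):
    """Follow links to the root, compressing the path along the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    while parent[i] != root:      # path compression: point every node at the root
        parent[i], i = root, parent[i]
    return root

def union(parent, a, b):
    """Link the roots of a and b (no ranking, as noted in the answer above)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb

# Tiny demo: pixels 0..5, connect 0-1, 1-2, and 4-5.
parent = list(range(6))
union(parent, 0, 1); union(parent, 1, 2); union(parent, 4, 5)
```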

Suggestions for digit recognition

I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:
I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:
Tesseract is large, contains lots of functionality I don't need (i.e. full-text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to only look for digits using tesseract.setVariable("tessedit_char_whitelist", "123456789");
Tesseract often misinterprets a single digit as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:
I have three questions:
Is there any way I can overcome the limitations of Tesseract?
If not, what is a useful, accurate method for detecting individual digits (other than k-nearest neighbours) that would be feasible to implement on Android? This could be a free library or a DIY solution.
How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...
1. Get some hand-labeled training data.
2. Run Histogram of Oriented Gradients (HOG) on the training data, and produce one long, concatenated feature vector per image.
3. Feed each image's HOG features and its label into an SVM.
4. For test data (digits on a Sudoku puzzle), run HOG on the digits, then ask the SVM to classify the HOG features from the puzzle.
OpenCV has a HOGDescriptor object, which computes HOG features. Look at this paper for advice on how to tune your HOG feature parameters. Any SVM library should do the job...the CvSVM stuff that comes with OpenCV should be fine.
For training data, I recommend using the MNIST handwritten digit database, which has thousands of pictures of digits with ground-truth data.
A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)
The easiest thing is to use normalized central moments for digit recognition.
This works well if you have one font (or very similar fonts).
See this solution: https://github.com/grzesiu/Sudoku-GUI
The core contains the parts responsible for digit recognition, extraction, and moment training.
The first time the application is run, the operator must indicate which number is seen. The moments of the image (the extracted square ROI) are then assigned to that number (the operator's input). The application is based on comparing moments.
The first YouTube video here shows how the application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/

Perfect object to be recognized with OpenCV

I have an application where I want to track 2 objects at a time that are rather small in the picture.
This application should be running on Android and iPhone, so the algorithm should be efficient.
For my customer it is perfectly fine if we deliver, along with the software, some patterns that are attached to the objects to be tracked, so that there is a well-recognizable target.
This means that I can make up a pattern of my own.
As I'm not that deep into image processing yet, I don't know which kinds of objects are easiest to recognize in a picture, even when they are rather small.
Color is also possible, although processing several planes separately is not desired because of the generated overhead.
Thank you for any advice!!
Best,
guitarflow
If I understand this correctly, your pattern should:
Be printable on an A4
Be recognizable at up to 4 meters
Not need rotational invariance (I'm assuming the user will hold the phone more or less upright)
I recommend printing a large checkerboard and using a combination of color matching and corner detection. Try different combinations to see what's faster and more robust at different distances.
Color: if you only want to work on one channel, you can print in red/green/blue*, and then work only on that respective channel. This will already filter a lot and increase contrast "for free".
Otherwise, a histogram backprojection is in my experience quite fast. See here.
Also, say you have only 4 squares with RGB+black (see image): it would be easy to find all red contours, then check whether each has the correct neighbouring colors - a patch of blue to its right and a patch of green below it, both of roughly the same area. This alone might be robust enough, and it is equivalent to working on one channel, since each step only accesses one specific channel (search for contours in red, check to the right in blue, check below in green).
If you're getting a lot of false positives, you can then use corners to filter your hits. In the example image you have 9 corners already - in fact even more if you separate channels - and if that isn't enough you can make a true checkerboard with several squares in order to have more corners. It will probably be sufficient to check how many corners are detected in the ROI in order to reject false positives; otherwise, you can also check that the spacing between detected corners in the x and y directions is uniform (i.e. that they form a grid).
Corners: Detecting corners has been greatly explored and there are several methods here. I don't know how efficient each one is, but they are fast enough, and after you've reduced the ROIs based on color, this should not be an issue.
Perhaps the simplest is to erode/dilate with a cross-shaped kernel to find corners. See here.
You'll want to threshold the image first to create a binary map, probably based on color as mentioned above.
Other corner detectors such as Harris detector are well documented.
Oh, and I don't recommend using Haar classifiers. They seem unnecessarily complicated and not that fast (though very robust for complex objects, i.e. when you can't use your own pattern), not to mention the huge amount of work required for training.
Haar training is your friend mate.
This tutorial should get you started: http://note.sonots.com/SciSoftware/haartraining.html
Basically, you train something called a classifier on sample images (2000 or so of the object you want to track). OpenCV already has the tools required to build these classifiers, plus library functions to detect the objects.
