Neural Net Design Issues - Output Representations

Our initial neural network design had one output unit producing a single value in the range [0, 1]. A low value indicates that a pixel lies to the left of the contour, a high value indicates that it lies to the right, and a value near 0.5 indicates that the pixel is on the desired contour. The network thus learns an evaluation function whose value changes smoothly as a pixel and its neighbors move from left-of-contour values, through on-contour values, to right-of-contour values.
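As a minimal sketch of what training targets for such a smooth evaluation function could look like, the following maps a signed offset from the contour to a value in [0, 1]. The logistic shape, the function name smooth_target, and the ramp_width parameter are illustrative assumptions, not details of the original system.

```python
import numpy as np

def smooth_target(signed_offset, ramp_width=3.0):
    """Map a signed offset from the contour to a target in [0, 1].

    signed_offset: distance in pixels, negative = left of the contour,
                   positive = right of the contour, 0 = on the contour.
    ramp_width:    hypothetical scale controlling how quickly the target
                   saturates; chosen here only for illustration.
    """
    offset = np.asarray(signed_offset, dtype=float)
    # Logistic curve: near 0 far to the left, 0.5 on the contour,
    # near 1 far to the right, and smooth in between.
    return 1.0 / (1.0 + np.exp(-offset / ramp_width))

offsets = np.array([-10, -3, 0, 3, 10])
targets = smooth_target(offsets)
```

Any monotonic, smoothly varying curve that passes through 0.5 at the contour would serve the same purpose.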

An alternative network design we studied also has one output unit, but this unit produces a low value for pixels centered on the contour and a high value for off-contour pixels. This output acts as a feature detector: the unit goes low on recognizing the contour and stays high in non-contour regions.
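A comparable sketch for the feature-detector style of output is given below. The target depends only on the unsigned distance from the contour, so left and right are treated alike. The inverted-Gaussian shape, the name detector_target, and the sigma parameter are assumptions made for illustration.

```python
import numpy as np

def detector_target(distance, sigma=1.5):
    """Target for a feature-detector output unit.

    distance: distance in pixels from the contour (sign is ignored).
    sigma:    hypothetical width of the low-valued region around the contour.
    Returns 0 on the contour, rising toward 1 in non-contour regions.
    """
    d = np.asarray(distance, dtype=float)
    return 1.0 - np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```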

Neural Net Design Issues - Training

Our initial experiments demonstrated that the smooth-evaluation-function output unit works well when following a gradient, or ramp edge (for example, see Figures 3 and 5). Unfortunately, this output scheme is unworkable when the network must learn to follow a thin line rather than a gradient. When following a line, the local neighborhoods to the left and to the right of the line are similar, yet they are expected to produce different outputs (low on the left, high on the right). The desired mapping from neighborhood to output is then no longer a function, since nearly identical inputs would require different outputs, and thus it cannot be learned.
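The conflict can be illustrated with a simplified one-dimensional example. This is a sketch under assumed sizes and intensities, taking the extreme case in which the input window is too small to include the line itself. For a ramp edge, windows to the left and right of the contour differ, so different targets are consistent. For a thin line, the two windows are identical, yet the smooth evaluation scheme would ask for a low output on one and a high output on the other.

```python
import numpy as np

def window(profile, center, radius):
    """Return the 1-D neighborhood of `profile` around index `center`."""
    return profile[center - radius : center + radius + 1]

n, c, radius = 21, 10, 2   # profile length, contour position, window radius

# Ramp edge: intensity rises from dark (0) to bright (1) around index c.
ramp = np.clip((np.arange(n) - c) / 4.0 + 0.5, 0.0, 1.0)

# Thin line: a single bright pixel at index c on a dark background.
line = np.zeros(n)
line[c] = 1.0

left, right = c - 4, c + 4  # one position on each side of the contour

ramp_same = np.array_equal(window(ramp, left, radius), window(ramp, right, radius))
line_same = np.array_equal(window(line, left, radius), window(line, right, radius))

print("ramp edge: left and right inputs identical?", ramp_same)
print("thin line: left and right inputs identical?", line_same)
```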

Positive exemplars of the contour are easy to derive for training, given an established contour in the image. At each point of the established contour, the true extension of the curve for several pixels ahead is known and can be added to the training set. A key issue, though, is the generation and spacing of negative exemplars. The set of possible extensions considered at each point must be checked against the known contour, and appropriate non-contour training values assigned to those that leave it.
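One possible way to generate both kinds of exemplars from a known contour is sketched below. The function names, the lookahead and min_spacing parameters, the square candidate region, and the use of Chebyshev distance are all assumptions made for illustration; the text does not specify how negative exemplars were actually chosen or spaced.

```python
def chebyshev(p, q):
    """Chebyshev (chessboard) distance between two (row, col) pixels."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def make_exemplars(contour, lookahead=3, min_spacing=2):
    """Build (current_point, candidate_point) pairs from a known contour.

    contour:     ordered list of (row, col) pixels along the established contour.
    lookahead:   hypothetical number of pixels ahead treated as true extensions.
    min_spacing: hypothetical minimum distance a negative candidate must keep
                 from every contour pixel.
    """
    positives, negatives = [], []
    for i, here in enumerate(contour[:-1]):
        # Positive exemplars: the known continuation of the curve.
        ahead = contour[i + 1 : i + 1 + lookahead]
        positives.extend((here, p) for p in ahead)

        # Negative exemplars: nearby candidates that stay clear of the contour.
        r, c = here
        for dr in range(-lookahead, lookahead + 1):
            for dc in range(-lookahead, lookahead + 1):
                cand = (r + dr, c + dc)
                if all(chebyshev(cand, p) >= min_spacing for p in contour):
                    negatives.append((here, cand))
    return positives, negatives

# Usage: a straight horizontal contour along row 5.
contour = [(5, col) for col in range(10)]
positives, negatives = make_exemplars(contour)
```

Raising min_spacing leaves a wider unlabeled band around the contour, which is one way to control how sharply the on-contour and off-contour training values are separated.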

Neural Net Design Issues - Input Representations

There are a variety of options for representing the input pixel space:

raw pixel values
filtered inputs (Laplacian, Sobel, ...)
masks based on neurologically-inspired models (center-surround, directed gradient, ...)
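To make these options concrete, the sketch below builds a network input vector from a pixel neighborhood either as raw values or after applying a 3x3 mask. The Laplacian and Sobel kernels are the standard ones. The CENTER_SURROUND mask is only a crude stand-in for a neurologically-inspired model, and the helper names correlate3x3 and input_vector are hypothetical.

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

# Assumed center-surround mask: center pixel minus the mean of its 8 neighbors.
CENTER_SURROUND = np.full((3, 3), -1.0 / 8.0)
CENTER_SURROUND[1, 1] = 1.0

def correlate3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation; output shape is (H - 2, W - 2)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * image[dr : dr + h - 2, dc : dc + w - 2]
    return out

def input_vector(image, row, col, radius=2, kernel=None):
    """Flattened (2*radius + 1)^2 neighborhood around (row, col).

    kernel=None gives raw pixel values; otherwise the neighborhood is
    filtered with the given 3x3 mask first. The pixel must lie at least
    radius + 1 pixels from the image border.
    """
    # Take one extra pixel of margin so the filtered patch keeps full size.
    patch = image[row - radius - 1 : row + radius + 2,
                  col - radius - 1 : col + radius + 2].astype(float)
    if kernel is None:
        return patch[1:-1, 1:-1].ravel()
    return correlate3x3(patch, kernel).ravel()

# Usage: a horizontal intensity ramp whose value equals the column index.
image = np.tile(np.arange(12.0), (12, 1))
raw = input_vector(image, 6, 6)
sobel = input_vector(image, 6, 6, kernel=SOBEL_X)
lap = input_vector(image, 6, 6, kernel=LAPLACIAN)
cs = input_vector(image, 6, 6, kernel=CENTER_SURROUND)
```

On this linear ramp the Sobel mask responds uniformly to the gradient, while the Laplacian and center-surround masks respond with zero, which shows how the choice of preprocessing determines what the network sees.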

Since neural nets can automatically extract high-order moments from the data, it may seem best to feed raw neighborhood pixel data directly into the network and let it learn its own best model. This is the strategy used by the path-following system ALVINN [3].

However, in our application, efficient learning is also an issue, since one goal of our systems is to keep pace with human operators. Appropriate preprocessing of inputs should be able to accelerate the contour learning.

stewart crawford-hines