Technology

The key to using neural networks effectively is to follow good data modeling practices. The basic steps include preparing data, selecting input variables, and training the neural network. After a network is trained, it is saved as a file so that it can later be used, either by the product that created the network or by a custom application. Because a neural network learns from data, there must be sufficient data available for training, and the data must be representative of the entire problem space. In other words, the training data must contain examples of relationships that span the range of inputs the neural network might encounter once it is placed into service.

Note that the training data does NOT have to contain every possible set of relationships (in theory, an infinite number of possibilities). A "good" neural network learns from training data how to generalize; it can then produce an appropriate output when new data that was not used in training is processed. However, training data does have to be diverse - each input/output pair should represent a different area of the problem space.

Preparing Data

The first step in developing empirical models is to obtain and prepare data. Typically this involves ensuring that there are no missing values, and then plotting data to get a quick view of possible trends and outliers.

During the data analysis phase of Predict Engine processing, missing field values are replaced by an appropriate value, depending on whether the field is numeric or alphanumeric. In addition, fields that would not be useful for modeling (e.g., sequence numbers that increment by a fixed value) are eliminated from further consideration.
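The kind of preparation described above can be sketched in a few lines of Python. This is only an illustration, not the Predict Engine's actual procedure: the function names, the choice of the column mean for numeric fields, and the choice of the most frequent value for alphanumeric fields are all assumptions, since the text does not say which replacement values are used.

```python
from collections import Counter
from statistics import mean

def is_fixed_increment(values):
    """True if consecutive values always differ by the same nonzero step."""
    steps = {round(b - a, 9) for a, b in zip(values, values[1:])}
    return len(steps) == 1 and 0 not in steps

def prepare(columns):
    """Fill missing values (None) and drop sequence-number style fields.

    `columns` maps a field name to its list of values (hypothetical layout).
    """
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        numeric = all(isinstance(v, (int, float)) for v in present)
        # A complete numeric field that changes by a fixed step carries no
        # modeling information, so it is left out.
        if numeric and len(present) == len(values) and is_fixed_increment(values):
            continue
        if numeric:
            fill = mean(present)                          # assumed policy: column mean
        else:
            fill = Counter(present).most_common(1)[0][0]  # assumed policy: most frequent value
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

example = {
    "id": [1, 2, 3, 4],                  # sequence number, will be dropped
    "temp": [20.0, None, 22.0, 24.0],    # numeric field with a gap
    "grade": ["a", None, "a", "b"],      # alphanumeric field with a gap
}
cleaned = prepare(example)
```

With the example data, the "id" field is eliminated, the gap in "temp" is filled with 22.0, and the gap in "grade" is filled with "a".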

Although NeuralSight is based on the Predict Engine, NeuralSight pre-processes data files and ignores any records which contain missing or invalid field values.

When using Professional II/PLUS to develop neural networks, any required dataset preparation must be performed before using the dataset for training.

Data Noise

Most real-world systems yield data with varying degrees of noise, which may be the result of inefficiencies in physical systems, or the result of widely varying preferences and beliefs related to human influences. The Predict Engine implements two fundamental learning rules; the level of noise in data that you specify determines which rule is used during neural network training. For most modeling, the learning rule is an adaptive gradient rule. However, if you specify Very Noisy data, a Kalman Filter rule is employed. These options are also available in NeuralSight.

When Professional II/PLUS is used to create a neural network, there are several learning rules available, depending on the type of neural network you choose to construct.

Data Transformations

The fundamental goal of empirical modeling is to map input values to output values. In situations which require supervised learning, a better mapping can be achieved by mathematically transforming raw values in ways that result in better matches between input value distributions and output value distributions. The Predict Engine incorporates a variety of transformations, such as log, exponent, square, square root, and others, that are applied to raw values in order to obtain a better correlation between input value distributions and output value distributions. The number of transformations applied depends on the option level selected. Data transformation options are also selectable in NeuralSight.
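As a rough illustration of the idea, the sketch below tries a handful of transformations on one input field and keeps the one whose transformed values correlate best with the output. The set of transformations, the use of Pearson correlation as the score, and the names pearson, TRANSFORMS, and best_transform are assumptions made for the example; the text does not describe how the Predict Engine scores its transformations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

TRANSFORMS = {
    "identity": lambda x: x,
    "log": math.log,          # only defined for x > 0
    "sqrt": math.sqrt,        # only defined for x >= 0
    "square": lambda x: x * x,
    "exp": math.exp,
}

def best_transform(inputs, outputs):
    """Return the best-scoring transform name and all scores."""
    scores = {}
    for name, fn in TRANSFORMS.items():
        try:
            transformed = [fn(x) for x in inputs]
        except (ValueError, OverflowError):
            continue          # transform is not usable on this data
        scores[name] = abs(pearson(transformed, outputs))
    return max(scores, key=scores.get), scores

# The output grows with the logarithm of the input, so "log" should win.
name, scores = best_transform([1, 2, 4, 8, 16, 32], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
```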

When Professional II/PLUS is used to create a neural network, any data transformations that might be appropriate must be applied to data before it is used to train models.

Selecting Input Variables

A critical element of any empirical modeling is the choice of inputs for the model. Very often there are inputs which are not fundamentally relevant, or which may even be detrimental to model performance (i.e., the inputs act as noise). The Predict Engine employs a genetic algorithm (GA) optimizer to identify the inputs which are most likely to produce the best model. Essentially, the GA explores different combinations of inputs and the effect they have on model performance (either a linear regression model or a limited neural network model serves as the fitness function). The inputs that occur most frequently in the best models are the ones ultimately used to train the neural network. While the Predict Engine defaults to using the GA for variable selection, the feature can be turned off (in Predict and in NeuralSight) to permit performance comparisons with neural networks created using all available inputs.
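The following sketch shows the general shape of GA-based input selection with a linear regression model as the fitness function. It is not the Predict Engine's optimizer: the population size, one-point crossover, mutation rate, holdout split, and the names fit_linear, holdout_error, and select_inputs are all assumptions chosen to keep the example small.

```python
import random

def fit_linear(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]; a tiny ridge term avoids singular systems.
    A = [[sum(x[i] * x[j] for x in X) + (1e-9 if i == j else 0.0) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(k)]
    for col in range(k):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def holdout_error(mask, X, y, split):
    """Fit on the first `split` rows; return mean squared error on the rest."""
    pick = lambda row: [v for v, keep in zip(row, mask) if keep]
    w = fit_linear([pick(r) for r in X[:split]], y[:split])
    errs = [(w[0] + sum(wi * v for wi, v in zip(w[1:], pick(r))) - t) ** 2
            for r, t in zip(X[split:], y[split:])]
    return sum(errs) / len(errs)

def select_inputs(X, y, pop_size=20, generations=15, top=5, seed=0):
    rng = random.Random(seed)
    n, split = len(X[0]), len(X) // 2
    score = lambda mask: holdout_error(mask, X, y, split)
    # Each individual is a list of booleans: True means "use this input".
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[:pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # occasional bit-flip mutation
                i = rng.randrange(n)
                child[i] = not child[i]
            children.append(child)
        pop = parents + children
    best = sorted(pop, key=score)[:top]
    # Keep the inputs that appear in more than half of the best models.
    return [i for i in range(n) if sum(m[i] for m in best) > top / 2]

# Synthetic data: only inputs 0 and 1 influence the output; 2-4 are noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [3 * r[0] - 2 * r[1] + data_rng.gauss(0, 0.1) for r in X]
selected = select_inputs(X, y)
```

Scoring each candidate on held-out rows rather than on the rows used for fitting is a design choice in this sketch: it keeps irrelevant inputs from looking useful simply because they let the regression fit noise.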

When Professional II/PLUS is used to create a neural network, appropriate inputs for the neural network must be identified manually in what is essentially a trial-and-error process.

Training the Neural Network

Neural networks are trained to learn patterns and relationships in data. There are two fundamental types of training: supervised and unsupervised.

In unsupervised training, the relationships in data are not known, so the neural network identifies relationships through some type of metric - typically a distance metric. Unsupervised training is also referred to as competitive learning, since processing elements in the network "compete" to win (i.e., if the metric is a distance metric, the PE which is closest to the input data record is the winner). Each training record consists of record attributes - there are no output fields. The NeuralWorks Predict Engine implements a Self-Organizing Map for clustering problems which require unsupervised training.

In supervised training, historical data provide a set of input-output pairs which reflect the relationships between inputs and outputs. Each record in the training dataset consists of a set of inputs and the associated output (or outputs). The NeuralWorks Predict Engine implements a feed-forward neural network and back-propagation for problems which require supervised training (prediction or classification). Depending on the type of problem, the Predict Engine chooses an appropriate internal objective function to optimize. For prediction problems, the objective function is RMS Error. For two-class classification problems, the objective function is based on a Bernoulli distribution. For multi-class classification problems, the objective function is Relative Entropy.
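For reference, the three objective functions named above can be written in their common textbook forms as follows. The text does not give the exact formulas used inside the Predict Engine, so the function names and the precise definitions here (including the small eps guard against log(0)) are assumptions.

```python
import math

def rms_error(targets, outputs):
    """Root-mean-square error, for prediction problems."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets))

def bernoulli_nll(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under a Bernoulli model
    (also known as binary cross-entropy), for two-class problems."""
    total = sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for t, p in zip(labels, probs))
    return -total / len(labels)

def relative_entropy(target_dists, output_dists, eps=1e-12):
    """Mean relative entropy (Kullback-Leibler divergence) between target and
    predicted class distributions, for multi-class problems."""
    total = 0.0
    for t_row, p_row in zip(target_dists, output_dists):
        total += sum(t * math.log(t / max(p, eps)) for t, p in zip(t_row, p_row) if t > 0)
    return total / len(target_dists)

rms = rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
nll = bernoulli_nll([1, 0], [0.5, 0.5])
kl = relative_entropy([[1.0, 0.0, 0.0]], [[0.5, 0.25, 0.25]])
```

All three return 0 for a perfect prediction and grow as outputs move away from the targets, which is what makes them usable as quantities to minimize during training.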

When Professional II/PLUS is used to create a neural network, you specify the type of network, the type of training, and an objective function.

Network Architecture

The architecture of a neural network is defined by its number of hidden layers and hidden units (also known as processing elements, or PEs).

A typical feed-forward neural network consists of an input layer, a small number (typically 1 to 4) of hidden layers, each with some number of hidden unit processing elements, and an output layer. The input layer connects to the available input data; the output layer produces a value that represents the output of the neural network - which can be used as the basis for decisions. Hidden unit processing elements contain differentiable non-linear functions which are the basis for learning.
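A minimal sketch of how a signal passes through such a network is shown below. The use of tanh as the differentiable non-linear function, the linear output layer, and the names layer and forward are assumptions for illustration; the weights are arbitrary values, not a trained model.

```python
import math

def layer(inputs, weights, biases, activation):
    """One layer: each unit applies its activation to a weighted sum plus bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_layers, output_layer):
    """Pass inputs through each hidden layer in turn, then the output layer."""
    signal = inputs
    for weights, biases in hidden_layers:
        signal = layer(signal, weights, biases, math.tanh)   # non-linear hidden units
    weights, biases = output_layer
    return layer(signal, weights, biases, lambda v: v)       # linear output unit(s)

# Two inputs, one hidden layer with two units, one output.
hidden = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
output = ([[2.0, 1.0]], [0.0])
result = forward([1.0, 1.0], hidden, output)
```

Here the first hidden unit sees a weighted sum of 0 and the second a weighted sum of 1, so the single output equals tanh(1), roughly 0.76.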

The need to specify the number of hidden layers and the number of hidden units per layer makes traditional neural networks somewhat difficult to apply with first-generation tools. The Predict Engine eliminates the need to specify the number of hidden layers and hidden unit processing elements in a feed-forward neural network through use of a technique named Cascade Correlation, developed by Scott Fahlman at Carnegie Mellon University. In the initial stages of Cascade Correlation training, the neural network has no hidden units - inputs are connected to outputs directly (with weights on each connection). As training progresses (i.e., as weights are adjusted), the performance of the network is evaluated periodically, and if performance is not satisfactory, hidden units are added and new connections are established. In this way, the network architecture is dynamically constructed, with hidden units added only if they improve performance.
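The constructive idea can be illustrated with a deliberately simplified sketch. It differs from the published Cascade Correlation algorithm in important ways, all of which are assumptions made to keep the example short: candidate hidden units are drawn from a random pool rather than trained by gradient ascent, output weights are refit by least squares rather than by an iterative learning rule, and the stopping rule is a simple RMS target. The names least_squares and train_cascade are hypothetical.

```python
import math
import random

def least_squares(rows, targets):
    """Least-squares weights via the normal equations (rows already hold a bias term)."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def train_cascade(X, y, max_hidden=5, pool=20, target_rms=0.05, seed=0):
    rng = random.Random(seed)
    feats = [[1.0] + list(r) for r in X]       # bias + inputs; hidden outputs are appended later
    hidden, history = [], []
    while True:
        w = least_squares(feats, y)            # (re)train the output weights
        resid = [sum(wi * f for wi, f in zip(w, row)) - t for row, t in zip(feats, y)]
        rms = math.sqrt(sum(e * e for e in resid) / len(resid))
        history.append(rms)
        if rms <= target_rms or len(hidden) == max_hidden:
            return hidden, w, history          # performance is satisfactory, or size limit reached
        # Each candidate sees the inputs and every earlier hidden unit (the "cascade").
        mean_e = sum(resid) / len(resid)
        best_score = -1.0
        for _ in range(pool):
            cand = [rng.uniform(-2, 2) for _ in feats[0]]
            outs = [math.tanh(sum(c * f for c, f in zip(cand, row))) for row in feats]
            mean_o = sum(outs) / len(outs)
            # Score: magnitude of the covariance between unit output and residual error.
            score = abs(sum((o - mean_o) * (e - mean_e) for o, e in zip(outs, resid)))
            if score > best_score:
                best, best_score, best_outs = cand, score, outs
        hidden.append(best)                    # the new unit's input weights stay frozen
        for row, o in zip(feats, best_outs):
            row.append(o)

# A target that no purely linear model can fit: the product of the two inputs.
data_rng = random.Random(2)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(80)]
y = [a * b for a, b in X]
hidden, weights, history = train_cascade(X, y)
```

The history list records the training RMS error before each unit is added, so it shows how the error changes as the network grows from no hidden units to its final size.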

A typical Self-Organizing Map (SOM) neural network consists of an input layer that is fully connected to the Kohonen layer, which contains processing elements. The processing units usually comprise a two-dimensional plane, although the Predict Engine supports creating SOMs with up to 4 conceptual dimensions. Before training can commence, you must specify the number of dimensions and the number of processing elements in each dimension. During training the weights of the processing element which is closest to the training data record currently being processed are adjusted. In the classical Self-Organizing Map implementation, training terminates after a set number of iterations through the training dataset.
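A small two-dimensional Self-Organizing Map along these lines might look like the sketch below. The learning-rate decay, the Gaussian neighborhood that also nudges grid neighbors of the winner, and the names winner and train_som (including the optional init argument) are assumptions for illustration; they are not taken from the Predict Engine.

```python
import math
import random

def winner(weights, record):
    """Grid position of the processing element closest to the record."""
    return min(weights, key=lambda pos: math.dist(weights[pos], record))

def train_som(records, rows, cols, iterations=20, rate=0.5, radius=1.0,
              init=None, seed=0):
    rng = random.Random(seed)
    dim = len(records[0])
    if init:                                    # optional starting weights, keyed by grid position
        weights = {pos: list(w) for pos, w in init.items()}
    else:
        weights = {(r, c): [rng.random() for _ in range(dim)]
                   for r in range(rows) for c in range(cols)}
    for it in range(iterations):                # fixed number of passes through the dataset
        lr = rate * (1 - it / iterations)       # learning rate decays toward zero
        for rec in records:
            win = winner(weights, rec)
            for pos, w in weights.items():
                d = math.dist(pos, win)         # distance on the grid, not in data space
                if d <= radius:
                    pull = lr * math.exp(-d * d / 2)   # the winner itself gets the full rate
                    for i in range(dim):
                        w[i] += pull * (rec[i] - w[i])
    return weights

# Two loose clusters of records mapped onto a 3 x 3 grid.
data_rng = random.Random(3)
records = ([[data_rng.gauss(0.2, 0.05), data_rng.gauss(0.2, 0.05)] for _ in range(20)]
           + [[data_rng.gauss(0.8, 0.05), data_rng.gauss(0.8, 0.05)] for _ in range(20)])
som = train_som(records, rows=3, cols=3)
cluster_of_first = winner(som, records[0])
```

After training, calling winner on a new record returns the grid position of its closest processing element, which serves as the record's cluster assignment.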

When you employ Professional II/PLUS to create a feed-forward neural network, the number of hidden layers and the number of hidden units in each layer must be specified before network training can commence. While this permits very fine-grained control over network architectures and learning rules, the overall process can be time-consuming and error-prone. If you use Professional II/PLUS to create a SOM, you must also specify the number of dimensions and the number of processing elements in each dimension.