Multilayer Perceptron

Download Example XORMultilayerPerceptron C# project

Multilayer Perceptron - Creation

A multilayer perceptron is a feedforward neural network that is trained through supervised learning and is used to classify inputs into appropriate outputs. It is best thought of as a network of perceptrons separated into layers, with each perceptron connected to all perceptrons in both neighboring layers. Perceptrons in the network, with the exception of the input layer, use a nonlinear activation function, which allows the multilayer perceptron to solve problems where the data is not linearly separable, such as the XOR problem in the provided sample code.

A multilayer perceptron network must contain at minimum 3 layers: a single input layer, at least one hidden layer, and an output layer. The number of perceptrons in the input layer should match the number of inputs, and the number of perceptrons in the output layer should match the number of desired outputs. The number of hidden layers and the number of perceptrons they contain vary based on the complexity of the task and the separability of the data. The weights between the perceptrons are randomized at creation time.
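
As a rough illustration, the sketch below (not the code from the downloadable project) builds the weight structure for a small network and randomizes it; the 2-3-2 layer sizes match the example used later in this article, while the weight range is an arbitrary choice:

using System;

class NetworkCreationSketch
{
    static void Main()
    {
        // Perceptron counts per layer: 2 inputs, 3 hidden, 2 outputs
        int[] layerSizes = { 2, 3, 2 };
        var random = new Random();

        // weights[l][i][j] = weight of the link from perceptron i in layer l
        // to perceptron j in layer l + 1, randomized at creation time
        double[][][] weights = new double[layerSizes.Length - 1][][];
        for (int l = 0; l < weights.Length; l++)
        {
            weights[l] = new double[layerSizes[l]][];
            for (int i = 0; i < layerSizes[l]; i++)
            {
                weights[l][i] = new double[layerSizes[l + 1]];
                for (int j = 0; j < layerSizes[l + 1]; j++)
                    weights[l][i][j] = random.NextDouble() * 2 - 1;   // roughly -1..1
            }
        }

        Console.WriteLine($"Created {weights.Length} weight matrices for a 2-3-2 network.");
    }
}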

Before the network can be used it needs to be trained through a series of steps:

  1. Forward Propagation producing an output
  2. Backpropagation of error from the Output Layer to the Input Layer
  3. Weight Adjustment from Input Layer to the Output Layer

This article demonstrates the training process on a relatively simple 3-layer perceptron network with 2 input layer perceptrons, 3 hidden layer perceptrons, and 2 output layer perceptrons.
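
In code, these three steps are repeated for every training pattern until the network error becomes acceptably small. The sketch below only outlines that loop; the interface and method names are illustrative and do not come from the sample project:

// Hypothetical interface capturing the three training steps
interface IMultilayerPerceptron
{
    double[] ForwardPropagate(double[] inputs);     // 1. produce an output
    void BackPropagate(double[] desiredOutputs);    // 2. propagate the error backward
    void AdjustWeights();                           // 3. adjust the weights
}

static class TrainingSketch
{
    // One pass (epoch) over a set of input/desired-output pairs
    public static void TrainEpoch(IMultilayerPerceptron network, double[][] inputs, double[][] desiredOutputs)
    {
        for (int i = 0; i < inputs.Length; i++)
        {
            network.ForwardPropagate(inputs[i]);
            network.BackPropagate(desiredOutputs[i]);
            network.AdjustWeights();
        }
    }
}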

NOTE: The math below contains rounding errors that were left in for ease of validation.

Forward Propagation

Multilayer Perceptron - Forward Propagation
Forward propagation in a multilayer perceptron network is a relatively simple procedure that produces the network output from given inputs. In forward propagation the outputs of the previous layer are weighted, summed up, and set as the input of each perceptron in the current layer.

In this example the inputs are set to

Input(I1) = 10, Input(I2) = 20

The outputs of the Input Layer perceptrons are always the same values as the Inputs

Output(I1) = Input(I1), Output(I2) = Input(I2)

The input for the H1 perceptron in the Hidden Layer is calculated by summing the weighted outputs of the Input Layer

Input(H1) = Output(I1) * Weight(I1->H1) + Output(I2) * Weight(I2->H1)
Input(H1) = 10 * 0.3 + 20 * 0.2 = 7

The output of H1 is calculated using the selected activation function, in our case the sigmoid

Output(H1) = 1 / ( 1 + Exp( -Input(H1) ) )
Output(H1) = 1 / ( 1 + Exp( -7 ) ) = 0.999

This action is repeated forward for every layer including the output layer. Output values calculated in the Output Layer are considered the perceptron network outputs.

Input(H2) = 10 * -0.1 + 20 * -0.2 = -5
Output(H2) = 1 / ( 1 + Exp( -(-5) ) ) = 0.007

Input(H3) = 10 * 1.1 + 20 * -0.5 = 1
Output(H3) = 1 / ( 1 + Exp( -(1) ) ) = 0.731

Input(O1) = 0.999 * 1.1 + 0.007 * 0.5 + 0.731 * 0.7 = 1.614
Output(O1) = 1 / ( 1 + Exp( -(1.614) ) ) = 0.834

Input(O2) = 0.999 * -0.4 + 0.007 * 0.3 + 0.731 * 0.2 = -0.251
Output(O2) = 1 / ( 1 + Exp( -(-0.251) ) ) = 0.437
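
The short program below (a standalone sketch, not the project code) reproduces this forward pass with the same weights, so the intermediate values can be checked against the numbers above:

using System;

class ForwardPropagationSketch
{
    // Sigmoid activation function
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        double[] inputs = { 10, 20 };

        // rows: I1, I2; columns: H1, H2, H3 (weights from the example above)
        double[,] inputToHidden = { { 0.3, -0.1, 1.1 }, { 0.2, -0.2, -0.5 } };
        // rows: H1, H2, H3; columns: O1, O2 (weights from the example above)
        double[,] hiddenToOutput = { { 1.1, -0.4 }, { 0.5, 0.3 }, { 0.7, 0.2 } };

        // Hidden Layer: weighted sum of the Input Layer outputs, then sigmoid
        double[] hidden = new double[3];
        for (int h = 0; h < 3; h++)
        {
            double sum = 0;
            for (int i = 0; i < 2; i++)
                sum += inputs[i] * inputToHidden[i, h];
            hidden[h] = Sigmoid(sum);
        }

        // Output Layer: the same procedure applied to the Hidden Layer outputs
        double[] outputs = new double[2];
        for (int o = 0; o < 2; o++)
        {
            double sum = 0;
            for (int h = 0; h < 3; h++)
                sum += hidden[h] * hiddenToOutput[h, o];
            outputs[o] = Sigmoid(sum);
        }

        Console.WriteLine($"H1={hidden[0]:F3} H2={hidden[1]:F3} H3={hidden[2]:F3}");   // 0.999 0.007 0.731
        Console.WriteLine($"O1={outputs[0]:F3} O2={outputs[1]:F3}");                   // 0.834 0.437
    }
}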

Backpropagation

Multilayer Perceptron - Backward Propagation
Backpropagation is used in supervised learning of the multilayer perceptron network. It is specifically used to propagate the error between the calculated and the desired outputs backward throughout the network. The errors are calculated starting from the Output Layer and working backward through the Hidden Layers until the Input Layer is reached. The Input Layer itself is excluded from the error calculation.

In our example the network outputs are

Output(O1) = 0.834
Output(O2) = 0.437

The desired outputs used for learning are set to

DesiredOutput(O1) = 1
DesiredOutput(O2) = 0

The Error of each perceptron is its ErrorTerm multiplied by the derivative of the sigmoid function:

Error = ErrorTerm * ( Output * ( 1 - Output ) )

The ErrorTerm of an Output Layer perceptron is calculated via

ErrorTerm = DesiredOutput - Output

The ErrorTerm of a Hidden Layer perceptron is calculated by weighting the errors of the next (higher) layer and summing them up.

For example, the Error of Output Layer perceptron O1 is calculated by

ErrorTerm(O1) = DesiredOutput(O1) - Output(O1)
ErrorTerm(O1) = 1 - 0.834 = 0.166

Error(O1) = ErrorTerm(O1) * ( Output(O1) * ( 1 - Output(O1) ) )
Error(O1) = 0.166 * ( 0.834 * ( 1 - 0.834 ) ) = 0.02299

The Error of Hidden Layer perceptron H1 is calculated in the same two steps. The Error(O2) = -0.10766 used below is obtained just like Error(O1), starting from ErrorTerm(O2) = 0 - 0.437 = -0.437.

ErrorTerm(H1) = Error(O1) * Weight(H1->O1) + Error(O2) * Weight(H1->O2)
ErrorTerm(H1) = 0.02299 * 1.1 + -0.10766 * -0.4 = 0.06835

Error(H1) = ErrorTerm(H1) * ( Output(H1) * ( 1 - Output(H1) ) )
Error(H1) = 0.06835 * ( 0.999 * ( 1 - 0.999 ) ) = 0.00006
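
The same calculations can be written compactly in code. The sketch below reuses the rounded outputs and the hidden-to-output weights from the example, so the printed values match the figures above up to small rounding differences:

using System;

class BackpropagationSketch
{
    // Error = ErrorTerm multiplied by the derivative of the sigmoid at the output
    static double Error(double errorTerm, double output) => errorTerm * output * (1 - output);

    static void Main()
    {
        double[] hiddenOutputs = { 0.999, 0.007, 0.731 };
        double[] outputs = { 0.834, 0.437 };
        double[] desired = { 1.0, 0.0 };

        // rows: H1, H2, H3; columns: O1, O2 (weights from the example above)
        double[,] hiddenToOutput = { { 1.1, -0.4 }, { 0.5, 0.3 }, { 0.7, 0.2 } };

        // Output Layer: ErrorTerm is the difference between desired and actual output
        double[] outputErrors = new double[2];
        for (int o = 0; o < 2; o++)
            outputErrors[o] = Error(desired[o] - outputs[o], outputs[o]);

        // Hidden Layer: ErrorTerm is the weighted sum of the Output Layer errors
        double[] hiddenErrors = new double[3];
        for (int h = 0; h < 3; h++)
        {
            double errorTerm = 0;
            for (int o = 0; o < 2; o++)
                errorTerm += outputErrors[o] * hiddenToOutput[h, o];
            hiddenErrors[h] = Error(errorTerm, hiddenOutputs[h]);
        }

        Console.WriteLine($"Error(O1) = {outputErrors[0]:F5}");   // 0.02298 (0.02299 above)
        Console.WriteLine($"Error(O2) = {outputErrors[1]:F5}");   // -0.10752 (-0.10766 above)
        Console.WriteLine($"Error(H1) = {hiddenErrors[0]:F5}");   // 0.00007 (0.00006 above)
    }
}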

Weight Adjustment

Multilayer Perceptron - Adjust Weights
After the errors have been calculated the network weights can be adjusted by moving from the Input Layer towards the Output Layer. The formula for adjusting the weight between the perceptrons is

Weight = Weight + LearningRate * Error(Output Perceptron) * Output(Input Perceptron)

The learning rate is a scalar that is typically set to a value between 0.2 and 0.8. In this example it is set to 0.2.

To calculate the new weight for a link between Input Layer perceptron I1 and Hidden Layer perceptron H1

Weight(I1->H1) = Weight(I1->H1) + LearningRate * Error(H1) * Output(I1)
Weight(I1->H1) = 0.3 + 0.2 * 0.00006 * 10 = 0.30012

To calculate the new weight for a link between Hidden Layer perceptron H1 and Output Layer perceptron O1

Weight(H1->O1) = Weight(H1->O1) + LearningRate * Error(O1) * Output(H1)
Weight(H1->O1) = 1.1 + 0.2 * 0.02299 * 0.999 = 1.10459
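
Both updates can be verified with a couple of lines of code; the snippet below simply plugs the example's numbers into the weight adjustment formula:

using System;

class WeightAdjustmentSketch
{
    static void Main()
    {
        const double learningRate = 0.2;

        // Weight(I1->H1): old weight 0.3, Error(H1) = 0.00006, Output(I1) = 10
        double weightI1H1 = 0.3 + learningRate * 0.00006 * 10;
        Console.WriteLine(weightI1H1);   // ~0.30012

        // Weight(H1->O1): old weight 1.1, Error(O1) = 0.02299, Output(H1) = 0.999
        double weightH1O1 = 1.1 + learningRate * 0.02299 * 0.999;
        Console.WriteLine(weightH1O1);   // ~1.10459
    }
}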

Adding Momentum

Adding momentum to the weight changes is a simple way to avoid getting stuck in a local minimum, a problem that impedes perceptron learning. This is done by saving a percentage of the last weight change and applying it to the current weight change. This changes the weight adjustment formulas to

WeightChange = LearningRate * Error(Output Perceptron) * Output(Input Perceptron) + Delta
Weight = Weight + WeightChange
Delta = Momentum * WeightChange

Momentum is a heuristic value between 0 and 1.
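
One way to implement this (a sketch; the project's own Weight class may be structured differently) is to let each weight remember its last change:

// Illustrative only - field and method names are not taken from the project
class WeightWithMomentum
{
    public double Value;
    private double delta;                     // Momentum * last WeightChange

    private const double LearningRate = 0.2;  // as in the example above
    private const double Momentum = 0.9;      // heuristic value between 0 and 1, chosen arbitrarily here

    public WeightWithMomentum(double initialValue) { Value = initialValue; }

    public void Adjust(double outputPerceptronError, double inputPerceptronOutput)
    {
        double weightChange = LearningRate * outputPerceptronError * inputPerceptronOutput + delta;
        Value += weightChange;                // Weight = Weight + WeightChange
        delta = Momentum * weightChange;      // carried into the next adjustment
    }
}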

Example Code

Download XORMultilayerPerceptron C# project

The linked project is an implementation of a multilayer perceptron. The project is a bit too large to post in a blog format, but the following are the important classes:

Class       Description
Network     External interface representing the multilayer perceptron
Layer       Represents the layers and is contained by the Network class
Perceptron  Implementation of a sigmoid perceptron and is contained by the Layer class
Weight      Provides the weighted links between the perceptrons
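
Their containment can be pictured roughly as below; the field shapes are guesses made for illustration and do not reflect the project's actual definitions:

using System.Collections.Generic;

// Illustrative containment only - not the project's real class definitions
class Weight     { public Perceptron From; public Perceptron To; public double Value; }
class Perceptron { public List<Weight> InputWeights = new List<Weight>(); public double Output; }
class Layer      { public List<Perceptron> Perceptrons = new List<Perceptron>(); }
class Network    { public List<Layer> Layers = new List<Layer>(); }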

Example usage is located below: