Download Example XORMultilayerPerceptron C# project

The multilayer perceptron is a feed-forward neural network that is trained through supervised learning and is used to classify inputs into appropriate outputs. It is best thought of as a network of perceptrons separated into layers, with each perceptron connected to all perceptrons in both neighboring layers. Perceptrons in the network, with the exception of the input layer, use a nonlinear activation function, which allows the multilayer perceptron to solve problems where the data is not linearly separable – for example the XOR problem in the provided sample code.

A multilayer perceptron network must contain at minimum 3 layers – a single input layer, at least one hidden layer, and an output layer. The number of perceptrons in the input layer should match the number of inputs, and the number of perceptrons in the output layer should match the number of desired outputs. The number of hidden layers and the number of perceptrons they contain will vary based on the complexity of the task and the separability of the data. The weights between the perceptrons are randomized at creation time.
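The layer layout and random weight initialization described above can be sketched as follows. Python is used here only for brevity (the sample project itself is C#), and `build_network` is a hypothetical helper for illustration, not part of the project:

```python
import random

def build_network(*layer_sizes):
    # layer_sizes, e.g. (2, 3, 2): perceptron counts for input,
    # hidden, and output layers.
    # weights[k][i][j] links perceptron i of layer k to perceptron j
    # of layer k+1, randomized at creation time as described above.
    random.seed(42)  # fixed seed only to make the sketch reproducible
    return [
        [[random.uniform(-1.0, 1.0) for _ in range(layer_sizes[k + 1])]
         for _ in range(layer_sizes[k])]
        for k in range(len(layer_sizes) - 1)
    ]

# a 2-3-2 network like the one used in this article's walk-through
weights = build_network(2, 3, 2)
```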

Before the perceptron network can be used it needs to be trained by repeating a series of steps:

- Forward Propagation producing an output
- Backpropagation of error from the Output Layer to the Input Layer
- Weight Adjustment from the Input Layer to the Output Layer

This article demonstrates the training process of a relatively simple 3-layer perceptron network with 2 input layer perceptrons, 3 hidden layer perceptrons, and 2 output layer perceptrons.

**NOTE: Math provided contains rounding errors that were left in for ease of validation**

## Forward Propagation

Forward propagation in a multilayer perceptron network is a relatively simple procedure that produces a network output from given inputs. In forward propagation the outputs from the previous layer are weighted, summed up, and set as the input for each perceptron in the current layer.

In this example the inputs are set to

`Input(I1) = 10, Input(I2) = 20`

Outputs of perceptrons in the Input Layer are always the same value as the Inputs

`Output(I1) = Input(I1), Output(I2) = Input(I2)`

Input for H1 perceptron in the Hidden Layer is calculated by summing the weighted outputs of the input layer

`Input(H1) = Output(I1) * Weight(I1->H1) + Output(I2) * Weight(I2->H1)`

Input(H1) = 10 * 0.3 + 20 * 0.2 = 7

Output for H1 is calculated using a selected activation function, in our case a sigmoid

`Output(H1) = 1 / ( 1 + Exp( -Input(H1) ) )`

Output(H1) = 1 / ( 1 + Exp( -7 ) ) = 0.999

This action is repeated forward for every layer including the output layer. Output values calculated in the Output Layer are considered the perceptron network outputs.

`Input(H2) = 10 * -0.1 + 20 * -0.2 = -5`

Output(H2) = 1 / ( 1 + Exp( -(-5) ) ) = 0.007

`Input(H3) = 10 * 1.1 + 20 * -0.5 = 1`

Output(H3) = 1 / ( 1 + Exp( -(1) ) ) = 0.731

`Input(O1) = 0.999 * 1.1 + 0.007 * 0.5 + 0.731 * 0.7 = 1.614`

Output(O1) = 1 / ( 1 + Exp( -(1.614) ) ) = 0.834

`Input(O2) = 0.999 * -0.4 + 0.007 * 0.3 + 0.731 * 0.2 = -0.251`

Output(O2) = 1 / ( 1 + Exp( -(-0.251) ) ) = 0.437
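The whole forward pass above can be checked numerically. Below is a minimal sketch in Python (the sample project is C#; Python is used here only to verify the arithmetic), using the weight values from the walk-through and the logistic sigmoid `1 / (1 + e^-x)`:

```python
import math

def sigmoid(x):
    # logistic activation: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

# inputs and the example weights from the walk-through above
i1, i2 = 10.0, 20.0
w_hidden = {  # (input, hidden) -> weight
    ("I1", "H1"): 0.3,  ("I2", "H1"): 0.2,
    ("I1", "H2"): -0.1, ("I2", "H2"): -0.2,
    ("I1", "H3"): 1.1,  ("I2", "H3"): -0.5,
}
w_output = {  # (hidden, output) -> weight
    ("H1", "O1"): 1.1,  ("H2", "O1"): 0.5, ("H3", "O1"): 0.7,
    ("H1", "O2"): -0.4, ("H2", "O2"): 0.3, ("H3", "O2"): 0.2,
}

# hidden layer: weighted sum of the input layer outputs, then sigmoid
h = {}
for name in ("H1", "H2", "H3"):
    net = i1 * w_hidden[("I1", name)] + i2 * w_hidden[("I2", name)]
    h[name] = sigmoid(net)

# output layer: weighted sum of the hidden layer outputs, then sigmoid
o = {}
for name in ("O1", "O2"):
    net = sum(h[hn] * w_output[(hn, name)] for hn in h)
    o[name] = sigmoid(net)

print(round(o["O1"], 3), round(o["O2"], 3))
```

Rounded to three decimals this reproduces the walk-through values of 0.834 and 0.437.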

## Backpropagation

Backpropagation is used in supervised learning of the multilayer perceptron network. It is specifically used to propagate the error between the calculated and the desired outputs backward throughout the network. Calculation of errors is performed starting from the Output Layer and working backward through the Hidden Layers until the Input Layer is reached. Input Layer is excluded from error calculation.

In our example network outputs are

`Output(O1) = 0.834`

Output(O2) = 0.437

The desired outputs used for learning are set to

`DesiredOutput(O1) = 1`

DesiredOutput(O2) = 0

The formula for calculating the error incorporates the derivative of the sigmoid function:

`Error = ErrorTerm * ( Output * ( 1 - Output ) )`

The `ErrorTerm` for Output Layer perceptrons is calculated via

`ErrorTerm = DesiredOutput - Output`

The `ErrorTerm` for Hidden Layer perceptrons is calculated by weighting the errors of the higher layer and summing them up.

For example, the `Error` of Output Layer perceptron O1 is calculated by

`ErrorTerm(O1) = DesiredOutput(O1) - Output(O1)`

ErrorTerm(O1) = 1 - 0.834 = 0.166

`Error(O1) = ErrorTerm(O1) * ( Output(O1) * ( 1 - Output(O1) ) )`

Error(O1) = 0.166 * ( 0.834 * ( 1 - 0.834 ) ) = 0.02299

The Hidden Layer `Error` of perceptron H1 is calculated by

`ErrorTerm(H1) = Error(O1) * Weight(H1->O1) + Error(O2) * Weight(H1->O2)`

ErrorTerm(H1) = 0.02299 * 1.1 + -0.10766 * -0.4 = 0.06835

`Error(H1) = ErrorTerm(H1) * ( Output(H1) * ( 1 - Output(H1) ) )`

Error(H1) = 0.06835 * ( 0.999 * ( 1 - 0.999 ) ) = 0.00006
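The error calculations above can likewise be verified numerically. This is a minimal Python sketch (the project is C#; Python is used only for checking the arithmetic), using the forward-pass outputs and the H1→O1 and H1→O2 weights from the example; small differences from the printed values are due to the rounding left in the walk-through:

```python
# outputs from the forward pass in the example above
out = {"H1": 0.999, "H2": 0.007, "H3": 0.731, "O1": 0.834, "O2": 0.437}
desired = {"O1": 1.0, "O2": 0.0}
# hidden -> output weights from the example
w = {("H1", "O1"): 1.1, ("H1", "O2"): -0.4}

# output layer: ErrorTerm is the difference to the desired output,
# then scaled by the sigmoid derivative Output * (1 - Output)
error = {}
for name in ("O1", "O2"):
    term = desired[name] - out[name]
    error[name] = term * out[name] * (1.0 - out[name])

# hidden layer: ErrorTerm is the weighted sum of the next layer's errors
term_h1 = error["O1"] * w[("H1", "O1")] + error["O2"] * w[("H1", "O2")]
error["H1"] = term_h1 * out["H1"] * (1.0 - out["H1"])

print(error["O1"], error["O2"], error["H1"])
```

`error["O1"]` comes out at roughly 0.023, matching the walk-through's 0.02299 up to rounding, and `error["H1"]` is on the order of 0.00007.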

## Weight Adjustment

After the errors have been calculated the network weights can be adjusted by moving from the Input Layer towards the Output Layer. The formula for adjusting the weight between the perceptrons is

`Weight = Weight + LearningRate * Error(Output Perceptron) * Output(Input Perceptron)`

Learning rate is a scalar that is typically set to a value between 0.2 and 0.8. In the example above it is set to 0.2.

To calculate the new weight for a link between Input Layer perceptron I1 and Hidden Layer perceptron H1

`Weight(I1->H1) = Weight(I1->H1) + LearningRate * Error(H1) * Output(I1)`

Weight(I1->H1) = 0.3 + 0.2 * 0.00006 * 10 = 0.30012

To calculate the new weight for a link between Hidden Layer perceptron H1 and Output Layer perceptron O1

`Weight(H1->O1) = Weight(H1->O1) + LearningRate * Error(O1) * Output(H1)`

Weight(H1->O1) = 1.1 + 0.2 * 0.02299 * 0.999 = 1.10459
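Both weight updates above can be checked in a few lines. A minimal Python sketch (the project is C#; Python is used only to verify the arithmetic), with the error and output values taken from the worked example:

```python
learning_rate = 0.2

# values computed earlier in the example
error_h1 = 0.00006   # Error(H1)
error_o1 = 0.02299   # Error(O1)
out_i1 = 10.0        # Output(I1)
out_h1 = 0.999       # Output(H1)

# new weight = old weight + LearningRate * Error(output side) * Output(input side)
w_i1_h1 = 0.3 + learning_rate * error_h1 * out_i1
w_h1_o1 = 1.1 + learning_rate * error_o1 * out_h1

print(round(w_i1_h1, 5), round(w_h1_o1, 5))
```

Rounded to five decimals this reproduces 0.30012 and 1.10459 from the walk-through.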

### Adding Momentum

Adding momentum to the weight changes is a simple way to avoid getting stuck in local minima, a problem that impedes perceptron learning. This is done by saving a fraction of the last weight change and applying it to the current weight change operation. This changes the Weight Adjustment formula to

`WeightChange = LearningRate * Error(Output Perceptron) * Output(Input Perceptron) + Delta`

`Weight = Weight + WeightChange`

`Delta = Momentum * WeightChange`

Momentum is a heuristic value between 0 and 1.
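The momentum update can be sketched as below; Python is used only for illustration (the project is C#), the momentum value of 0.9 is an arbitrary example, and `adjust` is a hypothetical helper. `Delta` carries over a fraction of the previous weight change into the next one:

```python
learning_rate = 0.2
momentum = 0.9   # example value; any value between 0 and 1 works

def adjust(weight, delta, error, output):
    # the current weight change includes a fraction of the previous change
    change = learning_rate * error * output + delta
    return weight + change, momentum * change

# two consecutive updates of Weight(H1->O1), reusing the example's
# Error(O1) and Output(H1) both times purely for illustration
weight, delta = 1.1, 0.0
weight, delta = adjust(weight, delta, 0.02299, 0.999)
weight, delta = adjust(weight, delta, 0.02299, 0.999)
```

The first update matches the plain Weight Adjustment result; the second is larger because the saved `Delta` is added on top.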

## Example Code

Download XORMultilayerPerceptron C# project

The linked code is an implementation of a multilayer perceptron. The project is a bit too large to post in a blog format, but the following are the important classes:

Class | Description |
---|---|
Network | External interface representing the multilayer perceptron |
Layer | Represents layers and is contained by the Network class |
Perceptron | Implementation of a sigmoid perceptron and is contained by the Layer class |
Weight | Provides the weighted links between the perceptrons |

Example usage is located below:

```csharp
static void Main(string[] args)
{
    // create network with 2 inputs in the input layer,
    // 5 perceptrons in the first and 6 in the second hidden layer,
    // and 1 output
    var network = new Network(2, 5, 6, 1);

    var trainingSet = XORTrainingItem.Create();

    // teach the neural network the XOR function until
    // all the inputs are correctly classified
    while (true)
    {
        int errorCount = 0;

        foreach (var item in trainingSet)
        {
            // pass the training item through
            var outputs = network.GetOutputs(item.Inputs);

            // check if the result is not expected
            if (!item.CheckResult(outputs[0]))
            {
                // if the result is not expected, repeat the procedure
                network.Learn(item.Result);
                errorCount++;
            }
        }

        // only quit when there were no unexpected outputs detected
        if (errorCount == 0)
            break;
    }
}
```
static void Main(string[] args) { // create network with 2 inputs in the input layer // 5 perceptrons in first and 6 in second hidden layer // and 1 output var network = new Network(2, 5, 6, 1); var trainingSet = XORTrainingItem.Create(); // teach the neural network XOR function until // all the inputs are correctly clasified while (true) { int errorCount = 0; foreach (var item in trainingSet) { // pass the training item through var outputs = network.GetOutputs(item.Inputs); // check if the result is not expected if (!item.CheckResult(outputs[0])) { // if the result is not expected, repeat the procedure network.Learn(item.Result); errorCount++; } } // only quit when there were no unexpected outputs detected if (errorCount == 0) break; } } |