Backpropagation in Neural Networks - The Engine Behind Deep Learning
Stephen Cheng
Intro
Backpropagation (short for “Backward Propagation of Errors”) is a method used to train artificial neural networks. Its goal is to reduce the difference between the model’s predicted output and the actual output by adjusting the weights and biases in the network. In this article, we will explore what backpropagation is, why it is crucial in machine learning, and how it works.
What is Backpropagation?
Introduced in the 1970s, backpropagation is the algorithm for fine-tuning the weights of a neural network based on the error obtained in the previous iteration, or epoch, and it is the standard way of training artificial neural networks, particularly feed-forward networks. You can think of it as a feedback system: after each round of training, or epoch, the network reviews its performance, calculating the difference between its output and the correct answer, known as the error.
Backpropagation works iteratively, minimizing the cost function by adjusting weights and biases. In each epoch, the model adapts these parameters, reducing loss by following the error gradient. Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to effectively navigate complex layers in the neural network to minimize the cost function.
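To make the chain rule concrete, here is a tiny sketch (with made-up numbers, not taken from any particular network) that computes the gradient of a squared-error loss for a single sigmoid neuron and applies one gradient-descent step:

import numpy as np

# A single sigmoid neuron: prediction = sigmoid(w * x + b).
# All values here are illustrative only.
x, target = 1.5, 0.0
w, b, learning_rate = 0.8, 0.1, 0.5

z = w * x + b                      # weighted input
y = 1 / (1 + np.exp(-z))           # sigmoid activation (prediction)
loss = (y - target) ** 2           # squared error

# Chain rule: dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = 2 * (y - target)
dy_dz = y * (1 - y)                # derivative of the sigmoid
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw
dL_db = dL_dy * dy_dz * 1          # dz/db = 1

# Gradient-descent update: move against the gradient.
w -= learning_rate * dL_dw
b -= learning_rate * dL_db
print(f"loss={loss:.4f}, dL/dw={dL_dw:.4f}, updated w={w:.4f}")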
Why is Backpropagation Important?
Backpropagation plays a critical role in how neural networks improve over time. Here’s why:
Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule, making it possible to update weights efficiently.
Scalability: The backpropagation algorithm scales well to networks with multiple layers and complex architectures, making deep learning feasible.
Automated Learning: With backpropagation, the learning process becomes automated, and the model can adjust itself to optimize its performance.
How Does Backpropagation Work?
Overall, there are four main steps in the backpropagation algorithm:
The Forward Pass
Errors Calculation (The Loss Function)
The Backward Pass
Weights Update (Optimizer/Optimization Algorithm)
Next, let’s walk through each of these steps.
The Forward Pass
In the forward pass, the input data is fed into the input layer. These inputs, combined with their respective weights, are passed to hidden layers. For example, in a network with two hidden layers (h1 and h2), the output from h1 serves as the input to h2. Before applying an activation function, a bias is added to the weighted inputs. Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the input if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex relationships in the data. Finally, the outputs from the last hidden layer are passed to the output layer, where an activation function, such as softmax, converts the weighted outputs into probabilities for classification.
The forward pass is the first step of the backpropagation process, and it’s illustrated below:
The data (inputs X1 and X2) is fed to the input layer.
Then, each input is multiplied by its corresponding weight, and the results are passed to the neurons N1X and N2X of the hidden layers, where X takes the values of 1, 2 and 3.
Those neurons apply an activation function to the weighted inputs they receive, and the result passes to the output layer.
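A minimal Python sketch of this forward pass, with two hidden layers of three neurons each as in the description above, might look like this (the weights, biases, and inputs are assumed purely for illustration):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative inputs X1, X2 and randomly chosen parameters.
x = np.array([0.5, -1.2])                      # input layer
W1, b1 = np.random.randn(2, 3), np.zeros(3)    # input -> hidden layer h1
W2, b2 = np.random.randn(3, 3), np.zeros(3)    # h1 -> hidden layer h2
W3, b3 = np.random.randn(3, 2), np.zeros(2)    # h2 -> output layer

h1 = relu(x @ W1 + b1)         # weighted sum + bias, then ReLU
h2 = relu(h1 @ W2 + b2)        # output of h1 feeds h2
probs = softmax(h2 @ W3 + b3)  # softmax turns scores into probabilities
print(probs)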
Errors Calculation (The Loss Function)
The process continues until the output layer generates the final output (o/p). The output of the network is then compared to the ground truth (desired output), and the difference is calculated, resulting in an error value.
The Backward Pass
This is the actual backpropagation step, and it cannot be performed without the forward pass and error calculation described above. Here is how it works:
The error value obtained previously is used to calculate the gradient of the loss function.
The gradient of the error is propagated back through the network, starting from the output layer to the hidden layers.
As the error gradient propagates back, the weights (represented by the lines connecting the nodes) are updated according to their contribution to the error. This involves taking the derivative of the error with respect to each weight, which indicates how much a change in the weight would change the error.
The learning rate determines the size of the weight updates. A smaller learning rate means that the weights are updated by a smaller amount, and vice versa.
One common method for error calculation is the Mean Squared Error (MSE), the average of the squared differences between predictions and targets:
MSE = (1/n) × Σ (Predicted Output − Actual Output)^2
For a single output, this reduces to the squared difference (Predicted Output − Actual Output)^2.
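As a small illustration (with arbitrary example values), the snippet below computes the MSE and its gradient with respect to the predictions; this gradient is exactly the error signal that the backward pass propagates toward the weights:

import numpy as np

predicted = np.array([0.8, 0.3, 0.9])   # illustrative network outputs
actual    = np.array([1.0, 0.0, 1.0])   # ground-truth targets

mse = np.mean((predicted - actual) ** 2)

# Gradient of the MSE with respect to each prediction;
# backpropagation pushes this signal back through the network.
grad = 2 * (predicted - actual) / len(predicted)

print(f"MSE: {mse:.4f}, gradient: {grad}")
# Each weight is then updated by stepping against its own gradient,
# scaled by the learning rate: w -= learning_rate * dE_dw.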
Weights Update (Optimizer/Optimization Algorithm)
The weights are updated in the opposite direction of the gradient, leading to the name gradient descent. It aims to reduce the error in the next forward pass. This process of forward pass, error calculation, backward pass, and weights update continues for multiple epochs until the network performance reaches a satisfactory level or stops improving significantly. The activation function, through its derivative, plays a crucial role in computing these gradients during backpropagation.
Optimizers are algorithms or methods used to minimize the error (loss) function or, more generally, to make training as efficient as possible. They determine how the weights of the network (and, in some cases, the learning rate) should be changed in order to reduce the loss. There are different types of optimizers, such as Gradient Descent, Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, SGD with Momentum, Adaptive Gradient Descent (AdaGrad), Root Mean Square Propagation (RMSProp), AdaDelta, and Adaptive Moment Estimation (Adam).
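To make the optimizer’s role concrete, here is a brief sketch of two of the update rules listed above, plain SGD and SGD with momentum, applied to a single parameter; the gradient here is a stand-in for whatever backpropagation computes:

def sgd_step(w, grad, lr=0.1):
    # Plain (stochastic) gradient descent: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    # SGD with momentum: keep an exponentially decaying average of
    # past gradients to smooth and accelerate the updates.
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Illustrative usage on the loss f(w) = w**2, whose gradient is 2 * w.
w_sgd, w_mom, v = 5.0, 5.0, 0.0
for _ in range(3):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, v = momentum_step(w_mom, 2 * w_mom, v)
print(w_sgd, w_mom)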
Advantages of Backpropagation
The key benefits of using the backpropagation algorithm are:
Ease of Implementation: Backpropagation is straightforward to implement, since the weight adjustments follow mechanically from the error derivatives, which keeps the programming effort low even for newcomers.
Simplicity and Flexibility: Its straightforward design suits a range of tasks, from basic feedforward to complex convolutional or recurrent networks.
Efficiency: Backpropagation accelerates learning by directly updating weights based on error, especially in deep networks.
Generalization: It helps models generalize well to new data, improving prediction accuracy on unseen examples.
Scalability: The algorithm scales efficiently with larger datasets and more complex networks, making it ideal for large-scale tasks.
Limitations and Challenges
While backpropagation is powerful, it does face some challenges:
Vanishing Gradient Problem: In deep networks, the gradients can become very small during backpropagation, making it difficult for the network to learn. This is common when using activation functions like sigmoid or tanh.
Exploding Gradients: The gradients can also become excessively large, causing the network to diverge during training.
Overfitting: If the network is too complex, it might memorize the training data instead of learning general patterns.
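The vanishing-gradient problem is easy to see numerically: the sigmoid’s derivative never exceeds 0.25, and the chain rule multiplies one such factor per layer, so the gradient shrinks rapidly in deep networks. A toy illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The sigmoid's derivative is largest at z = 0, where it equals 0.25.
z = 0.0
d = sigmoid(z) * (1 - sigmoid(z))

# Backpropagating through 20 sigmoid layers multiplies ~20 such factors.
print(d, d ** 20)   # 0.25 and roughly 9.1e-13 -> the gradient vanishes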
An Example of Backpropagation
Let’s walk through an example of backpropagation in machine learning. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5, and the learning rate is 1.
Here’s how backpropagation is implemented:
The Forward Pass
Initial Calculation
The weighted sum at each node is calculated by multiplying each input by its corresponding weight and summing the results (adding a bias if the node has one): a_j = Σ (w_i × x_i) + b_j.
Sigmoid Function
The sigmoid function, σ(x) = 1 / (1 + e^(−x)), returns a value between 0 and 1, introducing non-linearity into the model.
Computing Outputs
At node h1, the weighted sum a1 is computed from the inputs and their corresponding weights.
Once we have calculated the a1 value, we can find the node’s output by applying the sigmoid: y3 = σ(a1).
Similarly, we find the values of y4 at h2 and y5 at O3.
Errors Calculation (The Loss Function)
Note that the target output is 0.5, but we obtained 0.67. To calculate the error, we can use the squared-error formula given above. Using this error value, we backpropagate.
The Backward Pass
Calculating Gradients
The change in each weight is calculated with the delta rule: Δw = η × δ × x, where η is the learning rate, δ is the error term of the node the weight feeds into, and x is the input flowing through that connection.
Output Unit Error
For O3 (a sigmoid output unit with squared-error loss), the error term is δ5 = y5 × (1 − y5) × (target − y5).
Hidden Unit Error
For h1: δ3 = y3 × (1 − y3) × w(h1→O3) × δ5, where w(h1→O3) denotes the weight connecting h1 to the output node.
For h2: δ4 = y4 × (1 − y4) × w(h2→O3) × δ5.
Weight Updates
Applying this rule to every connection, with the learning rate of 1, gives the updated weights.
Final Forward Pass:
After updating the weights, the forward pass is repeated.
y3 = 0.57
y4 = 0.56
y5 = 0.61
Since y5 = 0.61 is still not the target output, the process of calculating the error and backpropagating continues until the output is close enough to the target. This demonstrates how backpropagation iteratively updates the weights, reducing the error until the network predicts the output accurately.
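The original figure’s initial weight values are not reproduced in this article, but the whole procedure can be written in a few lines of Python. The sketch below uses assumed initial weights (so its intermediate numbers will not match the ones above), while following exactly the same steps: forward pass, error term at the output, error terms at the hidden nodes, and weight updates with a learning rate of 1:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Assumed values purely for illustration; the article's figure uses its own.
x = np.array([0.35, 0.7])            # inputs X1, X2
W1 = np.array([[0.2, 0.3],           # weights input -> (h1, h2)
               [0.2, 0.3]])
W2 = np.array([0.3, 0.9])            # weights (h1, h2) -> O3
target, lr = 0.5, 1.0

for step in range(5):
    # Forward pass
    a_hidden = x @ W1                # a1, a2: weighted sums at h1, h2
    y_hidden = sigmoid(a_hidden)     # y3, y4
    y5 = sigmoid(y_hidden @ W2)      # network output

    # Backward pass (delta rule for sigmoid units, squared error)
    delta5 = y5 * (1 - y5) * (target - y5)
    delta_hidden = y_hidden * (1 - y_hidden) * W2 * delta5

    # Weight updates
    W2 += lr * delta5 * y_hidden
    W1 += lr * np.outer(x, delta_hidden)
    print(f"step {step}: output = {y5:.3f}")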
Implementation in Python
This code demonstrates how backpropagation is used in a neural network to solve the XOR problem. The neural network consists of:
Input layer with 2 inputs,
Hidden layer with 4 neurons,
Output layer with 1 output neuron.
Key steps:
The Forward Pass: The inputs are passed through the network, activating the hidden and output layers using the sigmoid function.
The Backward Pass (Backpropagation): The errors between the predicted and actual outputs are computed. The gradients are calculated using the derivative of the sigmoid function, and weights and biases are updated accordingly.
Training: The network is trained over 10,000 epochs using the backpropagation algorithm with a learning rate of 0.1, progressively reducing the error.
This implementation highlights how backpropagation adjusts weights and biases to minimize the loss and improve predictions over time.
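The excerpt below shows only the train method and the code that builds and runs the network; the rest of the NeuralNetwork class is not reproduced here. A minimal sketch of the missing pieces, consistent with how the class is used below (sigmoid activations, a single hidden layer, and random weight initialization as an assumption), could look like this:

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Assumed initialization: small random weights, zero biases.
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, a):
        # Derivative written in terms of the activation a = sigmoid(z).
        return a * (1 - a)

    def feedforward(self, X):
        # Forward pass: input -> hidden -> output, sigmoid at each layer.
        self.hidden = self.sigmoid(np.dot(X, self.W1) + self.b1)
        self.output = self.sigmoid(np.dot(self.hidden, self.W2) + self.b2)
        return self.output

    def backward(self, X, y, learning_rate):
        # Backward pass: compute deltas, then gradient-descent updates.
        output_delta = (y - self.output) * self.sigmoid_derivative(self.output)
        hidden_delta = np.dot(output_delta, self.W2.T) * self.sigmoid_derivative(self.hidden)
        self.W2 += learning_rate * np.dot(self.hidden.T, output_delta)
        self.b2 += learning_rate * np.sum(output_delta, axis=0, keepdims=True)
        self.W1 += learning_rate * np.dot(X.T, hidden_delta)
        self.b1 += learning_rate * np.sum(hidden_delta, axis=0, keepdims=True)

    # The train method quoted from the article (below) completes the class.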
    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss:{loss}")

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

# Test the trained model
output = nn.feedforward(X)
print("Predictions after training:")
print(output)
Predictions after training:
[[0.02330965]
 [0.95658721]
 [0.95049451]
 [0.05896647]]
Conclusion
Backpropagation is the engine that drives neural network learning. By propagating errors backward and adjusting the weights and biases, neural networks can gradually improve their predictions. Though it has some limitations like vanishing gradients, many techniques, such as using ReLU activation or optimizing learning rates, have been developed to address these issues.