Backpropagation in Neural Networks - The Engine Behind Deep Learning
Stephen Cheng
Intro
Backpropagation (short for “Backward Propagation of Errors”) is a method used to train artificial neural networks. Its goal is to reduce the difference between the model’s predicted output and the actual output by adjusting the weights and biases in the network. In this article, we will explore what backpropagation is, why it is crucial in machine learning, and how it works.
What is Backpropagation?
Introduced in the 1970s, backpropagation is the algorithm for fine-tuning the weights of a neural network based on the error obtained in the previous iteration, or epoch, and it is the standard way of training artificial neural networks, particularly feed-forward networks. You can think of it as a feedback system: after each round of training, or epoch, the network reviews its performance, calculating the difference between its output and the correct answer, known as the error.
Backpropagation works iteratively, minimizing the cost function by adjusting weights and biases. In each epoch, the model adapts these parameters, reducing loss by following the error gradient. Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to effectively navigate complex layers in the neural network to minimize the cost function.
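To make the chain rule concrete, here is a tiny sketch (with made-up numbers, not taken from any particular network) that computes the gradient of a squared-error loss for a single sigmoid neuron and applies one gradient-descent step:

import numpy as np

# A single sigmoid neuron: prediction = sigmoid(w * x + b).
# All values here are illustrative only.
x, target = 1.5, 0.0
w, b, learning_rate = 0.8, 0.1, 0.5

z = w * x + b                      # weighted input
y = 1 / (1 + np.exp(-z))           # sigmoid activation (prediction)
loss = (y - target) ** 2           # squared error

# Chain rule: dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = 2 * (y - target)
dy_dz = y * (1 - y)                # derivative of the sigmoid
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw
dL_db = dL_dy * dy_dz * 1          # dz/db = 1

# Gradient-descent update: move against the gradient.
w -= learning_rate * dL_dw
b -= learning_rate * dL_db
print(f"loss={loss:.4f}, dL/dw={dL_dw:.4f}, updated w={w:.4f}")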
Why is Backpropagation Important?
Backpropagation plays a critical role in how neural networks improve over time. Here’s why:
Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule, making it possible to update weights efficiently.
Scalability: The backpropagation algorithm scales well to networks with multiple layers and complex architectures, making deep learning feasible.
Automated Learning: With backpropagation, the learning process becomes automated, and the model can adjust itself to optimize its performance.
How Does Backpropagation Work?
Overall, there are four main steps in the backpropagation algorithm:
The Forward Pass
Errors Calculation (The Loss Function)
The Backward Pass
Weights Update (Optimizer/Optimization Algorithm)
Next, let’s walk through each of these steps.
The Forward Pass
In the forward pass, the input data is fed into the input layer. These inputs, combined with their respective weights, are passed to hidden layers. For example, in a network with two hidden layers (h1 and h2), the output from h1 serves as the input to h2. Before applying an activation function, a bias is added to the weighted inputs. Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the input if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex relationships in the data. Finally, the outputs from the last hidden layer are passed to the output layer, where an activation function, such as softmax, converts the weighted outputs into probabilities for classification.
The forward pass is the first step of the backpropagation process, and it’s illustrated below:
The data (inputs X1 and X2) is fed to the input layer.
Then, each input is multiplied by its corresponding weight, and the results are passed to the neurons N1X and N2X of the hidden layers, where X takes the values of 1, 2 and 3.
Those neurons apply an activation function to the weighted inputs they receive, and the result passes to the output layer.
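A minimal Python sketch of this forward pass, with two hidden layers of three neurons each as in the description above, might look like this (the weights, biases, and inputs are assumed purely for illustration):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative inputs X1, X2 and randomly chosen parameters.
x = np.array([0.5, -1.2])                      # input layer
W1, b1 = np.random.randn(2, 3), np.zeros(3)    # input -> hidden layer h1
W2, b2 = np.random.randn(3, 3), np.zeros(3)    # h1 -> hidden layer h2
W3, b3 = np.random.randn(3, 2), np.zeros(2)    # h2 -> output layer

h1 = relu(x @ W1 + b1)         # weighted sum + bias, then ReLU
h2 = relu(h1 @ W2 + b2)        # output of h1 feeds h2
probs = softmax(h2 @ W3 + b3)  # softmax turns scores into probabilities
print(probs)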
Errors Calculation (The Loss Function)
The process continues until the output layer generates the final output (o/p). The output of the network is then compared to the ground truth (desired output), and the difference is calculated, resulting in an error value.
The Backward Pass
This is the actual backpropagation step, and it cannot be performed without the forward pass and error calculation described above. Here is how it works:
The error value obtained previously is used to calculate the gradient of the loss function.
The gradient of the error is propagated back through the network, starting from the output layer to the hidden layers.
As the error gradient propagates back, the weights (represented by the lines connecting the nodes) are updated according to their contribution to the error. This involves taking the derivative of the error with respect to each weight, which indicates how much a change in the weight would change the error.
The learning rate determines the size of the weight updates. A smaller learning rate means that the weights are updated by a smaller amount, and vice versa.
One common method for error calculation is the Mean Squared Error (MSE), the average of the squared differences between predictions and targets:
MSE = (1/n) × Σ (Predicted Output − Actual Output)^2
For a single output, this reduces to the squared difference (Predicted Output − Actual Output)^2.
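As a small illustration (with arbitrary example values), the snippet below computes the MSE and its gradient with respect to the predictions; this gradient is exactly the error signal that the backward pass propagates toward the weights:

import numpy as np

predicted = np.array([0.8, 0.3, 0.9])   # illustrative network outputs
actual    = np.array([1.0, 0.0, 1.0])   # ground-truth targets

mse = np.mean((predicted - actual) ** 2)

# Gradient of the MSE with respect to each prediction;
# backpropagation pushes this signal back through the network.
grad = 2 * (predicted - actual) / len(predicted)

print(f"MSE: {mse:.4f}, gradient: {grad}")
# Each weight is then updated by stepping against its own gradient,
# scaled by the learning rate: w -= learning_rate * dE_dw.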
Weights Update (Optimizer/Optimization Algorithm)
The weights are updated in the opposite direction of the gradient, leading to the name gradient descent. It aims to reduce the error in the next forward pass. This process of forward pass, error calculation, backward pass, and weights update continues for multiple epochs until the network performance reaches a satisfactory level or stops improving significantly. The activation function, through its derivative, plays a crucial role in computing these gradients during backpropagation.
Optimizers are algorithms or methods used to minimize the error (loss) function or, more generally, to make training as efficient as possible. They determine how the weights of the network (and, in some cases, the learning rate) should be changed in order to reduce the loss. There are different types of optimizers, such as Gradient Descent, Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, SGD with Momentum, Adaptive Gradient Descent (AdaGrad), Root Mean Square Propagation (RMSProp), AdaDelta, and Adaptive Moment Estimation (Adam).
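To make the optimizer’s role concrete, here is a brief sketch of two of the update rules listed above, plain SGD and SGD with momentum, applied to a single parameter; the gradient here is a stand-in for whatever backpropagation computes:

def sgd_step(w, grad, lr=0.1):
    # Plain (stochastic) gradient descent: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    # SGD with momentum: keep an exponentially decaying average of
    # past gradients to smooth and accelerate the updates.
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Illustrative usage on the loss f(w) = w**2, whose gradient is 2 * w.
w_sgd, w_mom, v = 5.0, 5.0, 0.0
for _ in range(3):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, v = momentum_step(w_mom, 2 * w_mom, v)
print(w_sgd, w_mom)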
Advantages of Backpropagation
The key benefits of using the backpropagation algorithm are:
Ease of Implementation: Backpropagation is straightforward to implement, since the weight adjustments follow mechanically from the error derivatives, which keeps the programming effort low even for newcomers.
Simplicity and Flexibility: Its straightforward design suits a range of tasks, from basic feedforward to complex convolutional or recurrent networks.
Efficiency: Backpropagation accelerates learning by directly updating weights based on error, especially in deep networks.
Generalization: It helps models generalize well to new data, improving prediction accuracy on unseen examples.
Scalability: The algorithm scales efficiently with larger datasets and more complex networks, making it ideal for large-scale tasks.
Limitations and Challenges
While backpropagation is powerful, it does face some challenges:
Vanishing Gradient Problem: In deep networks, the gradients can become very small during backpropagation, making it difficult for the network to learn. This is common when using activation functions like sigmoid or tanh.
Exploding Gradients: The gradients can also become excessively large, causing the network to diverge during training.
Overfitting: If the network is too complex, it might memorize the training data instead of learning general patterns.
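The vanishing-gradient problem is easy to see numerically: the sigmoid’s derivative never exceeds 0.25, and the chain rule multiplies one such factor per layer, so the gradient shrinks rapidly in deep networks. A toy illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The sigmoid's derivative is largest at z = 0, where it equals 0.25.
z = 0.0
d = sigmoid(z) * (1 - sigmoid(z))

# Backpropagating through 20 sigmoid layers multiplies ~20 such factors.
print(d, d ** 20)   # 0.25 and roughly 9.1e-13 -> the gradient vanishes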
An Example of Backpropagation
Let’s walk through an example of backpropagation in machine learning. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5, and the learning rate is 1.
Here’s how backpropagation is implemented:
The Forward Pass
Initial Calculation
The weighted sum at each node is calculated by multiplying each input by its corresponding weight and summing the results (adding a bias if the node has one): a_j = Σ (w_i × x_i) + b_j.
Sigmoid Function
The sigmoid function, σ(x) = 1 / (1 + e^(−x)), returns a value between 0 and 1, introducing non-linearity into the model.
Computing Outputs
At node h1, the weighted sum a1 is computed from the inputs and their corresponding weights.
Once we have calculated the a1 value, we can find the node’s output by applying the sigmoid: y3 = σ(a1).
Similarly, we find the values of y4 at h2 and y5 at O3.
Errors Calculation (The Loss Function)
Note that the target output is 0.5, but we obtained 0.67. To calculate the error, we can use the squared-error formula given above. Using this error value, we backpropagate.
The Backward Pass
Calculating Gradients
The change in each weight is calculated with the delta rule: Δw = η × δ × x, where η is the learning rate, δ is the error term of the node the weight feeds into, and x is the input flowing through that connection.
Output Unit Error
For O3 (a sigmoid output unit with squared-error loss), the error term is δ5 = y5 × (1 − y5) × (target − y5).
Hidden Unit Error
For h1: δ3 = y3 × (1 − y3) × w(h1→O3) × δ5, where w(h1→O3) denotes the weight connecting h1 to the output node.
For h2: δ4 = y4 × (1 − y4) × w(h2→O3) × δ5.
Weight Updates
Applying this rule to every connection, with the learning rate of 1, gives the updated weights.
Final Forward Pass:
After updating the weights, the forward pass is repeated.
y3 = 0.57
y4 = 0.56
y5 = 0.61
Since y5 = 0.61 is still not the target output, the process of calculating the error and backpropagating continues until the output is close enough to the target. This demonstrates how backpropagation iteratively updates the weights, reducing the error until the network predicts the output accurately.
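The original figure’s initial weight values are not reproduced in this article, but the whole procedure can be written in a few lines of Python. The sketch below uses assumed initial weights (so its intermediate numbers will not match the ones above), while following exactly the same steps: forward pass, error term at the output, error terms at the hidden nodes, and weight updates with a learning rate of 1:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Assumed values purely for illustration; the article's figure uses its own.
x = np.array([0.35, 0.7])            # inputs X1, X2
W1 = np.array([[0.2, 0.3],           # weights input -> (h1, h2)
               [0.2, 0.3]])
W2 = np.array([0.3, 0.9])            # weights (h1, h2) -> O3
target, lr = 0.5, 1.0

for step in range(5):
    # Forward pass
    a_hidden = x @ W1                # a1, a2: weighted sums at h1, h2
    y_hidden = sigmoid(a_hidden)     # y3, y4
    y5 = sigmoid(y_hidden @ W2)      # network output

    # Backward pass (delta rule for sigmoid units, squared error)
    delta5 = y5 * (1 - y5) * (target - y5)
    delta_hidden = y_hidden * (1 - y_hidden) * W2 * delta5

    # Weight updates
    W2 += lr * delta5 * y_hidden
    W1 += lr * np.outer(x, delta_hidden)
    print(f"step {step}: output = {y5:.3f}")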
Implementation in Python
This code demonstrates how backpropagation is used in a neural network to solve the XOR problem. The neural network consists of:
Input layer with 2 inputs,
Hidden layer with 4 neurons,
Output layer with 1 output neuron.
Key steps:
The Forward Pass: The inputs are passed through the network, activating the hidden and output layers using the sigmoid function.
The Backward Pass (Backpropagation): The errors between the predicted and actual outputs are computed. The gradients are calculated using the derivative of the sigmoid function, and weights and biases are updated accordingly.
Training: The network is trained over 10,000 epochs using the backpropagation algorithm with a learning rate of 0.1, progressively reducing the error.
This implementation highlights how backpropagation adjusts weights and biases to minimize the loss and improve predictions over time.
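The excerpt below shows only the train method and the code that builds and runs the network; the rest of the NeuralNetwork class is not reproduced here. A minimal sketch of the missing pieces, consistent with how the class is used below (sigmoid activations, a single hidden layer, and random weight initialization as an assumption), could look like this:

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Assumed initialization: small random weights, zero biases.
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, a):
        # Derivative written in terms of the activation a = sigmoid(z).
        return a * (1 - a)

    def feedforward(self, X):
        # Forward pass: input -> hidden -> output, sigmoid at each layer.
        self.hidden = self.sigmoid(np.dot(X, self.W1) + self.b1)
        self.output = self.sigmoid(np.dot(self.hidden, self.W2) + self.b2)
        return self.output

    def backward(self, X, y, learning_rate):
        # Backward pass: compute deltas, then gradient-descent updates.
        output_delta = (y - self.output) * self.sigmoid_derivative(self.output)
        hidden_delta = np.dot(output_delta, self.W2.T) * self.sigmoid_derivative(self.hidden)
        self.W2 += learning_rate * np.dot(self.hidden.T, output_delta)
        self.b2 += learning_rate * np.sum(output_delta, axis=0, keepdims=True)
        self.W1 += learning_rate * np.dot(X.T, hidden_delta)
        self.b1 += learning_rate * np.sum(hidden_delta, axis=0, keepdims=True)

    # The train method quoted from the article (below) completes the class.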
    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss:{loss}")

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

# Test the trained model
output = nn.feedforward(X)
print("Predictions after training:")
print(output)
Predictions after training:
[[0.02330965]
 [0.95658721]
 [0.95049451]
 [0.05896647]]
Conclusion
Backpropagation is the engine that drives neural network learning. By propagating errors backward and adjusting the weights and biases, neural networks can gradually improve their predictions. Though it has some limitations like vanishing gradients, many techniques, such as using ReLU activation or optimizing learning rates, have been developed to address these issues.